Twitter Screen Scraping and FFT-Based Cross Fade

Craige_Hales · Nov 26, 2017 09:50 PM

A previous post described a complicated set of steps to download a stream of Twitter data. The attached JSL shows a simpler way to get a smaller amount of data by screen-scraping Twitter's Advanced Search (no login required).

Screen-scraping doesn't actually use the data on the screen. Instead, it captures and analyzes the HTML that would normally be used to display the web page. In this example the analysis is pretty simple-minded: look for URLs ending with .jpg. A URL is the web address of a resource, and a URL ending with .jpg usually points to a JPEG picture that your browser would normally download and display; this example downloads and saves it instead.
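Here is one way to express that "look for URLs ending with .jpg" step. It is a Python sketch, not the JSL from the post, and `find_jpg_urls` is a hypothetical name. It assumes the HTML has already been fetched into a string. It also assumes that a URL ends at a quote, whitespace, an angle bracket, or the end of the text, and it does not try to parse the HTML properly.

```python
import re
from typing import List

# An http(s) URL whose last characters are ".jpg" (any case). The URL must be
# followed by a quote, whitespace, an angle bracket, or the end of the text.
# Quotes, whitespace, and angle brackets can't appear inside the match, so one
# match never runs across two HTML attributes.
JPG_URL = re.compile(r'https?://[^\s"\'<>]*?\.jpg(?=["\'\s<>]|$)', re.IGNORECASE)


def find_jpg_urls(html: str) -> List[str]:
    """Return every .jpg URL found in the HTML text, in page order."""
    return [match.group(0) for match in JPG_URL.finditer(html)]


page = '<img src="https://example.com/a.jpg"><a href="https://example.com/p.html">x</a>'
print(find_jpg_urls(page))
```

A regular expression is enough for a simple-minded scan like this. Pulling out related fields, such as the tweet text, would be better handled by a real HTML parser.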

Because this Twitter interface delivers new pictures slowly, and returns the same pictures again and again, the JSL has some logic to avoid redundant downloads and to ignore certain kinds of pictures. If you write your own screen-scraping code, you'll wind up with similar rules. You'll probably also write more complicated code to analyze the HTML and extract related metadata, such as the tweet text that goes with each picture.
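Rules like these might look like the following small Python filter. It is kept across repeated queries, so a picture is downloaded at most once. The post doesn't list which kinds of pictures the JSL ignores. The `"profile_images"` substring rule below is a hypothetical example, and the `DownloadFilter` class is an illustration rather than the author's code.

```python
class DownloadFilter:
    """Decide which scraped picture URLs are worth downloading.

    Hypothetical rules for illustration: skip any URL already seen in an
    earlier (or the current) query, and skip URLs that contain one of the
    ignore substrings, e.g. avatar pictures under a "profile_images" path.
    """

    def __init__(self, ignore_substrings=("profile_images",)):
        self.ignore_substrings = tuple(ignore_substrings)
        self.seen = set()  # every URL ever offered, downloaded or ignored

    def new_urls(self, urls):
        """Return URLs that are new and not ignored, in their input order."""
        fresh = []
        for url in urls:
            if url in self.seen:
                continue  # redundant: already downloaded or already rejected
            self.seen.add(url)
            if any(s in url for s in self.ignore_substrings):
                continue  # an ignored kind of picture
            fresh.append(url)
        return fresh


f = DownloadFilter()
print(f.new_urls(["https://a/1.jpg", "https://a/profile_images/me.jpg", "https://a/1.jpg"]))
print(f.new_urls(["https://a/1.jpg", "https://a/2.jpg"]))
```

Ignored URLs are also recorded in `seen`, so they are rejected cheaply when Twitter resends them.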

To keep the display interesting, the JSL redisplays old pictures while it waits at least a minute before asking Twitter to resend the query. There are many ways to transition from one picture to the next. Rather than use a simple cross fade, this JSL uses an FFT to convert each image from the 2D spatial domain to the frequency domain and applies the cross fade there. It is slow, and just an experiment. Here's a video with images I have permission to use (unlike most of the images on Twitter). The video uses a bigger FFT and more transition frames than the JSL because it didn't have to run in real time.
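The post doesn't spell out how the JSL blends the two spectra. One thing worth knowing: a straight linear blend of the complex Fourier coefficients looks exactly like an ordinary pixel cross fade, because the Fourier transform is linear. This Python sketch therefore makes a different assumption. It blends each coefficient's magnitude linearly and rotates its phase along the shorter arc, then inverts the result back to pixels. To need no libraries, it uses a naive DFT on tiny grayscale images stored as lists of lists. A real version would call an FFT library on full-size images. `frequency_cross_fade` and its helpers are hypothetical names.

```python
import cmath
import math


def dft_1d(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * j / n)
        out.append(total / n if inverse else total)
    return out


def dft_2d(grid, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft_1d(row, inverse) for row in grid]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def wrap_angle(a):
    """Map an angle in radians to the interval (-pi, pi]."""
    return math.pi - (math.pi - a) % (2 * math.pi)


def frequency_cross_fade(img_a, img_b, t):
    """Blend two equal-size grayscale images at fraction t in [0, 1].

    Assumed blend rule: magnitudes mix linearly and each phase turns along
    the shorter arc from image A's phase toward image B's phase.
    """
    spec_a = dft_2d(img_a)
    spec_b = dft_2d(img_b)
    blended = []
    for row_a, row_b in zip(spec_a, spec_b):
        row = []
        for ca, cb in zip(row_a, row_b):
            mag = (1 - t) * abs(ca) + t * abs(cb)
            pa, pb = cmath.phase(ca), cmath.phase(cb)
            phase = pa + t * wrap_angle(pb - pa)
            row.append(cmath.rect(mag, phase))
        blended.append(row)
    pixels = dft_2d(blended, inverse=True)
    # For real images the result should be (nearly) real; keep the real part.
    return [[p.real for p in row] for row in pixels]


a = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
b = [[100, 100, 100, 100], [100, 0, 0, 100], [100, 0, 0, 100], [100, 100, 100, 100]]
frames = [frequency_cross_fade(a, b, i / 4) for i in range(5)]
```

At t = 0 and t = 1 the sketch reproduces the two input images. Because the zero-frequency term of a non-negative image is real and positive, the average brightness of each in-between frame moves linearly from one image's average to the other's. The intermediate frames can still differ from a plain pixel average. This only works because the blend is nonlinear, which is the reason the linear complex blend was avoided.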

The attached JSL uses the search term football, which produced pretty good results this weekend. Here's one picture showing the score of a game between two nearby universities.

update: JSL issues