Hacker News
Errant cron task yields yearlong time lapse of nytimes.com (joshuanguyen.com)
61 points by ChrisArchitect on July 18, 2011 | 20 comments



I created a web application and a set of scripts late last year to snapshot sites like that at daily/hourly/minute intervals. I also set up the web app to manage the captured images and turn them into videos.

Some of the interesting things I found:

- interesting to compare news sites' coverage of the same news stories - see who publishes stories first and where on the page...

- quickly analyse site ad and content refresh rates

- instant time lapse videos from web cam sites

- some interesting artistic effects as content changes and moves on sites

- the "photographic record" of web sites made it interesting to catch sites failing to update or being broken at times

- very easy to generate gigabytes of content in small amounts of time!

I haven't had time to extend the project further right now, but I still have jobs running that capture some of the top sites daily, to build some year-long web-time-lapse videos and do something with the content. If anyone has ideas to commercialize the content or technology, let me know.

Note: Technologies used - Ubuntu, bash, CutyCapt, JSP, ImageMagick
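A minimal sketch of that kind of pipeline, assuming CutyCapt and ImageMagick are installed (the paths, schedule, and script name here are hypothetical, not the poster's actual setup):

```shell
#!/bin/bash
# snap.sh -- render a URL to a timestamped PNG
# crontab entry to run it hourly (note: % must be escaped as \% inside crontab):
#   0 * * * * /home/capture/snap.sh http://www.nytimes.com /home/capture/nytimes
url="$1"
dir="$2"
mkdir -p "$dir"
# CutyCapt renders the page with a WebKit engine and writes an image
cutycapt --url="$url" --out="$dir/$(date +%Y%m%d-%H%M).png"

# later, the frames can be assembled into an animation with ImageMagick, e.g.:
#   convert -delay 20 "$dir"/*.png timelapse.gif
```

At one capture per hour per site, it is easy to see how this fills gigabytes quickly.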


Sorry, just realized the more original source is http://news.ycombinator.com/item?id=2777508


The original post suffers from poor titling. Phill MV would have been better off taking a page from TechCrunch's confrontational naming style.


Ah, but the Dylan reference was TOO GOOD TO PASS UP.


Or with partisan hackery and tech-buzzword flavoring: "cron effortlessly disembowels nytimes.com temporal perception"


Mid-terms at 1:19 -- http://www.youtube.com/watch?v=sCKGOiauJCE&feature=playe...

Fun to see the results pile in -- then the usual reaction shots from pols (frowns and smiles depending on party).


I must confess, I actually watched the entire video! Much better than scrolling through hundreds of screen captures in a list, and surprisingly entertaining!

The number of watchmakers that advertised over the course of the year is striking. The ads primarily caught my attention - which is strange, because I rarely look at ads when browsing sites. The one constant among all the ads was a watch manufacturer.

Curious if the person who captured these images had a browsing history for watches, or if that's what everyone witnessed? Next experiment: Two completely different users capture these on the same time interval -- side by side comparison! ;-)


>Curious if the person who captured these images had a browsing history for watches, or if that's what everyone witnessed?

Alas, no. I used a WebKit-to-JPEG generator that should, in theory, be pretty much devoid of any browsing history. I'd be surprised if they've started tracking those by IP!


You shouldn't be too surprised; it's quite possible they were tracking some combination of IP, user agent, and a number of other things to identify the browser. I don't know that you would have ended up with a "watch" preference, though, especially without clicking on watch ads or visiting a watch site.


I would love to see the Times adopt a front-page layout that is more web-native and less an imitation of a newspaper's front page; at 5 columns across in places, it is too squished together at only 970px wide.

It would actually be a great candidate for responsive design: the current column setup could look far nicer with more width when it is available, dropping columns where less screen width is available, similar to http://theconversation.edu.au/ which has a similar number of columns.


how do you use cron to take screenshots like this? does a browser window have to be open somewhere the whole time?


I used http://code.google.com/p/wkhtmltopdf/ . It's an amazing project that I try to use as often as possible, especially whenever a client requires pdfs.
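The same project also ships wkhtmltoimage, which renders a URL straight to an image rather than a PDF; a one-line sketch (output filename hypothetical):

```shell
# render a page to a JPEG using the project's image tool
wkhtmltoimage --quality 80 http://www.nytimes.com/ nytimes.jpg
```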


"Open," yes, but not visible. You can use Xvfb[1] which does everything in memory but doesn't actually show any image.

[1]: http://en.wikipedia.org/wiki/Xvfb
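On a headless server, the usual way to do this is to wrap the capture command in xvfb-run, which starts an in-memory X server for the duration of the command; a sketch, assuming CutyCapt as the capture tool:

```shell
# run the capture inside a virtual framebuffer -- no visible window anywhere
xvfb-run --server-args="-screen 0 1024x768x24" \
  cutycapt --url=http://www.nytimes.com/ --out=nytimes.png
```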


I've often found myself needing -- and thinking about building -- a webapp/site for taking/collecting screenshots of other sites. But I'm not sure it'd actually be useful and I certainly don't know if anyone'd pay for it.


I was thinking of building something like this recently to test my Django + background worker skills.

http://cutycapt.sourceforge.net/ is one of the better screen capture engines I found for OS X. Most are based on WebKit/Qt, with Xvfb for Unix support.


There were quite a few of these that popped up ~5 years ago. I think most of them are gone.


Yeah, most of those were for thumbnailing.

What I'm talking about is most like a screenshot version of archive.org.


That would be useful. Archive.org is better at not breaking than I imagined it would be, but still often leaves broken assets. The combination of the two would be really cool: see the page, and how the page was viewed at the time.


one odd thing about the video is it should be 7+ mins long, but shows up in YouTube as 5:36 or something. Trick: Watch in 480 mode, and it keeps playing to the full length :-S

You gotta keep watching so you don't miss some more big news events like Osama's death!


This just in.... users of NYT.com like yogurt (see banner ads).



