

Errant cron task yields yearlong time lapse of nytimes.com - ChrisArchitect
http://blog.joshuanguyen.com/post/7766138893/due-to-an-errant-cron-task-that-ran-twice-an-hour

======
zenpaul
I created a web application and set of scripts late last year to snapshot
sites like that on a daily/hourly/minutely interval. Also set up the web app
to manage the captured images and turn them into videos.

Some of the interesting things I found:

\- interesting to compare news sites coverage of the same news stories - see
who publishes stories first and where on the page...

\- quickly analyse site ad and content refresh rates

\- instant time lapse videos from web cam sites

\- some interesting artistic effects as content changes and moves on sites

\- a "photographic record" of web sites: interesting to see some sites not
updating, or broken at times

\- very easy to generate gigabytes of content in small amounts of time!

I haven't had time to extend the project further right now, but I still have
jobs running capturing some of the top sites daily to get some year-long web-
time-lapse videos and do something with the content. If anyone has ideas to
commercialize the content or technology, let me know.

Note: Technologies used - Ubuntu, bash, CutyCapt, JSP, ImageMagick
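
As a rough illustration of the "turn them into videos" step, here is a minimal sketch in Python. It is not the author's actual script. It assumes frames are PNG files named by UTC timestamp (a hypothetical naming scheme), and it builds an ImageMagick `convert` command that produces an animated GIF as a stand-in for a real video encode. The function name `timelapse_command` and the default output name are made up for the example.

```python
import re
import shlex

# Hypothetical naming scheme: one PNG per capture, e.g. 20110718T123000Z.png
FRAME_RE = re.compile(r"^\d{8}T\d{6}Z\.png$")


def timelapse_command(filenames, out_path="timelapse.gif", delay_ticks=10):
    """Build an ImageMagick command that joins frames into an animated GIF.

    Timestamped names sort chronologically as plain strings, so sorted()
    is enough to put the frames in capture order.
    """
    frames = sorted(f for f in filenames if FRAME_RE.match(f))
    if not frames:
        raise ValueError("no frames matched the expected naming scheme")
    # -delay is in 1/100ths of a second per frame; -loop 0 repeats forever.
    return ["convert", "-delay", str(delay_ticks), "-loop", "0", *frames, out_path]


if __name__ == "__main__":
    example = ["20110102T000000Z.png", "notes.txt", "20110101T000000Z.png"]
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(timelapse_command(example)))
```

ImageMagick reads every frame into memory, so for a year of captures a dedicated video encoder would probably be a better fit. The sorting and filtering logic would stay the same.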

------
ChrisArchitect
Sorry, just realized the more original source is
<http://news.ycombinator.com/item?id=2777508>

~~~
peteforde
The original post suffers from poor titling. Phill MV would have been better
off taking a page from TechCrunch's confrontational naming style.

~~~
phillmv
Ah, but the Dylan reference was TOO GOOD TO PASS UP.

------
nostromo
Mid-terms at 1:19 --
[http://www.youtube.com/watch?v=sCKGOiauJCE&feature=playe...](http://www.youtube.com/watch?v=sCKGOiauJCE&feature=player_embedded#at=1m19s)

Fun to see the results pile in -- then the usual reaction shots from pols
(frowns and smiles depending on party).

------
codeslush
I must confess, I actually watched the entire video! Much better than looking
through hundreds of screen-captures in a list and interestingly entertaining!

It's striking to me the number of watch makers that advertised through the
course of the year. The ads primarily caught my attention - which is strange,
because I rarely look at ads when browsing sites. The one constant, from all
the ads, was a watch manufacturer.

Curious if the person who captured these images had a browsing history for
watches, or if that's what everyone witnessed? Next experiment: Two completely
different users capture these on the same time interval -- side by side
comparison! ;-)

~~~
phillmv
>Curious if the person who captured these images had a browsing history for
watches, or if that's what everyone witnessed?

Alas, no. I used a webkit to jpeg generator that should, in theory, be pretty
void of any browsing history. I'd be surprised if they've started tracking
those by ip!

~~~
mrkurt
You shouldn't be too surprised, it's quite possible they were tracking some
combination of IP, user agent, and a number of other things to identify the
browser. I don't know that you would have ended up having a "watch"
preference, though, especially without clicking on watch ads or visiting a
watch site.

------
robryan
I would love to see the Times adopt a layout for their front page which is
more web based and less like imitating the front page of a paper. At 5 columns
across in places, it is too squished together at only 970px across.

It would actually be a great candidate for responsive design. It could make
the current column setup look far nicer with more width when it is available,
then remove columns where there is less screen width available, similar to
<http://theconversation.edu.au/> which has a similar number of columns across.

------
roadnottaken
how do you use cron to take screenshots like this? does a browser window have
to be open somewhere the whole time?

~~~
icebraining
"Open," yes, but not visible. You can use Xvfb[1] which does everything in
memory but doesn't actually show any image.

[1]: <http://en.wikipedia.org/wiki/Xvfb>
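
A minimal sketch of how that could fit together, assuming CutyCapt (mentioned elsewhere in this thread) as the renderer and `xvfb-run` as the wrapper that starts a throwaway Xvfb display. The function name `capture_command`, the output directory, and the timestamped filename scheme are hypothetical. The original poster's actual setup is not described in the thread.

```python
import shlex
from datetime import datetime, timezone


def capture_command(url, out_dir, now=None):
    """Build an xvfb-run + CutyCapt command line for one snapshot.

    xvfb-run starts an in-memory X server, runs the given program against
    it, and shuts the server down afterwards, so no visible window is needed.
    """
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming scheme: UTC timestamp, so files sort chronologically.
    out_path = f"{out_dir}/{now:%Y%m%dT%H%M%SZ}.png"
    return [
        "xvfb-run", "--auto-servernum",
        "--server-args=-screen 0 1024x768x24",
        "CutyCapt", f"--url={url}", f"--out={out_path}",
    ]


if __name__ == "__main__":
    # Print the command rather than executing it; a real job would pass the
    # list to subprocess.run(). A crontab entry along the lines of
    #   0 * * * * python3 /path/to/snapshot.py
    # (path is an assumption) would then fire it once an hour.
    print(shlex.join(capture_command("http://www.nytimes.com/", "/var/snapshots")))
```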

------
adamhowell
I've often found myself needing -- and thinking a/b building -- a webapp/site
for taking/collecting screenshots of other sites. But I'm not sure it'd
actually be useful and I certainly don't know if anyone'd pay for it.

~~~
pyre
There were quite a few of these that popped up ~5 years ago. I think most of
them are gone.

~~~
adamhowell
Yeah, most of those were for thumbnailing.

What I'm talking about is most like a screenshot version of archive.org.

~~~
mnutt
That would be useful. Archive.org is better at not breaking than I imagined it
would be, but still often leaves broken assets. The combination of the two
would be really cool: see the page, and how the page was viewed at the time.

------
ChrisArchitect
one odd thing about the video is it should be 7+ mins long, but shows up in
YouTube as 5:36 or something. Trick: Watch in 480 mode, and it keeps playing
to the full length :-S

You gotta keep watching so you don't miss some more big news events like
Osama's death!

------
morgandev
This just in.... users of NYT.com like yogurt (see banner ads).

