
The cluster is interesting, but not as interesting as the giant corpus of SWF files Google got to use. Do you think the government has a crawl as complete as Google's under its hat? How? People notice when the Googlebot does new things. Wouldn't we notice the Fedbot?

Quite true. Google does have a lot of data. But, I'd wager the NSA has just as much data, just from different sources. Maybe they couldn't fuzz Flash with the optimal set of .swf files, but they could mine vast numbers of voice conversations for correlations.

Additionally, years ago a friend of mine whom I'd lost contact with caught up with me and told me he'd found a cached copy of a website I'd taken down in his employer's equivalent of the Wayback Machine. His employer was a branch of the federal government. I know my anecdote doesn't prove anything, let alone come close to addressing the difficulty of crawling the web without anyone noticing (intercept all HTTP traffic in transit?), but the fact remains that there are literally tons of computers doing something for the government.

Perhaps Fedbot crawls in a less deterministic manner, uses a lot of different IPs, and sets its user agent to IE?

I suspect "fedbot" works by calling up Google and saying "Hi, it's us again, we've got another white van on the way to the Googleplex, have a petabyte or two of the Internet ready for us to collect in 20 minutes. Thanks."
