
Nice catch. I'm not so sure about:

  A simple fix will be just crawling the links without the request parameters so that we don’t have to suffer.
Many links would fail or serve different content if the request parameters were removed from the URL. Perhaps the crawler could use some kind of reverse Bloom filter [1] to be more careful and back off when it receives the same content from multiple URLs. Nothing is simple at Google scale, though, so there are probably issues with this approach too.

[1]: http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a...
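The core of the reverse-Bloom-filter idea from [1] fits in a few lines. A hedged sketch in Python (the class and method names are invented for illustration, and this is not how any real crawler implements it): a fixed-size table of recently seen items, where a hit means the item was definitely seen recently (no false positives), while a miss may only mean the entry was evicted by a colliding item (false negatives are possible).

```python
class RecentlySeenFilter:
    """Reverse Bloom filter: fixed-size table of recently seen items.

    Unlike a Bloom filter, it can report false negatives (an item may
    have been evicted by a hash collision) but never false positives:
    if check_and_put() returns True, the item really was seen before.
    """

    def __init__(self, size=1 << 16):
        self.slots = [None] * size

    def check_and_put(self, item):
        # Store the item in its hash slot, overwriting whatever was there.
        idx = hash(item) % len(self.slots)
        seen = self.slots[idx] == item
        self.slots[idx] = item
        return seen


# A crawler could feed page-content digests through the filter and
# back off on URLs whose content it has definitely fetched before.
f = RecentlySeenFilter()
body = "identical page body served for many querystring variants"
f.check_and_put(body)   # first sighting: False
f.check_and_put(body)   # definitely seen before: True
```

The appeal for a crawler is the bounded memory: the table never grows, at the cost of occasionally forgetting an item it has in fact seen.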

You can always change that to

  http://example.com/1.jpg
  http://example.com/2.jpg
  ...

But what if 2.jpg doesn't exist? Or is a trivially small file?

The advantage of the querystring method is that you can just find one suitable (i.e. huge) file and force Google to pull it down many times.
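To make that contrast concrete, here is a small illustrative sketch (the hostname and filenames are placeholders, not from the thread): querystring variants are distinct URLs to a crawler that dedups on the full URL, yet all resolve to the same file on the server, whereas numbered paths each need a real file behind them.

```python
# Hypothetical target file; the host and name are placeholders.
target = "http://victim.example/huge.zip"

# Querystring method: every URL is unique to a crawler keying its
# dedup on the full URL, yet the server serves the same huge file.
query_urls = [f"{target}?v={i}" for i in range(1, 1001)]

# Path method: each URL must correspond to an actual file, and any
# that are missing or tiny (1.jpg, 2.jpg, ...) waste the attempt.
path_urls = [f"http://victim.example/{i}.jpg" for i in range(1, 1001)]

assert len(set(query_urls)) == 1000                        # all distinct URLs
assert all(u.split("?")[0] == target for u in query_urls)  # one file behind them
```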
