
Nice catch. I'm not so sure about:

  A simple fix will be just crawling the links without the request parameters so that we don’t have to suffer.

Many links would fail or return different content if the request parameters were removed from the URL. Perhaps the crawler could use some kind of reverse bloom filter [1] to be more careful, or back off, if it receives the same content from multiple URLs. However, nothing is simple at Google scale, so there are probably issues with this approach too.

[1]: http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a...
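To make the idea concrete, here is a rough sketch of how I read it, not how any real crawler works. It assumes the "reverse bloom filter" is a fixed-size table that may forget things it has seen (false negatives) but never reports something it has not seen (no false positives, short of a SHA-256 collision). The names `OppositeOfBloomFilter`, `DuplicateContentGuard`, `should_back_off`, and the `threshold` parameter are all made up for illustration.

```python
import hashlib


class OppositeOfBloomFilter:
    """Fixed-size table that can forget entries but never invents them."""

    def __init__(self, size: int) -> None:
        self.slots = [None] * size

    def contains_and_add(self, item: bytes) -> bool:
        """Return True only if this exact item is still stored from an earlier call."""
        digest = hashlib.sha256(item).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.slots)
        previous = self.slots[index]
        # Overwriting the slot is what causes false negatives: a different
        # item hashing to the same index evicts the earlier one.
        self.slots[index] = digest
        # Comparing the full digest means a slot collision alone is never a hit.
        return previous == digest


class DuplicateContentGuard:
    """Hypothetical crawler helper: counts repeated bodies and signals back-off."""

    def __init__(self, size: int = 1 << 16, threshold: int = 3) -> None:
        self.seen = OppositeOfBloomFilter(size)
        self.threshold = threshold  # assumed tuning knob, not from the thread
        self.repeats = 0

    def should_back_off(self, body: bytes) -> bool:
        """Record one fetched response body; True once enough repeats are seen."""
        if self.seen.contains_and_add(body):
            self.repeats += 1
        return self.repeats >= self.threshold
```

A crawler would call `should_back_off(body)` after each fetch, probably with one guard per target host. Because the table is fixed-size, memory stays bounded, and the worst case is a missed duplicate rather than a wrongly skipped page.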




You can always change that to

    =image("http://targetname/1.jpg")   
    =image("http://targetname/2.jpg")   
    =image("http://targetname/3.jpg")


But what if 2.jpg doesn't exist? Or is a trivially small file?

The advantage of the query-string method is that you only need to find one suitable (i.e. huge) file and can then force Google to pull it down many times.



