

Ask HN: Is it possible to overwhelm a web-crawler with infinite links? - hellbanner

mysite.com/:id

shows some text, links to

mysite.com/:id+1

Would a crawler keep hitting this infinity of pages? What scale would it take to bring down a crawler?
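For concreteness, the trap described above can be sketched as a handler that renders each page on demand, with every page linking to the next id (a minimal illustration; the function names and HTML are assumptions, and you'd wire this into whatever web framework you use):

```python
def render_page(page_id: int) -> str:
    """Generate one trap page on the fly.

    Every page links to page_id + 1, so a crawler that blindly
    follows links never runs out of fresh URLs to fetch.
    """
    return (
        f"<html><body><p>Page {page_id}</p>"
        f'<a href="/{page_id + 1}">next</a>'
        "</body></html>"
    )
```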
======
nvader
Consider that every page the crawler downloads is a page's worth of
bandwidth your server must deliver. If these pages are generated on the fly,
your server also needs to spend the computation to generate each one.

Add to this the fact that a crawler can stop and back off when its
resource constraints are hit, whereas your web server, which may be
serving "mission-critical" content, for lack of a better phrase,
probably needs to maintain low latency. In addition, there are
multiple web crawlers out there, and several could hit your server at the same time.

Given that, I think the scheme you've described is a mug's game, where for
every dollar's worth of damage you do to a crawler, you're doing more than a
dollar of damage to yourself.

------
RKoutnik
It might be possible to kill a naive crawler with an infinitely-spiralling
link web. However, most modern crawlers are built with this in mind and will
stop once a certain crawl depth is reached.
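A depth cap like that is easy to sketch: a breadth-first crawler that records how many links deep each URL is and simply stops following links past a limit. (A minimal illustration, not any real crawler's code; `get_links` stands in for the HTTP fetch + HTML parse, and the URLs mirror the /:id trap from the question.)

```python
from collections import deque

def crawl(start_url, get_links, max_depth=3):
    """Breadth-first crawl with a depth limit.

    `get_links(url)` returns a page's outbound links (a stub here;
    a real crawler would fetch and parse the page). URLs deeper than
    max_depth links from the start are never enqueued, so an
    infinite chain of pages is truncated after max_depth hops.
    The `seen` set also deduplicates, which defeats link cycles.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth cap: stop following links from here
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

def trap_links(url):
    """Simulate the infinite /:id -> /:id+1 trap."""
    n = int(url.rsplit("/", 1)[1])
    return [f"mysite.com/{n + 1}"]
```

Running `crawl("mysite.com/1", trap_links, max_depth=3)` visits only pages 1 through 4 and terminates, however "infinite" the link chain is.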

