

Feeding Reddit to Hadoop (w/ Clojure) - ghotli
http://www.bestinclass.dk/index.php/2010/01/hadoop-feeding-reddit-to-hadoop/

======
jedberg
Interesting article, and it also explains who was pounding us with traffic. :)

Protip: If you are going to scrape a website, put some sleeps into your loop
so the person who runs the website doesn't curse your name while their
automatic rate limiters kick in and block you anyway.
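
For anyone wondering what "put some sleeps into your loop" might look like, here is a minimal sketch in Python. It is not the article's code (that was Clojure); the function name `fetch_politely`, its parameters, and the 2-second default delay are all made up for illustration, and the right delay depends on the site you are hitting.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,  # assumed default, tune for the target site
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between consecutive requests.

    `fetch` is whatever function actually downloads a page; it is passed
    in so the pacing logic stays independent of the HTTP library used.
    """
    results = []
    for index, url in enumerate(urls):
        # Pause before every request except the first, so the server
        # sees at most one request per `delay_seconds`.
        if index > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

In practice `fetch` could be a small wrapper around `urllib.request.urlopen` that reads and decodes the response body. Keeping the delay in one place also makes it easy to raise it if the site starts returning rate-limit errors.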

~~~
philjackson
He actually did mention using sleep, in a sort of "do as I say, not as I do"
kind of way :)

