
Crawling is a fundamentally different problem from serving up dynamic websites. I've written a PHP-based crawler framework: it simply uses a database as a central synchronization point and a set of parallel scripts to do the crawling.

I could have done the same using curl, but chose this approach for technical reasons. It's definitely a workaround kind of solution, though.
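For anyone curious what "database as the synchronization point" can look like, here is a minimal sketch in Python (not the PHP framework described above). It assumes SQLite as the shared store and a hypothetical `urls` table. Threads stand in for the separate crawler scripts, and `fetch_links` is a hypothetical callback that would download a page and return its outgoing links. The key idea is that each worker atomically claims one pending URL inside a write-locked transaction, so no two workers crawl the same page.

```python
import sqlite3
import threading
import time
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical schema: one row per URL; status moves pending -> in_progress -> done.
SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    url    TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def connect(path: str) -> sqlite3.Connection:
    # Autocommit mode (isolation_level=None) so transactions are explicit;
    # timeout makes a worker wait for the lock instead of failing immediately.
    return sqlite3.connect(path, timeout=30, isolation_level=None)


def in_write_txn(conn: sqlite3.Connection, fn: Callable[[], T]) -> T:
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so one worker's
    # read-then-update cannot interleave with another worker's.
    conn.execute("BEGIN IMMEDIATE")
    try:
        result = fn()
        conn.execute("COMMIT")
        return result
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def claim_next(conn: sqlite3.Connection) -> Tuple[Optional[str], int]:
    """Atomically claim one pending URL; otherwise report how many are in progress."""
    def claim() -> Tuple[Optional[str], int]:
        row = conn.execute(
            "SELECT url FROM urls WHERE status = 'pending' LIMIT 1").fetchone()
        if row is not None:
            conn.execute("UPDATE urls SET status = 'in_progress' WHERE url = ?", row)
            return row[0], 0
        busy = conn.execute(
            "SELECT COUNT(*) FROM urls WHERE status = 'in_progress'").fetchone()[0]
        return None, busy
    return in_write_txn(conn, claim)


def complete(conn: sqlite3.Connection, url: str, links: Iterable[str]) -> None:
    def done() -> None:
        # INSERT OR IGNORE skips URLs that are already queued or crawled.
        conn.executemany("INSERT OR IGNORE INTO urls(url) VALUES (?)",
                         [(link,) for link in links])
        conn.execute("UPDATE urls SET status = 'done' WHERE url = ?", (url,))
    in_write_txn(conn, done)


def worker(path: str, fetch_links: Callable[[str], List[str]]) -> None:
    conn = connect(path)  # each worker uses its own connection
    try:
        while True:
            url, busy = claim_next(conn)
            if url is not None:
                # The fetch runs outside any transaction, so the lock is not held.
                complete(conn, url, fetch_links(url))
            elif busy == 0:
                return  # nothing pending and no worker left that could add more
            else:
                time.sleep(0.01)  # other workers may still enqueue new links
    finally:
        conn.close()


def crawl(path: str, seeds: List[str],
          fetch_links: Callable[[str], List[str]], n_workers: int = 4) -> None:
    conn = connect(path)
    conn.execute(SCHEMA)
    conn.executemany("INSERT OR IGNORE INTO urls(url) VALUES (?)",
                     [(s,) for s in seeds])
    conn.close()
    threads = [threading.Thread(target=worker, args=(path, fetch_links))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a real fetcher you would call something like `crawl("crawl.db", seeds, fetch_links)`. The slow network fetch happens outside any transaction, so the database lock is only held for short bookkeeping steps. This sketch does not recover rows left in `in_progress` by a crashed worker; a real crawler would need a timeout or reset for those.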

I hope to replace the whole thing with a Clojure/JVM-based solution.




