Hi, I didn't release the crawler's code for two reasons. First, it wasn't well-crafted enough to release (quick-and-dirty, straight-line scripting). Second, any change to the site you crawl means reworking your code.

I used Python, sometimes with BeautifulSoup and sometimes with lxml. Both are very good for crawling. I would say BeautifulSoup is easier to use and lxml is cleaner.
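
To give a rough idea of the difference, here is a minimal sketch (not my actual crawler code) that pulls the same links out of a page with each library. The sample HTML, the "item" class, and the function names are all made up for illustration. The selectors are the part you end up rewriting whenever the site changes.

```python
from bs4 import BeautifulSoup
from lxml import html

# Hypothetical sample page; a real crawler would fetch this over HTTP.
SAMPLE = """
<html><body>
  <h1>Listing</h1>
  <a class="item" href="/a">First</a>
  <a class="item" href="/b">Second</a>
  <a href="/about">About</a>
</body></html>
"""


def links_with_bs(page):
    """Collect hrefs of <a class="item"> tags using BeautifulSoup."""
    soup = BeautifulSoup(page, "html.parser")
    return [a["href"] for a in soup.find_all("a", class_="item")]


def links_with_lxml(page):
    """Collect the same hrefs using lxml and an XPath expression."""
    tree = html.fromstring(page)
    return tree.xpath('//a[@class="item"]/@href')


print(links_with_bs(SAMPLE))
print(links_with_lxml(SAMPLE))
```

The BeautifulSoup version reads more like plain Python, while the lxml version puts the whole query into one XPath string, which is roughly what I mean by "easier" versus "cleaner".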