
Is this really a contrarian opinion? It seems pretty obvious to me that scraping a website is easier by reading its network requests than by parsing its HTML.
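To make the comparison concrete, here is a minimal stdlib-only Python sketch. The payloads, the `products` key and the `name`/`price` class names are all hypothetical, and a real site's markup and endpoint will differ. The JSON route is a single `json.loads` call. The HTML route has to encode assumptions about the page layout, and those assumptions are what break when the markup changes.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser

# Hypothetical payloads: the same product list as a page might render it
# in HTML, and as its backing XHR endpoint might return it as JSON.
SAMPLE_HTML = """
<ul id="products">
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">19.50</span></li>
</ul>
"""
SAMPLE_JSON = '{"products": [{"name": "Widget", "price": 9.99}, {"name": "Gadget", "price": 19.5}]}'


def from_json(payload: str) -> list[tuple[str, float]]:
    """Network-request route: the data arrives already structured."""
    data = json.loads(payload)
    return [(p["name"], p["price"]) for p in data["products"]]


class _ProductParser(HTMLParser):
    """HTML route: track which <span class="..."> we are currently inside."""

    def __init__(self) -> None:
        super().__init__()
        self.current: str | None = None
        self.names: list[str] = []
        self.prices: list[float] = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.current = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self.current = None

    def handle_data(self, data):
        # Relies on the layout: breaks if the class names or nesting change.
        if self.current == "name":
            self.names.append(data.strip())
        elif self.current == "price":
            self.prices.append(float(data))


def from_html(page: str) -> list[tuple[str, float]]:
    parser = _ProductParser()
    parser.feed(page)
    return list(zip(parser.names, parser.prices))
```

Both functions return the same list here, but only `from_html` depends on how the page happens to be laid out.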



I'm not sure why, but many scraping tools handle JavaScript-heavy websites by emulating clicks and running the page's JavaScript.


I think the idea is that clicking on the button is less likely to change suddenly than whatever protocol they're running on top of AJAX.

In my experience, internal layout changes seem to happen much more often than changes to the AJAX handlers.


Is there a specific tool you would recommend for doing this?


Chrome DevTools is pretty awesome. In the Network tab you can right-click a request and copy it as a cURL command, cookies and all, which you can then replay in a terminal.
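To replay a copied command from a script instead of the terminal, here is a minimal Python sketch that turns it into a `urllib.request.Request` without sending it. The function name `request_from_curl` is hypothetical, and the parser only understands a handful of flags (the URL, `-H`, `-b`, `-X` and the `--data` variants). The value of any other flag that takes an argument (e.g. `-A`) would be mistaken for the URL, so treat this as a starting point, not a full cURL parser.

```python
import shlex
import urllib.request


def request_from_curl(command: str) -> urllib.request.Request:
    """Build (but do not send) a urllib Request from a copied cURL command.

    Hypothetical helper: handles only the URL, -H/--header, -b/--cookie,
    -X/--request and -d/--data/--data-raw/--data-binary. Other flags are
    assumed to take no value and are skipped.
    """
    args = shlex.split(command)  # respects the shell quoting cURL commands use
    if not args or args[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")
    url = None
    headers = {}
    method = None
    data = None
    i = 1
    while i < len(args):
        arg = args[i]
        if arg in ("-H", "--header"):
            name, _, value = args[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif arg in ("-b", "--cookie"):
            headers["Cookie"] = args[i + 1]
            i += 2
        elif arg in ("-X", "--request"):
            method = args[i + 1]
            i += 2
        elif arg in ("-d", "--data", "--data-raw", "--data-binary"):
            data = args[i + 1].encode()
            i += 2
        elif arg.startswith("-"):
            i += 1  # e.g. --compressed: assumed to be a flag without a value
        else:
            url = arg
            i += 1
    if url is None:
        raise ValueError("no URL found in command")
    # With data and no explicit -X, urllib defaults the method to POST.
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Usage, with a made-up URL and cookie: `req = request_from_curl("curl 'https://example.com/api/items?page=2' -H 'accept: application/json' -b 'session=abc123' --compressed")`, then `urllib.request.urlopen(req)` sends it with the same headers and cookies the browser used.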


To read network requests? Charles Proxy is great!



