

Ask HN: Does anyone have experience scraping dynamic content? - sid_viswanathan

http://www.nfl.com/scores

For example, for this URL, there is a section on the right for "Big Play Highlights" and the first one listed is "L.Brown 7-yard TD pass from..."

It looks like this data is being loaded via some kind of AJAX call. Do you have any ideas on how I can scrape this stream of highlights data? I've never tried to scrape any dynamic content in the past.

Ideas?
======
AznHisoka
PhantomJS/CasperJS is what I've used for my current scraping projects.
PhantomJS is a headless browser and CasperJS is a scripting layer on top of
it, so together they imitate a real browser session. Just specify a
browser-like user agent string such as a Mozilla one, and you're good to go.

------
bartonfink
You _MAY_ be able to do this with Selenium, although I've never used Selenium
to scrape streaming requests. It's going to depend quite a bit on how the page
is structured.

I have used Selenium to scrape dynamic content before by waiting for new DOM
elements to be populated by AJAX, so I know this sort of thing is possible in
a way you couldn't do with wget.
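
For anyone wanting a starting point, here is a rough sketch of that wait-for-the-DOM approach using Selenium's Python bindings. It is only an illustration under assumptions: the CSS selector in the usage note is made up (inspect the page to find the real one), Firefox is an arbitrary browser choice, and `visible_texts` and `scrape_highlights` are names invented for this example.

```python
def visible_texts(elements):
    """Return the stripped, non-empty .text of each element-like object."""
    return [e.text.strip() for e in elements if e.text and e.text.strip()]


def scrape_highlights(url, css_selector, timeout=15):
    """Load url in a real browser and wait for AJAX-inserted elements."""
    # Imported lazily so visible_texts() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        # Block until at least one matching element exists in the DOM,
        # or raise TimeoutException after `timeout` seconds.
        elements = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
        return visible_texts(elements)
    finally:
        driver.quit()
```

Calling something like `scrape_highlights("http://www.nfl.com/scores", "#big-plays li")` would then return the text of each matched element, where `#big-plays li` is a hypothetical selector standing in for whatever the page actually uses.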

------
jfaucett
Yeah, just look at the HTTP traffic the page generates. Here's the call:
[http://www.nfl.com/liveupdate/scores/bigPlayVideos.json?rand...](http://www.nfl.com/liveupdate/scores/bigPlayVideos.json?random=1345242430000)

Just generate your own timestamp for the random parameter.
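
A minimal Python sketch of that idea, in case it helps. It assumes the random parameter is a Unix timestamp in milliseconds (the 13-digit value in the URL above suggests that, but it is a guess), that the endpoint accepts a plain GET, and that it returns JSON. The function names and the User-Agent string are placeholders, and the shape of the returned data is not known here.

```python
import json
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://www.nfl.com/liveupdate/scores/bigPlayVideos.json"


def build_url(now=None):
    """Build the request URL with a millisecond timestamp as `random`."""
    # Assumption: `random` is just a cache-busting epoch time in milliseconds.
    millis = int((time.time() if now is None else now) * 1000)
    return ENDPOINT + "?" + urllib.parse.urlencode({"random": millis})


def fetch_highlights(user_agent="Mozilla/5.0"):
    """Fetch and decode the JSON payload; its structure is not assumed."""
    request = urllib.request.Request(
        build_url(), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Polling `fetch_highlights()` every so often and diffing against the previous result would approximate the "stream", without needing a browser at all.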

------
bonsai
HtmlUnit has very good support for JavaScript.
<http://htmlunit.sourceforge.net/>

