Aspider: A lightweight, asynchronous micro-framework based on asyncio (github.com/howie6879)
56 points by howie6879 on Sept 10, 2018 | 10 comments



A "lightweight" "micro-framework" that downloads and runs Chrome in order to make an HTTP request? :/


Hey, at least they aren't bogged down by silly things like controlling which versions of deps get installed. aspider depends on pyppeteer (with no version pinning) to download a copy of Chromium (with no checksum validation).
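
For anyone wondering what the checksum half of that would look like, here is a minimal sketch, not what aspider or pyppeteer actually do. The names `sha256_of` and `verify_download` are hypothetical, and it assumes you already have a trusted SHA-256 for the archive from somewhere. (The pinning half is just an exact `pyppeteer==<version>` line in the requirements file.)

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file does not match the expected digest.

    `expected_sha256` is an assumption: it has to come from a trusted,
    out-of-band source, not from the same server as the download.
    """
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

You would call `verify_download(archive_path, known_digest)` right after the download finishes and before unpacking or executing anything.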


It's for scraping. As far as I know, the best way to do that these days is with headless Chrome, probably with some custom JS injected. What this provides over just using puppeteer or pyppeteer directly, I'm not sure...


Is there a better way to scrape nowadays? There's so much JavaScript that needs to be loaded and run to make a site usable. Scrapy doesn't really do enough anymore.


Framework feels like an odd choice of words. I would call it an async web spider library or async web scraping library.

When I hear "framework", I think of client interaction, whether via a CLI, an API, or a web page.


Hi lilbobbytables: Thanks for your suggestion. I have modified it.


I think Puppeteer also downloads a recent version of Chrome/Chromium that is guaranteed to work with the API. Why is this out of the ordinary for a headless web scraping framework?


Hi akdor1154: This download is only triggered if you crawl a page that is rendered by JavaScript.


Title should mention that it's a web scraping framework.


I am so sorry. You are right; it is my mistake.



