I've previously written a very similar project called "graphql-scraper" (which is arguably a far less cool name...). You can check it out at http://github.com/lachenmayer/graphql-scraper
It works very similarly, with only superficial differences under the hood (e.g. I used jsdom, and this uses cheerio). The `waitForSelector` feature is very cool!
I remember seeing GDOM a while back when I first started this project, but forgot to write it down as a source of inspiration. I'm gonna add all of these as alternatives, because they're all great :D
Are you planning to build anything on top of this, like a service or a company? I was thinking it would be a good way to build an API for some projects I've been thinking of working on, although I would probably want to switch out cheerio for https://github.com/intoli/remote-browser/
I've been looking for something like this! I'm trying to play around with it but can't seem to get the selector right. How do I grab a table `td` by its nth selector (tried `td:nth-of-type(n)` to no avail)?
Great project! I can imagine this may greatly improve certain classes of web scraping. @gavino I'm curious what tooling and architecture you used to put this together?
Sure! The backend is actually pretty straightforward: it's a NextJS app deployed on Now with a few added endpoints to handle the incoming GraphQL queries.
Then, for actually turning the query into digestible output, I used the GraphQL schema builder, which accepts HTML nodes from the requested page and grabs the right variables.
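To make the "query in, HTML nodes in, structured output out" idea concrete, here's a rough, dependency-free sketch. It is not the project's actual code: `HtmlNode`, `Selection`, and `resolve` are hypothetical names made up for illustration, the node tree stands in for what cheerio would parse, and matching is by tag name only rather than full CSS selectors.

```typescript
// Hypothetical stand-in for a parsed HTML node (the real project uses cheerio).
interface HtmlNode {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: HtmlNode[];
}

// Hypothetical, simplified query shape: read text, read an attribute,
// or select descendant tags and resolve sub-fields on each match.
type Selection =
  | { kind: "text" }
  | { kind: "attr"; name: string }
  | { kind: "query"; tag: string; fields: Record<string, Selection> };

// Collect all descendants with the given tag name, in document order.
function descendants(node: HtmlNode, tag: string): HtmlNode[] {
  const out: HtmlNode[] = [];
  for (const child of node.children) {
    if (child.tag === tag) out.push(child);
    out.push(...descendants(child, tag));
  }
  return out;
}

// Concatenate a node's own text with the text of all its descendants.
function fullText(node: HtmlNode): string {
  return node.text + node.children.map(fullText).join("");
}

// Resolve a selection against a node, the way a field resolver
// would receive a parent node and return the requested value.
function resolve(node: HtmlNode, sel: Selection): unknown {
  switch (sel.kind) {
    case "text":
      return fullText(node);
    case "attr":
      return sel.name in node.attrs ? node.attrs[sel.name] : null;
    case "query":
      return descendants(node, sel.tag).map((match) => {
        const result: Record<string, unknown> = {};
        for (const name of Object.keys(sel.fields)) {
          result[name] = resolve(match, sel.fields[name]);
        }
        return result;
      });
  }
}

// A tiny made-up "page" with two links.
const page: HtmlNode = {
  tag: "body", attrs: {}, text: "", children: [
    { tag: "a", attrs: { href: "/1" }, text: "First", children: [] },
    { tag: "a", attrs: { href: "/2" }, text: "Second", children: [] },
  ],
};

// Roughly: "for every <a>, give me its text as `title` and its href as `url`".
const links = resolve(page, {
  kind: "query",
  tag: "a",
  fields: { title: { kind: "text" }, url: { kind: "attr", name: "href" } },
});
```

The point is just that each field gets the parent node and returns either a value or more nodes to recurse into, which is the same shape a GraphQL resolver has.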
You can see a live demo of the HN example using graphql-scraper at https://graphqlbin.com/v2/lxNohP
This example is deployed on Glitch - you can easily spin up your own using https://github.com/lachenmayer/graphql-scraper-server (with 1-click deploys to Heroku, Now & Glitch)
Of course (as mentioned already) there is also https://github.com/syrusakbary/gdom which uses Python+Graphene.