Listly.io is a personal project I built a few days ago.
I would like to hear whether it is useful for you... or not.
Listly.io turns HTML into Excel in seconds without coding. It finds the pattern of repeated structure and extracts all of the image links and text. It does not look for specific tags (table, ul, ...), but for the structure.
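To make "repeated structure, not tags" concrete, here is a minimal sketch of one way such detection could work. It is not Listly.io's actual algorithm, which is not described here. The names `signature` and `find_repeated_rows` are made up for illustration, and the sketch assumes the input is well-formed markup that `xml.etree.ElementTree` can parse, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def signature(el):
    """Tag name plus the signatures of the children; text and attributes are ignored."""
    return el.tag + "(" + ",".join(signature(c) for c in el) + ")"


def find_repeated_rows(markup):
    """Return one row per element of the largest group of same-shaped siblings.

    Hypothetical helper: assumes `markup` is well-formed (XHTML-like).
    """
    root = ET.fromstring(markup)
    best = []
    for parent in root.iter():
        counts = Counter(signature(child) for child in parent)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        # Only count a shape as "repeated" if it occurs at least twice.
        if n >= 2 and n > len(best):
            best = [child for child in parent if signature(child) == sig]

    rows = []
    for el in best:
        texts = [t.strip() for t in el.itertext() if t.strip()]
        images = [img.get("src") for img in el.iter("img")]
        rows.append(texts + images)
    return rows


if __name__ == "__main__":
    sample = (
        "<div><p>Header</p>"
        '<div><img src="a.png"/><span>Apple</span></div>'
        '<div><img src="b.png"/><span>Pear</span></div>'
        '<div><img src="c.png"/><span>Fig</span></div>'
        "</div>"
    )
    for row in find_repeated_rows(sample):
        print(row)
```

The rows come back as plain lists, so they could be written out with the standard `csv` module and opened in Excel. Note that no tag name is special-cased: the three inner `div` elements are picked only because they share the same shape.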
For developers, I think an API would be the best way to plug this extractor into another scraper or your own scraper.
https://hastebin.com/eguluvoquq.html
Actually, any (partial or full) HTML source code is accepted: <div></div>, <p></p>, <span></span>, <html></html>, etc.
Following your advice, I changed the placeholder description to "any HTML Source Code".
Secondly, my server returns a 500 error only when there is nothing to extract, as with your code. I will fix it soon.
Thank you.
https://support.google.com/docs/answer/3093339?hl=en
I use it all the time for better sorting, filtering, etc.
Compared to it, listly.io works well with all types of tags as long as there are repeated structures.
In my experiments, it works well with hundreds of kinds of websites.
e.g. Google/Bing search results, Amazon/Walmart/eBay product lists, Twitter/Facebook/Tumblr posts, Twitch lists, Bloomberg finance info, forum threads, Instagram comments, etc.
In addition, it also works in seconds with the Craigslist apts / housing page. http://seoul.craigslist.co.kr/search/apa
Sorry for being slow. This is a personal project, and I did not anticipate so many new visitors. I need to scale the server up and out.
https://www.import.io/
I tried writing a script to do the same thing before - turns out finding the element on the page with the most children and assuming each child is an entry works surprisingly often.
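For anyone curious, the "most children" heuristic can be sketched roughly like this with only the Python standard library. This is an illustration of the idea, not the original script; `Node`, `TreeBuilder`, and `extract_entries` are names invented for the example, and the tree building is deliberately naive (it will not cope with badly nested markup).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [i for i in self.items if isinstance(i, Node)]


class TreeBuilder(HTMLParser):
    """Builds a very simple element tree from the parser events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.items.append(data.strip())


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def text_of(node):
    parts = [text_of(i) if isinstance(i, Node) else i for i in node.items]
    return " ".join(p for p in parts if p)


def extract_entries(html):
    """Pick the element with the most direct children; treat each child as an entry."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    container = max(walk(builder.root), key=lambda n: len(n.children))
    return [text_of(child) for child in container.children]


if __name__ == "__main__":
    page = ("<div><h1>Shop</h1><ul>"
            "<li><b>Apple</b> 1.20</li>"
            "<li><b>Pear</b> 0.80</li>"
            "<li><b>Fig</b> 2.50</li>"
            "</ul></div>")
    print(extract_entries(page))
```

On pages where a navigation bar or footer has more children than the actual listing, this picks the wrong container, so it is a starting point rather than a robust extractor.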
Import.io needs the user to click on the page to determine what to extract, so the user has to repeat that whenever the web page changes.
Listly.io only needs a URL or the HTML code, so it keeps working even if the web page changes.
Now if you had this generate a GraphQL spec file, you could run a GraphQL server acting as a proxy to lots of websites.
That would be interesting. Not sure how that fares with the websites' owners' ToS though.
https://app.parseur.com
http://webscraper.io/
Their marketing is poor, but the product is very powerful.
