Why GPT-based, then? There are libraries that already do this: you give them examples, they generate the rules for you, and you get back a scraper object that takes any HTML and returns the scraped data.
Great projects, thank you for the links.
On a brief scan, neither covers paging/loops, or JS frameworks where one would need to use a headless browser and wait for content to load. That is where a low/lazy-code solution might provide the most added value.
Mine: https://github.com/lorey/mlscraper Another: https://github.com/alirezamika/autoscraper
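For anyone wondering what "give examples, get rules" looks like in practice, here is a minimal stdlib-only sketch of the idea. It is not how mlscraper or autoscraper are actually implemented; the names `ExampleScraper`, `learn` and `extract` are made up for illustration, and the "rule" is assumed to be nothing more than the tag/class path leading to the example text.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _TextPaths(HTMLParser):
    """Collect (path, text) pairs; path is the chain of open tags above the text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Unwind to the nearest matching open tag; tolerates sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.found.append((tuple(self.stack), text))


def text_paths(html):
    parser = _TextPaths()
    parser.feed(html)
    parser.close()
    return parser.found


class ExampleScraper:
    """Hypothetical toy scraper: learns tag paths from example values."""

    def __init__(self):
        self.rules = set()

    def learn(self, html, wanted):
        # A "rule" is simply the path of every text node matching an example.
        for path, text in text_paths(html):
            if text in wanted:
                self.rules.add(path)
        return self

    def extract(self, html):
        # Apply the learned paths to a page the scraper has not seen before.
        return [text for path, text in text_paths(html) if path in self.rules]


train_page = (
    '<html><body><h1>Shop</h1><ul>'
    '<li class="item"><span class="name">Apple</span><span class="price">$1</span></li>'
    '</ul></body></html>'
)
new_page = (
    '<html><body><h1>Other shop</h1><ul>'
    '<li class="item"><span class="name">Pear</span><span class="price">$3</span></li>'
    '<li class="item"><span class="name">Plum</span><span class="price">$4</span></li>'
    '</ul></body></html>'
)

scraper = ExampleScraper().learn(train_page, ["Apple"])
print(scraper.extract(new_page))  # ['Pear', 'Plum']
```

Note that this sketch only works on HTML you already have in hand, which is the same limitation raised above: paging and JS-rendered content would need a fetch loop or a headless browser in front of it.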