I wonder if there are any startups that specialize in extracting the actual content from bloated web pages, caching it, and serving it via an API.
To do the extraction effectively you might need some kind of hand-crafted scraper, or maybe access to GPT-4-like image understanding. Or you could just extract all of the text and process it with Claude or another system that has a large context window, so it can throw out all of the extraneous text: ads, navbars, random junk that is only slightly related, etc.
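As a rough illustration of the crude text-only route, here is a minimal sketch using only Python's standard-library `html.parser`. It drops anything inside tags that usually hold boilerplate and keeps the rest. The tag list `SKIP_TAGS` and the names `MainTextExtractor` and `extract_main_text` are my own assumptions for illustration; real pages would likely need a smarter heuristic or an LLM pass on top of this.

```python
from html.parser import HTMLParser

# Assumption: these tags usually wrap boilerplate rather than content.
# Real sites vary a lot, so this set would need tuning.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any tag in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.chunks = []     # text fragments that survived filtering

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the remaining text fragments, one per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home | About</nav>"
        "<article><h1>Title</h1><p>Body text.</p></article>"
        "<footer>Ads</footer></body></html>"
    )
    print(extract_main_text(page))
```

This only filters by tag name, so ads or junk sitting in plain `<div>` elements would still get through. That leftover is where a large-context model could do the final cleanup.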
Even just serving cached static content, without any extraction, would be useful for speeding up the process.
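The caching part could start out very simple. Below is a minimal in-memory sketch, assuming a caller-supplied `fetch` function that downloads a URL and a fixed time-to-live; the class name `ContentCache` and its parameters are hypothetical, and a real service would presumably want persistent storage and proper HTTP cache semantics.

```python
import time
from typing import Callable, Dict, Tuple


class ContentCache:
    """In-memory cache that refetches a URL once its entry is older than the TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0):
        self.fetch = fetch              # hypothetical downloader supplied by the caller
        self.ttl_seconds = ttl_seconds  # how long a cached page stays fresh
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, content)

    def get(self, url: str) -> str:
        now = time.monotonic()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit, no network round trip
        content = self.fetch(url)
        self._store[url] = (now, content)
        return content
```

Passing the downloader in as a function keeps the cache independent of any particular HTTP library, and makes it easy to plug an extraction step into `fetch` so that only the cleaned text is stored.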
Does Bing Search or Google Search give you some of the actual content?