
I know the feeling well.

I spent a few years on it, on and off, a few days at a time as a distraction, starting in the mid-00s: the design phase of an alternative area that embraced the notion of the world wide web, a web for all. Even back then I was fed up with the endless, near-pointless search results being offered up more and more. The idea, in its simplest form, was a limited tool that would help a website correctly tag each content page with values representing 200 or more definable characteristics the page could fall into; a global index could then faithfully use those values to better align search results. Unlike other SEO ideas, the tags were part of a centralised service, and a very small token amount was paid to validate each page.

One thing I considered important was that after a given amount of time, each content page would be migrated from dynamic to static values for long-term preservation; deletion was only a last resort for special circumstances. That was aimed at seeing off content farms that post, chase attention, and then abandon the old stuff after a couple of months. (But times have changed; it's been a long while since I ran into a content farm.)
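As a rough sketch only (the field names here are my assumption, not the actual design), a record in that central index might have looked something like this:

    from dataclasses import dataclass, field

    # Hypothetical entry for one content page in the central index.
    # Field names are illustrative; the real scheme allowed for 200+ characteristics.
    @dataclass
    class PageRecord:
        url: str
        characteristics: dict = field(default_factory=dict)  # characteristic name -> value
        validated: bool = False   # set once the small token payment clears
        frozen: bool = False      # flipped after the grace period: dynamic -> static, preserved long term

    page = PageRecord(url="https://example.org/article")
    page.characteristics.update({"is_banner_site": 0, "hosts_own_files": 1, "geo_restricted": 0})
    page.validated = True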

It never got even partway planned. I realised it was better as a separate area from the regular web, and at some point I realised that most people would not like having to take extra steps. Competition to be a good search engine is also very low: the money is in providing a higher ratio of spam to useful links, more clicks equals more money, and website owners would probably rather have Google or another large, well-used search engine driving their ad revenue.

Presently the answer may lie in an LLM-based search helper: even if it only has access to Google's garbage, it could run down the first few hundred or more results, group them into useful summaries, and return the handful of links I would call truly WWW. Eventually it might inspire web designers to code more robustly, or to stop using external scripts that are too intrusive.

e.g. You have searched for x y z ...

There are 8 results where the website has expired - ignored

There are 329 results where the website admin has not included a no-JavaScript option - ignored

There are 45 results where the website will not render as the admin has included an element that is specific to certain software - ignored

There are 83 results where the website admin has not included anything more relevant than a large 'about' page - ignored

There are 57 results where the website admin has included SEO tags but the site content mostly mismatches them, or the site is no more functional than a simple banner - ignored

There are 112 results where the website admin blocked older browsers as they might break their fragile site - ignored

There are 94 results where the website admin has set geolocation restrictions - ignored

There are 629 results where the site's external scripts were found to scrape, or attempt to scrape, personal information not relevant to the site - ignored

Top 23 results: ...
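A minimal sketch of how that filtering pass could work, assuming a hypothetical classify() helper that returns an exclusion reason (or None) for each raw result - all of the names here are made up for illustration:

    from collections import Counter

    def filter_results(results, classify):
        """Bucket raw search results by exclusion reason; keep only the rest."""
        ignored = Counter()
        kept = []
        for r in results:
            reason = classify(r)  # assumed helper: e.g. "the website has expired", or None
            if reason:
                ignored[reason] += 1
            else:
                kept.append(r)
        for reason, count in ignored.items():
            print(f"There are {count} results where {reason} - ignored")
        return kept

    # Example with a trivial stand-in classifier:
    results = ["https://expired.example", "https://good.example"]
    top = filter_results(results, lambda r: "the website has expired" if "expired" in r else None)
    print("Top results:", top)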




An open index of the web is an interesting idea that I've heard come up a few times now. Putting every page into one of 200 categories would be a pretty interesting undertaking. You'd probably need to fund it by "grants" or something from AI companies though...

One thing that is becoming clear to me is that we often think about this problem from a "how do we solve it so it stays open" POV, but what the user experience, and thus the product you are making, would be is unclear. If you made this index, the product would be the data, so you'd be pushed towards selling that. What is the product an end user like us would pay for that actually helps? RSS feed readers seem to be the general suggestion so far for what we have today.


I have given up thinking about it much these days; I guess I'm just pretty jaded with the direction the web went, and most people are happy with it. There are probably better and more practical solutions than what I had in mind. Besides, it had got quite complicated, though I may have just got myself caught up in needless complexity.

Still, I should clarify: the 200 or more definable characteristics weren't aimed all that much at the topic, i.e. it wasn't purely a tool for categorising topics like a library card index.

I was more concerned with outing the rubbish within the web, making searches quicker and more productive for the user, and saving the hours a desktop or laptop stays on while the user conducts an endless search and deals with the many fruitless, pointless results.

... and on second thought I removed the really long reply that roughly outlined it. I will say the characteristics are things like: whether the site is a banner (contact, about, low signal) ... whether it's a mirror / pointer type site ... whether the site is just a front end to some other place ... geolocation ... dynamic vs static information ... forum / blog / etc ... a new one that comes to mind these days, whether the site is behind a site protection service (some are awful - I'm so tired of seeing a ray id just because I prefer to use an older browser) ... whether the site hosts any files on offer, code or text based, rather than relying entirely on some other site or some cloud source that isn't theirs and pointing the user there ... a grouping of what sort of data is on the site: spam, x-rated, political, safe for work and around kids, etc.
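For illustration only, here is one way flags like those could be represented; the names are mine, not part of the original outline:

    from enum import Flag, auto

    # A handful of the characteristic flags described above (illustrative names).
    class SiteCharacteristic(Flag):
        BANNER_ONLY = auto()         # contact / about pages only, low signal
        MIRROR_OR_POINTER = auto()   # mirrors or merely points to another site
        FRONTEND_ONLY = auto()       # just a front end to some other place
        GEO_RESTRICTED = auto()
        DYNAMIC_CONTENT = auto()     # as opposed to static information
        FORUM_OR_BLOG = auto()
        BEHIND_PROTECTION = auto()   # e.g. challenge pages that block older browsers
        HOSTS_OWN_FILES = auto()     # serves its own code/text rather than linking out
        NOT_SAFE_FOR_WORK = auto()   # content grouping: spam, x-rated, political, etc.

    profile = SiteCharacteristic.FORUM_OR_BLOG | SiteCharacteristic.HOSTS_OWN_FILES
    print(SiteCharacteristic.HOSTS_OWN_FILES in profile)  # True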

Again, I've given up on it, and perhaps for the best. I figure it won't be long before there is an LLM-based service that filters the web for users however they'd like, with results fine-tuned to make the web useful to them again - not the mere shallow depth that some of the better search operators could manage even in the golden times of search in the 00s, but nearly at the level of determining the characteristics of a site and what the user wants to avoid; i.e. not merely an LLM summing up the site and its information. What a dream it would be to list results for the nearest physical store to blah suburb that sells, or has a workshop or operator's manual for, a recent Baz 1 ton excavator ... and not waste time sifting through pages of results until the search engine won't display any more, in the end resorting to an old phone book to search for stores directly.



