An open index of the web is an interesting idea that I've heard come up a few times now. Putting every page into one of 200 categories would be a pretty interesting undertaking. You'd probs need to fund it by "grants" or something from AI companies though...
One thing that's becoming clear to me is that we often think about this problem from a "how do we solve it so it stays open" POV, but the user experience, and thus the product you are actually making, is unclear. If you made this index, the product would be the data, and so you'd be pushed toward selling that. What is the product an end user like ourselves would pay for that helps? RSS feed readers seem to be the general suggestion so far of what we have today.
I have given up thinking about it much these days -- I guess I'm just pretty jaded about the direction the web went, and most people are happy with it. There are probably better and more practical solutions than what I had in mind -- besides, it had got quite complicated, though I may have just got myself caught up in needless complexity.
Still, I should clarify: the 200 or more definable characteristics weren't aimed all that much at topic, ie it was not purely a tool that could categorise subjects like a library card index.
I was more concerned with outing the rubbish within the web, enabling quicker and more productive searches for the user, and saving the hours of a desktop or laptop being left on while the user conducted an endless search and dealt with the many fruitless and pointless results.
... and on second thought I removed the really long reply that roughly outlined it. I will say characteristics are things like whether the site is a banner (contact, about, low signal) ... whether it's a mirror / pointer type site ... whether the site is just a front end to some other place ... geolocation ... dynamic vs static information ... forum / blog / etc ... a new one comes to mind present day: whether the site is behind a site protection service (some are awful -- I'm so tired of seeing a Ray ID just because I prefer to use an older browser) ... whether the site hosts any files on offer, code or text based, rather than relying entirely on some other site, pointing the user there or to some cloud source which is not theirs ... grouping by what sort of data is on the site: spam, x rated, political, safe for work and around kids, etc.
Again, I've given up on it, and perhaps for the best. I figure it won't be long before there's an LLM-based service that will filter the web for users however they'd like, with results fine-tuned to make the web useful to them again -- not the mere shallow depth that some of the better search switch operators could manage even in the golden times of search in the 00s, but nearly at the level of determining the characteristics of a site and what the user wants to avoid; ie not a mere LLM summing up the site and its information. What a dream it would be to list results for the nearest physical store to blah suburb that sells / has a workshop or operator's manual for a recent Baz 1 ton excavator ... and not waste time sifting through pages of results until the search engine won't display any more -- and in the end resorting to an old phone book to search for stores directly.
There are some interesting players taking another attempt at helping people explore the web by being aggregators -- kagi.com for example. And I hope they make cool things. At the same time, I feel like we've walked that path a few times now and it hasn't produced a very different outcome. I wonder if there's a law of averages that means it will become the same thing no matter the intent. Even if that is wrong, I do worry we are stuck trying to make a faster horse.
Thinking about why I've tended to avoid RSS, there is a consumption vs exploration divide in my mind that I feel RSS readers lean the wrong way on. That is probs an assumption I should challenge in my own thinking.
At some point you have to stop exploring and actually consume what you've collected. I think that's one psychological trick of infinite scroll feeds... our minds naturally want to see what's next. Personally, I spend more time searching and saving things to my various lists instead of actually watching or reading what I've saved.
I'm not really convinced it's gone. It's small. It was small 20+ years ago too, though. I think the problem now is it's hard to find that web, where before it was all there was.
Oh, it's not gone, but as you say it's small and getting smaller. About half of my regular haunts (the ones that still exist, anyway) have withdrawn from the public web in the last couple of years and gone behind login walls in order to protect against crawlers. That's why I suspect this is the direction things are going to go.
How do you find new RSS feeds today? RSS readers are something I've never really loved, but I also haven't tried them in a while. I should probs go look at some RSS feed readers to try.
I just punched in a bunch of random text over itself a few times to see what this is like. It is either a new form of music or torture.
I guess I could make it so each call to an instance of morse adds to a queue, and when one bit of text finishes playing it moves on to the next. That seems better suited to a singleton though, and as of right now, this isn't a singleton... hmmm.
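For what it's worth, one way to get the queue behaviour without forcing the whole thing into a singleton is to chain playback onto a shared promise tail. This is just a sketch under assumptions -- I don't know your actual code, so `play(text)` here stands in for a hypothetical function that returns a promise resolving when the audio finishes:

```javascript
// Minimal sketch of a shared FIFO playback queue. Each enqueue() chains
// onto the tail promise, so texts play strictly one after another even
// when enqueued from separate morse instances sharing this queue.
function createPlaybackQueue(play) {
  let tail = Promise.resolve(); // resolves when the last queued item finishes

  return {
    enqueue(text) {
      // Chain this item after whatever is currently last in the queue.
      tail = tail.then(() => play(text));
      return tail; // lets the caller await "my text has finished playing"
    },
  };
}
```

Each morse instance would then hold a reference to the one shared queue object, rather than the instances themselves needing to be singletons -- the queue carries the "one at a time" guarantee.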
---
and... fixed.