I have given up thinking about it much these days. I guess I'm just pretty jaded with the direction the web went, and most people seem happy with it. There are probably better and more practical solutions than what I had in mind. Besides, it had got quite complicated, though I may have just got myself caught up in needless complexity.
Still, I should clarify: the 200 or more definable characteristics weren't aimed all that much at the topic, i.e. it wasn't purely a tool that could categorise topics like a library card index.
I was more concerned with outing the rubbish within the web, making searches quicker and more productive for the user, and saving the hours a desktop or laptop stays on while the user conducts an endless search and deals with the many fruitless and pointless results.
... and on second thought I removed the really long reply that roughly outlined it. I will say characteristics are things like whether the site is a banner (contact, about, low signal) ... whether it's a mirror / pointer type site ... whether the site is just a front end to some other place ... geolocation ... dynamic vs static information ... forum / blog / etc ... a new one that comes to mind present day, whether the site is behind a site protection service (some are awful - I'm so tired of seeing a ray id just because I prefer to use an older browser) ... whether the site itself hosts the files on offer, code or text based, rather than relying entirely on some other site or some cloud source which is not theirs and pointing the user there ... grouping by what sort of data is in the site: spam, x rated, political, safe for work and around kids, etc.
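To make the idea a bit more concrete, here is a rough sketch in Python of how a handful of those characteristics could be stored as flags and filtered on. Everything in it is hypothetical and only for illustration: the `Trait` names, the `Site` record, the `filter_sites` helper and the example.org URLs are not from the original design, and it assumes something else has already worked out which characteristics apply to each site, which is the hard part.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Trait(Flag):
    """Hypothetical subset of site characteristics (the real list was 200+)."""
    BANNER = auto()           # contact / about only, low signal
    MIRROR = auto()           # mirror / pointer type site
    FRONT_END = auto()        # just a front end to some other place
    DYNAMIC = auto()          # dynamic rather than static information
    FORUM = auto()
    BLOG = auto()
    PROTECTED = auto()        # behind a site protection service
    HOSTS_OWN_FILES = auto()  # serves its own files instead of pointing elsewhere
    SPAM = auto()
    ADULT = auto()
    POLITICAL = auto()


@dataclass
class Site:
    url: str
    traits: Trait = Trait(0)  # Trait(0) means "no characteristics set"


def filter_sites(sites, require=Trait(0), avoid=Trait(0)):
    """Keep sites that have every required trait and none of the avoided ones."""
    return [
        s for s in sites
        if (s.traits & require) == require and not (s.traits & avoid)
    ]


# Made-up example data; the URLs are placeholders.
sites = [
    Site("https://example.org/manuals", Trait.HOSTS_OWN_FILES | Trait.FORUM),
    Site("https://example.org/mirror", Trait.MIRROR | Trait.SPAM),
    Site("https://example.org/walled", Trait.HOSTS_OWN_FILES | Trait.PROTECTED),
]

wanted = filter_sites(
    sites,
    require=Trait.HOSTS_OWN_FILES,
    avoid=Trait.SPAM | Trait.MIRROR | Trait.PROTECTED,
)
print([s.url for s in wanted])
```

Bit flags are only one way to do it. They keep "has all of these, none of those" checks cheap, but something like geolocation is a value rather than a yes/no and would need its own field.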
Again, I've given up on it, and perhaps for the best. I figure it won't be long before there is an LLM based service which will filter the web for users however they'd like, with results fine tuned to make the web useful to them again. Not at the mere shallow depth that some of the better search switches and operators could manage even in the golden times of search in the 00s, but nearly at the level of determining the characteristics of the site and what the user wants to avoid, i.e. not a mere LLM summing up the site and its information. What a dream it would be to list results for the nearest physical store to blah suburb that sells / has a workshop / has an operator's manual for a recent Baz 1 ton excavator ... and not waste time sifting through pages of results until the search engine won't display any more - and in the end resorting to an old phone book to search for stores directly.