
I know the feeling well.

I spent a few years on it, on and off, a few days at a time as a distraction, starting in the mid-00s: the design phase of an alternative area that embraced the notion of the world wide web, a web for all. Even back then I was fed up with the endless, near-pointless search results being offered up more and more. The idea, in its simplest form, was a limited tool that would help a website correctly tag each content page with values representing 200 or more definable characteristics the page could fall into; a global index could then faithfully use those values to better align search results. Unlike other SEO ideas, the tags were part of a centralised service, and a very small token amount was paid to validate each page.

One thing I considered important was that after a given amount of time, each content page would be migrated from dynamic to static values for long-term preservation; deletion was only a last resort for special circumstances. That was aimed at seeing off content farms that post, chase attention, and then abandon the old stuff after a couple of months. (But times have changed; it's been a long while since I ran into a content farm.)
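As a rough sketch only (the field names here are my assumption, not the actual design), a record in that central index might have looked something like this:

    from dataclasses import dataclass, field

    # Hypothetical entry for one content page in the central index.
    # Field names are illustrative; the real scheme allowed for 200+ characteristics.
    @dataclass
    class PageRecord:
        url: str
        characteristics: dict = field(default_factory=dict)  # characteristic name -> value
        validated: bool = False   # set once the small token payment clears
        frozen: bool = False      # flipped after the grace period: dynamic -> static, preserved long term

    page = PageRecord(url="https://example.org/article")
    page.characteristics.update({"is_banner_site": 0, "hosts_own_files": 1, "geo_restricted": 0})
    page.validated = True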

It never got even partway planned. I realised it was better as a separate area from the regular web, and at some point I realised that most people would not like having to take extra steps. Competition to be a good search engine is also very low: the money is in providing a higher ratio of spam to useful links, more clicks equals more money, and website owners would probably rather have Google or another large, well-used search engine driving their ad revenue.

Presently the answer may lie in an LLM-based search helper: even if it only has access to Google's garbage, it could run down the first few hundred or more results, group them into useful summaries, and return the handful of links I would call truly WWW. Eventually it might inspire web designers to code more robustly, or to stop using external scripts that are too intrusive.

e.g. You have searched for x y z ...

There are 8 results where the website has expired - ignored

There are 329 results where the website admin has not included a no-JavaScript option - ignored

There are 45 results where the website will not render as the admin has included an element that is specific to certain software - ignored

There are 83 results where the website admin has not included anything more relevant than a large 'about' page - ignored

There are 57 results where the website admin has included SEO tags but the site content mostly mismatches them, or the site is no more functional than a simple banner - ignored

There are 112 results where the website admin blocked older browsers as they might break their fragile site - ignored

There are 94 results where the website admin has set geolocation restrictions - ignored

There are 629 results where the site's external scripts were found to scrape, or attempt to scrape, personal information not relevant to the site - ignored

Top 23 results: ...
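A minimal sketch of how that filtering pass could work, assuming a hypothetical classify() helper that returns an exclusion reason (or None) for each raw result - all of the names here are made up for illustration:

    from collections import Counter

    def filter_results(results, classify):
        """Bucket raw search results by exclusion reason; keep only the rest."""
        ignored = Counter()
        kept = []
        for r in results:
            reason = classify(r)  # assumed helper: e.g. "the website has expired", or None
            if reason:
                ignored[reason] += 1
            else:
                kept.append(r)
        for reason, count in ignored.items():
            print(f"There are {count} results where {reason} - ignored")
        return kept

    # Example with a trivial stand-in classifier:
    results = ["https://expired.example", "https://good.example"]
    top = filter_results(results, lambda r: "the website has expired" if "expired" in r else None)
    print("Top results:", top)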




An open index of the web is an interesting idea that I've heard come up a few times now. Putting every page into one of 200 categories would be a pretty interesting undertaking. You'd probably need to fund it by "grants" or something from AI companies though...

One thing that is becoming clear to me is that we often think about this problem from a "how do we solve it so it stays open" POV, but what the user experience, and thus the product you are making, would be is unclear. If you made this index, the product would be the data, so you'd be pushed towards selling that. What is the product an end user like us would pay for that actually helps? RSS feed readers seem to be the general suggestion so far for what we have today.


I have given up thinking about it much these days; I guess I'm just pretty jaded with the direction the web went, and most people are happy with it. There are probably better and more practical solutions than what I had in mind. Besides, it had got quite complicated, though I may have just got myself caught up in needless complexity.

Still, I should clarify: the 200 or more definable characteristics weren't aimed all that much at the topic, i.e. it wasn't purely a tool for categorising topics like a library card index.

I was more concerned with outing the rubbish within the web, making searches quicker and more productive for the user, and saving the hours a desktop or laptop stays on while the user conducts an endless search and deals with the many fruitless, pointless results.

... and on second thought I removed the really long reply that roughly outlined it. I will say the characteristics are things like: whether the site is a banner (contact, about, low signal) ... whether it's a mirror / pointer type site ... whether the site is just a front end to some other place ... geolocation ... dynamic vs static information ... forum / blog / etc ... a new one that comes to mind these days, whether the site is behind a site protection service (some are awful - I'm so tired of seeing a ray id just because I prefer to use an older browser) ... whether the site hosts any files on offer, code or text based, rather than relying entirely on some other site or some cloud source that isn't theirs and pointing the user there ... a grouping of what sort of data is on the site: spam, x-rated, political, safe for work and around kids, etc.
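For illustration only, here is one way flags like those could be represented; the names are mine, not part of the original outline:

    from enum import Flag, auto

    # A handful of the characteristic flags described above (illustrative names).
    class SiteCharacteristic(Flag):
        BANNER_ONLY = auto()         # contact / about pages only, low signal
        MIRROR_OR_POINTER = auto()   # mirrors or merely points to another site
        FRONTEND_ONLY = auto()       # just a front end to some other place
        GEO_RESTRICTED = auto()
        DYNAMIC_CONTENT = auto()     # as opposed to static information
        FORUM_OR_BLOG = auto()
        BEHIND_PROTECTION = auto()   # e.g. challenge pages that block older browsers
        HOSTS_OWN_FILES = auto()     # serves its own code/text rather than linking out
        NOT_SAFE_FOR_WORK = auto()   # content grouping: spam, x-rated, political, etc.

    profile = SiteCharacteristic.FORUM_OR_BLOG | SiteCharacteristic.HOSTS_OWN_FILES
    print(SiteCharacteristic.HOSTS_OWN_FILES in profile)  # True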

Again, I've given up on it, and perhaps for the best. I figure it won't be long before there is an LLM-based service that filters the web for users however they'd like, with results fine-tuned to make the web useful to them again - not the mere shallow depth that some of the better search operators could manage even in the golden times of search in the 00s, but nearly at the level of determining the characteristics of a site and what the user wants to avoid; i.e. not merely an LLM summing up the site and its information. What a dream it would be to list results for the nearest physical store to blah suburb that sells, or has a workshop or operator's manual for, a recent Baz 1 ton excavator ... and not waste time sifting through pages of results until the search engine won't display any more, in the end resorting to an old phone book to search for stores directly.



