
I think heavy reliance on human language (and its ambiguity) is one of the main problems.

Maybe a personal whitelist/blacklist for domains and authors could improve things. Sort of a "web of trust", but done properly.

Not completely without search engines, but for example, if every website were responsible for maintaining its own index, we could effectively run our own search engines after initialising a "base" list of trusted websites. Let's say I'm new to this "new internet", and I ask around for good websites on the topics I'm interested in. My friend tells me wikipedia is good for general information, webmd for health queries, stackoverflow for programming questions, and so on. I add wikipedia.org/searchindex, webmd.com/searchindex and stackoverflow.com/searchindex to my personal search engine instance, and every time I search something, these three are queried. This could be improved with a local cache, synonyms, etc. As you carry on using it, you expand your "library". Of course it would increase the workload of individual sites, but it has the potential to give that web 1.0 feel once again.
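Very roughly, the client side of that could look like the sketch below. The /searchindex endpoints and the JSON shape of their responses are purely hypothetical here; no such standard exists, this just illustrates querying a personal list of trusted sites and merging whatever they return:

    # Hypothetical personal search client: query each trusted site's own index
    # and merge the results. The /searchindex endpoints and the response shape
    # ([{"title": ..., "url": ..., "score": ...}, ...]) are assumptions.
    import json
    import urllib.parse
    import urllib.request

    TRUSTED_SITES = [
        "https://wikipedia.org/searchindex",
        "https://webmd.com/searchindex",
        "https://stackoverflow.com/searchindex",
    ]

    def search(query, sites=TRUSTED_SITES):
        results = []
        for endpoint in sites:
            url = endpoint + "?q=" + urllib.parse.quote(query)
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    results.extend(json.loads(resp.read()))
            except OSError:
                continue  # skip sites that are down
        # Naive merge: trust each site's self-reported score for now
        return sorted(results, key=lambda r: r.get("score", 0), reverse=True)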



This was devised by Amazon in 2005. They called it OpenSearch (http://www.opensearch.org/). Basically, it was a standard way to expose your own site's search engine. It made it easy to programmatically search a bunch of individual sites.
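For what it's worth, the client side is simple: a site publishes an XML description document containing a URL template with a {searchTerms} placeholder, and the client substitutes the query into it. A rough sketch (the description URL below is made up, and the spec details are from memory):

    # Fetch an OpenSearch description document and build a query URL from its
    # template. The description URL is hypothetical; the namespace and the
    # {searchTerms} template mechanism are per the OpenSearch 1.1 spec.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OS_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

    def build_search_url(description_url, query):
        with urllib.request.urlopen(description_url, timeout=5) as resp:
            doc = ET.fromstring(resp.read())
        # e.g. <Url type="text/html"
        #          template="https://example.com/search?q={searchTerms}"/>
        for url_el in doc.findall(OS_NS + "Url"):
            template = url_el.get("template", "")
            if "{searchTerms}" in template:
                return template.replace("{searchTerms}", urllib.parse.quote(query))
        raise ValueError("no usable Url template found")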


This would be ludicrously easy to game. Crowdsourcing would also be ludicrously easy to game.

The problem isn't solvable without a good AI content scraper.

The scraper/indexer either has to be centralised - an international resource run independently of countries, corporations, and paid interest groups - or it has to be an impossible-to-game distributed resource.

The former is hugely challenging politically, because the org would effectively have editorial control over online content, and there would be huge fights over neutrality and censorship.

(This is more or less where we are now with Google. Ironically, given the cognitive distortions built into corporate capitalism, users today are more likely to trust a giant corporation with an agenda than a not-for-profit trying to run independently and operate as objectively as possible.)

Distributed content analysis and indexing - let's call it a kind of auto-DNS-for-content - is even harder, because you have to create an un-hackable un-gameable network protocol to handle it.

If it isn't un-gameable, it becomes a battle of cycles, with parties that have access to more cycles able to out-index those with fewer - which would be another way to editorialise and control the results.

Short answer - yes, it's possible, but probably not with current technology, and certainly not with current politics.


Just want to point out that you're on a site that successfully uses crowdsourcing combined with moderation to curate a list of websites, news, and articles that people find interesting and valuable. Why not a new internet built around communities like this, where the users actively participate in finding, ranking, and moderating the content they consume? It's not a stretch to add a decent search index and categories to a news aggregator; most do it already. If these tools could be built into the structure of the web, we'd be halfway there.


Edit: I had myself convinced that comments have a different ID space from submissions, but that obviously isn't true. I've partly rewritten to correct for an over-guess on how many new submissions there are each day.

I agree with your general suggestion, but just want to highlight that scale issues still make me think whatever finds traction on HN is a bit of a crapshoot.

It looks like there were over 10k posts (including comments) in the last day, and the list of submissions that spent time on the front page yesterday has 84 posts. I don't know how normal the last 2 days were, but by eyeball I'd guess around a quarter of the posts are comments on the day's front-page posts. This means there are probably a few thousand submissions that didn't get much, if any, traction.

Any time I look at the "New" page, I still end up finding several items that sound interesting enough to open. I see more than 10 that I'm tempted to click on right now. The current new page stretches back about 40 minutes, and only 10 of the 30 have more than 1 point (and only 1 has more than 10). Only 2 of the links I was tempted to click on have more than 1 point.

I suspect that there's vastly more interesting stuff posted to HN than its current dynamics are capable of identifying and signal-boosting. That's not bad, per se. It'd be an even worse time-sink if it were better at this task. But it does mean there are pitfalls in using it as a model at an even larger scale and in other contexts.


The user's search engine doesn't have to trust suggestions verbatim; it can always run its own heuristics on top of the returned results. And the user could reduce the weight of especially uncooperative domains, or blacklist them altogether.
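A sketch of what that re-weighting might look like, assuming results are dicts with "url" and "score" keys (the weights, domains, and result format are all made up for illustration):

    # Re-rank merged results with a personal per-domain trust weight and a
    # blacklist. Result dicts, weights, and domains are illustrative only.
    from urllib.parse import urlparse

    DOMAIN_WEIGHTS = {"wikipedia.org": 1.0, "webmd.com": 0.8}
    BLACKLIST = {"contentfarm.example"}
    DEFAULT_WEIGHT = 0.5  # unknown domains get a middling weight

    def rerank(results):
        weighted = []
        for r in results:
            domain = urlparse(r["url"]).netloc.removeprefix("www.")
            if domain in BLACKLIST:
                continue
            weight = DOMAIN_WEIGHTS.get(domain, DEFAULT_WEIGHT)
            weighted.append((r.get("score", 0) * weight, r))
        weighted.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in weighted]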



