Ask HN: Can we create a new internet where search engines are irrelevant?

adrianmonk · on June 26, 2019

I think it would be helpful to remember to distinguish two separate search engine concepts here: indexing and ranking.

Indexing isn't the source of problems. You can index in an objective manner. A new architecture for the web doesn't need to eliminate indexing.

Ranking is where it gets controversial. When you rank, you pick winners and losers. Hopefully based on some useful metric, but the devil is in the details on that.

The thing is, I don't think you can eliminate ranking. Whatever kind of site(s) you're seeking, you are starting with some information that identifies the set of sites that might be what you're looking for. That set might contain 10,000 sites, so you need a way to push the "best" ones to the top of the list.

Even if you go with a different model than keywords, you still need ranking. Suppose you create a browsable hierarchy of categories instead. Within each category, there are still going to be multiple sites.

So it seems to me the key issue isn't ranking and indexing, it's who controls the ranking and how it's defined. Any improved system is going to need an answer for how to do it.

jtolmar · on June 26, 2019

Some thoughts on the problem, not intended as a complete proposal or argument:

* Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.

* How fast a ranking algorithm is depends on how the indexing is done. Is there some common set of features we could agree on that we'd want to build the shared index on? Any ranking that wants something not in the public index would need either a private index or a slow sequential crawl. Sometimes you could do a rough search using the public index and then re-rank by crawling the top N, so maybe the public index just needs to be good enough that some ranker can get the best result within the top 1000.

* Maybe the indexing servers execute the ranking algorithm? (An equation or SQL-like thing, not something written in a Turing Complete language). Then they might be able to examine the query to figure out where else in the network to look, or where to give up because the score will be too low.

* Maybe the way things are organized and indexed is influenced by the ranking algorithms used. If indexing servers are constantly receiving queries that split a certain way, they can cache / index / shard on that. This might make deciding what goes into a shared index easier.
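
A rough sketch of the "rough search on the public index, then re-rank the top N" idea from the bullets above. Everything here is hypothetical: the toy corpus, the `.example` URLs, the function names, and the "prefer short pages" ranker, which only stands in for a feature the shared index does not store.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for crawled pages.
PAGES = {
    "a.example": "penguin exhibits in michigan zoo penguin",
    "b.example": "michigan zoo hours and tickets",
    "c.example": "penguin facts for kids",
}

def build_public_index(pages):
    """Shared inverted index: term -> {url: term count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.split():
            index[term][url] = index[term].get(url, 0) + 1
    return index

def rough_search(index, query, top_n):
    """Cheap first pass: score by summed term counts from the shared index."""
    scores = defaultdict(int)
    for term in query.split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:top_n]

def rerank(candidates, pages, score_fn):
    """Expensive second pass: a custom ranker reads the full page text."""
    return sorted(candidates, key=lambda u: (-score_fn(pages[u]), u))

def prefers_short_pages(text):
    # Stand-in for any signal that is not in the public index.
    return -len(text.split())

index = build_public_index(PAGES)
candidates = rough_search(index, "penguin michigan", top_n=2)
results = rerank(candidates, PAGES, prefers_short_pages)
```

The custom ranker only ever touches the `top_n` candidates, which is why the shared index merely has to be good enough to get the best result somewhere into that candidate set.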

ergothus · on June 26, 2019

> Indexing is expensive. If there's a shared public index, that'd make it a lot easier for people to try new ranking algorithms. Maybe the index can be built into the way the new internet works, like DNS or routing, so the cost is shared.

But what are you storing in your index? The content that is considered in your ranking will vary wildly by your ranking methods. (Example: early indexes cared only for the presence of words. Then we started to care about the count of words, then the relationships between words and the context. Then about figuring out if the site was scammy, or slow.)

The only way to store an index of all content (to cover all the options) is to...store the internet.

I'm not trying to be negative - I feel very poorly served by the rankings that are out there, as I feel on 99% of issues I'm on the longtail rather than what they target. But I can't see how a "shared index" would be practical for all the kinds of ranking algorithms both present and future.

jgtrosh · on June 28, 2019

> The only way to store an index of all content (to cover all the options) is to...store the internet.

An index cannot hope to cover all options, these ideas are antithetical.

jppope · on June 27, 2019

this is a pretty killer idea

mavsman · on June 26, 2019

How about open sourcing the ranking and then allowing people to customize it. I should be able to rank my own search results how I want to without much technical knowledge.

I want to rank my results by what is most popular to my friends (Facebook or otherwise) so I just look for a search engine extension that allows me to do that. This could get complex but can also be simple if novices just use the most popular ranking algorithms.

josephjrobison · on June 26, 2019

I think Facebook really missed the boat on building their own "network influenced" search engine. They made some progress in allowing you to search based on friends' posting and recommendations to some degree but it seems to have flatlined in the last few years and is very constricting.

One thing I haven't seen much on these recent threads on search is the ability to create your own Google Custom Search Engine based on domains you trust - https://cse.google.com/cse/all

Also, not many people have mentioned the use of search operators, which allow you to control the results returned. Such as "Paul Graham inurl:interview -site:ycombinator.com -site:techcrunch.com"

aleppe7766 · on June 26, 2019

That would lead to an even bigger filter bubble issue: more precisely, to a techno élite that is capable, willing and knowledgeable enough to feel the need to go through the hassle, with all the rest navigating such an indexed mess that it would pave the way for all sorts of new gatekeepers belonging to the aforementioned tech élite. It’s not a simple issue to tackle; perhaps public scrutiny of the ranking algorithms would be a good first step.

samirm · on June 26, 2019

I disagree. The people who don't know anything and are unwilling to learn wouldn't be any worse off than they are today and everyone else would benefit from an open source "marketplace" of possible ranking algorithms that the so called "techno elite" have developed.

aleppe7766 · on June 27, 2019

I think the proposed improvement to the web should, in its intentions, mostly benefit the "ignorants", not those that can already navigate through the biases of today's technological gatekeepers. Please note, ignorants are not at fault for being so. Especially when governments cut funds for public education, and media leverages (and multiplies) ignorance to produce needs and sales, fears and votes. Any solution must work first to make the weak stronger, more conscious. A better and less biased web can help people grow their unbiased knowledge, and therefore exercise their right to vote with a deeper understanding of the complexity. Voting ignorants are an opportunity for ill-intentioned politicians, as much as they are a problem for me, you and the whole country.

wumpus · on June 26, 2019

blekko and bing both implemented ranking by popularity with your Facebook friends, and the data was too sparse to be useful.

smitop · on June 28, 2019

If the details of a ranking algorithm are open source, it would be easy to manipulate them.

dex011 · on June 29, 2019

Open sourcing the ranking... YES!!!

MadWombat · on June 26, 2019

I wonder if indexing and ranking could be decentralized. Let's say we design some data formats and protocols to exchange indexing and ranking information. Then maybe instead of getting a single Google, we could have a hierarchical system of indexers and rankers and some sort of consensus and trust algorithm to aggregate the information between them. Maybe offload indexing to the content providers altogether, i.e. if you want your website found, you need to maintain your own index. Maybe do a market on aggregator trust, if you don't like a particular result, the corresponding aggregator loses a bit of trust and its rankings become a bit less prominent.
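
One way the "aggregator loses a bit of trust" mechanism could look, as a minimal sketch only. The class name `TrustMarket`, the multiplicative penalty of 0.9, and the trust/(position + 1) credit rule are all assumptions made for illustration, not part of the proposal above.

```python
class TrustMarket:
    """Merges results from several aggregators, weighted by per-aggregator trust."""

    def __init__(self, aggregators, penalty=0.9):
        self.trust = {name: 1.0 for name in aggregators}
        self.penalty = penalty  # assumed multiplicative trust loss per dislike

    def merge(self, results_by_aggregator):
        """results_by_aggregator: name -> list of URLs, best first."""
        scores = {}
        sources = {}
        for name, urls in results_by_aggregator.items():
            for position, url in enumerate(urls):
                # Higher trust and earlier position both give more credit.
                credit = self.trust[name] / (position + 1)
                scores[url] = scores.get(url, 0.0) + credit
                sources.setdefault(url, []).append(name)
        ranked = sorted(scores, key=lambda u: (-scores[u], u))
        return ranked, sources

    def dislike(self, url, sources):
        """Every aggregator that recommended the disliked URL loses some trust."""
        for name in sources.get(url, []):
            self.trust[name] *= self.penalty

market = TrustMarket(["agg_a", "agg_b"])
results = {"agg_a": ["spam.example"], "agg_b": ["useful.example"]}
before, sources = market.merge(results)
market.dislike("spam.example", sources)
after, _ = market.merge(results)
```

Before the dislike the two results tie and are ordered alphabetically. Afterwards `agg_a` carries less weight, so its recommendation drops below the other one.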

allworknoplay · on June 26, 2019

Spitballing here, but what if instead of a monolithic page rank algorithm, you could combine individually maintained, open set rankings?

===Edit=== I mean to say you as the user would gain control over the ranking sources, the company operating this search service would perform the aggregation and effectively operate a marketplace of ranking providers. ===end edit===

For example, one could be an index of "canonical" sites for a given search term, such that it would return an extremely high ranking for the result "news.ycombinator.com" if someone searches the term "hacker news". Layer on a "fraud" ranking built off lists of sites and pages known for fraud, a basic old-school page rank (simply order by link credit), and some other filters. You could compose the global ranking dynamically based off weighted averages of the different ranked sets, and drill down to see what individual ones recommended.
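
To make the weighted-average composition concrete, here is a small sketch. The source names, scores, weights, and the two `.example` domains are invented for illustration, and it assumes each source reports a score in [0, 1] with missing entries counted as 0.

```python
def combine_rankings(sources, weights):
    """Weighted average of per-source scores; a missing score counts as 0."""
    total_weight = sum(weights.values())
    urls = set()
    for scores in sources.values():
        urls.update(scores)
    combined = {}
    breakdown = {}  # lets the user drill down into what each source said
    for url in urls:
        breakdown[url] = {name: scores.get(url, 0.0) for name, scores in sources.items()}
        weighted = sum(weights[name] * s for name, s in breakdown[url].items())
        combined[url] = weighted / total_weight
    ranked = sorted(combined, key=lambda u: (-combined[u], u))
    return ranked, combined, breakdown

# Hypothetical ranking providers for the query "hacker news".
sources = {
    "canonical": {"news.ycombinator.com": 1.0},
    "not_fraud": {"news.ycombinator.com": 1.0, "hn-clone.example": 0.1, "blog.example": 0.9},
    "link_credit": {"news.ycombinator.com": 0.8, "hn-clone.example": 0.9, "blog.example": 0.3},
}
weights = {"canonical": 2.0, "not_fraud": 1.0, "link_credit": 1.0}
ranked, combined, breakdown = combine_rankings(sources, weights)
```

Changing `weights` is the user-facing control here: the same three source lists give a different final order without any provider recomputing anything.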

Seems hard to crunch in real time, but not sure. It'd certainly be nicer to have different orgs competing to maintain focused lists, rather than a gargantuan behemoth that doesn't have to respond to anyone.

Maybe you could even channel ad or subscription revenue from the aggregator to the ranking agencies based off which results the user appeared to think were the best.

SequoiaHope · on June 26, 2019

Well I suppose Google has some way of customizing search for different people. The big issue for me is that google tracks me to do this. Maybe there could be a way to deliver customized search where we securely held the details of our customization. Or we were pooled with similar users. I suppose if a ranking algorithm had all the possible parameters as variables, we could deliver our profile request on demand at the time of search. That would be nice. You could search as a Linux geek or as a music nut or see the results different political groups get.

dublin · on June 28, 2019

Building something like this becomes much easier with Xanadu-style bidirectional links. Of course, building those is hard, but eliminating the gatekeeper-censors may finally be the incentive required to get bidi links built. It's also worth noting that such a system will have to have some metrics for trust by multiple communities (e.g. Joe may think say, mercola.com is a good and reliable source of health info, while Jane thinks he's stuck in the past - People should be able to choose whether they value Joe's or Jane's opinion more, affecting the weights they'll see). In addition (and this is hard, too), those metrics should not be substantially game-able by those seeking to either promote or demote sites for their own ends. This requires a very distributed trust network.

bobajeff · on June 26, 2019

I like the idea of local personalized search ranking that evolves based off of an on-device neural network. I'm not sure how that would work though.

sogen · on June 28, 2019

Sounds like ad-Blocker repos, nice!

asdff · on June 26, 2019

Not to mention all the people who will carefully study whatever new system, looking for their angle to game the ranking.

daveloyall · on June 28, 2019

> When you rank, you pick winners and losers.

...To which people responded with various schemes for fair ranking systems.

...To which people observed that someone will always try to game the ranking systems.

Yep! So long as somebody stands to benefit (profit) from artificially high rankings, they'll aim for that, and try to break the system. Those with more resources will be better able to game the system, and gain more resources... ad nauseam. We'd end up right where we are.

The only way to break that [feedback loop](https://duckduckgo.com/?q=thinking+in+systems+meadows) is to disassociate profit from rank.

Say it with me: we need a global, non-commercial network of networks--an internet, if you will. (Insert Al Gore reference here.)

(Note: I don't have time to read all the comments on this page before my `noprocrast` times out, so please pardon me if somebody already said this.)

zeruch · on June 26, 2019

This is a bang on distillation of the problem (or at least one way to view the problem, per "who controls the ranking and how it's defined").

aleppe7766 · on June 26, 2019

That’s a very useful distinction, that brings me to a question: are we sure that automating ranking in 2019, on the basis of publicly scrutinized algorithms, would bring us back to a pre-Google accuracy? Also, ranking on the basis of the sole query instead of the individual, would lead to much more neutral results.

tracker1 · on June 26, 2019

Absolutely spot on... I've been using DDG as my default search engine for a couple months. But, google has a huge profile on me. I find myself falling back to google a few times a day when searching for technical terms/issues.

Retra · on June 26, 2019

Couldn't you just randomize result ordering?

penagwin · on June 26, 2019

You know how google search results can get really useless just a few pages in? And it says it found something crazy like 880,000 results? Imagine randomizing that.

---

Unrelated I searched for "Penguin exhibits in Michigan". Of which we have several. It reports 880,000 results but I can only go to page 12 (after telling it to show omitted results). Interesting...

https://www.google.com/search?q=penguin+exhibits+in+michigan

Theodores · on June 26, 2019

If you think of it as like an old fashioned library or an old fashioned Blockbuster video store.

Sure you could read any book ever printed in the English language in the local library. They might have to get it in from the national collection or the big library in the city. But you ain't going to see every book in the local library. There is more than you could wish for and you will never read every book in the local library. But all the classics are there, the talked about new books are there (or out on loan, back soon). All the reference books that school kids need are there, there is enough to get you started in any hobby.

Google search results are like that. Those 880,000 'titles' are a bit like the Library of Congress boasting how big it is, it is just a number. All they have really got for you is a small selection that is good enough for 99% of people 99% of the time. Only new stuff by people with PageRank (books with publishers) gets indexed now and put into the 'main collection'.

Much like how public libraries do have book sales, Google do let a lot of the 880,000 results drop off.

It's a ruse!

cruano · on June 27, 2019

I heard that they also filter results by some undisclosed parameters, like they don't show you anything that hasn't been modified in the last ~10 years, no matter how hard you try

dublin · on June 28, 2019

Yeah, this is a real problem for research into older things that have no need to change. Google seems to think that information has a half-life. That's really only true in the social space. Truth is eternal.

ZeroBugBounce · on June 26, 2019

Sure, but then whoever gets to populate the index chooses the winners and losers, because you could just stuff it with different versions of the content or links you wanted to win and the random ranking would show those more often, because they appear in the pool of possible results more often.
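
The arithmetic behind that, as a quick sketch. It assumes results are drawn uniformly at random without replacement from the matching pool; the pool sizes below are made-up numbers.

```python
from fractions import Fraction

def stuffed_share(honest_pages, stuffed_copies):
    """Chance that any one uniformly random result slot shows the stuffer."""
    return Fraction(stuffed_copies, honest_pages + stuffed_copies)

def expected_slots(honest_pages, stuffed_copies, page_size=10):
    """Expected number of the stuffer's entries on a random first page.

    This is the mean of a hypergeometric distribution, so it assumes
    page_size is no larger than the total pool.
    """
    return page_size * stuffed_share(honest_pages, stuffed_copies)

one_copy = expected_slots(honest_pages=1000, stuffed_copies=1)
many_copies = expected_slots(honest_pages=1000, stuffed_copies=1000)
```

With one copy among 1,000 honest pages the stuffer expects about 0.01 slots on a ten-result page. With 1,000 copies it expects half the page.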

cortesoft · on June 26, 2019

That would make it waaaay less useful to searchers and wayyy easier to game by stuffing results with thousands of your own results

onion2k · on June 27, 2019

I suspect just randomizing the first 20 or so results would fix most problems. The real issue is people putting effort into hitting the first page, so if you took the benefit out of doing that people would look for other ways to spend their energy.

z3t4 · on June 26, 2019

If you find nothing useful, just refresh for a new set. It would also help discovery.

brokensegue · on June 26, 2019

Sounds like a great ux

iblaine · on June 26, 2019

Yes, it was called Yahoo and it did a good job of cataloging the internet when hundreds of sites were added per week: https://web.archive.org/web/19961227005023/http://www2.yahoo...

I'm old enough to remember sorting sites by new to see what new URLs were being created, and getting to the bottom of that list within a few minutes. Google and search was a natural response to solving that problem as the number of sites added to the internet grew exponentially...meaning we need search.

kickscondor · on June 26, 2019

Directories are still useful - Archive of Our Own (https://archiveofourown.org/) is a large example for fan fiction, Wikipedia has a full directory (https://en.wikipedia.org/wiki/Category:Main_topic_classifica...), Reddit wikis perform this function, Awesome directories (https://github.com/sindresorhus/awesome) or personal directories like mine at href.cool.

The Web is too big for a single large directory - but a network of small directories seems promising. (Supported by link-sharing sites like Pinboard and HN.)

ninju · on June 26, 2019

How about this

https://en.wikipedia.org/wiki/List_of_lists_of_lists

kickscondor · on June 26, 2019

Yes! But, of course, for directories outside of Wikipedia. This is very interesting for its classification structure. It's so typical of Wikipedia that a 'master list of lists' (by my count, there are 589 list links on this page) contains lists such as "Lists of Melrose Place episodes" and "Lists of Middle-earth articles" alongside lists such as "lists of wars" or "lists of banks".

brokensegue · on June 26, 2019

Ao3 isn't really a directory since they do the actual hosting

kickscondor · on June 26, 2019

Yes, thank you - I only mean in terms of organization.

adrianmonk · on June 26, 2019

I used Yahoo back in those days, and it literally proved the point that hand-cataloging the internet wasn't tractable, at least not the way Yahoo tried to do it. There was just too much volume.

It was wonderful to have things so carefully organized, but it took months for them to add sites. Their backlog was enormous.

Their failure to keep up is basically what pushed people to an automated approach, i.e. the search engine.

bitwize · on June 26, 2019

I found myself briefly wondering if it were possible to have a decentralized open source repository of curated sites that anyone could fork, add to, or modify. Then I remembered dmoz, which wasn't really decentralized -- and realized that "awesome lists" on GitHub may be a critical step in the direction I had envisioned.

insulanus · on June 27, 2019

I think this could work for small, specific areas of interest. For example, there are only so many people writing about, and interested in reading about, programming language design. Those small communities could stand ready with their community-curated index when an "outsider" wants to research something they know well.

stakhanov · on June 26, 2019

You don't have to go all the way back into Yahoo-era when it comes to manually curated directories: DMOZ was actively maintained until quite recently, but ultimately given up for what seems like good reasons.

iblaine · on June 27, 2019

This is true, and DMOZ was used heavily by Google's earlier search algorithms to rank sites within Google. Early moderators of DMOZ had god-like powers to influence search results.

gerbilly · on June 26, 2019

Earlier than that there was a list of ftp sites giving a summary of what was available on each.

alangibson · on June 26, 2019

I wonder if you could build a Yahoo/Google hybrid where you start with many trusted catalogs run by special interest groups then index only those sites for search. Doesn't fully solve the centralization problem, but interesting none the less.

ovi256 · on June 26, 2019

Everyone has missed the most important aspect of search engines, from the point of view of their core function of information retrieval: they're the internet equivalent of a library index.

Either you find a way to make information findable in a library without an index (how?!?) or you find a novel way to make a neutral search engine - one that provides as much value as Google but whose costs are paid in a different way, so that it does not have Google's incentives.

davemp · on June 26, 2019

The problem is that current search engines are indexing what is essentially a stack of random books thrown together by anonymous library goers. Before being able to guide readers to books, librarians have to do the following non-trivial tasks over the entire collection:

- identify the book's theme

- measure the quality of the information

- determine authenticity / malicious content

- remember the position of the book in the colossal stacks

Then the librarian can start to refer people to books. This problem was actually present in libraries before the revolutionary Dewey Decimal System [1]. Libraries found that the disorganization caused too much reliance on librarians and made it hard to train replacements if anything happened.

The Internet just solved the problem by building a better librarian rather than building a better library. Personally I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.

[1]: https://en.wikipedia.org/wiki/Dewey_Decimal_Classification

jasode · on June 26, 2019

>I welcome any attempts to build a more organized internet. I don't think the communal book pile approach is scaling very well.

Let me know if I misunderstand your comment but to me, this has already been tried.

Yahoo's founders originally tried to "organize" the internet like a good librarian. Yahoo in 1994 was originally called, "Jerry and David's Guide to the World Wide Web"[0] with hierarchical directories to curated links.

However, Jerry & David noticed that Google's search results were more useful to web surfers and Yahoo was losing traffic. Therefore, in 2000 they licensed Google's search engine. Google's approach was more scaleable than Yahoo's.

I often see several suggestions that the alternative to Google is curated directories but I can't tell if people are unaware of the early internet's history and don't know that such an idea was already tried and how it ultimately failed.

[0] http://static3.businessinsider.com/image/57977a3188e4a714088...

organsnyder · on June 26, 2019

I remember trying to get one of my company's sites listed on Yahoo! back in the late 1990s. Despite us being an established company (founded in 1985) with a good domain name (cardgames.com) and a bunch of good, free content (rules for various card games, links to various places to play those games online, etc.), it took months.

dsparkman · on June 26, 2019

That was not a bad thing. It was curated. Most of the crap never made it in the directory precisely because humans made decisions about what got in. If you wanted in the directory faster, you could pay a fee to get to the front of the queue. The result is that Yahoo could hire people to process the queue and make money without ads.

organsnyder · on June 26, 2019

Isn't paying money to jump to the front of the queue just another form of advertising?

Stronico · on June 26, 2019

That was my experience as well. For old companies and new. Yahoo was just really slow.

davemp · on June 26, 2019

> I often see several suggestions that the alternative to Google is curated directories but I can't tell if people are unaware of the early internet's history and don't know that such an idea was already tried and how it ultimately failed.

¿Por qué no los dos?

1) The idea is that a more organized structure is easier for a librarian to index. Today, libraries still have librarians. The book pile just wouldn't take decades to build familiarity.

2) Times change. New technology exists, people use the internet differently, and there's more at stake. Just because an approach didn't work before doesn't mean that it won't work now.

There are real problems with an organizational approach, but I don't see why the idea isn't worth a revisit.

toast0 · on June 26, 2019

There are plenty of these, wikipedia has a list [1].

I think these efforts get bogged down in the huge amount of content out there, the impermanence of that content and also the difficulty in placing sites into ontologies.

And at the end of the day, there's not a large enough value proposition to balance the immense effort.

I think, if you were to do it today, you would want to work on / with the internet archive, so at least things that were categorized wouldn't change or disappear (as much)

[1] https://en.m.wikipedia.org/wiki/List_of_web_directories

davemp · on June 26, 2019

Obviously a naïve web directory isn't going to cut it.

What would make the approach viable is if there were a nice way to automate and crowd source most/all of the effort. Maybe that means changing the idea of what makes a website. Maybe there could just be little grass roots reddit-esque communities that are indexed/verified (google already favors reddit/hn links). Who knows, but it's an interesting problem to kick around.

jasode · on June 26, 2019

>What would make the approach viable is if there were a nice way to automate and crowd source most/all of the effort.

But to me, crowdsourcing is also what Jerry & David did. The users submitted links to Yahoo. AltaVista also had a form for users to submit new links.

Also, Wikipedia's list of links are also crowdsourced in the sense that many outside websurfers (not just staff editors) make suggested edits to the wiki pages. Looking at a "revision history" of a particular wiki page makes the crowdsourced edits more visible: https://en.wikipedia.org/w/index.php?title=List_of_web_direc...

davemp · on June 26, 2019

Sometimes it just takes a small change to make an idea work. Neural networks weren't viable until GPUs/backpropagation. Dismissive comments like this aren't very useful.

jasode · on June 26, 2019

>Dismissive comments

I wasn't being dismissive. I was trying to refine your crowdsourcing idea by explicitly surfacing what's been tried in the past.

The thread's op asks: "Can we create a new internet where search engines are irrelevant?"

If the current best answer for op is: "I propose crowdsourced curated directories as the alternative to Google/Bing -- but the implementation details are left as an exercise for the reader" ... that's fine that our conversation terminates there and we don't have to go around in circles. The point is I didn't know this thread's discussion ultimately terminates there until I ask more probing questions so people can try to expand on what their alternative proposal actually entails. I also don't know what baseline knowledge the person proposing the idea has. I.e. does the person suggesting an idea have knowledge of the internet's evolution and has that been taken into account?

msla · on June 26, 2019

> Maybe there could just be little grass roots reddit-esque communities that are indexed/verified

Verified by who, exactly?

I know, I know... "dismissive comment", but it's an important thing to think about: Who decides what goes in the library? It's an evergreen topic, even in real, physical libraries, as those tedious lists of "Banned And Challenged Books" attest. It seems every time a copy of Huckleberry Finn gets pulled from an elementary school library in Altoona everyone gets all upset, so can you imagine what would happen if the radfems got their hands on a big Web Directory and cleansed it of all positive mentions of trans people?

davemp · on June 26, 2019

I imagine the communities would kind of serve as a public index in aggregate that have a barrier to entry / reputation. If one turns to crap just ignore it with whatever search tool you're using.

It wouldn't be about policing, just organizing.

ehnto · on June 26, 2019

Consider the sheer size of the internet now. Even if you could categorize and file that many websites accurately, how do you display that to the user in a way that's usable? It will probably look a lot like a search engine, no matter which way you frame it.

The underlying goal: "Get a user the information they want when they don't know where it lives" isn't really going to be helped by a non-searchable directory of millions of sites.

PeterisP · on June 26, 2019

The current search engines are also indexing books maliciously inserted in the library in a way to maximize their exposure e.g. a million "different" pamphlets advertising Bob's Bible Auto Repair Service inserted in the Bible category.

A "better library" can't be permissionless and unfiltered; Dewey Decimal System relies on the metadata being truthful, and the internet is anything but.

You can't rely on information provided by content creators; Manual curation is an option but doesn't scale (see the other answer re: early Yahoo and Google).

davemp · on June 26, 2019

Perhaps there exists a happy medium between: manual curation -- unfiltered

PageRank is kind of a pseudo manual curation. The manual effort is just farmed out to the greater internet and analyzed.
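
For readers who have not seen it, the "farmed out" effort is the links themselves: each link acts as a vote. Below is a sketch of the textbook power-iteration formulation with the commonly cited damping factor of 0.85. It is not Google's production algorithm, and the three-page link graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to. Returns page -> score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: two pages link to c, and c links back to a.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
scores = pagerank(links)
```

Here `c.example` ends up on top because two pages vote for it, `a.example` comes second because its single vote comes from a highly ranked page, and `b.example` receives only the baseline share.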

zaphar · on June 26, 2019

The really hard part of this to scale is the quality metric. Google was the first to really scale quality measurement by outsourcing it to the web content creators themselves.

Any attempt to create a decentralized index will need to tackle the quality metric problem.

agumonkey · on June 26, 2019

Also, there's a massive economic market on top of what is on the closest shelves. Libraries are less sensitive to these forces.

basch · on June 26, 2019

They are also a spam filter. It's not just an index of what's relevant, but removal of what maliciously appears to be relevant at first glance.

izendejas · on June 26, 2019

This. Everyone's missing the point of a search engine.

We're talking about billions of pages and if not ranked (authority is a good heuristic), filtered (de-ranked), etc then good luck finding valuable information because everyone is gaming the systems to improve their ranking.

I think this is part of the reason you get a lot of fake news on social media. It's a constant stream of information (a new dimension of time has been added to the ranking, basically) that needs to be ranked and with humans in the loop, there's no way to do this very easily without filtering for noise and outright malicious content.

basch · on June 26, 2019

i disagree that there isn't a way, just that nobody's tried a good one yet.

take reddit for example. it should be very easy to establish a few voters who make "good" decisions, and then extrapolate their good decisions based on people with similar voting patterns. it would combine a million monkeys with typewriters with expert meritocracy. you want different sorting, sort by different experts until you get the results you want. it seems every platform is too busy fighting noise to focus on amplifying signal, or are focused on teaching machines to do the entire task, instead of using machines to multiply the efficiency of people with taste who can make a good judgement call with regard to whether something is novel or pseudo-intellectual. Not to pick on them, but I would suspect an expert to be better at deranking aeon/brainpickings type clickbait than an eruditelike ai, if only because humans can still more easily determine if someone is making an actual worthwhile point, vs repeating a platitude, conventional wisdom, or something hollow.

abathur · on June 26, 2019

It should, but if anyone knows who these kingmakers are, it's still probably just a matter of time before they accrue enough power for it to be worth someone's time to at least try to track them down and manipulate their decisions (bribe, blackmail, sponsor, send free trials, target with marketing/propaganda campaigns, etc.)

basch · on June 26, 2019

Who says it even has the same kingmakers every day? Slashdot solved that part of metamoderation two decades ago.

A person might be an expert in cars but not horses. A car expert might be superseded. The seed data creators could be a fluid thing.

cthaeh · on June 26, 2019

This is a technocracy. No one wants this but Hacker News.

basch · on June 27, 2019

Let's say you have a subreddit like /r/cooking. You think exposing a control in the user agent (browser, app, ui) that lets you sort recipe results by lay democracy, professional chefs, or restaurant critics' taste is a technocracy?

Are consumer reports and wirecutter less valuable than Walmarts best sellers? Is techmeme.com worse than Hackernews by virtue of being a small cabal of voters? Should I dismiss longform.org and aldaily as elitist because they aren't determining priority solely from the larger populations preferences. Is Facebooks news algorithm better because it uses my friends to suggest content?

Is it a technocracy that metacritic and rotten tomatoes show both user and critic score? I'm proposing an additional algorithm that compares critic score with user score to find like voters and extrapolate how a critic would score a movie they have never seen. I think that would be useful without diminishing the other true scores. I would find it useful to be able to choose my own set of favorite letterboxd or redef voters and see results it predicts they would recommend, despite them never having actually voted on a movie or article. Instead of seeding a movie recommendation algorithm with my thoughts, I could input others already well documented opinions to speed up the process.

This idea would work better if people voted without seeing each others votes until after they vote. It might be hard to extrapolate Roger Ebert's preferences if voters formed their opinions of movies based on his reviews. You'd end up with a false positive that mimics his past but poorly predicts his future.
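A rough sketch of the extrapolation described above, under loud assumptions: scores are on a 1 to 5 scale, similarity is just "how closely two voters agree on the movies they both rated", and the prediction is a similarity-weighted average. The names (`ratings`, `similarity`, `predict_score`) and all the data are made up for illustration; this is not how metacritic, rotten tomatoes, or letterboxd work.

```python
# Hypothetical data: voter -> {movie: score on a 1-5 scale}
ratings = {
    "critic": {"A": 5, "B": 1, "C": 4},
    "user1": {"A": 5, "B": 2, "C": 4, "D": 5},
    "user2": {"A": 1, "B": 5, "C": 2, "D": 1},
    "user3": {"A": 4, "B": 1, "D": 4},
}


def similarity(a, b, scale=4):
    """Agreement in [0, 1] over the movies both voters rated (1.0 = identical)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[m] - b[m]) for m in shared) / len(shared)
    return 1.0 - mean_diff / scale


def predict_score(target, movie, ratings):
    """Guess how `target` would score `movie` from like-minded voters who rated it."""
    weighted, total = 0.0, 0.0
    for voter, scores in ratings.items():
        if voter == target or movie not in scores:
            continue
        weight = similarity(ratings[target], scores)
        weighted += weight * scores[movie]
        total += weight
    return weighted / total if total else None


# The critic never rated "D"; voters who agree with the critic pull the guess up.
prediction = predict_score("critic", "D", ratings)
```

Voters who disagree with the critic still contribute a little here; a real system would probably keep only the closest few voters, and would need votes cast blind for the reason given above.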

luxuryballs · on June 26, 2019

The reverse is a problem too, Google filtering things out based on their political leanings in an attempt to shape public opinion.

Nasrudith · on June 26, 2019

I haven't seen any examples which were anything other than runaway persecution complexes of those who found their world view was less popular than they believed - which were greeted with exasperation by testifying engineers who had to explain how absurdly unscaleable it would be to do it manually.

aslaan · on June 27, 2019

https://gohmert.house.gov/news/documentsingle.aspx?DocumentI...

IanSanders · on June 26, 2019

I think heavy reliance on human language (and its ambiguity) is one of the main problems.

Maybe personal whitelist/blacklist for domains and authors could improve things. Sort of "Web of trust" but done properly.

Not completely without search engines, but for example, if every website was responsible for maintaining its own index, we could effectively run our own search engines after initialising "base" trusted website lists. Let's say I'm new to this "new internet", I ask around what are some good websites for information I'm interested in. My friend tells me wikipedia is good for general information, webmd for health queries, stackoverflow for programming questions, and so on. I add wikipedia.org/searchindex, webmd.com/searchindex and stackoverflow.com/searchindex to my personal search engine instance, and every time I search something, these three are queried. This could be improved with local cache, synonyms, etc. As you carry on using it, you expand your "library". Of course it would increase the workload of individual resources, but it has the potential to bring back that web 1.0 feel.
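To make the idea concrete, here is a minimal sketch of such a personal search engine. Everything in it is an assumption: no `/searchindex` endpoint exists, so each site's index is faked as an in-memory dict, and the page paths are invented. `PersonalSearch`, `fetch_index`, and `SITE_INDEXES` are hypothetical names.

```python
# Stand-in for each site's self-published index: site -> {term: [page paths]}.
# In the scheme above this would be fetched from a (hypothetical) /searchindex URL.
SITE_INDEXES = {
    "wikipedia.org": {"python": ["/python-article"], "fever": ["/fever-article"]},
    "webmd.com": {"fever": ["/fever-overview"]},
    "stackoverflow.com": {"python": ["/python-questions"]},
}


def fetch_index(site):
    """Pretend network call that returns a site's own index."""
    return SITE_INDEXES[site]


class PersonalSearch:
    def __init__(self, fetch_index):
        self.fetch_index = fetch_index  # callable: site -> index dict
        self.library = []               # trusted sites, in the order they were added
        self.cache = {}                 # local cache of indexes already fetched

    def trust(self, site):
        if site not in self.library:
            self.library.append(site)

    def search(self, term):
        results = []
        for site in self.library:
            if site not in self.cache:
                self.cache[site] = self.fetch_index(site)
            for path in self.cache[site].get(term.lower(), []):
                results.append(site + path)
        return results
```

Calling `trust()` on the sites a friend recommends and then `search("fever")` queries only those sites, in the order you added them, which is the whole ranking policy in this sketch.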

dsparkman · on June 26, 2019

This was devised by Amazon in 2005. They called it OpenSearch (http://www.opensearch.org/). Basically it was a standard way to expose your own search engine on your site. It made it easy to programmatically search a bunch of individual sites.

TheOtherHobbes · on June 26, 2019

This would be ludicrously easy to game. Crowdsourcing would also be ludicrously easy to game.

The problem isn't solvable without a good AI content scraper.

The scraper/indexer either has to be centralised - an international resource run independently of countries, corporations, and paid interest groups - or it has to be an impossible-to-game distributed resource.

The former is hugely challenging politically, because the org would effectively have editorial control over online content, and there would be huge fights over neutrality and censorship.

(This is more or less where we are now with Google. Ironically, given the cognitive distortions built into corporate capitalism, users today are more likely to trust a giant corporation with an agenda than a not-for-profit trying to run independently and operate as objectively as possible.)

Distributed content analysis and indexing - let's call it a kind of auto-DNS-for-content - is even harder, because you have to create an un-hackable un-gameable network protocol to handle it.

If it isn't un-gameable it becomes a battle of cycles, with interests with access to more cycles being able to out-index those with fewer - which will be another way to editorialise and control the results.

Short answer - yes, it's possible, but probably not with current technology, and certainly not with current politics.

pharke · on June 26, 2019

Just want to point out that you're on a site that successfully uses crowd sourcing combined with moderation to curate a list of websites, news, and articles that people find interesting and valuable. Why not a new internet built around communities like this where the users actively participate in finding, ranking, and moderating the content they consume? It's not a stretch to add a decent search index and categories to a news aggregator, most do it already. If these tools could be built into the structure of the web we'd be half way there.

abathur · on June 26, 2019

Edit: I had myself convinced that comments have a different ID space from submissions, but that obviously isn't true. I've partly rewritten to correct for an over-guess on how many new submissions there are each day.

I agree with your general suggestion, but just want to highlight that scale issues still make me think whatever finds traction on HN is a bit of a crapshoot.

It looks like there were over 10k posts (including comments) in the last day, and the list of submissions that spent time on the front page yesterday has 84 posts. I don't know how normal the last 2 days were, but by eyeball I'd guess around a quarter of the posts are comments on the day's front-page posts. This means there are probably a few thousand submissions that didn't get much if any traction.

Any time I look at the "New" page, I still end up finding several items that sound interesting enough to open. I see more than 10 that I'm tempted to click on right now. The current new page stretches back about 40 minutes, and only 10 of the 30 have more than 1 point (and only 1 has more than 10). Only 2 of the links I was tempted to click on have more than 1 point.

I suspect that there's vastly more interesting stuff posted to HN than its current dynamics are capable of identifying and signal-boosting. That's not bad, per se. It'd be an even worse time-sink if it were better at this task. But it does mean there are pitfalls using it as a model at an even larger scale and in other contexts.

IanSanders · on June 26, 2019

User's search engine doesn't have to trust suggestions verbatim, it can always run its own heuristic on top of returned results. And the user could reduce the weight of especially uncooperative domains or blacklist them altogether.

ehnto · on June 26, 2019

So long as there is a mechanism for categorizing information and ranking the results, people will try to game the mechanism to get the top spot regardless of your own incentives.

Despite their incentives to make money, Google have actually been trying for years to stop people from gaming the system. It's impressive how far they've been able to come, but their efforts are thwarted at every turn thanks to the big budgets employed to get traffic to commercial websites.

Nasrudith · on June 26, 2019

The only assured way to have a "neutral" search engine is to run your own spiders and indexers which you understand completely.

Neutral in that sense is only "not serving the agenda or judgement of another" at the obvious cost of labor and not just as a one off thing as the searched content often attempts to optimize for views. It isn't like a library of passive books to sort through but a Harry Potter wizard portrait gallery full of jealous media vying for attention.

And pedantically it isn't true neutral - but serves your agenda to the best of your ability. A "true neutral" would serve all to the best of their ability.

Besides neutrality in a search engine on a literal level is oxymoronic and self defeating - its whole function is to prioritize content in the first place.

narag · on June 26, 2019

A few years ago there was that blogs thing, with rss... all things that favoured federation, independent content generation, etc. Now it's all about platforms. I understand that "regular people" are more comfortable with Facebook but, other than that, why are blogs and forums less popular now?

JaumeGreen · on June 26, 2019

The problem with forums is that you end up visiting 5~10 different forums, each with their own login, and some of them might be restricted at work (not that you should visit them often).

So it's easier to have 2~4 aggregators in where all the information you desire resides, even if in each of them there are different forums.

A unified entry point helps adoption.

ajot · on June 26, 2019

So, instead of platforms, another option would be client software for different forums. Like Tapatalk. Is there anything like that but libre and/or desktop?

sosborn · on June 26, 2019

Reddit really did a good job of moving the masses away from site-specific forums.

r3bl · on June 26, 2019

I'd argue that forums and blogs require more effort.

Read a cool blog post? Nobody around you will ever give a shit, because in order to do so, they'd have to read it too. Shared a photo from a vacation? It might start a conversation or two with people around you, while you receive dozens or hundreds of affirmations (in the form of likes).

I don't like to use social networks, but that's what I fall back on when I have a few minutes to spare. I rarely look at my list of articles I've saved for later — who has time for that?

asdff · on June 26, 2019

>I don't like to use social networks, but that's what I fall back on when I have a few minutes to spare. I rarely look at my list of articles I've saved for later — who has time for that?

Plenty of people. Ever push an article to a reader view service and see how long it takes to read? Most articles posted here on HN or the nyt front page can be read in 3-5 mins. Occasionally you'd get a 20 min slog.

I used to use social media way more, and by far my biggest wastes of time on the platform were those spare minutes you get a dozen times a day. On the elevator, waiting for the bus, waiting on food, anytime I could sit still the phone went out and my head went down because that's what everyone around me was also doing while waiting on their coffee.

Eventually I realized I was just idly scrolling and not retaining anything at all from those 30s-2m sessions on instagram. Just chomping visual popcorn. Now, anytime I have a spare 10 mins, I'll read an article or two from my reading list. Anytime I have less than a spare 10 mins, I'll twiddle my thumbs and keep the phone in the pocket.

I used to be much more scatterbrained and had trouble winding down for the evening and getting good rest. Now, I feel like a monk.

arpa · on June 26, 2019

the problem is multiple actually: a) most internet-connected devices these days favor content consumption vs content creation (blogs vs instagram),

b) mainstream culture > closely-knit communities (facebook > forums)

c) big-player takeovers (facebook for groups, google for search) over previously somewhat niche areas and, actually, internet infrastructure

d) if you're not a big player, you don't exist... and back to c)

asark · on June 26, 2019

> a) most internet-connected devices these days favor content consumption vs content creation (blogs vs instagram),

You chose Instagram as your example, to make the point that phones favor consumption over creation?

arpa · on June 26, 2019

Yes! Instagram has the appearance of a OC/creation platform, but, typically of such platforms (such as twitter/fb) the "content" is low-effort "convenient" opportunistic trivia, and the product consumed is likes, followers, etc.

z3t4 · on June 26, 2019

A search engine is more like putting the books in a paper shredder and writing the book title on every piece, then ordering the pieces by whatever words you can find on them, putting all pieces that have the word "hacker" on them in the same box. The problem then becomes how you sort the pieces. Want to find a book about "hacking"? This box has all the shreds that have the word "hacker" on them, and you can find the book title on the back of each piece. The second problem becomes how relevant the word is to the book.
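The shredder picture is basically an inverted index. A small sketch, assuming whole books fit in memory and using plain term frequency as a (very crude) answer to the "how relevant is the word" problem; `build_index`, `search`, and the sample books are invented for illustration.

```python
import re
from collections import defaultdict


def words_of(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(books):
    """Map each word to the set of titles containing it (one 'box' per word)."""
    index = defaultdict(set)
    for title, text in books.items():
        for word in words_of(text):
            index[word].add(title)
    return index


def search(index, books, word):
    """Titles in the word's box, ordered by how large a share of the book the word is."""
    word = word.lower()

    def frequency(title):
        words = words_of(books[title])
        return words.count(word) / len(words)

    return sorted(index.get(word, ()), key=frequency, reverse=True)


books = {
    "Hacking 101": "hacker tools for the curious hacker",
    "Cooking Basics": "a hacker friend taught me to cook pasta and rice well",
    "Gardening": "soil water and sun",
}
index = build_index(books)
hits = search(index, books, "hacker")
```

Note that a search for "hacking" finds nothing here even though a title contains it: only book text is indexed and there is no stemming, which is one small taste of why ranking and matching are the hard part.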

wumpus · on June 26, 2019

The library index only indexes the information that fits on a card catalog card. That's extremely unlike a web search engine.

If you'd like to see an experimental discovery interface for a library that goes deeper into book contents, check out https://books.archivelab.org/dateviz/ -- sorry, not very mobile friendly.

Not surprisingly, this book thingie is a big centralized service, like a web search engine.

arpa · on June 26, 2019

maybe crowdsourcing would be a solution - something similar to "@home" project, only for web indexes/cache - maybe even leverage the browsers via plugin for web scraping. It already kind of works for getpocket.

tracker1 · on June 26, 2019

I don't think it would be an issue if Google wasn't creating "special" rules for specific winners and losers (overall). Hell, I really wish they'd make it easy to individually exclude certain domains from results.

The canonical example to me of something to exclude would be the expertsexchange site. After stack overflow, ee was more than useless, and even before it was just annoying. There are lots of sites with paywalls, and other obfuscations to content and imho these sites are the ones that should be dropped/low-ranked.

But the fact that there's no autocomplete for "Hillary Clinton is|has" (though "Donald Trump is" is also filtered). Yes, it's been heavily gamed. It's also had active meddling. And their control over YouTube seems to be even worse, with disclosed documents/video that indicate they're willing to go so far as outright election manipulation. With all indications that Facebook, Pinterest and others are going the same route.

ScottFree · on June 26, 2019

> or you find a novel way to make a neutral search engine

Just because nobody's said it in this thread yet: blockchain? I never bought into the whole bitcoin buzz, but using a blockchain as an internet index could be interesting.

KirinDave · on June 26, 2019

How would Merkle DAGs be relevant?

arpa · on June 26, 2019

even better, have something like git for the web - effectively working as an archive.

ScottFree · on June 26, 2019

The problem with git is countering nefarious forces. The blockchain is better in that regard because the consensus algorithm can be used to verify that the listings are legitimate.

arpa · on June 26, 2019

content change signed by creator's private key, otherwise merge is rejected?

or, wiki approach...

fehrnstr · on June 26, 2019

Just signing with a private key isn't a guarantor of anything other than that if you trust that the person with the key is who they say they are, then the actual content is from them. But that would require a massively large web of trust in itself: that all the private keys would be trusted. And if you only let in private keys that you explicitly trusted, then it's very likely you could end up with an echo chamber

arpa · on June 26, 2019

good point, but we already have the PKI in place, and use it for SSL.

neoteo · on June 26, 2019

I think Apple's current approach, where all the smarts (Machine Learning, Differential Privacy, Secure Enclave, etc.) reside on your device, not in the cloud, is the most promising. As imagined in so much sci-fi (eg. the Hosaka in Neuromancer) you build a relationship with your device which gets to know you, your habits and, most importantly in regard to search, what you mean when you search for something and what results are most likely to be relevant to you. An on-device search agent could potentially be the best solution because this very personal and, crucially, private device will know much more about you than you are (or should be) willing to forfeit to the cloud providers whose business is, ultimately, to make money off your data.

jasode · on June 26, 2019

>, where all the smarts [...] reside on your device, not in the cloud, is the most promising. [...] An on-device search agent could potentially be the best solution [...]

Maybe I misunderstand your proposal but to me, this is not technically possible. We can think of a modern search engine as a process that reduces a raw dataset of exabytes[0] into a comprehensible result of ~5000 bytes (i.e. ~5k being the 1st page of search result rendered as HTML.)

Yes, one can take a version of the movies & tv data on IMDB.com and put it on the phone (e.g. like copying the old Microsoft Cinemania CDs to the smartphone storage and having a locally installed app search it) but that's not possible for a generalized dataset representing the gigantic internet.

If you don't intend for the exabytes of the search index to be stored on your smartphone, what exactly is the "on-device search agent" doing? How is it iterating through the vast dataset over a slow cellular connection?

[0] https://www.google.com/search?q="trillion"+web+pages+exabyte...

ken · on June 26, 2019

The smarts living on-device is not necessarily the same as the smarts executing on-device.

We already have the means to execute arbitrary code (JS) or specific database queries (SQL) on remote hosts. It's not inconceivable, to me, that my device "knowing me" could consist of building up a local database of the types of things that I want to see, and when I ask it to do a new search, it can assemble a small program which it sends to a distributed system (which hosts the actual index), runs a sophisticated and customized query program there, securely and anonymously (I hope), and then sends back the results.

Google's index isn't architected to be used that way, but I would love it if someone did build such a system.

ativzzz · on June 26, 2019

To some extent, doesn't Google already do this? Meaning that based on your location/Google account/other factors such as cookies or search history, it will tailor your results. For instance, searching the same query on different computers will result in different results.

Though to your point, google probably ends up storing this information in the cloud

bduerst · on June 26, 2019

Also instant search results, which were common search terms that were cached at lower levels of the internet.

dymk · on June 26, 2019

I think you're suggesting homomorphic encryption to execute the user's ranking model. Unfortunately, homomorphic encryption is pretty slow, and the types of operations you can do are limited. But it's viable if the data you're operating on is relatively small - e.g. just searching through (encrypted) personal messages or something.

ken · on June 26, 2019

I think you've got the right general idea, but I don't know that it has to be homomorphic encryption. After all, an index of the public web is not really secret, and the user doesn't have a private key for it.

In the simplest case, you could make a search engine in the form of a big, public, regularly-updated database, and let users send in arbitrary queries (run in a sandbox/quota environment).

That's essentially what we've got now, except the query parser is a proprietary black box that changes all the time. I don't see any inherent reason they couldn't expose a lower-level interface, and let browsers build queries. Why can't web browsers be responsible for converting a user's text (or voice) into a search engine query structure?
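As a toy version of "arbitrary queries, run in a sandbox/quota environment", here is a sketch using SQLite's progress handler to cut off queries that do too much work. The table layout, the limits, and the names `run_with_quota` and `QuotaExceeded` are assumptions for illustration. It only limits work done; it is not a real sandbox (it does not stop writes, for example).

```python
import sqlite3


class QuotaExceeded(Exception):
    """Raised when a query uses more than its allotted work."""


def run_with_quota(conn, sql, params=(), max_ticks=50, ops_per_tick=1000):
    """Run a query, aborting it after roughly max_ticks * ops_per_tick VM steps."""
    ticks = 0

    def on_progress():
        nonlocal ticks
        ticks += 1
        return 1 if ticks > max_ticks else 0  # non-zero tells SQLite to abort

    conn.set_progress_handler(on_progress, ops_per_tick)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError:
        if ticks > max_ticks:
            raise QuotaExceeded(sql) from None
        raise  # some other failure, e.g. a syntax error
    finally:
        conn.set_progress_handler(None, ops_per_tick)  # remove the handler


# A stand-in for the shared public index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (term TEXT, url TEXT)")
conn.execute("INSERT INTO pages VALUES ('fish', 'example.org/fish')")

rows = run_with_quota(conn, "SELECT url FROM pages WHERE term = ?", ("fish",))
```

A browser-built query would go through `run_with_quota` like any other; a runaway one (say, an unbounded recursive CTE) gets cut off with `QuotaExceeded` instead of tying up the index.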

packet_nerd · on June 26, 2019

Or even an online search engine that was configurable where you could customize the search engine and assign custom weights to different aspects.

I'd love to be able to configure rules like:

+2 weight for clean HTML sites with minimal Javascript

+5 weight for .edu sites

-10 weight for documents longer than 2 pages

-5 weight for wordy documents

I'd also like to increase the weight for hits on a list of known high quality sites. Either a list I maintain myself, or one from an independent 3rd party.

Once upon a time I tried to use Google's custom search engine builder with only hand curated high quality sites as my main search engine. It was too much trouble to be practical, but I think that could change with an actual tool.
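Something like the following is what I have in mind for the rules above. It's only a sketch: it assumes the engine hands back a base relevance score plus a few page features, and the thresholds (2 scripts, 30 words per sentence), the +3 bonus for trusted sites, and every name here (`Page`, `RULES`, `score`, `rerank`) are my own placeholders.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    base_score: float          # relevance from the underlying engine (assumed given)
    script_count: int
    pages_long: float
    words_per_sentence: float


def host(url):
    return urlparse(url).hostname or ""


# (weight, predicate) pairs mirroring the rules listed above.
RULES = [
    (+2, lambda p: p.script_count <= 2),           # minimal Javascript
    (+5, lambda p: host(p.url).endswith(".edu")),  # .edu sites
    (-10, lambda p: p.pages_long > 2),             # longer than 2 pages
    (-5, lambda p: p.words_per_sentence > 30),     # wordy
]


def score(page, rules=RULES, trusted=frozenset(), trusted_bonus=3):
    total = page.base_score + sum(weight for weight, applies in rules if applies(page))
    if host(page.url) in trusted:
        total += trusted_bonus
    return total


def rerank(pages, **options):
    return sorted(pages, key=lambda p: score(p, **options), reverse=True)


a = Page("https://cs.example.edu/notes", 10, 1, 1, 15)
b = Page("https://blog.example.com/long", 20, 12, 5, 40)
c = Page("https://news.example.org/story", 13, 0, 1, 20)
ordered = rerank([b, c, a])
```

The `trusted` set could be a list you maintain yourself or one pulled from an independent third party; the ranking code doesn't care which.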

ntnlabs · on June 26, 2019

I think this is not what was the original question. A device that knows You still needs indexing service to find data for You. IMHO.

bogomipz · on June 26, 2019

I remember hearing something about Differential Privacy from a WWDC keynote a few years back, however I haven't heard much lately. Can you say how and where Apple is currently using Differential Privacy?

esmi · on June 26, 2019

https://www.apple.com/privacy/docs/Differential_Privacy_Over...

Apple uses local differential privacy to help protect the privacy of user activity in a given time period, while still gaining insight that improves the intelligence and usability of such features as:

• QuickType suggestions
• Emoji suggestions
• Lookup Hints
• Safari Energy Draining Domains
• Safari Autoplay Intent Detection (macOS High Sierra)
• Safari Crashing Domains (iOS 11)
• Health Type Usage (iOS 10.2)

Found via Google...

alfanick · on June 26, 2019

I see a lot of good comments here, I got inspired to write this:

What if this new Internet instead of using URI based on ownership (domains that belong to someone), would rely on topic?

In examples:

netv2://speakers/reviews/BW
netv2://news/anti-trump
netv2://news/pro-trump
netv2://computer/engineering/react/i-like-it
netv2://computer/engineering/electron/i-dont-like-it

A publisher of webpage (same html/http) would push their content to these new domains (?) and people could easily access list of resources (pub/sub like). Advertisements are driving Internet nowadays, so to keep everyone happy, what if netv2 is neutral, but web browser are not (which is the case now anyway)? You can imagine that some browsers would prioritise some entries in given topic, some would be neutral, but harder to retrieve data that you want.

Second thought: Guess what, I'm reinventing NNTP :)

decasteve · on June 26, 2019

Inventing/extending a new NNTP is nice idea too.

The Internet has become synonymous with the web/http protocol. The web alternatives to NNTP won instead of newer versions of Usenet. New versions of IRC, UUCP, S/FTP, SMTP, etc., instead of webifying everything would be nice. But those services are still there and fill an important niche for those not interested in seeing everything eternal septembered.

bogomipz · on June 26, 2019

I believe there is/was an extension to NNTP for full text search or at least a draft proposal no?

alfanick · on June 26, 2019

Another inspiration: DNS for searching.

What if we implement DNS-like protocol for searching. Think of recursive DNS. Do you have "articles about pistachio coloured usb-c chargers"? Home router says nope, ISP says nope, Cloudflare says nope, let's scan A to Z. Eventually someone gives an answer. This of course can (must?) be cached, just like DNS. And just like DNS, it can be influenced by your not-so-neutral browser or ISP.
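Roughly what I mean, as a sketch: a chain of resolvers where each one answers from what it knows or has cached, and otherwise asks its upstream. There is no such protocol; `Resolver` and the sample data are invented, and a real design would need TTLs and some defence against poisoned answers.

```python
class Resolver:
    def __init__(self, name, known=None, upstream=None):
        self.name = name
        self.known = dict(known or {})  # queries this node can answer itself
        self.upstream = upstream        # next resolver to ask, or None at the top
        self.cache = {}                 # answers learned from upstream (no expiry here)

    def resolve(self, query):
        """Return (answer, name of the resolver that supplied it), or (None, None)."""
        if query in self.known:
            return self.known[query], self.name
        if query in self.cache:
            return self.cache[query], self.name
        if self.upstream is None:
            return None, None
        answer, source = self.upstream.resolve(query)
        if answer is not None:
            self.cache[query] = answer
        return answer, source


query = "pistachio coloured usb-c chargers"
root = Resolver("root", known={query: ["example.com/chargers"]})
isp = Resolver("isp", upstream=root)
router = Resolver("router", upstream=isp)

first = router.resolve(query)   # walks router -> isp -> root
second = router.resolve(query)  # served from the router's own cache
```

Every hop in the chain is also a place where answers can be reordered or filtered, which is the "not-so-neutral" part.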

quickthrower2 · on June 26, 2019

The proliferation of Black hat SEOs would render this useless.

PeterisP · on June 26, 2019

How would topic validity get enforced?

For example, if a publisher has a particular pro-Trump article, they would likely want (for obvious financial reasons) to push it to both netv2://news/anti-trump and netv2://news/pro-trump . What would prevent them from doing that?

Also, a publisher of "GET RICH QUICK NOW!!!" article would want to push it to both netv2://news/anti-trump and netv2://computer/engineering/electron/i-dont-like-it topics.

You can't simply have topics, you can have communities like news/pro-trump that are willing to spend the labor required for moderation i.e. something like reddit. But not all content has such communities willing and able to do so well.

swalsh · on June 26, 2019

I like this idea of people dreaming about a new internet :D

The idea of moving to a pub-sub like system is a good one. It makes a lot of sense for what the internet has become. It's more than simple document retrieval today.

leadingthenet · on June 26, 2019

To me it seems that you’ve just recreated Reddit.

WhompingWindows · on June 26, 2019

You want to silo information and create built-in information echo chambers? That seems so bad for polarization.

volkk · on June 26, 2019

I'm starting to think echo chambers are just something that will forever be prevalent, and it's up to the users to try to view alternate viewpoints

bouk · on June 26, 2019

If netv2 is neutral, I would just stuff all of the topics with my own content millions of times, so everyone can only see my content

dymk · on June 26, 2019

Who maintains, audits, and does validation for content submitted to these global lists of topics?

codeulike · on June 26, 2019

That was what the early internet was like (I was there). People built indexes by hand, lists of pages on certain topics. There was the Gopher protocol that was supposed to help with finding things. But this was all top-down stuff, the first indexing/crawling search engines were bottom-up and it worked so much better. And for a while we had an ecosystem of different search engines until Google came along, was genuinely miles better than everything else, and wiped everything else out. Really, search isn't the problem, it's the way that search has become tied to advertising and tracking that's the problem. But then DuckDuckGo is there if you want to avoid all that.

m-i-l · on June 26, 2019

In the very early days, you didn't need a search engine because there weren't that many web sites and you knew most of the main ones anyway (or later on had them in your own hotlists in Mosaic). Nowadays you need a search because there is so much content.

The problem is that the amount of content and the size of the potential user base are so large that it is impossible to offer search as a free service, i.e. it has to be funded in some way. Perhaps instead of having a free advertising-driven search, there would be space for a subscription-based model? Subscription based (and advert free) models seem to be working in other areas, e.g. TV/films and music.

Another problem though is that more and more content seems to be becoming unsearchable, e.g. behind walled gardens or inside apps.

vpEfljFL · on June 26, 2019

Exactly my thought. But it definitely wouldn't get mass adoption which is good because mass-market content websites are questionable in terms of user experience (they also need to cover content creating costs by popups/ads/pushes). One thing, though, ad based search engines lift ad based websites because they can sell ad on a second end.

Maybe we'll see advent of specialised paid search engines SaaSs with authentic and independent content authors like professional blogs.

supernovae · on June 26, 2019

Search is the problem. If you don’t rank in google you don’t exist on the internet. There is an entire economy built on manipulating search that is pay to play in addition to google continually focusing on paid search of natural SERPs. Controlling search right now is controlling the internet.

bduerst · on June 26, 2019

>If you don’t rank in google you don’t exist on the internet.

Maybe in 2009. Today there are businesses that exist solely on Instagram, Facebook, Amazon, etc.

codeulike · on June 26, 2019

Whatever you replace Search with would be gamed in the same way.

supernovae · on June 26, 2019

true, but when it was lycos, hotbot, altavista, google, webcrawler, aol, gopher, archie, usenet and so many other sources it was much easier to exist in many ways (harder to dominate) - people used to ‘surf the web’, join “webrings” and share stuff.. now they consume and post memes. so i blame behavior as much as monopoly

codeulike · on June 26, 2019

A lot of other things have changed since then, so the difference in tone you are noticing might not have much to do with search engines. In 1996 there were only about 16 million people on the internet, and usage obviously skewed towards the more technical nerdy crowd. Now there are 4,383 million people on the internet. Which is about 57% of everyone.

Sohcahtoa82 · on June 27, 2019

I see this a lot on HN. People forget that a lot of things in the early days of the Internet only worked because there were so few people on the Internet.

If you were rich and had a T1 in your home in the days everyone was on dialup, sure you could host a website yourself. But these days, even if you're one of the lucky residents on a gigabit symmetrical connection, there's a limit to how much you can serve. Self-hosting isn't an option unless your website is a niche.

supernovae · on June 27, 2019

More people and fewer companies dominating how everything is found... i don't think that change is for the better.

Fjolsvith · on June 26, 2019

If your target audience isn't on Google, then you don't have to rank there.

Almost all of my customers find me through classified advertising websites. Organic and paid search visitors to my site tend to be window shoppers.

vid · on June 26, 2019

I think in one sense the answer is it always depends who or what you are asking for your answers.

The early Web wrestled with this, early on it was going to be directories and meta keywords. But that quickly broke down (information isn't hierarchical, meta keywords can be gamed). Google rose up because they use a sort of reputation system based index. In between that, there was a company called RealNames, that tried to replace domains and search with their authoritative naming of things, but that is obviously too centralized.

But back to Google, they now promote using schema.org descriptions of pages, over page text, as do other major search engines. This has tremendous implications for precise content definition (a page that is "not about fish" won't show up in a search result for fish). Google layers it with their reputation system, but these schemas are an important, open feature available to anyone to more accurately map the web. Schema.org is based on Linked Data, its principle being each piece of data can be precisely "followed." Each schema definition is crafted by participation from industry and interest groups to generally reflect its domain. This open world model is much more suitable to the Web, compared to the closed world of a particular database (but, some companies, like Amazon and Facebook, don't adhere to it since apparently they would rather their worlds have control; witness Facebook's open graph degeneration to something that is purely self-serving).

_nalply · on June 26, 2019

The deeper problem is advertising. It is sort of a prisoner's dilemma: all commercial entities have a shouting contest to attract customer attention. It's expensive for everybody.

If we could kill advertisement permanently, we can have an internet as described in the question. This will almost be like an emergent feature of the internet.

worldsayshi · on June 26, 2019

We could supercharge word of mouth. I've been thinking about an alternative upvote model where content is ranked not primarily based on aggregate voting but by:

- ranking content that users you have upvoted higher

- ranking content that users with similar upvote behaviour higher

While there is a risk of upvote bubbles, it should potentially make it easier for niche content to spread to interested people and make it possible for products and services to spread using peer trust rather than cold shouting.

thekyle · on June 26, 2019

> ranking content that users with similar upvote behaviour higher

This is what Reddit originally tried to do before they pivoted.

https://www.reddit.com/r/self/comments/11fiab/are_memes_maki...

worldsayshi · on June 26, 2019

Oh, interesting!

Makes me think that their original plan could still work if they just put a bit more effort into crafting that algorithm.

For example, the main criticism brought up is that things that you dislike that your peers like keep getting recommended. Why not add a de-ranking aspect into it and try adding downvote-peers in addition to upvote peers.

I imagine you could create this interesting query language that could answer questions like: what things do you like if you like X and Y but not Z? (I kind of remember that something akin to this has been hacked together using subreddit overlap.)
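The "like X and Y but not Z" question can be sketched as a plain set query, without a real query language. The function name and the data layout (one set of liked things per user) are assumptions for illustration only.

```python
def likes_given(likes, include, exclude):
    """What else do people like if they like everything in `include`
    and nothing in `exclude`?

    likes: dict mapping user -> set of things they like
    Returns the other liked things, most common first.
    """
    include, exclude = set(include), set(exclude)
    counts = {}
    for liked in likes.values():
        # Keep only users who like all of X, Y and none of Z.
        if include <= liked and not (exclude & liked):
            for thing in liked - include:
                counts[thing] = counts.get(thing, 0) + 1
    # Most frequently co-liked first; ties broken alphabetically.
    return sorted(counts, key=lambda t: (-counts[t], t))
```

For example, `likes_given(likes, ["X", "Y"], ["Z"])` only counts users who like both X and Y and do not like Z, then tallies whatever else those users like.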

endymi0n · on June 26, 2019

As long as there are big companies making money off their products, you can be sure they'll find a way to advertise them to you.

eterps · on June 26, 2019

I've had similar ideas recently. For niche content (or shared research) in particular, it would probably be notoriously hard (WRT false positives) for machine learning to decide whether it is relevant to you; people with similar interests know that much better.

I was also wondering what would be good options to store votes/upvotes in a decentralized way.

worldsayshi · on June 26, 2019

> people with similar interests know that much better

Yeah, I wonder if there is a cheap way to test this. Actually! There could be! Like using favorites here on Hacker News. That could be mined and visualized in various ways. (Although a quick sample shows me that it's a rarely used feature.)

> I was also wondering what would be good options to store votes/upvotes in a decentralized way.

Yeah there are a lot of interesting optimization challenges if you really want to utilize upvote graphs for ranking.

scrollaway · on June 26, 2019

Not to echo a R&M quote on purpose but that just sounds like targeted advertising with extra steps.

fifnir · on June 26, 2019

> ranking content that users with similar upvote behaviour higher

That's how you make echo chambers.

worldsayshi · on June 26, 2019

All social media have echo chamber characteristics. You have to counteract it with transparency and opt-in/out.

loxs · on June 26, 2019

So, basically Facebook?

Fjolsvith · on June 26, 2019

This sounds so much like Facebook.

worldsayshi · on June 26, 2019

Any "social" ranking algorithm is going to sound at least superficially similar to what's already out there.

vfinn · on June 26, 2019

Maybe if IPFS (~web 3.0) succeeds in the future, you could solve the advertising problem by inventing a meta network, where all the sites involved would agree to follow certain standardized criteria of site purity. You'd tag the nodes (or sites), and then have an option to search only sites from the pure network. Just a thought. edit: Maybe this would lead to a growing interest in the site purity, and as the network's popularity would grow, you could monetize the difference to its advance.

ativzzz · on June 26, 2019

Be careful what you wish for, as you might get AMP or some proprietary Facebook format as a standard instead.

vfinn · on June 27, 2019

Well, I was thinking we could have an endless number of (meta) networks / network configurations / standards. I mean each node could have as many tags as needed, e.g. #safe_for_children_v1.1 #pure_web_v2.0. Then you could configure your search engine / browser according to these tags. You could also stack tags to simplify things, e.g. pure_stack would include both #safe_for_children and #pure_web, etc. Maybe I'm missing something, but it seems doable.
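A rough sketch of how tag filtering with stacks might look. The tag names come from the comment above; the `STACKS` table, the function names, and the example sites are hypothetical, and how a site would actually earn or prove a tag is left out entirely.

```python
# A stack is just a named bundle of tags (hypothetical definition).
STACKS = {
    "pure_stack": {"safe_for_children_v1.1", "pure_web_v2.0"},
}


def expand(required, stacks=STACKS):
    """Replace any stack name with the tags it bundles."""
    out = set()
    for tag in required:
        out |= stacks.get(tag, {tag})
    return out


def filter_sites(sites, required, stacks=STACKS):
    """Return the sites that carry every required tag.

    sites: dict mapping site name -> set of tags the site claims
    """
    needed = expand(required, stacks)
    return sorted(name for name, tags in sites.items() if needed <= tags)
```

A search engine or browser configured with `["pure_stack"]` would then only show sites carrying both underlying tags.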

olegious · on June 26, 2019

If we kill advertisement, you can say goodbye to the vast majority of content on the internet. The better approach is to make advertising a better experience and to create incentives for advertisers to spend ad dollars on quality content.

rglullis · on June 26, 2019

There will always be bottom-feeders as long as there is a market where people are not forced to choose with their wallets. Killing the "vast majority of content on the internet" seems like a good thing to me, honestly.

Fjolsvith · on June 26, 2019

> Killing the "vast majority of content on the internet" seems like a good thing to me, honestly.

I sure hope my content of preference beats out yours for not getting killed.

rglullis · on June 26, 2019

I am reasonably sure that even if our preferences are completely opposite and we eliminate 99% of content in general, you would still have enough quality content for what your interests are. But just to be extra sure, please vote with your wallet and actively support the things you like and don't let advertisers do the choosing for you.

_nalply · on June 26, 2019

Advertisement just should not be the central means of income of content producers. I really hope this point of view gets killed together with advertisement.

pif · on June 26, 2019

> Advertisement just should not be the central means of income of content producers.

Can you propose any viable alternative?

anchpop · on June 26, 2019

Ads are placed via an automatic auction upon pageview. GM and Ford both want to show me an ad when I google "what car to buy", and have automatic systems that decide how much they'd be willing to pay to show me that ad based on my likelihood of purchase (income, sex, location, etc). Why not have a system that follows me around and outbids them using funds from my bank account, to show me an ad which is just a transparent image? That way I don't have to see ads but content creators still get what they need?
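The outbidding idea can be sketched as a toy auction. This assumes a simple first-price auction with bids in whole cents, which is a simplification; real ad exchanges use more involved mechanisms. All names and the budget rules are made up for illustration.

```python
def run_auction(advertiser_bids, user_max_bid, user_budget):
    """Decide who wins one ad slot.

    advertiser_bids: dict mapping advertiser -> bid in cents
    user_max_bid:    most the user will pay for one impression, in cents
    user_budget:     what is left of the user's spending limit, in cents
    Returns (winner, price_in_cents). If the user wins, the "ad" shown
    would be a transparent image and the site is paid by the user.
    """
    top_name, top_bid = max(advertiser_bids.items(), key=lambda kv: kv[1])
    user_bid = top_bid + 1  # outbid the best advertiser by one cent
    if user_bid <= user_max_bid and user_bid <= user_budget:
        return "user", user_bid
    return top_name, top_bid
```

With bids of 40 and 55 cents and a user willing to pay up to a dollar, the user wins at 56 cents; if one advertiser bids 1000 cents, the user is priced out and the ad is shown.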

thekyle · on June 26, 2019

What you are describing is exactly what Google Contributor is trying to do. We'll have to see how it turns out.

https://contributor.google.com/v/beta

anchpop · on June 26, 2019

It says it only works with "participating sites". I wonder why

gerash · on June 26, 2019

The first version worked exactly as you proposed. The UX however was meh. You'd place a monthly limit on your ad (outbidding) spend (eg. $2) and it ended up outbidding only some of the ads: those served by Google which were also outbid by your amount.

So from a user's perspective it didn't fully work. Also the ad space wasn't fully removed (perhaps due to technical reasons) but was replaced with a blank image. It also didn't catch on much.

So they tried to pivot and now the program works with certain cooperating websites to fully get rid of all ads, but I'm sure bigger websites would rather be in total control of monetizing themselves and can spend on the necessary IT infra, similar to most online newspapers these days.

I think an advertiser (eg. a legal firm) might be willing to pay eg. $10 per ad impression but no user is willing to outbid it so I think the first model (outbid in the auction) is more sustainable and profitable for both parties but needs to have all ad exchanges on board.

So in short, it's been tried but wasn't an instant (or even a slow) success and idk whether Google will continue investing in it or not.

pif · on June 26, 2019

Are you actually proposing for people to gasp! pay gasp! for content?

anchpop · on June 26, 2019

Google makes around 30 billion/quarter on ads. Assuming most of that comes from 200 million users (they have more than that but I assume a lot are not worth very much to advertisers), and their ad revenue comes from a 50% cut of the total ad payments, that comes out to around $300/quarter or $100 a month. I'd pay it, but I think most wouldn't.
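Here is the same back-of-the-envelope estimate as a small Python sketch, so the steps can be checked. The inputs are the comment's own assumptions ($30 billion per quarter, 200 million users, a 50% cut), not verified figures, and the function name is invented for illustration. Worked through, $300 per quarter is $100 per month.

```python
def per_user_cost(google_revenue_per_quarter, users, google_cut):
    """Estimate what each user would have to pay to replace ad spend.

    Returns (dollars per quarter, dollars per month).
    """
    # If Google keeps `google_cut` of ad payments, total payments are larger.
    total_ad_payments = google_revenue_per_quarter / google_cut
    per_quarter = total_ad_payments / users
    per_month = per_quarter / 3  # three months in a quarter
    return per_quarter, per_month


quarterly, monthly = per_user_cost(30e9, 200e6, 0.5)
print(f"${quarterly:.0f} per quarter, ${monthly:.0f} per month")
```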

asdff · on June 26, 2019

Certain % of your internet bill goes to helping pay to host the sites you are visiting every billing period. If a site is large enough hosting would be sustained by the visiting userbase rather than the site owner. If a site is too small for that, chances are hosting has been cheap anyway.

bluGill · on June 26, 2019

Subscription. It is only viable for content that well off people use a lot of though, even then only when you are much better than the free competition.

arpa · on June 26, 2019

whatever wikimedia organisation does :)

asark · on June 26, 2019

1) Not to most of the best content, 2) other business models may have an actual chance when not competing with "free", 3) actually-free, community-driven sites and services (and standards and protocols—those used to be nice) will have a larger audience and larger creator interest when not competing with "free" (and well-bankrolled).

fifnir · on June 26, 2019

The vast majority of content is absolute shit though, so speaking strictly for me, I'm willing to try

amelius · on June 26, 2019

The question was about search engines, not about content.

But I think the combination of advertising+search engines is particularly bad, so paying for search would be a great first step.

arpa · on June 26, 2019

maybe it's worth saying goodbye to "8 reasons why current internet sucks that drive spammy copywriters mad". The whole more-clicks-more-revenue based approach did not do good things to the online content.

marknadal · on June 26, 2019

I wrote up a proposal on this, changing the economics to adapt to and account for post-scarce resources like information:

https://hackernoon.com/wealth-a-new-era-of-economics-ce8acd7...

wolco · on June 26, 2019

To kill advertising would mean the web would live behind many walled gardens where each site requires membership.

For the remaining free sites you will see advertising in different forms (self-promotion blogs, the upsell, t-shirt stores on every site, spam-bait).

Advertising saved the internet.

Now tracking.. for advertising or other purposes is the real problem.

BjoernKW · on June 26, 2019

Other than a completely new approach for producing value such as the 'Freeism' one described in the article suggested in this comment https://news.ycombinator.com/item?id=20282851 (which I haven't had time to read yet, and hence I'm neither in favour of nor against), this simply boils down to the questions of who will pay for relevant content and what the business model will be.

By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.

Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.

TeMPOraL · on June 26, 2019

> By and large, people don't seem to be willing to pay for content on the web. Hence, advertising became the dominant business model for content on the web.

I don't think the causality is right here. People might not be willing to pay for content on the web because advertising enables competitors to offer content for free. If you removed that option, if people had no choice but to pay, it might just turn out that people would pay.

BjoernKW · on June 26, 2019

How would you achieve that? By outrightly outlawing advertising?

There absolutely are paid options on the web. It's just that they don't seem to appeal to a sufficient number of buyers so advertising could become irrelevant.

TeMPOraL · on June 26, 2019

> How would you achieve that? By outrightly outlawing advertising?

Yes.

> There absolutely are paid options on the web. It's just that they don't seem to appeal to a sufficient number of buyers so advertising could become irrelevant.

They aren't appealing in the presence of ad-subsidized free alternatives. Remove the latter, and they just might become appealing again.

notahacker · on June 26, 2019

Few things sound less likely to improve the internet than some entity having the power to content-police the web and remove anything it accuses of the thoughtcrime of advertising...

politician · on June 26, 2019

You can block third-party advertising structurally, so that a content-cop isn't required. First-party advertising cannot be blocked, of course, since that's just content.

For example, using browsers that impose a Content Security Policy that prevents anything from being loaded from domains other than the origin.
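To make the Content Security Policy idea concrete, here is a minimal sketch of a server sending a same-origin-only policy, using Python's standard `http.server`. The comment above imagines the browser imposing the policy; this sketch shows the equivalent policy sent as a response header, which is how CSP is normally delivered. The handler name and page body are made up, and `default-src 'self'` is one strict choice among many (it also blocks inline scripts and styles).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Only allow resources (scripts, images, frames, ...) from the page's own origin.
SAME_ORIGIN_ONLY = "default-src 'self'"


class SameOriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body><p>First-party content only.</p></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # A browser that honours CSP will refuse third-party loads on this page.
        self.send_header("Content-Security-Policy", SAME_ORIGIN_ONLY)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), SameOriginHandler).serve_forever()
```

Calling `serve()` and opening the page in a browser would show third-party embeds being blocked, while anything hosted on the same origin, including first-party ads, still loads.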

notahacker · on June 26, 2019

Sure, but if the only ad restriction was mandatory blocking of third party content, you'd just see ad agencies work out ways they can get the content they want to serve hosted locally (and lots of more interesting third party embedded content cease to exist due to it not having the same commercial rationale for workarounds...). If you start forcing companies not to promote third party products with anything that even looks like an ad, you'll just see a greater proportion of the free-to-access internet turn into paid-for reviews and influencer marketing. Not sure that'd be an improvement, and I'm pretty sure the next logical step of getting the content cops ruling which content looks too commercially-oriented for us proles to look at is even worse.

You can block third party advertising structurally using uBlock without ruining the internet for everyone else.

TeMPOraL · on June 27, 2019

Advertising isn't a thoughtcrime, it's a cognitive/psychological assault.

I think a combination of consumer protection laws, truth in advertising laws and data protection laws, all turned up to 11 (even GDPR), could achieve most of the desired outcome on the Internet without much problematic "content-policing". But I'm not sure. You won't eliminate advertising from the Internet entirely, but making it illegal would make undesirable advertising more expensive, by creating a vast amount of risk for advertisers and simultaneously destroying the adtech industry, thus rendering most of the abusive practices that much less efficient.

(Also, to be clear, I want all advertising gone. Not just on-line, the meatspace one too.)

Fjolsvith · on June 26, 2019

Huh. That sounds like a free market model.

Isn't this what different newspapers like NYT and WSJ are moving towards? Why can't both models coexist?

TeMPOraL · on June 26, 2019

Because one totally destroys the other.

Slave labour, selling poison or dumping waste into rivers are all superior business models too, but that doesn't mean they should exist in a civilized society.

Nasrudith · on June 26, 2019

The train also destroyed the horse-drawn wagon train for bulk land transport.

Just because it totally destroys another business model doesn't mean it is wrong. "Felony interference with a business model" protectionism isn't good for societies. Historically this stagnant "stability" gets them lapped and forced into the modern world if lucky, or conquered if not, no matter how vigorously they insist that it is the only and right way.

TeMPOraL · on June 27, 2019

Of course. I'm not saying displacing business models is bad per se. I'm saying that just because one business model can displace a different one, doesn't immediately mean it's good. Plenty of business models are morally bankrupt, and I believe "free but subsidized by advertising" is such, by virtue of advertising itself[0] being morally bankrupt.

--

[0] - as seen today; not the imaginary "informing customers about what's on the market" form, but the real "everyone stuck in a shouting contest of trying to better manipulate customers" form.

Fjolsvith · on June 26, 2019

> Find another way for someone to pay for relevant content and you can do away with advertising. It's as simple as that.

Not so simple. What is relevant for me may be irrelevant for you.

BjoernKW · on June 26, 2019

You pay for content that's relevant to you. I pay for what's relevant to me.

Fjolsvith · on June 26, 2019

Oh, okay. I was assuming we had someone like the government pay for content.

jppope · on June 27, 2019

Promotion is a need, and a very important need for ideas to spread. We all know that the concept of "if you build it they will come" doesn't work. Google's adaptation for this was to make advertising relevant... which is actually a considerable improvement over historical media models...

There's a saying in sales: "people hate to be sold, but they love to buy"... which is akin to what you are saying here. Advertising isn't the problem... the problem is that the reasons why people are promoting aren't novel enough... (rent seeking... which creates noise)

bduerst · on June 26, 2019

The only way to kill advertising is to have perfectly efficient markets.

Until then, you're going to have demand for ferrying information between sellers and buyers, and vice versa, because of information asymmetry. You may disagree with some of the mediums currently used, finding them annoying, but advertising is always evolving to solve this problem, as is evident in the last three decades.

quelsolaar · on June 26, 2019

Yes, we need search engines, but they don't need to be monolithic. Imagine that indexing the text of your average web page takes up 10k. Then you get 100,000 pages per gig. It means that if you spend ~270 USD on a consumer 10 TB drive you can index a billion web pages. Google no longer says how many pages they index, but it's estimated to be within one order of magnitude of that.
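The storage estimate above, written out as a quick Python check. The 10 kB per page figure is the comment's own assumption, and decimal units (1 GB = 10^9 bytes) are assumed; the function name is illustrative.

```python
GB = 10**9   # decimal gigabyte, as drive vendors count
TB = 10**12  # decimal terabyte


def pages_that_fit(storage_bytes, index_bytes_per_page=10_000):
    """How many pages can be indexed in the given storage."""
    return storage_bytes // index_bytes_per_page


print(pages_that_fit(1 * GB))   # pages per gigabyte
print(pages_that_fit(10 * TB))  # pages on a 10 TB drive
```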

This means that in terms of hardware, you can build your own Google; then you get to decide how it rates things, you don't have to worry about ads, and SEO becomes much harder because there is no longer one target to SEO. Google obviously doesn't want you to do this (and in fairness Google indexes a lot of stuff that isn't keywords from web pages), but it would be very possible to build an open source configurable search engine that anyone could install, run, and get good results out of.

(Example: The Pirate Bay database, which arguably indexes the vast majority of available music / TV / film / software, was / is small enough to be downloaded and cloned by users.)

rhmw2b · on June 26, 2019

Google's paper on Percolator from 2010 says there are more than 1T web pages. 9 years later there are surely way more than that.

https://ai.google/research/pubs/pub36726

The real issue would be crawling and indexing all those pages. How long would it take for an average user's computer with a 10Mb internet connection to crawl the entire web? It's not as easy a problem as you make it seem.

quelsolaar · on June 26, 2019

I'm not saying it's easy, it's not, but people tend to think that because Google is so huge, you have to be that huge to do what Google does. My argument is that in terms of hardware, Google needs expensive hardware because they have so many users, not because what they do requires that hardware to deliver the service for one or a few users.

I have a gigabit link to my apartment (go Swedish infrastructure!). At that theoretical speed I get 450 gigs an hour, so I could download ten tera in a day. We can easily slow that down by an order of magnitude and it's still a very viable thing to do. If someone wrote the software to do this, one could imagine some kind of federated solution for downloading the data, so that every user doesn't have to hit every web server.
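The bandwidth figures can be checked the same way. This is a sketch under ideal assumptions: the link runs at its full theoretical rate around the clock, with no protocol overhead or politeness delays toward web servers. Function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def download_capacity(bits_per_second):
    """Return (bytes per hour, bytes per day) at full line rate."""
    bytes_per_second = bits_per_second // 8
    return (bytes_per_second * SECONDS_PER_HOUR,
            bytes_per_second * SECONDS_PER_DAY)


def days_to_download(total_bytes, bits_per_second):
    """Days needed to pull `total_bytes` over the given link."""
    bytes_per_second = bits_per_second / 8
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


per_hour, per_day = download_capacity(10**9)   # gigabit link
print(per_hour / 10**9, "GB per hour")         # 450.0
print(per_day / 10**12, "TB per day")          # 10.8
print(days_to_download(10 * 10**12, 10**7))    # 10 TB over a 10 Mbit/s link
```

At a gigabit, ten terabytes fits in about a day; over the 10 Mbit/s link mentioned earlier in the thread, the same ten terabytes would take roughly three months.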

z3t4 · on June 26, 2019

Could be done with a p2p "swarm". Peers get assigned pages to index, then share the result.
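One way the assignment step might work, as a sketch: hash each URL and use the hash to pick a peer, so every peer can compute the same assignment without a coordinator. The function names are hypothetical, and simple modulo hashing is an assumption chosen for brevity; it reshuffles most pages whenever a peer joins or leaves, which a real swarm would probably avoid with something like consistent hashing.

```python
import hashlib


def assign_peer(url, peers):
    """Pick the peer responsible for indexing `url`.

    Deterministic: any node with the same peer list gets the same answer.
    """
    ordered = sorted(peers)  # make the result independent of list order
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def partition(urls, peers):
    """Split a crawl list into one work list per peer."""
    work = {peer: [] for peer in peers}
    for url in urls:
        work[assign_peer(url, peers)].append(url)
    return work
```

Each peer would then crawl and index only its own list and publish the resulting index shard for the others to fetch.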

tudelo · on June 26, 2019

How would you begin indexing everything?