Google Books and the open web (sappingattention.blogspot.com)
78 points by benbreen 8 months ago | 21 comments

Search can no longer be relied upon to return standard results. Using the book index as an analogy: when you pick up a volume of the Encyclopedia Britannica, you know that the index is the same as it was when the book was printed. It's not been secretly updated based on how many people looked up each section in that volume globally. If you are the first ever person to look up Early Chinese Pottery and the entry isn't indexed, you know that it's not in that volume, and that it's not because the index decided that it wasn't a popular enough topic and removed it. I know Google don't actually remove non-popular items from their indexes, they just bury them behind results pages of more popular 'relevant' results. To the lay person, the result they are looking for may as well be removed.

I appreciate that Google and Bing spend millions of dollars trying to purge their indexes of spam, but they also seem to spend millions of dollars creating adaptive results based on an unknown list of parameters and variables that are opaque even to the search organisations themselves.

It's time a new search player stepped up with a proper mature indexing concept that allows for deep hierarchical filter searching based on a semi-static system that only removes results from search when they actually go offline, and doesn't prioritise social media posts above everything else.

We need to bring the internet back to the primary function of an information resource first and foremost. The clickbait-ad-selling junkie that it has become is creating a closed circle of idiotic users consuming and creating yet more idiotic content, like a fish eating its own tail. What's more, Google and Bing are the primary enablers of this whole problem.

I don't know how to fix it, but somebody has to do something before the internet becomes nothing more than an ad-ridden gogglebox in the same way television has already become.

If there were a better way to find the results that users are actually looking for, then Bing and Google would be working their asses off to find it.

That you presume they have any other goal in mind is frankly bizarre to me.

Through experimentation they have arrived at these algorithms to maximize the probability that you find what YOU were looking for. Based on who you are, what you've searched in the past, what context they think you're in, what's happening recently, what's happening nearby, etc.

If you want the search engines to ignore the context of the individual, then most people will get what they are looking for far, far less often than they currently do.

That's what Google and Bing are experts at optimizing.

There are search engines that don't try to optimize based on your personalized context. You and everyone else are free to use them and you don't because they're not as good at what you want. Statistically speaking. I admit there are edge cases.

> Through experimentation they have arrived at these algorithms to maximize the probability that you find what YOU were looking for.

Depending on the query, 60-80% of Google Image results point to Pinterest, which no one is looking for and is not the original source of anything that shows up.

For regular searches, in the past few years there has been a continuous and significant decline in both the accuracy of results and the ability to communicate to Google what you're looking for in the first place.

Quoted phrases are frequently reinterpreted, destroying the intent.

On a query with 3 words, Google will frequently ignore one of them entirely, based on an opaque and constantly changing set of rules, and will return results for the other 2 words that have no relevance whatsoever.

Perhaps most frustrating is the constant, irritating battle to trick Google's many different query "filters", not all of which are NSFW "safe search" filters, to avoid the very strange situations where a query with a small number of common English words somehow returns no results whatsoever, despite Google's clear willingness to throw away even the most important words in a query in order to return something.

I think both your and Jaruzel's points are good. But...

> If there were a better way to find the results that users are actually looking for, then Bing and Google would be working their asses off to find it.

I don't think we should assume this. I'd only be willing to assume that Google/Microsoft would change how they do search if it could make them more money. That doesn't necessarily mean 'better' search. If they could convince users to accept it, I could imagine an extremely ad-bloated search service, or a search service that just sells the top ranks for terms to the highest bidder.

For the most part, Bing and Google make more money if users find results that they're actually looking for.

Sure, that's not 100% true, but I think people overstate the degree that that's not true.

Google and Bing can't convince users to accept something like that. In fact, people abandoned search services like that in favor of Google and Bing, because they don't do that.

> That you presume they have any other goal in mind is frankly bizarre to me.

Google is an advertising company. They are not a search company. There is absolutely no reason to think that their goal is 'good' results. Their goal is 'engaging' results.

There is a very, very good reason to think their goal is good results:

If users believed they got bad results at Google, then Google could no longer show the advertising.

My mind jumped to this one-sentence summary: Search has come to reflect the wishes of the searched, rather than the searcher.

"Wishes of the searched" has several dimensions, and now goes far beyond any setting in robots.txt.

What we basically need is universities taking the lead in developing a variety of search algorithms for different purposes, and publishing them.

I may be mistaken, but I think universities are publishing more than enough papers on the subject. The actual problem is that implementing and operating this kind of thing is costly.

Jaron Lanier has good things to say in this space. I for one would happily pay a small amount per month for an adequate search platform, or a fraction of a penny per search, say. I'm sick of my interactions with the web (and other people) being shaped by stupid algorithms. There never was a free lunch: now let me pay for it!

>To the lay person, the result they are looking for may as well be removed.

Imagine if you were looking for books about Lobsters, and you start finding books about Cows. All the books before were about Lobsters, so you assume you've ventured into the Cow section, and turn back.

If you are on page 12 and the relevancy of the results around you is low, you are invisible. Everyone has turned back by now.

The essential issue with comparing Google's algo to any other ranking system is that the data they have on those pages is bad, so the results are bad, so it's not even as good as a human manually sorting things. Obviously the ability to serve every query imaginable instead of blunt categories is hugely different, but I digress.

Google Search is like when you go into a store and an employee asks, "do you need help finding something?". I tell them no, but they awkwardly follow me around the store anyways.

It's more like going into a store, approaching an employee, telling them what you're looking for, and then complaining about them doing their best to help you achieve your aim.

To stretch the pained metaphor further and disagree, it's like when the employee does not tell you about the low-cost item, but only the vaguely related but high-commission item.

Their best for you, or for advertisers?

Google has been doing this for ages, not only with Books results but with web search results. If the query is associated with some Entity, it will return various (hopefully useful) results about that entity, even if those results do not contain the query text at all.

This is how, for example, the query "that movie where there are two magicians" returns varied results about The Prestige, whereas "that movie where there are four magicians" returns various results about Now You See Me.

A pure text search would likely return very similar results, but the entity mapping heavily impacts the search results.
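
A minimal sketch of that difference, assuming a toy three-page corpus and a hand-written entity table. `PAGES`, `ENTITY_CUES`, `text_search` and `entity_search` are all hypothetical names made up for illustration, and nothing here claims to reflect how Google actually maps queries to entities.

```python
import re

# Hypothetical toy corpus: page title -> page text (illustrative only).
PAGES = {
    "The Prestige (2006)": "Two rival stage magicians in London try to outdo each other.",
    "Now You See Me (2013)": "Four illusionists pull off heists during their shows.",
    "Movie trivia forum": "What was that movie where there are some magicians?",
}

# Hypothetical hand-written entity table: cue words -> entity page.
ENTITY_CUES = {
    frozenset({"movie", "two", "magicians"}): "The Prestige (2006)",
    frozenset({"movie", "four", "magicians"}): "Now You See Me (2013)",
}


def tokens(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def text_search(query):
    """Pure text search: rank pages by how many query words they contain."""
    q = tokens(query)
    scored = [(len(q & tokens(title + " " + body)), title)
              for title, body in PAGES.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > 0]


def entity_search(query):
    """Put a matching entity page first, then fall back to text ranking."""
    q = tokens(query)
    for cues, entity in ENTITY_CUES.items():
        if cues <= q:  # every cue word appears in the query
            rest = [t for t in text_search(query) if t != entity]
            return [entity] + rest
    return text_search(query)


two = "that movie where there are two magicians"
four = "that movie where there are four magicians"
print(text_search(two)[0], "|", text_search(four)[0])
print(entity_search(two)[0], "|", entity_search(four)[0])
```

In this toy setup the pure text ranking puts the same forum page on top for both queries, because that page shares the most words with either query, while the entity table sends each query to a different film.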

I felt like I arrived in the future when I spoke into my phone, "George Clooney movie with the penis chair," and "Burn After Reading (2008)" was the top result.

If you quote a part of your search query, then Google only returns pages containing that text.

Used to. My recent experience is that this has either been disabled, or is applied unevenly.

Relegated to Tools → All results → Verbatim.
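
For reference, a rough sketch of what strict quoted-phrase matching means, assuming a tiny in-memory list of documents. `parse_query`, `search` and `DOCS` are hypothetical names for illustration only; this is not a description of what Google's Verbatim mode does internally.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and the remaining loose words."""
    phrases = [" ".join(p.lower().split())
               for p in re.findall(r'"([^"]+)"', query)]
    unquoted = re.sub(r'"[^"]*"', " ", query).lower()
    loose = re.findall(r"[a-z0-9]+", unquoted)
    return phrases, loose


def search(query, docs):
    """Keep only docs containing every quoted phrase verbatim.

    Loose (unquoted) words never exclude a document; they only
    influence the ordering of the documents that survive the filter.
    """
    phrases, loose = parse_query(query)
    results = []
    for doc in docs:
        text = " ".join(doc.lower().split())  # normalise whitespace
        if not all(re.search(r"\b" + re.escape(p) + r"\b", text)
                   for p in phrases):
            continue
        words = set(re.findall(r"[a-z0-9]+", text))
        score = sum(1 for w in loose if w in words)
        results.append((score, doc))
    results.sort(key=lambda pair: -pair[0])  # stable: ties keep input order
    return [doc for _, doc in results]


# Hypothetical sample documents.
DOCS = [
    "A survey of early Chinese pottery and its glazes.",
    "Chinese pottery from the early modern period.",
    "Pottery classes for beginners.",
]

print(search('"early chinese pottery"', DOCS))
print(search("early chinese pottery", DOCS))
```

With the quotes, only the first document survives, since it is the only one containing the three words in that order. Without them, all three come back and the words merely affect ranking.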
