Google Memory Loss (tbray.org)
1377 points by AndrewDucker on Jan 15, 2018 | 539 comments

I've noticed this many times too, particularly recently, and I call it "Google Alzheimer's" --- what was once a very powerful search engine that could give you thousands (yes, I've tried exhausting its result pages many times, and used to have much success finding the perfect site many dozens of pages deep in the results) of pages containing nothing but the exact words and phrase you search for has seemingly degraded into an approximation of a search engine that has knowledge of only very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!), and in general is becoming increasingly useless for finding the sort of detailed, specific information that search engines were once ideal for.

To add insult to injury, if you do try to make complex and slightly varying queries and exhaust its result pages in an effort to find something you know exists, very often it will think you're a robot and present you with a CAPTCHA, or just ban you completely (solving the CAPTCHA just gives you another, and no matter how many you solve it keeps refusing to search; but they probably benefit from all the AI help you just gave them, what bastards...) for a few hours.

Google had the biggest, most comprehensive index for many years, which is why it was my sole search engine. Now I'm often finding better results with Bing, DuckDuckGo, Yahoo, and even Yandex, but part of me is very worried that large and extremely valuable parts of the Web are, despite still being accessible, simply "falling off the radar".

> has seemingly degraded into an approximation of a search engine that has knowledge of only very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference...)

I think the biggest irony is that the web allows for more adoption of long-tail movements than ever before, and Google has gotten significantly worse at turning these up. I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds.

This is a nightmare if you have any hobbies that share a common phrase with a vastly more popular hobby, and is especially common when it comes to tech-related activities. I use Linux at home, and I program VBA at work. At home Linux is crossed out of most of the first few pages, and I just get a ton of results about Windows, and at work VBA is crossed off and I get results about VB6 and .NET.

Completely. Useless.

I can only imagine this has something to do with their increasing reliance on AI, and the fact that the AI is probably incentivized to give a correct response to as many people 'above the fold' as is possible. If 95% of people are served by dropping the specifically-chosen search term, then the AI probably thinks it's doing a great job.

It seems like the web is being optimized for casual users, and using the internet is no longer a skill you can improve to create a path towards a more meaningful web experience.

This same AI effect can be seen in the Android keyboard, where _properly_ spelled words will be replaced after typing another word or two because it's been determined to be more likely what you want. It's infuriating.

It actually does this with such consistency that I think it's a very specific mistake. For example, I frequently swipe "See you soon." It always, always renders as "See you son", which I then have to manually retype. Sometimes twice. I don't have a son, and I'm not a blind old cowboy who jocularly refers to any random person as "son". I honestly just want to type "soon", for the love of... anyhow, this is an ongoing, totally inane battle of wills with my phone.

I think what's happening here is that there's a very impressive and sophisticated heuristic for predicting the probability of what you want to type by looking at the frequency of what you have typed in similar contexts. It uses its state-of-the-art AI to evaluate the context and build an array of candidate words, along with their respective probabilities. I suspect it is very accurate as it does this. Then it sorts the array by probability and pops the top element into the predictive text input.

Alas -- per my pet theory anyways -- it sorts like this:

  candidateWords.sort((a,b) => a.probability - b.probability);

Rather than:

  candidateWords.sort((a,b) => b.probability - a.probability);

...which is how a two-character diff can turn a brilliantly helpful AI into an ultimately annoying damnit-I-need-to-smash-something antagonist.
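
For anyone who wants to see the pet theory in action, here is a minimal sketch. The candidate words and probabilities are invented for illustration, and taking the first element after sorting is an assumption about how the "top" suggestion would be picked. Nothing here reflects how any real keyboard is implemented.

```javascript
// Hypothetical candidate list; the words and probabilities are made up.
const candidateWords = [
  { word: "son", probability: 0.02 },
  { word: "soon", probability: 0.95 },
  { word: "some", probability: 0.03 },
];

// Sort a copy (Array.prototype.sort sorts in place) and take the first entry.
function topCandidate(candidates, compare) {
  return [...candidates].sort(compare)[0].word;
}

// Ascending comparator: the least likely word ends up first.
const ascending = (a, b) => a.probability - b.probability;
// Descending comparator: the most likely word ends up first.
const descending = (a, b) => b.probability - a.probability;

const buggyPick = topCandidate(candidateWords, ascending);
const intendedPick = topCandidate(candidateWords, descending);
console.log(buggyPick, intendedPick);
```

With this made-up data the ascending comparator surfaces the least likely word and the descending one surfaces the most likely, which is the whole joke: swapping `a` and `b` flips the ranking.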

When a system tries to do something automagically and makes a mistake, it is very frustrating, especially because, to allow seamless large changes, hide competitive details, or make the UI more "streamlined", such systems rarely give users options to tune the results. A system that gives controls to the user and expects them to tweak their own experience is so much better in my opinion, except in the metrics of first-time usage (or first-time-since-major-change), when those controls look like information overload and make the system seem like something that must be learned before it can be used.

And yet, when the latter inevitably breaks on an edge case, users can try to fix it themselves. They don't hit a wall of frustrated "I can't do anything", they hit a challenge that they are empowered to try overcoming. They already know what they want and can set things that way, rather than trying vaguely to teach a system (machine learning, hardcoded heuristics engine, department of humans making seemingly unconnected changes to a GUI with each passing version and no obvious plan) to understand their desires.

I miss the days when users were seen as intelligent professionals who are willing to change settings, create and re-dock an assortment of toolbars to every edge of every screen/window to suit their daily tasks, read a manual (or at least search the integrated help entries) to overcome problems. Rather than "busy" phone users who just want to complete a task with minimal time spent learning and get back to posting on facebook or whatever, and who accept the automagical solution because adequate results instantly are somehow considered better than great results with some work.

Ugh, that whole block of text just kept growing; I had better leave and go ramble/rant at trees or clouds or something elsewhere.

There was a time when autocorrect on phones was moderately useful. That time has passed. One of the first things I do with a new phone now is turn off autocorrect completely. It doesn't bother me to manually correct my own mistakes, but it bugs me a lot to have to correct the mistakes of the freaking robot that's supposed to be saving me time.

> I frequently swipe "See you soon." It always, always renders as "See you son"

This works fine for me with GBoard. Are you drawing a little circle on the o to indicate you want the double letter?

Wow, I... didn't know that gesture was a thing. Thank you! It seems to help a bit! I'm now getting a 50/50 son/soon ratio. That said: when I manually type "soon" -- definitely with a double "o" -- it still autocorrects to "son" on the first try. So, going to keep my pet theory intact.

The weird thing is that if I type nothing at all, the contextual predicted "next word" on GBoard is actually very good -- I wasn't praising it for comedic effect. But it really does seem like there's a sign error in a sort function which kicks in after you start typing.

If this comment is what teaches me I've been expected to do that, I'm going to throw my toys out of the pram.

I've specifically googled for instructions on how I might be expected to use the swipe-style keyboards, and turned up nothing.

As far as I know it's in the tutorial they insist you do upon first enabling swiping

edit: Don't actually see a tutorial in the app. Maybe I'm confused with another app such as Swype, but the same technique seems to apply to all.

I'm pretty sure Swype did explain it, and I know SwiftKey has this gesture as well (though I don't remember if it was ever explained to the user or just assumed they'd remember it from Swype).

The default keyboard on my Android didn't have any such tutorial. I did look in the app and online.

Swype taught me, it has helpful tips and I still use their keyboard to this day

Swipe-style keyboard? How does that affect spelling? So confused. I get how it changes words to "predict", anticipate, or whatever the AI engineers say, but what I don't understand is changing a real word to a non-word. That's not intelligence. That's something else, and I don't know what they get from changing a word to a misspelled and nonexistent word other than eventually driving the human race insane. I'm going to be a Luddite.

Er, "son" and "soon" are both real dictionary words. Swipe-style keyboards use an internal dictionary to find the most likely match to the swiped pattern, and have methods for adding new words (mine for example automatically adds any tapped-out words after you hit space).

Very likely, every time you type "soon" and it corrects to "son", the probability of the change gets increased. So, the more it errs, the more wrong it will be in the future.

About it only replacing after you start writing, the probability of "son" must grow faster than the one of "soon" after you type "so". If there were such a huge bug, you would be seeing a new word every time, not always the same one.
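
The feedback loop suggested above can be sketched as a toy model. Everything in it is an assumption made for illustration: the starting counts, the rule that the keyboard outputs whichever candidate has the higher count, and the flaw that it then learns from its own output rather than from what the user meant. `simulate` is a hypothetical helper, not any keyboard's real API.

```javascript
// Toy model: the keyboard always outputs the candidate with the highest
// count, then (assumed flaw) counts its own output as if the user typed it.
function simulate(initialCounts, rounds) {
  const counts = { ...initialCounts }; // copy so the input is left untouched
  for (let i = 0; i < rounds; i++) {
    const output = Object.keys(counts).reduce((best, word) =>
      counts[word] > counts[best] ? word : best
    );
    counts[output] += 1; // the correction reinforces itself
  }
  return counts;
}

// Hypothetical starting point: "son" is only slightly ahead of "soon".
const afterTenRounds = simulate({ son: 3, soon: 2 }, 10);
console.log(afterTenRounds);
```

Under these assumptions a small initial lead only grows, because the word the user actually wanted never gets counted.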

Try switching between languages. I work in English, my wife is Italian, my colleagues are French-speaking, I am Spanish and have Catalan friends, and I live in Germany. I gave up on predictive keyboards long ago.

You don't even need foreign languages to run into this problem. English is the only language I speak fluently, but I'm Australian. I'm also a programmer, and I have some American friends. So, depending on context sometimes I spell 'color' (programming or talking to Americans), and sometimes I spell 'colour' (talking to Australian friends and family). Same with behaviour / behavior, favour / favor, etc. The context for which spelling I decide to use is complicated. In the same document I might name the `getColor` function, but describe it as getting the colour of a pixel. I might have two chat windows open side-by-side with different people and in each window spell the same words differently.

All my devices insist on shaming me (or autocorrecting me) for one of those spellings. At this point it feels like a complete gamble which I'll get corrected on. I'm just slowly getting used to correcting the autocorrect. :(

When you said this it reminded me that CSS allows colors to be either ‘gray’ or ‘grey’, which I’m glad about because then I don’t have to fumble until I pick the right one. Though, I’ve learned to type hex colors (especially greyscale colors) intuitively now, so I usually use those a lot more than typed grays.

SwiftKey seems to manage three languages (Finnish, Swedish, English) simultaneously pretty well. No need to explicitly switch language either, it figures out the current language as you type.

Yes, SwiftKey works almost perfectly for me too in a three-language scenario (Bosnian, English, Dutch).

The only thing it is confused about is the letter "i", which means "and" in Bosnian, and SwiftKey often capitalizes it where it shouldn't. Probably happens because I often mix Bosnian and English in the same message (instead of translating technical terms).

I have 3 languages in my GBoard. I've just arrived from a Spanish-speaking country, and now when I try to write in Portuguese it still completes with Spanish words. Sure, they are very similar languages.

Ditto on the SwiftKey recommendation: I have German, English and French always at my disposal.

I'm in the same boat: I'm Italian and speak Spanish a lot with friends, and English is my daily language. The latest Google keyboard has helped but it's not nearly as good as T9 was.

It's all the keyboards. Swype used to work great, but now when it actually gets words right, if I have a sentence with a second word that could possibly have been two similar words on the path, it will just straight up replace both, rendering the sentence completely incomprehensible. Who are these people that don't correct their sentences until the second incorrect word?

I second that. 5 years ago I could swipe a whole message blindly with no errors. Now I have to correct every second word.

I'd love a feature to disable all that deep learning and AI and just use the algorithm they originally had (proximity of where you typed to words in the dictionary). That worked so much better.
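
For reference, the "proximity to dictionary words" idea is simple enough to sketch. This is not the algorithm any keyboard actually shipped. The key coordinates, the same-length restriction, and the helper names (`keyDistance`, `wordDistance`, `closestWord`) are all assumptions made for illustration.

```javascript
// Approximate QWERTY layout; the row offsets are a rough assumption.
const ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"];
const ROW_OFFSETS = [0, 0.5, 1.5];
const keyPosition = new Map();
ROWS.forEach((row, y) => {
  [...row].forEach((ch, x) => keyPosition.set(ch, { x: x + ROW_OFFSETS[y], y }));
});

// Euclidean distance between two keys; unknown characters are infinitely far.
function keyDistance(a, b) {
  const p = keyPosition.get(a);
  const q = keyPosition.get(b);
  if (!p || !q) return Infinity;
  return Math.hypot(p.x - q.x, p.y - q.y);
}

// Sum of per-letter key distances. Only same-length words are compared here.
function wordDistance(typed, word) {
  if (typed.length !== word.length) return Infinity;
  let total = 0;
  for (let i = 0; i < typed.length; i++) {
    total += keyDistance(typed[i], word[i]);
  }
  return total;
}

// Return the dictionary word whose keys are closest to what was typed.
function closestWord(typed, dictionary) {
  let best = null;
  let bestScore = Infinity;
  for (const word of dictionary) {
    const score = wordDistance(typed, word);
    if (score < bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical mini dictionary.
const dictionary = ["for", "fir", "far", "soon", "son"];
console.log(closestWord("fpr", dictionary));
```

The appeal of something this dumb is that it is predictable: an exact match always has distance zero, so a correctly typed word can never be replaced by a neighbour.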

I'm glad it's not just me that's seen Swype getting progressively worse! Either you have to really emphasize what letters you want, or you give up and type it out. I'm about finished with it.

Wow. Thanks. I thought it was just me!

I had a Galaxy S3 and was a heavy user of Swype. My friends marveled at how fast I could type with it. It was perfect! I recently changed my phone to an S8 and Swype became unusable. It gets almost every second word wrong, so much that I'm thinking of disabling it entirely :(

Speaking of virtual keyboards I always liked (and still use) the one from Blackberry and I never had such problems (you should be able to install Blackberry keyboard on any Android phone). I switched to Google Pixel from iPhone once I found out I can use Blackberry keyboard there.

Oh man, Android does that too? This awful, terrible, no good fake "AI" behavior on iOS was one of the many quality issues with modern iOS that was making Android look more attractive.

iOS keyboards have been getting worse with every update, probably as more and more engineers feel the need to make a mark, or are required to fix bugs, and they spoil it. The simple and predictable statistical model of the original iOS was better than what we have now. So much of iOS was better back then, IMHO.

On BlackBerry if it autocorrected to something you didn’t want, one press of delete would revert to your original word. iOS makes you a) rekey the whole thing and b) will probably try to change it again. I can only assume that no one (or moons, it literally just tried) who works on this uses it themselves!

I always thought Blackberry had the best autocorrect experience. My old Blackberry would occasionally surprise me by catching something I wouldn't think it would know about, and after some initial tuning never ever frustrated me. It only enhanced my experience.

I’m on an iPhone now but in many ways it’s a step backwards from my old BlackBerry Pearl, let alone the Bold I got next

Call me a crazy person, lately I've been thinking long and hard about going back to the BB (BB Bold 9930) and am willing to make all the sacrifices that come with it for a lot of the frustrations with smartphone OSes listed in this thread. Maybe I'm getting older and my demands on what I expect from a phone are beginning to normalize and simplify: Text, Calls, Email, a browser for sports scores and reading news articles on the train.

Honestly it's very probable that I'll only ever keep a smart phone around as a music/podcast/audiobook device.

I never used a BB, but I did have a bunch of "dumb" cell phones, and I miss them. Making calls on them was easy: Flip open, press Talk, dial, Success! Now it's: Turn the phone on (which takes approximately 60 seconds on my Samsung because it can't see keystrokes until it finishes trying to connect to wifi), hit Home button, wait 3 seconds for that to work, hit Phone icon, wait 2 seconds, hit keyboard button, wait 1 second, dial phone waiting 500 ms between each key, press Call icon, wait 20 seconds for call to go through, hit speakerphone button (because as often as not the cheek sensor fails to work and my cheek disconnects the call), Success!

As the saying goes, smart phones are just pocket computers with shitty phones attached.

I think you need a better phone. I have none of these problems on my Samsung Galaxy Note 4.

This is a Galaxy S6. I'd probably get better performance if I didn't keep it in low-power mode, but I have no choice because the quest for thinner phones means the battery only lasts 2 hours in regular-power mode.

I am actually looking at prices now. The last model of Pearl can be had “new” for £85... only question is the battery...

Wish I knew what bb 9930 was but I am considering becoming a Luddite

On iOS, one press of delete and what you originally typed will show as a suggestion.

That’s only when you actually catch it mid stream.

Once you hit space and start a new word it’s game over.

Frequently happens at the worst moment when you’re trying to mash out a complex explanation in a rushed fury.

That's not true for this situation. For the "deep" replacements more than a word back that we're talking about, and also other changes that happen without any user notification via blue text popups, there is no easy way to fix without the long process of moving the cursor.

For a replacement? Or for the next word?

4 years ago was a very different time. Software quality in general continues to drop. I’m not sure if that’s due to increasing incompetence in the software engineering workforce (unlikely, but possible), malice (more unlikely), or apathy (most likely).

When wages don’t grow over 10 years, what incentive is there to write the best software you can?

I suspect the cause is a little more subtle (and terrifying).

The majority of people don't give a shit. Correct spelling and obscure searches are not even on their radar, it's not a part of their reality. Don't let the comments here fool you -- it is a very specific, picky, technical crowd that frequents HN.

The voice of "those who care" has always been a minority, although it used to matter more, simply because people who care and worry and try to do a good job tend to have more power and money (conscientiousness is a great predictor of success), and so businesses cater to them more. Now that everything seems to be turning more uniform, more global, more binary, more equal, that voice is marginalized (good thing? bad thing?) -- you're seeing the effect of a hoi polloi stampede.

So it's not the fault of "incompetent programmers" -- it may be a trickle down effect of our social incentives and economic trade-offs.

Yes, this is the real reason. It's a variation on the Tyranny of the Majority.


> The voice of "those who care" has always been a minority

To add to this, the people who care most strongly have probably switched to alternative, open source software solutions. This leaves the remaining group with less "care" on average, so fewer would complain. Kinda like evaporative cooling.

> 4 years ago was a very different time. Software quality in general continues to drop. I’m not sure if that’s due to increasing incompetence in the software engineering workforce (unlikely, but possible), malice (more unlikely), or apathy (most likely).

I think another dimension is how deployability has changed.

Before... When you wrote and shipped software, getting your software out was a big problem, a big deal. This also meant that if you shipped a bug, shipping an update would be equally expensive (for you and your customers), and the amount of goodwill you lost would be quite tremendous.

Now everyone has an app store, always up-to-date apps, and whatever else is usually "in the cloud" somewhere. The time of people installing applications in a normal desktop context, with installers and IT administrators handling updates once every second year, is surely long gone.

With that kind of change, and an increased focus on delivering early, doing proper QA is no longer something which is rewarded in the market.

Who cares if you made a bug-free, awesome service, when you did it 6 months after someone else shipped a similar, but buggy service which everyone is already using? They have established a user base and as such already have social momentum and lock-in.

What do you have to offer which is not only fantastic enough to make someone bother migrating, but also so amazing that these people will also go convert their friends and families? "Less bugs" alone is not going to cut it.

Basically, taking the time to deliver quality software these days is increasingly something you get punished for in the market-place.

The result? We get shit like this and we can only blame ourselves.

This will only get fixed when software starts to get liability and refunds.

If enough people start suing or asking back for their money, companies will surely improve their QA.

I find it more reasonable to assume that you can only improve a very specific function so much before your "improvements" turn to "pointless sidegrades".

Well said. I have seen a lot of sidegrades over the past 10 years!

I don't know how easy it is to switch keyboards on iOS, but it's a great idea on Android.

I use Minuum - looks whacko at first but lets me use way more of the screen when replying and it is really accurate!

Easy, bit wonky. SwiftKey is probably the best.

you can turn off autocorrect

It would be better if you could turn off some autocorrect features and leave others. It has a lot of facets, and autocorrect is great when it works.

And they can't even fix a simple typo like this after years of usage - "th8s 8s example tex6". Come on Google, you made that UI, you know that i and 8 are next to each other, you have a database of correct words and probably of typical errors and typos, wtf?! (you even know how to correct "wft" to "wtf" and can correct simple word with number typo).
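
To show how little machinery this kind of fix needs, here is a rough sketch. It assumes a layout where each digit sits directly above the matching letter of the top row (1 over q, 8 over i, and so on), and it uses a tiny made-up dictionary. The helper names are hypothetical and this is not how Gboard or any other keyboard works internally.

```javascript
// Assumed layout: number row aligned directly above the top letter row.
const DIGITS = "1234567890";
const TOP_ROW = "qwertyuiop";

// Letters directly below or diagonally below a digit key.
function neighbours(digit) {
  const i = DIGITS.indexOf(digit);
  if (i < 0) return [];
  return [TOP_ROW[i - 1], TOP_ROW[i], TOP_ROW[i + 1]].filter(Boolean);
}

// Every spelling obtained by swapping each digit for a neighbouring letter.
function variants(token) {
  let results = [""];
  for (const ch of token) {
    const options = DIGITS.includes(ch) ? neighbours(ch) : [ch];
    results = results.flatMap((prefix) => options.map((o) => prefix + o));
  }
  return results;
}

// Keep known words as they are; otherwise take the first variant that is
// in the dictionary. A real system would rank the variants instead.
function fixToken(token, dictionary) {
  if (dictionary.has(token)) return token;
  const match = variants(token).find((v) => dictionary.has(v));
  return match ?? token;
}

function fixSentence(sentence, dictionary) {
  return sentence
    .split(" ")
    .map((token) => fixToken(token, dictionary))
    .join(" ");
}

// Hypothetical mini dictionary.
const words = new Set(["this", "is", "example", "text"]);
const fixed = fixSentence("th8s 8s example tex6", words);
console.log(fixed);
```

With a bigger dictionary "th8s" could also resolve to "thus", so an actual implementation would need word frequencies or context to break ties. The point is only that the adjacency information is right there in the layout.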

Yes! Those one-off errors used to be fixed by most autocorrect systems. Seems like whoever wrote the new AI-based systems didn't make a checklist of existing features and attempt at least parity before switching over. It's embarrassing.

I don't know; I'm pretty happy that I don't have to type apostrophes any more. I can just go on typing words like "ill" and "wed" and it'll figure out after a few more words that those should be "I'll" and "we'd".

What the parent comment is talking about is something more extreme and I've noticed it too. It sometimes changes prior words that are valid after you have moved on to the next word. It's not correcting the word you just typed, it's correcting a previous word without any sort of feedback like you get for normal corrections- it's going backwards and changing earlier words, and then you try and fix it but the exact same correction applies automatically again over and over.

Unfortunately I can't remember any examples at the moment, it's just something that happens to me every so often. They're really irritating though, because they aren't well expressed in the current autocorrect UI (which works on the current word) and it doesn't seem to get the hint when you go back and correct it, so it keeps applying it over and over.

No, that's what I was talking about too—it needs the context of the (part of speech of) the next word to figure out in the cases I mentioned whether I actually intended the (correctly-spelled) word "wed" or the (correctly-spelled) word "we'd". It doesn't change it until after you hit space to commit the word that comes after the "wed" input.

Ah, ok. I'm pretty sure I've had it happen with more than just contractions though, like with common phrases. The real problem is that it's really hard to undo the correction. I need to start keeping better track of it so I can file a bug report... it makes it really difficult to type certain combinations of words.

If I type "its" and "it's" correctly, they are regularly changed to the incorrect spelling. If I don't bother, they are not corrected. The only way to get correct "its" and "it's" is to go back and fix them after auto-correct has screwed them up.

It's probably because the spelling is corrected based on a machine-learned model, whose corpus is likely to contain many instances where "its" and "it's" were swapped.

Which means the corpus is broken. But regular people rarely care about correct spelling, in my experience, and so I doubt corpus maintainers will care either...

You're imagining that people who make NLP corpora actually vet the text going into them? I dream of a world where people can be convinced to care that much. I'm not even talking about the scenario you suggest of filtering for proper word usage, I'm talking about filtering at all.

The corpora used for popular word embeddings are full of weird nonsense text (in the case of word2vec) or autogenerated awfulness like spam and the text of porn sites (in the case of fastText) or both (GloVe). And most people who implement ML don't care how their data is collected.

I mean, nobody expects the engineers to manually read through everything, but if the quality of the input text is significant for the quality of the autocorrect (or whatever other application you're using machine learning with), you kind of have to make sure the input is pretty good... You could for example choose datasets which are expected to contain mostly correct grammar and spelling (such as Wikipedia, books, etc.) rather than datasets which are expected to contain mostly incorrect grammar and spelling.

Or don't use a machine learning model. I honestly don't care, just don't automatically turn a correct "its" into an incorrect "it's".

Wouldn't late edition books with only corrected text be better, proofread, edited, proofread, edited, ... Google have millions of them they've assumed copyright of. Surely there's enough text there. Do they really just use random website text?? Nearly every news story I read has errors and they have style guides, trained writers, editors, etc..

Do publishers sell their published text as a mass for use in AI/ML? Like 1000 books, no images or frontispiece, etc., possibly jumbled by sentence/para/page.

That is one of the most draining things I've heard recently. Sigh. Are these open projects where someone could in principle improve them?

The best open corpus project I know of is OPUS: http://opus.nlpl.eu/

They say they welcome contributions; I don't know if they just mean new sources of text, or if this includes code for filtering or fixing their existing ones.

I've been trying to figure out how to disable that! Anyone have any ideas? Google search turned up no results, but after this thread I shouldn't be surprised.

I use Swift keyboard on Android, never had these kinds of issues, its predictions are remarkably good.

And it's guaranteed not to have postdictions?

I've never seen it changing words after you have started on the next word.

Nope, the autocorrect works until you press space. It doesn't go back and "fix" the word you're already finished with.

that's not Android keyboard. That's Google keyboard. Solution: use a different keyboard. I've used the AOSP keyboard since I started using Android. Sure it's made by Google but it does what I say rather than what some proprietary idiot robot thinks I think. While you're at it, use the send feedback function to let them know you hate it.

> that's not Android keyboard. That's Google keyboard. Solution: use a different keyboard. I've used the AOSP keyboard since I started using Android.

How do I try the AOSP keyboard? I have a Google Pixel and it looks like my only built-in option is Gboard. I'm guessing they removed it.

Its annoying autocorrection tendency to choose 'fir' instead of 'for' frustrates me. People almost never use the word 'fir', but use the word 'for' often. It would be nice if you could blacklist words you want it to never choose.

There is SwiftKey, on the other hand, that does those kinds of annoying corrections a couple of times, remembers your choice, and does them no more. It's been a long time since I've seen a 'fir' with SwiftKey.

I've definitely noticed that but I'm not sure the reason is so innocent

That's simple to turn off. At least on my Samsung.

Please teach me. It's driving me crazy.

On my S8 keyboard the configuration menu can be opened using the gear icon appearing in the keyboard. In the configuration menu there should be a bunch of options under "Smart typing" including 'Predictive text' which can be turned off.

This is in the European market, though; I don't know if it's configured differently for different markets.

On Google the query "samsung turn off keyboard autocorrect" provides links such as https://www.androidcentral.com/how-turn-and-autocorrect-sams... which may or may not be relevant to you.

Here's the problem. I already turned off all those options long ago, but the annoying behavior remains. To be clear, I don't mind the keyboard predicting things. It's only the retroactive changes that bother me.

Someone else suggested Swift keyboard, so I'm going to give that a try.

I was amazed by the Gboard thing though. It usually picks the correct phrases for me.

> I think the biggest irony is that the web allows for more adoption of long-tail movements than ever before, and Google has gotten significantly worse at turning these up. I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds.

Google wants you to be mainstream now. If everyone thinks the same and wants the same things -- even if contextually as a member of one of a couple of hundred disparate "marketing cohort" categories -- it will be far easier to target advertising to you. It's in Google's interest for you to conform now. Be easy to categorize. Be easy to predict. So think the same as the members of your peer group, so they can sell hyper-targeted advertising to other corporations. (Have you noticed that social media tends to motivate you to conform?) Google has no use for the long tail anymore -- no use for quirky and inscrutable scenes and subcultures. Instead, it now has the cultural power to transform you (the product) into an even better product.

Remember: 1. If you're not paying for the service, you are not the customer. You are the product. 2. Given sufficient quantity and concentration, Power corrupts. Always.

I see quite the opposite incentive for Google. If you are a very eccentric individual and they know those quirks, they have a huge competitive advantage in targeting ads to you vs. some bulk radio broadcast ad etc.

> I see quite the opposite incentive for Google. If you are a very eccentric individual and they know those quirks, they have a huge competitive advantage in targeting ads to you vs. some bulk radio broadcast ad etc.

However, the observation of this article, and my observation as well, is that Google isn't currently capable of parsing very individual quirks. Rather, Google is able to place you into one of a number of highly conformist boxes. They don't have to understand you as an individual. They just have to 'box' you more effectively than their competitors.

There is nothing in the market or otherwise emergent in the nature of data and such categorization which fundamentally motivates Google to be able to parse anyone's quirks or understand the essence of a scene or artistic movement. If Google can gain a competitive advantage by creating a number of honeytrap doppelgangers which draw people away from the long tail and sequester them into un-creative, imitative, and highly conformist boxes, then so much the better for them.



In much the same way, I find that recommendation engines come up with annoying pale imitations of bands/musicians I like. I also wonder why authoritarianism seems to spread so effectively across social media, and why certain authoritarian movements seem to get such ready support from within Google and various social media companies. It's because, as a product, conformist/authoritarian screechers are more easily herded, replicated, categorized, and packaged than real individuals who think for themselves and apply principles.

If you liked Radiohead, you must also like Coldplay!

"I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds...It seems like the web is being optimized for casual users"

In other words, the internet is becoming more of a collective and caters less to the individual.

If you have a special interest, then who cares? You should just adopt more normal hobbies. If you have a unique political viewpoint, then get over it and join one of the major parties. If you are oppressed, then it's fine as long as 95% of the people are content (don't worry, we'll carve out a few protected classes so that the pictures still look diverse).

It is much worse. It contextually averages everybody, so if 95% of the people are oppressed, but each in a different way, it also doesn't matter.

> It seems like the web is being optimized for casual users, and using the internet is no longer a skill you can improve to create a path towards a more meaningful web experience.

No. _Google_ is being optimized for casual users, and using _Google_ is no longer a skill you can improve to create a path towards a more meaningful web experience.

Yes. The problem seems to be compounded by Google's intention to correct spelling, and (for the long tail searches) to assume you're really just looking for whatever everyone else is searching for.

Yes! You need to beat Google with a stick to enable searching for the furce awakens (that's not a typo) poster which Disney itself produced. Used to be that quotes around a phrase stopped this typo correcting but no longer.

BTW. this is my go-to example for the "piracy is a service problem" -- this is 100% Disney IP for the fans of two billion dollar movies and you can't buy it as a poster. https://images.moviepilot.com/image/upload/c_fill,h_470,q_au... So I went on eBay, found a custom poster printer service and got it printed and shipped for 12.48 USD: https://i.redd.it/x4mmmvayunbx.jpg I would've been glad to pay double, triple for an official version but no. You just can't buy it.

Huh? I just searched for "the furce awakens". And got:

    Showing results for "the force awakens"
    Search instead for "the furce awakens"
And hitting that got me the Zootopia stuff. So?

CHX is still right in what they say. The point here is that Google overrides people's searches based on some random fuzzy logic, and whatever you've searched before, regardless of whether you are logged in or not.

To be fair, >90% searching for "furce" probably meant force. For the rest it's one click more.

I can understand "firce" and "fprce" or even "f9rce" but "furce" is, on a standard QWERTY keyboard, two keys away so more unlikely to be a typo.

Do Google do spelling correction based on letter locality on the expected user keyboard? Never seen any corrections that would suggest that, often wondered why not.

There used to be a service that would, given a query, search eBay for listings that matched it or common misspellings based on nearby keys. I wonder if any of that logic has made its way into modern autocorrect explicitly or if it would be gathered implicitly through studying what users actually correct.
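For what it's worth, the "nearby keys" idea is easy to sketch. This is only an illustration under my own assumptions: the layout table, `adjacent_keys` and `nearby_key_typos` are made-up names, only same-row letter neighbours on a US QWERTY layout are modelled, and none of it reflects how Google or that eBay service actually worked.

```python
# Hypothetical sketch: same-row neighbours on a US QWERTY layout (letters only).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def adjacent_keys(ch):
    """Return the letters immediately left and right of ch on its row."""
    for row in QWERTY_ROWS:
        i = row.find(ch)
        if i != -1:
            return [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
    return []  # not a letter we model (digits, punctuation, ...)


def nearby_key_typos(word):
    """Variants of word with exactly one letter swapped for a same-row neighbour."""
    variants = set()
    for i, ch in enumerate(word):
        for neighbour in adjacent_keys(ch):
            variants.add(word[:i] + neighbour + word[i + 1:])
    return variants
```

Under this model "firce" and "fprce" are one-key slips of "force", while "furce" is not, since "u" sits two keys from "o". The number row is left out, so "f9rce" is not generated either.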

But this is the wrong way around. At least make it an option to offer the typo correction instead of forcing it on you. Think of how much money Amazon made with one click. Two clicks is twice the clicks. Focus on fast and relevant searches not trying to guess what I meant. It's great to have a typo correction on offer but let me decide whether I want that to be default -- I don't. I know what I am searching for.

I agree that having that option would make sense. But arguably the default should be correction, because most users are probably making typos.

Fine but I am logged in, if it were an option I could fix it once and carry on.

Yes, I totally agree.

It doesn't help that it seems every other project out there is trying to name itself after common words in the lexicon. I dare you to try finding information on the Box library.

This one? http://www.wiltshire.gov.uk/librarylocations.htm?act=show&li...

It seems that context-sensitive search is a curse as well as a blessing. Wikipedia at least offers you disambiguation pages; perhaps a search engine should let you pick a "domain" to prioritise search in.

Didn't Google use to do this? One of the search engines definitely did: they'd group results into subject areas.


Is it this thing at the very top of the search results?

Fwiw my first page had nothing to do with anything computer related, except for the last result, which was how to make dialog boxes in Windows.

First result: https://littlefreelibrary.org

Search: Box library

Moreover, it was neither of the 2 Box libraries I was referring to. So that makes 3.



> the AI is probably incentivized to give a correct response to as many people

I'd bet it's probably trained to maximize clicks on ads...

Just like YouTube's algorithm is trained to maximize viewing time (and thus ads you see), instead of showing you videos you'd actually enjoy more.

> trained to maximize viewing time (and thus ads you see), instead of showing you videos you'd actually enjoy more.

How would YouTube quantify how much enjoyment you're having, if not by tracking viewing time?

As an aside, when watching youtube on my TV, I don't think there's a way to thumbs up or down videos. Even on my PC, when the video is in fullscreen, there's no way to thumbs up or down either.

Let's not forget that Google is an ad company, operating a search engine for that purpose.

I bet long-tail queries, while capable of carrying very targeted ads with a high CTR, are just too rare and thus less profitable. Likely many more people have basically no clue and formulate queries approximately, and very few query precisely in improbable ways, so getting the majority to the results they "meant" gives better financial results.

To tell the truth, you still can enter the "verbatim mode" using a menu, and try to find that improbable cluster of words. It's an advanced feature now, requiring a bit of digging, but it's there.

I'm not sure I've got a fully formed concept here, but I'm throwing it out in case someone finds it interesting.

Re: long-tail movements and switching contexts between work and home, I wonder if a better example isn't much better user or persona management. Every person is interested in more than just one thing, and can conceivably be looking for the same word in two contexts. Down below, a multi-lingual speaker has given up on predictive keyboards.

What if we could enable users to switch personas/contexts as intuitively and easily as people code-switch in real conversations? Setting up the profiles would be messy and cumbersome at first which probably kills the idea dead in my hands. I'm not knowledgeable enough about psychology or machine learning to figure out if that could be solved automatically.

One of the difficulties I see with this approach is to identify personas.

It would be fine if it was just a work + home persona. I feel, though, that it would end up like work local branch + work parent company report + work programming + home family member + home personal hobbies + home grandparent's health.

For context I already have two IDs for work and private stuff, I still hit a lot of barriers on Google search and ended up in ddg, using the location switch when needed.

A bit on the snarky side I realize, but someone at Facebook has probably already dug into this. Whether or not they'll admit it or discuss it publicly is far less likely.

> I can only imagine this has something to do with their increasing reliance on AI, and the fact that the AI is probably incentivized to give a correct response to as many people 'above the fold' as is possible. If 95% of people are served by dropping the specifically-chosen search term, then the AI probably thinks it's doing a great job.

There has to be an additional reason though, otherwise they could've just put the AI "enhanced" above the fold, and append the actual accurate results below it, for the people doing research.

You can add a plus (+) before a word you require to be in the search results. Eg. "+VBS array loop". You can filter out word by prepending a minus (-). You can surround exact phrases in quotes (") and you can even allow synonyms by prepending a tilde "~"

Example query: +vbs array ~loop "magic number"

If Google does not find anything it will remove some parts so that you will get any results at all

Unfortunately Google dropped + when they introduced the Google+ social network. Nuff said: Google+ is a ghost town nowadays, but the + operator hasn't returned yet.

And even a quoted "search term" doesn't work consistently anymore; Google thinks it knows better, semi-randomly omits words, and shows results no one asked for.

Didn't they drop the + operator when they launched Google+? I thought now you have to surround your term with "" instead of +?

Yes, you're right, they did this a looong time ago... BUT, sometimes Google still insists on showing you results that don't contain the word or phrase that you've explicitly requested must be in the results - utterly infuriating!

You now have to go into Search Settings -> Verbatim. Unfortunately you’d have to do this for each search session as it doesn’t persist.

In Chrome, I've set up Google Verbatim as a search engine and made that the default address bar search. Here is the URL format: {google:baseURL}search?{google:RLZ}{google:acceptedSuggestion}{google:originalQueryForSuggestion}{google:searchFieldtrialParameter}{google:instantFieldTrialGroupParameter}sourceid=chrome&ie={inputEncoding}&q=%s&tbs=li:1
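If you'd rather build such a link outside Chrome, a minimal sketch could look like the following. The only piece taken from the format above is the `tbs=li:1` parameter; the function name and the default base URL are my own choices, and whether Google keeps honouring the parameter is not something this guarantees.

```python
from urllib.parse import urlencode


def verbatim_search_url(query, base="https://www.google.com/search"):
    """Build a search URL carrying the Verbatim flag (tbs=li:1) from the format above."""
    # urlencode percent-encodes the colon, so the parameter is sent as tbs=li%3A1.
    return base + "?" + urlencode({"q": query, "tbs": "li:1"})


print(verbatim_search_url("the furce awakens poster"))
```

Nothing here persists the setting; it only saves the trip through the menu for each search.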

Especially when that word is 'not'.

Yeah, you are right; https://support.google.com/websearch/answer/2466433?hl=en

They even dropped tilde :(

But you might be able to achieve the same things with their advanced search; https://support.google.com/websearch/answer/2466433?hl=en

That stopped some time ago and it's now extremely frustrating, as the main article describes. I'm sure I read that putting the search in quotes is like the old +, but it doesn't seem to be; rather, it just gives priority to that search.

I feel this is more a case of Google thinking it knows better.

I have a similar problem with ebay, if I'm looking for an "HP Z430" then I'll get pages of other items where Z430 isn't even in the description.

I can see some value in searching for alternative spellings and related items but there should still be the ability to be exact in your search terms.

This is [one of] the dark side[s] of the "adaptive" or "personalized" Web. The more adaptive and less deterministic, the less collaboratively and reliably useful it is, even if it happens to be better at serving attention-monster momentary gratification of the meme du jour.

The reliance on advertising revenue models means that all such Web properties morph into being essentially adversarial attention traps against users.

I would pay $50/month in a heartbeat for access to a no-ads, deterministic, guts-opened-with-API Google type engine (even if rate limited at that price to some high-human volume of usage).

> pay $50/month in a heartbeat for access to a [...] Google type engine

Never. And definitely not at USD 50 per month. That is a huge amount for this, although I suspect you're a pretty rare customer and/or exaggerating. Broadband or Mobile service, Satellite TV, etc. all have packages that cost about this much. Even a magazine subscription is a fraction of this amount, Netflix is only about 5-10 dollars a month, isn't it? I could see people paying 10% of the Netflix (ad free TV) charge for an ad free search engine maybe so USD 0.50 to 1.00 per month, 1% of your suggestion...

What annoys me the most is that Google seemingly no longer allows searching for phrases at all. In the past I could search for a specific phrase like this..

  "It never rains on a Wednesday in Rockshire"
..and Google would return websites containing that exact phrase. That no longer works. Nowadays quotation marks are largely ignored as far as I can tell. This is super annoying because quite often I am really looking for a website or websites containing a specific phrase.

It seems they internally switched to indexing single words only. So if you search for a phrase Google will instead return websites containing (some of) the words in your phrase in no specific order and maybe not even next to each other.

I think I understand why: Indexing / searching based on words is massively easier and massively less resource intense than a system which can search for specific phrases.

Furthermore the type of searches Google wants/expects you to do e.g. "best hairdryer" work well enough, in fact work better if you only search for single words and then filter / organize the results using AI / using information you have collected about the user.

EDIT: I was wrong. It still does work for many phrases, just not the ones I tend to search for. See below.

EDIT 2: I actually have no idea what is going on there. I know that searching for exact phrases doesn't work for me like it used to but I have no idea why..

Quote marks still work.

Google tries to find your phrase, and fails, then tries to give you results with some of your words missing without telling you.

If you search for "It never rains on a" with quotes it will only show you pages with that exact phrase. If you search for that without quotes it will show you a messy group of results based on what it thought you meant.

No, they don't but I know what you are referring to.

Most of the time I get the result:

  No results found for [phrase]
  Results for [phrase] (without quotes)
..but the thing is that I know websites containing [phrase] exist. Often many of them and they aren't "dark web" either. Google used to be able to find them but no longer is.

This gets more confusing because sometimes it does work. Namely if you look for phrases which are popular search queries e.g. if you search for..

  "hit me baby one more time"
..it will work amazingly well. But for other phrases it won't work at all and you will get the no results reply instead. It used to work for all phrases independent of their search popularity.

But yes, my original post was wrong. It does work for a lot of phrases.. but not for others.

I guess Google's engine dynamically optimizes things, and only indexes often searched phrases.

EDIT: Okay, this assumption was wrong too. I did some experiments to confirm my theory and the results showed that it is wrong..

This is what it looks like to me: https://imgur.com/zre6HdT

Probably this is because such pages do not exist, except for this page. Or at least DDG thinks so: https://duckduckgo.com/?q=%22It+never+rains+on+a+Wednesday+i...

Indeed a very nice way to prove this thread wrong :-)

I had also wondered whether its ability to match exact phrases has degraded. It would make sense, since keeping individual words on a distributed index is a lot easier than keeping a long phrase. But I had no way of telling, not having benchmarked it in the past.

Can they not index bigrams or trigrams, then chain together index hits? E.g. "It never rains in december" would hit on "It never rains", "never rains in", and "rains in december". Any result that hits on all of the indexes is not guaranteed to hit on the entire phrase but it would be a good candidate for the top result. The longer the phrase, the more likely a candidate hitting all necessary index phrases would match the exact phrase. This would at least put a limit on how large the indexes need to get.

Further, if they retain copies of the full text in their database they could do a filtered scan of the documents that hit on all subphrases to guarantee exact match. I could see that having too much of a performance impact at scale though.

In any case the dumbing down of Google search over the last few years is immensely frustrating to me.
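The trigram idea above can be sketched in a few lines. To be clear, this is a toy under my own assumptions (whitespace tokenisation, an in-memory dict, made-up names like `build_index` and `phrase_search`) and says nothing about how Google's index is actually built.

```python
from collections import defaultdict


def trigrams(text):
    """Lower-cased word trigrams of text, split on whitespace."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_index(docs):
    """Map each word trigram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def phrase_search(phrase, docs, index):
    grams = trigrams(phrase)
    if not grams:
        return set()  # phrases shorter than three words are not handled here
    # Candidates: documents that contain every trigram of the phrase.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verification: scan the stored text to rule out non-contiguous hits.
    needle = " " + " ".join(phrase.lower().split()) + " "
    return {
        d for d in candidates
        if needle in " " + " ".join(docs[d].lower().split()) + " "
    }


docs = {
    1: "It never rains in December here",
    2: "never rains in summer but it never rains hard and rains in december are rare",
    3: "sunny all year",
}
index = build_index(docs)
print(phrase_search("It never rains in december", docs, index))
```

Document 2 hits all three trigrams without containing the phrase, which is exactly why the second, filtered scan is needed. That scan is also the part that would presumably hurt at scale.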

> Google would return websites containing that exact phrase. That no longer works.

Er, what? No search engine finds "It never rains on a Wednesday in Rockshire". Results for "It never rains on a Wednesday" with exact matches

Google - 6 pages

Bing - 3 pages

Yandex - 0 pages

Duck Duck Go - 3 pages

That was just a random example of a phrase. I made it up on the fly.

Not sure what you expect then.

He isn't complaining that the specific phrase "It never rains on a Wednesday in Rockshire" fails to turn up any results. He used it as an example of the kind of phrase which, even though it exists somewhere on the web, won't be found with Google.

> Er, what? No search engine finds "It never rains on a Wednesday in Rockshire".


Google: https://imgur.com/a/tbOzE

Yandex: https://imgur.com/a/LB2qx

DuckDuckGo: 0

Bing: 0

Well now it does. Strangely, for Bing this thread is third on the list but does not show any of the searched for text in the text blurb like the other suggestions.

I also get very angry at Google trying to stop me from making advanced queries. Even if I only chain a couple of Google's very limited operators together it shows me captchas and after a while the captchas do not stop. I keep solving them and it just wants more.

Lately I've also been noticing that the behaviour of some Google search operators is broken. " something " "otherthing" is not treated as an AND, and " something " OR "otherthing" is not treated as an OR. Google shows me the results it wants. I recently tried to research FreeBSD and Meltdown (I tried many times: "FreeBSD" "Meltdown", "FreeBSD" AND "Meltdown" etc.) and almost no result contained both terms, FreeBSD and Meltdown. The interesting thing was, Google did not say there were no results matching my criteria; it kept showing me popular IT news, Linux news, etc. It was extremely frustrating.

The only operators that work are site, date and filetype. The logical operators do not work reliably or do not work at all.

If you're searching for something popular google finds it best, but if your search patterns are deviant google ignores you and even thinks you're a bot and refuses to service you.

I'm enormously frustrated that Google has been killing off advanced-search tools.

The loss of '+' is annoying, particularly since quoted and non-quoted mixtures are unreliable. Searching on "freebsd" "meltdown" might have solved your problem, but it's too unpredictable to be sure. My experience suggests that Google is doing something with site-level search, such that a site with only 'freebsd' won't appear but a site with 'freebsd' and 'meltdown' on different pages still might.

'-', meanwhile, seems to simply be disabled some of the time.

I'm primarily a DDG user, and I get annoyed that it often drops terms from my query as well. However, if you prefix the term with a +, such as `+freebsd +meltdown`, it will reliably keep that term in my query and show me what I want. So my queries often get overwritten, but at least I can reliably override it.

I'm glad I'm not alone in thinking this way. I wish I could use the Google of 10 years ago. Pre-personalization, pre-deep-learning RankBrain, just the reliable and consistent search engine where complex queries will return the result I want, if it is there.

Today, if I'm searching for something unpopular or specific, I usually get frustrated. You would expect the opposite to happen as the size of the web should increase over time.

Using the Google of 10-years-ago on the Internet of 10-years-ago, or using the Google of 10-years-ago on the Internet of today? Because I imagine doing the latter would just result in an endless wasteland of SEO-optimized content leadgen pages.

The good thing is SEO optimized content is relatively trivial to distinguish from the snippet, and other search engines like DuckDuckGo feel far more reliable and consistent without being overrun by SEO junk.

You do have a point however.

Not really, not for most people, especially for farm sites like eHow. If you ask how to make a tequila, you'd get an SEO-optimized eHow site instead of say, an authoritative page of the world's top tequila expert.

If DDG was so good, people wouldn't need to use !g for tail queries so much. Too many anecdotal claims every time these issues come up and no objective quality evaluation.

I mainly use !g for the "stupid" queries, actually. The ones where I would actually prefer Google's AI second-guessing me. Also "local"-type searches. I don't voluntarily give Google my location[0] but if I type the city name too, it works fine. Actually I just tried and DDG does it just as well and gives a map and address too, so I may go for that next time.

Except for lyrics, DDG is quite good at that and often even presents the proper "zero click result" straight away (though usually the lyrics are cut off at a point and I still need to click).

On the other hand, the vast majority of my DDG queries are !bangs for other sites, because I know what site will have the page I'm looking for. Usually !w for Wikipedia (and the other wikipedia stuff like !wnl and !wt), after that probably one of the image searches !gi/!yi/!bi, then !imdb, !discogs and !whosampled. Oh and occasionally !hn, of course :)

I believe that DDG would have had a lot harder time getting as successful as it is today if Google had retained its old "AND" search engine behaviour (as explained above, keywords used to have an implied "AND" between them).

[0] I only log in to these types of big "social" things using a private tab, for GMail and to get my personal YouTube suggestions and subscriptions. It's a bit of a hassle to use the 2FA Authenticator code every time after I closed my browser, but it's worth Google not tying everything I search to my account, or getting "bubbled".

> If DDG was so good, people wouldn't need to use !g for tail queries so much.

When I resort to !g Google usually returns nothing interesting either and the most promising links are usually marked as visited, since I already clicked them from DDG.

I'd venture that most people are happy with the eHow results. Those wanting more depth may have to look through 1-2 pages of results, but I don't think most people want/need "tequila expert" level depth when searching for that. I'm not saying this is correct or "desirable," just saying that's one justifiable explanation for this behavior.

SEO penalization is orthogonal to the search algorithm.

The simplest solution: eliminate SEO'd pages from the index before applying search to it.

Something tells me the solution of removing "bad" pages isn't as trivial as you seem to imply it is here...

The important point is that it's orthogonal to search.

You can make SEO-penalization as complicated as you want without affecting search. The only thing search should be able to deal with is a SEO-penalty which is just a number.

Is your point that SEO-penalization is just a sorting implementation concern? I think Google does far more than that - perhaps removing/banning sites or categories of content that are deemed to be gaming their system.

No, I was replying to what derefr said.

And removing/banning sites can be done by setting the SEO-penalty to infinity.

I'm not trivializing anything. All I'm saying is that the concerns can be separated.
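To make the separation concrete, here is a minimal sketch. The names and the "relevance minus penalty" scoring are assumptions of mine for illustration, not anything Google is known to do. The ranking step only ever sees a number per page, and a ban is just an infinite penalty.

```python
import math


def rank(results, penalties):
    """Order URLs by relevance minus SEO penalty.

    results:   {url: relevance score from the search step}   (hypothetical)
    penalties: {url: SEO penalty computed independently}     (hypothetical)
    """
    scored = []
    for url, relevance in results.items():
        penalty = penalties.get(url, 0.0)
        if math.isinf(penalty):
            continue  # an infinite penalty is how a banned site is expressed
        scored.append((relevance - penalty, url))
    return [url for _, url in sorted(scored, reverse=True)]


results = {"expert.example": 0.8, "farm.example": 0.9, "spam.example": 0.7}
penalties = {"farm.example": 0.5, "spam.example": math.inf}
print(rank(results, penalties))
```

However the penalty is computed, it can get as complicated as you like without `rank` changing at all.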

Either would be pretty cool :)

I wish I could go back to the Google of 1999. PageRank was magic before people learned to game it. The results were incredibly good.

> PageRank was magic before people learned to game it

Even then, before Google started going after them by filtering results, the SEO spam sites were pretty easily recognisable and ignorable as you scrolled through the results. Now, they still show up in droves (try searching for service manual PDFs and you'll instantly see what I mean) but you hit the end of the viewable results far too soon to find the useful stuff buried in later pages.

In other words, Google's ranking now seems to be "good SEO'd sites > spammy SEO'd sites > everything else", and cuts off results before getting to that third category, when that third category should ideally switch places with the second and maybe even the first.


Amusingly enough one of the definitions of "rank" is, according to Wiktionary, "having a very strong and bad taste or odor"... as in the smell of a decaying brain. How fitting. I almost wonder if it's deliberate. If they called it BrainRank (like PageRank), the adjectival meaning seems to be emphasised less.

The emphasis would be incorrect if the term were reversed - RankBrain is a Brain for Ranking like PageRank is a Rank for Pages.

No doubt there is someone in Google, however, who has a project for ranking AIs called BrainRank for their own amusement.

Anyway, your observation is also an amusing interpretation.

PageRank is named after its inventor, Larry Page. It's just another case of nominative determinism!

An excellent point, had totally forgotten :)

I think part of this has to do with Google's actual customer base: AdWords users. The goal of AdWords is to have every single search be worth the maximum possible value in an advertising sense. So if one person a month searches for "Office desk drawer repairman Concord California" they do not want to service that very detailed long tail search, that would minimize value for themselves and maximize it for the advertiser. They want to treat that search in a vague sense so they can display more ads for their customers and charge for each display and click.

You know, that actually makes a lot of sense. Recently, I attempted to bid on some long-tail keywords in AdWords for some targeted ads, but unfortunately they don’t have the “search volume” to qualify.

You mean you cannot bid on keywords with very low search volume?

Yes, AdWords will prevent you from targeting low-volume/long-tail combinations and recommend higher-volume, more vague and simple keywords/combinations.

That seems to confirm a feeling: Google (et al.) transformed the internet into the new TV.

It’s a bit funny and sad how Google has both managed to seem overly clever to the point of being clumsily useless, and at the same time, continues to offer the original search experience that people pine for but no one knows or can be bothered with: add “intext:” before keywords and it’s the experience you’re wishing for.

I just tried a few of my old "dry" queries (as in, things I've always wanted to look for but can't seem to get good results) and didn't notice much difference... Furthermore, after a total of 3 "Next Page"'s, I got IP-banned with a CAPTCHA.

I've had similar experiences recently with "site:" and the other colon-operators, so not entirely unexpected, but still immensely infuriating. It's almost like any real attempt to find what you're looking for, if it's rare, will result in punishment. :-(

I start feeling like the web is being de-optimized for nerds & super-users

Nothing wrong with that. You generally shouldn't optimize applications for power users. Software should be usable.

Good software is when both kinds of users are pleased. As Alan Kay said, "Simple things should be simple, complex things should be possible." But it's probably the hardest thing in UX design to make something that is simple and powerful at the same time - most things sadly end up being only one of the two.

I agree, but I didn't say that good software doesn't please its users. I said you generally shouldn't optimize for power users and that software should be usable.

It's likely you are using "privacy" plugins in browsers that obfuscate who you are to Google. "A friend" of mine has the same issues with them (constantly harassed like they're a bot).

Never heard about that before; thanks for sharing.

> will try to rewrite your queries and omit words

This has been bugging me too the last couple of years. Sometimes I would actually prefer an empty answer to one where various words are missing. Empty can mean an idea is still unexplored.

I've started sending feedback to Google every time this happens. Maybe it'll make a difference if more people do it?

Searching using “Verbatim” can be helpful.

What’s particularly infuriating is when searching with verbatim still ignores keywords.

I still remember the day that stopped working. I decided that was the day Google died.

Is that when the + command was overridden in search to refer to G+ instead of 'this is a mandatory search item'?

I am still shocked they ever thought that was a good idea, or that surrounding quotes was an acceptable replacement.

No, when that was changed you could at that time just quote a single word to get the old +word behavior.

Years ago when I interviewed for a Software Engineer position at Google, the person I was matched to eat lunch with between the interviews was from the Search team.

I asked him these exact questions. He said, the last time he checked, quoting a single word to mark it mandatory worked for him and that he definitely would know if it didn't.

I didn't insist much at the time but I knew he didn't know what he was talking about, and it made me lose hope that this feature would ever come back working like it used to.

Are you saying the quote-a-single-word thing doesn't work for you?

It has not been a guarantee for several years, now. At best it seems to be a slightly firmer suggestion.

Do you have some example searches?

I believe you, just never seen it myself. Perhaps I switched some obscure setting on years ago.

None that are guarantees, unfortunately. It's maddeningly inconsistent from day to day. But it isn't that rare, either. I only perform 10~20 searches using the quote feature a day, and I hit a case where it's ignored several times a week.

Doesn't work consistently for me.

I've reported that at least once.

After that it started working again for me within a couple of days and has been more or less working as expected (hmmm. That's what I thought at least) until recently.

Frustratingly they refuse to send any feedback so I just had to try again.

Yes, and I've also gotten the CAPTCHAs.

What's more infuriating is it using synonyms of "verbatim" that actually don't mean the same thing as verbatim in a particular technical context but match an avalanche of noise. I've yet to find a way to turn off synonym matching.

I've also noticed Google omitting the only relevant and crucial keyword in my search over the last year. The first result will consistently show a result that has that keyword crossed out.

> solving the CAPTCHA just gives you another, and no matter how many you solve it keeps refusing to search

They even have a patent on that: https://www.google.com/patents/US9407661

I fear that something like PageRank used to work well just because back then many individuals had personal web pages with well curated links. It seems like this has slowly gone away, and now Google et al. have to resort to increasingly more hand-tuned AI extravaganza to get the search to work decently well at all.

Ironically this sort of "links section" probably dropped out of favour in large part because you could find those pages easily through Google, which knew which ones were good because they were linked in a lot of people's links sections...

Also, it gets ever harder to keep up over time, as good links either go away or deteriorate in value.

Perhaps not coincidentally, both the Yahoo Directory and DMOZ have been entirely shut down.

I'm so glad I'm not the only person thinking that. Last year and a half or so I've started scouring forums and reading studies on google scholar rather than searching the web. And it's gotten worse and worse each year.

I've been lazy and so acclimatized to the UI that I haven't changed, but I plan to now. It's especially difficult finding medical information. Mind boggling how 6-7 years ago it used to be so much better.

The worst is when it gives you results containing synonyms of your query words (and even highlights them in the little description under the link). Like, dude, I used this word for a reason.

All this is true, and yet when I use Google Chrome's search autocomplete (which is partly server-side and partly client-side), it will usually fill in exactly the query that will get me what I want (rather than what query I think will get me what I want) before I even start typing it, using some combination of searches I've done before and the indexed text of open Chrome tabs.

Google by itself isn't very good, but Googling from the Google Chrome address bar these days is something else.

I wonder if what you're seeing is a result of the Internet growing, not specifically Google degrading. Which is to say, as the indexable dataset grows, you'd expect the individual indices to turn up more and more heterogeneous results over time.

Nice theory, but it's Google degrading. They actually removed features for more accurate searching. Or they arbitrarily ignore them--all the people in this thread saying that "word" now functions like +word used to, are mistaken (or didn't use it much in the past decade) because often it does not.

And you didn't even need to use +word for most words, because it would only give you results that hit all the keywords in the query. You only needed the + operator to include "stop words", a relatively short list of common words that Google would filter out.
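For anyone who never used the old behaviour, here is a rough Python sketch of the matching rule described above: every keyword is required, stop words are dropped, and a leading + forces a stop word back in. The stop-word list and the function names are made up for illustration; this is only the rule as described in this comment, not Google's actual implementation.

```python
# Hypothetical stop-word list; the list Google actually used is not given in this thread.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is"}


def required_terms(query: str) -> set:
    """Return the terms a document must contain to match the query."""
    terms = set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            # "+word" makes the word required even if it is a stop word.
            terms.add(token[1:])
        elif token not in STOP_WORDS:
            # Ordinary keywords are always required; stop words are ignored.
            terms.add(token)
    return terms


def matches(query: str, document: str) -> bool:
    """True only if every required term appears in the document (AND semantics)."""
    words = set(document.lower().split())
    return required_terms(query) <= words


docs = [
    "the who live at leeds",
    "who played at leeds",
    "live music in leeds",
]
# Without "+", the stop word "the" is dropped and only "who" and "leeds" are required.
print([d for d in docs if matches("the who leeds", d)])
# With "+the", the stop word becomes a required term as well.
print([d for d in docs if matches("+the who leeds", d)])
```

The point is that nothing here ever drops a keyword you typed unless it is on the stop list, which is exactly the guarantee people in this thread are missing.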

Also, given that personal webpages are way less popular than they used to be, the Internet might not even be growing that fast, at least not the parts that contain the type of random useful info that we used to be searching for. They can't index Facebook (not very deep), or most of the popular mainstream services that people write their thoughts into.

The fact that other engines are able to offer up results tells me this is a Google thing, not the size of the internet.

... unless there's something Google offers up reliably that those other engines miss. Then what you're observing is heterogeneity of returned results due to the dataset being too large for any one index to cover it fully.

It could be the internet (content) growing, or it could be the search engine changing, but to me, the most salient change over 20 years is the internet user base changing. At the risk of sounding elitist, the users of the internet have become a lot dumber on average. Search engines therefore grew dumber relative to the one that was tuned to people who were early adopters and used the tool for more productive purposes.

With respect: yes, that does sound elitist. I would avoid assuming early use cases were more productive than subsequent average use cases (for one thing, it risks defining "productive" as "stuff I like to do," not via some more objective metric like "revenue generated," "human needs satisfied," or "questions answered").

On the objective side, most revenue today is from advertisement and ad-generating content. Search engines make it easy to find ads, essentially. It's not about the content itself any more. So there is that.

Yandex is pretty good but I haven't found a use case for DDG yet. Whenever I set it for "privacy" reasons (not that I really believe them), I have to go back to Google to get reasonable results. I can confirm the memory loss, though, ego-surfing confirms it for me. Much of my older stuff is gone and all the top links for my name are related to my current work. (So in my case, it's kind of beneficial.)

That's because you're not using DDG's !bang syntax to its full power. I have it as my default address bar search engine so I can redirect straight to Wikipedia (using !w), Discogs or whatever of the 1000s of !bang abbreviations are defined.

Just the amount of times where you know the most useful (and probably top) result on Google will be Wikipedia makes DDG worth it. Saves a click. But for me it saves a click very often.

If I need Google occasionally I just append !g, but it's just one of the many other places I use !bangs for finding stuff.
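To make the idea concrete, here is a small Python sketch of what a !bang dispatcher does conceptually: look for a known !abbreviation in the query and redirect the rest of the query to that site's search URL. The table below has only two entries and the function name is made up; it is an illustration of the concept, not DDG's actual code or bang list.

```python
from urllib.parse import quote_plus

# Tiny illustrative table; DDG's real list has thousands of entries.
BANGS = {
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
    "g": "https://www.google.com/search?q={}",
}
# Fallback when no known bang is present in the query.
DEFAULT = "https://duckduckgo.com/?q={}"


def resolve(query: str) -> str:
    """Return the URL a query should be redirected to."""
    tokens = query.split()
    for i, token in enumerate(tokens):
        key = token[1:].lower()
        if token.startswith("!") and key in BANGS:
            # Strip the bang and send the remaining words to the target site.
            rest = " ".join(tokens[:i] + tokens[i + 1:])
            return BANGS[key].format(quote_plus(rest))
    return DEFAULT.format(quote_plus(query))


print(resolve("!w PageRank"))
print(resolve("verbatim search !g"))
print(resolve("plain query"))
```

The bang can sit anywhere in the query, which is why appending !g after a disappointing search is so quick.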

But browsers already have that functionality built in, why go through DDG for that? (well, Chrome and Firefox do, not sure about others)

If others = Opera, then yes, they came up with that first :)

I've been using those kinds of keyword search shortcuts for years before switching to DDG by default.

The realisation was that having 100s (maybe 1000s already) of easily-guessable search shortcuts pre-programmed with DDG's !bang syntax is way more useful than having to configure 10-20 of them by yourself (and forgetting most of them when you reinstall a browser). Definitely worth the extra "!" keystroke and redirect :) [especially given DDG doesn't track you via that redirect].

I concur - and then to run across a google aficionado who alerted us to a policy change where now paying to move up the charts was cutting them out and the paid for results not labeled as ads. The ultimate result of all good things when they get too big? Corporatize? I fear more that the people who should be creating the next google have been numbed and hypnotized into a state of flux, where everything is acceptable.

You fail CAPTCHAs and get flagged as a robot. Are you sure you're not a robot?

I have the same problem.

To pass Google CAPTCHA you have to perform like an average human. This is different from getting it right. Once I started being lazier, (e.g. back of a sign isn't a sign, picture that contains a storefront but in the distance doesn't contain a storefront), I have had greater success rates.

It's like Google is training me to be dumber.

An image containing the back of the sign doesn't contain a sign. A sign is the front. The back is just a flat surface, not a sign.

But it's part of a sign and all things that are part of a sign, as long as they are attached to the sign, are signs. ;)

But it's part of a sign and all things that are part of a sign, are signs. ;)

I've noticed the same and if you look across internet and search properties it's trending the same way.

I'm not certain yet, but I think it has to do with systems relying more and more on giving a high weighting to popular or frequent searches. In other words, these systems are filtering content by how frequently it is searched for and annealing returns which have low frequency.

This makes sense from a machine learning perspective. If I want to build a system which returns a search quickly, then it will be biased towards pathways with strong weights. So in the end, systems would be biased against outlier searches and very specific terms which have low or weak search pathways. Effectively the system is getting really fast and good at giving you the most popular return, rather than a precise return.

The major problem there is that over time the search space will atrophy, much like memories do, and will kill off pathways which have low frequency. It's unclear if this is good or bad long term for a search engine, because it will remain popular for the majority of users, so long as their search terms and desired results live within the same space. In other words, we're creating a less diverse and more homogenous space by virtue of giving higher weights to more common thoughts/searches/desires.

I think the interesting question is: if you optimize the design to return rarely-accessed results reliably, what feature that we take for granted does that design sacrifice?

To me it's obviously speed. Reducing the search space or cache or however you want to define it is the best way to increase speed.

When milliseconds count that's important.

This is sad because humans are much much slower than computers and even 1 whole second is nothing to us. At least I am quite willing for Google to take a second or more to give me more thorough results. A query that takes 900 milliseconds is better than the same one taking 150 milliseconds but returning poorer results. The additional 750 milliseconds are virtually unnoticeable in human time and a small sacrifice to make for excellent search coverage.

The only place where every millisecond may count is the case of automated queries running into the thousands and millions. I'm not aware Google even allows something like that and it's a corner case anyway. Human typed search is the majority use case.

2 seconds is a human patience threshold. Google aims for 500ms (750ms is very noticeable).


Presumably if Google spends 150ms of processing time typically, they don't want to spend 900ms (I'm ignoring transit times, etc.) on one query as they could get 6x as many ad impressions for that processing cost.

They don't even need to do better than the competition, only well enough to stop customers leaving despite having to do 4-5 searches.

> will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!)

I've always double quoted the specific term I'm looking for. That usually bypasses this, ie. searching for "foo" will look specifically for foo and not food or what have you.

Those captchas are the worst. Google asks me to answer one whenever I make search queries in the middle of the night.

The weird thing is: Google created two captchas. One to digitize books and a shockingly awful one they use for their own services. Seriously, the one they use for their own services is complete garbage and obnoxious to even attempt to deal with.

Aren't DDG and Yahoo using Bing?

DDG is now also using Yandex. Not sure which one is better though, I find complex queries still better answered at google (luckily that's quick via !g)

What is the modern search engine that represents best what Google did at its functional peak?

The timing of this correlates with the sharp rise and unavailability of DDR4 memory...?

Coincidence? No...

very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!)

This is the algorithm deciding that its way results in more ad revenue for Google. What do you do? You repeat the query again and they show you a second lot of ads.

But let’s be honest if Google could show you as many ads without returning any search results at all, they’d do that in a heartbeat.

I believe it's not that direct. It's optimising for what most people search for and how to keep them using Google most. That in turn leads to more ad revenue.

Ads? What ads? I see no ads.

I actually forgot there are ads in Google's result pages. It's been ages since I've last seen them; uBlock is the first extension I install on a fresh Firefox.

Yes, one tends to forget about ads ;)

Ah yes, forgetting... I feel we should maybe commemorate the atrocities of the past in some way... Like a National Remembrance Day for the Fallen Ads, we could have a moment of two minutes of ... loudly over-compressed slick fast talking sales pitches trying to make you feel bad about your life without their product.

Oh, the nostalgia :) Actually thinking about it, I can get nostalgic and enjoy viewing certain ads sometimes, provided that they're over 10 years old. It's that whole icky feeling of being manipulated even while believing you know they do it (except they do it worse) these grabby greasy fingers reaching into my brain. After about a decade, ads lose that power and they just become quaint. Like the "Jazz Solo Design" from the 90s (image search it and you'll recognize it immediately).

I've convinced myself that this happens in gmail / hangouts history search too. It'll very confidently tell you that here are the only six results for your search term going back to the beginning of time, but if you go and manually dig up something that you know is there from ten years ago, then all of a sudden there are seven results the next time you search for the same term.

I haven't done this methodically, and I can't prove that this is happening, but it's infuriating nonetheless.

This definitely happens. I have 1 email that is about 5 years old that I reference once or twice a year, often enough that it is a suggested search term. Recently Gmail has been unable to find it and I resorted to starring it. It is literally the only starred email I have but I can no longer search for it.

This is actually very worrisome for me. I use Gmail as my personal store of weird information, from Wifi passwords to the account number of that service I only use every 4 years. I just send myself an email with obvious terms to search for in it and the relevant information. It's super convenient and I've never had it fail... yet, apparently.

Has happened to me too. Actually, the Android gmail app is even worse on searching. I often have to launch a browser and search via the web interface because the app returns no results.

My experience with the Android App is that search seems to be local only. So if the email is older than the retention policy or handled elsewhere then it's unlikely to show up there. Emails deleted on the desktop are notoriously invisible on mobile.

This is true in their iOS app as well. I've given up searching in the app and just open up the full web site when I need to find something.

I've had this for Chrome history as well. There have been multiple times where I'm sure I've browsed a site with some keyword in the title and it just doesn't show up in search. I don't tend to have a clue about the time window it would be in either so I can't go looking for it, so I can't prove it.

Chrome history may as well be useless and/or developed by the same people who created Reddit search.

I feel that it is intentionally bad so people don't realize how much Google knows about them.

Meanwhile their image recognition gets better and better. For those of you who use Google Photos backup, try a keyword image search in Google Drive sometime of your untagged photos ("beach", "face," etc.) You'll be creepily surprised on what Google is indexing, even against what they claim they don't (try some sketchier words).

I can one up that: I was living in Dubai a few years ago and have a number of photos of fancy cars I could never even dream of affording. If I search for “Lamborghini” or “Rolls Royce” it gives me the photos of those cars. I’ve never tagged them and I’m not an Android user, so they aren’t reading my messages.


I even have four photos I took at night, in burst mode, as a Bugatti Veyron zoomed past, and yes, it can recognise those...

That's not a sneaky feature; it was the primary selling point when Google launched the new Photos product in 2015.

All modern gallery apps have some rudimentary photo recognition, but I haven't found any but Google's that will allow you to search terms like "topless/nude" and find accurate results. I would never store photos like that with Google, but I've confirmed it works, and with how many promotions Google has run offering free photo storage with their latest phones there are undoubtedly thousands and thousands of unwitting users who have sensitive photos not just automatically backed up in some Google server, but categorized. Just imagine if these servers were to be hacked and that information was conveniently pre-arranged for extortion.

I understand image recognition searching for generic terms like "cars", but my point is this can even recognise brands. And it only returns photos matching that brand, so it isn't replacing "Rolls Royce" with "cars" to do the search.

I guess it makes sense to do this, given that most things they do are based on selling ads; I'm just surprised it is this accurate.

iOS photos app does this as well.

It does generic terms like "cars", but it doesn't work for specific brands (at least on my 5S).

Samsung gallery app does this as well. You can even search for stuff like "selfie".

Maybe every photo using the front camera is a selfie?

Didn't think of that obvious solution!

Maybe, but it's also incredibly poor at the same time. When I search my photos for "dog" I get many many pictures of cats. But that's sort of understandable, since they are both 4-legged animals, right? Well, then I don't know why searching for "dog" also brings up pictures of birds I have in my google photos. It's great about 90% of the time, and the remaining 10% it's hilariously and completely wrong.

I searched for "paper" today and one of the results was a toilet :P But yes I agree, it has been getting better.

Was there toilet paper in the picture? :P

Just had a look, turns out there was haha. Guess that's why then.

Where has Google claimed they don't index certain terms?

Chrome is much better than Edge in this regard. I have a website bookmarked and tagged with a very rare specific word. In Edge, typing the keyword never brings up my bookmark. Instead it shows crap from around the world. Firefox and Chrome do the right thing: my bookmark is the first suggestion.

The address bar in Firefox is great for this. I fire off a fraction of the searches I used to after switching.

I remember not too long ago a colleague of mine gave me the address of an internal monitoring system that we use. I tried to find it a few minutes later in the Chrome omnibar by typing almost the exact URL, e.g. the page was aaa.bbb.com so I typed aaa bbb. It gave me nothing, just search suggestions. I'm guessing Chrome does it on purpose. The less browser history they search in Chrome and show back to you, the more you'll go to Google web search, and that's ad money for them.

That's why I'm back on Firefox after quantum release. I hope mozilla never, ever, ever does something like this but I remember seeing something similar on nightly once. It gave you search suggestions first, that redirected you to google, with option to disable it in settings.

I think Chrome history might be limited in time. Something like 3

Something like 3 months. I forgot a word apparently.

I'm just glad I'm not the only one!

I have experienced this and it is one of the main reasons I still keep an imap client set up. There are certain emails I need to be able to find and gmail does not find them. Claws Mail does. Personally this is a minor annoyance with personal email, but I would hope their commercial offering does not have this behaviour.

Another really interesting thing I've noticed in gmail relating to search is that the number of matches for a given search is approximate, which makes perfect sense if they're using some kind of probabilistic data structure. However, when the correct number of matching emails does become known, because you have gone to the end, the result is not cached even client side. This gives a weird effect when combined with pagination: you go back a page, and the number of matches changes to the estimate again despite the fact the actual number is now known.

Startup idea: a service that will let you search your inbox. Aka google for searching.

Seriously, this is egregious. You rely on your email provider to accurately search your inbox - some emails are important business, tax, and legal documents that are relevant for years, even decades. Or at least be fucking transparent about the fact that you are not really searching all emails. I know Gmail is a free service and in the T&C you agreed to (figuratively) sell your soul but this has huge real-life implications.

> Startup idea: a service that will let you search your inbox. Aka google for searching.

I don't mean to pick on you but picturing the perspective behind this comment is very funny and a little sad to me.

grep is almost 40 years old. It is free software, fast, and doesn't share your data with anyone. Small knowledge of the file structure of MIME enables more advanced search. This is all without mentioning desktop-based email clients.
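As a rough illustration of the "small knowledge of MIME" point, here is a Python sketch that walks a local mbox file and greps the decoded text/plain parts, so base64 or quoted-printable bodies are searched as readable text. It uses only the standard library `mailbox` and `re` modules; the function name and the command-line usage are made up for this example, and it assumes your client stores mail as mbox (a Maildir would need `mailbox.Maildir` instead).

```python
import mailbox
import re
import sys


def search_mbox(path, pattern):
    """Yield (subject, matching line) for each text/plain line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    # create=False: fail loudly instead of creating an empty mailbox by accident.
    for message in mailbox.mbox(path, create=False):
        for part in message.walk():
            if part.get_content_type() != "text/plain":
                continue  # skip multipart containers, HTML and attachments
            payload = part.get_payload(decode=True)  # undo base64/quoted-printable
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset name in the header; fall back to UTF-8.
                text = payload.decode("utf-8", errors="replace")
            for line in text.splitlines():
                if regex.search(line):
                    yield message["subject"], line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python search_mail.py ~/mail/archive.mbox "account number"
    for subject, line in search_mbox(sys.argv[1], sys.argv[2]):
        print(f"{subject}: {line}")
```

Unlike a server-side index, this reads every message every time, so it is slower on a huge archive but it cannot silently skip old mail.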

Reading your comment, I can only picture some web-page javascript-based track-you-and-show-ads 15-employee company whom you give your email password so they can connect to another service and make high-latency queries on your behalf.

Shows how far we've come?

Firstly, that startup idea was an overt irony aimed at Google :). Secondly, my guess would be that maybe 0.001% of gmail users are familiar with command line interface and regular expressions. Not everybody is a coder and that is not necessarily bad.

Apple has the opposite problem. On my version of OSX, Spotlight searches in the Finder return email results too, which increases the noise-to-signal results dramatically. You can turn it off but this requires you to enter your search term as a formula every time--there's no way to make it the default. If I want to search my email, I'll switch to the Mail program and search there. I neither want nor need to search my mail in the Finder.

> I neither want nor need to search my mail in the Finder.

Just exclude Mail from Spotlight search in Settings.

Gmail is not just a free service; g-suite costs money, and if this affects corporate email this definitely has real world implications

If it does not affect corporate email, it does not have real world implications?

Not necessarily, he didn't say "if and only if"

Thunderbird (and I'm guessing most of the offline, true email clients) has this built-in.

It's funny, in discussions about webmail vs local email client, people often say "why would I want a local client anymore, webmail is all I need".

Well... this is why you might want it. Your data under your control. Your choice of tools.

If you're using GMail and Google decides to turn GMail to crap, well, bad luck.

This has happened to me with labels before, too. I'll do a search for all things that have a label and are in my inbox, and then archive them. I'll then go back to my inbox, and see that it missed something with that label. If I then repeat the search, I get zero results, even though it has the label, is in my inbox, and I can go back and find it. It's extremely frustrating.

100% Agreed. Gmail doesn't seem to want to find stuff that is there.

Gmail also drops sent mails occasionally for no apparent reason. They don't even bounce back, they simply disappear and never reach their destination.

I doubt it's malicious

I doubt it meets expectations

I think it meets profits.

I find it incredible that google can profit from Gmail. (Actually I find it incredible that Google can profit at all.)

Advertising is an incredibly profitable industry.

Gmail doesn't show ads anymore.

It provides a pretty good incentive to keep that Google cookie in your browser though (as well as your access to your interests, online purchase history, address book etc. etc. ad nauseam)

It does in the promotions tab.

I’ve gone back to using a local client recently, and being able to search/grep text files I know are in a dir has made me feel much less like I’m going insane / dependent upon capricious mystical forces.

Same here. Offline search in Thunderbird beats everything, particularly the quick filter feature.

Microsoft does this too. I have a massive mailbox going back 15 years and O365 doesn’t handle it well with full text search... you need to scope it to a person.

Ah this is good to know. I currently run on-prem Exchange, with a view to moving to Office365. My mailbox is a super-set of all the mail I've ever had and goes back 20+ years.

My email archive going back more than 20 years is stored in a single .PST file, and I search it using Outlook. Never had a problem. Every 3 months I copy everything older than 2 months from Exchange Online into that .PST.

Outlook is totally different, and I agree it works perfectly.

I'm talking about OWA Search in O365. I use mostly VDI these days, so PSTs are out. It's a frustrating issue to me because OWA search is better in many ways for more recent stuff.

Also note this is an anecdotal interpretation based on my experience.

Just in case you are not aware, PST files can go horribly wrong when they get to 2GB in size.

Thank you for bringing this up. It would make my month if someone would chime in with a solution. I didn't know about this problem until very recently, and it caused some major headaches in my business.

Can confirm. Google, the search company, has lost the ability to accurately search even the data that it holds, never mind any other.

I've had this happen too. It's not just you. This has happened to me repeatedly and it is so, so frustrating.

I definitely have this exact issue with Google Calendar as well. I search for the exact wording of an event, and it doesn't show any results that are old. I then manually go back in the calendar and find it, do the search again and ta-da! it now shows up as a search result...

Sounds like they don't index the full corpus

Or there is a time bound or other resource bound that they are willing to expend under the current circumstances ( are you a free user? Paid user? Internal user? Mobile? Web? Etc )

Which isn't a bad thing as long as it is communicated. If Google is limiting functionality somehow, let me know if there is a pay to play to enable everything.

I like how Wolfram Alpha does it. You get a certain amount of compute time for free, and if you've got a paid subscription, you get a known amount more. It works well, and I don't mind paying a little to get a more reliable service.

yup. this.

Yep I have the same issue.

Hate to side with the big guys, but it's a free service. Beggars can't be choosers. They probably dump indexes after a while for content older than x. Seems fairly reasonable actually.

Then there's not much point to using Gmail. That was a huge part of the "never delete anything again" ploy.

> Then there's not much point to using Gmail.

Almost there...

You're paying with your data, same as with Facebook and many others. These extremely successful businesses are obviously able to make plenty of money using that data.

Maybe your data isn't worth what you think it is these days. Maybe they actively index what is recently used. It's cost benefit for them.
