Hacker News
Fewer Than Half of Google Searches Now Result in a Click (sparktoro.com)
752 points by adamcarson 7 days ago | 513 comments




I think what folks are missing is that a lot of these "zero-click" searches happen as a result of Google scraping your website, and displaying the results as a "featured snippet."

Yes, they link to you below the featured snippet.

No, more people don't click, because they've taken the answer from your website and displayed it right in their search results.

For example: If I'm searching for "best nail for cedar wood" Google gives me the answer: STAINLESS STEEL - and I never had to click through to the website that gave the answer: https://bit.ly/2MdovdP

• Yes, this is good for users (it would also be good for users if Netflix gave away movies free)

• Overall, the publishers who "rank" for this query receive fewer clicks

• Google earns more ad revenue as users stick around on Google longer

Ironically, Google has a policy against scraping its results, but its whole business model is predicated on scraping other sites and making money off their content - in many cases sending no traffic (or significantly reduced traffic) to the publisher of the content.


> No, more people don't click, because they've taken the answer from your website and displayed it right in their search results.

It's for this reason that I've stopped embedding microdata in the HTML I write.

Microdata only serves Google. Not my clients. Not my sites. Just Google.

Every month or so I get an e-mail from a Google bot warning me that my site's microdata is incomplete. Tough. If Google wants to use my content, then Google can pay me.

If Google wants to go back to being a search engine instead of a content thief and aggregator, then I'm on board.


I just got one of those emails for the first time, about my personal site, which is basically my resume. Apparently my text is small on mobile (it's not...) and some other crap.

I don't get why Google thinks it's acceptable to critique my site unprompted. It honestly just feels rude. They want me to do a whole bunch of micro-optimizations on a site that already works fine, because it doesn't fit their standard of "high quality". I think I've gotten exactly 0 clicks from Google search results ever, and I don't really ever want any.

If it were possible to get a human's attention at Google I'd start sending my own criticism their way but of course it doesn't work like that...


I was curious what it was complaining about, since https://henryfjordan.com looks great to me. I tried to run it through Google's "Mobile Friendly Test" but fetching failed [1] because your robots.txt has:

    User-agent: *
    Disallow: /
This would explain why you've gotten zero clicks from Google (or I would guess anyone else's) search results!

On the other hand, it's surprising that you would get a notification if you had crawling disabled. Did you set this robots.txt up recently?

[1] https://search.google.com/test/mobile-friendly?id=97_WUiIxx-...

(Disclosure: I work at Google, commenting only for myself)


Google seems to see robots.txt as "more what you call guidelines, than actual rules". Sites that block Googlebot or all bots via robots.txt still turn up in Google searches, just without a description, and are obviously still indexed.

robots.txt is a tool to control crawling, not to specify how you would like your site to be displayed (or not) in search results. If you don't want search engines to include your site, set:

    <meta name="robots" content="noindex">
while to block just Google do:

    <meta name="googlebot" content="noindex">
See https://support.google.com/webmasters/answer/93710
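The crawl/index distinction can be demonstrated directly. Here is a minimal Python sketch using the standard library's robots.txt parser; the URL is illustrative:

```python
# robots.txt controls crawling only: a blanket Disallow forbids fetching
# pages, but says nothing about whether their URLs may appear in an index.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# A compliant crawler may not fetch anything on this site...
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # False

# ...yet nothing above prevents the URL itself from being indexed via
# inbound links; for that you need the noindex meta tag (or header).
```

This is why a `Disallow: /` alone cannot keep a linked-to page out of search results.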

If Googlebot is not respecting robots.txt, and is crawling something it's been instructed not to crawl, let me know and I can file a bug?

(Disclosure: I work for Google but not on Search, speaking only for myself)


But that requires that Googlebot be allowed to crawl the page in robots.txt in the first place.

How do you tell Googlebot to not crawl your site and to not index it either?

Previously, one could use the undocumented "Noindex" directive in robots.txt, but this will be disabled soon: https://webmasters.googleblog.com/2019/07/a-note-on-unsuppor...


The bot doesn't need to crawl your site for it to be indexed; it crawls other sites that link to yours.

You can specify your index preferences in Webmaster Tools. Don't know if there's a domain-wide off switch in there, but there probably is.


Using Webmaster Tools is not a good option since it requires you register with the exact company you are probably trying to not interact with.

The blog post you link has a bunch of alternatives, but I agree they're not great. If there are a lot of webmasters who want to be able to noindex through robots.txt then making the case for adding noindex to the standard would be a good next step.

(Still speaking only for myself)


Googlebot actually used to support a noindex rule in robots.txt, but they are removing it.

https://webmasters.googleblog.com/2019/07/a-note-on-unsuppor...


Yes, that was linked above. It looks like this is part of reducing support to what's in the spec?

Oops, yep. I didn't see that context.

I sent you an email, and I'm posting it here but without identifying info:

---

Hi Jeff,

Thank you for your comment. I'm replying via email to send some info I'd rather not share on HN, but I'll post the same, redacted, on HN. Back when I was starting my web-dev career, I ran a one-man development team at a web agency, and all our development/pre-prod sites (that had to be unauthed) had robots.txt files disallowing all bots, but they still popped up in Google. Searching some of the old domains in Google I found an example here: http://***.***/***, and attached is an example of it showing up in a SERP and what the robots.txt looks like (and I'm pretty sure that the robots.txt has looked like that since the page was created).

In this case it is just one page that nobody will care about, and since I'm no longer working on projects that are open but "robots.txt hidden", I don't know if it is as bad as it used to be. But I regularly see pages with "No information is available for this page" whose domains have robots.txt files that disallow all bots, yet they still show up in Google.

Please let me know if I missed anything :)


Thanks for sending the screenshot! That site shows up with "No information is available for this page", which means that while robots.txt has disallowed bots from crawling it, the page is still linked from other pages that do allow crawling.

The robots.txt protocol gives instructions to crawlers about how they should interact with the site. If you instead want to give instructions to indexes, use the noindex meta tag.


You're right, I was wrong about how to expect a "Disallow: /" to work. But isn't it sorta odd to have a protocol to control crawling (which is usually done to index) but (almost) require a compliant indexer to crawl all pages to comply with the indexing rules?

In this example the robots.txt has clearly told all bots not to crawl the site, but the only way to read the meta tag (or the equivalent header) is to crawl the site. So I assume that in this case Google either assumes it is fine to crawl URLs it has found elsewhere while ignoring the robots.txt, or it assumes that pages disallowed by robots.txt are "open for indexing/linking" - which would mean that any page both disallowed by robots.txt and carrying a noindex meta tag would still show up, right?

What is the intended behavior if a page is disallowed by robots.txt and still linked by another indexed page? Will it get crawled or just assumed to be okay for indexing/linking? Is there any way to tell Google not to index/link and not to crawl?


If you have a calendar where every month links to the previous and next months, a crawler can get stuck and hammer the server. That's the kind of thing robots.txt is for.
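For that kind of crawler trap, a robots.txt rule scoped to the problem section is all that's needed. A sketch, where the /calendar/ path is hypothetical:

```
User-agent: *
Disallow: /calendar/
```

This keeps compliant bots out of the infinite pagination while leaving the rest of the site crawlable.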

>"more what you call guidelines, than actual rules"

They can index without scraping. It is enough that other websites link to your site, so Googlebot can follow the rules in robots.txt to the letter. "noindex" is the way to stay away from Google.


They can't read my noindex if they obey my robots.txt. Do they break the robots.txt to be able to read my noindex, or do they assume my "Disallow: /" means I'm fine with them indexing/linking?

Without the noindex rule in robots.txt (which Google decided to stop supporting not long ago), this is not solvable.


Oh, I just added that yesterday in response to the email. Before that I was actually running Google Analytics, but since I get basically 0 clicks it wasn't really useful.

I have a feeling the PDF viewer triggered it, because on mobile it defaults to showing the whole page, which results in tiny text. But that's easily fixed by the user, so I prefer to leave it like that.


Yeah, it's amazing how rapidly and rabidly they show up when the complaint is about one of their paid products, like a Google Cloud (GCE) post concerning them or a competitor, but nada on the other products. Well, no, it's actually not surprising.

Google cloud employees are encouraged to go on social media to get a feel for issues users are having and to make the product better.

The rest of Google has a policy of "Engineers will probably say the wrong thing if we let them talk in public"


Google has grown into a cancerous middleman.

> If Google wants to go back to being a search engine

While I understand the problems with Google scraping content, as a user these snippets help me find what I'm searching for faster. If that's all you're optimizing for, Google is fantastic. There are certainly good arguments to be made for other models, but for search, stealing content helps. I'm not advocating stealing content, I'm just saying that it produces more useful results.


How do you know that the content Google features is the best there is? If we stop clicking on sites and just rely on Google to provide us the content we'll go down a very slippery slope.

I don't really see how this problem is any different from "how do we know the #1 search result is the best content there is?" If it provides the information you want, great; otherwise you load #2.

Google lends the weight of its authority to the answers it presents. It's one thing if Infowars says that Obama is planning a coup against Donald Trump, it's another if Google says so.

Try googling "root M89 tablet".

The first three results lead you to fake Android blogs telling you how you can easily root every Chinese Android device, and specifically the M89 tablet...

The real authoritative result (xda-developers) only appears in fourth position, out of sight. It will tell you that if you follow the instructions given in the fake blog posts from the first two or three results, you will brick your tablet.

In a similar way, the word "cbd" (for cannabidiol) has been hijacked by dubious commercial companies through fake blog posts filling page after page of Google results, telling you how great CBD is for the treatment of every disease on earth... but there is no trace of an actual study in these results. You have to search for the less popular word "cannabidiol" to start seeing serious articles about it.

Google results can be hijacked, and Google does little about it. Maybe because the ads shown in these fake blog posts are from Google's ad network? I don't know...

But Google's results have clearly deteriorated over the last few years, and the company's authority is no longer what it once was.


I know that sort of thing happens sometimes (Google presenting a spurious statement as a categorical answer) but those are bugs. As long as they are very rare, and fixed quickly when they occur, I don’t see them causing much harm.

OK, some people believe anything they read (especially if it confirms their existing biases), but that problem has always existed. I think Google’s occasional snippet fuck-ups are a drop in the ocean compared to the spread of false information through social networks.


There's the modern news-cycle axis, where Google can and should devote full-time engineers.

But the long tail is important too. It's fixed now (yay) but for years you could search for "calories in corn" and Google would confidently present an answer 5x the true value, scraped from a site with profoundly wrong information. As Google moves to present more direct answers and fewer links, this risk increases.

It looks like they have backed off on the direct answers somewhat which is good news.


If Google undermines the websites producing the content it scrapes by not sending traffic through, then those sites may not continue to exist.

This is already happening.

Very few new blogs and content websites are being set up.

All content is moving into apps and walled gardens. Part of the reason for that is that running a well researched blog will never pay for your time, so becomes a hobby thing, and most people are fine to use Facebook for that.


> Microdata only serves Google. Not my clients. Not my sites. Just Google.

Well it also serves Google's users, to be clear. Though I should also be clear that I don't think that justifies it, since I think it's bad for the ecosystem in more subtle ways than are expressed in immediate user satisfaction.


That depends on how you define "users". If you define a website creator also as a Google user (by virtue of wanting to be found through Google), then Google is serving part of its users to the detriment of their other users.

And if you view Google instead as a connection broker, e.g. a middle-man between publisher and consumer, then Google is destroying their own business by snubbing publishers. Assuming that Google is still making rational, intelligent decisions, it follows that Google no longer sees itself like that.


Did Google ever see itself as prioritizing publishers and consumers equally? I think that’s a false premise and the parent is right; Google’s priority has always been consumer first.

> If Google wants to go back to being a search engine instead of a content thief and aggregator

A search engine is inherently a content aggregator; the functions are inseparable.


Not necessarily. Google used to be more of a link aggregator. There's a difference, as the OP proves.

Google (and virtually every other search engine) has always included content with links. What's different now (though not unique to Google, they are perhaps the most advanced at it) is that it now algorithmically synthesizes content instead of merely aggregating it.

It does help your clients.

I mean, maybe not yours specifically. But snippets are great for users in the typical case.


These users are no longer his clients.

On top of all that, Google's snippets aren't curated and therefore, aren't always correct. They can be (and almost certainly are) gamed. Users that don't click through open themselves up to carrying on being misinformed.

I've found them to be incorrect so often that I now click through to the actual page or find a better link. I don't trust just the blurb for any answers any more.

> I don't trust just the blurb for any answers any more.

I don't, either.

A site I used to own had a discussion forum on it. It contained a message along the lines of "Real Estate Agent X is a great guy. Real Estate Agent Y is a complete sleazebag."

The blurb that Google displayed for it was "Real Estate Agent X... is a sleazebag." And that was the first result for anyone who searched for that agent's name.

As you can imagine, I received many angry e-mails, phone calls, and legal threats. No, you can't explain to angry people that it's "just" an algorithm that told the world that they're a sleazebag.

I ended up editing the post so that Google would display a different version after its next scrape.


I think there's more to this... Google uses lots of fancy natural language processing to extract that data, and unless the wording was very tortuous, I doubt it could make such a big mistake by chance.

They can get it painfully wrong. I came down with something like optic neuritis a few years ago; it's often one of the first signs of MS in many folks. When I googled something like "MS life expectancy", the blurb said something like "3-7 years" -- with only the subtext indicating it's 3-7 years LESS than average, rather than "you're kicking it in 3 years".

Turns out I didn't have optic neuritis.


They suck. And something about the way they are presented seems to make people believe them.

I think it gives that one-shot answer to questions people have, even when the real answer is nuanced and multi-faceted.


I think they're believable because Google started by providing things that weren't wrong. If you search for a time zone, Google shows it in your local time; if you search for a currency conversion, Google does that. All those things it's done for ages were also typically correct.

Then the snippets show up, presented in a similarly trustworthy fashion. But the snippets are really just the result of whichever site has the best SEO, and that's often a worthless metric these days. The time zone and currency stuff is easy, because it's math, but opinions aren't. The thing is, even if Google didn't have the snippets, the sites that get snippets would still be the top results we clicked, and we'd still get the wrong information. That would probably be better, because it might be easier to spot obviously bad sources, but I still think there is a fundamental flaw in how SEO professionals have learned to game the Google bot to bring the world useless information.

I mean, part of it is certainly on Google. No one in their right mind wants to comply with Google's ranking terms unless they make money from Google searches. Which means a lot of useful personal blogs have dropped off the face of the internet, unless you're really lucky and see them linked on a place like HN.

I wish libraries would band together and make a privacy focused and curated search engine, because librarians are actually kind of good at finding you the correct information.


It sucks. Sometimes the bold text is the exact opposite of the answer to the query I search for. It’s very misleading unless you click through and read the full context.

Yeah. I personally like the feature, in theory, as an end user, but the signal:noise ratio for it has not been great for me.

This is especially true where the answer is time-bound, which happens a lot in technical topics. Many times the snippet is for an earlier version of the language (but still with a high PageRank) or of the operating system (especially Android settings), and the most annoying of all: an ancient answer in an undated blog post.

Google is good at dating undated content. They keep track of the first time they've ever seen a bit of text, and assume it was composed then, even if it later gets copied to other sites.

For example, this WaPo story about YouTube sending some medical queries to videos featuring quack remedies and anti-vaxxer misinformation.

https://www.washingtonpost.com/lifestyle/style/they-turn-to-...


For a recent search, "report amex card stolen", Google showed a phone number for a scam that asked for a Social Security number as soon as you called.

The websites in the results aren't curated either. Clicking through to the site could provide the same incorrect information.

The point is that Google frequently adds another level of incorrectness, that may not be identifiable without checking the source. This is pretty common on Wikipedia, and when people link to things in discussion forums, as well.

And anything Google does, is done at vast scale, which makes me, at least, think it might be substantially affecting society.


But that's the responsibility of that website. Of course it's bad if Google lists a site with wrong information as the first hit, but I think it's worse when Google blindly copies that false info and lists it as their own zero-click result. By doing that, Google itself takes responsibility for the information.

Although sometimes the site is actually correct and Google still gets it wrong by copying the info incorrectly or losing some context or qualifiers.

I loved zero-click results back when DuckDuckGo first introduced them, but I'm less enthusiastic about Google's implementation of them.


Sometimes the blurb just has an answer to a different question. Websites are curated, unless it's spam.

> Websites are curated, unless it's spam.

Yes, but even when they are curated the curators are usually unreliable and sometimes malicious.


Snippets are just a reflection of that. How is Google faring better in that respect?

Those are sites Google has chosen as correct.

> On top of all that, Google's snippets aren't curated and therefore, aren't always correct.

The “therefore” is misplaced; curated snippets aren't always correct, either.


People on the web take the risk of being misinformed, clicking or not.

It's important to note that this is strategically incredibly important for Google, because it forms the backbone of their voice AI. The better they get at answering questions directly, the better their voice AI becomes, and that leads to a lot of future products.

AdWords is and always has been the goose that lays the golden eggs, none of Google's other initiatives have ever rivaled that revenue. That's why they put so much effort into bolstering and optimizing their search results pages.

Another reason is the use of add-ons such as "Google search link fix - Prevents Google and Yandex search pages from modifying search result links when you click them."

I stopped using Google a few years ago, but just in case, I keep this (or similar) add-ons in my Firefox.

I have no idea how popular such add-ons are, but they would also impact the tracking that Google does.


Oh my God! This is so useful! I hate that I can't right-click on a search result to copy a URL. We definitely used to be able to do this, didn't we?

It's been this way for ages, although for Chrome (IIRC) this is managed via hyperlink auditing [1], which allows Google to track what you're clicking even though the link appears 'clean'.

The click-through Google redirect also allows them to track things like relevancy of the content and time on site (if you return to the Google SERP by clicking the back button), in case the target site isn't using Google Analytics (unfortunately most sites do).

[1] https://html.spec.whatwg.org/multipage/links.html#hyperlink-auditing

Hyperlink auditing can be blocked with uBlock Origin / uMatrix


Hmm, right clicking and copying works for me in Chrome and Safari. I just tried searching for "test" and the first result is marked up as:

    <a href="https://www.speedtest.net/"
       ping="/url?...">
Looking at https://caniuse.com/#feat=ping it looks like ping is supported in Chrome, Safari, and Edge, but not Firefox; are you using Firefox?

(Disclosure: I work for Google)


I use both. I'll use this "Google search link fix" extension in Firefox until search results links aren't proxied.

Don't like the product? Switch. While you still can.

Any search engine is going to want to know what people click on so they can make their product better. For example, I just searched for [test] on DuckDuckGo and when clicking on the first result I see DDG sending a ping back:

    https://improving.duckduckgo.com/t/lc?...
which contains which URL I clicked.

(Disclosure: I work for Google, speaking only for myself)


That's not true, for instance Startpage doesn't do that.

Startpage is an anonymizing proxy for Google Search, not a full search engine. Crucially, it doesn't determine how to rank results. If they decided to try to compete with Google, Bing, Yandex, DDG etc directly by bringing ranking in-house they would have a very hard time serving good results without being able to track which of their links were popular among users.

I consider myself privacy-conscious and have add-ons like Multi-Account Containers, Cookie AutoDelete, uBlock Origin, and Privacy Badger working in tandem.

It's embarrassing that I wasn't aware of this extension, given how useful it seems - thanks!


How safe are all these plugins we install to escape tracking? Are we trying to escape big-tech tracking only to hand our information over to extension developers? Looking at network traffic often shows a ton of extensions sending data to some AWS server almost perpetually.

Asking because I'm not sure of the answer, and lately I've become even warier, so I decided to uninstall everything except things I absolutely must have, like ColorZilla, Grammarly, and full-page screen capture. For ad blocking I use Brave and never touch Firefox, Opera, or Chrome.

There's an extension that appends a share=1 parameter to all Quora links to prevent them from forcing you to sign in to view a post. I like it, but I'm trying to minimize my extension footprint and I'd rather write my own script to perform the same task.
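The URL rewrite such a script performs is tiny. A sketch of the transformation in Python, assuming the extension simply appends the parameter (the example URL and function name are made up):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def add_share_param(url: str) -> str:
    """Append share=1 to a URL so Quora renders the post without a login wall."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # preserve any existing parameters
    query["share"] = "1"
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_share_param("https://www.quora.com/Some-question"))
# https://www.quora.com/Some-question?share=1
```

In a browser extension or userscript the same rewrite would run over every anchor on the page, but the core logic is just this.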

The question is, how do you get to be sure that an extension is safe?


Then the snippet should just be used for voice search. And websites should opt in to the program.

> Google has a policy against scraping their results, but their whole business model is predicated off scraping other sites and making money off the content

Yea a couple days ago I was checking the Places API, which they’ve built off user-generated content and scraping Yelp and others. They charge $17 / 1000 calls for certain items and don’t you dare cache anything for too long.

Great way to build a business: get data for free, wall it off and put a hefty price tag on it, then put your best lawyers around the moat for good measure!


I downloaded all the places data for the world while it was still free. In my jurisdiction, the data is considered owned by the place owners rather than Google, so I doubt they'll come after me.

That's the ancestry.com business model as well.

I disagree. There is an implicit contract between website publishers and search engines that it’s ok to do this. The website can set nosnippet in robots if they want to not have the snippet in search results.

So by having a website, I implicitly agree to Google's search practices?

That doesn't seem right.


You put a resource on an open network and don't use any of the standard, recognized methods to indicate don't index, don't share, (nor lock it away with auth).

It's like putting a sculpture in your front yard and getting upset when someone points it out on their neighborhood tour - even worse, because yard ornaments don't have a standard, accepted way of saying "don't use".

Two choices

1) use robots.txt

2) don't put it on the internet


> You put a resource on an open network and don't use any of the standard, recognized methods to indicate don't index, don't share, (nor lock it away with auth).

This is the kind of argument people used to use as they flagrantly violated your copyright by cloning your article on their own site. "You put it on the Internet, so it's free for everyone to copy."

The law says no such thing, at least not in any jurisdiction that I'm familiar with. Contrary to popular belief in some quarters, normal laws do still apply on the Internet.

If you infringe copyright, it's still infringement even if what you copied was freely available on someone else's site.

And if you state something that is misleading and harmful, it might still be defamation, even if what you stated was just an automatically generated snippet that takes a small part of someone else's site and shows it out of context.


Nah. Take it easy here; there is a long way between indexing and showing the most relevant hit, and outright lifting big parts out of the site to use on their own property.

It is more like the guide that used to send visitors to your property has set up their own booth on the best spot on the sidewalk next to you, and is raking in money because of the (often useless, in the last few years) ads they have plastered all over it.

Even if it is an educational non-profit resource, you don't want that, as some of the details get lost when visitors only read the guide's summary instead of taking a closer look for themselves.

And according to people in this thread, they will also complain and/or offer suggestions about how you can make it even more useful to them.


> It is more like if the guide that used to send visitors to your property

Here is where your argument falls apart. The web is a public space - it's not your property or your front yard. It's more akin to going to the town square wearing a fancy hat and getting upset if people look at you and your weirdly shaped headwear.


> The web is a public space - it's not your property or your front yard.

You're wrong here. Just because it's a public space does not mean nobody owns the property. As a simple example, a shopping street is usually a public place. That does not mean that all window displays, doorways and adjacent buildings are automatically a free-for-all.

In fact, only "the tubes" of the web are a public space. The rest is owned property, even if there are no visible fences.


I think of it more as: if you put a banner with content somewhere in public, and I take a photo of it, what can I later do with that photo?

And for that, it's a question of copyright. It turns out that in the US, something being publicly available does not make it part of the public domain. The original author still retains copyright unless explicitly stated otherwise.

There is an exception to this, though, called fair use. For that, I'd recommend reading this: https://amp.theatlantic.com/amp/article/411058/ Google's book-search snippets were deemed fair use.

So the question remains: would website snippets similarly count as fair use? What would the federal courts rule? When it comes to fair use, that's the only way to know.


It's worth pointing out in this context that the US legal concept of fair use is not universal. In fact, unusually for US IP laws, it's actually much more permissive than most other places. The more usual practice is to enumerate specific situations where copying without the copyright holder's consent is still allowed, instead of defining general tests, which is how fair use works. This has been a controversial point, because it's not clear that the US scheme is sufficient to meet its obligations under international treaties.

In answer to your final question, I'm not sure whether this use of snippets in search engine results has been tested in any US courts yet, but the issue of search engines showing enough content from the sites they link to that users never actually go through to the original site is sufficiently controversial that the EU's recently passed copyright directive includes specific provisions aimed at exactly that sort of situation.


Laws everywhere pretty much say your take is wrong. There is no such thing as an implicit contract, and your take on it is plain victim blaming.

It is very surprising to read this on a board where many people write code: if a dev found unlicensed code, they would certainly not think it is public domain.


It's a devil's bargain. If you opt-out of snippets, it simply means somebody else claims the top spot, and you are left with even less traffic (by a significant amount)

> If you opt-out of snippets, it simply means somebody else claims the top spot

Citation? I thought snippets are just for display, not ranking.


Snippets link to the source URL, so getting into the snippet gets your link to top of the page.

You don't have to inform anyone about your content not being redistributable, that is not how copyright works.

> nosnippet

TIL. That's actually a good idea. Does that eliminate all kinds of snippets? NOARCHIVE may also be of use.
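For anyone wondering what that looks like in practice, it's just a robots meta directive in the page head. A sketch (the nosnippet and noarchive directive names are the ones Google documents for its robots meta tag; scoping to "googlebot" rather than "robots" limits it to Google):

    <!-- Ask Google not to show a text snippet or cached copy of this page -->
    <meta name="googlebot" content="nosnippet, noarchive">
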


> I disagree. There is an implicit contract between website publishers and search engines that it’s ok to do this. The website can set nosnippet in robots if they want to not have the snippet in search results.

Who made this contract? I never signed one. If I came to your place of business and copied your content and provided it somewhere else, I would be infringing your copyright. Do I have to put up signs specifying that at my place of business? Why is this any different? My web content is not the property of someone else and by publishing my information that is in no way an implicit grant of the right to reproduce it.


I believe citing small pieces from a large text is covered by fair use.

It depends. One of the criteria for acceptable "fair use" is that the usage shouldn't negatively affect demand for the original source.

Although there are other criteria to consider, Google's snippets clearly violate that particular tenet.

See #4 https://fairuse.stanford.edu/overview/fair-use/four-factors/


It's a Faustian bargain. Google is so powerful you can't do without them, but they're also inexorably eating your future.

> Google is so powerful you can't do without them

I wonder how true that assumption really is any more. The quality of traffic Google drives to sites I operate is very low compared to all other major sources, with much less engagement by any metric you like, notably including conversions. The only reliable exception is when we're running marketing campaigns in other places, which often result in spikes in both direct visitors landing on our homepage and search engine visitors arriving at our general landing pages.

There is this conventional wisdom that SEO, and in particular playing by Google's rules to rank highly in its results pages, is the only way you can run a viable commercial site these days. Our experience has been exactly the opposite: our SEO is actually quite effective, in that we do rank very highly for many relevant search terms, but it makes a relatively small contribution to anything that matters. And really, when I write "SEO" here, I'm only talking about general good practices like being fast, having a good information architecture and working well on different devices. We don't change the structure of our pages just because Google's latest blog post says X or Y is now considered a "best practice" or anything like that.

Of course I have no way to know how representative our experience is. YMMV.


It is a very significant part of our business.

Yes you can. There are other ways to market yourself and your website. For instance, the author of “Fearless Negotiation” has appeared in four or five podcasts I follow. The well known pundits in the Apple ecosystem grew an audience organically through word of mouth.

Hoping to stand out in Google results as a business plan is a recipe for failure. You are one algorithm change away from going out of business.


> There is an implicit contract

Then why can't publishers scrape google?


From http://www.google.com/robots.txt:

    User-agent: *
    Disallow: /search
    ...

It should be opt-in.

So they're like a modern eBaum's World for the information age.

Interesting way to put it - the biggest bully with the most money wins!


Funny to read in the HTML:

- This site is optimized with the Yoast SEO

- This site is optimized with the Schema plugin

Yeah, optimized to death


Glad the first ranked response was this. It's what I came here to say. These days you simply don't need to click as often to get what you need out of a search, and Google's business model doesn't rely on click-throughs to web sites, but on the display of, and clicks on, ads.

I'm still on the fence somewhat.

Searching for "best car engine oil" has certain brands displayed straight on the featured snippet. Who cares about the click if Google found your customer for you and got your message through for free?


In the end, Google should care. If a search for "best car engine oil" got your product featured, that means you won a sale. But assuming the sale happens completely offline, Google lost its opportunity to inform you of the search, and of the successful search->sale conversion.

That means your marketing department can no longer justify investing money in Google SEO, which means less optimization towards Google's crawler, which means less reliable search results, which means less Google searches in the long run.


Increased profits from unknown sources VS decreased profits from known sources. The gained marketing intelligence may come at the cost of the bottom line.

Feel free to add them to your robots.txt; they won't scrape you then (but they won't index or rank you either).
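For reference, that opt-out is only a couple of lines; a minimal robots.txt that blocks Google's crawler entirely would look something like:

    User-agent: Googlebot
    Disallow: /
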

I zero-click search more often than I click, for reasons including but not limited to: 1) getting the correct spelling of a word that spell check can't find a suggestion for; 2) avoiding sites that might give me malware (example: searching music lyrics); 3) avoiding slow-loading, bloated pages.

"• Google earns more ad revenue as users stick around on Google longer"

This one is actually reversed. Google Search doesn't net Google any money if people don't actually click a link, since ad revenue for Google Search is per click, not per view (per mille).

The incentives for them are actually reversed - increasing the amount of clicks into external websites, specifically advertised links, increases their revenue. (which is why there are so many advertised links on a search page)


I do a fair amount of grammar and spelling searches. Google often displays tips and examples. And typing "sp500" displays a stock chart right in Google itself. Google has a lot of "instant snippets" like that. Quite convenient. However, near-monopolies do make me nervous about supporting them.

Does it matter if they have a policy against scraping? I thought that was explicitly legal, which enables their existence.

As I mentioned before on HN, this guy predicted a "Google SEO bubble": http://rajeshanbiah.blogspot.com/2018/01/technology-predicti...

Great point, this was my first thought also. Google has been doing a slow creep of this type of content for the past few years through the featured snippet you mentioned, and other knowledge panel material. They now serve sports, weather, math, translation, flights, etc.

I actually searched for best nail for cedar this afternoon :-P. I clicked several of the articles though...

Speaking of scraping, does anyone know where one can get a hold of full text news articles/press releases for nlp research? Most APIs that I have found only offer partial texts.

I know that Aylien has an API for this but it's out of my price range.


Why is this not copyright infringement?

If I recall correctly, content (news especially) publishers and some Europeans were very angry about that. I think the consensus was that these businesses don't understand the internet.

in that case the news sites did get the click, but they wanted more

How do they get the click? If so, what is the fair amount of clicks that businesses should get?

When I go to Google News, there are no snippets, just titles linking to newspapers.

Shouldn't they be liable for mis-information? Wouldn't that solve the entire problem?

I don’t see any way to actually achieve this at scale, let alone any reason to add an opening for more pointless lawsuits. Let's say they're liable and you choose to act on incorrect information received for free. Do you really try to take them to court, and on what grounds?

Yes, Google and similar companies should be 100% responsible for anything published on their platforms. No more “safe harbour”. They have chosen to take positions in many issues, that makes them more like newspapers than phone companies.

Positions like what? And no, banning radicals from their platform for violating their terms of service is not a position.

Even if they were responsible, it's still legal to lie. You don't see pseudoscience websites being taken down because they are objectively false either.


It’s OK for the NYT to attempt to “prevent another Trump situation”. They have an editor and that person is legally responsible for what they publish. They don’t even pretend to be non-partisan. But Google takes a position then hides behind “common carrier” status. It’s not reasonable that they can pick and choose. Either they’re the phone company or they’re a publisher. It’s their right to be either of course, but they must choose.

Similarly for Twitter.


> It’s not reasonable that they can pick and choose. Either they’re the phone company or they’re a publisher. It’s their right to be either of course, but they must choose.

This is 100% wrong, the opposite is true. The law explicitly protects website operators from being liable for content posted by 3rd parties while simultaneously granting them the explicit freedom to curate content that they deem objectionable.


No content on Google is posted there by 3rd parties. Google does select what is displayed and, in the case of snippets, they go out of their traditional way to promote that content.

The use of the word "post" is my own colloquially imprecise language, the law actually states

> No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.

So content indexed by Google absolutely falls under the definition of "provided by another information content provider".


Providing links is indeed within this definition. However, cards go beyond that: selecting one result out of the many and promoting it, and possibly altering its meaning by choosing which parts are shown and how they are displayed, goes far beyond merely displaying content provided by others.

Of course, there is plenty of room for Google's attorneys to wiggle, but in the end the objective for them is to 1) give credibility to a source and 2) get the benefits of being the provider of information.


Common carrier and safe harbor are 100% separate and distinct concepts. The same way that a forum could have a theme ("political party X posts only") and still be allowed to remove illegal content is Safe Harbor (both curation at their discretion and no responsibility for illegal posts) - and I don't see how one could be against that - and Google is nowhere near that, whatever "positions" you envision them to have taken.

Google and other tech never claimed to be common carriers, and even internet service providers have been cleared of that status - barely anyone is legally required to transmit without discretion (it's pretty much just phone companies). So why make it about Google and Twitter, and start with ISPs?


Stay tuned for the next batch of revelations from Project Veritas.

They're not like either. If anything they're like a phone book for URLs instead of phone numbers.

Except this is a phone book that sorts not alphabetically (no pun intended) but according to its own interests. No phone company ever did that.

Of course it doesn't sort the internet alphabetically; that'd make no sense and be a bad user experience, as well as optimize for URLs starting with A.

I don’t mean literally alphabetically but according to some objective measures. In the old days it was by incoming links (PageRank). But now it is opaque and many people are finding that it orders by whatever is best for Google, not for the user.

There is not nearly enough room on the front page for everyone who wants to be there; Google has to make subjective decisions about what shows up there. It's impossible to do it any other way.

They could randomize it. Allow everybody to be on the front page an equal number of times.

Then it's spam farms as far as the eye can see, because they can enter a hundred times as much as everyone else.

I don't think that's a bad idea, but the vast majority of google users would not desire this behavior, especially the way google is used today where users try specific terms to relocate content they have looked up before.

It’s called the Yellow Pages. The more money you spent, the more noticeable your business was.

On top of that, every locksmith and towing company had names like "AAAAA Aaron's Locksmith".


Yet there is no spam in the Yellow Pages. It’s very unlikely that if you call Aaron he’ll clone your credit card or install hidden cameras in your house. Also, it’s very likely that he actually is a locksmith, has the accreditation he claims to, is a legitimate business registered at Companies House, fully insured, all the things you expect of a normal business.

Hold on. Google doesn't earn even a penny when you visit their site, find your answer on the search results page, and then leave. That user behavior COSTS Google money, it doesn't earn anything.

If they were trying to monetize you they'd show you an ad that links to your answer and take a profit on the click. Directly giving the user the answer they want is great for the user, but guarantees that Google won't earn any revenue.

So why does Google do it? Simple: because their competitors do. That's the free market for you. Google didn't start that feature, another competitor did; Microsoft made it their primary differentiating feature, in fact (remember the "Bing and decide" ads?). Google had to adopt the same behavior or lose their customers.

So no, don't blame Google, blame capitalism. This is precisely the kind of feature that you wouldn't get if Google was able to behave as a monopoly.


I think "hypocritical" is a more appropriate description than "ironic".

That is correct and should be considered copyright infringement... I am so tired of the double standard in the US of people vs. corporations... Corporations are considered better people than real people.

This is good for users ... for now.

But as Google sucks up the consumer surplus, it's going to be harder and harder to make money from internet businesses, and the final result a few years down the road will be toxic.

The internet isn't going to work too well if it's solely reliant on hobbyists.


They could, but the hobbyist sites are no longer in the SERPs.

The funny thing is this used to happen. In the early days, you'd ask a simple question and get the answer in the search results, before they introduced featured snippets. The problem was, because no one was clicking through to these useful sites, they were downgraded in the listings in favor of sites that hid the useful info so you had to click on it.

This is the subtle truth that I've seen a few folks on Twitter talking about for the past year or so: That Google has slowly but steadily reduced both the outbound clicks to other websites, but also the portion of their revenue that's based on ads hosted by other websites, while bringing both the "results" and the ad placements in-house, where they no longer have to pay out a share to site owners.

Whereas Google was previously a way for sites to be discovered and for sites to generate revenue, it is increasingly becoming the sole source system where data is scraped and imported into Google, and Google keeps all of the revenue to itself.


The increasing sprawl of non-search widgets invading the search result page reminds me of the AOL years where "the web" was funneled through a narrow portal controlled by one entity.

Having to scroll down past ads, unrelated news, unrelated youtube videos, and ever more of these info boxes has pushed the actual content I'm looking for out to the second page. It's made it much easier to use ddg as default and use the !g flag only when absolutely necessary.


And the downfall of AOL was Google, because Google had a better product. In order for Google to fall, you need a product that's better (or more of what people want; you need to give them a reason to change their search engine).

I have no trouble believing Google is going to fall.

Their results have gone into the toilet - I ragequit Google search about once a day and do something else like forum searches.


I stick with Google but increasingly try to tweak my searches to hit forums. They're just that much less likely to be made-for-AdSense content by a copywriter paraphrasing other information from the web.

Do you have any tips for this sort of search tailoring? It's something I try for, but I've yet to find any particularly good keywords to leverage.

Usually I add "reddit" to the search phrase and try to find threads / user-generated and hopefully more organic content this way.

reddit is good, and so is just "forum" which will turn up specialty forums that haven't been absorbed by one of the Borgs yet.

I often just add "whirlpool" which is a fairly reliable Australian forum that started covering telcos but these days will have things about cars, home maintenance, personal health, etc. Or I add "forum" and that can be enough to tilt the results.

I usually add site:reddit.com or site:news.ycombinator.com etc. Actually, Google had a way to search discussion groups, but they removed this feature because forums don't pay for their ads, I suppose.

Same. I can't tell if my questions are just getting more specific and technical, but Google search results have been getting pretty useless in the past year or so.

I love how Google likes to completely ignore what I'm trying to search for. I wish I could think of an example because it happens to me often, but I can't, so I'll make one up.

Imagine you're searching for tail lights for your car or something, but you don't know the size, so you search "Astra tail light size". This might bring up headlights. Wrong but no matter, you'd go on to google "Astra tail light size -headlight -head" or something.

What Google seems to have been doing to me recently is ignoring those negated terms, ignoring quotes, and just giving me the same results again and again. It's really getting annoying. Google seems to assume it knows what I'm looking for, and that my search query is just completely wrong and not what I want.

Note that the car stuff is just an example, I'd expect Google to not give you headlights the second time. It generally but not exclusively happens to me when searching things that are more technical. ESPECIALLY when it's a consumer level thing I'm trying to get info on, Google likes to assume it's giving you errors and you're trying to fix it. Which makes sense for most users, but god it's frustrating when every combination of advanced search parameters you try does nothing!

Google search needs a checkbox or something to turn off its cleverness and just do an actual search.


Absolutely. I was trying to figure out something to do with timeouts in an SMT solver called Yices, so I had search strings about signals and alarm and Yices - of course. Google decided that this was a generalized programming question and displayed a lot of stuff about signals and alarm handling that didn't relate to Yices.

How likely a search term is "Yices", ffs? Feels like something that exotic ("statistically unlikely") probably is meant to be in the results by default.


I had no idea what Yices is. So I Googled it - the first link is SRI's Yices SMT solver. I tried "yices alarm" "yices signals" "yices timeout", and all of them showed only links related to Yices in the first result page (various manual pages, types, etc). So my attempt at reproducing your experience has failed.

The top Google hit for "yices alarm" is currently the exact Hacker News comment you've replied to. I wonder if Google adapted its search results based on that very comment? Maybe their algorithms shrewdly give more weight to fixing search results when the context mentions Google ("I googled for...", "Google didn't work when...", etc.) and the site is high profile (like HN). That would be very crafty.

I am kind of happy and sad at the same time to know that it is not just me.

This is SO annoying.


Same here. It seems the websites that show up at the top are becoming more and more spammy and less relevant to my query. I keep seeing all sorts of one-sentence hipster 2.0 sites that want me to believe they are a credible source of information.

... and that product will likely repeat the cycle again, on some schedule or another. Might be the 20+ years of Google, might be the few years that Medium was only modestly annoying, might instantly go to shit.

The problem is the business model breeds for this, and we end up replacing one abusive monopoly with another, until we can break that cycle.

For a time it seemed Free Software might ... free us ... from that, though as even that effort's biggest boosters (Eben Moglen, Bradley Kuhn, RMS) freely admit these days, we've been regressing of late, and at an increasing rate.

What's it going to take?


Searching for "symptom tracker", Google tells me: no, no, you mean symptom checker. No, sorry, I do need a way to track them, not check them.

Anecdotally, I am noticing a steady decline in AdSense RPMs for the exact same audience over the past 2 years, and AdSense now most of the time doesn't fill the ad slot.

Also, it has gotten very hard for new sites to rank in Google. Blackhat SEO tactics rule, and even local businesses use them. Google went from win-win to win-get lost.

Sadly, now that webmasters need it the most, investments in alternatives to search and advertising have dried up. There is almost nothing except G.


I feel the same way about ranking new websites. In high school, friends and I made a website for which we did zero search engine optimization (nor were we even aware of search engine optimization). We still ranked in relevant queries and easily got an audience of about ten thousand uniques a month.

Today, I struggle to get sites I make to show up on Google at all. For my most recent website, even searching for phrases that are unique to my website doesn't cause it to rank. This is more frustrating to me, because I have a Google ad for my website, which drives all of my traffic - so I know Google knows about my website and what keywords are relevant to it.


Google has turned an Ocean into a pond - I think we'll all be the better for it once they're gone.

I regularly cannot turn up results that I know exist - make a modest change that has no relation to the query's meaning, and the results often do turn up.

This is not the Google Search Engine I remember.


I had to google "turn an ocean into a pond" as I was not sure what the phrase entails. I got a few images of small cars and your comment. QED

And googling it now brings up your comment!

I don't know how old you are, so it's hard to tell how long ago you're comparing to. But how do you know this isn't simply because there is a lot more content than there used to be?

https://www.internetlivestats.com/total-number-of-websites/ indicates the number of web sites is still growing at breakneck speed.

Somewhat dated but still relevant:

https://searchengineland.com/googles-search-indexes-hits-130...

says the growth rate for the number of pages to be still substantial.


To your last sentence specifically though, I wouldn’t want Google to use ads data to influence non-ads rankings.

> Whereas Google was previously a way for sites to be discovered and for sites to generate revenue, it is increasingly becoming the sole source system where data is scraped and imported into Google, and Google keeps all of the revenue to itself.

I wondered yesterday: if you provide microdata, Google scrapes it, and you later decide to remove your sites from Google - is Google allowed to keep the microdata and continue to publish it?
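(For anyone unfamiliar, the microdata in question is schema.org markup embedded in your own HTML; the itemprop values are what Google lifts into rich results. A hypothetical example, with made-up content:)

    <div itemscope itemtype="https://schema.org/Article">
      <h1 itemprop="headline">Best Nails for Cedar</h1>
      <span itemprop="author">Jane Doe</span>
      <time itemprop="datePublished" datetime="2019-08-01">Aug 1, 2019</time>
    </div>
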


If you once communicated the fact that 2+2 is 4 and that Joe makes very good spaghetti, you own the copyright to the text you published, but neither the fact nor the opinion belongs to you in any meaningful way, nor should it.

That's true, but a collection of facts ("a database") falls under copyright.

Not sure why you are downvoted. In the EU, databases fall under copyright, which indeed leaves the question of how Google legally deals with this (technically the database right isn't copyright, but in this context that's a technicality).

Also, to quote from the Wikipedia article [1]: An owner has the right to object to the copying of substantial parts of their database, even if data is extracted and reconstructed piecemeal

1: https://en.wikipedia.org/wiki/Database_right


> Not sure why you are downvoted.

Because they didn't say "in the EU", and it not being copyright is not just a technicality. Copyright is about creative expression, and utilitarian collections of facts aren't.


> Because they didn't say "in the EU"

They also didn't say "in the US". From context you can only assume "in some jurisdiction google cares about"

> Copyright is about creative expression

That's not true, or at least a very US-centric view. The Berne Convention, the international standard for copyright, reads:

"[...] shall include every production in the literary, scientific and artistic domain, whatever may be the mode or form of its expression, such as books, [...] works expressed by a process analogous to photography; works of applied art; illustrations, maps, plans, sketches and three-dimensional works relative to geography, topography, architecture or science."

also

"Collections of literary or artistic works such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations shall be protected as such"

That's lots of things that are not exactly "creative expression" (even though exceptions for pure statements of fact do exist).

https://en.wikipedia.org/wiki/Berne_Convention

https://wipolex.wipo.int/en/text/283698


"by reason of the selection and arrangement of their contents"

If there was no selection or you make the original selection irrelevant, while also giving your own arrangement, then there's no violation of copyright.


https://www.bitlaw.com/copyright/database.html#data

This doesn't provide any protection for the underlying facts.


No, but if Google imports a database then they are still affected by the compilation copyright. It's too obvious of a hack to "just" claim that "yeah, we imported that entire database, but then we cracked all the facts apart and they're all separate now and it's just as if we never imported the database". That's not how the law works.

Even more interestingly, we've still never yet resolved the question of why Google gets to lift your entire site's contents and re-serve them in arbitrary ways to their own profit in the first place. It's really just a thing that happens on the internet because it was happening on the internet before the lawyers got there. I've said before and still believe that if there was no such thing as a search engine and they were just invented today, they'd be annihilated in court as nothing but one big copyright violation.


I'd rather give up copyright than search engines. Anyone who wants to push too hard ought to consider whether an entire nation might make the same choice.

IANAL, but as long as Google is only distributing the individual facts (not the database of facts), they would be in the clear, legally.

Removal is irrelevant because Google doesn't rely on a license for its index.

robots.txt is a courtesy, not a legal obligation.


I am not sure there's a specific copyright applicable there. You can ask Google to remove your website's data from their index (primarily via robots.txt)... but of course, that also delists you from search. Essentially Google has left the impossible choice to either let them steal your data for free or accept not being findable in the primary search engine on the Internet.

Yeah, but if you don't get any clicks, that choice is no longer impossible: you're providing value without getting any in return.

Granted, it's still a while away to get into that territory, I think most sites still profit from Google.


I think most of my zero-click searches are a quick question about something you can find on Wikipedia.

Well, I don't see the problem with Google providing a cache to save WP's bandwidth. I block ads anyway...


Honestly, the second Google started owning and promoting their own properties in search results, anti-trust regulators should have jumped on it.

I'm really not sure how Google can be protected by Section 230 and at the same time control and publish so much directly. The last time I read an article on the topic, Google controlled 23% of the top 100 sites.


I've heard a decent number of wrong takes on Section 230. But this is the most bizarre yet.

Neither the CDA, nor section 230 specifically, create the sort of publisher/platform dichotomy people seem to be hung up on.

And Section 230 does exactly the opposite of what people commonly think it does. It's actually right there, in the text:

No provider or user of an interactive computer service shall be held liable on account of (a) any action voluntarily taken in good faith to restrict access to or availability of material that the provider or user considers to be obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable [...]

That seems really easy to understand: you can delete nazi propaganda, porn, bad jokes, or just random user content from your platform without running the risk of thereby assuming liability for the rest.


But the inherent problem is that they need to be held liable for the rest. Because the rest is often criminal activity from which those platforms are making a profit. Perhaps removing Section 230 isn't enough, by your definition, because we need a law that explicitly holds platforms liable for profits generated from illegal activity.

Section 230 didn't prevent Google from being fined $500M for publishing ads for Canadian pharmacies selling drugs to US citizens. What kind of illegal activities are you talking about, exactly?

Google regularly distributes malicious websites and malware through ads, and refuses to delist reported malicious websites. (Google Ads is the primary distributor of Windows PC malware today, if my customer support experience is any indication.) And the problem is that Google has a perverse incentive: more bidders for ads means higher bids. Since malicious actors raise the prices, Google benefits from bad actors buying ads.

And the problem is that even if someone finally comes in and shuts those actors down, Google keeps all the profit from the malicious activity. In order to incentivize Google to police its ad platform, we need to implement a requirement that seizes all revenue from malicious advertising, retroactively, when a malicious account is flagged/reported.

If Google is losing revenue on allowing bad actors on their ad platform, they'll be incentivized to quickly respond to reports and remove them so that legitimate ads, which they make money on, can have those ad slots.


More clicks on ads that lead to malware also leads to fewer clicks in the future, though.

Have you published data about this anywhere, like a list of reported links that were ignored?


> More clicks on ads that lead to malware also leads to fewer clicks in the future, though.

I really doubt it works this way. There are 3 assumptions here; for your scenario to play out, people must pass the following funnel:

1- The person notices the malware

2- The person associates the connection between the ad and the malware.

3- After making the connection, they install a non-dummy adblocker. Dummy adblockers like the one by Eyeo whitelist Google's ads while actually harming the competition. It benefits them! Note: if I look up "adblock" on Google, uBlock is only mentioned on page 2, and only because it's mentioned as a competitor to Adblock in a ZDNet article. The whole first page is dedicated to the Eyeo plug-in.

I'd say very few people will get through that funnel. My experience is that when my family and friends actually seek out help with their computers, they have let it go for years, until the computer is a slow mess of malware, self-installed spyware in the form of browser add-ons, and other crazy stuff.

I've actually known a person who buys a computer every couple of years when it 'gets slow', simply to avoid maintenance. The few people I know IRL who do use an ad blocker mostly use Adblock by Eyeo, simply because of the domain name and its ranking on Google.


That is a curious definition of need. Are landlords liable in your world for renting apartments to people who run ponzi schemes?

Not in my world; planet earth is a large place.

But in the country I live in, the USA, landlords are liable for many things their tenants do.

A friend of mine is a landlord and he almost lost a house he owns because his tenant was cooking meth in it. I don't remember the exact details, but the liability was no joke.

P.S. I don't agree with this liability issue, I'm simply describing reality as it is.


If they get to know about that fact and do not take steps to end the behaviour, although it would be within their power, they might be held liable, too.

So ocdtrekkies' point stands, imho. Google is directly profiting from shady activities through its services and therefore has no incentive to control or stop that behaviour. That's a tricky thing.


My main takeaway from this was: where the hell are they getting that data? A little digging… apparently if you install Avast antivirus, they're tracking all of this and selling/providing it to Jumpshot - wow.

> Baker said Jumpshot’s data comes from 100 million devices worldwide, whose users have downloaded free security software from partner Avast. The devices include smartphones, laptops and tablets.

https://marketingland.com/jumpshot-makes-public-some-amazon-...


I'm sticking with my "don't install antivirus" policy. My exception is Malwarebytes which I don't consider a regular AV.

Even my favorite from 10 years ago, AVG, seems user-hostile (try turning it off; it's not easy!). I'd hate to see what the others are doing.


Avast acquired AVG back in 2016. So yes, all that data ends up in the same company.

errr... sure, if your antivirus is free. There are a lot of good paid antivirus that don't have these problems.

Of course, lately, paying for an antivirus hasn't had the same value - Windows Defender, the much better browser sandbox, and the fact that you don't really download executables anymore have all contributed to reducing the risk of going without an antivirus.


Windows Defender and common sense is enough on Windows based systems IMO. And on mac you don't need one.

I would expect the target groups "allows Avast bloatware on his computer" and "uses DuckDuckGo" to be completely disjoint. Makes me wonder how valuable their data really is.

You didn't need to dig through anything - it was right at the bottom of the article.

so it is.

This world we live in is terrifying.

The world our ancestors lived in was too. We traded one hell for another.

So a common trope in the startup world is the difference between a feature and a company, the idea being that lots of features are masquerading as companies. This inevitably leads to them being shocked--SHOCKED--when some bigger player adds that feature, eating their lunch.

I feel the same way about a lot of outbound sites on Google. There are a bunch of things I just don't want to go to another site for. Off the top of my head:

- Exchange rates. Although I find this one infuriating, because Google doesn't know how to correctly round off exchange rates (as in, there's a standard). This manifests as, say, showing 2 decimal places for AUD/USD when they should be showing 4.

- Calculator

- Mortgage calculator

- Song lyrics
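On the rounding point in the first bullet: major currency pairs are conventionally quoted to four decimal places (the last digit being the "pip"), so two decimals throw away most of the precision. A trivial sketch, using a made-up AUD/USD rate purely for illustration:

```python
# Hypothetical AUD/USD mid-market rate (illustrative only, not live data).
rate = 0.678912

print(f"{rate:.2f}")  # 0.68   <- two decimals, as Google renders it
print(f"{rate:.4f}")  # 0.6789 <- four decimals, the conventional quote
```

At two decimals, every rate between 0.675 and 0.685 collapses to the same display value, which is why the rounding complaint matters for anyone actually converting money.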

That sort of thing. If you go to any sites that provide these sorts of things, they're typically "scummy": lots of ads, lots of JavaScript, lots of dark patterns to make you load more page views (e.g. a mortgage calculator that mysteriously takes 3 steps/page loads to produce a result).

I'm glad these are in search results, typically in significantly better versions. And I don't think anyone who runs a site built around a basic formula for interest calculations has any right to complain about it.

Of course this will be painted as "where does it end?" but not every surface is a slippery slope.

Just look at the likes of Yelp who complains about Google "stealing" their content. Well, Yelp is about one of the scummiest businesses out there. So I won't feel sorry for them, not now, not ever.

The one weird case here is AMP. I get Google's motivations: many companies develop terrible mobile sites that run badly or not at all, and AMP IS much faster, generally speaking. Yet it still seems so heavy-handed, with seemingly no opt-out (on the consumer or publisher side). I don't really understand why Google wants to die on this particular hill.


I understand your stand, but what Google did with its featured snippets is essentially a form of bait and switch. It encouraged entrepreneurs and creators to create content that could answer specific queries with the promise that Google will drive traffic to their sites.

Now Google is using the same content and depriving them of the traffic.

That's pretty scummy in my opinion.


Hm, I think there's scumminess on both sides, since (as GP noted) even simple sites (e.g. for lyrics) are bloated and unusable.

Before:

User: Hey Google, what's the monthly payment on a 20-year mortgage with 3.9% APR?

Google: Oh, MortgateSite would know, go there.

MortgageSite: Here's your calculator.

User: Okay, let me put in the numbers ... awesome, thanks! Oh, neat, they can hook me up with lenders. Let me take a look.

MortgageSite: Thanks for the referral, Google!

Everyone wins.

Today, it's more like:

User: Hey Google, what's the monthly payment on a 20-year mortgage with 3.9% APR?

Google: Oh, that would be this much: ... . Here are some sites where you can dig deeper.

User: Um, okay, might be worth a look. Let's try MortgageSite.

MortgageSite: uhhhhhhh hold on a second. Hey, see our BUY THIS PRODUCT mortgage calculator. AND THIS ONE

User: Uh, okay, I'll just type in--

MortgageSite: HEY! It looks like you're new to this site. Want to get on our mailing list?

User: You know what? Screw it.

MortgageSite: Confound you, Google, for stealing our traffic!


Perfect world:

User: Hey Gateway, what's the monthly payment on a 20-year mortgage with 3.9 APR?

Gateway: Here's your calculator, already prefilled with the data taken from your query.

Done.

The problem with all these little sites is, besides bloat, that the "Oh, neat, they can hook me up with lenders. Let me take a look." ends up with user getting malware and/or scammed. The problem with Google is that they can hardly be trusted at this point to do things like this in the interest of users. They want to be the frontend through which you access the Internet.

I used the word "Gateway" in my example as a placeholder; my imaginary perfect world recognizes that things like "currency conversion", "song lyrics" or "mortgage calculator" are data[0], which should be separate from the frontend used to access it. I dream of the Internet where things like these are API-driven and do not involve loading anything other than what's requested - neither ads, nor "value-adds", and definitely not all the rest of the webpage surrounding a mortgage calculator, which is bloat obstructing requested functionality.

--

[0] - yes, even mortgage calculator; it's a mathematical model, an algorithm, and code is data.


https://www.wolframalpha.com/input/?i=what%27s+the+monthly+p...

You're dreaming of what wolfram alpha is trying to build.

Edit: funnily enough, they parsed the input slightly wrong but you can quickly get to your answer.


works perfectly if you put a percent sign after 3.9

User (spoken): Hey Google, what's the monthly payment on a 20-year mortgage of $500k with 3.9% APR?

Google Assistant (aloud): $3,004

Seems pretty straightforward, no need for a web page.

Actual current results: "Sorry, I don't know how to help with that yet"
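For what it's worth, the $3,004 figure above is what the standard fixed-rate amortization formula gives. A minimal sketch (the textbook formula, not a claim about how any assistant actually computes it):

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate amortization: M = P * r * (1+r)^n / ((1+r)^n - 1),
    where r is the monthly rate and n the number of monthly payments."""
    r = annual_rate / 12
    n = years * 12
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

# The spoken query above: $500k over 20 years at 3.9% APR
print(round(monthly_payment(500_000, 0.039, 20)))  # 3004
```

Real lenders layer fees, insurance, and rate resets on top of this, which is part of why a one-line answer is only a starting point.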


It's not even malware/scams - it's just "Person who wants a mortgage calculator" is "person who is interested in getting a mortgage", which is some seriously high-value data. There's a lot more profit in selling you on to mortgage lenders (with a lot of data, like your budget, pre-filled) than selling you to scammers.

And surprise surprise... That's what DoubleClick (= Google) does, they'll sell your cookie ID as "interested in mortgage" audience to advertisers.

While, in the interest of "privacy", Google limits the amount of data that the banks and others you interact with can get from any competing solution, it will sell all of it in Ads Data Hub, which is a Google Cloud product. So it's not so much in the interest of privacy as in the interest of Google selling more cloud products.


Perfect world:

User: I can afford a livable four walls and a roof from a small amount of savings, without needing to mortgage decades of my future. Brilliant.


Are you arguing that everyone should be able to buy a house, that they can live in forever, with cash they have with little savings?

The problem with that is that if a poor person can afford a house with cash, then a slightly richer person would just go, "wow, I can buy a really big house! Or two!", and suddenly there is no more space.


That would be nice, wouldn't it?

Maybe they wouldn't be allowed to buy two, because in "a perfect world", a place to live is a necessity, not a profit vector. Keeping housing supply limited to boost house prices, to appeal to house-owning voters and wealthy landlords, to extract six figures of money from normal people over decades, is nothing like "perfect".


Some houses are still going to be more desirable than others... how do we determine who gets which?

This sounds a lot more like inviting a trolling argument than any kind of genuine inquiry.

Here I say "landlord exploitation, profiteering from necessities, houses left unlived in as investment vehicles for the international super rich, NIMBYism and entrenched residents voting against housing stock, AirBNB, broken zoning incentives" and then you say "FREE MARKET it's rich people's right to buy up everything and if you disagree you're dumb, free market is best market". And then we agree to disagree and both leave unhappy. Is that an approximately good summary?

We don't need to determine who gets which for this comment chain, we only need to question whether a 30 year mortgage should be the default way for a normal person to keep the rain off their bed, someone who isn't thinking of a beach condo in Malibu or a NYC penthouse, whether that is the best possible "perfect" world. I think it isn't.


And yet if you type in a mortgage-related query, the first few entries Google will show you are ads. Not the calculator you're looking for.

Even then, mortgages aren't something Google can realistically calculate, because lenders don't structure loans that simply. There is a fixed-rate period, and a variable period after that.

First-time searchers may get suckered in by a personal finance site that relies on ads just as Google itself does, but after that, they'll be using the calculators that are on lenders' own sites when it's time to comparison shop.


The ads in search results are of an utterly, completely different character than the on-site garbage the parent is talking about. They are simple text entries that are clearly marked.

The majority of US home mortgages are 30 year fixed rate with no variable period.

The majority of UK home mortgages are 25-year, with a fixed rate for 2-4 years followed by a variable rate (which of course no one ever goes on; you simply remortgage just before the variable rate kicks in).

Canada is the same. Nearly impossible to find a mortgage where the fixed term is longer than 5 years.

Funny thing: usually that "BUY THIS PRODUCT" is also a Google product (an ad).

>MortgageSite: HEY! It looks like you're new to this site. Want to get on our mailing list?

Dear every web developer in the world who makes no effort to fight your employers' marketing department on this: I hate you.


sometimes I try to mess with sites that ask for an email address for a mailing list by entering their own addresses: like "support@domain.com", "webmaster@domain.com", etc.

let them deal with their own spam, then they'll understand.


That's a great idea. I'll try that next time. My version of petty revenge is to sign up with one of my throwaway Gmail accounts, and then mark them as spam as soon as the first newsletter rolls in. Hopefully, there's others doing the same and they'll be at least partially blacklisted.

My go to has been "ceo@example.com" for years.

Or various Five Eyes addresses.


>My go to has been "ceo@example.com" for years.

I think I may make the effort to learn how to make a Firefox extension to automate this. Scrape "about us", "contact", etc. page in background for first email address, then autofill it into the email spam form. Boom, done.


This is great! I usually just do something like bs@fake.com.

I like you.

> employers' marketing department

In some places this is a growth hacking/marketing tactic (that works); in others it isn't the marketing team but someone up high demanding it because they've seen it. Am still a marketer, but I've been in the latter, trying to fight the good fight.


I'm not related in any way to that scene, but what would you tell marketing when they tell you that, while the mailing list popup is obnoxious and annoying, it results in X users enrolling, and later on Y of them actually end up becoming customers?

There does seem to be a sizeable number of people who take any request on a webpage as a command, and comply with the email popup. Which would then generate leads.

But, this popup harassment is just going to turn away experienced users. Many of whom will be the educated, well-compensated demographic the site most wants to cultivate. And a subset of that group will have the ability to create or edit blockers, which will then be turned against the offending site. Thus making it more difficult for that company to reach the more desirable users.

That's what I would tell marketing. That wouldn't work for all companies. Some will be thrilled to get a boatload of inexperienced or naive users. But would be nice if the others would start to get worried about what they're losing with their bounce rate.


Are you asking for a hypothetical scenario where I was the web developer in question having to stand up to some marketing drone?

Take a closer look at the key phrase in my post: "makes no effort". If a web developer who's trying to pay the rent makes even a single comment to the marketing drones about how email signup forms might not be a good idea, and asks if they're sure they want to do it, then that web developer has my sympathy and not my hatred. But if they just cheerfully say "Yes sir!" when the marketing drones make that demand, then I hate them just as much as I hate the marketing drones, because at that point they are marketing drones.


I was asking because I have been wondering whether there are any convincing arguments or incentives for avoiding such tactics, besides morals and integrity. I'm afraid that if it all depends on morals and integrity, it is unlikely we are going to win this battle.

Morals and integrity are foreign concepts to marketers. I used to think that marketing had a place, if properly used. After working with marketers for a decade, I just consider them and their whole wretched field a lost cause. I have no sympathy, no empathy, no compassion for any of them. Nowadays I make every effort to use only paid services whenever possible. (If any Fastmail devs are reading... I frickin' love you guys.)

This is great but it ain't gonna fix my internet

You forgot:

MortgageSite: Please wait while we take 10 seconds to calculate an interest rate AND CHECK OUT THIS SPONSOR!

MortgageSite: Five more seconds. YOUR COMPUTER MIGHT HAVE A VIRUS!! DOWNLOAD THIS FREE SCANNER NOW!

MortgageSite: Your rate has been calculated! Click HERE to reveal it using a totally pointless JavaScript animation!

User: Finally, now I’ll just move my cursor up and close this tab.

MortgageSite: OMFG YOURE GOING AWAY! Please come back and click on me more!!!


also, "your rate is ready! just enter your email address and we will send it to you!"

Your second part is wrong. According to the data, most users don't dig deeper; they just stop at the answer Google already gave. MortgageSite doesn't even get the traffic in the first place.

I was using that example to establish that the sites (that are losing the clicks) are just as scummy as (if not more than) Google for "stealing" the clicks. That, in turn, helps explain why users might be reluctant to dig deeper, showing that general scummy practices are at least as much to blame as Google "stealing" their content.

Those sites, or sites like them, initially made the content that Google now steals.

I remember a world without lyrics sites. These sites provide value. The top links Google shows may not, because of page layout or bloated sizes, but those are just the links Google chooses to present; there are lyrics sites with no JavaScript. Meanwhile, tons of mortgage calculator pages exist on most bank sites and other non-spamming pages.

Clicking a Google search result takes seconds to load the link, while clicking a DDG link feels much faster. The additional tracking and redirections make things feel sluggish.


It's a chicken and egg problem. Website owners are forced to monetize more aggressively as the traffic from search slows down. And as they monetize more aggressively, the user experience suffers, causing them to tank in the SERPs, fueling even more desperation.

Google could fix that if they wanted to.

I remember when Google used to push content developers to be better: no paywalls, no overlays, faster response times, no scammy ads, no content farms.

Those days seem to be over. Google is constantly sending me to content behind paywalls and under pop up content.


I agree that paywalls seem more common. But the early web had plenty of popups and scummy ads. Popup blockers were added to browsers quite a long time ago.

Content farms were huge before Google started kicking them out in 2011. Have they made a comeback?

Google's latest attempt to keep the crap under control (AMP) is very unpopular on Hacker News, but I guess it shows they are trying to do something?


> Content farms were huge before Google started kicking them out in 2011. Have they made a comeback?

Have they not? Content marketing seems as alive as ever, if not more.

> Google's latest attempt to keep the crap under control (AMP) is very unpopular on Hacker News, but I guess it shows they are trying to do something?

Because we - myself and many of those other HNers disliking it - believe that with AMP, making web more performant and less crappy is only an excuse, and one that doesn't stand up to scrutiny.


It's hard to tell since there is little in the news about them, but it seems like once, content farms were fast-growing businesses, and now they are bottom feeders? Or maybe they're just boring compared to all the other threats nowadays?

So you blame Google for the paywalls and full-screen ads on the pages you demand they send you to, in lieu of just hosting some content themselves? Seems like Google just can't win in that scenario.

FWIW: I agree that for a long time, the need to rank highly in Google's searches pushed content sites to be a better experience. And the equilibrium we've reached isn't as nice as it was 10 years ago. I just don't think "because Google" really captures the complexity here.


Google's whole value proposition is sending me to the best place on the web for a given query -- so yes, when they don't do that it's frustrating.

No, I don't blame them for hosting the content themselves for some queries. I never said that.

Just as Amazon holds some responsibility when I purchase a counterfeit item on their platform, it behooves both companies to have higher quality standards -- and both have the market strength to do it.


>Now Google is using the same content and depriving them of the traffic.

I found some instructions[1] that says a website can opt out of Google's Featured Snippets with special HTML code:

  <meta name="googlebot" content="nosnippet">
Are there any cases of websites deliberately opting out of snippets and therefore seeing their referral links (and ad revenue) increase?

[1] https://searchenginewatch.com/2019/03/27/google-featured-sni...


It's hard for me to shake the vague feeling that this feels like someone setting up on the sidewalk in front of your property and saying hey, you could have done X, Y, or Z, to stop me from being here. And you say, I did do X. And they say, oh, X was two years ago. Y was last year. Now you have to do Z. Until next year when it'll be AA. And weirdly, AA is the exact opposite of X, because what you thought would take care of things two years ago now penalizes you.

It's not a perfect analogy, because obviously people still hope for referrals from Google. But this idea that because something is published somewhere, it's okay, reminds me of Arthur Dent finding out that the notification about demolishing his house was in the third-subbasement of a local government building in a file cabinet marked "Beware of the Leopard." From rough memory; please forgive if I mis-remembered.


Do I get to bill Google for all the extra traffic I have to send by appending that to the body of my websites in order to prevent them from stealing my intellectual property?

Also going forward you have to add this to your site or I will steal anything that isn't nailed down.

<meta name="fuzzz4lyfe" content="notheft">


I deliberately opted out of snippets for TLD List [0], a price comparison site I made for domain names.

I don't know if it's made any difference. Organic search traffic from them has slowly but steadily increased over the years.

Google now sometimes displays a snippet from my competitor's website for searches like "cheapest .io domain" [1]. The snippet seems pretty useless as it doesn't include any registrars' names/links (and my competitor's price info is quite outdated).

In these cases, since the snippet is the 1st thing users see in SERP, and doesn't provide enough info to fully answer the question, I'd wager that my competitor is ultimately receiving the majority of clicks from these snippets.

[0] https://tld-list.com

[1] https://i.imgur.com/aFoZbFw.png


>I deliberately opted out of snippets

Appreciate your datapoint. Also btw, when I "view source" the HTML of tld-list.com, I notice it has "nosnippet" inside a comment:

    <!--
    <meta name="googlebot" content="nosnippet">
    -->
Does Google's crawler parse and obey "nosnippet" embedded in HTML comments?
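(For what it's worth, any crawler that parses the HTML normally would never see a tag inside a comment as a tag at all. A quick sketch with Python's stdlib html.parser, which routes comment contents to a separate handler:)

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Records every <meta> tag the parser actually sees as a tag."""
    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))

html = """
<head>
  <!-- <meta name="googlebot" content="nosnippet"> -->
  <meta name="robots" content="noarchive">
</head>
"""

parser = MetaCollector()
parser.feed(html)
# The commented-out tag goes to the parser's comment handler, never to
# handle_starttag, so only the live <meta> shows up:
print(parser.metas)  # [{'name': 'robots', 'content': 'noarchive'}]
```

This doesn't prove anything about Googlebot's internals, of course, but it suggests a commented-out nosnippet directive is inert.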

No I don't think so. Thanks for pointing that out.

IIRC, I eventually removed nosnippet because it caused Google to not display microdata in the SERP (see the "$25.99 to $99.80" in the above screenshot) that was desirable for my traffic. I instead replaced nosnippet with:

    <meta name="robots" content="noarchive">

And this seemed to have the same effect as nosnippet, but with the added benefit of my microdata still being displayed in SERP.


Thus allowing your competitor to take your place

wanna take bets google then deranks you heavily?

I prefer Googles snippet to visiting a shitty site. This is one of the few instances that I fully support Google. If websites want to stay relevant, they need to be more inventive.

Also, the data they are serving is often incorrect or outdated. I've grown accustomed to not trusting these info boxes.

Yep. And what's worse is that sometimes the site is correct and/or up to date and the Google snippet attributed to that site isn't or is poorly chosen for the question.

Google "chancellor" from the UK and it'll suggest the question "who is the UK chancellor now" which expands to name the previous chancellor (an error Google attributes to gov.uk which would have been updated in July with the new appointment). Click on the link to Google the question and this time the snippet attributed to the same source will be up to date and give you the correct answer. But if I still need to load a new page to get the right answer I'd preferred to go direct to the source page without it serving me wrong answers wrongly attributed first.

And the other suggested questions are also a rabbit hole of rubbish to fall down (the question "who was the previous chancellor?" answered by "previous chancellors have opted for whisky..." is my favourite combination), or they excerpt the wrong part of the page and then encourage you to search Google again, only to get the same unhelpful excerpt again rather than deign to look at the source, period.

Getting this stuff right across millions of queries and unstructured data of mixed quality is undoubtedly an incredibly hard technical problem but I think it's probably better to sometimes let users leave Google properties than feed them inaccurate answers inaccurately attributed...


I don't trust them either. They seem to pull info out of random places, and you can't easily tell how up-to-date the content is.

They also crop the content so even when it is correct you can't see all the steps.

Not from a user search perspective. That's an awesome search feature. And for all the websites that aren't trying to monetize, like my personal blog or Wikipedia, it also isn't a problem; it's a great feature to have, which might be argued to serve the greater good and improve the internet by making the free flow of information easier.

I think we need to go back a little and think. When the internet started, if you wanted to monetize your website and attract users, what would you do? There were no search engines. Nowadays, people claim they're at the mercy of Google, but it's more that their very existence and business model were enabled and made viable by Google, since it seems that without Google, no one would find or visit their websites. In that sense, I find it hard to say Google is taking things away; it seems more like Google is giving and giving - sometimes it gives less, and sometimes it gives more.


Can a web site simply stop providing the markup for featured snippets?

Are you arguing that Google violates copyright?

Facebook did this with pages

A bait and switch would imply more of an implicit contract than there ever was along the way.

In the 90s, people were making websites because they wanted to make websites, and some of them were informational, and some of them were just whatever cool thing somebody decided to publish. Keeping track of your favorite sites was harder, but eventually browsers added bookmarks and it became a bit easier.

Okay, now how do you find all of that stuff? Three answers kind of sprung up: sites that are lists of sites, directories (Yahoo), and search engines. They kind of work, but finding good stuff is still hard, and while the best answer here is probably search, the indexes aren't entirely comprehensive and the algorithms aren't good at sorting good stuff from bad stuff yet; they're too easily gamed.

Google came along and went with search, but they did two things differently: they built a comprehensive index and they had a good algorithm for sorting good stuff from bad stuff. This didn’t stop an entire industry from springing up to try and game it, but they made gaming their algorithm an increasingly expensive proposition, and just doing the right thing easier.

So now we have a good search engine with a comprehensive index, and the web has a different problem: as much stuff as there is, it turns out there's not enough stuff about a lot of stuff, and this is obvious once you have a good enough search engine that will tell you whether there is enough stuff about your term. Well, as it turns out, it is already 2004 and people are just now figuring out how to make lots and lots of money with this internet and web stuff. Google has come out ahead of the pack as the obviously superior choice for searching all this stuff; their competitors get a lot better, but Google stays ahead of them, and they're able to maintain a lead by simply being so much better that there is no reason to stop using Google so long as they're not evil.

They also figured something else out: people are looking for information, or the tools for getting that information. Who knows more about getting information than people whose job it is to find and surface the right information? And it turns out that some things you can't find with just a webpage; you need other kinds of tools, like calculators. So they built them in, because at the end of the day, it's the information that people actually want; they don't need the debris that comes along with loading a full webpage.

tl;dr Google is an information services company and always has been. Characterizing them as a search engine or advertising company is a massive understatement and was always inaccurate.


I miss having a good index site. Yahoo's gone, DMOZ shut down. We need a good human-curated index that's not bound to ad spend or SEO nonsense. I don't want to know about "long-tail keywords"; I want to know if the site has good content that serves either a need or an interest. Now, if it so happens that websites with valuable content also have SEO value, that means the search engine algorithm is optimized correctly. But we need humans.

A bunch of those things aren't really searches though, are they?

I do queries like "2 tbsp in tsp" or "45 * 22" or "$25 CDN in USD" all the time and I don't think of them as a search.

Same goes for my Echo. I ask it all the time for today's weather or what time the Texas Rangers are playing or who is the starting pitcher for the Giants tonight. None of that feels like a search.


Well, simple math isn't, but think about your currency conversion, your weather lookup, or your sports schedule. Where did Google get the data for those things? Does whoever provided that data have expenses, and did Google compensate them in any way for those expenses or did Google lift the data for free with GoogleBot?

I'm pretty sure they have partnerships with the major stock and currency exchanges, pro sports leagues, and weather.com to get that data.

EDIT: added some sources (sorry, that's the best I could dig up in 5 minutes)

[1] https://www.google.com/googlefinance/disclaimer/

[2] https://searchengineland.com/google-now-with-real-time-nhl-h...

[3] https://bits.blogs.nytimes.com/2015/03/31/ibm-scores-a-weath...


US weather data is available for free from NOAA as long as the API isn't abused. Everybody else is just reselling the same data anyway.

A good comparison here is WolframAlpha, which can also answer all those queries (I was a bit surprised that the Texas Rangers example worked, but it does [1]). Most of their data comes from a curated list of primary sources, not Wikipedia or random websites.

1: https://www.wolframalpha.com/input/?i=texas+rangers+next+gam...


Curated might be better; Google has been wrong in its answers in the past.

Not only are the sources of WolframAlpha curated, they even go the extra step to make sure it's correct. From their FAQ:

"How is Wolfram|Alpha's data checked?

We use a portfolio of automated and manual methods, including statistics, visualization, source cross-checking and expert review. With trillions of pieces of data, it's inevitable that there are still errors out there.

[...]

How is real-time data curated?

Wolfram|Alpha effectively checks real-time data (such as weather, earthquakes, market prices, etc.) against built-in criteria and models. If an unexpected deviation is found, Wolfram|Alpha will normally indicate it, for example by showing lines as dashed."

https://www.wolframalpha.com/faqs/#data-in-wolfram-alpha


I think the best one I've seen was when Google's natural language processing mistook the subject of a sentence. Rather than saying that the Mariner 10 space probe used a gravity assist on Venus to reach Mercury, it instead said that Mercury received a gravity assist from Venus.

Or maybe Google was getting its astronomy from a site that believes Immanuel Velikovsky's theories... [1]

[1] https://en.wikipedia.org/wiki/Worlds_in_Collision


I think they generally pay for it. It came up with song lyrics a couple of weeks ago, when Genius accused Google of scraping lyrics from Genius. But as it turns out, Google paid a third-party vendor, LyricFind, for them, and LyricFind pays the copyright holder, so if anything it seems Genius was more in the wrong...

https://www.theverge.com/2019/6/18/18684211/google-song-lyri...


How was Genius in the wrong? They were not paid for the service, and the data was coming from them. It is possible LyricFind stole the lyrics from Genius, but then Genius is still not in the wrong to sue them for scraping their site.

So LyricFind says they licensed the lyrics from the publisher and then verifies against other lyrics websites. Somehow, they ended up with a copy of the lyrics as posted by Genius.

Genius doesn't own the copyright to the lyrics and screwing with the punctuation doesn't create a new work, so I'm not sure they have much of a case against anybody. Maybe they could come up with a ToS violation or CFAA case against LyricFind?


How did that site get the sports schedule? Did they just copy it from somewhere else?

And that's part of the question: If Google's getting all its data from first-party sources, say, scraping it from the team's website, it may not have harmed anyone. But if it's scraping from a site which aggregated and normalized the information into an easily parseable format (hence providing value, of which Google took advantage), then that site ought to be compensated for it.

Presumably the site being scraped is getting some value from being included in Google's index. If they aren't, they can always opt out.

First you have to find a way to prove it's your data, which isn't trivial.

Genius had to invent a watermarking system for texts: https://betanews.com/2019/06/17/google-genius-com-lyrics/
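The reported scheme was to spell a Morse-code message by alternating straight and curly apostrophes in the lyrics. A toy sketch of the idea; the exact mapping (straight = dot, curly = dash) is my assumption for illustration, since Genius never published the details:

```python
# Toy Genius-style watermark: hide a Morse message in the choice of
# straight (') vs curly (\u2019) apostrophes across a text.
STRAIGHT, CURLY = "'", "\u2019"
MORSE = {"R": ".-.", "E": ".", "D": "-.."}  # just enough letters for the demo

def watermark_bits(message: str) -> str:
    """Turn a short message into a dot/dash string."""
    return "".join(MORSE[c] for c in message)

def embed(text: str, bits: str) -> str:
    """Replace successive apostrophes with the watermark alphabet."""
    out, i = [], 0
    for ch in text:
        if ch in (STRAIGHT, CURLY) and i < len(bits):
            out.append(STRAIGHT if bits[i] == "." else CURLY)
            i += 1
        else:
            out.append(ch)
    return "".join(out)

def extract(text: str) -> str:
    """Read the dot/dash sequence back out of the apostrophes."""
    return "".join("." if ch == STRAIGHT else "-"
                   for ch in text if ch in (STRAIGHT, CURLY))
```

If a copy of the text preserves the apostrophe glyphs verbatim, the message survives, which is exactly what made the watermark usable as evidence.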


You don't have to prove anything to opt out of indexing.

And in the lyrics case, it wouldn't have helped because Google wasn't scraping the lyrics. Google was paying a third party that copied the Genius site.

So it’s a multilayered problem, and content owners are on the losing side.

If it's getting it from a site that's adding no value (e.g. just copying and pasting another site), then there's probably nobody to compensate either.

I agree here. For the median search I make, having to click on a website is a failure. If I search "John McCain age at death", why would I click on a website? This seems trivial, but it is frustrating for even more technical issues. If I search "Non-standard evaluation in left-hand side of mutate command R", it's a bit weird that I'm expected to click the first StackExchange rather than getting a canonical answer excerpted. Ditto when I search "Analytic standard error of a regularized regression". These are basically true/false questions, and it's a win for Google if they give me the answer.

I do empathize with the companies being scooped by Google, so I am concerned about whether they remain economically viable and whether Google compensates them for the info it takes from them.

But in the mean time, it's a success for me when my search gives me the answer immediately instead of two or three rounds of indirection.


> If I search "Non-standard evaluation in left-hand side of mutate command R", it's a bit weird that I'm expected to click the first StackExchange rather than getting a canonical answer excerpted.

Are you saying that there's no excerpt at all above the search results, when you expected one; or that there is an excerpt, but it's not from the most appropriate/relevant source?

It's annoying when there's a page that has all the information you might need, but to find it, you first have to scroll past a bunch of Stack Overflow posts that are more suitable for dilettantes who prioritise trying a quick fix over learning best practices. But when you do make the effort to find the most promising link and your browser allows the search engine to record your choice, you at least get a vote regarding how the ranking should be adjusted. When users just look at the results but don't interact with them any further, that feedback vanishes.

A win for Google isn't necessarily a win for Google users.


Maybe Wikipedia could do with a fact-pulling tool that answers small questions like this?

>The one weird case here is AMP.

If you have been to a local news site recently, you might agree that they are similarly "scummy," overrun with banners, popups and native ads.


Try a local news site with Firefox's Reader Mode. If it's still garbage then the local news site isn't worth any more time whatsoever. Keep a running mental list of what sites not to click on. They're typically in the top 1-10 links of search results.

I wish I could hit reader mode before the page loads so my phone doesn't melt in the time it takes for the page to finish loading its 10000 js frameworks.

If you're following a link, then copy the link and paste it into the URL bar. Before you hit enter, prepend with `about:reader?url=`.
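One wrinkle: the target URL should be percent-encoded so its own `?` and `&` don't confuse the parser. A quick sketch (the `about:reader` scheme is Firefox-specific):

```python
# Build an about:reader URL for Firefox's Reader Mode.
from urllib.parse import quote

def reader_url(url: str) -> str:
    # safe="" forces every reserved character to be percent-encoded
    return "about:reader?url=" + quote(url, safe="")
```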

Check out the mortgage calculator I made: https://www.mortgagecalculator.io/

No dark patterns, loads quickly, and uses Fathom for analytics, so no tracking. There are affiliate links to LendingTree, but they're just text links and I tried to keep them unobtrusive in the UX.

I also have an Android app that just hit 5,000 installs since launching in mid-April: https://play.google.com/store/apps/details?id=io.mortgagecal...

It's currently ranking #3 for 'mortgage calculator' on Google Play - pretty proud of that! Beating out Zillow's and Quicken's mortgage calculators.
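For anyone curious what sits behind calculators like these, the core is just the standard amortization formula, M = P · r(1+r)^n / ((1+r)^n − 1), with P the principal, r the monthly rate, and n the number of payments. A sketch, not financial advice:

```python
# Fixed-rate mortgage payment via the standard amortization formula.
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    if r == 0:
        return principal / n      # zero-interest edge case
    factor = (1 + r) ** n
    return principal * r * factor / (factor - 1)
```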


It's interesting how some of the best sources of information are sites providing it as a side dish to their core business.

For example, I was searching for a lease-termination letter template. There are sites dedicated to documents, letters, forms and templates; many of them are pretty spammy or messy. Then I found a page on a real-estate portal. They had just what I needed, without superfluous ads and with a nice UX. It's in their interest to help people with this, because those people will probably search for their next apartment there. They are, of course, quick to offer additional services on top, like finding movers. But providing this template sits at the edge of what they actually do - a marketplace for real estate. Only then do they get some money, so the side experience should be as smooth and painless as possible.

Another example is a tax/salary calculator on a job board. Or... most what Google is doing to some extent, but with different trade-offs.

I know it's nothing new, but it was interesting for me to notice it recently so directly.


Don't know about everything else, but Genius is unparalleled for lyrics to rap and hip hop. I honestly don't see Google having the clout to get Zack Fox to do some commentary; Tidal and Spotify have pretty much cornered that.

Genius is a good resource for lyrics of all genres now. They used to be called RapGenius, if you remember :-)

>with seemingly no opt out (on the consumer or publisher side)

I thought AMP was opt-in for publishers? As in, you have to write to the front-end spec and pick a CDN to cache with.


>I thought AMP was opt-in for publishers?

It is, but Google will only put you in the carousel at the top of the search results if you use AMP. That's a significant enough driver of traffic that publishers can't afford to lose it.


It's not just the carousel; it's organic rankings too.

We had an emergency AMP project after our traffic dropped 35% because a smaller competitor implemented AMP and Google rewarded them by boosting them in the search results.


That's depressing, but I will remember that when I come across an unnecessarily AMPified URL, I shouldn't blame the publisher for making this possible in the first place.

It's not really. Google will highly reward sites with AMP (even when you're not searching on mobile.)

So unless you can survive getting bumped off of the first page of results, not implementing AMP isn't optional.


Yep, it's opt in for publishers. For users, the only way to opt out currently is to view in desktop mode.

Users can use encrypted.google.com to search on mobile, but yeah, there should be a search setting for them.

I've heard of this strategy before, but it has never worked for me. Encrypted.google.com just redirects to google.com, and I still get AMP.

I now use https://github.com/bentasker/RemoveAMP, but that's only possible because I'm lucky enough to have a Jailbroken iPhone. :/


Are you using a userscript app or extension for jailbroken iOS devices?

I'm using software that allows me to load userscripts in Mobile Safari on my iPhone.

I guess you'd call it an "extension"? It feels weird to use that term, as it implies (to me) that there's some sort of extension framework in place. There isn't, it's just code injection.


Thanks I’ll try looking it up!

Yeah extension does sound weird.


Oh, you're Jailbroken? Feel free to shoot me an email (in my profile), there's one thing that's a little tricky...

Okay cool I will!

> pick a CDN to cache with.

Every link aggregator that indexes your AMP page will cache it, including Bing and Baidu. That's what makes safe prerendering possible.


I was referring to the registered CDNs so you could get the icon in search results.

That's what I was referring to as well. They cache your page automatically to enable safe prerendering. Here's how they work: https://medium.com/@pbakaus/why-amp-caches-exist-cd7938da245...
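The cache URLs themselves are derived mechanically from the origin domain. A sketch of the simple case (ASCII domains: existing dashes are doubled, dots become dashes; the real spec also handles IDNs and overly long names via hashing, which this ignores):

```python
# Derive an AMP cache URL on cdn.ampproject.org from an origin domain.
def amp_cache_url(origin_domain: str, path: str, https: bool = True) -> str:
    # "example.com" -> "example-com"; "a-b.com" -> "a--b-com"
    subdomain = origin_domain.replace("-", "--").replace(".", "-")
    scheme_part = "c/s" if https else "c"   # "/c/s/" marks an HTTPS origin
    return f"https://{subdomain}.cdn.ampproject.org/{scheme_part}/{origin_domain}{path}"
```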

>the difference between a feature and a company

If the right investments were made, every software company's product could be a feature of Microsoft Windows, and Microsoft Windows could be a feature of Intel CPUs.


Microsoft tried that with Internet Explorer and another time with Bing, it didn't work out so well.

Apple's taking another crack at it, and so far nothing looks likely to stop them.

It's not the same as copying the "feature"; in many cases Google straight up takes your content and puts it on their homepage. The difference between a product and a feature is nothing: every app is just a bunch of features put together. AWS S3 is a great example; it's just Dropbox for nerds.

A lyrics website recently watermarked its lyrics (with a pattern of straight and curly apostrophes) to prove Google was copying, and sure enough the watermark showed up in Google's results a couple of weeks later.

Google isn't just taking content from "evil" companies like Yelp, they're doing it to everybody.

Job search, shopping, song lyrics, news, and who knows what else, is all being somewhat blatantly lifted. And nobody can stop it because blocking Google is a death knell to any site


> The difference between a product and a feature is nothing.

Granted, but the comment you replied to was about the difference between a feature and a company.


So it's a case where the webmasters get greedy, Google steals their food, and everyone cheers for Google. I'm starting to think we need search engines to go back to their beginnings, when they were crawling the web, not creating or replacing it. If they only pointed to websites but ranked them by speed, webmasters would compete over who answers the question faster.

If I want exchange rates I type xe.com in the address bar

This example proves his point exactly.

I'd rather just type CAD to USD, or BTC to USD, or any other combination into my address bar than type xe.com and load a bunch of libraries, including Facebook Connect. Ghostery reports 6 trackers blocked.

Furthermore, it took 3.45 seconds to load xe.com in Incognito, while Google took only 1.1.


And I like Dave Ramsey's mortgage calculator: https://www.daveramsey.com/mortgage-calculator

Or retirement calculator: https://www.daveramsey.com/smartvestor/investment-calculator


For an increasing proportion of users, the "address bar" is now the "Google search" bar. You may have discovered xe.com when the distinction was much more obvious, but as time went on the browser UI underwent changes, that will change as well.

New versions of Chromium will begin hiding the full URL entirely, probably so that more and more marketing/targeting related UTM parameters can be jammed in without the user ever even knowing.


No, it's so you don't remember or care about URLs and just search for everything, making Google the only source of info online.

Pay to play, baby, if you want to show up.


I just type "20 usd in eur" in the address bar and get the result without even hitting the enter key.

I don't know about Google but DuckDuckGo pulls conversions from Xe. It is much quicker to duck the conversion than to find it after hitting Xe's homepage.

Song lyrics really ought to be from a rights holder's site, not just any random site with questionable translations - I have seen some lyrics sites that totally mangle the meaning of basic English words.

They "ought" to be, but I fear that regulation censoring these low quality sites would be worse for society in the long run.

Monopolization of cultural information is antisocial.


> Exchange rates

I set up custom search keywords in every browser I use for this, among many other custom searches. In that particular case I query xe.com from an address bar search (not affiliated with them I've just always used the site) as xe <amount> for my most common currency conversion and others like xeyen for Japanese rates, xegbp for British Pound, etc. Takes me right to the page.

Even for various Google services I set them up to save unnecessary clicks.


The automatic calculator and translator are what made me switch back from DDG a few weeks ago.

> Although this one I find infuriating because Google doesn't know how to correctly round off exchange rates

I generally search for "1000 USD in AUD" as a workaround.


Both cases showed 2 decimal places for me.

You get 5 decimal places if you divide by 1000...

Thanks, I missed that. I was just looking at the number of decimal places. Is that method accurate?

(1000 * incorrectly_rounded) / 1000 == incorrectly_rounded


It's not incorrectly rounded. It's correctly rounded to two decimal places in either case. Two decimal places just isn't enough digits when you give it 1 USD, so you have to give it 1000 USD.
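The trick is easy to see with numbers: rounding the unit rate to two decimal places throws away precision that survives when you convert a larger amount and divide back down. The rate below is hypothetical, not a live quote:

```python
# Why "1000 USD in AUD" beats "1 USD in AUD" for precision.
rate = 1.48731                        # hypothetical USD->AUD rate

one_usd = round(1 * rate, 2)          # only 2 significant decimals survive
thousand_usd = round(1000 * rate, 2)  # same 2-decimal rounding...
recovered = thousand_usd / 1000       # ...but full unit-rate precision back
```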

The great thing about the calculator and specific DSLs like the exchange conversion is that they're trivial to parse. The helpfulness of the "more than just a search index" degrades rapidly into siphoning people to the top 50 sites, like scraping metadata from IMDB. I'd much rather they gave me the ability to restrict the information they show me, as most of it is distracting and reduces my productivity.

If I want to look up an actor I'll just add 'IMDB' to the end and get a much better experience on the site itself than google could ever offer in their search page.
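"Trivial to parse" really is no exaggeration: a conversion query like "20 usd in eur" falls to a single regex. The rate table here is a made-up stand-in, and the grammar is a guess at what these query boxes accept, not Google's actual parser:

```python
# Minimal parser for "<amount> <ccy> in|to <ccy>" conversion queries.
import re

RATES = {("USD", "EUR"): 0.92, ("CAD", "USD"): 0.73}  # made-up numbers

QUERY = re.compile(r"^\s*([\d.]+)?\s*([a-z]{3})\s+(?:in|to)\s+([a-z]{3})\s*$",
                   re.IGNORECASE)

def convert(query):
    """Return the converted amount, or None if the query doesn't parse."""
    m = QUERY.match(query)
    if not m:
        return None
    amount = float(m.group(1) or 1)   # bare "cad to usd" means 1 unit
    rate = RATES.get((m.group(2).upper(), m.group(3).upper()))
    return None if rate is None else round(amount * rate, 2)
```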


>If you go to any sites that provide these sorts of things they're typically "scummy". Lots of ads, lots of Javascript, lots of dark patterns to make you load more page views

Exactly. Why go to those sites when you can get the same experience without leaving Google? And it's not like Google isn't serving the ads on both Google and those "scummy" websites anyway, and no one is going to out-dark-pattern Google.


For pop-culture things or very simple queries Google will do, but for everything else I rely on Wolfram Alpha.

As if Google is not the scummiest service out there. Just because it hides its tracking and doesn't quite use dark patterns, while manipulating all the same, doesn't mean it's any better.
