That's because those guidelines aren't meant for them.
Google is saying that if you don't follow their guidelines, you "shouldn't" be included in their search results. They aren't saying you are committing some kind of atrocity, just that pages that do that provide bad results from searches.
Of course Google doesn't want scraped results showing up in their search results. Would you want your search engine to show results from other search engines which show results from other search engines which show results from a page with a snippet which points to the real source?
They aren't "not following their own rules" any more than a train is breaking the rules by being on the tracks in spite of the "stay off the train tracks" sign.
From Google's post about the "too many ads above the fold" update:
"As we’ve mentioned previously, we’ve heard complaints from users that if...it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away"[1]
So, they recognize pages that are heavy with ads at the top, which push down the actual content, aren't a good user experience. That's exactly what I get when I search Google...a bunch of ads or other self-serving stuff on top, that pushes down the content (actual organic results) that I'm looking for.
This isn't by accident. Google has engineered it so that companies have to buy PPC ads for their own search results, to avoid 'competitive' PPC from rival brands.
Ad bids are multiplied by a quality score which represents how useful the user will find the ad to be. An ad for the company searched for will be very useful to the user, therefore having a very high quality score, therefore the advertiser pays very little to have the top spot.
A competitor on the other hand would get a low quality score, and have to bid a lot to get the top spot.
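To put rough numbers on that, here is a minimal sketch assuming the simplest possible model: rank is just bid times quality score, and the highest rank takes the top spot. The names (`Ad`, `ad_rank`, `min_bid_to_outrank`) and all the figures are made up for illustration, and the real auction almost certainly has more inputs than this.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float            # max cost-per-click in dollars (made-up figure)
    quality_score: float  # hypothetical relevance score, higher is better


def ad_rank(ad: Ad) -> float:
    """Assumed ranking rule: bid multiplied by quality score."""
    return ad.bid * ad.quality_score


def min_bid_to_outrank(challenger_quality: float, incumbent: Ad) -> float:
    """Bid a challenger needs just to tie the incumbent's rank."""
    return ad_rank(incumbent) / challenger_quality


# The brand being searched for: cheap bid, very relevant ad.
brand = Ad("brand", bid=0.30, quality_score=10.0)
# A competitor bidding on the brand's name: much less relevant.
rival = Ad("rival", bid=1.00, quality_score=2.0)

winner = max([brand, rival], key=ad_rank)
rival_needs = min_bid_to_outrank(rival.quality_score, brand)

print(winner.advertiser)
print(rival_needs)
```

Under these assumed numbers the brand holds the top spot at a fraction of the rival's bid, and the rival has to bid several times more per click to catch up.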
It absolutely is to maximize profit - it's to extract surplus PPC budget out of businesses. Not a huge surplus no, but a surplus nonetheless.
G knows the boardroom shitstorm that ensues when a CEO Googles the company name and sees a competitor outranking them (& that it looks like an organic result).
There's therefore tremendous incentive for brands to park some 'defensive' PPC budget (and it's never clear precisely how much you need to be spending and of course your spend will be anchored by your spend on generics), and a lot of incentive for other brands to try to outbid (even if the net effect of those ads is as display ads and they don't attract clicks).
Consumers meanwhile, will just click the first 'paid' link, meaning that G is getting 30c for a link click the brand previously would have got for free.
Google doesn't make that distinction for sites though. The ranking penalty applies even if the top heavy ads are highly related and complementary to your content.
My mother has exactly one way to get to the Nordstrom e-commerce site: search Google for Nordstrom, and then click on the first authoritative-looking Nordstrom search result, which is an AdWords ad, landing her at the home page.
Of course. That's why I said "keep" and not acquire. What I meant is Nordstrom would rather pay Google to not lose the customer to some other site because of the placement of ads on Google.
They can't make sure that none of them get in, but their guidelines are just warning that they may "wise up" at some point and if you break those guidelines you may find your site un-findable in Google because of it.
That's what we have Ad Limiter for.[1] It trims Google's sponsored results down to one ad. For some subjects (try "credit card") the entire first screen from Google is ads.
Surely Google follows their own guidelines: You can't find Google Search Engine results indexed on Google itself (or any other search engine with or without ads for that matter). Google Search is more of an application than a content site.
Otherwise, when Google finds itself breaking the "rules", they act:
- Google banned the page for Chrome for buying paid links
- Google banned an acquired company (Beatthatquote) for violating rules.
- Google penalized their Adwords FAQ pages for cloaking.
- Google reduced pagerank for Google Japan for buying links.
- Google removed Adwords support pages for keyword stuffing.
"So the next time you complain about your phone service, why don't you try using two Dixie cups with a string? We don't care, we don't have to. We're the phone company."
Mobile is particularly bad these days. For competitive queries you will often have the entire first screen be ads and need to scroll down to see any organic listings.
That is for all intents and purposes equivalent to an interstitial.
> At the time of writing, the [google] queries “Larry David net worth” and “how much is Larry David worth?” both turned up the answer $900 million and credited Business Insider
> The Business Insider story says that “it has been estimated” that Larry David is worth up to $900 million [...] Then it cites CelebrityNetWorth’s lower number, $400 million, and quotes Larry David denying he was worth even $500 million
The most valuable asset Google could possibly develop at this point is humility.
Google more than any other organization on the planet has bought at least some of its own bullshit. For at least 6-7 years they've been consistently, 100% sure that generally-useful AI (or something sufficiently approximately close) is just a few short years away, and thus any investment in anything else (like people) would be wasted.
They've sacrificed countless customers, products and services on this altar, and will continue to do so probably indefinitely. They've decided they're going to live or die by the AI. Humans work there only to build, configure and maintain the machine. Every time I hear someone say "Google really needs to hire some <people>" or "Google needs to train their <people>" better I shake my head - it's like saying Ford should solve an efficiency problem with their cars by building boats.
Google is not going to do that, ever, and Google is not going to learn humility either. They're convinced they have it figured out, reality be damned.
1. Google has been an AI company from the very beginning (information retrieval).
2. Google is investing and doing generally useful applied AI, not an AGI moonshot.
3. Google's AI researchers are not 100% sure that AGI is just a few short years away.
4. Major source of income is advertisements. A lot of non-technical people work on this, allowing others to do more research and improve search.
5. Like said, AI is Google's DNA from the start. They are the biggest AI company in the world, and will die/be dethroned when they let AI research wither.
6. Avoid blanket humility, lest you lose hunger, innovation, and daring. "At Hooli, nothing is ever impossible".
Business Insider had multiple data points, some clearly more correct than others, and Google's algorithm picked the least correct one. The solution is to gain the humility that allows you to accept that the algorithm you promote at the top of the page is never going to beat the manually curated database in the organic search results.
And since the most correct data point in Business Insider came from the curated source in the organic search results in the first place, the upper bound of the algorithm's correctness would have been the curated source anyway.
Here's a lovely snippet-box example I encountered just the other day (and reported to Google; nothing's changed yet). Search for "tintinnabulum". (Advance warning: You might want to avoid doing this at work or in a public place.) The box at the top contains two things.
1. Some text from the Wikipedia article, correctly informing you that a tintinnabulum is a small bell on a pole in a Roman Catholic basilica symbolizing its connection with the Pope.
2. An image of a sculpture whose title happens to be "Tintinnabulum". The sculpture is of a naked woman riding on a penis-with-legs. The penis has a penis of its own, too. (Regrettably this doesn't continue recursively.)
I am fairly confident that nothing resembling that sculpture is to be found in any Roman Catholic basilica symbolizing its connection with the Pope.
I feel like Google stole this idea from DuckDuckGo (correct me if I'm wrong, but I remember them having this first).
If you use DuckDuckGo for "tintinnabulum," you don't get the girl on a penis, but instead a set of five possible definitions to narrow your search. When you click on them, you typically get the wikipedia box, but off to the right as an aside.
I really miss the world of Lycos, Yahoo, Hotbot, Dogpile, etc. If you didn't find what you were looking for, there were other search engines with different algorithms and different results.
Today if Google censors something (removed by DMCA request or government order, which can vary by country, etc. etc.) there are few other big indexes (DDG uses Yandex) to conduct your search. Their index is so massive that the cost of entry into their market is very high.
Judging by the Wikipedia page (but note I really have no idea/experience with this subject in general), it seems to be related to phallic figures, so it's no shock that you get a picture with that.
> A tintinnabulum often took the form of a bronze phallic figure or fascinum, a magico-religious phallus thought to ward off the evil eye and bring good fortune and prosperity.
So those images are showing tintinnabulums as well.
Interesting. But note that that's from a different Wikipedia article from the one the box quotes from. The box quotes from the "Tintinnabulum" article, which talks about Roman Catholic churches; you're quoting from the "Tintinnabulum (Ancient Rome)" article, which talks about wind-chime-figurines with enormous penises.
The photo would make a good accompaniment to the latter article but makes a very strange, er, bedfellow with the one actually quoted in the box.
I wonder whether Wikipedia's "Tintinnabulum" article should be renamed "Tintinnabulum (Roman Catholicism)" or something, and "Tintinnabulum" just take you to the disambiguation page, or alternatively whether those two pages should be merged into an article about small bell-like things, with sections on Ancient Rome and Catholic basilicas. Either way, there'd be less likelihood of a hilarious mismatch between picture and text in the Google answer-box.
> Featured Snippets usually have a note that says “About this result,” while Knowledge Graph answers do not.
This seems like the potentially most interesting part to me. Excerpting data & web sites to build a service without citing the source or giving credit seems like it could be a copyright violation.
The snippets case at least cites the source, and has a link to get you there, so even if it is damaging to a business it's probably legal.
While copyright does have exceptions carved out for copying small snippets of a work, e.g., for educational purposes, there's no clear line for copying all of a work in thousands of tiny slices that are published separately. Seems like an area where copyright law is due for a change as a result of all things digital.
In the case of the celebrity net worth site, though, it seems pretty clear that he'd have a very thin case when it came to copyright. Just because it took them a lot of effort to come up with the numbers doesn't make the end result copyrightable - it is an estimation of a fact, and there is no copyright possibility there. Even if they took the whole thing, there is nothing non-obvious about a list of celebs and their estimated net worth. To me, that seems to have been the biggest problem in their particular business: they invested a lot of money into building something that it would be very hard to protect someone from just taking it once it is done.
> Just because it took them a lot of effort to come up with the numbers doesn't make the end result copyrightable
You're bringing up a valid point - facts aren't copyrightable. And while I was talking about the bigger picture, I'd have to agree that this argument may be harder for the celebrity net worth site than others.
But
Compilations of facts are copyrightable, and a large amount of effort spent compiling the numbers does in fact strengthen their case. It's called the "sweat of the brow doctrine", and copyright cases have been decided in favor of people primarily due to their efforts.
I speculate this doctrine does apply to sleuthing celebrity net worth from a variety of sources. But nobody knows until it goes to court, and it depends on the quality of the lawyers as well as the nitty-gritty details about how celebrity net worth gathers its data.
Either way, the bigger picture is that regardless of whether the celeb site lives, there may be a problem with the way Google is doing business.
>The United States rejected this doctrine in the 1991 United States Supreme Court case Feist Publications v. Rural Telephone Service;[4] until then it had been upheld in a number of US copyright cases
(which is what I remembered from my IP law class ;-) )
Yes, that's correct, the Feist case was the first one rejected on those grounds, and found that "A mechanical, non-selective collection of facts (e.g., alphabetized phone numbers) cannot be protected by copyright." (https://en.m.wikipedia.org/wiki/Copyright_law_of_the_United_...)
But sweat of the brow doctrine still applies to collections of facts when some creative work has been done, and does strengthen a case when there has been more sweat.
The case of the celebrity net worth site is one that is not a mechanical reproduction of facts. They are not a phone book, they are doing research and estimating net worth using implications. Their work is not single-sourced, but based on what they say is a wide variety of sources, some of which may not be publicly available. Who knows, some of it might be unsubstantiated rumor, or even "creative guessing". Just on the face of it, there is arguably enough creative work in what they're doing to satisfy a copyright claim. I doubt they're interested, and I don't think it'd be easy. I don't speak for them, I'm not a lawyer, and I don't recommend it. But to my eyes, it's not out of the realm of possibility.
Completely agree - would be interesting to see it judged!
I've often thought about this in the context of data collected by major sports leagues. They sure are very vocal with their legalese about how they hold all rights to use any of the data that they collect, but I have a strong suspicion that if someone with deep pockets held their feet to the fire, they may not like the result. The PGATour's ShotLink data is an example that comes to mind.
If your analysis is based on judgement calls and proprietary algorithms then the end result is copyrighted even if the end result is just a spreadsheet.
I don't think that's true in the US – data/information isn't copyrightable, only original creative/intellectual works. A collection of data is still no original work. "Analysis based on algorithms" doesn't sound copyrightable either – only works by humans are covered.
The EU, on the other hand, has a special copyright for databases.
At least in the EU, whether the source is cited/linked or not doesn't make any legal difference: If there's no specific exception to copyright that allows what you're doing, you can't copy something, period. And I'm pretty sure Google's use doesn't hold up as either educational (usually doesn't cover commercial services) or as a quotation (usually requires an original work in which another is quoted for illustration/criticism/commentary).
It may be OK in the US "fair use" system, where it's up to courts to decide what's okay – I don't know the case law well enough.
The examples in the article are all kind of acceptable, but I have asked Google some very factual questions (e.g. can covariance be negative) where it picked out the wrong answer from Stack Overflow that was clearly the opposite of the truth. It confused the hell out of me.
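For the record, covariance can be negative. A quick check, using made-up data where y falls as x rises, and the usual sample covariance formula (dividing by n - 1):

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Made-up data: every step up in x comes with a step down in y.
xs = [1, 2, 3, 4]
ys = [8, 6, 4, 2]

cov = covariance(xs, ys)
print(cov)  # negative, because the two series move in opposite directions
```

Each product of deviations is negative here, so the sum is too. Variance is the quantity that can't go below zero, which may be where the confusion in such answers comes from.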
It's frankly quite shocking that Google's ad results are displayed with pretty much the same look and feel of their search results. Yes, the more tech-savvy among us will see the "Ad" button in gold, and those with ad-blockers won't see the ads at all.
But just a few years ago, Google's ads would appear in a sidebar, clearly separated from the search results content.
Not everyone is able to tell the difference today, and I think Google exploits this. I recently activated a new B2B website for a client and the senior manager was an older gentleman in his 50s. He never uses the browser address bar to navigate to websites, preferring instead to use the Google search box on his homepage. He sent me a panicked email saying when he typed in the new website's name, the Google search results showed competitors' names ahead of his. This was well after Google had crawled the site and it was showing up on the results page.
As it turned out, he did not understand that the first few results he was seeing were ads. While he isn't a savvy user, I would wager that many, many people are similar to him in that they don't exercise good judgement when they're on the web and would easily be fooled into thinking that something like "proton therapy" is a cure for cancer because it's at the top of the search results.
I run uBlock Origin, so when I see an ad it tends to really stand out (like on the site this article is on, which has an ad that doesn't get blocked currently).
That screenshot is pretty shocking. That's an insane amount of pixel space dedicated to the ads, then the Google featured snippet and finally the actual search.
The point that they've slowly made changes over the years to make the ads look less like ads is very clear though.
They've also generally done a lot of work to push the organic results down. It's very easy to find queries where there's no organic result showing above the fold, even on a fairly decent desktop monitor.
There aren't a lot of clinics that can do it, but proton therapy has emerged as another method for treating tumors by irradiating them. Unlike a beam of high energy photons (used in conventional radiotherapy), a proton beam can be tuned so that most of its radiation dose will be deposited at a specific depth. By contrast, a photon beam will deposit more energy at depths shallower than the target, and it will also deposit some energy at depths beyond the target.
That's all well and good, but it doesn't really matter if individual cases get squashed. The larger issue is that Google has moved away from the stance of "we just give you what you search for, it's up to you to verify it" and is instead presenting some results as "this is the definitive answer". That's bad even if all the horribly wrong results are removed. Does anyone really think it'd be a good idea for a single company to own what is and is not considered "true"?
Good screenshot, that is a very unfortunate snippet. It's interesting to me that the quackery text in the snippet closely matches the input query about carrot juice, but the top ad result (with very little text similarity) is for a cancer center with a real radiotherapy treatment [1], FDA approved since 1988.
Proton beam therapy is not widely known, which might explain the scare quotes in rchaud's comment suggesting that non-savvy Google users "would easily be fooled into thinking that something like 'proton therapy' is a cure for cancer because it's at the top of the search results".
I detest ads, so it feels weird to write something like praise for this one. I guess I dislike quackery even more. Now I want to try similar search queries to find other cases where a Google ad suggests a real treatment amid all the bullshit in the non-ad search results.
That's a really interesting problem space, because what do you do when the foundation of the question is terrible, and yet, there exists plenty of information saying otherwise. What is even the ideal result from that query?
> “But then they went ahead and took the data anyway.”
The only question remaining for me is why Googlers still enjoy the respect of their peers. It was fine in 2000, nobody knew how it would turn out. Today, we need to start disrespecting the kind of person that would still work for that company.
Google has incredible PR/marketing. And this isn't just a public-facing engagement. Google seems to do an incredible amount of inward marketing. The same high quality rhetoric that sells a large amount of the public on the notion that Google is a good company trying to make the world a better place works even better on Googlers themselves.
Bear in mind, Googlers are provided food and amenities on campus, which keep them surrounded by the Google mindset most of the time. One of the funniest things recently, on another site, was getting a response condemning my post about pay equality from a "Googler with a Googler wife". Many Googlers' friends and family are also Googlers... that's where they spend all of their time. So there are plenty of people at Google whose entire social circle consists of other Googlers. And that's before you talk about the fact that Google is paying them to be there.
That's a huge amount of social pressure, and a huge amount of bias in what information they take in and how they interpret it. If their entire social and financial structure is built around a single entity, I suspect it'd be fairly difficult for the ordinary person in such a situation to leave. I don't fault individual Googlers for the direction of the machine.
Ha, I guarantee you no one at Google remembered this. They talked in 2014, Google did it in Feb 2016.
By that point whoever they originally talked to probably got their promotion and transferred to some other part of the company, or quit for somewhere else. If they're still doing the same thing, the person who made the decision to go ahead and take it was probably on some very similar but rival project.
Or most likely everyone forgot, or didn't have any awareness that it happened. They didn't case out a building, they added a symbol to a list. I'm guessing they only asked in the first place because they didn't have the technical means to scrape it all before.
Yes, they may not be in the spirit of the web. But if you have ever had a featured snippet (or earned one), you'll know that they actually bring a lot more traffic to your site.
They hardly take away traffic from sites, as people are saying. That's just flat-out wrong.
If Google starts believing their goal is to keep people on their site, they'll have a nasty surprise when people don't want to leave it by clicking on ads.
Possibly this is because in the UK (and presumably wherever you are) we don't call that cut "eye of round" (it seems to be from roughly the same part of the cow as our silverside), so we're googling to find out what it is, whilst in the US people are googling to find out how to cook it.
i get the sense that google wants the featured snippets feature but are not devoting enough resources to it. the following featured snippet has been appearing for over 8 months, and i have sent feedback multiple times:
If I were them I would vigorously avoid manually fixing individual cases that weren't dangerous? That's an extremely terrible way to build an NLP system.
This amounts to copyright violation. CelebrityNetWorth went to great lengths to organize this data and holds copyright on their site. Google is duplicating and redistributing this research without permission of the author.
What's worse is that Google essentially uses its monopoly (or at least extreme popularity) on search to essentially extort people into going along with stuff like this. If the site in question actually filed a DMCA notice or requested a takedown in a more tactful manner (assuming he could actually find someone to talk to about it) Google would probably just tell them to update their robots.txt file so that the site doesn't show up on the search result pages.
"is it a good idea to build a business around trivia answers?"
if your entire business model could be upended by a collection of 1 line wikipedia edits, then it seems like the problem is in the inability to forecast.
It isn't, but it is Google's responsibility not to infringe on the copyright of other people/companies. Google is not displaying their own content in those snippets. They are taking content from other websites, reformatting it and putting it on their own website. By stealing that content and displaying it to the user, the user now no longer needs to visit the website from which the content originated. They are directly taking away potential customers/revenue from these companies. It seems pretty clear that that's not entirely kosher.
Facts aren't copyrightable in the US. See Feist vs. Rural Telephone. If it's a pure fact and exposed to public view, Google can take the data and repurpose it.
Didn't say it was. What I was getting at is that as a website owner, you aren't safe from Google taking away your traffic, solely by focusing on things that aren't trivia.
No, from your perspective as a consumer Google is making it harder to go to the source. And if you learn to trust the snippets, Google is making it pointless for sources to cater to customers, rather than to please Google.
Then you will just talk to Google, the gatekeeper.
You need to get that data from somewhere in order for it to appear on Wikipedia. You need to analyze it. And Wikipedia would give you credit if you do the analysis yourself.
If a business that carefully curates and delivers quality data can be replaced by an algorithm that delivers incorrect data, the problem from society's point of view is on a whole different plane than forecasting.
I don't know how credible celebrity net worth is, but Wikipedia can't provide the best estimates. It doesn't have the organizational structure to ensure that they're present for all celebrities, or that they stay up to date, and its policy on citations means it can't engage in well informed speculation.
The best Wikipedia could do is (manually or automatically) scrape a site like celebrity net worth for information, linking to its estimates, if wikipedia editors decided that it was a notable/reputable source.
I get this, but in another sense, the answer is to offer people a reason to visit the site beyond the number, build an app with features people who care about celebrities' net worth will enjoy, make something compelling so it's not as one-dimensional and easy to sidestep. It's like the whole 'keep American jobs' thing. I agree that American jobs are good, but many of the people calling for 'bringing back jobs' aren't asking for new jobs with new technologies, efficiencies, and ways of doing things profitably; they want things to go back to 'the way they were,' and that really can't be done. Cut the losses and try something new, keeping the lessons learned.
Google's offering isn't self-sustaining. These numbers will not be updated without the celebrity net worth company and Google will have to remove the outdated snippets. The celebrity net worth company started because no one was doing the leg work for the simple number.
Your suggestion to the celebrity net worth company rings hollow after reading the article: people Google to get the information and get it right at the top of the results page. Now there is no opportunity for the celebrity net worth company to offer a reason to visit the site! When people had to visit the site, there was a chance for the site to drive further engagement.
The snippets are always wrong if you think about the ad revenue that Google steals and the copyright material that Google is using for profit. It sounds like a class action lawsuit could fix that.
Yeah, the issue here is that he doesn't want to block Google from indexing his site because he wants to be ranked in the results, so he allows Google to display information that is on his site. He would have to disallow reproduction of his site's materials formally, or the much easier method of robots.txt that you mention. Ideally, he would do both. But he knows that will also kill traffic because then he's limited to other search providers like (gasp) Bing.
This is all on him. Is what Google is doing shitty? Yeah. But at the same time it's almost insane that this guy knew his research could be boiled down to a single number (which is clearly displayed on his website) and his whole business model was getting Google traffic. You can't rely on an informal agreement with a titan of industry and expect to have any bargaining or negotiating power.
What?
The snippet mentions/links to bankrate. If you then go to bankrate, it credits celebritynetworth. Bankrate can list any sort of information they please--you can't tell a site not to mention/link to you.
If the business of your company involves Google advertising and getting hits on Google searches, it's a bad idea to completely block Google from indexing the site.
HN elitism at its finest. Some people out there actually run their internet businesses to put food on the table, not just to stroke their techno-libertarian side-project erection
Virtually every business model involving the internet relies at least partly on Google, so while your comment may be true it isn't particularly helpful.
Helpful? actually I've been pitching this same idea (don't rely on Google traffic for your business model) since 2004. I've seen too many businesses FAIL and have to lay off employees because they relied on Google traffic.
Featured snippets are just that--they've taken a snippet from websites and are displaying them at the top of the search results for certain queries. Google doesn't say that the information is correct--they do ask for feedback (there's a link to submit feedback).
Featured Snippets are not damaging to businesses. In fact, I've seen businesses benefit greatly by getting a ton more traffic from having them. They get more leads and sales, and more ecommerce conversions.
If you’re trying to rank for the answer to a question like “What is Larry David’s Net Worth?”, and the answer literally is a dollar figure or a “quick answer”, then I have no problem with Google “stealing” your traffic. In fact, if the answer is “$125 billion”, and Google can give searchers that answer without having to go to your website, then so be it. What do you expect that visitor to do anyway when they get to your site? They’ll leave. They’ll hit the back button. They’ll bounce, because your site is focused too much on short, quick answers. How about creating some real content, content that will make visitors stick on your website and view more than one page?
I don't see this as scraping, actually. They're taking part of the content from a page and displaying it in their search results. They do that with the site's title tag and meta description tag.
I would be more concerned about the Google cache (for scraping) rather than a Featured Snippet.
Go ahead and get a featured snippet (or earn one) and you'll see the massive traffic increase and trust your site will get.
If you're concerned about them scraping, really, then go ahead and just block Googlebot from indexing the site.
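For anyone wondering what "just block Googlebot" looks like in practice, here is a small sketch using Python's standard `urllib.robotparser` to check how a robots.txt with a Googlebot-only disallow rule would be read. The rules, the `example.com` URL and the `SomeOtherBot` name are made up for illustration. This only shows how the rules parse, not what blocking would do to a site's traffic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot, let every other crawler in.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Made-up page URL; any path on the site gets the same answer under these rules.
url = "https://example.com/larry-david"

googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(googlebot_allowed, other_allowed)
```

The rule is all-or-nothing per path: there is nothing in this file that says "index me but don't excerpt me", which is the trade-off the site owner in the article is stuck with.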
> Go ahead and get a featured snippet (or earn one) and you'll see the massive traffic increase and trust your site will get.
According to the article which we are supposedly discussing, the site in question got its traffic reduced by half after the snippet was added to the Google search results.
Google's results these days tend to scrape the information from most sites + list ads on >50% of the listings page.
Really frustrating when Google doesn't even follow their own guidelines.