How Google uses blacklists, algorithm tweaks and contractors for search results (msn.com)
247 points by tysone 30 days ago | 177 comments

Bullet #3: Yes, they keep blacklists; I worked on the web crawler for many years. But the article doesn't understand or differentiate between blacklists for content farms, spam domains, link farms, infinite spaces (that aren't calendars), etc. Blacklists are low-level URL regexps. In ~2015 some spammer in China overnight created 100 million websites to boost priority, and each page had 1000 links. Google saved them all! That kind of thing will literally crush the web crawler, slowing it by 20,000x or more, like a snake swallowing an elephant. Or it will crash the crawler, and we'd have to write manual code to cleanse the search logs and add blacklists right away to keep it from happening again.
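A minimal sketch of what "low-level URL regexps" in a crawl frontier might look like. The patterns, helper names, and example domain here are all invented for illustration; this is not Google's actual code, just the general shape of a regexp blacklist applied before a URL ever reaches the fetch queue:

```python
import re

# Hypothetical blacklist entries: a spam-domain family and an
# "infinite space" (a calendar that generates unbounded URLs).
BLACKLIST_PATTERNS = [
    re.compile(r"^https?://[^/]*\.spam-farm-example\.cn/"),
    re.compile(r"/calendar\?year=\d{4}&month=\d+"),
]

def allowed(url: str) -> bool:
    """True if no blacklist pattern matches the URL."""
    return not any(p.search(url) for p in BLACKLIST_PATTERNS)

def enqueue(frontier: list, url: str) -> None:
    """Drop blacklisted URLs before they reach the fetch queue."""
    if allowed(url):
        frontier.append(url)

frontier: list = []
enqueue(frontier, "https://example.com/article")
enqueue(frontier, "https://x.spam-farm-example.cn/page1")
print(frontier)  # only the first URL survives
```

The point of filtering at the URL level is cheapness: a few compiled regexps can discard 100 million spam pages without fetching, parsing, or storing any of them.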

Blacklisted domains are blacklisted forever, because they crash the crawler. This happens 1-3 times per year. Changes here are all tracked, and only owners of the first stage of search (who have no connection to the ranking algorithm) have change rights.

Bullet #5: search quality is assessed by thousands of contractors worldwide, and you can become a part of the crawl quality team, although it doesn't pay that well in the USA. The book they follow is 180 pages and has been available on the Google website for many years. It has guidelines for how to determine sexual, offensive, or illegal (or child porn) content in ALL countries. It has guidelines on how to rank news source reputation and credibility.

> Bullet #5: search quality is assessed by thousands of contractors worldwide, and you can become a part of the crawl quality team, although it doesn't pay that well in the USA. The book they follow is 180 pages and has been available on the Google website for many years. It has guidelines for how to determine sexual, offensive, or illegal (or child porn) content in ALL countries. It has guidelines on how to rank news source reputation and credibility.

Funny factoid for you: Google hires those guys in Russia through recruiters and innocuous-sounding shell companies in Saint Petersburg.

The guy can work for the shell for years without realising that he works for Google.

> Funny factoid for you: Google hires those guys in Russia through recruiters and innocuous-sounding shell companies in Saint Petersburg.

> The guy can work for the shell for years without realising that he works for Google.

Your comment makes this practice sound shady. I don't see how this practice is even arguably shady, if the general thrust of what you say is true.

First, you have to set up a different company for each country where you're doing business. Second, it's not a shell company if it hires employees or contractors and has clients. Finally, if the subsidiary doesn't engage in your primary business activity, you don't give it a name similar to the primary company's.

It's a very common corporation structure.

Well, at one point a recruiting company acting on their behalf tried recruiting me for an ops position. After seeing a metres-long NDA in perfect English and Russian, and with them being extremely tight-lipped about whom the job was for and what it entailed, it was very clear to me that it was Google, given that I knew the recruiter was one of the few with whom Google works in Russia.

I asked, "Google?" That raised their eyebrows, but they said they couldn't answer that.

That's common recruiter behaviour. A lot of them are very afraid of being cut out of the deal.

The point is, Google doesn't hire directly there. Officially, they closed their Saint Petersburg office years ago, and it is "just 3rd-party contractor companies" doing things for them there.

Did you miss this part of bullet #3?

>These moves are separate from those that block sites as required by U.S. or foreign law, such as those featuring child abuse or with copyright infringement, and from changes designed to demote spam sites, which attempt to game the system to appear higher in results.

The article directly refutes what you are claiming here.

How exactly does the article contradict that?

The article has very few specific examples: a few mentions of some draft, and "suspicious" discrepancies between Bing and Google.

None of which really means much, except that it sounds like the typical liberal-bias meme.

If you actually read the article, most of the statements of fact are about the omnibox autocomplete system, and then it uses innuendo to imply things about search engine ranking. But these are two completely separate systems, and it makes sense that a system that is literally telling you what to type is more sensitive than search result ranking. It is not a flaw of Google that it won't suggest "is hillary clinton still controlled by the jews" when you type "is hillary clinton". If it were just a big trie of what everyone typed, it would be completely dominated by 4chan troll bots.
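The "big trie of what everyone typed" concern can be sketched as a completer that ranks suggestions purely by raw query frequency. The query log below is invented for illustration; the point is that a motivated group repeating one phrase dominates the suggestions for a prefix, with no quality filter to stop it:

```python
from collections import Counter

def suggest(log, prefix, k=3):
    """Naive autocomplete: rank completions by raw frequency only."""
    counts = Counter(q for q in log if q.startswith(prefix))
    return [q for q, _ in counts.most_common(k)]

# Hypothetical query log: organic queries plus one bot-inflated phrase.
log = (
    ["is hillary clinton running for president"] * 5
    + ["is hillary clinton sick"] * 3
    + ["is hillary clinton a lizard person"] * 50  # troll-bot spam
)
print(suggest(log, "is hillary clinton"))
# The spammed phrase ranks first, ahead of every organic query.
```

Any real suggestion system has to layer deduplication, bot detection, and policy filters on top of raw counts, which is exactly where editorial judgment creeps in.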

First, I don't agree with your assessment of the article at all. There are multiple concrete assertions regarding manipulation of the search rankings themselves. I also don't agree that it is somehow "more okay" to manipulate autocomplete results than the search results proper.

Second, I think a big chunk of the problem here is lack of transparency. Google has traditionally been very secretive about its algorithms to avoid tipping off spammers. So if you ask them directly they will hem and haw, when in fact they ban spammers and also, as the article reports, moderate inflammatory content and manually boost rankings of specific websites. The question is: what is the exact scope of these activities? Where is the red line that they will not cross? I think the public deserves to know.

What’s a site they’ve manually boosted?

> Google made algorithmic changes to its search results that favor big businesses over smaller ones, and in at least one case made changes on behalf of a major advertiser, eBay Inc., contrary to its public position that it never takes that type of action. The company also boosts some major websites, such as Amazon.com Inc. and Facebook Inc., according to people familiar with the matter.

Of course the exact nature of the changes and boosts remains unknown, but that just underlines the need for transparency.

"algorithmic changes" implies the boost is not manual.

An "algorithm" still implies human intent. Heck, even a blacklisting system is still a form of an "algorithm." Even if each changes to the algorithms Google have made in the past may be justified, the public can't make an informed decision about it if it's not transparent about what it actually does.

Fact: the overwhelming majority of users aren't going to type a search query that completely specifies the information they are looking for.

The user isn't going to supply the very detailed information necessary to objectively filter down (and rank) everything on the web to a set of relevant results, without the search engine making any judgment calls. That would be a ton of work, more than almost any user would want to do. (And many users wouldn't have the technical skills to do it.) It would be like going into a restaurant and handing the chef a recipe for everything in your meal, down to the level of detail saying the vanilla extract in your dessert should use this type of vanilla bean, infuse it in this type of alcohol, and for this long.

The corollary is that any useful search engine will have to guide you a bit. It will take the incomplete specification you gave about what you want, and it will fill in reasonable guesses about everything you didn't say about what you want (but that it needs to know to find it), and then it will give you an answer that's useful.

Obviously, it's ideal to make this guidance objective or unbiased, but how do you even ensure that you're doing that? The whole point here is that you are making guesses about what the user prefers. It's not useful to guess randomly. Objectivity is a good goal, and you should avoid any unnecessary subjectivity, but the idea that you're going to be totally objective seems like a fantasy.

Isn't the autocomplete feature a sort of search in its own right? I agree there is a distinction between the two tools, but deliberately steering autocomplete carries the same potential for abuse as deliberately curating search results.

Your jews example is a positive use case, but not all conspiracy theories are created equal. Before Epstein killed himself, Google's autocomplete steered you away from any negative searches involving Prince Andrew, despite the fact that it was public knowledge at that point that one of Epstein's victims had named the prince as an abuser.

Does DDG do the same then?

Its autocompletes:

is hillary clinton running for president

is hillary clinton running in 2020

is hillary clinton going to jail

is hillary clinton an alcoholic

is hillary clinton democrat or republican

is hillary clinton sick

is hillary clinton under investigation

FYI, Google's "is donald trump" includes "is donald trump the antichrist" so they're certainly not consistently selective.

DDG ends up using Google's search results indirectly, because Bing will copy them for specific terms. So yes.

If you actually read the article, the very first bulleted point list contradicts what you're saying. It openly talks about messing with search results without any innuendos.

So that query is inflammatory. The question would be more about other political queries which might not be PC but also aren’t inflammatory.

Should they suppress queries about area51, Scientology, The_Donald, The_Mueller, r/politics, Trump is a Russian stooge, etc.?

You can still search for it, but it shouldn't be auto-completing conspiracy theories

If some conspiracy theory is popular, it should absolutely show up in search and autocompletion. Google isn't the arbiter of truth and shouldn't be reducing visibility of anything on the basis of its "conspiracy theory" status. There are many theories that were ultimately proven true.

It’s possible that censoring them drives more attention to them. I’m not sure it’s actually helping Hillary to hide that content.

“Check out what they won’t let you see or talk about!” is one of their main marketing draws.

And if Google was just a cold representation of human behavior (not bots though) then it might help people develop their own editorial voice.

I don’t know. It’s their call at the end of the day.

Some conspiracy theories are just that. Quack theories with no basis.

But others turn out to be true. How do you deal with that?

You "suicide" everyone with the credibility and potential motivation to expose you, and you use your board-level blackmail influence at numerous publicly traded compaines to establish narratives that associate your political opponents with your own crimes, I think.

The cynical part of me would find that funny. But this also poses the question of what the responsibilities of web search are. In some sense the highest-quality output (including autocomplete) is also conventionally responsible. I'm not sure if this is always the case.

>It is not a flaw of Google that it won't suggest "is hillary clinton still controlled by the jews" when you type "is hillary clinton". If it was just a big trie of what everyone typed, it would be completely dominated by 4chan troll bots.

Would it be a "flaw of Google" in your opinion if Google blacklisted such an autocomplete if that is actually what a lot of (real non-troll) people were actually typing in and interested in finding results for?

How much value is even added by the autocompletion of queries?

A better question would be how much value is removed from the autocomplete suggesting inadequate searches.

Showing suggestions manipulates what someone might search for, and therefore the results they get, especially as they build up search history.

A better option would be to remove search suggestions and only match proper nouns at most.

It matters a lot more for mobile, though; the difficulty of typing makes it much more likely users will click those suggestions.


> hillary clinton emails


> is hillary clinton still controlled by the jews

Nice try.

Why would that be wrong? If most people are searching for the truth wouldn't it be better to give them both sides of the issue and have them decide for themselves? Why is Google deciding that that specific Hillary query is taboo? And who gave them that right?

> if most people are searching for the truth wouldn't it be better to give them both sides of the issue and have them decide for themselves?

i think the wording of this is insufficiently nuanced. consider that many issues involve more than two clear dominant points of view, and that for many issues most people would not consider all points of view to be equally credible.

> Why is Google deciding why that specific Hillary query is taboo?

because, as another poster pointed out in another subthread, any search engine that is usable (at the level of time and technical skill that most people have) will necessarily have to make essentially editorial judgements. after almost a lifetime of being the sort of nerd that likes to make lists, categorize things, geek out over philosophical classifications, etc, and after a few years of working in the library world, i'm convinced that coming up with any system of abstraction or classification necessarily implies making editorial judgements and value judgements. i think objectivity is a great and important thing to strive for (in reporting and in information classification), but i think achieving it perfectly is definitely not possible, especially on divisive issues, and especially where lots of people disagree (or claim to disagree) on the basic facts.

> And who gave them that right?

i don't know about right, but effectively they have the ability because: 1) they built a really good search engine, 2) they built a really successful ad business on top of that to monetize it, 3) through ignorance and laziness we let them hoover up our data and use it to greatly improve their ad business, which let them provide us even more free services that everyone got hooked on, 4) everyone seems too apathetic to make the effort to move away and no one seems interested in competing with them as a search or email provider for most people. and here we are.

Here are the findings of the investigation, according to the article:

>More than 100 interviews and the Journal’s own testing of Google’s search results reveal:

• Google made algorithmic changes to its search results that favor big businesses over smaller ones, and in at least one case made changes on behalf of a major advertiser, eBay Inc., contrary to its public position that it never takes that type of action. The company also boosts some major websites, such as Amazon.com Inc. and Facebook Inc., according to people familiar with the matter.

• Google engineers regularly make behind-the-scenes adjustments to other information the company is increasingly layering on top of its basic search results. These features include auto-complete suggestions, boxes called “knowledge panels” and “featured snippets,” and news results, which aren’t subject to the same company policies limiting what engineers can remove or change.

• Despite publicly denying doing so, Google keeps blacklists to remove certain sites or prevent others from surfacing in certain types of results. These moves are separate from those that block sites as required by U.S. or foreign law, such as those featuring child abuse or with copyright infringement, and from changes designed to demote spam sites, which attempt to game the system to appear higher in results.

• In auto-complete, the feature that predicts search terms as the user types a query, Google’s engineers have created algorithms and blacklists to weed out more-incendiary suggestions for controversial subjects, such as abortion or immigration, in effect filtering out inflammatory results on high-profile topics.

• Google employees and executives, including co-founders Larry Page and Sergey Brin, have disagreed on how much to intervene on search results and to what extent. Employees can push for revisions in specific search results, including on topics such as vaccinations and autism.

• To evaluate its search results, Google employs thousands of low-paid contractors whose purpose the company says is to assess the quality of the algorithms’ rankings. Even so, contractors said Google gave feedback to these workers to convey what it considered to be the correct ranking of results, and they revised their assessments accordingly, according to contractors interviewed by the Journal. The contractors’ collective evaluations are then used to adjust algorithms.

Despite publicly denying doing so, Google keeps blacklists to remove certain sites or prevent others from surfacing in certain types of results. These moves are separate from those that block sites as required by U.S. or foreign law, such as those featuring child abuse or with copyright infringement, and from changes designed to demote spam sites, which attempt to game the system to appear higher in results.

Google has a permanent demotion applied to a site I've run since 1996: http://onlineslangdictionary.com/ . I estimate that my traffic would be 2.5x - 3x what it is now, were the demotion not in place.

These demotions are hidden, permanent, and cannot be appealed. Moreover, these demotions can be performed by hand internally within Google - for whatever reason they choose, or for no reason at all. That is to say, some demotions are manual and not automated.

I have never been officially notified that the demotion exists, in any of Google's available tools or any other way. However, a Google employee checked the internal status of my website and there is, indeed, a permanent demotion in place.

There is no reason for my site to be demoted. This demotion was put in place while Matt Cutts was the head of the web spam team. I asked him about it here on HN, and he lied about it. (I know he lied because of my communication with the Google employee.) You can read my thread with Matt here: https://news.ycombinator.com/item?id=5408087 .

I'd like to make the email chain between the Google employee and me public. But I don't want to ruin someone's career / life just because they did the right thing and told me about the penalty.

So... I don't know what to do. Thoughts?

A search for "slang dictionary" gives me your site as the No. 1 result. For "slang words" you are somewhere on page 4 or 5.

A Google demotion looks different. What do you expect? Ranking on the first page for every slang word?

Fair question. That's not how these demotions work.

More details follow. But whether my metaphor is apt or not doesn't change the fact that a Google employee informed me about the demotion.

The following is a FAQ taken from the page on my website about the demotion. It's written for a general audience.

Q: I just did a Google search and your site appeared in the first few results. Does that mean that the penalty has been removed?

A: No.

"Google Juice" is an informal term for how favorably Google views a website and pages on that website. There are a lot of factors that go into how much juice websites and pages earn.

Every time you do a search, Google's algorithms evaluate every page on the web to decide how relevant each page is to your query - how much Google Juice each of the pages has for your query. Then they show the search results, which are ordered by the amount of Google Juice each page has.

What the penalty does is subtract an amount of Google Juice from every page on this site. But whether that means we appear 1st, 2nd, or 307th in the search results depends on how much juice each of the other pages on the web have for your query.

You could think of it as a foot race. The penalty doesn't work as in: however you finish in the race, Google will drop you down by 9 places. It works as in: Google attaches a 20 pound weight to your foot, and whether that means you finish 1st or 307th depends on how good the other runners are.

So sometimes pages from this site appear towards the top of search results. That just means that those specific pages have enough Google Juice - and other pages on the web have so little Google Juice - that even with the penalty, we can appear towards the top. But overall, the penalty drags our pages down far enough that we would get about three times as many visitors if the penalty weren't in place.
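The foot-race analogy above can be sketched as a toy ranking model: the penalty subtracts a fixed amount of score from every page on one site, and the final position depends entirely on how strong the competing pages are. All page names and scores here are invented for illustration:

```python
def rank(pages, penalized, penalty):
    """Sort pages by score, after subtracting a flat penalty
    from every page belonging to a penalized site."""
    adjusted = {url: score - (penalty if url in penalized else 0.0)
                for url, score in pages.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Against strong competition, the penalized page falls from 1st to 3rd:
pages = {"mysite/word-a": 9.0, "rival1": 7.5, "rival2": 8.5}
print(rank(pages, {"mysite/word-a"}, penalty=2.0))

# Against weak competition, the same penalty still leaves it in 1st:
weak = {"mysite/word-b": 9.0, "rival3": 4.0}
print(rank(weak, {"mysite/word-b"}, penalty=2.0))
```

This is why spot-checking a few queries can't disprove a demotion: a page can still rank first wherever the field is weak, while the site loses traffic in aggregate.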

It's important to differentiate between "link juice" and overall ranking power which I think you mean by "Google juice" (as links are not all there is to ranking well).

That said, I agree that penalties/demotions seem to have a somewhat dampening effect, e.g. resulting in the previously unmodified score, but multiplied by 0.2 or whatever.

However, penalties should not last 13 years if the causes have been fixed. What did said Google employee tell you about your case?

edit: also, there is a way to submit a reconsideration request. Have you tried that?

It's important to differentiate between "link juice" and overall ranking power which I think you mean by "Google juice" (as links are not all there is to ranking well).

True. I wanted the text to be immediately understandable by non-technical people. I'm on like revision 3,194 of the text. :) Any suggestions for improvement are welcome.

Also, there is a way to submit a reinclusion request. Have you tried that?

Yes, several times.

However, penalties should not last 13 years if the causes have been fixed.

They shouldn't. But in this case, it has lasted that long. It's a manual penalty turned on by a Google employee. The only way to get rid of the penalty is for them to (manually) turn it off.

What did said Google employee tell you about your case?

That there's nothing I can do.

Manual penalties can be appealed via the reconsiderations process. If a webmaster is unable to file a reconsideration request, this usually means that there is no manual penalty on the site.

As I mentioned above, manual penalties can be hidden. I know this because there was no indication from any tool that there was a penalty against my site, and because Matt Cutts claimed that there was no manual penalty against my site, and yet a Google employee verified the penalty internally.

If you're a Google employee, I'd love to discuss this with you! My email address is waltergr@gmail.com .

Matt's feedback was correct. There's nothing you need to do for your site.

There's nothing you need to do for your site.

I'm a little unclear on that. Matt specifically mentioned two active penalties.

Matt said that there was an automated penalty due to advertising on my site. Following this, I removed all advertising from the site. My site's ranking did not change, nor did traffic referred by Google searches. Why is that? If there's nothing I need to do for the site, why have those metrics not changed?

Matt also said that there was an automated Panda penalty against my site. Following this, I removed all citations from the site. My site's ranking did not change, nor did traffic referred by Google searches. Same questions as above: Why is that? If there's nothing I need to do for the site, why have those metrics not changed?

What did the prior manual penalty against my site that Matt mentioned have to do with Web Build Pages / Jim Boykin? I had never even heard of them. What specifically was that manual penalty? I have seen no evidence in ranking and in traffic referred by Google searches to ever indicate that this penalty existed. Furthermore, I was never informed by Google via any mechanism that there was a manual penalty against my site. Why is that?

Have Google employees ever been able to apply demotions, penalties, or any mechanism whatsoever to drop a website's positions in Google SERPs, in a way that the website owner is never made aware of it?

Has that ever been done to my site, The Online Slang Dictionary?

Thanks very much for your input.


> Have Google employees ever been able to apply demotions, penalties, or any mechanism whatsoever to drop a website's positions in Google SERPs, in a way that the website owner is never made aware of it?

should have read, in part,

> Have Google employees ever been able to manually apply...

That thread with Matt Cutts looks terrible for Google.

In it he admits that the site is being penalized for having prominent ads above the fold.

90% or more of Google’s revenue comes from presenting prominent ads above the fold.

How is demoting the organic results of sites that use the same business model that it does not the epitome of anti-competitive behavior?

In it he admits that the site is being penalized for having prominent ads above the fold.

Yep. And so I removed all advertising from the site for months.

There was absolutely no change in ranking.

Matt Cutts also wrote, "Your site is also affected by our Panda algorithm..."

My website had citations of slang use. By definition citations are 'duplicate content' since they also exist somewhere else.

So I removed all citations from the site for months.

There was absolutely no change in ranking.

The 3 claims Matt Cutts made were:

1. "...the only manual webspam action I see regarding onlineslangdictionary.com is from several years ago (are you familiar with a company called Web Build Pages or someone named Jim Boykin?)..." Wrong. I've never heard of him or his company, I've never worked with anyone involved with SEO, and there was never a change in ranking / site traffic that would suggest that my site had a penalty. Given the very very dim view of SEO practitioners among technically savvy people, I can only assume that his implication ("are you familiar with") was designed to discredit me here.

2. "You're affected by a couple algorithms in our general web ranking. The first is our page layout algorithm... your site has much more prominent ads above the fold compared to Urban Dictionary." Wrong. (As above, I removed all advertising from the site.)

3. "Your site is also affected by our Panda algorithm." Wrong. (As above, I removed all citations from the site.)

I can't say that anything Matt Cutts told me was truthful.

Putting ads above the fold makes a site worse for users. This doesn't mean it's immoral, and it doesn't mean it's the wrong thing to do. You are trading off the amount of value you provide against your ability to capture a portion of that value. Capturing value can be essential to the continued existence of the site, and to the ability to create more content.

However, Google search is acting on behalf of the users, trying to find them the result that brings them the most value. And everything else equal, that is the one without ads above the fold.

The argument that Google is acting on behalf of users is contradicted by the fact that they put their own ads above the fold.

Business model has nothing to do with anti-competitive behavior; it's about being in the same industry / providing the same service.

Suppressing sites that display advertising seems like it is anticompetitive behavior against other ad networks, no?

Only if they targeted sites that didn't use AdSense and let sites that did use AdSense place ads wherever they wanted.

Since their own ads on the search results page are not targeted that is a clear yes then.

Wait, what is the basis of ranking that you expect for your site? Is it based on Planck's constant or ???

Via comparison to similar websites. It's an estimation.

Is it possible that the sites that outrank yours are preferred by search users?

Sure. If the demotion were removed and my competitors got more traffic than my site, hey, that's great.

I don't expect special treatment. All I want is for the demotion to be removed so that I can compete with the rest of the web on a level playing field.

I don't think it would be right to out the whistleblower unless you're going to sue Google for unfair treatment, and you need their testimony.

This sort of thing is why we have courts.

This sort of thing is why we have courts.

I wish I had the money to pursue that.

> Despite publicly denying doing so, Google keeps blacklists to remove certain sites or prevent others from surfacing in certain types of results

Google absolutely does this. Around 6-9 months ago, bluelight.org (a somewhat controversial drug harm reduction forum) disappeared from search results overnight.

It will show results if you explicitly use `site:bluelight.org`; otherwise nothing.

* I feel the need to add a disclaimer here: I don't abuse drugs, but I find Bluelight an interesting source of information regarding side effects and bioavailability of various prescription medications.

I wonder how its founders feel about this culture? Google used to have a specific role, creating a win-win situation for everyone on the public web, but now it has become a greedy monster that has lost its mind. I wonder what caused it? Was it reckless hiring, or the fact that the founders no longer seem to care or are anywhere to be found? It's a shame really, for such a monumental company.

The article (which is really long) talks pretty extensively about founder conflict over manual intervention; this is the passage I feel captures the sentiment:

>Mr. Brin still opposed making large-scale efforts to fight spam, because it involved more human intervention. Mr. Brin, whose parents were Jewish émigrés from the former Soviet Union, even personally decided to allow anti-Semitic sites that were in the results for the query “Jew,” according to people familiar with the decision. Google posted a disclaimer with results for that query saying, “Our search results are generated completely objectively and are independent of the beliefs and preferences of those who work at Google.”

>Finally, in 2004, in the bathroom one day at Google’s headquarters in Mountain View, Calif., Mr. Page approached Ben Gomes, one of Google’s early search executives, to express support for his efforts fighting spam. “Just do what you need to do,” said Mr. Page, according to a person familiar with the conversation. “Sergey is going to ruin this f—ing company.”

> I wonder what caused it?

I guess money caused it. When you have a market cap of $900+ billion you're not going to keep the promises made back when you were "only" a $90 billion company or a $9 billion company. I remember the pre-IPO days when each index update was followed almost religiously on webmasterworld.com, and where a user called GoogleGuy (supposedly Matt Cutts) was implicitly promising all the website owners there that Google was our friend, that they were never going to become a portal, and that they would never steal anyone's content.

You’re right on an observational level, but I don’t think we should normalize breaking promises because of money

Being right on an observational level is generally considered to be, well, right when it comes to perceived reality. What should or should not happen from a moral/ethical point of view is another discussion (the OP asked why this happened, not whether what happened was right or wrong).

What is the public web today? The old web is dead -- most normal humans live in the modern versions of AOL. Twitter, Facebook, Instagram, etc.

Microsoft figured out that this was going to be a problem first -- you may recall that Bing was marketed as providing answers instead of results, which was a nascent threat. Google moved towards providing answers, and found that embedding ads / product placement in those answers was profitable.

Also remember that the users of Google in 2002 or whenever were a different audience. The "win-win" for you is confusion to the average punter.

I don't want to come off as a defender of Google -- but search is an incredibly complex business that means something different to many cohorts of users. I think the surveillance stuff is approaching or crossing a line nowadays, but I think it's fair to say they have done a pretty good job considering how the online and offline world has evolved.

> The old web is dead

I don't think so, and that's thanks to people who keep writing blogs and tutorials, making thoughtful videos, posting thoughtful comments, etc. The "old web" had a high technical barrier to entry, and that acted as a selection filter. The "new web" is the former TV audience, which moved to Facebook/Instagram and mobile apps. It's a big audience, and marketers are after it, but that didn't make the "old web" disappear. I've never read anything interesting inside Facebook except Yann LeCun's posts (which he keeps there out of courtesy to his employer).

I just hope people stop thinking the old web should also move inside those because that's where their crowd is. It's not true, that's a different crowd.

Google can't interfere with its own search algorithms by definition. The algorithm is their design and theirs alone. The accusation isn't merely wrong; it's impossible in principle.

The whole goddamn point of a search engine is to privilege certain results over others. The claims of armies of contractors reek of the zombie lies of their persecution complex.

Sure, you can define algorithm to be "whatever Google does" in this way, but that misses the point: The point isn't that they violate the algorithm, that is just a semantic convenience in presenting the article. The point is that they privilege and editorialize the results in ways they have claimed they do not do.

"google's algorithm" is not an actual algorithm but the promise of organizing the world's information in an impartial way using machines instead of bribes. It was their selling point when they made the site, and their users have not forgotten those promises.


> Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market.

Google does both nowadays. Shame

It stops being a search algorithm when it becomes an ad algorithm.

While what you are saying is obviously correct on some level, degree of control matters. If you ask google, they will happily tell you a story about a vast inscrutable "artificial intelligence" that soaks up wisdom of the internet and that they can control only in the broadest terms, when in reality the degree of control is much more granular.

Why is this distinction important? From the article itself:

> THE JOURNAL’S FINDINGS undercut one of Google’s core defenses against global regulators worried about how it wields its immense power—that the company doesn’t exert editorial control over what it shows users.

An algorithm implies an objective, rules-based system, as opposed to one that changes at the whims of its operators. With a whimsical system, an advertiser can come in and ask an operator to rank it up in the organic section, where Google claims the rankings are based on PageRank. You're tied to the meaning of algorithm from a developer perspective, where you can make things do whatever you want. In the Journal's non-developer community, an algorithm is different from do-what-you-want; it's a set of high-level rules small enough to explain.

Between SEO gamification, ad spam in top search results, neutered advanced search capabilities, auto-correction of search terms based on NN models that regress to the layman's mean, and increasing evidence of manual manipulation of search results/autocomplete, Google search is rapidly degenerating into a pile of garbage. Unfortunately their market cap, combined with their status as the de facto portal to the internet, makes them hard to unseat, and DDG isn't quite as good yet.

Give me back the Google of 5-10 years ago, and the rest of the internet from that time, not this ad and blogspam dominated AOL 2.0 joke of a net that we're quickly centralizing into. It's sad to see where ad based economics are driving the net.

We should rather be fighting for competitive alternatives and looking more at if/where Google uses its market dominance to stymie competition. That would be a more fruitful discussion IMHO.

What alternatives do you all use? I like DuckDuckGo but it still depends on existing engines behind the scenes. There are also Qwant and SearX but I haven’t used either much.

Surely there is some foss project that has been promising, somewhere?

I don’t understand why Google’s competitors don’t form an independent search engine. If I were Microsoft, I’d talk to Apple and others to see if they would help fund a spun off Bing.

The internet badly needs a big alternative search engine that isn’t beholden to advertisers or dependent on a single corporate owner.

The benefit of such a search engine (whose main incentive is to just be a good search engine) is obvious for the public, but would also give companies who rely on their own OS leverage against Google.

...answering my own question: because Google pays these companies off! I almost forgot they pay Apple almost $10 billion to be the default search on iOS.

Conflation of want and need aside, how exactly? Let's assume they have said niche. How are they going to scale funding to actually provide for it at that size?

Pay per search? That discourages curiosity, or discourages using the alternative at all.

Deep pocketed sponsor? They have the control now.

User donations are the biggest "maybe" I can see; that model isn't worse, but it depends on charity and campaigning to some degree.

If you mean “conflation of what customers want and what companies need” that is another way to express “customer focus”

If companies like Apple and Microsoft care about providing a great user experience, Google search is risky. I think users would prefer not to see ads when they search, or worry about Google harvesting their data. If this is so, it might be worth it to fund some sort of independent “search foundation.”

I reckon a simple text-only search engine – like Google before it jumped the shark – would actually be quite cheap to develop and operate.

I can tell you from experience that everyday users don't care that much about seeing ads. Some even like them. They care even less about the possibility of manipulated results. It's also a really big topic to explain.

Besides the question of funding for this hypothetical search engine: it being "fair and objective", or even completely transparent about its ranking, would mean it'd be SEO'd into oblivion by everything from click/content farms to trolls to more nefarious actors. As long as many on the internet want to make money or manipulate people somehow, it's not really doable in my opinion.

Am I doing your comment an injustice if I paraphrase it as follows?:

- Ads do not adversely affect customer satisfaction.

- The combined forces of Microsoft, Apple and others could not create a serviceable search engine (despite Microsoft alone having already made one).

If not, let’s just agree to disagree :)

Ads are looked at as either a nuisance and necessary evil, or just part of how the "free web" pays for itself.

I'm sure they can. And, as you say, MS already does, supported with ads as well unless I'm mistaken. Are you suggesting they offer a search engine and subsidize it? That they offer it as part of their ecosystem benefits, sort of?

Yes, that’s what I was thinking.

I can’t disagree that users tolerate ads, since advertising is the model of plenty successful websites. It’s just that, like bundled OS crapware, the user experience is better without it.

Isn't that exactly what GDPR is?

When it gets bad, I'll stop going.

But right now I am still getting decent results.

I don't use Facebook anymore, yet we had some hysteria about that being manipulated.

I don't know about you, but when I search for "python string replace" I'd expect the first result to be the Python 3 documentation for 'string'.

Instead, I get (in order):

1. GeeksforGeeks

2. Tutorialspoint

3. W3Schools

4. Programiz

5. Stack overflow

6. And then, finally, the official documentation... For python 2.7.

How does this happen? Are these sites just paying Google a bunch for the rankings?

Honestly, this is because the organization of the Python 3 docs is terrible.

The documentation for str.replace is located halfway down an enormous page that describes every single built-in type in the language [1].

And then, once you manage to find the entry for str.replace, what does it tell you?

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

That's it. No examples, no link to re.sub or other functions you might want to use for replacement. Stack Overflow or even W3Schools (gasp!) is a much better result for this.

[1] https://docs.python.org/3/library/stdtypes.html
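For contrast, the handful of examples that doc entry is missing would fit in a few lines (this is plain standard-library behavior, nothing hypothetical):

```python
import re

s = "spam and eggs and spam"

# Replace every occurrence.
print(s.replace("spam", "ham"))       # ham and eggs and ham

# The optional count argument limits how many occurrences are replaced.
print(s.replace("spam", "ham", 1))    # ham and eggs and spam

# For pattern-based replacement, re.sub is the natural next stop.
print(re.sub(r"\bspam\b", "ham", s))  # ham and eggs and ham
```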

Incidentally, if this is the level of information you are looking for, you can get the same thing by typing help(str.replace) in the Python interpreter.

I wouldn't expect the official documentation for anything to fare well with web search algorithms. This is probably a place where I'd support manual intervention, to boost docs/manuals/man pages/etc.

I for one have never cared for the Python Doc Site and layout. Too much info for a "quick" lookup. I only need 1 line of code to show me how to replace strings in p3

Maybe because more people who type those queries find the GeeksforGeeks page more useful? People searching for those are more likely beginner programmers. Being an old Unix geek, I personally like the official documentation better, but it's true the official documentation is a lot more obtuse for beginners, whereas that first result looks a lot friendlier, with examples, etc.

It's gotten bad for me. I'm just glad others have started noticing. Google search's integration with other services make it hard to leave, but I'm seriously considering switching to a different search engine, and I've been using google search since it was in beta.

My experience is that DDG results are now on par with Google's (minus the blunt instrument that the country selector is for local results: without it, no Irish sites can be found; with it, Irish blogspam gets rated above the site I want, so I need to keep toggling it).

This isn't really down to any huge improvement on DDG, but rather a decline in Google's results. I'd just been putting it down as a consequence of Google having less data on me as I made a conscious effort this year to diversify my usage of other sites.

DuckDuckGo isn't yet on par with Google and hopefully never will be, because that would mean they keep track of users' searches and other data, which is the reason most of us don't use Google. Privacy is not free, and a bit less accuracy when searching online is a small price to pay.

However, although DDG is not going to surpass Google in that field, it is indeed getting better every year, and there's one thing with which it could seriously spank Google's ass: implementing a working discussion filter. The discussion filter was one of the most useful filters Google once had: using it in a search meant you would get only results from blogs, forums, Usenet, etc., that is, comments from users of X rather than sellers or advertisers promoting that X. It wasn't perfect, but it helped a lot to filter out shills, astroturfers, fake forums and similar trash, so it didn't surprise me much when they removed it, probably because their sponsors didn't like that function. So why not implement it at DuckDuckGo?

Yes! That's all I want back! Reddit is a great source but some of the best discussion happens on forums that you have no idea about.

DDG uses Bing's API to get those results. They don't have their own search algorithm.

Google doesn’t show website names anymore either, just breadcrumbs.

Same here, I’m getting pretty bad results. The SERPs for me include either news results which I really am not interested in or spammy auto-generated stuff that has almost no relevance.

If you're looking for alternatives, DuckDuckGo has made vast improvements over the last 6-12 months. Tried switching about a year ago and went back to Google due to low quality results.. Gave it another whirl this month and the difference is night and day. I've barely used Google since. Definitely give it a shot if you haven't recently, you may be pleasantly surprised!

>But right now I am still getting decent results.

How can you know?

I don't know about OP, but I almost always go to a search engine looking for something specific, and I know when I've found it.

On the rare occasion that I search for some general concept like a current event, I'm quite happy that nutjobs like Daily Stormer get filtered out.

Even in those cases, I'm usually just trying to get straight to a relevant article on a publication that already know of. e.g. "wapo impeachment"

What are you searching for that would require google to filter out results for daily stormer?

I was going to say "breitbart", but I realized that Google doesn't actually filter that site. I tried to think of something that definitely would get filtered, and I came up with that. The example I gave above is what I had in mind.

Is there research on how to create a distributed search engine?

There are actually a couple of implementations, e.g. https://en.wikipedia.org/wiki/YaCy

I’d love to see search decentralized and have been toying with this idea for a while.

The last concept I hacked together was a custom search plugin for Grav and a command line util to use for querying.

It goes like this.

Use the command line util to search a term. The command line util runs that term against the search engine _inside_ of the website's CMS itself. You essentially have a list of sites related to a topic that you chose to execute the query against.

I got this working against some sites and the proof is there. But it’s obviously highly inefficient and I haven’t figured that out yet. :-/
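The fan-out step above can be sketched in a few lines. Everything here is hypothetical: the site list, the `/search?q=` endpoint convention (a common CMS-style pattern, not something every site exposes), and the helper name:

```python
from urllib.parse import urlencode

# A personal, curated list of sites related to one topic (made up here).
SITES = [
    "https://blog.example.org",
    "https://docs.example.net",
]

def build_queries(term, sites=SITES):
    """Build one search URL per site, targeting each site's own CMS search."""
    qs = urlencode({"q": term})  # URL-encodes spaces and special characters
    return [f"{site}/search?{qs}" for site in sites]

# A CLI util would fetch each of these URLs and merge the results locally.
print(build_queries("python string replace"))
```

The inefficiency the parent mentions is visible right away: every query costs one HTTP round trip per site, so some caching or pre-fetched index would be needed to make this scale.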

One alternative is to run a webcrawler that stores the index in a series of SQLite database files, split by topic, by site, or by any other criteria. Then users could download sets of those SQLite databases and run queries on them. Not completely distributed, but it hides some information in the noise of "search sets" and mirrors, and individual queries run locally. You could mirror the main repository and just run searches on your own server or locally. You could also swap the database files over P2P, etc.
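That approach maps nicely onto SQLite's bundled FTS5 full-text extension. A minimal sketch, assuming FTS5 is compiled in (it is in most stock Python builds); the URLs and table layout are made up:

```python
import sqlite3

# Crawler side: write pages into a per-topic database file with a
# full-text index (":memory:" here stands in for e.g. "python-web.sqlite").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url, title, body)")
conn.executemany(
    "INSERT INTO pages (url, title, body) VALUES (?, ?, ?)",
    [
        ("https://example.com/str", "String methods", "replace substrings in a string"),
        ("https://example.com/re", "Regex howto", "pattern matching with re.sub"),
    ],
)

# User side: after downloading the file, queries run entirely locally,
# ordered by FTS5's built-in bm25 ranking (the hidden "rank" column).
rows = conn.execute(
    "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank",
    ("replace string",),
).fetchall()
print(rows)  # [('https://example.com/str', 'String methods')]
```

Because each database is a single ordinary file, the "swap over P2P, mirror the repository" part of the idea needs no special protocol at all.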

Could be some kind of giant computation machine running as a public blockchain. Websites could provide their own keywords and machine would be used to rank them for queries.

Check out YaCy

Not really - it is already known. Run your own webspiders.

For those who can't see the article:

>The practice of creating blacklists for certain types of sites or searches has fueled cries of political bias from some Google engineers and right-wing publications that said they have viewed portions of the blacklists. Some of the websites Google appears to have targeted in Google News were conservative sites and blogs, according to documents reviewed by the Journal. In one partial blacklist reviewed by the Journal, some conservative and right-wing websites, including The Gateway Pundit and The United West, were included on a list of hundreds of websites that wouldn’t appear in news or featured products, although they could appear in organic search results.

Gateway is trash and I'm not sure what United West is but they can't say they aren't blacklisting political sites at this point. Pretty big challenge for them ahead of the bipartisan AG investigations and 2020 elections.

Your assertion is a bit literal. Yes, they can't say they aren't blacklisting political sites. But they can say they aren't blacklisting sites for being political.


Why insult people for no reason?

They are classifying results as featured or news. That is a judgement call. Those other sites still return in organic search results. You are mad because they didn't judge your political leanings to be news? It feels ridiculous to expect Google to not make judgements like this and they cannot please everyone. Sure some people won't like it, but in general their news results are solid and don't have an agenda.

Not the OP, and I'm not directly involved with this as I'm not a US citizen, but this is straight-up censorship. The Republicans are going to have a field day with this come election season, and they'd be right to. As mentioned in the article, I think Brin was right when he fought behavior like this back in the day.

I think it's nothing like censorship. They aren't restricting anyone's free speech. It is Google's platform, and it's within their right to control the content displayed outside of organic search; in fact it's necessary. That's like saying the Wall Street Journal is censoring me by not publishing every article I submit to them.

The WSJ isn't the only place in town, while Google effectively is the only search engine that counts. To extend your metaphor, it's like not being published by Pravda, which back in the day was the only newspaper that counted in the Soviet Union. That's why a guy like Brin was correct in his value assessment.

Internet search isn't Google News; there are countless news outlets. Google News is still not radically politically leaning, and it's just an aggregator. No aggregator that implements any sane blacklisting could be completely bias-free anyhow... It's editorial judgment, not censorship.

The article just mentioned they are blacklisting material based on political views (as far as I can tell the great majority of the censored stuff is right-wing); I can't see how they're not "politically leaning". The Republican-led backlash will come back hard on them.

"Some of the websites Google appears to have targeted in Google News were conservative sites and blogs, according to documents reviewed by the Journal."

That is so vague and unsubstantiated. I don't come to that conclusion from looking at this article. How easy is it to replace conservative with liberal? It seems very likely.

> as far as I can tell the great majority of the censored stuff is right-wing

Don't mistake the majority of the whining for the majority of the censoring. The left-wing equivalents of Gateway Pundit aren't included in Google News, either (and they shouldn't be, to be clear). You just don't see Democratic Congressional reps complaining about that fact in bad faith.

It's fine if you believe their blacklisting improves the results, I agree there. But here's where they are going to have a problem with the AGs:

>Google has said in congressional testimony it doesn’t use blacklists. Asked in a 2018 hearing whether Google had ever blacklisted a “company, group, individual or outlet…for political reasons,” Karan Bhatia, Google’s vice president of public policy, responded: “No, ma’am, we don’t use blacklists/whitelists to influence our search results,” according to the transcript.

How do you feel about Google lying to you?

They're specifically scoping things to search results.

The supposedly blacklisted sites like Gateway Pundit are still in Google's search results. They're just not deemed news outlets included in the News tab.

Same reason my small business doesn't show up in the Finance tab. I'm not blacklisted, I'm just not eligible for inclusion.

If they were referencing organic search doesn't that hold true? They are blacklisting separate curated services such as google news. Either way if they lied about it of course that is bad, transparency is the way to gain public trust. I'm not sure if it was violated by that hearing in particular, but if they aren't blacklisting organic results I don't really see an issue.

It makes sense for Google to blacklist news/featured results. I don't want "Occupy Democrats" showing up in the news feed any more than I want Gateway Pundit there.

Blacklists in the search results themselves would be problematic.

Does Occupy Democrats show up?

No, nor should it.

It's as much "news" as Gateway Pundit is, which is to say "not at all".

Thank you I didn't know that.

imbo - this is fraud on a global scale [1]; it negatively affects millions of users, and thousands of webmasters and content producers who do not publish in walled gardens.

I've watched the changes since there was a 'googleguy' and the 'big update was charted with moon phases'.

Since about the time Page started pushing things around it's gone downhill slowly with more and more censoring and less and less transparency.

Google is still benefiting from public trust that was earned years ago, when it was truly doing lots of good around the world.

The lack of transparency about the censoring is a terrible thing for the knowledge of the planet - an entire generation of people are learning truth about life (and the afterlife) trusting google, and being filled with info from youtube. Even the censoring spam filters in gmail are affecting people's lives in the real world today.

I mentioned some of the issues with users not getting transparent info about their searches being censored in a comment here recently: https://news.ycombinator.com/item?id=21487318 (and how I think more web sites need to put notices about the increased censorship of big G)

It's fraud for the all webmasters as well.

"Make a good site, we'll find it and display it. Don't do any SEO, that's evil - just make good content" - well many of us have spent hundreds of hours making good content and watching others who have less content rank higher.

Is there a blacklist from the time period Matt Cutts was on the spam team and around the time he left?

I heard a rumor that if your site was on one of these 'we caught you' blacklists with a certain Googler's name on it, you can't get out of the de-rank jail unless that specific person lets you out: a shadowban, plus a public notice to do more work to fix it, the whole while knowing nothing will fix the ranking for that site.

maybe not, regardless - but telling webmasters to make disavow lists and spend that stupid amount of time putting them together, and still not putting their sites back into the top 10 (knowing you've crafted the blacklists / shadowbans and tweaked the algo to push them back as well) - that's fraudulent, isn't it?

You've made people spend tons of time trying to fix things for google - knowing they were wasting their time and losing money.

I'm guessing the goal was to destroy lives and knock the spirit out of those people who would 'game the google system' - those evil SEO people should be destroyed.

I'm sure there's a valid excuse - the algo changed - we added more manual reviews - we have this stay in your lane thing, this your life your money thing - you have to have all this other info to be legit -

Is that really what the end user needs when they are looking for entertainment sites? No - it's a sneaky way to put down a bunch of sites to raise up the others.

Then a video comes out - some googlers say, well it is okay to hire an SEO company now - don't hire a bad one or it'll penalize you - you should only hire one that says it will take 6 months for you to rank.

So google keeps saying one thing and doing another - then saying another thing and not doing that.

Webmasters should be able to get details about anyone who has 'manually scored' their site.

In some cases it's not just whether they're looking for what's in the manual - their location in the world, their religion, and other factors could influence how they feel about a site, and it being downranked by someone in the Philippines could have drastic consequences for a webmaster in the States, and for all the users who might enjoy that site from Europe.

So the one big G statement said that NOT being transparent is the best thing so bad actors don't take advantage of knowledge of the system - well I believe you are hurting more good actors and more users by hiding everything.

There's even some recent evidence that telling users why things have been moderated actually leads to less problems: https://news.ycombinator.com/item?id=21513871

[1] InMyBiasedOpinion - and not a lawyer, doctor ymmv yada yada

Will the downvoters please comment about what part of these statements you find wrong? I am trying to offer an honest assessment from how things look from the other side of the glass, er bubble.

I put in my comment that my opinion is biased, and I think it is obvious what side of the issue that bias is from. I will add that for a couple of years I had a site in the number one position or top 3 of some cool search results and it was a site that gave the searchers what they were looking for.

For a long time google was good. I even took my love of google and content creation to other businesses in town, got them to make better web sites and even partnered with them to spend more than $100k on adwords over a couple years.

When things are good, they are great, but then cracks start to show - the algorithm changes to favor national publishers, there is little help when you discover click fraud, and customer support is a 'volunteer top poster, not a Google employee' kind of thing.

as was said at a hearing recently: "Small businesses cannot survive on the internet if they cannot be found." - https://www.marketwatch.com/story/tech-giants-google-amazon-...

knocking people out of business with hand wavey 'make good stuff, don't do seo' - knowing that they will be screwed forever and knowing they most likely will never know about the secret manual that some know about and how it actually plays out, it's worse than mean.


I have done a fair bit of work labeling and classifying quality of user submitted URLs for a public facing platform (not Google). That includes many hours spent manually inspecting content and deciding whether a site looks "spammy" overall.


If I were judging by the first few paragraphs of this entry from your site, I would lean toward blacklisting it. Reasons: typos, grammatical errors, and a general lack of polish and punch in the writing. It looks at least superficially similar to thousands of keyword-stuffed, semantically-impoverished blogs that I have encountered before. The page source contains another red flag:

<!-- This site is optimized with the Yoast SEO plugin v9.2.1 - https://yoast.com/wordpress/plugins/seo/ -->

The co-occurrence of the terms "SEO" and "optimized" is almost a good enough signal to blacklist it on that basis alone.

I am not saying that you are personally trying to exploit search or recommendation systems to trick people into visiting your page. There are also two big counter-signals that show me this entry isn't part of a content farm:

- You don't link to commercial sites.

- You don't show ads on the page.

The problem is that there are armies of people churning out "SEO blogging make money fast" content, incorporating ads or commercial links, and spreading it across a multitude of domains. For every blog entry like this one -- unpolished but harmless -- there are many that look textually similar and are purely mercenary.

> So the one big G statement said that NOT being transparent is the best thing so bad actors don't take advantage of knowledge of the system - well I believe you are hurting more good actors and more users by hiding everything.

This is where I disagree most strongly. Google already struggles to keep junk out of search results. Process transparency would enable content farmers to evolve more quickly. Thousands of brilliant engineers are not a match for millions of people who pollute the web as a full time job. Some good actors will be hurt, granted. I think that you are badly underestimating the number of bad actors when you say that opacity hurts "more good actors" and users than bad actors.

This "typos, grammatical errors, and a general lack of polish and punch in the writing." - reminded me of something I had considered some years ago...

A time when I read that Google was ranking .edu-type sites higher and blogs lower... I noticed more news sites in the top results, and Mayo Clinic types.

It dawned on me that it would be a convenient truth to point to a bunch of 'high brow signals' to justify sanitizing results a bit - and that this would be a slippery slope into censoring lots of adult stories and other entertainment, while also playing into the hands of the bigger companies that can afford to spend the AdWords money.

Could be good reasons for this (less public pressure to remove the porn and such) - could be nefarious: censor the web for users and cater to those who can afford to pay the big bucks, with fewer companies to contend with over content questions - while not being transparent to the users and content creators.

This allows big money to influence the results via ads more easily, and limits choice - those publishers who spend a lot of time creating content are cast aside, even though one side of big G keeps saying 'create good content and you will soar to the top'.

I think we, er they, big G especially, crossed a threshold of being able to determine intent more often than not, so searches like 'how to have sex' and 'watch free sex', for example, are different and can be, and should be, handled differently.

I know they are handled differently to a degree, I feel it's important to point out that these two different intentional searches should show results even much more different than they are.

The first one would likely benefit from ranking higher sites that meet a lot of the points on the pdf manual checkers document and other factors for trust rank and what have you. However I think the other kind of sexual entertainment searches would actually benefit from not using many of those factors in the ranking process.

I believe you will find many professional sex people do not advertise their address on every page of their site, and many do not use real names in order to make it harder for bad things to happen to these people. For example.

I also think the need for perfect grammar and such is much less when people are looking for erotic entertainment. Millions of Penthouse stories magazines sold in the pre-internet years (before you could get that stuff free via searches from content indexers), and I am pretty sure that if every story had had perfect grammar, like it was written for a college thesis, they would not have sold as well month after month for years.

If you combine this with the type of grammar and spelling you see a majority of people using in textual communiques - look at Insta, fbk, snap... people expect, engage with, react to, and continue to pursue content that is not grammar- and spelling-perfect.

I'd go so far as to say that a majority of people, in the US at least (?), are actually mostly trying to find cruder discussions and writing styles, and it's a much smaller number of people searching daily for PhD-level high-brow perfection.

Of course this is different for electrical engineering searches, and even searches for putting together prefab furniture - those are definitely searches where you want things to be accurate, with no fluff and no extra personality needed.

Given that I believe this to be obvious to most, and that we do not have the computer systems of 1991 running the search giants, I believe they know they could surface tons more content that browser-reported behavior shows people enjoy and are looking for - yet they choose to use some of these trust-rank things to censor bigger portions of the net for various reasons.

Hey, I'm a big believer in private companies doing what they want; I just think transparency is seriously lacking with big G. Why not be honest about how many semi-good sites are not being shown because Google is employing new content filters?

We used to see those Chilling Effects notices regularly, and some results still show that X pages or sites are not displayed due to DMCA requests... but as for honesty about how many sex chat sites Google used to show in the results, and how they have pushed many good ones down and many more straight out of the index, we don't see any posts about that.

Sadly, for many people the internet is what Google shows it is. I understand there are many in the world who think whatever is on Facebook is the entire internet. Well, if things are being removed from these platforms and that's not understood, then it's a huge disservice to humanity, imho. It's closer to this: people learning with today's tools may never find Mark Twain and others, for they are not perfect in the eyes of the elite.

Thank you for chiming in. You see there the partial remnants of what was once an important site to me, and attempts to share and get discussions going with average people who use the internet but are not geeks. That site started as a place to put on a business card for an email contact, and then morphed into pages to help people as I found issues in the non-online world. For a couple of years so many people asked me to help them fix borked computers that I put up a few pages and directed friends and family to go there: do these three things on this page, call me with the results; it's the few things I would do if I came over to try to rescue your XP or Vista system. Then I added more things, then experimented with the new(ish)-at-the-time WordPress and Movable Type.

Many things have changed since then: updates and hacks, and thankfully fewer spyware-infected systems that friends call me about.

It's not made to take the #1 spot for anti-virus, and the experiment with MT and WP has led to some interesting growth in other areas.

So it's not written to win any writing award or be referenced in a PhD thesis or anything, just for the average Joe I meet on the street, when I don't want to spend 20 minutes telling them about viruses and I know that sending them to Sophos or something is not going to help them learn or do anything different in the future.

The Yoast SEO plugin I have used on other sites as well. The main reason for using that plugin is that WordPress does not handle meta titles and descriptions very well out of the box; for the most part it leaves that to the chosen theme to handle, and most themes do not handle it well.

When you check how your WordPress site looks in Google's eyes, many people with a WP site will get warnings about 'duplicate meta descriptions', which I believe cause a penalty in the results. So the Yoast plugin can create meta descriptions on pages and posts that do not have them, and it can set some standard robots.txt-type rules to block 'duplicate content indexing', as WP out of the box often puts the same text and such on the homepage, a category page, an archive page, and others.

So it's a quick and easy way to fix a few of the problems with WP that Google will alert you about if you log into Webmaster Tools, or maybe when you run a PageSpeed check.

So the Yoast SEO plugin is included to fix some things; it does not create comment spam for SEO or try to insert hidden text for 'Britney Spears Disney' with a shady redirect to a porn site or anything like that.

You could rename Yoast, All in One SEO, and similar plugins "remove the negative, self-created search engine penalties caused by WP and your lousy theme with one click". They are even more important on sites that run BuddyPress on top of WordPress, for the same issue (and sometimes cause more), but that's for another post.
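The 'duplicate meta descriptions' warning described above is easy to check for yourself. A minimal sketch of the idea (the URLs and descriptions below are made-up examples, not from any real site):

```python
from collections import defaultdict

def find_duplicate_descriptions(pages):
    """Group page URLs by meta description; return any description
    shared by more than one page (the 'duplicate' warning case)."""
    by_desc = defaultdict(list)
    for url, desc in pages.items():
        by_desc[desc].append(url)
    return {d: urls for d, urls in by_desc.items() if len(urls) > 1}

# Hypothetical example: a theme that reuses one description everywhere
pages = {
    "/": "My blog about fixing computers",
    "/category/viruses/": "My blog about fixing computers",
    "/2010/03/cleanup-guide/": "Step-by-step XP cleanup guide",
}
print(find_duplicate_descriptions(pages))
# -> {'My blog about fixing computers': ['/', '/category/viruses/']}
```

Plugins like Yoast essentially do this check (and the fix) for you, filling in unique descriptions where the theme left them blank or repeated.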

I think it's possible that we could both be right (although I certainly could be wrong): I still think more good actors are hurt by the non-transparency, and you could be right that there would be millions more junk sites and pages. Although I think there are far fewer people making the bad stuff, I do know they make a lot of it. I still think that by trying to fool a few thousand bad actors, big G is actually seriously hurting hundreds of thousands of website owners.

Just think how many thousands of people have bought shady SEO because they have no way to know what works or what doesn't, and what's good or what's bad. I definitely think more good people and businesses are being hurt, and the playing field is less even for the average business owner, because of the lack of transparency.

The very unusual definition of fraud, for one. It seems anywhere between inflammatory and nonsensical, when even misleading, shady advertising tactics (like a 'price match guarantee' on a model number made exclusive to the store chain but identical to others') wouldn't be considered fraud either by law or colloquially.

Thank you for sharing your thoughts on this. I wonder, if I had written all of the above without the 'fraud' part, whether people would still disagree with it all as much.

I must admit that choice of words is partially meant to be inflammatory, but we are talking about a very inflammatory subject (Google blacklists, shadowbans, and non-transparency about censorship with users and website publishers).

If I could edit it, remove the word fraud, and change it to 'intentional deception to secure unfair or unlawful gain', would that take the sting out of it?

I mean, it could be said that big G tells people to make a good site and it will rank high if the content is good; and if you make a mistake, do a disavow list; and don't try to do SEO, even though the other people at the top obviously are. You could call it a hoax instead of fraud, but according to Wikipedia: 'A hoax is a distinct concept that involves deliberate deception without the intention of gain or of materially damaging or depriving a victim.'

And I am suggesting that I think it's obvious that some teams at Google have indeed tried to damage and deprive victims of time and money, by telling them to do this or that while knowing it's not going to bring their sites back to the top. So it's worse, and does not qualify for the hoax definition in some situations.

I don't mean fraud like some AG is going to put google in jail for criminal fraud.

I do mean that it is obvious that big G and some people there have been purposefully deceitful: Google has profited from telling people and companies to keep publishing, to keep doing things Google will like (schema data!), and they use that data to profit, while purposefully not ranking and sending traffic to many different sites for different reasons.

So no, I don't expect the whole model-number-switcheroo defense to be made. And even though at least one state's AG has looked into fraud at Google, I don't think they know enough about these issues to bring a case, and Google has enough money to pay off all 50 states and the EU with "sorry, not sorry, we're not putting the algo on trial" kind of go-away money.

So I'm not suggesting that's going down. What I am saying is that some people there have been intentionally hurtful, taking people's time and money, and it has been publicly shown that they are, and have been, deceitful on purpose.

If you don't see that, perhaps you are not familiar with the timeline of all these events as I described in the downvoted comment above. I am sure there are plenty of webmasters and SEO people who have witnessed this timeline; many people have scratched their heads wondering "wtf, Google" (and many have pulled their hair out!) over the years, and there are plenty of public posts on non-shady forums showing this.

This has been done on purpose.

Lots of people have lost their jobs, their money and time, and in some cases their homes, partially because Google changed things with the algo. But Google has not been transparent about that, and has actually suggested things to do that it knew would not work, giving people false hope and getting them to spend excess time and money, all the while knowing there was no way out of the downrank hole for most and it would all be spent in vain.

Funny thing: it's not their money, their life, so why would they care? I bet the spam team and algo team celebrated some of these changes, laughed at seeing people try to change things, and watched as ad sales increased and their stock did too, not caring about the little people out there, and not even notifying users that they were now censoring the shit out of the results, which, funnily, makes the sites in the ads more appealing.

There are plenty of synonyms (con, scam, shell game, double dealing) that could be used in place of fraud in my original statement. I am sorry (truly), and not sorry, that it is inflammatory in this context, as I think the issue of censorship by itself is a serious subject, yet this story goes well beyond that.

I agree it can seem nonsensical, but if you look at the events I describe over the timeline, I can't actually think of a more sensible term for it. I guess from the other side of the glass it could be called a funny and profitable business move, as I would guess some did.

What is the purpose of sharing articles that are behind a paywall?

If there's a workaround, it's ok. Users usually post workarounds in the thread.

This is in the FAQ at https://news.ycombinator.com/newsfaq.html and there's more explanation here:



There's a good chance your local library carries a subscription. Sometimes even online subscriptions. Take the article title/date with you and I'm sure you could get a hold of it to read in-depth.

I know here in Toronto, our library subscribes via Pressreader which gives access to a large number of periodicals and newspapers.


Or try an aggregate service like Apple News where you pay a little more total but get wide-reaching access to a large number of pubs.

edit: Sorry, it doesn't look like Pressreader has WSJ, but they do have a large number of other American pubs. Apple News+ has WSJ, I believe.


The valid set of articles for discussion on HN is not limited to "content that can be obtained without paying for it".

The point is that it's an interesting article. It's not hard to search for the keywords in the title and find some other news outlet covering it, albeit in less comprehensive detail, or look for comments that reiterate the salient details.

You can install this Firefox extension to get past many paywalls:


Do you prefer ads or paywalls?

Nobody's ever happy..

Ads. The social contract of the web was: link aggregators bring traffic to websites, which need views to ad-support themselves. WSJ doesn't need it. (This article is not available on archive.is or anywhere else; you actually need to pay to see it.)

There is no such social contract.

That's why RSS and those like/share buttons existed.

good point

You just made me realize I'm a bit of a hypocrite. I don't want ads, but I will never pay for this either. Then again, I don't really want to read it anyway hehe

That being said, if ads were actual ads and not targeted surveillance crap, I'd be OK with that. I might have to change my adblocker settings to reflect that...

You should load up the Privacy Badger extension. It lets you adjust the settings to keep trackers at bay.

Forgive me if I'm missing the point of your question, but isn't the answer obviously ads? Even if they couldn't be blocked, at least they don't prevent me from accessing the content. I guess ideally there would be ads with a pay option to remove them, but that's just from a consumer point of view.

Regardless, I realize we shouldn't be complaining about paywalls in comments. I just wanted to address your question.

When you block the ads, the publication doesn't receive revenue from them as the impression is lost.

Then, the publication suffers and the quality falls.

But this way the people who don't have a subscription (whom we can safely assume to outnumber those who have ad blockers) don't even open the page, so the publication still doesn't get any revenue.


I put that on par with: because I'm not paying for an Aston Martin, I don't get to drive one to work every day. Thus, I have nothing to contribute to discussion on daily drivers of Aston Martin cars. Just because I want to join the discussion doesn't mean I'm entitled to one of their cars anyway. Aston Martin isn't weeping for me because I don't get to drive one of their cars.

Why would the publication just give away the product of their labour for free? [Especially the WSJ. That's against their cornerstone beliefs AFAIU.]

This is easy: I have an ad blocker, so I'm happy with ads.

Blogs = spam

Paid articles = ???

Google has destroyed small business with its monopoly. First search page is all ads on mobile.

Try this:



The 3rd result on duckduckgo.com is learntherisk.org. That result is not presented in the first 10 pages of results on Google.

It seems quite likely to me that one of these engines is manipulating search results, because the organic results from two different search engines should not be that far apart.

Edit: Why the downvotes? Here is an even more egregious example:

Try searching for "learn the risk autism".



Note that I am not promoting an agenda here. This seems like an example of manual manipulation. The article cites "vaccinations and autism" as an example.
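The "should not be that far apart" claim above can at least be made measurable: a simple overlap score over the top-N results of two engines. A toy sketch with entirely made-up result lists (not real query data from either engine):

```python
def top_n_overlap(results_a, results_b, n=10):
    """Jaccard overlap of the top-n results from two ranked lists:
    |A ∩ B| / |A ∪ B|, so 1.0 means identical sets, 0.0 disjoint."""
    a, b = set(results_a[:n]), set(results_b[:n])
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Illustrative, hypothetical result lists
engine1 = ["site-a.org", "site-b.com", "site-c.net", "site-d.com"]
engine2 = ["site-b.com", "site-a.org", "site-e.org", "site-c.net"]
print(top_n_overlap(engine1, engine2, n=4))  # 3 shared of 5 total -> 0.6
```

Running this over many queries would show whether two engines' organic results diverge systematically or only on contested topics; a single query, as in the example above, proves little either way.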

Wait, why would DuckDuckGo be more accurate just because a random site called learntherisk.org is present? This is just confirmation bias.

Yeah, but why learntherisk.org?

Is this an authoritative source of information? That website screams confirmation bias all-around.

Web sites themselves manipulate search results to appear higher in the results than they might seem to merit, so a naive presentation algorithm is also "manipulated". Any ordering represents somebody's opinion on what you should get back; there is no single universal objective search result order.

I have no idea if Google is specifically pushing vaccine denier sites down the list, though I can say that the entire concept has far more "mind share" than the notion merits scientifically. That much, at least, is objectively true.

Maybe because learntherisk.org is part of the Robert F. Kennedy Jr spin machine?


Google sucks in many ways. Search results are heavily curated and filtered to match their agenda.

Advertising? How is it profitable for an advertiser to have multiple copies of the same advert on a page?

Truth about Bill Gates and Epstein is filtered away... just like HN filters and deletes comments :)


I can't read the article, but unless it's NOT about Google promoting their own services on top of others', let me be the devil's advocate and ask:

Are we elevating Google's search engine to public utility status?

Criticism can happen even without calls for public legislation. In fact, I don't believe that's what the article suggests (I can't read it).

Same here, but working with that assumption:

Yes. And we might have to, I'd say. Given the de facto monopoly they have, I think it is reasonable to agree that they fulfill a public utility function and thus need to be held to a higher standard. This should, however, be done through regulation, because that is why we have governments.

But other search engines do exist, and it is trivial to switch from Google to Bing or DuckDuckGo. Unlike a social network, you can switch your search engine in isolation without waiting for others to switch.

The American people aren't this stupid. It's obvious to everyone that the search results are being messed with. It's incredibly arrogant for Google to think they can get away with mass-scale information manipulation. Capitalism doesn't work this way: Google search is an inferior product, and every day the opportunity for a competitor to move in gets just a little bit bigger. But the time is not quite here yet.

> Google search is an inferior product

Inferior compared to what, exactly? I have yet to find a good replacement.

> I have yet to find a good replacement.


I switched to DuckDuckGo as my default search engine a couple of years ago, and I constantly find myself having to re-enter the query with a !g in front.

As much as I dislike what Google has become, their search engine quality is still the best, and by a very, very large margin.

Bing is a pretty good replacement if you like pornography with innocuous searches.

I tend to disable moderation settings because I'm an adult, right?

Google does a pretty good job distinguishing intent. For example a search for "<actor|actress>" and "<actor|actress> nude" with moderation settings off yield very different results. On Bing? Not so much.

I worded that poorly: inferior to what could exist. Clearly it's too expensive at this point to show up and compete, but I think the time is coming.

Isn't literally everything inferior to what COULD exist?

No, I think Google search is inferior to what google could produce. By that I mean they tuned their own knobs in such a way that search results are worse than they could be. They are hurting the search results in the name of revenue and bias (political and otherwise)

>Inferior compared to what, exactly?

Inferior to what Google was 5+ years ago.

Yandex is better for image search,

and Bing is not that far behind for web search.

Right now search engines are not a hot topic. But if people smell that Google is starting to lose it, you bet they'll go after its market share like rabid dogs.
