Elevating original reporting in Search (blog.google)
162 points by cryptofits 31 days ago | 156 comments



I found the rater guidelines document linked in this blog post quite insightful. Sections like how to assess the "E-A-T" rating (Expertise, Authoritativeness, and Trustworthiness) will be useful to me when writing content.

https://static.googleusercontent.com/media/guidelines.raterh...


I don't think anybody calling it "content" should ever be considered trustworthy. The word reeks of hastily rephrased Wikipedia solely intended for search engines.


It's a standard term of art, hence terms like "digital content creation" tools, "separating content from presentation", and so on.

I think any other connotations you're associating with the word are probably down to you, not the word.


Yes, it's a standard term of art within the hastily-rephrasing-wikipedia-for-SEO community.



The phrase “Table of Contents” refers to the content of a written work. That certainly wasn’t invented for SEO.


Authoritative seeming sources are not always a good thing. Anytime you search for a game guide/faq/hint you get drivel from the big name sites that are super short on content and have bad advice. Almost in all cases the content I’m looking for is in some tiny blog that barely ranks but matches all the search words.


> Authoritative seeming sources are not always a good thing.

Yeah, but what if stupid people search for the wrong thing, find the wrong website and get infected with Wrongthink? Have you thought about that? Crushing amateur content creators and betraying all the ideals of the early Web is a small price to pay for preventing that intolerable scenario.


I have this feeling like 30% of the time on HN and never know how to articulate it. I was online in the 90s and never expected to see so much widespread support for curbing free speech.

It's like the Russians won.


There's a dual cause here: illiberalism is increasingly in vogue in our culture and politics, but early internet and tech culture were already heavily skewed towards individualism and independent thought: part of this was inevitable with the mainstreaming of the Internet and the fact that the average person is _far_ less comfortable with non-conformity.


Who or what defines "Wrongthink"?


Whoever gets to define what the "right" or "authoritative" answers are.


People who disagree with something, usually progressive or feminist ideology or mainstream science, but can't or won't make a rational argument for their position, choosing to portray themselves as victims of persecution for their radical, dangerous, status-quo threatening ideas.

Ironically, the same people who insist that terms like "racist", "white supremacist," "hate speech," etc. are arbitrary and can only lead down the slippery slope towards a fascist Orwellian dystopia never seem to believe that an Orwellian metaphor can be incorrectly applied.


I don’t think it’s a coincidence this is all happening before the 2020 election.


These days I tend to prepend or append my queries with Reddit, specific subreddits, various StackExchange sites, etc. Usually I get some higher signal there.


If this change is executed well, it will be a relief for original content creators such as myself. I am involved in researching and writing original content, and in recent years our articles keep getting buried by low-effort regurgitators who slightly rewrite our work, then rank above us because their version is newer. I've even bellyached about this on HN before:

https://news.ycombinator.com/item?id=19766276

So, here's hoping Google succeeds in setting things slightly straighter.


Am I the only one who thinks it's a bad idea for a tech company with no experience or education in the field to have so much influence over what people perceive to be good journalism?


Isn't the original PageRank, and every iteration since, defined by what Google engineers believe to be a relevant and informative webpage? They have a 160+ page guide that tells their ratings team how to identify high-quality content:

https://static.googleusercontent.com/media/guidelines.raterh...


>Isn't the original PageRank (...) defined by what Google engineers believe to be a relevant and informative webpage?

No, quite the opposite.

The original PageRank was defined by what other internet publishers considered to be a relevant and informative webpage. Specifically, the original PageRank was weighting content by the links (and content around the links) it was getting. The trust was placed on the internet publishers linking to it. The trust was distributed and also transitive to a degree.

Which, if you think about it, is pretty close to what journalists do - they prop up the good ones in various ways. And also reasonably close to the peer review process popular in sciences - equal peers judging relevant materials and marking / linking the ones they approve.

All in all, the original algorithm was effectively a "wall" between Google engineering and the actual content, as ranked & searched. It was clear delineation that "we don't editorialize search results".

Granted, the later incarnations of Google Search take into account much more than just the original PageRank, and thus stray away from the idealized original formulation - and also get much closer to editorial decisions.
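To make the "trust is distributed and transitive" point concrete, here is a minimal sketch of a PageRank-style power iteration. It is only an illustration under stated assumptions: the `pagerank` function name, the toy link graph, the damping factor of 0.85, and the even redistribution of rank from pages with no outgoing links are all choices made for this example, and it ignores the "content around the links" signal entirely. It is not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank via power iteration.

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict mapping every page to its rank (ranks sum to ~1).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", and "c" links to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph nobody links to "b", so it ends up with only the baseline share, while "a" inherits most of its rank from "c". That is the transitive part: a link from a highly ranked page counts for more than a link from an obscure one.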


Right, I understand your point, and maybe I'm being too reductive here, but the idea heuristic that what people link to is authoritative is an opinionated decision that ended up being satisfactorily accurate (and efficient to implement). And evaluating the heuristic's accuracy ultimately boils down to human judgment; from their early Google proposal (though not the original Pagerank paper):

http://infolab.stanford.edu/~backrub/google.html

> The biggest problem facing users of web search engines today is the quality of the results they get back. While the results are often amusing and expand users' horizons, they are often frustrating and consume precious time. For example, the top result for a search for "Bill Clinton" on one of the most popular commercial search engines was the Bill Clinton Joke of the Day: April 14, 1997. Google is designed to provide higher quality search so as the Web continues to grow rapidly, information can be found easily.

(Yes, the opinion that a "bill clinton" query should probably return official/biographical pages before "Joke of the Day" is a pretty obvious and uncontroversial one)


>but the idea heuristic that what people link to is authoritative is an opinionated decision

There are two decision spaces here. One is the selection of search ranking algorithm (whether to use PageRank or any other), and that decision was taken by Google[1].

The second decision space is the decision whether to link (or not) to any given document, and what content to put around it. That is a long-lasting iterative process rather than any single decision. Arguably it's geared towards approximating the distributed consensus, which is a global mutable state, expressed by the edges of the graph rather than by any single node.

Can the process be gamed, perverted, or corrupted randomly or maliciously? Sure. Do we know any better one? Not yet, at least not in the public.

--

[1] practically speaking we all selected PageRank, by preferring Google's over multiple competing engines.


> The original PageRank was defined by what other internet publishers considered to be a relevant and informative webpage. Specifically, the original PageRank was weighting content by the links (and content around the links) it was getting. The trust was placed on the internet publishers linking to it.

Which fell apart badly when:

1. Personal web sites (and, later, even personal blogs) fell out of favor, reducing both the quantity of independent web sites to surface in search and the number of available signals of page value

2. The migration of users to social media silos reduced the visibility of genuine personal recommendations, especially as those sites often applied "nofollow" to links in user-generated content

3. Search engine optimization became a thing, leading to a glut of low-quality pages (even from previously reputable publishers!) and massive abuse of the remaining signals of page quality


> And also reasonably close to the peer review process popular in sciences

I don't see the connection between the peer review process and PageRank. Peer review consists of educated experts explicitly making a value determination on the quality of the work before it is published and publicly available. Citations can be useful for judging impact, but this happens after the peer review process has been completed.


Yes, PageRank makes assumptions about what good content is and furthermore those assumptions are probably inaccurate in a lot of cases (dishonest clickbait can have a lot of backlinks). But it is unbiased in a sense that using human raters following guidelines is not. Moving from a purely algorithmic approach is a step away from being a neutral delivery service towards being a publisher, which carries extra responsibilities if you have as much power as Google.


"Quality" is a really hard thing to define algorithmically. Even if you had great journalists on staff, I don't know if their heuristic expertise could translate to an algorithm.

Most signals of trust/quality come via backlink profiles, which may be a poor proxy but are still the best that Google has been able to come up with.


To a large extent they already do, albeit unintentionally. We can discuss whether that should be the case in the first place, but Google has been shaping the news industry ever since "Google" became a verb.

With that shaping now irreversible, and with shutting down being the only way to stop it from continuing, it certainly seems like the right thing for Google to think seriously about the rewards their engine offers. While Google is definitely prone to abusing their influence, this move seems very reasonable.


I haven't actually read any internal resumes, but why do you think no one working on Google News has any journalism education or experience?


What would be the alternative? For lawyers to specify search algorithms?


Maybe a giant list of websites in alphabetical order?

(joking - we don't need another DMOZ)


That would at least make SEO easy:

“How do we appear at the top of the results?” “We’ll call our website ‘AAAAAAAAAAAAAAAAAAA - Wikipedia’”



That's how it worked in the Yellow Pages. "AAA Plumbing" and such.


Tech companies are being pushed to become judges on everything or face backlash.


Agreed, I would want some additional criteria for consideration for a number of these. For example, when it comes to news I would want interview transcripts or links to such things to affect ranking, and similarly links to scientific papers. Avoiding a number of common reasoning errors (such as equivocation) or stats errors (such as not considering biased sampling or even not considering sample size at all when making points) should also be required for top quality.


Absolutely, but they have that influence and so far they've mostly been using it in ways that make journalism worse (creating powerful incentives for shallow articles with clickbait headlines). This seems like a step in the right direction.


For everyone who complains about Google having worse results, have you blocked or disabled Google's tracking? I haven't and haven't had any trouble finding things with Google and I wonder if the two are related in some way.


In some ways that might be true. But I also wonder if google is using tracking/profiling as kind of a crutch.

A while ago, I had google forget about all the things I had searched for. I then wanted to look up some features of C++17, but google kept coming up with info about the C-17 Globemaster instead. Rather than just searching for my terms explicitly, it tried to interpret what I wanted, and failed miserably without knowing I had previously been interested in programming.

This doesn't explain Google groups searches, though, where zero results come up even though you copied/pasted the title from an old post.


> I then wanted to look up some features of C++17, but google kept coming up with info about the C-17 Globemaster instead

I did the search just now and every single result on the first page was about C++17. (I am logged-out with Noscript, Ghostery, adblock enabled). They might have fixed it?


how do you know what the search was?


I aggressively block tracking and my google results are fine.


Has anyone found that recently Google search is becoming more and more useless?

Google tends to ignore what I actually type in, and tries to search according to some weird NLP machine learning inference on what it thinks I'm actually trying to ask.

Top results will include maybe 50-75% of the words I actually typed in, and it will treat the rest as mere hints or related words.

My queries end up looking like this after several tries and fails:

"something" "another phrase" "also" "this"

If I type the whole phrase without quotes I just get a bunch of ads, blog spam, and irrelevant stuff that is pretending to be useful.

Hell, most dev-related queries will return shallow medium-style blog articles instead of SO / Github.
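For anyone who ends up quoting every term by hand, the workaround above can be sketched as a tiny helper. This is only an illustration: the function names are made up for this example, and the only thing assumed about the search engine is the ordinary `https://www.google.com/search?q=...` query URL. Whether the engine then honors the quotes is exactly what is in dispute here.

```python
from urllib.parse import urlencode


def exact_match_query(terms):
    """Wrap each term or phrase in double quotes so it is requested verbatim."""
    # Strip any embedded double quotes so the wrapping quotes stay balanced.
    return " ".join('"{}"'.format(term.replace('"', "")) for term in terms)


def search_url(terms, base="https://www.google.com/search"):
    """Build a search URL whose q parameter carries the fully quoted query."""
    return base + "?" + urlencode({"q": exact_match_query(terms)})


terms = ["something", "another phrase", "also", "this"]
print(exact_match_query(terms))
print(search_url(terms))
```

The first line printed is the same query string as in the comment above; the second is that query URL-encoded, which could be bound to a browser keyword or shell alias.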


I've noticed over the past year that a Google search for almost any medical term will bury the wikipedia article on the third or fourth page of search results. The first three pages are all sites with extremely watered-down content and little to none of the cross-referencing links that make wikipedia a good exploratory tool (and a natural fit to PageRank).

Google has obviously decided that anyone searching for a medical term should only be shown pages that are equivalent to what a doctor would tell a scared patient in the first three minutes after delivering a diagnosis. Sure, those WebMD pages are probably going to be a bit more accurate than the wikipedia article, but largely because they're saying much less and have fewer opportunities to be mistaken. It feels rather condescending and patronizing.


As someone in medicine, this drives me up the wall. I don't want to go to UpToDate every time I have a question and want the basics of a disease or drug. Where Wikipedia fits my requirements perfectly, Google constantly feeds me nonsense from WebMD, blogs, and hospital systems. Now, whenever I search for anything medically related I add wiki at the end.


Set Duckduckgo as your default search engine and add "!w" anywhere into your search query. This will take you directly to the corresponding wiki article. It is a breeze to actually work with a search engine instead of against it!

There are >10k of these bangs, so I'm pretty sure you can even use this tool to directly search at other medicine-related sites and portals (check here https://duckduckgo.com/bang) if necessary.


I feel like more people would have better search experiences if they knew about/used custom search. For example, for Wikipedia I've got my browsers set up so I can type "wiki <search term>" and have that powered by Wikipedia; why hit Google at all? You can set up the same thing for anything else that implements the OpenSearch standard (YouTube is the other one I use regularly). To me, searching the web for a wiki page seems pretty irrelevant.

Instructions for setting up Wikipedia's: https://en.wikipedia.org/wiki/Help:Searching_from_a_web_brow...


> The first three pages are all sites with extremely watered-down content and little to none of the cross-referencing links that make wikipedia a good exploratory tool (and a natural fit to PageRank).

Those pages are littered with Google ads. It's no wonder that they're prioritized over relevant and information rich results like Wikipedia.


> Sure, those WebMD pages are probably going to be a bit more accurate than the wikipedia article...

Not necessarily: https://www.vox.com/2016/4/5/11358268/webmd-accuracy-trustwo...


Or decided to drive traffic to pages with ads from google.


Almost every thread that's somehow related to Google Search has this same question (which often goes to the top) and the same answers.

It's difficult to imagine that most of the people haven't seen this a bunch of times as well.

And while I agree with this, I'd love to know why people keep asking and/or upvoting this same question over and over. Is it for the sole purpose of bashing Google or what is it? Honest question.


Yes, people have been switching to DDG for years now, yet still google is utterly dominant.

Candidly, I'm mystified by the complaints. Google's search works great, and has for years for myself and everyone I know IRL.

Any post about any product that isn't brand new on HN is going to get comments along the line of "Company X? Their product is garbage, it used to be great! I've been using <alternative that requires a ton of extra effort to setup> and once I got it going it was amazing!"


I won’t argue about your experience, but DDG isn’t any harder to use than Google aside from the fact that you need to change your default search engine. That’s not a “ton of extra effort to setup”.


When trying DDG again for a while, my use of g! gradually increases as a necessary tactic to surface what I'm looking for. When it gets to over about 80% I go back to using Google again with a sigh.


Can you give some examples?

I think a few years ago this was somewhat reasonable but in my opinion DDG has not only matched but surpassed Google for most searches. As an example (to not make this as if I'm only asking you to do the lifting) a random search that's relevant for me would be 'spacex rocket thrust.' Since Google's results are based on arbitrary tracking and whatever per user magic marketing metrics they decide to apply, it's not repeatable but I imagine you'll probably get at least something similar. We should get identical results for DDG:

-----------

Google:

- 4 redundant wiki pages (all link to each other)

- 2 redundant links to spacex.com site (all link to each other)

- 1 irrelevant theverge article on an arbitrary launch

- 1 space.com link with some relevant information

DDG:

- 1 relevant wiki page

- 2 redundant links to spacex.com site

- 1 tangential article from space.com

- 1 space.com link with some relevant information

- 1 spaceflight101.com link with extremely relevant information

- 1 irrelevant link from teslarati talking about a 'spacex rocket package' for a roadster

- 1 cnn article comparing rocket thrusts

- 1 redundant wiki page

-----------

That was literally the first thing I searched for and I think DDG is clearly better there, though both overall results are quite poor. One big thing is a much better diversity of sources with much less redundancy. But the thing that really pushes this example over the edge is the spaceflight101. It not only provides the most relevant information by a rather wide margin, but is also a critical source for a wide array of related news, specs, and other information.

The reason I think both searches are quite poor is because of how much all search engines today lack any notion of context whatsoever. When I search for 'spacex engine thrust' am I searching for technical information, or am I searching for media information relating to recent launches or developments? That's something that ought to be derivable from a contextual analysis of my query, yet nonetheless it quite obviously is not!

Google took us that big leap from 'Abraham Lincoln' not returning hardcore porn, but it feels like the progress we've made since then has been pretty.. meh. And I feel the last few years have seen an overall decline in search quality, but a relative increase in quality for the formerly secondary players such as DDG. In other words it "feels" as though Google of ~4 years ago > DDG today > Google Today. But of course this is, in some ways, going to come down to the person.


No I don't have examples to hand. Search isn't something I'm interested in really - it's just a kind of commodity to me that I want to get out of my way.

The trend is entirely clear for my search usage though - I just don't find what I'm looking for much of the time with DDG, so end up g!'ing. When that usage reaches a certain subjective redundancy threshold I switch back to google.


People have been bashing MS, Apple for decades (for the same tired reasons), and now FB, Google too. What is wrong in expressing displeasure? It's a normal human trait to do so. If you wrong someone enough times, they're going to spend their own free time and energy telling people how much they dislike you. I suppose taken to an extreme, it might become unhealthy.

To be clear, I don't like Google any more, and I'd be happy if more and more people stopped liking them too. An advertising/data-mining/surveillance company controlling access to information in a digital age is about as dangerous as you can get. Even so, I can still objectively praise some interesting cool tech that they produce...


I expect to get interesting and hopefully new insights when reading comments in HN. The comment guidelines state the following:

>Don't introduce flamewar topics unless you have something genuinely new to say. Avoid unrelated controversies and generic tangents.

Hence I believe repeating the same comment (or should be called rhetorical question) in almost every Google Search (or related e.g. DDG) thread isn't only repetitive but IMHO also goes against the quoted guideline.


Yep. Not sure what these questions are going after... If Google stops being useful, just quit using it.

Apparently, Google's search algorithm is one of the best-kept secrets in the world and has an insanely complex evaluation mechanism in place; anecdotal feedback is not going to cause a ripple in shaping this algorithm.


1. This is a big community and I haven't noticed this in every thread.

2. Google hasn't changed - so until they do, I suspect they will continue to be criticized.

People generally will continuously criticize things that continuously screw them over.


Raising awareness of what was lost in my case.

I miss the early 2000s when I'd do a search for things and get information heavy, presentation light content made by non-commercial enthusiasts.

Though in my case I mostly don't blame Google as much as the commercialization of the internet and the short attention spans of the majority of its current inhabitants (who are a lot less bright and technical than the ones from the late 90s and early 2000s). What I do blame them for a bit is catering to those masses and their commercial focus rather than trying to improve the quality and depth of information we are exposed to. (I can probably also blame them for some of the censorship they seem to do based on these guidelines)


Hacker News has a pretty cool archive feature. I'm certain if you go back about a decade you'd see a similar response on anything regarding Internet Explorer. It was the most dominant browser, by a wide margin, even when there were browsers many felt had long since improved beyond Internet Explorer. And those alternatives came from companies that didn't have the reputational issues of Microsoft, somewhat ironically Google was one of them.

Yet today it's become the same thing with Google. They had a phenomenal search engine, but it's gradually gotten worse while other engines have taken some big strides forward. Its market share is just completely out of whack with its capability, and so I think that motivates people pretty strongly since it indicates a knowledge gap in the market. This is also a pretty new thing which further adds motivation - a few years ago Google search was, without doubt, king. Now, it's not. And, again similar to Microsoft, Google is increasingly destroying its reputation so it feels good to swap, let alone when other services are now also arguably superior!

To see some degree of evidence of this, look at posts related to Google, but not their search. Posted just today (so there's going to be a large overlap in users) was this [1] post where a sunken car (found to contain a corpse) was visible via Google Maps. Nobody is decrying Google Maps, because it's still clearly the best service for satellite/street imagery. Gotta give the devil his due.

[1] https://news.ycombinator.com/item?id=20954545


> Has anyone found that recently Google search is becoming more and more useless?

Absolutely. I use DDG now. Google spams me with ads, and ignores my search terms.

This 'search disease' seems to be spreading, too. Retailers like Amazon, NewEgg, Ebay, and others happily insert results that they want the user to buy, instead of putting the user in control. On Ebay, half the time they remove filters I want to use. Amazon haphazardly injects their off-brand products into my results, even if I sort by price. NewEgg and Walmart show me results of partner retailers whom I don't know whether to trust.

Maybe this state of affairs works for the average user. It doesn't work for me. It mainly causes me to wade through pages of useless results, trying to filter them out by eye.


Yes, that trend is worrisome. But when confronted about it those sites will respond with "the user asked for it!" or "We protect our clients from scams". And I totally agree, it is really annoying.


Absolutely, for a year or two now.

These days, if I have success with even a mildly technical Google search, it’s because I’ve enclosed every single term in quotation marks and/or turned on verbatim search, which doesn’t even work that well.

Google is trying so hard to think for you that they’re ruining their core product.


I had totally forgotten about "verbatim tool". Thanks.

Further, I see there's now an option to disable "relevant" (personalized) searches.

https://www.google.com/preferences

"Private results

Private results help find more relevant content for you, including content and connections that only you can see."

Followed by checkbox.

The linked explainer suggests the "relevant" (personalized) search is informed by your activity with other Google products. Skeptical me thinks that's hardly the whole story.

"Search results from your Google products" https://support.google.com/websearch/answer/1710607

Either way, I may try google search for a while, see if there's any benefit over DDG.


Does anyone know how to erase "personalized search" information when not logged in?


Sorry, no. I have a "least effort" approach. Browser plugins and use DDG. I assume everything about me is still being recorded by someone somewhere.


Hey -- do you have an example? (I work on Search.)


It is basically impossible to find things with google anymore. Quotes don't work. You have to go to Tools on the results page, and change "All results" to "Verbatim". (You must do this for every search; it's not a persistent setting.) Even then you'll often follow results and Control+F to find one of the key words from your query and it won't be on the page. Even DDG has been following similar patterns lately - it's a terrible time to be online.


Yes, this is because Google isn't optimizing their search results to return you the information you want, but rather to optimize their revenue. Showing you regurgitated spam sites like wikihow and pinterest means you go back to the search results to find something relevant, which increases the probability you click on an ad.

Furthermore, if they show you spam sites, which also have google ads on them, then you're more likely to click on an ad and give google money than if they show you wikipedia, which doesn't have ads.

They don't want you to go to wikipedia, because it won't make them any money.

They are sacrificing their product quality for profit. This is the general trend in corporations with a fiduciary responsibility to their shareholders, rather than their customers.

In this case, the customer is the ad buyer, not the searcher, so those of us looking for things are actually 4th in line for prioritization, as it's like this:

  Shareholders come first
  Ad buyers come second
  Ad real estate providers third
  Searchers fourth

We need a search engine where the search results are the product, rather than the people searching. We are the product being sold to the Ad buyers.


Do you experience this for dev-related searches or all types? I often have trouble finding relevant content for dev-related searches due to SEO content farms that wrap Stack Overflow content in an ad-infested user interface or worse.


Dev related questions are one of the few times I still use Google over DDG. Maybe it's because I do a lot of searches for fairly basic concepts when learning something new but a relevant SO result is almost always in the top three.


There was a talk years ago by Paul Graham, perhaps at one of the Pycons. It was about not being afraid to do what other big companies do and take them on. When it came to search, that big company was Google, of course. He predicted that over time, due to various reasons, Google will move into personalizing the search results too much, to where they'd stop being useful or interesting. "It's true if it's true for you" is the phrase he used. And it seems to me that he was essentially right.

So, say I search for corn, but Google thinks corn is bad for me, cause I've been eating too much of it, and potatoes are better, so they just go ahead and substitute corn with potatoes to help me out.

On one hand it's nice, they think they understand what I really need. On the other hand, it's a bit creepy when it goes overboard.


Oh thank god, I thought I was going crazy or my problems were so unique. For instance just recently I hit 4 separate TEIID errors and googling the IDs led to unrelated pages either to TEIID itself or pages containing completely different IDs. Using quotes helps of course, but useful results were on page 3 or later.

Can we create a github repo with examples? I believe dev experience using Google is going really downhill.


Yes, please -- or at least paste a few examples here (I work on Search.)


I was having this problem more and more until I realized "hey, those sites all have search engines" and now for general questions I search directly on SO and for deeper dives I go to Github.

Github's search is, in general, very good at finding what you searched for, all of what you searched for, and nothing but what you searched for. The one thing I want, and it's vague and complex, is a way to filter out the several hundred or thousand very very similar repositories that come into being due to a popular tutorial, school lab assignment, or frequently forked and slightly modified repository using the rare function I'm looking into.


I just noticed the same problem. Searching for "lithium" won't let me find the Wikipedia article for the metal. News and a YouTube video of the Nirvana song are shown instead. PS: Google knows I'm very interested in renewable energy; they are constantly suggesting news to me on the subject.


It's probably because of "personalized results". If Google knows that you listen to Nirvana it will assume you want to hear the song, instead of showing you the MOST RELEVANT RESULT, that is the freaking Wikipedia article about the metal. They lost the plot. Sometimes things just work and you should not overthink it. The original Google PageRank algorithm is a beautiful insight into how distributed knowledge graphs work and self-organize. Now they've destroyed its usefulness.


That's curious, you don't get the large info card (on the right, if on desktop/laptop)? That has a direct link to Wikipedia.


Twice in the last month I found Bing gave me a better search result than Google (thing I was looking for in the first couple answers, not even on the first page in Google). That never happened to me before.


For something like two years I've been using Bing more and more. Google is filtering/censoring the results so much on anything remotely controversial, e.g. torrents, streams, politics from a certain part of the political spectrum, news, etc. Basically Google is telling me I should not be using its services anymore, and I'm ok with that. Bing will at least let me find the links I tell it I want it to find.


Bing probably has benefited from more people using Duckduckgo.


Probably fallout from trying to juice the stagnating ad revenues. In a perfectly relevant search nobody would ever click on ads.

I heard from Bing engineers (circa '09-10) that every time Bing would release a relevance improvement, Google would increase their relevance metrics shortly thereafter, suggesting that their relevance is being artificially held back. It makes economic sense, and it'd be super easy to implement, so I believe it.


Same here.

Often, I have to rephrase my search 5-10 times before I see a result worth clicking on.


Hey -- do you have an example? (I work on Search.)


Not the parent, but a recent example of a failed search for me was:

"Saerang story lord of the rings"

The most important term got dropped and needed to be quoted to actually find what I wanted.

A separate and more concerning problem with Search is the emphasis on shallow information without outgoing links for more depth. The majority of searches on many topics (medical topics for example) will have the top results be "terminal" webpages that offer only a very shallow look at a topic without any way of finding more detailed information.

For example, when I attempted to look for a comprehensive breakdown of biodegradation of plastics (e.g. what kinds of environments reduce/increase biodegrading time, what types of plastics have longer times and why) I just kept finding pages by various advocacy groups that talked in vague generalities and offered no opportunities for further research.


I see the same behavior. I think it's time to assemble a list of examples.


"thesaurus that organizes words by category" - gives a bunch of online dictionary pages. I was trying to find a specific physical book I remember using in high-school.

"techniques for interpreting statically typed languages" - gives a bunch of super basic articles/questions about statically vs dynamically typed and compiled vs interpreted languages. DuckDuckGo actually gave better results for this.

I feel what's happening is that google is looking at what people are clicking during searches and then using that data to calculate absolute ranks for pages and websites. Those ranks are then not necessarily appropriate for different, less common queries or searching behaviors. However, this algorithm probably performs better in 99% of cases for some internal metrics, so Google ends up keeping it.


> "techniques for interpreting statically typed languages" - gives a bunch of super basic articles/questions about statically vs dynamically typed and compiled vs interpreted languages. DuckDuckGo actually gave better results for this.

But is there any good result that includes all those words and which Google failed to find?


I don't remember the exact phrasing I ended up using, but I think I remember finding through DDG a StackOverflow question that was essentially "how to make an interpreter for a statically typed language" and it did not appear on Google.

To answer your exact question: I don't know, perhaps not.


I bet googlers on HN who work on ranking would appreciate that. Here's a recent one from me:

  typical tcp packet size


What exactly do you mean by that? When I search it up I get a SO answer saying what the max packet size is and another (closed) SuperUser answer that says that there is no average TCP packet size because that depends on the specific application. Both of those seem pretty good to me, but I'm not a low-level networking guy.


I think the problem is that google confidently and authoritatively displays (in a big and bold font) "65535 bytes" which is an answer to a different question.


I definitely wasn't asking for maximum packet size, which a search feature showed in huge print as the answer to my question. I don't expect the search algorithm to know my precise intent, but I'd like it to weigh the word "typical" highly.

Between google, bing and duckduckgo, only duckduckgo presented this above the fold: https://etherealmind.com/average-ip-packet-size/


In Google, after that card, I get their "People also ask", and the second question is "What is the average packet size?", which links to https://superuser.com/questions/964682/what-is-the-average-s...

Seems ok to me.

It should also be noted that the etherealmind.com link doesn't actually contain the word "typical", so if you think Google should try to outsmart users less, it's not reasonable to blame it for not showing that link.


I'm seeing that etherealmind link as #4 on Google, and then #3 on DDG. And the results above it on DDG are also related to the maximum size, and the entire sidebar on DDG is the same stackoverflow link about the maximum size.

Not sure if DDG is really that much better than Google here.


Recently? This has been an issue for at least a few years, IME.


Imo, the shark-jumping moment for any search engine is the day when they deprecate the "0 results" page. I mean, when the user searches for "alice bob" and the search engine pretends the user searched only for "alice" to avoid showing an empty page. Google certainly started doing that earlier than one year ago.
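
To make the difference concrete, here is a minimal Python sketch of the two behaviours: a strict search that is allowed to return nothing, and a relaxed one that silently drops terms until something matches. The toy corpus and the function names are hypothetical, and this is only an illustration of the idea, not how any real search engine is implemented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy corpus: document id -> text.
DOCS: Dict[str, str] = {
    "d1": "alice went to the market",
    "d2": "bob fixed the bike",
    "d3": "alice met carol",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on whitespace; good enough for a toy example."""
    return set(text.lower().split())


def strict_search(query: str, docs: Dict[str, str]) -> List[str]:
    """Return ids of documents containing every query term (may be empty)."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items() if terms <= tokenize(text))


def relaxed_search(query: str, docs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Drop trailing terms until something matches; also report what was dropped."""
    terms = query.lower().split()
    dropped: List[str] = []
    while terms:
        hits = strict_search(" ".join(terms), docs)
        if hits:
            return hits, dropped
        # No document has all remaining terms: discard the last one and retry.
        dropped.insert(0, terms.pop())
    return [], dropped


if __name__ == "__main__":
    print(strict_search("alice bob", DOCS))   # honest "0 results" page
    print(relaxed_search("alice bob", DOCS))  # pretends the query was just "alice"
```

The relaxed version at least returns the dropped terms, so a UI could tell the user which words were ignored; the complaint above is about engines that do the dropping without making it obvious.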


Any actual examples? I hear this complaint a lot, but usually with little to support it. Did it really get worse? I get bad results from time to time, but it is good in most cases.


yes, they've been doing that for a while but it's gotten worse.

also top results tend to be from sites trying to sell something.

which makes sense because businesses can hire SEO consultants and spend their resources getting google to show them in the top results.

however people who just put stuff up and leave up for free don't hire SEO consultants.


Try using verbatim search. It’s under “Search tools -> All Results”. Sadly I can’t figure out how to direct link it.



Wow so intuitive. :)

Thanks for the info!


Those blog style articles are the "original reporting" they're trying to promote.


Google's recent algorithm update made a massive change in the way news sites write their content.

It forced them to be original.

I believe elevating original reports is the incentive Google is going to give to sites that follow its guidelines.


This is cool, I'll have to dig less to see what the original source was for a news report. Also, giving more weight to news sources with Pulitzers and such is EXACTLY how search SHOULD be, rather than ranking up Buzzfeed writers who rewrite news in clickbait-worthy fashion.


Yeah I hate how the top result for any search is 10 news organizations all with the same low effort headline.


Okay let's give this a try

news.google.com -> Democratic debate

Websites:

BBC, Vox, Guardian, NYT, WaPo, CNN, Slate (wow), The New Yorker, USA Today

Absent:

National Review, Fox News, Wall Street Journal, reason, Forbes, RT

So basically the news algorithm considers these kinds of stories high quality:

1) Stephen Colbert plays Democrat Drinking Game (NYT)

No Greg Gutfeld sketch ?

2) Funniest one-liners at the Democratic Debate (CNN)

3) Where was Mayor Pete Buttigieg at the Debate? (NYT)

4) OPINION: Winners and Losers of the Democratic Debate (NYT)

actually has the word opinion in the title

5) Who won the Democratic Debate? Texas. (NYT)

I think there ought to be more representation from right-leaning organizations


There should also be more representation from far-left organizations.

See this article on how Google traffic to far-left sites dramatically dropped in 2017, due to Google deciding to promote "authoritative" content ahead of "alternative viewpoints": https://www.wsws.org/en/articles/2017/09/19/goog-s19.html

"Authoritative content" seems correlated with center-left in the US context, though the US center-left may be considered centrist or even center-right in some other countries.


Fair enough, I guess, if you also want to give more representation to far right organizations. The organizations OP mentioned are not far right. They are mainstream right. With the possible exception of RT, which I'm not actually all that familiar with.


Fox News is far right; it was founded with the explicit purpose of putting far right viewpoints into the mainstream[0].

[0] https://www.theatlantic.com/politics/archive/2011/06/roger-a...


A) your link doesn't even mention the phrase "far right", and B) what something is today is not the same as what it was intended to be in the 70s.

You can just go to foxnews.com. It may be low-brow journalism that doesn't carry itself with the same gravity as NYT or WaPo, but none of the stuff on there seems very extreme to me. No white nationalism, for instance, which I think is basically the right-wing equivalent of the socialism (as in, actual socialism, not social democracy) and communism you see on the far left.


I think you're just pushing an agenda. Where does the blog say anything about opinion being ranked down?

In terms of what the blog actually talks about, "Democratic debate" is a terrible example. This seems to focus on boosting older articles that break the news and down-ranking low-effort follow-up stories. I don't think a consistent topic like "Democratic debate" is a good test bed.

Maybe search for a scandal?


> Where does the blog say anything about opinion being ranked down?

Every ranking boost for one result is a ranking decrease for all other results.


Most likely this is just boosting articles at the beginning of a trend curve for growth for a search. For the purpose of what's laid out here, opinion could and probably should be presented.


A great example of scandal would be the "very fine people" scandal.

A plain text reading of the transcript and a fair viewing of the press conference has the President saying that he was not talking about neo-Nazis, who should be condemned totally, and that there were also very fine people on both sides of the statue debate.

Jake Tapper of CNN has clarified this multiple times on air.

As for agenda, I'm an Indian citizen with no business or personal interests in the USA other than following the news as a hobby and enjoying the reality TV aspect of it.

Googling it gives you essentially fake news. Try it out.

Vote the man out of office but don't destroy your society and potentially others by looking the other way and allowing a company like Google to do this.


> No Greg Gutfeld sketch ?

Who?

> I think there ought to be more representation from right-leaning organizations

For a search specifically about a left-leaning organization?

> National Review, Fox News, Wall Street Journal, reason, Forbes, RT

Two of those publications have much stricter paywalls/adwalls than any of the other publications you mentioned. One is a foreign outlet that's decidedly not known for quality international reporting. One is downright obscure compared to all the others. One is actually present in the Google News results for this topic.

> Slate (wow)

Do you have something substantive to say about Slate or its inclusion in the news results?

I feel like you aren't trying to make the fairest or strongest arguments. I'm sure you could find some more valid ways to try to make a case for google news results being politically biased.



So, do you see how that article explains why a Colbert sketch is more likely to make the news than something from Gutfeld? Colbert's audience is more than twice the size of Gutfeld's, and Colbert has been in this business far longer and is better-known among people who don't regularly watch either show. Other things being equal, Colbert is simply more newsworthy. So unless Gutfeld had an especially good segment about the democratic debate, there's no reason to expect him to be included in a top news listing that only has one or two comedy/satire items.

And that's all ignoring the fact that Gutfeld's show has not yet aired an episode since the last debate, so it's quite impossible for a sketch from his show to be ranking in the news about that debate.


I wasn't making a specific argument about Gutfeld though. The broader context is that it's favoring partisan sources and messages.

That's an outcome.

Explaining why that outcome is reasonable, and giving a list of valid claims to support it, does not address the stated problem, which is that this search engine presents a slanted view of American politics.

Why is it ok to have this kind of search result? We all don't owe our media corporations any kind of loyalty, and we all know that they have an interest in a particular point of view.

I'd think that in this kind of situation we are all best served by knowing what the folks at CNN are saying and also what the folks at Fox are saying (not those two specifically, but as stand-ins for multiple points of view).


> I wasn't making a specific argument about Gutfeld though.

Yeah, you were. Your overall point wasn't about him specifically, but you most certainly did attempt to use him as a (completely invalid) supporting argument. You're generally doing a very bad job of supporting your position. That's understandable to an extent because showing convincing evidence of systematic bias requires gathering a non-trivial amount of data, but you could at least do us the favor of not bashing Google for suppressing stories that don't exist.

And there's still the issue that you chose a partisan search term as your example. Given that, it seems very reasonable that Google's news results would include more results from left-leaning sources, because they have more to say on that particular subject. It almost seems like you're arguing that Google News should be biased toward equal time/prominence for different factions even if that's not accurately representative of what news and commentary is out there.


I think the misunderstanding here is that we are focusing on an offhand remark made mostly in jest.

You can read as much meaning and intent into it as you like, and in that sense it would be very difficult, if not impossible, for me to counteract unintended meaning or intent being read into what I say, so going forward I will limit any facetious remarks.

Regardless, the data here was that I typed a search query and the outcome was from sources that lean left of center. I think people are best served when there is a diversity of views since we can potentially benefit from the lived experience of everyone, all of whom have something valuable to contribute to our discourse. In a political sense voters are not served well when the news organizations that are shown to them on a search result page are either pro or anti ruling party.

In my country most television news is pro-government, with the exception of one channel. This government frequently misrepresents data and it's helpful to us as citizens when our media organizations can hold them accountable. If Google were to favor some media corporations over others, it can potentially prevent the media from doing this.

Fundamentally I think a better news product is one that provides diverse points of view to the reader to help inform them of the diverse perspectives on a particular topic. By boosting some corporations over others, in a world where bias is fairly well established, I think the product does readers a disservice.


Interesting. When I Google just the word "debate" I get an opinion piece from Fox as the second result.


These results are neither hermetic nor repeatable, since the ranking will be polluted by searches and clicks coming from your browser in the past, your account on other browsers, other people using your IP, other people in your region, etc. There is no pure function mapping a search term to a result list, and the desire for such a thing is too silly to take seriously.


If most folks are getting similar rankings then the impact of the pollution is minimal though.

Are you getting different results?


For me (Firefox + noscript + ublock in private browsing mode):

- cnn

- cnn

- cbsnews

- NY Times

- NPR

- AP News

- Wikipedia

All neutral sources except for the NY Times and CNN on a bad day.

Of your list there: National Review, Reason, and Forbes are not journalism outlets. RT doesn't have editorial independence like other state-backed media. Without getting into debate, maybe Fox News should be on my page before Wikipedia and a second CNN link. But I don't see a reason for those others.


FYI, you seem to be listing results from a regular Google web search, not a Google News search.


ah my bad


For what it's worth, with my account logged in to Google geolocated in India, my results are almost the same except BBC and Guardian instead of CBS and NPR. This is for Google Search, not Google News, as the other commenter pointed out.


As a comparison : https://www.thefactual.com/news/story/214519-Democratic-Deba...

Full disclosure: I am one of the techies who built it.

Comments/Feedback welcome.


The first 10 for me were:

NYT, NYT, NYT, NYT, Vox, NYT, WaPo, NPR, CBS, LA Times

They certainly seem to like New York Times


I'd probably pay for a filter that distinguished editorial from reporting.


You'd end up with an empty reading list.


You're thinking of gossip.

Anyone quoting sources and showing their work (published data) is reporting.

Even if you disagree with the interpretation, conclusions. Especially if you disagree.

--

Sadly, I continue to be embarrassed, surprised when geeks proclaim it's all just opinion. This anti-intellectualism is how cults and hoaxes fester and grow.

It's not helping.

Is it too much to ask that the primary beneficiaries of our modern, technical, scientifically advanced society also defend the foundations of our society?

How about just part time? A few hours per week. Take a breather from the cynicism. Just pretend the world is knowable and that we can work to improve stuff. Think of it as a new hobby.


I think the point of the parent is that any story you read is editorialized to some degree, even if all it's doing is listing facts.

Every time you read a report on a topic, mountains of facts are omitted for the sake of disseminating the points the authors and editors thought you should know to build your opinion. It is thus vital that you trust their judgment to do so in a way that's beneficial to you.

Knowing things is just really hard and we tend to fool ourselves into believing we know more than we do and that we can know more than is reasonable (yours truly included).


I emphatically agree with your paras #2 and #3.

"any story you read is editorialized to some degree"

The opinion is the value add.

I used to read The Economist, because they challenged my views. I read Tyler Cowen today because I disagree with him and because he argues his case.

The rut we find popular discourse in today is caused by two things: the New Deal consensus we once had has not been replaced with a new consensus, and there is a reluctance to distinguish between blather and discourse (debate).


> Anyone quoting sources and showing their work (published data) is reporting. Even if you disagree with the interpretation, conclusions.

Interpretations are commentary, not reporting. Reporting is strictly facts. Even opinion pieces contain some reporting ("X did Y, and here's what you should think about it"). I don't believe it's "anti-intellectualism" to ask for more reporting and less commentary.


No, you wouldn't. There is actual reporting that takes place in the world. Like all things, what is and isn't reported or investigated is slanted by whoever is doing the reporting. But pure editorial is a very different beast.


I wonder how this will deal with content that evolves over time. Some breaking news was published to BBC News with only a sentence and was fleshed out within the hour, so the original publication time and the possible originality of content are disconnected. In theory Wikipedia is meant to be summarising other sites so it couldn’t be first.


I like the idea of putting a spotlight on original stories, yet this also incentivizes racing to break a story even if it is just a rumor

https://en.wikipedia.org/wiki/Cobra_effect


On the surface, focusing on original reporting should be a very good thing. But this goes way beyond that. These raters are going to effectively be the arbiters of truth and reality. This is a very dangerous opaque centralization of information control given the monopoly Google has on search.

Maybe take a look at some search alternatives like DuckDuckGo or YaCy.


I'm all in on DuckduckGo.

No longer doing !g as I don't accept the Google privacy policy (although I have peeked behind the popup using uBlock)


If you ever need Google results, just use !s for Startpage instead. It pulls Google results in a privacy respecting way, similar to DDG.


thanks!


I wonder how they are going to make this... Only based on timestamp? Based on "trusted sources"? Based on a new API?


Besides algorithms, they have a team of raters who manually evaluate the results. The rating guidelines are specifically enumerated in this guide: https://static.googleusercontent.com/media/guidelines.raterh...

For example, check out page 26 for their grid of "Examples of Highest Quality Pages"

Algorithmic and signal-wise, there are common conventions in English-language journalism that signal "original reporting", or rather, not original reporting. Such as, "...as reported by" or "...', Rep. Smith told the Washington Post". Of course, not every publication uses those, so I'm guessing the trustiness of a site/source will come into play.
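
As a rough illustration of that kind of textual signal, here is a small Python sketch that scans article text for the two attribution conventions mentioned above. The pattern list and function names are hypothetical, the capital-letter heuristic for outlet names is my own assumption, and nothing here is known to be what Google actually does.

```python
import re
from typing import List

# Hypothetical patterns for phrases that credit another outlet.
ATTRIBUTION_PATTERNS = [
    # "as reported by ..." / "as first reported by ..."
    re.compile(r"\bas (?:first )?reported by\b", re.IGNORECASE),
    # "... told the Washington Post": "told" followed by capitalized words.
    # Crude heuristic: it will also match things like "told Congress".
    re.compile(r"\btold (?:the )?[A-Z]\w*(?: [A-Z]\w*)*"),
]


def find_attributions(text: str) -> List[str]:
    """Return every attribution phrase found in the text, pattern by pattern."""
    hits: List[str] = []
    for pattern in ATTRIBUTION_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


def looks_secondhand(text: str) -> bool:
    """True if the text credits another source at least once."""
    return bool(find_attributions(text))


if __name__ == "__main__":
    sample = '"We will vote," Rep. Smith told the Washington Post.'
    print(find_attributions(sample))
    print(looks_secondhand("City council approves new budget."))
```

A publication that never uses these conventions would sail through a check like this, which is why some notion of site or source trust would still be needed alongside it.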


The human raters are used to assess the performance of the algorithms (i.e. do human opinions align with algorithmic ones), not to rank content directly.

A recurring theme with Google is that they always want to solve problems algorithmically.


But leaning on automated algorithms seems like the only way to realistically scale for day-to-day needs? I was going to posit that Google seems like the company most open to mixing human and automated processes. But that's not necessarily fair to Facebook, for which moderating user content and interactions is a much different problem space than search rankings; and of course, YouTube seems to have the same issues as FB does.


And what do they do with these human assessments then? Of course they feed them back to the algorithm so that it can improve. In effect it is still humans influencing the search rankings albeit indirectly. So it is a bit duplicitous to say that "it is the algorithm that decides" when there are humans providing learning data for it.


It's still algorithm-first. It's not "learning data" in the sense that human ratings are used to train the algorithm.

It's more of a check step. So, for example, if the algorithm rates one website as vastly more (or less) authoritative than human raters do, the engineers might dig in to see which aspects are causing the algorithm to evaluate it differently and, potentially, test tweaking the algorithm accordingly.
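
A minimal Python sketch of such a check step, under my own assumptions: the 0-100 authority scale, the threshold, the site names, and the function name are all made up for illustration. The point is only that human ratings are compared against the algorithm's output to find where they disagree; they are not fed in as rankings.

```python
from typing import Dict, List, Tuple


def flag_disagreements(
    algo_scores: Dict[str, int],
    human_scores: Dict[str, int],
    threshold: int = 30,
) -> List[Tuple[str, int]]:
    """Return (site, gap) pairs where algorithm and raters differ by >= threshold.

    gap > 0 means the algorithm rates the site higher than the human raters do.
    Results are sorted so the largest disagreements come first.
    """
    flagged: List[Tuple[str, int]] = []
    # Only sites scored by both the algorithm and the raters can be compared.
    for site in algo_scores.keys() & human_scores.keys():
        gap = algo_scores[site] - human_scores[site]
        if abs(gap) >= threshold:
            flagged.append((site, gap))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical authority scores on a 0-100 scale.
    algo = {"a.example": 90, "b.example": 50, "c.example": 20}
    human = {"a.example": 40, "b.example": 55, "c.example": 60}
    for site, gap in flag_disagreements(algo, human):
        print(f"{site}: algorithm minus raters = {gap:+d}")
```

The flagged list is a to-do list for engineers to investigate, not a correction applied to the results themselves.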


Google is scared shitless with the DOJ investigation on them.


Google made some pretty serious moves in the last year.

Their recent algorithm update is forcing news sites to improve their overall content quality

and by now highlighting original reports it will force news sites to write original content and not just rewrite each other's articles.

It's cool to see how Google is changing online journalism


> It's cool to see how Google is changing online journalism

It really isn't. That sort of 'cool' is what got us AMP, Chrome and search results mixed with advertising as well as Google products that are promoted at an unfair advantage compared to established products by competitors.

Monopolies have their occasional upsides but they also have structural downsides which is why they should be avoided.


I'm generally pretty right wing and I generally think that the tech companies have a heavy left-wing bias. That said, Google already has a massive influence on journalism. From where we are, today, in the real world, this feels like a good change.

In my opinion. For what it's worth.


In December 2018, Sundar Pichai testified under oath to US lawmakers that search results are completely algorithmic with no human reranking of search results. I wonder if they are still going to stick with this story.


Why wouldn’t they?

> their feedback doesn't change the ranking of the specific results they're reviewing; instead it is used to evaluate and improve algorithms in a way that applies to all results


It's not re-ranking, it's rating of the algorithm.

Basically the difference between preparing for an exam using past papers and buying the exam.



