
Elevating original reporting in Search - cryptofits
https://www.blog.google/products/search/original-reporting/
======
erikig
I found the rater guidelines document linked in this blog post quite
insightful. Sections like how to assess the "E-A-T" rating (Expertise,
Authoritativeness, and Trustworthiness) will be useful to me when writing
content.

[https://static.googleusercontent.com/media/guidelines.raterh...](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)

~~~
IfOnlyYouKnew
I don't think anybody calling it "content" should ever be considered
trustworthy. The word reeks of hastily rephrased Wikipedia solely intended for
search engines.

~~~
Silhouette
It's a standard term of art, hence terms like "digital content creation"
tools, "separating content from presentation", and so on.

I think any other connotations you're associating with the word are probably
down to you, not the word.

~~~
IfOnlyYouKnew
Yes, it's a standard term of art within the hastily-rephrasing-wikipedia-for-
SEO community.

~~~
tjwds
No, this is patently false. Here are some examples which illustrate
Silhouette's point:

[https://en.wikipedia.org/wiki/Content_delivery_network](https://en.wikipedia.org/wiki/Content_delivery_network)

[https://www.loc.gov/standards/mdc/](https://www.loc.gov/standards/mdc/)

[https://en.wikipedia.org/wiki/Portal:Featured_content](https://en.wikipedia.org/wiki/Portal:Featured_content)

[https://en.wikipedia.org/wiki/Information_content](https://en.wikipedia.org/wiki/Information_content)

------
rdtwo
Authoritative seeming sources are not always a good thing. Anytime you search
for a game guide/faq/hint you get drivel from the big name sites that are
super short on content and have bad advice. In almost all cases, the content
I’m looking for is in some tiny blog that barely ranks but matches all the
search words.

~~~
gambler
_> Authoritative seeming sources are not always a good thing._

Yeah, but what if stupid people search for the wrong thing, find the wrong
website and get infected with Wrongthink? Have you thought about that?
Crushing amateur content creators and betraying all the ideals of the early
Web is a small price to pay for preventing that intolerable scenario.

~~~
ldng
Who or what defines "Wrongthink"?

~~~
AnimalMuppet
Whoever gets to define what the "right" or "authoritative" answers are.

------
DamnInteresting
If this change is executed well, it will be a relief for original content
creators such as myself. I am involved in researching and writing original
content, and in recent years our articles keep getting buried by low-effort
regurgitators who slightly rewrite our work, then rank above us because their
version is newer. I've even bellyached about this on HN before:

[https://news.ycombinator.com/item?id=19766276](https://news.ycombinator.com/item?id=19766276)

So, here's hoping Google succeeds in setting things slightly straighter.

------
sverige
Am I the only one who thinks it's a bad idea for a tech company with no
experience or education in the field to have so much influence over what
people perceive to be good journalism?

~~~
danso
Isn't the original PageRank, and every iteration since, defined by what Google
engineers believe to be a relevant and informative webpage? They have a 160+
page guide that tells their ratings team how to identify high-quality content:

[https://static.googleusercontent.com/media/guidelines.raterh...](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)

~~~
dexen
_> Isn't the original PageRank (...) defined by what Google engineers believe
to be a relevant and informative webpage?_

No, quite the opposite.

The original PageRank was defined by what _other_ internet publishers
considered to be a relevant and informative webpage. Specifically, the
original PageRank was weighting content by the links (and content around the
links) it was getting. The trust was placed on the internet publishers linking
to it. The trust was distributed and also transitive to a degree.

Which, if you think about it, is pretty close to what journalists do - they
prop up the good ones in various ways. And also reasonably close to the peer
review process popular in sciences - equal peers judging relevant materials
and marking / linking the ones they approve.

All in all, the original algorithm was effectively a "wall" between Google
engineering and the actual content, as ranked & searched. It was a clear
delineation that "we don't editorialize search results".

Granted, the later incarnations of Google Search take into account much more
than just the original PageRank, and thus stray away from the idealized
original formulation - and also get much closer to editorial decisions.
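For readers unfamiliar with the original formulation, here is a minimal sketch of textbook PageRank computed by power iteration: each page splits its score evenly among the pages it links to, so trust flows along edges that other publishers chose to create. The damping factor of 0.85, the iteration count, and the toy link graph are assumptions for illustration; this is the idealized textbook version, not Google's current ranking system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank via power iteration.

    links maps each page to the list of pages it links to.
    Pages with no outlinks ("dangling") spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to:
                # the trust comes from the publishers doing the linking.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank over every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs link to "original", which links back to one of them.
web = {
    "blog_a": ["original"],
    "blog_b": ["original", "blog_a"],
    "blog_c": ["original"],
    "original": ["blog_a"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Note that nothing in the loop looks at page content or at anyone's editorial opinion; the ordering falls out of the link graph alone, which is the "wall" described above.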

~~~
danso
Right, I understand your point, and maybe I'm being too reductive here, but
the idea heuristic that _what people link to is authoritative_ is an
opinionated decision that ended up being satisfactorily accurate (and
efficient to implement). And evaluating the heuristic's accuracy ultimately
boils down to human judgment; from their early Google proposal (though not the
original PageRank paper):

[http://infolab.stanford.edu/~backrub/google.html](http://infolab.stanford.edu/~backrub/google.html)

> _The biggest problem facing users of web search engines today is the quality
> of the results they get back. While the results are often amusing and expand
> users' horizons, they are often frustrating and consume precious time. For
> example, the top result for a search for "Bill Clinton" on one of the most
> popular commercial search engines was the Bill Clinton Joke of the Day:
> April 14, 1997. Google is designed to provide higher quality search so as
> the Web continues to grow rapidly, information can be found easily._

(Yes, the opinion that a "bill clinton" query should probably return
official/biographical pages before "Joke of the Day" is a pretty obvious and
uncontroversial one)

~~~
dexen
_> but the idea heuristic that what people link to is authoritative is an
opinionated decision_

There are two decision spaces here. One is the selection of search ranking
algorithm (whether to use PageRank or any other), and that decision was taken
by Google[1].

The second decision space is the decision whether to link (or not) to any
given document, and what content to put around it. That is a long lasting
iterative process rather than any singular decision. Arguably it's geared
towards approximating the distributed consensus - which is a global mutable
state, expressed with the edges of the graph rather than any singular node.

Can the process be gamed, perverted, or corrupted randomly or maliciously?
Sure. Do we know of a better one? Not yet, at least not publicly.

--

[1] Practically speaking, we all selected PageRank by preferring Google over
multiple competing engines.

------
CDSlice
For everyone who complains about Google having worse results, have you blocked
or disabled Google's tracking? I haven't and haven't had any trouble finding
things with Google and I wonder if the two are related in some way.

~~~
sseagull
In some ways that might be true. But I also wonder if google is using
tracking/profiling as kind of a crutch.

A while ago, I had google forget about all the things I had searched for. I
then wanted to look up some features of C++17, but google kept coming up with
info about the C-17 Globemaster instead. Rather than just searching for my
terms explicitly, it tried to interpret what I wanted, and failed miserably
without knowing I had previously been interested in programming.

This doesn't explain Google Groups searches, though, where zero results come
up even though you copied/pasted the title from an old post.

~~~
MagnumOpus
> I then wanted to look up some features of C++17, but google kept coming up
> with info about the C-17 Globemaster instead

I did the search just now and every single result on the first page was about
C++17. (I am logged-out with Noscript, Ghostery, adblock enabled). They might
have fixed it?

~~~
MiroF
how do you know what the search was?

------
enlyth
Has anyone found that recently Google search is becoming more and more
useless?

Google tends to ignore what I actually type in, and tries to search according
to some weird NLP machine learning inference on what it thinks I'm actually
trying to ask.

Top results will include maybe 50-75% of the words I actually typed in, and it
will treat the rest as mere hints or related words.

My queries end up looking like this after several tries and fails:

"something" "another phrase" "also" "this"

If I type the whole phrase without quotes I just get a bunch of ads, blog
spam, and irrelevant stuff that is pretending to be useful.

Hell, most dev-related queries will return shallow Medium-style blog articles
instead of SO / GitHub.

~~~
outime
Almost every thread that's somehow related to Google Search has this same
question (which often goes to the top) and the same answers.

It's difficult to imagine that most people haven't seen this a bunch of
times as well.

And while I agree with this, I'd love to know why people keep asking and/or
upvoting this same question over and over. Is it for the sole purpose of
bashing Google or what is it? Honest question.

~~~
andrewvc
Yes, people have been switching to DDG for years now, yet still google is
utterly dominant.

Candidly, I'm mystified by the complaints. Google's search works great, and
has for years for myself and everyone I know IRL.

Any post about any product that isn't brand new on HN is going to get comments
along the line of "Company X? Their product is garbage, it used to be great!
I've been using <alternative that requires a ton of extra effort to setup> and
once I got it going it was amazing!"

~~~
snazz
I won’t argue about your experience, but DDG isn’t any harder to use than
Google aside from the fact that you need to change your default search engine.
That’s not a “ton of extra effort to setup”.

~~~
crispinb
When trying DDG again for a while, my use of g! gradually increases as a
necessary tactic to surface what I'm looking for. When it gets to over about
80% I go back to using Google again with a sigh.

~~~
rjf72
Can you give some examples?

I think a few years ago this was somewhat reasonable but in my opinion DDG has
not only matched but surpassed Google for most searches. As an example (so that I'm
not only asking you to do the lifting), a random search that's
relevant for me would be 'spacex rocket thrust.' Since Google's results are
based on arbitrary tracking and whatever per user magic marketing metrics they
decide to apply, it's not repeatable but I imagine you'll probably get at
least something similar. We should get identical results for DDG:

-----------

Google:

- 4 redundant wiki pages (all link to each other)

- 2 redundant links to spacex.com site (all link to each other)

- 1 irrelevant theverge article on an arbitrary launch

- 1 space.com link with some relevant information

DDG:

- 1 relevant wiki page

- 2 redundant links to spacex.com site

- 1 tangential article from space.com

- 1 space.com link with some relevant information

- 1 spaceflight101.com link with extremely relevant information

- 1 irrelevant link from teslarati talking about a 'spacex rocket package'
for a roadster

- 1 cnn article comparing rocket thrusts

- 1 redundant wiki page

-----------

That was literally the first thing I searched for and I think DDG is clearly
better there, though both overall results are quite poor. One big thing is a
much better diversity of sources with much less redundancy. But the thing that
really pushes this example over the edge is the spaceflight101 link. It not only
provides the most relevant information by a rather wide margin, but is also a
critical source for a wide array of related news, specs, and other
information.

The reason I think both searches are quite poor is because of how much all
search engines today lack any notion of context whatsoever. When I search for
'spacex engine thrust' am I searching for technical information, or am I
searching for media information relating to recent launches or developments?
That's something that ought to be derivable from a contextual analysis of my
query, yet it quite obviously is not!

Google took us that big leap to 'Abraham Lincoln' not returning hardcore
porn, but it feels like the progress we've made since then has been pretty..
meh. And I feel the last few years have seen an overall decline in search
quality, but a _relative_ increase in quality for the formerly secondary
players such as DDG. In other words it "feels" as though Google of ~4 years
ago > DDG today > Google Today. But of course this is, in some ways, going to
come down to the person.

~~~
crispinb
No I don't have examples to hand. Search isn't something I'm interested in
really - it's just a kind of commodity to me that I want to get out of my way.

The trend is entirely clear for my search usage though - I just don't find
what I'm looking for much of the time with DDG, so end up g!'ing. When that
usage reaches a certain subjective redundancy threshold, I switch back to
google.

------
Yuval_Halevi
Google's recent algorithm update made a massive change in the way news sites
write their content.

It forced them to be original.

I believe elevating original reporting is the incentive Google is going to give
sites that follow its guidelines.

------
kkarakk
This is cool, I'll have to dig less to see what the original source was for a
news report. Also, giving more weight to news sources with Pulitzers and such
is EXACTLY how search SHOULD be, rather than ranking up BuzzFeed writers who
rewrite news in clickbait-worthy fashion.

------
rc_mob
Yeah I hate how the top result for any search is 10 news organizations all
with the same low effort headline.

------
fareesh
Okay let's give this a try

news.google.com -> Democratic debate

Websites:

BBC, Vox, Guardian, NYT, WaPo, CNN, Slate (wow), The New Yorker, USA Today

Absent:

National Review, Fox News, Wall Street Journal, Reason, Forbes, RT

So basically the news algorithm considers these kinds of stories high quality:

1) Stephen Colbert plays Democrat Drinking Game (NYT)

No Greg Gutfeld sketch?

2) Funniest one-liners at the Democratic Debate (CNN)

3) Where was Mayor Pete Buttgeig at the Debate? (NYT)

4) OPINION: Winners and Losers of the Democratic Debate (NYT)

actually has the word opinion in the title

5) Who won the Democratic Debate? Texas. (NYT)

I think there ought to be more representation from right-leaning organizations

~~~
jayd16
I think you're just pushing an agenda. Where does the blog say anything about
opinion being ranked down?

In terms of what the blog actually talks about, "Democratic debate" is a
terrible example. This seems to focus on boosting older articles that break
the news and down-ranking low-effort follow-up stories. I don't think a
consistent topic like "Democratic debate" is a good test bed.

Maybe search for a scandal?

~~~
quotemstr
> Where does the blog say anything about opinion being ranked down.

Every ranking boost for one result is a ranking decrease for all other
results.

~~~
jayd16
Most likely this is just boosting articles at the beginning of a search's
growth curve. For the purpose of what's laid out here, opinion
could and probably should be presented.

------
tobylane
I wonder how this will deal with content that evolves over time. Some breaking
news was published to BBC News with only a sentence and was fleshed out within
the hour, so the original publication time and the possible originality of
content are disconnected. In theory Wikipedia is meant to be summarising other
sites so it couldn’t be first.

------
tosh
I like the idea of putting a spotlight on original stories, yet this also
incentivizes racing to break a story even if it is just a rumor.

[https://en.wikipedia.org/wiki/Cobra_effect](https://en.wikipedia.org/wiki/Cobra_effect)

------
ilaksh
On the surface, focusing on original reporting should be a very good thing.
But this goes way beyond that. These raters are going to effectively be the
arbiters of truth and reality. This is a very dangerous opaque centralization
of information control given the monopoly Google has on search.

Maybe take a look at some search alternatives like DuckDuckGo or YaCy.

------
sails
I'm all in on DuckDuckGo.

No longer doing !g as I don't accept the Google privacy policy (although I
have peeked behind the popup using uBlock)

~~~
nvrspyx
If you ever need Google results, just use !s for Startpage instead. It pulls
Google results in a privacy respecting way, similar to DDG.

~~~
sails
thanks!

------
kuu
I wonder how they are going to do this... Only based on timestamp? Based on
"trusted sources"? Based on a new API?

~~~
danso
Besides algorithms, they have a team of raters who manually evaluate the
results. The rating guidelines are specifically enumerated in this guide:
[https://static.googleusercontent.com/media/guidelines.raterh...](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)

For example, check out page 26 for their grid of "Examples of Highest Quality
Pages"

Algorithmic and signal-wise, there are common conventions in English-language
journalism that signal "original reporting", or rather, not original
reporting. Such as, "...as reported by" or "...', Rep. Smith told the
Washington Post". Of course, not every publication uses those, so I'm guessing
the trustworthiness of a site/source will come into play.
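A crude version of the phrase-based signal described above might look like the sketch below: count the fraction of sentences that carry an attribution phrase such as "as reported by" or "told the Washington Post". This is purely hypothetical; the phrase list, the naive sentence splitter, and the ratio itself are assumptions for illustration, not anything Google has said it uses.

```python
import re

# Hypothetical phrases suggesting a sentence relays someone else's reporting.
ATTRIBUTION_PATTERNS = [
    re.compile(r"\bas (?:first )?reported by\b", re.IGNORECASE),
    re.compile(r"\btold the [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"),
    re.compile(r"\baccording to (?:a report in )?the [A-Z][A-Za-z]+"),
]


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def attribution_ratio(text):
    """Fraction of sentences containing at least one attribution phrase.

    A high ratio is only a weak hint that the piece rewrites other outlets' work.
    """
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    flagged = sum(1 for s in sentences if any(p.search(s) for p in ATTRIBUTION_PATTERNS))
    return flagged / len(sentences)


relayed = ('The bill stalled on Tuesday, as reported by Reuters. '
           '"We need more time," Senator Smith told the Washington Post. '
           'Markets were unmoved.')
original = "Our reporters obtained the draft bill on Monday. It runs to 40 pages."

print(round(attribution_ratio(relayed), 2))
print(attribution_ratio(original))
```

As noted above, plenty of publications attribute differently (or not at all), so a signal like this would only make sense combined with some notion of per-site trust.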

~~~
dmg826
The human raters are used to assess the performance of the algorithms (i.e. do
human opinions align with algorithmic ones), not to rank content directly.

A recurring theme with Google is that they always want to solve problems
algorithmically.

~~~
dunkelheit
And what do they do with these human assessments then? Of course they feed
them back to the algorithm so that it can improve. In effect it is still
humans influencing the search rankings albeit indirectly. So it is a bit
duplicitous to say that "it is the algorithm that decides" when there are
humans providing learning data for it.

~~~
dmg826
It's still algorithm-first. It's not "learning data" in the sense that human
ratings are used to train the algorithm.

It's more of a check step. So, for example, if human raters think one website is
vastly more (or less) authoritative than the algorithm does, the engineers might dig
in to see which aspects are causing the algorithm to evaluate it differently
and, potentially, test tweaking the algorithm accordingly.
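The "check step" described here could be sketched as comparing two rank orders, one from rater scores and one from algorithmic scores, then flagging the sites where they disagree most for an engineer to look into. Everything below is hypothetical (site names, score scales, the rank-gap threshold, and the use of Spearman correlation); it only illustrates a "rate, compare, investigate" loop, not Google's actual process.

```python
def ranks(scores):
    """Map each item to its rank (1 = best); ties are broken by name for determinism."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {item: i + 1 for i, item in enumerate(ordered)}


def spearman(human, algo):
    """Spearman rank correlation between two score dicts over the same items (no tie correction)."""
    rh, ra = ranks(human), ranks(algo)
    n = len(rh)
    if n < 2 or set(rh) != set(ra):
        raise ValueError("need at least two items, scored by both sides")
    d2 = sum((rh[k] - ra[k]) ** 2 for k in rh)
    return 1 - 6 * d2 / (n * (n * n - 1))


def disagreements(human, algo, min_rank_gap=2):
    """Items whose human and algorithmic ranks differ by at least min_rank_gap, worst first."""
    rh, ra = ranks(human), ranks(algo)
    gaps = {k: abs(rh[k] - ra[k]) for k in rh}
    return sorted((k for k, g in gaps.items() if g >= min_rank_gap),
                  key=lambda k: (-gaps[k], k))


# Hypothetical data: averaged rater scores vs. algorithmic scores for four sites.
human = {"site_a": 4.5, "site_b": 4.0, "site_c": 2.0, "site_d": 1.5}
algo = {"site_a": 0.9, "site_b": 0.1, "site_c": 0.7, "site_d": 0.5}

print(spearman(human, algo))
print(disagreements(human, algo))
```

In this framing the ratings never reorder any result directly; they only point engineers at cases (here `site_b`) where the algorithm and the raters diverge.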

------
user_50123890
Google is scared shitless with the DOJ investigation on them.

------
SalmndraCrypto
Google made some pretty serious moves in the last year.

Their recent algorithm update is forcing news sites to improve their overall
content quality, and by now highlighting original reports it will force news
sites to write original content and not just rewrite each other's articles.

It's cool to see how Google is changing online journalism

~~~
jacquesm
> It's cool to see how Google is changing online journalism

It really isn't. That sort of 'cool' is what got us AMP, Chrome and search
results mixed with advertising as well as Google products that are promoted at
an unfair advantage compared to established products by competitors.

Monopolies have their occasional upsides but they also have structural
downsides which is why they should be avoided.

~~~
toasterlovin
I'm generally pretty right wing and I generally think that the tech companies
have a heavy left-wing bias. That said, Google already has a massive influence
on journalism. From where we are, today, in the real world, this feels like a
good change.

In my opinion. For what it's worth.

------
kleton
In December 2018, Sundar Pichai testified under oath to US lawmakers that
search results are completely algorithmic with no human reranking of search
results. I wonder if they are still going to stick with this story.

~~~
JoBrad
Why wouldn’t they?

> their feedback doesn't change the ranking of the specific results they're
> reviewing; instead it is used to evaluate and improve algorithms in a way
> that applies to all results

