
Statement on Google’s conduct by founder of CelebrityNetWorth.com (2019) [pdf] - mosiuerbarso
https://docs.house.gov/meetings/JU/JU05/20190716/109793/HHRG-116-JU05-20190716-SD015.pdf
======
teddyh
The primary case made against Google here may describe questionable behavior,
but it might not be illegal, since simple numbers like a dollar amount can’t be
copyrighted, and AFAIK there is no other legal impediment to Google doing what
they did (IANAL).

What looks _far more_ inculpatory to me is this, on the last page:

[…] _Because it controls essentially the entire internet, Google has endless
levers at its disposal to significantly harm or snuff out a rival._

 _I don’t think it’s a coincidence that our organic rankings have continued to
suffer since I’ve become a vocal critic of their practices. Earlier this year,
within weeks of the publication of a Wired magazine article that included
quotes from me and a recap of our story, the CNW mobile app was banned from
the Google Play store without explanation or recourse. As a result we were
also banned from using the Google’s dominant Ad Mob mobile ad platform._

This is far more damning of Google abusing its monopoly, not only on search,
but also on the Android platform.

~~~
hluska
> This is far more damning of Google abusing its monopoly, not only on search,
> but also on the Android platform.

No, it’s really not. I just visited celebritynetworth.com. One article made
462 HTTP requests and loaded 9 MB of content, and although DOMContentLoaded
fired at 434 ms, it took a full minute for everything to finish loading.

The sheer amount of adware on this site is staggering.

Let’s put down the pitchforks and put on our critical thinking caps. Does
serving all that adware provide anything close to an acceptable user
experience? And if not, doesn’t it look more like this website simply fell
out of search engine favor as things like performance and UX began to trump
other factors?

~~~
ethanwillis
Maybe if they served up some of Google's AMP™ pages they would get higher
search rankings.

~~~
thefreeman
I know you are being sarcastic but it would result in a 10x better experience
for the user.

~~~
ethanwillis
The joke I'm trying to make is that even with a much nicer site design it
still wouldn't be ranked higher. It would be ranked higher only if AMP was
used... because of bias.

------
WalterGR
Google is incredibly shady. There’s been a manual penalty in place against my
website The Online Slang Dictionary
([http://onlineslangdictionary.com](http://onlineslangdictionary.com)) for the
better part of a decade. I know this because the data suggested it - and then
a Google employee confirmed it.

It would be easy to attribute this to the fact that Aaron Peckham, owner of
Urban Dictionary, worked at Google when Matt Cutts was head of the Web Spam
team, and that they knew each other. But that’s incredibly circumstantial. I do
know that when I confronted Cutts about the penalty here on HN he lied about
it, which does lead one to wonder about his motive.

I’ve written at length about the manual penalty here:
[http://onlineslangdictionary.com/pages/google-panda-penalty/](http://onlineslangdictionary.com/pages/google-panda-penalty/)

Edit 0825 Pacific: I have to work. I’ll respond to comments later. You can
also reach out directly - contact info is in my profile.

~~~
rozab
Do you have any evidence at all for these extraordinary claims? Why would
Urban Dictionary go to such lengths to suppress a competitor which posed no
threat[0]?

I also notice that the homepage hasn't been updated at all for at least 8
years[1]. I suspect this may have had something to do with its decline.

[0]:
[https://trends.google.com/trends/explore?date=2004-01-10%202...](https://trends.google.com/trends/explore?date=2004-01-10%202014-01-10&geo=US&q=online%20slang%20dictionary,urban%20dictionary)

[1]:
[https://web.archive.org/web/20120107121814/http://onlineslan...](https://web.archive.org/web/20120107121814/http://onlineslangdictionary.com/)

~~~
DanBC
You may want to read these posts:

[https://news.ycombinator.com/item?id=9977372](https://news.ycombinator.com/item?id=9977372)

[https://news.ycombinator.com/item?id=5419890](https://news.ycombinator.com/item?id=5419890)

~~~
ender7
I have to say, after reading those posts, Matt is looking more trustworthy
than the OP.

Especially considering the definitely-not-a-sock-puppet post by jimboykin [1],
an account that was created immediately after this thread and has but a single
post on HN.

[1]
[https://news.ycombinator.com/item?id=5476100](https://news.ycombinator.com/item?id=5476100)

~~~
sct202
I mean, part of the issue is that there is no feedback to the website owner,
as shown by the guy only getting a response here on HN.

Like, isn't it kind of insane that the top Google search engineer provided
customer service to OP here, but he couldn't get a response through a normal
channel?

------
mcintyre1994
> And what about those conjured celebrities I added as a precaution? All five
> were all scraped right into Google’s search result pages. It provided
> undeniable proof that after being turned down, Google simply went ahead and
> stole the entire database of content CNW took eight years and over a million
> dollars to build.

I remember a few years ago Google had a big blog post about how they’d
injected some fake search results to catch Bing scraping like this. It
certainly seemed like they thought it was a bad thing back then.

~~~
strgcmc
I worked at Microsoft from 2010-2012, and the crux of the issue there was
that, for Microsoft users who had the Bing toolbar installed, the toolbar
would track what you were visiting in your browser and use that
session/journey information to try to improve the relevance of Bing results.
If a lot of people who searched for "best pancake house" (regardless of which
search engine they used) ended up visiting ihop.com within the same browser
session, then Bing would want to rank ihop.com higher for that search
phrase.

Google noticed this, and for unique/low-traffic search terms, was able to
synthetically generate enough "fake" traffic that the "fake" traffic became
the dominating signal for those terms, and therefore Bing started directing
users to the fake results.
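A rough sketch of the clickstream mechanism described above, assuming a much-simplified data model: each browser session is reduced to one query plus the URLs visited afterwards, and a URL's score for a query is just the number of sessions in which the two co-occur. The function names, queries, and URLs are hypothetical illustrations, not Bing's actual ranking code. It shows why a rare query is easy to steer: a few dozen synthetic sessions outweigh the handful of organic ones.

```python
from collections import Counter, defaultdict

def build_click_signal(sessions):
    """Count how often each URL is visited after each query within a session.

    `sessions` is a list of (query, [visited_urls]) pairs. This shape is a
    hypothetical simplification, not the toolbar's real data format.
    """
    signal = defaultdict(Counter)
    for query, visited in sessions:
        for url in set(visited):  # count each URL at most once per session
            signal[query][url] += 1
    return signal

def rank_for(signal, query, k=3):
    """Return up to k URLs for a query, most frequently co-visited first."""
    return [url for url, _ in signal[query].most_common(k)]

RARE_QUERY = "qzxv obscure term"  # hypothetical low-traffic query

# A popular query with plenty of organic sessions.
sessions = [("best pancake house", ["ihop.com"])] * 50
sessions += [("best pancake house", ["dennys.com"])] * 10

# A rare query with only a couple of organic sessions...
sessions += [(RARE_QUERY, ["example.org/real-page"])] * 2
# ...swamped by synthetic sessions that all "visit" one planted page.
sessions += [(RARE_QUERY, ["example.com/planted-page"])] * 20

signal = build_click_signal(sessions)
print(rank_for(signal, "best pancake house"))
print(rank_for(signal, RARE_QUERY))
```

For a popular query the organic signal is large enough that a few fake sessions barely move it, which is why targeting unique, low-traffic terms makes the synthetic traffic the dominant signal.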

The bad thing here IMO is the level of tracking of users via this toolbar, but
fundamentally this seems about as bad as the digital fingerprinting and
advertiser tracking that has become common across the web. This is not to
excuse Microsoft for doing a bad thing, but it really had very little to do
with "scraping Google", which somehow became the popular media takeaway.

EDIT: a decent contemporaneous article in Wired:
[https://www.wired.com/2011/02/bing-copies-google/](https://www.wired.com/2011/02/bing-copies-google/),
mentioning how Microsoft was using the clickstream data from the
browser/toolbar, not scraping Google results per se.

~~~
whalabi
That's the first I've heard of that side of the story, and I'm pretty appalled,
because everyone I know believes "Bing scraped Google", which I think was how
Google framed it.

I wonder how widespread misunderstandings like this can be prevented or
corrected.

------
throwaway189262
And yet I constantly get CAPTCHAs just from blocking tracking cookies.

The biggest scraper of them all goes to incredible lengths to prevent
scraping. How ironic.

I hope the US govt demolishes these monopolies. It's not just on the web,
either; the closest historical precedent I can think of is the Robber Barons
of the Gilded Age.

~~~
Google234
There’s a thing called robots.txt.

~~~
xondono
Yes, and Google has been observed several times ignoring the robots.txt file.

~~~
Google234
No, in general that isn’t true. Maybe there have been some bugs (I would be
surprised; can you link these multiple examples?), but Google respects
robots.txt files.
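For reference, this is roughly what honoring robots.txt means for a crawler, sketched with Python's standard-library `urllib.robotparser`. Both robots.txt files below are hypothetical examples, not CNW's real file, and the bot names are placeholders. Note that robots.txt governs crawling only; it says nothing about how already-fetched content may be reused in an info box. The stdlib parser also applies the first matching rule, which is simpler than the longest-match precedence Google documents, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything over the network."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

# Hypothetical file with no restrictions: an empty Disallow allows everything.
permissive = parser_for("User-agent: *\nDisallow:\n")

# Hypothetical file that keeps Googlebot out of /private/ and blocks all other bots.
restrictive = parser_for(
    "User-agent: Googlebot\n"
    "Disallow: /private/\n"
    "\n"
    "User-agent: *\n"
    "Disallow: /\n"
)

url = "https://example.com/richest-celebs/some-page/"
print(permissive.can_fetch("Googlebot", url))
print(restrictive.can_fetch("Googlebot", url))
print(restrictive.can_fetch("Googlebot", "https://example.com/private/data"))
print(restrictive.can_fetch("OtherBot", url))
```

A crawler that "respects" the file simply calls a check like `can_fetch` before each request and skips URLs that return False.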

------
alex_young
I searched for a couple of celebrities, and in one case celebritynetworth.com
is the top result, in the other it is the third result.

Both queries show info boxes with the estimated net worth, both cite other
pages (with no link to celebritynetworth.com), and one showed a different
number from celebritynetworth.com.

celebritynetworth.com has a robots.txt. This file has no restriction on Google
or any other bot.

Each celebrity page seems to have the following meta tag:

    <meta name=googlebot content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">

Given the above, it seems like Google is correctly listing
celebritynetworth.com high in its rankings and using other sources for the
info box, which is what celebritynetworth.com is asking them to do.
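A minimal sketch of reading the directives in that meta tag, assuming a simple comma-separated `name` or `name:value` format; `parse_robots_meta` is a hypothetical helper, not Google's parser. Per Google's robots meta tag documentation, `max-snippet:-1` and `max-video-preview:-1` mean no length limit and `max-image-preview:large` permits large image previews, so the tag as written places no restriction on snippets.

```python
def parse_robots_meta(content: str) -> dict:
    """Split a robots/googlebot meta `content` value into {directive: value}.

    Bare directives such as "index" map to True; "name:value" directives keep
    their value as a string. A simplified, hypothetical reading of the format.
    """
    directives = {}
    for token in content.split(","):
        token = token.strip()
        if not token:
            continue
        # Split on the first colon only, so values may themselves contain colons.
        name, sep, value = token.partition(":")
        directives[name.strip().lower()] = value.strip() if sep else True
    return directives

content = "index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1"
d = parse_robots_meta(content)

snippet_limit = int(d["max-snippet"])  # -1 is documented as "no limit"
print(d["index"], d["follow"])
print("unlimited" if snippet_limit == -1 else snippet_limit)
print(d["max-image-preview"])
print("nosnippet" in d)  # no opt-out from snippets is present
```

A site that wanted to keep its text out of snippets would need something like `nosnippet` or a small `max-snippet` value instead.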

While it's hard to speculate about what happened in the past, it looks like
Google is doing the correct thing now, and that this information isn't
actually so novel in general.

------
tomduncalf
"The Featured Snippet also incorporated images of the celebrity scraped from
the web to create a widget that took up 40% of a desktop result page and 80%
of a mobile result page. This is still how Google displays most net worth
results today"

FWIW I tried "larry david net worth" and "craig david net worth" on Google
from the UK and didn't get this result. Larry David was a snippet from
Wikipedia, clearly attributed, and Craig David was from CNW, again clearly
attributed: basically an oversize search result with the relevant bit
highlighted. Not sure if this is a recent change in response to this or not.

------
gundmc
Out of curiosity, I tried a number of "$X net worth" queries to see how it
looks today: Snoop Dogg, Mike Trout, Will Smith.

For each I got an infobox sourced from Wikipedia or a news article and
Celebrity Net Worth was somewhere in the top 1-3 organic results.

~~~
philprx
When you consider the impact/cost of an anti-trust case vs. the ease for
Google to change rankings so that the evidence site (CNW) is back in the top 3
results...

Of course you would do that if you were Google... And here there's nothing
illegal, it would just defuse more journalistic bombs for Google. Smart move.

------
jarym
I only see one solution: make Google split the search business out so that it
only provides search and nothing else. That was the stated 'point' of Google's
own site, and they should be held to it.

They are like a bad rash, just ripping off from businesses of all sizes and
relying on their dominance in search and financial muscle to get away with it.

Siphoning off search into its own unit would defang Google's ability to use
its search engine dominance to upend other companies' business models, and
would also reduce the incentive to do shady things like this.

~~~
mikece
Another solution is to make Google pay the estimated damages caused to the
website (I think it's reasonable to extrapolate growth for a few more years),
as well as massive punitive damages because of the callous and deliberate
manner in which they contacted the site owner AND TOLD HIM THEY WOULD TAKE
WHAT THEY WANTED. I hope Google is forced to pay CNW no less than one billion
dollars for this obvious abuse of its position and theft of CNW's content.

------
redm
I think this quote is the most interesting in the testimony:

 _In June 2019, search engine analyst Rand Fishkin put together a report about
Google using data from web analytics firm Jumpshot. The data show that today
an estimated 48.96% of all Google searches end with the searcher NOT clicking
through to a website. The same report estimates that 7% of all search clicks
go to a paid ad result and 12% go to properties owned by Google’s parent
company Alphabet. Moreover, those stats do not even show the full extent of
the problem because the data largely relied upon desktop devices and could not
track searches that took users to a Google-owned app like the YouTube or
Google Maps._

------
praveen9920
I'm not sure if this is an ethical and legal problem. But one thing is for
sure: Google has too much leverage, both technically and financially.

Google has slowly increased consumers' reliance on it everywhere, to the point
that it is essentially a monopoly, which is bad for the market. It is about
time a new player came into the picture with the same user interface and a
different business model that can empower these internet entrepreneurs.

~~~
techntoke
The same could be said for Microsoft, Amazon, Apple, etc. At least Google open
sources its operating system and browser, and is mostly responsible for
Kubernetes. If people are going to have such high expectations of one company,
why do they not apply the same judgement equally?

~~~
throwaway2048
There are similar problems with all those companies too; not sure why you
think that is some kind of defence of Google.

~~~
techntoke
Because those companies are ignored while everyone jumps aboard the Google
hate train, and if anything is going to be done to fix it then those changes
need to be applied equally to all companies at the same time. Otherwise it
does more harm than good.

~~~
throwaway2048
People complain about the actions of all those companies quite regularly,
Google isn't a victim here.

------
tyingq
Doesn't this behavior eventually hurt Google? If there's no incentive for
third parties to collate, verify, and organize facts...eventually there isn't
anyone to scrape. And the facts go stale.

~~~
remus
It would depend on how the websites Google collects data from operate. For
example, somewhere like Wikipedia isn't harmed by Google's re-use of the facts
they collate.

~~~
ubercow13
They are if they get fewer visitors to see their donation requests, stop
making enough money, and shut down.

------
tehabe
An odd witness for the very real issues people have with Google Search. Google
is crediting all websites it uses information from. It might be sad for such a
site, but I wonder how the creator thought this was sustainable. Sites with
biographical data for famous people already existed, and for them adding net
worth information is easy. So if Google hadn't killed them, they might have
been killed by IMDb or Wikipedia.

~~~
ehnto
According to the document, the net worth figures were unique, researched data
that has yet to be accurately replicated. In other words, the company created
the data; they did not simply display it. If they were sunk by IMDb copying
their data, then the data source goes away too, unless IMDb invests in a
research team.

What happens when CNW goes offline, how will Google replace the now missing
data? Remembering that celebrities constantly change net worth and new ones
show up every week, so it's a moving target that someone needs to chase.

------
dustingetz
Surely they can find a stronger example than a spammy SEO business that didn't
exist before Google, couldn't exist without it, and doesn't exist after it?
Google sucks now, but I'm not finding this compelling.

~~~
calcifer
Why would the nature of the business be relevant to a House committee hearing?
CNW is not on trial here.

------
treszkai
Information without DRM can easily become a public good: it can be copied
without limit and without loss of quality. Once it's out in the open, people
can access it more easily, but that also makes it near-impossible for the
information's owner to make a living. At the beginning, Google was beneficial
to both the information-seekers and the information-owners because it
connected them (while showing ads in the search results). With the "featured
snippet" it improved the user experience a little (multiplied across the whole
user count), while depriving the original owner of the information of its
entire revenue. (Much like how Spotify made music cheaper for everyone, at the
expense of artists.) The nice way to solve this would be for Google to give a
cut to CelebrityNetWorth, as Spotify does to artists. (While Google would
bring CNW more requests this way, their price would also be depressed; Spotify
doesn't pave the way to riches. [1])

[1]: [https://www.digitalmusicnews.com/2018/04/12/streaming-music-...](https://www.digitalmusicnews.com/2018/04/12/streaming-music-services-rates-2018/)

------
manish_gill
I have zero respect for the Google engineers and PMs who worked on this
project: scraping content without attribution to the original source, and the
subsequent change in the original source's search ranking.

A very bad taste in the mouth indeed.

~~~
dimsumsumdim
I am not trying to defend Google or its practices. But how is the activity of
CNW researchers different from what Google is doing? Unless the author
elaborates on their method, "educated guess" sounds like scraping content
without crediting the source in the first place.

~~~
mrtksn
AFAIK there's no other database that they scraped from. Reading sources like
news articles, documents, etc. is very different from copying a database that
someone else created.

It's like the difference between tapping Yelp's API to create a competing
service and looking at Yelp, Google, Squarespace, etc. to create a restaurant
recommendation website.

~~~
dimsumsumdim
Humans can also scrape. If CNW employs researchers, they must be using sources.
These sources are not referenced in their content. Either CNW is merely making
up the net worth figures, or their researchers are pulling data from somewhere
to provide information about 25,000+ celebrities. If the second is valid, they
aren't that different from what Google is doing.

~~~
mrtksn
I answered your question about how CNW researchers are different. Using
sources to create a knowledge base for something is very different than
hooking up to a knowledge base that was already curated and displaying it as
your own without compensating the people who curated it.

Yes, sure, it would be nice of them to disclose their own sources too, but I
fail to see how this is related to Google straight-out copying their curated
database and showing it to its own users.

Like, why are we discussing it? What is the agenda here? Are we trying to
conclude that CNW's owners are also not that nice? Are you claiming that CNW
copied your own research and demanding justice too? If you explain what your
point is, maybe we can have a productive discussion instead of repeating the
same stuff.

------
errantmind
It looks more like an ethical argument than a legal one. I don't understand
how this data would be protected under US law. Can someone explain if you
think otherwise?

Is the issue that the data was scraped? Or is the issue Google appropriated
the scraped data for use in their search results?

From my perspective it just looks like a bad business model: spend a lot of
time and effort estimating net worth, and then publish it on the open web.
Publicly available data can be freely and legally scraped (and I wouldn't want
it any other way): [https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...](https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data)

~~~
mike_d
This has been settled case law for decades thanks to sports almanacs. Facts
(like someone's net worth) cannot be copyrighted. However, a compilation of
facts can be.

If CNW had published a book, Google purchased it off of Amazon, and paid an
intern to retype it in to fact boxes - that is clear copyright violation.

The fact that Google bought the data from a third party data broker who
scraped CNW does not absolve them of liability in my mind.

~~~
JimDabell
> Facts (like someone's net worth) cannot be copyrighted. However, a
> compilation of facts can be.

It’s my understanding that this ceased to be true in the USA in 1991 and that
a minimal degree of creativity is required [0]. Other countries have started
following suit more recently.

[0]
[https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R...](https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._Rural_Telephone_Service_Co).

~~~
arcticfox
IMO there's a huge difference between a list of names and numbers generated by
a research team estimating someone's net worth, and a phone book.

One is a list of informed estimates, the other is a list of (essentially)
facts.

Informed guesses are IMO essentially much longer pieces of creative content,
just condensed into a number.

------
dunnock
You wouldn't expect such a big business to behave any differently. What we
consumers can do is try to use alternative services where we can; e.g., I am
happy using Firefox instead of Chrome.

~~~
jtxx
same + duckduckgo & protonmail

------
not2b
Many commenters are interpreting this as strictly a matter of copyright
violation, and are asking the question of whether facts can be copyrighted.
But that's too narrow a way of looking at this; there's also antitrust law. If
Google scrapes everything interesting off of other sites and presents it as
their own, even if it finds a way to do this that does not violate copyright,
it's anti-competitive behavior that may violate antitrust law.

(I'm not addressing the European database protection rules which may also
apply).

------
rjkennedy98
> Earlier this year, within weeks of the publication of a Wired magazine
> article that included quotes from me and a recap of our story, the CNW
> mobile app was banned from the Google Play store without explanation or
> recourse

A lot of talk here about the scraping and capturing of the ad revenue, but I
thought this was the most scary part. It’s one thing to capture ad revenue
from SEO companies, it’s another thing to act like the mob and start taking
retribution against anyone that speaks out against you.

~~~
Wheaties466
I think that's the point of a monopoly. They have such a large overarching
reach that they don't need to respond in a direct way that could be proved to
be retribution. They respond in another avenue that affects your business in
another platform they control, in ways they don't have to explain.

------
iJohnDoe
I think many would be surprised at the trivial and childish changes that
Google makes to hurt websites.

Many don’t speak out for fear of being punished further by Google.

------
dmcbrayer
There's evidence of Google engaging in similar behavior with regard to song
lyrics on Genius.com: [https://www.rollingstone.com/music/music-news/genius-google-...](https://www.rollingstone.com/music/music-news/genius-google-stole-lyrics-morse-code-848781/)

I think the points about people voluntarily building their businesses on
Google's platform, and being thus beholden to Google's whims are well-made.
The law, as it exists, doesn't seem to prevent that kind of thing since
they're all private businesses, and this behavior benefits the consumer
(arguably) even if it does hurt other businesses. Antitrust law (in the US)
seems ill-prepared for the 21st century.

I tend to get interested in the policy concerns. Is web search so important
that it should be considered a public utility, and thus regulated by the
government? I think there's a case to be made there.

------
DSingularity
Wow! Holy smokes. What a terrible company.

Oddly enough, since their practices may destroy the internet, perhaps they
will eventually destroy their own livelihood.

Speaking for myself, I no longer search for product reviews and comparisons.
Why waste my time when all the content is fake? Probably fake due to the
Google search ranking tweaks. Now I go straight to the sellers.

What is sad in all this is all the great people that work for such shitty
companies and what that implies regarding the state of our morality as
computer scientists and engineers.

------
dave_aiello
The impression I got from reading comments here is that CelebrityNetWorth.com
isn't much more than a compilation of net worth figures in a database. I don't
think this is a fair analysis of the holistic value of that website.

For instance, when I look at the analysis of Tim Cook,
[https://www.celebritynetworth.com/richest-businessmen/ceos/t...](https://www.celebritynetworth.com/richest-businessmen/ceos/tim-cook-net-wrth/),
who's in the news this week partly because Bloomberg reported that his net
worth has reached $1 billion,
[https://www.bloomberg.com/news/articles/2020-08-10/apple-s-c...](https://www.bloomberg.com/news/articles/2020-08-10/apple-s-cook-becomes-billionaire-via-the-less-traveled-ceo-route),
I see the value of CNW's news articles. The CNW news article has a similar
value to the analysis that's present in the Bloomberg article reporting on the
same thing.

In terms of how much of the intrinsic value of CelebrityNetWorth's website
Google should be allowed to report directly in its search results, I'd say it
would be fair to report CelebrityNetWorth's estimate of a celebrity's net
worth so long as Google clearly attributed the data to the site that is their
basis, and provided a link to the original information.

But in my opinion, the real value of CelebrityNetWorth.com in 2020 is the
curated biographical content about the celebrities and their analytical pieces
presented in context with their net worth estimates.

------
havelhovel
People are saying this was a bad business model, but looking at the numbers,
it clearly wasn’t, unless we’re also acknowledging that the bad part of the
model was relying on Google to stay true to its stated policies.

------
asah
Isn't this just a question of scraping published data? How is this materially
different from hiQ v. LinkedIn?

~~~
renlo
IANAL, but IIRC it was alleged in the lawsuit that LinkedIn was making a
product competing with hiQ's. It also seems related to Associated Press v.
Meltwater U.S. Holdings, where the AP had developed a network to investigate
and disseminate breaking news, and another company was copying parts of that
news under "fair use", which was deemed infringement [1].

[1]
[https://en.wikipedia.org/wiki/Associated_Press_v._Meltwater_...](https://en.wikipedia.org/wiki/Associated_Press_v._Meltwater_U.S._Holdings,_Inc).

------
60secz
No moat? Don't be surprised when someone destroys your business model.

------
philprx
"Don't be evil" ;-)

Is it still Google's motto?

~~~
InfinityByTen
They deprecated that years ago.

~~~
EdwardDiego
Gotta say, I preferred Google v1.0. The subsequent patches suck.

~~~
techntoke
Android, Chromium and Kubernetes have come a long way. Plus there are a lot of
other major open source projects that come out of Google. Way more than
Microsoft, Apple and Amazon.

~~~
tester34
>Plus there are a lot of other major open source projects that come out of
Google. Way more than Microsoft, Apple and Amazon.

Is that even true? MS does a lot of open source.

~~~
techntoke
Yes, if you compare the code base size and the core components. Microsoft
still hasn't open sourced its main consumer operating system, and neither has
Apple.

~~~
perseusmandate
Not a valid comparison since those are core products of both companies. Google
hasn't and never will open source Search.

~~~
techntoke
Android is one of Google's core products, as well as web (including Chromium).

~~~
andrekandre
Chromium is open source because WebKit (KHTML) was open source.

Many useful parts of Android are closed, and the open source versions of most
apps are languishing (calendar, contacts, dialer, etc.).

~~~
techntoke
Have you checked out F-Droid lately, and how does this compare to Apple or
Microsoft?

------
renewiltord
Information wants to be free, man.

~~~
philprx
Yes, that's OK if it's symmetrical: why can't we easily automate access to
Google search results?

It's the asymmetry here that is the problem, not Information Freedom.

And I'm pretty sure this is the line/dark pattern Google is hoping for: that
people give credit to Google for "Information Freedom"... missing the real
issue.

------
j88439h84
> Are everyday consumers harmed by Google’s practices YES

How are consumers harmed by Google presenting extracted data in an infobox?
Isn't it more convenient?

~~~
dunnock
I think the author's point, as explained in the article, is that such an
approach undermines small businesses that provide informational services. They
won't be able to hire data analysts and researchers anymore, so the quality of
information in info boxes will decline as it gets outdated. In other words,
Google is taking data, presenting it as its own, and keeping all the benefit
of the ads shown alongside it.

------
ardy42
> When pressed, the Google team said it would be good exposure for our brand.

Offering to pay people in exposure is a really slimy attempt at exploitation,
full stop.

------
lalaland1125
I think Google is in the right here. Facts are not protected by copyright and
they shouldn't be. Imagine being able to copyright the density of steel or the
list of historical US presidents. It would be a disaster.

~~~
onion2k
Some things _are_ facts, like the density of steel or the list of historical
US presidents. But other things _aren't_ facts, and where something isn't a
fact but is instead implicit knowledge derived from a set of both data and
assumptions (like a model or an algorithm), Google should absolutely not be
taking that data and treating it as their own. Given that the PDF includes the
quote "_I decided to build a company that attempted to make educated guesses
based on the sourcing of a full-time research team._", I think we can assume
that the data on CelebrityNetWorth wasn't "facts".

However, _even if CelebrityNetWorth's data were facts_, I don't believe
Google should be allowed to replicate the work that people have put into
building a website that lists facts. Google could, and eventually probably
will, replace every fact based website with a sidebar in their results page.
People can't compete with that. That's using Google's (de facto) search
results monopoly to _stop people searching the web_. That's making the web
worse. That's stifling innovation. I might be able to dream up some way of
using CelebrityNetWorth data to improve the world (somehow...), and Google
would be stopping that.

The engineers who work at Google need to stop ruining the web by trying to
turn it into their own walled garden.

~~~
doctor_eval
This is interesting. The final outcome could be that CNW shuts down. Then, the
data won’t be updated. So then google will have to drop the info box. And then
someone will come along with an idea to track the net worth of celebrities...

Has Google set up some kind of hysteresis that results in the long term,
repeated building and subsequent crushing of data aggregation companies?

~~~
erikbye
> Then, the data won’t be updated. So then google will have to drop the info
> box

Google won't drop the info box, they don't care if the information is
accurate.

------
skinkestek
The scariest thing for Google might be that generally people aren't cheering
for them anymore, and many are following them closely to catch them as soon as
possible.

Compare that to 10 years ago when I and others would cut them 9 yards of
slack.

------
sjg007
I am not sure why copyright law doesn't apply. Also, CelebrityNetWorth should
create a trademarked "net worth" figure, which is obviously an approximation of
the true net worth. That's different from a sports fact. But still, I don't
see why copyright law doesn't apply if Google is directly sourcing material.

------
asah
Anti-trust q: how is celebrity net worth "your" business if ~100% of your
users came from Google in the first place?

Seems to me, you're entitled to lock up the data behind a paywall, and Google
is entitled to not send you its users. Google's users can switch search
engines, and some do; e.g., I use DuckDuckGo for incognito.

More to the point, how would regulating Google change any of this? Who would
determine a fair price for every piece of information?

My sense: to the extent you don't care about the privacy implications of CNW,
they're a legit value-added service. The exact economic value is somewhere
between $0.01 per year and $100B. The free market can be cruel, but it's the
best mechanism we have for price optimization.

Personally, I'd have pushed back on Google researchers and told them that we
know our content is unique, and if they want accurate results, to consider
entering into a legit partnership... for example, publish N year old data for
free but require captcha/ToS (excluding search engines) for current data.
Individual users may not care, but search engines will want the fresh stuff as
a competitive matter.

Of course, this all assumes there isn't a second CNW competitor with
good-enough data, offering it for less than CNW... and if less = free, then too
bad for CNW...

This to me is the weakest link in the CNW business: it's too easy to gather
good-enough data cheaply, and there's not enough value beyond the basic number
(e.g. the breakdown of Celebrity X's net worth, net worth over time, etc.).
There's very little market for this premium content.

Put another way, I could create a website called "What's the current time in
Minsk?" but its economic value would be closer to $0.01/yr because it's
undifferentiated and there's no premium value, so search engines can and
should just display the time.

Now let's take weather: in some places the weather is pretty easy (see L.A.
Story, with Steve Martin) and in others (cough NYC cough) the free feeds like
NWS are crap and you never know whether to pack an umbrella. Accurate weather
for NYC would be amazing value to millions of people and highly
differentiated. The problem is that this data is super hard to get; even
current data are often wrong! But if by some miracle you had this data, no way
would I publish it. Instead, show that you've got the goods and license it.
Google might play cheapskate, but the weather apps would pay up, and probably
Uber/Lyft, the City of New York, etc. Once you've got a few customers, then
approach Google: this data would be useful across the enterprise. You'd make a
killing. But oh yeah, it's really freaking hard (unlike CNW).

------
visarga
I read the statement and think Google used and then abused their service,
but at the same time, publishing a database of people's net worth strikes me
as pretty unethical. Would they have been able to handle a GDPR request?
Why didn't they put their own and their families' info up there first?

~~~
gberger
In Sweden, Finland, and Norway, everyone's income tax records are public. Is
that unethical?

~~~
markalexander
"Ethical" is a bit of a broad term, but I'm sure it's easy to construct
situations that are problematic.

E.g. an abusive ex-partner can check whether you're likely to have switched
jobs, even from the most basic info. If tax paid is revealed, then you can work
out what government benefits a person might be on, including those relating to
health and disability.

Isn’t the implementation in Norway such that the press have special access not
afforded to individuals? That also seems pretty questionable.

~~~
Phaedor
Everyone has access, but you have to log in with your ID, and the people you
check can see that you checked them. You can check 500 people each month.

Your scenario seems pretty far-fetched to me, since this information is
published in October for the previous year. It contains the income and tax for
the whole year only. It would be pretty difficult to infer the information you
suggest from such limited, delayed records.

As a Norwegian I fully support our system with public tax records.

~~~
markalexander
I’m not sure how delayed records would stop you working out disability status
or any other sensitive status that affects tax paid. The tax regime for any
given year is presumably public info.

People are notified but that’s hardly going to stop someone seeking to upset
you as in the example scenario.

Though, as I understand it, the press can do mass lookups with no notification.
Don't you find that disparity concerning? The idea that the government gets to
decide what counts as ‘press’ is already something people from countries that
don’t operate that way would consider objectionable. Let alone giving them the
power to decide who should be 'outed' for their tax affairs.

To be fair, that's perhaps more an issue of this particular implementation,
rather than fundamental. But it's an example of how it can be used to entrench
the establishment.

~~~
Phaedor
Everyone has pretty much unlimited access in practice, so I don't see that the
press has any special advantage. The press has guidelines saying it should
only publish the tax records of public figures, and I have not seen them abuse
this so far.

We are a country with a high degree of social trust, and that is what I
believe makes our country great. I realize that our system is not for everyone
and that is fair.

~~~
markalexander
Sure, I've spent some time in Norway, I get it. It's definitely very different
to societies like the UK or US.

I'm not fundamentally against open tax records like this, but even in Norway
there are bad actors and if we are going to play a game of 'how can this be
exploited', I think it's definitely possible to do so.

I can't really agree that a 500 limit with notification is 'in practice' the
same as no limit with no notification. Data mining and other techniques are
possible at scale but not with only 500 records. It also makes exploitation
more profitable. Harvesting data for e.g. commercial targeting is not very
appealing for n = 500. It's definitely appealing at n = 5 million.

Do the press abuse it in this way? Well, they don't publish stories about
doing it. But how would you really know otherwise until some scandal breaks?
Are lookups by press orgs also open to public inspection? Maybe you have some
freedom of information laws that would let you request that data.

~~~
Phaedor
Sure, almost any system can be exploited so at some point we just have to
compare the advantages with the disadvantages. I am not aware of any abuse of
this system. Everyone knows that their tax and income is pretty much public
knowledge so they will live their life by that assumption.

------
system2
I felt sad reading the whole thing. I searched "will smith net worth" and,
after the Wikipedia snippet, the first result was Celebrity Net Worth.

CNW depends on Google and uses it to make money. Google is a search tool and
wants to show whatever results make users happy. CNW can simply remove itself
from the platform and tell Google never to use its data. It almost sounds
like CNW is trying to bite the hand of the only company that ever made them
anything. I feel bad for CNW, but this is natural. I've seen many of my own and
my clients' sites go rock bottom because of Google updates.

"Google Stole My Business" sounds harsh. It's more like "Google doesn't do what
I want them to do." CNW was nothing before Google. Maybe CNW could make
technical changes to their content, the way wallpaper/graphics sites made
changes after Google Image Search updates.

~~~
huffmsa
Is "Google Takes My Data and Presents It as Their Own" a better title for you?

~~~
system2
No, of course not. My point is that Google will change things as they did in
the past. The way Google snippets work is simple: they scan the data and sum
it up.

I just searched several queries. For "bill gates net worth", the data was
taken from Wikipedia, and CNW didn't even appear on the first page.

"Beyonce net worth" showed a detailed snippet from Business Insider, and the
second link was CNW. Same for Eminem: $210M from groovewallet, and the second
result was CNW.

I just couldn't find what CNW is complaining about.

~~~
huffmsa
And the Wikipedia page cites
[https://www.forbes.com/profile/bill-gates/?list=billionaires...](https://www.forbes.com/profile/bill-gates/?list=billionaires#46f7f1c3689f)
as the source for Bill Gates' net worth. Why can a free, community-edited
website successfully attribute its sources but a trillion-dollar company
cannot?

~~~
system2
Can you or anyone show me an example? I searched over ten celebrities and only
found CNW as either the 2nd or 3rd result. All the snippets were giving full
credit to the sites.

~~~
huffmsa
CNW's statement was about 2012 to the present. The damage is done.

I distinctly remember even recently that Google wouldn't provide attribution
in the snippet.

Though now, it sure seems like the #1 results for most of my "{celebrity} net
worth" queries are AMP (AKA Google approved™) pages.

