EU antitrust regulators say they are investigating Google's data collection (reuters.com)
217 points by PretzelFisch 4 days ago | 110 comments





I've always been amazed that very few organizations question why Google Analytics is free. Sites are literally handing Google a list of targeted visitors their competitors can advertise to.

Just to be super clear: Google Analytics is not free. The entry version, which limits the visitors/visits it can log per day, is free; you then have to pay to get the full version.

Delivering a quality free product to get companies to sign up for the expensive one when their needs increase is a very common marketing strategy, and it lets the vendor keep control of the market despite being vastly overpriced. It works for desktop apps, mobile apps, web services, ... I say this as someone who once had to sign up for the full version for a web property I was working at.

This is a matter where people are quick to overreact, so let's be super clear: I am NOT saying the limit is placed at a reasonable level, nor that companies absolutely need to have the paid version and can't extrapolate enough from the free version data (they absolutely can), nor that Google isn't majorly benefitting in ways other than full version sales (they do).

What I'm saying is that "the base version is free and the full version is paid" is a very common thing that doesn't by itself mean anything nefarious is taking place.


"that doesn't by itself mean anything nefarious is taking place"

I don't believe that Google is intentionally doing nefarious things either, but that doesn't mean we shouldn't subject them to close scrutiny over their data collection practices. Whether Google Analytics (GA) is free or not is beside the point. The free version of GA may limit what GA users see, but that doesn't stop Google from capturing far more data than they expose.

This is a company that tracks users on an industrial-size scale that no other online company can match. And yet despite that, most developers are more likely to rush to Google's defence rather than question those data collection practices. (Does a multi-billion company with an army of lawyers need developers to defend it?)

I've said this many times: the hypocrisy that runs through the programming profession when it comes to online tracking really knows no end.


> this is a company that tracks users on an industrial-size scale that no other online company can match. And yet despite that, most developers are more likely to rush to Google's defence rather than question those data collection practices.

If that is what you read in my message, then you are projecting what you want to see on what I actually said. Nothing I said defends Google on that front.

EVERYTHING in your post is completely out of scope of what I said in answer to the parent's post. What I said is that just because some company does something wrong doesn't mean everything that company does is wrong. Or in this case: while Google is obviously siphoning data on a gigantic scale, an analytics basic tier being free isn't, by itself, an element of wrongdoing.

If anything, your post proves my point: when people believe they know the end result, they are quick to make up the narrative in between to reach it, even if they need to throw the baby out with the bathwater too.


Exactly, the context is clearly not related to marketing.

I wonder if Google uses the data tracked on the pay version for ad targeting also. I bet they do.

I suppose that would be up to the phrasing of the paid version contract. I'm betting even if it says no there is wriggle room, especially for that important 3 months fresh data.

> the entry version with a limit on the visitors/visits it can log per day is free

That covers limits on usable data by the user, does it also mean that Google only receives that amount of information?


That's not accurate. Lots of organizations ended up using Omniture for one reason only: it's not owned by Google.

The difficulty in removing google analytics is the main reason I decided not to set up a Freeciv web server.

IMO that’s a major part of a mechanized pathological attack on the public mind and exposing my friends to that feels wrong.


> Google has said it uses data to better its services and that users can manage, delete and transfer their data at any time.

No, users cannot delete their personal data, unless they have a Google account. Google collects an enormous amount of personal data from people without a Google account, and offers no tools for controlling that data.


Even with a Google account, a great deal of data like analytics is not associated or partly associated with the Google account.

In theory one would have to go to every website and request they report/purge data from analytics, but Google provides no mechanism for websites to comply with such a request.


I don't think one can request that even in theory. If the data is not personally identifiable, then it is not owned by you (at least in my understanding of GDPR and similar privacy regulations). "yabadadoes (or their IP) visited the site" is your data. "someone from yabadabadoes country visited the site" is not your data and it's not up to you to say what the site owner does with that information.

A user has clientids and/or userids and Google seems to be capable of connecting these back to learn things about users and sites. So what exactly is personally identifiable is a secret.

As I see it, Google has fallen into the false position that agreeing not to do things based on user settings is the same as not collecting and storing the data necessary to do them. I don't think regulators will see this Google's way.


If the analytics records aren't associated with those IDs, and can't be reidentified due to protections like k-anonymity, I don't think it's relevant that Google has other records that are subject to GDPR and are handled differently.
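For readers unfamiliar with the term, here is a minimal sketch of what a k-anonymity check means: a dataset is k-anonymous over a set of quasi-identifier fields if every combination of values in those fields is shared by at least k records. The field names and rows below are hypothetical and say nothing about how Google actually stores or protects analytics data.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k of the given records."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical analytics rows, invented for illustration only.
visits = [
    {"country": "FR", "browser": "Firefox", "page": "/a"},
    {"country": "FR", "browser": "Firefox", "page": "/b"},
    {"country": "DE", "browser": "Chrome", "page": "/a"},
    {"country": "DE", "browser": "Safari", "page": "/c"},
]

# Grouping by country alone puts every visitor in a group of two.
country_only = is_k_anonymous(visits, ["country"], 2)
# Adding the browser leaves the two DE visitors alone in their groups.
country_and_browser = is_k_anonymous(visits, ["country", "browser"], 2)
```

The more fields that count as quasi-identifiers, the smaller the groups get, which is why the question of what is "personally identifiable" matters so much here.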

Google asks visitors to agree to personal data collection the first time they visit Google Search: https://i.imgur.com/jRl3ui0.png

Some of this data collection can be turned off, though it appears not for all data categories across all Google services. Visitors are not offered a clear way to access, export, or even delete all personal data collected on them by Google.

The way several Google services are collecting personal data is in breach of GDPR.

https://gdpr-info.eu/chapter-3/


> Google collects an enormous amount of personal data from people without a Google account

Can you provide an example?



I hope part of this investigation revolves around AMP and Google's tracking and analytics on AMP sites...

AMP is so annoying. Just link to the sites; there's no need to track (can't they just do it with JS?). And then there's the number of people who post AMP links to Reddit or Facebook because they copied the results link. The whole thing is infuriating.


That's a great initiative, though we need a solution that does not load AMP pages from Google before redirecting to the real site.

Changing Google's SERP inline I guess?
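A rough sketch of the kind of rewrite being discussed: given a Google AMP viewer link, recover the publisher URL without fetching anything from Google. This assumes the commonly seen `https://www.google.com/amp/s/<host>/<path>` form, where `/amp/s/` indicates an HTTPS origin and `/amp/` a plain HTTP one; the function name is made up, and the recovered page may still be the publisher's own AMP version.

```python
from urllib.parse import urlsplit


def strip_google_amp_prefix(url):
    """Return the publisher URL embedded in a Google AMP viewer link.

    URLs that do not match the assumed pattern are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not (host == "google.com" or host.endswith(".google.com")):
        return url

    if parts.path.startswith("/amp/s/"):
        scheme, rest = "https", parts.path[len("/amp/s/"):]
    elif parts.path.startswith("/amp/"):
        scheme, rest = "http", parts.path[len("/amp/"):]
    else:
        return url

    if not rest:
        return url

    target = f"{scheme}://{rest}"
    if parts.query:
        # The query string is passed through unchanged.
        target += "?" + parts.query
    return target


cleaned = strip_google_amp_prefix("https://www.google.com/amp/s/example.com/story")
```

A browser extension or user script could apply something like this to links on the results page, which is roughly what "changing the SERP inline" would amount to.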

Reddit and Facebook are relatively well indexed on DuckDuckGo. Do yourself a favor and change your default search engine (just use !g when you need it).

They're saying that other people link to AMP on social media. That doesn't help their issue.

I was forever frustrated with my iPhone and Safari when I would search for something and a Reddit link would come up with the result I wanted to see. I'd click it, the AMP link would open, and at the bottom of the screen I would get a pop-up asking me to download the Reddit app, which is impossible to close without actually going to the Reddit site, which is what I wanted in the first place. The end result: after being frustrated for long enough, I finally went into settings and set Safari search to use DuckDuckGo. I hope AMP dies.

Wait, amp does that, I thought Reddit itself was the culprit!

Reddit is the culprit, amp pages don't inherently have such high level logic as this built into them. Reddit chose to make their amp pages behave this way.

I like AMP. It's at least 10 times faster than most news sites.

It's still useful for bypassing paywalls. I don't care if media orgs like nytimes use it, I care when the average joe blog or website uses it and it becomes annoying.


This doesn't make any sense. If anything, AMP would be something Google would use to try to prove they are not being anticompetitive. Instead of making publishers integrate directly with them like Apple News and Facebook Instant Articles, they ask publishers to publish in a format that can also be consumed by their competitors (and what's more, is actually consumed by their competitors).

Somewhat related, a thing I found suspicious, though not surprising, the last time I tried it:

Try Googling for how to block all Google Analytics and related services in your Hosts file.

Does it seem harder to find a simple list of IP addresses + instructions than it should be?

While Google Search results quality has gone down across the board in the last 3 or so years, it seemingly tries particularly hard to pretend it doesn’t understand what you mean with queries like these.


Why would Google put itself under insane risk from anti-anticompetitive regulation and huge bad PR just to prevent a tiny minority of users from blocking Google Analytics?

This is hardly unusual for them; they removed the ability to blacklist sites from domains because people were blacklisting commercial sites.

Do you have any source for that? Otherwise, it’s just as much conjecture as the other claim.

I recommend googling :)

There's not going to be a simple list of IP addresses for any service behind an ever changing global collection of load balancers shared with a number of other services.

When I type "block google analytics hosts file" into Google, the first three results are guides on how to do that.
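Since a hosts file maps hostnames rather than IP addresses, the blocking described in those guides comes down to pointing each tracker hostname at an unroutable address. Here is a minimal sketch that generates such entries; the hostname list is a small illustrative sample, not a complete or maintained block list, and the helper name is made up.

```python
def hosts_entries(hostnames, sink="0.0.0.0"):
    """Build hosts-file lines that point each hostname at a sink address.

    Hosts files have no wildcard support, so every subdomain has to be
    listed explicitly.
    """
    cleaned = {name.strip().lower() for name in hostnames if name.strip()}
    return "\n".join(f"{sink} {name}" for name in sorted(cleaned))


# Illustrative sample only; curated lists cover far more hostnames.
ANALYTICS_HOSTS = [
    "google-analytics.com",
    "www.google-analytics.com",
    "ssl.google-analytics.com",
]

block = hosts_entries(ANALYTICS_HOSTS)
print(block)
```

The output would be appended to `/etc/hosts` (or the Windows equivalent). In practice a maintained list is the more robust choice, because the set of hostnames changes over time.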

This might help:

https://github.com/StevenBlack/hosts

Apart from adware and malware, it offers to block URLs falling into other categories like fake news, social, gambling, and porn.


Googling "hosts zero" should give you a good block list. Google is rarely the only thing people want to block, so it makes sense that there aren't $company-specific instructions.

https://news.ycombinator.com/item?id=7743202

https://someonewhocares.org/hosts/zero/


Do you get better results for that with other search engines?

I wonder how long until we get an EU version of Google/Yandex/Baidu.

You mean like Qwant (French) and Startpage (Dutch)?

Startpage isn't an actual search engine, it's just a wrapper around Google's results. That's not the case with Qwant, Yandex, or Baidu as far as I know.

Also, this: https://reclaimthenet.org/startpage-buyout-ad-tech-company/


Qwant only exists in order to receive all that sweet, sweet grant money.

All for-profit companies only exist in order to receive that sweet, sweet money.

So basically the quintessential EU version of Google.

Also Seznam (Czech). They also have assorted related services, like video streaming, an auction site, a shopping aggregator, etc.

There was a scandal where it was shown that Qwant got its search results from Bing.

not holding my breath

DDG seems like a good contender


DDG, Ecosia and Qwant all use the same underlying Bing (Microsoft) data.

Startpage recently sold out to System1, an ad tech company.

I would not recommend it anymore.


Never, EU hates tech & its inherent effects on society. This investigation is an example of it.

TIL surveillance is an inherent effect of tech on society.

I wonder if the GDPR had any effect, but I hope it will change things. Annoying users is better than letting the whole data hoarding keep happening.

At least California introduced a bill, too.

I'm really curious where all of this will lead. I wonder how many people are aware that Brexit and the Trump election were carried by big data. I'm also curious whether Obama got help from those big data techniques; I remember he used Facebook, but I don't have details.


Finally. Google should be broken up, they have WAYYY too much power

The problem with breaking them up is that the search engine is the whole ballgame. Separating the search engine from, say, gmail would have negligible benefit. It might even make things worse if the newly independent gmail company started doing a lot of the scummy things free email providers sometimes do to make a buck against everybody's existing gmail account.

What you really need isn't a breakup, it's some way to foster sustainable competition for search.


I’d go further, and I’d be all for a full ban on targeted search and ads in the EU.

Google is programming people on a massive scale. Compare your news feed to your partner/friend/colleague’s phone.

No-one I know outside of tech has any notion that what they’re seeing on that feed is only shown to them. Social targeting in tech has completely broken society with these tools - and Google is arguably worse than Facebook in this regard because their feed is ubiquitous, people just don’t realize they’re in Google’s world the entire time they’re on the phone because they didn’t touch a Facebook or Instagram icon to start an app.

We assume non-tech people out in the world don’t know much about how this works, but the average user is even more naive than any of us realised.


I'm not for it nor do I think it makes a ton of sense, but there's an argument to break up ads market/serving from supply side (could further break apart youtube & search).

Search engine can be broken up regionally and the regional versions allowed to compete with each other globally. So Google France competes with Google Germany competes with Google Spain etc. This happened with Ma Bell - https://en.wikipedia.org/wiki/Breakup_of_the_Bell_System

The breakup of the Bell System was a huge failure. When they were separate they never actually tried to compete with each other and then they just bought each other back up again.

What actually created competition for the telephone network was the internet. Competition from cable and satellite and cellular.

That's what you need. Something similar but not exactly the same, so they can each carve out a niche and provide a backstop against the most flagrant abuses but they aren't in such direct competition that they're more inclined to cooperate than compete. So that you have companies who can make choices knowing that their position is different than their competitors and the competitor can't just mirror their action to neutralize its competitive advantage, which otherwise removes any incentive to be the first to engage in pro-consumer behavior.


Would that work in this case? I’d expect that to function like a massive multi-armed bandit test, but I’d assume (without any insider knowledge) that Google already does that internally anyway?

Would it be “better” (given the regulatory goals) to order Google to do XYZ and provide an audit trail to demonstrate compliance?


Problem is new issues will keep cropping up every 2 days. And govt regulators have to keep reacting.

Putting pressure on Google requires seriously increasing competition.

Let the local units rebrand and compete against each other. Maybe even auction them off to other big players. Whatever needs to be done to even the playing field. Google needs a nice kick in the ass for its own good health.


Yeah, and look how well that turned out - all the bits eventually merged back together to form the oligopoly you have in broadband Internet access today.

Took 30 years and in that time the internet was born. Key is to keep creating conditions where they compete with each other. Nobody wins overnight if the game has good rules.

Force federation of search data.

The data is already federated, that’s what crawlers do - they crawl federated data.

I don't disagree, but if we go that way then so do Facebook, Microsoft and Amazon. I doubt the EU can get into that without it triggering some massive and nasty reactions, and the US seems to have abandoned any kind of proper corporate control, so this will never happen.

I wouldn't break them up but I would put very strict privacy laws.

Agreed. At this point I can't run a private email server because Google simply will not deliver the emails anymore.

Being able to simply dictate who can do what on the internet solely because they can undercut everybody's price (aka give away email free in return for mining your data) is far too much power.


Huh? I run a private email server... Took a ton of work though. But it works fine?

Are those mails marked as spam? Or what happens?


Yes, they are now marked as spam.

As I am the sole person to ever use this server and I have been sending to some of those emails since almost as long as gmail has even existed, the fact that they are now marked spam is simply an admission that Google doesn't actually give one shit about accuracy and solely trains their ML networks on things that bring them revenue.


Has your IP changed during that time? How do you have SPF and DKIM configured?

These questions don't help. I get that you're trying to troubleshoot the OP's mail server and figure out what might be done to get their mail delivered again. But that's a form of victim blaming: if that server was never used to send spam and the OP is not sending out spam, their mail should not be classified as spam. If it is, that is clearly anticompetitive behavior by Google, because the difference between incompetence and malice is something for Google to illustrate, not something the rest of the world should bend over backwards to accommodate.

Imagine for the moment that the OP's IP address has changed during this time and they don't have SPF or DKIM configured. Or imagine SPF is misconfigured to say "only IP 1.2.3.4 is qualified to send email for this domain" but they've since moved to another IP. Or imagine their DKIM configuration is broken and their messages aren't validating. There are a ton of ways you can configure a mail server so that recipients can't reliably tell you're not sending spam: it's not just what content you send but also the reputation of your IP and the signals you choose to send via SPF and DKIM.
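To make the stale-SPF scenario concrete, here is a simplified sketch of how a receiver might compare the connecting IP against the `ip4:`/`ip6:` mechanisms of a published SPF record. It deliberately ignores `include`, `a`, `mx`, redirects and qualifier handling, so it is not a full SPF evaluator, and it does not represent how Gmail or any particular provider scores mail. The record text and the new IP are invented for illustration.

```python
import ipaddress


def spf_allows_ip(spf_record, sender_ip):
    """Return True if sender_ip matches an ip4:/ip6: mechanism in the record.

    Only literal address mechanisms are checked; everything else is skipped.
    """
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")

    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        term = term.lstrip("+").lower()
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return True
    return False


# The record still names the old address; the server has since moved.
record = "v=spf1 ip4:1.2.3.4 -all"
old_ip_ok = spf_allows_ip(record, "1.2.3.4")
new_ip_ok = spf_allows_ip(record, "203.0.113.7")
```

With a record like this, mail from the new address falls through to `-all`, which tells receivers the domain owner asked for such mail to be rejected, regardless of its content.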

This isn't specific to Google: all the companies that receive email have the same difficult problem. Email is the only major fully open federated push protocol, and spam has come close to killing it.

(Disclosure: I work for Google, nothing related to mail, speaking only for myself.)


One of the major advantages of GMail is the lack of SPAM and phishing.

If the trade-off is that Google either accommodates mail sent from badly configured servers and I receive a lot of SPAM, or they drop that mail and limit SPAM, I will go with less SPAM. I know it's not cool for the decentralized Internet, but that's the utilitarian reality for me.


[flagged]


Would you please neither start nor perpetuate flamewars on HN? We don't want this kind of thing here.

https://news.ycombinator.com/newsguidelines.html

We detached this subthread from https://news.ycombinator.com/item?id=21673898 and marked it off-topic.


Because the citizens of the EU wouldn't put up with it?

Idk, if they put up with their pensions being invested in neg interest rate euro junk now, I don't see how some DPI and tech providers that may have questionable quality/service that they are in charge of is any worse. Might as well keep squeezing…

I think you misunderstand the role of the EU

Every time someone says "I think the EU is overreaching here" the response always is "the EU literally has no power at all"

Actually you might be right: it's really Germany and France pulling the strings here.


This is such a strange comment. I had no idea that anti EU rhetoric escalated so much that it is being compared to China. I mean, of course there are US/EU disputes, even more when the tariffs started, but communist party?

Yes, may the heavens forbid any other solutions are available to pursue outside of continuing to engage with a provider that has had dubious standing with such a jurisdiction for way too long, other than hitting them up for a quick payout every other year.

Must be anti EU, no other notions/stances or thoughts on complex issues could be possible.


These are complex issues and people are willing to discuss them, but my comment was about how you straight up compared the EU to a communist regime.

You are alienating yourself from any reasonable discussion by making such extraordinary and baseless claims that are of no merit but are meant to elicit an emotional response.

Arguments like that work in echo chambers, but once you step into a discussion that is in good faith you will notice that it is counterproductive and doesn't lead to anything useful.


Alluding to CCP usage of DPI (baseless?) and saying how it can also be implemented in another jurisdiction if they so desire in order to enforce the usage of their own home grown solutions (or possibly pursuing other more constructive options that one person in this thread caught on to, and actually decided to share their vision of with us [I guess I didn't alienate that person enough?]), threatens people so much? Or is it the way I said it?

The real emotional response is how often the EU has to hit up the giants for cash to appease their population, with nothing seemingly changing between one time they decide to crack open the piggy bank and the next time the headlines hit the tape. Do we even know how all the funds that were received from past fines were allocated exactly (line items)? Or is that just some black box people are supposed to feel good about because it came from the bad guy?

If taking offense at talk like this really shuts down people's ability to think and discuss things… setting aside the fact that I don't even live in the US or the EU… well, good luck to them…


As a Brit: welcome to the last three and a half years of my life.

I’ve seen and heard the EU demonised as “communist” by those on the right at the same time as it’s demonised as “neoliberal” by those on the left.

I’ve seen people criticise the EU for having too much power and too little, and sometimes it’s the same person saying both things.

I’ve given up on someone when I had a conversation with them that went:

Them: “Leaving will be easy because they will give us a good deal.”

Me: “No.”

Them: “That proves we should leave!”


[flagged]


Why are you fighting this battle?

As you say, my side lost.

That my side won’t accept it is a you-problem, not a me-problem, as I left the country. If you want to rule over the Remainers who didn’t physically leave the UK, you’ll have to bring them on board. But trying to engage me will do nothing. I’ve given up on the UK.

Have you considered the possibility that everyone who you accuse of 1 is sincere about what you label 2? I do not need to know if you think the EU is too left or too right to believe that you are sincere about not knowing its benefits.


Just now??

At this point, why does the EU even bother with Google? What's stopping them from just funding their own "Google" and forcing all their subjects to use them?

If funding your own Google was enough to build Google Search, then people would use Bing.

> If funding your own Google was enough to build Google Search, then people would use Bing.

Some people do use Bing. The reason more people don't is that there isn't a lot of reason to. Even if you can get the results to be as good as Google, actually making them better is hard. And it's just trading one giant tech company with privacy issues for another, so that's no advantage either. People need a reason to switch away from the incumbent.

But if you fund the development as public domain research and open source code, that's a different story. Now you're putting it within the capacity of even medium sized organizations to do their own indexing and run their own search with quality results. Instead of two competitors, you open it up to potentially hundreds or thousands.

If the Wikimedia Foundation or similar started doing search indexing, people might actually trust them more on privacy than they do Microsoft or Google.

And the more of them there are, each running their own fork of the code, the harder it is for SEO spammers to pollute the results, and the more advantage the little guys have over an incumbent that spammers will still continue to specifically target as long as they continue to have high market share.

If the code was available for anyone to adopt or modify, you could also get some potentially interesting new domain-specific search engines. There isn't a lot of high quality competition in the video search space, for example, and if there was then that would help to erode the dominance of YouTube by making videos hosted elsewhere more findable.

You would also be likely to see it displacing Google Custom Search pretty quickly if it was free to adopt.

And what's the cost? An amount of money less than they've received from Google in fines? The worst case is that it only helps a little instead of a lot. (And it wouldn't hurt to do the same thing against Windows considering what Windows 10 is doing on the privacy front.)


I knew people could imagine solutions other than hitting up the rich uncles that never really gave a damn about privacy issues in anything but name.

What you propose actually sounds like it could be a better way than the status quo (privacy issues or not with current providers), and if it fails, you can always go back to hitting up the rich uncles for ransom!


Why would they have to? That is not how this works: if your company offers a service somewhere, it needs to stay within the antitrust regulations that happen to exist there. That is why we have antitrust law.

What need would there be for such antitrust regulations if they only allowed organizations that were completely within their purview from the get-go and designated Too Big To Be Anti-Trusted?

Companies change all the time. That is why antitrust can only regulate what happens right now, instead of regulating what regulators guess companies will become.

Even if they were to regulate beforehand, any company they would allow could evolve slowly into something that breaks the initial idea of what was allowed.


But since said company would be completely under their control and grace ahead of time, they could do whatever they want with it, at anytime, even if it changes or not. No need to go through the regulatory song and dance.

It’s much easier for everyone if you only look for rule violations when someone suspects they’ve been broken, rather than forcing everyone to check with you first before doing anything.

That would be much easier if the rules were a static system, made available to everyone in real time and understood by all, and if the costs of engaging with them were proportionally burdensome to all. But it's not, though if people seem fine with the cat and mouse game between EU regulators looking for quick cash and Google et al. always seeming one step ahead (with no notable improvements to people's search experience beyond more Skinner boxes of compliance sprinkled around), then I guess that's what most people want and are OK with… meh

I wouldn’t say “want” so much as “it’s the worst except for all the others”.

Legal systems are like code, except interpreted by humans and “common sense” instead of compilers — we just don’t have a way to make it simultaneously easy to understand (legalese), hacker-proof (legal loopholes), and future-proof (implications of new tech).

Governments do try to clear out old laws and amend the ones they want to keep, but it looks like it’s as hard for them as preventing databases from leaking is for us.


Because it's easier to fine US companies when they break/skirt the law. And EU citizens can make their own choices, you know. Maybe I want to use Google instead of 'EUgle' that would be censored by default under German law to prevent hate speech.

As a European I wouldn't trust any European search engine either.

I am a dual US|EU national. While I do not actually trust any search engine, I still trust Qwant and Startpage (European search engines) over ones like Google (US based). But, I almost always use DuckDuckGo.

I think of the Internet as a public utility. I think it needs to be regulated as such.


Qwant is using Bing. So much for being a genuine European search engine.

Is it though, if you have to keep going back and fining them with nothing really changing in between except for a thin veneer of "compliance"? Or is it preferable to have it that way, a piggy bank/rich uncle of sorts, one EU regulators can always rely on to get some quick cash for the next EU-citizen-sanctioned scheme? Who is really benefiting? Is your search experience really getting any more productive/better with this prolonged cat and mouse game?

Yawn. They have been “investigating” for what, a decade now? But never take any action and Google ignores them and becomes more egregious every day, because they know this. Same with Google’s tax situation.

What happened to those GDPR fines of 4% of global turnover? Looked good on paper but it’s an open secret now that the regulators have no teeth.


EU has fined Google €8.2B so far. Google might not have gotten the hint yet, but it's not for lack of trying.

Fines are the cost of doing business, you need to do more than that to see meaningful change.

What do you expect the EU to do, dissolve companies for the slightest infraction without giving them the opportunity to change their behavior?

I expect them to levy the 4% fine and if it’s not paid start arresting senior executives. Instead they issue token fines and allow the companies to just keep appealing them indefinitely. No wonder Google’s not taking these complaints even vaguely seriously, why would they?

Some people here, it seems, would actually prefer that. I personally don’t, but it is a thing some people want.


