
Millions of email addresses leaking to advertising and analytics companies - aspenmayer
https://medium.com/@thezedwards/the-2020-url-querystring-data-leaks-millions-of-user-emails-leaking-from-popular-websites-to-39a09d2303d2
======
aspenmayer
‘One important trend to notice is how often Google Analytics, Google’s
DoubleClick, Facebook, and Twitter are ingesting the user emails — these are
organizations that should be receiving deletion requests en-masse and they
should all have processes to handle this type of effort already (Facebook
likely has this tech already based on conversations on this research and
additional research from a private report from several years ago).’

‘This type of email user data in a URL bar synced into Javascript pixels is
most typically blocked by a regular person through “Ad blockers” or through
browsers like Safari, Brave, and Firefox — those browsers use
Javascript/cookie blocking as a default features to protect users (each
browser handles it slightly differently). This breach and research included
here would impact all Chrome users of these websites who went through these
specific user flows and who didn’t proactively block all Javascript (a rarely
used option) or use a Chrome “Ad blocker” extension that blocked this type of
Javascript. Some people using the other “safe” browsers (Safari/Brave/Firefox)
could have been protected from the leak due to their 3rd party Javascript
requests being blocked.’

Original title too long. It was: The 2020 URL Querystring Data Leaks —
Millions of User Emails Leaking from Popular Websites to Advertising &
Analytics Companies

~~~
iamacyborg
I think you overestimate how many "regular people" use an adblocker.

~~~
aspenmayer
I would agree with you. Those aren’t my words, just two quotes from the
article with some relevant info. As others have mentioned, it is email
_addresses_ that have been leaking and continue to leak.

------
indymike
This may be the first time I've seen an article try to sensationalize webhooks
and third-party APIs. When I read the headline, I was expecting some kind of
hack, not a story about how Dave in IT hooked the contact us form up to the
CRM using webhooks and Zapier...

The meat in the story, is a real problem - irresponsible mingling of PII in
analytics data.

~~~
vorpalhex
The amount of times I've heard "Well, it's just analytics data, it's public
anyways" from people just drives me up the wall. No, it's not public, it's
still PII, you still have to guard it correctly!

~~~
naravara
It is very hard to make people understand that scale matters when deciding how
sensitive data is. They only ever care about it in whatever narrowly defined
use case they're worried about. The idea that someone can take an element of
not-particularly-sensitive data from you, combine it with elements of not-
particularly-sensitive data from elsewhere, and end up with a database full of
extremely sensitive data simply does not click.

------
polote
Worked for an analytics company, and this was a problem for us, because we
didn't want to collect PII at all.

So as soon as there was an @\S+\. in a URL we were anonymizing the full URL.
Customers were not happy though.

But I can tell you that this was present on a lot of websites, including bank
websites.
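
A rough sketch of that kind of rule in Python, for anyone curious. This is not the company's actual code: the function name, the placeholder string, and the extra handling of a percent-encoded "@" (%40) are all assumptions made for illustration.

```python
import re

# Assumption: a loose "looks like an email" test modelled on the @\S+\. rule
# described above. It will produce false positives, which is the trade-off.
EMAIL_LIKE = re.compile(r"@\S+\.")
ENCODED_AT = re.compile(r"%40", re.IGNORECASE)


def anonymize_url(url: str, placeholder: str = "[redacted-url]") -> str:
    """Return the placeholder if the URL appears to contain an email address."""
    # Treat a percent-encoded "@" the same as a literal one before matching.
    candidate = ENCODED_AT.sub("@", url)
    if EMAIL_LIKE.search(candidate):
        return placeholder
    return url
```

Dropping the whole URL rather than just the matching parameter loses more data, which is presumably why customers complained, but it avoids having to guess where the address ends.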

------
iamacyborg
This is explicitly against the Google Analytics ToS - what's the bet they do
nothing about it?

~~~
reaperducer
They'll just invoke the standard SV bubble playbook:

1\. Do nothing until it gets reported in a large dead tree medium.

2\. Blame the reporter for not understanding technology.

3\. Deny it happened.

4\. Say it only affected a small subset of people.

5\. Say it was only a single rogue "trusted partner" involved.

6\. Put out the boilerplate "We can do better" press release.

7\. Keep cashing the checks.

8\. Lather. Rinse. Repeat.

~~~
iamacyborg
Zuboff's Dispossession Cycle.

------
wiredfool
This covers query strings, but in theory, if you've got any third party JS on
your registration/private page, that JS can get the contents of the form and
exfiltrate it.

So, basically any 3rd party analytics has the ability to do this, query string
or no?

------
stiray
How I am handling it (requires a mail server):

Each email matching some rule (magicmarker[a-f0-9]+) goes to my account.

Each registration anywhere gets a unique email address generated as
magicmarker<b64(hash(domain+salt))>@mydomain.com. The salt is there to keep it
unguessable.

When I get any spam, I can redirect it to /dev/null and check where it came
from, to send hate mail to the domain owner or whatever.

0 spam. 0 traceability. Ability to track who sold/leaked my mail address.
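
A minimal sketch of generating such addresses, under stated assumptions: it swaps in HMAC-SHA256 with a truncated hex digest in place of b64(hash(domain+salt)), so the result matches the [a-f0-9]+ rule above. The function names, the 16-character length, and the defaults are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def address_for(site_domain: str, salt: bytes, marker: str = "magicmarker",
                mail_domain: str = "mydomain.com", length: int = 16) -> str:
    """Derive a stable, hard-to-guess address for one site."""
    # HMAC keyed with the secret salt; the site domain is the message.
    digest = hmac.new(salt, site_domain.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"{marker}{digest[:length]}@{mail_domain}"


def which_site(address: str, known_sites: Iterable[str],
               salt: bytes) -> Optional[str]:
    """Find which known site an address was generated for, if any."""
    for site in known_sites:
        if hmac.compare_digest(address_for(site, salt), address):
            return site
    return None
```

Because the digest cannot be reversed, tracing a leaked address back to its source means keeping a list of the sites you registered with and recomputing, as which_site does.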

~~~
dhimes
I thought the bad guys knew to strip the markers by now.

~~~
zenexer
That + symbol is part of a pseudo-regex; it’s not part of the email address.
You’re probably thinking of the email+whatever@whatever syntax, which isn’t
what’s being described here. There’s nothing to strip in this case.

~~~
dhimes
Whoops- you are correct. Pre-coffee me mis-read that. Thanks for the heads-up.

------
battery_cowboy
On a side note, on Firefox Android the page here freezes for about 10 or 15
seconds while loading. It's really annoying.

~~~
vmception
Product managers are seeing "longer time on site" in their analytics reports
and keep adding more things thinking it is meeting the company's quarterly OKR

they don't know the "higher engagement" is because the mobile users' browsers
are literally frozen

and the A/B test says "keep going with the B test!" "do it again!" in a tree
that keeps evolving down one side of the graph towards more and more obnoxious
experiences that the company doesn't even know are obnoxious

given the misaligned incentives I think this is also an area California can
regulate or threaten to regulate, I don't like "tech regulation" but I can't
think of any other party to curb the behavior. If you like "private sector
solutions" more than "government solutions" then Apple and Google can pull the
rug out from under all the other companies' feet by crashing sites on the user's phone
using other users' crowdsourced data, or making certain analytics packages
not run, etc.

~~~
creato
... or maybe it's just a bug?

~~~
vmception
yes the bug of product managers following A/B tests blindly and not knowing
it's a bug, resulting in a worse and worse internet browsing experience for all
of us

------
sdoering
I am working as a data analyst. We had some cases where we were called to fix
issues from other agencies.

The clients had form data being sent as GET requests, and from there email
addresses and even more personal data in the URL (street, date of birth, and
more) were being transmitted into the analytics tools and also into
marketing tools.

Regarding GDPR this is a breach and needs to be communicated to officials as
well as the people affected.

Even a bank was affected by this type of implementation when customers wanted
to open an account or make a loan application.
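
As a stopgap on the reporting side, the page URL can be scrubbed before it is handed to any analytics or marketing tool. A minimal Python sketch follows; the parameter names in SENSITIVE_KEYS and the function name are hypothetical, and this does not fix the underlying problem of the data being in the URL in the first place.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameter names that carry personal data.
SENSITIVE_KEYS = frozenset({"email", "street", "dob", "birthdate"})


def strip_sensitive_params(url: str, sensitive=SENSITIVE_KEYS) -> str:
    """Return the URL with sensitive query parameters removed."""
    parts = urlsplit(url)
    # Keep every parameter whose name is not on the sensitive list.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in sensitive]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

An allow-list of known-safe parameters would be stricter than this deny-list, since new form fields would then be dropped by default.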

~~~
jarym
HTTP 101: do not transfer anything you don't want 'cached' as a GET request.
Not only that, but some browsers will pre-emptively send GET requests or retry
them so you'd have the double headache to worry about duplicate requests on
the server-side.

It shouldn't require much experience to know when to use POST or some other
HTTP verb - banks certainly have no excuse.

~~~
user5994461
Email 101: Clickable links in emails are always GET, so extra parameters are
set in the query string.

Marketing 101: User actions should take as little clicks as possible, so the
action should be performed as soon as the user clicks the (GET) link.

~~~
shkkmo
> Marketing 101: User actions should take as little clicks as possible, so the
> action should be performed as soon as the user clicks the (GET) link.

Nope, some email clients might prefetch URLs in email for various reasons. You
should absolutely NOT do this (unless you are deceitfully trying to game your
engagement metrics). The only case where you might be able to get away with it
is when the user has an active login session that you can verify prior to
performing the action.
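
One common way to handle this, sketched in framework-free Python: the GET link only renders a confirmation form, and the action happens on POST. The unsubscribe action, the function name, and the in-memory set are assumptions for illustration only.

```python
import html
from typing import Set, Tuple


def handle_unsubscribe(method: str, token: str,
                       unsubscribed: Set[str]) -> Tuple[int, str]:
    """Return (status, body); only POST changes any state."""
    if method == "GET":
        # A prefetching mail client lands here and nothing is modified.
        safe_token = html.escape(token, quote=True)
        return 200, (
            '<form method="post">'
            f'<input type="hidden" name="token" value="{safe_token}">'
            "<button>Confirm unsubscribe</button></form>"
        )
    if method == "POST":
        # The user clicked the button, so perform the action.
        unsubscribed.add(token)
        return 200, "You have been unsubscribed."
    return 405, "Method not allowed"
```

This costs the user one extra click, which is the trade-off against the "as few clicks as possible" argument.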

------
superpermutat0r
Everyone in sales is using the leaked emails. I have so many one-time emails
that leaked (haveibeenpwned.com) and someone was smart enough to just use
those leaked databases and sell the emails to sales departments.

I always ask the sales person where the hell they found the email because I
just used it once somewhere a long time ago.

------
alexjray
I have a bit of a contrarian view on data on the web. I think, eventually any
data on the web is going to be in the public domain at some capacity. Data
will be everywhere and readily available, mostly for free.

~~~
reaperducer
You sound like a public television series circa 1985 talking about this new-
fangled thing called "the internet."

------
harikb
For many of the examples described in the article, it is the site (not Google
or Facebook) that left the email address in the URL. That is bad coding
practice. Just so we are clear, it isn’t the “evil advertising companies did
this”. Now asking Google to randomly search all unwanted referral URLs
received for a customer specific pattern and delete what may look like an
email address seems unfair. Google is in no position to use or recognize that
data as email. If they tried that, it would be brittle and unmanageable.

One could argue Advertising shouldn’t exist or that Google should not store
anything. But the GDPR argument is BS, although admittedly legal.

It is like throwing a small rock into the neighbor's yard and asking them to
retrieve it for you.

~~~
bhhaskin
Google will also ban your GA account if they find that kind of information
showing up. They aren't dumb and know exactly what they can and cannot get
away with.

------
mrgreenfur
Emails on query strings are leaks that should be patched. That said you can
bet these companies are sending emails over formal integrations to tons of 3rd
parties for analysis, targeting, advertising, etc.

CCPA is not nearly as strict as the GDPR and it is not illegal, unfortunately.

~~~
rnd_dude428673
This is the most accurate assessment. Considering how "not secure" email is in
general and how easy it is for this information to be passed around behind the
scenes this is almost a non-story.

I feel this article really stunk of an attempt to over-sensationalize some
sloppy coding that is probably happening on 50% of the websites in the world.
To think otherwise is nothing but a utopian view of reality.

------
carapace
_Email addresses_ not _emails_.

~~~
floatingatoll
“Emails” is a valid real-world variant in use by non-technical people.

“Do you have each other’s emails?” is real and normal usage.

It’s fine to “get off my lawn” this, but it won’t help solve the data leak
posted here.

~~~
carapace
Yeah, sure, in casual conversation, but in this case I read half the thing
before it was clear what they meant, and it makes a big difference because
leaking email addresses isn't the same thing as leaking _emails_ , eh?

(BTW I think it sucks you're getting hammered by downvotes FWIW I upvoted you
just to counter balance them.)

------
dathinab
They are leaking email _addresses_ not emails. I was irritated how they manage
to leak emails in a context where no emails are involved (but email addresses
are).

I wish people would be a bit more clear in the language they use, especially
if it's about vulnerabilities.

~~~
Torwald
You are trying to impose your own internal representation of email as a
concept on others here.

To write "email" is correct. It would be more precise to write "email address"
to make the distinction from "email message." But it is not the cultural norm
to equate "email" with "email message" as you seem to do.

Juuust kidding. My point is: they probably did not use clearer language
because their internal concept of what email is is so fuzzy. That
would be my guess anyway.

~~~
lotsofpulp
It’s because emails would get more clicks than email addresses.

