
Google Has Most of My Email Because It Has All of Yours - martey
http://mako.cc/copyrighteous/google-has-most-of-my-email-because-it-has-all-of-yours
======
arb99
Same with Google Analytics.

GA is the standard analytics for a huge % of websites - so even if a website
doesn't use GA for tracking traffic, Google still has the referral data. And
things like Google Adsense (I'm pretty sure that sends back the referral data
too, for tracking click fraud).

There is no way really to avoid Google knowing a lot about you/your website
anymore.

~~~
Smerity
You hit the nail on the head.

Google Analytics is on a substantial proportion of the Internet. 65% of the
top 10k sites, 63.9% of the top 100k, and 50.5% of the top million[1]. My own
results from a research project I did using the Common Crawl[2] corpus
estimates approximately 39.7% of the 535 million pages processed so far have
GA on them.

The real key to tracking is the referrer data. For the vast majority of
clicks, you land on a site that has Google Analytics or you've just left one
that did. As Google Analytics tracks your referrer, that means they still have
your full browsing history if you jump from GA => !GA => GA => !GA => ...

According to my research[3], Google gets activity information on 51.43% of the
42 billion links analyzed in the 535 million page corpus as either the start
or end of the link uses Google Analytics. This activity means they can
accurately track browsing history on most sites, even those that don't use GA,
simply as timing information, referrers, and knowledge of the web graph end up
leaking user activity.
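
A minimal sketch of the "start or end of the link" measurement described above, assuming the web graph is available as (source, destination) pairs and the set of sites carrying the tracker is known. The function name `link_coverage` and the toy graph are hypothetical illustrations, not the code used in the research.

```python
from typing import Iterable, Set, Tuple


def link_coverage(links: Iterable[Tuple[str, str]], ga_sites: Set[str]) -> float:
    """Fraction of links whose source or destination carries the tracker."""
    links = list(links)
    if not links:
        return 0.0
    # A click is observable if either endpoint reports it: the destination
    # sees the referrer, and the source knows which page was being viewed.
    covered = sum(1 for src, dst in links if src in ga_sites or dst in ga_sites)
    return covered / len(links)


# Hypothetical toy graph: sites "a" and "c" carry the tracker, "b" and "d" do not.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
ga_sites = {"a", "c"}
print(link_coverage(links, ga_sites))  # 0.75: only the b -> d link is invisible
```

In the toy graph only half of the sites carry the tracker, yet three of the four links are visible, which is the GA => !GA => GA effect in miniature.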

Used in an anonymized fashion, this is beneficial as it helps Google
understand real world web traffic and hence rank search results accordingly
(far better than simulated activity based upon PageRank or similar). It gets
troublesome in the theoretical situation where the anonymization is dropped.

If you're interested, there are more details at "Measuring the impact of
Google Analytics"[3], though much of the discussion is on Hadoop + Common
Crawl. For a privacy focused write-up (primarily worried about the NSA using
Google Analytics), refer to "Google, make Google Analytics HTTPS by
default"[4].

P.S. Everyone who notes "Google Analytics is easy to evade" is correct but
missing the broader point -- the majority of web users will never do that.

[1]: [http://trends.builtwith.com/analytics/Google-Analytics](http://trends.builtwith.com/analytics/Google-Analytics)

[2]: [http://commoncrawl.org/](http://commoncrawl.org/)

[3]: [http://smerity.com/cs205_ga/](http://smerity.com/cs205_ga/)

[4]:
[http://smerity.com/articles/2013/google_analytics_and_nsa.ht...](http://smerity.com/articles/2013/google_analytics_and_nsa.html)

~~~
mike-cardwell
Isn't it about time we dropped the HTTP referer header? If we lived in a World
where that header didn't exist, and somebody came along today and proposed
that we add it to Firefox, Chrome or IE, there would be _absolute outrage_. If
the proponents then argued: "Yeah, but it will make tracking users easier, and
we might be able to target advertising better to make more money", people
would not accept that as a valid argument.

~~~
rubinelli
The HTTP referer header was a very valuable tool for webmasters to correct
broken links or prevent abuse years before ad targeting became a thing.

------
sillysaurus3
I've always wondered whether Google ever digs into communications in a
situation where they're trying to decide whether to acquire a company. It
seems like reading a company's email would be a reliable source of information
about whether they're on a genuine trajectory or whether e.g. they're having
trouble with their investors. I've never looked into whether it'd be illegal
for them to do so. Surely in the EU it would be illegal, because privacy
protection seems to be a serious concern there, but I don't know about the US.

If you use Google Talk, every conversation you've ever had will be recorded
and indexed and tied back to you. If you use gmail, same deal. Even your
drafts of unsent emails will be. If you use AIM, same deal: every conversation
you've ever had on it will certainly be logged somewhere and tied back to you.
Yada yada, same deal for almost every chat program, because almost every chat
program has no clientside encryption. If it does, it's not very popular, or
it's hard enough to use to where people will think you're paranoid if you ask
them to go out of their way to "download this chat program that lets us talk
without anyone logging it."

I think the endgame here is to watch what you say. It's safest to assume every
text conversation is public. How many of us have said something in text to our
families or friends that we'd be extremely uncomfortable saying publicly? It's
a little unsettling.

Then again, hopefully when the TextSecure people ship their browser-based chat
program things will improve somewhat, because you'll be able to talk to
someone else without the conversation being duly noted. (There will probably
still be metadata that ties you to the fact that you're talking to someone,
but at least the content will be protected.) Hopefully it will be easy to
use... I wonder if they need any help in that capacity.

~~~
paul
No. That's not just evil, and most likely illegal, it's stupid too. It's too
easy for something like that to leak, and the damage would be enormous.

~~~
sillysaurus3
Hi Paul. Sorry, I didn't mean to write a conspiratorial comment about Google.
I meant to call attention to the fact that every textual conversation you've
ever had has probably been logged, and just because our legal and cultural
framework presently frowns upon digging through those logs, that may not
always be the case in the future. The logs will persist even after our
cultural norms change.

So it seems important to come up with a technological solution to the problem
of how to communicate without all of it being logged. It's a difficult problem
because it's hard to get other people to actually use whatever you come up
with. That's why I'm crossing my fingers that TextSecure's browser plugin will
take off, because if it's as easy to use as email and as powerful as email, it
could have a very tiny chance of becoming the next popular communications
platform. At that point no one would have to trust any company to preserve
privacy, which seems valuable.

EDIT: I'm confused why my comments were moved to the bottom of this thread,
because they don't seem offtopic. For example, the second topmost comment is
also about encryption:
[https://news.ycombinator.com/item?id=7731216](https://news.ycombinator.com/item?id=7731216)

~~~
nutjob2
"the fact that every textual conversation you've ever had is logged"

That's patently untrue. What are you basing that on besides your own paranoia?

~~~
sillysaurus3
The fact that if law enforcement demands access to your conversations,
companies can readily produce them.

~~~
moultano
Naturally you only hear about the times companies could produce them, and not
all of the times they couldn't.

------
eps
I played with the idea of off-site delivery for GMail-destined emails.

Basically instead of an actual email the recipient would get a link to an
https'd page on my mail server and a brief note explaining that due to
delivery policy the message is available only at the link.
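
A rough sketch of how such a notice could be built, assuming the real message is kept in a local store keyed by a random token. The function name `build_notice`, the `store` dict, and the `base_url` value are hypothetical; this is not the prototype mentioned here.

```python
import secrets
from email.message import EmailMessage
from typing import Dict


def build_notice(original: EmailMessage, store: Dict[str, EmailMessage],
                 base_url: str) -> EmailMessage:
    """Keep the real message locally and return a link-only notice."""
    token = secrets.token_urlsafe(16)  # unguessable path component
    store[token] = original            # the real content stays on our server

    notice = EmailMessage()
    notice["From"] = original["From"]
    notice["To"] = original["To"]
    # Generic subject, so the original subject line is not disclosed either.
    notice["Subject"] = "A message is waiting for you"
    notice.set_content(
        "Due to our delivery policy, this message is available only at:\n"
        f"{base_url}/{token}\n"
    )
    return notice
```

The recipient's provider still sees that a message was sent, and when; only the content and the original subject stay off its servers.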

The reason why I started looking at this was that I was buying a house and the
broker person was using gmail to handle the transaction. From negotiation to
all the forms with all juicy details. I switched him back to the fax mode, but
it got me thinking that it'd be nice to have a system in place that would try
and offset such negligence, automatically.

I never got past a rough prototype though, but perhaps I should've.

~~~
andreasvc
I don't see why you would single out Gmail at this point. You're basically
rejecting email as a secure medium (I don't disagree).

~~~
claudius
E-mail between secure servers is perfectly secure (and end-to-end encryption
only adds content encryption but keeps the amount of metadata generated the
same). The problem is that Google’s email servers are not secure; nor are
those of any other email provider. Strictly speaking, not even hosting your
own dedicated server somewhere will protect you from these issues.

~~~
andreasvc
Uh no it's not perfectly secure because if you don't use e2e encryption you
only get opportunistic TLS and you can't control whether your mail will be
transported over unencrypted connections. Furthermore, the contents of the
email arrives unencrypted at every mail server. So you're basically agreeing
with exactly what I said ...

~~~
claudius
You get the TLS you configure the servers to use and a server that only does
opportunistic TLS is certainly not a “secure” server.

~~~
andreasvc
A mail server that only talks TLS is not following the SMTP protocol and is
not a part of the global system commonly understood with the term e-mail.
Maybe it would be a great idea to migrate the whole world to such a
configuration, but in practice it wouldn't give me much confidence. If my
server A hands something off to B for it to be delivered to C, then I have no
control over whether the link between B and C is secured, so e2e is the only
way to be sure.
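
To make the distinction concrete, here is a small sketch of a sender that refuses to fall back to plaintext, using Python's standard `smtplib`. The function name `send_requiring_tls` is hypothetical. Note that it secures only the first hop (A to B); it says nothing about what B does when relaying to C.

```python
import smtplib
import ssl
from email.message import EmailMessage


def send_requiring_tls(smtp: smtplib.SMTP, msg: EmailMessage) -> None:
    """Send msg only if the server offers STARTTLS; never fall back to plaintext."""
    smtp.ehlo()
    if not smtp.has_extn("starttls"):
        # Opportunistic TLS would continue unencrypted here; we refuse instead.
        raise RuntimeError("server does not offer STARTTLS; refusing to send")
    # A default context verifies the server certificate and hostname.
    smtp.starttls(context=ssl.create_default_context())
    smtp.ehlo()  # re-identify over the encrypted channel
    smtp.send_message(msg)
```

A caller would pass in a connected `smtplib.SMTP` instance. The message is still readable by every server that handles it, so this does not replace end-to-end encryption.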

------
cromwellian
I think if you're really concerned about the Feds snooping on email, you need
to use end-to-end encryption. Any large ISP or portal is going to be a juicy
target, and since the majority of people don't want to run their own email
servers, the only recourse is not to depend on trusting the servers. Even if
you managed to convince everyone to leave G-Mail, they'd still congeal back
into another 2-3 big services that the NSA can target.

~~~
nutjob2
If the NSA is targeting you, or you think they're targeting you, then your
email provider is the least of your problems.

~~~
anilgulecha
This standard response misses the nuance of end-to-end encryption: If the
default everywhere was EoE encryption, it becomes significantly harder for
anyone to target you (for an average value of _you_ )

~~~
XorNot
Well every second startup proposition on HN is "you give us access to your..."

So if the default was EoE, it still wouldn't matter.

~~~
lifeisstillgood
this is really pedantic but why is End _T_ o End Encryption shortened as EoE?

~~~
XorNot
No idea - I was just grabbing some terminology used in the post I was
responding to. EtE would make more sense.

------
uptown
Perhaps helps to explain the mega price-tags being placed on platforms like
Whatsapp. If future generations are expected to rely less on email, and more
on messaging platforms, then owning the dominant network in that space gives
you a competitive angle to take-on Google.

------
rmrfrmrf
E-mail is not and has never been a secure method of communication.

~~~
autodidakto
Correct. Running your own server, etc, doesn't matter. If what you're doing on
the internet isn't encrypted by you and decrypted by a trusted and competent
recipient... consider it more or less public.

------
blueskin_
"Peter pointed out that if all of your friends use GMail, Google has your
email anyway."

Peter reminds me of the old "If you have nothing to hide..." fallacy. I'd have
expected more from the EFF.

Yes, anything really sensitive should be PGP'd anyway, but using gmail still
gives google the opportunity to do analytics.

~~~
ronaldx
I also found it surprising that Peter Eckersley would be satisfied to justify
his use of Gmail in this way.

Peter seems to believe that using Gmail makes his friends' privacy
incrementally worse, and yet he is contributing to this problem.

What he really means is: Gmail is more convenient than the other options -
there is no better solution to this. And, that's the problem.

------
ilolu
Why is google being targeted with all such write ups but Facebook gets a pass?
Facebook has many of my photos because it has all of yours. Facebook knows my
browsing habits because all of you have a like button on your site, etc.

~~~
noahm
Because facebook only has my pictures if I choose to post them there. This is
easy to avoid. Facebook only has my browsing habits if I choose to allow
content from their servers while viewing non-facebook content. This is also
avoidable, albeit less easily. Avoiding sending email to gmail users is far
more difficult. Avoiding receiving email from gmail users is even more
difficult. Additionally, the cost (at least measured subjectively in terms of
inconvenience) of avoiding all contact with gmail users is far greater than
the cost of avoiding facebook.

So, to answer your question more directly, facebook doesn't "get a pass".
Facebook simply doesn't get used.

~~~
camus2
> Because facebook only has my pictures if I choose to post them there.

Facebook has your pictures when your friends that are on Facebook post
pictures of you.

And even if you are not on Facebook, I'm pretty sure Facebook has a "shadow
account" system to track people even if they don't sign up.

~~~
ben1040
I have a Facebook account with zero friends on it. I used it to "own" an API
key for an app I built for a freelance client.

The "People You May Know" screen on that Facebook account has plenty of people
I do in fact know.

I imagine through people uploading their address books and then Facebook
mining shared connections, they inferred a bunch of my network without me
doing anything at all.

------
dasmithii
I find it incredibly odd that I've never considered this before. I suppose
end-to-end encryption is today's only defense against top-down surveillance.

That said, I wonder if meshnet protocol could be utilized as an alternative.
Although the traditional mesh network is impractical at scale, a virtual
version, or an email-serving proxy network of some sort, could be beneficial.

Well, beneficial if you'd consider keeping email off Google's centralized
servers a good thing.

~~~
dredmorbius
We had something of a meshnet protocol with regards to email previously, or at
least, it was generally tenable to run a mailserver on any arbitrary IP
address at one time. That ended pretty much by the late 1990s due to the ever
growing onslaught of spam.

Today it can be (and often is) frustrating even for established companies to
get their mail delivered to all sources. I've had repeated frustrations
especially with Yahoo, but also AOL (both continue to have a large number of
addresses, if not active accounts -- problems in scrubbing old email addresses
is another challenge). Larger companies may have their own idiosyncrasies
regarding accepting email -- even with SPF and DKIM records, I've not
infrequently encountered companies (some of which, granted, do things involving
making littler things out of little things called atoms) who requested (and
presumably require) the specific IP address of our outbound mailservers for
communications.

More generally, email badly wants to have some sort of reputation layer put on
top of it, though how to accomplish this has eluded general solution (SPF and
DKIM are only band-aids, and already break a lot of legacy behavior). Total
encryption would be good, including of headers. It's a bit of a mess.

~~~
mike_hearn
All major mail providers already use sophisticated reputation systems. The
difficulty of calculating global reputations for the entire internet, quickly
and with statistically meaningful results is one of the reasons email
consolidates under the control of a handful of big companies. You really don't
want to try and replicate that on your own.

Source: I was part of the Gmail spam/abuse team for several years.

~~~
dredmorbius
The approach I've been considering for quite some time is to focus less on the
_bad guys_ than the _good guys_.

Any given user, and often large groups of users (a company or organization)
are going to have traffic patterns which strongly favor a small number of
other hubs (mailservers), in general. That's going to be, generally, high-
reputation and high-value traffic. You want to ensure that it gets through.
That solves most of your problem right there.

 _Some_ of those sources are also spammers or low-value -- email marketing and
the like.

Everything else is, well, everything else. Might be spam, might not. But as a
first pass _it tends to be less valuable_. Which means you've got an immediate
and low-cost option: deny first delivery on a nonpermanent basis.

If it's a well-behaved system, the delivery system will retry the transmission
in about 4 minutes. If it's a spammer, odds are that it will simply bail on
delivery, or fail to honor the usual retry fall-back schedule. In the first
case, problem solved, in the second, you've now got an additional datapoint
for the source: it fails to adhere to conventions.

All of this is happening largely at the host-to-host level, not individual
senders, so that you're both getting a large level of aggregation (a new user
or service transmitting through a known host isn't a blank slate, you've
already got a delivery history), and the overhead is smaller.
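
A minimal sketch of the deny-first-delivery idea, keyed per sending host as described. The class name `Greylist`, the `min_delay` parameter, and the return strings are hypothetical; a real MTA would answer "tempfail" with a 4xx SMTP reply and would also expire old entries.

```python
import time
from typing import Dict, Optional, Set


class Greylist:
    """Temporarily reject unknown hosts; accept them once they retry later."""

    def __init__(self, min_delay: float = 240.0):
        self.min_delay = min_delay             # about 4 minutes, per the text
        self.first_seen: Dict[str, float] = {}
        self.known_good: Set[str] = set()

    def check(self, host: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if host in self.known_good:
            return "accept"                    # host already has a good history
        first = self.first_seen.setdefault(host, now)
        if now - first >= self.min_delay:
            self.known_good.add(host)          # it retried like a real MTA
            return "accept"
        return "tempfail"                      # nonpermanent rejection


g = Greylist()
print(g.check("mx.example.net", now=0))    # tempfail
print(g.check("mx.example.net", now=300))  # accept
```

A host that never comes back simply stays in `first_seen`, which is the extra datapoint mentioned above: it failed to honor the usual retry schedule.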

Yes, there are also reputation and other systems (IronPort / Senderbase, now
part of Cisco, for example, as well as the DNSBLs), many of which are
accessible via DNS queries, though the cost of those queries for a busy system
is itself considerable (you probably want to cache results, fortunately, DNS
allows for that).
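
For illustration, a DNSBL lookup is an ordinary DNS query for the reversed IPv4 octets under the list's zone. The sketch below assumes a placeholder zone (`dnsbl.example.org`), and its in-process cache ignores DNS TTLs, which a resolver-level cache would respect. The function names are hypothetical.

```python
import ipaddress
import socket
from functools import lru_cache


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


@lru_cache(maxsize=4096)
def is_listed(ip: str, zone: str) -> bool:
    """True if the list returns an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Covers "no such name" as well as resolver failures; this sketch
        # treats both as "not listed".
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```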

And all of that logic can be rolled up pretty readily within an MTA. That's
one of the powers of free software: aggregating brains and experience.

------
lewisflude
I don't really see this as a problem. The kind of language makes me think of
those that wear tinfoil-hats.

I always ask myself, who cares? Worst case scenario, Google will sell this
data to a government and I'll go to jail. The effort required to secure email
at this point isn't worth the time or effort it'd take to maintain.

------
reedlaw
Why not offer to host the email accounts of those you contact most frequently?
Probably most of them couldn't host their own email server, and if you've
already gone to the effort to do so, you can help them increase their privacy
as well.

~~~
deptadapt
Because running a mail server can be a pretty big responsibility, especially
if you're letting others send mail from it. The more people using your server,
the more important it will be to monitor for abuse and deal with abuse
reports.

When I first set up my mail server I was pretty excited about being able to
help everyone I know get their email away from Google, Hotmail etc. But once I
had it running, I quickly realized that I didn't really want to give it to very
many people. Even if I trust all of my friends not to abuse it, I cannot trust
all of their computers.

------
perlpimp
Not if you use GPG or PGP.

~~~
PeterisP
What would that change?

In any GPG/PGP solution that I'd use, the encryption would be automated and
transparent. In practice, the web-mail-client would anyway decrypt, store and
index that email for convenience - no matter what you do, if your
recipients/senders use some 3rd party email service, that email service would
have access to your emails after the GPG/PGP layer is removed.

What GPG/PGP achieve is defense against MITM/phishing impersonation and
secrecy while in transit between email providers; what it doesn't necessarily
achieve is secrecy in storage and defense from your e-mail client software
developer. Coincidentally, these are the exact same security characteristics
that a gmail user mailing another gmail user has - there are no third parties
in transit; gmail can prevent insertions in the middle with a faked sender;
but the stored emails are vulnerable to google itself and legal requests made
to them.

