
How Gmail’s Image Caching Affects Open Tracking - superchink
http://blog.mailchimp.com/how-gmails-image-caching-affects-open-tracking/
======
aiiane
How Gmail's Image Caching Affects Tracking, short form:

* Loading images is now enabled by default rather than disabled by default, meaning that a larger portion of emails will be tracked, because it's more likely tracking images will be loaded.

* Images are now loaded through a proxy, which means that all tracking images will no longer provide information like cookies, IP associated with the account, etc - the only information they'll provide is "this specific email was viewed by someone, somewhere."

There is still an option to disable loading images by default. Toggling that
option still results in images being loaded through a proxy, so the second
item above still applies.

As far as privacy goes, the _potential_ level of privacy has increased (the
proxy now allows you to load images if you desire without leaking IP etc.).
The _average_ level of privacy from the change is a mixed bag - more basic
tracking (open tracking) will occur due to the change of default, but with the
trade off that more advanced tracking (e.g. tracking IPs, setting cookies for
correlation with non-email site visits a.k.a. remarketing) will no longer be
possible.

There is _no_ net change to how hard it is to verify whether an address is a
valid GMail address - that's already possible by simply talking to a Google
mail server.

~~~
wodenokoto
> * Images are now loaded through a proxy, which means that all tracking
> images will no longer provide information like cookies, IP associated with
> the account, etc - the only information they'll provide is "this specific
> email was viewed by someone, somewhere."

No, if you know who you are mailing, you can add their ID to the link of your
tracking image, so you know exactly who opened the mail.

<img src="http://www.tracking.com/image.gif?user_id&other_tracking_info">
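
A minimal sketch of the idea in Python, assuming a hypothetical `campaign` parameter alongside the `user_id` from the snippet above. The host is the placeholder used in the comment and the function names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder host taken from the comment above; not a real tracking service.
TRACKING_BASE = "http://www.tracking.com/image.gif"


def tracking_pixel(user_id: str, campaign: str) -> str:
    """Build an <img> tag whose URL identifies one recipient of one campaign."""
    query = urlencode({"user_id": user_id, "campaign": campaign})
    return f'<img src="{TRACKING_BASE}?{query}" width="1" height="1" alt="">'


def recipient_from_request(url: str) -> str:
    """Server side: recover the recipient ID from the requested image URL."""
    return parse_qs(urlparse(url).query)["user_id"][0]
```

Because the ID travels in the URL itself, the sender can still tell which message was opened even when the request arrives from a proxy rather than from the reader's machine.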

~~~
bad_user
You've got no guarantee about when that image will be loaded by their proxy,
making statistics next to useless.

Google could prefetch those images. Even if the initial tests seem to indicate
against this, they can change this on a whim, or they could prefetch with a
delay, or they could prefetch only a percentage of those images, depending on
ever-changing heuristics. The only useful info advertisers would get from this
is that the email was sent to a Gmail account.

Even if they don't prefetch, they can detect duplicates, especially from links
coming from domains known to generate tracking pixels and there's nobody else
that could do this better than Google. They can also get rid of images that
aren't visible to the user (e.g. transparent or white or light gray pixels, or
images that are too small to be a part of the content). Tricks like generating
images with unique content only work for images with actual content to show.

And it significantly raises the cost of email campaigns too, just as with
spam. I don't have spam hitting my Inbox these days and that's not because
spam has become impossible. Light spam (e.g. promotions from companies you've
got a relationship with) has been moved to the Promotions tab. And I can't
remember the last time I saw real spam hitting my Inbox.

All in all, I'm happy that they are introducing this feature. It's better for
regular folks or for me - you know, the kind of people that always click Show
Images, because promotional messages are hard to read otherwise (on purpose).

I do hope they provide the option to turn it off. Google is pretty bad at
providing choices these days.

~~~
taeric
If you are sending a marketing email to a person, an easy image that has
unique data is their name. :) And I didn't even think hard on this. Pretty
much any content that could have been easily done as text is easily done as a
"dynamic" image.

Now, they can detect duplicates, but only after they have gotten the contents.
Unless I am mistaken on anything. (highly possible.)

~~~
bad_user
Of course, but it's a whac-a-mole game.

Such instances can be detected (e.g. if you see 100 emails with the same HTML,
but with image URLs that are slightly different). Google can then prefetch
those images or it can re-enable the opt-in for displaying images just for
those emails.
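
A rough sketch of that kind of detection, assuming the per-recipient difference lives in the query string of each image URL. This is an illustration only, not a description of anything Gmail actually does; the function names and the threshold are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Matches the src attribute of <img> tags that use double quotes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def template_fingerprint(html: str) -> str:
    """Hash the HTML after dropping the query string from every image URL."""
    def strip_query(match):
        return match.group(1) + match.group(2).split("?", 1)[0] + match.group(3)

    normalized = IMG_SRC.sub(strip_query, html)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mass_mailings(messages, min_count=100):
    """Group (message_id, html) pairs by fingerprint and keep the large groups."""
    groups = defaultdict(list)
    for message_id, html in messages:
        groups[template_fingerprint(html)].append(message_id)
    return [ids for ids in groups.values() if len(ids) >= min_count]
```

A sender who moves the ID into the path, or varies the body text per recipient, would slip past this particular normalization, which is the whac-a-mole part.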

If I were to do email tracking, I would just filter the GMail accounts out of
the statistics, because you can't be sure of when GMail's proxy loads those
images and what you're interested in is the conversion rate (not in the total
number of people that opened their emails, you only care about totals for
emails sent and clicks). But a service like MailChimp is not interested in
doing this, because MailChimp is a third-party that's interested in showing
big numbers to their customers.

And putting these numbers aside, the privacy issues related to IP tracking, or
the security issues are gone. So I think this is good.

~~~
taeric
Certainly whac-a-mole. Didn't mean to imply otherwise.

For myself, no need to filter them out without evidence. Keep a few controlled
accounts to periodically try and see what the delay is. And get extra
suspicious if all images are opened at once on a mass send out.
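
One way to put a number on "all images opened at once", as a sketch: measure what fraction of opens land within a short window of the first open. The 60-second window and the 0.9 threshold are arbitrary assumptions, and the function names are made up.

```python
def burst_fraction(open_delays_s, window_s=60.0):
    """Fraction of opens that fall within window_s seconds of the earliest open.

    open_delays_s holds, for each opened message, the seconds between send
    and the first request for its tracking image.
    """
    if not open_delays_s:
        return 0.0
    first = min(open_delays_s)
    in_window = sum(1 for delay in open_delays_s if delay - first <= window_s)
    return in_window / len(open_delays_s)


def looks_prefetched(open_delays_s, window_s=60.0, threshold=0.9):
    """Flag a send whose opens cluster suspiciously tightly."""
    return burst_fraction(open_delays_s, window_s) >= threshold
```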

------
gkoberger
There's an interesting tradeoff -- marketers no longer know your location,
browser, client or how many times you've opened the email. However, any
spammer now instantly knows that the email address is valid.

It's probably an improvement, but not all the way there. For actual privacy,
what GMail needs to do (and I realize this is slightly unfeasible due to the
amount of email they receive) is instantly open and cache every single email
to every single email address (including non-existent addresses).

~~~
tedunangst
If you want to know if an email address is valid, you connect to gmail's
server and send "RCPT TO:<example@gmail.com>" and they will tell you if it's
valid or not.

~~~
Goopplesoft
Are you sure about this? Just tested it out:

    
    
        openssl s_client -connect smtp.gmail.com:465 -crlf
    
        220 mx.google.com ESMTP u17sm2709629qeb.4 - gsmtp
        helo
        250 mx.google.com at your service
        auth login
        334 VXNlcm5hbWU6
        < BASE_64 USERNAME> 
        334 UGFzc3dvcmQ6
        < BASE_64 PASSWORD> 
        235 2.7.0 Accepted
        MAIL FROM: <my_email>
        250 2.1.0 OK u17sm2709629qeb.4 - gsmtp
        rcpt to: <my_email>
        250 2.1.5 OK u17sm2709629qeb.4 - gsmtp
        rcpt to: <emaildne39g39jd9j9jfsdk@gmail.com>
        250 2.1.5 OK u17sm2709629qeb.4 - gsmtp
    
    

I get an OK with BS emails too...

~~~
tedunangst
Try telnet to port 25 and don't log in. If you're sending what could be an
outgoing email, the server is more likely to queue it.

    
    
        MAIL FROM:<tedu@tedunangst.com>
        250 2.1.0 OK g15si484689qej.92 - gsmtp
        RCPT TO:<tedunangst1233141@gmail.com>
        550-5.1.1 The email account that you tried to reach does not exist. Please try
        550-5.1.1 double-checking the recipient's email address for typos or
        550-5.1.1 unnecessary spaces. Learn more at
        550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 g15si484689qej.92 - gsmtp
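
For anyone scripting this, a small sketch of reading such a reply. In SMTP a hyphen after the three-digit code marks a continuation line and a space marks the last line of the reply. The helper names below are hypothetical, and nothing here opens a network connection.

```python
def parse_smtp_reply(lines):
    """Return (code, text) for one SMTP reply, which may span several lines."""
    code = None
    parts = []
    for raw in lines:
        line = raw.strip()
        this_code, separator, text = int(line[:3]), line[3:4], line[4:]
        if code is None:
            code = this_code
        elif this_code != code:
            raise ValueError("reply code changed in the middle of a reply")
        parts.append(text)
        if separator != "-":  # a space (or nothing) ends the reply
            break
    if code is None:
        raise ValueError("empty reply")
    return code, " ".join(parts)


def recipient_accepted(code):
    """2xx means the server accepted the RCPT TO command."""
    return 200 <= code < 300
```

As the two transcripts in this thread show, a 250 only means the server accepted the recipient at that point. The authenticated session on port 465 returned 250 for a made-up address, so a 550 is the more informative answer.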

~~~
parhamn
Cool, thanks!

------
iamshs
The blog post is optimistic, but it will have an effect e.g.
[https://mailchimp.com/assets/images/features/main_segmented....](https://mailchimp.com/assets/images/features/main_segmented.3442793008.png)
Suddenly they don't know the location, OS, browser and referrer. They only
know the refined open rate. Good.

~~~
dangrossman
They know your location, OS, browser from when you signed up for the list in
the first place. You did that online if we're talking about e-mail marketing
and not some developer listserv. MailChimp in particular does record geo-ip
information at signup, so its features based on recipient location should
still work the same.

------
HaloZero
I assume this means that google is caching images only on demand, when the
user opens the email first and only then. Otherwise there is no way for them
to track the first open.

~~~
tijs
yeah i was wondering that as well. if they tested 'around the office' as the
post seems to indicate they might have missed the effect that for every extra
person that opens an email the picture is coming from google's cache and thus
the open is not counted.

~~~
dangrossman
> for every extra person that opens an email the picture is coming from
> google's cache and thus the open is not counted

Each e-mail has a unique image URL; Google has to make a request for each
individual mail even if they're caching images.

~~~
tijs
ok makes sense. a file hash might fix that and save google some bandwidth but
i imagine they do not want to break tracking completely for now...

~~~
dangrossman
To hash the file you first have to download it. Downloading it is the tracking
action, not showing it. Deduplication does not affect open tracking pixels.
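
A toy model of the point, assuming a proxy that caches by URL (how Gmail's cache is actually keyed is not stated in this thread). `Origin` stands in for the tracker's server and both class names are made up.

```python
class Origin:
    """Stand-in for the sender's tracking server; logs every request it sees."""

    def __init__(self):
        self.hits = []

    def fetch(self, url):
        self.hits.append(url)  # this request is the "open" event
        return b"GIF89a"       # placeholder bytes, identical for every URL


class CachingProxy:
    """Fetches each distinct URL from the origin once, then serves the copy."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self.origin.fetch(url)
        return self.cache[url]
```

Two opens of the same message produce one origin hit, while two messages with different URLs produce two hits even though the returned bytes are identical. Deduplicating by content can only happen after the fetch that the tracker has already logged.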

------
MichaelGG
How does this help user privacy? Currently your open action is not tracked
via images, since they're disabled by default. With this, anyone can find out
when you displayed their email. That seems rather crappy.

Nothing stops Gmail from doing the loading on their side (to hide UA, IP,
etc.) but only when you ask for it.

What's Google's motivation for this? Do they do emails that need to be
tracked? Are they doing this for themselves to avoid having to special-case
their own emails?

Wouldn't it be better to work on a standardized way to embed images in email,
so that recipients can get nicely-rendered emails without exposing themselves
to action tracking?

~~~
chmars
At least for me, Gmail's image settings have never worked reliably. Maybe they
just capitulated?

~~~
tokenizerrr
They work fine for me. There's two types of images, embedded images and
external images. Embedded ones always got shown, and it's the external ones
that are the issue.

------
pwnna
Am I the only one who just has Thunderbird in plain text mode with Gmail and
never displays images?

~~~
bingofuel
This actually brings up a good point. Does this affect IMAP users or any users
that are not using a Google Gmail app (iOS, Web, Android, etc.)? My guess is no?

~~~
kybernetyk
Unless Google rewrites the mails I'd say that IMAP/POP users shouldn't be
affected by this.

------
natwharton
Q: If I set 'display images' to 'off', will Google still retrieve images when
I open my email?

If so, then anyone can include an invisible image and always know when I open
the email.. whereas before they had no way of doing this.

------
acgourley
Why couldn't google show images by default before turning on the cache? I
assume it's a security issue, but would be interested to hear the reason in
detail.

~~~
patio11
Among other reasons, imagine me sending support@example.com an email with
<img src="http://localhost:3000/carefully-constructed-url" /> if I knew
Example.com was a Rails shop in January 2013. That could have been oodles of
fun. localhost:3000 is one of the many, many examples of things that could be
put there. Other examples include probing for internal redmine instances,
attempting to compromise dev/staging servers which are firewalled from outside
traffic, etc etc.

This is _not_ a risk if Google proxies the image -- they'll proxy a 404,
because Gmail's servers don't have privileged, cookied access to apps on your
internal network, dev boxes, etc.
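
A sketch of the rewriting step, assuming a hypothetical proxy endpoint at `imageproxy.example`. The real proxy's URL scheme is not described in this thread.

```python
import re
from urllib.parse import quote

# Hypothetical proxy endpoint, used only for illustration.
PROXY_PREFIX = "https://imageproxy.example/fetch?url="

# Matches the src attribute of <img> tags that use double quotes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def proxy_images(html: str) -> str:
    """Point every <img> at the proxy so the reader's browser never
    requests the original URL itself."""
    def rewrite(match):
        original = quote(match.group(2), safe="")
        return match.group(1) + PROXY_PREFIX + original + match.group(3)

    return IMG_SRC.sub(rewrite, html)
```

After rewriting, the `localhost:3000` request is made from the proxy's network instead of the recipient's machine, so it never reaches the recipient's dev box or intranet.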

~~~
cypherpunks01
Good point about an outsider potentially poking at internal.corporate.com.
Though you could only trick support@example.com into making a GET request in
this manner, right? Which ideally doesn't change data, but obviously exposes
bigger attack area for vulnerabilities like the rails one.

~~~
kogir
Rails in January 2013 was mentioned specifically because a series of security
bugs allowed attackers to achieve remote code execution with specially crafted
URL parameters.

Thus someone could get a remote shell on your box running as the rails
account, not just access to an internal application.

------
marquis
We sent a newsletter out today with Mailchimp and didn't notice any difference
in opens - I have image display off by default but I think most people don't
care, or are using a mobile device and quite happy to see the images. I
personally think making it harder to track opens is a good thing. Like
Mailchimp says, make your content worth viewing.

------
eli
There's a nice technical explanation of what's going on here:
[http://emailexpert.org/gmail-breaks-email-marketing-again/](http://emailexpert.org/gmail-breaks-email-marketing-again/)

------
znmail2003
Wonder if anyone's tested setting no-cache/expires headers on the beacon. I'd
expect Google to honor the cache headers because it's a good citizen of
the web.
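
For anyone who wants to run that test, a sketch of a beacon response with caching switched off. Whether Gmail's proxy honors these headers is exactly the open question, so treat this as the experiment's setup and not a known workaround. `beacon_response` is a made-up helper, not part of any framework.

```python
import base64

# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def beacon_response():
    """Return (status, headers, body) for a tracking pixel that asks
    intermediaries not to store or reuse it."""
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL)),
        "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
        "Pragma": "no-cache",  # for HTTP/1.0 caches
        "Expires": "0",        # invalid date, treated as already expired
    }
    return 200, headers, PIXEL
```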

------
mikeg8
I was waiting to hear from you guys and the answer was pretty much what I
expected. Thanks for the update, Mailchimp is such a great service.

------
mschuster91
Well, Google _never_ can enable pre-caching of images - simple reason: it
would allow an _instant_ check if the email address is valid. Just send a mail
and wait a bit - and you'd know if the mail was valid.

~~~
tedunangst
You know how long it takes gmail servers to respond to an invalid recipient
with a 550? About two seconds.

~~~
Sami_Lehtinen
2 seconds? Not true at all, it's just tens of milliseconds. Didn't time it
exactly, but it seems 'instant', so it's less than 100 ms.

~~~
tedunangst
Maybe it hiccuped? It seemed instant for a valid address, but a little slower
for invalid. I assumed it keeps used addresses in cache but had to go digging
to confirm a negative, but it could be a fluke. Anyway, it's faster than
waiting for the image proxy.

