
Google Webfonts, the Spy Inside? - plurby
http://fontfeed.com/archives/google-webfonts-the-spy-inside/
======
cromwellian
This may be an unpopular sentiment, but here goes.

The hyperbole over this kind of reasoning threatens the very fabric of the
Web. Snowden did the world a service in revealing all of the NSA hacking going
on, but the paranoia that is resulting from this is breaking the original
spirit of the Web.

It is, after all, a Web of links, and those links were intended to be not just
between siloed content, but between different sites owned by different people.
Links, by their nature, permit tracking. All you need is for sites to pool
their web logs and collude; you don't even necessarily need fancy JS tracking.

When Web 2.0 was ushered in, there was an early euphoria in the community, of
everyone offering transparent data and APIs to their sites, and people being
able to easily compose content and services between multiple actors to make
new sites and services.

It is one of the things that makes the Web better than native -- the ability
to compose parts of the Web. No need for stuff like OpenDoc, or other notions
of document composition, all you need are URLs and semantic elements that
import or interface with external resources.

In the pursuit of paranoia levels of "privacy", what will we lose? Will we
balkanize everything into content silos?

I'm not against trying to make things more "private by design", e.g., proxying
to scrub requestors before the CDN sees the hit, or replicating resources
locally. But if we take this to the extreme, we end up making local copies of
everything, and the Web loses some of the semantic information from its graph
that I think is valuable to retain.

~~~
MichaelGG
A simple "noreferrer" (or referer if you like) tag on elements or in pages
would solve a lot of this. 3rd parties would obviously still get the request,
but they wouldn't know what page it comes from.

Interesting that "nofollow" got adopted so quickly for spam. So it shouldn't be
hard to have a "noreferrer" tag added, right?

Yes, users can install addons to modify header behaviour, but site designers
should be able to use third parties without disclosing things, too. Not just
privacy, but security. Currently, apps need to implement a bouncer page to
hide sensitive referrers.

~~~
petercooper
_So it shouldn't be hard to have a "noreferrer" tag added, right?_

It's so not hard that it's actually already a part of HTML5 and supported by
several browsers :-)
[http://www.w3.org/TR/html5/links.html#rel-noreferrer](http://www.w3.org/TR/html5/links.html#rel-noreferrer)

~~~
MichaelGG
Seems like it isn't on link, image, or script elements though, which is the
way most third party content gets loaded.

------
dewitt
I'm not speaking in any official capacity, but to at least get the
conversation started off with data, here's Google's public FAQ regarding the
Fonts API privacy policy:

    
    
      https://developers.google.com/fonts/faq#Privacy
    

_What does using the Google Fonts API mean for the privacy of my users?_

The Google Fonts API is designed to limit the collection, storage, and use of
end-user data to what is needed to serve fonts efficiently.

Use of Google Fonts is unauthenticated. No cookies are sent by website
visitors to the Fonts API. Requests to the Google Fonts API are made to
resource-specific domains, such as fonts.googleapis.com,
googleusercontent.com, or gstatic.com, so that your requests for fonts are
separate from and do not contain any credentials you send to google.com while
using other Google services that are authenticated, such as Gmail.

In order to serve fonts as quickly and efficiently as possible with the fewest
requests, we cache all requests made to our servers so that your browser only
contacts us when it needs to.

Requests for CSS assets are cached for 1 day. This allows us to update a
stylesheet to point to a new version of a font file when it’s updated. This
ensures that all visitors to websites using fonts hosted by the Google Fonts
API will see the latest fonts within 24 hours of their release.

The font files themselves are cached for one year, which is long enough that
the entire web gets substantially faster: When millions of websites all link
to the same fonts, they are cached after visiting the first website and appear
instantly on all other subsequently visited sites. We do sometimes update font
files to reduce their file size, increase coverage of languages, and improve
the quality of their design. The result is that website visitors send very few
requests to Google: we only see 1 CSS request per font family, per day, per
browser.

We do log records of the CSS and the font file requests, and access to this
data is on a need-to-know basis and kept secure. We keep aggregated usage
numbers to track how popular font families are, and we publish these
aggregates in the Google Fonts Analytics site. From the Google web crawl, we
detect which websites are using Google Fonts, and publish this in the Google
Fonts BigQuery database. To learn more about the information Google collects
and how it is used and secured, see Google's Privacy Policy.

For further technical discussion of how Google Fonts serves billions of fonts
a day to make the web faster, see this earlier tech talk from the Google
Developers YouTube channel.
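
As a rough illustration of the caching rules quoted above (a sketch under stated assumptions, not Google's implementation): the two lifetimes come from the FAQ text, while the names `CACHE_LIFETIMES`, `needs_request`, and `css_requests` are made up for this example.

```python
from datetime import timedelta

# Lifetimes as quoted in the FAQ: CSS for 1 day, font files for 1 year.
# The dictionary keys are illustrative labels, not API names.
CACHE_LIFETIMES = {
    "css": timedelta(days=1),
    "font": timedelta(days=365),
}


def needs_request(kind: str, age: timedelta) -> bool:
    """Return True if a cached copy of this age is stale and must be refetched."""
    return age >= CACHE_LIFETIMES[kind]


def css_requests(days_browsing: int, families: int) -> int:
    """CSS requests implied by the FAQ: one per font family, per day, per browser."""
    return days_browsing * families
```

Under these assumptions, a browser that meets two font families over a week sends at most `css_requests(7, 2)` CSS requests, and no font-file requests at all once the files are cached.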

~~~
justcommenting
This particular issue has come up in previous HN discussions, but I would draw
people's attention to innocuous and quite reasonable-sounding phrases like
"need-to-know basis." What does that really mean for a company like Google,
whose core business model fundamentally depends on extensively data-mining
user information? "Need-to-know" could mean almost anything, or whatever
Google wants it to mean. This is a classic Google privacy strategy:
controlling the debate by defining the terms.

Despite a reassuring policy, you, as the website visitor, don't get to decide
these things and to the extent possible, the fact that this is even happening
is abstracted away from most non-technical users.

Another example of Google's brilliance in 'controlling the debate by defining
the terms': policies like this cleverly (but wrongly) lead the reader to
assume that cookies are the only way Google tracks users or correlates their
activities. What about TLS-based tracking mechanisms, for example?

But this is a problem that's bigger than Google. When information accumulates
in distinct places, the value of exploiting that information always increases.
Eavesdroppers naturally move to those places to exploit that information,
sometimes with a legal backing (NSA/GCHQ) and sometimes without one (Aurora
attacks, and other NSA/GCHQ activities).

Even if you interpret Google's pronouncements charitably, it would be a
mistake to assume that using the Google Fonts API can't or won't harm user
privacy. Google is a massive target for essentially all eavesdroppers, and the
Aurora attacks (and other breaches with lower profiles) show that the
accumulation of information--even under reasonable-sounding terms like
Google's--can still end up in the wrong hands, and can be an inherently
dangerous thing for user privacy.

~~~
lallysingh
(not talking for google) Two quick points:

- The fonts have to be hosted somewhere. And the more common the hosting site
is, the better the browser cache behavior is.

- The cache behavior prevents requests from going out. If the font is cached,
then there's no web request going back to Google. And there's no web request
on the wire for NSA/GCHQ/Verizon to sniff.

As for the terminology, I personally think that there should be some standards
for defining the terminology and criteria, so that we can get human-readable
privacy policies without becoming uselessly vague, drifting into a discussion
of how some backend systems work, or turning into a giant mess of legalese.

~~~
justcommenting
It really depends on the relevant counterfactual; yours makes total sense from
the vantage point of lots of developers, but I tend to prioritize privacy and
autonomy. When I visit catphotos.wordpress.com, my intention is not to leak
information to Google even though they have great fonts. My intention is just
to visit the website.

So the counterfactual I would frame the discussion with would be something
more like self-hosting fonts by default and prioritizing privacy over
performance (different strokes for different folks, and I realize it can be a
significant performance hit).

To respond to your "more common the hosting site is" comment, Wordpress is
_also_ extremely common, and they probably could have devised alternative
solutions by making different trade-offs.

Cache behavior resulting in fewer requests can be a double-edged sword, too:
if you cache fonts with clients, you're probably _also_ caching a bunch of
other things that may decrease your privacy in other ways. There are many
layers of indirection, especially with NSA/GCHQ/Verizon.

I wouldn't argue that this and other services offered by Google don't add
value for developers and even users (they absolutely do), but my argument is
mainly that there are costs--maybe distant/abstract/indirect costs in terms of
privacy/autonomy that are difficult to discuss in concrete terms, but costs
worth considering nonetheless.

I wish WordPress had been more thoughtful about the trade-offs they made.

~~~
acdha
> I wish WordPress had been more thoughtful about the trade-offs they made

More accurately, you wish that WordPress had agreed with your priorities. They
clearly did think about this and made a different decision and it's unfair to
suggest otherwise.

------
kordless
On the face of things, concern over this type of 'privacy violation' seems to
be reasonable. However, coming from a page that is loading content from
Fontfeed, Twitter, Gravatar, Google APIs, Fontshop and lo...Google Analytics,
I think it's a bit of a silly argument.

If you want privacy, don't expect the sites you are hitting to take care of
that for you. If you expect others to enforce security for you, well then, you
have an entirely different problem.

~~~
a3_nm
This doesn't imply that, as a webmaster, you should be fine with asking
browsers to load external resources from third parties with potential privacy
implications, just because privacy-conscious users should have disabled it by
themselves.

~~~
kordless
I'm not sure, as a webmaster, that I have any better control over my users'
data than Google's font service does. If I think I do, I'm probably naive.

------
acdha
A good way for Google to address this would be by enabling CORS and
encouraging the use of crossorigin=anonymous to avoid credentials being sent
for fonts:

    <link href='http://fonts.googleapis.com/css?family=Open+Sans'
          rel='stylesheet' type='text/css' crossorigin='anonymous'>

Unfortunately, a quick test
([http://chris.improbable.org/experiments/browser/webfonts/goo...](http://chris.improbable.org/experiments/browser/webfonts/google-fonts-crossorigin.html))
shows that this can't be done currently because fonts.googleapis.com doesn't
send an Access-Control-Allow-Origin header:

[https://redbot.org/?uri=http%3A%2F%2Ffonts.googleapis.com%2F...](https://redbot.org/?uri=http%3A%2F%2Ffonts.googleapis.com%2Fcss%3Ffamily%3DDancing%2BScript)

(Oddly, the actual fonts are served with "Access-Control-Allow-Origin: *" so
it works if you self-host the CSS, which would presumably be a bad idea:
[https://redbot.org/?uri=http%3A%2F%2Ffonts.gstatic.com%2Fs%2...](https://redbot.org/?uri=http%3A%2F%2Ffonts.gstatic.com%2Fs%2Fdancingscript%2Fv6%2FDK0eTGXiZjN6yA8zAEyM2S5FJMZltoAAwO2fP7iHu2o.ttf))

This is the behaviour defined in the HTML5 spec:

[https://html.spec.whatwg.org/multipage/infrastructure.html#c...](https://html.spec.whatwg.org/multipage/infrastructure.html#cors-settings-attribute)

In some ways, this feels like an oversight in the spec because
crossorigin=anonymous is actually better than the legacy behaviour but any use
of the crossorigin attribute triggers mandatory full CORS checks.
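
To make the check concrete, here is a simplified sketch of what a browser does with the response once `crossorigin=anonymous` is set. The function name `cors_allows` and the example origin are hypothetical, and real browsers apply more rules than this.

```python
def cors_allows(response_headers: dict, page_origin: str) -> bool:
    """Simplified Access-Control-Allow-Origin check for an anonymous
    (credential-less) cross-origin request."""
    # Header names are case-insensitive, so normalise before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the browser refuses to use the response.
        return False
    # Either the wildcard or an exact match on the requesting page's origin.
    return allowed == "*" or allowed == page_origin
```

With no `Access-Control-Allow-Origin` header on the CSS response the check fails, which matches the test result described above, while a response carrying `Access-Control-Allow-Origin: *` passes for any origin.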

~~~
LeoNatan25
Google has little to gain from these.

~~~
acdha
Google has an interesting position advocating for improved privacy and
security. This would be a cheap way for them to back that up at relatively
minimal expense.

------
majika
I use NoScript and Policeman on Firefox, with conservative settings (disallow
all active content (scripts, fonts, WebGL), whitelist-only cross-site
requests). I've also configured Firefox to block cookies by default; only
permitted sites can store cookies for the session, and just a handful I allow
permanent cookies.

Web pages load much quicker, Firefox uses fewer resources, my browsing is
significantly more secure (see [1] for risk of loading arbitrary fonts), and I
can browse the web without Google/Facebook/AdvertizingCorp (and thus the Five
Eyes) building a profile of everything I do. It's a nice feeling.

This setup also blocks ads served from third parties, which I feel is an
agreeable compromise on web advertising. If I send a request to your website,
and you send me a document with embedded images stored on your website, I'll
download them and view them alongside the page. However, if you try to tell me
"go send 5 insecure requests to each of these three companies you've never
heard of, and execute their 20KB of code, to get flashing ads alongside this
page" - I'll ignore you.

Sites loading resources from external domains (usually Google) is nothing new.
I've been browsing this way for two years now, and I've developed a healthy
level of contempt for 95% of web developers. The vast majority of them just
don't care for their users; campaigning to get the developers to change their
habits is a broken model. Ultimately, you have to take control, and decide for
yourself what you want to run on your computer.

I don't know why more people don't browse this way; some actually ridicule
this approach ("get with the times"). It boggles the mind.

[1]: [https://hackademix.net/2010/03/24/why-noscript-blocks-web-fo...](https://hackademix.net/2010/03/24/why-noscript-blocks-web-fonts/)

~~~
MichaelGG
The real quote there being:

" It really worries me that the FreeType font library is now being made to
accept untrusted content from the web.

The library probably wasn't written under the assumption that it would be fed
much more than local fonts from trusted vendors who are already installing
arbitrary executable on a computer, and it's already had a handful of
vulnerabilities found in it shortly after it first saw use in Firefox.

It is a very large library that actually includes a virtual machine that has
been rewritten from pascal to single-threaded non-reentrant C to reentrant
C... The code is extremely hairy and hard to review, especially for the VM.

"

FreeType's news page
([http://www.freetype.org/index.html#news](http://www.freetype.org/index.html#news))
has something very curious: two fixes for the same CVE, with the second fix
coming 9 months later. A look at its CVEs [1] is also interesting in that
they're all memory safety issues (at least, from a quick glance). So in 2014,
it's still difficult to read fonts without exposing yourself to code execution
vulnerabilities, eh? I'd imagine better languages would help here.

[1]: [http://web.nvd.nist.gov/view/vuln/search-results?adv_search=...](http://web.nvd.nist.gov/view/vuln/search-results?adv_search=true&cves=on&cpe_version=cpe:/a:freetype:freetype:2.0)

------
pothibo
I'm not using external services for fonts anymore because they're blocking.
That means your time to first render is directly impacted by the time it takes
your users to download fonts from a third party.

And have you ever landed on a fully rendered site but couldn't see the text?
Chances are high that it's a third-party font that can't be downloaded for
whatever reason.

------
joosters
Another good reason to install Privoxy -
[http://www.privoxy.org/](http://www.privoxy.org/)

Add the following to the config and you'll still be able to retrieve fonts and
other shared stuff from Google's servers, but it'll block any tracking cookies
and hide the referring site:

    
    
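      # Goes in a Privoxy action file such as user.action.
      # Action parameters take braces, e.g. {forge}.
      { +crunch-incoming-cookies \
        +crunch-outgoing-cookies \
        +hide-referrer{forge} }
      .googleapis.com
      apis.google.com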
    

Unfortunately, it won't help against SSL sites.
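
For readers who don't run Privoxy, the effect of those actions on an outgoing request can be sketched roughly as follows. This is an illustration only, not Privoxy's code: the function name `scrub_request_headers` is hypothetical, and it assumes that "forge" means replacing the referrer with the root of the server being requested.

```python
from urllib.parse import urlsplit


def scrub_request_headers(url: str, headers: dict) -> dict:
    """Drop outgoing cookies and forge the Referer to the target server's root."""
    parts = urlsplit(url)
    # Remove the Cookie and the real Referer headers (names are case-insensitive).
    cleaned = {
        name: value
        for name, value in headers.items()
        if name.lower() not in ("cookie", "referer")
    }
    # Assumption: a forged referrer points at the requested host itself,
    # so the third party never learns which page embedded the resource.
    cleaned["Referer"] = f"{parts.scheme}://{parts.netloc}/"
    return cleaned
```

The third party still sees the request and the client's IP address; only the cookie and the originating page are hidden.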

~~~
userbinator
_Unfortunately, it won't help against SSL sites._

Proxomitron will, although it's not open-source, its author has passed away,
and it's only being maintained by the community. Among the things I use it to
block are these "unexpected links to Google" and if it's something like jQuery
or fonts I can have the proxy host it locally.

------
Nux
Aren't these web fonts just files they can include with their code? Why
include anything from any 3rd party? It's a security and privacy issue.

~~~
kbar13
Using font files from a popular public CDN like Google Fonts is a good idea, as
they are generally highly available and often already cached on the user's
machine from use on other sites.

~~~
Nux
Kbar, why do you need "highly available" fonts when you can bundle them in
your web site? If the web site is up, fonts will work; if not, fonts won't be
needed anyway.

Regarding caching, does anyone know how browsers cache content? I.e., if I host
my own fonts and someone visits me, then visits another web site with the same
fonts, are they retrieved from the cache or downloaded yet again? I'm guessing
they are downloaded again, which is unfortunate.

~~~
lallysingh
AFAIK, the fonts are hosted as regular URLs with cache policies specified in
HTTP headers. So, if you host your own fonts, and someone visits your site and
someone else's site with the same fonts, they will download it twice (unless
the other site's referring to your site in the URL). The browser doesn't know
that they're the same font until after it downloads it (twice).
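
A toy model of that behaviour, assuming a cache keyed purely by URL (the class `UrlKeyedCache` and the example URLs are hypothetical, and real browser caches also honour the HTTP cache headers mentioned above):

```python
class UrlKeyedCache:
    """Minimal cache that identifies resources by URL only."""

    def __init__(self):
        self._store = {}
        self.downloads = 0

    def fetch(self, url: str, origin_content: bytes) -> bytes:
        """Return the cached body for url, 'downloading' it on a miss."""
        if url not in self._store:
            # Cache miss: the bytes have to come over the network.
            self.downloads += 1
            self._store[url] = origin_content
        return self._store[url]


font = b"identical font bytes"
cache = UrlKeyedCache()
cache.fetch("https://site-a.example/fonts/sans.woff", font)
cache.fetch("https://site-b.example/fonts/sans.woff", font)  # same bytes, new URL
cache.fetch("https://cdn.example/fonts/sans.woff", font)     # shared CDN URL
cache.fetch("https://cdn.example/fonts/sans.woff", font)     # second site, same URL: hit
```

The identical file is downloaded once per distinct URL, so only the shared CDN URL benefits from an earlier visit to a different site.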

~~~
walterbell
Fonts are a small subset of a site that makes a conscious decision to use a
third-party font. There are many other resources being downloaded, which may
or may not be duplicated. If the sites are revisited, they will be cached.

------
mp3geek
For the end user, 3rd party fonts can be blocked via Adblock Plus:

[https://secure.fanboy.co.nz/filters.html](https://secure.fanboy.co.nz/filters.html)

~~~
LeoPanthera
Or Ghostery, which only blocks trackers, but allows privacy-friendly ads.

~~~
mp3geek
"privacy-friendly ads", all ads need to track on some level, so it will only
depend on which company you trust more.

------
rayiner
What the hell is wrong with the fonts my browser already has installed?

~~~
MichaelGG
Going through exactly this when trying to optimize a site for a large company.
Designers ended up with a little cursive font for headings. Getting approval
from bigcorp takes forever. So many months in, that's the design.

There's no cursive-looking font that is commonly available on all machines. So
we either have to use a font, or render images. Since it's used in more than a
couple places, the font ends up taking less time.

Or I could try fighting the design, customer relationship, corporate branding,
etc. teams to convince them the site's better off with just "Sans-Serif". In
short: fonts aren't going anywhere.

Now, it would be nice if a dozen or two fonts of varying styles were included
by default with browsers. But I imagine that'd just lead to designers raging
about how <someone> is trying to limit creativity and enforce conformity.

FFS, many fonts don't even render properly on Firefox on Windows (like on
medium.com) so I highly doubt usability is being considered as a main priority
here.

------
PythonicAlpha
Is this only a font problem? Many sites serve jQuery and other files from
Google's CDN. Aren't these the same problem? There are also other widely used
CDNs that could affect your privacy. Won't it be hard to get very far on the
web if you want to avoid all of these?

------
justcommenting
This seems to be a growing problem, even among so-called 'privacy advocates.'
The last time I checked, EFF's Privacy Badger extension was designed around
having no qualms about making exactly these types of bad trade-offs on users'
behalf.

The cost of your privacy--even in the eyes of the EFF--sometimes is worth
little more than reliably serving a font or a copy of jQuery and claiming to
respect 'Do Not Track'.

------
grimmdude
I do find it odd that Wordpress would call these fonts from the authenticated
section of the site. If it was just a bundled theme or plugin served publicly
I don't really see the big deal, it is the _web_ after all. But in my
opinion authenticated sessions are authenticated for a reason, thus requesting
assets from an un-authenticated resource does seem to be a concern. Just
bundle the font!

------
JetSpiegel
I still don't understand why the major browsers don't ship with, at the very
least, a copy of jQuery installed locally, and then create a way to replace
that URL with the locally installed version. No request made, faster access
times; is there any downside?

~~~
GeneralMayhem
If sites are reliably hitting a major CDN (like Google) for jQuery, then you
get that advantage through caching anyway. The problem is that they don't, and
if they're hosting their own jquery.js, there's no way to know before you
download it that the script can safely be replaced with the known jQuery. I
can imagine a scheme where the browser sends a hash of what it thinks the file
is and the server only sends new content if it's different, but that would be
a massive change, probably at the protocol level, to standardize, and doesn't
do you any good if the bottleneck is a server that's slow to respond.
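
A minimal sketch of the imagined scheme, to make the trade-off visible. The names `digest` and `serve` are hypothetical, SHA-256 is just one possible hash, and no existing protocol is being described.

```python
import hashlib
from typing import Optional, Tuple


def digest(body: bytes) -> str:
    """Content hash the browser would send for the file it already holds."""
    return hashlib.sha256(body).hexdigest()


def serve(body: bytes, client_digest: Optional[str]) -> Tuple[int, bytes]:
    """Server side: skip the payload when the client's copy is identical."""
    if client_digest == digest(body):
        # The client already has exactly these bytes: send no body.
        return 304, b""
    # Unknown or different content: send the full file.
    return 200, body
```

Even in the best case the browser still makes a round trip to learn that its copy is fine, so this saves bandwidth but not latency when the server is slow to respond.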

~~~
cpeterso
You've just reinvented HTTP ETags. :) It's a cookie-like resource caching
mechanism where the server can return an arbitrary value (usually a hash or
timestamp).

[https://en.wikipedia.org/wiki/HTTP_ETag](https://en.wikipedia.org/wiki/HTTP_ETag)

~~~
oldmanjay
Well, no. There is no requirement in the standard for how ETags are
implemented, so there is no way to use them outside of treating them as
server-specific opaque tokens. They are useless for the parent's concept.
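
To illustrate the opacity point: two servers holding byte-identical files can legitimately produce unrelated ETags. Both generator functions below are hypothetical examples of common styles, not taken from any particular server.

```python
import hashlib


def etag_from_hash(body: bytes) -> str:
    """One server might derive its ETag from the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_from_mtime(mtime: int, size: int) -> str:
    """Another might derive it from file metadata instead."""
    return f'"{mtime:x}-{size:x}"'


body = b"/* the same jquery.js on both servers */"
tag_a = etag_from_hash(body)
tag_b = etag_from_mtime(1400000000, len(body))
```

Since `tag_a` and `tag_b` differ for the same bytes, a browser cannot compare ETags across sites to discover that it already holds the file.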

------
throwaway420
An even better reason to avoid using Google fonts is that they're frequently
slow as molasses. Part of the issue is that fonts are frequently used badly
and that browsers often don't handle them well, but it's still one of the more
annoying things online.

------
tiatia
Can you not just put 127.0.0.1 fonts.googleapis.com in your hosts file?

------
ncza
Thank you for making me aware of this insanity. I'll make sure to block those
on my sites. The thoughtless denial of privacy is so weird, no one seems to
mind letting third parties spy on their visitors. Yes, you, Google, jquery,
cloudflare, typekit, gravatar, disqus and whatever your names might be.

~~~
kordless
The only person that really cares about your privacy is you. It's cognitive
dissonance to believe otherwise.

~~~
Squarel
And yet plenty of people with the knowledge and means to ensure their privacy
online and elsewhere get spectacularly vocal about the privacy of those who
have neither.

~~~
kordless
I think we _should_ get spectacularly vocal about protecting people-who-
don't-know-better's privacy rights. We created this damn thing called the
Internet, and sold it to them as this awesome tool. We should at least try to
secure it better.

It is my belief that decentralizing services is one possible solution to this
problem. At a minimum, having the user store their data at home on servers
they plug in and turn on may be the solution. We're a ways out from that, but
I'd much rather see people pushing the argument toward decentralization than
faulting a website owner for using nice looking fonts for their gluten free
pie crust recipe. Everyone knows what happens when hackers get access to your
pie crust. They eat it.

