
Tracking users via CSS - lawik
https://underjord.io/is-this-evil.html
======
TomGullen
What extra data can you track using this method over just normal HTTP logging?

The only things I can think of are how a user interacts with a page - I don't
particularly think this is too concerning - although as with all these things
there are possibly much more creative uses of it that I haven't considered.

There's a new image attribute, loading="lazy", which generally loads an image
when it approaches the viewport. This could also be "abused" in similar ways.

If this does turn out to be a privacy concern, browser settings/privacy addons
could simply load all lazy images, and all images referred to in CSS/JS files,
on page load, which would nullify this technique.

~~~
lawik
You can mostly track the interactions. Just the fact that CSS was loaded
distinguishes the user from a lot of automated traffic. You should be able to
track time spent on the page with CSS animations as well, up to a point;
that's mentioned in the post.
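As a rough sketch of the animation idea (the endpoint URLs here are hypothetical): a step-wise keyframe animation swaps background images on a schedule, and in at least some browsers each image is only requested when its keyframe becomes active, so the last URL in the server log approximates time on page.

```css
/* Sketch only: each keyframe points at a different (hypothetical)
   tracking URL. Browsers that fetch background images lazily will
   request each URL roughly when its keyframe applies. */
body {
  animation: time-on-page 60s step-end forwards;
}

@keyframes time-on-page {
  0%   { background-image: url("https://example.com/t?s=0"); }
  25%  { background-image: url("https://example.com/t?s=15"); }
  50%  { background-image: url("https://example.com/t?s=30"); }
  100% { background-image: url("https://example.com/t?s=60"); }
}
```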

I don't think you can do anything particularly nasty even with CSS variable
programming which can apparently be used for interactive games
([https://github.com/propjockey/css-sweeper](https://github.com/propjockey/css-sweeper)).
While looking into things for this post, I couldn't come up with a non-JS way
to move much meaningful data into CSS.

~~~
TomGullen
HTTP logging can give a pretty strong indicator of time spent on pages (not
perfect, but good with large volumes) as well as navigated routes through the
site already.

It would seem to me that the only advantage of this technique is to track
mouse movements on the page in a very low resolution, and likely labour
intensive way. Low resolution because it only works the first time on each
page load, and is going to be tricky to get any meaningful granularity on the
data.

I am not particularly concerned about this but my privacy concerns are
definitely lower than the general consensus of this site. I only compare to
HTTP logging as this is the most hidden/covert way of tracking users.

~~~
lknik
I used this (via :hover) to play with mouse movement tracking. The resolution
was significantly worse than the JS approach. But it worked, and even allowed
running some ML analysis on mouse movement patterns. That was about 10 years
ago, so I no longer have the code/site.
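A grid-of-cells version of that idea could look roughly like this (class names and URLs are hypothetical); each cell's first hover fires one request, so the access log yields a coarse trace of where the mouse went:

```css
/* Invisible fixed-position cells tile the viewport; hovering a cell
   fetches its (hypothetical) tracking URL once, logging a coarse
   mouse position server-side. */
.cell { position: fixed; width: 10vw; height: 10vh; }

.cell-0-0:hover { background-image: url("https://example.com/m?c=0-0"); }
.cell-0-1:hover { background-image: url("https://example.com/m?c=0-1"); }
.cell-0-2:hover { background-image: url("https://example.com/m?c=0-2"); }
/* ...one rule per cell, typically generated server-side... */
```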

------
achairapart
Ad-blockers have been around for years, and I have always wondered why the ad
industry hasn't moved to server-side yet. Then I realized that maybe they
weren't hurt by this at all; if anything, it helped with the 50% of budget
that advertisers often claim goes to waste.

Much like Nigerian scams self-select the most naive victims with their silly
stories, some advertisers may well get better impression/click ratios once
the savvy users are out of their game.

~~~
thehappypm
The ad industry has overwhelmingly moved to server side. Google AdWords,
Facebook ads, Instagram ads, sponsored search results on travel sites. Server
side means you need more trust that it’s an actual user and not a server farm,
so it’s less usable outside the walled gardens.

~~~
nieve
This is also part of why it's vital that uBlock and others block ad elements
via heuristics as well as URL patterns, and almost certainly a chunk of why
Google doesn't look favorably on those techniques.

------
onion2k
It depends what "this" refers to.

If it's CSS, then no.

If it's loading an external image, then also no. Certainly no more evil than
any other method of getting a user's browser to make an HTTP request anyway.

If it's tracking users then _maybe_. Gathering data is evil unless you have a
_very_ good reason. If you're gathering it and not actually using it then that
is _definitely_ evil. If you're gathering everything _in case_ you need it
then that is also evil. If you're gathering data that's unique to individuals
that's even worse. If you're gathering data that's unique to individuals, and
keeping it, and using it to build up profiles by blending it with other
sources, _and then selling the information_ that's _really_ evil.

Just gathering browser agent strings or screen resolutions though, it's not
terrible. Although I do wonder why you need CSS analytics rather than just
using the server log from the request for the HTML file.

~~~
bnegreve
> I do wonder why you need CSS analytics rather than just using the server log
> from the request for the HTML file.

From the article:

 _Lots of automated traffic on the web, bots, crawlers and scrapers. So if
there is a way that can remove most of the automated traffic without loading
any JS, is that a win?_

    
    
    body:hover {
        background-image: url("https://underjord.io/you-was-tracked.png");
    }
    

_[...] This has a certain elegance because it actually requires mouse
interaction._

~~~
jiofih
The vast majority of bots and scrapers today are headless browsers, so I’m not
sure that would be very effective.

~~~
chownie
I think that's the point, a headless browser is unlikely to trigger :hover
behaviour.

~~~
hoistbypetard
> I think that's the point, a headless browser is unlikely to trigger :hover
> behaviour.

That is not true. I just tried it with headless chrome and triggered :hover
behavior immediately just by synthesizing mouse movement the same way I would
by using it as a scraper.

~~~
gruez
I think his point is that the average scraper isn't going to bother simulating
mouse movements to trigger the hover behavior.

~~~
hoistbypetard
Not specifically for that purpose, sure. I didn't simulate the mouse movements
for that purpose. The hover triggered when I simulated scrolling and clicks
the same way I do when I want to scrape sports stats from sites that hide them
behind scroll movements or mouse clicks.

------
sriku
The biggest loss of privacy (my personal view) is that we've lost the ability
to read without being observed, and that's important for maintaining a healthy
and diverse mindset in society. It enables people to read without fear of
"persecution". That puts pretty much all of the "analytical web" in the "evil"
basket for me.

edit: autocorrect wrote "prosecution" instead of "persecution". Fixed it.

~~~
bostonvaulter2
This is a good point. One potential workaround is to implement something
Usenet-style, where a whole corpus would be downloaded to your device and then
you'd just load all the data locally. Of course, only a small fraction of
content would be available this way, which has its own set of biases.

------
feralimal
I like this sort of question, though it is akin to preparing to re-tune your
violin in order to do some fiddling while Rome is burning, given just how
thoroughly we have lost all privacy.

In an ideal world, it would be up to the user what they want to disclose. So
perhaps there should be no logging at all. And having loaded a page, the page
should work 'offline' by default, with no further interaction with the site.
I mean, that's how simple sites appear to work. The fact that they don't work
the way they appear to illustrates how technologists sell illusions for
profit.

~~~
ComodoHacker
In an ideal world, users would also disclose some behavioral data to help
webmasters get some meaningful feedback about their work.

In our world however, almost all users won't care, and the rest few won't
disclose anything out of suspicion of abusing that data.

~~~
feralimal
> almost all users won't care, and the rest few won't disclose anything out of
> suspicion of abusing that data

Yes, I wouldn't disclose.

But because the users don't want to give their data to you, that doesn't mean
that it is ethically OK to take it without them realising.

That's the rub with privacy. There needs to be some acceptance and tolerance
of an individual's decisions.

Instead, though, we have engineered disclosure. In a pure sense it's an act of
aggression against another.

~~~
withinboredom
I view it as looking out into the audience while saying what I have to say (in
the case of blog posts), or paying attention to who I talk to and using the
best strategy when doing door to door sales.

Do people put on a mask when they open a door for a stranger or when they go
to the market?

This kind of knowledge is strategic for both people that want to be heard or
to make a sale. Each kind of person requires a different approach. Being
observant doesn't require consent.

~~~
feralimal
Yes, but we all know what it is to go to a public place. We choose that.

When you are in your own home, looking at a website, you think you are on your
own, reading something, like a book. You do not have a sense of interacting,
or of being in a public space. You would have that sense of interacting if you
were on an Internet forum, or Discord, etc.

A book does not report on you (and I'm talking about paper books, not
Kindles!). But all websites monitor you. This is to say that the user is
gamed. They are misled into thinking they are looking at a private thing, like
a book, whereas in reality they are being monitored. It's the biggest
anti-pattern. It's absolutely without meaningful consent, and it is absolutely
baked into the internet. There is no privacy online.

But, in my perfect world, we should be allowed privacy and be able to go
online. Oh well, maybe in the next life!

~~~
withinboredom
I can see your point. Still, when I write a blog post, I’m writing to you,
specifically. Even though I don’t know who you are or anything about you.

It’s fair to “see” you there; to know my words had some impact. After all, you
sought my words. Do I need to know everything about you? No. And that’s where
this all breaks down.

Today’s tracking is like being a door to door salesman, and after trying to
sell you something, I stalk you day and night. Actually, I’m surprised someone
hasn’t tried to sue/charge Google et al for stalking.

~~~
feralimal
To be honest - I'm labouring the point for sure. I personally wouldn't care
about you getting some fairly anonymous data on me. But it would have been
nice if those organising the internet experience had given even a bit of
consideration to privacy, as opposed to how to eradicate it.

My real concern is how we have been turned inside out by corporate technology.
I mean there was a story going around a couple of years ago that Facebook knew
when a couple was going to split up before the couple did!

It's too much!

------
wegs
I find this not evil. If this is covertly used, that'd be evil.

I agree with the argument "If you do it to extract information from your user
to which they would not consent, it’s evil."

However, we tend to get caught up in the right-now and not think through
consequences. If this were widely used, browsers would implement the same
sorts of privacy controls they do around 3rd party cookies, JS, etc.

This seems like a more semantic way to do tracking than many other techniques.
It seems like it'd be easier for browsers to manage.

------
jiofih
The evil part in tracking is tracking _user behaviour_ and personally
identifying data. If you’re tracking overall metrics anonymously, it doesn’t
really matter if it’s done via HTML/JS/CSS, it is probably not evil.

------
surround
The EFF’s own website has analytics, somewhat ironically. But the information
they collect is limited, and the analytics are loaded from a separate domain
(anon-stats.eff.org) so it’s easy to block. EFF’s privacy policy:

[https://www.eff.org/policy](https://www.eff.org/policy)

I think that first-party analytics are kind of a gray area. Third-party
analytics are always evil.

~~~
lawik
Is it that clear-cut? Is a nice friendly org that considers the options and
picks something minimal and ostensibly ethical like Fathom or Plausible and
then just uses that to keep track of how they are doing evil? Or is that not a
third party, just first-party outsourced? Not sure what definition we're
working with here :)

~~~
surround
I’m late to respond, but yes. Third party analytics allows a centralized
entity to track your activity across multiple websites. No matter how privacy
friendly Fathom and Plausible claim to be, I would rather not have to trust
them.

Self-hosted analytics takes this out of the question.

------
lapcatsoftware
You can use the same technique with a:active to track link clicks, by the way.

Technically, this would all be relatively easy to block with your own user
style sheet. Practically, though, a lot of non-tracking sites rely on
background-image for essential functionality, so you'll see a lot of breakage.
It's a dilemma.
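The a:active variant might look like this (the URL is hypothetical); the request fires while the link is being pressed, before any navigation completes:

```css
/* :active matches while the mouse button is down on the link, so the
   background request marks the click itself, with no JS involved. */
a[href="/pricing"]:active {
  background-image: url("https://example.com/clicked?l=pricing");
}
```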

------
huhtenberg
Very clever, borderline ingenious.

The task of filtering out bots from server logs can get really tedious even if
there's JS involved. Being able to spot humans using this technique is really
quite helpful.

Edit - body:hover doesn't seem to work in Firefox, but it's trivial to work
around that.
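One such workaround, sketched here and untested across browsers: hang the rule off the root element instead of body.

```css
/* Same technique as body:hover, anchored to the document root, which
   spans the full viewport even when the body doesn't. */
html:hover {
  background-image: url("https://example.com/human.png");
}
```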

------
rdevsrex
I toyed around with pixel tracking like this before, with PHP and the GD
library. You just create a 1x1 white pixel and give the script a .png
extension (or whatever); as long as you've configured the server right, the
script will execute and return the pixel. But then you can do all the other
tracking you want, and the user doesn't know any different.

That said, I won't use it in the future, but it's scary how easy it is.

~~~
weego
This is how certain large newspaper sites do their user tracking, so editors
can improve their articles' engagement 'live'.

------
sildur
If I were a browser engine I would download all the assets, all the images,
whether the user hovered or focused or interacted in any way or not.

~~~
jfk13
Users on metered connections might not thank you.

------
Angeo34
Font and client-dimension fingerprinting are the reasons why people should
stop thinking Brave actually protects them from anything. Brendan, we both
know it's impossible to solve using a Chromium base; you being bitter against
Mozilla is a different story nobody cares about.

Don't take your personal grudge out on your users by fooling them into a
false sense of security, Brendan.

------
can16358p
It really depends on the purpose, IMO. If you are cross-matching that data
with other sources to track people, it is "evil" (in the sense people mean
when they describe tracking; I personally don't care). If you are using it for
your own statistics (how many people visited where, screen sizes, duration,
where they scrolled, etc.), I don't see any issue there.

------
neallindsay
This technique could be used for good (detect an automated harassment
campaign) or evil (unmask a protester agitating for societal change against a
powerful state).

As technologists we want to be able to look at a technology and discern if it
is good or evil. Unfortunately we don't always have enough information.

~~~
TomGullen
> or evil (unmask a protester agitating for societal change against a powerful
> state)

Can you give an example of how this technique could be exploited in such a
way?

~~~
eswat
Find a site the target visits frequently that lets you inject something that
will make a request to your server (“poisoning the well”). Use the gathered IP
address, User-Agent and Language headers to start building a persona to look
for.

~~~
TomGullen
And how do images in CSS help with that?

------
xlii
I find it a lesser evil. On one hand it's hidden analytics, but on the other
hand I find it far superior to cookies, which I carry around and which a lot
of different entities can track.

I do find the way of voting on this matter very interesting. Parsing the logs
to get the results - how amusingly nerdy!

~~~
athenot
> Parsing the logs to get the results - how amusingly nerdy!

This was a big use-case for the _Practical Extraction and Reporting Language_
2 decades ago... :)

Today with decent JSON logs, it's also quite fun.

~~~
xlii
Oh, I wrote my share of scripts in Perl and parsed logs too (I remember when
analytics was all about parsing access logs). But I wouldn't have thought to
organize a survey like that; I'd have put it in a DB. _Crafting_ a site to
utilize this is so simple, though. Much easier than adding a table to a DBMS,
for sure.

------
extremeMath
I'm a believer that these kinds of issues should be solved at the consumer's
end.

I haven't built a web browser, but I built a bot and it's somewhat doable to
avoid getting tracked.

A browser could feed a fake user agent and resize the window to a standard
size. After that, I believe it's only the IP address and cookies, which are
easy enough to block.

It even defeats the CSS tracking mentioned. "Oh someone downloaded image
6374tracker.png, but they were from UAE and are using Firefox" and are never
seen again.

My only weakness on this subject is the low-level headers; anyone familiar?

~~~
social_quotient
What low level headers are you thinking of here?

~~~
extremeMath
I'm going to name drop things I don't know about and might be irrelevant

Data/network/transport/session layer can be detected through TCP?

So you need to fudge a few digits somehow at your router.

------
ChrisMarshallNY
That's not a bad idea. I suspect that the author is not the first to come up
with it. It's better than a heatmap.

Like most tools, it is up to the user, as to whether or not it's "evil."

I'm reminded of that rather silly little speech at the beginning of Dark
Phoenix, where Xavier lectures Jean Grey about the uses of a pen.

If I were trying to understand users in something like A/B testing, I might
use the technique, but I'd probably only do so temporarily. I'd need to make
sure that the practice was outlined in the privacy policy.

------
raxxorrax
I clicked "This is evil" although I think that is just partially correct. The
problem with tracking is that it exploits functionality that wasn't intended
to identify users.

In the context of reality, it is very nice that the author even ponders it.
This is already less evil than what we can expect on the "modern" web, however
nefarious and tricky the mechanism might be. But loading a resource for
tracking IPs isn't really intrinsically evil.

------
Shared404
I find this slightly less evil than standard tracking, because information is
not shared with a third party.

That being said, I think there's a line. If you use it to gather the exact
same info as server logs, except with bots filtered, that seems perfectly
reasonable.

If you use it to track everything a user does on your page, that seems fairly
evil.

------
kuu
The common user is not going to be aware and therefore this allows bad usage.
So I would consider it as potentially evil.

~~~
eli
The common user has JS enabled and probably expects their pageviews to be
tracked.

------
smichel17
Assuming that stuff like this will eventually become common in the ongoing
tracking/blocking arms race, I wonder how long it will be before privacy-
oriented browsers/extensions start blocking stuff like :hover pseudo-classes
(or eagerly loading all assets).

~~~
chrismorgan
I can definitely imagine Firefox’s privacy.resistFingerprinting mode starting
to block remote resource loading via :hover _et al._ It’s possible it already
does, but I haven’t tested it and find no mention of it. It does several other
things that break the occasional site, this would be much the same.

But on reflection, it’d be more than that: anything that’s hidden (within
`display: none`) and gets shown on hover… hmm. Guess it’d need to ignore
`display: none` in deciding to at least _fetch_ the resources, or suffer weird
“why is this not loading the background image?” complaints.
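A minimal example of the awkward case (selectors hypothetical): the image lives inside something hidden until hover, so a privacy mode that wants to kill the timing signal would have to fetch it even while `display: none` applies.

```css
/* Common menu pattern: the submenu background is normally only fetched
   when the menu is first hovered - exactly the timing signal a tracker
   can exploit. */
.menu .submenu {
  display: none;
  background-image: url("/submenu-bg.png");
}
.menu:hover .submenu {
  display: block;
}
```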

------
chrisweekly
Related tangent: I remember being floored years ago by a realization that a
seemingly-innocuous bit of CSS (the `:visited` pseudo-selector) could
represent a significant privacy risk or aid to a phishing attack, if it were a
_readable_ property.
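For context, the worry was a rule like the one below: if scripts could read the styled result back, a page could probe whether arbitrary URLs were in your history. Browsers have since restricted which properties :visited may change and report the unvisited style to JS.

```css
/* Harmless on its own; the historical attack generated links to many
   URLs and read the computed color back from JS to learn which ones
   were in the browser history. */
a:visited { color: purple; }
```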

------
Shoreleave
So what is the preferred way to at least estimate real traffic without js? I
have a static site that gets some traffic, but I have no idea how much of it
is real. I'd just like to know if I'm reaching closer to 5 or 500 people a
day.

~~~
rglover
Server-side tracking of some sort. Watching inbound HTTP requests and
identifying/filtering stuff from known spam IPs (or your own IP) and origins.

------
JadeNB
One of those “if you have to ask …” questions. If you do it to extract
information from your user to which they would not consent, it’s evil,
regardless of how elegant it is or what the tech backing it is.

~~~
lawik
Definitely felt that while writing the post. So I don't plan to use this
approach.

Just like JS-based analytics I don't think most users would object heavily to
someone trying to differentiate them from non-interactive bot traffic to get
an idea of whether they have any real readers or just bots. But I'm absolutely
certain that there are people who will not concede any analysis of the
visitor's agent capabilities and behavior as legitimate. The browser sets one
boundary, general public another, privacy advocates another.

I personally try to err on the side of minimal analytics. But the default idea
for people creating sites tends to be that Google Analytics is "fine" and that
something more privacy-oriented like Plausible or Fathom is the good/ethical
option. I'm not sold on that, though I'm glad there are options that are less
bad. I doubt most businesses will throw analytics out entirely, but they might
be willing to pick something with a better ethics profile.

------
nicbou
I don't find it evil.

Tracking is a problem when it affects your privacy. For example when too much
data is collected, or when the data is handed to third parties. If you can
collect just what you need, and keep that data to yourself, I really don't
mind.

In this case, you measure visits anonymously without affecting the website's
performance. You give me what I want without taking anything from me. I am
completely fine with that.

In the real world, I'd compare it to tracking how many people enter a venue,
or how many beers were sold. There is no way this could be used to tell if and
when I visited that venue and bought drinks. You don't need my consent to do
this.

------
cercatrova
Why is tracking evil? I want to know how people use my site so I can better
optimize it.

~~~
speedgoose
Your use case may be acceptable, but most tracking is not acceptable.

~~~
cercatrova
Why not?

~~~
speedgoose
Because it does not respect the user's privacy. It's not too bad in a working
democracy, but not all democracies last forever. Also, ads are bad.

~~~
cercatrova
So what? Why does not respecting a user's privacy make it evil, and why are
ads bad? I'm just trying to understand why people reflexively say these things
as I see these arguments a lot on HN.

~~~
speedgoose
Ads are bad because they use tracking to manipulate people.

Not respecting privacy can have bad consequences. If a country wants a list of
people to kill for various reasons, such as religion, sexuality, political
interests or whatever else, Facebook sells it.

~~~
cercatrova
I understand the privacy part now, thanks.

Why is manipulating people bad? We are manipulated every day by various
psychological factors, like seeing someone eat and getting hungry, and the
government for example exerts economic pressure for unneeded items like
cigarettes and alcohol with a sin tax, manipulating us to not buy them as
much.

~~~
speedgoose
Manipulating people in the interests of uncontrolled corporations is not good.

------
JoachimS
Underjorden is always evil.

~~~
lawik
You got me.

------
dusted
Technology is not evil; usage, and how leaked data can be exploited, is what
needs to be thought about to determine whether something is evil.

------
vfistri2
Used to love ETag :)

------
hooby
It's kinda bad for multiple reasons...

.) definitely an abuse of CSS, which is meant for visual styling

.) invisible to the user; can't even be blocked with NoScript (blocking all
JS), and blocking CSS would make most sites unusable, or at least less than
"human readable"

.) could be used to track and record a lot of information that qualifies as
non-anonymous, personal data (in combination with IP address and timestamp)

.) probably illegal under the GDPR, unless you fully inform the user and get
their consent first (before loading the CSS) - and allow them to opt out.

~~~
dgellow
What personal information do you get that would be different from the HTTP
requests sent by the browser to load the page in the first place?

~~~
hooby
It's absolutely crazy, how much information you can reliably read "between the
lines" if you just collect enough data.

For example check out this CCC-talk by David Kriesel - in which he
demonstrates how much you can figure out just from some limited, publicly
available data on a news website:
[https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engin...](https://media.ccc.de/v/33c3-7912-spiegelmining_reverse_engineering_von_spiegel-online)

Being able to track mouse movements and hover interactions and scroll position
and time spent and other activity on the page, gives you a ton more data to
mine that way than when just registering page loads.

~~~
dgellow
But... how do you do this from CSS? Do you mean that you would add a :hover on
a bunch of elements to gather multiple requests and the approximated position
of the mouse?

~~~
hooby
For most behavioural profiling stuff it should be sufficient to just know
which page element the mouse was over, no pixel-precise x/y coordinates
needed.

But if a mouse enters element A at time x, then enters element B at time x+n,
and then enters element C at time x+m (and so on) - you can extrapolate the
path the mouse most likely must have taken - and at which speed and
acceleration - to be able to get to all these positions at those time points.
It's just an approximation of course, but can be surprisingly accurate.

