
Ask HN: Are all analytics violating user privacy? - XCSme
I am building an analytics platform myself, and I have often seen comments from other users on HN saying that all client-side analytics are evil, or even that all analytics are bad and violate user privacy.

Are analytics evil, even if their sole purpose is to improve user experience?

Let's say that we are building the infrastructure in a city. If we had stats on which streets have daily traffic jams, that would tell us a lot about people's behavior, but it would also allow us to better direct traffic and reduce those issues.

Where do we draw the line?

I do agree that selling or mining analytics data, or using it to persuade users to buy stuff they don't need or to spend more time on Instagram, is not really ethical. But there can still be legitimate use cases for analytics whose sole purpose is to make the user experience better, even at the expense of some privacy (in a public space/website).
======
dkersten
I don’t think analytics are evil in and of themselves. If they are used to
improve user experience (eg by analysing what content or features people use
and how, so that the workflow can be improved, or new high-value features or
content added), then I think analytics are great. Without the data, you are
blindly throwing stuff at the wall hoping it sticks.

But that’s rarely what analytics are actually used for and no matter how much
you want your platform to be used for that, people will misuse it. Instead,
analytics are used to wring as much value out of a user as possible, typically
by finding out how best to suck up users' attention and get
them to buy more stuff or interact with more adverts. I don’t think people
would mind their privacy being eroded quite so much if it wasn’t being used
against them to trick them into spending more money (either directly or
indirectly via advertisement). That’s ultimately what it comes down to: why do
people buy personal data? To find out ways to trick users into spending more
money.

Understanding user behaviour so that you can build better products or produce
better content (but why are you doing this? In most cases it’s because better
content = more users to click on adverts...) is fine, or even necessary. But
most analytics data isn’t used for that, or at least not solely for that.

In a product I’m currently working on, I’ve taken a stand that the client will
have NO third party scripts or analytics. Zero. And on the backend, I only use
third-party services necessary for running and maintaining a quality service,
and I disclose every provider I use and what data is shared with, processed, or
stored by them. I’m not completely analytics-blind, as I do track metrics and
logs, but I try to keep them focused on what’s needed to monitor service health and
debug issues and only high level data on what people are using (anonymously)
and how frequently.

As others have mentioned, I think consent is an important aspect. And not this
“your privacy is important to us, so uncheck these thousand checkboxes if you
want privacy” bullshit. I think if you are open and honest with users about
what data you collect and what you do with it, and ask them if it’s ok with
them first, then I have no real problem with it, especially if it really is
only for improving the user experience.

------
badrabbit
Without explicit consent? Yes. When you visit a website or a brick and mortar
store, outside of security monitoring, most reasonable people neither expect
nor accept their activity being monitored and analyzed for purposes
unrelated to the transaction they are attempting to complete.

Most privacy issues boil down to consent.

1) Explicit consent must be granted by users for every category of data
collection or mining to which a significant portion of people have not already
given implicit consent.

2) Lack of consent should not be used as a reason to deny service, except if
the service directly depends on the collected data to function.
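A minimal Python sketch of the two rules above, assuming consent is tracked per category and defaults to refused. `ConsentRecord`, the `usage_stats` category and `handle_page_view` are hypothetical names for illustration, not any real library's API.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ConsentRecord:
    """Per-category consent (hypothetical); anything not explicitly granted counts as refused."""
    granted: Set[str] = field(default_factory=set)

    def grant(self, category: str) -> None:
        self.granted.add(category)

    def revoke(self, category: str) -> None:
        self.granted.discard(category)

    def allows(self, category: str) -> bool:
        return category in self.granted


def handle_page_view(consent: ConsentRecord, events: List[Tuple[str, str]], path: str) -> str:
    # Rule 1: record usage only for a category the user explicitly opted into.
    if consent.allows("usage_stats"):
        events.append(("page_view", path))
    # Rule 2: the page is served whether or not consent was given.
    return f"<content of {path}>"
```

The design point is rule 2: `handle_page_view` returns the page either way, and consent only decides whether an event is recorded.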

> Let's say that we are building the infrastructure in a city, if we had stats
> on what streets have daily traffic jams, that would tell us a lot about the
> behavior of the people, but would also allow us to better direct traffic and
> reduce those issues.

Drivers who want to help reduce traffic jams should opt in with a sticker or
some other solution. But to be honest, a simple count+location is something I
implicitly consent to. If a person sits by a roadside counting cars, I
have no problem with it. The problem is when they record video or images, or
collect identifying information such as color, make/model, etc. Then I no
longer give implicit consent.

To be extreme, blowing up the road also solves traffic jams; the solution
should come with requirements such as keep the road intact and don't stalk
people.

~~~
XCSme
Currently, in the examples given (brick and mortar stores, city
infrastructure), analytics are heavily used without any consent. Both common
sense and current law hold that there is little privacy on public property (eg.
you go outside, others have the right to film you), and websites are also
"public" properties (the big difference is that you access them from the
privacy of your own home). Do you think the law should be changed? But
again, where do we draw the line? Should a criminal give consent for his

~~~
badrabbit
You are telling me how things are, I stated how they should be.

A criminal should give their consent if it is information otherwise considered
private. Consent to a simple count of crimes by the police is implied. You draw
the line when identifying information is collected and the individuals who are
identified have not implicitly granted consent. If you are not sure, ask for
consent.

Let me use another analogy: If I am unsure a person has granted me permission
to have sex with them, I should ask for consent. If I am unsure that they are
of sound mind at the time, I wait until I am sure.

This is far from rape, but the principle is the same. If you are unsure, ask for
consent. If you are not sure whether murderers mind their crimes being counted,
ask for consent (I think they'd love the publicity; personally, I would be sure
consent is implied).

Also, yes, the law should be changed, but we are talking ethics, not law,
here, right?

~~~
wolco
Coming out and asking for sex, or whether consent is given, might be culturally
inappropriate for many. A common approach is to slowly work up to that point,
using body language to signal intent to proceed to the next level.

On your other point: collecting publicly available information is legal now.
Changing the law to require permission brings in ownership. Who can grant
permission? Should wildlife or objects be afforded the same protections?

~~~
badrabbit
Wildlife and objects can't give consent. We are talking ethics, not law. The
law allows a lot of unethical things. But if it were a law, a lot would be
determined by precedent, but some framework to criminally prosecute
individual and corporate stalkers would be great.

------
ekimekim
I'm going to copy-paste from an older comment of mine
([https://news.ycombinator.com/item?id=22332136](https://news.ycombinator.com/item?id=22332136))
that I think captures my opinions on this question:

Client-side tracking, if you need any at all, should be a) high value, b)
aligned with my goals as a user, and c) as respectful as possible of my
privacy (eg. anonymising values, only taking what info you need).

The b) condition there is most nebulous - I'm mainly thinking of things like
reporting client-side javascript errors. This is aligned with my goal of your
site being bug-free so I can use it better. Another example would be an (opt-
in!) recommendation system that I find valuable. What would NOT be an example
of this would be tracking of my actions on the page in order to optimize the
chances that I'll engage with the content. Engagement is your priority, not
mine.
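One common way of "anonymising values" before storage is to truncate IP addresses, the same idea as zeroing the last octet of an IPv4 address. A minimal Python sketch using the standard `ipaddress` module; the /24 (IPv4) and /48 (IPv6) prefix lengths are assumptions, and other choices are equally defensible.

```python
import ipaddress

# Assumed prefix lengths: keep only the /24 network for IPv4 and the /48 for IPv6.
IPV4_PREFIX = 24
IPV6_PREFIX = 48


def anonymize_ip(addr: str) -> str:
    """Zero the host bits of an address so that only the network prefix is stored."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    prefix = IPV4_PREFIX if ip.version == 4 else IPV6_PREFIX
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

Truncation reduces precision rather than guaranteeing anonymity: a /24 can still point to a small office or household.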

------
itronitron
A good-faith start would be to call it data collection instead of the
industry-accepted misnomer 'analytics'. If your platform is collecting data
and you aren't comfortable calling it data collection then maybe you should
reconsider what data you would be comfortable collecting.

------
zzo38computer
I think that:

- Client-side analytics are not helpful (and may make invalid assumptions).

- Client-side analytics waste energy, bandwidth, RAM, etc.

- You should avoid other waste too, such as including too many pictures,
CSS, scripts, animations, etc.

- Client-side analytics wrongly violate privacy.

- Client-side JavaScript error reporting may be helpful (sometimes), but it
should ideally ask first. Errors should also be displayed in the console
window, so that users can diagnose them themselves.

- A web page should be designed to work without JavaScript and without CSS
as much as possible, although sometimes they are helpful (a better "user
oriented" design, rather than the "author oriented" design, is needed; I
have some ideas about how to do this).

- Don't always use web pages! There are also Telnet, SSH, Gopher, NNTP,
plain text files (over whatever protocol), etc. (I use many plain text
files myself, actually.)

- Let the user write a comment (by email, perhaps). Otherwise, you will
just have to guess, and might not be able to. Even client-side analytics
cannot guess something that isn't there.

So I use server-side analytics instead, which is much better.
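As a rough illustration of server-side analytics that keeps only aggregates, this Python sketch counts successful GET requests per path from access-log lines. It assumes the Common Log Format; the function name and the choice to drop query strings are hypothetical, and the client address is never extracted.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line and status code of a Common Log Format entry,
# e.g. ... "GET /about HTTP/1.1" 200 512. The client address is never captured.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_page_views(lines: Iterable[str]) -> Counter:
    """Count successful GET requests per path, keeping only the totals."""
    counts: Counter = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m["method"] == "GET" and m["status"].startswith("2"):
            # Drop the query string, which can carry tracking or session identifiers.
            path = m["path"].split("?", 1)[0]
            counts[path] += 1
    return counts
```

Only path totals leave the function, so the raw log can be rotated away once it has been counted.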

(In the case of traffic jams: you can see how many cars there are; you do not
need to add a device to each car to count them, nor to read license numbers,
etc. A simple light sensor that detects when the light is blocked by cars would
be good to have.)

(In the case of stores, well, they already need to count how many products have
been purchased, as they very well should. They need not track who purchased
each item, only how many of each item have been sold. That
will allow them to restock, bring in enough for everyone, and do whatever
else they need to do. They can keep track of returns too.)

~~~
XCSme
> You can see how many cars there are; you do not need to add a device to each
> car to count them, nor to read license numbers, etc. A simple light sensor
> that detects when the light is blocked by cars would be good to have.

But you can also see license plates, car make and model, color. This is the
same for server-side analytics, where you can still see IP address, user agent
(browser), time of visit, etc. So you don't have to add anything to each
device either. As soon as you access a public website, I think they do have
the right to check and use the publicly available information. If this
information is so sensitive that it's considered private, shouldn't we instead
change the infrastructure so that this information is not actually publicly
visible? HTTPS is a good step in this direction, but the IP/UA is still sent
to the remote server, which is sometimes useful for fraud prevention,
anti-spam, etc.

~~~
zzo38computer
Yes, the IP address and user agent string are sent, and the time will be known
of course, and I am not complaining about that; it is there, you can log it if
wanted (and use it for debugging/statistics/whatever), etc.

The User-Agent header should not be misused. A legitimate user may well access
the site in a lot of different ways, some of which affect the User-Agent header
and some of which affect other things. (If you need to determine what file format
to use, the Accept header should be used instead when possible. The User-Agent
header can still be used as a fallback, I suppose.)
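A simplified Python sketch of that approach: pick the response format from the `Accept` header and consult `User-Agent` only when no `Accept` header was sent. It handles q-values and `type/*` wildcards but not the full precedence rules of HTTP content negotiation, and the Lynx check is a hypothetical fallback heuristic, not a recommendation.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, offered: str) -> bool:
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    otype, _, osub = offered.partition("/")
    return rtype == otype and rsub in ("*", osub)


def choose_format(headers: Dict[str, str], available: List[str]) -> Optional[str]:
    accept = headers.get("Accept")
    if accept:
        ranges = parse_accept(accept)
        # Types the client explicitly refused with q=0 are never served.
        refused = {m for m, q in ranges if q <= 0}
        for media_range, q in ranges:
            if q <= 0:
                continue
            for offered in available:
                if offered not in refused and matches(media_range, offered):
                    return offered
        return None  # nothing acceptable; a server might answer 406
    # Fallback only when no Accept header was sent (hypothetical heuristic).
    if "Lynx" in headers.get("User-Agent", "") and "text/plain" in available:
        return "text/plain"
    return available[0] if available else None
```

The User-Agent is never used to override an explicit Accept header, which keeps unusual but legitimate clients from being misclassified.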

In the case of cars, yes you can see the license plates, colours, etc, if you
are physically there and recording them on a paper or camera, or using
surveillance cameras, although for a somewhat different reason I am against
putting surveillance cameras everywhere.

Of course I am not stopping you from recording this data, although for
analyzing traffic congestion, I do not think the colours of the cars are
important. Their size, speed, and classification (e.g. car, truck, public bus,
etc.) might be, and perhaps how much noise they make, if you want to consider
noise pollution. I don't really know much about it, as I do not drive a car;
maybe you know better.

------
dylz
One of the nastiest things I've seen from the "analytics startups" espousing
how they're GDPR friendly and compliant and privacy friendly is that they
advertise stuff like CNAME cloaking, using random URLs or hostnames for data
collection, etc.

This is incredibly disgusting behaviour: the end-user has EXPLICITLY signaled
intent to opt out and gone out of their way to try and protect themselves,
blatantly signaling that they do not want it, and the "privacy
respecting and caring new not-like-the-other-guys" data collection service is
attempting to repeatedly force itself on the end-user, sometimes trying
multiple times pretending to be different hostnames, lying about what it is,
etc.

I have seen bullshit "analytics" SaaS bruteforce its way through dozens of
generically-named cloudfront or akamai hostnames until they find one that
works.

No matter what you are collecting, this type of behaviour is evil and
abhorrent. Someone saying no a thousand times until you find a disguise that
works on them is not consent, or a yes.

> Are analytics evil, even if their sole purpose is to improve user
> experience? Let's say that we are building the infrastructure in a city, if
> we had stats on what streets have daily traffic jams, that would tell us a
> lot about the behavior of the people, but would also allow us to better
> direct traffic and reduce those issues.

Server-side analytics can do this fairly well.

But it is always a slippery slope - I have never, ever seen this proven
otherwise. You count (raw numbers only) average travel time or # of cars at
an intersection. Okay. Nothing else - just when a car trips your wire you add
one to an integer associated with that day or something. Few people would have
an issue with this.

But now you want more. Now you set up licence plate scanners at every
intersection. Now you store them in a database so you can tell if the same car
is driving the same way and at what time each day. But wait, someone offers
you money for this data, possibly even more money if you also run recognition
and guess vehicle make/models in real time. And now you have a large database
of people correlated to where they are at what time open to retrieval by
others.

~~~
zzo38computer
> One of the nastiest things I've seen from the "analytics startups" espousing
> how they're GDPR friendly and compliant and privacy friendly is that they
> advertise stuff like CNAME cloaking, using random URLs or hostnames for data
> collection, etc.

> This is incredibly disgusting behaviour: the end-user has EXPLICITLY
> signaled intent to opt out and gone out of their way to try and protect
> themselves, blatantly signaling that they do not want it, and the "privacy
> respecting and caring new not-like-the-other-guys" data collection service
> is attempting to repeatedly force itself on the end-user, sometimes trying
> multiple times pretending to be different hostnames, lying about what it is,
> etc.

> I have seen bullshit "analytics" SaaS bruteforce its way through dozens of
> generically-named cloudfront or akamai hostnames until they find one that
> works.

I agree with you; this is very bad. I would suggest that user software
allow setting a list of allowed requests and prohibit all others (if random URLs
or hostnames are used, this will block data collection, and if the tracker keeps
trying, it will only end up annoying the user with repeated prompts), and/or have
an option so that if too many improper accesses are attempted, all access will
be denied (if it tries to brute-force its way through dozens of hostnames, then
none of them will work, especially if the browser pretends that the connection
is just really slow!!!).
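A minimal Python sketch of that user-side filter, assuming a per-page allowlist of hostnames and a fixed budget of refused requests, after which everything from that page is refused. `RequestGate` and `max_denied` are hypothetical names, and the "pretend the connection is slow" idea is not modeled.

```python
from typing import Iterable
from urllib.parse import urlsplit


class RequestGate:
    """Allow requests only to listed hosts; after max_denied refusals, refuse everything."""

    def __init__(self, allowed_hosts: Iterable[str], max_denied: int = 5) -> None:
        self.allowed_hosts = {h.lower() for h in allowed_hosts}
        self.max_denied = max_denied
        self.denied = 0
        self.locked = False

    def check(self, url: str) -> bool:
        if self.locked:
            return False
        host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
        if host in self.allowed_hosts:
            return True
        self.denied += 1
        if self.denied >= self.max_denied:
            # A page probing many hostnames loses access entirely, including
            # to hosts that were allowed before.
            self.locked = True
        return False
```

Locking out the whole page, rather than just the unknown hosts, is what makes brute-forcing through generic CDN hostnames self-defeating.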

