
The relevance of IP addresses in the tracking ecosystem [pdf] - lesterpig
https://hal.inria.fr/hal-02435622/document
======
jedberg
IPv6 improves this situation (now). At first, IPv6 was actually a lot worse,
since the back 1/2 of your address was derived from your MAC address, allowing
your device to be tracked around the internet no matter where it went.
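
To make the MAC-derived scheme concrete, here is a minimal sketch of how a modified EUI-64 interface identifier is built. The function name, the example prefix, and the MAC address are hypothetical and used only for illustration.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a MAC address (modified EUI-64 style)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the 48-bit MAC to get 64 bits.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(
        int(network.network_address) | int.from_bytes(interface_id, "big")
    )


# Hypothetical example values: the low 64 bits stay the same on any network.
home = eui64_address("2001:db8:1:1::/64", "00:11:22:33:44:55")
cafe = eui64_address("2001:db8:9:9::/64", "00:11:22:33:44:55")
print(home)
print(cafe)
```

The two printed addresses share the same last 64 bits even though the prefixes differ, which is what made the device followable from network to network.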

People quickly realized this flaw, and updated the standard so that basically
your client gets to pick the second 1/2 of your address now. And the nice
thing is, most major platforms will actually run multiple addresses in
parallel, allowing new connections to use a new address while old connections
keep using the old one.

So while IPv4 adds some protection by having multiple clients behind the
firewall, IPv6 actually makes it better by looking like _even more_ clients
behind the firewall.

Combined with a browser that blocks fingerprinting, you get slightly better
privacy with IPv6.

~~~
Faaak
Absolutely not. Most ISPs will allocate you a fixed /64. You may well have
privacy IPs in this /64, but the prefix will always be the same. A tough day
for privacy activists.

~~~
jedberg
Sure, just like with IPv4 they allocate you a fixed /32.

But you get slightly more privacy by having the client able to randomize the
other 1/2 of the address and use multiple addresses, which would confuse
trackers. Or the trackers just look at the /64 prefix and ignore the rest, and
you're no worse off than you were with your IPv4 /32.
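
A rough sketch of what "just look at the /64" could mean on the tracker side, assuming a hypothetical `tracking_key` helper that collapses every IPv6 address to its /64 prefix and leaves IPv4 addresses as they are:

```python
import ipaddress


def tracking_key(addr: str) -> str:
    """Return the value a prefix-based tracker might group requests by."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # strict=False drops the host bits instead of raising an error.
        return str(ipaddress.ip_network(f"{addr}/64", strict=False))
    return str(ip)  # an IPv4 address is already a single /32


# Two randomized addresses inside the same (example) /64 get the same key.
print(tracking_key("2001:db8:1:2:aaaa:bbbb:cccc:dddd"))
print(tracking_key("2001:db8:1:2:1111:2222:3333:4444"))
print(tracking_key("203.0.113.7"))
```

Under that assumption, randomizing the lower half of the address does not change the key, which is the "no worse off than a /32" case.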

~~~
saurik
You are worse off as now you have some introspection into computers on the
other side of that firewall; with NAT you could have thousands of computers
and they would all get melded together as one, but unless you very carefully
generate a new IP address for every single connection you make (which is how
you can get back to where you were with NAT), you now have the ability to
somewhat differentiate users who before would have been mixed. So it is at
best the same but probably worse, at which point why not use the thing that is
always at least as good if not much better?...

~~~
Avamander
Sounds like we need shorter IPv6 leases and more rotation between the
prefixes, but that somehow goes a bit against what IPv6 should provide us -
freedom to hold on to an address.

~~~
zrm
Some systems already have a solution for this, since devices can have multiple
IPv6 addresses at once. They have one permanent IPv6 address which is not used
for outgoing connections but can be used for incoming connections, and then
temporary IPv6 addresses used for outgoing connections which can be rotated
arbitrarily often. The first address is permanent but can only be used if you
already know it.
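
As an illustration only, a temporary outgoing address can be thought of as the network's /64 prefix plus a freshly chosen interface identifier. The sketch below simply draws 64 random bits; that is an assumption made for brevity, not the procedure operating systems actually implement, and `temporary_address` is a hypothetical name.

```python
import ipaddress
import secrets


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Pick a random address inside a /64 (simplified illustration)."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    interface_id = secrets.randbits(64)  # random lower half of the address
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Each call yields a new candidate address for outgoing connections.
print(temporary_address("2001:db8:1:2::/64"))
print(temporary_address("2001:db8:1:2::/64"))
```

Rotating means calling something like this on a schedule while the permanent address stays configured for incoming connections.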

An improvement on this would be for services to request the machine to hold a
specific address, permitted if the address is valid on the current network and
not already in use. Then services could each have their own address (generated
on first use and then requested when they start) which is effectively
permanent but not the same as the addresses used by other services on the same
machine.

------
jentulman
from the abstract....

"In this paper, we study the stability of the public IP addresses a user
device uses to communicate with our server. Over time, a same device
communicates with our server using a set of distinct IP addresses, but we find
that devices reuse some of their previous IP addresses for long periods of
time. We call this IP address retention and, the duration for which an IP
address is retained by a device, is named the IP address retention period. We
present an analysis of 34,488 unique public IP addresses collected from 2,230
users over a period of 111 days and we show that IP addresses remain a prime
vector for online tracking. 87 % of participants retain at least one IP
address for more than a month and 45 % of ISPs in our dataset allow keeping
the same IP address for more than 30 days."
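
To illustrate the definition quoted above, here is a minimal sketch that computes a retention period per (device, IP) pair. It assumes retention is the time between the first and last observation of that pair; the paper's exact method may differ, and the record layout and sample values are made up.

```python
from datetime import datetime


def retention_periods(observations):
    """Map (device_id, ip) to the span between first and last sighting.

    observations: iterable of (device_id, ip, timestamp) tuples.
    """
    spans = {}
    for device_id, ip, seen_at in observations:
        key = (device_id, ip)
        if key in spans:
            first, last = spans[key]
            spans[key] = (min(first, seen_at), max(last, seen_at))
        else:
            spans[key] = (seen_at, seen_at)
    return {key: last - first for key, (first, last) in spans.items()}


# Made-up sample: the device returns to an earlier address weeks later.
sample = [
    ("device-1", "198.51.100.4", datetime(2020, 1, 1)),
    ("device-1", "203.0.113.9", datetime(2020, 1, 10)),
    ("device-1", "198.51.100.4", datetime(2020, 2, 5)),
]
for key, period in retention_periods(sample).items():
    print(key, period.days, "days")
```

Note that the computation needs a device identifier in addition to the IP address.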

------
AndyMcConachie
I worked on a document a few years back on anonymizing IP addresses. If you
find yourself in a situation where you need to balance anonymization of IP
addresses with research needs, this paper may be useful.

https://www.icann.org/en/system/files/files/rssac-040-07aug18-en.pdf

------
kube-system
The big conclusion to be drawn from this paper is that IP address tracking can
be combined with client-side tracking techniques to perform reidentification.

------
api
IP is just one more data point. There are already so many ways a browser can
be fingerprinted, it doesn't make things that much worse.

While you can limit your exposure a bit, I long ago reached the conclusion
that strong privacy is impossible in the current client/server web model.
There is too much surface area.

------
annoyingnoob
I suppose this only works if you combine IP with some other information, like
username or browser fingerprint. Otherwise you could be tracking multiple
'users' at the same IP.

~~~
parhamn
The whole multiple-users-on-one-IPv4 thing always feels like it distracts from
what a tracker's goal really is. An IP address is already more than sufficient
identification for a tracker to start targeting you with ads and whatnot.

In some ways the fact that IP-only tracking is lossy but still good enough is
something to be afraid of. It is easy to 'taint' an IP and your household's
behind-the-scenes profile with a single Google search by a guest using your
wifi.

~~~
annoyingnoob
Advertising that is relevant to my wife probably isn't relevant to me; using
IP is not good enough. The title and focus here is on _user tracking_, which
is the goal I'm commenting about. It's quite common to have multiple users
behind the same IP, and those users and their actions may or may not be
related (I have no idea what my co-workers might be doing, for example, but we
share a static IP).

I honestly didn't follow your Google search/wifi example.

~~~
virgilp
> using IP is not good enough

Define "not good enough" (for whom is it not good enough? for advertisers?).

I used to work on a project linking cookies together in anonymous profiles
(for advertising), initially the plan was to separate household/individual
profiles using different heuristics, but I think eventually it turned out that
nobody cares that much - the advertisers just wanted a rough cross-device
profile, not perfect accuracy. I mean sure, for marketing purposes (as well as
engineering/ "performance" reasons) you needed figures to brag about accuracy
and whatnot. But it's really really hard to get those figures right, and
ultimately the thing that will convince advertisers is "using cross-device
profile improved conversion rate by 10%"; everything else is a detail in
comparison.

~~~
annoyingnoob
The title and focus here is on _user tracking_, which is the goal I'm
commenting about.

~~~
virgilp
I have no idea what you mean by _user tracking_, can you explain?

~~~
annoyingnoob
A user is usually defined as a human being. I'm a human being and I did not
come with a network connection. The world, generally, does not assign IP
addresses to human beings - thus an IP address cannot track me as a human, a
human computer user.

The paper/study here used a combination of IP address and a UUID to track
'users' which are presumably but not necessarily individual humans. It was the
use of this UUID that allowed the authors to draw conclusions about how often
a given user is seen at the same IP address. The IP address by itself did not
provide enough tracking to draw the conclusions in the paper.

Is IP address identification 'good enough' for advertising? Maybe. But IP is
nowhere near definitive for identifying individual humans.

The paper does not say how often they found multiple 'users' using the same IP
address. The paper focuses on individual reuse of IPs. How often did they find
different users using the same IP in their study?

In internet advertising no one expects every ad to trigger a sale. Since there
is some amount of acceptable waste in the system, this paper has decided that
IP is 'good enough' for targeting, because it's okay if we are throwing away
20% or more of the ads (we didn't expect any traction from them anyway, seems
to be the thinking).

And the authors admit that a larger or different dataset might show different
results.

~~~
virgilp
I was replying to your claim:

> Advertising that is relevant to my wife probably isn't relevant to me, using
> IP is not good enough.

Advertisers do not care about what you as a human do in the same way that e.g.
some government agencies might. Both the paper and you talked about "user
tracking" in the context of advertising. My claim is that "IP address
identification is actually 'good enough' for advertising" (for some kinds of
advertising, at least, but the discussion here is complex and I really don't
want to get into it - in short, the problem is not so much with the
'tracking'/building the profile, but with being confident enough that you're
not targeting the wrong person/household member during the delivery of the
more sensitive ads; and with not spooking/startling the user with "how the
hell do they know this about me" - sometimes the user needs to be e.g. logged
in so that he understands why he received an ad).

In particular, there are large swaths of companies that don't care whether
it's you, your wife or your child - Disney e.g. will happily serve you ads
based on activity performed by your children or wife, and will make no effort
to figure out if it was you or them because it doesn't matter.

~~~
annoyingnoob
It's only good enough when you expect some non-trivial number of the ads to be
wasted. If you care about that waste then IP is not good enough.

For some brands, like Disney in your example, spraying ads everywhere is good
enough. For other brands that is not good enough, feminine hygiene products
would be an example (both age and gender specific where IP alone would be a
poor indicator).

~~~
virgilp
How do you even define "non-wasted ads"? One ad = one buy? One ad = one
interested person?

Targeted ads, even imperfectly-targeted ones, are "less wasted" than other
kinds of ads. They are not perfect, by far (thankfully, if I may add so!), but
"not good enough?" Come on, what exactly is good enough, then?

~~~
annoyingnoob
> Targeted ads, even imperfectly-targeted ones, are "less wasted" than other
> kinds of ads.

How do you know that your 'targeting' is working? If you target females how do
you measure how many females saw the ad? Targeted ads should go to the
targeted audience or they likely went to waste - at least the advertiser
didn't get what they paid for. The quality of your targeted audiences matters
to your advertising customers.

~~~
virgilp
You get better conversion, in aggregate.

~~~
annoyingnoob
Even if that is true, it comes at the expense of a lower signal-to-noise
ratio: you are adding noise.

------
Zenst
Most users will at the very least use two IP addresses - home broadband and
mobile SIM broadband.

Then you have wifi hotspots, friends' wifi. The average user uses many IPs and
is not limited to the range of one ISP.

So whilst you can fingerprint devices and usage patterns, the IP address by
itself will be useless for identifying such users; it may well augment a
little, but it is no solution.

But then IPv4 shares many IP addresses across mobile and broadband users in
various ways. Most do not have a fixed IP, and even those that do will not
have a fixed IP for their mobile data activities, unless they VPN into their
home broadband. And if they use commercial VPN services, that adds another
layer of IP ranges.

So the potential towards false assertions based upon an IP and user usage may
well trip up and fail. Imagine using an ISP with dynamic IP and the next user
of that IP uses it for crime, well with some bad logging and aggressive
association, you can mislabel somebody for a crime they did not commit.

Roll on IPv6. Mobile carriers would have been the obvious beneficiaries of
that, yet I'm not aware of any progressing it in any timely manner; they chug
along using a pool of IPv4 addresses and various tricks to make those cater
for many.

~~~
scared2
However, what the authors tried to show in this paper is its stability over
time. So overall their "findings" indicate IP addresses should not be
overlooked in privacy protection. As they stated:

"... Over time, a same device communicates with our server using a set of
distinct IP addresses, but we find that devices reuse some of their previous
IP addresses for long periods of time. We call this IP address retention."

~~~
Zenst
As somebody who tracks and logs the IPs given to them by their ISP etc., I can
attest: they are not that distinct over time.

