
Oracle’s BlueKai tracks people across the web – that data spilled online - roldie
https://techcrunch.com/2020/06/19/oracle-bluekai-web-tracking/
======
the_duke
An important feature of Firefox that isn't nearly as well known as it should
be is first party isolation.

It can be a bit annoying due to e.g. reCAPTCHA, but it works fine 99% of the
time. Some sites will break, but these are few and far between. I just use
another browser for those.

It was upstreamed from Tor and will scope all data (cookies, caches, etc.) per
first-party domain. It's not a panacea, but definitely a good step.

(about:config, privacy.firstparty.isolate)
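
To make the scoping idea concrete, here is a minimal Python sketch. It is a toy model, not how Firefox implements it: the `CookieJar` class and the example domains are made up for illustration. The only point is that the storage key includes the site in the address bar, so the same third party sees a different jar on each site.

```python
from collections import defaultdict


class CookieJar:
    """Toy cookie store; `isolate=True` mimics first-party isolation."""

    def __init__(self, isolate):
        self.isolate = isolate
        self._store = defaultdict(dict)

    def _key(self, first_party, cookie_domain):
        # With isolation, the key includes the site in the address bar,
        # so one third party gets a separate jar per first-party site.
        if self.isolate:
            return (first_party, cookie_domain)
        return (None, cookie_domain)

    def set(self, first_party, cookie_domain, name, value):
        self._store[self._key(first_party, cookie_domain)][name] = value

    def get(self, first_party, cookie_domain, name):
        return self._store[self._key(first_party, cookie_domain)].get(name)


# Hypothetical domains: a tracker embedded on two unrelated sites.
shared = CookieJar(isolate=False)
shared.set("news.example", "tracker.example", "uid", "abc123")

isolated = CookieJar(isolate=True)
isolated.set("news.example", "tracker.example", "uid", "abc123")

print(shared.get("shop.example", "tracker.example", "uid"))
print(isolated.get("shop.example", "tracker.example", "uid"))
```

Without isolation the tracker reads the same ID on both sites; with isolation the second lookup finds nothing, which is why cross-site embeds such as captchas can break.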

I feel like something like this should be the (standardized) behaviour of
browsers. Sharing any kind of observable state between domains should require
an explicit opt-in with a prompt.

Like: "Privacy Warning: techcrunch.com wants to share your data with
facebook.com. Do you want to allow this?"

(such fine grained permissions are a complicated topic of course, but that's a
longer discussion...)

~~~
GoblinSlayer
I just disabled third party cookies and everything works fine.

~~~
kohtatsu
Thankfully this became the default in Safari:
[https://webkit.org/blog/10218/full-third-party-cookie-blocking-and-more/](https://webkit.org/blog/10218/full-third-party-cookie-blocking-and-more/)

Previously it only enabled third-party cookies for domains you had explicitly
visited.
[https://www.theverge.com/2017/9/14/16308138/apple-safari-11-advertiser-groups-cookie-tracking-letter](https://www.theverge.com/2017/9/14/16308138/apple-safari-11-advertiser-groups-cookie-tracking-letter)

~~~
mywittyname
Unfortunately, people are working out ways to get around this now that it's
become a default.

~~~
kevin_thibedeau
Use a generic JS blocker. It protects you from workarounds and zero days.

~~~
Wowfunhappy
It also blocks JS.

~~~
PhantomGremlin
You say that like it's a bad thing?

------
soared
I worked at oracle on the bluekai product (search 'oracle' in my comment
history). I helped deliver data using bluekai from oracle to platforms like
facebook and google. AMA.

If you are curious what data is housed in bluekai, here is a 170 page pdf with
lists and descriptions of different data providers/vendors:
[http://www.oracle.com/us/solutions/cloud/data-directory-2810741.pdf](http://www.oracle.com/us/solutions/cloud/data-directory-2810741.pdf)

~~~
sailfast
What is the best way to opt-out / avoid tracking like this? The Oracle opt-out
page requires providing your physical address / email / name and then only
sets a cookie in your single browser.

~~~
soared
A good adblocker and using this tool on all of your devices:
[https://optout.networkadvertising.org/?c=1](https://optout.networkadvertising.org/?c=1)

It's difficult to actually opt out in full, but we will definitely get to that
point in the future. Also if you use niche hardware/software (like Netscape on
an old Linux distro) your data will just get cleansed out. Nobody does
fingerprinting unless you're a spy or in charge of purchasing for a hospital
or something, so I wouldn't worry about that.

~~~
jefftk
_> Nobody does fingerprinting ... so I wouldn't worry about that._

I'm not sure how long you've been out, but this has changed a lot recently.
Many adtech companies (not us) have responded to ITP by building out
fingerprinting to continue personalization.

(Disclosure: I work on ads at Google)

~~~
soared
Yeah that makes sense. As far as I was aware only the low-tier dmp/cross-
device/measurement vendors did, but all the top tier dsps vetted their vendors
and don't allow that.

------
ogre_codes
When I tell people I don't like tracking, the defense of all this nonsense is
"People prefer getting relevant adverts". The campaign to normalize this is so
good I often hear consumers repeating it.

But the inherent assumption is that the data is only used for benign ad
targeting and ignores the possibility that the data will end up in the wrong
hands like it has here. The response seems to be " _{{ad tech company}}_ has
some of the best developers in the world and they want to protect that data!".
Except leaks like this are inevitable. So long as tracking is a thing, data
will leak, and hostile actors will abuse it.

~~~
reaperducer
_When I tell people I don't like tracking, the defense of all this nonsense is
"People prefer getting relevant adverts"_

I then ask them, "Is it working? Do you really see advertisements online that
mostly show you things that are meaningful to you? Are online ads more
relevant to you than what you see on TV or hear on the radio?"

The honest always answer "No."

~~~
soared
How many ads for hearing aids have you seen on TV? And how many ads for hearing
aids have you seen online?

The honest answer is a lot, and none.

~~~
ogre_codes
> How many ads for hearing aids have you seen on TV?

When I watch Red Bull TV I see zero hearing aid adverts. That's about the only
commercial TV I watch. Likewise when I more frequently watched commercial TV
15 years ago, I didn't see a ton. In the years since, the demographics of
network TV viewership have shifted massively and the over 65 crowd is
massively over-represented. As demographics switched advertising switched.

I suppose when I occasionally go to CNN or CNBC for news, I see hearing aid
advertising, but the sites I generally frequent (should) know their readership
better than to blast us with that sort of advertising.

Which is ultimately the point: _context_-based advertising delivers 90% of the
benefits of targeted/tracking-based advertising without the performance
degradation, risks, and other nonsense.

~~~
soared
That all makes sense, but you're missing the monetary aspect. If I know NYT
readers are my target market, I want to advertise on the NYT. Except that's
absurdly expensive, so I can just reach users I know read the NYT, but are
currently on other websites. Same audience, much cheaper.

There are a lot of caveats like this where price and business reasons play a
role. If it weren't a $20B industry then your logic would probably ring more
true.

~~~
ogre_codes
Making something less expensive to advertisers is not my problem. People
sucking up piles of personal information about me is my concern.

What you are missing here is that giving people the ability to advertise to
New York Times users offsite takes income from the Times. If you read the NYT
or you _are_ the New York Times, this should be concerning. As a reader, you
want the value of their brand to flow back to them so they can continue doing
better work.

~~~
aleppe7766
TL;DR: If you want to do any good to a newspaper, buy a subscription.

Advertising hasn’t been enough since the late 2000s. More monetization options
arose and were rapidly adopted because news venues were hungry for them, when
they figured that reservation (buying ads on a specific website with no
control on delivery and performance) was declining. The creepiness of all
this, and the sense of entitlement of a large part of the audience, fueled the
rise of Adblock, which required even more creepiness and gave birth to the
present cat-and-mouse game.

~~~
ogre_codes
> If you want to do any good to a newspaper, buy a subscription.

My point wasn't that you shouldn't subscribe to the content you want—you
should. It was that I have zero interest in sacrificing anything to lower
advertising costs for some random company. Nor do I care to increase Google/
Facebook's earnings potential because they spy on me.

------
themodelplumber
> But BlueKai also uses more covert tactics like allowing websites to embed
> invisible pixel-sized images to collect information about you as soon as you
> open the page — hardware, operating system, browser and any information
> about the network connection.

View-source

Ctrl-f "pixel"

:-)

This seems really odd to include, given TechCrunch's use of pixels, and
whatever their own covert pixel tactics may be...invisible pixel images are
part and parcel when you start talking about pixels in general. (Edit: The
author seems to have addressed this)
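
For anyone wondering what a pixel like the one in the quote can collect, here is a rough Python sketch. It assumes nothing about BlueKai's actual code: `describe_pixel_hit`, the header values, and the domains are hypothetical. It only shows that a plain request for a 1x1 image already carries the IP address, the referring page, a user-agent string, and any cookie set for the pixel's domain.

```python
def describe_pixel_hit(headers, remote_addr):
    """Summarize what a server could log from one request for a 1x1 image.

    `headers` is a plain dict of HTTP request headers and `remote_addr` is
    the client IP as seen by the server. Both are hypothetical inputs.
    """
    ua = headers.get("User-Agent", "")
    # Coarse substring checks; order matters because Android user agents
    # also contain "Linux" and Chrome user agents also contain "Safari".
    os_tokens = [("Windows", "Windows"), ("Android", "Android"),
                 ("Mac OS X", "macOS"), ("Linux", "Linux")]
    browser_tokens = ["Firefox", "Chrome", "Safari"]
    os_name = next((name for token, name in os_tokens if token in ua), "unknown")
    browser = next((b for b in browser_tokens if b in ua), "unknown")
    return {
        "ip": remote_addr,                # network-level information
        "page": headers.get("Referer"),   # the page that embedded the pixel
        "os": os_name,
        "browser": browser,
        "cookie": headers.get("Cookie"),  # ID set on an earlier visit, if any
    }


# Made-up request, as a browser might send it when loading the image.
hit = describe_pixel_hit(
    {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) "
                      "Gecko/20100101 Firefox/77.0",
        "Referer": "https://news.example/article",
        "Cookie": "uid=abc123",
    },
    "203.0.113.7",
)
print(hit)
```

No JavaScript is needed for any of this, which is why blocking scripts alone does not stop pixel-based collection.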

Also, is that _Kai_ as in the Japanese word for ocean? Or is it _Kai_ as in
meeting?

Perhaps the entire word is a Japanese derivative, with buruukai meant to
connote a gathering of nervous, fearful Oracle executives? Not the most
auspicious reading...

~~~
andylynch
To be fair, the author does touch on this, giving the fact that this very
article has a Bluekai tracker as an example of its prevalence.

~~~
themodelplumber
Thank you, I missed that on my first pass.

------
alfalfasprout
Oh, BlueKai is only the tip of the iceberg. I used to work for a spammer (that
called itself an "incubator") that used BlueKai among other better sources.

3rd party cookie impressions can be sent to a company like LiveRamp to
identify what websites users have visited (in theory only in buckets, but
that's trivial to bypass). With the exception of the very largest websites,
you could see traffic to many, many websites and tie it back to an individual
user (even if their IP changed, etc.).

Companies like FullContact and People Data Labs offer the ability to take the
few pieces of information you have on a user and get their other info (e.g.
verify email using name and address, or get social media profiles, etc.).

The problem is no matter how good you are about using VPNs, clearing cookies,
etc. the weak link is the websites you're forced to give some info to. Many
of them _will_ sell you out even if their TOS say otherwise.

~~~
jefftk
_> I used to work for a spammer (that called itself an "incubator")_

CogoLabs?

~~~
alfalfasprout
small world ;)

~~~
jefftk
I was there about 10 years ago, and wrote a bunch of the initial email stuff.
At the time, we sent email only for things that people explicitly signed up
for (Groupon competitor), but...

~~~
alfalfasprout
Yeah, that changed completely haha.

------
hamax
If you want to check out what bluekai knows about your browser:
[https://datacloudoptout.oracle.com/registry/](https://datacloudoptout.oracle.com/registry/)

~~~
bitxbitxbitcoin
What kind of information do others see? I am currently getting: "No data
available for this browser."

~~~
lucb1e
For me it just keeps spinning at "loading your data" with a bunch of CORS
errors in the developer console.

------
gnicholas
I have a paid Chrome extension and recently implemented an uninstall survey to
see why people are leaving. For people who indicate that cost is an issue, we
explain that many "free" products are either supported by advertisements or by
selling user data. We then ask people whether, if we offered such a "free"
option, they would choose it over our paid option (which costs less than $2/mo).

Almost all of the respondents indicate that they would choose the "free"
version. I was surprised that the feedback was this one-sided, especially
after making the tradeoff salient (and especially for a browser plugin that
sees every page you visit). Apparently most people just don't care about their
privacy.

~~~
tomComb
There are some very important distinctions there.

Maybe you said your free version would use targeted ads - which I'm fine with
personally.

It's selling data or being dishonest about how the data is used that I have a
problem with, so maybe the problem is that your users trust you so they are
comfortable with you having and using their data.

~~~
biggestdummy
How, exactly, are the ads supposed to be targeted if data isn't being sold?
How is Fiat supposed to learn that you are interested in buying a car, if
there's no tracking and sale of your browsing history? Not saying that being
tracked is good. Just that saying "I'm ok with targeted ads, but not ok with
tracking" doesn't make a lot of sense.

~~~
jefftk
Ad personalization can be handled by the ad network. The network can track that
someone is interested in buying a car, and Fiat can request their ads be shown to
people interested in buying a car. Fiat only learns that you are in the
"people interested in buying a car" category if they win the bidding to show
you an ad. Not great, since showing an ad, especially well out of the
viewport, is pretty cheap.

We can take this one step further with the proposed TURTLEDOVE browser API
([https://github.com/michaelkleber/turtledove](https://github.com/michaelkleber/turtledove)),
at which point the ad network doesn't learn your interests and Fiat only
learns you were interested in buying a car if you click on an ad.

Or with the proposed FLoC browser API
([https://github.com/jkarlin/floc](https://github.com/jkarlin/floc)) where
your browser uses on-device clustering to present an interest category instead
of the ad network learning interests server-side.
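
As a rough illustration of what on-device clustering could look like, here is a Python sketch using SimHash over visited domains. SimHash is swapped in purely as an example technique; nothing above says it is what the proposal specifies, and the function name, the 8-bit cohort size, and the domains are all assumptions.

```python
import hashlib


def simhash(domains, bits=8):
    """Map a set of visited domains to a small cohort ID (illustrative only)."""
    counts = [0] * bits
    for domain in domains:
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each domain votes +1 or -1 on every output bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when the votes for it are net positive.
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


# Hypothetical browsing history; only the resulting number would be exposed.
history = ["news.example", "cars.example", "dealer.example"]
print(simhash(history))
```

Browsers with mostly overlapping histories tend to agree on most bits, so the value behaves like a coarse interest bucket, and the list of domains itself never has to leave the device.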

(Disclosure: I work on ads at Google, and am friends with the folks behind
these proposals)

------
advisedwang
The wide range of data the article hints at and the _guesses_ at connecting
data to users suggests that this data is probably full of inaccuracies. That
is one of the really scary things - companies are making decisions about me
based on outright wrong data.

This can be brutal for applications like credit worthiness, but I'm still
worried about mistakes in data used for more mundane decisions like who to
offer a discount to or which passenger to bump to business class.

~~~
soared
This type of data isn't really used for business decisions, since an audience
like "good credit score" (highly highly regulated and almost never used, fyi)
contains like 150MM unique identifiers (cookies, device ids, etc).

Everyone using this data knows it contains inaccuracies, but it is much much
better than not having any data at all.

------
soared
Through no fault of the author, he is confusing how bluekai operates. He talks
a lot about the sanitized huge anonymous audience (like in-market for cars)
but then skips to user-specific data (Steve who spent $10 gambling on 4/19.)

These two things, 3rd party and 1st party data, are very different. Stored
differently, different user access, different rules, etc. If Oracle
accidentally made all their clients' 1st party data public they'd get sued out
of existence.

~~~
manigandham
Yes, although at this point I expect any adtech articles to be 90% inaccurate.

The real issue is the poor security practices of this company, especially
under Oracle's watch, and how much data is shared by partners. BlueKai can't
get personal details unless someone shares it with them.

~~~
soared
Agreed. Oracle sends their dataset to like 200 different partners and it’s in
a constant state of being updated, so it’s not surprising one of those was
misconfigured.

------
rbecker
> There’s no big conspiracy. Ad tech can be creepily accurate.

> Tech giant Oracle is one of a few companies in Silicon Valley that has near-
> perfected the art of tracking people across the internet. The company has
> spent a decade and billions of dollars buying startups to build its very own
> panopticon of users’ web browsing data.

Doesn't that exactly describe a big conspiracy?

------
gdsdfe
as if we need more reasons to hate Oracle

------
DeepYogurt
Anyone know where to find the dataset?

