Hacker News
Oracle Data Marketplace (oracle.com)
96 points by auslander on May 29, 2018 | 52 comments



It's also hilariously inaccurate. I'm a freelance media buyer of sorts, and it thinks I work in either manufacturing or HR at a company of 500. The hilariously awful level of inaccuracy in this data is the reason the Facebook/Google advertising empires chug on. They have the data you need to target ads correctly.

Check for yourself (This link also deletes your data from them if you're so inclined.): https://datacloudoptout.oracle.com/registry/


They came in to the last company that I worked at to pitch us on this. We spent somewhere around 200 million a year on advertising and they wanted a chunk of that.

Their pitch involved bringing roughly 30 people into a room for three of them to do demos/presentations/q&a. Definitely the shock and awe approach. Unfortunately for them, we weren't totally clueless. The claims they made about their data and what it would do for us were pretty laughable.

I don't know whose idea it was to let them pitch us but it was a total waste of a day.


Having sat in on my fair share of such ad sales pitches in the past, can you share any more details of the types of claims they made?


The most interesting thing to me was not only how wrong many of the data points were, but how many data points in the same category I matched. My income is one of three different brackets, my age is one of three different brackets, and my location is one of three different wrong places (which I would expect to be the easiest to get right). It’s like there is some heuristic that cuts off at the top 2-3 categories. Wish I had more data points to see if I am an outlier or if that is actually how the model constructs this bag of labels.
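To make that guess concrete, here is a minimal sketch of a "keep the top few brackets" heuristic. It only illustrates the speculation above; the scores, the bracket names, and the cutoff `k` are all made up, and nothing here is known about how Oracle's model actually works.

```python
def top_k_labels(scores, k=3):
    """Return the k highest-scoring labels, best first.

    `scores` maps a label (e.g. an age bracket) to a hypothetical
    model confidence. Both the scores and k are assumptions.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical confidences for one user's age bracket.
age_scores = {
    "18-24": 0.10,
    "25-34": 0.30,
    "35-44": 0.28,
    "45-54": 0.22,
    "55+": 0.10,
}
age_labels = top_k_labels(age_scores)
```

If the profile were built this way, a user would land in several brackets of the same category at once, which would match what the page shows.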


Try your family :)


Datalogix data (now part of Oracle Data Cloud) is what powers the creation of Facebook advertising segments. So I don't think it's that inaccurate.


Datalogix is just one of several third parties that were allowed to push matching data to Facebook for the creation of segments (partner categories), as well as Experian, Epsilon, Acxiom, etc.

The way it works is not very different from custom audiences, but they are syndicated to every FB Marketing API user instead of being private to your account, and there's a revshare with FB based on usage.

So the matching has to be based on one (or more) of the keys listed here: https://developers.facebook.com/docs/marketing-api/audiences... plus a few extra ones like IDFA/IDFP, probably cookie matching, etc.
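A rough sketch of what key-based matching looks like, assuming email is the key and that it is trimmed, lowercased, and SHA-256 hashed before comparison (which is how custom audience uploads are generally described). The function names and the sample rows are hypothetical, not any real API.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equivalent addresses hash the same."""
    return email.strip().lower()


def hash_key(value: str) -> str:
    """SHA-256 hex digest of a normalized key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def match_segments(broker_rows, platform_hashes):
    """Join broker (email, segment) rows against hashes the platform knows.

    Returns {hashed_email: [segments]} for matched rows only.
    Unmatched rows are silently dropped.
    """
    matched = {}
    for email, segment in broker_rows:
        hashed = hash_key(normalize_email(email))
        if hashed in platform_hashes:
            matched.setdefault(hashed, []).append(segment)
    return matched


# Hypothetical broker file and platform user base.
rows = [
    (" Alice@Example.com ", "Organic Food Buyers"),
    ("bob@example.com", "Auto Intenders"),
]
known = {hash_key("alice@example.com")}
result = match_segments(rows, known)
```

Note that the match only says the key lines up. It says nothing about whether the segment attached to that key is right.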

It's not hard to create a massive audience on Facebook if you're matching on public data like postcodes, but ultimately your understanding of that audience is based on how accurate your data is. Whether you believe that data companies have accurate data or not, it has nothing to do with their segment size on FB.

PS. the partner categories program has actually been discontinued as of April 2018, so these third parties will no longer be able to syndicate their data to all FB ads users in the future.


> PS. the partner categories program has actually been discontinued as of April 2018, so these third parties will no longer be able to syndicate their data to all FB ads users in the future.

One thing to point out is that it's true that the Partner program itself has been discontinued, which gets rid of all the 3rd party targeting categories within the self-service interface using Facebook-managed data integrations with those firms.

But the ToS[1] rewrite effective 5/25/18 goes out of its way to ensure that they don't outright prohibit the use of third party data. They just effectively decentralized the usage of it and require ad buyers and the data brokers to have a direct relationship now, and to leverage the custom audience functionality you mentioned to do the targeting. In so doing they allowed themselves to become willfully blind to the usage of 3rd party data, and shifted liability for it onto the advertiser.

[1] https://www.facebook.com/ads/manage/customaudiences/tos


I primarily deal in Facebook for most of my clients. The 3rd party behavior targets etc. are notoriously inaccurate. Most bigger Facebook advertisers use lookalike audiences and Facebook first party data instead.


Nope. FB has their own models. They used to allow DLX to sync and provide their audiences too, but that is discontinued now.


IIRC FB only uses (used) Datalogix to match uploaded email lists... plus maybe some consumer cost-per-action ad influence segments.


Now imagine companies out there secretly throwing away your resume or rejecting your loan request based on incorrect data like this.

It's not something a web dev will generally need to worry about, but for working class blue-collar citizens this is a real nightmare they are unknowingly facing.


I posted this a few days ago, but it bears repeating:

If insurance companies and employers (as well as creditors) use that information, it becomes subject to the FCRA, as well as to state, local, and federal laws about lending, insurance underwriting, and employment. This information merely existing (for advertising purposes) doesn't suddenly make it legal to use it in specifically illegal ways.

For example, it's a standard practice to use a person's credit history and credit score during insurance underwriting: data shows people who are responsible with credit have fewer claims, so they have lower premiums. However, say you live in Massachusetts. In Massachusetts it's illegal for insurance companies to take your credit history and credit score into account when determining policy premiums. So insurance companies are not allowed to access your credit if you live in Massachusetts even though that information is widely available and used in other states.


I said "secretly" for a reason. Because every company does everything by the book ;)


Checked - 'No Data available for this browser' - but I'm not a typical web user :)


Probably means you use Safari (or an ad blocker). I get a message saying that they can’t track you anyways on iOS Safari.


It's a VPN, Private Tabs (no cookies), and uBlock Origin in Medium blocking mode. I guess that's enough.


I have the same result, uBlock Origin on its own is enough.


I got the same message and I'm using Chrome, no ad blocker and I'm not in incognito mode.


uBlock Origin alone is probably enough (although I have PrivacyBadger too). The data wouldn't even load for me until I turned off uBlock haha (I get no data as well).


Must be getting slammed, all I see is "Your data is loading"


It does that if you have an ad-blocker.


Which domains do I need to unblock in uBlock Origin in order to make that app work?


Nothing in Safari and hilariously wrong data in Chrome.


Just thinking, if I were Oracle, how much data would I put in this page? :)


I get a certificate error on that.


The Oracle Data Marketplace is the world's largest third-party data marketplace and the standard for open and transparent audience data trading. ... data providers offer more than 30,000 data attributes to power your branding ... actionable audience data on more than 300 million users.

That's over 80% of the entire US internet population at your fingertips. ... a range of data ... some of which are exclusive and not available anywhere else. ... Eighty percent of the top 20 ad networks, portals, trading desks, and creative optimizers leverage data from the Oracle Data Marketplace platform to run high-performance ad campaigns.

Equifax? They were kids.


> Users who have demonstrated intent through ... searches

I know marketing always makes exaggerated claims, but to claim that a search implies intent is such blatant nonsense. At least later on they make claims about "interest", which at least could be true, for very broad definitions of "interest".

(As auslander already said, aggregating personal data into huge databases like this invites Equifax-like data theft.)


I've probably spent $50K over the past 9 years on media that was targeted using BlueKai data, and worked on the publisher side where their pixel fired billions of times. I remember the early days when its founder Omar was running around the globe selling this data as "Stamps" because they were charging so much for it that reaching a single person cost as much as direct mail, printed and delivered.

AFAIK, a web request is sent from a consumer's browser with some form of identity, i.e. a cookie ID, fingerprint, etc. The hash of it gets sent to BlueKai when there is a bid request, and it then queries for keys associated with that hashed ID. The bidder (a service that makes purchase decisions on bid requests) returns a value for that data, and if it's the highest value among all advertisers bidding on that slot, the ad is shown.
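The flow described above can be sketched roughly as follows. This is a toy model under stated assumptions, not how BlueKai or any real bidder is implemented: the `SEGMENTS` store, the advertiser dictionaries, the CPM figures, and the "highest bid wins" rule are all invented for illustration.

```python
import hashlib

# Hypothetical broker-side store: hashed browser ID -> audience keys.
SEGMENTS = {
    hashlib.sha256(b"cookie-123").hexdigest(): {"organic_food_buyers"},
}


def lookup_segments(raw_id: str) -> set:
    """Hash the browser's ID and return the audience keys stored for it."""
    hashed = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()
    return SEGMENTS.get(hashed, set())


def bid(advertiser: dict, segments: set) -> float:
    """Bid (in CPM dollars): base price, plus a premium on a segment match."""
    if advertiser["target"] in segments:
        return advertiser["base_cpm"] + advertiser["premium_cpm"]
    return advertiser["base_cpm"]


def run_auction(raw_id: str, advertisers: list) -> tuple:
    """Return (winner name, winning bid) for one ad slot."""
    segments = lookup_segments(raw_id)
    bids = {a["name"]: bid(a, segments) for a in advertisers}
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


# Hypothetical advertisers competing for the slot.
advertisers = [
    {"name": "sausages", "target": "organic_food_buyers",
     "base_cpm": 1.0, "premium_cpm": 2.0},
    {"name": "generic", "target": "none",
     "base_cpm": 1.5, "premium_cpm": 0.0},
]
known_user = run_auction("cookie-123", advertisers)
unknown_user = run_auction("cookie-999", advertisers)
```

The point of the sketch is that the advertiser only ever sees whether it won the slot, never the audience keys behind the bid.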

The advertiser never sees any of the underlying data. They are able to see how many bids they won against the pool of data. Say you wanted to sell some organic sausages via banner ads: you would purchase impressions from a publisher, ad network, or agency, and pay an extra $2 CPM (mind you most media clears around $1) to BlueKai to show ads against an Organic Food Buyers audience. That audience is most likely derived from grocery store club card data, which includes things like recent purchases, brand preferences and more. I'm amazed that more people don't care about that form of data collection...

How often do you have to buy organic products to be entered into the targeting pool? 50% of your shopping cart, or 1 item ever? Literally nobody except for BlueKai knows. There is no oversight, and as such the data is almost certainly low quality.

I came to the conclusion after all of that spend against at least 100 different audiences that it's almost universally more efficient to buy cheaper banner ads in bulk. It was often the case that media targeted with 3rd party data would perform twice as well as untargeted media, but for 5 times the cost.
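The arithmetic behind that conclusion, as a quick sketch. The baseline numbers ($1 CPM and 1 conversion per thousand impressions) are placeholders; only the "twice as well" and "5 times the cost" multiples come from the comment above.

```python
def cost_per_conversion(cpm: float, conversions_per_thousand: float) -> float:
    """Dollars spent per conversion, given price and yield per 1,000 impressions."""
    return cpm / conversions_per_thousand


# Placeholder baseline for untargeted media.
baseline_cpm = 1.0
baseline_conversions = 1.0

untargeted = cost_per_conversion(baseline_cpm, baseline_conversions)
# Targeted media: 5x the cost, 2x the performance.
targeted = cost_per_conversion(baseline_cpm * 5, baseline_conversions * 2)

# How much more each conversion costs with 3rd party targeting.
ratio = targeted / untargeted
```

Whatever the baseline is, each conversion ends up costing 5 / 2 = 2.5 times as much with the targeted buy, so the cheaper bulk inventory wins on efficiency.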

Still, I think the concept of audiences, not raw PII, has been a decent compromise for the sake of privacy, and the net effect of all of it is that some marketing budgets got blown on ads that could have been otherwise targeted.

That is one hell of a security risk though to have all of that data sitting out there.

I got so fed up with the inequity of digital marketing I've done my best to move myself away from it. If I were building an advertising business today, the consumer would be an equal partner in all data transactions, for two reasons: 1. Fairness 2. Data quality.


> move myself away from it

I stop approaches from ad-tech companies myself.

> the consumer would be an equal partner in all data transactions, for two reasons: 1. Fairness 2. Data quality

Why would a consumer be interested in cooperating? No one likes ads, right? :) The businesses milk data out any way they can, and share it. People defend themselves and share the knowledge. It's how it is now, and it looks like it will be in the future.

Browser companies may be on either side; today I'd stick with Firefox and Safari.


> why would a consumer be interested in cooperating? No one likes ads, right?

Let's assume I can't block ads, or choose not to block them to support the site/service I'm using.

If that was the case I'd rather see relevant ads as long as they were targeted based on a subset of information about myself I've decided to provide, to the parties of my choosing and with usage restricted to that single purpose only.


Yeah, but you're talking about supporting a website/service. I replied to the 'If I were building an advertising business today ...'


I do like useful ads. One of my biggest resentments with Google is that once they started targeting their ads onto me, their ads just lost all the value they had, and they were very valuable before it.

And I'm sure I am not alone. Before the internet, people used to buy ad catalogues. They would go out, pay a bit, just to get a collection of ads home.


This is where the scum of the adtech industry lies. While Google and FB collect a lot of user data, they only keep it to themselves and use the data BOTH for better advertiser targeting as well as for customizing user experiences.

Oracle BlueKai, which was an acquisition, collects data from the various sites users visit (via cookies and their SDK) and explicitly sells that data to other ad networks. Ad networks, outside a few of them, don't have a lot of data to target and thus rely on 3rd party data brokers such as BlueKai to fulfill this need. BlueKai also has tie-ups directly with the advertisers, especially brand advertisers such as Unilever etc., where these advertisers require the ad networks who are advertising for them to use BlueKai customer segments (which they created using user data) for targeting purposes. And all of this is completely invisible to the user.


It gets worse:

- Personal information that is collected offline and that can directly identify you may include, for example:

- name and physical address, email addresses, and telephone numbers;

www.oracle.com/legal/privacy/marketing-cloud-data-cloud-privacy-policy.html


Why specifically is it worse than FB? If all those ad networks merged into BlueKai aka Oracle, so the data was kept "internal", would that be better?


It's unclear. I guess his idea was that FB collects only data you put into FB, while BlueKai collects data from many trackers embedded into the majority of websites.

Unclear because Google and FB have their hands (trackers like the Like button and pixels) in many websites as well. In uBlock Origin, enable advanced mode and you will see tens of tracking domains on pages like newyorker.com, online stores, et cetera.


Facebook has share buttons on all the world's media. Each one of those views is almost certainly recorded by Facebook, even if you don't have an account. The only way to stop it is to put a DNS block (hosts entry) on your machines. If you think Facebook records less data than BlueKai, I think that is very very unlikely.
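For anyone who wants to try the hosts-entry approach, a small sketch that turns a list of domains into hosts-file lines. The domains below are placeholders, not a real blocklist, and `0.0.0.0` as the sink address is one common convention rather than the only option.

```python
def hosts_block_lines(domains, sink="0.0.0.0"):
    """Build hosts-file lines that point each domain at a sink address.

    Domains are lowercased and de-duplicated, keeping first-seen order.
    """
    seen = set()
    lines = []
    for domain in domains:
        cleaned = domain.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(f"{sink} {cleaned}")
    return lines


# Placeholder domains; substitute the tracker hosts you actually want to block.
lines = hosts_block_lines(["Tracker.example", "tracker.example ", "pixel.example"])
```

Keep in mind that a hosts file has no wildcard support, so every subdomain has to be listed explicitly.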


It is worse because BlueKai is a data broker and sells that data to anyone. You can literally buy data from BK for various user segments (and that is their business model). FB, otoh, while collecting the data keeps it only to themselves and shares it with no-one (not even the advertiser).


> they only keep it to themselves and use the data BOTH for better advertiser targeting as well as for customizing user experiences.

And for amplifying political misinformation campaigns and conspiracy theories, and occasionally they let some of that data leak. But other than that they're fine. After all, they are mega corporations whose business model is altruism.


... still digging ... FAQ:

- On average, how many categories does a unique user belong to?

- We see 750 million unique users per month with an average of 10-15 attributes per user.

- How long after a user qualifies for a category will that category last in their cookie?

- Categories are stored per user for 90 days. This is the legally allowed time limit. It is also a rolling 90 day period ... if the user is back online on day 2 then the activity counter resets.
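A minimal sketch of what a rolling 90-day window means in practice, assuming each new activity simply restarts the clock. The class and method names are hypothetical; only the 90-day figure and the reset-on-activity behavior come from the FAQ quoted above.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=90)  # figure quoted in the FAQ


class CategoryMembership:
    """Tracks when a user's category would lapse under a rolling window."""

    def __init__(self, qualified_on: date):
        self.last_seen = qualified_on

    def record_activity(self, day: date) -> None:
        """Any later activity restarts the 90-day clock (assumed behavior)."""
        if day > self.last_seen:
            self.last_seen = day

    def expires_on(self) -> date:
        return self.last_seen + RETENTION

    def is_active(self, today: date) -> bool:
        return today < self.expires_on()


membership = CategoryMembership(date(2018, 1, 1))
first_expiry = membership.expires_on()
membership.record_activity(date(2018, 1, 2))  # back online on day 2
second_expiry = membership.expires_on()
```

So for anyone who browses regularly with persistent cookies, the category effectively never expires.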

That was the latest addition to my defences: private browsing tabs only, so closing a tab clears cookies and storage.


Is this a new discovery for you? Third-party cookie data is everywhere, and usually very messy if not completely inaccurate.


No, but what about first-party cookies? And all sorts of local storage? I don't need them either.


1st-party data is different from 3rd-party data, regardless of how it's physically stored.

I'm not sure what your question is though, are you asking how to protect yourself? It seems you already have it covered if you delete cookies.


Facebook's collection system is through share buttons. Those share buttons share cookies because the domain is Facebook's, but the technology is certainly there to track by IP address and device fingerprint, even if you are not a FB user, and a single FB login re-drops the cookies and allows association of your internet media behavior with your FB account and behavior.


HN mods changed title. Original title was:

Oracle BlueKai, audience data on 300+ million users, over 80% of the US


The new title seems inaccurate. Why did the mods change the title to be vague and less specific? Seems like a shady practice to me.


The guidelines say to use the original title of the link, unless that title is misleading or click bait. The original title was expressing an opinion.


The original is direct quotes from the top of the page.

What opinion?


Is this new? Because Oracle couldn't possibly choose a worse release date directly following GDPR-day/-week, underlining its relevance. But maybe they're on to something because they're able to sell data when it's getting scarce?


Not new at all. Not sure why this was posted today though. Here is a similar marketplace / solution from one of their biggest competitors, Neustar https://www.marketing.neustar/identity-data-management-platf...

Adobe is also very much in this space.


Where can I get the link that does what I'm reading about in the comments? The submitted link sends me to a doc page. Which link shows their guesses about me, please?



