
How browsers get to know you in milliseconds - wallflower
http://radar.oreilly.com/2014/12/how-browsers-get-to-know-you-in-milliseconds.html
======
morgante
Shame on O'Reilly for burying the sponsored content notice at the very bottom
of the post. While native advertising & sponsored content is a great strategy
these days, a disclaimer is usually expected _before_ people start reading.

> This post is part of a collaboration between Aerospike and O’Reilly Media.
> Read our statement of editorial independence.

------
javery
Quick poll - how many of you noticed this was a sponsored article brought to
you by Aerospike?

~~~
queryly
I didn't notice it until you mentioned it. I didn't feel cheated though, since
I got a good amount of info from the article and was curious about how
Aerospike did it.

I wish all ads were like this.

------
realusername
When I read articles like these, it always makes me sad to see the amount of
incredibly complex technologies dedicated to advertising. I hate to be that
guy but I'm sure these resources could have a better usage.

~~~
nols
Advertising currently provides most of the revenue for many Internet sites.
Until another major revenue source comes along a significant effort will still
be put towards optimizing advertising.

~~~
realusername
Yeah I know... It's even worse than that actually, since advertising is not a
zero sum game (two companies with 50% market share each will get less than
one company with the whole market). Any alternative would just significantly
decrease the value of advertising, and therefore no alternative can emerge.

That is why there is no option to pay for Google services instead of the
tracking. I don't think there is any solution to this, unless the share of
AdBlock users would reach a tremendous level.

~~~
aiiane
[https://www.google.com/contributor/welcome/](https://www.google.com/contributor/welcome/)

~~~
hobs
That's interesting, but it just seems like mining additional money out of
people; they still get to keep all the data even if they are not displaying
the advertising.

------
anonu
This stuff is always fascinating to me, especially the "capital markets"
approach to buying/selling ads. I saw an article a few weeks back on how
traders are arbitraging different ad exchanges in the same way high-frequency
traders do so... You can buy ads for cheap in 1 place and sell them
immediately at a higher price somewhere else. I'd be curious if anyone has a
high-level idea of how this works.

~~~
alphonse23
nope. no one can:
[http://www.slate.com/articles/technology/technology/2014/06/...](http://www.slate.com/articles/technology/technology/2014/06/online_advertising_effectiveness_for_large_brands_online_ads_may_be_worthless.html)
Just another bubble in the making -- just be ready to pull out at the right
time.

------
PhantomGremlin
Nobody has a problem with the title? Yes I know it's directly from the linked
article.

My _browser_ already knows everything about me. It's the web server (and all
the associated ad servers) that "get to know" visitors in milliseconds.

Also, my browsing is much much faster than what the article mentions. That's
because I don't bother with Adblock or Ghostery. Instead, I simply use
NoScript on most websites. The (sponsored) article was fully readable w/o
JavaScript. So, while the timeline discussed was interesting in the abstract,
the flurry of backstage activity didn't occur when I viewed the article. Nor
were any ads displayed. Thankfully.

------
liotier
Block ads & third party trackers to spare your browser the useless effort.
Thanks Adblock & Ghostery !

~~~
switzer
The goal is for display ads to not suck to the point where Adblock, Adblock
Plus, and Ghostery are required.

For better or worse, advertising pays the salaries of the people who write
the stories and build the websites and apps that people use. The goal is not
to remove ads altogether but to improve display advertising so that it is a
positive addition to a site rather than the negative effect that it has on
user experience today (yes, advertising can be a positive addition to content -
ask anyone who reads Vogue magazine).

Google won search advertising because ads were relevant, looked good, and it
was simple for advertisers to reach the users they wanted. The major reason
for Google's success is that they owned every piece in the advertising
stack:

- The content (e.g. google.com)
- The adserver
- The auction engine
- Fraud detection (Google's quality team's success at removing fraud meant that
  advertisers trusted that their messages were reaching users)
- Creative adserver (e.g. text ads and creatives are served by Google)

In display advertising, each of these pieces is operated by a different party.
This means that integration between the pieces is not tight, giving users and
advertisers a low-quality experience. Here are some of the effects of this:

1\. Ads generally cannot be tailored to the website where they run. Publishers
do not have the ability to tailor ads to work within their environment like
they can with Google Adsense (native advertising is starting to change this
though) - so most websites look bad with display ads.

2\. There is massive data leakage from ads. Because a 3rd party adserver is
serving the ad on a publisher site, it is nearly impossible for publishers to
keep their customer data from leaking to everyone in the ad stack, which
quickly becomes everybody in the ad industry.

Ad networks that buy ad space from a publisher almost never actually serve the
ad creative (they just serve an ad tag to another network). The result is a
massive circle of redirects where a single ad can be bought and sold over 100
times before it is displayed. This means that the publisher is giving their
customer data to 100 companies before they get an ad on a page.

3\. There is no industry wide coordinated ad fraud effort that is successful.
There are so many ways for ad fraud to occur in many different parts of the ad
stack, that a single company that tries to reduce fraud from a single point in
the ad stack will only have limited success.

4\. Ad creative does not have rigid controls where the publisher can let the
advertiser know what can and cannot be done on their site. This means that
either an ad seems to 'take over' the publisher page without adding to the
user experience, or an ad is reduced to a backup image, which is not a
pleasing experience.

A reduced set of advertising features needs to be done much better in order
for display advertising budgets to grow from where they are now.

~~~
fspeech
The two times that Google succeeded in getting my family to follow through on
ads:

1) My 10 year old, in trying to download iTunes, clicked on a download site ad
and installed malware that I had to rescue him from while he was in tears;

2) My wife while trying to pay for vehicle license renewal clicked on a DMV
look-alike site that attempted to charge her $30 handling fee for something
that DMV does for free.

The other cases for their clicking on Google ads are mostly well known
websites such as Amazon defensively placing ads to protect those who couldn't
distinguish a URL from a search query in the address bar or an ad from an
unpaid search result.

So it seems to me the ad model is really built on preying on the
technologically naive to subsidize the technologically savvy. While in general
the technologically sophisticated should be able to charge for their services
to the technologically naive, it would be better if it could be done with
informed consent instead of trickery.

~~~
shostack
How do you know the first ad was even on the Google Display Network? You
sitting there watching him click it and following the redirects seems to be
the only way you could know that. In which case you allowed him to click and
I'd argue it was not his fault you got malware, but yours.

Your second example also sounds like a situation where you heard a (likely
inaccurate) complaint from a second party, and are attributing it to a sketchy
ad when in reality it could very well have been a site that ranked well
organically and had affiliate links to the look-alike site.

Your statement of "the other cases for their clicking on Google ads" could use
some clarification. The digital media space is vast, and while Google has a
large market share, they are not the only players, so I'd love to know how you
can attribute both of these examples (which sound like you were not present to
observe and have info from less savvy second parties) to Google.

Publishers absolutely have done shady things to drive ad revenue. You'd be
shocked at how many attempt to arbitrage cheap clicks from low-quality traffic
sources straight into high CPM impressions that they can bundle into direct
buys. There are many other things that go on.

What frustrates me with these threads is when people try to summarize an
entire industry's best and worst practices by a couple anecdotal experiences
using language that indicates a lack of knowledge of said industry, and
frankly lacks any real credibility based on the information provided. That
sort of comment isn't really constructive--it is jumping to poorly informed
conclusions.

Counterpoint... if I run ads driving people to awesome video tutorials about a
somewhat complex product, and it shows a high CTR and engagement rate, am I
evil? I'm not holding a gun to anyone's head, and people's actions speak
louder than words.

~~~
scholia
Scamming via lookalike sites is very common, and Google should be ashamed of
itself for its part in conning people out of money for no reason except to
line its own pockets. In the UK:

"Google is coming under increasing pressure over taking money to promote
copycat websites. Among the more printable comments sent to us this week from
readers who have fallen for taxreturngateway.com was: "I believe Google is
implicated in this as it is well aware of what's happening and why – and it
too should be prosecuted". Others ask why Google couldn't at least move
copycat sites to below the official sites on its search results page."
[http://www.theguardian.com/money/2014/jan/30/tax-return-
pass...](http://www.theguardian.com/money/2014/jan/30/tax-return-passport-
health-copycat-websites)

I had to deflect my own wife from one of these, because of the deceptive way
Google was displaying ads above organic search. I've experienced the second --
the (failed) attempt to install malware -- myself.

------
Judgement
Does anyone have any other reading recommendations on this topic, or
advertising recommendation in general? How are different websites able to know
what content you are interested in - who is storing this data on people?

~~~
ctrl_freak
> Does anyone have any other reading recommendations on this topic, or
> advertising recommendation in general?

Sure. What's being discussed in this article is Real Time Bidding:
[http://en.wikipedia.org/wiki/Real-time_bidding](http://en.wikipedia.org/wiki/Real-time_bidding) .
You can read up about the specifics of different Real Time Bidding exchanges,
for example here's Google's AdExchange protocol:
[https://developers.google.com/ad-exchange/rtb/](https://developers.google.com/ad-exchange/rtb/)

> How are different websites able to know what content you are interested in -
> who is storing this data on people?

The websites that host the actual content don't usually care about what ads
are being shown to you. They just put code in from an ad exchange and let them
handle the rest. The buyers (the ones who run the bidders) are the ones that
are interested in figuring out what to show you. For example, many bidders are
interested in doing what's called Behavioural Retargeting (
[http://en.wikipedia.org/wiki/Behavioral_retargeting](http://en.wikipedia.org/wiki/Behavioral_retargeting)
). Say you visit a shoes website. Then later you go on Facebook. The moment
you go on Facebook, their ad exchange FBX will send a bid request to all the
bidders. If the shoes website has partnered with that ad exchange, when they
receive the bid request, they can identify you by your cookie, and they can
bid and try to show you an ad to get you to come back to their site. Typically
a shoes website wouldn't have the infrastructure to do real time bidding so
they get a third party to do the bidding on their behalf, e.g. AdRoll (
[https://www.adroll.com/retargeting](https://www.adroll.com/retargeting) ).
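
A minimal sketch of the retargeting decision described above, in Python. It
assumes the bidder keeps a set of cookie IDs it has seen on the shoes website;
`BidRequest`, `SEEN_ON_SHOE_SITE`, `retargeting_bid` and the CPM value are all
hypothetical names and numbers for illustration, not any real exchange's or
vendor's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRequest:
    cookie_id: Optional[str]  # None if the bidder has no cookie for this user
    site: str                 # site where the ad would be shown


# Hypothetical store: cookie IDs the bidder recorded on the shoes website.
SEEN_ON_SHOE_SITE = {"cookie-123"}


def retargeting_bid(request: BidRequest, bid_cpm: float = 2.0) -> Optional[float]:
    """Return a bid (made-up CPM) if the user visited the shoes site, else None."""
    if request.cookie_id in SEEN_ON_SHOE_SITE:
        return bid_cpm
    return None  # user not recognised: this retargeting bidder sits out
```

So a request carrying `cookie-123` gets a bid, while a request without a
cookie (or with an unknown one) gets no bid from this particular bidder.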

~~~
Judgement
Thank you.

And, they can identify you through other methods if you have cookies
thoroughly disabled, as mentioned in the article?

If they have no data on you, I assume ad exchanges supply default
advertisements, or there are bidders on default ads?

~~~
ctrl_freak
> If they have no data on you, I assume ad exchanges supply default
> advertisements, or there are bidders on default ads?

To be clear on what happens here: the ad exchange itself isn't usually
interested in identifying you per se, it's the bidders. So for example the ad
exchange might send the bidders a bid request that has some information like
this:

Country, user agent, time zone, website that will be showing the ad,
size/location of the ad, etc.

Then it's up to the bidders to try to figure out how much this impression
would be worth and then bid appropriately. Even if they can't 'identify' the
user as someone they've seen before, they might still bid on the ad because
the website showing the ad might be relevant to the product they're trying to
sell. If the only information they have about a user is what is passed in the
bid request they would probably bid lower.

I don't know what the behaviour is when a bid request doesn't receive any
bids, but it would be exchange-dependent, and likely rare on the major
exchanges.
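
To make that concrete, here is a rough Python sketch of the flow, under some
loud assumptions: the field names, the prices, and the "highest bid wins" rule
are all made up for illustration, and real exchanges define their own
protocols and auction rules. Returning `None` when nobody bids is just a
placeholder, since the actual no-bid behaviour is exchange-dependent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class BidRequest:
    # Fields mirror the examples listed above; names are hypothetical.
    country: str
    user_agent: str
    time_zone: str
    site: str
    ad_size: str
    cookie_id: Optional[str] = None


Bidder = Callable[[BidRequest], Optional[float]]


def make_bidder(known_users: Set[str], relevant_sites: Set[str]) -> Bidder:
    """Build a toy bidder: bid high for recognised users, lower on context alone."""
    def bid(req: BidRequest) -> Optional[float]:
        if req.cookie_id in known_users:
            return 2.0   # user seen before: impression is worth more
        if req.site in relevant_sites:
            return 0.5   # only the site is relevant: bid lower
        return None      # nothing to go on: no bid
    return bid


def run_auction(req: BidRequest, bidders: Dict[str, Bidder]) -> Optional[Tuple[str, float]]:
    """Ask every bidder, then pick the highest bid (an assumed rule)."""
    bids = {name: bidder(req) for name, bidder in bidders.items()}
    live = {name: price for name, price in bids.items() if price is not None}
    if not live:
        return None      # placeholder for exchange-specific no-bid handling
    winner = max(live, key=live.get)
    return winner, live[winner]
```

With one bidder that knows cookie `c1` and treats a running blog as relevant,
an unidentified visitor on that blog still draws a 0.5 bid, while the same
visitor carrying `c1` draws 2.0.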

~~~
Judgement
I see, thank you for the information!

------
super_sloth
As an aside, does anyone have any experience using Aerospike?

Any impartial thoughts?

------
known
[https://panopticlick.eff.org/](https://panopticlick.eff.org/)

------
stblack
Is it just me or do the times listed in this article seem implausible?

~~~
nemothekid
The big question mark is the latency. The lookups are/should be really fast -
I guess that these guys either have lots of memory or use SSDs, where a
random read should take about 1/10th of a millisecond.

The article is an ad for database software, but I'd be interested to know how
they keep the latencies low enough when querying third party data. The
article mentions that the data is colo'd, but I find it hard to believe that
all 150 DSPs are colo'd.
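
One generic way to bound latency when fanning out to many parties is a hard
deadline: query everyone in parallel and drop whoever has not answered in
time. The Python sketch below illustrates only that idea; it is not taken
from the article, and `query_dsp`, `collect_bids`, the latencies and the
prices are all made-up stand-ins (the network call is simulated with a sleep).

```python
import asyncio
from typing import Dict, Tuple


async def query_dsp(name: str, latency_s: float, price: float) -> Tuple[str, float]:
    """Simulate one DSP answering after `latency_s` seconds with a fixed price."""
    await asyncio.sleep(latency_s)
    return name, price


async def collect_bids(dsps: Dict[str, Tuple[float, float]], deadline_s: float) -> Dict[str, float]:
    """Query all DSPs concurrently; keep only answers that beat the deadline.

    `dsps` maps a name to (simulated latency in seconds, price) and must be
    non-empty, since asyncio.wait rejects an empty task list.
    """
    tasks = [
        asyncio.create_task(query_dsp(name, latency, price))
        for name, (latency, price) in dsps.items()
    ]
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()  # too slow: this DSP is ignored for this auction
    return dict(task.result() for task in done)
```

For example, `asyncio.run(collect_bids({"fast": (0.01, 1.2), "slow": (0.5, 3.0)}, 0.1))`
keeps the bid from `fast` and drops `slow`, even though `slow` would have bid
more. A far-away DSP then costs itself bids rather than slowing the page down.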

