Websites using SSAI (server side ad injection) (github.com)
108 points by ressetera 15 days ago | 73 comments



Before anyone thinks this is a Eureka anti-ad-blocking technology: Clearly you still need client-side javascript, distributed by the mediator, to ensure that the impression is actually delivered and the click is actually registered.

Otherwise, obviously, the server could just maliciously record impressions/clicks.

Then, logically, if uBlock Origin doesn't remove the ad, but does successfully remove the mediator's script, the server can never book the impression. So why waste precious bandwidth (actually INCREASING the cost of ad delivery for the publisher) delivering an ad you can never be paid for? Boggles the mind.

Embedding the ad into the video is more akin to a native ad, which is generally understood by the advertiser to not have measurable conversion and to be strictly context (as opposed to user) targeted.

We are going full circle--that is, back to the beginning--of ad technology.


> Clearly you still need client-side javascript, distributed by the mediator, to ensure that the impression is actually delivered and the click is actually registered.

There's no point to client-side JavaScript: the baddies just write JavaScript that rewrites basic objects using Object.defineProperty so that document.visibilityState always reports "visible" (and so on), or that lies to the visibility sensor. Or they just make a whole fake web browser that runs on a server. You are in an arms race, and verification companies simply can't/don't do a very good job.

> Otherwise, obviously, the server could just maliciously record impressions/clicks.

I offer ads† to publishers server-side via XML or JSON, and they can stitch them into the page however they want, and I've been doing this for years.

My publishers often get paid by click, but some of the more valuable ads are paid on referral. Occasionally I see a CPM/CPD deal go through, but it's usually with a larger publisher whose traffic sources I can understand. I won't help anyone do a CPM/CPD deal unless I understand their traffic.

You're right that it's much too easy to buy traffic from e.g. Google and spray it at my impression and click trackers, which is why I don't rely on them: Adblock and uBlock can remove the trackers all they want, but their users still get ads from my platform, and my publishers will still get paid.

†: Strictly speaking: I offer a platform for publishers, and often help my customers get introductions/recommendations/connections to advertisers who believe in data-driven online sponsorship.

> Embedding the ad into the video is more akin to a native ad, which is generally understood by the advertiser to not have measurable conversion

Server-side stitching can be done in real time, bespoke, and with standard VAST tags. Not everyone is doing this, because DASH/HLS are simpler and still "good enough".

> and to be strictly context (as opposed to user) targeted.

I've worked with several (big) brands who have done completers and demo-guaranteed video campaigns. There is absolutely user-targeting in video.


> There's no point to client-side JavaScript: the baddies just write JavaScript that rewrites basic objects using Object.defineProperty so that document.visibilityState always reports "visible" (and so on), or that lies to the visibility sensor. Or they just make a whole fake web browser that runs on a server. You are in an arms race, and verification companies simply can't/don't do a very good job.

I agree it's an arms race, but why do you think it favors the attacker? Bot/spam detection is incredibly important, and the folks I've worked with in spam detection are really good at what they do.

(Disclosure: I work on ads at Google, though not in spam. Speaking only for myself.)


> There's no point to client-side JavaScript: the baddies just write JavaScript that rewrites basic objects using Object.defineProperty so that document.visibilityState always reports "visible" (and so on), or that lies to the visibility sensor. Or they just make a whole fake web browser that runs on a server. You are in an arms race, and verification companies simply can't/don't do a very good job.

You cannot overwrite javascript properties in frames from another domain, right? Am I missing something?

A fake web browser requires a lot of IP addresses. Widespread abuse seems hard to me, especially when combined with Google's hidden "I'm not a robot" thingy.


> You cannot overwrite javascript properties in frames from another domain, right? Am I missing something?

You don't need to.

The SSP or publisher can slip the naughty JavaScript directly into the ad tag.

> A fake web browser requires a lot of IP addresses.

You may be surprised to learn there's a market for buying IP addresses, and they're cheaper than the revenue a bad actor can gain from using them.

There are also a lot of toolbars that embed some limited tunnelling functionality that they can then resell.

There's also a market for hacked DSL routers that you can tunnel through.


You can't use ReCaptcha (or any captcha) for ads. Captchas work because they prevent access to content users want until they solve the captcha.

If you put ads behind a captcha? Well in all honesty you're just doing a service to the user by hiding the ads behind a captcha they're never going to solve (even if they are not robots) because it's not in their best interest to do so.


> You can't use ReCaptcha (or any captcha) for ads. Captchas work because they prevent access to content users want until they solve the captcha.

If you've used ReCaptcha in the past few years [1] you might have noticed it often doesn't ask you to solve a captcha. The parent is describing using a similar approach of detecting bots to identify ad impressions that shouldn't be counted (spam).

[1] https://security.googleblog.com/2014/12/are-you-robot-introd...

(Disclosure: I work at Google in ads, though not in spam.)


There is a hidden "I'm not a robot" "captcha". You might use that to help detect whether the impression/view/click was legit.

https://developers.google.com/recaptcha/docs/invisible

You can programmatically invoke the challenge from the ad's javascript.


If you follow that train of thought to its logical (if perverse) conclusion, we can soon expect ads as the subject matter of captcha.

Instead of selecting three pictures that have a given "thing" in them, we'll be picking the ones showing a given brand among otherwise generic signs.


I've seen some websites that do that, i.e., watch a short ad and then type in the brand name from the ad.


So life imitates art - again. Too bad the artist is a dystopian dadaist.


> Before anyone thinks this is a Eureka anti-ad-blocking technology: Clearly you still need client-side javascript, distributed by the mediator, to ensure that the impression is actually delivered and the click is actually registered.

I don't think I understand why client-side JavaScript is needed here. The server-side code could simply generate a unique hash for every visitor and include that in the campaign link. Then, server-side code on the receiving end can read this hash, record a unique hit, and monitor the user on the campaign landing page to see if a lead is generated.

This seems obvious to me, but I don't actually work in advertising. Where is the break in this system? What am I missing that allows this to be exploited, in a way that only client-side JavaScript can fix?

EDIT: In context, I've realized that my proposed solution might work for clicks, but would do nothing for tracking impressions. Hrm. I'm not really sure if that problem is solvable. Then again, I'm also not a fan of impression-based ad tracking (it feels creepy), so maybe I don't mind if it remains broken.
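
For what it's worth, the click-only version of this idea can be sketched in a few lines of Node.js. Everything specific here is an assumption that is not in the thread: the function names, the query parameters (cid, vid, sig), the shared secret, and the use of an HMAC so that someone who sees one link can't mint new tokens.

```javascript
// Sketch only: names, parameters, and the shared secret are hypothetical.
const crypto = require("node:crypto");

const SECRET = "publisher-advertiser-shared-secret"; // assumption: agreed out of band

function sign(campaignId, visitorId) {
  return crypto.createHmac("sha256", SECRET)
    .update(`${campaignId}:${visitorId}`)
    .digest("hex");
}

// Publisher side: mint a per-visitor token and attach it to the campaign link.
function buildCampaignLink(baseUrl, campaignId) {
  const visitorId = crypto.randomUUID();
  const url = new URL(baseUrl);
  url.searchParams.set("cid", campaignId);
  url.searchParams.set("vid", visitorId);
  url.searchParams.set("sig", sign(campaignId, visitorId));
  return url.toString();
}

// Advertiser side: verify the signature and count each visitor token once.
const seen = new Set();
function recordClick(link) {
  const url = new URL(link);
  const cid = url.searchParams.get("cid");
  const vid = url.searchParams.get("vid");
  const given = Buffer.from(url.searchParams.get("sig") || "", "utf8");
  const want = Buffer.from(sign(cid, vid), "utf8");
  const valid = given.length === want.length && crypto.timingSafeEqual(given, want);
  if (!valid || seen.has(vid)) return false;
  seen.add(vid);
  return true;
}
```

This only shows that unique clicks can be counted without client-side script. It says nothing about whether a human clicked, and it does nothing for impressions.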


A couple of leads to answer you:

- Video advertising is not happy with only impression and click metrics. In general, advertisers will want to know if their video played while in view of a human (or at least in view on a screen), and for how long it played (or at the very least a rough estimate, like say how many midpoints it reached).

- The concept of "impression" itself is often not very well defined, but counting it as "the server served a request with the video payload" is really too optimistic a view of things, and it leaves big holes exploitable by fraudsters. Having client-side JavaScript playing at least requires additional software running, aka additional costs (if minimal) for the fraudster.

- You can't really "just" monitor the user on the campaign landing page, since different sites are involved, with different cookies. Reconciling them is doable, but it requires some work that the advertiser may not be willing or able to do.
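
To make the first point concrete, here is a rough sketch of how a player might report "how far did it play". The event names follow the VAST tracking events (start, firstQuartile, midpoint, thirdQuartile, complete); createProgressTracker and the report callback are hypothetical names for illustration.

```javascript
// VAST-style progress events, in playback order, with their thresholds.
const MILESTONES = [
  ["start", 0],
  ["firstQuartile", 0.25],
  ["midpoint", 0.5],
  ["thirdQuartile", 0.75],
  ["complete", 1],
];

// Returns a function to call with the current playback position (seconds).
// Each milestone is passed to `report` at most once.
function createProgressTracker(durationSeconds, report) {
  const fired = new Set();
  return function onTimeUpdate(currentSeconds) {
    const fraction = currentSeconds / durationSeconds;
    for (const [name, threshold] of MILESTONES) {
      if (fraction >= threshold && !fired.has(name)) {
        fired.add(name);
        report(name);
      }
    }
  };
}
```

In a browser the returned function would be wired to the video element's timeupdate event, and report would send a beacon. That beacon is exactly the client-side piece an ad blocker can remove.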


If all scripts appear to come from the website (advertiser scripts routed via the website with random filenames), it'll probably be pretty hard to filter that kind of behavior without breaking other websites.


Sure! You still need client-side JavaScript, though.


Threat models:

* The ad server outright telling lies to get paid for nothing.

* The ad server not being trusted to validate that views are legitimate.

Client-side JavaScript can probe the DOM and execution environment for abnormalities indicative of automation.
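
A minimal sketch of that kind of probe, assuming only two signals: navigator.webdriver (set when the browser is driven through WebDriver) and whether visibilityState has been redefined directly on the document object. The function name and signal labels are made up for illustration, and both checks are weak signals that a determined attacker can hide.

```javascript
// Collects weak automation signals from the browser-like objects passed in.
function automationSignals(nav, doc) {
  const signals = [];
  // navigator.webdriver is true when the browser is controlled by WebDriver.
  if (nav.webdriver === true) signals.push("webdriver");
  // visibilityState normally lives on the prototype as a getter; an own
  // property on the document itself suggests it has been shadowed.
  if (Object.prototype.hasOwnProperty.call(doc, "visibilityState")) {
    signals.push("visibilityState-shadowed");
  }
  return signals;
}
```

In a page this would be called as automationSignals(navigator, document). Taking the objects as arguments just keeps the sketch testable outside a browser.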


Yes, but as was said earlier, client-side JavaScript can also be blocked. Everything has downsides.

It was posited that client-side JavaScript was needed, and the reply was that they couldn't see why it was 'needed'. I agree with that: I can see why it might be wished for, but you do not need it to record clicks. You do need it for other things, or to improve understanding of the clicks.

Basically any solution is going to be composed of many pieces, all of those pieces susceptible to attack in different ways.


if client side javascript is blocked, then just don't pay for that impression

But why can't a malicious server also serve up some JS that modifies the behaviour of the JS served by the ad network?


You don't need a malicious server.

You can use Google to do this and you'll get a Google-branded domain name for your object-hijacking JavaScript. The number of times I've seen something like document.visibilityState='visible' in people's ads (or ad wrappers) is astounding.


Isn't document.visibilityState a read-only property?

https://developer.mozilla.org/en-US/docs/Web/API/Document/vi...


No.

It is not.

    Object.defineProperty(document, 'visibilityState', { value: "visible", writable: false })
demonstrates trivially that the documentation is clearly wrong.

Maybe it says it's "read-only" because Google wants bad guys to do this sort of thing, since it makes advertisers buy more ads from them.

Or maybe it's an honest mistake that neither Mozilla, nor Google (nor Microsoft or anyone else it seems) has any idea what "read-only" means.
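
One way to read the disagreement, offered as a sketch rather than a statement about any particular browser: "read-only" in API docs usually means the attribute is a getter with no setter, so plain assignment doesn't work, but that is a different thing from being impossible to redefine. FakeDocument below is a stand-in for a real document object, not browser code.

```javascript
// Stand-in for a browser document: the attribute is a getter on the
// prototype with no setter.
class FakeDocument {
  get visibilityState() {
    return "hidden";
  }
}

// Plain assignment has no effect; in strict mode it throws a TypeError.
function tryAssign(target) {
  "use strict";
  try {
    target.visibilityState = "visible";
    return "assigned";
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

// Defining an own property on the instance shadows the prototype getter.
function shadow(target) {
  Object.defineProperty(target, "visibilityState", { value: "visible", writable: false });
  return target.visibilityState;
}
```

So document.visibilityState='visible' on its own shouldn't change anything, while the defineProperty version does, because it never goes through the missing setter.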


they try to do that sometimes, but it's rather harder to claim innocence for that than other methods

The server could track it based on the fact it delivered the content to the end user with the same unique hash present, no?


You mean like a tracking pixel? The issue here is that all requests to the ad domain can be blocked by the ad blocker.


But for giant sites like NY Times or the like, you can solve this through commercial agreements and rights to audit.


> We are going full circle--that is, back to the beginning--of ad technology.

I guess they will end up serving ads and "content" from the same site, but this site will be controlled directly by the advertisers, not by the publishers.


AMP


>Otherwise, obviously, the server could just maliciously record impressions/clicks.

I'd imagine that advertisers would just bake it into the cost of the impression like they do today for click fraud. In fact, I think this type of fraud is much preferable to click fraud.


That's not what native ads are. They are about following the form and behavior of the surrounding content and can have all the same complex interaction and conversion tracking as any other ad.

Also, video players with embedded ads have been around for years. The technology is already more advanced than this and is just starting to roll out. Despite what you might think, there are plenty of checks against fraud, and any publisher that does such blatant video fraud will get caught very quickly.


clearly we need an AI/ML that will actively scan any video footage and cut out anything that looks like an ad.

then people might actually find out what is and isn't an ad.


It will start cutting out actual origin footage at some point because non-obvious ads are already there from the beginning mixed with primary content in a way they can't be easily separated.


What if there existed a standard protocol to (automatically) interact with websites' CMSs for the buying and selling of ad space via an API?

Maybe even programmatically allowing the upload of images or videos? (Maybe even ADsafe [0] scripts?)

[0] https://github.com/douglascrockford/ADsafe


Honestly, I'd be happy with that - the biggest problem with ads today is that they're a major source of tracking and malware.


The server could continue to serve ads until the client reports back that one ad was rendered. If the message is blocked, the content would never be delivered, thus making ad blockers content blockers. I'm not saying this is a good thing or a bad thing. I'm just saying there are more cats and more mice out there.
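
As a toy model of that gate (GatedSession and its method names are hypothetical, and a real server would also have to decide how far to trust the report, since a client can send it without rendering anything):

```javascript
// Hypothetical per-viewer session: content is withheld until an ad is confirmed.
class GatedSession {
  constructor(ads, content) {
    this.ads = ads;
    this.content = content;
    this.nextAd = 0;
    this.adConfirmed = false;
  }

  // Called when the client's "ad rendered" report reaches the server.
  confirmAdRendered() {
    this.adConfirmed = true;
  }

  // Returns what the server would send for the next request:
  // another ad (cycling through the list) until a report arrives.
  next() {
    if (this.adConfirmed) return { type: "content", body: this.content };
    const ad = this.ads[this.nextAd % this.ads.length];
    this.nextAd += 1;
    return { type: "ad", body: ad };
  }
}
```

If the report is blocked, next() keeps returning ads forever, which is the "ad blocker becomes content blocker" outcome.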


It's entirely possible to stitch a targeted advertisement into a video file. This happens.
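
For segmented video the stitching can happen at the playlist level instead of re-encoding a file. A rough sketch for an HLS media playlist, assuming the ad is already encoded as compatible segments no longer than the playlist's target duration; stitchAd and the segment names are hypothetical, while #EXTINF and #EXT-X-DISCONTINUITY are standard HLS tags.

```javascript
// Inserts ad segments into an HLS media playlist just before the content
// segment at position `breakIndex` (0-based), wrapped in discontinuity tags.
function stitchAd(playlist, adSegments, breakIndex) {
  const out = [];
  let segmentIndex = 0;
  for (const line of playlist.split("\n")) {
    if (line.startsWith("#EXTINF")) {
      if (segmentIndex === breakIndex) {
        out.push("#EXT-X-DISCONTINUITY");
        for (const ad of adSegments) {
          out.push(`#EXTINF:${ad.duration.toFixed(1)},`, ad.uri);
        }
        out.push("#EXT-X-DISCONTINUITY");
      }
      segmentIndex += 1;
    }
    out.push(line);
  }
  return out.join("\n");
}
```

Because the playlist is generated per request, each viewer can get a different adSegments list, and to the player the ad is just more segments from the same host.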


Are ads only sold CPC these days? No CPM ads?


CPM ads are still out there, for brand awareness. But you'll be paying for ads that no one clicks. So most advertisers go for CPC.


I get the distinct impression, in the war of ads vs. consumer, that some people will not be satisfied until they've submarined advertising all the way down to sponsored content and we have to go way out of our way to notice that the Try Guys are always drinking Coca-Cola or something.


There will always be those fringe people who insist on content being valueless even though they consume hours of it.

The greater problem is that the ad industry is too unregulated and greedy which has led to a tragedy of the commons with malware and poor UX everywhere, leading to adblockers installed by many who otherwise wouldn't mind.


If someone puts content out there for free, it is by definition freely available, and I decide 100% which content I want my browser to accept and show, and which content to ignore.

If you want to make sure you get paid for your content, put it behind a paywall. Yes, the number of users will drop, but you can't have your cake and eat it, too.

Otherwise, ask nicely for donations or Patreon support or do old-fashioned sponsored content, obviously with full disclaimers that the content is sponsored, so people can decide whether they want to watch it or not.

Specifically talking about video ads, look at what Glenn Fricker from Spectre Media Group does on his YouTube channel. He often gets demonetized because he tends to swear a lot. So he asks people to "spend a buck, give a fuck" on Patreon, and he does short sponsor midway interludes in his videos. It's always a short clip of himself talking about the product or service in question, and it's always something he uses himself; he won't advertise something he can't vouch for. So you don't get the jarring cuts to some random ad agency's standard BS video that runs on thousands of unrelated videos.

That's how to do it. Part of and related to the channel's content, but also clearly demarcated and made fully clear that it is sponsorship/advertising. And most importantly: No tracking!


> I decide 100% which content I want my browser to accept and show, and which content to ignore.

Devil's Advocate says the people who make your browser reduce 100% to maybe 60%. Browser extensions are the 10%.


Depends on which browser and which add-on loadout you use.

On Firefox, with appropriate small changes and with uBlock Origin, uMatrix and a few other add-ons, you bet I can adequately control the data accepted by my browser.

If you use Chrome, well that's another situation. Don't use Chrome.


>implying there is no tracking when watching a video on a Google owned platform


Use a VPN, use a private window, never log in.

I posted an example of how advertising could be done in a more sensible way, not a ridiculous claim that Google doesn't track everything they possibly can.


I mean, my comment wasn't trying to imply that was what you were suggesting. It was more along the lines of suggesting that in-video organic and sponsored ads by content creators aren't going to stop you from being tracked. Even with a VPN, a private window, and staying logged out, they can still likely fingerprint you, so you'd better use NoScript too! The point is there's no way to not be tracked short of removing the things that facilitate the tracking in the first place.


I think the biggest problem is that ads have given us a culture where people expect everything to be free which makes it really hard to compete with ads.

The malware stuff is a cherry on top, but not the main issue.


I'm hoping for a revival of dumb ads. That large sites such as major newspapers will have ads like they had in the '90s: internal ad sales departments spending long hours making ad contracts with advertisers who will trust and/or audit the traffic.


Likewise. I've always been curious as to just how significant an impact all this behavioural ad-serving actually has on the effectiveness of their marketing budget. We assume that things like AdSense must provide better ROI than simple dumb placement of ads, but I'd love to know if that is actually true over the long term, and if so just how much better it is. For example, it might work well in the short term for certain things, but at the expense of long-term broader brand awareness perhaps.

However I suspect it would be fiendishly difficult to work out the effect over anything other than the immediate short term (i.e. impression to click to sale).

If anyone knows of some good experimental work on this I'd love to read it.


I'd love to read some good experimental work on it as well. Off the top of my head, I can guess that there's some benefit to retargeting ads up to a point -- if you can show an ad sequence over time, there's probably an argument to be made that the spaced repetition is more effective than just showing the same ad over and over again, for certain kinds of ads. Then again, that gets back to ad design in and of itself -- a clever campaign could go viral and/or engage in a way an A/B optimized campaign with a poor concept probably never will. With something that's part art/game theory/design and part data, I'm really curious as to how you could even design a reasonably lossless experiment to accomplish this.


This isn't just about trusting the traffic, it's about being able to optimize which creative is the most effective, what's the optimal threshold of #ads served for a user to convert - and conversely, being able to not overspend on someone who's clearly not interested.

The other part of that is those kinds of ads (dumb ads as you say) are going to be mostly useful for branding. Which is fine when you are Apple or Coke, but if you're a smaller player, you'd rather make sure your ads only get in front of the right eyeballs.


Yeah, what I want would effectively mean that no small site can have ads and no small business will advertise. It would also likely mean the death of more than half the sites and content on the internet, as well as the loss of millions of jobs in the ad industry and at those sites. And I still hope it happens.


Well actually I think it's going to be different.

SMBs will only leverage Google, Facebook and other big platforms because I believe they'll have the ability to keep being relevant given their "walled-garden" model despite GDPR-like legal frameworks.

Every small publisher, aka 99% of them, will likely collapse one way or another.

As counterintuitive as it might seem, while this would hurt their bottom line, Google and FB would benefit the most from this.

I'd rather see indie ad networks with an emphasis on anonymity (no PII) and control (delete profiles) that work with publishers in a fair-trade-like model. But I'm not naive :(


> I'd rather see indie ad networks with an emphasis on anonymity (no PII) and control (delete profiles) that work with publishers in a fair-trade-like model. But I'm not naive :(

That would be great. I think browsers could push this. A simple standard data structure that people volunteer to provide for themselves. If I want "relevant ads" I populate it thoroughly and honestly. If I don't want that, I don't populate it at all, or even ask my browser to randomize it. But the info used for targeting should be info that the browser (i.e. user) provides and not info that the industry scrapes together.

Second: identity. I think step one must be making sure users can't be tracked or fingerprinted. Browser vendors need to make absolutely 100% sure that no font rendering or other fingerprinting can be used.

Obviously point #2 means that point #1 can't be too detailed, or can't be transferred verbatim every time. If you have narrow enough interests you can be fingerprinted.


Right now big ad is a fleet of dreadnaughts. If we can get them down to a level of submarines that is an improvement, as they are easier to kill.


Would that be so bad? What’s wrong with contextual advertising instead of invasive user tracking?


>that some people will not be satisfied until they've submarined advertising all the way down to sponsored content

Ish.

The key problem that made me give in and start blocking is the lack of accountability/responsibility and the fact that one bad player (or hacked player) can affect many sites at once. Too many times I saw drive-by install attempts, camera/mic access attempts, pop-ups/unders, and so forth, on large popular sites via their adverts. imgur.com was one of the worst offenders at the time, and the final straw, but it was far from uncommon elsewhere too. I got tired of the official response being either "yeah, our ad partners had a problem, nothing we could do, it won't happen again, until next time" or simply complete ignorance.

At least with server-side insertion on the site/apps own servers it gives them more control (and forces them to take responsibility). If they serve a malware ridden ad from their own resources then they are responsible, no one else, and had the control to not do it. They are no longer trusting a 3rd party to be safe without having any audit rights to make sure they are. Currently they add JS/iframes/both from the ad provider and have no control over what goes in there. The ad provider is probably a "network" which farms out the content via redirects/other to yet another party, who may include content that they themselves haven't properly checked. There is no practical way to make that safe, so I'm not willing to accept the risk of not blocking the ads. Server-side insertion may be the practical alternative. It doesn't solve all the privacy/tracking issues, of course: the site/app can still collect that data and forward it back to the advertiser. But current ad blocking can't stop that anyway, and SSAI does take a swing at the malware & UI dark-pattern issues: from their PoV by giving them better control of what their site serves, and from our PoV by making the shitty "it was someone else, our server did nothing" excuse even less defensible, so they have to use that control to be better if they want to be trusted.

Of course there is a problem with this that I'm sure the ad industry will jump on if CSAI becomes problematic: MitM SSAI. The ad network provides a CDN which your viewers connect to instead of you directly and that inserts the advert/tracker/malware/other on the way through. It'll cost them more due to bandwidth requirements, but it would work for sites/apps that aren't massively latency sensitive, and would let them try to enforce exclusivity if the method becomes a common one (by refusing to accept or make connections to other MitM SSAI providers they could reduce in-page competition). Though again, to get around SSAI blocking they still have to keep the source close to them, reducing the farming out of responsibility that we currently see.


>At least with server-side insertion on the site/apps own servers it gives them more control (and forces them to take responsibility). If they serve a malware ridden ad from their own resources then they are responsible, no one else, and had the control to not do it. They are no longer trusting a 3rd party to be safe without having any audit rights to make sure they are.

You're greatly overestimating how much publishers care about security. If they're already willing to embed arbitrary scripts from ad networks (which have full access to the page), why wouldn't they go one step further and proxy them from their servers? It's not like it's giving additional access. I also don't buy the "additional responsibility" aspect. At the end of the day, it's still an ad network, and unless they're manually approving each ad, the risk of malware/scams isn't going to change, and if they happen to display such an ad, they can still deflect blame to the ad network.


> they can still deflect blame to the ad network

Agreed. But it is at least far easier to definitively prove that they are the reason the malware was delivered to a given user. It is perhaps a naive hope, but maybe that and the threat of potential bad publicity (or just being more likely to be included in popular "bad host" block-lists) will encourage a little more due diligence.

> You're greatly overestimating how much publishers care about security.

Oh, my expectations are low. I think more that I'm looking for/at things that might force them to care more than they currently do.


Once a revolution starts, it doesn't stop until it eats itself.


The ML algo of Adblock Radio could be used to bypass those video ads. The case of radio ads is a typical example of server-side ad injection.

https://www.adblockradio.com https://github.com/adblockradio/adblockradio

(Disclaimer: I built this)


In the end, websites will just be server-side generated images. :D


And then Adblockers will convert it back to html and strip ads from it.


Please let me know how to convert images to HTML & how adblockers know what part of it is an ad.

That cannot be done reliably without machine learning.


Then they will use ML.

For everyone saying this ruins impression tracking: couldn't an ad network just act as a "CDN" (e.g. client <-- Ad-CDN <-- server)? They'd basically man-in-the-middle the server response and inject the ads wherever in the HTML the server put the ad tags. To the client, it would still be SSAI, but the Ad-CDN could still record impressions.

Not that any server owner should do this (giving over control of your website ultimately to the ad-network? fuck that), but as ad-tech becomes more desperate, shouldn't these types of MITM setups get pushed more?
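
Roughly what the Ad-CDN hop would do, as a sketch. The placeholder convention (an HTML comment like ad-slot:NAME), injectAds, and the shape of the impression log are all made up for illustration, not anything an existing ad network is known to use.

```javascript
// Hypothetical placeholder convention: <!-- ad-slot:NAME --> marks an ad tag.
const AD_SLOT = /<!--\s*ad-slot:([\w-]+)\s*-->/g;

// Replaces each placeholder in the origin's HTML with ad markup and logs
// one impression per filled slot.
function injectAds(html, chooseAd, impressions) {
  return html.replace(AD_SLOT, (match, slot) => {
    const ad = chooseAd(slot);
    impressions.push({ slot, adId: ad.id });
    return ad.markup;
  });
}
```

Note that the "impression" recorded here only means the markup was served, not that anyone saw it, which is the same measurement gap discussed upthread.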


An “Ad-CDN” is exactly what AMP is.


What if Cloudflare were acting as a reverse proxy? That would really complicate things.


This has been talked about as an obvious evolution for ad networks for quite a while. Does anyone with knowledge in this area know why it is taking so long for AdSense/Criteo to come up with a solution like this?


The ad network needs to trust the website to not only deliver the ads reliably but also to report back correct data (impressions, clicks, etc). We are talking about quite a lot of money here.


The way I understand it, SSAI is a video solution because you don't have to worry about impressions (you have the view count), just the click, so it can be done server-side. Google has their own SSAI solution they call DAI. Criteo doesn't do it mainly because they have low video inventory. Correct me if I'm wrong.


Slightly tangential, but does anyone have a good crash course intro guide to ad ops?




