
> I proposed a simple solution to ad fraud on our platform - assume any UA coming from an IP range owned by a cloud provider was a bot

Ad fraud is an adversarial game, and you can't 'solve' it with simple things like this. You'll temporarily cause a lot of pain for the bot operators (yay!) but then they'll adjust and start sending the traffic from botnets (hacked consumer devices). Which is already what they do when trying to defraud advertisers on networks that take fraud seriously (which it sounds like your company didn't).

(Disclosure: I used to work on ads at Google)
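For concreteness, the "cloud IP range" check being discussed can be sketched in a few lines. This is only an illustration: the CIDR blocks below are reserved documentation ranges standing in for a provider's published list, and `CLOUD_RANGES` and `is_cloud_ip` are hypothetical names, not anything from the platform in question.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real cloud ranges.
# A real deployment would load the lists that cloud providers publish.
CLOUD_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_cloud_ip(addr: str) -> bool:
    """Return True if addr falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in CLOUD_RANGES)
```

The check itself is cheap, which is the point of the disagreement here: the filter is easy to build, and the open question is how long it keeps working once the traffic moves to consumer IPs.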




> they'll adjust and start sending the traffic from botnets

I'm not saying using a botnet is "hard", per se, but the difference in difficulty compared to using a hosting provider is significant. If bot operators are being forced to use botnets, I'd say the solution is working very well.


I ran a volunteer botnet that generated legitimate-looking traffic to websites. It's incredibly easy for anyone with a large network of gray and black hats to pull it off.

My botnet was specifically for the optimization of bounce rates, we kept away from any ads of any sort, and only navigated around the internal website through clicking relative links or absolute links with the same domain.

If you wanted lower bounce rates, you had to also run this on your PC, and kick over $50/mo. It was my favorite service I ever wrote even if it did help rank websites that naturally shouldn't have been ranked higher.
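The "relative links or absolute links with the same domain" rule is the same filter any same-site crawler uses. A minimal sketch, assuming the hrefs have already been extracted from the page; `internal_links` is a hypothetical helper name, not the original service's code.

```python
from urllib.parse import urljoin, urlparse


def internal_links(page_url, hrefs):
    """Keep hrefs that resolve to http(s) URLs on the same host as page_url."""
    base_host = urlparse(page_url).hostname
    kept = []
    for href in hrefs:
        # Resolve relative links against the current page.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.hostname == base_host:
            kept.append(absolute)
    return kept
```

Links to other hosts (ad networks included) and non-web schemes such as `mailto:` are dropped by the host and scheme checks.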


This is one of the more interesting comments I have seen in 15 years of reading HN. Care to elaborate any more?


Maybe this deserves a blog post, but I'm lazy. So here is the back story from 10,000 feet.

I worked for a company that basically gave me 80% time. 20% of my time was supporting the products we already launched, the other 80% was experimenting and coming up with new products.

I was blogging kind of regularly back then (once every week or so), on a subdomain without any actual backlinks other than from a few "no-index" and "no-follow" links on social media. So technically my website should not have been in the top page for any search term and I shouldn't have had ANY traffic, but it usually was #1 or #2 for various WordPress and jQuery related searches (I had a couple of jQuery plugins and wrote some php hacks that eventually took down millions of websites), and I got 10k monthly visitors on average.

So I started looking into WHY I was the top result of those queries, and it was because when someone landed on my site with that query, the bounce rate was only 10-30% compared to most other sources being 85-90% bounce. The exact technical meaning of a bounce is lost on me now, but it had to do with how long you stayed on the site without leaving or if you click on other internal links for the site.
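Since the exact definition is fuzzy above, here is one way to pin it down for illustration. Assumption: a session counts as a bounce when the visitor views a single page and leaves before a dwell threshold. The 30-second threshold, the `Session` type, and the function names are made up for the sketch and are not Google's definition.

```python
from dataclasses import dataclass


@dataclass
class Session:
    pages_viewed: int
    seconds_on_site: float


def is_bounce(session, min_dwell=30.0):
    """Assumed rule: one page viewed AND a short stay counts as a bounce."""
    return session.pages_viewed <= 1 and session.seconds_on_site < min_dwell


def bounce_rate(sessions, min_dwell=30.0):
    """Fraction of sessions that bounced; 0.0 for an empty list."""
    if not sessions:
        return 0.0
    bounces = sum(is_bounce(s, min_dwell) for s in sessions)
    return bounces / len(sessions)
```

Under this rule, a visit that clicks through internal links or lingers for a few minutes does not count as a bounce, which matches the description above.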

So I proposed to the owner of the company, I would create a "Click Faker". It would go to google on your local PC, it would then search for a term you wanted to rank, then it would navigate the top 10 pages, if it found you, it would click your link, then spend 2-5 minutes navigating around your site before closing the window.

I first tried this with Selenium (or an equivalent back then), and Google blocked it almost immediately. So then I hacked together a headless version of chromium, with some standard but randomly generated user agent, and eventually expanded it to IE and Firefox as well.

And the marvelous thing is it WORKED! And surprisingly well. BUT you had to get your site in the top 100 results, and be running google AdSense before it would work. It also proved what we had long suspected that google would rank sites with Adsense higher than a website without (probably because of telemetry data they could gather).

The concept worked, we launched the product, and got a few dozen subscribers over the next month, BUT the demand just never materialized, and after 6 months we started seeing diminishing returns as google started captcha-ing our requests, and eventually it was no longer useful and we shuttered it.

Without a large enough network of consumer PCs on consumer internet, it was doomed to fail. The network needed to be around 1000 users before it would work. We even tried giving away free limited accounts (20 visits/day for free if you ran the script, and it stacked, so if you had 3 different PCs on 3 different ISPs - home/work/mom & dads house/etc - you would get 60 visits/day).

Ultimately, I think there wasn't enough education around it, and nothing we did marketing wise really helped.


> long suspected that google would rank sites with Adsense higher than a website without

Google says that they don't do that. I don't believe Google based on personal experience and it's interesting to see that you had some experimental confirmation.


It is definitely hard to compare one site's ranking against another and definitively conclude that AdSense made one rank higher than just Analytics, but in every case AdSense+Analytics sites correlated with a better ranking.


Wouldn't another possibility simply be that sites which invest in those 2 technologies also invest in SEO?


All of the cases were sites that were doing SEO (on- and off-site). There were lots of differences between them, so never an apples to apples, but Analytics + AdSense > AdSense only > Analytics only.

The results were very compelling. Never did a GA-only site improve more than an AS-only or an AS + GA site.


The word volunteer is what caught my attention so much, sounds like you were making a product to sell, so why use the word volunteer?


Because we didn't forcefully take over anyone's computer. You voluntarily signed up and installed the bot. And it was free if you installed the script to use the botnet to crawl your own websites.


> So I proposed to the owner of the company, I would create a "Click Faker". It would go to google on your local PC, it would then search for a term you wanted to rank, then it would navigate the top 10 pages, if it found you, it would click your link, then spend 2-5 minutes navigating around your site before closing the window.

Dumb question: how would google know you spent 2-5 minutes on that domain and navigated around it?


Your site needed 2 or 3 things to work: Ranking in the top 100 results of Google Search, then Google AdSense and/or Analytics for this to fool Google.


If you're hoping to participate in the SEO game, you have to utilize google's javascript analytics code. Their code collects and reports on a variety of user activity. It's safe to assume (based on google published articles [1]) that the data collected is used to influence the site's rankings.

1. https://support.google.com/webmasters/answer/9205520?hl=en


You don't need to install Google Analytics to rank on Google.

The article you cited mentions CRUX data[1], which comes from the Google Chrome browser, not Google Analytics. That data is reported to website users in the form of the Core Web Vitals report, which is a different product than Google's ranking algorithm. Although similar data is probably used as a ranking factor, you can't conclude that from this support documentation.

1. https://developer.chrome.com/docs/crux/


One way they can tell is whether they see you on the search results page again, or if they see you click another result on the search results page.


> If bot operators are being forced to use botnets, I'd say the solution is working very well.

If ad-fraud goes from "We accidentally ran these 'indexing-bots' against some websites, causing some counters to be off. Sorry about that!" to "We deployed our code to run on stolen or hacked machines through botnets paid for in crypto on the dark web", you've moved from legally gray to clearly illegal.

I can see no down-sides with such a move.


I'm not objecting to making the move (definitely do it) but about selling it as a "solution" when it's really beginning a perpetual fight.

People often don't recognize that they are in an adversarial situation, where taking a step that looks like it solves the problem does much less than you expect because other people will later counter your work.


> selling it as a "solution" when it's really beginning a perpetual fight

If your qualifying definition of "solution" is "magic bullet" then there are no solutions. Every solution is a component in the perpetual fight.


Only in adversarial games, and most of engineering isn't adversarial. Enabling compression on your website or designing your UI to make things clear to users give real improvements that don't degrade with time. Other sorts of improvements like optimizing your JS delivery or your server specs decay with time, because people make incidental changes elsewhere, but this is a slow process. Adversarial situations are very different because there's a motivated person on the other end trying to counter what you're doing, and gains are especially short-lived.


I don't really follow any of this comment. Are you saying perfect solutions are possible in non-adversarial situations?


I like to divide solutions into four approximate categories based on what sort of scenario they're applying to:

1. Collaborative situations: your solution works better and better, because people notice and work with you. Ex: designing an icon or coining a word for a new concept; over time more and more people recognize it, use it, etc.

2. Indifferent situations: your solution continues working about the same, because it's not about interaction with others who adapt. Ex: enabling compression on HTML serving, inventing joist hangers, new cancer surgery technique. Most inventions and engineering are in this category.

3. Decay situations: your solution slowly stops working as well, because the world moves on. Ex: payroll software needs to be updated as payroll regulations change.

4. Adversarial situations: your solution quickly stops working well, because others are directly trying to counter your work. Ex: investing strategies, antibiotics, ad fraud, ad fraud detection.

When you're evaluating a solution based on how it seems like it would work in the current world, thinking about how collaborative-vs-adversarial the situation is helps you predict what the full rollout of your solution would look like.


It's nice that you have a framework but you still haven't addressed my (I think fairly simple) question about how any of it applies here.

I realise this probably comes off a little snarky but I've tried following your comments in good faith and it just seems like a very abstract hammer looking for a nail without really reading/listening to the quite literal/simple/not-very-abstract discussion being had here.


We started with EdwardDiego calling blocking cloud IPs "a simple solution to ad fraud" and me replying that because ad fraud is an adversarial situation this wouldn't be nearly as much of a solution as they seemed to think.

Then, in our subthread it seemed to me like you were saying that it being adversarial doesn't matter, and wins are always ephemeral ("every solution is a component in the perpetual fight"). I responded by explaining how this varies by situation, with some where wins compound (cooperative) but that the adversarial nature of ad fraud shortens the lifetime of wins dramatically compared to other domains.


Well it:

-it cuts their profits in half or more because they have to pay for the proxies (or if they own them they can't sell them since they need them)

-it prevents most low-skilled people from doing it

-it prevents them from doing it at unlimited scale: AWS has more than 100 million IPs, it's rare to see a grey-market proxy provider with more than a few million clean IPs, and those usually cost something like 40 cents per IP, whereas it can be FREE on AWS

Add basic protection against headless browsers, behavioral analysis, etc., and now 99% of the people who can fool you are already making 6 figures in legitimate jobs and won't risk 10 years of jail to earn just a bit more money.


When you’re competitive it can be enough to be just a bit harder to crack than the competition. The bots may just choose another advertising platform as the target.


I'd argue it's prosocial to mislead ad spenders into thinking they are reaching more humans than they actually are.


Each layer of difficulty reduces the number of adversaries you have to deal with. Check your spam folders to see how many are incredibly basic and easily caught with the simplest tools.


Moreover, fraudsters are generally lazy. If fraud that works against somebody else doesn't work for you, the fraudster won't go after you until they've saturated all the “somebody elses”, which may be a long time. I used to work in fraud detection in adtech, there was a lot of low-hanging fruit.


Whether the solution is working well isn't about whether you have caused work for fraudsters (though I'm all for that) but about whether you are actually preventing fake traffic.


Difficulty will always have a direct impact on scale: if you've significantly increased the difficulty for fraudsters that's going to have a knock-on effect on the amount of fraudulent traffic you receive.

There's no such thing as zero; a successful measure is one that achieves significant reduction.


Definitely! I was taking issue with your counting "forced to use botnets" as a success, but it sounds like you're actually saying it's just a decent proxy for success, because it is hard enough that you expect this to massively cut down on fraud?

(I think of botnets as not actually that hard a step for fraudsters, and fraudsters as being very determined, but it depends a lot on how much money people can make with fraud against your particular situation)


One AWS device grinding ads is cheap, if you have a full-fledged widespread bot net you may very well make more money selling it as DDOS service.


It’s not. You have several large proxy providers that use millions of consumers devices to proxy your traffic.

Ever wondered how free VPN services make their money? Lots of them use a portion of your traffic to proxy these types of requests.


Showing the ad to identified bots, but not counting it, might be a solution.


This is like burglars. You don't need to scare off all the burglars, you just need to make your house slightly more inconvenient to rob than your neighbor's.


So obviously it is better to do nothing since it benefits us??


Of course not! My parent's company is completely in the wrong, and running an online advertising business with a "pretend they're not there" approach to bots is a terrible idea. I'm just saying it's nowhere near as easy as my parent seems to expect.


> I'm just saying it's nowhere near as easy as my parent seems to expect.

Apple could eliminate most ad fraud that pretends to be its platforms, by generating tokens from its secure enclave for advertisers, and providing a REST API to validate the token is from a live device.

And of course, as you are in this industry, you know that Apple devices are the highest quality, valuable, converting clicks in the ecosystem.

Then again, PAT would diminish inventory, prices would rise, and people would get a better, but nonetheless similar, ROI to what they get today.

It is intellectually dishonest to make predictions about ROI. Nobody really knows. Bot farms, fraud, those are all red herrings. Most advertising has shitty creatives.


> Apple could eliminate most ad fraud that pretends to be its platforms, by generating tokens from its secure enclave for advertisers, and providing a REST API to validate the token is from a live device.

What stops someone from buying an Apple device and generating a zillion tokens?


Rate limiting access to the enclave? Somewhat related, I fear this is where we are going to end up with secure attestation, limiting web access to approved devices.


I wonder if there are ways to trick the enclave into thinking time is passing faster...

Pegging the rate limiter to 100% on an old iPhone is also still going to give you a lot of tokens very cheaply.


> I wonder if there are ways to trick the enclave into thinking time is passing faster...

Most serious implementations have their own clocks or incrementing counters, which makes tricking them very hard even for a state actor.
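To make the counter idea concrete, here is a toy sketch of a token scheme that does not depend on wall-clock time. Everything in it is an assumption for illustration: `Enclave` and `Verifier` are hypothetical classes, an HMAC with a shared key stands in for a hardware-backed signature, and none of this reflects Apple's actual attestation or Private Access Token APIs.

```python
import hashlib
import hmac


class Enclave:
    """Toy stand-in for a secure enclave: a secret key plus a monotonic counter."""

    def __init__(self, device_id, key, max_tokens):
        self.device_id = device_id
        self._key = key
        self._counter = 0
        self._max_tokens = max_tokens  # assumed fixed budget, independent of time

    def issue(self):
        if self._counter >= self._max_tokens:
            raise RuntimeError("token budget exhausted")
        self._counter += 1
        msg = f"{self.device_id}:{self._counter}".encode()
        tag = hmac.new(self._key, msg, hashlib.sha256).hexdigest()
        return (self.device_id, self._counter, tag)


class Verifier:
    """Checks the tag and rejects any counter value it has already accepted."""

    def __init__(self, keys):
        self._keys = keys        # device_id -> key (assumed pre-provisioned)
        self._last_seen = {}     # device_id -> highest counter accepted

    def validate(self, token):
        device_id, counter, tag = token
        key = self._keys.get(device_id)
        if key is None:
            return False
        msg = f"{device_id}:{counter}".encode()
        expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, tag):
            return False
        if counter <= self._last_seen.get(device_id, 0):
            return False  # replayed or out-of-order token
        self._last_seen[device_id] = counter
        return True
```

Because the budget is tied to a counter rather than a clock, speeding up time gains nothing. The point made above still holds, though: a cheap device that spends its full budget yields valid tokens.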


But what if you (the party who bans the cloud provider IP ranges) don't have a majority share of the market, and your bigger competitors don't block these ranges?

Then it would be easier for bot owners to just move on to your competitors, and you would have higher efficiency than them.


What if the customers you want to attract use a cloud provider?


That's fair, it wouldn't have prevented all fraud, but it would've excised a decent segment of obvious fraud, and tbh, we weren't Google, so the transition from t2.micros to bot nets may have proven economically unviable for the cut-rate scammers.

All "well, actually" aside, my point was, and remains... ...we could've taken some action against obvious fraud, but we didn't, because the business team didn't want their numbers to go down.


Anything that makes things more expensive for the fraudsters than the defender is a win.


That might be a net benefit for the web if it’s more profitable to rent out botnets for ad fraud than sell them for DDoS-on-demand.


What he said, same disclaimer


There are some big logical fallacies in this line of thinking. Better to say that Google is totally fine with bot traffic. That's what you said, after all.



