
Cleaning up bad bots - m1guelpf
https://blog.cloudflare.com/cleaning-up-bad-bots/?ref=twitter
======
thegeomaster
Great! Let's add it to the list of obstacles to accessing content for us
savages in small, poor, out-of-sight-out-of-mind countries:

\- Harassment with an extra verification step every _god damn_ time you log
into a website you've been a paying customer at for years because some
rando risk model thinks you're a bad guy

\- Harassment by reCAPTCHA to respond to US-centric image challenges when it
decides it's up for some sadistic fun (Mark all images with a store front or
street light? Sure, let me just Google how they are "supposed" to look,
because I sure as hell ain't seeing any I'm familiar with here.)

\- Blanket IP range bans which serve you a default 503 error page and call it
a day, with ripple effects throughout tons of unrelated websites. I always know
when my ISP's IP range is added to some new blacklist.

\- Harassment by Cloudflare "verifying my browser", presumably burning my CPU
cycles so they can be sure that my browser, which has been hitting their IP
ranges for years without so much as a cookie wipe, has not suddenly turned
into a bot

\- "Your name is invalid": no, your regex is

\- "Our fetishism for credit prevents us from accepting your debit card
payment unless you submit a scan of your passport signed in blood"

\- (NEW) Cloudflare's anti-bot measures which will surely not misfire because
no one thought to test their shiny new model on the traffic patterns of some
culture with < 50 million Internet users

Apologies for the off-topic rant, but you wouldn't believe how using the
Internet for basic things has gotten difficult in the past few years over here
(and from what I hear, in many other "forgotten" geographies). No one cares.
Long live colonialism!

~~~
Meekro
I'm sorry you have to put up with all of that, and I don't want to be part of
the problem for folks like you. But here's the other side of the coin: I ran a
small VPS hosting company for many years, and the "small, poor, out-of-sight-
out-of-mind countries" are a huge source of fraud. Someone signing up from an
IP in Latvia, for example, was _virtually guaranteed_ to be paying with a
stolen card that would later get disputed. Debit card payments can get
disputed, too, and disputes can happen as much as 120 days after the initial
charge.

For comparison purposes, the dispute rate for US-origin payments (US card, US
origin IP, no evidence of proxy use) was under 1%.

If I couldn't do things like "submit a scan of your passport", I would
probably just decline any cards from high-risk countries.

What would you suggest I do instead?

~~~
thegeomaster
I don't think you as an individual can do a lot, and I don't blame you as an
individual. In your position I would've probably done the same.

The situation with poorer countries is completely clear. As people have fewer
and fewer honest options for providing a dignified living for themselves and
their family, their moral principles give way and they turn to crime. Maslow's
hierarchy of needs and all that. In my opinion, the blame for a lot of these
situations lies with the aggressive, exploitative treatment and bullying of
"small guys" by "big guys" on the global scene. But that's a separate
political discussion, and a long one at that.

One thing that could help, but I doubt it's under your direct control, is
taking advantage of systems such as 3-D Secure, which is SMS-based OTP for
card payments. My card has this and I'm always happy to verify my payment this
way, but payment processors are free to ignore these capabilities and just
charge the card. (Conspiracy theory: this is because it hurts the conversion
rates, as it adds another step to the funnel.) I think that a payment verified
via 3-D Secure would have a much lower risk associated with it.

But overall, I don't think I can offer any suggestions on how to make the
system better. I ranted due to a perceived injustice, but that didn't mean I
have ways ready to make the situation better. I think that change must come
from the powers that be - and they're unlikely to suddenly start giving a
shit, so there goes that.

In any case, I thank you for considering my perspective and offering to help
move the status quo. Rest assured, my friend, you are not part of the problem.

~~~
im3w1l
> But overall, I don't think I can offer any suggestions on how to make the
> system better. I ranted due to a perceived injustice, but that didn't mean I
> have ways ready to make the situation better. I think that change must come
> from the powers that be - and they're unlikely to suddenly start giving a
> shit, so there goes that.

> In any case, I thank you for considering my perspective and offering to help
> move the status quo. Rest assured, my friend, you are not part of the
> problem.

I disagree with basically all of this. A lot of people here are part of the
problem. They _are_ part of "the powers that be". Not that they want to keep
anyone down or anything, it's just that they have businesses to run and will
make reasonable accommodations but no more.

Since I'm sure everyone here wants the problem to be solved, if it can be done
easily enough, there is a good hope of solving it with some smart
technologies.

3-D Secure and looking at cookie reputation are both good ideas. If they are
not used as much as you think they should be, we gotta ask why that is. Is it
because it's too hard to implement? People haven't heard of it? Can a library
be made that makes it easy? Can it be pushed on people by making it the
default, or by including it in code samples?

------
comex
> Our bot detection breaks down into four large components:

> \- Identification of well known legitimate bots;

What about non-well-known legitimate bots? If I run my own web crawler, am I
at risk of falling into the tarpit (and having my IP address reported)?

~~~
avip
What makes your crawler legitimate?

~~~
Crosseye_Jack
What makes Google's legitimate over OP's?

A big part of the net neutrality fight was "allowing the little guy a chance
to play with the big boys".

Part of the service says:- for Bandwidth Alliance partners, we’re going to
hand the IP of the bot to the partner and get the bot kicked offline... If the
infrastructure provider hosting the bot is part of the Bandwidth Alliance,
we’ll share the bot’s IP address so they can shutdown the bot completely. The
Bandwidth Alliance allows us to reduce transit costs with partners and, with
this launch, also helps us work together with them to make the Internet safer
for legitimate users.

My reading of that is if CF decides your IP is bad, they can leverage the
provider's Bandwidth Alliance status to shut down the provider's customer. What
if CF's systems misfire? Will there be a grace period? Will there be an
appeals process? Will CF compensate anyone who is affected and has their
hosting withdrawn because of a misfire?

No one likes bad bots, but I'm feeling more and more uneasy letting CF decide
which is which.

~~~
avip
>What makes Google's legitimate over OP's?

Respecting robots.txt, using a well-known UA, coming from a publicly declared,
Google-owned IP block.
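
As a rough illustration of the first of those checks, a crawler can test each URL against robots.txt using Python's standard `urllib.robotparser`. This is only a minimal sketch: the robots.txt content, the `ExampleCrawler` user agent, and the example.com URLs are all made up for the example, and nothing here reflects how Cloudflare or Google actually classify bots.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical user agent for a small, self-identifying crawler.
USER_AGENT = "ExampleCrawler/0.1 (+https://example.com/bot-info)"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if robots.txt allows our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


parser = build_parser(ROBOTS_TXT)
allowed = may_fetch(parser, "https://example.com/index.html")
blocked = may_fetch(parser, "https://example.com/private/data")
delay = parser.crawl_delay(USER_AGENT)  # seconds to wait between requests, if declared
```

Obeying robots.txt only covers the first item; a recognizable UA and a declared IP range are things a crawler operator has to publish, not something code can prove on its own.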

~~~
AznHisoka
"well-known ua"

So if you're a small business that wants to build the next Google, you can't?
Because obviously, your User Agent won't be well known when you start.

~~~
cft
I think an IPO-empowered Matthew Prince, with his benevolent extrajudicial
internet court, is becoming a real threat to the open web.

~~~
AznHisoka
I don't have a problem with Cloudflare's reach and power, but would love for
them to be transparent as to what they consider a bad bot. What are the rules?
Follow robots.txt, scrape just X pages a minute, what else?

~~~
Crosseye_Jack
The problem I see is they won't explicitly say what would trigger being
flagged as bad (nor what would be considered allowable) as they would claim
(as almost everyone in the anti cheat field does) it would give too much info
to the bad guys to avoid detection.

I would love for them to prove me wrong though and be open about such things.

~~~
danShumway
> it would give too much info to the bad guys to avoid detection.

I'm pretty sympathetic to this line of thought, so don't take this as 'you're
wrong', but I notice that we would never apply this logic to laws, or
environmental standards, or contracts.

You'd never hear someone say, "if we have a clearly defined tax code, that
will just make it easier for people to find loopholes."

Where I do hear this argument come up is explicitly in contexts of moderation
and abuse policies. And it's something that sounds very reasonable, but it's
hard for me to get away from the fact that in most other contexts it sounds
problematic to me.

Maybe part of the problem is Cloudflare's scale? Maybe the reason it feels bad
to have a police officer pull me over because, "we think you're going too
fast" instead of "you went over a posted limit", is because that's a critical
infrastructure that I can't avoid.

Cloudflare is a private company, and even if it wasn't, I don't know if it
would be a big enough presence that I would worry about their policies. But when
I hear things like this:

> Our goal is nothing short of making it no longer viable to run a malicious
> bot on the Internet. And we think, with our scale, we can do exactly that.

That maybe shifts the situation a tiny bit farther away from "moderation
policy on a personal blog" towards "policeman pulling me over because I broke
a law I didn't know existed."

I dunno. I'm not 100% sure how to feel about it. I do think that Cloudflare
should be able to filter traffic however they see fit, but that doesn't mean
that every strategy they choose is inherently good, or that it might not be
problematic for them to lack transparency about their standards.

------
ikeboy
>Another trend we have seen is the increase of the combination of bots with
botnets, particularly in the world of inventory hoarding bots. The motivation
and willingness to spend for these bot operators is quite high.

>The targets are goods of generally of limited supply and high in demand and
in value. Think sneakers, concert tickets, airline seats, and popular short
run Broadway musicals. Bot operators who are able to purchase those items at
retail can charge massive premiums in aftermarket sales. When the operator
identifies a target site, such as an ecommerce retailer, and a specific item,
such as a new pair of sneakers going on sale, they can purchase time on the
new Residential Proxy as a Service market to gain access to end user machines
and (relatively) clean IPs from which to launch their attack.

They then go on to spout some economic nonsense about how such bots are
harmful. Actually, resellers make the market more efficient, and Cloudflare is
doing a disservice by lumping legitimate bots in with malicious ones like
their credential stuffing example.

~~~
deadbunny
How do bots make the market more efficient?

~~~
ikeboy
By moving the goods to those willing to pay more for them, which increases
consumer surplus, which means the market is more efficient. It's fairly basic
economics.

~~~
comex
Increasing the price should reduce consumer surplus, not increase it.

If the manufacturer of an item increases its price and thus their profit from
it, then _in principle_ the increased profit can drive them to manufacture
more of the item, avoiding deadweight loss due to insufficient supply.
However, that doesn’t apply if a reseller is the one making the profits! And
in any case, some of the concrete examples cited (tickets to concerts and
Broadway musicals) have an essentially fixed supply; it’s hard or impossible
to simply increase production.

Separately, moving goods toward those willing to pay more for them is to some
extent equivalent to moving them toward those who _care more about them_ ,
which could be said to improve efficiency in terms of how much personal
happiness is gained per unit. However, it also moves the goods toward those
who are simply wealthier. And there are other possible proxies for personal
interest besides price, ones that aren’t as affected by wealth. One of them is
a person’s willingness to research in advance when certain tickets will go on
sale, and be online at that time to snag them before they sell out – in other
words, the system that bots are used to subvert. This system has its own
biases (favoring those who are organized, have more flexible schedules, or are
simply in a more favorable timezone), but it‘s still considered to be
reasonably fair, at least when it’s not being subverted.

~~~
ikeboy
Increasing price only reduces consumer surplus when it's the same consumers
who buy. This is not the case when there's a shortage of tickets at the
original sale price, which is the only scenario where scalping is profitable.

Scalping unambiguously increases Kaldor-Hicks efficiency under a simple model.
You can argue that efficiency isn't the best way to be fair, and it's fairer
to someone to not have a ticket due to random luck rather than being priced
out, perhaps. But scalping increases efficiency, which is all I claimed.

~~~
comex
> Increasing price only reduces consumer surplus when it's the same consumers
> who buy. This is not the case when there's a shortage of tickets at the
> original sale price, which is the only scenario where scalping is
> profitable.

I don’t see how that follows. If a consumer can’t buy a ticket from the
original seller due to a shortage, there’s no guarantee that the original sale
price was at (or even near) the maximum price that that particular consumer
was willing to pay. Therefore, that consumer might also be willing to buy from
a scalper. It would be in the opposite case, if there wasn’t a shortage, that
someone who doesn’t buy could be assumed to be unwilling to pay the price.

In practice, I’d expect a decent fraction of those consumers to be near their
maximum price, since if not, we could conclude that the original seller
irrationally set their price far too low. (But the original price must have
been set _somewhat_ too low if a significant shortage exists.) Still,
depending on the markup charged by the scalper, I wouldn’t be surprised if a
significant fraction of consumers actually would be willing to pay it.

As a practical example, according to an article from last year about Hamilton
tickets [1], among seats originally sold for $69-$179, a few were on sale for
>$10,000 each, but the average resale price was $412. $10,000 will obviously
price out most consumers, but going from ~$100 to ~$400 might not, considering
the extremely high desirability of Hamilton seats.

> Scalping unambiguously increases Kaldor-Hicks efficiency under a simple
> model. You can argue that efficiency isn't the best way to be fair, and it's
> fairer to someone to not have a ticket due to random luck rather than being
> priced out, perhaps. But scalping increases efficiency, which is all I
> claimed.

If I understand correctly, it increases efficiency if the scalper could have
hypothetically purchased their ticket from someone who was deprived of the
opportunity to buy from the original seller, instead of from the original
seller directly, and still made a profit by reselling it.

But that doesn’t seem true. There’s no reason to think that person who “won
the lottery” would be willing to sell the ticket for the same price they
bought it for. Instead they would likely sell it for a similar price to what
the scalper wants to charge. At best the scalper can gain some value on the
margin by taking on the risk that the ticket won’t sell, but not much value.

[1] [https://www.seattletimes.com/entertainment/theater/how-did-h...](https://www.seattletimes.com/entertainment/theater/how-did-hamilton-tickets-get-so-expensive-and-what-does-that-mean-for-future-big-events/)

~~~
ikeboy
>But that doesn’t seem true. There’s no reason to think that person who “won
the lottery” would be willing to sell the ticket for the same price they
bought it for. Instead they would likely sell it for a similar price to what
the scalper wants to charge. At best the scalper can gain some value on the
margin by taking on the risk that the ticket won’t sell, but not much value.

As long as the ticket moves to someone with a higher willingness-to-pay,
Kaldor-Hicks efficiency is increased.

Since the ticket holder without scalpers isn't the same as the one with
scalpers, they have a lower willingness-to-pay than the price the scalper
sells at. Therefore, they'd be willing to sell to the actual ticket holder at
that price.

The only difference between this scenario and the one with a scalper is some
money that the scalper has instead of the original ticket holder, and
transfers don't factor into Kaldor-Hicks so this is irrelevant.
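
To make the arithmetic concrete, here is a toy version of the "simple model" in Python. Every name and number in it is invented for illustration (four hypothetical buyers, two tickets, a $100 face price, a $400 resale price), and it deliberately ignores income effects and the risk of unsold tickets, so it shows only what the simple model claims, not what happens in a real ticket market.

```python
# Hypothetical willingness-to-pay (WTP) in dollars; all numbers are invented.
wtp = {"ana": 150, "ben": 450, "cal": 120, "dee": 600}

FACE_PRICE = 100    # hypothetical original sale price
RESALE_PRICE = 400  # hypothetical scalper price
TICKETS = 2


def total_value(holders):
    """Sum of WTP across ticket holders. Payments are transfers between
    parties, so they cancel out and are not counted here."""
    return sum(wtp[name] for name in holders)


# Without resale: everyone values the ticket above face price, so who gets
# one is luck. Suppose the two lowest-WTP buyers happened to get through first.
lottery_holders = ["ana", "cal"]

# With resale at RESALE_PRICE: only buyers whose WTP covers that price buy.
resale_holders = sorted(
    (name for name in wtp if wtp[name] >= RESALE_PRICE),
    key=wtp.get,
    reverse=True,
)[:TICKETS]

# Change in total value when tickets move to the higher-WTP holders.
gain = total_value(resale_holders) - total_value(lottery_holders)

# Money that changes hands per ticket; it moves wealth but is not part of `gain`.
transfer_per_ticket = RESALE_PRICE - FACE_PRICE
```

The lottery outcome above is chosen as the worst case for the no-resale world; if the lottery had happened to pick the two highest-WTP buyers, the gain would be zero.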

~~~
comex
> As long as the ticket moves to someone with a higher willingness-to-pay,

Which, as I said before, is only some of the time, but even under that
assumption...

> Since the ticket holder without scalpers isn't the same as the one with
> scalpers, they have a lower willingness-to-pay than the price the scalper
> sells at. Therefore, they'd be willing to sell to the actual ticket holder
> at that price.

Then why don't they? Anyone who buys a ticket intending to go to the play is
free to change their mind and resell it. Modern online services make this
quite easy. But most people don't resell their tickets.

There are a variety of possible explanations. One is to invoke arguably
irrational (but still real) behavior like loss aversion. Another has to do
with the inconvenience of rescheduling or the risk of not selling, but that
doesn't apply to the Kaldor-Hicks calculation since being deprived of a ticket
(as opposed to selling it) doesn't create that inconvenience or risk. But here
is one possible situation where even an economically rational actor would
prefer to have a ticket than to be awarded a market price:

The value of the ticket to them may be _higher_ than the scalpers' market
price. This can be true even if the holder was not originally willing to pay
that much, because the holder effectively gained net worth as soon as they
"won the lottery" for the ticket, so they have a different amount of spare
money to work with. And if this person took the time to buy tickets as soon as
they were available, they probably care more about the play than the average
person in the secondary market, so they would put a higher value on it
relative to their wealth.

This is balanced by the fact that other people in the secondary market will
likely be wealthier than them. Indeed, in the case of those rare orders-of-
magnitude price hikes, most holders probably would be willing to sell if they
could be sure the sale would actually go through. But most scalping margins
are much more moderate and may not be enough to get the price over the value
to the original holder.

~~~
ikeboy
>Which, as I said before, is only some of the time, but even under that
assumption...

The only scenario where it's not is if the demand curve is perfectly
inelastic. If the price is set at $100, and selling it for $400 instead
doesn't raise the average willingness to pay of buyers, then there has to be
exactly the same demand at $100 as at $400, which is extremely unlikely.

>Then why don't they? Anyone who buys a ticket intending to go to the play is
free to change their mind and resell it. Modern online services make this
quite easy. But most people don't resell their tickets.

We're talking about a hypothetical scenario without scalpers. In that
scenario, it's difficult to impossible to resell tickets.

>But here is one possible situation where even an economically rational actor
would prefer to have a ticket than to be awarded a market price:

Yes, positing an income effect from winning the ticket lottery can produce a
model where this result no longer holds. But income effects that strong are
insanely rare. To reframe this, you're suggesting if they had found $400 on
the street that morning, the rational thing to do given their preferences is
to spend $400 on a ticket, but they shouldn't otherwise. You're depicting
someone so poor that an additional $400 represents a significant wealth
difference, but also where it makes sense to spend $400 on a ticket rather
than on other things. In the real world, I think we'd call someone like that
irrational.

It might be plausible in other contexts, but it doesn't seem reasonable in the
ticket context.

------
mfontani
I've toyed with the idea of a tarpit service for badly behaved crawlers which
just don't get other hints.

I'm glad to see that if my services are behind Cloudflare, I could just turn
something on and let _them_ deal with it.

~~~
dclusin
What does it mean to tarpit someone? Throttle their connection?

~~~
petre
Waste their time with a very slow server. Like OpenBSD’s spamd. Think one
letter per second for text based protocols.
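
A minimal sketch of that idea in Python, assuming a text-based protocol where the reply can be dribbled out a character at a time. The function name `tarpit_stream` and the sample greeting are made up for illustration; this is not how spamd or Cloudflare actually implement it.

```python
import time
from typing import Callable, Iterator


def tarpit_stream(
    message: str,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield the reply one character at a time, pausing before each one.

    `sleep` is injectable so the pacing can be checked without real waiting.
    """
    for ch in message:
        sleep(delay_seconds)
        yield ch.encode("utf-8")


def send_slowly(write: Callable[[bytes], None], message: str) -> None:
    """Push a reply through any writer (e.g. a socket's sendall) at tarpit speed."""
    for chunk in tarpit_stream(message):
        write(chunk)
```

At one character per second, a 30-character greeting ties up the client's connection for about half a minute while costing the server almost nothing.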

~~~
mfontani
Precisely that

------
kylehotchkiss
If any Cloudflare staff happen to see this... is Cloudflare Warp coming soon?
I was pretty excited to give that a try (and it seemed like something I would
be willing to pay for)

~~~
jgrahamc
[https://blog.cloudflare.com/birthday-week-2019/](https://blog.cloudflare.com/birthday-week-2019/)

~~~
cj
Sounds like it'll come on Wednesday:

"The wait is over. Our product and engineering teams have been working round-
the-clock to build a new experience that makes the Internet faster and safer
for everyone. If I were to say anything more, it would surely give it away so
I’ll leave it at that — you’ll just need to tune in to the blog on Wednesday
to find out."

------
acolytic
I'm not very familiar with the concept of tarpitting. How do they get the bot
to run CPU intensive code? By passing in extra Javascript? Can this affect a
bot that doesn't run any JS?

~~~
s09dfhks
Was also curious about this step. I assume they're not going to reveal the
nitty gritty details for fear of botters coding around it, but I am curious as
to how you can "make them use more CPU" while crawling a website

~~~
jsnell
Serve the suspected bots a page with Javascript that computes a proof of work,
submits it, and gets the real page in return.

Steamspy.com seems to trigger one of these basically every time when loaded
with a fresh cookie.
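
For anyone wondering what such a proof of work might look like, here is a generic hashcash-style sketch, written in Python rather than browser JavaScript to keep it short. The challenge string, the `challenge:nonce` format, and the 12-bit difficulty are assumptions for illustration; Cloudflare has not published its actual scheme, and this is not it.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until the hash is 'small' enough.

    Expected cost is roughly 2**difficulty_bits hashes.
    """
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce
```

The asymmetry is the point: the client burns thousands of hashes per page while the server checks the answer with one, so a bot fetching many pages pays far more than a person loading a few. A bot that doesn't run JS simply never gets the real page.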

------
andrerm
This is Cloudflare saying that now that they arbitrated X and everybody said
"OK" they are moving from infrastructure into infra and arbitration.

My question is, what if you are starting an alternative search engine or
something legit?

Edit: my point is, the rules that make a bot, crawler, scraper, etc. legit or
not are not clear at all.

------
tony
Cloudflare, as a service, has it all.

Nice UX, fast, free. Nice domain service if you transfer to them. Fast DNS
management. DOS/bot mitigation. Caching. Quick SSL, and affordable upgrade
options. 2FA via TOTP.

With all the momentum built up around the networking/domain stuff, it would
be nice to be able to spin up servers for apps/db.

Main language at Cloudflare is golang? Rust? Any Python over there?

Request: Allow changing the super administrator for a Cloudflare account more
easily. At least for early-stage accounts.

~~~
mfontani
No U2F yet, which I'm hopeful they'll get real soon

~~~
judge2020
[https://twitter.com/eastdakota/status/1175522433643601920?s=...](https://twitter.com/eastdakota/status/1175522433643601920?s=21)

------
bonerman69
> for Bandwidth Alliance partners, we’re going to hand the IP of the bot to
> the partner and get the bot kicked offline;

What's that mean, 'kicked offline'?

Isn't scraping 'legal'?

~~~
jgrahamc
There's a lot more to bots than just scraping... DDoS bots, bots that buy hot
sneakers before the public gets a chance and drive up the price, bots that go
credential stuffing, bots that play nasty tricks with airline seats ...

~~~
ikeboy
Only two of those are malicious.

Buying products at the price offered is perfectly legitimate, regardless of
all the scaremongering to the contrary.

~~~
eli
Not sure what "legitimate" means here? Legal? Running aggressive web crawlers
is in many instances against the rules for consumer cloud servers. For
example, AWS requires that you obey robots.txt if you run a crawler there.
[https://aws.amazon.com/premiumsupport/knowledge-center/repor...](https://aws.amazon.com/premiumsupport/knowledge-center/report-aws-resource-crawling/)

In my experience a lot of bots seem to be running on hacked servers or through
hacked/insecure proxies. I'd imagine tracking down the owner or someone
upstream of those boxes could be effective in taking them offline.

~~~
ikeboy
What does that have to do with my point? Bots used to purchase inventory (and
that aren't otherwise committing fraud by using stolen credit cards or
something) are not malicious.

~~~
coldpie
> Bots used to purchase inventory are not malicious.

There's no way you're in this conversation without being aware that scalping
is a controversial practice at best.

[https://theconversation.com/the-economics-of-ticket-scalping...](https://theconversation.com/the-economics-of-ticket-scalping-83434)

[https://en.wikipedia.org/wiki/Ticket_scalping](https://en.wikipedia.org/wiki/Ticket_scalping)

~~~
ikeboy
I'm well aware that many economically illiterate people like to scaremonger
about scalping.

That doesn't make them right.

~~~
coldpie
Or perhaps they are interested in optimizing the process for something
different than what you are optimizing for.

~~~
ikeboy
Maybe I'd believe that if they stopped saying things like scalping harms the
producer, and acknowledged that they want a less efficient market.

~~~
JakeTheAndroid
If a single bot controller can buy up an entire stock of limited items
legitimately, that is malicious as that company is no longer able to meet the
needs of their consumers. That's bad for the company.

~~~
ikeboy
If it's profitable for anyone to resell, that implies the company priced below
the market price and there would be a shortage without scalpers. So the
company is unable to meet the needs of its customers in any event. Scalpers
just make it somewhat more efficient.

