
When does a human cross the threshold to become a bot? - bwagy
https://medium.com/@bwagy/anatomy-of-bots-a-real-life-use-case-b381d0d63e16
======
luckylion
> We measure up to 48 data points when people read or watch content, to
> understand how it is consumed and provide insight from that.

I wonder what data points those are, and whether that's GDPR compliant. I
assume they have a pretty solid fingerprint with that much data.

It's a valid point, however. Incentivized traffic has always been full of
bots, unless you add tough verification steps (aka captcha, answer questions
about the content etc) at which point you'll have to pay a lot more for people
to jump through your hoops.

~~~
bwagy
Fully GDPR compliant - we're looking at behavior rather than personal data.
We've been on the front foot there, which is also why we are advocates of
content: a win/win without the need for intrusive data usage. More on our
site...

And that is the messy part: separating the good from the bad - and it's an
issue worth solving. That's the wider point: this type, or framework, of
problem is going to become more and more prominent.

~~~
luckylion
Thanks for the reply! Isn't enough data about behavior (scrolling speed,
typing speed, mouse movements etc.) sufficient to identify an individual,
especially when it's combined with something like browser version? I'd assume
those don't change much across sites, i.e. Alice behaves similarly when
reading reviews for a new phone or reading the local news.

Height, weight, hair color and year of birth aren't PII, but if you combine
them, you're getting pretty close to identifying individuals; it's a
behavioral fingerprint - and though I have no idea if it holds true in
general, I'd assume there's even strong transfer across devices (a power user
scrolls fast on both mobile and desktop).
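The combination argument can be sketched in a few lines: each trait alone is weak, but hashing several together yields a stable key. The trait names and values below are hypothetical, just to illustrate that the same habits on two different sites produce the same fingerprint.

```python
import hashlib

def fingerprint(traits: dict) -> str:
    """Hash a sorted set of quasi-identifiers into one combined key."""
    canonical = "|".join(f"{k}={traits[k]}" for k in sorted(traits))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Hypothetical bucketed behavioral traits plus a browser version -- each
# weak on its own, but together they narrow the anonymity set quickly.
alice_phone_reviews = {
    "browser": "Firefox/119.0",
    "scroll_px_per_s": 820,   # bucketed scrolling speed
    "typing_cps": 7,          # characters per second, bucketed
    "mouse_idle_ratio": 0.3,  # bucketed share of idle pointer time
}
alice_local_news = dict(alice_phone_reviews)  # same habits, different site

# Same behavior on two sites -> same combined key.
assert fingerprint(alice_phone_reviews) == fingerprint(alice_local_news)
```

With, say, four traits of ten buckets each, the combined space is 10^4 possible keys, which is the sense in which non-PII traits stack into something close to an identifier.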

Is there a way to really solve it? It seems to me that all you can actually
do is up the ante and make it harder, but with full browser control, it'll be
hard to lock them out. And if I'm looking at the behavior of users on my own
site, I can probably build pretty good replay users that visit the site
you're protecting. I remember the "attack" on IRC channels back in the day
where you'd join a bunch of bot users that would then hold a conversation
replayed from another channel (possibly in a different language), so timing,
interaction etc. looked very real (though they might seem a bit rude for not
reacting to other people).
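The replay idea above - record real users once, then re-emit their events with the original pacing - can be sketched as below. The recorded session and event names are made up; the point is only that preserving relative timing is what makes replayed traffic look human.

```python
import time

# Hypothetical recorded session: (seconds_since_start, event) pairs
# captured from a real user on the attacker's own site.
recorded = [
    (0.0, "pageview /article"),
    (1.4, "scroll 600"),
    (3.1, "scroll 1200"),
    (7.8, "click #next"),
]

def replay(events, emit, speedup=1.0):
    """Re-emit recorded events, preserving their relative timing."""
    prev = events[0][0]
    for t, ev in events:
        time.sleep((t - prev) / speedup)  # keep the human-looking gaps
        emit(ev)
        prev = t

out = []
replay(recorded, out.append, speedup=1000.0)  # sped up for the demo
```

A detector looking only at event order and inter-event delays would see nothing unusual here, which is why timing alone is a weak signal.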

~~~
bwagy
In some rare cases it might be, i.e. if you had an obscure browser or browser
version. But you wouldn't be able to link scroll behavior across sessions, so
in general I'd say no. We've also architected the system such that you can't
do that: clients get end-processed data showing how people have consumed
content. Further, we utilize things like first-party cookies tied to the
domain.

We put it through the lens of, what helps clients understand the value of
content and if people are actually enjoying it.

In terms of solutions, there are a few - profiling real users and
benchmarking the outputs against real actions (i.e. people reading content
tend to do this afterwards) help a lot. As does measuring in the first
place - a surprising number of people still don't measure adequately.
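One way to read the benchmarking point is: compare each traffic source's downstream action rate against a baseline from known-human traffic, and flag the outliers. The source names, numbers, and threshold below are all assumptions for illustration, not anything from the post.

```python
# Hypothetical benchmark check: incentivized/bot traffic reads pages but
# rarely performs the downstream actions real readers do afterwards.
BASELINE_RATE = 0.04  # assumed action rate for known-human traffic

def suspicious(sources, threshold=0.5):
    """Flag sources whose action rate is under threshold * baseline."""
    return [
        name
        for name, (reads, actions) in sources.items()
        if reads and actions / reads < threshold * BASELINE_RATE
    ]

traffic = {
    "newsletter": (1000, 45),    # 4.5% action rate -> looks human
    "cheap_network": (5000, 10), # 0.2% action rate -> likely bots
}

assert suspicious(traffic) == ["cheap_network"]
```

This only works if you measure in the first place, which matches the observation that many buyers don't.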

A soft solution is simplifying the supply chain, keeping it transparent.
Accountability drops the further you are from the end customer.

Something I maybe missed in this post is the cost of this. This is a real,
tangible cost, and over time we have seen cases in the millions of dollars.
That's money robbed from creating a free and open internet. Advertising
provides a significant subsidy for the ecosystem.

------
bifrost
This is a bit weak/clickbaity. It basically is "we got weird traffic", and
TL;DR it's some people with macros and a couple of browser windows... It's
not "bots are posting fake news", it's more like "they're real fake clicks on
ads"...

~~~
bwagy
It's a bit more than that if you give it a full read...

