
Traffic Is Fake, Audience Numbers Are Garbage, Nobody Knows How Many See What - bobajeff
https://www.techdirt.com/articles/20160915/18183535533/traffic-is-fake-audience-numbers-are-garbage-nobody-knows-how-many-people-see-anything.shtml
======
tomludus
There are also a huge number of people using ad blockers, which actively block
many of the tools people use to count users.

Furthermore, these ad networks charge based on this "traffic", with some
vague and highly opaque promises that the bot traffic is filtered out.

Makes you wonder how much money is wasted on bot traffic.

------
automatwon
_Could they even be off by a factor of 10?_

I used to work in Data Science. I definitely agree with this sentiment.
Sometimes we did statistical calculations of 'confidence'. Personally, I
rarely felt confident, due in part to my constant OCD / skepticism. Question
for people who know statistics better than I do: what does accuracy mean when
there isn't a known "true" / standard value? What's the precision of a
measurement when things are an order of magnitude off?

The article mainly talks about attribution. Another possibility is that some
non-trivial portion of the data is systematically missing altogether: data
that isn't a randomly distributed sample, but skewed toward a particular
characteristic.
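
To make that concrete, here's a toy sketch (made-up numbers and segment
names) of how a non-random gap, say visitors behind ad blockers never
reaching the tracking endpoint, skews a simple conversion-rate estimate:

    import random
    random.seed(0)

    # Hypothetical population: two segments with different conversion rates.
    # The "blocked" segment never shows up in the analytics log at all.
    visitors = (
        [{"segment": "blocked", "converted": random.random() < 0.01} for _ in range(3000)]
        + [{"segment": "tracked", "converted": random.random() < 0.05} for _ in range(7000)]
    )

    true_rate = sum(v["converted"] for v in visitors) / len(visitors)

    # Analytics only ever sees the "tracked" segment.
    observed = [v for v in visitors if v["segment"] == "tracked"]
    observed_rate = sum(v["converted"] for v in observed) / len(observed)

    print(f"true conversion rate:     {true_rate:.3%}")
    print(f"measured conversion rate: {observed_rate:.3%}")  # biased upward, and nothing in the data says so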

 _but all these numbers are actually good for (maybe) is relative comparisons_

I also agree with this. I guess it's okay for metrics to be wrong, as long as
all the metrics are equally wrong, in perfect proportion? This is why I like
to track the proportions between metrics as well, as an internal consistency
check. My mathematical intuition is that if the metric in question is a
monotonically increasing (growth) or decreasing metric over time, there will
be an interval of time during which this relative proportion is useful, after
which the metrics, growing at different rates, diverge into incoherence. Of
course, models are not perfect representations of the world, merely
reductions to the key components.

To give a concrete example of a model where things look good in some narrow
domain but break down: I wrote a physics wave-propagation simulation. Instead
of actually implementing the wave equation, which is too much math for me, I
used Hooke's law as an "approximation", if you can call it that. It's a good
enough visual approximation, but there is a critical value at which the
system implodes, and some fuzzier value before that at which things start to
look unnatural, to say the least. Then again, Newtonian physics breaks down
after a while too.
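
Roughly the kind of thing I mean (a minimal sketch of the idea, not the
original code; the constants and step size are arbitrary):

    # 1-D chain of masses coupled to their neighbours by Hooke's law,
    # stepped with a simple Euler-style update. The wave-like behaviour
    # looks fine until k * dt**2 gets large enough that the system blows up.
    N = 200            # number of masses
    k = 1.0            # spring constant between neighbours
    dt = 0.1           # time step -- the "critical value" lives here
    pos = [0.0] * N
    vel = [0.0] * N
    pos[N // 2] = 1.0  # pluck the middle of the chain

    def step():
        for i in range(1, N - 1):
            # net Hooke force from both neighbours (ends held fixed)
            force = k * (pos[i - 1] - pos[i]) + k * (pos[i + 1] - pos[i])
            vel[i] += force * dt
        for i in range(1, N - 1):
            pos[i] += vel[i] * dt

    for _ in range(2000):
        step()

(As it happens, a chain of Hooke springs is more or less a finite-difference
version of the wave equation, which is why the approximation looks as good as
it does in the small-amplitude, small-step regime.)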

Also, when the model IS outputting the right answers, it could be for the
wrong reasons. Companies will spend money on a particular thing, a particular
marketing campaign for example, correlate it with growth, and assume it
caused the growth. It could just be coincidence, a fluke. Does Descartes'
evil demon like trolling analysts?

I'm also a self-proclaimed minimalist, with material things and beyond. I'm
curious what people's experiences are with "hoarding" metrics. Given how hard
it is to have accurate metrics, I feel reporting should be reduced to a small
set of metrics one can be confident about, rather than a large set of
metrics, none of which one can feel confident about. If I ever work on my own
startup, that's how I'd run things, or so I hope. Maybe businesses get
pressured to crank out metrics, to feel productive, to build confidence with
employees, or by their investors? I'd love to hear your stories there.

The article talks about having no idea what we're talking about. Sometimes we
have no idea what we mean, i.e. semantics. A metric which we describe
qualitatively in English can drastically differ in value based on how we
formulate the query / computation. Maybe basic English sentences are not even
Turing complete. This notion doesn't really make sense though, because the
issue of semantics is also about communicating the idea between two people,
and each person's mapping of an English sentence to a computation is a
somewhat abstract idea. Anyways, either the semantics of the query doesn't
reflect our qualitative definition of the metric, or querying the dataset is
not a "closed path", meaning one or more of the paths consists of wrong or
inaccurate data. In either case, the solution is to be explicit about HOW a
particular metric is computed, rather than WHAT we desperately think we're
getting at. Actually, it was when I first wrote two different queries to
calculate the same metric, and saw the numbers differ by an order of
magnitude, that I realized that what we call confidence borders on quantified
faith.
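
As a hypothetical illustration of the same-metric, different-query problem
(field names made up): the English phrase "unique visitors" already supports
at least two defensible computations that disagree.

    # Hypothetical event log.
    events = [
        {"user_id": "u1", "device_id": "d1"},
        {"user_id": "u1", "device_id": "d2"},   # same person, second device
        {"user_id": None, "device_id": "d3"},   # logged-out visitor
        {"user_id": None, "device_id": "d4"},   # another logged-out visitor
        {"user_id": "u2", "device_id": "d5"},
    ]

    # Reading 1: "unique visitors" = distinct logged-in users.
    visitors_a = len({e["user_id"] for e in events if e["user_id"] is not None})

    # Reading 2: "unique visitors" = distinct devices, anonymous traffic included.
    visitors_b = len({e["device_id"] for e in events})

    print(visitors_a, visitors_b)   # 2 vs. 5 -- same "metric", very different numbers

Scale that gap up across a few joins and dedup rules and an order of
magnitude isn't hard to hit.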

