
The Difficulty of Faking Data (1999) [pdf] - tontonius
http://www.kkuniyuk.com/Math119FakingData.pdf
======
acbart
This is awesome, and could be a fun project for introductory students. It
gives practice with:

\- Extracting digits from numbers (easily done with a few function calls in
Python, or some math)

\- Calculating the frequency of a known set of items in a sequence

\- Some application of simple mathematical formulas[1]

I wouldn't mind trying this at some point...

[1]
[https://en.wikipedia.org/wiki/Benford%27s_law#Statistical_te...](https://en.wikipedia.org/wiki/Benford%27s_law#Statistical_tests)
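
The three steps listed above could be sketched roughly like this in Python. This is only an illustration, not anything from the linked paper: the function names are made up, and the chi-square statistic is just one of several tests that could be used here.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the first nonzero digit of a number (hypothetical helper)."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading digit")


def benford_probability(d):
    """Probability Benford's law assigns to leading digit d (1 through 9)."""
    return math.log10(1 + 1 / d)


def benford_chi_square(values):
    """Chi-square statistic comparing observed leading digits to Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]  # zeros are skipped
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    observed = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * benford_probability(d)
        statistic += (observed[d] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Powers of 2 are a standard example of a sequence that follows the law.
    powers = [2 ** k for k in range(1, 1001)]
    print(benford_chi_square(powers))
```

A large statistic suggests the leading digits stray from the expected distribution. With nine digit categories there are 8 degrees of freedom, so the usual 5% cutoff is about 15.51.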

------
dragon96
Benford's law offers an approach to detecting fabricated data by looking at
the distribution of the most significant digits of the data presented and
checking whether it matches the distribution the law predicts.

There's a paper that was published in the last 5 years, which I unfortunately
can't seem to find. It detects fabricated data by examining less significant
digits and taking advantage of the fact that our data is often discrete.

The main idea is that if an experiment surveyed 20 people about some statistic
that takes an integer value (age rounded down, the number of times participants
blinked in a second, etc.), you should never see a mean value of, say,
1234.56. Why? The average of 20 integers must have a fractional part that is a
multiple of 0.05 (0.00, 0.05, 0.10, 0.15, ..., 0.95), so if the author reports
some other mean, you immediately know that something fishy is going on.
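
A minimal sketch of that check, assuming the mean is supplied as a string (so trailing zeros and the number of reported decimals are preserved) and that the reported value may have been rounded. The function name `mean_is_possible` is made up for illustration and does not come from the paper mentioned above.

```python
import math
from fractions import Fraction


def mean_is_possible(reported_mean: str, n: int) -> bool:
    """Check whether n integers could average to the reported (rounded) mean."""
    mean = Fraction(reported_mean)  # exact value of the decimal string
    decimals = len(reported_mean.partition(".")[2])
    half_step = Fraction(1, 2 * 10 ** decimals)  # largest possible rounding error

    # Only integer totals close to mean * n can produce the reported mean.
    lowest = math.floor((mean - half_step) * n)
    highest = math.ceil((mean + half_step) * n)
    return any(
        abs(Fraction(total, n) - mean) <= half_step
        for total in range(lowest, highest + 1)
    )


if __name__ == "__main__":
    print(mean_is_possible("1234.56", 20))  # no integer total gives this mean
    print(mean_is_possible("1234.55", 20))  # 24691 / 20 gives exactly this mean
```

The check only has teeth for small samples: with 100 or more respondents, every mean reported to two decimals is reachable by some integer total.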

------
blueboo
This brings to mind a classic pre-Disney, pre-NYT FiveThirtyEight article
applying some of these principles to take some pollsters to task.

\- From 2009, Strategic Vision Polls Exhibit Unusual Patterns, Possibly
Indicating Fraud:
[https://fivethirtyeight.com/features/strategic-vision-polls-...](https://fivethirtyeight.com/features/strategic-vision-polls-exhibit-unusual/)

