
A Cheat Sheet on Probability - babelouc
http://www.datasciencecentral.com/profiles/blogs/a-cheat-sheet-on-probability
======
mavam
Shameless plug of my own attempt to summarize key concepts in probability
theory and statistics:
[https://news.ycombinator.com/item?id=13060021](https://news.ycombinator.com/item?id=13060021)

~~~
curiousgal
That is precisely the kind of content I expected a cheat sheet on Probability
to contain. No offence to OP, but if you need a cheat sheet for basic concepts
like the ones included in the original post, then you'll find it really hard
to grasp any "useful" Probability concepts.

~~~
max_
> No offence to OP, but if you need a cheat sheet for basic concepts like the
ones included in the original post, then you'll find it really hard to grasp
any "useful" Probability concepts.

FYI, "basic" varies from person to person.

~~~
curiousgal
True, but when compared to Measure Theory (the foundation of Probability
Theory), almost everyone can agree that those concepts are simpler, i.e. basic.

~~~
max_
"almost everyone" is not everyone

------
thomasahle
> Disjoint probability (weather and coins)

I think the weather is probably independent of a coin flip, rather than disjoint.

Disjoint events would be something like "the coin lands heads up" and "the
coin lands tails up"
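The difference can be checked numerically. A minimal Python sketch, with a made-up 0.3 chance of rain (the weights are assumptions for illustration, not from the cheat sheet):

```python
from itertools import product

# Joint sample space for one coin flip and the weather, with made-up
# weights; under independence each pair's probability is the product
# of the marginals.
coin = {"heads": 0.5, "tails": 0.5}
weather = {"rain": 0.3, "dry": 0.7}
joint = {(c, w): coin[c] * weather[w] for c, w in product(coin, weather)}

def prob(event):
    """Probability of a set of (coin, weather) outcomes."""
    return sum(joint[o] for o in event)

heads = {o for o in joint if o[0] == "heads"}
tails = {o for o in joint if o[0] == "tails"}
rain = {o for o in joint if o[1] == "rain"}

# Independent, not disjoint: the events can co-occur, and the
# joint probability factorizes into the product of the marginals.
print(prob(heads & rain))                                # 0.15
print(abs(prob(heads & rain) - prob(heads) * prob(rain)) < 1e-12)  # True

# Disjoint: heads and tails can never co-occur on the same flip.
print(prob(heads & tails))                               # 0
```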

------
foxrob92
Holy shit, that visual display of Bayes' theorem makes so much sense. Writing
it as P(A intersect B)/P(B) and having the Venn diagram just made it all click
in my head.
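The area-ratio intuition is easy to play with in code. A small sketch: the universe and the two events below are made up; only the formula P(A | B) = P(A ∩ B)/P(B) comes from the sheet:

```python
# Conditional probability as a ratio of areas in the Venn diagram:
# P(A | B) = P(A ∩ B) / P(B). Outcome counts here are made up.
universe = set(range(100))      # 100 equally likely outcomes
A = set(range(0, 40))           # event A: 40 outcomes
B = set(range(30, 80))          # event B: 50 outcomes

def p(event):
    return len(event) / len(universe)

# Overlap (10 outcomes) divided by B (50 outcomes):
print(p(A & B) / p(B))          # 0.2
```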

------
thyrsus
I follow the arithmetic behind the birthday paradox, but that doesn't help my
intuition about similar problems: can someone point me to heuristics for
dealing with very large numbers of events in a very, very large universe of
possibilities? E.g., sha1 collisions[0] on blobs in Github?

Suppose I had N initially non-communicating instances of github. Would a merge
of all those repositories be more likely to have a sha1 collision if each used
the full 160 bits for their blobs, or if each repository assigned a random
log(N)+e bit prefix to itself, using only 160-(log(N)+e) bits for its own
blobs, but incurring a possibility of collision within the log(N)+e bit
prefixes? And, of course, one wants to know the increased likelihood of
internal collisions now that we're only using 160-(log(N)+e) bits for the
local identifiers (which of course depends on the number of internal distinct
blobs).

[0] A collision is two distinct blobs with the same identifier; two blobs
containing the same bits having the same identifier is a feature, not a
collision.
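One standard heuristic for "many draws from a huge universe" questions is the birthday-bound approximation P(collision) ≈ 1 − exp(−n(n−1)/2N) for n items drawn uniformly from N possibilities. A sketch of that heuristic (it treats identifiers as uniformly random, which is an idealization for SHA-1, and does not analyze the prefix-partitioning scheme above):

```python
import math

def collision_prob(n, bits):
    """Birthday-bound approximation: probability that n uniformly random
    `bits`-bit identifiers contain at least one collision."""
    N = 2 ** bits
    # 1 - exp(-n(n-1)/2N); expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-n * (n - 1) / (2 * N))

# Rule of thumb: a collision becomes ~50% likely near n ≈ 1.18 * sqrt(N),
# i.e. around 2**80 blobs for SHA-1's 160 bits.
print(collision_prob(2 ** 80, 160))   # ≈ 0.39

# A billion distinct blobs still leave the probability astronomically small:
print(collision_prob(10 ** 9, 160))   # ≈ 3.4e-31
```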

------
denzil_correa
Off topic but I just read a comment on the link

> These examples remind me of a paper I came across a few years ago using
> probability to show why the author would never have a girlfriend. It's a fun
> read and can be found at
> [https://logological.org/girlfriend](https://logological.org/girlfriend) if
> interested.

It actually refers to the article on the FP. Nice coincidence!

[https://news.ycombinator.com/item?id=13490063](https://news.ycombinator.com/item?id=13490063)

------
MLWiDA
I like to throw in the first chapter of "Statistical Inference for everyone"
for a little bit more explanation for these rules
([https://github.com/bblais/Statistical-Inference-for-Everyone](https://github.com/bblais/Statistical-Inference-for-Everyone)).

It looks to me like an easier alternative to the heavyweight "Probability
Theory: The Logic of Science" by E.T. Jaynes.

------
tossaway322
The cheat sheet has at least one error. Using set notation, with A = "long
hair" and B = "woman", the phrase

"not long hair and not woman"

does not correspond to the expression

P(complement(A intersect B))

shown on the cheat sheet.

It instead corresponds to

P(complement(A) intersect complement(B))

which, by De Morgan's law, is equivalent to

P(complement(A union B))
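The distinction is easy to verify on a small finite universe; the sets below are made up, only the set algebra matters:

```python
# Check the claim with a small finite universe (elements are made up).
universe = set(range(8))
A = {0, 1, 2, 3}          # "long hair"
B = {2, 3, 4, 5}          # "woman"

def complement(s):
    return universe - s

# "not long hair and not woman":
not_A_and_not_B = complement(A) & complement(B)

# De Morgan: complement(A) ∩ complement(B) == complement(A ∪ B)
print(not_A_and_not_B == complement(A | B))   # True

# ...but it differs from complement(A ∩ B), the expression on the sheet:
print(not_A_and_not_B == complement(A & B))   # False
```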

------
unklefolk
This looks awfully like set theory and SQL. I wonder what the SQL equivalent
for each concept would look like. For example (picking the easy one!) marginal
probability for long hair would be:

SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) FROM B
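The query can be tried out with Python's built-in sqlite3. The table name B and column names are taken from the comment above; the rows are made up:

```python
import sqlite3

# Run the comment's query against a toy table (rows are made up).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE B (LongHair INTEGER, Woman INTEGER)")
con.executemany("INSERT INTO B VALUES (?, ?)",
                [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 1)])

# Marginal probability P(long hair), exactly as in the comment:
(p_long_hair,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 THEN 1 ELSE 0 END) / SUM(1.0) FROM B"
).fetchone()
print(p_long_hair)  # 0.5 — 3 of 6 rows

# Joint probability P(long hair AND woman) follows the same pattern:
(p_joint,) = con.execute(
    "SELECT SUM(CASE WHEN LongHair=1 AND Woman=1 THEN 1 ELSE 0 END)"
    " / SUM(1.0) FROM B"
).fetchone()
print(p_joint)  # ≈ 0.333 — 2 of 6 rows
```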

------
ludicast
Tangential, but could anybody recommend a proof-heavy first-principles book
for probability/statistics?

I know stats etc. from an applications perspective (excel/sas/r), but I'm not
100% sure of the theoretical underpinnings behind much of it.

~~~
dagw
On the probability front I recommend:

- Probability and Random Processes by Grimmett

- Probability with Martingales by David Williams

Grimmett is probably a better bet to start with since it doesn't expect quite
as much prior math knowledge and covers a lot more topics. The Williams book
is shorter, denser and doesn't cover much in the way of applications, but
gives a really good theoretical underpinning of how probability theorists
think about probability (ie in terms of Lebesgue measures and Sigma algebras).

edit: There is also a companion book to Probability and Random Processes
called One Thousand Exercises in Probability, which contains an interesting
selection of problems and solutions that will let you apply the theories
taught in the main book.

~~~
ludicast
Thanks, the first book (along with the exercises) seems like a great approach.
Just peeked at the contents online, and it's pretty close to what I'm looking
for.

Much appreciated!

~~~
dagw
It's a good book.

Williams is probably better for mathematicians coming at probability with an
already solid mathematical understanding rather than practitioners who want to
try to understand the underlying theory.

------
MayeulC
I also liked the article linked in one of the comments:
[https://logological.org/girlfriend](https://logological.org/girlfriend)

Edit: Actually, it also popped up on the front page.

