
A Pirate's Guide to Accuracy, Precision, Recall, and Other Scores - ReDeiPirati
https://blog.floydhub.com/a-pirates-guide-to-accuracy-precision-recall-and-other-scores/
======
jacquesm
The Wikipedia pages [1] on that subject are pretty good. I have them open
frequently because of my current project. Here is a typical sample of the
output for a single test:

satie f_measure: 82.93 precision 97.62 recall 72.09 accuracy 70.84

The context here is a program that tries to recognize what is being played in
an audio file. Precision is the chance that any given note in the
output was actually in the input; recall is the percentage of notes from the
input that were recognized. Depending on your application you might give more
or less weight to those two. In this particular application precision is more
important than recall because a wrong note sounds terrible but a missed note
makes no sound at all...

You can usually adapt a classification problem to give you these figures,
though it sometimes requires a bit of creativity to determine what exactly
constitutes a false positive or a false negative. In this case a false
negative is a missed note, and a false positive is a note that wasn't there
but got recognized anyway.

To be able to easily determine whether or not a given change is an improvement
you steer by either f_measure or accuracy.
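
For illustration, here is a minimal sketch of how such figures can be computed
when notes are compared as sets. The `(onset, pitch)` representation, the
function name `score_transcription` and the sample notes are all assumptions
made up for this example, not the actual program.

```python
def score_transcription(reference, transcribed):
    """Return (precision, recall, f_measure) for two collections of notes."""
    reference, transcribed = set(reference), set(transcribed)
    tp = len(reference & transcribed)   # notes reported that were really played
    fp = len(transcribed - reference)   # notes reported but never played
    fn = len(reference - transcribed)   # notes played but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical notes as (onset in beats, MIDI pitch) pairs.
reference = {(0, 60), (1, 62), (2, 64), (3, 65)}
transcribed = {(0, 60), (1, 62), (2, 63)}

precision, recall, f_measure = score_transcription(reference, transcribed)
print(f"f_measure: {f_measure:.2%} precision {precision:.2%} recall {recall:.2%}")
```

Exact set matching is the simplest possible choice; a real comparison would
probably need some tolerance on note timing.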

[1]
[https://en.wikipedia.org/wiki/Precision_and_recall](https://en.wikipedia.org/wiki/Precision_and_recall)

~~~
salawat
So, are you trying to go from audio file to sheet music or something?

That'll be a bit of a pain I'd imagine if you aren't separating out subtracks
by instrument.

Besides which, wouldn't a time sliced Fourier transform give you the
characteristic frequencies of the piece?

~~~
jacquesm
> So, are you trying to go from audio file to sheet music or something?

From audio file to MIDI file.

> That'll be a bit of a pain I'd imagine if you aren't separating out
> subtracks by instrument.

You're on to something there :) For now I've decided to stay single instrument
until I can do that with high enough fidelity to be happy about it. The other
day someone posted a link here about a neural net that separates out the
various instrument tracks. That would be a _very_ nice complement.

> Besides which, wouldn't a time sliced Fourier transform give you the
> characteristic frequencies of the piece?

If only it were that simple. Fourier transforms are very useful tools but
their output is very far removed from something that can be played again.
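
To make that concrete, here is a minimal sketch of a time-sliced Fourier
transform on the easiest possible input, a clean synthetic sine wave. The
sample rate, frame size, hop and function names are assumptions picked for the
example, and the naive DFT is only there to keep it dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for this example
FRAME_SIZE = 256     # samples per time slice
HOP = 128            # samples between the starts of consecutive slices


def dft_magnitudes(frame):
    """Naive DFT: magnitude of each bin below the Nyquist frequency."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def peak_frequencies(samples):
    """Strongest frequency (Hz) in each time slice of the signal."""
    peaks = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, HOP):
        mags = dft_magnitudes(samples[start:start + FRAME_SIZE])
        peak_bin = max(range(len(mags)), key=mags.__getitem__)
        peaks.append(peak_bin * SAMPLE_RATE / FRAME_SIZE)
    return peaks


# A synthetic, perfectly clean 500 Hz sine wave, 1024 samples long.
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(1024)]
peaks = peak_frequencies(tone)
print(peaks)
```

Even in this ideal case the result is only a dominant frequency per slice.
Nothing in it says where a note starts or ends, and a real recording adds
harmonics, overlapping notes and noise on top.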

------
motohagiography
Product managers for ML/AI products need to understand this post.

That confusion matrix is something I have failed to get traction for in the
past, but I'd say all market fit will be measured by the customer's
valuation of that matrix vs. the profile of it that your product delivers. It
doesn't matter what field you are in, this is the metrics framework for every
analysis product out there, imo.

E.g. a customer has an expectation of the TP/FP and TN/FN rates that are
valuable to them, and your given tech has hard limits on the trade-offs
between them. The real secret is that the values of the confusion matrix do
not apply to the whole or even most of a customer vertical, but rather to
individual customers with similar matrix values across verticals.

I think you can predict when an ML/AI product company will fail: they
think they can apply their product to the vertical of their first few
customers, making the false assumption that their confusion matrix applies all
the way up that vertical, instead of hunting customers with the same matrix
values in other verticals.
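
A rough sketch of what I mean, with made-up numbers: the same confusion matrix
scored against two customers who value its cells differently. The labels,
function names and per-outcome valuations are all hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (True = positive)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p:
            counts["tp" if a else "fp"] += 1
        else:
            counts["fn" if a else "tn"] += 1
    return {cell: counts[cell] for cell in ("tp", "fp", "tn", "fn")}


def customer_value(matrix, valuation):
    """Total value of a matrix under one customer's per-outcome valuation."""
    return sum(matrix[cell] * valuation[cell] for cell in matrix)


actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, True, False, False, False]
matrix = confusion_matrix(actual, predicted)

# Two hypothetical customers: one hates false alarms, one hates misses.
alarm_averse = {"tp": 5, "fp": -10, "tn": 0, "fn": -1}
miss_averse = {"tp": 5, "fp": -1, "tn": 0, "fn": -10}

print(matrix)
print(customer_value(matrix, alarm_averse), customer_value(matrix, miss_averse))
```

The product's matrix is identical in both cases; only the customer's valuation
of it changes the outcome.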

------
kache_
While the article has well-sourced information, I find the use of emojis in
technical articles distracting and tiresome. I have even seen it in
internal documentation at places I've worked. It's unnecessary and may
undermine the credibility of the information in the document.

~~~
jacquesm
I don't have a problem with that so much as with the incessant spamming of
floydhub.com

[https://news.ycombinator.com/from?site=floydhub.com](https://news.ycombinator.com/from?site=floydhub.com)

Especially resubmitting the same links over and over again is really not nice.

~~~
omarhaneef
In their defense -- those links get up-voted. That indicates that many people
missed the earlier postings, and find them valuable.

(I am not associated with FloydHub, but I have used their product once.)

~~~
jacquesm
That still does not excuse resubmission. Even if they get upvoted, that's
simply abuse and smacks of entitlement.

------
tomrod
@Author, you should include the Matthews correlation coefficient.
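
For reference, a minimal sketch of that coefficient computed from the four
confusion-matrix counts. The function name `mcc` and the sample counts are made
up for illustration, and returning 0 when the denominator is zero is a common
convention that I am assuming here.

```python
import math


def mcc(tp, fp, tn, fn):
    """Correlation coefficient in [-1, 1] from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced data: 95 positives, 5 negatives,
# and a classifier that predicts "positive" for everything.
tp, fp, tn, fn = 95, 5, 0, 0
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(accuracy, mcc(tp, fp, tn, fn))
```

Accuracy looks fine on that data while the coefficient does not, which is why
it is worth listing next to the other scores.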

------
avip
Defining recall as _TP / (TP + FN)_ is a great way to confuse people and
prevent them from understanding the term.

Recall should be explained exactly as in wiki - _how many of the relevant
items are correctly identified_.

------
nodoodles
Could anyone explain to a non-native English speaker why 'recall' is called
'recall'?

It doesn't make intuitive sense, unlike accuracy and precision. Thanks!

~~~
pierrefar
In simplified terms, did you find everything you could have possibly found?
Looking at the formula in the article, it includes the false negatives, that
is, items you misclassified as negatives when you should have considered them
positives. And because that happened, you didn't find them in the set, that is
you "forgot them". The opposite of forgetting is... recall.

Another place this idea comes up is a search engine index. If the algo doesn't
find, for a given query, documents in the index it should have (falsely
classified as not matching the query), it will have lower recall.
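
A small sketch of the search case, assuming documents are identified by
made-up IDs and that we somehow know the full set of relevant ones. The
function name and the data are hypothetical.

```python
def recall(relevant, retrieved):
    """Fraction of the relevant items that were actually retrieved."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not relevant:
        return 1.0  # assumed convention: nothing to find, nothing forgotten
    true_positives = relevant & retrieved    # found, and should have been
    false_negatives = relevant - retrieved   # should have been found, "forgotten"
    return len(true_positives) / (len(true_positives) + len(false_negatives))


# Hypothetical document IDs for one query.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc9"}
print(recall(relevant, retrieved))  # 0.5
```

Note that the irrelevant hit `doc9` does not affect recall at all; it would
only lower precision.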

~~~
nodoodles
Ah, that makes sense - thanks! Been using the word for a while but never
figured out the linguistic logic, TIL, awesome.

------
_raoulcousins
Not the point of the article, but what's the idiomatic way to generate
`[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]`? The author's `[1 for n in range(10)]`?
`[1 for _ in range(10)]`? `[1]*10`?

~~~
mason_jake
Python dev for 5 years: personally I like `[1 for _ in range(10)]` and `[1] *
10`. The `for _ in` syntax denotes the variable is basically a throwaway,
whereas the `for n in` construct is a bit confusing - almost like we're
supposed to be doing something with the `n` in the comprehension.
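
A quick sketch comparing the three, plus the one caveat worth knowing about
the `*` form. The variable names are just for illustration.

```python
ones_a = [1 for n in range(10)]   # works, but `n` is never used
ones_b = [1 for _ in range(10)]   # `_` signals a throwaway variable
ones_c = [1] * 10                 # shortest for an immutable value like 1

assert ones_a == ones_b == ones_c == [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Caveat: `*` repeats references, so nested mutable items are shared.
shared = [[]] * 3
shared[0].append(1)        # every slot is the same inner list
separate = [[] for _ in range(3)]
separate[0].append(1)      # only the first inner list changes
print(shared, separate)    # [[1], [1], [1]] [[1], [], []]
```

So `[1] * 10` is fine for numbers, but reach for the comprehension when the
repeated element is a list or another mutable object.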

