A Pirate's Guide to Accuracy, Precision, Recall, and Other Scores 144 points by ReDeiPirati 23 days ago | 17 comments

 The Wikipedia pages[1] on that subject are pretty good. I have them open frequently because of my current project. Here is a typical sample of the output for a single test: satie f_measure: 82.93 precision 97.62 recall 72.09 accuracy 70.84. The context here is a program that tries to recognize what is being played in an audio file. Precision is the chance that any given note in the output was actually in the input; recall is the percentage of notes from the input that were recognized. Depending on your application you might give more or less weight to those two. In this particular application precision is more important than recall, because a wrong note sounds terrible but a missed note makes no sound at all... You can usually adapt a classification problem to give you these figures, though it sometimes requires a bit of creativity to determine what exactly constitutes a false positive or a false negative. In this case a false negative is a missed note, and a false positive is a note that wasn't there but got recognized anyway. To be able to easily determine whether or not a given change is an improvement, you steer by either f_measure or accuracy.
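As a rough illustration of how figures like the ones in the comment above can be derived, here is a minimal Python sketch. It assumes notes are compared as exact (onset, pitch) pairs and that there are no true negatives to count, so accuracy is taken as TP / (TP + FP + FN). Both are assumptions made for illustration and not necessarily how the commenter's program scores its output. The function name `note_scores` and the sample notes are hypothetical.

```python
def note_scores(reference, transcribed):
    """Score a transcription against a reference, both given as sets of notes.

    Each note is any hashable value, e.g. an (onset_tick, midi_pitch) tuple.
    """
    reference, transcribed = set(reference), set(transcribed)
    tp = len(reference & transcribed)   # notes correctly recognized
    fp = len(transcribed - reference)   # notes in the output that were not in the input
    fn = len(reference - transcribed)   # notes in the input that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Assumption: there is no meaningful count of "true negatives" for notes,
    # so accuracy is computed here as TP / (TP + FP + FN).
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Hypothetical data: four notes in the input, three recognized, one spurious.
reference = {(0, 60), (1, 62), (2, 64), (3, 65)}
transcribed = {(0, 60), (1, 62), (2, 64), (2, 71)}
scores = note_scores(reference, transcribed)
```

In this toy case one spurious note lowers precision and one missed note lowers recall by the same amount; an application that cares more about wrong notes than missed ones would watch precision more closely.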
 So, are you trying to go from audio file to sheet music or something? That'll be a bit of a pain I'd imagine if you aren't separating out subtracks by instrument. Besides which, wouldn't a time-sliced Fourier transform give you the characteristic frequencies of the piece?
 > So, are you trying to go from audio file to sheet music or something? From audio file to MIDI file. > That'll be a bit of a pain I'd imagine if you aren't separating out subtracks by instrument. You're on to something there :) For now I've decided to stay single instrument until I can do that with high enough fidelity to be happy about it. The other day someone posted a link here about a neural net that separates out the various instrument tracks; that would be a very nice complement. > Besides which, wouldn't a time sliced Fourier transform give you the characteristic frequencies of the piece? If only it were that simple. Fourier transforms are very useful tools but their output is very far removed from something that can be played again.
 Product managers for ML/AI products need to understand this post. That confusion matrix is something I have failed to get traction for in the past, but I'd say all market fit will be measured based on the customer's valuation of that matrix vs. the profile of it your product delivers. It doesn't matter what field you are in, this is the metrics framework for every analysis product out there, imo. E.g. a customer has an expectation of tp/fp tn/fn rates that is valuable for themselves, and your given tech has hard limits on the trade-offs between them. The real secret is that the values of the confusion matrix do not apply to the whole or most of a customer vertical, but rather to individual customers with similar matrix values across verticals. I think you can predict when an ML/AI product company will fail because they think they can apply their product to the vertical of their first few customers, making the false assumption that their confusion matrix applies all the way up that vertical, instead of hunting customers with the same matrix values in other verticals.
 While the article has well sourced information, I find the usage of emojis in technical articles to be distracting and tiresome. I even witness it happening in internal documentation at places I've worked at. It's unnecessary and may discredit the validity of the information in the document.
 I don't have a problem with that so much as with the incessant spamming of floydhub.com: https://news.ycombinator.com/from?site=floydhub.com Especially resubmitting the same links over and over again is really not nice.
 In their defense -- those links get up-voted. That indicates that many people missed the earlier postings, and find them valuable. (I am not associated with FloydHub, but I have used their product once.)
 That still does not excuse resubmission, even if they get upvoted. That's simply abuse and smacks of entitlement.
 [flagged]
 Please don't respond to a comment you dislike by breaking the site guidelines yourself. That only makes the thread worse.
 @Author, you should include the Matthews correlation coefficient.
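For reference, the Matthews correlation coefficient suggested above can be computed directly from the four confusion-matrix counts. A minimal sketch: the function name `mcc_from_counts` is made up for this example, and returning 0.0 when the denominator is zero is a common convention rather than part of the definition.

```python
import math


def mcc_from_counts(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: a row or column of the matrix is empty.
        return 0.0
    return numerator / denominator


perfect = mcc_from_counts(tp=5, fp=0, tn=5, fn=0)
inverted = mcc_from_counts(tp=0, fp=5, tn=0, fn=5)
chance = mcc_from_counts(tp=5, fp=5, tn=5, fn=5)
```

Unlike precision and recall, this score uses all four cells, including the true negatives.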
 Defining recall as TP / (TP + FN) is a great way to confuse people and prevent them from understanding the term. Recall should be explained exactly as in wiki - how many of the relevant items are correctly identified.
 Could anyone explain to a non-English speaker why 'recall' is called 'recall'? It doesn't make intuitive sense, unlike accuracy and precision. Thanks!
 In simplified terms, did you find everything you could have possibly found? Looking at the formula in the article, it includes the false negatives, that is, items you misclassified as negatives when you should have considered them positives. And because that happened, you didn't find them in the set, that is, you "forgot them". The opposite of forgetting is... recall. Another place this idea comes up is a search engine index. If the algo doesn't find, for a given query, documents in the index it should have (falsely classified as not matching the query), it will have lower recall.
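The search-engine reading of recall in the comment above can be made concrete with sets, which also shows that TP / (TP + FN) and "how many of the relevant items were found" are the same number. A small sketch with made-up document IDs; the names `relevant` and `retrieved` are hypothetical and do not come from the article.

```python
def recall(relevant, retrieved):
    """Fraction of the relevant items that were actually retrieved."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not relevant:
        return 0.0  # convention chosen for this sketch; recall is undefined here
    found = relevant & retrieved    # true positives
    missed = relevant - retrieved   # false negatives, the "forgotten" items
    return len(found) / (len(found) + len(missed))


# Hypothetical query: four documents should match, the engine returns three,
# of which only two are among the relevant ones.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc9"}
search_recall = recall(relevant, retrieved)
```

The irrelevant hit `doc9` does not change recall at all; it would only show up in precision.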
 Ah, that makes sense - thanks! Been using the word for a while but never figured out the linguistic logic, TIL, awesome.
 Not the point of the article, but what's the idiomatic way to generate [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]? The author's [1 for n in range(10)]? [1 for _ in range(10)]? [1]*10?
 Python dev for 5 years: personally I like `[1 for _ in range(10)]` and `[1] * 10`. The `for _ in` syntax denotes the variable is basically a throwaway, whereas the `for n in` construct is a bit confusing - almost like we're supposed to be doing something with the `n` in the comprehension.
 All those options are pretty idiomatic. You can also use itertools to write it as `list(it.repeat(1, 10))`. Optionally you can leave out the `list` if you don't need it (which is the main reason to use itertools).
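The spellings mentioned in this subthread all produce the same list. A quick check, assuming `it` in the comment above stands for `import itertools as it`; the note about mutable elements is a general Python caveat that the commenters did not raise.

```python
import itertools as it

ones_named = [1 for n in range(10)]      # works, but `n` is never used
ones_throwaway = [1 for _ in range(10)]  # `_` signals a throwaway variable
ones_repeat = [1] * 10                   # sequence repetition
ones_itertools = list(it.repeat(1, 10))  # lazy iterator, materialized here

all_equal = ones_named == ones_throwaway == ones_repeat == ones_itertools

# Caveat: `*` repeats references to the same object. That is harmless for an
# immutable int, but surprising for a mutable element such as a list.
shared = [[]] * 3
shared[0].append(1)        # every slot refers to the same inner list

independent = [[] for _ in range(3)]
independent[0].append(1)   # only the first inner list changes
```

For a list of plain integers the choice is purely stylistic; the comprehension form only becomes necessary once the repeated element is mutable.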
