
This is, to me, one of the major problems with many algorithmic solutions: an x% increase in precision, F-measure, or any other score in no way means that the results are actually better.

I've repeatedly seen improvements to traditional measures that make the subjective result worse.

It's incredibly hard to measure and solve (if anyone has good ideas, please let me know). I check a lot of sample data manually whenever we make changes, and doing that (with targeted attention to the important cases) is really the only way I've found to do it.
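
For reference, a minimal sketch (Python, with made-up counts) of the kind of scores being discussed; the point is that two systems can land on essentially the same F1 while making very different kinds of mistakes, which is exactly where subjective quality diverges:

    def precision_recall_f1(true_positives, false_positives, false_negatives):
        # Standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN)
        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Two hypothetical systems with nearly identical F1 but different error profiles.
    print(precision_recall_f1(90, 10, 10))  # (0.90, 0.90, 0.90)
    print(precision_recall_f1(95, 15, 5))   # (~0.86, 0.95, ~0.90)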



If you've got a dictation system on a phone, wouldn't a very good metric be the corrections people make after dictating?

I guess a problem would be if people become so used to errors that they send messages without correcting them. I have some friends who do this: they send garbled messages that I have to read out loud to understand. But there will always be a subset of people who want to get it right.
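
A back-of-the-envelope version of that metric, assuming the app can see both the raw dictation output and the text the user actually sent (the function and strings below are hypothetical, not any real dictation API):

    from difflib import SequenceMatcher

    def correction_rate(dictated: str, sent: str) -> float:
        """Fraction of dictated words the user changed before sending."""
        a, b = dictated.split(), sent.split()
        matched = sum(block.size for block in
                      SequenceMatcher(None, a, b).get_matching_blocks())
        return 1.0 - matched / max(len(a), 1)

    print(correction_rate("recognize speech", "wreck a nice beach"))  # 1.0
    print(correction_rate("send it at 5 pm", "send it at 5 pm"))      # 0.0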



