
Ask HN: How do you manage to keep improving your ML models' accuracy? - gerenuk
Hello everyone,

What do you do to improve your training-set labels to get a better success rate? Are you using some framework, or revising them manually after some time?

We have been working on topic modeling, and managing everything manually is quite hectic, so we are looking for better solutions.

Thanks
======
PaulHoule
From an engineering standpoint, the question is understanding where the
bottlenecks are.

For instance, your feature set might limit your accuracy. Let's say you are
interested (or uninterested) in posts about the Go programming language on HN
and you are classifying based on the title alone. The token "Golang" predicts
the topic accurately, but the bare word "Go" does not. No matter how much you
train, you will hit a ceiling unless you have beyond-bag-of-words features
like "Go Development", "Go Implementation", ...
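To make the ambiguity concrete, here is a minimal sketch in pure Python (the titles are made up for illustration): on unigram features, a title about the Go language and a title using the ordinary verb "go" are indistinguishable on that word, while bigrams keep phrases like "go development" distinct.

```python
# Why unigram features hit a ceiling on this task: both titles below
# contain the token "go", but only bigrams separate them.

def ngrams(title, n):
    """Lowercase word n-grams of a title, as a set."""
    words = title.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

about_go = "Go Development Tips"           # about the language
not_about_go = "Why we go to conferences"  # not about the language

# Unigrams cannot tell the two apart on the word "go":
assert "go" in ngrams(about_go, 1) and "go" in ngrams(not_about_go, 1)

# Bigrams separate them:
print(sorted(ngrams(about_go, 2)))      # ['development tips', 'go development']
print(sorted(ngrams(not_about_go, 2)))  # ['go to', 'to conferences', 'we go', 'why we']
```

In a real pipeline you would get the same effect from something like scikit-learn's `CountVectorizer(ngram_range=(1, 2))` rather than hand-rolled n-grams.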

Many NLP projects fail because people decide up front to throw away critical
information that they can never get back. Going beyond BoW is not trivial,
however, because if you vastly increase the number of features, most will be
poorly sampled and you won't learn anything from them.
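A toy illustration of that blow-up (made-up three-title corpus): adding bigrams to unigrams roughly doubles the vocabulary, and nearly all of the features occur only once, so a model cannot estimate reliable weights for them.

```python
# Count uni+bigram features on a tiny corpus and measure how many
# are singletons (seen exactly once) -- the poorly sampled features.
from collections import Counter

corpus = [
    "go development tips",
    "go implementation notes",
    "why we go to conferences",
]

def count_ngrams(docs, n_max):
    counts = Counter()
    for doc in docs:
        words = doc.split()
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

uni = count_ngrams(corpus, 1)   # 9 distinct unigrams
both = count_ngrams(corpus, 2)  # 17 distinct features with bigrams added
singletons = sum(1 for c in both.values() if c == 1)
print(len(uni), len(both), singletons)  # → 9 17 16
```

The usual mitigations are frequency cut-offs (e.g. a `min_df` threshold) or feature hashing, at the cost of discarding or colliding rare features.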

Past feature engineering, there are very interesting questions in active
learning that are not well covered in the academic literature, largely because
active-learning experiments are not reproducible in a Kaggle-style competition.
There is also the human factor: you can destroy people psychologically by
making them split hairs that don't matter. Realistically you can get 2,000
judgements a day out of a person if that is all they do; 200 is more likely
from an expert who does other things.
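One common active-learning loop that respects such a labeling budget is uncertainty sampling: spend the day's judgements on the items the current model is least sure about. A hedged sketch (the pool and probabilities are made up; in practice they come from your classifier):

```python
# Uncertainty sampling: rank unlabeled items by how close the model's
# predicted probability is to 0.5, and send only the top few to a human.

DAILY_BUDGET = 3  # e.g. 200 for a busy expert, 2000 for a dedicated annotator

# (item_id, model's estimated P(positive)) for the unlabeled pool
pool = [
    ("a", 0.97),  # confident positive -- labeling it adds little
    ("b", 0.51),  # near the decision boundary -- most informative
    ("c", 0.03),
    ("d", 0.48),
    ("e", 0.62),
]

# Smallest distance from 0.5 first, then cut off at the budget.
to_label = sorted(pool, key=lambda x: abs(x[1] - 0.5))[:DAILY_BUDGET]
print([item for item, _ in to_label])  # → ['b', 'd', 'e']
```

This is one strategy among several (query-by-committee, expected model change, ...); the point is that the model, not the annotator, decides which hairs are worth splitting.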

Click on my profile link and send me an email and I can share what I know.

