
 Training classifiers with natural language explanations - ngaut
https://blog.acolyer.org/2018/08/24/training-classifiers-with-natural-language-explanations/
======
wcrichton
Part of what this reveals is the intimate relationship between querying and
labeling. A query (in the SQL sense) is the human attempting to express a
domain concept through a program. Here the queries have an imperative flavor
(being written in Python). For example, in the article, identifying causal
relationships in text by searching for “due to” is a domain concept encoded as
a weak heuristic.
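
To make that concrete, here is a minimal sketch of what such a heuristic might look like as a Python function. The names (`lf_due_to`, `CAUSAL`, `ABSTAIN`) and the sample sentences are my own illustrations, not taken from the article or any particular library.

```python
import re

# Hypothetical label values, chosen only for this illustration.
CAUSAL, ABSTAIN = 1, 0

def lf_due_to(sentence: str) -> int:
    """Weak heuristic: vote CAUSAL if the sentence contains the phrase 'due to'."""
    if re.search(r"\bdue to\b", sentence, re.IGNORECASE):
        return CAUSAL
    return ABSTAIN

# Made-up example sentences.
sentences = [
    "The flight was delayed due to heavy fog.",
    "The meeting is due tomorrow.",
    "Due to a bug, the server crashed.",
]
labels = [lf_due_to(s) for s in sentences]
```

The function is both a query ("find sentences containing this phrase") and a noisy labeler, which is the overlap I mean.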

This suggests that our query tools need to be more deeply integrated into
human-in-the-loop machine learning workflows. For example, in my use case of
analyzing TV news videos, let’s say I want to identify a panel of guests. I’ll
come up with a query like “3 to 5 people on screen, whose pose suggests they
are sitting, and they’re looking at each other.” While this query isn’t a
perfect filter (in either precision or recall), it will likely find a few positive
examples. Then I can query “show me more scenes like this one,” and slowly
build up a training set from my queries. Then I train a classifier, and
inspect its results. Rinse and repeat.
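
A rough sketch of the first two steps of that loop (the hand-written query, then "more like this") might look like the following. Everything here is assumed for illustration: the `Scene` features, the thresholds, and the toy data are hypothetical, and plain Euclidean distance stands in for whatever similarity search a real system would use. Training the classifier is left out.

```python
from dataclasses import dataclass
from math import dist

@dataclass(frozen=True)
class Scene:
    # Hypothetical per-scene features a video analysis pipeline might produce.
    scene_id: str
    num_people: int
    frac_sitting: float  # fraction of people whose pose looks seated
    frac_facing: float   # fraction of people looking at someone else

    def vector(self):
        return (self.num_people, self.frac_sitting, self.frac_facing)

def panel_query(s: Scene) -> bool:
    """The hand-written query: 3 to 5 seated people looking at each other."""
    return 3 <= s.num_people <= 5 and s.frac_sitting >= 0.8 and s.frac_facing >= 0.5

def more_like(seed: Scene, pool, k: int):
    """'Show me more scenes like this one': the k nearest scenes by feature distance."""
    others = [s for s in pool if s != seed]
    return sorted(others, key=lambda s: dist(seed.vector(), s.vector()))[:k]

# Toy data, invented for this example.
scenes = [
    Scene("a", 4, 1.0, 0.75),
    Scene("b", 1, 0.0, 0.0),
    Scene("c", 6, 0.9, 0.6),   # a plausible panel that the query misses (6 people)
    Scene("d", 2, 0.5, 0.5),
    Scene("e", 30, 0.1, 0.0),
]

# Step 1: the imperfect query finds a few seed positives.
seeds = [s for s in scenes if panel_query(s)]

# Step 2: expand from each seed; a human would review these before labeling.
candidates = {c.scene_id for seed in seeds for c in more_like(seed, scenes, k=1)}
```

Here the query only matches scene "a", but the similarity step surfaces "c" for review, which is how the training set grows beyond what the query alone can express.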

Edit: also, I’m sure there are a billion startups that do some variant of this
for some domain, but I think we really need a better open source ecosystem
around visualization and labeling of data for this workflow to be truly
accessible in most domains.

