
Towards Interactive Weak Supervision with FlyingSquid - danfu09
http://hazyresearch.stanford.edu/flyingsquid
======
staticautomatic
This looks awesome! Super fast Snorkel with even better performance? Yes,
please.

~~~
azinman2
Can you describe the types of problems in which this is applicable, and when
it’s not?

~~~
staticautomatic
I am not experienced enough in machine learning to say when it's not, but I
can say it's extremely useful for quickly building classifiers of stuff that
would be a nightmare to label by hand for training purposes.

I do a lot of text classification where I'm working with a set of
semi-structured sample documents. I can usually hard-code rules to classify my
sample set, but I know that any new sample document might well break my rules.
Typically, I know a handful of things about what distinguishes some documents
from others: some I know will always be true, but many I only know will
sometimes be true. The naive way to handle this tends to be a score-based
approach, where documents get "points" for meeting certain criteria. But
unless it turns out that stuff you thought was sometimes true is always true
and/or the things that are always true are enough to classify correctly 100%
of the time, the naive approach is always going to regress (no pun intended)
to a model because you have to identify a cutoff for your scoring. You can
take statistical approaches to figuring out the best cutoff to achieve the
result you want (e.g. a Principal Component Analysis optimized for a certain
chi-square distribution), but it's still turning into a modeling task no
matter what you wanna call it.
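
To make the score-based approach concrete, here is a minimal sketch. The rules, weights, and cutoff are all made up for illustration (nothing here comes from a real document set); the point is only that the hand-picked CUTOFF is already a modeling decision.

```python
# Naive "points" classifier. Every rule, weight, and threshold below is a
# hypothetical example, not something taken from a real project.

CUTOFF = 4  # hand-picked threshold; choosing it is the hidden modeling step


def score(doc: str) -> int:
    """Add up points for each hard-coded criterion the document meets."""
    text = doc.lower()
    points = 0
    if "invoice" in text:          # assumed to always hold for invoices
        points += 3
    if "total due" in text:        # assumed to hold only sometimes
        points += 2
    if "purchase order" in text:   # assumed to be a weak signal
        points += 1
    return points


def is_invoice(doc: str) -> bool:
    """Classify by comparing the score against the fixed cutoff."""
    return score(doc) >= CUTOFF
```

With these made-up weights, a document that only matches the "always true" rule scores 3 and falls below the cutoff, so the result depends entirely on where the threshold happens to sit.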

It's great to be able to dump all of my roughly understood features into
Snorkel and let it build a model around them. It's also super fast because
I've usually already written the hard-coded "rules", so I have very little new
code to write.
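
As a rough sketch of what reusing those rules looks like: each rule becomes a small function that either votes for a label or abstains. The rule names and labels below are hypothetical, and the votes are combined with a plain majority vote as a stand-in. This is not the Snorkel or FlyingSquid API; those tools estimate how accurate each rule is instead of counting every vote equally.

```python
from collections import Counter

# Hypothetical label values for illustration.
ABSTAIN, OTHER, INVOICE = -1, 0, 1


def lf_has_invoice_word(doc: str) -> int:
    return INVOICE if "invoice" in doc.lower() else ABSTAIN


def lf_has_total_due(doc: str) -> int:
    return INVOICE if "total due" in doc.lower() else ABSTAIN


def lf_is_memo(doc: str) -> int:
    return OTHER if doc.lower().startswith("memo") else ABSTAIN


# The existing hard-coded rules, rewritten as voting functions.
LABELING_FUNCTIONS = [lf_has_invoice_word, lf_has_total_due, lf_is_memo]


def majority_label(doc: str) -> int:
    """Combine rule votes by unweighted majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(doc) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting rules with equal support
    return ranked[0][0]
```

Note there is no cutoff to tune here: rules that are only sometimes true simply abstain when they don't apply, and the combining step is where a learned model would take over.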

Usually the classifier ends up being a supplement to what I'm already doing
but in certain cases it could become an outright replacement.

------
1_over_n
A similar claim was made in another paper posted a few days ago:

https://arxiv.org/abs/1905.11786

