
Epoxy: Interactive Model Iteration with Weak Supervision - polm23
https://github.com/HazyResearch/epoxy
======
cs702
Neat.

You would use this, for example, if the distribution of your data is prone to
change over time and it's impractical to train or fine-tune a model to
acceptable accuracy quickly enough. To use it, you need two things:

* Noisy labeling functions -- e.g., user-provided heuristics or methods for labeling samples, or existing models trained on subsets of the data. These functions need to accurately classify only a subset of possible samples.

* Embeddings from a pretrained model -- e.g., a convnet or language model. These embeddings cover practically all possible samples but do not classify them.
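To make the first ingredient concrete, here's a toy sketch (my own, not Epoxy's API) of labeling functions for a hypothetical binary sentiment task. It assumes the common weak-supervision convention that a function returns +1 or -1 when it votes and 0 when it abstains, and shows how each function covers only part of the data:

```python
from typing import Callable, List

ABSTAIN = 0  # assumed convention: +1 / -1 are votes, 0 means "no opinion"

def lf_contains_great(text: str) -> int:
    # Hypothetical heuristic: "great" suggests a positive sample.
    return 1 if "great" in text.lower() else ABSTAIN

def lf_contains_awful(text: str) -> int:
    # Hypothetical heuristic: "awful" suggests a negative sample.
    return -1 if "awful" in text.lower() else ABSTAIN

def label_matrix(samples: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    # One row per sample, one column per labeling function.
    return [[lf(s) for lf in lfs] for s in samples]

def coverage(matrix: List[List[int]], j: int) -> float:
    # Fraction of samples on which labeling function j does not abstain.
    return sum(1 for row in matrix if row[j] != ABSTAIN) / len(matrix)

samples = ["A great movie", "Awful acting", "Two hours long", "Great cast, awful plot"]
L = label_matrix(samples, [lf_contains_great, lf_contains_awful])
print(L)               # [[1, 0], [0, -1], [0, 0], [1, -1]]
print(coverage(L, 0))  # 0.5
```

Note that "Two hours long" gets no vote at all, and the last sample gets conflicting votes; those gaps and conflicts are what the embeddings and the aggregation step are meant to deal with.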

At a very high level, the authors iteratively extend the coverage of the
labeling functions using nearest-neighbor search in the embedding space, and
then aggregate their outputs using a library called FlyingSquid (by the same
authors; there's a link on the repo) for _quickly_ building an aggregate
labeling model.
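Here's a rough sketch of that idea as I understand it, with made-up 2-D embeddings. It is not the repo's code: `extend_lf`, the cosine threshold, and the example data are all my own assumptions. Where a labeling function abstains, the point inherits the vote of its nearest neighbor (among points that function did vote on) if the two are similar enough. A plain majority vote stands in for FlyingSquid here; FlyingSquid instead fits a label model that estimates how accurate each labeling function is.

```python
import math
from typing import List, Sequence

ABSTAIN = 0  # assumed convention: +1 / -1 are votes, 0 means abstain

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def extend_lf(votes: Sequence[int], embeddings: Sequence[Sequence[float]],
              threshold: float) -> List[int]:
    # Hypothetical helper: copy a labeling function's vote to abstained points
    # whose nearest voted-on neighbor has cosine similarity >= threshold.
    voted = [i for i, v in enumerate(votes) if v != ABSTAIN]
    extended = list(votes)
    if not voted:
        return extended
    for i, v in enumerate(votes):
        if v != ABSTAIN:
            continue
        j = max(voted, key=lambda k: cosine(embeddings[i], embeddings[k]))
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            extended[i] = votes[j]  # uses original votes, so no cascading
    return extended

def majority_vote(row: Sequence[int]) -> int:
    # Stand-in aggregator (NOT FlyingSquid): sign of the summed votes.
    s = sum(row)
    return 1 if s > 0 else -1 if s < 0 else ABSTAIN

embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
lf_a = [1, 0, 0, 0]   # only labels sample 0
lf_b = [0, 0, -1, 0]  # only labels sample 2

columns = [extend_lf(lf, embeddings, threshold=0.9) for lf in (lf_a, lf_b)]
labels = [majority_vote(row) for row in zip(*columns)]
print(labels)  # [1, 1, -1, -1]
```

Before extension, majority vote would leave samples 1 and 3 unlabeled; after it, each picks up the vote of its close neighbor. The threshold is the knob trading coverage against accuracy: lower it and more points get labeled, but with noisier votes.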

