
Show HN: ALMa – Active Learning Data Manager - talolard
https://www.lighttag.io/blog/active-learning-manager/
======
talolard
OP here. A bit of context in case you're not doing ML day to day.

So everyone talks about AI, but the dirty secret of our space is that you need
a lot of labeled data to train your AI on. Labeling data is a manual process
and active learning is a way to use AI to speed up that manual process.

The core idea is that you let your model choose what to label next based on
how "valuable" the next piece of information is. "Valuable" can be defined in
many different ways; the most common is to choose the data points that the
model is least certain about.
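To make "least certain" concrete, here's a rough sketch of least-confidence sampling in plain Python. It assumes your model gives you a list of class probabilities per unlabeled example; the function name and the sample numbers are made up for illustration and aren't part of ALMa.

```python
from typing import List, Sequence


def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical helper: return indices of the k rows the model is least sure about."""
    # Uncertainty score: 1 - highest class probability (higher = less certain).
    scores = [1.0 - max(row) for row in probs]
    # Rank row indices by descending uncertainty and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Made-up model outputs for four unlabeled examples (two classes each).
predictions = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]
print(least_confident(predictions, 2))  # [3, 1]
```

Rows 3 and 1 are closest to a coin flip, so they'd go to the annotators first. Margin or entropy based scores are common alternatives to this scoring.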

ALMa is a utility that makes the engineering aspects of implementing active
learning a little easier. When implementing active learning you need to keep
track of what data has been labeled (what you train the model on) and what has
not been labeled (what you label). The common ecosystem is very array-based, so
this becomes an exercise in tracking offsets that is tedious and error-prone.
ALMa abstracts that away.
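As a rough illustration of the offset problem (this is NOT ALMa's API, just a hypothetical `LabelPool` class written for this comment), the sketch below keys every label by the example's original index and translates offsets into the current unlabeled list back to those originals:

```python
from typing import Dict, List, Sequence, Tuple


class LabelPool:
    """Hypothetical tracker (not ALMa's API): labels are keyed by original index."""

    def __init__(self, data: Sequence[str]) -> None:
        self.data = list(data)
        self.labels: Dict[int, str] = {}  # original index -> label

    def unlabeled(self) -> List[int]:
        # Original indices of everything that still needs a label.
        return [i for i in range(len(self.data)) if i not in self.labels]

    def labeled(self) -> List[Tuple[str, str]]:
        # (example, label) pairs to train on, in original order.
        return [(self.data[i], self.labels[i]) for i in sorted(self.labels)]

    def add_labels(self, positions: Sequence[int], labels: Sequence[str]) -> None:
        # `positions` are offsets into the current unlabeled() list (what a model
        # scoring that list hands back); map them to original indices first.
        pool = self.unlabeled()
        originals = [pool[p] for p in positions]
        for idx, label in zip(originals, labels):
            self.labels[idx] = label


pool = LabelPool(["a", "b", "c", "d"])
pool.add_labels([1], ["pos"])  # offset 1 of [0, 1, 2, 3] -> "b"
pool.add_labels([1], ["neg"])  # offset 1 of [0, 2, 3] -> "c", not "b" again
print(pool.unlabeled())  # [0, 3]
print(pool.labeled())    # [('b', 'pos'), ('c', 'neg')]
```

The second call is the bug you hit with raw arrays: offset 1 means a different example once the pool has shrunk, so something has to do the translation for you.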

---- A bit about LightTag ----

We make tools to annotate text: entities, classifications and relationships. We
make it particularly easy to work with larger teams of annotators, through
automated workforce management and analytics (IAA, adjudication, etc.).

We've traditionally been on the fence about active learning because you run
the risk of biasing your data toward whatever model you're using. It's
been requested often enough that we'll make it an optional feature and ALMa is
a component in that pipeline.

~~~
anentropic
I think there are a couple of typos here:
[https://www.lighttag.io/features](https://www.lighttag.io/features)

> Keyboard Shortcuts

> Annotate faster with out optimzied keyboard shortcuts

------
tastroder
tiny aside: you have a typo in
[https://github.com/LightTag/ALMa](https://github.com/LightTag/ALMa)
s/Leanring/Learning/

