
Doccano: Open-source text annotation tool for machine learning - polm23
https://github.com/doccano/doccano
======
hadsed
One of the most important and underserved aspects of actually doing machine
learning is data collection and error analysis (and in software they are
essentially the same thing).

Prodigy by Explosion AI (creators of spaCy) is very good, with a great UX
focused on making you extremely efficient. It's a paid product, but I'm happy
to help fund such a talented and impactful team.

That said, we don't use any tool out there nearly as much as we could. One
fundamental reason is that they don't cover all of our use cases. My team
works with legal contracts, and oftentimes the flow has to be: scan through
the entire document to find the region you're looking for and drag your cursor
over it to highlight. I haven't seen any annotation tool that works for that,
so we built our own.
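
For anyone wondering what a drag-to-highlight annotation boils down to on the
data side, here is a minimal sketch, assuming highlights are stored as
character offsets into the document text. This is not our actual tool; the
`Highlight` class and `highlight_from_selection` helper are hypothetical names
made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    """A highlighted region stored as character offsets (hypothetical schema)."""
    start: int  # inclusive offset into the document text
    end: int    # exclusive offset into the document text
    label: str


def highlight_from_selection(text: str, anchor: int, focus: int, label: str) -> Highlight:
    """Turn a cursor drag (anchor to focus, in either direction) into a Highlight."""
    # A drag can run backwards, so order the two endpoints first.
    start, end = sorted((anchor, focus))
    # Clamp both ends to the document so a drag past an edge stays valid.
    start = min(max(0, start), len(text))
    end = min(max(0, end), len(text))
    if start == end:
        raise ValueError("empty selection")
    return Highlight(start, end, label)
```

Storing offsets rather than the selected string keeps the highlight unambiguous
when the same phrase appears many times in a long contract, and the text is
always recoverable as `text[h.start:h.end]`.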

In a similar vein, doing error analysis for those predicted highlights on
large documents is also painful. Scrolling here is a chore. If anyone has seen
LiquidText in action then you know the solution here.

There's so much left to do in this world. The exciting part is that the better
the models start to work, the more interesting UX challenges you have to push
efficiency even further. Nearly every ML project has to reckon with the cost
of building and analyzing its datasets, and making that cheaper with models in
the loop and exceptional UX is super critical and super fun to think about.

