
Launch HN: Humanloop (YC S20) – A platform to annotate, train and deploy NLP - jordn
Hey HN.

We’re Peter, Raza and Jordan of Humanloop
([https://humanloop.com](https://humanloop.com)) and we’re building a low code
platform to annotate data, rapidly train and then deploy Natural Language
Processing (NLP) models. We use active learning research to make this possible
with 5-10x less labelled data.

We’ve worked on large machine learning products in industry (Alexa,
text-to-speech systems at Google and in insurance modelling) and seen
first-hand the huge efforts required to get these systems trained, deployed
and working well in production. Despite huge progress in pretrained models
(BERT, GPT-3), one of the biggest bottlenecks remains getting enough _good
quality_ labelled data.

Unlike annotations for driverless cars, the data that’s being annotated for
NLP often requires domain expertise that’s hard to outsource. We’ve spoken to
teams using NLP for medical chat bots, legal contract analysis, cyber security
monitoring and customer service, and it’s not uncommon to find teams of
lawyers or doctors doing text labelling tasks. This is an expensive barrier to
building and deploying NLP.

We aim to solve this problem by providing a text annotation platform that
trains a model as your team annotates. Coupling data annotation and model
training has a number of benefits:

1) we can use the model to select the most valuable data to annotate next –
this “active learning” loop can often reduce data requirements by 10x

2) a tight iteration cycle between annotation and training lets you pick up on
errors much sooner and correct annotation guidelines

3) as soon as you’ve finished the annotation cycle you have a trained model
ready to be deployed.

Active learning is far from a new idea, but getting it to work well in
practice is surprisingly challenging, especially for deep learning. Simple
approaches use the ML models’ predictive uncertainty (the entropy of the
softmax) to select what data to label... but in practice this often selects
genuinely ambiguous or “noisy” data that both annotators and models have a
hard time handling. From a usability perspective, the process needs to be
cognizant of the annotation effort, and the models need to quickly update with
new labelled data, otherwise it’s too frustrating to have a human-in-the-loop
training session.

Our approach uses Bayesian deep learning to tackle these issues. Raza and
Peter have worked on this in their PhDs at University College London alongside
fellow cofounders David and Emine [1, 2]. With Bayesian deep learning, we’re
incorporating uncertainty in the parameters of the models themselves, rather
than just finding the best model. This can be used to find the data where the
model is uncertain, not just where the data is noisy. And we use a rapid
approximate Bayesian update to give quick feedback from small amounts of data
[3]. An upside of this is that the models have well-calibrated uncertainty
estimates -- to know when they don’t know -- and we’re exploring how this
could be used in production settings for a human-in-the-loop fallback.

Since starting we’ve been working with data science teams at two large law
firms to help build out an internal platform for cyber threat monitoring and
data extraction. We’re now opening up the platform to train text classifiers
and span-tagging models quickly and deploy them to the cloud. A common use
case is for classifying support tickets or chatbot intents.

We came together to work on this because we kept seeing data as the bottleneck
for the deployment of ML and were inspired by ideas like Andrej Karpathy’s
software 2.0 [4]. We anticipate a future in which the barriers to ML
deployment become sufficiently lowered that domain experts are able to
automate tasks for themselves through machine teaching and we view data
annotation tools as a first step along this path.

Thanks for reading. We love HN and we’re looking forward to any feedback,
ideas or questions you may have.

[1] [https://openreview.net/forum?id=Skdvd2xAZ](https://openreview.net/forum?id=Skdvd2xAZ)
– a scalable approach to estimate uncertainty in deep learning models

[2] [https://dl.acm.org/doi/10.1145/2766462.2767753](https://dl.acm.org/doi/10.1145/2766462.2767753)
– work to combine uncertainty together with representativeness when selecting
examples for active learning.

[3] [https://arxiv.org/abs/1707.05562](https://arxiv.org/abs/1707.05562)
– a simple Bayesian approach to learn from few data

[4] [https://medium.com/@karpathy/software-2-0-a64152b37c35](https://medium.com/@karpathy/software-2-0-a64152b37c35)
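
To make the post's distinction between "noisy data" and "model uncertainty"
concrete, here is a minimal sketch. It is not Humanloop's method, just a common
textbook formulation. It assumes you already have class probabilities from
several sampled models (for example ensemble members or stochastic forward
passes); the array `mc_probs` and the function names are hypothetical. Entropy
of the averaged prediction cannot tell the two cases apart, while the mutual
information between prediction and parameters (often called BALD) can.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) of categorical distributions along `axis`."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def acquisition_scores(mc_probs):
    """mc_probs has shape (n_model_samples, n_examples, n_classes).

    Returns (predictive_entropy, mutual_information), each of shape
    (n_examples,).
    """
    mean_probs = mc_probs.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    predictive_entropy = entropy(mean_probs)
    # Data noise: average entropy of each sampled model's own prediction.
    expected_entropy = entropy(mc_probs).mean(axis=0)
    # What is left over is disagreement between sampled models.
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information

# Example A: every sampled model says "coin flip" (ambiguous, noisy input).
noisy = np.array([[0.5, 0.5]] * 4)
# Example B: sampled models are confident but disagree (model uncertainty).
disagree = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]])

mc_probs = np.stack([noisy, disagree], axis=1)  # shape (4, 2, 2)
pred_ent, mi = acquisition_scores(mc_probs)
print("predictive entropy:", pred_ent)
print("mutual information:", mi)
```

Both examples get the same predictive entropy, but only example B gets a high
mutual information score, so a selector based on the latter would prefer B.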
======
ZeroCool2u
This looks pretty great, though the SaaS model is an absolute non-starter for
my own usage unfortunately. We've been pretty prolific users of Explosion AI's
(Makers of SpaCy) Prodigy [1] and actually the interfaces look very similar.
What would you say the core differences are between Humanloop and Prodigy?

1: [https://prodi.gy/](https://prodi.gy/)

~~~
razcle
Thanks! Prodigy is a good tool and we definitely were inspired by some of
their UX decisions. Reducing each decision to a small atomic unit and avoiding
context switching makes a lot of sense.

Our starting place is similar to Prodigy in that we also see active learning
as a key piece of the puzzle, but we think that making active learning work
reliably really does require taking parameter uncertainty into account. As far
as I know Prodigy doesn't do this. We are also working to make our active
learning work at the level of batches and be cost-aware. Often the most
valuable examples to label for the model are the most time consuming for
humans and we work to trade this off.
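
As a rough illustration of that value-versus-effort trade-off (a sketch only,
not how Humanloop implements it), one simple heuristic is to rank candidates by
estimated value per second of annotation time and fill a batch greedily under a
time budget. The `values`, `costs` and `budget` inputs here are hypothetical.

```python
def select_batch(values, costs, budget):
    """Greedy cost-aware batch selection.

    values: estimated usefulness of labelling each example (higher is better)
    costs:  estimated annotation time for each example, in seconds
    budget: total annotation time available for this batch, in seconds
    Returns (chosen_indices, time_spent).
    """
    # Rank by value per unit cost, best first.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / costs[i],
                   reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        # Skip anything that would push the batch over budget.
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen, spent

values = [0.9, 0.8, 0.3, 0.6]   # assumed model-derived scores
costs = [60, 10, 5, 20]         # assumed seconds per example
chosen, spent = select_batch(values, costs, budget=40)
print(chosen, spent)
```

Here the single most valuable example (index 0) is left out because it would
eat the whole budget on its own.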

A few other differences are that we do offer a cloud hosted solution so
getting set up is much faster and it's more natural for us to be able to
accommodate team annotation and quality assurance. By providing a hosted model
we also give you the option of deploying features very quickly and continuing
to improve them post deployment.

I'd be curious to know the barriers that SaaS introduces for you?

~~~
ZeroCool2u
Good to know! I work with a lot of highly classified information. Much to my
team's frustration, the notion that any of it would be cloud hosted on GCP or
AWS was laughable until somewhat recently. It will probably be years until we,
and I imagine many institutions like ours, can take advantage of something
like this. I appreciate the effort with the VPS / private cloud hosting
option, but without an on-prem deployment option, this wouldn't even make it
past initial discussions.

------
gauravsc
[https://jacobbuckman.com/2020-01-17-a-sober-look-at-bayesian...](https://jacobbuckman.com/2020-01-17-a-sober-look-at-bayesian-neural-networks/)

"But in practice, BNNs do generalize to test points, and do seem to output
reasonable uncertainty estimates. (Although it’s worth noting that simpler
approaches, like ensembles, consistently outperform BNNs.)"

------
Maxen2020
It looks awesome!

I see the Snorkel logo on the website, and they recently launched Snorkel
Flow for data annotation and model training. There isn't much detail on that,
but I wonder whether Humanloop has any advantage over it?

On the same track, Prodigy also has a Prodigy team version that has been
getting ready for launch forever. So glad you guys are a few steps ahead.

I am also building a labeling interface myself because I couldn't find the
right product for my needs (I have tried tools like label studio, doccano,
prodigy, dataturks and ml annotate). They each miss one thing or another. I
really wish there were one place where features like HTML support,
hierarchical labels, active learning, batch labeling, project tracking, multi
user management and, most importantly, the UI/UX were all well put together.

~~~
razcle
Hi, sorry I missed this earlier! We're in many ways complementary to weak
labelling techniques (like Snorkel) and are actually working to include them
in the tool. We think weak labelling is a great way to overcome cold starts
and active learning then helps you improve rapidly.

The big difference between us and Snorkel is our emphasis on active learning
and HITL deployment. We think the existing paradigm of ML deployment is very
waterfall and slow.

Would love to hear about the annotation interface you're building. Agree there
should be one place with all those features! (We're hoping it will be
Humanloop ;) ).

------
Rickasaurus
Is this something we will be able to buy and run on our servers? I don't think
we're the only ones wary of working hard to develop IP for a different
company.

Also predictions/month pricing is just really challenging and incompatible
with many downstream business models. The value has to be really huge to
justify that.

~~~
razcle
The model you train and data you upload are yours to own, unique to you and
don't get shared across users or reused for any other tasks, so hopefully you
shouldn't feel too much like you're building our IP ;)

In terms of deployment options, we're trying to lead with cloud hosting by
default but know that for a lot of people the whole reason they're annotating
in house is privacy so we've been exploring deploying in your VPC and for
larger enterprises on-prem.

Interested to hear more of your thoughts on the pricing model. This is
something we're still iterating on, so I'd be interested in what you think
would be most compatible with your use cases?

------
Grimm1
Neat! How do you compare yourself on the annotation capabilities with
Datasaur.ai which launched in the last YC batch?

In terms of training the models for deployment -- do we own the artifact? Can
I move that into my own model repository?

Also, how do you feel this compares to fine-tuning a publicly available
BERT-family model? Speaking from recent experience of having done so, that is
already fairly fast and easy and doesn't require a huge corpus.

Are the benefits more from the tight feedback loop and already standing
infrastructure?

~~~
jordn
All great questions!

Datasaur are great. I hope Ivan would think it's fair that I'd describe their
current product as a modern, cloud-hosted Brat
([https://brat.nlplab.org/](https://brat.nlplab.org/) – this remains very
popular!) with the features to make that work with teams. As you point out
we're focusing on the tight integration of annotation and training enabling
you to move faster and iterate on NLP ideas... essentially trying to move a
waterfall ML lifecycle to an agile one.

Fine tuning on BERT is the way to go. It's what we do, and that already
reduces the data annotation requirements by an order of magnitude. Doing that
offline in a notebook is still wanted by some (you can use our tool just as
the annotation platform, and download the data and you'll still get the
efficiency benefit through active learning) but integrating or deploying that
model is still a time-suck. Having the model deployed in the cloud immediately
has a load of supplementary benefits (easy to update, can always use the
latest models etc) too, we hope.

(edit: typos)

~~~
julvo
Firstly, congrats on the launch! Active learning is a super interesting space.

You say it's possible to download the data and use Humanloop for annotation
only while still benefitting from active learning. I'm curious about your
experience with how much active learning depends on the model. Are the
examples that the online model selects for labelling generally also the most
useful ones for a different model trained offline?

~~~
jordn
Cheers. It's a good thing to be wary of. Poor use of active learning will end
up biasing the data according to the model it's trained on – so that data
won't be the best X samples to train on a different model. Most of this issue
comes from bad active learning selection methods. If you have well calibrated
uncertainty estimates and sample for diversity and representativeness too, it's
far less of a concern.
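
One simple way to mix uncertainty with diversity, sketched here under
assumptions and not as a description of Humanloop's selection method: shortlist
the most uncertain candidates, then pick greedily so that each new pick is far
(in some embedding space) from those already chosen. The `embeddings`,
`uncertainty`, `k` and `pool_size` inputs are hypothetical.

```python
import numpy as np

def select_diverse_uncertain(embeddings, uncertainty, k, pool_size):
    """Pick k examples that are both uncertain and spread out.

    embeddings:  (n_examples, dim) array, e.g. sentence vectors
    uncertainty: (n_examples,) array of per-example uncertainty scores
    """
    # Step 1: shortlist the pool_size most uncertain examples.
    pool = np.argsort(-uncertainty)[:pool_size]
    # Step 2: farthest-first traversal inside the shortlist, starting
    # from the single most uncertain example.
    selected = [pool[0]]
    min_dist = np.linalg.norm(embeddings[pool] - embeddings[pool[0]], axis=1)
    while len(selected) < min(k, len(pool)):
        nxt = int(np.argmax(min_dist))
        selected.append(pool[nxt])
        dist = np.linalg.norm(embeddings[pool] - embeddings[pool[nxt]], axis=1)
        # Track each candidate's distance to its nearest selected example.
        min_dist = np.minimum(min_dist, dist)
    return [int(i) for i in selected]

embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
                       [5.1, 5.0], [10.0, 0.0]])
uncertainty = np.array([0.9, 0.85, 0.8, 0.7, 0.1])
picked = select_diverse_uncertain(embeddings, uncertainty, k=2, pool_size=4)
print(picked)
```

Taking the top two by uncertainty alone would return examples 0 and 1, which
are near-duplicates; the diversity step swaps the second pick for an example
from a different region of the embedding space.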

------
foobaw
#1 and #2, if they work as advertised, are great features, but a lot of other
companies have claimed to do this and failed.

One of the biggest problems I have is image annotation using CVAT - the tool
works when the task is simple annotation, but outputting the annotation data
and integrating it has been a pain point. Also, CVAT as a tool is great but
has a lot of missing features :/

~~~
anthonysarkis
Related: Diffgram is working in the Vision (Image & Video) space. Not NLP yet.

Integration pain points are mentioned often. We are working on solutions
here, eg:
[https://www.youtube.com/watch?v=w7yiW5wpnMg&t=128s](https://www.youtube.com/watch?v=w7yiW5wpnMg&t=128s)
Imagine adding bucket event triggers as next step here

Some really exciting features coming soon that make this even better.
[https://diffgram.readme.io/docs/what-is-diffgram](https://diffgram.readme.io/docs/what-is-diffgram)

Can try shared platform and do private install for actual
[https://diffgram.com/user/new](https://diffgram.com/user/new)

We would love your feedback on missing features. Please feel free to email me
directly: anthony+hn@diffgram.com

------
epberry
This workflow is still so complex to get right. Really excited to see more
tools for it and try it out ourselves!

At visitorX we're building a fairly large bank of comments and a tagging
system and Humanloop looks really great for that.

------
an_ml_engineer
Cool! I'm curious, how do you compare your service to Scale (scale.com)?

~~~
razcle
Hi,

Raza here (one of the other co-founders). Good question! I think our visions
are quite different even if our starting points look similar.

Scale has always positioned themselves as an API to human labour and their
goal is to abstract the labelling task away from the end user as much as
possible. So Scale works really well when you can easily outsource your
annotation task.

Our ultimate goal is to try and give domain experts the ability to teach ML
models themselves. We're much more focussed on NLP and on tasks that require
domain expertise and are hard to outsource. For people where deep domain
expertise matters or there are privacy concerns, Scale isn't really an option
and we're building tools for them.

On another point, Scale makes its money by charging per annotation so we think
they aren't as incentivised to reduce how much you need to label.

thanks!

------
caiobegotti
Is it English-only or true NLP that would work with multiple languages?
Congrats for the launch!

~~~
razcle
We wrap a lot of popular frameworks and have implementations of most SOTA
models. By default we use a multilingual BERT model so it should work out of
the box on different languages.

------
jeffbarg
Humanloop is such a great name for an AI platform :) Congrats on the launch!

~~~
jordn
haha so great to hear! For a while google search kept trying to auto correct
it to 'human poop'

------
alihabib123
This is really cool! Wish you all the best of luck!!

------
stuartaxelowen
Do you allow for on-premise inference?

~~~
peadarohaodha
Our default deployment option is cloud first for both training and inference
at the moment, but we have thought about the ability for users to export a
trained model. Either exporting the model parameters in some standardised
format, or a compiled predict function, or a docker image that encapsulates a
full inference service, etc. So if you could use this kind of export within
your application, this would allow on-premise inference. This is something we
could probably make available pretty quickly if necessary for your use case.

------
haffi112
What type of annotations do you offer?

~~~
jordn
Right now document level classification and span tagging within text
documents. These can also be combined (as in the landing page screenshot) so
that for a given input, you're learning multiple tasks at once as you
annotate.

The core of this platform should generally be independent of the data input
type and the output labels, so we're building out other annotation options for
our business customers. If there's a use case you would like it to support, it
would be great to chat jordan[at]humanloop.com :)

~~~
hbcondo714
>> text documents

Congrats on the launch! Would Humanloop be able to support HTML files or URLs?
A client of ours has a need to annotate verbose web pages.

~~~
razcle
At the moment we don't support the ability to render the HTML but it is
something that has come up before. One of the teams we're speaking to wants to
classify blog posts and would like to be able to preserve their formatting. If
this is something that's important to you we would consider adding it so maybe
drop me an email at raza[at]humanloop.com and we can discuss?

~~~
hbcondo714
Thank you for your reply. Yes, preserving the formatting is important for us
too.

------
ml_basics
Great stuff!

