
Dataturks – ML data annotations and labeling doesn't need to suck - gajju3588
https://dataturks.com/
======
XR0CSWV3h3kZWg
Neat that you can look at and download the free projects.

One of the things I'd look for in a product like this is an easy way to measure
rater reliability and inter-rater reliability.
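For two raters labeling the same items, Cohen's kappa is one standard inter-rater reliability measure: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. A minimal sketch, assuming each rater's labels arrive as equal-length lists in the same item order (the function name is illustrative, not any product's API):

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items, in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


print(cohens_kappa(["cat", "cat", "dog", "dog"], ["cat", "dog", "dog", "dog"]))
```

Here the raters agree on 3 of 4 items (p_o = 0.75) but chance alone would give p_e = 0.5, so kappa comes out at 0.5. Raw percent agreement would overstate how reliable these raters are.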

Ideally I'd love to see a project also allow for easy semi-supervised
labeling. I don't see an API for pulling data points for a model you are
training to label and then pushing the labels back.
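One common way to build that loop combines uncertainty sampling (send the items the model is least sure about to humans) with confidence-thresholded pseudo-labels (accept the model's label where it is very sure). A minimal sketch, assuming a hypothetical `predict_proba` callable that maps an item to a `{label: probability}` dict. This is not a Dataturks API, and the budget and threshold values are placeholders:

```python
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]


def split_for_labeling(
    pool: List[str],
    predict_proba: Callable[[str], Probs],  # hypothetical model interface
    human_budget: int,
    auto_threshold: float = 0.95,
) -> Tuple[List[str], Dict[str, str]]:
    """Route unlabeled items to human labelers or to model pseudo-labels.

    - The `human_budget` least-confident items go to humans (uncertainty sampling).
    - Of the rest, items whose top-class probability is >= `auto_threshold`
      receive the model's prediction as a pseudo-label.
    - Everything else stays unlabeled for a later round.
    """
    scored = []
    for item in pool:
        label, confidence = max(predict_proba(item).items(), key=lambda kv: kv[1])
        scored.append((confidence, item, label))
    scored.sort(key=lambda t: t[0])  # least confident first; ties keep pool order

    to_humans = [item for _, item, _ in scored[:human_budget]]
    pseudo_labels = {
        item: label
        for confidence, item, label in scored[human_budget:]
        if confidence >= auto_threshold
    }
    return to_humans, pseudo_labels


fake_model = {
    "a": {"cat": 0.55, "dog": 0.45},
    "b": {"cat": 0.99, "dog": 0.01},
    "c": {"cat": 0.30, "dog": 0.70},
    "d": {"cat": 0.02, "dog": 0.98},
}
print(split_for_labeling(["a", "b", "c", "d"], fake_model.__getitem__, human_budget=1))
```

With a budget of one, item `a` goes to a human, `b` and `d` get pseudo-labels, and `c` waits for the next round. After the human labels come back, you would retrain and repeat.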

~~~
gajju3588
Hey, thanks for checking us out and for the input. API access is getting ready
to launch as I type this. Agreed that rater reliability is very important
here; ultimately it translates into dataset quality. Over time, we plan to
show dataset quality in some form using feedback from dataset users.

------
o_____________o
The "How does it work" section doesn't explain how it works.

> You and your team can now easily collaborate to build ML datasets super
> quick. Send email invite to anyone to help label your datasets, your team,
> friends, colleagues or external labelers. Pre-built support for more than a
> dozen data annotation use cases.

~~~
gajju3588
Hey, thanks for the input. We will certainly work on improving that section.

------
juliend2
Do you guys know if there is any open-source project doing this kind of thing
(UI for image labeling, NLP tagging, classification)? I think I've seen
something like this before.

~~~
convertml
Labelbox?
[https://github.com/Labelbox/Labelbox](https://github.com/Labelbox/Labelbox)

------
amelius
Does Dataturks own the data after uploading/labeling?

Do they keep copies?

That's important because in ML, data is everything.

~~~
gajju3588
Hey, if it's a free plan, the data stays open to the public. But on paid plans
(for larger and enterprise datasets), DataTurks doesn't have any ownership of
the data, and we don't keep any copies.

------
darknoon
Does anyone have a good tool for pixel-level annotation? I was thinking an
iPad app with pencil support might be nice, but open to whatever is out there.

~~~
kozikow
We used Sloth for segmentation labelling (polygons instead of pixels). Direct
pixel labelling is very expensive, so you can usually save a lot of time by
simplifying the problem, for example by labelling polygons instead.
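Polygons are cheaper because a labeller clicks a handful of vertices and the pixel mask is derived from them. A minimal sketch of that derivation using the even-odd (ray casting) point-in-polygon test on pixel centers. This is illustrative only, not Sloth's code; in practice an image library's polygon-fill routine would do the same job faster:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also guarantees y2 != y1 in the division below).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(polygon: List[Point], width: int, height: int) -> List[List[int]]:
    """Binary mask (1 = inside) where a pixel counts if its center is inside the polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0 for col in range(width)]
        for row in range(height)
    ]


for row in rasterize_polygon([(1, 1), (3, 1), (3, 3), (1, 3)], width=4, height=4):
    print(row)
```

Four clicks produce the 2x2 block of ones in the middle of the 4x4 mask. The saving grows quickly with object size, since the number of vertices stays small while the number of pixels does not.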

Although Sloth is a GTK app, it was the best option available a year ago, much
better than the free web-based options available back then. We packaged it into
a single Windows exe that we distributed to labellers in a zip alongside the
data to label. Labellers would send back the zip plus the annotated data in
Sloth's custom JSON format. The zips landed in an S3 folder, and we had some
scripts running analytics on them (e.g. labeller reliability). It was very easy
to extend Sloth and add new functions. Although it seems low-tech compared to
web labelling, it took us a surprising amount of work to beat this workflow
with a purpose-built web-based one.
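A reliability script of that kind could be as simple as measuring how often each labeller agrees with the others on shared items. A minimal sketch, assuming a hypothetical flattened JSON format of `{"labeller", "item", "label"}` records (Sloth's real annotation schema is different and would need converting first):

```python
import json
from collections import defaultdict
from typing import Dict


def labeller_agreement(annotations_json: str) -> Dict[str, float]:
    """Per-labeller pairwise agreement with the other labellers on shared items.

    Input: a JSON list of {"labeller": str, "item": str, "label": str} records
    (hypothetical format). A later record for the same labeller and item
    overwrites an earlier one. Labellers who never overlap with anyone are omitted.
    """
    by_item: Dict[str, Dict[str, str]] = defaultdict(dict)  # item -> {labeller: label}
    for record in json.loads(annotations_json):
        by_item[record["item"]][record["labeller"]] = record["label"]

    matches: Dict[str, int] = defaultdict(int)
    pairs: Dict[str, int] = defaultdict(int)
    for labels in by_item.values():
        for labeller, label in labels.items():
            for other, other_label in labels.items():
                if other != labeller:
                    pairs[labeller] += 1
                    matches[labeller] += int(label == other_label)
    return {name: matches[name] / pairs[name] for name in pairs}


records = [
    {"labeller": "ann", "item": "img1", "label": "cat"},
    {"labeller": "bob", "item": "img1", "label": "cat"},
    {"labeller": "cy", "item": "img1", "label": "dog"},
    {"labeller": "ann", "item": "img2", "label": "dog"},
    {"labeller": "bob", "item": "img2", "label": "dog"},
    {"labeller": "cy", "item": "img2", "label": "dog"},
]
print(labeller_agreement(json.dumps(records)))
```

On this toy data, `ann` and `bob` agree with their peers 75% of the time and `cy` only 50%, which is the kind of signal that flags a labeller for review. Pairwise agreement ignores chance agreement, so a kappa-style correction is worth adding once the label distribution is skewed.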

------
Eridrus
Why do so few of these systems have integration with something like Mechanical
Turk? Labeling your own data is a good way to get started, but often it's much
better to just pay some people to annotate the data.

~~~
mrgordon
[https://www.figure-eight.com/introducing-instance-based-pixel-labeling/](https://www.figure-eight.com/introducing-instance-based-pixel-labeling/)

Yes, Figure Eight (formerly CrowdFlower) does a ton of this kind of thing.

[https://m.youtube.com/watch?v=wxi2dInWDnI](https://m.youtube.com/watch?v=wxi2dInWDnI)

------
saltandvinegar
For work use, I'd need to host it on-prem. Otherwise, even if it's great, legal
etc. won't let me use it.

~~~
Riegerb
For on-premise data support, it may be worth considering Labelbox
([https://labelbox.io](https://labelbox.io)) as well.

Disclosure: I work at Labelbox.

