
Show HN: UniversalDataTool – open-source collaborative data labeling - seveibar
https://github.com/UniversalDataTool/universal-data-tool
======
a_c
May I suggest adding some sample datasets to the demo? I'm a bit confused as
to what the tool can do, as I don't work with data frequently.

~~~
seveibar
Absolutely, I will add some larger, easier-to-use datasets.

There is a button to import cat images under "Samples -> Import -> Import Cat
Images", which you may find interesting for testing computer vision stuff.

------
rg2004
To make this more effective, add keyboard shortcuts.

~~~
seveibar
For sure. A couple of seconds saved per sample will amount to many hours of
saved time, even for a small dataset.

I created an issue to track interest in keyboard shortcuts, but I've heard
this feedback from the friends who beta-tested it, so I think it's got to be
next on the agenda :)

https://github.com/UniversalDataTool/universal-data-tool/issues/10

------
djohnston
Very cool. I was wondering what open-source data annotation tooling existed,
and coincidentally this popped up at the same time.

------
erezsh
Does it have an API for adding labels programmatically?

I couldn't find any documentation, or even installation instructions.

------
mkl
> Scales to tens of thousands of data points per dataset

Can you clarify what this means? That sounds pretty small.

~~~
seveibar
You can load tens of thousands of images into the web app at a time.
Unfortunately, beyond 100,000 I start to see performance issues, especially
with collaboration.

I'd like to fix this in the future, but with a web application these limits
are hard to break through.

~~~
mkl
Yes, that should definitely be fixable. I have a simple Flask image labeling
app (whole images, not items within them like this), and it has no upper
limit. Millions is just as quick and easy as dozens. It's so primitive I don't
even use a DB, just one flat file per 10,000 images, which can then easily be
processed with grep, etc.
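The flat-file approach described above could look roughly like the sketch below. The actual app's code wasn't shared, so the tab-separated line format, the `labels_NNNNN.tsv` file names, and the helpers `shard_path`, `append_label`, and `load_labels` are all hypothetical. It only illustrates the idea of one plain-text file per 10,000 images with no database.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not the original app's format):
# one tab-separated text file per block of 10,000 images.
SHARD_SIZE = 10_000


def shard_path(root: Path, image_index: int) -> Path:
    """Return the flat file that holds labels for image number `image_index`."""
    return root / f"labels_{image_index // SHARD_SIZE:05d}.tsv"


def append_label(root: Path, image_index: int, image_name: str, label: str) -> None:
    """Append one 'index<TAB>name<TAB>label' line to the image's shard file.

    Assumes names and labels contain no tabs or newlines.
    """
    path = shard_path(root, image_index)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{image_index}\t{image_name}\t{label}\n")


def load_labels(root: Path) -> dict[int, tuple[str, str]]:
    """Read every shard. A relabeled image appears twice in the same shard,
    so the later line overwrites the earlier one in the dict."""
    labels: dict[int, tuple[str, str]] = {}
    for path in sorted(root.glob("labels_*.tsv")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                idx, name, label = line.rstrip("\n").split("\t")
                labels[int(idx)] = (name, label)
    return labels


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        append_label(root, 0, "cat_0001.jpg", "cat")
        append_label(root, 12_345, "dog_12345.jpg", "dog")
        # Image 12,345 lands in the second shard (12345 // 10000 == 1).
        print(sorted(p.name for p in root.iterdir()))
```

Appending instead of rewriting keeps each write cheap regardless of dataset size, and since every shard is ordinary text, tallies such as "how many images are labeled cat" can be done with standard tools like grep.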

------
kwerk
How does this compare with Label Studio or Doccano?

