Show HN: Audino – Open-Source Audio and Speech Annotation Tool (github.com/midas-research)
123 points by manrajsingh on June 11, 2020 | 40 comments



You're likely to find people using this to build solutions for remote depositions (in the US). That space seems fairly ripe for disruption, and the pandemic is intensifying the demand.

Is there a recorded demo of it somewhere? Would be nice to see it in action as I'm having a little trouble understanding the workflow.


We're working on a recorded demo. For now, the tutorial section explains the workflow well (with screenshots).



To be honest, it's a happy coincidence. The name is an amalgamation of audio and annotation.


Audino (Japanese: タブンネ Tabunne) is a Normal-type Pokémon introduced in Generation V.

While it is not known to evolve into or from any other Pokémon, Audino can Mega Evolve into Mega Audino using the Audinite.


This is incredible!

While we were trying to build up a corpus of transcription data for our company, I often thought we should build something like this and open-source it. We ended up building a one-off hacked-up thing instead, but I'm really glad this exists for anyone who finds themselves in our shoes in the future.

Annotating speech data is super tedious and anything that improves the process even 10% is a huge, huge win.


We used to do the same, which is why we developed this tool: to mitigate a lot of the pain points that earlier tools brought. Thanks for sharing your experience, and we'd love to hear how this tool works out for you.


The name is incredibly easy to confuse with Arduino.

Maybe something like ‘audinote’ would not only be less confusing, but also clearer about what the app does?


How does this compare to ELAN (https://archive.mpi.nl/tla/elan) in regards to doing the actual annotations/transcriptions? Or could ELAN/EAF-files perhaps be considered for input formats in future releases?


What are some example usages of this?

I could not really tell right away looking at the docs. Why would I want to use this?


At our lab, we work extensively on problems that involve speech data. This includes tasks like speech recognition, speech scoring, emotion recognition, topic detection, and speaker diarisation. Some of these tasks have public data available, while for tasks like speech scoring and low-resource speech recognition, the data available for supervised learning is fairly limited. Hence, we developed this annotation tool to generate corpora for our needs.


In case it's still not clear: it does not do the transcription. It does not. Oh hi Mark. It asks you to manually annotate the audio (in case you want to prepare a training data set for your algorithm); it's not an AI algorithm itself.


This is the most helpful comment here. I still don’t understand what the tool is for though. Up until now I assumed it would allow me to get automatic transcriptions, including breaking them down by speaker.


I was looking into that space recently, and I've used otter.ai for transcriptions; it gives you 6000 minutes/month for 8 USD, which is insanely cheap in that space. Their British English model is quite good as well.

I’ve bulk-exported the generated srt/vtt files from my favourite podcasts, and I'm using tinysearch (which was posted here recently) together with AblePlayer to provide full-text audio search of my Jekyll-published podcast posts, with clickable timestamps that jump playback to the matched phrase.

Whenever I want to know what a podcaster has to say on a specific subject, a quick search makes such a difference!
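
For anyone who wants to replicate the indexing step, the gist is just flattening the cue files into timestamped records. A rough sketch in Python (cue parsing is simplified, and the output format is whatever your search tool expects):

    import json
    import re
    from pathlib import Path

    # Match a VTT cue: a timestamp line followed by the cue text.
    # Simplified: assumes HH:MM:SS.mmm timestamps and blank-line-separated cues.
    CUE = re.compile(
        r"(\d{2}:\d{2}:\d{2})\.\d{3} --> [^\n]*\n(.+?)(?:\n\n|\Z)", re.S
    )

    records = []
    for path in Path("transcripts").glob("*.vtt"):
        for start, body in CUE.findall(path.read_text(encoding="utf-8")):
            records.append({
                "episode": path.stem,
                "t": start,                      # timestamp to seek the player to
                "text": " ".join(body.split()),  # cue text, whitespace-normalised
            })

    # Feed these records to the search index; the playback link just needs
    # the episode plus the timestamp, e.g. episode.html#t=00:12:34
    Path("index.json").write_text(json.dumps(records))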


Awesome. Thanks for the info. I look forward to trying out your suggestion.


So this tool is mostly a way to store your dataset?

E.g. things like forced alignment would be done in other tools, and then the API used to put the results into the dataset?


Is there an API for getting the data to annotate into that app, and for getting the annotations out?

Most of the time I need the labels in my own system, and don't want to manually move data back and forth.


Yes! To add data to your project, we provide an API key and an endpoint to upload the data.

To export the data, there's an option in the admin panel from which you can download everything in JSON format.

Please check the tutorials section for more details: https://github.com/midas-research/audino/blob/master/docs/tu...
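
Roughly, an upload looks something like this (a sketch only; the endpoint path, header, and field names here are illustrative, so check the tutorial for the exact ones):

    import requests

    API_URL = "http://localhost/api/data"  # illustrative endpoint path
    API_KEY = "your-project-api-key"       # generated per project in the admin panel

    # Upload one audio file along with a reference transcript for annotators.
    with open("sample.wav", "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": API_KEY},  # header name is illustrative
            files={"audio_file": audio},
            data={
                "reference_transcription": "hello world",
                "username": "annotator1",
            },
        )
    response.raise_for_status()
    print(response.json())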


Great. Will you consider adding an API for the JSON export as well?


If I had a subtitle file to use as best guesses for sentence segmentation, could this help extract clips and clean up start and end alignment?


Interesting use case! Currently, the tool allows creating data points along with reference transcripts. From what I understand, you want to fix the subtitle start and end times while keeping the transcription for each segment the same. If so, we plan to add an enhancement where you can pass in annotations, i.e. segments with transcripts. This should solve your use case.
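
In the meantime, turning an SRT file into that kind of segment list is straightforward. A rough sketch (the field names in the output are illustrative, since the feature isn't finalised):

    import re

    # "00:00:01,000" -> seconds; SRT uses comma-separated milliseconds.
    TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def to_seconds(h, m, s, ms):
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

    def parse_srt(path):
        """Parse an SRT file into segments with start/end times and text."""
        segments = []
        blocks = open(path, encoding="utf-8").read().strip().split("\n\n")
        for block in blocks:
            lines = block.splitlines()
            # lines[0] is the cue index; lines[1] is the timing line,
            # e.g. "00:00:01,000 --> 00:00:04,000"; the rest is the text.
            start, end = TIME.findall(lines[1])[:2]
            segments.append({
                "start_time": to_seconds(*start),
                "end_time": to_seconds(*end),
                "transcription": " ".join(lines[2:]),
            })
        return segments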


That would be awesome. My hacky solution was a waveform with start and end sliders. It would just iterate through the clips, and you could accept, reject, or modify the times and text.


FWIW, changing the version to "3" in the compose file was necessary to get it to build with the latest release of docker-ce.


Strange. Can you open an issue with the logs and your Docker version?


The README.md file could use an image. I recommend the one from this page: https://github.com/midas-research/audino/blob/master/docs/tu...


Thank you for your suggestion! We're working hard to get a demo video out and intentionally left space for it. But sure, we can add a placeholder image till then.


This could be cool for analyzing lectures


Maybe it's me, but I was expecting this to be something to do with Arduino, given the name and the colour they've chosen for the logo.


Branding is hard


True, it took days to decide on the name.


Yeah, it took me a bit of staring at the title to stop reading it as Arduino or a portmanteau of it.


OMG, that list of frontend dependencies is just soul-crushing. How does anyone stay sane using NodeJS?


Actually, it wasn't bad when I looked at it. I've seen much, much worse.

A few font-awesome, testing-library, ESLint, and React imports. Some of those broader libraries have been broken up so you don't have to import the whole enchilada.

But yeah, in a larger project, mixing and matching the versions of some of those components can get tricky. This repo seems reasonable in its direct dependencies; the dependencies of dependencies, on the other hand, can get crazy in any project these days.


yarn.lock is just under half a megabyte, and lists 1461 packages that it installs. (232 of them are second or subsequent versions of the same package, which typically indicates unmaintained software. It has five versions of kind-of, and four versions of ten other packages.)
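
If anyone wants to tally that themselves, here's one rough way to do it (simplified parsing, assuming the standard v1 yarn.lock layout):

    from collections import defaultdict

    # Count how many distinct versions of each package yarn.lock pins.
    versions = defaultdict(set)
    name = None
    for line in open("yarn.lock", encoding="utf-8"):
        if line[:1] not in ("", " ", "#", "\n") and line.rstrip().endswith(":"):
            # Header line, e.g.: kind-of@^3.0.2, kind-of@^3.0.3:
            first = line.split(",")[0].strip().rstrip(":").strip('"')
            name = first.rsplit("@", 1)[0]
        elif name and line.strip().startswith('version "'):
            versions[name].add(line.split('"')[1])

    pinned = sum(len(v) for v in versions.values())
    extra = sum(len(v) - 1 for v in versions.values())
    print(f"{pinned} pinned versions, {extra} duplicates of already-present packages")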


Yeah, that's not great, though I don't think the project in this post has gone off the rails. It's more of an ecosystem problem.


I think you should refer to package.json for the actual direct dependencies. But yes, I agree that the tooling pulls in a lot of transitive dependencies. I'll evaluate and reduce the tool dependencies if possible.

That being said, the gzipped js bundle size is fairly small (under 200kb).


NodeJS == backend


I'm afraid you'll need to revisit this "fact".

The project in question, for instance, only uses NodeJS to build its frontend; the backend is written in Python.

People who want to use React for their frontend _have_ to use NodeJS for the build tooling.


We want to hear more about how the tool works for speech annotation. Please try it and let us know.


We, as in the developer? Or we as in a potential user?

I’m a potential user, and I’d certainly like to hear more from somebody who has tried it.



