Show HN: Audino – Open-Source Audio and Speech Annotation Tool (github.com/midas-research)
123 points by manrajsingh on June 11, 2020 | 40 comments



You're likely to find people using this to build solutions for remote depositions (in the US). That space seems fairly ripe for disruption, and the pandemic is intensifying the demand.

Is there a recorded demo of it somewhere? Would be nice to see it in action as I'm having a little trouble understanding the workflow.


We're working on a recorded demo. For now, the tutorial section explains the workflow well (with screenshots).



To be honest, it's a happy coincidence. The name is an amalgamation of audio and annotation.


Audino (Japanese: タブンネ Tabunne) is a Normal-type Pokémon introduced in Generation V.

While it is not known to evolve into or from any other Pokémon, Audino can Mega Evolve into Mega Audino using the Audinite.


This is incredible!

While we were trying to build up a corpus of transcription data for our company, I often thought we should build something like this and open-source it. We ended up building a one-off hacked-up thing instead, but I'm really glad this exists for anyone who finds themselves in our shoes in the future.

Annotating speech data is super tedious and anything that improves the process even 10% is a huge, huge win.


We used to do the same, which is why we developed this tool: to mitigate a lot of the pain points that earlier tools brought. Thanks for sharing your experience, and we'd love to hear how this tool works out for you.


The name is incredibly easy to confuse with Arduino.

Maybe something like ‘audinote’ would not only be less confusing, but also clearer about what the app does?


How does this compare to ELAN (https://archive.mpi.nl/tla/elan) in regards to doing the actual annotations/transcriptions? Or could ELAN/EAF-files perhaps be considered for input formats in future releases?


What are some example usages of this?

I could not really tell right away looking at the docs. Why would I want to use this?


At our lab, we work extensively on problems that involve speech data. This includes tasks like speech recognition, speech scoring, emotion recognition, topic detection, and speaker diarisation. Some of these tasks have public data available, while for tasks like speech scoring and low-resource speech recognition, the data available for supervised learning is fairly limited. Hence, we developed this annotation tool to generate corpora for our needs.


In case it's still not clear: it does not do the transcription. It does not. Oh hi Mark. It asks you to manually annotate the audio (in case you want to prepare a training data set for your algorithm); it's not an AI algorithm itself.


This is the most helpful comment here. I still don’t understand what the tool is for though. Up until now I assumed it would allow me to get automatic transcriptions, including breaking them down by speaker.


I was looking into that space recently, and I've used otter.ai for transcriptions; it gives you 6000 minutes/month for 8 USD, which is insanely cheap in that space. Their British English model is quite good as well.

I’ve bulk-exported the generated srt/vtt files from my favourite podcasts, and I'm using tinysearch (which was posted here recently) together with AblePlayer to provide full-text audio search of my Jekyll-published podcast posts, with clickable timestamps that jump playback to the matched phrase.

Whenever I want to know what a podcaster has to say on a specific subject, a quick search makes such a difference!
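
For anyone who wants to replicate the indexing step, the gist is just flattening the cue files into timestamped records. A rough sketch in Python (cue parsing is simplified, and the output format is whatever your search tool expects):

    import json
    import re
    from pathlib import Path

    # Match a VTT cue: a timestamp line followed by the cue text.
    # Simplified: assumes HH:MM:SS.mmm timestamps and blank-line-separated cues.
    CUE = re.compile(
        r"(\d{2}:\d{2}:\d{2})\.\d{3} --> [^\n]*\n(.+?)(?:\n\n|\Z)", re.S
    )

    records = []
    for path in Path("transcripts").glob("*.vtt"):
        for start, body in CUE.findall(path.read_text(encoding="utf-8")):
            records.append({
                "episode": path.stem,
                "t": start,                      # timestamp to seek the player to
                "text": " ".join(body.split()),  # cue text, whitespace-normalised
            })

    # Feed these records to the search index; the playback link just needs
    # the episode plus the timestamp, e.g. episode.html#t=00:12:34
    Path("index.json").write_text(json.dumps(records))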


Awesome. Thanks for the info. I look forward to trying out your suggestion.


So this tool is mostly a way to store your dataset?

E.g. things like forced alignment would be done in other tools, and then the API used to put the results into the dataset?


Is there an API for getting the data to annotate into that app, and for getting the annotations out?

Most of the time I need the labels in my own system, and don't want to manually move data back and forth.


Yes! To add data to your project, we provide an API key and an endpoint to upload the data.

To export the data, there's an option in the admin panel from which you can download everything in JSON format.

Please check the tutorials section for more details: https://github.com/midas-research/audino/blob/master/docs/tu...
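
Roughly, an upload looks something like this (a sketch only; the endpoint path, header, and field names here are illustrative, so check the tutorial for the exact ones):

    import requests

    API_URL = "http://localhost/api/data"  # illustrative endpoint path
    API_KEY = "your-project-api-key"       # generated per project in the admin panel

    # Upload one audio file along with a reference transcript for annotators.
    with open("sample.wav", "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": API_KEY},  # header name is illustrative
            files={"audio_file": audio},
            data={
                "reference_transcription": "hello world",
                "username": "annotator1",
            },
        )
    response.raise_for_status()
    print(response.json())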


Great. Will you consider adding an API for the JSON export as well?


If I had a subtitle file to use as best guesses for sentence segmentation, could this help extract clips and clean up start and end alignment?


Interesting use case! Currently, the tool allows creating data points along with reference transcripts. From what I understand, you want to fix the subtitle start and end times while keeping the transcription for each segment the same. If so, we plan to add an enhancement where you can pass in annotations, i.e. segments with transcripts. This should solve your use case.
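
In the meantime, turning an SRT file into that kind of segment list is straightforward. A rough sketch (the field names in the output are illustrative, since the feature isn't finalised):

    import re

    # "00:00:01,000" -> seconds; SRT uses comma-separated milliseconds.
    TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def to_seconds(h, m, s, ms):
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

    def parse_srt(path):
        """Parse an SRT file into segments with start/end times and text."""
        segments = []
        blocks = open(path, encoding="utf-8").read().strip().split("\n\n")
        for block in blocks:
            lines = block.splitlines()
            # lines[0] is the cue index; lines[1] is the timing line,
            # e.g. "00:00:01,000 --> 00:00:04,000"; the rest is the text.
            start, end = TIME.findall(lines[1])[:2]
            segments.append({
                "start_time": to_seconds(*start),
                "end_time": to_seconds(*end),
                "transcription": " ".join(lines[2:]),
            })
        return segments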


That would be awesome. My hacky solution was a waveform with start and end sliders. It would just iterate through the clips, and you could accept, reject, or modify the times and text.


FWIW, changing the version to "3" in the compose file was necessary to get it to build with the latest release of docker-ce.


Strange. Can you open an issue with the logs and your Docker version?


The README.md file could use an image. I recommend the one from this page: https://github.com/midas-research/audino/blob/master/docs/tu...


Thank you for your suggestion! We're working hard to get a demo video out and intentionally left space for it. But sure, we can add a placeholder image till then.


This could be cool for analyzing lectures


Maybe it's me, but I was expecting this to be something to do with Arduino, given the name and the colour they've chosen for the logo.


Branding is hard


True, it took days to decide on the name.


Yeah, it took me a bit of staring at the title to stop reading it as Arduino or a portmanteau of it.


OMG, that list of frontend dependencies is just soul-crushing. How does anyone stay sane using NodeJS?


Actually, it wasn't bad when I looked at it. I've seen much, much worse.

A few font-awesome, testing-library, ESLint, and React imports. Some of those broader libraries have been broken up so you don't have to import the whole enchilada.

But yeah, in a larger project, mixing and matching the versions of some of those components can get tricky. This repo seems reasonable in its direct dependencies; the dependencies of dependencies, on the other hand, can get crazy in any project these days.


yarn.lock is just under half a megabyte, and lists 1461 packages that it installs. (232 of them are second or subsequent versions of the same package, which typically indicates unmaintained software. It has five versions of kind-of, and four versions of ten other packages.)
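
If anyone wants to tally that themselves, here's one rough way to do it (simplified parsing, assuming the standard v1 yarn.lock layout):

    from collections import defaultdict

    # Count how many distinct versions of each package yarn.lock pins.
    versions = defaultdict(set)
    name = None
    for line in open("yarn.lock", encoding="utf-8"):
        if line[:1] not in ("", " ", "#", "\n") and line.rstrip().endswith(":"):
            # Header line, e.g.: kind-of@^3.0.2, kind-of@^3.0.3:
            first = line.split(",")[0].strip().rstrip(":").strip('"')
            name = first.rsplit("@", 1)[0]
        elif name and line.strip().startswith('version "'):
            versions[name].add(line.split('"')[1])

    pinned = sum(len(v) for v in versions.values())
    extra = sum(len(v) - 1 for v in versions.values())
    print(f"{pinned} pinned versions, {extra} duplicates of already-present packages")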


Yeah, that's not great, though I don't think the project in this post has gone off the rails. It's more of an ecosystem problem.


I think you should refer to package.json for the actual direct dependencies. But yes, I agree that the tooling pulls in a lot of transitive dependencies. I'll evaluate and reduce the tool dependencies if possible.

That being said, the gzipped js bundle size is fairly small (under 200kb).


NodeJS == backend


I'm afraid you'll need to revisit this "fact".

The project in question, for instance, only uses NodeJS to build its frontend; the backend is written in Python.

People who want to use React for their frontend _have_ to use NodeJS for the build tooling.


We want to hear more about how the tool works for speech annotation. Please try it and let us know.


We, as in the developer? Or we as in a potential user?

I’m a potential user, and I’d certainly like to hear more from somebody who has tried it.



