
Show HN: Audino – Open-Source Audio and Speech Annotation Tool - manrajsingh
https://github.com/midas-research/audino
======
jcims
You're likely to find people using this to build solutions for remote
depositions (in the US). Seems like a space fairly ripe for disruption, and
the pandemic is exacerbating the demand.

Is there a recorded demo of it somewhere? Would be nice to see it in action as
I'm having a little trouble understanding the workflow.

~~~
manrajsingh
We're working on a recorded demo. For now, the tutorial section explains the
workflow well (with screenshots).

------
nayuki
Nice choice of Pokémon.
[https://bulbapedia.bulbagarden.net/wiki/Audino_(Pok%C3%A9mon...](https://bulbapedia.bulbagarden.net/wiki/Audino_\(Pok%C3%A9mon\))

~~~
manrajsingh
To be honest, it's a happy coincidence. The name is an amalgamation of audio
and annotation.

------
wgerard
This is incredible!

While we were trying to build up a corpus of transcription data for our
company, I often thought we should build something like this and open-source
it. We ended up building a one-off hacked-up thing instead, but I'm really
glad this exists for anyone who ends up in our shoes in the future.

Annotating speech data is super tedious and anything that improves the process
even 10% is a huge, huge win.

~~~
manrajsingh
We used to do the same, which is why we developed this tool: to mitigate a
lot of the pain points that previous tools brought. Thanks for sharing your
experience, and we'd love to hear how this tool works out for you.

------
lostgame
The name is _incredibly_ easy to confuse with Arduino.

Maybe even ‘audinote’ would be not only less confusing, but also clearer
about what the app does?

------
blipmusic
How does this compare to ELAN
([https://archive.mpi.nl/tla/elan](https://archive.mpi.nl/tla/elan)) with
regard to doing the actual annotations/transcriptions? Or could ELAN/EAF
files perhaps be considered as input formats in future releases?

------
tmaly
What are some example usages of this?

I could not really tell right away looking at the docs. Why would I want to
use this?

~~~
manrajsingh
At our lab, we work extensively on problems that involve speech data,
including tasks like speech recognition, speech scoring, emotion recognition,
topic detection, and speaker diarisation. Some of these tasks have public
data available, while for others, like speech scoring and low-resource speech
recognition, the data available for supervised learning is fairly limited.
Hence, we developed this annotation tool to generate corpora for our needs.

~~~
mkagenius
In case it's still not clear: it does not do the transcription. It does not.
Oh hi Mark. It asks you to manually annotate the audio (in case you want to
prepare a training data set for your algorithm); it's not an AI algorithm.

~~~
jtbayly
This is the most helpful comment here. I still don’t understand what the tool
is for though. Up until now I assumed it would allow me to get automatic
transcriptions, including breaking them down by speaker.

~~~
fluential
I was looking into that space recently and have used otter.ai for
transcriptions, which gives you 6000 minutes/month for 8 USD, insanely cheap
for that space. Their British English model is quite good as well.

I’ve bulk-exported the generated srt/vtt files from my favourite podcasts,
and by combining tinysearch (which was posted here recently) with AblePlayer,
I provide full-text audio search of my Jekyll-published podcast posts, with
clickable timestamps that jump playback to the search phrase.

Whenever I want to know what a podcaster has to say on a specific subject, a
quick search makes such a difference!
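(Not part of Audino, and not the commenter's actual setup, but a rough sketch of the indexing step described above: parsing SRT cues into searchable (timestamp, text) entries takes only a few lines. Function and field names here are my own.)

```python
import re

def parse_srt(srt_text):
    """Parse SRT subtitle text into (start, end, text) cues.

    Timestamps are kept as strings; multi-line cue text is joined
    into a single line for searching.
    """
    cues = []
    # Blocks are separated by blank lines: index, timing line, text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        m = re.match(
            r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})",
            lines[1],
        )
        if not m:
            continue
        text = " ".join(lines[2:]).strip()
        cues.append((m.group(1), m.group(2), text))
    return cues

def search_cues(cues, phrase):
    """Return cues whose text contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [c for c in cues if phrase in c[2].lower()]

sample = """\
1
00:00:01,000 --> 00:00:04,000
Welcome to the show.

2
00:00:04,500 --> 00:00:09,000
Today we talk about speech annotation.
"""

cues = parse_srt(sample)
hits = search_cues(cues, "annotation")
for start, end, text in hits:
    print(start, text)
```

From the matched cue's start timestamp, a player like AblePlayer can seek straight to that point in the audio.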

~~~
jtbayly
Awesome. Thanks for the info. I look forward to trying out your suggestion.

------
jononor
Is there an API for getting the data to annotate into that app, and for
getting the annotations out?

Most of the time I need the labels in my own system, and don't want to
manually move data back and forth.

~~~
manrajsingh
Yes! To add data to your project, we provide an API key and an endpoint to
upload the data.

To export the data, there's an option on the admin panel from where you can
download it in JSON format.

Please check the tutorials section for more details:
[https://github.com/midas-research/audino/blob/master/docs/tu...](https://github.com/midas-research/audino/blob/master/docs/tutorial.md)
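(A rough sketch of what assembling such an upload call could look like. The endpoint path, header name, and form fields below are assumptions for illustration, not Audino's documented API; the tutorial in the repo has the real details.)

```python
# Hypothetical base URL and field names -- check Audino's tutorial docs
# for the actual endpoint and parameters.
API_BASE = "http://localhost:80/api"

def build_upload_request(api_key, reference_transcription="", username="admin"):
    """Assemble the URL, headers, and form fields for an audio upload.

    Actually sending it would then be something like:
        requests.post(url, headers=headers, data=fields,
                      files={"audio_file": open("clip.wav", "rb")})
    """
    url = f"{API_BASE}/audio"
    headers = {"Authorization": api_key}
    fields = {
        "reference_transcription": reference_transcription,
        "username": username,
    }
    return url, headers, fields

url, headers, fields = build_upload_request("my-api-key")
print(url)
```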

~~~
jononor
Great. Will you consider adding an API for the JSON export as well?

------
TACIXAT
If I had a subtitle file to use as best guesses for sentence segmentation,
could this help extract clips and clean up start and end alignment?

~~~
manrajsingh
Interesting use case! Currently, the tool allows creation of data points
along with reference transcripts. From what I understand, you wish to fix the
subtitle start and end times while keeping the transcription for that segment
the same. If so, we plan to add an enhancement where you can pass in
annotations, i.e. segments with transcripts. This should solve your use case.

~~~
TACIXAT
That would be awesome. My hacky solution was a waveform and start and end
sliders. It would just iterate through and you could accept, reject, or modify
the times and text.

------
jcims
FWIW changing version to "3" in the compose file was necessary to get it to
build with the latest release of docker-ce.
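(For anyone hitting the same thing, the change is just the top-level schema key in docker-compose.yml; everything under it stays as shipped in the repo.)

```yaml
# docker-compose.yml -- only this top-level key needs to change;
# the services defined below it remain exactly as in the repo.
version: "3"
```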

~~~
manrajsingh
Strange. Can you open an issue with the logs and your Docker version?

------
donpark
The README.md file could use an image. I recommend the one from this page:
[https://github.com/midas-research/audino/blob/master/docs/tu...](https://github.com/midas-research/audino/blob/master/docs/tutorials/annotation-dashboard.md)

~~~
manrajsingh
Thank you for your suggestion! We're working hard to get a demo video out and
intentionally left space for it. But sure, we can add a placeholder image
until then.

------
sheeeep86
This could be cool for analyzing lectures

------
4ndrewl
Maybe it's me, but I was expecting this to be something to do with Arduino,
given the name and the colour they've chosen for the logo.

~~~
rockwotj
Branding is hard

~~~
manrajsingh
True, it took days to decide on the name.

------
classified
OMG, that list of frontend dependencies is just soul-crushing. How does anyone
stay sane using NodeJS?

~~~
masonhensley
Actually, it wasn’t bad when I looked at it. I’ve seen ones five times worse.

A few font-awesome, testing-library, ESLint, and React imports. Some of those
broader libraries have been broken up so you don’t have to import the whole
enchilada.

But yeah, in a larger project, mixing and matching the versions of some of
those components can get tricky. This repo seems reasonable in its direct
dependencies; the dependencies of dependencies, on the other hand, can be
crazy in any project these days.

~~~
chrismorgan
yarn.lock is just under half a megabyte, and lists 1461 packages that it
installs. (232 of them are second or subsequent versions of the same package,
which typically indicates unmaintained software. It has five versions of
kind-of, and four versions each of ten other packages.)
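(For anyone who wants to reproduce this kind of count, here's a rough sketch, my own helper rather than any official tool, that tallies resolved entries in a v1 yarn.lock and flags names installed at more than one version.)

```python
from collections import Counter

def count_yarn_packages(lock_text):
    """Count resolved entries in a yarn.lock (v1) and find duplicate names.

    Each entry starts at column 0 with one or more "name@range" keys and
    ends with a colon; the package name is everything before the last '@'
    in the first key (which also handles scoped names like @babel/core).
    """
    names = []
    for line in lock_text.splitlines():
        if line and not line.startswith((" ", "#")) and line.rstrip().endswith(":"):
            first_key = line.rstrip(":").split(",")[0].strip().strip('"')
            names.append(first_key.rsplit("@", 1)[0])
    counts = Counter(names)
    duplicates = {n: c for n, c in counts.items() if c > 1}
    return len(names), duplicates

sample_lock = '''\
kind-of@^3.0.2:
  version "3.2.2"

kind-of@^6.0.0, kind-of@^6.0.2:
  version "6.0.3"

lodash@^4.17.15:
  version "4.17.21"
'''

total, duplicates = count_yarn_packages(sample_lock)
print(total, duplicates)
```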

~~~
masonhensley
Yeah, that's not great, but I don't think the project in this post has gone
off the rails. It's more of an ecosystem problem.

~~~
manrajsingh
I think you should refer to package.json for the actual direct dependencies.
But yes, I agree that the tooling pulls in a lot of transitive dependencies.
I'll evaluate and reduce them where possible.

That being said, the gzipped JS bundle size is fairly small (under 200 KB).

------
yamank
We want to hear more about the tool for speech annotation. Please try and let
us know

~~~
jtbayly
We, as in the developer? Or we as in a potential user?

I’m a potential user, and I’d certainly like to hear more from somebody who
has tried it.

