
Ask HN: Machine learning resources for audio processing - samrohn
What are some good resources for learning audio processing, sound detection, and anomaly detection using machine learning or deep learning? I am interested in predictive maintenance for machinery using audio anomaly detection.
======
citilife
There's a good class at UIUC regarding signal processing:

[https://courses.engr.illinois.edu/cs598ps/fa2018/material.ht...](https://courses.engr.illinois.edu/cs598ps/fa2018/material.html)

The course is led by Paris Smaragdis, one of the top researchers in the field of
audio processing.

------
sdenton4
The folks behind AudioSet have been working on general audio event detection
for some years now, I believe.

[https://research.google.com/audioset/](https://research.google.com/audioset/)

There's a huge amount to discuss in the audio domain, but as a starting point,
using a ResNet on spectrograms to build a binary classifier is a good way to
begin.
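The first half of that pipeline (turning raw audio into a spectrogram "image" a CNN such as ResNet can consume) can be sketched with numpy alone. This is a minimal illustration, not the commenter's setup: `log_spectrogram`, the FFT size, hop length and sample rate are assumptions, and the ResNet classifier itself is not shown.

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=256, eps=1e-10):
    """Log-magnitude STFT (hypothetical helper).

    Returns an array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    Assumes len(signal) >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Magnitude of the one-sided FFT of each frame.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; eps avoids log(0) on silent frames.
    return np.log(spectrum + eps).T

# Usage: one second of a 1 kHz tone sampled at an assumed 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
print(spec.shape)  # (257, 61)
```

With a 512-point FFT at 16 kHz each bin is 31.25 Hz wide, so the tone shows up in bin 32. Many people use a mel-scaled spectrogram instead; the linear one is kept here to stay dependency-free.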

------
enisberk
I am taking a course called "Speech and Audio Understanding" from Prof.
Michael I Mandel. You can check the course website [1]; he has a good
collection of resources there. His GitHub stars are also a good collection of
related projects [2]. In class we are using the book "Human and Machine
Hearing: Extracting Meaning from Sound" by Richard F. Lyon, which the author
shares for free [3]. For example, one of the resources you will see on the
course website is the set of presentations from Interspeech 2018; you can
check all the tutorials from there [4].

[1] [http://mr-pc.org/t/csc83060/](http://mr-pc.org/t/csc83060/)

[2] [https://github.com/mim?tab=stars](https://github.com/mim?tab=stars)

[3] [http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf](http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf)

[4] [http://interspeech2018.org/program-tutorials.html](http://interspeech2018.org/program-tutorials.html)

------
am807
Just found this thread on the fast.ai forum yesterday that may help:
[https://forums.fast.ai/t/deep-learning-with-audio-thread/381...](https://forums.fast.ai/t/deep-learning-with-audio-thread/38123)

~~~
zneveu
I just found this too! Great thread for the new Kaggle Freesound competition.

------
Tangokat
I don't know if this is off topic, but would it be possible to remove the
sound of mechanical keyboards from a VOIP stream in real time with ML? Sell
the technology to Discord and profit.

~~~
alphagrep12345
Is this a big problem? I thought people loved mechanical keyboard sounds.

~~~
wpietri
People love _their own_, I'm sure. I've never met anybody who puts on a tape
of it, though.

------
dest
You may be able to reuse some concepts I have described for an audio ad
blocker:
[https://www.adblockradio.com/blog/2018/11/15/designing-audio...](https://www.adblockradio.com/blog/2018/11/15/designing-audio-ad-block-radio-podcast/)

More precisely: spectral preprocessing of the audio, followed by a neural
network such as an LSTM.
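To make the "spectral frames, then LSTM" idea concrete, here is a minimal numpy forward pass of a single-layer LSTM over a sequence of feature frames. It is an illustrative sketch, not the architecture from the linked post: the weights are random and untrained, the stacked gate order (input, forget, candidate, output) is an assumption of this sketch, and random numbers stand in for real log-spectral frames. In practice you would use a framework's LSTM layer and train it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(frames, W, U, b, hidden_size):
    """Run a single-layer LSTM over a (time, features) array.

    W: (4*hidden, features), U: (4*hidden, hidden), b: (4*hidden,).
    Gate order in the stacked weights (assumed): input, forget,
    cell candidate, output. Returns the final hidden state.
    """
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in frames:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden_size])                      # input gate
        f = sigmoid(z[hidden_size:2 * hidden_size])       # forget gate
        g = np.tanh(z[2 * hidden_size:3 * hidden_size])   # candidate cell
        o = sigmoid(z[3 * hidden_size:])                  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
n_frames, n_bins, hidden = 50, 64, 16
# Stand-in for 50 frames of 64-bin log-spectral features.
features = rng.standard_normal((n_frames, n_bins))
W = rng.standard_normal((4 * hidden, n_bins)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
h_last = lstm_forward(features, W, U, b, hidden)
print(h_last.shape)  # (16,)
```

A classifier head (e.g. a logistic layer for "ad / not ad" or "normal / anomalous") would sit on `h_last`; the recurrence is what lets the model use context across frames rather than judging each spectrum in isolation.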

------
williamsmj
I think the slides/recording of this excellent Spotify talk will be posted
shortly: [https://qcon.ai/qconai2019/presentation/deep-learning-audio-...](https://qcon.ai/qconai2019/presentation/deep-learning-audio-signals-prepare-process-design-expect).

------
telesilla
aubio and librosa are two excellent MIR (music information retrieval) tools I
can recommend from personal use. Both can be used for real-time audio
processing with pyaudio or a similar library.

[https://aubio.org/doc/latest/](https://aubio.org/doc/latest/)

[https://librosa.github.io/librosa/](https://librosa.github.io/librosa/)
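One practical detail of real-time use: an audio callback hands you small fixed-size chunks, while most feature extractors expect overlapping frames of a different length. A minimal numpy sketch of that buffering step follows. `FrameBuffer` and `rms_db` are hypothetical helpers, the chunk size, frame length and hop are assumptions, and the stream is simulated from an array rather than wired to pyaudio.

```python
import numpy as np

class FrameBuffer:
    """Collects fixed-size audio chunks (as a real-time callback would
    deliver them) and returns overlapping analysis frames of length
    frame_len, advancing by hop samples per frame."""

    def __init__(self, frame_len=1024, hop=512):
        self.frame_len = frame_len
        self.hop = hop
        self.buf = np.zeros(0, dtype=np.float32)

    def push(self, chunk):
        """Append a chunk; return every complete frame now available."""
        self.buf = np.concatenate([self.buf, chunk.astype(np.float32)])
        frames = []
        while len(self.buf) >= self.frame_len:
            frames.append(self.buf[:self.frame_len].copy())
            self.buf = self.buf[self.hop:]  # keep the overlap
        return frames

def rms_db(frame, eps=1e-12):
    """Frame level in dB relative to full scale (1.0)."""
    return 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + eps)

# Simulated stream: 1 s of noise at 16 kHz arriving in 256-sample chunks.
rng = np.random.default_rng(0)
audio = 0.1 * rng.standard_normal(16000)
fb = FrameBuffer(frame_len=1024, hop=512)
levels = []
for start in range(0, len(audio), 256):
    for frame in fb.push(audio[start:start + 256]):
        levels.append(rms_db(frame))
print(len(levels))  # 30
```

Each emitted frame can then be passed to a per-frame feature extractor (aubio and librosa both provide these); keeping the buffering separate from the feature code makes it easy to test offline on recorded files first.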

~~~
telesilla
Adding to my own comment, here is something I haven't tried yet but plan to:
Urban Sound Classification with Neural Networks in TensorFlow

[https://www.kdnuggets.com/2016/09/urban-sound-classification...](https://www.kdnuggets.com/2016/09/urban-sound-classification-neural-networks-tensorflow.html)

------
konsoleXD
I am also curious about this topic! I have picked up a Jetson Nano and fully
intend to put it to use by projecting comic-book-panel-style speech bubbles
(plus, who knows... random panels?) on the wall using PyTorch +
DeepSpeech.

That's at least the idea kicking around in my head at the moment.
[https://github.com/SeanNaren/deepspeech.pytorch](https://github.com/SeanNaren/deepspeech.pytorch)

I'm no expert. Haven't done it. Don't really want to send every convo into the
cloud or my tinfoil hat will start burning.

You don't need a Jetson to start investigating, though you may need an NVIDIA
GPU for that particular library. If you find something, maybe you can let me
know somehow.

Peace

------
devin
[https://github.com/ybayle/awesome-deep-learning-music](https://github.com/ybayle/awesome-deep-learning-music):
a "Non-exhaustive list of scientific articles on deep learning for music"

------
tixocloud
Here's a resource that breaks down the various audio processing tasks and
provides case studies: [https://www.analyticsvidhya.com/blog/2018/01/10-audio-proces...](https://www.analyticsvidhya.com/blog/2018/01/10-audio-processing-projects-applications/)

It's slightly academic, so here's a more practical resource:
[https://towardsdatascience.com/audio-classification-using-fa...](https://towardsdatascience.com/audio-classification-using-fastai-and-on-the-fly-frequency-transforms-4dbe1b540f89)

------
ransom1538
I would get lunch with these guys:

[https://www.audiblemagic.com/](https://www.audiblemagic.com/)

These sketch balls can use your phone's mic to detect what is streaming in a
living room.

------
contingencies
Recently I started looking into this as a backup method of anomaly detection
while performing automated testing of our robotics. I concluded that it's
actually pretty easy. Depending on how simple your requirements are, you can
even achieve this cheaply and effectively on a very tiny microprocessor with
an attached surface-mount MEMS microphone. Additional features like recording
anomalous audio, timestamping, and alert transmission are not that hard
either. There's no need for a full general-purpose operating system or
complex algorithms.
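As an illustration of how simple such a detector can be, here is a sketch of one common non-deep-learning approach: learn the mean and spread of per-band log energies from known-healthy audio, then flag frames whose energy in any band deviates by more than a threshold number of standard deviations. This is not the commenter's implementation; the band count, threshold, sample rate and the synthetic "hum" and "whine" signals are all hypothetical, and the numpy code is a prototype you would port to fixed-point C on a microcontroller.

```python
import numpy as np

def band_energies(frame, n_bands=8):
    """Log energy in n_bands equal-width frequency bands of one frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-12)

def fit_baseline(frames):
    """Per-band mean and standard deviation over healthy frames."""
    feats = np.array([band_energies(f) for f in frames])
    return feats.mean(axis=0), feats.std(axis=0) + 1e-6

def anomaly_score(frame, mean, std):
    """Largest absolute z-score across bands."""
    return np.max(np.abs((band_energies(frame) - mean) / std))

sr, n = 16000, 1024
rng = np.random.default_rng(0)
t = np.arange(n) / sr

def healthy_frame():
    # Synthetic stand-in for a healthy machine: 200 Hz hum plus noise.
    return np.sin(2 * np.pi * 200 * t) + 0.05 * rng.standard_normal(n)

mean, std = fit_baseline([healthy_frame() for _ in range(100)])
threshold = 6.0  # hypothetical; tune on held-out healthy recordings

normal = healthy_frame()
faulty = healthy_frame() + 0.5 * np.sin(2 * np.pi * 3000 * t)  # new whine
print(anomaly_score(normal, mean, std) > threshold)
print(anomaly_score(faulty, mean, std) > threshold)
```

The per-frame cost is one FFT plus a few dozen arithmetic operations, which is why this kind of baseline fits on very small hardware. It only catches changes in spectral balance, so faults that sound "normal" in aggregate would need richer features.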

------
bjourne
See this book and the sources it links to:
[https://musicinformationretrieval.com/](https://musicinformationretrieval.com/)
Also google for pitch and onset detection. If you want more specific help, you
have to ask a more specific question.
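For onset detection, one widely used baseline is spectral flux: sum the frame-to-frame increases in spectral magnitude and pick the peaks. A minimal numpy sketch follows; it is one technique among several, not necessarily what the linked book recommends, and `spectral_flux_onsets`, the FFT size, hop and peak threshold `delta` are assumptions of this sketch.

```python
import numpy as np

def spectral_flux_onsets(signal, n_fft=1024, hop=512, delta=0.1):
    """Return frame indices where the normalized positive spectral flux
    has a local peak above `delta` (a simple onset detector)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    mags = np.array([np.abs(np.fft.rfft(signal[i * hop:i * hop + n_fft] * window))
                     for i in range(n_frames)])
    # Only increases in magnitude count toward an onset.
    diff = np.diff(mags, axis=0)
    flux = np.concatenate([[0.0], np.maximum(diff, 0).sum(axis=1)])
    flux /= flux.max() + 1e-12
    return [i for i in range(1, len(flux) - 1)
            if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
            and flux[i] > delta]

# Usage: three clicks at 0.25 s, 0.5 s and 0.75 s in 1 s of silence.
sr, hop, n_fft = 16000, 512, 1024
signal = np.zeros(sr)
signal[[4000, 8000, 12000]] = 1.0
frames = spectral_flux_onsets(signal, n_fft=n_fft, hop=hop)
# Convert frame indices to the time at the centre of each frame.
times = [(f * hop + n_fft / 2) / sr for f in frames]
print(frames)
```

Time resolution is limited by the hop (32 ms here), so each detected onset lands within about one hop of the true click time.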

~~~
inetsee
It sounds as if he's looking for tools that can be used to monitor the sounds
coming from machinery to detect or predict impending failures. I found your
link interesting since I'm interested in musical applications of machine
learning, but I don't think it's what he's looking for.

~~~
bjourne
MIR is where the research is at. Not nearly as much work has been done in the
general audio IR domain, but most methods are easily transferable. E.g. tempo
estimation might serve his anomaly detection needs.
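The core of many tempo estimators is finding the dominant repetition period of an energy or onset envelope via autocorrelation. For a machine, that period might correspond to a rotation or stroke cycle, and a drift in it could be worth alerting on. The sketch below assumes the machine produces roughly periodic impulses; the envelope frame rate, pulse spacing and lag search range are hypothetical, and `dominant_period` is an illustrative helper.

```python
import numpy as np

def dominant_period(envelope, min_lag, max_lag):
    """Lag (in envelope samples) with the highest autocorrelation
    within [min_lag, max_lag]."""
    x = envelope - envelope.mean()
    # Full autocorrelation; index len(x) - 1 corresponds to lag 0.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(ac[min_lag:max_lag + 1])])

frame_rate = 100  # envelope frames per second (assumed 10 ms hop)
rng = np.random.default_rng(0)
env = 0.05 * rng.standard_normal(1000)   # 10 s of low-level noise
env[::25] += 1.0                          # a pulse every 0.25 s
lag = dominant_period(env, min_lag=10, max_lag=40)
print(lag, lag / frame_rate)  # 25 0.25
```

The lag range should bracket the expected period: autocorrelation also peaks at multiples of the true period (here 50, 75, ...), and a search range that includes those invites the "octave errors" familiar from tempo estimation.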

------
ml-engineer
There are many great resources to reference here:

[https://www.science.wiki/search?keyword=audio+processing](https://www.science.wiki/search?keyword=audio+processing)

------
iagooar
Contact the founder/maker of Auphonic.com. He's a super nice and clever guy
who does this kind of stuff for a living, and he'll definitely point you in
the right direction.

------
jamesb93
This depends on whether you're interested in creative or analytical (MIR)
applications. The two fields share a lot of techniques, but the way they use
them is wildly different.

------
xylophone
piston aircraft?

