Hacker News
Ask HN: Machine learning resources for audio processing
252 points by samrohn 32 days ago | 26 comments
What are some good learning resources on audio processing, detection and anomaly detection using machine learning or deep learning? I am interested in predictive maintenance of machines using audio anomaly detection.

There's a good class at UIUC regarding signal processing:


The course is led by Paris Smaragdis, one of the top researchers in the field of audio processing.

The folks behind AudioSet have been working on general audio event detection for some years now, I believe.


There's a huge amount to discuss in the audio domain, but a good starting point is using a ResNet on spectrograms to build a binary classifier.
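To make the spectrogram half of that suggestion concrete, here is a minimal sketch in Python with NumPy. The function name log_spectrogram, the frame length of 256 and the hop of 128 are illustrative assumptions, not anything from the comment. The ResNet itself is left out; the array this produces is the kind of 2-D "image" such a classifier would take as input.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return a (frames, frame_len // 2 + 1) log-magnitude spectrogram."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of every frame; eps avoids log(0).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)

# Synthetic test signal (an assumption): one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256  # bin index -> frequency in Hz
```

In practice people often use a mel-scaled spectrogram from a library such as librosa instead of a plain STFT; the plain version is shown here only because it needs nothing beyond NumPy.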

I am taking a course called "Speech and Audio Understanding" with Prof. Michael I. Mandel. The course website [1] has a good collection of resources, and his GitHub stars are a good collection of related projects [2]. In class we are using the book "Human and Machine Hearing: Extracting Meaning from Sound" by Richard F. Lyon, which the author shares for free [3]. One of the resources on the course website is the set of presentations from Interspeech 2018; you can find all the tutorials there [4].

[1] http://mr-pc.org/t/csc83060/

[2] https://github.com/mim?tab=stars

[3] http://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018.pdf

[4] http://interspeech2018.org/program-tutorials.html

Just found this thread on the fast.ai forum yesterday that may help: https://forums.fast.ai/t/deep-learning-with-audio-thread/381...

I just found this too! Great thread for the new Kaggle Freesound competition.

I don't know if this is off topic but would it be possible to remove the sound of mechanical keyboards with ML in realtime from a VOIP stream? Sell the technology to Discord and profit.

Is this a big problem? I thought people loved mechanical keyboard sounds.

People love their own, I'm sure. I've never met anybody who puts on a tape of it, though.

In some of my early YouTube videos (for classes I taught), I would live code. One complaint from a student was that, while they loved the videos, the keystrokes were distracting.

You may reuse some concepts I have described for an audio adblock: https://www.adblockradio.com/blog/2018/11/15/designing-audio...

More precisely: spectral preprocessing of the audio, followed by a neural network such as an LSTM.

I think the slides/recording of this excellent Spotify talk will be posted shortly: https://qcon.ai/qconai2019/presentation/deep-learning-audio-....

aubio and librosa are two excellent MIR (music information retrieval) tools I can recommend from personal use. They can both be implemented for real-time audio using pyaudio or similar.



To append to my own comment, here is something I haven't tried myself yet but am planning to: Urban Sound Classification with Neural Networks in TensorFlow


I am also curious about this topic! I have picked up a jetson nano and fully intend to put this device to use by projecting comic-book panel-style speech bubbles (plus, who knows... random panels?) on the wall leveraging pytorch + deepspeech.

That's at least the idea kicking around in my head at the moment. https://github.com/SeanNaren/deepspeech.pytorch

I'm no expert. Haven't done it. Don't really want to send every convo into the cloud or my tinfoil hat will start burning.

You do not need a jetson to get started investigating. Maybe just nvidia for that particular library. If you find something, maybe you can let me know somehow.


Here's a resource that breaks down the various audio processing tasks and provides case studies: https://www.analyticsvidhya.com/blog/2018/01/10-audio-proces...

It's slightly academic so here's a more practical resource: https://towardsdatascience.com/audio-classification-using-fa...

I would get lunch with these guys:


These sketch balls can use your phone's mic to detect what is streaming in a living room.

Recently I started looking into this as a backup method of anomaly detection while performing automated testing of our robotics. I concluded that it's actually pretty easy. Depending on how simple your requirements are, you can even achieve this cheaply and effectively on a very tiny microprocessor with an attached surface-mount MEMS microphone. Additional features like recording the anomalous audio, timestamping and alert transmission are not that hard either. No need for a fully fledged general-purpose operating system or complex algorithms.
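The comment above does not say which algorithm was used, so the following is only a guess at what "no complex algorithms" could look like: a per-frame RMS level compared against a running baseline. It is written in Python for readability, and the same arithmetic ports to C on a microcontroller. The class name RmsAnomalyDetector and the threshold, warmup and alpha values are all hypothetical.

```python
import math

class RmsAnomalyDetector:
    """Flag frames whose RMS level strays far from a running baseline."""

    def __init__(self, threshold=4.0, warmup=20, alpha=0.05):
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # frames used only to learn the baseline
        self.alpha = alpha          # smoothing factor for the running stats
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, frame):
        """Return True if this frame of samples looks anomalous."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        self.count += 1
        if self.count == 1:
            self.mean = rms  # first frame seeds the baseline
            return False
        deviation = rms - self.mean
        std = math.sqrt(self.var)
        # The small constant keeps a zero-variance baseline from firing on noise.
        anomalous = (self.count > self.warmup
                     and abs(deviation) > self.threshold * std + 1e-9)
        if not anomalous:
            # Only normal frames update the exponentially weighted baseline.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous

def sine_frame(amplitude, n=64, cycles=4):
    """Synthetic frame: a whole number of sine cycles at the given amplitude."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

# 50 frames of slightly varying "normal" hum, then one much louder frame.
detector = RmsAnomalyDetector()
flags = [detector.update(sine_frame(1.0 + 0.01 * math.sin(k))) for k in range(50)]
flags.append(detector.update(sine_frame(3.0)))
```

A loudness check like this misses faults that change the character of the sound without changing its level, which is where spectral features would come in.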

See this book and the sources it links to: https://musicinformationretrieval.com/ Also google for pitch and onset detection. If you want more specific help, you have to ask a more specific question.

It sounds as if he's looking for tools that can be used to monitor the sounds coming from machinery to detect or predict impending failures. I found your link interesting since I'm interested in musical applications of machine learning, but I don't think it's what he's looking for.

MIR is where the research is at. Not nearly as much work has been done in the general audio IR domain, but most methods are easily transferable. E.g., tempo estimation would perhaps serve his anomaly detection needs.
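One possible reading of the tempo idea, sketched under assumptions: treat the machine's loudness envelope like a beat track and estimate its dominant period by autocorrelation, then watch for that period drifting. The function name dominant_period and the synthetic envelope are made up for illustration. Real tempo estimators in MIR libraries are considerably more elaborate.

```python
def dominant_period(envelope, min_lag, max_lag):
    """Return (lag, score) for the lag with the highest normalized autocorrelation."""
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]
    energy = sum(x * x for x in centered)
    if energy == 0:
        raise ValueError("envelope is constant; no periodicity to estimate")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the envelope with a copy of itself shifted by `lag` frames.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Synthetic envelope (an assumption): a loud "thump" every 25 frames over a quiet floor.
envelope = [1.0 if i % 25 == 0 else 0.1 for i in range(500)]
lag, score = dominant_period(envelope, min_lag=10, max_lag=40)
```

If the estimated lag for a healthy machine is stable, a sustained change in it, or a drop in the score, could be used as a warning sign.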

https://github.com/ybayle/awesome-deep-learning-music a "Non-exhaustive list of scientific articles on deep learning for music"

There are many great resources to reference here:


Contact the founder / maker of Auphonic.com - he's a super nice and clever guy who does this kind of stuff for a living. He'll definitely point you in the right direction.

This depends on whether you're interested in creative applications or analytical (MIR) ones. The two fields share a lot of techniques, but the way they are used is wildly different.

piston aircraft?
