Using deep learning to analyze the ultrasonic vocalizations of rats (mathworks.com)
51 points by respinal 55 days ago | 14 comments

It'd be cool to have a watch or smartphone app that translated some animal sounds into estimations of what they might mean. Shazam for animal sounds.

There's certainly interest in this topic from the neuroscience community. At least for mouse vocalizations, the first (unsolved) problem is to find a dictionary of possible vocalizations. This is a highly non-trivial problem, mainly because it depends on having a good similarity measure between (noisy) vocalizations, but also because we don't have a ground truth or gold standard.
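To make the similarity problem concrete, here is a minimal sketch of one possible measure, dynamic time warping (DTW) over frequency contours. This is only an illustration, not something the mouse vocalization literature has settled on. The contours (peak frequency in kHz per time frame) and the name `dtw_distance` are made up for the example.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the per-step cost, so two contours with the
    same shape but different durations can still align cheaply.
    """
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = best alignment cost of x[:i] against y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical contours: peak frequency (kHz) in successive time frames.
up_sweep = [50, 55, 60, 65, 70]
up_sweep_slow = [50, 50, 55, 55, 60, 60, 65, 65, 70, 70]  # same shape, twice as long
down_sweep = [70, 65, 60, 55, 50]

same_shape = dtw_distance(up_sweep, up_sweep_slow)
different_shape = dtw_distance(up_sweep, down_sweep)
print(same_shape, different_shape)
```

The slowed-down upward sweep aligns with the original at zero cost, while the downward sweep does not. Real calls are noisy, so even a measure like this still leaves open where to draw the line between "same call" and "different call", which is the part that has no ground truth.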

The vocalizations vary between strains, and some of them are less clean than others in their spectrogram representations. Different strains vocalize in different frequency ranges.

Assuming that you've found a dictionary, you'd then have to learn how to map dictionary elements to behavior. Behavior labeling is done by human annotators who spend hundreds (or more) hours looking at mouse behavior and learning how to identify different behaviors. This, by nature, is a noisy process as well (and possibly biased).
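One common way to put a number on how noisy the labeling is: have two annotators label the same clips and compute Cohen's kappa, which is agreement corrected for chance. A minimal sketch follows. The behavior labels and annotator data are invented for illustration, and `cohen_kappa` is just a name chosen here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented labels for eight video clips.
annotator_a = ["groom", "groom", "sniff", "sniff", "chase", "chase", "groom", "sniff"]
annotator_b = ["groom", "sniff", "sniff", "sniff", "chase", "groom", "groom", "sniff"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))  # 0.61
```

Here the annotators agree on 6 of 8 clips (75%), but kappa is only about 0.61 once chance agreement is subtracted. Note that kappa measures disagreement, not shared bias: two annotators who make the same systematic mistake will still score high.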

Despite the difficulties, the problem itself is very useful for people in neuroscience because mice only vocalize in social situations, so they see it as a window into studying the social behavior of, for example, mice with autistic behavior.

Edit: grammar

Do you know if anything similar is being done with any sort of birdsong?

Yes, with zebra finches, bengalese finches, and starlings, among others.

If you can get that close with your phone the animal is probably scared.

I’d watch an animal ‘big brother’ show with DL generated subtitles though. Couldn’t be worse than the one with people.

Well they haven't quite stooped to cannibalism yet; but give them time...

It would be really cool. And we would probably learn that we are hurting them pretty badly (except maybe some pets).

At that point we wouldn't be able to ignore them or pretend we don't understand them, as we do now.

It’d definitely be really cool. We could find their secret hideouts. Startup idea right there you have.

A guy I know is already doing this, as a nonprofit: http://stochasticlabs.org/portfolio/ai-animal-intelligence/

In case anyone is interested, I'm developing a Python/TensorFlow-based open source project, vak, that annotates birdsong, similar to what DeepSqueak does: https://github.com/NickleDave/vak

(DeepSqueak runs in MATLAB.)

We've also developed a neural net architecture which we find gives low error across individuals and is much more lightweight than Faster-RCNN, the net that DeepSqueak uses under the hood: https://github.com/yardencsGitHub/tweetynet/

I've tried to make it so that any net (even Faster-RCNN) can be used with vak and I've built tools so it works with multiple audio and annotation formats (https://github.com/NickleDave/crowsetta).

We're finally about to submit a paper on this, so I thought I'd share it here now in case anyone is interested in contributing to the library in the future.

(edit for clarity)

I could be wrong, but it seems that most animals do a better job of understanding our intent than the other way around. If that is actually the case, I couldn't hazard a guess as to why. It only takes me a short time to teach my dog to sit, but I'm pretty clueless about what she's trying to tell me. Hopefully this will help us catch up.

Why 2D convolutions over a spectrogram instead of 1D convolutions over the waveform? It seems strange to do that small amount of feature extraction if deep learning is being used anyway.
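For anyone following along, the "small amount of feature extraction" in question is roughly a short-time Fourier transform: slice the 1D waveform into overlapping frames and take the magnitude spectrum of each, which gives the 2D time-by-frequency array the convolutions run over. Below is a minimal pure-Python sketch. The 256 kHz sample rate, the frame length of 64, the hop of 32 and the function names are all assumptions for illustration, and this is not DeepSqueak's actual preprocessing.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude short-time Fourier transform with a Hann window.

    Returns a list of frames; each frame holds frame_len // 2 + 1 magnitudes
    (the non-negative frequency bins).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one bin; fine for a toy example, slow for real audio.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def tone(freq_hz, sample_rate, n_samples):
    """Pure sine tone as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


def peak_bin(frame):
    """Index of the strongest frequency bin in one spectrogram frame."""
    return max(range(len(frame)), key=frame.__getitem__)


SAMPLE_RATE = 256_000  # assumed; ultrasonic calls need a high sample rate
# Toy "vocalization": a 48 kHz tone followed by a 72 kHz tone.
waveform = tone(48_000, SAMPLE_RATE, 256) + tone(72_000, SAMPLE_RATE, 256)

spec = spectrogram(waveform)
bin_hz = SAMPLE_RATE / 64  # width of one frequency bin: 4000 Hz

print(len(waveform))                 # 1D input: 512 samples
print(len(spec), len(spec[0]))       # 2D input: 15 frames x 33 bins
print(peak_bin(spec[0]) * bin_hz)    # 48000.0
print(peak_bin(spec[-1]) * bin_hz)   # 72000.0
```

The 1D-convolution alternative would take `waveform` directly and have to learn something like this frequency decomposition in its first layers. Which of the two works better for a given dataset is an empirical question that this sketch does not answer.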

I tried some deep learning to figure out what my cat was saying. Turned out it was:

"feed me" "feed me" "feed me" "Feed me" "Feed Me" "FEEED MEEEE you stupid human"

Your learning model may be incomplete. I observe far more complex and interestingly nuanced layers of meaning in my daily interactions with a number of cats. That number is seven, and for starters I can always single out - by sound alone - exactly who is doing the talking and what the gist of it may be. Might be food, yes, but might equally well be about my desired presence for company, about changing of litter, opening of doors, a bat in the woodstove, a stranger in the driveway, a settling of scores, an attempt at bullshitting me into some action, a call of general distress, what have you. Certainly, I'm only scratching the surface of understanding. I notice that my dog - of whom some of them seem extremely fond - gets targeted with some interspecies communication that I either don't notice or am too thick to grasp.
