
Using deep learning to analyze the ultrasonic vocalizations of rats - respinal
https://blogs.mathworks.com/headlines/2019/04/24/deep-learning-deciphers-what-rats-are-saying/
======
dmix
It'd be cool to have a watch or smartphone app that translated some animal
sounds into estimations of what they might mean. Shazam for animal sounds.

~~~
kmundnic
There's certainly interest in this topic from the neuroscience community. At
least for mouse vocalizations, the first (unsolved) problem is to find a
dictionary of possible vocalizations. This is a highly non-trivial problem,
mainly because it depends on having a good similarity measure between (noisy)
vocalizations, but also because we don't have a ground truth or gold standard.
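
To make the similarity point concrete, here is a minimal sketch, not what any
lab actually uses. It assumes each call has already been reduced to a short
vector of per-band energies, and it uses cosine similarity as a stand-in
measure. The vectors and the function name are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero (silent) vector as dissimilar
    return dot / (norm_a * norm_b)

# Hypothetical per-band energies (made-up numbers, not real recordings)
call_a = [0.1, 0.9, 0.8, 0.1]        # energy concentrated in the middle bands
call_a_noisy = [0.2, 0.8, 0.9, 0.2]  # same shape with some noise added
call_b = [0.9, 0.1, 0.1, 0.8]        # energy concentrated in the outer bands

sim_same = cosine_similarity(call_a, call_a_noisy)
sim_diff = cosine_similarity(call_a, call_b)
```

With toy vectors like these the noisy copy still scores well above the
different call. Real spectrograms vary in duration and frequency range, which
is part of what makes the real problem hard.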

The vocalizations vary between strains, and some of them are more or less clean
than others in their spectrogram representations. Different strains vocalize
in different frequency ranges.

Assuming that you've found a dictionary, then you'd have to learn how to map
dictionary elements to behavior. Behavior labeling is done by human annotators
that spend hundreds (or more) hours looking at mouse behavior and learning how
to identify different behaviors. This, by nature, is a noisy process as well (and
possibly biased).

Despite the difficulties, the problem itself is very useful for people in
neuroscience because mice only vocalize in social situations, so they see it
as a window into studying social behavior of, for example, mice with autistic
behavior.

Edit: grammar

~~~
heyitsguay
Do you know if anything similar is being done with any sort of birdsong?

~~~
joeyo
Yes, with zebra finches, bengalese finches, and starlings, among others.

------
nickledave
In case anyone is interested, I'm developing a Python/TensorFlow-based open
source project, vak, that annotates birdsong, similar to what DeepSqueak does:
[https://github.com/NickleDave/vak](https://github.com/NickleDave/vak)

(DeepSqueak runs in MATLAB.)

We've also developed a neural net architecture which we find gives low error
across individuals and is much more lightweight than Faster-RCNN, the net that
DeepSqueak uses under the hood:
[https://github.com/yardencsGitHub/tweetynet/](https://github.com/yardencsGitHub/tweetynet/)

I've tried to make it so that any net (even Faster-RCNN) can be used with vak
and I've built tools so it works with multiple audio and annotation formats
([https://github.com/NickleDave/crowsetta](https://github.com/NickleDave/crowsetta)).

We're finally about to submit a paper on this, so I thought I'd share it here
now in case anyone is interested in contributing to the library in the future.

(edit for clarity)

------
jcims
I could be wrong but it seems that most animals do a better job of
understanding our intent than the other way around. If that is actually the
case I couldn't hazard a guess as to why, but it only takes me a short time to
teach my dog to sit, but I'm pretty dumb to what she's trying to tell me.
Hopefully this will help us catch up.

------
tagh
Why 2D convolutions over a spectrogram instead of 1D convolutions over the
waveform? It seems strange to do that small amount of feature extraction if
deep learning is being used anyway.
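
For anyone unsure what the two input shapes look like, here is a minimal
sketch of that feature-extraction step: a short-time Fourier transform that
turns a 1D waveform into a 2D time-by-frequency grid. It is not DeepSqueak's
actual pipeline. The sample rate, test tone, frame length, and function name
are all assumptions, and the naive DFT is only there to keep the example
dependency-free.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT; a real pipeline would call an FFT library instead
        mags = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(mags)
    return frames

sample_rate = 250_000  # Hz, assumed; high enough to capture ultrasonic calls
tone_hz = 46_875       # assumed test tone, chosen to fall exactly on DFT bin 12
waveform = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
            for t in range(512)]

spec = stft_magnitude(waveform)  # 2D: time frames x frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index back to Hz (frame_len = 64)
```

A 1D convolution would slide over `waveform` directly, while a 2D convolution
slides over `spec`, where a call shows up as a localized shape in time and
frequency.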

------
WalterBright
I tried some deep learning to figure out what my cat was saying. Turned out it
was:

"feed me" "feed me" "feed me" "Feed me" "Feed Me" "FEEED MEEEE you stupid
human"

~~~
interfixus
Your learning model may be incomplete. I observe far more complex and
interestingly nuanced layers of meaning in my daily interactions with a number
of cats. That number is seven, and for starters I can always single out - by
sound alone - exactly who is doing the talking and what the gist of it may be.
Might be food, yes, but might equally well be about my desired presence for
company, about changing of litter, opening of doors, a bat in the woodstove, a
stranger in the driveway, a settling of scores, an attempt at bullshitting me
into some action, a call of general distress, what have you. Certainly, I'm
only scratching the surface of understanding. I notice that my dog - of whom
some of them seem extremely fond - gets targeted with some interspecies
communication that I either don't or am too thick to grasp.

