The vocalizations vary between strains, and some are cleaner than others in their spectrogram representations. Different strains also vocalize in different frequency ranges.
Assuming that you've found a dictionary, you'd then have to learn how to map dictionary elements to behavior. Behavior labeling is done by human annotators who spend hundreds of hours (or more) watching mouse behavior and learning how to identify these different behaviors. By nature, this is a noisy process as well (and possibly a biased one).
Despite the difficulties, the problem itself is very useful to people in neuroscience: mice only vocalize in social situations, so researchers see it as a window into studying social behavior, for example in mouse models of autism.
I’d watch an animal ‘big brother’ show with DL generated subtitles though. Couldn’t be worse than the one with people.
At that point we wouldn't be able to ignore them or pretend we don't understand them, as we do now.
(DeepSqueak runs in Matlab.)
We've also developed a neural net architecture which we find gives low error across individuals and is much more lightweight than Faster-RCNN, the net that DeepSqueak uses under the hood:
I've tried to make it so that any net (even Faster-RCNN) can be used with vak and I've built tools so it works with multiple audio and annotation formats (https://github.com/NickleDave/crowsetta).
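To illustrate the idea of supporting multiple annotation formats behind one interface, here's a minimal Python sketch of the adapter pattern that makes this possible. To be clear, this is not crowsetta's actual API; the class, parser, and format names below are hypothetical, just showing how each format can get a small parser that converts to one common representation so downstream code never needs to know which tool produced the labels:

```python
# Hypothetical sketch of a format-adapter design (NOT crowsetta's real API):
# each annotation format registers one parser that emits a shared Annotation
# type, so a training pipeline only ever consumes Annotation objects.
from dataclasses import dataclass


@dataclass
class Annotation:
    onset_s: float   # segment start, in seconds
    offset_s: float  # segment end, in seconds
    label: str       # syllable / behavior label


def parse_simple_csv(row: str) -> Annotation:
    """Parse one 'onset,offset,label' line from a hypothetical CSV export."""
    onset, offset, label = row.strip().split(",")
    return Annotation(float(onset), float(offset), label)


# Registry maps a format name to its parser; supporting a new
# annotation tool means registering one function, nothing more.
PARSERS = {"simple-csv": parse_simple_csv}

rows = ["0.10,0.35,call_a", "0.50,0.62,call_b"]
annots = [PARSERS["simple-csv"](r) for r in rows]
print(annots[0].label)  # call_a
```

The payoff of a design like this is that any net (or any downstream analysis) sees one uniform annotation type regardless of whether the labels came from MATLAB, Praat, or a hand-rolled CSV.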
We're finally about to submit a paper on this; thought I'd share here now in case anyone is interested in contributing to the library in the future.
(edit for clarity)
"FEEED MEEEE you stupid human"