The Australian Acoustic Observatory (https://acousticobservatory.org/) has 360 microphones across the continent and over 2 million hours of audio. However, none of it is labeled. We want to make this enormous repository useful to researchers. We have found that researchers are often looking for 'hard' signals: specific call types, birds with very little available training data, and so on. So we built an acoustic-similarity search tool that lets researchers provide an example of what they're looking for, which we then match against embeddings from the A2O dataset.
Here are some fun examples!
Laughing Kookaburra: <https://search.acousticobservatory.org/search/index.html?q=h...>
Pacific Koel: <https://search.acousticobservatory.org/search/index.html?q=h...>
Chiming Wedgebill: <https://search.acousticobservatory.org/search/index.html?q=h...>
How it works, in a nutshell:
We use audio source separation (<https://blog.research.google/2022/01/separating-birdsong-in-...>) to pull apart the A2O data, and then run an embedding model (<https://arxiv.org/abs/2307.06292>) on each channel of the separated audio to produce a 'fingerprint' of the sound. All of this goes into a vector database with a link back to the original audio. When someone performs a search, we embed their audio and match it against all of the embeddings in the vector database.
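If it helps to see the data flow, here's a minimal sketch of the index-and-search loop. The `separate` and `embed` functions below are toy stand-ins, not the separation and embedding models linked above, and the brute-force cosine search stands in for the vector database:

```python
import numpy as np

def separate(audio: np.ndarray) -> list[np.ndarray]:
    # Toy stand-in for the source-separation model; the real system splits
    # the mixture into several channels, here we just pass the input through.
    return [audio]

def embed(channel: np.ndarray, dim: int = 128) -> np.ndarray:
    # Toy stand-in for the embedding model: a coarse, unit-normalised
    # spectral fingerprint built from binned FFT magnitudes.
    spectrum = np.abs(np.fft.rfft(channel))
    vec = np.array([b.mean() for b in np.array_split(spectrum, dim)])
    return vec / (np.linalg.norm(vec) + 1e-9)

# Indexing: embed every separated channel, keeping a link back to the source.
index_vectors: list[np.ndarray] = []
index_refs: list[tuple[str, int]] = []

def add_recording(recording_id: str, audio: np.ndarray) -> None:
    for ch, channel in enumerate(separate(audio)):
        index_vectors.append(embed(channel))
        index_refs.append((recording_id, ch))

# Search: embed the query clip and rank indexed entries by cosine similarity.
def search(query_audio: np.ndarray, top_k: int = 10):
    q = embed(query_audio)
    sims = np.stack(index_vectors) @ q  # embeddings are unit-normalised
    best = np.argsort(-sims)[:top_k]
    return [(index_refs[i], float(sims[i])) for i in best]

# Example: index two fake recordings, then search with a query clip.
rng = np.random.default_rng(0)
add_recording("rec_a", rng.normal(size=32_000))
add_recording("rec_b", rng.normal(size=32_000))
print(search(rng.normal(size=32_000), top_k=2))
```

In the real system the brute-force dot product is handled by the vector database rather than NumPy, but the ranking idea is the same.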
Right now, about 1% of the A2O data is indexed (the first minute of every recording, evenly sampled across the day). We're looking to get initial feedback and will then continue to iterate and expand coverage.
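For anyone curious how that first-minute slice might be chopped up before embedding, here's a rough sketch; the window length and sample rate below are just illustrative, only the "first minute" part comes from the description above:

```python
import numpy as np

SAMPLE_RATE = 32_000   # assumed for illustration
WINDOW_S = 5.0         # assumed for illustration
INDEXED_S = 60.0       # the first minute of each recording

def first_minute_windows(audio: np.ndarray):
    """Yield (offset_seconds, window) pairs covering the indexed first minute."""
    win = int(WINDOW_S * SAMPLE_RATE)
    limit = min(len(audio), int(INDEXED_S * SAMPLE_RATE))
    for start in range(0, limit - win + 1, win):
        yield start / SAMPLE_RATE, audio[start:start + win]
```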
(Oh, and here's a bit of further reading: https://blog.google/intl/en-au/company-news/technology/ai-ec... )
* A camera shutter,
* A camera with a motorised drive,
* A car alarm,
* A chainsaw, ... etc.?
https://youtu.be/mSB71jNq-yQ?t=113