Behind the scenes of Sound ID in Merlin – Identify birds using your phone (2021) (macaulaylibrary.org)
235 points by _xerces_ on Dec 6, 2023 | 51 comments



The Merlin app has transformed my nature/photo walks, and got me interested in birds.

I never realized how many unique birds I was hearing as I’d walk through an area until I turned this app on. It opened a whole new awareness, and helped me remember which bird calls go with which species.

There’s also something incredibly cool and satisfying and beautiful about the resulting spectrograms. Watching the shape of the bird call appear visually in realtime really changed my experience of listening.

I blame this app for making me buy a birding lens for my camera, which had previously been used primarily for landscapes/cityscapes. I highly recommend giving it a try.


> I blame this app for making me buy a birding lens for my camera

Haha, same thing, girlfriend installed Merlin on her phone, our walks turned into birdwatching hunts and now I’m eyeing the nice and affordable RF 100-400mm F5.6-8 IS USM to make things more interesting. I’m hoping it stops at that, birding lenses are expensive.


I should warn you now. That lens will definitely get you started and you’ll get some great shots. It’ll also make you want something even longer if you’re anything like me.

Still seems like a good place to start because it’ll make you extremely aware of everything you want the next bigger lens to do ;)


One of the underappreciated health benefits of birding is the increase in upper body strength you get from lifting all that heavy glass you end up purchasing ... ; )

And, yes, Merlin has been transformative for me, too.


I made a choice to make birdwatching an interest I immerse myself in while there. Basically meaning, I prefer to watch birds through binoculars, in the moment, and keep them in memory, more than capturing the moment for later. I am not against the idea of photographing birds as such, I just realized that in most cases, they will remain small even with very expensive lenses, and the effort to capture really great birding images is going to be significant.


Part of the reason I started photography is because I have Aphantasia, and when I learned that other people had visual memories, it cemented photography as an important hobby for me. Without the photos, I only take home a sense of knowing what I saw. Nothing visual.

I don’t think I would have invested in a birding lens if I wasn’t already deep into photography, but it’s possible to get a reasonable starter kit for $2K or less if the goal isn’t to make 30” prints.

It’s still nothing to sneeze at, but those $10K lenses aren’t necessary to get some really great results for hobby purposes.

The challenge of getting the shots is part of what makes the process enjoyable and the resulting photos extremely rewarding. It also lets me explore in far more detail once I get home. I brought home a 40MP photo of a cardinal fully filling the frame today, and it’s pretty incredible zooming up on the plumage, seeing the intricacies of their talons, etc.

With that said, I completely respect the non-photography approach and do think that the gear gets distracting at times.


Are there not binoculars available that will take pictures?


There are binoculars with cameras, but they aren't good cameras because of the small sensor, and they aren't good binoculars because they're cheap.


Find a nice used EF 400 f/5.6L. It's hard to know what to write about it, except to say that it's great and has been a reliable companion for years.


For birds I would recommend something 600mm or more. Smaller birds are hard to photograph with anything shorter. Sigma makes cheap ones.


I have similar experiences. My parents both use the app also. It’s one of the few apps I hear their generation (retired and 65+) talking about.

Now I’m imagining a crossover with this and Pokémon Go. Gamified bird watching!


That's Birda!


Seconded. Using this app made me realize the diverse ecosystem singing in the neighborhood. I’ve always enjoyed the outdoors, but never really saw the point of birdwatching until this app.


I'm hooked as well. I recognize many birds by sight, and quite a few by their calls and songs. Many, though, especially smaller wrens, chickadees, etc., I cannot tell apart. And others, like vultures, I see often but have never heard.

If they implemented a function to post your list for a single recording to social media, usage would blow up. I've recorded 15 species on a 10 minute walk. Blows my mind.


What specs does your birding lens have, if I may ask?


I went with the Fujifilm 150-600mm F5.6-8 and I’m shooting on an X-T5.

Because it’s an APS-C sensor, it’s a 225-900mm full frame equivalent focal length.
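The arithmetic, as a quick sketch. The 1.5x crop factor is the usual round figure for APS-C and is an assumption here, as are the function and constant names:

```python
APS_C_CROP_FACTOR = 1.5  # assumed round figure for APS-C sensors


def full_frame_equivalent(focal_length_mm, crop_factor=APS_C_CROP_FACTOR):
    """Return the full-frame equivalent focal length in mm."""
    return focal_length_mm * crop_factor


wide_end = full_frame_equivalent(150)
long_end = full_frame_equivalent(600)
print(f"{wide_end:.0f}-{long_end:.0f}mm equivalent")  # prints "225-900mm equivalent"
```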

I had tried the Fuji 70-300 with a 1.4X teleconverter before that, and while it’s a nice lens in a small package, I was unhappy with the reach. The 150-600 is pretty great so far, but is not good in low light.


Same here. It's helping me train my own ear and eyes to identify birds.


So funny to see this! I've had the pleasure of working with Grant (one of the lead researchers on Merlin Sound ID) on other projects in the past! The whole team is kind-hearted and incredibly smart. Sound ID is one of the very few research projects that make me instantly smile when I hear about them in the news.

I've always been impressed with the care that the Cornell Lab and Macaulay Library take in involving people (citizen scientists, raters, and experts alike) in the curation of the data they use. This isn't an application where "hoovering up the entire Internet" is the right thing to do; it's a case where those hand-tweaked parameters make all the difference. A considerable amount of thought is given to every sound sample, and every person in the pipeline is working incredibly hard.

Hats off to the team for publishing more about the internal process!


The article is very thin on actual technical details. It doesn’t describe how the spectrograms are processed or details of the neural network they’re using (other than using convolutions, which is pretty standard in CV/audio/speech processing).

What is interesting to me, though, is that they mention the importance of more precise time labeling of sounds. Previous BirdCLEF Kaggle competitions just provided many-minute long recordings where all that is known (labelled) is that a certain list of species are to be heard, but not when in the recording. That’s not too useful as training data, as they point out in the article:

    This can lead to problems: if other species are singing in the same recording, the model will erroneously call all species in the recording a White-breasted Nuthatch, leading to false predictions.
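To make the difference concrete, here is a toy sketch of the two labeling schemes. Everything in it is made up for illustration (the 3-second window, the timestamps, the second species); it is not how Merlin or BirdCLEF actually slice their data:

```python
WINDOW_S = 3  # hypothetical training-window length in seconds


def window_starts(duration_s, window_s=WINDOW_S):
    """Start times of consecutive non-overlapping windows."""
    return list(range(0, duration_s - window_s + 1, window_s))


def weak_labels(duration_s, species_in_recording):
    """Recording-level labels: every window inherits every listed species."""
    return {start: set(species_in_recording) for start in window_starts(duration_s)}


def strong_labels(duration_s, annotations):
    """Time-annotated labels: a window gets a species only if a call overlaps it."""
    labels = {}
    for start in window_starts(duration_s):
        end = start + WINDOW_S
        labels[start] = {sp for sp, (t0, t1) in annotations if t0 < end and t1 > start}
    return labels


# Invented annotations for a 12-second recording: (species, (start_s, end_s)).
annotations = [
    ("White-breasted Nuthatch", (0, 3)),
    ("Black-capped Chickadee", (6, 9)),
]
weak = weak_labels(12, [sp for sp, _ in annotations])
strong = strong_labels(12, annotations)

for start in window_starts(12):
    print(start, sorted(weak[start]), sorted(strong[start]))
```

With recording-level labels, the window that contains only the chickadee is still tagged as a nuthatch, which is exactly the failure the quote describes.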


I've always thought that these apps are missing a feature-set: Manipulation of the sound clip prior to identification. In both this app and BirdNET the selection always selects from the lowest recorded frequency to the highest -- I would like to be able to select the frequency band I'm interested in. Maybe first select time, then put corners on the rectangle to be shaped. The sound file passed to the identifier would appear cropped such that the unselected region is totally silent to the identifier. I am often trying to identify an owl or other lower-registered bird but it is unidentifiable because of noisy crickets/cicadas.
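A rough sketch of the frequency half of that rectangle selection, in plain Python so it runs anywhere. Nothing here comes from Merlin or BirdNET: the function names, the 8 kHz sample rate, and the two stand-in tones for "owl" and "crickets" are all assumptions, and the naive DFT is only practical for a short frame like this one:

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a short demo frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(), returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_limit(frame, sample_rate, f_lo, f_hi):
    """Zero every DFT bin whose frequency lies outside [f_lo, f_hi] Hz."""
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        # Bins above n/2 are the mirrored negative frequencies; fold them back.
        freq = min(k, n - k) * sample_rate / n
        if not (f_lo <= freq <= f_hi):
            spectrum[k] = 0
    return idft(spectrum)


def tone_power(frame, sample_rate, freq):
    """Power of a single frequency component (exact when freq sits on a bin)."""
    n = len(frame)
    k = round(freq * n / sample_rate)
    c = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
    return abs(c) ** 2 / n ** 2


SAMPLE_RATE = 8000
N = 256
OWL_HZ, CRICKET_HZ = 500.0, 3000.0  # made-up stand-ins, chosen to land on DFT bins

frame = [math.sin(2 * math.pi * OWL_HZ * t / SAMPLE_RATE)
         + 2.0 * math.sin(2 * math.pi * CRICKET_HZ * t / SAMPLE_RATE)
         for t in range(N)]
filtered = band_limit(frame, SAMPLE_RATE, 200.0, 1000.0)

print("owl power kept:", round(tone_power(filtered, SAMPLE_RATE, OWL_HZ), 6))
print("cricket power left:", round(tone_power(filtered, SAMPLE_RATE, CRICKET_HZ), 6))
```

The louder high tone is gone and the low tone is untouched. Whether an identifier trained on full-band audio would behave well on input like this is a separate question.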


It would be useful to be able to change the spectrogram parameters to achieve what you’re suggesting, but it’s likely that this might cause the model to misbehave if it was trained only on spectrograms of audio samples that weren’t frequency-band limited.


There are applications that have features like this (for example ArtemiS by HEAD Acoustics). I have long wanted an open-source solution for this.


If you're in the Bay Area in California, you're in one of the best spots in the world for birding. Grab some decent binoculars -- https://birdwatchinghq.com/best-binoculars-for-watching-bird... has a list and I'd add in these Athlon Midas 8x42 for a tad under $200 if that's your budget https://www.walmart.com/ip/Midas-8x42-UHD-Binocular/53653462 -- and get out there. https://www.kqed.org/news/11804822/a-beginners-guide-to-bird... has a list of spots, as does https://www.sidewalksafari.com/2023/01/bird-watching-bay-are... .

Birding is what got me into field recording, which is another fun hobby among the far-too-many that I don't have enough time for. I'll save you some trouble: get a Zoom F3 or Sound Devices MixPre-3 II and a Sennheiser MKE 600 plus something like a Rode Blimp with the big fuzzy "dead cat" for wind noise...or two DPA 4060 -- get the DPA KIT-4060-OC-SMK stereo kit from StudioCare or Thomann because those XLR to MicroDot adapters are expensive and the knock-offs aren't as good...not a worthwhile compromise if you're looking at spending this much -- and put those on a DIY Jecklin disk for ambient recording or a parabolic for highly directional sound + another on the handle for ambient.

https://www.macaulaylibrary.org/resources/audio-recording-ge... has more info on microphones and parabolics if you're looking to get into this, but do go out and spend a few dozen hours in the field first before dropping a ton of money on your new hobby.


I'll take this chance to hype up BirdNET-Pi, a citizen science project where participants run a bird listening station 24/7. The results can be sent to birdweather.com. I'm always kind of shocked that I'm one of only six such stations in the Los Angeles area, and I'd love to see more people sign up.


Would love to set up a station but not sure how I would protect the Pi from the elements if installed outside. Did you somehow weatherize your Pi or do you use a remote microphone over a long wire or some kind of wireless mic solution?


I mean to write a blog entry about this, but clearly that isn't happening soon, so here's the quick summary. Yes, I weatherized my Pi by putting it in a weatherproof case. I bought a fancy case (no longer in production, but kind of like this: https://corpshadow.biz/raspberry-pi/raspberry-pi-ip65-weathe... ) but honestly if I were to do it again, I'd buy a big cheap waterproof box for electrical connections like this: https://www.amazon.com/Outdoor-Electrical-Weatherproof-Prote... .

I put the mic under an eave so it doesn't get wet. I bought a nice mic, but it's definitely overkill. Next time, I'll buy a half-dozen cheap mics from AliExpress and treat them as semi-disposable.


I've always used BirdNET for my birdsong identification. Googling the differences, it looks like BirdNET is better for outside of the US.

https://goldengatebirdalliance.org/blog-posts/birding-by-ear...


So, very different from the Shazam/Abracadabra post from the other day. It's a different technique. It would be nice to get a comparison.


For music ID, you have a static database of recordings that you're looking to compare against.

Birdsong varies by individual, however, so extracting peak frequencies doesn't stand a chance of working. For example, birds might sing at a higher or lower pitch depending on body size or environmental noise. Depending on species, you may have regional, flock, or individual variation in the songs themselves. There are also typically a number of different songs and calls per species.
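A toy illustration of the pitch problem. This is only Shazam-like in spirit (hashing pairs of quantized spectral peaks); the bin width, the peak frequencies, and the 8% pitch shift are all invented, and real fingerprinting systems are far more elaborate:

```python
def fingerprint(peaks_hz, bin_hz=50.0):
    """Hash consecutive spectral peaks as (bin_a, bin_b) pairs."""
    bins = [int(f // bin_hz) for f in peaks_hz]
    return set(zip(bins, bins[1:]))


def overlap(a, b):
    """Jaccard overlap between two sets of hashes (1.0 = identical)."""
    return len(a & b) / len(a | b)


# Invented peak track for one song, in Hz.
song = [2100.0, 2600.0, 3150.0, 2400.0, 2900.0, 3600.0]
same_recording = list(song)                    # a studio track replayed exactly
higher_singer = [f * 1.08 for f in song]       # same song from a smaller, higher-pitched bird

print(overlap(fingerprint(song), fingerprint(same_recording)))
print(overlap(fingerprint(song), fingerprint(higher_singer)))
```

An exact replay matches every hash, while the same song shape sung slightly higher shares none of them, even though a human would call it the same song.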


That's cool, I've always wanted a Shazam for the outdoors ... like at night what are all the animal sounds I'm hearing in the pond, while camping in the forest, etc

It could pinpoint how far away those animals are from you. Which when camping in the forest could be a good thing to warn you of a bear or another threatening animal close by..it could also listen while you sleep and wake you up if it detects something threatening semi close or close.


Anyone use Merlin with location off? I've used and watched it be ultra common/vague but mostly fail in 6 countries. Is it simply using location and the merest hint of sound captured to calculate the suggested bird? Full disclosure, I haven't used it (since) for around a year, some things can change for the better. I also have a brand new iPhone since - better mics?


Bird vocalizations are variable and with only so much space within which they can vary not all sounds can be identified even with close study of the spectrogram. Plus there are plenty of birds which mimic the calls of other birds very accurately. If you're taking away context clues from location it is not surprising that it would not work super well.


Yes, pretty much, but usually that is enough. It is usually accurate enough to distinguish Northern Flickers and Gilded Flickers, which sound almost identical, so good enough.
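As a sketch of how a location prior can break a tie between near-identical calls: multiply each species' acoustic score by how likely it is to be present, then renormalize. To be clear, this is a generic Bayes-style combination and not a description of what Merlin does internally; the function name and all the numbers are made up:

```python
def combine(acoustic_scores, location_prior):
    """Weight acoustic scores by a location prior and renormalize to sum to 1."""
    weighted = {sp: score * location_prior.get(sp, 0.0)
                for sp, score in acoustic_scores.items()}
    total = sum(weighted.values())
    if total == 0:
        return dict(acoustic_scores)  # prior rules everything out: fall back to audio only
    return {sp: w / total for sp, w in weighted.items()}


# The audio alone cannot separate the two flickers (invented scores).
acoustic = {"Northern Flicker": 0.5, "Gilded Flicker": 0.5}
# Hypothetical prior for one particular location and date.
prior_here = {"Northern Flicker": 0.2, "Gilded Flicker": 0.8}

posterior = combine(acoustic, prior_here)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

Turn location off and the prior goes flat, so the two species stay tied.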


If you like Merlin, take a look at Seek by iNaturalist.


I don't know much about Merlin's accuracy with bird calls, yet unfortunately Seek isn't enough to accurately identify plants and often leads to incorrect or overly confident (i.e. wrong) identifications when not enough information is available. In keying out plants, especially at the level of distinguishing species, features needed for identification often can't be gleaned from a single phone photo or even a sequence of them. Blooms (not always present due to seasons), internal flower or fruit structures, seasonal foliage, and features at photo-incompatible scales (hairs on a leaf and also overall branching structure are hard to capture in one picture) are all common diagnostics in keys, and using neural nets biased to suggest the most commonly occurring plants leads to similar yet less common ones being misidentified.

It's a great resource for getting down to the family or sometimes even genus level, yet botany knowledge and a local or regional key/flora is really needed beyond that in most cases, unfortunately. I still think it's great for folks who otherwise would never get into botany and also raising natural awareness, yet I do worry about its impact on citizen science by giving a false sense of confidence in shaky algorithmic classifications, potentially tainting data collected for more rigorous purposes.


> Seek isn't enough to accurately identify plants

Same for other apps that identify plants, insects, and spiders. It's very frustrating that the public is being sold fully-baked photo identification, when in reality the technology is not there. Especially bad though, as you mentioned, is that even if it were perfect there would still be uncertainty based on features that weren't or can't be photographed.


This app is beyond amazing! Got me interested and curious about birds - in my backyard alone I have ID'd more than 30 different species (I live beside a forest reserve in southeast Brazil). I now know them better and have already learned how to recognize at least a few of them by ear!

A couple of hours ago I was lucky to have the app recording when a Mesembrinibis cayennensis flew above the house making loud and curious sounds I can't describe really well. I heard it before a few times, but never knew what bird it was and was never able to spot it in the sky (not even today, which is unfortunate because it's a beautiful bird by the photos online!).

When I get some cash and free time I'm thinking about mounting a 24/7 bird id station to leave in my backyard :D And perhaps a live big beautiful spectrogram inside the house..


I'm super interested in their choice of features here; I never expected they'd be using the image of the spectrogram fed into a vision model like a CNN rather than the audio signal itself. My hazy memory from school was that speech recognition typically used MFCCs (Mel-frequency cepstral coefficients), which were some sort of manipulation of the frequencies. I don't remember how that really works but it seems more straightforward than turning an audio problem into a visual problem.

I'm not critiquing, as I've used the app and it's excellent, just wondering if anyone could explain more about how feature engineering works in audio and whether this approach by the Merlin team is a standard one or not.
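For what it's worth, the two feature types are closer than they sound: MFCCs are a cosine transform of one column of a log-mel spectrogram, and feeding log-mel spectrograms to a CNN is a common choice in audio classification generally. A bare-bones sketch in plain Python, where the filter count, frame size, sample rate, and test tone are arbitrary assumptions and none of it is taken from Merlin:

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hann-windowed power spectrum via a naive DFT (bins 0..n/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for lo, mid, hi in zip(edges, edges[1:], edges[2:]):
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(frame, sample_rate, n_filters=20):
    """One column of a log-mel spectrogram: what an image-style model would see."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]


def mfcc(log_mel_column, n_coeffs=8):
    """Unnormalized DCT-II of the log-mel energies, truncated to a few coefficients."""
    m = len(log_mel_column)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / m)
                for i, v in enumerate(log_mel_column))
            for c in range(n_coeffs)]


SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 2000.0 * t / SAMPLE_RATE) for t in range(256)]
column = log_mel(tone, SAMPLE_RATE)   # 20 values, one per mel band
coeffs = mfcc(column)                 # 8 values summarizing the column's overall shape
print(len(column), len(coeffs))
```

Stack `column` over successive frames and you get the spectrogram image. The MFCC step throws away fine frequency detail, which suited older speech models but is detail a convolutional network can learn to use on its own.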


Just wanted to concur that this is a great app.

I first discovered it on holiday in Zakynthos. I wanted to identify what bird was making such a loud, clear call at dusk. Searched for bird call apps and found Merlin, which analysed a recording and told me it was a Eurasian Scops Owl.

It's great fun.

https://en.wikipedia.org/wiki/Eurasian_scops_owl


Almost every time I try to use BirdNET to identify a bird, these cheeky bastards decide to go silent the moment they see me pulling my phone out of my pocket.


I’ve tried this a couple times but can’t get it to work because the bird song I’m interested in is never the only sound in the environment.


In my experience Merlin can recognize known calls in very busy environments, like on a street with cars running, people talking and four different bird species screaming at once.


It doesn't require that. But the bird sound does need to be fairly prominent, so yep, somewhere not too noisy would be good.


Perhaps it's possible they could add a pre-filter that cancels selected birds. I often have squawky magpies and starlings that drown out the bird I really want to know about.


Lovely to see this. I use the amazing Raven Lite program from Cornell to view spectrograms. It draws beautiful spectrograms, and their choice of intensity-to-color map really helps in locating the start point of sound events.

I’m downloading this today.


I love birding, and I think many people who think they are not interested would be if they gave it a try.

I think it's a hobby with very few negative side-effects. More people having an interest in birds is great, as it benefits the individual, society, ecosystems and the entire planet.



I use the app daily. I would like to know how it works. And I get the bad gateway msg as well.


Did HN destroy the site? When I click all I see is "Bad Gateway".

Anyway, I use the app and find it helpful. I'm astonished that the microphone on my cell phone is sensitive enough to pick up birds up in the trees.


(2021)


Added above. Thanks!



