Show HN: I trained an AI model on 120M+ songs from iTunes (maroofy.com)
753 points by subtech 7 months ago | 427 comments
Hey HN!

I just shipped a project I’ve been working on called Maroofy: https://maroofy.com

You can search for any song, and it’ll use the song’s audio to find other similar-sounding music.

Demo: https://twitter.com/subby_tech/status/1621293770779287554

How does it work?

I’ve indexed ~120M+ songs from the iTunes catalog with a custom AI audio model that I built for understanding music.

My model analyzes raw music audio as input and produces embedding vectors as output.

I then store the embedding vectors for all songs into a vector database, and use semantic search to find similar music!
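Here is a minimal sketch of the search step described above. It assumes cosine similarity and a brute-force scan. OP hasn't said which vector database or distance metric Maroofy uses, and the track IDs and 3-dimensional vectors are made up.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query_id, index, k=3):
    """Return the k track IDs whose embeddings are closest to the query's.

    `index` maps a (hypothetical) track ID to its embedding vector. A real
    vector database would use an approximate index instead of this full scan.
    """
    query = index[query_id]
    scored = ((cosine(query, vec), tid) for tid, vec in index.items() if tid != query_id)
    return [tid for _, tid in heapq.nlargest(k, scored)]

# Toy embeddings; real audio embeddings have hundreds of dimensions.
index = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.8, 0.2, 0.1],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.1, 0.0, 1.0],
}
print(most_similar("song_a", index, k=2))  # ['song_b', 'song_c']
```

At 120M tracks a full scan per query is far too slow, which is why approximate nearest-neighbour indexes exist. The ranking logic is the same either way.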

Here are some examples you can try:

Fetish (Selena Gomez feat. Gucci Mane): https://maroofy.com/songs/1563859943

The Medallion Calls (Pirates of the Caribbean): https://maroofy.com/songs/1440649752

Hope you like it, and would love to hear any questions/feedback/comments! :D

This is very interesting, but unfortunately I haven't had the greatest luck in finding new songs I would enjoy listening to. It absolutely finds similar sounding tracks, but it doesn't distinguish which part of the song made it enjoyable. There's no tempo consistency or genre consistency or even main instrument/vocal timbre consistency between recommendations. I think locking one or more of those dimensions would allow for much better recommendations. I'm not sure what aspect you're using to order the results, but having extra metadata to filter or group the results in some way would help a lot.
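Here is a minimal sketch of "locking" one or more dimensions. It assumes each result carries hypothetical `bpm` and `genre` metadata next to its embedding similarity. Candidates whose tempo (and optionally genre) strays from the seed are dropped, and the rest keep their similarity order. The 8 BPM tolerance is an arbitrary illustrative default.

```python
from dataclasses import dataclass

@dataclass
class Track:
    title: str
    similarity: float  # embedding similarity to the seed track, higher is closer
    bpm: float         # hypothetical tempo metadata
    genre: str         # hypothetical genre label

def lock_dimensions(seed, candidates, bpm_tolerance=8.0, same_genre=False):
    """Keep candidates near the seed's tempo (and optionally its genre), best first."""
    kept = [
        t for t in candidates
        if abs(t.bpm - seed.bpm) <= bpm_tolerance
        and (not same_genre or t.genre == seed.genre)
    ]
    return sorted(kept, key=lambda t: t.similarity, reverse=True)

seed = Track("seed", 1.0, 120.0, "fusion")
candidates = [
    Track("fast", 0.95, 150.0, "fusion"),
    Track("close", 0.90, 118.0, "orchestral"),
    Track("match", 0.85, 124.0, "fusion"),
]
print([t.title for t in lock_dimensions(seed, candidates)])  # ['close', 'match']
```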

Take Raga's Dance by Vanessa-Mae, A R Rahman, ... Royal Philharmonic https://maroofy.com/songs/476841571 . I put in this track expecting other fusion songs to pop up, and arguably some do, but much more often it feels like a 20 second section was used to define the original song and it misses the underlying concept. In my subjective description, it got the epic violin in orchestral music, but it completely ignores the fusion between the distinct styles of traditional Indian singing/instrumentals and Western orchestral music. It also ignores the call-and-response structure between the violin and the Carnatic players, which is what I actually care about. Other songs have the vocals but no epic backing. It feels like it's matching multiple samples from the song instead of the whole song.
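One common way to get a whole-song vector assumes a model that embeds fixed-length chunks. Each chunk is embedded, and the chunk vectors are averaged (mean pooling). The result then reflects the whole track rather than whichever section happened to match. Whether OP's model sees chunks, the full song, or only a preview is not stated, and `track_embedding` is a hypothetical helper.

```python
import math

def track_embedding(chunk_embeddings):
    """Mean-pool per-chunk embeddings into one unit-length vector for the whole track."""
    if not chunk_embeddings:
        raise ValueError("need at least one chunk embedding")
    dims = len(chunk_embeddings[0])
    count = len(chunk_embeddings)
    mean = [sum(chunk[d] for chunk in chunk_embeddings) / count for d in range(dims)]
    norm = math.sqrt(sum(x * x for x in mean))
    # normalise so cosine similarity compares direction only
    return [x / norm for x in mean] if norm else mean

# An orchestral section and a vocal section pull the average in two directions.
print(track_embedding([[1.0, 0.0], [0.0, 1.0]]))
```

Averaging has a trade-off: it blurs contrasts such as call-and-response between sections. Keeping per-chunk vectors and matching on several of them is the alternative.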

This feels very promising since it clearly is picking up the styling of the specific songs across different genres and languages. I look forward to seeing where this goes.

I also think it would be interesting if there was a way to specify two different songs to find either only the common things and/or to find what the fusion of those two tracks produces.

I agree with everyone's criticisms that it seems to identify similar tempo and melodic riff, irrespective of genre. But to me this is a feature, not a bug. I could see this or something like it opening my eyes to music I would never possibly have found on my own. I really like it!

Spotify on the other hand seems to want to send me to the same group of artists and tracks I've listened to before, following some Collatz conjecture type algorithm that eventually converges on the same tuned playlist for that genre, no matter what the starting parameters may be.

It’s a pretty cool idea, and it gets to a philosophical question really quickly: “What do people mean when they say they like similar music?”

Era? Artist? Genre? Sound? Tempo?

Personally I spend my time finding similar-era music because I like to hear how sounds evolved.

Ideally one would like an algorithm to be able to realize "this person prefers to explore new music from the same era" vs. "that person prefers to jump around to different countries" vs. "the other person prefers to remix their existing playlists," and thus come up with the optimal degree of novelty for each listener. Or at least let the user set a novelty slider to customize their own experience.
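One way to sketch such a novelty slider uses a hypothetical scoring rule, not anything a real service is known to use. A blend of two terms ranks the candidates: closeness to the seed track, and distance from everything in the listener's history. `novelty=0` gives pure "more of the same", and `novelty=1` favours whatever the listener has heard least like.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rerank(seed, candidates, history, novelty):
    """Order candidate names by a linear blend of seed similarity and unfamiliarity.

    `candidates` maps a track name to its embedding; `history` is a list of
    embeddings the listener already knows. The blend is an illustrative choice.
    """
    def score(vec):
        familiarity = max((cosine(vec, h) for h in history), default=0.0)
        return (1 - novelty) * cosine(vec, seed) + novelty * (1 - familiarity)
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)

seed = [1.0, 0.0]
history = [[1.0, 0.0]]
candidates = {"same": [1.0, 0.0], "new": [0.6, 0.8]}
print(rerank(seed, candidates, history, novelty=0.0))  # ['same', 'new']
print(rerank(seed, candidates, history, novelty=1.0))  # ['new', 'same']
```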

> But to me this is a feature, not a bug. I could see this or something like it opening my eyes to music I would never possibly have found on my own.

What makes it different from a big "play me a random song" button then?

They have some similarity based on the actual music.

> it seems to identify similar tempo and melodic riff

Spotify wants you to listen to the tracks that they are paid to promote.

Hey thanks for the feedback! I definitely have a lot of improvement to do on the model, it currently performs better for some styles/genres of music than others.

But the model architecture I'm using is kinda outdated as well, gotta iterate on it more to improve it further!

I'm also thinking of letting users upvote/downvote results, which can also help improve quality on the ranking side.

Honestly it's loads better than current Spotify/YouTube Music suggestions. Mostly they just seem to suggest popular stuff that's heavily marketed...even though I seeded all my "thumbs up" with only eclectic stuff.

Yes, it's hard to find a song I really really like, but 1-in-10 seem to be something I'd add to my eclectic "thumbs up" playlist. And almost none of them are by any artist that I've heard of before.

This is huge for me. Thanks.

You're not alone. For me, Spotify suggestions are "things you won't hate." Most everything is palatable, but forgettable and usually not all that interesting.

I'd like to add that it's not all the platforms' fault. Too many artists aren't artists at all. They make too little effort to be unique.

I never get any heavily marketed music recommended on Spotify. Almost invariably it's something obscure. But I only ever listen to obscure music. I guess I'm saying I don't think the Algo is weighted for payola.

I honestly think they try first to make you happy, second to reduce their spend.

Wanted to hop on and say this is amazing, thank you for sharing this! Also agree that it seems that it's really good at finding literally similar sounding songs, but not what I would expect a friend to recommend (this is both good and bad I guess). As someone else said, this is already way better than my spotify recs

ty for ur kind words! <3

Another strange song for your testing that gives completely bonkers recommendations: https://maroofy.com/songs/1486467186

If you need someone to test your model, you will never find one with more eclectic/strange taste than me ;)

Another bonkers one:

Nine Inch Nails

Recommended: Muppets

I'll note my own experience, that Spotify and Apple Music both struggle to find me latin reggaeton outside a small subset of popular artists, and my first couple searches with this tool have found me so much music I've never heard before that matches exactly the 'vibe' I want to hear, and is introducing me to different-but-related sounds and artists I couldn't have found on my own.

I agree with the other commenter - this is huge for me. Please, do whatever you need to do to monetize this so it never goes away. I would love to pay you for this.

I don't know if someone already said this, but as an amateur music producer I would love to upload my songs and discover similarities. Thanks for this amazing tool!

FWIW I had one shot and entered "Tabaran"

Rather than get back anything "acoustically similar" it simply returned a list of other songs on the same album (several of which are far from being acoustically similar).

No drama, you're attempting to cover a lot of ground, but I'm guessing there was no actual fingerprint there for that work and no sense of other songs that sounded similar.

ADDENDUM: Okay, I had to select the song <doh> .. but still "something went wrong" - perhaps hugged to death or not found to process. No matter :-)

If I am not mistaken, this is only trained on the preview and not on the entire song.

If you try a song with a real intro, it gives strange results. For example: "Goodbye Blue Sky - Pink Floyd" (https://maroofy.com/songs/1065976153)

Same for "Station to Station - David Bowie": lots of tracks with ambient noise.

Categorising music is surprisingly difficult.

See this paper from https://everynoise.com/ : https://everynoise.com/EverynoiseIntro.pdf

IIRC they try to classify music on 17 different points/features. What you see on the web is an attempt to visualise (and provide a guide to music based on) some of them.

Yes. I think many of those features are based on pre-NN feature detectors (such as BPM), and Danceability, Valence and Energy sound like principal components that have been given names.

The Echo Nest was great for its time, but if they have kept up, they're not exposing their more modern learned features to users anymore.

They were acquired by Spotify, and there's been some work done by/for Spotify since then.

I'm not at liberty to say what, sadly, as I work for Spotify.

I think I can say that one of the main challenges is running this analysis for users. It's prohibitively expensive (or was prohibitively expensive) to use this to keep track of and run recommendations for what users are listening for each user.

It can be used on smaller scales, but, well, it's probably NDA :)

Can you say why Spotify's recommendations are so bad? Something like what OP has made should have been relatively simple for Spotify to build many, many years ago, yet that hasn't happened. Is the whole system just rigged to only recommend a few "sponsored" artists?

Because, as I said above, it's a very complex problem :)

I honestly don't know much about recommendations (and what I know I probably cannot tell). But there's definitely continuous work being done on them. That work can also be hampered by extremely conflicting requirements (where each "some" means a double-digit percentage of users, and these "some"s overlap with each other):

- some users want more of the same, some users want a more diverse listening experience. Some of these users are the same user, but on different days

- some users mostly prefer curated suggestions, some users want random stuff. They can also be the same user :)

- some users are heavily weighted to only a few artists, some users listen to everything and anything. And even this can be the same user :)

- there's probably stuff about licensing, availability, contracts etc. at play as well, because in streaming services it's always there, in very bizarre ways

Basically every single tweak to recommendations will break them. And yeah, Spotify employees will complain about this more than anyone else, all the time :)

I doubt that "why does your product suck" is one of the things a Spotify employee is allowed to talk freely about in public!

But I've been watching them, so I will speculate. A few years ago, Spotify had two young interns, Sander Dieleman and Aäron van den Oord. We know a bit of what they worked on, because Dieleman blogged about it, and indeed it was something a lot like what OP has made here, only better, I would say. I asked him, and Dieleman was allowed to say that the thing they built was one of the inputs into the then-new Discover Weekly, which made headlines for how outrageously good it was.

But Dieleman and v.d.Oord did not stay at Spotify. They were headhunted by DeepMind, and have had a VERY impressive track record there over the years.

And I wonder why. Was there a conflict between the old school ML of the Echo Nest people and the new fancy neural net kids? Or was it just, as GP alludes to, that the NN methods were just too computationally expensive and they failed to justify their costs to leadership?

A distributed, local-first architecture might work well for this. I’m happy for my computer to crunch away on my behalf, generating recommendations and indexing stuff. I’m happy to recontribute that work to a common index of some kind.

I def prefer for that common index to have a permissive license though!

I had the same experience. I could see the element it matched on (beat, pitch, etc.), but it missed the riff or nuance that made the source song special to me.

I am enjoying Raga's Dance, which is nothing like what I was just listening to. Thank you for the recommendation ;)

It doesn't seem to find similar-sounding tracks at all for me.


The Oblio Joes - "Captain of the Moon"

The Bondage Fairies - "Levenus Supremus"

... both chosen so "Just shove a bunch of recent pop-rock at the user" won't work.

I've only tried a few songs but they've mostly been bangers! I did come across a couple examples where the recommended songs just heavily sampled the original but overall very impressed.

Same here

I've never commented on HN before, but I feel compelled. Congrats, you get my first.

I've been using your site for 10 minutes and already added song after song to my library. I'm sure you'll improve the matching algorithm over time. This is a great first step; I'm being exposed to songs I've never come across before.

Good job!

Same for me, I don't comment often, but this is great! I like having instrumental music for when I'm working (with no vocals), so if the model could classify vocals vs no vocals that would make it even more useful for me.

Thanks a lot, definitely have a lot of work to do with improving the model!

Good job to you for commenting. I wouldn't have checked it out otherwise.

This is actually great, one of the most promising recommendation algorithms I've come across.

I love that it's working by sound. So often Spotify, for example, will insist I check out other bands in the same "scene" or from the same era as other artists I like, and in genres as broad as "70s rock" it can be really tiring.

One of the first tracks I tried was Natural Woman by Carole King. I love that it recommended other slow but rhythmic piano vamps with tender vocals by artists singing in other languages, some modern, some old, as well as some sung by men. It even recommended more Carole King, which I guess shows it's picking up on something constant in her music.

What impressed me was that it recommended quite a lot of numbers in the same key! It was funny clicking through them and the tonal centre being unchanged. It was like they really were different but the same.

It definitely doesn't understand everything about tonality though! I tried some atonal music next, Naama for solo harpsichord by Iannis Xenakis, expecting to get more atonality back. Nope, first result was very firmly tonal: "Suite in E flat major" by Bach for solo harpsichord. It definitely got some essential aspects of the pieces down but completely missed the central concepts underpinning their musicality.

Very promising like I say, can't wait to try more things out!

I'm enjoying it a lot too, I've always been a bit frustrated by what you describe with Spotify. It's like it's keying more on genre than sound, which has both pros and cons, but ends up giving me a lot of music that is nominally in the same genre but missing the qualities I like about a certain song.

Probably my favorite song of all time is Close to the Edge, by Yes. Spotify will happily provide me with tons of recommendations for 70s prog - much of which I love too, but some of it leaves me cold.

Maroofy came up with "Good Day" by Leigh Ashford, a song I had never heard of. It's very interesting to compare it to CttE - it's not a very similar song in most respects, but has a similarly prominent bouncy bassline. I like the song, and I don't think I'd have found it via spotify or any of my other usual music discovery sources.

Overall this is very cool, great project.

Thank you so much! With an improved model, things should get a lot better! :D

Wow, these songs were impressively similar to my queries. I would love an interpolation playlist feature, where I put in 2 different songs (from, say, classic rock and EDM) and get 10-20 songs that slowly change from the start song to the end song.

Yes, this: can you get any interesting insights by playing with the embedding vectors? What happens if you add embeddings together? Weighted average of multiple tracks? Follow the average vector for an artist's work over time?
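Here is a minimal sketch of the interpolation playlist asked for above, assuming every track has an embedding. Waypoints are weighted averages of the two seed embeddings, and the nearest unused track to each waypoint is picked. Linear interpolation is an assumption (spherical interpolation is a common alternative for normalised vectors). The track names and 2-dimensional vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lerp(a, b, t):
    """Weighted average: the point a fraction t of the way from a to b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def interpolation_playlist(start, end, index, steps):
    """Walk from start's embedding to end's, taking the nearest unused track at each waypoint."""
    picked, used = [], {start, end}
    for i in range(1, steps + 1):
        target = lerp(index[start], index[end], i / (steps + 1))
        best = max((tid for tid in index if tid not in used),
                   key=lambda tid: cosine(index[tid], target), default=None)
        if best is None:  # ran out of tracks
            break
        picked.append(best)
        used.add(best)
    return [start] + picked + [end]

index = {"rock": [1.0, 0.0], "a": [0.9, 0.3], "b": [0.5, 0.5], "c": [0.2, 0.9], "edm": [0.0, 1.0]}
print(interpolation_playlist("rock", "edm", index, steps=3))  # ['rock', 'a', 'b', 'c', 'edm']
```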

100% This is definitely worth exploring, and I'm currently trying to figure out the appropriate front-end UI/UX to expose this functionality for users.

I second this, somewhat like http://boilthefrog.playlistmachinery.com/

Tried Erreur 404 by L'Impératrice[1], and I noticed the beat of the other recommended songs were eerily similar!! I'd argue your project is actually too good.

Where Spotify's Discover Weekly tried to connect you to music other people listen to {B, C, D, ...} because you've listened to a particular song {A} {B...->A}, your model quite literally tried to find other music {A₂} that sounds like what you're looking for {A₁} {A₂->A₁}.

Edit: OK, some of these are actually pretty dope...

[1]: https://maroofy.com/songs/1458902217

This approach is better for remix beat-matching than as new song recs, IMO.

… what?

IIUC: Spotify seems to use something similar to collaborative filtering. People rate songs based on more than similarity, which is what this model seems to provide.
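For contrast, here is a minimal sketch of item-item collaborative filtering, the family of methods the comment attributes to Spotify. The attribution is the commenter's guess, and this is not Spotify's actual system. Two songs count as similar when the same listeners play both, however they sound. The listening data is made up.

```python
import math
from collections import defaultdict

def item_similarity(listens):
    """Cosine similarity between songs over binary "user played song" vectors.

    `listens` maps a user to the set of songs they played (toy data).
    """
    fans = defaultdict(set)
    for user, songs in listens.items():
        for song in songs:
            fans[song].add(user)
    sims = {}
    names = sorted(fans)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = len(fans[a] & fans[b])
            if overlap:  # skip pairs no listener shares
                sims[(a, b)] = overlap / math.sqrt(len(fans[a]) * len(fans[b]))
    return sims

listens = {"u1": {"A", "B"}, "u2": {"A", "B"}, "u3": {"A", "C"}}
print(item_similarity(listens))
```

Nothing here looks at audio, which is why this approach surfaces what similar listeners play rather than what sounds alike.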



IIUC -> If I understand correctly

{W₁} {H₂->A₁}*T?

My thoughts exactly.

r/iamverysmart vibes… someone who has watched one Lex whatever his name was video on the maths of neural networks…

Probably just BS by GPT3.

Could you not shitpost like the parent, please?

How did you get the samples for the song? iTunes allows scraping?

Your project is extremely motivational .. how long did it take you? What did you train on? I do DL for work and just play with things like CIFAR. This is so inspiring.

Apple Music subscription is $12/month and it allows full downloads of songs.

Hmmm. I would have thought that Apple would detect and block you if you try to download their entire catalog without some kind of permission. If they're the same size as the files in my Apple Music, it would take on the order of a petabyte to store (compression would obviously reduce that).

120M songs would take approximately 1000 years to listen to in normal time and is way beyond normal usage.
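Here is a back-of-the-envelope check of both figures. The inputs are assumptions, not numbers from the thread: an average track length of 4 minutes, 256 kbps AAC (Apple Music's standard AAC bitrate), and, for comparison, a 30-second preview at the same bitrate.

```python
SONGS = 120_000_000
AVG_MINUTES = 4        # assumed average track length
BITRATE_KBPS = 256     # assumed AAC bitrate
PREVIEW_SECONDS = 30   # assumed preview length, same bitrate

def listening_years(songs, minutes):
    """Years of non-stop playback for `songs` tracks of `minutes` each."""
    return songs * minutes / 60 / 24 / 365.25

def storage_bytes(songs, seconds, kbps):
    """Bytes needed for `songs` clips of `seconds` each at `kbps` kilobits/s."""
    return songs * seconds * kbps * 1000 / 8

print(f"{listening_years(SONGS, AVG_MINUTES):.0f} years")                                # 913 years
print(f"{storage_bytes(SONGS, AVG_MINUTES * 60, BITRATE_KBPS) / 1e15:.2f} PB full tracks")  # 0.92 PB full tracks
print(f"{storage_bytes(SONGS, PREVIEW_SECONDS, BITRATE_KBPS) / 1e12:.0f} TB previews")      # 115 TB previews
```

Both comments hold up to within a factor of two: roughly 900 years of audio and close to a petabyte for full tracks. Previews alone would be about an eighth of that.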

... but certainly not ALL songs? I would think they'd notice if you download more than one minute of audio per minute over a long time.

But aren't they DRMed? Or does it let you download in mp3 or related formats directly (I'm not on Apple music)?

You're correct, it's DRM only

Analog hole.

most relevant

I just wanted to quickly THANK EVERYONE for taking the time to check this project out and give your feedback!

I honestly didn't expect this project to get this much traffic -- I really can't express my emotions via text rn lol.

I'm working on an improved AI model that should address a lot of the shortcomings of the current one, along with a lot of other features people have mentioned (playlists, deduplicating results, volume control, better search UX, etc.).

Got a lot of updates coming, time to ship! :D

You should add an option to donate.

Volume control is a must. I had my headphones on, at normal volume, and... let's just say I didn't realize just how loud it would be.

Other than that, however, this seems neat. I've been trying to listen to new music lately, so this'll definitely come in handy.

Finding sources for input data is something I struggle with when building deep learning models. Out of curiosity, how did you go about programmatically accessing the music files for all 120M+ songs, in order to create your embedding vector? I can't imagine iTunes has an API which would let a person do that.

Good reminder of the value of Adversarial Interoperability https://www.eff.org/deeplinks/2019/10/adversarial-interopera...

If by “adversarial” you mean a publicly documented and freely available API that has been around in some form for two decades.

They do, it’s just rate limited. See https://news.ycombinator.com/item?id=34641623

Also would like to know. I can't even listen to the full songs, and I assume I'd have to pay. I can't imagine buying 120 million songs, so it has to be some collab with iTunes.

Thinking about both processing time and the difficulty of sustaining 120M downloads' worth of programmatic access, I wouldn't be surprised if this is actually trained on the track previews.

I’m almost positive it is. If you put in a song with a bunch of different styles and sections (e.g. Bohemian Rhapsody), the suggestions match the preview.

> so it has to be some collab with iTunes.

There’s no way today’s Apple would allow such a collaboration. They’d just keep the feature and market it as part of Apple Music.

Probably scraped them

I searched for "Poinciana" by Keith Jarrett[1] (one of my all-time favorites).

The top three responses were "La Raya" by Los Islenos [2], "Days of Our Love" by Deepa Dremata [3], and "Flying Home" by Michelle Mack [4].

While I didn't hate any of them, and they all featured a piano, I wouldn't say any of them sound like Keith Jarrett, either.

[1] https://music.apple.com/us/album/poinciana/1446740946?i=1446...

[2] https://music.apple.com/us/album/la-raya-feat-ben-murphy/154...

[3] https://music.apple.com/us/album/days-of-our-love/1608767255...

[4] https://music.apple.com/us/album/flying-home/1577716851?i=15...

I had similar experience

it doesn't seem to understand anything about the style of the music

seems to find stuff which is sonically similar rather than musically similar, and even then I'm being generous

no useful recommendations

I'm wondering if anyone has done something similar but, instead of trying to find similarities in the raw audio, used tags available from sources like Last.FM, MusicBrainz, Discogs, etc.? And the ultimate answer to that is probably "those sources kinda suck". Discogs is like a trainspotter on the spectrum, fascinated by release IDs. MusicBrainz is kinda similar (each song will have a dozen matches of wildly different quality). Last.FM tags are user-generated, which makes some of them amazingly useful and others amazingly detrimental.

I have a human-powered recommendation service that uses my own tags that I've added to my mp3 library over 25 years. I add instruments (not all, just the ones that stand out, like synth, flute, distortion, violin, piano), vocals (male/female, falsetto, spoken, rap), moods (happy, sad, angry, mellow, dramatic, chillout) and genre (I don't go too deep here, because I hate getting recommendations stuck within some obscure sub-genre). And that's it. I get it to play a random highly rated track with a keyword or two, and then use the tags from the first 10 songs to generate the next. But since, for me, music is a somewhat interactive experience, every 10 songs or so I'll think of something that I want on the list (maybe reminded of it by another one that just played).
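Here is a minimal sketch of that tag-driven loop, under one reading of the comment. Seed with a random highly rated track carrying a keyword. Each next track is the unplayed one whose tags overlap most with the last 10 plays. The library structure, the sliding 10-track window, and the tie-breaking by rating then at random are all assumptions.

```python
import random
from collections import Counter

def seed_track(library, keyword, min_rating, rng):
    """Pick a random highly rated track carrying the given keyword tag."""
    matches = [title for title, (rating, tags) in library.items()
               if keyword in tags and rating >= min_rating]
    return rng.choice(matches) if matches else None

def next_track(library, played, rng, window=10):
    """Pick the unplayed track whose tags best overlap the last `window` plays."""
    tag_counts = Counter(tag for title in played[-window:] for tag in library[title][1])
    candidates = [title for title in library if title not in played]
    if not candidates:
        return None

    def score(title):
        rating, tags = library[title]
        # tag overlap first, then rating, then a random tie-breaker
        return (sum(tag_counts[tag] for tag in tags), rating, rng.random())

    return max(candidates, key=score)

# Hypothetical library: title -> (rating, tags)
library = {
    "a": (5, {"piano", "mellow"}),
    "b": (4, {"piano", "mellow"}),
    "c": (5, {"rap", "angry"}),
    "d": (3, {"piano"}),
}
rng = random.Random(42)
playlist = [seed_track(library, "mellow", 4, rng)]
for _ in range(3):
    nxt = next_track(library, playlist, rng)
    if nxt is None:
        break
    playlist.append(nxt)
print(playlist)
```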

Another thing I think might be useful for recommendations is Last.FM listening histories. Think about it: there are hundreds of thousands of active listeners "scrobbling" their listening history. You could easily parse that and group together songs that have been played within 5 songs of each other, as long as they're not by the same artist and the time between the songs is around zero (i.e., listened to in order, no pauses). Similarity is highest for songs that were played back to back, and the score drops as the distance between them grows.
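Here is a minimal sketch of that scrobble-based scoring. A scrobble records when a play happened, not how long any pause afterwards lasted. So this assumes a new listening session starts whenever consecutive scrobbles are more than 10 minutes apart. The 1/distance decay is also an assumed choice, and `cooccurrence` is a hypothetical helper.

```python
from collections import defaultdict

def cooccurrence(scrobbles, window=5, max_gap=600):
    """Score pairs of tracks played within `window` plays of each other in one session.

    `scrobbles` is a list of (unix_time, artist, track) sorted by time.
    Same-artist pairs are skipped; a pair d plays apart adds 1/d to its score.
    """
    scores = defaultdict(float)
    session_start = 0
    for i in range(1, len(scrobbles) + 1):
        # close the current session at the end of the list or at a long gap
        if i == len(scrobbles) or scrobbles[i][0] - scrobbles[i - 1][0] > max_gap:
            session = scrobbles[session_start:i]
            for a in range(len(session)):
                for d in range(1, window + 1):
                    b = a + d
                    if b >= len(session):
                        break
                    _, artist_a, track_a = session[a]
                    _, artist_b, track_b = session[b]
                    if artist_a != artist_b:
                        key = tuple(sorted([(artist_a, track_a), (artist_b, track_b)]))
                        scores[key] += 1.0 / d
            session_start = i
    return dict(scores)

plays = [(0, "X", "x1"), (200, "Y", "y1"), (400, "Z", "z1"), (5000, "W", "w1")]
print(cooccurrence(plays))
```

Summed over many listeners, the pair scores become a similarity graph. This is close in spirit to the session-based approach ListenBrainz describes in the reply below.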

In fact, ListenBrainz (partner project to MusicBrainz) is doing some stuff similar to what you mention about listening histories. We're using the data to generate similarity based on which songs are listened to alongside each other in "listening sessions".

Follow the troi-bot user with a ListenBrainz account, and we'll generate you a daily playlist: https://listenbrainz.org/user/troi-bot

This is still very much work-in-progress, but we're doing as much as possible out in the open to solicit feedback from people.

I've tried with Discogs and found it to work pretty well. It's similar to what OP did, except the "embedding" vectors were created from the Genre/Style tags on Discogs. I didn't have a vector database though, so it was very slow. On Discogs those tags are per album, not per track, so to create a playlist of, say, 10 songs similar to a song, I'd find the ten closest albums, then search for them on last.fm and pick the most popular track from each to add to the playlist.
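Here is a minimal sketch of that album-level approach. Each album's Discogs genre/style tags are treated as a multi-hot vector, with cosine similarity computed directly on the tag sets. The distance the commenter actually used isn't stated. `top_tracks` is hypothetical local data standing in for the last.fm popularity lookup.

```python
import math

def style_similarity(a, b):
    """Cosine similarity of two multi-hot tag vectors, computed on the tag sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def playlist_from_styles(seed_album, albums, top_tracks, n=10):
    """Take the most popular track from each of the n albums closest in style.

    `albums` maps an album to its set of genre/style tags; `top_tracks` maps an
    album to its tracks ordered by popularity (stand-in for a last.fm lookup).
    """
    seed = albums[seed_album]
    ranked = sorted((a for a in albums if a != seed_album),
                    key=lambda a: style_similarity(seed, albums[a]), reverse=True)
    return [top_tracks[a][0] for a in ranked[:n] if top_tracks.get(a)]

albums = {
    "seed": {"Electronic", "Techno", "Minimal"},
    "a": {"Electronic", "Techno"},
    "b": {"Rock", "Punk"},
    "c": {"Electronic", "Minimal", "Techno", "Ambient"},
}
top_tracks = {"a": ["a1", "a2"], "b": ["b1"], "c": ["c1"]}
print(playlist_from_styles("seed", albums, top_tracks, n=2))  # ['c1', 'a1']
```

Because the tags are per album, every track on an album gets the same vector. That is the limitation raised a couple of comments below.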

A similar embeddings model based on Discogs genre/style data is the Effnet-Discogs model made at the Music Technology Group at Universitat Pompeu Fabra: https://replicate.com/mtg/effnet-discogs

Per-album metadata is useless for a lot of stuff that I like. It's even useless for a lot of The Beatles stuff because they tend to have a range of styles on an album and tended to bring in weird instruments on individual tracks.

Discogs is great, it just doesn't concern itself with how the music sounds...

Which is unfortunate, because it has (on a tiny number of releases) instrument and vocal tags. It's just so unreliable. AllMusic is another decent source for tags, but not instruments. It's the age-old problem with ML/AI: data quality. Garbage in, garbage out. If only we could crowd-source listeners and get them to tag music from a list of available moods, instruments etc. Oh wait... that's exactly the feature that recommendation services have been removing for the last 10 years.

User tagging isn't a panacea either, because people tag inconsistently, and people who tag a lot are probably not very representative.

For an extreme example of that, see the boorus. Some machine learning people have become interested in those, since they are huge datasets of extensively tagged material ... or maybe it's the booru people who have become more interested in machine learning. Either way, I'm sure they're great, if you're into waifu anime, porn, or waifu anime porn. Both types, country AND western, as they said in the Blues Brothers movie. Any tag remotely subjective (such as "beautiful", God help you) is going to be extremely coloured by the tastes of an extreme fringe.

At least, relying on fanatics to do the work for them, I assume they've got a handle on simple spam on the boorus. Commercial recommender service tagging systems don't have that luxury, and that's probably why they end up eventually removing them.

This is very true. I'd pay for a metadata-only / playlist service that works with Spotify/Tidal/Apple/local music.

And don't allow free-text tags. Instead you give a list of available tags - the lowest number needed to describe most tracks. I mean, let people add their own if they want, but you should ignore those while training the model.

I actually think that instead of trying to tag some specific mood (e.g. "happy"), some should be a sliding scale between two opposites:


Instrument tags are easier to understand. Give a list of instruments (or instrument types, because the user might not know precisely which woodwind or percussion instrument it is) with checkboxes beside each.

Some users will be experts because they play woodwind. Let those users apply to become experts, pass a test, "identify the instrument", and if they pass, give them half price subscription as long as they moderate X tunes per week.

rateyourmusic.com is an alternative to allmusic.com while chosic.com focuses on finding music.

The latter has generated more similar music for me so far. But I welcome every additional project improving the search for music, which has been so neglected by most services.

I've been a paying subscriber to rateyourmusic for years because one extra feature you get is per-track ratings. Extremely useful if you're in discovery mode.

Quite impressive given you’re working with tiny snippets of songs. Kudos.

This is so interesting out of the gate that criticisms seem rather foolish. But I’ll just say what I tried.

I tried pop songs, jazz tunes, and even rather strange, unique recordings like “One Step Beyond” by Madness. And every time I got something with similar aspects.

I also like that all cultures were represented.

Would I use this to find music? Maybe? It kind of underscores that we don’t always listen just for a particular sound, but for what that artist represents to us. I may really like a band and absolutely hate a band that’s pretty much a copy of them (maybe done a few decades later) precisely because they are too much a copy.

I don’t think I could build playlists (in this current offering) because the songs are _too_ similar. Maybe if I want an exercise playlist with a particular bpm and vibe?

I think if I were looking for music for TV or advertising this would be an absolute winner. If I feel like I want a song like Daft Punk here, but I can’t afford to license it, I can use your tool to find a lot of tracks with similar vibes that I can afford or that I haven’t heard of.

Search doesn't work in a recent Firefox on Windows: typing anything in the search box shows the "Loading..." dropdown below it and then nothing happens. This generates no network activity and the console shows:

    Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote
    resource at https://cdn.segment.com/v1/projects/F4GFNelOpRsgUJc6iwTuiXr2t6AH5LCY/settings.
    (Reason: CORS request did not succeed). Status code: (null).

    NetworkError when attempting to fetch resource  --  _app-f73fb5ecceb5fc1e.js:1:82295

PS. It looks like your code has some hard dependency on segment.com and/or panelbear.com tracking services. These are blocked by default by uBlock. Turning uBlock off seems to resolve the issue.

Stuck on my mobile device. Disabled both ublock and Firefox enhanced tracking restrictions. Tried brave, chrome, Samsung internet, and regular Firefox (usually use Mull). Nothing works.

I've tried in both Firefox and Chrome and getting the same issue. I assume settings holds that magic that makes the network requests because right now none are being made.

Edit: Turned off privacy badger and then it started making requests on Firefox.

Same goes for Vivaldi... I thought it might have received the famous "hn-hug-of-death".

//Edit: Just read the edit regarding ublock origin. So this website definitely needs a Privacy Policy then.

Same. Get stuck on an infinite loading. Even without uBlock and similar. Too bad sounded cool. :(

Turned off uBlock, still the same issue

Weird, I'm using Firefox with uBO and it's working fine.

Getting the same error on macOS, both Safari and Firefox.

Everything loads instantly. Plays almost instantly. And it really seems to find very similar style/beat/music. Interface is clean.

I am not sure if intentional, but the loading animation on the play buttons feels like it is in sync with when the music starts playing. Makes for a responsive feedback.

Hm, maybe it's the hug of death but I can't search for any songs as of now. (Stuck loading)

EDIT: It's working on my desktop. Above was on mobile Firefox/Chrome.

Thanks! :)

Let's just say that setting up the backend systems involved a lot of tears & frustration lmfao

What tech did you use to pull down all of this data and comb through it?

How did you get access to 120,000,000 songs from iTunes? Not just the listing, but the actual audio.

The preview audio is free. It may just be 120,000,000 previews.

Totally explains this part of the current top comment here:

> but much more often it feels like a 20 second section was used to define the original song and it misses the underlying concept

Yeah, the linked The Medallion Calls example, for instance, catches similarities to the slow opening section of the track and totally misses the main part.

Even so, how do you get the 120M previews?

Apple has a public API with some rate limiting that returns a link to the preview audio file, see:

https://itunes.apple.com/us/lookup?id=1023678453 (https://www.chrisjmendez.com/2017/06/19/working-with-itunes-...)

So probably all that is required is a couple threads downloading and a proxy service with a large pool of IP addresses randomly rotating on every request. Maybe OP also found an undocumented API endpoint somewhere that was not rate limited.
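Here is a minimal sketch of reading preview links from the public Lookup endpoint quoted above. It is not a claim about how OP collected the data. Lookup responses are JSON with a `results` list whose track entries include `trackId` and `previewUrl`, and the endpoint accepts several comma-separated IDs. The batch size and the pause between requests are assumed values, not documented limits, and `fetch_previews` is a hypothetical helper.

```python
import json
import time
import urllib.parse
import urllib.request

LOOKUP_URL = "https://itunes.apple.com/us/lookup"  # endpoint quoted in the comment above

def preview_urls(payload):
    """Map trackId -> previewUrl for every track in a Lookup API JSON response."""
    data = json.loads(payload)
    return {
        r["trackId"]: r["previewUrl"]
        for r in data.get("results", [])
        if "trackId" in r and "previewUrl" in r
    }

def fetch_previews(track_ids, batch_size=50, delay_seconds=3.0):
    """Look up IDs in comma-separated batches, pausing between requests."""
    found = {}
    for start in range(0, len(track_ids), batch_size):
        batch = ",".join(str(i) for i in track_ids[start:start + batch_size])
        url = LOOKUP_URL + "?" + urllib.parse.urlencode({"id": batch}, safe=",")
        with urllib.request.urlopen(url, timeout=30) as resp:
            found.update(preview_urls(resp.read()))
        time.sleep(delay_seconds)  # stay well under the rate limit
    return found

# Offline example with a hand-written response body:
sample = '{"resultCount": 1, "results": [{"trackId": 1023678453, "previewUrl": "https://example.com/p.m4a"}]}'
print(preview_urls(sample))
```

The arithmetic shows why people below doubt this was done politely. At one 50-ID request every 3 seconds, 120M IDs take 2.4M requests, or roughly 83 days.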

... without getting banned?

I'm also curious.

This is so sweet. It's finding gems that when I look them up on YouTube, are 10+ years old and sub-200 views. Total sleepers, but I can discover them through this service. I love it. Great work.

How did you scrape the audio of 120M songs? That sounds expensive.

Also curious about this… It seems impossible.

I wonder this too. I tried scraping iOS App reviews from Apple's server and I didn't get far before my IP was blocked.

Probably just used the previews (makes sense, songs tend to keep their rhythm throughout).

Lots of industrial metal has a drawn out start before the actual song lol

Previews don't start at 0:00 but at a point selected by the artist to give the best quick impression possible. Of course, I don't know what they've done with the "old" catalog.

How the hell do you even get access to the entire iTunes catalog?

I think that would be an even more interesting post.

"Apple Music doesn't have rate-limiting. The end."

um, isn't that just the Apple Music paid subscription service, their Spotify competitor? They only advertise ("over") 100M songs though, I'm not sure where the extra 20M come from.


Sure, but it really can't be as simple as paying $10 for the month and looping through the entire catalog and downloading it...right? Did nothing in their system catch millions of simultaneous track requests and a petabyte+ data transfer for a single user?

Why would that be against the TOS?

That is a great idea! Measuring distance between embeddings has always been a cool concept (e.g., if I take the vector that represents the word "king", subtract the vector for "man", and add the vector for "woman", the result is approximately the vector for "queen"), and it's awesome to see the same concept applied to music.
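A toy version of that analogy trick, assuming hand-picked 3-dimensional vectors (real word embeddings are learned and have hundreds of dimensions): compute king - man + woman, then return the nearest remaining word by cosine similarity. Finding "similar songs" is the same nearest-neighbor step, just without the arithmetic.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(vocab, a, b, c):
    """Return the word whose vector is closest to vocab[a] - vocab[b] + vocab[c]."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})  # skip the query words themselves
    return max(candidates, key=lambda w: cosine(vocab[w], target))


# Hand-picked toy vectors; dimensions loosely mean (royalty, maleness, femaleness).
vocab = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "prince": [0.8, 0.7, 0.2],
}
print(analogy(vocab, "king", "man", "woman"))  # queen
```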

Most other services try to find matches by seeing what other songs the people who like the searched song like, and finding trends amongst those. This site finds songs with similar sounds and rhythms by looking in the vector space. Awesome! Congrats on the finished site!

Fantastic tool, thanks for making it!

As others said, it would be a nice option to export it as a playlist for Spotify/Apple/YouTube or as a .txt file with "Artist - Track name" on each line. Then you can import the .txt file into a playlist converter tool like www.tunemymusic.com and play it on your favorite music service. Mine is discoverquickly.com, where you can quickly preview Spotify playlists on mouseover and discover related songs/artists.

An autoplay x seconds of every track, where you can choose the number of seconds, would be a nice addition too. This way you can discover music while doing other things.

Playlist + some sort of auto-playing mechanic is coming!

Possibly an unpopular opinion (which is ironic), but I think this could do with taking popularity into account. The vast majority of music out there is generic landfill at best, and just outright bad at worst. I think once you've done the first pass and got the sonically similar songs there should be some kind of filter which prioritises tracks with higher listener counts which would inject some 'wisdom of crowds' into the decision making. It would be a pretty cool feature to then be able to have this setting on a slider with obscure at one end and popular at the other.
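One way to implement that obscure-to-popular slider, sketched under assumptions: each hit carries an audio-similarity score and a listener count (a hypothetical field; it's not clear Maroofy has popularity data at all), popularity is log-scaled so a few mega-hits don't swamp everything, and a single weight blends the two.

```python
import math


def rerank(results, popularity_weight):
    """Blend audio similarity with popularity.

    results: list of (track_id, similarity, listeners); listeners is a hypothetical field.
    popularity_weight: 0.0 ranks purely by sound (obscure end), 1.0 purely by popularity.
    """
    if not results:
        return []
    denom = math.log1p(max(listeners for _, _, listeners in results)) or 1.0

    def score(item):
        _, similarity, listeners = item
        popularity = math.log1p(listeners) / denom  # log scale, normalized to [0, 1]
        return (1 - popularity_weight) * similarity + popularity_weight * popularity

    return sorted(results, key=score, reverse=True)


hits = [("a", 0.95, 10), ("b", 0.90, 1_000_000), ("c", 0.80, 50_000)]
print([t for t, _, _ in rerank(hits, 0.0)])  # ['a', 'b', 'c']
print([t for t, _, _ in rerank(hits, 0.5)])  # ['b', 'c', 'a']
```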

I've found probably the most egregious example of what knaik94's comment is talking about: https://maroofy.com/songs/214977681 ("Being Alive" from Company)

All the AI model seems to understand is the opening five seconds of piano. Listen to the actual song: it only opens like that because it's from a play.

I think this will struggle with any song that has a build-up. It's very promising, though; you'd need samples of entire songs, not the preview iTunes gives you.
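If full tracks were available (the previews are not full tracks), one simple option is to embed fixed windows across the whole song and mean-pool them, so an intro can't dominate. A sketch with a hypothetical `toy_embed` standing in for the real audio model; note that mean pooling throws away ordering, so capturing a build-up specifically would need something order-aware.

```python
def windows(samples, window, hop):
    """Yield fixed-length chunks of a sample sequence (a trailing partial chunk is dropped)."""
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]


def song_embedding(samples, embed, window, hop):
    """Mean-pool per-window embeddings so every section of the track contributes."""
    vecs = [embed(chunk) for chunk in windows(samples, window, hop)]
    if not vecs:
        raise ValueError("track shorter than one window")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def toy_embed(chunk):
    """Hypothetical stand-in for the real audio model: (mean, peak) of each chunk."""
    return [sum(chunk) / len(chunk), max(chunk)]


# A quiet "intro" followed by a loud "main section".
samples = [1] * 4 + [9] * 4
print(song_embedding(samples, toy_embed, window=4, hop=4))  # [5.0, 5.0]
```

The pooled vector lands between the intro and the main section instead of matching only whichever 30 seconds a preview happened to cover.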

Definitely understand the current model's shortcomings that people have mentioned.

I actually think this is largely due to how the current model and training process are designed.

I have some ideas for improving things on this front, will give it a try and push an update soon! :)

Neat, but I'm convinced after giving it a try that this is absolutely not the way to do music recommendation. The recommendations were complete misses for me.

Tried with a song I knew well- Everything In Its Right Place.

It feels a little bit like fortune telling, and I guess it is, in the sense that I'm listening closely for what makes the songs similar: not just listening, but actively trying to find the similarities, so even a couple of notes in a progression, or a drum beat, and I'll say, oh yes, that matches.

It finds very different music, not necessarily what I'd listen to in many cases, but kudos for getting me to click through a decent pile before going "wait, that's a nope, you're grasping, AI-type being."

Well done!

A comment: for classical music, it would be necessary to see the name of the composer. Currently, it's only the performing artist that's displayed in the interface.

It did well on "Eyes of the World" by the Grateful Dead, finding some artists with a similar vibe I'd never heard of. It did poorly on "Heart of the Sunrise" by Yes and "Nautilus" by COVET: https://maroofy.com/songs/1084458728.

One thing that stuck out for me is that for some genres (not jazz, not classical) the singing really matters. Covet, for example, is instrumental music with no singer at all. I wonder if you'd get better results by training separately on the vocals and the rest. (I think I've read that vocal extraction works these days, though I confess it's a lot more work.)

Solid work, but this only reinforces my belief that music recommendation is an area that has not (yet) been cracked by AI. You can find similar songs with great accuracy, sure, but even if two songs are close to each other by all calculable metrics, they can still vary drastically in subjective quality.

And a quick suggestion – all my top results when searching for popular songs were songs that were either different versions of it or those that had heavily sampled the original. While the algorithm is spot on, that isn't exactly what I am looking for. Maybe have a filter to exclude results that are too close a match?
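A rough sketch of such a filter, assuming each hit comes with its embedding and the list is already sorted by similarity to the query: drop hits that are nearly identical to the query, plus hits nearly identical to one already kept. The 0.97 cosine threshold is an arbitrary placeholder that would need tuning.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filter_too_close(query_vec, hits, threshold=0.97):
    """Greedy filter over hits = [(track_id, vector), ...] sorted by similarity to the query."""
    kept = []
    for track_id, vec in hits:
        if cosine(query_vec, vec) >= threshold:
            continue  # likely another version of, or a heavy sample of, the query itself
        if any(cosine(vec, kept_vec) >= threshold for _, kept_vec in kept):
            continue  # near-duplicate of a hit that is already shown
        kept.append((track_id, vec))
    return [track_id for track_id, _ in kept]


hits = [
    ("same", [1.0, 0.01]),
    ("alt", [0.8, 0.6]),
    ("alt_dup", [0.79, 0.61]),
    ("other", [0.0, 1.0]),
]
print(filter_too_close([1.0, 0.0], hits))  # ['alt', 'other']
```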

Have you tried Gnoosic[0] yet? It's a more standard recommender system, I think. Some of the recommendations are really on point.

[0]: https://www.gnoosic.com/

Gnoosic and other existing tools are entirely useless for some things. Most music tools don't even allow entry of individual songs, which can really mess things up if your artist has changed sound styles over the years, or plays in multiple genres.

For example, any time I hear music I like in a Disney movie and want to hear similar music, all I can get recommendations for is movie soundtracks or whatever the guest artist is known for.

The example that comes up for me on occasion is the soundtrack to The Road to El Dorado, specifically the instrumental tracks (Cheldorado, The Brig, Wonders of the New World); Chosic.com gives me a ton of other soundtracks, mostly orchestral scores, for things like Kung Fu Panda, LOTR, Assassin's Creed, Angry Birds, POTC, etc. While it's nice to be able to tweak the algorithm on Chosic to pick quiet movie scores for background music, that's not what I'm looking for. And Gnoosic is entirely useless for that.

Tried a few recommendations; it can definitely serve as an alternative to Spotify radio.

It seems like you're dodging the question of data access, and I'm not sure why. Even running an intelligent scraper to download 120M song previews sounds really complex and might take days to months, since you'd have to rotate IPs and avoid hammering the server. If you managed that, kudos; that itself is a great achievement. If not, can you let us know how you got data access? You might help other devs who want to try something similar.

He’s not obligated to tell us if he knows it’s an edge he wants to keep

I don’t think I agree with that. If he is scraping, all he needs to say is yes. The details of the scraper are proprietary, and that’s his edge for sure.

If not, and he found an unauthorized way of retrieving the data, that would reveal a serious security hole in the iTunes API, and it’s a valid concern for me as a paying customer.

120M is a huge number, and it’s not even text. It’s media.

I’m pretty sure the model is trained on the 30-second previews. If you type in a song with a lot of different sections, the suggestions match the preview and not the rest of the song. Bohemian Rhapsody, for example.

How did you download that many songs? Wouldn't that be something on the order of 360 terabytes?

I guess he's working with the 30-second low-res previews; I know you can download them via the Spotify API. Apple Music should be similar.

Wouldn't this still be about 36 terabytes of data?

Meh, you can store that on a single d3en.2xlarge for $780.

You would only need to store the embeddings and not the file audio itself.
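Back-of-the-envelope numbers for that, under stated assumptions: about 300 KB per 30-second preview (the figure implied by the 36 TB estimate above) and a 512-dimensional float32 embedding per track. The model's real embedding size hasn't been published, so treat 512 as a placeholder. The audio still has to be downloaded and run through the model once, but it doesn't have to be kept.

```python
TRACKS = 120_000_000
PREVIEW_BYTES = 300_000   # assumed ~300 KB per 30 s preview (implied by the 36 TB estimate)
EMBED_DIM = 512           # assumed embedding size; the actual dimension is not published
BYTES_PER_FLOAT = 4       # float32

previews_tb = TRACKS * PREVIEW_BYTES / 1e12
embeddings_gb = TRACKS * EMBED_DIM * BYTES_PER_FLOAT / 1e9

print(f"previews:   {previews_tb:.0f} TB")     # previews:   36 TB
print(f"embeddings: {embeddings_gb:.1f} GB")   # embeddings: 245.8 GB
```

That is roughly a 150x reduction, small enough for one machine's disk and not far from fitting in RAM.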

Could you elaborate?

Also curious about this... where did you get the dataset?

Nice snappy UI. I found a couple of times on iOS that after I'd clicked through to Apple Music, if I returned, the app had somehow stopped being able to play again but this is very well done. I think I would give a boost to song names over bands/albums in the search as I think a fuzzy song match is probably more likely what a user wants than an exact album/band match on something they've not heard of. I'd definitely use a playlist feature that just queues up the top 20 matches in Apple Music, but I don't know what the API looks like. Anyway, only feeding back because it's great.

I can certainly see what the model is getting at each time, and I've not hated any of its suggestions so far, but I've also not stumbled over any new favourites yet. I don't know what kind of features the model is able to learn, but I think it might miss one of the things I like most in music, which is not just dynamics, but something that builds tension over time and then blows up. If there are no longer-range features like that, I'd certainly experiment with them.

I've had a lot of success with Apple's own suggestions (which are admittedly extremely hit or miss), and I've probably grown my collection 1000% in my 30s and 40s after letting music drift away from me in my 20s. There's nothing better than the feeling of a new suggestion and you click through, and there's no artist profile because they're unknown, and they've got like 300 Twitter followers but you love them like a 15 year old. At least once on here I clicked through and found that I was listening to the only song ever recorded by someone, which seemed quite special.

This is FANTASTIC! There's a ton of naysayers here, but I'm going through songs and having a great time with this.

It seems most forms of EDM work great with this setup! One funny thing is for heavily remixed tracks, all the remixes pop up as suggestions. :)

Thanks! <3

The current model does tend to do well with EDM, but got a new model in the works that should hopefully address a lot of the shortcomings of the current one!

The search feature is great from what I can tell!

I tried an obscure(?) Japanese rock band "9 Ball" and they were extremely high on the search results. I can't even find the album by using Apple Music.

Searching for 9 Ball in Maroofy shows them in spots 2-6, easily: https://maroofy.com/songs/185487468

When I try to search for the same band in Apple Music, I get this jumbled mess of results: https://music.apple.com/us/search?term=9%20ball

Even searching for a specific song of the band, the correct artist is 5th on the list: https://music.apple.com/us/search?term=9%20ball%2024%20hours and doesn't even link to any of their songs: https://music.apple.com/us/artist/9-ball/28599994

I literally cannot use Apple Music search to get to the page to buy Sound Seeds by 9 Ball without going through Maroofy first. I had thought Apple removed that album years ago due to some kind of licensing issue. Strangely enough, even though there is a "Buy for $7.99" option, clicking the button in Apple Music doesn't seem to initiate any kind of prompt. Maybe this is old data, but the previews are still available? Who knows. Either way, the search for that information was excellent.

Something I’ve dreamed about but haven’t found: a tool/service/etc. that can take my tastes from an era, say my eclectic early 2000s mix of jazz, electronic, and tango, and find me a similar set of music I might be interested in from the 2020s or the 1990s. I would love to explore my own taste in music in different eras. Interesting work. Reminded me of my little dream.

Exciting to see AI used this way. My main feedback is I'd look at incorporating other factors to rank results, not purely how similar it sounds. Audiophiles might prefer a pure similarity ranking, but that could be offered as a non-default setting if anything.

e.g. I'm sometimes seeing several essentially identical tracks at the top of recommendations (also mentioned in a comment by rayshan). You probably want to penalise tracks like that so they're pushed well down the list, i.e. penalise matches by metadata similarity (artist, title, etc).

OTOH I think it should boost results from more popular songs/artists, so the top result is less likely to be an obscure result that happens to sound similar. Some might argue it's a good thing to discover/highlight obscure artists, but for most users, it's more practical to recommend results that are already "proven" to be appealing. More obscure results could still be blended in if highlighting them is seen as a goal of the project.

I think popularity based ranking should absolutely be optional and a toggle. I think it's reasonable to have default rank be popularity, but in my opinion, the value of a model like this is finding obscure tracks. I also would definitely like seeing an exposed toggle, and maybe automatically toggle it off when someone presses refresh?

Sure, a toggle is one option. The main point is to introduce popularity as a factor and ultimately the best UI and defaults are best decided through A-B testing.

I'd also add that it's usually a good idea to incorporate this setting into the URL, regardless of the UI, so that a specific order can be bookmarked and shared.
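A small sketch of that, using hypothetical `sort` and `pop` query parameters (Maroofy doesn't document any such parameters): encode the ranking choice in the query string when building a link, and parse it back when the page loads.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def results_url(song_id, sort="similarity", popularity_weight=0.0):
    """Build a shareable link that records the chosen ranking (hypothetical param names)."""
    query = urlencode({"sort": sort, "pop": popularity_weight})
    return f"https://maroofy.com/songs/{song_id}?{query}"


def ranking_from_url(url):
    """Recover (sort, popularity_weight) from a link, falling back to defaults."""
    params = parse_qs(urlparse(url).query)
    return params.get("sort", ["similarity"])[0], float(params.get("pop", ["0"])[0])


url = results_url(1440649752, sort="blend", popularity_weight=0.3)
print(url)  # https://maroofy.com/songs/1440649752?sort=blend&pop=0.3
print(ranking_from_url(url))  # ('blend', 0.3)
```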

If the point was to surface popular recommendations why would I go to a different Web site instead of just using what Apple Music already gives me? It has to do something different to be interesting.

Not everyone is using Apple Music (or its competition).

Even for those who are using a music platform, this project still has its unique algorithm that’s likely to surface distinct results. Partly because the developer can do their own innovation and partly because the platforms influence the algorithm in ways that aren’t necessarily aligned with users’ interests. e.g. promoting artists they have favorable relationships with.

The developer can also offer features that Apple et al aren’t offering, perhaps because it would complicate their app too much or isn’t high priority, but makes sense for a specialized tool like this. e.g. fine grained settings to filter and sort results.

No, but the tool already features heavy integration with Apple Music, so I assumed that was the target demographic.

https://maroofy.com/songs/724345280 That's just the same song six times in a row. You probably already know, but one would prefer to have versions of the same song by the same artist removed, at the very least.

Bro, don't listen to the critics. It is amazing as it is! Don't fix it if it's not broken. This is much better than my spotify recs. Much much better!

This is a very good music recommendation engine. The best feature, I guess, would be the serendipitous discovery of unknown artists we might not hear otherwise. Great job!

This is a project I've always dreamed of building! Already pretty excited about some of the recommendations I'm getting in some niche genres.

Feature request: allow me to auth Spotify and click a button next to each track to add a track to a "Maroofy Recs" playlist

Noted! A lot of people have asked for playlist support, definitely shipping that soon!

I'm sure this has a lot of value, and it's cool to see. I immediately am interested! One thing came up in the first search that's frequently overlooked in music, though: unless the prompt is Christmas music, the response should probably not be Christmas music.

Holy cow does this thing have wildly obscure taste in tunes. I plugged in “Work It” — Marie Davidson, and it returned “Beautiful Weather” — Blemow. This is an 8-minute opus of a techno jam from the album Dutch Cow #13—the 10th and final Holy Cow album released in 2018, a true annus mirabilis from Blemow.

IMO my idea for making something like this really cool is to give the user more explainability (why are these two songs similar? according to which factors?), and then more control over search results (brainstorming here, but stuff like an obscurity slider, importance of beat similarity slider, etc.). You can try to extract explainable factors from your embeddings with something like NMF.
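A sketch of the NMF idea, written in plain Python so it runs without dependencies (scikit-learn's `NMF` would be the practical choice). It uses the classic Lee-Seung multiplicative updates. NMF only works on non-negative data and learned embeddings usually have negative entries, so this assumes a crude per-column shift first. Each row of `W` says how strongly a song loads on each factor; each row of `H` says which embedding dimensions make up that factor. Nothing guarantees the factors line up with musical concepts like tempo or timbre; someone would still have to listen and label them.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def make_nonnegative(X):
    """Shift each column so its minimum is 0 (embeddings usually have negative entries)."""
    mins = [min(col) for col in zip(*X)]
    return [[x - lo for x, lo in zip(row, mins)] for row in X]


def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x k) @ H (k x m) via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)] for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)] for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr)) ** 0.5


# Four made-up song embeddings: two "kinds" of song, three dimensions each.
emb = [[0.5, -0.2, 0.1], [0.4, -0.1, 0.0], [-0.3, 0.6, 0.2], [-0.2, 0.5, 0.3]]
V = make_nonnegative(emb)
W, H = nmf(V, k=2)
for row in W:
    print([round(x, 2) for x in row])  # per-song weight on each of the 2 factors
```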

(PS—I like the esoteric results. This is cool, good job.)

Feedback: "sounds like" for music is more than just rhythm and tone. The search results are all useless for me because two songs with similar sheet music often have wildly different lyrical styles and subjects.

It’s interesting from a musicology POV.

I was looking at Islands in the Stream by Dolly Parton and Kenny Rogers and it had some Latin neighbors that make some sense to me from the synths in IitS.

What would you say was the best music similarity search or recommendation engine you've used?


Nice and very responsive, well done. I searched for "Imperial March" (the John Williams Wiener Philharmoniker version) and this was one of the highest-ranked results: Inauguración del Ferrocarril (by Nathan Stornetta). I chuckle a bit thinking about what a happy empire this would be with that theme song. And of course it also answers my initial research question... which national anthem ranks highest for Imperial March. The answer is Taiwan... make of this what you will. The Star Spangled Banner is second, a bit further down the results list.

Interesting, I tried a few songs and artist combinations:

I picked a Three 6 Mafia song I used to listen to in my younger years. The particular song I chose has everything in the track, from hi-hats to trumpets to several rappers with varying vocal styles, and a lot of the resulting songs have either hi-hats, similar-sounding trumpets, or similar-sounding drums; in other cases, something about how the artists rap reminds me slightly of Three 6 Mafia's own style. I don't recognize any of the suggested artists, which is a good thing: it basically did what it was supposed to.

Next I picked a song by Bullet For My Valentine. Not all the results are quite nu-metal, but that's because the song I chose isn't typical nu-metal (or whatever genre they are...). Unlike some other metal bands, they don't spend the whole song screaming; they actually do normal singing, so it picked up what sound to me like old-school metal bands like Dio, Iron Maiden and so on... So it picks up on what some parts of the song are like, which can get you mixed results. If you want to find a similar artist down to the musical style, pick the most "metal" song you can find by them. This isn't an issue for me since I like some of those artists too, but I would rather get more bands that are like Bullet For My Valentine, though it did seem to catch one or two I hadn't heard of before.

Also tried another metal band Killswitch Engage with the song "My Curse" and it recommended another Killswitch Engage song titled "For You" which honestly, is weirdly close in style and I never noticed that before. Again this one's another metal band, I just wanted to see how it would do.

Lastly, I tried a Spanish reggaeton song, and I was not disappointed. You really did a hell of a job on this.

My only wish is that Apple would hire you or buy out your efforts, because I wish this were part of Apple Music. Their "suggested music" is awful, and the genre discovery feature doesn't always yield artists I remotely care for. I feel more inclined to listen to some of these suggested songs, but it sucks that you can only play samples and link to the song directly. If you could generate playlists from the results it would be good; this isn't a you issue, it's more of an Apple limitation.

Good work! I'm trying this out. It seems to try to match keys, tempos, and percussion characteristics somehow.

Some suggestions:

- Have a relevant/irrelevant button, so users can tag useful music and suggestions that were irrelevant. Save that data and use it to make your model better, either in realtime or incrementally.

- Allow some other options for sorting tracks also. Have the most relevant but also the most popular (as in charting) on top. Or maybe sort by the user rankings (how relevant they think a track is).

The similarities between songs are quite literal (e.g. songs are similar in some technical sense), but their mood, cultural meaning, etc., are completely different. I got some pretty festive pop as recommendations for songs similar to Enjoy the Silence by Depeche Mode, but the chord progressions and number of syllables in the choruses were similar. No idea how this works and whether that's by chance or on purpose.

It's a great idea and I like it a lot. Especially the execution. It loads nearly instantly and has everything I want there.

But of course I tried it with 2 very difficult songs I never could find anything like it: Desert Rose by Sting and Zoolook by Jarre. For Desert Rose it found some similarish things (but of course failed to capture the essence, what makes Desert Rose so unique) but for Zoolook it completely fell apart.

There might be an interesting open source / self hosting angle to this. Some folks have a large library of music stored locally. Platforms like Roon can give you recommendations on top of this, but are expensive and include a lot of other features.

You could provide discovery services to these users in exchange for model updates and feedback. Couple thoughts on this:

- there are modern techniques to update an ML model at many edge locations, then combine the learnings without violating user privacy. One common application is type-ahead models.

- People who have large local music collections tend to care about music, and would take the time to provide high quality labels for you.

- computers used as media servers often have unused compute cycles because music playback is not that intense and most folks don’t have music on 24/7. You could harness these to reduce training costs for your model

- These libraries would give you access to the long tail of the music catalog, including many things that aren’t on iTunes or other streaming services

- This would also put you in a position to run an open music catalog. Your embedding index would be a key differentiator from existing options.
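The "combine the learnings" step in the first bullet is usually some form of federated averaging (FedAvg): each device trains locally and only sends back model weights plus how many examples it trained on, and the server takes a weighted average. A minimal sketch over flat weight vectors; on its own this is not a privacy guarantee, and real deployments add things like secure aggregation or differential privacy.

```python
def federated_average(client_updates):
    """FedAvg-style merge: average client weight vectors, weighted by local example counts.

    client_updates: list of (weights, num_examples), where weights is a list of floats.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(client_updates[0][0])
    merged = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            merged[i] += w * n / total  # clients with more local data count for more
    return merged


# Two hypothetical home media servers that fine-tuned locally.
print(federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)]))  # [2.5, 3.5]
```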

Interesting. Some feedback

1. Please add match score

2. Group and fold duplicates

3. Add the year with the sort feature - to identify rip offs

Noted! got a lot of updates shipping soon.

Nice work. Are you going to write a blog post on the process? Would love to read it.

Super sick, thanks for sharing this. I found some really interesting songs and artists to check out later. It's very time-consuming to listen through all the "matches" as many of them are not close to what made the "source" track appealing to me, but that's difficult to quantify for each song. Still, tons of stuff bookmarked and added to my Bandcamp wishlist :)

What vector database do you use? Did you run into any scalability challenges there?

100% ran into a ton of scalability challenges lol. Maybe I should write a blog post about it sometime.

But for now, ended up using plain old FAISS.
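For anyone curious what the simplest FAISS setup does: a flat index scans every stored vector exactly, and if the vectors are L2-normalized, inner product equals cosine similarity. A pure-Python illustration of that idea (not OP's actual configuration, which wasn't described); at 120M vectors you'd want FAISS's compiled implementation and probably a compressed or approximate index such as IVF or PQ.

```python
import heapq
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class FlatIndex:
    """Exact inner-product search over L2-normalized vectors (i.e. cosine similarity)."""

    def __init__(self):
        self.ids, self.vecs = [], []

    def add(self, track_id, vec):
        self.ids.append(track_id)
        self.vecs.append(normalize(vec))

    def search(self, query, k):
        q = normalize(query)
        scores = ((sum(a * b for a, b in zip(q, v)), tid) for tid, v in zip(self.ids, self.vecs))
        return [(tid, score) for score, tid in heapq.nlargest(k, scores)]


idx = FlatIndex()
idx.add("a", [1.0, 0.0])
idx.add("b", [0.6, 0.8])
idx.add("c", [0.0, 1.0])
print([tid for tid, _ in idx.search([1.0, 0.1], k=2)])  # ['a', 'b']
```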

This is incredible! I've gotten really into some niche genres this year and in trying to discover more of it I have been really frustrated by Spotify's limited radio mixes - it seems like I've been "black holed" into the same 100 or so songs with rare exceptions, for any of the albums I autoplay from in a given genre. This really might be the best music discovery tool I've ever used, it reminds me of how magic Pandora felt the first time I used it.

My only request is to please let me multiselect some or all of the songs the algorithm finds and automatically create a playlist from them on Apple Music. The very first search I tried, for a song that Spotify always creates the same limited mix from, brought up a ton of music I want to check out.

Edit: on that note, it would also be neat to request a combined playlist that mixes together multiple searches. This might help provide feedback for the AI as well about what artists/songs people consider 'similar' to each other.

- How about an aggregate function and year/decade result filters?

Say, to figure out what The Pixies' ('measured') musical references are (y)
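One simple way to do the aggregate query, assuming every track has an embedding and a release year: average the seed songs' vectors into one query and filter candidates by decade before ranking. Averaging is only one choice; with very different seeds the mean can land between clusters and match neither, in which case searching each seed separately and merging the lists may work better.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(vectors):
    """Average several seed embeddings into one query vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def multi_seed_search(seeds, catalog, decade=None, k=5):
    """seeds: list of vectors; catalog: list of (track_id, year, vector)."""
    q = centroid(seeds)
    pool = [
        (tid, vec)
        for tid, year, vec in catalog
        if decade is None or decade <= year < decade + 10  # optional decade filter
    ]
    pool.sort(key=lambda item: cosine(q, item[1]), reverse=True)
    return [tid for tid, _ in pool[:k]]


catalog = [
    ("x", 1989, [1.0, 1.0]),
    ("y", 1992, [1.0, 0.9]),
    ("z", 1995, [0.0, 1.0]),
    ("w", 1975, [1.0, 1.0]),
]
print(multi_seed_search([[1.0, 0.0], [0.0, 1.0]], catalog, decade=1990))  # ['y', 'z']
```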

Interesting, I put in a bunch of my favourite songs and found nothing I liked. The songs seem to be over matching on drum beat and tempo so there is a similarity to the song I suggested and the matches but it's often superficial.

For instance, I picked a song with a very strong snare drum line. All the suggestions also had a strong snare drum line but wildly different melodies, genres, tone, etc.

Yea, this is due to a shortcoming in the current model's design.

Got some stuff in the works for an improved model, hopefully will be able to ship it soon!

It would be great to be able to interrogate the model. For example, I'd love to know what it found to be similar between Sound Chaser[1] and Mellotree Park[2] !

1: https://www.youtube.com/watch?v=Eks6KcV2ufg

2: https://www.youtube.com/watch?v=IYPfbX0DmrE

Possible bug: I keep getting an error when I try to find similar songs to any song by Neutral Milk Hotel. (For example: O Comely, King of Carrot Flowers Part I) https://maroofy.com/songs/5611590

Love your application! Great job. Found some very surprising similarities to many songs that I tested.

Wow, great project. I tried it with a song and kind of got what I expected. The results are more unique compared to other song search sites.

Hey thanks for your feedback! :D

I've spun up Hitchin' a Ride by Green Day and I think the results are quite interesting. Despite them being completely different from one another, they somehow manage to catch the vibe of certain sections of the song.

Here's the link: https://maroofy.com/songs/1159778217

Really cool project.

It's actually great. I got a ton of good recommendations. The key is to actually use it to search songs of similar beat or tone.

Personally, from using it a lot during development, I found that I kinda developed a sense for which types of songs it'll do really well on, and which types of songs it will sometimes struggle with.

But a lot of this should go away with a better, improved model!

This is amazing. I actually really like that it gives you songs across completely different genres and moods (while keeping the beat). One piece of feedback - it'd be nice if the search box didn't clear if you click out of it to e.g. look at another tab while it loads.

If you ever write a blog post about the process of making this, I'd love to read it.

It's quite good. Tried a couple house/electro/ebm tracks and it gave a good mix of stuff you'd expect and novel stuff you wouldn't. It finds a lot of songs that sample a given song, for obvious reasons. Glad someone gave this a go without any collaborative filtering or user ratings, just digging through waveforms alone.

I'm teaching myself deep learning at the moment, and learning about embedding vectors was the first "holy shit!" moment I had.

To me, it's fascinating that not only can you:

-represent things like words as vectors,

-map them in a multi-dimensional space, and

-use that space to find the "closest" neighbors (i.e. the most similar words)…

…but you can actually perform "mathematical" operations on them.

The canonical example is that, if you represent "king", "queen", "man", and "woman" as vectors in your embedding space, then you can ask your model "What is king - man + woman?" and (provided it's trained appropriately) it will return "queen".

I look forward to the day when we can ask something like "What is 'Bohemian Rhapsody' - 'Queen' + 'Velvet Underground'?". Which, if OP's model were to be trained on whole songs instead of previews, would probably be a reality!

Interesting idea. I think it needs some finetuning to find songs that are similar but not covers. For well-covered songs like Bohemian Rhapsody, this is more like a cover-finder.

Also this song, Shangri-La Is Calling, is bugging out: https://maroofy.com/songs/1632142336

I think I either found some kind of bug or Apple Music is really weird.

The similar songs for Anaal Nathrakh - Endarkenment [0] have Annal Nathrakh with エンダーケンメント at the top. So a misspelling of the band, and apparently (according to google translate) a Japanese transliteration of the song title (Endākenmento, enderkenment). The song itself is literally the same one, others listed are their other songs, or more Japanese transliterations of their songs. The weird versions can be played, but the linked page does not exist on Apple Music [1]

[0]: https://maroofy.com/songs/1522388463

[1]: https://music.apple.com/album/%E3%82%A8%E3%83%B3%E3%83%80%E3...

Ya Apple Music can have some interesting duplicates in their catalog lol :/

Can you do some deduping? A lot of the matches are just the same song appearing on multiple albums or single vs. album version.

Came with the same feedback, I'm having fun playing with it, but it's amusing to search for "Comfortably Numb" and the first match is "Comfortably Numb" :D


Yup sorry about this bug! Need to do some additional post-processing to prevent dupes in the catalog from showing up in the results.
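A crude sketch of that post-processing step, assuming results carry artist and title metadata: normalize case, accents, punctuation and bracketed tags like "(2011 Remaster)", then keep only the best-ranked hit per key. The tag list is a guess to tune (it also collapses remixes, which may not be wanted), and it won't catch cases like the Japanese-transliterated duplicate mentioned earlier in the thread, which an embedding-similarity filter would handle better.

```python
import re
import unicodedata

# Guessed list of bracketed version tags to ignore when comparing titles.
VERSION_TAGS = re.compile(
    r"\s*[\(\[][^\)\]]*(remaster|version|edit|mix|live|mono|stereo)[^\)\]]*[\)\]]",
    re.IGNORECASE,
)


def dedupe_key(artist, title):
    """Crude key: drop bracketed version tags, accents, punctuation, and case."""
    text = VERSION_TAGS.sub("", f"{artist} {title}")
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def dedupe(results):
    """Keep the first (best-ranked) hit per key. results: list of (track_id, artist, title)."""
    seen, kept = set(), []
    for track_id, artist, title in results:
        key = dedupe_key(artist, title)
        if key not in seen:
            seen.add(key)
            kept.append(track_id)
    return kept


results = [
    (1, "Pink Floyd", "Comfortably Numb"),
    (2, "Pink Floyd", "Comfortably Numb (2011 Remaster)"),
    (3, "Roger Waters", "Comfortably Numb"),
]
print(dedupe(results))  # [1, 3]
```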

I entered "Spiro - The Vapourer", a brilliant dance music inspired instrumental folk piece with intense, layered melodies.

It recommended "Buzz Cazon - Sentimental Attitude".

I have trouble putting into words how bad that recommendation was. Seriously, just compare those two pieces, do they have ANYTHING in common besides maybe vaguely the tempo?

Wait, you really don't hear the similarity? To me the two pieces sound uncannily close. The tempo, sure, but also the rhythm guitar does almost the same thing, and the higher-pitched instruments in both have similar timbre and do similar things. You could probably mix the two pieces together and it'd sound alright.

This happened to me before, I'd point out that for example Reckoning Song and Counting Stars are the same song, but people would swear they have nothing in common. Or Sail vs Believer, why was the second one even made. Is it that people focus mostly on the manner/vibe of a song, and don't notice what it actually sounds like?

I think maybe you can argue that from the 30 second excerpt from the middle of the song (which I guess the model is trained on). The Vapourer briefly drops into the "Arches" theme there, an original tune from one of Spiro's earlier albums which is one of the overlapping tunes in the piece. But I still don't think the similarity is great!

OK, but I'll concede it does SOMETHING pretty wild. Just not necessarily something good for recommendations.

Because I tried another song: "Sam Sweeney - The King of Prussia's March", and the second recommendation was "Polsk Nr. 48" by Rasmus Storm.

The wild thing? Well, first of all, Rasmus Storm isn't strictly speaking the artist. He was the composer. He was a 17th century Danish fiddler, who - rarely for his time and social standing - knew music notation, and wrote down his tunes in a book.

The crazy thing is that in that same book, "Murchy nr. 14" is the tune better known in England as "The King of Prussia's March"!

The odds of that happening by chance have to be absolutely tiny. "Obscure" doesn't even begin to describe it when it comes to Rasmus Storm's notebook.

If you could somehow filter the beginner bedroom producers the results might be even more compelling...but great job so far!

Super interesting to read the varying opinions on here. It seems to differ a lot by music style and personal preference. I tried a few songs from my favorites, and while the similarities were cool to see, they didn't correlate that well with actually liking the song, for me. Then again, I couldn't say what makes me like a song. My best guess currently is a correlation with the voice of the singer(s) for songs with lyrics, but not with the text itself.

Still really excited to see updates on this, especially recommendations based on a playlist, since the big two (Apple Music, Spotify) haven't figured out recommendations for an all-over-the-place music taste yet.
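This is one way playlist-based recommendations could handle an eclectic taste, sketched in Python. It assumes every track has an embedding vector and that cosine similarity is the metric. The `playlist_recs` and `centroid_recs` helpers and the `catalog` dict are hypothetical. The idea is to score each candidate by its best match against any seed track, rather than against the average of the seeds. Averaging the embeddings of very different songs can land on a point that sounds like none of them.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def playlist_recs(seeds, catalog, k=5):
    """Rank catalog tracks by their best cosine match to ANY seed embedding."""
    scores = {tid: max(cosine(s, v) for s in seeds) for tid, v in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def centroid_recs(seeds, catalog, k=5):
    """For contrast: rank by cosine similarity to the average seed embedding."""
    centroid = [sum(col) / len(seeds) for col in zip(*seeds)]
    return sorted(catalog, key=lambda tid: cosine(centroid, catalog[tid]), reverse=True)[:k]


# Toy 2-D "embeddings": two seeds from unrelated styles, three candidates.
seeds = [[1.0, 0.0], [0.0, 1.0]]
catalog = {"a": [1.0, 0.05], "b": [0.05, 1.0], "c": [0.7, 0.7]}
```

Given two seeds from unrelated styles, the max-over-seeds version returns one close neighbor for each style. The centroid version favors whatever sits halfway between them (here `"c"`).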

Wow. The songs it suggests have such a similar tempo and vibe that when I try it with songs I don’t know I can’t really remember which is which.

I wonder if this app has more potential for the record companies and songwriters in terms of finding copyright infringement than it does for the consumer finding new music they like?

"Similar-sounding music" is indeed a good way to put it. 120 million! This is awesome! Reminds me of https://cyanite.ai/ (Search by audio). I will have to give it more time, but so far, I like your results better. Well done!

This site works great for EDM. For example, just lookup some random future house song and it will pump out tons of similar tracks. This is a great tool for people that want to keep music consistent in their content, but don't want it to get boring. I will be using this for as long as possible!

Let me try this with Lotuk by Arsenal. Every other service will only recommend other songs from Belgium, disregarding any other form of similarity (to my despair).

Currently, the website seems down though?

EDIT: the results are ... not what I expected. They are similar in some sense, but I wouldn't consider them "musically" similar. It's like each recommendation has something similar, whether the drums, the bassline, or the voice. But none of them feel similar as a whole.

It's as if similarity is measured with an L2 norm in a high-dimensional embedding space, instead of cosine similarity? Did you experiment with different scoring metrics in semantic search? I would recommend cosine similarity if this is an embedding space trained with neural networks, whether with a contrastive loss or with a Gaussian prior.
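Here is a small Python illustration of the difference the commenter is pointing at. It uses toy 2-D vectors rather than real embeddings, and the OP hasn't said which metric Maroofy actually uses. L2 distance rewards candidates that are close in absolute position, while cosine similarity only cares about direction. As a result, the two metrics can rank the same candidates in opposite order.

```python
import math


def l2(a, b):
    # Euclidean distance: smaller = more similar; sensitive to vector length.
    return math.dist(a, b)


def cosine(a, b):
    # Cosine similarity: larger = more similar; depends only on direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def normalize(v):
    n = math.hypot(*v)
    return [x / n for x in v]


query = [1.0, 0.0]
same_direction = [3.0, 0.3]  # points the same way as the query, but much longer
nearby_point = [0.6, 0.6]    # close in absolute position, different direction

# L2 picks nearby_point as the better match; cosine picks same_direction.
l2_pick = min([same_direction, nearby_point], key=lambda v: l2(query, v))
cos_pick = max([same_direction, nearby_point], key=lambda v: cosine(query, v))
```

If the embeddings are unit-normalized before indexing, the two metrics agree on ranking, because for unit vectors ||a - b||² = 2 - 2·cos(a, b). So normalizing may be a cheap thing to try before switching metrics.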

The results do sound very similar, which is interesting from a philosophical point of view: after searching all my favorite songs, I liked not one similar sounding result. What makes one enjoy art vs. not? There's clearly more to it than how something "sounds."

A good chance OP is going to hear from Apple legal, Apple recruiting, or both =) But cool project!


I think what you've done is very cool! But I haven't had much luck with it. I think it's because of the emotions that I've anchored into the songs that I like, which are very noticeably missing from the songs that are suggested.

This would also explain why people love and cherish songs that would objectively speaking be considered bad.

What matters are the feelings people have when they listen to a song, the environment that surrounds us, and the situations that we're in.

But it is an interesting discovery tool!

What part of AI is this? Is it related to how Shazam worked, as described in their white paper? The idea there is to create a visual image of the song, i.e. its representation, and run a model to identify things similar to it.
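Some context on the Shazam comparison: Shazam's published fingerprinting approach starts from a spectrogram (the "visual image" of frequency content over time) and matches peaks in it. The OP has only said that the model takes raw audio as input and outputs embeddings, so it is unknown whether it uses a spectrogram front end. Below is a naive magnitude-spectrogram sketch in pure Python, just to show what that "image" is. It uses a Hann window and a plain DFT, and the frame and hop sizes are chosen arbitrarily.

```python
import cmath
import math


def hann(n):
    # Symmetric Hann window to reduce spectral leakage at frame edges.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame=256, hop=128):
    """Magnitude spectrogram: one row per frame, one column per bin 0..frame/2.

    Uses a naive O(frame^2) DFT per frame, which is fine for a demo only.
    """
    w = hann(frame)
    rows = []
    for start in range(0, len(samples) - frame + 1, hop):
        seg = [samples[start + i] * w[i] for i in range(frame)]
        row = []
        for k in range(frame // 2 + 1):
            s = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame) for i in range(frame))
            row.append(abs(s))
        rows.append(row)
    return rows
```

Each row is one short frame and each column is one frequency bin. A 1 kHz tone sampled at 8 kHz with 256-sample frames therefore shows up as a bright column at bin 32 (1000 / (8000 / 256)). Real systems use an FFT library rather than this quadratic loop.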

Is this AI or machine learning?

Nice work on this! As people have commented already, seems to provide more "very similar sounding sounds" versus recommendations.

I tried Glass Animals - Heat Waves, and after clicking through the recommended songs it's eerie how similar they sound to each other. As I would let the preview play and start the next song I sometimes thought my click didn't register because of just how similar one song would be to the last (https://maroofy.com/songs/1508562516 for those who want to see for themselves).

This is great! It works quite well. I would love to see something like this with the ability to license the music for a YouTube video, so that you could get good music for a video and lesser-known artists could get exposure and money.

I tried it with a few tracks I really like, but no luck with the results.

They had some similarities in tonal range, but not in style or quality, and were sometimes from a completely different musical genre.

I can see some value in this kind of recommendation system, but it is a lot of work, and it should probably be flexible enough to learn from the user, not from a single track but from a complete collection.

I also think that automatic playlist management is rarely well done.

In practice, I am often disappointed by simple shuffle algorithms. The critical part, again, is that we all have different tastes, and good software should be able to match them to some degree.

Didn’t do much for me. But I am an atypical listener — most of my playlists are different recordings of the same jazz standard by different artists or just same artist different era.

Probe 1: Birdland. First hit was the canonical Weather Report recording. 2nd or third was a popular Man Tran version. Then I saw a Maynard Ferguson track — ok that was a discovery as I haven’t listened to Maynard much for a few years. Didn’t like his version much, but still your software gets full points for discovery.

Probe 2: Minor Swing. First hit was obscure, and it linked off to a bunch of totally unrelated stuff. Django nowhere to be found.

Personally, the best way for me to find new music is through friends whose opinions I trust. I find that going by similarity pulls up a lot of derivative artists that suck (e.g. Last.fm/Pandora). I want something good, and if it's radically different, even better!

One awesome use-case I can see for this though is finding alternatives to copyrighted songs. Let's say you make a sports video, and you have this fantastic song in your head, but you can't secure permission. It would be cool if this could find something similar to that song. Same style, tempo, etc. Even better if it's royalty free.

I believe a very similar use case was the original primary function of Richard's compression algorithm in the show Silicon Valley. That, of course, badly underutilized the technology, but that was the joke.

This looks great.

Does anything similar exist that lists possible genres for a given track?

I’m a big fan of things like Ishkur’s Guide to Electronic Music and everynoise.com. It would be cool to be able to list out the genre(s) and show similar tracks.

Seems like your server is down or overloaded

Other people have called out limitations and I don't disagree, but I'll say that I do find the different approach, based entirely on sounding similar, interesting. Recommendation engines in music streaming services often tell me about related things I already know I like. That's not necessarily unwelcome (if I'm listening to one song that appears in Footloose, sure, it's not a bad surmise that I might like to hear another), but this tool surfaces a bunch of stuff I would have never thought to look for or found normally.

Good work! I tried some house tunes. I guess the results were similar, but I expected something more direct. A suggestion: since some listens only include the intro, it would be a good idea to match the first 5-10 seconds on harmonics, probably in the 90-180 Hz range. But I guess that would take a whole lot of new work. So, anyway, if there's a way to jump a bit into the song, I think it might be beneficial. But after trying a bit more, I realized that the quality of good productions was more of an issue. Great work anyway!
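Here is a rough pure-Python sketch of the kind of intro feature suggested above: the share of spectral energy between 90 and 180 Hz in the first few seconds of a track. The band and the duration come from the comment. The `intro_band_fraction` function, the frame size, and the use of rectangular (unwindowed) frames are illustrative assumptions. At best, a single number like this would be one extra signal alongside the learned embedding.

```python
import cmath
import math


def intro_band_fraction(samples, sr, seconds=5.0, lo=90.0, hi=180.0, frame=2048):
    """Fraction of spectral energy in [lo, hi] Hz over the first `seconds` of audio.

    Assumes 0 < lo <= hi < sr / 2, so every in-band bin has a distinct mirror bin.
    """
    intro = samples[: int(seconds * sr)]
    k_lo = math.ceil(lo * frame / sr)
    k_hi = math.floor(hi * frame / sr)
    band = total = 0.0
    for start in range(0, len(intro) - frame + 1, frame):
        seg = intro[start : start + frame]
        # Parseval: the sum over all N bins of |X_k|^2 equals N * sum of x_n^2.
        total += frame * sum(x * x for x in seg)
        for k in range(k_lo, k_hi + 1):
            xk = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame))
            band += 2 * abs(xk) ** 2  # x2 for the mirrored negative-frequency bin
    return band / total if total else 0.0
```

Computing only the in-band DFT bins keeps the cost down, because the total energy comes for free from Parseval's theorem. The Goertzel algorithm or an FFT library would be the usual faster route.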

Hi! The website looks awesome! I have a problem with searching, though. I can see the recommended songs below the search bar, but as I type a search, it just loads indefinitely. I checked the network tab in DevTools, and no requests are made while typing or after pressing Enter.

I used Chrome. Cheers!

Edit: also, clicking away from the text input field shouldn't clear the value, in my opinion. The element which displays the results can still be hidden though. It would also be nice to control the volume of the played song.

This is cool. Have you tried Plex? They offer sonic similarity matching that's trained on a user's personal music library. I think it might be fun to do a comparison? This feature was added in 2021 though so I'm sure models have improved since then.


I tried with my favorite song and uuuh... https://maroofy.com/songs/1099847184 Most songs don't sound like Minerva at all. :(

There is nothing like "Crimson and Clover"? https://maroofy.com/songs/59412471

Yeah, typed in Ultravox's Just For a Moment. After the 4th or 5th suggested rap song I clicked I just gave up.

It's true though, there is nothing else quite like Crimson and Clover. I'm pretty sure when I was 5 years old and first heard that song it caused me to trip. Strawberry Fields Forever is like that too, you get high just listening to it. ;-)

Haha, true

You are onto something here, I'm telling you. While Spotify does provide recommendations of what you would like to hear, it does not provide similar-sounding songs. As a musician, I love it.

Haha, thanks! :)

Will definitely work to improve the current model!

I wonder why it doesn't find Weird Al songs. If you search for Michael Jackson's "Bad", I'd expect to see Weird Al's "Fat", but neither one appears in the other's results. The actual results are interesting, because you can hear some similarities if you listen closely. But they're not what I'd consider "similar".

Still, this is a really cool project and I'm sure there's a lot of potential for building stuff with it.

It seems to be finding snippets of songs similar to snippets. I tried on Metallica - Unforgiven III, which starts off with a slow piano composition, and then enters a riff and cuts out.

It ends up recommending piano songs, many Korean ones.

There's some interesting ones like Ghost - Cirice, where it finds other songs with similar riffs. I like Ghost's music in general, just not the Satanic themes, so this is a great tool for finding similar music.

I'm somewhat amused that it doesn't match Under Pressure with Ice Ice Baby.

Good point. Probably a final version should have a pipeline like: 1. cluster song segments into styles, and 2. search for each cluster, or only the main cluster.
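Here is a minimal Python sketch of that two-step pipeline. It assumes the song has already been cut into segments and that each segment has been embedded with the same audio model. The segment embeddings, `kmeans`, and `query_vectors` are all hypothetical. Step 1 uses plain k-means, a swapped-in classical technique rather than an answer to which neural architecture would be best. Step 2's input is then either the main cluster's centroid or all the centroids.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. For non-empty groups, centers[j] is the mean of groups[j]."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it in place if the group is empty).
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups


def query_vectors(segment_embs, k=2, main_only=True):
    """Step 1: cluster segment embeddings. Returns query vectors for step 2."""
    centers, groups = kmeans(segment_embs, k)
    order = sorted(range(k), key=lambda j: len(groups[j]), reverse=True)  # largest first
    if main_only:
        return [centers[order[0]]]
    return [centers[j] for j in order if groups[j]]
```

Here the "main" cluster is simply the one with the most segments. For a track with a short piano intro followed by a long riff section, the hope would be that the two parts land in separate clusters, so the majority section drives the search.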

What would be a good NN architecture for the first step?

It seems like a data problem, not an architecture problem. Usually these preview snippets capture the essence of the song. It works on, say, Unforgiven II but not Unforgiven III.

We'll likely see a lot of first generation products tackle these low hanging fruit like "I dumped data into AI/ML and here's the output". But the next generation will likely be people who can handle data better - interpret it, clean out bad samples, and so on.

Really great tool. I found a lot of similar songs for less well known songs of mine where Spotify and Youtube Music both have stopped delivering any relevant recommendations!

Lots of duplicate tracks! Is that costing you more to train? Like, if I type a song I find the same recording on the original album, single, greatest hits and a compilation album.

Working on some fixes for the duplicate track issue! Sorry about that!

https://maroofy.com/songs/1651341589 is one of my favorite songs, but it says, "Hmm, something went wrong," and shows no similar songs.

When I was a teenager, I thought no band was as cool as Rainer Maria.

Does this mean I was right?


Nice app! I tried it for another fave https://maroofy.com/songs/1578598760 and will give those a listen.

Seems to work well on some stuff, not on others. I get that AI is able to correlate things that we can't really detect, because we don't experience music as a raw waveform. But the emotion and sentiment in the vocals, regardless of genre, is something that can link songs together in a playlist, and I don't see how this could do the same unless it's also considering mood tags (if you can get access to decent-quality tags) and lyrics.
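One simple way to fold mood tags into the ranking, sketched in Python, is a weighted blend of two scores: cosine similarity between audio embeddings and Jaccard overlap between tag sets. The tags, the `hybrid_score` helper, and the 0.7 weight are all illustrative assumptions. The comment itself only suggests that mood tags (and lyrics) could help.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def jaccard(tags_a, tags_b):
    """Tag overlap: len(A & B) / len(A | B), or 0.0 if both sets are empty."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(q_emb, q_tags, c_emb, c_tags, alpha=0.7):
    # alpha weights audio similarity; (1 - alpha) weights mood-tag overlap.
    return alpha * cosine(q_emb, c_emb) + (1 - alpha) * jaccard(q_tags, c_tags)
```

With this blend, a track whose audio is slightly less close but which shares the query's mood can outrank a near-identical-sounding track with a different mood. `alpha` controls how far that trade-off goes.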

Can you share more details about the "custom AI audio model" used to generate the embeddings?

Curious what sort of metrics are taken into account, and if any supervision is provided.

I laughed a lot.


One of the "similar-sounding" results was:

"Смели Граничари Димитър Коларов & Стоянка Колева"


"The Brave Border Guards"

It's a very Soviet song from such a different world that if you sent "Man Man" back to it in a time machine, the band would immediately end up in the Gulag.

So, it's all about the "similarity" definition. But it's fun of course.

Tried LSDREAM - Oblivion, and I get meditation music. Completely different and makes it hard to believe it "analyzes raw music audio as input"...

It is not the typical "song radio" approach, which would provide more of a popularity/artist-similarity-based playlist, but a real "similar song" Google-like search. I think that is valuable for discovering new music, especially unknown artists. But if someone were to embed it into streaming clients, it would need to be a separate feature, probably more search-related.

I tried a few of my favorites and the results were... predictable, I guess.

It kind of matches genre, but has no grasp of musicianship or why I might like a track.

And this is a fundamental problem. See Avery Pennarun's brilliant explanation of tracking users, data analysis, and recommendation systems to understand why:


Seems like the recommendations are somewhat hit or miss so far. I would suggest finding a way to say why it recommended a particular song. I know that this is already common, but maybe you could come up with a new spin on it.

Also, other comments have suggested that this was trained on 120MM of the audio previews instead of the full songs. That might explain why the recommendations seem a little off for some people.

I'm totally going out on a limb and guessing here - I'm more on the UI/UX side of things, so I know nothing low-level about what goes into building a recommendation engine, algorithm, or whatever. But I do know music pretty well, and this feels like it's matching too closely to the technical aspects of the music and not the overall theme/je-ne-sais-quoi that makes a song something you feel.

I tried two of my all-time favorite songs:

The Gaslight Anthem - Handwritten

Thrice - The Artist in the Ambulance

On the first, I would say it was a total swing and miss - the recommendations had similar elements to Handwritten like strummed power chords, but the vibe was completely off. With Thrice, same thing. The first match was a remaster of the song itself by the band, so of course despite being a "different" track it was the closest possible match. The rest of the recommendations had a similar tempo and more heavy metal riffs (as are common in Thrice's songs), but none of them were songs I would voluntarily listen to.

This is a cool idea, and OP, if you read this, I hope you take the criticism as it's intended - I'm not trashing the implementation or anything. This just truly feels like a robot that can't feel the emotion of a song is making me recommendations.

Honestly, thanks for taking the time to give such detailed feedback!

The current model I have isn't as good as I want it to be, and I'm working on a newer one with a different training process as well. This should address some of the shortcomings people have mentioned.

I was really nervous about shipping my v1 with the current model, but thought I might as well share what I have so far with the world, in case someone finds it useful lol.

Hey, no problem! Best of luck going forward. FWIW, I almost never try out Show HNs and recommendations on here, but the idea really piqued my interest. I am always looking for the next new song or artist I love, so that should be a good signal for you. I look forward to revisiting in the future!
