The fact that Shazam is 18 years old made me curious, and I found the following on Wikipedia:
>> “Initially, in 2002, the service was launched only in the UK and was known as "2580", as the number was the shortcode that customers dialled from their mobile phone to get music recognised. The phone would automatically hang up after 30 seconds. A result was then sent to the user in the form of a text message containing the song title and artist name.”
I remember the first time I Shazamed a song. It was during the flip phone era and it was honestly the most magical thing to me at the time. I had always wanted something like this, and here it was at my fingertips.
Around 2000, MIT had a number you could dial (toll-free) and ask a computer about the weather. It would keep track of where and when you were talking about during the conversation, so after asking "will it rain in Boston today" you could ask "how about Thursday" and then "how about Rochester, NY" and it would read you Thursday's forecast for Rochester. A very cool end-to-end demo of conversational tech.
According to the Shazam paper [1] (I can't find the publication date), there was nothing fancy, just keypoint extraction on a spectrogram. No deep learning with 100 layers or anything like that, which wouldn't have been possible in 2000.
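To make "keypoint extraction on a spectrogram" concrete, here is a minimal sketch under my own assumptions, not the method from the paper: slice the audio into overlapping frames, take a magnitude spectrum per frame, and keep the strongest local maxima as (frame, frequency bin) keypoints. The function names, the frame size, the hop, and the number of peaks per frame are all made up for illustration, and the naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a toy frame size)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_size=64, hop=32):
    """Split the signal into overlapping frames and take each frame's spectrum."""
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]

def keypoints(spec, peaks_per_frame=2):
    """Keep the strongest local maxima of each frame as (frame_index, bin) pairs."""
    points = []
    for t, row in enumerate(spec):
        local_max = [
            k for k in range(1, len(row) - 1)
            if row[k] > row[k - 1] and row[k] > row[k + 1]
        ]
        strongest = sorted(local_max, key=lambda k: row[k], reverse=True)[:peaks_per_frame]
        points.extend((t, k) for k in sorted(strongest))
    return points

# Toy input (hypothetical): two pure tones that land exactly on DFT bins 4 and 10.
FRAME = 64
signal = [
    math.sin(2 * math.pi * 4 * n / FRAME) + 0.5 * math.sin(2 * math.pi * 10 * n / FRAME)
    for n in range(4 * FRAME)
]
points = keypoints(spectrogram(signal))
```

A real system would presumably hash combinations of such keypoints and look them up in an index; that part is left out here.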
It's pretty obvious what GP meant. The concept of crowd-sourcing small chunks of work as a service is not the same thing as a hoax which involved a single person masquerading as an automaton. Amazon chose a clever name for its service that is memorable and references humans fronted by machines.
Ehh, I have seen references to using a person instead of an algorithm before Amazon released their service. Basically, if you're automating human dexterity it's robotics; if you're automating the brain it's AI. Sorting vegetables is a useful early example where humans could be thought of as a replaceable black box if you can break down what they are specifically doing. Thus machine vision and classification were two common tasks, because you need to play any game, not simply replay a specific one.
The notion of a "mechanical turk" however, and of operations putting a facade on humans doing the work -- to which the parent alluded ("a big mechanical turk sort of operation"), and not specifically Amazon's variety -- has been known and used for centuries.
Lol I just recalled that I applied for a job there in 2014 when they had a position open in San Diego. Gmail turned up the email I sent with my resume. It always bummed me out that they never contacted me back.
2580 was also the middle digits of the keypad, making it even easier. I made heavy use of the service back in the day.
There was another service around the same time called Any Question Answered. This was before high quality internet on phones, and you could SMS them reasonably complicated questions and (at first) get good replies. Notable successes were their getting me ownership information for a pub, and telling me which local shops had an iPad in store. Service degraded significantly over time.
That's how you see it today. But back in the day people probably thought, wow, look how creatively they are using the incredible capabilities of modern phones.
Apple would have no problem implementing something similar.
It's the brand, mindshare and music store/service lead gen that's more difficult to replicate. Why get rid of an icon that's already on everyone's phones that could be a funnel to apple music instead of spotify?
It's odd that they didn't try to purchase soundhound then. The company has more evolved tech, and also has voice recognition services beyond just music through houndify.
If they bought soundhound for the tech to bake into their own service, they'd be competing with Shazam. If they bought Shazam for the tech to bake into their own service, soundhound would just exist as an alternative. Seems like an attempt to buy the "name brand" to get tech and inherently beat the competition at the same time.
Note: I've never heard of soundhound though, so it might be popular in some places. Shazam is like the name-brand of music recognition though, to the extent of being a verb.
Workflow wasn't a popular Android application that sends users to a competing service though... I could see the app live on iOS with Spotify integration stripped out, but seriously doubt it has a future on Android.
This has the smell of a comment written by someone with limited real-world experience. Simply writing down the list of problems you would have to solve to build Shazam would take an entire afternoon.
Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.
> and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.
That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.
Lol. When you superimpose noise, the original data is still there. When you have an FM radio playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store and being captured by a terrible microphone and then being compressed, a significant amount of nonlinear distortion has taken place. That is extremely hard to model. And you would have to model it or have real data to train a neural network. Neural networks are extremely hard to train without excellent data.
I mean, you can easily find thousands of hours of music online. Recording background noise is easy (just go to a random bar where they are not playing music). Now simply add the two signals (you can shift them randomly to generate more data). You can also add some linear filtering if you like (just imagine random settings of an equalizer for starters).
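The recipe in this comment (add a randomly shifted noise recording, then apply a random linear filter) can be sketched in a few lines. This is only an illustration of that additive-plus-linear proposal; it does not model the nonlinear distortion the parent comment describes. The signals are synthetic stand-ins, and every name and parameter range below is an assumption.

```python
import math
import random

def mix(music, noise, shift, noise_gain=0.5):
    """Add a circularly shifted noise recording on top of the music."""
    n = len(noise)
    return [m + noise_gain * noise[(i + shift) % n] for i, m in enumerate(music)]

def fir_filter(signal, taps):
    """Linear filtering: each output sample is a weighted sum of recent inputs."""
    return [
        sum(tap * signal[i - j] for j, tap in enumerate(taps) if i - j >= 0)
        for i in range(len(signal))
    ]

def augment(music, noise, rng):
    """One synthetic example: random shift, random noise level, random crude 'equalizer'."""
    shift = rng.randrange(len(noise))
    gain = rng.uniform(0.1, 1.0)
    taps = [1.0, rng.uniform(-0.5, 0.5), rng.uniform(-0.25, 0.25)]
    return fir_filter(mix(music, noise, shift, gain), taps)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
music = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]  # stand-in for a song clip
noise = [rng.gauss(0.0, 0.3) for _ in range(150)]                  # stand-in for a bar recording
examples = [augment(music, noise, rng) for _ in range(4)]
```

Each call to `augment` yields a different distorted copy of the same clip, which is the "generate more data" step the comment has in mind.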
This should give you enough data to build a proof of concept at least.
Illegally grabbing thousands of hours of music to train a commercial model hardly qualifies as fair use. Any company you build upon that would be tainted.
For sustaining:
In addition, you'll need to keep an updated catalog of music to identify new songs against, and most uses of a service like shazam are to find names of songs people aren't familiar with, so that catalog needs to be very fresh.
That means you'll have to grab some sort of feed, and engage in large scale music piracy for commercial gain or have access to a library of songs from many disparate music providers, such as ascap.
Background noise:
There are literally hundreds of different background noise environments you need to train against. Dozens of common microphone configurations. Clipping, variations.
It's very much a problem where a proof of concept is neat but doesn't really get you anywhere.
Also, I'm not saying it's impossible or not worth doing (obviously, it's possible and worth doing), just that a few minutes of thinking and hacker news comments are hardly going to touch the breadth of difficulties involved in getting this to work even somewhat reliably.
Shazam doesn't actually let you improve the answer, nor report an incorrect guess. They are so confident in their guesses, even when one sometimes completely misses the genre and style of the music.
I'd be more curious to see you try to build this in an afternoon.
Also, it works a lot better than being able to find "slightly distorted" versions. It can catch a song in a noisy room where you can barely make out the song to begin with. Couple months back it found a song when there was a very loud crowd yelling over it. They're also able to determine differences between versions of songs pretty well. Some remixes might sound very close to the original.
Other thing you might be missing is just how fast it is even on a slow mobile connection.
This is a heap of nonsense. Not only does this summarily dismiss the enormous challenges in digital signal processing required for removing arbitrary background audio, it exposes some confusion associated with the ideas of correlated random variables, inner products, and affine transformations.
A tiny percentage of the time, when a successful company makes an N-hundred-million purchase of a technology and company and you don't understand why, it's because they have made a mistake.
The smart money, though, is on the main chance: you don't understand the purchase, or the problem domain, or both.
In this case I think you are overestimating the progress in NN and search, and underestimating the signal processing. Have you tried this with any significant corpus?
"Whack it through an FFT and do correlation" seems like one of the obvious solutions to the toy problem version, but this is exactly the sort of thing that usually falls apart in practice.
Building the service is not usually the hard part, but building the ecosystem around it is. There were/are countless services similar to Facebook or Twitter, but only a few of them can be really successful because of herd mentality.
Thanks. This actually proves my point that the core concepts of Shazam can be implemented in a weekend. Of course, programming the front-end etc. is more work, but that is beside the point.
> What's not straightforward is recognizing cover songs and the like. But that's not only non-trivial but AFAIK can't be done.
Well, you could translate the music into actual notes (or musical intervals), and use Smith-Waterman (or any more advanced and more recent technique) to find the song with the lowest edit-distance.
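As a rough sketch of this idea: represent each melody as semitone intervals (so a transposed cover still lines up) and score two melodies with Smith-Waterman local alignment. Note that Smith-Waterman maximises a similarity score rather than minimising an edit distance, so the best candidate is the one with the highest score. The scoring values and the MIDI-style note numbers are assumptions for illustration, and getting clean notes out of real audio is a separate problem this does not address.

```python
def intervals(notes):
    """Convert note numbers to semitone steps so transposed versions compare equal."""
    return [b - a for a, b in zip(notes, notes[1:])]

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (higher is more similar)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            score = max(
                0,                                               # start a fresh alignment
                prev[j - 1] + (match if x == y else mismatch),   # align x with y
                prev[j] + gap,                                   # skip an element of a
                curr[j - 1] + gap,                               # skip an element of b
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

# Hypothetical melodies as MIDI note numbers.
original = [60, 62, 64, 65, 67]
cover = [65, 67, 69, 70, 72]       # same tune, transposed up five semitones
unrelated = [60, 60, 60, 60, 60]

cover_score = smith_waterman(intervals(original), intervals(cover))
unrelated_score = smith_waterman(intervals(original), intervals(unrelated))
```

Ranking a catalogue would then mean computing this score against every stored melody and keeping the top few, which is quadratic per comparison and so only plausible after some cheaper filtering step.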
Yes, you can look at the frequency with the highest intensity in the FFT. This is the "dumb" version of converting music to notes (and is what I really intended to say but chose not to for the sake of brevity).
The thing is that the process looks for spectral patterns, let's call it "harmonic content per unit of time," not just notes. Mere notes would result in lots and lots of false positives.
Let's just agree that the process is not too far removed from my initial brief description, and should be simple to implement, as the article shows. For any competent signal processing engineer, this should all be evident, which was the main point.
Also, even if you have many false positives, you have already narrowed down the search, and this allows you to do more brute-force searching like computing cross-correlations.
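The brute-force second pass mentioned here could look something like the following: slide the query over each shortlisted candidate and keep the offset with the highest normalised cross-correlation. This is a sketch under my own assumptions (plain time-domain samples, a hypothetical function name, a tiny made-up signal), not a claim about how any real service re-ranks matches.

```python
import math

def normalized_xcorr_peak(query, candidate):
    """Slide query across candidate; return (best_offset, best_score), score in [-1, 1]."""
    q_norm = math.sqrt(sum(q * q for q in query))
    best_offset, best_score = 0, -1.0
    for offset in range(len(candidate) - len(query) + 1):
        window = candidate[offset:offset + len(query)]
        w_norm = math.sqrt(sum(w * w for w in window))
        if q_norm == 0 or w_norm == 0:
            continue  # silent window: correlation is undefined, skip it
        score = sum(q * w for q, w in zip(query, window)) / (q_norm * w_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Hypothetical data: the query is an exact excerpt of the candidate starting at index 2.
candidate = [0.0, 0.0, 1.0, 3.0, -2.0, 4.0, 0.0, 0.0]
query = [1.0, 3.0, -2.0, 4.0]
offset, score = normalized_xcorr_peak(query, candidate)
```

The cost is proportional to query length times candidate length, which is why it only makes sense once the candidate list is already short.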
Where are you going to get 'the music'? There are millions and millions of hours of music out there, how are you going to gather and fingerprint it all?
y'all realise they run a music service too right? Having access to cross-platform data that gives them insight into bleeding edge emerging/trending artists and songs is priceless.
Huh. I thought Apple completely killed Spotify. I personally had to switch because Artists started doing exclusives with Apple only and I really couldn't justify Spotify over Apple Music, even though I much preferred Spotify's experience. I've also noticed Apple's catalogue is much larger than Spotify's. Is Spotify getting better? I'd love to switch back.
I haven't used apple music before, but spotify's ml recommenders are really impressive, and most of my favorite songs and artists were recommended to me via its "discover weekly" playlist. Its apps are super slick (cross device play/control is really handy!), especially compared to itunes.
Just buy a month of premium ($10/month or $5 if you're a student) and try it.
Apple Music has recommendation playlists as well: New Music Mix, Favourites Mix and Chill Mix which all get updated weekly based on your likes/dislikes and existing music collection.
I actually find Spotify apps to be far worse than iTunes at least on iOS. And the Apple Watch app for Apple Music is really impressive.
Apple's recommendations can be ridiculous though. When I go to the "For You" tab in their iOS app here is what is shown as I scroll down
1. Favourites mix - i.e. the music I've played the most
2. Recently played - i.e. the music I've played recently
3. Tuesday's Playlists - the first of any real recommendations so far, but 4 of the 9 album covers it shows in the thumbnails are music I've played recently
4. Heavy Rotation - i.e. music I've played a lot, but not just recently
5. Tuesday's Albums - recommendations based on an artist (Waxahatchee) I've listened to
6. Artist Spotlight Playlists - a selection of playlists, including "Influences" and "Inspired By" playlists by artists that I don't listen to and are really unrelated to most of my collection.
7. New Releases
finally there's the wordy stuff I don't care about, social media posts.
Most of this stuff is not even bad ML (like the "Amazon recommends me vacuum cleaners because I searched for and bought a vacuum cleaner" problem), it is just literally showing me what I listened to. I've tried the recommended playlists a handful of times and they don't really show me many new things; they remain pretty unchanged in the weeks or so that I check them.
When you throw in the fact that they periodically delete all of the music I've downloaded, and nuked a chunk of my music collection after I signed up ... I have to say, my experience of Apple Music overall is pretty terrible.
Just as a single counterpoint: Spotify has Mixtape of the Week and two similar auto-generated playlists, and the songs in there are at best vaguely related to what I listen to on Spotify - I haven't found anything interesting in there.
The albums I haven't listened to in a while and might want to listen to again according to the app are those I listen to daily.
New releases are not sorted or filtered by genre, so I guess it is great that some pop or reggae artist has a new album out when I only listen to metal on Spotify?
Etc., etc... in other words - I use Spotify for totally unrelated reasons and switched from Google Play Music, but it has all the same faults, it just works better for some part of the target group, but it is in no way perfect, or even good, with regards to their recommendation engine either.
Yeah I agree with this. Spotify is a good experience. Save one. Their security is shit. After having my account hacked for the umpteenth time I finally threw in the towel since they clearly are not interested in securing their damn service and jumped to Apple Music. The UI is not as good as Spotify but I don’t worry about my account being hacked every week and their catalog is bigger at Apple Music. My wife found some obscure ass Pakistani tune she loved since she was a kid in Delhi. That was hardcore. Spotify never had much desi stuff.
I hope Spotify fixes their security woes. Either way I have no reason to leave Apple Music now.
Taylor Swift is back on Spotify, which is the only artist that I ever noticed missing (and on Apple Music).
Personally I think Spotify's recommendations, radio stations, and app (both mobile and desktop) are just more pleasant to use than Apple Music and iTunes.
For me it remains amazing and has more of the music I like than Apple, and its recommendation algorithms blow Apple's out of the water. I guess it depends what music you are into. 3rd party integrations are also miles ahead of Apple Music, such as the ability to use Amazon Echo and Fire TV as output devices.
Wow, that is an unusual exit after 6 rounds of funding. Crunchbase has it at $143M in funding up to that point.
It goes to show how the switch from radio (station directed programming) to streaming (user directed programming) has put a huge crimp in music discovery and music promotion.
Question from a non-Silicon Valley professional: Where does all that money go? Obviously, some [likely big] portion of it is still on the balance sheet, but does an app like Shazam really cost that much in engineering talent to update it and iterate it and server space to keep it running?
It varies quite a bit from company to company. They were founded in 2001 so I expect most of it was salary and physical plant.
The business is/was connecting resale opportunities to brands and artists[1] so you'll have a fairly significant sales and marketing effort although typically you will pay sales people for performance so their compensation will track revenue.
But to give you some things to think about: if you have an engineering team of 15 engineers, a median salary of $120K, and an 'overhead' (office, health plans, insurance, etc.) of 60%, then that is roughly $200K/engineer/year (or $3M/year, or $48M for 16 years [2001 - 2017]). That is just integrating cost per engineer over time using a constant engineering head count. You can put in any function you want for head count (does it grow exponentially? does it grow in chunks? etc.), then add a C-suite team (higher median salary) and an 'overhead' team (IT, marketing, HR, etc.), and you can burn through that fairly quickly.
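The back-of-the-envelope model above is easy to turn into something you can play with. This sketch uses the comment's own inputs (15 engineers, $120K median salary, 60% overhead, 16 years); the function name and the alternative growth curve are hypothetical. The unrounded loaded cost is $192K per engineer, so the constant-team total comes out a little under the comment's rounded $48M.

```python
def engineering_burn(headcount_by_year, median_salary=120_000, overhead=0.60):
    """Total cost: sum over years of headcount * salary * (1 + overhead)."""
    loaded_cost = median_salary * (1 + overhead)  # fully loaded cost per engineer per year
    return sum(headcount * loaded_cost for headcount in headcount_by_year)

# The comment's scenario: a constant team of 15 for each of the 16 years 2001-2017.
constant_team = [15] * 16
constant_total = engineering_burn(constant_team)

# A made-up alternative head count function: start at 5 and add 2 engineers per year.
growing_team = [5 + 2 * year for year in range(16)]
growing_total = engineering_burn(growing_team)

print(f"constant team: ${constant_total / 1e6:.1f}M")
print(f"growing team:  ${growing_total / 1e6:.1f}M")
```

Swapping in a different list for `headcount_by_year` is the "put in any function you want for head count" step; C-suite and non-engineering staff would need their own salary figures.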
It is a useful thing to build models for this stuff as your 'pre-operationally-cash-flow-positive' costs are really the health and future of your company.
There are a lot of people that work at tech companies that aren’t highly paid engineers. GP specifically called out engineers at a higher salary, so GP was obviously including receptionists, admin staff, etc. I’d be shocked to find that the starting salary anywhere around here is 120k.
Most of Shazam's engineering is in London. Going rates for engineers in London are way below the valley. Facebook/Google/Apple pay UK engineers about 30-40% less than their US counterparts.
Just because all you see is an app, doesn't mean there isn't more software they have running behind the scenes. A lot of consumer facing companies need to have DMCA copyright compliance software, explicit content software and various "big data" integrations, both to power better recommendations engines, but also for BI purposes. They also need a way to ingest content so they can identify new music.
A good app looks like it only takes a few engineers to maintain, but in all likelihood there's a lot of complexity going on, even outside of the core "platform" software.
Marketing, sales, legal (especially in the music space, though less so for Shazam), engineering (integrations with Spotify, etc.). People cost a ton of money. :)
Salaries for - Chief Twitter Feed Monitor, Chief Assistant of The Twitter Feed monitor. The Special Secretary to The Assistant of Twitter Feed Monitor, etc etc etc
Perhaps. Impossible to know the inside info but I can conceive of a scenario where Apple said "we are going to build this, either by acquiring you, your competitor, or starting from scratch. ... so what will it be?"
Best outcome for all in that case would be acquisition. (Just ask Flux)
I suspect the hardest part of their business was not doing the technology, rather it was building relationships with the brands and artists. Making it frictionless to push a button on your iOS device to recognize a tune and then put it in your Apple Music playlist for many future billable plays would seem to be a better model than Spotify.
The data that Shazam can gather must be incredibly useful to record labels, in that many people wanting to know the artist/song is a strong signal that a track is a potential hit. Since Apple already have data about what people are actually listening to (by location), this gives them additional data about what people might want to listen to.
I love Shazam for finding new music and this is really one of my favorite uses. A couple times a month I see what the top Shazams are and usually pick up a new song or two.
Music is at my fingertips from a variety of apps. My biggest problem is discovering new artists or songs.
Forbes, or was it BusinessWeek, actually did an article on that (i.e., selling data to record labels) a year or so ago. If I get a moment I'll try to track it down later.
I'm a bit shocked that they're paying $600M. Is Shazam even profitable? It feels like a clever tech demo from 10 years ago that's already been replicated in search engines like Google.
According to MacRumors, it's a deal "that could be worth $400m". If I were to guess (based on just public information which is almost nothing), the stock is not going to be worth much. Obviously Apple is going to pay the least they can, so I wouldn't be shocked if they're paying exactly what investors need to either get their money back or maybe a slight profit (if the recent investors had conditions for it). Employees will probably get nothing, except a new plan. The total sum of that new plan would depend on hitting some very aggressive numbers, that are unlikely to be hit (no idea what they'd put the metrics on). And the bulk of that is probably going to the CEO who has to stick around for 4 years to cash in.
Yes, they became profitable in 2016 via mostly brand advertising [1][2]. The company has nearly 400 employees according to LinkedIn. With a very conservative personnel estimate of $125k/employee/year, that's $50M alone. Let's throw on another $25M for everything else and they need to be doing at least $75M+ in annual revenue.
I have no idea, it's bewildering to me frankly. Take two recent IPOs for example: MongoDB had 820 employees as of July 31st. StitchFix had 5,800 employees as of July 31st [2].
Is it actually profitable? "Profitable" is a fun term to throw around to make your business look good, but it can mean a lot of different things (net margin profitable, gross margin profitable, pre tax profitable, etc). I'd be surprised if they really were all that profitable (meaning they would end each year with more money than they started with).
Fair point, guess we'll see when Apple discloses more details on financials.
Given that Apple seems to compensate junior engineers with compensation packages north of $300k, it wouldn't be too surprising if they overpaid for Shazam too. Apple also has $74.2B in cash and short-term investments, so paying $0.6B for Shazam doesn't really move the needle.
Has anything replaced "Into Now"? I loved using it, and ever since Yahoo bought them up and then shut them down, I've always wondered why Shazam or Sound Hound hadn't integrated the audio from TV shows and movies into its database. (http://mashable.com/2011/01/31/intonow/)
I still can't understand how that feature works. They stated that everything is done locally on the device, so privacy is not an issue. But how can they host a huge catalog of songs fingerprints without blowing the storage? Are they only saving the top charts?
It's definitely not just the top charts. Notably it has picked up when someone near me was playing Chopin on a piano as well as the occasional KPop tune my girlfriend is listening to.
My guess is it supplements the data with songs from your Google Music and youtube history.
They only store a catalog of the most popular songs. (Where 'most popular' is determined by your geographical location). This catalog is periodically updated.
I think the Pareto principle kicks in hard here. If it stores the top 20,000 songs as the other comments suggest then I expect that would include the vast majority of music you’d come across.
At 3.3 KB per song it's only about 64 MB to store 20,000 songs. If they periodically update that (in the background, on wifi, etc.) you'd likely never know. It's basically noise when OS updates are multi-GB.
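The arithmetic checks out if you count in binary megabytes. A quick sketch, taking the 3.3 KB per fingerprint and the 20,000 song catalogue from the comments as given (neither figure is verified here):

```python
FINGERPRINT_KB = 3.3   # per-song fingerprint size quoted in the comment (unverified)
SONGS = 20_000         # catalogue size suggested in the thread (unverified)

def catalog_size_mib(songs, kb_per_song):
    """Total size in MiB, treating 1 MiB as 1024 KB."""
    return songs * kb_per_song / 1024

size_mib = catalog_size_mib(SONGS, FINGERPRINT_KB)
print(f"{size_mib:.1f} MiB for {SONGS:,} songs")
```

In decimal units the same catalogue is 66 MB, so either way it is small next to a multi-gigabyte OS update.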
According to a friend who works at Google, they only store a catalog of ~20k popular songs that updates periodically. Fingerprints are small, so an archive of that size isn't too big.
> The 18-year-old company, which has required twice the average time to deliver an exit for backers, was valued at about $1 billion when it closed its last funding round in 2015.
If the last round had a strong liquidity preference then it wasn't really valued at $1 billion, and those investors might have even come out ahead.
Apple does not seem to like to do big purchases. Even though they probably could. Spotify is not (yet) profitable so they would probably have to hike up the prices. At this point they are probably better off doing their deals on their own and slowly catch up.
I should probably try Spotify as so many people say it is better than Apple Music (which I do genuinely like). Spotify currently offers 3 months premium subscription for 1€ so it may be worth a shot.
This is (unfortunately) a huge testament to patenting technology. I remember Shazam shutting down a lot of early iOS apps that provided music discovery services. The technology itself became relatively simple in the last 10 years, but their ownership of the patent has kept them on top.
I suspect it’s due to end soon, and they realised once it’s gone they would just become a feature of music streaming services. Good to get out now while there is still some exclusivity for Apple to milk.
I've used Shazam for years but lately for some reason I've been running into more music that it can't identify, on two separate phones (so it can't be a bug limited to one device).
Hoping this doesn't mean their service is degrading because I've really benefited from it over the years.
I used to use it to identify music in shows or soundtracks, but it started just saying "Breaking Bad Episode 4" which while more accurate was less helpful.
Shortly after that everyone else had music discovery natively anyway.
Can anyone think of another brand that has become generalized as an action, the same way as "Googling" other than Shazamed? To me "Shazam it" is shorthand for use whatever music ID service you have.
SOURCE: https://en.wikipedia.org/wiki/Shazam_(company)