
Breaking Spotify DRM with PANDA - ivank
http://moyix.blogspot.com/2014/07/breaking-spotify-drm-with-panda.html?
======
moyix
Post author here! I wanted to make it clear that the original technique for
breaking Spotify DRM is not mine - it was developed by Wang et al. in their
excellent paper, Steal This Movie: Automatically Bypassing DRM Protection in
Streaming Media Services. I just thought it would be a nice showcase of
PANDA's capabilities. In particular, we can avoid the crazy optimizations in
that paper because we can operate on a replayed execution rather than doing it
live.

~~~
msane
On the subject of cracking streaming music DRMs, a realization I have been
sitting on for a while about what people can do with it... Considering the
common wisdom that:

- storage will get exponentially cheaper

- data transfer speeds will get higher

It makes me think that eventually there will be illicit torrents of _all_ the
world's music, plus the index, plus the metadata, and plus the interface/app
for browsing it. In other words, people would not only pirate individual songs
or movies, they would download their own complete copies of Spotify / Netflix.
It isn't feasible now but it could be sometime in the next 5-15 years,
depending on bandwidth speeds.

I'm not sure how many people see this coming or take it seriously but I wonder
what the effect could be and what the remedy attempts would be.

~~~
null_ptr
You can see the remedy attempts already

* push for cloud storage over local storage

* push for locked down devices over general purpose computers

* push for DRM on the open web

* big ISP companies fighting against net neutrality

~~~
msane
I very much agree. I hope that capitalism, ironically enough for DRM, is
probably always the solution to overstepped DRM remedies, regardless of the
issue of piracy, which I am not advocating.

For instance there will always be a market for custom computing and "PCs", and
so non-locked PCs will (hopefully) always exist in a capitalist environment.
That market I think ultimately circumvents any attempt at ubiquitous control
of hardware. The same thing is at play with software. And hypothetical new
methods of connectivity may be able to circumvent many attempts at central
control of the net.

~~~
userbinator
_For instance there will always be a market for custom computing and "PCs",
and so non-locked PCs will (hopefully) always exist in a capitalist
environment._

Unfortunately, that market is slowly becoming the minority, and because those
"more free" devices may have limitations that make them incompatible with a
lot of proprietary content (which is the majority) and circumventing those
limitations could be illegal and difficult, there will be fewer users of them.

Reminds me of this
[https://www.gnu.org/philosophy/right-to-read.html](https://www.gnu.org/philosophy/right-to-read.html)
and the old saying "If you outlaw freedom, only outlaws will have freedom."

------
timsally
Shameless plug to follow...

Much of the technology here was invented by Brendan and others at MIT Lincoln
Laboratory, which is where I work. We have been very lucky to have Brendan
join us for a few summers while he was completing his PhD at Georgia Tech and
he gave a great showing at RECON. Brilliant guy. If you're interested in
reverse engineering, his most recent papers are essential reading:
[http://www.cc.gatech.edu/grads/b/brendan](http://www.cc.gatech.edu/grads/b/brendan).

In addition to some of the automated RE work, we've also got multi-million
dollar research efforts hacking the Linux kernel and reverse
engineering/analyzing embedded systems. Lots of fun stuff. You get to work on
really exciting problems and you'll have the funding and the skilled coworkers
you need to execute successfully.

If you find this type of stuff exciting, you should drop me a line at
sally@ll.mit.edu. We're always hiring^. We've got great benefits too, like a
pension, unlimited sick leave, 13 holidays, 20 vacation days, and free classes
at MIT.

^One caveat is that because of how we are funded, we are only able to employ
US citizens.

~~~
reitanqild
> One caveat is that because of how we are funded, we are only able to employ
> US citizens.

Just wanted to say thank you: just being aware of limitations like this and
letting people know up front makes it a whole lot less annoying. Also, the
circumstances make it understandable.

~~~
test_account_1
I don't understand why being funded has anything to do with only hiring US
citizens.

~~~
sqrt17
From Wikipedia:

Since MIT Lincoln Laboratory's establishment, the scope of the problems has
broadened from the initial emphasis on air defense to include programs in
space surveillance, missile defense, surface surveillance and object
identification, communications, homeland protection, high-performance
computing, air traffic control, and intelligence, surveillance, and
reconnaissance (ISR).

Lincoln Laboratory conducts research and development pertinent to national
security on behalf of the military services, the Office of the Secretary of
Defense, and other government agencies. Projects focus on the development and
prototyping of new technologies and capabilities. Program activities extend
from fundamental investigations, through simulation and analysis, to design
and field testing of prototype systems. Emphasis is placed on transitioning
technology to industry.

Essentially, the Lincoln Lab is one of several labs that are part of a
strategy to make best use of creative people (who can think up things, but
don't necessarily want to productize/weaponize them), industry (who can
productize/weaponize things, but don't want to be part of a war effort) and
military.

To see why they would not hire non-US citizens, consider the first author of
the "Steal this Movie" paper, who now works at a Chinese university
(presumably on similar problems to the ones that people in LL and at UCSB are
tackling).

~~~
vfclists
Interesting. The DoD funds research which enables them to hack into other
systems, which means devices and our communications, and in return we get the
tools to hack DRM and everything else.

Not a bad trade.

------
MichaelGG
If you're wondering, PANDA is this:
[https://github.com/moyix/panda](https://github.com/moyix/panda)

QEMU based recorder. Sounds pretty neat.

Although if you wanted realtime audio decryption, I'd imagine just writing a
sound driver or hooking the OS's sound functions would be far more direct.

~~~
nzmsv
That would miss the point. The approach in the article does not hook the audio
driver, but rather the decryptor. This gets you the original compressed audio
file, not a stream that has been uncompressed (and would have artifacts when
re-compressed).

~~~
MichaelGG
That's fair. Although a lot of BluRay rips are re-encoded and seem quite
usable. Even the YiFY stuff at 1GB/hour or so.

Edit: Why can't a compressor perfectly re-compress the decompressed audio?
It's obviously possible since the compressed data exists producing that
specific decompressed data.

~~~
e12e
> Why can't a compressor perfectly re-compress the decompressed audio? It's
> obviously possible since the compressed data exists producing that specific
> decompressed data.

It's not a given that a compressor c1, which given A produces Az that
decompresses with d1 to A', can easily find any Ax that compresses to Az, or
equivalently can easily find Az given A'.

Formulated like that it doesn't seem quite so obvious: finding Az from A'
amounts[1] to finding A from Az -- ie: lossless compression.

> Although a lot of BluRay rips are re-encoded and seem quite usable. Even the
> YiFY stuff at 1GB/hour or so.

Usable at any given viewing/listening set-up != actually remotely "good
enough". I always say people shouldn't buy more expensive hi-fi gear than what
they can actually tell apart -- the one problem with that (apart from people
not being honest with themselves, opting for the more expensive stuff
anyway) is that when you're used to listening to crappy audio, you stop being
able to tell the difference.

It's like listening to an FM radio that's slightly off station -- after a few
hours, you probably don't notice anything wrong, until a new person walks into
the room and adjusts it to be better.

Another point -- while BluRay certainly isn't lossless -- when you're talking
the kind of compression/quality differences you mention (not sure what regular
bluray films are, but if they max out at 48mbit/s for AV, that's by my
calculations about 20GB/h) -- 1:20 -- I think you'd be hard pressed to notice
any "additional" artefacts. It would be like comparing a raw/flac audio file
compressed first to 320 kbps vbr mp3, and then compressed down to _16_ kbps
mp3, versus just doing the compression to 16 kbps mp3 (well order of magnitude
is correct, obviously this is going to be mostly cutting into the video data,
but still). Just something to keep in mind.

[1] Ok, "may almost amount to".
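
For anyone who wants to check the bitrate arithmetic above, here is a quick sketch. It assumes the 48 Mbit/s figure quoted in the comment, decimal units (1 GB = 10**9 bytes), and the 1GB/hour re-encode size from the parent; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the bitrate figures quoted above.
# Assumptions: 48 Mbit/s combined audio/video bitrate and decimal
# units (1 Mbit = 10**6 bits, 1 GB = 10**9 bytes).
BLURAY_MBIT_PER_S = 48
REENCODE_GB_PER_HOUR = 1


def gb_per_hour(mbit_per_s):
    """Convert a bitrate in Mbit/s to gigabytes per hour."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * 3600 / 1_000_000_000


source_gb_per_hour = gb_per_hour(BLURAY_MBIT_PER_S)
ratio = source_gb_per_hour / REENCODE_GB_PER_HOUR
print(f"{source_gb_per_hour:.1f} GB/h, roughly 1:{ratio:.0f}")
```

That comes out a little above the "about 20GB/h" estimate, so the 1:20 ratio is the right order of magnitude.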

~~~
sqrt17
Lossy compression != lossy decompression/restoration.

With vanilla JPEG, you should be able to redo the DCT and find the quantized
values exactly as they were, which means that you could losslessly reverse the
JPEG decompression: not in the sense that you recover the original image, but
in the sense that you get a compressed version that decompresses to the same
lossy reconstruction.

With deblocking filters in MPEG2 and later, this is not necessarily the case,
because you try to smooth things over in decompression and can't reconstruct
the compressed version either.
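
A toy illustration of the JPEG point, as a sketch only: it uses a 1-D 8-sample DCT with a single made-up quantization step `Q`, not real JPEG (which has 2-D 8x8 blocks, per-coefficient quantization tables, colour conversion, and clamping to 0..255, any of which can complicate exact recovery). All names here are hypothetical.

```python
import math

N = 8   # block length (1-D stand-in for JPEG's 8x8 blocks)
Q = 16  # hypothetical uniform quantization step


def scale(k):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt((1 if k == 0 else 2) / N)


def dct(x):
    """Orthonormal DCT-II of an N-sample block."""
    return [scale(k) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N))
            for k in range(N)]


def idct(X):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(scale(k) * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N))
            for n in range(N)]


def quantize(coeffs):
    return [round(c / Q) for c in coeffs]


def dequantize(levels):
    return [level * Q for level in levels]


samples = [52, 55, 61, 66, 70, 61, 64, 73]

# Encoder: these integer levels are what the compressed file stores.
levels = quantize(dct(samples))

# Decoder: reconstruct and round to integer sample values.
decoded = [round(v) for v in idct(dequantize(levels))]

# Run the encoder again on the decoded samples only.
recovered = quantize(dct(decoded))

print("lossy (decoded differs from input):", decoded != samples)
print("quantized levels recovered exactly:", recovered == levels)
```

Recovery works here because rounding the decoded samples moves each coefficient by at most about 1.4, far less than half of `Q`. A decoder-side smoothing filter, as described for later codecs, would add a much larger perturbation and break that guarantee.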

------
lucb1e
Uh what? Wow, I thought for a minute it said you took a 30-second recording
and I was wondering what you were going to do to get the file format right.
Use a raw encoding to get close to audio levels? Then I read on, noticed
something and had to read back. Indeed, you didn't record the audio, you just
recorded and replayed the operating system. Say what!

~~~
keehun
Yeah, it's next-level stuff, here. He said around 12 billion instructions
which sounds like a lot, but with our current processors, not that much work
for the CPUs.

~~~
jzwinck
More specifically, a billion instructions is one giga-instruction. A 2 GHz
processor can execute roughly 2 billion instructions per second (this is a
very rough estimate, thanks to superscalar, pipelining, uops, yada yada). So
12 billion (typical) instructions will take around 6 seconds to run.

This sort of thing is handy when profiling: see a function taking a billion
instructions? Half a second of CPU time. And this ratio hasn't changed all
that much in quite a while (what has changed, of course, is how many threads
can execute simultaneously).
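
The same rule of thumb as a tiny sketch. The 2 GHz clock and one instruction per cycle are the rough assumptions from the comment above, not measurements; real throughput depends on the workload, and the function name is made up.

```python
def cpu_seconds(instructions, clock_hz=2e9, instructions_per_cycle=1.0):
    """Very rough CPU time estimate: instructions / (clock * IPC)."""
    return instructions / (clock_hz * instructions_per_cycle)


replay = cpu_seconds(12e9)    # the ~12 billion instruction recording
profiled = cpu_seconds(1e9)   # a function costing a billion instructions

print(f"12 billion instructions: about {replay:.1f} s")
print(f"1 billion instructions: about {profiled:.1f} s")
```

This is native-speed arithmetic only; running the analysis on top of a recorded replay would presumably take longer.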

------
MichaelAza
This is really cool. Both referenced tools (PANDA and Volatility) seem
awesome.

I have some experience with IDA and Olly, but learning about new tools is
always nice. Any resources you guys recommend on the subject of reversing?

------
Steer
I don't know, perhaps I'm totally wrong, but this just depresses me. I can see
the point from an engineering point-of-view, this is a riddle to be solved,
but why would you want to help people circumvent one of the (to me) reasonable
ways of enjoying and paying for music? I know the author stopped short of
giving the full solution to getting it to work, but still.

Is this what we've come to? No one should get paid for anything if we can
enjoy it for free regardless of the hoops we will have to jump through to not
pay?

Sorry if I am too dramatic. I can often see the point of pirating things, but
in this case I just don't get it.

Edit: I would appreciate an explanation of the downvotes.

~~~
tptacek
I understand where you're coming from (also, piracy repulses me).

However: it's worth knowing whether content protection is implemented soundly.
Contrary to overwhelming popular opinion, there are content protection schemes
that work. Generalist engineers mistakenly believe that content protection
schemes must be unbreakable to provide value. They don't: all they have to do
is cost more to break than the value of the content they protect (across all
the users who might subsequently get access to it).

For an example of a content protection scheme that worked extremely well, see
modern satellite TV smart cards.

Given that there are ways to implement content protection soundly, there's
validity to research that determines whether a given content protection scheme
is sound.

~~~
bri3d
I generally agree, but I think you're conflating two kinds of content
protection.

One prevents the attacker from accessing something they aren't subscribed to,
and relies on crypto and secure subscriber identity mechanisms. This is
completely possible to implement soundly.

The other prevents the attacker from copying something they can see or listen
to, and relies on bizarre mechanisms designed to prevent the user from
learning the state of their hardware.

I find the latter awful, because it's an infinitely losing battle (you can
always point a camera at your display in the end) which erodes consumer
freedoms and encourages walled gardens.

Satellite TV is a funny example - since the communication is strictly one-way,
the hardware state needs to be protected or it can be cloned, but I still
think it's fundamentally a question of the transport protection variety rather
than the copy protection one.

------
ilyagr
I am wondering if someone who understands statistics better than I do could
explain conceptually how encrypted data is distinguishable from compressed
data. I always assumed that Shannon's paper says that perfectly compressed
data should be indistinguishable from random data (which is indistinguishable
from encrypted data). Is mp3 compression not sufficiently perfect? Is my
understanding wrong?

Thank you!

P.S. This is a really clever trick!

~~~
MichaelGG
Pretty sure you explained it: it's not _perfectly_ compressed data. Even a
small bias is enough to give it away.
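
To make "a small bias is enough" concrete, here is a synthetic sketch. It does not use real MP3/Ogg or ciphertext, and it is not necessarily the statistic the blog post used: it compares a uniform byte stream (a stand-in for encrypted data) with one where 1% of bytes are forced to zero (a made-up stand-in for imperfect compression).

```python
import math
import random
from collections import Counter


def byte_entropy(data):
    """Shannon entropy in bits per byte (8.0 is the maximum)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def chi_square(data):
    """Pearson chi-square statistic against a uniform byte distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(256))


rng = random.Random(2014)  # fixed seed so the run is repeatable
N = 200_000

# Stand-in for encrypted data: every byte value equally likely.
uniform = bytes(rng.randrange(256) for _ in range(N))

# Stand-in for imperfectly compressed data: 1% of bytes forced to zero.
biased = bytes(0 if rng.random() < 0.01 else rng.randrange(256)
               for _ in range(N))

for name, data in (("uniform", uniform), ("biased", biased)):
    print(f"{name}: entropy={byte_entropy(data):.3f} bits/byte, "
          f"chi2={chi_square(data):.0f}")
```

Both streams show an entropy very close to 8 bits per byte, so entropy alone barely separates them. The chi-square statistic (255 degrees of freedom, so values near 255 are what uniform data gives) lands in the thousands for the biased stream.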

------
shmerl
Isn't music available through DRM-free sources most of the time anyway? Just
don't use Spotify if you are against DRM. It will also be a vote with your
wallet against it. By using it you implicitly support DRM proliferation.

~~~
jdong
Voting with your wallet is much less effective than publicly complaining,
unless of course your wallet is worth a lot.

~~~
shmerl
My comment wasn't really to the authors of this tool. Their tool is a form of
the public protest and is appropriate. It was more directed to those who
actually use Spotify. For them, complaining while actually using these kinds of
services (and helping them to spread more DRM in the process) is strange.

~~~
jdong
What if they like the product but would rather have it without the DRM?

~~~
shmerl
I'd say they should stop using it until the product drops the DRM. Otherwise
their complaints don't sound sincere. Or at least not convincing.

It's as if smokers complained that the cigarette industry ruins public
health.

~~~
MichaelGG
Smokers complain, and now we have e-cigarettes which may be less harmful while
still allowing the enjoyment of nicotine and the fun of smoking.

I'm not sure why it's insincere. I keep buying ThinkPads, but I hate the new
designs with a passion. (And they hate me; they literally cause me RSI where
the earlier ones didn't.) It doesn't make my arguments against the new
ThinkPads any weaker.

~~~
shmerl
_> and now we have e-cigarettes which may be less harmful while still allowing
the enjoyment of nicotine and the fun of smoking._

So, we can have digital goods sold without DRM, so we could enjoy them without
police state methods attached.

 _> I'm not sure why it's insincere._

Because by buying from those who push DRM on them, users support and prolong
the usage of the said DRM. Complaints won't persuade DRM Lysenkoists. Loss of
profits can.

------
icoder
Would it have been possible to figure this out without the statistics by
playing something very specific, like a 440Hz tone (yes, there's spotify
'music' that does just that)?

------
spindritf
I'm pretty sure you used to be able to just copy .ogg files from Spotify's
local cache at one point. Am I remembering right? It's a soup of 10-2000kB
files now.

~~~
AlyssaRowan
At one very early point in its history, yes, I understand that was accurate.
Then they encrypted them, but the key was easy to find, and then they started
doing more complex stuff.

It's of idle academic interest to me. Never used Spotify, but I don't wish
them harm either.

------
n0body
A very interesting read.

------
the_cat_kittles
this is cool, but for someone who actually wants to rip songs off spotify, why
would you do this instead of using audio hijack? is there some advantage?

~~~
MichaelGG
As mentioned in another comment, if you do this you get the original
compressed file. If you capture the audio and then want to recompress it,
you'll introduce artifacts.

Although, it seems to me that an intelligent compressor could perfectly
recompress the audio back to the original form.

~~~
KMag
This presumes that downsampling in either bit depth or sample rate isn't
happening somewhere (such as your audio drivers) between decompressing the
audio and where you're capturing the audio.

------
sigzero
I guess I could just google it but isn't breaking DRM illegal?

~~~
DanBC
Yes, under US DMCA and EU something or other.

EDIT: here's the relevant wikipedia article
[http://en.m.wikipedia.org/wiki/Anti-circumvention](http://en.m.wikipedia.org/wiki/Anti-circumvention)
(sorry for mobile link)

~~~
eli
_"Sec. 103(f) of the DMCA (17 U.S.C. § 1201 (f)) says that if you legally
obtain a program that is protected, you are allowed to reverse-engineer and
circumvent the protection to achieve the interoperability of computer
programs (i.e., the ability to exchange and make use of information)."_

Though I wouldn't want to have to test that in court. IANAL.

------
desarun
I read this and thought "I am not a true developer, I am merely a journalist-
style hack".

Disclaimer: Contractor, native app dev.

