- storage will get exponentially cheaper
- data transfer speeds will get higher
It makes me think that eventually there will be illicit torrents of all the world's music, plus the index, plus the metadata, and plus the interface/app for browsing it. In other words, people would not only pirate individual songs or movies, they would download their own complete copies of Spotify / Netflix. It isn't feasible now but it could be sometime in the next 5-15 years, depending on bandwidth speeds.
I'm not sure how many people see this coming or take it seriously but I wonder what the effect could be and what the remedy attempts would be.
* push for cloud storage over local storage
* push for locked down devices over general purpose computers
* push for DRM on the open web
* big ISP companies fighting against net neutrality
For instance there will always be a market for custom computing and "PCs", and so non-locked PCs will (hopefully) always exist in a capitalist environment. That market I think ultimately circumvents any attempt at ubiquitous control of hardware. The same thing is at play with software. And hypothetical new methods of connectivity may be able to circumvent many attempts at central control of the net.
Unfortunately, that market is slowly becoming the minority, and because those "more free" devices may have limitations that make them incompatible with a lot of proprietary content (which is the majority) and circumventing those limitations could be illegal and difficult, there will be fewer users of them.
Reminds me of this https://www.gnu.org/philosophy/right-to-read.html and the old saying "If you outlaw freedom, only outlaws will have freedom."
It seems to me that the a-grade private torrent trackers are exactly that. Larger catalogue, more comprehensive metadata, and better interface than any commercial service.
> In other words, people would not only pirate individual songs or movies, they would download their own complete copies of Spotify / Netflix. It isn't feasible now but it could be sometime in the next 5-15 years, depending on bandwidth speeds.
If you have the infrastructure, and are really dedicated, you can definitely do it today.
>It seems to me that the a-grade private torrent trackers are exactly that. Larger catalogue, more comprehensive metadata, and better interface than any commercial service.
Regardless of metadata quality and collection size (which I think are debatable, although I haven't seen one), they are a la carte, so quite different.
> If you have the infrastructure, and are really dedicated, you can definitely do it today.
No infrastructure is required, just multiple terabytes of storage (which most households do not have... yet), and bandwidth that makes downloading terabytes an attainable feat. Needless to say the latter is not viable today either. But eventually both those things will become commonplace.
15 is a high number too. The set of music one would possibly be interested in hearing (of the set of all music which has already been produced) may be closer to 3-5 terabytes. Movies, of course, are a different, larger set of numbers.
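To put rough numbers on "an attainable feat", here is a minimal back-of-the-envelope sketch. The collection sizes come from the figures in this thread; the line rates (20, 100 and 1000 Mbit/s) are my own assumptions, and the function name is just illustrative. It assumes decimal terabytes and a link that is saturated the whole time, which real torrents rarely manage.

```python
def download_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move `terabytes` (decimal TB) at a sustained line rate."""
    bits = terabytes * 1e12 * 8                      # TB -> bytes -> bits
    seconds = bits / (megabits_per_second * 1e6)     # Mbit/s -> bit/s
    return seconds / 86_400                          # seconds per day


if __name__ == "__main__":
    # Sizes from the discussion above; line rates are assumed for illustration.
    for tb in (3, 5, 15):
        for mbps in (20, 100, 1000):
            days = download_days(tb, mbps)
            print(f"{tb:>2} TB at {mbps:>4} Mbit/s: {days:7.1f} days")
```

So at the assumed 100 Mbit/s, 5 TB is a matter of days rather than years; storage is probably the tighter constraint.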
It's always fascinating to me how far the tools of the trade have come on since my heyday. They surpassed my own custom tools of yore a little while back, and things like this were things I could obviously never do at the time - nowhere near enough storage, for a start, but I also didn't know about the compressed/encrypted distinction (that would have saved me a lot of time!).
That you can do it in such an automated fashion now, where I just paged through the disassembly and hexdumps until I saw something that leaped out at me, is stunning. I'd coded a few helper routines, and my debugger was completely stealth and I could rewind a couple thousand instructions (the secret of my success! <g>), but never anything like that back then.
Perhaps I should look into this PANDA.
- Records (with PANDA) every instruction run and every piece of data read or written by that virtual machine for half a minute while it's playing audio. (!)
- Analyses that recording and uses some very clever statistics to identify functions that read chunks of data that look encrypted, and write chunks of data that look compressed (yes, you can tell the difference, compression is imperfect).
- Out pops one likely candidate, which sure enough is the decrypter.
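For anyone wondering what "looks encrypted" versus "looks compressed" can mean in practice, here is a minimal sketch of two common byte-level statistics: Shannon entropy and a chi-square test against a uniform histogram. This is not PANDA's code and not necessarily the statistics used in the talk; `os.urandom` stands in for ciphertext and zlib for the audio codec, both of which are assumptions made purely for illustration.

```python
import math
import os
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def chi_square(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


if __name__ == "__main__":
    # Hypothetical sample data: pseudo-text built from a tiny vocabulary.
    rng = random.Random(0)
    words = ["spotify", "panda", "qemu", "vorbis", "decrypt", "stream", "audio"]
    text = " ".join(rng.choice(words) for _ in range(200_000)).encode()

    samples = {
        "plain text": text,
        "zlib-compressed": zlib.compress(text, 9),
        "os.urandom (ciphertext stand-in)": os.urandom(1 << 16),
    }
    for name, buf in samples.items():
        print(f"{name:34s} entropy={shannon_entropy(buf):.4f} "
              f"chi2={chi_square(buf):10.1f}")
```

I'd expect the compressed and random buffers to both land close to 8 bits per byte, with the chi-square value (or a similar uniformity test) being what tends to separate them. Treat that as a rule of thumb, not a guarantee for every codec.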
Much of the technology here was invented by Brendan and others at MIT Lincoln Laboratory, which is where I work. We have been very lucky to have Brendan join us for a few summers while he was completing his PhD at Georgia Tech and he gave a great showing at RECON. Brilliant guy. If you're interested in reverse engineering, his most recent papers are essential reading: http://www.cc.gatech.edu/grads/b/brendan.
In addition to some of the automated RE work, we've also got multi-million dollar research efforts hacking the Linux kernel and reverse engineering/analyzing embedded systems. Lots of fun stuff. You get to work on really exciting problems and you'll have the funding and the skilled coworkers you need to execute successfully.
If you find this type of stuff exciting, you should drop me a line at firstname.lastname@example.org. We're always hiring^. We've got great benefits too, like a pension, unlimited sick leave, 13 holidays, 20 vacation days, and free classes at MIT.
^One caveat is that because of how we are funded, we are only able to employ US citizens.
Just wanted to say thank you: just being aware of limitations like this and letting people know up front makes it a whole lot less annoying. Also the circumstances make it understandable.
Since MIT Lincoln Laboratory's establishment, the scope of the problems has broadened from the initial emphasis on air defense to include programs in space surveillance, missile defense, surface surveillance and object identification, communications, homeland protection, high-performance computing, air traffic control, and intelligence, surveillance, and reconnaissance (ISR).
Lincoln Laboratory conducts research and development pertinent to national security on behalf of the military services, the Office of the Secretary of Defense, and other government agencies. Projects focus on the development and prototyping of new technologies and capabilities. Program activities extend from fundamental investigations, through simulation and analysis, to design and field testing of prototype systems. Emphasis is placed on transitioning technology to industry.
Essentially, the Lincoln Lab is one of several labs that are part of a strategy to make best use of creative people (who can think up things, but don't necessarily want to productize/weaponize them), industry (who can productize/weaponize things, but don't want to be part of a war effort) and military.
To see why they would not hire non-US citizens, consider the first author of the "Steal this Movie" paper, who now works at a Chinese university (presumably on similar problems to the ones that people in LL and at UCSB are tackling).
Not a bad trade.
Could you comment on whether your participants are in any way, directly or indirectly, working to find vulnerabilities for the NSA to exploit, to your knowledge?
QEMU based recorder. Sounds pretty neat.
Although if you wanted realtime audio decryption, I'd imagine just writing a sound driver or hooking the OS's sound functions would be far more direct.
Edit: Why can't a compressor perfectly re-compress the decompressed audio? It's obviously possible since the compressed data exists producing that specific decompressed data.
It's not a given that a compressor c1, which given A produces an Az that d1 decompresses to A', can easily find any Ax that compresses to Az, or equivalently can easily find Az given A'.
Formulated like that it doesn't seem quite so obvious: finding Az from A' amounts to finding A from Az -- ie: lossless compression.
> Although a lot of BluRay rips are re-encoded and seem quite usable. Even the YiFY stuff at 1GB/hour or so.
Usable at any given viewing/listening set-up != actually remotely "good enough". I always say people shouldn't buy more expensive hi-fi gear than what they can actually tell apart -- the one problem with that (apart from people not being honest with themselves, opting for the more expensive stuff anyway) is that when you're used to listening to crappy audio, you stop being able to tell the difference.
It's like listening to an FM radio that's slightly off station -- after a few hours, you probably don't notice anything wrong, until a new person walks into the room and adjusts it to be better.
Another point -- while BluRay certainly isn't lossless -- when you're talking the kind of compression/quality differences you mention (not sure what regular BluRay films are, but if they max out at 48 Mbit/s for AV, that's by my calculations about 20GB/h) -- 1:20 -- I think you'd be hard pressed to notice any "additional" artefacts. It would be like comparing a raw/FLAC audio file compressed first to 320 kbps VBR mp3, and then compressed down to 16 kbps mp3, versus just doing the compression to 16 kbps mp3 (well, the order of magnitude is correct; obviously this is going to be mostly cutting into the video data, but still). Just something to keep in mind.
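The "about 20GB/h" figure checks out. A quick sketch of the arithmetic, assuming the 48 Mbit/s ceiling mentioned above, decimal gigabytes, and the roughly 1 GB/hour re-encodes from the quoted comment (the function name is mine):

```python
def gigabytes_per_hour(megabits_per_second: float) -> float:
    """Convert a sustained bitrate in Mbit/s to decimal gigabytes per hour."""
    bytes_per_second = megabits_per_second * 1e6 / 8
    return bytes_per_second * 3600 / 1e9


if __name__ == "__main__":
    bluray = gigabytes_per_hour(48)   # assumed BluRay AV ceiling from the comment
    rip = 1.0                         # ~1 GB/hour re-encode from the quoted comment
    print(f"BluRay at 48 Mbit/s: {bluray:.1f} GB/h, ratio to rip: {bluray / rip:.1f}:1")
    print(f"Audio analogy, 320 kbps vs 16 kbps: {320 / 16:.0f}:1")
```

That gives 21.6 GB/h, so the 1:20 ratio and the 320-to-16 kbps audio analogy are the same order of magnitude.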
 Ok, "may almost amount to".
With vanilla JPEG, you should be able to redo the DCT and find the quantized values exactly as they were, which means that you could losslessly reverse the JPEG compression: not in the sense that you recover the original image, but in the sense that you get a compressed version that decompresses to the same lossy reconstruction.
With deblocking filters in MPEG2 and later, this is not necessarily the case, because you try to smooth things over in decompression and can't reconstruct the compressed version either.
A good example of this is the guy who re-uploaded the same video to YouTube many times.
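The vanilla JPEG point can be shown with a toy model. This is only a sketch under heavy assumptions: a single 1-D block of 8 samples instead of 8x8, one hypothetical quantizer step `Q` for every coefficient, no clamping, no colour conversion or chroma subsampling. With a step this coarse relative to the pixel rounding error, redoing the DCT on the decoded samples lands on exactly the same quantized levels.

```python
import math

N = 8    # block length (JPEG uses 8x8 blocks; 1-D here for brevity)
Q = 16   # hypothetical quantizer step, identical for every coefficient


def dct(x):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(
            x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = coeffs[0] * math.sqrt(1 / N)
        total += sum(
            math.sqrt(2 / N) * coeffs[k] * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(1, N))
        out.append(total)
    return out


def encode(pixels):
    """Lossy step: transform, then quantize each coefficient to an integer level."""
    return [round(c / Q) for c in dct(pixels)]


def decode(levels):
    """Dequantize, inverse transform, and round back to integer samples."""
    return [round(v) for v in idct([level * Q for level in levels])]


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    levels = encode(original)
    decoded = decode(levels)
    print("levels     :", levels)
    print("decoded    :", decoded)
    print("re-encoded :", encode(decoded))
    print("same levels:", encode(decoded) == levels)
```

The decoded samples differ from the original (information was lost), yet re-encoding them reproduces the same levels, so a second generation costs nothing further. A decoder that filters its output after the inverse transform breaks that property, which is the MPEG case described above.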
Yes, it sounds theoretically possible, but it may involve searching a huge search space and may be computationally infeasible.
For example, Opus's CELT encoder uses lapped transforms and keeps about the same level of constrained-energy. Combined with the (hybrid) voice codec in some very advanced ways, it makes for the most advanced audio codec around by quite some margin.
You look at video, and nothing's that good yet, not even HEVC. The only thing that leaps to mind is Xiph.Org's Daala project - https://www.xiph.org/daala/ - which is hoping to do for video what Opus did for audio and develop a royalty-free, awesome video codec (rather than being donated a royalty-free, okay two or three) - and that's I'd say one or two generations ahead of HEVC, but of course, still very very early work.
Many devices have hardware acceleration for mp3 decoding, which also helps with battery life. I'm not sure how many devices have hardware accelerators that can be used/repurposed for Vorbis decoding in the Spotify case. I believe it's less common for hardware decoders to be useful for FLAC decoding, so there may be a battery life penalty for re-encoding mp3 or Vorbis audio as FLAC. (I also presume everyone realizes re-encoding lossily-compressed audio using a lossless codec likely pays a size penalty without any quality increase.)
This sort of thing is handy when profiling: see a function taking a billion instructions? Half a second of CPU time. And this ratio hasn't changed all that much in quite a while (what has changed, of course, is how many threads can execute simultaneously).
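As a tiny sketch of that rule of thumb: the default of two billion instructions per second is simply what "a billion instructions, half a second" implies, not a measured figure, and real throughput varies a lot with the workload and the CPU.

```python
def cpu_seconds(instructions: float, instructions_per_second: float = 2e9) -> float:
    """Rough single-thread CPU time for a given retired-instruction count.

    The default rate is the ratio implied by the comment above (an assumption).
    """
    return instructions / instructions_per_second


if __name__ == "__main__":
    for count in (1e6, 1e9, 50e9):
        print(f"{count:>14,.0f} instructions ~ {cpu_seconds(count):.4f} s")
```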
I have some experience with IDA and Olly, but learning about new tools is always nice. Any resources you guys recommend on the subject of reversing?
Is this what we've come to? No one should get paid for anything if we can enjoy it for free regardless of the hoops we will have to jump through to not pay?
Sorry if I am too dramatic. I can often see the point of pirating things, but in this case I just don't get it.
Edit: I would appreciate an explanation of the downvotes.
I strongly doubt this will significantly affect the level of piracy in music; instead it's a really interesting application of a pair of interesting things, PANDA and the difference in 'randomness' between encrypted and unencrypted streams.
I wonder if we'll start to see more crypto methods that deliberately avoid looking like encrypted data to make this sort of attack harder. ISTR agl's pond did something like this, but I could be imagining it, and I've not seen it widely implemented.
Because DRM sucks. Why can I only listen to Spotify through their player? Why am I restricted to using it only on computers/devices that they've ported their player to?
It's not exactly the same, but back when iTunes sold DRM encrusted music files, I used to strip all the DRM from the files I legally purchased, so that I could play them on my Linux system, stream them to my Philips wifi speaker system, etc.
Being locked in to a single vendor's ecosystem sucks. This is part of the reason why I currently refuse to purchase video from iTunes or Amazon. When the industry wises up and drops DRM from video, I will happily start patronizing them.
Spotify is a bit different: it's more like Netflix than iTunes or Amazon. Still, if I'm paying for it, it's frustrating to only be able to use their players.
 My TiVo has a built-in Netflix player, but it really sucks. Much of the newer content doesn't actually decode properly. Being locked into their system means I'm at their mercy. If I could get at the stream I could use a computer somewhere to transcode it into something the TiVo could reliably play... But the DRM prohibits me from doing that, and that's frustrating.
However: it's worth knowing whether content protection is implemented soundly. Contrary to overwhelming popular opinion, there are content protection schemes that work. Generalist engineers mistakenly believe that content protection schemes must be unbreakable to provide value. They don't: all they have to do is cost more to break than the value of the content they protect (across all the users who might subsequently get access to it).
For an example of a content protection scheme that worked extremely well, see modern satellite TV smart cards.
Given that there are ways to implement content protection soundly, there's validity to research that determines whether a given content protection scheme is sound.
One prevents the attacker from accessing something they aren't subscribed to, and relies on crypto and secure subscriber identity mechanisms. This is completely possible to implement soundly.
The other prevents the attacker from copying something they can see or listen to, and relies on bizarre mechanisms designed to prevent the user from learning the state of their hardware.
I find the latter awful, because it's an infinitely losing battle (you can always point a camera at your display in the end) which erodes consumer freedoms and encourages walled gardens.
Satellite TV is a funny example - since the communication is strictly one-way, the hardware state needs to be protected or it can be cloned, but I still think it's fundamentally a question of the transport protection variety rather than the copy protection one.
Meanwhile, it all works because nobody cares to break your DRM because audio is already widely distributed in lossless DRM-free formats (CDs), and Blu-Ray DRM is already broken. The pirates don't want Netflix's 5Mbps stream when they can just buy a Blu-Ray and get a 50Mbps copy instead. (Similarly, they don't want Spotify's Vorbis stream because they can just source material from the uncompressed CD.)
(Piracy works because it's a http://en.wikipedia.org/wiki/Smart_cow_problem)
My point was that normally we explain piracy by "it is an easier way of enjoying ..." and I can agree with that, but few things are easier to use (and pay for) than Spotify. I can see your point about researching their content protection.
Edit: Wow, learned my lesson, don't express admiration. I guess I just don't get the downvoting without leaving a comment.
Whereas any software-based mechanism, without buy-in from the OS/OEM, is essentially going to always be one click away from being cracked, as far as end-users view it.
I have a PPC computer running Linux. It is impossible for me to use Spotify on this computer. I am happy to pay for the service. Is it irresponsible for me to reverse-engineer the protocol so that I may use a service I have paid for on a device they do not support?
Reverse-engineering the protocol doesn't necessarily mean I want to pirate the content.
Right now I'm listening to an audiobook. I would LOVE to listen to that on my Linux machine. How..?
I would DEFINITELY prefer not to lose that access (again?), when I decide that the subscription for crappy DRM'd content isn't worth my money.
Still, I'm annoyed at the DRM, I can't listen to what I want where I want, and there are DRM-free alternatives, so I'm also seriously considering cancelling my account. Glad to hear they still let you access your previous purchases (it was my main worry).
Also, DRM is not necessarily intrinsically linked with payment. For example, people release DRM-free games on the Humble Bundles, and they make millions for charity. There are some business models that don't work well without it, yes (Spotify's particular model of streaming, for example), but plenty that don't need any DRM at all (digital radio stations, for example).
I would even argue that Spotify only needs to include DRM to pacify the labels it has deals with. Because you can get basically any specific mainstream song via Youtube and if you want to have huge collections of some music genre there are torrents for that, too. The main selling point of Spotify is its implicit promise of rewarding the authors and the comfort of music selection it offers.
The albums are The Hands that Thieve by Streetlight Manifesto and The Hand that Thieves by Toh Kay, but I'm not sure if there's a detailed writeup anywhere. The band posted this: http://streetlightmanifesto.com/update-the-continued-struggl... and eventually had to cancel all of the preorders. They also posted this about their label: http://streetlightmanifesto.com/streetlight-manifesto-proudl...
Purchasing DRM-free downloads, physical CDs, cassettes, and vinyl are the reasonable ways.
It's a damned convenient service though.
My iTunes library is huge. I still end up listening to the same albums with my (paid) Spotify subscription for the pure convenience alone.
Edit: I've just remembered the boxes upon boxes of CDs I have in storage in the UK. That's five years, doubt I've given them a second thought for at least the last four.
I see the "problem" as a simple one. If there is a chance that a system's DRM can be broken by merely one person, all efforts by that provider, and now most other providers, at least those that share methodologies, are pointless.
If a song, book, or movie is locked down, only legitimate subscribers can use that media under the terms the owner or distributor of that media defines. But if only one person of the potentially millions is able to break that encryption, one person has now made all the work and man hours put into said encryption completely worthless.
If it takes a few weeks to break encryption that took months of multiple man hours to create, was that time wisely spent? Once it's broken, the data is now open to everyone. That being the case, I feel we're in a "why bother" scenario.
The main reason I see illegally pirated material not proliferating even more is the technical barrier to acquiring the files. And often times, applications and protocols like BitTorrent, Tor, VPNs, SSL links, etc. have too high a technical barrier for the common user to start pirating their media.
Once that burden is removed, it's game over and everything will be free at a click of the mouse, until something new comes along or a better business model that works with the fast changing technology world we are in.
When you are waging a war of one against millions and the one can actually win without putting themselves in harm's way, you have a war that many will think is a sure fire win for the millions. A million against one is a pretty good ratio. But in this case, the one has a very good chance of annihilating the millions.
Eventually we will learn there is no point. In the meantime, people do this type of reverse engineering for a number of reasons I imagine. Curiosity, the challenge, making a political statement, and most importantly, to learn.
From this, perhaps something new is learned. And from that a new library is made which then gets adapted to detect intrusions on our personal computers and embedded devices. Who knows what may come of this research.
In the end, my position on piracy doesn't matter. What I do know is it will happen, encryption will be reverse engineered or broken, and those who can't learn the technology to use the new research will continue to find the barrier to piracy too high. Others won't, and will get their media freely, sans any lockdown from what you want to do with what you purchased.
Cars can go well over the speed limit. Some to speeds that make no sense to even exist. But they are legal, and most people follow the speed limit laws. The cost and barriers of breaking those laws are too high so people follow the rules. With DRM, the cost is negligible, so the rules will be broken.
But mainly, I think the answer to your question is that curiosity drives a lot of this. Aside from curiosity it could be a lack of trust. Researchers want to know what is going on with their media and hardware.
How do you know an encrypted file is not up to nefarious deeds? The common user never will. But a researcher like this would discover that the file was up to more than just DRM, and that could potentially help those millions of people become more aware of security, and learn to be more skeptical of what they buy.
P.S. This is a really clever trick!
Also, audio codecs usually use prepared Huffman codebooks for the final step.
For most codecs they're defined as part of the format; a few (such as Vorbis) prepend the codebooks to the data stream so compressors can use different codebooks if they choose to do so.
For vorbis there used to be an experimental tool that optimized huffman coding of an existing stream, giving another 4%-or-so compression with otherwise identical data (ie. no quality change).
These 4% plus framing are probably the entropy differential they're exploiting here.
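To illustrate why a prepared codebook leaves bits on the table, here is a toy sketch (not the Vorbis tool mentioned above, and not real Vorbis codebooks). It builds Huffman code lengths from a hypothetical "training" distribution, then codes a stream whose symbol statistics differ and compares that against a codebook tuned to the stream itself. The mismatch is deliberately exaggerated; real codecs are off by far less, which is where a figure like 4% would come from.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for an optimal Huffman code."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    # The tie-breaker guarantees the dicts are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def coded_bits(data, lengths):
    """Total payload size in bits when `data` is coded with `lengths`."""
    return sum(lengths[s] for s in data)


if __name__ == "__main__":
    training = "aaaabbbccd"   # hypothetical data the fixed codebook was built for
    stream = "ddddcccbba"     # actual stream with different symbol statistics

    generic = huffman_code_lengths(Counter(training))
    tuned = huffman_code_lengths(Counter(stream))

    g = coded_bits(stream, generic)
    t = coded_bits(stream, tuned)
    print(f"generic codebook: {g} bits, tuned codebook: {t} bits")
    print(f"redundancy left by the generic codebook: {100 * (g - t) / g:.1f}%")
```

The tuned figure ignores the cost of shipping the codebook itself, which is part of why a format might settle for prepared tables.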
It's as if smokers complained that the cigarette industry ruins public health.
I'm not sure why it's insincere. I keep buying ThinkPads, but I hate the new designs with a passion. (And they hate me; they literally cause me RSI where the earlier ones didn't.) It doesn't make my arguments against the new ThinkPads any weaker.
So, we can have digital goods sold without DRM, so we could enjoy them without police state methods attached.
> I'm not sure why it's insincere.
Because by buying from those who push DRM on them, users support and prolong the usage of the said DRM. Complaints won't persuade DRM Lysenkoists. Loss of profits can.
Confusing Spotify with Amazon or Apple DRM would only display a lack of understanding on the part of the consumer. The truth is for streaming services you are NOT buying music!!!
1. Renting does not make sense for digital goods (I already explained in the past why).
2. DRM doesn't protect anything. I.e. it can't enforce the renting paradigm. This very article demonstrates that DRM is broken and it can't stop piracy. Therefore there is no need to use it ever even if we assume that it's an ethical practice (it is not). All DRM does is punish paying customers by crippling the usability of the product (limiting supported devices / platforms / players / formats and so on), while having zero effect on pirates who pirate the same stuff DRM free.
It's of idle academic interest to me. Never used Spotify, but I don't wish them harm either.
Although, it seems to me that an intelligent compressor could perfectly recompress the audio back to the original form.
EDIT: here's the relevant wikipedia article http://en.m.wikipedia.org/wiki/Anti-circumvention (sorry for mobile link)
Though I wouldn't want to have to test that in court. IANAL.
Disclaimer: Contractor, native app dev.