Hacker News new | comments | show | ask | jobs | submit login
Hack-petya mission accomplished – Petya ransomware decryptor (github.com)
451 points by aw3c2 on Apr 11, 2016 | hide | past | web | favorite | 79 comments

Must be pretty nice to have a son-in-law who can, oh, you know, just come over for Easter and reverse-engineer the cryptographically complex malware holding your system hostage.

Less at stake on this one, but I had a friend in college whose father was a business owner. He was stuck with some database software that all his customer records were locked in; he wanted to migrate but had no way out other than copying them all over by hand. Up until one weekend when his daughter brought home her boyfriend, who dad had regarded as a dirty hippie up until then. Turns out dirty hippie was a pretty good hacker and in the course of a few hours, he wrote a program to export dad's database into a usable format.

Moral: your significant other may not be impressed by your hacking skills, but you might win the family over for life in a few hours.

I am guessing the daughter was less impressed by the event? I mean, she went out of her way to find someone her parents disliked, and it turns out he is just what his father was looking for... :)

I realize it was a tongue in cheek comment (at least interpreted that way), but that was a serendipitous opportunity for the young lady's boyfriend. I know I've had several such opportunities with my now in laws. Regardless of the differences of opinions, nothing wins over people more than just helping when help is needed.

Jeez, thank God that malware authors are too stupid to include gnupg within their code.

I'm afraid of the moment when they get more clever. Or if they include filesystem drivers that silently encrypt every file... if done right, you can even compromise backups transparently for weeks until arming, so that backups are essentially useless too.

> so that backups are useless too

It can hardly be considered a backup if it's easily compromised from your every day computer. In a good backup configuration your computer only have read and "append only" privilages for the backup so past snapshots are safe.

Having said that not many people have this kind of config (including me).

Not sure why you're being downvoted, you make a very valid point.

I guess it's because this kind of backup doesn't mitigate an ongoing compromise of your machine (much like what the GP talked about). I don't think any backup can protect for that. However a good backup limits the damage to the timeframe of the machine being compromised (if your machine is compromised for 1 day you might only lose that day's new data).

> Last, but not least, it fires up a genetic solver which gets us the key in a few seconds.

That's pretty impressive! On the other hand, is it a general vulnerability of Salsa that genetic solvers can break it? Sounds like a huge vulnerability.

It's a 16-bit version of Salsa20, which is far less secure. Proof: this project. I have no clue what possessed the author to do this.

Stop hand-rolling crypto, people! Ransomware needs security too. :(

On the other hand, it's not a critical security application. If someone breaks it, you just don't get a little money. If the exploit becomes widespread, you can update the malware.

The author probably wanted to be able to somewhat quickly encrypt/decrypt a full disk on potentially slow hardware.

Even a poorly-implemented homebrew encryption function would have been better than using a broken implementation of a common, trivially-recognizable function. It was easy for them to run this code through an SMT solver because it was easy to obtain the original source code.

It would have been smarter to bundle the ransomware with a 32-bit DOS extender so the known-good Salsa implementation could be used unchanged.

It's a way of sponsoring community-led crypto research.

My guess is that the authors wanted to ensure they could encrypt a full disk quickly.

Maybe it's just a test to see how quickly this was broken.

For a genetic algorithm to make beneficial mutations, one needs a fitness function. What would the fitness function be if the password either works or it doesn't?

The target fitness function is clever. Basically, it is "symmetric difference cardinality".

That is, the closer the bits in the hash result for the candidate are to the target hash result, the better it considers the candidate (in particular, cardinality of symmetric difference for a bitset is the count of the number of one bits in the symmetric difference. It then sets the fitness to mean lower is better, so the smaller the number of different bits, the better it considers the candidate)

This means it just tweak input key bits through mutation until the result comes out right. Now, in theory, there should be no correlation, so this should be no better than random search, but ...

This violates everything I know about hash functions, so it's amazing that it works. Has anyone tried this for industry standard hash functions?

An ideal hash function would have perfect avalanching; that is, changing one bit of the input would result in a 50% chance of each of the output bits being flipped. While modern cryptographic hash functions have at least really good avalanching, definitely enough to resist a genetic search, I'm not sure that it's perfect. Home-brewed crap like what Petya was using is unlikely to be very good at it at all.

The first random number generator I learned about was super simple. Middle squares. You take a number, multiply it by itself, and take the middle bits. Because of the way the binary multiplication algorithm works, the bits are pretty thoroughly scrambled and not correlated to the input.

I'm not sure if there aren't clever ways to undo it, but at least it would resist such a simple method like that.

Klondike has had a closer look as to why the genetic algo could work at all: https://klondike.es/klog/2016/04/12/cryptanalyzing-petya/#mo...

From a quick look at the code it looks like something derived from the key is compared with some value on the encrypted disc, i.e. a key verification check. Like you store passwords, you hash the password and compare it with the stored hash to check whether it is valid. You can then, at least theoretically, use the hamming distance as a fitness function and this seems to be what the code does.

  return int(bitset.From(c.qwords()).SymmetricDifferenceCardinality(target_bitset))
What really confuses me is how this can possibly work for a cryptographic function where any change in any one of the input bit is supposed to, on average, flip half of the output bits. But then again I am not curious enough to analyze the code in detail.

It looks like the implementors failed to follow the recommended usage and design (and indeed common sense) of the algorithm:

* They used 16 bit math instead of 32 bit math (with, I'm assuming, a 32 bit output size rather than the recommended 64 bit output). Which has the effect of looking like 10 rounds, but it's a rather more serious security failure.

* They generated keys in the alphanumeric range (instead of the full byte range) significantly reducing entropy.

All which seems to have weakened it substantially (understatement).

I read that on the GitHub page but I am still having a hard time imagining that those changes weaken the function enough to become attackable by a genetic algorithm. According to my intuition even relatively weak cryptographic functions should be pretty hard to tackle by genetic algorithms due to the complex structure of the search space with many extrema. Maybe I will have a closer look tomorrow, I am starting to get really curious.

I think the limited key space is a large part of the answer, 46 lopsided choices is easier to learn about than 256 equally distributed choices. The smaller word width probably makes the memory requirements reasonable.

I was wondering the same thing, I would naively have expected that genetic algorithms are absolutely useless for attacking cryptographic functions because the search space is full of local extrema. Maybe the used version is really a lot weaker. I would also like to know why the author chose this solution, it would probably have been one of the last things I would have considered. Maybe he observed something that pointed in that direction?

I don't really understand the matrix substitution used, but the fact that the genetic algorithm works like it does seems to suggest it's about as secure as a Vigenère cipher (working on bit strings). Unless the genetic solver doesn't provide any speedup and is just ignorantly used instead of a brute force approach. It's extremely unlikely that Salsa itself is vulnerable though.

Yeah, my first guess would be that the genetic solver is degenerating to basically a brute force search here (which is feasible in this weak instance). Will have to take a look to confirm...

It's probably a crap hand rolled key derivation function instead of a proper hash.

Off topic - this might be the single most uttered phrase in crypto reviews in the last year.


Looks like it used a [1-9a-xA-X]{8} key. Even with a CSPRNG that's only 46 bits of entropy. You can brute force that in 2-3 hours with adequate hardware.

Don't these things generate a unique key for each file?

User-space malware, sure. Petya does a full-disk encryption.

Even if they do, the per-file keys must be algorithmically-derived from the master key. (Otherwise, how can it unencrypt the files after you pay the ransom?)

What's the shelf-life of one of these ransomware fix utilities? I'd be inclined to assume that if they operate by exploiting a flaw in the ransomware, the ransomware author can simply "fix" the bug and the utility no longer works.

Here the crypto used by the ransomware seems to be thoroughly and systematically broken, so while it's certainly possible that a stronger variant will appear, I have my doubts on the author's capacity to do much "better" any time soon. I'm far from being a cryptographer, but at a glance it looks so weak I have to wonder if it's deliberate.

People who are cryptographers have a tough time writing secure implementations the first time around, is it really surprising that amateurs do?

Presumably they have enough financial resources at the moment, they don't have to do it themselves. They can pay someone to do it.

Most anyone able to charge market rates might be more motivated to reveal they've been contracted to do this than to actually do that.

Someone who has a hard time getting market rates might not be as qualified to fix the software as they think they are.

There are places in the world were good computer / science education doesn't necessarily mean being able to demand "market rates" comparable to the West. I can see some programmer n Russia or Romania being tempted by a side-deal like this. Especially if presented with a Western "market rate".

Many programmers in Russia and some eastern european countries probably wouldn't even see that as a "side deal"

I haven't been back there in many years now, and tried to give the benefit of the doubt ;-)

>at a glance it looks so weak I have to wonder if it's deliberate.

Any suppositions as to why it would be deliberate?

Maybe the person who posted the exploit/solution was the person who created the ransomware? That way they get BTC from people who pay and donations from people who wait?

/krazyconspiracy :-P

If I was an aspiring ransomware writer, I'd extract _a lot_ of useful ideas from this thread. :P

The targets for this malware are laypeople with enough funds to pay the fee but not enough to go after the creators. Hypothetically, if a national government or other agent with sufficiently deep pockets were infected they would first attempt to crack it, failing that they would go after the creators directly- they would not pay the fee.

So implementing weak crypto dramatically lowers the risk of prosecution or worse for the attackers.

Maybe not deliberate, but most likely optimizing encrypting speed over stronger encryption. Better for hit and runs I guess.

To add to the suppositions: perhaps the author was afraid they may get bitten by their own malware and designed in a weakness as a backdoor in case they lost access to the key server and needed to decrypt their own drive.

This is my favourite of all the answers, because it would mean that the dev would have known how incompetent s/he was, and thus made a weak system in order to account for the likelihood of a "self-inflicted wound".

Here's one...maybe the actual developer who was contracted to build the damned thing purposely used weak encryption to try to ameliorate his/her part in making this malware mess.

Well, I can't say that it's deliberate, but it's definitely some "interesting" choices.

A reasonable Salsa20/20 implementation on a modest processor encrypts on the order of 400MiB/s [0][1]. So that the author went to the trouble of halving the number of rounds and changing the word-size when there are perfectly good and plenty fast off-the-shelf Salsa implementations strikes me as rather strange. Surely this is not the bottleneck on most employee's machines.

And then there's the apparently tiny keyspace used. You could argue that this makes the key more "user-friendly" (!), but just a hex representation of a random 256bit key would seem much more natural, and the poor targets are hardly in a position to complain.

[0] https://www.cryptopp.com/benchmarks.html [1] https://cr.yp.to/snuffle.html

Actually customer service is very important to these operations. They have to create trust that paying will relieve your machine of their infliction.

Maybe they wanted to keep CPU usage low? Although disk usage would be noticeable as well (usually).

As a security reporter I perceived a large technical gap between the developer of warez and most users, who tend to pay a one time or ongoing license for use of some tool. While a developer of illicit tools may adjust in a future release, typically the user does not have capabilities to patch their own tooling.

Interesting assertion. What have you seen to make you think this?

Also, does it apply to the PHP/Perl world of malware, the kind that gets installed on cracked, old and weak WordPress sites? There's 20 variations of "Web Shell by oRb" floating around, for example, and many, many versions of Perl IRC bots, for example.

In many cases (don't know about this one) they come after the network has been shot down.

I'm curious as to also why they just don't go for full Salsa20 and do 32-bit words instead, seems like a logical quick fix upgrade that might break the decryption.

Maybe some machine they are targeting (hospitals?) only work with 16-bit words.

I have the strangest urge to build a tiny user space around the malware kernel that gets booted by the infected MBR... opens IDA

Careful, I don't know about you but anytime I have a thought that ends with "let me just open IDA real quick" nobody sees me for 2-3 weeks minimum.

I don't know what IDA is, but you get an upvote, because I have had this exact experience so many times... so many times.

If I could have a dollar for everytime I put my head to a random problem and came out 3 months later... I'd have more than fifty dollars.

IDA is the industry-standard binary disassembly tool. Expensive, so used over (dramatically) cheaper competitors like Hopper mainly by people whose company pays for a license :D

Or people who have ...disassembled the locked down original?

I got to play with some hackmes in a hackathon and my teammates got frustrated by the crippled IDA demo version that they just downloaded a pirate version.

Amusingly the readme for the pirate version said "I cracked it with IDA Pro" or something along those lines.

This is why Pandora's Key was a fruitless business endeavor

I'm very sad that Hopper has been discontinued on Windows, but radare2 has promise.

Coming soon™: GNU/Petya

It doesn't even allow installing non-Free software, FSF will love this.

I'm guessing the ransomware uses Salsa10 because it's fast?

According to https://www.cryptopp.com/benchmarks.html, Salta20 is 4x faster than AES - and if I understand it, Salsa10 is half the iterations.

Salsa20 / ChaCha20 are quite nice when you want to do byte-for-byte file encryption. It's a good choice in a lot of situations.

The unusually reduced number of rounds is more difficult to explain, since I expect hard drive I/O would be the bottleneck in almost any situation. Maybe it's just a little more "stealthy" to not have elevated CPU usage while the ransomware is incubating.

It's not Salsa10; it's just that the author is counting double-rounds, which makes it looks like it's half the iterations. This has as many rounds as Salsa20, but whoever converted this to 16-bit words didn't know what they were doing---the security degradation is enormous.

Even ChaCha8 (similar to Salsa8) so far is infeasible. The problem is that the randomware used a modified Salsa algorithm that uses 16-bit words instead of 32-bit words. This demolishes the algorithm, and is not a standard modification.

"Don't roll your own crypto" strikes again.

Are there any security researchers that can comment on this? I'm very confused about the efficacy of a genetic algorithm on a hashing function. Having read some of the comments, I'm only more confused (how can the symmetric difference be a useful fitness function?).

Anyone has a link to the malware sample?

Looks like someone got their hands on a sample to throw in theZoo. Also, live malware sample warning! BE VERY CAREFUL WITH THESE.



HN typical question:

Why Go?

Applications are open for YC Summer 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact