Editing binaries: easier than it sounds (danluu.com)
106 points by platz on Mar 24, 2014 | 46 comments

Tangentially related:

>But they couldn’t push the old version into the auto-updater because the client only accepts updates from higher numbered versions

This is one of the reasons why I always make my auto-updating routines as dumb as possible: the only thing the client does is tell the server its version and then let the server decide. If the server believes there's an update, it sends the (signed) update. If it doesn't, it 404s.

This way the server (easily modifiable) has full control, and there's less chance for the client (hard to modify, and it has to work for any kind of update to work) to end up in a non-updatable state.
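A minimal sketch of that split, assuming a hypothetical in-memory release table and hypothetical `decide_update` / `client_check` helpers. None of these names come from a real updater, and verifying the signature on the client is left out:

```python
from typing import Optional, Tuple

# Hypothetical release table kept on the server: version the client reports
# -> (version to send, signed payload). Because the mapping lives on the
# server, it can point "forward" or "back" without touching the client.
UPDATES = {
    "1.0": ("1.1", b"<signed 1.1 payload>"),
    "1.2": ("1.1", b"<signed 1.1 payload>"),  # roll a bad 1.2 back to 1.1
}


def decide_update(client_version: str) -> Tuple[int, Optional[bytes]]:
    """Return an HTTP-style (status, body) pair for a client check-in."""
    entry = UPDATES.get(client_version)
    if entry is None:
        return 404, None  # nothing to offer: the client keeps what it has
    _target_version, payload = entry
    return 200, payload


def client_check(current_version: str) -> Optional[bytes]:
    """The whole client-side logic: report a version, take what comes back."""
    status, body = decide_update(current_version)  # stands in for an HTTP call
    return body if status == 200 else None
```

Since the client never compares version numbers itself, shipping an older build to clients on a bad release is just one more row in the server's table.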

Putting as much logic as possible into the server component is definitely the way to go.

I did something similar: I would send the version as well as the md5sum of the executable.

Aside from being able to tell the 32 and 64-bit versions apart, the other advantage was that I'd be able to track whether they were running an executable with the md5sum I shipped, or if it had been hacked. Add that to an ip geolocation server, and now I've got a map of who is using a legal version and who is using an illegal version. I even plotted the success and spread of different cracks over time.. in our case, the Russian cracks usually lagged the Chinese cracks, but then had a better distribution network. Our auto-update function would check once a day on startup, based on the same version string and md5sum of the last check. So, we could see the person creating the hack.. same IP address, same version string, multiple md5sums, and their final md5sum would spread.
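Roughly, the server-side bookkeeping could look like the sketch below. `hashlib.md5` is the standard library call; the `SHIPPED_MD5` table, the build names and both helper functions are made up for illustration, and the geolocation step is omitted:

```python
import hashlib
from collections import defaultdict

# Hypothetical table of hashes for the builds that were actually shipped.
SHIPPED_MD5 = {
    ("2.0", hashlib.md5(b"official 2.0 32-bit build").hexdigest()): "32-bit",
    ("2.0", hashlib.md5(b"official 2.0 64-bit build").hexdigest()): "64-bit",
}


def classify(version: str, md5sum: str) -> str:
    """Label a check-in as one of the shipped builds or as a modified one."""
    return SHIPPED_MD5.get((version, md5sum), "modified")


def likely_crack_authors(checkins):
    """Find IPs that reported several different modified hashes for one version.

    `checkins` is an iterable of (ip, version, md5sum) tuples.
    """
    seen = defaultdict(set)
    for ip, version, md5sum in checkins:
        if classify(version, md5sum) == "modified":
            seen[(ip, version)].add(md5sum)
    return sorted(ip for (ip, _version), sums in seen.items() if len(sums) > 1)
```

An IP that cycles through several unknown hashes for the same version string matches the "person creating the hack" pattern; an IP that only ever reports the final hash is just someone who downloaded it.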

I also added the demo status and a unique machine id, and that let us answer questions like "Do paying customers ever install a hacked version?", "Do people who start with the hacked version ever pay?", "Into what languages should I translate my software", "How many demo users are in our funnel", "How long does it take a demo user to convert to paid", etc.


For those of us who are curious, could you please tell us what answers you found to those questions?

Also, I'd love to see that crack spread map, or hear more about it :).

To answer the questions, at least in our particular case:

- Do paying customers ever install a hacked version?

Yes, this happened fairly often.. less than 1% of the time, but still often enough for us to take it into consideration.

There were some silly suggestions about what to do when we detected it was a cracked version. If there is _any_ chance that it would occur on a paying customer's computer, you can dismiss the silly suggestions pretty quickly. In the end we decided to just save out files which were unreadable by end users, but that we could recover the data from. Our customers had to contact us, and then we could recover the data and sort out their licensing issues at the same time. We didn't have any issue with them using a crack to get their job done, we just wanted to make sure their experience was good.. sometimes the cracks broke functionality.

Non-paying users could become customers and get their data, but this didn't really happen very often.. it was enjoyable to read their reactions on the warez forums though.

- Do people who start with the hacked version ever pay?

Um, not really, as far as I could tell. IIRC it was 1 or 2 out of tens of thousands.

You have to be careful not to spend too much time on this stuff.. if you aren't converting cracked users to paid users, then the time is better spent just improving the product.

- Into what languages should I translate my software?

In our case it was Italian, Korean, Japanese, Spanish, German, and French. We had a lot of Russian and Chinese users, but almost none of them paid, so it's not worth paying for the translation.

- How long does it take a demo user to convert to paid

We switched from a time-limited demo (30 days) to a usage-limited demo (25 saves). After using up your saves you could still use the product to learn it, but just not save or export your work, and if you needed a time-limited demo or any other kind of trial extension we would provide that for the asking. We did this because we thought that our target users were very busy, and giving them one shot to try out the product over 30 days was unreasonable.. it was more reasonable that they would have a few hours here and there to learn it and evaluate it.

Anyhow, the data on conversion time did bear out the decision.. we had users who converted from demo to paid the same day, after 1 week, after 6 months, etc.


In general, our approach was to encourage people to pay if they're the type of people that pay for software, and not worry too much about people that don't pay for software.

Unfortunately I don't have the actual heatmap of legit / cracked users. I do have a copy of the country by country breakdown, but I'm not sure what date range it was for.

I do remember that the cracks would come out only a few hours after we released our software.

For these numbers, legit was a combination of currently in demo, and fully paid. Anything else was a cracked version. From most total users to least, here are the top few.

US: 78% legit

China: 17% legit

Korea: 37% legit

Italy: 64% legit

UK: 75% legit

Russian Federation: 18% legit

Germany: 61% legit

Japan: 80% legit

Spain: 71% legit

Turkey: 22% legit

Sweden: 83% legit

Taiwan: 37% legit

Ukraine: 27% legit

Poland: 73% legit

France: 64% legit

Czech Republic: 51% legit

Hong Kong: 17% legit

Canada: 78% legit

Most of this isn't too unexpected. We did have an amazing Polish trainer/reseller.

Here's some more data. The number of users was quite small in most of these countries, so it's not a good sample.

Countries with 100% legit copies: Switzerland, UAE, Peru, Vietnam, Mauritius, Malta, Morocco, Latvia, Venezuela.

Countries with 0% legit copies: Asia/Pacific Region, Bangladesh, Chile, Costa Rica, Estonia, Kenya, Singapore, Armenia, Kazakhstan, Moldova, Syrian Arab Republic, Pakistan, Georgia, Lithuania, Indonesia

I think I've got the raw data sitting around somewhere, maybe it would be nice to put it into a blog post some time. I'd have to get permission though.

So, do paying customers ever install a hacked version? Do people who start with the hacked version ever pay?

The short answers are "yes, sometimes" and "hardly ever". Long answer is in the other message.

At a previous job we had a pretty serious off-by-one error in a crypto module. This module was distributed as a binary DLL to dozens of customers. For various reasons it would have been a costly and inconvenient project to rebuild it from source and redistribute it. So I located the bug and realized it could be fixed by just increasing a constant in the binary.

The best part is that the relevant bit pattern turned out to be legible in ASCII. So this serious bug could be fixed by searching for something like ".," and replacing it with ".~". I taught the devops guys to do this in Notepad++. The DLL could even be reloaded without restarting the server. Problem solved!
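For anyone who would rather script this than trust a text editor, here is a small sketch of the same kind of edit. The surrounding bytes are invented stand-ins, not the real DLL, and `patch_unique` is a hypothetical helper; the one thing it adds over find-and-replace is refusing to patch unless the pattern occurs exactly once:

```python
def patch_unique(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new`, refusing to guess if the match is ambiguous."""
    if len(old) != len(new):
        raise ValueError("patch must not change the file length")
    count = data.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return data.replace(old, new)


# Stand-in bytes, not the real DLL: ',' is 0x2C and '~' is 0x7E,
# so this raises a one-byte constant from 44 to 126.
image = b"\x55\x8b\xec\x83\xf9.,\x7c\x05\xc3"
patched = patch_unique(image, b".,", b".~")
```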

When I left the company and wrote a little report about this, the architect patted me on the back and sighed deeply. With his customary tone of affection and good cheer, he said "mbrock, you are a horrible, horrible person."

I don't understand the architect's response. Was he being sarcastic?

I think he appreciated my willingness to sometimes be a horrible, horrible person for the sake of the business. :)

In my youth, I learned to recognize conditional JMP instructions by their opcodes and would bypass copy protection and nag screens all the time with CD 90, or explicit JMPs depending on how the code executed. I probably spent more time working on the crack than I ever did playing the games I'd patch, but I have no doubt that the practice is what inspired me to become a developer. It was so satisfying when I understood how a program worked at such a low level and I could get into the heads of the creators of those applications and games.

Chinpokomon, if I didn't know you weren't me, I would've sworn I wrote these lines.

Like a program I wasn't even interested in. Once done, wouldn't even use it.

The years when I was 14-15 were spent looking at disassembly listings (I used Win32DASM), getting drowned in patching versions (HexDecChar Editor), changing things. Sometimes not keeping track: wait, what did I change?

Learning to organize, even on paper. Offsets.

Then writing small programs in Assembly, Pascal and C to do that.

EB, 74, 75, 76, 90, etc. had a special meaning to me. Test calls, bogus functions and bloated DLLs getting called from a gazillion places (some of these on purpose).

This has been invaluable to me. When I went to college, some courses had people struggling with debugging a program, but as you said, when you've been that close to the machine without getting lost, following an index is a lot easier.

Also, undeniably, when we had the uP architecture course, I already knew that material from 5 years before, so it was just a refresher (bear in mind that they didn't even scratch the surface of what I was doing on my own as a kid). The most they toyed with interrupts was printing (09h/21h), whilst I played with TSRs and some neat things.

It was also very useful in the microcontroller class. PIC micros had only 35 instructions (didn't like MikroC).

A lot of what I did as a kid, or a bit later, turned out to come in handy, always, at a later time..

So it became a sort of rule .. Whenever I'm doing something, I trust my gut instinct that some time, I'll use it. I'm yet to be proven wrong.

Thanks for sharing, man.

I remember one of the more esoteric applications of binary patching I had to do was to get a bunch of DOS utilities available for an older version running on a newer version, since they had somehow decided to hardcode a version check in them. I used a text editor in binary mode to do it, entering the bytes using Alt+nnn on the keypad.

Thanks to that, the sequence 180 48 205 33 is now embedded in my memory, as is the number 144, and I've acquired the ability to literally read x86 machine code in ASCII. That's a skill that is probably not much more useful than memorising digits of pi, but it's fun to see the reactions of experienced developers when I open some executable in Notepad and start mentally disassembling it.
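For readers who don't think in decimal opcodes, a few lines of Python show what those numbers are. The annotation in the comments covers only this one sequence; it is not a disassembler:

```python
# The decimal values typed with Alt+nnn, as quoted in the comment.
typed = [180, 48, 205, 33]
code = bytes(typed)

hex_view = code.hex(" ").upper()  # "B4 30 CD 21"

# Annotation for this one sequence only (not a general disassembler):
#   B4 30   mov ah, 30h   ; DOS function 30h = "get DOS version"
#   CD 21   int 21h       ; call DOS
# and 144 is 0x90, the single-byte NOP.
nop = bytes([144])
```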

I've bypassed a few protections too, but probably have spent more time on tweaking various software to my liking. Correcting spelling errors in messages, adjusting UI elements, removing useless messageboxes ("Operation complete! Click OK to continue."), etc.

That skill is actually very useful if you are researching/developing format string vulnerability exploits.

There was a presentation I saw where a guy was able to extract a private RSA key from the memory of a Novell login screen by typing in machine code using Alt key combos in the username or password text field, exploiting a format string vulnerability.

This is the video of that presentation for the interested: http://www.youtube.com/watch?v=jv0adeL4x1U

Here's a more recent version of that same presentation where you can actually read the screen https://www.youtube.com/watch?v=9b_ZWJec95w

Frankly, I know a little x86 asm and patch stuff if needed.

For example, I once patched the setup of NT 3.1 to work with an Aztech IDE CD driver: http://www.betaarchive.com/forum/viewtopic.php?t=28570

(for those who can't be bothered to click the link, it was a conditional jump to unconditional jump. I always prefer to patch a conditional jump to an unconditional jump, or NOP it out, depending on the code flow, as that way it would work in all cases)
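As a rough illustration of the two options (force the jump, or NOP it out), here is a sketch that works on short, two-byte conditional jumps only. The code fragment is made up and the helper names are hypothetical; near jumps with the 0F 8x encoding would need different handling:

```python
JZ_SHORT, JNZ_SHORT, JMP_SHORT, NOP = 0x74, 0x75, 0xEB, 0x90


def force_jump(code: bytearray, offset: int) -> None:
    """Turn a short JZ/JNZ at `offset` into an unconditional short JMP."""
    if code[offset] not in (JZ_SHORT, JNZ_SHORT):
        raise ValueError("no short conditional jump at this offset")
    code[offset] = JMP_SHORT  # the 8-bit displacement byte stays as it is


def never_jump(code: bytearray, offset: int) -> None:
    """NOP out a two-byte short conditional jump so execution falls through."""
    if code[offset] not in (JZ_SHORT, JNZ_SHORT):
        raise ValueError("no short conditional jump at this offset")
    code[offset:offset + 2] = bytes([NOP, NOP])


# Made-up fragment: test eax, eax / jz +5
check = bytearray(b"\x85\xc0\x74\x05")
```

Which of the two is right depends on whether the "good" path is the jump target or the fall-through, which is the "depending on the code flow" part.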

..and in some other related subject, I know a little z80-gb asm too, and I've been playing with that Pokémon arbitrary code execution.. destroying a save file in 23 bytes is fun. But then again, so is beating the game in 12 bytes: something that I ported across to every language and version of Red/Blue/Yellow for the lulz.

Me too. We created a group for competing at breaking protections on commercial programs.

It was more than patching things with NOPs(*): we were correcting checksums and so on.

We never released the programs once the protection was removed. It was fun.

(*) For people who are not into this: just replacing opcodes alone does not work, as the OS checks that the executable has not been modified, and the program could check that itself too, in very sophisticated ways.

Just because I'm interested.. what was the name of your group?

The cracking was the game. The actual game starting was just the victory cut-scene.

There are some binary edits which are just plain trivial: Editing data which is embedded into binaries. As an 11 year old, I was annoyed that Civilization had "Americans" but not "Canadians", so I pulled out a hex editor and corrected that omission...

Of course that's only trivial if the strings are the same length - in my childhood I broke some programs that way.

Ah, the memories. I had to reinstall DOS three times on my childhood computer because of that. I later learned that editing binary files in a text editor wasn't a very bright idea...

In the case of Civilization, the strings were padded out to a constant length.
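A sketch of why padded, fixed-width fields make this safe. The 12-byte NUL-padded layout below is invented for the example (I don't know the real Civilization layout), and `replace_padded` is a hypothetical helper:

```python
def replace_padded(data: bytes, old: bytes, new: bytes, width: int,
                   pad: bytes = b"\x00") -> bytes:
    """Swap one fixed-width string field for another without moving any bytes."""
    if len(new) > width:
        raise ValueError("replacement does not fit in the field")
    old_field = old.ljust(width, pad)
    if data.count(old_field) != 1:
        raise ValueError("field not found exactly once")
    return data.replace(old_field, new.ljust(width, pad))


# Made-up layout: 12-byte, NUL-padded name fields back to back.
table = b"Americans\x00\x00\x00Aztecs\x00\x00\x00\x00\x00\x00"
edited = replace_padded(table, b"Americans", b"Canadians", width=12)
```

The file length and every offset after the field stay the same, which is exactly what breaks when a text editor inserts or deletes a byte.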


Raymond Chen has some interesting articles on patching and hot patching Windows. Among others:

* http://blogs.msdn.com/b/oldnewthing/archive/2011/09/21/10214...

* http://blogs.msdn.com/b/oldnewthing/archive/2013/01/02/10381...

This one most closely matches the spirit of the article:

* http://blogs.msdn.com/b/oldnewthing/archive/2012/11/13/10367...

The Microsoft Money one is actually linked from Dan Luu's article. Interesting reads, all of them.

This is worth it just for the link to radare: http://www.radare.org/y/?p=features

I don't actually program a whole lot and when I do it's usually part of a reversing effort on something obsolete or buggy or that I can't figure out how it works (not warez stuff, though). Assembler is enormous fun to dig around in if you have time on your hands and enjoy puzzle-solving. Another good free tool if you're into this sort of thing is OllyDbg: http://www.ollydbg.de/

You can do pretty radical modifications of code too if you're willing to move it somewhere else to get the room to enlarge things. It turns out that almost all x86 instructions are position independent; there's only a few that aren't. Then you jump from the original location to the patch, run the patch, and jump back.
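The two jumps are the position-dependent part, since a near JMP stores a displacement rather than an absolute address. A small sketch of the arithmetic, with made-up addresses and a hypothetical `jmp_rel32` helper (this is not how Dyninst does it, just the encoding):

```python
import struct

JMP_REL32 = 0xE9  # x86 near jump: opcode byte + signed 32-bit displacement


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `jmp dst` as it must appear at address `src`.

    The displacement is relative to the end of the 5-byte instruction,
    which is exactly why such an instruction cannot be copied elsewhere
    byte for byte.
    """
    return bytes([JMP_REL32]) + struct.pack("<i", dst - (src + 5))


# Hypothetical addresses: detour from the original site to a patch area,
# then jump back to the instruction after the 5 overwritten bytes.
site, cave = 0x401000, 0x405000
detour = jmp_rel32(site, cave)
back = jmp_rel32(cave + 16, site + 5)  # assumes the patch body is 16 bytes
```

The 5 bytes overwritten at the original site have to be whole instructions, and they need to be re-created inside the patch (with any relative operands adjusted) before jumping back.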

Gratuitous plug - that was my doctoral research (among other things, hot-patching in safety checks on a running Apache binary), and I wouldn't mind seeing it get used for a good purpose. The code's available at www.dyninst.org.

I used to do something similar to this (though less sophisticated) every time a new Chrome update came out, changing a single byte in the binary so that I could restore the http:// at the front of URLs.

I considered making a website to publish the proper offset to change for each version, but I got complacent after a while.

How did you figure out which byte to change?

I haven't done it in a year or two, so it took me a minute to figure out the basics again.

The short version is that I crawled through the source of Chromium for a while until I found the flag that controls it [0].

Then, since FormatUrlType was a uint32, and I assumed the storage of constants would be close together, I did a little trial and error searching through the binary in Hex Fiend until I found the value for kFormatUrlOmitAll. Then I would change this value from a 7 to a 5, which would remove the kFormatUrlOmitHTTP flag (or sometimes to a 1, to see if I liked trailing slashes on bare hostnames).
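To spell out the arithmetic: 7 is three flag bits set, and 5 is the same value with bit 1 cleared. In the sketch below the bit positions are inferred from the 7 -> 5 and 7 -> 1 edits; the name for bit 0, the constant table and the `patch_u32` helper are all assumptions for illustration, not Chromium code:

```python
import struct

OMIT_BIT0 = 1 << 0            # assumed: some other omission flag
OMIT_HTTP = 1 << 1            # the flag being removed
OMIT_TRAILING_SLASH = 1 << 2  # trailing slash on bare hostnames
OMIT_ALL = OMIT_BIT0 | OMIT_HTTP | OMIT_TRAILING_SLASH  # 7

show_http = OMIT_ALL & ~OMIT_HTTP                   # 5
show_http_and_slash = show_http & ~OMIT_TRAILING_SLASH  # 1


def patch_u32(data: bytearray, offset: int, old: int, new: int) -> None:
    """Overwrite a little-endian uint32 at `offset`, checking the old value."""
    (current,) = struct.unpack_from("<I", data, offset)
    if current != old:
        raise ValueError(f"expected {old} at offset {offset}, found {current}")
    struct.pack_into("<I", data, offset, new)


# Made-up constant table standing in for the region found in the hex editor.
blob = bytearray(struct.pack("<IIII", 1, 2, 4, 7))
patch_u32(blob, 12, OMIT_ALL, show_http)
```

Checking the old value before writing is cheap insurance when the offset moves with every release.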

Of course, since Chrome autoupdates, I had to do this every few times I restarted the browser, until I just got too lazy. :) I can't seem to find the offset this time, though, so I very well might be missing a step!

[0] https://code.google.com/p/chromium/codesearch#chromium/src/n...

[1] https://code.google.com/p/chromium/codesearch#chromium/src/n...

I'm so glad Firefox still allows this as an option. I guess it's irrational, but it drives me absolutely nuts to see bare URLs without protocols.

I just used IDA for this today (I patched the Bochs bios to skip the 3-second F12 delay on boot).

There's a handy "Patch program" menu under Edit. It allows you to replace bytes, assemble code, and apply your changes back to the binary.

Wow, what a great idea. Too bad my BIOS is basically made by North Korea and isn't hackable.

»If it had occurred to anyone to edit the binary to increment the version number, they could have pushed out a good update in a minute instead of half an hour«

Or longer because now you have to patch the signature as well (as unsigned binaries probably won't be distributed by the updater).

This is cool, and reminds me of a disaster I once wrought upon myself:

Maybe 6 years ago I used TextWrangler to find and replace against a folder. It was only a few characters being changed but there were loads of instances of them. In my PHP code it was no biggie but I accidentally overwrote parts of a Flash FLA (hundreds of instances in the binary) and made a massive piece of a project for my biggest client at the time worthless.

Totally avoidable, lesson learned!

...revision control? Backups?

I think a few of us probably have incidents like this that inspired us to use source control and/or better backup solutions.

Like I said, lesson learned.

The problem is that you now have two versions that are supposed to be identical but aren't. I would not like to be part of a team that does binary edits, unless there is no other way.

Yeah, I think it may be short-sighted to shortcut a build+release process just to save a few hours. I can only imagine what might go wrong or what I might miss. Someone else mentioned code signatures, which is a good point.

Assuming they just compiled the old version with a bumped version number, that's exactly what they ended up with. If you compile the same source twice you get the exact same binary.

> If you compile the same source twice you get the exact same binary.

You wish this were true. You really, really wish it were. But it's not always. Reproducibility is a really tough problem when you support 25+ Linuxes, each with their own versions of GCC and libc, different compiler flags by default, etc. There's some work being done to make this "better" like GNU ld's build-id hashing binary sections so it's easier to tell if they changed, and projects that stash the compiler's flags in a binary section with -frecord-gcc-switches/-grecord-gcc-switches. But we've still got a long way to go, at least in the wonderfully fragmented landscape of Linux software development.

Furthermore, the example in the article is rather specious. Versioning is very often done by macros and string replacement, not by some constant sitting in a .section where you can just say "lemme tweak this" and push a new release. No, you'd have to go through the entire object file to find every place that version number is used.

It's not the exceptional case in real world software with multiple supported releases that you see code like:

    if (version_number == 2 || version_number == 3) {
      do_this_thing();
    } else if (version_number == 4) {
      do_that_thing_instead();
    }

or its equivalent with #ifdefs, which the compiler can then fold away in preprocessing or dead code elimination, leaving you with the right chunk for the product version. Hex editing the binary won't catch this, leaving you with a broken binary.

tl;dr: rebuild the software if you screwed up.

The title reminded me to investigate the Emacs core package bindat (bidirectional pack/unpack of binary layouts) for generating disassemblers/patchers.


This is always an interesting subject. I enjoyed reading this similar (more practical) application of binary editing by Russ Cox to patch the Mac OS X kernel: http://research.swtch.com/macpprof

While replacing some text and editing some assembly code in a small application is kind of easy and fun, changing some logic in a larger application will require a way better understanding of the code. Basically if you're unable to attach a debugger to it, you'll have a really hard time.

Not to forget that changing binaries in many cases violates the EULA of the application. Sure, nobody will notice or complain, but it's something to keep in mind.

nop and jmp were my favorite instructions to break key auths. I miss doing that.
