Hacker News new | past | comments | ask | show | jobs | submit login
Demystifying programs that create programs, part 1: A disassembler (briancallahan.net)
103 points by cyberlab 3 months ago | hide | past | favorite | 19 comments



> Both the D compiler and the GNU Modula-2 compiler are highly complex pieces of software.

That wasn't my intention, which was to make a simple compiler. Somehow, that got away from me :-/


If it's any solace: I find the language itself to be delightfully simple. It definitely feels like it's the minimum set of changes you could make to C to turn it into a comfortable and modern language.

I suppose if that comes at the cost of a complex compiler I can live with it.


Would you ever start over and try again? :)


The trouble is, we tried lots of ideas in D. Most worked out very well. But there are a few weeds which I'd really like to take behind the woodshed. But some users are wedded to them, and we really don't like breaking their store, which can be a bit frustrating.

It's sorta like designing your own house. I'm lucky enough to have done it. 20 years on, and it's mostly right. But there are a few maddening mistakes that would be terribly costly to fix.

For example, the garage has 8 feet of headroom. Dammit, I need 10 so I can put a hydraulic lift in. It's really hard working under the car without one. I also should have put in a geothermal heating system, which would have been really cheap when the property was all dug up. I could have saved $$$ in heating bills over the years!


What weeds in the language? I'm curious as I'm designing a language now, and my regrets are piling up around the complexity of the type system.


The `lazy` storage class turns out to be unreasonably hard to reason about.

`alias this` in class objects turned out to make no sense. When people have problems with it, I just sigh and say stop using `alias this` in classes.


Starting a series on programs that create programs by looking at a disassembler feels like starting a series on the people who make movies with a film critic: it's a bold move... let's see if it pays off.


Z80 is a good choice for this. About 20 years ago I wrote a JAVA disassembler of similar complexity. After a decade and a half more dev experience, about 5 years ago I tried to write an x86 disassembler, couldn't wrap my head around it, and eventually gave up.


An x86 disassembler isn't that hard (I wrote one in the 80's and still use it every day) it's just a boatload of special cases.

https://www.digitalmars.com/ctg/obj2asm.html


To explain in more detail:

The x86 machine code format is decently simple to decode. It has a couple of opcode maps (one-byte opcodes, another for a two-byte opcode, and two more for three-byte opcodes. Each opcode needs to note which register bank the r field encodes, which one the r/m field encodes, and which one maps to the first and second argument; and additionally if it uses the ModR/M byte in the first place and the size of the immediate it uses (if any).

There are two complications to this basic scheme. The first is that some prefixes are "mandatory prefixes" (66h/F2h/F3h being the most common) for some instructions as opposed to simple modifications. The second is that some instructions take a ModR/M byte but use the different values of that byte to distinguish instructions. For example, 0FAE esp, [mem] would actually be XSAVE [mem] while 0FAE ebp, [mem] would be XRESTOR [mem].

Personally, I've considered building a kind of "world's worst x86 decoder" where ignoring the latter considerations is actually a feature of the decoder.


The first is that some prefixes are "mandatory prefixes"

If you look at the structure of the encoding you'll see that the 66 prefix, which is traditionally known as the "operand size prefix", is quite sensibly used to select between different register widths, e.g. between the SSE/SSE2 wider (128-bit) registers as opposed to the 64-bit MMX ones for the same instruction.

https://www.sandpile.org/x86/opc_2.htm has a nice summary of this.


In my "world's worst x86 decoder" idea, the idea is to make the prefixes (save possibly the segment register prefix) part of the instruction opcode, also including the REX.W and VEX.L bits as well. A nice side effect of this definition is that it makes operands on differently-sized registers different opcodes entirely, which isn't necessarily a bad idea.


When you get to the edges of the ISA finding flaws in dissassemblers is common enough that you often need to use more than one for RE-ing.

Christopher Domas for example used 5 I think when he was fuzzing CPUs.


I found doing the new prefixes for 64-bit to be a pain--so much so, that I basically rewrote mine from scratch.


Both x86 and Z80 instruction sets are far simpler to decode when you split the fields in an octal (2-3-3) format:

http://www.z80.info/decoding.htm

https://gist.github.com/seanjensengrey/f971c20d05d4d0efc0781...

I recently had a look at ARM64. If you thought decoding x86 was complex... I was seriously expecting something simpler, being a RISC and such, but the instruction encoding is surprisingly complex. They are a fixed length but various fields are packed into each instruction in a pretty nonintuitive way.


Check out how the RISC-V jump offsets are encoded/packed.

   |   imm[20|10:1|11|19:12]   | rd | 1101111 |  JAL
I've heard that there's some reason for the bits being strewn about (something about how it enables some microoptimizations on tiny decoders by sharing muxes?), but I don't know the exact specifics.


It's because then the other fields, such as the source and destination registers can stay in the same place across various opcodes. In the rare cases there's an immediate field, such as the jump offset, it uses what would otherwise be a register or an operand.

Disassembling RISC-V is then quite straightforward because of this feature.


As someone who's written a lot of disassemblers, this feature is far more complicated to write a disassembler for than how the other RISCs handle it, by close to an order of magnitude when you account for the extra tests that you want too.

All of the bits being contiguous would just mean that you could just

  jump_offset = (((i32)opcode) & 0xFFFFF000) >> 12;
rather than this weird game of trying to piecemeal back all of the requisite components. Or, if the components are in order then you're given the choice; it doesn't have to be a mutually exclusive decision.

It's got to be saving some muxes on tiny cores by keeping the sign bit and bits 10:5 in the same place, but honestly even that seems like a micro optimization that doesn't have a lot of benefit at the end of the day. I wouldn't think that this single saved mux would have been in a critical path (and the sign bit would have been in the same place regardless).


Where is my program creator dammit? Flying cars and household robots missing too. Everything is just like 1969, your "program creator" is some House Negr^H^H^H Estonian sitting by a Teletype.




Applications are open for YC Winter 2022

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: