Just had a read of this to see what it did... And I must admit, I don't understa...

rssoconnor · on Oct 29, 2023

The program does also dispose of comment lines.

One could argue that this is just a kind of trick so they can say the next "binary" is actually a "source" file because it happens to be written by a human in ASCII.

Still the phase distinction between what is a source and what is a binary becomes blurry at this low level. I believe the next stage of compiling is to, writing in ASCII represented machine code with comments, to allow for the existence of labels and then compute offsets for jumps to labels. And then more and more features are added until you have a minimal assembler letting you write somewhat machine independent code, and then continuing to work you way up the toolchain.

So at which point does the translation from "source" to "binary" become a real thing and not just a trick of semantics? Is it when we have a machine independent assembly code? Is it when we computed offsets for labelled jumps? It is when we started stripping comments out of the source code?

ralferoo · on Oct 29, 2023

Yeah, I kind of agree, but my issue is kind of with this statement (in the link from the peer post):

> What if we could bootstrap our entire system from only this one hex0 assembler binary seed? We would only ever need to inspect these 500 bytes of computer codes. Every later program is written in a more friendly programming language: Assembly, C, … Scheme.

And my issue is that this isn't true. hex1 isn't written in assembler any more than hex0 is. Both of those bootstrap files can get onto the system simply by ignoring whitespace and anything after #, converting the hex into binary and writing it to a file.

Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source and both have the same problem of needing to prove the hex in the source matches the assembly (and that the program even does what the comments claim).

hex1 is a more useful bootstrap point, because you can use standard system tools to create the binary from the source (e.g. sed) and also compile itself and verify that the files are the same.

Having hex0 and hex1 just means you need to manually verify both rather than just hex1.

I guess my point is that if you have insufficient trust in your system that you can't e.g. trust "sed" to create the original binary files, or trust the output of "dd -x" or "md5sum" to verify the binary files, you also can't trust it enough to verify that the hex in those source files is correct or that the binary files match.

lmm · on Oct 29, 2023

> Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source

Well presumably you toggle hex0 in on the front panel and then type hex1 with the keyboard, which is easier than toggling in the binary of hex1.

reactordev · on Oct 29, 2023

It’s the first stage. Likely piped. Hence the hex out. The context on how it’s called is key: https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304b...

autumn-antlers · on Oct 29, 2023

Section 1.6.1 of the GNU Mes manual places these early stages assemblers into context:

https://www.gnu.org/software/mes/manual/mes.html#Stage0