
Menuet – A pre-emptive, real-time and multiprocessor OS written in assembly - walterbell
http://www.menuetos.net
======
exikyut
_Almost_ GPL.

MenuetOS has both 64-bit and 32-bit variants.

The 32-bit version has a fully open-source kernel; the 64-bit version's kernel
is closed. Light on details, but with strong hints of drama, here:
[https://board.flatassembler.net/topic.php?t=16822](https://board.flatassembler.net/topic.php?t=16822)
(just found this to cite the 32/64 open/closed thing and TIL'd)

What IS interesting is that it uses flat assembler. fasm is kind of
interesting. Anybody looking to play with it on Linux may find this ( _very_ )
high-level/vague/rough idea of where to start helpful:
[https://news.ycombinator.com/item?id=15282073](https://news.ycombinator.com/item?id=15282073)

~~~
andrewflnr
The comment you linked says that nasm is "tied to gcc", "indirectly". What
kind of tie does it have? All I really know about it is that it uses a
different syntax than GCC's assembler.

~~~
exikyut
nasm cannot generate ELF binaries directly, only .o files, which must be
formally linked. So you have to run `ld` on the generated object file(s) to
produce a binary. (I've not (yet) tested LLVM's new linker or the linker
stages from other compilers.)

Some extremely anecdotal and unscientific testing I did a little while back
_seemed_ to show that fasm's "hello world" example is dramatically smaller
than nasm's. So I decided to be scientific and redo my test with comparable
programs.

fasm includes a hello-world program in the distribution:

    
    
      format ELF64 executable 3
      segment readable executable
      entry $
        mov edx, len
        lea rsi, [msg]
        mov edi, 1
        mov eax, 1
        syscall
        xor edi, edi
        mov eax, 60
        syscall
      segment readable writeable
        msg db 'Hello world!', 10
        len = $ - msg
    

Here I pretty much converted the fasm example to match nasm as closely as
possible (using various online references to get an idea of how it should look
for nasm):

    
    
      section .text
        global _start
        _start:
          mov edx, len
          mov rsi, msg
          mov eax, 1
          mov edi, 1
          syscall
          xor edi, edi
          mov eax, 60
          syscall
      section .data
        msg db  "hello, world!", 10
        len equ $ - msg
    

What isn't really possible with nasm is to remove the symbol table and _start
export. This makes a phenomenal difference in file size.

Here's nasm:

    
    
      $ ls -lh nasm64 nasm64.o
      -rwxr-xr-x 1 i336 users 512 Oct  9 10:10 nasm64
      -rw-r--r-- 1 i336 users 880 Oct  9 10:10 nasm64.o
    

Because fasm doesn't need to mess around with exports or symbols and can just
write binaries directly, it absolutely kills nasm:

    
    
      $ ls -lh hello64
      -rwxr-xr-x 1 i336 users 222 Oct  9 10:29 hello64
    

One of my major interests in assembly language is in having tight control over
output file size, so fasm is super interesting to me.

I _am_ very curious as to the file format differences though. Apparently nasm
generates SYSV-format ELF files, while fasm generates a Linux-specific
variant...?

    
    
      $ file nasm64
      nasm64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
    
      $ file hello64
      hello64: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, stripped
    

That's interesting. I'm genuinely curious what this means.

As far as the size differences are concerned,
[http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...](http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)
shows that to make smaller stuff than ld generates, you have to start hand-
composing the ELF file. So fasm is pretty much the only way to produce the
smallest ELFs with minimal fuss on Linux.
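For the curious, here's roughly what "hand-composing" the start of an ELF file involves. This is my own Python sketch (not the article's code); the field values follow the ELF64 layout, with no section headers at all, which is what a stripped tiny binary gets away with:

```python
import struct

def elf64_header(entry, osabi=3, phnum=1):
    """Pack a minimal 64-byte ELF64 file header: ET_EXEC, x86-64,
    program headers assumed to start right after this header, and
    no section header table."""
    # e_ident: magic, 64-bit class, little-endian, ELF version 1,
    # the OS ABI byte (3 = Linux), then padding up to 16 bytes.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, osabi]) + bytes(8)
    return struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,        # e_type: ET_EXEC
        0x3E,     # e_machine: EM_X86_64
        1,        # e_version
        entry,    # e_entry: virtual address of the first instruction
        64,       # e_phoff: program header table right after this
        0,        # e_shoff: no section header table
        0,        # e_flags
        64,       # e_ehsize
        56,       # e_phentsize: size of one ELF64 program header
        phnum,    # e_phnum
        0, 0, 0,  # e_shentsize, e_shnum, e_shstrndx: none
    )
```

You still need the program header(s) and the code itself after this, but it shows how little of the format is actually mandatory.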

~~~
chaosite
`file` isn't smart enough to read too much into the actual ELF format; all
it's doing is parsing part of the header.

What that difference means is that fasm put a 3 in byte 7 of the header, while
nasm put a 0 there.

And apparently putting a 0 there, no matter what the target OS ABI actually
is, is common practice.

[https://en.wikipedia.org/wiki/Executable_and_Linkable_Format...](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header)
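If you want to poke at that byte yourself, here's a quick Python sketch (the helper names are mine, just for illustration):

```python
def read_osabi(elf_bytes):
    """Return e_ident[EI_OSABI], i.e. byte 7 of the ELF header."""
    assert elf_bytes[:4] == b"\x7fELF", "not an ELF image"
    return elf_bytes[7]

def set_osabi(elf_bytes, osabi):
    """Return a copy of the ELF image with byte 7 replaced."""
    return elf_bytes[:7] + bytes([osabi]) + elf_bytes[8:]
```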

~~~
exikyut
Oh! I see!

Thanks so much for this little detail - you helped me figure out that in the

    
    
        format ELF64 executable 3
    

at the top, that _3_ is what gets put into the binary!

Change it to 0, and sure enough it shows as SYSV. That is cool.

...change it to 2 and it shows as NetBSD

...and 1 makes it show as HP-UX

...4 makes GNU/Hurd

O.o

5 is 86Open

6 is Solaris

WAT

7 is "Monterey"?!?!

(Yup, still executes just fine)

8 is IRIX

9 is FreeBSD

10 is Tru64

O.o

11 is Novell Modesto

12 is OpenBSD

13 and 14 don't display anything, and I presume that's because that's where
this madness ends.

That was extremely interesting...

~~~
xelxebar
I've hand-written a couple of ELF binaries just for kicks. The Wikipedia page
on ELF has a list of the current conventions for the 7th byte of the ELF
header (e_ident[EI_OSABI] in the spec) [1]:

    
    
        0x00	System V
        0x01	HP-UX
        0x02	NetBSD
        0x03	Linux
        0x04	GNU Hurd
        0x06	Solaris
        0x07	AIX
        0x08	IRIX
        0x09	FreeBSD
        0x0A	Tru64
        0x0B	Novell Modesto
        0x0C	OpenBSD
        0x0D	OpenVMS
        0x0E	NonStop Kernel
        0x0F	AROS
        0x10	Fenix OS
        0x11	CloudABI
        0x53	Sortix
    

I don't know enough about linker/loader details, though, to tell you how
different ones actually make use of this information. IIRC, the glibc loader
just checks that it's 0x0 or 0x3 and leaves it at that.

[1]
[https://en.wikipedia.org/wiki/Executable_and_Linkable_Format](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
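As a toy version of that table in Python (the glibc check at the end is as I recall it, unverified):

```python
# EI_OSABI values, mirroring the Wikipedia table above.
OSABI_NAMES = {
    0x00: "System V", 0x01: "HP-UX", 0x02: "NetBSD", 0x03: "Linux",
    0x04: "GNU Hurd", 0x06: "Solaris", 0x07: "AIX", 0x08: "IRIX",
    0x09: "FreeBSD", 0x0A: "Tru64", 0x0B: "Novell Modesto",
    0x0C: "OpenBSD", 0x0D: "OpenVMS", 0x0E: "NonStop Kernel",
    0x0F: "AROS", 0x10: "Fenix OS", 0x11: "CloudABI", 0x53: "Sortix",
}

def osabi_name(value):
    """Name for an e_ident[EI_OSABI] byte, or 'unknown'."""
    return OSABI_NAMES.get(value, "unknown")

def glibc_would_accept(value):
    # As I recall it (unverified): the glibc loader only checks
    # for System V (0x00) or Linux (0x03) here.
    return value in (0x00, 0x03)
```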

------
verri
Maybe interesting to post here: the comments of Steve Yegge (a GeoWorks
developer) on large-scale assembly projects:

[https://youtu.be/tz-Bb-D6teE?t=2161](https://youtu.be/tz-Bb-D6teE?t=2161)

His argument was that while assembly allowed for more optimized local
routines, they lost perspective on how the system should behave at large (the
handling of window-drawing routines, etc.)

~~~
GeorgeTirebiter
Folks might forget that before Unix, most OSs were written in assembly; the
obvious important one not written in asm was Multics (which used PL/I). I
worked on the TOPS-10 and TOPS-20 OSs, and they were pure assembly. People
seem to think that it's not possible to write maintainable, reliable code in
assembly; that's just not true. The DEC source was well-documented and
well-tested. Sure, it had bugs fixed each release, as every code base does.
Still, I was never flummoxed as to what a routine did: after all, it was
executing one machine instruction after another, and once you knew the
operands, you knew the result. Debugging was at the machine level,
instruction by instruction.

Building higher-level code was accomplished just like it's done today: from
smaller modules / subroutines. PUSHJ - Push PC and Jump, or 'Call'.

These days, writing software is almost magic, meaning there is a whole 'chain
of trust' that must be invoked -- all based on belief that it's all correct --
before proceeding. With assembly, magic stops at the hardware / software
boundary; that is the actual machine instructions.

~~~
pjmlp
Burroughs B5000 was written with zero lines of Assembly in 1961 with ESPOL,
followed by NEWP, both Algol 60 variants with intrinsics.

Xerox PARC used Mesa with microcoded CPUs, after using BCPL initially.

IBM used PL/8 for their RISC OS and compiler research.

OS/400 was written in PL/M.

There are plenty of other examples; UNIX was not the only one written in a
high-level language, regardless of what C devs advocate.

------
maxpert
I have followed this project for quite a long time, and have even used it once
or twice for fun (back in the floppy days). Makes me wonder why no one has
tried to write something similar for the Raspberry Pi. I mean, Linux is
sometimes too much; this could have spun off a whole genre of OSes for these
kinds of devices, for education as well.

~~~
userbinator
_why didn't anyone tried to write similar thing for Raspberry_

This is close:
[https://news.ycombinator.com/item?id=7926004](https://news.ycombinator.com/item?id=7926004)

But your question can probably be answered in one word: Broadcom. By keeping
the details of their SoC secret, they make it far harder for "from scratch"
projects to appear, despite it being a single platform. The PC, on the other
hand, while far more diverse, still relies plenty on documented, de-facto
standard interfaces, and there is no shortage of articles on things like "how
to write a boot sector".

~~~
feelin_googley
And yet, as far as I have seen, it is still much easier to run _a variety of
OS_ on an RPi than on a PC: simply swap SD cards. While PCs can usually boot
from USB, I have seen ones that cannot boot from an SD card.

(Bias disclosure: For many years I've been compiling my own kernels with an
embedded filesystem and minimal utilities and then running from removable
media, even on the PC, so the RPi was an easy transition. Further, I stopped
using disk or swap years ago, even on the PC, again making the RPi transition
painless. Finally, I am more interested in non-graphical tasks, so I can live
without the use of a GPU.)

The greatest value of the RPi, to me, is the ability to run _a variety of
OS_. One can readily find other boards that are better than the RPi in many
ways, but in each case running _the variety of OS_ that the RPi allows for is
not as _easy_, if it is even possible. Despite the disadvantage of the RPi's
closed bootloader and GPU, the flexibility to run _a variety of OS_ is a
great advantage over many alternatives, for me.

The way I think about it, user freedom in choosing an OS is actually
constrained by "too much choice" in hardware. I reason that this is because it
is difficult for OS projects other than Linux to add timely support for each
and every different item of hardware coming off the assembly line each year.
With the RPi, by contrast, the "moving target" of hardware support moves
less. This IMO opens up possibilities for more OS projects to add support for
RPi hardware.

Whether this is the reason a variety of OS are supported on the RPi, I do not
know. But I am glad that this variety exists.

------
hoodoof
This sort of thing no longer makes sense.

It used to, back in the days when every cycle counted.

Not today; what matters more is maintainability of the source code, because
compiled C is fast enough.

MenuetOS may be fast, but it also does less. For those who would compare it
to current OSes: it's not an apples-versus-apples comparison, no pun intended.

~~~
wyldfire
Maintainability is a high-ranking concern for any project, for sure. But for
the sake of the project's applicability and longevity, an even more important
reason to write it in C would be portability. It's an OS, after all.

Porting Menuet to another architecture would likely be a significant amount of
work. Whereas most OSs have very limited arch-specific portions.

------
ParrotyError
I suppose you can run it in Bochs or QEMU or one of those things if you want
to use it on non-x86 hardware.

------
raverbashing
At this point in the evolution of compilers, assembly does not make things
faster (in the sense of "running faster"; though it does in the sense that,
being harder to write, bloat is kept smaller).

~~~
davemp
> assembly does not make it faster

It is still very possible to hand-write assembly that's faster than optimized
C. The compiler can't just magically assume the same pre/post-conditions that
you can. Also, register allocation is an NP-complete problem (graph coloring).
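(For anyone unfamiliar with the graph-coloring framing: variables are nodes, an edge joins two variables that are live at the same time, and colors are registers. A minimal greedy sketch in Python, my own illustration; real allocators also handle spilling and coalescing:)

```python
def greedy_color(interference):
    """Assign each node (variable) the lowest color (register) not
    already used by an interfering neighbor. Greedy, so it may use
    more colors than the optimum -- finding the minimum is NP-hard."""
    colors = {}
    for node in interference:
        taken = {colors[n] for n in interference[node] if n in colors}
        color = 0
        while color in taken:
            color += 1
        colors[node] = color
    return colors
```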

Can a human keep that level of optimization up for the length of an OS? No
idea. Would it even be worth it? Probably not.

~~~
betterunix2
Actually, compilers are much better at register allocation than humans and
have been for many years; the fact that it is NP-complete is irrelevant,
because NP-completeness is just as difficult for humans to deal with. Other
than taking advantage of special instructions that are hard for compilers to
use, there are few cases where the performance of hand-written assembly beats
compiler-generated code. Compilers cannot assume pre/post-conditions, but they
can analyze much larger amounts of code than humans and will often detect
things that humans miss -- which constants can be propagated, which
expressions can be reduced, etc. Compilers are also a lot better at deciding
which of two equivalent instruction sequences will be faster on a given CPU
architecture.

~~~
gizmo
And yet the most efficient programs are written in the low level languages,
and the slowest programs are written in high level languages where
optimization is the responsibility of the compiler.

Compilers can definitely do better than humans in some situations, but you can
also use an optimizing assembler that helps with register allocation (not as
important for x64) and all sorts of peephole optimizations. The compiler
doesn't reason about the data, which means the compiler has to make worst-case
assumptions that the programmer knows won't occur. This is a big source of
inefficiency. Compilers will never be able to make strategic optimizations
because the compiler doesn't understand what it's compiling.

Like with chess, computer + human is the winning combination.

~~~
tyingq
_" Like with chess, computer + human is the winning combination."_

Curious what you mean here. My layman's observation is that computers have far
surpassed humans in that space.

~~~
Retric
Not starting from scratch. Modern chess programs leverage a lot of human
design and analysis work.

~~~
fourthark
Are there any domains in which computers start from scratch?

~~~
Retric
There are various abstract learning systems that work just fine for simple 2d
video games. The problem is they have a lot of overhead.

------
johnhattan
A bunch of years ago, there was a "Menuet" product that was a cross-platform
C++ UI library for Mac/Windows/Whatever.

Did they expand the underpinnings into a complete OS, or is this an unrelated
product?

------
FollowSteph3
Does it feel any faster than a modern OS not written in assembly?

~~~
lightedman
I remember it booting practically instantly from USB on a Pentium 4 (it
needed a little FDD hack to make it work, but it screamed).

------
mikebenfield
My tired brain initially thought it saw "GPU" in the acronym soup "GPL GUI PC
OS" and started to think "Weird, what must an OS running on a GPU be like?...
Oh. Nevermind."

------
kryptiskt
So what's the point of GPL if it's written in assembler?

~~~
setzer22
Assembly language can be reasonably structured, with comprehensive comments
and named subroutines, for example. There is some value in that versus what
you would get by disassembling a binary.

Depending on the GPL version, this would also ensure this OS doesn't run on
restrictive platforms like iOS.

~~~
colejohnson66
That’s not the GPL, that’s just the usage of assembly. As a side note, I
remember years ago (around iOS 5 or earlier IIRC), there was a version of
Bochs compiled for iOS and available on Cydia. It came with an IMG of an
installed Windows 95 box and was _very_ slow, but it was “running”[0] x86 on
iOS.

[0] Quotes around “running” because Bochs is less an emulator than an
interpreter. It's still an emulator, but it runs by interpreting each
instruction instead of doing binary translation like “modern” emulators.

------
winter_blue
I think our goal should be to develop _even better_ programming languages that
can express high-level concepts succinctly, and that provide _even larger_
sets of compile-time guarantees (e.g. static typing).

I don't see the benefit in writing anything in assembly language that doesn't
need to be super-optimized. Compilers are so good at optimizing right now
that assembly is only justified for very small pieces of code that, for
example, need to use special new CPU instructions (e.g. in video decoding),
or for particular segments of code where a profiler shows the compiler can't
optimize well enough.

Assembly language is hard to write, read, understand, and reason about. It
shares the same memory model as C, with all its flaws. There is nothing
impressive about this. It's like someone saying they built a house with their
bare hands, without using any machines or technology -- machines and modern
technology could have helped them build a house _faster_, and of _better
quality_.

~~~
pjmlp
Actually, assembly usually does what I mean, instead of the rug-pulling UB
tricks of C optimizers.

~~~
sigjuice
Your hardware might still pull the rug from under you with caches,
prefetching, NUMA, out-of-order execution, speculative execution and other
stuff.

~~~
badsectoracula
These will only make the program run slower; they won't make it produce wrong
results, create heisenbugs, or prevent it from running at all.

