> On a set of kernel-intensive benchmarks (including NGINX and Redis) the fraction of kernel CPU time Biscuit spends on HLL features (primarily garbage collection and thread stack expansion checks) ranges up to 13%. The longest single GC-related pause suffered by NGINX was 115 microseconds; the longest observed sum of GC delays to a complete NGINX client request was 600 microseconds. In experiments comparing nearly identical system call, page fault, and context switch code paths written in Go and C, the Go version was 5% to 15% slower.
A 10% slowdown in return for memory safety could be a worthwhile tradeoff in some cases. And GC pauses were barely an issue (under 1 ms in the worst case measured).
If the kernel was using only 1% of the CPU time before, and you rewrite it in QBASIC for a 5x slowdown, your application only sees a ~4% slowdown (1% × 5 = 5%, i.e. 4 extra points of overhead)...
I beg to differ.
Lisp has been influential since the 60s, especially in certain niches (AI, symbolic algebra, etc.) and is still going strong.
Smalltalk was used throughout the 70s (although first "released" in 1980), and is far more "dynamic" than the scripting languages widely used today (e.g. Python); the classic example is 'ifTrue:' being a method call which we can override (via a built-in live editor, no less!).
Prolog has been around for a while, and saw a lot of attention in the 1980s. It's much more high-level than most languages, since its interpreter runs a search algorithm to calculate results, rather than dumbly stepping through a single execution path.
Most of the cool features popping up in "modern" languages, like pattern-matching, currying, (Hindley-Milner) type inference, etc. came from ML in the 70s.
Scheme is also from the 70s, and has features like tail-call elimination which many languages/platforms are still lacking (spawning a whole sub-genre of blog posts about how to implement it in language X via trampolines). Scheme also brought attention to continuations, giving us things like coroutines. The async/await and yield features added to many languages recently are close cousins of Scheme's call/cc; as are try/catch/throw, for that matter! Although call/cc is undelimited, delimited continuations have been around since the 80s.
That's just off the top of my head, wearing my Programming Language Theory hat. If I put on my Hacker hat I could mention Sh, Snobol, Awk, Icon, ABC (Python's predecessor), etc.
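(Since trampolines came up: here is a minimal sketch of the trick in Go; step, trampoline and sumTo are made-up names for illustration. A tail call returns a closure instead of recursing, and a flat loop drives the chain.)

    package main

    import "fmt"

    // step is either a final result or a continuation to run next.
    type step struct {
        done bool
        val  int
        next func() step
    }

    // trampoline drives the chain iteratively, so the logically
    // tail-recursive computation uses constant stack space.
    func trampoline(s step) int {
        for !s.done {
            s = s.next()
        }
        return s.val
    }

    // sumTo expresses sum(1..n) tail-recursively as trampolined steps.
    func sumTo(n, acc int) step {
        if n == 0 {
            return step{done: true, val: acc}
        }
        return step{next: func() step { return sumTo(n-1, acc+n) }}
    }

    func main() {
        fmt.Println(trampoline(sumTo(1000000, 0))) // 500000500000
    }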
My initial statement was too strong; what I meant to say was that the higher-level languages were not practical for commodity application development and distribution until more recently. They've been around for a very long time, but Smalltalk is an infamously isolated system and it was very slow even when running on an Alto (I've used it!). I admittedly know much less about the practical history of Scheme and Prolog.
To jolux's point: Scala, F#, OCaml/Reason, Clojure, Haskell, and Erlang, while not as huge as Java and C#, have billions in value attributable to them.
They did the same with C back in the day, versus what was being done in systems programming since the 60's.
The language C builds on, B, is a spin-off of BCPL, a language designed to bootstrap CPL.
Thanks to UNIX's success we ended up with a language whose original purpose was to bootstrap compilers, not to be a full stack programming language.
Here: systems programming in 1961, 10 years before C was born, still being sold by Unisys.
Or at the US military,
Or during the 70's,
Remember that project Bell Labs stepped away from, which gave C's authors plenty of free time?
"Thirty Years Later: Lessons from the Multics Security Evaluation"
Null-terminated strings.
In "Trusting Trust", Ken tacitly admitted that C itself was the inside job. Once you align yourself with C's core tenets (performance above all else), the game is up and your mind has been infected against looking at things holistically.
When one isn't available, the runtime plays the role of an OS, and must be adapted as such.
Alternative ideas: here are four that combine the productivity of automatic memory management with ownership,
"Linear Haskell", https://arxiv.org/abs/1710.09756
In the context of Rust, maybe showing lifetimes graphically in an IDE would help close the productivity gap with languages that offer automatic memory management alongside ownership.
Implementing Nim's bounded-time GC could cut that overhead down and defer work to idle periods. That should be the first priority when implementing an OS in managed code.
Microsoft even tried to rewrite Windows in C# on that premise.
Meanwhile in 2020 Java programs still suck ass to use. (Sorry for the crude language, but it's true.) Microsoft memoryholed the whole C# thing and went back to promoting C++.
And you are mistaken with "Going Native fever from MS", it was a flop that produced UWP, now being fixed with Project Reunion.
Most of the System C# features from Midori ended up in C# 7.x and 8.0, and 9.0 brings even more goodies, like naked function pointers and a C ABI for calling into .NET code.
Isn't that due to memory usage, rather than speed?
- value types
- some form of RAII
- untraced reference (aka raw pointers)
- stack and global memory segment allocation
Examples of such languages:
Mesa/Cedar, Active Oberon, Component Pascal, D, Modula-3, Nim, Swift, C# (depends on the version; C# 9 is already quite sweet)
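Go only partially fits that list, but a couple of the items can be sketched in it; take this as an illustration only, with unsafe pointer arithmetic standing in for proper untraced references:

    package main

    import (
        "fmt"
        "unsafe"
    )

    type Point struct{ X, Y int32 }

    func main() {
        // Value types: 1024 Points in one contiguous block, rather
        // than 1024 individually GC-tracked heap objects.
        pts := make([]Point, 1024)
        pts[1] = Point{X: 7, Y: 9}

        // Untraced-reference stand-in: raw pointer arithmetic through
        // unsafe, stepping to element 1 by byte offset (kept in a
        // single expression, as Go's unsafe rules require).
        base := unsafe.Pointer(&pts[0])
        p := (*Point)(unsafe.Pointer(uintptr(base) + unsafe.Sizeof(pts[0])))
        fmt.Println(p.X, p.Y) // 7 9
    }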
The future is not garbage collected.
The future is pretty much GC with affine types; even Rust has to deal with the crude Rc<RefCell<>>, without the corresponding tracing-GC performance, when doing GUI applications.
It's essentially the Pareto principle. Most of your garbage is generated by a minority of your program. So only small bits of your program need to be optimised to not use the GC.
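(In Go that often just means recycling the few hot allocations, e.g. with sync.Pool. A minimal sketch; render is a made-up stand-in for the hot path:)

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // bufPool recycles buffers so the small piece of code that
    // generates most of the garbage allocates almost nothing per call.
    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func render(name string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        defer bufPool.Put(buf)
        buf.Reset()
        buf.WriteString("hello, ")
        buf.WriteString(name)
        return buf.String() // String() copies, so reuse is safe
    }

    func main() {
        fmt.Println(render("world"))
    }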
Also, Go's GC is still terribly naive. They recently fixed the latency problem, but it still needs to be redesigned to match a good Common Lisp GC.
Rust is not just the future of systems programming -- it's the future of all programming. You may not program directly in Rust the language, but borrow-checked static memory management is coming to an application language near you.
Yeah, no thank you. The OS community should stick with Rust.
Matthew Hertz, Emery D. Berger. Quantifying the Performance of Garbage Collection vs. Explicit Memory Management. ACM SIGPLAN Notices, volume 40, issue 10, October 2005. https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf
> Swift is intended as a replacement for C-based languages (C, C++, and Objective-C).
Reference Counting, chapter 5 from The Garbage Collection Handbook.
It is funny that we have these capabilities within arm's (fingers') reach, but it takes someone daring enough to put symbols in a certain order to show that a different world is possible.
Joe Duffy mentioned in one of his talks (his Rust keynote) that even with Midori running in front of them (it even powered part of Bing for a while), there were people on the Windows team quite sceptical that it was possible at all.
Anti-GC cargo cult goes a long way.
And you don't have to worry about memory safety either. Memory safety has been formally proven as part of the seL4 spec.
And that's the worst case: synchronous/blocking calls. With a different API design (e.g. asynchronous interactions as done with io_uring or IOCP), the latency matters less as long as the overall efficiency, and thereby throughput, is high.
Non-local performance effects arise from TLB and other cache invalidation, which is required when changing from one task to another. You can't avoid that without putting everything in the same address space, which would make the system a fake microkernel. Fake microkernels have the less-readable code of microkernel design but without any benefits other than buzzword compliance.
CPU bugs like Meltdown and Spectre mean that every IPC now needs to blow away all sorts of caches and prediction. For example, you wipe out branch history. The cycles spent during the IPC call are only a tiny portion of the cost. The loss of cache content (TLB, branch history, data cache, code cache, etc.) greatly slows down the CPU.
Two, the benchmarks for IPC usually include the cost of cache invalidation. This one for seL4 clearly shows the difference on Skylake for kernels compiled with Meltdown mitigations. Still in the sub-microsecond range.
Meltdown: Intel x86, IBM POWER, ARM Cortex-A75
Spectre: Intel x86, AMD x86, ARM (Cortex-R7, Cortex-R8, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A57, Cortex-A72, Cortex-A73, various Apple-designed cores) and some IBM hardware.
And there is the whole point that many of the performance issues with microkernels have long been solved.
They are a hybrid design, or some would even call them monolithic. Definitely not a microkernel, although they are moving in that direction with DriverKit and the other Kit initiatives.
Plaudits to all involved.
What it would be missing, and what was fixed in Active Oberon, is explicit support for untraced references.
Could you explain that a little more? Are references in Oberon bidirectional, meaning there is a list of every reference ever taken and by whom?
Traced references are tracked by whatever form of automatic memory management the language offers.
Untraced references are still typed pointers, but the memory they point to isn't tracked by the GC/RC infrastructure; it is managed with some form of manual memory management, and they can be used in unsafe code for pointer arithmetic.
The languages then provide means of converting between both worlds; naturally, it always requires some form of unsafe code block.
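Go doesn't have untraced references as a language feature, but the idea can be approximated; a Linux-only sketch, with Node as a made-up type:

    package main

    import (
        "fmt"
        "syscall"
        "unsafe"
    )

    type Node struct{ Value int64 }

    func main() {
        // mmap'd memory lives outside the Go heap: the GC never traces
        // it, and we must release it ourselves (manual management).
        mem, err := syscall.Mmap(-1, 0, 4096,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_ANON|syscall.MAP_PRIVATE)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(mem)

        // Crossing back into the traced, typed world goes through an
        // unsafe conversion, exactly as described above.
        n := (*Node)(unsafe.Pointer(&mem[0]))
        n.Value = 42
        fmt.Println(n.Value)
    }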
I thought it might be an Oberon-specific concept. I'd like to see something like ref counting, but with an actual ref list: a global list of the references to each object.
Just like with C, C++ or Rust, you would need a thin Assembly helper and that is about it.
That's quite different from a low-level kernel bootstrap, which is what you linked. Regardless of whether you do it in Assembler or the aforementioned languages, you must build a hoist outside of Golang.
Inline Assembly doesn't count as C, isn't part of ISO C, and any language can have such extensions, including Go.
It’s not even uncommon, as liballoc is one of the top recommended allocators for hobby OSes.
I never mentioned inline assembler. Whilst any decent OS will most definitely fall back on assembler, and I never claimed otherwise, it’s not a requirement. It, or another of the aforementioned, would be within the realm of standard Golang.
You're jumping through hoops to equate a language used outside its intended use cases with ones used completely within theirs. It's not a knock on Go (or other GC'd languages), it's just a reality of the compromise on specific features. You also couldn't write an OS in a VM'd language without having a VM hypervisor; it doesn't mean they're bad languages.
> A potential problem with garbage collection is that it consumes a fraction of CPU time proportional to the “headroom ratio” between the amount of live data and the amount of RAM allocated to the heap. This section explores the effect of headroom on collection cost.
> In summary, while the benchmarks in §8.4 / Figure 7 incur modest collection costs, a kernel heap with millions of live objects but limited heap RAM might spend a significant fraction of its time collecting. We expect that decisions about how much RAM to buy for busy machines would include a small multiple (2 or 3) of the expected peak kernel heap live data size.
> If CPU performance is paramount, then C is the right answer, since it is faster (§8.4, §8.5). If efficient memory use is vital, then C is also the right answer: Go’s garbage collector needs a factor of 2 to 3 of heap headroom to run efficiently (see §8.6).
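For what it's worth, the headroom ratio maps directly onto Go's GOGC knob; a tiny sketch:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // With GOGC=100 (the default) a collection is triggered once
        // the heap grows to ~2x the live data; raising it to 200
        // allows ~3x, trading RAM for less GC CPU time. That is the
        // paper's "factor of 2 to 3 of heap headroom".
        old := debug.SetGCPercent(200)
        fmt.Println("previous GOGC:", old)
    }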
I'm sure there are tricks you can do in Go (as in Java) to subvert the garbage collector, but I'm not sure I'd want to build a kernel based on them when I could just use another language.
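(For the curious, the classic trick looks something like this: a preallocated, pointer-free arena indexed by integers instead of references. Conn and the free list are made-up illustration names.)

    package main

    import "fmt"

    // Conn contains no pointers, so the GC has nothing to trace in the
    // arena and never frees individual entries; indices replace references.
    type Conn struct {
        ID     uint64
        Active bool
    }

    var (
        conns    [65536]Conn // fixed arena, allocated once at startup
        freeList []int
    )

    func alloc() int {
        if len(freeList) == 0 {
            panic("arena exhausted")
        }
        idx := freeList[len(freeList)-1]
        freeList = freeList[:len(freeList)-1]
        return idx
    }

    func free(idx int) {
        conns[idx] = Conn{} // zero the slot and recycle its index
        freeList = append(freeList, idx)
    }

    func main() {
        for i := range conns {
            freeList = append(freeList, i)
        }
        c := alloc()
        conns[c] = Conn{ID: 1, Active: true}
        fmt.Println(conns[c])
        free(c)
    }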
Another data point is Apple adopting automatic reference counting for Objective-C and Swift. Another is Discord switching from Go to Rust ("Go did not meet our performance targets.")
Potentially an OS kernel written (for example) in a language like Rust (though not without its own challenges) could have more consistent performance and lower memory overhead.
disclaimer: I am aware of OS kernels written in Rust but have no experience developing or using one
Yeah, potentially, but implementing in an easier language in less time may be an acceptable trade-off.
Because Go is not the poster child of fast GC'd languages either (a conscious choice of a simpler optimizer for fast builds), and the implementation showed only a 10%-15% difference with Linux, the situation is optimistic.
With better escape analysis in GC languages and better compilers, the gap can be reduced further.
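Escape analysis is already observable today; a small example, with the compiler's decisions visible via go build -gcflags=-m:

    package main

    import "fmt"

    type vec struct{ x, y, z float64 }

    // dot's arguments never escape, so with escape analysis both
    // vectors can live on the caller's stack: zero GC work.
    func dot(a, b vec) float64 {
        return a.x*b.x + a.y*b.y + a.z*b.z
    }

    func main() {
        a := vec{1, 2, 3}
        b := vec{4, 5, 6}
        fmt.Println(dot(a, b)) // 32
    }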
> Apple adopting ARC
That's mainly because of legacy interop concerns. Swift's ARC implementation was once horrible, and ARC still seems to be a significant bottleneck in SwiftUI. The blanket statement that ARC is more efficient is a myth spread mainly by Apple fanboys who might never have heard the words 'cache' and 'contention'.
The efficient RC methods, like deferred RC, approach a tracing GC anyway.
The benefit of RC is predictability (though not always) and RAII. That's why Rust and many C codebases use explicit refcounting when lifetimes are unknown.
While making the compiler slower, contradicting the upside you mentioned earlier.
Not to mention that 70% of the optimization can be done in 10% of the code, and much faster. It is the more exotic optimizations that offer diminishing returns in exchange for slow compile times.
> go (and garbage-collected languages/runtimes in general) may be poor successors to C and C++ for applications where consistent performance and memory efficiency are important.
I think the latter part here is indeed important. From my pov, Go is a lot better suited for a lot of the stuff I previously used C/C++ for, where a scripting language like Python was sometimes an alternative. I rarely touch C/C++ anymore these days (except for Arduino stuff), and Python, which I used a lot before, has become something I only use when I really need a dynamic language for some hacky stuff.
But languages are just tools; pick the best one for your problem. Go certainly carved out its niche, and Rust is also going pretty strong and, although maybe not as widely accepted, is also very interesting and promising.
Were we having this discussion during the 80's, no one would think C was usable for anything serious, when its compilers couldn't produce better code than junior Assembly developers and all AAA games of the time were 100% Assembly.
I also read the paper, but I also have read tons of other papers of systems implemented in other languages.
While the current implementation is quite good, there is lots of room for improvement, including in the way Go deals with value types.
Also, with a small improvement to Go (untraced references), or even a //go: annotation, there would be more room for C-like data structures, when required to go down that path.
Without trying to diminish the work that went into this thesis: speaking from experience, when it was done, it was done; performance improvements were most likely not pursued, as the point of writing an OS in Go as a thesis was already proven.
Without it, some guessing games might happen that bring everything down, or the GC has to be more conservative regarding its guesses.
An example of this is the latest restrictions in Go 1.15 regarding pointer conversions.
Since it’s possible to use C data structures and pointers directly from Go that seems like the safest way to do it, although of course then you need a C toolchain installed.
One of the reasons they are changing the pointer rules in Go was exactly that there are data races across domains if you nest conversions between GC and non-GC pointers: the GC may run at just the moment it thinks the reference is no longer in use, while it was actually stored temporarily in a uintptr.
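The rule from the unsafe.Pointer documentation boils down to this; a small illustration:

    package main

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        x := [4]byte{10, 20, 30, 40}

        // Legal: pointer -> uintptr -> pointer in a single expression,
        // so the GC never sees the address hiding in a plain integer.
        p := (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(&x[0])) + 2))
        fmt.Println(*p) // 30

        // The racy variant described above (do NOT do this):
        //   addr := uintptr(unsafe.Pointer(&x[0]))
        //   // ...if the GC runs here, it may decide the object is
        //   // unreferenced, since addr is just an integer to it...
        //   q := (*byte)(unsafe.Pointer(addr + 2))
    }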
When you use less GC, you increase performance, but you also get fewer of the safety and convenience benefits of GC. There is no Holy Grail of GC just around the corner that will make this tradeoff obsolete, despite what GC advocates have been telling me for 20 years.
Learn to use the tools and enlightenment will be achieved.
gVisor, being a regular userspace program, has no such worries.
At the time this project showed up, I gave it a try on QEMU. It was cool and I thought, "I must join this!" In the end, except for one minor fix, I was not able to contribute more because the development environment was not comfortable. The patched Go tree was not easy to follow, and it also looks impossible for others to simply rebase onto a later Go.
My take on this is that, although Go provides many OS-like features, you still have to draw a clear line between the OS and the language it uses if you want it to be maintainable and evolvable. I am still obsessed with the idea of making an OS in Go, and am still trying to do one at a very slow pace.
I'm hoping to avoid this in one of my own OS projects by using Forth, actually; I think using a language where most features are implemented not in the core language implementation but in libraries helps a lot here.
For example, exceptions are implemented in the OS code rather than by compiler magic (though there are some built-ins the exception system needs to call that are only really relevant if you're building an exception system). The actual Forth implementation is under 2000 lines of aarch64 assembly, after removing comments and whitespace.
On the other hand, you stated
> I think using a language where most features are implemented not in the core language implementation but in libraries helps a lot here.
I don't see how your experience contradicts my previous statement. Could you clarify?
Also, I didn't know about Forth before. Thanks for broadening my view.
Needing to avoid diverging from an upstream compiler is also somewhat alleviated by (most dialects of) Forth being suitable for OS development out of the box, so patching the compiler isn't often necessary.
Additionally, it's not ridiculous to implement a custom Forth for each project, depending on its needs, which makes tracking an upstream a non-issue. A Forth implementation is far less code than an implementation of C, Go, etc., so it's not very time-consuming, and the way Forth user code is typically written makes it easy to port code between implementations.
This would be important because even if you have proven the functional correctness of a kernel, that typically excludes the concurrency aspect.
For example, one thing it already gets wrong is that you can send mutable pointers around without clear ownership.
Which Lisps are you thinking of? CL and Scheme both allow having multiple copies of mutable objects.
You can compose small systems, even with multiple parties, prove they cannot deadlock, then make them a 'black box' with defined IO, and build larger, more complex systems with equal properties.
The downside is you must guard every piece of shared data with a separate thread, but there may be ways to reduce the performance penalty.
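In Go terms this is the share-memory-by-communicating pattern; a toy sketch, with counter and request as made-up names:

    package main

    import "fmt"

    // One goroutine owns the counter; everyone else talks to it over
    // channels, so the data itself is never shared between threads.
    type request struct {
        delta int
        reply chan int
    }

    func counter(reqs <-chan request) {
        total := 0
        for r := range reqs {
            total += r.delta
            r.reply <- total
        }
    }

    func main() {
        reqs := make(chan request)
        go counter(reqs)

        reply := make(chan int)
        reqs <- request{delta: 5, reply: reply}
        fmt.Println(<-reply) // 5
    }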
The longest single GC-related pause suffered by NGINX was 115 microseconds; the longest observed sum of GC delays to a complete NGINX client request was 600 microseconds.
"This repo is a fork of the Go repo (https://github.com/golang/go). Nearly all of Biscuit's code is in biscuit/."
From one of the linked papers:
"Biscuit has nearly 28 thousand lines of Go, 1546 lines of assembler, and no C."
There is C code in biscuit/user/c, but it appears to be userland test programs, not kernel code.
They're in completely different spheres with regard to practical production usage.
Go generally is nicer when you can afford a GC and a runtime. By nicer, I mean you'll get things done faster.
Rust generally is nicer when you can't, or when you need better C interop, more low-level control, or a stronger type system for some reason. It's not as productive as Go, or at least I'm far from reaching that point after months of full-time Rust development, and I'm a pretty experienced polyglot developer.
Since this doesn't include "OS kernel", what is the state of Rust on deeply embedded devices? Rust for PIC12 anytime soon?
Dealing with the more expressive type system, additional errors and warnings, figuring out generic functions, types and type constraints.
Having the additional concept of ownership to think about, design around, and run smack into face-first. GC is a joy by comparison; it requires zero effort.
Figuring out which abstraction to use; in Go there is typically only one obvious way to do it.
Dealing with less mature libraries.
In general there is just way more cognitive overhead working in Rust, much more to think about, more options to choose from, and more constraints. Some of this should pay itself back by avoiding certain classes of bugs - but I find on a solo project (I've used Go in teams, but not Rust) I get very little benefit here because all the code is written by me, and I'm experienced enough not to make a lot of the mistakes Rust can protect me from, like race conditions or nil pointers/interfaces most of the time. On a larger team you'd get more benefits here, especially if working with more junior developers. But I don't think you ever get to the same productivity you get with Go.
So my advice is: if you can afford a GC, and you don't need too much low-level C interop or performance optimisations, choose Go.
I do want to add that, from a software craft perspective, I find Rust more elegant to read and write, but only really when I'm not pulling my hair out over the borrow checker, or hairy details around closures, or whatever else. I do take a certain satisfaction in that elegance sometimes, but overall I'd rather get stuff done.
What plays against it is having a tiny community without mega-corp sponsorship, so anyone who wants features has to build them themselves.