- Know how your hardware actually works, especially CPU, storage, and network. This provides the "first principles" from which everything below is derived, and it will allow you to reason about and predict system behavior without writing a single line of code.
- Understand the design of the operating system well enough to reimplement and bypass key parts of it as needed. The requirements that would cause you to build, for example, a custom storage engine also mean you aren't mmap()-ing files; instead you are doing direct I/O with io_submit() on a file or raw block device, into a cache and I/O scheduler you designed and wrote. Study the internals of existing systems code for examples of how things like this are done; it is esoteric but not difficult to learn.
- Locality-driven software design. Locality maximization, both spatial and temporal, is the source of most performance in modern software. In systems programming, this means you are always aware of what is currently in your hardware caches and efficiently using that resource. As a corollary, compactness is consistently a key objective of your data structures, to a much greater extent than people think about it higher up the stack. One way you can identify code written by a systems programmer is the use of data structures packed into bitfields.
- Understand the difference between interrupt-driven and schedule-driven computing models, the appropriate use cases for both, and how to safely design code that necessarily mixes both, e.g. multithreading and coroutines. This is central to I/O handling: network is interrupt-driven and disk is schedule-driven. Being explicit about where the boundaries are between these modes in your software designs greatly simplifies reasoning about concurrency. Most common concurrency primitives make assumptions about which model you are using.
- All systems are distributed systems. Part of the systems programmer's job is creating the illusion that this is not the case for the higher levels in the stack, but a systems programmer unavoidably lives in this world, even within a single server. Knowing "latencies every programmer should know" is just a starting point; it is also helpful to understand how hardware topology interacts with routing/messaging patterns to change latency -- tail latencies are more important than median latencies.
The above is relatively abstract and conceptual but generalizes to all systems programming. Each domain of systems programming also has deep knowledge unique to the specialty, e.g. network protocol stacks, database engines, high-performance graphics, etc. Because the physics of hardware never changes except in the details, the abstractions are relatively thin, and the toolchains are necessarily conservative, systems programming skills age very well. The emergence of modern C++ (e.g. C++17) has made systems programming quite enjoyable.
Also, the best way to learn idiomatic systems programming is to read many examples of high-quality systems code. You will see many excellent techniques you've never seen before and that are not documented in book or paper form. I've been doing systems work for two decades and I still run across interesting new idioms and techniques.
I learned about MIPS CPUs in college, 20 years ago, with Patterson and Hennessy, and built one from logic gates. It was basically a high-powered 6502 with 30 extra accumulators, but split across the now-classic pipeline stages.
I understand that modern CPUs are very different from this. I understand that there are deeper pipelines and branch predictors and superscalar execution and register renaming and speculative execution (well, maybe a bit less than last year) and microcode. Also, I imagine they're not built by people laying out individual gates by hand. But since I can only interact with any of these things indirectly, I have no basis for really understanding them.
How does anyone outside Intel learn about microcode?
- Agner Fog's instruction tables for x86, which has the latency, pipeline, and ALU concurrency information for a wide range of instructions on various microarchitectures.
- Brief microarchitecture overviews (such as this one for Skylake), that have block diagrams of how all the functional units, memory, and I/O for a CPU are connected. These only change every few years and the changes are marginal so it is easy to keep up.
Knowing the bandwidth (and number) of the connections between functional units and the latency/concurrency of various operations allows you to develop a pretty clear mental model of the throughput limitations for a given bit of code. People who have been doing this for a long time can look at a chunk of C code and accurately estimate its real-world throughput without running it.
Thanks for sharing these very useful resources; I've blocked off my Saturday for a first scan!
I don't disagree, but at least anecdotally a lot of the shops I've been involved with or worked at are really excited about Rust and Go. A previous employer that used C++ exclusively has even shipped a few Golang-based tools and is planning to introduce it into the main product soon. No new projects are being started in C++ either.
Definitely recommend learning C, but having Rust (or Golang) exposure will likely be helpful in the near future.
Great post btw.
Know your requirements and then you can decide on the runtime.