Writing a Simple Linux Kernel Module (sourcerer.io)
429 points by daftpanda on Nov 30, 2017 | 118 comments

"And finally, you’ll need to know at least some C. The C++ runtime is far too large for the kernel, so writing bare metal C is essential."

That line reminded me that NetBSD added Lua for writing kernel modules. (https://news.ycombinator.com/item?id=6562611)

Anybody have any experiences to share from this?

There are plenty of operating systems that use C++ on the kernel side of things. Some things, like exceptions, are frowned upon and rarely used -- if at all -- but there is no shortage of C++ kernel code. Just not in Linux.

When it comes to Linux, one could say that most reasons to avoid it are historical, but this does not quite paint the awkward truth -- namely that, for most of the kernel's lifetime (since back in 1991), C++ compilers simply did not have the level of maturity and stability across the breadth of platforms that Linux required. Linus Torvalds' stance on this matter is pretty well-known: http://harmful.cat-v.org/software/c++/linus .

Today, when x86-64 and ARM are the only two families that you need to care about for the next ten years or so (maybe RISC-V, but I rather doubt it), it probably makes sense to look at C++ for operating systems work, but the runtime is certainly heavier than back when Linus was writing about it, too. A modern C++ compiler has a lot of baggage; C++ was huge back in 1998, now it's bloody massive. IMHO, all the reasons why you would want to use C++ (templating support without resorting to strange hacks, useful pointer semantics and so on) are reasonably well served by cleaner languages with less hefty runtimes, like Rust. What these alternatives do lack is the amazing level of commercial support that C++ has.

> C++ was huge back in 1998, now it's bloody massive.

I don't think any "run-time" feature has been added since, though. It's all either OS support (<thread>, etc., that you wouldn't use in-kernel anyway) or template stuff that has zero impact on the runtime (and actually sometimes helps decrease code size).




If some guys are able to run C++ on 8 KB microcontrollers, there's hardly a non-political reason it couldn't be used in-kernel.

See also IncludeOS: http://www.includeos.org/

Additions have certainly been made since back in 1998 (things like smart pointers are relatively new on this scale, as far as I know). Many runtimes for resource-constrained embedded systems do not support all of C++'s features. Exceptions are the most usual omission.

You can certainly strip things down to a subset that can fit in 128K of flash and need only 1 or 2K of RAM at runtime, but the question is not only one of computational resources used for the library itself. Additional code always means additional bugs, the semantics sometimes "hide" memory copying or dynamic allocation in ways that many C++ programmers do not understand (and the ones who do are more expensive to hire than the ones who do not), and so on. You can certainly avoid these things and use C++, but you can also avoid them by using C.
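To make the "hidden copying" point concrete, here's a minimal userspace sketch (the counting type is purely illustrative): both functions "just take a vector", but the by-value version quietly copies every element, and nothing at the call site says so.

```cpp
#include <cstddef>
#include <vector>

// Illustrative type that counts how many times it is copied.
static std::size_t g_copies = 0;

struct CountsCopies {
    int v = 0;
    CountsCopies() = default;
    CountsCopies(const CountsCopies& o) : v(o.v) { ++g_copies; }
};

// By value: the whole vector (and every element) is copied per call.
int sum_by_value(std::vector<CountsCopies> xs) {
    int s = 0;
    for (const auto& x : xs) s += x.v;
    return s;
}

// By const reference: no copies at all.
int sum_by_ref(const std::vector<CountsCopies>& xs) {
    int s = 0;
    for (const auto& x : xs) s += x.v;
    return s;
}
```

The two signatures differ by a single `&`, which is exactly the kind of detail that is easy to miss in review.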

I agree that mistrust and politics definitely play the dominating role in this affair though. I have seen good, solid, well-performing C++ code. I prefer C, but largely due to a vicious circle effect -- C is the more common choice, so I wrote more C code, so I know C better, so unless I have a good reason to recommend or write C++ instead of C, I will recommend or write C instead. I do think (possibly for the same reason) that it is harder to write correct C++ code than it is to write correct C code, but people have sent things to the Moon and back using assembly language for very weird machines, so clearly there are valid trade-offs that can be made and which include using language far quirkier than C++.

I absolutely do not understand your point. Anybody doing OS development in C++ is doing so with absolutely no C++ standard library support, same as if you were using C++ to develop for your microcontroller. If C++ binaries are compact enough for Arduino or Parallax Propeller development (<32KB RAM), they are absolutely fine for kernel development.

The real answer is historical, and cultural. On the latter: Unix is a product of C (well, and BCPL) and C is a product of Unix. The two are heavily intertwined. The former is, as was mentioned, a product of the relative crappiness of early C++ compilers (and perhaps the overzealous OO gung-ho nature of its early adopters as well...)

C++ without exceptions, RTTI, etc. has a lot to offer for OS development. Working within the right constraints it can definitely make a lot of tasks easier and cleaner.

It won't happen in Linux, tho.

And let's not forget C++ got created because Bjarne didn't want a repeat of his experience going from Simula to BCPL, but the result had to be compatible with AT&T's official tooling.

> I do think (possibly for the same reason) that it is harder to write correct C++ code than it is to write correct C code

I call BS on this.

So many mistakes in C simply cannot be made in C++ if you follow the well-established coding patterns and don't try to switch to C-style code. E.g.: you simply cannot forget to free a resource, because with RAII you never have to do it by hand. You cannot forget to check a status code and return early, because you don't have to; the exception will propagate until someone catches it. You cannot forget to initialize a vector, because it initializes itself. I could go on and on.
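A minimal sketch of the RAII point, using an illustrative fake lock (std::lock_guard works the same way for real mutexes): the cleanup runs on every exit path without a single line of explicit cleanup code.

```cpp
// Illustrative lock type so the example is self-contained.
struct FakeLock {
    bool locked = false;
    void lock()   { locked = true; }
    void unlock() { locked = false; }
};

// RAII guard, analogous to std::lock_guard.
struct LockGuard {
    FakeLock& l;
    explicit LockGuard(FakeLock& lk) : l(lk) { l.lock(); }
    ~LockGuard() { l.unlock(); }   // runs no matter how the scope is left
    LockGuard(const LockGuard&) = delete;
};

bool do_work(FakeLock& lk, bool fail_early) {
    LockGuard g(lk);               // acquired here...
    if (fail_early) return false;  // ...released automatically here
    return true;                   // ...and here too
}
```

The C equivalent would need a `goto out` / `unlock` dance on every error path, and forgetting one is a classic kernel bug.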

That said, there is a huge caveat to what I am saying above: I am comparing experienced programmers in each language with each other -- those who are basically experts and know what they're doing. I'm not debating whether it's easier to shoot yourself in the foot with C++ if you use it with insufficient experience (and part, though not all, of the reason is that you will probably write C-style code most of the time, and write neither proper C nor proper C++). I'm saying that an experienced programmer is much more likely to write correct code in C++ than C.

> Additional code always means additional bugs

What additional code?

> You can certainly avoid these things and use C++, but you can also avoid them by using C.

Right, but what you can't get with C is destructors and move/ownership semantics.

> I do think (possibly for the same reason) that it is harder to write correct C++ code than it is to write correct C code

The ability to write typesafe data structures with move/ownership semantics and specified interfaces while being a superset of C would lead some to say that this is not true.
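A hedged sketch of what that buys you: a move-only handle type (the names are illustrative, not from any real API) where accidental copying is a compile error and ownership transfers explicitly, leaving the source empty.

```cpp
#include <utility>

// Move-only owner of some resource identified by an int (e.g. an fd).
struct Handle {
    int fd = -1;
    explicit Handle(int f) : fd(f) {}
    Handle(const Handle&) = delete;             // no accidental copies
    Handle& operator=(const Handle&) = delete;
    Handle(Handle&& o) noexcept : fd(o.fd) { o.fd = -1; }   // steal
    Handle& operator=(Handle&& o) noexcept {
        fd = o.fd;
        o.fd = -1;
        return *this;
    }
};

// Ownership flows out of the factory; the compiler enforces single ownership.
Handle make_handle() { return Handle(42); }
```

In C the same invariant ("exactly one owner frees this") lives only in comments and code review.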

> What additional code?

All the code that you need in order to support smart pointers, templates, move/copy semantics, exceptions and so on. To paraphrase someone whose opinions you should take far more seriously than mine, you can't just throw Stroustrup's "The C++ Programming Language" on top of an x86 chip and hope that the hardware learns about unique_ptr by osmosis :-). There used to be such a thing as the sad story about get_temporary_buffer ( https://plus.google.com/+KristianK%C3%B6hntopp/posts/bTQByU1... -- not affiliated in any way, just one of the first Google results).

The same goes for all the code that is needed to take C++ code and output machine language. In my whole career, I have run into bugs in a C compiler only three or four times, and one of them was in an early version of a GCC port. The last C++ codebase I was working on had at least a dozen workarounds, for at least half a dozen bugs in the compiler.

If you’re writing a kernel, you’ll almost certainly be using “-nostdlib” (or your compiler’s equivalent) so unique_ptr, etc. won’t be there. You could however write your own unique_ptr that allocates via whatever allocator you write for your kernel. See [1] for a decent overview of what using C++ in a kernel entails.

[1]: http://wiki.osdev.org/C%2B%2B

> If you’re writing a kernel, you’ll almost certainly be using “-nostdlib” (or your compiler’s equivalent) so unique_ptr, etc. won’t be there.

Huh? unique_ptr has a customizable Deleter though; you should be able to provide a custom one so it doesn't call delete.

And doesn't a kernel implement a kmalloc() or something anyway? You would just write your own operator new and have it do what your kernel needs, and the rest of the standard library would just work with it.
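A sketch of that idea, assuming a hypothetical kmalloc()-style allocator (fake_kmalloc/fake_kfree are userspace stand-ins, not real kernel APIs): unique_ptr's Deleter parameter means global operator delete is never touched.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-ins for a kernel allocator; a real kernel would use kmalloc()/kfree().
static std::size_t g_live = 0;

void* fake_kmalloc(std::size_t n) { ++g_live; return std::malloc(n); }
void  fake_kfree(void* p)         { --g_live; std::free(p); }

// Custom Deleter: never calls delete, only our allocator's free.
struct KFree {
    void operator()(void* p) const { fake_kfree(p); }
};

using kbuf_ptr = std::unique_ptr<unsigned char, KFree>;

kbuf_ptr make_kbuf(std::size_t n) {
    return kbuf_ptr(static_cast<unsigned char*>(fake_kmalloc(n)));
}
```

unique_ptr itself is header-only template code, so nothing here needs the C++ runtime library at link time.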

True, but the "rest of the standard library" is somewhat harder to port and last time I played with it (admittedly some years ago) no compilers were good at letting you pick and choose.

> All the code that you need in order to support smart pointers, templates, move/copy semantics, exceptions and so on.

Smart pointers are their own classes, and exceptions would certainly be disabled in kernel mode, sure, but for the rest? Which additional code? There's no magic behind templates and move semantics, and no run-time impact. They're purely compile-time features.

> and hope that the hardware learns about unique_ptr by osmosis

unique_ptr is just a template and does not need the standard library. Also, move/ownership semantics don't need unique_ptr.

> The last C++ codebase I was working on had at least a dozen workarounds, for at least half a dozen bugs in the compiler.

There are at least four compilers that have been extensively production-tested: ICC, GCC, MSVC, and Clang. Which one had these bugs, and was it kept up to date?

I will say in this that C++ can certainly be used for small, fast kernels given L4/Fiasco was written in C++:


That was targeted at x86 with 2MB of RAM usage, though. So I quickly looked for a microcontroller RTOS, thinking that would be a good test. I found ScmRTOS, which claims to be written in C++ with sizes "from 1KB of code and 512 bytes of RAM" on up. So it seems feasible to use C++ for small stuff. I'll also note that C++ has some adoption in the high-assurance industry, with standards such as MISRA C++ (2008) already available. They might be using powerful CPUs, though, so I looked up the MCU stuff.


The throwaways are talking safety features. I used to think that was a benefit of C++ over C. Following Worse is Better, that's no longer true: the ecosystem effects of C produced so many verification and validation tools that it's the C language that's safer than C++, if one uses those tools. There are piles of them for C, with hardly any in FOSS for C++, if we're talking about static/dynamic analysis, certified compilation, etc. I put $100 down that a CompCert-like compiler for C++ won't happen in 10 years. At best, you'll get something like KCC in the K Framework.

The reason this happened is C++'s unnecessary complexity. The language design is garbage from the perspective of being easy to analyze or transform by machines. That's why the compilers took so long to get ready. LISP, Modula-3, and D showed it could've been much better in terms of ease of machine analysis vs. the features it has, with some careful thought. Right now, though, the tooling advantage of C means most risky constructs can be knocked out automatically; the code can be rigorously analyzed from about every angle one could think of (or not think of); it has the best optimizing compilers, if one cares little about their issues; and it otherwise supports several methods of producing verified object/machine code from source. There are also CompSci variants with built-in safety (e.g. SAFEcode, Softbound+CETS, Cyclone) and security (esp. Cambridge CHERI). duneroadrunner's SaferCPlusPlus is about the only thing like that I know of that's actively maintained and pushed for C++. The result of pros applying tools on a budget to solve low-level problems in C or C++ will always give a win on defect rate to the former, just because they had more verification methods to use.

And don't forget that, as with the Ivory language, we can always develop in a high-level, safer language such as Haskell, with even more tooling benefits, and extract to a safety-critical subset of C. The extracted code can then be hit with C's tooling if we want. We can do that in a language with a REPL to get productivity benefits. So we can have productivity, C as the target language, and tons of automated verification. We can't have that with C++, or not as much even if we had money for commercial tools.

So, these days, that's my argument against C++ for kernels, browsers, and so on. You're just setting yourself up to have more bugs that are harder to find, since you lose the verification-ecosystem benefits of C++'s alternatives. This will just continue, since most research in verification tools is done for managed languages such as Java or C#, with what's left mostly going to the C language.

I'm really sorry you're getting downvoted, because there is a lot of useful data in your comment. And I think we definitely see eye to eye on this:

> The throwaways are talking safety features. I used to think that was a benefit of C++ over C. Following Worse is Better, that's no longer true: the ecosystem effects of C produced so many verification and validation tools that it's the C language that's safer than C++, if one uses those tools. There are piles of them for C, with hardly any in FOSS for C++, if we're talking about static/dynamic analysis, certified compilation, etc. I put $100 down that a CompCert-like compiler for C++ won't happen in 10 years. At best, you'll get something like KCC in the K Framework.

Lots of people think additional safety features result in safer code. They likely do most of the time, but when you need to swear to investors, to the public and to the FDA that your machine will not kill anyone, what you want to have is results from 5 verification tools with excellent track records, not "my language does not allow for the kind of programming errors that C allows". Neither does Ada, and yet they crashed a rocket with it, with a bug related to type conversion in a language renowned for its typing system (not Ada's fault, of course, and especially not its typing system's fault -- just making a point about safety features vs. safety guarantees).

A more complex language, with more features, is inherently harder to verify. The tools lag behind more and are more expensive. And, upfront, it seems to me that it is much harder to reason about the code. C++ does not have just safety features, it has a lot of features, of all kind.

> I do think (possibly for the same reason) that it is harder to write correct C++ code than it is to write correct C code

I'd disagree. Modern C++ has much better safety features than C ever has.

The "safety" is not the issue, it's the compulsion of using many layers of (somewhat leaky) abstractions that make debugging and otherwise reasoning about behaviour difficult.

I could not agree with this more.

gstreamer, gtk, etc, are really easy to work with and browse the source.

This is why I love golang as well.

Side note: it is funny how much gstreamer and glib try to add C++-ish features to C.

I was going to say +1 with golang, and you said it. So just emphasizing that many people prefer code clarity over cool things that make code unreadable (templates...)

Interestingly, I just found Linus' stance on Golang :) https://www.realworldtech.com/forum/?curpostid=104302&thread...

From that post:

> But introducing a new language? It's hard. Give it a couple of decades, and see where it is then.

> 0 impact on runtime (and actually sometimes helps decreasing code size)

Beware, performance and code size do not always go hand in hand!


> There are plenty of operating systems that use C++ on the kernel side of things. Some things, like exceptions, are frowned upon and rarely used -- if at all -- but there is no shortage of C++ kernel code. Just not in Linux.

I used to work on one. It was developed in the 90s, around the same time C++ became an ISO standard. I think most of you have used it without knowing. :)

I think OOP is natural for writing kernels, and most mainstream kernels implement some kind of object system because of this. I also wrote my toy kernel in modern C++, even though I'm "fluent" in both C and C++. So yeah, C++ kernels are here, just not in Linux land.

Are you referring to macOS?

I think the kernel itself is called Mach. Not sure if it was written in C++ though.

The kernel is called XNU. https://opensource.apple.com/source/xnu/ It is mainly written in C.

The BSD part of the kernel is written in C, but IOKit, the kext framework, is written in (a subset of) C++:


Nah. Something from telco land.

>Today, when x86-64 and ARM are the only two families that you need to care about

Wait, what? What happened to MIPS, z/Architecture, Power Architecture, QDSP6, TMS320?

I meant that strictly in the realm of general-purpose operating systems (the article is, after all, about writing a Linux kernel module). Two of those are DSPs. The Linux kernel and busybox account for 99.99% of the general-purpose code running on them, and bare-metal code for both is still very common. In fact, Linux runs on QDSP6 only under a hypervisor, and gained C6x support only somewhat recently (~5 years ago, I think?). Last time I saw a MIPS workstation was a very long time ago :-) and, barring a few niche efforts, the same goes for PowerPC as well.

Back when Linus was ranting, one could make a convincing case for supporting all of these architectures in an OS meant for general-purpose workloads, from server to desktop and from thin client (eh? feel old yet :-)?) to VCR. Now you can sell an OS that supports nothing but x86-64 and a few ARMs, and you get most of the server market and virtually all of the desktop and mobile market.

Obviously the architectures that you need for a particular project are the ones that you want to "care for" -- but in broad terms, it is perfectly possible today for someone to have worked as a programmer for ten years without encountering a single machine that is not either x86- or ARM-based. Someone who had 10 years of programming experience in 2003 probably ran into a SPARC system whether they wanted to or not.

We are talking about the Linux kernel here. There are billions of embedded processors running Linux on architectures other than ARM and x86.

S390X (IBM mainframes) and OpenPOWER variants are fully supported by Ubuntu, and those are not an insignificant market either, in banking, finance, or numeric computing.

One reason why Linux runs on all Top 500 supercomputers is that a number of relatively recent systems close to the top run Sunway (an Alpha derivative), Power, and SPARC64. The fastest supercomputer today runs on Sunway.

I think you’ve missed large segments of Linux’s user base; you’ve definitely missed Android devices using MIPS processors.

Name one recent android device using mips. There are, to my knowledge, only a handful of mips android devices (literally).

There was an announcement of a 64-bit MIPS Android device for 2016 (Warrior i6400), but I didn't hear anything more about it. Did you?

Also, there was an Ainol tablet with the MIPS architecture (nov07) which flopped (those are the only instances I can recall where MIPS on Android really came into play).

I’m not sure the number of hardware architectures matters that much, as opposed to the number of OSes or compilers. If you have to target N different C++ compilers written independently by different vendors, then sure, each compiler has a heck of a lot of room for bugs in the frontend or standard library. But with a given compiler (say, GCC or Clang) that has N architecture backends, there shouldn’t be that much added risk of architecture-dependent bugs with C++, compared to pure C.

After all, most of the “extra stuff” that C++ adds on to C has no inherent hardware dependence and exists only on the frontend. By the time the code makes its way to the backend, templates have been monomorphized (i.e. copied and pasted for each set of parameters); methods have been lowered to functions with added ‘this’ parameters; hidden calls to destructors and conversions have been made explicit; and so on. For any given C++ program, you could write a C program that compiles to near-identical IR, so if the backend can reliably compile C code, it should be able to handle C++.
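An illustrative sketch of that lowering (hand-written, so the "monomorphized" and "lowered" versions here are what a frontend would conceptually emit, not actual compiler output): a template is stamped out once per type, and a method becomes a plain function with an explicit 'this' parameter.

```cpp
// A template the frontend will monomorphize per instantiation.
template <typename T>
T tmax(T a, T b) { return a < b ? b : a; }

// Hand-written "monomorphized" equivalents, as a C-level backend sees them.
int    tmax_int(int a, int b)          { return a < b ? b : a; }
double tmax_double(double a, double b) { return a < b ? b : a; }

// A method and its lowered form: a free function with an explicit 'this'.
struct Counter {
    int n = 0;
    void bump() { ++n; }
};
void Counter_bump(Counter* self) { ++self->n; }
```

Each template/lowered pair behaves identically, which is the sense in which the backend sees "just C".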

True, that hypothetical C program might not look much like a typical human-written program. In order to achieve zero-cost abstractions, C++ code tends to rely a lot more on inlining and other backend optimizations than pure C, even though those optimizations apply to both languages. So if the backend isn’t reliable, C++ may generate more ‘weird’ IR that could end up getting miscompiled, and the level of indirection may make it harder to figure out what’s going on. But compiler backends these days are a lot more reliable than they used to be. And buggy backends can certainly cause problems for C code as well.

I don't get how you make the connection from Torvalds' arguments to what you say. His complaints are on a fundamentally different level: ~"Programmers over-abstracting is a timeless problem; the more complex the language/standard library, the easier it is for programmers to get lost in abstraction. Bad programmers just get lost; mediocre programmers even defend what they do."

I despise his choice of words and general attitude towards programmers "less capable than him", but I do think there is some truth in what he says there: imagine we all wrote asm only -- we'd spend much more time at the drawing board and find much more elegant solutions to problems that we today solve by mindlessly writing hundreds of lines of code within a "framework".

Side note: I wrote C++ in 1998; it was more "bloody massive" back then than it is today, in my humble experience.

I was referring only to the parts of that rant that are/were of some objective value, e.g.:

> anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny

> the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C.

You can certainly write C++ code without overabstracting it. Maybe Torvalds only ran into C++ programmers who liked overabstracting their code -- as it is usually the case with opinions of some people about other people, that part of the rant is best discarded :-).

But as my memory serves me, back when "portability" meant x86, SPARC, Alpha, ARM, MIPS, PPC, PA-RISC and m68k, the only way to do "good, efficient, and system-level and portable C++" was to limit yourself to the (relatively small) subset of C++ that was well-implemented by widely-available compilers (particularly GCC) across all these architectures -- not as a matter of discipline and code elegance, but because writing non-trivial C++ code back then usually resulted in very unpleasant trips down the standard library's source code and poring over objdump outputs trying to figure out what the hell broke this time.

Yeah, which is why Java felt like fresh air compared with those portability hurdles.

> There are plenty of operating systems that use C++ on the kernel side of things

Language-wise, Darwin/XNU/IOkit uses a subset of C++, notably one that excludes exceptions and templates IIRC. stdlib-wise I suspect it's even more restricted/different.

Has Linus stated any opinions on Rust? Some of his issues with C++ also apply to Rust.

> any compiler or language that likes to hide things like memory

“That's not a new phenomenon at all. We've had the system people who used Modula-2 or Ada, and I have to say Rust looks a lot better than either of those two disasters.”


Is Rust available on the platforms Linux targets? The kernel community won't react fondly to attempts to restrict target support. I'd expect a justified rant, with lots of insults to intellect and sanity if need be.

Is Rust well-understood, both by the users and the toolchain developers? How many independent compilers are there for Rust? What architectures do they support?

I'm not implying that Rust isn't suitable for Linux, but it has open issues that prevent usage right now. Rust and its core ecosystem have a lot of churn, and still feel like a rather early work in progress. One example not mentioned is the Rust language server and its integration with consumers.

I cheer for Rust, and hope for solid progress.

Another issue I've seen with Rust (in benchmarks) that's rarely talked about is memory usage, which is pretty important in a kernel. Sometimes it only uses a little more memory, but quite often it will use twice as much, and occasionally an order of magnitude more.

To be fair, these benchmarks are aiming for speed, but the C baseline typically produces faster code with less memory, so it may very well be a case of TANSTAAFL.

I don't see how it necessarily must be TANSTAAFL. Maybe Rust is indeed suited for being zero-cost in memory and speed performance compared to C, but the toolchain isn't there yet.

We use jemalloc by default, which does its own caching. A closer comparison would be to use the system allocator, or make the C code use jemalloc as well.

Oh, I'm not saying Linus would accept Rust code in the kernel at all; I don't think he should for a variety of reasons, including the ones you're talking about. That wasn't what I was responding to, though.

BeOS, Symbian, macOS, Windows, AS/400, GenodeOS, mbedOS

Just a couple of examples.

BeOS' kernel itself didn't use C++ if I recall correctly, though some drivers did. Haiku, BeOS' open-source successor, has a kernel written almost entirely in C++.

If you look at the NetBSD kernel source https://github.com/NetBSD/src.git, it doesn't look like there are any real and substantial Lua kernel modules.

  $ find sys -name '*.lua'

At work we write plenty of C++ for kernel modules, but with the caveat that our C++ can rarely include a Linux kernel header directly. We have to write a small shim for every kernel function we consume.

No library runtime features, so no RTTI or exceptions, but we use templates and dynamic dispatch.

I created a Linux kernel dynamic binary translator [1] (think of it as being like an in-situ VMware ESXi) using C++. The key was to not use anything that touches floating point. Mostly what I wanted was templates, atomics, and a few other niceties. Recently I got Granary working on the 4.4.0 kernel (after a few manual tweaks here and there to the kernel source code and to some of Granary's auto-generated files).

[1] https://github.com/Granary/granary

There is plenty of C++ in the Linux kernel. It just masquerades as function pointers and initial fields in structures called "base".

You're getting downvoted, but you are kind of correct. Even if it isn't actually written in C++, there is a lot of "OO-style" code in the kernel that does things like dynamic dispatch using vtables... basically making explicit what C++ is going to do behind the scenes for you anyway.
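A sketch of that idiom, loosely modeled on structures like the kernel's file_operations (all names here are made up for illustration, not real kernel APIs): an ops table of function pointers plays the role of a C++ vtable, and dispatch goes through it by hand.

```cpp
#include <cstddef>

// Hand-rolled "vtable": a table of operations on a device.
struct dev_ops {
    int (*read)(void* self, char* buf, std::size_t n);
    int (*close)(void* self);
};

// A /dev/null-style device; the ops pointer is the "vtable pointer".
struct null_dev {
    const dev_ops* ops;
    int closed;
};

int null_read(void*, char*, std::size_t) { return 0; }  // always EOF
int null_close(void* self) {
    static_cast<null_dev*>(self)->closed = 1;
    return 0;
}

const dev_ops null_ops = { null_read, null_close };

// A "virtual call" done explicitly, as kernel C does everywhere.
int dev_read(null_dev* d, char* buf, std::size_t n) {
    return d->ops->read(d, buf, n);
}
```

This compiles as C++ too; a C++ compiler would generate essentially the same indirection for a virtual method, just without the boilerplate.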

I can see the argument from Linus' POV about avoiding over abstraction, and knowing exactly what your code is doing on a low-level, but at the same time it leads to a lot of reinventing the wheel and makes me think there is a better balance to be found.



It doesn't only make you think there is a better balance, it also makes you think harder whether the task at hand needs this kind of complexity. If this is all hidden under abstraction, developers might not care enough.

I absolutely agree with you on the dangers of overabstracting / hiding too much detail. That said, I do wonder if things have changed since Linus's C++ "rant", re: the type of developers who would choose to work on kernel code in C++. I do remember a time when learning C++ was very much in vogue, before the "in" language moved to Java, to Python, to JS, etc., so there may have been more of a problem with inexperienced coders back then than there is now. (I have no idea if this is true at all, just curious!)

I'm addressing this as someone who has done C++ work on very small embedded systems (e.g. microcontrollers with no external memory) including RTOS code and low-latency control loops, and there were definitely some nice features in C++ that can save you some time and lines of code without adding a bunch of needless overhead. Of course, there was a learning curve when bringing new people on board the project about what was/wasn't allowed for performance reasons. In general though, I see no reason why there couldn't be some subset of C++ used for kernel development other than Linus' / the kernel dev community's general distaste for it.

It is my understanding that there is no technical reason but rather a personal preference of Linus to continue working with plain C.

I did some kernel development (mainly some drivers), and I also do a lot of embedded development (credit card terminals, personal projects for ARM Cortex-M). I must say that I am happy with this choice.

The kernel is complex enough and you want to focus on understanding what really matters without being distracted by arcane constructs that would inevitably come with C++.

My experience with C++ guys is that there will inevitably be a population of smart developers which will try to be "smart" with the language which typically ends up with everybody else spending more time on understanding what is happening than the time that was actually saved by the construct.

C does not pose that problem. C is simple and there is relatively little occasion to be very smart with it, which is a good thing, IMHO.

The C++ runtime is too big for the kernel. It is also largely unnecessary. You can easily compile C++ code without STL or larger library support.

This still leaves some holes that the runtime provides, but all of which are easy to provide -- namely things like 'new' and 'delete'.

> "And finally, you’ll need to know at least some C. The C++ runtime is far too large for the kernel, so writing bare metal C is essential."

And here I was about to ask if anyone has a similar article based on Rust :)

"Anybody have any experiences to share from this?"


Does anyone have a story or two of a time you’ve created a kernel module to solve a problem? I would be interested in hearing real world use cases.

I wrote one so I could keep my job...

I work at a company that provides backups for both Linux and Windows. The entire concept was built around block-level backups. You could just open up the block device and copy the data directly, but it would quickly become out of sync by the time you finished copying it. We did not want to require LVM to be able to utilize snapshots to solve the sync problem. On top of that, we had a strong requirement of being able to delete data from the backup.

This resulted in me learning how to build a kernel module, and then slowly, over about 6 months, creating a kernel driver that allowed us to take point-in-time snapshots of any mounted block device with any FS sitting on top.

Other requirements also dictated that we keep track of every block that changed on the target block device after the initial backup (or full block scan, after reboot).

I wish I could release the source, but my employers would not like that :( So, at least for me, learning how to write kernel modules and digging into some of the lower-level stuff has kept me gainfully employed over the years. It is still in use on about 250k to 300k servers today (it fluctuates).

The hardest part was not writing the module, but getting others interested in it enough so I don't have to be the sole maintainer. I like working on all parts of the product and don't want to just be the "kernel guy".

One other time I wrote a very poorly done network block device driver in about 8 hours. You can find it here: https://github.com/mbrumlow/nbs -- note I am not proud of this code; it was something I did really quickly and wanted on hand to show a prospective employer. I did not get the job. I am also fairly sure they did not even look at the driver, so I don't think the crappy code there affected me.

So you implemented something like Windows's volume snapshots/shadow copies? (Why couldn't you use the existing feature?)

EDIT: Thanks for both replies! :)

Pretty much. But Windows snapshots aren't writeable after you take them, which is a major downer -- writability was a necessary feature. (Having this feature lets people do things like "Take snapshot and delete everything except *.lic files, under C:\Users, on every backup" and you can just issue deletes to the shadow volume after it is consistently created, and intercept the new deletions, like normal. Tracking deletions like this lets you more easily know what you should/should not back up -- only back up the allocated blocks in the shadow copy.)

So, there was a thing that added this feature to the Windows kernel in the product to make this work. Aside from the Linux stuff, which was totally separate. But if you don't need the writing capability, Shadow Copies are good enough, sure..

(Source: I used to work with the guy who made the above post)

Yes, and no.

Windows Volume Shadow Copy has the advantage of being integrated with the FS a bit more closely. So on Windows, VSS can avoid some overhead by skipping the 'copy' part and just allocating a new block and updating the block list for the file.

For the Linux systems we had the requirement to work with all file systems (including FAT). So we could not simply modify the file system to do some fancy accounting when data in the snapshot was about to be nuked. So that resulted in me writing a module that sits between the FS and the real block driver. From there I can delay a write request long enough to ensure I can submit a read request for the same block (and wait for it to be fulfilled) before allowing the write to pass through.
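The read-before-write idea described above can be sketched in userspace C. This is a toy in-memory model under stated assumptions -- block counts, sizes, and function names are all made up for illustration, not taken from the actual module:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8
#define NBLOCKS 4

/* Toy model: before a write reaches the "device", the old block
 * contents are preserved in a snapshot store (once per block). The
 * real module did this by delaying the in-flight write request long
 * enough to read the old block off the real block driver. */
static uint8_t device[NBLOCKS][BLOCK_SIZE];
static uint8_t snapshot[NBLOCKS][BLOCK_SIZE];
static int snapshotted[NBLOCKS];

void cow_write(int blk, const uint8_t *data) {
    if (!snapshotted[blk]) {
        /* first write since snapshot: preserve the old contents */
        memcpy(snapshot[blk], device[blk], BLOCK_SIZE);
        snapshotted[blk] = 1;
    }
    /* then let the write pass through */
    memcpy(device[blk], data, BLOCK_SIZE);
}

const uint8_t *snapshot_read(int blk) {
    /* blocks untouched since the snapshot are read straight from
     * the device; modified blocks come from the snapshot store */
    return snapshotted[blk] ? snapshot[blk] : device[blk];
}
```

The key property is that a reader of the snapshot sees a consistent point-in-time view regardless of writes that land afterwards.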

> (Why couldn't you use the existing feature?)

We did on Windows; VSS is used, with a bit of fancy stuff added on top. For Linux there is no VSS equivalent (other than the one I wrote, and maybe something somebody working on a similar product may have written). And even if one did come about (or exists and I am just not aware of it), it for sure was not available when I started this project.

It sounds like you implemented something very similar to the LVM snapshot feature. You didn't want to require LVM but ... is that really worse than requiring your custom module which is roughly equivalent?

EDIT: ok so it looks like the management of the snapshot space is a bit different. still, you could probably have wrapped LVM management enough to make it palatable, in less than the time it took to write a custom module

The problem is that requiring LVM is a no-go if they did not already have LVM. We could have standardized on LVM, but then all the people who were not already using LVM at the time would not be able to use the product. At the time many hosting providers -- who we sold to -- just were not using LVM. To this day we still have many fewer people using LVM than raw block devices, or some sort of RAID.

Also at the time LVM snapshots were super slow. I don't have the numbers but even with the overhead my driver created I was able to have less impact on system performance.

I was able to do some fancy stuff to optimize for some of the more popular file systems by making inline look-ups to the allocation map (a bitmap on ext3). This allowed me to not COW blocks that were not allocated before the snapshot. This was a huge saving, because most of the time on ext3 your writes will be to newly allocated blocks.
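That bitmap optimization boils down to a cheap bit test. A rough sketch (the function names are mine, and a real ext3 bitmap lookup involves reading the right bitmap block from disk, which is elided here):

```c
#include <stdint.h>

/* One bit per block, as in the ext3 block allocation bitmap:
 * bit set = block was allocated when the snapshot was taken. */
static inline int block_was_allocated(const uint8_t *bitmap, uint32_t blk) {
    return (bitmap[blk / 8] >> (blk % 8)) & 1;
}

int needs_cow(const uint8_t *snap_bitmap, uint32_t blk) {
    /* a write to a block that was free at snapshot time overwrites
     * no old data worth preserving, so the copy can be skipped */
    return block_was_allocated(snap_bitmap, blk);
}
```

Since most ext3 writes go to newly allocated blocks, most writes skip the copy entirely.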

Wrapping LVM would probably not work, and would still require a custom module; the user space tools don't do much. LVM really is a block management system that needs to manage the entire block device, so existing file systems not sitting on top of LVM would get nuked if you attempted to let LVM start managing those blocks, and you still had the issue that reads and writes were coming in on a different block device. Asking people to change mount points was not an option.

There were also other requirements, like block change tracking, which LVM has no concept of. This is for incremental snapshots: without this sort of tracking you would have to checksum every block after every snapshot if you wished to only copy the changes. The module was also responsible for reporting back to a user space daemon that kept a map of which blocks changed. So when backup time arrived we could use this list (and a few other lists) to create a master list of blocks that we needed to send back. This significantly cuts down on incremental backup time. Some companies call this "deduplication" but I feel that is disingenuous -- to me deduplication is on the storage side and would span across all backups.

So yes, requiring a module is much easier than telling a customer they can't trial or use the product until they take their production system offline and reformat it with LVM. Many people hated LVM at the time; it was considered slow and caused performance problems -- this was like 8 years ago. LVM has vastly changed and does not get these types of complaints any more. But I can tell you people would still scream bloody murder if we told them they had to redo their production images and redeploy a fleet of 200+ servers just to switch to LVM so they could get a decent backup solution.

Also shout out to aseipp! Miss working with you. Have yet to find a bug in the code you wrote :p

Most Linux device drivers are kernel modules, so I suppose most of the stories I have would be of the form "I needed to write a device driver for X" :-). Userspace drivers, while certainly in existence, are not very popular in Linux land.

Other interesting stories would include:

- Instrumenting certain filesystem operations (all modules share the same memory space; it's possible for one module to take a sneak peek at another's internal structures). This was back before dtrace & friends were a (useful) thing.

- Real-time processing of instrumentation samples. Doing it in the kernel allowed us to avoid costly back-and-forth between user and kernel memory -- but we only did relatively simple processing, such as scaling, channel muxing/demuxing and the like. If you find yourself thinking about doing kernelside programming because carrying samples to and from userspace is too expensive, you should probably review your design first.

> Userspace drivers, while certainly in existence, are not very popular in Linux land

Actually, that's not true at all! Just look at Mesa3D and any number of GL proprietary blobs from graphics vendors. Userspace drivers are actually quite common, they just aren't what folks think of immediately when they think about 'drivers'.

A few years ago, I worked at a company that had hardware peripherals connected to a master CPU card running Linux through RS-232. One of the peripherals was a power supply that supplied power to another peripheral. Since they were architected as slaves, the peripherals couldn't talk to each other directly; they had to have their messages relayed through the master CPU.

This whole communication process had to happen within a small time frame, something like 50 ms. A kernel module handled the communication, decoded the packets, and automatically relayed peripheral messages to other peripherals. The kernel module made it easier to achieve stable and real-time operation within the time constraint.

When writing the same function in userspace, there was absolutely no guarantee that messages would be sent or received in time.

I modified a kernel module to use the wifi enable/disable button on my HP laptop as a normal button. For some reason the BIOS sends an ACPI event when the button is pressed, instead of just having it as a normal keyboard key, so there is a kernel driver to handle it.

I didn't end up writing the kernel module, so this might be an anti use case:

I was working on an embedded system and needed fast I2C access (mostly I just wanted very low latency, because it's an instrument). I2C from Linux userspace (using ioctls) adds a lot of overhead. I started looking into kernel modules, but after a day of research I found out that you can access hardware registers from userspace using mmap("/dev/mem"), which is even faster than a kernel module.
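A minimal sketch of the technique, under the usual assumptions: you run as root, you pass "/dev/mem" and the peripheral's page-aligned physical base address (the helper name here is made up), and you accept all the risks of poking live hardware:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a physical address range into this process. With
 * path = "/dev/mem" and base = the peripheral's physical base
 * (must be page-aligned), the returned pointer gives direct
 * register access with no syscall per transaction. */
volatile uint32_t *map_registers(const char *path, off_t base, size_t len) {
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

The `volatile` qualifier matters: it stops the compiler from caching or reordering register accesses.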


> mmap("/dev/mem")

but that would only work if some kernel module or the kernel itself hasn't mapped that IO space, right?

I don't think mmap stops you from mapping the same memory twice, nor does Linux. In my case I didn't even disable the system I2C driver. Of course, if I tried to use it I would get I/O errors, and if I am not careful I can crash the kernel, but that's what you get for poking around. This was a Raspberry Pi.

Yeah, that's what I meant since i2c is a serial protocol and most drivers assume mutual exclusion on device registers.

This was a super-long time ago, but a kernel driver was needed for video capture back in the 90's, to control devices across I2C, etc. At the time I had a Matrox video card which did analog capture, decoding and TV-in, and so wrote the first few iterations of a kernel driver[1] to get it to work. It since got adopted by a great maintainer. Nowadays I'd guess nobody is using it because the hardware is ancient.

1: http://www.cs.brandeis.edu/~eddie/mga4linux/

Hardly a problem per se, but I was bored on a long train trip (7+ hours) and I just happened to have (on purpose) an Xbox One controller around, so I wired it in, hacked around in userland with libusb, then wrote a kext for the thing to appear as a proper HID device on OS X.

Two lessons learned:

1. dealing with tangible things for once is incredibly satisfying.

2. next time, download the fscking docs instead of relying on an intermittent 3G connection.


I wrote one so that I could set hardware breakpoints and use them to extract decrypted data from an app which was heavily protected against debugging. The thing was that the app (running as an unprivileged user) had no way of knowing what I was doing in the kernel (uprobes can be used for this too, providing the code doesn't checksum itself). But I guess reverse engineering is not a common use case for writing Linux kernel modules though :)

Maybe not common, but super interesting if you want to share more.....

I guess that kernel modules are not commonly used for RE because in general there are better tools that support emulation, symbolic execution, dynamic instrumentation and so on (such as qira, vtrace, angr, Intel Pin and more; there are many articles about them). I just wanted something simple and portable to extract/modify registers/memory pointed to by registers/unwind the stack when a given instruction was executed, while being as fast and transparent as possible, so I wrote a kernel module... and while it worked (hook mmap via kprobes to get the base address + register_user_hw_breakpoints + printk to get output), using it was a lot of manual work, and I'm sure the tools I listed above could be more efficient. Also, since there was no emulation, I had to spoof opening /dev/urandom, which you don't have to do if you're using qira.

It'd be great if my small module could be published as is, but I'd need to strip all addresses/filenames, and also add more proper locking (but yes, I want to publish it anyway).

Some years ago I was working on an embedded platform with two processors connected via ethernet, one handling data and most of the application load, the second one handling some automotive buses (CANbus and ARCnet). I wrote a kernel driver to expose raw data coming from the ARCnet to userspace applications in the most transparent way. So I labeled data coming from the second CPU with an unused ethertype, and changed the ethernet driver to inspect the ethertype of incoming data and send it up the network stack as if it had been received from a certain fake ARCnet device. The fake ARCnet device was a dummy module that allowed userspace to receive that data via normal sockets. It could probably be done in a better way, but it was quite simple and it worked well.

I wrote one that decodes Wiegand[0] off of GPIO. I ended up not using it because epoll-ing the pins is easier to use from a userspace perspective and fast enough to capture the transmission. I loosely based the kernel module off of one that was written for another microcontroller[1].

[0]: https://en.wikipedia.org/wiki/Wiegand_interface

[1]: https://github.com/rascalmicro/wiegand-linux
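For anyone unfamiliar with the protocol being decoded here: Wiegand uses two data lines, D0 and D1, and each pulse on a line shifts in one bit (0 or 1 respectively; 26-bit frames are the classic format). Whether the pulses arrive via epoll on a GPIO value file or via an in-kernel IRQ handler, the frame assembly itself is tiny -- a sketch (names are illustrative):

```c
#include <stdint.h>

/* Accumulates one Wiegand frame, MSB first. Each GPIO edge on D0
 * shifts in a 0 bit; each edge on D1 shifts in a 1 bit. A real
 * decoder would also reset on an inter-bit timeout to frame the
 * 26 (or 34, etc.) bit transmission. */
typedef struct {
    uint64_t bits;
    int nbits;
} wiegand_frame;

void wiegand_pulse(wiegand_frame *f, int line /* 0 = D0, 1 = D1 */) {
    f->bits = (f->bits << 1) | (uint64_t)(line & 1);
    f->nbits++;
}
```

The timing sensitivity mentioned in the replies comes from that framing timeout: miss an edge because userspace was scheduled out, and the whole frame is garbage.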

I did the same but kept it in-kernel to meet Wiegand timings. I had another (horrible) driver which could hang the system for long enough to lose bits.

I was in Australia for that work too...

I've written simple Linux Security Modules, as learning experiences, but also to help tighten up a bunch of virtual machines.

The one that I'm most proud of delegates decisions on whether executables can be run to .. userspace. Which is simultaneously evil and brilliant.

i.e. User "foo" tries to run "/tmp/exploit" and the kernel executes "/sbin/can-exec foo /tmp/exploit". If the return code of that is zero then the execution is permitted otherwise it is denied.

This gives you ample opportunity to log all executables, perform SHA1-hash checks of contents, or deny executables to staff-members outside business hours. There's a lot of scope for site-specific things, creativity is the limit!
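In the real module the kernel side would invoke the helper via the usermode-helper machinery from an LSM hook; as a rough userspace analogy of the decision logic (the helper path and its arguments are taken from the description above, and `exec_permitted` is a name I made up):

```c
#include <stdlib.h>

/* Analogy of the LSM hook's decision: run a helper command
 * (in the module, something like "/sbin/can-exec foo /tmp/exploit")
 * and permit the exec only if the helper exits 0. */
int exec_permitted(const char *helper_cmd) {
    int status = system(helper_cmd);
    return status == 0; /* zero exit => allow, anything else => deny */
}
```

Because the policy lives in a userspace script, it can do arbitrarily rich checks (hashes, time of day, per-user rules) without touching kernel code again -- at the cost of putting an upcall on every execve.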

I think it would be much appreciated if this software could be released as open source.

How did the user space redirection handle potentially high spikes of execve's?

And btw, thanks for all the years of debian-administration.org.

Yes, I wrote one a long time ago that exposed basic counters from the ARMv7 PMU to userspace -- where they are normally not accessible unless used through perf_events. This is because perf wasn't available (I think the shitty old vendor BSP I was using couldn't support it for some reason) and it's relatively costly (perf_event_open is a whole syscall + another syscall to `read` off the event info) -- I just wanted some basic cycle timings for some cryptography code I wrote to see what worked and what didn't.

Even though it has some annoying gotchas (such as the fact ARM cores can sleep/frequency scale on demand with no forewarning, meaning cycles aren't always the most precise units of measurement), and is very simple -- this thing ended up being mildly popular. Even though I wrote it years ago, someone from CERN recently emailed me to say they happily used it for work, and someone from Samsung ported it to ARMv8 for me...

(I should dust off my boards one of these days and clean it up again, perhaps! People still email me about it.)

Usually to control on-SoC peripherals/IP blocks or to add support for other external devices (that's what kernel's for, mostly, anyway) - camera sensor, I2C controlled regulator, thermal sensor,... story? no. :D I just wanted to have a nice non-amd64 SoC based desktop, and some things were missing at the time.

My real-world use case is a hypervisor. That turns out to be some pretty wild code. Intel's VMX documentation is not well-organized or complete, processor models vary, the hardware has bugs, and some things we have to do aren't even officially supported by Intel. We have to mess with page tables. We have a huge block of assembly code. It all has to run perfectly.

In case that doesn't make you want to run away screaming, I put a post in the "Who is hiring?" article:


i was working on an embedded device with only RAM and a flash chip for the software blob.

i wrote a kernel module that allowed us to keep logs in RAM even across a soft reboot (i.e., if the device crashes or does a software update). it basically reserved a chunk of 1M at the top of RAM (using the same physical address each time). there was a checksum system that allowed it to tell during boot whether the data is still valid or if the bits had decayed.
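the boot-time validity check described here amounts to recomputing a checksum over the reserved region and comparing it to the stored one. the actual layout and algorithm aren't given, so this sketch just assumes a header plus a trivial polynomial checksum:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout of the reserved RAM region: a header with a
 * checksum and length, followed by the log data. After a soft
 * reboot, the check decides whether the contents survived intact
 * or the bits decayed. */
typedef struct {
    uint32_t checksum;
    uint32_t len;
    uint8_t  data[1024];
} ram_log;

static uint32_t log_checksum(const ram_log *log) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < log->len; i++)
        sum = sum * 31u + log->data[i];
    return sum;
}

/* called at boot, before deciding whether to preserve the buffer */
int log_is_valid(const ram_log *log) {
    return log->len <= sizeof(log->data) && log_checksum(log) == log->checksum;
}

/* called after every append, so a crash at any point leaves a
 * consistent (sealed) or detectably stale region */
void log_seal(ram_log *log) {
    log->checksum = log_checksum(log);
}
```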

i've also done a couple i2c drivers for temperature sensors or LED controllers. kernel work is fun.

Not for Linux per se but I've been developing a variant of CAN(+FD) kernel extension for XNU/OSX. Some things are manageable through userland so not everything is done in kernel but extending berkeley sockets is really fun and I'd suggest anyone interested to try it.

Replying after checking my notes from the University (NTUA): around 14 years ago (~2004), one of the exercises we had in the Operating Systems Lab was to implement a char device driver for Linux, called ... lunix, as a kernel module. The actual device was just a bunch of bytes in memory.

These times were so happy - implementing a device driver seemed like owning the world for a student!

If anybody wants I'd be happy to put the source code in github - mainly for historical reasons because I'm not sure that the code will be still working today.

There is a similar exercise in "Linux Device Drivers." The first driver they ask you to write is a char device that runs in memory called "scull."

Here is a link to the free book (warning it's a pdf):


And here is a link to a github project with all exercises updated for the most recent kernel:


Ha ha "scull" ringed a bell and after taking a look at the cover of the book you mention (it's not in the pdf you provide but can be found if you search for images for Linux Device Drivers oreilly) and see the horse ... I totally remembered the book!

It was one of the references I used while implementing the module, and it was a really good and comprehensive book - totally agree with the recommendation :)

Please do. One of the issues for tfa is how little it actually covers. While it quite fairly says module development is more like writing an API than an application, it then proceeds to entirely ignore any hooks other than init and exit!

Here you go:


Notice - I tried compiling it but was not able to (and I don't have time to research modernizing this code).

I'll try adding it tomorrow to github (it's a little late here right now). In the meantime, I'd like to also recommend the book "Linux Device Drivers" that fellow commenter magpi3 mentioned.

>> A Linux kernel module is a piece of compiled binary code that is inserted directly into the Linux kernel, running at ring 0, the lowest and least protected ring of execution in the x86–64 processor.

The author seems to be implying that rings are implemented at the processor level for x86-64 processors. If I’m interpreting the wording correctly that’s interesting! Coming from the ARM world I’d always thought that rings were an OS construct.

Most chips have different privilege levels... ARM processors have 'modes' like User, FIQ, IRQ and Supervisor (sort of the equivalent to ring 0). Different modes can have different access controls, e.g. varying access to memory.

Edit: See http://www.heyrick.co.uk/armwiki/Processor_modes

Right - out of curiosity do you know if the Linux kernel uses the term "ring", or some other terminology to map to the underlying hardware implementation, be they rings for x86 or modes for ARM? Maybe it's just "privilege level" or something similar?

Gustavo Duartes has a really great blog[1] about kernels and low-level development, with Linux on x86/amd64 being the reference. Here[2] is the one where he goes into CPU rings and kernel- versus user-mode.

1. http://duartes.org/gustavo/blog/archives/

2. http://duartes.org/gustavo/blog/post/cpu-rings-privilege-and...

The Linux kernel just talks about "kernel mode" and "user mode" (or "userspace"). Those then map to ring 0 and ring 3 on x86, or other mechanisms on other platforms.

Cool, thanks!

ARM has modes which is somewhat analogous to x86 rings.

The genesis of hardware protected modes in x86 devices goes back to 1982 and the 80286.

Kernel newb w/ a question on this. When you register the device in the module, do you also have to create the device file in userspace with mknod?


You never need to use mknod yourself anymore. Based on the metadata you provide to the kernel, the kernel will create the device itself in devtmpfs, and even set appropriate baseline permissions (e.g. root-only, world-readable, world-writable). udev will then handle any additional permissions you might need, such as ACLs or groups.

Out of historical interest, why was it necessary before? Was it an intentional separation between userspace/kernel responsibilities?

A few reasons. Prior to the existence of udev, all devices had to be created with mknod and have permissions set; a script called "MAKEDEV" did that. That didn't handle dynamically created devices, so (skipping "hotplug" and an old attempt called "devfs" which was filled with race conditions) the kernel started sending events to userspace for dynamic devices, and udev created those. Then, it made sense to "coldplug" all the existing devices and have udev create those, to use the same unified path for everything, and use udev to set groups and permissions. But that did take some time at boot time, and it meant that you needed either udev or a manually created /dev for tiny embedded systems. devtmpfs handled that use case, automatically creating devices from the kernel. But then, if the kernel knows how to create devices, why not have it do so all the time? So, udev jettisoned the code to create devices itself, and started requiring devtmpfs.

netcat released their album as a linux kernel module [0]. Browsing their code might be a more fun way to learn how to write a kernel module with a real "use case".

0: https://github.com/usrbinnc/netcat-cpi-kernel-module

Kernel code does not operate at "incredible speed". This guy doesn't know what he is talking about.

You are correct about the speed, but the short response combined with the unjustified, overly general attack on the author's knowledge is unnecessary.

This is somewhat unnecessary, but this article is spreading a false claim about kernel mode's "incredible speed", and that should be denounced.


At least virtual memory impacts performance, which is one thing user space code uses

Kernel code also uses virtual memory. There is no such thing as using "physical memory" unless you are running in real mode. CPU is switched to protected mode during boot.

It depends on what you're doing. Eliminating context switches can be hugely valuable. I agree the author should have been more nuanced, but, he's not flat-out wrong.

That's actually true. There might be some more initialization overhead if you exec a process (the libc stuff), but non-kernel (i.e. user space) code is not slower per se.

In the device_read function,

  len — ;
should be

  len --;

Probably a CMS thing. printk is probably also discouraged, these days. pr_warn/info/..() or dev_warn/info/err...() should be used.

Also for anyone writing kernel code, this is indispensable: http://elixir.free-electrons.com/linux/latest/source

It doesn't exactly apply here since the author doesn't seem to be allocating resources dynamically, but another thing that's "newer" and less widely used are the devm_ family of functions.

Nowadays, lots of code could be using devm_kmalloc, devm_ioremap, etc which will release the resources automatically when the driver detaches from the device.

Copy-paste trap!
