Ask HN: Why doesn't the Linux kernel have unit tests?
240 points by develatio on Nov 25, 2022 | 238 comments
Linux is present in a large percentage of the devices on the planet, from smartphones to servers to IoT devices to (...). It's fair to assume that bugs and/or regressions in it can (and will) affect big portions of these devices. This makes me wonder why there aren't any unit tests in the kernel (and in the different drivers in it, especially file system related ones). Or maybe there are and I just haven't found them?



To add to all the excellent answers in this thread: unit tests are massively overrated.

It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.

What is the external interface of the kernel? What is its surface area? A kernel is so central and massive the only way to test its complete contract with the user (space) is... just to run stuff and see if it breaks.

TDD has some good ideas, but for a while it had turned into a religion. While tests are great to have, a good and underrated integration testing system is simply having someone run your software. If no one complains, either no one is using it, or the software is doing its work. Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

https://www.youtube.com/watch?v=EZ05e7EMOLM


Unit tests are not the same thing as the TDD religion. At its base, unit testing just means dividing things up into small units and testing their functionality as exhaustively as possible. It's divide-and-conquer.

The hard part in adding unit tests is deciding what a unit is, and when a unit is important enough that it should have its own battery of tests. Choosing the wrong boundaries means a lot of wasted time and effort testing things that likely won't break, or that change so fast the tests put a drag on refactoring.

I disagree that a kernel can't or shouldn't be unit tested. At the very least, it has a strong interface in the userspace system calls. Of course you should unit test the system calls. Especially because Linux's motto is not to break userspace.
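
To make that concrete, here's a minimal userspace sketch (my own, not from any kernel test suite) of pinning down a slice of the read(2)/write(2) contract through a temp file. It's arguably an interface test rather than a kernel-internal unit test, but it's the kind of observable behavior you'd want locked down:

    #include <assert.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[16];
        int fd = open("/tmp/rw_contract_test", O_CREAT | O_TRUNC | O_RDWR, 0600);
        assert(fd >= 0);

        assert(write(fd, "hello", 5) == 5);       /* write reports bytes written */
        assert(lseek(fd, 0, SEEK_SET) == 0);
        assert(read(fd, buf, sizeof(buf)) == 5);  /* read returns what was written */
        assert(memcmp(buf, "hello", 5) == 0);
        assert(read(fd, buf, sizeof(buf)) == 0);  /* reading at EOF returns 0, not an error */

        close(fd);
        unlink("/tmp/rw_contract_test");
        return 0;
    }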

The other benefit of unit testing that is overlooked is that it accelerates optimization. If you have a good set of unit tests that test all observable behaviors of a system, you can optimize the logic inside and constantly rerun those tests to make sure it's not broken. This speeds up the experimentation process and clearly delineates the interface so that you can see when and how you can "cheat" on things that aren't observable.

So, hard disagree. Test the bejesus out of the kernel. It will harden it and open up the ability to hyperoptimize the units inside.

edit to add: well-written (i.e. terse and readable) tests are excellent documentation on how a unit should behave.


I both agree and disagree. There are PARTS of the kernel that probably should be unit tested. As I specified in my answer, these tend to be algorithmic. I would also agree that an interface that cannot change ("don't break user space!") possibly makes a good candidate.

However, I think there are parts that probably should NOT be unit tested, because unit testing isn't free and slows down change by getting very close to internals. As an example, device drivers that are tied tightly to hardware. While it would sound nice to have an nvidia GPU fake/mock, in reality it is probably not possible to create one, keep it up to date, etc. The complexity would be enormous and might not show anything as you wouldn't know if it was a bug in the fake or driver.

As such, I retain my answer of: it depends what you are trying to accomplish. Different testing strategies for different types of code and domains.


There are some gray areas, sure. I usually find a way around making mocks, but maybe it's just the domains I am in. My biggest mistakes are mostly just writing tests that are too intertwined with internal interfaces. So they break all the time and need to be adjusted when refactoring. That's a drag.


I think the same is true of any sufficiently complex project. There are areas where unit tests are valuable, and areas where the costs outweigh the benefits. There isn't anything really special about a kernel in that regard (except for the specific things that aren't worth unit testing).


In a commercial setting, unit test coverage inevitably becomes a KPI that needs to be driven upwards regardless of whether rank and file engineers consider it worthwhile in particular cases.


> As an example, device drivers that are tied tightly to hardware. While it would sound nice to have an nvidia GPU fake/mock, in reality it is probably not possible to create one, keep it up to date, etc. The complexity would be enormous and might not show anything as you wouldn't know if it was a bug in the fake or driver.

Ok, but then again: How would you make sure your Nvidia driver is working correctly?


> If you have a good set of unit tests that test all observable behaviors of a system, you can optimize the logic inside and constantly rerun those tests to make sure it's not broken

This is a naïve utopian world view. The word "good" is doing all the heavy lifting for you. In the last 15 years I haven't seen a single company that had "good" unit tests. I'm inclined to believe they don't exist.

Some of the tests I saw turned out to be good. Most of them weren't.

The reality is that if you're committing to unit tests, a big chunk of your tests will be shit. And when that's the case, accelerated optimization is far from guaranteed.


Your comment sounds like TDD religion to me.

"Deciding what a unit is" isn't the hard part. The hard part is finding the units that benefit from unit tests (and convincing other religious people about this, as I am trying to now, which in this case - not many).

The Linux system calls are "deceptively simple". Simple to test, right? How complicated can, say, `write(2)` be? But if you actually tried doing it, I'd be surprised if you could write reliable "unit" tests beyond writing to /dev/null.

The practical way to test system calls is with real-world usage. Unit tests here might catch some problems, but the vast majority of kernel bugs aren't ones that can be caught with unit tests. (If they're bad enough, the bug will cause the OS to crash before it completes booting, no unit tests needed.) The more subtle bugs are often hard to reproduce, only happening under certain loads and on certain hardware.

As to your final edit-to-add: you're joking right? Like Linux syscalls need more documentation. POSIX (I'm aware this is not exactly Linux documentation, but still) was a spec literally before Linux was started.

edit PS: https://tenor.com/view/unittest-unit-test-gif-10813141


> Your comment sounds like TDD religion to me.

GP is using the original definition of unit test, GGP is using the religious implementation-level definition TDD advocates have come up with.

From GGP:

> but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.

They call this an integration test, but by the original definition, testing the interface is a unit test. An integration test is how multiple units interact with each other. Testing the implementation details is what TDD advocates have turned unit testing into, by changing the definition of "unit" from semantics (a self-contained module that does one thing from the business perspective) to syntax (a function).

> edit PS: https://tenor.com/view/unittest-unit-test-gif-10813141

Seems like you agree with the original definition. The TDD version of this GIF would be a test for each part that makes up the handle (the implementation details instead of the interface).


The semantics don't really matter either way. The comment I originally replied to basically implied the whole Linux kernel is one "unit" (or at least each kernel syscall is one unit... which isn't much different, given the large interdependencies between them).

Using that definition, that's just basically saying "the kernel can benefit from ... tests". Of course the Linux kernel doesn't have "TDD-religion unit tests", but there's already more than enough "tests" for the Linux kernel, and there's more than enough benchmark tests for the Linux kernel (for the optimization argument). They're just not "unit" tests in any meaningful way.

From the pedantic side of things, the original comment said "(TDD-religion) unit tests are overrated", then the reply was either "no they're not", or "no, unit tests (as originally defined) are not overrated". The latter is somewhat inconsistent with the "hard disagree", so either they were confused as to which definition they wanted, or cherry picked properties from both unit testing regimes.


I don't know how many LOC you could trigger with a write. Including all drivers, filesystems, ..., perhaps millions. Hardly a unit test. And then, a driver is difficult to unit test. BUT some parts of the kernel could use unit tests. The thing is, they don't have them; the sheer usage of the kernel means those parts get exercised by usage anyway, but having tests wouldn't hurt.


Yeah even in the webdev world, the zeitgeist has shifted from

"write as many tests as possible, 100% code coverage, in fact 100% branching code coverage. Use an absurd amount of mocks to achieve this"

to

"write tests, not too many. mostly integration". (Guillermo Rauch tweet)

And thank god because 100% code coverage always felt like an exercise in obedience to dumb process over good judgement.


To add to Guillermo's mantra, a good suggestion from the video I posted is "write tests exercising the internal logic before a big refactor. Then remove them."

Because otherwise, the more tests you have against implementation details, the more times your tests will break just because you've moved some code around. Tests need to go red when there is a bug, not because you've renamed a couple of private methods and now the mocks are broken.


Depends a lot on what it is you're doing. Like if you're writing some finicky library code, exotic algorithms or data structures or whatever, you'll probably want more test lines than code lines because that sort of stuff is notoriously difficult to get right. Even something as pedestrian as a binary search is downright gnarly to get working in all cases[1]; the bugs are difficult to reason about and the code doesn't reduce to more comprehensible primitives.

On the other hand, if you do that with application code, you're essentially submerging your code in a tar pit. Any change in behavior will require changing dozens of tests. Fixing broken tests will become so routine that they stop telling you anything about your code. Development will become "I changed from foo to bar, now let's update the 73 assertions that failed to whatever value the test runner says they have now".

[1] https://stackoverflow.com/a/6393352
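
A rough C sketch of the classic traps people usually mean here - the int midpoint overflow and the off-by-one bounds. This is illustrative, not taken from the linked answer or any particular codebase:

    #include <assert.h>

    /* search for key in sorted a[0..n), return index or -1 */
    static int binary_search(const int *a, int n, int key)
    {
        int low = 0, high = n;                /* half-open interval [low, high) */
        while (low < high) {
            int mid = low + (high - low) / 2; /* (low + high) / 2 can overflow int */
            if (a[mid] < key)
                low = mid + 1;
            else
                high = mid;
        }
        return (low < n && a[low] == key) ? low : -1;
    }

    int main(void)
    {
        const int a[] = { 1, 3, 5, 7, 9 };
        assert(binary_search(a, 5, 7) == 3);
        assert(binary_search(a, 5, 4) == -1);
        assert(binary_search(a, 0, 1) == -1); /* empty-array corner case */
        return 0;
    }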


To go farther: tests should not break during a refactoring. They're there to let you know if your refactoring broke something. They should even be green when you decide to scrap your software and rewrite it in another language.

Code coverage? Use it to determine what is dead code you can remove: if your tests don't go through some code, that code is useless.


I agree with the first part of your comment, but not the second. Achieving 100% code coverage is very hard and sometimes unreasonable. E.g. How do you test a GUI? I think that often it is better to write tests for the application logic and test a GUI by using it.


question:

How do unit tests stay green when you rewrite in another language? Are your tests an external API you call?


Rewrites are not super common -- but IMHO if you must do one, the first step is to move the tests over. Unless your goal with the rewrite is to rearchitect the interfaces, at which point you're basically greenfielding anyway.


This is simply not possible. If you move a method from one class to another, every unit test breaks.


If you’re using a lot of mocks you’re gonna have a bad time.

I think I can agree with the sentiment that mock heavy tests should be short lived, and why you don’t want to keep those around. Not sure I can agree with the rest.

I aim for max of two stubs per test, and many of those end up with a TODO. Pure functions need no stubs, and most well factored code should only need at most one. One stub one test is pretty sustainable. It’s easy to rewrite such a test if the requirements change. Unit tests should absolutely be disposable, but you don’t dispose of them all at the same time. Just the ones that don’t fit the new rules.

I do run into a steady stream of people who can’t seem to understand that the tests should affect the structure of your code. “And then a miracle happens” is what you have there - long intervals where the system produces no verifiable state or output are bad. That’s not an architecture. It’s a lack of one. A functional programming style makes this easier to avoid, but it’s not a cure, because the disease is in their heads; the code is the symptom.

There’s a substantial overlap between people with untestable code and people with undocumentable code. They can’t explain the code to the test framework any better than they can explain it to each other.

All that said, testing is hard. It shouldn’t be this hard and we need to keep looking for ways to improve that situation. But even here we have people who reach for the least expressive solution quite frequently, such as assert over more reflective matchers, which make for much more useful red tests. At least BDD style seems to be winning out.


I worked at one place that had so many mocks, your tests were barely testing any real code. In one case, a dude checked in a 5000 line test suite for an "internal API." The server was mocked. All the tests were doing was checking that calls echoed back their own parameters. What was the point? Well, the API client now had 100% coverage.
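
For anyone who hasn't seen this anti-pattern up close, here's a contrived C-flavored sketch (all names made up) of a "test" that can only ever verify its own fake:

    #include <assert.h>
    #include <string.h>

    typedef const char *(*send_fn)(const char *request);

    /* the mock: just echoes the request back */
    static const char *fake_send(const char *request)
    {
        return request;
    }

    /* "client" under test: forwards the request to whatever transport it's given */
    static const char *client_call(send_fn send, const char *request)
    {
        return send(request);
    }

    int main(void)
    {
        const char *resp = client_call(fake_send, "GET /users/42");
        /* this can never fail, and says nothing about the real server */
        assert(strcmp(resp, "GET /users/42") == 0);
        return 0;
    }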


Going into new segments of our code, I’ve had to rewrite an awful lot of tests. It was a mess, and each one had at least a few tests that were only testing the mocks. Hand written mocks at that. Those people need to be stopped.


This is 95% of tests I read and write at my job.


It wasn't that bad at a previous job, but it was close. The sad part is nobody else would comment on the useless nature of these tests that didn't actually test anything.


It’s very tricky to be against any kind of testing in a professional setting… it opens you up for other engineers to question your maturity, professionalism, and commitment to reliability. Or at least to look substantially more mature, professional, and committed to reliability than you. In front of management that can be death.

People with a lot of clout have absorbed the virtues of automated testing in general and applied it to unit testing in particular. It’s hard to swim upstream on that one.


It's true. I mean, I wouldn't comment on them either. I'd just roll my eyes when asked to review another 4000 line "test suite" that upped the coverage but did not test anything meaningful. I wrote many useless tests myself, assigned tasks like "upping test coverage for module XYZ." They'd all get thumbs up and looks-good in reviews.


To play devil's advocate here, what usually suffers the most in integration-only testing is error handling. In some codebases it's extremely important to test all the error handling paths because they are not extremely exceptional events. Many times, a disproportionate share of failures come from error handling not doing precisely the right thing, even though it looks reasonable upon code review.

It might not mean 100% coverage, but in these situations, ensuring comprehensive tests of all the error conditions (inducing OOMs and other resource limits) helps make sure the failure paths work just as well as the normal ones. Failure of a failure path can manifest at best in ways such as hiding the true source of an issue, telemetry problems, or logging statements without salient values; at worst, crashing because of a chain of events triggered by the seemingly innocuous error handling that passed a code review.

OOM modeling to test every code path that allocates in a function is the most tedious thing in a gtest, but often yields the most surprising, actionable fixes. Sadly, most programmers just ignore OOMs and hand-wave "if allocation fails, the system is broken and my software doesn't need to work correctly".
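
For the curious, the OOM-modeling idea boils down to something like this plain-C sketch (the shape is the same in gtest; alloc_or_fail() and build_report() are made-up names): fail the Nth allocation and check that every failure path backs out cleanly.

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    static int alloc_count, fail_at;

    /* fault-injecting allocator: fails the fail_at-th call */
    static void *alloc_or_fail(size_t n)
    {
        if (++alloc_count == fail_at)
            return NULL;
        return malloc(n);
    }

    /* code under test: two allocation sites, must clean up on either failure */
    static int build_report(char **out)
    {
        char *hdr = alloc_or_fail(32);
        if (!hdr)
            return -1;
        char *body = alloc_or_fail(128);
        if (!body) {
            free(hdr);
            return -1;
        }
        strcpy(body, "report body");
        free(hdr);
        *out = body;
        return 0;
    }

    int main(void)
    {
        for (fail_at = 1; fail_at <= 3; fail_at++) {  /* fail each site in turn, then none */
            alloc_count = 0;
            char *out = NULL;
            int rc = build_report(&out);
            assert(rc == 0 || out == NULL);           /* must fail cleanly or succeed */
            free(out);
        }
        return 0;
    }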


This. So many "do not cover" hints for exception handler blocks. So many lost root causes when the handler tries to format an error.


https://www.npr.org/2008/01/01/17725932/in-defense-of-food-a...

Just want to point out that advice is inspired by Michael Pollan's diet advice: "Eat food. Not too much. Mostly plants."


For the webdev world, I think a big reason for this change is that many web systems talk to so many underlying 3rd party services that oftentimes it's subtle, backwards-incompatible changes in those other services that break things. If you're only testing everything with mocks, you're never going to catch what will often be the worst breakages and bugs that your users experience.


I think better type systems, like the one in TypeScript, have made mainstream the use of types to remove some basic bugs. This makes some basic tests unnecessary, and gives more time to focus on the important ones.


So we’ve forgotten the lesson of the testing ice cream cone already. Good to know.


> Write as many tests as possible, 100% code coverage, in fact 100% branching code coverage.

In my eyes this still holds true. Software that has expected behavior should be tested to make sure that the behavior isn't broken due to changes to any of the involved hot code paths or data formats.

I once wrote a code library for some boring business system that handled integration between the system and a JWT library, to make sure that certain requests should be serviced. Using a library without tests wouldn't be acceptable (security related/adjacent code is perhaps the best example of such circumstances). Neither would my own code lacking tests, given that this library would be used across multiple services within the system.

Thus, I wrote tests until I got pretty close to 100% coverage and doing that actually helped me discover a few bugs while I was actually writing the tests! Not only that, but once the need to refactor something arose due to changing requirements, the tests breaking told me exactly what I had overlooked while doing those changes. Not only that, but if I'm long gone and someone comes to make changes to the library, the CI will tell them about the things they might overlook themselves, aside from any boring Wiki that they wouldn't read or other docs. The tests also demonstrate all of the ways how the code can actually be used, so aside from the occasional code comments, they also serve as living documentation.

There absolutely are cases where testing something won't be viable (e.g. code whose behavior depends on the file systems and runtime installed on the machine, where all you get is a leaky abstraction in front of them and your test setup doesn't contain every covered platform - for example, checking which file paths are parsed as valid and which aren't across different file systems on different platforms), but in most systems those cases are not the majority.

> Use an absurd amount of mocks to achieve this

You also hit the nail on the head here - this is a problem and a symptom of us perhaps developing all of our systems wrong. The main reason for not writing tests (one that I can understand) is the fact that it's not easy to do so. You end up with various mocking frameworks and libraries that try to take away some of the pain caused by the fact that your entire system is not testable, but end up with more complexity to dance around in the end.

I think the only way around this is to do data-driven design that's coupled with functional programming in ample amounts, with as many pure functions as you can get. This would be completely un-idiomatic for many of the languages out there (e.g. those that rely on injecting services/repositories/whatever in fields, instead of passing everything a function needs in the parameters), but it is also the only way to make testing easier. Maybe passing interfaces to "services" (many seem to use the service pattern) would be wrong, and instead you'd need to pass in the separate methods that your code will use. So instead of passing in UserService you'd pass in UserService::getUserById - a rough sketch of what I mean follows below.
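
In C terms it looks roughly like this, with a function pointer standing in for "pass the method, not the service" (all names hypothetical):

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    struct user { int id; const char *name; };

    /* the code under test depends on one lookup function, not a whole service */
    typedef const struct user *(*get_user_fn)(int id);

    static const char *greeting_for(int id, get_user_fn get_user)
    {
        const struct user *u = get_user(id);
        return u ? u->name : "stranger";
    }

    /* in a test, a tiny pure stub replaces the real lookup */
    static const struct user fake = { 42, "Ada" };
    static const struct user *stub_get_user(int id)
    {
        return id == 42 ? &fake : NULL;
    }

    int main(void)
    {
        assert(strcmp(greeting_for(42, stub_get_user), "Ada") == 0);
        assert(strcmp(greeting_for(7, stub_get_user), "stranger") == 0);
        return 0;
    }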

So in a sense, it's a struggle to find the balance between code that is absolutely untestable without descending into mock hell (to the point where you're testing mocks and your tests are useless) and code that goes fully against how things are done in any given language and the frameworks you'll use within it, probably ending up with more code meant for decoupling those parts than you have the time to maintain.

> write tests, not too many. mostly integration

In an imperfect world, I guess we can just pretend that this is okay, because it will give you the most results, compared to the amount of work you need to put in. At the end of the day, nobody wants to pay 10x more for systems that are nearly perfectly tested, they just want something that is vaguely decent and will accept hand-wavy apologies for everything constantly breaking, as long as the breakages are small and non-critical enough. Devs also don't seem to typically enjoy writing tests, in part due to some systems not being testable easily, but also because of many tools, in particular mock frameworks and even integration testing tools (like Selenium, which thankfully has more and more alternatives), just being unpleasant to use.


Completely agree. Integration tests over unit tests all day.


Are your integration tests able to run quickly/reliably enough to include them in your ci pipeline?


Yes, and in my experience from the Ruby world integration tests are often as fast or faster than unit tests.


Yup. Very much so. In Java. Can't speak for other languages.


> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

Maybe not, but isn't it true that any new code that is being added to the kernel has been run on exactly 0 devices? And new code is being added all the time.

Maybe that's why the most recent commit as of right now is loaded with fixes of broken things: https://github.com/torvalds/linux/commit/08ad43d554bacb9769c...

It's one thing to say that it's impossible to test effectively locally before release (which I'm not sure is true or not). But you're saying it's just not worth testing because it'll break in real life and that's even better, which I'm not sure I can agree with.


That merge only has fixes because of how the development model works: there’s a merge window where new features are merged and which then becomes -rc1. After that only fixes are allowed until the final release of that version, and then the merge window opens again.


That makes sense, so I just picked another commit at random without a -rc tag:

https://github.com/torvalds/linux/commit/c60c152230828825c06...

  The fix is to change && to ||
This seems like the exact type of bug that a unit test could prevent.
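
Not the actual driver code from that commit, but a contrived sketch of how a flipped condition sails through review and not through a test:

    #include <assert.h>
    #include <stdbool.h>

    /* correct version; the buggy one used && and therefore never fired */
    static bool out_of_range(int v, int lo, int hi)
    {
        return v < lo || v > hi;
    }

    int main(void)
    {
        assert(out_of_range(-1, 0, 10));   /* fails immediately if && is used */
        assert(out_of_range(11, 0, 10));
        assert(!out_of_range(5, 0, 10));
        return 0;
    }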


> Maybe that's why the most recent commit as of right now is loaded with fixes of broken things

Are you saying that if Linux had tests, there would be no bugs?


I think that's a bit of a strawman. I don't think anyone says tests prevent all bugs. They can help fix some parts of the contract down and give you a place to check against regressions.


GP pointed to a bug fix commit as proof that the kernel would benefit from having tests, so it's not that much of a straw man.


Isn't SQLite a counterexample? They have more test code than actual shipping code. https://www.sqlite.org/testing.html


SQLite doesn't have hardware drivers (which IIRC are still the majority of kernel bugs) that need actual hardware to test (hardware mocks are at best mildly useful, because real hardware can be... weird)

And unit testing is relatively useless; what would be useful are end-to-end tests that start with the userspace API, but that's a much harder task, although hopefully those tests wouldn't need to be changed that often.


I work as a software developer in a hardware company. We have A LOT of tests.

But testing costs a lot more when there is hardware present. It's a whole new dimension in your testing matrix. You can't run on cloud provider CI. Hardware has failures, someone needs to be there to reboot the systems and swap out broken ones (and things break when they run hot 24/7). And you need some way to test whether the hardware did what it is supposed to do.

Although some kernel driver developers have testing clusters with real hardware (like GPU drivers devs), there is no effin way Linux could be effectively tested without someone paying a lot of money to set up real hardware and people to keep it running.

Of course the hardware can be simulated but simulation and emulation are slow (and potentially expensive). At work we run tests on simulation for next gen hardware which doesn't exist yet. It is about 100,000x slower than the real deal, so it's obviously not a solution that scales up.

Testing purely software products without a hardware dimension is so much simpler and cheaper.


I think people have forgotten or indeed never learned in the first place that testing terminology borrows from physical device testing culture. A “fixture” for instance is a mock environment that you place a piece of hardware into to verify it.

Hardware testing takes up a pretty sizable portion of the development cycle. Why do we think we are special?


Kind of this -- I can't imagine any hardware company moving forward without fixture test flows of all sorts and a lot of effort put into them. When teams revolt against testing, it really seems like an area where software tries to stand on the edge of "art", forgetting that there is a science and craft needed.


> Although some kernel driver developers have testing clusters with real hardware (like GPU drivers devs), there is no effin way Linux could be effectively tested without someone paying a lot of money to set up real hardware and people to keep it running.

Just crowdsource this testing: I am sure that there exist some people who own the piece of hardware and are willing to run a test script, say, every

* night

* week

* new testing version of the kernel

(depending on your level of passion for the hardware and/or Linux). I do believe that there do exist a lot of people who would join such a crowdsourcing effort if the necessary infrastructure existed (i.e. it is very easy to run and submit the results).


I think you are vastly overestimating the willingness and capability of volunteers and underestimating the effort needed to coordinate such an effort.

And having any kind of manual intervention required will almost certainly reduce the reliability of the testing.

This is further complicated by the need to reboot with a different kernel image. Qemu and virtual machines can't do all kinds of hw testing needed.

And in fact, the kernel is already tested like this. Just very irregularly and sporadically. The end users will do the field testing and it is surprisingly effective in finding bugs.


> I think you are vastly overestimating the willingness and capability of volunteers and underestimating the effort needed to coordinate such an effort.

> And having any kind of manual intervention required will almost certainly reduce the reliability of the testing.

Perhaps I am underestimating the necessary effort, but the willingness and capability problem can in my opinion be solved by sufficiently streamlining and documenting the processes of running the test procedure.

If the testing procedure cannot be successfully run by a "somewhat experienced Linux nerd", this should be considered a usability bug of the testing procedure (and thus be fixed).


I thought they already had clusters like this. Maybe I’m thinking of FreeBSD?

Dogfooding of builds is also testing, just a different kind.


The NixOS community would be perfect for this, since "nothing" (I wouldn't wanna do experimental filesystems) can break my system; I just atomically roll back to my previous generation and report it broken :)


You can buy TH3 Testing Support for SQLite. Sole item on pricing page (https://www.sqlite.org/prosupport.html) that has "call" as a price. From their page: "The TH3 test harness is an aviation-grade test suite for SQLite. SQLite developers can run TH3 on specialized hardware and/or using specialized compile-time options, according to customer specification, either remotely or on customer premises. Pricing for this services is on a case-by-case basis depending on requirements."

SQLite Test Harness #3 (hereafter "TH3") is one of three test harnesses used for testing SQLite. TH3 meets the following objectives:

- TH3 is able to run on embedded platforms that lack the support infrastructure of workstations.

- TH3 tests SQLite in an as-deployed configuration using only published and documented interfaces. In other words, TH3 tests the compiled object code, not the source code, thus verifying that no problems were introduced by compiler bugs. "Test what you fly and fly what you test."

- TH3 checks SQLite's response to out-of-memory errors, disk I/O errors, and power loss during transaction commit.

- TH3 exercises SQLite in a variety of run-time configurations (UTF8 vs UTF16, different page sizes, varying journal modes, etc.)

- TH3 achieves 100% branch test coverage (and 100% MC/DC) over the SQLite core. (Test coverage of extensions such as FTS and RTREE is less than 100%).


> tests the compiled object code, not the source code

That's not something you see called out very often. Correct code fed to a broken compiler can definitely give you a broken binary. Likewise a correct binary will pass on a correct simulator and may fail on broken hardware.


I would note here that SQLite gets reasonably close to the Unix Philosophy. It’s not really a feature factory sort of application. That does make it quite a bit easier to write good tests. Applications that don’t know what they are or what they do are difficult to pin down.

I take that as an argument not for or against tests but against making it easy for management to turn your project into a feature factory.


A project as large and well funded as the Linux kernel could have a hardware test farm with at least reasonable coverage of popular hardware. "But Linux isn't that well funded!" Sure, but it's orders of magnitude better funded than Embassy[0], which runs tests on real hardware automatically before every merge.

There's also the Linux Test Project[1], which is technically third party. It's not clear to me how extensive it is, but for a project as important as Linux I think it has to be graded as "needs improvement."

[0] https://github.com/embassy-rs/embassy

[1] https://linux-test-project.github.io/


In other words: unit testing requires that your code is a unit.

i.e. does not depend on external timing or fickle hardware.

Very, very difficult to unit test something that runs using hardware interrupts.


> real hardware can be... weird

That weirdness is precisely the sort of thing you want automated tests for.


I don't think you can argue they are classic mock-the-world, test-one-function tests. The giant majority of those are just huge amounts of SQL statements running all over - which are integration tests.


> They have more test code than actual shipping code.

I'd say that's very common and not exclusive to some specific project.

I made a change of about 30 lines of code recently, and it had 140 lines of tests (and I, by far, do not cover every situation).

I think I've rarely encountered well tested pieces of code that were not way smaller than the tests that came with them.


Isn't the public API quite narrow, though?


The API is massive. Just the number of build parameters and boot parameters is very high. There is also a ton of things you can tune at runtime in /sys and /proc. Not to mention all the syscalls and the subtle ways they all interact with each other.


Surely it’s miles smaller than the public Linux API.


sounds like someone should write tests for that


Those are more integration tests (load stuff into the DB and query it out) vs. unit tests (set up mocks around functions and call them with different arguments).


It's a cursed notion that unit-tests are "set up mocks around functions and call them with different arguments".

It's more useful to categorize tests by how hard it is to set up their environment. In reality, the cost/usefulness line lies on this boundary:

* in a single process

* multiple processes using IPC

* multiple processes using the network

* tests that validate functions calling foreign services

I see many devs (including myself) call "integration" tests "unit" tests because in their particular app/system they are easy to spawn locally, even without any container.


While I tend to agree, the Go toolchain authors are very specific: it doesn’t count as unit test coverage when the code and the test are in different packages. And our management has no interest in breaking with them on this.

You are free to write tests that compose packages, and I’ll do it for my own sanity and confidence. But that’s in addition to, not instead of, the minimum “not horrifically irresponsible engineering” standard of universal, exhaustive, very detail oriented mock driven unit tests within all packages.


Agreed, people tend to overcomplicate testing and make it religious. In reality, testing is nothing more than a means to an end. Ask yourself these questions for your project:

1. Are there pieces that can be tested independently where verification is valuable?

2. What is the impact of a bug or regression?

3. What type of testing is most likely to uncover issues in my product given its architecture and domain?

4. Am I confident to refactor the code base without creating new bugs?

etc. etc. In other words, testing isn't magic; it should be driven by business goals. For example, I use Rust, which due to its great type system makes some types of code "just work"; however, Rust cannot prevent logic errors. This means when I'm writing 'algorithmic code' I tend to write heavy unit tests. When I write API-driven code, I tend to use more integration/end-to-end style testing. Do what makes sense for your code and goals. Tests take time and need to be refactored, so they aren't free, but they can be valuable.


> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

This attitude blows my mind. The point of tests is to find bugs before you ship them to the user!

Writing tests after you've already verified that some piece of code is working correctly (e.g., by executing it 10^15 or whatever times on different systems with different inputs) is useless, I give you that. But what if you want to make a non-trivial change to the code? How are you going to make sure you didn't introduce any bugs without tests?

Automated tests are not the same as TDD.


I think the OP took the strictest and dumbest definition of testing possible, used that as his straw man, and then lit it on fire.

The fact that systems level programmers, generally speaking, have a disregard for testing automation is apparent. Over the years if somebody had the "pleasure" of running any type of linux distro, how many times would things break randomly for reasons that the end user could only imagine? Many of those issues probably could have been surfaced and fixed before the code shipped had there been any testing automation in place.

But let's forget linux for a minute. Look at Windows. For the longest time the quality of windows was the biggest joke in software. And those engineers working on it, sorry to say, were cut from the same cloth as the Linux kernel folks and all other systems programmers. We don't need no stinking tests. Well eventually somebody that cared about the reputation of the company forced testing upon these teams and lo and behold things have gotten way better. Apple is the same way.

To me it is honestly astounding that there is still such a strong anti-testing sentiment in the industry. When you have more than a handful of people working on a complex system I greatly prefer to know that there is a robust test suite looking at important functionality, seeing if performance degrades, checking for static analysis errors, etc. And when something does break, since I already have a testing solution in place, it is generally easy to add a test which covers the regression and ensures that it never comes back.


>What is the external interface of the kernel? What is its surface area? A kernel is so central and massive the only way to test its complete contract with the user (space) is... just to run stuff and see if it breaks.

Having never interacted with the kernel, I have to assume it isn't just one massive file, right? It's broken into separate files and components? And if you ever want to modify or refactor one of those components, it's nice to have the confidence that your code changes are safe without having to rebuild the whole thing and run your integration/behavior/e2e tests. Especially if you aren't the one who originally wrote the component you're modifying and don't know what the intended behavior for edge cases was.

Obviously if you're mocking everything then the tests might be pointless (there can still be value in preserving in the test what you assumed the behavior was), but I feel like in some ways people have over-corrected on TDD and are now claiming unit tests are pointless.


> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

Yes, you do, because otherwise you will find that some change or new hardware will break it.


If some hardware breaks it, the hardware must be seriously broken.

If a change breaks it, someone will probably have called read(2) a few times before that kernel is shipped to you. A change is filtered through multiple developers, and the kernel has some kind of smoke and consistency tests it is subject to. Also there are people and companies that compile their kernel from the master branch and run software on it.

No one's gonna ship Linux with a broken read syscall. And if there were tests, one could still ship a broken read syscall anyway.

Tests do not guarantee the absence of bugs.


> If some hardware breaks it, the hardware must be seriously broken.

And it sometimes is.

> If a change breaks it, someone will probably have called read(2) a few times before that kernel is shipped to you.

I've seen too many kernel bugs on new hardware to believe this is enough. Maybe not in read(), but still in something that should work. Throwing testing onto users may have been acceptable in the past, but not anymore.


> Also there are people and companies that compile their kernel from the master branch and run software on it.

Genuinely curious: why would a company ever do this?


I was thinking of Red Hat (for example), which pays kernel developers and regularly tests and backports features for its paying customers. That means having a finger on the pulse of the kernel tree, and trying changes to decide whether it might be worth porting them to their supported kernel branch (which lags behind master).


Ah, Red Hat is indeed a good example. Thanks for clarifying.


> why would a company ever do this?

Distro kernels are general purpose and include a lot of stuff you do not need.

A custom-configured kernel is much smaller and (consequently, theoretically) more secure.


Not sure I follow. A custom kernel doesn’t need to based on master.


If you have 100000 servers it's in your interest to test HEAD on 100 of them to be able to catch any bugs before they hit a release. That's how open-source should work!


Why not run 1-2 releases behind and test/qualify a later release?

A commenter below gave a good example of a company that would need to run a bleeding edge kernel (Red Hat). The vast majority of companies would never need to do that though.


> Why not run 1-2 releases behind and test/qualify a later release?

This may work for most people, but not all. It's easier to get things fixed while developers are still working on something than weeks/months later. Also, if you have a less common hardware/software configuration, testing done by others isn't necessarily sufficient for you.


Maybe your company needs that super duper new feature that's on master. Or you're Google scale and that 0.5% improvement in IO throughput would save you millions a year. You wouldn't want to wait until it hits the stable branch if possible.


If the read(2) implementation itself never changes, then unit tests of it shouldn't ever break, I think...


> If no one complains, either no one is using it, or the software is doing its work.

Or it crashed and they couldn't be bothered to report it. Looks the same as no one used it but has a different cause.


If one person globally used a feature, it crashed and they didn't report it (so it was not critical for them), it's a massive waste of time and energy to commit and maintain a test case for it. In fact, perhaps that feature has no place existing at all in the first place and should be dropped.

You're underrating the fact that Linux's massive and diverse user base is a great filter for bugs. Only the weirdest heisenbugs can survive this filter, and because of their nature, they wouldn't be tested anyway.


We're supposed to be reporting Linux crashes?

I always assumed the fact I can reliably break Linux on Laptop A with Docking Station B by doing Action C was pointless to report, because how is anyone supposed to reproduce it if they don't own Laptop A and Docking Station B?


Can you reliably break Linux on Laptop A with Docking Station B? Then go here: https://bugzilla.kernel.org (and follow the big yellow notice)

There's a good chance it's a firmware or hardware bug Linux cannot fix, but it's worth a shot.

But to answer your question, Laptop A and Docking Station B probably interface with each other with the same two chips/subsystems present in other devices. And if they work with Linux, the maintainers of these drivers, often the manufacturers, are publicly listed and are the ones that will try and troubleshoot it with you.


Sometimes it’s just obvious like a bad assumption.


At the company where I work, 80% of the code needs to be unit tested or you won't be able to commit. This results in tons of useless unit tests, and you learn to restructure classes in a way that makes writing the unit tests later easy instead of what would make the most sense, for example exposing fields that should be private.


I'm a huge proponent of good testing, mainly because it's so much easier to work sustainably on a code base with good tests. But I hate, hate, hate test coverage mandates. Every time I've seen them, it's as you say: some people write garbage tests to hit the metric. Before with those people you wouldn't have been able to trust their code. Now you have two problems, because you can't trust their tests either.


Sometimes you are forced to write a garbage test because there is not much to say about some simple code in terms of tests.


That's an interesting phrasing. Who forces you?


It's integrated into the build process. The build fails if there are not enough lines covered by tests.


You're lucky. It's 96% for me. I guesstimate that the tests are >2/3 of the development effort.


> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

I’m not sure why you’re expecting the answer “no” here. The biggest value of automated tests is non-regression. So with billions of users, yes, you absolutely need reliable and reproducible tests. Maybe not unit tests (I agree they’re overrated, you want to test contracts and behaviors, not implementation details) but a strong integration test suite is a must have. To ensure you don’t break anything your billion users might rely on.


I do actually agree with you, what I am saying is that a big part of Linux testing is crowdsourced: you and I and hundreds of millions are testing if there's a regression every time we run Linux, or running buggy, incorrect software on machines in random levels of stability and soundness.

Few unit tests in the kernel would be able to compete with so many chaos monkeys (a reference to Netflix's https://github.com/Netflix/chaosmonkey).


Unit tests and TDD should be used as tools when necessary. Unfortunately, many team leads and higher-level engineers don't take the time to explain to juniors when to use these tools. Just that the tools are essential. And therefore, for many engineers, they have become a religion, or worse - a cargo cult.

But really, most code does not need to be tested. The parts that are likely to break when others change them should be. Testing can mitigate risks for complex, esoteric, highly dependable, domain-specific, and technical code. In such cases, tests can save a lot of time that would be spent fixing defects later. In contrast, testing as a religion is entirely counter-productive. It's a time sink in an effort whose whole purpose is to save time. And what is worse, poorly written tests make the codebase less maintainable, more complex (to pass wrongly written tests), and degrade code quality (through a false sense of security).

Functional/integration testing can suffer from the same problems. There's nothing more annoying than fixing a defect in a system only to uncover a mess in functional tests that relied on the bug. Of course, some of this is unavoidable. However, cargo cult-style testing is avoidable and entirely self-inflicted harm in many teams.

In short: testing is a tool to save time and increase code quality; such a tool should be used where it saves time and improves code quality, and it should not be used where it just results in the opposite.


I can't upvote this enough.

It breaks my heart when I see a manager with some engineering background, whose idea of better is to make the code coverage number higher.


Is it weird if I would be *thrilled* to work with a manager like that?

People here have great arguments against tests. I agree that they’re not always useful. Aiming for 100% code coverage is a waste of time.

But if a manager wants easily measurable metrics to hit and they have the budget and bandwidth for it, why should I complain?

Writing tests is easy, predictable work. My rate is my rate and if this is how management dictates I spend my time, I’ll take an easy couple weeks and paychecks any day.


> Writing tests is easy, predictable work.

But is it useful work? This is the problem: engineers have at some point decided that keeping busy writing tests is better than finding other ways to write correct code.

Now that TDD has died down as a religion, the current fad has become strong typing. I will repeat it again and again: I have yet to see an error caused by me passing an int where an array type was expected in Python or JavaScript. I have been doing this for a long time. This too shall pass and we will see posts like “Show HN: new lightweight dialect of XYZ without the burden of types”.


Static typing is machine-readable and machine-validatable documentation largely of a sort you should be writing anyway. If adding static types is a substantial burden either you're stuck with a really bad language, or you weren't writing enough documentation in the first place. Further, if things like much-improved autocomplete, far better editor-provided errors while editing, and better refactoring aren't saving you time, overall, we must write code very differently. And during handoff of code, static types are really nice and save tons of time.


I am not writing code to make it easier for the computer to do its job. I am writing code so that the computer can make my job easier. Wasting time telling a computer how to show me autocomplete suggestions is better spent on creating APIs that are easier to use and require less of it. Or creating smarter autocomplete algorithms.


I think what the person replying was saying was not that static typing is easier for the compiler to understand, but that types being documented in some way (whether embedded in syntax or via documentation) make code far more readable.

Python was my first programming language(cliche, I know) and I didn't understand the whole "static typing is good" thing until I learnt languages like C, C++, C#, and rust.

Python also introduced type hints and made them a valid part of syntax for good reason. You'd typically end up writing a docstring with the return type of a function and the types of the parameters anyway, which is what the person replying to you was presumably referring to.

It's not about passing in an int instead of an array, but rather about being able to figure out what a function wants without needing a Stack Overflow thread or having to search through tons of documentation for something that would be otherwise trivial in statically typed languages, where the function declaration tells you quite a bit about a function.

Hence it's safe to say that types documented in code in some way are very useful for humans, and if you have issues with statically typed programming languages, there's a chance you were not documenting your code enough already.


> Or creating smarter autocomplete algorithms.

Ah, the Sufficiently Smart IDE, companion to the Sufficiently Smart Compiler. Sure, go ahead and wait for the magic autocomplete algorithm that can deduce precise function signatures in a large Python or JavaScript code base.

Meanwhile, some of us have work to do, and we'll use the best tools at our disposal for managing non-trivial code written by many people across teams. And those tools include modern static type systems.


Refactoring is easier when you have either tests or types, but I think it's easiest when you have types as tests don't always remain compatible during refactorings. In TypeScript I let the types guide me as I change code. Too many times I've looked at a type error, thought "why would this be an issue", ignored it, patched up my test suite, and realized the type checker was right all along; I would have saved time just listening to the types.


Not for nothing, but static type checking isn't some kind of fad. It's been the rule, not the exception, since the 1980s at least.


For some languages it’s a whole lot more important than others. Python and JavaScript are memory-safe so you don’t need to worry about writing a bigger object into a too-small memory allocation.

You will discover that you reversed the order for parameters to a function by running your code with much less effort than annotating all your code. This type annotation is entirely useless:

    def get_user(email: str, create_if_not_exists: bool)


Additionally, if you are testing the code anyway for other reasons (like, to check that the logic is correct), you exercise the dynamic type checking for free.


And then it silently creates a user named True because the non-empty email address you passed as `create_if_not_exists` evaluated as truthy.


I have had a number of times where static typing would have saved me a lot of time in both Python and Javascript. I have a recursive-descent parser for a non-trivial language in Python, so when you pass something incorrect down in the bottom, it bubbles up and fails somewhere up the stack, a bunch of calls later. But if I'd had static typing, the compiler would have caught it immediately. Similar story for a 3D model generator in Python. I've wasted enough time on stuff like this that the next Python project I do that I think will be over 1000 lines long, I'm going to write in some statically typed language.


I really don't understand how strong typing does anything related to testing behavior. I keep seeing this statement and it makes me believe these devs' idea of testing behavior is testing nulls or maybe trying to pass in hashes with missing keys. What a waste of time.

If you don't have tests, all that means is you or somebody else is testing it manually.


I equate the two as popular HN religions, not as substitutes for each other. TDD is on its way out as a religion. Strong typing in dynamic languages is gaining popularity.


I don't think it's weird at all, in general. I actually believe that's the thinking of the majority of programmers. As employed by Tata, HCS, HP, IBM, Oracle, Cognizant and so many other huge corps employing developers.

We feel it "funny" because that culture is perpendicular to the Startup culture that abounds in this forum. But number-wise, it's a minority.

The great majority of people are not wholly in love with their craft. They just want to do their job to get paid. They don't read about Rust on weekends, and don't program yet-another-X in their spare time.

And that's OK. That's what the majority of the world does.


> if this is how management dictates I spend my time, I’ll take an easy couple weeks and paychecks

I would rather my employees (and contractors!) work on what's actually valuable, and not just take blind instruction. The people closest to the code should be most empowered to improve it. Is that not a common management view?


I find writing tests is often not easy because test tech debt is usually worse than regular code tech debt! Without TDD, the tests really end up doing gymnastics to test the code. Code smell example: GetInternalHitCounter_ForTests()


> It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.

Okay, but Linux has internal implementations of base64, assorted hashing algorithms, some compression, probably other algorithmic tidbits; wouldn't those be easy to test?

Edit: And downthread people are saying that Linux does do that:)
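
For illustration, a KUnit-style test for that kind of helper could look roughly like the sketch below. Here b64_encode() is a hypothetical helper that returns the encoded length and NUL-terminates its output, and the exact KUnit macro spellings are from memory, so treat the details as a sketch rather than gospel:

    #include <kunit/test.h>

    static void b64_corner_cases(struct kunit *test)
    {
        char out[16];

        /* empty input encodes to the empty string */
        KUNIT_EXPECT_EQ(test, b64_encode("", 0, out), 0);

        /* one- and two-byte inputs exercise both padding cases (RFC 4648 vectors) */
        b64_encode("f", 1, out);
        KUNIT_EXPECT_STREQ(test, out, "Zg==");
        b64_encode("fo", 2, out);
        KUNIT_EXPECT_STREQ(test, out, "Zm8=");
    }

    static struct kunit_case b64_test_cases[] = {
        KUNIT_CASE(b64_corner_cases),
        {}
    };

    static struct kunit_suite b64_test_suite = {
        .name = "b64",
        .test_cases = b64_test_cases,
    };
    kunit_test_suite(b64_test_suite);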


I was genuinely interested in OP's question and was hoping responses would not fall back on the already much-discussed "TDD or not" topic.


The Linux Elantech V3 touchpad driver for an old 2013 laptop of mine (and most laptops of the similar era) has been broken since 2019 (reports half the size compared to actual coordinates, offsets vertical coordinates, so the entire right half is interpreted as edge scroll). I debugged and proposed a fix months ago, and there's been no motion towards merging it: https://lore.kernel.org/linux-input/20221014111533.908-1-nya.... Evidently, the amount of "run stuff and see if it breaks" they're doing so far is insufficient for niche hardware.

Additionally the in-tree Realtek R8169 driver fails to redetect Ethernet being plugged in after I unplug it for 5-10 seconds, and I had to resort to the out-of-tree R8168 driver as a workaround.


Ok, you're essentially saying that kernel code is too important to be tested. Honestly, that logic sounds like Effective Altruism levels of twisted to me.

So if tests don't cut it, what does the Kernel community then do instead of testing to verify correctness?

> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?

That's exactly the kind of code that needs tests.

What happens if someone changes some code that affects the read call and expects it to run on said billion devices?


> It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.

I think that this heavily depends upon the part that is tested or not tested. For certain parts of the kernel, such as the firewall, I truly believe that such test cases (including corner ones) should be present.


> that syscall is called some 10^12 times per second globally?

wow. that blows my mind. is that the most frequently invoked piece of code in the world?


The context-switch logic (save registers, switch to ring 0, etc.) in the kernel might be. It runs for every syscall, up to thousands of times per second per active core.

It would be cool to know which is the most frequently called piece of code in the world. Maybe something that hasn't been changed in decades.


I'm not qualified to do it, but someone should write a blog post figuring this out


Thank you! In a previous job I was doing the equivalent of integration testing, and very young collaborators were bashing me and demanding that I instead contribute small, random and (even by their own admission) useless tests that weren't really testing anything.


Yep.

Only time I've written 100% branching coverage unit tests was when I needed to implement a spec, with many alternate representation formats, and a plethora of test cases I could run through as a litmus test of conformity.


I agree, but how about regression testing?


Unit tests are poorly suited for kernel code. The *vast* majority of production failures in kernel code are due to race conditions and other timing-specific issues, which unit tests are extremely poor at reproducing in a way that mimics how the code is used in real life

Another huge source of issues is hardware doing something unexpected/not-to-spec - another thing that unit tests would verify very poorly, given that any unit test will simply reproduce how the developer thinks some piece of hardware works, rather than what it does in real life.

Production kernels are usually better served by long-running stress tests that try to reproduce real-world use cases but inject randomness into what they are doing. And indeed, both the NT and Linux kernels are extensively tested in this fashion.
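
To make the first point concrete, here is a small, purely illustrative userspace sketch (pthreads, not kernel code): a single-threaded unit test of bump() will always pass, while only repeated concurrent runs have a realistic chance of exposing the missing locking.

    /* Hypothetical example of a lost-update race. A deterministic unit
     * test calling bump() once (or a million times from one thread)
     * always sees the expected value; the bug only shows up under
     * concurrent, timing-dependent execution. Build with -pthread. */
    #include <pthread.h>
    #include <stdio.h>

    static long counter;                    /* shared, deliberately unprotected */

    static void bump(void) { counter++; }   /* read-modify-write, not atomic */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            bump();
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Expected 2000000; usually less, because increments get lost. */
        printf("counter = %ld\n", counter);
        return 0;
    }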


As a HW guy this seems insane to me. Just because it's hard to do doesn't mean you can't randomise and try to create and catch failures. It might not catch 100%, but it's better than not trying. HW testing pre-silicon is impressively comprehensive because the cost of respinning a chip is on the order of tens of millions, so the industry found a way. While software is less costly to edit, the cost of updating and rolling it out is huge, but uncounted. Finding bugs by extensive testing, even for difficult-to-catch items, is inherently the right thing to do.


As a HW guy you are making the hardware, so you can make a mock of it relatively easily. That is not the case if the driver is essentially reverse-engineered from how the Windows driver worked.

And many other drivers are "some vendor made it, and they won't tell us how the chip actually work, just give us driver code".

Then, as other people mentioned, hardware isn't exactly 100% predictable: HW bugs happen, firmware upgrades change how stuff works, etc., so your test suite might be entirely unable to catch that.


I know hardware isn't 100% predictable, which is why we explicitly try to test that behaviour through randomized testing, formal proofs, etc. I'm not mocking anything, I'm just saying it's possible to attempt to cover difficult-to-create scenarios.


I believe the parent was referring to Test mocks[0], not implying you mocking anything.

0. https://en.wikipedia.org/wiki/Mock_object#Use_in_test-driven...


I see, thanks. Well, we do have events in HW that are non-deterministic, e.g. metastability due to async signal synchronization, but that is something we can and do model as best we can.


As a combined hardware/software person, the testing for both types of systems is very different.

In hardware, you often have rigid boundaries with well-defined behavior, and almost no state in each unit. A big part of hardware verification is just testing that you did the basics of the interface right and that you won't deadlock anything. The rest of it is testing the behavior of your nearly-stateless thing.

Software testing tends to involve exponentially larger state machines, and fuzzier abstraction boundaries. The basics are mostly done for you, and you tend to do complicated logic that holds and manipulates a lot of state.

"Fuzzing" in software is very similar to constrained random testing in the hardware world, but it still doesn't catch a lot of basic issues due to the size of the state machines involved.

The testing philosophy you need to apply is not the same at all.


It's interesting that you have the perspective that hardware has relatively few states. It can be the case, but there are also very complex interactions between functions. A million-gate design is relatively modest by modern standards, and there can be tens to hundreds of interacting state machines in one. Of course, to manage this we do try to define clean boundaries and interfaces between blocks, but this clean approach can be taken in SW also. Fuzzing is a bit different in that it actually tries to execute different code paths by changing the input, so it's more of an exploration of the code's state space. Constrained random is more about creating inputs that randomly cover the valid input space; it does not introspect the internal state to determine the next input. Perhaps that's an enhancement hardware testing could take from the SW world.


I have seen functional-coverage-driven constrained random verification of hardware, which does what you describe. Someone beat you to it!


They do apply a lot of fuzzing tests actually and the kernel maintainers also contribute state-of-the-art dynamic analysis tools like UBSAN to check for undefined behaviors or leaks.


I think you have conflated "tests aren't useful" with "*unit* tests aren't useful". I am asserting the latter, not the former! Testing in general is absolutely something that you can and should do when working on a production kernel


> The vast majority of production failures in kernel code are due to race conditions and other timing-specific issues

This assertion calls for links to data that backs this up I think.

Published counterarguments seemed to be easy to find, eg this survey suggests that most bugs are reproducible ("bohrbugs"): https://xiaotingdu.github.io/publications/TR2018.pdf

(journal link: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=24)


I think that is fair to say for many hardware device drivers, but things like filesystems definitely should have a place for automated testing, e.g. just having a bunch of golden values that verify correctness of backward and forward compatibility of the fs drivers would be great.


> but things like filesystems definitely should have a place for automated testing

And that is what xfstests is used for.


I don't suppose those long-running stress tests can be automated in any way, can they?


They are.

Many many companies run tests against the various kernel trees/branches, and will flood you with emails if you break the build.

I can't remember who (I think IBM maybe?), but Greg KH said one company made a fantastic test suite out of nowhere. If you submit six patches, it will try each one, tell you what you broke in patch 3, and recommend a way you can fix it. And this just gets emailed to the person who submitted the patches, and the maintainers if they actually merged it.

He said it was insanely advanced, and was made without the knowledge of the kernel maintainers. A company just made it and started sending emails one day.


Is there are literature on that IBM system? It sounds fascinating


Perhaps they’re talking about the (Intel) kbuild bot?


They are. There's a project by Google called syzkaller, which pretty much does that (it's a bit more intelligent than that, and also includes running sanitizers, which find otherwise hard-to-detect memory safety and concurrency issues).

It's actually causing an unexpected problem: it finds so many issues that kernel devs have trouble keeping up with fixing them.


So it turns out kernel does have unit tests, just that they are running after release.


No, this is way above the unit test level. Fuzzers usually are.


Excellent! I keep complaining, every time a vuln pops up here, that there isn't anyone with a kernel pipeline. Hopefully they have some static analysis too? Does the data out of this effort support the GGP's claim that most of these defects are concurrency-related, vs. the norm of memory safety?


Honestly, it's pretty rare I find a use case for testing period.

If I'm writing a library which contains algorithms or something like that it's pretty easy, seems useful or necessary.

But a lot of the time I'm gluing things together through API calls and it just seems pointless ?


It's exactly the other way around for me, because a test takes one input and checks the output.

A quicksort implementation, say, will "test positive" on the same unit test as an implementation that uses a brute-force approach. A performance test might pick some differences up, but at that point it's not a unit test any more.
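
For instance (an illustrative C sketch, not anyone's real test suite): the assertions below pass identically for the C library's qsort and for a deliberately brute-force bubble sort, so a unit test at this level cannot tell a good implementation from a terrible one.

    /* Illustration: an output-only unit test cannot distinguish an
     * O(n log n) sort from a brute-force one; both produce the same result. */
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    static void bubble_sort(int *v, size_t n)
    {
        for (size_t i = 0; i + 1 < n; i++)
            for (size_t j = 0; j + 1 < n - i; j++)
                if (v[j] > v[j + 1]) {
                    int t = v[j]; v[j] = v[j + 1]; v[j + 1] = t;
                }
    }

    int main(void)
    {
        int expected[] = {1, 2, 3, 5, 8};
        int fast[]     = {5, 3, 8, 1, 2};
        int slow[]     = {5, 3, 8, 1, 2};

        qsort(fast, 5, sizeof(int), cmp_int);   /* library implementation */
        bubble_sort(slow, 5);                   /* brute-force implementation */

        /* The exact same "unit test" passes for both implementations. */
        assert(memcmp(fast, expected, sizeof(expected)) == 0);
        assert(memcmp(slow, expected, sizeof(expected)) == 0);
        return 0;
    }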


So basically, you find it important to test already tested code, with...your own tests?


Yes, that is precisely what I said, you got it right. Have a good day!


> The vast majority of production failures in kernel code are due to race conditions and other timing-specific issues, which unit tests are extremely poor at reproducing in a way that mimics how the code is used in real life.

Is this not due to the legacy of a language that allows race conditions? Take Loom [1] in Rust which is a tool to test your concurrency primitives, ensuring that deterministically all interleavings are tested and verified to be sound. Then, when you expose a safe interface, you know that the only potential issues you have are user logic errors or not covering the entire problem space with your tests.

https://docs.rs/loom/latest/loom/


You also interact with hardware. An RC-safe language does nothing when you suddenly realize there is an RC baked into the disk driver you are using and if you send it specific commands in a specific order too fast it does something unexpected.


Yes, that widens the problem space considerably, especially if it depends on the rate of access or something similarly non-deterministic. But you're still fundamentally testing interleavings through a synchronization library.

Still, reducing the problem space to such cases, instead of any time you reach for atomics or a mutex, removes (I imagine) the majority of bugs, so let's not throw the baby out with the bathwater because it's not perfect.


I'm surprised that I had to scroll this far down to find a Rust fanatic complaining about C.


I complain similarly about writing deeply multithreaded code with synchronized data structures in Java, Go or whatever. It's a nightmare the day you want to refactor anything and have to keep the entire implicit state in your mind to make sure you don't trip over yourself.


Race conditions are not limited to those “caused by language”, whatever that means.


There are unit tests for the Linux kernel and even a kernel-specific unit-test framework. See: https://kernel.org/doc/html/latest/dev-tools/kunit/

Here is an article about the various ways Linux is tested: https://embeddedbits.org/how-is-the-linux-kernel-tested/ Unit tests are just a small part of the array of automated testing used.
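
For anyone curious, here is a rough sketch of the shape of a KUnit test, following the pattern in the linked documentation; the suite and function names here are invented for illustration.

    /* Sketch of a KUnit test module; the names are made up, the structure
     * follows the KUnit documentation. Built as part of a kernel
     * configured with CONFIG_KUNIT enabled. */
    #include <kunit/test.h>

    static void example_arithmetic_test(struct kunit *test)
    {
        KUNIT_EXPECT_EQ(test, 42, 6 * 7);
        KUNIT_EXPECT_TRUE(test, 1 + 1 == 2);
    }

    static struct kunit_case example_test_cases[] = {
        KUNIT_CASE(example_arithmetic_test),
        {}
    };

    static struct kunit_suite example_test_suite = {
        .name = "example",
        .test_cases = example_test_cases,
    };
    kunit_test_suite(example_test_suite);

Suites like this are typically run under UML or QEMU via the kunit.py wrapper in tools/testing/kunit.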


Every "why" question on the Internet contains at least one wrong assumption. Here's yet another data point.


Thank you.


Maybe not "typical" unit tests, but Linux does have test cases for stuff like the generic library code (data compression, encryption, ECC, ...) in the shape of special modules that, when built, run through their test cases during boot up.

It gets harder to test low-level driver code, but some subsystems have similar sets of tests, e.g. MTD has a set of tests for drivers & flash chips that they interact with. In 2016 I ported those tests to userspace as part of mtd-utils. The tests for the kernel crypto stuff I mentioned would also test hardware accelerators, if enabled.

Filesystems have the fstests project (formerly xfstests, as it was originally written when porting XFS to Linux), testing filesystem semantics and known bugs. There is something similar for the block layer (blktests). The downside being that those test suites take rather long to do a full run.

Those are just things that I can think of off the top of my head, given the subsystems that I have interacted with in the past. Static analysis on the kernel code is also being done[1][2], there are also CI test farms[3][4], fuzzing farms[5], etc. As others have pointed out, there is a unit testing framework in the kernel[6], IIRC replacing an older framework and ad-hoc test code.

[1] https://www.kernel.org/doc/html/v4.15/dev-tools/coccinelle.h...

[2] https://scan.coverity.com/projects/linux

[3] https://cki-project.org/

[4] https://bottest.wiki.kernel.org/

[5] https://syzkaller.appspot.com/upstream

[6] https://kernel.org/doc/html/latest/dev-tools/kunit


I recently took a class on Linux kernel development, and this is indeed correct: when developing for the Linux kernel, you can always expect to get your bug reports from external sources. If your code has a bug or breaks something, you get a nice email from them telling you what errors you have.


As huge parts of the kernel are effectively device driver code which interacts with hardware on a low level, classic unit testing techniques like mocking can't be applied - or they could be, by writing a mock for the device in question, but that's usually prohibitively expensive if you aim for a fidelity level at which such a mock would actually be of any worth and wouldn't just provide you the "good feeling" of having some unit test coverage while not really increasing the confidence in the tested code.

This is especially true when we're talking about reverse-engineered device communication protocols, where you don't have a clue how the device actually works internally, and thus lack the basic means of constructing a mock that does more than just implement the exact same assumptions you've already based your driver code on (resulting in always-green tests that never find actual bugs). But also in cases where you have a protocol spec, the vast majority of ugly bugs in drivers usually originate from differences between that spec and the behavior of the device in the real world.


> or they could be, by writing a mock for the device in question

That would create a good description of the hardware that could be used for emulators and to test whether actual hardware behaves the way it was specified to behave.

> the vast majority of ugly bugs in drivers usually originate from differences between that spec and the behavior of the device in the real world.

Having the model hardware encoded in software would make it possible to test its behavior against real hardware for comparison and allow making adjustments.


That takes a massive amount of effort (most likely writing the emulator is more work than the actual driver code) and still only allows you to say "given these assumptions, which might not be correct, the code works".

> Having the model hardware encoded in software would make it possible to test its behavior against real hardware for comparison and allow making adjustments.

So you waste weeks setting up a hardware test jig, trying to match it with software, and then the next firmware update changes some stuff that makes the model incorrect again.

I mean, I would expect that of a company that wants to put a driver for its hardware into the kernel, but many drivers are just RE'd stuff, and setting up a test jig for hardware that allows for fully automated testing can be quite an effort.


> if you aim for a fidelity level at which such a mock would actually be of any worth and wouldn't just provide you the "good feeling" of having some unit test coverage while not really increasing the confidence in the tested code

When did you acquire such a taste for luxuries?

I’m convinced that fully 75% of engineering effort in Silicon Valley is exactly this kind of testing. At least it is in my company, and we get people from all the big names, and none of them think it’s weird. Sometimes I wonder if I’m insane.


If you want to contribute unit tests, I believe that Linus would be open to the idea. For example, there are a bunch of internal APIs: linked lists come to mind. If you wrote unit tests for an internal API, he might accept it. I would start very small -- one .c file with no more than 20 unit tests -- to gauge whether there is interest from the community. It seems like a nice gateway to get junior developers (university-level students) into kernel development. Reading open source mailing lists, I have learned that the best way to "break in" is a tiny code change that is well-tested. It is hard to reject this type of submission. Be unopinionated, but competent, in your submission, and they will be open-minded.


> ... there are a bunch of internal APIs: linked lists come to mind.

Those already have unit tests:

https://github.com/torvalds/linux/blob/master/lib/list-test....

using an existing in-kernel test framework no less.

If you step up into the 'lib' directory (containing generic in-kernel utility code) you might notice that there are a whole bunch of C files already that have an "_test" suffix in their name (or "_kunit").
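
To give a flavor of it, a case in that style might look like the following (a simplified, paraphrased sketch, not code copied from lib/list-test.c), registered into a kunit_case array just like any other KUnit test:

    /* Simplified illustration of unit-testing the in-kernel list API with
     * KUnit; the same flavor as lib/list-test.c, but not copied from it. */
    #include <kunit/test.h>
    #include <linux/list.h>

    static void list_add_tail_keeps_order(struct kunit *test)
    {
        LIST_HEAD(list);
        struct list_head a, b;

        list_add_tail(&a, &list);
        list_add_tail(&b, &list);

        /* a should come first, b last, and the list must not be empty. */
        KUNIT_EXPECT_PTR_EQ(test, list.next, &a);
        KUNIT_EXPECT_PTR_EQ(test, list.prev, &b);
        KUNIT_EXPECT_FALSE(test, list_empty(&list));
    }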


> If you want to contribute unit tests, I believe that Linus would be open to the idea.

Not saying this is not true but has Linus stated this anywhere?

Because once unit tests have been introduced, the question becomes: Who / what executes those unit tests? Who makes sure no one breaks them? You basically need CI pipelines and infrastructure and suddenly the tiny code change is not so tiny anymore.


You can also run them yourself and just submit bug reports. You don’t need anyone’s permission to do that.



If I understand the documentation correctly, this was implemented fairly recently, which would explain why I missed it the last time I researched this topic. Nice to see progress on a framework for unit testing in the kernel!!



Anyone ever used it for non-kernel work? (Is that even possible?)


I just love threads that start with "why don't they do X ?", followed by dozens of comments explaining why "X is useless, they would never bother doing it, and you should not either", followed by "maybe you should try doing X, who knows", before someone mentions, "hum, well, actually they have been doing it for a few years now" - and that point _not_ being the end of the conversation.

Fontenelle, golden tooth, etc.


Personally, I am a fan of meta comments ranting about comments explaining why X is useless.


Feels like HN comment sections are like seeing all the thoughts a brain can come up with on a subject written out in parallel, including the dead-ends and false-starts and "on second thought"s. Normally a person only continues thinking the resulting two or three most compelling thoughts, and only says one of them out loud, but on here you get to see the whole process.


There are tests, they're mainly just out of tree, focused on integration rather than unit, and very decentralized. You'll get nastygrams on lkml if you break them.

Here's one prominent example: https://github.com/linux-test-project/ltp


I think the comment about race conditions is spot on.

A while, back on hacker news, there was an article about a company developing a database. Their entire testing methodology came down to producing a deterministic kernel and thread scheduler. This allowed them to simulate every possible permutation of their concurrent code in scientifically speaking reproducible manner.

Developing this test framework was actually the majority of what the company did. Testing the kernel would be a similar level of effort.


Quite possibly FoundationDB - they do a lot with deterministic simulation

https://www.foundationdb.org/files/fdb-paper.pdf


> every possible permutation of their concurrent code

Did they test against weak memory models? Because weak memory models do not necessarily have any equivalent interleaving behavior (stores can appear in different orders on different cores).



Testing in production, like one should, lol.


Being shocked by testing in production is akin to being shocked by air in the atmosphere.

Atmosphere is but air, and production is but testing: a (hopefully) very long test run, which will only end when the project is decommissioned.


Everyone has a test environment. Some people also have a production environment!


;-)


It's like crowdsourcing a UAT team!



> This makes me wonder why aren't there any unit tests in the kernel (and the different drivers in it; especially file system related).

Others have already commented on the testing situation for the kernel in general (historically, the tests were all in separate projects outside the kernel, and more recently, the kernel itself has a testing framework), but for filesystems in particular, there's an external project called "xfstests". The name might imply they're only for xfs (and they were originally made to test xfs in particular), but they're used for all filesystems (and also for the common filesystem-related code in the kernel).



Huh, unit tests for low-level C code? I suggest learning about the kernel selftests: the Linux kernel contains a "self test suite" (parts of which can be built as modules), with tests intended to exercise individual code paths in the kernel, and they are intended to be run after building, installing and booting a kernel.


Linux has unit tests for portions of the kernel. Look in the tools/selftests directory.

x86 in particular has quite a few, although they run from userspace and exercise stable ABIs, so one might argue that they’re really integration tests.


For a plain-logic application, writing good unit tests is much harder than writing good code, probably by a factor of 10 or more. Think about it: you have to surround every path in the code with a test case! You have to create mocks everywhere, and you end up consuming pretty much the whole team's time writing and maintaining tests, whereas in less time, through code reviews/rewrites, you could have fixed several issues, improved readability, improved team knowledge, facilitated long-term support, and so on...

In software, the real trouble comes from integration, not so much from the unit-level stuff (unless you praise monkey coding).

And Linux is not just a plain-logic application but a kernel that runs on different hardware and has a huge base of users and applications. Testing the thing as a whole is the only thing that matters. Releasing alpha and beta software is a much more sound, rational and efficient approach. The IT industry is (or at least was) organized to test before production.

I am certainly not against unit testing. It remains a wise approach for pieces of software that need to be set in stone forever or that have tightly bounded inputs/outputs (e.g. bank transfers). But it is nothing more than a tool that you might use or not depending on the situation. Certainly never a one-size-fits-all solution!


Unit tests are a colossal waste of time 80% of the time, and at least 80% of that time should be spent thinking about how to make your system debuggable instead.


Disagree, but depends on context.

When you have a complex monolith, unit tests and static typing allow you to create features and refactor the code fearlessly ("Computer says no" is a good thing here).

For example, a decades-old CAD product with millions of lines and tens of developers making parallel changes. The unit tests allow the continuous integration system to dish out isolated reports on failing tests before a pull request is merged to main, instead of an integration test just bombing at some random place.

Unit tests save time.


The problem is, you don't know which 80%


A good rule of thumb is that if a unit test was easy to write, then it is useless, because either the code it's testing is so simple that it will never change and/or does not warrant testing, or the unit test itself isn't actually doing anything useful (for example, it could just be doing an echo test on a function that does something very complex).

There are some things that are absolutely worth testing. For example my current project is a library that calculates payroll taxes for US employers. This stuff is self contained, stateless, idempotent, but complex. And yet I know what kinds of inputs are non-trivial and can calculate the outputs by hand, so I can test it.

On the other hand, testing that CRUD form that submits data to S3 as well as saving it in the local database is nearly useless. Run through the form when you update it manually and make sure it works. S3 semantics change rarely, your own database changes would necessitate you changing the corresponding form.
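
As a toy illustration of the first kind (entirely hypothetical function and numbers, not my actual library): a stateless calculation with a cap is worth testing precisely because the expected outputs can be worked out by hand.

    /* Hypothetical example: a flat tax applied only up to a wage cap.
     * The 5% rate and 100,000 cap are invented for illustration; amounts
     * are whole currency units so the arithmetic stays exact. */
    #include <assert.h>

    static long capped_tax(long wages)
    {
        const long cap = 100000;
        long taxable = wages < cap ? wages : cap;
        return taxable * 5 / 100;            /* 5% of the taxable portion */
    }

    int main(void)
    {
        assert(capped_tax(0)      == 0);     /* nothing earned, nothing owed */
        assert(capped_tax(50000)  == 2500);  /* below the cap: 5% of 50,000  */
        assert(capped_tax(250000) == 5000);  /* above the cap: 5% of 100,000 */
        return 0;
    }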


It depends on your field and architecture. Functional-ish design lends itself well to unit tests, where you have known expected inputs and outputs. CRUD apps, not so much.


Are REPLs a key item on your ideal list of requirements when it comes to building a system?


So, if no tests, what is the feedback cycle like for kernel work? Write, compile, put it on some hardware, reboot, collect an oops, debug, repeat?


There are other means of coding than trial and error.


I see a lot of headlines written in the form of : "Why [subject] doesn't/isn't [verb]?" instead of "Why doesn't/isn't [subject] [verb]?".

Is this an evolution of English or simply a sign the writer is not a native English speaker?

(I mean no judgement with this question)


Engineers working on the kernel and drivers have a dilemma. They cannot test the code for every conceivable platform because they do not have that many resources. They will test the code for the devices they have, and then ask for others to test on other similar platforms. This is because the code is near metal. The most the developers can do is to develop using the design-by-contract methodology. But that too goes out of the window when others start supporting more platforms and devices.

So the only approach that has worked is solid code reviews. There was a very interesting email exchange between Linus and an engineer from Sun. Linus was rejecting his patch because there was a spelling mistake in a comment. That's how strict he is about code quality.


While Uncle Bob and other unit-test dogmatists have "convinced" some cargo-cultists and inexperienced people that "code without tests" doesn't work (and of course they do some fine goalpost-moving while they're at it), it does, in fact, work.


Without tests, how do you know your code actually does what you think? Run it and manually click through every type of functionality?

And without tests, how do you know if you introduced a bug or not? Send it to QA and wait for a week?


Because it runs? And people are using it?

Do you really think most code out there is fully end-to-end tested automatically?

Having integration tests is a good idea. Mandating TDD dogmatism and selling unit tests as the only acceptable type of test is what's wrong here.


I agree that integration tests are more important, but a lot of components (particularly anything mathematical or with a functional design) benefit hugely from unit tests.

For example, checking that your backoff works correctly in the case of dropped packets is really hard to do at a higher level, because that part of your API isn't exposed publicly but is still critical to the correct function in degraded environments.

The people who insist on unit tests for 1-line functions are crazy and dogmatic though, I agree. But if a large fraction of your codebase is 1-line functions I'd also argue your architecture is crap. A codebase where effective unit testing is useless or difficult is a distinct smell.


It's the classic "what came first" question that everybody knows.

What was written first: A. Unit tests B. Binary code

(The answer might shock some of you…) -


There are studies on the effectiveness of unit tests (and other layers of tests) from the 1980s. Unit tests catch about 20% of bugs. Integration tests catch 40% of bugs, but the overlap between the two is small, so you still benefit from having both.

Which one came first is irrelevant, the question is if they help us write better software.


That is an interesting fact, thank you. My point was a different one - and not a serious one… but anyway. Thanks for the stats!


Tests != Unit tests & TDD != Unit tests

Linux definitely has tests.


Presumably there are some sort of automated tests though. If those are working well enough to find bugs there's probably no need for "pure" unit tests (which I normally write because of performance and reliability issues that plague automated tests running at a higher level). It makes sense that kernel OS code would be more suitable to such higher level testing, given it typically doesn't have expensive/slow dependencies on external resources out of its control.


They do perform automated fuzzing which isn't the same thing but for most of its use cases like drivers, it might be enough so long as coverage is validated routinely.


Unit tests are a waste of time. System level tests are awesome. I am maintaining a very large C++ library used in production by large corporations around the world. And I haven’t had a production bug, not even one, for more than 6 years. The reason why is 9000+ system level tests. A combination of 1500+ hand written tests and 7500+ auto generated regression tests.


I have spent more time fixing bugs in unit tests than fixing bugs they exposed. The one exception is in untyped scripting languages, where tests are necessary to run every piece of code, to at least attempt to ensure a type or function signature has not changed in a way that completely breaks the code.


Please forgive me for a potentially dumb question:

Is there a reason something like the scheduler, which should be mostly algorithmic, should lack unit tests? I can see an argument against some aspects of device driver testing, but the scheduler is in a different class of code, right?


There are automated (integration) test rigs that run your changes that watch Linux kernel branches. If you send a patch to a Linux kernel maintainer then it will make it into their branch and then you (the author of the patch) will get emails with the results of the automated tests.


That's super cool! Did you personally patch something in the kernel? May I ask what it was?


I fixed some stuff in the Allwinner A20 (an ARM CPU) audio driver. Added a lot of mixer controls :)


My intuition tells me the minimal testing + community testing is not sufficient. But I’m curious what reality tells us. How often has this been a problem? How often are there significant bugs that testing would have caught?


I don't have that kind of data, so any numbers that I might give are based solely on my own experience and my circle of friends working in tech. That said, a friend of mine hit an EXT4 resize bug (https://www.spinics.net/lists/linux-ext4/msg86163.html) the other day. IMHO regressions on fairly common features should be captured by unit tests and should never make their way into production.


Unit testing is a way to debug logic errors, not "hardware" errors.

It's like a quarantine for when you know a part of the codebase is infected by "bugs".


Ceci n'est pas un test. ("This is not a test.")


The real reason is that there are just too many units.


The Linux Test Project exists.


It’s because unit testing is hard and most OSS devs are wankers.


Linux didn't have unit tests.

For the most part we had Linus eyeballing every line of code before merging it. And if you wrote an extra if or did a boo boo using the wrong enum flag or you overran your buffer, you were flogged, berated and pitied in front of an international audience of engineers.

I don't know how things go now in the post-sensitivity world.


My opinion:

Linux Kernel coders are real programmers. Do you think Mel wrote unit tests?[1]

Real programmers are discrete mathematicians in their head. With pointers.

Their code isn't perfect.

By analogy, when Andrew Wiles published his some 109 page proof of Fermat's Last Theorem it had bugs. [2]

The mathematics community tested it and he eventually corrected it. The Linux Kernel is like that.

There are no unit tests for a^n + b^n = c^n because no positive integers satisfy it for any n above 2.

You can't unit test your way to secure, correct code in the kernel either. Only a community of testers and verifiers can do that.

"given enough eyeballs, all bugs are shallow"[3]

[1] http://www.catb.org/jargon/html/story-of-mel.html

[2] https://en.wikipedia.org/wiki/Wiles%27s_proof_of_Fermat%27s_...

[3] https://en.wikipedia.org/wiki/Linus%27s_law


>> You can't unit test your way to secure, correct code in the kernel either. Only a community of testers and verifiers can do that.

Unit tests cannot show the presence of bugs that you are not testing for, but they can guard against re-introducing bugs you've already fixed, i.e. regressions.

Not having unit tests strikes me as either hubris or lack of resources.

Are you really going to have people run repeated tests even if some of the tests could be automated? If a test cannot be automated, then that's understandable, but why wouldn't you automate what you can with unit tests and integration tests?

I would further argue that even if unit tests, integration tests, and a community of testers and verifiers are working together there will still be bugs, some of which will be critical.

Why wouldn't you use every tool at your disposal in addition to community / people to find and fix the bugs?


If interface A is verified to be correct code (not tested) and it's replaced with interface B that's also verified to be correct, why would there be any regressions?

The reason I see not to do it is it takes time away from writing correct code. I could spend half my time writing unit tests or I can spend twice the time writing the code in a verifiable language.


>> If interface A is verified to be correct code (not tested) and it's replaced with interface B that's also verified to be correct, why would there be any regressions?

How are you verifying that interface A and interface B are correct?

You would either need to test manually, test automatically (i.e. test automation, unit tests, integration tests, fuzzing, and perhaps even mutation testing or design by contract), or maybe you are using formal methods (proving correctness with mathematical rigor). If you are not testing, how are you verifying that the code is correct?

"It works on my machine" or "my teammate peer reviewed it and did not see anything wrong" are pretty low standards and how a lot of bugs get shipped and discovered in production.

>> The reason I see not to do it is it takes time away from writing correct code.

Again, how do you know the code is correct? Spending more time writing code is pointless if the code has bugs and exploits.


When Andrew Wiles writes a flawed proof he gets to think for a while and try to fix it. When someone puts broken code into Linux, someone else gets to think for a while and try to fix it. Except they're losing thousands per hour, or someone is having their phone hacked. Unit tests are just one part of the story, but they can be very important.


I think their point is that Andrew Wiles' flawed proof was shown to a community and improved without any automated testing because something can be verified before it's integrated and that verification never changes.

You don't need testing for verified code interacting with verified code in an environment verified to run verified code, but testing gets you "good enough" reliability in half the time, so we don't spend the time on verification.


I'm not sure I see the point of this comparison. Publication of a bad proof with an untrue conclusion may lead to man-years of wasted research effort.

The dollar figure of the cost of a bad proof, or of a kernel bug, will always depend on the specifics of the case.



