To add to all the excellent answers in this thread: unit tests are massively overrated.
It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.
What is the external interface of the kernel? What is its surface area? A kernel is so central and massive the only way to test its complete contract with the user (space) is... just to run stuff and see if it breaks.
TDD has some good ideas, but for a while it turned into a religion. While tests are great to have, a good and underrated integration testing system is simply having someone run your software. If no one complains, either no one is using it, or the software is doing its job. Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
Unit testing is not the same thing as the TDD religion. At its base, it just means dividing things up into small units and testing their functionality as exhaustively as possible. It's divide-and-conquer.
The hard part in adding unit tests is deciding what a unit is, and when a unit is important enough to deserve its own battery of tests. Choosing the wrong boundaries means a lot of wasted time and effort testing things that likely won't break, or that change so fast the tests become a drag on refactoring.
I disagree that a kernel can't or shouldn't be unit tested. At the very least, it has a strong interface in the userspace system calls. Of course you should unit test the system calls. Especially because Linux's motto is not to break userspace.
The other, overlooked benefit of unit testing is that it accelerates optimization. A good set of unit tests covering all observable behaviors of a system means you can optimize the logic inside and constantly rerun those tests to make sure nothing is broken. This speeds up the experimentation process and clearly delineates the interface, so you can see when and how you can "cheat" on things that aren't observable.
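To make this concrete, here is a minimal sketch (made-up function, not anything from the kernel): the test pins down only the observable contract, so the naive loop inside could later be swapped for an unrolled or SIMD version and re-verified just by rerunning the test.

    #include <assert.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Unit under test: the naive loop is the part we might later optimize. */
    static uint64_t sum_u32(const uint32_t *v, size_t n)
    {
        uint64_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += v[i];
        return acc;
    }

    int main(void)
    {
        const uint32_t a[] = { 1, 2, 3, 0xffffffffu };

        /* Only observable behavior is asserted, so the implementation can change. */
        assert(sum_u32(NULL, 0) == 0);                  /* empty input */
        assert(sum_u32(a, 4) == 6ull + 0xffffffffu);    /* no 32-bit overflow */
        return 0;
    }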
So, hard disagree. Test the bejesus out of the kernel. It will harden it and open up the ability to hyperoptimize the units inside.
edit to add: well-written (i.e. terse and readable) tests are excellent documentation on how a unit should behave.
I both agree and disagree. There are PARTS of the kernel that probably should be unit tested. As I specified in my answer, these tend to be algorithmic. I would also agree that an interface that cannot change ("don't break user space!") possibly makes a good candidate.
However, I think there are parts that probably should NOT be unit tested, because unit testing isn't free and slows down change by getting very close to internals. As an example, device drivers that are tied tightly to hardware. While it would sound nice to have an nvidia GPU fake/mock, in reality it is probably not possible to create one, keep it up to date, etc. The complexity would be enormous and might not show anything as you wouldn't know if it was a bug in the fake or driver.
As such, I retain my answer of: it depends what you are trying to accomplish. Different testing strategies for different types of code and domains.
There are some gray areas, sure. I usually find a way around making mocks, but maybe it's just the domains I am in. My biggest mistakes are mostly just writing tests that are too intertwined with internal interfaces. So they break all the time and need to be adjusted when refactoring. That's a drag.
I think the same is true of any sufficiently complex project. There are areas where unit tests are valuable, and areas where the costs outweigh the benefits. There isn't anything really special about a kernel in that regard (except for the specific things that aren't worth unit testing).
In a commercial setting, unit test coverage inevitably becomes a KPI that needs to be driven upwards regardless of whether rank and file engineers consider it worthwhile in particular cases.
> As an example, device drivers that are tied tightly to hardware. While it would sound nice to have an nvidia GPU fake/mock, in reality it is probably not possible to create one, keep it up to date, etc. The complexity would be enormous and might not show anything as you wouldn't know if it was a bug in the fake or driver.
Ok, but then again: How would you make sure your Nvidia driver is working correctly?
> If you have a good set of unit tests that test all observable behaviors of a system means you can optimize the logic inside and constantly rerun those tests to make sure it's not broken
This is a naïve utopian world view. The word "good" is doing all the heavy lifting for you. In the last 15 years I haven't seen a single company that had "good" unit tests. I'm inclined to believe they don't exist.
Some of the tests I saw turned out to be good. Most of them weren't.
The reality is that if you're committing to unit tests, a big chunk of your tests will be shit. And when that's the case, accelerated optimization is far from guaranteed.
"Deciding what a unit is" isn't the hard part. The hard part is finding the units that benefit from unit tests (and convincing other religious people about this, as I am trying to now, which in this case - not many).
The Linux system calls are "deceptively simple". Simple to test, right? How complicated can say `write(2)` be? But if you actually tried doing it, I'd be surprised if you can write reliable "unit" tests beyond writing to /dev/null.
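To illustrate, here is roughly the whole surface you can pin down cheaply, as a userspace POSIX sketch rather than anything kernel-internal: return counts and the obvious error codes. Signals, short writes under load, O_DIRECT, and the behavior of specific filesystems and devices (the things that actually break) are exactly what a test like this never touches.

    #include <assert.h>
    #include <errno.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        char buf[8];

        assert(pipe(fds) == 0);

        /* Happy path: count written equals count requested. */
        assert(write(fds[1], "hello", 5) == 5);
        assert(read(fds[0], buf, sizeof buf) == 5);
        assert(memcmp(buf, "hello", 5) == 0);

        /* Obvious error path: invalid descriptor. */
        errno = 0;
        assert(write(-1, "x", 1) == -1 && errno == EBADF);
        return 0;
    }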
The practical way to test system calls is with real-world usage. Unit tests here might catch some problems, but the vast majority of kernel bugs aren't ones that can be caught with unit tests. (If they're bad enough, the bug will crash the OS before it finishes booting, no unit tests needed.) The more subtle bugs are often hard to reproduce and only happen under certain loads and hardware.
As to your final edit-to-add: you're joking right? Like Linux syscalls need more documentation. POSIX (I'm aware this is not exactly Linux documentation, but still) was a spec literally before Linux was started.
GP is using the original definition of unit test, GGP is using the religious implementation-level definition TDD advocates have come up with.
From GGP:
> but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.
They call this an integration test, but by the original definition, testing the interface is a unit test. An integration test is how multiple units interact with each other. Testing the implementation details is what TDD advocates have turned unit testing into, by changing the definition of "unit" from semantics (a self-contained module that does one thing from the business perspective) to syntax (a function).
Seems like you agree with the original definition. The TDD version of this GIF would be a test for each part that makes up the handle (the implementation details instead of the interface).
The semantics doesn't really matter either way. The comment I originally replied to basically implied the whole Linux kernel is one "unit" (or at least each kernel syscall is one unit... which isn't much different, given the large interdependencies between them).
Using that definition, that's just basically saying "the kernel can benefit from ... tests". Of course the Linux kernel doesn't have "TDD-religion unit tests", but there's already more than enough "tests" for the Linux kernel, and there's more than enough benchmark tests for the Linux kernel (for the optimization argument). They're just not "unit" tests in any meaningful way.
From the pedantic side of things, the original comment said "(TDD-religion) unit tests are overrated", then the reply was either "no they're not", or "no, unit tests (as originally defined) are not overrated". The latter is somewhat inconsistent with the "hard disagree", so either they were confused as to which definition they wanted, or cherry picked properties from both unit testing regimes.
I don't know how many LOC you could trigger with a write. Including all drivers, filesystems, ..., perhaps millions. Hardly a unit test. And then, a driver is difficult to unit test. BUT some parts of the kernel could use unit tests. The thing is, they mostly don't have them, and the kernel's usage shows those paths get exercised anyway; still, adding such tests wouldn't hurt.
To add to Guillermo's mantra, a good suggestion from the video I posted is "write tests exercising the internal logic before a big refactor. Then remove them."
Because otherwise, the more tests you have against implementation details, the more times your tests will break just because you've moved some code around. Tests need to go red when there is a bug, not because you've renamed a couple of private methods and now the mocks are broken.
Depends a lot on what it is you're doing. Like if you're writing some finicky library code, exotic algorithms or data structures or whatever, you'll probably want more test lines than code lines because that sort of stuff is notoriously difficult to get right. Even something as pedestrian as a binary search is downright gnarly to get working in all cases[1], the bugs are difficult to reason about and the code doesn't reduce to more comprehensible primitives.
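For instance, even the midpoint calculation hides a classic bug; a small illustrative sketch (not from any particular library):

    #include <assert.h>
    #include <stddef.h>

    /* Returns index of key in sorted a[0..n-1], or n if not present. */
    static size_t bsearch_idx(const int *a, size_t n, int key)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            /* (lo + hi) / 2 can overflow on huge arrays; this form cannot. */
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo < n && a[lo] == key) ? lo : n;
    }

    int main(void)
    {
        const int a[] = { 1, 3, 5, 7 };
        assert(bsearch_idx(a, 4, 5) == 2);
        assert(bsearch_idx(a, 4, 4) == 4);   /* absent key */
        assert(bsearch_idx(a, 0, 1) == 0);   /* empty array */
        return 0;
    }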
On the other hand, if you do that with application code, you're essentially submerging your code in a tar pit. Any change in behavior will require changing dozens of tests. Fixing broken tests will become so routine that they stop telling you anything about your code. Development will become "I changed from foo to bar, now let's update the 73 assertions that failed to whatever value the test runner says they have now".
To go farther: tests should not break during a refactoring. They're here to let you know if your refactoring broke something. They should even be green when you decide to scratch your software to rewrite it in another language.
Code coverage? Use it to determine what is dead code you can remove: if your tests don't go through some code, that code is useless.
I agree with the first part of your comment, but not the second. Achieving 100% code coverage is very hard and sometimes unreasonable. E.g. How do you test a GUI? I think that often it is better to write tests for the application logic and test a GUI by using it.
Rewrites are not super common -- but IMHO if you must do one, the first step is to move the tests over. Unless your goal with the rewrite is to rearchitect the interfaces, at which point you're basically greenfielding anyway.
If you’re using a lot of mocks you’re gonna have a bad time.
I think I can agree with the sentiment that mock heavy tests should be short lived, and why you don’t want to keep those around. Not sure I can agree with the rest.
I aim for max of two stubs per test, and many of those end up with a TODO. Pure functions need no stubs, and most well factored code should only need at most one. One stub one test is pretty sustainable. It’s easy to rewrite such a test if the requirements change. Unit tests should absolutely be disposable, but you don’t dispose of them all at the same time. Just the ones that don’t fit the new rules.
I do run into a steady stream of people who can’t seem to understand that the tests should affect the structure of your code. “And then a miracle happens” is what you have there - long intervals where the system produces no verifiable state or output are bad. That’s not an architecture. It’s a lack of one. A functional programming style makes this easier to avoid, but it’s not a cure, because the disease is in their heads; the code is the symptom.
There’s a substantial overlap between people with untestable code and people with undocumentable code. They can’t explain the code to the test framework any better than they can explain it to each other.
All that said, testing is hard. It shouldn’t be this hard and we need to keep looking for ways to improve that situation. But even here we have people who reach for the least expressive solution quite frequently, such as assert over more reflective matchers, which make for much more useful red tests. At least BDD style seems to be winning out.
I worked at one place that had so many mocks, your tests were barely testing any real code. In one case, a dude checked in a 5000 line test suite for an "internal API." The server was mocked. All the tests were doing was checking that calls echoed back their own parameters. What was the point? Well, the API client now had 100% coverage.
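A condensed sketch of what those tests amounted to (hypothetical names, reconstructed from memory): the client just forwards a request through an injected transport, the mock echoes the request back, and the assertion checks the echo. Full coverage, nothing tested.

    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* "Client" under test: all it does is forward a request through the transport. */
    typedef int (*transport_fn)(const char *req, char *resp, size_t n);

    static int get_user(transport_fn send, int id, char *out, size_t n)
    {
        char req[32];
        snprintf(req, sizeof req, "GET /user/%d", id);
        return send(req, out, n);
    }

    /* Hand-written mock: echoes the request back as the "response". */
    static int mock_send(const char *req, char *resp, size_t n)
    {
        strncpy(resp, req, n - 1);
        resp[n - 1] = '\0';
        return 0;
    }

    int main(void)
    {
        char resp[64];

        /* The only thing verified is that the mock echoed our own input back. */
        assert(get_user(mock_send, 42, resp, sizeof resp) == 0);
        assert(strcmp(resp, "GET /user/42") == 0);
        return 0;
    }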
Going into new segments of our code, I’ve had to rewrite an awful lot of tests. It was a mess, and each one had at least a few tests that were only testing the mocks. Hand written mocks at that. Those people need to be stopped.
It wasn't that bad at a previous job, but it was close. The sad part is nobody else would comment on the useless nature of these tests that didn't actually test anything.
It’s very tricky to be against any kind of testing in a professional setting… it opens you up for other engineers to question your maturity, professionalism, and commitment to reliability. Or at least to look substantially more mature, professional, and committed to reliability than you. In front of management that can be death.
People with a lot of clout have absorbed the virtues of automated testing in general and applied it to unit testing in particular. It’s hard to swim upstream on that one.
It's true. I mean, I wouldn't comment on them either. I'd just roll my eyes when asked to review another 4000 line "test suite" that upped the coverage but did not test anything meaningful. I wrote many useless tests myself, assigned tasks like "upping test coverage for module XYZ." They'd all get thumbs up and looks-good in reviews.
To play devil's advocate here: what usually suffers the most in integration-only testing is error handling. In some codebases it's extremely important to test all the error handling paths, because they are not extremely exceptional events. Many times, a disproportionate amount of failures come from error handling not doing precisely the right thing, even though it looks reasonable upon code review.

It might not mean 100% coverage, but in these situations, ensuring comprehensive tests of all the error conditions (inducing OOMs and other resource limits) helps make sure the failure paths work just as well as the normal ones. Failure of a failure path can manifest, at best, in ways such as hiding the true source of an issue, telemetry problems, or logging statements without salient values; at worst, crashing because of a chain of events triggered by seemingly innocuous error handling that passed a code review.

OOM modeling to test every code path that allocates in a function is the most tedious thing in a gtest, but often yields the most surprising, actionable fixes. Sadly, most programmers just ignore OOMs and hand-wave: "if allocation fails, the system is broken and my software doesn't need to work correctly".
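A rough sketch of that OOM modeling idea, in plain C with a malloc hook rather than the gtest setup described above (illustrative names only): fail the Nth allocation, loop N over every allocation site, and assert the unit either succeeds or fails cleanly.

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    static int fail_after = -1;          /* -1 = never fail */

    /* Allocation hook: fails once the injected countdown reaches zero. */
    static void *xmalloc(size_t n)
    {
        if (fail_after == 0)
            return NULL;
        if (fail_after > 0)
            fail_after--;
        return malloc(n);
    }

    /* Unit under test: must return NULL (and leak nothing) if any allocation fails. */
    static char *dup_twice(const char *s)
    {
        char *a = xmalloc(strlen(s) + 1);
        if (!a)
            return NULL;
        strcpy(a, s);
        char *b = xmalloc(strlen(s) * 2 + 1);
        if (!b) {
            free(a);
            return NULL;
        }
        strcpy(b, s);
        strcat(b, s);
        free(a);
        return b;
    }

    int main(void)
    {
        /* Fail the 1st allocation, then the 2nd, then let all succeed. */
        for (int n = 0; n <= 2; n++) {
            fail_after = n;
            char *r = dup_twice("ab");
            if (r) {
                assert(strcmp(r, "abab") == 0);
                free(r);
            }
        }
        return 0;
    }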
For the webdev world, I think a big reason for this change is that many web systems talk to so many underlying 3rd party services that oftentimes it's subtle, backwards-incompatible changes in those other services that break things. If you're only testing everything with mocks, you're never going to catch what will often be the worst breakages and bugs that your users experience.
I think better type systems, like the one in TypeScript, have made mainstream the use of types to remove some basic bugs. This makes some basic tests unnecessary and frees up time to focus on the important ones.
> Write as many tests as possible, 100% code coverage, in fact 100% branching code coverage.
In my eyes this still holds true. Software that has expected behavior should be tested to make sure that the behavior isn't broken by changes to any of the involved hot code paths or data formats.
I once wrote a code library for some boring business system that handled integration between the system and a JWT library, which would make sure that certain requests should be serviced. Using a library without tests wouldn't be acceptable (security-related/adjacent code is perhaps the best example of such circumstances). Nor would my own code be acceptable without tests, given that this library would be used across multiple services within the system.
Thus, I wrote tests until I got pretty close to 100% coverage, and doing so actually helped me discover a few bugs while writing the tests! Once the need to refactor something arose due to changing requirements, the breaking tests told me exactly what I had overlooked while making those changes. And if I'm long gone and someone comes to make changes to the library, the CI will tell them about the things they might overlook themselves, aside from any boring wiki or other docs they wouldn't read. The tests also demonstrate all of the ways the code can actually be used, so alongside the occasional code comments, they serve as living documentation.
There absolutely are cases where testing something won't be viable (e.g. behavior that depends on the file systems and runtime installed on the system, where all you get is a leaky abstraction in front of them and your test setup doesn't cover every platform; for example, checking which file paths are parsed as valid and which aren't across different file systems on different platforms), but in most systems those cases aren't the majority.
> Use an absurd amount of mocks to achieve this
You also hit the nail on the head here - this is a problem and a symptom of us perhaps developing all of our systems wrong. The main reason for not writing tests (one that I can understand) is the fact that it's not easy to do so. You end up with various mocking frameworks and libraries that try to take away some of the pain caused by the fact that your entire system is not testable, but end up with more complexity to dance around in the end.
I think the only way around this is data-driven design coupled with functional programming in ample amounts, with as many pure functions as you can get. This would be completely un-idiomatic for many of the languages out there (e.g. those that rely on injecting services/repositories/whatever into fields, instead of passing everything a function needs as parameters), but it's also the only way to make testing easier. Maybe passing interfaces to "services" (many seem to use the service pattern) is the wrong approach, and instead you'd pass in the separate methods that your code will use. So instead of passing in UserService you'd pass in UserService::getUserById.
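A minimal sketch of that idea, transposing the hypothetical UserService example into plain functions: the code under test takes only the one lookup it needs, so a test can hand it a two-line stub instead of standing up a whole service.

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    struct user { int id; const char *name; };

    /* Inject just the lookup the function needs, not a whole UserService. */
    typedef const struct user *(*get_user_fn)(int id);

    /* Unit under test: knows nothing about where users come from. */
    static const char *greeting_for(get_user_fn get_user_by_id, int id)
    {
        const struct user *u = get_user_by_id(id);
        return u ? u->name : "stranger";
    }

    /* Two-line stub standing in for the real lookup. */
    static const struct user *stub_lookup(int id)
    {
        static const struct user alice = { 7, "alice" };
        return id == 7 ? &alice : NULL;
    }

    int main(void)
    {
        assert(strcmp(greeting_for(stub_lookup, 7), "alice") == 0);
        assert(strcmp(greeting_for(stub_lookup, 8), "stranger") == 0);
        return 0;
    }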
So in a sense, it's a struggle to find the balance between code that is absolutely untestable without mock hell (to the point where you're testing mocks and your tests are useless) and code that goes fully against how things are done in any given language and its frameworks, where you probably end up with more decoupling code than you have time to maintain.
> write tests, not too many. mostly integration
In an imperfect world, I guess we can just pretend that this is okay, because it will give you the most results, compared to the amount of work you need to put in. At the end of the day, nobody wants to pay 10x more for systems that are nearly perfectly tested, they just want something that is vaguely decent and will accept hand-wavy apologies for everything constantly breaking, as long as the breakages are small and non-critical enough. Devs also don't seem to typically enjoy writing tests, in part due to some systems not being testable easily, but also because of many tools, in particular mock frameworks and even integration testing tools (like Selenium, which thankfully has more and more alternatives), just being unpleasant to use.
> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
Maybe not, but isn't it true that any new code that is being added to the kernel has been run on exactly 0 devices? And new code is being added all the time.
It's one thing to say that it's impossible to test effectively locally before release (which I'm not sure is true or not). But you're saying it's just not worth testing because it'll break in real life and that's even better, which I'm not sure I can agree with.
That merge only has fixes because of how the development model works: there’s a merge window where new features are merged, which then becomes -rc1.
After that only fixes are allowed until the final release of that version and then the merge window opens again.
I think that's a bit of a strawman. I don't think anyone says tests prevent all bugs. They can help pin some parts of the contract down and give you a place to check for regressions.
SQLite doesn't have hardware drivers (which IIRC are still the majority of kernel bugs) that need actual hardware to test (hardware mocks are at best mildly useful, coz real hardware can be... weird)
And unit testing is relatively useless here; what would be useful are end-to-end tests that start from the userspace API, but that's a much harder task, altho hopefully those tests wouldn't need to change that often.
I work as a software developer in a hardware company. We have A LOT of tests.
But testing costs a lot more when there is hardware present. It's a whole new dimension in your testing matrix. You can't run on cloud provider CI. Hardware has failures, someone needs to be there to reboot the systems and swap out broken ones (and things break when they run hot 24/7). And you need some way to test whether the hardware did what it is supposed to do.
Although some kernel driver developers have testing clusters with real hardware (like GPU drivers devs), there is no effin way Linux could be effectively tested without someone paying a lot of money to set up real hardware and people to keep it running.
Of course the hardware can be simulated but simulation and emulation are slow (and potentially expensive). At work we run tests on simulation for next gen hardware which doesn't exist yet. It is about 100,000x slower than the real deal, so it's obviously not a solution that scales up.
Testing purely software products without a hardware dimension is so much simpler and cheaper.
I think people have forgotten or indeed never learned in the first place that testing terminology borrows from physical device testing culture. A “fixture” for instance is a mock environment that you place a piece of hardware into to verify it.
Hardware testing takes up a pretty sizable portion of the development cycle. Why do we think we are special?
kind of this -- I can't imagine any hardware company moving forward without fixture test flows of all sorts and a lot of effort put into them. It really seems like an area where software tries to stand on the edge of "art" forgetting that there is a science and craft needed when teams revolt against testing.
> Although some kernel driver developers have testing clusters with real hardware (like GPU drivers devs), there is no effin way Linux could be effectively tested without someone paying a lot of money to set up real hardware and people to keep it running.
Just crowdsource this testing: I am sure that there exist some people who own the piece of hardware and are willing to run a test script, say, every
* night
* week
* new testing version of the kernel
(depending on your level of passion for the hardware and/or Linux). I do believe that there do exist a lot of people who would join such a crowdsourcing effort if the necessary infrastructure existed (i.e. it is very easy to run and submit the results).
I think you are vastly overestimating the willingness and capability of volunteers and underestimating the effort needed to coordinate such an effort.
And having any kind of manual intervention required will almost certainly reduce the reliability of the testing.
This is further complicated by the need to reboot with a different kernel image. Qemu and virtual machines can't do all kinds of hw testing needed.
And in fact, the kernel is already tested like this. Just very irregularly and sporadically. The end users will do the field testing and it is surprisingly effective in finding bugs.
> I think you are vastly overestimating the willingness and capability of volunteers and underestimating the effort needed to coordinate such an effort.
> And having any kind of manual intervention required will almost certainly reduce the reliability of the testing.
Perhaps I am underestimating the necessary effort, but the willingness and capability problem can in my opinion be solved by sufficiently streamlining and documenting the processes of running the test procedure.
If the testing procedure cannot be successfully run by a "somewhat experienced Linux nerd", this should be considered a usability bug of the testing procedure (and thus be fixed).
The NixOS community would be perfect for this, since "nothing" (I wouldn't wanna do experimental filesystems) can break my system, I just atomically roll back to my previous generation and report it broken :)
You can buy TH3 Testing Support for SQLite. Sole item on pricing page (https://www.sqlite.org/prosupport.html) that has "call" as a price. From their page: "The TH3 test harness is an aviation-grade test suite for SQLite. SQLite developers can run TH3 on specialized hardware and/or using specialized compile-time options, according to customer specification, either remotely or on customer premises. Pricing for this services is on a case-by-case basis depending on requirements."
SQLite Test Harness #3 (hereafter "TH3") is one of three test harnesses used for testing SQLite. TH3 meets the following objectives:
- TH3 is able to run on embedded platforms that lack the support infrastructure of workstations.
- TH3 tests SQLite in an as-deployed configuration using only published and documented interfaces. In other words, TH3 tests the compiled object code, not the source code, thus verifying that no problems were introduced by compiler bugs. "Test what you fly and fly what you test."
- TH3 checks SQLite's response to out-of-memory errors, disk I/O errors, and power loss during transaction commit.
- TH3 exercises SQLite in a variety of run-time configurations (UTF8 vs UTF16, different page sizes, varying journal modes, etc.)
- TH3 achieves 100% branch test coverage (and 100% MC/DC) over the SQLite core. (Test coverage of extensions such as FTS and RTREE is less than 100%).
> tests the compiled object code, not the source code
That's not something you see called out very often. Correct code fed to a broken compiler can definitely give you a broken binary. Likewise a correct binary will pass on a correct simulator and may fail on broken hardware.
I would note here that SQLite gets reasonably close to the Unix Philosophy. It’s not really a feature factory sort of application. That does make it quite a bit easier to write good tests. Applications that don’t know what they are or what they do are difficult to pin down.
I take that as an argument not for or against tests but against making it easy for management to turn your project into a feature factory.
A project as large and well funded as the Linux kernel could have a hardware test farm at least with reasonable coverage of popular hardware. "But Linux isn't that well funded!" Sure, but it's orders of magnitude better funded than Embassy[0], which runs tests on real hardware automatically before every merge.
There's also the Linux Test Project, which is technically third party. It's not clear to me how extensive it is, but for a project as important as Linux I think it has to be graded as "needs improvement."
I don't think you can argue they are classic, mock-the-world, test-one-function tests. The giant majority of those are just huge amounts of SQL statements running all over, which are integration tests.
The API is massive. Just the number of build parameters and boot parameters is very high. There is also a ton of things you can tune at runtime in /sys and /proc. Not to mention all the syscalls and the subtle ways they all interact with each other.
Those are more integration tests (load stuff into the DB and query it out) vs. unit tests (set up mocks around functions and call them with different arguments).
It's a cursed notion that unit-tests are "set up mocks around functions and call them with different arguments".
It's more useful to categorize tests by how hard it is to set up their environment. In reality the cost/usefulness line lies on this boundary:
* in single process
* multiple process using IPC
* multiple processes using network
* tests validate functions calling foreign services
I see many devs (including myself) call "integration" tests as "unit" tests because in your particular app/system they are easy to spawn locally even without any container.
While I tend to agree, the Go toolchain authors are very specific: it doesn’t count as unit test coverage when the code and the test are in different packages. And our management has no interest in breaking with them on this.
You are free to write tests that compose packages, and I’ll do it for my own sanity and confidence. But that’s in addition to, not instead of, the minimum “not horrifically irresponsible engineering” standard of universal, exhaustive, very detail oriented mock driven unit tests within all packages.
Agreed, people tend to over complicate and make testing religious. In reality, testing is nothing more than a means to an end. Ask yourself these questions for your project:
1. Are there pieces that can be tested independently where verification is valuable?
2. What is the impact of a bug or regression?
3. What type of testing is most likely to uncover issues in my product given its architecture and domain?
4. Am I confident to refactor the code base without creating new bugs?
etc. etc. IOW, testing isn't magic; it should be driven by business goals. For example, I use Rust, which due to its great type system makes some types of code "just work"; however, Rust cannot prevent logic errors. This means when I'm writing 'algorithmic code' I tend to write heavy unit tests. When I write API-driven code, I tend to use more integration/end-to-end style testing. Do what makes sense for your code and goals. Tests take time and need to be refactored, so they aren't free, but they can be valuable.
> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
This attitude blows my mind. The point of tests is to find bugs before you ship them to the user!
Writing tests after you've already verified that some piece of code is working correctly (e.g., by executing it 10^15 or whatever times on different systems with different inputs) is useless, I give you that. But what if you want to make a non-trivial change to the code? How are you going to make sure you didn't introduce any bugs without tests?
I think the OP took the strictest and dumbest definition of testing possible, used that as his straw man, and then lit it on fire.
The fact that systems level programmers, generally speaking, have a disregard for testing automation is apparent. Over the years if somebody had the "pleasure" of running any type of linux distro, how many times would things break randomly for reasons that the end user could only imagine? Many of those issues probably could have been surfaced and fixed before the code shipped had there been any testing automation in place.
But let's forget linux for a minute. Look at Windows. For the longest time the quality of windows was the biggest joke in software. And those engineers working on it, sorry to say, were cut from the same cloth as the Linux kernel folks and all other systems programmers. We don't need no stinking tests. Well eventually somebody that cared about the reputation of the company forced testing upon these teams and lo and behold things have gotten way better. Apple is the same way.
To me it is honestly astounding that there is still such a strong anti-testing sentiment in the industry. When you have more than a handful of people working on a complex system I greatly prefer to know that there is a robust test suite looking at important functionality, seeing if performance degrades, checking for static analysis errors, etc. And when something does break, since I already have a testing solution in place, it is generally easy to add a test which covers the regression and ensures that it never comes back.
>What is the external interface of the kernel? What is its surface area? A kernel is so central and massive the only way to test its complete contract with the user (space) is... just to run stuff and see if it breaks.
Having never interacted with the kernel, I have to assume it isn't just one massive file, right? It's broken into separate files and components? And if you ever want to modify or refactor one of those components, it's nice to have the confidence that your code changes are safe without having to rebuild the whole thing and run your integration/behavior/e2e tests. Especially if you aren't the one who originally wrote the component you're modifying and don't know what the intended behavior for edge cases was.
Obviously if you're mocking everything then the tests might be pointless (there can still be value in preserving in the test what you assumed the behavior was), but I feel like in some ways people have over-corrected on TDD and are claiming unit tests are pointless.
> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
Yes, you do, because otherwise you will find that some change or new hardware will break it.
If some hardware breaks it, the hardware must be seriously broken.
If a change breaks it, someone will probably have called read(2) a few times before that kernel is shipped to you. A change is filtered through multiple developers, and the kernel has some kind of smoke and consistency tests it is subject to. Also there are people and companies that compile their kernel from the master branch and run software on it.
No one's gonna ship Linux with a broken read syscall. And if there were tests, one could still ship a broken read syscall anyway.
> If some hardware breaks it, the hardware must be seriously broken.
And it sometimes is.
> If a change breaks it, some one will probably have called read(2) a few times before that kernel is shipped to you.
I've seen too many kernel bugs on new hardware to believe this is enough.
Maybe not in read(), but still in something that should work.
Throwing testing on users may have been acceptable in the past, but not anymore.
I was thinking of Red Hat (for example), which pays kernel developers and regularly tests and backports features for its paying customers. That means having a finger on the pulse of the kernel tree, and trying changes to decide whether they might be worth porting to their supported kernel branch (which lags behind master).
If you have 100000 servers it's in your interest to test HEAD on 100 of them to be able to catch any bugs before they hit a release. That's how open-source should work!
Why not run 1-2 releases behind and test/qualify a later release?
A commenter below gave a good example of a company that would need to run a bleeding edge kernel (Red Hat). The vast majority of companies would never need to do that though.
> Why not run 1-2 releases behind and test/qualify a later release?
This may work for most people, but not all. It's easier to get things fixed while developers still work on something, than weeks/months later. Also if you have not most-popular hardware/software configuration, testing done by others isn't necessarily sufficient for you.
Maybe your company needs that super duper new feature that's on master. Or you're Google scale and that 0.5% improvement in IO throughput would save you millions a year. You wouldn't want to wait until it hits the stable branch if possible.
If one person globally used a feature, it crashed and they didn't report it (so it was not critical for them), it's a massive waste of time and energy to commit and maintain a test case for it. In fact, perhaps that feature has no place existing at all in the first place and should be dropped.
You're underrating the fact that Linux's massive and diverse user base is a great filter for bugs. Only the weirdest heisenbugs can survive this filter, and because of their nature, they wouldn't be tested anyway.
I always assumed the fact I can reliably break Linux on Laptop A with Docking Station B by doing Action C was pointless to report, because how is anyone supposed to reproduce it if they don't own Laptop A and Docking Station B?
Can you reliably break Linux on Laptop A with Docking Station B? Then go here: https://bugzilla.kernel.org (and follow the big yellow notice)
There's a good chance it's a firmware or hardware bug Linux cannot fix, but it's worth a shot.
But to answer your question, Laptop A and Docking Station B probably interface with each other with the same two chips/subsystems present in other devices. And if they work with Linux, the maintainers of these drivers, often the manufacturers, are publicly listed and are the ones that will try and troubleshoot it with you.
At the company I am working at, 80% of the code needs to be unit tested or you won't be able to commit. This results in tons of useless unit tests, and you learn to restructure classes in a way that makes writing the unit tests later easy instead of what would make the most sense, for example exposing fields that should be private.
I'm a huge proponent of good testing, mainly because it's so much easier to work sustainably on a code base with good tests. But I hate, hate, hate test coverage mandates. Every time I've seen them, it's as you say: some people write garbage tests to hit the metric. Before with those people you wouldn't have been able to trust their code. Now you have two problems, because you can't trust their tests either.
> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
I’m not sure why you’re expecting the answer “no” here. The biggest value of automated tests is non-regression. So with billions of users, yes, you absolutely need reliable and reproducible tests. Maybe not unit tests (I agree they’re overrated, you want to test contracts and behaviors, not implementation details) but a strong integration test suite is a must have. To ensure you don’t break anything your billion users might rely on.
I do actually agree with you, what I am saying is that a big part of Linux testing is crowdsourced: you and I and hundreds of millions are testing if there's a regression every time we run Linux, or running buggy, incorrect software on machines in random levels of stability and soundness.
Unit tests and TDD should be used as tools when necessary. Unfortunately, many team leads and higher-level engineers don't take the time to explain to juniors when to use these tools. Just that the tools are essential. And therefore, for many engineers, they have become a religion, or worse - a cargo cult.
But really, most code does not need to be tested. The parts that are likely to break when others change them should be. Testing can mitigate risks for complex, esoteric, heavily depended-upon, domain-specific, and technical code. In such cases, tests can save a lot of time that would be spent fixing defects later. In contrast, testing as a religion is entirely counter-productive. It's a time sink in an effort whose whole purpose is to save time. And what is worse, poorly written tests make the codebase less maintainable, more complex (to pass wrongly written tests), and degrade code quality (through a false sense of security).
Functional/integration testing can suffer from the same problems. There's nothing more annoying than fixing a defect in a system only to uncover a mess in functional tests that relied on the bug. Of course, some of this is unavoidable. However, cargo cult-style testing is avoidable and entirely self-inflicted harm in many teams.
In short: testing is a tool to save time and increase code quality; such a tool should be used where it saves time and improves code quality, and it should not be used where it just results in the opposite.
Is it weird if I would be *thrilled* to work with a manager like that?
People here have great arguments against tests. I agree that they’re not always useful. Aiming for 100% code coverage is a waste of time.
But if a manager wants easily measurable metrics to hit and they have the budget and bandwidth for it, why should I complain?
Writing tests is easy, predictable work. My rate is my rate and if this is how management dictates I spend my time, I’ll take an easy couple weeks and paychecks any day.
But is it useful work? This is the problem: engineers have at some point decided that keeping busy writing tests is better than finding other ways to write correct code.
Now that TDD has died down as a religion, the current fad has become strong typing. I will repeat it again and again: I have yet to see an error caused by me passing an int where an array type was expected in Python or JavaScript. I have been doing this for a long time. This too shall pass and we will see posts like “Show HN: new lightweight dialect of XYZ without the burden of types”.
Static typing is machine-readable and machine-validatable documentation largely of a sort you should be writing anyway. If adding static types is a substantial burden either you're stuck with a really bad language, or you weren't writing enough documentation in the first place. Further, if things like much-improved autocomplete, far better editor-provided errors while editing, and better refactoring aren't saving you time, overall, we must write code very differently. And during handoff of code, static types are really nice and save tons of time.
I am not writing code to make it easier for the computer to do its job. I am writing code so that the computer can make my job easier. The time spent telling a computer how to show me autocomplete suggestions is better spent creating APIs that are easier to use and need less of it. Or creating smarter autocomplete algorithms.
I think what the person replying was saying was not that static typing is easier for the compiler to understand, but that types being documented in some way (whether embedded in the syntax or via documentation) make code far more readable.
Python was my first programming language (cliche, I know) and I didn't understand the whole "static typing is good" thing until I learnt languages like C, C++, C#, and Rust.
Python also introduced type hints and made them a valid part of syntax for good reason. You'd typically end up writing a docstring with the return type of a function and the types of the parameters anyway, which is what the person replying to you was presumably referring to.
It's not about passing in an int instead of an array, but rather about being able to figure out what a function wants without needing a Stack Overflow thread or having to search through tons of documentation for something that would be trivial in statically typed languages, where the function declaration tells you quite a bit about the function.
Hence it's safe to say that types documented in code in some way are very useful for humans, and if you have issues with statically typed programming languages, there's a chance you were not documenting your code enough already.
Ah, the Sufficiently Smart IDE, companion to the Sufficiently Smart Compiler. Sure, go ahead and wait for the magic autocomplete algorithm that can deduce precise function signatures in a large Python or JavaScript code base.
Meanwhile, some of us have work to do, and we'll use the best tools at our disposal for managing non-trivial code written by many people across teams. And those tools include modern static type systems.
Refactoring is easier when you have either tests or types, but I think it's easiest when you have types as tests don't always remain compatible during refactorings. In TypeScript I let the types guide me as I change code. Too many times I've looked at a type error, thought "why would this be an issue", ignored it, patched up my test suite, and realized the type checker was right all along; I would have saved time just listening to the types.
For some languages it’s a whole lot more important than others. Python and JavaScript are memory-safe so you don’t need to worry about writing a bigger object into a too-small memory allocation.
You will discover that you reversed the order of parameters to a function by running your code, with much less effort than annotating all your code. A type annotation is entirely useless for that when both parameters have the same type.
Additionally, if you are testing the code anyway for other reasons (like, to check that the logic is correct), you exercise the dynamic type checking for free.
I have had a number of times where static typing would have saved me a lot of time in both Python and Javascript. I have a recursive-descent parser for a non-trivial language in Python, so when you pass something incorrect down at the bottom, it bubbles up and fails a bunch of calls later up the stack. Had I had static typing, the compiler would have caught it immediately. Similar story for a 3D model generator in Python. I've wasted enough time on stuff like this that the next Python project I expect to be over 1000 lines long, I'm going to write in some statically typed language.
I really don't understand how strong typing does anything related to testing behavior. I keep seeing this statement, and it makes me believe maybe these devs' idea of testing behavior is testing nulls or trying to pass in hashes with missing keys. What a waste of time.
If you don't have tests, all that means is you or somebody else is testing it manually.
I equate the two as popular HN religions, not as substitutes for each other. TDD is on its way out as a religion. Strong typing in dynamic languages is gaining popularity.
I don't think it's weird at all, in general. I actually believe that's the thinking of the majority of programmers. As employed by Tata, HCS, HP, IBM, Oracle, Cognizant and so many other huge corps employing developers.
We find it "funny" because that culture is perpendicular to the startup culture that abounds in this forum. But number-wise, the startup crowd is the minority.
The great majority of people are not wholly in love with their craft. They just want to do their job to get paid. They don't read about Rust on weekends, and don't program yet-another-X in their spare time.
And that's OK. That's what the majority of the world does.
> if this is how management dictates I spend my time, I’ll take an easy couple weeks and paychecks
I would rather my employees (and contractors!) work on what's actually valuable, and not just take blind instruction. The people closest to the code should be most empowered to improve it. Is that not a common management view?
I find writing tests is often not easy, because test tech debt is usually worse than regular code tech debt! Really, with non-TDD code the tests end up doing gymnastics to test that code. Code smell example: GetInternalHitCounter_ForTests()
> It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.
Okay, but Linux has internal implementations of base64, assorted hashing algorithms, some compression, probably other algorithmic tidbits; wouldn't those be easy to test?
Edit: And downthread people are saying that Linux does do that :)
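For what it's worth, a table-driven test against such a helper is about as cheap as unit testing gets. Here's a sketch against a toy encoder with a made-up signature (not the kernel's actual helper), using the RFC 4648 vectors:

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Toy stand-in for the unit under test (made-up signature, not the kernel's). */
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static size_t base64_encode(char *dst, const unsigned char *src, size_t len)
    {
        size_t o = 0;
        for (size_t i = 0; i < len; i += 3) {
            unsigned v = src[i] << 16;
            if (i + 1 < len) v |= src[i + 1] << 8;
            if (i + 2 < len) v |= src[i + 2];
            dst[o++] = b64[(v >> 18) & 63];
            dst[o++] = b64[(v >> 12) & 63];
            dst[o++] = (i + 1 < len) ? b64[(v >> 6) & 63] : '=';
            dst[o++] = (i + 2 < len) ? b64[v & 63] : '=';
        }
        return o;
    }

    /* RFC 4648 test vectors: cheap, table-driven, corner cases included. */
    static const struct { const char *in, *out; } vectors[] = {
        { "",       ""         },
        { "f",      "Zg=="     },
        { "fo",     "Zm8="     },
        { "foo",    "Zm9v"     },
        { "foobar", "Zm9vYmFy" },
    };

    int main(void)
    {
        char buf[32];

        for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; i++) {
            size_t n = base64_encode(buf, (const unsigned char *)vectors[i].in,
                                     strlen(vectors[i].in));
            assert(n == strlen(vectors[i].out));
            assert(memcmp(buf, vectors[i].out, n) == 0);
        }
        return 0;
    }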
The Linux Elantech V3 touchpad driver for an old 2013 laptop of mine (and most laptops of the similar era) has been broken since 2019 (reports half the size compared to actual coordinates, offsets vertical coordinates, so the entire right half is interpreted as edge scroll). I debugged and proposed a fix months ago, and there's been no motion towards merging it: https://lore.kernel.org/linux-input/20221014111533.908-1-nya.... Evidently, the amount of "run stuff and see if it breaks" they're doing so far is insufficient for niche hardware.
Additionally the in-tree Realtek R8169 driver fails to redetect Ethernet being plugged in after I unplug it for 5-10 seconds, and I had to resort to the out-of-tree R8168 driver as a workaround.
Ok, you're essentially saying that kernel code is too important to be tested. Honestly, that logic sounds like Effective Altruism levels of twisted to me.
So if tests don't cut it, what does the Kernel community then do instead of testing to verify correctness?
> Do you really need tests for the read(2) syscall when Linux is running on a billion devices, and that syscall is called some 10^12 times per second globally?
That's exactly the kind of code that needs tests.
What happens if someone changes some code that affects the read call and expects it to run on said billion of devices?
> It is good to know that your base64 encoding function is tested for all corner cases, but integration and behaviour tests for the external interface/API are more important than exercising an internal implementation detail.
I think this heavily depends on which part is tested or not tested. For certain parts of the kernel, such as the firewall, I truly believe that such test cases (including corner cases) should be present.
Probably the context switch logic (save registers, switch to ring 0 mode, etc.) in the kernel might be. That's called for every syscall, up to thousands of times per second per active core.
It would be cool to know which is the most frequently called piece of code in the world. Maybe something that hasn't been changed in decades.
Thank you! In a previous job I was doing the equivalent of integration testing and the very young collaborators were bashing me and demanded that I contribute to small random and useless (according to them even) tests that were not really testing anything.
Only time I've written 100% branching coverage unit tests was when I needed to implement a spec, with many alternate representation formats, and a plethora of test cases I could run through as a litmus test of conformity.
https://www.youtube.com/watch?v=EZ05e7EMOLM