A warning about 5.12-rc1 (lwn.net)
217 points by eplanit 41 days ago | 111 comments

I used to assume that Linux had a huge test suite with hundreds of thousands of tests, given the crazy feature matrix that the kernel has to support.

And that they would have started requiring tests for new features and bug fixes.

Alas, that's not the case. There are a few external test suites like the Linux Test Project [1], but nothing that looks very extensive.

The process seems to mostly rely on maintainers giving patches a good look, plus developers and first line users, including hardware manufacturers and corps like IBM or Red Hat, running their own software on kernel pre-releases and reporting the bugs.

I realize that most of Linux is device drivers or code tightly coupled to hardware, which is more or less impossible to test without a huge test farm and lots of manual labor. But it's still surprising to me that they don't do it for the more or less device-independent functionality.

This particular bug looks like it might have been caught by a relatively straightforward swap file test case.

[1] https://github.com/linux-test-project/ltp
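For what it's worth, a hedged sketch of what such a test case might look like (paths and sizes are invented; the swapon/stress steps need root, so they're shown commented out):

```shell
# Hypothetical swapfile smoke test: checksum data on a scratch filesystem,
# swap heavily onto a swapfile beside it, then verify nothing was overwritten.
scratch=$(mktemp -d)
dd if=/dev/urandom of="$scratch/data" bs=1M count=8 status=none
before=$(sha256sum "$scratch/data" | cut -d' ' -f1)
# sudo fallocate -l 256M "$scratch/swapfile"
# sudo chmod 600 "$scratch/swapfile"
# sudo mkswap "$scratch/swapfile" && sudo swapon "$scratch/swapfile"
# ...run a memory hog here so pages actually land in the swapfile...
# sudo swapoff "$scratch/swapfile"
after=$(sha256sum "$scratch/data" | cut -d' ' -f1)
[ "$before" = "$after" ] && echo "data intact" || echo "CORRUPTION"
rm -rf "$scratch"
```

The 5.12-rc1 bug overwrote filesystem blocks near the swapfile, so a checksum-and-compare over neighboring data is exactly the kind of check that would have tripped.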

Comment from "Helping Out with LTS Kernel Releases"[0]:

"I find it amusing that the answer to "how can I help test" is "that people test the -rc releases when I announce them, and let me know if they work or not for their systems/workloads/tests/whatever". No smoke tests, no automated tools, no suggested workloads, no fuzzing, just "fool around and report back". There probably are lots of kernel testing tools around, but you wouldn't learn that from the linked article."

It seems to be a conscious choice, to the point of developers stating that "the process works". Which, looking at the results, it does seem to do. But wouldn't it work better with more tests, easily run by users wanting to help? I mean, if a downstream patch in related code reintroduces the issue, is there a new test added with this fix that would catch that?

Edited to add: it seems that the situation is changing for the better, with gregkh saying in a comment: "`make kselftest` seems to do what you want today, if there are any gaps the kernel developers are glad to take more tests."

[0] https://news.ycombinator.com/item?id=26021962

[1] https://lwn.net/Articles/848352/
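For reference, the invocation gregkh mentions is run from the top of a kernel source tree (build dependencies are required, and many tests skip themselves without root):

```shell
# Build and run the kernel selftests from a kernel source tree.
make kselftest
# Or only a subset, e.g. the memory-management tests
# (the target is named "vm" in kernels of this era, "mm" in newer ones):
make -C tools/testing/selftests TARGETS=vm run_tests
```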

I suspect lots of downstream kernel developers do run smoke tests, automated tools, workloads and fuzzing.

It's just done individually, with each person hitting their own niche, and without a single all-powerful CI pipeline that's visible.

Several people booting their favourite distro with the new kernel stresses a lot of areas at once.

Yes, it's a manual test, but it "does the job" (or 99% of it at least)

It would do an even better job if the user could easily increase coverage or depth of that stressing and check against known issues (regressions). Maybe `make kselftest` is the way to go. But it might be worth creating an easy way for people to help test the kernel, something like Debian popcon meets CPU-Z benchmarks.

It is important to remember that this is the first release candidate of a series which is usually 7 or 8 entries strong. It's pretty far removed from shipping. This is a testing procedure, though not one which looks the way you're used to. In general, Linux ships pretty reliable releases once the rc process is completed.

To be honest, many codebases that I've worked on with large automated test suites seemed to have approximately the same relative occurrence of bugs as codebases with zero automated tests.

Automated testing is good at catching (especially immediate) regressions, but it is also a development tool and a communication tool.

These are the checks and balances of many eyes looking at the code and eating your own dog food. These methods succeed when you have a torrent of users, but fail more often at smaller scale.

I was interested to see the code. Linus's message doesn't provide commit IDs, so you have to go looking yourself.

I believe this is the commit which introduces the bug: https://github.com/torvalds/linux/commit/48d15436fde6feebcde...

And I believe this is the commit which fixes it: https://github.com/torvalds/linux/commit/caf6912f3f4af723234...

The fix makes me feel a little uneasy.

It changes the behaviour of swap_page_sector() to take into account the offset. But swap_page_sector() was already used before the bug was introduced, a few lines earlier in the bdev_write_page() fast path, as introduced by this commit from 2014: https://github.com/torvalds/linux/commit/dd6bd0d9c7dbb395100...

So either:

1. The bdev_write_page() call was also broken, and has been for years, or

2. The bdev_write_page() uses the sector value differently, worked fine with the old swap_page_sector(), but is now broken with the new version, or

3. The change in swap_page_sector() somehow doesn't affect bdev_write_page(); maybe it's a noop when using swap files?

Your first point is the correct one.

Basically the offset passed to bdev_write_page() was wrong, and has been like that since it was introduced to swap code back in v3.16.

In practice this means that the same filesystem corruption could happen with swapfiles sitting on top of filesystems created on top of block devices that bdev_write_page() supports (specifically: brd, zram, btt, pmem). This is an uncommon setup that apparently was never encountered in practice. Nevertheless the fix is being backported to all stable kernels to cover those use cases.

edit: bdev_write_page() was never broken, just the offset handed to it was wrong.
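A toy model of that offset mistake, with invented numbers (not the kernel's actual code): the sector for a swap page is its page offset within the swap area times PAGE_SIZE/512, but for a swapfile the starting sector of the file's extent on the underlying device has to be added on top, which is what the buggy path dropped.

```shell
# Invented numbers: page 7 of the swap area, 4K pages, swapfile extent
# starting at device sector 2048.
page=7
page_sectors=$((4096 / 512))      # sectors per page
extent_start=2048                 # where the swapfile's blocks live on disk
buggy=$((page * page_sectors))                  # offset within the file only
fixed=$((extent_start + page * page_sectors))   # offset on the device
echo "buggy sector: $buggy, correct sector: $fixed"
```

With a swap partition the extent effectively starts at the beginning of the device, so the two computations coincide; that is why only swapfiles were hit.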

Why do you believe that’s the commit that caused the bug?

The second commit self-disambiguates with "swap: fix swapfile read/write offset" and the commit message notes "fixes: 48d1543", which is the first linked commit.
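The "Fixes:" trailer is a kernel convention that makes this mechanically searchable. A sketch, using a throwaway repo for illustration (the subject line and short hash are taken from the two commits linked above; the quoted subject of the fixed commit is my best reading of the linked commit):

```shell
# Demonstrate finding a fix via its "Fixes:" trailer in a throwaway repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty \
    -m 'swap: fix swapfile read/write offset' \
    -m 'Fixes: 48d15436fde6 ("mm: remove get_swap_bio")'
# Searching the history for the trailer turns up the fixing commit:
subject=$(git log --grep='Fixes: 48d15436fde6' --format=%s)
echo "$subject"
```

Run against the real torvalds/linux tree, the same `git log --grep` finds the fix from the buggy commit's hash.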

Thanks for answering the question

Because of the commit message of the fix?

This is one of the best examples of leadership I’ve ever seen...

Huge impactful problem....

Gets past the gatekeepers...

Things go bad...

Issue gets discovered & corrected....

Leader writes an honest and open note to explain the situation and actually apologizes.

Corporate America could learn something from this.

Linus really does write beautifully. Love his style, concise, easily readable, frank, honest with nice human touches.

That email was many things, but concise is definitely not one of them.

... but remarkably good humoured! (Or are those teeth gritted?:-)

I thought so too! Linus was even like "don't blame the developers, this one was particularly nasty" and everything! Good to see nice and sensible Linus once in a while!

> So I'm not blaming the developers in question,

Much kinder than past Linus writings. I loved it too. "Double ungood" works at least as well as the swearing.

For anyone who didn't recognize it, that's a reference to Orwell's Newspeak (https://en.wikipedia.org/wiki/Newspeak). Which makes me wonder about two things:

- is Linus hinting that he is censoring himself, and if he wasn't, what expression would he have used instead?

- for what kind of incident does he reserve the attribute "double plus ungood"? Probably if such a bug would get into the production kernel?

You’re reading way too much into it - it’s just a colourful expression.

Anyway, “double ungood” would not be a valid Newspeak construction. The base is “good” and “ungood” is the negation; intensification would be “plusungood” and further intensification “doubleplusungood”.

> ... if he wasn't, what expression would he have used instead?

My first guess would be something that includes the word "fuck"... but, on second thought, this isn't Nvidia -- so perhaps not.

I think that the word ‘Double ungood’ was generated by his auto correction program which automatically replaces all the swear words with some less offensive words

I can recall a colleague, many years ago, sometimes using "doubleplusungood", often shortened -- in typical lazy UNIX admin style -- to just

Later, another cow-orker tried using a slight variant ...

... although that one never really caught on -- apparently it didn't compile or something (maybe because we were using Red Hat's version "2.96" of gcc at the time?).


(+10 Internet points if you get the reference(s))

++ can't operate on !good because it's not an lvalue (i.e. not assignable).

    typedef double winston;
    winston smith(int good) { return (double)+!good; }

Oops, sorry for giving it away in my comment ;)

It's okay, that's not the only one!

In my view, seeing him write "double ungood" worries me.

He doesn't need to swear, but he could have written "very bad" or something like that.

Why are swap files/partitions constant size?

Most of the time, I don't need any swap at all, and any swap file/partition is wasted space.

While hibernated, I need ~the amount of system ram as swap.

While running some horrendous matlab script, I need about 1TB of swap.

Yet Linux won't dynamically allocate it like other files. I'm forced to constantly resize it manually to save space or to be able to do more things.

The answer to why a partition is constant size: you are permanently reserving that amount of disk space, that's what a partition is.

Swapfiles aren't dynamic for the same reason, but, you can have multiple swapfiles, and when they are empty, you can discard them. If you really want dynamic swap, you can run swapfiled, which will monitor memory usage and create and mount swapfiles of a given granularity on demand.
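The manual version of that looks roughly like this (size and path are examples; only the commented swapon/swapoff steps need root):

```shell
# Create an extra swapfile on demand; activate with swapon when needed.
f=$(mktemp /tmp/swap.XXXXXX)
fallocate -l 64M "$f"       # reserve the blocks up front (no sparse holes)
chmod 600 "$f"              # swap should not be world-readable
sig=$(mkswap "$f" 2>&1)     # write the swap signature
echo "$sig"
# sudo swapon "$f"          # activate; later: sudo swapoff "$f" && rm "$f"
rm -f "$f"
```

Scripting exactly this create/activate/discard cycle based on memory pressure is, as I understand it, all a dynamic-swap daemon really does.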

If you're going to hibernate, you need that system RAM sized swap space immediately, and what are you going to do if it fails? Right, you need that to be permanently allocated.

If you have a matlab script that needs 1TB of swap, I don't know why you're running it here. Buy a server with a TB of RAM, or rent one. Swapping a terabyte is one of those edge cases where nobody expects the system to be usable for anything else for hours or days.

> If you have a matlab script that needs 1TB of swap, I don't know why you're running it here. Buy a server with a TB of RAM, or rent one.

why would you assume that having a machine with 1TB of free space means that you have any way of spending money ?

I do not believe the parent comment was assuming they have money, just suggesting that the grandparent not do what they were currently doing.

A 1 TB swap sounds like it will thrash a disk to the point of failure pretty quickly and perform quite poorly compared to a system with more appropriate amounts of RAM.

Having to replace disks more frequently and having tasks take longer than on an appropriately equipped system could potentially be more expensive in the long run.

I assume the argument to be that it adds complexity (out of swap, what do I do, grow it or something else, if set to grow what if there isn't enough space, if set to grow and there are multiple swap locations which do I grow, ...) that needs to be maintained & tested, but isn't commonly enough needed for someone working on the kernel to decide it warrants that effort, given there are workarounds (i.e. dynamically adding swap partitions/volumes/files) for most of the cases where it might be needed.

> I need about 1TB of swap.

I don't think most people[†] would ever want that much to be dynamically allocated. If something has needed that much memory unexpectedly then it is likely to be something like a massive leak or an accidental fork bomb. It could quickly get to the point where the performance drop due to swap thrashing means you can't interact with the system to kill the offending process, so your system is effectively locked until you either hit the BRB or it runs out of the massive maximum storage allocated to swap. Of course that adds more complexity: how do you decide how much is too much? If it is a user decision and you sometimes want it to be able to balloon that much, but still want the OOM killer to deal with accidentally leaky processes at other times, then you are back to manually managing swap allocation before/after processes that are expected to heavily use it.

[†] not that it is wrong to have a use case that warrants it, but if there is a safe workaround for such unusual cases that complicates life just a little, then IMO that is preferable to the risk of adding complexity that will likely go relative untested because it isn't commonly used.

> linux

Of course other OSs have made different decisions on the matter, so maybe there is room for further discussion.

It could get very messy. Think of it:

1. Something asks for memory from the kernel.

2. Kernel doesn't have enough free memory, goes to offload something old to swap.

3. Swap file is too small, needs to be extended. This calls the "extend file size" filesystem code.

4. Filesystem code tries to allocate memory for keeping track of filesystem data.

5. Goto 1.

As far as I understand, the "swap file on filesystem" system bypasses the filesystem. The kernel asks the filesystem to give it a bunch of blocks to work with and then ignores the filesystem code entirely. This is why some filesystems can't have swap files -- the FS has to support the right APIs, and has to be prepared for this kind of deal.

A simple solution to this would be for a filesystem to say "I guarantee that if RAM allocation fails, I will still be able to extend a swapfile".

In most filesystems, extending an already open file is fairly straightforward - it's usually a matter of writing just a single block.

Since this is an incredibly rare case, there is no strict performance requirement. It could use an approach to filesystems like grub which involves reading and writing just a single block at a time.

Such guarantees aren't easy.

And come on, "writing a single block"? No, that's what it looks like on the user side of things. In the kernel it's going to be a good deal more complicated, because how does anything know anything about this block you just added? You need to keep track of it somehow.

Inside the kernel you need to go find some free space to use for this extra block, and to update whatever metadata is needed to make things consistent. It might well turn out that the block you're using to keep track of what blocks belong to this file is just full, so now you have to allocate another block for filesystem metadata. Then there's issues like journalling. And doing all that while providing the guarantee that you won't need to allocate memory is likely tricky.

You've received a bunch of replies about why it needs to be statically allocated, but it's worth noting that several OSes have dynamically allocated swap files that are managed by the OS: macOS, Windows, and OS/2 come to mind.

Windows uses a single file by default (C:\pagefile.sys), while macOS allocates them in chunks (/private/var/vm/swapfile<n>).

Possibly because it's hard to guarantee re-allocating/sizing will not require more RAM during the process on any given filesystem...

That sounds like what systemd-swap does. Apparently, hibernation is not quite working yet, though.


> And, as far as I know, all the normal distributions set things up with swap partitions, not files

I guess Ubuntu is not a "normal" distribution then? Because it uses a swap file by default.

While it's the default setup of the auto installer, it's common (and IMHO best) practice to change it and install with a swap partition.

It's just that it's much easier for anyone who just gets started with Linux to have a swap file.

For example, for hibernation your swap partition needs to be the right size. But people do upgrade their desktop RAM and would then need to resize the partition, which isn't easy (because it likely means shrinking another partition)...

So for a distribution like ubuntu the "non-normal" case became the default.

Still, this mail was targeting people who install pre-release versions of Linux kernels, a group relatively unlikely to use a swap file.

I used to use a swap partition, but then if you get the partition size wrong you're lumbered with it. Why wouldn't you use a file?

At one point in the past I used the default (or recommended) /boot size and got stuck a couple of years later juggling updates to fit the boot files into the partition. I just use a monolithic partition per OS per disk now and swap files within that. Is that not good practice (for home systems)?

`/home` should be separate from the system so you can reinstall it without trouble. Same for `/var` (or subdirectories) if you value the data of particular services. Other than that, the bootloader and EFI must be able to access certain directories like `/boot` and `/boot/efi`. As long as it works for your use cases, everything is alright.

Edit: A swap file has a little bit of overhead because of the underlying file system. But at the point you notice this, your system is pretty much frozen anyways.

The Linux kernel doesn't rely on the filesystem for swap files, ensuring no more overhead than swap partitions. It bypasses the filesystem and is told which blocks it can use; that's why some filesystems do not support swapfiles, as there is additional plumbing to be done.

True, I realized that when I read TFA. After all, the kernel not considering the swap file's offset was precisely the defect in 5.12-rc1.

Not always I guess. The 20.04 and 20.10 installer with LUKS enabled created a swap partition for me.

ubuntu does a lot of things where the design decision is better understood in the context of being as easy to use as possible, for the widest possible spectrum of people with zero prior linux experience, or even computer knowledge in general.

I'd venture that most people who are compiling their own -rc1 kernel are not running Ubuntu, and most of those that are probably chose the swap partition option when installing it.

> it didn't even show up in normal testing, exactly because swapfiles just aren't normal.

Uhh, does it mean swapfiles are undertested in the Linux kernel? Or is there some "non-normal" extra testing that is being run for non-rc releases, but not for rc ones? (!?)

I fear you grossly overestimate how detailed the kernel's testing is.

(Do please note that the word "detailed" above was carefully chosen.)

While the kernel team directly may not be as involved in testing the Linux kernel to the extent that a dedicated QA team would be, there are multiple billion-dollar organizations like IBM (Red Hat), Oracle, etc. that do a lot of automated testing. Given the amount of issues with Windows as well, especially for Insider builds, it doesn't make me lose trust in the Linux kernel any more than I'd trust a Windows operating system. In fact, because it was caught before it would reach regular distro users, it gives me more trust.

> Uhh, does it mean swapfiles are undertested in the Linux kernel? Or is there some "non-normal" extra testing that is being run for non-rc releases, but not for rc ones? (!?)

My understanding is that that is the point of RC builds: they are the builds used for testing before a stable release.

this is unlikely to hurt you unless you use a swap file, rather than a dedicated swap partition.

for performance reasons most any modern linux installation would be using a dedicated swap partition, that is defined as swap-only during the install process with mkswap

In my experience some of the reasons for using a swap file were more common maybe 10 or 15 years ago. Nowadays with disk space being less costly and rare, and systems having a lot more RAM (so the likelihood of going deep into swap is less), it's unsurprising that none of the testers encountered it.


Swapfiles are quite common when you use FDE (say, LUKS) and you don't want your swap to be in plaintext on disk.

(Although, you could just have an entire LUKS-encrypted swap volume instead of a swapfile...)

While maybe not the most performant, my favorite is LVM2 on top of a single large LUKS-encrypted partition (no boot partition, and a single EFI-bootable kernel blob -- signed with a custom platform key and containing the initramfs -- in the EFI partition).


- Allows an encrypted swap partition (in LVM2 on top of LUKS; sure, not perfect, but I don't really use swap)

- Allows hibernation (which I don't really use tbh.)

- Fully encrypted disc

- Fast boot (compared to e.g. encrypted grub, at least last time I checked)

- Easy resizing of partitions on demand

- Which in turn makes it easier to use more partitions (for /, /home, etc.), which is necessary for taking advantage of certain mount flags like noexec for security.

Anyway, I think the reason Ubuntu uses a swap file is that it means you don't need to resize a partition when "upgrading" your RAM or similar (for hibernation to work). Though tbh they should make the choice in the installer dependent on the hardware.

You still need an unencrypted EFI partition given what you describe, and the signed unified kernel image lives there, which means that you do in fact have a boot partition. My /boot mount is my EFI partition.

However, an alternative is having your EFI partition reside on a USB drive that in turn boots into your encrypted partition.

/boot is not the same as the efi system partition (ESP), sure you need to copy the signed kernel blob into the ESP but I still wouldn't recommend mounting ESP as /boot if your linux distro allows it.

The reasons for this are manifold, including that Linux assumes /boot is managed by it, while the ESP isn't managed by Linux and might contain other EFI-compatible programs. Another reason is that you want to make sure only the signed blob is in the ESP and nothing else, but depending on your distro and packages all kinds of things might be put into /boot. Another reason is the directory structure: /boot is flat, but the ESP isn't meant to be flat; e.g. in my case it's /EFI/<osname>/<efifiles>

E.g. my /boot partition contains "initramfs-linux.img", "intel-ucode.img" and "vmlinuz-linux" but my /esp contains only linux-signed.efi which packs the necessary image files, linux kernel, kernel parameters, boot splash screen etc.

Still what you can do (with reasonable effort) depends a bit on the Linux Distro you run.

Also, in many default setups you have a /boot partition and an ESP partition; in your case you just folded both into one. In my case I don't have a boot partition, and could as far as I know tweak my system to not have a /boot folder at all as it's not really used; it's just not worth the effort to do so.

My apologies and thanks for the clarification. I suppose I could see how you'd want to create a hook to create the unified image, in which case /boot or whatever could simply be a directory that holds the non-unified images.

I guess the point I was trying to make, is whether you call the partition that holds your unified kernel images /boot or /esp or whatever.. it is still a separate unencrypted partition that most people would associate with a boot partition, whether it is mounted at /boot or /esp.

Would the bug Linus is describing still bite you if your swap is in a separate LVM partition? (That's how Ubuntu 20.04 sets things up by default.)

Only if it's a swap file on a filesystem on that LVM partition, and even then if the filesystem is otherwise empty then there's no other data to accidentally overwrite, you might just trash the filesystem structure.

Then, if I'm understanding this correctly, the default setup in Ubuntu (at least in Ubuntu 20.04, which I happen to have installed on a couple of laptops recently), which has been mentioned both in this discussion and in the lwn.net article discussion, should be safe from the bug, since that setup, if you enable full-disk encryption, sets up two LVM partitions, one for root (called "vgubuntu-root") and one for swap (called "vgubuntu-swap"). They show up as separate LVM partitions in fdisk. The swap appears to me to just directly use the "vgubuntu-swap" partition.

Sounds fine. It would be strange if vgubuntu-swap contained an actual filesystem with a single file on it (the swap file).

This is my setup as well.

Even with full-disk encryption you have two options that I'd recommend over a swap file:

- lvm on luks: inside the luks dm-crypt container, create multiple logical volumes: one for swap, one for the root filesystem

- swap on dm-crypt: create a separate encrypted partition for swap

Since the swap data needs no permanence, it's possible to generate a new encryption key every boot. Debian's crypttab supports this out of the box [0]:

    cswap  /dev/sda6  /dev/urandom  cipher=aes-xts-plain64,size=256,hash=sha1,swap
(although I'd recommend using a /dev/disk/by-id/ path there, for obvious reasons. The scripts do check there's no valid signature on the partition before formatting, but still...)

[0] https://manpages.debian.org/buster/cryptsetup-run/crypttab.5...

> Since the swap data needs no permanence

Unless you want to hibernate!

In the most recent Debian installer, if you want to do FDE, it does:

partition > LUKS > LVM > swap volume

That way you get encrypted swap without having swap be a file in your filesystem.

Most people I know who are paranoid enough to really go into the details of an FDE linux desktop/workstation environment choose to have no swap file, and at minimum 64GB of RAM. Even safer if the swap doesn't exist.

Ubuntu uses a swap file by default since 17.04 Zesty: https://blog.surgut.co.uk/2016/12/swapfiles-by-default-in-ub...

But as otherwise discussed this only made it into a release candidate and not an actual release. So the impact is minimal.

I thought partitions used to be faster but files are just as fast now.[1]

[1] http://lkml.iu.edu/hypermail/linux/kernel/0507.0/1690.html

If you are using a budget VPS, it is likely you will want to enable it, because VPSes tend not to set up a swap partition at all, and you are unlikely to reformat a VPS to add one.

I use swap files on all my VPS, even the ones that give me a swap partition - space is cheap as free and it's easy to set a monitor that says "oh wow we're starting to use swap significantly, something's wrong" instead of a "the box OOMKILLed everything, something's wrong".

In fact that just happened - no service died but something used a lot of RAM at midnight - investigating now.

Or use zswap on budget VPS.

I guess because I use a swapfile I am not normal according to Linus? Haha, I just find it easier to use a swap file when dual-booting Windows and my Linux install is encrypted. Otherwise, it'd be more work to ensure my swap partition is encrypted.

I recently ended up using swapfiles because I refactored my disk layout and it was just simpler to chuck the swap on an existing nvme partition. I realised an extra benefit: you can make it a sparse file, so it never even uses disk space at all until you actually need it, which could be never.
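The sparseness is easy to see on any file (the demo file below is hypothetical, made with truncate rather than set up as an actual swapfile):

```shell
# A sparse file has a large apparent size but no allocated blocks until
# something actually writes to it.
f=$(mktemp)
truncate -s 32M "$f"     # set the length without allocating blocks
apparent=$(du -k --apparent-size "$f" | cut -f1)
allocated=$(du -k "$f" | cut -f1)
echo "apparent: ${apparent}K, allocated: ${allocated}K"
rm -f "$f"
```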

> I realised an extra benefit: you can make it a sparse file, so it never even uses disk space at all until you actually need it, which could be never.

AFAIK, this cannot work. The swap code does not go through the filesystem, it bypasses the filesystem (it only asks the filesystem code for the swap file extents during the "swapon" call), and it's the filesystem that would be responsible for allocating disk space when writing to a sparse file.

I don't think I follow. Check it out:

    $ du /swapfile
    329696      /swapfile
    $ du --apparent-size /swapfile
    33554432    /swapfile

Looks to me like it's behaving as expected?

But if you suddenly need it and don't have it, you'll get segfaults (I assume). Sounds a bit too dangerous for me :p

You'll get OOM killed as if you ran out of RAM. This doesn't happen unless the program is buggy.

Moreover, if your working set is larger than RAM, you're screwed regardless. So don't do that. Swap exists so you can use more RAM for the disk cache. It doesn't exist so that buggy programs can crawl instead of dying.

> if your working set is larger than RAM, you're screwed regardless.

I mean, this is patently false. I've used 1TB swapfiles to get out of a sticky situation with some terrible research software that used a ton of memory to solve really valuable problems.

There's nothing wrong with an absolute shit ton of swap when the alternative is OOM.

"refactoring" means changing the (often implicit) structure of code without changing its external behaviour.

In this case "reorganized" or "repartitioned" could probably be more apt.

If we'd been talking about software here (and not disk layout), I'd have viewed the use of "refactoring" less leniently, as a hint of unclear intentions.

Sorry for being so nitpicky, but this is a sore thumb from too many "refactorings" that are, indeed, something else.

Refactoring has the meaning that we impute to it. God didn't hand down golden tablets in the 60s with the words "thou shalt only use 'refactor' in these prescribed ways".

Instead of being sorry for being nitpicky, maybe just keep your nitpicks to yourself in future.

Dual boot is probably not typical for people who install rc kernels right away? I remember trying to do kernel dev on a dual-boot laptop in college, and I spent a lot of time trying to repair the bootloader while running off of a live-USB environment.

Besides, most Linux installs by the numbers are probably web servers/data centers, or large enterprises (think movie studios, large tech companies).

I realize this comes from a position of privilege, but these days I'd just have a second computer that runs Windows; the bonus is if I completely mess up I can just use the working computer to generate OS installation media for the broken machine.

Also, depending on your threat model, "buy the next size up of RAM sticks" may be preferable to having a swap file or partition at all. (Although I recognize that may require replacing the entire system if the RAM is soldered on or something.)

It is a lot more uncommon for MBR-based boot partitions these days. It is no longer the default for Windows. With EFI and GPT, it is a lot simpler. Linux can use a Windows EFI boot partition, or vice-versa. Additionally, in my specific case I have a separate EFI partition saved on a USB stick for booting my encrypted root partition. It adds plausible deniability.

> It adds plausible deniability.

How so?

Because the disk drive has a Windows-styled layout and EFI partition. The encrypted partition just looks like random data. Since the bootloader for my encrypted disk isn't on there, it is harder to determine that it even exists.

There is no requirement on what partition type you use for your LUKS2-encrypted partition, so ideally you'd use something that isn't going to be apparent.

Then, as long as knowledge of a tiny USB drive or micro SD card existing isn't known (which is a lot easier to hide) it is more difficult to discover that an encrypted partition even exists.

> Since the bootloader for my encrypted disk isn't on there, it is harder to determine that it even exists.

How does that make it harder to determine your encrypted partition exists? The usual way encryption aids plausible deniability is through multiple keys, e.g. when one key unlocks a seemingly innocuous-looking installation, while the other unlocks the nasty bits.

Having an encrypted partition in plain sight offers no plausible deniability at all to me, especially if it contains a default LUKS header. You might get away with arguing the disk is unused when the partition has a detached header, but even then you'd have to argue why a non-functioning machine is fully equipped with monitor, keyboard and network connections.

> How does that make it harder to determine your encrypted partition exists?

Because it just looks like random data that hasn't been formatted.

> Having an encrypted partition in plain sight offers no plausible deniability at all to me, especially if it contains a default LUKS header.

First, I do use a detached LUKS header. The LUKS header is on the USB drive. The machine functions fine and appears to be a regular Windows PC without the USB drive, just with an extra partition that doesn't obviously contain encrypted data. It would look like random data, so someone else wouldn't know whether I did a secure wipe and it is waiting to be partitioned, or whether it is just old data that has since been repurposed after resizing. I'm sure they could find out if they pry hard enough, especially if trimming is enabled or if they really started to question why the supposed random data can't be recovered to produce files.

> Because it just looks like random data that hasn't been formatted.

I'm guessing that SSDs also give up the secret if an attacker has access to custom firmware to read the wear-levelling metadata and finds a bunch of recent writes in the random data.

At this point, unless speed or capacity are at a premium, it probably makes more sense to completely boot and run off of a fast USB drive. Linux + LUKS-header will be fairly obvious on the USB boot drive anyway, making plausible deniability pretty hard.

What's the threat model? State-level attackers (e.g. journalists working with Snowden), or border police wanting you to provide a passphrase for stored data or to boot a system? For the latter, a good solution is to not transport anything important across the border.

There's a cheaper, easier option.

Just buy a new SSD for each OS you plan to run. A bit more expensive than messing with MBRs, but well... no partitions or MBR to deal with anymore. Well worth $50.

That's a perfect solution if you're already using the largest/fastest SSDs available, but otherwise be aware that you're sacrificing throughput: each 1TB SSD will have around half the throughput of a single 2TB SSD, because large SSDs are essentially a RAID0+ECC array of smaller SSDs.

Even a 256GB ssd boots damn near instantly.

If you want a 2TB data drive, go on ahead. Both Linux and Windows handle multiple drives just fine.

For a while I was running linux and windows on the same computer but on different hard drives. To switch systems I shut it down and used hard switches (special hardware in a disk drive slot) to power off/on the right hard drives.

This way you only need one computer, but can have multiple completely separated OSes. Of course, this doesn't work for laptops.

Yep, encrypted swap space is another added configuration, and it too got hosed badly in 5.12-rc1. That's an issue I've yet to file a bug for because I'm still stepping through it in kdb.

The setup is a dedicated data partition managed by LVM, which provides a logical volume that userland encryption runs on top of to produce the swap device.

This is the root cause of the fs corruption that Phoronix reported.
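For reference, throwaway encrypted swap of this kind is commonly wired up via /etc/crypttab with a fresh random key each boot (device paths below are placeholders, not from the comment above):

```
# /etc/crypttab -- new random key every boot, so swap contents
# never survive a reboot (device path is a placeholder)
cryptswap  /dev/vg0/swap  /dev/urandom  swap,cipher=aes-xts-plain64,size=512

# /etc/fstab -- use the mapped device as swap
/dev/mapper/cryptswap  none  swap  sw  0  0
```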


I'd find it interesting to read an analysis how this bug could have happened from a technical point.

Like... does swap file support take shortcuts in the kernel such that it can corrupt the file system the swap file resides on? Because (at least I'd hope) no write() or similar call on a file handle should be able to cause filesystem corruption...

The swap file code does not go through the filesystem. It's the exact same code which is used for a swap partition or a swap file, and it writes directly to the block device (the partition). The main difference is that, if you have a swap file, the "swapon" call asks the filesystem to tell it which ranges in the block device contain the swap, instead of using the whole block device.

I did not know this - it makes sense, but it is certainly a surprise.
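The "swapon asks the filesystem for the ranges" step described above can be sketched as a toy model. The extent numbers here are invented for illustration; the real kernel walks the file's block mapping on the actual device:

```python
# Toy model of swapon's setup for a swap file: the filesystem reports
# which block ranges of the partition back the file, and swap I/O then
# targets those absolute blocks directly, bypassing the filesystem.
# (first_block, n_blocks) per extent -- hypothetical values:
extents = [(1000, 16), (2048, 48)]

def swap_page_to_block(page: int) -> int:
    """Map a swap page index to an absolute block on the partition."""
    for first_block, n_blocks in extents:
        if page < n_blocks:
            return first_block + page  # offset into this extent
        page -= n_blocks
    raise ValueError("page beyond end of swap file")

# Page 0 must land inside the file's extents, never at block 0 of the
# partition -- losing the base offset is the kind of error that sends
# swap writes over filesystem metadata instead.
print(swap_page_to_block(0))   # prints 1000
print(swap_page_to_block(20))  # prints 2052
```

This also makes the failure mode plausible: once the mapping is wrong, swap writes go to arbitrary blocks of the partition with no filesystem in the loop to object.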

> Additionally, he is asking maintainers to not start branches from 5.12-rc1 to avoid future situations where people land in the buggy code while bisecting problems.

Nice. Everywhere I've worked, once the fix has been committed the job is done. I should be more conscientious about preventing bad commits from being in branch histories like this.

Automatic testing could have caught this, maybe.

The email mentions that normal testing didn't catch it as it's not a typical use-case for the devs apparently.

Which is exactly why automated testing exists, to catch cases like this.

Automated testing is not magic. It only tests things you make it test.

And since the kernel is involved in basically everything a computer does, you would have to test everything to find every error. The Linux kernel is also extremely customizable, so that it can run on anything from watches to Mars rovers to HPC and cloud.

Linux kernels are continuously and heavily tested on various hardware architectures by various organizations (RH, IBM, Oracle, Google).

So automated tests are done. It just seems nobody tested this configuration.

Of course. Thanks for stating a lot of mostly obvious things.

Assuming there is a test suite (I really don't know enough about Linux kernel testing, sorry), it is a matter of testing that on a representative set of configurations. Apparently no-one thought of having one with a swap file instead of a swap partition. That's okay, it's an understandable oversight, but one that clearly should be remedied.
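A minimal "corruption canary" for such a configuration could be scripted in a few lines. A hypothetical sketch (not an existing kernel test): lay down files with known contents, run the swap-heavy workload, and verify nothing on disk silently changed.

```python
import hashlib
import os
import tempfile

def checksums(paths):
    """SHA-256 of each file, so silent on-disk corruption shows up."""
    result = {}
    for p in paths:
        with open(p, "rb") as f:
            result[p] = hashlib.sha256(f.read()).hexdigest()
    return result

# Lay down canary files with known contents.
workdir = tempfile.mkdtemp()
canaries = []
for i in range(8):
    path = os.path.join(workdir, f"canary{i}")
    with open(path, "wb") as f:
        f.write(bytes([i]) * 65536)
    canaries.append(path)

before = checksums(canaries)
# ... here one would run a memory-pressure workload that forces
# heavy swapping to the swap file under test ...
after = checksums(canaries)
assert before == after, "on-disk corruption detected"
```

The hard part isn't the check, it's running it on a machine you're willing to sacrifice, with a swap file configured and real memory pressure applied.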

Only if the devs had foreseen the set of conditions that trigger the bug and written a test that produces those conditions.

This is why life on the bleeding edge is sometimes very bloody.

As no one did a TLDR: if you use a swapfile, it could overwrite part of the filesystem where the swapfile exists.

Don't use swapfiles if you can. Use a dedicated partition, there are many advantages (including automatic RAID0 if you have a swap partition on each drive)
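The "automatic RAID0" effect comes from giving multiple swap areas the same priority; the kernel then allocates swap pages across them round-robin. An illustrative /etc/fstab fragment (device names are placeholders):

```
# Equal "pri=" values make the kernel stripe swap pages across both
# devices, similar in effect to RAID0 (device names are placeholders)
/dev/sda2  none  swap  sw,pri=10  0  0
/dev/sdb2  none  swap  sw,pri=10  0  0
```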

If you click the link, there's a TL;DR right up at the top, under the title.
