size_t-to-int vulnerability in Linux’s filesystem layer (openwall.com)
384 points by jwilk 6 days ago | 275 comments

I love the little nugget in the mitigations section. You can plug the hole for a normal filesystem, but then FUSE filesystems have an additional problem: "if an attacker FUSE-mounts a long directory (longer than 8MB), then systemd exhausts its stack, crashes, and therefore crashes the entire operating system (a kernel panic)."

If there's one place other than the kernel where truly defensive programming should be applied, it is systemd.

What the hell is systemd doing that an 8MB-long file path can exhaust its stack? Is it doing some recursive parsing, or is it just doing something plain stupid like using a VLA to store user-provided data?

Probably unbounded alloca() as always.

Yep: https://github.com/systemd/systemd/commit/b34a4f0e6729de292c...

strdupa(input) without any length check

The fix is to replace it with an unbounded malloc() instead of checking for a sane length first.
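For comparison, here is a minimal sketch of the "check for a sane length first" approach in C. The helper name `strdup_bounded` and the caller-supplied `max_len` are hypothetical, not systemd's code; it rejects oversized input with ENAMETOOLONG instead of allocating whatever it is handed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `input` on the heap, but refuse anything
 * longer than `max_len` bytes instead of trusting the caller. */
static char *strdup_bounded(const char *input, size_t max_len) {
    /* strnlen() stops after max_len + 1 bytes, so a huge string is rejected
     * without scanning (or copying) all of it. */
    size_t len = strnlen(input, max_len + 1);
    if (len > max_len) {
        errno = ENAMETOOLONG;
        return NULL;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL; /* POSIX malloc() sets errno to ENOMEM */
    memcpy(copy, input, len + 1); /* includes the terminating NUL */
    return copy;
}
```

The heap copy still costs a malloc(), but the length check bounds both the allocation and the work done on hostile input.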

Good find, thanks for sharing. And everyone at work gripes about me carrying the size around with a lot of my variables in the form of a struct. It's strictly a reminder to always check the size, since I'm juggling with shotguns.

The fact that C doesn't have a native concept of an array with a length, and that strings usually use a null byte to mark the end, is IMO C's biggest failing and its worst legacy on the wider software world.

This and having any pointer implicitly be nullable

Do they even C? Its official ideology: a good programmer compensates for the shortcomings of the language.

They seem to be all about conciseness :) . We have gigabytes of memory; a size parameter isn't going to make a difference, haha. They allow me my little idiosyncrasies, though, so I can't complain.
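A minimal sketch of the "carry the size around in a struct" habit, assuming a hypothetical `str_view` type (not from any particular library): the length is computed once at the boundary, and every slice operation is bounds-checked against it instead of trusting a NUL terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "string view": a pointer plus an explicit byte count. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Wrap a NUL-terminated C string (one strlen() at the boundary). */
static str_view sv_from_cstr(const char *s) {
    return (str_view){ .ptr = s, .len = strlen(s) };
}

/* Bounds-checked sub-slice: returns false instead of reading past the end.
 * The second comparison is written so that start + count cannot overflow. */
static bool sv_slice(str_view s, size_t start, size_t count, str_view *out) {
    if (start > s.len || count > s.len - start)
        return false;
    *out = (str_view){ .ptr = s.ptr + start, .len = count };
    return true;
}

/* Length-aware comparison: no reliance on NUL terminators. */
static bool sv_eq(str_view a, str_view b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

The extra `size_t` per string is the whole cost; in exchange, every consumer can check bounds without rescanning the data.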

Shouldn't this PR also call `free` on the duped string before returning? (I never use C, so I'm probably missing something; this is just based on the docs of strdupa...)

The variable p is now declared with "_cleanup_free_", which uses the compiler's cleanup (destructor-like) attribute to call free() automatically when p goes out of scope.

ah okay, thank you :)
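For readers who don't write C: a minimal reimplementation of the idea behind systemd's `_cleanup_free_`, using the GCC/Clang `__attribute__((cleanup(...)))` extension. The names `freep`, `cleanup_free` and the `cleanup_calls` counter are illustrative assumptions (the counter only exists to make the cleanup observable); systemd's real macros live in its own headers and differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts calls so the effect is observable; a real helper would just free. */
static int cleanup_calls = 0;

/* The cleanup attribute passes a *pointer to* the annotated variable,
 * hence the extra level of indirection. free(NULL) is a no-op. */
static void freep(void *p) {
    cleanup_calls++;
    free(*(void **)p);
}

/* Hypothetical macro in the spirit of systemd's _cleanup_free_. */
#define cleanup_free __attribute__((cleanup(freep)))

static size_t copy_len(const char *input) {
    cleanup_free char *p = strdup(input);
    if (p == NULL)
        return 0;      /* freep(&p) still runs on this path */
    return strlen(p);  /* ...and here, after strlen(p) has been evaluated */
}
```

The cleanup function runs on every exit path from the scope, which is why the patched code needs no explicit free() before returning.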

To me, this fix reduces performance for nothing. In Linux (or more generally on any UNIX system I've seen) a path should not be longer (in total) than PATH_MAX, which is typically defined as 4096 bytes. Why not simply allocate a fixed-size buffer at this point?

And yes, I know that this is really only a limit on the path length accepted by system calls, and in theory you can work with longer paths (by changing the current directory to a path and then opening a file from there), because filesystems do (stupidly, in my opinion) support it.

But in reality, how many applications will break? Does it make sense to support them?

Also, the code in question seems to be dealing with a file name more than a path. A file name shouldn't be longer than NAME_MAX, and that is a hard limit of many (possibly all?) filesystems, as far as I know. So why?

It would be simpler and more efficient to just truncate the name at PATH_MAX: avoid the overflow and the crash, but return an error. Why are hard limits considered that bad? We waste time supporting edge cases that no one would really use in a real system (no way someone needs a path longer than 4096 bytes...), and for what? In Windows the limit is 260 characters and nobody seems to be bothered by that; only since Windows 10 can you raise it.

The Linux kernel doesn't have an actual path limit. Nor does Solaris. PATH_MAX is 4096 in glibc and musl libc because setting it to something like INT_MAX or ULONG_MAX would break a lot of existing code that uses PATH_MAX to size buffers. (Though Solaris does define it as INT_MAX, IIRC.) OTOH, because of the lack of a hard limit, there's also code that relies (if accidentally) on paths longer than PATH_MAX.

Linux does have a limit, at least for some system calls:

  $ strace -e trace=file perl -e 'open(FH, "<", "/" x 4096)'
  openat(AT_FDCWD, "//////////…"..., O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENAMETOOLONG (File name too long)
  +++ exited with 0 +++

There's a limit to what path string you can make the kernel interpret. That does not limit total path length. Keep looping on mkdirat/openat and you can make very very deep trees. As opposed to your syscall that is relative to an arbitrary directory, /proc/self/mountinfo has to contain the whole absolute path to be useful.

I stand corrected. It seems that Linux copies the entire path into a kernel-allocated buffer (see getname and getname_flags in fs/namei.c as called by various syscalls in fs/open.c), rejecting paths longer than PATH_MAX.

EDIT: And on Solaris PATH_MAX is 1024 and (AFAICT) Solaris also copies paths into kernel space. It seems I was confusing things with NL_TEXTMAX, which is INT_MAX on glibc (but not Solaris).
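The mkdirat/openat loop mentioned above can be sketched in C like this (a hypothetical demo, assuming a writable /tmp; DEPTH and NAME_LEN are arbitrary choices). Each call only passes one short component, so no single system call sees anything near PATH_MAX, yet the absolute path of the deepest directory ends up well past 4096 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEPTH 60
#define NAME_LEN 100 /* each component stays well under NAME_MAX (255) */

/* Builds DEPTH nested directories under a fresh temp dir, descending via
 * directory file descriptors. Returns the absolute path length of the
 * deepest directory, or -1 on error. Removes the tree before returning. */
static long build_deep_tree(void) {
    char base[] = "/tmp/deeptreeXXXXXX";
    if (mkdtemp(base) == NULL)
        return -1;

    char name[NAME_LEN + 1];
    memset(name, 'd', NAME_LEN);
    name[NAME_LEN] = '\0';

    int fds[DEPTH + 1];
    fds[0] = open(base, O_RDONLY | O_DIRECTORY);
    if (fds[0] < 0)
        return -1;

    int made = 0; /* levels successfully created and opened */
    for (; made < DEPTH; made++) {
        if (mkdirat(fds[made], name, 0700) != 0)
            break;
        fds[made + 1] = openat(fds[made], name, O_RDONLY | O_DIRECTORY);
        if (fds[made + 1] < 0) {
            unlinkat(fds[made], name, AT_REMOVEDIR);
            break;
        }
    }
    /* "/tmp/deeptreeXXXXXX" plus "/" + name for every level. */
    long total = (long)strlen(base) + (long)made * (NAME_LEN + 1);

    /* Tear down from the bottom up, again using only relative names. */
    for (int i = made; i > 0; i--) {
        close(fds[i]);
        unlinkat(fds[i - 1], name, AT_REMOVEDIR);
    }
    close(fds[0]);
    rmdir(base);
    return made == DEPTH ? total : -1;
}
```

The descriptors saved on the way down are what make cleanup possible: each directory is removed relative to its parent's fd, so the long absolute path is never spelled out.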


The Linux kernel defines upper limits for NAME_MAX (255) and PATH_MAX (4096).

glibc doesn't enforce this limit because it was originally written to run on GNU Hurd, which I guess doesn't have these limits.

But systemd only runs on glibc on Linux. So I don't see why it doesn't at least sanity check the length of absolute paths with PATH_MAX...

> reduces the performance for nothing

Does the code in question ever run in a tight loop (e.g. on file operations after the filesystem is mounted), or just at mount time?

If it's just at mount time, "reducing performance" by one malloc vs a stack adjustment doesn't seem to me like it should be a primary concern.

We are talking about systemd, which is core software. I would like systemd to do as few memory allocations as possible. The reason is that memory can run out, especially on embedded devices where you have, for example, 32MB of RAM, and you have to properly handle the case where you run out of memory. Most programmers don't, and the program crashes when no memory is available. That is bad for PID 1, because it would mean a kernel panic, which you don't want.

If PID 1 does need to do that kind of stuff (as it seems), I would prefer it to fork a process and do the memory allocation in that, so if that process crashes because you are out of memory the kernel doesn't panic.
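A minimal sketch of the "do the risky work in a forked child" idea, assuming the work can be expressed as a function whose result fits in an exit status. `run_isolated` and the two sample workloads are hypothetical. Note that fork() itself can fail under memory pressure; the sketch only reports that case, and a real PID 1 would also have to reconcile this waitpid() with its normal reaping of orphaned children.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `work` in a forked child so a crash there cannot take the caller down.
 * Returns the child's exit status (0-255), -signo if it was killed by a
 * signal, or -1000 if fork()/waitpid() failed. */
static int run_isolated(int (*work)(void)) {
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0)
        _exit(work()); /* child: never returns into the caller's code */

    int status;
    for (;;) {
        if (waitpid(pid, &status, 0) == pid)
            break;
        if (errno != EINTR) /* retry only if interrupted by a signal */
            return -1000;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}

/* Example workloads for the demo. */
static int well_behaved(void) { return 7; }
static int crashes(void) { abort(); }
```

The parent only ever sees an exit status or a signal number; whatever the child corrupted dies with it.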

> In Linux (or most general on any UNIX system that I saw) a path should not be longer (total) than PATH_MAX, that is typically defined to 4096 bytes.

That almost sounds like the 260-character Windows path limit constant used by some ancient APIs. I would assume that any API limited to that path length is dated and probably unreliable in various contexts, as the Wikipedia article on filesystems explicitly gives the limit as not defined for various Linux filesystems. Also, given the recent talk about in-kernel support for NTFS (path limit ~2^16), I assume that any historic code still relying on PATH_MAX needs to be fixed.

A filesystem can even support effectively infinite path lengths (simple: make a symlink to a directory inside that same directory, and you have an infinite path).

PATH_MAX is the limit on a path that you can pass to the various path-manipulating functions (open(), unlink(), etc.) or that is returned by getcwd() (which gives an error if the path is longer than PATH_MAX; and yes, there are non-standard system calls to get around this limit, but... why?)

You can, however, use paths longer than PATH_MAX. How? Simply chdir() down PATH_MAX bytes, then chdir() down another PATH_MAX, then count how much software breaks...

Imposing a limit on paths makes sense and should be done. 4096 bytes seems reasonable to me. Also, in that example, it wasn't even a matter of a path! It seems they only parse the file name, and that is limited by NAME_MAX, which is 255 bytes, on every system and every filesystem!
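POSIX actually treats these limits as per-filesystem values that a program can query at runtime with pathconf(), not only as the compile-time constants in <limits.h>. A small sketch (the helper name `query_limit` is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Asks the filesystem that holds `path` for one of its limits.
 * pathconf() returns -1 both on error and for "no limit"; errno tells
 * them apart (it is left unchanged when there is simply no limit). */
static long query_limit(const char *path, int name) {
    errno = 0;
    long v = pathconf(path, name);
    if (v == -1 && errno == 0)
        return LONG_MAX; /* indeterminate: treat as unbounded */
    return v;            /* the limit, or -1 with errno set on error */
}
```

Usage: `query_limit("/", _PC_NAME_MAX)` and `query_limit("/", _PC_PATH_MAX)`. On a typical Linux system these come back as the familiar 255 and 4096, but the name limit is reported for whichever filesystem is mounted at that path.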

Was that a guess? Wtf... BTW, it would probably be hard to make the same mistake in Rust, unless you write your own string code or also call strdupa via libc.

I also never understand why some libraries use "faster" methods everywhere instead of safer ones. It's not like all interfaces to systemd need to be fast, but they should be secure.

Yes. It happened before, so it was not exactly hard to guess.


It's not a mistake. Allocating that string on the stack is not a bad idea. Most of the time the string will be short, and thus allocating it on the stack is faster.

Consider that in Linux a path is defined to have a maximum length of PATH_MAX, which is defined as 4096 bytes, and a file name (and directory name) shouldn't be longer than NAME_MAX, which is 255 bytes. These limits are defined in the headers, and I always use them when writing my C programs (if it crashes... you are doing something really wrong!).

So how the hell do you have a directory path that is more than 8MB? You shouldn't! The filesystem doesn't support it. In my opinion, it's up to the filesystem driver to reject a path that long.

Systemd should be fast. It's at the base of the operating system. It should also consume little memory. You can say, who cares about dynamically allocating a string, or allocating a static buffer of 16MB? Yes, we should care: I use Linux computers with 16MB of RAM, total. Of course they don't run systemd nowadays since it's too big, but in my opinion systemd is good, and I would like to see it more in the embedded world.
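The "stack for the common case" argument doesn't require an unbounded alloca(). A minimal sketch, with a hypothetical `process_name` function and an arbitrary 256-byte threshold, that keeps the fast path for short strings but caps stack usage and falls back to malloc() for anything longer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_BUF 256 /* hypothetical threshold: fits typical file names */

/* Copies `input` into a fixed-size stack buffer when it fits and into a
 * heap buffer otherwise, then works on the copy. Stack usage is bounded by
 * SMALL_BUF however long the input is, unlike strdupa()/alloca().
 * Returns the length of the copy, or (size_t)-1 if malloc() fails. */
static size_t process_name(const char *input, bool *used_heap) {
    char small[SMALL_BUF];
    char *buf = small;
    size_t len = strlen(input);

    *used_heap = false;
    if (len >= sizeof small) { /* need len + 1 bytes for the NUL */
        buf = malloc(len + 1);
        if (buf == NULL)
            return (size_t)-1;
        *used_heap = true;
    }
    memcpy(buf, input, len + 1);

    /* ... parse or modify the mutable copy in `buf` here ... */
    size_t result = strlen(buf);

    if (buf != small)
        free(buf);
    return result;
}
```

Short names never touch the allocator, and an 8MB name costs one malloc() instead of an 8MB stack frame.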

> The filesystem doesn't support it.

Remember that Linux supports hierarchical mounts! You can mount anything at any depth of directory nesting. Even if it were true that PATH_MAX were an FS limitation, you could still nest mounts and encounter absolute paths exceeding PATH_MAX. PATH_MAX is simply the length in bytes of the longest string you should expect system calls to accept as a path parameter.

> I use Linux computers with 16MB of RAM, total. Of course they don't run systemd nowadays since it's too big, but in my opinion systemd is good, and I would like to see it more in the embedded world.

It sounds like using systemd is a terrible idea for memory-constrained devices, so you really don’t want to see it in the embedded world.

> It sounds like using systemd is a terrible idea for memory-constrained devices, so you really don’t want to see it in the embedded world.

On the other hand, a proper event-driven init system (instead of horrible shell scripts with all sorts of fragile "sleep"s and other hacks) sounds sexy for an embedded system. I sometimes get annoyed at how slowly home routers, NAS boxes, etc. boot up.

Though the embedded systems I refer to have much more than 16 MB of RAM, more like 128 and up.

Is there a list of init systems that aren't made of shell scripts? Epoch is the only one I found.

Fun fact: PATH_MAX and NAME_MAX are glibc/musl limitations. The Linux kernel doesn't have a limit here and will happily let you walk into a directory with a 2GB path name.

ext4 doesn't limit directory depth; only filename length. A filename can be 255 bytes in ext4. How deep that lies in the filesystem isn't limited.

btrfs has the same filename limit, no underlying limit on directory depth.

And I would guess most filesystems don't, because the obvious ways to implement directories don't place limits on that depth.

In rust you can't currently dynamically allocate on the stack, although that's probably something that will be added in the future. And as others have pointed out, allocating on the stack is a fairly reasonable optimization here.

I don't think you could even call strdupa through libc in rust. I would guess that strdupa is either a macro that uses the alloca compiler intrinsic or is itself a compiler intrinsic. Even if it isn't, it will break assumptions the rust compiler makes about the size of the stack frame.

Wait how does a user space daemon exhausting its stack lead to a kernel panic?

Because the kernel intentionally panics [1] if the init process would otherwise exit in any way – whether because it called exit(), it was killed, or, in this case, it crashed.

This is likely because Unix semantics treat the init process specially: any process whose parent dies is re-parented to the init process. It's not clear what should happen to these processes if the init process itself went away, so the kernel just gives up.

[1] https://elixir.bootlin.com/linux/latest/source/kernel/exit.c...

I wonder if just restarting PID 1 could be a viable alternative?

It's generally dangerous to restart PID 1 in an environment where, by definition, something happened to PID 1 that it wasn't expecting. The state of the system is now unreliable, and it's exactly the sort of unreliable in exactly the right place that tends to lead to whopping security issues. Far too easy to end up with "Crash PID 1 with 'blah blah blah', then when it restarts it ends up doing bad things X & Y".

Perhaps a distinction here can be made between running as a server OS and a desktop OS.

In a server I generally want a crash immediately. But on a desktop I'd rather it limp along and give me a chance to finish writing my Hackernews post.

I'd rather push the other way on that. We should treat desktops with as much security paranoia as servers.

I'd much rather _not_ have my desktop "limp along" in a poorly understood and probably exploitable fashion while the malware gets a chance to finish encrypting all my files...

If that costs the world the "benefit" of my shared wisdom in a half written Hackernews post, I'm good with that.

I run a lot more untrusted code on my laptop than on my cloud servers. Likewise for work, even more so as I don't myself trust the spyware/malware they jam onto the laptops.

Basically, there's no solution at this level of granularity. One can also argue that the desktop is where the most important stuff is that we least want hacked, e.g., your family photos, documents, other stuff with a high priority of not being backed up, so we must treat security even higher than a server at this point.

I call these the "already lost" situations. You've already lost, we're just arguing about how to distribute the lossage. While those discussions aren't completely pointless, it is important to keep it clear in our head we're arguing about how to pick up the bodies at a crash site and not how to prevent the crash in the first place; it's a different mindset.

Despite some moderately-justified mockery in the other messages in this thread, the answer really is "just don't crash and have secure code here", which is to say, "don't lose". It's exceedingly hard to write and it's a very high bar, but at the same time, it's very difficult to imagine how to secure a single system when you can't even stipulate that a core of trusted software exists. If you don't even have a foundation, you're not going to build a secure structure. In this case, by "secure" I don't just mean security, but also functionality and everything else.

On a server I'd usually want it to limp along too... Better fire an alert but keep happy customers than cause a massive outage just because of someone's overly strict checkfail...

It depends really what your server does, and what the consequences of it doing the wrong thing are.

If the consequences of one server wedging itself is a "massive outage" and "unhappy customers", then you probably don't really care about that outage or those customers. If you don't have enough redundancy and alerting and automated disaster recovery to keep your customer facing shit up when one server panics, you're just relying on luck to keep your customers happy.

Fire an alert, remove that server from the load balancer, and fix the problem without your customers even noticing.

Or make sure if you're running a hobby-project architected platform that your customer expectations and SLAs are clear up front, and let it go down until Monday morning when you'll get around to fixing it.

Or just don't put stuff that crashes in pid1

seriously, if you're doing allocations in pid1 you fucked up.

Yeah, great idea: rather than worry about how to deal with software bugs, just never have bugs...

Exactly. This is why systemd is a terrible design. The fastest and most secure code, that never crashes and never needs patching, is code that doesn't exist.

runit.c is 330 lines of code

Keeping it small and simple to minimize bugs is perfectly viable and reasonable.

A better alternative would be to keep PID1 as simple as possible, and do anything more complex in a subprocess.

Systemd of course goes in the opposite direction: It assimilates as much functionality as possible from the OS into systemd (though to be fair, not all into PID1).

PID 1 is special in Unix systems. As the parent of all other processes (and child to none) it's not clear to the kernel what should happen when it exits.

Poorly designed security architecture and division of labor. A more idealized init / systemd would have all of the execution flow of PID 1 mathematically provably correct, and correspondingly have as small a footprint as possible there. All additional functions would run under one or more child processes (where the bulk of systemd would execute).

"Let's put the graphics drivers in ring 0, for better performance!" -- Windows NT architects, 1996

"Ummm, let's not do that, it's not such a great idea..." -- Windows Vista team, 2006

The initial Windows NT used user-mode graphics drivers; NT 4.0 then moved them into kernel mode.


Yep. In about '96.

(Well, it came out late '96, so I suppose a bunch of that was actually done in '95.)

Different trade offs for different eras and different constraints.

Systemd is a swiss-army-kitchen-sink-knife monolith of brittle complexity.

A proper init system similar to runit or s6 would be written in something safer (minimum unsafe) like Rust, be modular, simpler, follow UNIX philosophy, and not try to do everything in one process. Microkernel-style.

Because it's PID 1.

systemd? More like systemK! But seriously, are non-systemd systems not vulnerable to the FUSE portion of this? (CVE-2021-33910)

FWIW, I feel like your comment is responding to an implicit critique of systemd, but even if one was warranted I didn't read that comment as implying such (as the premise would just be that systemd is a key place in the stack where you would need to be super careful, not that it is somehow less careful than other projects... even if I might claim as such for at least logging ;P); it could be that I am misinterpreting your comment, though?

Yeah, I think you and a lot of other people misinterpreted my comment, since it was oddly one of my most-downvoted-ever.

My "systemK" joke was indeed implying what you said, that systemd is "a key place in the stack where you would need to be super careful." (Almost Kernel-like.)

And my question was legitimate, although poorly-researched. Answering myself: CVE-2021-33910 only affects systemd, not all FUSE in general.

Does anyone know how to try the PoC (https://www.openwall.com/lists/oss-security/2021/07/20/1/1)?

For me it crashes in fork_userns:177.

PS: no need to downvote. Sometimes managers want you to prove that there's a need to patch. It's dumb, but it is what it is.

Your linux distro may already have unprivileged user namespaces disabled. See the "mitigations" section of the post, and check /proc/sys/kernel/unprivileged_userns_clone
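If you'd rather check programmatically, here is a small C sketch that reads such a knob. The helper `read_sysctl_int` is hypothetical, and as far as I know `unprivileged_userns_clone` comes from a distribution kernel patch (Debian/Ubuntu and some others) rather than mainline, so a missing file is reported as -1 instead of being treated as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a single integer from a sysctl-style file such as
 * /proc/sys/kernel/unprivileged_userns_clone. Returns the value, or -1 if
 * the file is missing (kernel without that knob) or cannot be parsed. */
static int read_sysctl_int(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value;
    int ok = fscanf(f, "%d", &value) == 1;
    fclose(f);
    return ok ? value : -1;
}
```

Usage: `read_sysctl_int("/proc/sys/kernel/unprivileged_userns_clone")` returning 0 means the knob is present and unprivileged user namespaces are disabled. On kernels without that patch, /proc/sys/user/max_user_namespaces is the mainline setting to look at.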

This kind of issue is the reason why some more modern languages like Rust or Go do not have implicit narrowing conversions. For instance, in Rust, trying to simply pass a usize (Rust's equivalent of size_t) to a function which expects an i32 (Rust's equivalent of int) will not compile; the programmer has to write "size as i32" (Rust's equivalent of "(int) size"), which makes it explicit that it might truncate the value at that point.

(Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.)
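In C, the equivalent of `try_into()` has to be written by hand. A minimal sketch (the helper name `size_to_int` is hypothetical) of a checked size_t-to-int conversion, the exact kind of narrowing behind this vulnerability:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Rough C analogue of Rust's `i32::try_from(size)`: succeed only if the
 * value fits, instead of letting an implicit conversion change it. */
static bool size_to_int(size_t size, int *out) {
    if (size > (size_t)INT_MAX)
        return false; /* caller must handle the overflow explicitly */
    *out = (int)size; /* safe: value is known to be in range */
    return true;
}
```

With GCC or Clang, -Wconversion will flag the implicit `int n = size;` that a helper like this replaces.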

> Some Rust developers argue that even "size as i32" should be avoided, and "size.try_into()" should be used instead, since it forces the programmer to treat an overflow explicitly at runtime, instead of silently wrapping.

It's important to still have the option for efficient truncating semantics, though; some software (e.g. emulators) needs to chunk large integers into 2/4/8 smaller ones, and rotation + truncating assignment is usually the cheapest way to do that.
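For the emulator-style case, a minimal C sketch (the helper `split_u64` is hypothetical) where truncation is exactly what's wanted. It uses shifts rather than rotation, which yields the same lanes:

```c
#include <assert.h>
#include <stdint.h>

/* Splits a 64-bit value into four 16-bit lanes, least significant first.
 * C defines conversion to a narrower unsigned type as reduction modulo
 * 2^16 here, so the cast deliberately keeps only the low 16 bits. */
static void split_u64(uint64_t v, uint16_t lanes[4]) {
    for (int i = 0; i < 4; i++) {
        lanes[i] = (uint16_t)v; /* intentional truncation */
        v >>= 16;               /* bring the next lane down */
    }
}
```

Writing the cast explicitly documents that dropping the high bits is intended, which is the distinction the parent comment wants a language to encode.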

But, importantly, this is a rare case. Most software that does demoting casts does not mean to achieve these semantics.

So I wonder — are there any low-level/systems languages where a demoting cast with a generated runtime check gets the simple/clean syntax-sugared semantics (to encourage/favor its use), while truncating demotion requires a clumsier syntax (to discourage its use)?

Right, this is a small infelicity in Rust, it is easier to write

  let x = size as i32;
... even if what you meant was closer to

  let x: i32 = size.try_into().expect("We are 100% sure size is small enough to fit into x");
But at least it isn't C or C++ where you might accidentally write

  x = size;
... and the compiler doesn't even warn you that size is bigger than x and you need to think about what you intended.

It's really hard to fix this in C++. Some epoch proponents want to get there using epochs: basically, you'd have a "new" epoch of C++ in which narrowing must be explicit, while old code would continue to have implicit narrowing so it doesn't break.

> and the compiler doesn't even warn you that size is bigger than x

That's not true, though: a compiler with reasonable flags set will definitely warn you, and if you really don't like this kind of code you can force the compiler to issue an error instead.

What flags do you have in mind? Because this code doesn't generate any warning with GCC 11.1.0 with -Wall -Wextra:

    int main(void) {
        int some_int = 1234567;
        char c = some_int;

        return c;
    }

-Wall -Wextra -Wpedantic does not enable all diagnostics.

This is GNU's idea of "all".

Contrast to Clang's -Weverything, which will.

It seems though that the point is made, right? Even 'good' approaches miss on what should be a clear 'whoa, are you sure?' type warning. There are a lot of footguns wandering around in C/C++ land.

No, the point was that you don't get a warning and it will silently wrap. You can scroll up if you've forgotten.

And it is false. My default-configuration C++ project created in CLion shows it very clearly, and even pesters you to use int32/int64 over int/long.

But as usual the default fallback when you're wrong about C++ is "uh yeah but lotta footguns amirite"

As if there aren't enough that we need to start making them up...

And yet RedHat's recommended compiler flags for GCC [0], for example, do not appear to catch the wrapping assignment in the above example code.

0: https://developers.redhat.com/blog/2018/03/21/compiler-and-l...

Ah yes, of course the goalpost was "you need to customize your settings to catch it" above.

Now that the default in the most beginner friendly of IDEs catches it, the goalpost is "my pet source of customization designed with C++98 in mind doesn't catch this"

Of course, even your pet source of customization caught up: https://developers.redhat.com/blog/2021/04/06/get-started-wi...

If by "caught up", you mean talked about clang-tidy in a separate post, which is definitely not a GCC compiler flag, then sure.

The goalpost, since you're insistent on being explicit about it, was whether a C/C++ compiler "with reasonable flags" will catch the implicit wrap. GCC is a very popular compiler, and to be honest, I'm still not sure how to get it to warn on the above code, if doing so is possible.

Edit: Just read the rest of the thread, it's -Wconversion, which I suppose makes sense. Ignore me, point taken.

If said beginner-friendly IDE is used by only a couple percent of the ecosystem, isn't it disingenuous to use it as proof that this isn't a problem in this context?

Ok so we're going to keep shifting the goalposts, now it's "there aren't enough beginners relative to total usage so beginner friendly IDE isn't enough"...

I mean MSVS uses Clang-tidy too, Clang-tidy integrates style guides provided by Mozilla and Google.

Most C++ Google projects have clang-tidy configs.

Clang-tidy is literally table-stakes for modern C++ tooling.

Github shows 970,000 commits related to setting up clang-tidy

But uh, yeah, let's see where the goalpost skitters to next.


The irony is, as I said above, C++ has enough footguns without sticking your fingers in your ears and ignoring boring, easy-to-set-up, widely known and well-used tooling.

But in the war against C++ no stone must be left unturned.

C++ is a tiny fraction of all the code I've written in my life but it irks me to no end that people can't deal with the idea that language safety can improve, that tooling can be considered part of that safety. Or rather they can... unless they're talking about C/C++

I’m definitely not moving any goalposts I know of!

I thought the point I had been making, as had others, is that by default this is an easy footgun.

There are all sorts of things that can be added on to all languages to help - if you know it’s a problem worth solving, etc. which is inevitably after you’ve footgunned yourself with it bad enough you felt the need to research how to prevent it.

Other languages just do the safer thing more often (or their compilers at least warn about common footguns by default), which is the whole point of this thread?

There was one point: C++ won't warn you by default.

But tooling that is incredibly common, that beginners will run into even if they take the path of least resistance, and experts will use because it enforces standards at the very least, covers it.

Like Js without linters is a minefield, but everyone accepts you should lint your Js. Why does that change when C++ is involved?

The reason for this decision is so that compiler upgrades with -Wall and -Werror don't break builds.

I can see the reason behind it, but I feel that this behavior is something you opt into when you use -Werror.

> The reason for this decision is so that compiler upgrades with -Wall and -Werror don't break builds.

It feels like the "right thing" here would instead be for the compiler to allow build scripts to reference a specific point-in-time semantics for -Wall.

For example, `-Wall=9.3.0` could be used to mean "all the error checks that GCC v9.3.0 knew how to run".

Or better yet (for portability), a date, e.g. `-Wall=20210720` to mean "all the error checks built into the compiler as of builds up-to-and-including [date]."

To implement this, compilers would just need to know which version/date each of their error checks was first introduced in. Errors newer than the user's specifier could then be filtered out of -Wall before -Wall is applied.

With such a flag, you could "lock" your CI buildscript to a specific snapshot of warnings, just like you "lock" dependencies to a specific set of resolved versions.

And just like dependency locking, if you have some time on your hands one day, you could "unlock" the error-check-suite snapshot, resolve all the new error-checks introduced, and then re-lock to the new error-check-suite timestamp.

I think it might be more of a headache: what if somebody fixes a bug in an analyzer so that it catches things it used to miss? Should that be a breaking change?

Personally, I would vote for "-Wall with -Werror means no guarantee for your build."

The real solution: leave -Werror off by default and activate it only during CI builds.

That's even worse, because then an upgrade to the compiler in the managed CI runner (e.g. Github Actions') base-image will translate to the same version of the code failing where it previously succeeded, with nobody sure why.

At least with -Werror on at all times, devs will tend to upgrade before the very-stable CI environment does, and thereby catch the problem at development time (usually less time-pressure) rather than release-cutting time (usually more time-pressure, esp. if the release is a hotfix.)


Mind you, it does work to enable -Werror only in CI, if you lock your CI environment / compiler Docker image / etc. to a specific stable version, and treat that as the thing to re-lock in place of the "error-check suite snapshot version."

This has the disadvantage, though, that you can't take advantage of newly-stable/newly-unstable language features, or of newly-introduced compiler optimizations, without biting the bullet and taking on the work of fixing the errors introduced by re-locking the base-image.

With a separate flag for locking down the error-check-suite snapshot version, you could continue to upgrade the compiler — and thereby get access to new features / optimizations — while staying on a particular build regression "scope."

> That's even worse, because then an upgrade to the compiler in the managed CI runner (e.g. Github Actions') base-image will translate to the same version of the code failing where it previously succeeded

If you don't want your build to fail on warnings, don't use -Werror. If you want it to only fail on specific warnings, use -Werror=...

> with nobody sure why

Unless they look at the errors in the compiler output. What does it matter if it was brought on by a compiler update or a push?

> At least with -Werror on at all times, devs will tend to upgrade before the very-stable CI environment does

Nothing wrong with -Werror for devs - the problem is when you ship code to others and leave -Werror on by default.

Is -Werror really supposed to not break builds?

The whole point of -Werror is to break builds and -Wall / -Wextra are definitely not frozen. If you can't handle compiler updates resulting in errors, don't use -Werror in that environment.

keep in mind though that -Weverything is not intended to be used in production: https://quuxplusone.github.io/blog/2018/12/06/dont-use-wever...

-Weverything is great for CI though, in combination with lots of -Wno-... flags to disable the warnings you don't want. Instead of having to manually look out for new warning flags, you get all of them automatically.

Yep, this is what I do; throw in -Weverything followed by a few things like -Wno-packed -Wno-padded -Wno-unused-parameter.

> This is GNU's idea of "all".

Unfortunately, over the years people baked the semantics of -Wall into their builds so new diagnostics could not be added to that flag.

And clang’s -Weverything shows how the opposite can fail as well

There are some very wrong-headed warning options in gcc, such that turning them on and avoiding getting them will make your code worse. So -Wall means 'all recommended warnings'.

Also there are some warnings that won't be produced if you compile without optimization, because the needed analysis isn't performed.

And yet we have things like -Wmaybe-uninitialized in -Wall which by definition will occasionally warn on perfectly good code.

-Wconversion will do it.

-Wconversion will catch this

>What flags you have in mind?



This has been common knowledge for ages. Any cursory Google search returns countless answers.

Take this post made over a decade ago.


Fair point although it seems "reasonable" varies from one platform to another, it doesn't warn out of the box for me but people have reported MSVC gets warnings here.

Narrowing within { } initialization is forbidden in C++ (since C++11).

It can impact compile time performance but Boost Safe Numerics provides some nice wrappers to prevent narrowing (or restrict it to specific classes of narrowing) and throw warnings or errors at compile time similar to what you see in Rust.

If you initialize it like this you get a warning:

  x = { size };

In swift:

    //  a is an Int (non-negative, so it may or may not fit in a UInt32)
    let a = Int.random(in: 0..<Int.max)
    //  causes a fatal runtime error if out of range, halts execution
    let b = UInt32(a)
    //  returns a UInt32?, which will be nil if out of range
    let c =  UInt32(exactly: a)
    //  another approach for exact conversion:
    guard let d = UInt32(exactly: a) else {
        //  conversion failed
        //  handle error and return
    // 'd' is a UInt32 without any bits lost
    //  always succeeds, will return a UInt32, either clamped or truncated.   Truncation just cuts off the high bits.
    let e = UInt32(clamping: a)
    let f = UInt32(truncatingIfNeeded: a)

As an interesting side note:

The "as" operator is often considered to have been a mistake. Both because of unchecked casts and because of "doing too much".

So I wouldn't be surprised if in the (very) long term there will be a Rust edition deprecating `as` casts (after we have alternatives to all casts done with `as`, which are: pointer casts, dyn casts/explicit coercion and truncating integer casts; for some of these we already have alternatives on stable, for others not).

And for anyone who wants to avoid `as` today, you can combine extension traits (which internally still use `as`) with a clippy lint against any usage of `as`.

EDIT: I forgot widening integer casts in the list above ;-).

I would prefer not to see that happen; I'm fine with `as` and the safer options as they are currently. It would be a big job to update all the code in the wild when you want to move to the newer edition.

I don't see it as a big job. Am I wrong?

Imagine a suitable narrow::<type>() function is introduced with the same consequence: it always narrows, so if your data was too wide it may drop important stuff on the floor, and narrow() just says that's too bad.

Rust 2030 can introduce narrow::<type>(), warn for narrowing as usage and then Rust 2035 can error for as. The Rust 2030 -> 2035 conversion software can consume code that does { x as y } and write { x.narrow::<y>() } instead. This code is not better but it's still working in Rust 2035 and this explicit narrow() function is less tempting for new programmers than as IMO.

Yes, BUT the correct thing to do would be more tedious to write that way.

You don't need to do that. Rust editions are fully backwards-compatible, since crates written in one edition can depend on crates written in another.

I'm aware of that, which is why I specified when you want to move to the newer edition.

Isn't this something that could be done automatically by rustfix?

> are there any low-level/systems languages where a demoting cast with a generated runtime check gets the simple/clean syntax-sugared semantics (to encourage/favor its use), while truncating demotion requires a clumsier syntax (to discourage its use)?

Not exactly the same thing, but in a related area C++ does this a bit.

In C++ you can always still do a c-style cast `(int) some_var` (and the implicit casts obviously), but in general you're meant to use the C++ style explicit casts like `static_cast` and `const_cast`. These are generally tidy, but the most powerful and dangerous of these casts is deliberately awkwardly named as `reinterpret_cast<int>(some_var)` rather than something terse.

It's always easy to spot during a code review.

You can have your compiler warn about C-style casts (e.g. GCC's and Clang's -Wold-style-cast).

In Virgil, casts are written "type.!(expr)". Casts between numbers check ranges and roundability (for float<->int conversion). Reinterpreting the bits is written "type.view(expr)" and ignores signs, looks at the raw float bits, etc.

edit: a cast will throw an exception if it fails, in case that was not clear from context.

It would be nice if CPUs had an instruction for "read the low 8 bits of this register, and require the high bits to all be zero (otherwise throw an exception)".

Then safety is free...

What does "otherwise throw an exception" mean at the CPU level?

I know Erlang's BEAM VM has a "fail jump pointer register", where instructions that can fail have relative-jump offsets encoded as immediates for those instructions, and if the instruction "fails" in whatever semantic sense, it takes the jump.

But most CPUs don't have anything like that.

Would you want it to trap, like with integer division by zero?

CPU traps are pretty hard to handle in most language runtimes, such that most compilers generate runtime checks to work around them, rather than attempting to handle them.

I assume they meant throwing an exception like divisions by zero usually do, i.e. a hardware trap.

I always thought that overflows should be checked in hardware, I suppose it's not a stretch to extend that to truncation. It's controversial though, and obviously mostly a thought experiment anyway unless we manage to convince some CPU manufacturer to extend their ISA that way.

MIPS does have (optional) trapping overflow on signed add/sub overflow, so at least there's a small precedent for it.

Swift checks all arithmetic by default: https://swift.godbolt.org/z/rW614G5aq

It seems obvious that future Apple CPUs will have hardware support for this, if they don't already.

I don’t see this happening unless it makes it into the ARM ISA.

I don’t know the terms of Apple’s license with ARM, do you? I’m quite interested.

Given that Apple was one of the original founders of ARM, it’s quite possible that their license allows much more latitude than anyone else’s.

Adding new instructions to userspace programs is almost certainly not going to fly. All of their extensions have been hidden behind an opaque API, or limited to use in the kernel.

That… makes no architectural difference at all. As far as the architecture is concerned, these are architectural extensions either way: userspace programs can observably contain and execute instructions which are not standard ARM.

If AMX is allowed under their license, there is no reason why checked extensions would not be.

Apple benefits from compatibility with the ARM ecosystem, but there’s no downsides (from their perspective) from extending it. Their chips are “Apple Silicon”, setting the stage for forging ahead alone. I think it’s a card they hold in reserve, to be played when the time is right, like the Intel transition.

I'm not familiar with the ARM ISA, but from the godbolt disassembly it doesn't look like anything special is going on here, just the ASM being generated. What's happening is that it does the add, then jumps to an invalid opcode if the overflow flag is set...
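For comparison, the same add-then-branch-on-overflow pattern can be written by hand in C. This is a minimal sketch assuming GCC or Clang, whose `__builtin_add_overflow` builtin reports whether the mathematically correct result fit in the destination. `checked_add` and `try_add` are hypothetical helper names. `__builtin_trap()` typically compiles to an illegal instruction, which only loosely mirrors what the Swift code does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns a + b, or stops the program if the
 * mathematically correct sum does not fit in int64_t. */
int64_t checked_add(int64_t a, int64_t b)
{
    int64_t result;
    /* GCC/Clang builtin: returns true if the sum overflowed. */
    if (__builtin_add_overflow(a, b, &result))
        __builtin_trap();   /* usually an illegal/undefined instruction */
    return result;
}

/* Hypothetical non-trapping variant: reports overflow to the caller. */
bool try_add(int64_t a, int64_t b, int64_t *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```

The trapping version matches "halt on overflow" semantics. The `try_add` form is for callers that want to handle the failure themselves.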

The suggestion was a custom instruction or architectural extension to have this happen in hardware, rather than needing to write out extra code for this.

D doesn't allow implicit narrowing conversions. The user has to have an explicit cast, like `cast(int) size`. Cast is made into a keyword so all those explicit conversions can be found with a simple grep.

We consider it best practice to try and organize the types such that explicit casts are minimized.

This is a deep hole for language design.

I thought about this very, very carefully when designing Virgil[1]'s numerical tower, which has both fixed-size signed and unsigned integers, as well as floating point. Like other new language designs, Virgil doesn't have any implicit narrowing conversions (even between float and int). Also, any conversions between numbers include range/representability checks that will throw if out-of-range or rounding occurs. If you want to reinterpret the bits, then there's an operator to view the bits. Conversions that have to do with "numbers" then all make sense, in that numbers exist on a single number line and have different representations in different types. Conversions between types always preserve numbers and where they lie on the number line, whereas "view" is a bit-level operation, which generally compiles to a no-op. Unfortunately, the implication of this for floating point is that -0 is not actually an integer, so you can't cast it to an int. You must round it. But that's fine, because you always want to round floats to int, never cast them.

[1] https://github.com/titzer/virgil
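The number-line view of conversion described above can be approximated in C for the double-to-int32 case. This is a minimal sketch, not Virgil's implementation. `double_to_i32_exact` is a hypothetical helper that rejects NaN, out-of-range values and values with a fractional part. To mirror the comment's point about -0, it also rejects negative zero.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: succeeds only if d is exactly an int32_t value. */
bool double_to_i32_exact(double d, int32_t *out)
{
    /* Range check first; NaN fails both comparisons. Both bounds are
     * exactly representable as doubles. */
    if (!(d >= (double)INT32_MIN && d <= (double)INT32_MAX))
        return false;
    /* Mirror the rule that -0 is not an integer. */
    if (d == 0.0 && signbit(d))
        return false;
    int32_t i = (int32_t)d;   /* in range, so this truncates toward zero */
    if ((double)i != d)       /* a fractional part would be lost */
        return false;
    *out = i;
    return true;
}
```

Callers that actually want rounding would call `round()` or `floor()` explicitly first, which matches the "round, never cast" advice.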

C/C++ compilers commonly have warnings for narrowing conversions, and separate warnings for mixing signed/unsigned conversions for same-sized values.

While some folks aren't too fussed about warnings like this, those folks generally aren't writing secure code like kernels. I'm very surprised that kind of conversion was permitted in the code.

Shout out to Zig, which requires explicit casting also.

I wish clippy had a lint against downcasts specifically. I also like that Rust has no “int” type.

It does, you just need to enable it. There are a bunch of other cast-related lints as well that are allow by default.


Oh wow, didn’t know about that one. Is this new?

Languages of the same age as or older than C also have explicit narrowing, but apparently that was seen as programming in a straitjacket.

That's cool! And some languages like Go don't even allow implicit widening conversions: https://play.golang.org/p/a5C5jsHypmu

Rust doesn't allow them either: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Adding .into() works though, which is the recommended method if the conversion can be statically guaranteed (otherwise try_into should be used, which will become easier in the 2021 edition as the TryInto trait will become part of the prelude).

into() does not work from usize. It's rather frustrating in practice. https://stackoverflow.com/questions/62832438/why-is-rusts-us...

Yeah, that is frustrating since there are no platforms where that would fail today, and it's hard to imagine why we would ever want one with 256bit pointers.

We don't even use full 64bit pointers today on x64.

Maybe a fat-pointer LLVM target à la SoftBound running on a machine with 128-bit bare pointers, like the AS/400.

When 16 bit computers went to 32 bit, people probably thought that one wouldn't ever need 64 bit computers either. That being said, by the time 128 bits run out we will probably have boiled Earth's oceans :).

You could think of checked memory models where half of the 256 bit address is a 128 bit random key needed to access some allocation, or maybe even a decryption key. Similar things are done with the extra space of x64 as well.

Also, 128 bit numbers are still quite uncommon in Rust. Easy conversion of usize to them wouldn't be that useful if conversions of the other number types don't work.

> When 16 bit computers went to 32 bit, people probably thought that one wouldn't ever need 64 bit computers either.

People have said that about pretty much every memory size in the history of computing. The argument for 64-bit is not "2^64 bytes ought to be enough for anyone"; it's "you couldn't use more than 2^64 bytes even if you wanted to". Writing a full register worth of data every clock cycle at 4GHz works out to 32GB/s. (2^64B / 32GB/s) is just over seventeen years, to fill up a 64-bit address space, assuming you're doing no actual computation. Few computers even work for seventeen years without replacement, much less run single processes continually with no reboots.
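Here is that arithmetic spelled out. It is a rough check that assumes one 8-byte store per cycle. The exact figure depends on whether "32GB/s" is read as 32 GiB/s (2^35 B/s) or as 3.2×10^10 B/s:

```latex
\begin{aligned}
r &= 8\ \tfrac{\text{B}}{\text{cycle}} \times 4\times10^{9}\ \tfrac{\text{cycles}}{\text{s}} = 3.2\times10^{10}\ \text{B/s} \\
t_{\text{binary}} &= \frac{2^{64}\ \text{B}}{2^{35}\ \text{B/s}} = 2^{29}\ \text{s} \approx 5.37\times10^{8}\ \text{s} \approx 17.0\ \text{years} \\
t_{\text{decimal}} &= \frac{2^{64}\ \text{B}}{3.2\times10^{10}\ \text{B/s}} \approx 5.76\times10^{8}\ \text{s} \approx 18.3\ \text{years}
\end{aligned}
```

Either reading lands at roughly two decades, so the conclusion holds. "Just over seventeen years" matches the binary reading.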

The Ethereum virtual machine addresses its storage with 256 bits, so there's one wild example. Although in this case you'd probably not want to use usize directly to represent storage.

I'm not familiar with the Ethereum virtual machine, are these really memory pointers? Surely it's represented differently?

EVM is a stack machine, and it has durable storage which is addressed using 256 bit "pointers". So you can do something like

    push x   // 256-bit constant (the storage key)
    sload    // pops x; top of stack now contains storage[x]

Even worse, there are platforms with >=128-bit pointers but 64-bit address space. Rust has chosen usize to be uintptr_t rather than size_t, even though it mostly uses it as if it was size_t. A ton of code is going to subtly break when these two sizes ever differ. Rust is likely already doomed on >64-bit platforms, and is going to be forced to invent its own version of a LLP64 workaround.

Sorry, I'm not sure how this is a problem. On segmented architectures size_t is smaller than uintptr_t, but that just means there needs to be an allocation limit < usize::max_value.

It would have caused more bugs if they had defined it the other way and people used usize to store addresses.

"Fortunately" there will be bigger problems than Rust on 128-bit machines - lots of UAPI structures in the Linux kernel have pointer size hardcoded to 64 bits.

> It's rather frustrating in practice.

All explicit conversions, more so fallible, are "frustrating in practice" especially when coming from a language without those foibles.

But given the semantics of usize/isize, it is perfectly reasonable, nay, a good thing, that they're considered neither widenings nor narrowings of other numeric types.

usize should be Into<u64> iff usize is <= 64 bit.

usize is already on the hook for that: `0x100000000usize` will fail to compile on 32 bit, so you already risk compile errors when switching archs.

As it is I'm just writing `as u64` which is clearly worse.
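C has no `Into`, but the compile-time guarantee being asked for can be expressed with a C11 static assertion. The widening function below only compiles on targets where `size_t` fits in `uint64_t`, so it needs no runtime check. This is a sketch, and `size_to_u64` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Refuse to build on any target where size_t is wider than 64 bits,
 * so the conversion below can never lose information. */
_Static_assert(SIZE_MAX <= UINT64_MAX, "size_t must fit in uint64_t");

/* Lossless widening, guaranteed by the assertion above. The cast is
 * not required in C; it just documents the intent. */
uint64_t size_to_u64(size_t n)
{
    return (uint64_t)n;
}
```

On a hypothetical target with a wider `size_t`, this fails at build time rather than silently truncating.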

Add Java to that list.

Same for C#. Any narrowing truncation needs to be an explicit cast. Widening is typically allowed implicitly, although in the case of the 'decimal' (128 bit struct representing a 'higher precision' floating point) type you still need an explicit cast from a 'double', since there are cases where that conversion can still change the value or fail (i.e. Infinity/NaN)

Microsoft is considering turning on checked arithmetic by default in the 2022 Visual Studio templates, by the way.

It can catch people by surprise that Java's narrowing conversions may not preserve the sign. For example, the following is broken:

    class TimestampedObject implements Comparable<TimestampedObject> {
        long timestamp;

        // Broken: narrowing the long difference to int can flip its sign.
        @Override
        public int compareTo(TimestampedObject other) {
            return (int) (timestamp - other.timestamp);
        }
        // ...
    }

ErrorProne catches this: https://errorprone.info/bugpattern/BadComparable
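The same bug shows up in C comparators passed to `qsort`. This minimal sketch assumes timestamps stored as `long long`, and `struct timestamped`, `compare_timestamped` and `sort_by_timestamp` are hypothetical names. The subtraction shown in the comment has the narrowing problem from the Java example. The `(a > b) - (a < b)` idiom instead returns -1, 0 or 1 without any narrowing.

```c
#include <stdlib.h>

/* Hypothetical record type mirroring the Java example. */
struct timestamped {
    long long timestamp;
};

/* qsort comparator that never narrows the difference to int. */
int compare_timestamped(const void *pa, const void *pb)
{
    const struct timestamped *a = pa;
    const struct timestamped *b = pb;

    /* Broken alternative: return (int)(a->timestamp - b->timestamp);
     * the subtraction can overflow, and converting an out-of-range
     * value to int is implementation-defined (GCC and Clang keep the
     * low 32 bits), so the sign can flip. */
    return (a->timestamp > b->timestamp) - (a->timestamp < b->timestamp);
}

/* Sorts records by timestamp, oldest first. */
void sort_by_timestamp(struct timestamped *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_timestamped);
}
```

With the broken version, GCC and Clang would compare 2^32 and 0 as equal, because the low 32 bits of their difference are zero.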

Noo! This idea is super new and was invented by Go/Rust.

The compiler can issue warnings for this.

This is why in C it is good practice to enable all compiler warnings and to have the compiler treat warnings as errors.

In theory but not in practice if you're distributing your apps sources.

If you write for C compiler Foo 8, there's a decent chance Foo 9 will raise a warning which didn't exist before. Now you have to handle "why doesn't this compile" issues, and distributions have to patch your sources to do future releases. And that's ignoring bugs like those in past GCC versions, where some specific warnings could not be satisfied at all.

The golden rule is: use -Werror for Debug builds, so you catch and fix all warnings during development. That's fair enough.

But never ever leave -Werror enabled for building in Release mode. You'll be preventing your code from building as soon as a new compiler version comes out. Maintainers or code archaeologists will have a much worse time than if this option had simply been disabled to start with.

Long and int are the same size on Windows. The fact that these sizes aren't well defined is the cause of the issue.

This wouldn't be a problem if "int" was defined as the same size as size_t. The solution is probably to change all those functions to take a parameter of size_t instead of int.

IMHO one should always be using C99 types instead of int, but Linux predates that.

Also, shouldn't that implicit conversion cause a compiler warning?

> This wouldn't be a problem if "int" was defined as the same size as size_t.

ILP64 causes a lot of problems, most notably needlessly-increased memory usage and, in C, the inconvenience of requesting a 32-bit type when int is 64-bit. It's rather uncommon to actually need the extra 64-bit range except when describing pointer addresses and memory/disk sizes, both of which benefit from an explicit intptr_t/size_t type for readability if nothing else.

ILP64 also solves a lot of problems if you don’t define overflow to be UB.

Linux apparently provides a set of sized integer types named similarly to Rust's, i.e.: s8 u8 s16 u16 s32 u32 s64 u64

But of course getting C programmers to use these integer types rather than the ones they grew up with isn't easy.

I did a lot of C - a lot - in the mid-90s through the mid/late 2000s and have never seen any sizable C code base where explicit sizes were not the norm throughout - u_int32_t etc. I think this is not the problem, it's the implicit conversions.

I don't think that stops C's implicit conversions either, does it?

It doesn't. They are in the end just typedefs to the basic types like int, long, etc.

> IMHO one should always be using C99 types instead of int, but Linux predates that.

"Always" is a strong way of putting it, there are often times where it makes sense to use the platform's "natural" word sizes (which is the entire point of having `int` `long` `long long` etc.)

>> there are often times where it makes sense to use the platform's "natural" word sizes

In those cases we probably don't care about the full range of the larger types, so it doesn't hurt to use the smallest type for a range of expected values. If it does make a difference, the program will behave differently when compiled on a different arch or even a different compiler.

But maybe "generally" instead of "always". OTOH even I am guilty of using an int to loop over an array.

> This wouldn't be a problem if "int" was defined as the same size as size_t.

That would lead to a hole in the type sequence (char <= short <= int <= long <= long long) for 64-bit targets (where int is the 32-bit type while size_t is 64 bits).

> IMHO one should always be using C99 types instead of int, but Linux predates that.

On the other hand, on Linux "long" has always been defined as the same size as size_t, so using "long" instead of "int" everywhere could also be an option.

so you mean int_fastX_t or int_leastX_t?

I think Rust has no language construct for that, but the best implementation of "size as i32" should fail on overflow.

In Haskell I would use an exception, and mark the function as unsafe, but the stdlib seems to disagree with me here.

> Unfortunately, this size_t is also passed to functions whose size argument is an int (a signed 32-bit integer), not a size_t.

Is this the kind of thing that could be caught by a linter or strict compilation rules? This seems to me to be a failure of the type system.
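To make the quoted failure concrete: on a typical 64-bit target, a `size_t` just above `INT_MAX` does not fit in `int`. Converting it is implementation-defined in C, and GCC and Clang wrap it to a negative number. The following is a minimal sketch of a checked conversion that assumes the caller can handle failure. `size_to_int` is a hypothetical helper, not a kernel API.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: converts n to int only if it fits.
 * Returns false (leaving *out untouched) when n > INT_MAX. */
bool size_to_int(size_t n, int *out)
{
    if (n > (size_t)INT_MAX)
        return false;
    *out = (int)n;   /* in range, so the value is preserved */
    return true;
}
```

Changing the callee's parameter to `size_t` avoids the conversion entirely. A check like this is for call sites that must feed an existing `int` API.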

At this point, you can—and probably should—consider C and “C-with-analysers” two different languages. If you use static (and dynamic, if you have tests) code analysis, making software without these issues is way easier, and doing so without these tools is essentially impossible. Both because of the C language itself as well as because of the culture of “bytes go brrr” and “just use a debugger” that a lot of middle-level C programmers have in my experience.

Yes, it is the type of thing caught by a linter or strict compilation rules.


But strict compilation rules (e.g. clang's -Weverything) mainly only work if you treat them as errors (so -Werror), and some of those strict warnings are questionable at best and just outright annoyingly wrong at worst. For example, unused parameter warnings on virtual methods are a waste of time to deal with. It's not a symptom of a bug most of the time, so making it an error just generates workaround churn, or you end up disabling the warning and then maybe that bites you the few times it would have pointed out an actual issue.

Beyond the blanket ones like clang's -Weverything, it can otherwise be a job to keep up with compiler upgrades and the vast number of warning options they have.

> For example, unused parameter warnings on virtual methods are a waste of time to deal with.

Why is that even a warning? If at least one of the implementers uses a parameter and a warning is shown, the warning itself is wrong. Isn't that just a broken implementation of the warning?

AFAIK most compilers by default will output a warning in this case.

Too bad that most projects are so full of integer size and signedness warnings that people get warning fatigue and just completely ignore them.

GCC's -Wconversion has some issues. For example, good luck getting gcc to /not/ emit a warning for this code, in C or C++. I have yet to find the appropriate cast to avoid a warning. Clang does not warn for this.

    typedef struct {
        unsigned value : 4;
    } S;

    void foo(S* s, unsigned value) {
        // error: conversion from 'unsigned int' to 'unsigned char:4' may change value
        s->value = value;
    }

I mean, I guess I can see the rationale... it's just annoying to have to resort to using pragmas to turn off -Wconversion whenever I need to assign to a bitfield.

Does it still warn if you do `s->value = value & 0x0F`? That seems like a reasonable alternative to pragmas if it works.

Thanks, that fixed it!
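Here is the masking suggestion applied to the original `foo`. According to the thread, an explicit mask silences GCC's -Wconversion. It also makes the truncation to four bits visible to anyone reading the code.

```c
/* Same struct as above: a 4-bit unsigned bitfield. */
typedef struct {
    unsigned value : 4;
} S;

/* Keep only the low 4 bits explicitly, so the narrowing into the
 * bitfield is intentional and cannot change the stored value. */
void foo(S *s, unsigned value)
{
    s->value = value & 0x0Fu;
}
```

Values above 15 are reduced to their low four bits, exactly what the implicit conversion did before.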

It doesn't.

GCC and Clang (the predominant Linux and Mac compilers) mostly don’t warn by default, you need -Wall or other flags (-Wextra, -Weverything, or specific flags like -Wconversion).

I don't see any warnings for this narrowing parameter conversion.

  #include <stddef.h>

  short foo(short a) { return a % 42; }
  size_t bar(void) {
      size_t sz = ~0UL;
      return foo(sz);
  }

On MSVC, default project settings I get:

    warning C4267: 'argument': conversion from 'size_t' to 'short', possible loss of data

https://godbolt.org/z/nYeWT7zv6 (/W3 is the default warning level when creating a new project)

I saw these warnings so often that I assumed that every compiler had them.

It warns if you add -Wconversion. Unfortunately, that flag generates lots of false positives (at least in gcc), so using it isn't always a good idea.

What kind of false positives are you seeing with gcc?

Personally I have never seen gcc spitting out a false positive. IMO it's always a good idea to explicitly downcast even if you know that it's 'safe'. That way someone else will see instantly what's going on. The fact that Rust requires it should tell us something.

For example, I've just reported this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101537

You can find more cases in the bugtracker. To be fair, it seems many of them were fixed in recent releases.

The 1 and 0 are ints. It most likely complains because the signedness isn't the same.

It isn't ever a good idea. C programmers are just supposed to know about the conversion and promotion rules, every time they add a line, even though the rules are actually insanely complicated. Compiler warnings can't overcome this because programmers have no way of just discovering one of GCC's 726 warning flags, only a few of which are enabled by -Wall and -Wextra, most of which are way too noisy to serve a useful purpose.

In my own small projects, I always add -Wconversion to the build configuration. I think the false positives are affordable when you start from a small piece of code.

Yeah; AFAIK you need something like clang-tidy's cppcoreguidelines-narrowing-conversions check (which everyone should definitely be using). (edit: But I'm wrong! That check is apparently similar to the -Wconversion mentioned by someone else.)

They can't warn by default, because on some platforms long->int isn't a narrowing conversion.

The compiler knows what platform it's compiling the code for, though.

clang's -Wshorten-64-to-32 can catch this:


The more general-purpose -Wconversion has many false positives, often around int to char conversion. Some functions like int toupper(int) have an unexpected int return type to deal with special out-of-bound values like EOF.
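An illustration of the toupper() case. `toupper` takes and returns `int` so it can represent `EOF`, and passing a plain `char` that happens to be negative is undefined behavior. The usual idiom therefore casts to `unsigned char` on the way in and back to `char` on the way out. Because both conversions are explicit casts, -Wconversion has nothing to report. `str_toupper` is a hypothetical helper.

```c
#include <ctype.h>

/* Hypothetical helper: upper-cases a NUL-terminated string in place. */
void str_toupper(char *s)
{
    for (; *s != '\0'; ++s) {
        /* toupper() expects a value representable as unsigned char (or
         * EOF), so convert first; its int result is narrowed back to char
         * explicitly. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```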

Yes, this can be caught by a static analyzer and it is sad that Linux doesn't use it. I wonder, is it because the code quality is low and there would be too many warnings?

They do use them, but only in a narrow fashion. A lot of people have tried to make default kernel wide static analyzers but they are mostly not useful.

And it's because the kernel does a lot of non-standard things, mostly because it has to. It is not a normal program.

A fun fact is there are 196 instances of `int buflen` in Torvalds' tree today, 95 instances of `int namelen`, and 13 instances of `int pathlen`, also 3000+ `int size`.

Cached mirror - https://webcache.googleusercontent.com/search?q=cache:LwH96X...

I'm clueless about security: where does this fall on the scale of non-issue to critical? It strikes me as tending towards the latter, given that it enables unprivileged users to become root. Any insight into past Linux Kernel vulnerabilities that were severe?

Local root escalation is a very common vulnerability. It's almost pointless to expect that an attacker will not be able to escalate his privileges given enough time, if he got user account access. It's one weak layer of defense at most.

Pulling this attack off requires enough access to the machine to either run the unprivileged commands needed to create the exploit condition or to upload a binary/script that runs unprivileged that in turn creates the exploit condition.

If the attacker already has that level of unauthorized access, you're already doomed.

Attacks like this break multitenant computing environments. It's not a threat to your desktop computer or your phone. But it can be a very big deal for hosting environments.

It also breaks sandboxing. To whatever extent you're trying to run programs that are somehow jailed, so you can download and run them without worrying about them taking over your system, kernel LPEs break those assurances.

Android also uses user accounts (one per app) to enforce its security model. Is that what you refer to with sandboxing?

I see in the mail that Red Hat sent out patches to resolve this. Are those patches already merged, or is this a CVE about a live exploit?

Still not fixed in the mainline kernel, it seems.

It's fixed in 5.13.4.

Afaik this should work on Android for some time now too right?

Got an unrootable old (but still fully working) phone there, might try to play with it.

... do Android kernels normally build with and allow non-privileged users to make namespaces? I'd be really surprised.

I'd love to see that. Please share if you end up doing so.

C++ engineers check out: CppCon 2018: Safe Numerics by Robert Ramey https://m.youtube.com/watch?v=93Cjg42bGEw

I've always thought implicit parameter conversion in general (narrowing or otherwise) was fraught at worst, and code smell at best. If some function takes a size_t, why are you passing something other than a size_t to it? If your eventual call into "code you don't control" takes type X, make sure the value that you eventually pass is type X all the way through the code you do control. Even casting is kind of the lazy way out. I used to be pretty dogmatic about this and more than a few times got called a pedantic nitpicker (size_t is just like an int, bro! Don't worry about it--we got to ship in a week!). You can probably find serious bugs in any C or C++ software project simply by turning on the implicit cast warnings.

And it's not just parameter passing. It can apply to anything that acts like an assignment. That includes assignment (of course), parameter passing, and returning a value from a function.


m->size must be of type size_t. It's slightly mind-blowing to me that casting to a smaller unsigned int can cause a vulnerability. But I guess unintended behavior (not undefined) can do that.

It's very common. All that needs to happen, as just one example, is the variable of the smaller type being used as an index into an array. You might get an out of bounds access, or even just an access to the wrong element.

To add to my own comment, more realistically this happens if the variable is used in the calculation of an index, rather than directly as the index. Though if the array is sufficiently large, and/or the smaller type sufficiently small (short or even char, which are most often 16 or 8 bit nowadays), then at least accessing the wrong element is still common (out of bounds less so). Well, you get the overall idea.
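Here is a small illustration of the "wrong element" case. If an index computed as `size_t` is stored in a 16-bit variable, values of 65536 or more wrap around. Unsigned conversion is well defined modulo 2^16, so a later lookup silently reads a different element. The table and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 100000u

/* Hypothetical lookup table large enough that indices exceed 16 bits. */
static int table[TABLE_LEN];

/* Buggy: the index is narrowed to 16 bits before use (the same thing
 * happens implicitly with `uint16_t i = index;`), so index 70000
 * becomes 70000 - 65536 = 4464. */
int lookup_narrowed(size_t index)
{
    uint16_t i = (uint16_t)index;
    return table[i];
}

/* Correct: keep the index as size_t and bounds-check it. */
int lookup(size_t index, int fallback)
{
    return index < TABLE_LEN ? table[index] : fallback;
}
```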

Why does the compiler not warn if you use a 64 bit unsigned integer when a 32 bit signed integer is required?

You actually can pass -Wconversion to a compiler, but it's one of those things that's going to generate a lot of noise and not be a vulnerability 99% of the time. It's not an easily solved problem, because if you annoy developers with noise they stop caring.

It’s an easily solved problem if you turn on these warnings from the start. It’s not though if your program contains a million of these potential vulnerabilities already.

Should we be afraid developers will stop caring? Did they even care in the first place?

> Our exploit requires approximately 5GB of memory and 1M inodes

...so basically 32-bit systems are totally unaffected (and I believe size_t and int are the same size there anyway), but I think bugs like this are easily prevented by simply imposing sane limits --- there is zero reason to even consider allowing a path more than a few K in length, and IMHO even that is overly generous.
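Here is a sketch of the "sane limits" approach applied to copying a caller-supplied path. It finds the terminator within a bound, refuses anything over the limit, and only then allocates. `MAX_SANE_PATH` and `dup_bounded_path` are hypothetical. The 4096-byte limit is an arbitrary example, not a value taken from any project.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SANE_PATH 4096u  /* hypothetical limit, including the NUL */

/* Hypothetical helper: returns a heap copy of path, or NULL if the path
 * is longer than MAX_SANE_PATH - 1 bytes or allocation fails. memchr
 * stops at the first NUL and never looks past MAX_SANE_PATH bytes. */
char *dup_bounded_path(const char *path)
{
    const char *end = memchr(path, '\0', MAX_SANE_PATH);
    if (end == NULL)
        return NULL;                    /* too long: reject up front */
    size_t len = (size_t)(end - path);  /* always < MAX_SANE_PATH */
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, path, len + 1);    /* includes the terminating NUL */
    return copy;
}
```

The caller owns the returned buffer and must `free()` it.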

While I know there's a lot of hate for Windows' traditional 260-char limit, I personally haven't run into it as a developer except by accident (e.g. runaway recursion) and it's very comforting to know that the limit is there so code will fail before it consumes the disk or memory entirely.

> I personally haven't run into it as a developer

Clearly not a nodejs developer then! npm's insanely nested dependency graph caused me to hit the 260 character limit relatively regularly. (though this was several years ago, so maybe they have mitigations for that now)

IIRC this is partly why node_modules moved to a flat structure

We've had trouble (and had to occasionally shorten module names to something dumb like mdlWthVclsRmvd) because some part of some toolchain would create a path like

or some garbage like that (you get the picture). Yes, half the path was taken by descending into some directory that it went out of again straight away. Test runs would fail because of the 260 char limit unless we cut down the module name length. (Thankfully this did not need to be done in the code itself, just in the test run invocation.)

> I think bugs like this are easily prevented by simply imposing sane limits

In general, no. Attackers are clever and can usually find their way around arbitrary limits when a bug exists. Sometimes such a restriction might stop them, but more often than not they’ll bypass it some other way.

B doesn't have these issues because it only has one integer size, take that C.

Well, for that matter the original C on PDP-11 wouldn't have either, it also only had one integer size. We should all just blame ISO C ;-)

    $ ls -l /boot/vmlinuz-linux
    -rw-r--r-- 1 root root 9464864 Jul 16 12:59 /boot/vmlinuz-linux

That's over 9MB, running in supervisor mode.

2021, and people are still surprised every time a kernel bug with security implications is found.

Maybe it is time to look at different OS designs. Large companies such as Google (Fuchsia) or Huawei (HarmonyOS) have begun to pick up on this.

Most of it is drivers though, and stuff that is pretty much required to be there unless one's pastime (or day job) is to tailor and harden kernels.

There are architectural ways to avoid this problem.

Making the kernel as small as possible, and having components and drivers run unprivileged is the primary way to achieve this.

Some real examples that do this: Minix3, Haiku, Genode, Fuchsia, Harmony.

I’m still amazed that we allow compilation of so obviously faulty programs.

It’s like you sent a 1kg package through the postal service, and then the recipient gets an envelope containing a piece of cardboard from the original packaging.. And everyone involved is somehow A-OK with all of this.

If your programming language silently converts between types (in any direction), just to accommodate the programmer, instead of them specifying what they actually want to compute, you simply have failed as a programming language designer.

> you simply have failed as a programming language designer

That's some hubris. The C language is 49 years old. Dennis Ritchie made reasonable design decisions for the time he found himself in. I think we should be understanding of that, and the network effects that lead to large parts of the world's critical software infrastructure being implemented in C. I don't think he failed at anything.

I used to be a C developer. I know how easy it is to shoot your foot off in C. I think that, arguably, as an industry we should think twice before building more big and/or critical systems in C. There are better tools now.

But we are where we are and it's important to understand how we got here. Castigating our predecessors as failures does them a disservice.

Though one should mention that most languages can evolve and overcome some of their problems: e.g. PHP used to be an objectively shitty language, and nowadays it is somewhat usable. C, by contrast, underwent basically no improvements in that ridiculous time frame. So this is not aimed at the original creator but at everyone responsible for the language since then.

The thing is that in all those 49 years we could already have had a backwards-incompatible reshape of the language. Leaving old cruft behind would have brought immense improvements! I develop in C and C++ and think this would apply to both.

However, given the painful experience of the Python 2 to 3 debacle, it's clear to me that the only way to do such an upgrade is with an all-in commitment. See Ruby: breaking compatibility has never been as debated and controversial there as it was in Python. You just upgrade and tell the world: here's the new version, and the old one will be supported for not a day longer than 4 years.

People would complain but at the end of the day the world keeps turning. We could be already at C 3.0 and be much happier without all the old compatibility baggage that the language drags with it.


"Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own."


> That's some hubris.


Or have you, completely on your own, decided that my criticism of programming language design in the 2020s is somehow applicable to languages “literally” designed in the 1970s?

Why not go one step further and deny Alan Turing and Alonzo Church, and their achievements…?

>The C language is 49 years old.

That’s my point, no shade on K&R though.

meanwhile, lisp.

age is orthogonal to good design.

As if I hate lisp…

I was just excusing K&R from the realization we have now in the 2020s, and I would obviously afford the same leniency to John McCarthy and lisp.

It's funny because C compilers have warnings for these, but you have to enable them explicitly. Typically my stuff looks like:

  -Wall -Wextra -Wpedantic -Wformat=2 -Wstrict-aliasing=3 -Wstrict-overflow=3 -Wstack-usage=12500 -Wfloat-equal -Wcast-align -Wpointer-arith -Wchar-subscripts -Warray-bounds=2
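
For what it's worth, none of the flags in that list enable -Wconversion, and it is not part of -Wall or -Wextra in C, so a narrowing like the one below builds without a diagnostic. A minimal sketch (`length_as_int` is a hypothetical name); compiling it with -Wconversion makes GCC and Clang warn about the return statement. At run time, GCC documents the out-of-range conversion as reduction modulo 2^N (N = 32 for a typical int), and Clang behaves the same way, so a length just past INT_MAX comes back negative.

```c
#include <stddef.h>

/* Builds silently with the flag list above; -Wconversion flags the
 * return statement as a conversion that may change the value. */
int length_as_int(size_t len)
{
    return len;   /* implicit size_t -> int narrowing: no cast, no check */
}
```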

I honestly think all the automatic type promotion and conversion rules of the C family should be officially classified as "cute", namely an example of a childish simplification of a serious issue. I'm a C++ programmer with 20+ years of experience, and I have never, NEVER, caught myself thinking "gee, I'm glad I don't need to cast this." You ALWAYS think it, you just don't TYPE it, and that is the utterly wrong metric to optimise for. My 2c anyway.

I've had my fair share of annoyance from situations like "I have between one and four hard-coded insertions into this vector, but the compiler yells at me if I try to store the resulting size in an int".

Also, tangentially related is the signed/unsigned business which tends to get in the way frequently. For example, OpenMP 2.0 (the only OpenMP version you get to use with MSVC) requires loop indices to be signed, but both std::vector::size and std::vector::operator[] deal with unsigned integers. Casts guaranteed!

> deep directory structure whose total path length exceeds 1GB [...]

Oh, I didn't know Linux supports gigabyte-long path names. On Windows it's limited to something like MAX_PATH_LENGTH, which was defined as 200+ chars when I worked on it.

Windows goes to 260 chars, but the underlying system supports something like 64K, which means you can have files on Windows which are largely untouchable by built-in tools (Windows Explorer, file dialogs etc.)

Windows goes way beyond 260, but for backward-compatibility purposes you historically needed UNC paths (and UNC-aware APIs?).

Since the Windows 10 Anniversary Update, there is a setting to disable the MAX_PATH limitation in various APIs, but applications still have to opt into long path awareness via a manifest key.

Windows lifted that limit a few years ago

> Windows lifted that limit a few years ago

IIRC, it's a mess because it was lifted inconsistently for different access methods (APIs, and as a consequence the UI/CLI methods that depend on them), and, at least for some, in ways that also are or were dependent on how paths are expressed.

So it is, or at least has historically been after it was first “lifted”, a minefield of inconsistent, surprising behaviors with plenty of gotchas if you didn’t treat it as if it were still a limit.

AFAIK the limit has been lifted since Windows NT (so since the 90s), but only if you use the obscure NT path prefix (\\?\).

Not that obscure, it's been mentioned on MSDN for every relevant operation since before 2000.
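
Mechanically, the prefix is a string rewrite on an absolute path. A minimal sketch, assuming the input is already absolute; `to_extended_path` is hypothetical, and it uses plain `char` only so it runs anywhere. Real Windows code would build a `wchar_t` string and pass it to a Unicode (W) API such as CreateFileW, since the documentation limits the ANSI (A) variants to MAX_PATH.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite an absolute path into the extended-length
 * form, e.g.
 *   C:\dir\file          ->  \\?\C:\dir\file
 *   \\server\share\file  ->  \\?\UNC\server\share\file
 * Already-prefixed paths are copied unchanged. Relative paths are rejected,
 * because the prefix switches off normalization (no "..", no "/" handling).
 * Returns false for unsupported input or when `out` is too small. */
static bool to_extended_path(const char *path, char *out, size_t outsz)
{
    int n;

    if (strncmp(path, "\\\\?\\", 4) == 0)               /* already \\?\ */
        n = snprintf(out, outsz, "%s", path);
    else if (strncmp(path, "\\\\", 2) == 0)             /* UNC share */
        n = snprintf(out, outsz, "\\\\?\\UNC\\%s", path + 2);
    else if (isalpha((unsigned char)path[0]) && path[1] == ':' && path[2] == '\\')
        n = snprintf(out, outsz, "\\\\?\\%s", path);    /* drive letter */
    else
        return false;

    return n >= 0 && (size_t)n < outsz;
}
```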

IIRC you still need to use a special flag to enable it, right?

There are two things necessary for an application to use long paths:

* the system must have LongPathsEnabled set (though it might be the default nowadays, not sure)

* the application itself must have `{http://schemas.microsoft.com/SMI/2016/WindowsSettings}longPa...` set in its manifest

I wonder if we're going to see some more of Torvalds's excellent management technique?

>[PATCH 3.12 108/142] fs/seq_file: fallback to vmalloc allocation

>Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

Oh maybe not this time :-)

Can't we have one integer type in C that grows as required, like in JS, and becomes a bigint if it gets too big?

Haha, yeah JS integers "can grow". Kind of. And then they bite you in the worst way possible. Especially when combined with cryptography (e.g. nonces).

Try this:

  >> const x = 288230376151711740;
  >> x == x + 1

Or this:

  >> 2**1024

JS doesn't even have integers. It only has floats. JS is by far the worst language I know of when it comes to integer support.

If you want a better example of arbitrary precision integers, try Python, for example.

I was working at Twitter when we learned this the hard way. We switched to a separate service for generating tweet IDs (instead of MySQL sequences), which for various distributed systems reasons meant IDs started taking up more bits. And when we stuck those bigger IDs into JSON responses in the API... well, we learned the dumbest lesson possible about the intersection between JSON, JavaScript, and double-precision floating-point. Later that day the "id_str" fields were born.

Yep, I can totally relate. I found multiple bugs in a MessagePack JS implementation that were related to this issue.

I also worked on a project once where unique IDs of objects could get quite large (because they were random u64 integers). Those IDs were serialized and sent to a browser. Sometimes two objects were viewed as "the same" in the browser application because their IDs were truncated by the floating point precision issue.

Neat trick.

    Math.pow(2, 53) == Math.pow(2, 53) + 1

This is a bit clearer, I think. The addition just barely overflows the 53 mantissa bits of the IEEE 754 double-precision floating-point number.

By the way, JS has BigInts these days, which are supported by all major browsers: https://caniuse.com/bigint

    >> const x = 288230376151711740n;
    >> x == x + 1n


  2**53 == 2**53+1

^^ even more clear.

That's why Number.isSafeInteger(a) exists.

Integers up to +/- 9007199254740991 (Number.MAX_SAFE_INTEGER) are fine.
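
The precision cliff behind the Twitter and MessagePack stories above, sketched in C: `MAX_SAFE_INTEGER`, `id_is_json_safe` and `through_double` are hypothetical names, and the sketch assumes IEEE 754 doubles with the default round-to-nearest mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* A double has a 53-bit significand, so integers up to 2^53 - 1 (the value
 * JavaScript calls Number.MAX_SAFE_INTEGER) round-trip exactly. */
#define MAX_SAFE_INTEGER ((UINT64_C(1) << 53) - 1)   /* 9007199254740991 */

/* Hypothetical check in the spirit of Number.isSafeInteger, for an unsigned
 * 64-bit ID that is about to be emitted as a JSON number. */
static bool id_is_json_safe(uint64_t id)
{
    return id <= MAX_SAFE_INTEGER;
}

/* What a JavaScript client effectively does with a numeric JSON ID: keep it
 * in a double. volatile forces a real 64-bit double on every target.
 * IDs close to UINT64_MAX can round up to 2^64, which does not convert
 * back; this sketch does not handle that case. */
static uint64_t through_double(uint64_t id)
{
    volatile double d = (double)id;
    return (uint64_t)d;
}
```

For example, `through_double(9007199254740993)` returns 9007199254740992, so two distinct IDs collapse into one; string fields like id_str sidestep exactly that collision.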

GP may have been referring to BigInt.

Although, C does have an equivalent to that: the GNU MP library (and probably others as well).

I was referring to bigints in general, not JS's BigInt. If optimized, checking for overflow could be just one instruction.
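
GCC and Clang already expose such a check as `__builtin_add_overflow`; on x86-64 it typically compiles to the ADD plus a JO or SETO on the overflow flag, so the check costs roughly one extra instruction. C23 standardizes the same idea as `ckd_add` in <stdckdint.h>. A minimal sketch, with `checked_add` as a hypothetical wrapper:

```c
#include <stdbool.h>

/* Checked signed addition using the GCC/Clang builtin (not ISO C before C23).
 * Returns true and stores a + b in *sum when the exact result fits in int;
 * returns false on overflow, leaving the decision to the caller. */
static bool checked_add(int a, int b, int *sum)
{
    /* The builtin returns true when the exact result did NOT fit. */
    return !__builtin_add_overflow(a, b, sum);
}
```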

We don't need arbitrary-sized integers; we need exceptions on overflow (or underflow, but I'll stick to overflow for the rest of this post) to be the default, or similar language features as appropriate.

I for one am tired of the chicken-and-egg problem of "CPUs don't support efficient overflow checking because nobody uses it, so it's slow" and "Overflow checking is slow because the CPU doesn't support it, so nobody uses it". Given all the other good security work done in both software and hardware, much of it on things far more complex than this, this seems like an absolutely batshit insane oversight considering the costs and benefits of fixing it.

Would the CPU raise a signal, as with a null pointer dereference? It seems a lot of code (e.g. the safeintadd mentioned below) assumes there is no exception. Wouldn't all that code get messed up?

Would it be possible to just silently replace with arbitrary sized integer and not break any code like safeintadd?

In C, you can't "just" replace with arbitrary-sized integers. They are fundamentally different memory shapes.

You may not be able to turn enforcement on for all code immediately. There are even rare bits of code that depend on current overflow behavior. (Because of how our human brains work, and because we can easily name these bits of code, making them cognitively available, people often grotesquely overestimate how much code operates this way. I'm sure it's only a matter of how many zeros belong in the 0.001%.) But we need this support to be available so that code can turn it on easily and cheaply.

But what really boggles my mind, again given all the security work we've done, is that the reaction to this remains a combination of silence and "we can't do that!", when it seems to me the reaction ought to be "well duh jerf we all know that already." I don't get this. I don't get this attitude at all. This is a huge source of errors, a good fraction of which are security bugs, and nobody seems to care. Incomprehensible. This is, arguably, the number one thing that could be getting changed right now to fix security issues, and I just get slack-jawed "whaaaa?" in response to the idea.

One hundred plus one hundred is not negative fifty six! Here we are trying to hold together megabytes upon megabytes of security-critical code in a world where 100 + 100 = -56. Is it any wonder that writing any sort of code to maintain security invariants is tough in an environment where 100 + 100 = -56?
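
The -56 is what an 8-bit signed type produces. A minimal sketch with hypothetical helper names: in C both operands are promoted to int, so the sum really is 200, and the damage happens when it is stored back into int8_t. That out-of-range conversion is implementation-defined; GCC and Clang reduce it modulo 2^8, giving -56. The checked version does the same addition but reports the overflow instead of hiding it.

```c
#include <stdbool.h>
#include <stdint.h>

/* 100 + 100 is computed in int (200); the cast back to int8_t wraps it
 * to -56 on GCC and Clang (implementation-defined behavior). */
static int8_t add_i8_wrapping(int8_t a, int8_t b)
{
    return (int8_t)(a + b);
}

/* Same addition, but an out-of-range result is reported, not stored. */
static bool add_i8_checked(int8_t a, int8_t b, int8_t *out)
{
    int wide = a + b;                 /* cannot overflow: both fit in int */
    if (wide < INT8_MIN || wide > INT8_MAX)
        return false;                 /* e.g. 200 does not fit in int8_t */
    *out = (int8_t)wide;
    return true;
}
```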

I feel that with careful planning, compiler-level errors (not warnings), and some non-backward-compatible glibc changes, we could achieve this in the software layer.

CPUs don't raise exceptions; that is a software concept. CPUs do have traps, like for division by zero, but those are not exceptions in the way you think.

Null pointers are also handled by the kernel, not the CPU. It's called a segmentation fault because you are trying to access a memory segment that the OS doesn't want you to.

It's worth noting that the term "exception" is in fact used for traps that aren't interrupts.

From your link:

A NULL pointer in most (but not all) C implementations is address 0. Normally this address is not in a valid (mapped) page.

Any access to a virtual page that's not mapped by the HW page tables results in a page-fault exception. e.g. on x86, #PF.

This invokes the OS's page-fault exception handler to resolve the situation.

Yes, page-fault exceptions happen within the CPU; they cannot happen in software. I think of it this way: C is close to assembly and does not check every memory address it references. It just compiles, and the CPU starts running it. If address 0 is somehow dereferenced, the CPU is already executing that instruction. But of course, once the exception happens, the kernel is responsible for killing the actual process. So the CPU and the kernel do it together.

We can do this in software because there's virtually no limit to the abstractions people make, but hardware doesn't work like that. At some point in the software stack we need to draw a line and say this is a 64-bit signed integer that hardware can understand.

But should it be everywhere including the fs/virtual fs layer? Could we limit it only to device drivers? Not a kernel expert here and would love to hear thoughts.

This doesn't have anything to do with filesystems or kernels.

The x86 assembly uses fixed width immediates. CPU registers are a fixed width. For any code to compile and run, it needs to make decisions about how large stack frames need to be, and how much heap memory to allocate.

This was the parent's point about abstractions. You can make a library that pretends to be a variable sized integer, but to implement such a library you need to make a decision about how much space to allocate and the width of variables in order to compile. There is no getting away from how the hardware works.

Yes, totally, but the FS is a kernel abstraction. It deals with questions like: How do we keep a directory? How do we keep a file? How do we keep symlinks? Etc.

These can be implemented in any higher-level language, or in C without fixed-width integers, if the right abstractions were available. In my opinion, only the device-control parts should use fixed-width integers.

The problem is syscall interfaces are facilitated through hardware circuits, and as such, you have to work with fixed-width integers. Not only that, but those integers usually have other requirements around alignment and endianness.

We can. But it would be very inefficient. There's no point to use C at all, if you can accept that level of performance.

No, because that would require implicit dynamic allocation, which would defeat the entire point of using C.

Only when limits are breached; otherwise it is one extra if statement on every access. This happens in Java and other languages.

It doesn’t matter if it only happens occasionally. It’s completely inappropriate for C, which has no implicit memory allocation. There’s not even a clear way to include implicit allocation in the semantics of C.
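
A sketch of the scheme described above (check on every operation, allocate only once a limit is breached) shows where the friction with C comes from. Everything here is hypothetical: `grow_int` and its functions are made up, and a heap-allocated `__int128` (a GCC/Clang extension on 64-bit targets) stands in for a real bignum.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* __extension__ keeps -Wpedantic quiet about the non-ISO __int128 type. */
__extension__ typedef __int128 wide_t;

/* Hypothetical "growable" integer: stored inline while it fits in int64_t,
 * moved to the heap once an operation breaches that limit. */
typedef struct {
    bool on_heap;
    int64_t small;   /* valid when !on_heap */
    wide_t *big;     /* valid when on_heap */
} grow_int;

static wide_t grow_value(const grow_int *x)
{
    return x->on_heap ? *x->big : (wide_t)x->small;
}

/* Stores a + b in *out. Returns false only when the slow-path allocation
 * fails, an error case that ordinary C integer addition never has. */
static bool grow_add(const grow_int *a, const grow_int *b, grow_int *out)
{
    int64_t s;

    /* Fast path: both operands inline and no overflow, so the cost is the
     * overflow check and one branch. */
    if (!a->on_heap && !b->on_heap &&
        !__builtin_add_overflow(a->small, b->small, &s)) {
        out->on_heap = false;
        out->small = s;
        out->big = NULL;
        return true;
    }

    /* Slow path: the limit was breached, so the result needs an allocation. */
    wide_t *p = malloc(sizeof *p);
    if (p == NULL)
        return false;
    /* Sums of two heap values can still overflow wide_t; a real bignum would
     * keep growing instead. Not handled in this sketch. */
    *p = grow_value(a) + grow_value(b);
    out->on_heap = true;
    out->small = 0;
    out->big = p;
    return true;
}

static void grow_free(grow_int *x)
{
    if (x->on_heap)
        free(x->big);
    x->on_heap = false;
    x->big = NULL;
}
```

Note the `bool` return on `grow_add`: once allocation hides inside addition, every `+` gains a failure mode and an ownership question (who calls `grow_free`), which is the objection raised in the reply above.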

>that can grow like in js

I thought js "integers" are just floating point numbers?

I believe the poster you were replying to was talking about JS's new `BigInt` type.


A good example of that feature is Python. Also, AFAIR, Scheme (and probably some other Lisps) support it.

Yes, arbitrary precision integers could theoretically become a C language/library feature. But let's say that it gets proposed and accepted and it's integrated into popular compiler toolchains. The kernel wouldn't leverage it here or likely anywhere else because of its cost.

A newly designed OS kernel could perhaps take on this kind of feature. This would be the kind of OS that could be formally verified and would be willing to pay that runtime cost for arbitrary precision.

C doesn’t have the abstraction power to make anything with such a datatype.

I left myself a lot of latitude when I referred to this as a potential "language/library feature".

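    #include <stdbool.h>

    /* Opaque type: the representation (the TODO) lives in the
       implementation file, so callers only ever hold pointers. */
    typedef struct arb_int arb_int;

    arb_int *new_arb_int(void);
    void delete_arb_int(arb_int *);

    /* Arithmetic presumably returns a newly allocated result. */
    arb_int *arb_int_add(const arb_int *, const arb_int *);
    arb_int *arb_int_sub(const arb_int *, const arb_int *);
    arb_int *arb_int_mul(const arb_int *, const arb_int *);
    arb_int *arb_int_div(const arb_int *, const arb_int *);

    bool arb_int_gt(const arb_int *, const arb_int *);
    bool arb_int_lt(const arb_int *, const arb_int *);
    bool arb_int_gte(const arb_int *, const arb_int *);
    bool arb_int_lte(const arb_int *, const arb_int *);
    bool arb_int_eq(const arb_int *, const arb_int *);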
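
The TODO can be filled in as an ordinary library, at the price of explicit allocation and a function call per operation. A stripped-down sketch, assuming non-negative values only and base-2^32 limbs; `arb_uint` and its functions are hypothetical and simpler than the interface above (no sign, no subtraction, multiplication or division):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical non-negative arbitrary-precision integer. */
typedef struct {
    size_t len;        /* limbs in use; 0 means the value 0 */
    uint32_t *limbs;   /* least-significant limb first, top limb nonzero */
} arb_uint;

static arb_uint *arb_uint_from_u64(uint64_t v)
{
    arb_uint *x = malloc(sizeof *x);
    if (x == NULL)
        return NULL;
    x->limbs = malloc(2 * sizeof *x->limbs);
    if (x->limbs == NULL) {
        free(x);
        return NULL;
    }
    x->limbs[0] = (uint32_t)v;
    x->limbs[1] = (uint32_t)(v >> 32);
    x->len = v == 0 ? 0 : (x->limbs[1] != 0 ? 2 : 1);
    return x;
}

static void arb_uint_free(arb_uint *x)
{
    if (x != NULL) {
        free(x->limbs);
        free(x);
    }
}

/* Returns a newly allocated a + b, or NULL if allocation fails. */
static arb_uint *arb_uint_add(const arb_uint *a, const arb_uint *b)
{
    size_t n = a->len > b->len ? a->len : b->len;
    arb_uint *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->limbs = malloc((n + 1) * sizeof *r->limbs);   /* +1 for a final carry */
    if (r->limbs == NULL) {
        free(r);
        return NULL;
    }
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = carry;              /* at most 1 + 2 * (2^32 - 1) */
        if (i < a->len) s += a->limbs[i];
        if (i < b->len) s += b->limbs[i];
        r->limbs[i] = (uint32_t)s;       /* keep the low 32 bits */
        carry = s >> 32;                 /* 0 or 1 */
    }
    r->len = n;
    if (carry)
        r->limbs[r->len++] = (uint32_t)carry;   /* the number grew a limb */
    return r;
}

static bool arb_uint_eq(const arb_uint *a, const arb_uint *b)
{
    return a->len == b->len &&
           (a->len == 0 ||
            memcmp(a->limbs, b->limbs, a->len * sizeof *a->limbs) == 0);
}
```

For instance, adding 1 to `arb_uint_from_u64(UINT64_MAX)` yields the three limbs {0, 0, 1}, i.e. 2^64, a value no standard C integer type is required to hold.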

Perhaps this kind of thing has been put in the kernel before.
