Falsehoods programmers believe about null pointers

mcdeltat · 2025-02-01T01:58:06 1738375086

"falsehoods 'falsehoods programmers believe about X' authors believe about X"...

All you need to know about null pointers in C or C++ is that dereferencing them gives undefined behaviour. That's it. The buck stops there. Anything else is you trying to be smart about it. These articles are annoying because they try to sound smart by going through generally useless technicalities the average programmer shouldn't even be considering in the first place.

pradn · 2025-02-01T15:54:15 1738425255

> these articles are annoying

You’re being quite negative about a well-researched article full of info most have never seen. It’s not a crime to write up details that don’t generally affect most people.

A more generous take would be that this article is of primarily historical interest.

motorest · 2025-02-02T07:27:18 1738481238

> You’re being quite negative about a well-researched article full of info most have never seen.

I don't think this is true. OP is right:

> These articles are annoying because they try to sound smart by going through generally useless technicalities the average programmer shouldn't even be considering in the first place.

Dereferencing a null pointer is undefined behavior. Any observation beyond this at best an empirical observarion from running a specific implementation which may or may not comply with the standard. Any article making any sort of claim about null pointer dereferencing beyond stating it's undefined behavior is clearly poorly researched and not thought all the way through.

pradn · 2025-02-05T19:11:44 1738782704

I think you do point to a real issue. The "falsehoods programmers believe about X" genre can be either a) actual things a common programmer is likely to believe b) things a common programmer might not be be knowledgeable enough to believe.

This article is closer to category b. But the category a ones are most useful, because they dispel myths one is likely to encounter in real, practical settings. Good examples of category a articles are those about names, times, and addresses.

The distinction is between false knowledge and unknown unknowns, to put it somewhat crudely.

mcdeltat · 2025-02-02T01:46:41 1738460801

You are right, it was overly negative, which was not nice. Read it as a half-rant then. These types of articles are my pet peeve for some reason.

SR2Z · 2025-02-01T03:22:38 1738380158

Haha all of the examples in the article are basically "here's some really old method for making address 0 a valid pointer."

This isn't like timezones or zip codes where there are lots of unavoidable footguns - pretty much everyone at every layer of the stack thinks that a zero pointer should never point to valid data and should result in, at the very least, a segfault.

MathMonkeyMan · 2025-02-01T02:13:08 1738375988

Useless, but interesting. I used to work with somebody who would ask: What happens with this code?

    #include <iostream>

    int main() {
        const char *p = 0;
        std::cout << p;
    }

You might answer "it's undefined behavior, so there is no point reasoning about what happens." Is it undefined behavior?

The idea behind this question was to probe at the candidate's knowledge of the sorts of things discussed in the article: virtual memory, signals, undefined behavior, machine dependence, compiler optimizations. And edge cases in iostream.

I didn't like this question, but I see the point.

FWIW, on my machine, clang produces a program that segfaults, while gcc produces a program that doesn't. With "-O2", gcc produces a program that doesn't attempt any output.

gerdesj · 2025-02-01T02:49:15 1738378155

I think that reasoning about things is a good idea and looking at failure modes is an engineers job. However, I gather that the standard says "undefined", so a correct answer to what "happens with this code" might be: "wankery" (on the part of the questioner). You even demonstrate that undefined status with concrete examples.

In another discipline you might ask what happens what happens when you stress a material near to or beyond its plastic limit? It's quite hard to find that limit precisely, without imposing lots of constraints. For example take a small metal thing eg a paper clip and bend it repeatedly. Eventually it will snap due to quite a few effects - work hardening, plastic limit and all that stuff. Your body heat will affect it, along with ambient temperature. That's before we worry about the material itself which a paper clip will be pretty straightforwards ... ish!

OK, let's take a deeper look at that crystalline metallic structure ... or let's see what happens with concrete or concrete with steel in it, ooh let's stress that stuff and bend it in strange ways.

Anyway, my point is: if you have something as simple as a standard that says: "this will go weird if you do it" then accept that fact and move on - don't try to be clever.

immibis · 2025-02-01T10:04:21 1738404261

"undefined" means "defined elsewhere".

david-gpu · 2025-02-01T10:47:42 1738406862

LOL. No.

Some languages/libraries even make an explicit distinction between Undefined and Implementation-Defined, where only the latter is documented on a vendor-by-vendor basis. Undefined Behavior will typically vary across vendors and even within versions or whatnot within the same vendor.

The very engineers who implemented the code may be unaware of what may happen when different types of UB are triggered, because it is likely not even tested for.

immibis · 2025-02-01T20:11:51 1738440711

So it's defined in the compiler's source code. God doesn't roll a die every time you dereference null. Demons flying out of your nose would conform to the C++ standard, but I assure you that it would violate other things, such as the warranty on your computer that says it does not contain nasal demons, and your CPU's ISA, which does not contain a "vmovdq nose, demons" instruction.

david-gpu · 2025-02-01T21:33:43 1738445623

No. The compiler isn't the only component of the system that will determine what happens when you trigger UB, either. There is UB all the way down to hardware specifications.

I used to be one of the folks who defined the behavior of both languages and hardware at various companies. UB does not mean "documented elsewhere". Please stop spreading misinformation.

motorest · 2025-02-02T07:58:41 1738483121

> No. The compiler isn't the only component of the system that will determine what happens when you trigger UB, either. There is UB all the way down to hardware specifications.

I don't think you know what undefined behavior is. That's a concept relevant to language specifications alone. It does not trickle up or down what language specifications cover. It just means that the authors of the specification intentionally left the behavior expected from a specific scenario as undefined.

For those who write software targeting language specifications this means they are introducing a bug because they are presuming their software will show a behavior which is not required by the standard. For those targeting specific combinations of compiler and hardware, they need to do their homework to determine if the behavior is guaranteed.

immibis · 2025-02-02T10:07:30 1738490850

Hardware also has UB, but what happens is still dictated by the circuitry. The relevant circuitry is both complicated enough and not useful enough for the CPU designer to specify it.

Often they use the word "unpredictable" instead. The behavior is perfectly predictable by an omniscient silicon demon, but you may not be able to predict it.

The effect that speculative execution has on cache state turned out to be unpredictable, so we have all the different Spectre vulnerabilities.

Hardware unpredictability doesn't overlap much with language UB, anyway. It's unlikely that something not defined by the language is also not defined by th hardware. It's much more likely that the compiler's source code fully defines the behaviour.

gerdesj · 2025-02-01T23:59:35 1738454375

"I used to be one of the folks who defined the behavior of both languages and hardware at various companies"

But not at all companies, orgs or even in Heaven and certainly (?) not at ISO/OSI/LOL. It appears that someone wants to redefine the word "undefined" - are they sure that is wise?

motorest · 2025-02-02T07:47:43 1738482463

> LOL. No.

It actually does. You should spend a minute to learn the definition before commenting on the topic.

Take for example C++. If you bother to browse through the standard you'll eventually stumble upon 3.64 where it states in no uncertain terms that the definition of undefined behavior is "behavior for which this document imposes no requirements". The specification even goes to the extent of subclassifying undefined behavior in permissible and how it covers the program running in a documented manner.

To drive the point home, the concept of undefined behavior is something specific to language specifications,not language implementetions. It was introduced to allow specific existing implementations to remain compliant even though they relied on very specific features, like particular hardware implementetions, that once used may or may not comply with what the standard specified as required behavior and went beyond implementetion-defined behavior.

I see clueless people parroting "undefined behavior" as some kind of gotcha, specially when they try to upsell some other programming language. If your arguments come from a place of lazy ignorance, you can't expect to be taken seriously.

mcdeltat · 2025-02-01T02:34:43 1738377283

I'm assuming it's meant to be:

  std::cout << *p;

?

I still think discussing it is largely pointless. It's UB and the compiler can do about anything, as your example shows. Unless you want to discuss compiler internals, there's no point. Maybe the compiler assumes the code can't execute and removes it all - ok that's valid. Maybe it segfaults because some optimisation doesn't get triggered - ok that's valid. It could change between compiler flags and compiler versions. From the POV of the programmer it's effectively arbitrary what the result is.

Where it gets harmful IMO is when programmers think they understand UB because they've seen a few articles, and start getting smart about it. "I checked the code gen and the compiler does X which means I can do Y, Z". No. Please stop. You will pay the price in bugs later.

MathMonkeyMan · 2025-02-01T02:48:18 1738378098

> I'm assuming it's meant to be: [...]

Nope, I mean inserting the character pointer ("string") into the stream, not the character to which it maybe points.

Your second paragraph demonstrates, I think, why my former colleague asked the question. And I agree with your third paragraph.

mcdeltat · 2025-02-01T03:22:19 1738380139

Ah, I got confused for a minute why printing a character pointer is UB. I was thinking of printing the address, which is valid. But of course char* has a different overload because it's a string. You can tell how much I use std::string and std::string_view lol.

I reckon we are generally in agreement. Perhaps I am not the best person to comment on the purpose of discussing UB, since I already know all the ins and outs of it... "Been there done that" kind of thing.

johnnyanmac · 2025-02-01T07:34:04 1738395244

>No. Please stop. You will pay the price in bugs later.

indeed. It is called UB because that's basically code for compilers devs to say "welp don't have to worry about changing this" while updating the compiler. What can work in, say, GCC 12 may not work in GCC 14. Or even GCC 12.0.2 if you're unlucky enough. Or you suddenly need to port the code to another platform for clang/MSVC and are probably screwed.

johnnyanmac · 2025-02-01T07:29:44 1738394984

>I didn't like this question, but I see the point.

These would be fine interviewing questions if it's meant to start a conversation. Even if I do think it's a bit obtuse from a SWE's perspective ("it's undefined behavior, don't do this") vs. a Computer scientists' perspective you took.

It's just a shame that these days companies seem to want precise answers to such trivia. As if there's an objective answer. Which there is, but not without a deep understanding of your given compiler (and how many companies need that, on the spot, under pressure in a timed interview setting?)

motorest · 2025-02-02T08:01:33 1738483293

> These would be fine interviewing questions if it's meant to start a conversation.

I don't agree. They sound like puerile parlour tricks and useless trivia questions, more in line in the interviewer acting defensively and trying too hard to pass themselves as smart or competent instead of actually assessing a candidate's skillset. Ask yourself how frequent those topics pop up in a PR, and how many would be addressed with a 5min review or Slack message.

SAI_Peregrinus · 2025-02-01T03:43:44 1738381424

Not quite.

Trivially, `&E` is equivalent to `E`, even if `E` is a null pointer (C23 standard, footnote 114 from section 6.5.3.2 paragraph 4, page 80). So since `&` is a no-op that's not UB.

Also `*(a+b)` where `a` is NULL but `b` is a nonzero integer never dereferences the NULL pointer, but is still undefined behavior since conversions from null pointers to pointers of other types still do not compare equal to pointers to any actual objects or functions (6.3.2.3 paragraph 3) and addition or subtraction of pointers into array objects with integers that produce results that don't point into the same array object are UB (6.5.6).

sgerenser · 2025-02-01T11:09:44 1738408184

I prefer: “Falsehoods programmers believe about X” articles with falsehoods considered harmful.

immibis · 2025-02-01T10:03:29 1738404209

Are you writing C or C++ code or are you writing, for example C or C++ code for Windows? Because on Windows it's guaranteed to throw an access violation structured exception, for example.

Maxatar · 2025-02-01T19:23:21 1738437801

No it's not, even with MSVC on Windows dereferencing a null pointer is not guaranteed to do anything:

Here's a classic article about the very weird and unintuitive consequences of null pointer dereferencing, such as "time travel":

https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...

metalcrow · 2025-02-01T03:11:38 1738379498

> asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.

Is this actually a real optimization? I understand the principal, that you can bypass explicit checks by using exception handlers and then massage the stack/registers back to a running state, but does this actually optimize speed? A null pointer check is literally a single TEST on a register, followed by a conditional jump the branch predictor is 99.9% of the time going to know what to do with. How much processing time is using an exception actually going to save? Or is there a better example?

nickff · 2025-02-01T03:47:27 1738381647

The OP is offering terrible advice based on a falsehood they believe about null pointers. In many applications (including the STM32H743 microcontroller that I am currently working on), address zero (which is how "NULL" is defined by default in my IDE) points to RAM or FLASH. In my current application, NULL is ITCM (instruction tightly coupled memory), and it's where I've put my interrupt vector table. If I read it, I don't get an error, but I may get dangerously wrong data.

pfyra · 2025-02-01T07:54:38 1738396478

I work with the same mcu. You can set up the MPU to catch null pointer dererences so they don't pass silently.

russdill · 2025-02-01T10:06:44 1738404404

Not only that, if you're referencing an element in a very large structure or array, the base address may be zero, but the actual access may be several pages past that.

LorenPechtel · 2025-02-01T14:46:36 1738421196

I disagree. You're looking at embedded code which very well might not be running with memory segmentation. If you have no hardware safety you must check your pointers, period. But few of us are in that situation. Personally, I haven't touched an environment without hardware safety in 20 years.

gwbas1c · 2025-02-01T03:51:30 1738381890

> Is this actually a real optimization?

No... And yes.

No: Because throwing and catching the null pointer exception is hideously slow compared to doing a null check. In Java / C#, the exception is an allocated object, and the stack is walked to generate a stack trace. This is in addition to any additional lower-level overhead (panic) that I don't understand the details well enough to explain.

Yes: If, in practice, the pointer is never null, (and thus a null pointer is truly an exceptional situation,) carefully-placed exception handlers are an optimization. Although, yes, the code will technically be faster because it's not doing null checks, the most important optimization is developer time and code cleanliness. The developer doesn't waste time adding redundant null checks, and the next developer finds code that is easier to read because it isn't littered with redundant null checks.

mr_00ff00 · 2025-02-01T04:57:34 1738385854

“Most important optimization is developer time and code cleaniness”

True for 99% of programming jobs, but if you are worried about the speed of null checks, you are in that 1%.

In high frequency trading, if you aren’t first your last and this is the exact type of code optimizations you need for the “happy path”

LorenPechtel · 2025-02-01T14:53:08 1738421588

Yup, three is definitely value to the code not doing things it doesn't need to. I also find it clearer if the should-never-happen null check is in the form of an assertion. You know anything being tested in an Assert is a should never happen path, you don't need to consider why it's being done.

gwbas1c · 2025-02-03T14:14:38 1738592078

> You know anything being tested in an Assert is a should never happen path, you don't need to consider why it's being done

That's also rather... Redundant in modern null-safe (or similar) languages.

IE, Swift and C# have compiler null checking where the method can indicate if an argument accepts null. There's no point in assertions; thus the null reference exceptions do their job well here.

Rust's option type (which does almost the same thing but trolls love to argue semantics) also is a situation where the compiler makes it hard to pass the "null equivalent." (I'm not sure if a creative programmer can trick the runtime into passing a null pointer where it's unexpected, but I don't understand unsafe Rust well enough to judge that.) But if you need to defend against that, I think Panics can do it.

wging · 2025-02-01T05:27:31 1738387651

If you're actually paying a significant cost generating stack traces for NPEs, there's a JVM option to deal with that (-XX:-OmitStackTraceInFastThrow). It still generates a stack trace the first time; if you're able to go search for that first one it shouldn't be a problem for debugging.

LorenPechtel · 2025-02-01T14:41:21 1738420881

Conditional jumps tend to carry performance penalties because they tend to trash the look-ahead. And are pretty much forbidden in some cryptographic code where you want constant time execution no matter what.

Also, so long as the hardware ensures that you don't get away with using the null I would think that not doing the test is the proper approach. Hitting a null is a bug, period (although it could be a third party at fault.) And I would say that adding a millisecond to the bugged path (but, library makers, please always make safe routines available! More than once I've seen stuff that provides no test for whether something exists. I really didn't like it the day I found the routine that returned the index of the supplied value or -1 if it didn't exist--but it wasn't exposed, only the return the index or throw if it didn't exist was accessible.) vs saving a nanosecond on the correct path is the right decision.

UncleMeat · 2025-02-01T19:42:14 1738438934

In languages like java, jits are really good at optimizing the happy path and outlining these rarely taken cases. In a language like c++ the branch predictor has absolutely no trouble blasting right through branches that are almost always taken in one direction with minimal performance waste.

bjackman · 2025-02-01T09:25:44 1738401944

Regardless of whether it's faster, it's an extremely bad idea in a C program, for many of the reasons the author outlines.

It's UB so you are forever gonna be worrying about what weird behaviour the compiler might create. Classic example: compiler infers some case where the pointer is always NULL and then just deletes all the code that would run in that case.

Plus, now you have to do extremely sketchy signal handling.

toast0 · 2025-02-01T06:00:06 1738389606

Sure, the cost of the check is small, and if you actually hit a null pointer, the cost is much higher if it's flagged by the MMU instead of a simple check.

But you're saving probably two bytes in the instruction stream for the test and conditional jump (more if you don't have variable length instructions), and maybe that adds up over your whole program so you can keep meaningfully more code in cache.

LorenPechtel · 2025-02-01T14:58:58 1738421938

More important is the branch predictor. Sometimes you take the hit of a failed predict and you also stuck another entry in the predictor table, making it more likely some other code will suffer a failed branch predict.

oguz-ismail · 2025-02-01T03:29:35 1738380575

Signal handling is done anyway. The cost of null pointer checks is a net overhead, if minuscule.

Aloisius · 2025-02-01T04:13:28 1738383208

OpenJVM does it, iirc. If the handler is triggered too often at a location, it will swap back to emitting null checks though since it is rather expensive.

Of course, there's a big difference between doing it in a VM and doing it in a random piece of software.

Gibbon1 · 2025-02-01T12:31:10 1738413070

My takes are

Unlike crummy barely more than a macro assembler compilers of yore. A modern compiler will optimize away a lot of null pointer checks.

Superscalar processors can chew gum and walk at the same time while juggling and hula hooping. That is if they also don't decide the null check isn't meaningful and toss it away. And yeah checking for null is a trivial operation they'll do while busy with other things.

And the performance of random glue code running on a single core isn't where any of the wins are when it comes to speed and hasn't been for 15 years.

Validark · 2025-02-01T10:32:59 1738405979

I would be very interested to see optimizations based on this. Could the compiler emit faster code with this idea? What impact does this have on auto-vectorization like using `vgather`?

liontwist · 2025-02-01T03:31:29 1738380689

> A null pointer check is literally a single TEST on a register

On every pointer deref in your entire program. Not for release mode.

heraclius1729 · 2025-02-01T03:47:25 1738381645

> In both cases, asking for forgiveness (dereferencing a null pointer and then recovering) instead of permission (checking if the pointer is null before dereferencing it) is an optimization. Comparing all pointers with null would slow down execution when the pointer isn’t null, i.e. in the majority of cases. In contrast, signal handling is zero-cost until the signal is generated, which happens exceedingly rarely in well-written programs.

At least from a C/C++ perspective, I can't help but feel like this isn't great advice. There isn't a "null dereference" signal that gets sent--it's just a standard SIGSEGV that cannot be distinguished easily from other memory access violations (memprotect, buffer overflows, etc). In principle I suppose you could write a fairly sophisticated signal handler that accounts for this--but at the end of the day it must replace the pointer with a not null one, as the memory read will be immediately retried when the handler returns. You'll get stuck in an infinite loop (READ, throw SIGSEGV, handler doesn't resolve the issue, READ, throw SIGSEGV, &c.) unless you do something to the value of that pointer.

All this to avoid the cost of an if-statement that almost always has the same result (not null), which is perfect conditions for the CPU branch predictor.

I'm not saying that it is definitely better to just do the check. But without any data to suggest that it is actually more performant, I don't really buy this.

EDIT: Actually, this is made a bit worse by the fact that dereferencing nullptr is undefined behavior. Most implementations set the nullptr to 0 and mark that page as unreadable, but that isn't a sure thing. The author says as much later in this article, which makes the above point even weirder.

saagarjha · 2025-02-01T10:16:32 1738404992

You can longjmp out of a signal handler.

fanf2 · 2025-02-01T16:15:59 1738426559

But that’s likely to be very unsafe, especially in a multithreaded program, or if it relies on stack unwinding for RAII.

heraclius1729 · 2025-02-01T16:23:41 1738427021

Oh! I didn't know that actually. That's useful information.

alain94040 · 2025-02-01T01:14:03 1738372443

I would add one more: the address you are dereferencing could be non-zero, it could be an offset from 0 because the code is accessing a field in a structure or method in a class. That offset can be quite large, so if you see an error accessing address 0x420, it's probably because you do have a null pointer and are trying to access a field. As a bonus, the offending offset may give you a hint as to which field and therefore where in your code the bad dereferencing is happening.

nyanpasu64 · 2025-02-01T02:03:08 1738375388

One interesting failure mode is if (like the Linux kernel) a function returns a union of a pointer or a negative errno value, dereferencing a negative errno gives an offset (below or above zero) different from the field being accessed.

catlifeonmars · 2025-02-01T01:35:32 1738373732

Now this is a really interesting one. I’m assuming this trivially applies to index array access as well?

tialaramex · 2025-02-01T01:54:43 1738374883

In C, and this article seems to be almost exclusively about C, a[b] is basically sugar for (*((a) + (b)))

C does actually have arrays (don't let people tell you it doesn't) but they decay to pointers at ABI fringes and the index operation is, as we just saw, merely a pointer addition, it's not anything more sophisticated - so the arrays count for very little in practice.

juped · 2025-02-01T02:56:27 1738378587

Not just basically sugar, the classic parlor trick is doing 3[arr] or whatever

catlifeonmars · 2025-02-02T05:21:34 1738473694

Party trick? That’s just good coding style. I even have that enabled as a lint rule /s

kevincox · 2025-02-01T01:53:33 1738374813

I think this technically wouldn't be a null pointer anymore. As array indexing `p[n]` is defined as `*(p + n)` so first you create a new pointer by doing math on a null pointer (which is UB in C) then dereferencing this new pointer (which doesn't even really exist because you have already committed UB).

caspper69 · 2025-02-01T02:21:16 1738376476

The article wasn't terrible. I give it a C+ (no pun intended).

Too general, too much trivia without explaining the underlying concepts. Questionable recommendations (without covering potential pitfalls).

I have to say that the discourse here is refreshing. I got a headache reading the 190+ comments on the /r/prog post of this article. They are a lively bunch though.

Hizonner · 2025-02-01T01:34:14 1738373654

> Instead of translating what you’d like the hardware to perform to C literally, treat C as a higher-level language, because it is one.

Alternately, stop writing code in C.

uecker · 2025-02-01T07:55:28 1738396528

Or catch them using a sanitizer: https://godbolt.org/z/z9WKs5aYv

liontwist · 2025-02-01T03:31:58 1738380718

No. I don’t think I will.

mcdeltat · 2025-02-01T02:04:46 1738375486

IMO one of the most disappointing things about C: it smells like it should be a straightforward translation to assembly, but actually completely is not because of the "virtual machine" magic the Standard uses which opens the door to almost anything.

Oh you would like a byte? Is that going to be a 7 bit, 8 bit, 12 bit, or 64 bit byte? It's not specified, yay! Have fun trying to write robust code.

tialaramex · 2025-02-01T02:18:26 1738376306

Abstract. It's an Abstract machine, not a Virtual machine.

zajio1am · 2025-02-01T03:40:07 1738381207

Size of byte is implementation-defined, not unspecified. Why is that a problem for writing robust code? It is okay to assume implementation-defined behavior as long as you are targeting a subset of systems where these assumptions hold, and if you check them at build-time.

bobmcnamara · 2025-02-01T02:09:36 1738375776

Ahem, it's specified to not be 7.

keldaris · 2025-02-01T05:37:34 1738388254

Luckily, little of it matters if you simply write C for your actual target platforms, whatever they may be. C thankfully discourages the very notion of "general purpose" code, so unless you're writing a compiler, I've never really understood why some C programmers actually care about the standard as such.

In reality, if you're writing C in 2025, you have a finite set of specific target platforms and a finite set of compilers you care about. Those are what matter. Whether my code is robust with respect to some 80s hardware that did weird things with integers, I have no idea and really couldn't care less.

msla · 2025-02-01T06:50:38 1738392638

> I've never really understood why some C programmers actually care about the standard as such.

Because I want the next version of the compiler to agree with me about what my code means.

The standard is an agreement: If you write code which conforms to it, the compiler will agree with you about what it means and not, say, optimize your important conditionals away because some "Can't Happen" optimization was triggered and the "dead" code got removed. This gets rather important as compilers get better about optimization.

uecker · 2025-02-01T09:48:04 1738403284

True, we are currently eliminating a lot of UB from the future C standard to avoid compilers breaking more code.

Still, while I acknowledge that this is a real issue, in practice I find my C code from 30 years ago still working.

It is also a bit the fault of users. Why favor so many user the most aggressive optimizing compilers? Every user filing bugs or complaining about aggressive optimizing breaking code in the bug tracker, very user asking for better warnings, would help us a lot pushing back on this. But if users prefer compiler A over compiler B when you a 1% improvement in some irrelevant benchmark, it is difficult to argue that this is not exactly what they want.

keldaris · 2025-02-01T12:05:23 1738411523

In practice, you're going to test the next version of the compiler anyway if you want to be sure your code actually works. Agreements or not, compilers have bugs on a regular basis. From the point of view of a programmer, it doesn't matter if your code broke because you missed some fine point in the standard or because the compiler got it wrong, either way you're going to want to fix it or work around it.

In my experience, if you don't try to be excessively clever and just write straightforward C code, these issues almost never arise. Instead of wasting my time on the standard, I'd rather spend it validating the compilers I support and making sure my code works in the real world, not the one inhabited by the abstract machine of ISO C.

lelanthran · 2025-02-01T16:43:02 1738428182

> In practice, you're going to test the next version of the compiler anyway

> In my experience, if you don't try to be excessively clever and just write straightforward C code, these issues almost never arise.

I think these two sentiments are what gets missed by many programmers who didn't actually spend the last 25+ years writing software in plain C.

I lose count of the number of times I see in comments (both here and elsewhere) how it should be almost criminal to write anything life-critical in C because it is guaranteed to fail.

The reality is that, for decades now, life-critical software has been written in C - millions and millions of lines of code controlling millions and millions of devices that are sitting in millions and millions of machines that kill people in many failure modes.

The software defect rate resulting in deaths is so low that when it happens it makes the news (See Toyota's unintended acceleration lawsuit).

That's because, regardless of what the programmers think their code does, or what a compiler upgrade does to it, such code undergoes rigorous testing and, IME, is often written to be as straightforward as possible in the large majority of cases (mostly because the direct access to the hardware makes reasoning about the software a little easier).

uecker · 2025-02-01T09:41:35 1738402895

No, it has to be at least 8 and this is sufficient to write portable code.

HeliumHydride · 2025-02-01T02:08:09 1738375689

C++ has made efforts to fix some of this. Recently, they enforced that signed integers must be two's complement. There is a proposal currently to fix the size of bytes to 8 bits.

mcdeltat · 2025-02-01T02:24:28 1738376668

Yes, which is excellent (although 50 years too late, I'll try not to be too cynical...).

The problem is that C++ is a huge language which is complex and surely not easy to implement. If I want a small, easy language for my next microprocessor project, it probably won't be C++20. It seems like C is a good fit, but really it's not because it's a high level language with a myriad of weird semantics. AFAIK we don't have a simple "portable assembler + a few niceties" language. We either use assembly (too low level), or C (slightly too high level and full of junk).

1over137 · 2025-02-01T03:28:04 1738380484

I'm excited about -fbounds-safety coming soon: https://github.com/llvm/llvm-project/commit/64360899c76c

kerkeslager · 2025-02-01T04:42:46 1738384966

That's just not an option in a lot of cases, and it's not a good option in other cases.

Like it or not, C can run on more systems than anything else, and it's by far the easiest language for doing a lot of low-level things. The ease of, for example, accessing pointers, does make it easier to shoot yourself in the foot, but when you need to do that all the time it's pretty hard to justify the tradeoffs of another language.

Before you say "Rust": I've used it extensively, it's a great language, and probably an ideal replacement for C in a lot of cases (such as writing a browser). But it is absolutely unacceptable for the garbage collector work I'm using C for, because I'm doing complex stuff with memory which cannot reasonably be done under the tyranny of the borrow checker. I did spend about six weeks of my life trying to translate my work into Rust and I can see a path to doing it, but you spend so much time bypassing the borrow checker that you're clearly not getting much value from it, and you're getting a massive amount of faffing that makes it very difficult to see what the code is actually doing.

I know HN loves to correct people on things they know nothing about, so if you are about to Google "garbage collector in Rust" to show me that it can be done, just stop. I know it can be done, because I did it; I'm saying it's not worth it.

oguz-ismail · 2025-02-01T01:39:05 1738373945

impossible

no serious alternative

IshKebab · 2025-02-01T08:59:08 1738400348

Rust and Zig are the serious alternatives for cases where you need a "zero cost" language. If you don't (plenty of C code doesn't) there are endless serious alternatives.

I think you could argue that Zig is still very new so you might not want to use it for that reason, but otherwise there is no reason to use C for new projects in 2025.

oguz-ismail · 2025-02-01T09:07:27 1738400847

> Rust

one compiler, two platforms, no spec

> Zig

dollar store rust

IshKebab · 2025-02-01T13:51:57 1738417917

Ooo now do C!

tialaramex · 2025-02-01T01:51:09 1738374669

For very small platforms, where it's a struggle to have a C compiler because a "long int" of 32 bits is already a huge challenge to implement, let alone "long long int" - stop using high level languages. Figure out the few dozen machine code instructions you want for your program, write them down, review, translate to binary, done.

For the bigger systems where that's not appropriate, you'll value a more expressive language. I recommend Rust particularly, even though Rust isn't available everywhere there's an excellent chance it covers every platform you actually care about.

saagarjha · 2025-02-01T10:19:57 1738405197

People wrote and continue to write C for 16-bit platforms just fine

tialaramex · 2025-02-01T11:35:43 1738409743

16-bit is huge. Rust will fit fine on a 16-bit platform. We're not talking about these big modern systems.

IshKebab · 2025-02-01T13:54:55 1738418095

Rust doesn't support 16-bit architectures. It theoretically could but 16-bit architectures are completely obsolete and almost unused, so there's no real need.

xandrius · 2025-02-01T08:28:30 1738398510

No true scotsman either.

PoignardAzur · 2025-02-01T12:20:19 1738412419

That's not how you're supposed to write a "falsehoods programmers believe about X" article.

The articles that started this genre are about oversimplifications that make your program worse because real people will not fit into your neat little boxes and their user experience with degrade if you assume they do. It's about developers assuming "Oh, everyone has a X" and then someone who doesn't have a X tries to use their program and get stuck for no reason.

Writing a bunch of trivia about how null pointers work in theory which will almost never matter in practice (just assume that dereferencing them is always UB and you'll be fine) isn't in the spirit of the "falsehoods" genre, especially if every bit of trivia needs a full paragraph to explain it.

Blikkentrekker · 2025-02-01T03:19:51 1738379991

> In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.

> For all intents and purposes, UB as we understand it today with spooky action at a distance didn’t exist.

The first official C standard was from 1989, the second real change was in 1995, and the infamous “nasal daemons” quote was from 1992. So evidently the first C standard was already interpreted that way, that compilers were really allowed to do anything in the face of undefined behavior. As far as I know

bitwize · 2025-02-01T01:18:15 1738372695

Nowadays, UB is used pretty much as a license to make optimizer go brrrr. But back in the day, I think it was used to allow implementations wiggle room on whether a particular construct was erroneous or not -- in contrast to other specifications like "it is an error" (always erroneous) or "implementation-defined behavior" (always legitimate; compiler must emit something sensible, exactly what is not specified). In the null pointer case, it makes sense for kernel-mode code to potentially indirect to address 0 (or 0xffffffff, or whatever your architecture designates as null), while user-space code can be reasonably considered never to legitimately access that address because the virtual memory never maps it as a valid address. So accessing null is an error in one case and perfectly cromulent in the other. So the standard shrugs its shoulders and says "it's undefined".

lmm · 2025-02-01T01:54:13 1738374853

The original motivation was to permit implementations to do the reasonable, friendly thing, and trap whenever the program dereferences a null pointer. Since C compilers want to reorder or elide memory accesses, you can't really define explicit semantics for that (e.g. you want it to be ok to move the memory access before or after a sequence point) - the JVM has to do a lot of work to ensure that it throws NullPointerException at the correct point when it happens, and this slows down all programs even though no-one sane has their program intentionally trigger one. But the intention was to permit Java-like behaviour where your code would crash with a specific error immediately-ish, maybe not on the exact line where you dereferenced null but close to it. Ironically compiler writers then took that standard and used it to do the exact opposite, making null dereference far more dangerous than even just unconditionally reading memory address 0.

uecker · 2025-02-01T07:57:08 1738396628

Compiler writers did both: https://godbolt.org/z/z9WKs5aYv

megous · 2025-02-01T03:38:04 1738381084

Dereferencing a null pointer is how I boot half of my systems. :D On Rockchip platforms address 0 is start of DRAM, and a location where [U-Boot] SPL is loaded after DRAM is initialized. :)

SAI_Peregrinus · 2025-02-01T03:51:44 1738381904

That's not a null pointer. Address `0` can be valid. A null pointer critically does not compare equal to any non-null pointer, including a pointer to address 0 on platforms where that's allowed.

> An integer constant expression with the value `0` , such an expression cast to type `void *` , or the predefined constant `nullptr` is called a null pointer constant ^69) . If a null pointer constant or a value of the type `nullptr_t` (which is necessarily the value `nullptr` ) is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

C 23 standard 6.3.2.3.3

Also this is point 6 in the article.

megous · 2025-02-02T01:45:36 1738460736

It pretty clearly is a null pointer on my machine. What's null pointer on some HP mainframe from 70's is kinda irrelevant to how I boot my systems.

userbinator · 2025-02-01T03:22:52 1738380172

"Falsehoods programmers believe" is the "considered harmful" of the modern dogma cult.

butter999 · 2025-02-01T05:18:57 1738387137

It's a genre. It's neither dogmatic, modern, nor unique to programming.

EtCepeyd · 2025-02-01T01:48:53 1738374533

> Dereferencing a null pointer always triggers “UB”.

Calling this a "falsehood" is utter bullshit.

saagarjha · 2025-02-01T10:22:10 1738405330

Not everyone is using standards-compliant C.

rstuart4133 · 2025-02-01T01:59:38 1738375178

> In ye olden times, the C standard was considered guidelines rather than a ruleset, undefined behavior was closer to implementation-defined behavior than dark magic, and optimizers were stupid enough to make that distinction irrelevant. On a majority of platforms, dereferencing a null pointer compiled and behaved exactly like dereferencing a value at address 0.

Let me unpack that for you. Old compilers didn't recognise undefined behaviour, and so compiled the code that triggered undefined behaviour in exactly the same way they compiled all other code. The result was implementation defined, as the article says.

Modern compilers can recognise undefined behaviour. When they recognise it they don't warn the programmer "hey, you are doing something non-portable here". Instead they may take advantage of it in any way they damned well please. Most of those ways will be contrary to what the programmer is expecting, consequently yielding a buggy program.

But not in all circumstances. The icing on the cake is some undefined behaviour (like dereferencing null pointers) is tolerated (ie treated in the old way), and some not. In fact most large C programs will rely on undefined behaviour of some sort, such as what happens when integers overflow or signed is converted to unsigned.

Despite that, what is acceptable undefined behaviour and what is not isn't defined by the standard, or anywhere else really. So the behaviour of most large C programs is it legally allowed to to change if you use a different compiler, a different version of the same compiler, or just different optimisation flags. Consequently most C programs depend on the compiler writers do the same thing with some undefined behaviour, despite there being no guarantees that will happen.

This state of affairs, which is to say having a language standard that doesn't standardise major features of the language, is apparently considered perfectly acceptable by the C standards committee.

uecker · 2025-02-01T08:05:29 1738397129

Note that in most scenarios compiler do not "recognize UB" and then don't tell you about it. Instead, they do not know whether there will be UB or not at run-time and simply assume that you know what you are doing.

I also like to point out that not all modern compilers behave the same way. GCC will (in the case it is clear there will be a null pointer dereference) compile it into a trap, while clang will cause chaos: https://godbolt.org/z/M158Gvnc4

Both have sanitizers to detect this scenario.

rstuart4133 · 2025-02-01T23:33:56 1738452836

> Note that in most scenarios compiler do not "recognize UB" and then don't tell you about it. Instead, they do not know whether there will be UB or not at run-time and simply assume that you know what you are doing.

The point is the old K&R era compilers did that 100% of the time, whereas the current crop uses UB as an excuse to generate some random behaviour some percentage of the time. Programmers can't tell you when that happens because it varies wildly with compilers, versions of compilers and flags passed. Perhaps you are right in saying it's "most scenarios" - but unless you are a compiler writer I'm not sure who you would know, and even then it only applies to the compiler you are familiar with.

ahartmetz · 2025-02-01T06:20:20 1738390820

I see at least two possible reasons why it happened: 1. "Don't do something counterproductive" or "Don't get your priorities wrong" do not usually need to be said explicitly, 2. Standards "culture" values precision so much that they'd balk at writing fuzzy things like "Do typical null pointer things when trying to deref a null pointer".

Then later 3. "But we implemented it that way for the benchmarks, can't regress there!"