Ask HN: What's your worst undefined C story?
84 points by idiliv on Oct 17, 2019 | 43 comments
We've all heard the ghost stories about undefined C programs that delete hard disks or set the computer on fire. However, usually, C programs with undefined behavior either crash or seem to be working.

So, has anyone ever experienced behavior that went beyond that?

I was unable to find the quote, but someone once said that "We always focus on the negative effects of undefined behavior, but literally anything could happen. It doesn't have to be bad! Let's replace fear with hope."

Perhaps double free could send a signal to a randomly chosen process forcing it to allocate that newly freed memory for something. A new take on a garbage collector.

I hope those nasal daemons are friendly.

Maybe they can solve the halting problem.

Yes, usually by introducing an infinite loop. Never halts. ;)

Just out of curiosity, why did you make a throwaway account for this comment?

Some people prefer to do it due to privacy concerns. In an ideal world, I too wouldn't want others to deduce my identity solely from opinions posted on a random Internet forum. If you can identify everyone's identity or background from their post history on every random forum, I think that shows something is very wrong with the web. And I don't even mean controversial opinions, just normal talk.

In addition to concerns about mass surveillance and privacy, there is also a cultural factor. For example, members of The WELL (a forum) or early Usenet were expected to post publicly under their real names; on the other hand, privacy is highly valued socially in Japan, where it's considered eccentric or even offensive to reveal one's identity online when not speaking publicly. I remember reading a story about the owner of a cat whose pictures went viral worldwide: the owner refused to reveal his or her identity to any journalist, even for an interview about the cat, including a U.S. journalist who pursued the case sincerely.

Making throwaway accounts routinely is discouraged by the HN Guidelines:

> Throwaway accounts are ok for sensitive information, but please don't create accounts routinely. HN is a community—users should have an identity that others can relate to.

That's because it defeats the karma system and leads to abuses, such as publishing defamatory comments in a hit-and-run manner. But personally, I don't have a problem with them per se, as long as the comment is constructive.

Upvoted this, for some strange reason

Another good reason is you are on another device, want to reply immediately and don’t have your password handy.

In the late 80's, two of my fellow C programmers and I thought that we might want to embed a copyright string into all of our executable programs that comprised a system that we were building. These were 16-bit MS-DOS programs. It seemed like an innocuous idea.

Each C program in our system included a header file with a few globals that looked something like this:

    char *dave="Copyright ... blah, blah";
    char *bob="More magic and mirth from the mavericks in MicroDev";
    char *jim="CAPS Sucks!";
The system we were replacing was written in a proprietary programming language called CAPS that was a royal pain to work with.

A decade or so later, after we had moved on to other projects and other companies, one of the old users called me. They were using the system to board a customer, and in the printed paperwork that the system generated, the words "CAPS Sucks!" showed up! They caught the error before sending the paperwork to the new customer, manually correcting it.

I talked with a new programmer overseeing the system. They eventually found the bug in some code they'd added that caused a pointer to go astray, which ended up pointing to constants in early portions of memory (early because those three strings were in the first #include file in the set). Why it skipped over the first two strings, I'll never know for sure.

Another undefined behaviour error:

We had a macro BAG which looked like:

    #define BAG(x) **x
and people would write things like:

    BAG(x) = f(y);
Now, we had a moving garbage collector, which could cause the value of x to change, so it was important that this was executed "as if".

    temp = f(y); **x = temp;
For many years, this was exactly what gcc happened to do, but it isn't required by C. A gcc update caused it, when optimisation was high enough and gcc felt like it, to instead first dereference x, then calculate f(y), and then assign the result through the old value of *x.

But, this would only cause a problem when calling f(y) caused a GC which would change x, and then a random memory location was written to.

Hi, I understand C to the extent I can write simple, intermediate-scale useful programs, but my foundational grasp of the basics is still sufficiently shaky/incomplete that I can still get easily thrown by simple memory management constructs in practice.

In this case, I academically understand pointer-to-pointer references... but I'm honestly not able to mentally model the above.

First of all, congratulations - you have posted comment #4,294,967,296 to be bitten by the asterisks-are-only-for-formatting-and-all-other-uses-should-be-passive-aggressively-ignored bug in the Arc software powering this site! Your prize is that the explanatory notations of "* x" in your post have disappeared, and everything between the two points where you added asterisks has become italicized instead. (It would be great if someone could figure out a way to convince the admins to just fix this bug already - it has existed for the entirety of this site's existence. </rant>)

With that over...

I don't understand the relevance of the 3rd code block. I get the impression it somehow replaced the first one (...?), but I can't figure out how the #define would have been rewritten without changing the signature to something like

    #define BAG(x, y) temp = y; **x = temp;
which was used like

    BAG(x, f(y))
but the above is not valid code as the type of `temp` is not defined or inferrable.

Next, as I noted above the pointer-to-pointer thing is throwing me for a pretty big loop :) and I don't understand how a double dereference would (in practice) translate to a moving garbage collector.

Like pointers I also have a basic academic understanding of GC, but sorely little practical real-world experience with either to be able to make the connection between the two ideas.

Insight appreciated in advance :)

So, the double pointer dereference is there to handle moving memory in C. The idea is that we can't (reasonably) change every pointer to a piece of memory (in raw C the compiler can hide pointers, and we scan the stack, so we don't know whether the things we find there are pointers or integers).

So, x points to a well-known place, and that place contains a pointer to where the data is stored. When we GC, we change where the data is stored, and also update the pointer stored there to the new location. Then anyone with x can use x to find the new location of the memory.

This does mean every memory access goes through these "lookup" pointers, and those lookup pointers can't move.

In practice, the fix was "tell people not to put BAG on the left-hand side of assignments", rather than changing this macro, which occurs everywhere (although we are also slowly phasing it out).

To do star star x = f(y), star x and f(y) each need to be evaluated. You'd think the right-hand side would always be evaluated before the left-hand side, but that's not always possible (what should "arr[i] = i++" do? I stole this from Wikipedia, btw). In the above case, f(y) has the kind of side effect that changes x, which is undefined.

Rather than changing the macro, this could be fixed easily by putting the assignment into its own function. Then the argument (f(y)) must be fully evaluated before star x.

Edit: I have no idea how to escape asterisk on HN :(

The problem with putting the left hand side in a function is you can't assign to the result of a function in C, so it has to be a macro. (although our actual general fix was "stop trying to put anything fancy on the left hand side of an assignment")

> I have no idea how to escape asterisk on HN

Put a space after it: "* x = f(* y)"

"but I can't figure out how the #define would have been rewritten without changing the signature"

It likely is possible in C++, when combined with a generic class that wraps x and defers dereferencing it to the time its copy assignment operator runs.

BAG(x) would return an instance of that class.

For completeness, you would also want an implicit conversion to the type of **x in that class, so that reads through BAG(x) would continue to work.

Thank you for providing my daily WTF moment. Do you know how this was tracked down?

The problem was subtle -- some part of a program was writing to a memory location it shouldn't be, making random memory locations 0.

We had a custom moving GC. Fixing the bug involved changing the GC: we used mmap to map an anonymous buffer storing the GCed objects into many memory locations, and each time a memory block was allocated:

1) Choose a random copy of the memory.

2) Mark all the other blocks as neither readable nor writable.

3) Make all references to GC memory we were currently tracking point to that block.

This meant anyone who was keeping a pointer to GCed memory we had lost track of would point to a now protected block of memory, and as soon as they tried to read or write through it would cause a segfault.

This made the program run slower... so much slower that startup, which used to take about 3 seconds, now took about 6 hours. However, it found the bug (and a whole bunch of other GC bugs as well).

Wow, awesome.

How long did it take to track down all the bugs? Presumably (/hopefully) there was copious logging going on allowing you to track down many issues per run...

(Also, how'd you mark the pages as !RW?)

Linux has a function, mprotect, which lets you mark whether a page (usually 4k) of memory can be read or written. In these kinds of circumstances it can be used to check whether a program is accessing memory it shouldn't be. You do have to do it at 4k granularity, however, not per individual malloc, unfortunately.

Early in my programming “career”, when I was about 15/16, I started writing network programs in C. I was fascinated by the idea that one program could send and receive data with another far far away. One particular story was a great lesson for me (I eventually ended up naming my company after it), and also pretty amusing. It’s not about undefined behaviour, but about not properly understanding C strings. If you’re curious, the story is here: https://martinrue.com/zzuy-a-lesson-in-perseverance

Funny story. Those were great times to be learning programming. Kids these days are swimming in tech, and the thrill of getting a prompt isn't the same as it was then.

Side note: Why do some people insist on calling it "a code" or "codes" (plural)? It's bizarre

Non-native English speakers whose native languages don't distinguish countable and uncountable nouns are a possibility, but even in those languages it's still very strange to say "two codes"; it doesn't have a clear meaning.

Until you see that "code" in everyday language is an ugly word: it's ill-defined and used to refer to many things, usually with implications of secrecy or espionage. For example, it can refer to the combination of a lock, e.g. a password. It can refer to an encoding scheme or an encoder, e.g. Morse code or Pulse-Code Modulation. It can refer to a codename, e.g. Project X. And most commonly, it refers to an encryption algorithm (or its output), e.g. AES (or ciphertext), or to something written in a programming language, e.g. a "one-liner".

And most people without an engineering background usually don't realize those are not the same thing: they think a telegram in Morse code, an ASCII-encoded string, the combination of their lock, an encryption algorithm, or a line of code in a programming language are all the same thing; they just picture it as a number (well, they are numbers, all information is numbers, but their representations and purposes are very different). When you have this mentality, "a code" or "two codes" makes sense.

Probably because a lot of people on the internet are not native English speakers, and so struggle to see the difference between "cat" and "code".

I think it's British or European usage.

It’s also somewhat common in the scientific community even in the US.

Whoever wrote the web interface to "p4 blame" at Danger called it "See who made these codes," which always sounded super weird to me

Mine is a simple off-by-one byte write causing a failure in a separate thread, in proprietary, barely related code, on deinitialization of the library. It took years to find. Not even ASan, Valgrind, and debug allocators helped. What did was a full audit, including the assembly.

They had their own memory allocator and internal threads, and the off-by-one corrupted an atomic flag... Since this was data-dependent and on a relatively cold path, the issue did not appear often. And even less predictably, because it was ARM.

Note the multiple stacked undefined behaviors.

> We've all heard the ghost stories about undefined C programs that delete hard disks

Exactly that one. Someone at some job had written a Linux kernel driver for a subsystem which was definitely not related to any disk I/O. This driver had a bug and was subject to a race condition which led it, in some cases, to write to memory it didn't own. That caused hard drive corruption on some devices.

I also had another one in an embedded software project: a "filesystem" on the flash memory got corrupted for multiple reasons. The responsible code was missing synchronization: if two threads wrote at the same time, the result was basically undefined. It also missed checks for dynamic memory allocation failures, and would then do undefined things.

Debugging all those issues had not been particularly fun, and made me a firmer believer in safer programming languages.

Only crashes, as far as I know.

I ran into a bug from 2010 in OpenResty where, if you set ngx.header["Cache-Control"] = nil, then sometime later nginx might segfault.

nginx uses a pointer to char and a length for strings, like any sane C program. For headers that are always present on the response, setting the header value to nil from Lua is implemented as setting it to {NULL, 0} (which is different from the empty string, as far as the Lua side is concerned). Sometime later, nginx will call strlcasestrn, passing this header value to check whether it contains "private". This involves adding the length of the string to the start pointer to calculate the upper bound. So that's NULL + 0, which is whatever the compiler feels like it is at the time.

A colleague wrote this code...

func( a=1, a++);

Sadness ensued until I disassembled it.

I recall once reading a bug-report thread of how null-check elisions caused a vulnerability in a big piece of software. I've tried to find it often, because it would be nice to reference in response to your kind of question.

I'm afraid that more than an hour of searching hasn't turned anything up. I think it was a Firefox bug, but given my fruitless search, that seems suspect.

Here's to hoping someone else happens to know what I am talking about and shares a link.

You might be thinking of this old Linux kernel bug, where a gcc optimization that deleted null pointer checks resulted in a vulnerability: https://access.redhat.com/security/cve/CVE-2009-1897 . There's a more interesting write-up of the bug and patch at https://lwn.net/Articles/342330/ .

Not sure if this is "null-check elisions" but there was the famous case of the Debian SSH daemon package generating easily guessed keys because the packager "fixed" the upstream code... the upstream code was using uninitialized variables to seed randomness, and the packager's software flagged that as a bug, so it was patched by the packager.

Of course, that's not a good way to seed randomness, so I think there's a real argument that the upstream code was, in fact, buggy.

Kind of in the same vein: some code I was porting to a newer C++ compiler had "int error = get_status(x);" and I got a warning that the variable "error" was never used. So I commented the line out. After I fixed about 1,000 such warnings, re-running the program showed it no longer performed its serial I/O communication. It turned out "get_status(x)" should have been called "send_data(x)". What was arguably the most important line of the 10,000 in the project had nothing to mark it as special, and performed its work as a "side effect" of setting a flag variable that was never checked.

I was reading some data off of the network and storing it in a byte array. Then it was being loaded into some SIMD registers (ARM NEON) to do mathy stuff. Ended up in type punning hell when the code ran fine on some machines but not others. IIRC it worked on machines running Linux but not on bare metal.

I had one for a shift by a negative number. It was in some error-correction code. A different team had written the code and tested it on a PC, but the actual hardware used an ARM processor, and the two implement it differently. We did catch it eventually, using a linter tool.
