
Ask HN: What's your worst undefined C story? - idiliv
We've all heard the ghost stories about undefined C programs that delete hard disks or set the computer on fire.
However, usually, C programs with undefined behavior either crash or seem to be working.

So, has anyone ever experienced behavior that went beyond that?
======
koala_man
I was unable to find the quote, but someone once said that "We always focus on
the negative effects of undefined behavior, but literally anything could
happen. It doesn't have to be bad! Let's replace fear with hope."

~~~
gHosts
I hope those nasal daemons are friendly.

~~~
egdod
Maybe they can solve the halting problem.

~~~
AstralStorm
Yes, usually by introducing an infinite loop. Never halts. ;)

------
throwawa9999
\- Removing Linux NULL pointer check:
[https://blog.regehr.org/archives/970](https://blog.regehr.org/archives/970)

\- Time traveling undefined behavior:
[https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633)

\- bool that is both true and false:
[https://markshroyer.com/2012/06/c-both-true-and-false/](https://markshroyer.com/2012/06/c-both-true-and-false/)

\- Turning finite loops into infinite ones:
[https://stackoverflow.com/questions/24296571/why-does-this-l...](https://stackoverflow.com/questions/24296571/why-does-this-loop-produce-warning-iteration-3u-invokes-undefined-behavior-an)

From:
[https://twitter.com/shafikyaghmour/status/114602835166622925...](https://twitter.com/shafikyaghmour/status/1146028351666229250?s=20)

~~~
Judgmentality
Just out of curiosity, why did you make a throwaway account for this comment?

~~~
bcaa7f3a8bbc
Some people prefer to do it due to privacy concerns. In an ideal world, I too
wouldn't want others to deduce my identity solely from opinions I posted on a
random Internet forum. If you can work out everyone's identity or background
from their post history on every random forum, I think it shows something is
very wrong with the web. And I don't even mean controversial opinions, just
normal talk.

In addition to concerns about mass surveillance and privacy, there is also a
cultural factor. For example, members of the WELL (a forum) or early Usenet
were expected to post publicly under their real names. On the other hand,
privacy is highly valued socially in Japan, where it's considered eccentric
or offensive to reveal one's identity online when people are not speaking
publicly. I remember reading a story about the owner of a cat whose pictures
went viral worldwide, who refused to reveal his/her identity to any journalist
or even to allow an interview with the cat, including to a U.S. journalist who
pursued the case sincerely.

It's discouraged by the HN Guidelines to make throwaway accounts constantly.

> _Throwaway accounts are ok for sensitive information, but please don't
> create accounts routinely. HN is a community—users should have an identity
> that others can relate to._

That's because it defeats the karma system and leads to abuses, such as
publishing defamatory comments in a hit-and-run manner. But personally, I
don't have a problem with them, per se, as long as the comment is constructive.

~~~
notananthem
Upvoted this, for some strange reason

------
jim_lawless
In the late 80's, two of my fellow C programmers and I thought that we might
want to embed a copyright string into all of our executable programs that
comprised a system that we were building. These were 16-bit MS-DOS programs.
It seemed like an innocuous idea.

Each C program in our system included a header file with a few globals that
looked something like this:

    
    
        char *dave="Copyright ... blah, blah";
        char *bob="More magic and mirth from the mavericks in MicroDev";
        char *jim="CAPS Sucks!";
    

The system we were replacing was written in a proprietary programming language
called CAPS that was a royal pain to work with.

A decade or so later, after we had moved on to other projects and other
companies, one of the old users called me. They were using the system to board
a customer and in the printed paperwork that the system generated the words
"CAPS Sucks!" showed up! They caught the error before sending the paperwork to
the new customer, manually correcting it.

I talked with a new programmer overseeing the system. They eventually found
the bug in some code they'd added that caused a pointer to go astray, which
ended up pointing to constants in early portions of memory (early because
those three strings were in the first #include file in the set). Why it
skipped over the first two strings, I'll never know for sure.

------
CJefferson
Another undefined behaviour error:

We had a macro BAG which looked like:

    
    
        #define BAG(x) **x
    

and people would write things like:

    
    
        BAG(x) = f(y);
    

Now, we had a moving garbage collector, which could cause the value of *x to
change, so it was important that this was executed "as if" it were:

    
    
        temp = f(y); **x = temp;
    

For many years, this was exactly what gcc did, but it wasn't required by C.
After an update, gcc would sometimes (when optimisation was high enough and
gcc felt like it) dereference x first, then calculate f(y), and then assign
the result through the old value of *x.

This only caused a problem when calling f(y) triggered a GC that changed *x,
in which case a random memory location was written to.
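
For anyone who wants to see the safe ordering spelled out, here is a minimal sketch. Everything in it is made up for illustration (`slot_a`, `slot_b`, `master`, `fake_gc_move`, `store_after_call`); it only assumes that a handle is a pointer to a master pointer and that the collector can retarget the master pointer during the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the object lives in one of two slots, and the
   master pointer says which one. A handle is a pointer to the master. */
static long slot_a, slot_b;
static long *master = &slot_a;

/* Pretend collection: move the object and retarget the master pointer. */
static void fake_gc_move(void) {
    slot_b = slot_a;
    master = &slot_b;
}

/* Stand-in for f(y): computing the value happens to trigger a collection. */
static long f(long y) {
    fake_gc_move();
    return y * 2;
}

/* Safe store: finish the call first, then dereference the handle. */
static void store_after_call(long **handle, long y) {
    assert(handle != NULL && *handle != NULL);
    long temp = f(y);   /* any object movement happens here */
    **handle = temp;    /* *handle is read only after the move */
}
```

Calling `store_after_call(&master, 21)` leaves the value in the slot the object was moved to, while the old slot is untouched. The problematic ordering corresponds to reading `*handle` before calling `f`.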

~~~
exikyut
Hi, I understand C to the extent I can write simple, intermediate-scale useful
programs, but my foundational grasp of the basics is still sufficiently
shaky/incomplete that I can still get easily thrown by simple memory
management constructs in practice.

In this case, I academically understand pointer-to-pointer references... but
I'm honestly not able to mentally model the above.

First of all, congratulations - you have posted comment #4,294,967,296 to be
bitten by the asterisks-are-only-for-formatting-and-all-other-uses-should-be-
passive-aggressively-ignored bug in the Arc software powering this site! Your
prize is that the explanatory denotations of "* x" in your post have
disappeared and everything between the two points you've added asterisks has
become italicized instead. (It would be great if someone can figure out a way
to successfully convince the admins to just fix this bug already - it has
existed for the entirety of this site's existence. </rant>)

With that over...

I don't understand the relevance of the 3rd code block. I get the impression
it somehow replaced the first one (...?), but I can't figure out how the
#define would have been rewritten without changing the signature to something
like

    
    
        #define BAG(x, y) temp = y; **x = temp;
    

which was used like

    
    
        BAG(x, f(y))
    

but the above is not valid code as the type of `temp` is not defined or
inferrable.

Next, as I noted above the pointer-to-pointer thing is throwing me for a
pretty big loop :) and I don't understand how a double dereference would (in
practice) translate to a moving garbage collector.

As with pointers, I have a basic academic understanding of GC, but too little
practical real-world experience with either to be able to make the connection
between the two ideas.

Insight appreciated in advance :)

~~~
desertrider12
To do star star x = f(y), star x and f(y) each need to be evaluated. You'd
think the right-hand side would always be evaluated before the left-hand side,
but that's not always possible (what should "arr[i] = i++" do? I stole this
from Wikipedia, btw). In the above case, f(y) has this kind of side effect: it
changes star x, which is undefined.

Rather than changing the macro, this could be fixed easily by putting the
assignment into its own function. Then the argument (f(y)) must be fully
evaluated before star x.

Edit: I have no idea how to escape asterisk on HN :(

~~~
CJefferson
The problem with putting the left hand side in a function is you can't assign
to the result of a function in C, so it has to be a macro. (although our
actual general fix was "stop trying to put anything fancy on the left hand
side of an assignment")

------
CJefferson
The problem was subtle -- some part of a program was writing to a memory
location it shouldn't be, making random memory locations 0.

We had a custom moving GC. Fixing the bug involved changing the GC so that it
used mmap to map an anonymous buffer storing the GCed objects into many memory
locations. Each time a memory block was allocated, it would:

1) Choose a random copy of the memory.

2) Mark all the other blocks as neither readable nor writable.

3) Make all references to GC memory we were currently tracking point to that
block.

This meant anyone who was keeping a pointer to GCed memory we had lost track
of would be pointing to a now protected block of memory, and as soon as they
tried to read or write through it, they would segfault.

This made the program run slower: so much slower that startup, which used to
take about 3 seconds, now took about 6 hours. However, it found the bug (and a
whole bunch of other GC bugs as well).
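
A rough sketch of the aliasing trick, for the curious. This is not the original GC code: `map_views`, `activate_view` and `VIEW_COUNT` are hypothetical names, and it assumes a POSIX system and uses a temporary file as the shared backing store so that the same pages can be mapped at several addresses.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes fileno and ftruncate */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define VIEW_COUNT 4

/* Map the same backing pages at VIEW_COUNT different addresses, all
   inaccessible to begin with. Returns 0 on success, -1 on failure.
   Sketch only: views mapped before a failure are not cleaned up. */
static int map_views(void *views[VIEW_COUNT], size_t len) {
    FILE *backing = tmpfile();
    if (backing == NULL) return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, (off_t)len) != 0) { fclose(backing); return -1; }
    for (int i = 0; i < VIEW_COUNT; i++) {
        views[i] = mmap(NULL, len, PROT_NONE, MAP_SHARED, fd, 0);
        if (views[i] == MAP_FAILED) { fclose(backing); return -1; }
    }
    fclose(backing);   /* the mappings stay valid after the file is closed */
    return 0;
}

/* Make exactly one view readable and writable; any pointer still aimed at
   another view will fault on its next access. */
static int activate_view(void *views[VIEW_COUNT], size_t len, int active) {
    assert(active >= 0 && active < VIEW_COUNT);
    for (int i = 0; i < VIEW_COUNT; i++) {
        int prot = (i == active) ? (PROT_READ | PROT_WRITE) : PROT_NONE;
        if (mprotect(views[i], len, prot) != 0) return -1;
    }
    return 0;
}
```

Data written through one view is visible through whichever view is activated next, because they all share the same pages. A real collector would also have to rebase every tracked pointer onto the newly active view, which is step 3 above.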

~~~
exikyut
Wow, awesome.

How long did it take to track down all the bugs? Presumably (/hopefully) there
was copious logging going on allowing you to track down many issues per run...

(Also, how'd you mark the pages as !RW?)

~~~
CJefferson
Linux has a function, mprotect, which lets you mark whether a page (usually
4k) of memory can be read or written to. In these kinds of circumstances it
can be used to check if programs are accessing memory they shouldn't be. You
do have to do it at the page (4k) level, however, not for individual mallocs,
unfortunately.
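
Here is a small sketch of that, assuming Linux or a similar POSIX system. `alloc_page` and `read_faults` are hypothetical helpers written for this example; the signal handler is only there so the demo can observe the fault instead of dying from it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes MAP_ANONYMOUS and sigsetjmp */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jump;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(fault_jump, 1);   /* jump back out of the faulting read */
}

/* Allocate one page-aligned page; returns NULL on failure. */
static void *alloc_page(size_t *page_size_out) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return NULL;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    *page_size_out = (size_t)page;
    return p;
}

/* Returns 1 if reading *addr faults, 0 if the read succeeds. */
static int read_faults(const volatile char *addr) {
    struct sigaction probe, old_segv, old_bus;
    volatile int faulted;

    assert(addr != NULL);
    memset(&probe, 0, sizeof probe);
    probe.sa_handler = on_fault;
    sigemptyset(&probe.sa_mask);
    sigaction(SIGSEGV, &probe, &old_segv);
    sigaction(SIGBUS, &probe, &old_bus);   /* some systems report SIGBUS */

    if (sigsetjmp(fault_jump, 1) == 0) {
        (void)*addr;      /* volatile read: really touches the page */
        faulted = 0;
    } else {
        faulted = 1;      /* we got here via siglongjmp from the handler */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

After `mprotect(p, page, PROT_NONE)` a read through `p` faults; switching the page back to `PROT_READ` makes the old contents readable again. The address and length passed to mprotect have to cover whole pages, which is why this can't be applied to individual mallocs.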

------
martinrue
Early in my programming “career”, when I was about 15/16, I started writing
network programs in C. I was fascinated by the idea that one program could
send and receive data with another far far away. One particular story was a
great lesson for me (I eventually ended up naming my company after it), and
also pretty amusing. It’s not about undefined behaviour, but about not
properly understanding C strings. If you’re curious, the story is here:
[https://martinrue.com/zzuy-a-lesson-in-perseverance](https://martinrue.com/zzuy-a-lesson-in-perseverance)

~~~
pstuart
Funny story. Those were great times to be learning programming. Kids these
days are swimming in tech, and the thrill of getting a prompt isn't the same
as it was then.

------
shaneprrlt
I guess you weren't exaggerating:
[https://www.quora.com/Can-a-code-be-written-to-blow-up-a-com...](https://www.quora.com/Can-a-code-be-written-to-blow-up-a-computer)

~~~
tcbasche
Side note: Why do some people insist on calling it "a code" or "codes"
(plural)? It's bizarre

~~~
AnimalMuppet
I think it's British or European usage.

~~~
mokus
It’s also somewhat common in the scientific community even in the US.

~~~
enf
Whoever wrote the web interface to "p4 blame" at Danger called it "See who
made these codes," which always sounded super weird to me

------
AstralStorm
Mine is a simple off-by-one byte write causing a failure in a separate thread
in proprietary, barely related code on deinitialization of the library. It
took years to find. Not even ASAN, valgrind and debug allocators helped. What
did was a full audit, including assembly.

They had their own memory allocator and internal threads; the off-by-one
corrupted an atomic flag... Since this was data-dependent and on a relatively
cold path, the issue did not appear that often. And it was even less
predictable because it was ARM.

Note the multiple stacked undefined behaviors.

------
Matthias247
> We've all heard the ghost stories about undefined C programs that delete
> hard disks

Exactly that one. Someone at some job had written a Linux kernel driver for a
subsystem which was definitely not related to any disk IO. This driver had a
bug: a race condition which in some cases led it to write to memory that it
didn't own. That caused hard drive corruption on some devices.

I also had another one in an embedded software project: a "filesystem" on the
flash memory got corrupted for multiple reasons. The responsible code was
missing synchronization, so if two threads were writing at the same time the
result was basically undefined. It also lacked checks for dynamic memory
allocation failures, and would then do undefined things.

Debugging all those issues was not particularly fun, and made me a firmer
believer in safer programming languages.

------
anonymoushn
Only crashes, as far as I know.

I met a bug from 2010 in OpenResty where if you set
ngx.header["Cache-Control"] = nil then sometime later nginx might segfault.

nginx uses a pointer to char and length for strings, like any sane C program.
For headers that are always present on the response, setting the header value
to nil from Lua is implemented as setting it to {NULL, 0} (which is different
from the empty string, as far as the lua side is concerned). Sometime later,
nginx will call strlcasestrn passing this header value to check whether it
contains "private". This involves adding the length of the string to the start
pointer to calculate the upper bound. So that's NULL + 0, which is whatever
the compiler feels like it is at the time.
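
A minimal sketch of the kind of guard that avoids this, assuming a pointer-plus-length string that may be {NULL, 0}. `find_nocase` is a hypothetical helper written for this comment, not nginx's actual function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive search for needle (n bytes) within
   the first len bytes of hay. hay may be NULL only when len is 0.
   Returns a pointer to the first match, or NULL if there is none. */
static const char *find_nocase(const char *hay, size_t len,
                               const char *needle, size_t n) {
    assert(hay != NULL || len == 0);
    if (hay == NULL || needle == NULL || n == 0 || len < n) {
        return NULL;   /* bail out before doing any pointer arithmetic */
    }
    for (size_t i = 0; i + n <= len; i++) {
        size_t j = 0;
        while (j < n && tolower((unsigned char)hay[i + j]) ==
                        tolower((unsigned char)needle[j])) {
            j++;
        }
        if (j == n) return hay + i;
    }
    return NULL;
}
```

The early return means a {NULL, 0} value is rejected before anything is added to the pointer, and the loop works with indexes rather than a precomputed end pointer.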

------
gHosts
A colleague wrote this code...

func( a=1, a++);

Sadness ensued until I disassembled it.
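
For reference, the usual way out is to give each side effect its own statement. A small sketch, where `combine` and `call_in_order` are made-up names and the intended behaviour is a guess on my part (pass 1 twice, leave a at 2):

```c
#include <assert.h>

/* Made-up callee that makes the argument order visible in the result. */
static int combine(int first, int second) {
    return first * 10 + second;
}

static int call_in_order(void) {
    int a = 1;          /* the assignment gets its own statement */
    int first = a;      /* value meant for the first argument */
    int second = a++;   /* post-increment yields 1, then a becomes 2 */
    assert(a == 2);
    return combine(first, second);
}
```

With the side effects hoisted out, the call no longer depends on the order in which the compiler evaluates the arguments.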

------
rocqua
I recall once reading a bug-report thread of how null-check elisions caused a
vulnerability in a big piece of software. I've tried to find it often, because
it would be nice to reference in response to your kind of question.

I'm afraid that more than an hour of searching hasn't turned anything up. I
think it was a Firefox bug, but given my fruitless search, that seems suspect.

Here's to hoping someone else happens to know what I am talking about and
shares a link.

~~~
bdamm
Not sure if this is "null-check elisions" but there was the famous case of the
Debian SSH daemon package generating easily guessed keys because the packager
"fixed" the upstream code... the upstream code was using uninitialized
variables to seed randomness, and the packager's software flagged that as a
bug, so it was patched by the packager.

Of course, that's not a good way to seed randomness, so I think there's a real
argument that the upstream code was, in fact, buggy.

~~~
phaedrus
Kind of in the same vein, some code I was porting to a newer C++ compiler had
"int error = get_status(x);" and I got a warning that the variable "error" is
never used. So I commented the line out. After I finished fixing about 1,000
such warnings, re-running the program showed it no longer performed its serial
I/O communication function. Turned out "get_status(x)" should have been called
"send_data(x)". What was arguably the most important line out of 10,000 lines
in the project had nothing to differentiate it as special, and performed its
work as a "side effect" of setting a flag variable which was never checked.

------
Rebelgecko
I was reading some data off of the network and storing it in a byte array.
Then it was being loaded into some SIMD registers (ARM NEON) to do mathy
stuff. Ended up in type punning hell when the code ran fine on some machines
but not others. IIRC it worked on machines running Linux but not on bare
metal.
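
I can't reconstruct the original code, but one common culprit in this kind of setup is casting the byte pointer to a wider type and dereferencing it, which runs into alignment and aliasing rules. A hedged sketch of the usual workaround, with hypothetical helper names `load_f32` and `load_u32_le`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a float from an arbitrary (possibly unaligned) offset in a byte
   buffer by copying the bytes, instead of casting the pointer. */
static float load_f32(const unsigned char *buf, size_t offset) {
    float value;
    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Assemble a little-endian 32-bit value byte by byte, independent of the
   host's byte order and of the buffer's alignment. */
static uint32_t load_u32_le(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Compilers typically turn the fixed-size memcpy into a plain load where the target allows it, so this usually costs nothing.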

------
pkaye
I had one for shift by a negative number. It was in some error correction
code. A different team had written this code and tested it on a PC, but the
actual hardware used an ARM processor, and the two implement it differently.
We did catch it eventually using a linter tool.
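
A minimal sketch of a shift helper that behaves the same everywhere. `shift_signed` is a hypothetical name, and treating a negative count as a shift in the other direction is my assumption about what such code might want, not what the original did.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left for positive counts and right for negative ones. Counts whose
   magnitude is 32 or more give 0, so the shift operators never see a count
   outside 0..31. */
static uint32_t shift_signed(uint32_t value, int count) {
    if (count >= 32 || count <= -32) return 0;
    if (count >= 0) return value << count;
    return value >> -count;
}
```

The checks run before either shift operator is reached, so the result no longer depends on how a particular processor treats out-of-range shift counts.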

