Every programmer should read the source to abort() at some point in their life. (reddit.com)
240 points by raldi on July 19, 2011 | 58 comments

Reminded me of "5 ways to reboot a PC, none of them reliable" from a couple months ago.


Interesting how fundamentally simple tasks like aborting a process or rebooting the machine have very nontrivial (even kludgey) implementations.

Raymond Chen wrote a classic blog post on how a process exits on WinXP. But if you're a developer, know that it's not the same on Win7.


And OSX's, which is rather similar to FreeBSD's but adds:

* Writing to NULL

* Writing to address 1 (unaligned write)

* Writing to text space (read-only machine code)

* Dividing by 0

* More violence than SIGABRT (SIGILL, SIGBUS)


Looks to me like glibc and FreeBSD are the only ones that flush stdout, which I'd view as a bug on the other systems...

(Note: The uclibc implementation in the OP also attempts to flush stdout as part of the _stdio_term() call)

It's a bug in the spec, if anything. See the abort() spec http://pubs.opengroup.org/onlinepubs/009695399/functions/abo...

Vs the exit() spec http://pubs.opengroup.org/onlinepubs/000095399/functions/exi...

abort() is intended as a last-ditch effort. exit() is the one that attempts to flush all open buffered file descriptors, and should be used in lieu of abort except in cases where you know you're screwed, or explicitly want to throw a signal so a debugger can take a peek.
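To make the abort()/exit() distinction concrete, here is a minimal sketch, assuming a POSIX system. The helper names (run_in_child, died_by_sigabrt, exited_with) are made up for illustration and are not part of any library. Each call runs in a forked child so the parent can inspect how the child ended.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn in a forked child and store the raw wait
 * status in *status. Returns 0 on success, -1 if fork/waitpid fails. */
int run_in_child(void (*fn)(void), int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(127);               /* only reached if fn returns */
    }
    if (waitpid(pid, status, 0) < 0)
        return -1;
    return 0;
}

/* The two ways of ending a process discussed above. */
void call_abort(void) { abort(); }   /* abnormal termination via SIGABRT */
void call_exit(void)  { exit(3); }   /* normal termination, streams flushed */

/* 1 if the wait status says the child was killed by SIGABRT. */
int died_by_sigabrt(int status)
{
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}

/* 1 if the wait status says the child exited normally with `code`. */
int exited_with(int status, int code)
{
    return WIFEXITED(status) && WEXITSTATUS(status) == code;
}
```

Calling run_in_child(call_abort, &st) should leave a status for which died_by_sigabrt(st) is true, while the call_exit child should report a normal exit with code 3. Whether the aborting child also leaves a core dump depends on the system's limits.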

Since dereferencing a null pointer and dividing by 0 are undefined by C, is the compiler required to emit the code for them? In practice, does it?

In practice, <asterisk>(int <asterisk>)0 (how do I escape asterisks on HN?) and similar are popular idioms for "segfault here". Making them break is not an optimization any compiler maintainer would bother to make - it requires a special case and there don't seem to be any benefits to justify the effort.

See the series of articles starting with http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

In practice: compilers usually don't touch expressions with volatile vars.
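A minimal sketch of the volatile idiom, assuming a POSIX system where page zero is unmapped (typical on Linux, but not guaranteed everywhere). The function name is hypothetical. The access is still undefined behavior as far as the C standard is concerned; the point is only that mainstream compilers do not delete volatile accesses.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that reads through a volatile null
 * pointer. Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 if fork/waitpid failed. */
int signal_from_volatile_null_read(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile int *p = (volatile int *)0;
        int v = *p;               /* volatile read: the load is emitted */
        _exit(v & 1);             /* only reached if address 0 is readable */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux box the returned signal is SIGSEGV. The sketch deliberately reports whatever signal arrives instead of assuming one.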

Plan 9's abort causes an access fault, causing the current process to enter the `Broken' state. The process can then be inspected by a debugger. Pretty elegant.


In practical terms, does this differ significantly from killing the process and leaving a coredump?

Beautiful! The gurus do not disappoint.

Very elegant, though in some cases 0x0 is addressable, in which case, the abort never happens.

Not, as it happens, on plan 9 systems where the kernel marks that area as not accessible. This could fail, conceivably, but then you're in much deeper stuff.

0 cast as a pointer is not necessarily 0x0.

The spec says it equals NULL, which, too, is not necessarily 0x0.

Again, what's with the while()? Even if it doesn't cause a segmentation fault, there's a chance it will evaluate to 0.

gcc, at least, optimises out the null deref for both:-

    *(int *)0;

        *(int *)0;
So the first bit of code does nothing, and the second slips off into an infinite loop.

I suspect that treating expressions that demonstrably lack side effects (other than the intended segfault here of course) as statements is undefined, and hence these are getting optimised out (even with -O0).

Clearly with:-

    while(*(int *)0)
The expression is being evaluated and is therefore not elided. I guess the choice of while is to 'be cute', as others have suggested, and I guess the world is sane in Plan 9 and 0 is unreadable, so you can't get a situation where it escapes the loop. Perhaps there is a deeper reason here that I am missing, however.

(int *)0 is not defined as a pointer to memory address 0x0 on architectures that support such an address.

0 cast as a pointer is defined by the spec to always be the NULL pointer, which on such architectures would have a value other than 0x0 and not point anywhere addressable.

Thanks. I could never guess it could be optimized out.

I'm not a plan 9 programmer, but to me it looks 'cute' (in the sense of attractive to some people but annoying to others) - that form of abort() would only be used on systems where that operation is known to abort the process, but enclosing it in a while simply makes it apparent that there is no alternative to trying it.

On a more prosaic note, perhaps

        *(int *)0;
generates a compiler warning that the programmer wanted to avoid.

As far as I know, the kernel programs the MMU so that dereferencing 0 will always fault. I could be wrong, as my understanding of the kernel is limited. I am not sure of the purpose of the loop, but to me it makes it unavoidably obvious that the function never returns.

Great article, picking apart low level code like this can be super informative - and you've explained it well.

Link to abort.c source: https://gist.github.com/1093410

I may be wrong, but I thought that in a multithreaded environment, doing i++ is not atomic and could result in garbled data. Instead you should use __sync_add_and_fetch. However, I have no idea if it should be used inside abort().

>I may be wrong, but I thought that in a multithreaded environment, doing i++ is not atomic and could result in garbled data. Instead you should use __sync_add_and_fetch. However, I have no idea if it should be used inside abort().

This is only true if the variable in question's memory is accessed by multiple threads at the same time, and there isn't any locking or synchronization method used to protect the memory.

In this case, even though it is a globally scoped variable, it's locked by the globally scoped mutex declared in the file. All increments are done in the locked sections, so there isn't any possibility of accessing the variable without having a lock.

It should be noted that there is a very minor race condition when abort() is called in two different threads sequentially and every attempt up to line 89 fails. The first call gets the lock, then goes through to line 89, where it releases the lock. The second thread then gets the lock and goes through the first section. When it hits the check on line 89 (if (been_there_done_that == 0)), that resolves to false, because been_there_done_that is 1. It then goes on, leaving the first thread deadlocked at the LOCK attempt on line 91. This shouldn't result in any missed functionality, but I actually wonder why they're releasing the lock in the first place. raise() isn't thread safe anyway, because the signal is applied to all threads in the process. Plus, you're trying to suicide the program. It's a bad idea to even have the possibility of multiple threads trying to kill themselves at the same time.

The lock must be released because the same thread might reenter abort() in the signal handler, and without a release in the parent abort(), the program would hang.

You are right. Re-reading the code, I see that it re-acquires the lock.

Since the lock is not guaranteed in the code, the variable is globally defined and the code only ever increases it. This means a step in the chain of killing could get skipped, but that doesn't matter, as there's always a more violent option (or just the infinite loop).

On an 8-bit system with 16-bit-wide ints, i++ is almost certainly not atomic (with respect to threads, interrupts, etc.).

A few months ago I fixed this exact bug in a developer's code, on an 8-bit embedded processor.

i++ is atomic in most cases, unless i is an excessively wide integer. In abort I think they lock anyway, so it doesn't matter (at least in uclibc).

Actually, while the write itself is generally atomic, most systems don't guarantee that the entire increment operation is (for performance reasons).

And since abort still needs to work even without locks and on every platform...
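For the i++ question in this subthread, a minimal sketch, assuming GCC or Clang (for the __sync_add_and_fetch builtin) and POSIX threads. The function and constant names are made up for illustration. Two threads bump a shared counter with the atomic builtin, so no increments are lost; with a plain counter++ and no lock the total could come out short.

```c
#include <pthread.h>
#include <stddef.h>

enum { INCREMENTS_PER_THREAD = 100000 };

static long counter;

/* Thread body: atomically increment the shared counter. */
static void *bump(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        __sync_add_and_fetch(&counter, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Hypothetical helper: run two bumping threads and return the final
 * counter value, or -1 if a thread could not be created. */
long count_with_two_threads(void)
{
    pthread_t a, b;
    counter = 0;
    if (pthread_create(&a, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;               /* both threads joined, safe to read */
}
```

On older toolchains this needs -pthread at compile and link time. Inside abort() itself the mutex discussed above already serializes the increments, so the builtin would only matter on a path that runs without the lock.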

Wow, abort() is much more polite than I assumed.

Why only abort()? The entire kernel code is beautiful http://lxr.linux.no/

Most of the code in glibc has been heavily Dreppered.

What does Dreppered mean?

Ulrich Drepper tends to write software that is easy for him to read (one would hope) -- but is very difficult for anyone else to follow.

But you have to admit, his abort is pretty damn easy to read.

I admit it's heavily commented, and his usual tangle of preprocessor macros are blissfully absent.

But it contains hints of Drepperification, like the superfluous use of preincrement.

When I don't care about the result, I always write preinc/decrement too. Sure, it's superfluous on any non-braindead compiler (it should be able to see that you don't care about the result of a postincrement and elide the temporary), but it's just habit at this point. I fail to see how it reduces or changes readability though.

Sounds like you just have an axe to grind with Drepper.

I don't have anything personal against Drepper. I've never had any direct experience with him of any kind.

I thoroughly enjoyed his article about memory. He is obviously an extremely intelligent and knowledgeable guy.

I am afraid that he is too clever by half though, insofar as good code is clean and readable first, and clever second. Every time I've had an opportunity to interact with the glibc codebase I'm dismayed that such an important, core piece of software has been written so cleverly that it essentially can only be maintained by one guy.

That's unfortunate; I'll have to look at some glibc code sometime. Could be fun... or, as you point out, dismaying :-/

Ulrich Drepper is the maintainer of glibc; I don't know what the comment is intended to mean, though.

It's probably a quip referring to Ulrich Drepper, a kernel hacker whose personality seems to be quite controversial according to a quick Google search. I'd love to hear the GP explain it further though.

Is he a kernel hacker? I thought he was only a glibc hacker.

Very true. But glibc's abort() is IMHO ok.

And also not very robust. Sure, it'll halt execution -- of that one thread, anyway -- but if the program has a SIGABRT handler installed that doesn't exit, abort() will fail to do its job. It'd be nice to try just a little harder to kill the program.
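For comparison, ISO C and POSIX say abort() still terminates the process when a SIGABRT handler is installed and simply returns; only a handler that never returns (for example, one that calls longjmp) prevents termination. A minimal sketch of how one might check the returning-handler case on a POSIX system, with hypothetical function names:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A SIGABRT handler that does nothing and returns. */
static void returning_handler(int signo)
{
    (void)signo;
}

/* Hypothetical helper: fork a child that installs the handler above and
 * calls abort(). Returns 1 if the child was still killed by SIGABRT,
 * 0 if it ended some other way, -1 if fork/waitpid failed. */
int abort_terminates_despite_handler(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGABRT, returning_handler);
        abort();                  /* handler runs, then abort() finishes the job */
        _exit(0);                 /* not expected to be reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

On a conforming libc the function should return 1. This says nothing about an implementation that only dereferences a null pointer, which is the weaker design the comment above is complaining about.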

I liked the way they commented there. Reading the code with comments was like reading a comic book: "Still here? We are screwed!" :-)

Why is the outer while(1) loop needed? Seems redundant to me.

The link specifically asks that question -- there are some good guesses in the comments.

The one in line 87. It looks like it might have been added later because it's not even indented, but clearly it doesn't serve any purpose and it doesn't make the code more readable (you can tell the function never returns from the second while(1) loop).

> it's not even indented

It's an issue of mixed tab-space indentation, with tabs being displayed at 4 spaces instead of the 8 spaces they were intended to be.

You know... just in case.

Every programmer should implement an operating system and compiler. I did. Now get off my lawn.

