How can C Programs be so Reliable? (2008) (tratt.net)
251 points by adamnemecek on Dec 5, 2013 | 221 comments



The author makes a good point about the discipline imposed by not having exceptions. Programmers tend to write code in one of two modes: the quick-and-dirty mode where you consistently don't check return values, or the built-to-last mode where you consistently do. If you start in the first mode and have to fix a bug, you often have to add error checking all up and down the call chain from where it occurs to where it's handled, so you very quickly migrate toward the second mode. By contrast, programmers in exception-based languages often get to a state where they catch some exceptions in some places where their tests have found bugs, but large sections remain in quick-and-dirty mode. The code tends to keep running longer after the actual error, leaving latent bugs that are harder to debug when they actually surface. The kinds of errors that are characteristic of C, such as segfaults from bad pointer handling, have obvious effects and more often have obvious causes as well.

I'm not totally anti-exception, and I think manual memory management is only appropriate in an increasingly small set of situations. I'd rather program in Python than C when I have the choice, but I acknowledge that the Python code I write is probably a bit less robust than the C code I write. I'm probably not alone. Theory aside, in practice the greater prevalence of sloppy error handling in languages other than C seems to leave the field pretty level when it comes to robustness.


It depends on the problem you're trying to solve, I think.

Let's consider a command line application that fetches a URL, like wget. Without exceptions, you would check the return code of all the system functions you call (DNS, sockets, etc.), and if any of them fail you can't really recover. You just write out an error message and exit. With exceptions, you could wrap the whole thing and if any system function throws an exception, you print out the exception and exit. So it's much the same in this case.
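
A minimal sketch of the no-exceptions version of just the resolve-and-connect step (the helper name here is made up; the point is that every system call's return value is checked and failure means print-and-exit):

    #include <netdb.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>
    
    /* Hypothetical helper: resolve and connect to host:port, checking every
       system call and exiting with a message on any failure. */
    static int connect_or_die(const char *host, const char *port)
    {
        struct addrinfo hints, *res;
        int err, fd;
    
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
    
        err = getaddrinfo(host, port, &hints, &res);
        if (err != 0) {                              /* DNS lookup failed */
            fprintf(stderr, "dns: %s\n", gai_strerror(err));
            exit(EXIT_FAILURE);
        }
    
        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd == -1) {                              /* socket creation failed */
            perror("socket");
            exit(EXIT_FAILURE);
        }
    
        if (connect(fd, res->ai_addr, res->ai_addrlen) == -1) {
            perror("connect");                       /* connection failed */
            exit(EXIT_FAILURE);
        }
    
        freeaddrinfo(res);
        return fd;
    }
    
    int main(void)
    {
        int fd = connect_or_die("example.com", "80");
        /* ... send the request, read the response ... */
        close(fd);
        return 0;
    }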

Now put the URL fetch code into a larger application. With exceptions this is pretty easy - you can catch any exception thrown by your URL fetch code and then retry it a few times until it succeeds, or inform the user of the app. Without exceptions this is tougher as you need to unroll everything manually in order to retry.

Finally we have something complicated like running a simulation. If something gives an error, we probably don't want to stop the whole simulation. We'd want to check the specific error result and then fix the simulation and continue. In this case exceptions would add a lot of boilerplate code and would need to be handled just as carefully as error results (e.g., wrapping each call in its own try block and handling each error).

So "it depends" is very much the answer to whether you'd rather use exceptions or error results. In general I think exceptions are better as long as they're only used for actual error cases, as it gives the user of the functions more control over where and at what level to handle exceptions.


> Now put the URL fetch code into a larger application. With exceptions this is pretty easy - you can catch any exception thrown by your URL fetch code and then retry it a few times until it succeeds, or inform the user of the app. Without exceptions this is tougher as you need to unroll everything manually in order to retry.

Unless you have something like goroutines - or Erlang-style processes - and perhaps employ a moderately disciplined coding style. Retrying those is just a matter of relaunching them, isn't it?

Also, Common Lisp got exceptions mostly right. Many other languages...not so much, I guess.


> Without exceptions this is tougher as you need to unroll everything manually in order to retry.

I don't get why you think this is a problem. You can trivially "unroll" it by explicitly writing your functions to be single-exit and "unrolling" at the end before returning whatever your result is.

"Exception handling" without "real" exceptions is then also trivial: just check the appropriate error code and take whatever action is suitable at the time.


This is why I really like the callback pattern that node.js tends to use for all its I/O API calls... the callback always has the option for an error first. Generally, if it's a nested callback, bubble up early. Internally, the C code can just check the state and call the callback early if an error state arises.
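
A rough C sketch of that error-first convention (all the names here are invented, not any particular library's API): the callback takes the error code as its first argument, and the implementation bails out to the callback as soon as an error state arises.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    
    /* Error-first callback: err is 0 on success, an errno-style code otherwise. */
    typedef void (*read_cb)(int err, const char *data, size_t len, void *ctx);
    
    static void read_file(const char *path, read_cb cb, void *ctx)
    {
        FILE *f = fopen(path, "rb");
        char buf[4096];
        size_t n;
    
        if (f == NULL) {
            cb(errno, NULL, 0, ctx);     /* bubble the error up immediately */
            return;
        }
        n = fread(buf, 1, sizeof buf, f);
        if (ferror(f)) {
            int err = errno;
            fclose(f);
            cb(err, NULL, 0, ctx);       /* error state: call back early */
            return;
        }
        fclose(f);
        cb(0, buf, n, ctx);              /* success: err == 0, data follows */
    }
    
    static void on_read(int err, const char *data, size_t len, void *ctx)
    {
        (void)data; (void)ctx;
        if (err != 0) {                  /* check the error argument first */
            fprintf(stderr, "read failed: %s\n", strerror(err));
            return;
        }
        printf("read %zu bytes\n", len);
    }
    
    int main(void)
    {
        read_file("/etc/hostname", on_read, NULL);
        return 0;
    }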


C libraries don't just exit. A C executable could in some circumstances just fprintf() to stderr and exit(1), but usually even that requires sane deinitialization, so before exiting you will effectively end up back in main() from the lower levels.

C libraries generally handle their errors on some level, possibly jumping directly out from the lowest levels and then doing deinit before passing back an error code to the caller of the library function, or exiting in a nested fashion by deinitializing stuff and reporting failure to the caller. Whatever works for whichever codebase, but try any decently popular library, such as libcurl, and see if it just exits in the middle.


Exactly. Writing libraries in C means you have to be much more strict than for simple throwaway single-execution binaries.

What our company has ended up doing is wrapping most low-level functions with the common code around them. For example, fopen() may fail with EINTR if the process received a signal just as it was opening a file (we use signals to tell processes to reread their config, so this does happen, albeit rarely), so the wrapper calls fopen() in a do/while loop that repeats on EINTR (and a few others). It saves you from writing the same 50-odd lines each time you want to simply open a file.
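
A stripped-down sketch of that kind of wrapper (the name is made up, and the real one presumably handles more cases than just EINTR):

    #include <errno.h>
    #include <stdio.h>
    
    /* Wrapper around fopen(): retry when the underlying open() was interrupted
       by a signal (errno == EINTR), so callers never repeat this loop. */
    static FILE *xfopen(const char *path, const char *mode)
    {
        FILE *f;
    
        do {
            f = fopen(path, mode);
        } while (f == NULL && errno == EINTR);
    
        return f;    /* NULL here is a real failure; errno says why */
    }

Callers then just check for NULL once, instead of re-implementing the retry loop everywhere.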

In an exception-led world you'd still have to do the same thing, but with different syntactic sugar. You can't just fail and bubble up the exception without dealing with the few exceptions you must handle and retry, and you can't just retry on every exception, as most will fail again and again. You end up writing code to do the same thing, just in different styles.

Checking the return values of every single function call, and dealing with it, can make the code verbose (one line of code and then 10+ lines dealing with errors) but it is worth it in the end for bulletproof programs. Also code where you have to undo all of the work already done in the function up until that point, which can get a little tedious, i.e.:-

    APP_RET makefoo( char *fname )
    {
      char *foo,*bar;
      thingy_t *qux;
      size_t len;
    
      ASSERT( fname );
    
      len = strlen( fname );
    
      foo = malloc( len+1 );
      if( foo == NULL ) {
         return( ENOMEM );
      }
      bar = malloc( (2*len)+1 );
      if( bar == NULL ) {
         free( foo );
         return( ENOMEM );
      }
      qux = malloc( sizeof( thingy_t ) );
      if( qux == NULL ) {
         free( foo );
         free( bar );
         return( ENOMEM );
      }
      ...
      return( APP_OK );
    }
It's the job of anything calling makefoo() to deal with the various errors it can return, but the idea is to return the minimum number of unique error codes necessary, to avoid a proliferation of error codes in the higher and higher functions. Many calling functions will only really care about success or failure, and will just use the return code to log the reason for the failure.

The wrappers help deal with many situations; did fwrite() write all of the bytes we wanted or just some of them? Well, our wrapper around fwrite will handle short writes and repeat the call depending on the result of ferror().
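
Something like this, in stripped-down form (the real wrapper presumably does more; the name is invented):

    #include <errno.h>
    #include <stdio.h>
    
    /* Wrapper around fwrite(): keep going after short writes, retry when the
       stream error was just an interrupted call, and only give up on a real
       write error. Returns 0 on success, -1 on failure. */
    static int xfwrite(const void *buf, size_t len, FILE *f)
    {
        const char *p = buf;
    
        while (len > 0) {
            size_t n = fwrite(p, 1, len, f);
            p += n;
            len -= n;
            if (len > 0 && ferror(f)) {
                if (errno == EINTR) {    /* interrupted by a signal: retry */
                    clearerr(f);
                    continue;
                }
                return -1;               /* genuine write error: give up */
            }
        }
        return 0;
    }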


> Also code where you have to undo all of the work already done in the function up until that point, which can get a little tedious, i.e.:-

Sorry pal, you're doing it wrong. At least, this is not the way I've done it and seen it done in large C code bases.

You're supposed to have only one return statement, and one block that frees everything. For example, initialize `foo`, `bar`, and `qux` to `NULL`; then testing these pointers for `NULL` de facto tells you how far you got in the function, and which buffers need to be freed. Just before your one single return statement (can't emphasize this enough) you call `free` on all of them regardless of success or failure. It's much more composable than what you described - allocations for `foo`, `bar`, `qux` can fail, the ones not yet allocated at that point in time will be `NULL`, and `free(NULL)` is a harmless no-op.

None of this business of "I've got to return now, let's see, how many of these buffers do I need to release at this point in time?", with varying amounts of the same free statement appearing redundantly. Write the cleanup block once when the pointers are about to fall out of scope, have it able to run in both success and failure cases and be done with it. Think of it as a more manual RAII if you like.

As for what to replace those early `return`s with, the two common schools are `goto` into the cleanup block, or repeatedly checking some kind of failure status variable before performing new actions.
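
For concreteness, the goto variant of the same makefoo() might look roughly like this (assuming foo/bar/qux are temporaries here; anything handed off to the caller would be set to NULL before the cleanup block so it isn't freed):

    APP_RET makefoo( char *fname )
    {
      char *foo = NULL, *bar = NULL;
      thingy_t *qux = NULL;
      APP_RET ret = APP_OK;
      size_t len;
    
      ASSERT( fname );
    
      len = strlen( fname );
    
      foo = malloc( len+1 );
      if( foo == NULL ) { ret = ENOMEM; goto cleanup; }
      bar = malloc( (2*len)+1 );
      if( bar == NULL ) { ret = ENOMEM; goto cleanup; }
      qux = malloc( sizeof( thingy_t ) );
      if( qux == NULL ) { ret = ENOMEM; goto cleanup; }
    
      /* ... do the real work ... */
    
    cleanup:
      /* single cleanup block: free(NULL) is a no-op, so whichever
         allocations succeeded are released here, in success and
         failure cases alike */
      free( qux );
      free( bar );
      free( foo );
      return( ret );
    }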


What you've described looks just as messy to my eyes, and I've seen all different ways to do this.

You can nest under if(thing!=NULL) but then you end up with indentation creep.

You can use the goto pattern if you like but some folks will tell you that goto's are never, ever to be used.

Or you can do what's done above. When it comes down to it the logic is basically identical and it's just down to code formatting.


> You can nest under if(thing!=NULL) but then you end up with indentation creep.

I am not suggesting any nesting of anything. Repeatedly checking at the same indentation level.

> You can use the goto pattern if you like but some folks will tell you that goto's are never, ever to be used.

What matters more to you, getting stuff right or repeating adages that other people have said out of context? `goto` is probably the cleanest way to do it in plain C.

> When it comes down to it the logic is basically identical and it's just down to code formatting.

No actually, it is quite a bit more than style, doing it the way alexkus has it is much much much less maintainable when it's done all over a large code base. He's got `foo`, `bar`, `qux`. What if a year later some future maintainer totally unfamiliar with the code needs to add another one? Then it's up to that person to find all of the exit paths, make sure that `foo`, `bar`, `qux` and the new thing are freed in all cases. Doing that is a lot harder if the free statements are all over the place and repeated several times instead of in a single block.


>> I am not suggesting any nesting of anything. Repeatedly checking at the same indentation level.

This could be considered wasteful. You end up checking if something is NULL; if it is, you jump to the cleanup code and immediately check again. Nesting may be more elegant.

>> What matters more to you, getting stuff right or repeating adages that other people have said out of context? `goto` is probably the cleanest way to do it in plain C.

Writing code in a way consistent with the team I work with and the established codebase.

>> What if a year later some future maintainer totally unfamiliar with the code needs to add another one?

Then they need to read what the function is doing and understand it before they mess with the code, just like in any other situation.

>> Doing that is a lot harder if the free statements are all over the place and repeated several times instead of in a single block.

More verbose certainly, but if the code is written in small, discrete functional blocks then it shouldn't really impact much.

He repeats frees, you repeat tests. An indenter would repeat neither but has indent readability to consider.

Frankly, don't trust anyone that tells you that they have the one true way to do things.


> This could be considered wasteful.

It's true that there is an extra compare. I think it's a small cost for maintainable code.

> Then they need to read what the function is doing and understand it before they mess with the code, just like in any other situation.

Sounds great, however, the time they spent figuring out your haphazard, repetitive and confusing free() statements could be better spent somewhere else. When I said that my way is more "composable" I meant that adding, removing, or re-ordering operations is a cheaper operation for the programmer. Follow this style and you'll spend less time trying to read and figure out code because it will fit the existing convention and will be bleedingly obvious where the buffers are released.

I may have been a bit hyperbolic with calling this "correct" or "supposed"; however, I didn't make up these conventions. I advocate them because I have seen them work really well and I have seen yours create mounds of inflexible spaghetti.


>> It's true that there is an extra compare. I think it's a small cost for maintainable code.

And I don't think it's the only way to achieve maintainable code. That's all I'm saying.

>> Sounds great, however, the time they spent figuring out your haphazard, repetitive and confusing free() statements

If the product's coding standard is that you have a small block of this at the top before you start actually writing the function, then it doesn't take any more time for a coder to understand than any other way around; the important part is consistency.

>> I advocate them because I have seen them work really well and I have seen yours create mounds of inflexible spaghetti.

What's mine? I'm not advocating any of them, just sticking to one. I still don't think yours is any better than (for instance) -

    type function()
    {
        type2 *thing = (cast) malloc (size);
        type2 *thing2;

        if (thing)
        {
            thing2 = (cast2) malloc (size2);
            if(thing2)
            {
                // do some stuff here
                // and some more stuff
                ...
                free(thing2);
            }
            free(thing);
        }
        return code;
    }
A pattern which auto-unrolls as it exits without the need for more tests, and to me is every bit as maintainable as the goto cleanup; pattern.

Spaghetti code (to me) is more about encapsulation and modularisation failures than it is about the content of any individual function.


So you avoid a few null checks but you introduce the indentation problem you mentioned a few comments ago.

I think we can agree this is better than the alexkus example but there are tradeoffs involved.

(PS: I am bothered at how you cast the return value of malloc; this is C we're talking about, right, not some other language with more plus signs? :-))


The logic may be identical but the scope for programmer error is not. Consider the possible mistakes someone can write by doing it each different way.


I've seen errors in all the ways considered here.


But what if the exception was in your code and not the system call? This is incredibly dangerous.

Monads are perfect for this. Really combines the best of both worlds without all the unspecified/undefined behavior.


You could always write Monads in C... http://blog.sigfpe.com/2007/02/monads-in-c-pt-ii.html


Then it should be a different exception, and if it matters, you should be treating that class of exceptions differently. Sometimes it doesn't matter.


Your experience with languages with exceptions seems to come from people who misuse them. Randomly placing catch clauses around in the code is not good practice, even if perhaps a majority of all programmers in safe languages code that way. That causes latent bugs that are incredibly hard to debug.

The trick is to almost never ever catch exceptions. For example, in his post he describes a bug caused by accessing beyond allocated memory. That would in a safe language immediately have caused an ArrayIndexOutOfBoundsException (or equivalent) which the programmer would have fixed. In C, errors are often "silent" because you can't be bothered to check the return value of every call to printf. In a correctly coded program in a safe language errors are never ever silent.


"Your experience with languages with exceptions seem to come from people who misuse them."

Yes, it does, because people who misuse them seem to be a majority. I think that's an important point in language (or for that matter any kind of) design. You can't just look at how well things work for the experts (a mistake made by both Common Lisp and C++ in different ways). Nor can you just consign all non-experts to some straw-man-ish "blub" category. You have to look at what skill level is required to make the benefits outweigh the costs, which could be anywhere along the skill continuum, and compare that to the actual skill distribution of the programmers you have (or will have after hiring and/or training).

The sad fact seems to be that the tipping point for exceptions is at a point that leaves most programmers on the bad side. The same is almost certainly true for any kind of meta-programming. It might even be true for closures and continuations. "Primitive" languages lacking features like exceptions or GC surely do trip up the true beginners, but leave fewer traps further along the path.


Somewhat OT, but I find it curious that you would, in the same breath so to speak, put closures and continuations in the same category of "hard things". To me, continuations are still pretty mysterious, but closures are a pretty simple, usable idea.


Continuations are extremely counterintuitive and should only be used as low-level building blocks. Exceptions are actually a kind of continuation, and in Common Lisp the compiler uses continuations to implement exceptions (i.e. "conditions") and the restarts system. It is common in functional languages for the compiler to use continuation passing style as an intermediate representation, where instead of returning from a function you will invoke the "return continuation" that was passed to the function as an argument (i.e. it is the code that follows the function call -- where you return to).
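
A toy sketch of what continuation-passing looks like, translated into plain C (nothing here is from any particular compiler, it's just the shape of the idea):

    #include <stdio.h>
    
    /* Continuation-passing style: instead of returning a value, each
       function takes "what happens next" as a function pointer and
       invokes it with the result. */
    typedef void (*cont)(int result);
    
    static void add(int a, int b, cont k)
    {
        k(a + b);                    /* "return" by invoking the continuation */
    }
    
    static void print_result(int result)
    {
        printf("result = %d\n", result);
    }
    
    int main(void)
    {
        add(2, 3, print_result);     /* compute 2 + 3, then continue here */
        return 0;
    }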


> The trick is to almost never ever catch exceptions.

I strongly disagree. Not catching exceptions leaks abstraction layers. If I have a Prefs::save() method, I don't want it throwing a DiskFullException when the Prefs class is an abstraction of a preferences datastore. I don't care what is the final store, as long as it fits the abstraction. A well designed abstraction will catch and wrap the exception into something that makes sense at that level of abstraction, never leaking implementation details.


"A well designed abstraction will catch and wrap the exception into something that makes sense at that level of abstraction, never leaking implementation details."

This makes recovering from the error rather difficult. If the problem is that a disk is full, I need to do something about the disk being full (maybe ask the user to delete some files). If the problem is that the disk was disconnected, I need to do something else about it.

The real issue here is that you are thinking of exceptions as they exist in languages like C++ and Java, where you destroy your call stack in order to locate the exception handler. Such languages make the difficult problem of error recovery that much harder. Common Lisp does it better: the handler is allowed to invoke restarts (if they exist), which the function that signaled the error sets up. This encapsulates things very neatly. The disk was full? The error is signaled by write, which sets up a restart that tries to continue writing to the disk. At the next level of abstraction, you might have a restart to remove the half-written record from your disk. In theory, you might only need one top-level exception handler, which interacts with the user as needed to recover from errors (or politely informs them that no recovery is possible).


I'm not familiar with the concept of restarts. I do concede, though, that wrapping exceptions limits recovery. Either the library can recover on its own, or it can't fulfill its designed service.

On the other hand, the most revered architectures we have aren't leaky. You don't see network stack code trying to recover from Ethernet collisions at the IP level, or app logic trying to salvage an SQL transaction when a constraint has been tripped. The price for non-leaky abstractions is not zero, but the gains are also definitely not zero.


I don't think you understand proper exception handling. Catching and wrapping DiskFullException is pretty pointless because what are you going to do about it? Nothing. It's nonsensical for a preferences class to deal with that situation. Instead let it bubble up so that the caller has the option of handling it, for example by showing a dialog "Delete temporary files and try again?"

You'll never be able to catch all exceptions. In addition to your DiskFullException, you have PermissionDeniedException, NFSException, NullPointerException, InvalidFilenameException, PathTooLongException ad infinitum. By trying to be "nice" and wrap all those exceptions you are actually doing your API users a great disservice.


You state a lot of half-truths ("you'll never be able to catch all exceptions"), don't justify your assumptions, and didn't address the core of my argument (abstraction leakage). In the hurry to insult me, did you actually read my argument?


Because the core of your argument was based on a stupid rule I don't agree with! You: "Throwing DiskFullException results in abstraction leakage" Me: "No it doesn't.."


My bad. I should have realized earlier I was discussing with a child.

Please accept my apologies.


To your second point: all of these errors should extend IOException ;)


Libraries that wrap exceptions into something else often do a disservice to their users. In this `Prefs::save()` example, what should the wrapping library throw? A "SaveFailedException"? That's more abstract, however now I would need to go check the source code of the library, find where the exception has been converted to something else, comment out the "try/catch" statement, and rerun the program. Then I can finally know what really happened and do something to fix it.


If done right, rethrowing exceptions does not affect the ability to debug the code. You don't throw a pristine new exception on the spot. You wrap the exception in a new one, effectively maintaining all the information. Coded recoverability suffers, but debugging ability does not. You have the same debugging information in the rethrown exception as you did in the bubbled up one.


"Discipline" is indeed the right word.

I was taught C at Epitech. A single segfault, no matter how insidious, was a valid reason to render a whole project NULL. We often had evaluations run with LD_LIBRARY_PATH=malloc_that_fails.so or just piping /dev/urandom to stdin...

Needless to say, calling exit() when a call to malloc() failed wasn't an acceptable recovery routine.


> Needless to say, calling exit() when a call to malloc() failed wasn't an acceptable recovery routine.

What do you do when malloc fails? A bit of graceful shutdown and logging seems like it would be in order, but otherwise how do you keep rolling if mallocs start failing? It seems to me like that would indicate something has gone unusually wrong and full recovery is futile.


I grew up using the Amiga, when having memory allocation fail was routine (a standard Amiga 500, for example, came with 512KB RAM and was rarely expanded to more than 1MB, so you would run out of memory).

What you do when malloc() fails depends entirely on your application: If a desktop application on the Amiga would shut down just because a memory allocation failed, nobody would use it. The expectation was you'd gracefully clean up, and fail whatever operation needed the memory, and if possible inform the user to let him/her free up memory before trying again.

This expectation in "modern" OSs that malloc never fails unless the world is falling apart really annoys me - it for example leads to systems where we use too much swap, to the point where systems often slow down or become hopelessly unresponsive in cases where the proper response would have been to inform the user - the user experience is horrendous. Swap is/was a kludge to handle high memory prices; having the option is great, but most of the time when I have systems that dip into swap, it indicates a problem I'd want to be informed about.

But on modern systems, most software handles it so badly that turning swap off is often not even a viable choice.

Of course there are plenty of situations where the above isn't the proper response, e.g. where you can't ask the user. But even for many servers, the proper response would not be to fall over and die if you can reasonably dial back your resource usage and fail in more graceful ways.

E.g. an app server does a better job if it at least provides the option to dynamically scale back the number of connections it handles rather than failing to provide service at all - degrading service or slowing down is often vastly better than having a service fail entirely.


Isn't fork the real offender, which requires Linux to overcommit by default? Disabling swap shouldn't affect that, right? Just makes your problem happen later, in a somewhat non-deterministic way.

Without fork, what reason do you not disable swap? I can only think of an anonymous mmap where you want to use the OS VM as a cache system. But that's solved easily enough by providing a backing file, isn't it?


> Isn't fork the real offender, which requires Linux to overcommit by default?

fork() != Linux.

Each UNIX system does it in its own way.


Saying that fork forces overcommit is strange. Fork is just one of the things that allocates memory. If you don't want overcommit fork should simply fail with ENOMEM if there isn't enough memory to back a copy of all the writable memory in the process.


I meant the practical considerations of fork means overcommitment is needed in many cases where it otherwise wouldn't be needed. If you fork a 2GB process but the child only uses 1MB, you don't want to commit another 2GB for no reason.


> Isn't fork the real offender, which requires Linux to overcommit by default?

Maybe I'm missing something, but how does fork require overcommitment? When you fork, you end up with COW pages, which share underlying memory. They don't guarantee that physical memory would be available if every page were touched and required a copy; they just share underlying physical memory. Strictly speaking, very little allocation has to happen for a process fork to occur.


If there's no overcommit, each of those COW pages needs some way of making sure it can actually be written to. Isn't that literally the point of overcommit? Giving processes more memory than they can actually use on the assumption they probably won't use it? And Windows takes the different approach of never handing out memory unless it can be serviced (via RAM or pagefile).

What am I missing? (I know you know way more about this than I do.)


When you fork a process, your application's contract with the kernel is such: existing pages will be made accessible in both the parent and the child; these pages may or may not be shared between both sides -- if they are, then the first modification to a shared page will cause an attempt to allocate a copy for that process; execution flow will continue from the same point on both sides. That's pretty much the extent of it (ignoring parentage issues for the process tree). The key thing here is the 'attempt' part -- nothing is guaranteed. The kernel has never committed to giving you new pages, just the old ones.

I don't personally see this as an overcommit, since the contract for fork() on Linux doesn't ever guarantee memory in the way that you'd expect it to. But in all honesty, it's probably just a matter of terminology at the end of the day, since the behavior (write to memory -> process gets eaten) is effectively the same.

Edit: Though I should note, all of the overcommit-like behavior only happens if you are using COW pages. If you do an actual copy on fork, you can fail with ENOMEM and handle that just like a non-overcommitting alloc. So even in the non-pedantic case, fork() really doesn't require overcommit, it's just vastly less performant if you don't use COW.


Oh. I was under the impression that if overcommit was disabled then forking a large process won't work if there's not enough RAM/swap available, regardless of usage.


So out of memory failure won't happen when you malloc, it will happen when you assign a variable in a COW page. This somewhat invalidates the idea of a failing malloc.


The problem is once you touch those memory pages, total memory usage increases even if you don't call malloc.


The most common way to indicate that an error has occurred in a C function is to return non-zero. If this is done consistently, and return values are always checked, an error condition will eventually make its way up to main, where you can fail gracefully.

For example:

    int a(void)
    {
        ...
        if (oops) {
            return 1;
        }
        return 0;
    }

    int b(void)
    {
        if (a(...) != 0) {
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        if (b(...) != 0) {
            exit(EXIT_FAILURE);
        }
        return EXIT_SUCCESS;
    }
(This means that void functions should be avoided.)


It's not so simple in real life. I use this style of error-handling for only one type of project: a library with unknown users. In that case, as I can't make assumptions about the existence of better error handling systems, it gives the most flexible result. But at a price: I now have to document the error codes, and I had damned well better also provide APIs that allow my client to meaningfully recover from the error.

In most projects I have worked on, this type of error handling is completely inadequate. Think multithreaded applications. The code that needs to handle the error your code just generated isn't in the call stack. This happens very often in my experience, and I have found that the best solution is to post some kind of notification message rather than returning an error code. This creates a dependency on the notification system though, so it's not always the correct solution.

The thing that I dislike the most in your example was when you propagated the error from function a out of function b. My most robust code mostly uses void functions. Error codes are only used in cases where the user can actually do some meaningful action in response to the error, and frankly this is rarely the case. Instead I try as much as possible to correctly handle errors without propagation. It frees up the user of my APIs from having to worry about errors, and in my opinion this should be a design goal of any API.


What's the point of propagating all errors way up to main if you're only going to exit anyway? I think we know how to indicate errors in C functions. What to do about specific errors, in this case allocation failures, is a more interesting question.


If malloc() fails now, it might succeed again later. So you can just go on doing everything else you were doing, then try the memory-hungry operation again in the future.

For example, this could be important in systems where you might be controlling physical hardware at the same time as reading commands from a network. It's probably a good idea to maintain control of the hardware even if you don't have enough memory right now to read network commands.
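
A minimal sketch of that idea (all the function names here are hypothetical stand-ins): the control loop never stops, and the memory-hungry work is simply skipped and retried on a later pass.

    #include <stdlib.h>
    
    /* Hypothetical stand-ins for the real system. */
    static void service_hardware(void) { /* keep actuators/watchdog happy */ }
    static int  command_waiting(void)   { return 0; }
    static void handle_command(char *buf, size_t len) { (void)buf; (void)len; }
    
    int main(void)
    {
        enum { CMD_BUF_SIZE = 64 * 1024 };
    
        for (;;) {
            service_hardware();              /* must never stop */
    
            if (command_waiting()) {
                char *buf = malloc(CMD_BUF_SIZE);
                if (buf == NULL) {
                    continue;                /* no memory right now: skip the
                                                network work and retry on a
                                                later pass */
                }
                handle_command(buf, CMD_BUF_SIZE);
                free(buf);
            }
        }
    }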


This is a pet peeve of mine with modern applications: So many of them just throw their metaphorical hands in the air and give up.

Prior to swap and excessive abuse of virtual memory this was not an option: If you gave up on running out of memory, your users gave up on your application. On the Amiga, for example, being told an operation failed due to lack of memory and given the chance to correct it by closing down something else was always the expected behaviour.

But try having allocations fail today, and watch the mayhem as any number of applications just fall over. So we rely on swap, which leaves systems prone to death spirals when the swap slows things down instead.

If embedded systems programmers wrote code the same way modern desktop applications developers did, we'd all be dead.


Doesn't it duplicate effort to put the responsibility of checking for available memory on individual applications?

I think, in most computing environments, it should be the operating system's responsibility to inform the user that free memory is running out. Applications should be able to assume that they can always get memory to work with.

I think the swap is an extremely sensible solution, in that executing programs slowly is (in most environments, including desktop) better than halting programs completely. It provides an opportunity for the user to fix the situation, without anything bad happening. Note that the swap is optional anyway, so don't use it if you don't like it.

Comparing modern computing environments to the Amiga is laughable. It's not even comparable to modern embedded environments, because they serve(d) different purposes.

I'm a hobbyist C application/library developer who assumes memory allocation always works.


Most computing environments don't have a user to speak of, and the correct response of an application to an out of memory error could range from doing nothing to sounding an alarm.

As a user, I find it incredibly frustrating when my old but indispensable music software runs out of address space (I have plenty of RAM) and, instead of canceling the current operation (e.g. processing some large segment of audio), just dies with a string of cryptic error dialogs. The best thing for the user in this case is to hobble along without allocating more memory, not to fail catastrophically by assuming that memory allocation always works.

Swap is not a good solution because when there's enough swap to handle significant memory overuse, the system becomes unresponsive to user input since the latency and throughput to swap are significantly slower than RAM.


I think most computing environments do have a user of sorts, if you consider a "user" to be something that can be notified and can act on such notifications (e.g. to close applications).

Your music software's problem seems to be a bad algorithm - not that it doesn't check the return values of the `*alloc` functions. As you say, it should be able to process the audio in constant space. While I assume that I can always acquire memory, I do still care about the space-efficiency of my software.

I must admit I've never seen my system depending on swap, so I don't know how bad it is. But, if you have 1GB of on-RAM memory already allocated, wouldn't it only be new processes that are slow?

Also, I'd again point out that if you don't like swap, you don't have to have one.


> if you have 1GB of on-RAM memory already allocated, wouldn't it only be new processes that are slow?

No - the memory subsystem will swap out pages based on usage and some other parameters. A new application would most likely result in already running applications' least-used pages being swapped out.


I must admit I've never seen my system depending on swap, so I don't know how bad it is.

Just for fun, try deliberately creating a swap storm some time. Then try to recover from it :-). Do this on a system that doesn't have other users.


Conversely, if app developers wrote the same code that embedded systems programmers do, we'd never have any apps to use. It's just not worth the trade-off - moreover, telling a user to free memory is a losing battle.


> If embedded systems programmers wrote code the same way modern desktop applications developers did, we'd all be dead.

If Boeing made passenger jets the way Boeing made fighters, we'd all be dead, too, but try telling a fighter pilot that they should do their job from a 777. It's two very different contexts.

Besides, some errors can't be recovered from. What do you do when your error logging code reports a failure? Keep trying to log the same error, or do you begin trying to log the error that you can't log errors anymore?


>What do you do when your error logging code reports a failure? Keep trying to log the same error, or do you begin trying to log the error that you can't log errors anymore?

First you try to fix the problem of the logging system by running a reorganisation routine (delete old data, ...) or reinit the subsystem. If that does not work AND if logging is a mandatory function of your system, you make sure to change into a safe state and indicate a fatal error state (blinking light, beeper, whatever). If the logging is such an important part of the system surrounding your system, it might take further actions on its own and reinit your system (maybe turn your system off and start another logging system). There is no exit. You never give up.


It's an even better idea to make the hardware fail safe, so you can let the program die and not worry too much about it. This does not apply in all cases (cars), but it does apply in many (trains, like a deadman switch for the computer). For a vivid example of why this is an important approach, read about the Therac-25.


Absolutely, but I would still write my software as though the hardware could fail deadly unless doing so made the system less reliable.


To make sure you free() all your previous allocations on the way down. You can choose not to, it's "kinda" the same (can't remember the exact difference) but it's dirty and people with OCD just won't accept it.

(Disclosure: I got OCD too, this is not meant to make C development hostile to people with OCD.)


If your program is going to exit anyway, there's no need to call free() on allocated memory. The operating system will reclaim all of the memory it granted to your process when it exits. Remember that malloc lives in your process, not the kernel. When you ask for memory from malloc, it is actually doling out memory that you already own - the operating system granted it to you, when malloc requested it. And malloc requested it because you asked for more memory than it was currently managing.

If your intention is to continue running, then of course you want to call free() on your memory. And this certainly makes sense to do as you exit functions. But if you're, say, calling exit() in the middle of your program, for whatever reason, you don't need to worry about memory.

Other resources may be a problem, though. The operating system will reclaim things it controlled, and granted to your process - memory, sockets, file descriptors and such. But you need to be careful about resources not controlled by the operating system in such a manner.


Multiple reasons.

Some kernels may not get memory back by themselves and expect each application to give it back. We're lucky that the kernels we use every day do, but we may one day have to write for a target OS where it's not the case. Just hoping "the kernel will save us" is a practice as bad as relying on undefined behaviors.

If you're coding correctly, you have exactly as many malloc()s as you have free()s, so when rewinding the stack to exit, your application is gonna free() everything anyway.

Speaking of resources, what about leftover content in FIFOs or shm threads that you just locked?

And when you got OCD, you're only satisfied with this:

    $ valgrind cat /var/log/*
    ...
    ==17473== HEAP SUMMARY:
    ==17473==     in use at exit: 0 bytes in 0 blocks
    ==17473==   total heap usage: 123 allocs, 123 frees, 2,479,696 bytes allocated
    ==17473==
    ==17473== All heap blocks were freed -- no leaks are possible
    ==17473==
    ==17473== For counts of detected and suppressed errors, rerun with: -v
    ==17473== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
(cat was an easy choice and all I got on this box, but I've had bigger stuff already pass the "no leaks are possible" certification)


First, you sometimes do want to call exit deep within a program. That is the situation I am addressing, not normal operation. Of course you want to always free unused memory and prevent memory leaks. I am quite familiar with the importance of memory hygiene, and have even written documents admonishing students to use valgrind before coming to me for help: http://courses.cs.vt.edu/~cs3214/fall2010/projects/esh3-debu...

Second, please re-read my last sentence. I specifically addressed things that the kernel does not reclaim. This would also include System V shared memory segments and the like. You must manage these yourself, and it is always messy. Typically, rather than calling exit directly, you're going to invoke a routine that knows about all such resources, frees them, then calls exit. But you still don't need to unwind your stack back to main.

Third, the kernel always gets back all of its memory that was granted through conventional means. That's what operating systems do. I think you have a fundamental misunderstanding of what malloc is, and where it lives. Malloc is a user-level routine that lives inside of your process. When you call malloc, it is granting you memory that you already own. Malloc is just a convenience routine that sits between you and the operating system. When you say malloc(4), it does not go off and request 4 bytes from the operating system. It looks into large chunks of memory the operating system granted to it, and gives you some, updating its data structures along the way. If all of its chunks of memory are currently allocated, then it will go ask the operating system for memory - on a Unix machine, typically through a call to brk or mmap. But when it calls brk or mmap, it will request a large amount of memory, say a few megabytes. Then, from that large chunk, it will carve out 4 bytes for you to use.

(This is why off-by-one errors are so pernicious in C: the chances are very good that you actually do own that memory, so the kernel will happily allow you to access the value.)

Now, even if you are a good citizen and return all memory to malloc, the operating system still has to reclaim tons of memory from you. Which memory? Well, your stacks and such, but also all of that memory that malloc still controls. When you free memory back to malloc, malloc is very unlikely to then give it back to the operating system. So all programs, at exit, will have memory that they own that the kernel will need to reclaim.


They say memory is the second thing to go. ;) Unfortunately, the OS doesn't know how to remove files or modify database entries that also represent program state, or properly shut down connections to other machines. Proper unwinding is still necessary.


For the third time, I specifically addressed resources that are not controlled by the operating system.


Sane cleanup. Close any open resources, especially interprocess visible resources. Resources entirely in your process, such as memory, will just get freed by the OS; a file might want to be flushed, a database properly closed. Likely, in the frame where you're out of memory, you won't have the context to know what should be done: that is most likely a decision of your caller, or their caller…


This is a brilliant example of why you'd want exceptions. Look at what you're doing for error handling, manually every time.

Exceptions do the exact same thing, except: 1) automatically 2) type-safe 3) allow you to give extra detail about the error 4) tools can actually follow the control flow and tell you what's happening and why 5) debuggers can break on all exceptions. Try breaking on "any error" in your code (I do know how C programmers "solve" that: single-stepping. Ugh)

In this case, they are better in every way.

This is almost as bad as the code in the Linux kernel and GNOME where they "don't ever use C++ objects!", and proceed to manually encode virtual method tables. And then you have 2 object types that "inherit" from each other (copy the virtual method table) and then proceed to overwrite the wrong indices with the overridden methods (and God forbid you forget to lock down alignment, resulting in having different function pointers overwritten on different architectures). Debugging that will cost you the rest of the week.

When it comes to bashing exceptions, it would be better to give people the real reason C++ programmers hate them: the major unsolvable problem you'll suddenly run into when using them. In C and C++ you can use exceptions XOR not use exceptions.

This sounds like it's not a big deal, until you consider libraries. You want to use old libraries? No exceptions for you! (unless you rewrite them) You want to use newer libraries: you don't get to not use exceptions anymore! You want to combine the two? That's actually possible, but if any exception-using library interacts with a non-exception library in the call stack, boom.

Exceptions are a great idea, but they don't offer a graceful upgrade path. Starting to use exceptions in C++ code is a major rewrite of the code. I guess if you follow the logic of the article that "would be a good thing", but given ... euhm ... reality ... I disagree. Try explaining "I'm adding exceptions to our code, rewriting 3 external libraries in the process" to your boss.


    This is a brilliant example of why you'd want exceptions. Look at what you're doing for error handling, manually every time.
You say that like safely handling exceptions is trivial. Exceptions are emphatically not "better in every way", they are a mixed bag. They offer many clear benefits (some that you have described here), but at the cost of making your code more difficult to reason about. You essentially end up with a lot of invisible goto's. Problems with exceptions tend to be much more subtle and hard to debug.

I'm not against them at all, and often I prefer them, but there are certainly downsides.


Also, exceptions are faster.

There is a lot of comparing and branching going on when the program always checks return codes. Assuming zero-cost exceptions, there is only overhead in the failure case.


I also find it very disingenuous of the pro-exceptions post to claim that these mazes of ifs are easy to navigate. In his example that is sort of true. When you're using actual real data to make the comparison it's easy to introduce very hard-to-trace bugs in them.

Once I had two things to check, one being time, and as you know that means 8 cases. You have to pick one to check first, and I picked the non-time-based check to check first. That means that I suddenly didn't check all cases anymore:

  if (currentTime() < topBound) {
    if (some other condition) {
      if (currentTime() > lowerBound) {
        // at this point you of course do NOT know for sure that currentTime < topBound. Whoops.
      }
    }
  }
(these look like they can be trivially merged. That's true if you look at just these lines, it becomes false if you look at the full set of conditions).


I don't get the sense that there was any attempt to recover from errors - it sounds more like they were enforcing that error checking occurred, by replacing `malloc` with one that just returned `NULL` always. It sounds like the goal was to make sure that one didn't assume `malloc` would always succeed and just use the memory.

Indeed, recovery is basically futile in this case and your program is going to shut down pretty quickly either way. Maybe you'll get the chance to tell the user that you ran out of memory before you die, which seems polite.


In systems that overcommit memory (like Linux), malloc() can return non-NULL and then crash when you read or write that address because the system doesn't have enough real memory to back that virtual address.


Even in Linux, when set to overcommit, malloc() can still return NULL if you exhaust the virtual address space of your process, though I expect it's much less likely now on 64-bit platforms.
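
A toy program that shows both cases (Linux-specific behaviour, so treat it as a sketch; whether the allocation fails up front or only when touched depends on the overcommit setting and the address space available):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* With vm.overcommit_memory=1 the malloc() below succeeds no matter what;
       with overcommit disabled (or the address space exhausted) it returns
       NULL up front. When it does succeed, the process only dies once the
       pages are actually touched by memset(). Don't run this on a box you
       care about. */
    int main(void)
    {
        size_t huge = (size_t)64 * 1024 * 1024 * 1024;   /* 64 GiB */
        char *p = malloc(huge);
    
        if (p == NULL) {
            puts("malloc failed up front");
            return 1;
        }
        puts("malloc succeeded; touching the pages...");
        memset(p, 1, huge);    /* faults in real pages; this is where it can die */
    
        free(p);
        return 0;
    }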


What if your application is the fly-by-wire for an airliner? Can you imagine that there might be better options than just calling abort(3)?


Yes, a better option is to make sure this error cannot happen, by making sure the program has enough memory to begin with. Fly-by-wire shouldn't need unbounded memory allocations at runtime.

There are some applications where you can try to recover by freeing something that isn't critical, or by waiting and trying again. Or you can gracefully fail whatever computation is going on right now, without aborting the entire program. But these are last resort things and will not always save you. If your fly-by-wire ever depends on such a last resort, it's broken by design :-)


From what I understand, in those sort of absolutely critical applications the standard is to design software that fails hard, fast, and safe. You don't want your fly-by-wire computer operating in an abnormal state for any amount of time, you want that system to click off and you want the other backup systems to come online immediately.

The computer in the Space Shuttle was actually 5 computers, 4 of them running in lockstep and able to vote out malfunctioning systems. The fifth ran an independent implementation of much of the same functionality. If there was a software fault with the 4 main computers, they wanted everything to fail as fast as possible so that they could switch to the 5th system.


Embedded systems like this do not use dynamic memory allocation.


Related: The JPL C guidelines forbid dynamic memory allocation: http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf (HN discussion: https://news.ycombinator.com/item?id=4339999)

They also demand that code be littered with sanity checking assertions.


Thanks for posting this.

Tangent: I was thinking about Toyota's software process failure and how they _invented_ industrial-level mistake-proofing yet did not apply it to their engine throttle code.

C is obviously the wrong language, but from a software perspective they should have at least tested the engine controller from an adversarial standpoint (salt water on the board, stuck sensors). That is the crappy thing about Harvard architecture CPUs (separate instruction and data memory): you can have while loops that NEVER crash controlling a machine that continues to wreak havoc; sometimes you want a hard reset and a fast recovery.

http://en.wikipedia.org/wiki/Crash-only_software


Ugh that's why I'll never buy Boeing; non-upgradable memory in 2013? Please!


I knew that people would nit-pick on this and not address the actual issue. Next time, I should try harder to come up with a better example.

My point is: sometimes it is worth trying to recover when malloc() fails.


I wasn't trying to nitpick. Correcting the example, and yes, recovering from a malloc failure _could_ be a worthy goal, but on Linux by the time your app is getting signaled about malloc failures the OOM killer is already playing little bunny foo foo with your processes.

If your app can operate under different allocation regimes then there should be side channels for dynamically adjusting the memory usage at runtime. On Linux, failed malloc is not that signal and since _so many_ libraries and language runtime semantics allocate memory, making sure you allocate no memory in your bad-malloc path is very difficult.


Like eliteraspberrie said, the proper way to recover from an error is to unroll your stack back to your main function and return 1 there.

Error checking was enforced for EVERY syscall, be it malloc() or open(). Checking for errors was indeed required but not enough: proper and graceful shut down was required too.


Go ahead. Call exit(). On your HTTP/IRC/anything server. Just because you couldn't allocate memory for one more client. Now your service is down and the CTO is looking for blood =)

Yes, it's far-fetched and like some said further down the comments, you "can't" run out of memory in Linux, but straight killing a service is never good.


If your server's running Linux, it's going to kill your process with no questions asked if you run out of memory. You're better off practicing crash-only error recovery and having robust clients that can handle a reconnect.

HTTP is stateless already, so crash and restart all you want!


The OOM killer is more likely to kill some other process and trash your server.

Thankfully that sort of behavior has been vastly reduced since the thing was introduced, but disabling overcommit for high-reliability applications is still a reasonable course of action.


The OOM killer might eventually kill something, after it thrashes the system for a few hours.

I had a server last week in which swap hadn't been configured. A compilation job took all memory and the OOM started thrashing. Thankfully there's always one SSH session open but I couldn't kill anything, sync or shutdown; fork failed with insufficient memory.

Left it thrashing overnight and had to power-kill it the next day.


I'd say letting that server bounce and having the watchdog/load balancer work to keep you at capacity is the best option there. You are going to need that infrastructure anyway and if you can't malloc enough for one more client, who is to say that the existing stuff is going to be able to continue either?

You should count on any particular instance bouncing when that happens, and design your system to survive that. You should also invest some effort to figure out why your system can get into that state. Consider if any particular instance should be attempting to accept more jobs than it has the ability to handle. I shouldn't be able to trigger an out of memory situation on your IRC server just by connecting as many irc clients as I can.


I suspect you mean LD_PRELOAD=whatever.so


Thanks, haven't used it in two years and am a bit rusty :)


I've been thinking this might be well ameliorated by using checked exceptions, but even languages that have them don't get sophisticated with them, so they wind up being painful enough they're not really used (in my experience).

For instance, I don't know of a language that lets you say "This function accepts a function (f) and returns a function which throws anything f throws except FooException."


I think you may have hit on something subtle. To me, it's better to have two completely distinct modes: I'm either dealing with errors or not. Then if an error occurs, I can expect one of two things -- either it should be handled or it should crash. I've noticed in some exception-using code I've encountered that a lot of times, error conditions are tested and then not really handled -- they're either silently ignored, or dealt with by some half-measure. To me, silent failures are much more insidious than crashes. This halfway style of error handling is the worst of all worlds, IMO, and it might be encouraged by exceptions. Java forces you to acknowledge errors that you have no need to, and probably pushes you toward this gray area when you're not sure how to handle things yet. Then, later, it's totally non-obvious what's truly handled and what isn't. It also clutters up your code. Clarity is often more beneficial than reflexive error-checking.


Well I think this is only a problem with runtime exceptions. You have no choice but to deal with compile time exceptions (of course you can deal with them poorly if you choose). But it seems the compile time exceptions are unpopular.

I think there's another way to deal with this and that's a better type system. AFAIK it's impossible to have a null pointer exception in Haskell. EG: If you do something like `hashmap.get` the type system forces you to write code to react to the situation when the value doesn't exist in the map. Your code won't run otherwise.

...I can't tell if I'm rambling or replying to you :)


Haskell does not force you to handle the value not existing. For example, you can do:

let Just b = HashMap.lookup k xs

If the element is not there, then lookup will return Nothing, which will fail to pattern match with 'Just b'. This can also be accomplished using the fromJust function. What Haskell does do is make it obvious where these exceptions can occur by documenting what functions can return Nothing, and by requiring you to explicitly cast between values that may be Nothing and values that cannot be Nothing.


Haskell's type system isn't powerful enough to guarantee no exceptions will occur, though it certainly manages NULL better and other languages like Agda go further.

Try debugging an out of bounds array access in Data.Array, which IIRC a few years ago just made my program print "error" and die...


I feel the majority of libraries you will use in Haskell won't resort to errors except in two cases: One is very low-level libraries doing things like interacting with C-code or raw memory. These are dangerous in all languages unless you are careful. The second is in functions that explicitly let the caller know they may fail.

For example, you can write "head aList", which will cause an error if aList is empty; however, if you instead write "case aList of { [] -> handleEmptyList; a:as -> handleNonEmptyList a}" then you cleanly avoid the exceptional case. It does take discipline to use these functions, and when writing in "quick and dirty" mode I do skip them and use the other functions, but those should only be used carefully.


Checking before you call a partial function isn't really any different than a null check.


I wonder if that's a general "water seeks its level" thing. We only care so much about robustness, and start to trade off in favor of it at some threshold regardless of language...


Yeah, I think that people are starting to realize this. E.g. Go and Rust don't have exceptions (or Java/C++ style exceptions).


Popular C software is so reliable because of enormous amount of effort spent developing, testing, and polishing it over many years - not because error handling in C is superior to that in higher-level languages.

If your Python script sticks around for 30 years constantly being used by millions of people - I bet it will be rock solid as well. All other things [1] being equal [2], a program written in a higher-level language will likely be more reliable - simply because it will likely be shorter. C has many properties conducive to great longevity of software written in it, but error handling is not one of them.

Now, to the rant about not knowing the exact number of exceptions a function can throw. Most sane languages [3] give you an exception hierarchy. You know that every function can throw an Exception (or Throwable, or whatever the root of all things evil is called in your language), and you either handle it or let it propagate. The important bit is you catch exceptions based on what you can handle in the current context, not on what the functions you call can throw [4]. As long as you realize that every call can potentially throw, it is not important what exactly is thrown - letting you abstract away the exact nature of an error is a feature, not a bug; that's what exceptions are there for! Let's say some function deep below changed and now throws a new exception type. So what? It didn't change what you can handle, so there's no reason to modify the calling function - just let the exception propagate. What if nothing in your program can handle that exception? You let it crash (hopefully into the debugger, and no, it's not ironic that I suggest it in a topic about reliable programs - crashing is a reliability feature invented to expose problems that would be hidden and hard to debug otherwise). Of course you may also put a few catch-all clauses in a very limited number of strategic places.

If you approach exceptions this way they are vastly superior to error codes.

[1] Scope of the problem, amount of effort spent on development, programmer skills

[2] Or at least on the same order of magnitude

[3] With the notable exception of C++ - I'm not sure what they were smoking when they decided that throwing arbitrary values was a good idea

[4] And this, by the way, is what's wrong with checked exceptions in Java

PS. Didn't mean that having no exceptions is insane - but that if a language has exceptions, the sane thing to do is to organize them in a hierarchy with a single root.


I disagree that it's just polish. Newer C programs can be robust as well. In fact, I write C programs both for fun and at work, and so do a lot of people I know, and these programs still end up being much more robust than even I would expect. And when you run them continuously, you will find some bugs, but each new bug appears later than the last, until you can't remember the last time a bug caused a failure.

There's something about the robustness you're forced to implement. I never trust scripting languages like that, not mine nor others', as much as solid C. I've seen it way too many times that when a C program is well-written solid code, it keeps on doing what it does and doesn't give surprises. I've had many Python scripts running for years and I always need to fix something little there every year. Maybe something has changed in some library, or I bump into a new failure mode, or something. This rarely happens with a C program once it has "settled".


I worked with PHP for a while and I found it to behave like you describe: I rarely found bugs in code once it reached a stable state. I haven't used Python (I might be the only one), so I can only speculate on whether it's truly more bug-prone than PHP. But one possible reason could be the relatively straightforward behavior of PHP and C code. (My experience is with non-OO PHP, mainly from hacking Drupal modules, which might make a difference.) C and (older) PHP have a small number of data types and few or no hidden behaviors. Neither is OO. I wonder if OO might contribute to long-term instability, because OO languages are rich in types and tend to let the language make inferences and allow types to alter the meaning of syntax. For predictable code, you probably want a language that isn't too smart and always gives the same meaning to the same code.


The problem with exceptions (in certain contexts, of course) is precisely these assumptions.

Changes in low-level routines can change what your function is able to handle. Sometimes "just let it crash" isn't an option, period. Often, the exception hierarchy doesn't expose enough information to handle an exception without outside context (viz. Python's OSError).

In many cases, exceptions are superior to returning error codes because of ease of use and debugging. In many other cases, returning error codes (or something like Java's checked exceptions, although those have some rough edges) are superior.

I've yet to see a universally applicable error handling model, and I suspect I never will.

(By the way, I believe the ability to throw arbitrary values in C++ is, somewhat loosely, related to something called 'foreign exceptions'. The Itanium C++ ABI on exception handling is an interesting read here. C++ implementations support throwing things into and out of non-C++ code, and simply don't limit what C++ code can throw under normal circumstances. Though perhaps they should have.)


"I've yet to see a universally applicable error handling model, and I suspect I never will."

I'll bite:

http://www.gigamonkeys.com/book/beyond-exception-handling-co...


Common Lisp's condition/restart system is very nice. It solves the context problem, which (at least in my experience) is the biggest problem with exceptions in day-to-day programming.

CL's dedication to the debugger also greatly reduces the downsides of not explicitly handling errors. You get the ability to notify the user of problems that they might be able to resolve manually for free.

The lack of function signature information is still a problem, though. Sometimes you really do need to know what exceptions might be thrown. There's also the issue of overhead . . .

I'm happy to say that exceptions are nice to have, but the idea that they're superior to returning error codes at all times in all places (especially in a language that supports returning multiple values) is more than a little silly.


Yes!

Most software I've seen can be dramatically improved by removing exception handlers. (And then fixing the underlying problems).

On the server side I prefer one exception handler, which is the OS: it will kill your process and release all memory, file handles, etc. Then I wrap it in a bash script / other monitoring tool, and have it immediately restart whenever it fails.

As for the server itself, I'll also add an automatic reboot to ensure nothing out of the ordinary persists for very long (like that PATH modification that someone forgot to add in the appropriate place to get set on restart). If it's virtualized I prefer to destroy the entire server if possible.


Are the servers you're building serving concurrent clients? An exception could take out multiple in-flight requests.

(not against your idea, just curious how you handle it)


One advantage of forking servers: kill and reboot the parent, and you don't lose in-flight connections.

That said, I do this as well, even the best behaved daemon can get... funky... after a few months. Planned outages for a daemon restart are ok in my experience, particularly if you can fail over to other nodes as part of a rolling restart.

Of course, this refers to planned restarts, though forking servers helps with unplanned exceptions as well.


Don't you find performance suffers? AIUI this approach means you can only handle as many concurrent requests as you have processes, and the OS scheduler has less information to work with than if you were using threads.


Not typically: using forking daemons does not mean that you can't also use threads. The ideal model probably uses both - so they can be stuck to a processor, but still use threads per request. It's nice to have a library to abstract the implementation details away for you, but not necessary.


Right, but doesn't the error handling approach you describe mean allowing a whole process to fail whenever an error condition occurs, which would cause any requests that were being handled by other threads of that process to fail even though they were perfectly valid?


Can you really tell me that you know after an exception that the other threads are really in a well-defined state let alone 'perfectly valid'?

Look at something like ZeroMQ that is being rewritten specifically to avoid the non-determinism inherent in throwing an exception.

Once you're using threads it's pretty much anyone's guess as to what state the system is in at any point, add exceptions and it just gets worse.


An exception in one thread shouldn't affect others - why would it?

I agree that unstructured use of threading primitives leaves you with an unpredictable system, but it's possible to build safer, higher-level abstractions and use those.


Because fundamentally the only reason to use threads rather than fork is to share memory. If an exception leaves shared memory in an undefined state then all threads that share that memory are in undefined states.


True, but why would an exception ever leave memory in an undefined state? I can imagine it corrupting a thread's own stack in some languages, but you wouldn't use a thread's stack for shared memory (at least, not without some construct that told other threads when it was safe to access it)


Why do you need to handle requests concurrently when something like the disruptor pattern can handle 6 million/sec on a single core?


Disruptor would be even worse for this - you'd lose all the perfectly valid messages in all the ringbuffers.


It's more of a question of what the semantics of failed requests are than concurrent clients.

The key is to segment your system so processing is orthogonal to the implementation details of the networking protocol on a given system. Just because on a particular OS when a process closes the TCP/IP connections are dropped does not mean that every time your process crashes that client connections are dropped.

In the case of a webserver you can use something like mongrel2 / nginx that maps physical connections to backend processes so that a process restart doesn't mean a dropped connection, or failed request.

Forcing your machines to reboot early and often makes you think about and deal with these problems rather than simply delaying them until one of your nodes dies and takes out a bunch of client connections anyway.


Fail-fast is one of the philosophies I love in Erlang. By using gen_server and supervisor, it's easy to write fail-fast software that works.

In most software I see, exception handlers are a myriad of poor implementations of "log a message, clean up state, and try/fail again" rather than the actual handling of a specific exceptional state. Many of the most frustrating bugs I've seen are introduced when an exception handler fails to clean up state and tries again with some kind of insane context or a leaked handle.


If this were valid, there would be no need for an exception type hierarchy at all.

In reality, we have different exception types (and error codes) because there are circumstances when the type of exception determines how it may be handled. Therefore, a professional programmer needs to be able to find out what exceptions could occur in any given call (which is not the same as saying that she needs to consider each one individually after every call - this is where your 'what can be handled here?' principle has some use.)

Terminate-on-exception might just be a workable policy for a simple or unimportant application, but it is not how you create a robust software infrastructure or safety-critical software.

The original article describes a style of programming in which his programs are built directly on OS services without middleware, and in which the programmer is aware at all times of what might go wrong. I think it is this, rather than the distinction between error codes and exceptions, that contributes to the reliability of the programs he is writing about.

Creating a new exception type that could propagate out of the abstraction you are working on is a serious matter, so it would not be a bad thing if it were a non-trivial thing to do. Unfortunately, the attempts to make this so in Java and C++ put most of the burden on the users of the abstraction instead, and this seems to be built in to the nature of this problem.


In a lot of ways, comparing syscalls like stat and Java methods is apples and oranges. System calls are at the very bottom of the stack, where userspace interacts directly with kernelspace, so they must return all error information, otherwise there would be no other way to get it from userspace. So it's not really exemplary of C language style that syscalls behave this way. Syscalls are not designed to obscure what's really going on by putting a friendlier interface on it; at some level you need the truth.

Having said that, though, there are aspects of C that could lend it to writing robust code. The concepts are concrete, for one thing. Despite the fact that high-level languages are designed to be easier to understand, I suspect that concrete concepts are inherently easier to grasp than abstract ones (and it doesn't hurt that concrete concepts don't leak). What C contributes is a concise and eloquent syntax for telling the machine what to do, with little waste. Some people argue that it's dangerous to control the machine directly without the compiler being able to watch over you; but not having extra annotations for strong typing and such makes it easy to see what's happening, and I think that concision and transparency is key to writing robust code.


It's no longer even true that the kernel returns all error context; for example, with SELinux it's supposedly a nightmare to figure out "why was this denied?".


Maybe that's because SELinux was handled by the NSA, and they're not so thorough?


Exceptions are kind of like gotos, but worse, because the jump target is decided at runtime based on the state of the call stack and therefore cannot be determined just by looking at the code. So they're more like comefroms [0] than gotos.

[0] http://en.wikipedia.org/wiki/COMEFROM


Right, which is why they are reserved for situations where you are not trying to "jump somewhere", but just want to give up and crash, because further execution in the current context has failed. If some external calling context really does not want to crash, the implementation of failing function doesn't and shouldn't care.


That might be what they are intended to be reserved for but the reality of what they get used for is much more of the "Jump Somewhere" sort.


Really? I had it drilled into me from day 3 or so, that "EXCEPTIONS ARE NOT FLOW CONTROL!"


After more than a decade of seeing exceptions in use it has become very apparent that no matter what they might have been told when they learned about exceptions most people use them for flow control. Mostly because they are a way of doing flow control.


> So they're more like comefroms than gotos.

Except, not, because the jump source for a particular exception handler is no more (and no less) determinable in advance than the jump target of an exception raised at a particular point. They are very much unlike either gotos or comefroms.

If you are going to criticize exceptions, do it directly, rather than by asserting that they are like something that they are completely unlike.


> Most sane languages [3] give you exception hierarchy

Go being a glaring exception -- and this is a good thing.


I disagree. Go errors aren't even error codes. The error interface is just one method that returns a string. The only advantage Go has over C is that the errors are no longer in-band with normal return values. Once you get one, though, you're back in the Perl 5 land of reasoning about your error by scraping a string. I'm completely baffled as to why they didn't attempt to bundle more machine-readable information into the errors.


Go errors are machine-readable data.

Most modules expose variables with the naming convention Err... that represent the errors that they can return. The errors can be compared with those values (eg. err == os.ErrPermission). Some modules also define error types that provide additional error data, such as net.OpError; a type-cast can be used to check if the error is of that type, and then access the data within. As a fallback, all errors implement the Error interface which allows you to display them as a string if you can't handle them based upon their type.

It's not unlike exception handling by exception type, although it's definitely more of a manual process because Go doesn't give you specialized syntax for it.


I stand corrected. I'd still prefer normal exception handling but whatever.


He said sane.


Or: writing C software of acceptable quality requires you to put relatively large effort into developing, testing and polishing, the result of which is usually higher quality than what you get in a language where things are easier to write and you don't feel you have to be careful.


I disagree with 4. How else would you recommend structuring this:

  try {
    lst.Where(i => i.ID == itm.ID).First().Amount += itm.Amount;
  } catch (InvalidOperationException) {
    lst.Add(itm);
  }
I could check if the list contains it first, but this is more convenient. This is just one small example of a lot of cases where I've used typed exceptions successfully - in some cases I have multiple catches that do different things, e.g. catch (InvalidOperationException) {...} catch (ArgumentNullException) {...}. This saves a lot of lines of pre-checking and has a clear control flow.


You are totally correct about the engineering that goes into complex software development being the cause of the stability this guy mentions.


The author's point is weakened by the fact that the Converge VM is now (5 years after this article) written in python: https://github.com/ltratt/converge/search?l=c (as of https://github.com/ltratt/converge/tree/08dadda29b/vm). Compare https://github.com/ltratt/converge/tree/converge-1.x/vm.

"there are some obvious reasons as to why it might be so reliable: it's used by (relatively) large numbers of people, who help shake out bugs; the software has been developed over a long period of time, so previous generations bore the brunt of the bugs; and, if we're being brutally honest, only fairly competent programmers tend to use C in the first place. But still, the fundamental question remained: why is so much of the software I use in C so reliable?"

Why could the fundamental question not be explained by all the points he just mentioned?

I'm a C programmer, and this guy is hitting all my biases, but this isn't very well thought-out.


I believe he explains this switch here: http://tratt.net/laurie/blog/entries/fast_enough_vms_in_fast...

It's a very good read. IIRC, the reasons aren't really rooted in the language that is used. It's more about the tooling provided by PyPy for constructing a compiler.


> The author's point is weakened by the fact that the Converge VM is now (5 years after this article) written in python

Kinda, it's written in RPython, a restricted and statically inferable subset of python built specifically to write VMs.


Speaking as someone who has been a long-time coder in C and Java, I think the point he makes about checking errors is BS. You have to handle exceptions in Java, and you're perfectly free to consider all the possible failure paths if you want, though typically you'll handle the ones you can sensibly do something about and ignore the rest (i.e.: log or just fail outright depending on context). I do NOT miss the C world where you have to handle the results of every single function call if you want to be safe. The exception handling approach IMO is generally superior - the normal code executes simply and the exception handling code gathers all the exception cases together. Much better than interspersing both, and especially better than handling the same error 10 times in a row rather than just doing it once.

That said, I love C. It's a very simple language and that's a virtue.


"[Pointers are] arguably the trickiest concept in low-level languages, having no simple real-world analogy[.]"

I'm not sure how "address" is not a good, "simple real-world analogy" for pointers. It may be that I've just internalized enough of the behavior of pointers that I'm glossing over important differences, though... I struggled with pointers long enough ago that I don't remember struggling with pointers.


I think pointer syntax in C, not pointers themselves, is what causes confusion for most beginner programmers. This post explains it well: http://objcsharp.wordpress.com/2013/08/14/the-great-pointer-...


Plausibly. C++ confuddles things with reference syntax, but in straight C the notion that

    int b, *a;
declares that

    "b"
and

    "*a"
are ints only really breaks down at casts. It does require pointing out that you are allocating space for the named things, not the terms...


For a lot of people I think it takes a while to really internalize the ramifications of the fact that you can have a pointer to memory you're not allowed to use. (That, plus the way they're used in C as a cheap answer to templates/generics.) I understood that a pointer was an address from the start, but it was still some time before I figured out how to use them correctly.


You've never arrived at a physical address to find a shuttered business or gaping crater? I can understand the syntax of pointers being confusing, but the concept has distinct real world analogies. (That being said, I find the syntax of physical addresses in other countries to sometimes be confusing.)


Certainly I have (the former, not the latter). You won't be surprised when I tell you that it wasn't a horrible tragedy and that I didn't die on the spot because of this, or accidentally purchase a handgun instead of a watermelon. In fact, it was such a minor event that I hadn't taken the time to make plans ahead of time for such an eventuality. Likewise, the first time I used pointers, nobody told me ahead of time that it was even possible to have a pointer that pointed to nothing. Understanding this, and (more crucially) learning exactly how to know when a pointer was and was not valid, was what got me my first merit badge in C. Everything before that was just Pascal with less formality.


I've definitely never arrived at an address to find a gaping crater myself.


I have - after demolition following an earthquake. I wasn't led there by the address, though - just passing by.


C implements a fairly low-level but general model of how a computer works, and a programmer's understanding of pointers is something of a proxy for their understanding of how that model works. Therefore, C has a sort of built-in test for a level of competence in a programmer. Unfortunately, it is by no means perfect, and you can still find people writing in C who have no idea why they should not return a pointer to an automatic variable.
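For anyone who hasn't run into that particular mistake, here is a minimal sketch of it and one common fix (function names are made up for illustration):

    #include <stdio.h>

    /* Broken: buf is an automatic variable, so it is gone as soon as
       the function returns - the caller gets a dangling pointer. */
    const char *make_greeting_broken(void)
    {
        char buf[32];
        snprintf(buf, sizeof buf, "hello");
        return buf;              /* bug: returns the address of a dead object */
    }

    /* One common fix: let the caller own the storage. */
    void make_greeting(char *out, size_t len)
    {
        snprintf(out, len, "hello");
    }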


Sure... but I don't really see that that's a problem with the analogy. I'm limited in what I can do to someone else's house, even if I have the address; an address might persist after the house has been bulldozed and replaced; &c...


It's not really a problem with the analogy; I'm just saying that a good analogy is no cure for a lack of experience. My first non-toy C program was a disaster because I knew what a pointer was but I didn't yet understand how to manage them.


Oh, for sure it's no cure for a lack of experience. My contention was solely with the statement that there is no real-world analogy. There is one; it's useful; it doesn't accomplish everything because it's just an analogy.


I agree with you now. But I keep thinking back ~10 years when seeing a * (or ) was a moment for pause and speculation.

I think it comes down to the fact that the address of an int* is itself a thing, which again makes sense to someone familiar with pointers, but to a newcomer it means you have a thing at an address that is the address of another thing.


They are so simple. It's just the address of the thing it points to.

Maybe it's lost in the mists for me too, but if you don't know your pointers from your references from your values... how can you code anything?


Based on the experience I've had teaching programmers, most don't have the distinction clearly in their mental models. For most of the basic things they might do, it can be made to work either way: usually with a bit of fiddling they can make it work. Of course, this doesn't work at all as soon as you start trying to implement data structures.

I've seen some awkward code too from people moving from Java to C++ where either they make everything a pointer, or they use value objects without understanding that they're not references.


Walking across a tightrope is a lot easier than you would think. When you're walking on a tightrope you really concentrate on your walking; it's stressful, but not actually hard. Still, I don't think many people would choose to walk across a tightrope when an easier option was available.

Exceptions, meanwhile, allow a counterintuitive but effective strategy for greater reliability. My boss likes to say that the cloud is great because machines fail more often - the point being that rather than the impossible task of making individual machines failure-proof, you instead design systems that can handle the failure of any given machine. Often the best way to make a resilient system is to break the problem into quite coarse-grained blocks, and then define appropriate responses (such as failing a single message and processing others, or retrying the whole block) whenever a given block fails.


"Once one has understood a concept such as pointers (arguably the trickiest concept in low-level languages, having no simple real-world analogy) "

huh?

So when you write a letter to someone, you actually attach their house onto the outside of the envelope? No of course you don't. You write their ADDRESS on the envelope.

All this lore about how difficult pointers are is bunk. The syntax of pointers vs addresses vs dereferenced pointer values can be difficult at first ... and can be very difficult if you read and/or write unnecessarily obfuscated code ... but there is no need.


Sometimes I just write the contents of their house on the outside of the envelope.

You have to admit it is one of the trickier concepts people learn when programming. Once you understand it, it's definitely a "Duh, how else would it work?" moment but getting to that point can take some time.


I agree. The concept may be simple, but when you're just starting out as a C++ newbie and you've got pointers-to-pointers-to-pointers flying all over the place in this ugly syntax trying to satisfy some contrived homework assignment on a short deadline, things can get confusing rather quickly.


Agreed, in theory - in practice, I find that explaining

mov r1, r2 vs mov (r1), r2 vs mov (r1)+, r2

followed by an hour of writing assembler programs to do basic things with null-terminated strings is usually needed to help people learning C to quickly internalize what that means.

Something about * and + and - around the var name seems to become easier when they can say "oh now I see; *c++ is like (r1)+".
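For example, the classic null-terminated string copy is pretty much a direct transcription of that addressing mode (just a sketch):

    /* Copy a null-terminated string byte by byte. *src++ reads through
       the pointer and then advances it, much like the (r1)+ auto-increment
       addressing mode above. */
    void str_copy(char *dst, const char *src)
    {
        while ((*dst++ = *src++) != '\0')
            ;
    }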


Yeah, I've never gotten why people consider pointers to be hard.

If you consider memory to be a massive array, pointers are just the indices that you use to get at different elements in that array.
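The analogy is nearly literal in C, since indexing is defined in terms of pointer arithmetic - a small sketch:

    #include <stdio.h>

    int main(void)
    {
        int a[4] = {10, 20, 30, 40};
        int *p = a;                /* p holds the address of a[0] */

        /* a[i], *(a + i) and *(p + i) all name the same element */
        printf("%d %d %d\n", a[2], *(a + 2), *(p + 2));
        return 0;
    }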


C also has "better" I'd argue that average person programming in C is more matured, and understand nitty gritty of programming, computer science, performance and how things work at low level far better than, say, Visual Basic or even Java programmer.

I think we have been putting too much faith into languages. After a certain level of sophistication, the weight of quality shifts to the person programming in it instead of the language itself. However, most language designers would tell you that their holy grail is to shift this threshold so programmers can be as low in skill and as ignorant as possible while the quality of the program stays as high as possible.

A lot of these things can also be said of JavaScript. I've now coded a fair bit in plain JS, constantly avoiding the temptation to use some other language that compiles to JS (and even avoiding TypeScript for that matter). Just 100% pure JavaScript. And I can tell you that my habits and flow are very different when I code in JS. Back then I thought coding in pure JS would blow everything up just because of the lack of types, or silly things like missed semicolons, or simple stuff like forgetting to pass 10 to parseInt. The fact is that none of these bothers me at all anymore. I rarely find bugs that can be attributed to these deficiencies in JavaScript. Strongly typed languages now feel like some kind of over-hyped hoax.


I've never in my life written as careful and premeditated code as when I was writing C on my Amiga when I was young.

One miss and you would tilt the whole system.

There was no memory protection and everything was system wide. If stuff failed, you had to deal with it because if you didn't you probably left something unlocked or left a resource open which hogged it and prevented other applications from using it. And anything left open would leak memory as well. And there was only system memory so other programs would eventually fail memory allocations because your program didn't close some handle somewhere.

Surprisingly, most of the programs I wrote worked correctly early from the beginning. There was no "I'll just code this thing up and fix bugs later" phase because there were absolutely no safety nets. And fixing bugs in retrospect usually meant rebooting your machine, starting up the editor and compilers again for another round.

I'm pretty pedantic on Unix too but I'm still a lazy bastard compared to those times. And my C programs on Unix do take a whole lot more iterations to "stabilize" than those I wrote on my Amiga.

In some perverse masochistic way I somehow miss that.


> it's reliable in the sense that it doesn't crash, and also reliable in the sense that it handles minor failures gracefully.

This is an incomplete definition of reliability. It seems that the author is only considering programs which work as advertised when used as intended.

Most software is written optimistically, i.e., the "good path" is tested most. As soon as you start feeding the program with invalid input, it is no longer reliable, as witnessed by the myriad of low-level security bugs (stemming from the nature of C) in virtually every nontrivial C program ever written.

Now, would you rather have your program crash (be "unreliable" by the author's definition) or end up running arbitrary code injected by an attacker?

(Security exploits are the most prominent example; many programs will do something weird when fed unexpected input.)


There are idioms and methods for writing reliable C programs, which you learn with experience and reading better code than yours. If you learned languages in the order higher-to-lower, you eventually understand that you cannot write a program in C at the same pace as in a higher level language.

What has worked for me is to always have a second terminal open for the manual pages and the standards, and to regularly read quality C source code. Opengroup.org has thorough references for the standard C library, including its relationship to POSIX and the SUS; and the various kernels and core Unix utilities are always a good read.


It is an interesting observation. I have found that scripted languages are unreliable as a function of loadable modules (which is to say they depend on some module that gets loaded, that at some point changed, and then broke the script). C has that problem too with shared libraries, but there are generally fail-stop policies in place (if you can't get the version you were linked against, exit rather than try to valiantly proceed).

What is perhaps most important is that software does not "rot" in the sense of building materials becoming weaker after exposure to the elements. But it does fare poorly against changes in things it assumes to be true. Some languages, like Ada or Mesa, which have very detailed configuration specifications, achieve high reliability at the cost of high configuration specificity. That makes them brittle against even modest changes in their environment.

The art is striking the balance. Take something like thttpd: 20-year-old code that still compiles and still does exactly what it always did (simple HTTP web server), very reliably - with the barest minimum of reliance on the ambient environment it runs in.


> I have found that scripted languages are unreliable as a function of loadable modules (which is to say they depend on some module that gets loaded, that at some point changed, and then broke the script).

That's why I'm thankful for a lot of these newfangled virtual environments and machines along with their respective package managers (and why more people should use them!).


C programs are so damn reliable that I spent the last three weeks mostly trying to debug others' code (from multiple software projects - I was lucky enough that they all decided to start hitting obscure bugs at approximately the same time) and waiting for it to crash again.

Just imagine my feelings when a critical server that ran just fine for 3 years gets a bit more load and starts getting assertion failures. That was a known and already-fixed bug, but after a version bump to the next stable release those had changed to SIGSEGVs - and it's not a trivial NULL pointer dereference but a weird concurrency issue.

C has neither the [relatively] strong guarantees of OCaml and Haskell (so I could at least be sure there are no trivial bugs in those weird macros doing crazy pointer juggling), nor the dynamic flexibility of Python and Erlang (so I could hack the working runtime live, letting it fail and hot-swapping code at will).


Adjusted for effort I don't know of any evidence that they are...


This really does depend on how "reliable" is defined. At the code level, reliability is dependent on the actual programmer. This code must then be compiled/interpreted in order to actually function. At this stage the programmer has lost control unless they are also in direct control of the interpreter/compiler - which is, of course, impractical and extremely unlikely.

Inevitably writing in high level languages sacrifices control for how the program looks at the machine level simply because you must depend on existing structures. The higher up you go, the less control you have of the final product. Writing in Python vs. C means that you have to think less about lower level issues and consequently are subject to problems that arise outside the scope of what you could have originally controlled - simply because you didn't have to think about a subclass of issues that you assumed were inherently solved for you.

So yes - if you take a Python program and rewrite it in C, you can get a more reliable program IF you properly solved the set of things that Python took care of for you in relation to your specific problem.

But this relies on the fact that you succeeded in rewriting what Python took care of for you. So we're back to where we started - it all depends on the programmer.

The article implies this in the final sentence:

All that said, I don't recommend using C unless much thought has been given to the decision - the resulting software might be reliable, but it will have taken a significant human effort to produce it.

And this basically negates the idea that C is any more reliable than any other programming language. Despite the fact that I think a discussion of the reliability of a language is effectively useless (at least the way I've framed it here; I'm open to other interpretations), I enjoyed the article - thanks for sharing!


How does effort affect reliability? Even if you consider one as consequence of the other ("it requires more effort to make a reliable C program than the equivalent program in $language") that doesn't make the final product any less reliable.


It is more about the attitude of a C programmer versus the attitude of, say, a JavaScript or Ruby programmer. While the latter merely assumes "nah, the VM will catch all null pointers as soft exceptions for me, and all exceptions I don't catch, my caller should", the former has learned (the hard way) to respect every error code a function call can return. "The hard way" is the maliciously smiling SIGSEGV that dynamic-language programmers may laugh about, because it hardly ever crashes their programs, one might say.

However, I do believe that this attitude of thinking first (how to program right) rather than trying to remember later (what could have caused that many 500s on my HTTP server) is the better approach to writing more reliable software.

Dynamic languages are said to be more convenient for web sites, for example, and I'm not denying that, but those languages just shift the problems into the future, where, when the time comes, you may or may not be willing to attempt to fix the bug you introduced days/weeks/months ago, depending on the urgency.

In programming environments (such as C) where typing is more static, errors have to be handled manually and with caution, and memory has to be managed (more or less) always with ownership in mind; programs whose authors think about these topics from the very beginning, and iron out the remaining bugs over time, are - from my point of view - the more reliable ones.

So, from a distance, I can second this blog post. It was interesting to read.


It does when you only have a set amount of effort that you can dedicate to a project. Which is usually the case.


I am not sure that's a useful criterion in practice, though. People are fond of saying that languages will not make you orders of magnitude more productive, and I think the corollary is that on average they will not make you orders of magnitude less productive either. (Controlling of course for characteristics of problems and such.)

The problem is that there's not really a great way to reason about this, either. If you're talking about how long it takes to write a program in C vs Java, how do you measure time spent writing code vs debugging vs bug fixing, et al?

All in all it's certainly possible it takes longer to write code in C, but for all anyone knows, writing more reliable code up front means fewer bugs later on. Code which is more performant by default may lead to fewer performance regressions when someone induces pathological GC behavior. Contrariwise, there's stuff like memory corruption as the OP describes, which is probably way more difficult in a native environment instead of a hosted one.

It's hard not to reach the conclusion that we're all just talking out of our hats anyway (myself included).


Also, compile-time checks and the tendency of lower-level languages to fail catastrophically rather than subtly or silently play a part. The author also has a lot of experience to draw on, which might just make his work more solid and error-free in the first place.


Some good points in the essay, but I have to admit that for me the reason I like C is that it's the first language I ever really did any serious programming in and the first language that made me think hard. Because of that it just feels like home when I come back to it, and I can never really evaluate it without that bias. It's the simplest model of the machine that you can get without going down to assembly. It's the same reason I like riding and fixing bicycles. In both cases sparse resources have kept the design to the bare minimum.


There are multiple factors that contribute, but here's a few that I haven't (to the best of my recollection) seen mentioned so far: tooling (crucial), "do the simplest thing that could possibly work" attitude brought about by (lack of) a built-in collections library and simple syntax, and lack of rapidly changing requirements during development.

Tools for C are powerful, mature, and available on near every platform. Every single C programmer that I know uses multiple memory debuggers: for example, valgrind for leak checks and more on smaller code segments, LLVM address sanitizer on unit test runs, jemalloc or tcmalloc/google-perftools, and more.

"Do the simplest thing that could possibly work" is somewhat close to the author's point: simply put, while there are widely used cross-platform hash tables in C, it takes more effort to use one; hence, they aren't used e.g., when N is known to be 100 (or if maximum size is bound, there's likewise no pressure to use the built in hash table/red-black tree and one can use perfect hashing instead...). In another big distinction I see between code I wrote in C vs. higher-level languages is that one can simply use a stack allocated variable length array -- or (if random access doesn't happen) a linked list in place of a heap allocated resizable array.

These examples, I think, demonstrate how C discourages you from interpreting "the simplest thing that could possibly work" as "the thing that is fastest to code"; that's not to say development speed is not an issue -- it's a _huge_ issue! -- but it often leads to code where the unneeded complexity (such as using a red-black tree or a skip list instead of a sorted stack-allocated array!) is hidden from the programmer but is nonetheless present.

On the last point, I agree with the sentiment expressed by others questioning the premise. I think the examples where we explicitly know of something as being a C program are heavily biased towards well-known and open source software: language runtimes, operating systems, browsers (usually written in a clean and limited subset of C++ like, e.g., Chromium and likely Firefox as well -- I only mention Chromium as I've re-used base libraries from its codebase), and so on. We don't, however, think of parking meters, vending machines, and many other usually embedded (but not life-critical) systems that -- unlike open source systems software -- are built from unclear, confused, and ever-changing business requirements ("well, this city introduced a new red-zone, so now we have to make sure to charge an extra 0.33 cents between hours 8 and 13 except on all holidays, but not July 4th").

Yet when people bring up Java, it's easy to forget about Gmail or Minecraft and instead think of the insurance claim management system that you had to use after your rental car got rear-ended near 4th and Harrison, that crashes with a user-visible stack trace when you click the wrong button and (even in 2013) doesn't use Ajax and requires opening a new page (slowly, as they don't use Java's built-in thread pools but spawn new threads, because the system has to run at a customer site that uses Enron's custom implementation of Java 1.3 on an AS/400) to make any changes.


>Tools for C are powerful, mature, and available on near every platform.

If only this were true for embedded platforms! I have no tools at my disposal aside from a vendor-supplied debugger. No pre-existing test infrastructure exists; what we have, we have had to build up ourselves.

One of C's last remaining strongholds is in embedded, and the tooling in this world is atrocious! (Though some of the instruction and performance profiling tools are beyond amazing!)

Rigorous coding standards and a heavy reliance upon code reviews keep a code base clean and sane, sort of like on any other project written in any other language!


Last remaining strongholds?

C is everywhere.

I'm in a similar situation with embedded stuff at the moment. We're basically stuck with debug-by-printf over a serial port, far from ideal!

I have worked on other embedded projects that were better, for instance you can make a gdb server that operates over serial or tcp/ip and allows you to use gdb (or even graphical debuggers like ddd) on a host machine to step through code as it runs on your device. This does rely on there being resources available to do this though I guess...


Embedded test software is developing. Look at http://throwtheswitch.org/ and my (more minimal) "greatest": http://github.com/silentbicycle/greatest


> Tools for C are powerful, mature, and available on near every platform. Every single C programmer that I know uses multiple memory debuggers: for example, valgrind for leak checks and more on smaller code segments, LLVM address sanitizer on unit test runs, jemalloc or tcmalloc/google-perftools, and more.

This is exactly the point. C developers need to make use of external tools to make up for the language's flaky design in terms of type safety.

Tooling that is outside the language standard, and thus not available on all systems that support C - instead of using a saner systems language like Ada, Modula-2, or Oberon, among many others.


You're missing the point: in many cases, choice of C is not ideological but practical. I love Modula-3 and OCaml (and ML's incorporation of Modula's module system in general); yet I'm using C++ for the project I am working on. From even the purely technical point of view, despite all the features OCaml has that I enjoy, building this system wouldn't be feasible -- I need shared memory concurrency and multi-core support, for example.

On the other hand, memory leaks occur just as well in memory-safe languages (looking at Modula-3, I see a WeakPtr type -- which is there for this very reason). Finally, there are also many reasons to want manual control over memory -- which, when using high-level languages, ends up looking very much like C programming (e.g., the Unsafe module in Java). However, you're essentially on your own when doing this.

By the way, memory debuggers are just one example -- plain debuggers are another, tools for tracing the code execution, performance tuning, etc... are another. As are mature and universally available libraries.

I don't mean to disparage high-level languages: I greatly enjoy programming in them and would like them to expand. Yet we're talking about existing code -- and in this case, C's maturity and universality (as a "lingua franca" of programming) are unrivaled. Nothing stops one from using -- and enjoying -- both advanced high level languages and C.


I don't equate C++ with C.

At least with C++ the language offers the mechanisms without external tooling, for doing safe coding, if one so wishes.

You can use arrays that don't decay into pointers, bounds-checked arrays, proper strings, references for output parameters, and automatic reference counting.


Type safety is a lie. Everything is a piece of memory. C does not hide this from you, it's a design feature and not a bug.
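In the sense that C will happily show you the raw bytes behind any object (a small sketch, nothing more):

    #include <stdio.h>

    int main(void)
    {
        double d = 1.0;
        const unsigned char *bytes = (const unsigned char *)&d;

        /* Inspect the object representation of the value, byte by byte. */
        for (size_t i = 0; i < sizeof d; i++)
            printf("%02x ", bytes[i]);
        putchar('\n');
        return 0;
    }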


> Type safety is a lie. Everything is a piece of memory.

So speaks an Assembly developer

> C does not hide this from you, it's a design feature and not a bug.

I guess the design goal was to make security exploits as easy as taking caddies from children.


>> So speaks an Assembly developer

C is a portable, readable assembler abstraction in a lot of ways, yes. I'm not seeing what's wrong with that. Was that supposed to be an insult?

>> I guess the design goal was to make security exploits as easy as taking caddies from children.

Do children often have caddies? I've never noticed a child with a caddy. Golfers perhaps.

I'm still not sure why you think type safety is the be-all and end-all of everything. You can have memory violations and leaks just as easily in type-safe environments as any other. There are lots of ways to f*ck up in C, but the flexibility turns out to be useful as well, when used appropriately.


The other day I was quite surprised to learn that the first version of memcached was written in Perl - thinking something like that should be obviously written in C, or else could never make LAMP systems faster... :)


Keep in mind the date it was written -- this was before SSDs and in the days of tiny heaps. Any improvement upon seek times of what were then commodity disks (SCSI/SAS being insanely expensive and out of bradfitz' reach, afaik) would easily dominate the cost of Perl's runtime (which, in retrospect, isn't horribly bad -- the GC and VM are more mature and less trouble-prone than most common scripting languages, although nowhere near LuaJit or any modern Scheme/Common Lisp implementation).


Why is it obvious that a user-space daemon should be written in C? It's a myth that other runtimes don't have enough performance.


It's not about performance (nothing memcached does -- or, more precisely, given all the new features I am likely unaware of, did when it was first rewritten in C -- is particularly CPU intensive), it's about memory management and efficiency: an in-memory system serves to reduce latency by using large amounts of memory; tuning the JVM to handle managed heaps larger than ~24gb (with only half to 2/3 of that usable for caching data) is doable but pretty much a black art if you want to avoid huge latency spikes caused by garbage collection pauses (which defeats the purpose of an in-memory store). Even if pauses are rare enough not to affect 95th-percentile latency (or if you don't care about 95th-percentile latency), in a highly-available system they introduce one of the nastier distributed-systems failures -- node A talks to node B and goes into a GC pause; node B thinks node A is dead, and node A thinks node B is dead, because it didn't receive a reply to its outstanding requests until after it emerged from the GC pause, triggering a timeout (all the while certain figures from the RDBMS community insist this doesn't actually occur because "partitions never happen in a single datacenter...")

It's very much possible to directly allocate and manage memory yourself in Java -- and I have done that ( http://mail-archives.apache.org/mod_mbox/hbase-commits/20130... ) -- but without memory safety guarantees, without the great tooling, and without the general idea that "if your code compiles, it will probably work correctly". In other words, you're programming in a very verbose and ugly (see the DirectByteBuffer interface) C. There are many cases where this approach ("off-heap memory") makes sense when building memory-intensive apps in Java that clearly benefit everywhere else from being written in Java: JNI is unpleasant to work with, and JNA adds additional performance penalties on top of those imposed by the JNI calls that it makes under the covers -- which is fine in many cases, but remember that reading files or sockets are also JNI calls -- but less so when you're building a system that distinguishes itself by being in memory.

On the other hand, if you use C (or "C+" a.k.a. "C with objects", or -- as I prefer to say to avoid confusion with libraries like glib or apr or the Linux vfs layer, all successful object-oriented systems written in regular C -- "C with templates") you have access to the existing work (advanced allocators like jemalloc and tcmalloc don't work very well with the JVM), excellent memory-debugging tools, and so on...

I will say this: I think more software rather than less should be written in higher-level languages. I absolutely love OCaml, Erlang, and Lisps (especially those -- like Typed Racket and Clojure -- that have started importing features from the Haskell/ML family), and like what I see in Go. I use Python (and formerly Perl) on a daily basis for "casual programming", to experiment with new ideas, and for automation. I've written a great deal of software in Java, where many people balked at the idea that this category of software could be more than a toy in Java. I think garbage collection could be greatly improved, and I see no theoretical reasons why, e.g., compilers can't be written in OCaml as opposed to C or C++ (I do see practical reasons having to do with the runtime and lack of multi-core support, but that's a separate and fixable issue). I find projects aimed at making systems programming in high-level languages fascinating (e.g., Mirage OS, Microsoft's experiments, Jikes RVM, etc...)

However, even if these theoretical strides are achieved, there's always going to be room for C -- in the end, you need systems that give you a great deal of control and act in a very deterministic and predictable fashion. In the meanwhile, there's also a need to build practical systems -- so while C and C++ may not be ideal in theory, they are often the only realistic option in practice.

[1] Note, however, I didn't say anything about performance -- OCaml, Java, Haskell, LuaJit etc.. do well in the Debian benchmarks; Lisps follow those languages closely. Likewise, many high-level languages can be AOT compiled. While when written by a strong programmer and/or assisted by today's excellent optimizing compilers C will still beat these fast high-level languages in most cases, it should be noted that today it's extremely difficult to write assembly code by hand that beats assembly code written by an optimizing C compiler or even the JVM. I predict that when a language comes about that has a better designed type system than C/C++ and yet still provides ability to control memory layout much as C does, eventually compilers for that language will emit assembly code that beats assembly emitted by C compilers for the same reason compiler-written assembly beats hand-written assembly -- compiler is able to leverage the type information (declared or -- in the case of dynamically typed languages with fast runtimes -- inferred) to aid optimization.


Because those of us who have been doing it for 15 years or more know we're tangling with tigers and have figured out how to use their strength against them...


These pieces are the reason why I still read HN today. Thanks for submitting it and thanks for upvoting it, everyone.


I should note that less experienced programmers tend to start with higher level languages, as a rule, so the average experience of a C programmer is likely higher than that of the HLL programmer. This general experience level alone gives the resulting code a reliability boost.


With notable exceptions, I think this is true. I've been coding seriously for 2 years and am starting to move down the stack. I see others in my cohort doing the same thing.


> "Surprisingly few people can really program in C and yet many of us have quite strong opinions about it."

Maybe they have strong opinions precisely because they don't find C to be a language which is easy to "really" program in?


C programmers have a better culture than, say, Java programmers. If I were a swordsman with a rusty sword and you had a new one, the sword would not decide the fight.


Yes, mainly because the new sword has an auto-grinder welded to it that might start running at any given moment, even if you are in a deadly fight :-)


Kent Pitman's idea of "languages as political parties" [1] seems to me to offer the best perspective on the surprising reliability of C. The core community shaping C over the years has been:

1. Unix-centric, so the community has a coherent view on the preferred sort of semantics offered to users of code;

2. A consistent user of the Kernighan--Ritchie--Pike coding style, which gives a clear "code smell" [2] for C code; and

3. The natural home of "Worse is Better", which favours source-code simplicity over specification simplicity.

For example, the "C party" favours procedures that "succeed" by returning normally and signal failure with a non-zero exit code, where other language parties have dedicated language constructs to represent failure. This can be better than exceptions from the point of view of source-code simplicity, both in code and compiler, and worse in terms of semantics. The Linux kernel coding style, among many other C coding styles, prefers the use of goto (or sometimes setjmp/longjmp) to handle exit status [3], which is frowned upon in nearly all non-C programming language communities. Restricting goto/longjmp to error handling in this way avoids the problems Dijkstra pointed out [4,5], and goto used to check and propagate exit status reads as idiomatic rather than smelly, provided the generally accepted conventions for error-handling gotos are followed.

There's a nice fragment of code illustrating the idea on SO [6].
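
Not the SO fragment itself, but a rough sketch of the same convention: each failure jumps to a cleanup label that releases exactly the resources acquired so far, so the error path stays visible and linear.

    #include <stdio.h>
    #include <stdlib.h>

    int process_file(const char *path)
    {
        int ret = -1;                /* assume failure until the end */
        char *buf = NULL;
        FILE *fp = fopen(path, "rb");

        if (fp == NULL)
            goto out;                /* nothing acquired yet */

        buf = malloc(4096);
        if (buf == NULL)
            goto out_close;          /* the file must still be closed */

        if (fread(buf, 1, 4096, fp) == 0)
            goto out_free;           /* both resources are held; treat a
                                        short/empty read as failure here */

        ret = 0;                     /* success falls through the cleanup */

    out_free:
        free(buf);
    out_close:
        fclose(fp);
    out:
        return ret;
    }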

[1]: Pitman 1994, Lambda: The ultimate political party, http://www.nhplace.com/kent/PS/Lambda.html

[2]: https://en.wikipedia.org/wiki/Code_smell

[3]: From http://www.tux.org/lkml/: "So now we come to the suggestion for replacing the goto's with C exception handlers. There are two main problems with this. The first is that C exceptions, like any other powerful abstraction, hide the costs of what is being done. They may save lines of source code, but can easily generate much more object code. Object code size is the true measure of bloat. A second problem is the difficulty in implementing C exceptions in kernel-space. This is covered in more detail below."

[4]: Dijkstra 1968, A case against the goto statement, http://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/EW...

[5]: Knuth, Structured programming with go-to statements, http://pic.plover.com/knuth-GOTO.pdf

[6]: http://stackoverflow.com/a/741517/222815


Good Resource!


Tarsnap is the classic example of a C program whose reliability almost defies belief. Its codebase is large; it solves a very hard problem; and it manages to handle almost every corner case flawlessly. (As of the latest version, it seems to handle every corner case flawlessly, as far as I know.)

But I've always wondered: if Colin were as experienced with Python as he is with C, would Tarsnap be better served by writing it in Python? It seems like the codebase would be half as large.

In other words, is C the reason why Tarsnap is good, or is it because of Colin? Or is it both?

Example: Lisp, along with PG, is the reason why HN is good. Many of HN's features simply wouldn't be easily accomplished without it, so they wouldn't ever have been implemented. But I'm not sure the same could be said of Tarsnap.

But let's be honest - in how many languages can one retrospectively add a garbage collector?

Many. Lisp in particular. HN's links like /x?fnid=KxPuWC6qnkl4nxcUTDh4xs are an example of this.


Wait, HN is good? The community is interesting, but I've never been particularly impressed by the implementation. In particular, the "feature" where "More" links and such randomly expire after a while just screams "shoddy implementation" to me.


The thing is, the "more" links are specific to each user and generated page. They're not just "page 3 at the time you click on the link", but the actual next 30 links following the 30 it's previously shown you. As a result, it needs to store state for each of these links it's created (IIRC, the fnid is a reference to the closure containing that state). For obvious memory management purposes, these need to be expired after some time.
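
Very roughly, and purely as an illustration (HN actually stores Arc closures, not a C table, and every name and number below is invented), the mechanism looks something like this:

    #include <string.h>
    #include <time.h>

    #define MAX_LINKS 1024
    #define LINK_TTL  (30 * 60)            /* reclaim state after 30 minutes */

    struct more_link {
        char   fnid[32];                   /* opaque id embedded in the URL   */
        int    next_items[30];             /* the exact 30 items to show next */
        time_t created;
    };

    static struct more_link table[MAX_LINKS];

    /* Unknown or expired ids give you the "unknown or expired link" page. */
    static struct more_link *lookup(const char *fnid)
    {
        time_t now = time(NULL);

        for (int i = 0; i < MAX_LINKS; i++) {
            if (strcmp(table[i].fnid, fnid) != 0)
                continue;
            if (now - table[i].created > LINK_TTL)
                return NULL;               /* state has been reclaimed */
            return &table[i];
        }
        return NULL;
    }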

I agree that it's an odd implementation choice though. I guess it made sense when HN was smaller and it could grant more resources to each user. Nowadays, Reddit's simpler system (with risks of duplicates or missing rising links as you go deeper) would probably make more sense.


I understand that it's a tough problem and can totally understand solving it by just making the links quit working like that. But that sort of thing doesn't strike me as "good", even if it is highly pragmatic.


And why not store that state on the client? 30 serialized ids is not that much.


Yeah, I really can't believe that hasn't been fixed by now.

Not only that, but it's impossible to follow discussions here. You have to go through your comment history to find threads you've replied to.


The UI is (I think) intentionally poor in some ways. For example, when you want to reply to a comment it takes you to a new page, and then you still have to click on the textarea to focus it. I think they don't want people to be able to comment quickly.


It's rarely the language that has any actual effect on the end result. The language is just a side show - the main story is about how well the developer understood the problem space and how well the abstractions and solutions he generated work to solve the problem.

Language can help a bit in making it clearer and giving the programmer more time to work on the important parts - but this actually goes both ways. Not having to worry about memory is a definite plus, but it can also be a big minus when you need to do something specific with that memory. For example, mutating in place is very easy in C when it's required, but trying to mutate in place in some of the newer languages is very difficult.
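
A trivial illustration of the in-place point (my example, not the poster's): the buffer is modified where it sits, with no copies and no extra allocation.

    #include <ctype.h>
    #include <stdio.h>

    static void upcase_in_place(char *s)
    {
        for (; *s; s++)
            *s = (char)toupper((unsigned char)*s);
    }

    int main(void)
    {
        char buf[] = "tarsnap";

        upcase_in_place(buf);        /* buf itself now holds "TARSNAP" */
        printf("%s\n", buf);
        return 0;
    }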

Also, it's doubtful that Tarsnap would be better if it were written in Python. It might (or might not, depending on optimizations) have been quicker to write it originally in Python - but nobody would trade an already-written C Tarsnap for a Python Tarsnap, because the Python one would use more RAM and run slower.


> It's rarely the language that has any actual effect on the end result.

Implementing HN in APL is left as an exercise for the reader.


> the reason why HN is good

I don't get this.

I would choose Reddit over HN any given day, technically / ergonomically speaking.

Actually if https://news.ycombinator.com was returning a 301 to https://reddit.com/r/hackernews I would be very happy.


Is /r/hackernews officially restricted by Reddit or what is going on there?


> E.g. Lisp, along with PG, is the reason why HN is good. Many of HN's features simply wouldn't be easily accomplished without it

Could you expand/elaborate on this? I've never viewed HN as particularly complex, just tastefully designed with a good community.


I never read the Arc source, but I bet what he means is how comments are laid out in memory, which you can see reflected in direct links:

https://news.ycombinator.com/item?id=6857138

I'm not sure why that would be impossible in a non-Lisp, just less elegant.


The traditional argument for high level languages is that you can try many more things in the same amount of time, and that you can focus more of your cognitive faculties on the business problem at hand.

But C guides you to simplify, and gives you control. Combined with a certain amount of taste, this can lead to exceptionally good outcomes. See also qmail, Plan 9, etc.


/x?fnid=KxPuWC6qnkl4nxcUTDh4xs resolves to "unknown or expired link". Do you have a full URL?


It got garbage-collected. That's the point.


Gotcha. The parent comment made it seem like that was a link to information about retrospectively adding GC to other languages, especially given that Lisp already has GC as nearly an assumption at the language level. But I can see what he was trying to get at.


haha, exactly


Since everyone has access to a decent C toolchain these days, you tend to see this over and over again. The "wisdom tradition" of UNIX? There is some kind of need for this counterfactual "the ancients had powers we can't believe" thing... The thing is that languages and frameworks surface sets of techniques for solving problems which, if not present, end up needing to be reinvented. This goes for C and UNIX, mainframes, OS/400, Erlang... whatever. Maybe I'm just being contrary, but it seems to me that the lessons I learn in one language apply to all the others, at least in some facet.

I worked my way through SICP last year and it really changed the way that I think about code... while also not making me a Lisp nut. It seemed to me that Lisp could have been anything else and the techniques would still be there.



