How can C Programs be so Reliable? (tratt.net)
249 points by ColinWright on Sept 22, 2011 | 152 comments

"C is unreliable" is the wrong way to conceptualize the problem. The argument against C is that it is inefficient. It's not that C programs are broken; it's that if you spend six hours writing C code your code will either do less, or be more broken, than what you would have produced had you spent those six hours writing, e.g., Perl or Javascript or Lisp instead.

There's no reason why you can't write correct C code, or correct assembly code for that matter. The challenge is to do so without wasting a lot of time: Any amount of time that you spend consciously thinking about correct memory management or hand-optimizing your opcodes could probably be spent doing something more important, unless you are working on one of the few problems where that kind of optimization is actually the bottleneck.

Of course, the flip side of having to think about every layer is that you get to see and potentially tweak every layer. It's nice to work on something transparent. It's nice to know what is going on down there among the sockets and the buffers. I've been thinking about practicing some C for just that reason, and it seems to be why the OP likes C. But I don't anticipate being very efficient when writing my own web server in C. My website will be better if I just install a big pile of other people's C and get on with designing or writing.

I think it is a mistake to think that you can't be efficient just because you're using C. It all depends on the abstractions you need to write your code. When writing code for the web, using Python is faster not just because of the language, but because there is so much ready to be used. Putting together 10 disparate libraries is much easier in Python/Ruby than it will ever be in C and C++. However, there is a lot of code that either doesn't require all these libraries or where the infrastructure is well established in C (think about kernel drivers for example). In that case, programming in C is not fundamentally harder than in Python, if you have enough experience.

Regardless of the abstractions you use, buttoning down a C runtime environment so you can rely on the assumptions behind those abstractions is a chore. There's a pretty famous Mark Dowd Sendmail vulnerability that relies almost entirely on (only incidentally sent) signal timing that I like to use as an example here, but rlimits are another.

The problem is, your code will appear to work fine even if you don't complete the chore. It isn't until your program blows up that it'll even occur to you that there was more work to be done.

> There's a pretty famous Mark Dowd Sendmail vulnerability that relies almost entirely on (only incidentally sent) signal timing that I like to use as an example here

Would love to see a reference to this.

The unwillingness to cut and paste "mark dowd sendmail" into google for something you would "love to see" is amusingly typical of lots of the comments on this thread. It's the first result BTW

I like the second result better than the first, for what it's worth. ;)

But come on. While let-me-Google-that-for-you requests are annoying, they are in toto less toxic to threads than comments like yours; at least the lame question generates a factual answer.

FWIW I Googled for "mark dowd signal timing sendmail vulnerability" which led me to http://lwn.net/Articles/176596/ which didn't look like the right vulnerability and gave me the impression that Mark Dowd does a lot of vulnerability research -- it seemed that finding the exact vulnerability tptacek was talking about might be more complicated than just asking him directly.

> buttoning down a C runtime environment so you can rely on the assumptions behind those abstractions is a chore.

may be hard, but not more difficult than creating the run time for any other language such as Java or Python -- and they are all written in C.

Irrelevant, since those runtimes are already written. And yes, I am primarily a C programmer, although I've been mostly doing C++ recently.

Your point is valid, but you still miss the larger point you're replying to. Even without libraries, coding in C is going to require more lines of code than coding the same functionality in a higher-level language like Python. And lines of code correlate more or less evenly with time/effort/thought that goes into writing the code.

Crucially, this linkage does not depend on whether there are libraries for what you're trying to do or experience level in the language. If coding a particular feature takes 1000 lines of code in C vs. 100 in Python, you'd have to assert that a programmer can write (and maintain) code 10x faster in C than Python. Since roughly the same amount of conceptual energy goes into each line of code, this is tantamount to the assertion that a programmer writing C can think through concepts 10x faster than when the same programmer is writing Python.

There's a reason the field invented higher-level languages, and it doesn't entirely have to do with novice programmers.

Less lines of code does not equal more efficient code. A single C statement maps almost 1:1 onto the underlying assembly when compiled, whereas a single statement in a high level language usually expands to several assembly instructions.

I think the overarching point is experience with the language. I for one have seen and written incredibly long and convoluted programs in a high level language which was largely due to a weak understanding of the massive API's that come with it. And it would have taken roughly the same amount of time to read and understand what the API functions do and how to use them as it would in C.

In the end, it's all relative and it's totally dependent on experience with the language and what you need for your application.

This thread is about efficiency of the coding process, not the execution process. Nobody would argue that an arbitrary high-level language performs as well as the same code written in C or hand-tuned assembly. The point of this discussion is that writing C or hand-tuned assembly takes more lines of code (and therefore thought) than writing the same code in a higher-level language.

So with this definition of efficiency, less lines of code does equal more efficient code.

I find that by far the most time/effort/thought goes into figuring out the overall structure of the algorithm I want to implement. Once I've sketched that out, translating it into lines of code is pretty much just typing.

> It's not that C programs are broken; it's that if you spend six hours writing C code your code will either do less, or be more broken, than what you would have produced had you spent those six hours writing, e.g., Perl or Javascript or Lisp instead.

Python doesn't produce smaller programs compared to C http://plg.uwaterloo.ca/~migod/846/2011-Winter/projects/Simo...

That article doesn't debunk the claim very effectively, IMO: of Bazaar, Git, Mercurial, SVN, and CVS, Python-dominated Mercurial is the smallest by a healthy margin (~50k SLoC), whereas the more C heavy CVS, Git, and SVN weigh in at ~2x, 5x, and 8x that amount.

Sure, we throw in the almost 100%-Python Bazaar, at 200k, which CVS beats. But CVS vs. any of these other source control systems is not really a fair comparison, IMO, and it's still blown out of the water by Mercurial.

I'm not saying that this proves that Python code is smaller (comparing source control systems to each other is completely unfair, since they differ so much in feature set, platform target, and code quality), but it certainly does not disprove it.

There is no single best language for all problems out there. Let's say you have a working program that does something useful over a TCP/IP network, but you have zero documentation for its protocol and yet you still want to communicate with it. Trying to do this in Perl is actually harder than in C, assuming equivalent understanding of both languages. Yet do the same thing when it's a browser you're talking to, and suddenly Perl wins. The advantage of higher level languages lies not in their ability to do everything well but in their ability to handle huge projects that do a ridiculous number of things with minimal pain.

PS: PG is in love with LISP in large part because he deals with problems well suited to the LISP domain. But, if he had been writing drivers he would have gone in another direction. The real lesson is if you wanted to write a great XML editor use something in the LISP family or something that's closely related to it not that LISP wins on all fronts.

Great article. My personal take on this is that C programs are so damn reliable because there is nothing under the hood, the building blocks are so simple and transparent that you can follow the thread of execution with minimal mental overhead.

That means that when you lay out your program the most important parts (memory map and failure modes) are clearly visible.

IF you are a good programmer.

And that's the reason there is an obfuscated C contest, if a C programmer sets his or her mind on being deliberately hard to understand that same power can be used against any future reader of the code. Incompetence goes a long way towards explaining some of C's bad reputation. You can write bad code in any language, but none give you as much rope to hang yourself with as C (and of course, C++).

    the building blocks are so simple and transparent
    that you can follow the thread of execution with 
    minimal mental overhead.
I do not agree.

I've seen plenty of code that does weird things with pointers, like passing around a reference to a struct's member, then to retrieve the struct decrementing a value from the pointer + casting. Or XOR-ing pointers in doubly-linked lists for compression. And these are just simple examples.

I've seen code where I was like "WTF was this guy thinking?".

My biggest problem with C is that error handling is non-standard. In case of errors some functions are returning 0. Some are returning -1. Some are returning > 0. Some are returning a value in an out parameter. Some functions are putting an error in errno. Some functions are resetting errno on each call. Some functions do not reset errno.
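To make the complaint concrete, here is a minimal sketch using only standard library calls. The wrapper names `open_check` and `parse_long` are made up for illustration; the point is just that `fopen` and `strtol` report failure in different ways, and the caller has to know which is which.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Convention 1: fopen() returns NULL on failure. The wrapper
 * (hypothetical name) maps that onto a 0 / -1 result. */
int open_check(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    fclose(f);
    return 0;
}

/* Convention 2: strtol() has no spare return value for "error".
 * Every long is a legitimate result, so the caller must clear errno
 * before the call and inspect both errno and the end pointer after. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;                       /* strtol never resets errno itself */
    value = strtol(text, &end, 10);
    if (errno == ERANGE)
        return -1;                   /* overflow or underflow */
    if (end == text || *end != '\0')
        return -1;                   /* no digits, or trailing junk */
    *out = value;
    return 0;
}
```

Both wrappers end up with the same 0 / -1 shape, which is roughly what most C codebases do: pick one convention and translate everything into it at the boundary.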

Also, the Glibc documentation is so incomplete on so many important issues that it isn't even funny.

Yes, kernel hackers can surely write good code after years of experience with buggy code that they had to debug.

But for the rest of the code, written by mere mortals, I basically get a headache every time I have to take a peek at code somebody else wrote.

> I've seen plenty of code where I was like "WTF was this guy thinking?".

Yes, that happens. But I've seen that in COBOL, Perl, Pascal, Java, PHP and in Ruby as well.

> In case of errors (s)ome functions are returning 0. Some are returning -1.

That's not a feature of the language.

     That's not a feature of the language.
Well, yes, but it's kind of nice when you've got exceptions with stack traces attached.

Some people don't like exceptions, but I do.

It is indeed "kind of nice". But the question at hand is whether it's a requirement for writing reliable software. I tend to agree with the posts here that argue that it's not. It saves time for developers, it doesn't meaningfully improve the quality of the end product.

Serious C projects tend to come up with this stuff on their own, often with better adapted implementations than the "plain stack trace" you see in higher level environments. Check out the kernel's use of BUG/WARN for a great example of how runtime stack introspection can work in C.

gdb: where

No, but C could use more consistent error handling semantics, rather than conflating return values and error codes. Worse still: a combination of a return code and a global error code.

"Worse still: a combination of a return code and a global error code."

That's not the worst that exists in C :-) Let me quote a dietlibc developer from http://www.koders.com/c/fid1639C203A2255EB1FA11DC6A68D74FEB2...

    /* Oh boy, this interface sucks so badly, there are no  words for it.
    * Not one, not two, but _three_ error signalling methods!  (*h_errnop
    * nonzero?  return value nonzero?  *RESULT zero?)  The glibc goons
    * really outdid themselves with this one. */

And a comment written by a 13-year old proves what exactly?

But first: functions returning multiple values. Baby steps. :-)

> In case of errors (s)ome functions are returning 0. Some are returning -1.

That's not a feature of the language.

The inconsistency is a natural, expected, unavoidable result of the language forcing, er strongly encouraging, use of an unsuitable error reporting mechanism ("find some value in the range of the function's return type that isn't in the range of the function, and use it to indicate an error"). This wouldn't be an issue with exceptions or tuples / multivalue return like some languages allow.
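For what it's worth, plain C can get part of the way there by returning a small struct. This is only a sketch, and `div_result`, `div_error` and `checked_div` are names invented for the example. Integer division is a case where every `int` is a possible answer, so there is no spare value left over to mean "error".

```c
#include <limits.h>

/* Hypothetical error codes for the example. */
enum div_error { DIV_OK = 0, DIV_BY_ZERO, DIV_OVERFLOW };

/* The status and the value travel side by side instead of
 * sharing one return slot. */
struct div_result {
    enum div_error err;
    int value;              /* meaningful only when err == DIV_OK */
};

struct div_result checked_div(int a, int b)
{
    struct div_result r = { DIV_OK, 0 };

    if (b == 0)
        r.err = DIV_BY_ZERO;
    else if (a == INT_MIN && b == -1)
        r.err = DIV_OVERFLOW;   /* quotient does not fit in an int */
    else
        r.value = a / b;        /* C99 and later: truncates toward zero */
    return r;
}
```

The catch is that nothing forces the caller to look at `err` before using `value`, which is the part exceptions or a real sum type would give you.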

> like passing around a reference to a struct's member, then to retrieve the struct decrementing a value from the pointer + casting.

that's not weird, that's a pretty standard way to enqueue structures on singly/doubly linked lists... it's made somewhat prettier by offsetof/CONTAINING_RECORD though

Yeah, but why?

I mean, can't you pass a reference to the whole structure instead? I prefer pointers to void* to the whole thing, with a normal cast later, instead of seeing pointer arithmetic.

I'm not a C developer, I just play around -- I've seen for example this practice used in libev, passing around extra-context along with the file-handler in events callbacks.

That seems really ugly to me, as they could have added an extra parameter for whatever context you wanted to be passed around.

It's a relatively common pattern to have a "collection" data structure (like a list or hash table) use link structs embedded inside other structs to simplify memory management. When you append to a linked list in Java, you always allocate a new Link object and then update the various pointers. Using this pattern in C, the object you want to put in a list contains a list_link_t structure, and the list library takes as arguments pointers to these structures. This may sound like an argument about convenience, but the implications of this are very significant: if you have to allocate memory, the operation can fail. So in the C version (unlike the Java one) you can control exactly in which contexts something can fail.

For example, if you want to trigger some operation after some number of seconds elapse, you can preallocate some structure and then just fill it in when the timeout fires. Timeouts usually fire from signals or other contexts where you have no way to return failure, so it's important that it be possible to always handle that case correctly without the possibility of failing.

This pattern is sometimes called an intrusive data structure. For example, see boost::intrusive in the C++ world. It saves allocation, gives better locality, and allows various optimizations such as the ability to remove an object from a doubly-linked list in constant time.
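A minimal sketch of the idea in C, for anyone who hasn't seen it. The names `list_link`, `timer`, `list_push` and `total_seconds` are made up for the example, and `container_of` here is a simplified version of the well-known macro (no type checking), not a copy of any particular project's definition.

```c
#include <assert.h>
#include <stddef.h>

/* The link lives inside the object, so the list never allocates. */
struct list_link {
    struct list_link *next;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical payload; the link is deliberately not the first member. */
struct timer {
    int id;
    int seconds;
    struct list_link link;
};

/* Generic push: knows nothing about struct timer and cannot fail. */
void list_push(struct list_link **head, struct list_link *node)
{
    assert(head != NULL && node != NULL);
    node->next = *head;
    *head = node;
}

/* Walk the links and hop back to each enclosing timer. */
int total_seconds(struct list_link *head)
{
    int total = 0;
    struct list_link *p;

    for (p = head; p != NULL; p = p->next) {
        struct timer *t = container_of(p, struct timer, link);
        total += t->seconds;
    }
    return total;
}
```

A `struct timer` can sit on the stack or in a preallocated pool, and putting it on the list is two pointer writes. Adding a second `struct list_link` member would let the same object sit on two lists at once.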

Another way to think about all the offsetof() stuff is that it's emulating multiple inheritance in C. You can think of structures as inheriting the "trait" of being a participant in a container; the "pointer-arithmetic-and-cast" idiom to move from a container entry to the corresponding object is isomorphic to downcasting from the trait to the object that contains it.

Interestingly it is not possible to express this pattern in a generic way within Java's type system.

True - and the important point here is that this is a pattern. Like any language, to be truly fluent in C you have to understand the common idioms as well as the syntax, grammar and vocabulary.

the purpose of offsetof/CONTAINING_RECORD is so that you don't "see" the pointer arithmetic, it's safely ensconced in a macro ;)

one advantage is it produces a generic linked list API. you can write routines to traverse, add, and remove elements from the list without caring about the structure of data stored in the list. if you use offsetof, you can also have the list data for a structure at any position inside of the structure instead of the beginning. some systems do that so they can store header information at the beginning.

you can also have elements enqueued on multiple lists. you might say that if you're doing that, you have bigger problems, but sometimes shakespeare got to get paid.

Check out the Linux kernel's linked list implementation for an example of it being done right. The actual workings of it are hidden behind macros that you can look at (quite simple, easy to grasp) so it's very clean to use.

> I've seen code where I was like "WTF was this guy thinking?".

You can write obfuscated code in any language. The point about C is that the mental model is very simple. There's no magic happening anywhere, so if you can parse the language, you can figure out what's happening line-by-line pretty easily.

This is one of the biggest reasons Linus Torvalds refuses to re-write the Linux kernel in C++ even though he is repeatedly pushed to do so... in C++ a whole bunch of things outside the file (templates, operator overloading) can make it so that what you're looking at doesn't do what you think.

In C, what you see is what you get. :)

For definitions of "repeatedly pushed to do so" that mean "asked on occasion by language trolls on the mailing lists who aren't even kernel devs". ;)

    #define int double

There is a special place in hell for you.

Had me laughing though :)

Remember that if you want to use C++ in kernel development, you'll only get a small subset of C++ because there's no runtime system to rely on. Exceptions are one example of this.

Without exceptions, the advantages of C++ are not that great compared to the hassle of getting it running in kernel mode: dealing with name mangling, static/global constructors, etc., and wrestling with compilers.

> like passing around a reference to a struct

I thought that references are a feature of C++, not C. Personally, I never really got references... They are just a kind of magical pointers that programmers can forget about, but they make the code much less readable and can interact in funny ways...

Btw, I corrected my statement -- I was referring to "a struct's member" from which later you can retrieve the actual struct that owns that value.

Of course C has references because C has pointers. References in C++ are just constant pointers.

> Of course C has references because C has pointers. References in C++ are just constant pointers.

This is incorrect. You cannot have a reference to nullptr, for example. Pointers and references are different beasts, nowhere in the C standard does it refer to pointers as references. The underlying representation in compilers does not imply equivalence.

Absolutely. Having learned Pascal before C, I really missed pass-by-reference for quite some time. Efficiency wise, a reference is just a hidden pointer, BUT, it is nice to know that the reference CANNOT be null. The caller of a routine expecting a reference must have actual data to pass, or the routine never gets called.

C is a very handy portable assembler, though.

You might like to read "Moron Why C is NOT Assembly"


I've seen code that does things like "Foo &foo = * ((Foo * )0);" (possibly split among multiple statements). It seems to work fine, I suppose it's really undefined behavior?

Dereferencing a pointer to 0 seemed to work fine?

I think what happened was that the reference was passed to a function that (1) under most conditions (I think there was a fast-path added after the function had been around a while) accessed it directly, and (2) under all conditions turned it back into a pointer for another call. The way that function was called in this particular case was outside of those "most conditions", so the only thing that was done with the invalid reference was to turn it back into a pointer and then null-check it. And so while making the reference probably counts as "dereferencing" the pointer as far as language rules go, the memory that it pointed to was never actually accessed.

(Why yes, that does sound like something badly in need of refactoring. And illegal reliance on implementation details.)

On AIX, page 0 is mapped and readable, so dereferencing null works just fine. :)

Actually, the C standard does say:

  A pointer type describes an object whose value
  provides a reference to an entity...

References are a C++ creation with a precise definition. This is what this quote is talking about, considering C does not have references.

You don't need to do wacky things with pointers to get into trouble in C:

  int i;

  /* Iterates over everything except the last n elements of array... right? */
  for (i = 0; i < length - innocent_little_function(); i++)

>You don't need to do wacky things...

A function call in a for loop's conditional doesn't fit your definition of wacky?

A function call in a for loop's conditional is practically C best practice. K&R do it on almost every page.

They usually only do it for builtins (e.g., strlen) and const functions, which makes this example a lot safer.

A bit suspicious, maybe? I was trying to suggest the unsigned issue without actually spelling it out, not anything to do with side effects.

I probably should have used sizeof, even though that doesn't make sense there.
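Spelled out, as a sketch: assume the innocent little function returns an unsigned type such as `size_t` (the stand-in below is called `tail_to_skip`, a made-up name). Then `length - f()` is computed in unsigned arithmetic, and a short array makes the bound wrap around to something enormous.

```c
#include <stddef.h>

/* Hypothetical stand-in for innocent_little_function(): note the
 * unsigned return type. */
static size_t tail_to_skip(void)
{
    return 3;
}

/* The bound as written in the loop above. `length` is converted to
 * size_t before the subtraction, so length < 3 wraps instead of going
 * negative. In the loop condition, `i` is converted the same way. */
size_t buggy_bound(int length)
{
    return length - tail_to_skip();
}

/* One way to compute the iteration count without the wraparound. */
int safe_count(int length)
{
    size_t skip = tail_to_skip();

    if (length < 0 || (size_t)length <= skip)
        return 0;
    return (int)((size_t)length - skip);
}
```

With `length == 2` the buggy bound comes out as `SIZE_MAX`, so the loop walks far past the end of the array instead of not running at all.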

> Incompetence goes a long way towards explaining some of C's bad reputation

Not just incompetence. Also bad language choice (usually due to legacy).

If programmers don't get enough time to properly test and review the code, which needs to be done very thoroughly in C, it's easy for even experienced developers to shoot themselves in the foot.

C is very good (let's say irreplaceable) for low-level hardware and OS code. This is code that needs to be verified and tested very well.

On the other hand, using C for run-of-the-mill business projects or higher-level stuff on a tight deadline can be a very bad idea. It results in a lot of overhead for programmers to think about the details of error handling, buffer sizes, pointers, memory allocation/deallocation and so on, especially getting it right for every function. It is a recipe for screwups.

In this case it is very useful to have garbage collection, bounds checking, built-in varlength string handling, and other "luxuries" that modern languages afford you.

Right. C with Lua is a nice combination serving a wider range of projects without sacrificing C's approach to low-level correctness.

I love Lua, but what stack would allow you to use it in a web project?

Mongrel2 has a Lua web framework called tir.

Precisely. These days, with the advent of easy to integrate Javascript/Python/Ruby/Lua scripting languages, there is very little reason to write an entire application in C. My last two projects, I wrote the business logic in a scripting language, and all of the performance bound stuff in C. There are a few gotchas the first time you do this - your bindings code needs to fit with the object model that you're using in C, you need to force all calls into the VM onto the same thread etc. I ended up just writing my own IDL parser/bindings generator, but once you've done that once, you can use it for all of your future projects, and it really isn't that much work. The pay-off is huge - you can arbitrarily move a module of code between C and your scripting language depending on the optimal split for performance/ease of programming.

Very true. When you have managers shouting "get it out" and over-promising to clients, then any language is a bad choice, but particularly powerful languages that require more careful thought and testing. I love C++ and use it regularly, but when I need to "get stuff out quickly" I'll use something like Python as I can be more reckless with it.

I think that in other languages such as Object Pascal it is easier to be a good programmer. I've seen horrible code written in OP, yes, but I think the language itself helps a programmer be better. For one thing there are fewer ways to kill yourself than even in C++, and yet there is little difference in speed between them.

I think C is one of these "other languages" because "here be dragons".

I watch a lot of people bang out C++ code as if it's totally safe, and fail. I see a lot of people hammer out C# code and say, "to hell with you, you don't even have .NET!" And so on. But today, when a programmer sits down and writes a C program they must sit and think out what they're doing and why—with no abstractions like OO to make an easy solution.

There are so many ways to blow your head off in C without knowing you left the opportunity in the program, that it forces a competent programmer to think differently about how they code. And a newbie? Well, if they aren't scared stiff about blowing a hole in their system, they should be! ;D

And C doesn't change often, unlike other languages.

I'm no C programmer, but I've seen C code for years and translated it into whatever language I'm using at the time. I have tremendous respect for UNIX/Linux, and a great many C-powered programs. Thanks for your work on them, guys and gals.

> But today, when a programmer sits down and writes a C program they must sit and think out what they're doing and why

That's true for any programming language. Sadly, far too often, programmers are unable to afford taking the time needed to think about what they are doing or understanding what happens under the hood of the libraries they link against.

>Incompetence goes a long way towards explaining some of C's bad reputation

Incompetence is THE REASON for C's bad reputation (if any).

Agree. I'd also say that someone who is not a good programmer will find C quite unreliable. Bugs and issues popping up every now and then.

That is why in the "old days" C was used as a language for advanced courses in CS programs. It teaches you to think straight about your code. Nowadays people think it is easier just to write anything and catch exceptions later.

> the building blocks are so simple and transparent that you can follow the thread of execution with minimal mental overhead.

The first fundamental purpose of any programming language is to provide abstraction via functions. This implies that following the thread of execution is never easy and the blocks are never simple. It's pretty much a wash, with special mention for languages in the Hindley-Milner family.

The second fundamental purpose of any programming language is to provide specification abstraction via replaceable modules. This is where C fails. It is common practice in C culture to not specify interfaces in depth (we are all good programmers, aren't we?) and the implementation via manual virtual tables makes it painfully difficult to find the specific implementations in the code base.

My problem with this article is the use of the word 'flaw' to describe the potential pitfalls of programming in C. Use of that word seems to imply that these things are accidental, and maybe if it had been better designed the problems wouldn't exist.

The original idea of the language (or at least a major part of it) was to be a portable alternative for the many processor specific assembly languages in use - rather than having to write the same functionality for each one, you could write it once in C and then compile it for each platform. If that's your aim, then you will end up directly manipulating memory, and you open yourself up to that whole class of errors - memory leaks, array overruns, pointer arithmetic mistakes. All C gives you is portable access to how processor hardware works, with a few conveniences (y'know - function calls).

If you want to protect against these problems you have to add some extra layers of abstraction between the language and the underlying hardware, and that comes at a cost. That cost is mostly performance, but thanks to Moore's law these days that is a much lower priority hence the abundant use of higher level languages - Java, Python etc.

My point is that C is how it is _on purpose_. This direct access to the hardware comes with some downsides, but they aren't 'flaws', they come hand in hand with the power.

I would contend that C does very little in the way of preventing errors or helping debug them when they occur. The claim that "[..] only two C-specific errors have thus far caused any real problem in the Converge VM," is completely beside the point. Language-specific errors have never been the problem. Java's infamous null-pointer exceptions are not Java-specific: the C equivalent would be a segfault. And please do note that Java prints a stack trace by default to help correct the mistake. A huge step forward from C's generic segfault.

The real reason that most C programs in daily use are so robust, is because they are ages old. Many, many man-years have been invested in the production of e.g. BSD, unix tools, POSIX libraries, and even web browsers and word processors.

Why do we use Javascript and even PHP to program web-applications? Because we need fewer lines to get the same result. Moreover, given the correlation between number of lines and number of bugs, shorter programs are better. If we had been limited to C "web 2.0" would have been decades away.

A nullpointer exception in java is the exact same notification that C gets. A segmentation fault has nothing to do with C, it's an exception raised by the CPU. Printing a stack trace also has zero to do with C, but I would bet that the code generating the trace information in the JVM and others is usually written in C, as is the signal handler that handles it.

It is amazing that the people who rely on high-level languages think they can stomp on lower-level languages like this, without even realizing that virtually _all_ the features they talk about are made possible by low level languages and are implemented in them.

New C programs get written all the time. They work. They move your world, every day, like clockwork.

I'm glad some people reading this thread have a better perspective. It's actually kind of shocking how many people talk about C as if no one uses it anymore. "In the old days..."

Provided the binary has a minimum of debugging symbols, you can get a meaningful backtrace from a core dump of a C program. It is very similar to the Java stack trace. You can even get the value of the parameters, which AFAIK, Java doesn't do.

Assuming the core wasn't dumped as a result of secondary damage, causing a completely correct line of code to be shown at the top of the stack trace.

At least with Java, pointers/references/objects are either null, or valid. Uninitialized references won't compile, and a null dereference blows stack at point of first use.

Having said that, I like pointer and bounds checking, but wish Java had significant memory management options, for times when you were willing to trade some safety for speed.

"Java prints a stack trace by default to help correct the mistake, a huge step forward from C's generic segfault."

A segfaulting program will dump core, which the programmer can use to get the stack trace. I consider this to be better UI than printing the trace at a likely bewildered user.

Distros ship with "ulimit -c" being 0, so no, it won't dump core.
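A program can also check and raise this limit for itself, since `ulimit -c` is the shell's view of the `RLIMIT_CORE` resource limit. A small sketch, assuming a POSIX system; the function names `core_dumps_enabled` and `raise_core_limit` are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Returns 1 if the soft core-file limit is nonzero, 0 if core dumps
 * are disabled, -1 if the limit could not be read. */
int core_dumps_enabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur != 0;
}

/* Raise the soft limit as far as the hard limit allows.
 * Returns 0 on success, -1 on failure. If the hard limit is itself 0,
 * this succeeds but core dumps stay off. */
int raise_core_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0 ? 0 : -1;
}
```

Whether a core file actually appears still depends on the rest of the system's configuration, so treat this as the per-process half of the story only.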

Blame your distribution for that, not C... On OpenBSD "it just works (tm)":

  $ ulimit -c

Although for most Linux users nowadays a limit of zero is the Right Thing.

...Which means that only the people who want coredumps/stacktraces get them. Instead of everyone.

Sounds like a feature.

Ah, interesting information nugget! I have been wondering why there's no core dumps around in linux any more, in the old days when I used redhat & mandrake the fs would be littered with them.

Think of JavaScript and PHP as DSLs for web development written on top of C. Their reliability and conciseness are indirect proof of the author's main point.

C allows you to write interpreted languages that execute with a speed high enough to afford you more abstraction.

That's a bit too cheer-leady of C, and stretches the definition of DSL way too much. Ruby didn't suddenly become a Java DSL when it was ported to the JVM, neither did Perl become a Haskell DSL when Pugs was written.

Let's give credit where it's due, but really C could have been Ada in any of those examples and the results would have been about the same (especially in PHP's case).

A huge step forward from C's generic segfault.

This is straightforward enough in C: http://gaiustech.wordpress.com/2011/09/09/segmentation-fault...

valgrind doesn't dump registers but doesn't require special compilation options and if debugging symbols are available will include file names and line numbers in the stack trace.

Obviously I have no hard data on the reliability of new C programs, but things like git (which I at least find pretty reliable) may serve as counter point to this theory.

Don't forget that only the core of Git is C; the rest is a big pile of shell scripts and (I believe) Perl.

This is not particularly the case. Things with git are initially implemented as shell scripts, as a way of "just getting it done", but are later migrated to C. These days a very large portion of Git is straight C.

Many of the early git scripts (e.g. git-pull) have been rewritten in C. Performance is one reason. Helping out win32 (where fork+exec is slow) is another.

  $ ls -1 git-*.sh | wc -l
  $ ls -1 git-*.perl | wc -l
Compare this to the built-in commands:

  $ ls -1 builtin/*.c | wc -l
or all the C files:

  $ git ls-files '*.c' | wc -l

I suspect the authors of Git are - on average - more competent than the average developer.

when one calls a function like stat in C, the documentation lists all the failure conditions

Actually, no. When the documentation says

    This function shall fail if:

    [EFOO]   Could not allocate a bar.
it doesn't mean that this is the only possible failure; POSIX states that functions "may generate additional errors unless explicitly disallowed for a particular function".

Except in very rare circumstances, when you make system or library calls you should be prepared to receive an E_NEW_ERROR_NEVER_SEEN_NOR_DOCUMENTED_BEFORE and handle it sanely (which in most cases will involve printing an error message and exiting).

That is true, but good man pages still tend to document all or at least a lot of failure conditions. It's quite reasonable to do something like

    if ((fd = open(myfile, O_RDWR | O_NONBLOCK, 0644)) == -1) {
        switch(errno) {
        /* recoverable: let the user pick another file */
        case ENOENT: case ENOTDIR: case EACCES:
        case ELOOP: case ENAMETOOLONG: case EPERM:
            warn("Cannot open file");
            goto choose_file_to_open;
        /* a directory: enter it and ask again */
        case EISDIR:
            if (chdir(myfile) != 0)
                warn("Failed to enter %s", myfile);
            goto choose_file_to_open;
        /* not recoverable: report and exit */
        case ENXIO: case EWOULDBLOCK:
            err(1, "Cannot open file");
        }
    }
The above is well-documented in open(2); compare, for instance, http://docs.python.org/library/os.html#os.open.

`os.open` is a low-level interface whose semantics are platform-dependent. The Python analogue is the builtin `open`, which documents that it raises IOError on a failure to open a file: http://docs.python.org/library/functions.html#open .

True, but you have to dig a lot to find any details. The above code is easy to write from a good man page (OpenBSD's, in this case.)

(Also, open(2) has all these options for a reason; think symlink races. Python's open() is not sufficient.)

Note, open is not part of C. A better comparison would be fopen which, according to the C99 draft I have, just returns NULL on failure.

You're technically correct, but that's not the "C" being discussed in this article. Also, fopen(3) on OpenBSD does document the values of errno after failure (by reference to the malloc(3) and open(2) man pages), and I expect that any reasonable system does likewise.

Sure. And you should certainly think about all the failure conditions listed in the man page and make sure you're handling them sanely. Just make sure that you don't forget to include the

            err("Cannot open file");
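To make the catch-all point concrete, here is a small sketch that maps errno values from a failed open(2) to an action, with a default branch for anything the man page didn't list. The enum and function names are invented for this example, and the grouping of error codes simply mirrors the snippet upthread:

```c
#include <assert.h>
#include <errno.h>

/* What the caller should do after open(2) failed (hypothetical names). */
enum open_action { OPEN_RETRY, OPEN_CHDIR, OPEN_FATAL };

/* Map the errno left by a failed open(2) to an action. */
enum open_action classify_open_error(int err)
{
    assert(err != 0);   /* only meaningful after open() returned -1 */

    switch (err) {
    case ENOENT: case ENOTDIR: case EACCES:
    case ELOOP: case ENAMETOOLONG: case EPERM:
        return OPEN_RETRY;   /* ask the user for another file */
    case EISDIR:
        return OPEN_CHDIR;   /* treat the name as a directory */
    default:
        return OPEN_FATAL;   /* ENXIO, EWOULDBLOCK, and anything undocumented */
    }
}
```

The default branch is the part that matters: an error code nobody has seen before ends up in OPEN_FATAL instead of falling out of the switch unhandled.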

My biggest nightmare: debugging a chunk of badly documented C code that does not check return values to system calls at all...

This would be my dream. Reading and writing C code for a living no matter what the current state of the code.

I have a gut feeling that there is some merit to the idea that exception handling isn't all that great. Just so much code out there does not really handle the exceptions, it just exit(1)s. C will teach you to check return values (usually easy enough: if (result==NULL) {fatalerror(1,"result not OK");}) . If you don't, the program will continue to run (derailed). Most 'high-level' programmers will consider an abortion of execution just fine, while C programmers will put more thought into handling an error situation. Few C programs will automatically abort with a core dump on the first occasion of 'record not found'.

> compilers were expensive (this being the days before free UNIX clones were readily available)

I'm not sure what era the author is referring to, here. In the late 80's, Turbo C broke the price barrier for a decent MS-DOS C compiler at the $79-$99 price range. Shortly after that, Mix began offering their MS-DOS Power C compiler for $20. Tom Swan's book "Type and Learn C++" provided a tiny-model version of Turbo C++ on a disk provided with the book.

The GNU ports djgpp and GCC were available for MS-DOS and Windows in later years.

> the culture was intimidatory;

I'm again wondering what time period he's talking about. When I started learning C in the late 80's, most of the trade magazines were full of articles that used C as the primary language for whatever programs or techniques were being presented. Dr. Dobb's Journal was full of C code. Before Byte quit publishing source code, one could find a fair amount of C there. Of course, the specialty magazines like The C/C++ User's Journal and the C Gazette contained nothing but C and later C++ code.

> This is a huge difference in mind-set from exception based languages,

Yes. C is a language that was designed two decades before Java.

At first, I was really taken aback by the author's take on C, but as I tried to digest why he has these perceptions of the language, I ventured to guess that a number of developers who came of age when languages with more modern niceties were available probably also have this view of C. From the perspective of someone who has been able to use more modern languages, C must seem like a rickety bridge that could be dangerous to cross.

A number of points that Mr. Tratt makes, though, pertain to the programmer, not the language. Certainly there are library routines that allow for buffer overflows, like gets(). It's been known for quite a while (since the Internet worm was unleashed in 1988?) that fgets() should be favored so that buffer boundaries can be observed. Certainly people writing their own functions may not write them correctly, but this is a matter of becoming conversant with C. It's a matter of attaining the right experience.
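As a quick illustration of the fgets() point, here is a sketch of a bounded line reader. The name read_line and its return convention are assumptions made for this example, not anything from the article:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf, never writing more than size bytes
 * (at most size-1 characters plus the terminating NUL), and drop
 * the trailing newline if one was read.
 * Returns 1 on success, 0 on end of file or read error. */
int read_line(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 0);

    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if any */
    return 1;
}
```

An over-long line is not an overflow here: it just comes back in pieces of at most size-1 characters across successive calls, which is exactly what gets() cannot guarantee.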

I would guess he's talking about the early 80s. I was in the same boat in that time period. I didn't have access to C until my senior year in college (1984-85) when we got a new Computer Science department head who was pro-Unix, and we got a Vax 11-750 with BSD Unix on it. I worked on a special project that gave me access to that machine, and there I learned Unix, vi, and C. I thought I had died and gone to heaven!

I've recently been curious why Ada isn't more popular in industry and academia beyond its niches in avionics and defense. It seems close in speed and memory usage to C/C++, has good GNU tools, and claims resilience to the pitfalls of C.

ADA is a stereotypical non-hacker language. It was invented by a DOD committee & requires a good deal of 'extra' fluff to write a program.

Also, as I recall, to be a certified ADA-compliant compiler requires having a ton of libraries.

Ada (not ADA) was largely the product of one man, Jean Ichbiah.

The notion that it's some kind of committee-created monstrosity seems largely to have come from ESR's wildly inaccurate writeup of it in the Hacker's Dictionary. It's actually a fairly small and nice language.

The object oriented extensions didn't follow the "standard" Java "." syntax, though, which probably hurt Ada 95's uptake more than it should have.

Because "C" was invented by geeks and Ada was brought into existence by "The Man", and geeks are as dogmatic as the rest of humanity when it comes to changing their beliefs, even when they are proven wrong.

Before my university transformed into a Java school it was an Ada school. From what I heard the compilers for Ada in the early 90s were extremely slow.

In the end it all boils down to what you're building. If you've done your fair share of programming (C, C++, Java, PHP, Python, Ruby) then you just go with the tools that are best for the job.

Would I write a complete web service in C? Probably not. Would I write a fast image manipulation/modification library for that specific website if needed in C (or C++)? Probably, because I like the performance gain when I'm converting 10,000 images.

I love the fact that you can just build components in different languages and then glue them together so you can build awesome products.

10,000 images is not over the break even point yet (unless you do something really hard resulting in a runtime significantly larger than what it would cost to farm out that job to a bunch of EC2 instances, it's all about the question whether your time is more expensive than the cost to rent the hardware to do the job).

It starts to pay off when you write that package either as a service with a large number of users or if you make a general purpose library for inclusion in lots of other programs, especially if they are written in other languages.

There could be some filtering going on, both on the type of programs one tends to write in C and the type of people that write C. It could be that problems for which C is chosen tend to be intrinsically more well defined (command line applications, kernels, libraries, etc). It could also be that C intimidates less talented programmers, so some self-selection could be happening.

These days, with new software perhaps. But go back just 10 - 15 years and you'll find pretty much every type of software was being written in C. Even webapps: tons of CGI apps were written in C.

The reality is that doom and gloom about C is overrated. Sure, you can shoot yourself in the foot easily, but most competent programmers will do just fine. That's been my general experience and I don't think I've spent my career surrounded by rockstars :-)

I actually agree with you. I never bought into the idea that C programs are intrinsically less reliable. They might even be more reliable, because they don't depend on black boxes that you can't figure out or fix when something goes wrong.

I still write CGI programs in C.

Same here except I've "upgraded" to FastCGI.

Good point. I'd use FastCGI (or equivalent) if I had a significant load; but my CGI programs tend to execute only a few times per hour, so forking is cheaper than keeping a process running.

A week ago I would react differently to this article. But I just had my belief system overhauled by reading http://blog.vivekhaldar.com/post/10126017769/smeeds-law-for-....

I had posted this a few hours ago (http://news.ycombinator.com/item?id=3024495) - I suppose the success of this one has something to do with the submitter? Or the time submitted...

As with comedy, timing is everything. I have had items sink without trace, only to see someone submit the same thing and have it get 100, 200 or 300 points and occupy the front page for a day or more.

It happens.

Added in edit: Just for reference, I didn't down-vote you. Not least, one can't down-vote replies to one's own submissions or comments.

HN is retarded this way.

I see all my posts sink. The only reason I bother to post is it gets maybe 50 people to read and that is better than 0 I figure.

I don't understand the exception argument. You can choose which Exceptions to catch in languages such as Java, just as you would choose which error to deal with in C, but exceptions are so MUCH more powerful because they allow you to check for the error in user code rather than at each function call. In C, errors don't trickle down and you need to deal with them in each level of abstraction, which can be totally useless and time consuming.

I'm a noob at languages with built-in structured exception handling. I'm comfortable with the C approach.

In C, errors don't trickle down and you need to deal with them in each level of abstraction, which can be totally useless and time consuming.

Time consuming, yes, absolutely. "Useless" I don't understand at all. It's structured exceptions that more often seem useless to me.

The more layers of code an exception bubbles up (or trickles down) through, the less the exception handler can know about where it happened, why it happened, or what the resulting state of the program is. Very often, the only "handling" that can be done is making a report of the exception.

It seems to me that the most useful exception handlers, the ones most likely to actually salvage the situation and allow the program to continue to work, are the ones that immediately follow an exception-throwing call, the ones that don't allow any trickle-down.

But those are degenerates, of course. They're functionally equivalent to C-style error return codes.

What you say about error handling is true enough, but it's just not the whole story.

One problem with C-style error handling is that having error handling at all levels makes it impossible to reuse code without tweaking it. Say you handle an error by printing a warning message to stderr; now you can't reuse that code in anything that doesn't want error messages printed there (maybe it doesn't want errors printed, or it needs to localize them, or it's using stderr for something else, as strace does). So error handling in C kills reusability.

Another problem with C-style error handling is that the number of error types increases as you pass the buck up levels, but C doesn't have any good way to express more than one type at a time. Say one function returns true or false and another returns an error constant like errno. When one function calls the other, what do you return at the top level? You could map one to the other, but you've lost maybe important information about the error. So error handling in C kills composability.

And a third problem is documentation. With no standard error types that are known to the compiler it is rarely possible to tell the caller they forgot some error handling.

Exceptions address the first two problems, and checked exceptions the third one. A common misunderstanding of Java checked exceptions is that they force the caller to handle the error, when what they really do is force the caller to document the errors it can generate.

This is absolutely untrue. Look at how libpng handles exceptions. It uses setjmp. Look at how FreeTDS handles errors. It allows you to define a function pointer to handle error cases. Yes you can throw return codes down through each layer but there are lots of other ways to build in exception like handling as well.
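A rough sketch of the function-pointer style mentioned above: the library reports errors through a handler the application can replace, so it never prints to stderr on its own unless nobody installs one. This is not the actual FreeTDS or libpng API; every name here (error_handler, set_error_handler, parse_digit, record_error) is invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

typedef void (*error_handler)(int code, const char *msg);

/* Library default: print to stderr. */
static void default_handler(int code, const char *msg)
{
    fprintf(stderr, "error %d: %s\n", code, msg);
}

static error_handler current_handler = default_handler;

/* Install a handler; returns the previous one so it can be restored.
 * Passing NULL reinstates the default. */
error_handler set_error_handler(error_handler h)
{
    error_handler old = current_handler;
    current_handler = (h != NULL) ? h : default_handler;
    return old;
}

/* A library routine that reports failure through the handler
 * instead of deciding for itself where messages go. */
int parse_digit(char c)
{
    if (c < '0' || c > '9') {
        assert(current_handler != NULL);
        current_handler(1, "not a digit");
        return -1;
    }
    return c - '0';
}

/* Example application-side handler: remember the code, print nothing. */
static int last_error = 0;

void record_error(int code, const char *msg)
{
    (void)msg;
    last_error = code;
}
```

After set_error_handler(record_error), a call like parse_digit('x') still returns -1 but leaves stderr alone, which addresses the reusability complaint at the cost of some global state.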

If you're going for reliability, exceptions are generally speaking more unpredictable than error codes at the function level (as they can propagate up, including out of libraries, where you may not know they'll originate from).

Also, setjmp: http://en.wikipedia.org/wiki/Setjmp.h if you really really really like exceptions and want them in C.

Personally, I'm partial to "catch ( Throwable e)" in Java, so that any exception or error gets noticed, and then log what was broken by such problem.

The declared exceptions (and exceptions vs errors) in Java are a nuisance, IMHO. For example, use any kind of "dependency injection" ("strategy" pattern, driver plugin(s) for other, older languages) via reflection, and suddenly all of your exceptions have transmogrified into errors :-(

I don't find C programs to be very reliable at all.

Heavily used ones are as reliable as other heavily used programs, but barely any C programmers even use clang (static analysis) or even the elderly lint and its more modern cousins.

On top of this, half of the people calling themselves C programmers are really C++ programmers (the two languages are used quite differently when used correctly), so I don't think he's summarizing the field correctly at all.

edit: I have been a C programmer for most of my career, including embedded linux, cli linux (including research robotics), and C-Servers to communicate to the above

I'm not some guy who just knows python and bitches about "the hard compiled languages" (although I do like python and ruby and objective-C).

> if we're being brutally honest, only fairly competent programmers tend to use C in the first place.

Oh, if only that were true. I've seen some not-so competent programmers churn out lots of C code (and then move on to C++ in order to do some real damage)

C programs are reliable because either they're small, or, in the case of the few large reliable ones like the Linux kernel, they have undergone a tremendous number of eyeball-hours of review.

I don't see how any language that depends on manual handling of error return codes can ever be considered "reliable". It's far, far too easy to leak memory and other resources. As other posters have noted, the only reason a lot of popular C programs are reliable is that they've been groomed with a fine-toothed comb.
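For context, the usual C answer to the leak problem is the single-exit "goto cleanup" idiom: every failure path jumps to one block that releases whatever was acquired. A minimal sketch follows; copy_file and the 4096-byte buffer size are arbitrary choices for this example. It is still manual, which is the point being argued, but it keeps the release code in one place.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy src to dst. All failure paths funnel through the cleanup
 * section, so each resource is released exactly once.
 * Returns 0 on success, -1 on any failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = NULL, *out = NULL;
    char *buf = NULL;
    size_t n;
    int rc = -1;

    assert(src != NULL && dst != NULL);

    if ((in = fopen(src, "rb")) == NULL)
        goto done;
    if ((out = fopen(dst, "wb")) == NULL)
        goto done;
    if ((buf = malloc(4096)) == NULL)
        goto done;

    while ((n = fread(buf, 1, 4096, in)) > 0)
        if (fwrite(buf, 1, n, out) != n)
            goto done;
    if (ferror(in))
        goto done;
    rc = 0;

done:
    free(buf);                          /* free(NULL) is a no-op */
    if (out != NULL && fclose(out) != 0)
        rc = -1;                        /* a failed flush is a failed copy */
    if (in != NULL)
        fclose(in);
    return rc;
}
```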

The only low-level language that has any innate claim to reliability is C++ with proper use of the RAII idiom.

> The only low-level language that has any innate claim to reliability is C++ with proper use of the RAII idiom.

Or rather a very carefully chosen subset of C++. See http://yosefk.com/c++fqa/exceptions.html#fqa-17.3 for some of the problems.

By the way, what about ADA?

I have to admit a total ignorance of ADA. In domains where it's appropriate it may be a good choice. It's certainly a niche language compared to C/C++ though.

I think some of Yosef's advice on C++ is out of date with respect to the C++11 standard too. shared_ptr is now the recommended smart pointer, for instance.

Ada is typically less well known and not used a lot, but if you want reliable code, it is pretty nice.

Be aware that it is a b&d language.

I usually use Haskell at work, which is even more b&d in a sense and it's pretty reliable. But you can not call Haskell low-level.

PL/SQL is similar to Ada in many ways.

It's like finding an old circus tightrope walker who has performed for 30 years without a net and saying:

My goodness look at that, how can those tightropes be so reliable?

First, you have to decide if you are building a batch program, or an interactive program. Then, you make your own I/O and memory wrappers with an appropriate error handling strategy: exit(), or longjmp(), etc. In addition, for a batch process, you might have isolated blocks of logic that use setjmp/longjmp to log and skip over blocks of crappy input.

The point is that, as mentioned elsewhere, C is "build your own system" level -- VERY LOW (level). The overhead of wrapping primitives with strategies for your app is minimal, once you know it's an issue.
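Roughly what that looks like for a batch job, as a sketch: a memory wrapper whose strategy is exit(), and a record loop that uses setjmp/longjmp to log and skip bad input. The names xmalloc, parse_record and sum_records, and the "one integer per record" format, are assumptions made up for this example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf bad_record;

/* Memory wrapper for a batch job: running out of memory is fatal. */
void *xmalloc(size_t n)
{
    void *p = malloc(n ? n : 1);
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Parse one record; on garbage, jump back to the loop in sum_records(). */
static long parse_record(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        longjmp(bad_record, 1);
    return v;
}

/* Sum the valid records, logging and skipping the bad ones.
 * The number of skipped records is stored in *skipped. */
long sum_records(const char *const *records, size_t n, size_t *skipped)
{
    /* volatile: these must keep their values across a longjmp */
    volatile long total = 0;
    volatile size_t i;
    volatile size_t bad = 0;

    assert(records != NULL && skipped != NULL);

    for (i = 0; i < n; i++) {
        if (setjmp(bad_record) != 0) {
            fprintf(stderr, "skipping bad record %zu\n", (size_t)i);
            bad++;
            continue;
        }
        total += parse_record(records[i]);
    }
    *skipped = bad;
    return total;
}
```

The parsing code never checks a return value for "bad input"; the strategy lives in one place, the loop, which is the cheap wrapping the comment describes.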

I wrote about some of this on my blog back in Feb. ( http://www.jayfuerstenberg.com/blog/hot-potato-thoughts-on-j... ) Java's exceptions cause Java applications to break often. It's not something that Java engineers want to hear but it's true.

Oh, I love C. So simple and so powerful. God help me, I barely write anything in C anymore, but it will always hold a dear place in my heart until the day I die.

Thank you for allowing me this nostalgic indulgence, hackernewers. I know for at least a few of you, it will resonate.

C is beautiful because http://en.wikipedia.org/wiki/Duff%27s_device is possible.

I agree with you.

But C is also ugly because Duff's device is possible. Think about it from an optimizing compiler standpoint. It takes a serious amount of effort to turn Duff's-device-style control flow into an intermediate representation that can be somehow optimized. Now compare that to a language that is based on some form of extended lambda calculus.
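For anyone who hasn't seen it, here is a sketch of the construct being discussed: a switch whose case labels jump into the middle of an unrolled do/while loop. Note this is the common memory-to-memory variant; Duff's original wrote every item to a single output register rather than incrementing the destination. The name duff_copy is just for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count shorts from src to dst with the loop body unrolled 8x. */
void duff_copy(short *dst, const short *src, size_t count)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    if (count == 0)
        return;               /* the do/while below always runs at least once */

    n = (count + 7) / 8;      /* number of passes through the loop */
    switch (count % 8) {      /* jump into the loop to handle the remainder */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

The interleaving of switch and loop is legal C, and it is exactly the kind of irreducible-looking control flow the compiler complaint is about.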

C apps are reliable because C programmers embrace C's direct simplicity. Other languages aspire to be more complex by adding new features and syntax, where C remains stubbornly simple. Still dangerous, but still simple.

Fancy that, catching Colin at his own game :)

This kind of "FUD" surrounding C is definitely exaggerated. There's an awkward knee-jerk glow to the whole article, not least from the fact that the writer admits his inexperience in C. At times it even seems as if he lacks experience in programming, silently admitting his failure to comprehend the computer/software symbiosis altogether. After reading the article I played around with a funny exercise in my mind: I replaced the semantical mentions of C and programming with "tightrope walking", moving his arguments out of the computer programming sphere, and suddenly the general, ridiculous tone of the article stood out even more clear. Tightrope walking can be really, really tricky. Running with scissors can be done in a risky way, I suppose. Practicing pistol marksmanship incurs some risk, too.

"pointers... arguably the trickiest concept in low-level languages, having no simple real-world analogy"

Arguably, indeed. The analogy is quite simple - a gigantic roulette wheel with 2^$membusbits slots, except the numbers are sequential. The ball is the pointer and pointer arithmetic involves moving the ball around the wheel.

I use Excel as an analogy when explaining pointers. If you imagine the machine's memory as a gigantic, single column excel table, then a pointer -- or address -- to the third slot in the machine's memory would be the value 'A3'.

When you're referencing that row in Excel, you don't copy around its value, but rather, you copy the address of the value. That way, if you change the value, any other cells that reference it will also fetch the new value.
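The spreadsheet picture translates almost literally into C. A small sketch, with an 8-row "column" and a made-up helper called cell standing in for a reference like 'A3':

```c
#include <assert.h>

/* "Memory" as a single spreadsheet column of 8 rows. */
static int column[8];

/* Return the address of a row (1-based, like spreadsheet rows).
 * The returned pointer plays the role of a reference such as "A3". */
int *cell(int row)
{
    assert(row >= 1 && row <= 8);
    return &column[row - 1];
}
```

With int *a3 = cell(3) and int *also_a3 = cell(3), writing *a3 = 42 makes *also_a3 read 42 as well: both hold the same address, and nothing was copied. Pointer arithmetic is moving down the column, so a3 + 1 is the address of row 4.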

An even simpler analogy is a street address. (Or if you want to involve numbers: Postal codes.) An address is a place where people live, but it's not the actual place.

For people familiar with a spreadsheet, I think the Excel analogy better helps people understand the purpose of pointer dereferencing, because it's something you do all the time when building a mildly complex spreadsheet.

That's funny, my CS prof used to use addresses as an example of multi-dimensional array indices: "state" being a major index, down through city, street, to the number on the street being a minor index.

Of course, in C, arrays are just funny looking pointers anyway, so I guess an address can be both a pointer and an ordered set of array indices.
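A short sketch of what "funny looking pointers" means in practice: when an array is passed to a function it decays to a pointer to its first element, and a[i] is defined as *(a + i). The function name sum is arbitrary. Strictly speaking the two are not identical (sizeof on a real array gives the size of the whole array, not of a pointer), so treat the slogan as an approximation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints. Inside the function the "array" parameter is a plain pointer. */
int sum(const int a[], size_t n)
{
    int total = 0;
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += *(a + i);    /* same thing as a[i], by definition */
    return total;
}
```

Given int v[4] = {1, 2, 3, 4}, both sum(v, 4) and sum(v + 1, 3) are valid calls; the second simply starts one element further along.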

I like the analogy of PO boxes at the post office. They are contiguous, numbered, contain data (or not), and could have a forwarding address to a new PO box.

"Error. Unhandled exception."
