
How can C Programs be so Reliable? - ColinWright
http://tratt.net/laurie/tech_articles/articles/how_can_c_programs_be_so_reliable#
======
mechanical_fish
"C is unreliable" is the wrong way to conceptualize the problem. The argument
against C is that it is _inefficient_. It's not that C programs are broken;
it's that if you spend six hours writing C code your code will either do less,
or be more broken, than what you would have produced had you spent those six
hours writing, e.g., Perl or Javascript or Lisp instead.

There's no reason why you can't write correct C code, or correct assembly code
for that matter. The challenge is to do so without wasting a lot of time: Any
amount of time that you spend consciously thinking about correct memory
management or hand-optimizing your opcodes could probably be spent doing
something more important, unless you are working on one of the few problems
where that kind of optimization is actually the bottleneck.

Of course, the flip side of having to think about every layer is that you get
to _see_ and potentially tweak every layer. It's nice to work on something
transparent. It's nice to know what is going on down there among the sockets
and the buffers. I've been thinking about practicing some C for just that
reason, and it seems to be why the OP likes C. But I don't anticipate being
very efficient when writing my own web server in C. My website will be better
if I just install a big pile of other people's C and get on with designing or
writing.

~~~
coliveira
I think it is a mistake to think that you can't be efficient just because
you're using C. It all depends on the abstractions you need to write your
code. When writing code for the web, using Python is faster not just because
of the language, but because there is so much ready to be used. Putting
together 10 disparate libraries is much easier in Python/Ruby than it will
ever be in C and C++. However, there is a lot of code that either doesn't
require all these libraries or where the infrastructure is well established in
C (think about kernel drivers for example). In that case, programming in C is
not fundamentally harder than in Python, if you have enough experience.

~~~
tptacek
Regardless of the abstractions you use, buttoning down a C runtime environment
so you can rely on the assumptions behind those abstractions is a chore.
There's a pretty famous Mark Dowd Sendmail vulnerability that relies almost
entirely on (only-incidentally-sent) signal timing that I like to use as an
example here, but rlimits are another.

The problem is, your code will appear to work fine even if you don't complete
the chore. It isn't until your program blows up that it'll even occur to you
that there was more work to be done.

~~~
haberman
> There's a pretty famous Mark Dowd Sendmail vulnerability that relies almost
> entirely on (only- incidentally- sent-) signal timing that I like to use as
> an example here

Would love to see a reference to this.

~~~
epo
The unwillingness to cut and paste "mark dowd sendmail" into google for
something you would "love to see" is amusingly typical of lots of the comments
on this thread. It's the first result, BTW.

~~~
tptacek
I like the second result better than the first, for what it's worth. ;)

But come on. While let-me-Google-that-for-you requests are annoying, they are
_in toto_ less toxic to threads than comments like yours; at least the lame
question generates a factual answer.

------
0x12
Great article. My personal take on this is that C programs are so damn
reliable because there is nothing under the hood: the building blocks are so
simple and transparent that you can follow the thread of execution with
minimal mental overhead.

That means that when you lay out your program the most important parts (memory
map and failure modes) are clearly visible.

 _IF_ you are a good programmer.

And that's the reason there is an obfuscated C contest: if a C programmer sets
his or her mind on being deliberately hard to understand, that same power can
be used against any future reader of the code. Incompetence goes a long way
towards explaining some of C's bad reputation. You can write bad code in any
language, but none give you as much rope to hang yourself with as C (and of
course, C++).

~~~
bad_user

        the building blocks are so simple and transparent
        that you can follow the thread of execution with 
        minimal mental overhead.
    

I do not agree.

I've seen plenty of code that does weird things with pointers, like passing
around a pointer to a struct's member and then recovering the struct by
subtracting an offset from that pointer and casting. Or XOR-ing pointers in
doubly-linked lists for compression. And these are just simple examples.

I've seen code where I was like "WTF was this guy thinking?".

My biggest problem with C is that error handling is non-standard. In case of
errors some functions are returning 0. Some are returning -1. Some are
returning > 0. Some are returning a value in an out parameter. Some functions
are putting an error in errno. Some functions are resetting errno on each
call. Some functions do not reset errno.
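
A minimal sketch of the zoo, using a handful of standard POSIX/glibc calls
(fopen, unlink, pthread_create, strtol); each one signals failure in a
different way:

        /* Four different error conventions in one function. */
        #include <errno.h>
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        static void *noop(void *arg) { return arg; }

        void error_convention_zoo(void)
        {
            /* Convention 1: NULL return, errno set (the errno part is POSIX). */
            FILE *f = fopen("missing.txt", "r");
            if (f == NULL)
                perror("fopen");

            /* Convention 2: -1 return, errno set. */
            if (unlink("missing.txt") == -1)
                perror("unlink");

            /* Convention 3: the error number is the return value (> 0);
               errno is not touched. */
            pthread_t tid;
            int rc = pthread_create(&tid, NULL, noop, NULL);
            if (rc != 0)
                fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            else
                pthread_join(tid, NULL);

            /* Convention 4: errno is set on overflow but never reset, so
               the caller has to clear it before the call. */
            errno = 0;
            long v = strtol("99999999999999999999", NULL, 10);
            if (errno == ERANGE)
                perror("strtol");
            (void)v;
        }

(Compile with -pthread. Every one of these conventions is documented, but
each in its own corner of the manual, which is exactly the complaint.)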

Also, the Glibc documentation is so incomplete on so many important issues
that it isn't even funny.

Yes, kernel hackers can surely write good code after years of experience with
buggy code that they had to debug.

But for the rest of the code, written by mere mortals, I basically get a
headache every time I have to take a peek at code somebody else wrote.

~~~
0x12
> I've seen plenty of code where I was like "WTF was this guy thinking?".

Yes, that happens. But I've seen that in COBOL, Perl, Pascal, Java, PHP and in
Ruby as well.

> In case of errors some functions are returning 0. Some are returning -1.

That's not a feature of the language.

~~~
bad_user

         That's not a feature of the language.
    

Well, yes, but it's kind of nice when you've got exceptions with stack traces
attached.

Some people don't like exceptions, but I do.

~~~
ajross
It is indeed "kind of nice". But the question at hand is whether it's a
requirement for writing reliable software. I tend to agree with the posts here
that argue that it's not. It saves _time_ for developers; it doesn't
meaningfully improve the quality of the end product.

Serious C projects tend to come up with this stuff on their own, often with
better adapted implementations than the "plain stack trace" you see in higher
level environments. Check out the kernel's use of BUG/WARN for a _great_
example of how runtime stack introspection can work in C.
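
For the curious, a very rough user-space imitation of that idea -- a
WARN_ON-style macro that reports the failed condition, dumps a stack trace
and keeps running -- assuming glibc's <execinfo.h>; the kernel's real
BUG()/WARN() machinery is considerably more elaborate:

        #include <execinfo.h>
        #include <stdio.h>

        /* Report a suspicious condition with a backtrace, then carry on. */
        #define WARN_ON(cond)                                                 \
            do {                                                              \
                if (cond) {                                                   \
                    void *frames[32];                                         \
                    int depth = backtrace(frames, 32);                        \
                    fprintf(stderr, "WARNING: %s at %s:%d\n",                 \
                            #cond, __FILE__, __LINE__);                       \
                    backtrace_symbols_fd(frames, depth, 2); /* 2 = stderr */  \
                }                                                             \
            } while (0)

Link with -rdynamic if you want symbol names rather than raw addresses in
the backtrace output.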

------
rkangel
My problem with this article is the use of the word 'flaw' to describe the
potential pitfalls of programming in C. Use of that word seems to imply that
these things are accidental, and maybe if it had been better designed the
problems wouldn't exist.

The original idea of the language (or at least a major part of it) was to be a
portable alternative to the many processor-specific assembly languages in use
- rather than having to write the same functionality for each one, you could
write it once in C and then compile it for each platform. If that's your aim,
then you will end up directly manipulating memory, and you open yourself up to
that whole class of errors - memory leaks, array overruns, pointer arithmetic
mistakes. All C gives you is portable access to how processor hardware works,
with a few conveniences (y'know - function calls).

If you want to protect against these problems you have to add some extra
layers of abstraction between the language and the underlying hardware, and
that comes at a cost. That cost is mostly performance, but thanks to Moore's
law that is a much lower priority these days, hence the abundant use of
higher-level languages - Java, Python etc.

My point is that C is how it is _on purpose_. This direct access to the
hardware comes with some downsides, but they aren't 'flaws', they come hand in
hand with the power.

------
stygianguest
I would contend that C does very little in the way of preventing errors or
helping you debug them when they occur. The claim that "[..] only two
C-specific errors have thus far caused any real problem in the Converge VM,"
is completely
beside the point. Language specific errors have never been the problem. Java's
infamous null-pointer exceptions are not java specific: the C equivalent would
be a segfault. And please do note that Java prints a stack trace by default
to help correct the mistake. A huge step forward from C's generic segfault.

The real reason that most C programs in daily use are so robust is that they
are ages old. Many, many man-years have been invested in the production
of e.g. BSD, unix tools, POSIX libraries, and even web browsers and word
processors.

Why do we use Javascript and even PHP to program web-applications? Because we
need fewer lines to get the same result. Moreover, given the correlation
between number of lines and number of bugs, shorter programs are better. If we
had been limited to C "web 2.0" would have been decades away.

~~~
tolmasky
Obviously I have no hard data on the reliability of new C programs, but things
like git (which I at least find pretty reliable) may serve as a counterpoint
to this theory.

~~~
rkangel
Don't forget that only the core of Git is C; the rest is a big pile of shell
scripts and (I believe) Perl.

~~~
burgerbrain
This is not particularly the case. Things with git are initially implemented
as shell scripts, as a way of "just getting it done", but are later migrated
to C. These days a very large portion of Git is straight C.

------
cperciva
_when one calls a function like stat in C, the documentation lists all the
failure conditions_

Actually, no. When the documentation says

    
    
        This function shall fail if:
    
        [EFOO]   Could not allocate a bar.
    

it doesn't mean that this is the only possible failure; POSIX states that
functions "may generate additional errors unless explicitly disallowed for a
particular function".

Except in very rare circumstances, when you make system or library calls you
should be prepared to receive an E_NEW_ERROR_NEVER_SEEN_NOR_DOCUMENTED_BEFORE
and handle it sanely (which in most cases will involve printing an error
message and exiting).
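
In practice "handle it sanely" usually looks something like this sketch,
using the BSD/glibc <err.h> helpers, which append strerror(errno) for you:

        #include <err.h>
        #include <sys/stat.h>

        /* Whatever errno turns out to be -- documented for stat() or not --
           report it and exit rather than trying to enumerate every case. */
        static void must_stat(const char *path, struct stat *sb)
        {
            if (stat(path, sb) == -1)
                err(1, "stat %s", path);
        }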

~~~
JoachimSchipper
That is true, but good man pages still tend to document all or at least a lot
of failure conditions. It's quite reasonable to do something like

    
    
        if ((fd = open(myfile, O_RDWR | O_NONBLOCK, 0644)) == -1) {
            switch(errno) {
            case ENOENT: case ENOTDIR: case EACCES:
            case ELOOP: case ENAMETOOLONG: case EPERM:
                warn("Cannot open file");
                goto choose_file_to_open;
            case EISDIR:
                if (chdir(myfile) != 0)
                    warn("Failed to enter %s", myfile);
                goto choose_file_to_open;
            case ENXIO: case EWOULDBLOCK:
                enqueue_open_callback(myfile);
                return;
            default:
                err(1, "Cannot open file");
            }
        }
    

The above is well-documented in open(2); compare, for instance,
<http://docs.python.org/library/os.html#os.open>.

~~~
jemfinch
`os.open` is a low-level interface whose semantics are platform-dependent. The
Python analogue is the builtin `open`, which documents that it raises IOError
on a failure to open a file:
<http://docs.python.org/library/functions.html#open> .

~~~
JoachimSchipper
True, but you have to dig a lot to find any details. The above code is easy to
write from a good man page (OpenBSD's, in this case.)

(Also, open(2) has all these options for a reason; think symlink races.
Python's open() is not sufficient.)

~~~
sausagefeet
Note, open is not part of C. A better comparison would be fopen which,
according to the C99 draft I have, just returns NULL on failure.
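
A sketch of what that leaves the caller with; note that ISO C only promises
the NULL, and the errno detail below is a POSIX extra rather than a C99
guarantee:

        #include <errno.h>
        #include <stdio.h>
        #include <string.h>

        static FILE *open_config(const char *path)
        {
            FILE *fp = fopen(path, "r");
            if (fp == NULL)
                /* C99 says only that NULL is returned; the errno value
                   here is courtesy of POSIX. */
                fprintf(stderr, "fopen %s: %s\n", path, strerror(errno));
            return fp;
        }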

~~~
JoachimSchipper
You're technically correct, but that's not the "C" being discussed in this
article. Also, fopen(3) on OpenBSD does document values errno after failure
(by reference to the malloc(3) and open(2) man pages), and I expect that any
reasonable system does likewise.

------
jjr
I have a gut feeling that there is some merit to the idea that exception
handling isn't all that great. So much code out there does not really handle
the exceptions; it just exit(1)s. C will teach you to check return values
(usually easy enough: if (result==NULL) {fatalerror(1,"result not OK");}). If
you don't, the program will continue to run (derailed). Most 'high-level'
programmers will consider aborting execution just fine, while C programmers
will put more thought into handling an error situation. Few C programs will
automatically abort with a core dump on the first occasion of 'record not
found'.

------
jim_lawless
> compilers were expensive (this being the days before free UNIX clones were
> readily available)

I'm not sure what era the author is referring to, here. In the late 80's,
Turbo C broke the price barrier for a decent MS-DOS C compiler at the $79-$99
price range. Shortly after that, Mix began offering their MS-DOS Power C
compiler for $20. Tom Swan's book "Type and Learn C++" provided a tiny-model
version of Turbo C++ on a disk provided with the book.

The GNU ports djgpp and GCC were available for MS-DOS and Windows in later
years.

> the culture was intimidatory;

I'm again wondering what time period he's talking about. When I started
learning C in the late 80's, most of the trade magazines were full of articles
that used C as the primary language for whatever programs or techniques were
being presented. Dr. Dobb's Journal was full of C code. Before Byte quit
publishing source code, one could find a fair amount of C there. Of course,
the specialty magazines like The C/C++ User's Journal and the C Gazette
contained nothing but C and later C++ code.

> This is a huge difference in mind-set from exception based languages,

Yes. C is a language that was designed two decades before Java.

At first, I was really taken aback by the author's take on C, but as I tried
to digest why he has these perceptions of the language, I ventured to guess
that a number of developers who came of age when languages with more modern
niceties were available probably also have this view of C. From the
perspective of someone who has been able to use more modern languages, C must
seem like a rickety bridge that could be dangerous to cross.

A number of points that Mr. Tratt makes, though, pertain to the programmer,
not the language. Certainly there are library routines that allow for buffer
overflows, like gets(). It's been known for quite a while (since the Internet
worm was unleashed in 1988) that fgets() should be favored so that buffer
boundaries can be observed. Certainly people writing their own functions may
not write them correctly, but this is a matter of becoming conversant with C.
It's a matter of attaining the right experience.
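
For anyone who hasn't made the switch, the fgets() version is barely more
work than the gets() one; a minimal sketch:

        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
            char line[256];

            /* gets(line) cannot know how big 'line' is; fgets() is told
               the size, so an over-long input line cannot overrun the
               buffer. */
            if (fgets(line, sizeof line, stdin) != NULL) {
                line[strcspn(line, "\n")] = '\0';  /* drop trailing newline */
                printf("read: %s\n", line);
            }
            return 0;
        }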

~~~
cicero
I would guess he's talking about the early 80s. I was in the same boat in that
time period. I didn't have access to C until my senior year in college
(1984-85) when we got a new Computer Science department head who was pro-Unix,
and we got a VAX-11/750 with BSD Unix on it. I worked on a special project
that gave me access to that machine, and there I learned Unix, vi, and C. I
thought I had died and gone to heaven!

------
aklein
I've recently been curious why Ada isn't more popular in industry and academia
beyond its niches in avionics and defense. Seems close in speed & memory usage
to C/C++, has good GNU tools, and claims resilience to the pitfalls of C.

~~~
pnathan
ADA is a stereotypical non-hacker language. It was invented by a DOD committee
& requires a good deal of 'extra' fluff to write a program.

Also, as I recall, to be a certified ADA-compliant compiler requires having a
ton of libraries.

~~~
msbarnett
Ada ( _not_ ADA) was largely the product of one man, Jean Ichbiah.

The notion that it's some kind of committee-created monstrosity seems largely
to have come from ESR's wildly inaccurate writeup of it in the Hacker's
Dictionary. It's actually a fairly small and nice language.

The object oriented extensions didn't follow the "standard" Java "." syntax,
though, which probably hurt Ada 95's uptake more than it should have.

------
charlesdm
In the end it all boils down to what you're building. If you've done your fair
share of programming (C, C++, Java, PHP, Python, Ruby) then you just go with
the tools that are best for the job.

Would I write a complete web service in C? Probably not. Would I write a fast
image manipulation/modification library for that specific website if needed in
C (or C++)? Probably -- because I like the performance gain when I'm
converting 10,000 images.

I love the fact that you can just build components in different languages and
then glue them together so you can build awesome products.

~~~
0x12
10,000 images is not over the break-even point yet (unless you're doing
something really hard, where writing it in C would cost less than farming the
job out to a bunch of EC2 instances; it's all a question of whether your time
is more expensive than the cost to rent the hardware to do the job).

It starts to pay off when you write that package either as a service with a
large number of users or if you make a general purpose library for inclusion
in lots of other programs, especially if they are written in other languages.

------
TelmoMenezes
There could be some filtering going on, both on the type of programs one tends
to write in C and the type of people that write C. It could be that problems
for which C is chosen tend to be intrinsically better defined (command line
applications, kernels, libraries, etc). It could also be that C intimidates
less talented programmers, so some self-selection could be happening.

~~~
HeyLaughingBoy
These days, with new software, perhaps. But go back just 10-15 years and
you'll find pretty much every type of software was being written in C. Even
webapps: tons of CGI apps were written in C.

The reality is that doom and gloom about C is overrated. Sure, you can shoot
yourself in the foot easily, but most competent programmers will do just fine.
That's been my general experience and I don't think I've spent my career
surrounded by rockstars :-)

~~~
cperciva
I still write CGI programs in C.

~~~
aninteger
Same here except I've "upgraded" to FastCGI.

~~~
cperciva
Good point. I'd use FastCGI (or equivalent) if I had a significant load; but
my CGI programs tend to execute only a few times per hour, so forking is
cheaper than keeping a process running.

------
akkartik
A week ago I would react differently to this article. But I just had my belief
system overhauled by reading
<http://blog.vivekhaldar.com/post/10126017769/smeeds-law-for-programming>.

------
swah
I had posted this a few hours ago
(<http://news.ycombinator.com/item?id=3024495>) - I suppose the success of
this one has something to do with the submitter? Or the time submitted...

~~~
ColinWright
As with comedy, timing is everything. I have had items sink without trace,
only to see someone submit the same thing and have it get 100, 200 or 300
points and occupy the front page for a day or more.

It happens.

 _Added in edit: Just for reference, I didn't down-vote you. Not least, one
can't down-vote replies to one's own submissions or comments._

------
sktrdie
I don't understand the exception argument. You can choose which Exceptions to
catch in languages such as Java, just as you would choose which error to deal
with in C, but exceptions are so MUCH more powerful because they allow you to
check for the error in user code rather than at each function call. In C,
errors don't trickle down and you need to deal with them in each level of
abstraction, which can be totally useless and time consuming.

~~~
Lagged2Death
I'm a noob at languages with built-in structured exception handling. I'm
comfortable with the C approach.

 _In C, errors don't trickle down and you need to deal with them in each level
of abstraction, which can be totally useless and time consuming._

Time consuming, yes, absolutely. "Useless" I don't understand at all. It's
structured exceptions that more often seem useless to me.

The more layers of code an exception bubbles up (or trickles down) through,
the less the exception handler can know about where it happened, why it
happened, or what the resulting state of the program is. Very often, the only
"handling" that can be done is making a report of the exception.

It seems to me that the most useful exception handlers, the ones most likely
to actually salvage the situation and allow the program to continue to work,
are the ones that immediately follow an exception-throwing call, the ones that
don't allow any trickle-down.

But those are degenerate cases, of course. They're functionally equivalent to
C-style error return codes.

~~~
0xABADC0DA
What you say about error handling is true enough, but it's just not the whole
story.

One problem with C-style error handling is that having error handling at all
levels makes it impossible to reuse code without tweaking it. Say you handle
an error by printing a warning message to stderr; now you can't reuse that
code in anything that doesn't want error messages printed there (maybe it
doesn't want errors printed, or it needs to localize them, or it's using
stderr for something else, like strace does). So error handling in C kills
reusability.

Another problem with C-style error handling is that the number of error types
increases as you pass the buck up levels, but C doesn't have any good way to
express more than one type at a time. Say one function returns true or false
and another returns an error constant like errno. When one function calls the
other, what do you return at the top level? You could map one to the other,
but you may have lost important information about the error. So error handling
in C kills composability.
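
A minimal sketch of that buck-passing problem, with hypothetical function
names:

        #include <stdbool.h>

        struct config;  /* hypothetical settings structure */

        /* Lower layer: reports failure as an errno-style code (0, or an
           errno value such as ENOENT or EINVAL). */
        int parse_config(const char *path, struct config *out);

        /* Upper layer can only answer yes or no, so the reason for the
           failure is gone by the time its own caller sees the result. */
        bool load_settings(struct config *out)
        {
            return parse_config("/etc/app.conf", out) == 0;
        }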

And a third problem is documentation. With no standard error types that are
known to the compiler, it is rarely possible to warn the caller that they
forgot some error handling.

Exceptions address the first two problems, and checked exceptions the third
one. A common misunderstanding of Java checked exceptions is that they force
the caller to handle the error, when what they really do is force the caller to
_document_ the errors it can generate.

------
gte910h
I don't find C programs to be very reliable at all.

Heavily used ones are as reliable as other heavily used programs, but barely
any C programmers even use clang's static analyzer, or even the elderly lint
and its more modern cousins.

On top of that, half of the people calling themselves C programmers are really
C++ programmers (the two are used quite differently when used correctly), so I
don't really think he's correctly summarizing the field at all.

edit: I have been a C programmer for most of my career, including embedded
Linux, command-line Linux tools (including research robotics), and C servers
that communicate with the above.

I'm not some guy who just knows python and bitches about "the hard compiled
languages" (although I do like python and ruby and objective-C).

------
fbomb
> if we're being brutally honest, only fairly competent programmers tend to
> use C in the first place.

Oh, if only that were true. I've seen some not-so-competent programmers churn
out lots of C code (and then move on to C++ in order to do some real damage).

------
ScottBurson
C programs are reliable because either they're small, or, in the case of the
few large reliable ones like the Linux kernel, they have undergone a
tremendous number of eyeball-hours of review.

------
cageface
I don't see how any language that depends on manual handling of error return
codes can ever be considered "reliable". It's far, far too easy to leak memory
and other resources. As other posters have noted, the only reason a lot of
popular C programs are reliable is that they've been gone over with a fine-
toothed comb.

The only low-level language that has any innate claim to reliability is C++
with proper use of the RAII idiom.

~~~
eru
> The only low-level language that has any innate claim to reliability is C++
> with proper use of the RAII idiom.

Or rather a very carefully chosen subset of C++. See
<http://yosefk.com/c++fqa/exceptions.html#fqa-17.3> for some of the problems.

By the way, what about ADA?

~~~
tomjen3
Ada is typically less well known and not used a lot, but if you want reliable
code, it is pretty nice.

Be aware that it is a b&d language.

~~~
eru
I usually use Haskell at work, which is even more b&d in a sense and it's
pretty reliable. But you can not call Haskell low-level.

------
jayfuerstenberg
I wrote about some of this on my blog back in Feb.
(<http://www.jayfuerstenberg.com/blog/hot-potato-thoughts-on-java-exceptions-and-error-handling>).
Java's exceptions cause Java applications to
break often. It's not something that Java engineers want to hear but it's
true.

------
beej71
Oh, I love C. So simple and so powerful. God help me, I barely write anything
in C anymore, but it will always hold a dear place in my heart until the day I
die.

Thank you for allowing me this nostalgic indulgence, hackernewers. I know for
at least a few of you, it will resonate.

------
dicroce
C is beautiful because <http://en.wikipedia.org/wiki/Duff%27s_device> is
possible.

~~~
exDM69
I agree with you.

But C is also ugly because Duff's device is possible. Think about it from an
optimizing compiler standpoint. It takes a serious amount of effort to turn
Duff's-device-style control flow into an intermediate representation that can
be somehow optimized. Now compare that to a language that is based on some
form of extended lambda calculus.
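
For reference, the device itself in its memory-to-memory form (Tom Duff's
original wrote every byte to the same output register):

        #include <stddef.h>

        /* Copy 'count' bytes, unrolled eight ways; the switch jumps into
           the middle of the do/while loop to soak up the count % 8 odd
           iterations. */
        void duff_copy(char *to, const char *from, size_t count)
        {
            if (count == 0)
                return;
            size_t rounds = (count + 7) / 8;
            switch (count % 8) {
            case 0: do { *to++ = *from++;
            case 7:      *to++ = *from++;
            case 6:      *to++ = *from++;
            case 5:      *to++ = *from++;
            case 4:      *to++ = *from++;
            case 3:      *to++ = *from++;
            case 2:      *to++ = *from++;
            case 1:      *to++ = *from++;
                    } while (--rounds > 0);
            }
        }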

------
snorkel
C apps are reliable because C programmers embrace C's direct simplicity. Other
languages aspire to be more complex by adding new features and syntax, whereas
C remains stubbornly simple. Still dangerous, but still simple.

------
derleth
Previously:

<http://news.ycombinator.org/item?id=3024495>

~~~
0x12
Fancy that, catching Colin at his own game :)

------
hackermom
This kind of "FUD" surrounding C is definitely exaggerated. There's an awkward
knee-jerk glow to the whole article, not least from the fact that the writer
admits his inexperience in C. At times it even seems as if he lacks experience
in programming, silently admitting his failure to comprehend the
computer/software symbiosis altogether. After reading the article I played
around with a funny exercise in my mind: I replaced the mentions of C and
programming with "tightrope walking", moving his arguments out of the computer
programming sphere, and suddenly the general, ridiculous tone of the article
stood out even more clearly. Tightrope walking can be really, really
tricky. Running with scissors can be done in a risky way, I suppose.
Practicing pistol marksmanship incurs some risk, too.

------
diolpah
"pointers... arguably the trickiest concept in low-level languages, having no
simple real-world analogy"

Arguably, indeed. The analogy is quite simple - a gigantic roulette wheel with
2^$membusbits slots, except the numbers are sequential. The ball is the
pointer and pointer arithmetic involves moving the ball around the wheel.

~~~
nupark2
I use Excel as an analogy when explaining pointers. If you imagine the
machine's memory as a gigantic, single-column Excel table, then a pointer --
or address -- to the third slot in the machine's memory would be the value
'A3'.

When you reference that cell from elsewhere in Excel, you don't copy its value
around; rather, you copy the address of the value. That way, if you change the
value, any other cells that reference it will also fetch the new value.
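
The same idea in C, with made-up variable names:

        #include <stdio.h>

        int main(void)
        {
            int cell_a3 = 42;      /* the value sitting "in A3"            */
            int *ref = &cell_a3;   /* the pointer: "A3" itself, not the 42 */

            cell_a3 = 99;          /* change the value in place...         */
            printf("%d\n", *ref);  /* ...and the reference sees 99 because */
                                   /* it holds the address, not a copy     */
            return 0;
        }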

~~~
lobster_johnson
An even simpler analogy is a street address. (Or if you want to involve
numbers: Postal codes.) An address is a place where people live, but it's not
the actual place.

~~~
Roboprog
That's funny, my CS prof used to use addresses as an example of multi-
dimensional array indices: "state" being a major index, down through city,
street, to the number on the street being a minor index.

~~~
Roboprog
Of course, in C, arrays are just funny looking pointers anyway, so I guess an
address can be both a pointer and an ordered set of array indices.
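
A tiny illustration of that decay, for anyone who hasn't seen it spelled
out:

        #include <stdio.h>

        int main(void)
        {
            int a[5] = {10, 20, 30, 40, 50};
            int *p = a;  /* the array name decays to &a[0] */

            /* a[i] is defined as *(a + i), so indexing and pointer
               arithmetic are the same operation. */
            printf("%d %d %d\n", a[3], *(a + 3), *(p + 3));  /* 40 40 40 */
            return 0;
        }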

------
hack_edu
"Error. Unhandled exception."

