

Who Says C is Simple? - ColinWright
http://www.cs.berkeley.edu/~necula/cil/cil016.html

======
joshuaellinger
Simple Language != Simple to use.

C is a small language. C is gussied-up assembly. It is the ballet of
programming languages.

People who say C is simple really mean that it doesn't do anything behind
their back.

~~~
anon1385
Is it a small language? The standard takes a dense 179 pages to describe the
language. I guess it's debatable if that is small or not. I certainly don't
agree that it's a simple language though. There are myriad rules and
exceptions to rules to remember. When and where one integer type will be
converted to another. When overflow is defined and when it's undefined. Just
those things alone mean that much real world C code is full of undefined
behaviour[1] just related to arithmetic on numeric types, because the rules
are hard to remember and reason about.
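
To make two of those rules concrete, here is a minimal sketch (not taken from the linked posts; the function names are made up for illustration). It shows a signed/unsigned comparison going through the usual arithmetic conversions, unsigned wraparound being defined, and a signed increment that has to be guarded before it happens:

```c
#include <limits.h>

/* Usual arithmetic conversions: the int operand is converted to unsigned
 * before the comparison, so -1 becomes UINT_MAX and the result is 0. */
int minus_one_less_than_one_u(void)
{
    int s = -1;
    unsigned u = 1u;
    return s < u;
}

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this is always 0. */
unsigned unsigned_max_plus_one(void)
{
    unsigned u = UINT_MAX;
    return u + 1u;
}

/* Signed overflow is undefined, so the test has to come before the
 * addition rather than inspecting the result afterwards. */
int int_increment_would_overflow(int x)
{
    return x == INT_MAX;
}
```

Most compilers will warn about the mixed comparison in the first function if sign-comparison warnings are enabled, but the code is still valid C.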

John Regehr asked people to submit str2long() implementations that didn't
execute undefined behaviour. Despite being well warned about avoiding
overflows and other undefined behaviour only 35 of the 78 submissions passed
the test suite[2].
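
For a sense of what avoiding the overflow involves, here is a rough sketch of one way to do it. The interface (`str2long_sketch`, returning 1 on success and 0 on error) is an assumption for illustration and is neither the contest's exact interface nor its reference solution. The idea is to accumulate the value as a non-positive number, since the negative range of `long` is at least as large as the positive one, and to check the bound before each multiply-and-subtract:

```c
#include <limits.h>

/* Hypothetical interface: returns 1 and stores the value in *out on success,
 * returns 0 on empty input, a non-digit character, or a value that does not
 * fit in long. Only an optional leading '-' and decimal digits are accepted. */
int str2long_sketch(const char *s, long *out)
{
    int negative = 0;
    long value = 0;   /* accumulated as a non-positive number */

    if (*s == '-') {
        negative = 1;
        s++;
    }
    if (*s == '\0')
        return 0;

    for (; *s != '\0'; s++) {
        int digit;
        if (*s < '0' || *s > '9')
            return 0;
        digit = *s - '0';
        /* Reject before computing value * 10 - digit, so the overflowing
         * operation is never executed. */
        if (value < (LONG_MIN + digit) / 10)
            return 0;
        value = value * 10 - digit;
    }

    if (!negative) {
        /* -value must fit in long; -LONG_MAX is always representable. */
        if (value < -LONG_MAX)
            return 0;
        value = -value;
    }
    *out = value;
    return 1;
}
```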

Even a trivially small 2 line C function can result in different results on
the common compilers depending on which compiler and which optimisation level
you use[3]. Compiler engineers struggle to agree on and correctly implement
the "simple" rules of C. To me that's an indication that they aren't so
simple.

I would say that C doesn't do anything behind your back as long as you don't
ever use an optimising compiler or you pay very very close attention to the
standard. If you forget then suddenly your check for pointer arithmetic
overflow has been "helpfully optimised away" behind your back for reasons that
are not immediately apparent at all[4].
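
The check that gets removed usually has a shape like `buf + len < buf`: forming a pointer past the end of the object is already undefined, so the compiler may assume the condition is never true. A hedged sketch of a version that stays within defined behaviour is to compare lengths instead of forming the pointer. The helper name `fits_in_buffer` is hypothetical, and it assumes `buf <= end` with both pointing into (or one past) the same object:

```c
#include <stddef.h>

/* Returns 1 if len bytes starting at buf fit before end, 0 otherwise.
 * Assumes buf <= end and that both point into, or one past, the same
 * object. The pointer buf + len is never formed, so nothing here can
 * overflow or leave the object. */
int fits_in_buffer(const char *buf, const char *end, size_t len)
{
    return len <= (size_t)(end - buf);
}
```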

I do agree that apparently simple rules can lead to complexity in use, and C
is full of this too. For example you would think in a lower level language
like C viewing a chunk of memory as a different type would be trivial, but the
strict aliasing rule means you have to be a language lawyer to understand what
is allowed and what isn't[5] as it makes the intuitive solution into undefined
behaviour (which is extra pernicious since most of the time it will work as
intended, until somebody compiles the code with a smarter compiler at a high
optimisation setting).
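
As an illustration, the intuitive solution would be something like `*(uint32_t *)&f`, which reads a `float` object through an incompatible type. A minimal sketch of the commonly recommended alternative copies the bytes with `memcpy` instead. The function name `float_bits` is made up here, and the sketch assumes `float` is a 32-bit IEEE 754 type:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copies the object representation of f into an integer instead of
 * reading the float through a uint32_t pointer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Optimising compilers typically turn the `memcpy` into a plain register move, so this form usually costs nothing at run time.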

[1] [http://blog.regehr.org/archives/963](http://blog.regehr.org/archives/963)

[2] [http://blog.regehr.org/archives/914](http://blog.regehr.org/archives/914)

[3] [http://blog.regehr.org/archives/482](http://blog.regehr.org/archives/482)

[4] [http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf](http://pdos.csail.mit.edu/~xi/papers/stack-sosp13.pdf) example 1

[5] [http://blog.regehr.org/archives/959](http://blog.regehr.org/archives/959)

~~~
TheCoelacanth
Compared to other languages, 179 pages for the full specification is pretty
small. The Java and C# specs are each more than three times that size. C++ is roughly
five times as large. If you're a fan of functional languages, Haskell is
almost twice as large. Even Scheme, an extremely simple language, is only
about 30 pages shorter.

~~~
gnuvince
The book "The Definition of Standard ML" is ~150 pages for a language that is
simple (IMO) and rich. Bonus, the definition is not written in prose, but with
typing rules and operational semantics.

~~~
jlgreco
> _Bonus, the definition is not written in prose, but with typing rules and
> operational semantics._

While that is certainly nice, I suspect it makes it hard to really compare
based on just pagecount. I doubt prose and such a formal definition like that
are equally dense.

~~~
gnuvince
Agreed. From what I've read [1], the syntax rules and semantic rules take up
about 40 pages with the rest of the book being "introduction, exposition, core
material, appendices and index".

[1] [http://mythryl.org/my-Mythryl_is_not_just_a_bag_of_features_...](http://mythryl.org/my-Mythryl_is_not_just_a_bag_of_features_like_most_programming_languages_.html#section:notes:engineered)
(go to the middle of the page or search for "pages")

------
vadman
Simple != easy, as per this great talk by Rich Hickey:

[http://www.infoq.com/presentations/Simple-Made-Easy](http://www.infoq.com/presentations/Simple-Made-Easy)

~~~
JeanPierre
To get an even better explanation on what the word "simple" means and what it
derives from, I would recommend "Simplicity Ain't Easy" by Stuart Halloway[1].
While it is very similar to "Simple made Easy", it focuses a lot more on the
etymology of the words "simple" and "complex" and how people misuse the word.

[1]:
[http://www.youtube.com/watch?v=cidchWg74Y4](http://www.youtube.com/watch?v=cidchWg74Y4)

------
EthanH
Someone else at Cal summarized C succinctly when he wrote "C is the machete of
programming languages. It's great for clearing a quick path through the jungle
of programming problems, but be alert as there are no safety mechanisms to
protect the wielder of this wonderful weapon." -David Patterson
[http://www.informit.com/promotions/promotion.aspx?promo=1389...](http://www.informit.com/promotions/promotion.aspx?promo=138913)

~~~
pjmlp
Warnings as errors, turn on all of them, pedantic mode and static analysers
running alongside unit tests on every build.

Even then, there might be dragons waiting for you.

------
rdtsc
> Who Says C is Simple?

I do.

C is like chess. There are a few simple rules (compared, say, to the C++ spec). But
knowing the rules doesn't mean you'll end up beating Kasparov. It still takes
skill and practice to be a good C programmer.

So the language itself is pretty simple compared to other popular
programming languages, insofar as it has a simple syntax. I can teach someone
the rules of chess pretty quickly; it doesn't mean I'll create a chess
grandmaster in a day or two.

~~~
cygx
>> Who Says C is Simple?

> I do.

I disagree. C comes with a lot of edge cases and subtleties that can surprise
even people like myself who 'know' C.

Some examples: Exact semantics of restrict, C99 inline semantics (eg I wasn't
aware that it's possible to make a non-extern/non-static inline definition
into an extern one with a single redeclaration), effective typing rules and
whether it's possible to circumvent them, whether or not it's undefined
behaviour to cross boundaries of the sub-arrays of a multi-dimensional array
if it doesn't happen in a single expression, ...

~~~
rdtsc
The difference is between gimmicky ways one can use ambiguity to produce
convoluted edge cases vs knowing enough to do actual work. Sure, C being low
level and having a relatively short specification is ripe for compiler
specifics and hardware-specific hacks. But I still think it has the basic core
defined and that is pretty small.

With C++ and its typical libraries it is a bit different. It has so many
features (templates, classes, streams, friends, shared pointers, unique
pointers, destructors, constructors, polymorphism rules, and combinations of
those) that code gets complicated even without obscure features; just
sticking to the standard ones gets hairy and needs someone who knows the whole
spec.

------
null_ptr
These little teasers are starting to annoy me. Congratulations, you can twist
C to be as unintuitive as you'd like it to be. But why would you write code
like that to begin with? I don't care what value `return -3 >> (8 *
sizeof(int));` returns because no sane program anyone contributes to will
have constructs like that.

~~~
eps
You are missing the context, which is that of an unambiguous parsing of any
valid C code. The examples given are teasers for the parser, not a programmer.

------
acjohnson55
C is not a simple language. It is confoundingly difficult for a machine to
parse. Add in the fact that it pretty much requires a preprocessor to be
useful and it gets even more difficult. Clearly the article is not about
whether the C paradigm is complex. It obviously isn't, because it doesn't come
with a whole lot of pre-baked abstractions. But for parsing reliably, it's an
absolute nightmare.

------
dvt
tl;dr: C _is_ simple. Bad code is bad. Shitty compilers are shitty.

This needs to be qualified. Simple compared to what? C _is_ simple when
compared to C++, Java, and many other languages. Obviously, C is simple
because it lacks syntactic sugar, classes, polymorphism, templating
(generics), memory management, etc. So, uh, yeah. It's simple.

The fact that _bad code_ can be written in a language doesn't really make the
language non-simple. I could write bad code in any language (often times I
do!) - so I'm not sure how any judgment about a language can be made with
these kinds of examples. As far as the GCC/VC examples are concerned, they are
a non-issue. Shitty compiler keywords are shitty[1]. We know. This is one of
the many pains of writing cross-platform code in compiled languages. These
examples are contrived and I highly doubt most come from production code.

[1] [http://stackoverflow.com/questions/3437404/min-and-max-in-c/...](http://stackoverflow.com/questions/3437404/min-and-max-in-c/3437484#3437484)

~~~
lmm
Signed/unsigned conversions and conversion to/from pointers of functions and
arrays cause real, production-code confusion. In terms of how much mental
overhead the language takes up - how many edge cases you have to keep in mind
to read code in the language (because if you're programming right then you
spend more time reading than writing) - no, it's not simpler than Java,
because classes, generics and memory management are more straightforward and
less confusing than C's typing rules.
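
A small sketch of the array-to-pointer conversion mentioned above (the function names are hypothetical). The same four-element array reports a different size depending on whether it is still seen as an array or has already been converted to a pointer:

```c
#include <stddef.h>

/* Here values is still an array, so sizeof reports the whole object. */
size_t bytes_seen_as_array(void)
{
    int values[4] = {1, 2, 3, 4};
    return sizeof values;   /* 4 * sizeof(int) */
}

/* Assigning the array to a pointer converts it to a pointer to its first
 * element, and the length information is no longer available. */
size_t bytes_seen_as_pointer(void)
{
    int values[4] = {1, 2, 3, 4};
    int *p = values;
    return sizeof p;        /* sizeof(int *) */
}
```

The same conversion happens silently when an array is passed to a function, which is why C functions usually take a separate length argument.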

Languages have a complexity budget, and C blew its on a squillion different
kinds of integer and a bunch of arbitrary-seeming rules to minimize the amount
of typing needed to change between them. That was a good tradeoff in its day,
when hand-optimization was practical and program source needed to be small.
It's not an appropriate language to use now outside of very specific
circumstances.

~~~
dvt
You say this as if Java doesn't have its fair share of oddities. It seems
you're saying that the highly granular typing found in C doesn't have an
equally-annoying analogue in Java; it does[1].

[1]
[http://www.javalobby.org/java/forums/m92125315.html](http://www.javalobby.org/java/forums/m92125315.html)

~~~
lmm
That's not a corner case, it's a particular instance of a general Java
misfeature. In Java, "==" is an abomination that you must never use and will
give essentially random results; "equals" is what you use to compare things
for equality, and this is consistent throughout your codebase. Is it annoying?
Yes. But it's easy to remember because it applies everywhere; any case of "=="
in a java source file sticks out like a sore thumb. (Also any use of an array,
anywhere, for anything. Sometimes I think I should write "Java: the good
parts").

------
nautical
[https://news.ycombinator.com/item?id=6059249](https://news.ycombinator.com/item?id=6059249)

------
alco
> Why does the following code return 0 and not -1?

This question appears twice in the article.

Note that shifting a 32-bit integer (no matter the sign) by 32 is undefined
behavior (§6.5.7, 3). I'm using Clang, and its `int` has 32 bits even on a
64-bit system.
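
A minimal sketch of a shift that is defined for every count, assuming an unsigned operand with no padding bits so that its width is `sizeof x * CHAR_BIT`. The helper name `shift_right_checked` is made up for illustration:

```c
#include <limits.h>

/* Logical right shift that is defined for every count: a count equal to
 * or larger than the width of unsigned yields 0 instead of undefined
 * behaviour. Assumes unsigned has no padding bits. */
unsigned shift_right_checked(unsigned x, unsigned count)
{
    const unsigned width = (unsigned)(sizeof x * CHAR_BIT);
    return count < width ? x >> count : 0u;
}
```

Using an unsigned operand also sidesteps the separate issue that right-shifting a negative value such as -3 is implementation-defined.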

C is deceptively simple.

------
rootbear
I would not have thought that

    
    
        return ({goto L; 0;}) && ({L: 5;});
    

was valid C. GCC 4.4.7 chokes on it, using -std=gnu99. Is this a C11 thing?

~~~
apaprocki
The compiler shouldn't care where a label exists. If you're telling it
to go to label L:, execution will jump there. GCC appears to look out for the
programmer and always prints an error (and I don't see an option to shut it
off). Oracle Studio compiles it with no issue and the above statement returns
1.

------
harrytuttle
All this proves is that you _can_ write ugly C and GCC has some crocks in it.

These problems are all optional.

------
auctiontheory
C is a tool. A powerful and flexible tool.

If a programmer wants to write obfuscated code (as in the examples), C allows
that.

If a programmer wants to write clean, maintainable and documented code, C
allows that too.

------
qdpb
Nobody?

What these examples demonstrate is not that C is "hard", but that it's
powerful.

~~~
pjmlp
There isn't a single feature in C that isn't present in safer languages
like Modula-2 or Turbo Pascal. They are as powerful, or even more, than C.

Just because C won the battle with those languages, it does not mean we need
to live with its design issues ad eternum.

~~~
CoryG89
How are you going to go about fixing C? And what design issues? It's pretty
bare bones on top of the assembly. What should change?

~~~
GregBuchholz
* Eliminate sources of undefined behavior
* Remove implicit type casting
* Have a way to check the hardware overflow flag from the language
* Saner syntax for declaring variables (i.e. function pointers)
* get rid of null-terminated strings
* a module system instead of the preprocessor
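
On the overflow-flag point: without language support, the portable workaround today is to test the operands before adding. A minimal sketch, with a made-up helper name `add_checked`:

```c
#include <limits.h>

/* Returns 1 and stores a + b in *sum when the result is representable
 * in int; returns 0 and leaves *sum untouched otherwise. The range test
 * runs before the addition, so the overflowing add is never executed. */
int add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```

GCC and Clang also offer `__builtin_add_overflow` as an extension, and C23 adds `ckd_add` in `<stdckdint.h>`, but neither is available in the older standards discussed in this thread.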

~~~
pjmlp
Additionally:

\- Proper arrays with bound checking, which can locally be turned off, if
required for performance reasons

\- Explicit operation for converting arrays into pointers
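
For comparison, this is roughly what has to be written by hand in C today to get a bounds-checked array. The `int_slice` type and `slice_get` function are hypothetical names for illustration, not part of any standard library:

```c
#include <stddef.h>

/* Hypothetical "array with a length": a pointer paired with an element
 * count, so the length travels with the data. */
struct int_slice {
    int *data;
    size_t len;
};

/* Returns 1 and stores the element in *out if index is in range;
 * returns 0 without touching memory otherwise. */
int slice_get(struct int_slice s, size_t index, int *out)
{
    if (index >= s.len)
        return 0;
    *out = s.data[index];
    return 1;
}
```

Building the `int_slice` from an array is the explicit array-to-pointer step; nothing in the language stops code from bypassing it and indexing `data` directly.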

~~~
nwmcsween
-fstack-protector && -D_FORTIFY_SOURCE=2

~~~
pjmlp
Where is that defined in the C standard?

Because you see, if it is compiler specific, it is not part of the language.

------
bwy
Go Berkeley, awesome prof.

