
The Evolution of C Programming Practices: A Study of Unix (2016) [pdf] - signa11
https://www2.dmst.aueb.gr/dds/pubs/conf/2016-ICSE-ProgEvol/html/SLK16.pdf
======
jacksmith21006
I went to pick up my son at University studying CS recently.

He had replaced Win 10 on his tower with Ubuntu and was writing a pretty
complex program in C++. Doing some AI/ML primitives.

Thought to myself I am old yet went to the same University and would have been
coding in C++ on Unix (Ultrix or OSF or Solaris) at that point in my life.

How in the world have we not moved on?

BTW, his code was so, so pretty. Mine would have been ugly. Kids are far
better coders than we were 30 years ago. I was crazy proud.

~~~
rayiner
Why would we move on? C is a local maximum, like a Brayton-cycle gas turbine
engine. A 787 engine is no more than 50% more efficient than a 707 engine,
despite _tens of billions_ of dollars in investment and decades of R&D between
the two. And we won't get even another 50% improvement any time soon. It'll
take a fundamental change in technology to move past that plateau.

C is likewise on a plateau. There is nothing categorically better than C for
what C does; just different points in the design space that make different
trade-offs. _E.g._ Rust gives you memory safety through ownership, but you
can't even make a doubly-linked list while staying within the ownership
system. At a different point in the design space, Go has garbage collection,
but that comes with its own trade-offs. All of these might become obsolete
when we have _e.g._ quantum computers. But in the meantime we are just playing
with different trade-offs, because we captured all the low-hanging fruit
decades ago.

~~~
nickpsecurity
"E.g. Rust gives you memory safety through ownership, but you can't even make
a doubly-linked list while staying within the ownership system."

Your otherwise good comment presents this as all-or-nothing when comparing
it to C. To start with, C and Rust are equivalent for anything Rust can't
borrow-check. From there, Rust is better than C in safety or expressiveness.
Second, C developers use plenty of external libraries to supplement the
language's capabilities, including ASM that breaks the abstraction gap.
Likewise, whatever Rust can't handle can be proven safe with an external tool,
wrapped in a safe interface, and used from there. This has been done in
languages from C to Haskell.

So, there are different design points for sure, but a doubly-linked list isn't
an example of Rust being unfit for purpose vs C. If anything, being equal or
better in safety makes it a better instance in that design space. Likewise for
Clay, which was more C-like and used in device drivers.

~~~
rayiner
But at the end of the day, I wouldn’t call it a categorical improvement over C
because you still have to drop down to what is basically C to handle essential
tasks like declaring circular data structures. Instead, it makes a special
(but common) case of tree-like data structures safer to handle. This is no
fault of Rust—it is a theoretical limitation. There is no single model that
lets you express arbitrary data structures, gives you memory safety, and
doesn’t require runtime heap walking.
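
For readers who want to see what "basically C" means here, a minimal sketch of a doubly-linked list with raw pointers follows. The names `struct node`, `node_insert_after`, and `node_remove` are hypothetical and not taken from the thread or the paper. The point it tries to show is that every interior node is referenced by two neighbours at once, so no single pointer can be called the owner.

```c
#include <assert.h>
#include <stdlib.h>

/* Each node is pointed to by both its predecessor and its successor. */
struct node {
    int value;
    struct node *prev;
    struct node *next;
};

/* Allocate a node holding `value` and link it in after `pos`.
 * If `pos` is NULL the new node starts a list of its own.
 * Returns the new node, or NULL if allocation fails. */
struct node *node_insert_after(struct node *pos, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;

    n->value = value;
    n->prev = pos;
    n->next = pos ? pos->next : NULL;

    if (n->next)
        n->next->prev = n;   /* second reference to n */
    if (pos)
        pos->next = n;       /* first reference to n */
    return n;
}

/* Unlink `n` from its neighbours and free it. Nothing stops a caller
 * from keeping a stale pointer to `n`; that is the unchecked part. */
void node_remove(struct node *n)
{
    assert(n != NULL);
    if (n->prev)
        n->prev->next = n->next;
    if (n->next)
        n->next->prev = n->prev;
    free(n);
}
```

Calling `node_insert_after(NULL, 1)` creates a one-element list, and further calls splice nodes in after any existing node. The compiler checks none of the aliasing, which is roughly the situation inside an unsafe block in other languages.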

~~~
pjmlp
The biggest difference is that in C, 100% of the code is unsafe and possibly
UB, while in other system languages with _unsafe_ code blocks those low level
tricks are pretty easy to track down.

~~~
nialv7
Many seem to have the illusion that if you use unsafe in Rust, bad things can
only happen in the unsafe blocks.

This is wrong. If your unsafe block fails to maintain the required safety
guarantees (I personally don't know what they are), then the safe code could
break terribly as well. And figuring out which unsafe block is the culprit can
be really hard too.

~~~
pjmlp
I don't have that illusion.

Logic errors are always bound to happen in any language.

The problem in C is that every single line of code is either unsafe or
potentially UB, especially at -O3.

And yeah everyone can always assert it doesn't happen to them, but that
assertion does not hold when working in teams or using third party code.

So it is already a big security improvement if the attack surface is largely
reduced.

Also, unsafe blocks aren't anything specific to Rust. A few systems programming
languages since the '60s have had them.

------
bio_end_io_t
It bothers me when "goto" is assumed to be "a maligned language construct".

People who think "goto" is evil should also give up the other jump statements:
continue, break, and return (and also switch, though it's not listed as a jump
statement in the C standard, at least not in '89 or '99).

You can see some contradictions in the paper regarding goto. For example, they
state that deep nesting should be avoided, but goto should be avoided as well,
even though one benefit of using goto is to limit nesting depth. From the
Linux Kernel coding style doc:

- unconditional statements are easier to understand and follow
- nesting is reduced
- errors by not updating individual exit points when making modifications are
  prevented
- saves the compiler work to optimize redundant code away ;)
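
A minimal sketch of the single-exit cleanup pattern those points describe, assuming a hypothetical helper `read_prefix` (not from the kernel or the paper) that reads the first `len` bytes of a file into a newly allocated string. Each failure jumps to the label that undoes only what has been set up so far, so the body stays at one nesting level.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy up to `len` bytes from the start of `path`
 * into a freshly allocated, NUL-terminated buffer stored in `*out`.
 * Returns 0 on success and -1 on failure. The caller frees `*out`. */
int read_prefix(const char *path, size_t len, char **out)
{
    int ret = -1;        /* assume failure until every step succeeds */
    char *buf = NULL;
    size_t n;
    FILE *fp;

    assert(path != NULL && out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        goto out;        /* nothing to clean up yet */

    buf = malloc(len + 1);
    if (buf == NULL)
        goto out_close;  /* only the file is open */

    n = fread(buf, 1, len, fp);
    if (ferror(fp))
        goto out_free;   /* file open and buffer allocated */

    buf[n] = '\0';
    *out = buf;
    buf = NULL;          /* ownership moves to the caller */
    ret = 0;

out_free:
    free(buf);           /* free(NULL) is a no-op on the success path */
out_close:
    fclose(fp);
out:
    return ret;
}
```

The same function written without goto would need either nested if blocks or the cleanup calls repeated at every early return, which is the trade-off the style doc is pointing at.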

~~~
WalterBright
As a longtime C and goto user, defending the practice many times, I discovered
something interesting.

My uses of goto can be replaced with nested functions! The code is nicer,
cleaner, and the equivalent code is generated (the nested functions get
inlined).

Of course, nested functions aren't part of Standard C, but they are part of
D-as-BetterC. (D has goto's too, but I don't need them anymore.)

~~~
bgongfu
Just wanted to mention that GNU C has supported nested functions since
forever, it's one of the main reasons I prefer GCC over Clang these days.

~~~
buserror
Yes, and it annoys me no end that clang has refused to implement them, as they
were part of my codebase... How better to implement stuff like qsort()
callbacks than with a simple, contextual small function _just over it_??

YES it is dangerous due to stacks etc etc but hey, we're grown up adults, not
script kiddies.
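
A rough sketch of the qsort() case, assuming a hypothetical wrapper `sort_ints` that is not from the thread. Under GCC the comparator is a GNU C nested function placed right above the call that uses it; any other compiler gets an ordinary file-scope function, since nested functions are not Standard C. The comparator here deliberately touches no locals of the enclosing function. To my understanding, a nested function that does capture locals and has its address taken makes GCC build a trampoline on the stack, which is the stack hazard mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__GNUC__) && !defined(__clang__)

/* GNU C variant: the comparator lives inside the function that uses it. */
void sort_ints(int *v, size_t n)
{
    /* Nested function (GNU extension). It reads only its own arguments. */
    int by_value(const void *a, const void *b)
    {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);   /* avoids overflow of x - y */
    }

    assert(v != NULL || n == 0);
    qsort(v, n, sizeof v[0], by_value);
}

#else

/* Standard C fallback: same comparator, hoisted to file scope. */
static int by_value(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

void sort_ints(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof v[0], by_value);
}

#endif
```

Both branches sort an `int` array in ascending order; the only difference is where the comparator is declared.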

~~~
pjmlp
> YES it is dangerous due to stacks etc etc but hey, we're grown up adults,
> not script kiddies.

That is how CVEs are born.

~~~
bgongfu
Using a chainsaw without paying attention is how fingers are cut off, using
that as an argument against making chainsaws easier to use doesn't make any
kind of sense.

~~~
pjmlp
Good chainsaws have protection mechanisms builtin.

C does not.

~~~
bgongfu
Good for you maybe, projecting that on people who have a clue what they're
doing doesn't make sense either. They're mostly messing up chainsaws as well
these days, for the same misguided reasons.

------
drewg123
There seems to be an oddity with their LOC for FreeBSD 2.0. At ~6M LOC, it is
roughly 3x the size of the FreeBSD releases just before and just after it. The
size steadily creeps up, but we don't see anything else that big until FreeBSD
5, so there seems to be something fishy there.

~~~
klez
2.0 was when they revamped the code base because of copyright problems. Check
this out:

[https://www.freebsd.org/releases/2.0/notes.html](https://www.freebsd.org/releases/2.0/notes.html)

~~~
drewg123
Sure, I initially thought that might be it. E.g., maybe some of the non-x86
support from 4.4BSDL was not pruned out of 2.0. But I checked, and 2.0 only
has i386 support.

What strikes me as odd is that the very next minor release (2.0.5) is listed
as 2.1 MLOC, while 2.0 is listed as 6.1 MLOC. That's a reduction of about 66%.
Looking at a diff between the 2 releases, I do not see anything to explain
that.

------
badrabbit
This helped me out a lot as a base guideline for writing good C code:
[https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...](https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Developing_Safety-Critical_Code)

------
pjmlp
So no discussion about any positive evolution of security best practices or
static analysis, other than single line remarks?

------
shreyasetia
I have enrolled for CSE in [http://www.thapar.edu](http://www.thapar.edu)
college this year. Can I join this C programming course with engineering?

I know basics of C and C++ !!

------
jandrese
I wonder how many of those LOC are just support for more and more hardware
over the years? The nice thing about drivers is that they're almost completely
modular so they don't really bloat the code even though they shoot the LOC
count through the roof.

------
dm319
DSTMT/statement density plot is nice to see. Although it wasn't my time, I've
seen early code (or code from very memory-limited machines), and always been
amazed at how tightly packed it seems to be. And remarkably unreadable too.

------
jxub
A really good read! Is there a repository to share papers related to source-
code analytics like this one?

~~~
irundebian
Not a repository but many references about source-code analysis focused on
software security/safety:

[https://www.us-cert.gov/bsi/articles/tools/source-code-analy...](https://www.us-cert.gov/bsi/articles/tools/source-code-analysis/source-code-analysis-tools---references)

------
jradd
I know this title is not technically clickbait, but it got me.

