The Evolution of C Programming Practices: A Study of Unix (2016) [pdf] (aueb.gr)
229 points by signa11 10 months ago | 124 comments



I recently went to pick up my son from university, where he is studying CS.

He had replaced Windows 10 on his tower with Ubuntu and was writing a pretty complex program in C++, implementing some AI/ML primitives.

I thought to myself: I'm old, yet I went to the same university and at that point in my life would have been coding in C++ on Unix (Ultrix, OSF, or Solaris).

How in the world have we not moved on?

BTW, his code was so, so pretty. Mine would have been ugly. Kids are far better coders than we were 30 years ago. I was crazy proud.


Why would we move on? C is a local maximum, like a Brayton-cycle gas turbine engine. A 787 engine is no more than 50% more efficient than a 707 engine, despite tens of billions of dollars in investment and decades of R&D between the two. And we won't get even another 50% improvement any time soon. It'll take a fundamental change in technology to move past that plateau.

C is likewise on a plateau. There is nothing categorically better than C for what C does; just different points in the design space that make different trade-offs. E.g. Rust gives you memory safety through ownership, but you can't even make a doubly-linked list while staying within the ownership system. At a different point in the design space, Go has garbage collection, but that comes with its own trade-offs. All of these might become obsolete when we have e.g. quantum computers. But in the meantime we are just playing with different trade-offs, because we captured all the low-hanging fruit decades ago.


Because it is a Swiss cheese of security exploits.

Our computers and IoT devices will never be properly safe while the underlying kernel and bare metal stuff keeps being written in C.

C is the job safety of black hat hackers, security experts, companies selling memory protection utilities and anti-virus.

It was the explosion of free UNIX clones, *BSD and GNU/Linux, that spread it around. Until then it was just another systems programming language, already being replaced by C++ on desktop systems by the mid-90s.


"E.g. Rust gives you memory safety through ownership, but you can't even make a doubly-linked list while staying within the ownership system."

Your otherwise good comment presents this as all-or-nothing when comparing it to C. To start with, C and Rust are equivalent for anything Rust can't borrow-check, and Rust is better than C in safety or expressiveness for everything else. Second, C developers use plenty of external libraries to supplement the language's capabilities, including assembly for breaking through abstraction gaps. Likewise, whatever Rust can't handle can be proven safe with an external tool, wrapped in a safe interface, and used from there. This has been done in languages from C to Haskell.

So, there are different design points for sure, but the doubly-linked list isn't an example of Rust being unfit for purpose compared to C. If anything, Rust is equal or better in safety there, making it a better instance in that design space. Likewise for Clay, which was more C-like and used in device drivers.


But at the end of the day, I wouldn’t call it a categorical improvement over C, because you still have to drop down to what is basically C to handle essential tasks like declaring circular data structures. Instead, it makes a special (but common) case, tree-like data structures, safer to handle. This is no fault of Rust; it is a theoretical limitation. There is no single model that lets you express arbitrary data structures, gives you memory safety, and doesn’t require runtime heap walking.


Again, that sounds more limited or narrow than what people are actually building in safe Rust. The ecosystem has piles of software that would otherwise have been written in C. Likewise, that it can't handle a few primitives doesn't matter if I can prove them separately using a VCC-like tool for a subset of unsafe Rust, one that generates Rust, or one that works on assembly. That these situations are fairly uncommon is suggested by the fact that people always bring up the same examples, on which C by itself does no better. Meanwhile, all the code outside those components will be safer and more maintainable than C. So it beats C on a number of dimensions at the language level despite the remaining unsafety.

However, if we count tooling, C is still safer than Rust, given that there are more verification tools available plus certifying compilers. You have to use many of them with an easy-to-analyze subset of C, though.


The biggest difference is that in C, 100% of the code is unsafe and possibly UB, while in other system languages with unsafe code blocks those low level tricks are pretty easy to track down.


Many seem to have the illusion that if you use unsafe in Rust, bad things can only happen in the unsafe blocks.

This is wrong. If your unsafe block fails to maintain the required safety guarantees (I personally don't know what they are), then the safe code could break terribly as well. And figuring out which unsafe block is the culprit can be really hard too.


I don't have that illusion.

Logic errors are always bound to happen in any language.

The problem in C is that every single line of code is either unsafe or potentially UB, especially at -O3.

And yeah, everyone can always assert it doesn't happen to them, but that assertion does not hold when working in teams or using third-party code.

So it is already a big security improvement if the attack surface is largely reduced.

Also, unsafe blocks aren't specific to Rust. A few systems programming languages since the 60s have had them.


A modern, ergonomic language with memory safety without GC is definitely better than anything C had to offer. You can easily write C code in Rust with unsafe; you can't write C code that gives you the guarantees of Rust, no matter what you do.


Even in the mid-90s, the only thing C had to offer over something like Turbo Pascal 6.0 on MS-DOS was portability to other OSes.

Feature-wise it was just plain inferior: in safety, modularity, compile time, and OOP support.

Apple was on Object Pascal, then moved to C++. IBM was adopting C++ for their OS/2 tooling, everyone was building C++, Pascal and Basic frameworks on top of Win16, and later BeOS also went C++.

Then free UNIX clones happened.


Not that with Pascal you'd never have dangling pointers, double-free errors, uninitialized memory access, or a smashed stack.

Security-wise, Pascal offered vastly saner strings (length-prefixed, not null-terminated) and built-in support for array index checking and value range checking (at a constant runtime cost).


What I don't understand is: where are all the portable, performant, user-friendly alternative operating systems written in Pascal, Rust, etc.?

Why, after all these years, are there nearly zero mass-market alternatives to these "swiss-cheese" operating systems?

Surely backwards compatibility with existing user-space programs can't be the whole issue here. You could feasibly write an operating system with a Rust-based kernel, which allowed "unsafe" C/C++ programs to run in their own protected, virtual memory space, no? All you need to provide at the most basic level is a compatible system call interface.

EDIT: to make my point clearer, there are really just two major types of attack surface for C exploits against UNIX (excluding obvious language-agnostic hardware issues, e.g. Meltdown):

a) against the kernel (written in C)

b) against userspace (the majority of which is written in C)

Which class of exploits are we hoping to eliminate here, (a) or (b)? If the answer is the latter, then sure, write a replacement for libc and the whole ecosystem that sits on top of it. When that is done, you still have a kernel written in C that is vulnerable to attack.

If it is the former, then rewrite the kernel in Rust or whatever "safe" language you like. Provide an optional backwards-compatible syscall interface for "legacy C/C++" or "unmanaged" code, and let users enable it at their own risk. This would provide a path forward for systems composed of a safe kernel and safe userspace, while allowing older code to be run where needed.

I am eternally curious why, with the UNIX design approaching 50 years old, this has not been done yet.


It has been done, you just weren't paying attention.

Burroughs B5500 was written in ESPOL, an Algol dialect for systems programming, later improved as NEWP.

This is still sold by Unisys as ClearPath MCP mainframes.

Xerox PARC initially used BCPL, but then created Mesa, used to develop their Xerox Star workstations, which then evolved into Mesa/Cedar.

The Solo operating system was written in Concurrent Pascal for the PDP 11/45.

IBM did all their RISC research using PL/8, a safe systems programming language based on PL/I with an architecture that you would find nowadays on LLVM. They only adopted C for RISC when they decided to bet on UNIX workstations as means to sell their RISC research.

The UCSD Pascal system was a mixed interpreted and AOT-compiled environment, later adopted by Western Digital for some of its firmware.

Apple designed Object Pascal with input from Wirth, implementing Lisa and Mac OS on a mix of Assembly and Object Pascal, later switching to C++ as means to make the platform more appealing to other developers.

None of IBM's mainframe operating systems were developed in C; they used a mix of Assembly, PL/S, and PL/8. IBM only started to move to C and C++ once POSIX compatibility on those mainframes became a selling point as well.

There are plenty of other examples available.

Backwards compatibility, lack of actual research at most companies with eyes only on short-term profits, UNIX clones available for free, managers not willing to bet on the team even if it is an uphill battle: there are plenty of reasons why we aren't there yet, most of them not technical.

There are alternatives to UNIX's design, which is why POSIX has been losing relevance all these years.

On Apple platforms, Swift is the future even if it takes a couple of years to reach there, that is clearly the way Apple sees it.

"Swift is a successor to both the C and Objective-C languages."

-- https://developer.apple.com/swift/

Even on NeXTSTEP, drivers were written in Objective-C, and UNIX compatibility was a way to embrace UNIX software then extend it with Foundation code.

On Google platforms, the fact that the underlying kernel is a UNIX clone is irrelevant; the Web platform, Android Runtime and NDK APIs don't expose any of it in the official APIs.

On Windows, C is considered done, and the future for systems programming on the platform is a mix of C++ and .NET Native.

Many embedded vendors do sell Basic, Pascal, Oberon, Java and Ada bare metal compilers.

It is slowly getting there, we just need a couple of developer generations to get it back on track.


I'm well aware of many of the above systems. You may have missed the part where I specified "mass-market" though, right at the start. Perhaps you weren't paying attention.

UNIX itself predates POSIX by a large margin and early versions were written in assembly, not C.

I'm talking about the bulk of modern commodity desktop and web application development, which is really the lion's share of modern software development. This overwhelmingly occurs on and for workstations running Linux/Windows/BSD, all derived from C, with large parts of the userland still written in C, or in languages which were bootstrapped from C, depend on C, or depend on thin language-specific wrappers around existing C libraries.

Many if not most of the popular languages supported by LLVM have a frontend written in C/C++.

Sure, these languages can become self-hosted, e.g. Rust (albeit dependent on LLVM, written in C/C++), or Golang. But I posit that until large parts of these systems (kernel _and_ userland) and the extensive collection of C libraries used for application development today are rewritten or replaced, you will not see the huge paradigm shift in commodity application development you are describing. I think it's going to take longer than you expect. I am not as optimistic as you. Stakeholders, programmers and product managers alike tend to favour the path of least resistance.

C is the lingua franca of these existing systems, and forms the common basis around which multiple higher-level languages currently wrap and leverage.

Which new common denominator is going to take its place? LLVM bytecode? Are we going to rewrite libz/libpng in each new language-of-the-week instead?

This is even before getting into embedded, where C is still king, C++ is really only starting to become more popular over the last 10 years or so, and Rust is barely a blip on the radar. Linux's popularity growth in this area is phenomenal, unmatched by any other as well.


On IBM and Unisys mainframes the common denominator is the "language environment", the intermediate format that has allowed them to evolve all these years since the 60's.

On Windows the common denominators are COM and .NET, with COM upgraded to UWP in recent versions.

On Android the common denominator is Dalvik bytecode.

On macOS/iOS the common denominator could be indeed LLVM bitcode, depending on how much Apple would like to deviate from upstream to make their own toolchain actually hardware independent.

Naturally, Linux's popularity has skyrocketed: the path to success for UNIX and its clones has always been free beer, which tastes even better when it comes with source code.

I am optimistic, but I am also aware it won't happen in my lifetime; changing mentalities requires changing generations, one person at a time.


z/OS is written in, amongst other things, C/C++.

.NET is implemented in C/C++.

Dalvik is written in C. And the specification allows for calling into unmanaged code via JNI, mostly for performance reasons.

LLVM is written in C/C++.

Windows, MacOS, Android/Linux kernels are all written in C.

The Swift, Rust, Objective-C communities et al should put their money where their mouth is after all these years, and provide feasible replacements for just _one_ of these consumer systems.

Otherwise it comes off at worst as grandstanding, at best, navel gazing.

Where is the consumer ready Rust replacement for LLVM? Where is the consumer ready Obj-C replacement for Dalvik? Where is the consumer ready operating system kernel based on .NET? Where is the consumer ready Swift library of high performance video codecs?

These languages need some sort of flagship IMHO if they are ever to displace C/C++. Developer mindshare is important to getting critical mass and these languages are fighting each other for it, while C/C++ is still sort of sitting by eating their lunch and getting shit done.


First of all, there is no such language as C/C++.

C++, although tainted by copy-paste compatibility with C89, does offer the language features security-conscious developers need, and C++17 is quite a pleasure to use.

C on the other hand could be nuked for what I care, it has been clear since 1979 that security and C would never go together.

Now regarding your assertions.

z/OS was initially written in a mix of Assembly and PL/I. C and C++ came later into the picture as the system got a POSIX personality.

The way to write libraries exposed to all languages available on z/OS is via the z/OS Language Environment.

Chapter 9.12 of z/OS Basics, https://www.redbooks.ibm.com/redbooks/pdfs/sg246366.pdf

.NET has hardly any pure C code, rather C++.

Even so, the C# and VB.NET compilers have been bootstrapped thanks to Roslyn, and C++ is now gone from the compiler side. F# has been bootstrapped since the early days.

Since Roslyn came into production, the .NET team started planning moving parts of the runtime from C++ to C#.

"So, in my view, the primary trend is moving our existing C++ codebase to C#. It makes us so much more efficient and enables a broader set of .NET developers to reason about the base platform more easily and also contribute."

https://www.infoq.com/articles/virtual-panel-dotnet-future

Also the first version of .NET's GC was actually prototyped in Common Lisp,

https://blogs.msdn.microsoft.com/patrick_dussud/2006/11/21/h...

Unity introduced HPC# at GDC 2018 and they are now in the process of migrating C++ engine code into HPC#.

Both the requirements of HPC#, a C# subset for performance-critical code, and the experience with Singularity and Midori drove the design of the C# 7.x features for low-level, GC-free data structure management.

"Evolving Unity"

https://www.youtube.com/watch?v=aFFLEiDr3T0&list=PLX2vGYjWbI...

"Safe Systems Programming in C# and .NET"

https://www.infoq.com/presentations/csharp-systems-programmi...

"Safe Systems Software and the Future of Computing"

https://www.youtube.com/watch?v=CuD7SCqHB7k

Dalvik was written in C++ and has been dead since Android 5.0.

Between Android 5 and 6, ART was an AOT compiler run at installation time, written in a mix of Java and C++.

It was rebooted for Android 7 and is now a mix of an interpreter written in highly optimized Assembly, with JIT/AOT compilers written in a mix of Java and C++ that make use of PGO data.

On Android Things, userspace device drivers are written in Java.

https://developer.android.com/things/sdk/drivers/

Sun back in the day toyed with the idea of having Java on the Solaris kernel,

https://www.researchgate.net/publication/220938922_Writing_S...

And on SunSPOT devices, http://www.sunspotdev.org/

Fiji VM and PTC Perc Ultra can be compiled AOT to native code and run Java bare-metal.

https://www.ptc.com/en/products/developer-tools/perc

http://fiji-systems.com/

Oracle is also in the process of following the JikesRVM example and, with the help of Graal, bootstrapping Java to reduce the dependency on C++, via Project Metropolis.

https://www.youtube.com/watch?v=OMk5KoUIOy4

Graal, which incidentally has better…

LLVM is written in C++. Yes it does expose a C API, but it also has bindings for other languages.

At WWDC 2017, Apple announced that launchd and the dock were rewritten in Swift. I expect other OS components to be announced at this year's WWDC.

Windows kernel was written in C. Since Windows 8, C++ is officially supported on the kernel and given the company's stance on usefulness of C, they have been migrating the code to compile as C++.

"We do not plan to support ISO C features that are not part of either C90 or ISO C++"

https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...

Visual C++ has since been updated to C11 library compatibility, as required for ISO C++17 compliance, and that's all.

"We have converted most of the CRT sources to compile as C++, enabling us to replace many ugly C idioms with simpler and more advanced C++ constructs"

https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-great...

https://www.reddit.com/r/cpp/comments/4oruo1/windows_10_code...

Fuchsia's TCP/IP stack, WLAN services, disk management, package manager, update service are written in Go.

https://groups.google.com/forum/#!msg/golang-dev/2xuYHcP0Fdc...

https://fuchsia.googlesource.com/garnet/+/master/go/src

Genode OS, ARM Mbed OS and Arduino Wiring are written in C++.

As I mentioned, change requires replacing generations one person at a time.

First let's turn C into the COBOL of systems programming, then we worry about C++ afterwards.


Thank you for the long list of examples. I think some of them are a little fringe and fall outside the scope of my argument (e.g. Sun literally "toying" with Java device drivers, or Microsoft compiling their C code with a C++ compiler), and you haven't refuted my point that the kernels of the most popular systems are still written in C for the most part.

Listing endless reams of discontinued research systems (e.g. Singularity and Midori) isn't reinforcing your point.

Redox (Rust) and Zircon (C++) are more what I'm alluding to. However, Redox AFAIK doesn't even have USB drivers yet, and Fuchsia is even less useful in its current state. These systems have to be available and, I venture, usable in order to displace e.g. Linux, which is already both of those things.

I'm hopeful we can see more progress in the next few years on some of these. It's nice to see some serious attempts in this space, however with the current pervasiveness of and dependencies on C at so many layers of these systems, I suspect it is really going to take much longer than hoped.


Singularity and Midori were killed for political reasons; just read Joe Duffy's stories about how it all went down.

However .NET AOT compilation on Windows 8 was taken from Singularity Bartok's compiler, while Midori influenced async/await, TPL, improved references on C# 7.x and .NET Native on UWP.

I guess you missed my "As I mentioned, change requires replacing generations one person at a time.".

So yeah, it is going to take a while until all those devs and managers who are religiously against safe systems programming are gone, replaced by newer generations with more open minds.

The only way to convince Luddites is to wait for a change of generations, which unfortunately also means one doesn't get to see the change oneself.


What I find most impressive about Turbo Pascal is that the entire thing on the floppy was 32k. I don't know how that was possible.

This was less than 1/10 of a 360k floppy.


The actual productivity improvements are things like

* distributed version control,

* open source culture and infrastructure (GitHub, GPL and Apache licenses...),

* dependency management and build tools (pull in many man years of effort into your project with a single declaration),

* HTTP and REST APIs (accidentally taught developers the importance of idempotency, statelessness, uniform interfaces, etc.),

* cloud infrastructure (no more need to purchase, install, configure, and secure expensive workstations before starting your project and business).

As for programming languages, there have been very few actual productivity gains since the standardization of Common Lisp.


>As for programming languages, there have been very few actual productivity gains since the standardization of Common Lisp.

Although this is mostly true, I think you forgot to mention that the latest trend of repositories with tens of thousands of libraries are a boost in productivity as well.

Even the Common Lisp world received a big boost in productivity when the Quicklisp library/system repository was created and consolidated.


* Decades of coherent thought on code robustness and maintainability.


> As for programming languages, there have been very few actual productivity gains since the standardization of Common Lisp.

This isn't true. Common Lisp had more than a few problems which later languages fixed:

1) Module system/encapsulation

Common Lisp has a notoriously fickle and opaque system for importing other code into your space.

2) Sequences as sequences rather than linked lists terminated by null.

Dealing with the fact that a list is a null-terminated linked list instead of an actual sequence/vector/tree is a big deal. Try implementing a printer for dotted pairs at some point. Talk about annoying and it costs you enormous performance.

I could keep going, but there are far better qualified people to talk about the many shortcomings of Common Lisp.


>Dealing with the fact that a list is a null-terminated linked list instead of an actual sequence/vector/tree is a big deal.

If you need a vector, just use a vector, don't use a list. Common Lisp has very flexible vectors and arrays.

Lists are the way they are because they're the fundamental block for manipulating/constructing lisp code itself.

>Common Lisp has a notoriously fickle and opaque system for importing other code into your space.

Care to elaborate? You just load it; everything is in its own, separate namespace.


It's an interesting observation, but I would say slightly misleading because different levels of the stack change at different rates.

The low level C/Unix layer is the foundation for everything else, so it has not changed very much. That's actually a good thing! It's stable and incrementally improving.

The higher levels of the stack have wildly changed -- JS, Python/Ruby/Perl, web frameworks, R and pandas, TypeScript, etc. not to mention all this cloud stuff. JS and Python are on microcontrollers!

I think that is working as intended. But if anyone had the misconception: we don't have the bandwidth to replace the lower levels of the stack! Microsoft doesn't either, and they have billions of dollars.

Nobody rewrites stuff "just because". That's why C++ is still relevant. (It's also why I think Rust is not really a C++ "replacement". It might be used for NEW stuff, but there will still be tons of C/C++ code around for decades. Again, the rate of rewriting/replacing is slower than you think.)

Now, to argue the opposite side: writing applications (as opposed to low-level libraries) in C++ does feel like a smell. Sometimes C++ is still a justified application language.

But it does feel like its territory has been encroached upon by Obj-C / Swift on iOS, Java/Kotlin on Android, and JS on the web. So there are not as many new C++ applications as there used to be, but there are still some.


C++ has been replacing the low level kernel code since Windows 8, little by little.

And UWP, written in a mix of C++ and .NET Native is the future of the Windows desktop applications, even if it takes the next couple of years to eventually get there. All new APIs are UWP based.

This week's BUILD 2018 has proven that they are still on that direction, one day Win32 will join Win16.

And in Azure, well the underlying OS doesn't really matter for most languages with rich runtimes.

As for your remark regarding apps, there I fully agree.

C++ has been pushed down the stack, left for GPGPU shaders and low-level OS features, with Microsoft being the only exception in offering tooling parity with the other languages in its OS SDK.

Apple and Google keep C++ in a little box bounded by Objective-C++ and JNI respectively.

And on UWP, even if many high performance components are written in C++, most teams just use .NET Native.


Thanks for your thoughts!

When I saw my son coding in C++, I asked him why not Go or Rust.

But on the ride home I thought doing it in C++ makes total sense: no matter what he uses, when he gets a job, knowing C++ is required today. So much older code is still in C++.

BTW, he was working on lower-level primitives; for higher-level work I would have suggested Python, with R as a backup.


Sometimes tools are good enough that they don't need to change. People still use hammers and nails, even though we have awesome power tools and many different types of screws. Coherency and ease of use are very important factors when building anything.


> How in the world have we not moved on?

We tried but have yet to give up.

"It looks like Plan 9 failed simply because it fell short of being a compelling enough improvement on Unix to displace its ancestor. Compared to Plan 9, Unix creaks and clanks and has obvious rust spots, but it gets the job done well enough to hold its position. There is a lesson here for ambitious system architects: the most dangerous enemy of a better solution is an existing codebase that is just good enough."

— Eric S. Raymond


I think it's because C++ maps reasonably well to our model of a CPU (basically moving stuff in and out of registers and doing simple operations on those). The guts of a modern CPU is quite different from our mental model and that causes a lot of distress for compiler writers but the programming model has basically not changed in the 35 years I've been coding.


Perhaps that BTW is what makes all the difference.

I suspect that 'us' coders might overvalue the language or tools we use, and undervalue the progress that might take place in the context of them. Not just when it comes to the next generation(s), but also in our own work.

For example, over the past decade or two I've 'improved' my toolage by learning vim and emacs, figuring out how to get Sourcemaps and live reloading working, and improving my command-line-fu. And yet, as a programmer, I feel all of these improvements pale in comparison to the stuff I learned that would be equally useful in Notepad as it would be in a decked out Sublime Text/VSCode/Emacs.

This becomes evident when I work on one-off scripts in the Chrome developer tools, or editing stuff using vim on a server. I'm reduced to the most basic tooling, and yet I can get stuff done in a fraction of the time that I'd have needed a few years ago.


> And yet, as a programmer, I feel all of these improvements pale in comparison to the stuff I learned that would be equally useful in Notepad as it would be in a decked out Sublime Text/VSCode/Emacs.

As someone who's 5 years into my career and interested in moving past the not-quite-senior-engineer plateau, what would you say are the important things you've learned?


Whoops! Forgot the best part. He was using Vim to write the code. That one really cracked me up, as I am a vi wiz and would never change; I am just so fast using it.

Never used Emacs.


> BTW, his code was so, so pretty. Mine would have been ugly. Kids are far better coders than we were 30 years ago. I was crazy proud.

I don't know what compiler you were using on Ultrix but C++ in that era was insanely bad.


Ultrix was from roughly the same time as the first C++ (Cfront, a translator that generated C); people were still figuring out what good and bad C++ was.


Yes, similar time frame, but C++ came first. I tended to be into the new thing, so I was into C++. It was the mid-80s, and C++ dates back to 1979 if memory serves.


It bothers me when "goto" is assumed to be "a maligned language construct".

People who think "goto" is evil should also give up the other jump statements: continue, break, and return (and also switch, though it's not listed as a jump statement in the C standard, at least not in '89 or '99).

You can see some contradictions in the paper regarding goto. For example, they state that deep nesting should be avoided, but that goto should be avoided as well, even though one benefit of using goto is to limit nesting depth. From the Linux kernel coding style doc:

* unconditional statements are easier to understand and follow

* nesting is reduced

* errors by not updating individual exit points when making modifications are prevented

* saves the compiler work to optimize redundant code away ;)


This is because certain coding standards are designed to be idiot-proof. Unfortunately, that can result in tasteless code and sometimes undesirable workarounds (using "goto" to have a single exit path for errors, for example, is a perfectly valid use).

When Dijkstra wrote his famous essay "Go To Statement Considered Harmful" (1968), it was a manifesto against unstructured programming i.e. the spaghetti code. However, the use of "goto" per se does not imply unstructured programming. Donald Knuth wrote a wonderful essay "Structured Programming with go to Statements" (1974) to make this point.

Availability of "goto" in C merely gives us more flexibility, but it does not mean that we should start writing unstructured code.


Dijkstra was concerned about being able to reason about code, and spaghetti code can make it impossible to decompose. A single goto within a function is not a big deal, and that's not really what he was worried about.

Few people have worked on real spaghetti code, thousands of lines with no functions, no modules, just spectacular leaps forward, backwards, leaping forward into the middle of huge loops, leaping backwards into the middle of loops, giant loops nested with and partially overlapping other loops.

I worked on such code, trying to decompose it in order to organize it into subroutines. It resisted my efforts almost completely. Fortran IV I think.


Lua added goto just recently, and it's benign, because it can't escape its calling context.

Users were agitating for `continue` to join `break` for control flow interruption. Lua instead chose to provide all the non-structured control flows you would like as a primitive; scoping it lexically keeps it from breaking composition.


I still hope that we might be able to convince the Lua authors to add continue one day. It would be very convenient.

I think the real reason why it hasn't been added to the language yet is that it has a weird interaction with the way Lua scopes its repeat-until loops. The justification about having a single control-flow structure is more of an excuse.


As a contrasting opinion, I would like the break keyword to be formally defined as a sugar for a special labeled goto.

Since it's a keyword, you can't create your own ::break:: label, so that would be fine.


I'd be OK with that. But let me type `break` and `continue`, with all the sugar that entails :)


> Few people have worked on real spaghetti code

The closest I've come was my own code: Minesweeper on my TI-82 graphing calculator. I separated the program into various sections, then used `goto` to jump where I needed. I was tempted at some point to use the "call program" facility instead, but that would have meant exposing those programs to the end users, so I just lumped everything into one file.

Reminds me why I love functions.


Oh god, I feel for you. My first real language was when I had my TI-84, and you NEEDED goto if you didn't want to clutter up the user's machine with a bunch of things they should never-ever press. God, what horrible yet nostalgic memories.


I’ve seen a few Fortran goto subroutines. I somehow get the feeling that the two following facts are mathematically related: some graphs cannot be drawn on a two-dimensional sheet of paper, and some subroutines cannot be decomposed into smaller subroutines.


> People who think "goto" is evil should also give up the other jump statements: continue, break, and return (and also switch, though its not listed as a jump instruction in the C standard, at least not in '89 or '99).

That makes no sense. goto is maligned because it's unstructured, the other constructs you list are structured and restricted. Much like loops, conditionals and function calls they're specific and "tamed".

Not only that, but the historical movement against goto happened in a context where goto was not just unstructured but unrestricted (not even confined to the current function).

Even K&R warns that goto is "infinitely abusable", and recommends avoiding it where possible (i.e. outside of error handling and breaking out of multiple loops, since C has no multi-level break).
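For the "breaking out of multiple loops" case, a minimal sketch: `find_in_grid` is a hypothetical search over a row-major grid, and the `goto` plays the role of the missing two-level `break`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Searches a rows x cols grid (row-major) for target.
   C has no "break 2", so a goto leaves both loops at once. */
static bool find_in_grid(const int *grid, size_t rows, size_t cols, int target,
                         size_t *out_r, size_t *out_c)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_r = r;
                *out_c = c;
                goto found;   /* exits both loops in one step */
            }
        }
    }
    return false;             /* fell off the end of both loops */

found:
    return true;
}
```

The alternative without goto is usually an extra `done` flag tested in both loop conditions, which is more state to keep track of for the same control flow.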


The unstructured and unrestricted equivalent of goto, then, is a method call in OOP, where a method is basically a label and the object is shared state it randomly messes with.


That's not unrestricted though -- control returns to the caller after the end of the function.

With goto that's not guaranteed to be the case -- that was the spaghetti part.


No, that's not a restriction, but just a behavior. It doesn't keep you from jumping all over the place and modifying shared state by calling methods within methods. You can only use conventions to have some restrictions and structure here. Just like with goto.


if (False) goto error;

vs.

if (False) throw ExceptionE;


Ahem

   if (False) longjmp(error, 1);

   vs

   if (False) throw ExceptionE;
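For anyone who hasn't used it: the standard function is `longjmp` from `<setjmp.h>`, and it returns control to the matching `setjmp` without running any cleanup in the frames it skips, which is the big difference from a C++ `throw`. A minimal sketch with a hypothetical `parse_digit` helper that "throws" and a `sum_digits` caller that "catches":

```c
#include <setjmp.h>

static jmp_buf parse_error;        /* where to land when parsing fails */

/* Hypothetical helper: converts one digit character or bails out. */
static int parse_digit(char ch)
{
    if (ch < '0' || ch > '9')
        longjmp(parse_error, 1);   /* like "throw": does not return here */
    return ch - '0';
}

/* Sums the digits of s; returns -1 if any character is not a digit. */
static int sum_digits(const char *s)
{
    int total = 0;                 /* not read after a longjmp, so no volatile needed */
    if (setjmp(parse_error) != 0)
        return -1;                 /* the "catch" block */
    for (; *s != '\0'; s++)
        total += parse_digit(*s);
    return total;
}
```

Any malloc'd buffer or open file in the skipped frames would simply leak, so in C this pattern usually ends up combined with goto-style cleanup anyway.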


Good job, compiler :)


You highlighted in this comparison why I hate exception handling in OOP languages, and just generally the common practices prescribed for handling errors.


switch is unstructured to a large degree:

  switch (val)
    while (cond)
    {
      case 42: /* wee: if val is 42, we go straight here */
        break; /* this belongs to the while! */
    }

I have written this kind of jig in the past:

   if (...) {

   } else switch (val) for(;;) {

   }
:)
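To check that the `switch (val) while (cond)` fragment is legal C and behaves as its comments say, here is a compilable variant. The function name `jump_into_loop` and the stop-after-three-passes rule are made up for illustration:

```c
/* Returns how many times the loop body ran for a given val. */
static int jump_into_loop(int val)
{
    int runs = 0;
    switch (val)
        while (runs > 0) {    /* checked before each new pass; jumping in skips the first check */
            case 42:          /* val == 42 jumps straight into the loop body */
                runs++;
                if (runs == 3)
                    break;    /* the innermost loop is the while, so this leaves it */
        }
    return runs;              /* no matching case and no default: the body is skipped */
}
```

With `val == 42` the body runs three times even though the loop was never entered "normally"; with any other value it never runs.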


Yep. Those maligners forget that state machines are useful ways to structure code, are inherently analyzable (they form the basis of modeling languages like PlusCal), and can only be fully expressed in C using goto.

(No, you can't use tail calls to represent state machines in C; that is not a feature of the C language but rather of a given implementation. And yes, you could model state machines with an enum variable and a big switch statement, but that's even harder to follow.)

What trips people up is when goto is used to jump across resource lifetime boundaries (which C++ addresses with RAII), when they keep so much state in cross-state variables that they forget what their invariants are, and when they use a goto as a replacement for what should be a function call.

Using goto to implement e.g. a data processing loop, a non-recursive graph traversal algorithm, or a parsing state machine are all perfectly valid uses.
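As an illustration of the parsing case, a minimal sketch of a goto-based state machine: a hypothetical `count_words` function where each label is a state and each `goto` is a transition:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace-separated words. Each label is a state. */
static size_t count_words(const char *s)
{
    size_t words = 0;

between_words:                          /* state: skipping whitespace */
    if (*s == '\0')
        return words;
    if (isspace((unsigned char)*s)) {
        s++;
        goto between_words;
    }
    words++;                            /* first character of a new word */
    s++;
    goto in_word;

in_word:                                /* state: inside a word */
    if (*s == '\0')
        return words;
    if (isspace((unsigned char)*s)) {
        s++;
        goto between_words;
    }
    s++;
    goto in_word;
}
```

No state variable exists at all: the current state is simply where the program counter is, which is why the invariants for each state can be read off the code between two labels.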


Tracing through goto spaghetti is not more comprehensible than a structured switch with clearly defined regions for each state. This is the sort of abuse that gives goto a bad name. The only thing worse is table driven state machines calling function pointers scattered everywhere.


State machines implemented with gotos have very clearly defined regions for each state: the space between each label. Switch-based state machines are fine, but become hard to follow when e.g. you need a loop around a couple states, and are often abused to allow changing states from a non-lexically-local context (e.g. within a function call).

At a high level, this:

    goto NewState;
is no less comprehensible or spaghetti-prone than this:

    state = NEW_STATE;
    break;
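For comparison, a sketch of the `state = NEW_STATE; break;` style applied to a small word counter. `count_words_switch` and the `word_state` enum are illustrative names; each assignment to `state` plays the role of a `goto` transition:

```c
#include <ctype.h>
#include <stddef.h>

enum word_state { BETWEEN_WORDS, IN_WORD };

/* Counts whitespace-separated words with an explicit state variable. */
static size_t count_words_switch(const char *s)
{
    enum word_state state = BETWEEN_WORDS;
    size_t words = 0;

    for (; *s != '\0'; s++) {
        switch (state) {
        case BETWEEN_WORDS:
            if (!isspace((unsigned char)*s)) {
                words++;
                state = IN_WORD;        /* transition to IN_WORD */
            }
            break;
        case IN_WORD:
            if (isspace((unsigned char)*s))
                state = BETWEEN_WORDS;  /* transition to BETWEEN_WORDS */
            break;
        }
    }
    return words;
}
```

For a two-state machine like this the two styles are about equally readable; the difference shows up once states need their own inner loops.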


That’s what I am thinking of. These days people use goto only to reduce spaghetti code in very specific use cases in C and nothing more.


As a longtime C and goto user who has defended the practice many times, I discovered something interesting.

My uses of goto can be replaced with nested functions! The code is nicer, cleaner, and the equivalent code is generated (the nested functions get inlined).

Of course, nested functions aren't part of Standard C, but they are part of D-as-BetterC. (D has goto's too, but I don't need them anymore.)
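Standard C has no nested functions, so the closest portable approximation is a small `static inline` helper that receives the state it needs explicitly; compilers routinely inline such helpers. This is a sketch of that general pattern, not the D code referred to above, and `struct parser`, `fail`, and `parse_pair` are hypothetical names:

```c
#include <string.h>

struct parser {
    const char *error;      /* first error message, or NULL */
    int errors;             /* number of failures reported */
};

/* Shared failure path that would otherwise sit behind a "goto fail" label. */
static inline int fail(struct parser *p, const char *msg)
{
    if (p->error == NULL)
        p->error = msg;
    p->errors++;
    return -1;
}

/* Parses "key=value" into key; returns 0 on success or -1 via fail(). */
static int parse_pair(struct parser *p, const char *s, char *key, size_t keylen)
{
    const char *eq = strchr(s, '=');
    if (eq == NULL)
        return fail(p, "missing '='");
    if (eq == s)
        return fail(p, "empty key");
    if ((size_t)(eq - s) >= keylen)
        return fail(p, "key too long");
    memcpy(key, s, (size_t)(eq - s));
    key[eq - s] = '\0';
    return 0;
}
```

The cost compared with a real nested function is that the context has to be passed explicitly instead of being captured.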


Walter, what do you think of nested functions or even normal functions taking an identifier similar to continue / break to be able to jump up the stack precisely: either a certain number of steps, or to a particular calling function.

https://dlang.org/spec/statement.html#continue-statement

Also, what do you think of a partial compilation feature / tree shaking? https://webpack.js.org/guides/tree-shaking/


Just wanted to mention that GNU C has supported nested functions since forever, it's one of the main reasons I prefer GCC over Clang these days.


Yes, and it annoys me no end that Clang has refused to implement them, as they were part of my codebase too... How better to implement stuff like qsort() callbacks than with a simple, contextual small function right above the call??

YES, it is dangerous due to stacks etc., but hey, we're grown-up adults, not script kiddies.


Clang has Blocks instead of nested functions. There may be patches for gcc-4.2 for Blocks but they didn't make it to the mainline.

https://en.wikipedia.org/wiki/Blocks_(C_language_extension)


> YES it is dangerous due to stacks etc etc but hey, we're grown up adults, not script kiddies.

That is how CVEs are born.


Using a chainsaw without paying attention is how fingers are cut off; using that as an argument against making chainsaws easier to use doesn't make any kind of sense.


Good chainsaws have protection mechanisms builtin.

C does not.


Good for you maybe, projecting that on people who have a clue what they're doing doesn't make sense either. They're mostly messing up chainsaws as well these days, for the same misguided reasons.


That adds a lot of boilerplate though. IMO the best solution is destructors and RAII so that you can return early in case of error and not leave your resource half-initialized. And this way you don't have to repeat your cleanup code in the "deinit" method. Of course if you start adding destructors soon you'll want generics and before you know it you end up with "C with classes" and we all know that's the path towards insanity.


> That adds a lot of boilerplate though.

In practice, it doesn't. (The compiler inlines the code.) I know because I'm pretty picky about this sort of thing - it was why I was using goto in the first place.

> RAII

RAII can work, but it's a bit clunky compared to a nested function call. Additionally, if the code being factored is not necessarily the same on all paths, that doesn't fit well with RAII.


Boilerplate in terms of characters typed, not resulting code size. Adding a bunch of function declarations adds a significant amount of noise IMO.

>RAII can work, but it's a bit clunky compared to a nested function call. Additionally, if the code being factored is not necessarily the same on all paths, that doesn't fit well with RAII.

I think I'd need to see an example of what you're talking about then because I don't quite understand how your method works exactly.


Boilerplate doesn't mean (binary) bloat.


Rather than debate, here's an example of replacing goto's with a nested function:

https://github.com/dlang/dmd/pull/6656/files


I guess it's a matter of taste, but I often (not always) prefer fewer return statements in a function, with gotos to the error handling/cleanup code. Especially in kernel-mode drivers.


Check the asm code generated by your compiler - it may optimize multiple returns into one.


It's not performance or code size I'm concerned about; it's mainly correctness. Stuff like input validation and releasing a spinlock in kernel code.


> My uses of goto can be replaced with nested functions

I guess you are not using goto as flow-control then.


That's what I would have said until I tried it :-), and why I said it was interesting!


> It bothers me when "goto" is assumed to be "a maligned language construct".

That's a statement about what people say about the feature, not about the feature itself. The subtlety here is that malign as an adjective refers to something evil or ill-intentioned, while as a verb it means something closer to slander or defame. It's frequently (mostly?) used in a context of skepticism regarding the claims in question, especially in the construction much-maligned.


Sounds a bit like a strawman then; does anybody actually malign "goto" as an error-handling construct in C? It's pretty standard in my experience. It's goto "like in BASIC" that's utterly evil and rightfully maligned. And having learned C coming from BASIC, I speak from experience...


Yes actually. Our teacher in university described it as "Something really bad".


In the context of C or C++? I've read similar warnings in C++ tutorials in the past and I tend to agree with them, in C it's just silly however.


In the context of a C systems programming class.


Yeah their take on goto is a bit odd given that it's probably the sanest way to do "cascading" error handling in C given that we don't have RAII or exceptions.


GCC supports RAII through cleanup variable attribute (https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attribute...) though.
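A minimal sketch of that attribute, which GCC and Clang both accept (it is an extension, not standard C); `free_charp` and `greeting_length` are hypothetical names. The cleanup function receives a pointer to the annotated variable and runs whenever the variable goes out of scope, including on early returns:

```c
#include <stdlib.h>
#include <string.h>

/* Called with a pointer to the annotated variable when it leaves scope. */
static void free_charp(char **p)
{
    free(*p);                 /* free(NULL) is a no-op, so a failed malloc is fine too */
}

/* Returns the length of "hello, <name>", or -1 for an empty name or no memory. */
static int greeting_length(const char *name)
{
    __attribute__((cleanup(free_charp))) char *buf = malloc(strlen(name) + 8);
    if (buf == NULL)
        return -1;
    if (name[0] == '\0')
        return -1;            /* early return: buf is still freed */
    strcpy(buf, "hello, ");
    strcat(buf, name);
    return (int)strlen(buf);  /* the return value is computed before buf is freed */
}
```

Because the cleanup function takes any pointer type, the same mechanism covers file descriptors or locks, not just memory.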


For small and moderate memory requirements, C pretty much supports RAII via variable-length arrays.

You just replace this:

    void f(int n)
    {
            float *x = malloc(n * sizeof*x);
            // ...
            free(x);
    }
with this:

    void f(int n)
    {
            float x[n];
            // ...
    }
edit: and you can exit your function anywhere without leaking.


VLAs were made optional in C11, as they turned out to be a very bad idea security-wise.

I just need to call your example function with the right n to corrupt the stack.
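One common mitigation, swapping the VLA for a fixed-size stack array with a heap fallback, keeps the stack usage bounded no matter what `n` the caller passes. This is a sketch under the assumption that a few hundred floats on the stack is acceptable; `STACK_LIMIT` and `sum_squares` are made-up names:

```c
#include <stdint.h>
#include <stdlib.h>

#define STACK_LIMIT 256       /* assumed cap on elements kept on the stack */

/* Sums x*x for x = 0..n-1 via a scratch buffer; returns -1.0 on allocation failure. */
static double sum_squares(size_t n)
{
    float small[STACK_LIMIT]; /* fixed size: the caller's n cannot grow the stack */
    float *x;
    double total = 0.0;

    if (n <= STACK_LIMIT)
        x = small;
    else if (n > SIZE_MAX / sizeof *x)
        return -1.0;          /* n * sizeof(float) would overflow */
    else
        x = malloc(n * sizeof *x);
    if (x == NULL)
        return -1.0;

    for (size_t i = 0; i < n; i++)
        x[i] = (float)i;
    for (size_t i = 0; i < n; i++)
        total += (double)x[i] * x[i];

    if (x != small)
        free(x);              /* only heap buffers are freed */
    return total;
}
```

The price is that the cleanup is back, which is exactly the part the VLA version was supposed to remove.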


For memory allocs sure, but that's only a small subset of resource management. How about sockets, fds, locks, 3rd party library initialization, hardware setup (for device drivers) etc... Alloca doesn't cut it, you need general purpose destructors.


Also when you code something at lower levels, sometimes it’s beneficial to treat some parts of CPU state and thread state as resources: process/thread priority, interrupt mask, FPU control, even instruction set (e.g. Thumb support on ARM needs to be manually switched on/off).


Sure, it's just a small but non-negligible part of RAII that you can do in C.


It's negligible, you still need to manage lifetimes somehow beyond the scope of a single function. Like having a context abstraction and tying destruction of resources to it. Introducing something alternative for special cases only increases complexity as now instead of using a single universal and consistent API you have multiple that behave rather differently.


In many cases (I would say most, at least for my programs in non-interactive scientific computing), all the objects can be created at the beginning of the program, and then no further creation happens. Sometimes it takes a bit of effort to refactor your program into that structure, but it is an effort well spent. Then you can use tools like OpenBSD's pledge, and reason more clearly about your algorithms.


I concur, that tends to be my modus operandi as well but unless your application is completely monolithic you'll probably have 3rd party init code to deal with at some point. And again it won't help if you need to handle cleanup that's not memory-related.

In the case of an operating system (the subject of TFA) pre-allocating everything is obviously completely impractical and alloca won't help since you can't return the memory outside of the stack frame. I'd wager that there are very few uses of goto in kernel code that could successfully be replaced by alloca (the fact that kernel stacks tend to be very shallow wouldn't help either).


Pre-allocation is generally the safe option in an embedded, security-critical environment where you must always handle the worst-case scenario and you know all possible inputs. In a user-interaction environment, though, it's usually better to overcommit so that the user can choose whether to create 1 million As or 1 million Bs, instead of having a pre-created pool of half a million As and Bs each.

With pre allocation you usually also end up creating your own resource management within the pre allocated pool and then you are back to the resource management problem...
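That in-pool resource management can at least stay small. A sketch of a fixed-capacity object pool with an intrusive free list, where `struct node`, `POOL_SIZE`, and the `pool_*` functions are all hypothetical names:

```c
#include <stddef.h>

#define POOL_SIZE 4

struct node {
    int value;
    struct node *next_free;   /* links unused slots together */
};

static struct node pool[POOL_SIZE];   /* all storage reserved up front */
static struct node *free_list;

/* Threads every slot onto the free list. */
static void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_SIZE; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_SIZE - 1].next_free = NULL;
    free_list = &pool[0];
}

/* Returns a free slot, or NULL when the pool is exhausted. */
static struct node *pool_alloc(void)
{
    struct node *n = free_list;
    if (n != NULL)
        free_list = n->next_free;
    return n;
}

/* Puts a slot back at the head of the free list. */
static void pool_free(struct node *n)
{
    n->next_free = free_list;
    free_list = n;
}
```

Allocation and release are O(1) and can never touch the system allocator, but double frees and use-after-free are now the pool's problem rather than malloc's.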


RAII is about a lot more than freeing memory. What if that’s an array of open file handles? Or mutex locks, etc?



Yeah, just like IEEE 754 floating-point arithmetic. What's the point of your remark?


It is, as an unstructured "anything goes" statement.

As for continue, break, and return, they have an explicit workflow: exiting or resuming the current scope.

The only language where unconditional jumps make sense is Assembly.

The last time I actually wrote a goto statement was probably in some BASIC dialect.


>The only language where unconditional jumps make sense is Assembly.

In Scheme, because a lambda expression is a closure and calls in tail position must be optimized with tail call elimination, it is considered the ultimate goto. It's goto, but with procedural abstraction.


Yes, which means you are sure where the flow goes next, instead of hunting down goto spaghetti.

Even call/cc is more structured than plain gotos.


I am not sure about that. What about higher-order programming?

    (define (goto proc) 
      (proc))


It still has a call stack, goto only goes forward.


Using goto for managing cleanup after an error condition results in cleaner, easier to understand and maintain code. This is a specific idiom that is easy to recognize.

On the other hand, using goto to jump back in the program flow, or multiple branching gotos are both asking for trouble.


For people who say goto statements make spaghetti, please consider looking at this goto use case in Linux, sock_create_lite() [1]. Imagine we wrote the same thing without goto here: would it be more Don't-Repeat-Yourself, more readable, and less error-prone? I don't think so.

I understand how harmful goto statements are in general; all of us know that. But there is a very specific, useful niche for them in C. When Dijkstra wrote "GOTO Statement Considered Harmful", even before C was born, people tended to use goto statements everywhere because they were used to assembly jump instructions. But we don't abuse goto statements anymore.

[1]: https://github.com/torvalds/linux/blob/master/net/socket.c#L...


return is ok -- continue and break are also problematic, but at least are isolated in the same context (e.g. function).

goto is ok only for local error cleanup (or auto-generated code, e.g. for parsers etc).


As a C novice, this makes me wonder - wouldn't templates/macros be able to serve a similar purpose to goto statements if the goal is to avoid nested calls but still share code?

EDIT: Or, for that matter, trusting the compiler to inline small-enough function calls?


> wouldn't templates/macros

A gcc extension that gets around macros is nested functions.

> trusting the compiler

99.99% of the time you should just trust the compiler to do the right thing. That 0.01% of the time use a well tested and maintained library.

Hating on goto in C is just cargo cult programming. goto in BASIC and FORTRAN is evil. Though back in the 1960s and 1970s programmers often had no choice: they used evil assembly goto hacks in order to get their programs to fit in memory/disk.

The problem with Dijkstra was that he was an academic mathematician who despised practical programming, where the program needed to run on the hardware available. Hint: back in the 1970s, professors like Dijkstra had unlimited accounts on the school's mainframe, whereas everyone else had extremely limited accounts.


The authors even concede that goto may be useful if not dangerous:

Perhaps goto is too valuable to let go; or letting it go results in code that is more complex than simply using it judiciously.


I saw in a Computerphile video that gotos are not recommended in high level languages because you have the while loop for that.


There seems to be an oddity with their LOC for FreeBSD 2.0. At ~6M LOC, it is roughly 3x the size of the FreeBSD releases just before and just after it. The size steadily creeps up, but we don't see anything else that big until FreeBSD 5, so there seems to be something fishy there.


2.0 was when they revamped the code base because of copyright problems. Check this out:

https://www.freebsd.org/releases/2.0/notes.html


Sure, I initially thought that might be it. E.g., maybe some of the non-x86 support from 4.4BSD-Lite was not pruned out of 2.0. But I checked, and 2.0 only has i386 support.

What strikes me as odd is that the very next minor release (2.0.5) is listed as 2.1M LOC, while 2.0 is listed as 6.1 MLOC. That's a reduction of 66%. Looking at a diff between the 2 releases, I do not see anything to explain that.


Maybe their tool got confused with the forks made around that time to fix the copyright issues? Basically the tool counted the codebase 3 times due to forks? It seems weird that an error like that would make it to publication though.


There is also a strange jump at 6.3.0


This helped me out a lot as a baseline guideline for writing good C code: https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...


So no discussion about any positive evolution of security best practices or static analysis, other than single line remarks?


I have enrolled for CSE in http://www.thapar.edu college this year. Can I join this C programming course with engineering?

I know basics of C and C++ !!


I wonder how many of those LOC are just support for more and more hardware over the years? The nice thing about drivers is that they're almost completely modular so they don't really bloat the code even though they shoot the LOC count through the roof.


The DSTMT/statement density plot is nice to see. Although it was before my time, I've seen early code (or code from very memory-limited machines), and have always been amazed at how tightly packed it seems to be. And remarkably unreadable too.


A really good read! Is there a repository to share papers related to source-code analytics like this one?


Not a repository but many references about source-code analysis focused on software security/safety:

https://www.us-cert.gov/bsi/articles/tools/source-code-analy...


I know this title is not technically clickbait, but it got me.



