How to C in 2016 (matt.sh)
1145 points by fcambus on Jan 8, 2016 | 581 comments



Writing correct code at scale is essentially impossible. We have multiple operating systems, runtimes, libraries, and hardware platforms to worry about without even considering things like random bit flips in RAM or our block devices lying to us with unknown probability.

The best we can do is write simple, understandable code with as few indirections and as little undocumented magic as possible.

Applies very well to all programming.


I strongly disagree that it's impossible to write high-quality code in C for a big project (I don't like his statement: The first rule of C is don't write C if you can avoid it).

I was mostly a C developer (80% C, 20% C++) for 4 years on a big project (I was a backend developer of ICQ Instant Messenger). It has more than 2M lines of C code. Almost all libraries were written from scratch in C (since ICQ is a very old project, much of the essential stuff was written within AOL).

I felt comfortable writing high-level code in C.

When I read the Nginx source code (it's written in C), I see better code quality than in 90% of proprietary commercial software written in nice and comfortable high-level languages.

A good developer is able to write high-quality code in C with minimal bugs on a large-scale project. Regardless of how nice, high-level, and convenient the language is, a bad developer will write c....y code.

You can also check my personal project, which is written entirely in C. (I haven't contributed to my GitHub for 1.5 years since I changed countries, jobs, etc. I hope to return to it when my life becomes stable again.)

P.S. To be clear, I think that C++11 is a great language. But unlike many C++ developers, I love pure C.


This is why we can't have these discussions...everything gets interpreted literally, and it becomes some sort of ego contest. I'll take your claim at face value, that your team wrote 2M lines of C for a commercial project, with no memory corruptions, crashes, integer overflows, etc. That's a shockingly impressive achievement, one that I think has probably never been done before.

But how is that a useful or replicable result, that will work in the real world for other projects? You're basically saying that everyone just needs to be a top 0.001% C programmer. This is the "abstinence-only sex education" of programming advice. It's super simple to not get STDs or have out-of-wedlock pregnancies - just don't have sex with anyone but your spouse, and ensure they do the same. It's 100% guaranteed to work, and if you don't follow that advice you're stupid and deserve whatever bad things happen to you.


> You're basically saying that everyone just needs to be a top 0.001% C programmer.

A more constructive/charitable phrasing of the parent's point might be: you should only be allowed to code in C if you're already a top 0.001% programmer. Then, obviously, all C code will be good—not that there'll be much of it.

That's actually a more interesting point than it seems. Programmers generally tend to reject the idea of guilds and "professional" licensing—mostly because, AFAIK, most programs aren't critical and most crashes don't matter. But what if, instead of licensing all programming, we just licensed only low-level programming? What if, as a rule, everyone who was going to be hired to write code in a language with pointers was union-mandated to be certified as knowing how to safely fiddle with pointers? That could probably split on interesting dimensions, like a bunch of regular coders working in e.g. a Rust codebase, with a certified Low Level Programmer needing to "sign off on" any commit that contained unsafe{} code, where the actual failure of the unsafe{} code would get the Low Level Programmer disbarred.


Not all C developers need to be top 0.001% C developers to write high-performance and rock-solid code. Why?

I'm a very experienced C programmer, and one day my boss came to me and said that the sales guys had already sold a non-existent client-side module to a household-name appliance manufacturer. The deal was inked and it had to be ready in only 3 months. Even worse, it had to run in the Unix kernel of the appliance and therefore be rock solid so as to not take the whole appliance down. It also had to be ultra high performance because the appliance was ultra high performance and very expensive. Now the really bad news: I had a team made up of 3 experienced C developers (including myself) and 3 very inexperienced C developers. We also estimated that coding all the functionality would take at least 4 months. So we added another 4 less experienced C developers (the office didn't have a lot of C developers). The project was completed in time, a success, and almost no bugs were found, and yet many developers without much C experience worked on the project. How?

(a) No dynamic memory allocation was used at run-time and therefore we never had to worry about memory leaks.

(b) Very, very few pointers were used; instead, mainly arrays. And it's not just C developers who understand array syntax, e.g. myarray[i].member = 1 :-) Therefore we never had to worry about invalid pointers.

(c) Source changes could only be committed together with automated tests resulting in 100% code coverage and after peer review. This meant that most bugs were discovered immediately after being created but before being checked in to the source repository. We achieved 100% code coverage with an approx. 1:1 ratio of production C source code to test C source code.

(d) All code was written using pair programming.

(e) Automated performance tests were run on each code commit to immediately spot any new code causing a performance problem.

(f) All code was written from scratch to C89 standards for embedding in the kernel. About a dozen interface functions were identified which allowed us to develop the code in isolation from the appliance, and not have to learn the appliance etc.

(g) There was a debug version of the code littered with assert()s and very verbose logging. Therefore, we never needed to use a traditional debugger. The verbose logging allowed us to debug the multi-core code. Regular developers were not allowed to use mutexes etc. themselves in source code; instead, generic higher-level constructs were used to achieve multi-core execution. My impression is that debugging via sophisticated log files is faster than using a debugger.

(h) We automated the Makefile process so that developers could create new source files and/or libraries on the fly without having to understand make voodoo. C header files were also auto-generated to increase developer productivity.

(i) Naming conventions for folders, files, and C source code were enforced programmatically and by reviews. In this way it was easier for developers to name things and comprehend the code of others.

In essence, we created a kind of 'dumbed down' version of C which was approaching being as easy to code in as a high level scripting language. Developers found themselves empowered to write a lot of code very quickly because they could rely on the automated testing to ensure that they hadn't inadvertently broken something, even other parts of the code that they had little idea about. This only worked well because there was a clear architecture and code skeleton. The rest was like 'painting by numbers' for the majority of developers who had little experience with C.
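
A rough sketch of what that 'dumbed down' style might look like in practice (invented names, not the actual project code): all storage allocated statically up front, records accessed by index rather than by pointer, and range asserts in the debug build.

    /* Hypothetical sketch of the "dumbed down" C style described above:
       no malloc() at run time, arrays and indexes instead of pointers,
       and range asserts in the debug build. All names are invented. */
    #include <assert.h>

    #define MAX_SESSIONS 1024

    struct session {
        int  state;
        long bytes_in;
        long bytes_out;
    };

    /* Allocated once, statically: nothing to free, nothing to leak. */
    static struct session sessions[MAX_SESSIONS];

    void session_count_bytes_in(unsigned i, long n)
    {
        assert(i < MAX_SESSIONS);   /* cheap to check, easy to log  */
        sessions[i].bytes_in += n;  /* array syntax anyone can read */
    }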

The same team went on to develop more C software using the same technique and with great success.


I'm pretty sure you would pass whatever sort of bar exam for a "Low Level Programming" certification they might deign to throw at you. You're the 0.001%. (Although 0.001% might be the wrong number—I'd hazard 0.1%, or maybe even as high as 0.5%.)

Which is to say, your story matches my hypothetical pretty well. You effectively created the same structure as Rust has, where there are safe and unsafe sublanguages, and the "master" developers thoroughly audited any code implemented in the unsafe sublanguage.

Which makes my point: there exist a small set of people qualified to write in "full C", and a much-larger set of people who aren't; and the only way—if you're a person who isn't qualified—to write C that doesn't fall down, is with the guidance and leadership of a person who is.

Though I think maybe your thesis statement agrees with that, so maybe I'm not arguing with you. Not all the developers on a given project need to be from the qualified set, no. But at least some of the developers need to be from the qualified set, and the other developers need to consider the qualified ones' guidance—especially on what C features to use or avoid—to be law for the project. As long as they stay within those guidelines, they're really working within a DSL the "master" developers constructed, not in C. And nobody ever said you can't make an idiot-proof DSL on top of C; just that, if you need everything C offers, your only good option is to remove the idiots. :)


We do agree that the experienced developers are necessary alongside the less experienced developers. At least when the project is getting off the ground and the less experienced developers are still learning the ropes.


Congratulations on the project's success.

Unfortunately your case is the exception that proves the rule.

I never saw a company using C like that.

In my case the teams used to be composed of circa 30 developers, scattered around multiple consulting companies with high attrition.


Thanks! And I have not seen any teams doing this kind of thing either.

Another tidbit: The developer pairs were responsible for writing both the production code and the associated automated tests for the production code. We had no 'QA' / test developers. All code would be reviewed by a third developer prior to check-in. However, at one stage we tried developing all the test code in a high-level scripting language, with the idea that it would be faster to write the tests and need fewer lines of test source code. However, because we did several projects like this, we noticed that there was no advantage to writing tests in a scripting language. The ratio of production C source lines to test source lines was about the same whether the test source code was written in C or a scripting language. Further, there was an advantage to writing the tests in C because they ran much faster. We had some tens of thousands of tests and all of them could compile and run in under two minutes total, and that includes compiling three versions of the sources and running the tests on each one: production, debug, and code coverage builds. Because the entire test cycle was so fast, developers could do 'merciless refactoring'.


As another software developer working in the "embedded systems" industry, I can tell you that using C like that is not unique. The companies I have worked for and several of my friends have worked for all use C similarly. Sometimes we do use the dangerous features of C. When we do, additional reviews are done to ensure the use is both sufficiently justified and properly done.


Do you happen to know the company names where they work?


That's really an awesome achievement, and your post is chock-full of very practical tips. Thank you!


Thanks!

I also forgot to mention performance. Whether you code *foo or the easier-to-comprehend and safer(?) foo[i], the compiler still does an awesome job optimizing. However, it's much easier to assert() that variable i is in range than that *foo is valid. And it's also easier to read variable i in a verbose log (usually a 'human-readable' integer) than to read a pointer in a verbose log (a 'non-human-readable' long hex number). At the time we wrote a lot of network daemons for the cloud and performance tested against other freely available code to ensure that we weren't just re-inventing the wheel. NGINX seemed to be the next fastest, but our 'dumbed down' version of C ran about twice as fast as NGINX according to various benchmarks at the time. Looking back, I think that's because we performance tested our code from the first lines of production code, and there was no chance for even a bit of puppy fat to creep in. Think: 'Look after the cents and the dollars look after themselves' :-) Plus NGINX also has to handle the generic case, whereas we only needed to handle a specific subset of HTTP / HTTPS.
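
As a contrived illustration of that point (invented names): range-checking and logging an index is simple and readable, while doing the same with a raw pointer is clumsier and yields opaque log output.

    /* Contrived comparison (invented names): asserting and logging an index
       versus asserting and logging a raw pointer into the same array. */
    #include <assert.h>
    #include <stdio.h>

    #define N 64
    static int table[N];

    void bump_by_index(unsigned i)
    {
        assert(i < N);                              /* trivial range check */
        fprintf(stderr, "bump i=%u\n", i);          /* human-readable log  */
        table[i]++;
    }

    void bump_by_pointer(int *p)
    {
        assert(p >= table && p < table + N);        /* possible, but clumsier */
        fprintf(stderr, "bump p=%p\n", (void *)p);  /* opaque hex address     */
        (*p)++;
    }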


It was at this point that I realized that the performance of the user-land code in general (including NGINX) could be much better if it wasn't for the legacy network stack in the kernel, which does a bad job at some things such as slow performance accept()ing new sockets, and slower performance because every packet read has to be another system call unnecessarily copying memory; why not use shared memory? That's when I started to get interested in C10M... :-)


Well, there's a trick to writing C++ with no crashes and no memory corruption: never allocate memory using pointers. If you store all your data in stack variables and STL containers, C++ becomes a _much_ more forgiving language (especially if you also avoid templates and operator overloading).

I imagine that this poster and his team took a similar approach with C: avoiding the dangerous features, or just finding ways to factor them out.


Even if you do all the things you said, you can still crash a C++ program. Iterator invalidation is very easy to do by accident, for example.


I'm gonna have to steal this analogy. It's impressive how often these dick-measuring arguments pop up.


Sorry to side-track, but the percentage of people who abstain from sex until marriage or death is far higher than 0.001%, so the comparison isn't really fair.


The "everything gets interpreted literally" though, that the parent wrote, is still spot on.


"out-of-wedlock pregnancies"?

What is this, 1953?

There are a great number of couples choosing to have children in stable, committed - yet unmarried - relationships.


This sums up internet commenting in 2016. Congratulations on entirely missing the point because you chose to focus on an irrelevant and pointless detail, which I'm sure will spawn a stupid subthread that changes no one's mind.


HN needs "collapse whole thread" feature. Badly.


An official solution would be great, but in the meantime there was a thread last year with lots of options:

https://news.ycombinator.com/item?id=8706496


+∞

Way to hijack a useless subthread and do something useful with it. Great meta-point. Honestly. This feature alone makes the comments on Reddit bearable for consumption.


And the parent thread will?


Yeah, that comment is way off-base.

Should be 1967.

</s>


The great thing about threaded textual commenting is that you can have a sub-thread to discuss a side point, without removing the ability to debate the main point in parallel.

The comment struck me as incredibly anachronistic, bordering on offensively so.


> bordering on offensively so.

please don't.


The parent commenter is saying that there is a type of person who believes that everyone should just abstain from sex until marriage, and there should be no out-of-wedlock pregnancies. Those people are as ridiculous as someone who believes that everyone should just write perfectly secure, bug-free C code.


Nice troll


> This is the "abstinence-only sex education

For the record, I don't think you're stupid if you have a child accidentally.

However, once you do accidentally have a child, the results are often not pretty, and I will feel bad watching as you deal with the consequences of your poorly thought out decision.

Regarding programming, I can't see the relationship you make at all.

If you write a good program, well good job. If you write a bad program, you are probably going to write a better one next time. (Ever look back at your old code?)

It takes practice to get good, and the more we work at it, the better we'll get. Just keep trying.


I think defen's point was that if a team produces truly high-quality C code, it's similar to a couple that both abstain until marriage. Yes, it's technically possible, but it's rare, and it's not something that should be the expected outcome (that is, most C code will have some problems, and most couples will have at least a little bit of regrettable sexual history before they're married).

It's as if I say: "Step 1 of writing good code: Don't make mistakes. If you make mistakes, you deserve what's coming to you!" Some mistakes are expected, and essentially inevitable; they're the default, not the exception. Blaming the programmer might seem like the right thing to do, but there's more practical benefit to improving the tools and teaching programmers how to handle errors, so that we can decrease the negative impact that bugs will have.


The point of the analogy is that it is very easy to prescribe behavior that is guaranteed to work if followed 100% to the letter, but fails catastrophically if there is even 1 mistake. There are two responses to this - you can either design systems that are easier to follow & fail less catastrophically in the real world (don't use C for new code if you don't need it; teach people about safe sex practices), OR you can double down on the sermonizing about how if people had just been smarter / had more willpower, they wouldn't be in this bad situation.


Yes, but the analogy isn't accurate, in the sense that when things do fail in a C system, it's probably not that hard to fix.


> However, once you do accidentally have a child, the results are often not pretty... Regarding programming, I can't see the relationship you make at all.

I have two words for you: MAINTENANCE PROGRAMMER.


Ha! Yes, aren't we all.


It's either that or the life of the grasshopper ;)

Now that I think of it, there's another parallel with unwanted pregnancies. There are men that stick around, and there are boys who run and let others clean up after their mess.


Would you let your daughter date a contractor???


Sure, if she was over 25 and fully aware that it is "no strings attached".


I remember seeing someone's C++ code at GDC and thinking to myself, "That looks like good Smalltalk!" Everything was intention revealing. Everything was about objects "sending messages" to each other. I've also seen production Smalltalk that might as well have been FORTRAN from the 70's. (Not one instance method or variable in the whole system. Everything was indexed loops.)

The people and the culture/organization of the shop counts for much more than your language. Language counts for something, but it's not where you get the most bang for your buck, and it shouldn't be your first priority!


Somebody once said that Real Programmers can write FORTRAN code in any language. ;-)


C++ isn't C. They have....different sized standards.

That said you can create message-passing systems in either - Objective-C was originally a C library and cross-compiler.


C++ was originally implemented as a preprocessor for C. Eventually, the preprocessor was replaced by a direct compiler for C++


Sounds interesting, do you recall the name or topic of the talk?


The only possible ways you can believe you can write issue-free large scale projects in C, which don't suffer from all the usual pitfalls of C, are:

* You've never deployed one.

* You've never had to maintain one.

* You wrote the last line of code and handed it over to the maintenance team, who then never gave you feedback.

* You have no customers.

* You have done a never-before-seen formal analysis of a large scale project down to the individual source line level, and made no mistakes with the formal specification.

The author is correct: use C only if you must. There are still an enormous number of reasons why you must use C, but it should be a constraint imposed on you, and not a language you actively want to start a project in.


True.

Parent seems to believe in nginx. But as a guy who actually tried to do a few of those things with nginx I can confirm that nginx's code base is pretty awful and working around all the usual pitfalls of C in nginx is a huge PITA and enormous amount of work.

(same username on github)


> There are still an enormous number of reasons why you must use C, but it should be a constraint imposed on you, and not a language you actively want to start a project in.

It's the same as with, say, goto or #ifdef's - one has to consider the alternatives and make an informed choice. Most of the time, the alternatives are preferable, but sometimes there either aren't any alternatives, or they are even worse.

Which is not to say that C can't be a whole lot of fun. I love it dearly. But I, too, tend to avoid it when I can. It's just too easy to shoot yourself in the foot.


It's what they paid me to do in the 90s. That was a compelling reason to use C.

(of course, one employer appropriately used C as a portable assembler, and respected it as such, for the purposes of developing a cross compiler and tool emulations for an older minicomputer environment; the next was doing (batch) "business applications" and foolishly spent too little on hardware and too much wasted effort on application development)

Pascal (Modula) did much the same, SAFELY, but the money wasn't put into compiler optimization :-(

Interesting that the GNU compiler collection now includes a very efficient Ada compiler.


> Interesting that the GNU compiler collection now includes a very efficient Ada compiler.

GNAT has existed for ... at least ten years, I think. The DOD apparently paid for the development so there would be at least one open source/free software implementation. (According to Wikipedia, development started about twenty years ago, and it was merged into GCC in 2001.)


As a C++ programmer who delved deeply into nginx, I think that your chosen example of "high quality C code" is telling of the state of C programs. It could be one of the best specimens of C code in terms of organization, but absolutely speaking it's one of the best large code-bases I had to deal with:

1. Almost devoid of any documentation.

2. No ADTs - all struct fields are directly accessed, even though there are sometimes very complicated invariants that must be preserved (and these invariants are not documented anywhere, of course).

3. Ultra-short variable names that really describe nothing but their types (it's like Hungarian notation with only the notation part).

4. These mystery variables are all defined at the top of the function of course, making them hard to trace. Does anyone even compile Nginx with C89?

5. Configuration parsing creates lengthy boilerplate that's really hard to track.

6. Asynchronous code is very hard to track, with a myriad of phase-specific meanings for return codes, callbacks and other stuff that makes reading Boost.ASIO code look like a walk in the park.

I think it really shows the sad state of C programming. Die-hard C programmers always complain C++ is hard to read because of its template and OOP abstractions, and it sometimes really is a mess to read, but C is even harder since large programs always re-implement their own half-assed version of OOP and template-macros inside.


> but absolutely speaking it's one of the best large code-bases I had to deal with

You meant "one of the worst"?


Either that, or best example of the worst


Sorry, but I think you're overstating your achievements.

- Was your code unit-tested?

- Did you use the 'strn' functions? (strncpy, instead of strcpy for example)

- Did your code run under Valgrind? (was there valgrind at that time? I'm not sure)

- Was it tested for memory leaks?

- Were there input fuzzing tests?

I agree that there are ways of writing great C code (like nginx, the linux and BSD kernels, etc)

But from a certain point on you're just wasting developer time when you would have a better solution in Java/Python/Ruby, etc with much less developer time and much less chance of bugs and security issues.


The 'strn' functions are wrong, stop using them.

https://randomascii.wordpress.com/2013/04/03/stop-using-strn...


This solution uses templates, which are C++, not C

But if you're using C++, use C++ strings, you don't need strcpy

There's strlcpy mentioned there, which seems like a better C solution


strncpy is fine if you wrap it with a length error check or explicit 0 terminator.
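
For instance, a hypothetical wrapper along those lines (not from any particular codebase) that always terminates the destination and reports truncation:

    /* Hypothetical wrapper of the kind described: always 0-terminates the
       destination and tells the caller whether the source was truncated. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    bool copy_str(char *dst, size_t dst_size, const char *src)
    {
        if (dst_size == 0)
            return false;
        strncpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';        /* explicit terminator */
        return strlen(src) < dst_size;   /* false means the copy was truncated */
    }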


Unit tests? In C? I worked on a system processing millions of dollars a day in real time transactions, written in C++ with lower level libraries written in C. Some of it was written in K&R pre-ANSI C.

There was not one unit test. And there probably still isn't.


C is one of the easier languages to do unit testing in:

1. write a new one file script which parses function declarations from a source code file

2. have the script find all function declarations which start with "utest_", accept no parameters, and return "int" or "bool".

3. have the script write out a new C file which calls all of the matching unit test functions and asserts their return value is equal to 1

4. have your Makefile build and run a unit test binary using code generated by your script for each source code file containing greater than 0 unit tests.

Writing a script to parse function declarations from your C source code is useful because you can extend the script later on to generate documentation, code statistics, or conduct project-specific static analysis.
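
Roughly, the hand-written tests and the generated runner might look like this (a sketch of the scheme described, not the commenter's actual tooling):

    /* Sketch of the scheme described above. Tests are ordinary functions whose
       names start with "utest_", take no parameters, and return int; the
       generator script finds them and emits a main() asserting each returns 1. */
    #include <assert.h>
    #include <string.h>

    /* --- hand-written tests (would normally sit next to the code under test) --- */
    int utest_strlen_of_empty_string_is_zero(void)
    {
        return strlen("") == 0;
    }

    int utest_memcmp_equal_buffers(void)
    {
        return memcmp("abc", "abc", 3) == 0;
    }

    /* --- what the generated runner might look like --- */
    int main(void)
    {
        assert(utest_strlen_of_empty_string_is_zero() == 1);
        assert(utest_memcmp_equal_buffers() == 1);
        return 0;
    }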


> Unit tests? In C?

Why wouldn't you unit test in C? I do, and I have found lots of bugs as a result, and maintaining code is much easier.


I agree. But it seems to happen less than in other languages. I'd say it's not part of the culture.


I don't think this is the boast that you think it is.


It wasn't meant to be a boast.


    A good developer is able to write high-quality code in C...

the sufficiently smart developer is as elusive as the sufficiently smart compiler


It has more than 2M lines of C code. Almost all libraries were written from scratch.

Every professional security researcher reading this just raised an eyebrow and thought to themselves, "That's a vulnerable application." In fact, your team (or a similar one at AOL) did introduce vulnerabilities[1][2]. If we assume that, as you imply, AOL really was exclusively composed of above average C developers writing high quality code, this should really just demonstrate the point for us.

I'm sure your team wrote high quality C code (or at the very least, tried very hard to write high quality C code), and as someone who likes C I also dislike the meme that C should simply not be written, no exceptions. I'd probably change it to, "Don't write C unless you need extreme performance, have great engineering resources, and will generate unmitigated financial returns such as to make any reasonable business risk assessment moot." That means virtually the same thing for anyone reading it, but it sounds a lot less sexy and memorable as far as programming aphorisms go.

I have never professionally audited a C project and not found a vulnerability. Most of my friends in this industry would likely say the same. I've never seen a serious audit of a "large" (100kloc or more) C project by a serious, reputable firm with no vulnerabilities reported. If you wrote 2 million lines of C code, if you wrote libraries from scratch, you have serious vulnerabilities. Otherwise, your engineering practices and resources exceed NASA's in formal verification and correctness.

I believe it is theoretically possible to write 100% safe C code, for some approximation of that ideal that leaves aside the ivory tower definition ("the only safe machine is the one which is off"). But I also believe this is so expensive and resource intensive (secure coding standards, audits, attempts at formal verification and provable correctness) that it's just generally not realistic in practice.

The reason I am saying all of this is not to attack you or your sense of worth as a C developer. Rather, I'd like you to consider that you simply have no idea how many vulnerabilities exist in ICQ. In fact, I found eight different security reports for ICQ, linked at the bottom. One of the more serious ones allowed arbitrary email retrieval, which I'm willing to bet was introduced because your team liked to develop in-house libraries and decided to do that for POP3.

C is a language which is both very powerful and which requires constant vigilance while coding. It is a language which makes pushing a buffer overflow to production on a Friday at 3 pm very easy. I bet your team was comparatively well-versed in C vulnerabilities for the time, but the very fact that you rolled libraries from scratch tells me you already have a high probability of errors showing up. Rolling a library/framework in-house is basically the first thing security researchers look for in an audit, because the third party ones are (usually) more secure due to their exposure. When you further consider the probability of third party software you did use becoming vulnerable in the future, changing compiler optimizations introducing new vulnerabilities and the sheer size of the SEI CERT Secure Coding Standard itself, you are left with a very dangerous risk profile.

We cannot expect C to be a safe language for general use if you need to have the rigor of one of the best research organizations in the world coupled with the top 0.01% of all C programmers. It just isn't feasible.

[1]: http://www.coresecurity.com/content/bevy-of-new-icq-vulnerab...

[2]: https://www.cvedetails.com/vulnerability-list/vendor_id-123/...


> Every professional security researcher reading this just raised an eyebrow and thought to themselves, "That's a vulnerable application."

I think the reason why every seasoned security researcher I've met also happens to be a heavy drinker or is damaged in some way is because they're in a war of attrition. You want perfect security but it will never happen. These are ultimately mutating, register-based machines. We will probably never be certain that, with a sufficient level of abstraction, a program can be written which will never execute an invalid instruction or be manipulated to reveal hidden information.

Where the theory hits the road is where the action happens.

Which is how we end up with this wide spectrum of acceptable tolerances to security. Holistic verification of systems is extremely costly but necessary where human lives matter. However if someone finds a weird side-channel attack in a bitmap parsing library I think we can be more forgiving.

The whole idea that C programs are insecure by default and can never be secure is where theory wants to ignore the harsh realities. We can write languages with tighter constraints on the verification of the programs they create which will lower the risk of most security exploits by huge margins... but we have to trade something away for the benefit. The immediate costs being run-time performance or qualitative things like maintainability.

What I ultimately think will make these poor security researchers feel better is liability. Having a real system and standard in place for professional practices will at least let us soak up the damage and will force us to consider security and scrutinize our code.


> The immediate costs being run-time performance or qualitative things like maintainability.

I don't think that's true. There's no reason why memory safety has to cost either performance or maintainability.

It feels like there's some sort of fundamental dichotomy because we didn't know how to do it in 1980, and we're still using languages from 1980, but our knowledge has advanced since then.


The sad fact is that the languages from the '60s and '70s happened to be safer than C.


I probably shouldn't get in on this heated discussion ... but C++ aims to be a safer (higher level) C, at no runtime cost. I think it mostly succeeds at this.

I think C++ can only get so far without deprecating some of C. Or, at least, C-style code should come with a big warning from the compiler:

"You are using an unsafe language feature. Are you absolutely sure there is no bug here, and there are no way to use safe language features instead?"

In libraries there might be cases where C-style code like manual pointer arithmetic, C arrays, manual memory management, C-style casts, void* pointers, uninitialized variables, etc. is necessary. But in user code there are most often safer replacements.


> but C++ aims to be a safer (higher level) C, at no runtime cost. I think it mostly succeeds at this.

No, it doesn't succeed at this at all. Large C++ codebases routinely suffer from the same problems that C does.

For proof, search the CVE lists for browser vulnerabilities.


Actually there do exist provably safe programs. The problem is making that usable to the industry wide audience.


There are no such provably safe programs. I assume that you are thinking about programs that have been verified by formal proof systems.

They are not proven to be safe, they are proven to adhere to whatever the proof system managed to prove. This shifts the possible vulnerabilities away from the actual code and onto the proof system and the assumptions that the proof system makes.

For example, take a C program that was proven to be absolutely free of buffer overflows. And therefore labelled by the proof system as 'secure'. But, unbeknownst to the proof system, the C program also interprets user-supplied input as a format string! So it's still vulnerable to format string exploits.
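
The classic shape of that bug, for what it's worth:

    /* The classic shape of the bug being described: passing user input as the
       format string lets "%n", "%s", etc. in the input drive printf itself. */
    #include <stdio.h>

    void log_name(const char *user_input)
    {
        printf(user_input);        /* vulnerable: input is the format string */
        printf("%s", user_input);  /* safe: input is treated only as data    */
    }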

Correct proof systems probably add a huge margin of security compared with the current security standards, but it's not absolute.



Which is executed on, say, an early-stepping CPU that has a vulnerability.


Safe software would be a huge accomplishment towards safe systems.


Can you elaborate on "damaged"?


I'd rather not as the details are not important.

There are all kinds of people doing security research. In my experience with some of those people that I've met, it seems like the challenges they have to contend with are not unrelated to the stress caused by the work that they do.

Sort of like how someone who works with giant metal stamping presses is likely to be missing a couple of fingers if they've worked long enough with them (and in an environment where safety regulations are too relaxed).

These are also some of the wonderful people in my life and I enjoy them very much.

But you have to take care of yourself!


"Don't write C unless you need extreme performance,

This is the wrong mindset and has been the guiding principle behind over a generation of poor programming discipline.

You might be able to rationalize that you don't always need extreme performance, but it's not about performance, it's about efficiency. Efficiency always matters. We live on a small planet with finite resources. Today, the majority of computing is happening with handheld portable devices with tiny batteries. Every wasted byte and CPU cycle costs power, runtime, and usability.

Even if you're coding for a desktop PC or a server application - every bit of waste costs electricity, generates heat, increases cooling load. Every bit of waste costs the user real money, even if they don't see the time that you wasted for them.

Performance is a side-effect. The only thing that matters is efficiency, and it matters all the time.

Most of today's programming problems come down to laziness and fear of "premature optimization" (which is obviously bunk). It is possible to write perfect C code that runs correctly the first time and every time. You do that by planning ahead. Formal specs are part of this.

The same consideration applies to writing, in general. I witnessed this first hand, when I worked at the computer labs at the University of Michigan back in the 1980s. When all anyone had was an electric typewriter, they planned their papers out well in advance - detailed outlines, with logically designed introductions and conclusions. This was a requirement, because once you got into actually typing, correcting a mistake could be expensive or impossible without starting over from scratch.

When we only had Wordstar available, and users came in with the inevitable questions and requests for help, you would still see well-organized papers written with a clear logical flow.

In 1985 with the advent of the Macintosh and MacWrite, all of that changed: People discovered that they could edit at will, copy and paste and move text around on the fly. And so they did - throwing planning out the window. And the quality of papers we saw at the help desk plummeted. Surveys from those years confirmed it too - paradoxically, the writing scores in the better-funded departments dropped rapidly, commensurate with their rate of introduction of Macs.

Planning ahead, making sure you know what you need to write, before you start writing any actual lines of code, prevents most problems from ever happening. It's a pretty simple discipline, but one that's lost on most people today.


> Most of today's programming problems come down to laziness and fear of "premature optimization" (which is obviously bunk). It is possible to write perfect C code that runs correctly the first time and every time. You do that by planning ahead. Formal specs are part of this.

If "planning ahead" were enough to prevent security vulnerabilities in practice, someone would have done it by now. Instead, we've had C for upwards of 30 years and everyone, "bad programmers", "good programmers" and "10xers" alike, keeps introducing the same vulnerabilities.

It's a nice theory, but we've been trying for over 30 years to make it work, and it keeps failing again and again. I think it's time to admit that C as a secure language (modulo the extremely expensive formal coding practices used in avionics and whatnot) has failed.


Sorry, but your statement makes no sense. That's like saying "we've been trying for over 5000 years to make hammers work, and people keep smashing their thumbs and fingers again and again. I think it's time to admit that hammers as a safe tool have failed."

Security is not a property of the tool, it is a property of the tool's wielder.


> Security is not a property of the tool, it is a property of the tool's wielder.

How many memory safety bugs do you see in Java applications compared to C applications?


If Java memory safety is so good, why does NullPointerException still exist in the language?


I'm not talking about NPE (though I would be happy to elaborate on how not having null pointers also reduces crashes in the wild). I'm talking about exploitable and/or non-type-safe memory safety problems.

Writing in Java is an effective way to reduce remote code execution vulnerabilities over writing in C. Statistics have shown this over and over again.


The very fact that NPE exists, in a language which doesn't have pointers, running on a JVM that doesn't have pointers, is evidence that the entire model of "doesn't have pointers" is a sham. The reality of computer architecture is that data resides in memory, memory is an array of bytes, and bytes have addresses (i.e., pointers) and denying this fact is impossible. Even in a pure pointerless system like Java. And if there are pointers in the system, somewhere, anywhere, they are exploitable.

Remote code execution vulnerabilities in C are less a feature of the language syntax and mostly due to the abysmally poor C library, which is regrettably part of "C The Language" - but no one forces you to use it - there are better libraries. Also, it's an artifact of implementation - just as Java's supposed immunity is an artifact of implementation, not an inherent feature of the language syntax.

Case in point http://www.trendmicro.com/vinfo/us/threat-encyclopedia/vulne...

As for RCEs in C code - these would be easily avoided if the implementation used two separate stacks - one for function parameters, and one for return addresses. If user code can only overwrite passive data, stack smashing attacks would be totally ineffective. Again - there's nothing in the C language specification that dictates or defines this particular vulnerability. It is solely an internal implementation decision.
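
For reference, the single-stack pattern being alluded to looks something like this (illustrative only):

    /* Illustrative only: buf and the saved return address live on the same
       stack, so an oversized input can walk past buf and overwrite the
       return address on typical single-stack ABIs. */
    #include <string.h>

    void handle_request(const char *input)
    {
        char buf[16];
        strcpy(buf, input);   /* no bounds check: the classic smashing setup */
    }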


> As for RCEs in C code - these would be easily avoided if the implementation used two separate stacks - one for function parameters, and one for return addresses.

No. Increasingly, RCEs are due to vtable confusion resulting from use after free.


Sounds like you're the one with confusion; vtables are not a feature of C. C++, sure, but that's a different conversation. Your point is invalid.


References are pointers.


The trouble here is that the efficiency you hold up so highly is marginal in most cases. Making the computer do work to make human lives easier isn't a waste of resources, it's the only reason computers exist in the first place.


Totally agree with your latter statement. But the overall assertion still leads people down misguided paths. Modern OSs like MacOS, Windows, and desktop Linux are presumably intended to make using computers easier. Each new release adds reams of new code implementing new features intended to improve ease of use. Yet the net effect is often negative, because the sheer amount of code added makes the overall system significantly slower.

Meanwhile, new programming languages abound, presumably to make programmers' lives easier. But these developments often completely lose sight of the users. Python is easier to write, but a program written in python uses 1000x the CPU and memory resources of a program written in C. It does the job, but slowly, and consuming so much that the computer is unable to do anything else productive. Meanwhile, it's questionable whether programmers' lives have been improved either, because they spend too much time chasing the next new tool instead of growing expertise in just one.

As for marginal efficiency gains - that is only a symptom of the latter developer problem. E.g., LMDB is pure C, less than 8KLOCs, and is orders of magnitude faster than every other embedded database engine out there. It is also 100% crash-proof and offers a plethora of features that other DB engines lack. All while being able to execute entirely within a CPU's L1 cache. The difference is more than "marginal" - and that difference comes from years of practice with C. You will never gain that kind of experience by being a dilettante and messing with the flavor of the week.


Python is a scripting language, it was never meant to write full blown applications.


> > It has more than 2M lines of C code. Almost all libraries were written from scratch.

> Every professional security researcher reading this just raised an eyebrow and thought to themselves, "That's a vulnerable application."

That may be true, but substitute "C" with "Python" and wouldn't you have the same reaction? Maybe you would expect it to be slightly less vulnerable, but the key risk that I read in that statement (I am not a security professional) is the "2M" and the "written from scratch". The risk from "C" is secondary.


I would think a 2M line Python application would suffer logic bugs, like the C version. However, I wouldn't worry about indexing off the end of arrays, NULL pointer accesses, etc. which occur in C (sometimes silently) on top of the logic errors.


Less vulnerable is still an improvement.


Not if it doesn't work anymore.


"I have never professionally audited a C project and not found a vulnerability."

Just out of interest: How many C projects have you audited? And have you ever looked for a relationship between the number of vulnerabilities and the percentage code coverage from automated tests?


Granted, I would say nginx, and perhaps Redis, are quite exceptional that way. I use them as examples of nice, clear, clean C code.

A good programmer will make COBOL look clean and understandable. But on average, I would say C code requires a lot more discipline, knowledge, and experience to produce quality code.


> I don't like his statement: The first rule of C is don't write C if you can avoid it

I wanted to interpret it as you shouldn't write code if you can avoid it.

I'm certain that's not what he meant, but it sounds better.


Similar to - the best line of code is the one you didn't need to write. Naturally that applies to all programming languages, not just C.


The first rule of C is don't write C if you can avoid it.

I interpreted that differently than you. I didn't take it to mean use another language, but to keep C code as concise as possible and to not reinvent the wheel when a good library is available.


There may be individual exceptions, but the key point, "write simple, understandable code", is the right thing to do for most people and cases.

I like tweaking younger engineers with C eccentricities, but any commercial code tends towards being as vanilla as possible with lots of error checking and docs to be as clear as possible.


The problem is the enterprises that don't care about the quality of the C developers they hire, with big teams, high attrition and rotation of outsourcing companies.

I have seen quite a few of those and I imagine you wouldn't enjoy auditing their code.


I just wanted to comment to your point about loving pure C over C++ and say I feel the same way.


No, and to claim as much is simple laziness. It is possible to write provably correct code. Even if you don't want to go full formally verified, you can eliminate large categories of possible errors with a relatively small amount of proof.

None of which removes the need to write simple, understandable code without undocumented magic. But we should not be throwing up our hands and treating invalid accesses, memory safety failures, buffer overflows, SQL injection, XSS, or a whole host of common failures as inevitable. It really is possible to do a lot better than most programs and a lot better than C.


I interpreted the comment a bit more generously than you did. You're absolutely correct that it's possible, and I'll go as far as to say even desirable, to write provably correct code. However, I've never seen a demonstration of provably correct hardware. I interpreted the comment as a reminder that even binary is, ultimately, a leaky abstraction.

I used to work in a facility where they'd test systems for radiation hardness. Get the system running, then shoot a proton beam travelling at a significant fraction of the speed of light directly at the processor. It doesn't matter if the code was proven correct by Dijkstra himself, it will eventually manifest bugs under these conditions.


I don't think anyone proves their hardware from basic physics[1], but hardware comes with specs and tolerances. If your software needs to operate in a high-radiation environment, you buy appropriate hardware and potentially take other countermeasures (e.g. multiple processors executing in sync).

[1] practically we resort to empirical testing to confirm that hardware conforms to its spec. But as your sibling says, that's no excuse not to do better in the places where we can.


EAL7 data diodes with optocouplers and EMSEC do. Oh, snap, you didn't see that coming! :P


Perhaps don't put your system in the path of a proton beam? I've had a large test suite that consistently used 8GB running for three years without ECC RAM. Any bit flip would have been caught by the test suite. My conclusion is that bit flips are entirely exaggerated (probably by the hardware industry for obvious reasons).


Do you have the suite published? The frequency of errors depends on how much memory and in which conditions it is being used. Datacenters, supercomputers, even personal workstations used for scientific calculations benefit from error correction. Check this out: http://techcrunch.com/2009/11/02/new-study-proves-that-ecc-m...


Downvoted again by mindless mainstream. Have another go, perhaps you'll reach -10.


Wow, after 7 days someone has bothered to downvote this. Here, you have another chance.


Usually comments that complain about downvotes are downvoted. Have another go, perhaps you'll reach -20. If you persist, you will be eminently qualified as a Wikipedia admin.


John Regehr observed a proven-correct C compiler generating incorrect code. The reason? An error in a system header file. The truth of the matter is that, for most software systems, formally proving every single piece of the system to be correct is too expensive. Perhaps as the rate of change in the software industry inevitably slows this will change, since slower change will increase re-usability, but at the moment it is the sad state of things.


> It is possible to write provably correct code.

... and then someone finds a bug in the specification which is provably correctly implemented.

Using gets() to read the input can be provably correct. All you have to do is remove any requirements for security from the program's specification, and specify that the program will always be used in circumstances when the expected input datum will be less than a certain length.

... and then the hardware is buggy or failing. That RAM bit flips and you're screwed. The machine as a whole isn't provably correct.


> ... and then someone finds a bug in the specification which is provably correctly implemented.

As a programmer, if the specification is wrong, it's Not My Fault (tm). If you want me to write specifications, well, pay me to do it! I guarantee dramatically better results than the usual crap.

> The machine as a whole isn't provably correct.

As a programmer, defects in the machine are Not My Problem (tm). In any case, hardware vendors have a much better track record than software developers at delivering products that meet strict quality standards.


Unfortunately, even if it's not your fault or problem, there is a fault or problem.

We're no longer discussing correctness, but assignment of blame.


Assignment of blame is important, because it determines what component needs to be fixed to solve the problem. It's all about correctness in the end.

As an aside, I want to clarify that I'm not advocating being a bad team player. The point to assigning blame isn't demonizing the developer of the faulty component, or reducing cooperation between developers of different components to the bare minimum. The point is just reducing the cost of fixing the problem.


Assignment of blame is important, but it is often an opinion.

If one program misuses another, usually the fix can be an alteration in either one or both. The used program can be expanded to accept the "misuse" which is actually legitimate but not in the requirements. Or the using program can be altered not to generate the misuse.

Sometimes what is fixed is chosen for non-technical reasons, like: the best place to fix it it is in such and such code, but ... it's too widely used to touch / not controlled by our group and we need a fix now / closed source and not getting fixed by the vendor {until next year|EVER} / ...


> Assignment of blame is important, but it is often an opinion.

When in doubt, ask the specification. If there was no specification, why was any code written at all?

> If one program misuses another, usually the fix can be an alteration in either one or both.

Actually, there are three possibilities:

(0) The used program doesn't comply with its specification.

(1) The using program doesn't comply with its specification.

(2) The specifications of both programs are mutually inconsistent. In which case, it's a programming mistake to connect both programs.

> The used program can be expanded to accept the "misuse" which is actually legitimate but not in the requirements.

If it isn't in the specification, it isn't legitimate, period.

> Or the using program can be altered not to generate the misuse.

Sure.


"If it isn't in the specification, it isn't legitimate, period."

You have some goal you're trying to achieve, or why was a specification written at all? If you find that the existing specification is not the optimal path to that goal, you may very well want to change the specification. In which case, it's entirely legitimate.


Anyway, requirements, specifications and programs are just multiple representations of the same thing. Ideally, we would just state some requirements and they would execute.

So the requirements/specification/program complex exists Just Because. Humans decided they want that: for their amusement, for the sake of supporting some enterprise or solving a problem, or to try to sell to other humans.

The complex contains several different representations because that's what it takes to bridge the gap between stating the requirements and making the machine carry them out.

A proof is something internal to that complex: that the specification corresponds to the requirements, and that the program implements the specification.

If we had just one artifact: a specification that executes, then there would be no concept of proof any more.

There is only the question whether the specification that was expressed is the one that was intended in the mind. That equivalence is no more susceptible to proof than, say, the correspondence between the Caesar salad that the waiter just put on your table (and which you clearly specified) and the idea of whether you actually wanted one, or did you mistakenly say "Caesar salad" in spite of having wanted a soup.

Plus, there is the external question of whether a specification has unintended consequences. That basically amounts to "you say you want that, but maybe you should revise what you want because of these bad things".

"You say you want plaintext passwords to be stored for easier recovery, and that can certainly be implemented (provably correctly, too) but consider the following ramifications ..."


> If we had just one artifact: a specification that executes, then there would be no concept of proof any more.

It is possible for an executable functional specification to have unbearably bad performance for production use. In that case, the programmer has to reimplement the program in a more efficient (but perhaps less obvious) way and supply a proof that the reimplementation is functionally equivalent to the original executable specification.

Proofs aren't going away anytime soon.

> There is only the question whether the specification that was expressed is the one that was intended in the mind.

If other people can't bother communicating what they really want, that is absolutely Not My Problem (tm).

> Plus, there is the external question of whether a specification has unintended consequences.

If other people can't analyze the logical consequences of what they wish for, that is absolutely Not My Problem (tm). The most I can do is point at contradictions in the requirements.


Depending on where you go that Caesar salad can be wildly different and it probably is different from what you had in your mind... just saying...


It is only legitimate if you change the specification first.


I actually think this depends quite a bit on context, unless you have a carefully particular definition of some of those words in mind.


Specifications can have bugs. The root cause of a software defect can sometimes be traced to a bad specification (which was followed).


The customer signed the specification.


This is tremendously context specific.


Maybe cosmic rays give you a lower bound on the defect rate (though you can certainly measure and account for them). But I have never seen a complete system in which cosmic ray errors were anything more than the tiniest of undetectable noise, drowned out by the vast weight of software errors.


Cosmic rays are no excuse not to throw an exception on integer overflow.


If, for some inputs, the program is supposed to calculate the value 42, but it throws an integer overflow, then it is not correct.

Detecting overflows and throwing nice exceptions instead of continuing execution with garbage values isn't the same thing as ensuring correctness.
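
For what it's worth, detecting the overflow at all is cheap on modern toolchains; a minimal sketch using a GCC/Clang builtin (the wrapper name is invented), with the caveat above that detection alone is not correctness:

    /* Minimal sketch of overflow detection using a GCC/Clang builtin.
       The wrapper name checked_add is invented. Detecting the overflow only
       tells you the computation failed; it does not make the result correct. */
    #include <stdbool.h>

    bool checked_add(int a, int b, int *result)
    {
        /* __builtin_add_overflow returns true if the addition overflowed. */
        return !__builtin_add_overflow(a, b, result);
    }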


Is it really the case that some programming languages are more resilient to hardware errors than others?

Java does array bounds checking in principle, but it is usually optimized out of the machine code. Isn't it then virtually just as susceptible to hardware error as C? And the JVM is written in C anyway.


The JVM is written in C, but that C has been subject to some formal analysis. And even if it hadn't, I'd bet the defect rate of a Java program running on a C JVM would be much much lower than that of a C program.


But, in keeping with this topic, that C has not been subject to formal analysis from the angle of defending against rays doing random things to the CPU and RAM or other hardware errors.


So let's stop unit testing all of our code, since it's provably incorrect anyways, due to cosmic rays flipping random bits of RAM once in every gazillion calculations.


> And the JVM is written in C anyway.

Which one?

https://en.wikipedia.org/wiki/List_of_Java_virtual_machines

Depending which one we are speaking about, they are implemented in Assembly, C, C++ or even Java.


This has been the subject of some papers, like

https://www.cs.princeton.edu/~appel/papers/memerr.pdf [2003]


right "the best we can do" is actually pretty damned good, if anyone would bother doing it.


I bother doing it, but it still ends up as "the best they want to pay us for", which is usually less than the best.


That reminds me of the old G.K. Chesterton quote, which was originally about religion.

Provable correctness "...has not been tried and found wanting; it has been found difficult and not tried."


What is the largest program that has been proven correct?

We're writing programs that are tens of millions of lines of code (OSes, databases, air traffic control systems). My impression of provable correctness is that the largest program that has been proven is in the area of 100,000 lines (feel free to correct if I am in error). We need to be able to prove programs two orders of magnitude bigger than that; provable correctness has not been tried for such programs because nobody wants to wait a generation or two before they get the results.


Sort of. We haven't proven a single thing beyond a certain size and complexity. That was true when it started back in Orange Book days, even when Dijkstra came up with the stuff and the method of dealing with it: composition with preconditions, postconditions, and invariants.

You code in a functional style and/or with composed state machines. They composed things hierarchically and loop-free where possible to facilitate analysis. Each component is small enough to analyze for all success or failure states. The results of each become "verification conditions" for things that build on it. With such methods, one can build things as large as you want, so long as each component (primitive or composite) can be analyzed.
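
A crude sketch of the precondition/postcondition idea in plain C (mine, not the system described above): assert() is only a runtime check, not a proof, but it documents the verification conditions that callers and callees must establish.

    #include <assert.h>
    #include <stddef.h>

    /* Precondition:  buf != NULL and len > 0
       Postcondition: returned index is < len and points at a maximal element */
    static size_t index_of_max(const int *buf, size_t len) {
        assert(buf != NULL && len > 0);   /* precondition */
        size_t best = 0;
        for (size_t i = 1; i < len; i++) {
            if (buf[i] > buf[best])
                best = i;
        }
        assert(best < len);               /* (part of the) postcondition */
        return best;
    }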

It's not clear at what point this breaks down. I used it for drivers, protocols, algorithms, systems, networks, and so on. I brute-forced it with more manual analysis because I'm not mathematical or trained enough for proofs. However, what holds it back on large things is usually just lack of effort, as far as I can tell.


So maybe 1) we shouldn't write systems that are tens of millions of anarchically intercoupled lines of code, or 2) we should write 100 subsystems of 100,000 lines each and link them together in a rigorous way.

The key is then to simply avoid emergent interactions between modules/subsystems. That's pretty doable.

What doesn't happen is people reasoning about the cost of defects. Even if they do some reasoning about this, they don't apply it to other components of the system.


Exactly what I said above. Verifying individual components and their straightforward composition is how it was done in the past. People just ignore that.


I heard you. I just still fail to believe it. :)


Y'know, a lot of the best, most robust and mature software out there is a generation old... Maybe that's just how long it takes to get large-scale software right.


All you do by proving code correct is shift any errors from the code to the proof.


The discussion is about C, so I would assume the post you're responding to was in that scope as well. Leading with an insult isn't likely to win any hearts or minds either.


> It is possible to write provably correct code.

How many modern programmers hold this mistaken belief? It's false -- the Turing Halting problem (https://en.wikipedia.org/wiki/Halting_problem) demonstrates that one cannot establish that a non-trivial program is "provably correct."

It seems the OP got carried away with his rhetoric, and the way he put it is simply wrong.


That is a very common but incorrect misinterpretation of the halting problem.

The halting problem says that there exist programs that cannot be proven correct; it says that for any property you want to prove computationally, no program can give you an answer for every last program. It doesn't say anything about "non-trivial" programs. In particular, if you write programs with awareness of the halting problem in mind, you can write extremely complicated programs that are provably correct.

One straightforward way to do this is with a language that always halts. The proof assistant Coq is based on such a language, and it's powerful enough to write a C compiler: http://compcert.inria.fr/

It is certainly correct that you can't stuff an arbitrarily vexatious program through a proof assistant and get anything useful on the other side. But nobody is interested in proving things about arbitrarily vexatious programs; they're interested in proving things about programs that cooperate with the proof process. If your program doesn't want to cooperate, you don't need an exact answer; you might as well just say "No, I don't care, I'm not trusting you."


Right. This is a bogus complaint about proof of correctness.

The way you prove halting is to find some integer measure for each loop (like "iterations remaining" or "error being reduced") which decreases on each iteration but does not go negative. If you can't find such a measure, your program is probably broken. (See page 12 of [1], the MEASURE statement, for a system I built 30 years ago which did this.)

The measure has to be an integer, not a real number, to avoid Zeno's paradox type problems. (1, 1/2, 1/4, 1/8, 1/16 ...)

On some floating point problems, with algorithms that are supposed to converge, proving termination can be tough. You can always add an iteration counter and limit the number of iterations. Now you can show termination.
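
To make the "integer measure" idea concrete, here's a minimal sketch in C; the assert plays the role of the MEASURE statement (a real verifier discharges this statically rather than at runtime):

    #include <assert.h>

    /* The measure is "b": a non-negative integer that strictly decreases on
       every iteration, so the loop must terminate. */
    unsigned gcd(unsigned a, unsigned b) {
        while (b != 0) {
            unsigned measure_before = b;
            unsigned r = a % b;          /* 0 <= r < b */
            a = b;
            b = r;
            assert(b < measure_before);  /* the measure strictly decreased */
        }
        return a;
    }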

Microsoft's Static Driver Verifier is able to prove correctness, in the sense of "not doing anything that will crash the kernel", about 96% of the time. About 4% of the time, it gives up after running for a while. If your kernel driver is anywhere near close to undecidability, there's something seriously wrong with it.

In practice, this is not one of the more difficult areas in program verification.

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf


Pascal-F is interesting and not in my collection. What were the most useful or interesting things you did with it? And does it have any use today in modern Pascal-like languages (maybe the Modula-2 revision)?


> It doesn't say anything about "non-trivial" programs.

No, but its source does. The source for the Turing Halting problem is Gödel's incompleteness theorems, which do specify that the entities to which it applies are non-trivial. The Turing Halting problem, and Gödel's incompleteness theorems, are deeply connected.

> One straightforward way to do this is with a language that always halts.

That doesn't avoid the halting problem, it only changes the reason for a halt. And it cannot guarantee that that program will (or won't) halt, if you follow -- only its interpreter.

> In particular, if you write programs with awareness of the halting problem in mind, you can write extremely complicated programs that are provably correct.

You need to review the Turing Halting problem. No non-trivial computer program can be proven correct -- that's the meaning of the problem. It cannot be waved away, any more than Gödel's incompleteness theorems can be waved away, and for the same reason.


> The source for the Turing Halting problem is Gödel's incompleteness theorems, which do specify that the entities to which it applies are non-trivial. The Turing Halting problem, and Gödel's incompleteness theorems, are deeply connected.

Gödel doesn't say anything about "non-trivial" any more than Turing does. His first theorem is a "there exists at least one statement" thing, just like the halting problem. Like the halting problem, the proof is a proof by construction, but that statement is not one that anyone would a priori care to prove, just like the program "If I halt, run forever" is not a program that anyone would particularly want to run. So while yes, they're deeply connected, from the point of view of proving actual systems correct, it's not clear they matter. And neither theorem says anything about "non-trivial" statements/programs (just about non-trivial formalizations and languages).

Gödel's second theorem is super relevant, though: it says that any consistent system cannot prove its own consistency. But that's fine. Coq itself is implemented in a Turing-complete language (ML), not in Coq. It would be nice if we could verify Coq in Coq itself, but since Gödel said we can't, we move on with life, and we just verify other things besides proof assistants. (And it seems to be worthwhile to prove things correct in a proof assistant, even without a computer-checked proof of that proof assistant's own correctness.)

This is not waving away the problem, it's acknowledging it and working within the limits of the problem to do as much as possible. And while that's not everything, it turns out a lot of stuff of practical importance -- perhaps everything of practical importance besides proving the system complete in its own language -- is possible.


> Gödel doesn't say anything about "non-trivial" any more than Turing does.

Source: https://en.wikipedia.org/wiki/Halting_problem

Quote: "Rice's theorem generalizes the theorem that the halting problem is unsolvable. It states that for any non-trivial property, there is no general decision procedure that, for all programs, decides whether the partial function implemented by the input program has that property. (A partial function is a function which may not always produce a result, and so is used to model programs, which can either produce results or fail to halt.) For example, the property "halt for the input 0" is undecidable. Here, "non-trivial" means that the set of partial functions that satisfy the property is neither the empty set nor the set of all partial functions."

Source: https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...

Quote: "Gödel's incompleteness theorems are two theorems of mathematical logic that establish inherent limitations of all but the most trivial axiomatic systems capable of doing arithmetic."

> It would be nice if we could verify Coq in Coq itself ...

I was planning to bring this issue up, but you've done it for me. The simplest explanation of these two related theorems involves the issue of self-reference. Imagine there is a library, and a very conscientious librarian wants to have a provably correct account of the library's contents. The librarian therefore compiles an index of all the works in the library and declares victory. Then Gödel shows up and says, "On which shelf shall we put the index?"

In the same way, and for the same reason, non-trivial logical systems cannot check themselves, computer languages cannot validate their products (or themselves), and compilers cannot prove their own results correct.

> And it seems to be worthwhile to prove things correct in a proof assistant, even without a computer-checked proof of that proof assistant's own correctness.

Yes, unless someone is misled into thinking this means the checked code has been proven correct.

> This is not waving away the problem ...

When someone declares that a piece of code has been proven correct, in point of fact, yes, it is.


> for any non-trivial property, there is no general decision procedure that, for all programs

> all but the most trivial axiomatic systems

This is about non-trivial properties on arbitrary programs, and non-trivial axiomatic systems proving arbitrary statements. It is not about non-trivial programs or non-trivial statements.

Rice's theorem states that, for all non-trivial properties P, there exists a program S where you cannot compute P(S). (The Wikipedia phrasing puts the negative in a different spot, but this is de Morgan's law for quantifiers: ∀P ¬∀S P(S) is computable = ∀P ∃S ¬ P(S) is computable.)

Gödel's first theorem is that, for all non-trivial formalizations P, there exists a statement S where you cannot prove S in P.

You are claiming, for all non-trivial properties P, for all non-trivial programs S, you cannot compute P(S); for all non-trivial formalizations P, for all non-trivial statements S, you cannot prove S in P.

This is not what Rice's or Gödel's theorems are, and not what the texts you quoted say.

> Then Gödel shows up and says, "On which shelf shall we put the index?"

You keep it outside the library. The index tells you where every single book is, other than the index.

In particular, Gödel claims that, if the index has any chance of being correct, it must be outside the library. The fact that Coq cannot be proven in Coq does not decrease our confidence that it is correct; in fact, Gödel's second theorem means that if it could be proven in itself, we'd be confident it was wrong!


>> Then Gödel shows up and says, "On which shelf shall we put the index?"

> You keep it outside the library. The index tells you where every single book is, other than the index.

This approach would have saved Russell and Whitehead's "Principia Mathematica" project and placed mathematics on a firm foundation, but it suffers from the same defect of self-reference -- it places the index beyond validation. Russell realized this, and recognized that Gödel's results invalidated his project.

> The fact that Coq cannot be proven in Coq does not decrease our confidence that it is correct;

Yes, true, but for the reason that we have no such confidence (and can have no such confidence). We cannot make that assumption, because the Turing Halting problem and its theoretical basis prevents it.

Source: http://www.sscc.edu/home/jdavidso/math/goedel.html

Quote: "Gödel hammered the final nail home by being able to demonstrate that any consistent system of axioms would contain theorems which could not be proven. Even if new axioms were created to handle these situations this would, of necessity, create new theorems which could not be proven. There's just no way to beat the system. Thus he not only demolished the basis for Russell and Whitehead's work--that all mathematical theorems can be proven with a consistent set of axioms--but he also showed that no such system could be created." (emphasis added)


> Russell realized this, and recognized that Gödel's results invalidated his project.

One particular project. The specific project of formalizing mathematics within itself is unachievable. That doesn't mean we've stopped doing math.

And this is in fact the approach modern mathematics has taken. The fact that one particular book (the proof of mathematics' own consistency) can't be found on the shelves doesn't mean there isn't value in keeping the rest of the books well-organized.

> Yes, true, but for the reason that we have no such confidence (and can have no such confidence). We cannot make that assumption, because the Turing Halting problem and its theoretical basis prevents it.

No! Humans aren't developed in either Coq or ML. A human can look at Coq and say "Yes, this is correct" or "No, this is wrong" just fine, without running afoul of the halting problem, Gödel's incompleteness theorem, or anything else. The fact that you can have no Coq-proven confidence in Coq's correctness does not mean that you can have no confidence of Coq's correctness.

(Of course, this brings up whether a human should be confident in their own judgments, but that's a completely unavoidable problem and rapidly moving to the realm of philosophy. If you are arguing that a human should not believe anything they can convince themselves of, well, okay, but I'm not sure how you plan to live life.)

This is like claiming that, because ZF isn't provably correct in ZF (which was what Russell, Whitehead, Gödel, etc. were concerned about -- not software engineering), mathematicians have no business claiming they proved anything. Of course they've proven things just fine if they're able to convince other human mathematicians that they've proven things.


>> Russell realized this, and recognized that Gödel's results invalidated his project.

> One particular project.

A mathematical proof applies to all cases, not just one. Gödel's result addressed Russell's project, but it applies universally, as do all valid mathematical results. Many mathematicians regard Gödel's result as the most important mathematical finding of the 20th century.

> That doesn't mean we've stopped doing math.

Not the topic.

> A human can look at Coq and say "Yes, this is correct" or "No, this is wrong" just fine, without running afoul of the halting problem, Gödel's incompleteness theorem, or anything else.

This is false. If the Halting problem is insoluble, it is also insoluble to a human, and perhaps more so, given our tendency to ignore strictly logical reasoning. It places firm limits on what we can claim to have proven. Surely you aren't asserting that human thought transcends the limits of logic and mathematics?

> Of course they've proven things just fine if they're able to convince other human mathematicians that they've proven things.

Yes, with the caution that the system they're using (and any such system) has well-established limitations as to what can be proven. You may not be aware that some problems have been located that validate Gödel's result by being ... how shall I put this ... provably unprovable. The Turing Halting problem is only one of them, there are others.


> A mathematical proof applies to all cases, not just one. Gödel's result addressed Russell's project, but it applies universally, as do all valid mathematical results.

What? I have no idea what you mean by "applies universally" and "all valid mathematical results."

A straightforward reading of that sentence implies that Pythagoras' Theorem is not a "valid mathematical result" because it applies only to right triangles. That can't be what you're saying, can it?

I am acknowledging that Gödel's second theorem states that any mathematical system that attempts to prove itself is doomed to failure. I am also stating that Russell's project involved a mathematical system that attempts to prove itself. Therefore, Gödel's second theorem applies to Russell's project, but it does not necessarily say anything else about other projects.

> Many mathematicians regard Gödel's result as the most important mathematical finding of the 20th century.

So what? This is argumentum ad populum. How important the mathematical finding is is irrelevant to whether it matters to the discussion at hand.

> This is false. If the Halting problem is insoluble, it is also insoluble to a human

OK, so do you believe that modern mathematics is invalid because Russell's project failed?


> Surely you aren't asserting that human thought transcends the limits of logic and mathematics?

Given that both logic and mathematics are creations of human thought, is this really so untenable?


Wrong. What Russell/Gödel/Turing/Rice/etc. tell you you can't do is write a decision procedure that checks arbitrary nontrivial properties of programs. In other words, if you first write a program and only then ask whether it is correct, you've already lost.

Obviously, the solution is to stop writing arbitrary programs. As Dijkstra said long ago, we need to find a class of intellectually manageable programs - programs that lend themselves to being analyzed and understood. Intuitively, this is exactly what programmers demand when they ask for “simplicity”. The problem is that most programmers' view of “simplicity” is distorted by their preference for operational reasoning, eschewing more effective and efficient methods for reasoning about programs at scale.


> Obviously, the solution is to stop writing arbitrary programs. As Dijkstra said long ago, we need to find a class of intellectually manageable programs - programs that lend themselves to being analyzed and understood.

You're overlooking the issue of self-reference. A given computer language can validate (i.e. prove correct) a subset of itself, but it cannot be relied on to validate itself. A larger, more complex computer language, created to address the above problem, can check the validity of the above smaller language, but not itself, ad infinitum. It's really not that complicated, and the Halting Problem is more fundamental than this entire discussion seems able to acknowledge.


Correctness is always relative to a specification. More precisely, it's the program's specification, which must exist prior to the program itself, which defines what it constitutes for the program to be correct. So a program that “validates itself” simply doesn't make sense.

Also, it seems you misunderstand Gödel/Turing/etc.'s result. What a consistent and sufficiently powerful formal system can't do is prove its own consistency as a theorem. Obviously, it isn't sustainable to spend all our time inventing ever more powerful formal systems just to prove the preceding ones consistent. At some point you need to assume something. This is precisely the rôle of the foundations of mathematics (or perhaps I should say a foundation, because there exist many competing ones): to provide a system whose consistency need not be questioned, and on top of which the rest of mathematics can be built.

In any case, we have gone too far away from original point, which is that, if arbitrary programs are too unwieldy to be analyzed, the only way we can hope to ever understand our programs is to limit the class of programs we can write. And this is precisely what (sound) type systems, as well as other program analyses, do. Undoubtedly, some dynamically safe programs will be excluded, e.g., most type systems will reject `if 2 + 2 == 4 then "hello" else 5` as an expression of type `String`. But, in exchange, accepted programs are free by construction of any misbehaviors the type system's type safety theorem rules out. The real question is: how do we design type systems that have useful type safety theorems...?


> So a program that “validates itself” simply doesn't make sense.

The language comes from Gödel's incompleteness theorems, in which there exist true statements that cannot be proven true, i.e. validated.

A given non-trivial computer program can be relied on to produce results consistent with its definition (its specification), but this cannot be proven in all cases.

It's not very complicated.

> This is precisely the rôle of the foundations of mathematics (or perhaps I should say a foundation, because there exist many competing ones): to provide a system whose consistency need not be questioned, and on top of which the rest of mathematics can be built.

This is what Russell and Whitehead had in mind when they wrote their magnum opus "Principia Mathematica" in the early 20th century. Gödel proved that their program was flawed. Surely you knew this?

> Also, it seems you misunderstand Gödel/Turing/etc.'s result.

The misunderstanding is not mine.


> A given non-trivial computer program can be relied on to produce results consistent with its definition (its specification), but this cannot be proven in all cases.

Provability is always relative to a formal system. The formal system you (should) use to prove a program correct (with respect to its specification) isn't the program itself.

> Gödel proved that their program was flawed. Surely you knew this?

Gödel didn't prove the futility of a foundation for mathematics. Far from it. What Gödel proved futile is the search for a positive answer to Hilbert's Entscheidungsproblem: there is no decision procedure that, given an arbitrary proposition (in some sensible formal system capable of expressing all of mathematics), returns `true` if it is a theorem (i.e., it has a proof) or `false` if it is not.

If you want to lecture someone else on mathematical logic, I'd advise you to actually learn it yourself. I'm far from an expert in the topic, but at least this much I know.


> Gödel didn't prove the futility of a foundation for mathematics.

Your words, not mine, but Gödel did prove that a non-trivial logical system cannot be both complete and consistent. As Gödel phrased it, "Any consistent axiomatic system of mathematics will contain theorems which cannot be proven. If all the theorems of an axiomatic system can be proven then the system is inconsistent, and thus has theorems which can be proven both true and false."

Gödel's results don't address futility, but completeness. That's why his theorems have the name they do.

> The formal system you (should) use to prove a program correct (with respect to its specification) isn't the program itself.

This refers to the classic case of using a larger system to prove the consistency of a smaller one, but it suffers from the problem that the larger system inherits all the problems it resolves in the smaller one.

> If you want to lecture someone ...

This is either beneath you, or it should be.


> This is either beneath you, or it should be.

I'm sorry, but 'catnaroek is completely justified here. Multiple people have told you that you are simply factually wrong in your interpretation of the halting problem, and you are acting arrogant about it. If you tone back the arrogance a bit ("Surely you knew this?", "The misunderstanding is not mine") and concede that, potentially, it's not the case that everyone else is wrong and you're right, you might understand what your error is.

It's a very common mistake and is frequently mis-taught, especially by teachers who are unfamiliar with fields of research that actually care about the implications of the halting problem and of Gödel's theorems (both in computer science and in mathematics) and just know it as a curiosity. So I've been very patient about this. But as I pointed out to you in another comment, you are misreading what these things say in a way that should be very obvious once you state in formal terms what it is that you're claiming, and it would be worth you trying to acknowledge this.


My original objection was to the claim that a non-trivial program could be demonstrated to be "provably correct." This statement is false. For non-trivial programs, it contradicts the simplest and most easily accessible interpretation of the Turing Halting problem and Gödel's incompleteness theorems, both of which demonstrate that this is not possible.

Source: https://en.wikipedia.org/wiki/Halting_problem

Quote: "Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist." (emphasis added)

> Multiple people have told you that you are simply factually wrong ...

So as the number of people who object increases, the chance that I am wrong also increases? This is an argumentum ad populum, a logical error, and it is not how science works. Imagine being a scientist, someone for whom authority means nothing and evidence means everything.

In your next post, rise to the occasion and post evidence (as I have done) instead of opinion, or don't post.


> In your next post, rise to the occasion and post evidence (as I have done) instead of opinion, or don't post.

I have already done so. Did you read my comment about existential vs. universal quantifiers? Can you respond to it?

This was the original claim:

> It's false -- the Turing Halting problem (https://en.wikipedia.org/wiki/Halting_problem) demonstrates that one cannot establish that a non-trivial program is "provably correct."

This is the statement you quoted from Wikipedia:

> Quote: "Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist."

Do you agree or disagree that the quantifiers in these two statements are different? Do you agree or disagree that "the Turing Halting problem demonstrates that one cannot establish that every single non-trivial program is either 'provably correct' or 'provably incorrect'" is a different claim than the one originally made?


> My original objection was to the claim that a non-trivial program could be demonstrated to be "provably correct." This statement is false. For non-trivial programs, it contradicts the simplest and most easily accessible interpretation of the Turing Halting problem and Gödel's incompleteness theorems, both of which demonstrate that this is not possible.

You don't need to solve the Halting problem (or find a way around Rice's theorem, etc.) to write a single correct program and prove it so. A-whole-nother story is if you want a decision procedure for whether an arbitrary program in a Turing-complete language satisfies an arbitrary specification. Such a decision procedure would be nice to have, alas, provably cannot possibly exist.

To give an analogy, consider these two apparently contradictory facts:

(0) Real number [assuming a representation that supports the usual arithmetic operations] equality is only semidecidable: There is a procedure that, given two arbitrary reals, loops forever if they are equal, but halts if they aren't. You can't do better than this.

(1) You can easily tell that the real numbers 1+3 and 2+2 are equal. [You can, right?]

The “contradiction” is resolved by noting that the given reals 1+3 and 2+2 aren't arbitrary, but were chosen to be equal right from the beginning, obviating the need to use the aforementioned procedure.

The way around the Halting problem is very similar: Don't write arbitrary programs. Rather, design your programs right from the beginning so that you can prove whatever you need to prove about them. Typically, this is only possible if the program and the proof are developed together, rather than the latter after the former.

This much math you need to know if you want to lecture others on the Internet. :-p


> The way around the Halting problem is very similar: Don't write arbitrary programs. Rather, design your programs right from the beginning so that you can prove whatever you need to prove about them.

The system you outline works perfectly as long as you limit yourself to provable assertions. But Gödel's theorems have led to examples of unprovable assertions, thus moving beyond the hypothetical into a realm that might collide with future algorithms, likely to be much more complex than typical modern algorithms, including some that may mistakenly be believed to be proven correct.

If we're balancing a bank's accounts, I think we're safe. In the future, when we start modeling the human brain's activities in code, this topic will likely seem less trivial, and the task of avoiding "arbitrary programs," and undecidable assertions (or even recognizing them in all cases), won't seem so easy.

> This much math you need to know if you want to lecture others on the Internet.

My original objection was completely appropriate to its context.


> The system you outline works perfectly as long as you limit yourself to provable assertions.

Of course. And if a formal system doesn't let you prove the theorems you want, you would just switch formal systems.

> But Gödel's theorems have led to examples of unprovable assertions, thus moving beyond the hypothetical into a realm that might collide with future algorithms likely to be much more complex than typical modern algorithms, including some that may mistakenly be believed to be proven correct.

Programming is applied logic, not religion. And a proof doesn't need to be “believed”, it just has to abide by the rules of a formal system. In some formal systems, like intensional type theory, verifying this is a matter of performing a syntactic check.

> If we're balancing a bank's accounts, I think we're safe.

I wouldn't be so sure.

> In the future, when we start modeling the human brain's activities in code, this topic will likely seem less trivial,

I see this as more of a problem for specification writers than programmers. Since I don't find myself writing specifications so often, this is basically Not My Problem (tm). Even if I wrote specifications for a living, I'm not dishonest or delusional enough to charge clients for doing something I actually can't do, like specifying a program that claims to be an accurate model of the human brain.

> and the task of avoiding "arbitrary programs," and undecidable assertions (or even recognizing them in all cases), won't seem so easy.

I don't see what's so hard about sticking to program structures (e.g., structural recursion) for which convenient proof principles exist (e.g., structural induction).
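
For the non-mathematicians in the thread, "structural recursion" in C terms just means: only recurse on a strictly smaller piece of the input, so the termination and correctness argument falls out by induction. A small sketch (assuming the list's specification includes being acyclic):

    #include <stddef.h>

    struct node {
        int value;
        struct node *next;
    };

    /* Structural recursion: each call recurses only on list->next, a strictly
       smaller structure (assuming the list is acyclic), so termination and
       correctness follow by structural induction on the list. */
    static size_t list_length(const struct node *list) {
        if (list == NULL)
            return 0;                           /* base case */
        return 1 + list_length(list->next);     /* inductive step */
    }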

Please, get yourself a little bit of mathematical education before posting your next reply. You're so stubbornly wrong it's painful to see.


> If we're balancing a bank's accounts, I think we're safe.

You seem to be contradicting your original claim, that it is impossible to prove any non-trivial program correct. Is this because you believe the program to balance a bank's accounts is trivial? Do you also believe that a C compiler is trivial?

I guess we never defined what you meant by "non-trivial." If your position is that the vast majority of software in the world today is "trivial," sure, I'd agree with you. But that's not really a meaning of "trivial" that anyone expected.


Very good analogy!

> but were chosen to be equal right from the beginning

While this was almost certainly the case, it's sufficient for them to have been chosen so as to be decidably comparable.


> it's sufficient for them to have been chosen so as to be decidably comparable.

Indeed. :-)


> "Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist."

You seem to be mixing up where the "for all" lies. The result is:

There does not exist a procedure that, for all (program, input) pairs, the procedure will say whether the program halts for that input.

    {p | Ɐ(f,i): p(f,i) says whether f(i) will halt } = ∅
You are using it as if it said:

For all (program, input) pairs, there does not exist a procedure that will say whether the program halts for that input.

    Ɐ(f,i): {p | p(f,i) says whether f(i) will halt } = ∅
"post evidence (as I have done)"

What is at issue is your incorrect assertions about what your evidence says. We read it, it contradicts you, everyone points this out, and you just ignore it and get louder.


> You seem to be mixing up where the "for all" lies.

Not at all. My original objection was to the claim that a program could be proven correct, without any qualifiers. Any program, not all programs. As stated, it's a false claim.

Your detailed comparison doesn't apply to my objection, which is to an unqualified claim.

> everyone points this out ...

Again, this is a logical error. Science is not a popularity contest, and in this specific context, straw polls resolve nothing.

> you just ignore it and get louder.

I wonder if you know how to have a discussion like this, in which there are only ideas, no personalities, and no place for ad hominem arguments?


> My original objection was to the claim that a program could be proven correct, without any qualifiers. Any program, not all programs.

Let's be precise.

    Let P be the set of all programs.

    Let C(s) ⊂ P be the set of programs correct relative to specification s.
As I read the original statement, it was saying,

    {p ∈ C(s) | p, s ⊢ p ∈ C(s)} is large enough to be useful
I am not sure whether you read it differently.

> As stated, it's a false claim.

You have said this repeatedly, but all you have done in support of it is point at the Halting problem. The Halting problem says:

    ∃p ∈ P s.t. ¬(p ⊢ p ∈ C(halts)) ∧ ¬(p ⊢ p ∉ C(halts))  
You have claimed this eliminates the possibility of proving correct any non-trivial program:

> [T]he Turing Halting problem (https://en.wikipedia.org/wiki/Halting_problem) demonstrates that one cannot establish that a non-trivial program is "provably correct."

I see no way to read that but as a claim that:

    (∃p ∈ P s.t. ¬(p ⊢ p ∈ C(halts)) ∧ ¬(p ⊢ p ∉ C(halts))) → ((p,s ⊢ p ∈ C(s)) → p is trivial)
Please supply a proof.

Note that it's quite true that

    (∃p ∈ P s.t. ¬(p ⊢ p ∈ C(halts)) ∧ ¬(p ⊢ p ∉ C(halts))) → ((p,s ⊢ p ∈ C(s)) → *s* is trivial)
proved usually by reducing s to "halts" for any non-trivial s.

> Again, this is a logical error. Science is not a popularity contest, and in this specific context, straw polls resolve nothing.

I am not deciding it by poll. I am pointing out that there are objections to your reasoning that you have not addressed and persist in not addressing - despite them being made abundantly clear, repeatedly.

> I wonder if you know how to have a discussion like this, in which there are only ideas, no personalities, and no place for ad hominem arguments?

You're accusing me, as a person, of not understanding that a discussion like this has no place for accusations against a person? My complaint was not about you as a person, but about the form of your arguments in this discussion.


Nice post.

Can you clarify your last sentence? What do you mean by "preference for operational reasoning"? What's an example of the more effective methods for reasoning about programs at scale that you're contrasting this to?


“Operational reasoning” means using execution traces (the sequence of states underwent by a process under the control of your program) to derive conclusions about the program. You can do it in your head, using a piece of paper, using a debugger, using strategic assertions, etc.

By “more effective methods for reasoning about programs at scale”, what I mean is analyzing the program in its own right, without reference to specific execution traces. There are several kinds of program analyses that can be performed without running the program, but, to the best of my knowledge, all of them are ultimately some form of applied logic.


It's a guess, but I think GP uses "operational" as a synonym for "imperative". I think I encountered the term "operational semantics" in the context of Prolog programming. Denotational semantics is a term for a kind of semantics which are mathematically pure. IIRC Prolog has both operational and denotational semantics defined and they conflict. There's something about cut operator and tree pruning, but I read this long ago :-)


> It's a guess, but I think GP uses "operational" as a synonym for "imperative".

Not really. It's also possible to reason non-operationally about imperative programs, e.g., using predicate transformer semantics. It's also possible to reason operationally about non-imperative programs.


The Halting Problem helps us prove programs because it informs us that some functions cannot be computed. This means that the requirement spec for some computations cannot be implemented. When we spot this problem in a requirement spec, we instantly know that its proposed implementation is an incorrect program, no matter what that program is. We don't have to look at a single line of code.


The Halting problem can already be solved for a non-trivial subset of programs, take a look at Microsoft Research's "Terminator" project:

http://research.microsoft.com/apps/mobile/news.aspx?post=/en...


But the halting problem for a subset of programs isn't The Halting Problem, which is the question whether a UTM can exist to calculate any function, over its entire domain.


The parent's wording was a little sloppy, but the point was that the grandparent was incorrectly using the Halting Problem to argue that you can't even do it for a meaningful subset of programs, which of course does not follow.


It would seem that Dijkstra, were he alive, would disagree with that. Check google for Proof of Correctness for Dijkstra's algorithm.

Also, Knuth in TAoCP spends significant time proving his programs correct.

Further, there is a compiler that is asserted to be proven correct: http://compcert.inria.fr/

Now it is possible you are working with a different definition of "proved correct" than I am thinking.


Here is something else: provably correct code can demonstrate unacceptable complexity for some input cases. We know from the proof that it will halt and the result will be correct: in 200 years on the given hardware. You have to add a time bound as a requirement specification to avoid this, or a constraint on the complexity (e.g. "the implementation must use no algorithm slower than quadratic time in such and such inputs").



I've noticed that "low-level" languages like C, or, for a more extreme example, assembly languages, tend to encourage simpler code, if for nothing other than the fact that everything takes many more lines than in higher-level languages. E.g. if you want to do OOP you'll have to allocate memory and call constructors manually, pass explicit 'this' parameters into all method calls, declare virtual function tables, etc. Certainly there are cases where this tempts you into too-easy solutions, like fixed-size overflow-prone buffers instead of dynamically allocated strings, but on the other hand it forces you to think more about whether you actually need to do something ("should this string ever be more than X bytes long? Maybe not, might as well use a fixed-size array - but I should not forget to check that length!"), which I think is overall a good thing.
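
The "fixed-size buffer, but don't forget the length check" line of thinking tends to end up looking something like this sketch (USERNAME_MAX is just an illustrative limit, not from any real system):

    #include <stdio.h>
    #include <stdbool.h>

    #define USERNAME_MAX 32   /* illustrative limit */

    /* snprintf truncates instead of overflowing; the return value tells us
       whether the name actually fit in the buffer. */
    static bool set_username(char out[USERNAME_MAX], const char *name) {
        int n = snprintf(out, USERNAME_MAX, "%s", name);
        return n >= 0 && n < USERNAME_MAX;   /* false => error or didn't fit */
    }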


It also makes you think upfront about the design much more. With all the intricacies of writing correct low-level code you really want to write it once only and not try out different designs and refactor between them, where mistakes quickly happen.

Of course this is a disadvantage if you don't have sufficient experience in the problem domain. Sometimes I resort to writing prototypes in a script language in that case, translating the best design to the low-level language.


A language that is "write it right the first time, refactoring is painful" is a poor fit for most problems. Requirements change, if your software can't be adapted easily your project will fail. The general failure of waterfall-style software development is a strong indicator that finding the "right" design up front usually doesn't work out.


> on the other hand it forces you to think more about whether you actually need to do something ("should this string ever be more than X bytes long? Maybe not, might as well use fixed-size array - but I should not forget to check that length!"), which I think is overall a good thing.

While that certainly can't be a bad thing, you just have to keep in mind that it forces you to think about a specific set of details – the ones that matter for the machine ("how many bytes will the binary representation of this username require") – and it may fool you into forgetting to think about higher-level concerns ("Does å compare equally to å?").
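
To make the å example concrete, here's a tiny sketch: U+00E5 and "a" plus the combining ring U+030A are two valid UTF-8 spellings of the same visible character, and plain C string comparison treats them as different.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *precomposed = "\xC3\xA5";   /* U+00E5, "å" as one code point */
        const char *decomposed  = "a\xCC\x8A";  /* 'a' + U+030A combining ring   */
        /* Visually identical, byte-wise different; comparing "equally" needs
           Unicode normalization, which strcmp knows nothing about. */
        printf("strcmp: %d\n", strcmp(precomposed, decomposed));
        return 0;
    }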


Au contraire, my friend: I've seen C code with "functions" that are >1000 lines long, with ~10 parameters bearing lovely names like `lfp`, and where probably 40% of those lines are #ifdef and #endif.

The same code would be much simpler with C++ templates, but "C being simple" really translates into "C being limited". Then you have people who never learned how to use macros, plus an unfounded and misunderstood fear of goto, which gives you long runs of error checking: if statements duplicated character by character where a goto and a single error handler (sketched below) would have worked.

Perhaps this is an exception, but low-level in my case did not correspond to simple code.
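
The goto pattern I mean is roughly this sketch: one cleanup label instead of the same error handling copy-pasted at every failure point.

    #include <stdio.h>
    #include <stdlib.h>

    int process_file(const char *path) {
        int rc = -1;
        FILE *f = NULL;
        char *buf = NULL;

        f = fopen(path, "rb");
        if (f == NULL)
            goto out;

        buf = malloc(4096);
        if (buf == NULL)
            goto out;

        if (fread(buf, 1, 4096, f) == 0)
            goto out;

        /* ... use buf ... */
        rc = 0;                 /* success */
    out:                        /* single error/cleanup path for every exit */
        free(buf);              /* free(NULL) is a harmless no-op */
        if (f != NULL)
            fclose(f);
        return rc;
    }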


In an extreme example, I replaced a 10,000-line C function with a single C++ template of less than a page. It was marshaling different structures for transmission over a mailbox port. All that code had been copied and pasted, with one or two numbers changed depending on the structure size.


Yeah, let's use copy & paste and C preprocessor macros as our abstraction tools ...

Avoiding unnecessary abstractions is important, but at the same time those abstractions were invented for a reason. It's basically the Go vs. generics debate, except even more rudimentary. It's fine if your code doesn't need those abstractions; it sucks badly if it does (e.g. GObject).


Depends on your definition of rudimentary. The C preprocessor sucks in general, but you can use it to implement all sorts of abstractions Go basically isn't capable of.


If replacing standard high-level programming concepts like generics/templates with C macros is your idea of a productive programming environment, then I don't know what to say.

Just look at any C collection library for things like maps compared to C++.


This is only true of C.

Algol, PL/I, Mesa, Cedar, Modula-2 and many other languages of similar age or older than C, do offer both higher and lower level mechanisms.


I think C does not make you write simpler code. It makes people take shortcuts. Static allocations, hand-rolled linked lists, skipped return value checks or boundary checks, etc., are all typical of C programs.


Not sure why you've been voted down. I've seen too many lines of C and what you said is true most of the time.

Although developers skipping return value checks is true in most languages.


> Although developers skipping return value checks is true in most languages.

Yes, which is why I disagree with Go's approach of error codes as return values. I'm not saying that exceptions are the right solution everywhere, but if you return an error code, you should make it pretty hard to ignore. In languages with Algebraic Data Types it's very easy to use a Maybe or an Error(code) | Success(result) type. You've then got to explicitly deconstruct that type to get the result, so it's a bit harder to just forget to handle the error.
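
For contrast, the closest C gets is a hand-rolled tagged union, and (tellingly) nothing forces the caller to look at the tag before using the value -- which is exactly the point about real ADTs. A sketch; parse_port is just a made-up example:

    #include <errno.h>
    #include <stdlib.h>

    struct parse_result {
        int ok;            /* 1 = Success, 0 = Error */
        union {
            long value;    /* valid only when ok == 1 */
            int  error;    /* valid only when ok == 0 (an errno value) */
        } u;
    };

    /* Hypothetical parser: nothing in C stops a caller from reading
       r.u.value without ever checking r.ok first. */
    static struct parse_result parse_port(const char *s) {
        struct parse_result r;
        char *end;
        errno = 0;
        long v = strtol(s, &end, 10);
        if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 65535) {
            r.ok = 0;
            r.u.error = EINVAL;
        } else {
            r.ok = 1;
            r.u.value = v;
        }
        return r;
    }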


I have a few things to say.

#1 The submitted article is very good.

#2 I'd add to it: avoid pointer arithmetic when possible and use array notation instead. Typically the difference in execution speed is zero or negligible.

#3 I'd really like a pragma that forces an exception for unchecked return values. Something like

    int DoSomeThing(/* whatever*/) #pragma exit "unchecked non_zero"
Meaning if DoSomeThing() doesn't return zero, the program bails.
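
The closest existing thing I'm aware of is a compile-time warning rather than a runtime exception: GCC and Clang have __attribute__((warn_unused_result)), and -Werror=unused-result turns it into a hard error. A sketch:

    __attribute__((warn_unused_result))
    int DoSomeThing(void);

    void caller(void) {
        DoSomeThing();                /* warning: ignoring return value */
        if (DoSomeThing() != 0) {     /* fine: the result is checked */
            /* handle the failure */
        }
    }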


I think it's mostly that C's lack of language features removes distractions. You have only a couple abstractions to work with if you're writing idiomatic C code, and so you can focus on the task at hand more easily.

The tradeoff is that the task at hand tends to have more implementation details in it than in higher level languages, but as you said, this isn't always bad.


Aaaand we got ourselves another site with a password limit of 8 characters or fewer.


I think it only looks that way because people don't compare like with like. x lines of C code may well be simpler than x lines of a given alternative. But http://qdb.us/301922


Also, the very first piece of advice applies to surprisingly many languages and frameworks, not just C:

> The first rule of C is don't write C if you can avoid it.


Programming in general honestly. Don't write code if you can avoid it, which loosely translates to "Don't solve problems that don't need solving".


Don't write code if you can avoid it and somebody will sell you a half-baked closed source SaaS, which will be acquired and closed in two years.


Or some obscure library found on github that's effectively become abandonware. Now you've just inherited a legacy codebase to maintain because some of your coworkers wanted to use a small subset of its functionality!


>The first rule of C is don't write C if you can avoid it.

Man, this is just lame. C is a really nice, useful language, especially if you learn the internals. It requires discipline and care to use effectively, but it can be done right. The problem is that a lot of C programmers only know C and can't apply higher level concepts or abstractions to it because they've never learned anything else. This makes their code harder to maintain and easier to break.


He's not saying that C isn't useful, but that for most problems it's overkill. Why inflict the extra burden on yourself -for no benefit-? Most code being written can be done just as well in a language with garbage collection, that has more powerful abstractions and paradigms as first class citizens...why not take advantage of them? Of course you can write those higher level concepts/abstractions in C, but the point is, -they've already been written-, tested, and documented, by other people. Why write untested, undocumented code that reimplements those things when you have no compelling reason to?


There are even languages without garbage collection that make a lot of things easier than they are in C. See Ada, Rust, ~~Go~~ D, and others.


Go is garbage collected.


Whoops, sorry. I meant D rather than Go, but I guess that is also sorta-kinda garbage collected.


D is also primarily garbage collected. You had an option to create scoped classes (allocated on stack) but it's apparently deprecated now.

I think Swift would be a better example here. Or objective C with ARC.


Rust is great, but I think C is much easier


You mean it's easier to write an incorrect program in C than a correct one in Rust? Of course!

If you mean it's easier to write correct programs in C than correct programs in Rust, I'm afraid you're badly mistaken.


That is some really good perspective. The incorrect C program is a bit of an externality that's felt by QA/testing/customers/etc.

I liken programming with Rust to programming in C or C++ with warnings enabled and a static analyzer that bars object files from being created. If I were new to C/C++, I would probably find it baffling how anyone ever gets it to work.


I mean, Rust has a lot of high-level features that make it flat out easier once you get past fighting with the compiler...


> once you get past fighting the compiler

Actually, it's the very process of “fighting the compiler” (as you call it, though I prefer to view it as collaboration between a human who knows what he wants and a program whose logic codifies what is possible) that makes it easier to write correct programs in Rust/Haskell/ML/$FAVORITE_TYPEFUL_LANGUAGE.


I think there's a time before you learn how to make that collaboration productive, for a given tool, which is more aptly termed "fighting" and which you should certainly be able to get past. After that, you can have disagreements but they are far more productive.


From what I have seen, only the hacker types find themselves “fighting” against (executable embodiments of formal) logic.


This seems an unwarranted dig against a poorly specified group of people.

The notion that a type system is "just" logic and so should be trivially followed by anyone "smart enough" or "careful enough" is laughable - there are many different type systems (sometimes with different details depending on the options or pragmas you feed to your tooling), and it can take time to get a sense of where the boundaries are.

The notion that you're not likely to find a type system helpful if you have trouble with it initially is harmful.

Please don't do this.


> The notion that a type system is "just" logic

I would've never said "just". Formal logic is a big f...reaking deal.

> so should be trivially followed by anyone "smart enough" or "careful enough" is laughable

I never said anything about being "smart" or "careful". I only said that hacker types, for whom the ability to subvert anything anytime is a fundamental tenet, tend to have a hard time sticking to the rules of a formal game, whether it is a type system or not. Just look at the reasons Lispers give for liking Lisp.


> I would've never said "just". Formal logic is big f...reaking deal.

"Just" was in no way meant to diminish significance; it was meant to draw attention to your improper limiting of scope. Someone "fighting the compiler" is not fighting "executable embodiments of formal logic", but "executable embodiments of formal logic and a pile of engineering decisions". Often, enough of those decisions are made well that the tool can be phenomenally useful; some are made less well, and some simply involve tradeoffs. It doesn't have the... inevitability that your wording carried. Programmers coming from a context where enough of those decisions have been made differently are likely to get tripped up for a while in the transition, without harboring any objections to logic.

> I never said anything about being "smart" or "careful". I only said that hacker types, for whom the ability to subvert anything anytime is a fundamental tenet, tend to have a hard time sticking to the rules of a formal game, whether it is a type system or not.

True, but you left it up to us to figure out what attributes of "hacker types" were relevant. As there are many different uses of the word "hacker", it was quite unclear that you meant to refer specifically to inability to stick to rules.

While people of that description may well persist in "fighting the compiler" for longer, that is not most of what I've observed when I've observed people struggling with a type system, and I reiterate my assertion that those new to a particular tool and used to doing things another way also frequently struggle for a period.


I agree. My point is that, after programming in Rust for a bit, it's also easier to write just any program, regardless of correctness requirements


While I agree with your sentiment, I think this is not true in general. The Rust compiler sometimes rejects correct programs, in which case it can be harder to write it in Rust than in C.


In the absolute worst case, you can pretty much transcribe your C program into unsafe Rust, though you will be fighting the syntax, which was deliberately made cumbersome to discourage programmers from writing unsafe code.

In practice, I have never seen any programs where this is the best approach. Coming up with designs that don't need unsafe code is very useful, because the safe subset of Rust is vastly easier to analyze and understand, both for humans and computers.


> The Rust compiler sometimes rejects correct programs, in which case it can be harder to write it in Rust than in C.

Well, the C compiler sometimes rejects correct programs too.


Sometimes C compilers can't even agree on what counts as "correct"! With Rust, you can at least be reasonably sure that if your program compiles with your version of the compiler, it will continue to do so with all later versions, because they're coming from the same codebase and made by people who care about backward compatibility.


> Sometimes C compilers can't even agree on what counts as "correct"!

This comes primarily from the fact that the C standard itself is hard to interpret right. Standard ML has fewer issues in this department: since the Definition is clearer and less ambiguous, implementations disagree less on what counts as correct code and what its meaning is. So having only one major language implementation isn't the only way to prevent compatibility issues.


Hard to interpret right... according to some people, there is actually no valid reading of it.[1]

[1]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1637.pdf


C is a much simpler language, that's all. I understand it's easier to write a safe program in Rust. But to just pick up a language and go, C is easier in my mind.


It's deceptively simple.

It's probably easier to write CS101 prime-number or fibonacci programs in C.

Any nontrivial bit of software? You're going to have to learn about undefined behavior and deal with it. Rust forces you to learn this before getting started.
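
A hedged illustration of the kind of undefined-behavior trap I mean (the behavior depends on compiler and optimization level, but GCC/Clang at -O2 are allowed to delete the "check" entirely):

    #include <limits.h>

    /* Signed overflow is undefined, so the compiler may assume it never
       happens and fold this "overflow check" to: return 0. */
    int will_overflow(int x) {
        return x + 100 < x;
    }

    /* A defined way to ask the same question. */
    int will_overflow_safely(int x) {
        return x > INT_MAX - 100;
    }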

Plus Rust has a lot of higher-level (still zero-cost) abstractions, like Python's, which make a lot of patterns really easy to program. (And Python is _the_ teaching language these days.)

See also: https://www.reddit.com/r/rust/comments/3mtwev/using_rust_wit...


> C is a much simpler language thats all.

Simpler in what sense? Not this one: https://news.ycombinator.com/item?id=10671800


Coming from JavaScript, Java and Python, Rust has a steep learning curve, but with thought and patience you can learn to write Rust. I am getting much better at thinking about ownership before even compiling now. If some silly webdev like me can hack it, other programmers can do it too.


There are even languages with deterministic destruction, which solves nearly all resource-leak problems, and at least one of them is directly compatible with C and has nice metaprogramming facilities. C++ it is.


If I make a mistake in Python, C#, Java, Swift, Go, Ruby, Javascript, Haskell, D, Rust, et al I don't leak arbitrary contents of memory, smash the stack, overrun a buffer, or otherwise provide huge gaping security holes that allow arbitrary code injection.

C is riddled with opportunities to do _just that_ with even the slightest mistake.
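
The classic "slightest mistake" looks something like this sketch: in C it silently corrupts adjacent memory, where a bounds-checked language would stop the program instead.

    #include <string.h>

    void save_pin(const char *pin) {
        char buf[8];
        /* strcpy has no idea how big buf is: any pin longer than 7
           characters overruns the buffer and scribbles on the stack. */
        strcpy(buf, pin);
        /* ... */
    }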

Yes, I can make a logic error in any of the above languages that performs some operation a user shouldn't actually be allowed to perform. But I can't accidentally let the user read or write arbitrary memory locations.

With the exception of unsafe blocks or similar mechanisms -- and since those are limited to explicit scopes, most programmers' natural laziness leads to avoiding unsafe behavior whenever possible. It also greatly limits the scope of potential bugs, as well as the testing and code review required.


> But I can't accidentally let the user read or write arbitrary memory locations.

Many of the languages you mention contain an 'eval' method, and most use environment variables in less than transparent ways. I wouldn't be so sure of this claim.


Using eval is not accidental.


Yes, it easily can be. My point was, no language allows you to get away with ignoring security (e.g. there is no free lunch).

http://blog.codeclimate.com/blog/2013/01/10/rails-remote-cod...


I'd argue that it can't be done right for non-trivial programs without enormous effort. Even software written by expert C programmers with years and years of experience contains bugs that are caused by C's unsafe features.


I'd agree; there's a great book called "Expert C Programming: Deep C Secrets" by Peter van der Linden. The book itself is not without errors, but it still provided some useful insights into things I didn't realize I was doing wrong back when I coded in C.


Errors are human. It's easier and more dangerous to screw up in C than in other languages. Don't use it unless there's a good reason to.


Modula-2 was released in 1980 - a Pascal successor which offers faster compilation speeds (precompiled modules), executables as fast as C-built ones, and a very safe type system that avoids the most common errors in C programs. It is sad that it was mostly ignored for so long. Fortunately, Go picks up its heritage these days.

