
C and C++ Aren't Future Proof - malloc47
http://blog.regehr.org/archives/880
======
pkaler
> This propensity for today’s working programs to be broken tomorrow is what I
> mean when I say these languages are not future proof.

It doesn't matter. This is not how programming works in the real world. In the
real world, you write the most correct program you can under time pressure. A
new compiler, operating system, or platform arrives that exposes a bug. You
fix it and you move on. It doesn't matter if the language is future proof or
not. The process is similar for any complex program.

The blog's name is "Embedded in Academia" and this is a perfectly valid
viewpoint for someone in academia to take. And people in academia should
research towards building more robust tools and languages. But it really is
not going to matter in the real world. Languages and platforms will never be
future proof because computing is complex.

~~~
mjn
The particular kind of not-future-proofness he has in mind seems pretty
practically important: code that relies on this undefined behavior often
suffers from exploitable security holes. Just because computing is complex
doesn't mean you have a free pass if you shoot yourself (or your customers) in
the foot the _same_ way the previous 100 folks did. If it happens enough, it
becomes prudent to do something about it, like people finally did about
unsanitized format strings, or the use of unbounded sprintf().

His suggestion #3, that the standards should define more of the commonly used
behavior and leave less of it undefined, wouldn't even require C programmers
to do anything about it themselves.

~~~
pkaler
> His suggestion #3, that the standards should define more of the commonly
> used behavior and leave less of it undefined, wouldn't even require C
> programmers to do anything about it themselves.

I've written Windows, Mac, Linux, Xbox, PlayStation, PSP, iOS, and Android
code. The memory model is subtly different for each platform. I just don't
think you can define certain behaviour and have that work across disparate
platforms.

I haven't really written any device drivers or kernel space code but I would
imagine it would make the job even more difficult.

~~~
munin
Ostensibly, a platform like Java or Rust is supposed to abstract stuff like
the memory model. I haven't written a lot of Java code, especially not Java
code that runs on many different native systems/VMs, but from my perspective
of blissful ignorance, it seems to have done the job?

Same with other high-level VM based languages like Python...

~~~
dragontamer
Python is not future proof.

There are undefined sequences even in Python, where Jython and CPython produce
different results for the same program.

~~~
justincormack
Small amounts of undefined behaviour are normal in most language specs, to
give implementations flexibility. Still, tests to make sure you do not rely on
them would be useful.

------
haberman
I think there is an important point here, which is that C and C++ compilers
have let us get away with a lot of undefined behavior for a long time, and
that there hasn't been a lot of tooling to help avoid it nor a culture that
stresses the long-term danger of depending on it.

I can speak as someone who has been programming in C and C++ for over ten
years, but only in the last few years became aware of this issue and started
taking it seriously. Five years ago I would do things like cast function
pointers to void-pointer and back, or calculate addresses that were outside
the bounds of any allocated object and compare against them, all without
really even realizing I was doing something wrong.

I don't think this will spell doom-and-gloom for C and C++ though. I think a
few things will happen.

First of all, the compiler people are walking a fine line; yes, they are
breaking code that relies on undefined behavior, but they often avoid breaking
too much. For example, I've had it explained to me that at least for the time
being, gcc's LTO avoids breaking any programs that would work when compiled
with a traditional linker. In addition, they often provide switches that
preserve traditional semantics for non-compliant code that needs it (like
-fno-strict-aliasing and -fwrapv).

Secondly, I believe that tooling will get better, and rather than ignoring the
warnings, I believe that people's general awareness of this issue will rise,
as well as knowledge of standard-compliant ways of working around common
patterns of undefined behavior. For example, it's often easy to avoid aliasing
problems by using memcpy(), and this can usually be optimized away.
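
For concreteness, a sketch of the memcpy() idiom (function names here are
illustrative):

```cpp
#include <cstdint>
#include <cstring>

// The pointer-cast version reads a float through a uint32_t lvalue, which
// violates strict aliasing:
//
//   std::uint32_t bad_bits(float f) { return *(std::uint32_t*)&f; }
//
// Copying the object representation with memcpy() is well-defined, and GCC
// and Clang typically compile it down to a single register move at -O2.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    static_assert(sizeof u == sizeof f, "assumes float is 32 bits");
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```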

Thirdly, I expect that the standard may begin to define some of this behavior.
For example, I think that non-twos-complement systems are exceedingly rare
these days; I wouldn't be surprised if a future version of the standard
defines unsigned->signed conversions accordingly.

~~~
pascal_cuoq
I agree with all your arguments. The function pointer <-> void pointer
conversion in particular is an excellent example. But the very last example,
unsigned -> signed conversion, is not a good illustration of the point you are
making.

unsigned -> signed conversion is already “implementation-defined behavior” (as
opposed to “undefined behavior”). The standard does not guarantee how it
behaves but forces compilers to make a choice and to stick to it.

A different example, of a behavior that is really undefined, would be signed
arithmetic overflow:

int detect_max(int x) { return x+1 < x; }

The function above branchlessly detects that its argument is INT_MAX, and
returns 1 in this case thanks to 2's complement representation.

Except that it doesn't. The command “gcc -O2” compiles it into “return 0;”.
GCC can do this because signed arithmetic overflow is undefined behavior. The
compiler is simply taking advantage of undefined behavior in whatever way is
locally convenient.

Now that two's complement is (almost) everywhere, making it the standard for
signed arithmetic overflows is the sort of bold choice I would like to see,
but it won't happen (it would break GCC's existing optimization).
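
For illustration, two ways to write the same predicate with no undefined
behavior (function names are illustrative; the second avoids signed
arithmetic entirely by comparing in unsigned, where wraparound is defined):

```cpp
#include <climits>

// No overflow is ever computed, so there is nothing for the optimizer
// to assume away: just compare against INT_MAX directly.
int detect_max_defined(int x) {
    return x == INT_MAX;
}

// If the arithmetic form is wanted, do the addition and comparison in
// unsigned, where wraparound is fully defined by the standard.
int detect_max_arith(int x) {
    return (unsigned)x + 1u == (unsigned)INT_MAX + 1u;
}
```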

~~~
haberman
Good point about undefined vs. implementation-defined, though even
implementation-defined behavior could break programs that switch to a
different implementation that makes a different choice.

------
pacaro
This caught my eye "Program analyzers that warn about these problems are
likely to lose users."

For me, this is perhaps the biggest issue raised in this article: as static
and dynamic analysis tools become more ubiquitous, we should be learning to
fix the issues that they raise, not ignore them.

I remember a while ago (2004 or 5) interviewing a college-hire candidate, I
had asked about working with others and we had gotten to talking about code
review - the candidate was passionate about how code review had helped with a
group project he worked on, but every single example he gave of a bug found
by code review was something that -Wall would have found...

The same applies to static analysis - let the machines do the work that they
can do; that leaves the humans to get on with the work that the machines can't
do (yet!).

------
ge0rg
The problem with smart compilers is indeed how they break existing (naive)
code, optimizing away things like "assert(len + 100 > len)" [1].

Making a correct overflow check in C/C++ is not merely unintuitive; it is
overly complicated even for experienced developers [2]. This is IMHO
unacceptable for something that is needed so often in security contexts.

Therefore, I hope that option 3 proposed by the author (changing the C/C++
standards to define the correct behavior for at least integer overflows) will
be adopted. However, this probably will not happen for a long time, leaving us
with security holes all over the net.

[1] <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475>

[2] <http://stackoverflow.com/questions/3944505/detecting-signed-overflow-in-c-c>
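
For illustration, a correct check along these lines (the helper name is
illustrative): test against the limit before adding, so the overflowing sum
is never computed and there is nothing for the optimizer to delete:

```cpp
#include <climits>

// Instead of "assert(len + 100 > len)", which performs the overflowing
// addition first (undefined behavior, so the whole check may be removed),
// compare against the headroom that remains below INT_MAX.
int can_add_100(int len) {
    return len <= INT_MAX - 100;
}
```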

~~~
ygra
I don't really see how that's a problem with the compilers instead of with the
language.

------
c3d
C and C++ indeed aren't future-proof, but it's not just because of undefined
behavior; it's also because they remain stuck firmly in the 1960's in terms of
programming style.

C++11 added many changes intended for the "do-it-yourself" crowd, like auto,
the new function syntax, and lambdas. It didn't add much for the "let the
compiler do the work for me" crowd (one notable exception being variadic
templates, something that has been in my own XL programming language since
2000). In C++, you are still supposed to do the boring work yourself.

For example, C++11 still lacks anything that would let you build solid
reflection and introspection, or write a good garbage collector that doesn't
need to scan tons of non-pointers.

If you want to extend C++, it's just too hard. C++11 managed to add complexity
to the most inanely complex syntax of all modern programming languages.
Building any useful extension on top of C++, like Qt's slots and signals, is
exceedingly difficult. By contrast, Lisp has practically no syntactic
construct and is future proof. My own XL has exactly 8 syntactic elements in
the parse tree.

So in my opinion, C and C++ are already left behind for a lot of application
development these days because they lack a built-in way to evolve. If you are
curious, this is a topic I explore more in depth under the "Concept
programming" moniker, e.g.
<http://xlr.sourceforge.net/Concept%20Programming%20Presentation.pdf>.

------
mjn
A side note I took away from this post is the existence of Frama-C, which
appears to be a quite nice, open-source analyzer: <http://frama-c.com/>

------
shmerl
So far I doubt C++ is going anywhere - it's here to stay. When languages such
as Rust gain enough traction that high-performance game engines are written in
them, one could start saying that C++ is being pushed out. But that's still
somewhere in the future.

------
bcoates
Hey, don't lump C++ in with this. If you write code in the STL weenie style or
the Pretend It's Java style there aren't any idioms I know of that would ever
violate the rules he mentions (out-of-range pointers, signed overflow, invalid
aliasing). I don't do those things and the C++ programmers I work with don't
do those things, at least not habitually. I don't see violations of undefined
behavior rules, or the use of idioms that come close to it, very often in our
code. Not nearly as often as the sort of mundane errors that no language can
prevent.

These are not problems of a language per se, but the original sins of
neo-vaxocentrism and of confusing "I understand how this might work, at some
random abstraction layer" with "I can depend on what happens when I do
something stupid". Free your mind of these and the rest will follow.

These low-level bit banging errors are vastly less common than shared-memory
concurrency issues, which as far as I can tell are endemic to all code that
attempts shared-memory concurrency, in any language. If you want to have an
axe to grind about languages that aren't future proof, look there.

~~~
betterunix
"If you write code in the STL weenie style or the Pretend It's Java style
there aren't any idioms I know of that would ever violate the rules he
mentions (out-of-range pointers, signed overflow, invalid aliasing)."

What does the STL do about signed overflow? As for out of range pointers, that
is an easy one to get with the STL:

    
    
      vector<int> somevector(100);
      somevector[200] = 5;
    

"These are not problems of a language per se"

Yes they are: the default numeric type is fixed-width, pointers pop up all
over the place and pointer dereferences are unchecked by default. Personally,
though, I would have chosen (as the article's author did) the more severe
deficiencies in the standard, like the lack of any requirement that a function
with a non-void return type have a return statement along every control path
or the fact that there is no reliable way to signal errors that occur in
destructors.

"These low-level bit banging errors are vastly less common"

Not in my experience, and not judging by the number of bug reports and
vulnerabilities I have seen that stem from low-level mechanics.

~~~
bcoates
I'm not saying the language is some sort of security barrier that prevents any
error, I'm saying sanely styled code does not have these issues in practice.
The solution is "don't do that, and cultivate habits that will not cause you
to do that by accident", not having the compiler make up semantics for broken
code or putting in checks everywhere. Just because someone, somewhere does it
wrong, doesn't mean it's impossible to do it right.

this:

    
    
      vector<int> somevector(100);
      somevector[200] = 5;
    

is a C idiom translated by cut-and-paste. The unmotivated poking of arbitrary
magic-number offsets into a magic-number-sized vector is not proper. It's the
kind of thing that sets off alarm bells on even the most casual review.

~~~
betterunix
"I'm saying sanely styled code does not have these issues in practice"

Otherwise known as the "just do it right" argument. This is an argument that
goes all the way back to the days of writing everything in assembly language,
and it was just as wrong then as it is today. If only a restricted subset of a
language can ensure that basic issues do not become serious problems, then the
language should be restricted to that subset.

"not having the compiler make up semantics for broken code or putting in
checks everywhere"

Really? I would rather have the compiler put in run time checks whenever it
cannot infer that no input will cause the program's behavior to be undefined.
Thus, the compiler might insert a check here:

    
    
      for(i = 0; i < input.length(); i++)
        some_vector[i]++;
    

but not here:

    
    
      for(i = 0; i < min(input.length(), some_vector.length()); i++)
        some_vector[i]++;
    

nor here:

    
    
      if(input.length() > some_vector.length()) {
        throw some_exception();
      }
      for(i = 0; i < input.length(); i++)
        some_vector[i]++;
      

At the very least, requiring bounds checks on array access would create a
definition for out-of-bounds accesses: program termination (or perhaps an
exception being thrown). A reasonably good compiler can detect when a bounds
check is unnecessary and can remove it as an optimization. Why shouldn't this
be something that compilers do? Out-of-bounds array access is never a good
thing (oh, wait, you might be dereferencing some arbitrary pointer that you
got by some means other than allocating memory with "new" -- OK, fine, but
that is what type systems are for; this sort of separation is not unheard of,
I see it in Lisp with SBCL's FFI).
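
For concreteness, a sketch of that policy as a library type (checked_ref is a
hypothetical helper for illustration, not a real STL type):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Every access is checked, the failure mode is defined (assertion failure
// / termination rather than undefined behavior), and an optimizer that can
// prove i < v.size() is free to delete the branch entirely.
template <typename T>
struct checked_ref {
    std::vector<T>& v;
    T& operator[](std::size_t i) {
        assert(i < v.size());  // defined termination instead of UB
        return v[i];
    }
};
```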

"The unmotivated poking of arbitrary magic-number offsets into a magic-number
sized vector is not proper. It's the kind of thing that sets off alarm bells
on even the most casual of review."

Perhaps so, but then the answer is not simply "just use the STL." As with most
things C++, it requires a long list of things to make code work right, and
even people who have been writing C++ code for many years are sometimes
surprised to discover that something they thought was fine is actually bad.
C++ makes it pretty easy for programmers to do the wrong thing and needlessly
difficult to do the right thing, which is why years of expertise are needed to
write remotely reliable C++ code.

~~~
bcoates
I really don't have any difficulty finding programmers who have the discipline
to not use the unsafe parts of the language all over the place. C++ has an
issue with having a fragmented multitude of sane subsets, but any of them are
fine if they get the job done.

That said, I don't understand why you still keep putting up awful mostly-C
code as if any trained C++ programmer wouldn't yell at you for doing it wrong,
_even before they saw the part with the error_.

    
    
      for(i = 0; i < input.length(); i++)
    

Where did you learn this? Don't do this. Everyone else knows not to do this.

    
    
      for(i = 0; i < min(input.length(), some_vector.length()); i++)
    

This is actually worse, though it does have the virtue of probably working. If
you want a run-time check, use at(), or better still use an iterator already.
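
For reference, sketches of those two safer forms (function names are
illustrative):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Checked access: at() throws std::out_of_range on a bad index instead of
// invoking undefined behavior.
void bump_prefix_checked(std::vector<int>& v, const std::string& input) {
    for (std::size_t i = 0; i < input.length(); i++)
        v.at(i)++;  // throws if input is longer than v
}

// Iterator/range style: no index arithmetic to get wrong at all.
void bump_all(std::vector<int>& v) {
    for (int& x : v)
        x++;
}
```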

C++ has all sorts of issues. It's too hard to learn, it's missing some very
useful features, and it has a number of rough edges that you have to learn
your way around. But the things being complained about in the OP and by you
are not real problems for anything but beginners. There just aren't that many
naked array accesses or pointer math operations going on in an ordinary C++
application written in non-C style.

~~~
pcwalton
Iterators don't protect against iterator invalidation due to e.g. emptying a
vector while you iterate over it. Accessing elements through an invalidated
iterator is undefined behavior and can lead to exploitable security
vulnerabilities.

Even modern C++ has very unsafe parts.
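
A sketch of the hazard and the correct idiom (erase_zeros is an illustrative
name):

```cpp
#include <vector>

// The hazard: erase() invalidates the iterator passed to it (and growing a
// vector can invalidate every iterator into it); using one afterwards is
// undefined behavior, which is exactly the exploitable case described above.
//
//   for (auto it = v.begin(); it != v.end(); ++it)
//       if (*it == 0) v.erase(it);   // ++it on the invalidated 'it' is UB
//
// The correct idiom: erase() returns the next valid iterator.
void erase_zeros(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it == 0)
            it = v.erase(it);
        else
            ++it;
    }
}
```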

~~~
roel_v
You're arguing a straw man here. bcoates is saying, and I agree, that the
usual examples being given on how horrible C++ is, are not idiomatic C++ and
are used only by people who don't have any experience using C++. Of course
it's easy to come up with examples of when things might go wrong. C++ is a
powerful language, and with great power comes great responsibility, pardon the
pompousness of that phrasing. C++ isn't perfect by a long shot, but the
reasons brought forth in the OP and most of this discussion are _not_ examples
of _real_ problems.

~~~
zxcdw
Many times it's the _human error_ which causes the bug/vulnerability to happen
rather than sheer ignorance/lack of experience. In such cases a tool which
prevents this from happening in the first place is superior to one which
doesn't have such a safety feature in it.

For the same reason, we can't ever completely prevent traffic accidents by
requiring higher-skilled drivers. We can prevent traffic accidents by building
cars, lanes, junctions and roads in ways that minimize the damage caused by
human error.

I'd rather use a hammer which refuses to strike my finger even if I try to
make it do so, rather than one which I can smash my fingers with by accident.
I am sure you would too.

~~~
roel_v
Sure, and that's why I e.g. prefer strong typing for bigger systems. The
examples that have been used so far just aren't good examples of what is wrong
with C++, which is what the point was about. At some point, there is a
trade-off between safety and power, and it's one that C++ makes quite well,
IMO.

------
dysoco
If people started writing code with more RAII and smart pointers, this would
be a better world.

Talking about C, well... it's unsafe by nature, let's face it.
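
A minimal sketch of what RAII buys you (Resource and live_objects are
illustrative):

```cpp
#include <memory>

static int live_objects = 0;

struct Resource {
    Resource()  { ++live_objects; }
    ~Resource() { --live_objects; }
};

// With a smart pointer the release is automatic on every exit path,
// including exception unwinding: no delete to forget, no double free.
void use_resource() {
    std::unique_ptr<Resource> r(new Resource);
    // ... work with *r ...
}   // ~Resource runs here when r goes out of scope
```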

~~~
zxcdw
If all people started writing code in a modern C++ equivalent called Rust this
would be a better world. After all in comparison Rust is _safe by nature_
unlike C or C++. :)

~~~
CJefferson
Really? I should switch my apps from C++ to an unfinished language?

I'll consider it once the Mozilla foundation has written an application of
note in it.

~~~
zxcdw
You should not _switch_ your apps. You should stop creating new projects in
C++ once Rust is mature enough for your liking.

The fundamental "problem" we're having/facing with C and C++ is the
investments we've put in. Lots of "infrastructure" in modern day computing
relies on C and C++ and will do so for ages. We can't just drop the projects
and switch to something else (say Rust, or maybe Go), _but we can stop
creating new C and C++ codebases_ to alleviate the problem for the future.

~~~
shmerl
Exactly, it's a chicken and egg problem. C++ has tons of libraries, while Rust
has barely any. Rust can use external C bindings, but not C++ ones yet. I
think if they solve the issue of using C++ libraries from within Rust, the
transition will be much easier, and meanwhile more native Rust libraries will
be created.

~~~
rplacd
Is the C++ ABI issue _ever_ going to be solved?

~~~
pjmlp
Yes, in the cases where the operating system is written in C++.

The C ABI is the operating system ABI, and it is only ubiquitous in operating
systems written in C. Other operating systems use whatever ABI the system
offers.

------
chipsy
Sufficiently Smart Compilers vs. Sufficiently Dumb Code

~~~
jacques_chester
A better way to put this would be Sufficiently Smart Compilers vs
Insufficiently Defined Languages.

------
X4
It will exist as long as there are people porting C to other architectures!

C has been around since 1972; it is one of the most widely used programming
languages of all time, and there are very few computer architectures for which
a C compiler does not exist. Many later languages have borrowed directly or
indirectly from C, including C#, D, Go, Java, JavaScript, Limbo, LPC, Perl,
PHP, Python, and Unix's C shell.

------
anuraj
Nothing is future proof - don't worry. We have only been programming for the
last 60 years. C has endured 40 of those. That is no guarantee that it will
endure further. But the point is that programming practices have not
drastically changed during the course of these years. As and when a disruption
occurs there, almost all our current tools will be rendered obsolete.

------
jbert
I don't see how this issue is specific to C/C++.

Don't all languages have "don't do that" corners, even if they are just bugs
in the current versions of the compilers/interpreters?

C and C++ at least _tell_ you where some of these are, so actually the
situation is better?

------
Executor
If assembly were regular writing, then C would be cursive. I would like to see
D/Go/Rust succeed where C/C++ have failed.

------
malkia
People use C/C++ the way you drive on streets and freeways - the sign says
65mph max, yet everyone else is doing 70. Just don't go too much over it.

Laws are to be broken, and C/C++ is the wild west in this respect - cowboy
programming is welcomed.

And I love it :)

------
nib952051
>> We ditch the C and C++, and port our systems code to Objective Ruby or
Haskell++ or whatever.

omfg:))

~~~
BadDesign
Objective Ruby ? What's that?

~~~
nib952051
It's a suggestion from the article of something to code in, and I have no idea
wtf it is

------
cmccabe
Wow, C and C++ have undefined behavior? I bet nobody knows that unless... they
took an undergrad comp sci class.

Why is this on HN?

Use the right tool for the job. Sometimes that's C or C++, sometimes it's not.

~~~
scott_s
There is substantially more sophistication to his points than you give him
credit for. If someone who is an expert in something - and he _is_ - says
something about their area of expertise that you think is obvious and simple,
consider perhaps that it's your level of understanding that is lacking, not
theirs.

~~~
cmccabe
Academics have been whining about C and C++ since at least the 1980s. But if
you ask any three academics what programming languages they like, you'll get
three different answers, depending on what department and program they are in.

I'm sure Dr. Regehr is a smart guy, but I don't consider academics good
sources of advice on software engineering, for the same reason I don't get sex
tips from Catholic priests. Also, John Regehr's CV relates more to static
analysis than software engineering anyway.

------
nnq
semi-offtopic: ...nice to know about Haskell++, never heard of it before.
Hopefully the lung cancer C++ joke doesn't apply :)

