
How to design a replacement for C++ - rcfox
http://apenwarr.ca/log/?m=201007#22
======
phaedrus
I find his talk about a language's suitability for kernel writing to be like
people buying sporty cars they will never race or off road vehicles they will
never drive off road. Most people, _even including_ embedded programmers, will
never actually write (or need to write) a kernel.

Especially his comment about garbage collection: if you're _not_ going to have
garbage collection, frankly, what's the fucking point? You might as well
continue to use C/C++ then.

He is demonstrably wrong that you cannot reliably combine garbage collected and
non-garbage collected memory with a mark sweep collector! My C++ to Io binding
library builds C++ objects in script and stores script objects in C++, and Io
uses a mark-sweep collector and I am using it _right freaking now this very
minute_ in a game I'm programming without trouble. I regularly profile it in
valgrind; there are no memory corruptions. The solution? Give the script-
created C++ objects the same semantics as C++ code does. C++ temporaries are
created in script with a "temp" keyword and get passed by value. Long lived
C++ objects are created with a "new" keyword (in script) which stores a
pointer to the C++ object. You have to manually call "delete" (again, in
script) on the script-object-containing-the-C++-pointer to delete the objects.
Here's the beauty of the logic: as long as you are holding the pointer, the
script object will not be collected. If you do allow the script object to be
collected, by definition that means it is unreachable, so by definition you
FORGOT to call delete on it, ergo the system could then throw an exception if
an undeleted C++ pointer is ever collected. So rather than a problem, it
actually _helps_ you catch undeleted memory.

For C++ objects that hold script objects, I simply force any C++ object that
wants to hold a pointer to a script object to implement my IMarkable
interface, and the script binding is smart enough to know that if a C++ object
implements IMarkable, it gets included in the mark-sweep so it can mark off
its owned script objects.
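
A minimal sketch of the scheme described above, assuming a toy collector. Every name here (`ScriptObject`, `Collector`, `Inventory`) is hypothetical and none of it is Io's API or the poster's binding; only `IMarkable` is named in the comment, and its shape is guessed. The collector reports a leak count where the comment suggests throwing an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// All names are hypothetical; this is not Io's API or the poster's binding.
struct ScriptObject {
    bool marked = false;
    void* native = nullptr;  // set by script "new", cleared by script "delete"
};

// C++ objects that hold script objects implement this to join the mark phase.
struct IMarkable {
    virtual ~IMarkable() = default;
    virtual void markOwned() = 0;
};

class Collector {
public:
    ~Collector() { for (ScriptObject* o : heap_) delete o; }

    ScriptObject* allocate() {
        ScriptObject* o = new ScriptObject;
        heap_.insert(o);
        return o;
    }
    void addRoot(ScriptObject* o) { roots_.insert(o); }
    void addMarkable(IMarkable* m) { markables_.insert(m); }
    std::size_t liveCount() const { return heap_.size(); }

    // Mark from roots and IMarkable holders, then sweep.
    // Returns how many swept objects still wrapped an undeleted C++ pointer.
    std::size_t collect() {
        for (ScriptObject* o : heap_) o->marked = false;
        for (ScriptObject* o : roots_) o->marked = true;
        for (IMarkable* m : markables_) m->markOwned();
        std::size_t leaks = 0;
        for (auto it = heap_.begin(); it != heap_.end();) {
            if ((*it)->marked) { ++it; continue; }
            if ((*it)->native != nullptr) ++leaks;  // script forgot "delete"
            delete *it;
            it = heap_.erase(it);
        }
        return leaks;
    }

private:
    std::unordered_set<ScriptObject*> heap_;
    std::unordered_set<ScriptObject*> roots_;
    std::unordered_set<IMarkable*> markables_;
};

// A C++ object that owns script objects and marks them during collection.
struct Inventory : IMarkable {
    std::vector<ScriptObject*> items;
    void markOwned() override { for (ScriptObject* o : items) o->marked = true; }
};

std::size_t demo_leak_detection() {
    Collector gc;
    Inventory inv;
    gc.addMarkable(&inv);
    inv.items.push_back(gc.allocate());  // kept alive through IMarkable
    gc.addRoot(gc.allocate());           // kept alive because script holds it

    int* nativeObject = new int(42);
    ScriptObject* wrapper = gc.allocate();
    wrapper->native = nativeObject;      // script "new"; "delete" never called

    std::size_t leaks = gc.collect();    // wrapper is unreachable and swept
    assert(gc.liveCount() == 2);
    delete nativeObject;                 // the demo cleans up its own allocation
    return leaks;
}
```

Calling `demo_leak_detection()` should report one leaked native pointer: the wrapper that became unreachable before the script deleted its C++ object.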

------
blasdel
The recently-revealed "Rust" appears to be your salvation:
<http://wiki.github.com/graydon/rust/language-faq>

Written by Mozilla employees, and very much like Go, except that shared state
is immutable by default, no global GC, and no null pointers. It does have fine
struct layout control, RAII always, and type-parametric user code (purely
structural too!).

It does have a real module system for grownups, which even if it doesn't
preclude your fetish for #include, it will at least make you feel bad about
it.

------
hazzen
An immediate issue with the article: the author takes issue with the liberal
attitude taken by C++ in adding features and then proceeds to state:

    
    
      Maybe you like [lambdas], maybe you don't, maybe you think
      they're God's gift to programming and any language
      without them is an infidel. But adding them would be
      harmless, anyway.
    

While I agree on this point, it seems like a poor argument. You can't just
declare a feature harmless if you ignore it. You make lambdas first-class and
you end up baking them into libraries. Now you _must_ use them.

Otherwise no issues, but it does feel like the author is advocating C++0x
minus heavy template metaprogramming - which is a perfectly fine language in
my book. Why not just use that and get on with life?

~~~
apenwarr
If your language isn't completely terrible, then you can choose to use a named
function anywhere an anonymous function will do. The libraries can never force
you to not give your functions a name or not put them in a global context.
That's why it's a harmless feature.

Templates might have been harmless if there was a way to do basic obvious
stuff (callback functions, strings) without using them.

~~~
shasta
I'll hazard the guess that hazzen is interpreting 'lambdas' to involve
closures.

~~~
hazzen
Yes and no. If a library assumes a lambda is easy, you will see a lot of code
like this (using somewhat bastardized OCaml types):

    
    
      interface 'a collection {
        void sort(lt:'a -> 'a -> bool);
      }
    

So, if you want to use lambdas, you would get:

    
    
      foos.sort(fn x y -> x.bar() > y.bar());
    

And if you don't want to use lambdas, you would get:

    
    
      bool sort_by_bar_gt(x:Foo, y:Foo) {
        return x.bar() > y.bar();
      }
    
      foos.sort(sort_by_bar_gt);
    

And that is assuming you can nest functions, but chances are you will have to
put that function somewhere removed from the actual call to sort. This is the
exact problem the STL hits: it assumes, for many things, you want a functional
style - and then it doesn't give you a way of writing lambdas. You are left
with one-off functors littering your code, wishing you could write that
lambda.

I would like to stress: I think lambdas are a requirement for any language, I
just take issue with the argument presented to convince lambda haters.
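
For comparison, here is roughly how the same sort looks in C++ with a one-off functor (the pre-C++0x situation described above) and with a C++11 lambda. `Foo`, `SortByBarGt` and the helper functions are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Foo {
    int value;
    int bar() const { return value; }
};

// Without lambdas: a named functor that lives away from the call site.
struct SortByBarGt {
    bool operator()(const Foo& x, const Foo& y) const { return x.bar() > y.bar(); }
};

std::vector<int> bars(const std::vector<Foo>& foos) {
    std::vector<int> out;
    for (const Foo& f : foos) out.push_back(f.bar());
    return out;
}

std::vector<int> sorted_with_functor(std::vector<Foo> foos) {
    std::sort(foos.begin(), foos.end(), SortByBarGt());
    return bars(foos);
}

// With lambdas: the comparison sits next to the call that uses it.
std::vector<int> sorted_with_lambda(std::vector<Foo> foos) {
    std::sort(foos.begin(), foos.end(),
              [](const Foo& x, const Foo& y) { return x.bar() > y.bar(); });
    return bars(foos);
}
```

Both helpers sort in descending order of `bar()`; the only difference is where the comparison has to live.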

------
edanm
Agree with most parts of the article, one huge issue I didn't see mentioned.

Working with C/C++ for embedded systems often means you have to have total
control of the memory layout of your objects. You sometimes want to define
classes which have data members that directly map on to something in memory,
and you have to know _exactly_ how the class will be laid out in memory, and
more importantly, that it won't take up extra memory without your knowledge.

Specifically, his point #5, "Automatic vtable generation.", might get in the
way. C++ gets away with vtables because they aren't included by default, only
included when you define a virtual method. The same needs to be true of any
replacement.
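
A small sketch of the layout concern, assuming a mainstream ABI with 8-bit bytes. `RegisterBlock` is a hypothetical device layout, not something from the article. The size comparison at the end is typical of common compilers but not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical memory-mapped register block: three 32-bit words, nothing else.
struct RegisterBlock {
    std::uint32_t control;
    std::uint32_t status;
    std::uint32_t data;
};

static_assert(std::is_standard_layout<RegisterBlock>::value,
              "layout must be predictable");
static_assert(sizeof(RegisterBlock) == 12, "no hidden members or padding");
static_assert(offsetof(RegisterBlock, status) == 4, "status is the second word");

// The same fields plus one virtual method: the class stops being
// standard-layout, and on common ABIs it grows by a hidden vtable pointer.
struct VirtualRegisterBlock {
    std::uint32_t control;
    std::uint32_t status;
    std::uint32_t data;
    virtual ~VirtualRegisterBlock() = default;
    virtual void reset() { control = status = data = 0; }
};

static_assert(!std::is_standard_layout<VirtualRegisterBlock>::value,
              "virtual functions break standard layout");

// True on mainstream compilers; the standard does not promise it.
bool vtable_costs_space() {
    return sizeof(VirtualRegisterBlock) > sizeof(RegisterBlock);
}
```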

------
shasta
1\. Hey, I more or less agree with this guy

2\. Hahahahahaha

3\. Aside from the fact that his wording is backwards from his meaning, I
agree that it should be possible to write code that doesn't ever garbage
collect.

4\. Ok

5\. Does this guy know the function of a linker?

6\. Again, he asserts something you are to avoid, when really he should be
emphasizing what you need to support: code without runtime checks.

7\. Wrong. This is one of the few opportunities you have to beat C at its own
game. C hardcodes assumptions about the stack that prevent certain low level
optimizations.

His #8 is then a summary of the entire article, and really is the only valid
point he's made. C is close to a sweet spot for wrapping low level code in
somewhat higher level clothing. You can't beat it at the same level of
simplicity (not if you count the cost of any potential switch anyway). The way
to beat C is to naturally support the high level / low level hybrid that he's
doing with python+C in a single language.

~~~
froydnj
> C hardcodes assumptions about the stack that prevent certain low level
> optimizations.

Would you please expand on this? I can't think of any assumptions C makes
about the stack that C++ wouldn't make; neither can I think of low-level
optimizations that said assumptions prohibit. I'm curious about what you have
in mind.

~~~
shasta
I just meant that the role of the stack is hardcoded. In most assembly languages you
can write to the stack pointer and presto, you're working with a new stack.
This is a capability of assembly that C doesn't expose.

~~~
stcredzero
longjmp()?

~~~
shasta
No, that only lets you jump out - not e.g. jump between several suspended
computations.

~~~
froydnj
Sure you can. This is how user-level threading packages work if you don't have
kernel support.

~~~
shasta
Interesting, it seems many C implementations do support this. It's not
standard C, though.

------
d0m
"(Talking about C++ macro) [...] Well fuck you. If you take it out, I can't
#include stdio.h, and I can't implement awesome assert-like macros. End of
discussion."

This statement is simply wrong, and saying "End of discussion" is useless, if
not plain stupid.

It's like saying: 1+1 = 3, End of discussion.

~~~
apenwarr
I honestly can't begin to contemplate how you expect stdio.h to work (without
writing extra wrappers) if you don't support macros and a preprocessor.

Some of the "functions" in there _are_ macros.

~~~
shasta
import_c<stdio.h>

You can import C headers without your language being a superset of C.

~~~
ToastOpt
Yes. Many a time I have wished to import a library's symbols _without_
importing its macros. Or better yet, import its symbols under alternate names.
"namespace x = import<y.h>" anyone?

~~~
phaedrus
Yes, the biggest problem with #includes is the namespace cross contamination
they can cause. I once tracked a build problem down to someone #defining
usleep(time) to 'sleep(time/1000)' (or something similar). Problem was, usleep
was included later, so the definition of the _real_ usleep got replaced with
gibberish that did math inside a function declaration.
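
A milder variant of that bug can be reproduced in a few lines. `wait_us` and `wait_ms` are hypothetical stand-ins for `usleep` and `sleep`; here the macro is defined after the real function, so the code still compiles and the call is silently redirected instead.

```cpp
#include <cassert>

// Hypothetical stand-ins for usleep()/sleep(); each returns the amount it was
// asked to wait so the effect is observable without actually sleeping.
inline long wait_us(long microseconds) { return microseconds; }
inline long wait_ms(long milliseconds) { return milliseconds; }

// A "helpful" macro from some other header. It ignores scopes and namespaces.
#define wait_us(t) wait_ms((t) / 1000)

// Expands to wait_ms((5000) / 1000): the real function is silently bypassed.
inline long through_macro() { return wait_us(5000); }

// A parenthesized name is not followed by '(' so the function-like macro
// does not expand and the real function is called.
inline long bypassing_macro() { return (wait_us)(5000); }

// Removing the macro restores the original meaning for later code.
#undef wait_us
inline long after_undef() { return wait_us(5000); }
```

Had the `#define` come before the function definition, the definition itself would have been rewritten into something that does not parse, which matches the build failure described above.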

------
nddrylliog
Sounds like he is begging for <http://ooc-lang.org/> =)

Opt-out garbage collection, type inference, operator overloading, full-blown
OO but you can use covers if you just want to have methods on primitive types,
some type inference, saner syntax, you can include C headers all you want (but
it has a real, good module system), one-time declarations in any order,
objects are by-reference....

Besides, you can replace the SDK pretty much as you want, you're not tied to
any 'threads' implementation, the proof being: there's actually an OS written
in ooc! <http://github.com/tsion/oos>

(Of course I'm barely scratching ooc's features right now)

------
angusgr
I have two questions. Which qualities of C++ would one want to maintain in a
replacement, and what is one using the replacement for?

For the second question, the article's answer is: _kernels, drivers, highly
performance-sensitive code like game engines, virtual machines, some kinds of
networking code, and so on. And for me in particular, it also includes new
plugins to existing C-based legacy systems, including Microsoft Office._

I think this is the first step to discussing "a replacement". I fear that
every C++ programmer would give a different answer to each question.

The ideal replacement in the article can be summarised as most/all language
features of C, plus some chosen subset of the features of C++.

Although with most of them implemented differently to how C++ has implemented
them. Presumably that better implementation is to be done without compromising
its C-like qualities. Which sounds to me like the same trap that C++ itself
fell in.

~~~
wvenable
The problem with C++ isn't so much that it's compatible with C or that you
shouldn't pay for any features that you shouldn't use -- it's that they used
the same dumb linker as C and so most advanced features are grossly limited
and ended up as a twisty maze of text manipulation and includes.

~~~
apenwarr
That's no longer true; the linker has been massively extended to support
templates at this point. (Otherwise every file that instantiated a particular
template would result in all the code for that template being duplicated in
the final executable. This actually used to happen in older C++ compilers.)

~~~
wvenable
I don't consider that _massively_ extended -- that's an optimization but it's
not really a fundamental change in the way the linker works.

------
tmsh
I like the article a lot (i.e., the suggestions for a newer alternative). But
C + Python is much trickier than you might think (because of the GIL mainly)
for any non-trivial, high-performance desktop or mobile app. Because any non-
trivial, high-performance app is going to involve queuing and you just can't
do that cheaply in CPython (it's not just the GIL, it's that the interpreter
is more or less a singleton -- Stackless Python is different -- but with a
singleton interpreter on top, the design -- whoa, I feel like I just read a
Paul Graham article on this :) -- the design gets a little top heavy).

I personally prefer C + C libraries + the non-invasive suggestions which are
mentioned which are already found in C++, Obj-C, etc.

~~~
apenwarr
I would have called it bottom heavy :) In python+C, every time something is
too slow, all you have to do is push more stuff down into the C layer. You
have control of the continuum between 100% optimized (100% C) and 100% clean
and readable (100% python).

I don't know any other programming environment that can claim that without
adding tons of complication (the C/python interface layer is the simplest I've
ever seen, except maybe for c/tcl).

~~~
tmsh
Right, but the problem is that the root node is always going to be Python, and
any code that uses Python is going to have to go through the root node (hence
all Python code is really part of the root subtree). This becomes problematic
if you have more than one queue that you're watching (e.g., input, network,
graphics output, etc.).

Were it possible to have Python code be in some non-root node, it'd be great
(e.g., for parsing text, or doing something algorithmic). As it is, when
everything has to be fed through the interpreter, the design becomes top-heavy
from a performance standpoint because the slow stuff is on top. And any time
you want to optimize something but then use some Python, you're stuck (it
becomes part of the root subtree again).

At least that was my experience. But Python does encourage interesting
alternatives for performance (with generators, etc.). And if you're already
mostly doing Python anyway (I'd say the threshold has to be 70%), then it
seems to make sense (v. the opportunity cost of switching contexts between
languages -- if you are spending more than 30% in C, I find it difficult).

But it's a really interesting issue. C, C++, and Obj-C all take a much longer
time to develop up front. But if you refactor into libraries, maybe not in the
long-run...

~~~
hagy
In my experience, having python as the root node for multithreaded C/python
programs generally doesn't present a problem. When calling into external C
functions through ctypes, the GIL is released such that multithreading at the
C-level is unaffected. I personally use this feature heavily to perform
numerically intensive work with multiple threads as directed through python
with C doing the grinding.

Additionally, all IO code in the standard library and most C-extensions that
I've seen release the GIL around blocking IO calls. This allows multi-
threading to be used for the standard blocking IO use cases. Further, the
computationally intensive routines in numpy and scipy that I’m familiar with
also release the GIL to facilitate multithreading at the C-level.

The GIL only becomes an issue if you want multiple threads to be executing
Python bytecode concurrently. I don’t believe this is a common problem as one
rarely runs computationally intensive code at the python level. Such
computationally intensive work is generally performed through C extensions or
through external C functions (as accessed through ctypes). As long as such C
extensions properly release the GIL, multithreading at the C-level is
unaffected.

~~~
tmsh
These are really good points. I suppose the only problem I see is when the C
modules develop their own hierarchy of types and then they have to interface
with Python's hierarchy of types. That's really the only issue.

If what you're doing in C is numerical or computational in a functional way
that doesn't need a hierarchy of types -- then keeping all your types in
Python and calling into C for IO, computation, etc., makes a lot of sense.

Where I've run into problems is if the C modules themselves get sophisticated
and start having scene graphs or hierarchies of shapes, etc. Because then it
makes sense to mirror the hierarchy in Python -- but then you get real
performance issues.

So I guess if you can ensure that your C modules have a clear functional
interface (or at least one whose side effects are clearly defined) and doesn't
involve anything but a shallow type hierarchy, then Python + C is all good for
that. I.e., the C modules have to be very well-defined or have just about no
'code smell' (at least near their interface with Python). Arguably you can
always do this if you make an effort to refactor your C modules. But yeah,
because Python must always be at the root node, one is sort of constrained to
serve the 'top', so to speak. And that may encourage a sort of top-down design
which is less amenable to bottom-up programming (as mentioned in On Lisp),
etc.

------
andolanra
If I recall correctly, D allows you to disable garbage collection and do your
own memory management, and it does give the programmer a lot of power over how
memory is managed, up to and including literally using C's malloc and free. I
think it's more accurate to say that D has optional, on-by-default garbage
collection. I still wouldn't write a kernel in it, and I can't agree more
about D 2.0, but it's a bit misleading to say that D "requires" garbage
collection.

~~~
Benjo
Can anyone explain or provide a reference for the problems with D 2.0?

------
chmike
D is the best replacement for C++. Why reinvent another language? Many
people have gone through the same process before.

------
hugh4life
Just throwing these out there... since he didn't put out many alternatives...

<http://live.gnome.org/Vala>

<http://ooc-lang.org/>

~~~
junkbit
+1 for Vala

------
pspda5id
ATS (ats-lang.org)

1\. Can embed and call C code with no overhead, uses C representation
internally

2\. Embedded C can use CPP, otherwise the language has macros

3\. GC is optional

4\. No "system" thread

5\. Optional standard library

6\. Static typing, linear types, types as propositions, programs as proof.

7\. ML-style exceptions, not insane

8\. Examples from K&R C translated into ATS
<http://www.ats-lang.org/EXAMPLE/KernighanRitchie/>

------
andymorris
Is there any non-legacy reason to have macros in C++? I've been coding C++ now
for over 3 years, and as soon as I learned all the features (especially
templates), I have never needed them again. Indeed, I despise them, because
they break debuggers and can conceal bugs easily.

I'd like to see an example, if anyone has one. But for me, the template system
is the best thing about C++ - they are much closer in functionality to Lisp
macros than preprocessor macros are!!
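
One concrete way a macro can conceal a bug where a template does not is double evaluation of an argument. `MAX_MACRO` and `max_template` are illustrative names, not taken from any library.

```cpp
#include <cassert>

// Classic function-like macro: each argument is pasted in textually.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Template equivalent: arguments are evaluated exactly once, like any call.
template <typename T>
constexpr const T& max_template(const T& a, const T& b) {
    return a > b ? a : b;
}

// Returns the final value of i after taking a max with a side effect.
int increments_with_macro() {
    int i = 5;
    int j = 3;
    int m = MAX_MACRO(i++, j);  // i++ appears twice in the expansion
    (void)m;
    return i;
}

int increments_with_template() {
    int i = 5;
    int j = 3;
    int m = max_template(i++, j);  // i++ is evaluated once
    (void)m;
    return i;
}
```

With the macro, `i` ends up incremented twice; with the template, once.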

------
news4nobody
Wow it shrunk by +1.16%

[http://www.tiobe.com/index.php/content/paperinfo/tpci/index....](http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html)

Outstanding.

~~~
greyman
:-) One can argue about the validity of "tiobe", but still...as much as people
would like to see that "C and C++ are on the decline and they're just going to
get smaller," it somehow just doesn't happen. ;-)

Even in my case, I was happily coding in C# in the last 3 years, and in
January - bam! - I am on another project written in, well, C++.

------
sliverstorm
> But there will always be programs that have to be written in a language like
> C and C++. That includes kernels, drivers, highly performance-sensitive code
> like game engines, virtual machines, some kinds of networking code, and so
> on.

Thank you. I am not looking forward to the day when people begin to advocate
Java or something like that for these applications, and I am glad to see this
explicitly acknowledged.

------
junkbit
Sounds like Vala: reference counting instead of GC (full pointers underneath
if you want), zero cost to call C, lambdas, generics, signals/properties,
native unicode strings and other sugar such as foreach.

Can't do operator overloading though because they want to compile down to C
without mangling names

------
alextgordon
_2\. Do not remove the cpp preprocessor._

 _7\. One-time declaration/definition of functions._

Aren't these in conflict? A C-style preprocessor is crippled without the
header-file model. You'd do better to replace it with some other kind of
metaprogramming.

~~~
wvenable
If you've got one-time declaration/definition of functions, that's just one
less thing you need the CPP preprocessor for. There would be no conflict, it
would just end up being used for macros only.

I think you might be right though, I'd lose the cpp preprocessor but add
built-in hygienic macros and constants to make up for the loss.

------
phaedrus
Some comments on your proposed additions (my other comments were about the
proposed reductions):

\- I don't think .NET-style generics can be implemented in a statically typed
language that doesn't have a common base class (e.g. everything derives from
class Object). It is an oversimplification to say this, but at some level
generics are implemented as syntactic sugar for typecasts between Object and
the type parameter type. If you want generics instead of templates, a lot of
C#'s design gets pulled in along for the ride. Nothing wrong with that (C# is
a fine language), just as long as you understand it will happen and why. But
also be aware that it means you will never be able to make a generic array
class that performs as fast as a C++ STL vector does - which violates your own
starting criteria, that it has to be as fast as C++ to succeed.

\- Most of the time I wish pass-by-reference were implicit, but pass by value
has some special consequences in C++ that make it unlike many other languages.
I was able to write a class that does fast bit-twiddling on itself, with its
internal representation being an integer, and the entire thing was no more
expensive in memory or speed than C functions bit twiddling on an int. So I
was able to get abstraction and raw speed with no compromises. That's a rare
use case, but I think it is illustrative of why pass by value is in C++ - you
don't need it most of the time, but when you need it, you really need it.
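
A rough sketch of the kind of class the second point describes: a value type wrapping a single integer, passed by value, with checks that it costs no more than the integer itself. `Flags` and its methods are hypothetical, not the poster's actual class.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A bit set whose only state is one 32-bit integer.
class Flags {
public:
    constexpr Flags() : bits_(0) {}
    constexpr explicit Flags(std::uint32_t bits) : bits_(bits) {}

    constexpr Flags with(unsigned n) const {
        return Flags(bits_ | (std::uint32_t(1) << n));
    }
    constexpr Flags without(unsigned n) const {
        return Flags(bits_ & ~(std::uint32_t(1) << n));
    }
    constexpr bool test(unsigned n) const { return ((bits_ >> n) & 1u) != 0; }
    constexpr std::uint32_t raw() const { return bits_; }

private:
    std::uint32_t bits_;
};

// Same size as the raw integer, and copyable like one.
static_assert(sizeof(Flags) == sizeof(std::uint32_t), "no space overhead");
static_assert(std::is_trivially_copyable<Flags>::value, "cheap to pass by value");
static_assert(Flags().with(3).raw() == 8u, "usable at compile time");

// Taken and returned by value, just like an int would be.
Flags set_low_bits(Flags f) { return f.with(0).with(1); }
```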

As for not being able to reseat a reference, I found that frustrating too until
I started learning LLVM and about SSA (static single assignment). The common
answer to why you can't reseat a C++ reference is just that it has something
to do with compiler optimizations. That common answer is too weak; it has
EVERYTHING to do with how the compiler optimizes the code. Basically, if
you're asking why can't I reseat a reference, you don't understand what a
reference is. But it's not your fault: what you really don't understand is
that C++ uses the _word_ reference to refer to a thing that is not exactly
like references as used in other languages. I used to think a reference was
syntactic sugar for a pointer; that's not quite true. A C++ reference
literally IS the object; it's just converted to a pointer during the process
of a function call. The reason you can't reseat the reference is that if you
could the static analysis of the program becomes immeasurably harder and later
in the code the compiler cannot be absolutely sure that it knows what the
reference really refers to.

In a way, not being able to reseat a C++ reference is a lot like not being
able to assign to a constant defined by a macro or not being able to assign a
new address to a function at runtime. It's part of a class of things that
happen at a stage before runtime, and to change the system so that it would be
possible would require introducing a level of indirection.
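
The difference is easy to see in a few lines. Assigning to a reference writes through to the object it names, a pointer can be pointed elsewhere, and `std::reference_wrapper` is one standard way to get the extra level of indirection that makes rebinding possible. The function names are only for illustration.

```cpp
#include <cassert>
#include <functional>

// Assigning to a reference changes the referred-to object, not the binding.
bool reference_is_alias() {
    int a = 1;
    int b = 2;
    int& r = a;
    r = b;  // copies b's value into a; r still names a
    return a == 2 && b == 2 && &r == &a;
}

// A pointer is a separate object, so it can be reseated.
bool pointer_reseats() {
    int a = 1;
    int b = 2;
    int* p = &a;
    p = &b;  // a is untouched; p now points at b
    return a == 1 && *p == 2;
}

// reference_wrapper stores a pointer internally, so assignment rebinds it.
bool wrapper_reseats() {
    int a = 1;
    int b = 2;
    std::reference_wrapper<int> w(a);
    w = std::ref(b);  // rebinds to b
    w.get() = 7;      // writes through to b
    return a == 1 && b == 7;
}
```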

~~~
jaen
Generics requiring type casts and a common base class is Java type-erasure
braindeadness.

CLR (.NET) generics compile separate code for each value type parameter, so
you can have e.g. vectors of structs, with the same performance as in C++. (for
reference types, it uses the same code with an additional hidden "generic
context" parameter, but that's an implementation detail). CLR has pass-by-
value too, and although older versions of the MS.NET JIT did not inline such
functions, the current version should have performance similar to C++.

Unfortunately, few programmers know how to get the maximum performance out of
the CLR.

About not being able to reassign references in C++ - the main reason is
probably because that would require a separate operator, since assigning to a
reference calls operator= on the class.

C++ has subtle interactions in many areas of the language, especially operator
overloading, allowing you to override even assignment. If C++ can not do
something apparently simple, there is probably a good reason for that, only
obvious to those who know the spec inside out.

~~~
grogers
I think this is probably true too, that they didn't want to have a separate
operator just to reassign a reference. I think it could have been made to
work, but just wasn't worth it.

Fortran for example has this - pointers in fortran behave like value types
(like C++ references) when performing operations on them, however there is a
special => operator to change the actual pointer. I never used operator
overloading in fortran but I don't think it would be much of an issue, you
would be able to overload the operator for anything except a pointer type.

------
phaedrus
1\. Instead of the ability to directly call and be called by C/C++, include a
really kick-ass Foreign Function Interface that can JIT the hookups and
wrappers. And more important than _call_ compatibility IMO is C++ object ABI
compatibility. (I realize there is no universal C++ ABI; just pick one and be
compatible with it.) If you can declare a struct or class in the language and
make its layout match a C or C++ class/struct to the point of being able to
pass them back and forth, that's the important feature. The other important
thing is some C API's use callback functions, without passing a user object
that you can use to implement a closure out of it. A language with the ability
to dynamically emit new functions (built in JIT) would be able to build a new
C function just for one object to receive a callback.

2\. Fuck the C preprocessor. An architecture based on #include files is a
joke; you will never make something better than C++ if you stick with that
compilation model, you'll only make another C++ (corollary to the phrase,
those who don't understand Unix are doomed to re-implement it poorly). Here's
an idea: sandbox the god damn shit preprocessor and parse its output to
provide input to the FFI (foreign function interface) JIT compiler. Under no
circumstances allow macros defined in an #include to spill into the language
proper. _This_ is not negotiable.

3\. No, mark-sweep garbage collection can be made to coexist with C++-style
memory management.

4\. If you aren't going to include built-in threading or some better form of
concurrency, why bother? Unless you have a time machine and intend to go back
to 1980 to introduce your language there.

5\. I don't think it's that clear cut (about requiring (or not) a standard
library) but I won't argue this point.

6\. I think you're right in the sense that if you _do_ include dynamic typing
it is going to lead to creating something completely different than the stated
goal. I'd say it would be a _better_ language, but it would not be the
_correct_ language for replacing a statically typed one.

7\. I think exceptions are an important language advance, but I can't dispute
that they can cause a mess of problems when it comes to cross-language (or
cross-thread) boundaries.

8\. If you look at C, it is not too verbose for coding the types of concepts
which it supports (function calls with a small number of control-flow
statements). But it does not allow you to define new concepts. Of _course_ C
is terse for calling functions: _those are C's atoms, its fundamental
language element!_ So IMO comparing to C code based on how easy it is to call
a function is the wrong thing to think about. True improvements would come at
the macro scale from better organization of groups of functions (e.g. into
classes) or at a semantic level, by not needing to specify a function
explicitly at all (e.g. operator overloading).
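
To illustrate the callback problem raised in point 1: when a C API takes a bare function pointer with no user-data argument, the usual workaround without a JIT is a static slot plus a trampoline. This is a sketch of that workaround, not the JIT approach proposed above, and it only supports one target object per trampoline, which is exactly the limitation a JIT would remove. `run_ticks`, `Counter` and the other names are hypothetical.

```cpp
#include <cassert>

// Hypothetical C-style API: the callback receives no user-data pointer.
extern "C" typedef void (*tick_callback)(int);

inline void run_ticks(tick_callback cb, int n) {
    for (int i = 1; i <= n; ++i) cb(i);
}

// The object that actually wants the callbacks.
struct Counter {
    int total = 0;
    void onTick(int value) { total += value; }
};

// One static slot per trampoline: the callback finds its object here.
static Counter* g_tick_target = nullptr;

extern "C" void counter_trampoline(int value) {
    if (g_tick_target != nullptr) g_tick_target->onTick(value);
}

// Routes n ticks into a fresh Counter and returns the accumulated total.
int sum_ticks(int n) {
    Counter counter;
    g_tick_target = &counter;
    run_ticks(&counter_trampoline, n);
    g_tick_target = nullptr;
    return counter.total;
}
```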

~~~
Nitramp

      I think exceptions are an important language advance, but 
      I can't dispute that they can cause a mess of problems 
      when it comes to cross-language (or cross-thread) 
      boundaries.
    

Exceptions are not completely problem-free, but there is no alternative. If
you don't have exceptions, every single line of code:

    
    
      doSomething();
    

Becomes:

    
    
      int result = doSomething();
      if (result != OK) return result;
    

... and that is the simple case, without memory management, an actual result
to return, unfortunate interactions with your other control flow statements,
and so on. And you still end up with your application helpfully saying "ERR:
-1923876".

This simply doesn't work. And few people programming C/C++ appear to do it
correctly; I'd imagine that many security flaws come from improperly
proceeding code that should have checked a return code.
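
The contrast can be sketched side by side. The `Status` codes and the two-step pipeline are invented for illustration; the point is only how much propagation code each style needs.

```cpp
#include <cassert>
#include <stdexcept>

enum Status { OK = 0, ERR_PARSE = 1, ERR_RANGE = 2 };

// Return-code style: every call is followed by a check.
Status parse_digit(char c, int* out) {
    if (c < '0' || c > '9') return ERR_PARSE;
    *out = c - '0';
    return OK;
}

Status check_range(int value) { return value <= 5 ? OK : ERR_RANGE; }

Status process_with_codes(char c, int* out) {
    int value = 0;
    Status s = parse_digit(c, &value);
    if (s != OK) return s;
    s = check_range(value);
    if (s != OK) return s;
    *out = value * 2;
    return OK;
}

// Exception style: the happy path reads straight through.
int parse_digit_or_throw(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

void check_range_or_throw(int value) {
    if (value > 5) throw std::out_of_range("digit too large");
}

int process_with_exceptions(char c) {
    int value = parse_digit_or_throw(c);
    check_range_or_throw(value);
    return value * 2;
}
```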

~~~
supersillyus
I prefer return values over exceptions in languages that make it safe and
easy.

Haskell does a pretty good job, I think. With Maybe or Either, you have to
check it to extract your value, but if you are doing a lot of things that will
result in/use the results of Maybe/Either, you can just operate in a Monad
that makes them the default.

Similarly, languages with multiple return values make you explicitly ignore
the return value.

Exceptions have their benefits, but I have a personal preference for return
values because they make it explicit in the interface. Also, I find that it
makes more sense to handle various success and failure scenarios in the same
bit of code instead of writing a best-case flow and assuming that exceptions
will be thrown and errors will be dealt with somehow.

Exceptions are great if you can't continue due to programmer error, but most
cases where people use exceptions are just expected execution paths that (in
my view) make no sense to handle via non-local control flow.
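
Something similar to the Maybe style can be approximated in C++, assuming C++17 for `std::optional`. The `and_then` helper below is hand-written for this sketch (C++23 adds a member function of the same name); `parse_digit`, `safe_div` and `hundred_over_digit` are made-up examples.

```cpp
#include <cassert>
#include <optional>

// Each step either produces a value or reports "nothing" in its return type.
std::optional<int> parse_digit(char c) {
    if (c < '0' || c > '9') return std::nullopt;
    return c - '0';
}

std::optional<int> safe_div(int numerator, int denominator) {
    if (denominator == 0) return std::nullopt;
    return numerator / denominator;
}

// Hand-written chaining helper: run f only if the previous step succeeded.
template <typename T, typename F>
auto and_then(const std::optional<T>& value, F f) -> decltype(f(*value)) {
    if (!value) return std::nullopt;
    return f(*value);
}

// Failure of either step propagates without any explicit checks here.
std::optional<int> hundred_over_digit(char c) {
    return and_then(parse_digit(c), [](int d) { return safe_div(100, d); });
}
```

The caller still has to inspect the `optional` before using the value, which is the "explicit in the interface" property described above.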

------
jpr
tl;dr, he just wants a language that somehow magically has all the features of
C++ while somehow magically not sucking at the same time. Also, regarding the
first 2: fuck you too, author.

------
bhiggins
I figure closures would be hard to do in a programmer-friendly way without GC.

~~~
moultano
The C++ closure library we use internally at Google has you specify whether
the callback is permanent or not. If it isn't permanent, it's deleted right
after executing. It is a little painful maintaining the memory of your
arguments, but people have generally converged to packing them all into a heap
allocated struct and deleting when appropriate.
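
A guess at what such a scheme might look like, for readers who have not seen one. This is not Google's library; `Closure`, `OneShot`, `Permanent` and `AddArgs` are hypothetical names, and `std::function` is used purely for brevity.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Heap-allocated callback that deletes itself after running unless permanent.
class Closure {
public:
    static Closure* OneShot(std::function<void()> fn) {
        return new Closure(std::move(fn), false);
    }
    static Closure* Permanent(std::function<void()> fn) {
        return new Closure(std::move(fn), true);
    }

    void Run() {
        fn_();
        if (!permanent_) delete this;  // nothing may touch *this after this line
    }

private:
    Closure(std::function<void()> fn, bool permanent)
        : fn_(std::move(fn)), permanent_(permanent) {}

    std::function<void()> fn_;
    bool permanent_;
};

// Arguments packed into a heap struct, freed by whoever runs last.
struct AddArgs {
    int* total;
    int amount;
};

int demo_closures() {
    int total = 0;

    // One-shot: the callback frees its own arguments, then Run() frees the closure.
    AddArgs* onceArgs = new AddArgs{&total, 5};
    Closure* once = Closure::OneShot([onceArgs] {
        *onceArgs->total += onceArgs->amount;
        delete onceArgs;
    });
    once->Run();

    // Permanent: may run many times; the owner frees closure and arguments.
    AddArgs* repeatArgs = new AddArgs{&total, 1};
    Closure* repeat = Closure::Permanent([repeatArgs] {
        *repeatArgs->total += repeatArgs->amount;
    });
    repeat->Run();
    repeat->Run();
    delete repeat;
    delete repeatArgs;

    return total;
}
```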

