
A Usable C++ Dialect That Is Safe Against Memory Corruption - ibobev
http://ithare.com/a-usable-c-dialect-that-is-safe-against-memory-corruption/
======
devit
This works, although the downside compared to Rust is that soft pointer
validity is checked at runtime, meaning that a program that compiles can still
randomly fail at runtime and that performance is worse due to the checks.

The key idea and massive difference from standard C++ is that object
destruction is delayed until a "quiescent state" happens, in what is a
reframing of RCU
[[https://en.wikipedia.org/wiki/Read-copy-update](https://en.wikipedia.org/wiki/Read-copy-update)],
which allows raw pointers to be used freely as long as none survive across a
quiescent state.

[note however that this system allows taking pointers to stack variables, so
they have to restrict raw pointers to function arguments only - it would be
better to also introduce a "heap-only" pointer that can be freely
returned/stored on the heap/etc. but can't be stored in types that live across
a quiescent state, from which stack-or-heap raw pointers can be derived]

This also results in the downside that things like mutexes can only be safe if
they are kept locked until a quiescent state happens, since that's the only
lifetime that the system understands.

Likewise, you can't do things like prevent updating a collection while
iterating unless you are fine with freezing the collection until a quiescent
state happens.

In general, you are much better off using Rust (or an equivalently expressive
language, if it existed), since it lets you statically check for correctness,
doesn't have to delay freeing memory, and lets you use lifetimes and linear
types to secure mutex locking, collection iteration, and other things where
lifetimes are essential.

~~~
hedora
I don’t understand your comment about how rust pointers are safer than soft
pointers. The article explains how to implement a wide variety of pointer
semantics, all of which are memory safe (throw an exception on explicit use
after free, use the type system to have the compiler statically check the
pointers are live, use dynamic cast, etc). Looking online, I see that people
implement all the same primitives in rust, with exactly the same safety
caveats.

Also, the container and mutex tricks you mention sound interesting, but I
don’t see why they can’t also be used in C++ (which has a Turing-complete type
system / checker).

~~~
masklinn
> use the type system to have the compiler statically check the pointers are
> live

It doesn't explain how it would statically ensure that a moved-from unique_ptr
(or equivalent) can not be used. In fact the only mentions of moves are that
owning pointers can only be moved and soft pointers can be moved or copied,
but C++'s move does not remove any access, it just moves the content leaving
the moved-from object in a "valid but unspecified state".

Note that valid != safe. Dereferencing a moved-from unique_ptr is unsafe for
instance.

Rust's affine types solve this issue, a moved-from type (Box included) simply
can't be used, its scope ends when it's moved.
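
To make the point above concrete, here is a minimal sketch (the function name is made up for illustration). For std::unique_ptr specifically, the standard guarantees the moved-from pointer is null, but the name stays in scope and the compiler would still accept a dereference of it, which would be undefined behavior. The sketch only inspects the pointer and never dereferences it.

```cpp
#include <memory>
#include <utility>

// Hypothetical helper: returns true when the moved-from pointer is null
// and the destination owns the value.
bool moved_from_unique_ptr_is_null() {
    std::unique_ptr<int> src = std::make_unique<int>(42);
    std::unique_ptr<int> dst = std::move(src);  // ownership moves to dst

    // `src` is still a usable name here. Writing `*src` would compile,
    // but it would dereference a null pointer (undefined behavior),
    // so this sketch only compares it against nullptr.
    return src == nullptr && dst != nullptr && *dst == 42;
}
```

In Rust the equivalent use of the moved-from binding is rejected at compile time, which is the difference being argued here.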

> Looking online, I see that people implement all the same primitives in rust,
> with exactly the same safety caveats.

Rust's (safe) pointers and references don't throw exceptions on explicit use
after free because such code doesn't compile at all, and its equivalent to
dynamic_cast has to be very specifically opted in:
[https://doc.rust-lang.org/1.19.0/std/any/trait.Any.html#meth...](https://doc.rust-lang.org/1.19.0/std/any/trait.Any.html#method.downcast_mut)

~~~
nobugs
> Rust's (safe) pointers and references don't throw exceptions on explicit use

Which essentially comes at the cost of having Java-style semantic memory leaks
(very generally, _any_ kind of keeping-an-object-as-long-as-at-least-one-
reference-exists suffers from it) => we still have to pick our poison
(personally, I _strongly_ prefer to avoid refcounting, and it does work like a
charm in a few very serious million-LoC/billions-transactions projects, but I
do agree that opinions may differ).

~~~
masklinn
> Which essentially goes at the cost of having Java-style semantic memory
> leaks (very generally, _any_ kind of keeping-an-object-as-long-as-at-least-
> one-reference-exists suffers from it)

Rust references work the opposite way. References don't extend the lifetime of
their source, and a reference outliving its referent is a _compile-time_
error.

> we still have to pick our poison (personally, I _strongly_ prefer to avoid
> refcounting, and it does work like a charm in a few very serious million-
> LoC/billions-transactions projects, but I do agree that opinions may
> differ).

I have no idea what the hell you're talking about, but you seem to suffer from
pretty significant misunderstandings.

~~~
nobugs
I'm still speaking about reference-counted Rc<T>, which inevitably suffers
from memory leaks. And moreover - _any_ implementation which avoids throwing
an exception has, in quite a few use cases, no other choice than to resort to
keeping the stuff until the last reference to it is killed, inevitably causing
Java-style semantic memory leaks.

P.S. FWIW, Rust's references ~= OP's "naked pointers" (NOT 'soft pointers'),
and SaferCPP's 'scoped pointers'. A useful tool, but is not sufficient in
quite a few real-world use cases.

------
btilly
I have a question about this.

Articles like
[http://blog.llvm.org/2011/05/what-every-c-programmer-should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html)
have convinced me that even if C or C++ reads logically like it
is safe, there is a possibility that the compiler can rewrite your code in an
acceptable way according to the standards such that the checks that are
clearly visible in your code disappear, opening up the very problems that you
thought you were protected against.

Is there any possibility that, after an aggressive compiler gets done with
inlining and optimization, this could happen here in some way? Can it be
proven that if the compiler works according to the standard this won't
happen, even if the programmer accidentally trips on undefined behavior?

~~~
saagarjha
As far as I'm aware, if you stay within the confines of smart pointers (and
don't drop down to the raw pointer it owns) you will never encounter undefined
behavior. You may have crashes if you try to double free something, but these
are defined to crash rather than letting the compiler optimize out checks.

~~~
zach43
Wouldn't you have a problem with reference cycles in C++ smart pointers? Not
sure if they do anything special to prevent this.

~~~
saagarjha
std::weak_ptr is there for resolving reference cycles.
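
A minimal sketch of what that looks like in practice (the Node type and function name are made up for illustration): the forward link owns, the back link only observes, so the pair is destroyed once the last outside shared_ptr goes away. If `prev` were a std::shared_ptr instead, the two nodes would keep each other alive.

```cpp
#include <memory>

// Hypothetical doubly linked node.
struct Node {
    std::shared_ptr<Node> next;  // owning link
    std::weak_ptr<Node> prev;    // non-owning back link, breaks the cycle
};

// Builds a two-node ring and reports whether both nodes were destroyed
// once the local shared_ptrs went out of scope.
bool cycle_is_released() {
    std::weak_ptr<Node> watch_a, watch_b;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;   // a owns b
        b->prev = a;   // b only observes a
        watch_a = a;
        watch_b = b;
    }
    return watch_a.expired() && watch_b.expired();
}
```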

~~~
pdpi
Which means you have to design around it. You can definitely run into problems
with cyclical shared_ptr dependencies.

~~~
saagarjha
Well, you have to do this for every language that does reference counting.

~~~
pjmlp
Either that, or have a cycle collector in addition to the normal reference
counting optimizations.

------
pcwalton
This technique is mostly a garbage collector, as I see it. Postponing memory
destruction until the stack is empty is a special case of deferred reference
counting [1], where sweep can only happen with an empty stack. If the "soft
pointers" are implemented with reference counting, that's also a type of GC.

On the other hand, the tagged pointer implementation strategy for "soft
pointers" isn't really garbage collection, but it does have much of the same
overhead. Pointer reads must check the tag ID and throw, which is like a read
barrier [2]. Writes through a pointer must do the same, similar to a write
barrier [3]. And that's not getting into the overhead of multithreading; I see
no reasonable way to implement this scheme in a multithreaded world. I expect
that a fast GC without read barriers will significantly outperform this
scheme. As much as everyone complains about the speed of GC, garbage
collection is hard to beat!

[1]:
[http://www.memorymanagement.org/glossary/d.html#term-deferre...](http://www.memorymanagement.org/glossary/d.html#term-deferred-reference-counting)

[2]:
[http://www.memorymanagement.org/glossary/r.html#term-read-ba...](http://www.memorymanagement.org/glossary/r.html#term-read-barrier)

[3]:
[http://www.memorymanagement.org/glossary/w.html#term-write-b...](http://www.memorymanagement.org/glossary/w.html#term-write-barrier)

~~~
nobugs
> Pointer reads must check the tag ID and throw, which is like a read barrier

Usually, "read barrier" is understood as a multithreading concept - and OP has
nothing to do with MT. In other words, no "read fence" is necessary (simply
because it lives in a perfect single-threaded world). And from this POV, it is
extremely difficult to beat this scheme with any popular multithreaded GC. As
a side note, the proposed scheme DOES allow 'naked' pointers, so the
relatively expensive (costing ~4 CPU cycles, which is not much to start with)
conversion from 'soft' into 'naked' has to be done only _very_ occasionally,
and after the conversion, we're working with good old plain pointers, which
just happen to be safe due to the way they're used.
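
As a rough single-threaded sketch of the tag check being debated (all names here, Arena, Soft, create, destroy, get, are hypothetical and not the article's API): a 'soft' pointer is a slot index plus an expected tag, and converting it to a 'naked' pointer costs one comparison. For brevity this sketch never reuses slots; an implementation that did reuse them would rely on the tag to tell the old occupant from the new one.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical arena handing out tag-checked "soft" pointers.
template <typename T>
class Arena {
public:
    struct Soft {                 // slot index plus the tag it expects
        std::size_t slot;
        std::uint64_t tag;
    };

    Soft create(T value) {
        slots_.push_back({std::make_unique<T>(std::move(value)), next_tag_});
        return Soft{slots_.size() - 1, next_tag_++};
    }

    void destroy(Soft p) {
        Slot& s = checked(p);
        s.object.reset();
        s.tag = 0;                // 0 never matches a live tag
    }

    // "Soft to naked" conversion: one tag comparison, then a plain pointer.
    T* get(Soft p) { return checked(p).object.get(); }

private:
    struct Slot {
        std::unique_ptr<T> object;
        std::uint64_t tag;
    };

    Slot& checked(Soft p) {
        if (p.slot >= slots_.size() || slots_[p.slot].tag != p.tag)
            throw std::runtime_error("soft pointer used after free");
        return slots_[p.slot];
    }

    std::vector<Slot> slots_;
    std::uint64_t next_tag_ = 1;
};

// Returns true if a live soft pointer works and a stale one throws.
bool stale_soft_pointer_throws() {
    Arena<int> arena;
    auto p = arena.create(7);
    int* naked = arena.get(p);    // checked once
    *naked += 1;                  // then used like any raw pointer
    bool ok = (*arena.get(p) == 8);
    arena.destroy(p);
    try {
        arena.get(p);             // tag no longer matches
    } catch (const std::runtime_error&) {
        return ok;
    }
    return false;
}
```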

------
duneroadrunner
If anyone is really interested in this sort of thing, I suggest you take a
look at SaferCPlusPlus[1]. It is "A Usable C++ Dialect That Is Safe Against
Memory Corruption" (including data races). And it already exists.

And I think it's better than this proposed dialect in that most of the
(safety) restrictions are enforced without requiring extra tooling, and it's
much less restrictive. Most existing C++ code can be converted directly. And
the run-time overhead is kept to a minimum. Btw these advantages apply versus
the Core Guidelines[2] as well.

[1] shameless plug:
[https://github.com/duneroadrunner/SaferCPlusPlus](https://github.com/duneroadrunner/SaferCPlusPlus)

[2]
[https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...](https://github.com/duneroadrunner/SaferCPlusPlus#safercplusplus-versus-the-core-guidelines-checkers)

~~~
nobugs
I happen to like quite a few things from it, but... there is a Big Fat Hairy
Difference(tm) between "safe" and merely "safer". Make it "guaranteed to be
safe" (which will most likely require tooling) rather than merely "safer" -
and I will be the first one to promote it myself :-). Also - it would be gr8
to reduce the number of different concepts a developer needs to keep in mind
while programming. In OP (assuming that tooling does exist) it is quite
simple: there are only 3 concepts, with 2 of them ('naked' and
'owning'=unique_ptr<>) being already very familiar; OTOH, the current
implementation of SaferCPlusPlus reminds me of ALGOL68 - where it was possible
to specify _everything_, but choosing the right thing was so time-consuming
that it never really flew.

~~~
blub
SaferCPlusPlus has two big issues which prevent me from using it: confusing
class naming and too many concepts.

I do like the ideas it builds on, and I will probably implement a simplified
version for my needs...

~~~
duneroadrunner
Yes, documentation and class names are not SaferCPlusPlus' strong points at
the moment. Perhaps the easy way to get started is just to use the elements in
the "mse::mstd" namespace, like vector, array, string, string_view, etc. which
are just safe, compatible implementations of their namesakes in the "std"
namespace.

As for the pointers, there's a slightly out-of-date article[1] that tries to
explain them with examples. But a simple option is to just replace all your
raw pointers with "registered" pointers. It's not performance optimal, but
it's safe and simple.

But yes, better introductory documentation and examples are needed. There is
not yet a forum for those picking up SaferCPlusPlus, but for now you can post
any questions or suggestions in the issues section[2].

[1]
[https://www.codeproject.com/Articles/1093894/How-To-Safely-P...](https://www.codeproject.com/Articles/1093894/How-To-Safely-Pass-Parameters-By-Reference-in-Cplu)

[2]
[https://github.com/duneroadrunner/SaferCPlusPlus/issues](https://github.com/duneroadrunner/SaferCPlusPlus/issues)

------
Jeaye
Even when you use RAII, const-by-default, shared pointers, type-rich APIs, and
the like, you're still using C++. That means you're still tied to C's legacy
defaults (of UB) and that also means you're still using C++ value categories.
If your "safe" C++ subset uses references, it can't be guaranteed to be safe
(since plenty of _valid_ code will lead to UB).

More info in the value category cheat sheet:
[https://github.com/jeaye/value-category-cheatsheet/blob/mast...](https://github.com/jeaye/value-category-cheatsheet/blob/master/value-category-cheatsheet.pdf)

------
alacombe
TL;DR: use smart pointers, RAII semantics and STL containers/iterators. Though,
I'd have a few criticisms...

> Rules to ensure memory safety

These "rules" only protect you against object lifetime issues, not overflows
/ underflows and other kinds of memory issues.

> ‘owning’ pointers are obtained only from operator new

no, you shall be using std::make_{unique,shared}(...) which will protect you
against leaking memory if exceptions are raised.
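
A small sketch of why that matters (Widget, g_live, may_throw and consume are made-up names). Before C++17, an argument like `std::unique_ptr<Widget>(new Widget)` next to a throwing sibling argument could leak, because the `new` could run, then the other argument could throw before the unique_ptr took ownership. With std::make_unique the allocation and the ownership handoff are a single function call, so the counter below ends at zero whichever argument is evaluated first.

```cpp
#include <memory>
#include <stdexcept>

int g_live = 0;  // counts Widgets currently alive (illustration only)

struct Widget {
    Widget() { ++g_live; }
    ~Widget() { --g_live; }
};

// Always throws; stands in for any argument expression that can fail.
int may_throw() { throw std::runtime_error("boom"); }

void consume(std::unique_ptr<Widget>, int) {}

// Returns the number of Widgets still alive after the failed call.
int live_widgets_after_failed_call() {
    try {
        consume(std::make_unique<Widget>(), may_throw());
    } catch (const std::runtime_error&) {
        // If a Widget was created, its unique_ptr destroyed it during unwinding.
    }
    return g_live;
}
```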

> Calling a function passing the pointer as a parameter, is ok.

Correct, but you can still shoot yourself in the foot. Best is to pass a
[const] reference to the function called.

> This only leaves us with functions such as strchr()

Don't use C API. The STL should provide you with enough API to use the proper
C++ types, either std::string or std::string_view in C++17 if possible.

> and also prohibits C-style cast and static_cast with respect to pointers

IIRC, you can't static_cast<> a pointer, you'd have to reinterpret_cast<> it,
which the document does mention.

> For arrays, we can always store the size of the array within our array
> collection, and check the validity of our ‘safe iterator’ before
> dereferencing/indexing

use std::array.

~~~
saagarjha
Mostly agree, but:

> Don't use C API. The STL should provide you with enough API to use the
> proper C++ types, either std::string or std::string_view in C++17 if
> possible.

Sometimes you're working with C API that gives you back a char * that they've
already allocated. AFAIK there isn't a way to create an std::string out of
that without a copy.
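
One copy-free option in C++17, sketched under assumptions: c_api_make_greeting below is a stand-in for whatever C function hands back a malloc'ed, NUL-terminated buffer, and FreeDeleter / CString are made-up names. A std::unique_ptr with a free()-based deleter takes ownership, and a std::string_view reads the same bytes without copying. The view must not outlive the owner.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string_view>

// Stand-in for a C API returning a malloc'ed, NUL-terminated buffer
// (hypothetical function, for illustration only).
char* c_api_make_greeting() {
    const char text[] = "hello from C";
    char* buf = static_cast<char*>(std::malloc(sizeof text));
    if (buf) std::memcpy(buf, text, sizeof text);
    return buf;
}

// Releases the buffer with free(), matching the C-side malloc().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using CString = std::unique_ptr<char, FreeDeleter>;

// Returns true if the view sees the text and aliases the original buffer.
bool view_without_copy() {
    CString owned(c_api_make_greeting());
    if (!owned) return false;
    std::string_view view(owned.get());  // no allocation, no copy
    return view == "hello from C" && view.data() == owned.get();
}
```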

> you can't static_cast<> a pointer

You can static_cast a void * into other kinds of pointers.

> use std::array

Do you mean std::vector and at()?

~~~
scott_s
> Do you mean std::vector and at()?

No, std::array was added in C++11:
[http://en.cppreference.com/w/cpp/container/array](http://en.cppreference.com/w/cpp/container/array)

~~~
saagarjha
std::array has its size fixed at compile time, so there's no reason for you to
have to do bounds checking…

~~~
blub
char buffer[100] also has a fixed size and is responsible for most buffer
overflow bugs :)

array::at is just as mandatory as vector::at IMO.
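
A quick sketch of the difference (the function name is made up): with a run-time index, `buffer[index]` on a std::array is unchecked, just like the raw char buffer[100], while `at()` throws std::out_of_range.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Returns true if writing at `index` was rejected by the bounds check.
bool at_rejects_out_of_range(std::size_t index) {
    std::array<char, 100> buffer{};
    try {
        buffer.at(index) = 'x';  // checked access
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Here `at_rejects_out_of_range(100)` is rejected, while index 99 is accepted.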

------
wilun
Dialects are of limited use, because they are dialects... New dialects are
arguably of even more limited use, because better languages now exist where
the desirable characteristics are enforced not by using a dialect but by the
core language, and safety checking is not optional. (Also, I'm somewhat
curious about why the proposed dialect talks about things "similar to
unique_ptr" and so on: just use the real thing - at least it would be less of
a dialect and more of modern standard C++.) Dialects enforced by wishful
thinking, or at best by ad-hoc tools maintained by a too-small community, will
perish in front of well-architected languages maintained by a real
community.

Those better languages have even been used to ship some important code in big
projects made of tons of legacy code -- so I'm not even sure an interop
argument could be made.

~~~
hedora
One point of a dialect (aka “coding standards”) is that you can evolve legacy
code bases toward them with a series of simple refactorings instead of by
rewriting from scratch.

For me this is the big advantage of C++: it is possible to backport virtually
any language feature you want to it, thanks to the combination of modern
template programming and low-level C-style bit twiddling.

------
mcguire
Interesting article. I hope the author has a chance to take a look at the Pony
language, which he's described the core of. Now all it needs is a capability
system to statically ensure that the data in sent messages is safe without
copying. (And to move those runtime checks into the type system.)

------
sriku
This is pretty close to the "autorelease pool" concept in Objective-C - an
idea which I copied in C++ for a product around 10 years ago to good effect.

You wrap an autorelease pool around every turn of the event loop, which is
the deferred memory release mentioned in the article (I admit I only scanned
it). Within the "react" part, this gives you a much cleaner way to code even
if your code involves raw pointers to objects, as long as they are allocated
on the pool and aren't being transferred to another thread or caught in an RC
loop - both of which we managed using custom smart pointers.
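
A minimal sketch of that pattern, assuming a single-threaded loop (ReleasePool, make, drain and run_event_loop are made-up names, not Objective-C's or the article's API): objects allocated during a turn are owned by the pool, handlers use plain raw pointers, and everything is released when the turn ends.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool that owns everything allocated during one loop turn.
class ReleasePool {
public:
    template <typename T, typename... Args>
    T* make(Args&&... args) {
        auto owned = std::make_shared<T>(std::forward<Args>(args)...);
        T* raw = owned.get();
        objects_.push_back(std::move(owned));  // shared_ptr<void> keeps the right deleter
        return raw;
    }

    // Destroys every pooled object; raw pointers handed out earlier
    // must not be used after this call.
    void drain() { objects_.clear(); }

    std::size_t size() const { return objects_.size(); }

private:
    std::vector<std::shared_ptr<void>> objects_;
};

// One pool drain per event: the "react" code works with raw pointers only.
int run_event_loop(const std::vector<int>& events) {
    int total = 0;
    ReleasePool pool;
    for (int e : events) {
        int* scratch = pool.make<int>(e * 2);  // valid until the end of this turn
        total += *scratch;
        pool.drain();                          // end of turn: deferred release
    }
    return total;
}
```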

------
jasonhanley
I feel like stuff like this is why golang and rust were created.

(edit: Forgot about Rust, sorry!)

~~~
ansible
_I feel like stuff like this is why golang was created._

Or more properly, why Rust was created.

I've grudgingly used C++ on some projects because of other constraints such as
the target platform. Due to the compiler version, we're stuck on C++11, which
is... OK. But keeping straight what we can use, and what we can't, and which
kinds of pointers we should be using when is a considerable burden.

Still working through "Effective Modern C++" while learning the ins and outs
of it in general.

------
Animats
_" Now, we can extend our allocation model with a few additional guidelines,
and as long as we’re following these rules/ guidelines, our C++ programs WILL
become perfectly safe against memory corruptions."_

What could possibly go wrong?

~~~
saagarjha
If you're not being facetious, then really, not much. You really can't go
wrong with smart pointers unless you explicitly try to access the memory they
manage rather than going through their normal interface (i.e., as long as you
avoid get()). Shared pointers are basically reference counted, just like the
way many other languages handle memory management.

~~~
kolpa
> > as long as we’re following these rules/ guidelines

If that's an assumption, it's not worth much. A safe language is one that
enforces the rules, not one that hopes the program authors self-enforce.

~~~
dcookie
I don't think anyone claimed the _language_ was safe.

~~~
Animats
Title: "A Usable C++ Dialect That Is Safe Against Memory Corruption".

Any questions?

~~~
alacombe
The author claims that the rules described, "extending" standard C++, enforce
protection against memory corruption, and it is this author-described subset
that is claimed to be safe, not C++ in general. Think of English vs.
Americanized English: largely the same, but two distinct entities.

In C++, you are free to shoot yourself in the foot. In Rust, you have a
Government inspector ensure that you always point the gun not just in a "safe"
direction, but only toward a crosshair target at a designated gun range.

------
Kenji
This is basically how I program C++. Except that I try to avoid the 'new'
keyword too by std::make_unique and std::make_shared. This way there are
literally zero 'new' and 'delete' or 'malloc' or 'free' calls in your program.

~~~
ape4
Likewise, I avoid -> if possible.

~~~
saagarjha
Like, the operator ->? Why, and what do you use instead?

~~~
kolpa
You could use ".get()." instead of "->", but I don't see how that helps.

~~~
fjsolwmv
Correction: "(*foo)." or, for a smart pointer, "(*foo.get())." instead of "foo->".

But I still don't see how that would help.

------
jeffreyrogers
There's a pretty good comment on this post from a shadow banned user named
Kenji, which I'm reproducing below:

This is basically how I program C++. Except that I try to avoid the 'new'
keyword too by std::make_unique and std::make_shared. This way there are
literally zero 'new' and 'delete' or 'malloc' or 'free' calls in your program.

~~~
saagarjha
I just vouched for that comment, so it should show up now.

~~~
jeffreyrogers
Oh cool, I didn't know you could do that.

