
Memory management in C programs (2014) - adamnemecek
http://nethack4.org/blog/memory.html
======
comex
I'm going to claim that the problem being solved here fundamentally comes from
attempting to shoehorn early exit into an ancient codebase that wasn't
designed for it, via longjmp. Exceptions can be a handy language feature, but
plenty of modern code written in languages like C++ (whose exception support
is often eschewed) and Go gets by without them. It just has to be written so
that most functions manually propagate error codes they receive, potentially
after manually unwinding some core state. C code can be written that way too:
the manual cleanup that must be written then extends to mundane buffer
freeing, but that's a tractable and local requirement; no need for complicated
global reasoning around static buffers and custom allocation schemes.
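The "manually propagate error codes, clean up locally" style described above often looks roughly like this in C (a hypothetical sketch; `concat` is illustrative, not from the article, and the code is written so it also compiles as C++):

```cpp
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the style described: the function returns an
 * error code, every failure jumps to a single cleanup path, and each
 * buffer it owns is freed exactly once -- a local, tractable duty,
 * with no global reasoning about static buffers required. */
static int concat(const char *a, const char *b, char **out) {
    int err = -1;
    char *tmp = NULL;
    char *result = NULL;

    tmp = (char *)malloc(strlen(a) + 1);
    if (!tmp)
        goto cleanup;                 /* propagate allocation failure */
    strcpy(tmp, a);

    result = (char *)malloc(strlen(a) + strlen(b) + 1);
    if (!result)
        goto cleanup;                 /* tmp still freed below */
    strcpy(result, tmp);
    strcat(result, b);
    err = 0;

cleanup:
    free(tmp);        /* mundane buffer freeing, but purely local */
    *out = result;    /* NULL on failure */
    return err;
}
```

The caller checks the returned code and propagates it in turn, which is exactly the manual-unwinding discipline being described.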

~~~
deathanatos
> _plenty of modern code written in languages like C++ (whose exception
> support is often eschewed) and Go gets by without [exceptions]._

I would not consider C++ code that eschews exceptions "modern". To eschew
exceptions almost always entails giving up RAII, and if you give up RAII
you've done yourself a huge disservice: you're now stuck manually managing
memory and resources, and suffering all the bugs that come with that. While
RAII does not outright prevent these bugs (which is one of the reasons I'm a
huge fan of Rust), coding along its principles greatly reduces the risk that
you will run into problems.

If you're wondering: giving up exceptions means a constructor has no way to
signal failure, as the result of a ctor in C++ is always either an exception,
or a constructed object. Most "no exceptions" C++ code I've seen opts to
construct what I call "zombie objects": internally, the object tracks that an
error has occurred, and every use of the object must first check that internal
error flag to ensure the object isn't a zombie (and report an error if it
is). The object is effectively a null pointer, with all the trouble that
entails.
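A minimal sketch of such a zombie object (the `File` class here is hypothetical, purely for illustration):

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical sketch of a "zombie object": with exceptions disabled,
// the constructor has no way to report failure, so the object records
// an internal flag that every caller must remember to check first.
class File {
public:
    explicit File(const char *path) : f_(std::fopen(path, "rb")) {}
    ~File() { if (f_) std::fclose(f_); }
    File(const File &) = delete;
    File &operator=(const File &) = delete;

    bool ok() const { return f_ != nullptr; }   // the zombie check

    std::size_t read(char *buf, std::size_t n) {
        if (!f_) return 0;     // every method must re-check the flag
        return std::fread(buf, 1, n, f_);
    }

private:
    std::FILE *f_;
};
```

Forgetting the `ok()` check is exactly the null-pointer-shaped trap the comment describes.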

(I also won't pretend that exceptions are without problems; in particular,
the argument against them, that you can't tell, at a particular point in a
function's code, whether an exception can occur or where it would be caught,
is completely valid, but that is no different in many popular languages, such
as Python, C#, and in some ways, Java. However, I think the advantages of
RAII — and exceptions — outweigh their disadvantages.)

> _It just has to be written so that most functions manually propagate error
> codes they receive_

When I've had to write C, I've found this to be exceptionally tedious to do
correctly, and all too easy to ignore.

~~~
kabdib
I've not seen a large production C++ code base that did anything with
exceptions other than to catch them, record some debug information and
restart.

This is especially true when third party code is involved. C++ is hard enough
to get right without exceptions; dealing with failure in bodies of code that
you didn't write and maybe don't have control over (or even source code of) is
death on wheels.

If that means the code I'm working on isn't modern... I think that's fine.
But the first volume (of three?) of Herb Sutter's _Exceptional C++_ should be
enough to convince anyone that dealing with exceptions "correctly" is time
better spent eschewing exceptions and restructuring your code so you can do
_worthwhile_ hard things.

~~~
deathanatos
> _C++ is hard enough to get right without exceptions_

In what manner? If you use exceptions, you _must_ (IMO) use RAII, or yes, it
will be painful. Every example of "painful C++" I've seen involved someone
ignoring that, and that is fighting the language. I understand that's a bit of
a strawman argument, but you didn't elaborate enough in your post to debate
it.

If you are using RAII, exceptions shouldn't be terribly hard to "get right";
an exception propagating up the stack will free resources as it goes. (This is
the same behavior, again, as numerous other languages; except in C++, I have
the benefit of RAII, in others, I generally do not.)
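For illustration, the unwinding behavior described here can be sketched with standard-library RAII types (`process` is a hypothetical function, not from the thread):

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Illustrative: buf is owned by an RAII type, so when the exception
// propagates out of process(), the destructor runs during stack
// unwinding and the allocation is released automatically; there is
// no manual cleanup on the error path.
static int process(bool fail) {
    auto buf = std::make_unique<std::vector<int>>(1024, 7);
    if (fail)
        throw std::runtime_error("oops");   // buf freed as we unwind
    return buf->at(0);
}
```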

> _dealing with failure in bodies of code that you didn't write and maybe
> don't have control over (or even source code of) is death on wheels._

It's peculiar that I only ever hear this argument against C++, when Python,
C#, and to an extent Java¹ work in the same manner. Were you writing a C++
webserver, for example, I expect that you would do exactly what my current
job's Python webserver does: if an uncaught exception makes it to the main
request handler, it is caught in a catch-all, duly logged, and a 500 served in
response. This is appropriate both in Python and C++ IMO. (Whereas in the
hypothetical C++ side, I gain the benefits of static typing and RAII that I
can never have in Python.)

In my day-to-day Python, we quite often learn about uncaught exceptions — from
third-party code that I didn't write and maybe don't have control over — in
production; sometimes, we will add a handler for that particular exception
where appropriate (it was a bug), or fix the underlying cause (the exception
indicated a larger problem, and allowing it to propagate to a good catch-all
point was appropriate, as was 500'ing the request).

But again, "death on wheels" is not really something I can provide worthwhile
argument against. Perhaps where third-party code has caused pain (at least,
for me) is when a very low level (say, socket error) propagates through the
stack; there might not be a good way to catch these beyond "catch everything",
but again, that's not unique to C++. The biggest problem I've had with these
is that they lack context: _what_ caused this socket error? Or, if some
library catches & re-throws its own exception class, too often do they discard
the inner error, and I lack the root cause (a different lack of context). But
you see this in Python as well. Error codes are perhaps the worst, as they
will quite often force you to lose context; do you expose error codes for all
of your inner error conditions? (And theirs, transitively?). (I've never seen
a non-exception C++ or a C codebase use anything aside from error codes as a
substitute.)

~~~
zvrba
>> _C++ is hard enough to get right without exceptions_

> _In what manner?_

I guess he's referring to noexcept/basic/strong exception guarantees. Yes,
writing code with strong guarantees is hard, but what makes it hard is that
you have to write your code "transactionally", i.e., either 1) prepare changes
and "commit" them in one go, or 2) roll back changes if an exception occurs.

But writing code with "strong guarantee" is just as hard, if not harder, with
error codes.
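The "prepare changes and commit them in one go" approach is commonly written as copy-and-swap; a minimal sketch (the function is hypothetical):

```cpp
#include <vector>

// Sketch of the strong guarantee via "prepare, then commit": all the
// throwing work happens on a copy, and the final swap cannot throw,
// so the original vector is untouched if anything fails partway.
static void append_twice(std::vector<int> &v, int x) {
    std::vector<int> tmp(v);   // prepare: may throw, v unchanged
    tmp.push_back(x);          // may throw, v still unchanged
    tmp.push_back(x);
    v.swap(tmp);               // commit: noexcept
}
```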

~~~
yoklov
It's not harder. You know exactly where that function can return; with
exceptions, it's very hard to know what might throw in practice.

~~~
zvrba
With exceptions you can use something like scoped exit
([http://www.boost.org/doc/libs/1_61_0/libs/scope_exit/doc/htm...](http://www.boost.org/doc/libs/1_61_0/libs/scope_exit/doc/html/index.html))
for automatic rollback and neat cleanup even in multilevel "transactions".
With error codes, nothing happens automatically, hence it's easy to mess up
cleanup/rollback.

Yes, you can still use RAII for rollback/cleanup in combination with error
codes, but then you get the worst of both worlds: the (minor) complication of
writing RAII classes AND code cluttered with manual error-checking.
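Beyond Boost.ScopeExit, the idea can be sketched as a tiny hand-rolled guard (illustrative only, not the Boost API): the rollback runs automatically on any exit path unless the "transaction" is committed.

```cpp
#include <functional>
#include <utility>

// Minimal sketch of a scope guard: the callback runs when the guard
// is destroyed, so rollback happens automatically if an exception
// (or early return) leaves the scope before commit() is called.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
    ~ScopeGuard() { if (f_) f_(); }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;

    void commit() { f_ = nullptr; }   // dismiss the rollback
private:
    std::function<void()> f_;
};
```

Nesting one guard per step gives the multilevel "transactions" mentioned above.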

------
RustyRussell
Hmm, for C programs I use
[https://talloc.samba.org/talloc/doc/html/index.html](https://talloc.samba.org/talloc/doc/html/index.html)
or the more cut-down ccan/tal
[http://ccodearchive.net/info/tal.html](http://ccodearchive.net/info/tal.html)

There's sometimes a great insight available in viewing your allocation
hierarchy on a running program, too.

------
ableal
Happened to go look up the Nethack.org page, and hey, the author just
"ascended" to the DevTeam earlier this month:

[http://www.nethack.org/#News](http://www.nethack.org/#News)

_"

The DevTeam would like to welcome its latest members. Both of these folks
should be familiar to the members of the NetHack community:

Alex Smith, who created the AceHack and NetHack 4 variants. Alex is an expert
on the inner workings of the game and the ways in which they can be exploited.

Patric Mueller, who is probably best known as the creator of the UnNetHack
variant. Patric also created NetHack-De (German translation of NetHack) and
has considerable involvement in the Junethack tournament. Before contracting
the dreaded coding bug, Patric was a long time player, having started out on
NetHack 3.0 on the Amiga back in the late 80s.

Alex has indicated that, at least initially, he'd be concentrating on improved
interface, improved internals, and new features in areas that don't affect the
game's gameplay, drawing on some of the features introduced in AceHack and
NetHack 4.

Patric's initial focus will be to identify and incorporate changes from a
variety of variants into NetHack, in order to expand the gameplay while
keeping NetHack's spirit and appeal intact.

Please join us in welcoming Patric and Alex to the DevTeam as we start plans
for the next major release.

For the DevTeam, Mike Stephenson

"_

------
gp7
In my opinion, writing code that consists of malloc/free pairs is inviting all
the problems RAII and garbage collectors were designed to solve, and which
Rust can now statically check for correctness. So that's basically it: at this
point, I don't use the heap to manage memory in C, because if I did, I'd
instead use either Rust or, more likely, a garbage-collected language. The
fact is, despite going out of their way to use the stack as much as possible
in this article (including copying into the stack), it's sort of a wonder they
didn't realise that a second, manually managed stack would be easy to manage
_and_ orthogonal to the program stack. Yes, it has a top, but so does the
program stack, and so does the memory backing the heap (yes, yes, pages, et
cetera). It doesn't get you all the way there, but for many small programs,
which might otherwise do a lot of mallocing and freeing as they dutifully
initialize and destroy objects, you can get away with just _one_ call to
malloc, and _never_ pop the second stack. Allocate a few stacks in your stack,
throw them in a free list, and now you're cooking with gas.
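The "second, manually managed stack" described here is essentially a bump allocator; a hypothetical sketch, with one malloc backing the whole region (names and the alignment choice are illustrative):

```cpp
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a second stack: one malloc backs the whole
 * region, allocation is a pointer bump, and "freeing" is just
 * resetting the top -- no per-object free calls at all. */
typedef struct {
    char  *base;
    size_t top;    /* like the program stack, it only has a top */
    size_t cap;
} Arena;

static int arena_init(Arena *a, size_t cap) {
    a->base = (char *)malloc(cap);   /* the one call to malloc */
    a->top = 0;
    a->cap = cap;
    return a->base ? 0 : -1;
}

static void *arena_push(Arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;      /* keep allocations aligned */
    if (a->top + n > a->cap)
        return NULL;                 /* region exhausted */
    void *p = a->base + a->top;
    a->top += n;
    return p;
}

static void arena_pop_to(Arena *a, size_t mark) { a->top = mark; }
static void arena_destroy(Arena *a) { free(a->base); }
```

Grabbing a mark before a batch of work and popping back to it afterwards replaces all the individual frees; never popping at all, as suggested above, also works for short-lived programs.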

~~~
warmwaffles
Yes, because pretending it does not exist and not understanding how things
worked prior to Rust is totally realistic.

~~~
pjmlp
For all the good things Rust brings, one should not forget that Rust isn't
the first language to have RAII or regions, or to be memory safe by default
for systems programming.

------
Keyframe
With C, at least for me, it all starts with memory management and data.
Specific use of it dictates what I'm going to do (mostly image manipulation
and processing). I tend to allocate one or more big malloc buffers and manage
them with TLSF. I've also looked into halloc for hierarchical allocations.

------
keithnz
If you remember NASA's rules for programming, and MISRA's rules, then dynamic
allocation is out of the question. Certainly for almost all embedded devices
I've worked on, all significant memory is statically allocated, and the rest
is stack variables. Ideally with proofs of maximum stack depth.
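A minimal sketch of that static-allocation style (a hypothetical fixed pool with a free list; the `Msg` type and pool size are invented for illustration):

```cpp
#include <stddef.h>

/* Hypothetical sketch of the embedded style described: every object
 * lives in a statically allocated pool handed out from a free list,
 * so no dynamic allocation ever happens after startup. */
#define POOL_SIZE 8

typedef struct Msg { struct Msg *next; char payload[32]; } Msg;

static Msg pool[POOL_SIZE];   /* all significant memory is static */
static Msg *free_list;

static void pool_init(void) {
    free_list = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

static Msg *msg_alloc(void) {
    Msg *m = free_list;
    if (m)
        free_list = m->next;
    return m;                 /* NULL when the pool is exhausted */
}

static void msg_free(Msg *m) { m->next = free_list; free_list = m; }
```

Exhaustion becomes an explicit, checkable condition at a known bound, which is what makes this style amenable to the proofs mentioned above.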

~~~
swah
This page is about an old game, though.

------
swolchok

        case OPTTYPE_KEYMAP:
            str = malloc(1 + sizeof "submenu");
            strcpy(str, "submenu");
            return str;
    

There isn't a good reason for this not to be

        return strdup("submenu");

is there?

~~~
vinkelhake
malloc+strcpy is standard C, strdup is not.

~~~
deathanatos
While I agree, strdup is POSIX. In the exceedingly rare event that you're on a
system without it, it's simple enough to roll your own. (And if this occurred
more than a couple of times, I'd roll my own anyways to keep the code
obvious.)
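Rolling your own is indeed only a few lines (a sketch of the usual approach, not the POSIX implementation; the `my_` prefix avoids the reserved name):

```cpp
#include <stdlib.h>
#include <string.h>

/* A portable stand-in for POSIX strdup: allocate strlen+1 bytes and
 * copy the string, NUL terminator included. */
static char *my_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = (char *)malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}
```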

~~~
theseoafs
In the exceedingly rare event that you need to target Windows?

~~~
deathanatos
Honestly, when I wrote the post, I thought I recalled that Windows _had_
strdup (Windows does include some limited POSIX functionality, IIRC; e.g.,
"select" definitely exists, and is also a POSIX function, so I figured surely
strdup was there.) MSDN seems to indicate that Windows does in fact have
strdup — and that it's "deprecated" in favor of "_strdup", which appears to do
exactly the same thing as strdup… what on earth is going on here?

------
bluetomcat
> In C++, the use of RAII means that exceptions can be made to free any
> dynamically allocated objects whose owners went out of scope as a result of
> the exception. In C, we don't easily have that option available to us, and
> those objects are just going to stay allocated.

In GCC and Clang, there is the "cleanup" attribute which runs a user-supplied
function once a variable goes out of scope, so you can have scope-based
destructors:

    
    
        #include <stdlib.h>

        static void do_cleanup(char **str) {
            free(*str);
        }

        void f(void) {
            char *s __attribute__((__cleanup__(do_cleanup))) = malloc(20);
            // s gets freed once the function exits
        }

~~~
pjmlp
Which is inherently not portable across other C compilers.

------
foota
This is a very interesting intro to some ways that a C program can manage
memory.

------
sdegutis
<offtopic>As someone obsessed with writing C code, something just clicked. I
think I understand better why there are so many JavaScript programmers, so
many tools and frameworks written in and around JS every day.

For me, C is the gateway to the computer. It's ubiquitous. It's deceptively
simple. I can literally write anything I want in C. It's an exciting thing to
have an open main.c file sitting in front of me.

That must be how JS developers feel about the browser. The browser is a
natural way to start learning programming for someone new to it these days.
When I was growing up, we didn't really have JS and the web was brand new. I
had to learn from QBasic's "Help" menu!

I bet there's an age correlation, too. I bet over a certain age, it's way more
C programmers, and under it is way more JS people. (And above us is probably
FORTRAN and above them is probably COBOL.)</offtopic>

~~~
_RPM
I agree completely about C being the gateway to the computer. It really is the
only way to go if you want to learn real programming. That doesn't mean I'm
discounting JavaScript developers, but I like C better and it's more fun.

~~~
collinmanderson
Slightly off topic: The way I look at it, C and C++ are the bedrock of all
programming languages, because the popular compilers (llvm/clang, gcc and
Visual Studio) are written in C and C++.

JavaScript, for example, requires a browser to run, which in turn requires
gcc or clang to compile. Python, Rust, etc. all depend on programs written in
C and C++.

Go is one recent exception: a popular language whose main implementation is
completely self-hosted. Though, Go doesn't host other popular languages.

~~~
rat87
I hate that way of looking at it because I think it's totally wrong.

C (this view is usually presented as only about c) isn't fundamental or
bedrock in any way. Most of our current stacks just happen to be written in
it.

The original Mac OS for instance was written in Pascal and C. The most popular
open source compilers are written in c++. Most browsers are written in c++ not
in c.

And there's no reason these days you couldn't rewrite llvm (a set of compiler
libraries and compilers mostly written in c++, not that rust compiler uses
llvm for code generation but the frontend is in rust) in haskell or java or
javascript(they'd be slower but they would work). Haskell has useful features
for writing compilers and a lot of language analysis libraries already. If you
re-wrote in in rust it'd be just as fast.

Hell this guy
([https://github.com/jameysharp/corrode](https://github.com/jameysharp/corrode))
wrote a rust-c source-to-source compiler in literate haskell. Also in a few
years servo will be a full browser in rust and Firefox is already adding
components written in rust.

Other then os interface & c binary api linking there is nothing special about
c or c++.

~~~
collinmanderson
> Most of our current stacks just happen to be written in it.

Sure, there's nothing special about the languages themselves, but _currently_
C and C++ are the foundation for (almost) all other languages. In theory you
could rewrite LLVM (or use a different compiler), but I don't see that
happening any time soon. That's why I call it bedrock: it would be very hard
to change.

(Though, I do wonder when LLVM will stop depending on Python 2 :)

------
kevin_thibedeau
> Sprintf(...)

Seriously? Why would anyone use sprintf() in 2013 without knowing the size of
all possible arguments?

~~~
to3m
The article does touch on this...

