So We've Got a Memory Leak (stevenharman.net)
219 points by todsacerdoti 26 days ago | 147 comments



I just don't understand the fear with manual memory management. With RAII and simple diligence (clear ownership rules), managing memory is an easy engineering task. I actually find it *more* challenging to deal with frameworks that insist on reference counting and shared pointers, since ownership is now obscure.

I create it, I free it. I transfer it, I no longer care. It's part of engineering discipline. Memory bugs are no worse than logic bugs; we fix the logic bugs, so it makes sense to fix the memory bugs. Disclaimer: I do complex embedded systems that run 24/7.

We do the same for OS resources (handles, sockets, etc.) and don't use automatic resource managers; we do it manually. So why complicate the design with automatic memory management?
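As a minimal sketch of what I mean by clear ownership (C++ here; the names are made up):

    #include <memory>
    #include <utility>

    struct Packet { int payload[64]; };

    // "I create it": the factory hands ownership to the caller and keeps nothing.
    std::unique_ptr<Packet> make_packet() { return std::make_unique<Packet>(); }

    // "I transfer, I no longer care": the sink takes ownership by value and the
    // object is freed when this function returns.
    void consume(std::unique_ptr<Packet> p) { /* use *p */ }

    int main() {
        auto p = make_packet();   // I own it
        consume(std::move(p));    // ownership transferred; nothing to free by hand
    }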


The use of manual memory management increases the cognitive load when reasoning about software. Working memory capacity varies considerably between people and is a performance limiting factor when designing complex systems. You see analogues of this in other engineering disciplines too.

Over my many years of working in software development I have come around to the idea that most software developers do not have sufficient working memory capacity to reason about memory management in addition to everything else they must reason about at the same time. They may know mechanically how to properly do manual memory management but when writing code they drop things if they are juggling too many things at once mentally. It isn't a matter of effort, there simply is a complexity threshold past which something has to give. Automatic memory management has its tradeoffs but it also enables some people to be effective who otherwise would not be.

On the other side of that you have the minority that get manual memory management right every time almost effortlessly, who don't grok why it is so difficult for everyone else because it is literally easy for them. I think some people can't imagine that such a group of engineers exist because they are not in that group and their brain doesn't work that way. If you are in this group, automatic memory management will seem to make things worse for unclear benefit.

I've observed this bifurcation in systems software my entire career. In discussions about memory safety and memory management, it is often erroneously assumed that the latter population simply doesn't exist.


I think the rift comes from people thinking manual memory management is a bunch of random allocs and frees all over the place. That is gross and those of us who don't mind managing memory don't like it either.

Personally my gripe is when people don't think about memory or space complexity at all. I don't care if it's a custom memory management strategy or GC'd language, you need to think about it. I have the same gripe about persistence, socket programming, and database queries.

Abstractions are great, use them, love them. But understanding what the hell they are doing under the hood not only improves efficiency, it prevents bugs, and gives you a really solid base when reasoning about unexpected behavior.


If the automatic allocation tools are good, then they're doing the same thing as quality manual management but with much more compiler enforcement.


Not exactly, GC is a general solution. Manual management can be tailor made. But to your point more compiler enforcement is a good thing.


I didn't mean just GC. Automated memory management comes in many forms and can be pretty specific.

And it can be tailored too if a situation really deserves it. If I'm doing something custom I want the smallest amount of code to verify, just the special memory manager and not everything that uses it.


True, I misread your original reply.


Which is actually the case in enterprise source code, touched by hundreds of offshoring companies.


But for most developers using GC (Garbage Collected) languages, not understanding these concepts does not affect their ability to earn a salary and continue working.


Any professional developer should understand these things. It was taught in the first year of my computer science degree. As you say, it isn't particularly difficult if you have a strategy. If you can't manage memory, then what about any other resource that needs to be manually closed (sockets, file handles, etc.)?


I'm not sure I understand the causal link between something having a proven track record of being error-prone and not understanding it.


The problem is that automatic memory management means to some newcomers that you don't have to worry about it. Which is not true at all.

These newcomers may have heard about it, may even understand it at some level, but it's not in their rote knowledge. GC is great if you keep in mind that every allocation you make has a cost, and if you know the allocation will be taken care of properly because it is short-lived.

But I have seen many cases where people just don't have it in their consciousness why allocating huge arrays is a problem, or why allocating a new Character object for every single character in a string might be bad. As soon as I point it out, they get it, but it's not like they thought of it while writing their algorithm.


> Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

-Brian Kernighan.


I fully agree, and I've learnt that humbling lesson the difficult way. The transition from "I'm elated and proud of the clever solution I wrote" to "I'm too dumb to debug this so I have to rewrite my not-so-clever solution" was sobering. :-) We live and learn.


Even if you understand every bit of the code, when obscure bugs pop up you have to consider the whole system, and that will quickly overwhelm most people. That’s how pernicious bugs stick around for ages and make the team look like idiots.

Sometimes all you can do is refactor around the problem until you can see it. Or push people to stop being that sort of clever in the first place and write more accessible code.


Memory bugs are an entire class of bugs that we have simply solved. If you use a language with a modern garbage collector (e.g. one that can handle cycles), you will very likely go an entire project without running into a single memory bug. To a first approximation, these bugs were not replaced with other bugs; they are simply gone. Further, we do not ask anything more of the programmer to accomplish this. Instead, we need the programmer to do less work than they would with manual memory management.

That is not to say that garbage collection is an unambiguous win. There are real downsides to using it. But for most programs, those modern garbage collectors are good enough that those downsides just don't matter.


> you will very likely go an entire project without running into a single memory bug

Sure, you haven't lost the handle to the memory, but you can still "leak" memory with a GC. Happens all the time. Add to a data structure at state 1, do something in state 2, remove the data in state 3. What happens if you never hit state 3?
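A sketch of that shape (C++ here, but the retention works identically under a GC, because the container keeps the entry reachable):

    #include <string>
    #include <unordered_map>

    // Long-lived registry: reachable for the life of the program, so neither a
    // GC nor the heap can reclaim entries on its own.
    std::unordered_map<int, std::string> pending;

    void state1(int id) { pending.emplace(id, std::string(1 << 20, 'x')); } // add
    void state2(int id) { /* work that never touches `pending` */ }
    void state3(int id) { pending.erase(id); }  // the only release point

    // If state3(id) is never reached for some ids, those megabyte entries are
    // retained forever: no pointer was "lost", the memory is simply never let go.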

> But for most programs, those modern garbage collectors are good enough that those downsides just don't matter.

I mostly agree with this. Although, most programs hit a point where you have to be aware of what the GC is doing and try to avoid additional allocations. Which isn't very ergonomic at times. And is often more fucking around than manual memory management, just concentrated in a smaller portion of the code base.


And this kind of leak is the same kind that is actually difficult to prevent and debug with manual allocation -- it's present / reachable in some structure, but not intentionally. Say you have a cache of items -- how big should it grow? Can it free space under memory pressure? If you reuse items, do they own collections (e.g., std::vec) that hold on to reserved but unused space (e.g. clear() doesn't free memory)? These are the hard problems and they are approximately the same with or without GC.
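The clear()-doesn't-free point is easy to see for yourself; a tiny sketch (sizes illustrative):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> items(10'000'000);   // roughly 40 MB live

        items.clear();                        // size() == 0, capacity unchanged
        std::printf("capacity after clear():         %zu\n", items.capacity());

        items.shrink_to_fit();                // non-binding request to give it back
        std::printf("capacity after shrink_to_fit(): %zu\n", items.capacity());
    }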


Joshua Bloch, of Effective Java fame, used the term unintentional object retention as a more precise term for memory “leaks” in garbage collected languages.


I’ve also heard this referred to as Abandoned Memory, or sometimes a Space Leak. It’s a common class of bug in a reference counted language, like Objective-C, where cycles need special consideration.


Interesting, I hadn't really heard those terms, always "live leak". It's quite possible to introduce a live leak with manual memory management as well, though perhaps less often.


Seems a bit pedantic, but so am I. I like that term.


Are you calling just allocating memory you'll never use leaking? It is not. It is recoverable, because for it not to be GC'd it has to be somehow accessible. A memory leak means you lost the pointer to the memory, so now you can't free it.


I'm OK with people using the expression "memory leak" to mean "unbounded growth in heap size over time". The case that I see most frequently is a routine that's set up like "use some memory, wait until an event that never happens happens, free memory". Technically if you let time run indefinitely, the memory would be freed. But since memory is finite, eventually you run out and your program crashes, which is annoying in this case.


>Are you calling just allocating memory you’ll never use leaking? It is not.

Yes it is, by definition a memory leak is any memory that has been reserved but won't be read from or written to. If you allocate memory into a data structure and after some point in time you will never read from or write to that memory, then that memory constitutes a leak.


There are (at least) 2 definitions of memory leak.

The upsides of the definition you gave are that it's simple, well defined, and maximally precise (nothing that is safe to collect is considered live)

The significant downside to this definition is that it's uncomputable. To know if memory is used requires knowing if a loop halts.

The second definition of memory leak is "unreachability" which is a bit harder to nail down. It's a conservative approximation of the first definition, but is more popular because it's computable, and it's practical to write programs with or without GC that don't leak


The latter definition is not particularly useful for writing programs that run on hardware with finite memory. End users don't care whether or not the allocations are reachable when your program uses all the memory in the system and crowds out other programs, slows, and/or crashes.


That could also potentially be a cache in a highly unused program.


> A memory leak means you lost the pointer to the memory, so now you can’t free it.

that's a useless definition, as otherwise it would be trivial to augment, say, malloc to prevent memory leaks.

I would define a leak as unused memory that is not otherwise reusable by the program.


"Lost" can include not knowing where to find it, even though you know it exists...


> Sure, you haven't lost the handle to the memory, but you can still "leak" memory with a GC. Happens all the time. Add to data structure at state 1, do something in state 2, remove data in state 3. What happens you never hit state 3?

What happens if you never escape a while loop? Should we ban while loops then? Bugs happen, we fix them. But I haven't had a memory bug in years outside of C and C++.


There is one downside, though it's not inherent to garbage collection; it just happens to correlate in current languages. GC'd languages don't have RAII, and for that reason they actually make you do more work when managing non-memory resources. There have been some attempts to work around this with e.g. Python's "with", but in my opinion it's less ergonomic due to forcing indentation.


Go's defer works like RAII and does not introduce indentations.

    f := createFile("/tmp/defer.txt")
    defer closeFile(f)
    writeFile(f)


Even when the syntax does require indentation ("with" in python, "using" in C#) it's still pretty clean IMO.


Additionally, `using` no longer requires indentation in C#

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...


TIL!

I just recently updated a few small services to C# 12 and there's a bunch of little niceties like this I'm finding (spread operator for instance).


Using also no longer requires implementing IDisposable; it suffices that the type supports the Dispose pattern, which is great when coupled with extension methods.


Using no longer requires a new scope block in more recent versions of C#


Defer and context managers are a pale approximation of RAII. They are constrained to lexical scope, as opposed to RAII, which works with object lifetimes. And it's the latter case which is much more error prone than 'oops, I forgot to close a file I opened in this function'.
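For a concrete example of the difference, here's a sketch (with a hypothetical Connection type) where the resource's lifetime follows an owning object rather than any block:

    #include <memory>
    #include <vector>

    // Hypothetical resource: the destructor closes the socket, flushes buffers, etc.
    struct Connection { ~Connection() { /* close, flush, ... */ } };

    struct Session {
        std::unique_ptr<Connection> conn = std::make_unique<Connection>();
    };

    int main() {
        std::vector<Session> sessions;
        sessions.emplace_back();            // resource acquired here
        // ... sessions may outlive any number of function calls and scopes ...
        sessions.erase(sessions.begin());   // and released exactly when the owner
                                            // dies; no defer/with block had to
                                            // surround that lifetime
    }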


This misses the main benefit of RAII which is that you can't forget to close the file.


While not perfect, static analysis takes care of that, in the same vein that C and C++ devs reach for it to cover the languages' flaws.


They surely do, D for example.

Swift is another one; RC is a GC algorithm, before the usual clueless reply from people not educated in CS.

Followed by FP languages now adopting effects and linear type systems, alongside their automatic memory management approaches.


Do they?

    // Disposed when goes out of scope
    using var http = new HttpClient();
    var text = await http.GetStringAsync("https://example.org/");


Even game engines (including Unreal) use GC these days, which is nuts. Still though, it's best to be careful with your allocations, use pooling, etc.


Uncollectable reference cycles are shockingly easy to make in JS, especially with React. A classic example:

    function closure() {
        var smallObject = 3;
        var largeObject = Array(1000000);

        function longLived() { return smallObject; }
        function shortLived() { return largeObject; }
        shortLived(); return longLived;
    }
Will keep largeObject alive.


This isn't an uncollectable reference cycle. It's true that with this code in most/all JS engines, if there's a reference to the function `longLived` then `largeObject` will be kept in memory, but reference cycles are collectable in standard garbage collection systems. Both of the values will be garbage-collectable once no outside references to `longLived` still exist. Pure reference counting systems (Rust Rc, C++ shared_ptr, etc) are the kind of system that fail to automatically handle cycles.

You could test this with your code by setting a FinalizationRegistry to log when they're both finalized, unset any outside reference to `longLived`, and then do something to force a GC (like run Node with --expose-gc and call `global.gc()`, or just allocate a big Uint8Array).


Is this a property of JavaScript or the engine running it? This feels like something a sufficiently-smart™ closure implementation should be able to prevent.


This is an artifact of V8's GC. Effectively, largeObject and smallObject are tracked together, as a unit. Splitting it out into two separate records increases average memory usage. They keep saying they want to fix it eventually, but it's been this way for 10+ years at this point.

You really do have to know the quirks of what you're targeting.


> sufficiently-smart™

Well, the reference Lua implementation handles this case fine, and it's a solid but not at all smart™ codebase.


JS runtimes are allowed to optimize this out, IIRC, and will often do so.


Geez! Didn't Henry Baker warn against implementing closures this way decades ago? This is an implementation bug—the code is fine.


Performance regression is a bug, and GC has a horrific overhead in the average program. My computer is orders of magnitude faster than it was 15 years ago, but it spends all of its time wandering around in the wilderness hunting for memory to free. We could have just told it.

> those modern garbage collectors are good enough that those downsides just don't matter.

This is not what I see. Every time I look at a profile of a program written in a GCed language, most of the time is spent in the garbage collector. I can't recall the last time I looked at a C++ profile where more than 10% of the time was spent in new/delete. I have seen 100x speedups by disabling GC. You can't ship that, but it proves there is massive overhead to garbage collection.


> GC has a horrific overhead in the average program

It doesn't have to be that way. The BEAM is designed for tiny processes, and GC is cheap.


I love Erlang and BEAM, but the reason the GC is (mostly) cheap is because self-referential data structures are impossible, so you can have a very simple and yet still very effective GC. One heap per process also helps immensely.

Also, when your process has a giant heap, the GC gets expensive. There's been lots of improvement over the years, but I used to have processes that could usually go through N messages/sec, but if they ever got a backlog, the throughput would drop, and that would tend towards more backlog and even less throughput, and pretty soon it will never catch up.

That sometimes happens with other GC systems, but it never feels quite so dire.


Depends pretty much on what implementation one is specifically talking about.

Just like not all malloc()/free() implementations were born equal, to the point that companies made big bucks selling specialised ones.


I also sped up a function 100x. It was concatenating strings inside a loop. I changed it to use StringBuilder instead.

I could have disabled the GC to get the same speed up.


The article in question is about Ruby, which is garbage collected.

It's easy to get in trouble with an interconnected object graph: you mess up and hold onto a pointer that you shouldn't be holding onto, which results in a leak.


IMO it’s not that memory management is hard, it’s that developers are imperfect, so writing a 100% UB-free, leak-free program is hard. And it only takes a single oversight to cause havoc, be it a CVE, leak that causes a long-running program to slowly grow memory, or hair-pulling bug that occurs 1 in 1000 times.

Yes, logic bugs have the same issues, and even in languages like Java we can sometimes (albeit rarely IIRC) get memory leaks. But memory-safe languages are an improvement. Just like TypeScript is an improvement over JavaScript, even though they have the same semantics. We have automated systems that can decrease the amount of memory errors from 1% to 0.01%, why keep leak and UB prevention a manual concern? Moreover, the drawbacks of memory-safe languages are often not an issue: you can use a GC-based language like Java which is easy but has overhead, or an enforced-ownership language like Rust which has a learning curve but no overhead. Meanwhile, while logic bugs can be a PITA, memory bugs are particularly notorious, since they don’t give you a clear error message or in some cases even cause your program to halt when they occur.

Tangential: another solution that practically eliminated a class of bugs is formal verification. And currently we don’t see it in any system but those where correctness is most important, but that’s because unlike memory management the downsides are very large (extremely verbose, tricky code that must be structured a certain way or it’s even more verbose and tricky). But if formal verification gets better I believe that will also start to become more mainstream.


I worked with manual memory management for a decade (24/7 systems) and don’t miss it. It’s not per-se hard, it’s not scary, but if you’re dealing with structures that may contain reference loops, or using an architecture based on event handlers that may be moving references around, you need to do some very careful design around memory management instead of just thinking of the problem domain.


By far, the worst memory leak I’ve ever had to debug involved a cycle like you are describing, but it was in a Java program (swing encourages/encouraged such leaks, and “memory leaks in java are impossible”, so there weren’t decent heap profilers at the time).

For the last few decades, I’ve been writing c/c++/rust code, and the tooling there makes it trivial to find such things.

One good approach is to use a C++ custom allocator (that wraps a standard allocator) that gets a reference to a call site specific counter (or type specific counter) at compile time. When an object is allocated it increments the counter. When deleted, it decrements.

Every few minutes, it logs the top 100 allocation sites, sorted by object count or memory usage. At process exit, return an error code if any counters are non-zero.

With that, people can’t check in memory leaks that are encountered by tests.

In practice, the overhead of such a thing is too low to be measured, so it can be on all the time. That lets it find leaks that only occur in customer environments.
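A stripped-down sketch of the counting idea (per-type rather than per-call-site, and without the periodic top-100 logging; everything here is illustrative):

    #include <atomic>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    // Deriving T from Counted<T> makes every new/delete of T bump an atomic
    // counter; a nonzero count at process exit indicates a leak.
    template <typename T>
    struct Counted {
        static inline std::atomic<long> live{0};

        static void* operator new(std::size_t n) {
            void* p = std::malloc(n);
            if (!p) throw std::bad_alloc{};
            live.fetch_add(1, std::memory_order_relaxed);
            return p;
        }
        static void operator delete(void* p) noexcept {
            if (p) live.fetch_sub(1, std::memory_order_relaxed);
            std::free(p);
        }
    };

    struct Widget : Counted<Widget> { int payload[16]; };

    int main() {
        delete new Widget;                 // balanced: live stays at 0
        new Widget;                        // deliberately leaked for the demo
        long n = Counted<Widget>::live.load();
        if (n != 0) {
            std::fprintf(stderr, "leak: %ld Widget(s) live at exit\n", n);
            return 1;                      // fail the test run, as described above
        }
    }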


But circular references don't leak in Java. You have to have a GC root (e.g. a static, or something in your runtime) somewhere pointing at the thing to actually leak it.

There is one case where a "circular" reference can appear to cause a leak that I know of: WeakHashMap. But that's because the keys, which are indeed cleaned up at some point once the associated value is GC'd, are themselves strongly retained references.


ThreadLocal can be another problem. Tomcat kills and recreates threads for this reason.

Edit: another one is swapping code by swapping class loaders. That can retain static references in the runtime class loader.


Excluding the tool - and the constraints that come with it - namely the computer, from the problem domain: isn't that a bit arbitrary?

I think nowadays the only major languages that don't provide GC are C/C++, and one typically uses them for their unrivaled performance, meaning performance is a more or less explicit requirement, so it is part of the problem domain.


Even those have two major dialects with GC support, namely Unreal C++ and C++/CLI.

There was C++/CX as well, with WinRT/COM reference counting, but it got killed in MS internal politics.


35% of vulnerabilities in the biggest tech companies being due to use-after-free bugs is part of the answer. (More than 90% of severe vulnerabilities are due to memory bugs impossible in memory-safe languages.)


Manual memory management vs GC is orthogonal to memory safe vs unsafe.


In practice the lines of each issue are placed very close together. Other than Rust, there are no popular memory safe languages that use manual memory management.


Rust doesn’t use manual memory management?


It does because you're still responsible for managing the lifetimes, even if you don't explicitly call malloc/free. Automatic memory management is reference counting and GC.


Every memory-safe language has a runtime written with manual, unsafe memory management. Yes, even Rust: Rust is implemented using many (well-vetted) unsafe blocks.

So projects to improve the quality, and lower the defect rate, of manually managed programming are far from wasted. Even if they only get used to write fast garbage collectors so that line coders can get on with the work of delivering value to customers.


> We do the same for OS resources (handles, sockets, etc) and dont use automatic resource managers

1) Modern languages have made inroads here.

2) The OS is my automatic resource manager. When I hit ctrl-c on my running C program, my free() is never hit, yet it is cleaned up for me.

> So why complicate the design with automatic memory management?

I don't know beforehand who is going to be reading my memory, how many times, in what order, or whether they're going to copy out of it, or hold onto pointers into it.

> I just don't understand the fear with manual memory management. With RAII and simple diligence (clear ownership rules), managing memory is an easy engineering task.

I claim that if managing memory is that straightforward, then it makes more sense to leave it to the compiler (Rust-style, not Java-style.) rather than let a human risk messing it up.


> I just don't understand the fear with manual memory management. With RAII and simple diligence (clear ownership rules), managing memory is an easy engineering task. I actually find it *more* challenging to deal with frameworks that insist on reference counting and shared pointers, since ownership is now obscure.

I mostly agree.

With RAII, memory management is simplified. "Just" evaluate the lifetime of the object. Super easy if you're good at software design.

Reference counting and shared pointers still have their niche use cases. But I've most often seen reference counting used as a crutch where it is _easy_ but designing around object lifetimes is more appropriate.

> Memory bugs are no worse than logic bugs

It's true. A memory bug is "just" another logic bug. Memory bugs lead to null pointers and wild pointers. They're just as dangerous as logic bugs. Memory bugs are also far more preventable with RAII.

> We do the same for OS resources (handles, sockets, etc) and dont use automatic resource managers, we do it manually. So why complicate the design with automatic memory management?

I'm just going to point out that I don't do manual allocations for OS resources. I wrap OS resources into RAII objects. I've made lots (!) of custom RAII wrappers for OS objects and library objects. It's trivial to do and saves a feckton of headaches.

C++ has std::fstream. I'm not saying it's great. But it's definitely RAII for files and has been around for... well I don't know exactly but certainly 25+ years.
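For anyone who hasn't done it, such a wrapper is tiny. A minimal POSIX sketch (error handling kept short):

    #include <fcntl.h>
    #include <unistd.h>
    #include <stdexcept>
    #include <utility>

    // RAII wrapper for a file descriptor: close() runs on every exit path,
    // including exceptions, and ownership moves but is never duplicated.
    class Fd {
    public:
        Fd(const char* path, int flags) : fd_(::open(path, flags)) {
            if (fd_ < 0) throw std::runtime_error("open failed");
        }
        Fd(Fd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
        Fd& operator=(Fd&& other) noexcept {
            if (this != &other) { reset(); fd_ = std::exchange(other.fd_, -1); }
            return *this;
        }
        Fd(const Fd&) = delete;
        Fd& operator=(const Fd&) = delete;
        ~Fd() { reset(); }

        int get() const { return fd_; }

    private:
        void reset() { if (fd_ >= 0) ::close(fd_); fd_ = -1; }
        int fd_ = -1;
    };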


RAII was a huge improvement over C and I was shocked to see Zig forego such an improvement. Rust adopted RAII, and took it all to the next level with no data races and a lot more bugs caught at compile time.


Rust helped make affine types acceptable in the mainstream.

The next level is going into linear types, which is what some GC first languages are now slowly looking into.


Memory leaks can be hard to track down, but overflows and use-after-free bugs can take a project weeks off schedule. Depending on what is overwritten and where, the effects show up very far from the source of the problem. For an engineering or product manager this is a terrifying prospect. Managed memory more or less solves those problems - it introduces a couple of others, and there is still a possibility of resource leaks, but these problems are both more rare and generally easier to pin down.


Sure, for simple flows-of-control, GC buys you little and an RAII type approach is fine. But RAII really only works if lifecycle and ownership spans are reflected in the lexical structure of the code base. RAII relies on creation and cleanup happening within a block/lexical scope.

Unfortunately in the real C++ codebases I've had to work with, the flow of control is never so simple. Any correspondence between lexical scope and lifecycle is broken if the code uses exceptions, callbacks, any type of async, threads, etc.


No, it still applies. You have to think on a higher abstraction level, like components with interfaces between them of sorts. If one of them creates a resource (because this goes far beyond memory allocations which are also only resources), it MUST take care to delete that resource again. Whether that happens in a different thread or whatever: Who cares. (And yes, more care needs to be taken around creation/deletion/use of resources then.) The Linux kernel is loaded with examples of it.

Any deviation from that MUST be documented which includes seeing that as comments in code so people who read/maintain that code know what this is all about.

Thing is, many memory leaks are actually logic bugs and no (resource-)GC will save your bacon against these, in fact, you made your life more complicated through it because now you have to debug through another (abstraction) layer before getting to the true problem.

This is what this ownership in Rust is all about: It makes ownership visible to the programmer so they have to take care about it in a very explicit manner.


... or a persistent data-structure [1].

[1] https://en.wikipedia.org/wiki/Persistent_data_structure


You can literally replace memory management in your post with stack management.

You wouldn't say the same thing about stack push and pop, would you? Memory management is kind of a solved problem for 99% of programs. Issues only come up very rarely and in very constrained environments.

Yes, I did encounter a stack overflow error because I didn't realize the size of the struct was gigantic. So I solved it by creating the object on the heap instead. Similarly, memory "bugs" in managed systems are very rare and they can mostly be solved.

Every time a managed system has issues, we get blog posts like this, exactly because they are rare. If I wrote a blog post every time I encountered a memory leak in a C program, well, no one would really care. It's kind of expected.


Businesses need software to be less complex and easier to develop so it is cheaper.

I can’t imagine manual memory management on these large scale projects with a high variance of developer scale and opinions.


It doesn't scale in large teams of various skill levels.


If manually managing memory were in fact an easy engineering task, software developers wouldn't be so demonstrably bad at it.


> We do the same for OS resources (handles, sockets, etc) and dont use automatic resource managers, we do it manually.

Generally, in most languages, file handles and sockets are automatically closed when the variable holding them goes out of scope.


> Memory bugs are no worse than logic bugs

I guess it comes down to what you mean by "worse". It seems that memory bugs carry a higher risk of being completely invisible except in deliberately contrived situations. This makes them more dangerous because there's less of an incentive to fix something that clients will never notice.


> It seems that memory bugs carry a higher risk of being completely invisible except in deliberately contrived situations.

Do they really carry a higher risk of being invisible though? You don’t need to “contrive” situations either. My company just spent 6+ months trying to solve a JavaScript undefined symbol bug stemming from a library. When we finally tracked it down, it was because we were using import instead of require, which the documentation didn’t clarify was important. The fix we implemented was nowhere near where we expected the bug, and the only reason we finally solved it was because we had exhausted all other options. Sounds the same as, if not worse than, a difficult-to-track memory bug to me.


I'm not sure if I get your point here, but it's possible that "invisible" was too ambiguous. What I meant was that a memory bug may be less likely to cause noticeable functional changes to the behaviour of the program during everyday usage. I.e. the client won't notice them.


History has repeatedly demonstrated that it is not easy. Simple diligence is clearly insufficient.


Oh yes! And also don't forget about the hand-holding static type system that won't let you fart unless you explicitly convince it that you will not shit your pants!


So you never, ever, ever make a mistake? No, don't bother answering that. Of course you do. No amount of "engineering discipline" in the world can stamp out the human propensity to occasionally make a mistake.


Where did they say that?

Why do you jump to that?

Why didn't you assume that they make errors like everyone, and simply deal with them?

Is your ego so fragile that it can't survive in the same universe with the idea that someone somewhere might be living just fine without a crutch you apparently consider indispensable?

The fact is there is more than one way to detect and avoid errors reaching all the way to production. Every other human activity has employed them for thousands of years, long before Golang existed. Rigor and discipline and methodology are not actually impossible. Shock!


"I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say 'Yeah it works but you're leaking memory everywhere. Perhaps we should fix that.' I’ll just restart Apache every 10 requests." -Rasmus Lerdorf, PHP Non-Designer

https://en.wikiquote.org/wiki/Rasmus_Lerdorf


Never calling free() is a valid memory management strategy if you know your process lifetime exactly.


Git actually uses that approach, which means that libgit is pretty useless to embed, which no one does anyway since it's GPL and everyone instead uses libgit2.


I tried to do this in a real program once. The program only ever runs for a fraction of a second and then exits, so it seemed like a good candidate. Unfortunately what happened after is that our company started running tools like Coverity which complained about leaked memory. The path of least resistance was to fix that by adding free()'s everywhere.


And then you accidentally add a use-after-free bug by trying to satisfy a "correctness" tool...


It’s hard to fault the static analysis tooling here. Not freeing memory is, almost always, unintended behavior. Besides, if a business is going to blindly apply static analysis to their projects and then demand that every warning be “fixed”, then the tool really isn’t the problem.

Still, any serious developer should, at the very least, have the patience to interrogate their code.


Coverity will definitely warn you about use-after-free. It’s not a “correctness” tool, it’s a static analyzer and probably the best one out there (imo). Yes in this use case it’s probably not too important to care about, but really any code base of importance should be run through it on a fairly regular basis.


Besides not being perfect, it requires full access to the whole source code.


Coverity can’t find all use after free bugs.


The tool doesn’t demand it be satisfied, a manager does.


Worth pointing out that in Zig, you wouldn't have had this problem in the first place. The natural way to write a program of that nature is to create an arena allocator, immediately defer arena.deinit() on the next line, and then allocate to your heart's content. When the function returns, on an error or otherwise, the arena is freed.

No need to go back later and add a bunch of free(), because you correctly implemented the memory policy as a matter of course, in two lines.
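For readers more at home in C++, the closest standard analogue of that two-line arena pattern is std::pmr's monotonic_buffer_resource (a sketch, not a claim about Zig's internals):

    #include <memory_resource>
    #include <string>
    #include <vector>

    int main() {
        // Every allocation below lands in the arena; nothing is freed piecemeal.
        std::pmr::monotonic_buffer_resource arena;

        std::pmr::vector<std::pmr::string> lines(&arena);
        for (int i = 0; i < 1000; ++i)
            lines.emplace_back("some line of input");

        // ... do the program's work ...

    }   // the arena is destroyed here and releases everything at once,
        // even if an exception escaped the work above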


That has nothing to do with a particular language, you could use individual allocations in Zig and end up with the same mess. Similarly, you could use memory arenas in C/... and not have the problem.


Defaults matter. In C, allocation is customarily done with malloc, a single, global allocator. It's possible to bring in a memory arena, but in the case I was replying to, you'll note that they didn't.

In Zig, everything which allocates receives an allocator to do so. So writing a top-to-bottom program of the sort indicated in the GP post, you're going to make an allocator first thing. That's when you decide to use an arena, and defer it, and you're done.

You wouldn't decide to use malloc (which is available), or the GeneralPurposeAllocator, and then "end up with the same mess" by forgetting to release any of the memory, because the debug release mode, where all the work gets done, will loudly complain about that. It would take real stubbornness and conviction to write an all-malloc-no-free program in Zig, but using an arena is cheap and easy.

So yes, it has everything to do with a particular language. If you wrote it in Rust, you'd end up with a borrow-checker-compatible program, if you wrote it in Nim, allocation would be taken care of by the runtime. In C, it's tempting to just use malloc and let the operating system do the GC when the process exits.


How much use after free and double free did you have to fix?


Don't do it though.

Unless you know you will only ever allocate 100x less memory than the average user has, it will bite you.

1. Your code is now fragile (in the taleb sense).

2. Your code is now unusable when someone wants to use it as a library in the future

3. You are now prone to memory fragmentation (many use a bump the pointer allocator when not freeing).

4. You encourage people to not care about free-ing anything — when you do turn free on, or turn a GC on, you might struggle to actually free anything because of references all over the place.


In certain circumstances it can be a good move. It might significantly improve real world performance, for instance.

Walter did this in the DMD compiler years ago and it gave a drastic performance boost. [0]

> You are now prone to memory fragmentation (many use a bump the pointer allocator when not freeing)

Pointer-bump allocation is immune to fragmentation by definition, no?

Keeping dead objects around might lead to caches filled with mostly 'dead' data though.

[0] https://web.archive.org/web/20190126213344/https://www.drdob...


Walter doing that is why compiling my work project uses 70 gigs of RAM and is slow - because nothing stays in the cache, because everything gets copied so much (because copies are cheap, right?).

Anecdotally, people have told me that it's faster with a modern malloc impl anyway; I haven't tried it properly.

> Pointer-bump allocation is immune to fragmentation by definition, no?

In some sense, yes, but what I mean is that like should end up near like, whereas if you have everything going through a single bump allocator you can get a pattern of

ABABAB where a proper allocator would do AAABBB, which the cache can actually use.


> Walter doing that is why compiling my work project uses 70 gigs of ram and is slow - because nothing stays in the cache

This sounds like guesswork. Do you know the details of DMD's internals? Have you done profiling to confirm poor cache behaviour?

Again, per the article I linked, when Walter made the change there was a drastic improvement in performance.

> everything gets copied so much (because copies are cheap right?)

I'm not sure what copying you're referring to here. Declining to call free has nothing to do with needless copy operations.

> like should end up near like whereas if you have everything going through a single bumping allocator then you can have a pattern of ABABAB where a proper allocator would do AAABBB which the cache can actually use.

A general purpose allocator like malloc can't segment the allocations for different types/purposes, it only has the allocation size to go on. That's true whether or not you ever call free. If you want to manage memory using purpose-specific pools, you'd need to do that yourself.

As for whether this really would improve cache performance, I imagine it's possible, which is why we have discussions about structure-of-arrays vs array-of-structures, and the entity component system pattern used in gamedev. I'd be surprised if compiler code could be significantly accelerated with that kind of reworking, though.


> I'm not sure what copying you're referring to here. Declining to call free has nothing to do with needless copy operations

Think.

You make malloc feel inexpensive both by using a crappy but fast allocator and disabling free. People (reflexively) know this, and then they don't get punished when they get extremely careless with their allocations because test suites never allocate enough memory to OOM the machine.

This isn't some hypothetical, I have measured the amount of memory dmd ever actually writes to (i.e. after allocation) to be absurdly low. Like single digits.

Pretty much every bit of semantic analysis that hasn't been optimized post-facto involves copying hundreds to thousands of bytes.

> A general purpose allocator like malloc can't segment the allocations for different types/purposes, it only has the allocation size to go on. That's true whether or not you ever call free.

The size is what I'm getting at, but a D allocator can do this because it gets given type information. (dmd allocates mainly with `new` which is then forwarded to the bump the pointer allocator if the GC is not un-disabled)

Also evidence that the exact scheme dmd uses is suboptimal wrt allocator impl:

https://forum.dlang.org/thread/zmknwhsidfigzsiqcibs@forum.dl...


I know the internals of DMD extremely well.


Instagram ran for about a year with GC disabled just fine, citing 10% better memory utilisation. While you might be right for library code, at the application level it sometimes makes sense.



I think the complaint is that there was no strategy. The dude just wanted to build some websites.


CGI. But once you lose control of your memory in a large code base, it is practically impossible to regain it. So you can't move to long-lived server processes when you decide to enter the more modern era.


free() doing nothing can be a valid application-level strategy. Not calling free() can bite you down the line.


That explains PHP


There is a scene in Pirates of the Caribbean I think of a lot: "You are without a doubt the worst pirate I have ever heard of." "Ah, but you have heard of me."

He kept the scope down. He shipped. It was hugely successful. In the end it was overtaken and rightly so, but that doesn't invalidate the success it had.


But it doesn't justify the arrogance.

"For all the folks getting excited about my quotes. Here is another - Yes, I am a terrible coder, but I am probably still better than you :)" -Rasmus Lerdorf

OR the continued negligence.

https://news.ycombinator.com/item?id=40256878

And who remembers how careless, reckless, and blithe he was with the PHP 5.3.7 release he didn't bother to test because running tests was too much of a hassle because there were already so many test failures that wading through them all to see if there were any new ones was just too much to ask of him, the leader of the widely used project, in charge of cutting releases?

>5.3.7 upgrade warning: [22-Aug-2011] Due to unfortunate issues with 5.3.7 (see bug#55439) users should postpone upgrading until 5.3.8 is released (expected in a few days).

No seriously, he's literally as careless as he claims to be (when he says that repeatedly, you should believe him!), and his lack of giving a shit about things like tests and encryption and security that are extremely important has caused actual serious security problems, like breaking crypt() by checking in sloppy buggy code that would have caused a unit test to fail, but without bothering to run the unit tests (because so many of them failed anyway, so who cares??), and then MAKING A RELEASE of PHP 5.3.7 with, OF ALL THINGS, a broken untested crypt()!

http://i.imgur.com/cAvSr.jpg

Do you think that's just his sense of humor, a self deprecating joke, breaking then releasing crypt() without testing, that's funny in some context? What context would that be? Do you just laugh and shrug it off with "Let Rasmus be Rasmus!"

https://www.reddit.com/r/programming/comments/jsudd/you_see_...

>r314434 (rasmus): Make static analyzers happy

>r315218 (stas): Unbreak crypt() (fix bug #55439) # If you want to remove static analyser messages, be my guest, but please run unit tests after

http://svn.php.net/viewvc/php/php-src/trunk/ext/standard/php...

https://plus.google.com/113641248237520845183/posts/g68d9RvR... [broken link]

>Rasmus Lerdorf

>+Lorenz H.-S. We do. See http://gcov.php.net

>You can see the code coverage, test case failures, Valgrind reports and more for each branch.

>The crypt change did trigger a test to fail, we just went a bit too fast with the release and didn't notice the failure. This is mostly because we have too many test failures which is primarily caused by us adding tests for bug reports before actually fixing the bug. I still like the practice of adding test cases for bugs and then working towards making the tests pass, however for some of these non-critical bugs that are taking a while to change we should probably switch them to XFAIL (expected fail) so they don't clutter up the test failure output and thus making it harder to spot new failures like this crypt one.

And don't even get me started about mysql_real_escape_string! It has the word "real" in it. I mean, come on, who would ever name a function "real", and why?

That implies the existence of a not-so-real mysql escape string function. Why didn't they simply FIX the gaping security hole in the not-so-real mysql escape string function, instead of maintaining one that was real that you should use, and one that was not so real that you should definitely not use, in the name of backwards compatibility?

Or were there actually people out there using the non-real mysql escape string function, and they didn't want to ruffle their feathers by forcing those people with code that had a security hole so big you could fly a space shuttle through to fix their gaping security holes?

The name of the function "mysql_real_escape_string" says all you need to know about the culture and carelessness and lack of security consciousness of the PHP community.

Melania Trump's "I REALLY DON'T CARE DO U?" nihilistic fashion statement sums up Rasmus Lerdorf's and the PHP community's attitude towards security, software quality, programming, standards, computer science, and unit tests.

¯\_(ツ)_/¯

https://www.youtube.com/watch?v=l5imY2oQauE


Your examples of arrogance have nuance:

> "For all the folks getting excited about my quotes. Here is another - Yes, I am a terrible coder, but I am probably still better than you :)" -Rasmus Lerdorf

Well, he shipped, met his users' needs, met the market's needs, and generally hit all the necessary bullet points to make a successful and lasting impact on the world. If he didn't meet some requirement that you have, like memory safety, it's because it wasn't necessary.

Checking off the required stuff and leaving the optional stuff for later is the sign of a good coder.

> his lack of giving a shit about things like tests and encryption and security that are extremely important

Woah there, cowboy! It turned out his take was correct, because it continued dominating over all those other technologies which cared about the thing you cared about.

Being correct is better than being elegant, clean, or bug-free.

> This is mostly because we have too many test failures which is primarily caused by us adding tests for bug reports before actually fixing the bug. I still like the practice of adding test cases for bugs and then working towards making the tests pass, however for some of these non-critical bugs that are taking a while to change

That's just a different way of saying "We didn't have enough resources to make the fixes go quicker". What's the alternative here? Don't log the bug? Don't make a test to repro the bug?

> Why didn't they simply FIX the gaping security hole in the not-so-real mysql escape string function,

You sound like you've never been in a professional development shop at all. The reason that things hang around seemingly forever is because someone is using them!

It's the amateur mickey-mouse outfits that remove stuff which users are still using. It really is the equivalent of "Don't break userland".

No professional worth their salt breaks their existing users without a very good reason. This is why Microsoft is still shipping broken win32 functions that were written in 1998. It's why Linus insists "Don't break userland".

If you want to level up to professional level when shipping software, you're going to be shipping a lot of mistakes that you already know about.

On the whole, your comment makes Rasmus look like more of a professional than you.


Your argument that it's correct to not give a shit about tests and security and encryption and memory leaks simply because lots of people are using the project is upside-down, extremely unprofessional, and downright dangerous.

He shipped, but he didn't meet his users or the Internet community's needs, because his users and the Internet at large need safe reliable systems that somebody's actually bothered running the existing unit tests on, instead of security theater, Dunning-Kruger evangelism, and knee-jerk apologetics like your defeatist and fatalistic acceptance and justification of the status quo.

Trotting out Win32 to justify PHP's flaws is pretty unhinged. I'm unsure you're not just a parody account. A serious person would realize they've run out of valid arguments and re-examine their priors before making such an embarrassingly bottom-of-the-barrel justification.

Thomas Midgley Jr. also shipped and made a lasting impact on the world. Changing the world is not the only measure of success, nor justification of harmful impact.

https://en.wikipedia.org/wiki/Thomas_Midgley_Jr.

>His legacy is one of inventing the two chemicals that did the greatest environmental damage. Environmental historian J. R. McNeill stated that he "had more adverse impact on the atmosphere than any other single organism in Earth's history." Author Bill Bryson remarked that he possessed "an instinct for the regrettable that was almost uncanny." Science writer Fred Pearce described him as a "one-man environmental disaster".

PHP is the CFC of programming languages.


> Your argument that it's correct to not give a shit about tests and security and encryption and memory leaks simply because lots of people are using the project

That's not my argument. My argument is that while you might give a shit about $THINGS-DON-LIKES in code, the actual market is requiring much more stringent sink-or-swim product decisions.

So, yeah, for you what the market values in that product may be irrelevant to what you value in that code.

> He shipped, but he didn't meet his users or the Internet community's needs,

Obviously he did - he dominated over other languages, even though PHP had next to no marketing budget and was competing against products that had millions, or hundreds of millions, spent on marketing.

The market clearly preferred the product he provided.

> because his users and the Internet at large need safe reliable systems that

Your opinion on what those users needed differs greatly from what those users expressed that they needed. What $DON thinks other people need is irrelevant when those same other people have opinions of their own.

> I'm unsure you're not just a parody account.

I'm sure you do. I'm not sure how that is relevant.

You're once again making the mistake of assuming that your opinion is actually relevant. It's not, to this argument, relevant at all. All your skepticism about my intention only digresses from the main argument, which is that what you thought was good for product delivery turned out to be rejected by the market.

> [snipped digression]

The long and short of it is, your acerbic opinion on what the market needed in 1998 differed greatly from what the market actually chose.

Now, you might make a different argument: that the users should have chosen a better product.

But the argument you made was that the product made the wrong trade-offs when trading off security for existence.

To my knowledge, there is no product in the world that makes the trade-off you suggest[1], which is what led me to believe that you have never been part of product development.

[1] Remember that even Microsoft made the "make it first, then make it secure" decision with almost their entire product line since the 80s. When the richest companies in the world are making this sort of decision and successfully delivering world-dominating products, there's no question of "Is Don wrong?", it's more a question of "When will Don realise it?"


No, you're not a serious or professional person. And you don't use dashes in environment variable names, you use underscores.


> No, you're not a serious or professional person.

For the record, I actually am.

But, like I said, if you were one too, you wouldn't be making mostly irrelevant digressions.


If you were serious, you would be capable of making much better arguments, and be able to address the ones I made, instead of doubling down on your praise and water carrying of Win32's and PHP's mediocrity and insecurity, and defense of Rasmus's arrogant and negligent carelessness.

Doubling down on your arguments is like bragging about shooting your puppy in the face and claiming you stared down Kim Jong-il, instead of just admitting you made a mistake. It doesn't save face as much as you'd like to imagine, or lend any credibility to your self-aggrandizing claims of seriousness and professionalism. You must be fun to work with. /s


> If you were serious, you would be capable of making much better arguments,

The argument I made, viz. that users' needs from a product are quite different from what you imagine their needs are, is enough. It's actually a self-evident assertion in many contexts.

All the successful products from that time paid little to no attention to security. Phones, operating systems, ERP software, instant messaging. I don't see how PHP was different in this regard.

Nothing and no one was paying attention to security: you could (and I did) send email by simply telnetting to a server and talking SMTP to it. The internet was not a place where security was a large consideration. The only secure thing was https, which few places used.

You, on the other hand, are quick to call someone a shill, quick to impugn someone else to make your argument look stronger and ignore any arguments made in favour of what your emotions tell you.

We're both talking here under our real names.


https://news.ycombinator.com/item?id=40256878

>It's not "supposed" to be that way.

>It just happened to end up that way because Rasmus Lerdorf just doesn't give a shit. ¯\_(ツ)_/¯

[...]


So one place I worked might win some sort of prize for the dumbest way to burn $5m with a memory leak.

So back in the 90s the printer driver in Solaris had a memory leak[1]. At the time I was a contractor for a big bank that is sadly/not sadly no longer with us any more. This was when the status of faxes in confirming contracts hadn't been sufficiently tested in court, so banks used to book trades by fax, and the system that sent the fax would also send a document to a particular printer, which would print the trade confirm out. There was some poor sap whose job consisted of picking up the confirms from this printer, calling the counterparty, and reading them out so that they were all on tape[2] and legally confirmed.

Anyhow one day the memory leak caused the printer driver to fall over and fail to print out one of the confirms so the person didn't read it out on the telephone. The market moved substantially and the counterparty DKd the trade[3]. A lot of huffing and puffing by senior executives at the bank had no effect and they booked a $5m loss and made a policy to never trade with that particular bank again[4]. The fax printer job was moved to windows NT.

[1] According to the excellent "Expert C programming" this was eventually fixed because Scott McNealy, then CEO of Sun Microsystems had been given a very underpowered workstation (as CEO) so was affected a lot by this problem and eventually made enough of a fuss the devs got around to fixing it https://progforperf.github.io/Expert_C_Programming.pdf

[2] Calls in the securities division of banks are pretty much always recorded for legal and compliance reasons

[3] DK stands for "Don't know". If the other side says they "don't know" a trade they are disputing the fact that a contract was made.

[4] Which I'm sure hurt us more than it hurt them as they could just trade somewhere else and pay some other bank their commission


>...they booked a $5m loss and made a policy to never trade with that particular bank again

Maybe I am too cynical, but would many businesses retroactively agree to a deal which would cost them a ton of money? If the Process requires Is dotted, Ts crossed, and a phone call confirmation which was never placed -why eat the loss when the other party should own the error?

Citi just had a lawsuit because of paying back a loan too quickly. I expect everyone in finance to play hard ball on written agreements when it works in their favor.


Sometimes it happens. Particularly in big US old school broker-dealers, "Dictum Meum Pactum"[1] is something some people take very seriously especially since you will have a fruitful (if adversarial) working relationship over many years and may need someone to do you a personal favour in the future (eg giving you a job etc).

For example I know of one US investment bank where a very large options position was "booked" by a trader using a spreadsheet rather than in the official booking system which meant that the normal "exercise and expiry" alerts didn't go off to warn people when the trade was about to expire. The trader in question went on holiday and as a result the trade expired more than a billion dollars[2] in the money. The CEO of the bank called up the counterparty and successfully persuaded them to honour the trade and pay up even though it had actually expired and everyone knew there was no legal obligation. As it was explained to me at the time, the counterparty had probably hedged the trade so was not scratching around the sofa trying to find the money.

[1] "My word is my bond"

[2] Yes. With a b.


Valgrind makes finding leaks so easy in C.

Fixing them is harder, but it's usually easy if your design is right. I usually allocate and free in the same function unless the function is meant to allocate for the caller. (And then that call is considered allocation in the caller.)
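That convention looks like this in practice (a sketch; and `valgrind --leak-check=full ./prog` points straight at whichever call site was missed):

    #include <cstdlib>

    // Pattern 1: allocation and free live in the same function, so a missing
    // free is visible in one screenful of code.
    void process() {
        char* buf = static_cast<char*>(std::malloc(4096));
        if (!buf) return;
        // ... use buf ...
        std::free(buf);
    }

    // Pattern 2: the function exists to allocate for the caller; the call site
    // is then treated as "the allocation" and the caller owns the free.
    char* make_buffer(std::size_t n) {
        return static_cast<char*>(std::malloc(n));
    }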


It's the "repro the bug" that's hard...

When I worked on static analysis of codebases, the error handling codepath was the most likely source of problems.


Yes.

To do that, I fuzz like crazy, and every path becomes a test, even if it means a lot of manual work checking each one. That alone exercises tons of error paths.

To cover 99% of the rest, I use SQLite's error injection [1] on all tests. Just doing error injection on allocation gets you 90% of the way there.

[1]: https://sqlite.org/testing.html#anomaly_testing


> I usually allocate and free in the same function unless the function is meant to allocate for the caller. (And then that call is considered allocation in the caller.)

Yeah. I do a similar thing in C, but think of it as 'differing level of scopes in the abstraction'[1].

It's pretty easy to visually spot when a scope acquires a resource in $SCOPE::foo() and never releases that resource in the $SCOPE::cleanup().

[1] Like the way code has block scope, function scope, file scope and global scope, a problem domain or an abstraction over the solution (the model) also has varying levels of scope. I've never found this to be a taught thing, however. Modeling the problem domain and any proposed solution is a useful skill before jumping into coding a solution.


Reminds me of this story I heard about Yahoo. Their ads server had a memory leak and it would OOM after something like 10000 requests.

Their solution: restart the server after 8000 requests.

This worked for a year or two. And then it started OOM-ing after 8000 requests.

Next solution: restart the server after 6000 requests.


> Their solution: restart the server after 8000 requests.

8000 requests is something like 500 milliseconds for the average ad server.

You need exceptionally fast restarts for that to work.


This is in the context of (y)Apache. You set MaxRequestsPerChild, so when the request limit is hit, the child is killed, a new one started, and requests can be served by other children until the replacement is ready. In a pure config, idle children block on accept, so the kernel does all the coordination work required to get requests to children, the child exits after hitting the cap, and the parent catches the exit and spawns another. As long as all the children don't hit their cap at once, there's no gap in service. If they do all hit the cap at once, it's still not so bad.

I don't know about ads, but on Yahoo Travel, I had no qualms about solving memory leaks that took months to show up with MaxRequestsPerChild 100000. I gave valgrind or something a whirl, but didn't find it right away and was tired of getting paged once a quarter for it, so...

I did do some scaleout work for Shopping once, and found theirs set at 3. I don't remember where I left it, but much higher. Nobody knew why it was 3, so I took the chance that it would become apparent if I set it to 100, and nothing obvious was wrong and the servers were much happier. fork() is fast, but it's not fast enough to do it every 3 requests.
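For reference, the knob in question lives in the MPM config; a minimal prefork-style fragment (numbers purely illustrative):

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        MaxClients          150
        # Recycle each child after it has served this many requests, so a slow
        # leak in a handler can only ever accumulate a bounded amount.
        MaxRequestsPerChild 8000
    </IfModule>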


Perhaps now, but the story is about Yahoo, which means it could be from the early 2000s or late 90s. Traffic volumes were probably lower, computers were definitely slower, internet advertising was not as big as it is now, etc.


If that gives you extra time to move the problem to the future, I'd say that's a win :D


When I was a rails developer, the ‘done thing’ was to simply throw hardware at issues like these as an acceptable tradeoff for productivity. If you cared about this sort of thing, you’d use something more formal. I find it personally difficult to calm my pearl-clutching perfectionist tendencies to embrace that approach but I can’t deny it does work :)


Lifehack: instead of admitting you are rebooting the server every 10 minutes to clear the memory leaks, call it a "phased arena allocation strategy" and then it's fine.


Oddly, Apache uses a pool allocator[1], so it is using essentially the same strategy as Rasmus already.

1. or it did 20 years ago, i might be out of date :)


I didn't read it all but I noticed I enjoyed the way you write. I don't know if it's the emojis or the overall formatting you use.


Thank you! It is a long read, to be sure.


I've used languages with and without garbage collection. Usually, manual is more difficult to write, automatic is more difficult to troubleshoot.

I would love to use a language that allows me to do both. Writing exploratory code is more convenient using automatic memory management. There are certain kinds of code that benefit from manual memory management.

I find it appalling that people can't find a middle ground between forbidden and mandatory.


V uses a GC by default, but it's easily disabled per function/module via the @[manualfree] attribute or for the entire project via `v -gc none`

https://vlang.io


That language is C++. I almost never do manual memory management in that language, but you can do it if you want.


"Much has been written about various tools for profiling a leak, understanding heap dumps, common causes of leaks".

Eww.. leaks and heap dumps. Someone needs a healthier diet.




