Taming Go’s memory usage, or how we avoided rewriting our client in Rust (akitasoftware.com)
359 points by jeanyang on Sept 21, 2021 | 219 comments



The big wins in this article, in what I believe was the order of impact:

* They do raw packet reassembly using gopacket, and gopacket keeps TCP reassembly buffers that can grow without bound when you miss a TCP segment. They capped the buffers, and the huge 5 GB spikes went away. (A sketch of what that cap looks like follows this list.)

* They were reading whole buffers into memory before handing them off to YAML and JSON parsers. They passed readers instead.

* They were using a protobuf diffing library that used `reflect` under the hood, which allocates. They generated their own explicit object inspection thingies.

* They stopped compiling regexps on the fly and moved the regexps to package variables. (I actually don't know if this was a significant win; there might just be the three big wins.)
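
For reference, the buffer cap in the first bullet looks roughly like this with gopacket's tcpassembly package. This is a sketch from memory of that API, not the article's actual code (they may use the newer reassembly package), and the numbers are invented:

    package capture

    import "github.com/google/gopacket/tcpassembly"

    // newAssembler builds a TCP reassembler whose out-of-order buffers are
    // capped, so a missed segment can no longer make memory grow without
    // bound. The limits below are illustrative, not recommendations.
    func newAssembler(factory tcpassembly.StreamFactory) *tcpassembly.Assembler {
        pool := tcpassembly.NewStreamPool(factory)
        asm := tcpassembly.NewAssembler(pool)
        asm.MaxBufferedPagesTotal = 100000       // global cap across all connections
        asm.MaxBufferedPagesPerConnection = 1000 // per-connection cap
        return asm
    }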

This is a great article. But none of these seem Go-specific†, or even GC-specific. They're doing something really ambitious (slurping packets up off the wire against busy API servers, reassembling them in userland into streams, and then parsing the contents of the streams). Memory usage was going to be fiddly no matter what they built with. The problems they ran up against seem pretty textbook.

Frankly I'm surprised Go acquitted itself as well as it did here.

† Maybe the perils of `reflect` count as a Go thing; it's worth noting that there's folk wisdom in Go-land to avoid `reflect` when possible.


Agree strongly here. These are common sources of memory leaks in any language, and it's very likely that rewriting this code in Rust would lead to the exact same problems. (Other cases on HN, like Discord's in-memory cache and Twitch's "memory ballast" thing, are pretty Go specific -- the identical C program wouldn't have those particular bugs. But, the Go developers read these incident reports and do fix the underlying causes; I think Twitch's need for the "memory ballast" got fixed a few years ago, but well after the "don't use Go for that" meme was popularized.)

Buffering is a pretty common bad habit. As programmers, we know stuff is going to go wrong, and we don't want to tell the user "come back later" (or in this case, undercount TCP stream metrics)... we want to save the data and automatically process it when we can so they don't have to. But, unfortunately it's an intrinsic Law Of The Universe that if data comes in at X bytes per second, and leaves at X-k bytes per second, then eventually you will use all storage space in the Universe for your buffer, and then you have the same problem you started with. (Storage limits in mirror may be closer than they appear.) Getting it into your mind that you have to apply back pressure when the system is out of its design specification is pretty crucial. Monitor it, alert on it, fix it, but don't assume that X more bytes of RAM will solve your problem -- there will eventually be a bigger event that exceeds those bounds.

Incidentally, the reason why you can make Zoom calls and use SSH while you download a file is because people added software to your networking stack that drops packets even though buffer space in your consumer-grade router is available. That tells your download to chill out so SSH and video conferencing packets get a chance to be sent to the network. The people that made the router had one focus -- get the highest possible Speedtest score. Throughput, unfortunately, comes at the cost of latency (a full buffer adds buffer size / bandwidth of delay to every single packet!), and it's not the right decision overall.

I don't know where I was going with this rant but ... when your system is overloaded, apply backpressure to the consumers. A packet monitoring system can't do that (people wouldn't accept "monitoring is overloaded, stop the main process"), but it does have to give up at some point. If you don't have any more memory to reassemble TCP connections, mark the stream as an error and give up. If you're dumping HTTP requests into a database, and the database stops responding, you'll just have to tell the HTTP client at the other end "too many requests" or "temporarily unavailable". To make the system more reliable, keep an eye on those error metrics and do work to get them down. Don't just add some buffers and cross your fingers; you'll just increase latency and still be paged to fight some fire when an upstream system gets slow ;)
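
A minimal Go sketch of that "reject instead of buffering forever" idea, with a bounded queue that fails fast when it's full (the names and sizes here are invented for illustration):

    package main

    import (
        "errors"
        "fmt"
    )

    var ErrOverloaded = errors.New("queue full: apply backpressure upstream")

    // BoundedQueue holds at most cap(ch) items; Enqueue fails fast instead of
    // letting the buffer grow without bound.
    type BoundedQueue struct {
        ch chan []byte
    }

    func NewBoundedQueue(size int) *BoundedQueue {
        return &BoundedQueue{ch: make(chan []byte, size)}
    }

    func (q *BoundedQueue) Enqueue(item []byte) error {
        select {
        case q.ch <- item:
            return nil
        default:
            // Queue is full: tell the caller "too many requests" (or drop the
            // stream, bump an error metric, etc.) instead of buffering more.
            return ErrOverloaded
        }
    }

    func main() {
        q := NewBoundedQueue(2)
        for i := 0; i < 3; i++ {
            if err := q.Enqueue([]byte("work")); err != nil {
                fmt.Println("item", i, "rejected:", err)
            }
        }
    }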

Edit to add: I have a few stories here. One of them is about memory limits, which I always put on any production service I run. sum(memory limits) < sum(memory installed in the machine), of course. One time I had Prometheus running in a k8s cluster, with no memory limit. Sometimes people would run queries that took a lot of RAM, and there was often slack space on the machine, so nothing bad happened. Then someone's mouse driver went crazy, and they opened the same Grafana tab thousands of times. On a high memory query. Obviously, Prometheus used as much RAM as it could, and Linux started OOM killing everything. Prometheus died, was rescheduled on a healthy node, and the next group of tabs killed it. Eventually, the OOM killer had killed the Kubelet on every node, and no further progress could be made.

The moral of the story is that it would have been better to serve that user 1000 "sorry, Prometheus died horribly and we can't serve your request right now" responses, which memory limits would have achieved. Instead, we used up all the RAM in the Universe to try to satisfy them, and still failed.

(What was the resolution? I think we killed the bad browser, which happened to be a dashboard-displaying TV next to our desks. Then kubelets restarted, and I of course updated Prometheus to have a 4 GB memory limit. Retried 1000 tabs with an expensive query, and Prometheus died and the frontend proxy served 990 of the tabs an error message. Back pressure! It works! You can imagine how fun this story would have been if I had cluster autoscaling, though. Would have just eventually come back to a $1,000,000 AWS bill and a 1000 node Kubernetes cluster ;)


> it's an intrinsic Law Of The Universe that if data comes in at X bytes per second, and leaves at X-k bytes per second, then eventually you will use all storage space in the Universe for your buffer,

This is known as Little's Law. Using Little's Law, you know that if the average time spent in queue is more than the average time it takes for a new entry to be added to the queue, then your queue fills up.


Or in other words, a Little at a time adds up to a lot.


Did Little formulate multiple eponymous laws? Since that does not seem to be the Little's law that I'm familiar with.


Here's a good introduction to Little's Law and associated operational rules derived from it on queues: http://web.eng.ucsd.edu/~massimo/ECE158A/Handouts_files/Litt...


Thanks, but I had already had courses on that. We never associated the condition for stability (λ<μ) with Little's law (L=λW).


> They stopped compiling regexps on the fly and moved the regexps to package variables. (I actually don't know if this was a significant win; there might just be the three big wins.)

Anecdotally, this could be a huge win, depending on how often it's called.

A guy I was working with, new to Go, was writing a router config parser and asked why it was so slow.

The first thing I did was move regexp.Compile from a hot path into a broader scope. It went from something like 40 seconds down to 2 on my machine.
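
For anyone who hasn't hit this before, the fix is usually just hoisting the compile out of the hot path. A sketch (the pattern and function names are invented):

    package main

    import (
        "fmt"
        "regexp"
    )

    // Slow: recompiles the pattern on every call, allocating a fresh
    // regexp machine each time.
    func parseLineSlow(line string) []string {
        re := regexp.MustCompile(`^interface\s+(\S+)`)
        return re.FindStringSubmatch(line)
    }

    // Fast: compile once at package init and reuse it from the hot path.
    var interfaceRe = regexp.MustCompile(`^interface\s+(\S+)`)

    func parseLine(line string) []string {
        return interfaceRe.FindStringSubmatch(line)
    }

    func main() {
        fmt.Println(parseLine("interface eth0"))
        fmt.Println(parseLineSlow("interface eth0"))
    }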


I think it's easy to assume that in this case Go's regex library would keep an internal cache of expressions, using the expression string as a map key. But on the other hand, I can see why they haven't implemented it, because it obscures memory usage from the direct control of the author.

It would probably be a good idea to add performance hints like 'prefer to put static regular expressions in a package variable' in a linter or go vet.


Actually I would expect any package not to silently cache things unless explicitly told to. This otherwise creates an unbounded memory leak.

Moving static (at least as much it concerns the loop) expressions out of a loop is one of the most fundamental optimizations a programmer should do when writing code.


> I think it's easy to assume that in this case Go's regex library would keep an internal cache of expressions

IMHO, the stdlib doing implicit memoization is a catastrophe waiting to happen.

I think that handling regexps and caching functions are two composable and orthogonal features that should be handled by two packages/libs/... .


Spring (Boot) works exactly the same. We once found that 30% of CPU time was spent parsing path regexes in Controllers somewhere deep inside Spring. We rewrote 1500 endpoints to hardcoded paths and that fixed the CPU usage.


I've seen the same in Python, probably a dozen times. Sometimes folks think it's ugly (un-pythonic) but there's plenty of cases in the standard library to point to.


That's because the Python regex module caches the regexes it compiles, so it only happens once. It's proper and good usage to specify the regex string inline, even in a hot path.

I'd only use a variable when I'm using the same regex multiple times in code, and even then I could still just have the variable be the string.


> That's because the Python regex module caches the regexes it compiles, so it only happens once. It's proper and good usage to specify the regex string inline, even in a hot path.

Last time I had a look the regex cache was pretty small (a few hundred entries) and got completely cleared when full. Might have improved since, but historically it was very simplistic.

I disagree that it’s “proper and good usage” to specify regex inline. It’s fine for many usages but that’s as far as I’d go.


Even then, hashing and lookup is completely unnecessary in a hot path. Having a variable with a compiled regex is not unpythonic AFAIK


Yeah, it's been a while since I've benchmarked that, I'll try it out


Gotta agree with the sibling comments here. The performance difference is definitely smaller than it used to be, but there's still good reasons to keep compiled regexes in a module scope.

Caveat: I write libraries, not "production" code; my requirements are significantly more strict. One thing I can't do is make assumptions about where my code will run. If you're using my library, and you compile a whole bunch of regexes, they'll evict my regexes from the cache. I don't want the performance of my library to suffer, so I'll keep them in the module scope.


> Memory usage was going to be fiddly no matter what they built with.

That is true. I do find, however, that the explicitness of the Rust way of dealing with memory (lifetimes, who can and can't mutate it, who the memory belongs to) makes it much easier to reason about the right way of doing these things.

In C++ the same is often possible, but there is no way to have guarantees at the interfaces. Const is a promise that your function won't mutate something; it doesn't put any restrictions on the caller. Pass by reference doesn't guarantee that the reference will be kept alive.

Go (which I have no experience with) probably has fewer footguns, but how explicit is its memory management?


Rust and Go are equivalently explicit about the memory concerns surfaced in this analysis.


Yeah, and perhaps someone who knows Rust well could argue some things are easier to do right in Rust. For example, on the second bullet, passing readers could be more of the norm in libraries since Rust is a systems programming language. Similar point for the third bullet.

I'm not saying Rust is better or that they made the wrong choice; it sounds like C++ would let users easily make the same "wrong" choices. Just interesting to carry the thoughts through a bit further.


io.Reader and io.Writer are used everywhere in Go; it's really a standard practice.

https://tour.golang.org/methods/21
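
A sketch of the difference for a JSON payload (the Payload type is invented); the streaming version is what "pass readers instead" buys you:

    package stream

    import (
        "encoding/json"
        "io"
    )

    type Payload struct {
        Name string `json:"name"`
    }

    // Allocates a buffer the size of the whole body before parsing.
    func decodeBuffered(r io.Reader) (Payload, error) {
        var p Payload
        data, err := io.ReadAll(r)
        if err != nil {
            return p, err
        }
        err = json.Unmarshal(data, &p)
        return p, err
    }

    // Streams: the decoder reads from r incrementally, so memory stays
    // proportional to the decoded value, not to the raw input size.
    func decodeStreaming(r io.Reader) (Payload, error) {
        var p Payload
        err := json.NewDecoder(r).Decode(&p)
        return p, err
    }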


> They were using a protobuf diffing library that used `reflect` under the hood, which allocates. They generated their own explicit object inspection thingies.

IIRC it is `reflect.Type.FieldXXX` which is the main culprit of allocations. Since the number of types in a typical application is bounded and small, you can get pretty far by just precomputing/caching struct fields.
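
A hedged sketch of that kind of precomputation (the cache shape here is invented; the article's actual fix was to generate explicit visitors instead):

    package fieldcache

    import (
        "reflect"
        "sync"
    )

    // reflect.Type.Field allocates on every call, so cache the results
    // per struct type instead of re-asking reflect in the hot path.
    var (
        mu     sync.RWMutex
        fields = map[reflect.Type][]reflect.StructField{}
    )

    // FieldsOf returns the struct fields of t, computing them at most once.
    // t is expected to be a struct type.
    func FieldsOf(t reflect.Type) []reflect.StructField {
        mu.RLock()
        cached, ok := fields[t]
        mu.RUnlock()
        if ok {
            return cached
        }

        fs := make([]reflect.StructField, t.NumField())
        for i := range fs {
            fs[i] = t.Field(i)
        }

        mu.Lock()
        fields[t] = fs
        mu.Unlock()
        return fs
    }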


Reflection APIs seem to be pretty messy and slow in every runtime I've ever used, perhaps because the idea of optimizing them might encourage more use. The C# reflection APIs also allocate a lot.


A thing you can ding Go for is that you can find yourself relying on `reflect` (under the hood) more than you expect, because it's how you do things like read struct tags for things like JSON.

But that's not what the problem was here; the product they were building was using `reflect` in anger. They were relying on something that did magic, pulling a rabbit out of its hat to automatically compare protobuf thingies. They used it on a hot path. The room quickly filled with rabbit corpses. I guess you can blame Go for the existence of those kinds of libraries, but most perf-sensitive devs know that they're a risk.


Reflection is also typically needed for anything that needs to be generic over types. For example, if you want to write a function that can traverse or transform a map or slice, where the actual types aren't known at compile time. We have a lot of this in our Go code at the company I work for. I'm really looking forward to generics, which will help us rip out a ton of reflect calls.
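
For example, with the generics design slated for Go 1.18, a slice transform needs no reflect at all. A sketch using the new syntax, not anyone's production code:

    package main

    import (
        "fmt"
        "strings"
    )

    // Map applies f to every element of in; the types are checked at compile
    // time, where a reflect-based version would discover mismatches at runtime.
    func Map[T, U any](in []T, f func(T) U) []U {
        out := make([]U, len(in))
        for i, v := range in {
            out[i] = f(v)
        }
        return out
    }

    func main() {
        fmt.Println(Map([]string{"tcp", "http"}, strings.ToUpper))
        fmt.Println(Map([]int{1, 2, 3}, func(n int) int { return n * n }))
    }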


That kind of code is generally non-idiomatic in Go. An experienced Go programmer looks at something that is generic over types and does something interesting and instinctively asks "what gives, where are the dead rabbits?".

I'm less excited about generics. There's a cognitive cost to them, and the constraint current Go has against writing type-generic code is often very useful, the same way a word count limit is useful when writing a column. It changes the way you write, and often for the better.


This argument is getting a little tiresome though, isn't it? It isn't simply enough to call something "non-idiomatic" to gloss over a deficiency. There's a cognitive cost to all language features, but most other general purpose statically typed programming languages seem to have come to the conclusion that the benefit outweighs the cost for some form of generics.

I am by no means a Go basher, it is one of my favorite languages. But I eagerly await generics.


I could have written this more clearly. The fact that things that are generic over types are non-idiomatic today in Go has nothing to do with whether the upcoming generics feature is good or bad. They're unrelated arguments.

The latter argument is subjective and you might easily disagree. The former argument, about experienced Go programmers being wary when an API is generic over types, is pretty close to an objective fact; it is a true statement about conventional Go code.


That's a fair point. Knowing not to try to write generic code (since you don't have the tools) is the sign of an experienced Go programmer.

That being said, I'm curious how much kubernetes (a large, famous, Go codebase) still has code that does this. I used to read that it used a ton of interface{} and type assertion, but maybe that narrative is out of date (or never really true). I was never too familiar with the codebase myself.


I’m so conflicted on that point. I’ve been writing a high performance CRDT in Rust for the last few months, and I’m leaning heavily on generics. For example, one of my types is a special b-tree for RLE data. (So each entry is a simple range of values). The b-tree is used in about 3-4 different contexts, each time with a different type parameter depending on what I need. Without generics I’d need to either duplicate my code or do something simpler (and slower). I can imagine the same library in JavaScript with dynamic types and I think the result would be easier to read. But the resulting executable would run much slower, and the code would be way more error prone. (I couldn’t lean on the compiler to find bugs. Even TS wouldn’t be rich enough.)

Generics definitely make code harder to write and understand. But they can also be load bearing - for compile time error checking, specialisation and optimization. I’m not convinced it’s worth giving that up.


If we can be at a place where reasonable people can disagree about generics, I'm super happy, and think we've moved the discourse forward. There are things I like about generics, particularly in Rust (I've had the displeasure of dealing with them in C++, too). They're just not an unalloyed good thing.


Before writing Clojure, Rich Hickey wrote FOIL[1], which used sockets to communicate between Common Lisp and the JVM (or CLR). When asked about making it in-process, Rich observed that the reflection overhead on the JVM was often as large as, or larger than, the serialization overhead, so the gains to be had were limited.

1: http://foil.sourceforge.net/


From what I recall, the Java team copped to the intentionally slow accusation, but that started to change when they decided to embrace the notion of other languages besides Java running on the JVM. Unfortunately that would have been shortly after Clojure was born. It took a few releases for them to really improve that situation, and that was still shortly before they started doing faster releases.


The usual C# reflection APIs that devs turn to allocate a lot, but there are ways to make them almost performant by (re)using delegates and expressions. There are a number of good libraries to use reflection faster, as well.


IIRC, dynamically compiled expression trees in C# have the overhead of a single virtual call (on the resulting delegate) when executing - and cover all object factory and member access scenarios. But if you need to discover metadata, you still have to resort to Reflection APIs.


The main corner case that still causes me problems is that if you want to construct a delegate at runtime, this often forces you to go through the reflection APIs to actually grab the method even if you know its full signature, etc. My current project has a JIT compiler for scripts that has this problem (I ended up finding a workaround involving getting LINQ to generate method tokens in an assembly, but .NET Core / .NET 5 deprecated LINQ compilation...)


They're pretty simple in many dynamic languages, eg you can just do "import os; dir(os)" in Python.


yeah, but that doesn't mean they're fast or don't make a mess of the gc heap


I hadn't heard people use messy to refer to garbage creation before!

To continue with Python, yes, you might get a new container (dict) allocated like in the above case to hold the already existing interned attribute name strings. It's still quite light since the objects representing the information already exist and are used under the hood in the dynamic typing machinery.


It's a question I often ask in interviews: how do you upload a 5GB file over the network with only 1MB of memory?


I am not even sure what this question is aiming at - I hope you are phrasing it in more detail than put here, or it would fit in those posts about the problems with interview questions :).

Assuming the file is on a disk and the 1 MB refers to the system memory: like you do with any potentially unbounded data, you read and write it in chunks. Reading in data of any kind in whole is only reasonable if you can clearly set an upper bound on its size.
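
In Go, that chunked answer is essentially io.CopyBuffer with a small buffer. A sketch (the function name and destination are invented):

    package upload

    import (
        "io"
        "os"
    )

    // Send streams a file of arbitrary size to dst while this code only ever
    // holds one 1 MB chunk in memory at a time. (io.CopyBuffer may hand off
    // to an even cheaper fast path, like sendfile, if dst supports it.)
    func Send(dst io.Writer, path string) (int64, error) {
        f, err := os.Open(path)
        if err != nil {
            return 0, err
        }
        defer f.Close()

        buf := make([]byte, 1<<20) // the 1 MB budget: read a chunk, write a chunk
        return io.CopyBuffer(dst, f, buf)
    }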


This is not vague; it aims to see if the candidate has a notion of buffering, streaming, etc. The numbers don't really matter. You would be surprised at how many candidates have no idea how to load data in memory.


> with only 1MB of memory

Is that total system memory?


Well I left Kentucky back in '49 and went to Detroit work'n on assembly line..


To such a vague question, the only answer can be "less than 1MB at a time". Not terribly useful.


hello world in Go is 1.9 MB


I am not saying it's bad, given what the Go runtime can do


That's it, in most cases, performance is not down to the language but to how it's used. "Mechanical Sympathy" is one of those terms that might apply here as well.

As to the issues you mentioned, there's a few 'adages' you could apply; "always use readers", "don't use reflect if you can help it", "move unchanging expressions to package level", etc.


Maybe I'm just incompetent but why would you do this?


> Frankly I'm surprised Go acquitted itself as well as it did here.

As opposed to, e.g., Java, which, as I ranted elsewhere in the thread, is a trashy mess. I programmed for over a decade in Java, and yeah, it's only gotten worse over the years. They would have done even more custom processing and bypassing of the layers underneath due to Java's typical copy-happiness.


This kind of analysis and remediation would work just as well in Java and is often a more rigorous and effective approach than the author's somewhat Java-inspired initial idea of fiddling with GC parameters.

One big difference is that the Java runtime design intent is more in the vein of 'converting memory into performance'. On HN, Ron Pressler ('pron) has written a bunch of interesting stuff about that over the years

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


Java the language has improved a lot IME. If you're talking about some specific library, then I don't know.


Bit confused by this part of the article:

> PRO-REWRITE: Rust has manual memory management, so we would avoid the problem of having to wrestle with a garbage collector because we would just deallocate unused memory ourselves, or more carefully be able to engineer the response to increased load.

> ANTI-REWRITE: Rust has manual memory management, which means that whenever we’re writing code we’ll have to take the time to manage memory ourselves.

Isn't part of the point of Rust that you don't manage memory yourself, and rather that the compiler is smart enough to manage it for you?


Yes, Rust kinda doesn't fit super cleanly into a very black/white binary here. It is automatic in the sense that you do not generally call malloc/free. The compiler handles this for you. At the same time, you have a lot more control than you do in a language with a GC, and so to some people, it feels more manual.

It's also like, a perception thing in some sense. Imagine someone writes some code. They get a compiler error. There are two ways to react to this event:

"Wow the compiler didn't make this work, I have to think about memory all the time."

"Ah, the compiler caught a mistake for me. Thank goodness I don't have to think about this for myself."

Both perceptions make sense, but seem to be in complete and total opposition.


"Manual vs automatic" is mostly just a semantic problem IMHO. We could say "runtime versus compile time" to be more precise, but maybe there are problems there as well. The more interesting question to me is "how much time/energy do I spend thinking about memory management, and is that how my time is best spent?". In cases of high performance code, you might spend more time fighting with the GC than you would with the borrow checker to get the performance you need, but for everything else the hot paths are so few and far between you're most likely better off fighting with the GC 1% of the time and not fighting anything the other 99%.

The Rust community has done laudable work in bringing down the cognitive threshold of "manual / compile-time" memory management, but I think we're finding out that the returns are diminishing quickly and there's still quite a chasm between borrow checking and GC with respect to developer velocity.


"developer velocity" is also, in some sense, a semantic question. I am special, of course, but basically, if you include things like "time fixing bugs that would have been prevented in Rust in the first place", my velocity is higher in Rust than in many GC'd languages I've used in the past. It just depends on so many factors it's impossible to say definitively one way or another.


I have trouble believing this, at least in any generalizable way. I'm comfortable in both Go and Rust at this point (my Rust has gotten better since last year when I was griping about it on HN), and it's simply the case that I have to think more carefully about things in Rust because Go takes care of them for me. It's not a "think more carefully and you're rewarded with a program that runs more reliably and so you make up the time in debugging" thing; it's just slower to write a Rust program, because the memory management is much fiddlier.

This seems pretty close to objective. It doesn't seem like a semantic question at all. These things are all "knowable" and "catalogable".

(I like Rust more now than I did last year; I'm not dunking on it.)


I know you're not :) I try to be explicit that I'm only talking about my own experience here. I try not to write about my experiences with Go because it was a very long time ago at this point, and I find it a bit distasteful to talk about for various reasons, but we apparently have quite different experiences.

Maybe it depends on other factors too. But in practice, I basically never think about memory management. I write code. The compiler sometimes complains. When it does, 99.9% of the time I go "oh yeah" and then fix it. It's not a significant part of my experience when writing code. It does not slow me down, and the 0.1% of the time when it does, it's made up for it in some other part of the process.

I wish there was a good way to actually test these sorts of things.


This jibes very well with my experience. I like writing Rust, but I do so well aware that I could write the same thing in Go and still have quite a lot of time left over for debugging issues.

I can also get user feedback sooner and thus pivot my implementation more quickly, which is a more subtle angle that is so rarely broached in these kinds of conversations.

The places where I think the gap between Go and Rust is the smallest (due to Rust's type system) are things like compilers where you have a lot of algebraic data types to model--Rust's enums + pattern matching are great here.


I always miss match and options (I could go either way on results, which tend to devolve into a shouting match between my modules with the type system badly refereeing). But my general experience is, I switch from writing in Rust to Go, and I immediately notice how much more quickly I'm getting code into the editor. It's pretty hard to miss the difference.


I don't do much Go, so I can't really compare it with Rust all that well, but I think it's a plausible result.

To take two GC'd languages, I'm proficient in both Java and Scala. It usually takes me a little longer to write something in Scala, but when I'm done, I've almost certainly written fewer bugs in the Scala program than the Java program (I've also written many fewer lines of code, but that's another topic).

For me, it's the type system that helps the most. Given that Rust's type system is much stronger and more expressive than Go's, I do expect to write fewer bugs in Rust than in Go. But it does feel like, if I had more experience with Go, I'd be significantly faster writing Go than Rust. (Then again, the more I write Rust, the fewer write-compile-fail-fix cycles I have to go through, and the compiler's ability to accept code as safe improves pretty frequently.)

Still, though (and I know this isn't the question at hand, but...), I personally value greater chances of correctness at compile time way more than development speed. While some types of bugs can be a fun adventure to track down and fix, most bugs I encounter are some mix of boring and annoying. I honestly would prefer to spend 2 weeks building and 2 days debugging over 1 week building and 1 week debugging. I really do find debugging that annoying. (Fuzzy numbers; I don't actually think I'd build 2x as fast in Go as Rust.)


> I personally value greater chances of correctness at compile time way more than development speed

In my experience, I don’t get much additional correctness for the extra effort, but rather I get independence from the GC, which is worth much less to me.

If we’re optimizing for correctness alone, I think development times could improve significantly by swapping the borrow checker for a gc. I know the borrow checker aids in correctness beyond what a gc does, but IMO the returns diminish rapidly. And I’m not sure how well this would work in practice, but maybe you could keep the borrow checker and add a GC, with every reference type being gc<> by default (not sure if that would recoup any of the extra correctness that a borrow-checker affords or not).


Things like the borrow checker help manage all resources; a GC only manages memory.


Or rather RAII; the borrow checker is a different feature that is not related to garbage collection.


Yes, I understand. To quote myself:

> I know the borrow checker aids in correctness beyond what a gc does


> it's just slower to write a Rust program, because the memory management is much fiddlier.

Really depends on what kind of programs you write. I found that my Rust development gets slowed down only because I have to spend time creating the proper types. Memory management and lifetime problems are very few in my practice (but I can agree that they can swallow time -- only when you are new, though).


It's very much a confusing process. If C-styled memory management is skydiving and Python is parachuting, Rust can feel a bit like bungee-jumping. It's neither working for or against you, but it will behave in a specific way that you have to learn to work around. Your reward for getting better at that system is less mental hassle overall, but it's definitely a strange feeling, particularly if you're already comfortable with traditional memory management.


> At the same time, you have a lot more control than you do in a language with a GC

Are there some examples of that?


Two examples: LockGuard and Box.

Control is primarily exerted over consumers of your API rather than the actual resources. This can be enforced through a combination of Drop implementations, and closures / lifetimes; the classic example is Mutex's LockGuard. In a GC language (eg Go) they give you defer or finally blocks that can accomplish the same thing, but that is always optional and up to other programmers to remember to do. Compare: you can't typically make someone run destructors in a GC language; you also wouldn't be able to guarantee the destructors have run at any particular point in time.

The one area you have more control over actual resources is knowing when memory is freed. Some people need to know when memory is freed, because they have allocated a lot and if they do it again without freeing, they'll run into trouble. To know for sure, simply use a normal owned type or a unique pointer (Box); when it goes out of scope, that's when its destructor is run. No such feature exists in a GC language, because you can never know at compile time when nobody else holds a reference.

As a thought experiment: in JavaScript with WebAssembly, an allocation in WASM can be returned to JS as a pointer. You need to free it, somehow. Can you write a class that will deallocate a WASM allocation it owns when an instance of the class is freed by the JS GC? (Answer: no! You need a new language-provided FinalizationRegistry for that.)


Ah, so it's more about library writer control then about library consumer control? Since for example in Common Lisp, the latter can still be accomplished through declarations, such as DYNAMIC-EXTENT (http://clhs.lisp.se/Body/d_dynami.htm). (Not sure if the former is necessarily related to memory usage control, but you'd probably achieve that type of resource control by exposing only WITH-* macros in your API.)

Maybe D people would have something to say about this as well, but I'm not a D person. What you're describing doesn't seem impossible in D to me, though.


Edit: yes. Library consumers don't get to change much, except where you have generic functions that abstract over a trait like `T: Borrow<T2>`, and then you can pass in any kind of owned or borrowed pointer to T2.

Dynamic-extent appears to be more similar to the "register" hint in C than to anything in Rust, in that it's an implementation-defined-behaviour hint. Rust has no such thing as hinting at storage class. Your variables are either T (stack) or Box<T> (heap) or any other box-like construct involving T. You maintain complete control at all times, nothing is implementation-defined, and it's explicit. You can implement (and people have implemented) dynamic switching between stack and heap storage in a Rust library.

https://lib.rs/smallvec (stack to heap), https://lib.rs/tinyvec (smallvec with no unsafe code), https://lib.rs/arrayvec (stack only)

As you can see, these three library authors get to control very precisely how their types allocate and deallocate, and you basically mix and match these and the stdlib's smart pointers (and Vec) + other libraries like arenas, slot maps, etc to allocate the way you want.

> you'd probably achieve that type of resource control by exposing only WITH-* macros in your API

Yes, this and similarly using with_* closures both work, but both are more limited than destructors that run when something goes out of scope. A type that implements Drop can be stored in any other type, and the wrapper will automatically drop it. You can put LockGuard in a struct and build an abstraction around it.


I feel that this is one of those common misconceptions about Rust. Rust's memory management is nothing like C or non-modern C++'s with malloc/free or new/delete. Rust uses modern-C++'s RAII model, typically, to allocate memory. The compiler is smart enough to know when to call drop() (which is essentially free/delete, but with the possibility of additional behavior). You can also call drop() yourself.

What I think people _should_ focus on with Rust versus Go (et al) is that Rust allows you to choose where you _place_ memory. You can choose the stack or the heap. The placement can matter in hot regions of code. Additionally, Rust is pretty in-your-face when it comes to concurrency and sharing memory across thread/task boundaries.


Tangentially, I did a bit of Rust work recently. I was sadly unable to find a concise credible answer to a rather elementary best-practices question: How does ownership interact with nested datastructures? Is it possible to build a heap tree without Boxing every node explicitly?


This question is a bit subtle, it depends on exactly what you mean. You could make a tree using only borrow checked references and the compiler would make sure that parent nodes go out of scope at the same time or before the child nodes they point to, but I don't think that's what you're talking about.

In general, if it's a datastructure where you have to use pointers, you'll have them Box'ed, but you would try to avoid that if you can. In your example of a heap, you'd want to use an array-based implementation, probably backed by a growable Vec, and use indexes internally. A peek function would still return a normal Rust reference to the data, and the borrow checker would make sure that you don't mutate the heap's backing array while that reference was still in use, etc.


I never thought about using a Vec for these, but that is a great idea for keeping the memory management sane for tree/linked lists.

One thing I would add is that you need to be wary of destructors with large pointer-based data structures in Rust, since they can easily cause a stack overflow. When using Option<Box<T>> you need to be careful to call Option::take on the pointers in a loop to avoid stack overflow.


You'd do the same stuff you'd do in C++ here; allocate every node explicitly, use an arena, whatever you want.


You might be interested in this:

https://rust-unofficial.github.io/too-many-lists/


Thanks. Saw that before, but the credibility/length ratio wasn't high enough to read it more carefully. It appears that we do have to Box/Rc/Arc nodes in a recursive datastructure. Doable, but a bit on the inconvenient side.

    struct Node {
        elem: i32,
        next: Option<Box<Node>>,
    }


All explanations start with why, without any indirection, Node would be a recursive, infinitely large type. Therefore the Node must be behind a pointer. Ok. But then Rust forces you to answer this question: who will own the data referred to by the pointer? Consequently, who will be responsible for freeing it?

If you use a &mut Node as your pointer, you are attempting to answer those questions with "not me". Someone else has to own the Nodes. They've got to be somewhere on the stack or on the heap. There's nothing stopping you from defining the next pointer as Option<&'a mut Node>. The problem is actually constructing a list.

Not many answers tell you why you can't do this in practice. I agree that this is not explained well enough in general, because new Rustaceans don't intuitively reach for references so it probably doesn't come up much and they're hard to use for this. But it's not that hard to see why:

Imagine you try to allocate all the Nodes at once (e.g. an array), and then use &mut references into the pre-allocated array. In order to set one &mut Node's next pointer, you will have to hold another &mut Node to set it to. This means you need to acquire mutable references to array elements, in the order that you wish them to appear in the linked list. This is actually really tricky to do: slice::get_mut(index)'s returned reference borrows the entire slice, so it doesn't let you have a &mut reference to two nodes at the same time. You need smaller &mut [Node] slices, somehow.

slice::split_first_mut is one way (in order), but if you have an array and can only create a linked list in the order nodes appear in the array, what's the point? Just use the array! Any other compiler-checked access order scheme will also be so limiting that you should just use a data structure of that exact shape anyway. To use an arbitrary order, you're going to need unsafe, so you'd basically be writing C.

To be fair, there is basically one application of this, and it's to have a sparsely populated constant size array that needs to be iterated in order. I made a demo:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

The other problem is that you can't resize the backing array: your &mut Node references would be invalidated.

For this reason, pre-allocated lists like these are usually done with indices instead of references. The overhead is one pointer + offset and then a bounds check when dereferencing.

---

The other solutions answer the ownership question like so:

- You can use Box, so that each Node (acting as a list head) owns the entire tail of the list, uniquely, such that no other list can also refer to it. The tail is freed when the head is, unless of course you detach it first (let tail = node.next.take();).

- You can Arc/Rc the nodes, so that each node has a pointer, but not a unique pointer, to the next node. These can be duplicated, so lists can exhibit structural sharing if you are comfortable with that. Because of the sharing, freeing the head does not necessarily free any/all of the tail.


An improvement, using the Node struct and apparently pushing the limits of borrowck: https://play.rust-lang.org/?version=stable&mode=debug&editio...

If you actually want this intrusive linked list functionality, consider using a real-world implementation like this: https://lib.rs/intrusive_collections


Thanks for the in-depth dive. My usecase is doing transformations over an AST: trees are immutable, but may become shared or dead deep in the middle of some complex transformation. Probably Rc<Node> is the reasonable approach, as Box is too constraining.


Of course. Arena allocation comes to mind.


> Additionally, Rust is pretty in-your-face when it comes to concurrency and sharing memory across thread/task boundaries.

Use channels whenever possible.


Channels are not always the best solution (unless you're referring to Rust channels?)

https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-s...


Yeah, Rust's crossbeam channels are actually really good.


It kills me that RAII is considered modern C++. It's been there since 1983, ha; what do you think fstream and std::vector are if not RAII wrappers over files or memory?


I think before the introduction of move semantics in C++11, there were a lot of cases where you needed new and delete to get basic things working. (Moving an fstream around is a relevant example.) So the modern rule of "don't use new and delete in application code" really wasn't practical before that.


No, pretty much everything could be done with swap (like moving an fstream as you say). Sure, it's a bit more cumbersome, but it was still RAII.


I suppose RAII is an old concept, but move semantics allowing RAII to transfer ownership and avoid manual new/free of non-copied resources was uncommon until C++11.


Before unique_ptr we didn't have a good way to handle RAII for a lot of things. I wrote a lot of RAII wrappers for various things (still do, but a lot less). Attempts like auto_ptr show just how hard it was to make RAII work well before C++11.

Yes we had RAII, but it didn't work for a lot of cases where we needed it.


Go = you do no explicit memory management and the GC/runtime takes care of it for you

Rust = when writing your code, you explicitly describe the ownership and lifetime of your objects and how your functions are allowed to consume/copy etc. them and get safety as a result

C = when writing your code, you explicitly allocate and free your objects and you get no assistance from the language about when it is safe to copy/dereference/free/etc. a pointer/allocation


I prefer to think that in Go you don't do explicit memory management by default, while in Rust you do. Although you can laboriously opt out of explicit memory management (e.g., by tagging everything Rc<> or Gc<> and all of the ceremony that entails).


> Isn't part of the point of Rust that you don't manage memory yourself, and rather that the compiler is smart enough to manage it for you?

For trivial cases, kind of. But once you start to do anything remotely sophisticated, no. Everything you do in Rust is checked w.r.t. memory management, but you still need to make many choices about it. All the stuff about lifetimes, borrowing, etc: that's memory management. The compiler's checking it for you, but you still need to design stuff sanely, with memory management (and the checking thereof) in mind. It's easy to back yourself into a corner if you ignore this.


While some commenters have pointed out that you still need to deal with lifetimes/thinking about where stuff lives, in practice you can avoid almost all of this by using Rc<Type> instead of Type everywhere (or Arc in a multithreaded scenario).

Yes Rc and equivalents have a performance overhead, but for many use cases the overhead really isn't that bad since you typically aren't creating tons of copies. In practice, I've found one can ignore lifetimes in almost all cases even when using references except when storing them in structs or closures. So really you would just need to increment the Rc counter for structs/closures outside of allocation/deallocation which is dominated by calls to malloc/free.


I've tried this before and it was so laborious that I regretted it. I'm not sure I saved myself any time over writing "vanilla" Rust or whatever one might call the default alternative. If I was really interested in writing Rust more quickly, I would just clone everything rather than Rc it, but in whichever case you're still moving quite a lot slower than you would in Go.


I've tried writing Rc-oriented Rust (for gtk-rs) too, and struggled hard with the pervasive cloning/aliasing needed, having to use weak references to avoid leaking memory, and the clone!() macro turning off rustfmt for all code in the method body. In fact, I'd rather deal with Qt-style memory management, with single QObject ownership, QPointer (which is kinda like a weak pointer), and praying you don't use-after-free.

(Normally I use subclassing in Qt to associate extra state with a widget, but gtk-rs's subclassing API was arcane and boilerplate-heavy. Perhaps there's alternative paradigms for state management that follows Rust's single ownership principle better. Some people take a React/Elm-style approach, but I don't think virtual DOMs and diffing the entire UI tree on each user interaction are the last word on GUI interactivity and updates, and I don't find the added memory of virtual DOMs and CPU of generating/diffing them acceptable, but rather "pure overhead" to be eliminated in favor of minimal targeted UI state updates.)


You can also kind of do your own management of memory in GC languages, you just have to be extremely careful in code review to spot inadvertent allocations in the hot path. A great example is the "LMAX Disruptor" in Java: https://lmax-exchange.github.io/disruptor/

The trick is to pre-allocate all your objects and buffers and reuse them in a ring buffer. Similar techniques work in zero-malloc embedded C environments.
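
The same pre-allocate-and-reuse idea is idiomatic in Go with sync.Pool or a fixed ring of buffers. A sketch (the buffer size and names are arbitrary):

    package bufpool

    import "sync"

    // pool hands out reusable 64 KB buffers so the hot path allocates (and
    // the GC later collects) far less; the same pre-allocation idea as the
    // Disruptor, minus the ring-buffer machinery.
    var pool = sync.Pool{
        New: func() interface{} { return make([]byte, 64<<10) },
    }

    // Process borrows a buffer for the duration of handle and then returns
    // it for reuse instead of leaving it for the garbage collector.
    func Process(handle func(buf []byte)) {
        buf := pool.Get().([]byte)
        defer pool.Put(buf)
        handle(buf)
    }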


While you may not have to directly call malloc and free in Rust, the memory management still feels very manual compared to a language with GC. When I want to pass an object around I have to decide whether to pass a &_, a Box<_>, Rc<_>, or Rc<RefCell<_>>, or a &Rc<RefCell<_>>, etc. And then there are lifetime parameters, and having to constantly be aware of relative lifetimes of objects. Those are all manual decisions related to memory management that you have to constantly make in Rust that you wouldn't need to think about in Go or Python or Java.

Similarly, idiomatic modern C++ rarely needs new and delete calls, but I'd still say it has manual memory management.

I suppose it's reasonable to talk about degrees of manual-ness, and say that memory management in Rust or modern C++ is less manual than C, but more manual than Go/Python/Java.


There are already a lot of replies to this comment explaining the ideas behind Rust memory management in different ways, but I'll throw in my handwavy explanation as well:

In GC languages, memory management is generally done at runtime by the interpreter/runtime. In C, memory management is generally done at programming time by the (human) programmer. In Rust, memory management is generally done at compile time by the compiler. There are exceptions in all three cases, but the "default" paradigm of a language informs a lot about how it's designed and used.


You are still managing memory in Rust, it’s just more constrained, statically checked and inferred. Within those constraints you have full control.


I'm not a rust user, but I would argue you are still managing memory manually, you're just doing a lot of it through rust's type system, which can check for errors at compile time, rather than through runtime APIs like the C or C++ standard library. The question then becomes whether it is easier to manage memory through Rust's type system versus via standard runtime APIs.

From what I've read, Rust memory management actually requires more work but provides fantastic safety guarantees. This could mean that Rust actually lowers productivity at first, but as the complexity of the code base grows, some of that productivity is restored or even supersedes C/C++ because you spend no time chasing runtime memory bugs.

For some products or projects, the costs of shipping a security flaw caused by a memory bug exploit could be high enough that a drop in productivity from Rust relative to C is still more than justified due to external costs that Rust mitigates.


I think sometimes the "compiler manages memory for you" concept gets overplayed a bit. It's not as complex as that description makes it sound. If you understand C++ destructors, it's really the same thing. Objects get destroyed when they go out of scope, and any memory or other resources they own get freed. The differences come up when you look at what happens when you make a mistake, like holding a pointer to a freed object. (Rust catches these mistakes at compile time, which does indeed involve some new complexity.)


Try to implement a data structure that works across async runtimes, or a couple of GUI widgets, then you will get the point why some of us complain about the borrow checker, even with decades of experience in C and C++.


Or rather, acting as if rust is positioned to replace general purpose languages.


It's very easy to "make it work" while fencing with compiler warnings by just copying things around instead of developing a clear sense of memory ownership. I've seen myself fall into this trap. The upside, coming from C, is that you don't have terrible memory safety issues. The downside is that you have the same data copied all over the place and (accidentally) allocate like a mad man. Managed memory is not inherently bad or good.


I also was confused about that part but for another reason: The whole post is basically "despite go having a GC we had to manually manage the memory to make it work" and then the anti-rewrite is "go does memory management for us". IMO people sometimes have really weird ideas what is and isn't part of managing memory.


After using Rust for a few years professionally it's my take that people that really want to use it haven't had much experience with it on real world projects. It just doesn't live up to the hype that surrounds it.

The memory and CPU savings are negligible between Go and Rust in practice, no matter what people might claim in theory. However, the side effect of making your team less productive by using Rust is a much higher price to pay than just running your Go service on more powerful hardware.

There are many other non-obvious problems with going to Rust that I won't get into here but they can be quite costly and invisible at first and impossible to fix later.

Simple is better. Stay with Go.


Agreed. My organization has been a great testing ground for comparing Go vs Rust service development. The teams that spun up web services in Rust have almost uniformly had poor experiences. In addition to Rust's steep learning curve, the relatively feature-poor standard library (you have to pull in a third party package to create a SHA256!), and instability of best practices/tools around service writing, in one case lengthy Rust compile times actually increased the time to resolve an incident. We've largely reached a consensus that all new services should be written in Go.

I don't see Rust having much of a place in web services development until there's years of improvements in place. There's plenty of other potentially appropriate places for Rust replacing systems code.


> (you have to pull in a third party package to create a SHA256!)

nitpicking here, but this is by design - it's also true for datetimes and random numbers. it isn't a fault, it's a different packaging philosophy.

i agree with the rest - the good things about Rust just don't matter as much when developing bit-shoveling HTTP services, which is what 99% of backend seems to do nowadays.


Just out of curiosity - what kind of service was it? My experience with web services (API and websockets) has been great with Rust and actix, so I'm curious if it might be a difference of the work that needed to be done.


Explicitly managed memory is useful for handling buffers. Everything else is peanuts anyway and could use a GC for ergonomics reasons. That being said, some really prefer the ergonomics of working with Result and combinators compared with the endless litany of "x, err := foo(); if err != nil". IMHO there is still room for significant progress in this space; neither Rust nor Go have hit the sweet spot yet.


My experience differs quite a bit. I did a bit of production code in Go and a bit of Rust as a hobby + one production Rust service. I guess it might depend on the kind of problems that you work on, but for the most part I don't think that my Rust code is so much different than Go. Definitely more concise. I admit there are times when I have to spend more time to think about how to implement a certain thing, but honestly, if you don't need raw performance you almost always can get away with one of the smart pointers and cloning (or just cloning?). So I don't feel that I'm much slower writing Rust and I'm happy to have more compile-time checks.

I don't think that my experience is something isolated, either; here is for example a quote from a Microsoft employee:

> "For the first week or so, we lost much of our time to learning how borrows worked. After about two weeks, we were back up to 50% efficiency compared to us writing in Go. After a month, we all were comfortable enough that we were back up to full efficiency (in terms of how much code we could write)," writes Thomas.

> "However, we noticed that we gained productivity in the sense that we didn't spend as much time manually checking specific conditions, like null pointers, or not having to debug as many problems."

https://www.zdnet.com/article/microsoft-why-we-used-programm...


This is highly project specific. Go is not suitable for everything. Rust is designed as a C++ replacement not a language for writing backends. Even though a whole lot of effort was put into this space. Go is very good at writing backends, Rust is very good at replacing C++. Everything else the waters get much muddier.


Obviously this is a personal preference, but I prefer Rust for web services. And so I have a question - do you have experience in writing web services with Go and/or Rust? I'm often wondering what do people miss when writing Rust based web services.

Recently I even gave a shot to a todo-backend[1] implementation in Rust[2] and it honestly doesn't look that different from the Go versions.

Granted the todo-backend spec is very very simple. I would prefer to also include stuff like authentication/authorization and maybe even multi tenancy to compare better. But when I'm writing this kind of Rust code I'm often wondering - what makes Rust so unergonomic for other people?

  1. https://todobackend.com/
  2. https://github.com/drogus/todo-backend/blob/main/src/main.rs


Rust async is not as simple to use, and the ecosystem is much smaller and segmented across async-std and tokio.

A good backend stack requires a rich ecosystem of various connectors to databases, cloud services, payment services, frontend stuff like server side rendering, graphql etc.


> the side effects of making your team less productive by using Rust is a much higher price to pay than just running you Go service on more powerful hardware.

This entirely depends on the ratio of development effort to deployed instances. At one end of the spectrum, lots of developers work for years on a system which is only deployed on one machine; obviously you optimize for developer effort and buy a single massive machine. At the other end of the spectrum, a few developers work for a short time on a system which is deployed at massive scale; obviously you optimize for performance.

At Pernosco we have a very small team deploying a relatively small number of instances, and after five years of Rust we're very happy.


My problem with Rust: I'm sure that if I used it as my primary language for a couple of years, I would be able to claw the productivity loss back. But I can't find any reason to justify using it at my current productivity level.

There is a vicious cycle: few projects use Rust because the productivity hit is large, and programmers do not get enough experience using Rust because few projects use it.


I don't think that 5 years is needed to feel productive. I started Rust a few years ago, but I dropped it due to lack of time and I remember that I had a really hard time with some of the stuff (most notably futures before async/await). I got back in 2019 and I wrote maybe two small projects (under 300 lines of code each I think) and I read quite a lot. After that I got to implement a production web service and also mentor/teach two people. It went very smoothly and for the most part there were no major blockers.

Obviously it all depends on a lot of stuff, but I think that for most people a few weeks to a month of writing Rust at work (meaning full time, not like an hour in the evening here or there) should be enough to feel decently productive.

Another thing is that if you tried Rust a long time ago, check it out again. I think that both the language and the ecosystem have changed a lot in recent years; it's hard to compare how easy it is to do Rust now vs 2016 or 2017 when I first tried it.


I don't think that vicious cycle exists. I was immediately productive with Rust, but OK, that's just an anecdote. Surveys such as Stack Overflow's Developer Survey show Rust usage growing rapidly.

If I met someone who took years to be productive with Rust, I would conclude they lack aptitude for programming. Maybe harsh, but probably true.


I guess I wasn't clear. We were happy with Rust from almost day 1. After five years, we're still happy.


>> Simple is better. Stay with Go.

I've been feeling the same, but as someone who has only played with Go/Rust (and never professionally), it's nice to hear that professionals feel the same.


I mean, I'm a professional and I'd say "it depends" (as always), but for most of the stuff that I do I would choose Rust, especially if I care about maximum reliability. Go is statically typed, but I've had situations where there was a runtime exception in Go because of a mistake that wasn't caught by tests or code review. In Rust you almost never see runtime exceptions, especially with good linting rules. And thanks to the absence of data races I feel much more confident writing concurrent code.


Why do you say "less productive with Rust"? In my experience I'm more productive with Rust because its very strong type system catches so many bugs.


Can you name some non-obvious problems?


Very nice write up.

Go’s focus on simplicity means that there is only a single parameter, SetGCPercent, which controls how much larger the heap is than the live objects within it.

FWIW, there is a new proposal from a member of the core Go team to add a second GC knob in the form of a soft limit on total memory:

https://github.com/golang/proposal/blob/master/design/48409-...

It includes some provisions to make sure that the application can keep making progress and avoid death spirals (part of the reason why it is a "soft" limit), and also includes some new GC-related telemetry.

Judging from the blog write-up, a second GC knob with a soft limit might have been only a minor help here, with the bigger wins coming from the code changes they described.
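
For concreteness, a minimal sketch of how the two knobs compose, assuming the debug.SetMemoryLimit API shape described in the proposal (it later shipped in Go 1.19 alongside the GOMEMLIMIT environment variable); treat it as an illustration rather than the final API:

    package main

    import "runtime/debug"

    func main() {
        // Existing knob: the heap may grow to roughly (1 + GOGC/100) x live data
        // before a collection is triggered. Same effect as the GOGC env var.
        debug.SetGCPercent(100)

        // Proposed second knob: a soft cap on the Go runtime's total memory.
        // As usage approaches the limit the GC runs more aggressively instead of
        // letting the heap spike past it; "soft" means the runtime may still
        // exceed it to avoid a death spiral.
        debug.SetMemoryLimit(1 << 30) // ~1 GiB

        // ... rest of the program ...
    }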


Hmm, I wonder whether a better alternative might be the ability to set a minimum memory size. It is a bit annoying to start a Go program when you know exactly that you are going to need 1 GB of memory, and after the first 1 MB is allocated it tries to GC before growing the heap. If you could set a minimum memory size, you could get away with a very low GOGC value to limit the space overhead beyond that minimum.
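
In the absence of such a knob, the usual workaround is the heap "ballast" trick: a large allocation that is never written to, which inflates the heap size GOGC's percentage is measured against. A minimal sketch (the 1 GB figure is just the example from the comment above):

    package main

    import "runtime"

    func main() {
        // Never written to, so the OS typically doesn't commit physical pages
        // for it, but the GC counts it as live and stops collecting constantly
        // while the real live set is still tiny.
        ballast := make([]byte, 1<<30)

        // ... real work here ...

        runtime.KeepAlive(ballast) // keep the ballast reachable for the program's lifetime
    }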


Buried in here are great examples of why rewrites don’t help:

“The module that does this inference was recompiling those regular expressions each time it was asked to do the work.”

“The reason for the allocation was a buffer holding decompressed data, before feeding it to a parser. …the output of the decompression could be fed directly into the parser, without any extra buffer.”

The problem here isn’t that the language has GC, it’s that memory usage was just not considered. If you want performance, you have to pay attention to allocations no matter what kind of memory management your language has. And as the article demonstrates, if you pay attention, you can get performance no matter what kind of memory management your language has.
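
For concreteness, a minimal Go sketch of the two fixes quoted above (not the article's actual code): compile the regexp once at package scope, and feed the decompressed stream straight into the parser instead of buffering it.

    package main

    import (
        "compress/gzip"
        "encoding/json"
        "io"
        "regexp"
    )

    // Compiled once; calling regexp.MustCompile inside the hot path would redo
    // the compilation work (and its allocations) on every call.
    var timestampRE = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T`)

    func looksLikeTimestamp(s string) bool {
        return timestampRE.MatchString(s)
    }

    // decodeBody streams decompressed bytes directly into the JSON decoder
    // rather than materializing the whole payload with io.ReadAll first.
    func decodeBody(r io.Reader, dst interface{}) error {
        zr, err := gzip.NewReader(r)
        if err != nil {
            return err
        }
        defer zr.Close()
        return json.NewDecoder(zr).Decode(dst)
    }

    func main() {
        _ = looksLikeTimestamp("2021-09-21T12:00:00Z")
    }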


Right, but GC encourages you not to think about memory at all until the program starts tipping over, and by then fixing the underlying cause of the leak requires an architecture change, because the "we hold onto everything" assumption got baked into the structure in 2 places that you know about and 5 that you don't.

I don't miss the rote parts of manual memory management, but it had the enormously beneficial side effect of making people consider object lifetimes upfront (to keep the retain graph acyclic) and cultivate occasional familiarity with leak tracking tools. Problematic patterns like the undo queue or query correlator that accidentally leak everything tended to become obvious when writing the code, rather than while running it. These days, I keep seeing those same memory management anti-patterns show up when I ask interviewees to tell a debugging war story. Sometimes I even see otherwise capable devs shooting in the dark and missing when it comes to the "what's eating RAM" problem.

I feel like GC in long-form program development substitutes a small problem for a big one. Short-form programming can get away with just leaking everything, which is what GC does anyway, so I'm not sure there's any benefit there either.

tl;dr: get off my lawn.


GC will not fix trashy programming. The problem is that many GC'd languages have adopted a style guide that commits to a lot of unnecessary allocations. For example, in Java, you can't parse an integer out of the middle of a string without allocating in-between. Ditto with lots of other common operations. Java has oodles of trashy choices. With auto-boxing, allocations are hidden. Without reified (let's say, type-specialized) generics, all the collection classes carry extra overhead for boxing values.

I write almost all of my code in Virgil these days. It is fully garbage-collected but nothing forces you into a trashy style. E.g. I use (and reuse) StringBuilders, DataReaders, and TextReaders that don't create unnecessary intermediate garbage. It makes a big difference.

Sometimes avoiding allocation means reusing a data structure and "resetting" or clearing its internal state to be empty. This works if you are careful about it. It's a nightmare if you are not careful about it.
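
In Go terms, that careful-reuse pattern often ends up as a sync.Pool of buffers that are Reset between uses; a minimal sketch (the names are made up for illustration):

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    // Reset clears the contents but keeps the underlying capacity, so the
    // steady state allocates almost nothing.
    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func render(name string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset() // forgetting this is where the "nightmare" starts
        fmt.Fprintf(buf, "hello, %s", name)
        out := buf.String()
        bufPool.Put(buf)
        return out
    }

    func main() {
        fmt.Println(render("world"))
    }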

I'm not going back to manual memory management, and I don't want to think about ownership. So GC.

edit: Java also highly discourages reimplementing common JDK functionality, but I've found that building a customized data structure that exactly fits my needs (e.g. an intrusive doubly-linked list) can work wonders for performance.


> many GC'd languages have adopted a style guide that commits to a lot of unnecessary allocations.

Oh, that too. I forgot to rant about that.

> Virgil

Unfortunately I'd rather live with a crummy language that has strong ecosystem, tooling, and developer availability, so I'll never really know. It does sound nice, though.


Yeah, but that was one of Java 1.0's mistakes, one that thankfully Go, .NET, D, and Swift, among others, did not make.

Now let's see if Valhalla actually happens.


> Right, but GC encourages you to not think about memory at all

I’ve come to a new obvious realisation with this sort of thing recently: if you care about some metric, make a test for it early and run it often.

If you care about correctness, grow unit tests and run them at least every commit.

If you care about performance, write a benchmark and run it often. You’ll start noticing what makes performance improve and regress, which over time improves your instincts. And you’ll start finding it upsetting when a small change drops performance by a few percent.

If you care about memory usage, do the same thing. Make a standard test suite and measure it regularly. Ideally write the test as early as possible in the development process. Doing things in a sloppy way will start feeling upsetting when it makes the metric get worse.

I find when I have a clear metric, it always feels great when I can make the numbers improve. And that in turn makes it really effortless to bring my attention to performance work.
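
In Go, that memory metric can ride along with the ordinary benchmarks; a minimal sketch of a _test.go file where allocations are reported next to speed (run with `go test -bench . -benchmem`):

    package parser

    import (
        "encoding/json"
        "testing"
    )

    type record struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    func BenchmarkDecodeRecord(b *testing.B) {
        b.ReportAllocs() // allocs/op and B/op show up next to ns/op
        input := []byte(`{"id": 1, "name": "example"}`)
        for i := 0; i < b.N; i++ {
            var r record
            if err := json.Unmarshal(input, &r); err != nil {
                b.Fatal(err)
            }
        }
    }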


Not so much. Here we have an example of a memory pressure problem that's evident only under high load in realistic environments. This is a classic problem with performance engineering: it's usually difficult to do realistic automated load testing. Instead, you end up running lab experiments, which are time-consuming to set up.

The whole post is essentially about how tricky it was to surface the problems their customers were seeing in the field. I'd resist the urge to respond to that with a platitude about automated testing.


Yes it can be difficult to do realistic automated load testing. But I suppose I see this as more evidence that if you're going to do load testing, do it right! In complex systems you often need real world usage data, or your metrics won't predict reality.

I've been running into this a lot writing software for collaborative editing. Randomly generated editing traces work fine for correctness testing. But doing performance testing with random traces is unrepresentative. The way people move their cursors around a text box while editing is idiosyncratic. Lots of optimizations make performance worse with random editing histories, but improve performance for real world data sets.


Plenty of C programs do the equivalent of ioutil.ReadAll; it's not a GC thing.


"Leak everything because we can get away with it here" is a fine memory management strategy. "Why does my program keep getting killed?" isn't.


This has nothing to do with leaking (nothing "leaked"; it's a garbage-collected runtime). It's about memory pressure, which, I promise you, is a very real perf problem in C programs, and why we memory profile them. The difference between incremental and one-shot reads is not a GC vs. non-GC thing.


> Buried in here are great examples of why rewrites don’t help

That has not been my experience. Rewrites do sometimes help, because in a lot of codebases there are too many “pet” modules or badly designed frozen interfaces.

Rewrites can help in those situations because there are no sacred cows anymore. The issue is that a lot of people do rewrites as translations, without touching the structures.


Agreed with this 100%.

There have been so many posts here over the years along the lines of 'how we rewrote from x to y and saw 2000% gains', where x and y are languages. Such examples are 100% meaningless. Rewrites from the ground up -should- always be way faster, since it's all greenfield. If you're trying to make a language comparison, rewrite the entire thing in both languages!


Yes, absolutely. I wrote an article a couple of months ago that was trending here, where I got a 5000x performance improvement over an existing system. One of the changes I made was moving to Rust, and some people seemed to think the takeaway was “rewriting the code in Rust made it 5000x faster”. It wasn't that. Automerge already had a Rust version of their code which ran a benchmark in 5 minutes. Yjs does the same benchmark in less than 1 second in JavaScript.

Yjs is so fast because it makes better choices with its data structures. A recent PR in automerge-rs brought the same 5 minute test down to 2 seconds by changing the data structure it uses.

Rust/C/C++ give you more tools to write high-performance code. But if you put everything on the heap with copies everywhere, your code won't necessarily be any faster than it would be in JS/Python/Ruby. And on the flip side, you can achieve very respectable performance in dynamic languages with a bit of care along the hot path.


Not only greenfield, but the problem domain is much better understood. A lot of architecture choices are made in the early days of a project when the problem isn't sufficiently understood to make the choice correctly.

I'm a huge fan of writing the first version of anything as a problem-exploration prototype, intended to be discarded and rewritten. As Fred Brooks said, "you're going to rewrite anyway, you might as well plan for it" [0]

[0] paraphrased from https://en.wikiquote.org/wiki/Fred_Brooks "The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow."


In my experience, the prototype never gets thrown away when it should be, and sometimes it's never thrown away at all. It just gets extended, poorly, until development grinds to a halt because you can no longer add features or fix bugs without creating new bugs.

Then you either a) stop what you're doing and spend many months rewriting, or b) spin up a parallel team that does the rewrite, while the old team maintains the old code and does their best to add the most critical features and fix the most critical bugs without breaking anything else in the process.

Neither approach is good. (a) means you'll probably lose customers due to lack of progress on their pet issues. (b) means your development costs have doubled, and you have a team full of people who are demotivated and demoralized because they know they're working on something that's soon destined for the junk heap.

I usually build the first version expecting that it will live on for quite a long time (and sometimes/often be the only version), and build with an eye toward ease of refactor and even ease of rearchitecting. Yes, it's slower than building a prototype-quality product, and yes, sometimes product managers complain that the extra time needed will blow a market opportunity. Those PMs are usually wrong, and even if they are potentially right, building the prototype always takes longer than expected, so the PMs end up fretting over time-to-market anyway.


This is where profiling helps more. Find the weak parts of the code, try to optimise those. If the language proves to be a barrier then you have a justification for a rewrite.

All too often people don’t understand how to performance tune software properly and instead blame other things first (eg garbage collection)


Most slow languages make escaping to C easy for cases where the language is the issue. Most fast languages make writing a C-style API easy, so if the language is your issue, just rewrite the parts where that is the problem.

Of course eventually you get to the point where enough of the code is in a fast language that writing everything in the fast language to avoid the pain of language interfaces is worth it.


And there are times when even C isn't sufficient and a developer needs to resort to inlined assembly. But most of the time the starting language (whatever that might be) is good enough. Even here, the issue wasn't the language, it was the implementation. And even where the problem is the language, there will always be hot paths that need hardware-performant code (be that CPU, memory, or sometimes other devices like disk IO) and other parts in most programs that need to be optimised for developer performance.

Not everyone is writing sqlite or kernel development level software. Most software projects are a trade off of time vs purity.

That all said, backend web development is probably the edge case here. But even there, that's only true if you're trying to serve several thousand requests a second on a monolithic site in something like CGI/Perl. Then I'd argue there's no point fixing any hot paths; just rewrite the entire thing. But even then, there's still no need to jump straight to C, skipping Go, Java, C#, and countless others.


Except when the program is actually written in C; then you'd better dust off the Algorithms and Data Structures book, or the Intel/AMD/ARM/... manuals.


Algorithms and data structures come BEFORE dropping to C.

These days it is rare that you can beat your compiler with hand-written machine code, and even if you can, it isn't worth it because the difference is typically small and only applies to one specific machine.

Of course once in C you can often think about memory locality and other cache factors that higher languages hide from you.


Many applications still start in C; there is no dropping into C.


Quite true, a rewrite can help if it is also a "rethink". But you don't have to switch languages to get that effect--in fact you'll probably do better if you don't throw a new language/library into the mix.

My point was that, contrary to what is apparently a common impulse, rewriting the same thing in a different language while maintaining the lack of attention to performance considerations that was present in the first version isn't going to help much.


This is less an argument for a rewrite than an argument for redesigning parts of your codebase, which can be done much more easily than a complete rewrite.


The tricky thing is that it's easy to end up with a result that's not far off. Some modules will improve, but a lot of the time these kinds of bottlenecks tend to happen because the performant version is not very idiomatic (feels weird), it's too verbose, or it's too confusing to think through.

Unless you have the same team (and they learned the lesson the first time), you're very likely to end up with modules that perform in a similar way.

Sometimes changing the language makes thinking about the problems easier.


I would argue that the rewrites help when the information architecture for the original code is proven to be wrong, and there is either no way to refactor the old code to the new model, or employee turnover has resulted in nobody having an emotional attachment to the old code.

That said, to slot in a new implementation you often have to make the external API very similar to the old one, which can complicate making the improvements you're after.


> there is either no way to refactor the old code to the new model

That doesn't happen. Write facades as needed. Even if they are slower than everything else, write the facades so you can stay in production all along.


If you get the object ownership and the internal state model wrong (the information architecture), facades don't help you.

You can't put an idempotent or pure-functional wrapper around a design that isn't re-entrant and expect anything good to come from it. If you get it to work, it'll be dog slow.


The last time I was in a rewrite, the boss had the old software on a computer next to him with the label "Product owner of the rewrite". When asked how to do something, he regularly looked at what the old software did.


I downvoted you at first and then changed my mind. I think I would like your comment more if it were worded more like: "buried in here are great examples of important optimizations that did not require a rewrite". Or something like: "this article does a great job of showing that you can hit many reasonable performance targets while using a GC'ed language like Go."

You can pretty much always get better performance with more control over memory, and more importantly, you can dramatically lower overall memory usage and avoid GC pauses, but you have to weigh that against the fact that automated memory management is one of the few programming language features that is basically proven to give a massive developer productivity boost. In my corner of the industry, everyone chooses the GC'ed languages and performance isn't really a major concern most of the time.


> The problem here isn’t that the language has GC, it’s that memory usage was just not considered.

While I agree with the gist of what you're saying, I do think runtimes based on the we'll-clean-it-up-some-day GC paradigm make it more important to consider memory allocation than less laissez-faire paradigms (like RAII or reference counting), contrary to how it's presented in the glamorous brochures.


Put it this way: each of the things mentioned in that post was an error that could just as easily have been made in Rust, and that Rust would not necessarily have helped avoid. At best you can make a case for the errors being more explicit, but in my personal experience even that would be weak.

The last error in particular, using byte buffers instead of a streaming abstraction, is pervasive in programming. I don't know if Rust is necessarily any worse than Go's library environment for dealing with that problem, but I doubt it's any better. By having io.Reader in the standard library from the beginning (and not because of any other particular virtue of the language, IMHO), Go has one of the best ecosystems for dealing with streams without having to manifest them as full byte buffers in memory [1].

It amounts to this: the root problem is that they didn't have the problem they thought they had. Rust will blow the socks off the competition w.r.t. memory efficiency of lots of small objects, which is why it's so solid in the browser space. But that's not the problem they were having. Go's just fine where they seem to have ultimately ended up: stream processing with transient per-object work. Even if you do some allocation in the processing, the GC ends up not being a big deal, because collections end up scanning not much memory, and not all that frequently. This is why Go is so popular in network servers. Could Rust do better? Yes. Absolutely, beyond a shadow of a doubt. But not enough to matter, in a lot of cases.

[1]: An expansion on that thought if you like: https://news.ycombinator.com/item?id=28368080


I think the Rust and Go stories with buffers vs. readers are pretty comparable. They both have good support for readers, and too-good support for reading whole messages into slices or Vec<u8>s.


Good to hear. I hope it's something all new languages have going forward, because like I mentioned in my extended post it's almost all about setting the tone correctly early in the standard library & culture, rather than any sort of "language feature" Go had.

As mostly-a-network engineer it's a major pet peeve of mine when I have to step back into some environment where everything works with strings. I can just feel the memory screaming.


You mean just like XML-RPC and JSON-RPC (sorry, REST) work?

Because the best way to contribute to global warming is to waste CPU cycles serializing and deserializing data structures into XML and JSON, and parsing them as well.


More importantly, GC'ed languages tend to use at least 2x the memory of un-GC'ed languages and have to deal with the consequences of GC-induced pauses and generally inferior native-code interop. Whether that matters to you or not depends on your application. No one is going to use a GC'ed language in the Linux kernel, but practically 100% of backend applications are written in GC'ed languages because the productivity benefits of automatic memory management are massive.


I’m not really sure if that 2x figure is accurate. I’ve seen charts on both sides of this and a lot here depends on your programming language and the things it can optimize: with Linear/Affine types, I’m fairly sure Haskell could, in theory, eliminate GC deterministically from the critical sections of your code-base without forcing you to adopt manual memory management universally.

But there's just the fact that people writing real-time/near-real-time systems do, in fact, choose GC languages and make it work: video games are one example, with Minecraft and Unity being the major ones. But also HFT systems: Jane Street heavily uses OCaml, and other companies use Java etc. with specialized GCs.

This is not even to mention the microbenchmarks that seem to indicate that Common Lisp and Java can match or exceed Rust for tasks like implementing lock-free hash maps and various other things https://programming-language-benchmarks.vercel.app/problem/s...


I am aware that you can hit really good latency targets with GC'ed languages, like in the video game and finance industry. Whenever I investigate examples, though, I find the devs have to go through a ton of effort to avoid memory allocations, and then I ask if using the GC'ed language was even worth it in the first place?

I'm actually fascinated with the idea of going off-heap in the hotspots of GC'ed languages to get better performance. Netty, for instance, relies on off-heap allocations to achieve better networking performance. But, once you do so, you start incurring the disadvantages of languages like C/C++, and it can get complicated mixing the two styles of code.


"Whenever I investigate examples, though, I find the devs have to go through a ton of effort to avoid memory allocations"

Yep, also the median dev in a GC'ed language is simply incapable of writing super efficient code in these languages because they rarely have to. You would have to bring in the best of the best people from those communities or put your existing devs through a pretty significant education process that is similar in difficulty to just learning/using Rust.

The resulting code will be very different to what typical code looks like in those languages, so the supposed homogeneity benefits of just writing fast C#/Java when it's needed are probably not quite true. You'd basically have to keep that project staffed up with these kinds of people and ensure they have very good Prod observability to ensure regressions don't appear.


Yes, and I think one important aspect to this is the necessary CI/CD changes needed to support these kinds of optimizations. If your performance targets are tight enough that you are making significant non-standard optimizations in your GC'ed language, you're probably going to want some automated performance regression testing in your deployment pipeline to ensure you don't ship something that falls down under load. In my experience, building and maintaining those pipeline components is not easy.


> … tasks like implementing lock-free hash maps…

Please be specific.

You pointed to spectral-norm, what does that have to do with lock-free hash maps?

The 2.java program seems to be 4x slower than the 7.rs program!


Look at 2.cl, though: the lisp solution is faster than everything except one c++ solution. (And, aside from the SIMD intrinsics, the lisp solution is fairly idiomatic)

I was referring to this with the lock-free hash maps: https://twitter.com/nodefunallowed/status/137196906733924761...


> I was referring to this with the lock-free hash maps…

Well thank you for providing an actual reference.

afaict from a twitter thread, "42nd At Threadmill" and "Luckless" are both Lisp re-implementations of the same Java hashtable code.

afaict the Rust sofware is not a re-implementation of that same Java hashtable code.

afaict that chart does not show any measurements of Java software, just Lisp and Rust.

So "… Java can match or exceed Rust …" seems to be based on nothing.

> Look at 2.cl, though…

So hand-coded AVX is hand-coded AVX in any language?


I mostly agree with what you're saying, but I'll also add that GC pauses are mostly a problem of yesteryear unless you're either managing truly enormous amounts of memory or have hard real-time requirements (and even then it's debatable). Modern GCs, as seen in Go, Java 11+, and .NET 4.5+, guarantee sub-millisecond pauses on terabyte-large heaps (I believe the JS GC does as well, but I'm less sure).


Rewrites can definitely help but rushing into them before doing these other things is going to net you a lot less gain for the time.


That is correct.

In the worst case, you can always (even in GC'd languages) pre-allocate buffers and do your work without new memory requests. But you need to plan for this, the same way you would in a language without GC.
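
A minimal sketch of what that looks like in Go: size the buffer once up front and re-slice it to zero length each pass, so append reuses the existing backing array instead of asking the allocator for more.

    package main

    import "fmt"

    func main() {
        buf := make([]int, 0, 4096) // planned capacity, allocated once

        for pass := 0; pass < 3; pass++ {
            buf = buf[:0] // reuse the same backing array
            for i := 0; i < 1000; i++ {
                buf = append(buf, i*pass)
            }
            fmt.Println(pass, len(buf), cap(buf))
        }
    }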


> For our application, it would be acceptable to simply exit when memory usage gets too large

Could you not just set a ulimit on memory usage of the process in that case? (And use another process as the parent, e.g. a supervisor or init, to avoid exiting the container and just restart the process instead)


I have a feeling that they will eventually end up rewriting this in Rust, as the use case they describe is one where a non-GC language can definitely provide more performance (beyond the case they solved). APM tools usually need to be more performant to ensure they add as little overhead to the actual service as possible. I guess what's helping here is that this is passive monitoring, which allows a little lag in the system. The relevant question is whether there will be more memory issues in general, given their current roadmap.


Rebuilding in a different language is just trading one problem set for another. Making better use of the tools you've already taken on is a much better strategy if you don't have the money to hire a whole new set of devs or a year to burn onboarding onto a new language.


If I was going to write a satire piece representing a typical HN post, I would 100% start it with the same opening 2 sentences.


Well, good for the author that they were able to fix the issue.

However, I think writing efficient code, even in managed-memory languages, for a large, heavily used service is a normal part of the job and not above and beyond normal work.


This might be more fit for StackOverflow, but I have a related question.

I have a Go application that runs in Kubernetes, where memory usage steadily increases until it's at around 90% of the cgroup limit, where it seems to stabilize. As far as I can tell, the Go GC uses the container memory limits to navigate its total memory usage (this might be the fault of the OS not reclaiming what Go has already freed(?)).

However, my issue is that in this app I also call out to cgo and do manual memory allocations in C++ every 10-30 minutes. This works well, except when the container has stabilized at a high memory usage and my manual allocation brings it over the limit, forcing Kubernetes to terminate it. (These allocations should, as far as I know, not be leaking. For a short while I have two large objects allocated, and 99.9% of the time it's only one.)

So, what I'd ideally want is to be able to specify a target heap size for GoGC, and then have a known overhead for the manual allocation. But as far as I'm aware, this isn't possible (?)

Does anyone have any experience with something like this, or see any obvious avenues to pursue to solve the termination issue?


Since Go seems to respect the memory limit, you could try using syscall.Setrlimit to set an artificially lower limit that you know will leave enough room for your other allocations. Have you tried playing with the GOGC environment variable (or debug.SetGCPercent from the runtime/debug package)? Maybe you could also manually collect a memory profile with runtime.MemProfile and call runtime.GC() if needed. I've never done anything like this, though; just throwing out ideas I would probably try.
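
One concrete variant of the "collect before the spike" idea, sketched below: force a GC and return freed pages to the OS right before the periodic C++ allocation. debug.FreeOSMemory is a substitution for the manual runtime.GC suggestion above, and allocateBigCppObject is a hypothetical stand-in for the real cgo call.

    package main

    import "runtime/debug"

    // Hypothetical stand-in for the periodic cgo call that allocates a large
    // C++ object.
    func allocateBigCppObject() { /* C.do_big_allocation() in the real code */ }

    func main() {
        // Force a collection and hand as much freed memory as possible back to
        // the OS, so the cgroup sees headroom before the non-Go allocation lands.
        debug.FreeOSMemory()
        allocateBigCppObject()
    }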


Every article on Go allocations can benefit from a heap escape analysis section. I was hoping to find one here, but no luck. Stack allocation is a powerful technique for reducing GC times.
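
For readers who haven't tried it: the compiler will tell you which values escape if you ask. A tiny sketch, with the exact diagnostic wording varying by Go version:

    // Check with: go build -gcflags='-m' escape.go
    package main

    import "fmt"

    type point struct{ x, y int }

    // Passed and returned by value: nothing here escapes, so no GC work.
    func sum(p point) int { return p.x + p.y }

    // Returning a pointer to a local forces it onto the heap; the compiler
    // reports something like "moved to heap: p" for this one.
    func leak() *point {
        p := point{1, 2}
        return &p
    }

    func main() {
        fmt.Println(sum(point{3, 4}), leak().x)
    }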


Agreed. Many lump all GC languages into the same bag without understanding that several of them (including Go) do provide C-like features.


> But our profile wasn’t ever showing us 500GB of live data, just a little bit more than 200MB in the worst cases. This suggested to me that we’d done all we could with live objects.

Is this a typo? Weren't seeing 500 MB of live data, just a little more than 200MB in the worst case?

EDIT: Btw, I read the entire article. It was fascinating, thank you!


Yes, that's a typo, thanks!


"How we avoided rewriting in Rust" feels like clickbait given that the answer is "our problems were algorithmic, not language-specific"


Memory issues are amplified a bit by garbage collection though, in that every pointer must be stored twice, and collection will take time and evict things from the CPU cache, etc.

If you were struggling with this, turning to Rust might be something people would try, even if it wasn't fixing the first-order problems, only the second-order ones.


The whole post is about how Rust turned out not to be the answer to exactly this problem.


A bit, yeah, but it is somewhat telling that their first instinct was to find a GC "knob" and twist it around until they could go back to ignoring their basic architecture.

Go and Rust are great in that they let you write code at good speed, although I think this just highlights the well-known problems of over-optimizing a single metric.


I would call this a language issue, as you need to understand the various abstractions and how they interact, which is endemic to almost all languages. Ideally a language would have a type system able to express resource usage.


I assume it's tongue-in-cheek; because "rewrite in Rust to improve performance" is such a meme, the headline is subtly calling attention to the fact that this is rarely good advice and certainly not the first lever an engineer should reach for upon running into a performance problem.


> subtly calling attention

That's generous. I'd call it clickbait.


Well consider all the projects titled “blah blah blah… written in Rust”

Who gives a shit what it’s written in—what does it do?


People who are interested in Rust may want to see how it was used.

The author could have just kept "Taming Go's Memory Usage".

Maybe they never considered rewriting in Rust. The pros and cons look like just some random arguments to add Rust to the title.


> The author could have just kept "Taming Go's Memory Usage".

It’s their article. They can choose to write it however they want. If you find this type of humor distasteful, fine: write your own articles differently.

As a user of both rust and golang, I chuckled at the headline and then forgot about it.


a


TFA doesn’t argue that one is better than the other? Maybe you’re commenting on unrelated “click bait shit”?


It's not the first lever an engineer should reach for regardless of the languages involved. Calling out Rust specifically feels like a bit of a cheap shot.


To be fair, that is now the common "I rewrote X in Y" theme, which followed upon the Y ∈ { Ruby, Clojure, Scala, Kotlin,.... } from previous years.


And Go too! It's always fun to see posts from around 2014/2015 complaining about how every submission to Hacker News is now "I wrote X in Go", while now Go is the boring stuff and Rust is the hot new thing. I wonder what will be the next Rust though.


BPF-verified C.


Naw, BPF-verified BF. Or someone will make "BrainFuck Plus" so we can get BPF-verified BFP.


Some GC based language with dependent types.


Nim is on the way up in HN posts...


Nim has dependent types?


Write it in Malbolge? Slightly less silly, modern C++ and modern Python are evergreen.


They are, but they also are too big, so at some point people will want to replace them with something simpler/smaller. Go has been used for this for some projects in C++ and Python.


It's a shot at the "just rewrite it in Rust" meme, not at Rust or the Rust community.


> Rust has manual memory management, which means that whenever we’re writing code we’ll have to take the time to manage memory ourselves.

No.


Yeah, sounds like someone doesn't understand lifetimes and RAII. Even in modern C++ the number of times you have to actually think about memory management instead of lifetimes is basically zero unless you have to work with old libraries.


But thinking about lifetimes and RAII is 90% of memory management.

Basically, whether you write C, C++, or Rust, you have to track ownership the same way; the only thing that changes is how much the compiler helps you with that. However, if you write your program in Java, Lisp, or Haskell, you simply do not care about ownership for memory-only objects, and can structure your program significantly differently.

This can have significant impact on certain types of workflows, especially when it comes to shared objects. A well-known example is when implementing lock-free data structures based on compare-and-swap, where you need to free the old copy of the structure after a successful compare-and-swap; but, you can't free it since you don't know who may still be reading from it. Here is an in-depth write-up from Andrei Alexandrescu on the topic [0].

Note: I am using "object" here in the sense from C - basically any piece of data that was allocated.

[0] http://erdani.org/publications/cuj-2004-10.pdf
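
To make the contrast concrete, here is a minimal Go sketch of that compare-and-swap case: after a successful swap the old snapshot is simply left to the GC, with no hazard pointers or deferred reclamation. (Hypothetical config type; atomic.Pointer needs Go 1.19+.)

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    type config struct{ version int }

    var current atomic.Pointer[config]

    // bumpVersion publishes a new snapshot. Readers still holding the old
    // pointer keep it alive; the GC reclaims it once the last one is done.
    func bumpVersion() {
        for {
            old := current.Load()
            next := &config{version: old.version + 1}
            if current.CompareAndSwap(old, next) {
                return // the old *config becomes garbage, no manual free needed
            }
        }
    }

    func main() {
        current.Store(&config{version: 1})
        bumpVersion()
        fmt.Println(current.Load().version) // 2
    }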


With modern C++ your memory checklist is two steps: put it on the stack, put it in a unique_ptr on the stack. There are more steps after that, but you almost never get to them and wouldn't remember them if you discovered the need for them (which is okay because you never get there).


Your checklist is only covering the simplest case, direct ownership of small data structures.

I'm not going to put a large array on the stack. I'm not going to pass unique_ptr (exclusive ownership) of every resource I allocate to every caller. I still need to decide between passing a copy, a unique_ptr, a reference, or a shared_ptr. When I design a data structure with interior pointers, I need to define some ownership semantics and make sure they are natural (for example, in a graph that supports cycles, there is no natural notion of ownership between graph nodes).

These are all questions that are irrelevant in a GC language, for memory resources.


Not really irrelevant when the said GC language also does value types, e.g.

   // C#
   Span<byte> buffer = stackalloc byte[1024];


True, but even then you only have to decide between passing a copy of the value or a reference to it, no need to think about ownership.


Kind of; if it is a struct with destructors, you need to ensure a region exists.

So either do something like

    using MyStructType something = new MyStructType();
Or a more FP like stuff with

    myVar.WithXYZResource(res => { /* .... */ })
And then consider whether it should be a ref struct, so that it is only stack allocated.

This from C# point of view, in something like D, there would be another set of considerations.

Still much easier than "in your face ownership management" though, yes.


Well, destructors come up if you have non-memory resources to manage, and there you do go back to ownership and deterministic destruction issues.


> … put it on the stack, put it in a unique_ptr on the stack.

What happens when the stack frame gets destroyed but you kept a reference to the data around somewhere because you needed it for further compilation?

I, for one, am a fan of using the heap when doing the C++ things…


I guess something like Android Oboe, macOS DriverKit, Windows Runtime C++ Template Library, or C++/WinRT could be considered old libraries then.


Even then, just add a wrapper and off you go.



