
For Better Computing, Liberate CPUs from Garbage Collection - mbroncano
https://spectrum.ieee.org/tech-talk/computing/hardware/this-little-device-relieves-a-cpu-from-its-garbage-collection-duties
======
arcticbull
IMO garbage collection is the epitome of sunk cost fallacy. Thirty years of
good research thrown at a bad idea. The reality is we as developers choose not
to give languages enough context to accurately infer the lifetime of objects.
Instead of doing so we develop borderline self-aware programs to guess when
we're done with objects. It wastes time, it wastes space, it wastes energy. If
we'd spent that time developing smarter languages and compilers (Rust is a
start, but not an end) we'd be better off as developers and as people. Garbage
collection is just plain bad. I for one am glad we're finally ready to
consider moving on.

Think about it, instead of finding a way of expressing when we're done with
instances, we have a giant for loop that iterates over _all of memory_ over
and over and over to guess when we're done with things. What a mess! If your
co-worker proposed this as a solution you'd probably slap them. This article
proposes hardware accelerating that for loop. It's like a horse-drawn carriage
accelerated by rockets. It's the _fastest_ horse.

~~~
davedx
> It wastes time, it wastes space, it wastes energy.

But all of these are much cheaper than developer labour and reputation damage
caused by leaky/crashy software. The economics make sense.

Anecdotally, I spent the first ~6 years of my career working with C++, and
when I started using languages that did have GC, it made my job simpler and
easier. I'm more productive and less stressed due to garbage collection. It's
one less (significant) cognitive category for my brain to process.

Long live garbage collection!

~~~
geocar
> But all of these are much cheaper than developer labour and reputation
> damage caused by leaky/crashy software.

And that's why so much effort has gone into making it fast and low-latency.
But it was a false dichotomy: we can have memory safety without garbage
collection.

- Automatic reference counting: Most people know about Objective-C's efforts
in this space, but it's admittedly less automatic than programmers would like,
so perhaps it doesn't get enough attention. And yet it should, since q/kdb+
uses reference counting exclusively, and it holds top performance numbers on a
large number of data and messaging problems.

- Linear Lisp[1] asked functions to fully consume their arguments, making
(cons x y) basically a no-op. Again, no garbage collection, and no
fragmentation (at least as long as all objects are conses), and yet no matter
how promising this path looked, garbage collection got much more attention.

- Rust's borrow checker/tracker makes ownership explicit; somewhat of a
middle-ground between the two...

There are other scattered efforts in this space, and I don't know about all of
them, but for more on "everything we know about languages is wrong", also
consider that Perl5 uses plain old reference counting, executes the AST
directly, and still outperforms Python for data and IO[2]!

I think the thing to take away is that memory management has to be
automatically managed for developer sanity, _not_ that garbage collection is
the way to do it.

[1]: [http://home.pipeline.com/~hbaker1/LinearLisp.html](http://home.pipeline.com/~hbaker1/LinearLisp.html)

[2]: The asyncio stuff in Python looks really promising though...
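
To make the first and third approaches concrete, here's a minimal Rust sketch
(Rust chosen since the thread keeps returning to it) of reference counting
next to linear-style consumption, where the compiler statically places the
point of death:

```rust
use std::rc::Rc;

// Reference counting: the count is decremented deterministically as each
// handle goes out of scope; nothing ever scans the heap.
fn refcounted() {
    let a = Rc::new(vec![1, 2, 3]);
    let b = Rc::clone(&a); // count = 2
    drop(b);               // count = 1
    println!("{:?}", a);   // count reaches 0 at end of scope; freed there
}

// Linear-style consumption: `consume` takes ownership of its argument,
// so the compiler knows statically where the value dies.
fn consume(v: Vec<i32>) -> usize {
    v.len() // `v` is dropped here, at a point fixed at compile time
}

fn main() {
    refcounted();
    let v = vec![4, 5, 6];
    let n = consume(v);
    // Using `v` here would be rejected by the borrow checker.
    println!("{}", n);
}
```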

~~~
tomp
This comment is just bad and misinformed all over.

(1) _Automatic Reference Counting_ doesn't work; its equivalent in interpreted
languages is, well, _reference counting_, which can be optimized quite a lot
(though it has some issues with multithreading), but cannot collect cycles.

(2) therefore, if you want reference counting, you have to either also have GC
(for cycles), or program carefully to avoid creating cycles (which is then
only marginally better than C++)

(3) your comment on Python vs Perl5 is just nonsense, Python uses reference
counting as well (along with the occasional GC to collect cycles)

(4) linear / uniqueness types (not exactly the same, but both can be used to
ensure safe memory management) impose significant mental overhead on
programmers as they prohibit many common patterns

(5) Rust features a lot of "cheating" - pointers that allow mutation from
multiple threads, reference-counted pointers, etc. - so obviously just
ownership & borrowing isn't good enough

_conclusion:_ you can't have your cake and eat it too (at least according to
current cutting-edge research and implementations) - you either have GC _or_
you have to be very careful/restricted when writing code
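
Point (2) is easy to demonstrate. A minimal Rust sketch, with `Rc` standing in
for any plain reference count, of a cycle that pure reference counting can
never reclaim:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Close the cycle: a -> b -> a. Both strong counts are now 2.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    // When `a` and `b` go out of scope, each count only falls to 1, so
    // neither destructor ever runs: the nodes leak, and there is no
    // tracing collector around to notice.
}
```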

~~~
cyphar
> (1) Automatic Reference Counting doesn't work; its equivalent in interpreted
> languages is, well, reference counting, which can be optimized quite a lot
> (though has some issues with multithreading), but cannot collect cycles.

This is what weakrefs (or better data structures) are for. The Linux kernel
uses reference counting incredibly effectively for almost every structure. I
think that pretty much discounts any argument that reference counting cannot
work.

> (5) Rust features a lot of "cheating" \- pointers that allow mutation from
> multiple threads, reference-counted pointers, etc. - so obviously just
> ownership&borrowing isn't good enough

"unsafe" isn't cheating and isn't even related to whether garbage collection
is a good idea or not. Yes, Rust's ownership and borrowing model doesn't cover
all programs and hence "unsafe" is required -- but "unsafe" doesn't use a
garbage collector so the existence of "unsafe" doesn't change that Rust works
without one.
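
For illustration, a minimal Rust sketch of the weakref approach: the back-edge
of a linked pair is a `Weak`, so no strong cycle ever forms and plain
reference counting suffices:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    prev: RefCell<Option<Weak<Node>>>, // weak back-edge: keeps no one alive
    next: RefCell<Option<Rc<Node>>>,   // strong forward edge
}

fn main() {
    let a = Rc::new(Node { prev: RefCell::new(None), next: RefCell::new(None) });
    let b = Rc::new(Node { prev: RefCell::new(None), next: RefCell::new(None) });

    *a.next.borrow_mut() = Some(Rc::clone(&b));     // strong: a -> b
    *b.prev.borrow_mut() = Some(Rc::downgrade(&a)); // weak:   b -> a

    // A Weak upgrades to Some(..) only while its target is still alive.
    assert!(b.prev.borrow().as_ref().unwrap().upgrade().is_some());

    // No strong cycle exists, so dropping `a` and `b` frees both nodes.
}
```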

~~~
m0th87
The point with weakrefs, though, is that they still require developer
intervention.

Also, you can use most memory management facilities in Rust (including
reference counting) without `unsafe`.

~~~
arcticbull
Correct, leaks are considered "safe" -- it's premature deallocations that are
considered unsafe.
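
Concretely, safe Rust even exposes leaking as a deliberately safe operation; a
small sketch:

```rust
fn main() {
    // `Box::leak` deliberately never frees -- and it is a safe function.
    let s: &'static mut String = Box::leak(Box::new(String::from("immortal")));
    println!("{}", s);

    // `mem::forget` likewise skips a destructor without any `unsafe`.
    std::mem::forget(vec![1, 2, 3]);

    // A use-after-free, by contrast, won't compile outside `unsafe`.
}
```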

------
exabrial
Azul Systems asked Intel to do this once... but instead created their own
processors with interesting memory-barrier properties for a while, which
greatly sped up JVMs beyond what was possible (at the time) on
x86-32/ppc/sparc. Eventually they gave up and became a purely software
company, but their "Java Mainframe" product was many times faster than the
Intel machines of the age executing the same code, despite much slower CPUs.
It died a quick death despite the cool factor.

~~~
fooker
I had a mentor who worked at Intel Labs when this was happening. The reason
this died is that someone invented a GC algorithm which consistently
outperformed it, leading Intel to drop their hardware GC plans.

~~~
ddingus
Could that not be hardware implemented / augmented?

~~~
fooker
It could be, but then after investing a billion dollars what happens when
someone develops another algorithm?

What happens when a programming language with different GC requirements
becomes popular?

~~~
sgift
Seems to be one of the main risks for any specialized circuits, if I
understand you correctly. You always have to guess "will this _really_ be
relevant long enough to invest the money to bake it into hardware?" .. and if
you guess wrong you just wasted a part of your silicon budget for something no
one will use.

~~~
ddingus
Right. Got it.

There may still be room for specialized GC assist circuits.

~~~
twic
When I thought about this (using my programmer brain, which knows nothing
about hardware), I came up with the idea of a 'store pointer' instruction,
which took two addresses and an offset, and stored the first address into the
field of the object pointed to by the second address. And also, if the two
addresses referred to different memory regions, it recorded the pair of
addresses into some kind of buffer on the processor. When that buffer got
full, the processor would trap to some preconfigured location.

That could be used as a basis for a write barrier.

The devil would be in the detail of how the regions were defined.

And maybe the trapping would mean this wasn't even all that fast.
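
As a software analogue, the idea might look something like the sketch below
(all names are invented, and the field offset and actual store are elided):

```rust
const BUFFER_CAPACITY: usize = 4;
const REGION_SHIFT: u32 = 20; // pretend regions are 1 MiB, for illustration

struct StoreBarrier {
    buffer: Vec<(usize, usize)>, // recorded (stored address, object) pairs
}

impl StoreBarrier {
    fn new() -> Self {
        StoreBarrier { buffer: Vec::with_capacity(BUFFER_CAPACITY) }
    }

    // The 'store pointer' operation: perform the store (elided here) and,
    // if the two addresses fall in different regions, record the pair.
    fn store_pointer(&mut self, stored: usize, object: usize) {
        if stored >> REGION_SHIFT != object >> REGION_SHIFT {
            self.buffer.push((stored, object));
            if self.buffer.len() == BUFFER_CAPACITY {
                self.trap(); // hardware: trap to a preconfigured location
            }
        }
    }

    // Stand-in for the trap handler: the GC would drain the buffer here.
    fn trap(&mut self) {
        println!("barrier buffer full: {:?}", self.buffer);
        self.buffer.clear();
    }
}

fn main() {
    let mut barrier = StoreBarrier::new();
    for i in 0..8 {
        // Alternate regions so every store is a cross-region store.
        barrier.store_pointer(i << REGION_SHIFT, (i + 1) << REGION_SHIFT);
    }
}
```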

------
notacoward
Readable copy of the paper at Berkeley:

[https://people.eecs.berkeley.edu/~krste/papers/maas-isca18-h...](https://people.eecs.berkeley.edu/~krste/papers/maas-isca18-hwgc.pdf)

ETA: this doesn't seem to be quite the paper that the story refers to, but
undoubtedly describes the same work in enough detail for people to get the
gist of it. Darn paywalls.

~~~
kazkada
The paper is available on Sci-Hub: [https://sci-hub.se/10.1109/MM.2019.2910509](https://sci-hub.se/10.1109/MM.2019.2910509)

------
simen
They're comparing to an in-order CPU. Given that most CPUs are out-of-order
(at least of the non-embedded variety, and GC is less used in such
applications anyway), it would be better and more intellectually honest to
actually compare to a typical CPU that performs GC. They kind of address this
in the paper but only in a short aside: "Note that previous research [1]
showed that out-of-order CPUs, while moderately faster, are not the best
trade-off point for GC (a result we confirmed in preliminary simulations)." So
they don't quantify what any of this means.

I think it's an interesting idea, but it doesn't bode well when they seemingly
choose the wrong target for comparison and hand-wave away the difference as
insignificant.

~~~
reitzensteinm
The comparison, at least in the abstract, is energy efficiency. It's quite
likely that a small in-order CPU is very good at chasing dependent pointers
around the heap for its power consumption.

Imagine a linked list. Each pointer access is likely to miss to main memory,
and no concurrency is possible. Both the highest- and lowest-end cores will
sit around making a single request every 80ns.

They claim that the comparison was to the best alternative and I'd probably
take them at their word barring any specific evidence.

------
_bxg1
> globally this represents a large amount of computing resources.

Much of which would just sit idle otherwise, on client machines. Of course,
the energy savings still apply.

> He also points out that many garbage collection mechanisms can result in
> unpredictable pauses, where the computer system stops for a brief moment to
> clean up its memory.

This, on the other hand, is a hard barrier, and it's the one actually being
solved here.

All in all pretty cool idea, but I think the impact would be different from
what's discussed here. Truly high-performance computing is already written in
non-GC languages. This hardware would give medium-intensity GC programs (read:
web servers on JVM, .NET, Node, Ruby) a boost, and could also allow some
higher-but-not-peak intensity software to be written with GC where it might
not be today (games come to mind), although that could actually encourage
_more_ energy usage than what it would save.

~~~
p_l
We wrote latency-sensitive and high-performance code in GCed languages back in
the 1980s - avoiding pauses or having predictable latencies (in fact, more
predictable than with the usual manual memory management!) is more a matter of
"we don't teach people how to program" than an issue with GC.

As for energy savings, many garbage collectors have amortized energy use lower
than malloc/free - even pretty simple ones. (Some of the simplest I've seen
beat it so hard it's no competition, but they are specific to their
application.)

~~~
gdwatson
Could you explain further or give links to more information? I'd love to read
about old-timey techniques for programming in GCed languages.

~~~
jayd16
His point is that it's not rocket science. Preallocate and pool what you might
need, and don't call new in your tight loop. Change the GC algorithm to
something that never runs unexpectedly. If you're still allocating such that
you eventually need GC, manually GC at an appropriate time, like a load
screen.

~~~
vbarrielle
> Preallocate and pool what you might need, don't call new in your tight loop.

This piece of advice is also valid in languages without GC. In C++ it's well
known one should avoid allocating memory in tight loops, and it's strongly
advised to call e.g. `reserve` on vectors to avoid allocating.
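
For comparison, the same idiom in Rust, where `Vec::with_capacity` plays the
role of C++'s `reserve`:

```rust
fn main() {
    let n = 1_000_000;

    // One up-front allocation instead of a series of doublings in the loop.
    let mut v = Vec::with_capacity(n); // Rust's analogue of C++ `reserve`
    for i in 0..n {
        v.push(i); // never reallocates: the capacity was reserved
    }
    assert!(v.capacity() >= n && v.len() == n);
}
```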

~~~
p_l
The point is that you lose not only in tight loops. (A tight loop that
allocates under a good GC is actually a well-known _good_ case[1]; it's
deallocation that is a PITA.)

free(), as well as malloc() and related functions, have significantly complex
algorithms behind them, unless one can simplify the base memory management in
certain ways - and those ways tend to also allow a region-enabled GC to
outperform them anyway.

[1] Allocating memory in many garbage-collected systems can be down to one
instruction, and with a per-thread heap it can be an atomic add without CAS
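
A minimal sketch of footnote [1], assuming a simple bump-pointer heap
(illustrative only, not any particular collector's code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// The whole allocation fast path is one atomic add; with a per-thread
// heap, even the atomicity could be dropped.
struct BumpHeap {
    base: usize,
    top: AtomicUsize, // offset of the next free byte
    capacity: usize,
}

impl BumpHeap {
    fn alloc(&self, size: usize) -> Option<usize> {
        let offset = self.top.fetch_add(size, Ordering::Relaxed);
        if offset + size <= self.capacity {
            Some(self.base + offset) // address of the new object
        } else {
            None // heap exhausted: this is where a GC cycle would begin
        }
    }
}

fn main() {
    let heap = BumpHeap { base: 0x10000, top: AtomicUsize::new(0), capacity: 4096 };
    assert_eq!(heap.alloc(64), Some(0x10000));
    assert_eq!(heap.alloc(64), Some(0x10040));
}
```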

------
jokoon
It reminds me of the days when Knuth's quote ("97% of the time, premature
optimization blabla") was trotted out every time someone tried to make
something faster.

CPUs are not getting faster, yet it seems using tools that make things run
faster is somehow taboo.

Wirth's law: an adage on computer performance which states that software is
getting slower more rapidly than hardware becomes faster.

Why is Java taught in university, and why is this language considered some
kind of standard? Most OSes are written in C, yet most of Silicon Valley
frowns upon writing C because of arrays. Even C++ is getting a bad reputation.

~~~
pkulak
> frowns upon writing C because of arrays

Do you mean horrifying security flaws?

~~~
jokoon
You're right, but you cannot accuse the entire language of being insecure;
ultimately it's the responsibility of the developer.

Also, you can't say we avoid C only for security reasons, as other languages
have their security issues too. There are many modern ways to avoid those
flaws. Like someone answered, it boils down to a matter of development cost.

I'm also quite skeptical when people always raise the objection of security
when writing software. Security is not so simple - so far it's its own
specialty - and pretending that it's worth making things slower, and that the
security gain is actually there, is not completely accurate.

Security is almost a post-9/11 paranoia knee-jerk reaction.

------
DenisM
Objective-C ARC (automatic reference counting) solved the problem neatly for
my iOS apps.

Is there some overhead? Maybe, but it's neatly spread out through the entire
application life time, so there is rarely[1] a UI-freezing stutter associated
with GC. To reduce the overhead I turned off thread-safety and simply never
access the same objects from more than one thread (object has to be "handed
off" first if it comes to that).

One wart on the body of ARC is KVO, which I avoid like the plague for many
other reasons anyway.

The other wart is strong reference loops. This can be solved by the app
developer by designing the architecture around the "ownership" concept (owners
use strong references to their ownees, all other links are weak references).
This is a good idea in itself as it increases the clarity of the program. I do
make an occasional slip, which is where I need to rely on Instruments, and I
wish I had better tools than that - something more automatic that would catch
me in the act. Maybe a crawler that looks for loops in strong references
during development but is quiet in release builds. Or at least give me a
pattern to follow that makes it easy to catch my errors. For example, we could
assign a sequential number to each allocated object, and only a higher-ranked
object could strongly refer to a lower-ranked object. This won't work for
everyone, but I wouldn't mind fitting my app to this mold if it gave me an
immediate error when I slip.

[1] if you release a few million objects all at once it may stutter for a
second. Could be handed off to a parallel thread maybe.
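
For what it's worth, the ranking idea sketches out naturally as a debug-time
check; a rough Rust translation (Rust stands in for Objective-C here, and
every name is hypothetical):

```rust
use std::rc::Rc;
use std::sync::atomic::{AtomicU64, Ordering};

static NEXT_RANK: AtomicU64 = AtomicU64::new(0);

struct Ranked<T> {
    rank: u64, // sequential allocation number
    value: T,
}

impl<T> Ranked<T> {
    fn new(value: T) -> Rc<Self> {
        Rc::new(Ranked { rank: NEXT_RANK.fetch_add(1, Ordering::Relaxed), value })
    }

    // Only permit strong references "downhill": from a younger (higher-
    // ranked) object to an older one, so no strong loop can ever close.
    fn strong_ref_to<U>(self: &Rc<Self>, target: &Rc<Ranked<U>>) -> Rc<Ranked<U>> {
        debug_assert!(self.rank > target.rank, "strong ref must point at an older object");
        Rc::clone(target)
    }
}

fn main() {
    let old = Ranked::new("model");
    let young = Ranked::new("view");
    let _ok = young.strong_ref_to(&old);   // fine: rank 1 -> rank 0
    // let _bad = old.strong_ref_to(&young); // would panic in debug builds
    println!("{} {}", old.value, young.value);
}
```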

~~~
fwip
ARC is just garbage collection that doesn't always work (circular references).

~~~
klodolph
True! And ARC in Rust suffers from the same problem (note that while ARC in
Rust and ARC in Swift are the _same thing_, the "A" happens to stand for
different words in each case).

~~~
steveklabnik
They’re not the same thing; Swift is “automatic” and Rust is “atomic”; Swift’s
ARC is implemented the same way as Rust’s Arc, but Rust’s is manual.

~~~
klodolph
Hi! I think you may have repeated what I said in my comment, that the letter
"a" stands for different things. Although they do the same thing: they are
both atomic (in the sense of concurrency) and automatic (in the sense that you
do not have to explicitly retain or release objects).

~~~
steveklabnik
That’s the thing: they’re not automatic in Rust. You have to explicitly
retain. (Release is automatic though.)

Furthermore, in Swift it’s pervasive, and in Rust, it only happens for types
that explicitly opt-in.
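
In Rust terms the distinction looks like this: the increment is a visible
`Arc::clone`, while the decrement is inserted by the compiler when a handle
goes out of scope. A small sketch:

```rust
use std::sync::Arc;

fn main() {
    let a = Arc::new(String::from("shared"));

    // The "retain" is explicit: nothing is cloned unless you write it.
    let b = Arc::clone(&a);
    assert_eq!(Arc::strong_count(&a), 2);

    // The "release" is automatic: the compiler inserts the decrement
    // when a handle goes out of scope (here, forced with `drop`).
    drop(b);
    assert_eq!(Arc::strong_count(&a), 1);
} // last handle released here; the String is freed
```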

~~~
klodolph
It sounds automatic to me. Maybe we disagree about what "automatic" is.

It also sounds like you are hunting for ways in which Swift and Rust are
different. This is completely unnecessary! It turns out that I already know
that they are different languages, and that the features do not map exactly
1:1 to each other. I hope next time you give me a little credit.

------
azinman2
I’m probably wrong, but didn’t the Symbolics LISP machines have some kind of
hardware support for GC?

I think for platforms like Android this makes a lot of sense. Should help
quite a bit with battery consumption and responsiveness. Also makes sense for
server loads in Java or Go.

~~~
gwern
A
[https://en.wikipedia.org/wiki/Tagged_architecture](https://en.wikipedia.org/wiki/Tagged_architecture),
yes. Tags make life a bit easier for the CPU when it accesses objects and is
deciding what to do with them, but the CPU is still doing all the work and
walking the RAM to do the GC. (I vaguely recall stories about Lisp machine
users who would turn off GC while working, and then let it run when they left
work and returned the next day.) The idea here seems to be to have an entire
separate chip, a specialized CPU, which walks RAM independently of the 'main'
CPUs and whose only task is freeing up memory.

~~~
_old_dude_
There is a renaissance of that idea in ZGC [1].

You have a lot of useless bits in a 64-bit pointer, and thanks to virtual
memory you can arrange for an untagged address and its corresponding tagged
address to reference the same physical memory. This gives you free bits that
you can use to track the liveness/evacuation of a graph of objects.

[1] [https://wiki.openjdk.java.net/display/zgc/Main](https://wiki.openjdk.java.net/display/zgc/Main)
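
A toy sketch of the colored-pointer idea; the bit positions below are invented
for illustration and are not ZGC's actual layout:

```rust
// Invented layout: metadata in the unused high bits of a 64-bit address.
const MARKED: u64 = 1 << 46;
const REMAPPED: u64 = 1 << 47;
const ADDRESS_MASK: u64 = (1 << 46) - 1;

fn color(addr: u64, bits: u64) -> u64 {
    addr | bits
}

// The load barrier: inspect the color, then strip it to get the real
// address. A real collector would forward stale references here.
fn load(colored: u64) -> u64 {
    if colored & REMAPPED == 0 {
        // ...relocate/fix up the reference before the mutator proceeds...
    }
    colored & ADDRESS_MASK
}

fn main() {
    let tagged = color(0x7f00_1234, MARKED | REMAPPED);
    assert_ne!(tagged, 0x7f00_1234);       // the color bits are set
    assert_eq!(load(tagged), 0x7f00_1234); // and stripped on load
}
```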

~~~
p_l
Lisp Machines didn't use tags for tracking liveness/evacuation of objects,
though. They used them for safety, which automatically gave them _precise_, as
opposed to _conservative_, GC that always knew whether it was dealing with a
pointer. They also had a special, CPU-handled type of forwarding pointer
which, when accessed "normally", would transparently redirect you to the
forwarded location.

~~~
_old_dude_
The forwarding pointer you are describing is equivalent to a ZGC colored
pointer with the evacuation bit set, which a GC barrier (a load barrier) will
rewrite to the evacuation address.

And yes, ZGC doesn't use colored pointers to track whether a value is an
integer or a pointer, because Java, unlike Lisp, is typed, so the VM derives
that information from the bytecode.

~~~
lispm
Most Lisp implementations use typed memory via tags. Lisp doesn't have
pointers, but references, and it usually knows if something is an integer or a
reference - because it's encoded in the tags.

------
jakeinspace
My day job is writing C for an embedded real-time system. No manual memory
management necessary... because we're forced to declare all struct and array
sizes at compile time! Not a malloc or free in sight. Obviously, it's pretty
limiting as far as algorithms go, beyond "read data off bus, store in fixed
array, perform numeric calculation, write back to bus." But I've gotta say,
it's pretty freeing to write C in such a limited environment.
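
Roughly the same discipline sketched in Rust rather than the poster's C (an
illustration, with hypothetical names): every size is fixed at compile time,
so no allocator is ever called:

```rust
const MAX_SAMPLES: usize = 64;

// Fixed-size storage: the whole buffer's size is known at compile time,
// so the program never touches a heap allocator at all.
struct BusBuffer {
    samples: [u32; MAX_SAMPLES],
    len: usize,
}

impl BusBuffer {
    const fn new() -> Self {
        BusBuffer { samples: [0; MAX_SAMPLES], len: 0 }
    }

    // Returns false instead of growing: there is nothing to grow into.
    fn push(&mut self, sample: u32) -> bool {
        if self.len < MAX_SAMPLES {
            self.samples[self.len] = sample;
            self.len += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut buf = BusBuffer::new();
    assert!(buf.push(42)); // "read data off bus, store in fixed array"
}
```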

------
CoolGuySteve
From an environmental perspective, I wonder how much energy is consumed (and
emissions generated) for garbage collection and interpreters. These things
exist to make programming easier but are then duplicated across thousands of
servers.

If everyone used some compiled language that was just a little simpler, a
little safer, had just a little better memory management/tooling, or like
here, had better hardware support, how much would that reduce global emissions
caused by data centers?

~~~
flukus
Here's a benchmark that includes an energy comparison:
[https://thenewstack.io/which-programming-languages-use-the-l...](https://thenewstack.io/which-programming-languages-use-the-least-electricity/).
It's interesting that speed does not directly correlate with energy
consumption, and that the energy consumption of functional languages was much
higher than that of imperative ones.

~~~
Symmetry
That's more a measure of indirection than of garbage collection.

~~~
flukus
I don't disagree, but that indirection is necessary for the garbage collector,
even ref counters usually add levels of indirection.

------
kazinator
I'm skeptical; GC is closely tied to programming language run-times. How is
some accelerator going to know which pointers in an object are references to
other GC objects and which are non-GC-domain pointers (like handles to foreign
objects and whatnot)? How does the accelerator handle weak references and
finalization?

People aren't going to massively rewrite their language run-times to target a
boutique GC accelerator.

~~~
mar77i
I remember Ruby had some approach of reusing previously allocated yet
out-of-scope objects. I can very well imagine taking this concept above and
beyond, and having virtually separated stacks for each type...

------
daemin
One talking point I'd like to raise:

For small, short-lived scripts and applications, do we even need to free any
memory these days? For example, you write a script which takes several seconds
to execute, moves files, computes stuff with strings, etc. Should we really
invest time and effort in the script interpreter to free the memory, when
instead we can just exit normally and let the OS handle the cleanup?

I would imagine this kind of paradigm could be much faster to run because of
less runtime work being performed. The allocator used could also be a simple
linear allocator which just returns the next free address and increments the
pointer. If using multiple threads, there could be one per thread.

What do people think of this?

~~~
lioeters
> For small short lived scripts and applications, do we even need to free any
> memory these days? ..we can just exit normally and let the OS handle the
> clean up.

That makes me think: why couldn't larger programs be composed of many such
small, short-lived scripts/processes that give up all allocated memory upon
exit?

I suppose there could be accumulated overhead for starting many such
processes, and also the issue of allowing shared memory spaces that are
explicitly _not_ automatically freed. I'm way out of my depth in this line of
thinking though, so, just speculating.

~~~
alexhutcheson
That’s essentially what happens if you use region-based allocation (aka arena
allocation):
[https://en.wikipedia.org/wiki/Region-based_memory_management](https://en.wikipedia.org/wiki/Region-based_memory_management)
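
A minimal hand-rolled typed-arena sketch in Rust, just to illustrate the idea:
allocations accumulate in one region and are all freed together when the arena
is dropped:

```rust
use std::cell::RefCell;

// All allocations live as long as the arena; dropping the arena frees
// everything at once -- the in-process version of "just exit and let
// the OS clean up".
struct Arena<T> {
    items: RefCell<Vec<Box<T>>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: RefCell::new(Vec::new()) }
    }

    fn alloc(&self, value: T) -> &T {
        let boxed = Box::new(value);
        // The Box's contents never move, so a reference tied to the
        // arena's lifetime stays valid until the arena is dropped.
        let ptr: *const T = &*boxed;
        self.items.borrow_mut().push(boxed);
        unsafe { &*ptr }
    }
}

fn main() {
    let arena = Arena::new();
    let a = arena.alloc(1);
    let b = arena.alloc(2);
    println!("{} {}", a, b);
} // every allocation released here, in one shot
```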

------
etaioinshrdlu
It does seem like just doing it in hardware may be a linear gain but isn't a
fundamentally better algorithm. There's a proof that you do need to pause your
program eventually, if you want to be sure you get all the garbage.

~~~
gizmo686
Hardware is fundamentally parallel, CPUs are fundamentally serial; it is
possible for a hardware solution to have a super-linear speedup in time.

As a simple example, what is the time complexity of zeroing out n bytes of
memory? With a CPU, this is O(n). However, with proper hardware support, this
can be done in O(1) time.

For a simple garbage collecting example (no idea how their chip does it),
consider a simple mark and sweep algorithm. Assume the chip has an internal
object graph. At the begining of GC, only root nodes are tagged. At each step,
the neighboor of every tagged node is tagged. With a CPU, this step takes at
least Ω(n) time, where n is the number of tagged nodes. However, if this is
done entirely by hardware, then (depending on how the hardware is designed)
each node can independently look at its neighboors and complete a single step
in just O(1) time.

Moving stuff to hardware gives you a set of primitives that is asymptotically
different than the primitives you have on a general purpose CPU.
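
For contrast, the CPU version of that marking step is a serial worklist walk,
one node at a time; a small Rust sketch with an adjacency-list stand-in for
the heap:

```rust
use std::collections::VecDeque;

// Serial marking: O(edges) total work, visiting one node per iteration.
// The hypothetical hardware would advance *every* frontier node at once.
fn mark(heap: &[Vec<usize>], roots: &[usize]) -> Vec<bool> {
    let mut marked = vec![false; heap.len()];
    let mut work: VecDeque<usize> = roots.iter().copied().collect();
    for &r in roots {
        marked[r] = true;
    }
    while let Some(node) = work.pop_front() {
        for &next in &heap[node] {
            if !marked[next] {
                marked[next] = true;
                work.push_back(next);
            }
        }
    }
    marked // unmarked slots are garbage, ready to sweep
}

fn main() {
    // Objects 0..4; objects 3 and 4 are unreachable from root 0.
    let heap = vec![vec![1], vec![2], vec![], vec![4], vec![]];
    assert_eq!(mark(&heap, &[0]), vec![true, true, true, false, false]);
}
```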

~~~
AnimalMuppet
But hardware that has to interface with the main memory can't be fundamentally
parallel, because of memory bandwidth limitations. If you want to make this
_part of the memory_, then my objection does not apply. But if it's an
external chip to the memory, you still are fundamentally serial.

~~~
gizmo686
How much does it need to interface with main memory? If the external chip has
its own memory then it can maintain its own view of the object graph. The
software can update it when references are made/deleted and query it when an
allocation needs to be made.

There is still the overhead of communicating when references are made/deleted,
but you need that anyway, as something that only looks at memory doesn't know
what is a pointer, or the size of objects.

The communication overhead is then linear with respect to the number of
updates to the object graph, not the size of the object graph.

You could even go so far as to not put the chip on the memory bus at all.

~~~
AnimalMuppet
I'm presuming some kind of a heap. That heap has something like a list of free
blocks. Who keeps that list - the main memory, or this other chip? If the main
memory does, then when you garbage collect blocks you have to change the main
memory. If the chip keeps the list, then yes, the chip can do it all
internally. But...

What about multitasking? If there are N programs running, and each has their
own heap, the chip has to be able to keep track of each of them. Or it has to
be able to keep track of one monster heap that uses almost all of the
available memory, even if it's occupied by a bunch of small allocations. All
this without using any (main) memory itself. This chip would have to have a
large amount of onboard memory to pull that off. The very worst scenario would
be to get a long way into a run of a program that thrashed the heap, and then
have the GC chip run out of slots. You can't (easily) go back and re-do the
heap to keep the heap control in main memory, and you can't continue with the
chip managing the heap.

------
jhallenworld
So my idea for GC is to offload it to a separate machine through a
communications channel. The main CPU sends messages to the co-processor
whenever it allocates memory, or whenever it mutates (whenever it writes a
pointer to allocated memory or to the root set - there could be special
versions of the move instructions which send these messages as a side-effect).
There is a hardware queue for these messages and the main processor stalls if
it's full (if it's getting ahead of the co-processor).

The co-processor then maintains a reference graph any way it likes in its own
memory. It determines when memory can be freed using any of the classic
algorithms, and sends messages back to the main processor to indicate which
memory regions can be freed.

This has some nice characteristics: the co-processor does not necessarily
disturb the cache of the main processor (it can have its own memory). Garbage
collection is transparent as long as the co-processor can process mutations at
a faster rate than the main processor can produce them. The queue handles
cases where the mutation rate is temporarily faster than this.
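
A rough sketch of that protocol in Rust (the names are mine, not from any real
system; a bounded channel stands in for the hardware queue and provides the
stall-when-full behaviour):

```rust
use std::sync::mpsc;

enum GcMessage {
    Alloc { addr: usize, size: usize },
    // A pointer to `target` was written into `addr` (heap or root set).
    Mutate { addr: usize, target: usize },
}

fn main() {
    // Queue of 1024 messages; `send` blocks when it fills, stalling the
    // "main CPU" until the collector catches up.
    let (tx, rx) = mpsc::sync_channel::<GcMessage>(1024);

    let collector = std::thread::spawn(move || {
        // The co-processor: maintain the reference graph in its own
        // memory, and (in a full design) send back "region x..y is
        // free" messages.
        while let Ok(msg) = rx.recv() {
            match msg {
                GcMessage::Alloc { addr, size } => { let _ = (addr, size); }
                GcMessage::Mutate { addr, target } => { let _ = (addr, target); }
            }
        }
    });

    tx.send(GcMessage::Alloc { addr: 0x1000, size: 64 }).unwrap();
    tx.send(GcMessage::Mutate { addr: 0x1000, target: 0x2000 }).unwrap();
    drop(tx); // close the channel so the collector thread exits
    collector.join().unwrap();
}
```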

~~~
aidenn0
That would seem to eliminate all moving GC algorithms though.

~~~
xenadu02
Not really, though you may want to. The main CPU could compact memory if it
felt fragmentation was high enough to warrant it. By its nature the traffic is
asynchronous, so the CPU won't do a perfect compaction (it won't have
processed all the inbound messages marking garbage as free), but it would
probably be close enough.

The big win here is making it all async. No need to stop the world, issue
write barriers to application activity, or otherwise synchronize for GC. You'd
have a bigger highwater mark for used memory but otherwise the main CPU can
pickup the "the memory from x to y is now free" messages on any thread and
whenever it is convenient.

~~~
aidenn0
Compacting without tracing can't really be done, and not all moving algorithms
are collect-and-compact (though collect-and-compact seems to have won out over
other moving algorithms in the SMP era).

------
thekingofh
You can usually precisely control garbage collection by turning it off or
forcing it to run. The cognitive load to handle memory manually is not
insignificant. If you control memory manually, you eventually end up designing
some kind of mechanism like ref counting or something else to handle memory
cleanup automatically. And there are significant arguments that ref counting
might not be the best solution for all use cases. Best is a
combination of the ability to handle memory manually, with some more automated
garbage collection when there's a need to write stuff that doesn't necessarily
have to be the absolute fastest. Kitchen sink languages like C++ tend to have
both and don't force the developer in either direction. Best would be to
#define out 'new' and make manual memory handling explicit.

------
kilon
The irony of the thing is that in manual-memory-management languages you end
up writing your own garbage collectors, and in garbage-collected languages you
end up doing your own manual management. Unfortunately, if you look to a
language to solve such complex problems, you are heading straight for
severe-disappointment land. Same shit, different package. I still prefer
dynamic languages by a long margin because of their ability to do decent
metaprogramming and reflection, which is essential for managing any form of
data. Pick your poison and enjoy the hype while it lasts.

------
jondubois
>> consumes a lot of computational power—up to 10 percent or more of the total
time a CPU spends on an application.

I stopped reading there. 10% is nothing. For such a useful feature as
automatic garbage collection, for the vast majority of applications, I'd
gladly give away 50% of the CPU.

In terms of ensuring code correctness and robustness, if I had to choose
static typing or automatic garbage collection, I'd pick garbage collection
every time. It adds a lot of value in terms of development efficiency and code
simplicity.

~~~
notacoward
> for the vast majority of applications, I'd gladly give away 50% of the CPU.

Don't you think you're over-generalizing a bit much from your own
circumstances? If these issues don't matter to you then you basically don't
belong in this conversation. They matter a lot to people who write software
that runs on large numbers of servers, with both CPU and memory utilization
pushed to the limits. That's a lot of us. If you have a million machines, even
a 0.1% improvement on either axis is _huge_. That's racks upon racks worth of
equipment that doesn't need to be installed, powered, or maintained. If I
could save 10% of 10% ("nothing" according to you) across such a large fleet,
I'd be a hero and I'd be rewarded accordingly.

~~~
glenneroo
Don't forget game development, especially users of the Unity game engine,
where their "aggressive" GC has often been the bane of projects such as Kerbal
Space Program. I have been struggling lately with my own Unity GC issues,
where it's running the GC every frame, subsequently dropping my FPS from the
90 I need for VR down to 50 every few frames. Even the new experimental GC
they have implemented seems to have no effect.

~~~
p_l
Would probably help if they had constant-time GC available, where you bind
mutator ("application code") and collector (GC) to specific quanta of time to
run, thus providing reliable timings.

The question is whether it would be fast enough.

------
Aardappel
Want to get away from garbage collection, retain safety, but think Rust is too
invasive? Try compile time reference counting:
[http://aardappel.github.io/lobster/memory_management.html](http://aardappel.github.io/lobster/memory_management.html)

------
stcredzero
While we're at it, how about we liberate CPUs and caches from communication
between threads and cores?

~~~
garmaine
Many ARM chips have this.

~~~
stcredzero
Any useful links? What are the terms I should search for?

~~~
garmaine
It’s called a “weak ordering memory model.” Synchronization requires explicit
memory fence instructions.

RISC-V, btw, supports both weak and strong memory models as an implementation
choice.

------
ekianjo
> but the automated process that CPUs are tasked with consumes a lot of
> computational power—up to 10 percent or more of the total time a CPU spends
> on an application.

Is that even a problem when most CPUs are idle 90% of the time even when doing
typical daily tasks?

~~~
rkrzr
It is a problem if you have a server process running that you want to be
extremely responsive (latency <100ms) and that then suddenly decides to do
garbage collection for a minute or two before answering incoming requests.

------
wolfspider
The Kiwi scientific accelerator uses a similar approach with FPGAs I believe:
[https://www.cl.cam.ac.uk/~djg11/kiwi/](https://www.cl.cam.ac.uk/~djg11/kiwi/)

------
nottorp
Hmm so there's a coprocessor that does the GC... doesn't it need to lock the
memory away from the main CPU while it does that? And doesn't this lead back
to unpredictable pauses and slowdowns?

------
stephc_int13
I never understood the need for garbage collectors. In my opinion, the
difficulties of memory management are extremely overrated. I have written code
in C/C++ for almost 20 years and I have never encountered a difficult bug that
would have been avoided with a garbage collector.

If a coder really has a hard time with manual memory management it means he
can't really code; this is a beginner problem...

~~~
maaaats
I only work in GCed languages, so I don't know how manual memory management
works, except segfaults in some courses at university. Guess I should quit my
job, as you say I apparently can't really code :/ Thanks for letting me and
others here know!

~~~
ndepoel
In my experience GC makes programmers sloppy in their resource usage. Just
allocate a bunch of memory annnd... whatever, the GC will take care of that.
But there are a lot of other resources besides memory that aren't
automatically cleaned up like that. So what happens is people forget to close
network sockets, forget to unsubscribe event handlers, forget to set certain
references to null to actually allow the GC to do its work, etcetera... The
existence of and over-reliance on GCs have led to a mindset where many
programmers are just not aware that what you create must also be destroyed at
some point.

------
didibus
Maybe I'm naive, but with multi-core CPUs, and parallel GCs, isn't it somehow
the same? One core is mostly only used for GC, while the others do other
things?

Edit: I guess they mention their chip itself can do it at a high level of
parallelism, so that's probably one more advantage. But CPUs with additional
slower cores and a lot more cores are in the works as well.

~~~
muxr
But the GC has to do it in a thread safe way which involves
locking/synchronization. Otherwise you get nasty race conditions.

~~~
dnautics
There's no need to lock or synchronize if you're working in a language with
immutability and copy-on-write first-class data structures.
------
k__
Hasn't Rust basically solved that problem?

But yeah, legacy stuff could profit from this.

~~~
klodolph
No, because it is in general intractable to figure out object lifetime at
compile time. Rust has just solved the problem for more cases than, say, C++
does, or perhaps just with more rigor.

------
unictek
Could this be applied to Chrome's V8 for JavaScript garbage collection?

~~~
ben509
That should be an ideal application. Usually the trouble with third-party
garbage collection is that it has to discern pointers from other machine
words. That's more of a problem with C; it's why the Boehm GC library calls
itself "conservative". A runtime like V8 can follow a spec when it allocates
memory, so everything is properly marked.

------
qwsxyh
I don't care about being absolutely fast when writing code. The convenience of
not having to care about memory management is far more important to me. That's
why I like GCs.

------
jasonhansel
Didn't the old Lisp machines also do this?

------
mailslot
Seems like we could just as easily stop using garbage collection. ... or even
go back to reference counting / smart pointers and just live with the
“limitation” that we can’t have circular references.

~~~
Eric_WVGG
doesn’t seem cool to admire Apple technologies, but ARC seems to work
amazingly well with zero CPU overhead

~~~
kllrnohj
It's not at all zero CPU overhead, not even close. retain & release are
thread-safe, meaning an atomic ref count - very comparable in cost to
std::shared_ptr<T> or Rust's Arc<T>, both of which also automatically insert
the calls to inc & dec the ref count.

It's cool that you don't need to bother with specifying the type as being
std::shared_ptr<T> or Arc<T>, but it's not particularly novel, either. It's
"just" syntax sugar (or lack of syntax sugar I guess?)

~~~
vlovich123
Almost but not quite. Clang has very special rules around ARC that allow it to
perform additional optimizations that would otherwise be illegal [1].

[1]
[https://clang.llvm.org/docs/AutomaticReferenceCounting.html#...](https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-optimization)

~~~
kllrnohj
That just lets it release earlier, not retain/release less often.

ARC can't do anything magic here vs. something like really careful use of
std::move & const references.

~~~
favorited
ARC optimizations do let it retain/release less often. For example, instead of
a callee adding the return value to an autorelease pool then having the caller
immediately retain the returned value, the autorelease+retain pair can be
elided entirely.

~~~
kllrnohj
> For example, instead of a callee adding the return value to an autorelease
> pool then having the caller immediately retain the returned value, the
> autorelease+retain pair can be elided entirely.

Are you just referring to standard RVO? If you return a std::shared_ptr in C++
today you won't get the equivalent of retain+release, either, RVO avoids that.

~~~
vlovich123
Did you even take a look at the doc I posted?

> For example, if the user puts logging statements in retain, they should not
> be surprised if those statements are executed more or less often depending
> on optimization settings

> In general, ARC does not perform retain or release operations when simply
> using a retainable object pointer as an operand within an expression

> However, C and C++ already call this undefined behavior because the
> evaluations are unsequenced, and ARC simply exploits that here to avoid
> needing to retain arguments across a large number of calls.

> ARC performs no extra mandatory work on the caller side, although it may
> elect to do something to shorten the lifetime of the returned value.

------
SlipperySlope
In Java, I created thread-local resource pools, including strings, which
eliminate garbage collection in sensitive routines. Of course, it's much
faster in Java to perform pooled string comparison with ==. Likewise, I always
use the indexed version of a for loop to avoid the iterator that would
otherwise be allocated.

GC in Java is great for non-priority code, which is most of the application.

------
Causality1
Ok, so saving 15 percent of 10 percent of power use by changing both how we
build processors and how we write software. Doesn't seem worth it.

~~~
tybit
Seems worth it if the software that has to change is only a fraction of all
software written. Sure, your VM/runtime will have to be refactored, but not
the millions of programs that run on it.

