Emacs with the SpiderMonkey garbage collector (gnu.org)
183 points by noch 3 months ago | 45 comments

I came across this some time ago; it may be of interest:


It's a port of the EmacsOS to Rust; it should be able to run all your elisp code at some point.

I'm thrilled to see how much progress this project is making. The ancient core of Emacs written in C is nearly impossible for a newcomer to understand. Porting it to Rust might make it possible to work on the codebase and feel confident that your changes are correct.

Yeah, much more hackable it will be, for sure.

It may also come with speed improvements, as Rust did for FF. :)

And I can imagine that writing (parts of) Emacs' plugins in Rust (for speed and/or hackability) may also be a great addition.

You can load shared libraries in Emacs and Rust can expose a C ABI so it should already be possible to write plugins in Rust. In the same vein, this is cool: https://github.com/janestreet/ecaml
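For anyone wanting to try it: the third-party `emacs` crate (emacs-module-rs) wraps Emacs' dynamic module API, so a module can be a few lines of Rust. A minimal sketch, assuming that crate's macros (names may differ between versions):

  // Cargo.toml needs: crate-type = ["cdylib"] and a dependency on the `emacs` crate.
  use emacs::{defun, Env, Result, Value};

  // Emacs refuses to load modules that don't declare GPL compatibility.
  emacs::plugin_is_GPL_compatible!();

  // Called once when Emacs loads the module.
  #[emacs::module(name = "greeting")]
  fn init(_: &Env) -> Result<()> {
      Ok(())
  }

  // Exposed to elisp (as `greeting-say-hello` with this crate's default naming).
  #[defun]
  fn say_hello(env: &Env, name: String) -> Result<Value<'_>> {
      env.message(&format!("Hello, {}!", name))
  }

Build it as a cdylib and load the resulting shared object with `module-load` (Emacs itself has to be built with `--with-modules`); after that the function should be callable from elisp.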

Extremely naive to think a new language added perf over C++. It's obviously a side effect of rewriting a 20+ year old codebase from scratch under an updated architecture vision.

In the case of Firefox, that new architecture was impossible without Rust. Mozilla tried and failed twice.

It's not that it's impossible in C++, just that it's impractical to write large-scale solutions with a complex threading model under the "don't screw up" model C++ uses. The Rust compiler provides what amounts to built-in unit tests with every build pass that allow you to fearlessly write code without data races. This in turn allows the new architecture you're referencing.

I don't think you need a complex threading model. The browser showed that you can get good interaction with a single-threaded environment as long as you 1) can offload computation to other processes (web services, in the case of the browser, or more recently, workers) through message-passing and 2) such interaction can be performed asynchronously (i.e. while you wait for the result you can continue with the rest).

I think that kind of approach could be adopted by Emacs without having to rewrite everything from scratch. Taking advantage of it would of course require rewriting a good amount of code (both elisp and C), but it could be done bit by bit over a long period of time.
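As a Rust illustration of that offload-and-poll shape (in Emacs the worker would be a subprocess with a filter/sentinel rather than a thread, but the pattern is the same; everything below is made up for illustration):

  use std::sync::mpsc;
  use std::thread;
  use std::time::Duration;

  fn main() {
      let (tx, rx) = mpsc::channel();

      // Offload the expensive work (think: re-indexing a buffer) to a worker.
      thread::spawn(move || {
          thread::sleep(Duration::from_millis(200)); // stand-in for real work
          tx.send("result ready").unwrap();
      });

      // The "UI" loop stays responsive: poll for the result instead of blocking.
      loop {
          match rx.try_recv() {
              Ok(msg) => {
                  println!("worker said: {}", msg);
                  break;
              }
              Err(mpsc::TryRecvError::Empty) => {
                  println!("still editing..."); // handle input, redisplay, etc.
                  thread::sleep(Duration::from_millis(50));
              }
              Err(mpsc::TryRecvError::Disconnected) => break,
          }
      }
  }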

I think the Stylo developments show that it's not entirely true. Yes, you can get good performance from single-threaded models, and you can get better performance with threading. I'm also a huge advocate of not rewriting things, which is why I appreciate that Rust's C interop allows for incremental re-development of legacy codebases in a reasonable way.

> you can get better performance with threading

Indeed. I think mrighele is talking about server-side web app performance, which is a much higher-level game: because a lot of it is stateless, you can scale out and load-balance to maximize resource utilization.

When writing high-perf systems (like a browser), it's a very different game. In that game threading is a tool you need to make sure all cores are utilized.

Fearless concurrency is very important for performance though. You can read more on that here: https://blog.rust-lang.org/2017/11/14/Fearless-Concurrency-I...

I wonder if it's possible to combine remacs with guile emacs to make guile remacs?

The main difference between the current mark-and-sweep garbage collector and the SpiderMonkey garbage collector is that the latter is a copying garbage collector and precise, meaning that objects may change address during GC and that only typed values, not memory locations that might or might not be values, must be traced.

Out of all the types of GC out there, going to a copying GC seems not very compelling. Mark and sweep is not bad at all (well understood, straightforward to implement), and you don't have to worry about double-dereferencing pointers (i.e. some implementations of Cheney's; I don't know if SpiderMonkey references are indirect pointers to one of two buffers).

Out of the 3 general garbage collector types - copying, mark & sweep, and reference counting - copying is superior in general. The benefits of copying include: automatic compaction when copying from old heap to new heap (no memory fragmentation), improved locality (objects are closely packed), extremely fast reclamation (just throw away the old heap, whereas mark & sweep needs to maintain/update the free list for different sizes of objects), great for systems with lots of objects dying young (mark & sweep needs to reclaim each object that died young), cycle removal (ref counting's pitfall), and generational GC is just an extension of the copying concept (in addition to copying from old heap to new heap, copy from younger generations to older ones).
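To make the copying idea concrete, here's a toy semispace (Cheney-style) collector over cons cells in Rust -- purely a sketch of the technique, not how Emacs or SpiderMonkey actually represent objects:

  // Lisp-ish values: immediates or a pointer (index) to a cons cell.
  #[derive(Clone, Copy)]
  enum Value {
      Nil,
      Int(i64),
      Ptr(usize), // index of a cons cell in the current semispace
  }

  #[derive(Clone, Copy)]
  enum Cell {
      Cons(Value, Value),
      Forwarded(usize), // left behind in from-space once a cell has been copied
  }

  struct Heap {
      from: Vec<Cell>,
      to: Vec<Cell>,
  }

  impl Heap {
      fn new() -> Self {
          Heap { from: Vec::new(), to: Vec::new() }
      }

      // Allocation in a copying collector is just a bump: append and go.
      fn alloc(&mut self, car: Value, cdr: Value) -> Value {
          self.from.push(Cell::Cons(car, cdr));
          Value::Ptr(self.from.len() - 1)
      }

      // Copy a cell into to-space (unless already copied) and return its new index.
      fn evacuate(&mut self, idx: usize) -> usize {
          let cell = self.from[idx];
          match cell {
              Cell::Forwarded(new_idx) => new_idx,
              Cell::Cons(..) => {
                  let new_idx = self.to.len();
                  self.to.push(cell);
                  self.from[idx] = Cell::Forwarded(new_idx);
                  new_idx
              }
          }
      }

      fn forward(&mut self, v: Value) -> Value {
          match v {
              Value::Ptr(idx) => Value::Ptr(self.evacuate(idx)),
              other => other,
          }
      }

      // Cheney's algorithm: evacuate the roots, then scan to-space left to right,
      // forwarding every pointer found. The scan pointer chasing the allocation
      // pointer replaces any explicit mark stack.
      fn collect(&mut self, roots: &mut [Value]) {
          self.to.clear();
          for r in roots.iter_mut() {
              *r = self.forward(*r);
          }
          let mut scan = 0;
          while scan < self.to.len() {
              let cell = self.to[scan];
              if let Cell::Cons(car, cdr) = cell {
                  let (car, cdr) = (self.forward(car), self.forward(cdr));
                  self.to[scan] = Cell::Cons(car, cdr);
              }
              scan += 1;
          }
          std::mem::swap(&mut self.from, &mut self.to); // flip the semispaces
      }
  }

  fn main() {
      let mut heap = Heap::new();
      let tail = heap.alloc(Value::Int(1), Value::Nil);
      let list = heap.alloc(Value::Int(2), tail); // the list (2 1)
      let _garbage = heap.alloc(Value::Int(99), Value::Nil); // unreachable
      let mut roots = [list];
      heap.collect(&mut roots);
      println!("live cells after GC: {}", heap.from.len()); // prints 2
  }

Reclamation is just "throw away from-space": anything not reached during the scan is never copied, and the survivors end up packed together at the bottom of to-space.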

I agree. We used this architecture in Apache CouchDB, and it can also be seen in the Lucene fulltext index.

The only downside is needing enough free space to perform the copy.

And this is where the "it takes five times as much RAM to run a GC'd program with the same performance as the equivalent program with explicitly managed memory" bit kicks in.

Value semantics covers 95% of your object-lifetime needs. The other 5% can be covered with some form of reference counting and being smart about strong vs. weak references. Tracing GC is like bubble sort: almost always a suboptimal solution.

Not 5x as much RAM. Exactly 2x more RAM. You copy between two half-spaces, back and forth. That's why it's usually also called a semispace collector.

What you are talking about in the 2nd paragraph is total nonsense in a lisp world. Lisp solved this problem decades ago, with copying collectors. Stack values are 20%, plus refcounts do not work with rplac*-able trees and are super slow.

Yes -- five times as much RAM: http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

As it turns out, compared to the analysis and tree traversal inherent in even a fast GC algorithm, the overhead of malloc() and free() is really quite small. So yes, it takes twice as much memory to use a copying GC, right out of the starting gate -- but in order to match the performance of the equivalent explicitly managed program, it takes even more memory on top of that. And real time -- forget about it. It takes highly tuned, specialized algorithms to run a GC language that can meet hard real-time deadlines. You'll know you've found a decent algorithm because it will be proprietary -- part of some expensive, specialized JVM the Navy uses to crunch radar data aboard ship or something. The conventional algorithms (stop-and-copy, generational, etc.) are all fucking terrible.

And I wasn't talking about "in a Lisp world". Lisp is dead for real application development. Hell, the scruffies won, and C++ is now the predominant language in AI -- where Lisp was once said to thrive. Of course refcounts don't work if you try to build a graph with only one kind of reference. But a) you should know what kind of data structure you're building (cyclic or acyclic, graph or tree) and b) you should use weak references where possible to inform the deallocator which references not to chase. It's called being smart. Yes, it's a little more work than using a tracing GC, but it's still far less cognitively taxing or error-prone than using malloc() and free() manually.
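For what it's worth, this is exactly the pattern Rust's standard library pushes you toward: strong references down the tree, weak references back up, so the parent/child cycle can't keep anything alive. A sketch (the Node layout is made up for illustration):

  use std::cell::RefCell;
  use std::rc::{Rc, Weak};

  struct Node {
      value: i32,
      parent: RefCell<Weak<Node>>,       // weak back-pointer: doesn't keep the parent alive
      children: RefCell<Vec<Rc<Node>>>,  // strong pointers down the tree
  }

  fn main() {
      let parent = Rc::new(Node {
          value: 1,
          parent: RefCell::new(Weak::new()),
          children: RefCell::new(Vec::new()),
      });
      let child = Rc::new(Node {
          value: 2,
          parent: RefCell::new(Rc::downgrade(&parent)),
          children: RefCell::new(Vec::new()),
      });
      parent.children.borrow_mut().push(Rc::clone(&child));

      // Navigating back up works only while the parent is still alive.
      if let Some(p) = child.parent.borrow().upgrade() {
          println!("child {} -> parent {}", child.value, p.value);
      }
      // When `parent` and `child` go out of scope everything is freed:
      // the weak back-pointer doesn't count toward the reference count.
  }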

I think you misread the 5x, 3x and 2x sentences there. Those are the heap-size multipliers benchmarked in the paper: the more memory the GC was given, the better it did relative to malloc -- roughly matching it with 5x as much memory, and running about 70% slower with only 2x.

Any GC uses much less memory than a refcount or malloc application. That's common sense. Copying only uses more vmem during the copy/compaction step, but even that is usually less than the refcount and malloc overhead. The copying one even uses no stack space, thanks to Cheney.

A GC is always superior to refcounts and manual malloc, because it's much too hard to do that by yourself. With real-time you use incremental collections. Just store the state where to continue.

Agreed, the conventional m&s, stop the world collectors as eg. used in most major fancy languages are all terrible. You only need that for an FFI. But a good GC is far superior. And most importantly faster and safe. You don't update refcounts with a GC. You don't pollute your cache and concurrent updates with that.

Look at Java or .NET if you don't like lisp. Same tech, a solved problem.

> Any GC uses much less memory than a refcount or malloc application.

That's plain wrong.

> because it's much too hard to do that by yourself.

Too hard to do yourself? There are some of us who have been doing it for decades. In places where performance, predictable runtime, and memory usage are of paramount importance (like the Linux kernel) it is still done manually. I'll grant you Moore's law has made those places increasingly rare nowadays.

Yes, it's harder to do it manually, but only in the sense that it's harder to do anything manually than to have it automated. But being hard and error-prone doesn't mean the result is slower, or uses more memory. A moment's thought should have told you that. GC delays freeing memory until there is sufficient memory pressure. The mental load imposed by tracking mallocs means a programmer frees immediately so he can forget about it. Thus the memory is available for re-use immediately.

In my experience a long-running program manually managing its heap has a heap about twice the size of the space actually in use. In theory some degenerate cases caused by fragmentation can make it far worse -- but I've never been bitten by it. For long-running programs in GC'd languages (they tend to be written in Java) it's around 5 times. Again, if you've done any sort of server administration, you'll know this. All of us who do have watched Java VMs grow with amazement and dismay. Ironically, the cure is a form of manual memory management -- kill them and restart.

> Yes -- five times as much RAM: http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

That article was presented at OOPSLA '05 which ran from October 16–20, 2005. I don't know when it was written, nor when the data for it was worked out. That said, it seems safe to say that it's more than twelve years out of date.

you don't need to compact the whole working area as in the classical implementation. if you find that a page is sparse enough, just copy it into a less-than-full page of the next generation.

What about a compacting collector? It gives the locality and fast allocation benefits of the copying while using less memory.

Moving GCs are fairly non-intuitive. It seems like the extra overhead to do pointer updates and data copying would cause performance loss, but the generational nature of objects helps a lot. Copying means you get clean pages a lot faster and have fewer OS level allocations caused by pages sparsely populated with gen2 objects.

The Microsoft .NET implementation uses a copying GC. IIRC the HotSpot JVM GC is compacting as well. Conservative GCs are much easier when interacting with C though, especially if you have to share pointers.

It also means you can have a much dumber allocator, which is particularly helpful if you do a lot of allocations that never survive long enough to need to be moved.

Copying GC is actually very compelling for the right use case (and in fact was part of the standard JVM GC model until it switched over to G1GC). It's particularly useful when you have small, fixed sized objects (hello there cons cell!) and low survivorship (hello eden space!). In the right use case it also helps performance because it increases cache locality.

Copying GC can be straight forward to implement: https://github.com/serprex/luwa/blob/d96ba8fc478813a0a65a65c...

You also get to punt on the allocator: it's just a bump allocator.
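Something like this, give or take (illustrative only):

  // A bump allocator: "allocation" is a single cursor increment, which is what
  // a copying collector's fresh to-space gives you for free.
  struct Bump {
      buf: Vec<u8>,
      top: usize,
  }

  impl Bump {
      fn with_capacity(n: usize) -> Self {
          Bump { buf: vec![0; n], top: 0 }
      }

      fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
          if self.top + size > self.buf.len() {
              return None; // out of space: time to collect and flip semispaces
          }
          let start = self.top;
          self.top += size; // the entire "free list" is this one cursor
          Some(&mut self.buf[start..self.top])
      }
  }

  fn main() {
      let mut arena = Bump::with_capacity(1024);
      let obj = arena.alloc(16).expect("arena exhausted");
      obj[0] = 42;
      println!("allocated {} bytes", obj.len());
  }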

I found the interaction between the C call stack, the implicit call stack of the interpreted language, and the garbage collector to be tricky. However, I do not see why copying would be any more tricky than compacting.

Could someone explain, for someone not familiar with low level programming, what would be the advantages of using another garbage collector in Emacs?

This is asked in the thread, though I'm not sure I saw a good answer, yet.

Basic answer is if there are complaints with the garbage collector causing stalls or other hiccups, then moving to a faster one could help.

Now, this implies that a new one would, ipso facto, be faster. That is not guaranteed. But, garbage collectors have gotten quite effective in modern times. Extra memory helps, of course.

There are two ways a new GC might be helpful. It might reduce the length of any individual pause (though normally at the expense of total pause time) and it might reduce total memory usage by compacting the heap and being a precise rather than a conservative collector.

The current emacs GC can introduce long pauses, so I’d be happy to see something which improved that, and I have seen annoying memory-usage growth in long-running emacs processes, so experiments in this area would be welcome.

I'm curious how you get long pauses. I only really see giant pauses when I do a sync process call that takes longer than I expected. So, updating GLOBAL tags, for example.

> Basic answer is if there are complaints with the garbage collector causing stalls or other hiccups, then moving to a faster one could help.

Is that an actual problem, though?

Maybe I just don't push Emacs that far, but I have never felt GC performance to be much of a problem on even remotely capable hardware (say, anything at least as fast as a Raspberry Pi model 2).

Plus, Emacs allows the user to configure the frequency at which the garbage collector kicks in. Tweaking gc-cons-threshold has been entirely sufficient for my needs.

I personally doubt it is much of a problem. However, that doesn't mean I personally think nobody should try it. Will be interesting to see if any results follow.

To your point, though, I suspect better async processing would be of much greater benefit.

I don't see any advantage of Emacs going from the current mark-sweep collector to a copying collector. I don't notice GC pauses using Emacs, and even after being up for weeks, I don't notice it bloating from memory fragmentation. It also seems like a lot of work to change a large, mature codebase to work with a GC that moves objects.

EDIT: Apparently my experience is not universal -- see below: https://news.ycombinator.com/item?id=15803046

my emacs gc strategy: disable the gc. do manual collections by restarting every now and then.

Why would you want to do that? Are you running on a severely memory-restricted machine?

I'm assuming the opposite - they're running on a machine with lots of memory, and Emacs is getting bogged down by frequent and slow GC pauses. I have run into this problem myself and actually quit using Emacs because its GC lag was so bad.

yeah, this is correct. iirc it was some sort of "autocomplete open a file" library that was running slowly, mostly because of gc -- disabling gc made that snappy, and i have enough memory that it doesn't really matter if emacs never reclaims it.

i also really don't like having my editor pause to do gc.

(i'm addicted to emacs flexibility, but i would love a better-engineered replacement. maybe xi editor, someday. (though their multiprocess approach seems _insane_ to me, _especially_ in a language that gives you safe shared-memory concurrency!))

here are the necessary incantations, in case anyone wants to try this at home:

  ;; never collect.
  (setq gc-cons-threshold 10000000000)
  ;; neuter manual collections too; the message below is all that happens.
  (defun garbage-collect (&rest _args)
    (message "trying to garbage collect. probably you want to quit emacs."))
  (setq garbage-collection-messages t)
the message kicks in at 10GB.

Could turning it off cause other issues (in Emacs / in OS), apart from eating lots of memory?

The OOM killer preferentially targets processes with a lot of memory allocated. If you actually let emacs consume all memory it's likely to die without warning.

so far so good, been doing it since january (so coming up on 1 year).

i remembered what the driving reason was to disable gc:

turning off gc gets my emacs startup time down to ~400ms from ~600ms.

Wow! My gut instinct would have been to try Boehm, but I have no idea why one would prefer one GC over another with Emacs.

Because Boehm is still a conservative mark & sweep, whilst a modern copying collector with NaN tagging is MUCH faster. It only needs twice as much memory.

The root scanning problem with nested calls causing races is known. I do have the same problem. You need to assign to volatile temps, or implement boehm-like scanning of registers. Which is not easily portable. Volatile temps are, and they are fast enough for me. Still running circles around boehm or any other m&s GC.
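For anyone unfamiliar: NaN tagging (mentioned above) means stuffing non-double values into the payload bits of a quiet NaN, so every value fits in a single 64-bit word with no separate type field. A very rough sketch, assuming 48-bit pointers; real implementations also canonicalize genuine NaNs and tag fixnums, symbols, etc.:

  // Quiet-NaN bit pattern: exponent all ones plus the top mantissa bit.
  const QNAN: u64 = 0x7ff8_0000_0000_0000;
  // Use the sign bit to mark "this NaN payload is a pointer".
  const TAG_PTR: u64 = 0x8000_0000_0000_0000;
  const PTR_MASK: u64 = 0x0000_ffff_ffff_ffff;

  #[derive(Clone, Copy)]
  struct Val(u64);

  impl Val {
      fn from_f64(x: f64) -> Self {
          Val(x.to_bits()) // ordinary doubles are stored as themselves
      }
      fn from_ptr(p: *const u8) -> Self {
          Val(QNAN | TAG_PTR | (p as usize as u64)) // pointer lives in the low 48 bits
      }
      fn is_f64(self) -> bool {
          (self.0 & QNAN) != QNAN // anything that isn't a quiet NaN is a plain double
      }
      fn as_f64(self) -> f64 {
          f64::from_bits(self.0)
      }
      fn as_ptr(self) -> *const u8 {
          (self.0 & PTR_MASK) as usize as *const u8
      }
  }

  fn main() {
      let d = Val::from_f64(3.25);
      let s = String::from("boxed");
      let p = Val::from_ptr(s.as_ptr());
      assert!(d.is_f64() && !p.is_f64());
      println!("double: {}, pointer: {:?}", d.as_f64(), p.as_ptr());
  }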

