
Let's make the Emacs GC safe and iterative - noch
http://lists.gnu.org/archive/html/emacs-devel/2018-03/msg00014.html
======
kazinator
It already handles long lists without recursion:

    
    
        case Lisp_Cons:
          {
            register struct Lisp_Cons *ptr = XCONS (obj);
            if (CONS_MARKED_P (ptr))
              break;
            CHECK_ALLOCATED_AND_LIVE (live_cons_p);
            CONS_MARK (ptr);
            /* If the cdr is nil, avoid recursion for the car.  */
            if (EQ (ptr->u.s.u.cdr, Qnil))
              {
                obj = ptr->u.s.car;
                cdr_count = 0;
                goto loop;
              }
            mark_object (ptr->u.s.car);
            obj = ptr->u.s.u.cdr;
            cdr_count++;
            if (cdr_count == mark_object_loop_halt)
              emacs_abort ();
            goto loop;
          }
    

The "goto loop" chases the _cdr_ pointer iteratively. So with regard to list
structure, this will only blow the stack if you have deep _car_ recursion.
When conses are used to represent nested lists, you don't get that much depth
on the CAR side.

I'm just wondering what is the exact scenario, involving what structure.

Of course there can be other kinds of linked objects. E.g. a huge graph
structure, where the GC happens to traverse a very long path through the graph
before bottoming out somewhere.

BTW, "[i]f the cdr is nil, avoid recursion for the car." Initial reaction:
good frickin' idea and I'm stealing it immediately. Though, wait, not sure
what it buys you in practical terms; who the heck builds a deep structure of
conses where the linkage is through the _car_ and all the _cdr_ s are nil?

A better test might be for the _cdr_ being any atom at all. Though obviously
some kinds of atoms are structures with pointers that can have depth behind
them, many kinds of atoms are shallow. If a _cons_ has any atom whatsoever as
its _cdr_, then maybe we should recurse on the _cdr_ and tail-loop on the
_car_.

Analysis required.
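Here's a toy model of that heuristic (illustrative names only, nothing from Emacs' actual Lisp_Object representation): when the _cdr_ is an atom, recurse on the atom (bounded stack cost) and chase the _car_ chain iteratively; when the _cdr_ is a cons, recurse on the _car_ and chase the _cdr_, as the existing code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy object model -- illustrative only, not Emacs' Lisp_Object. */
typedef struct obj {
    bool is_cons;              /* false = atom */
    bool marked;
    struct obj *car, *cdr;     /* meaningful only when is_cons */
} obj;

static obj *mk_atom(void) { return calloc(1, sizeof(obj)); }

static obj *mk_cons(obj *car, obj *cdr)
{
    obj *c = calloc(1, sizeof(obj));
    c->is_cons = true;
    c->car = car;
    c->cdr = cdr;
    return c;
}

static void mark_object(obj *o)
{
loop:
    if (!o || o->marked)
        return;
    o->marked = true;
    if (!o->is_cons)
        return;                /* atoms have no children in this model */
    if (!o->cdr->is_cons) {
        /* cdr is an atom: recursing on it costs bounded stack, so
           recurse there and chase the car chain iteratively. */
        mark_object(o->cdr);
        o = o->car;
        goto loop;
    }
    /* cdr is a cons: recurse on the car (usually shallow for ordinary
       nested lists) and chase the cdr iteratively, as today. */
    mark_object(o->car);
    o = o->cdr;
    goto loop;
}
```

With this variant, a million-deep chain linked through _car_ marks without deep recursion even when the _cdr_ s are non-nil atoms, which is exactly the case the nil-only test misses.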

~~~
zmonx
You can crash Emacs for example with the recipe from #2099:

    
    
       $ emacs -Q --eval "(let (v) (while t (setq v (cons v v))))"
    

In response to how such deep structures can arise: they may, for example,
arise when _testing_ Emacs with randomly generated code snippets. For testing
edge cases, it is highly desirable that Emacs remain robust in such
situations.

~~~
stevekemp
Indeed, Emacs should be robust. I recently ran some fuzz-testing of Emacs and
found that evaluating a similar recursive example would crash it:

[https://blog.steve.fi/i_ve_never_been_more_proud.html](https://blog.steve.fi/i_ve_never_been_more_proud.html)

My case was resolved (a file with a few thousand "`" characters), but despite
running more fuzzing I couldn't get any other interesting results.

~~~
kazinator
I use AFL (American Fuzzy Lop) on TXR from time to time.

And, guess what, there was also a problem with nested backquotes. Backquote is
spelled ^ (caret) in TXR Lisp, so the cases that were triggering the
degenerate behavior had ^^^^^^^ rather than ```````...

[http://www.kylheku.com/cgit/txr/commit/?id=ab98634ea89927220...](http://www.kylheku.com/cgit/txr/commit/?id=ab98634ea8992722046ab857ec0eaec7cb024761)

~~~
stevekemp
Small world. This is my bug-report, which meandered a little:

[https://lists.gnu.org/archive/html/bug-gnu-emacs/2017-07/msg00133.html](https://lists.gnu.org/archive/html/bug-gnu-emacs/2017-07/msg00133.html)

------
mav3r1ck
Interesting. So that's why Emacs wigs out sometimes when trying to open very
large files: it has a recursive GC that gets into trouble when it runs out of
stack space. Assuming I read the first statement correctly.

Seems like a welcome change. I'm not too well versed with the internals, but
does 4.6MB seem reasonable for a memory footprint? I mean, compared to Slack,
not bad.

> _The naive version of this scheme needs about 4.6MB of overhead on my
> current 20MB Emacs heap, but it should be possible to reduce the overhead
> significantly by taking advantage of the block allocation we do for conses
> and other types --- we can put whole blocks on the queue instead of pointers
> to individual block parts, so we can get away with a much smaller queue._

~~~
harrygeez
Not to be rude, but Electron apps' level of footprint has always been a
compromise and should not be used to justify extraneous resource usage. Emacs
must be able to run in the console and on resource-constrained systems like
AWS instances, where 5MB of RAM can make a difference.

~~~
craftkiller
If 5MB of RAM makes the difference, wouldn't that be the time to try mg[1] or
to edit remotely using TRAMP? Certainly agree that Electron should not be
used as a justification.

[1]
[https://en.wikipedia.org/wiki/Mg_(editor)](https://en.wikipedia.org/wiki/Mg_\(editor\))

~~~
nerdponx
How does mg differ from Emacs proper?

~~~
msds
It's a text editor, not a lisp environment that edits text.

I once had a decent amount of fun splicing it with Lua, before deciding that I
don't actually care about text editors...

------
kazinator
> _We need to fix GC being deeply recursive once and for all. Tweaking stack
> sizes on various platforms and trying to spot-fix GC for the occasional
> deeply recursive structure is annoying. Here's my proposal:_

There is still a default stack size limit of 8 megabytes on Linux, even though
consumer machines nowadays have RAM measured in double-digit gigabytes.

On an Ubuntu 16.x machine:

    
    
      $ ulimit -a | grep stack
      stack size              (kbytes, -s) 8192
      $ free -h
                    total        used        free      shared  buff/cache   available
      Mem:            15G        4.1G        2.8G        600M        8.7G         10G
      Swap:           14G        626M         13G
    

This 8192 kbytes has not changed in at least 15 years, unlike that 15G. Slight
disconnect there.

This 8192 is just the soft limit; applications can raise it up to the root-
imposed hard limit.

Unfortunately, that tends to be only 4X larger than the 8192:

    
    
      $ ulimit -s 32769
      bash: ulimit: stack size: cannot modify limit: Operation not permitted
    
      $ ulimit -s 32768
      $ # OK
    

Clearly, the hard limit needs revision too.
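Raising the soft limit from inside the process is only a few lines, for what it's worth (sketch; the function name is made up, and an unprivileged process still can't go past rlim_max):

```c
#include <assert.h>
#include <sys/resource.h>

/* Sketch: lift the soft stack limit to the hard limit at startup.
   raise_stack_soft_limit is an invented name, not an Emacs function. */
static int raise_stack_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < rl.rlim_max) {
        rl.rlim_cur = rl.rlim_max;        /* soft := hard */
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            return -1;
    }
    return 0;
}
```

On Linux the main thread's stack grows on demand up to the soft limit in force, so doing this early effectively buys the process a deeper stack, but it's still capped by that hard limit.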

Threads are another issue; if you have a "can run in any thread" garbage
collector, it has to live within a thread stack. You can't have huge stacks
for each of a large number of threads, and a thread's stack cannot be
temporarily extended; something may already be mapped in the address space
just past it, since the stack is just another mmap. (Don't think Emacs has
this problem. :)

~~~
quotemstr
> There is still a default stack size limit of 8 megabytes on Linux, even
> though consumer machines nowadays have RAM measured in double-digit
> gigabytes.

Current stack sizes are fine; storing large amounts of information on the
stack usually isn't a good idea anyway, and makes tools like debuggers and
perf(1) slow and awkward.

Besides, the ulimit means nothing. If you really want, you can allocate your
own "stack", as large as you want, and switch to it. I just wouldn't recommend
it.
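A tamer variant of "allocate your own stack" that doesn't involve hand-rolled stack switching: run the deep work on a thread whose stack size you pick yourself via pthread_attr_setstacksize (sketch; the helper name is invented):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static char *volatile sink;   /* escape hatch to keep frames real (defeats TCO) */

/* Deliberately deep recursion: ~4 KiB per frame, so 30000 frames would
   overflow a default 8 MiB stack but fits comfortably in 256 MiB. */
static void *deep(void *arg)
{
    size_t n = (size_t)arg;
    char pad[4096];
    sink = pad;
    return n ? deep((void *)(n - 1)) : (void *)1;
}

/* Run fn on a thread with a caller-chosen stack size.  The name
   run_with_big_stack is made up for this sketch. */
static int run_with_big_stack(size_t stack_bytes,
                              void *(*fn)(void *), void *arg, void **out)
{
    pthread_attr_t attr;
    pthread_t t;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* stack_bytes must be at least PTHREAD_STACK_MIN */
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0)
        return -1;
    if (pthread_create(&t, &attr, fn, arg) != 0)
        return -1;
    pthread_attr_destroy(&attr);
    return pthread_join(t, out);
}
```

The thread's stack is a plain mmap under the hood, so the only real cost of asking for a huge one is address space, which is cheap on 64-bit.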

> A thread's stack cannot be temporarily extended; there may be something in
> the address space preventing it, since it's just another mmap.

You might be interested in the split stack model, described here [1]. You
definitely can grow a thread's stack as needed. The approach has various
issues (which is why Rust eventually went with conventional stacks), but it
certainly works.

[1]
[https://gcc.gnu.org/wiki/SplitStacks](https://gcc.gnu.org/wiki/SplitStacks)

~~~
BeeOnRope
How does big stacks slow down perf?

I agree putting large amounts of data on the stack isn't a great idea, but I
think that's mostly because (a) default stack sizes are quite small, (b) they
vary by distribution and even by local configuration, and (c) if you happen
to be on any thread other than the main thread, it's the pthread stack size
that matters, and that is usually even smaller than the default main-thread
stack.

I've never heard of it causing any problems for perf. Yes, perf unwinds the
stack if you ask it to, but none of the mechanisms depend (AFAIK) on the
stack size.

~~~
kazinator
By the way, on glibc/Linux, you need (main thread) stack space proportional to
the complexity of your shared libraries (number of symbols). The ld.so memory
manager uses alloca for data structures related to symbol resolution.

~~~
BeeOnRope
Interesting. I guess this memory is also used at somewhat unpredictable times
in the usual case where symbols are resolved lazily through the PLT: the
first time you call a function, you'll suddenly get this extra stack usage
while ld.so does its thing?

Doesn't that also imply this caveat applies to thread stacks if you end up
resolving symbols off the main thread?

------
cwzwarich
Why don't they just use the Deutsch-Schorr-Waite algorithm to store the mark
stack in the heap itself with pointer-reversal tricks? This would remove the
additional overhead.
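For the curious, a minimal Schorr-Waite sketch over cons-like cells (toy model; a real Emacs version would have to scrounge up the mark and state bits somewhere in each object):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy two-pointer cell.  DSW needs a mark bit plus one extra state bit
   per object to remember which child it is currently visiting. */
typedef struct cell {
    struct cell *car, *cdr;    /* NULL stands in for atoms/nil */
    unsigned mark : 1;
    unsigned state : 1;        /* 0 = exploring car, 1 = exploring cdr */
} cell;

static void dsw_mark(cell *root)
{
    cell *prev = NULL, *cur = root;
    for (;;) {
        if (cur && !cur->mark) {
            /* advance: mark cur, descend into its car, and leave a
               back pointer to prev in the vacated car slot */
            cur->mark = 1;
            cur->state = 0;
            cell *next = cur->car;
            cur->car = prev;
            prev = cur;
            cur = next;
        } else if (prev && prev->state == 0) {
            /* swing: car side finished; move the back pointer into
               the cdr slot, restore car, and descend into the cdr */
            prev->state = 1;
            cell *next = prev->cdr;
            prev->cdr = prev->car;
            prev->car = cur;
            cur = next;
        } else if (prev) {
            /* retreat: both children done; restore cdr and pop */
            cell *next = prev->cdr;
            prev->cdr = cur;
            cur = prev;
            prev = next;
        } else {
            break;             /* back at the root, traversal complete */
        }
    }
}

static cell *mk(cell *car, cell *cdr)
{
    cell *c = calloc(1, sizeof *c);
    c->car = car;
    c->cdr = cdr;
    return c;
}
```

The appeal is exactly the constant extra space. The costs are the extra bits per object, three pointer writes per edge instead of one, and the fact that the heap is in a scrambled state mid-mark, which rules out incremental or concurrent marking without more machinery.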

~~~
rurban
Much better would be a copying collector, like a two-finger collector,
because Emacs doesn't have many foreign pointers (for political reasons), and
those can be handled separately. No stack, much faster, much slimmer.

~~~
kazinator
Copying collectors can handle foreign pointers. Why? Because foreign pointers
can be managed via handle records. Those handle records are GC objects with a
type tag and all that; they hold the foreign pointer as an opaque machine word
in one of their fields. The handles themselves can be moved around by a
copying/compacting allocator. Of course, the foreign objects aren't. You
typically don't even know their size. They could be things like a library
handle returned by dlopen(): don't even think about moving that. They could
also be pointers into the middle of some object; a displaced C pointer can
enter the GC world as a foreign pointer.
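A sketch of such a handle record (field names invented for illustration):

```c
#include <assert.h>
#include <string.h>

/* A GC-managed handle wrapping a foreign pointer.  The collector may
   relocate the handle; the payload is an opaque word it never follows. */
enum gc_type { GC_CONS, GC_FOREIGN_HANDLE /* ... */ };

typedef struct foreign_handle {
    enum gc_type type;            /* ordinary GC type tag */
    void (*finalize)(void *);     /* optional teardown for the payload */
    void *payload;                /* e.g. a dlopen() handle; never traced */
} foreign_handle;

/* In a copying collector, evacuating a handle is a plain bitwise copy:
   the payload bits travel verbatim and stay valid. */
static foreign_handle *evacuate_handle(const foreign_handle *src, void *to_space)
{
    foreign_handle *dst = to_space;
    memcpy(dst, src, sizeof *dst);
    return dst;
}
```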

~~~
rurban
The biggest problem are callbacks from foreign calls back into your moved
heap. Called back objects need to stay put, in a separate fixed area. All
major lisps with proper FFI support do that.

Modules are no big deal; FFIs with callbacks are. And since RMS opposed an
FFI for Emacs for decades (DLL or OLE/COM support) - there's now just a
simple internal one for the GTK binding - a moving collector would be pretty
simple.

------
dawg
Did something similar a while ago for our not-so-great D GC.
[https://github.com/dlang/druntime/pull/1073](https://github.com/dlang/druntime/pull/1073)

------
kazinator
I wonder if a hybrid approach is possible. Run the GC recursively, but
arrange for segfaults to be handled on an alternate stack (sigaltstack). If a
segfault occurs due to insufficient stack space, do the rest of the GC using
the "queue draining loop" approach, within the small amount of alternate
stack memory, then longjmp to some point in the original GC to unwind the
original stack.

~~~
quotemstr
Why would you bother supporting both approaches? It's not as if the recursive
approach is any more efficient --- and whether or not you start with the
recursive approach, you need to keep memory reserved for a worst case queue
traversal.

~~~
kazinator
For one thing, because the recursive approach is already debugged. If we can
show that the new code doesn't change what the old code does when the
fallback isn't triggered, then we can be confident that a bug hasn't been
introduced.

------
olliej
Super dumb question: why not just use a queue, and visit the objects
iteratively?

Eg each visited object appends/prepends its children to the queue of objects
to visit, and you just repeatedly drain until you have an empty list.

That’s what I had to do many years ago for the GC I was working on.

Then any special cases (like the cons case referenced elsewhere) just become
performance optimizations rather than "don't crash" correctness issues.
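A minimal sketch of that queue-draining shape (toy object model, illustrative names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Worklist marking: no recursion at all, just a growable queue of
   objects still to visit. */
typedef struct node {
    bool marked;
    struct node *car, *cdr;
} node;

typedef struct { node **items; size_t len, cap; } queue;

static void push(queue *q, node *n)
{
    if (!n || n->marked)
        return;
    n->marked = true;                   /* mark on push so each node queues once */
    if (q->len == q->cap) {
        q->cap = q->cap ? q->cap * 2 : 64;
        node **grown = realloc(q->items, q->cap * sizeof *grown);
        if (!grown)
            abort();                    /* a real GC would pre-reserve or fall back */
        q->items = grown;
    }
    q->items[q->len++] = n;
}

static void mark_all(node *root)
{
    queue q = {0};
    push(&q, root);
    while (q.len) {                     /* drain until nothing is left */
        node *n = q.items[--q.len];     /* LIFO order here; FIFO works too */
        push(&q, n->car);
        push(&q, n->cdr);
    }
    free(q.items);
}
```

Marking on push rather than on pop is what keeps the queue bounded even for shared structure like the `(cons v v)` crash recipe quoted upthread.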

------
gnufx
This talks about possible generational GC. Incremental/generational was
available many years ago with a port to the Boehm collector. I never
understood rejecting it and generally spending effort on a bespoke collector.
(I don't remember whether the Ravenbrook system was available at the time, or
whether it was unsuitable for some reason.)

