register struct Lisp_Cons *ptr = XCONS (obj);
if (CONS_MARKED_P (ptr))
  break;
CONS_MARK (ptr);
/* If the cdr is nil, avoid recursion for the car.  */
if (EQ (ptr->u.s.u.cdr, Qnil))
  {
    obj = ptr->u.s.car;
    cdr_count = 0;
    goto loop;
  }
mark_object (ptr->u.s.car);
obj = ptr->u.s.u.cdr;
cdr_count++;
if (cdr_count == mark_object_loop_halt)
  emacs_abort ();
I'm just wondering what the exact scenario is, and what structure is involved.
Of course there can be other kinds of linked objects. E.g. a huge graph structure, where the GC happens to traverse a very long path through the graph before bottoming out somewhere.
BTW, "[i]f the cdr is nil, avoid recursion for the car." Initial reaction: good frickin' idea and I'm stealing it immediately. Though, wait, not sure what it buys you in practical terms; who the heck builds a deep structure of conses where the linkage is through the car and the cdrs are nil?
A better test might be for the cdr being any atom at all. Though obviously some kinds of atoms are structures with pointers that can have depth behind them, many kinds of atoms are shallow. If a cons has any atom whatsoever as its cdr, then maybe we should recurse on the cdr and tail loop on the car.
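A toy sketch of that policy (invented cell layout and names, nothing like Emacs' actual representation, assuming atoms carry no further pointers): when the cdr is an atom, the recursive call is bounded, so deep car-linked chains are consumed by the loop instead of the C stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy cells for illustration only: an atom is a leaf; a cons has car/cdr. */
typedef struct cell {
    int is_atom;
    int marked;
    struct cell *car, *cdr;   /* unused when is_atom */
} cell;

static cell *make_atom(void)
{
    cell *c = calloc(1, sizeof *c);
    c->is_atom = 1;
    return c;
}

static cell *make_cons(cell *car, cell *cdr)
{
    cell *c = calloc(1, sizeof *c);
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* If the cdr is any atom, recurse on it (depth 1 here) and tail-loop on
   the car; otherwise recurse on the car and tail-loop on the cdr as usual. */
static void mark(cell *c)
{
    while (c && !c->marked) {
        c->marked = 1;
        if (c->is_atom)
            return;
        if (c->cdr == NULL || c->cdr->is_atom) {
            mark(c->cdr);     /* shallow: cdr is an atom */
            c = c->car;       /* deep car chains consume no C stack */
        } else {
            mark(c->car);     /* can still be deep if cars nest conses */
            c = c->cdr;
        }
    }
}
```

Note the hedge in the comment: car-linked chains of conses whose cars are themselves conses still recurse, so this only widens the set of shapes the loop handles.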
$ emacs -Q --eval "(let (v) (while t (setq v (cons v v))))"
My case was resolved (a file with a few thousand "`" characters), but despite running more fuzzing I couldn't get any other interesting results.
And, guess what, there was also a problem with nested backquotes. Backquote is spelled ^ (caret) in TXR Lisp, so the cases that were triggering the degenerate behavior had ^^^^^^^ rather than ```````...
BTW: (setq v (cons v v)) -> (push v v)
If it cannot allocate more memory, I expect it to throw a Lisp exception that tells me so, not to crash the whole process.
I greatly appreciate Daniel's work to increase the robustness of Emacs, because it means that there will be fewer situations where a process crash leads to lost work for users. In addition, increasing the robustness of Emacs in such cases will also greatly broaden the set of test cases that can be run reliably, and hence help to detect and correct even more issues.
> Was: Re: bug#30626
(seq-doseq (_ (stream-range 1 1000000)) nil)
Looks like an X Y problem here: let's fix the garbage collector (X) because of some badly implemented lazy processing (Y), which is where the real issue lies.
I don't believe anyone has a filesystem that can't be traversed by a lazy data structure in Lisp, with a recursive GC, in 8 megs of stack. (Unless, say, the traversal is being fooled by cycles created by directory symlinks.)
Should it also be possible to successfully read any well-formed S-expression that we can create in a text stream? Oops, don't use recursion in the reader, then.
And yes, it should be possible to successfully read any well formed sexp that can be created in a text stream and held in memory. That's why quality parsers use an explicit stack as well.
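As a minimal sketch of that idea (an invented toy, not Emacs' actual reader): nesting is tracked on an explicit heap-allocated stack, so the depth the parser can handle is bounded by the heap, not by the C stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct pnode {
    struct pnode *first_child, *next_sibling;
} pnode;

/* Iterative parse of "(...)" nesting using an explicit, growable stack of
   open lists.  Returns the root on success, NULL on unbalanced input
   (nodes are leaked on the error paths in this sketch). */
static pnode *parse(const char *s)
{
    size_t cap = 16, top = 0;
    pnode **stack = malloc(cap * sizeof *stack);
    pnode *root = calloc(1, sizeof *root);           /* implicit top-level list */
    stack[top++] = root;
    for (; *s; s++) {
        if (*s == '(') {
            pnode *n = calloc(1, sizeof *n);
            n->next_sibling = stack[top - 1]->first_child;  /* prepend */
            stack[top - 1]->first_child = n;
            if (top == cap)
                stack = realloc(stack, (cap *= 2) * sizeof *stack);
            stack[top++] = n;
        } else if (*s == ')') {
            if (top == 1) { free(stack); return NULL; }     /* extra ')' */
            top--;
        }
    }
    pnode *r = (top == 1) ? root : NULL;             /* unclosed '(' => NULL */
    free(stack);
    return r;
}
```

A recursive-descent version of the same grammar would fault at a few hundred thousand levels on an 8 MB stack; this one only ever uses a handful of C frames.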
Recursion is a useful development tool in some cases, but when iterative algorithms fail because they use it, the implementations are buggy.
Your alternative is table-based parsers, which can only reasonably be implemented with generators. The problem is that table-based parsers are slower than hand-rolled recursive descent parsers by a fairly large margin, and general-purpose generators (bison, yacc, etc.) tend to have difficulty with the quirky syntax of large languages. Note there have been a number of papers over the years that demonstrated 2x perf improvements in yacc/bison by switching from indexed state transitions to computed goto, but even that leaves them slower than hand-rolled parsers.
Not to mention if it is defined in terms of a character-classifying read-table which dispatches functions, which can be patched by the user program (as in Common Lisp).
> That's why quality parsers use an explicit stack as well.
Observation: that rules out the entire description of the ANSI CL reader from being quality, haha.
Seems like a welcome change. I'm not too well versed with the internals, but does 4.6MB seem reasonable for a memory footprint? I mean, compared to Slack, not bad.
> The naive version of this scheme needs about 4.6MB of overhead on my current 20MB Emacs heap, but it should be possible to reduce the overhead significantly by taking advantage of the block allocation we do for conses and other types --- we can put whole blocks on the queue instead of pointers to individual block parts, so we can get away with a much smaller queue.
That doesn't seem right to me. (a) In emacs a file's contents aren't stored in a deeply nested structure, no matter what the size. It's a gap buffer. (b) Structure aside, that buffer isn't a Lisp object, but an object internal to the C level, so not subject to GC. And note that Emacs's large file behavior isn't especially different than other non-minimalist editors.
And then each generation reinvents its state-of-the-art abstracted memory hog...
Edit: more jokes, from 1985:
Eight Megs Ain't Comfortable Stack
I once had a decent amount of fun splicing it with Lua, before deciding that I don't actually care about text editors...
There is still a default stack size limit of 8 megabytes on Linux,
even though consumer machines nowadays have RAM measured in double-digit gigabytes.
On an Ubuntu 16.x machine:
$ ulimit -a | grep stack
stack size (kbytes, -s) 8192
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        4.1G        2.8G        600M        8.7G         10G
Swap:           14G        626M         13G
This 8192 is just the soft limit; applications can raise it up to the root-imposed hard limit.
Unfortunately, that tends to be only 4X larger than the 8192:
$ ulimit -s 32769
bash: ulimit: stack size: cannot modify limit: Operation not permitted
$ ulimit -s 32768
$ # OK
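The same adjustment can be made from inside a process with setrlimit(2); a minimal sketch (the function name is mine):

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_STACK up to the hard limit -- the programmatic
   equivalent of `ulimit -s <hard>`.  On Linux the main stack may then
   grow up to the new soft limit; already-created thread stacks are
   unaffected.  Returns 0 on success. */
static int raise_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft may always be raised to hard */
    return setrlimit(RLIMIT_STACK, &rl);
}
```

This is why a program that knows its recursion is deep can help itself somewhat at startup, without any root involvement, but only up to that 4X hard cap.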
Threads are another issue; if you have a "can run in any thread" garbage collector, it has to live within a thread stack. You can't have huge stacks for each of a large number of threads. A thread's stack cannot be temporarily extended; there may be something in the address space preventing it, since it's just another mmap. (Don't think Emacs has this problem. :)
Current stack sizes are fine; storing large amounts of information on the stack usually isn't a good idea anyway, and makes tools like debuggers and perf(1) slow and awkward.
Besides, the ulimit means nothing. If you really want, you can allocate your own "stack", as large as you want, and switch to it. I just wouldn't recommend it.
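A sketch of that "allocate your own stack and switch to it" trick, using the old System V ucontext API (deprecated in POSIX but still shipped by glibc); `worker`, `deep`, and `run_on_big_stack` are invented names:

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, work_ctx;
static long result;

/* Deliberately deep, non-tail recursion; at -O0 this needs tens of MB
   of frames, far past the default 8 MB limit. */
static long deep(long n) { return n == 0 ? 0 : 1 + deep(n - 1); }

static void worker(void) { result = deep(500000); }

/* Run worker() on a freshly malloc'd stack of the given size, then
   switch back to the original stack via uc_link. */
static long run_on_big_stack(size_t stack_bytes)
{
    getcontext(&work_ctx);
    work_ctx.uc_stack.ss_sp = malloc(stack_bytes);
    work_ctx.uc_stack.ss_size = stack_bytes;
    work_ctx.uc_link = &main_ctx;        /* return here when worker ends */
    makecontext(&work_ctx, worker, 0);
    swapcontext(&main_ctx, &work_ctx);   /* jump onto the big stack */
    free(work_ctx.uc_stack.ss_sp);
    return result;
}
```

Which is exactly why I also wouldn't recommend it: the ulimit no longer protects you, the debugger's backtraces get confusing, and you own every portability wart of the context-switching API.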
> A thread's stack cannot be temporarily extended; there may be something in the address space preventing it, since it's just another mmap.
You might be interested in the split stack model, described here. You definitely can grow a thread's stack as needed. The approach has various issues (which is why Rust eventually went with conventional stacks), but it certainly works.
I agree putting large amounts of data on the stack isn't a great idea, but I think that's mostly because (a) default stack sizes are quite small, (b) they vary by distribution and even local configuration, and (c) if you happen to be on any thread other than the main thread, it's the pthread stack size that matters, and that is usually even smaller than the default main-thread stack.
I've never heard of it causing any problems for perf. Yes, perf unwinds the stack if you ask it to, but none of the mechanisms depend (AFAIK) on the stack size.
Doesn't that imply this caveat also applies to thread stacks if you end up resolving symbols off of the main thread?
Like, say we are marking a vector containing pointers to objects. We need a state machine which tells us, on each visit to that object: "we have marked index [i]; now mark index [i+1]".
A list of 256 nodes doesn't have this problem: it has the marking bits in each node!
Also, how do we design the API for extending the heap with new kinds of objects? With the recursive traversal, you just provide a handler which traverses your objects and recurses down the pointers.
Perhaps we need some complicated protocol now whereby each object has methods like "init_traversal", "get_next_pointer", and "finish_traversal". In init_traversal, it receives a backpointer that it must store, and it returns a pointer that it would like the caller to retain (the one that is bumped to make space for the stashed one). In "finish_traversal", it gets back the retained pointer, in exchange for returning the stashed backpointer. Something like that.
Actually a protocol like this could perhaps clarify the algorithm.
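For contrast, the simple recursive flavor of that extension API (the "just provide a handler" case) can be sketched like this; every type and function name here is invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap-extension API: each object kind registers a handler
   that knows how to reach its children; the collector recurses for it. */
typedef struct obj { int marked; int kind; } obj;
typedef void (*mark_fn)(obj *);
typedef void (*traverse_fn)(obj *self, mark_fn mark);

static traverse_fn handlers[16];   /* indexed by obj.kind */

static void mark_object(obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    if (handlers[o->kind])
        handlers[o->kind](o, mark_object);   /* recursion happens here */
}

/* An extension: a pair-like object with two child pointers. */
enum { KIND_LEAF = 0, KIND_PAIR = 1 };
typedef struct { obj hdr; obj *a, *b; } pair;

static void traverse_pair(obj *self, mark_fn mark)
{
    pair *p = (pair *) self;
    mark(p->a);
    mark(p->b);
}
```

Registering is just `handlers[KIND_PAIR] = traverse_pair;`. The init/next/finish protocol would replace this single callback with three, threading the stashed back pointer through the object so the collector needs no C stack at all.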
The object pointer for the most recent continuation is stored in a global variable, and others are stored in a stack constructed by using pointers in objects that can now be inferred from the stacking nature of the recursion.
Part of the information for the code pointer for the most recent continuation is stored in the object pointer, i.e. from the object header you narrow the possibilities to process it. For an object containing N fields, you need log_2 N bits to store the field you were last processing. Since each object can appear in at most one stack frame of the recursive traversal (modulo where you put the marked check to bail out in the graph case), you need to put these log_2 N bits somewhere in the object.
You have a few options for where to put the bits:
- Find a few bits lying around where you can store them. This works pretty well for objects with few fields.
- Add an extra field with log_2 N bits to store them. This is the most straightforward solution for a larger object with an arbitrary number of fields.
- If you have an easy way of locating object headers from internal pointers, you can use internal pointers as your back pointers.
- Steal a few bits from every field that has a pointer value stored in it. This works because you actually only need log_2 M bits where M is the number of fields containing a pointer. The non-pointer fields can all be skipped.
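A self-contained sketch of the "find a few bits lying around" option for a toy two-field node (so log_2 N is a single header bit). This is the classic Schorr-Waite pointer-reversal traversal, not Emacs code; the layout is invented:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy two-field node: with N = 2 pointer fields, log2(N) = 1 bit in the
   header records which field currently holds the reversed back pointer. */
typedef struct node {
    unsigned marked : 1;
    unsigned field  : 1;
    struct node *left, *right;
} node;

/* Pointer-reversal marking: no recursion, no auxiliary stack.  While the
   traversal is in flight, left/right are temporarily scrambled; each is
   restored exactly when that side of the node is finished. */
static void dsw_mark(node *root)
{
    node *prev = NULL, *cur = root;
    for (;;) {
        while (cur && !cur->marked) {   /* descend via left */
            cur->marked = 1;
            cur->field = 0;
            node *next = cur->left;
            cur->left = prev;           /* left holds the back pointer */
            prev = cur;
            cur = next;
        }
        for (;;) {                      /* retreat */
            if (prev == NULL)
                return;
            if (prev->field == 0) {     /* left done: rotate to right */
                node *back = prev->left;
                prev->left = cur;       /* restore finished left child */
                node *next = prev->right;
                prev->right = back;     /* right holds the back pointer */
                prev->field = 1;
                cur = next;
                break;                  /* go descend into the right side */
            } else {                    /* both done: pop one level */
                node *back = prev->right;
                prev->right = cur;      /* restore right child */
                cur = prev;
                prev = back;
            }
        }
    }
}
```

The `marked` check in the descent loop is also what bails out on shared structure and cycles, which is the "modulo where you put the marked check" caveat above.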
This scheme can be generalized to implement arbitrary structural recursion on polynomial datatypes.
For the naive implementation you could either use a stack and tail recurse on the last unmarked cell, or a queue with tail recursion on the first unmarked cell (that is, you push on the queue cells starting at the second unmarked cell, and tail recurse on the first to improve locality of reference).
(I wrote GNU Smalltalk's GC in another life :-)).
Here is a good reason: you're debugging some problem that occurs in the middle of garbage collection. You have the world stopped and would like to look at some objects. Oops, pointers are scrambled by this algorithm and won't be put back in place until it runs to completion. (Which it might not even be able to do due to whatever bug you're trying to investigate, even if you have a way to initiate that.)
With regular recursive traversal, you can just bail garbage collection at any time: loop through all the heaps, and reset the gc bits of all objects to the "unmarked" state. Then examine whatever you want.
(If the bits aren't in the type tag, you don't even have to do that; though it's a good idea for them to be in a type tag, because then an object which has been mishandled by memory management (e.g. a marked object remaining in the live object graph after gc) has the wrong type and blows up when used.)
Some black box that has no printf (or not one that you can insert into a custom build), and for which code isn't available.
(Unrelated, but an amusing coincidence.)
Modules are no big deal; FFIs with callbacks are. And since RMS opposed an FFI for Emacs for decades (DLL or OLE/COM support) - there's now a simple internal one for the GTK binding - a moving collector would be pretty simple.
This is an interesting connection. Could you elaborate on this?
The implementation code is here:
Search for "setup_alt_stack" and "teardown_alt_stack".
Basically, this allows a Lisp signal handler lambda to execute even though the stack has been blown.
What if this happens in the middle of gc? We can't run Lisp code with a heap that is in the middle of gc, due to objects being marked and such. That situation is detected, and a function called gc_cancel is called to simply cancel gc: everything that has been marked is unmarked. To bring the gc machine to the most obvious safe state, all the structures related to generational gc are reset, and a full gc is scheduled.
With that implemented, the above Rosetta task became reliable; the random failures it exhibited were gone. User programs (on platforms supporting sigaltstack) have a fighting chance to do something if gc runs out of stack, like save some data to disk or whatever.
E.g. each visited object appends/prepends its children to the queue of objects to visit, and you just repeatedly drain it until you have an empty list.
That’s what I had to do many years ago for the GC I was working on.
Then any special cases (like the cons case referenced elsewhere) just become performance optimizations rather than “don’t crash” correctness.
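That drain loop, sketched on toy two-child nodes (an invented layout, not any particular GC's objects): the worklist lives on the heap, so graph depth costs heap space instead of C stack.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
    int marked;
    struct tnode *kids[2];
} tnode;

/* Worklist marking: visiting a node pushes its unmarked children; the
   loop drains the list until it is empty. */
static void mark_worklist(tnode *root)
{
    size_t cap = 64, top = 0;
    tnode **work = malloc(cap * sizeof *work);
    if (root)
        work[top++] = root;
    while (top) {
        tnode *o = work[--top];
        if (o->marked)
            continue;       /* a node can be pushed twice via shared edges */
        o->marked = 1;
        for (int i = 0; i < 2; i++)
            if (o->kids[i] && !o->kids[i]->marked) {
                if (top == cap)
                    work = realloc(work, (cap *= 2) * sizeof *work);
                work[top++] = o->kids[i];
            }
    }
    free(work);
}
```

Popping from the end (LIFO) keeps locality closer to the recursive order; draining from the front instead gives the breadth-first flavor, and either way the only failure mode left is running out of heap, not smashing the stack.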