
Bringing GNU Emacs to native code [video] - zeveb
https://toobnix.org/videos/watch/1f997b3c-00dc-4f7d-b2ce-74538c194fa7
======
gumby
Note: this is a video. Here are the slides:
[http://akrl.sdf.org/gccemacs_els2020.pdf](http://akrl.sdf.org/gccemacs_els2020.pdf)

~~~
pmiller2
I'd recognize a LaTeX beamer presentation anywhere.

Here is the corresponding paper:
[https://arxiv.org/pdf/2004.02504.pdf](https://arxiv.org/pdf/2004.02504.pdf)

~~~
disgruntledphd2
From org-mode too, as it has the pointless Outline page that org always
inserts into beamer presentations.

~~~
cion
It's not pointless, but the outline only shows up if the generated tex file is
compiled twice. The default org to latex export will only compile it once.

Alternatively, it is easy to suppress the outline slide altogether (add
"toc:nil" to "#+OPTIONS:").

~~~
disgruntledphd2
Oh I know. I have toc:nil as my default because it annoys me.

It's just a bit of a giveaway as to how the file was made.

------
poidos
I tried this out on my system last night. Compile time was quite long (about an
hour and a half on my Ryzen 3600X). I use the Doom Emacs config, and was
surprised to find most things working out of the box with native compilation. I
noticed no difference in startup time. The speed boost was surprisingly
noticeable, however, when e.g. opening a buffer that causes a language server
to start. Starting CCLS was... instant. It's usually quite quick, but this was
noticeably faster.

Great job all around!

~~~
s-km
does it do anything to change/fix Emacs shitting itself on large buffers?

last time i gave it a try opening a large file would completely kill
performance, and iirc in particular really long lines (ex. 1000+ chars) would
make the thing chug even if the actual file wasn't super big or anything.

~~~
aasasd
> _in particular really long lines (ex. 1000+ chars) would make the thing
> chug_

My uninformed guess is, this is bottlenecked by an inefficient algorithm that
accesses the memory too much, i.e. it's nonlinear with respect to the number
of characters. I doubt that tweaking the overall speed would fix it.

~~~
morelisp
My experience is it's generally either

- inefficient font-lock regexes, which are rarely benchmarked/optimized, since
most files don't have long lines and because font-lock behavior is quite
complicated to begin with;

- inefficient thing-at-point implementations/use, e.g. an O(n*n)
thing-at-point called at various points by an O(n) function;

- modes using font-lock regexps where they really just need "fixed" styled
text, but other Emacs architecture makes it difficult, e.g. compilation-mode
and derivatives.
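
The second pattern can be sketched in a few lines of Emacs Lisp (a contrived illustration, not code from any real mode):

```elisp
;; Contrived illustration of accidental O(n^2) work on long lines.
;; On a line that is one huge "symbol" (say, minified JS), each call to
;; `thing-at-point' scans to both ends of the symbol: O(n) per call.
;; Calling it once per character while sweeping the line is O(n^2).
(require 'thingatpt)

(defun naive-scan-line ()
  (save-excursion
    (beginning-of-line)
    (while (not (eolp))
      (thing-at-point 'symbol)   ; O(n) rescan on every iteration
      (forward-char 1))))
```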

~~~
pulisse
Emacs dev here. The primary cause is something else.

Emacs's inefficiency in handling files with long lines is due to two factors:
(a) The primitive unit of work for the display engine (the code that
determines how to combine text, font metrics, syntax highlighting, inline
image display, etc.) is a line (a newline-delimited span of characters). (b)
The redisplay routine is called very frequently: not just when the screen is
repainted, but pretty much any time the screen location of a buffer element
needs to be calculated, e.g. during navigation. So Emacs is constantly, under
the hood, going back to the previous newline and recalculating how the buffer
contents from that point forward should be rendered.

~~~
morelisp
I’m not an Emacs dev so I’ll concede your expertise here - but if it is
fundamental to the core navigation and redisplay, why does “stepping down”
modes help so much? Fundamental mode and Text mode certainly aren't great with
long lines, but they are usually orders of magnitude better (e.g. a few
seconds per command instead of tens of seconds).

(I have not tried so-long-mode and am mostly on Emacs 26 with some 25.)

~~~
pulisse
What you're observing is caused by what I'm describing. The features of these
other modes are expensive because they do things that implicitly invoke
redisplay.

Consider an ostensibly simple operation like determining the column in which a
particular character appears. The answer to that isn't the number of
characters since the last occurrence of `\n`, because whatever modes are
active can inject arbitrary spacing, or display particular strings as
something shorter or longer (a trivial example is the mode that causes
`lambda` to be displayed as `λ`), or cause some characters to be displayed in
a non-roman font where a single glyph is more than one column wide, and so on.
To determine the character's column accurately you need to take into account
everything that could affect how you'd paint the screen at that position,
i.e., run the whole redisplay loop for some portion of a buffer. The richer
the set of active modes, the more expensive it is to do that and the more
often it is done.
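
A toy example of the point about columns: even a plain tab character already breaks the "characters since the last newline" heuristic, before any modes get involved.

```elisp
;; In a buffer containing "a<TAB>b", with point after the "b":
;; the raw character count since the line start is 3, but with the
;; default `tab-width' of 8, `current-column' at that spot reports 9,
;; because the tab expands to the next multiple of 8 on screen.
(with-temp-buffer
  (insert "a\tb")
  (list (- (point) (line-beginning-position))  ; character count: 3
        (current-column)))                     ; display column: 9
```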

------
tgbugs
I have a Gentoo ebuild [0] almost working for this (I think I'm missing the
step where the eln files are loaded before dumping the base image). The
compile times are ... substantial. However, they don't affect the development
process for new code, since the interpreter is always there, and I am excited
to see what performance gains materialize.

From an engineering perspective this is an excellent example of a direct path
from interpreted to compiled code. The trade-offs are clear (heck, they are
numbered 0-3), and while there is complexity, all the engineering time has been
effectively concentrated inside a single project, rather than forced upon tens
of thousands of maintainers and users. Bravo. I wonder what other bytecode
interpreters could benefit from this toolchain. Compile times below are from
qlop.

On an Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz:

Without native compilation: `2020-05-03T15:33:27 >>> app-editors/emacs: 6′40″`

With native compilation: `2020-05-03T18:35:43 >>> app-editors/emacs: 2:11:45`

0. [https://github.com/tgbugs/tgbugs-overlay/blob/master/app-editors/emacs/emacs-28.0.9999-r1.ebuild](https://github.com/tgbugs/tgbugs-overlay/blob/master/app-editors/emacs/emacs-28.0.9999-r1.ebuild)

~~~
aoeuhtns
nice, thanks for the ebuild!

------
gpderetta
As an exclusive Emacs user for the last 20 years, I'm quite excited; my main
complaint about Emacs is its slowdown with some of the more sophisticated
packages. I wonder if it improves magit performance with large codebases.

Pigs fly just fine with enough thrust.

~~~
metroholografix
One reason magit is slow (that last I checked still hasn't been fixed) is that
it spawns a large number of processes calling out to git for each operation.
Most of these are redundant and could/should go away with either a redesign or
a working caching scheme. This issue is more than obvious on platforms (e.g.
some macOS versions) where fork/vfork are not as fast as one would expect. I
like the paradigm behind magit, but the implementation leaves a lot to be
desired. So the pig in this case is definitely not Emacs.

Example:

      # Doing a simple magit refresh results in the following processes being spawned (tracked with dtrace)
      # Notice how many calls are completely redundant
    
      2020 May  4 12:46:22 11819 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11820 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11821 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11822 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11823 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11824 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11825 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11826 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11827 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11828 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11829 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11830 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11831 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11832 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11833 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11834 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11835 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11836 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11837 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11838 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11839 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11840 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>
      2020 May  4 12:46:22 11841 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11842 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11843 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11844 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11845 <67471> 64b  /opt/local/bin/git --no-pager -c core.preloadindex=true -c log.showSignature=false <...>
      2020 May  4 12:46:22 11846 <67471> 64b  /opt/local/bin/git --no-pager --literal-pathspecs -c core.preloadindex=true -c <...>

~~~
hvis
Meaning, it's only really slow on macOS.

~~~
armitron
Multiple versions of macOS. And Windows. And possibly other lesser-known
systems.

Besides, why would a discerning engineer put up with magit creating all these
wasteful processes, knowing that most of them are redundant, regardless of
perceived performance? If I were a magit developer, I would surely try to fix
this.

~~~
hvis
On Windows it's not as fast as on Linux, but it's not terrible either (as far
as I've heard).

Also see
[https://stackoverflow.com/a/16902730/615245](https://stackoverflow.com/a/16902730/615245)
(I think it's not needed with the latest versions of Git anymore, though).

> If I was a magit developer, I would surely try to fix this.

Since you're just a regular user, did you check out the issue tracker for
related discussions?

------
gongyiliao
Here are some rough, non-scientific benchmark statistics.

It took me 124.98 minutes to build the native-comp branch with a
4-core/8-thread i7-4790K and 32GB RAM in an LXC instance, while the master
branch took 244 seconds. Both branches were built from the latest git
snapshots available as of 2020-5-4 20:20 CDT.

The following function is passed to (benchmark-run-compiled 10 ...) for each
run.

      ;; -*- lexical-binding: t -*-
    
      (require 'cl-lib)
    
      (defun bf-1 nil
         (/
           (apply '+
            (cl-loop repeat 300000
             collect (cl-random 1.0)))
         300000.0))
    

gc-cons-threshold is set to 268435456 (~256MB).

Before each run, (garbage-collect) is called.
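
Putting the description together, each measurement presumably looked something like this (a reconstruction from the text, not the author's exact harness):

```elisp
;; Reconstructed harness (assumed, not the original script).
(setq gc-cons-threshold 268435456)  ; ~256MB, as stated above

(garbage-collect)                   ; collect before each run
;; Returns (total-seconds gc-count gc-seconds) for the 10 repetitions:
(benchmark-run-compiled 10 (bf-1))
```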

With Emacs's built-in core Lisp functions, the cl-lib functions, and the
benchmark function above all native-compiled, it takes 0.5823 seconds to
complete.

With Emacs built-ins and cl-lib native-compiled but the benchmark function
byte-compiled, it takes 0.6411 seconds to complete.

With byte-compiled Emacs built-in/cl-lib/benchmark functions, it takes 1.3574
seconds to complete.

With byte-compiled Emacs built-in/cl-lib functions and an interpreted
benchmark function, it takes 78.054 seconds, with one GC taking 75.094
seconds, which implies the execution itself takes roughly 2.96 seconds.

I also ran the same benchmark on a 4-core A10-6800K and observed similar
ratios on builds from 2020-5-3.

------
register
I posted the same content 5 days ago here:
[https://news.ycombinator.com/item?id=23021574](https://news.ycombinator.com/item?id=23021574)

No reaction at all. I am really curious to understand what I did wrong at the
time.

~~~
saagarjha
Nothing at all. You just lost the Hacker News lottery.

------
ken
"Arguably the most deployed Lisp today?"

That's an interesting question. My other guess would have been GIMP, and based
on a quick glance at Debian's popcon, I'd say GIMP might be slightly in the
lead.

Then again, just as Open Firmware deployed Forth on millions of computers
right under our noses, it would not surprise me at all if there were a simple
Lisp implementation hidden on every computer in the world.

Is there anything more recent than DSSSL?

------
saagarjha
Looks really cool! One thing I noticed was that the generated code seemed to
have fairly poor register allocation: it looked like it was just pulling
things straight out of locals into registers and immediately storing them
back. From the talk, it seems that's what was being provided to libgccjit, but
surely it could optimize that further?

------
harrygeez
Just when I thought I knew enough compsci to understand everything at a
sufficiently high level, this guy totally lost me starting from LIMPLE.

~~~
philsnow
Compilers are "just" a series of transformations/translations from
higher-level code to lower-level code. The top is code like C, Python, elisp,
whatever, and the bottom is machine code for amd64, ARMv7, whatever. All the
in-between code is in some intermediate representation (IR).

Each successive step takes care of different optimizations, modifying the code
as it goes down. At the last step, he converts LIMPLE to the IR that libgccjit
understands, and hands it off to GCC for native compilation.

Could you just start with elisp and emit amd64 machine code in one step?
Absolutely, but it would be hell to maintain, and then you lose out on all the
pluggability of modern compilers. If you (consume and/or) emit standard(-ish)
IRs, you get to participate in a pretty amazing ecosystem.

~~~
__s
Also, even if you do it in "one step", you'll want an intermediate data
format, such as SSA form, to enable the more efficient optimizations.

------
schiffl
Did anyone successfully compile it for x86 32-bit target? The docker image is
64-bit and the compilation seems to be hitting the gcc-i386 limit of 3GiB.

------
nemoniac
Very keen to try this out but that branch won't compile for me. What's the
trick?

------
fmakunbound
This seems convoluted compared to, say, moving the Lisp implementation from
Emacs Lisp to Common Lisp, of which several native-code-compiling
implementations exist.

~~~
widdershins
Rewriting thousands of packages (some of which have had man-decades of work
poured into them) would also be rather convoluted.

~~~
junke
You don't need to rewrite packages if you have a layer where Emacs Lisp is
compiled/interpreted as Common Lisp.

You can easily hack the readtable in Lisp, and rewrite Emacs Lisp sexps as
Common Lisp sexps.

Some interesting things to consider are file-local variables, buffer-local
variables, dynamic scope vs. lexical scope, and how to handle floating point
(-0.0e+NaN in Emacs Lisp; custom floating-point rounding modes/traps in SBCL,
for example), but this looks like a reasonable approach overall.

For example (just a draft, this might be incorrect):

        USER> (defun make-local-variable (symbol)
                (eval `(define-symbol-macro ,symbol (bvar (quote ,symbol)))))
        MAKE-LOCAL-VARIABLE
    
        USER> (make-local-variable 'foo)
        FOO
    
        USER> (macroexpand-all '(list foo))
        (LIST (BVAR 'FOO))
        T
        T
    

The BVAR forms would then be able to access a buffer variable, with the
current buffer. This works with SETF too (but Emacs Lisp SETQ should be
replaced by SETF during the transform).

BVAR could expand into code that calls FFI functions, for compatibility. Using
an FFI approach would make it possible to progressively rewrite parts of the
runtime into CL.

Just to clarify, I know this is a huge amount of work to undertake, but it
follows from the assumption that we want to keep all existing Elisp files
running identically.

~~~
kazinator
Emacs contains a quarter million lines of C. What do you do with all the elisp
that calls into it? Maybe that could be turned into a "libemacs" and used from
FFI.

~~~
wtetzner
Or maybe you rewrite it in Common Lisp. If you're going to take the Common
Lisp route, I think it might be worth going all in and just re-implementing
Emacs in Common Lisp.

~~~
mikelevins
That's already been done. More than once, in fact. The problem is that,
although there are perfectly serviceable Emacsen written in Common Lisp, none
of them are GNU Emacs.

For example, there's Hemlock from CMUCL, and its descendants built into
Lispworks and Clozure Common Lisp.

Those implementations aren't likely to work well as a substitute for GNU
Emacs. For one thing, there's a substantial ecosystem of software that depends
on specific APIs and other characteristics of GNU Emacs. Writing code to
bridge GNU Emacs APIs with those available in the Hemlock descendants would be
a lot of work.

There are some other obstacles as well. CCL's implementation of Hemlock uses
the Cocoa text architecture, so it's not portable to platforms other than
macOS.

The Lispworks implementation is portable across Windows and numerous UNIXEN,
but the Lispworks license will not allow delivery of a proper substitute for
GNU Emacs (it forbids building an application that can be construed as a Lisp
development system, and when I pressed them about exactly what limits that
policy implies, they explicitly used Emacs as an example of an application
that would be forbidden).

The original CMUCL Hemlock and its portable version are designed to work with
CLX. It could probably be made to work in a modern X environment, but making
it fit well into modern GUI environments and porting it to all the platforms
GNU Emacs works on would be a huge amount of work.

I'm probably overlooking some other Emacsen, but I don't know of any off the
top of my head for which the situation is any easier.

~~~
fiddlerwoaroof
Robert Strandh is working on a Common Lisp Emacs based on McCLIM, but McCLIM
basically forces you to use Linux.

~~~
fiddlerwoaroof
There's also lem, which is more portable:
[https://github.com/cxxxr/lem](https://github.com/cxxxr/lem)

