Gccemacs: Experiment with native compiled Emacs Lisp (sdf.org)
85 points by pjmlp 10 days ago | 41 comments

I've always wished Emacs had been written in Common Lisp. It would already support this, and it's all around just a better language (IMO).

There's Climacs, but it's so far behind...

There was some discussion[0] about this a few months ago... (where I was involved & sparked the discussion), and the conclusion was that SBCL & C interop is not as good for emacs.

SBCL doesn’t allow Lisp functions to be called from C without crossing the conversion line twice, which would be inefficient (esp. since those functions were written in C for speed).

I personally think that even with the overhead, adopting SBCL would be an overall improvement, but I’m not maintaining emacs so...

[0] https://lists.gnu.org/archive/html/help-gnu-emacs/2019-05/ms... and the thread following

I have a hard time following that line of argument. I'd think much of the C code was written a quarter century ago "for speed". Computers are orders of magnitude faster these days, so we can waste a few cycles here and there. I wouldn't think it sensible to require Emacs to be fluid on computers slower than e.g. an original Raspberry Pi, which is much faster than the early Unix workstations the C code was written for.

That said, while I personally might prefer an efficient implementation of a well documented and standardized variant of Lisp, Emacs Lisp is arguably easier to learn and more appealing to those who just want to customize the editor, who I'd presume are the vast majority of users.

Not only that, but the runtime semantics are different. 'foo isn’t a function in CL. But in emacs lisp, it might be.

It’s not just a pedantic distinction either. '(do-foo do-bar) is far easier to write than (list #'do-foo #'do-bar). And you can pass it around more easily. These are all things users want and need.

Emacs lisp probably wasn’t designed as a lisp. It was probably designed to be a configuration language, and lisp happened to be the most flexible way of designing it. That’s why stuff like lexical scoping or compiler semantics took a back seat (as it should in this case).

Look, I love lisp. You’d be hard pressed to find someone more fanatical about lisp. But CL is... well. CL is CL. You can make it into a config language, but even basics like (require 'foo) leave much to be desired.

Emacs Lisp is fundamentally simple; most people just don’t take the time to learn the design, and they dismiss it due to the long identifiers and global scope. But the global scope is the power. The global namespace matters. A lot.

> 'foo isn’t a function in CL. But in emacs lisp, it might be.

In Common Lisp it might be.

  CL-USER 72 > (funcall (first '(sin cos)) pi)

In Common Lisp all symbols denoting functions can be called via FUNCALL and APPLY.

Thus one can also pass symbols for functions to higher order functions like MAPCAR:

  CL-USER 73 > (mapcar 'sin '(12 34 56))
  (-0.53657294 0.5290827 -0.521551)

  CL-USER 75 > (mapcar (lambda (f arg) (funcall f arg))
                       '(sin cos tan)
                       '(1.3d0 1.4d0 1.5d0))
  (0.963558185417193D0 0.16996714290024104D0 14.101419947171719D0)

  CL-USER 76 > (mapcar 'funcall
                       '(sin cos tan)
                       '(1.3d0 1.4d0 1.5d0))
  (0.963558185417193D0 0.16996714290024104D0 14.101419947171719D0)

A restriction of Common Lisp is that local lexical functions can't be called via symbols. But anything defined by DEFUN, DEFMETHOD, DEFGENERIC uses symbols as names and those can be used by FUNCALL and APPLY.
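A minimal sketch of that restriction, using hypothetical names (GREET, CALL-BOTH) that are not from the thread: a quoted symbol always reaches the global definition, while #' sees the lexical one.

```lisp
;; GREET and CALL-BOTH are made-up names for illustration only.
(defun greet () :global)

(defun call-both ()
  (flet ((greet () :local))
    ;; 'GREET goes through the symbol's global function cell;
    ;; #'GREET refers to the lexically visible FLET definition.
    (list (funcall 'greet) (funcall #'greet))))

(call-both) ; => (:GLOBAL :LOCAL)
```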

> basics like (require 'foo) leave much to be desired.

REQUIRE hasn't been used much in Common Lisp in recent decades. Individual implementations may still support it as a calling interface to some more complex machinery to locate code. Much of the current code uses ASDF to describe and load code files.

Heh. You’re right of course. Thank you for correcting me.

I was going to retract most of my criticism due to being blatantly mistaken. But the namespace thing is actually the part that seems like it would cause the most friction for CL as a configuration language. Specifically: in emacs lisp, if you write 'foo-frobber, you know it’s probably from the package 'foo, and that it’s a frobber. Whether it’s a function or a variable can often be inferred correctly from context.

Whereas in CL, isn’t the norm to use the package system everywhere? (Perhaps I am mistaken about this as well.) So you can’t just pass 'foo-frobber to arbitrary packages and expect it to work. For example, (intern "foo-frobber") returns a symbol. (Or does it? Now I’m second guessing everything I thought I remembered about CL.) But can you just pass that plain old symbol between packages without having to explicitly indicate its origin?

Hmmm... now that I’ve talked through the problem, perhaps the namespace issue isn’t such a big deal. Maybe the friction was more ASDF’s peculiarities than the fundamental behaviors of CL.

> in emacs lisp, if you write 'foo-frobber, you know it’s probably from the package 'foo, and that it’s a frobber.

It's a convention, not an actual mechanism. But it's open to interpretations. What does ANIMATED-WINDOW-DRAW mean? Is it supposed to be WINDOW-DRAW in the ANIMATED 'package' or is it DRAW from the ANIMATED-WINDOW package? Probably there is another convention for that.

Note then that in Emacs Lisp the concept "package" means a software library, whereas in Common Lisp the concept "package" means a symbol namespace.

> Whereas in CL, isn’t the norm to use the package system everywhere?

It's widely used, but in slightly different ways. Some use it as a poor module system replacement. Often it is used as a way to structure code into larger symbol namespaces. Sometimes you'll find code bases which do everything in one package.

> So you can’t just pass 'foo-frobber to arbitrary packages and expect it to work.

'Passing to packages' is not a concept in Common Lisp. But you can use symbols from all packages everywhere. There is no enforced hiding mechanism or module system which hides things. The only thing packages actually do is act as a namespace for symbols. Thus it is possible to have many symbols with the same name, but they need to be in different packages or need to belong to no package.

> For example, (intern "foo-frobber") returns a symbol.

It does. At runtime it interns the symbol in the current package.

The current package is the value of the symbol *PACKAGE*.

> But can you just pass that plain old symbol between packages without having to explicitly indicate its origin?

You can pass the symbol around. Its package can be queried. If you want to refer to the symbol in written code while being in another package (another symbol namespace), you have four options:

1) use the symbol fully qualified FOO::BAR. Note the two colons.

2) use exported symbols as FOO:BAR. Note the one colon.

3) use the symbol after importing it into the current package as BAR.

4) programmatically look up the symbol: (find-symbol "BAR" "FOO")
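The four options can be sketched in one file, assuming it is processed form by form (e.g. via LOAD), so the packages exist before the qualified symbols are read. The packages FOO and CLIENT and all function names here are hypothetical.

```lisp
;; Hypothetical packages and functions, for illustration only.
(defpackage "FOO"
  (:use "COMMON-LISP")
  (:export "BAR"))

(defun foo:bar () :bar)          ; BAR is exported, so one colon is enough
(defun foo::hidden () :hidden)   ; HIDDEN is internal, so two colons are needed

;; 1) fully qualified, works for exported and internal symbols alike
(defun call-qualified () (foo::hidden))

;; 2) exported symbol with a single colon
(defun call-exported () (foo:bar))

;; 3) import the symbol into another package and use it unqualified
(defpackage "CLIENT"
  (:use "COMMON-LISP")
  (:import-from "FOO" "BAR"))
(in-package "CLIENT")
(defun call-imported () (bar))
(in-package "COMMON-LISP-USER")

;; 4) look the symbol up at runtime; only strings appear in the source
(defun call-looked-up ()
  (funcall (find-symbol "BAR" "FOO")))
```

All four routes end at the same symbol object; only the spelling in the source differs.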

There's only one problem[0] that could bite someone used to the way elisp code is written: when the reader sees e.g. FOO:BAR, or FOO::BAZQUUX, the package FOO must already exist (and AFAIK have BAR exported), otherwise you'll get a reader error. You can't work around it with regular macros, because the error is at read-time, way before macroexpansion-time or compile-time.

This shows up whenever you need to set up some functions using a package before the code defining that package is loaded. This might be a rare use case in general, but it's something that shows up frequently enough when configuring Emacs. The only workaround for CL that I know is programmatic lookup of the symbol via FIND-SYMBOL, or just keeping the problematic code as a string and running it through (EVAL (READ-FROM-STRING ...)) when it's safe for the reader to parse it. Since Elisp doesn't have a package system, the problem doesn't exist there; at worst you'll get a warning from the byte compiler/interpreter that it doesn't (yet) know what variable or function a symbol represents.


[0] - Or at least what I believe to be a problem; I'm half-hoping there's a trivial solution that I'm missing.
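Both workarounds mentioned above can be sketched roughly like this. LATE-PKG, FROB, and CALL-WHEN-AVAILABLE are hypothetical names, and the DEFPACKAGE/DEFUN pair in the middle merely stands in for "the library gets loaded later"; the sketch assumes the file is processed form by form.

```lisp
;; All names here are made up for illustration.
(defun call-when-available (symbol-name package-name &rest args)
  "Call the function named SYMBOL-NAME in PACKAGE-NAME if both exist, else return NIL."
  (let* ((package (find-package package-name))
         (symbol (and package (find-symbol symbol-name package))))
    (when (and symbol (fboundp symbol))
      (apply symbol args))))

;; Readable before LATE-PKG exists, because the source contains only strings.
(defvar *early-result* (call-when-available "FROB" "LATE-PKG" 1 2)) ; NIL

;; Second workaround: keep the code as a string and read it later.
(defvar *deferred-form* "(late-pkg:frob 1 2)")

;; Stand-in for the library being loaded at some later point.
(defpackage "LATE-PKG"
  (:use "COMMON-LISP")
  (:export "FROB"))
(defun late-pkg:frob (a b) (+ a b))

(defvar *late-result* (call-when-available "FROB" "LATE-PKG" 1 2))  ; 3
(defvar *string-result* (eval (read-from-string *deferred-form*)))  ; 3
```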

Right, these are two limitations in CL at read time (when symbols are read by Lisp): the package needs to exist and, for the exported-symbol notation, the symbol actually needs to be exported. The latter makes sense in a way, because the mechanism should warn when a symbol is used that is not exported.

One way to deal with that might be to catch the reader error and create a package on the fly, which would then have these symbols and/or export some of them. Later, when a DEFPACKAGE is seen, the package could be adjusted to the new package definition. Note that modifying an existing package via DEFPACKAGE has undefined consequences in the standard. Implementations may deal with that, though.

But there are problems with the approach: imagine someone uses FOO:BAR in his source code. But that was a symbol from an old version of that software and now FOO:BAR is no longer exported or even doesn't exist. What now? Should this modify the package? Should it warn? Be an error?

Currently when using Common Lisp and reading a symbol with a non-existing package, the Lisp system shows an error. I can't remember what the Lisp Machine did, but it easily could have presented a restart option to create the package and go on.

LispWorks does that: it shows an error and provides a restart.

  CL-USER 89 > (read-from-string "LOL::ROTFL")

  Error: Reader cannot find package LOL.
    1 (continue) Create the LOL package.
    2 Use another package instead of LOL.
    3 Try finding package LOL again.
    4 (abort) Return to top loop level 0.

  Type :b for backtrace or :c <option number> to proceed.
  Type :bug-form "<subject>" for a bug report template or :? for other options.

  CL-USER 90 : 1 > :C 1

In SBCL one can create the package via DEFPACKAGE and use the RETRY restart:

  * (read-from-string "LOL::ROTFL")

  debugger invoked on a SB-INT:SIMPLE-READER-PACKAGE-ERROR in thread
  #<THREAD "main thread" RUNNING {10005184C3}>:
    Package LOL does not exist.

      Line: 1, Column: 9, File-Position: 9

      Stream: #<SB-IMPL::STRING-INPUT-STREAM {1002F8ADF3}>

  Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

  restarts (invokable by number or by possibly-abbreviated name):
    0: [CONTINUE ] Use the current package, COMMON-LISP-USER.
    1: [RETRY    ] Retry finding the package.
    2: [USE-VALUE] Specify a different package
    3: [UNINTERN ] Read the symbol as uninterned.
    4: [SYMBOL   ] Specify a symbol to return
    5: [ABORT    ] Exit debugger, returning to top level.

  0] (defpackage "LOL")   

  0] 1

I am not arguing that converting Emacs to Common Lisp is a good idea, but it would seem that much less low level C code would be required.

Also, the startup time for compiled and packaged SBCL applications is very fast because loading a prepared Lisp image file is very fast.

I'd much rather support the efforts to port to guile. Guile is the base of quite a few GNU projects and some effort has already been made on that front. For example one guile based tool I really like is mcron, a cron replacement/augment.


Guile Emacs, last I checked, was basically working fine. I don't even remember what the open issues were, because I never encountered any.

Couldn't emacs lisp be simply ported to work on the JVM? Then emacs would get a battle tested virtual machine, and the emacs devs could focus on the editor, instead of dealing with VM speed.

Keep your fingers crossed for Remacs. Cranelift can probably give it an order-of-magnitude speed increase if things go well.

They are about halfway through, give them a hand!


If every C/C++ programmer/member of the Rust Evangelism Strikeforce here on HN helped port a function, we would have a Rust Emacs in under a month.

C'mon everyone, the remaining code is pretty simple and straightforward:


Just mechanical translation to Rust. ONE FINAL SPRINT, and we will be done. The hard work of setting up the tooling and macros is already finished. Many here are happy to moan and complain about Electron infecting everything. Channel your anger into this project! Just help port one function, this isn't even leet coding level of difficulty. If you believe yourself to be a great software engineer, then lend a hand. Don't want Electron editors like VS Code and Atom to be the only game in town 5 years from now? Help translate a couple lines now. It's literally a single docker pull to set up the dev env.

Low hanging fruits:


Tl;Dr: like Leftpad but more ugly.

None of these proposals are going to address the main problem: most Emacs modes depend on a lot of regular expressions for syntax highlighting and many other things. Regular expressions are slow; it is a computational complexity problem. That is, for example, the main problem behind long lines being slow (and why they are not so slow in fundamental-mode). Now that Emacs has multi-threading, pervasive regular expressions are the last big thing that is responsible for noticeable performance issues.

> most Emacs modes depend on a lot of regular expressions for syntax highlighting and many other things. Regular expressions are slow

I use Emacs every day. Never had a problem with slow syntax highlighting when working on regular source code files.

I don't feel regexes are bottlenecks in everyday use.

What sort of Regex are they using? The garden variety can be compiled to DFAs (O(n)) which a modern computer should have no issues handling. Further optimizations

Here is a good example:


In the worst case, DFAs for regular expressions use memory exponential in the size of the regular expression (and take exponential time to compile). Try that for the regular expressions in the link above. There is no way to make regular expressions fast.
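To make the blow-up concrete, here is a small sketch using the textbook worst case "(a|b)*a(a|b)^(n-1)" (an a exactly n symbols from the end), not any regex from Emacs. The NFA has n+1 states, and the subset construction reaches 2^n DFA states. NFA-STEP and COUNT-DFA-STATES are hypothetical helper names.

```lisp
;; NFA states are 0..N. State 0 loops on both symbols and moves to state 1 on :A;
;; state I (0 < I < N) moves to I+1 on either symbol; state N is accepting.
;; A set of NFA states is represented as an integer bitmask.
(defun nfa-step (set symbol n)
  "Return the bitmask of NFA states reachable from SET on SYMBOL (:A or :B)."
  (let ((next 0))
    (dotimes (i (1+ n) next)
      (when (logbitp i set)
        (cond ((= i 0)
               (setf next (logior next 1))            ; stay in state 0
               (when (eq symbol :a)
                 (setf next (logior next 2))))        ; guess this is the marked a
              ((< i n)
               (setf next (logior next (ash 1 (1+ i))))))))))

(defun count-dfa-states (n)
  "Count the DFA states the subset construction reaches for the NFA above."
  (let ((seen (make-hash-table))
        (queue (list 1)))                             ; start set {0}
    (setf (gethash 1 seen) t)
    (loop while queue
          do (let ((set (pop queue)))
               (dolist (symbol '(:a :b))
                 (let ((next (nfa-step set symbol n)))
                   (unless (gethash next seen)
                     (setf (gethash next seen) t)
                     (push next queue))))))
    (hash-table-count seen)))

(count-dfa-states 10) ; => 1024
```

Each extra position in the pattern doubles the number of DFA states, which is the exponential growth the comment refers to; many practical patterns stay far below this bound.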

I believe Emacs has its own regular expression syntax that doesn't correspond exactly to either POSIX regexes, PCRE, or any other common regex variant. (Presumably because none of them existed yet when Emacs implemented regexes?)

How do more performant editors achieve syntax highlighting?

https://www.lazarus-ide.org/ does language modes with a parser-generator; the coloring is done on tokens. There is no good reason why Lisp code needs to be parsed with regular expressions.
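As a rough illustration of coloring by tokens instead of regular expressions, here is a hand-written single-pass tokenizer for a tiny Lisp-like syntax. It is a simplified sketch, not how Lazarus or Emacs work: it ignores backslash escapes inside strings, and TOKENIZE and DELIMITER-P are hypothetical names. A highlighter would map each token kind to a face.

```lisp
(defun delimiter-p (ch)
  "True if CH ends an atom."
  (member ch '(#\Space #\Tab #\Newline #\( #\) #\" #\;)))

(defun tokenize (text)
  "Return a list of (KIND START END) tokens, scanning TEXT once from left to right."
  (let ((tokens '())
        (i 0)
        (n (length text)))
    (flet ((emit (kind end)
             ;; Record a token from the current position up to END, then skip past it.
             (push (list kind i end) tokens)
             (setf i end)))
      (loop while (< i n)
            do (let ((c (char text i)))
                 (case c
                   ((#\Space #\Tab #\Newline) (incf i))
                   ((#\( #\)) (emit :paren (1+ i)))
                   (#\; (emit :comment (or (position #\Newline text :start i) n)))
                   (#\" (let ((close (position #\" text :start (1+ i))))
                          (emit :string (if close (1+ close) n))))
                   (t (emit :atom (or (position-if #'delimiter-p text :start i) n)))))))
    (nreverse tokens)))

(tokenize "(foo \"bar\") ; hi")
;; => ((:PAREN 0 1) (:ATOM 1 4) (:STRING 5 10) (:PAREN 10 11) (:COMMENT 12 16))
```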

You imagine the rusters think re-writing Emacs in rust is to solve some problem other than the act in and of itself.

What are the advantages of replacing the C implementation of Emacs with a Rust-based one? Especially in terms of performance, I'm not seeing it: C is generally faster than Rust, so all Rust could provide is better safety, no?

What needs to be made faster is ELisp, the parts of Emacs not implemented in C basically. Is remacs addressing that as well?

If the extension language of Emacs were implemented in Common Lisp, as grandpa suggests, then it could run fine on the JVM using the ABCL implementation, as well.

Or implemented in clojure, which is a more modern lisp...

I'm not sure clojure should be the language of choice to compile anything. At that point you might as well implement elisp as a hosted language as well. What papaw was saying up above was that creating an elisp compiler for common lisp would make it portable to native code via sbcl and hosted code via abcl.

I suppose one could argue writing it in clojure would make it portable to both the jvm and any javascript engine, but what clojure project of significant complexity doesn't break out into the host language from time to time?

What compilers have been written so far in idiomatic clojure?

EDIT: since I'm thinking about it could anyone imagine debugging a compiler in clojure? Talk about stacks on stacks on stacks...

ClojureScript is a Clojure to JavaScript compiler implemented in Clojure. I think that ticks the box.

True, and that's a good example. I guess what I'm really wondering is how many languages are written in clojure. Clojurescript for instance is just clojure compiling itself.

I remember seeing a few toy ones. No real serious languages though.

I don't think the parent meant to implement an elisp interpreter in Clojure. I thought they meant to build an Emacs-like editor in Clojure. Similar to this now abandoned project: https://github.com/hraberg/deuce

There'd be no point in rewriting an Elisp compiler/interpreter in Clojure over its current C implementation, except maybe if you think it would be nicer to work in Clojure instead of C when coding on it.

I can see the parent misconstruing compiling elisp with building a new emacs-like editor, but the thread (and article) was about compiling elisp.

That being said, a clojure editor could be pretty fun, the only downside being writing native widgets for a gui. I hardly know anyone who uses emacs in a shell if they can help it, and I know far more people who would prefer to use a shell inside of emacs instead.

The JVM would probably be a step backwards in terms of startup time.

Probably, but jvm bytecode can be compiled to native with graal/substrate.

And a couple of commercial AOT compilers since around 2000.

Additionally all modern JVM implementations have a JIT cache as well.

From that thread (https://lists.gnu.org/archive/html/emacs-devel/2019-11/msg01...):

I was already into gcc and libgccjit. I thought was cool to apply these to some lisp implementation. I decided to have Emacs as a target cause I imagined would have been useful and because I'm obviously an Emacs user and fan.

I wanted to do something with the potential to be completed and up streamed one day. Therefore I discarded the idea of writing the full lisp front-end from scratch. On the other side I considered the idea seen in previous projects of reusing the byte-compiler infrastructure quite clever.

The original plan was the to do something like Tromey's jitter but gcc based and with a mechanism to reload the compiled code. So I did it.

I had a single pass compiler all written in C that was decoding the byte code and driving libgccjit.

I was quite unhappy with that solution for two reasons:

1- The C file was getting huge without doing anything really smart.

2- After some test and observation became clear that to generate efficient code this approach was quite limited and a more sophisticated approach with a propagation engine and the classical compiler theory data structures was needed. The idea of just driving gcc and having everything magically optimized was simply naive.

So I came up with the idea of defining LIMPLE and using it as interface between the C and the lisp side of the compiler.

In this way I had the right IR for implementing the 'clever' algorithmic into lisp and the C side has just to 'replay' the result on libgccjit. Moreover it saved me from the pain of exposing libgccjit to lisp.

I then realized I could, instead of decoding op-codes, just spill the LAP from the byte-compiler. This makes the system simpler and more robust cause I get also information on the stack depth I can double check or use during limplification.

Lastly I managed to reuse the information defined in the byte-compiler on the stack offset of every op to generate automatically or semi the code of my compiler for the translation from LAP to LIMPLE for good part of the op codes.

The rest just iterating over tests debugging and implementing.

Dynamic languages like lisp are much better suited for jitting. The author acknowledges that most functions do not contain type information of what they return, limiting the amount of optimizations.

Jitting is better suited for handling this type of situation since you can JIT the version of the function for common types with a suitable guard that the input types are what you expected. Thanks to modern branch prediction those guard conditions are essentially free in the common case, and in the slow case you were slow anyways since you haven't jitted that version yet.

This is doubly true of emacs lisp, which is both dynamic in the type sense and dynamic in scope.[0]

Thus one has to guard both against type change, and change in what function a symbol resolves to.

[0] elisp does have lexical scope, but it's bolted-on, reduces performance (at least, used to) and isn't used in the core codebase, which predates it.

Not true, lexical vars are much faster in emacs.

More about this optimization technique: https://en.wikipedia.org/wiki/Inline_caching
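A toy sketch of the guard idea behind inline caching, written in Common Lisp for illustration only: a call site remembers which function object a symbol resolved to and re-resolves only when that guard fails. MAKE-CALL-SITE, AREA, and DEMO are hypothetical names, and the "recompile" step is simulated by a miss counter; a real JIT would emit specialized machine code there.

```lisp
(defun make-call-site (name)
  "Return two closures: a caller for the global function NAME, and a reader for the miss count."
  (let ((guard nil)      ; function object seen when the cache was filled
        (target nil)     ; what the site calls (stand-in for specialized code)
        (misses 0))
    (values
     (lambda (&rest args)
       (let ((current (symbol-function name)))
         (unless (eq current guard)          ; guard failed: NAME was (re)defined
           (incf misses)
           (setf guard current
                 target current))            ; a real JIT would recompile here
         (apply target args)))
     (lambda () misses))))

(defun area (x) (* x x))

(defun demo ()
  (multiple-value-bind (call misses) (make-call-site 'area)
    (let ((r1 (funcall call 3))              ; miss: fills the cache
          (r2 (funcall call 4)))             ; hit: guard passes
      ;; Redefining AREA changes its function object, so the guard fails once.
      (setf (symbol-function 'area) (lambda (x) (* 2 x x)))
      (list r1 r2 (funcall call 3) (funcall misses)))))

(demo) ; => (9 16 18 2)
```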
