
Unexecute - ingve
http://emacshorrors.com/posts/unexecute.html
======
neonscribe
Saving initialized data structures into an executable was the traditional way
to build large Lisp systems, and was a built-in capability of PDP-10 operating
systems in the 1970's. When I was a student at Utah porting PSL (Portable
Standard Lisp) to Vax Unix around 1981 we noticed that there was no such
capability available. For a while our workaround was to dump a core file by
sending SIGQUIT (^\) to the process, then start (resume) our system in a
debugger. Spencer Thomas, who was also a student at Utah at the time, wrote
the function he named "unexec()" to give us a more sensible path to the same
functionality. exec() takes a file and turns it into a process, unexec() takes
a process and turns it into a file. This code served our needs very nicely at
the time, allowing us to load compiled Lisp code into a bare interpreter and
save a complete system. Later, this code was incorporated into GNU Emacs for
essentially the same purpose.

At the time, building these systems took several minutes, so it really wasn't
feasible to expect users to just load everything they needed on startup. It is
highly non-portable, of course, and has caused headaches for Lisp builders
ever since. Amortizing startup time over a larger amount of work is still the
only portable solution I know, along with keeping initialized application
state in databases rather than in-memory data structures.

~~~
derefr
And really, this is just (a hackish implementation of) an image-based runtime,
à la Smalltalk. All ELISP is missing is a big list of all the globals it needs
to care about saving and restoring (so it can _not_ save all the random other
memory-garbage it happens to still be holding onto), a
serialize()/deserialize() pair of functions to run those through that result
in a standard on-disk representation, and a boot strategy involving
deserializing those structs into memory.
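
A minimal sketch of that idea in Python, not Emacs code. The `IMAGE_GLOBALS` list, the `serialize()`/`deserialize()` helpers and the sample state are all hypothetical; JSON stands in for "a standard on-disk representation".

```python
import json

# Hypothetical list of the globals worth saving; anything else is left out.
IMAGE_GLOBALS = ["keymap", "load_path"]


def serialize(state):
    """Write only the listed globals to a standard text representation."""
    return json.dumps({name: state[name] for name in IMAGE_GLOBALS}, sort_keys=True)


def deserialize(image_text):
    """Boot step: rebuild the saved globals from the stored representation."""
    return json.loads(image_text)


state = {
    "keymap": {"C-x C-s": "save-buffer"},
    "load_path": ["/usr/share/lisp"],
    "scratch_garbage": list(range(1000)),  # not listed, so never saved
}
restored = deserialize(serialize(state))
```

Here `restored` holds only the two listed globals; the scratch data never reaches the image.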

If you want to be fancy, you can make the on-disk VM-image format a database
(SQLite, LevelDB, whatever) so as to avoid writing it all out every time. Then
it becomes cheap enough to write out a differential state that you can make
the runtime do it automatically at intervals, after certain operations,
manually with a sync(1)-equivalent call, etc.
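
A rough sketch of the database-backed variant, assuming SQLite via Python's standard `sqlite3` module. The `ImageStore` class and its `sync()`/`load()` methods are made-up names for illustration, and deleted globals are not handled.

```python
import json
import sqlite3


class ImageStore:
    """Hypothetical image store: one row per global, rewritten only when changed."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS image (name TEXT PRIMARY KEY, value TEXT)"
        )
        # Remember what is already stored so sync() can skip unchanged entries.
        self.saved = dict(self.db.execute("SELECT name, value FROM image"))

    def sync(self, state):
        """Write only the entries that differ from the last sync; return their names."""
        changed = []
        for name, value in state.items():
            encoded = json.dumps(value, sort_keys=True)
            if self.saved.get(name) != encoded:
                self.db.execute(
                    "INSERT OR REPLACE INTO image (name, value) VALUES (?, ?)",
                    (name, encoded),
                )
                self.saved[name] = encoded
                changed.append(name)
        self.db.commit()
        return changed

    def load(self):
        """Boot step: rebuild the state from the rows in the database."""
        rows = self.db.execute("SELECT name, value FROM image")
        return {name: json.loads(value) for name, value in rows}
```

Calling `sync()` twice with the same state writes nothing the second time, which is what makes running it at intervals cheap.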

~~~
arnsholt
I had a similar realisation recently. I had to learn Smalltalk (for
my new job, believe it or not!), and Smalltalk really does strike me as
image-based programming done right. My previous exposure was Common Lisp, but the
image and the source code getting desynchronized was a recurring pain. After
deleting a function, but missing some uses for example, the code might work
fine until you reloaded the source into a clean image. In Smalltalk, that
doesn't happen, because the image _is_ the code.

~~~
qwertyuiop924
That's because Lisp isn't image based at all. It merely has a live
environment, much to the frustration of anybody who wants to dump the running
state of their lisp system to disk.

~~~
arnsholt
Yeah, that's true. Nonetheless, Smalltalk really feels like an improved
version of working with Lisp.

~~~
qwertyuiop924
That's true. Although I still say you'll have my sexprs when you pry them from
my cold, dead hands :).

Although, I wonder if I could hack together a lisp on top of the cog vm. That
would be cool.

Anyways, what kind of awesome job do you have where you get to write st for a
living? :)

~~~
DonHopkins
The Lisp Machine keyboard had dedicated open and close parenthesis keys, so
you could hold a hefty bag of nitrous oxide in your other hand while you typed
s-expressions.

~~~
qwertyuiop924
Wait, what are you doing with that nitrous oxide? Should I start worrying?

~~~
DonHopkins
Reflecting on the S-expressions I just typed!

~~~
qwertyuiop924
Well, when you're done with the nitrous oxide, just toss whatever you wrote
into the obfuscated code contest.

~~~
DonHopkins
There's a reason the paren keys had really fast auto-repeat!

------
kazinator
I ported unexecute to GNU Make about ten years ago.

I was working in a project in which a "make" would load a huge tree full of
rules scattered in sub-makefiles, and take a full 30 seconds to evaluate
before kicking off the first incremental build, really putting a damper on the
edit-compile-test incremental cycle.

I got sick of this and so took the unexecute code from GNU Emacs into GNU
Make, and added an option to do "make --dump" to dump an image after loading
the rules. The restarted make image would kick off a build recipe almost
instantly.

~~~
VoiceOfWisdom
FAKE [1] the F# build tool uses a similar technique on the .net vm.

[1] [http://fsharp.github.io/FAKE/](http://fsharp.github.io/FAKE/)

~~~
TwoFx
Do you happen to have a link to the code in FAKE that implements this for
.NET?

------
kragen
It's pretty sad to see someone trashing a process checkpoint and restart
facility because they've never seen one before. As other commenters have
added, checkpoint-and-restart is a pretty useful facility for a variety of
reasons; it's a shame that Unix doesn't support it better.

I imagine if the author of this article looked at a Unix kernel they would
start arguing that we should remove context switches because they don't
understand them.

~~~
caf
I have wondered before if emacs could use CRIU on Linux to do this (
[https://criu.org/Main_Page](https://criu.org/Main_Page) ).

~~~
rurban
I've looked into criu. It's usable, but not as good as unexec and not yet
properly supported by packagers. Even if only the build step needs it. And
it doesn't produce a single binary, which is ok I guess for emacs.

Easier IMHO is to keep maintaining a proper malloc implementation. glibc were
not able to update to ptmalloc3 anyway, and now they want to destroy ptmalloc2
even more. I wouldn't trust them.

------
asah
Great idea, but please show some respect. Emacs was first designed in 1976,
when startup times were a very big deal.

"My .emacs is older... than most engineers..."

~~~
616c
I fucking love the .emacs community.

To keep pushing a project from the 70's and keep it current is quite a feat.

Rebuilding from scratch is its own challenge, but to keep maintaining something
at this scale and have such quotes be realized by hundreds of people is
awe-inspiring.

~~~
pjmlp
Even if I rather use IDEs, Emacs was eventually my rescue in the 90's after
failing to find anything on UNIX that could somehow resemble the Borland IDEs
I enjoyed using.

Those were the days when fvwm was considered new.

So even today, if my IDEs aren't around, I get to use my old friend Emacs.

~~~
qwertyuiop924
My short-lived experience with Java and Android made me wary of IDEs. It
taught me an important lesson: If a language's IDE can autogenerate 90% of the
code in "hello world" for you, without knowing anything about the program you
are about to write, then that language has far too much boilerplate.

I've never loved IDEs, because they increase the amount of time between having
an idea and starting to code it, whereas with emacs, I can just start writing
code.

The only IDE that is any good is the Smalltalk IDE, because it's not so much
an IDE in the java sense as it is a realtime window into the soul of your
application and environment. I mean, if you thought the modern LISP's realtime
interaction was good...

But yeah, even then, I start to miss emacs. And I'm not even a serious emacs
user. Which is to say, my config can probably fit on only a few pages, and I
don't know the 100+ set of basic keyboard shortcuts yet.

~~~
pjmlp
I used Emacs for around 10 years, so I do know what it is capable of.

Regarding Lisp, many in the FOSS camp who never experienced commercial Common
Lisp IDEs should give them a try; the REPL is only the tip of the iceberg. Maybe
Racket is the closest one can get without paying.

Back in the mid-90's I couldn't even get Emacs to do what Borland got me with
their tools or even what I later learned Xerox environments were capable of.

Energize C++ with their custom Emacs was probably the closest thing that one
could get, if the company could afford it.

~~~
qwertyuiop924
>Regarding Lisp many in the FOSS camp that never experienced commercial Common
Lisp IDEs should give a try, the REPL is only the tip of the iceberg.

You've said. However, I am unsure as to whether any Lisp implementation
outside of the lisp machine truly allowed for programming inside a live
environment, such that there is no distinction between the live environment
and the code on disk, where the image IS your environment.

Although being able to serialize your live environment to disk goes a long
way. As a schemer, I'm still drooling over that particular feature.

~~~
lispm
'the lisp machine' did not do that. There were different attempts. The MIT
Lisp Machine used Lisp code in files, with some tool support.

The Xerox Lisp Machine (different hardware, different Lisp, different OS) used
a different approach with fully managed source code in the image, with managed
files as kind of a way to persist sources.

~~~
qwertyuiop924
Ah, apologies. And yes, I know there were multiple lispms.

------
drfuchs
This isn't specific to Emacs; TeX is another notable example. Generally, this
was a popular technique to dramatically improve start-up time on DEC PDP-10
and -20 machines. There was support from the OS as well as the language
runtimes to make it work (for instance, saving an image of a running program
that had fd's open would still have to know they weren't open by the time the
saved image was restarted).

------
rurban
I worked with this recently. This feature is not a nightmare, just because the
author doesn't understand what a linker does, and the necessary separation of
old and new dynamic memory.

It's rather a stable and very useful feature, which just recently got under
attack, because glibc doesn't want to maintain malloc_get_state() /
malloc_set_state() anymore. XEmacs has a portable dumper pdump, which is a
hack compared to emacs unexec.

See [https://lwn.net/Articles/673724/](https://lwn.net/Articles/673724/) and
esp. [https://lwn.net/Articles/673815/](https://lwn.net/Articles/673815/)

I recently re-added unexec support (i.e. native compilation) to perl5 in my
cperl fork, but I haven't got it stable yet. Super trivial on solaris, but not
so easy on elf, darwin and windows with its various compilers and the
different way to treat their segments. But it's still the easiest way to do
it, compared to pdump or a separate compiler or criu, which is still not in
the kernel and not in debian. They are saying it's unstable for 2 years, where
it's stable for 1 year already.

Self-dump via criu is, besides unexec, the most stable variant, but it needs
either a service or root perms, first of all a package, and then it's not so
attractive because it produces many files instead of just one binary.

If glibc removes malloc_get_state() even if darwin still has a similar API,
I'll happily build with a static ptmalloc3, which is the better variant of the
glibc ptmalloc2 anyway, and they never were able to update this. (much
faster, but needs a bit more memory for housekeeping).

[https://github.com/perl11/cperl/issues/176](https://github.com/perl11/cperl/issues/176)

[https://github.com/perl11/cperl/commits/feature/gh176-unexec](https://github.com/perl11/cperl/commits/feature/gh176-unexec)

[https://criu.org/Main_Page](https://criu.org/Main_Page)

------
mindslight
It's not really valid to critique a compiler for being too system dependent.

Being used to this layer of abstraction being hidden in ld(1) doesn't mean
that reimplementation of it is _wrong_ , just _perhaps_ an ill-advised
maintenance burden.

An appeal to ASLR is a bit fallacious - that technology developed _for_ C's
deficiencies, including ISAs tailored for it. There are likely better ways
than using an untyped language that _begs_ attackers to forge object handles,
and then kludging around that by making attackers guess.

------
616c
Apparently someone looked into the FreeBSD temacs commentary on the status
report on the HN landing page today ...

[https://www.freebsd.org/news/status/report-2016-04-2016-06.h...](https://www.freebsd.org/news/status/report-2016-04-2016-06.html#ASLR-Interim-State)

[https://news.ycombinator.com/item?id=12178766](https://news.ycombinator.com/item?id=12178766)

I love Lisps, but to an amateur with rudimentary infosec coursework this does
scream scary.

I LOVE THIS SITE. Hello early weekend entertainment reading, emacshorrors.com
...

~~~
aseipp
All Lisp systems basically work in a manner similar to this, by dumping their
'boot image' to disk so it can be loaded and worked with, as an interactive
image. Even systems like Smalltalk do similar things. A setup like this is not
particularly unusual in concept, although some of the exact specifics may be
different than e.g. SBCL.

ASLR is also a weak defense by itself as I noted in that FreeBSD thread. Any
ASLR-enabled application is, essentially, about 1 infoleak away from being no
different than a non-ASLR application. If you want to stop exploits, invest in
real mitigation tech, not cheap defenses, and suddenly you won't need to worry
about this so much. (Feel free to add randomization on top of working defenses
as an extra layer -- just not by itself.)

People always hem and haw over how this makes ASLR not work for XYZ, and thus
is a 'security nightmare'. But then, very strangely, we still find that it's
very possible to write all kinds of exploits that bypass usable ASLR anyway in
a variety of applications, with only a single infoleak, coupled with a
vulnerability, at only marginally higher work effort. ASLR does not actually
eliminate a class of vulnerabilities, it only adds an extra step in the
process of exploitation. It can only truly prevent a narrow class of exploits,
under very specific constraints.

So, it does seem like there's a real security nightmare happening, but it's
almost certainly not because random XYZ thing lacked ASLR at compile time.
It's because we invest our 'faith' in stop-gap, mostly futile defense
mechanisms that are obsoleted without much extra effort, normally.

~~~
nneonneo
One big thing that ASLR mitigates is _non-interactive_ exploits. In a lot of
applications, you can only send a payload once, and can't modify the payload
after the fact (for example, vulnerabilities in image file processors). This
is a common point of entry, and ASLR makes exploiting the underlying bugs much
harder.

So I wouldn't call ASLR a weak defense - it closes off a lot of exploitation
avenues by itself, and it can make exploiting interactive situations quite a
bit harder. Finding that second infoleak bug isn't always quite so trivial.

~~~
poizan42
I can tell from experience that the Linux implementation of ASLR however is
completely worthless. Why? Because the executable you launch itself isn't
randomized. The executable must be completely trivial for there to not be
enough usable gadgets to defeat ASLR.

The Windows implementation is actually better in this regard since executables
are randomized as well as libraries. However the randomization is the same for
all processes and only changed on boot (because libraries on Windows usually
use relocations rather than PIC so the pages wouldn't be shareable if they
were randomized per process), so an infoleak in one process can be used to
attack another.

------
atemerev
Why are people so afraid of self-modifying code? It enables some cool hacks,
it is no more and no less secure than non-self-modifying code (as long as it
is properly contained), and, well, this is what makes computers and programming
interesting rather than limited to boring table lookups and finite state
machines.

~~~
dikaiosune
> as long as it is properly contained

Why are people so afraid of C buffers? They enable great performance, and they
are no more and no less secure than bounds-checked buffers (as long as the
array indices are properly contained), and well, this is what makes computers
and programming run fast rather than being forced into slow interpreted
execution.

~~~
btown
[https://en.wikipedia.org/wiki/Poe%27s_law](https://en.wikipedia.org/wiki/Poe%27s_law)

------
marmight
Some previous discussion here:
[https://news.ycombinator.com/item?id=11001796](https://news.ycombinator.com/item?id=11001796)

------
lnanek2
Supposedly this sort of thing is also why Microsoft Word documents were so
hard to parse at first - they were just a memory dump of the process and not
text at all.

~~~
spolsky
not true, actually

[https://msdn.microsoft.com/en-us/library/office/gg615596(v=o...](https://msdn.microsoft.com/en-us/library/office/gg615596\(v=office.14\).aspx)

~~~
mdadm
From your link:

>Applies to: Office 2007 | Office 2010 | Open XML | Visual Studio Tools for
Microsoft Office | Word | Word 2007 | Word 2010

I feel as though the gp comment is referring to far older versions, although
without clarification, it's hard to be sure.

~~~
int_19h
The older versions are also not literal dumps. They're binary "dumps" of the
object tree in memory, yes, in a sense that you walk the tree and write it
out. This is bad because your in-memory object tree then effectively defines
the format, and it's not spec'd otherwise, which makes portability that much
harder, especially for a closed-source application where you can't see code.
But it's a very different problem.

FWIW, old Office documents were actually CFBF (Compound File Binary Format)
files - think of it as FAT-in-a-file, allowing for multiple independent
streams inside, with transactions. This was very commonly used on Windows in
the OLE/COM era, because it was the underlying format for OLE Structured
Storage. It's what allowed a Word document to embed another arbitrary document
in an extensible way. The underlying data in the streams within CFBF was a
loose object graph dump.
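
For illustration, a CFBF container can be recognised by the fixed 8-byte signature at the start of its header. A small Python sketch follows; `looks_like_cfbf` is a hypothetical helper name, and it only identifies the container, not the streams inside it.

```python
# Signature bytes at the start of every Compound File Binary Format file.
CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfbf(data):
    """Return True if the buffer starts with the CFBF header signature."""
    return data[:8] == CFBF_MAGIC
```

Passing the first bytes of an old `.doc` or `.xls` file should return `True`, while the newer zip-based OOXML files start with `PK` and return `False`.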

It all makes a lot of sense when you have your OLE glasses firmly on - it's
basically a natural design that follows if your world consists of OLE objects
and interactions between them. Look up IStorage and IStream to see what I
mean.

The side effect of all this, however, is that the data inside an old Office
file is not laid out in a logical way - streams consist of non-sequential
interleaved blocks in a seemingly random order (depending on what was written
when), some blocks may contain garbage data, and so on. So it's very difficult
to reverse engineer, which is why it took so long back in the day, and the
results were often unreliable.

~~~
poizan42
> FWIW, old Office documents were actually CFBF (Compound File Binary Format)
> files

That's actually the "new" binary formats. The usage of CFBF seems to have been
introduced in Office 4.2 (at least Excel 5.0 is the first Excel version to use
them, it's hard to find information about the old Word document file formats).

> The side effect of all this, however, is that the data inside an old Office
> file is not laid out in a logical way - streams consist of non-sequential
> interleaved blocks in a seemingly random order (depending on what was
> written when), some blocks may contain garbage data, and so on. So it's very
> difficult to reverse engineer, which is why it took so long back in the day,
> and the results were often unreliable.

I don't believe the OLE compound file format has ever been much of an effort
to reverse engineer. But the CFBF based Office documents are also basically
just blobs of the older binary formats saved in a more structured way. The
issues with Office documents have always been a question about their sheer
complexity combined with their tight coupling to the internals of the Office
programs. This still shines through in the OOXML formats, which contain lots
of stuff like "position something the way it was done in Word 5.0".

------
jimrandomh
If you think of emacs as a text editor which happens to have a LISP
interpreter, then this is silly. But if you think of it as a LISP interpreter
which happens to have a text editor, then it makes a lot more sense.

~~~
carterehsmith
Like they say, "Emacs is a great operating system, lacking only a decent
editor".

~~~
aidenn0
Thanks to evil-mode it now has a decent editor too.

------
lispm
> But loading these files with it to “dump” the Emacs binary? The only time
> I’ve heard of that was when reading a discussion about creating
> “executables” with Common Lisp which is apparently achieved by serializing
> the program’s current state to disk.

The very first Lisp implementation did that already. It could dump and read
memory images to/from tape. From that on, most Lisp implementations, and not
just Common Lisp, are doing it. Some have extensive capabilities in this area
(like tree-shaking or generating shared libraries which can be included in
programs).

------
jwilk
[https://lwn.net/Articles/673724/](https://lwn.net/Articles/673724/)

~~~
ank_the_elder
Great follow up to this article - thanks!

------
CyberShadow
> I hope at least Guile Emacs will try getting rid of it.

Guile Emacs is dead, isn't it? No activity in over a year:
[http://git.hcoop.net/?p=bpt/emacs.git](http://git.hcoop.net/?p=bpt/emacs.git)

~~~
jpfr
> Guile Emacs is dead, isn't it?

Emacs-guile is working today. That's a first after many many years of talk and
proof-of-concept stage efforts.

I hope the development will pick up speed once guile 2.2 and emacs 25.1 are
out. Both projects underwent some big changes/improvements lately.

One sign of life is that bpt's improvements to guile elisp were rebased on a
recentish guile-master. This branch lives in the main repository now.

[http://git.savannah.gnu.org/cgit/guile.git/log/?h=wip-elisp](http://git.savannah.gnu.org/cgit/guile.git/log/?h=wip-elisp)

Also, I have seen some recent commits from emacs developers to guile. So the
two projects are talking to each other.

Now, imho, the success of emacs-guile strongly depends on whether they manage
to share the load across several developers. A more open development culture
surely would help.

------
to3m
I've spent a bit of time looking at this code. It's not really that bad,
though it might be a bit of a surprise to find it if you weren't expecting it.

The most serious problem with it I had is that there's something wrong with
the makefile dependency checking. For certain types of change you have to do a
full rebuild. But I'm pretty certain autotools is scarier than any memory
dump, so I just put up with this.

------
AceJohnny2
What's a good way for a program to trigger its own coredump? I had the issue
on Linux and QNX embedded systems that I was unable to live-debug, and I
wanted to be able to retrieve a coredump at various stages of execution.
Unfortunately, all I found were CLI utilities that just called GDB, which
wasn't available on my target.

I never felt confident to implement the feature directly :\

~~~
icebraining
If you can change the code, you can simply fork() and then abort() on the
child. If you attach that as a handler to SIGUSR1/2, you can have core dumps
on demand.
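
A sketch of the fork-then-abort trick, written in Python for brevity; `os.fork()` and `os.abort()` wrap the same calls a C program would make. The function name `dump_core_in_child` is made up, this is Unix-only, and whether a core file is actually written depends on the core size limit (`ulimit -c`) and the system's core pattern. From Python the core would be of the interpreter process.

```python
import os
import signal


def dump_core_in_child(signum=None, frame=None):
    """Fork, abort() in the child, and let the parent keep running.

    Returns the signal number that terminated the child, or None.
    """
    pid = os.fork()
    if pid == 0:
        os.abort()  # child: raises SIGABRT, which dumps core if limits allow
    _, status = os.waitpid(pid, 0)  # parent: reap the child, then carry on
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


# Dump on demand: `kill -USR1 <pid>` runs the handler in the live process.
signal.signal(signal.SIGUSR1, dump_core_in_child)
```

The parent is never interrupted for longer than the fork and wait, so the same process can be snapshotted at several stages of execution.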

~~~
AceJohnny2
Nice! Thanks!

------
haberman
Admittedly I only glanced at this, but why can't the process image in question
be dumped to a C string (or .S file if you really want) and then linked with
the normal linker?

~~~
rurban
That's what needs to be done on Windows. The other dumpers can do better by
just dumping the segments and loader commands, by taking special care of base
address adjustments and restoring the old malloc'd heap structures, not be
mixed up with the new heap. A normal linker will still need to reload the
shared libs, and then adjust the external symbols there.

Your solution needs several seconds, unexec just a few milliseconds.

~~~
haberman
Sorry, maybe I wasn't clear. I was suggesting doing all of this at build time,
not runtime.

Oh wait does this all get serialized to disk every time you exit emacs? Is
that what I'm missing? If so, why not run the linker at emacs exit time, to
optimize the loading sequence?

~~~
rurban
unexec is only done once, at build-time, to convert temacs with a bunch of
loaded libraries to emacs, which includes those libraries already. So it
doesn't need to search them on disc, and compile them.

It's never serialized. It's just dumped. Like a core file, with just proper
headers, sections and segments, so that it can be executed. A proper COFF/ELF
binary. A core file has all the segments but misses the headers.

~~~
haberman
Ok, so why can't it just be dumped to a .S file, so the regular toolchain can
handle creating the headers, sections, and segments? .S files basically let
you create arbitrary ELF images without making you implement the ELF format
yourself. I'm really not getting what the custom linker/toolchain in unexec is
buying you.
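
To make the suggestion concrete, here is a rough Python sketch that renders a byte blob as assembler source, assuming GNU assembler syntax on an ELF target. `blob_to_asm` and the `dumped_image` symbol are hypothetical names. It only embeds data; it does nothing about the base-address adjustments or heap restoration mentioned above.

```python
def blob_to_asm(blob, symbol="dumped_image", per_line=12):
    """Render bytes as GNU assembler source defining `symbol` and `symbol_size`."""
    lines = [
        "\t.section .data",
        f"\t.globl {symbol}",
        f"{symbol}:",
    ]
    # Emit the blob as rows of .byte directives.
    for i in range(0, len(blob), per_line):
        chunk = blob[i:i + per_line]
        lines.append("\t.byte " + ", ".join(f"0x{b:02x}" for b in chunk))
    # Record the length so C code can find the end of the image.
    lines += [
        f"\t.globl {symbol}_size",
        f"{symbol}_size:",
        f"\t.quad {len(blob)}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a `.S` file and handed to the regular toolchain, which then takes care of the headers, sections and segments.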

------
derefr
Question: is unexec() more or less portable than CRIU? If CRIU is the more
portable of the two (or if they're equivalent in both just basically working
only on Linux), could unexec() be dropped in favor of just generating a CRIU
process snapshot, and giving emacs a launch wrapper that restores from that
snapshot?

~~~
HelloImDumb
Emacs unexec is mature (by decades), ported to 11 operating systems. CRIU is
Linux-only, more or less experimental, and not included in mainstream stable
distros. Does that answer your question?

------
mherrmann
This would be really useful to speed up Atom/Electron startup time.

