
XARs: An efficient system for self-contained executables - terrelln
https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/
======
peterwwillis
I would use this if it didn't depend on OS-specific features. Squashfs is not
portable to Windows, unless you extract it to disk.

I actually prefer the jar/Tomcat model, where the read-only image gets
distributed to servers, and when you run the app the image gets unpacked to
disk as needed. You could also write I/O wrappers that would obviate the need
to extract them to disk, and you could even make compression optional to
reduce performance hits.

It seems like all you really need is a virtual filesystem implemented as a
userspace I/O wrapper. Basically FUSE, but only for the one app. There's no
need for the FUSE kernel shim, because only the application itself reads and
writes its own virtual filesystem. So this would work on any operating system
that lets applications intercept their own system calls.

For example, I would start with this project
[http://avf.sourceforge.net/](http://avf.sourceforge.net/) and modify it to
run apps bundled with itself. With FUSE installed, other apps could interact
with its virtual filesystem, but without FUSE, it could still access its own
virtual filesystem in an archive. I would then extend it by shimming in a
copy-on-write filesystem to stack modifications in a secondary archive.
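A minimal Python sketch of the layering idea, assuming an in-process API. The `OverlayFS` class and its methods are hypothetical, not part of AVF or XAR. Reads fall through a copy-on-write layer to a read-only zip archive, and modifications are kept separately so they can be saved as a secondary archive:

```python
import io
import zipfile


class OverlayFS:
    """Hypothetical in-process virtual filesystem: a read-only zip archive
    as the base layer, with the app's writes stacked in a copy-on-write
    overlay."""

    def __init__(self, archive_bytes: bytes):
        # Base layer: never modified after construction.
        self._base = zipfile.ZipFile(io.BytesIO(archive_bytes))
        # Copy-on-write layer: path -> bytes written by the app.
        self._overlay = {}

    def read(self, path: str) -> bytes:
        # Modified files shadow the originals in the base archive.
        if path in self._overlay:
            return self._overlay[path]
        try:
            return self._base.read(path)
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path: str, data: bytes) -> None:
        # Writes never touch the base archive.
        self._overlay[path] = data

    def save_overlay(self) -> bytes:
        """Serialize only the modified files into a secondary zip archive."""
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            for path, data in self._overlay.items():
                zf.writestr(path, data)
        return buf.getvalue()
```

Making an unmodified program use this transparently would still need the system-call interception described above. The class only shows the layering logic.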

~~~
ctur
I agree it is a bummer that FUSE doesn't directly work on Windows, but it
should be doable -- we would love for someone to figure out the best way to do
this on Windows. Happy to collaborate with anyone who'd like to make this a
reality.

~~~
PeCaN
While FUSE doesn't work directly, there are userspace filesystem
implementations for Windows. Dokany¹ even implements most of the FUSE API.

1\. [https://github.com/dokan-dev/dokany](https://github.com/dokan-dev/dokany)

------
ctur
Hi, I'm Chip, one of the authors of the blog post and XAR itself. Happy to
answer any questions anyone may have about how XARs work, the way we use them,
or the motivations that drove their development.

~~~
jkingsbery
How do you pronounce "XAR"? "Ex-AR"? "Sar"? "Shar"?

~~~
ctur
One syllable, rhymes with "car" but with a "z" sound ("zar").

~~~
blattimwind
Czar?

------
rgovostes
Probably too late now, but xar already stands for "eXtensible ARchiver" and is
a file format used on macOS in some package installers. It's notable for
having an embedded XML "table of contents" that describes metadata of the
archived files, so new fields can easily be added while maintaining backwards
compatibility. (Compared to, say, the zip file format which does not even
specify how to store Unix file modes.)
[https://en.wikipedia.org/wiki/Xar_(archiver)](https://en.wikipedia.org/wiki/Xar_\(archiver\))

~~~
JohnDotAwesome
Names are hard. My name is John. Programmers get really confused when I tell
them that. "Did you know there's another programmer named John? You probably
should have researched names before you decided to go with that one"

~~~
kstenerud
Yup, just like with people, name clashes in computers won't cause problems!

    
    
        # apt install docker
        Reading package lists... Done
        Building dependency tree       
        Reading state information... Done
        The following NEW packages will be installed:
          docker
        0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
        Need to get 12.9 kB of archives.
        After this operation, 45.1 kB of additional disk space will be used.
        Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 docker amd64 1.5-1build1 [12.9 kB]
        Fetched 12.9 kB in 1s (24.9 kB/s)
        Selecting previously unselected package docker.
        (Reading database ... 13860 files and directories currently installed.)
        Preparing to unpack .../docker_1.5-1build1_amd64.deb ...
        Unpacking docker (1.5-1build1) ...
        Setting up docker (1.5-1build1) ...
        root@testing:~# docker
        bash: docker: command not found

~~~
jwilk
For people who don't understand what happened here:

In the Debian ecosystem, "docker" is a system tray application:
[https://icculus.org/openbox/2/docker/](https://icculus.org/openbox/2/docker/)

Docker, the container software, is packaged as "docker.io".

------
rwmj
I'm not really a fan of containers, but I read this and thought "why not
containers"?

The page mentions cryptically _"They could almost be thought of as a
self-executing container without the virtualization"_. The _"self-executing"_
bit makes sense: you don't have to remember to type "docker". _"Without the
virtualization"_ doesn't make sense unless they mean without cgroups, or are
talking about Kata Containers.

~~~
ctur
Generally XARs are lighter weight than a container. While you can (and
sometimes we do) use XARs to deploy, say, a self-contained service like a
website, often they are used to replace command line tooling.

Container isolation (cgroups, namespaces, etc) would make it difficult to do
some of the system level tasks we use such tools for, such as configuration
changes or monitoring.

Likewise, we are often replacing a PAR or C++ tool with a new XAR version, and
it is nice to simply replace the executable and not have to change how it is
invoked. In this regard, invoking a XAR is identical to running any normal
executable or shell script.

~~~
e12e
I can see why you'd sometimes want less isolation (although things like
docker-compose run fine from a docker container). But how is it "lighter"
than a container? Aren't you striving for self-contained executables? What do
you leave out of a XAR that you'd want to put into a container?

[ed: I've now seen this question and answer:

[https://news.ycombinator.com/item?id=17526178](https://news.ycombinator.com/item?id=17526178)

Frankly, using the system/external Python (or other VM) seems a bit risky...
But whatever works, I guess.]

------
secure
Cool idea! Is there any particular reason to use SquashFS via FUSE instead of
via the Linux kernel driver?

Slightly related: we also recently switched to SquashFS for the gokrazy.org’s
root file systems.

If you’re curious about how SquashFS works under the hood, check out
[https://github.com/gokrazy/internal/blob/master/squashfs/wri...](https://github.com/gokrazy/internal/blob/master/squashfs/writer.go).
I also intend to publish a poster about it at some point.

~~~
ctur
We actually started with using "real" squashfs files. This had three main
disadvantages:

\- We had to maintain our own setuid executable to perform the loopback setup
and mount (rather than relying on the far more tested and secure open source
fusermount setuid binary that all FUSE file systems rely on)
\- Getting loopback devices to behave inside of containers (generally cgroup
and mount namespace containers) was a little tricky at times in some of our
environments
\- We didn't want to have a huge number of extra loopback devices on every
host in our fleet

In fact, after implementing the loopback-based filesystem version, we almost
abandoned XAR because the security considerations and in-container behavior
weren't ideal. The open source squashfuse FUSE filesystem really is what made
it possible.

Another side benefit is we could iterate far faster with squashfuse -- this
let us fix some performance issues, add idle unmounting, and implement
zstd-based squashfs files, and then deploy that to our fleet, faster than we
could deploy a kernel to 100% of hosts.

~~~
secure
Thanks, makes sense!

------
jillesvangurp
Not a bad idea. I wonder how this compares to Ubuntu's snaps. Seems like a
good idea to me, but I've not really seen it used much yet.

On OS X, apps have been distributed as .app bundles for ages. It's very
uncommon for OS X apps to have installers or a more complicated installation
(and uninstallation) than drag and drop.

So, good idea: it kind of fixes a big issue where most Linux distributions
seem to insist on DLL hell, with just about everything littering the file
system with cruft, and just about every interpreter out there reinventing ways
to create virtual environments.

------
juliangoldsmith
This all reminds me a bit of Tiny Core Linux. IIRC, it uses SquashFS images
for all its packages, mounts them in a specific spot, then uses either
symlinks or UnionFS to put everything together.

------
FRidh
Yet another format for self-contained executables, and one that looks pretty
similar to the already existing AppImage.

Note that Nix users can use `nix-bundle` to create AppImages of all the
software in Nixpkgs, which, according to Repology, is one of the largest and
freshest package sets:
[https://repology.org/statistics](https://repology.org/statistics)

------
fenesiistvan
I am a Windows developer, and the single thing that stops me porting my apps
to Linux is the lack of an easy-to-use deploy method. Is there some good way to
handle this task without spending months learning about Linux administration:
shell scripts, finding the right place for configs and logs on different Linux
distros, setting up daemons, etc.? Something simple and distro-independent
would be fine...

~~~
rwmj
I am a Linux developer, and the single thing that stops me porting my apps to
Windows is the lack of an easy-to-use deploy method. Is there some good way to
handle this task without spending months learning about Windows
administration: installers, MSI, the registry, logging, services?

The non-flippant answer is to just provide the source and let the distros
package it for you. It's a different model. Linux users want to get their
software through an integrated package manager, and volunteers will take your
software and do all the work needed to make that happen.

~~~
TimJYoung
You can typically deploy a Windows binary to targets that use much older
versions of Windows, and you cannot say the same for Linux distributions.
There is significant backward-compatibility with Windows binaries, and it is
intentional. It provides for a much better ROI for applications that need to
be around for more than a couple of years.

~~~
tathougies
> You can typically deploy a Windows binary to targets that use much older
> versions of Windows, and you cannot say the same for Linux distributions.

This has nothing to do with the OS. A statically linked Linux executable will
likely run on any kernel from version 1 to version 4. The issue here is
dynamic linking, and Windows DLL hell has a name for a reason.

~~~
TimJYoung
I'm referring to the Windows system APIs, which do not require any sort of
special treatment for any executable that is linked against them. This
includes everything that is included with Windows, from networking to file
handling to crypto to UI.

Which libraries are you including in your description of a statically-linked
binary on Linux?

Finally, Windows DLL hell hasn't been a thing since the early 2000s, and even
then, it was primarily only an issue with applications dumping shared
libraries into the Windows system directories.

------
nikolay
Is Facebook an NIHS (Not Invented Here Syndrome) sufferer?

~~~
lttlrck
Indirectly this proves to be a useful discovery mechanism for me - when tools
crop up on HN I think ‘hey that’s interesting’, then oftentimes when I read
the comments I find there are numerous existing solutions I had never heard of
along with helpful links and insightful info :-)

It’s brilliant. It’s one of the reasons I value this site so much.

------
fooblitzky
Sounds similar to Tcl's Starkits.

------
mediocrejoker
I wonder how this compares to AppImages
[https://appimage.org/](https://appimage.org/)

~~~
thangngoc89
\- AppImages: Linux apps that run anywhere
\- XARs: package Python (or Node.js, Lua) apps into executable files

~~~
mediocrejoker
Thanks for providing that summary. Does this mean the two could be used
together?

~~~
thangngoc89
From my understanding:

\- AppImage: it packs Linux apps into a tar file so you can unzip it later and
run the executable of the app. The main selling point of AppImage is that it's
distro-independent.

\- XAR: it packs dynamic-language programs (Python, Node.js, Lua) into an
executable file. The executable includes the language runtime and a FUSE
filesystem to mount the program's source code.

In conclusion, I don't think you'd ever need to make an AppImage of a XAR
executable file.

~~~
RazZziel
Not exactly: An AppImage file packs a Linux app as a compressed ISO image with
an ELF preamble that is able to access the contents of the ISO image without
unpacking, so you can just double click the package and run the app without
ever unpacking it.

------
dagenix
Does the XAR file contain the Python executable itself, or does running it
rely on having Python installed on the host already?

~~~
terrelln
The current Python XARs rely on Python being on the system path. But it would
be easy to build a custom Python XAR with the XAR builder library that
includes the Python executable and makes sure to use the packaged executable.
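A minimal Python sketch of the "prefer the packaged interpreter, else use the one on the path" logic described here. The bundle layout (`<bundle_root>/bin/python3`) and the `find_interpreter` name are hypothetical, not the XAR builder library's actual API:

```python
import os
import shutil
from typing import Optional


def find_interpreter(bundle_root: str, name: str = "python3") -> Optional[str]:
    """Prefer an interpreter packaged inside the mounted bundle; otherwise
    fall back to the first match on the system PATH (or None).

    The <bundle_root>/bin/<name> layout is a hypothetical convention,
    not the real XAR layout.
    """
    bundled = os.path.join(bundle_root, "bin", name)
    if os.path.isfile(bundled) and os.access(bundled, os.X_OK):
        return bundled
    return shutil.which(name)
```

Falling back to `shutil.which` reproduces the current behavior (Python taken from the system path), so a bundle without its own interpreter still runs.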

------
jarvuschris
Have you looked into Habitat? It provides a similar result with a complete
build workflow that works across technologies and platforms:
[https://www.habitat.sh/](https://www.habitat.sh/)

There's a rapidly growing library of libraries and services packaged with it:
[https://bldr.habitat.sh](https://bldr.habitat.sh)

Its build artifacts can be exported to a number of formats, including container
images and tarballs; maybe a XAR exporter could be built:
[https://github.com/habitat-sh/habitat/tree/master/components...](https://github.com/habitat-sh/habitat/tree/master/components/pkg-export-docker)

~~~
JohnDotAwesome
Seems like Habitat (which looks awesome, by the way) relies on Docker, which,
given the performance metrics in the article (size, cold/hot start time), may
be a non-starter for what they're trying to do.

~~~
djb_hackernews
Can you expand on that? In my experience nothing about Docker implies a
performance impact in terms of size or start time.

~~~
collinf
Well, there is the overhead of creating and removing namespaces each time a
container is run, or of communicating with the Docker daemon.

I think to most people it would be negligible, but fb operates at a scale
where these normally insignificant pieces matter. I would be interested to
hear more about the _why_ of a system like this over containerization.

edit: rwmj's comment has a good discussion over the benefits of this over
containerization.

~~~
nwmcsween
I promise you 100% the overhead is docker and nothing else.

------
ASinclair
How similar is this to Google PAR/SAR executables for Python and Bash scripts
respectively?

~~~
terrelln
Facebook's PAR is a self-extracting zip file, I assume Google's is similar.
XARs are self-mounting SquashFS archives (a compressed read only filesystem).
This means that XARs don't have to be extracted to a temporary directory to
run, they can run in place. Zip files have to be completely extracted before
running, but SquashFS decompresses pages on the fly, so startup times are much
faster (especially with zstd compression).
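To illustrate why in-place access can start faster than extract-then-run, here is a minimal Python sketch of the underlying technique: compress data in fixed-size blocks independently, so a read only decompresses the blocks it touches. It uses zlib for simplicity. SquashFS's real on-disk format (block index, fragments, metadata tables) and its zstd support are not modeled, and the helper names are hypothetical:

```python
import zlib

BLOCK_SIZE = 128 * 1024  # matches mksquashfs's default block size (128 KiB)


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Compress each fixed-size block of `data` independently."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_range(blocks: list[bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the blocks
    that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    chunk = b"".join(zlib.decompress(b) for b in blocks[first:last + 1])
    start = offset - first * block_size
    return chunk[start:start + length]
```

A whole-archive zip extraction pays for every file up front. With per-block compression, the cost of starting the program scales with the bytes it actually reads.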

------
aumerle
It's somewhat counter-intuitive that start times with XAR are lower than start
times without it. Is FUSE faster than a kernel filesystem? Even with
compression?

~~~
ctur
FUSE isn't generally lighter weight than a kernel filesystem, but it can be relatively
competitive for simple use cases like a read-only filesystem. Additionally,
squashfs lets you pack metadata and data very tightly, and since it is a
readonly filesystem, has some optimizations normal filesystems can't (how data
is placed, overhead of managing metadata operations, etc). Also squashfs lets
you choose how the files are laid out and compressed so that all files of a
certain type, such as all .pyc files, are close together, which increases
compression ratio and reduces overhead for subsequent file accesses (i.e., can
reduce random disk or flash IO).
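The layout idea can be sketched in Python as an ordering function. It sorts the files to be packed by extension so that, for example, all `.pyc` files sit next to each other in the image. `pack_order` is a hypothetical helper. How the order is passed to the actual image builder is not shown:

```python
from pathlib import PurePosixPath


def pack_order(paths: list[str]) -> list[str]:
    """Order files so that files of the same type (by extension) are
    adjacent, then by full path within each group."""
    return sorted(paths, key=lambda p: (PurePosixPath(p).suffix, p))
```

For example, `pack_order(["a/x.py", "a/x.pyc", "b/y.so", "b/y.pyc", "c/z.py"])` puts both `.py` files first, then both `.pyc` files, then the `.so`.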

In practice the timings of XAR vs filesystem are close enough to be "in the
noise" \-- it's when compared to PEX or PARs that the difference is quite
large.

~~~
aumerle
Makes sense, this is somewhat analogous to the import speedup you can get by
putting all the python modules into a zip file. I tend to do that when
distributing python applications on windows, where the speedup is more
noticeable.
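A minimal Python sketch of the zip-import approach, using only the standard library. Putting a zip archive on `sys.path` lets Python's built-in `zipimport` machinery import pure-Python modules straight out of it without extracting to disk. C extension modules cannot be imported this way. The helper names and the `greeting` module are hypothetical:

```python
import importlib
import os
import sys
import tempfile
import zipfile


def bundle_modules(modules: dict[str, str], zip_path: str) -> None:
    """Write pure-Python modules (name -> source) into a single zip file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, source in modules.items():
            zf.writestr(f"{name}.py", source)


def load_from_zip(zip_path: str, module_name: str):
    """Import a module directly from a zip archive via zipimport."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return importlib.import_module(module_name)


# Usage: bundle one module, then import it without extracting anything.
tmp_dir = tempfile.mkdtemp()
app_zip = os.path.join(tmp_dir, "app.zip")
bundle_modules({"greeting": "MESSAGE = 'hello from the zip'\n"}, app_zip)
greeting = load_from_zip(app_zip, "greeting")
print(greeting.MESSAGE)  # prints: hello from the zip
```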

~~~
ctur
Yep, it's similar, but squashfs is more optimized than zip files for random
access like a filesystem (rather than an archive). Also when using zstd-based
squashfs files, there is much less overhead for the decompression itself which
effectively becomes free.

------
TheAceOfHearts
I'm on mobile, but do you have an example of bundling a node app somewhere?

I'm curious how it compares to using something like pkg [0].

[0] [https://github.com/zeit/pkg](https://github.com/zeit/pkg)

~~~
terrelln
We currently don't have a nice open source API for building node apps, but
would welcome PRs that get us in this direction!

There are two ways to build a node app using the XAR builder tools:

1\. Use the `make_xar` tool, which will create a XAR from a directory and takes
an optional script to run on execution.
2\. Use the XAR builder library to make a XAR builder that is specialized for
building node apps.

------
dcgudeman
_Requirements

Python >= 2.7.11 & >= 3.5_

you need both?

------
russellbeattie
Facebook spent a decade contributing virtually nothing to open source, now
they're flooding the world with random projects of varying and questionable
value - most developed as a result of Facebook's severe N.I.H. attitude. I'm
honestly not sure which is worse.

~~~
tony-allan
XAR is a simple way to package and deploy Python and similar apps.
Dependencies are a real issue. Anything to improve the situation and to also
deploy related files is a great idea.

I don't like a lot about what Facebook does with user data and marketing but
their support of open source is better than many companies. Give credit where
it is due.

------
saagarjha
Kind of an unfortunate name, considering that xar (eXtensible ARchive) is
already a thing:
[https://en.wikipedia.org/wiki/Xar_(archiver)](https://en.wikipedia.org/wiki/Xar_\(archiver\))

~~~
nine_k
I'd hazard a guess that almost every 3-letter abbreviation has already been
taken, many of the easy-to-pronounce ones multiple times.

~~~
52-6F-62
Oh there are some fabulously childish three and four letter abbreviations
available yet. Probably not very appropriate, though.....

[https://fileinfo.com/browse/](https://fileinfo.com/browse/)

~~~
sigjuice
[http://dilbert.com/strip/1993-06-23](http://dilbert.com/strip/1993-06-23)

~~~
52-6F-62
.bug is apparently free. Might make way for more apologetic file names

    
    
        $ ./good_program_i_promise.bug

------
jeffrallen
<troll type="language">Just use Go.</troll>

