Hacker News
XARs: An efficient system for self-contained executables (fb.com)
246 points by terrelln 7 months ago | 120 comments

I would use this if it didn't depend on OS-specific features. Squashfs is not portable to Windows, unless you extract it to disk.

I actually prefer the jar/Tomcat model, where the read-only image gets distributed to servers, and when you run the app the image gets unpacked to disk as needed. You could also write I/O wrappers that would obviate the need to extract them to disk, and you could even make compression optional to reduce performance hits.

It seems like all you really need is a virtual filesystem implemented as a userspace i/o wrapper. Basically FUSE but only for the one app. There's no need for the FUSE kernel shim because only the application is writing to its own virtual filesystem. So this would work on any operating system that supported applications that can overload system calls.

For example, I would start with this project http://avf.sourceforge.net/ and modify it to run apps bundled with itself. With FUSE installed, other apps could interact with its virtual filesystem, but without FUSE, it could still access its own virtual filesystem in an archive. I would then extend it by shimming in a copy-on-write filesystem to stack modifications in a secondary archive.

> I would use this if it didn't depend on OS-specific features.

I always make the same mistake of assuming that no one would deploy something on Windows in a server-side environment.

I agree it is a bummer that FUSE doesn't directly work on Windows, but it should be doable -- we would love for someone to figure out the best way to do this on Windows. Happy to collaborate with anyone who'd like to make this a reality.

While FUSE doesn't work directly, there are userspace filesystem implementations for Windows. Dokany¹ even implements most of the FUSE API.

1. https://github.com/dokan-dev/dokany

Hi, I'm Chip, one of the authors of the blog post and XAR itself. Happy to answer any questions anyone may have about how XARs work, the way we use them, or the motivations that drove their development.

Hi Chip, thanks for releasing this as open source. From a quick look XARs seem pretty similar to AppImages in the sense they both use an executable preamble and a squashfs so I'm wondering which would be a better choice for me to distribute my Python apps, and why. Thanks!

I am not an expert in AppImage but I think the main thing is it is pretty easy to make a XAR of a Python script and that the overhead is probably less (especially for repeated invocations).

I think XAR is simpler than the other alternatives; they have more complex specifications, being aimed at distributing full GUI applications (though I bet they work fine for these kinds of use cases, modulo perhaps quick and easy conversion of a Python program into a XAR). I suspect, but haven't measured, that XAR has lower execution overhead since it will re-use a mount point if a tool is recently invoked, rather than remounting every time.

One thing XAR doesn't really try to do is work cross-platform. It will rely on the system Python, for instance, rather than embed the interpreter and all libraries inside (it will embed the libraries your tool depends on that aren't part of Python itself). This is a pro in some cases (lower overhead), but a con if you want something you can carry across wildly different systems.

Could you write the core in Go, compile cross-platform, bundle the core binary, and then remove the reliance on the host's Python?

Hi Chip!

Did you evaluate flatpak/appimage/snap before developing XARs? If so, what were the shortcomings that you noticed in them?

I have found reasonably good success with snap on Python and node apps, but am not an expert on them. Just want to know from other practitioners about any gotchas that others might have studied/stumbled upon.


XAR development began in 2014 and hit production use in an early form in 2015; this was before some other options were out there, or before, I believe, AppImage went the squashfuse route.

XAR doesn't aim to, say, represent a sandboxed filesystem or provide some isolation or provide a specification for GUI apps to seamlessly integrate with your desktop. Instead, the main idea is to just get a single file that, when run, mounts a filesystem and runs a command inside it. In my mind, at least, it's a simpler, smaller primitive you could build other systems on top of (like we have with deploying Python, Lua, and Node).

Awesome work. Did your team evaluate creating a virtual filesystem that could process the SquashFS images without involving the kernel? Having completely independent executables that could run on _any_ system with zero additional install would be sweet.

To clarify - a stub in each XAR would act as a filesystem driver and intercept calls to open/read/etc, redirecting them to the internal data blob.
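A rough sketch of that idea in Python, under loud assumptions: a zip archive stands in for the SquashFS blob, and the application calls a hypothetical `vfs_open` wrapper instead of the built-in `open`. A real stub would have to hook the C library or system calls, which this does not attempt.

```python
import io
import zipfile


class ArchiveFS:
    """Serve reads from an in-memory archive instead of the host filesystem.

    Hypothetical illustration: a zip archive stands in for the SquashFS blob.
    """

    def __init__(self, blob: bytes, mount_prefix: str = "/app/"):
        self._archive = zipfile.ZipFile(io.BytesIO(blob))
        self._prefix = mount_prefix

    def vfs_open(self, path: str, mode: str = "rb"):
        # Paths under the virtual prefix are redirected into the archive;
        # everything else falls through to the real filesystem.
        if path.startswith(self._prefix):
            if mode != "rb":
                raise PermissionError("virtual filesystem is read-only")
            member = path[len(self._prefix):]
            return io.BytesIO(self._archive.read(member))
        return open(path, mode)


def build_demo_blob() -> bytes:
    """Build a tiny archive in memory to play the role of the bundled data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("config/settings.txt", "debug=false\n")
    return buf.getvalue()


if __name__ == "__main__":
    fs = ArchiveFS(build_demo_blob())
    with fs.vfs_open("/app/config/settings.txt") as handle:
        print(handle.read().decode())
```

The catch with any wrapper like this is that only code routed through it sees the virtual files; native libraries and helper executables that call the OS directly would not.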

Edit: I see your comment below which answers this! https://news.ycombinator.com/item?id=17524910

For what it's worth, this is how Tcl self-contained executables work [0], since Tcl already has a virtual filesystem mechanism.

[0] https://www.tcl.tk/starkits/

I recently found out about Singularity[0], which seems to be very similar (squashfs application bundle). What advantages does XAR have?

[0] https://github.com/singularityware/singularity

A budget? Singularity was basically born of not-invented-here syndrome in the neuroscience community and doesn't have nearly the amount of resources that Facebook is able to devote to this problem. If that were to change (i.e., Greg Kurtzer gets millions of dollars in VC or NIH funding out of nowhere) and Singularity weren't a lone developer project it might be worth considering as a viable alternative to XARs.

Is it related to / inspired by Ruby Packer[1]? Because I am skimming it and it seems 90% similar.

[1] https://github.com/pmq20/ruby-packer

Not related, but at first glance it looks similar. However, it appears that ruby-packer uses a library to access the squashfs image rather than actually mounting it. This makes it difficult (probably impossible) to load .so files from the squashfs file without having to copy them externally or to carry other helper executables along inside the image without also having to copy them externally to invoke.

A big goal for XAR is to be utterly transparent and not require hooks into Python, Node, or other runtimes to function.

Chip, thanks for sharing!

Could you have solved the same problem ("deploy apps with dependencies in a single file") using snaps? (snapcraft.io)

I think XARs are much simpler. snapcraft (and AppImage and other tools) seem to be much more invasive to the development flow and more opinionated about what the contents of the image look like (directory structures, yaml/xml/json files for metadata, etc).

XAR is much simpler. All it needs is a shell script (or raw executable) to run inside the XAR. Everything else is up to the bootstrapping. Note this isn't always a good thing -- opinionated software has its place. For what we use XAR for, though, having it "just work" with standard Python tooling without expecting open source libraries or modules or build tools to behave differently is a strong plus.

AppImage only needs a shell script, which you can use to run whatever you throw at it (native, python, perl, ruby, java, wine, qemu...), the only difference seems to be it will use a compressed ISO instead of squashfs to pack the directory and later mount it with fuse.

Additionally and optionally you can set up a small .desktop file to provide some convenience metadata that will be useful to pack the app easily, and an icon file.

How do you pronounce "XAR"? "Ex-AR"? "Sar"? "Shar"?

> XAR is pronounced like "czar" (/t͡ʂar/). The 'X' in XAR is meant to be a placeholder for all other letters as at Facebook this format was originally designed to replace ZIP-based PAR (Python archives), JSAR (JavaScript archives), LAR (Lua archives), and so on.


One syllable, rhymes with "car" but with a "z" sound ("zar").


Hi Chip. How does facebook deploy XARs? Is an advantage over docker containers not having to run the docker daemon on every machine?

We generally deploy XARs like a normal executable (delivered via RPMs or other native packaging). We also have an internal packaging format we use to deploy them as well. The big thing for us is we treat them like normal executables, so any standard deployment model for an executable works for XARs.

Probably too late now, but xar already stands for "eXtensible ARchiver" and is a file format used on macOS in some package installers. It's notable for having an embedded XML "table of contents" that describes metadata of the archived files, so new fields can easily be added while maintaining backwards compatibility. (Compared to, say, the zip file format which does not even specify how to store Unix file modes.) https://en.wikipedia.org/wiki/Xar_(archiver)

Names are hard. My name is John. Programmers get really confused when I tell them that. "Did you know there's another programmer named John? You probably should have researched names before you decided to go with that one"

Yup, just like with people, name clashes in computers won't cause problems!

    # apt install docker
    Reading package lists... Done
    Building dependency tree       
    Reading state information... Done
    The following NEW packages will be installed:
    0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
    Need to get 12.9 kB of archives.
    After this operation, 45.1 kB of additional disk space will be used.
    Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 docker amd64 1.5-1build1 [12.9 kB]
    Fetched 12.9 kB in 1s (24.9 kB/s)
    Selecting previously unselected package docker.
    (Reading database ... 13860 files and directories currently installed.)
    Preparing to unpack .../docker_1.5-1build1_amd64.deb ...
    Unpacking docker (1.5-1build1) ...
    Setting up docker (1.5-1build1) ...
    root@testing:~# docker
    bash: docker: command not found

For people who don't understand what happened here:

In the Debian ecosystem, "docker" is a system tray application: https://icculus.org/openbox/2/docker/

Docker, the container software, is packaged as "docker.io".

Funny story... We had a contractor start named "John Smith" and this guy just could not get on the domain, receive email, etc. All his shit was just not working until like a week after his start date.

My team doesn't handle new user creation, and it's a big company, where sometimes other teams may as well be other companies. I kept getting tickets from this guy and he's reporting that none of his stuff works. There's nothing I can do, but I try to help the guy out.

Anyway, a week later his stuff works. I guess whatever team does the new user creation never planned for there to be more than four employees with the same name, and their automation broke on "johnsmith5". Sigh.

I'm trying to imagine the architecture that led to this.

Trust me, I couldn't imagine it either. My guess is it actually probably broke on the second John Smith that joined, and they had to manually configure all subsequent John Smiths, and they just wait until they get a ticket.

Alternatively a dev thinks gotta handle user name collisions: "try creating user... hmmm if exists append ‘1’ and repeat with 2..n+1, oh uhhh, better stop after, idk, 5 times (last time my test borked and filled up the database with username ‘(null)’, sigh)". Programmer takes lunch.
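Purely as a guess at what such a capped retry loop could look like (the cap, the function name, and the suffix scheme are all made up here, not anything known about the actual system):

```python
MAX_ATTEMPTS = 5  # hypothetical cap, echoing the "stop after 5 times" guess


def allocate_username(base: str, existing: set) -> str:
    """Return the first free name among base, base1, base2, ... up to the cap."""
    for attempt in range(MAX_ATTEMPTS):
        candidate = base if attempt == 0 else f"{base}{attempt}"
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    # Every candidate up to the cap is taken, so the automation gives up here.
    raise RuntimeError(f"gave up allocating a username for {base!r}")
```

With this cap the names run from "johnsmith" through "johnsmith4", and the next John Smith, who would need "johnsmith5", gets an error instead of an account.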

A plain fixed size C array embedded in some data structure somewhere?

Doesn't seem that big a stretch to imagine.

This would be more like Windows batch jobs or Powershell scripts. I honestly think it just cannot handle duplicate names at all, and they wait until they get a complaint. When we were acquired, we had a person on our team receiving emails for a different user with the same name that previously existed in the company that acquired us.

But look, if I started a company called McDonald's, everyone would be confused and it'd probably be illegal. Complaining on forums about names is the open source community's alternative to trademarks.

Yes, if you were living in a world where the bank would only give an account to the first person named "John" to sign up, you should have researched names first. It will be fun dealing with file associations over the .xar extension.

Companies used to coordinate to make sure they didn't conflict on port numbers, Apple used to run a database for ensuring that creator codes were unique, etc. Now one of the biggest tech companies can't be bothered to check if they collide with another large tech company in a sparsely populated namespace.

Namespaces can help with that. For example, I note that you aren't just John, you're John Fawcett. And you probably have a middle name (or two, or more).

I bet you've worked somewhere with a collision on John. Did you adopt a handle, like John F, or maybe JohnDotAwesome?

I really wish I did have more than one middle name, both of which starting with `R` (indeed, my middle name does start with R). Maybe then as a J.R.R. Fawcett I could write fantasy novels.

Oh I'm one of the lucky ones then. It's funny, I often wish I had fewer names. I swear— very few forms ever allow room for two middle names or two middle initials.

R.R.A. Fairley

But I'm a little more into science fiction

Pen-initials are quite common. Iain M. Banks wrote SF; Iain Banks wrote the more mainstream stuff.

The larger "person's name" namespace includes the first name and last name. And people do generally treat it as a surprise when they find someone with the same name.

Yes, I was really confused by the AppImage comparisons in this thread until I realised this wasn't the XAR I already knew. It's hard to imagine they couldn't have easily checked the name before claiming it. The (Darwin) XAR page on Wikipedia was created in 2006 [1]...

[1] https://en.wikipedia.org/wiki/Xar_(archiver)

Same thing with Flux and Yarn.

I'm not really a fan of containers, but I read this and thought "why not containers"?

The page mentions cryptically "They could almost be thought of as a self-executing container without the virtualization". The "self-executing" bit makes sense - you don't have to remember to type "docker". "without the virtualization" doesn't make sense unless they mean without cgroups or are talking about Kata Containers.

Generally XARs are lighter weight than a container. While you can (and sometimes we do) use XARs to deploy, say, a self-contained service like a website, often they are used to replace command line tooling.

Container isolation (cgroups, namespaces, etc) would make it difficult to do some of the system level tasks we use such tools for, such as configuration changes or monitoring.

Likewise, we often are replacing a PAR or C++ tool with a new XAR version, and it is nice to simply replace the executable and not have to change how it is invoked. In this regard, invoking a XAR is identical to running any normal executable or shell script.

I can see why you'd sometimes want less isolation (although things like docker-compose runs fine from a docker container). But how is it "lighter" than a container? Aren't you striving for self-contained executables? What do you leave out of a XAR that you'd want to put into a container?

[ed: I now saw this question and answer:


Frankly using system/external python (or other VM) seems a bit risky... But whatever works, I guess..]

XARs are just self-mounting compressed read-only filesystems with an executable inside. We get hermetic dependencies by setting the PYTHONPATH, LD_LIBRARY_PATH and such in the bootstrapping script.
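To make the bootstrapping step concrete, here is a minimal sketch written in Python for illustration only. It is not the actual XAR bootstrap script; the directory layout, the `main.py` entry point, and the function names are assumptions.

```python
import os


def build_env(mount_root: str, base_env: dict) -> dict:
    """Return a copy of base_env with search paths pointing into the mount."""
    env = dict(base_env)
    # Directory names below are hypothetical; a real image chooses its own layout.
    # The variables are overwritten, not appended to, so host paths do not leak in.
    env["PYTHONPATH"] = os.path.join(mount_root, "lib", "python")
    env["LD_LIBRARY_PATH"] = os.path.join(mount_root, "lib", "native")
    return env


def launch(mount_root: str, argv: list) -> None:
    """Replace the current process with the entry point inside the mount."""
    entry = os.path.join(mount_root, "bin", "main.py")  # hypothetical entry point
    env = build_env(mount_root, os.environ)
    # Relies on a system `python3` found on PATH rather than a bundled interpreter.
    os.execvpe("python3", ["python3", entry, *argv], env)
```

Overwriting the two variables instead of prepending to them is what keeps the dependency set limited to what is inside the image, at the cost of ignoring anything the caller had set.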

One big benefit is that the filesystem only decompresses the pages as needed, which greatly improves the start up time over existing solutions.

"virtualization refers to the act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, storage devices, and computer network resources."

Containers have virtual file systems they access instead of the host one. XARs don't.

Cool idea! Is there any particular reason to use SquashFS via FUSE instead of via the Linux kernel driver?

Slightly related: we also recently switched to SquashFS for the gokrazy.org’s root file systems.

If you’re curious about how SquashFS works under the hood, check out https://github.com/gokrazy/internal/blob/master/squashfs/wri.... I also intend to publish a poster about it at some point.

We actually started with using "real" squashfs files. This had three main disadvantages:

- We had to maintain our own setuid executable to perform the loopback setup and mount (rather than relying on the far more tested and secure open source fusermount setuid binary that all FUSE file systems rely on)

- Getting loopback devices to behave inside of containers (generally cgroup and mount namespace containers) was a little tricky at times in some of our environments

- We didn't want to have a huge number of extra loopback devices on every host in our fleet

In fact, after implementing the loopback-based filesystem version, we almost abandoned XAR as the downside of the security considerations and in-container behavior wasn't ideal. The open source squashfuse FUSE filesystem really is what made it possible.

Another side benefit is we could iterate far faster with squashfuse -- this let us fix some performance issues, add idle unmounting, and implement zstd-based squashfs files, and then deploy that to our fleet, faster than we could deploy a kernel to 100% of hosts.

Thanks, makes sense!

Not a bad idea. I wonder how this compares to ubuntu's snaps. Seems like a good idea to me but I've not really seen it used much yet.

On OS X, apps have been distributed in .app form for ages. It's very uncommon for OS X apps to have installers or a more complicated installation (and uninstallation) than drag and drop.

So, good idea and it kind of fixes a big issue where most linux distributions seem to insist on dll hell with just about anything littering the file system with cruft and just about every interpreter out there reinventing ways to create virtual environments.

This all reminds me a bit of Tiny Core Linux. IIRC, it uses SquashFS images for all its packages, mounts them in a specific spot, then uses either symlinks or UnionFS to put everything together.

Yet another format for self-contained executables, and one that looks pretty similar to the already existing AppImage.

Note that Nix users can use `nix-bundle` to create AppImages of all the software in Nixpkgs, which is according to Repology one of the largest and freshest package sets: https://repology.org/statistics

I am a Windows developer and the single thing that stops me porting my apps to Linux is an easy to use deploy method. Is there some good way to handle this task without spending months learning about Linux administration like shell scripts, finding the best place for configs, logs on different Linux distros, daemon setup, etc.? Something simple and distro independent would be fine...

I am a Linux developer and the single thing that stops me porting my apps to Windows is an easy to use deploy method. Is there some good way to handle this task without spending months learning about Windows administration like installers, MSI, the registry, logging, services?

The non-flippant answer is to just provide the source and let the distros package it for you. It's a different model. Linux users want to get their software through an integrated package manager, and volunteers will take your software and do all the work needed to make that happen.

You can typically deploy a Windows binary to targets that use much older versions of Windows, and you cannot say the same for Linux distributions. There is significant backward-compatibility with Windows binaries, and it is intentional. It provides for a much better ROI for applications that need to be around for more than a couple of years.

> You can typically deploy a Windows binary to targets that use much older versions of Windows, and you cannot say the same for Linux distributions.

This has nothing to do with the OS. A statically linked linux executable will likely run on any kernel from version 1 to version 4. The issue here is dynamic linking, and windows DLL hell has a name for a reason

I'm referring to the Windows system APIs, which do not require any sort of special treatment for any executable that is linked against them. This includes everything that is included with Windows, from networking to file handling to crypto to UI.

Which libraries are you including in your description of a statically-linked binary on Linux?

Finally, Windows DLL hell hasn't been a thing since the early 2000s, and even then, it was primarily only an issue with applications dumping shared libraries into the Windows system directories.

With something like AppImages you get pretty much eternal backwards compatibility. The kernel _never_ breaks backwards compatibility with userland, and the AppImage pulls in all other dependencies.

Not if you need to communicate with other user space services, such as:

* systray icons/application indicators

* newer fontconfig versions support new config options which results in fonts being broken for apps that use older versions

* I think there is some issue around glibc locales handling if you use a different glibc

* systemd/logind/... DBus apis

Those were just some external interfaces that came to my mind. I'm sure there are many more.

It is very easy on Windows to create a statically linked executable that you can just copy to wherever you want.

And it's very easy on Linux to provide a source tarball and let the distro packagers do all the work for you. Plus, you know, static linking exists on Linux too, it's not something that only Windows can do.

Distro packagers do nothing for you if:

- Your app isn't licensed in the way they like

- You don't want to provide the source code to your app

- They don't find your app interesting enough

- They don't agree with your app's policies

Static linking is a much better response for actually getting the program running on Linux systems but this still leaves out the "is there a way I can package it once for Linux" problem.

Now if it's a CLI only utility/service type tool it's a different story.

Fine, build your own package.

  # dpkg-deb --build ./src/ mydeb.deb

Stick your dependencies etc in ./src/DEBIAN/control

Or just distribute a tar with your application.

I used to have to distribute a cross platform Java program. It was horrendous creating the MSI on Windows. The Linux and OS X versions were trivial.

Java? Shouldn't that just be JAR + JVM?

Called native executables too, launch icons/shortcuts etc, jvm version detection

Deployed by apt/yum/mdm and some crappy Windows thing. The latter was the problem.

Every night we have a job (run by jenkins I think) download the latest ffmpeg code, add some non-approved patches, and cross compile to static binaries for x64 linux, osx, and windows (we've dropped the 32 bit linux build).

It's not hard to make a static binary.

On the other hand most of the "software" I write - mainly shell scripts - includes things like apache configuration, requires other applications (like apache, iperf, tcpdump, lldp, etc)

I want to deploy these to multiple boxes, I want them versioned, I want them easy to update, I want to know what version is installed. All of this is handled by a 10 line config file in a .deb.

Except, if you need to pay rent and put your kids through college, shipping source tarballs is going to significantly reduce revenue for a small developer. (It's sad, but it does.)

A few years ago I tried to compile a C++ project into a static binary which would work on both Ubuntu and CentOS.

Long story short, it wasn't possible. Because the latest versions of Ubuntu and CentOS at the time used different versions of glibc which were mutually incompatible.

If you statically link, it doesn't matter what version of glibc is on the target machine, because it's not used.

You would think so, but it's not true. I struggled with this for a few days before giving up (after finding articles which stated that this is really going where nobody went before)

> Static linking of glibc is not supported on Red Hat Enterprise Linux.


I don't pretend to know a lot about linking files, but I just checked an old static ffmbc binary I had lying around

  :/tmp$ file ffmbc
  ffmbc: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.15, BuildID[sha1]=...., stripped
  :/tmp$ ldd ffmbc
  	not a dynamic executable

It runs on the following libc versions:

  ldd (GNU libc) 2.12 (Centos 6.9)
  ldd (Ubuntu EGLIBC 2.19-0ubuntu6.14) 2.19 (ubuntu 14.04)
  ldd (Ubuntu GLIBC 2.23-0ubuntu10) 2.23 (Ubuntu 16.04)

I'm fairly sure it ran on distributions as old as ubuntu 8.04

You're supposed to bundle a fixed version of glibc with your application.

> just provide the source and let the distros package it for you.

And then sit and scratch your head over why software vendors don't sell a version for your OS.

I work for Red Hat so ... I'm not scratching my head. We sell $3bn+ of operating systems and other software annually.

Is your application open source?

If so, I wouldn't worry about packaging for every distro. Just make sure that your application isn't difficult to build, and most things (paths, etc.) are configurable. The config path can be handled with a command line switch.

Once you've gotten your application so that it can be built easily, I'd only really worry about packaging it for your distro of choice. If people are interested in your application, it'll get packaged for their distros.

If your application is proprietary, I wouldn't even worry about packaging it yet. Getting most Linux users on non-essential proprietary software will be an uphill battle.

AppImage⁰ seems to be what you're looking for. XARs maybe, but I don't have experience with them to recommend them.

⓪ - https://appimage.org/

I must say as a long time windows developer, .net core on linux is much simpler than on windows. I'm moving everything I have from windows to Linux and the experience, simplicity, speed, stability, etc are much better on linux.

I guess if you're working with a GUI that's a different story. But for .net websites and background servers, simply use docker on linux and never look back. And it's much simpler than docker for windows too.

For desktop applications, Flatpak does this. Otherwise, containers or the new "portable services" feature of systemd. Thanks to systemd, there are standardised ways of doing a lot of things, but yeah, distributions do vary, so you really need an abstraction, or target just a few of the popular distributions.

What you want is snapcraft.io

IMO AppImage is better for his needs. Snap and Flatpak are as much distribution systems as they are package formats, and neither is widely enough adopted yet to make packaging for each worth it vs just providing the binary to download yourself.

Is Facebook an NIHS (Not Invented Here Syndrome) sufferer?

Indirectly this proves to be a useful discovery mechanism for me - when tools crop up on HN I think ‘hey that’s interesting’, then oftentimes when I read the comments I find there are numerous existing solutions I had never heard of along with helpful links and insightful info :-)

It’s brilliant. It’s one of the reasons I value this site so much.

Sounds similar to TCL's StarKits

I wonder how this compares to AppImages https://appimage.org/

- AppImages: Linux apps that run anywhere

- XARs: packages Python (node.js, lua scripts app) into executable files

Thanks for providing that summary. Does this mean the two could be used together?

From my understanding:

- AppImage: it packs Linux apps into a tar file so you can unzip it later and run the executable of the app. The main selling point of AppImage is it's distro-independent

- XAR: it packs dynamic-language programs (python, node.js, lua) into an executable file. The executable includes the language runtime, a fuse filesystem to mount the program's source code.

In conclusion, I don't think you ever need to make an AppImage for a XAR executable file.

Not exactly: An AppImage file packs a Linux app as a compressed ISO image with an ELF preamble that is able to access the contents of the ISO image without unpacking, so you can just double click the package and run the app without ever unpacking it.

Is there anything XAR can do that AppImage cannot?

There are very complex Python applications (e.g., Ultimaker Cura) which are packaged in the AppImage format as self-standing single-file executables, including the Python interpreter, libraries, and other resources.

Does the XAR file contain the Python executable itself, or does running it rely on having Python installed on the host already?

The current Python XARs rely on Python being on the system path. But it would be easy to build a custom Python XAR with the XAR builder library that includes the Python executable and makes sure to use the packaged executable.

Have you looked into Habitat? It provides a similar result with a complete build workflow that works across technologies and platforms: https://www.habitat.sh/

There's a rapidly growing library of libraries and services packaged with it: https://bldr.habitat.sh

Its build artifacts can be exported to a number of formats including container images and tarballs, maybe a XAR exporter could be built: https://github.com/habitat-sh/habitat/tree/master/components...

Seems like Habitat (which looks awesome by the way) relies on Docker. Which, if you consider performance heuristics in the article (size, cold/hot start time), may be a non-starter for what they're trying to do.

It does not rely on Docker. On a Linux system you can use the build tools and there's no dependence on Docker at all; it uses chroots for its isolated "studio" build environments. The mac and windows habitat clients use Docker to create a Linux-like environment.

One of the key innovations in habitat is that it gives you reliable dependency isolation WITHOUT needing the runtime isolation of containers to achieve it. Containers become optional.

A habitat build artifact can be installed and run natively on nearly _any_ POSIX system, the state of the system doesn't matter and you get the same behavior top-to-bottom. There are a number of "exporters" available to repackage build artifacts into different runtime formats -- Docker container is just one option

Can you expand on that? In my experience nothing about Docker implies a performance impact in terms of size or start time.

Well, there is the overhead of creating and removing namespaces each time a container is run, or communicating with the Docker daemon.

I think to most people it would be negligible, but fb operates at a scale where these normally insignificant pieces matter. I would be interested to hear more about the _why_ of a system like this over containerization.

edit: rwmj's comment has a good discussion over the benefits of this over containerization.

I promise you 100% the overhead is docker and nothing else.

How similar is this to Google PAR/SAR executables for Python and Bash scripts respectively?

Facebook's PAR is a self-extracting zip file, I assume Google's is similar. XARs are self-mounting SquashFS archives (a compressed read only filesystem). This means that XARs don't have to be extracted to a temporary directory to run, they can run in place. Zip files have to be completely extracted before running, but SquashFS decompresses pages on the fly, so startup times are much faster (especially with zstd compression).

I don't think Google's implementation of their hermetic par files is open source.

It's somewhat counter-intuitive that start times with XAR are lower than start times without it. Is fuse faster than a kernel filesystem? Even with compression?

FUSE isn't generally lighter weight than a filesystem but it can be relatively competitive for simple use cases like a read-only filesystem. Additionally, squashfs lets you pack metadata and data very tightly, and since it is a readonly filesystem, has some optimizations normal filesystems can't (how data is placed, overhead of managing metadata operations, etc). Also squashfs lets you choose how the files are laid out and compressed so that all files of a certain type, such as all .pyc files, are close together, which increases compression ratio and reduces overhead for subsequent file accesses (i.e., can reduce random disk or flash IO).
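The "files of a certain type close together" idea can be sketched as a plain ordering step. This is only an illustration of the grouping, not the XAR builder's actual code; `layout_order` is a hypothetical helper, and how the resulting order is handed to the squashfs builder is left out.

```python
import os


def layout_order(paths):
    """Order paths so files with the same extension end up adjacent.

    Ties within an extension are broken by the path itself, which keeps
    files from the same package next to each other.
    """
    return sorted(paths, key=lambda p: (os.path.splitext(p)[1], p))


files = ["a/x.pyc", "b/y.so", "a/z.py", "b/w.pyc"]
for path in layout_order(files):
    print(path)
```

Similar files placed next to each other tend to share more redundancy inside one compression block, which is the effect described above.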

In practice the timings of XAR vs filesystem are close enough to be "in the noise" -- it's when compared to PEX or PARs that the difference is quite large.

Makes sense, this is somewhat analogous to the import speedup you can get by putting all the python modules into a zip file. I tend to do that when distributing python applications on windows, where the speedup is more noticeable.
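For anyone who hasn't tried the zip trick: Python can import straight from a zip file that is on `sys.path`. A minimal sketch follows; `import_from_zip`, `bundle.zip`, and the module name are made up for the example, and a real bundle would hold the whole application rather than one generated module.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write `source` into a zip as <module_name>.py and import it from there."""
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{module_name}.py", source)
    # Putting the archive itself on sys.path lets the built-in zip importer find it.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


mod = import_from_zip("hn_zip_demo", "ANSWER = 42\n")
print(mod.ANSWER, mod.__file__)
```

The printed `__file__` points inside `bundle.zip`, which confirms the module was loaded from the archive rather than from a file on disk.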

Yep, it's similar, but squashfs is more optimized than zip files for random access like a filesystem (rather than an archive). Also, when using zstd-based squashfs files, there is much less overhead for the decompression itself, which effectively becomes free.

I spent some time today investigating what exactly is causing the difference between native and XAR start times. I confirmed the culprit is `pkg_resources.load_entry_point()`. Modern installations using wheels should avoid this overhead, and those native installations will be slightly faster than XARs:

black: 0.171 s (vs 0.208 s for XAR)
jupyter: 0.165 s (vs 0.179 s for XAR)

My test setup used the older loading method because "pip install ." won't install wheels if the wheel package isn't installed in the virtualenv.
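If anyone wants to check the `pkg_resources` cost on their own machine, a rough way is to time a fresh interpreter with and without the import. This is a sketch, not the benchmark used for the numbers above; `time_startup` is a helper made up here, and it assumes `pkg_resources` (shipped with setuptools) is importable in the interpreter being timed. If it isn't, the second figure only measures a failed import.

```python
import subprocess
import sys
import time


def time_startup(snippet, repeats=5):
    """Best-of-N wall time in seconds for `python -c snippet` in a fresh process."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=False: a missing module should not abort the measurement run.
        subprocess.run(
            [sys.executable, "-c", snippet],
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


baseline = time_startup("pass")
with_pkg_resources = time_startup("import pkg_resources")
print(f"bare interpreter:     {baseline:.3f} s")
print(f"import pkg_resources: {with_pkg_resources:.3f} s")
```

The difference between the two lines approximates the constant cost that an entry-point script pays when it goes through `pkg_resources.load_entry_point()` instead of importing the target function directly.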

Admittedly I haven’t profiled this yet, but my guess is it is a constant overhead of setting up pkg_resources that the native code uses to load the entry point.

The test against native start speed was hot, so the pages required were already in the page cache, so the filesystem shouldn’t matter.

I'm on mobile, but do you have an example of bundling a node app somewhere?

I'm curious how it compares to using something like pkg [0].

[0] https://github.com/zeit/pkg

We currently don't have a nice open source API for building node apps, but would welcome PRs that get us in this direction!

There are two ways to build a node app using the XAR builder tools:

1. Use the `make_xar` tool, which creates a XAR from a directory and takes an optional script to run on execution.

2. Use the XAR builder library to make a XAR builder that is specialized for building node apps.


Python >= 2.7.11 & >= 3.5

you need both?

Facebook spent a decade contributing virtually nothing to open source, now they're flooding the world with random projects of varying and questionable value - most developed as a result of Facebook's severe N.I.H. attitude. I'm honestly not sure which is worse.

XAR is a simple way to package and deploy Python and similar apps. Dependencies are a real issue. Anything to improve the situation and to also deploy related files is a great idea.

I don't like a lot about what Facebook does with user data and marketing but their support of open source is better than many companies. Give credit where it is due.

Kind of an unfortunate name, considering that xar (eXtensible ARchive) is already a thing: https://en.wikipedia.org/wiki/Xar_(archiver)

I'd hazard to say that almost any 3-letter abbreviation has already been taken, and many of the easy-to-pronounce ones multiple times.

Oh, there are some fabulously childish three- and four-letter abbreviations still available. Probably not very appropriate, though...


.bug is apparently free. Might make way for more apologetic file names

    $ ./good_program_i_promise.bug
