
Binctr: Static, unprivileged, self-contained containers as executable binaries - GordonS
https://github.com/genuinetools/binctr
======
jclarkcom
I worked on something similar for Windows for nearly 10 years, called
Thinstall (acquired by VMware and renamed ThinApp). I love the idea of
something like that for Linux.

We put a lot of effort into being able to run an application directly from a
compressed single-file binary: a small executable with a big payload
attached to the end of it. The payload essentially contains a mountable
filesystem. However, because Windows doesn't support doing anything like that
from an unprivileged account, we had to emulate many things Windows normally
does, including execution of binary images.

We did some tricks where executable data was stored on disk in a format that
could be directly memory-mapped and run without reading anything. This allowed
us to launch most applications over a network without extracting anything to
disk locally and achieve millisecond launch times. Pages would get pulled into
local memory by the OS as needed while the application was running.
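A minimal sketch of the appended-payload idea, assuming a made-up trailer layout (the payload length plus an 8-byte magic written after the payload); this is not Thinstall's actual format. Mapping the file with `mmap` lets the OS page payload bytes in on demand instead of copying them out to disk first. `append_payload`, `map_payload`, and `MAGIC` are hypothetical names.

```python
import mmap
import struct

# Hypothetical trailer layout (an assumption for illustration only):
#   [stub][payload][8-byte little-endian payload length][8-byte magic]
MAGIC = b"PAYLOAD1"
TRAILER = struct.Struct("<Q8s")


def append_payload(stub_path: str, payload: bytes, out_path: str) -> None:
    """Write stub + payload + trailer into one self-contained file."""
    with open(stub_path, "rb") as f:
        stub = f.read()
    with open(out_path, "wb") as f:
        f.write(stub)
        f.write(payload)
        f.write(TRAILER.pack(len(payload), MAGIC))


def map_payload(path: str) -> memoryview:
    """Return a read-only, zero-copy view of the payload backed by the file.

    Pages are faulted in by the OS on demand; nothing is extracted to disk.
    """
    with open(path, "rb") as f:
        # mmap keeps its own handle on the file, so closing f here is fine.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    size = len(mm)
    if size < TRAILER.size:
        mm.close()
        raise ValueError("file too small to hold a payload trailer")
    length, magic = TRAILER.unpack_from(mm, size - TRAILER.size)
    start = size - TRAILER.size - length
    if magic != MAGIC or start < 0:
        mm.close()
        raise ValueError("no valid payload trailer found")
    # The memoryview keeps the mmap alive for as long as the view exists.
    return memoryview(mm)[start:start + length]
```

A real launcher would call something like `map_payload` on its own executable (for example via `/proc/self/exe` on Linux) and hand the view to a filesystem parser.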

The end result was that you could create a single EXE file containing things
like Python plus a script, or something more complex like Photoshop and Word.
Just go to a new computer from a guest account, launch the EXE, and you
instantly start using that app.

I haven't been involved with it for the last 6 years, but I believe VMware
still offers it.

~~~
zozbot123
Or, you know, you could just uncompress the application in a newly-created
subfolder of C:\WINDOWS\TEMP\ and launch the binary that is to be run
afterwards (which of course is conveniently named SETUP.EXE). The way it's
always been done.

~~~
fooker
How do you stop the app from permanently writing stuff to the registry or
filesystem?

~~~
jclarkcom
ThinApp solved this by presenting a synthetic view of the filesystem and
registry to the app. This view merged the contents of the real system with the
contents of the package, so the app thought everything was where it expected
when it ran. In addition, we provided a copy-on-write sandbox of this view,
so any writes would be redirected to a local folder without the app knowing
about it. If you delete the sandbox, the app goes back to its "first run"
state.
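The merged view plus copy-on-write sandbox can be sketched as a layered lookup. This is a toy model under stated assumptions: the layer order (sandbox, then package, then real system) and the `SandboxView` class are hypothetical, and a real product intercepts the OS file and registry APIs instead of asking the app to call helpers.

```python
import os
import shutil
from typing import Optional


class SandboxView:
    """Hypothetical merged view: reads fall through layers, writes hit the sandbox."""

    def __init__(self, sandbox: str, package: str, system: str) -> None:
        # Lookup order: the sandbox shadows the package, which shadows the system.
        self.sandbox = sandbox
        self.layers = [sandbox, package, system]

    def resolve(self, rel: str) -> Optional[str]:
        for layer in self.layers:
            candidate = os.path.join(layer, rel)
            if os.path.exists(candidate):
                return candidate
        return None

    def read(self, rel: str) -> bytes:
        path = self.resolve(rel)
        if path is None:
            raise FileNotFoundError(rel)
        with open(path, "rb") as f:
            return f.read()

    def write(self, rel: str, data: bytes) -> None:
        # Whole-file writes never touch the package or system layers.
        target = os.path.join(self.sandbox, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)

    def append(self, rel: str, data: bytes) -> None:
        target = os.path.join(self.sandbox, rel)
        if not os.path.exists(target):
            source = self.resolve(rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            if source is not None:
                shutil.copyfile(source, target)  # copy-up on first modification
        with open(target, "ab") as f:
            f.write(data)

    def reset(self) -> None:
        # Deleting the sandbox returns the app to its "first run" state.
        shutil.rmtree(self.sandbox, ignore_errors=True)
        os.makedirs(self.sandbox)
```

The copy-up in `append` is what makes it copy-on-write: the original file is only duplicated into the sandbox when something first modifies it.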

~~~
hashhar
It sounds a lot like how Flatpak and friends work. I'd be interested in a few
detailed blog posts about it, if there are any.

------
pritambaral
I see that it extracts files into `/tmp/...` inside the container first before
running it. It'd be better if it could read the files directly from the image,
say, like a self-executable squashfs image.
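Reading directly from the image instead of extracting to `/tmp` can be illustrated with a format Python's standard library already parses in place. The sketch below swaps in a zip archive for squashfs (purely an illustrative assumption): `zipfile` locates the archive's central directory from the end of the file, so a launcher stub in front of it does not get in the way, and members are read without being written to disk. `build_self_contained` and `read_member` are hypothetical names.

```python
import zipfile
from typing import Dict


def build_self_contained(stub: bytes, files: Dict[str, bytes], out_path: str) -> None:
    """Write a launcher stub, then append a zip archive holding `files`."""
    with open(out_path, "wb") as out:
        out.write(stub)
    # Per the zipfile docs, mode "a" on a file that is not a zip appends a new
    # archive to it (the intended use is attaching a zip to an executable).
    with zipfile.ZipFile(out_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)


def read_member(image_path: str, name: str) -> bytes:
    # The central directory is found from the end of the file, so the stub is
    # skipped and the member is decompressed straight into memory.
    with zipfile.ZipFile(image_path) as zf:
        return zf.read(name)
```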

~~~
zozbot123
Wouldn't in-place decompression for a "self-executable" squashfs require one
to disable NX for that binary? That seems like it would be a terrible idea
from a security perspective. (Even for pure data, in-place decompression seems
like it would heavily interfere with e.g. demand paging of such data into RAM,
using the binary itself as backing store. Disk space is so cheap these days -
I don't think things like squashfs should be used nowadays unless truly
necessary.)

~~~
exrook
NX does not need to be disabled[1] for the binary thanks to W^X[0]. As long as
pages are never mapped as writeable and executable at the same time, you don't
lose any protection.

[0] [https://en.wikipedia.org/wiki/W%5EX](https://en.wikipedia.org/wiki/W%5EX)
[1] AFAIK, on Linux NX isn't something that is enabled or disabled for
individual programs anyway; it's up to userland whether or not it takes
advantage of the mmap(2) mapping flags.
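One way to see the W^X property in practice on Linux: every mapping listed in `/proc/<pid>/maps` has a permissions field such as `r-xp` or `rw-p`, and a self-decompressing loader that maps pages `rw-`, fills them, and then switches them to `r-x` with mprotect(2) never shows `w` and `x` on the same mapping. The hypothetical helper below simply flags mappings that do.

```python
from typing import Iterable, List


def wx_violations(maps_lines: Iterable[str]) -> List[str]:
    """Return /proc/<pid>/maps lines whose mapping is both writable and executable.

    Each line has the form: address perms offset dev inode [pathname],
    where perms is a 4-character field such as "r-xp" or "rw-p".
    """
    bad = []
    for line in maps_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        perms = fields[1]
        if "w" in perms and "x" in perms:
            bad.append(line.rstrip("\n"))
    return bad


def check_self() -> List[str]:
    # Linux-only: inspect the current process's own mappings.
    with open("/proc/self/maps") as f:
        return wx_violations(f)
```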

~~~
bdonlan
In general, you lose some protection, as a two-stage exploit might write some
data in W state then flip to (or wait for) X state for execution. That being
said, a self-decompressing binary would only do this at startup, before it
consumes untrusted input, so given a way to drop map-executable permissions
that wouldn't be a problem.

------
mroche
> Well judging by the original GitHub issue about unprivileged runc
> containers, the largest group of commenters is from the scientific community
> who are restricted to not run certain programs as root.

Which is why we use Singularity[0]. Apologies if I’m missing something, but to
me the problem they’re trying to solve has already been solved. As stated in
the README:

> Singularity is an open source container platform designed to be simple,
> fast, and secure. Singularity is optimized for EPC and HPC workloads,
> allowing untrusted users to run untrusted containers in a trusted way.

Singularity ‘requires’ root or elevated privileges to create and modify
containers, but running them is done in the user’s namespace with that user’s
permission set. And privileges can’t be elevated during the container’s runtime
as there are kernel level catches that prevent sudo and su from working inside
the container.

In terms of the ‘static’ part, Singularity uses SquashFS to produce a single
binary image, and uses the SIF (Singularity Image File) format to define
applications, their dependencies, and runscripts. On top of that, you can run
anything inside the container with the `exec` subcommand.

You can also create sandbox development images that are just a directory on
disk for testing things out and installing/building tools manually, then
convert it into a SIF container afterwards for cluster deployment. Or be nicer and
port your process into a recipe/definition file.

Most scientific/HPC related centers have started utilizing Singularity for
their container workflow. Tools like Docker are essentially banned because of
their perma-root privileges[1].

[0]
[https://github.com/sylabs/singularity](https://github.com/sylabs/singularity)
[1] I’m an admin who reimplemented Singularity on our university’s cluster.
With a heavy need for filesystem integration (which Singularity solves with
great configuration options), containers which run with root privileges would
be a massive security vulnerability.

~~~
cyphar
I started the rootless containers project and have had long email threads with
the Singularity folks.

> Which is why we use Singularity[0]. Apologies if I’m missing something, but
> to me the problem they’re trying to solve has already been solved.
> Singularity ‘requires’ root or elevated privileges to create and modify
> containers,

Well, Singularity came out at the same time as runc's rootless containers
support was developed -- so at the time it wasn't solved at all. It required
(and still requires for several operations and features) root privileges in
order to work (such as suid helpers in addition to explicit root
requirements).

It's not acceptable for a use case where you cannot run anything as root (not
even installation scripts). That was the use case, and rootless runc (and now,
thanks to Akihiro and Giuseppe, rootless Docker and Kubernetes) can in theory
work without the need for any suid helpers -- though these days there is
increasing usage of newuidmap and newgidmap (which aren't mandated by the
rootless runc implementation).

At the time, LXC's unprivileged containers were the closest thing, and they had
a few (mostly optional) suid binaries -- I wanted absolutely none.

> but running them is done in the user’s namespace with that user’s permission
> set. And privileges can’t be elevated during the containers runtime as there
> are kernel level catches that prevent sudo and su from working inside the
> container.

This last part really isn't revolutionary at all. All it really takes is a
syscall and three files to write to. The hard part is getting everything else
to work without root privileges.
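A sketch of what "a syscall and three files" most likely refers to, assuming the usual rootless recipe: `unshare(CLONE_NEWUSER)`, then writing `/proc/self/uid_map`, `/proc/self/setgroups`, and `/proc/self/gid_map` so that the caller's own uid/gid appear as 0 inside the new namespace. It needs Linux, Python 3.12+ for `os.unshare`, and a kernel that permits unprivileged user namespaces; both function names are hypothetical.

```python
import os
from typing import List, Tuple


def idmap_writes(uid: int, gid: int) -> List[Tuple[str, str]]:
    """The three /proc writes mapping root inside the namespace to (uid, gid) outside.

    Order matters: an unprivileged process must write "deny" to setgroups
    before the kernel lets it write gid_map.
    """
    return [
        ("/proc/self/uid_map", f"0 {uid} 1\n"),
        ("/proc/self/setgroups", "deny\n"),
        ("/proc/self/gid_map", f"0 {gid} 1\n"),
    ]


def enter_user_namespace() -> None:
    """Unshare a user namespace and become uid/gid 0 inside it (no setuid helpers)."""
    uid, gid = os.getuid(), os.getgid()  # capture the outer IDs first
    os.unshare(os.CLONE_NEWUSER)         # the one syscall
    for path, content in idmap_writes(uid, gid):
        with open(path, "w") as f:       # each mapping must go in a single write
            f.write(content)
```

The kernel rejects `CLONE_NEWUSER` from a multithreaded process, so in practice this would run early in a fresh single-threaded process (for example a forked child) before any container setup.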

~~~
mroche
Are you a dev on binctr? If so, the following statements assume that you
are.

> Well, Singularity came out at the same time as runc's rootless containers
> support was developed -- so at the time it wasn't solved at all. It required
> (and still requires for several operations and features) root privileges in
> order to work (such as suid helpers in addition to explicit root
> requirements).

Alright, but I wasn’t asking about a “back then” scenario, I was mostly trying
to figure out what this offers _today_. From a Singularity end-user’s
perspective, there’s only one function that requires elevated permissions, and
that’s building.

> It's not acceptable for a usecase where you cannot run anything as root (not
> even installation scripts). ... At the time, LXC's unprivileged containers
> was the closest thing and it had a few (mostly optional) suid binaries -- I
> wanted absolutely none.

Now I understand what this offers and what its overarching goal is. In my specific
situation (which is the angle I’m looking at this from), that’s not really a
big deal due to the way we handle global and local software installation. I
understand this is not a universal situation.

> This last part really isn't revolutionary at all.

I don’t think I made it sound like it was revolutionary, I was just sharing
what a commonly used tool in HPC does for a security measure.

Based off of the binctr README:

> Create fully static, including rootfs embedded, binaries that pop you
> directly into a container. _Can be run by an unprivileged user._

My main confusion stems from that. In comparison to Singularity, it doesn’t
seem to offer anything new or noteworthy that would make me take a second look
at it. The first phrase just makes me think I’m saving myself a few keystrokes
as I wouldn’t have to type `singularity shell <container>` to shell into my
container of choice. It just looks like yet another container solution. I
still find these things cool (and they’re above my head in dev terms), but it
feels like programming languages, a new one popping up every so often.

I’m primarily looking to see what advantages (or differences in approach) this
offers over Singularity, Shifter, or Charliecloud, etc for my context. At
least in terms of security and efficiency.

~~~
cyphar
> Are you a dev on binctr?

No, but I'm a maintainer of runc and implemented all of the core features that
binctr uses (or rather, that Jessie hacked together for a proof of concept).
I've also implemented rootless support in quite a few other tools to the point
where now an unprivileged user can download, extract, and run a rootless
container. In addition, I'm working with some other folks who joined later to
get Kubernetes (and Docker) to be completely rootless.

> From a Singularity end-user’s perspective, there’s only one function that
> requires elevated permissions, and that’s building.

(Most forms of) execution still require setuid binaries, which means that you
have to have root permissions in order to install Singularity. You can use
rootless containers as a _completely unprivileged user_ (meaning that if you
have unprivileged shell access to a random box, you can use containers).
Singularity cannot do this.

> I was just sharing what a commonly used tool in HPC does for a security
> measure.

My point was that basically any container runtime can do what you described.

> I’m primarily looking to see what advantages (or differences in approach)
> this offers over Singularity, Shifter, or Charliecloud, etc for my context.
> At least in terms of security and efficiency.

Many of those tools require setuid binaries that are developed as part of
their container runtime -- which means that all of the security is contingent
on there being no vulnerabilities in their setuid binaries (something that is
hard to achieve _even for the developers of programs like sudo_).

Rootless containers are a project that requires _absolutely no privileged
codepaths for any container operation_ (though these days you can optionally
use standard setuid binaries -- by default they are not used). This is
something that none of the projects you listed (as far as I'm aware) can do.
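Since this security argument rests on which setuid binaries a runtime installs, a quick audit is easy to sketch: walk an install prefix and list regular files carrying the setuid or setgid bit. `find_setid_files` is a hypothetical helper; a real audit would also look at ownership, since setuid-root binaries are the ones that matter most.

```python
import os
import stat
from typing import List


def has_setid_bit(mode: int) -> bool:
    """True if a file mode carries the setuid or setgid bit."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_setid_files(root: str) -> List[str]:
    """Walk `root` and return regular files with setuid/setgid set, sorted."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and has_setid_bit(st.st_mode):
                hits.append(path)
    return sorted(hits)
```

For a fully rootless setup the expected result for the runtime's own install prefix is an empty list.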

~~~
gnufx
Thanks for piping up.

Charliecloud specifically doesn't require setuid, but many HPC systems won't
have user namespaces enabled. (It removed the setuid component which allowed
it to work on RHEL6, for instance.) You can also easily build a root for it
under proot, for instance -- and proot itself is probably OK for running
computationally-intensive work. HPC systems already have a privilege
escalation mechanism available from the resource manager, but I'm not aware of
a problem with using it for this sort of thing. Shifter controls the
images, so you're not at risk from malformed filesystems, at least.
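Whether unprivileged user namespaces are available varies with kernel and distribution configuration. A heuristic check, assuming the two sysctls most commonly involved: `/proc/sys/user/max_user_namespaces` (a value of 0 disables user namespaces) and the Debian/Ubuntu-specific `/proc/sys/kernel/unprivileged_userns_clone`. Other policies (security modules, or older kernels without these files) can still block them, so the only definitive test is to actually attempt `unshare` in a child process. The function names are hypothetical.

```python
from typing import Optional


def userns_allowed(max_user_namespaces: Optional[str],
                   unprivileged_userns_clone: Optional[str] = None) -> bool:
    """Heuristic: False if a known sysctl disables user namespaces.

    None means the file was absent; absence is treated as "not disabled".
    """
    if max_user_namespaces is not None and int(max_user_namespaces.strip()) == 0:
        return False
    # Debian/Ubuntu-specific knob; missing on most other kernels.
    if unprivileged_userns_clone is not None and int(unprivileged_userns_clone.strip()) == 0:
        return False
    return True


def read_sysctl(path: str) -> Optional[str]:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def check_host() -> bool:
    return userns_allowed(
        read_sysctl("/proc/sys/user/max_user_namespaces"),
        read_sysctl("/proc/sys/kernel/unprivileged_userns_clone"),
    )
```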

Definitely setuid in Singularity should worry people. It has a non-stellar
security record, and has been less than transparent about issues. The last
time I looked at the code for instance it still had many calls unchecked for
error returns (including all mallocs), and wrote uninitialized memory to image
files. It was a mistake to get it into Fedora, and HPC people should be a bit
more circumspect. That said, resource managers probably provide most attack
surface in HPC systems, in some cases completely trivially.

By the way, it's often possible just to run programs from an unpacked
filesystem of another distribution just by setting PATH and LD_LIBRARY_PATH.
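The PATH/LD_LIBRARY_PATH trick can be sketched as an environment builder. The directory lists are assumptions (Debian-style multiarch directories such as `usr/lib/x86_64-linux-gnu` would need adding), and `foreign_rootfs_env` and `run_from_rootfs` are hypothetical names. One caveat: the ELF interpreter path baked into a dynamically linked binary (for example `/lib64/ld-linux-x86-64.so.2`) is still resolved on the host, so if the two glibc versions differ you may need to run the rootfs's own loader explicitly, e.g. `<rootfs>/lib64/ld-linux-x86-64.so.2 --library-path ... <binary>`.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Mapping, Optional, Sequence

# Assumed layout of the unpacked distribution; adjust per distro.
BIN_DIRS = ("usr/local/bin", "usr/bin", "bin")
LIB_DIRS = ("usr/lib64", "lib64", "usr/lib", "lib")


def _prepend(rootfs: str, subdirs: Sequence[str], existing: str) -> str:
    """Search the rootfs directories first, then whatever was already set."""
    parts = [os.path.join(rootfs, d) for d in subdirs]
    if existing:
        parts.append(existing)
    return os.pathsep.join(parts)


def foreign_rootfs_env(rootfs: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with rootfs dirs searched first."""
    env = dict(os.environ if base is None else base)
    env["PATH"] = _prepend(rootfs, BIN_DIRS, env.get("PATH", ""))
    env["LD_LIBRARY_PATH"] = _prepend(rootfs, LIB_DIRS, env.get("LD_LIBRARY_PATH", ""))
    return env


def run_from_rootfs(rootfs: str, argv: List[str]) -> int:
    env = foreign_rootfs_env(rootfs)
    # Resolve argv[0] against the new PATH explicitly.
    exe = shutil.which(argv[0], path=env["PATH"]) or argv[0]
    return subprocess.run([exe, *argv[1:]], env=env).returncode
```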

~~~
mroche
Not sure when you last looked at Singularity's code base, but it was
completely re-written this year into Go, so some of the issues you saw may
have been solved. Not sure, just letting you know if you haven't been
following its development.

> It was a mistake to get it into Fedora, and HPC people should be a bit more
> circumspect.

I never installed it from EPEL, as building from source was relatively easy in
2.x (trivially easy in 3.x) and allowed buildtime configurations specific to a
cluster. If you pay for support you get custom repos, but that's up to the
cluster maintainers to decide.

~~~
gnufx
I know it's been re-written, but I don't know why that would restore my faith
in it and Sylabs (for multiple reasons). It emphasizes the point about
inclusion in Fedora (and other distributions?) for which there's some
expectation of security and stability, even if you don't care about that.

------
zbentley
The container revolution appears to have come full circle and reinvented
static linking.

~~~
markbnj
Which language produces statically linked binaries that embed their own root
filesystem?

~~~
TheDong
Smalltalk, which distributes code as a VM image.

Java jars aren't far off from having their own rootfs included along with the
compiled class files.

------
amelius
Why does the Linux kernel/Posix make sandboxing so difficult?

~~~
shawnee_
It doesn't, really. Since there are umpteen ways to do things in GNU/Linux,
developers often end up with less-than-ideal implementations for what they are
trying to do using a single user account.

The single-user approach creates clashes between the network devs who want to
build empires of containers they "own", and the stack-level bare metal purists
who want the system to be as clean and secure as possible by isolating things
where they should be isolated (to a single user instance for that purpose
alone). This is not a new problem, nor a very well thought-out solution.

Containers are _always_ a less-than-ideal implementation for people running
Linux natively. The ideal way to sandbox in Linux is to create a user account,
download and test whatever code, see what breaks or infringes on its unique
notion of "privileges", and delete the user when done.
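The create/test/delete workflow described above, sketched with the standard shadow-utils and sudo commands (this needs root or sudo rights; the user name and both helper functions are hypothetical). The `finally` clause makes sure the throwaway account is removed even if the tested command fails.

```python
import subprocess
from typing import List, Sequence


def throwaway_user_plan(user: str, cmd: Sequence[str]) -> List[List[str]]:
    """argv lists for: create a user, run cmd as that user, delete the user."""
    return [
        ["useradd", "--create-home", user],
        ["sudo", "-u", user, "--", *cmd],
        ["userdel", "--remove", user],  # --remove also deletes the home dir
    ]


def run_as_throwaway_user(user: str, cmd: Sequence[str]) -> int:
    create, run, delete = throwaway_user_plan(user, cmd)
    subprocess.run(create, check=True)
    try:
        # Return the tested command's exit status instead of raising on failure.
        return subprocess.run(run, check=False).returncode
    finally:
        subprocess.run(delete, check=True)
```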

But because you can't switch users on the same kernel when you're not running
Linux natively, we have containers and all the messes they create.
[https://developers.slashdot.org/story/12/12/29/018234/linus-...](https://developers.slashdot.org/story/12/12/29/018234/linus-chews-up-kernel-maintainer-for-introducing-userspace-bug)

~~~
icebraining
I have to admit that I'm quite confused about this comment. Are you saying
simply running a command under its own uid is enough to provide the same
isolation that containers do, and that the latter were only created because
people are not running Linux natively?

~~~
g82918
I think he is saying there are a lot of 80% solutions like user based
isolation that could have been made more secure, but instead people invented a
new solution that has its own problems, and that the fractured landscape of
solutions we see now is due to the freedom of open source.

------
hkt
Strongly reminds me of UML: [https://en.m.wikipedia.org/wiki/User-mode_Linux](https://en.m.wikipedia.org/wiki/User-mode_Linux)

~~~
rkeene2
I used UML to do this very thing 15+ years ago!

------
Annatar
So basically a full circle on running single binary executable processes, a
multitude of them, as was done on a UNIX system back in the '90s, only with
more complexity (this effort is an attempt to reduce that complexity, which in
and of itself is telling).

And still nowhere near the flexibility, security, the power and simplicity of
illumos zones and SMF.

~~~
p2t2p
Well, you can't statically link a bunch of Python scripts. With this one you
can do it (relatively) easily.

~~~
progval
Yes you can: eggs and wheels.

~~~
filmor
Still needs Python itself.
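To make the point concrete: Python's standard `zipapp` module bundles a script (plus any pure-Python dependencies vendored into the same directory, e.g. with `pip install --target`) into a single `.pyz` file, but running it still requires a Python interpreter on the host. `bundle` is a hypothetical helper.

```python
import pathlib
import tempfile
import zipapp


def bundle(script_source: str, out: pathlib.Path) -> pathlib.Path:
    """Pack a one-file app into a .pyz; it still needs an interpreter to run."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp)
        (src / "__main__.py").write_text(script_source)
        # The shebang makes the archive directly executable on POSIX systems,
        # but only if a python3 is installed on the host.
        zipapp.create_archive(src, target=out, interpreter="/usr/bin/env python3")
    return out
```

Usage: `bundle('print("hi")\n', pathlib.Path("hi.pyz"))`, then run it with `python3 hi.pyz` (or `./hi.pyz` on POSIX); either way an installed interpreter does the work.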

------
nwmcsween
I'm not too excited about embedding the data (rootfs) within the image. I'd
rather have a .preinit_array section that applies a seccomp-bpf filter, calls
unshare, etc., with the filter/unshare/etc. data stored in their own sections
(this allows zeroing a section to get the original behaviour).

------
cat199
how does this compare to:

[https://en.wikipedia.org/wiki/Singularity_(software)](https://en.wikipedia.org/wiki/Singularity_\(software\))

~~~
TheDong
Requires set-uid, it's not fully unprivileged containers.

------
hden
The same idea two years ago:
[https://github.com/bfirsh/whalebrew](https://github.com/bfirsh/whalebrew)

~~~
cyphar
This project was written in 2016 too.

------
equalunique
The advantages of this are kind of like two security features recently added
to OpenBSD: pledge & unveil

------
sandGorgon
Isn't CNAB doing this (and more), and isn't it going to be more widely adopted?

[https://github.com/deislabs/cnab-workshop/blob/master/conten...](https://github.com/deislabs/cnab-workshop/blob/master/content/01-what-is-cnab.md)

~~~
solarengineer
Context: [https://cnab.io](https://cnab.io)

