Now the only thing jails can't have is their own IP stack. Jails share the host's IP stack, which improves efficiency and simplifies most deployments, but it prevents them from having administrative access to the IP stack. There is an experimental kernel feature (VIMAGE) that allows jails to run their own instance of the IP stack, but there are still some nasty bugs hiding in this code, because nobody originally thought about how to tear down the IP stack. After all, it was initialised once during the boot process and kept running until the power went out.
The largest difference is in the mindset behind jails. Jails are designed as secure operating system level virtualisation. Docker, on the other hand, is fairly fragile and offers secure isolation neither between containers nor between containers and the host. Jails can contain a complete userland, and this is a very common setup.
A full FreeBSD userland plus some ports/packages to make it useful is about 1GB. This was a lot 15+ years ago when jails were created, which is why older jail management tools like ezjail reduce the per-jail storage requirements with nullfs and unionfs hacks. These days 1GB isn't that much for a simple container, and most FreeBSD servers run on ZFS. ZFS offers a much simpler and cleaner way to reduce storage requirements: just clone a snapshot (the template), create a new jail, and copy a few config files into the clone. The only problem is that you can't rebase your clone.
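For anyone who hasn't done the clone dance before, here is a rough sketch of it. It is Python only so the steps are explicit; it prints the zfs commands instead of running them, and the dataset and snapshot names are made up for illustration.

```python
# Sketch only: builds the zfs(8) commands for a template-based jail.
# It does not execute anything. All dataset names are hypothetical.

def clone_jail_commands(template, snapshot, jail_dataset):
    """Return the commands that snapshot a template and clone it for a jail."""
    snap = f"{template}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],             # freeze the template
        ["zfs", "clone", snap, jail_dataset],  # writable copy-on-write clone
    ]

if __name__ == "__main__":
    for cmd in clone_jail_commands("zroot/jails/template", "base", "zroot/jails/www"):
        print(" ".join(cmd))
```

The clone stays tied to the snapshot it was created from, which is where the "can't rebase" limitation comes from.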
Docker is designed around the idea of single purpose containers without stable storage. You can use jails to implement this idea, but FreeBSD jails support more than that. Also all the FreeBSD jail managers I used try to stay out of your network configuration as far as possible and at most configure alias IP addresses on existing interfaces.
Docker is very opinionated software fighting against limitations imposed on it by the Linux kernel. Jails are a FreeBSD kernel feature touching multiple parts of the kernel, with a minimal userland interface in the FreeBSD base system. Multiple higher-level jail managers are available in the FreeBSD ports tree.
There is no reason why you couldn't implement a Docker-like jail manager, and the jetpack project started doing exactly this. Keep in mind that Docker images are the world's new statically linked binaries for people who can't figure out how to define and reproduce the relevant parts of their development environment in their production environment. Executing existing Docker images with their Linux binaries would probably require a massive update to the Linux compatibility layer (a reimplementation of the Linux syscall ABI).
I hope we'll get VIMAGE into GENERIC for FreeBSD 11.1. He completely refactored the IP stack shutdown logic. Now the layers are shut down from top to bottom instead of the other way around. This avoids most of the nasty locking problems by draining the higher levels, with their pointers to the lower-layer resources (e.g. routes, interfaces), first. The remaining bugs won't be found without a lot more exposure.
Please elaborate... Otherwise, great summary.
There have been lots of bugs (some could be considered design failures) allowing processes to escape from a docker container. Fixing those problems hasn't received the attention from the docker community I would expect from a serious OS level virtualisation community. In some cases it boiled down to "yeah just don't do that" or "no problem just put docker inside a VM".
Why is this NEVER brought up when FreeBSD and Linux are compared?
You can prevent root inside a jail from modifying files with several mechanisms. The simplest is to not allow (re-)mounting from inside the jail, which is the default, and to mount the relevant file systems read-only. BSD extended flags offer a finer granularity, and by default root inside a jail isn't considered privileged by chflags(2). You can disable this security feature, in which case the normal rules apply and access is controlled by the secure level (jails have their own secure level).
Processes running inside a jail have access to the full UID and GID namespace without any mappings.
PIDs exist in a global namespace and are allocated by the kernel. Processes inside a jail can't address PIDs unless they are inside the same jail or a child jail. Root on the host can always address all PIDs. Non-root processes on the host (and in jails) are subject to the security.bsd.see_other_uids and security.bsd.see_other_gids sysctls, which can be disabled to block the obvious ways to spy on other users. There is also support for various forms of mandatory access control. Have a look at the `mac_*` manual pages for more information on the available MAC policies.
Restricting jails to just their own processes in a shared namespace is a much cleaner alternative to losing the PID as a unique global process identifier. And mapping between different scopes is even worse than extending the identifier from an integer into a tuple.
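To make the visibility rule concrete, here is a toy model of it in Python. This is not how the kernel implements it; the class and function names are invented for illustration, and it deliberately ignores the see_other_uids/see_other_gids sysctls and MAC policies.

```python
# Toy model, not kernel code: jails form a tree and PIDs stay global.
# A process may address a PID only if the target lives in its own jail
# or in a jail nested below it. The host is the root of the tree.

class Jail:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None means this is the host

    def contains(self, other):
        """True if other is this jail or one of its descendants."""
        while other is not None:
            if other is self:
                return True
            other = other.parent
        return False

def can_address(viewer_jail, target_jail):
    """Apply the jail visibility rule to a viewer and a target process."""
    return viewer_jail.contains(target_jail)

host = Jail("host")
web = Jail("web", parent=host)
db = Jail("db", parent=host)
web_child = Jail("web.child", parent=web)

if __name__ == "__main__":
    print(can_address(host, db))        # host sees every jail
    print(can_address(web, web_child))  # parent jail sees its child
    print(can_address(web, db))         # sibling jails are hidden
    print(can_address(web_child, web))  # child cannot look upwards
```

Note that nothing here renumbers PIDs: the identifier stays a single global integer and only the set of addressable processes shrinks.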
FreeBSD also supports hierarchical resource limits. Jail ids are one possible subject type to limit. That way you can limit resource consumption per jail. See https://www.freebsd.org/cgi/man.cgi?rctl and https://www.freebsd.org/cgi/man.cgi?rctl.conf for more details on that.
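As a rough illustration of what such a limit looks like, the sketch below assembles rules in the subject:subject-id:resource:action=amount format described in the rctl man page. The jail name "www" and the amounts are assumptions, the helper function is hypothetical, and the script only prints the commands; resource accounting may also need to be enabled in the kernel first.

```python
# Sketch only: formats rctl(8) rules for a jail and prints the commands.
# The jail name and the limits below are made-up examples.

def rctl_rule(subject, subject_id, resource, action, amount, per=None):
    """Build a rule string: subject:subject-id:resource:action=amount[/per]."""
    rule = f"{subject}:{subject_id}:{resource}:{action}={amount}"
    if per:
        rule += f"/{per}"
    return rule

rules = [
    rctl_rule("jail", "www", "pcpu", "deny", 50),         # % of one CPU core
    rctl_rule("jail", "www", "memoryuse", "deny", "1g"),  # resident memory
]

if __name__ == "__main__":
    for rule in rules:
        print("rctl -a", rule)
```

The same strings, without the `rctl -a` prefix, are what would go into rctl.conf to make the limits persistent.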
Am I not seeing it because I am not looking hard enough, because FreeBSD guys don't really understand this whole issue, or because it's not there?
Re: resource limits: can the CPU load per jail be limited?
Yes, you can limit resources per jail through hierarchical resource limits.
Does this answer your question, or did I misunderstand it?
Resource limits for jails can be done with rctl.
If anyone is interested, please contribute to the Open Container Initiative. https://github.com/opencontainers
If I had to move away from iocage right away, how would I preserve my jail? I don't really care what the jail is called - I just want Plex or whatever to start up when I start the jail, and to be able to keep upgrading packages inside it.
If anyone knows an updated status on the Go rewrite, please chime in!
Let's say you are a webhoster around the year 2000 and your servers have a handful of 36GB or 72GB SCSI disks. You want to protect each customer from all other customers and protect yourself from all customer scripts. This was before IA32 CPUs offered the features to support efficient transparent virtualisation, and even if they had, the resource demand per VM would have been too high. As long as your customers are happy with static file hosting everything is fine, but as soon as they want to execute some useful server-side scripts you have a problem. FreeBSD offers a way to run one HTTP server per customer inside a jail, but keeping a full FreeBSD userland (base + HTTP server + databases + scripting language + customer code) per customer would quickly fill your puny little disks. ezjail offers a neat solution to the problem: store a template just once and instantiate it with a nullfs read-only mount. Now your storage requirements are manageable at a reasonable price with the hardware of the day, and your buffer cache hit rates are better too. All of these indirection and aliasing hacks make ezjail more complicated than modern jail managers, because ezjail had to work around operating system limitations instead of taking advantage of yet-to-be-invented operating system features.
Sharing one basejail via nullfs is a useful feature. Can iocage do this?
Nullfs not only saves disk space but also allows faster updates (extract a new basejail, then switch all jails to it, without a full upgrade of each jail).
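For illustration, the shared template boils down to one read-only nullfs mount per jail. The sketch below only generates fstab-style lines; the paths and jail names are assumptions made up for this example, not necessarily the layout ezjail uses.

```python
# Sketch only: emits one read-only nullfs fstab line per jail so that
# every jail shares a single template directory. Paths are hypothetical.

def nullfs_fstab_line(template_dir, jail_root, mountpoint="basejail"):
    """Return an fstab(5) line mounting the template read-only into a jail."""
    target = f"{jail_root}/{mountpoint}"
    # fields: device, mountpoint, fstype, options, dump, pass
    return f"{template_dir} {target} nullfs ro 0 0"

if __name__ == "__main__":
    for customer in ["customer1", "customer2"]:
        print(nullfs_fstab_line("/usr/jails/basejail", f"/usr/jails/{customer}"))
```

The template exists once on disk and in the buffer cache, and each jail only stores what differs from it.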
Also in software old doesn't mean bad, and newer is not automatically better.
The iocage shell script was totally unmaintainable. I know because I forked it and used it for my own purposes, until I stopped and wrote my own thing (coincidentally, also in Go). Implementing state machines correctly in shell script is painful.
It's a little overwhelming though. FreeBSD really needs more/better jail and bhyve tools.
The distinction between jails and basejails is tricky to follow.
I don't know where the rewrite went.
At the time I had to decide, ezjail didn't work with FreeBSD 10. I'm not sure if it has been updated since.