Docker on FreeBSD (freebsd.org)
232 points by zdw on July 18, 2015 | 66 comments



The FreeBSD people seem to be on a roll when it comes to porting things. They mention a 64-bit Linux compatibility layer they recently rolled out in this post. But in addition, some iXsystems employees have ported much of the OSF Mach kernel (sans memory object/external pager interface) as a module, plus partial or full implementations of XPC, libdispatch, ASL (Apple System Logger), liblaunch and other facilities just so they can run the low-level Darwin/OS X userspace and launchd, especially. FreeNAS then wrote Cython bindings to all this for use in their system services. This is all in feature branches at the moment, but it should be rolling out soon enough.

It's insane.


To be honest, I think OpenBSD is ahead on some fronts (research on security mechanisms, for instance), but the reason that I tend to use FreeBSD is that it actually works with software packages.

So the computers are based on FreeBSD (which manages the hypervisor, and soon the container daemon!), and only the appliances are based on OpenBSD.

Compatibility ftw.


> it actually works with software packages.

Well I don't know about that. I can't open Settings in Chromium at all anymore, the whole browser crashes. It also constantly uses 100% CPU for something so with Chromium open my load is always >1.

In VirtualBox none of the file open dialogs work.

If I switch from X to a console, the whole screen glitches and gets stuck; I have to SSH in to restart/stop X.

Suspend doesn't work for me.

So yeah, it doesn't work on the desktop as well as Linux does. Haven't had any problems with it as a server though.


The Chromium issue was fixed recently. Try updating your Chromium to at least 43.0.2357.132.

The VirtualBox issue is long-standing; it's caused by the fact that vbox is setuid. Try this:

  env KDE_FORK_SLAVES=1 VirtualBox


Thanks, that fixed it.


There is a bug in the kernel that returns the wrong error message when Chrome/Chromium attempts to get a variable. This will be fixed in 10.2.

See: https://twitter.com/cperciva/status/619969753566744576


Oh and about the X thing, do you have a Haswell processor? And are you using the VGA driver for X? I had that, bought a $20 ATI card, works fine now, 3D acceleration in KWM and everything.


Sandy Bridge actually, i7 2600k. I have an ATI card too, I might try that. Wasn't too sure how well those work on FreeBSD.


I would go with nvidia - they work beautifully on FreeBSD IME.


This part of this thread reminds me of Linux circa 1997.


Welcome to a completely volunteer run effort.


VirtualBox: sounds like some kind of ports problem, maybe GTK vs. Qt? Works for me on PC-BSD and FreeBSD -CURRENT under KDE.

Suspend/resume should be working pretty well in -CURRENT; unfortunately, 11.0 is still a ways out if you want a -RELEASE.


I use OpenBSD and haven't found software to be an issue. Certainly mainstream stuff such as Chromium, Firefox, LibreOffice, VLC, Evince, Gimp, Inkscape... all works.


The only piece of software I really miss on OpenBSD is the Android SDK; other than that, it has everything I use and it works great. So once again, to everyone reading this: if you haven't tried it and your hardware is supported, I invite you to give OpenBSD a spin.


I've been looking at OpenBSD, but the partitioning setup (fdisk) is off-putting. Why are we counting heads/tracks/cylinders in 2015? Also, the disklabel step is weird, but I could live with that.


> research on security mechanisms, for instance

Citation(s) requested.


http://youtu.be/OXS8ljif9b8

Edit: In the first 5 minutes, de Raadt talks about mitigation techniques in comparison with other OSs, including FreeBSD. I fear this is still true in mid-2015.


FreeBSD has a lot more energy and community behind it improving things, which makes it a better bet in the long run.

In the security area, for instance, there is very good work coming from the TrustedBSD and HardenedBSD projects that is being ported back to FreeBSD.

http://www.trustedbsd.org/

http://hardenedbsd.org/

https://en.wikipedia.org/wiki/FreeBSD#Security


Can you provide further details or examples of how these two projects did or do improve FreeBSD's security? I know, for example, about ASLR in FreeBSD 11 coming from Shawn Webb et al. of the HardenedBSD project.


I seriously doubt that ASLR is coming in FreeBSD 11.


Canaries and other techniques that leverage volatiles do not prevent an overflow; they just try to cope with the consequences of an overflow which has happened. (This is why they're generally lumped together as "mitigation".) The canary tries to detect the case of an overflow which overwrites the return address in a stack frame. Data Execution Prevention (DEP) takes this idea a step further: it assumes that the return address has been overwritten and followed, and it restricts the areas where execution could jump. ASLR is yet another step further: it randomizes where those areas (stack, heap, libraries) are placed, so the attacker doesn't know where to jump.

More specifically, stack canaries work by modifying a protected function's prologue and epilogue to place a value on the stack and to check it, respectively. As such, if a stack buffer is overrun during a memory copy operation, the corruption is noticed when the function that owns the buffer is about to return, before the overwritten return address is used. The weak spot is everything that happens before that check. On Windows, exception handler records also live on the stack: if the overflow can overwrite one and then trigger an exception (for example by writing past the end of the stack), the exception dispatcher jumps to the handler pointer you control before the epilogue ever runs. This is a Structured Exception Handling (SEH) exploit, and it allows you to completely bypass the canary check.
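The prologue/epilogue idea can be simulated in plain C. This is a toy model, not what a compiler emits: real canaries are inserted by flags such as GCC/Clang's `-fstack-protector-strong`, and the real frame layout is compiler-dependent. Here the "frame" is a hypothetical struct with the guard word placed directly after the buffer, the fixed canary value is an assumption (real ones are random per process), and `copy_with_canary` is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame: a buffer with a guard word directly after it,
   mimicking a canary sitting between locals and the saved return address. */
struct toy_frame {
    char buf[16];
    uint32_t canary;
};

/* Assumption for the demo: real canaries are randomized at process start. */
static const uint32_t CANARY_VALUE = 0xDEADC0DEu;

/* "Prologue" places the canary, the body does an unchecked copy,
   the "epilogue" checks it. Returns false when an overflow is detected. */
bool copy_with_canary(const char *src, size_t len) {
    struct toy_frame frame;
    frame.canary = CANARY_VALUE;                          /* prologue */

    /* The copy targets the whole frame object, so a long input runs past
       buf into the canary, as a real overflow would. The clamp only keeps
       the demo itself inside the struct. */
    unsigned char *dst = (unsigned char *)&frame;
    size_t max = sizeof frame;
    memcpy(dst, src, len < max ? len : max);

    return frame.canary == CANARY_VALUE;                  /* epilogue check */
}
```

Note that the check only runs at the end: anything the overflow triggers before the epilogue (such as a hijacked exception handler) escapes it.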

DEP and NX (what OpenBSD calls W^X) mark data regions such as the stack and heap as non-executable, so trying to execute code there causes a hardware-level fault. This makes classic stack buffer overflows, where you set eip to esp+offset and immediately run your shellcode, impossible, because the stack is non-executable. Bypassing DEP and NX requires a trick called Return-Oriented Programming (ROP).
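From the defender's side, a program that must generate code at run time (a JIT, say) can stay compatible with W^X by never having a page writable and executable at the same time: map it read/write, fill it, then flip it to read/execute. A minimal POSIX sketch, assuming `mmap` supports `MAP_ANONYMOUS` (it does on FreeBSD, OpenBSD and Linux). `load_code_wx` is a hypothetical name, and the bytes are placeholders that are never executed, since real machine code would be architecture-specific.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and sysconf on glibc in strict modes */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copies `len` bytes into a fresh page, then makes the page read+execute.
   The page is never writable and executable at once (W^X).
   Returns true if the protection switch succeeded. The bytes are never run. */
bool load_code_wx(const unsigned char *code, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page) return false;

    /* Step 1: writable, not executable. */
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    memcpy(mem, code, len);

    /* Step 2: drop write permission while adding execute permission. */
    bool ok = mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) == 0;

    /* The bytes are still readable after the switch. */
    ok = ok && memcmp(mem, code, len) == 0;
    munmap(mem, (size_t)page);
    return ok;
}
```

Recent OpenBSD releases refuse to map memory writable and executable in a single call by default, which is why the two-step pattern is the portable one.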

ROP essentially involves finding existing snippets of code from the program (called gadgets) and jumping to them, such that you produce a desired outcome. Since the code is part of legitimate executable memory, DEP and NX don't matter. These gadgets are chained together via the stack, which contains the exploit payload. Each entry in the stack corresponds to the address of the next ROP gadget. Each gadget is in the form of instr1; instr2; instr3; ... instrN; ret, so that the ret will jump to the next address on the stack after executing the instructions, thus chaining the gadgets together. Often additional values have to be placed on the stack in order to successfully complete a chain, due to instructions that would otherwise get in the way.
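The chaining mechanism can be illustrated with a toy interpreter. This is a simulation, not an exploit: the "stack" is an array of words, each "gadget" is a C function standing in for a short instruction sequence ending in `ret`, and `ret` is modeled by popping the next word as a gadget index. The gadget names and the two-register CPU are hypothetical; `pop_a`/`pop_b` show why extra values have to be interleaved in the chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy CPU: two registers and a stack pointer into the attacker-supplied chain. */
struct cpu {
    uint64_t a, b;
    const uint64_t *sp;   /* next word of the chain */
};

/* Each gadget models "instr...; ret" and may consume immediates from the stack. */
static void pop_a(struct cpu *c)   { c->a = *c->sp++; }   /* pop a; ret */
static void pop_b(struct cpu *c)   { c->b = *c->sp++; }   /* pop b; ret */
static void add_a_b(struct cpu *c) { c->a += c->b; }      /* add a, b; ret */
static void mul_a_b(struct cpu *c) { c->a *= c->b; }      /* imul a, b; ret */

typedef void (*gadget)(struct cpu *);
/* Stand-in for the addresses of code already present in the program. */
static const gadget GADGETS[] = { pop_a, pop_b, add_a_b, mul_a_b };
enum { G_POP_A, G_POP_B, G_ADD, G_MUL, G_END = 0xFF };

/* Models the chain of rets: pop an "address", jump to it, repeat.
   Any out-of-range index (such as G_END) stops the chain. */
uint64_t run_chain(const uint64_t *chain) {
    struct cpu c = { 0, 0, chain };
    for (;;) {
        uint64_t target = *c.sp++;                /* ret: pop next address */
        if (target >= sizeof GADGETS / sizeof GADGETS[0]) break;
        GADGETS[target](&c);
    }
    return c.a;
}
```

For example, the chain `{ G_POP_A, 2, G_POP_B, 3, G_ADD, G_POP_B, 4, G_MUL, G_END }` computes (2 + 3) * 4 = 20 without any new code, only by ordering existing gadgets and their operands.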

The trick is to chain these gadgets together in order to call a memory protection function such as VirtualProtect (on Windows; mprotect() plays the same role on POSIX systems), which is used to make the stack executable so that your shellcode can then run via a jmp esp or equivalent gadget. Tools like mona (https://github.com/corelan/mona) can be used to find ROP gadgets or to generate whole gadget chains.

There are a few ways to bypass ASLR:

Direct RET overwrite - Often processes with ASLR will still load non-ASLR modules, allowing you to just run your shellcode via a jmp.

Partial EIP overwrite - Only overwrite part of EIP, or use a reliable information disclosure in the stack to find what the real EIP should be, then use it to calculate your target. We still need a non-ASLR module for this though.

NOP spray/sled - Create a big block of NOPs to increase chance of jump landing on legit memory. Difficult, but possible even when all modules are ASLR-enabled. This won't work if DEP is switched on.

Bruteforce - If a failed attempt doesn't crash the program, you can brute-force the 256 possible target addresses (8 bits of entropy) until one works.
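The brute-force case is plain arithmetic: with 8 bits of randomization there are only 256 candidate addresses, so a target that survives failed attempts falls after at most 256 tries, and after (1 + 2 + ... + 256) / 256 = 128.5 tries on average. A toy simulation, assuming a hypothetical `oracle` that reports whether a guessed slot is the randomized one, standing in for "the exploit attempt worked".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical oracle: nonzero when the guess hits the randomized slot. */
static int oracle(uint8_t guess, uint8_t secret_slot) { return guess == secret_slot; }

/* Tries every slot in order; returns the number of attempts needed
   and stores the hit in *found (0 would mean no slot matched). */
unsigned bruteforce_slot(uint8_t secret_slot, uint8_t *found) {
    for (unsigned guess = 0; guess < 256; guess++) {
        if (oracle((uint8_t)guess, secret_slot)) {
            *found = (uint8_t)guess;
            return guess + 1;
        }
    }
    return 0;
}

/* Average attempts over every possible secret: 32896 / 256 = 128.5. */
double average_attempts(void) {
    unsigned long total = 0;
    uint8_t found;
    for (unsigned s = 0; s < 256; s++) total += bruteforce_slot((uint8_t)s, &found);
    return (double)total / 256.0;
}
```

Each additional bit of entropy doubles both the worst case and the average, which is why low-entropy ASLR (as on many 32-bit systems) is the weak case.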

Again, the important theme here is that canaries, DEP and ASLR do not defeat overflows themselves, but target the generic overflow exploit methods which have traditionally been employed. The arms race between attackers and defenders in this space is becoming too specialized and, increasingly, misses the point.

Additionally, PIE (required for ASLR) has a negative impact on performance: https://nebelwelt.net/publications/12TRpie/gccPIE-TR120614.p...

Additional reading:

https://www.corelan.be/index.php/2010/06/16/exploit-writing-... https://www.corelan.be/index.php/2009/09/21/exploit-writing-...

Or if you're more academically oriented: http://www.scs.stanford.edu/brop/ http://people.csail.mit.edu/rinard/paper/oakland15.pdf

As feld indirectly points out, Capsicum is a much (much) better technology, because it traps the exploit (in a sandbox).

Capsicum extends file descriptors to include the notion of what you are allowed to do with the file. Plain descriptors already have some limited support for this: if, for example, you specify O_RDONLY to the open() system call, then you will get an error if you try writing to the resulting file descriptor. This is largely advisory, though: there is nothing stopping you from simply calling open() on the same path again in a different mode.
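The limited built-in support is easy to see with plain POSIX calls: a descriptor opened `O_RDONLY` refuses writes with `EBADF`, yet nothing stops the same process from opening the path again with more access. A minimal sketch; the function name and the `mkstemp` scratch file under `/tmp` are assumptions made for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns true if (1) writing through a read-only descriptor fails with EBADF
   and (2) reopening the same path read-write succeeds anyway. */
bool readonly_fd_is_advisory(void) {
    char path[] = "/tmp/fdrights-XXXXXX";
    int scratch = mkstemp(path);             /* create a writable scratch file */
    if (scratch < 0) return false;
    close(scratch);

    int ro = open(path, O_RDONLY);
    bool write_refused = ro >= 0 && write(ro, "x", 1) == -1 && errno == EBADF;

    int rw = open(path, O_RDWR);             /* the "workaround": just reopen */
    bool reopen_ok = rw >= 0 && write(rw, "x", 1) == 1;

    if (ro >= 0) close(ro);
    if (rw >= 0) close(rw);
    unlink(path);
    return write_refused && reopen_ok;
}
```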

This is where Capsicum enters the picture. After a call to cap_enter(), the program is in capability mode and is not allowed to create any new file descriptors via most of the standard mechanisms.

In particular, system calls like open() and socket() will simply fail. This has the advantage that it's a very simple test to perform and therefore quite easy to get right: just check one per-process flag and fail the call if the process is in capability mode.

Capability file descriptors behave just like normal ones. You can pass them to any system call that expects a file descriptor, but you may get an error if you don't have the correct rights. These include read and write permissions, and also a variety of other things.
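On FreeBSD the two halves fit together roughly like this: limit a descriptor's rights with `cap_rights_limit()`, enter capability mode with `cap_enter()`, and observe that a path-based `open()` now fails with `ECAPMODE` while the limited descriptor refuses a write with `ENOTCAPABLE`, even though it was opened read-write. A sketch assuming FreeBSD 10 or later (`<sys/capsicum.h>`); it forks first because `cap_enter()` cannot be undone, and on other systems the hypothetical `capsicum_demo()` simply returns -1.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/* Returns 1 if capability mode behaved as described, 0 if not,
   and -1 where Capsicum is unavailable (non-FreeBSD builds). */
int capsicum_demo(const char *path) {
#ifdef __FreeBSD__
    int fd = open(path, O_RDWR);            /* opened read-write on purpose */
    if (fd < 0) return 0;

    pid_t pid = fork();                     /* sandbox a child, not the caller */
    if (pid < 0) { close(fd); return 0; }
    if (pid == 0) {
        cap_rights_t rights;
        cap_rights_init(&rights, CAP_READ); /* keep only the read right */
        if (cap_rights_limit(fd, &rights) != 0 || cap_enter() != 0) _exit(2);

        char c;
        bool read_ok = read(fd, &c, 1) >= 0;
        /* O_RDWR would allow this write; the capability rights do not. */
        bool write_denied = write(fd, "x", 1) == -1 && errno == ENOTCAPABLE;
        /* Global namespaces are gone: path-based open() is refused. */
        bool open_denied = open(path, O_RDONLY) == -1 && errno == ECAPMODE;
        _exit(read_ok && write_denied && open_denied ? 0 : 1);
    }

    close(fd);
    int status;
    if (waitpid(pid, &status, 0) != pid) return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
#else
    (void)path;
    return -1;
#endif
}
```

The intended ordering is the design: acquire every descriptor the program will need, restrict each one, and only then call `cap_enter()`.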

Edit: Theo seems to be cautiously boarding the capabilities train with tame (https://marc.info/?l=openbsd-tech&m=143725996614627&w=2), introduced today.

But Capsicum, Linux's seccomp-bpf (which Theo describes as 'insane') and OS X's seatbelt are all similar. Gaol (https://github.com/pcwalton/gaol) uses either seccomp-bpf or seatbelt as a backend.
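For comparison with Capsicum's descriptor-centric model, seccomp-bpf filters system calls by number with a small BPF program. A Linux-only sketch, assuming a kernel with `SECCOMP_MODE_FILTER`: the hypothetical `seccomp_demo()` forks a child, installs a filter that makes `openat` fail with `EPERM` and allows everything else, and checks the result. A real filter should also validate `seccomp_data.arch` before trusting syscall numbers; that check is omitted here to keep the sketch architecture-neutral.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#endif

/* Returns 1 if the filter blocked openat, 0 if it did not, -1 if unsupported. */
int seccomp_demo(void) {
#if defined(__linux__) && defined(SECCOMP_MODE_FILTER) && defined(__NR_openat)
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            /* Load the syscall number. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            /* openat: fall through to the ERRNO return; otherwise skip it. */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof filter / sizeof filter[0]),
            .filter = filter,
        };
        /* Required before an unprivileged process may install a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) _exit(2);
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) _exit(2);
        /* Call openat directly: libc's open() may use either open or openat. */
        long fd = syscall(__NR_openat, AT_FDCWD, "/dev/null", O_RDONLY);
        _exit(fd == -1 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return 0;
    if (WEXITSTATUS(status) == 2) return -1;   /* filter could not be installed */
    return WEXITSTATUS(status) == 0;
#else
    return -1;
#endif
}
```

Unlike Capsicum, the policy here is a list of syscall numbers rather than a property of the descriptors themselves, which is much of why writing these filters by hand is tedious.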

Windows 8 has an equivalent of this, using a "mitigation policy" called ProcessSystemCallDisablePolicy, which is set using SetProcessMitigationPolicy(). Chrome uses this for sandboxing on Windows (https://src.chromium.org/chrome/branches/1312/src/sandbox/wi...) and uses seccomp-bpf on Linux.

See also: Solaris' Role-Based Access Control and Privileges models. http://www.c0t0d0s0.org/archives/4075-Less-known-Solaris-fea...


Thank you very much, Gonzo. This is one of the most insightful comments I have read here on HN.


You are welcome, and thank you!


It's like nobody has ever heard of Capsicum


I believe Capsicum is, or will be, used for sandboxing sshd, ping and tcpdump. Furthermore, I know of security appliances making use of it, but that's about it, to be honest.


Google uses it extensively. But really, it's the future of security because it's proactive, not reactive.


Anyone interested in a quick introduction to Capsicum: https://www.youtube.com/watch?v=GI9PmtF9jdM


Apple uses it, too.


Removal of gets() from libc.


Thanks, Ted.




As noted below, Theo seems to be cautiously boarding the capabilities train with tame. That said, there appear to be some rather large issues with the implementation as it stands.


tame(2) seems really ad-hoc. Also, isn't the path checking, like

  strncmp(path, "/tmp/", 5) == 0) {
trivially bypassable with something like /tmp/../usr/bin?
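The bypass works because the check compares characters, not the location the path actually resolves to. A sketch contrasting that prefix test with one that canonicalizes first via `realpath()`; both helper names are hypothetical. Even the canonical version is only illustrative: the path can change between the check and its later use (a TOCTOU race), so a user-space check like this is not a fix on its own.

```c
#define _DEFAULT_SOURCE   /* exposes realpath and PATH_MAX on glibc in strict modes */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* The quoted check: a plain prefix comparison on the unresolved string. */
bool naive_in_tmp(const char *path) {
    return strncmp(path, "/tmp/", 5) == 0;
}

/* Resolves "..", "." and symlinks first, then compares with the resolved /tmp. */
bool resolved_in_tmp(const char *path) {
    char tmp[PATH_MAX], resolved[PATH_MAX];
    if (realpath("/tmp", tmp) == NULL || realpath(path, resolved) == NULL)
        return false;                        /* unresolvable paths are rejected */
    size_t n = strlen(tmp);
    /* Require "<tmp>/" as the prefix so that e.g. "/tmpfoo" does not match. */
    return strncmp(resolved, tmp, n) == 0 && resolved[n] == '/';
}
```

For "/tmp/../usr/bin", the prefix test says yes, while the resolved path is "/usr/bin" and the second check says no.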


The bit about adding in Mach/OS X support is really interesting. Any idea specifically what they're looking to run?

It's intriguing on its own and could provide a basis for reimplementations of CoreFoundation, AppKit, Quartz, etc, enabling the creation of FreeBSD-based "open OS X" of sorts.


>just so they can run the low-level Darwin/OS X userspace and launchd

OP mentioned the "why", it was just oddly worded.


That would be interesting to see. There's GNUstep and the Darling project, but they're both a bit different.


Is this a really long-running project? According to [1], GCD (libdispatch) has been there since 8.1. Or is there another implementation being rolled out?

[1] https://en.wikipedia.org/wiki/Grand_Central_Dispatch


Yeah, I think this one's meant to run nearly unmodified because of an OSF Mach layer being supplied, rather than translating source code from Mach to POSIX.


Porting that much of Darwin does seem insane if the goal is just to have launchd. Is there more to it than that?


It's not just to have launchd. They really want to leverage the low-level userspace that OS X uses for dynamic systems and event handling.


Do you happen to have a link to the XPC port? I just did a cursory search but couldn't find anything.


It's in Kip Macy's work branch: https://github.com/kmacy/NextBSD

/lib/lib[asl, dispatch, launch, mach, osxsupport, xpc].

The XPC stuff is mostly Jakub Klama's work, I think.


Kip is allowed to use a computer in prison? Or was he recently released?


I have no idea what happened to "The Landlord" on that front. The story just about died in 2013; there was a brief segment on Good Morning America in 2014, and that was that. He works for iXsystems and has even given some talks. It's not like I'd bother asking him.


An article from around the time of sentencing (mid-2013) said he'd be out in about a year:

> Kip and Nicole Macy agreed to a plea deal of "four years, four months" in state prison. If a judge approves it next month, they could be out in a year, with time served.


Awesome thanks!


On a related note, if you've got a free hour, there was a recent talk at BSDCan 2015 by Maciej Pasternacki on Jetpack, a container runtime for FreeBSD

https://www.youtube.com/watch?v=8phbsAhJ-9w

https://www.youtube.com/watch?v=kJ74mgkzLxc


I just assumed we would not do Docker because we have had jails for so long. In fact, ZFS plus jails is Docker, no? I admit no familiarity with Docker (I played with LXC years ago and thought: gosh, it's like jails! :-)


Docker is a management tool for the underlying jail-like features implemented by the kernel (mainly namespaces for isolation and cgroups for resource limits). There's no reason why it can't be used to manage *BSD jails, i.e., create jails from Dockerfiles, download jail templates from a public repository, etc.


> Docker is a management tool for the underlying jail-like features implemented by the kernel...

ezjail[1] manages FreeBSD jails quite nicely too. Paired with ZFS, the GP makes a solid case IMHO.

1 - https://www.freebsd.org/doc/handbook/jails-ezjail.html


What you don't get is 100% compatibility with all the Docker containers out there.


Docker is not a "jails" clone. (Linux has had jail-like containers, called LXC, for almost 10 years now.)


Docker runs on top of lxc if I am not mistaken.


It used to run on top of lxc but it was replaced by https://github.com/opencontainers/runc if I'm not mistaken.


Not anymore - it used to. Still uses Linux namespaces and changeroots.


I moved a server off FreeBSD two days ago because it lacked Docker support. Kicking myself right now...


Probably a good idea for a production server. From FreeBSD: Docker on FreeBSD should be considered experimental.


Docker in Production should be considered experimental

There, fixed that for you.


I really like a lot of things about FreeBSD and would love to try it in the cloud, but I feel like I'm missing something. Running FreeBSD on top of ZFS seems to be the smart way to go, and yet this requires 1 GB of RAM minimum. If I spin up an instance of Ubuntu using ext4 by default, the OS uses around 50 MB of RAM total. I feel like only the big kids who run on dedicated servers or who are paying for a larger instance get to use FreeBSD.


There's nothing wrong with UFS+SUJ. FreeBSD is awesome for a lot more reasons than just ZFS, and it works well in low memory environments (assuming that you're willing to tweak the kernel/installation a little bit; for example, see https://www.freebsd.org/doc/en/articles/nanobsd/index.html). That said, the smallest current generation instance from Amazon comes with 1 GB RAM, which is more than enough memory to experiment with everything FreeBSD has to offer.


Good point - I just wanted to try out BSDploy - http://docs.bsdploy.net/en/latest/ - and found I had to use ZFS


Hey, that's neat. Thanks for the link!


I hope Sony will implement this on the PS4 to run a PS3 emulator for backwards compatibility, in response to the Xbox One's recent emulation implementation.


Does this mean we'll expect a native port on Mac OS X soon? :)


No, since it uses FreeBSD features that are not in OS X.



