
CoreOS: Boot on Bare Metal with PXE - philips
http://coreos.com/blog/boot-on-bare-metal-with-pxe/
======
WestCoastJustin
For anyone who does not know what PXE (pixie) boot is, it _is an environment
to boot computers using a network interface independently of data storage
devices (like hard disks) or installed operating systems._ [1] In the BIOS,
rather than booting from CD or hard disk, you would select Network.

A very simplified explanation is that the PXE-enabled network card (almost all
modern network cards support PXE, on desktops and servers) can make a DHCP
request (outside of an operating system), download a kernel and initial ram
disk via TFTP, then boot them: say for example an OS installer, a LIVE CD, or
CoreOS.

    
    
        +-----> #1 PXE makes DHCP request, 
        |          redirects to TFTP server,
        |          loads kernel/initial ram disk via TFTP
        |
        |  +--+ #2 PXE boots kernel/initial ram disk 
        |  |
        +  v
      +------------------+
      | PXE Network Card |
      |------------------|
      |     server       |
      |   hardware/os    |
      +------------------+
    

So, with a little infrastructure (DHCP, TFTP, and elbow grease), you can boot
many things over the network without even having a hard drive or cd-rom in a
machine. I use this often for installing new desktops and servers: I just boot
into the PXE menu and select the installer I want, then use something like
Puppet to configure the machine as needed. A typical RHEL install can take ~5
minutes.

In summary, it looks like CoreOS provides you with their kernel and initial
ram disk [2], then you can just boot the machine over the network without
actually installing anything. Basically running everything out of RAM, just
like a LIVE CD, most likely with the option to install to disk for persistent
storage of your images, etc.
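
As a rough sketch of the TFTP side of this, assuming PXELINUX is the bootloader being served: the script below lays out a TFTP root with a `pxelinux.cfg/default` that points at a kernel and initial ram disk. The function name, the `tftproot` directory, and the file names `vmlinuz` and `initrd.cpio.gz` are all placeholders of mine, not the names CoreOS ships; substitute whatever [2] gives you.

```shell
#!/bin/sh
# Sketch: write a minimal PXELINUX config into a TFTP root.
# The TFTP root, kernel name and initrd name are placeholders (assumptions).
write_pxe_config() {
    root=$1      # directory served over TFTP
    kernel=$2    # kernel file name, relative to the TFTP root
    initrd=$3    # initial ram disk file name, relative to the TFTP root

    mkdir -p "$root/pxelinux.cfg" || return 1
    cat > "$root/pxelinux.cfg/default" <<EOF
default netboot
prompt 0
label netboot
  kernel $kernel
  append initrd=$initrd
EOF
}

write_pxe_config ./tftproot vmlinuz initrd.cpio.gz
cat ./tftproot/pxelinux.cfg/default
```

The kernel, the initrd, and `pxelinux.0` itself (from the syslinux package) still have to be copied into the same directory by hand; newer syslinux releases also expect `ldlinux.c32` next to `pxelinux.0`.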

P.S. It is common to use NFS to mount persistent storage in these environments
too, or if you are in an HPC environment, then Lustre or something like
that.

[1] [http://en.wikipedia.org/wiki/Preboot_Execution_Environment](http://en.wikipedia.org/wiki/Preboot_Execution_Environment)

[2] [http://coreos.com/docs/pxe/](http://coreos.com/docs/pxe/)

~~~
jlgaddis
"Back in the day" (pre-PXE), we had rooms full of machines -- without hard
drives -- that booted completely across the network and mounted all of their
filesystems via NFS.

[https://en.wikipedia.org/wiki/Diskless_node](https://en.wikipedia.org/wiki/Diskless_node)

Over the last few years, I've wondered why server vendors don't ship servers
with some type of flash-based storage (or similar) -- perhaps 4-8 GB -- that's
large enough to hold an installation of (for example) VMware ESXi (or another
hypervisor) and its related configuration files, leaving any local storage
exclusively for VMs. Alternatively, you could boot the hypervisor from this
"onboard storage" and access all data across the network (i.e. NFS, iSCSI,
SAN) and not have any HDDs whatsoever in the server.

~~~
telephonetemp
Well, there's the solution of putting a USB flash drive inside the server's
case (optionally securing it with duct tape). In fact, a USB flash drive is
the recommended medium for booting FreeNAS.

~~~
derekp7
I've seen recent servers also ship with an SD or MicroSD slot on the
motherboard too. Probably a bit more secure than a USB stick that can come
unplugged easily.

------
wmf
Since I asked for this in the original CoreOS thread, let me be the first to
say thanks. I think stateless immutable servers that boot from the network and
run from RAM are going to be a great base layer to build on.

BTW, if people haven't tried PXE booting before, it's pretty easy with
dnsmasq. You can basically read the sample config file and uncomment a few
lines. I recommend experimenting with PXE in Vagrant or on a separate physical
network to avoid breaking your production DHCP.

~~~
286c8cb04bda
_> BTW, if people haven't tried PXE booting before, it's pretty easy with
dnsmasq. You can basically read the sample config file and uncomment a few
lines._

You don't even need a config file. Here's a command line snippet I have saved
for testing PXE installs:

    
    
        $ sudo dnsmasq -hdq -i en0 -p0 --enable-tftp --tftp-root=`pwd` -Mpxelinux.0 -F10.10.10.100,10.10.10.199
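
For anyone decoding the short flags, here is the same invocation spelled with dnsmasq's long option names, as best I can map them from the man page. The function only prints the command so it can be inspected before running; the function name `pxe_dnsmasq_cmd` and its arguments are illustrative, not part of dnsmasq.

```shell
#!/bin/sh
# Print the dnsmasq PXE test command using long option names.
# Mapping from the one-liner: -h --no-hosts, -d --no-daemon, -q --log-queries,
# -i --interface, -p --port, -M --dhcp-boot, -F --dhcp-range.
pxe_dnsmasq_cmd() {
    iface=${1:-en0}        # interface to serve DHCP/TFTP on
    tftp_root=${2:-$PWD}   # directory that holds pxelinux.0
    # --port=0 turns off the DNS side, leaving only DHCP and TFTP.
    printf '%s ' \
        sudo dnsmasq \
        --no-hosts \
        --no-daemon \
        --log-queries \
        "--interface=$iface" \
        --port=0 \
        --enable-tftp \
        "--tftp-root=$tftp_root" \
        --dhcp-boot=pxelinux.0 \
        --dhcp-range=10.10.10.100,10.10.10.199
    printf '\n'
}

pxe_dnsmasq_cmd en0 "$PWD"
```

The interface presumably needs an address in the same subnet as the DHCP range (10.10.10.x here) before dnsmasq will hand out leases on it.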

~~~
samstave
So this snippet makes the machine you run it from into the PXE Server?

~~~
wmf
Yes.

------
songgao
Awesome! My favorite part is this:
[http://coreos.com/docs/pxe/#state-only-installation](http://coreos.com/docs/pxe/#state-only-installation)

which would enable you to run CoreOS in memory only (loaded from PXE), but
still store all of your containers on a local filesystem. It liberates me from
having to install an OS on a cluster, but still lets me use persistent storage.

------
voltagex_
PXE support on some consumer boards is a mess - often I have to use iPXE [1]
just to get them loading from TFTP reliably. Now I've played with UEFI PXE
boot, and it seems to be even worse - instead of requesting an "x64
bootloader", the NIC seems to request a "UEFI bytecode bootloader" which I
haven't been able to supply.

[1]: [http://ipxe.org/](http://ipxe.org/)

------
sturadnidge
I'm really glad to see CoreOS taking this path, forged by the likes of
VMware's ESXi and Joyent's SmartOS. It truly is the only way to run scalable
infrastructure.

~~~
lsc
>I'm really glad to see CoreOS taking this path, forged by the likes of
VMware's ESXi and Joyent's SmartOS. It truly is the only way to run scalable
infrastructure.

I find comments like these amusing. Sysadmins have been using PXE to boot
servers... for quite some time now. Long before joyent existed, and long
before VMware was something you'd seriously run a server under. (vmware came
about around the time of the 2.1 version of the PXE standard, which was when I
was first getting my feet wet; I didn't seriously start using pxe until the
early oughts.)

Hell, /I/ built an nfs/pxe diskless cluster before joyent existed. As part of
that, I demo'd an 'initrd only' system like this coreos thing, only we were
using FreeBSD. We ended up going with / on nfs; it was way easier to update.

That said, I'm not knocking CoreOS; I might even use this. Maintaining your
own bootable initrd with root filesystem is work. I currently use distro
'rescue images' (for centos, at least, you append 'rescue' to the installer,
and it downloads a small initrd /... but it's less than optimal.)

I mean, I'm not shitting on joyent, either; I think most of the value they
bring is managing this shit ongoing, which is not a trivial amount of work. I
mean, the whole idea behind companies like Joyent is to make it so you don't
need a me screwing with your dhcp server, and there's value in that. I'm
grouchy and charge a lot of money.

------
hosay123
Good luck finding a network infrastructure and PXE server able to boot a few
hundred machines simultaneously during a power event. Yes in theory PXE boot
sounds great. In reality it's a pointless SPOF.

Also most likely makes it harder to reuse most of the trusted boot
infrastructure that already exists for Linux. So we can assume at least in the
initial release Mallory can race with the real PXE server assuming a network
that hasn't been partitioned with a crazy complex config (i.e. basically all
of them).

~~~
fintler
We commonly reboot entire clusters at once (around 10,000 servers in larger
clusters -- each running a full Linux OS) over PXE without a problem. We have
a configuration management machine that creates an image, then we push that
down to a small cluster of TFTP servers that serve it out. The strain on NFS
(we keep parts of the OS in RAM, and load other parts on demand over NFS)
after we kexec from the PXE kernel into the production kernel causes more
problems than the initial TFTP traffic (but it usually works fine as well).
Btw, after booting, we use PanFS (DirectFlow) or Lustre for computing stuff,
not NFS.

Although it's not what we use, here's a program that does a similar type of
management: [http://warewulf.lbl.gov/trac](http://warewulf.lbl.gov/trac) If
you take the time to combine Warewulf with something like Puppet or Chef,
you'll have a nice system for managing 100s of thousands of machines (I could
easily see this scaling to over a million servers if you have the cash to
build something like that).

If you're wondering about dynamic libraries in an environment like this, take
a look at [https://github.com/hpc/Spindle](https://github.com/hpc/Spindle)

And yes, I still get giddy when I type one command to reboot 10,000 servers.

~~~
ajdecon
Since you mention kexec and TFTP+NFS, are you currently using Perceus? Or is
there another system out there with that combo?

~~~
fintler
We're using a modified Perceus tied into cfengine.

------
slynux
I wrote a Wi-Fi based LTSP for Linux some time ago.
[http://www.sarathlakshman.com/2010/03/14/wireless-ltsp/](http://www.sarathlakshman.com/2010/03/14/wireless-ltsp/)

------
jimmcslim
Running CoreOS on Intel NUCs via PXE boot from an HP Microserver: a poor man's
blade chassis/datacenter?

------
vezzy-fnord
So is this basically LFS with built-in PXE boot support and orientation
towards servers?

------
wiradikusuma
Is this some 'lightweight' OS that you can use for Raspberry Pi?

------
volokoumphetico
from the title I thought somebody had discovered a way to run an operating
system on just a thin sheet of metal

~~~
johnpmayer
Bare metal is a common term used to describe running directly on the hardware,
such as a non-virtualized operating system, or a program running without an
operating system.

Or yeah, we could just downmod everybody.

~~~
volokoumphetico
wow tough crowd.

