
BareMetal is a 64-bit OS for x86-64 based computers - fogus
http://www.returninfinity.com/baremetal.html
======
yan
> Return Infinity goes back to the roots of computer programming with pure
> Assembly code. As we are programming at the hardware level, we can achieve a
> runtime speed that is not possible with higher-level languages like C/C++,
> VB, and Java.

When will this reasoning finally die?

~~~
haberman
It won't die because it's true, at least in some cases.

The LuaJIT 2.0 interpreter, written in x86-64 assembly language, is 2-5x the
speed of the plain Lua interpreter, written in C. Note that this is with the
JIT _disabled_ -- it is an apples-to-apples comparison of interpreter-vs-
interpreter: <http://luajit.org/performance_x86.html>

I recently wrote a protobuf-decoding assembly code generator that is 2-3x the
speed of C++ generated code: [http://blog.reverberate.org/2011/04/25/upb-
status-and-prelim...](http://blog.reverberate.org/2011/04/25/upb-status-and-
preliminary-performance-numbers/)

What is your evidence in support of the idea that assembly cannot be faster?

~~~
phillmv
I don't know, I always found the notion that humans will _always_ be able to
optimize better than machines to be somewhat... naïve. Is it an NP-complete
problem that our own heuristics are currently better at approximating?

No one has performed controlled studies on these things - maybe they had some
crummy bottlenecks, used some language feature that their compiler couldn't
optimize away, maybe the benchmarks they use to determine performance are
trivial (which is very often the case), etc.

Not to mention that most of an operating system's time, post-boot, is spent
doing... what? Having the scheduler swap processes in and out? If you're
running a single program that fits inside ram... it's totally fucking
pointless, there's nothing left to optimize.

I have a better question. These guys are clearly smart. What the hell are they
still doing in _Atwood, Ontario_?

~~~
haberman
> Is it an NP complete problem our own heuristics are currently better at
> estimating?

Some of the important problems are NP-complete (like register allocation).
Another problem is that compilers aren't that good at telling fast-paths from
slow-paths (and keeping everything in registers for the fast paths). For more
info see this message from the author of LuaJIT:
<http://article.gmane.org/gmane.comp.lang.lua.general/75426>

~~~
kragen
There's a new formulation of register allocation that's computationally
tractable:
[http://compilers.cs.ucla.edu/fernando/projects/puzzles/exper...](http://compilers.cs.ucla.edu/fernando/projects/puzzles/experiments/)

------
mseebach
I'd be very much surprised if there were any significant workloads that would
work faster on this thing than on a decently set up Linux or BSD install on
the same hardware.

It's not my impression that modern OSs have a habit of getting in the way of
pure computation - and when the computation is done, I'd much prefer a solid
filesystem/network stack to get the results out the door.

But as an academic/tinkering/hacking project, it's awesome. If assembly were in
my backlog of stuff I want to learn/play with, this would be an obvious thing
to get started on.

~~~
jedbrown
Cray runs a stripped-down kernel they call Compute Node Linux. It still has
virtual memory, which, combined with the frequency of getting poorly mapped
physical pages, causes difficult-to-predict performance. It is just accepted
that performance results are not reproducible on the Cray and most modern
clusters, especially those with "fat" nodes. For large runs on Jaguar, the
standard deviation is often 20% to 30%, so people who are doing scalability
studies run the same model several times and plot the best result. It can be
worse on clusters like Ranger (4-socket quad-core Opteron nodes, connected by
InfiniBand). Of course running the same 100k core job repeatedly to get a
stable timing is a waste of resources. The problem is a combination of VM,
multi-core interference, daemon noise, and network topology variability
between runs.

In contrast, IBM's Blue Gene series runs Compute Node Kernel which is not
Linux and uses offset-mapped memory. This obviates the need for a TLB. The
rest of the OS is also stripped down compared to Cray's already lean CNL.
Performance variability on Blue Gene is usually reliably less than 1%.

I think BareMetal looks rather silly and will probably not be used for
anything serious, but ordinary Linux or BSD is a dubious choice for HPC.

------
akent
This has been around for at least two years (if not longer); see
[http://forum.osdev.org/viewtopic.php?f=2&t=20946](http://forum.osdev.org/viewtopic.php?f=2&t=20946)
for example.

Considering it is such a "from scratch" kind of project and given the progress
so far, it seems to me like it might be more of a "let's see if we can"
curiosity type thing rather than a project that an end user might actually
want to use for anything practical.

------
zokier
Previous discussion: <http://news.ycombinator.com/item?id=1698332>

------
krmmalik
Genuine Question: Can anyone give an example of something useful that has been
done with this OS, or is planned for the very near future?

~~~
nivertech
you can run Redis on it or any other simple single-threaded server

~~~
ch0wn
I wonder how hard it would be to port Redis and how its performance would
compare.

~~~
nivertech
I don't have a link, but something similar was done with custom kernels, with
Redis compiled to run natively on top of Xen - the performance gain was only
~13%, so in that case it wasn't worth the trouble.

But if you have a large HPC cluster, getting 13% more out of each compute node
is definitely worth the trouble.

EDITED: see link in child's post

~~~
davej
Not EC2, but is this what you are thinking about?…
<http://openfoo.org/blog/redis-native-xen.html>

~~~
kwis
That's interesting, though I think my takeaway was that the "performance tax"
of the operating system layer is pretty minor, all things considered.

------
mmphosis
<http://www.menuetos.net/>

------
richlist
A great idea, especially if they attack one market at a time.

They should mandate a very small (but popular) set of supported hardware - if
you want to use the OS, that's what you run on - which would reduce their
support issues (they could even sell pre-installed boxes). Possibly also
create VirtualBox drivers to let people dabble with it before
building their own compatible hardware.

I'd like to see them include much needed secondary features in Intel optimized
C with a roadmap for them to be reimplemented in ASM as time permits.

If they could develop or adopt a static web server with the speed of Nginx (or
better), I'm sure this thing would explode in popularity; I'm sure CDNs etc.
would see the benefits.

------
premchai21
The part of this that stands out to me: the OS claims to be open source. But
the bootloader is proprietary. Why? Does the source depend on proprietary
specifications that have been embedded in parts of it or something? The
documentation doesn't obviously preclude writing a replacement, but nor does
it seem to be designed to encourage such a thing. On the surface it's not
complex enough for this to be a huge task, but I'm suspecting there's at least
one strange grinding obstacle in the way…

------
lee
I'd be interested to see some benchmarks of some computationally expensive
applications vs. running them on Linux or another OS.

How much of a gain do you get by optimizing the OS?

------
ubasu
I understand that this is an experimental project, but to target
high-performance computing, it would seem they should support Fortran as one
of the languages as well.

------
ZeSmith
The thing that strikes me the most is how the author seems to consider C and
C++ to be a single language called "C/C++".

~~~
aaronblohowiak
C++ is an almost perfect superset of C ("perfect" referring to the
completeness of the superset, not to C++'s design quality). From this
perspective, it is appropriate to lump them together.

~~~
ZeSmith
I see plenty of people who claim they write "C++" but end up writing some
mutant "C with classes". C++ is not just "C with add-ons"; it's a different
language that happens to share its syntax and part of its standard library
with C. Lumping them together leads to ugly C++ code.

------
prodigal_erik
Having just seen a billboard ad for <http://www.mokafive.com/baremetal>
(enterprisey desktop virtualization), I was briefly expecting a legal dispute,
but "BareMetal" isn't actually on their trademark list.

------
jedbrown
If it doesn't support MPI or a functional threading system, then it will never
be used for HPC.

~~~
mbreese
That's my biggest question: can it support threading? I work on HPC tasks that
don't need MPI but that are I/O bound. This means it is more efficient to have
multiple threads processing data while another thread waits for data to be
loaded from disk. I'm all for getting as close to the bare metal as possible,
but you're right. Without MPI or threading, this doesn't have much of a chance
to be adopted.

------
SonicSoul
Any performance metrics? We can speculate about the effectiveness of such a
solution, but it should be fairly easy to validate by running some common
computational tasks this OS was designed to excel at, versus other popular OSs.

------
eddof13
Why would they use FAT16 for the file system? Seems limiting to me...

~~~
DrJokepu
Because it's really trivial to implement. Other file systems are a lot more
complex, especially to write.

~~~
marshray
It's also well supported by boot media like .isos, USB, network, etc.

------
malkia
And then an FPGA guy walks in a bar and...

~~~
chrisjsmith
There, I upvoted you. Perfectly valid comment.

FPGAs can be programmed to give the answer in the time it takes the gates to
propagate which is usually damn quick. None of these "cycles" things that CPUs
use up.

------
bxr
The popularity of virtualization technology and the new trend of selling
instances could make operating system development interesting again.

The requirements for an operating system have changed drastically with this
new way of thinking about what it means to run one. The requirements can be as
low as supporting a single process that can talk TCP and (maybe) to a disk.
Look at the Haskell network stack: it provides network support to an
application, and you don't need an OS proper, just Xen.

I'm very excited to see where highly lightweight OSes end up.

~~~
eru
It's exokernels all over again.

~~~
marknutter
It's ________ all over again. Love these types of comments on HN.

~~~
eru
Oh, Xen and other baremetal hypervisors are actually very closely related to
exokernels. I work with Xen for a living.

