

ZeroVM: lightweight containers based on Google Native Client - timf
http://zerovm.org

======
Tobu
This is intended as a way to run computation close to data. Some databases
embed Lua or other pluggable languages for that; ZeroVM can run NaCl binaries
(compiled with a special toolchain), verified in much the same way the JVM
checks bytecode, in a very limited sandbox (just some pre-configured data
channels). Besides the NaCl verifications, they are enforcing functional
programming: the program only has access to deterministic instructions and
library calls.
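
A minimal sketch of that execution model (assumptions: channels modeled as
in-memory byte streams, and `guest_main` standing in for a guest program;
this is not the real ZeroVM API):

```python
# Sketch of the channel model described above (not the real ZeroVM API):
# the guest program receives only pre-configured channels from the host and
# performs a deterministic computation over them -- no open(), no sockets,
# no clock. Channels are modeled here as in-memory byte streams.
import io

def guest_main(in_channel, out_channel):
    """Deterministic 'guest': count the bytes on the input channel."""
    total = 0
    while True:
        chunk = in_channel.read(4096)
        if not chunk:
            break
        total += len(chunk)
    out_channel.write(str(total).encode())

# The 'host' wires up the channels before launching the guest.
in_ch = io.BytesIO(b"some data living near the computation")
out_ch = io.BytesIO()
guest_main(in_ch, out_ch)
print(out_ch.getvalue().decode())  # byte count of the input
```

The point of the restriction is that the same input channels always produce
the same output channels, which is what makes the sandboxed program safe to
schedule anywhere the data lives.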

~~~
Tobu
Why downvotes? I provided an explanation because the initial reaction on HN
was confusion.

~~~
skrebbel
They're called the downvote mafia. Little to do about them.

~~~
andybak
Is it people or scripts? What's the motivation or is it a mystery?

~~~
skrebbel
It's people. I'm slightly baffled by their motivation. Something inside me
wants to believe they're 14 year old mischievous scriptkiddies, but I bet
they're really just guys and girls like you and me.

------
willvarfar
I will try and explain it as I understand it:

Google Chrome has a sandboxed VM called _Native Client_ (NaCl) that runs at
near full speed. It's very neat.

So they have taken that same VM and, instead of Chrome's Pepper API, they have
a file-handle-based API and some message-passing between instances.

Now you can compile your C/C++/whatever program and run it on the cloud!

It seems an excellent building-block for big data and big crunching on the
cloud, and it gets the benefit of Google's massive resources on security and
performance fixes.

~~~
ch0wn
If I'm not mistaken, NaCl's integration with Chrome is actually implemented
using Pepper.

EDIT: I found something on this. "NaCl was integrated into Chrome 5 as an in-
process Pepper plugin. The NaCl modules that it runs can utilize the Pepper
API for browser interaction." ([http://www.chromium.org/nativeclient/getting-
started/getting...](http://www.chromium.org/nativeclient/getting-
started/getting-started-background-and-basics#TOC-Native-Client-NaCl-))

~~~
willvarfar
Exactly; that's what they don't do: they use file-handle-based IO instead of
Pepper (which has file-opening libraries and event loops of its own).

------
equark
Very cool, I've been waiting for this to happen.

I'm surprised that Google doesn't explicitly talk more about this use case for
Native Client. It could be the backbone for an AWS/Heroku competitor. The
ability to run lightweight tasklets securely would enable a lot of interesting
scenarios. While not very significant, Google actually already started doing
this with their Exacycle program:

<http://research.google.com/university/exacycle_program.html>

------
majke
If you think NaCl is bloated, Russ Cox's vx32 may be a lightweight answer:

<http://pdos.csail.mit.edu/~baford/vm/>

Some of my experiments:

[http://www.lshift.net/blog/2010/03/31/what-has-happened-
to-t...](http://www.lshift.net/blog/2010/03/31/what-has-happened-to-the-
segment-registers)

<https://github.com/majek/vx32example>

~~~
eklitzke
Interesting, Russ Cox works at Google. I'd have to imagine that he's talked to
members of the NaCl team, and vice versa.

~~~
bradfitz
He works on Go at Google. He updated an earlier version of Go to run on an
earlier version of NaCl, but it has since bitrotted, as NaCl's formats were
changing at the time.

------
StavrosK
I have never read so many words and understood so little about a technology
before. The density of marketing-speak per word is approximately 1.

Why can't some people just explain things simply?

~~~
reginaldo
From what I read it's something like this:

It gives you a bare-bones environment in which to run your programs, with
presumably very low overhead. Think of it as an embedded system where programs
run without an OS. This is the environment a program running inside ZeroVM
will see. All you have is libc and the ZeroVM-provided APIs; if you want more,
you'll have to statically link your programs.

The thing is, you can run many, let's say thousands, of these little programs
inside a single machine in such a way that each one can never see the others
(as long as it's impossible to break out of the ZeroVM sandbox).

Such a technology would enable neat stuff, like renting a server for someone
to run a single program for some period of time and have the results sent
back. Nobody does this for unrestricted programs today, for many reasons, a
very important one being the fact that it would be very hard to do this in a
secure way.

The "run a C program for some period of time" thing would work kind of like
the AWS dashboard, but instead of having to spin up a machine with Linux on it
and running your program inside that, you would only upload your binary and a
manifest file. Kind of like what App Engine does, but with fewer restrictions
(you'll probably be able to do anything as long as you're able to compile a
"safe" binary that does it).
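
To make the "binary plus manifest" picture concrete, here is an invented
manifest sketch; the field names and syntax are illustrative assumptions,
not the actual ZeroVM manifest format:

```ini
; Hypothetical manifest (illustrative field names, not real ZeroVM syntax):
; the host reads this, wires up the channels, and launches the verified binary.
Program = wordcount.nexe      ; the NaCl-validated binary to run
Memory  = 268435456           ; hard cap on address space, in bytes
Timeout = 60                  ; seconds of CPU time before the job is killed

; Pre-configured data channels: the only I/O the program ever sees.
Channel = input,  /data/logs/2012-08-01.txt, read
Channel = output, /results/counts.txt,       write
```

Everything the program may touch is declared up front, which is what makes
near-instant, safe provisioning plausible.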

~~~
icebraining
_Such a technology would enable neat stuff, like renting a server for someone
to run a single program for some period of time and have the results sent
back. Nobody does this for unrestricted programs today, for many reasons, a
very important one being the fact that it would be very hard to do this in a
secure way._

Well, almost nobody. NearlyFreeSpeech[1] lets you compile and run unrestricted
C and C++ programs on their servers, or any other binary if you compile it
somewhere else. I've done some tests with a Go-based webservice.

They do have a very strict time limit before processes are killed, but that's
because their servers are designed for web applications, not data crunching.

[1]: <http://example.nfshost.com/versions.php>

~~~
reginaldo
Thanks for the info. Seriously, I should have said "almost nobody" in the
first place. In fact, I thought I had said that :)

I didn't know nfshost was doing it. From what I see, they're probably using
FreeBSD jails in this case, which is nice if they are.

Anyways, we have to agree that this space is largely unexplored. I never
thought there was a need for this kind of service, but just after reading the
ZeroVM pages I think it's a very good idea. With a "little" more initial
effort, it would enable writing systems in a very interesting way: self-
healing (when the other end has failed, make an API call to provision another
copy of it), self-provisioning (when traffic is high, make an API call to
provision another copy of a worker), etc. Of course we can already do all of
this; it would just be more natural, and if you combine this with the idea of
Mobile Agents, then the cloud suddenly becomes much "cloudier".

------
vardump
I read this as RPC with code instead of just data. If so, this is exactly what
I've been looking for for a long time, because traditional RPC round-trip
latency is often high - so high that you need to create a more complicated API
to avoid excess iteration.

Combine this with ZeroMQ and MessagePack, and you have some serious power at
your fingertips.

Messages can execute at the destination, do iteration and API calls, and
return only the needed part of the data and results.
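
A tiny stdlib-only sketch of that pattern; the `DataNode` class and its
round-trip counter are invented stand-ins for a remote store (in practice
ZeroMQ would carry the messages and MessagePack would encode them):

```python
# Rough sketch of the idea above: instead of many RPC round trips that each
# fetch data, ship one small piece of code to where the data lives and get
# back only the result. DataNode is a stand-in for a remote data store.

class DataNode:
    def __init__(self, records):
        self.records = records
        self.roundtrips = 0

    # Traditional RPC: one round trip per record fetched.
    def fetch(self, i):
        self.roundtrips += 1
        return self.records[i]

    # Code shipping: one round trip; the function runs next to the data.
    def run(self, func):
        self.roundtrips += 1
        return func(self.records)

node = DataNode(list(range(1_000)))

# Client-side iteration: 1000 round trips to compute a sum.
total = sum(node.fetch(i) for i in range(len(node.records)))

# Shipped computation: 1 round trip for a comparable query.
node2 = DataNode(list(range(1_000)))
total2 = node2.run(lambda recs: sum(r for r in recs if r % 2 == 0))

print(node.roundtrips, node2.roundtrips)  # 1000 round trips vs 1
```

With per-round-trip latency in the milliseconds, the difference between the
two counters is exactly the latency win being described.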

~~~
ericbb
The typical solution is to use an interpreter (turning data into code). How
often is it necessary to run arbitrary machine code on demand?

~~~
DavidGruzman
I would expect about an order of magnitude speed difference between an
interpreted language and optimized machine code. I can recall a case where
reducing an analytical request from 10 hours to 10 minutes changed the quality
of the research a company was doing, since analysts were able to run more
queries, selecting a better dataset for the report. An order of magnitude in
response time can also be the difference between go and no-go for interactive
analytics. In the case of clouds, where we can assume infinite resources, it
can mean 1/10 of the cost. For private clusters it can be the difference
between buying 10 machines (something common in Hadoop's world) and buying 100
machines - something very few groups can get.
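
For a rough feel of the gap being described, here is a small stand-in
benchmark: a pure-Python loop versus the C-implemented builtin `sum()`. The
exact ratio is machine-dependent, and real analytical workloads vary far more;
this only illustrates where the interpreter overhead comes from.

```python
# A rough illustration of interpreter overhead: the same reduction done by
# an interpreted Python loop vs. the C-implemented builtin sum(). The gap
# is a stand-in for the interpreted-vs-native difference discussed above.
import timeit

data = list(range(100_000))

def interpreted_sum(xs):
    total = 0
    for x in xs:  # every iteration pays interpreter dispatch cost
        total += x
    return total

t_interp = timeit.timeit(lambda: interpreted_sum(data), number=20)
t_native = timeit.timeit(lambda: sum(data), number=20)
print(f"interpreted: {t_interp:.3f}s  native: {t_native:.3f}s  "
      f"ratio: {t_interp / t_native:.1f}x")
```
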

------
mike_ivanov
This is a perfect Mobile Agent platform
(<http://en.wikipedia.org/wiki/Mobile_agent>)

~~~
camuel
Correct!

I wonder how I missed this on the ZeroVM site. For the folks not familiar with
the issue: it is not about mobile phones :)

With the emergence of large immobile datasets, software mobile agents may have
a renaissance... especially with AI being cool again.

------
lbotos
I'm slightly confused as to whether this has any use beyond "on-demand data
access" use cases. I reviewed the site and I like the idea, but I'm confused
as to what else this could be used for. Would something like a "Heroku"-style
PaaS using PyPy, or something else that targets NaCl, benefit from this in a
process-separation sense? Anybody care to clarify?

~~~
willvarfar
Ever used distcc? (or even mosix?)

Imagine that everyone in the office could be running your tests, compiles and
such easily and transparently.

There are lots of grids that do this now, but this seems like a new, lighter-
weight and faster solution that, since it should run well on Windows too
(Chrome proves that it does), is going to be great.

Here's hoping :)

------
iseyler
This is very interesting! It will be even more interesting to see if it can be
compiled to run on my OS. This is exactly the kind of application I have been
looking for. ESXi is too big :)

Shameless plug: <http://www.returninfinity.com/baremetal.html>

~~~
camuel
I can assure you it can easily be ported to and used on any OS. We're just a
few guys right now and don't have the capacity to test it on anything but
Ubuntu. However, we designed it to be portable (and the NaCl/Chrome code is
also portable, which helps a lot), even to run on bare hardware, so we tried
to keep OS usage to a minimum. In fact, porting would be a more extensive
effort for architectures not natively supported by NaCl. For example, ZeroVM
on Tilera Linux (a MIPS variant) will be much more effort than FreeBSD on x86.

As a side note, I'm personally convinced that today's OSes are overkill for
cloud-based number crunching (the prime case for ZeroVM), wasting resources. I
am looking forward to a future of much lighter 'cloudware'. Think of the
'opencompute' approach applied to OSes. ZeroVM is a humble experiment here.

~~~
iseyler
Agreed on the overkill part. That is the reasoning behind the work we are
doing. I sent you an email to discuss further.

------
justauser
Perhaps the "motivation" section would serve as a better landing page.

What's the current state of security with LXC? As I recall, Heroku relies on
this for its virtualization.

~~~
mike_ivanov
LXC has its warts but generally it's ok.

------
karterk
_what is less known is that when deployed at cloud, Hadoop cannot access that
enormous dataset locally due to security restrictions and therefore is
screamingly inefficient compared to on-premise Hadoop deployment._ [1]

Can someone explain what that means?

[1]: <http://zerovm.org/killer-apps/>

~~~
camuel
If you use EMR or just roll your own Hadoop in EC2 then:

1. Hadoop runs on EC2
2. Data is stored on S3
3. Intermediary results are stored in EC2
4. Hadoop loads the data from S3 to EC2
5. EC2<->S3 bandwidth is not that fast or efficient (S3 proxy, network
contention, TCP/IP processing)

Hypothetical MapReduce/ZeroVM/Swift scenario:

1. Data is stored on S3/Swift
2. Map and Reduce functions run inside S3/Swift, secured by ZeroVM, in the
majority of cases accessing data locally without networking/proxies getting
in the way
3. Intermediate and final results are also stored within S3/Swift
4. Local data access is efficient, fast and predictable
5. Local networking within S3/Swift is more efficient, fast and predictable
than S3<->EC2 / Swift<->Nova

Accelerated Hadoop scenario:

Exactly as in the first scenario, except that Hadoop pushes a "predicate
pushdown optimization" into S3/Swift, secured by ZeroVM.

Regarding 'due to security restrictions': I meant that the cloud vendor would
not let you run your own code in S3 or CloudFiles. Why? Because you could mess
up other people's data and the storage system itself. Why not run a VM inside
S3? Well, I guess it would be impractical due to the long provisioning time of
a conventional VM.
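
A minimal sketch of the pushdown idea, with byte counts standing in for
S3<->EC2 traffic (`StorageNode` and the CSV-ish rows are invented for
illustration):

```python
# Sketch of "predicate pushdown" as described above: instead of shipping the
# whole dataset from storage to compute, the filter runs inside the storage
# layer and only matching rows cross the network.

class StorageNode:
    def __init__(self, rows):
        self.rows = rows

    # Today: the compute side pulls everything, then filters.
    def read_all(self):
        return list(self.rows)

    # With ZeroVM-style in-storage execution: the filter runs next to data.
    def pushdown(self, predicate):
        return [r for r in self.rows if predicate(r)]

rows = [f"user{i},{i % 50}" for i in range(10_000)]
node = StorageNode(rows)

pulled = node.read_all()                                   # everything crosses the wire
matched_remote = [r for r in pulled if r.endswith(",7")]   # filter on compute node
matched_local = node.pushdown(lambda r: r.endswith(",7"))  # filter in storage

bytes_pulled = sum(len(r) for r in pulled)
bytes_pushed = sum(len(r) for r in matched_local)
print(f"without pushdown: {bytes_pulled} bytes moved, "
      f"with pushdown: {bytes_pushed} bytes moved")
```

Same answer either way; only the amount of data crossing the (slow, contended)
storage-to-compute link changes.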

~~~
lobster_johnson
That criticism is specific to S3, not EC2 or Hadoop. It's perfectly feasible
and probably preferable to have Hadoop work on local files in instance store
volumes (or EBS if you're mad).

~~~
DavidGruzman
There is another issue with running Hadoop on EC2 (w/o S3). Instance storage
is relatively small: about 3.6 TB on the largest instance and 1.5 TB on other
"large" instances. In a typical Hadoop machine I would expect about 8 TB, so
local storage is prohibitively expensive for big data tasks. At the same time,
if we use local storage we are losing elasticity: we have to run the cluster
all the time, even when there are no jobs to run. That kills the main point of
using Hadoop in the cloud - paying for computational resources on demand.

------
rektide
The OpenMirage project is a similar-ish idea, providing numerous
implementations of a std-lib for different targets (sample targets: Android,
Linux OS, raw x86) and using this limited API. Sure, ZeroVM has its own "vm
instructions" rather than "library calls," but both ideas reduce to building
virtual machines for great glory and profit.

------
mcartyem
How secure is the VM given a binary that can dynamically modify itself to
bypass the inspection of the VM?

~~~
camuel
I assume by VM you mean ZeroVM. Well, ZeroVM currently doesn't allow self-
modifying code at all. Nice try.... We haven't touched Google's provided
validator, in order not to break anything security-related. And if you think
you have a good idea for a vulnerability, you can claim some Google prizes.

If you meant more practical uses for it: unfortunately, modern JITs would be
difficult to support efficiently, as they constantly recompile, and with
ZeroVM it is not only recompilation but also validation. However, a JIT that
recompiles only once, at load time, is easy to support. In fact, the next
version of NaCl dumps the GNU toolchain in favor of a JITy LLVM, but there
recompilation happens only once.

------
xal
I love how many high-level technologies are left to mature in the Java
environment and then are reintroduced at a level that's a lot closer to the
kernel and the metal.

ZeroMQ is another great example of the progression that started with AMQP.

Hadoop hopefully will see a similar fate. ZeroMQ combined with ZeroVM actually
offers two important building blocks.

------
alexchamberlain
So, does this mean that, when implemented in a browser, I can use C instead of
JavaScript?

~~~
justincormack
That's what you can do with Native Client in Chrome right now. This reuses
that code for server apps.

------
hobbyist
how is it different from freebsd jails or containers in linux?

~~~
mike_ivanov
Containers/jails draw the isolation boundary _around_ your processes, whilst
this technology confines your code within a single process and completely
isolates it from the system.

~~~
camuel
exactly!

Further, apart from being a different abstraction, container technologies (at
least in their current implementations of 'chroot on nukes') are not
completely sealed or 'secure'. OpenVZ seems to be the most secure one out
there; it requires kernel patching and still... close, but not 100% airtight.
That is one of the reasons that many lightweight containers are used only as a
secondary sandbox (like Heroku) and don't allow you to run arbitrary
C/assembly inside your environment. So, practically, LXC always ends up as a
secure Python environment or Ruby environment and so on... never as a secure
x86 execution environment.

Correct me here if I'm wrong...

~~~
shykes
dotCloud (<http://dotcloud.com>) supports arbitrary code execution inside LXC
containers (pre-2010 versions used OpenVZ, and very early versions were built
on V-server). The main limitation is that the process runs under an
unprivileged uid under a kernel managed and deployed by dotCloud.

I agree with the assessment that containers are not "completely secure" - I
would not trust them to contain a root-privileged process. However, an
unprivileged process running inside an LXC container on a recent kernel will
have an extremely hard time escaping.

~~~
camuel
What if I DoS-attack some syscall? Or create zillions of 1-byte files, driving
the file system crazy - or anything else?

The kernel is such a vast area vulnerable to attack that it is scary even to
think about securing all of it without leaving a single weak point. Moreover,
you will screw up your syscall API to the point that it becomes unusable. At
the bare least we need a standard for syscall capping etc., so programmers
will know what to expect.

And thanks for the link - I will check out what solution they use and whether
they are happy with it.

