
Manta: Unix Meets Map Reduce - dmpk2k
http://dtrace.org/blogs/brendan/2013/06/25/manta-unix-meets-map-reduce/
======
phaet0n
_Wow!_ I've spent an hour or so looking at the Manta docs and I think Joyent
has found the _sweet spot_ as far as abstraction is concerned. Kudos to the
Joyent team!

The API [1] is so simple it's absurd. Looking at the design I keep asking
myself, "but how will it perform?" And the answer is, I don't know. But
what I do know is that because of the simplicity of the design, if this
compute model catches on, it'll be the start of a new computing
paradigm—probably the first genuine attempt at something we can all agree
to call cloud.

The essence of the model is that "caching", in whatever form, is abstracted
away. It is up to the architects of the system to ensure that the system
performs well on a wide variety of compute scenarios. Let me explain. Usually,
in a model like App Engine, you worry about how you'll represent your data in
the Datastore, how you'll shard, how you'll use Memcache, and what type of
batch jobs you'll run to reduce the amount of dynamic computation that is done
on a request. On Manta, you just store your data to their object store and
process it as needed. You let the system figure out data locality, what to
cache, on which node to perform the compute, etc.
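The model described above can be sketched concretely: a Manta job phase is just a shell pipeline over stored objects, so the same word count runs locally or remotely. (The `mput`/`mjob` command names below follow the Manta docs; the exact flags are illustrative, not verified.)

```shell
# A word-count "job" is an ordinary Unix pipeline.
printf 'to be or not to be\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn

# The same phases as a Manta job over a stored object (illustrative;
# command names per the Manta docs, flags not verified here):
#   mput -f words.txt /$MANTA_USER/stor/words.txt
#   echo /$MANTA_USER/stor/words.txt \
#     | mjob create -o -m 'tr " " "\n"' -r 'sort | uniq -c | sort -rn'
```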

As I said, I have no idea how, or if, this will perform, but as far as
abstraction level goes, it's perfect. This is where pricing becomes an issue.
At 40µ$/GB/s it works out to 14.4¢/GB/hr with no mention or guarantee of the
performance of the system. If the compute is slow, or the node is overloaded
with tasks, what can a customer do? What if the compute is limited not by CPU
but by access to the underlying object store? Sure, they say the compute is
done at the edge, but what if they stick 4TB of data on a node that's the
compute equivalent of an Intel Atom? So many questions about performance!
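For what it's worth, the unit conversion in that quoted rate checks out (taking the 40 µ$/GB/s figure at face value):

```shell
# 40e-6 dollars per GB-second -> cents per GB-hour:
# multiply by 3600 seconds/hour and 100 cents/dollar.
awk 'BEGIN { printf "%.1f cents/GB/hr\n", 40e-6 * 3600 * 100 }'
```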

All that said, congratulations to the Joyent team. The HN response so far has
been muted, but I think time will reveal Manta to be an important step towards
true cloud computing.

Hopefully, Manta will mature enough to get a chance to kiss a girl.

[1]
[http://apidocs.joyent.com/manta/api.html](http://apidocs.joyent.com/manta/api.html)

~~~
dap
Thanks for the praise!

As far as pricing goes, you're charged for the wall clock time spent running
inside the zone, so if your task queues up because the system is jammed,
you're not paying for that. Task runtime can be affected on busy systems, but
this has been in the noise for the jobs we've looked at. We're also looking to
see how the system gets used in order to understand whether it would be useful
to have other pricing options (e.g., tiers of service with guaranteed
resources).

As for performance: we're using the system internally every day, and we've
found the performance very good. Of course, the best way to find out is to
evaluate it on your own workload. :)

Finally, you'll probably be interested to read Keith's summary of our hardware
choices here: [http://dtrace.org/blogs/wesolows/2013/06/25/manta-what-lies-...](http://dtrace.org/blogs/wesolows/2013/06/25/manta-what-lies-beneath/)

------
rsync
"I’m looking forward to new discoveries now that I can store and process large
amounts of performance data."

There it is. The central dogma of "big data".

But there is a dissenting view...

Jaron Lanier said, and I paraphrase: A computer will pass the Turing Test when
the human running it becomes dumb enough to be fooled.

I am very skeptical of the "discoveries" that big data presents to us: they
will only be useful inasmuch as we dumb down our criteria for what discoveries
are.

I suspect that anyone who has tried to model data and find _real_ insight
with it for any amount of time shares this skepticism.

~~~
brendangregg
The criterion for a discovery is that it solves real customer issues and
solves them fast. This may be direct or indirect, elegant or inelegant, simple
or complex.

Last night I helped analyze an issue on Manta regarding job run time, and used
DTrace to measure all process run time from all servers. It was a lot of data,
but easily awk'd (in this case, not enough data to warrant map reduce), at
which point I found run times from a certain set of processes that were
equivalent to the job latency - taking us towards the fix.
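The kind of one-liner being alluded to might look like the following; the input format here is hypothetical (the actual DTrace output isn't shown in the comment):

```shell
# Hypothetical input: one line per process exit, "<server> <execname> <runtime_ms>"
cat > runtimes.txt <<'EOF'
srv1 gzip 120
srv2 gzip 4800
srv1 node 90
srv2 node 110
EOF

# Sum runtime per process name, largest first -- outliers float to the top.
awk '{ total[$2] += $3 } END { for (p in total) print total[p], p }' runtimes.txt \
  | sort -rn
```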

That's just last night. A large number of real fixes have come from being able
to quickly gather and process performance data, and from that spot latency
outliers, unexpected workloads, and patterns over time.

I have spent time modeling systems. It didn't leave me skeptical of
data-driven analysis; rather, it left me appreciating how complex real
computer systems are and how they defy accurate modeling. That's not to say
that modeling is inferior or not worth doing -- it's another tool. I'll use
whatever I can to fix my customers' issues the fastest.

------
sturadnidge
Not just MR; it's a full-blown object store too.

[http://apidocs.joyent.com/manta/storage-reference.html](http://apidocs.joyent.com/manta/storage-reference.html)

------
wicknicks
The next big data framework must be a programming language. Most languages
assume that the data they process is available locally. Until the last few
years that was true in almost all situations, but not anymore.

~~~
webjprgm
I think I misread this the first time. You mean a "big data" framework, not a
big "data framework", right?

There are precedents for using a programming language as a database language:
MUMPS (aka M), JavaScript as the query language for some newer NoSQL
databases, and ORMs like Ruby's ActiveRecord, which make it look like you're
working on data in plain Ruby.

As for the "big data" interpretation, I don't have much to say.

------
buster
As much as I love Solaris (and ZFS and DTrace and Zones and all the nice
things Sun and Joyent brought), I wish this were available for Linux as
well.

Edit: OK, so it's called "Joyent Manta Storage Service" and the tutorial talks
about creating accounts on the Joyent cloud. So this is a service, not
software? Sorry about the confusion then... I thought it would be new
software for SmartOS or Solaris...

~~~
dap
It's not "for" any particular system. It's a Unix environment.

~~~
buster
How is it a Unix environment? At first glance I don't really get how it is
implemented. That's why I supposed that this only runs on SmartOS. Is the
object store software downloadable somewhere? The tutorial mentions creating
an account on the Joyent cloud...

------
spullara
As near as I can tell, this and Hadoop Streaming are very similar. In fact,
you could probably take their open-source CLI and wrap it around Streaming
without too much trouble.

[http://hadoop.apache.org/docs/stable/streaming.html](http://hadoop.apache.org/docs/stable/streaming.html)

That would at least avoid lock-in to Joyent's infrastructure.
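The resemblance is structural: Streaming's contract is itself Unix pipes (mapper and reducer read stdin and write tab-separated lines, with a sort in between), which is why it can be simulated locally with no Hadoop at all. The `hadoop jar` invocation in the comments is the standard Streaming form, shown for comparison only:

```shell
# Streaming's contract, simulated locally: mapper | sort | reducer
printf 'foo bar\nfoo baz\n' \
  | awk '{ for (i = 1; i <= NF; i++) print $i "\t" 1 }' \
  | sort \
  | awk -F'\t' '{ c[$1] += $2 } END { for (k in c) print k "\t" c[k] }'

# The real thing, roughly:
#   hadoop jar hadoop-streaming.jar \
#     -input /in -output /out -mapper ./map.sh -reducer ./reduce.sh
```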

------
helper
This looks awesome. I looked through the docs and didn't see much information
on what happens in failure cases. If one node that is storing a replica of my
data goes down how long until the data is replicated to a new node?

~~~
dap
It depends on what "goes down" means. The vast, vast majority of such issues
are transient: the node panics or loses power and comes back up. There's no
impact to data availability or durability unless all nodes storing a copy of
the same object are down at the same time.

It's a good question. Sounds like we should document or write up a blog post
about the gory details.

~~~
helper
That would be really useful. Along the same lines, one of the blog posts said
that you chose to use strong consistency (CP in CAP). In the event of a
failure, how many replicas (out of N) need to be down before the API returns
500s for reads or writes (if they differ)?

~~~
dap
For a read to fail as a result of storage node failure, all replicas would
have to be down. You may see increased latency on successful reads if some but
not all of the replicas are down.

Recall that writes always create new objects. The storage nodes are selected
dynamically for each write, so in practice a write won't fail because of a
storage node failure.

The more likely way to get a 500 from the data path is if parts of the
metadata tier become unavailable. As with storage nodes, such failures are
typically transient. They're also unlikely to affect reads. Writes may
experience transient errors as the metadata tier recovers from failures.

[Edit for grammar.]

------
oxtopus
Reading the Manta website, I can't help but be reminded of this webcomic:
[http://rbxbx.info/images/fault-tolerance.png](http://rbxbx.info/images/fault-tolerance.png)

~~~
epistasis
I don't see how that comic is relevant to Manta; it seems like a bit of an
unfair comparison. The Manta website emphasizes the ability to use common,
existing tools to work with the data. In addition to standard Unix pipes,
they have bindings for pretty much every language I can think of wanting to
use.

I.e., Manta seems to emphasize the ability to use existing tools in a
distributed fashion, rather than forcing you to learn a new language with
unfamiliar syntax.

------
pepijndevos
So what is this SmartOS thing?

From what I understand: KVM is hardware virtualization, SmartOS is (a distro
for) OS virtualization, and Docker is process virtualization?

~~~
shykes
SmartOS is a distant relative of Solaris. It's an alternative to Linux for the
server with all the Solaris bells and whistles (zfs, zones, dtrace) and great
tooling for virtualization. Unfortunately it's not very widely adopted and to
my knowledge is only supported by one start-up (Joyent) which limits its
chances of dislodging Linux. So, a very powerful, highly integrated and
ultimately niche stack.

It also inherited a subset of the Solaris community which is jaded by the
popularity of Linux and its "primitive" toolset. You can often see them
rolling their eyes and making condescending comments as the rest of the open-
source world moves on.

The huge potential for all this is to contribute these wonderful tools to
Linux, but there seems to be reluctance on that front.

~~~
insaneirish
_The huge potential for all this is to contribute these wonderful tools to
Linux, but there seems to be reluctance on that front._

The code is there for the taking. FreeBSD has ported ZFS and DTrace. Guys at
LLNL are using illumos ZFS code as upstream for ZFS on Linux. Others are
working on DTrace.

One can wait for efforts like that to keep progressing, help in whatever way
they can, or just use illumos based distros (e.g. SmartOS) now and not wait
around. Choice is a wonderful thing.

~~~
shykes
Note that I am not complaining or demanding anything. You are right in
assuming that I do not miss dtrace and zfs enough on my Linux boxes to
contribute individually to the porting effort.

I am simply observing that porting these tools to Linux doesn't seem to be a
priority for either the Linux community (which is already incubating multiple
"home-made" alternatives such as btrfs, lxc userland tools etc.) or the
SmartOS community (which can understandably be tempted to keep these tools as
proprietary "killer apps").

I believe that if there had been a more concerted and serious effort to get
these tools into Linux early on, by people knowledgeable and influential
enough to get it done (i.e. not me), setting aside platform rivalries and NIH,
a lot of sysadmins and developers would be having way more fun at their jobs
right now.

~~~
mwcampbell
I'm curious about why you don't think DTrace is that important. If dotCloud
were running on a bunch of SmartOS boxes, and assuming that SmartOS was
running on the metal and only OS virtualization was used, then you could use
DTrace to observe all levels of the stack from the hardware all the way up to
the application. That seems like it would be useful for a PaaS company.

~~~
shykes
I do think dtrace is awesome, and would certainly not make an argument against
using it. We may very well use it in the future.

There was no active decision _not_ to use it at dotCloud; our "visibility
toolbox" is simply based on other tools - a combination of collectd, Nagios,
distributed RPC tracing, active and passive checks from the HTTP frontends,
and a decent logging setup. I'm sure you could accomplish this with dtrace and
probably a dozen other tools, but I guess the same is true for every problem.
At some point you just choose a tool that works and move on to the next
problem.

------
willcodeforfoo
Interesting, is Manta open source?

~~~
zeckalpha
The client is:
[https://npmjs.org/package/manta](https://npmjs.org/package/manta)

~~~
susi22
Those are the client libraries. It seems like the actual Manta compute
environment runs primarily on SmartOS.

[http://apidocs.joyent.com/manta/compute-instance-software.ht...](http://apidocs.joyent.com/manta/compute-instance-software.html)

[https://github.com/joyent/manta-compute-bin](https://github.com/joyent/manta-compute-bin)

Also probably a better article:

[http://www.joyent.com/blog/hello-manta-bringing-unix-to-big-...](http://www.joyent.com/blog/hello-manta-bringing-unix-to-big-data)

------
trevoro
Official Announcement: [http://www.joyent.com/blog/hello-manta-bringing-unix-to-big-...](http://www.joyent.com/blog/hello-manta-bringing-unix-to-big-data)

