
SmartDataCenter and Manta are now open source - zbb
http://dtrace.org/blogs/bmc/2014/11/03/smartdatacenter-and-manta-are-now-open-source/
======
jarpineh
If I read this correctly, these components run on top of an open source fork of
Solaris, SmartOS?

Or is SmartOS using parts of illumos, another fork of open source Solaris, so
that it's not really a Solaris with its userland, but a container OS? Like
Linux + Docker, but without the OS parts of a Linux distribution.

The whole thing sounds very intriguing and really useful for a lot of things.
I would like to do a Python Pandas DataFrame on top of Manta objects, for
instance.

Hopefully someone will put together a Vagrant box or something similar for
trying the whole stack on your own hardware. Looking at it now, I don't know
where I would start building. The other option is to use their cloud services,
but looking at their price charts [1] I get overwhelmed by the options (and
there's no free tier to take it for a spin).

[1] [https://www.joyent.com/products/public-cloud/pricing](https://www.joyent.com/products/public-cloud/pricing)

~~~
jperkin
Briefly:

* Between Solaris 10 and Solaris 11, Oracle bought Sun and killed the open source efforts known as OpenSolaris. illumos started from the final open source bits of what eventually became Solaris 11 and has now significantly diverged. Solaris is effectively dead, illumos is very much alive.

* There are a number of illumos "forks", of which SmartOS is ours, but all forks still share the common illumos code (we merge daily) and contribute heavily back to the common base. Each fork may contain features which aren't yet ready for merging back, e.g. our work to port KVM[1] to SmartOS is not part of illumos yet, but other distributions such as OmniOS[2] have taken that work and integrated it.

* SmartOS is a minimal distribution, we have removed a lot of parts (desktop, shared storage, etc.) which do not fit in with our explicit design goals, and added tooling around virtualisation. You boot from USB/CD/PXE into a minimal live-image hypervisor known as the Global Zone[3], and then perform work in zones which are backed by local storage. To upgrade, you simply replace the USB image with a newer platform and reboot into the new live image.

* As for software, we provide userland built from pkgsrc[4], which gives you access to over 13,000 packages available under /opt/local, allowing you to use both the SmartOS tools as well as any third party software you may need (e.g. GNU stuff). There is even full desktop stuff provided, should you want to use it[5].

* In Manta, when running a job you are basically running in a zone with as many pkgsrc packages pre-installed as we can manage (currently nearly 9,000), so the chances are that the software you need is available. If not, you can easily build it yourself and then store it back in Manta to use later as an asset[6] for your jobs.

* There is definitely a free tier for Joyent zones (search for "free" on [https://www.joyent.com/products/public-cloud/pricing](https://www.joyent.com/products/public-cloud/pricing)), I thought there was also an additional free tier for Manta but I can't see it right now, however it will only cost a few cents to do some basic tests in Manta and get a feel for what it can do.

I primarily work on pkgsrc, but I know a number of other engineers will be
reading this thread and can comment in more depth on SDC/Manta, so feel free
to ask any more questions or pop onto #smartos or #manta on Freenode IRC.

Thanks.

[1] [http://dtrace.org/blogs/bmc/2011/08/15/kvm-on-illumos/](http://dtrace.org/blogs/bmc/2011/08/15/kvm-on-illumos/)

[2] [http://omnios.omniti.com/](http://omnios.omniti.com/)

[3] [http://www.perkin.org.uk/posts/smartos-and-the-global-zone.h...](http://www.perkin.org.uk/posts/smartos-and-the-global-zone.html)

[4] [http://pkgsrc.joyent.com/](http://pkgsrc.joyent.com/)

[5] [https://twitter.com/jperkin/status/348506063336783872](https://twitter.com/jperkin/status/348506063336783872)

[6] [https://apidocs.joyent.com/manta/jobs-reference.html#assets-...](https://apidocs.joyent.com/manta/jobs-reference.html#assets-assets-property)

~~~
jarpineh
Ok, thank you and others very much for your detailed info. I did use
OpenSolaris for a brief time, but kind of lost track after Oracle came and
took it.

Regarding the free tier, I have now found it, but it is mentioned only on the
last line, after all the paid plans and the part about "Contact sales if your
requirements exceed everything we've planned a price for" ;)

About Manta, I'd like to know what kind of data objects it supports. For
example, could I have a matrix of time series data and make quick selections
and sorts on it, near real time?

Docs talked about hierarchical storage and search, which sounds really useful,
too, but what is search in this context?

~~~
jperkin
Manta objects are effectively just files stored on ZFS, so you can write
anything you like. Maybe this example session will help:

* Perform a directory listing of a tmp directory in my public area. mls(1) is the ls(1) equivalent for listing Manta directories.
    
    
      $ mls -l /jperkin/public/tmp
      -rwxr-xr-x 1 jperkin        540751 Oct 23 14:19 bbc.png
      -rwxr-xr-x 1 jperkin         27237 Dec 13  2013 libreoffice.tar.gz
      -rwxr-xr-x 1 jperkin        132079 Oct 25 02:01 lx64.png
      -rwxr-xr-x 1 jperkin       2397256 Jul 09  2013 nas-workdir.tar.gz
      -rwxr-xr-x 1 jperkin       1626181 Jul 10  2013 nas-workdir64.tar.gz
    

* Log into a Manta zone using bbc.png as my input file. This creates a zone and maps in my file, which is stored on the same machine (you are always operating on the host where your data is stored). mlogin(1) is a nice way to prototype jobs in an interactive session, and once you have it working correctly you can use mjob(1) to run it automatically.
    
    
      $ mlogin /jperkin/public/tmp/bbc.png
       * created interactive job -- f1a2e579-34f8-4dd4-da19-db33954a0772
       * waiting for session... | established
      jperkin@manta #
    

* At this point it's just Unix, so I can run any command on the file (which has been mapped in under /manta) I like:
    
    
      jperkin@manta # uname -a
      SunOS 0ae1c6ec-d47a-455c-9dd6-97eec16da31b 5.11 joyent_20140628T000418Z i86pc i386 i86pc Solaris
    
      jperkin@manta # ls -l /manta/jperkin/public/tmp/bbc.png
      -rw-r--r-- 1 root root 540751 Nov  7 11:41 /manta/jperkin/public/tmp/bbc.png
    
      jperkin@manta # file /manta/jperkin/public/tmp/bbc.png
      /manta/jperkin/public/tmp/bbc.png: PNG image data, 1680 x 940, 8-bit/color RGBA, non-interlaced
    

* Note that only the file I chose has been mapped in:
    
    
      jperkin@manta # find /manta
      /manta
      /manta/jperkin
      /manta/jperkin/public
      /manta/jperkin/public/tmp
      /manta/jperkin/public/tmp/bbc.png
    

* Let's convert it to a JPEG using convert(1) from ImageMagick and store it back into Manta in the same directory using mput(1):
    
    
      jperkin@manta # convert /manta/jperkin/public/tmp/bbc.png /var/tmp/bbc.jpg
    
      jperkin@manta # ls -l /var/tmp/bbc.jpg
      -rw-r--r-- 1 root root 504004 Nov  7 11:46 /var/tmp/bbc.jpg
    
      jperkin@manta # mput -f /var/tmp/bbc.jpg /jperkin/public/tmp/
      /jperkin/public/tmp/bbc.jpg    [======================================================>] 100% 492.19KB
    

* This file is now available at [https://us-east.manta.joyent.com/jperkin/public/tmp/bbc.jpg](https://us-east.manta.joyent.com/jperkin/public/tmp/bbc.jpg) and ready for further Manta jobs.

Of course this is a simple and contrived example; the real power of Manta
comes when you have, say, 1,000,000 log files stored under a particular path
and want to grep them all for a particular string. To do that you'd do
something like:

    
    
      $ mfind /jperkin/public/logs -n "access_log.*.gz" | mjob create -o -m "gzcat" -m "grep something" -r "cat"
    

This will scale to whatever size your Manta cluster is, e.g. if you have 10
hosts then the log files will be split up across those hosts and they each
will spin up multiple zones to run "gzcat | grep" on the local data, before a
final "cat" reduce job is used to collate the results from each map job.
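
For anyone who wants to get a feel for the two phases without a Manta account, here is a minimal local sketch of the same shape. It does not talk to Manta at all: it just runs the "map" commands once per file and pipes everything into a single "reduce" on one machine. The function names (map_phase, reduce_phase, run_job) are made up for illustration, it uses `gzip -dc` rather than gzcat so it runs on most systems, and it matches files with a shell glob rather than the mfind name pattern.

```shell
#!/bin/sh
# Local imitation of: mjob create -m "gzcat" -m "grep something" -r "cat"
# Assumption: everything here is a stand-in; none of it is a Manta API.

# Map phase: runs once per input object, decompress then filter.
map_phase() {
    # $1 = compressed log file, $2 = string to search for
    gzip -dc "$1" | grep -- "$2"
}

# Reduce phase: a single "cat" that collates the output of every map task.
reduce_phase() {
    cat
}

# Drive both phases over every access_log.*.gz file in a directory.
run_job() {
    # $1 = directory holding the logs, $2 = string to search for
    for f in "$1"/access_log.*.gz; do
        [ -e "$f" ] || continue       # the glob matched nothing
        map_phase "$f" "$2" || true   # grep exits 1 on "no match"; not an error here
    done | reduce_phase
}
```

With those functions loaded, `run_job /var/log/myapp something` prints every matching line from every compressed log. The difference in Manta is that each map task runs in its own zone on the host that already stores that object, so only the matching lines travel over the network to the reducer.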

~~~
jarpineh
Wow, thank you, again. I definitely have to take a look at this. My use cases
tend to vary so much that creating a Hadoop-like system would require too much
custom coding.

I wonder if it is possible to have compression and de-duplication, so that
there could be a one big base dataset and lots of containers that only add
what new data they generate.

Anyhow, looking at this it feels really approachable. What I have in mind are
quick-and-dirty data-sciency scripts for ad hoc use cases, like diffing
structured files and combing over matrix data.

------
bch
I saw a Joyent manta demo at a node.js roadshow/meet-up a year or so ago, and
it was impressive. It's lovely to see the action (code)/object (data)
relationship rethought, whereby they said "let's bring the tools to the data
instead of lugging data to the tools." It's also cool, refreshing, and
validating to see "the tools" being Unix. "Imagine you want to grep across a
dataset and sort those records and pull out some key fields with awk..." No
need to imagine: that could be your exact plan, using the familiar tools you
know and love.

I'm looking forward to trawling (ha! more nautical theming!) this code for the
education, and maybe some gems.

Edit: close quote, autospell fixing.

~~~
bcantrill
If anyone's interested in a presentation introduction to Manta (presumably
similar to the one you saw), you may want to check out my FutureStack
presentation from last year.[1] And if you have 90 minutes to kill and are in
the mood for the true origin story of Manta, see my video from NYC DevOps in
January.[2]

[1] [http://www.slideshare.net/bcantrill/future-stack](http://www.slideshare.net/bcantrill/future-stack)

[2] [https://www.youtube.com/watch?v=79fvDDPaIoY](https://www.youtube.com/watch?v=79fvDDPaIoY)

~~~
pbowyer
Thanks, that's a very amusing talk!

------
e12e
Oh wow, my crush on Joyent just increased by two orders of magnitude. First,
now (finally!) being open, manta is a viable alternative, and second, landing
on MPL v2.0 seems like a very nice and pragmatic choice.

I highly recommend checking out the linked talk on Open Source anti-patterns
for those who, like me, haven't seen it. I do think Bryan is a little quick
in dismissing the GPL/CDDL problem (implying that RedHat didn't want dtrace)
-- but I do see how the CDDL could evolve as _unfortunately_ GPL incompatible,
rather than _intentionally_ so (he goes into this a little in the USENIX talk
on OpenSolaris/illumos).

Exciting times.

------
knotty66
This is unbelievably neat. So much good news coming out of Joyent recently.

LX branded zones revived, funding to make SmartOS Docker compatible and now
SDC and Manta are open sourced. I love SmartOS.

It feels so elegant and well designed.

------
porker
See also comments on
[https://news.ycombinator.com/item?id=8567620](https://news.ycombinator.com/item?id=8567620)

------
chubot
This is cool. But, uh, how do I see the source to Manta? I read through all of
the README on
[https://github.com/joyent/manta/](https://github.com/joyent/manta/), and
cloned it, and can't figure it out.

Is there not a meta fetch/build tool like Chrome/Android have? I have built
those two projects from scratch with no problems.

Or do you have to start with the sdc repo?

EDIT: I just cloned the repos manually. I thought that there would be some
automated way to start from the manta repo and get all dependencies.

~~~
trentmick
FWIW, in the joyent/sdc repo there is an etc/repos.json so that you can do
something like:

(mkdir -p repos; cd repos; json -f ../etc/repos.json -a git | xargs -n1 git clone)

We should add an etc/repos.json to manta.git as well.
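
If you expect to re-run that (a clone dies halfway, or repos get added later), a slightly more defensive variant might look like the sketch below. It only assumes what the one-liner above already implies, namely that `json -a git` prints one git URL per line; clone_all and the DRY_RUN variable are hypothetical names, not part of any Joyent tooling.

```shell
#!/bin/sh
# clone_all: read one git URL per line on stdin and clone each into a directory.
# Hypothetical helper. Already-cloned repos are skipped, and DRY_RUN=1 prints
# the git commands instead of running them.
clone_all() {
    dest=${1:-repos}                       # target directory, defaults to ./repos
    mkdir -p "$dest"
    while IFS= read -r url; do
        [ -n "$url" ] || continue          # ignore blank lines
        name=$(basename "$url" .git)       # e.g. .../foo.git -> foo
        if [ -d "$dest/$name" ]; then
            echo "skip $name (already cloned)"
        elif [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git clone $url $dest/$name"
        else
            git clone "$url" "$dest/$name"
        fi
    done
}
```

Usage would be along the lines of `json -f etc/repos.json -a git | DRY_RUN=1 clone_all repos` to preview, then the same without DRY_RUN to actually clone.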

------
nnq
Side-topic q: is anyone using Manta in production? And, do you think it could
be used as a sort-of-graph-db?

~~~
james33
We are heavily using it in production on
[http://casinorpg.com](http://casinorpg.com) (not a common use-case, but it
works well for us). Basically, when a player creates a new character, they are
able to customize out of millions of unique possibilities. Since the game is
HTML5, we must generate a unique PNG sprite for their character. We run Manta
jobs to do this by generating the image, compressing it and then storing it
directly in Manta (where it then gets picked up by a CDN).

------
openstacker
How does this compare to openstack?

~~~
bcantrill
Given that you're asking from a throwaway (and especially given your choice of
nickname), you're almost certainly trolling -- but given how old this thread
has become, I think it's safe to answer here for the sake of posterity...

OpenStack and SmartDataCenter address some similar problems: they both manage
fleets of physical machines and provide orchestration (provisioning,
monitoring, etc.) of standing virtual machines on those physical machines. But
in every other regard, these two projects are very dissimilar: they differ in
their goals, in their organization and in their technology choices.

Getting more specific, I (obviously) have a dog in the fight, so it's hard for
me to look at this with an outsider's perspective -- but I think the most
nonjudgemental way of phrasing the fundamental difference is that
SmartDataCenter is opinionated where OpenStack gives architectural choice to
the integrator and/or operator.[1] So in SmartDataCenter you don't pick the
storage substrate (it's ZFS) or the hypervisor (it's SmartOS) or the network
virtualization (it's Crossbow). While OpenStack deliberately accommodates
vendor differentiation, SmartDataCenter deliberately rejects it: we are
designed for commodity storage (shared-nothing -- and no RAID controllers,
please!), commodity network equipment (no vendor-specific SDN) and (certainly)
commodity compute. The upshot is that the integrator/operator needn't design
the system themselves -- which we know from experience can result in greatly
reduced times of deployment. (Indeed, one of the great prides of
SmartDataCenter is our install time: provided you're racked, stacked and
cabled, you can get a cloud stood up in a matter of hours rather than days,
weeks or longer.)

That fundamental difference -- opinion vs. choice -- has many ramifications.
For example, we have no interest in governance (sorry, democrophiles!); if
someone has a good idea, we'll do what most open source projects do and let
our user community make that determination. This is not to be overly
controlling; anyone who disagrees with us (or with anyone!) is welcome to fork
the project -- we have deliberately selected a fork-friendly license in MPLv2.
We also have no interest (zero, none, nada) in legacy enterprise hardware
vendors that are interested in cloud computing only as a vector for preserving
their inalienable right to screw their own customers; there won't be "hooks"
or "plugins" in SmartDataCenter or generally other such sheep-like wolves.

There are many differences, of course, but I think many can be explained by
the fundamental difference in engineering principles of the two projects.

Anyone who is interested in the nature of the opinions asserted in
SmartDataCenter -- or in helping form new ones! -- should check out the
repos[2][3] (we try to document our thinking), join the mailing lists, and/or
join us in IRC at #smartos on Freenode.

[1] This is a bit of a "pro-choice/pro-life" nomenclature in that I have
deliberately picked nomenclature that I think these two projects would use to
put themselves in the best light. More candidly, I would say that OpenStack
seems to me to be a discombobulated mess -- and I'm sure to OpenStack,
SmartDataCenter would seem supremely fascist.

[2] [https://github.com/joyent/sdc](https://github.com/joyent/sdc)

[3] [https://github.com/joyent/manta](https://github.com/joyent/manta)

~~~
jacques_chester
Would you accept the characterisation that SDC is an "opinionated IaaS layer"?

~~~
bcantrill
I think that that's completely fair -- and in fact, it's in the "Design
principles" section of README.md in the SDC repo[1]: "SmartDataCenter is very
opinionated about how to architect a cloud. These opinions are the result of
many years of deploying and debugging the Joyent Cloud."

[1] [https://github.com/joyent/sdc/blob/master/README.md](https://github.com/joyent/sdc/blob/master/README.md)

