
Clustrix (YC W06) Builds the Webscale Holy Grail: A Database That Scales - pg
http://gigaom.com/2010/05/03/clustrix-builds-the-webscale-holy-grail-a-database-that-scales/
======
leif
Good for them and all, and I know they're still kind of stealth mode, but this
article doesn't say much to me.

It's not that I doubt they can deliver, I just want to see some numbers. The
one graph I could find ([http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-...](http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-brochure-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf))
really doesn't say much, and to be honest, I can't even really tell how I'm
supposed to be reading it.

You don't go and build the best sports car in the world and then tell everyone
"it goes really fast", you show videos of it screaming around a curvy mountain
road. I want my curvy mountain road video.

~~~
bertove
They were supposed to launch on May 4, but the story seems to have gotten out
a day early. There should be more details coming, starting with their official
launch at Web 2.0 tomorrow.

------
snitko
I thought the holy grail of web scale was being able to start small (startups
can't pay 80K for a licence) and then painlessly scale up. What they offer
seems more like a better-priced option for large enterprise deployments: the
licence is (supposedly) cheaper than the cost of scaling. And that's cool, OK.
But personally, what I'd like to be able to do is launch a small application
with a DB holding, say, 10K records and then scale it up to billions of
records, investing money when I actually have it. Or not.

~~~
ErrantX
Agreed. We purchased an Oracle-driven enterprise app that is probably the
target market for this. We would hold billions of records if it were
economical to do so - the problem is that 80K is a _lot_, and it's probably
cheaper to pay me to do 3hrs a week of maintenance/backup.

The other area where we might have used this is somewhere we have about 40
billion database entries (several TBs of data) and growing. However, like
Google et al, once you get to that scale it's not worth paying for a
proprietary service, because cheap/commodity hardware on a FOSS stack is _a
lot_ cheaper.

However, there is almost certainly a market here for "enterprise" type
businesses (you know, the sort of grey faced telecoms-style place). The
problem is supplanting Oracle...

------
rbranson
* Cool. This is a hard problem.

* Custom hardware sounds expensive.

* This does not solve CAP issues.

* There is a physical limit to ACID scalability because of synchronization latency and network throughput maximums. KV stores don't have this issue.

* Will developers still have to implement KV caching layers for database data? I suppose the real question is: does the system accelerate "hot" data where normal RDBMS sharding schemes would otherwise not?

* Does the linear scalability also apply to OLAP-type activity, where queries are run against large datasets?

* What are they doing that Oracle RAC isn't?

~~~
sergei
Oracle RAC allows multiple machines to synchronize access to a shared disk.
Each node in a RAC cluster locks individual disk pages and brings the data to
the node for processing. Only one node works on a query at a time, while
synchronizing access to resources at the physical level.

Clustrix is fundamentally different. We compile the query down into query
fragments and evaluate relevant portions of the query at the node which holds
the relevant data. We bring the query to the data.
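
The "bring the query to the data" idea can be sketched roughly like this (a
toy model with made-up node names, not Clustrix's actual implementation):

```python
# Toy sketch of shipping query fragments to the nodes that hold the data,
# rather than pulling raw pages back to a coordinator. Names illustrative.

# Each node owns a slice of the rows.
nodes = {
    "node_a": [{"id": 1, "city": "SF"}, {"id": 2, "city": "NY"}],
    "node_b": [{"id": 3, "city": "SF"}, {"id": 4, "city": "LA"}],
}

def fragment(rows, predicate):
    """The 'query fragment' each node evaluates locally on its own data."""
    return [r for r in rows if predicate(r)]

def run_query(predicate):
    # Every node runs the fragment and returns only matching rows, so the
    # coordinator merges small partial results instead of shuffling pages.
    partials = [fragment(rows, predicate) for rows in nodes.values()]
    return [r for part in partials for r in part]

print(run_query(lambda r: r["city"] == "SF"))  # rows with id 1 and 3
```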

You can find more specifics on our architecture in our white paper:

[http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-...](http://www.clustrix.com/wp-content/uploads/2010/04/clustrix-whitepaper-01-no-on-sql-mysql-object-key-value-store-database-scaling.pdf)

Clustrix is all about concurrency and parallelism. While we didn't
specifically focus on OLAP, our model allows us to do very well for these
types of workloads because we break the query apart and let individual nodes
work on their portion of the query.

While we deliver our product as an appliance, it's all based on commodity
hardware.

If you had an infinitely scalable database, would you still want to put a
caching layer in front of it? Most people would rather not deal with the added
complexity.

In a typical sharding scheme, one would choose a partition key (typically some
primary key, e.g. user id) and use that as the distribution strategy. At that
granularity, hot spots are a common problem. In Clustrix, we have independent
distribution/partitioning for each table/index, down to the row level. The
only way to get a hot spot is to contend on a single row -- but that's really
an indication of something going terribly wrong in the data model.
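
The hot-spot difference between coarse sharding and per-row distribution is
easy to see in a toy model (illustrative hashing only, not Clustrix's actual
scheme):

```python
# Toy comparison: sharding by a coarse partition key (user id) vs. hashing
# each row independently. Node count and hash choice are arbitrary.
import hashlib
from collections import Counter

NUM_NODES = 4

def node_for(key):
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % NUM_NODES

# One "hot" user with many rows.
rows = [("hot_user", i) for i in range(1000)]

# Sharding on user id alone piles every row onto a single node.
coarse = Counter(node_for(user) for user, _ in rows)
# Hashing each (user, row) pair spreads the same rows across all nodes.
per_row = Counter(node_for((user, i)) for user, i in rows)

print(dict(coarse))   # one node holds all 1000 rows
print(dict(per_row))  # roughly 250 rows per node
```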

ACID (transactional properties) and KV (data model) are orthogonal concepts.
Our system scales exceptionally well and supports a wide range of data
models.

~~~
strlen
I'd hesitate to call RAC a true example of "scaling out". As you've mentioned,
RAC relies upon shared disk in a SAN to provide data storage. Exadata would be
the more appropriate comparison: there's also the "appliance" model and while
Exadata was originally meant as an OLAP solution, it is claimed to be OLTP
capable via SSDs and fast interconnect.

You're also very correct that a caching layer in front of a database that
_itself_ includes a cache seems like added complexity and resources (similar
to using a query cache _in front of_ InnoDB's own cache with MySQL). A true
scale-out solution should not require (or even significantly benefit from) a
caching layer.

However, I don't see how your product runs on commodity hardware: I can't buy
a machine from Supermicro to run it, it makes use of high-speed interconnect
(not something I can place in most data-centers). It seems more targeted at
e.g., billing/accounting systems (which require strong transactional
integrity) rather than storing high-scalability web application state (e.g.,
sessions, shopping carts, user histories, analytically derived data, CMS
content, small pictures).

"Appliances" generally carry the stigma of being difficult to manage,
requiring more admins for the same number of machines than regular hardware.

"Web scale" companies (e.g., Facebook, Google, Amazon, Yahoo) can run _huge_
clusters with very few administrators (easily a thousand machines per person)
due to extensive automation of the whole process. Since the appliances run on
a different OS image, the operations team can't manage them with their
standard set of provisioning and configuration management tools: when one
goes down, they can't use their main jumpstart/kickstart server to perform an
unattended install, and there will be difficulties managing its configuration
via standard tools like puppet/cfengine/chef (much less the custom tools that
such companies write). From what I've seen, they typically end up relying on
separate teams to manage their Oracle RAC installations due to these issues
(these teams often have a much worse admin:machine ratio, e.g.,
multiple-person teams for a <100 node RAC cluster). Facebook opts to stick
with MySQL partly because their operations team knows how to automate every
bit of it.

How do you plan to address these issues?

That being said, I love seeing start-ups solve difficult problems.
Congratulations on shipping a product that solves a tough problem.

~~~
rbranson
> You're also very correct that a caching layer in front of a database that
> _itself_ includes a cache seems like added complexity and resources
> (similar to using a query cache in front of InnoDB's own cache with
> MySQL). A true scale-out solution should not require (or even significantly
> benefit from) a caching layer.

Caches can be implemented at different levels. I'm talking about query caching
(Exact SQL -> Exact Results). How would a query cache be effective in an
environment like Clustrix (distributed MVCC)? Invalidation would be almost as
expensive and hard to manage as the DELETE/UPDATE itself. With lots of RAM,
it's probably cheaper just to run the query.
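
A toy exact-SQL -> results cache makes the invalidation pain concrete (all
names here are hypothetical, not from any real system):

```python
# Toy "exact SQL text -> cached result" cache. The hard part is not the
# lookup but knowing which cached entries a write invalidates; coarse
# table-level invalidation is cheap to implement but throws away hot entries.

cache = {}              # sql text -> cached result
table_queries = {}      # table name -> set of cached sql texts

def cached_query(sql, table, run):
    if sql not in cache:
        cache[sql] = run()
        table_queries.setdefault(table, set()).add(sql)
    return cache[sql]

def invalidate(table):
    # Any write to the table drops every cached query that touched it.
    for sql in table_queries.pop(table, set()):
        cache.pop(sql, None)

users = [{"id": 1, "name": "ada"}]
r1 = cached_query("SELECT * FROM users", "users", lambda: list(users))
users.append({"id": 2, "name": "bob"})
invalidate("users")     # without this, the cache serves stale rows
r2 = cached_query("SELECT * FROM users", "users", lambda: list(users))
print(len(r1), len(r2))  # 1 2
```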

------
DenisM
The "OLTP landscape chart" raises red flags. Matching the functionality of
Oracle in 4 years and $18M is an extraordinary claim.

So this raises the question of what exactly it can do. The white paper covers
straight select, insert, sort, and an inner join, which is pretty good
already, but it doesn't go beyond that.

~~~
bertove
I believe it supports the entire MySQL API.

------
boris
Complete MySQL API compatibility is a bold claim.

If they implemented the whole thing themselves (I would say that should take
more than 4 years in a couple-of-guys setup), then maintaining that
compatibility must be/will be a nightmare. If you sell me a product as MySQL-
compatible and I find that some bizarre, undocumented side effect of the
MySQL implementation that my code relies on is missing, you will have to
emulate that side effect.

If, on the other hand, they based their implementation on MySQL code, then
this raises some licensing questions (MySQL is GPL, so they would have to
make their changes public). They may have separated the code somehow so that
this is less of an issue, though it adds a bit of uncertainty to the whole
setup.

------
lzimm
there was this programming language i remember reading about a while ago that
was basically about moving computation around to different nodes to bring the
"function to the data" as opposed to the other way around -- which is the
exact same thing the white paper says clustrix is doing.

anyways, with any sort of non-trivially related data, it becomes a continuous
graph optimization problem to minimize the number of nodes you hit on any
given "query" (i.e. keep data that tends to be accessed together on the same
nodes, or die). i can easily see how, though it may seem like an elegant
solution, you could very quickly run into scary problems that are really just
a giant tangled mess.

anyways, distributed joins aren't that hard. select and insert are solved by
the key-value/consistent-hashing dbs. even mysql cluster gives you all that
stuff these days (the whitepaper mentions non-trivial multi-master joins).
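
the consistent-hashing routing those kv stores use for select/insert fits in
a few lines (made-up node names, a minimal ring with no replication):

```python
# Minimal consistent-hash ring: keys map to the first virtual node found
# clockwise from their hash, so routing needs no central lookup and adding
# a node only moves a fraction of the keys. Node names are made up.
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each node gets many virtual points to smooth out the distribution.
        self._ring = sorted((h(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def node_for(self, key):
        # First vnode clockwise from the key's hash owns it (wrap around).
        i = bisect.bisect(self._keys, h(key)) % len(self._ring)
        return self._ring[i][1]

ring = Ring(["db1", "db2", "db3"])
print(ring.node_for("user:42"))  # deterministic routing, no central lookup
```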

holla

------
Maro
Hey, they're looking for a "Technical Support Engineer - 6AM Shift":

Required Skills / Duties: ... \- Able to lift 80 lbs several times in a single
day. Our servers weigh between 50 and 140 lbs depending on configuration.

Sounds more like a bodybuilder position to me =)

<http://www.clustrix.com/about/careers>

------
lsb
_Clustrix plans to sell its appliance (which consists of more than a terabyte
of memory and its proprietary software)_

If you give people a few TB of memory, of course database reads are going to
be fast.

~~~
DenisM
I think they meant TBs of storage...?

~~~
sl_
_Each CLX 4010 appliance contains:

Two Quad-Core processors (8 CPU cores) 32GB memory Seven 160GB solid-state
drives (SSD)_

~~~
lsb
See, it's a shame it's only 32GB of memory. I can go on dell.com and buy a
machine with 128GB of RAM for about $10K, far less than I'd expect they're
selling their boxes for.

------
mattmaroon
My company might have bought this 6 months ago assuming it provably works as
advertised. We ended up sharding (which I'd gladly pay $80k to have avoided)
and going Cassandra for future apps.

------
DenisM
_hundreds of nodes_

The whitepaper seems to imply all nodes are connected to all other nodes
using InfiniBand. InfiniBand is point-to-point afaik, so it'd be interesting
to know how they plan to do that with hundreds of nodes.

~~~
josephruscio
IB is typically deployed in a Fat Tree network:
<http://en.wikipedia.org/wiki/Fat_tree>

It's been used in high-performance compute clusters with thousands of
individual nodes.

~~~
wmf
Gratuitous Infiniband porn: <http://blogs.sun.com/jonathan/entry/size_matters>

3,456 ports.

~~~
DenisM
Dear god. I had no idea.

------
jbellis
Superficially at least it sounds a lot like what you'd get if you added SSDs
to Oracle's Exadata.

~~~
drosenthal
Exadata does use SSDs now, in the form of Sun's flash memory cards (though
they are still backed by rotational disk). You're right, however, that this
and Exadata look to attack the same problem in a similar way. If we believe
Clustrix about the "100s of nodes" rather than the 20 they show data for, it
would likely out-perform Exadata by a bit too.

------
coffeemug
Congrats Sergei and the Clustrix team! Scalable databases are a _really_ hard
problem to solve. It's great to see products that push the boundaries of
what's possible.

------
shaunxcode
How does this compare to FathomDB in terms of what it provides at a business
level? (Not technically, i.e. FathomDB leverages 'the cloud', whereas this
seems to use specialized hardware.)

------
bkrausz
Their website has no call to action or availability information...only a
hidden "Contact Us" link. What if I want to buy one (I don't, but what if)?

~~~
apphacker
It's $80,000 for a 3-node machine running the software, and what, you're
looking for a Guaranteed Paypal merchant button or something?

~~~
bkrausz
No, I'm looking for a call to action if I'm interested in finding out more. I
recognize that big purchases are on a salesperson system, but I should have an
easy avenue to reach these salespeople.

------
sown
I wonder what they look for in hiring? I guess experience is a big plus, as
is general smarts.

I'm getting bored at my current job. I think if I could port our enterprise
solution to Cassandra it would help in an interview?

------
ck2
This makes me happy to hear, especially that it's NOT April Fools!

------
einarvollset
Greenplum.com

