MariaDB acquires Clustrix

mr_pickles · on Sept 21, 2018

Alright, for your education, startup employees, let's run through some numbers I have as a stockholder in Clustrix.

In 2010, they raised $12M in their Series B at ~$100M post-money valuation. Things were looking alright.

In 2013, they raised $16.5M in their Series C, and then shortly thereafter $10M in an unusual Series D. That funding round reverse-split the outstanding stock 26-to-1 and converted all existing shares, preferred or otherwise, to common stock. What was left was $10M in new preferred stock, and $20M in existing common stock! New post-money valuation: $30M. This down round ended up being a 30x dilution for existing shareholders. If you had a tenth of a percent of $100M before, now you had a hundredth of a percent of $30M. Yowza!
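To make that arithmetic concrete, here is a minimal sketch that uses only the figures quoted above. The function name `stake_value` is made up for illustration, and the calculation ignores liquidation preferences and anything else in the actual deal terms.

```python
def stake_value(ownership_fraction: float, valuation_usd: float) -> float:
    """Paper value of a stake: fraction owned times company valuation."""
    return ownership_fraction * valuation_usd


# Figures quoted in the comment above.
before = stake_value(0.001, 100_000_000)   # a tenth of a percent of $100M
after = stake_value(0.0001, 30_000_000)    # a hundredth of a percent of $30M

ownership_dilution = 0.001 / 0.0001        # how much the ownership fraction shrank
value_drop = before / after                # how much the paper value shrank

print(f"before: ${before:,.0f}")
print(f"after:  ${after:,.0f}")
print(f"ownership shrank {ownership_dilution:.0f}x, paper value shrank {value_drop:.1f}x")
```

With these inputs the stake goes from about $100,000 on paper to about $3,000, which is where the roughly 30x figure comes from.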

After that bath, the board amended the charter so they stopped mailing out these notices. I don't know what's happened since, but I'll find out soon enough.

I feel bad that the company wasn't successful. It really was a great team and an impressive technical feat.

wumpus · on Sept 21, 2018

VCs do deals all of the time, and us startup people do only a few, so I'm hardly an expert... but this down round sounds unusually friendly to common stockholders.

Shelnutt2 · on Sept 20, 2018

First they picked up InfiniDB a while back and have been working to mainline "mariadb columnstore". Now comes the acquisition of Clustrix and discussion of mainlining that as well. It looks like MySQL's separation of the storage engines is paying off in their ability to keep one interface while allowing significantly different backends to meet different workload requirements.

There has been a lot of work expanding the storage engine API so columnstore can be mainlined and so spider can progress. I imagine that progress has shown them that they are likely to be able to integrate a much larger and less connected (to existing mariadb) code base like Clustrix. I can only hope they decide to open source (BSL license?) the Clustrix solution eventually.

Between MyRocks (replacing tokudb), columnstore, spider and eventually Clustrix, it seems that mariadb is trying to make the case that they can handle any size workload being thrown at it.

karmakaze · on Sept 20, 2018

I had high hopes for MyRocks, then I got a chance to use it. The limitations, mainly being 5.6-only and having no coexistence with InnoDB, made me reevaluate TokuDB, and it was a better choice for a write-heavy, low-update workload, especially with interval flushing (non-fsync durable) commits.

zepearl · on Sept 20, 2018

I am also using MariaDB + TokuDB with "a write heavy, low update, workload especially with interval flushing (non-fsync durable) commits" => do you maybe have a short list of the limitations of MyRocks in MariaDB for this area?

I tried to use MyRocks (never used it before) in MariaDB some months ago but could find almost no docs and ultimately didn't understand which parameters were supposed to be set, how, and under which conditions...

sethhochberg · on Sept 20, 2018

IMO this is one of the biggest issues with the alternative storage engines for MySQL-family databases... we've also experimented with TokuDB for log-like data but found that, ultimately, the shortage of detailed documentation and operational issues like needing to develop homegrown tooling for things like backups overpowered the performance benefits.

InnoDB isn't perfect, but it _is_ exhaustively documented and pretty well-understood, with a great set of related tools from Percona, etc, for simplifying operations. That goes a long way.

Recently we've switched back to using InnoDB for ingestion on one of our write-heavy tables and aggressively archiving the data out of it and into ClickHouse (InnoDB deals with the high volume of concurrent inserts, data is loaded into ClickHouse in large batches for querying). By comparison to Toku or RocksDB, ClickHouse is refreshingly well-documented and it's easy for us to make consistent backups with ZFS snapshots.
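A rough sketch of that ingest-then-archive pattern, with plain Python lists standing in for both the ingestion table and the analytics store. The class `IngestBuffer`, the `load_batch` callback, and the batch size are all hypothetical; no real MySQL or ClickHouse client API is shown here.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class IngestBuffer:
    """Stand-in for the write-heavy ingestion table (InnoDB in the comment)."""

    def __init__(self, batch_size: int, load_batch: Callable[[List[Row]], None]):
        self.batch_size = batch_size
        self.load_batch = load_batch  # hypothetical bulk loader for the analytics store
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        # Many small, concurrent-style inserts land here first.
        self.rows.append(row)

    def archive(self) -> int:
        """Move full batches out of the buffer; return how many rows were moved."""
        moved = 0
        while len(self.rows) >= self.batch_size:
            batch = self.rows[: self.batch_size]
            self.load_batch(batch)            # one large write downstream
            del self.rows[: self.batch_size]  # delete only after the load returned
            moved += len(batch)
        return moved


analytics_batches: List[List[Row]] = []  # stand-in for the analytics store
buf = IngestBuffer(batch_size=1000, load_batch=analytics_batches.append)
for i in range(2500):
    buf.insert({"id": i, "event": "click"})
moved = buf.archive()
print(moved, len(analytics_batches), len(buf.rows))
```

Rows that do not yet fill a whole batch stay in the buffer until the next archive pass, so the downstream store only ever sees large writes.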

mdcallag · on Sept 21, 2018

There are many options but most don't have to be set. We need to improve the tuning experience. See http://smalldatum.blogspot.com/2018/09/5-things-to-set-when-...

mdcallag · on Sept 21, 2018

MyRocks isn't 5.6 only. It is in Percona 5.7 and MariaDB 10.3.

I agree that tuning is too complex and we should do much better there. This explains where to ask for advice - http://smalldatum.blogspot.com/2018/05/where-to-ask-question...

no1youknowz · on Sept 20, 2018

How is this any different from what MemSQL is already doing?

newnewpdro · on Sept 20, 2018

Clustrix had a great team and technology but nearly everyone responsible for building the product left the company long ago. Even Sergei, the founding CTO, eventually left for Dropbox a few years ago.

It'll be interesting to see if this acquisition results in the interesting clustrix bits becoming libre software.

sciurus · on Sept 20, 2018

I was unfamiliar with Clustrix. It looks like they've been around a while (YC06) and have some sophisticated technology (http://docs.clustrix.com/display/CLXDOC/Distributed+Database...). A horizontally scalable drop-in replacement for MySQL is nothing to sneeze at.

https://www.clustrix.com/

newnewpdro · on Sept 20, 2018

There's some technical information still available on Sergei's old company blog:

http://sergei.clustrix.com/

spooneybarger · on Sept 20, 2018

We used it at a previous employer of mine. It had a few quirks but in the end, it saved the company from a mountain of technical debt by allowing us to avoid a huge rewrite to address database scaling issues that were becoming intractable.

reiger · on Sept 20, 2018

I believe many of their founders were ex-Isilon people.

ddorian43 · on Sept 20, 2018

I think it's in-memory. There is also mysql-ndb-cluster, which is free.

newnewpdro · on Sept 20, 2018

> I think it's in-memory.

In-memory support was added later, though I'm not sure if it made it into a release as I stopped paying attention around when they started developing that feature.

They originally ran exclusively on specialized appliances having battery-backed write caches. The implementation is all in C using Linux AIO w/epoll. It's a sophisticated, high-performance storage engine.

When everyone shifted to the cloud and stopped owning hardware, Clustrix was forced to add a file-based backend. This still uses the high-performance AIO+epoll engine.

meguest · on Sept 20, 2018

I spent a lot of time at a previous job supporting an NDB cluster and I can attest first hand that it is awful.

Single node failures cause entire cluster shutdowns, the cluster then takes forever to recover, and the recovery must be done in a specific order. In fact just thinking about it makes me anxious.

ddorian43 · on Sept 20, 2018

What version was this?

mr_pickles · on Sept 20, 2018

Early Clustrix employee here. Holy cow that took a long time! 12 years since the company was founded and finally they exit. I can't wait to exchange my illiquid startup stock for... stock in a _different_ private company.

newnewpdro · on Sept 20, 2018

So it was a 100% stock purchase? Bummer man.

mr_pickles · on Sept 20, 2018

I don't actually know the terms yet, I just assumed. I only learned of the acquisition from Hacker News.

TylerE · on Sept 20, 2018

Beats the stock turning into toilet paper.

pinewurst · on Sept 20, 2018

But it turned from one brand of toilet paper into another, with indeterminate scratchiness.

misterbowfinger · on Sept 21, 2018

Couldn't really parse out the website for what Clustrix actually is. Is it basically a leader node that distributes writes to key-value stores, and then the reads figure out where the data is by a partitioning scheme, with the benefit of MySQL protocol? Similar to CockroachDB?

mr_pickles · on Sept 21, 2018

It was a fault-tolerant, fully distributed relational database which was compatible with MySQL's variant of SQL. There were no key-value stores involved.

Tables (and indexes) were automatically partitioned and replicated as needed, completely under the covers.

Queries (reads and writes) were distributed to the nodes where the data resided, in parallel.

Scaling the system was as simple as adding new nodes. Data was automatically rebalanced to take advantage of the new capacity.

Failure recovery was automatic too. If a disk or node failed, the data involved would be reconstructed from replicas and moved elsewhere with no interruption in service and no failed transactions.

It was a pretty impressive system, which predated Google Spanner. But, in the early days, you had to run their custom hardware to get it. There was no cloud version.
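For anyone trying to picture the partitioning, rebalancing and recovery described above, here is a toy sketch. It is not Clustrix's actual algorithm: it swaps in rendezvous (highest-random-weight) hashing as a simple placement rule, and the replication factor, slice count and node names are all assumptions.

```python
import hashlib
from typing import Dict, List

REPLICAS = 2     # assumed replication factor, not a Clustrix default
NUM_SLICES = 64  # assumed number of table slices


def _score(slice_id: int, node: str) -> int:
    """Deterministic pseudo-random weight for a (slice, node) pair."""
    digest = hashlib.sha256(f"{slice_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(slice_id: int, nodes: List[str]) -> List[str]:
    """Pick REPLICAS distinct nodes for a slice by rendezvous hashing."""
    ranked = sorted(nodes, key=lambda n: _score(slice_id, n), reverse=True)
    return ranked[:REPLICAS]


def layout(nodes: List[str]) -> Dict[int, List[str]]:
    """Replica placement for every slice on the given set of live nodes."""
    return {s: place(s, nodes) for s in range(NUM_SLICES)}


def moved_slices(old: Dict[int, List[str]], new: Dict[int, List[str]]) -> int:
    """Number of slices whose replica set differs between two layouts."""
    return sum(1 for s in old if set(old[s]) != set(new[s]))


three = layout(["node1", "node2", "node3"])
four = layout(["node1", "node2", "node3", "node4"])  # scale out by one node
after_failure = layout(["node1", "node3", "node4"])  # node2 is lost

print("slices touched by adding node4:", moved_slices(three, four))
print("slices re-replicated after losing node2:", moved_slices(four, after_failure))
```

The property worth noticing is that adding a node only moves slices onto the new node, and losing a node only re-creates the replicas that lived on it; every other slice stays where it was.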

misterbowfinger · on Sept 21, 2018

Thanks! Is there a diagram that shows how it works? I'm still having trouble visualizing it.

nehcsivart · on Sept 21, 2018

Not a diagram, but here is an informational video by Clustrix: https://www.youtube.com/watch?v=PUq1fYZlNPs

The video is from almost 5 years ago, but the high level idea discussed is still true today.

Rafuino · on Sept 20, 2018

What does Clustrix offer that MariaDB Cluster, within its MariaDB TX offering, does not?

erulabs · on Sept 21, 2018

I believe the main (and only?) purpose of Clustrix is sharding, which MariaDB Cluster doesn't provide - with the proviso, I suppose, that multi-master is strictly -not- the same as sharding.

mjw00 · on Sept 21, 2018

Not exactly. With Clustrix, your application isn't aware of the data partitioning. Instead, the query compiler creates a multi-stage program that forwards tuples from one partition directly to the next, with only the result set coming back to the node the application is connected to. Applications can connect to any node within the cluster and issue queries (usually this is done through a load balancer).
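As a toy illustration of that tuple-forwarding idea (not Clustrix's real query compiler), the sketch below runs a two-stage join over three simulated nodes. The tables, the modulo-based `owner` function and the query are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NODES = 3  # assumed cluster size


def owner(key: int) -> int:
    """Node that owns a row, by a simple modulo on its key (an assumption)."""
    return key % NODES


# users is partitioned by user_id, orders by order_id.
users = {1: "ann", 2: "bob", 3: "cat", 4: "dan"}
orders = [  # (order_id, user_id, amount)
    (10, 1, 250), (11, 2, 40), (12, 3, 900), (13, 1, 75), (14, 4, 300),
]

users_on: Dict[int, Dict[int, str]] = defaultdict(dict)
for uid, name in users.items():
    users_on[owner(uid)][uid] = name

orders_on: Dict[int, List[Tuple[int, int, int]]] = defaultdict(list)
for row in orders:
    orders_on[owner(row[0])].append(row)


def run_query(min_amount: int) -> List[Tuple[str, int]]:
    """SELECT name, amount FROM orders JOIN users USING (user_id)
    WHERE amount >= min_amount"""
    # Stage 1: every node filters its local orders and forwards each
    # surviving tuple straight to the node that owns the matching user.
    inbox: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for node in range(NODES):
        for _order_id, user_id, amount in orders_on[node]:
            if amount >= min_amount:
                inbox[owner(user_id)].append((user_id, amount))

    # Stage 2: every node joins the forwarded tuples against its local
    # users; only these result rows go back to the node the client uses.
    results: List[Tuple[str, int]] = []
    for node in range(NODES):
        for user_id, amount in inbox[node]:
            results.append((users_on[node][user_id], amount))
    return sorted(results)


rows = run_query(min_amount=200)
print(rows)
```

The application only sees `rows`; it never needs to know which node held which order or user, which is the difference from application-level sharding.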