
Ex-Facebookers launch MemSQL (YC W11) to make your database fly - ericfrenkiel
http://gigaom.com/cloud/ex-facebookers-launch-memsql-to-make-your-database-fly/
======
Shish2k
I've seen a lot of these new SQL databases, and they all seem to compare
themselves to MySQL. That seems like a particularly easy target, since MySQL's
main strength is speed, and it's a fundamentally old-school (ie slow) design
which has sacrificed everything else to get half-fast. What I'd like to know
is how any of these compare in features and reliability to the featureful,
reliable databases like Oracle and Postgres.

(For specific examples that I've run into recently: does it have geo-spatial
extensions, and does it crash and burn and corrupt all its data if a malloc()
fails?)

~~~
chris_wot
It still uses a transaction log, but seems to make snapshots of the logs at a
period of time to create compressed snapshots of the data itself.

You can turn off durability, but then you risk losing _everything_.

For features... forget about geo-spatial extensions - they can only do read-
committed isolation level... but as they only allow for one SQL statement per
transaction I suppose this might not be such a big concern.

If you were using this for something like a session management database, or
something where durability is not so important then it's probably fine. Not
sure I'd use it for anything that relies heavily on transaction management
features.

I see other limitations, like it cannot support CHANGE COLUMN, and it can't do
joins on more than 2 tables. Actually, it doesn't look like it can do FULL
OUTER or RIGHT OUTER joins, either. :(

~~~
nikita
MemSQL is a database that does some things incredibly well (delivering high
throughput on small transactions) and some things not as well (surface
area).

Of course, a new product will have certain limitations, which will be removed
as the product matures.

~~~
chris_wot
No probs, like I've said elsewhere, not attacking the product. Looks very
interesting! I guess I'm trying to work out what sort of market you are trying
to target here.

------
vessenes
This company was probably an extremely easy investment decision:

1) Facebookers: Check

2) Data Scaling experience: Check

3) In-Memory with SQL semantics: Check

4) New-York based software that can be sold to quant funds: Check

I'm excited. It's downloading now.

~~~
moe
Is it just me or does this entire product smell like it was designed for the
sole purpose of extracting money from less than tech-savvy investors and
clueless institutions?

The bullshit-bingo-lingo on their homepage is mindnumbing.

Meanwhile their actual software seems rather underwhelming, bordering on
SnakeOil.

~~~
vessenes
If you've ever tried to use SQL on a large, but fittable-in-memory dataset on
existing row databases, then you would not be underwhelmed. Existing non-
finance-industry solutions suck for things like mid-size trading queries, or
(in coinlab's case) work setups that require periodic, frequent table scans.

You can buy something like kdb, but that costs $70k a year and requires that
your engineers learn some extremely new semantics if they're only used to SQL
and (choose any popular language here, Python to C++).

~~~
moe
So, MemSQL is a ColumnStore now? The docs didn't make that clear for me.

Anyway, why is there not a single benchmark available to support the sales-
pitch?

(In your favor I'll just pretend you didn't mention kdb here...)

~~~
gruseom
_In your favor I'll just pretend you didn't mention kdb here_

Please explain. What's bad about kdb, besides how nonstandard it is?

~~~
moe
There is nothing bad about it (other than the price). It just seems an
outrageous claim that MemSQL, with all its constraints, is even in the same
game.

At the least they should come up with some seriously impressive benchmarks
before dropping names like that.

------
moe
Is this a joke?

Data must fit in RAM, no joins over >2 tables, no transactions, no builtin
support for clustering/sharding/horizontal scaling whatsoever.

What is the advantage over memory-tables, MySQL on a ramdisk or something
purpose-built (redis)?

~~~
old-gregg
On the other hand, if your data fits in RAM and you don't need to join more
than two tables and you don't need clustering/sharding whatsoever and you need
the full power of SQL, what would you use?

 _> ..something purpose-built (redis)?_

Errr maybe "features" like GROUP BY with HAVING? I think you're trolling, Mr
Moe.
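
To make the GROUP BY with HAVING point concrete, here is a minimal sketch of the kind of query meant. It uses Python's built-in sqlite3 as a stand-in engine, and the `trades` table and its rows are made up for illustration; nothing here is MemSQL-specific.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("AAA", 100), ("AAA", 250), ("BBB", 50), ("CCC", 400)],
)

# Aggregate per symbol, then filter on the aggregate itself.
rows = conn.execute(
    "SELECT symbol, SUM(qty) AS total "
    "FROM trades GROUP BY symbol HAVING SUM(qty) > 200 "
    "ORDER BY symbol"
).fetchall()
print(rows)
```

With a plain key-value store you would have to do the grouping and filtering in application code.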

~~~
moe
_trolling_

Says the guy who resorts to selective quoting to make a point? You may want to
review my original comment where I mention two other options.

~~~
MuGo
nope. you're definitely a troll

------
snissn
MemSQL does not yet support JOINs on more than 2 tables. [1]

1\. <http://developers.memsql.com/docs/1b/sql/join.html>

~~~
chris_wot
Nor transactions with more than one statement. And only read committed
isolation level. [1]

1\. <http://developers.memsql.com/docs/1b/isolationlevel.html>

~~~
snissn
This seems really confusing and like it might be a maintenance nightmare
<http://developers.memsql.com/docs/1b/memory.html>

Do you need to ensure that all of your data fits in memory? It's easily
possible to have ten gigs in your db but only need a small amount of it to be
hot in ram. Does it take a really long time to restart if it has to warm all
the data?

~~~
chris_wot
Good questions - no idea! Can one of the developers of MemSQL clarify this?
How long does it take to start up a large database of several hundred GB?

~~~
nikita
Recovery on system restart runs at the speed of the hard disk.

~~~
chris_wot
Can you pin how much data from a table can be placed into memory, or does the
whole table need to be placed in memory? I can see settings that limit the
transaction log memory usage, but no way of reducing the actual amount of data
in-memory.

~~~
nikita
The whole table has to fit in memory. RAM is getting very cheap; you can buy
1 TB for 12K today.

~~~
tintor
Recovery on system restart runs at the "sequential scan" speed of the hard
disk.
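
As a back-of-the-envelope sketch of what that implies for restart time: if recovery is bound by sequential reads, the lower bound is just data size divided by read throughput. The throughput figures below are assumptions picked for illustration, not MemSQL measurements.

```python
def recovery_seconds(data_gb: float, seq_read_mb_per_s: float) -> float:
    """Lower bound on restart time if recovery is limited by sequential reads."""
    return data_gb * 1024 / seq_read_mb_per_s


# Hypothetical sequential-read throughputs, for illustration only.
for label, mb_per_s in [("spinning disk", 100.0), ("SSD", 500.0)]:
    minutes = recovery_seconds(300, mb_per_s) / 60
    print(f"300 GB from {label} at {mb_per_s:.0f} MB/s: about {minutes:.0f} min")
```

Log replay and index rebuilding would add to this, so treat it as a floor rather than an estimate.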

~~~
nl
The "sequential scan" speed of a generic SSD is pretty good. It's even better
on something like FusionIO.

------
alexro
How does it compare to VoltDB, which uses a similar approach and already
"combines the proven power of relational processing with blazing speed, linear
scalability and uncompromising fault tolerance" (according to the VoltDB
website)?

~~~
nikita
There are a few differences from VoltDB.

1\. It's much easier to use (MySQL compatibility, ad-hoc SQL, no Java
dependency).

2\. It doesn't have any issues with data skews. We are using lock-free
skiplists vs b-trees partitioned by core.

3\. I believe we are faster on a single box; however, this claim should
obviously be verified by a third party.

~~~
jhugg
Congrats to the MemSQL team. It's always validating to see more entrants into
the space. The comparison between the two systems is fascinating; the approach
taken is really different from VoltDB's, but the systems share some similar
choices as well.

As for your comments:

1\. Ad-Hoc SQL in VoltDB is a current area of focus in VoltDB development. We
use similar compilation and expect them to be no slower than our stored
procedures in the 2012 timeframe. Starting with our release next week, we'll
also support multiple SQL statements in a serialized/ACID ad-hoc transaction.

2\. On a single node (or several replicated nodes), VoltDB doesn't really
have many issues with data skews. Rebalancing a distributed system is also in
development, but hasn't turned out to be much of an issue with our production
customers.

3\. Faster at what? Single statement non-durable transactions? That's probably
true for simple SQL operations for the time being. It gets murkier once you
consider VoltDB features like materialized views that avoid table-scans
altogether. I am also curious how durability affects performance though.

But I'm really interested to revisit this comparison once MemSQL supports real
transactions and replication/sharding. For the time being, most of the VoltDB
use cases we target won't run on it.

------
spitfire
There's another in memory mysql replacement whose name escapes me right now.
How are they doing?

Ycombinator seems to have backed a few of these next-gen databases. It will be
interesting to see who wins.

EDIT: RethinkDB. But it looks like they've dumped their MySQL engine and gone
to a pure key-value store.

~~~
JonM
Xeround looks really interesting (<http://xeround.com/>). Seems to do
everything in RAM and takes care of any replication / backup & scaling for
you. Looks like it would get expensive with large amounts of data though...
anyone using this in production?

~~~
nikita
One of the things that separates us from Xeround is the fact that you can see
the benefit of super high throughput on any Linux box, including your laptop.
You can download the software and use it by yourself.

Xeround is a SaaS model, where you can't get the exact same experience as on
your local machine.

------
zwischenzug
I'm obviously missing something. Is SQL really the bottleneck for SQL
performance? I'm not sure how much compiling down to C++ will help.

Is there more to this than the article suggests?

~~~
al_james
I think it's that it's an in-memory database, optimised for memory as opposed
to disk. The SQL to C++ seems a bit of a confusion to me; I guess it means it
has a JIT SQL compiler to optimise the query to native opcodes. Meh. But the
in-memory design and MySQL wire compatibility are big wins.

~~~
chris_wot
I don't believe this is the case - it's not an in-memory database.

Even if it is (and I can't see anything on their developer website) then if
it's ACID compliant, like they say, then it will have to be writing to the
disk at some point - at the very least it will need a transaction log.
Otherwise, it's not really usable for anything you need to persist to long
term storage!

~~~
al_james
Yes, it's not clear on their site, but the gigaom article states: "As its name
implies, MemSQL achieves its fast performance in part by keeping data in
memory".

Yes, it does write to disk (append only logging I think) but my point was,
that if you are keeping ALL your data in memory, you can optimise the storage
for fast queries, as opposed to a hybrid / paged memory and disk system.

~~~
chris_wot
Oops, sorry - I'm wrong. Actually, according to their documentation:

 _"By default, MemSQL runs with full durability enabled: transactions are
committed to disk as a log and later compressed into full-database snapshots.
The snapshot and log files are stored in the datadir directory configured in
memsql.cnf. When MemSQL is restarted, it will recover itself into the state it
was in before the shutdown, by reading the snapshot and log files in datadir"_
[1]
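
To make that concrete, here is a toy sketch of the log-plus-snapshot pattern the docs describe. It is an assumption-laden illustration, not MemSQL's actual on-disk format: the `TinyStore` class, the file names and the JSON encoding are all made up, and the snapshot is not compressed.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with a write log and snapshots."""

    def __init__(self, datadir: str):
        # Hypothetical file names; the real layout is not documented here.
        self.snapshot_path = os.path.join(datadir, "snapshot.json")
        self.log_path = os.path.join(datadir, "log.jsonl")
        self.data = {}
        self._recover()

    def _recover(self) -> None:
        # Load the last snapshot, then replay log entries written after it.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key: str, value) -> None:
        # Append to the log and force it to disk before applying in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self) -> None:
        # Write the full state atomically, then truncate the log it covers.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()


with tempfile.TemporaryDirectory() as datadir:
    store = TinyStore(datadir)
    store.put("a", 1)
    store.snapshot()
    store.put("b", 2)               # lives only in the log so far
    restarted = TinyStore(datadir)  # simulates a restart
    recovered = dict(restarted.data)
    print(recovered)
```

All reads are served from `self.data` in memory; the disk is only touched on writes, snapshots and restart, which is the point al_james was making.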

I missed that piece of documentation on their website. I think they could do
a better job of explaining the advantages and how their technology works.
Their product overview doesn't really tell you much of anything, except that
they have an enhanced query parser. [2]

1\. <http://developers.memsql.com/docs/1b/durability.html>

2\. <http://developers.memsql.com/>

------
continuations
Why should I use this over MySQL Cluster?

MySQL Cluster is lock-free, distributed, HA, and has better performance:

<http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-72-achieves-43bn-reads.html>

<http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-727-achieves-1bn-update.html>

And MySQL Cluster has been used in mission critical apps for years.

What does MemSQL give me that MySQL Cluster doesn't?

------
chris_wot
Question - how well does the query parser handle stale queries? SQL Server for
the longest time had issues with plan stability in that the plan became _too_
stable. When the data distribution changed dramatically, the queries didn't
age out of the cache and the queries would do such things as use the wrong
index, or not work out the correct cardinality of a table and then use an
index where really a full table scan would have been better... and so on.

How does the database handle this sort of thing?

~~~
nikita
Right now you can handle this with query hints. The plans are stable, but they
are attached to the query text. When you throw a hint in there memsql
generates a new plan since the query text changes.

~~~
chris_wot
To clarify - if you add the query hint, run the query, then remove the query
hint then has the original query's plan expired?

 _Edit:_ I know I'm asking a lot about the plan cache, but as it's a core
selling point, I'd like to know how data distribution changes will affect
performance. Stale queries will potentially adversely affect performance under
certain circumstances.

~~~
tintor
No, the original query plan will only expire if indexes are dropped or new
ones are created.
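
Roughly, the behaviour described in this sub-thread can be sketched as a cache keyed by query text that is flushed on index changes. This is a simplified illustration, not the real implementation: the `PlanCache` class, the exact-text matching and the `USE INDEX` hint spelling (borrowed from MySQL) are assumptions.

```python
class PlanCache:
    """Toy plan cache keyed by the exact query text."""

    def __init__(self):
        self._plans = {}
        self.compilations = 0

    def get_plan(self, query_text: str) -> str:
        # A hit requires identical text; anything else compiles a new plan.
        if query_text not in self._plans:
            self.compilations += 1
            self._plans[query_text] = f"plan#{self.compilations}"
        return self._plans[query_text]

    def on_index_change(self) -> None:
        # Creating or dropping an index expires every cached plan.
        self._plans.clear()


cache = PlanCache()
plain = "SELECT * FROM t WHERE a = 1"
hinted = "SELECT * FROM t USE INDEX (idx_a) WHERE a = 1"  # hypothetical hint

first = cache.get_plan(plain)
again = cache.get_plan(plain)                # same text: cached plan reused
with_hint = cache.get_plan(hinted)           # different text: new plan
after_removing_hint = cache.get_plan(plain)  # original entry is still cached
print(first, again, with_hint, after_removing_hint, cache.compilations)
```

So removing the hint just brings you back to the old cached plan; only `on_index_change()` forces a recompile in this sketch.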

------
acdha
It looks like this is a fork of MySQL. Did they actually pay Oracle for a
proprietary license or is there a source release hidden on the website?

~~~
ericfrenkiel
yes, we have commercial licensing agreements in place.

------
al_james
Looks useful. Drop in mysql compatibility is a huge plus for many projects
that are tied to mysql (for whatever reasons).

Does anyone know if there are any plans for having MongoDB style replica sets
(e.g. sharded and replicated databases in a cluster)?

Also, it would be great if it supported the native mysql replication, so you
could have MemSQL replicas of a master mysql DB.

~~~
ankrgyl
Sharded/replicated MemSQL is in the works. And yep, the replication will be
compatible with MySQL replication to enable grouping MemSQL and MySQL nodes.

~~~
al_james
Excellent! Good work!

Side note: IMO, if you want to completely dominate this space, make it super
easy to add new nodes and replicate the existing DB without having to take a
snapshot from the master.

~~~
nikita
Thanks! This is absolutely on our minds.

------
factorialboy
Looks promising. Love how it inter-operates with MySQL driven apps.

I should be able to try it for one of my apps over this weekend!

------
ozgune
Congrats guys on the launch! Looking forward to giving it a spin.

------
prezjordan
Looks like they're using jQuery Knobs: <http://anthonyterrien.com/knob/>

~~~
nikita
Yes, for the workload simulator :)

------
alexrson
Congratulations, Eric!

------
Produce
Why not just use objects in your favourite shitty OOP language as usual and
take regular snapshots of the data in memory and save them to disk as rollback
points instead? Bam! Nothing new to implement in the application. Bam!
Lightning fast access and write times. Bam! Data persistence.

Bam!

------
ericmoritz
Wow, HN is harsh. This is probably a really solid product that solves a
specific problem and you all nit-pick at its trade-offs and poo-poo it.

~~~
MrMan
HN is not harsh. HN is one of the main marketing channels for YC-funded
projects such as this.

~~~
ericmoritz
Sorry, some commenters on HN are harsh.

------
vmalkani
Congrats!

