This is really important, as MySQL has a weird dialect of SQL with all kinds of broken MySQL-isms (including fundamental things like not using the SQL-standard meaning for the various forms of quoting) as well as drastically limited support for supposedly-advanced things like sub-selects and truly-advanced things like windowing and partition functions...
Seriously: I don't think I have any apps that would actually /run/ on something that spoke MySQL without recoding all of the data access parts (and that's pretty much what a web backend is) from scratch, swearing and cursing the entire time about why I don't get to use X or Y or Z feature.
To me, this is just "yet another of those MySQL optimizer companies" that we see come and go year in and year out, and one that is almost certainly going to experience the same fate as all of the other ones: they are a dime a dozen, and almost always fail. The failure rate on PostgreSQL optimizer companies, however, seems much lower, despite the fact that they don't seem to get the fancy funding (probably because, if you don't take into account the underlying demographic of "performance-savy" from "database user" it looks like a smaller market).
We maintain the MySQL protocol on the wire to minimize friction for existing applications.
We really see ourselves as a counterpoint to NoSQL, which has traditionally offered speed but sacrificed functionality. MemSQL is as fast, scalable, yet retains SQL.
The whole thing makes a lot of sense to me.
Is it based on the same paradigm of single-threaded serial access? What differentiates it from VoltDB? Is it the MySQL compatibility and ability to use ad-hoc queries instead of stored procedures? (If so, how do you make that bit fast?)
Why would I use MemSQL as opposed to relational database on SSD? RDMS on SSD gets me a similar speed boost as opposed to running on a completely new database.
+ DRAM is actually much higher on the memory pyramid than SSDs, so MemSQL handily beats MySQL on SSD any day
+ Putting MySQL on SSD doesn't change the fact that MySQL isn't optimized for SSDs or DRAM; so you're still using b-trees, latching and locking, etc.
+ SSDs have poor write endurance, meaning it has low number of write cycles
+ You would still have to shard your database yourself; while MemSQL helps manage that process
+ DRAM is available on the cloud today while it will be some time before SSDs are offered on most public clouds.
DRAM is an order of magnitude slower than SSD's. It's also an order of magnitude more expensive, however. You can get a TB of SSD for about $2k, which means 50TB of SSD for $100k, or the price of a single engineer for a year. Meanwhile, a Quad High Mem EC2 instance (64GB of Memory is around $1,700 a month), or $20k a year.
True, but there are still a number of optimizations that can be made.
Issues like locking become a problem when IO is slow, and transactions have to wait for locks to clear. When locks are 3 orders of magnitude faster on SSD's vs spinny disk, those issues tend to fade away.
Agreed, SSD's have worse write endurance than RAM. You should be able to architect around that fairly easily, however.
That being said, I spoke with Blekko's CTO, and out of the 700 SSD's they purchased 2 years ago to serve their index off of, they've had 2 drive failures. Their index is write only.
Not having to shard is a Good Thing(TM). But, if you can throw 10TB of SSD storage in a server, sharding should be less of an issue unless you're dealing with Google(n) sized data problems, no? I suspect that 10 TB of DRAM MemSQL would be cost prohibitive.
Absolutely correct, but SSD's are rapidly replacing spinning disk in Enterprise data centers. I suspect that the first VPS service that offers SSD storage is going to rapidly kill the competition because of the insane performance differential. How long am I going to be willing to pay an extra $10 a month for 1 GB of extra RAM in my VPS when I can purchase an 80 Gig SSD for $120?
I'm not trying to troll or be difficult. I'm honestly looking for answers to the in memory / NoSQL / MemSQL vs SSD debate. I've looked at the issue from a number of different angles, but I keep coming down on the SSD side. I figured you would have some of the best, most thought out rebuttals to the issues I've been thinking about.
But, everybody seems to be using Cloud VPS's because they seem very wary of putting some hardware together and throwing it in a rack. I suspect that because I've been rolling my own PC builds for more than a decade, I'm probably much more excited about throwing 20 SSD's in a 2 x 8 core server to see what it would do than someone who has purchased Mac laptops for most of their career. (not flaming, just thinking out loud).
I have a theory that a side effect of Apple owning the startup developer market is that developers think of servers as something residing in a cloud to be rented rather than built from parts. And, because of that, things like NoSQL and non-relational databases have gotten a ton of buzz rather than throwing a _little bit of money at hardware to solve the problem.
EDIT: Just so say that I'm so very happy to see a company which tries to solve real problems not Web 2.0 social networking crap is able to raise money.
EDIT2: The company like this needs $40M investment not $2M.
I don't see why a pure software company needs that much funding in the beginning. Their expenses are close to zero.
So, continuous snapshotting means writing to disk asynchronously?
For our distributed cluster we have the option of writing the write ahead log to the network "log servers" so you can do it even faster. In this case once ones K out of N log servers receive a log record we acknowledge the transaction. Then eventually the log servers flush it to disk. In this case you are able to write very quickly (you can spin off as many log servers as you like), you store your data in memory in one copy and you're durable.
An opensource implementation of Paxos: 
This is the best example yet of YC's considerable ability to take a very deep tech, non-consumer startup, help vet and guide its technology and marketability, and provide the credibility to get it well-funded.
Fantastic round for a database software startup, a breed written off not long ago. I'd think it wouldn't happen maybe anywhere besides YC. Really shows why YC should be (and is) sought out by founders looking to build this kind of company.
Does MemSQL just have more compact data structures that let more stuff fit in memory?
It seems to me that unless it is a complete one-to-one mapping, it is going to be a nightmare to explain what is supported and what is not.
As the market has shown, this isn't a compelling improvement. For us, every part of MemSQL has been optimized for memory - this gives us a 30x boost, rather than the 4-5x you might see just switching out the storage engine.
The decision to support the MySQL protocol on the wire was made to reduce friction for the application. However, that doesn't mean we will support every quirk or 'misfeature' for MySQL just to say we're "MySQL compatible."
Our value proposition is easy performance at scale. In working with customers, it's been pretty straightforward to explain the MemSQL surface area.
(I can imagine designing for RAM from day one can help... but curious as to how much it helps, once the actual storage layer and investment in RAM is identical.)
(Sorry, 'order(s) of magnitude' is a pet peeve of mine.)
Is that binary?
Seriously - if that is a pet peeve, then you need bigger problems. I'll make it easy for you: if someone says "order of magnitude" they always mean base 10, unless otherwise noted.
Those places where "order of magnitude" is most commonly used – especially sciences and computing – are those places where other bases are possible, and where precision matters.
Your 'make it easy for me' doesn't match my experience. Often "orders of magnitude" is used as hand-waving, to make a difference sound more impressive than the speaker is willing to go on record about. When tangible numbers arrive, it may only turn out that what was meant was 'a large multiple', less than the actual power-of-ten that was implied by the high-falutin' phrase.
If you've got a multiple in mind, and want to impress careful quantitative listeners, name the multiple. Say "around 10x" or "up to 10x" or "more than 10x". In comparison, "order of magnitude" sounds like wordy posturing.
In that way it is a non-mathematical form of big-O notation.
You might prefer saying 10x or whatever, but complaining about the base is unnecessarily pedantic.
Edit: Actually, Wikipedia even says In its most common usage, the amount being scaled is 10 and the scale is the (base 10) exponent being applied to this amount (therefore, to be an order of magnitude greater is to be 10 times as large).
How are you sharding the TPC-C workload?
If the answer is sharding on warehouse, how are you implementing the distributed transactions in new order?
1) Can @memsql scale writes ?
2) Is @memsql ACID compliant ?
Your answering on sharding indiciates that @memsql scales writes in exactly the same way as MYSQL - sharding
Re. ACID complaince. I'm guessing you are ?
This looks like a very interesting project!