1.) The range query issue you pointed out can be explained by a well-known limitation of skip lists. Unlike B-Trees, skip lists are unidirectional. By default, our indexes are ascending, so indeed you have to scan all the way to the end to run a MAX() or "ORDER BY id DESC" query. To fix this, just change the primary key to be descending in your schema:
CREATE TABLE x (id int, ..., PRIMARY KEY id (id) DESC, ...)
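For the curious, here's a toy sketch of why direction matters (illustration only, not our actual implementation). Skip-list nodes carry only forward pointers, so an ascending scan is one pointer chase per row, while each step backwards costs a fresh descent from the head:

```cpp
// Toy skip-list node (illustration only, not our actual code). Nodes
// carry forward pointers at each level but nothing pointing backward,
// so an ascending scan just follows next[0], while a descending scan
// has to re-descend from the head to find each predecessor:
// O(k) forward vs. roughly O(k log n) backward.
#include <vector>

struct Node {
    long key;
    std::vector<Node*> next;  // next[i] = successor at level i; no prev
};

// Cheap direction: one pointer chase per row.
Node* advance(Node* n) { return n->next[0]; }

// Expensive direction: a full descent from the head per step back.
Node* pred(Node* head, long key) {
    Node* n = head;
    for (int lvl = (int)head->next.size() - 1; lvl >= 0; --lvl)
        while (n->next[lvl] != nullptr && n->next[lvl]->key < key)
            n = n->next[lvl];
    return n;  // rightmost node with key' < key
}
```

That asymmetry is why declaring the index DESC, as above, is the right fix when your hot queries want the tail of the table.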
2.) The transaction-buffer > 0 setting is really where we shine. Synchronous durability is something we have for flexibility, but I'll be the first to admit that it's not really optimized to perform well. The customers that we're working with are okay with this. And it's inspired by what modern companies do. Maybe it's changed in the time since I've left (Domas?) but Facebook certainly does not write comments synchronously to disk.
3.) Our synchronous durability does indeed flush in 50 ms cycles, so you'll see poor performance on single-threaded inserts. However, as you pointed out, we're optimized for the case with multiple concurrent writers. Since MemSQL implements group commit, throughput picks up under high parallel load. Sure, by writing your own very specific benchmark you can make our writes look poor, but we've never worked with a customer that needed single-threaded, synchronous writes to shine. If this is your use case, unless you need substantial read performance, MemSQL is not the database for you.
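To make the group-commit point concrete, here's a minimal sketch of the idea (my illustration, not our actual code): concurrent writers append their log records and block on a shared flush, so one fsync covers every transaction that queued up before it, and the per-transaction fsync cost amortizes as concurrency rises.

```cpp
// Toy group commit (illustration only). Writers append under a lock,
// then wait until a flush has covered their log position. One fsync
// amortizes over every transaction that queued before it.
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

class GroupCommitLog {
    std::mutex m;
    std::condition_variable flushed_cv;
    uint64_t appended = 0;  // bytes written to the log buffer
    uint64_t flushed = 0;   // bytes known durable (fsync'd)
public:
    // Called by each committing transaction.
    void commit(const void* rec, size_t len) {
        std::unique_lock<std::mutex> lk(m);
        append_locked(rec, len);  // buffered write to the log
        uint64_t my_pos = appended;
        // Block until some flush covers us; many committers share it.
        flushed_cv.wait(lk, [&] { return flushed >= my_pos; });
    }
    // Called by a background flusher (or an elected leader writer).
    void flush() {
        std::unique_lock<std::mutex> lk(m);
        uint64_t target = appended;
        lk.unlock();
        fsync_log();  // the one expensive fsync
        lk.lock();
        flushed = std::max(flushed, target);
        flushed_cv.notify_all();  // wake every committer we covered
    }
private:
    void append_locked(const void*, size_t len) { appended += len; }
    void fsync_log() { /* write(2) + fsync(2) on the real log fd */ }
};
```

With one writer you pay a full flush cycle per commit (hence the single-threaded numbers); with N concurrent writers the same flush retires N commits.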
innodb_flush_log_at_trx_commit=1 (the default, but we add it to our configs explicitly because we run with =2 on the slaves, which don't need full durability)
We also use some related performance settings:
innodb_flush_method = O_DIRECT
innodb_prepare_commit_mutex=0 (enables group commit in the Facebook patch, other branches have a different setting)
For settings like innodb_doublewrite we leave the default of enabled.
See the issue?
Skip lists can be used for range queries as well. This means that "SELECT * FROM table ORDER BY id DESC LIMIT 5;" can be executed very efficiently as long as there is a skip list on "id". Even COUNT(), MAX(), MIN(), etc. queries can be optimized in a similar fashion. Again, however, doing it in a lock-free manner might not be the most fun thing to do.
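As an illustration of the COUNT() case (my sketch, not necessarily what MemSQL does): the classic trick is to store a "span" with each forward pointer, counting how many bottom-level nodes it skips, so a range count is one O(log n) descent instead of a scan.

```cpp
// Indexable skip list sketch: span[i] counts the bottom-level nodes
// jumped over by next[i]. Summing spans along a single descent yields
// COUNT(keys < bound) in O(log n) without touching the rows.
#include <vector>

struct Node {
    long key;
    std::vector<Node*> next;  // forward pointer per level
    std::vector<long> span;   // nodes skipped by next[i]
};

long count_less_than(Node* head, long bound) {
    long count = 0;
    Node* n = head;
    for (int lvl = (int)head->next.size() - 1; lvl >= 0; --lvl) {
        while (n->next[lvl] != nullptr && n->next[lvl]->key < bound) {
            count += n->span[lvl];  // everything jumped over is < bound
            n = n->next[lvl];
        }
    }
    return count;
}
```

Keeping those spans consistent under lock-free concurrent inserts is exactly the "not fun" part.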
That's all well and good, but as a consequence, it is only fair to compare yourself to other systems that are configured to take advantage of the same context.
"MEMSQL IS 30 TIMES FASTER THAN MYSQL."
It's an extremely invalid and biased comparison. You're quite literally comparing the speed of writing to RAM versus the speed of writing to disk. If you didn't make such ridiculous assertions, people would accept your product for the actual awesome things it does and not focus on refuting your bullshit claims.
memcached is probably 30 times faster than MySQL too, but they don't go around claiming it's a durable database.
edit: the MemSQL main page says "Durable by Default"
We have the durability interface and configuration options here http://developers.memsql.com/docs/1b/durability.html
This should give you a good idea about how durability will react to tuning, but it doesn't dive deep into the internal design. If there's enough interest (seems like there is) we'd be happy to discuss it in a blog post.
This is a huge differentiator we are talking about here. For someone who has been around databases for quite some time (about 5 of those years spent working on the SQL Server engine), this is very exciting. Is it not?
This sounds like absurd cargo culting. I've never designed a database, but parsing the SQL cannot ever have been the bottleneck; the fact that you rarely write to disk is where any speed improvement comes from.
I don't get excited about 1% decreases in latency. I'm willing to bet money that the performance penalty of parsing the SQL queries is asymptotically a constant, or at least a tiny fraction of the time spent computing the data set.
I feel especially confident in this because memsql never brags about the specific increase in speed. This can't be hard to measure.
You are making all these skeptical claims that are technically wrong. I suggest you download MemSQL, compile a query and read the generated C++ code. Then download MySQL source and look at that too. At that point you will be coming to your own conclusions and I'll be happy to resume this debate.
Some queries in databases are indeed CPU bound, but the blanket statement that well-optimized databases turn out to be CPU bound is not correct. It all depends on the use case.
If you're thinking full table scans with complicated processing on a database that's entirely in memory, then yes, you'll be CPU bound. But many (most?) databases are not entirely in memory, and there's no reason they need to be. With reasonable memory and indexing, most queries boil down to only a few IOs/disk seeks, so you're looking at <50ms, which is entirely fine for your average web request.
That's the case for the majority of "get by ID" style queries, which in my experience really are the lion's share of queries, and in particular are also what needs to be fast. A complicated reporting query can take a couple of seconds, but your "main page/item view" must be fast.
If you have random reads on a large database that cannot be kept in memory, this will be the majority of your workload. Those queries will spend all their time waiting for the disk, and more or less none interpreting your query tree, so optimizing the query tree processing part makes no sense at all.
TL;DR, in my experience optimizing query tree interpretation does not yield significant benefits for the majority of queries, and IMHO won't be a significant differentiator for MemSQL outside of marketing.
There are use cases where 50ms is way too long, and where the jitter associated with disk I/O is not acceptable. Capital markets, for example. You are looking for sub-millisecond latency to be in the intraday post-trade game, and single-digit milliseconds for quote-to-trade. The number of instructions in the hot code path (and cycles per instruction) actually starts to matter. Lock-free structures used for storage are also important. This combination gives MemSQL some rather unique low-latency properties that certain folks can exploit to their delight.
To your other point, random reads and key-value lookups is where MemSQL shines. Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.
I'm not sure where you are getting this information. According to the MemSQL documentation, they have one and only one isolation level, READ COMMITTED. But READ COMMITTED takes write locks and blocks reads, while not taking any read locks. Blocking behaviour still occurs (and should!).
It sounds like you are looking for dirty read behaviour, but from the literature MemSQL doesn't support READ UNCOMMITTED.
Update: Read a later FAQ on the MemSQL website. They are using MVCC, so write locks are irrelevant. My apologies to the MemSQL team.
It is addressed in this FAQ: http://developers.memsql.com/docs/1b/faq.html#c1-q4.
I didn't realise you had a new FAQ up on the website - I'll have a look.
Compiling queries to C++ is great for some BS bullet point on a website with BS claims of performance gains, but it's only a niche feature at the end of the day. There are more important things that should be getting tackled first: multi-node distribution, better fsync handling, etc.
There's no way to tell that this query:
SELECT * FROM table WHERE id > 5 LIMIT 10;
is actually of the same form as this query:
SELECT * FROM table WHERE id > 10 LIMIT 5;
without actually parsing the two, so you will incur a parsing overhead either way. What I think MemSQL is trying to optimize away is the actual execution of the query: rather than interpreting the plan with checks performed on every iteration, they compile it to code. I don't know how much of a difference this makes without actually seeing the numbers.
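To make that concrete, here's a toy sketch of how such skeleton matching could work (entirely my guess, not MemSQL's actual code): strip out the integer literals, then key a plan cache on the normalized text.

```cpp
// Toy skeleton matcher: replace integer literals with '?', then key a
// plan cache on the normalized text. A real implementation needs a
// proper lexer so it doesn't mangle literals inside strings, floats, etc.
#include <cctype>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Normalized {
    std::string skeleton;      // query with literals replaced by '?'
    std::vector<long> params;  // the extracted constants
};

Normalized normalize(const std::string& sql) {
    Normalized out;
    for (size_t i = 0; i < sql.size();) {
        if (std::isdigit(static_cast<unsigned char>(sql[i]))) {
            size_t j = i;
            while (j < sql.size() &&
                   std::isdigit(static_cast<unsigned char>(sql[j]))) ++j;
            out.params.push_back(std::stol(sql.substr(i, j - i)));
            out.skeleton += '?';
            i = j;
        } else {
            out.skeleton += sql[i++];
        }
    }
    return out;
}

int main() {
    std::unordered_map<std::string, int> plan_cache;  // skeleton -> plan id
    for (const char* q : {"SELECT * FROM table WHERE id > 5 LIMIT 10;",
                          "SELECT * FROM table WHERE id > 10 LIMIT 5;"}) {
        Normalized n = normalize(q);
        bool fresh = plan_cache.emplace(n.skeleton, (int)plan_cache.size()).second;
        std::cout << (fresh ? "compile:   " : "cache hit: ") << n.skeleton << '\n';
    }
}
```

Both queries above normalize to "SELECT * FROM table WHERE id > ? LIMIT ?;", so the second is a cache hit. Note that even this toy version walks the text character by character, which is the parsing overhead that gets paid either way; the win would be in skipping plan generation and interpretation, not lexing.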
As far as I know that's the same on just about all databases.
Not sure which ones if any do this today, the optimisation might no longer be relevant - I spotted DBs doing this 10 years ago when CGIs that didn't bother with prepared statements were rife.
Do you mean you determine placeholders without actually parsing the SQL?
@jakejake The docs seem to suggest that the benefits of prepared statements are won without using prepared statements in MemSQL.
http://developers.memsql.com/docs/1b/faq.html gives some more details.
edit: incidentally, one of the corner cases I can think of is joining 10 tables, which gives you 10! combinations for the query processor to work through. This is irrelevant in MemSQL, because you can only join to another two tables, and you cannot do right outer or full outer joins.
The original DB2 for VSE and VM (not DB2 for z-series/390) which was the productized version of System R did exactly this - compiled to native code. Subsequent DB2 implementations chose not to go down this path - even for single OS systems like DB2/390 and DB2/400.
In any event, I'm skeptical if this is going to make very much of a difference for the following reasons:
1. The time spent by the runtime/interpreter in "evaluating" the operators or expressions is really small relative to the heavy lifting of the actual relational operators to say nothing of I/O etc.
2. Most serious implementations don't really interpret the plan using a giant switch statement. For instance, with PostgreSQL, the interpreter puts in function pointers on first iteration. Thereafter it's only slightly worse than compiled native code.
When you compile to native code, you can essentially save the dynamic dispatch on query parts, plus all the other benefits you get from compilation.
E.g. if you do good type analysis, you can also compile to much more efficient data representations, which will again have benefits for memory access. The compiler can also inline code, do loop lifting, and all those goodies.
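As a toy illustration of that dispatch cost (my own sketch, not any real engine's code): the interpreted tree pays indirect calls per node per row, while the compiled form is straight-line code the C++ compiler can inline.

```cpp
// Interpreted vs. "compiled" expression evaluation (illustration only).
#include <cstdio>
#include <memory>

struct Expr {                               // interpreted plan node
    virtual ~Expr() = default;
    virtual long eval(long row) const = 0;  // dynamic dispatch per row
};
struct Col : Expr {
    long eval(long row) const override { return row; }
};
struct AddConst : Expr {
    std::unique_ptr<Expr> child;
    long k;
    AddConst(std::unique_ptr<Expr> c, long k) : child(std::move(c)), k(k) {}
    long eval(long row) const override { return child->eval(row) + k; }
};

// What codegen would emit for the same expression: no virtual calls,
// free to be inlined into the scan loop.
inline long compiled_eval(long row) { return row + 42; }

int main() {
    std::unique_ptr<Expr> tree =
        std::make_unique<AddConst>(std::make_unique<Col>(), 42);
    long a = 0, b = 0;
    for (long row = 0; row < 1000000; ++row) {
        a += tree->eval(row);     // two indirect calls per row
        b += compiled_eval(row);  // straight-line, inlinable
    }
    std::printf("%ld %ld\n", a, b);
}
```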
But overall I strongly agree with 1), any time spent doing execution is usually dwarfed by IO.
What does this mean? Does the SQL get converted into actual C++ source code, then compiled with an internal C++ compiler? That seems like a weird thing to do. Or is it translated directly into an AST? If so, I can't imagine other DBs not doing it. I don't understand the claim being made here.
I am also curious if there is some way the developers have demonstrated the benefits of this approach vs. what other systems do. Most of the CPU time in a simple query in VoltDB is spent in networking, validating and suffering the cache-miss pain of walking index data structures. I can't see clearly how compiling to native code makes any of that much faster.
In fact, making a plan completely stable can cause problems if the data distribution changes dramatically. I asked about this problem in a previous article about MemSQL, and the answer was that the plan won't expire unless you drop an index or add a new one. 
I will say positive things for Microsoft: you can always trial their software before you buy it.
InnoDB does this too. I don't know if MemSQL actually has checkpointing or what, but it's worth noting.
Still, that's not what we're talking about. We're talking about the difference between writing data to the transaction log, and writing data to the transaction log and then flushing it. Once you call fsync() and the data is flushed you know for a fact (provided the hardware isn't lying to you) the bytes are on the damn platter, and not in some OS buffer cache.
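In code, the distinction is just this (a minimal POSIX sketch; the file name and record format are made up for illustration):

```cpp
// write() alone may leave the commit record in the OS page cache;
// only after fsync() succeeds may we report "committed".
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

bool commit(int log_fd, const char* record, size_t len) {
    // Step 1: append to the transaction log (may only reach OS buffers).
    if (write(log_fd, record, len) != (ssize_t)len) return false;
    // Step 2: flush. Only now are the bytes (supposedly) on the platter,
    // provided the hardware isn't lying to you.
    return fsync(log_fd) == 0;
}

int main() {
    int fd = open("txlog", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return 1;
    const char* rec = "COMMIT txn 42\n";
    bool ok = commit(fd, rec, std::strlen(rec));
    close(fd);
    return ok ? 0 : 1;
}
```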
However, in practice most operating systems and/or file systems and/or disks cheat. What fsync() forces out usually lands in the hard disk's own volatile write cache. You can disable that and force the hard disk to truly write out on fsync, but that is so prohibitively slow that people rarely do it.
If you want absolute durability, you'll have to have hard disks running on some battery-buffered power supply, which is a common configuration.
On the other hand, in a proper database system, at least the data files won't be corrupted by a missing fsync, so it'll come up. Figuring out whether that one commit did or did not go out in the very rare event of a fatal power failure and a just pending fsync() and the commit making it out of the network stack in time is probably a Heisenberg-esque inquiry into obscure realms of uncertainty.
Without the proper safety precautions, use of write cache on physical drives could cause data loss in the event of a power failure....
The bottom line is that Database pros insist on and get actual durability, and with a battery backed write cache, it's not painful.
But I agree, getting battery backed write caches is not that hard, it's just not a default config.
From the article: "See, MemSQL also has “transaction-buffer” setting, which, in default “full durability mode” will asynchronously return “ok” until 128M buffer is full (or background log flusher thread writes it out)."
So which is it?
If you generate data which are honest in your use case, then you can publish them honestly as long as you are specific about your use case.
Making write speed claims when you're not touching the disk is like bragging about how fast you can cat a file to /dev/null.
On that note, here's my take on MemSQL: people who love MySQL almost never want to migrate. People who hate MySQL usually hate it because of SQL. People who are far enough along in their projects that the choice of database is no longer a matter of taste don't use a brand-new untested DB.
Therefore, I am unsure how a brand new, in-memory, SQL based solution fits.
Or because of the shortcuts it has taken to avoid the "hard stuff", which in turn lets developers take shortcuts by using MySQL to avoid the "hard stuff" themselves.
Love or hate SQL for what it is, not the MySQL implementation of it.
I am deeply interested in what you think this is
For one thing, it's not an optimizer like gcc at all. It doesn't take your existing algorithm and speed it up, it chooses a better algorithm. How does it know which is better? It collects statistics and runs them through a cost model.
This is what allows something like PostgreSQL to offer nested loop joins, merge joins, and hash joins. Without a CBO, how would it know which to choose? Similarly, it can offer multiple ways to use the same index (normal index scan, bitmap index scan) and multiple ways to aggregate (Sort+Group or HashAggregate).
Those are just a few of the many algorithms postgres offers that are chosen in a principled way by the CBO.
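As a crude illustration of how such a choice gets made (textbook cost formulas and made-up row counts, not postgres's actual model):

```cpp
// Toy cost-based join selection. The optimizer estimates the cost of
// each algorithm from table statistics and picks the cheapest.
#include <cmath>
#include <cstdio>

int main() {
    const double outer_rows = 1e6, inner_rows = 1e4;  // from statistics

    // Nested loop: rescan inner per outer row (no index assumed).
    double nested_loop = outer_rows * inner_rows;
    // Hash join: build a hash table on inner, one probe per outer row.
    double hash_join = inner_rows + outer_rows;
    // Merge join: sort both sides, then a single merge pass.
    auto nlogn = [](double n) { return n * std::log2(n); };
    double merge_join =
        nlogn(outer_rows) + nlogn(inner_rows) + outer_rows + inner_rows;

    std::printf("nested loop: %.3g\nhash join:   %.3g\nmerge join:  %.3g\n",
                nested_loop, hash_join, merge_join);
    // A CBO picks whichever estimate is cheapest for the statistics at
    // hand; change the row counts and a different plan wins.
}
```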
Consider the SQL constructs missing relative to other engines (CTEs, window functions, better join and subquery support); the effort and mess required to accomplish what should be straightforward tasks is what generates SQL hate.
Some NoSQL projects have merit, as they provide mechanisms for working with data structures that do not map well to rows and columns. However, difficulties faced with MySQL have also pushed some projects to use equivalently broken data storage technology, when a "better" RDBMS would be more appropriate.
Citation needed. Which database system on earth doesn't do page based storage? In fact, you cannot get any durability guarantees if you don't write page-at-a-time.
MongoDB is relying on the operating system's cache manager by using memory-mapped files. Whether that is a good idea depends a lot on your operating system, and on whether its page replacement algorithm fits your database use case. Most databases implement their own page cache because they want tighter control over those algorithms.
I like MongoDB's data model, I don't understand their choice of query language, and I think their database engineering is odd to say the least. What's the point in having a non-durable system if you don't even get scalability across nodes? Where's the hot backup? And so on. It seems like something people use who just couldn't figure out indices in relational databases, just to run around and tout that NoSQL is so much faster :-(
Isn't that just a higher likelihood of "durability"?
When the operating system sends a write request to the disk hardware, there is little it can do to make sure the data has arrived at a truly non-volatile storage area. Rather, it is the administrator's responsibility to be sure that all storage components ensure data integrity. Avoid disk controllers that have non-battery-backed write caches. At the drive level, disable write-back caching if the drive cannot guarantee the data will be written before shutdown.
The SQLite documentation goes into further detail surrounding the atomicity of writing a single 512 byte sector during a power failure (on old drives, on 4096 byte drives, and on SSDs). Few people seem to account for any of this stuff, yet sleep soundly at night regardless.
What if that disk crashes, or the SAN array breaks and kills all the data, or the data center burns down?
It's much, much, much more likely to have a memory failure (a crash anywhere from thread to hardware) than a hard drive failure. I don't mind them claiming that as durability.
So you can measure memory and process durability in hours to days, and spinning disk durability in years. I don't know how long SSDs last yet.
Say (just making up numbers) you expect your servers to run 10 days without a reboot or crash, and your disks to last 5 years before failing. That means writing to memory is about 0.005479 times as durable. So it's not great, but it's not zero either.
An interesting thought experiment: how many machines replicating data only in RAM does it take to be more durable than one machine with a spinning disk?
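Here's a back-of-envelope attempt (all numbers, the 5-minute repair window, and the MTTDL approximation are my own assumptions, not from this thread):

```cpp
// Mean time to data loss for n RAM-only replicas vs. one disk, using
// the standard approximation MTTDL ~ MTBF * (MTBF/R)^(n-1) / n! for
// independent failures with repair time R.
#include <cmath>
#include <cstdio>

int main() {
    const double mtbf_hours = 10.0 * 24.0;         // machine crash: ~10 days
    const double repair_hours = 5.0 / 60.0;        // re-replicate in ~5 min
    const double disk_hours = 5.0 * 365.0 * 24.0;  // disk: ~5 years

    double fact = 1.0;
    for (int n = 1; n <= 4; ++n) {
        fact *= n;
        double mttdl =
            mtbf_hours * std::pow(mtbf_hours / repair_hours, n - 1) / fact;
        std::printf("%d replica(s): MTTDL ~ %.3g days%s\n", n, mttdl / 24.0,
                    mttdl > disk_hours ? "  (beats the disk)" : "");
    }
}
```

Under those (very generous) independence assumptions, two replicas already outlast a single 5-year disk. Correlated failures, shared power, partitions, the same bug deployed everywhere at once, are what ruin this in practice.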
So just pick the tool that fits your requirements best and stick with it. That being said, I also urge you to pick something free and open-source if you can. Part of how big sites have leveraged MySQL to the extent they have is the custom patchsets they apply. Closed-source solutions may not provide similar hotfixes or customization (and when it comes, it's months later than they claimed they'd have it ready).
1. Someone creates a project on GitHub.
2. Uploads an amply sized CSV full of data.
3. Publishes the list of queries needed to "gauge" speed.
4. Allows anyone to post their metrics to see how well tuned any database can be in comparison.
The only other criterion would be to ask them all to use the same form factor. EC2 makes the most sense.
I HATE the idea personally, but I cannot think of a better way to start comparing apples to apples.
Domas' main criticism is that our video, http://vimeo.com/44087431, uses MySQL with standard defaults. Since tuning a database is by definition a custom process, we wanted to demonstrate what performance you'd get "out of the box." The video speaks for itself. MemSQL can push 80k/sec vs 3.5k/sec for MySQL on EC2.
Domas actually demonstrates MemSQL's robust durability controls: he binds MemSQL to disk. If you want to throw an anchor behind a speedboat, you can do it, though we don't recommend it.
Even Domas agrees his article isn't serious: "I do agree, that its execution speeds for table scans are not bad – 8M rows scanned a second (for SELECT COUNT(*) query) from single thread is sure a great achievement."
If that's the highest praise I can get from Domas, I can live with it.
I wouldn't have written that post the way it is if your marketing were honest and carried actual technical merit. There are plenty of other companies in the community that have invested much more in decent technical communication.
I agree that in some cases pure durability isn't needed and best-effort with an enforced maximum slippage is good enough, but still, the pure-durability implementation is extremely naive, and I can show that it does really badly in benchmarks :-)
Your optimizer does not optimize very simple things, and though you can claim that in-memory performance hides that, it is still nowhere close to a drop-in replacement of any kind.
I understand what you've done from a technology perspective, and it is an interesting approach (albeit I'm not trying to evaluate applicability).
I just cannot approve the way you did initial publicity - unfair comparisons are unfair comparisons, and if you're using them that way, I can come up with my unfair comparison :)
Companies use MemSQL because it uses memory as the primary locus of data. If you have a fast data problem, you can't use a disk-based system. At high speeds, you'd want to deploy any database in an active-active mode.
It's dishonest, at the very least, to claim ACID compliance and then do benchmarks against a known ACID compliant competitor without having the same level of ACIDity enabled in your program.
If you want to prove how fast MemSQL is against MongoDB, that's one thing (I understand it also can be "fully durable" but it is not by default due to speed concerns). To prove how fast MemSQL is against MySQL, you have to prove it on the same playing field, which means it has to, necessarily, be just as ACIDic and respond to "real" queries on large datasets with high concurrency.
tl;dr Do some benchmarks of apples-to-apples ACID, and do something about the common case of a head of a table, then you will be honestly faster (or not) than MySQL
Just a little addendum: MongoDB has been "durable by default" since v1.7 I believe (current: 2.0.6).
Yes, and Domas was criticizing that as a strawman; as comparing apples to oranges. Since they tailor their defaults to different criteria (e.g. different standards of durability), of course they will perform differently.
The only really valid benchmarks are when you get an expert in each to tune their respective products for a specific scenario, and see who outperforms whom. And then make public exactly what configuration was done to get there.
Additionally, I'm not sure the implementation of the hard-reliability case sounds very mature or even well thought out. Having a syncer running at 20 Hz and waiting on it even in the synchronous path seems like a bad idea. That's kind of like adding an anchor to your anchor so your speedboat can crawl while it crawls. Or was that a misrepresentation?
And who cares if the tone is serious? The criticisms seem to be valid, and ones that anyone releasing software making bold claims should expect. Statements about performance need to be qualified. Out-of-the-box behavior is not a reasonable point of comparison if (1) that's not what people do in the real world (2) the tests you use are ones that one piece of software is tuned for and the other is tuned against. Some kind of sorted index by field is necessary for some very common and desirable queries. Some of the performance gains come at an unacknowledged cost. The durability just isn't there (and for trivial reasons that could be fixed by making a few more parameters tunable, it sounds like), but is implicitly lauded by claiming to be ACID.
I don't know if you need to go looking for praise in that article. It really seems like there are some criticisms it would be beneficial to take seriously, even if you object to the author's tone or lack of praise. Even-handedness is not a marker of honesty, and honest criticism is worth the headaches it causes and the growth it spurs if you're willing to take it seriously.
The index was missing on the ID column in MemSQL's case, so MemSQL was running a full table scan instead of using the index as MySQL did.
Perhaps the vimeo video needs the same disclaimer.
Since tuning a database is by definition a custom process, we wanted to demonstrate what performance you'd get "out of the box."
Since you're picking the tests and setting what the "out of the box" defaults are, the whole thing is kind of a waste of time, which is really what Domas is pointing out.
There's a weird implication here that once you need disk persistence, you might as well give up. Group commit and well-tuned, append-focused IO can do wonders if leveraged correctly. For example, we've benchmarked VoltDB using synchronous disk writes (group commit every 2ms) at half a million SQL statements/sec and 100k txns/sec on a single box using magnetic media and a disk controller with a BBU. This same config on EC2/EBS will run dramatically slower, but would still be considered respectable compared to MySQL.
I have no opinions either way on your product, but have you tried benchmarking against an out-of-the-box MySQL installed entirely on a ramdisk, so that it doesn't touch disk at all as well? That would make the comparisons more valid, since both installations wouldn't touch a physical disk.
So what your video is saying is that your speedboat pushes 80,000/sec against a MySQL with an anchor attached. Seems like a fair comparison.
I think it is reasonably fair to test MySQL's default, recommended configuration which writes to disk frequently, particularly if the other configuration is not worth using.
If it is worth using, then the real question is what you give up by using MemSQL instead of MySQL in the alternate configuration.
If you mean to ding MemSQL on durability then you can ding MySQL on speed. If you aren't going to ding MemSQL on durability (because you are comparing it to less disk-intensive MySQL configs) then presenting a case for MySQL means you have to come up with something that you are giving up other than durability based on frequent disk writes.