
MemSQL CTO here. Great article; Domas has done a good job of digging into the internals of MemSQL! A few questions/comments:

1.) The range query issue you pointed out can be explained by a well-known limitation of skip lists. Unlike B-trees, skip lists are unidirectional. By default, our indexes are ascending, so indeed you have to skip to the end to run a MAX() or "ORDER BY id DESC" query. To fix this, just change the primary key to be descending in your schema:

    CREATE TABLE x (id int, ..., PRIMARY KEY id (id) DESC, ...)
    
This is explained here http://developers.memsql.com/docs/1b/indexes.html#skip-list-.... If you want both behaviors in your app, you'll have to create two skip list indexes (for now).

2.) The transaction-buffer > 0 setting is really where we shine. Synchronous durability is something we offer for flexibility, but I'll be the first to admit that it's not really optimized to perform well. The customers we're working with are okay with this, and it's in line with what modern companies do. Maybe it's changed in the time since I left (Domas?), but Facebook certainly does not write comments synchronously to disk.

3.) Our synchronous durability does indeed flush in 50 ms cycles, so you'll see poor performance on single-threaded inserts. However, as you pointed out, we're optimized for the case with multiple concurrent writers: since MemSQL implements group commit, throughput picks up under highly parallel load. Sure, with a very specific benchmark you can show us performing poorly, but we've never worked with a customer that needed single-threaded, synchronous writes to shine. If this is your use case, unless you need substantial read performance, MemSQL is not the database for you.
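To make the group-commit point concrete, here is a toy sketch (my own illustration, not MemSQL's actual implementation; the function names are made up) contrasting one fsync per commit with one fsync shared by a whole batch:

```python
import os

def commit_each(f, records):
    # Synchronous durability with a single writer: every commit pays
    # for its own fsync, so latency floors at one disk sync per
    # transaction.
    syncs = 0
    for r in records:
        f.write(r + b"\n")
        f.flush()
        os.fsync(f.fileno())
        syncs += 1
    return syncs

def commit_grouped(f, records):
    # Group commit: all records accumulated during one flush cycle
    # (e.g. 50 ms) share a single fsync, so throughput scales with the
    # number of concurrent writers committing in the same cycle.
    for r in records:
        f.write(r + b"\n")
    f.flush()
    os.fsync(f.fileno())
    return 1
```

With three writers landing in the same cycle, the grouped path issues one sync instead of three, and the data is equally durable once the sync returns; a single-threaded writer sees no benefit, which matches the behavior described above.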



Regarding #2, we do normally run MySQL with full transactional durability at Facebook. We have for a very long time (several years at least). For example, here is a FB note regarding our enhancing performance under full durability in MySQL from Oct 2010:

https://www.facebook.com/note.php?note_id=438641125932


Harrison, what are these settings for InnoDB: innodb_flush_log_at_trx_commit, innodb_flush_method, innodb_doublewrite?


The primary settings are:

sync_binlog=1
innodb_flush_log_at_trx_commit=1 (which is the default, but we add it to our configs since we run with =2 on the slaves since they don't need full durability)

We do use some performance-related settings here as well:

innodb_flush_method=O_DIRECT
innodb_prepare_commit_mutex=0 (enables group commit in the Facebook patch; other branches have a different setting)

For settings like innodb_doublewrite we leave the default of enabled.


1, O_DIRECT, 1


It's not specifically related to the original article discussed here, but skiplists don't need to be unidirectional. For instance Redis implements all the sorted sets operations using doubly-linked skiplists, so once you identify a node you can traverse other nodes in both directions.


Good point; ours are currently unidirectional because they have to be lock-free. We have some ideas on how to keep them lock-free and make them bidirectional, but that's not part of the current release.


Haven't thought for a second about how to recover from being stomped on, but XOR'ing the two direction pointers together (as in an XOR linked list) might be promising...


That doesn't solve concurrency.


I don't think the point of the article was seriously about the details of performance in MemSQL versus MySQL. I took it as a criticism of the irresponsible reporting of benchmarks.


Yet you still claim MemSQL is durable by default. Except it's not. That's the point. If you make MemSQL durable in the configuration (since it's not actually durable by default, as you claim), then the performance is apparently terrible. Very horrible kind of terrible.

See the issue?


The problem is the claim of being durable by default. It is, to be extremely charitable, a stretch of the truth. Asynchronous durability is essentially an oxymoron.


Even though skip lists as described in the literature are singly linked (horizontally), they can be augmented with a back pointer to make them doubly linked. This would allow one to reuse an index as either a forward or backward index, and the order of the index is no longer important. I am not sure how easy this is to do in a lock-free fashion. However, if one were to adopt a strategy that combines functional data structures and RCU, I am guessing that it would be possible to implement it relatively cleanly.

Additionally, skip lists can be used for performing range queries as well. This means that "SELECT * FROM table ORDER BY id DESC LIMIT 5;" can be executed very efficiently as long as there is a skip list on "id". Similarly, COUNT(), MAX(), MIN(), etc. queries can be optimized in the same fashion. Again, however, doing it in a lock-free manner might not be the most fun thing to do.
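As a sketch of the augmentation described above (toy single-threaded code, not how any particular engine does it), here is a skip list whose bottom level is doubly linked, so the same index serves both ascending and descending scans, and MAX() is just the tail pointer:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level  # forward pointers, one per level
        self.prev = None            # level-0 back pointer

class SkipList:
    """Skip list with a doubly linked bottom level: one index can
    answer both ORDER BY ASC and ORDER BY DESC."""
    MAX_LEVEL = 8

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.tail = None  # rightmost node: MAX() in O(1)

    def insert(self, key):
        # find the rightmost node < key at every level
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while node.next[lvl] and node.next[lvl].key < key:
                node = node.next[lvl]
            update[lvl] = node
        # pick a random height for the new node
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        new = Node(key, level)
        for lvl in range(level):
            new.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = new
        # maintain the level-0 back pointer and the tail
        new.prev = update[0] if update[0] is not self.head else None
        if new.next[0]:
            new.next[0].prev = new
        else:
            self.tail = new

    def scan_asc(self):
        node = self.head.next[0]
        while node:
            yield node.key
            node = node.next[0]

    def scan_desc(self):
        node = self.tail
        while node:
            yield node.key
            node = node.prev
```

The back pointers double the pointer maintenance on insert, which is exactly what makes the lock-free version hard, as discussed below.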


A skip list is essentially a hierarchy of linked lists. It is very hard to make a doubly linked list work concurrently without using locks: one would have to update two pointers atomically in order to insert a new element into a doubly linked list.


You could probably pull that off if you implement it like full text search indices in e.g. Lucene. Have every modification create a "diff" skip list that takes precedence over older skip lists, and then have a merge operation that runs asynchronously (and will either have to lock, or copy the entire data structure and switch it in atomically). That's also essentially how Bigtable's disk format works.
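A minimal sketch of that layering idea (my own toy code, with dicts standing in for the per-layer skip lists; real systems also need tombstones for deletes, omitted here): reads check the newest "diff" layer first, and a merge collapses the layers.

```python
class LayeredIndex:
    """Lucene/Bigtable-style layering: writes go to a fresh diff layer
    that shadows older layers; a merge collapses them offline."""

    def __init__(self):
        self.layers = [{}]  # layers[0] is the newest diff

    def put(self, key, value):
        self.layers[0][key] = value

    def get(self, key):
        for layer in self.layers:  # newest layer wins
            if key in layer:
                return layer[key]
        return None

    def seal(self):
        # freeze the current diff and start a new one
        self.layers.insert(0, {})

    def merge(self):
        # collapse all layers, newest values winning; a real
        # implementation would build this in the background and
        # swap it in atomically
        merged = {}
        for layer in reversed(self.layers):
            merged.update(layer)
        self.layers = [merged]
```

The appeal is that each individual layer stays immutable once sealed, so readers never see a half-updated structure; the cost is that point lookups touch every layer until a merge runs.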


Actually, there are papers about it http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.4... but I don't know much about the implementation complexity since I've never implemented one.


That shouldn't be any harder than updating one pointer atomically; modern x64s have two-pointer-sized atomic exchange operations.


Modern x64s have 128-bit CAS instruction, but the problem is that those two pointers can be far apart in memory, in two different elements of the list.


The tl;dr here is: MemSQL is optimized for a different context.

That's all well and good, but as a consequence, it is only fair to compare yourself to other systems that are configured to take advantage of the same context.


You seem to be missing the main reason why people have an issue with MemSQL. The issue are quotes like this on your website and video:

"MEMSQL IS 30 TIMES FASTER THAN MYSQL."

It's a deeply misleading and biased comparison. You're quite literally comparing the speed of writing to RAM versus the speed of writing to disk. If you didn't make such ridiculous assertions, people would accept your product for the actual awesome things it does and not focus on refuting your bullshit claims.


This is exactly right, and should be addressed.

memcached is probably 30 times faster than MySQL too, but they don't go around claiming it's a durable database.


It fucking says "Durable by Default" on their main page !!!!

edit: memSQL main page says "Durable by Default"


To clarify, memcached's page does not say "durable by default"


[deleted]


Read more carefully next time. The post you are responding to didn't say anything about MemSQL. It was talking about memcached.


Hi Tim, we're not comparing "the speed of writing to RAM versus the speed of writing to disk." You can run InnoDB with a buffer pool large enough to keep the entire database in memory and we'll still outperform it significantly. We're actually working on a blog post right now with that comparison.


I think it might help more to explain how your design works and why we should trust it, than to provide simple benchmarks proving the speed. Speed is only one part of the equation, and frankly it's the one I care about only after durability is taken care of. Explain thoroughly how you achieve both and it will be more relevant to me. Looking forward to the post.


@calinet126

We have the durability interface and configuration options here http://developers.memsql.com/docs/1b/durability.html

This should give you a good idea about how durability will react to tuning, but it doesn't dive deep into the internal design. If there's enough interest (seems like there is) we'd be happy to discuss it in a blog post.


Does 'significantly' means '30 times' here?


Tim, if you want to talk substance, you have to peer deeper under the covers. And if you do that, then you just can't overlook the fact that the query execution model used by MemSQL is significantly different from that used by old-school relational databases, including MySQL. And what's different is that MemSQL translates your SQL query into extremely efficient C++ code. Code that is compiled and executed natively. Whereas MySQL, SQL Server, Postgres, Oracle - all of these products evaluate queries by interpreting their respective tree representations of your SQL queries. So which database do you expect to be faster? The one that runs native code or the ones that interpret?

This is a huge differentiator we are talking about here. For someone who has been around databases for quite some time (of which I had spent about 5 years working on SQL Server engine) this is very exciting. Is it not?


>And what's different is that MemSQL translates your SQL query into extremely efficient C++ code. Code that is compiled and executed natively. Whereas MySQL, SQL Server, Postgres, Oracle - all of these products evaluate queries by interpreting their respective tree representations of your SQL queries.

This sounds like absurd cargo-culting. I've never designed a database, but parsing the SQL cannot ever have been the bottleneck; the fact that you rarely write to disk is where any speed improvements can be found.


You are asserting that databases are always I/O bound and never CPU bound. Your assertion is wrong. A properly tuned database will become CPU bound (whether it's MemSQL or MySQL). At that point hyper-efficient execution results in higher throughput and lower latency.


> At that point hyper-efficient execution results in higher throughput and lower latency.

I don't get excited about 1% decreases in latency. I'm willing to bet money that the performance penalty of parsing the SQL queries is asymptotically a constant - or at least, a tiny fraction of the time spent computing the data set.

I feel especially confident in this because memsql never brags about the specific increase in speed. This can't be hard to measure.


Phill, parsing has little to do with what goes on at query execution time. Parsing is typically done once for a given parameterized query (and MemSQL automatically parameterizes queries with constants). After parsing comes compilation, which produces a query plan. The query plan is cached, and that is what is actually used to evaluate the query when it's submitted again. We are discussing the differences in how MemSQL handles executing compiled queries. Executing compiled queries is what a (well-tuned) database spends most of its time doing. It is the hot code path. And if you cut the instruction count in half, you can expect latency to decrease proportionally and throughput to increase.

You are making all these skeptical claims that are technically wrong. I suggest you download MemSQL, compile a query and read the generated C++ code. Then download MySQL source and look at that too. At that point you will be coming to your own conclusions and I'll be happy to resume this debate.


I've actually written a compiler from database queries to native code (JIT'ed bytecode actually, but doesn't make a difference here) once.

Some queries in databases are indeed CPU bound, but the general statement that well-optimized databases turn to be CPU bound is not correct like that. It all depends on the use case.

If you're thinking full table scans with complicated processing on a database that's entirely in memory, then yes, you'll be CPU bound. But many (most?) databases are not entirely in memory, and there's no reason they need to be. With reasonable memory and indexing, most queries boil down to only a few IOs/disk seeks, so you're looking at <50ms, which is entirely fine for your average web request.

That's the case for the majority of "get by ID" style queries, which in my experience really are the lion's share of queries, and in particular are also what needs to be fast. A complicated reporting query can take a couple of seconds, but your "main page/item view" must be fast.

If you have random reads on a large database that cannot be kept in memory, this will be the majority of your workload. Those queries will spend all their time waiting for the disk, and more or less none interpreting your query tree, so optimizing the query tree processing part makes no sense at all.

TL;DR, in my experience optimizing query tree interpretation does not yield significant benefits for the majority of queries, and IMHO won't be a significant differentiator for MemSQL outside of marketing.


@Nitramp: Fortunately for all of us this world is full of unsolved problems many of which exist outside of consumer web.

There are use cases where 50ms is way too long, and where jitter associated with disk I/O is not acceptable. Capital markets, for example. You are looking for sub-millisecond latency to be in the intraday post-trade game, and single-digit milliseconds for quote-to-trade. The number of instructions in the hot code path (and cycles per instruction) actually starts to matter. Lock-free structures utilized for storage are also important. This combination gives MemSQL some rather unique low-latency properties that certain folks can exploit to their delight.

To your other point, random reads and key-value lookups is where MemSQL shines. Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.


> Since you never wait on I/O and don't take locks, optimizing execution makes a lot of sense actually. All that's left is execution and network.

I'm not sure where you are getting this information. According to the MemSQL documentation, they have one and only one isolation level, READ COMMITTED. But READ COMMITTED takes write locks that block reads, even though it doesn't take any read locks. Blocking behaviour would still (and should!) occur.

It sounds like you are looking for dirty read behaviour, but from the literature MemSQL doesn't support READ UNCOMMITTED.

Update: Read a later FAQ on the MemSQL website. They are using MVCC, so write locks are irrelevant. My apologies to the MemSQL team.


Chris, MemSQL utilizes versioning to implement READ COMMITTED. In this implementation readers are never blocked.

It is addressed in this FAQ: http://developers.memsql.com/docs/1b/faq.html#c1-q4.


OK, didn't see that you are using multiversion concurrency control.

I didn't realise you had a new FAQ up on the website - I'll have a look.


@Chris Isolation levels specify semantics only; a DBMS is free to implement them in different ways. Take a look at this VLDB 2012 paper about optimistic concurrency control in main-memory databases like MemSQL, which also describes the implementation of isolation levels: http://arxiv.org/pdf/1201.0228v1.pdf


Ah. Sorry, didn't realise that you were using MVCC. That puts a different light on things.


Get-by-id queries are the majority when the database is used as a key-value store, and the only reason for such underuse of RDBMSes is precisely that, under heavy load, some of them have a hard time doing proper queries: with joins, subqueries, and all the stuff databases are designed for.


It's not "underusing" an RDBMS; those are just the lion's share of queries in many applications. Also note that a join, say from some parent item by ID to children that have a parent_id, makes essentially two lookups by ID in two indices, so the same reasoning applies; this is not just SELECT * FROM bla WHERE id = :x.


I don't know how to tell you this, but query parsing and planning is never the bottleneck in any database. You're just plain wrong.

Compiling queries to C++ is great for some BS bullet point on a website with BS claims of performance gains, but it's only a niche feature at the end of the day. There are more important things that should be getting tackled first: multi-node distribution, better fsync handling, etc.


It is not about the cost of parsing SQL text. Think of it as intermediate bytecode for SQL that is interpreted as the query is being executed. Every SQL database does some variation of this. What is unique about MemSQL is that this bytecode is actually compiled to native code using GCC.


I agree with what you have to say, but I don't see how you can determine that this query:

SELECT * FROM table WHERE id > 5 LIMIT 10;

is actually of the same form as this query:

SELECT * FROM table WHERE id > 10 LIMIT 5;

without actually parsing the two. I mean you will incur a parsing overhead either way. What I think MemSQL is trying to optimize out is the actual execution of the query. I mean rather than interpreting and running it with checks performed in every iteration, they are compiling the code. I don't know how much of a difference this would make without actually seeing the numbers.


That's why prepared statements are preferable: otherwise those two statements have to be parsed as unique queries, even though they are actually the same query with different parameters.

As far as I know that's the same on just about all databases.


At least some databases I've used have parsed out constants in the client library code and submitted prepared statements to the server, passing those constants in. That meant that the server got better use out of its query cache; on the server side, those are the same query.

Not sure which ones if any do this today, the optimisation might no longer be relevant - I spotted DBs doing this 10 years ago when CGIs that didn't bother with prepared statements were rife.


@nikita I see: http://developers.memsql.com/docs/1b/faq.html#does-this-mean...

Do you mean you determine placeholders without actually parsing the SQL?

@jakejake The docs. seem to suggest that the benefits of prepared statements are won without using prepared statements in MemSQL.


Not all parsing is equal. MySQL and other databases produce a tree as they parse a query. If you just want to match parameters you can avoid memory allocations and just replace parameters with ^ or @ signs in place in the query string.
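As a sketch of that in-place parameter matching (a toy handling integer literals only; a real server must also handle strings, floats, quoting, and comments), constants are swapped for @ placeholders without building a tree, and the normalized text keys a plan cache:

```python
import re

_INT = re.compile(r"\b\d+\b")

def parameterize(sql):
    # Extract integer constants and replace them with @ in place:
    # one pass over the string, no parse tree allocated.
    params = [int(m) for m in _INT.findall(sql)]
    return _INT.sub("@", sql), params

class PlanCache:
    def __init__(self):
        self.plans = {}
        self.compiles = 0  # how often the expensive compile step ran

    def get_plan(self, sql):
        key, params = parameterize(sql)
        if key not in self.plans:
            self.compiles += 1                    # parse/compile once
            self.plans[key] = "compiled:" + key   # stand-in for a plan
        return self.plans[key], params
```

Two queries that differ only in their constants normalize to the same key, so the second one skips compilation entirely, which is the behavior the FAQ describes.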

http://developers.memsql.com/docs/1b/faq.html gives some more details.


Actually, it can have a significant impact but it's a corner case. It's a moot point though - as an example, SQL Server stores plans in a memory cache. [1]

1. http://www.sqlteam.com/article/what-query-plans-are-in-sql-s...

edit: incidentally, one of the corner cases I can think of is joining 10 tables, which gives the query processor 10! join orders to work through. This is irrelevant in MemSQL, because you can only join to another two tables, and you cannot do right outer or full outer joins.


Interested in knowing why this has been voted down! A response would have been more useful. Would love to know what is inaccurate about my comment.


And so the wheel turns.

The original DB2 for VSE and VM (not DB2 for z-series/390) which was the productized version of System R did exactly this - compiled to native code. Subsequent DB2 implementations chose not to go down this path - even for single OS systems like DB2/390 and DB2/400.

In any event, I'm skeptical if this is going to make very much of a difference for the following reasons:

1. The time spent by the runtime/interpreter in "evaluating" the operators or expressions is really small relative to the heavy lifting of the actual relational operators to say nothing of I/O etc.

2. Most serious implementations don't really interpret the plan using a giant switch statement. For instance, with PostgreSQL, the interpreter puts in function pointers on first iteration. Thereafter it's only slightly worse than compiled native code.

SK


Re 2: you can get significant improvements when compiling to native code. You still have to dereference those function pointers, which is essentially the same as a big switch statement (if not worse: the function pointers might be anywhere in the heap, not in your cache line).

When you compile to native code, you can essentially save the dynamic dispatch on query parts, plus all the other benefits you get from compilation.

E.g. if you do good type analysis, you can also compile to much more efficient data representations, which will again have benefits for memory access. The compiler can also inline code, do loop lifting, and all those goodies.
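The dispatch-versus-compilation trade-off can be sketched like this (a toy expression tree of my own, not any engine's real plan format): the interpreter re-dispatches on node type for every row, while "compiling" folds the tree into nested closures once, so execution never touches the tree again.

```python
def interpret(node, row):
    # Tree-walking interpreter: dispatches on node type per row.
    op = node[0]
    if op == "col":
        return row[node[1]]
    if op == "const":
        return node[1]
    if op == ">":
        return interpret(node[1], row) > interpret(node[2], row)
    raise ValueError(op)

def compile_predicate(node):
    # Fold the tree into a closure once; per-row evaluation is then
    # plain function calls with no node-type dispatch.
    op = node[0]
    if op == "col":
        i = node[1]
        return lambda row: row[i]
    if op == "const":
        v = node[1]
        return lambda row: v
    if op == ">":
        lhs = compile_predicate(node[1])
        rhs = compile_predicate(node[2])
        return lambda row: lhs(row) > rhs(row)
    raise ValueError(op)
```

Python closures still pay dynamic-call overhead, so this only illustrates the structure; a native-code compiler additionally gets inlining and better cache locality, as noted above.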

But overall I strongly agree with 1), any time spent doing execution is usually dwarfed by IO.


"And what's different is that MemSQL translates you SQL query into extremely efficient C++ code."

What does this mean? Does the SQL get converted into actual C++ source code, then compiled with an internal C++ compiler? That seems like a weird thing to do. Or is it translated directly into an AST? If yes, I can't imagine other db's not doing it? I don't understand the claim being made here.


Yes, they compile SQL statements into Linux binaries using a bundled GNU compiler toolchain.

I am also curious if there is some way the developers have demonstrated the benefits of this approach vs. what other systems do. Most of the CPU time in a simple query in VoltDB is spent in networking, validating and suffering the cache-miss pain of walking index data structures. I can't see clearly how compiling to native code makes any of that much faster.


The in-memory storage layer is very efficient. There are no buffer pool pages to manage and no cache misses that result in I/O. That puts the accent on what goes on between the storage engine and the network, that is, on query execution. Tight loops in native code are a good way to cut down on the CPU cycles required to evaluate a query.


Actually, no. You speak as if database engines have to parse a query every single time it's run. That's not true - any reasonable database engine has a plan cache. Just because something is compiled into object code doesn't make it that impressive.

In fact, making a plan completely stable can cause problems if the data distribution changes dramatically. I asked about this problem in a previous article about MemSQL, and the answer was that the plan won't expire unless you drop an index or add a new one. [1]

1. https://news.ycombinator.com/item?id=4126330


MemSQL isn't open source. If I can't hack it, who cares.



