Ex-Facebookers launch MemSQL (YC W11) to make your database fly

Shish2k · on June 18, 2012

I've seen a lot of these new SQL databases, and they all seem to compare themselves to MySQL. That seems like a particularly easy target, since MySQL's main strength is speed, and it's a fundamentally old-school (ie slow) design which has sacrificed everything else to get half-fast. What I'd like to know is how any of these compare in features and reliability to the featureful, reliable databases like Oracle and Postgres.

(For specific examples that I've run into recently: does it have geo-spatial extensions, and does it crash and burn and corrupt all its data if a malloc() fails)

chris_wot · on June 18, 2012

It still uses a transaction log, but seems to make snapshots of the logs at a period of time to create compressed snapshots of the data itself.

You can turn off durability, but then you risk losing everything.

For features... forget about geo-spatial extensions - they can only do read-committed isolation level... but as they only allow for one SQL statement per transaction I suppose this might not be such a big concern.

If you were using this for something like a session management database, or something where durability is not so important then it's probably fine. Not sure I'd use it for anything that relies heavily on transaction management features.

I see other limitations, like it cannot support CHANGE COLUMN, and it can't do joins on more than 2 tables. Actually, it doesn't look like it can do FULL OUTER or RIGHT OUTER joins, either. :(

nikita · on June 18, 2012

Memsql is a database that does some things incredibly well: deliver on high throughput for on small transactions and some things not as well (surface area).

Of course a new product will have certain limitation which will be removed as the product matures.

chris_wot · on June 18, 2012

No probs, like I've said elsewhere, not attacking the product. Looks very interesting! I guess I'm trying to work out what sort of market you are trying to target here.

tintor · on June 18, 2012

"and does it crash and burn and corrupt all its data if a malloc() fails"

MemSQL is fully durable (with option of sync and async durability, similar to sync and async replication) with its transactional log. If MemSQL process is killed it will recover all commited transactions on the next startup. Durability has very little performance cost on throughput as MemSQL uses only sequential I/O. And of course, durability can be completely turned off for maximum performance.

vessenes · on June 18, 2012

This company was probably an extremely easy investment decision:

1) Facebookers: Check

2) Data Scaling experience: Check

3) In-Memory with SQL semantics: Check

4) New-York based software that can be sold to quant funds: Check

I'm excited. It's downloading now.

moe · on June 18, 2012

Is it just me or does this entire product smell like it was designed for the sole purpose of extracting money from less than tech-savvy investors and clueless institutions?

The bullshit-bingo-lingo on their homepage is mindnumbing.

Meanwhile their actual software seems rather underwhelming, bordering on SnakeOil.

vessenes · on June 18, 2012

If you've ever tried to use SQL on a large, but fittable-in-memory dataset on existing row databases, then you would not be underwhelmed. Existing non-finance-industry solutions suck for things like mid-size trading queries, or (in coinlab's case) work setups that require periodic, frequent table scans.

You can buy something like kdb, but that costs $70k a year and requires that your engineers learn some extremely new semantics if they're only used to SQL and (choose any popular language here, Python to C++).

3pt14159 · on June 18, 2012

Really? I've done many, many queries on large, but fittable-in-memory datasets and with a few little tricks even MySQL can be fast (make everything a temporary table with an append only transaction table to rebuild the temporary tables if lost, don't use keys, use indexes, avoid bad practices (filtering after a join, using subselects, or needlessly avoiding HAVING clauses).

Did you mean large, fittable-in-memory, and (distributed across multiple computers and/or very reliable)? Because that is much harder.

AaronBBrown · on June 19, 2012

Can you explain this further? How is using a temporary table faster? Explicit temporary tables are simply regular tables that only last your current session. Do you mean the MEMORY storage engine? What do you mean by "don't use keys, use indexes?" The term KEY and INDEX in MySQL are exact synonyms.

3pt14159 · on June 19, 2012

Temporary tables default to being mapped in memory and for more complex querying (where you have multiple queries and conditionally select further queries depending on results) are faster since the table that they are based off of could have changed, but they will not, so caching is much more likely. Keys and Indexes are not exact synonyms. Give it a try some time, do a where clause with mod(id, 5) = 0 on a keyed table and on an indexed table. The indexed table will be non-trivially faster (I've seen 3x faster).

AaronBBrown · on June 19, 2012

You are clearly not talking about MySQL. CREATE TEMPORARY TABLE simply creates a physical table on disk that is destroyed at the end of the session, which can be created with any storage engine. Explicit temporary tables are handled just like "regular" tables and is dependent upon the storage engine (w/ MyISAM, indexes are stored in the key buffer, InnoDB keeps both recently accessed data and indexes in the InnoDB buffer pool).

KEY and INDEX are literally the exact same thing and are just there to support different syntaxes: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

Perhaps you are referring to primary keys vs secondary indexes, for which there is a difference in how they are physically stored on disk and in memory, particularly with InnoDB. There is a substantial performance gain to be had by having a relevant primary key that is referenced in the where clause as the data is stored in the same page as the primary key, both on disk and in the buffer pool. Secondary indexes reference the primary key, thus require two lookups.

AaronBBrown · on June 19, 2012

Another key point is that temporary tables are horrible for concurrency. Obviously only one session can access them, but more importantly, they end up causing a great deal of disk activity. At the end of every session, the table is physically dropped. In ext3, this can take quite a while for large tables (not so bad in filesystems like xfs). It also can cause substantial stalls in InnoDB as the buffer pool is scanned for matching pages and are flushed out. Percona Server has innodb_lazy_drop_table to mitigate this issue to some degree, but dropping tables is still always an expensive operation. This is mostly a concern for OLTP, not necessarily OLAP servers.

moe · on June 18, 2012

So, MemSQL is a ColumnStore now? The docs didn't make that clear for me.

Anyway, why is there not a single benchmark available to support the sales-pitch?

(In your favor I'll just pretend you didn't mention kdb here...)

gruseom · on June 18, 2012

In your favor I'll just pretend you didn't mention kdb here

Please explain. What's bad about kdb, besides how nonstandard it is?

moe · on June 18, 2012

There is nothing bad about it (other than the price). It just seems an outrageous claim that MemSQL, with all its constraints, is even in the same game.

At the least they should come up with some seriously impressive benchmarks before dropping names like that.

iskander · on June 18, 2012

I have had the misfortune of working with Q. It's a very clever little language that could be a great platform for numerical prototyping. Unfortunately, it's hobbled by a terrible programming culture. The interpreter gives cryptic single-character error messages and the C interface is written in a nightmarish soup of macros.

nikita · on June 18, 2012

MemSQL is a row store. KDB is single threaded. More benchmarks are coming!

vessenes · on June 18, 2012

http://kx.com/faq.php seems pretty clear that kdb is multi-threaded. "Does kdb+ support multiple cores and/or multi-threading? Yes, both are built right in to the system and this makes kdb+ extraordinarily fast when compared with traditional applications, as it can make full use of all available cores. "

This ties with my own experience using it.

nikita · on June 18, 2012

I'm sorry, I have to clarify this. It's a single writer database...

HarrisonFisk · on June 18, 2012

Any chance you could publish full info on the benchmarks?

For example, where can we find your configurations for the MySQL vs. MemSQL benchmark you show in your video? Or how big the dataset was, etc...?

eddie_the_head · on June 18, 2012

Actually, what you say about the price for kdb+ is true, but Q (the new language to query kdb+, essentially the latest version of K which is now deprecated) is very much like SQL, syntaxwise.

The way I see it, this is very much targeted at financial firms that use kdb+ or products like it that perform really well for things like writing lots of tick data from dozens of exchanges at a time and querying across them quickly in a familiar SQL style.

capkutay · on June 18, 2012

Fast, in-memory data stores already exists with support and ease of use/deployment...I was able to set up hazelcast for my distributed cluster in minutes...I don't see the advantage of this over nosql or in-memory data storage.

evaluate · on June 18, 2012

It's not just you. But I think that's the whole idea. It's about fooling people. And I think some of the profits will be flowing back to Oracle, though that is not stated upfront. The marketing power of the acronym "SQL" should not be underestimated.

But that's only a query language. Not a db architecture.

Let's see the source code. That's the only way to verify the approach that is being taken.

Given the source the first thing I'd do is search for "mmap".

They already get one demerit for using a row store.

And another for disk access.

SnakeOilSQL.

nikita · on June 19, 2012

We do use mmap, but we do not use memory mapped files. We use mmap to allocate big chunks of memory from the OS, which we later distribute to our custom memory allocators.

It's not a column store. We are thinking about it, maybe some time in the future as we add more analytical feature. It's a row store.

We do write back to dist. We have a transaction log, which we truncate once it grows over a certain threshold, by taking a snapshot.

MuGo · on June 18, 2012

paul buchheit and max levchin seem pretty tech-savvy to me

codefryer · on June 20, 2012

All of the arguments for using this are contrived. How big is the total addressable market of projects where there's a huge budget and engineers who ultimately have no idea of what they're doing?

ericfrenkiel · on June 18, 2012

Thanks! We're super-excited to have the opportunity to build this. When we started we asked ourselves - what would a database look like if it were designed today?

We see a future where OLTP databases live in memory, and where you have hundreds of terabytes of memory and hundreds to thousands of CPUs at your disposal.

We built MemSQL to make it easy to go directly into memory and get the speed improvements we all need. What we're releasing aims to fulfills that vision.

vessenes · on June 18, 2012

p.s. Eric, two annoyances about your site: once I've been dumped on to the developers subsite, it's non-trivial to get back to the main company homepage.

Also, pricing. Come on, man! Pricing!!

ericfrenkiel · on June 18, 2012

Thanks! We'll add a link back to memsql.com =) What did you think of the video and diagram on developers?

ecaron · on June 18, 2012

I think none of that really matters until we see the price. There's a trial version and a free version of Microsoft SQL, but their full version can quickly top $20k without much effort - knowing how bad your "ouch" will be is key before I'd spend any time on the "ahhhh."

vessenes · on June 18, 2012

The diagram was fine; I didn't watch the video. I'll hit it up later.

p.s. Isn't a totally reasonable business model for a truly faster DB engine "free, everywhere, bought by Oracle?"

nikita · on June 19, 2012

We think a lot about it. And even though this model definitely makes sense, we would like to build a business that can generate money and pay for itself.

moe · on June 18, 2012

Is this a joke?

Data must fit in RAM, no joins over >2 tables, no transactions, no builtin support for clustering/sharding/horizontal scaling whatsoever.

What is the advantage over memory-tables, MySQL on a ramdisk or something purpose-built (redis)?

chris_wot · on June 18, 2012

That's a little harsh I think. It still has a transaction log, so it has certain advantages over memory tables. Some interesting enhancements to indexes also. And it is also binary compatible with MySQL, so you can actually add it as a MySQL node.

I'd say it's a database for a niche market, as one of the folks from MemSQL has been saying. It would fit in nicely under a few scenarios.

AaronBBrown · on June 19, 2012

MEMORY tables have table-level locks and from a concurrency perspective, is just as bad as MyISAM, so really offer no advantage unless there is only a small number of consumers. A well tuned InnoDB table will outperform the MEMORY engine )(after the warm-up period) as soon as you have more than a handful of users performing write operations.

tigerBL00D · on June 18, 2012

Advantages are clear. Blazing speed, transactional durability, ease of use and flexibility that come from using a well known protocol and SQL dialect. Did I miss something?

old-gregg · on June 18, 2012

On the other hand, if your data fits in RAM and you don't need to join more than two tables and you don't need clustering/sharding whatsoever and you need the full power of SQL, what would you use?

> ..something purpose-built (redis)?

Errr maybe "features" like GROUP BY with HAVING? I think you're trolling, Mr Moe.

moe · on June 18, 2012

trolling

Says the guy who resorts to selective quoting to make a point? You may want to review my original comment where I mention two other options.

MuGo · on June 19, 2012

nope. you're definitely a troll

snissn · on June 18, 2012

MemSQL does not yet support JOINs on more than 2 tables. [1]

1. http://developers.memsql.com/docs/1b/sql/join.html

chris_wot · on June 18, 2012

Nor transactions with more than one statement. And only read committed isolation level. [1]

1. http://developers.memsql.com/docs/1b/isolationlevel.html

snissn · on June 18, 2012

This seems really confusing and like it might be a maintenance nightmare http://developers.memsql.com/docs/1b/memory.html

Do you need to ensure that all of your data fits in memory? It's easily possible to have ten gigs in your db but only need a small amount of it to be hot in ram. Does it take a really long time to restart if it has to warm all the data?

chris_wot · on June 18, 2012

Good questions - no idea! Can one of the developers of MemSQL clarify this? How long does it take to startup a large database of several 100 GB?

nikita · on June 18, 2012

Recovery on the system restart is going at the speed of hard disk.

chris_wot · on June 18, 2012

Can you pin how much data from a table can be placed into memory, or does the whole table need to be placed in memory? I can see settings that limit the transaction log memory usage, but no way of reducing the actual amount of data in-memory.

nikita · on June 18, 2012

The whole table has to fit in memory. RAM is getting very cheap, you can buy 1Tb for 12K today.

genwin · on June 18, 2012

But for anyone who is renting the server, it's $$$$ monthly for a fraction of that.

tintor · on June 18, 2012

Recovery on the system restart is going at the "sequential scan" speed of hard disk.

nl · on June 19, 2012

The "sequential scan" speed of a generic SSD is pretty good. It's even better on something like FusionIO.

alexro · on June 18, 2012

How it compares to VoltDB, which used similar approach and already "combines the proven power of relational processing with blazing speed, linear scalability and uncompromising fault tolerance" (according to VoldDB website) ?

nikita · on June 18, 2012

There are few differences with VoltDB. 1. It's much easier to use (MySQL compatibility, ad-hoc sql no java dependency) 2. It doesn't have any issues with data skews. We are using lockfree skiplists vs b-trees partitioned by core. 3. I believe we are faster on a single box, however this claim should obviously be verified by a third party.

jhugg · on June 18, 2012

Congrats to the MemSQL team. It's always validating to see more entrants into the space. The comparison between the two systems is fascinating; the approach taking is really different than VoltDB, but the systems share some similar choices as well.

As for your comments:

1. Ad-Hoc SQL in VoltDB is a current area of focus in VoltDB development. We use similar compilation and expect them to be no slower than our stored procedures in the 2012 timeframe. Starting with our release next week, we'll also support multiple SQL statements in a serialized/ACID ad-hoc transaction.

2. On a single-node (or a several replicated nodes), VoltDB doesn't really have many issues with data skews. Rebalancing a distributed system is also in development, but hasn't turned out to be much of an issue with our production customers.

3. Faster at what? Single statement non-durable transactions? That's probably true for simple SQL operations for the time being. It gets murkier once you consider VoltDB features like materialized views that avoid table-scans altogether. I am also curious how durability affects performance though.

But I'm really interested to revisit this comparison once MemSQL supports real transactions and replication/sharding. For the time being, most of the VoltDB use cases we target won't run on it.

spitfire · on June 18, 2012

There's another in memory mysql replacement whose name escapes me right now. How are they doing?

Ycombinator seems to have backed a few of these next-gen databases. It will be interesting to see who wins.

EDIT: Rethinkdb. But it looks liked they've dumped their mysql engine and gone to a pure key-value store.

JonM · on June 18, 2012

Xeround looks really interesting (http://xeround.com/). Seems to do everything in RAM and takes care of any replication / backup & scailing for you. Looks like it would get expensive with large amounts of data though... anyone using this in production?

nikita · on June 19, 2012

One of the things that separate us from Xeround is the fact that you can see the benefit of super high throughput on any linux box, including your laptop. You can download the software and use it by yourself.

Xeround is a sas model, where you can't get exact same experience as on your local machine.

sargun · on June 19, 2012

What do you think of Riak (basho.com)?

AaronBBrown · on June 19, 2012

Riak is a great product with an incredible support team, if you choose to use their support.

zwischenzug · on June 18, 2012

I'm obviously missing something. Is SQL really the bottleneck for SQL performance? I'm not sure how much compiling down to C++ will help.

Is there more to this than the article suggests?

chris_wot · on June 18, 2012

Most times that the press mention SQL, they actually mean "Relational Database Management System". But in this case, MemSQL seems to be doing two things: reducing locking semantics (the only transaction isolation level is read committed), and doing something or other to speed up query compilation.

Frankly, I can't see how their query compliation is much better than anything Oracle, SQL Server or PG does already. (see here: http://developers.memsql.com/docs/1b/codegen.html) Perhaps there is no plan cache in MySQL, or the plan cache is inadequate?

Incidentally, you don't want to use it if you need to do multi-statement transactions. Can't be done. Check out the documentation here:

http://developers.memsql.com/docs/1b/isolationlevel.html

It has a few extra tricks up its sleeve to enhance indexes in MySQL, but again nothing really particularly spectacular. http://developers.memsql.com/docs/1b/indexes.html

More interesting for me, is that it can be hosted on EC2, which is pretty neat!

nikita · on June 18, 2012

Query compilation works incredibly well for high throughput. If you have a lot of small transaction (OLTP) you can benefit tremendously by dramatically reducing the amount of instruction you burn per query. Lots of database vendors know that, but they don't do it b/c they often have "one size fit all" dbms. BTW you will see some changed from the big db vendors as well.

There is no plan cache in MySQL. SQL Server has it, but it's nearly not as efficient. I know, since I used to work there.

Multistatement transaction are coming. We are beefing up SQL surface area, as well as working on many more exciting things.

chris_wot · on June 18, 2012

Not sure what you are comparing against. However, from what I've seen of SQL Server execution plans, actually it's extremely robust and I would have to argue that it can often be far more efficient than even Oracle. I've used both.*

Incidentally, sorry if I seem to be disparaging the product. I'm actually looking at it myself, but want to know what the limitations are. Multi-statement queries would be a big win. Are you considering implementing higher isolation levels? Also, as was asked above, how are the startup times for multi-gigabyte/terrabyte databases?

* Hint for the unwary - if you run a SELECT statement with a WHERE 1=0 AND <rest of predicates> (for brain dead "dynamic" SQL generating apps), the initial compilation can take a dramatically long time in Oracle 10g (and I believe 11g, but cannot confirm). Oracle have not got any mechanism to shortcut the query compilation for such queries. A bit obscure, but I never saw this problem with SQL Server. Oracle support were stumped by the issue too and gave me short-shrift ("it might be fixed in an 11g patch... some time").

nikita · on June 18, 2012

No worries :). MemSQL does very well for OLTP like workloads, but it's not an "all size fits all" database.

Recovery goes at the speed of disk.

The compile time for memsql is mostly code generation time. Once the code is generated/compiled MemSQL is doing a very good job matching the queries against the existing plan by removing parameters from the query text.

chris_wot · on June 18, 2012

:-) I really like what you are attempting to do here. What would you say your core market is?

nikita · on June 18, 2012

Some of the stuff is on the website (Capital Markets, Ad Serving, Mobile). It will be expanding as we add clustering and more analytical features.

MattRogish · on June 18, 2012

Most DBMS products have "Stored Procedures" which, essentially, are compiled chunks of SQL that you can call by name, e.g.

CREATE PROCEDURE foobar AS SELECT * FROM users

On very large queries (many table joins) there can be a noticeable performance hit while the DBMS figures out the optimal query plans. Stored procedure compile and store this plan after first execution (some pre-compile) so the DBMS can just look it up in the cache.

It also removes the need to do runtime parsing (compiled vs. interpreted languages). There are more nuances to Stored Procedures than I can put here, but here's a link to a more in-depth analysis I wrote: http://stackoverflow.com/questions/59880/are-stored-procedur...

In general, MySQL has limited query plan cache (last I read, it was one cache per database connection and was wiped out when the client disconnected), so this is definitely a win. However, since every other DBMS out there already has robust query cache, I don't think the Postgres/Oracle/MS SQL etc. folks are losing any sleep.

nikita · on June 19, 2012

Supporting stored procedures doesn't mean compiling to native code. E.g. sql server just stores a compiled plan so saves of parsing/optimization. Execution is still an interpretation.

Giants are investing into big memory technology no doubt about it, which is great. That's the beauty of the huge big data market - one size doesn't fit all.

MattRogish · on June 19, 2012

SQL Server supports .NET stored procs, which are compiled code bits. Oracle / Sybase support Java.

I believe some DBMS vendors are working on the interpreted nature of the execution, but basically the query plan determination makes up the bulk of the parsing/execution time. Compiling SQL to something... "native" doesn't provide a whole lot of win in terms of execution latency.

al_james · on June 18, 2012

I think its that its an in memory database, optimised for memory as opposed to disk. The SQL to C++ seems a bit of a confusion to me, I guess it means it has a JIT SQL compiler to optimise the query to native opcodes. Meh. But the in memory and mysql wire compatibility are big wins.

zwischenzug · on June 18, 2012

Over what? The big iron DB I use is to all intents and purposes in memory in any case. The only Input requirement is to pull the data in at some point (which can be forced) and the Output requirement is to write to a logical log (which we buffer in any case). It flies.

ericfrenkiel · on June 18, 2012

yes, memsql is an in-memory database. the big innovation here is that we're literally hardwiring the SQL into the server and matching queries to those compiled query templates.

Since we work in memory, you see a huge perf boost with code generation because there are no lags with disk.

chris_wot · on June 18, 2012

I don't believe this is the case - it's not an in-memory database.

Even if it is (and I can't see anything on their developer website) then if it's ACID compliant, like they say, then it will have to be writing to the disk at some point - at the very least it will need a transaction log. Otherwise, it's not really usable for anything you need to persist to long term storage!

al_james · on June 18, 2012

Yes, its not clear on their site, but the gigaom article states: "As its name implies, MemSQL achieves its fast performance in part by keeping data in memory".

Yes, it does write to disk (append only logging I think) but my point was, that if you are keeping ALL your data in memory, you can optimise the storage for fast queries, as opposed to a hybrid / paged memory and disk system.

chris_wot · on June 18, 2012

Oops, sorry - I'm wrong. Actually, according to their documentation:

"By default, MemSQL runs with full durability enabled: transactions are committed to disk as a log and later compressed into full-database snapshots. The snapshot and log files are stored in the datadir directory configured in memsql.cnf. When MemSQL is restarted, it will recover itself into the state it was in before the shutdown, by reading the snapshot and log files in datadir" [1]

I missed that piece of documentation on their website. I think they could do better to explain better the advantages and how their technology works. Their product overview doesn't really tell you much of anything, except that they have an enhanced query parser. [2]

1. http://developers.memsql.com/docs/1b/durability.html

2. http://developers.memsql.com/

imfine · on June 18, 2012

I am not sure this solves anything. From their implementation details I could just mount mysql on a memory filesystem and have 100% of MySQL's feature. If I need near 100% assurance that the data will be saved, there is always block replication, transaction logs and regular replication. their transition module adds nearly nothing but overhead, since stored procedures are better, and if you have the need for ultra performance, there is always the ability to access mysql at the command level and bypass sql entirely.

nikita · on June 18, 2012

You could do that, but you won't be able to achieve as high throughput in highly concurrent environments. MemSQL achieves this via code generation and lockfree data structures.

We will also put the technology on steroids, by building a clustered system

genwin · on June 18, 2012

> We will also put the technology on steroids, by building a clustered system

Isn't that the same as MySQL Cluster, minus the SQL-to-C++ part (and maybe lock-free structures)? Why not fork MySQL Cluster to add the SQL-to-C++ code generation? Then it seems you could claim (from MemSQL site) "MemSQL is ... still faster than other in-memory databases because of SQL-to-C++ code generation" and also support every feature of MySQL Cluster. Why reinvent the wheel only to create something with great limitations compared to the existing wheel?

tintor · on June 18, 2012

"Why not fork MySQL Cluster to add the SQL-to-C++ code generation (and lock-free storage engine)?"

:) It is a little more complicated than that.

imfine · on June 18, 2012

lets see some stats. Its a bit unfair to compare an in memory database with a severely truncated feature set to a more robust sick based system.

nikita · on June 18, 2012

Correct! Different scenarios require different solution. One size doesn't fit all. And it's nice when solutions play nicely with each other.

iamleppert · on June 18, 2012

lockfree data structures FTW!

nikita · on June 18, 2012

There are many bottlenecks and it really depends on the workload. Sometimes it's IO, sometimes it's CPU. When you have a throughput problem, yes, it's the SQL.

continuations · on June 19, 2012

Why should I use this over MySQL Cluster?

MySQL Cluster is lock-free, distributed, HA, and has better performances:

http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-72-...

http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-727...

And MySQL Cluster has been used in mission critical apps for years.

What does MemSQL give me that MySQL Cluster doesn't?

chris_wot · on June 18, 2012

Question - how well does the query parser handle stale queries? SQL Server for the longest time had issues with plan stability in that the plan became too stable. When the data distribution changed dramatically, the queries didn't age out of the cache and the queries would do such things as use the wrong index, or not work out the correct cardinality of a table and then use an index where really a full table scan would have been better... and so on.

How does the database handle this sort of thing?

nikita · on June 18, 2012

Right now you can handle this with query hints. The plans are stable, but they are attached to the query text. When you throw a hint in there memsql generates a new plan since the query text changes.

chris_wot · on June 18, 2012

To clarify - if you add the query hint, run the query, then remove the query hint then has the original query's plan expired?

Edit: I know I'm asking a lot about the plan cache, but as it's a core selling point, then I'd like to know how data distribution changes will affect performance. Stale queries will potentially adversley affect performance under certain circumstances.

tintor · on June 18, 2012

No, original query plan will only expire if indexes are dropped or new ones are created.

acdha · on June 18, 2012

It looks like this is a fork of MySQL. Did they actually pay Oracle for a proprietary license or is there a source release hidden on the website?

ericfrenkiel · on June 18, 2012

yes, we have commercial licensing agreements in place.

al_james · on June 18, 2012

Looks useful. Drop in mysql compatibility is a huge plus for many projects that are tied to mysql (for whatever reasons).

Does anyone know if there are any plans for having MongoDB style replica sets (e.g. sharded and replicated databases in a cluster)?

Also, it would be great if it supported the native mysql replication, so you could have MemSQL replicas of a master mysql DB.

ankrgyl · on June 18, 2012

Sharded/replicated MemSQL is in the works. And yep, the replication will be compatible with MySQL replication to enable grouping MemSQL and MySQL nodes.

al_james · on June 18, 2012

Excellent! Good work!

Side note: IMO, If you want to completely dominate this space, make it super easy to add new nodes and replicate the existing DB without having to take a snapshot from the master.

nikita · on June 18, 2012

Thanks! This is absolutely on our minds.

songoffire · on June 25, 2012

Seconded, this would be an amazingly compelling and valuable feature!

factorialboy · on June 18, 2012

Looks promising. Love how it inter-operates with MySQL driven apps.

I should be able to try it for one of my apps over this weekend!

ozgune · on June 18, 2012

Congrats guys on the launch! Looking forward to giving it a spin.

prezjordan · on June 18, 2012

Looks like they're using jQuery Knobs: http://anthonyterrien.com/knob/

nikita · on June 19, 2012

Yes, for the workload simulator :)

alexrson · on June 18, 2012

Congratulations, Eric!

Produce · on June 19, 2012

Why not just use objects in your favourite shitty OOP language as usual and take regular snapshots of the data in memory and save them to disk as rollback points instead? Bam! Nothing new to implement in the application. Bam! Lightening fast access and write times. Bam! Data persistence.

Bam!

ericmoritz · on June 19, 2012

Wow, HN is harsh. This is probably a really solid product that solves a specific problem and you all nit-pick at it's trade-offs and poo-poo it.

MrMan · on June 19, 2012

HN is not harsh. HN is one of the main marketing channels for YC-funded projects such as this.

ericmoritz · on June 19, 2012

Sorry some commenters on HN are harsh

vmalkani · on June 18, 2012

Congrats!