MemSQL Launches Unlimited Community Edition (memsql.com)
198 points by ericfrenkiel on May 20, 2015 | 54 comments

  > The Community Edition is distributed as an executable 
  > binary and is a free edition of the commercial MemSQL 
  > Enterprise Edition. You are free to download and use 
  > MemSQL Community Edition within your organization.
So.. how long until the same thing happens as happened with FoundationDB?

I think the FoundationDB acquisition by a company with no interest in selling enterprise products was an anomaly. A popular, commercial enterprise storage system that actually makes money would be an acquisition target for the likes of Oracle, SAP, EMC, etc. In that scenario, the acquiring company would have significant interest in increasing adoption of the product and maintaining the developer community rather than shutting the product down completely.

You mean like Oracle with MySQL? At least in that case the 'community' could move to MariaDB, which is not an option for non-Free databases like MemSQL.

You can do the very same: MemSQL uses the MySQL wire protocol, so it works with any MySQL driver and tool.

If MySQL was a drop-in replacement for MemSQL, you wouldn't need MemSQL in the first place. The reason you chose MemSQL is probably because it offers something that MySQL doesn't. If I can't take the source and continue using the product, it's a very different situation from MySQL/Oracle.

While I am an enterprise user of memSQL now, I still have a machine running the beta version 1.0 from years back. It is still stable and still fast as hell.

He's referring to what happened with FoundationDB. The announcement that they were bought by Apple was sudden, and they pulled all their downloads, including the open source ones. They didn't even update their webpage or explain why the download links and GitHub repos were gone.

The biggest issue is that FDB is not available even if you are willing to pay. So it sucked for anyone who decided to use it in production.

My thoughts exactly. Be careful people!

Cue "Call me maybe, MemSQL"

Aphyr's posts (taken with appropriate amounts of salt) have become the authority on marketing claims. That said, many solutions are perfectly viable with their shortcomings, but knowing what those shortcomings are is essential.

When MemSQL is configured for synchronous durability and all databases are configured with synchronous replication, Jepsen confirms that writes to MemSQL are durable and acknowledged appropriately. As part of testing for the 4.0 release, we used the Jepsen network partition test and observed data durability equivalent to Postgres (https://aphyr.com/posts/282-call-me-maybe-postgres), i.e. the scalability of a cluster with the durability of a single-node machine. As part of running this test, we noticed some opportunities to do some cool performance optimizations. Stay tuned for a blog post to follow!

^ would love to see the details of this in a blog post, yeah. Please submit that on HN when it's ready; I'm sure I'm not the only one interested :)

will do!

Eric, one of the cofounders, here. happy to answer any questions on MemSQL 4 and the community edition. Some new features in MemSQL 4:

- fully distributed joins

- native geospatial index and datatypes

- lots of new SQL surface area

- concurrency improvements

- analytic optimizer

- Spark, HDFS, and S3 connectors

Hi Eric!

Not sure if you remember me, but we spoke several (5?) years ago when you guys first started. I was the SAP HANA guy and I think we were talking about the landscape of in-memory solutions back then. First off, congrats on the success so far. Second, a few questions:

- How does MemSQL compare to HANA and Vertica? My understanding is that MemSQL provides the same infrastructure (columnar in-memory based storage) as those solutions but will run on commodity hardware (HANA, for example, is hardware-vendor locked).

- One of the interesting topics that has come up in the HANA space is that it's expensive to maintain and scale. Specifically, provisioning new servers for data growth and archiving old data out of memory. Are these issues present at all in MemSQL?

- Lots of your customers seem to be using it for company-specific strategic solutions. Are any using it for operations? (like financial close reporting, or as a transactional DB)

Of course we remember you. Please stop by our new office!

You are right about the commodity hardware. The other difference with HANA is that MemSQL rowstores are in memory for high throughput applications and columnstores can be stored on flash or disk. So it's economical to scale MemSQL to very large datasets.

- MemSQL is very easy to scale. It comes with an ops dashboard that lets you add nodes with just a few clicks.

- There are a lot of different use cases. Some companies use us for operational reporting, end of day financial reporting, high throughput counters, real-time risk analysis, etc

I might take you up on that! Shoot me your contact details so I can set something up (I just searched my emails and can't find anything). My contact is in my profile.


eric at and nikita at memsql.


"but will run on commodity hardware"

We run memSQL 4.0 on an 18-machine cluster, all commodity hardware. It is awesome.

Interesting. We've implemented a metadata layer for HDFS and YARN using NDB (MySQL Cluster) - that also supports READ COMMITTED transactions. Do you support:

- row-level locking

- independent transaction coordinators at data nodes

- pruned index scans

- network-aware transactions (with user-defined partition keys for tables)

- any asynchronous/event API


- row-level locking -> yes we use MVCC and take a row level write lock when necessary for consistency

- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that act as transaction coordinators. These are the nodes you connect to. Under the hood leaf nodes in memsql also manage transactions.

- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans if that's what you mean.

- network-aware transactions (with user-defined partition keys for tables) -> yes, we have user-defined partition keys (shard keys) and transactions work across multiple nodes on the network.

- any asynchronous/event API -> no, we don't have an event API. Most of our use cases are "pull" oriented, which scales very well with MemSQL.

Great. Lots of good stuff there. Pruned index scans are index scans where the data is located on a single shard and the index scan doesn't flood all nodes in the DB. I'll definitely be looking into MemSQL.

MemSQL partitions data across nodes by hash, not by range, so partition pruning is less applicable. However, in cases where it can be applied, MemSQL does apply it. [1]

Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning. [2] [3]

[1] http://docs.memsql.com/latest/concepts/distributed_sql/#inde...

[2] http://docs.memsql.com/latest/concepts/columnar/#query-effic...

[3] http://docs.memsql.com/latest/concepts/columnar/#maintenance...
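
To make the segment elimination idea above concrete, here is a minimal Python sketch. It assumes each column segment keeps min/max metadata and that a range filter can skip any segment whose range cannot overlap the predicate. The names `Segment`, `make_segment`, and `scan_range` are hypothetical and chosen for illustration; this is not MemSQL's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A chunk of one column plus min/max metadata (hypothetical layout)."""
    values: List[int]
    lo: int
    hi: int


def make_segment(values: List[int]) -> Segment:
    # Metadata is computed once, when the segment is written.
    return Segment(values=values, lo=min(values), hi=max(values))


def scan_range(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return (values in [low, high], number of segments actually read)."""
    hits: List[int] = []
    segments_read = 0
    for seg in segments:
        # Eliminate the segment if its [lo, hi] range cannot overlap the filter.
        if seg.hi < low or seg.lo > high:
            continue
        segments_read += 1
        hits.extend(v for v in seg.values if low <= v <= high)
    return hits, segments_read


segments = [make_segment([1, 5, 9]), make_segment([10, 14, 19]), make_segment([20, 25, 29])]
hits, segments_read = scan_range(segments, 12, 21)
```

With this toy data, the first segment is skipped without reading its values, because its maximum (9) is below the filter's lower bound. The benefit depends on how well the data is clustered on the filtered column.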

Shard key matching is effectively partition pruning - which is great. This is a feature not many people are aware of, but it is super important when scaling to large clusters and when you have "session-oriented" (or in our case inode-oriented) data spread across different tables.
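
The shard key pruning discussed in this subthread can be sketched roughly as follows. This is an illustration under assumptions, not MemSQL's routing code: `NUM_PARTITIONS`, `partition_for`, and `partitions_to_query` are hypothetical names, and `zlib.crc32` stands in for whatever hash function a real system uses.

```python
import zlib
from typing import List, Optional

NUM_PARTITIONS = 8  # hypothetical cluster size


def partition_for(shard_key: str) -> int:
    """Map a shard key to a partition with a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(shard_key.encode("utf-8")) % NUM_PARTITIONS


def partitions_to_query(shard_key_equals: Optional[str] = None) -> List[int]:
    """Equality on the shard key targets one partition; anything else fans out.

    With hash partitioning, a range predicate on the key cannot be pruned this
    way, because neighbouring key values land on unrelated partitions.
    """
    if shard_key_equals is not None:
        return [partition_for(shard_key_equals)]
    return list(range(NUM_PARTITIONS))


# Rows from different tables that share a shard key value hash to the same
# partition, so "session-oriented" lookups can stay on a single node.
single = partitions_to_query("inode-42")
everything = partitions_to_query()
```

Here `single` holds exactly one partition number, while `everything` lists all eight, which is the flood-all-nodes case the pruned scan avoids.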

What is the replication picture for the community version? I can see that Enterprise has HA features but I have to guess that there is some form of safety if one node goes down in Community.

What's the catch here? :)

It seems like the improvements here are OLAP focused, and welcome ones at that, but the docs and product, if not the marketing, seem to be moving away from operational workloads.

From my interpretation of the docs, there are no "transactions" in the Jim Gray / ACID sense of the word. MemSQL offers transactional semantics with READ COMMITTED isolation. This is not just not SERIALIZABLE, it's also not REPEATABLE-READ or SNAPSHOT-READ.

For example, imagine a two statement transaction where statement 1 reads a counter value and statement 2 increments it. If two users run this transaction at the same time, the counter could lose an increment. This example is trivial and probably could be done in a single statement, but many other read-then-write operations could cause such an inconsistency.

Unless I'm misunderstanding something.
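
The lost increment described above can be shown with a toy simulation. This sketch does not talk to any database; it only replays one interleaving that an isolation level without repeatable reads would permit, under the assumption that neither transaction takes a lock between its read and its write. The function name `run_interleaved` is made up for illustration.

```python
def run_interleaved(counter_start: int) -> int:
    """Replay two read-then-write transactions whose statements interleave."""
    committed = {"counter": counter_start}

    # Statement 1 of each transaction: both read the same committed value.
    read_by_a = committed["counter"]
    read_by_b = committed["counter"]

    # Statement 2 of each transaction: each writes (its own read + 1) and commits.
    committed["counter"] = read_by_a + 1
    committed["counter"] = read_by_b + 1  # overwrites A's increment

    return committed["counter"]


final_value = run_interleaved(0)
```

Two increments ran, but `final_value` is 1 rather than 2, because B's write was computed from a value that A had already changed.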

Hi @jhugg, a performant implementation of a counter usually does not read and update the value in separate statements within a transaction. Generally, people use UPDATE or INSERT...ON DUPLICATE KEY UPDATE (upsert) to implement this workload. In fact, transactional, high-throughput counters are an extremely common use case for MemSQL [1].

As a matter of fact, even Oracle and MS SQL Server offer READ-COMMITTED as the default isolation level. Moreover, there are known issues with using SERIALIZABLE isolation in Oracle [2].

[1] - http://blog.memsql.com/high-speed-counters/

[2] - http://stackoverflow.com/questions/11826368/oracle-select-im...
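
As a rough illustration of the single-statement counter pattern mentioned above, here is a sketch that runs against an in-memory SQLite database from Python's standard library, used purely as a stand-in so the example is self-contained. SQLite (3.24 or later) spells the upsert as `ON CONFLICT ... DO UPDATE`; on a MySQL-compatible system the equivalent statement would be `INSERT ... ON DUPLICATE KEY UPDATE hits = hits + 1`. The table and column names are hypothetical.

```python
import sqlite3


def bump(conn: sqlite3.Connection, name: str) -> None:
    """Create the counter at 1 or increment it, in one statement."""
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, 1) "
        "ON CONFLICT(name) DO UPDATE SET hits = hits + 1",
        (name,),
    )


def demo() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    for _ in range(3):
        bump(conn, "page:/home")
    (hits,) = conn.execute(
        "SELECT hits FROM counters WHERE name = ?", ("page:/home",)
    ).fetchone()
    conn.close()
    return hits
```

Because the read and the write happen inside one statement, the application never holds a stale value between two round trips, which is the point made in the comment above.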

Yes. I know there are other ways to do simple counters. The counter-example broadly applies to multi-statement operations that feed the output of reads into writes, i.e. in general transactions.

And yes, the defaults on many systems are low, but you can turn them up if you have a transactional workload. Read-committed might be fine for a Drupal backend, but it's not truly transactional.

Related and neat post:


One of the relevant points Peter makes is that weaker isolation may work ok at low contention and low scale, which matches most DB workloads, but probably not the ones people on HN care about.

http://voltdb.com/john-hugg-work-volt Is this you at VoltDB?

Well if MemSQL supports locks, then you can implement any stronger isolation model using both locks and READ COMMITTED transactions. Do they support row-level locking?

Yes, but then you lose the performance benefits.
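
The suggestion above, layering explicit locks on top of a weaker isolation level, can be sketched with threads standing in for concurrent transactions. This is an analogy under assumptions: `threading.Lock` plays the role of a row lock and `LockedCounter` is a hypothetical name. In many SQL databases the corresponding construct is `SELECT ... FOR UPDATE`; this thread does not say whether MemSQL 4 supports that statement.

```python
import threading


class LockedCounter:
    """Read-then-write guarded by an explicit lock (stand-in for a row lock)."""

    def __init__(self) -> None:
        self.value = 0
        self._row_lock = threading.Lock()

    def read_then_increment(self) -> None:
        with self._row_lock:          # held across both "statements"
            current = self.value      # statement 1: read
            self.value = current + 1  # statement 2: write


def run(workers: int = 4, increments: int = 1000) -> int:
    counter = LockedCounter()

    def work() -> None:
        for _ in range(increments):
            counter.read_then_increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

No increment is lost, but every read-then-write on the same row is now serialized behind the lock, which is the performance cost the reply points out.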

seg-fault: Yep. That's me. You could take what I say with a skeptical eye because I work on a competing system, but it increasingly appears that that's not actually true.

VoltDB for transactions and ingestion-time analytics and MemSQL for deeper analytics might be a neat combo system. YMMV.

Our company (Simbiose) recently did a strong stress test with memSQL with billions of JSON rows, using complex JOIN queries and the results are simply AMAZING.

  > While you are free to use Community for your projects,
  > MemSQL does not support or endorse using it in production.

Ehhh. Do they mean that the Community Edition is only usable for development?

for critical deployments, the enterprise version has high availability, cross data center replication, and more. you can use the free edition however you please.

I currently use beta 1.0 for one of my tools and it has been functioning for years without a problem. I think they are just saying don't sue us if you use it in production and didn't pay for it :-)

The CloudFormation cluster generator tool they have is really cool (http://cloud.memsql.com/cloudformation). The templates it generates are pretty complex; I wonder whether they are handwritten or produced by some kind of tool. Would anyone be able to shed some light on how they did it? Do you think most of it is hand coded? I've been playing around with the .NET SDK in Visual Studio 2013. It includes a CloudFormation project type, and you can type out the JSON with IntelliSense, which is pretty cool.

Hi @superlogical, thank you for using it and for the kind feedback! CloudFormation by itself provides some very basic level of conditional/looping logic, but we found that it was not enough yet to provide a stellar experience. So, when you fill out the form on cloud.memsql.com, we auto-generate a template (we wrote the code to do this) that matches the parameters you filled in, upload it to an S3 bucket on our account, and then expose it as a download or via GET directly to your AWS account (i.e. you own the hardware/database/data).
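
For readers curious what generating a template from form parameters might look like, here is a minimal sketch. It is not MemSQL's generator, whose code is not shown in this thread. The function name `build_template`, the `Leaf{i}` logical IDs, and the instance type and image ID values are all hypothetical; only the top-level template keys and the `AWS::EC2::Instance` resource type are standard CloudFormation.

```python
import json


def build_template(node_count: int, instance_type: str, image_id: str) -> dict:
    """Expand a node count into one EC2 resource per node."""
    resources = {}
    for i in range(node_count):
        # The loop is done here, in the generator, rather than in the template.
        resources[f"Leaf{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {"InstanceType": instance_type, "ImageId": image_id},
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Illustrative cluster with {node_count} nodes",
        "Resources": resources,
    }


# Placeholder values; a real template needs a valid AMI ID for the target region.
template_json = json.dumps(build_template(3, "m4.large", "ami-00000000"), indent=2)
```

Doing the expansion in ordinary code sidesteps the limited conditional and looping support in the template language itself, at the cost of having to regenerate the file whenever the parameters change.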

This is way cool. Already spinning it up on my AWS clusters. Such a quick setup. Super stoked to use it.

It is way cool. I've been an enterprise user for some time now and am excited to get to use it on other projects for which there wasn't cost justification in the past.

When can we expect windowing functions, specifically row_number, rank, count, lead, and lag, by partition?

All I can say is "it's on the roadmap." :) I used to do a lot of analytics at Facebook and am now a PM at MemSQL, so it's one of my priorities.

Would love to hear from any existing users on their experiences so far (assuming that's allowed under previous licenses). Choosing a database is one of those decisions where I tend to go with the safest, well known option, but maybe I'm missing out.

I have been an enterprise user and had the luxury of using the 4 beta for the last month or so. I run a cluster of 18 machines with 192 cores and 540GB of RAM.


memSQL is remarkably stable. I actually have one machine running the old memSQL 1.0 beta that has not rebooted in months. 4.0 has been similarly stable. The only problems happen when you run too many other processes on the aggregators (which is really just me being stupid).

Speed is great and the wire compliance with MySQL makes it very easy to develop for. To be honest, the "keeping the data in memory" part isn't the best part; it is the query compiling. It is incredibly fast. Often a query that takes 30 seconds to 1 minute to execute will compile down to fractions of a second. It is very cool to watch and never gets old.

We are looking to literally move all of our internal stuff to memSQL community edition while keeping our customer tools on enterprise.

Slightly off-topic: The font weight is too light to read properly on my PC (Windows 8.1, Chrome). Stopped reading because it was too much effort to try and read.

Lol. I got what I asked for in the Quora question?

Are there any optimizations or explicit support for proximate ordered joins?

Could you please elaborate? Do you mean approximate joins as in this talk? http://www2.research.att.com/~divesh/papers/ks2005-aj-tutori...

No no, sorry, it's much simpler (at least in how it works, no guarantees on implementation complexity, of course). It's an issue that comes up in timeseries databases pretty often.

Say I have a table full of quotes and a table full of trades. I want to know what the quote price was at the time the trade occurred. In no-frills SQL, that translates into something like:

    select * from t left outer join q on q.time=(select max(time) from q where time<=t.time and sym=s) and t.sym = q.sym where date = d and sym = s
If you have some sort of support for time proximate joining, the query engine only has to perform a binary search (as long as the indices on the symbol and time columns are appropriate, say symbol partitioned, then time sorted within symbol) to find the correct row from quote to join against. If it doesn't have such support, then a scan is required to find the maximum time value from the quote table that satisfies the constraint on the trade time. Presumably, if you do have support, this wouldn't be the exact query syntax, because that would heavily imply that you want to perform a table scan, or that you could change some aspect of the subquery without affecting performance. Maybe you'd have it be something like this:

    select * from t left outer join q on before(t.time,q.time) and t.sym = q.sym where date = d and sym = s

If you just use the no-frills SQL query, we're able to optimize it to do a fast index seek (instead of scan) on q.time, because we know we only have to get the max row. This optimization isn't specific to proximate joins, a simple query like: select max(a) from t where a < 42; will be optimized by memsql.
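
The binary-search lookup discussed in this subthread can be sketched in plain Python. It assumes quotes are grouped by symbol and sorted by time within each symbol, as the question describes. The function name `asof_join` and the tuple layouts are hypothetical, and this illustrates the idea rather than how any particular engine executes the query.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Trade = Tuple[str, int, float]               # (sym, time, trade_price)
Quotes = Dict[str, Tuple[List[int], List[float]]]  # sym -> (sorted times, prices)


def asof_join(trades: List[Trade], quotes_by_sym: Quotes):
    """For each trade, attach the latest quote at or before the trade time."""
    joined: List[Tuple[str, int, float, Optional[float]]] = []
    for sym, t_time, t_price in trades:
        times, prices = quotes_by_sym.get(sym, ([], []))
        # bisect_right returns how many quote times are <= t_time, so the
        # entry just before that index is the most recent applicable quote.
        i = bisect_right(times, t_time)
        quote_price = prices[i - 1] if i > 0 else None  # None mirrors LEFT OUTER JOIN
        joined.append((sym, t_time, t_price, quote_price))
    return joined


quotes = {"IBM": ([100, 105, 110], [10.0, 10.5, 11.0])}
trades = [("IBM", 99, 9.9), ("IBM", 105, 10.4), ("IBM", 109, 10.8)]
result = asof_join(trades, quotes)
```

Each trade costs one O(log n) search over its symbol's quotes instead of a scan. The trade at time 99 gets `None` because no quote precedes it, while the trades at 105 and 109 both pick up the quote from time 105.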


