> The Community Edition is distributed as an executable
> binary and is a free edition of the commercial MemSQL
> Enterprise Edition. You are free to download and use
> MemSQL Community Edition within your organization.
The biggest issue is that FDB is not available at all now, even if you were willing to pay. So it sucked for anyone who had decided to use it in production.
Aphyr's posts (taken with appropriate amounts of salt) have become the authority on checking these marketing claims. That said, many solutions are perfectly viable despite their shortcomings, but knowing what those shortcomings are is essential.
- fully distributed joins
- native geospatial index and datatypes
- lots of new SQL surface area
- concurrency improvements
- analytic optimizer
- Spark, HDFS, and S3 connectors
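For the geospatial bullet, a rough sketch of what the new surface area looks like (type and function names are my reading of the 4.0 docs and may not be exact; the table, column names, and data are made up):

```sql
-- Illustrative only: GEOGRAPHYPOINT type plus a distance predicate.
CREATE TABLE rides (
  id BIGINT PRIMARY KEY,
  pickup GEOGRAPHYPOINT,
  INDEX (pickup)                 -- geospatial index on the point column
);

-- Rides that picked up within 500 meters of a given point:
SELECT id
FROM rides
WHERE GEOGRAPHY_WITHIN_DISTANCE(pickup, "POINT(-73.9857 40.7580)", 500);
```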
Not sure if you remember me, but we spoke several (5?) years ago when you guys first started. I was the SAP HANA guy and I think we were talking about the landscape of in-memory solutions back then. First off, congrats on the success so far. Second, a few questions:
- How does MemSQL compare to HANA and Vertica? My understanding is that MemSQL provides the same kind of infrastructure (columnar, in-memory storage) as those solutions but runs on commodity hardware (HANA, for example, is hardware-vendor locked).
- One of the interesting topics that has come up in the HANA space is that it's expensive to maintain and scale. Specifically, provisioning new servers for data growth and archiving old data out of memory. Are these issues present at all in MemSQL?
- Lots of your customers seem to be using it for company-specific strategic solutions. Are any using it for operations? (like financial close reporting, or as a transactional DB)
You are right about the commodity hardware. The other difference with HANA is that MemSQL rowstores are kept in memory for high-throughput applications, while columnstores can be stored on flash or disk. So it's economical to scale MemSQL to very large datasets.
- MemSQL is very easy to scale. It comes with an ops dashboard that lets you add nodes with just a few clicks.
- There are a lot of different use cases. Some companies use us for operational reporting, end-of-day financial reporting, high-throughput counters, real-time risk analysis, etc.
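The rowstore/columnstore split mentioned above, as a hypothetical schema (table and column names invented; DDL follows the 4.0 syntax as I understand it):

```sql
-- Hot, high-throughput data in a rowstore (in memory).
-- Rowstore is the default table type, so no storage clause is needed.
CREATE TABLE live_positions (
  account_id BIGINT,
  symbol VARCHAR(16),
  qty BIGINT,
  PRIMARY KEY (account_id, symbol)
);

-- Large historical data in a columnstore (on flash/disk), sorted by ts.
CREATE TABLE trade_history (
  ts DATETIME,
  symbol VARCHAR(16),
  price DECIMAL(18,6),
  KEY (ts) USING CLUSTERED COLUMNSTORE
);
```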
We run memSQL 4.0 on 18 machine cluster, all commodity hardware. It is awesome.
- row-level locking
- independent transaction coordinators at data nodes
- pruned index scans
- network-aware transactions (with user-defined partition keys for tables)
- any asynchronous/event API
- independent transaction coordinators at data nodes -> we have a tier called "aggregators" that act as transaction coordinators. These are the nodes you connect to. Under the hood leaf nodes in memsql also manage transactions.
- pruned index scans -> Do you mean information retrieval? Our indexes support seeks and range scans if that's what you mean.
- network-aware transactions (with user-defined partition keys for tables) --> yes, we have user-defined partition keys (shard keys) and transactions work across multiple nodes on the network.
- any asynchronous/event API --> no, we don't have an event API
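The shard-key answer above, sketched as DDL (hypothetical table; SHARD KEY is MemSQL's clause for a user-defined partition key):

```sql
-- Rows with the same customer_id land on the same partition, so joins
-- and aggregations keyed on customer_id stay local to one node.
CREATE TABLE orders (
  order_id BIGINT,
  customer_id BIGINT,
  total DECIMAL(10,2),
  SHARD KEY (customer_id)
);
```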
Most of our use cases are "pull" oriented, which scales very well with MemSQL.
Within each node, for column store tables in MemSQL we do use segment elimination very aggressively, which is effectively the same thing as partition pruning.  
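For example (assuming a hypothetical columnstore table `events` sorted on `ts`):

```sql
-- Each columnstore segment keeps min/max metadata per column; a filter on
-- the sort key lets the engine skip any segment whose [min, max] range for
-- ts cannot match (segment elimination). Table/column names are made up.
SELECT COUNT(*)
FROM events                      -- columnstore table sorted on ts
WHERE ts >= '2015-06-01' AND ts < '2015-06-02';
```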
From my interpretation of the docs, there are no "transactions" in the Jim Gray / ACID sense of the word. MemSQL offers transactional semantics with READ COMMITTED isolation. That's not just weaker than SERIALIZABLE; it's also weaker than REPEATABLE READ or SNAPSHOT isolation.
For example, imagine a two statement transaction where statement 1 reads a counter value and statement 2 increments it. If two users run this transaction at the same time, the counter could lose an increment. This example is trivial and probably could be done in a single statement, but many other read-then-write operations could cause such an inconsistency.
Unless I'm misunderstanding something.
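The read-then-increment race can be sketched in SQL (hypothetical `counters` table; the interleaving is shown in comments):

```sql
-- Two sessions under READ COMMITTED, interleaved as commented.
-- Both read the same value, so one increment is lost:
--   Session A: BEGIN; SELECT value FROM counters WHERE id = 1;  -- sees 41
--   Session B: BEGIN; SELECT value FROM counters WHERE id = 1;  -- also sees 41
--   Session A: UPDATE counters SET value = 42 WHERE id = 1; COMMIT;
--   Session B: UPDATE counters SET value = 42 WHERE id = 1; COMMIT;
-- Final value: 42, not 43.

-- The single-statement form is atomic even at READ COMMITTED:
UPDATE counters SET value = value + 1 WHERE id = 1;
```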
As a matter of fact, even Oracle and MS SQL Server offer READ COMMITTED as the default isolation level. Moreover, there are known issues with using SERIALIZABLE isolation in Oracle.
 - http://blog.memsql.com/high-speed-counters/
 - http://stackoverflow.com/questions/11826368/oracle-select-im...
And yes, the defaults on many systems are low, but you can turn them up if you have a transactional workload. Read-committed might be fine for a Drupal backend, but it's not truly transactional.
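"Turning them up" is one statement on engines that support it (standard SQL-92 syntax; whether and how an engine honors it varies):

```sql
-- Request strict isolation for the next transaction; support varies by engine.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
```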
Related and neat post:
One of the relevant points Peter makes is that weaker isolation may work ok at low contention and low scale, which matches most DB workloads, but probably not the ones people on HN care about.
VoltDB for transactions and ingestion-time analytics and MemSQL for deeper analytics might be a neat combo system. YMMV.
> While you are free to use Community for your projects,
> MemSQL does not support or endorse using it in production.
Ehhh. Do they mean that the Community Edition is only usable for development?
memSQL is remarkably stable. I actually have one machine running the old memSQL 1.0 beta that hasn't been rebooted in months. 4.0 has been similarly stable. The only problems happen when you run too many other processes on the aggregators (which is really just me being stupid).
Speed is great, and the wire compatibility with MySQL makes it very easy to develop for. To be honest, the "keeping the data in memory" part isn't the best part; it's the query compilation. It is incredibly fast. Often a query that would take 30 sec to 1 min to execute compiles down to something that runs in fractions of a second. It is very cool to watch and never gets old.
We are looking to literally move all of our internal stuff to memSQL community edition while keeping our customer tools on enterprise.
Say I have a table full of quotes and a table full of trades. I want to know what the quote price was at the time the trade occurred. In no-frills SQL, that translates into something like:
  select *
  from t left outer join q
    on q.time = (select max(time) from q
                 where time <= t.time and sym = s)
   and t.sym = q.sym
  where date = d and sym = s

  select *
  from t left outer join q
    on before(t.time, q.time)
   and t.sym = q.sym
  where date = d and sym = s