Hacker News

If you've ever tried to use SQL on a large but fits-in-memory dataset with existing row databases, you know how underwhelming they are. Existing non-finance-industry solutions suck for things like mid-size trading queries, or (in coinlab's case) workloads that require frequent, periodic table scans.

You can buy something like kdb, but that costs $70k a year and requires your engineers to learn some entirely new semantics if they're only used to SQL and (pick any popular language here, Python to C++).

Really? I've done many, many queries on large but fits-in-memory datasets, and with a few little tricks even MySQL can be fast: make everything a temporary table (with an append-only transaction table to rebuild the temporary tables if lost), don't use keys, use indexes, and avoid bad practices (filtering after a join, using subselects, needlessly avoiding HAVING clauses).
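A minimal sketch of the rebuild-from-log pattern described above, using Python's sqlite3 with an in-memory database as a stand-in for MySQL (table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Durable append-only log: every change is an INSERT, never an UPDATE.
con.execute("CREATE TABLE tx_log (id INTEGER PRIMARY KEY, sym TEXT, qty INTEGER)")
con.executemany("INSERT INTO tx_log (sym, qty) VALUES (?, ?)",
                [("BTC", 5), ("BTC", -2), ("LTC", 7)])

def rebuild_positions(con):
    # Session-local temporary table; cheap to rebuild from the log if lost.
    con.execute("DROP TABLE IF EXISTS temp.positions")
    con.execute("""CREATE TEMPORARY TABLE positions AS
                   SELECT sym, SUM(qty) AS total FROM tx_log GROUP BY sym""")

rebuild_positions(con)
print(sorted(con.execute("SELECT sym, total FROM positions")))
# → [('BTC', 3), ('LTC', 7)]
```

The temporary table here is just a derived view of the log, so losing it costs nothing but a rebuild.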

Did you mean large, fittable-in-memory, and (distributed across multiple computers and/or very reliable)? Because that is much harder.

Can you explain this further? How is using a temporary table faster? Explicit temporary tables are simply regular tables that only last for your current session. Do you mean the MEMORY storage engine? What do you mean by "don't use keys, use indexes"? The terms KEY and INDEX in MySQL are exact synonyms.

Temporary tables default to being mapped in memory, and for more complex querying (where you run multiple queries and conditionally select further queries depending on results) they are faster, since the table they are based on could have changed but they will not, so caching is much more likely. Keys and indexes are not exact synonyms. Give it a try some time: do a WHERE clause with mod(id, 5) = 0 on a keyed table and on an indexed table. The indexed table will be non-trivially faster (I've seen 3x faster).

You are clearly not talking about MySQL. CREATE TEMPORARY TABLE simply creates a physical table on disk that is destroyed at the end of the session, and it can be created with any storage engine. Explicit temporary tables are handled just like "regular" tables, and behavior depends on the storage engine (with MyISAM, indexes are stored in the key buffer; InnoDB keeps both recently accessed data and indexes in the InnoDB buffer pool).

KEY and INDEX are literally the exact same thing and are just there to support different syntaxes: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

Perhaps you are referring to primary keys vs secondary indexes, for which there is a difference in how they are physically stored on disk and in memory, particularly with InnoDB. There is a substantial performance gain to be had by having a relevant primary key that is referenced in the where clause as the data is stored in the same page as the primary key, both on disk and in the buffer pool. Secondary indexes reference the primary key, thus require two lookups.
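The one-lookup vs. two-lookup distinction above is visible in query plans. A rough illustration using Python's sqlite3 (SQLite's rowid tables are clustered on the integer primary key, loosely analogous to InnoDB; table and index names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, sym TEXT, px REAL)")
con.execute("CREATE INDEX idx_sym ON trades (sym)")

# Lookup by primary key: one search on the clustered table itself.
pk = con.execute("EXPLAIN QUERY PLAN SELECT px FROM trades WHERE id = 1").fetchall()
# Lookup by secondary index: search the index, then fetch the row it points to.
ix = con.execute("EXPLAIN QUERY PLAN SELECT px FROM trades WHERE sym = 'X'").fetchall()

print(pk[0][-1])  # plan detail mentions the integer primary key
print(ix[0][-1])  # plan detail mentions idx_sym
```

The exact plan strings vary by SQLite version, but the primary-key query searches the table directly while the sym query goes through the secondary index first.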

Another key point is that temporary tables are horrible for concurrency. Obviously only one session can access them, but more importantly, they end up causing a great deal of disk activity. At the end of every session, the table is physically dropped. In ext3, this can take quite a while for large tables (not so bad in filesystems like xfs). It can also cause substantial stalls in InnoDB as the buffer pool is scanned for matching pages, which are then flushed out. Percona Server has innodb_lazy_drop_table to mitigate this issue to some degree, but dropping tables is still always an expensive operation. This is mostly a concern for OLTP servers, not necessarily OLAP servers.

So, MemSQL is a column store now? The docs didn't make that clear to me.

Anyway, why is there not a single benchmark available to support the sales pitch?

(In your favor I'll just pretend you didn't mention kdb here...)

In your favor I'll just pretend you didn't mention kdb here

Please explain. What's bad about kdb, besides how nonstandard it is?

There is nothing bad about it (other than the price). It just seems an outrageous claim that MemSQL, with all its constraints, is even in the same game.

At the least they should come up with some seriously impressive benchmarks before dropping names like that.

I have had the misfortune of working with Q. It's a very clever little language that could be a great platform for numerical prototyping. Unfortunately, it's hobbled by a terrible programming culture. The interpreter gives cryptic single-character error messages and the C interface is written in a nightmarish soup of macros.

MemSQL is a row store. KDB is single-threaded. More benchmarks are coming!

http://kx.com/faq.php seems pretty clear that kdb is multi-threaded: "Does kdb+ support multiple cores and/or multi-threading? Yes, both are built right in to the system and this makes kdb+ extraordinarily fast when compared with traditional applications, as it can make full use of all available cores."

This matches my own experience using it.

I'm sorry, I have to clarify this. It's a single-writer database...

Any chance you could publish full info on the benchmarks?

For example, where can we find your configurations for the MySQL vs. MemSQL benchmark you show in your video? Or how big the dataset was, etc.?

Actually, what you say about the price of kdb+ is true, but Q (the new language for querying kdb+, essentially the latest version of K, which is now deprecated) is very much like SQL, syntax-wise.

The way I see it, this is very much targeted at financial firms that use kdb+ or products like it that perform really well for things like writing lots of tick data from dozens of exchanges at a time and querying across them quickly in a familiar SQL style.

Fast in-memory data stores already exist, with support and ease of use/deployment. I was able to set up Hazelcast for my distributed cluster in minutes. I don't see the advantage of this over NoSQL or in-memory data stores.
