
NewSQL overview - lelf
http://www.slideshare.net/IvanGlushkov/newsql-overview
======
xtrumanx
For anyone else too lazy to read the slides to find out what NewSQL is, this
is slide 13:

    
    
        NewSQL: definition
    
        "A DBMS that delivers the scalability and flexibility 
        promised by NoSQL while retaining the support for SQL 
        queries and/or ACID, or to improve performance for 
        appropriate workloads."
    
        451 Group

~~~
drapper
Better one is on the next slide:

- SQL as a primary interface
- ACID support for transactions
- Non-locking architecture
- High per-node performance
- Scalable, shared nothing architecture

Michael Stonebraker
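To make the "shared nothing" bullet a bit more concrete, here is a toy sketch that assumes simple hash partitioning. The `Node` and `Cluster` classes are hypothetical and not taken from any product on the slides. Each node owns its own slice of the data and never touches a peer's storage, a router sends each key to exactly one owner, and a whole-table aggregate becomes scatter-gather: every node counts locally and the router sums the partial results.

```python
import hashlib


class Node:
    """One shared-nothing node: owns its own storage, shares nothing with peers."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Cluster:
    """Hypothetical router that hash-partitions keys across independent nodes."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def _owner(self, key):
        # Stable hash (unlike Python's per-process salted hash()) so routing is deterministic.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).rows[key] = value

    def get(self, key):
        return self._owner(key).rows.get(key)

    def count(self):
        # Scatter-gather aggregate: each node counts its own rows, the router sums them.
        return sum(len(node.rows) for node in self.nodes)


cluster = Cluster(["n1", "n2", "n3"])
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})

print(cluster.get("user:42"))  # {'id': 42}
print(cluster.count())         # 100
```

Modulo placement is the simplest choice, but it moves most keys whenever a node is added; real systems generally use range partitioning (CockroachDB, for example, splits its keyspace into ranges) or consistent hashing instead.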

------
moondowner
The first time I ran into the NewSQL terminology was when I checked out
CockroachDB:
[https://github.com/cockroachdb/cockroach](https://github.com/cockroachdb/cockroach)

'A Scalable, Geo-Replicated, Transactional Datastore' written in Go,
influenced by Google's Spanner.

~~~
philippelh
There's an interesting presentation on CockroachDB:
[https://www.youtube.com/watch?v=MEAuFgsmND0](https://www.youtube.com/watch?v=MEAuFgsmND0)

------
lobster_johnson
It's a shame that most of these aren't open source.

At one point I had great hopes for RethinkDB. It seemed like a great fit when
I first heard about it: It's among the newer databases, it's open source,
MVCC, distributed, sharded, multi-master, and has a neat query language with
some advanced features.

Unfortunately, while RethinkDB does feel very pleasant and modern, performance
is pretty terrible, and it's quite lacking in some other areas.
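MVCC, one of the features listed above, is also what lets many of these systems claim a "non-locking" architecture: writers append new versions instead of overwriting in place, so a reader working from a snapshot never waits on them. The following is a deliberately minimal single-process sketch of that idea, not RethinkDB's implementation. The `MVCCStore` class and its methods are hypothetical, and the sketch omits real concurrency, write-write conflict detection and garbage collection of old versions.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers read a snapshot."""

    def __init__(self):
        self._ts = 0
        self._versions = {}  # key -> list of (commit_ts, value), oldest first

    def _tick(self):
        # Single logical clock shared by snapshots and commits.
        self._ts += 1
        return self._ts

    def begin_snapshot(self):
        # A snapshot is just a timestamp; no locks are taken.
        return self._tick()

    def write(self, key, value):
        ts = self._tick()
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def read(self, key, snapshot_ts):
        # Return the newest version committed before the snapshot was taken.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts < snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)                       # commit ts=1
reader = store.begin_snapshot()                   # snapshot ts=2
store.write("balance", 50)                        # commit ts=3, reader is not blocked
old = store.read("balance", reader)               # still sees its snapshot
new = store.read("balance", store.begin_snapshot())

print(old)  # 100
print(new)  # 50
```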

~~~
danielmewes
Daniel @ RethinkDB here

We're still optimizing the performance of certain operations. There are
definitely still a few rough edges, especially with analytical queries as
you'd pointed out.

As far as basic operations (inserts, updates, retrieving documents etc.) are
concerned, RethinkDB already has competitive performance and we hope to
publish official benchmark results soon. As you know, good, comparable
benchmarks aren't easy, and we're currently spending most of our time on
improving RethinkDB rather than working on those. But they're definitely
coming.

This is just a guess, but a query involving COUNT(*) might execute faster with
PostgreSQL since it can use an index for making the count basically a
constant-time operation, while RethinkDB doesn't currently support this. That
is definitely a missing feature on our end.

If you get a chance, I'd be very happy to hear from you about the specific
queries and the data you'd been testing, so we can look into those specific
performance problems. My email is daniel at rethinkdb.com.

~~~
lobster_johnson
Benchmarks are indeed hard, which is why I only posted a very specific couple
of numbers.

Postgres isn't using an index for the count. When you're essentially tallying
an entire table it's a lot less efficient to traverse a B-Tree than to just
sequentially stream the table, which is something Postgres is pretty good at.

Here's a sample explain output [1]. Mind you, that server was under heavy load
(load avg 12) when I ran the query, which is why the numbers are higher than
usual; with no competing CPU or I/O, it normally takes about 500ms.

That box has only 4GB of RAM, out of which it's using about 512MB for caching,
and the OS is using about 1.5GB for the file page cache. As you can see from
the plan, it's hitting 38,108 buffers, or 297MB of data, meaning it's plowing
through approximately 148MB/sec to compute the aggregation.
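The figures above can be checked with a little arithmetic. This snippet assumes PostgreSQL's default 8 KiB block size (the plan itself is only in the linked gist), so "297MB" is about 297.7 MiB; dividing by the quoted 148MB/sec implies a runtime of roughly 2 seconds under that load, versus the usual ~500ms.

```python
# Assumption: PostgreSQL's default block size of 8 KiB per buffer.
BLOCK_SIZE_BYTES = 8192
buffers_hit = 38_108         # buffer count quoted from the plan
throughput_mb_per_s = 148    # throughput figure quoted in the comment

data_bytes = buffers_hit * BLOCK_SIZE_BYTES
data_mib = data_bytes / 2**20
implied_seconds = data_mib / throughput_mb_per_s

print(f"{data_mib:.1f} MiB scanned")              # 297.7 MiB scanned
print(f"~{implied_seconds:.1f} s implied runtime")  # ~2.0 s implied runtime
```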

For comparison, RethinkDB chugs about 6GB of RAM before it manages to eke out
some results from that query. I'm honestly not sure what it's doing.

[1]
[https://gist.github.com/atombender/e97dde73ed90054c7626](https://gist.github.com/atombender/e97dde73ed90054c7626)

------
amaks
It's strange he doesn't mention Google's Spanner or reference its paper
([http://static.googleusercontent.com/media/research.google.co...](http://static.googleusercontent.com/media/research.google.com/en/us/archive/spanner-osdi2012.pdf)).

~~~
gliush
I'm going to. I just don't want it to be half-done, and this paper requires a
lot of time :(

------
dfragnito
We consider ourselves NewSQL
[http://schemafreedb.com/](http://schemafreedb.com/). We are looking into
using NuoDB as one of our backend storage options when scalability is a
concern; the default is MySQL with TokuDB.

~~~
gliush
Also take a look at FoundationDB, a very promising DB with good performance
and scalability.

------
0942v8653
Is there a non-slideshare link?

~~~
jgord
I share the sentiment, to the point that I'll never visit a SlideShare link.

~~~
devty
Mind if I ask why?

------
ddorian43
Is there any open-source/free NewSQL DB?

~~~
mathnode
Slide 27. VoltDB.

Or use Postgres-XC, or MariaDB 10 with Galera and MaxScale. Postgres has JSON
storage. MariaDB has dynamic and virtual columns, and the CONNECT engine,
which can now directly create new JSON files or edit existing ones through the
SQL interface instead of through dedicated JSON-manipulation functions.

[https://mariadb.com/kb/en/mariadb/connect-json-table-type/](https://mariadb.com/kb/en/mariadb/connect-json-table-type/)

~~~
ddorian43
The free version of VoltDB has no HA.

Postgres-XC/XL has no HA.

MaxScale doesn't appear to offer sharding yet.

------
frik
MySQL/MariaDB/Percona with its different modern db engines and MySQL Cluster
should be mentioned.

~~~
gliush
I haven't found any mention of good scalability as a feature. Could you
provide a link to a good article about it, so that I could mention them?

~~~
frik
[http://highscalability.com/blog/2010/11/4/facebook-at-13-mil...](http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html)

