
NuoDB: another NewSQL database - steilpass
http://nuodb.com/
======
LeafStorm
To be completely honest, I have never heard the term "NewSQL" before in my
life. After a quick Google, it seems that the term basically means "SQL
databases that go FAST LIKE NOSQL." (Though it's not really popular.) Which
is, once again, completely missing the point.

I can't speak for professional devs - I'm just a hobbyist - but I'm not
developing for "NoSQL" databases like CouchDB and Redis because I want to do
SUPER WEB SCALE. I'm developing for CouchDB and Redis because they offer truly
interesting alternative ways to look at your data. (In fact, Redis is what got
me interested in data structures.) And if I wanted to write a project that
used data that makes sense from a relational standpoint, SQLite (for
development) and Postgres (for production) are just fine with me.

~~~
pjscott
There are a lot of people who would like a full-fledged relational database
that scales up to large clusters. Just because you're not one of them doesn't
mean they don't exist, or that they're "completely missing the point."

~~~
LeafStorm
I was referring more to the marketing aspects of the term (and proposing their
solution as a counterpart to "NoSQL"), than the concept itself. There are a
lot of people who think that "NoSQL" is all about magic scaling sauce, and
this isn't really helping.

~~~
nupark2
I don't really understand why one would accept the trade-offs of "NoSQL" if
you aren't interested in "magic scaling sauce" (and clustered redundancy).

What's the advantage here? Less rigorous schema, moving data modeling and
constraints into the more complex application code that's much more likely to
have bugs?

I can't tell you how many times database constraints have caught application
bugs with fatal data correctness consequences on the products we've worked on.

The only reason I can see for abandoning those declarative constraints is
because of the scalability issues one must contend with (and even then, I'm
somewhat unconvinced that they matter for all but the highest-load entities).

~~~
pjscott
Some of the databases grouped under the over-broad NoSQL umbrella have
reasons to use them other than magic scaling sauce. Redis, for example, is a
ridiculously fast in-memory database that gives you data structures like sets,
lists, hash tables, strings, numbers, and so on. It's tremendously useful for
some problems.

For example, at Greplin we're using Redis for our document analysis pipeline.
We're pushing thousands of documents through Redis every second, using it to
durably hold data while it gets passed around through our various back-end
worker server clusters. We've been very happy with how Redis has been
performing; it handles the write-heavy workload beautifully, and is way more
resistant to data loss than most systems for doing this sort of thing. We'll
be even happier when it gets server-side Lua scripting in a stable release,
and we can define our own commands. The point is, though, that this would be
crazy to do with a traditional SQL database, or even something like Riak or
CouchDB or most of the other NoSQL databases. Redis fills a different niche,
and filling a lot of cool niches is what I think the NoSQL stuff should be
about.
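For readers unfamiliar with the pattern pjscott alludes to: Redis can hold a
job durably while a worker processes it via the reliable-queue idiom
(RPOPLPUSH into a per-worker processing list, plus an explicit
acknowledgement). Here's a minimal sketch in plain Python, with an in-memory
dict standing in for a real Redis server (this is an illustration of the
idiom, not Greplin's actual pipeline):

```python
from collections import deque

# In-memory stand-in for Redis lists; a real deployment would use
# redis-py's rpoplpush / lrem against an actual server.
store = {"jobs": deque(), "processing": deque()}

def lpush(key, value):
    store[key].appendleft(value)

def rpoplpush(src, dst):
    # Atomically move the oldest item onto a "processing" list, so a
    # crashed worker never loses the document: it stays in `processing`
    # until the worker acknowledges completion.
    if not store[src]:
        return None
    value = store[src].pop()
    store[dst].appendleft(value)
    return value

def ack(key, value):
    # Worker finished: drop the item from the processing list (LREM).
    store[key].remove(value)

lpush("jobs", "doc-1")
lpush("jobs", "doc-2")
job = rpoplpush("jobs", "processing")   # worker picks up "doc-1"
ack("processing", job)                  # and acknowledges it
```

If the worker dies between pickup and ack, the item is still sitting in
`processing` and can be re-queued, which is the durability property being
described.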

------
wccrawford
For those wondering about it, don't bother searching for 'NewSQL'. It's a
buzzword that doesn't actually mean anything other than 'it's not mysql,
pgsql, oracle, or mssql.' They claim to improve on the standard DBs, but every
single DB does it in a different way. The name is useless.

Instead, read the 'who we are' and 'how it works' bits on NuoDB's page. They
were much more informative.

I think it sounds exciting and fresh... And doesn't seem to be open source at
all.

I'm not allergic to paying money for software, but I've found I'm happiest
when using something that I can guarantee will survive, even if I may have
to do so by my own hand. Commercial products have a tendency to be a pain
in the rear.

~~~
rkalla
Thanks for the heads-up (I was looking for more real info, not the damnable
'beta signup' form).

This sounds like SQL on top of Cassandra (as a rough mental map, if you are
looking for one):

    
    
      Our distributed non-blocking atomic commit protocol allows database 
      transaction processing at any available node. The system is 
      designed to provide a consistent view of data, high availability, 
      and the ability to detect and manage partitions in a predictable   
      and consistent manner. Query processing scales with the number of 
      available nodes, storage scales according to the choice of storage 
      manager of which there are many. Storage managers are key/value 
      storage engines that also operate as a distributed in-memory cache 
      and participate in the commit protocol to ensure durability 
      guarantees.

~~~
dberg
Given that they seem to favor consistency in the CAP model, it would be
closer to HBase in this regard. Cassandra is, by design, eventually
consistent and cannot offer these same guarantees.

~~~
stephenjudkins
This is false. Cassandra can offer any level of consistency you desire, on a
per-operation basis. See the "ConsistencyLevel" section of
<http://wiki.apache.org/cassandra/API>. The tradeoffs of the CAP theorem, of
course, apply. But if you want 100% consistency (availability and
partition-tolerance be damned) it's there for you.

There is a general misconception that an entire DBMS must pick tradeoffs
based on the CAP theorem (i.e., that a database offering availability can
NEVER offer immediate consistency). Instead, a DBMS may offer different CAP
guarantees per operation.

~~~
eis
I don't think so. Does Cassandra provide guarantees regarding the ordering
of queries? Because I couldn't see that.

Another problem I can see: say you write to 3 nodes with W=2 and a write
succeeds on only one node. Then later on, through read repair, the value
propagates to the other 2 nodes. So our write "failed" but really, in the
end, succeeded. I'm not too deep into the Cassandra details, but those two
things immediately came to mind when reading some of the wiki pages.

It boils down to not using/offering algorithms like 2-Phase Commit or Paxos
but more sloppy quorums.

So from what I can tell, Cassandra cannot guarantee strong consistency.
But then again, I could be wrong and have missed something.

~~~
stephenjudkins
Taken directly from the page I linked from above:

"Note that if W + R > ReplicationFactor, where W is the number of nodes to
block for on write, and R the number to block for on reads, you will have
strongly consistent behavior; that is, readers will always see the most recent
write."

Could you explain why this is incorrect? It seems the Cassandra devs use the
definition of "consistency" from Brewer's paper, and Cassandra offers this
level of consistency.

Your point about read-repair is correct when using QUORUM consistency level,
but even then you can avoid that by using the ALL consistency level.
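The quoted rule is easy to check mechanically: if the W replicas that
acknowledge a write and the R replicas consulted on a read must overlap, a
reader that keeps the newest timestamp always sees the latest write. A small
simulation of the quorum-overlap argument (a sketch of the principle, not
Cassandra's actual implementation):

```python
import itertools

REPLICATION_FACTOR = 3
W, R = 2, 2   # W + R > ReplicationFactor, so quorums must overlap

# Each replica stores (timestamp, value); a write lands on W replicas.
replicas = [(0, "old")] * REPLICATION_FACTOR

def write(value, ts, acked):
    # The write is acknowledged once it reaches the W replicas in `acked`.
    for i in acked:
        replicas[i] = (ts, value)

def quorum_read(indices):
    # A reader blocks for R replicas and keeps the newest timestamp.
    return max(replicas[i] for i in indices)[1]

write("new", ts=1, acked=[0, 1])   # acked by W = 2 replicas

# Any choice of R = 2 replicas intersects the write set {0, 1},
# so every possible quorum read sees "new".
for combo in itertools.combinations(range(REPLICATION_FACTOR), R):
    assert quorum_read(combo) == "new"
```

With W + R <= ReplicationFactor (say W=1, R=1 here), a read could land
entirely on stale replicas, which is exactly the eventually-consistent case.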

------
seiji
Now with _Cloudbursting_!

Looks like they are doing a pure enterprise push: _NuoDB, Inc., the thought
leader in cloud database technology, ..._

------
julochrobak
I'm a big fan of relational theory and I'm always happy to see any activity in
this space. However, their "revolutionary" distributed model seems to be
similar to something which is already out there, open source, and ready for
use:

[http://bandilab.org/blog/2011-06-22.Ostap.Running_on_Multipl...](http://bandilab.org/blog/2011-06-22.Ostap.Running_on_Multiple_Nodes.html)

or maybe I just don't have enough information about NuoDB?

~~~
jimstarkey
Nope. Not enough info on NuoDB.

NuoDB doesn't have a centralized transaction controller. It doesn't have a
centralized anything. And it isn't "like" anything else -- all new. It's a
distributed database without distributed transactions.

In a nutshell, it isn't based on pages or disks but distributed objects. Each
distributed object, called an atom, can exist in as many places as necessary.
Each atom instance knows of the other instances and replicates changes.
Transaction nodes do SQL and are diskless. Archive nodes listen to replication
and store serialized atoms on disk, S3, or some other KV store. If a
transaction node needs an atom, it gets it from another transaction node or,
at last resort, from an archive node that fetches the atom from disk, S3, or
wherever.

~~~
julochrobak
Fair point about the transaction controller; this is indeed something which
is not yet distributed in bandicoot. Though I don't think it's a big deal,
as the tx is really simple and lightweight.

Anyway, let's go back to NuoDB :) I'm really interested in how it manages
the whole of ACID. I did not get it from the page nor from your reply.

For simplicity, let's pick on durability only. The archive nodes listen to
replication and store serialized atoms on disk. At the same time the atoms
replicate the changes to other instances. Is the durability achieved by a
synchronous replication?

------
moe
Has anyone actually tried this wonder-machine or is it Vaporware?

I remember submitting an evaluation request almost a year ago but never
received a response.

Their homepage and blog don't seem to have changed since then either; no real
information anywhere, only truck-loads of PR fluff.

------
jbellis
Their claims reminded me of Jim Starkey's history with the CAP theorem:
[http://pl.atyp.us/wordpress/index.php/2010/03/cap-and-leases/](http://pl.atyp.us/wordpress/index.php/2010/03/cap-and-leases/)

------
andrewcooke
in case anyone else was wondering: the key CAP-related "compromise", from "how
it works", is that on partition it supports transactions on only one sub-set
of nodes (the "healthiest"); the others provide read-only snapshots.

<http://nuodb.com/how_it_works.html> (see "Availability and Resiliency").

~~~
CPlatypus
In other words, it's plain old CA.

------
wmf
Anybody know why they changed the name? Nimbus seems more pronounceable.

~~~
rkalla
To add validity to their new term "NewSQL", possibly? (I don't know; just
guessing.)

To the "NewSQL" term, I don't really have a problem with it... from their "how
it works" it sounds like they are genuinely doing something interesting and
cool with SQL. I have no qualms with them calling it "NewSQL".

I don't know why marketing terms get people so mad, sometimes they make sense.

I'd also point out that the first few times people used "NoSQL", I imagine
there were comments along these same lines talking about how silly it was.

It isn't like they named it "WebScaleSQL" :)

~~~
jimstarkey
Marketing statements notwithstanding, the name change was the result of
something really boring -- a name conflict with another company.

Oh, "New-oh dee bee".

------
jcapote
can't wait to read the white paper

~~~
spuz
I'm guessing you're referring to their claim of being able to do distributed
transactions?

 _Data is spread out across transaction and storage nodes and is always
available. When network partitions occur NuoDB continues offering services via
the most available and healthy segment while the other segments provide a
consistent read-only snapshot while awaiting reconnection at which point they
will synchronize with their counterparts. Clients can simply reconnect to the
active partition of the cluster to continue processing updates._

I'm not sure how they can ensure that network partitions only occur between
fully replicated sets (i.e. how can all the data be available in all
locations?)

------
MostAwesomeDude
Alas, I can't downvote this link.

Can anybody explain why I should go with this instead of Pg?

To the submitter or anybody else sufficiently knowledgeable: When will ORMs
get support for this? I use SQLA but could port Django or ActiveRecord code.

~~~
shin_lao
_disclaimer_ : I have a bias as we could be seen as a "competitor".

If I understand correctly, their SQL engine can scale through the use of a
proprietary peer to peer technology. This would be new and would indeed make
it possible to solve bottlenecks and scale the SQL database to a very large
number of nodes.

However, I think it misses the point where SQL is simply not always needed.
They will always be slower than a NoSQL engine that scales well.

Your example of ORMs is spot on: do you really need SQL as the back end of
an ORM? If not, why pay the relational tax?

No silver bullet.

~~~
jimstarkey
The question really isn't SQL but ACID transactions. If you can do ACID
transactions, SQL drastically reduces the amount of data transmitted.

There is no intrinsic reason that ACID transactions, with or without SQL,
can't scale; it's just that, up until now, they haven't.

And there's a reason that until now, they haven't. From the beginning of
time, academic computer "scientists" have confused the terms
serializability and consistency. In short, serializability is a sufficient
condition for consistency, but it isn't a necessary one. If you design a
system that enforces consistency without requiring serializability, it
scales. Period. Legacy RDBMSes don't work that way, but that's their
problem.

~~~
lincolnq
Huh. Your insight is an interesting one, but I am having trouble understanding
how it is actually useful.

If I may reduce to a concrete example:

Posit shared variables x and y, both zero, and simultaneously do (on threads
T1 and T2)

    
    
        T1: atomic { y = 1; return x; }
        T2: atomic { x = 1; return y; }
    

where 'atomic' indicates a transactional operation.

Traditional ACID rules would state that either T1 takes effect before T2 (and
so (T1, T2) return (0,1)) or T2 before T1 (so they return (1,0)). Other
interleavings are not permitted -- returning (0,0) or (1,1) would violate
consistency.

What it seems to me that you are saying is that both (0,0) and (1,1) might be
OK, depending on your application domain -- maybe the app can specify a
consistency rule that says that (1,1) isn't OK. Then you can design databases
that follow application-directed consistency rules.

At first blush this seems impractical, so I am hoping you can elaborate on the
point you were trying to make.

~~~
jimstarkey
The C in ACID is consistent, not serializable (i.e. it's not ASID).
Consistency means that declared consistency constraints are enforced -- no
dirty writes, no violations of unique indexes, referential integrity, or
whatever other bells and whistles the system supports.

Let me give a simpler example: Database with one table of one field, a number.
One transaction: count the number of records and store that number.

A serializable system will force one zero, one one, one two, etc. But a
consistent system can have two zeros and no ones. Why? Because that's what
each concurrent transaction saw. Nothing wrong with that. But if
application semantics dictate that each value must be distinct, then put a
unique index on the number and the system will enforce uniqueness.
Automatically enforcing "auto-magic" constraints that nobody cares about is
why serializability destroys the scalability of distributed systems.
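The two-zeros scenario can be made concrete. In this hypothetical sketch,
each transaction counts rows against its own snapshot, so both insert 0;
the result is consistent (no declared constraint violated) but not
serializable, and a unique constraint is what would reject the duplicate:

```python
# Each transaction counts the rows it can see and inserts that count.
# Under snapshot isolation both can start from the same snapshot, so
# both see 0 rows and both insert 0 -- consistent, but not serializable.
table = []                      # committed rows
snapshot_t1 = list(table)       # both transactions snapshot together
snapshot_t2 = list(table)

table.append(len(snapshot_t1))  # T1 commits 0
table.append(len(snapshot_t2))  # T2 also commits 0

print(table)                    # [0, 0] -- two zeros, no ones

# A serializable system would have forced [0, 1]. If duplicates matter,
# a declared unique constraint catches the case instead:
def insert_unique(rows, value):
    if value in rows:
        raise ValueError("unique constraint violated")
    rows.append(value)
```

The point being made: only the constraints the application actually
declares get enforced, rather than paying for full serialization everywhere.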

In NuoDB all messaging is asynchronous and batched, making it very fast and
efficient.

For more than you want to know, see
[http://www.gbcacm.org/sites/www.gbcacm.org/files/slides/Spec...](http://www.gbcacm.org/sites/www.gbcacm.org/files/slides/SpecialRelativity\[1\]_0.pdf)

Someplace there's even an audio recording which I recommend if you're into
self-abuse.

~~~
jhugg
So running with your example, assume I have a transaction that adds 5 to a
column value and then reads the value back.

If I start with a value of 0, then run my transaction twice, can both
transactions return 5? Or is it guaranteed that one will return 10?

~~~
andrewcooke
[sorry. replied earlier incorrectly]. one of the two transactions would abort
in this case. snapshot isolation needs to check that any data mutated in the
transaction were not also mutated externally. in the example you are replying
to, there is no mutation, and so no problem, but in your example there is. see
<http://en.wikipedia.org/wiki/Snapshot_isolation>

note that it's only _mutated_ values that are checked for conflicts, and only
against other mutations (this is why it is efficient - the number of checks
required is small). so you can get weird behaviour when multiple values are
read while different transactions change each - there's a good example in the
link above. this is called "write skew".

the whole approach is, in a sense, exploiting poor phrasing of the ansi sql-92
standard, which doesn't actually require serialisation even though that is the
most natural way to interpret it (as far as i understand things). so you can
think of MVCC as "exploiting a loophole" that leads to a more efficient
system, but one that is less intuitive. on the other hand, this is not new -
it's already the standard behaviour for postgres, oracle, sql server, etc.

~~~
julochrobak
Thanks for the clarification. However, this does not sound like something I
could use in practice.

If I have two transactions and one of them is aborted because of the
changes done by the other transaction, what am I as a developer supposed to
do? Retry? I hope not, because a retry is, in other words, serializing the
execution. One after another.

So, if I have a system with a lot of concurrency (e.g. bank accounts and
transfers), I'd better not use NuoDB, because I'd get a lot of transfers
aborted; not good. If I have a system with very few write conflicts, I'd go
for serializability, because that gives 100% consistency and will be fast
anyway due to the rarity of conflicts.

From the Wikipedia link you posted, it is fairly clear that snapshot
isolation is good when you don't need consistency. For that, you'd have to
either abort every time there is a conflict or introduce write-write
conflicts (i.e. serializing).

~~~
andrewcooke
(1) serializability will not be "fast anyway". it will be slow. this is the
problem with, for example, mongodb's global write lock.

(2) you would not get "lots" of aborts with bank accounts because each
transaction is, typically, to a different account. you're only going to get a
problem when two processes try to change the same person's account at the same
time.

(3) what this provides is standard-compliant ACID sql, the same as postgres,
oracle, sql server, etc. if you use any of those and don't have retry in your
code for when transactions fail then you're already in a mess.

i am not associated with this project, but what you're saying doesn't really
make sense. as far as i can see, you're criticising it for being the same as
everyone else in the standards compliant, sql world.

~~~
julochrobak
I'm not criticising it for being the same as everyone else. I'm just not sure
how the whole thing works. I don't see enough information provided on the
site, yet a lot of statements about providing full ACID and scalability.

re 1) and 2): I think they are related. The serialization can be done at
the account level; there is no need for a single global write lock. This
also implies that if the bank account transfers are typically to different
accounts, serialization would be needed about as rarely as the aborts would
occur, so the system will perform equally well in both cases. However,
serialization does not require further retries and doesn't force the
application programmers to work around the problems.

