
Why some NoSQL DBs only let you perform transactions on a single data item - tim_sw
http://dbmsmusings.blogspot.com/2015/10/why-mongodb-cassandra-hbase-dynamodb_28.html
======
arosenbaum
Misleading title: Abadi specifically called out particular NoSQL DBs, he
wasn't generalizing to all NoSQL DBs. MarkLogic
([http://www.marklogic.com](http://www.marklogic.com)) is an Enterprise
Database used by many large organizations as a system of record for
information traditionally stored in RDBMS or Mainframe.

Joe Hellerstein gave a wonderful keynote at ACM SoCC last year that is a great
read for all who are interested in this subject:
[http://db.cs.berkeley.edu/jmh/talks/SoCC14-keynote.pdf](http://db.cs.berkeley.edu/jmh/talks/SoCC14-keynote.pdf)

MarkLogic has had transactions since version 1.0. The same mechanisms that
enable multi-statement transactions are also critical for overall reliability.
A recent Gartner study ranked MarkLogic as #3 for reliability (not NoSQL
databases, all databases).

Many on HN may not have heard of MarkLogic as we are not open source. We are,
however, the largest NoSQL ISV by employee count (and probably other measures
also.)

I am VP, Product Strategy for MarkLogic (ex-Ingres, ex-Cohera).

~~~
manigandham
Do you have info or can you point to anything where I can learn about
MarkLogic without all the marketing speak on the site? It's really hard to get
any info about what the software actually looks like, how it works, and how it
compares to other options out there...

~~~
arosenbaum
[http://developer.marklogic.com](http://developer.marklogic.com) is the place
to go.

The closest we have to a "comparison" is
[http://www.marklogic.com/what-is-marklogic/features/](http://www.marklogic.com/what-is-marklogic/features/)
Those pages do actually end up drilling down to documentation.

------
MatthewWilkes
ZODB ( [http://www.zodb.org/en/latest/](http://www.zodb.org/en/latest/) ) is a
NoSQL database that has been around for almost twenty years now. It's a
Python-specific NoSQL database that allows persisting of arbitrary objects in
a tree structure. It has multi-object transactions, transaction savepoints for
memory efficiency's sake (both optimistic and normal), a pluggable 3-way
conflict resolver, two-phase commit, and support for long-running branches
which contain multiple transactions.

~~~
the-dude
ZODB is no database. ZODB is a persistence layer. Where is the SQL in ZODB?
The level of sophistication in the query interface is laughable.

~~~
plonh
It is a database, not a relational database.

------
judah
On the .NET stack, RavenDB[0] supports transactions on multiple items[1].

RavenDB is a nice middle ground: eventual consistency for queries for speed,
but ACID for create, update, and load (one or more items by ID).

It uses transactions throughout, so a failure in the midst of 10 writes will
roll back all of them, as one would expect in a traditional relational
database.

[0]: [http://ravendb.net/](http://ravendb.net/)

[1]: [http://ravendb.net/docs/article-
page/3.0/csharp/start/gettin...](http://ravendb.net/docs/article-
page/3.0/csharp/start/getting-started)

~~~
polskibus
It does indeed look nice; however, the pricing looks worse than MS SQL in the
long run (for the enterprise version).

~~~
judah
It's free for open source projects.

For commercial enterprise, Raven is effectively $788/core/year [0]. Quite
reasonable, IMO.

Contrast this with MS SQL Server Enterprise, which appears to be $14,000/core
one-time cost [1].

(Disclaimer: I'm a part-time employee for RavenDB. But I loved and used Raven
on my own projects before becoming an employee.)

[0]: [http://ravendb.net/buy](http://ravendb.net/buy) [1]:
[http://www.microsoft.com/en-us/server-cloud/products/sql-
ser...](http://www.microsoft.com/en-us/server-cloud/products/sql-
server/purchasing.aspx)

------
LoSboccacc
[https://en.wikipedia.org/wiki/FoundationDB](https://en.wikipedia.org/wiki/FoundationDB)

Designers haven't solved all the problems around transactions and performance,
but that doesn't mean they have stopped trying.

"As various NoSQL databases matured, a curious thing happened to their APIs:
they started looking more like SQL. This is because SQL is a pretty direct
implementation of relational set theory, and math is hard to fool."[1]

[1][http://blog.memsql.com/cache-is-the-new-
ram/](http://blog.memsql.com/cache-is-the-new-ram/)

~~~
jclaybaugh
As a heavy user of FoundationDB I'd point out they have a five second
transaction limit. One of the big compromises made to make their system work.
This limit of course makes things... interesting in the real world.

------
fiatjaf
What is the percentage of web applications that really need all that scaling?
How many database instances does Hypothes.is need? What about Airbnb?
Duolingo? Feedly?

~~~
kpil
Exactly.

To handle massive loads, you can trade away transactional integrity and get
increased performance AND a whole new range of complicated problems that were
more or less solved for you, and that you now have to take care of yourself.

For some applications, the tradeoff is not a problem - and in others it
requires a lot of work, up to more or less your own implementation of
distributed transactions.

I have this feeling that I can't shake off, that a lot of people think
relational (transactional) databases are complicated, and fail to see why they
actually are complicated. This probably is not the category of people that
truly need to solve their performance problems though.

------
mooreds
This was very helpful. I am in my first project with MongoDB after a lot of
experience with RDBMSs and I do miss transactions. There are other things I
miss more, though: foreign keys (don't use dbrefs!), joins, and SQL.

Unfortunately this application is new, so we haven't yet seen the benefit of
schemaless data changes.

~~~
zurn
PostgreSQL + JSON gets you working transactions and schemaless.
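
A minimal sketch of that combination, under stated assumptions: Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server, the `docs` table and the `transfer` helper are hypothetical, and in PostgreSQL the `body` column would typically be `jsonb` instead of `TEXT`. It stores schemaless JSON documents and updates two of them in one transaction, so either both change or neither does.

```python
import json
import sqlite3

# Assumption: sqlite3 is only a stand-in so the sketch is self-contained;
# with PostgreSQL the "body" column would be jsonb rather than TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
with conn:
    # Documents do not need to share a shape: only bob has "tags".
    conn.execute("INSERT INTO docs VALUES (?, ?)",
                 ("alice", json.dumps({"balance": 100})))
    conn.execute("INSERT INTO docs VALUES (?, ?)",
                 ("bob", json.dumps({"balance": 20, "tags": ["new"]})))


def load(doc_id):
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0])


def save(doc_id, doc):
    conn.execute("UPDATE docs SET body = ? WHERE id = ?", (json.dumps(doc), doc_id))


def transfer(src, dst, amount):
    """Move `amount` between two documents; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            a, b = load(src), load(dst)
            a["balance"] -= amount
            save(src, a)  # first write happens before the check on purpose
            if a["balance"] < 0:
                raise ValueError("insufficient funds")
            b["balance"] += amount
            save(dst, b)
        return True
    except ValueError:
        return False
```

A failed `transfer` leaves both documents exactly as they were, even though the first write had already been issued when the error was raised.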

~~~
radicalbyte
Plus it scales well and doesn't randomly lose data. Oh, and the project
leaders are professionals who are honest and transparent about the
capabilities of their product.

~~~
Rezo
It scales, to a point. In my opinion, however, any type of sharding/clustering
is still a total mess in PostgreSQL. Most projects, however, will probably
never need it.

~~~
Roboprog
How DARE you, sir, imply that we are not the next "Amazon"!

Kidding aside, I suspect with a little discipline, one could design a PG DB
that could be ported to Oracle (Exadata???) should the need, and budget
($$$$$), for that sort of scaling arise. Cheap and safe now, Expensive and
safe later (if needed)

------
dan31
The question here is really "When do you need distributed transactions"? One
of the misuses is when one cannot achieve enough performance on a single node.
E.g., one builds a system serving 1000 REST requests per second (RPS),
achieving 2 seconds latency per request, with the DB as the bottleneck. To be
honest, I've seen real software that gave 23 sec latency at only 50 RPS.
Does it mean there is a need to scale it out or simply that the chosen DB is
the problem? The cases I've seen in my practice are mostly the latter.
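
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It applies Little's law (mean requests in flight = arrival rate × mean latency), which is an addition for illustration and assumes a steady-state system; the helper name `in_flight` is hypothetical.

```python
def in_flight(rps, latency_s):
    """Little's law: mean number of requests in flight at steady state."""
    return rps * latency_s


print(in_flight(1000, 2))  # 2000 requests waiting at any moment
print(in_flight(50, 23))   # 1150 requests waiting at any moment
```

In the second case more than a thousand requests are parked at any moment at a very modest request rate, which is the kind of number worth checking before concluding that the system must be scaled out.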

Simply choose the most suitable solution, not the most hyped one. It would be
a mistake to sacrifice transactions by using some "general NoSQL database".
Today, for real, you can have a single node capable of millions of RPS on
real-world scenarios with ACID transactions of arbitrary complexity by
choosing a solution like [http://starcounter.io/](http://starcounter.io/).
Please, please don't just take yet another no-transactions DB, which is
no-transactions even on a single machine, running 4 DB nodes on 4 cores
completely separately as if they were 4 different machines. Then you observe
bad performance and start to scale things up, paying more and more for the
cloud. Not the best way to spend time and money.

And, if you DO really need distributed transactions, then it mostly means
they are driven by the logic of your subject domain. E.g., you might have one
department in Sweden and one in the USA, and then you need to manage
distributed accounts in the right way, where "right" is up to your banking
policy. However, if you need to scale reads, there is no problem doing so
within a "no-distributed-ACID" solution. At the same time, if you need to
scale writes, doing distributed transactions isn't a good idea either, as
you've seen from the article that started this topic.

So, right tool for the right job.

I'm leaving aside the topic of fighting latency with distributed
transactions. I mean those things around caching, CDN and async replication.
A distributed transaction isn't a remedy there at all, since it can't beat
the speed of light by any means.

------
CydeWeys
Nobody else has said it, so I'll do it: This website is horrible. The font
color is #888888 on a white background, which is very hard for me to read. And
I can't even fix it by disabling a CSS style in the web inspector panel like I
usually would because the CSS styles are hard-coded into style attributes on
spans for each paragraph! What is this, a website from the 90s? How is it so
horrible, given that other random Blogspot blogs I'm checking have sensible
markup?

~~~
MarkCole
You know how when you write something in Microsoft Word, and then copy it out
into an editor, it takes terrible and broken HTML markup with it? That is
exactly what has happened. The author wrote the post in Word, copied it
to blogspot, and wham, bam, thank you ma'am, you get that clusterfuck.

------
shin_lao
Sorry to say but we have something that is truly transactional and distributed
([https://www.quasardb.net](https://www.quasardb.net)). It slows writes down
a bit (because of the commit) but it scales well.

And before us FoundationDB delivered the same feature (although implemented a
bit differently).

The truth is that there is a market for databases with "unreliable" writes and
they are easier to design. That's why you see a lot of them.

~~~
eis
What are you using LevelDB for? I hope not the main KV store.

Do you use 2PC or something like Raft to handle distributed transactions?

From the docs it sounds like every time a node joins/leaves, the cluster goes
into a (brief?) unstable mode and failures can happen. That doesn't sound
great.

~~~
shin_lao
I'm happy to see you've read the documentation thoroughly!

We indeed stopped using LevelDB. We used it for a while but we couldn't
work around some speed issues. LevelDB is good for many scenarios when properly
configured, though.

The database is multilayered: LevelDB was the last layer, and most parallel
operations were managed by a massively parallel in-memory database which used
LevelDB as a backend.

We do 2PC to handle distributed transactions. Raft breaks our consistency
model.
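
For readers unfamiliar with 2PC, here is a toy in-memory sketch of the general protocol. It is not quasardb's implementation: the `Participant` class and `two_phase_commit` function are hypothetical, and durability, timeouts and coordinator failure are all left out.

```python
class Participant:
    """Hypothetical in-memory node holding a key/value store."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.data = {}
        self.staged = None
        self.fail_on_prepare = fail_on_prepare

    def prepare(self, writes):
        # Phase 1: stage the writes and vote. A real node would also
        # persist the staged writes durably before voting yes.
        if self.fail_on_prepare:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (commit): make the staged writes visible.
        self.data.update(self.staged)
        self.staged = None

    def abort(self):
        # Phase 2 (abort): discard anything that was staged.
        self.staged = None


def two_phase_commit(writes_by_node):
    """writes_by_node: list of (participant, writes) pairs.

    Returns True if every participant voted yes and the writes were applied,
    False if any participant voted no and everything was aborted.
    """
    prepared = []
    for node, writes in writes_by_node:
        if node.prepare(writes):
            prepared.append(node)
        else:
            for p in prepared:
                p.abort()
            return False
    for node in prepared:
        node.commit()
    return True
```

The key property is that no write becomes visible on any node until every node has voted yes in the prepare phase; that extra round trip is the kind of commit cost mentioned earlier in the thread.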

When a node joins/leaves the cluster you might have some transient failures
which are generally absorbed by the protocol.

I hope I answered your questions and feel free to give it a spin next week,
we're about to release a major update!

------
hardwaresofton
As previously noted, title is misleading. Also, pick a better document store
(though you should note that rethink only guarantees operations on single
documents as atomic):

[http://rethinkdb.com/](http://rethinkdb.com/)

Yes, "better" is subjective, but rethink has a good page detailing the
differences (between rethink and mongo):

[http://www.rethinkdb.com/docs/rethinkdb-vs-
mongodb/](http://www.rethinkdb.com/docs/rethinkdb-vs-mongodb/)

Feature-list type breakdown:

[http://www.rethinkdb.com/docs/comparison-
tables/](http://www.rethinkdb.com/docs/comparison-tables/)

And pay attention to the guarantees that it provides/doesn't provide when you
pick. I think this is a big part of what separates novice developers from
middle-tier developers. There are always tons of choices for just about
everything in software these days, and your job is to figure out how best to
solve whichever problem you're facing, in a way that future generations won't
hate you for.

------
schmichael
Nothing in the first paragraph is true about Cassandra except perhaps that it
does allow for some limited nesting of data. The rest of the article is great
though. It doesn't even need the first paragraph.

------
bascule
HyperDex Warp does lightweight multi-key transactions:

[http://hyperdex.org/warp/](http://hyperdex.org/warp/)

~~~
lobster_johnson
Are you using HyperDex in production? I have never heard about anyone using it
-- but then good products often stay under the radar when all their users are
satisfied and productive.

------
cesher
Have any of you heard of DocumentDB from Microsoft Azure? It is very similar
to documentDB but it was specifically designed to have atomic transactions.

------
krisdol
The MULTI feature of Redis kind of allows you to create some level of
transactionality over multiple items, no?

~~~
ddorian43
Not over multiple nodes. Only on 1 node/core.

