
What's left of NoSQL? - MarkusWinand
http://use-the-index-luke.com/blog/2013-04/whats-left-of-nosql/
======
bowlofpetunias
As usual, the next-big-thing-to-replace-the-old-crappy-thing has become
nothing more than a useful addition to our toolbox.

Which is not a bad thing, but I wish we could do it without the overhyped
nonsense. It really is time for this business to grow the fuck up.

The "web generation" has brought us lots of great changes, I would have quit
IT back in the 90's if the internet hadn't happened. But this immature
attitude towards everything from technology stacks to how to run a company is
really starting to grate.

~~~
threeseed
You are right. They are just tools, nothing more, nothing less.

For those of us who have been doing software development for a long time, we
have heard countless vendors hyping their products. It's what they do and will
continue to do. Normal developers have always ignored it and chosen the right
tool for the job.

I actually find the anti-NoSQL crowd to be the immature ones; they are often
quite patronising, as though I am some idiot for choosing a particular
database. It's quite bizarre.

------
redwood
This is an incredibly short-sighted piece, for two reasons:

1) the data challenges of yesterday's internet giants are the data problems
for just about every 21st century enterprise tomorrow. We're at the start of /
in the midst of an irreversible data explosion.

2) Established players not taking new technologies and upstarts seriously is
never evidence of the threat not being real. We know that on the contrary our
industry is characterized by constant disruption where established players for
whatever reason do not consistently stay at the front line of innovation and
tend to get leapfrogged. (As an analogy, Google+'s lurch into social may be
akin to Oracle's lurch into NoSQL. Does that mean social was not real?)

Bottom line: It all depends on what you're doing. But more and more of us will
be dealing with more and more complex data in the years to come. Of that, I'm
certain.

~~~
jhorey
I agree it's a bit short-sighted, but I think it's a matter of perspective. The
mistake most people make is to believe that Oracle (and other relational DB
vendors) are competing directly against the various NoSQL stores (this
includes doc stores, scalable key-value stores, etc.). I don't think that's
true. I think that relational databases are really competing against industry-
specific SaaS. So instead of implementing their own database for inventory &
sales, companies may opt to use a service. These SaaS companies, in turn, are
more likely to adopt a variety of technologies (including NoSQL and
relational). Since the SaaS companies serve more than a single company,
they're also more likely to adopt easily scalable technology like Cassandra.
So fewer businesses will need to purchase databases (of any sort), but the
ones that do will have greater scalability needs.

~~~
threeseed
NoSQL is absolutely competing with Oracle and other relational DB vendors. You
are seeing it today with the valuations for companies like Mongo or DataStax.
SaaS is hardly a big enough market compared to say every enterprise. And
surely the companies listed on the client pages agree with me.

And people forget something important with SaaS. Data sovereignty. Here in
Australia for example there are many enterprise companies who are forbidden
from using ANYTHING that is hosted in the US due to the grey legal area (e.g.
the Patriot Act). So in-house databases are absolutely still here to stay.

~~~
jhorey
It's interesting: my experience with companies here in the U.S. (mainly SMBs)
is that they feel much more comfortable using a SaaS product than adopting an
internal NoSQL store. Granted most of these companies aren't highly technical,
but I suspect for them using a SaaS is easy and doesn't require much internal
expertise. Adopting a new NoSQL store though may require training, etc. Of
course I concede that things may be different in larger enterprises and
companies outside the U.S.

I would bet, however, that most of the clients that do business with Mongo or
DataStax still intend to keep using and maintaining a relational system. I
haven't really encountered many companies that have decided to completely dump
their relational systems in favor of something else.

------
bitL
I am wondering if the prolonged latencies I have been experiencing from most
Google services over the past few years are somehow related to using Spanner
instead of the previously conventional "compute in advance offline"-"serve
immediately" approach. The Spanner paper mentioned they sacrificed a bit of
latency (from nanoseconds in DHT lookup to milliseconds). I understand it's
more convenient for Google developers to have a more predictable framework to
think in, instead of making every single project a piece of art while spending
most of the time fighting inconsistencies arising from eventual consistency.
The question is whether it was worth it. I remember a time when Google
services were amazingly fast - that time is unfortunately gone :-(

~~~
nevi-me
In what context are you referring to the "compute in advance offline" part?
Sometimes it's costly to compute things in advance and store them, when you
might not need them. If storage space becomes a concern, then Google would
have to write some fancy algorithms based on data access patterns that
determine what to compute in advance and what to compute JIT. That
complicates things because now they have to write and regularly run programs
that tell them what to compute, which simplistically means that they are at
least running 3 different tasks, instead of 1 on-the-fly computation.

~~~
bitL
I meant the traditional NoSQL architecture (like what LinkedIn was using),
where you had offline batch processing based on MapReduce executed regularly
during the day, and the results were stored in a distributed hash table with
super low latency and eventual consistency. This made access super fast but
inflexible, i.e. only what was precomputed could be accessed.
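A minimal sketch of that batch-then-serve pattern, for illustration only. The "profile views" metric, the event format and the `ServingStore` class are hypothetical: a plain loop stands in for the MapReduce job and a dict stands in for the distributed hash table. It shows why reads are fast but limited to whatever the last batch run produced.

```python
from collections import defaultdict


def batch_job(events):
    """Offline step: count profile views per member from (viewer, viewed) events.

    Stands in for a periodic MapReduce job; here it is a plain in-memory loop.
    """
    counts = defaultdict(int)
    for _viewer, viewed in events:
        counts[viewed] += 1
    return dict(counts)


class ServingStore:
    """Online step: a read-only key-value snapshot standing in for the DHT."""

    def __init__(self):
        self._snapshot = {}

    def swap_in(self, new_snapshot):
        # Publish a whole batch result at once; readers never see a half-built batch.
        self._snapshot = new_snapshot

    def get(self, key, default=None):
        # Constant-time lookup, but only for keys the last batch run produced.
        return self._snapshot.get(key, default)
```

A key that the batch job never computed simply comes back empty until the next run, which is the inflexibility described above; answering a new kind of question means writing and scheduling a new batch job.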

------
versusdotcom
We switched from SQL to Mongo on [http://versus.com](http://versus.com) one
year ago and are quite happy; we couldn't imagine going back.

But I think which DB to use depends heavily on the specific use case.

~~~
BugBrother
I agree, Mongo is easy and fast.

My problem with NoSQL is that you MUST know there won't be changing
requirements:

Later you might _need_ to do the equivalent of joining 3-4 tables. Then you
have trouble... But I'm a coward who looks both left and right before crossing
a street.

~~~
arethuza
"My problem with NoSQL is that you MUST know there won't be changing
requirements"

I thought the main advantage of document-oriented schema-less databases (I
haven't used MongoDB but I have used CouchDB) was that they are supposed to
give _more_ flexibility in the early stages of projects.

NB Personally, I've moved to PostgreSQL and its JSON storage as then you get
most of the benefits of both worlds.
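To make the hybrid idea concrete, here is a small sketch of JSON documents stored inside a relational schema: a normal foreign key and JOIN for the relational part, and a query on fields inside the document. In PostgreSQL this would be a `json`/`jsonb` column queried with operators such as `->>`; to keep the example self-contained it uses Python's built-in `sqlite3` and SQLite's `json_extract` as a stand-in, which assumes an SQLite build with the JSON functions enabled (the default in recent versions). Table names and data are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        details TEXT NOT NULL  -- schema-less JSON document
    );
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob')")

# Each order's details can have a different shape; no migration needed.
orders = [
    (1, {"items": ["book"], "gift_wrap": True}),
    (1, {"items": ["pen", "ink"]}),
    (2, {"items": ["lamp"], "gift_wrap": False}),
]
conn.executemany(
    "INSERT INTO orders (customer_id, details) VALUES (?, ?)",
    [(cid, json.dumps(doc)) for cid, doc in orders],
)

# Relational JOIN plus a filter on a field inside the JSON document.
rows = conn.execute("""
    SELECT c.name, json_extract(o.details, '$.items[0]')
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    WHERE json_extract(o.details, '$.gift_wrap') = 1
""").fetchall()
```

Only the first order has `gift_wrap` set to true, so the query returns Alice's order; the order with no `gift_wrap` key yields NULL and is filtered out rather than raising an error.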

~~~
BugBrother
>>[NoSQL] are supposed to give more flexibility in the early stages of
projects

I agree with what you wrote.

But my point is that _changing_ requirements can screw you, if you find that
you _later_ need relational capabilities.

~~~
arethuza
Well, that's why I gave up on CouchDB once I found that I was actually much
happier with the hybrid approach of using JSON documents _inside_ a relational
structure.

------
room271
An interesting article and it does seem like people rushed to embrace NoSQL
and are now trying to force it into some kind of consistency after the event -
not a bad thing, incidentally - there's a lot of interesting work here (Vector
clocks, CRDTs, etc.).
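As a rough illustration of one of those techniques, a minimal vector clock sketch: each replica bumps its own counter on a write, clocks are merged with an element-wise maximum, and comparing two clocks tells you whether one version supersedes the other or whether they are concurrent (a conflict the application has to reconcile). Clocks are plain dicts keyed by hypothetical node names; this is not any particular store's implementation.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (a local write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Element-wise maximum: a clock that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'equal', 'before' (a happened before b), 'after' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

For example, if replicas `x` and `y` both update a version first written on `x`, `compare` reports the two results as concurrent; after merging them and writing again on `x`, the new clock dominates both.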

One thing that surprised me though was the lack of a key player: Amazon. Their
Dynamo paper was hugely significant and as a company they use eventually
consistent stores for a whole swathe of products at scale.

Why mention Facebook and Google but omit this other major player, especially
as their experience tells a different story?

~~~
bitL
Amazon estimated that each 1 ms of additional latency costs them a few million
dollars a year, as well as decreasing the rate of returning users. For low
latency there is still nothing better than eventual consistency, so they may be
driven by the bottom line. Also, from my experience as a merchant on Amazon
with hundreds of thousands of items in inventory that need to be updated
almost in real time as prices and stock change all the time, leading to a few
million updates a day (in bulk), I can't envision how Amazon would be able to
achieve fast imports from a horde of merchants like me on any SQL system.

~~~
room271
Yeah, the Dynamo paper is very clear about the rationale behind creating
Dynamo and its usage:

\- downtime (lack of availability) is very expensive to them, both directly
and also reputationally

\- traditional ACID stores don't scale with writes whereas an eventually
consistent store can achieve this

So, in short, I completely agree with you: Amazon have great reasons to pursue
approaches based on eventual consistency.
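One knob the Dynamo paper describes for this trade-off is the replication factor N together with the number of replicas R that must answer a read and W that must acknowledge a write. A brute-force sketch (illustrative only, not Dynamo's code) of the rule behind it: every read set is guaranteed to overlap every write set exactly when R + W > N, so lowering W for write availability and latency gives up that guarantee. Dynamo itself uses "sloppy" quorums, so under failures the overlap can be weaker than this idealized check suggests.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every R-replica read set shares at least one replica with every
    W-replica write set among N replicas (checked exhaustively)."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )
```

With N = 3, R = W = 2 always overlaps, while R = W = 1 does not: a read may hit a replica that has not yet seen the latest write, which is exactly the eventual-consistency window.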

------
h1karu
Relational databases don't scale horizontally. That's still as true today as
it was a decade ago. Therefore engineers who need to cope with Big Data and Web
Scale will continue to migrate away from relational database solutions towards
persistence technology centered around distributed systems. It's as simple as
that.

If you build your app around a relational database and you need to scale up
big then at some point you're going to hit a brick wall in terms of scaling
out storage and/or writes. You either have to build sharding logic into your
relational db app from the beginning (which is a pain that NoSQL saves you
from), or else you have to re-architect your entire app when the time comes
that you need to deal with scale. Many shops end up borrowing VC money to
build out a team to re-architect their systems to handle web scale, but this
can be avoided by thinking about data access patterns from the beginning and
choosing a technology that can handle your future needs.

[https://en.wikipedia.org/wiki/Scalability#Horizontal_and_ver...](https://en.wikipedia.org/wiki/Scalability#Horizontal_and_vertical_scaling)
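For illustration, a minimal sketch of the kind of sharding logic an application ends up owning when the database does not shard for it. The shard names and key format are hypothetical, and simple hash-modulo placement is just one possible scheme.

```python
import hashlib


class ShardRouter:
    """Hypothetical application-level router mapping keys to database shards."""

    def __init__(self, shard_names):
        self.shards = list(shard_names)

    def shard_for(self, key: str) -> str:
        # A stable hash (not Python's per-process randomized hash()) keeps the
        # key-to-shard mapping identical across restarts and machines.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]


# Count how many keys would have to move if a fourth shard were added.
keys = [f"user:{i}" for i in range(1000)]
old = ShardRouter(["db0", "db1", "db2"])
new = ShardRouter(["db0", "db1", "db2", "db3"])
moved = sum(old.shard_for(k) != new.shard_for(k) for k in keys)
```

With plain modulo placement, going from three to four shards moves roughly three quarters of the keys (a key stays put only when its hash leaves the same remainder mod 3 and mod 4), which is one reason retrofitting sharding is painful; consistent hashing is the usual way to reduce that churn.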

~~~
workhere-io
AdWords was run on MySQL up until two years ago. The vast majority of
developers working on "web-scale" projects won't get to handle projects with
larger requirements than AdWords. In that perspective the whole NoSQL trend
makes very little sense, especially given the fact that e.g. PostgreSQL and
MySQL have a lot of needed features that many NoSQL databases don't, and that
the most "hip" NoSQL database a couple of years ago was a database that
doesn't even scale that well (MongoDB).

~~~
threeseed
You do understand how MySQL et al. are used in those cases, right? They are
treated as dumb key-value stores and sharded horizontally, with joins done in
the application layer. They are NOT your typical SQL deployment, and the
features you talk about are often meaningless.

And you are 100% wrong about MongoDB not scaling well. The people you hear
about switching away are never going back to PostgreSQL or MySQL; they are
moving to the next level in scalability, e.g. HBase or Cassandra.
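A minimal sketch of what "joins done in the application layer" means in practice. The two dicts are hypothetical stand-ins for shards used as key-value stores; against real MySQL shards each lookup would be a primary-key query such as `SELECT ... WHERE id IN (...)` sent to whichever shard holds those keys.

```python
# Hypothetical shards, each used as a plain key-value store: key -> row.
users_shard = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
orders_shard = {
    "o1": {"user_id": "u1", "total": 30},
    "o2": {"user_id": "u2", "total": 12},
    "o3": {"user_id": "u1", "total": 5},
}


def orders_with_user_names(order_ids):
    """Join orders to users in application code instead of with SQL JOIN."""
    orders = [orders_shard[oid] for oid in order_ids]      # lookup 1: orders
    user_ids = {o["user_id"] for o in orders}               # dedupe foreign keys
    users = {uid: users_shard[uid] for uid in user_ids}     # lookup 2: batched users
    return [(users[o["user_id"]]["name"], o["total"]) for o in orders]
```

Batching the second lookup by the set of distinct user ids avoids one round trip per order, but the join logic, ordering and error handling now live in the application rather than in the database.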

~~~
h1karu
Like I said because the relational database doesn't scale horizontally you are
forced to build a sharding layer into your application which introduces
complexity into the application layer and limits your ability to use the
relational features of the relational database. At that point you might as
well just be using a key-value store that does the sharding for you and offers
greater flexibility.

~~~
baibaichen
There are still transactions within a single shard with SQL. I believe ACID
is the main benefit compared with NoSQL stores.

~~~
h1karu
My NoSQL store of choice gives me document-level ACID semantics along with
eventual consistency, MapReduce, and replication.

Here's an example that explains how to get transaction-like guarantees from
this kind of NoSQL data-store:

[http://guide.couchdb.org/editions/1/en/recipes.html](http://guide.couchdb.org/editions/1/en/recipes.html)
[https://en.wikipedia.org/wiki/BigCouch](https://en.wikipedia.org/wiki/BigCouch)

~~~
baibaichen
Interesting, but that isn't a transaction, it's just a workaround. And I don't
think you can design your whole app like that, treating documents as a
transaction log and then using views to generate the real information.

~~~
h1karu
That's exactly how I design the parts of my app that need a transactional
nature. Map/reduce views make it a breeze to query for a consistent aggregate
view of the world.

This is not something new; it's a well-established technique from the
relational world called "event sourcing":

[http://www.martinfowler.com/eaaDev/EventSourcing.html](http://www.martinfowler.com/eaaDev/EventSourcing.html)

When everything is a write you don't have to worry about conflicts or locks,
so that's nice; plus, CouchDB is really good at scaling write throughput with
small documents.
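A rough sketch of the event-sourcing idea in the spirit of the linked recipes, not CouchDB code: each transfer is an immutable event that is only ever appended, and balances are derived by a map step (each event emits an `(account, delta)` pair per side) followed by a reduce step (summing per account), much like a map/reduce view. The `Transfer` type and account names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One immutable event document: money moved from one account to another."""
    source: str
    target: str
    amount: int


def balances(events):
    """'Map' each event to two (account, delta) pairs, then 'reduce' by summing."""
    totals = {}
    for e in events:
        for account, delta in ((e.source, -e.amount), (e.target, e.amount)):
            totals[account] = totals.get(account, 0) + delta
    return totals


log = []  # append-only: events are added, never updated in place
log.append(Transfer("alice", "bob", 100))
log.append(Transfer("bob", "carol", 30))
```

Because no event is ever modified, two writers appending at the same time cannot conflict over a shared balance record; the consistency question moves to how and when the aggregate view is computed.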

------
dchuk
I'm far from an expert on this topic, but I am a developer, and I am currently
working on a project that leverages both MySQL and ElasticSearch for storage.
MySQL handles all of the "boring" data such as users, profiles, comments, etc.
ElasticSearch is basically a giant product database, no real relational data,
just a normalized structure and easy to query. ElasticSearch is serving as a
primary data store, there is no backing in a database because ES is just that
good.

Whenever I see these "SQL vs NoSQL" arguments, I always have to wonder: Why
one over the other? A lot of projects can benefit from both and there's no
reason you absolutely HAVE to use one or another. It's perfectly reasonable
(and probably ideal) to use more than one storage system in your projects.

If you have a bunch of nails to hammer in and bolts to tighten, you don't
choose just a hammer or a wrench to do that job...you grab both and use each
for what they do best.

~~~
jcroll
> ElasticSearch is serving as a primary data store, there is no backing in a
> database because ES is just that good.

And you've been running this application in production how long?

~~~
room271
I agree that using Elasticsearch as a primary data store seems risky. The
situation is improving, though; the v1 release introduces easy backups (for
example, to S3), which in my experience work very well.

Prior to this, backing up was a messy process.

------
Arkadir
NoSQL discussions always seem to conflate three very different things: storage
engines, APIs and architecture. Where do we store the data? How do we access
it? How do we make sure it scales?

The "traditional" approach is to use Oracle/SQL Server/MySQL for storage, SQL
and/or ORM as an API, and single-server tables-with-relationships as an
architecture. Back in the early 2000s, everybody did this. Sure, there were a
few performance-minded exceptions that went with sharding or master-slave
architectures instead, but those were exceptions.

And single-server architectures tend to behave badly at medium loads. Spend
the market rate for a genius DBA, and they still behave badly at high loads.
The next step is a 32-core 128GB RAM monstrosity that costs an order of
magnitude more than what eight 4-core 16GB servers would cost.

Most NoSQL solutions came with a new architecture. You had the MongoDB flavor
of distributed storage, or the BigTable flavor of distributed storage, or the
CouchDB flavor of distributed storage, and so on. Properly implemented
distributed storage eats high loads for breakfast: just add more servers. This
is a good thing.

My issue with the NoSQL movement is that they threw the baby out with the
bathwater. They threw away the single-server relational architecture, which
was a nice change, and they also gave up the old battle-hardened storage
engines and the highly expressive SQL language and replaced them with only-
recently-experimental engines and ad hoc lean APIs.

It takes time for a storage engine to mature. To have all its performance
kinks ironed out and all its bugs smoked out. I still remember the brouhaha
around MongoDB persistence guarantees, or the critical data loss bugs in
CouchDB.

And the lean APIs just forced back all the querying logic into the
application, with all the filtering and the manual indexing and the joins and
the approximate but ultimately incorrect implementations of whatever subset of
ACID was required at the time. This wasn't an entirely bad thing: it certainly
made many developers aware of the performance implications of some joins or
transactions. But when you need to write a JOIN or GROUP BY or BEGIN
TRANSACTION that you know will scale properly, and there's no API support for
it? Feh.

I'm a huge fan of the CouchDB architecture. Distributed change streams, with
checkpointed views and cached reductions. But I have been burned by the
CouchDB storage engine (can you say "data corÊ–NÑ %ñXtion" ?) and I see no
point in bending knee to the laconic CouchDB API. So I took the CouchDB
architecture and reimplemented it with a PostgreSQL back-end. It's _faster_
(don't underestimate the cost of those HTTP requests), I have trust that after
PostgreSQL's decade-long history all threats to my data are long gone, and I
can always whip out an SQL query when I do need it.
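A minimal sketch of what "change streams with checkpointed views and cached reductions" can look like on a relational back end. This is not the commenter's implementation: the table names are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server. Writes are appended to a sequence-numbered change log; a view stores a cached total per key plus the last sequence number it has processed, so each refresh only folds in the new changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Every write is appended here; seq plays the role of an update sequence.
    CREATE TABLE changes (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                          doc_key TEXT NOT NULL, amount INTEGER NOT NULL);
    -- Materialized view: one cached reduction (a running total) per key.
    CREATE TABLE view_totals (doc_key TEXT PRIMARY KEY, total INTEGER NOT NULL);
    -- Single-row checkpoint: the last change already folded into the view.
    CREATE TABLE view_checkpoint (id INTEGER PRIMARY KEY CHECK (id = 1),
                                  last_seq INTEGER NOT NULL);
    INSERT INTO view_checkpoint (id, last_seq) VALUES (1, 0);
""")


def record(key, amount):
    """Append one change to the log."""
    with conn:
        conn.execute("INSERT INTO changes (doc_key, amount) VALUES (?, ?)", (key, amount))


def refresh_view():
    """Fold only the changes since the checkpoint into the cached totals."""
    with conn:  # the view updates and the checkpoint move commit together
        (last,) = conn.execute("SELECT last_seq FROM view_checkpoint").fetchone()
        new = conn.execute(
            "SELECT seq, doc_key, amount FROM changes WHERE seq > ? ORDER BY seq",
            (last,),
        ).fetchall()
        for seq, key, amount in new:
            cur = conn.execute(
                "UPDATE view_totals SET total = total + ? WHERE doc_key = ?", (amount, key)
            )
            if cur.rowcount == 0:  # first change seen for this key
                conn.execute(
                    "INSERT INTO view_totals (doc_key, total) VALUES (?, ?)", (key, amount)
                )
            last = seq
        conn.execute("UPDATE view_checkpoint SET last_seq = ?", (last,))
    return len(new)
```

For example, after `record("a", 5)`, `record("b", 2)` and a refresh, a further `record("a", 3)` makes the next refresh process just one change, and an immediate second refresh processes none. And since it is all ordinary tables, a plain SQL query over `changes` is still available whenever it is needed.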

It's nice to see so many NoSQL solutions migrating back to an SQL-like API and
gaining enough maturity to keep your data safe. In the near future, I expect
them to be nothing more than "Architecture in a box" solutions for when you
don't want to implement specific architectures in SQL. And I expect more and
more "architecture plugins" to become available: with a library, turn a herd
of SQL databases into a distributed architecture of type X.

~~~
angrybits
Single-server relational backends can survive loads well in excess of what
most of us will ever see for our software. There are a handful of truly
web-scale companies that need specialized engineering, and the time to solve
for that is when you actually have that problem in sight. Conforming SQL
engines can do some amazing things with simple, declarative code; giving that
up in the hope that you might be Twitter-scale one day is, in my opinion, a
quite poor tradeoff.

~~~
Arkadir
I certainly agree that most people who pick NoSQL solutions "for scalability"
never add a third server to their cluster.

------
zaidf
How do people that abstract away their database via an ORM feel comfortable
about not dealing directly with the data and accidentally dropping a db column
via ORM? I know when I am dealing with the database I am a lot more surgical
in my approach than when I am writing code.

~~~
StevePerkins
You generally only let the ORM create/update your schema during early
development. By the time you get to production (or heck, by the time your team
is even seriously engaged), you will have disabled that feature. In pretty
much every enterprise-grade ORM I've ever seen, schema modification is
disabled by default, and must be explicitly turned on. In most Java shops
where I've ever worked... unless it's a quick prototype, the database schema
will be developed before you start coding anyway. Also, in a typical
enterprise scenario, the username with which you configure the ORM lacks
admin-level privileges to alter the database.

Most of the criticisms of ORMs come from people who have never used them,
beyond maybe working through a Rails chat-room tutorial once upon a time. It
really has nothing to do with "abstracting away the database".

As the name indicates, Object-Relational Mapping is merely about reducing the
boilerplate required to map a relational schema to programming language
objects. If you do that mapping by hand, then you have to make decisions when
a table/object has relationships. Picture a CUSTOMER table, which has a
foreign key relationship to an ADDRESS table. When your application loads a
"Customer" object:

[1] You could "eager fetch", meaning that you go ahead and retrieve all of the
ADDRESS rows related to that CUSTOMER, and attach the Address objects to the
Customer object. Eager fetching is wasteful and leads to poor performance,
because you're hammering your database for values that you often don't ever
use.

[2] You could "lazy load"... meaning that your Customer object has an
"addresses" field, but you wait until some code tries to use that field before
you actually query the ADDRESS table to populate it. This is much better
design, but complicates things. The lazy load logic has to go somewhere. You
either have to ensure that every piece of code using that object is aware of
the lazy load pattern, or you have to stuff database logic into the "getter"
method for each lazy-loaded field.

ORMs give you high-performance lazy loading without the buggy boilerplate
suggested by #2 above. Moreover, enterprise-class ORMs typically handle
caching for you to avoid hitting the database unnecessarily, and monitor the
state of objects to notice when they've gone stale due to changes on the
database side, etc.

Lastly, for complex queries, most ORMs have a query language (e.g. JPQL, HQL,
etc.) that is nothing more than a VERY THIN wrapper around SQL. It merely
smooths out differences between various vendor dialects. You're not abstracted
away from the database; you still very much need to understand SQL and the
underlying structure of your data.

~~~
zaidf
_you still very much need to understand SQL and the underlying structure of
your data_

That's good to hear! Though in my experience of interacting with people who
have learned development in the last 3-5 years, many of them are very
oblivious to basic database concepts. They mostly think in terms of the object
in the code, relying on the framework to take care of the database. It might
even generate the SQL that they just need to execute. But anyone who's worked
with production databases from before the advent of frameworks like Rails etc.
knows that the only thing worse than messing up an SQL statement is running
auto-generated SQL.

------
CraigJPerry
>> NoSQL is nothing more than a storm in a teacup

There's been a difference with this "technology cycle", though: there have been
some prominent, well-grounded voices from the start of this cycle.

That's a good thing.

That wasn't the case for other cycles - the thin vs fat client cycle, the DAS
vs NAS cycle etc. etc.

EDIT: I'm not saying NoSQL has no application. I earn a paycheck working with
a huge graph db (and it's nothing to do with "social", yay!), and have
previously been a heavy user of Cassandra.

------
mathnode
> NoACID

Perfect!

------
phpnode
NoSQL has always been a stupid term, especially as there are "NoSQL"
datastores that support SQL, e.g. OrientDB.

~~~
crusso
The term was perfect. The spectrum of NoSQL implementations out there defies
an easy name to cover them all. They aren't all key-value stores, they aren't
all json stores, they aren't all big tables, etc.

Indicating that they were NOT within the current paradigm of SQL databases was
the only way to go.

~~~
phpnode
Why is it a good idea to lump such technologies together, when many of them
are fundamentally different? Do we call road vehicles that aren't cars NoCars?

~~~
falcolas
Ironically, we kind of do, at least in the US. We call them trucks. SUVs are a
bit of an anomaly, but I've heard plenty of people colloquially call our
Explorer a truck.

~~~
twic
You call motorbikes, buses, and bicycles trucks?

------
tempodox
Finally, a good writeup that explains the strange phenomenon of NoSQL. Now all
that's left is to introduce a tax for “drinking the Kool-Aid”.

------
frik
We are stuck with NoSQL in HTML5, because Mozilla and Microsoft refuse to
implement WebSQL
([http://en.wikipedia.org/wiki/WebSQL](http://en.wikipedia.org/wiki/WebSQL) )

IndexedDB is fine for storing JSON objects, etc., but a relational database
with SQL query syntax, indexes, etc. is more powerful and means less code to
write. With IndexedDB one has to reinvent the wheel just to get basic query
features.

WebSQL is not deprecated; the W3C Working Group Note actually says:

      'This specification is no longer in active maintenance
      and the Web Applications Working Group does not intend to
      maintain it further'.

WebSQL is only available in WebKit-based browsers (Safari, Chrome), which
means most mobile browsers. As SQLite is in the public domain, no company would
"lose face" if they chose to use it. They could fork SQLite and change the SQL
query syntax (parser) to whatever the W3C finds suitable.
[https://www.sqlite.org](https://www.sqlite.org)

Mozilla Firefox and FirefoxOS have both shipped SQLite for years, and it can
be accessed through their internal JavaScript API. And several Microsoft
products already use it anyway (e.g. Forza Xbox games). Microsoft of course
also has various other SQL database libraries like MS Access JetRed, MS
Outlook JetBlue and SQL Express.

We had a discussion about it recently:
[https://news.ycombinator.com/item?id=7574754](https://news.ycombinator.com/item?id=7574754)

The new hip thing is "NewSQL"
([http://en.wikipedia.org/wiki/NewSQL](http://en.wikipedia.org/wiki/NewSQL) ).
For example, Facebook, Google Ads, etc. are powered by MySQL's InnoDB database
engine. I would go as far as to count SQLite in this group.

We would need a movement to convince Mozilla to finally add WebSQL to Firefox
and FirefoxOS.

~~~
davidjohnstone
There are good reasons why work has stopped on the WebSQL specification —
everybody was using SQLite, and the specification can't be tied so closely to
a single implementation of SQL. The specification even has the line "User
agents must implement the SQL dialect supported by Sqlite 3.6.19"[1].

1\. [http://www.w3.org/TR/webdatabase/#web-sql](http://www.w3.org/TR/webdatabase/#web-sql)

Edit: here is Mozilla's rationale for not supporting WebSQL:
[https://hacks.mozilla.org/2010/06/beyond-html5-database-apis...](https://hacks.mozilla.org/2010/06/beyond-html5-database-apis-and-the-road-to-indexeddb/)

~~~
frik
SQLite implements the _SQL-92 standard_
([http://en.wikipedia.org/wiki/SQL-92](http://en.wikipedia.org/wiki/SQL-92) ,
[http://en.wikipedia.org/wiki/SQLite#Features](http://en.wikipedia.org/wiki/SQLite#Features)
).

Firefox certainly ships with SQLite, and even its IndexedDB implementation is
built on top of it internally (the irony):
[https://plus.google.com/+KevinDangoor/posts/PHqKjkcNbLU](https://plus.google.com/+KevinDangoor/posts/PHqKjkcNbLU)

Microsoft has MS Access JetRed, MS Outlook JetBlue and SQL Express SQL
databases that already ship with Windows and/or Office and are SQL-92
compatible. It should be trivial for them to figure out which SQL DB engine
would fit best and integrate it in IE 12.

~~~
davidjohnstone
First sentence of
[http://en.wikipedia.org/wiki/SQLite#Features](http://en.wikipedia.org/wiki/SQLite#Features)
: "SQLite implements most of the SQL-92 standard for SQL but it lacks some
features."

------
guard-of-terra
If you're doing accounting, sure, you would want an ACID-compliant database.
There you have a limited number of kinds of data to store, with strict and
rarely changing constraints. You can keep it on one big expensive server (plus
backup), and it's better to bring the system down than to allow inconsistency.
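A small sketch of the kind of guarantee meant here, using Python's built-in `sqlite3` as a stand-in for any ACID database; the table and account names are hypothetical. A `CHECK` constraint encodes the invariant "no negative balances", and the transfer runs in one transaction, so if the debit violates the constraint, the credit that already ran is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        name TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- invariant enforced by the DB
    );
    INSERT INTO account VALUES ('cash', 100), ('bank', 0);
""")


def transfer(src, dst, amount):
    """Move `amount` atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit visibly has to undo it.
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to `dst` was rolled back too


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))
```

After a successful transfer of 70 the accounts hold 30 and 70; a following transfer of 50 fails on the constraint and leaves both balances exactly as they were, rather than crediting money that was never debited.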

However, for most web development SQL is seriously not good. You end up with
hundreds of loosely coupled tables which constantly change their structure for
new features. Half a dozen joins on every request. Hundreds of lines of SQL.
Constant pain. And it's not like you care that much about consistency - if
once a year three comments disappear from your web site, so what?

And it's painful to make SQL multi-master.

For most web development, document databases are so much better. MongoDB is
pretty nice because it makes hundreds of lines of SQL spread across ten files
of code redundant - all replaced by one complex document.

