
The 5 Stages of NoSQL - soofaloofa
https://sookocheff.com/post/opinion/the-five-stages-of-nosql/
======
paulddraper
Reminds me of [http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...](http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/)

> Remember that TV show application? It was the perfect use case for MongoDB.
> Each show was one document, perfectly self-contained. No references to
> anything, no duplication, and no way for the data to become inconsistent.

> One Monday, at the weekly planning meeting, the client told us about a new
> feature that one of their investors wanted: when they were looking at the
> actors in an episode of a show, they wanted to be able to click on an
> actor’s name and see that person’s entire television career.

> The client expected this feature to be trivial. If the data had been in a
> relational store, it would have been.

> I learned something from that experience: MongoDB’s ideal use case is even
> narrower than our television data. The only thing it’s good at is storing
> arbitrary pieces of JSON. “Arbitrary,” in this context, means that you don’t
> care at all what’s inside that JSON. You don’t even look.

> When you’re picking a data store, the most important thing to understand is
> where in your data — and where in its connections — the business value lies.
> If you don’t know yet, which is perfectly reasonable, then choose something
> that won’t paint you into a corner. Pushing arbitrary JSON into your
> database sounds flexible, but _true flexibility_ is easily adding the
> features your business needs.
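
To make the "it would have been trivial" point concrete, here is a minimal sketch of the actor-career lookup in a relational store. It uses Python's built-in sqlite3 module, and the table names (`shows`, `actors`, `appearances`), the `career` helper and the sample rows are all hypothetical, not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table shows (id integer primary key, title text not null);
    create table actors (id integer primary key, name text not null);
    -- many-to-many link: which actor appeared in which show
    create table appearances (
        show_id integer not null references shows(id),
        actor_id integer not null references actors(id),
        primary key (show_id, actor_id)
    );
    -- invented sample data, for illustration only
    insert into shows values (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    insert into actors values (1, 'Alice'), (2, 'Bob');
    insert into appearances values (1, 1), (2, 1), (3, 2);
""")

def career(actor_name):
    """Return every show title the named actor appeared in, sorted by title."""
    rows = conn.execute(
        """
        select s.title
        from shows s
        join appearances ap on ap.show_id = s.id
        join actors a on a.id = ap.actor_id
        where a.name = ?
        order by s.title
        """,
        (actor_name,),
    ).fetchall()
    return [title for (title,) in rows]
```

With one self-contained document per show, the same lookup would mean scanning every show document for the actor's name, or duplicating actor data into each show.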

------
gr3yh47
Why not postgres?

with the JSON(B) datatype, you can start out with full documents and pull out
relational fields/create tables as it makes sense
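
A rough sketch of that workflow, assuming SQLite through Python's sqlite3 module as a stand-in so it runs without a server (in Postgres the `doc` column would be `jsonb` and a field could be read with `doc->>'email'`). The `users` table, its fields and the `find_by_email` helper are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage 1: store whole documents, no schema beyond "one JSON blob per row".
conn.execute("create table users (id integer primary key, doc text not null)")
docs = [
    {"name": "Ann", "email": "ann@example.com", "prefs": {"theme": "dark"}},
    {"name": "Raj", "email": "raj@example.com"},
]
conn.executemany(
    "insert into users (doc) values (?)",
    [(json.dumps(d),) for d in docs],
)

# Stage 2: once email turns out to matter, promote it to a real column
# and backfill it from the stored documents.
conn.execute("alter table users add column email text")
for row_id, doc in conn.execute("select id, doc from users").fetchall():
    conn.execute(
        "update users set email = ? where id = ?",
        (json.loads(doc).get("email"), row_id),
    )
# The promoted column can now carry an index and a uniqueness rule.
conn.execute("create unique index users_email on users (email)")

def find_by_email(email):
    """Look up a user document through the promoted, indexed column."""
    row = conn.execute(
        "select doc from users where email = ?", (email,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The rest of each document stays as it was, so only the fields that earn a column get one.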

~~~
ngrilly
Exactly my advice to everyone: If you're in doubt, choose PostgreSQL. It's a
strong SQL database. Yet, you can store, index and search documents in a jsonb
column. You have built-in full-text search. You have PostGIS. And you can
scale and shard using Citus.

------
michaelchisari
I've seen this cycle play out in realtime.

I'm thankful for the flexibility that NoSQL added to the conversation, though.
But I'm confident in sticking with SQL for primary purposes, and only adding
in NoSQL solutions when it really fits the requirements perfectly (and I can't
see those requirements radically changing anytime soon).

~~~
mikestew
I do wonder if the inappropriate use of NoSQL is due to an ignorance of why we
used SQL in the first place (and the article doesn't directly say this, but in
a roundabout way does). In which case, perhaps the brief flirting with NoSQL
at many companies served as a good reminder of "oh, yeah, queries and
transactions are kind of important; I forgot about that because I began to
take SQL for granted."

IOW, you didn't know what you had until you went and used something else. And
maybe you won't be so quick to jump on the next hyped bandwagon.

~~~
WorldMaker
I think it has more to do with the fact that it's a giant spectrum and we've
never quite managed to build something in a strong enough sweet spot, so at
least for now you are going to continue to see the pendulum swing between
NoSQL and SQL (or more usefully, Document and Relational) databases. What's
worse is that there may be no mainstream-acceptable sweet spot: every
developer/team has their own particular preference, and we might never see the
pendulum stop.

The spectrum seems to be ad hoc data on the one hand versus ad hoc queries on
the other. Document databases prioritize being able to store just about any ad
hoc document you might need at the expense of being able to quickly pull off
new ad hoc queries that you didn't originally plan for. Relational databases
prioritize being able to query very complex relationships in just about any
way you can imagine at the expense of making it hard to add new shapes of
data.

There's no right answer, just the answer that works best for you at the
moment. The pendulum will probably keep swinging for a while, and we'll see
more hybrids, like Postgres and SQL Server embracing JSON more heavily,
because the right answer is probably closer to the middle ground or at least
"Why not both?".

~~~
michaelchisari
It's a question of up-front or down-the-line. Do you design your database
schema, think out your queries and plan up-front? Or do you do ad-hoc, quick-
turn-around development and deal with the technical debt down-the-line?

From the beginning, I saw NoSQL as being great for prototyping or MVP. But the
reality of our industry is that you very rarely get to bang something out
overnight, then rewrite it the "right way". Your prototype or MVP more often
than not becomes the foundation of your stable, mature releases.

~~~
WorldMaker
It's not really a question of up-front or down-the-line, even.

Some projects may never need ad hoc queries because by the time they hit scale
enough that they need such reporting they might go directly to BI tools
(Cubes, Elastic Search, et al) or "Big Data" tools (Hadoop, Big Query, et al)
without needing an intermediate SQL step.

The same is true with transactions and consistency models. Some projects may
never have a need for anything stronger than eventual consistency and may
never need more complicated transactions beyond the scope of an individual
JSON document. Even then, there are chunky hybrid solutions with good options
for choosing a different ACID plan or locking mechanics or transaction
capabilities, and just because most "NoSQL" databases are in the corner of
eventually consistent and transactions only at the individual document level
does not mean that _all_ are (there are definitely exceptions out there).

Again, there will always be projects and business domains on both sides of
this "data spectrum". There's no wrong answer, just what's right for your
specific project or business domain.

------
pmontra
The premises of the first step are often true, but strictly speaking they are
false.

Some NoSQL dbs have schema. Cassandra does
[https://docs.datastax.com/en/cql/3.1/cql/cql_reference/creat...](https://docs.datastax.com/en/cql/3.1/cql/cql_reference/create_table_r.html)
and I think HBase does too.

FoundationDb was (is?) ACID.

NoSQL literally means that it doesn't have SQL.

~~~
paulddraper
Literally speaking, "NoSQL" means as much for databases as "NoHTTP" means for
application protocols. It says literally nothing other than that it's not
something else.

It could be key-value database or it could be a graph database. It could have
a schema or it could be schemaless. It could support one index or it could
support multiple. It could have a permissions system or it could be single-
user. It could have a server process or it could be embedded.

The author is thinking of the MongoDB and CouchDB variety.

------
pmontra
Doesn't work for me. Google cache at
[http://webcache.googleusercontent.com/search?q=cache:B3U3vwv...](http://webcache.googleusercontent.com/search?q=cache:B3U3vwvsczQJ:sookocheff.com/post/opinion/the-five-stages-of-nosql/+&cd=1&hl=en&ct=clnk&gl=en)

------
SFJulie
Rule #1 of NoSQL, if you use NoSQL, then your data have no business value,
else they would be used for billing hence you would need transactions.

Rule #2: if you have no business value you can be fired/outsourced.

The 5 rules of grief then apply to your job. Amen.

------
combatentropy
I have a couple of points, unrelated.

(1) Tabular data comes to people naturally. It's not an obscure, impractical
theory foisted on us by E. F. Codd. Peer over the shoulder of your non-
programmer coworkers: the front-office secretary, the project manager, the
company president. You will find that each of them loves Excel. They're using
it for address books, project statuses, meeting minutes, and a million other
things. Some might say they're abusing it, because Excel is meant for math,
and they're using it just because it makes it easy to set things into tables.

(2) Organizing a complex application into tabular data is hard. Look again at
your secretary's Excel spreadsheet of contacts, and you will see that it fails
the first level of normalization: contact names and company addresses are
repeated across rows. Changes are delicate, and the spreadsheet likely has bad
data.

An application of even medium complexity, if properly normalized, will need
several tables, including one-to-many and many-to-many tables. For each
column, you must choose among a cornucopia of data types: boolean, integer,
real, numeric, date, time, timestamp, enum, text, array, json, and several
others. Furthermore, have you chosen the right constraints: primary key,
foreign key, unique, check constraint, rules, trigger functions? Most pages of
the app will involve joining several tables, and often there is a way to
arrange it that makes it run 100 times faster. But then again, such is the
case in any programming language.

So I do not blame programmers who want to reach for the NoSQL database. It is
a thousand times easier to just:

    
    
       save(json_encode($_POST));
    

or whatever. I've been programming with SQL all day long for over 10 years,
and it's still hard. I'm still finding better ways to write the SQL or use
features of my database (PostgreSQL) that I wasn't using, at least not as much
as I could.

The biggest encouragement I can give to someone who doesn't know anything
beyond

    
    
      select * from table
    

is that writing it in SQL is almost always shorter than in your procedural
language of choice. How would you enforce all these rules in procedural:

    
    
       create table sales (
          id serial primary key,
          sold timestamp,
          emp int references employees,
          cust int not null references customers on update cascade,
          item int not null references items on update cascade,
          qty int not null check (qty > 0),
          price numeric (12, 2) not null check (price > 0)
       )
    

or formulate this report:

    
    
       select emp
       from sales
       where age(sold) < '3 years'
       group by emp
       having sum(price) > 1000000
       order by 1
    

(DISCLAIMER: The code in this post is not guaranteed to compile.)
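
For anyone who wants to run something close to the above, here is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The adaptations are mine: `serial` becomes `integer primary key`, `age(sold)` becomes a `datetime('now', '-3 years')` comparison, the `cust` and `item` columns are left out to keep it short, and the sample rows and helper names (`top_sellers`, `try_insert`) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite skips FK checks unless asked
conn.executescript("""
    create table employees (id integer primary key);
    create table sales (
        id integer primary key,
        sold text not null,
        emp integer references employees,
        qty integer not null check (qty > 0),
        price numeric not null check (price > 0)
    );
    -- invented sample data, dated relative to "now"
    insert into employees values (1), (2);
    insert into sales (sold, emp, qty, price) values
        (datetime('now', '-1 years'), 1, 1, 600000),
        (datetime('now', '-2 years'), 1, 1, 500000),
        (datetime('now', '-5 years'), 2, 1, 2000000);
""")

def top_sellers(threshold=1000000):
    """Employees whose sales over the last 3 years sum to more than threshold."""
    rows = conn.execute(
        """
        select emp
        from sales
        where sold > datetime('now', '-3 years')
        group by emp
        having sum(price) > ?
        order by emp
        """,
        (threshold,),
    ).fetchall()
    return [emp for (emp,) in rows]

def try_insert(emp, qty, price):
    """Attempt a sale; return False if the database rejects the row."""
    try:
        conn.execute(
            "insert into sales (sold, emp, qty, price) "
            "values (datetime('now'), ?, ?, ?)",
            (emp, qty, price),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The rules live in the table definition, so a zero quantity or an unknown employee is rejected by the database itself, with no validation code in the application.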

~~~
paulddraper
Scala

    
    
        sales
          .filter(DateTime.now - 3.years < _.sold)
          .groupBy(_.emp)
          .collect {
            case (emp, r) if r.map(_.price).sum > 1000000 => emp
          }
          .toList
          .sorted
    

But yeah, you're right.

------
chrisdima
Polyglot is here. It creates opportunities for a lot of folks. And it creates
a lot of headaches for others. The open source NoSQL analytics project,
SlamData, will reduce a lot of the friction described in these comments. I
work on the project.

Three things to know:

1. SlamData is a souped-up SQL + data viz platform.

2. SlamData sends analytics to the data (computation happens in-database).

3. SlamData connects to MongoDB, Couchbase, MarkLogic, Spark and relational stores.

4. You can do JOINs across data stores.

Four things, sorry.

If you want to know what's going on across a polyglot business or
application... it's exponentially easier now. So go ahead and use the new
technology -- the overhead isn't nearly as much as it was. I think this
materially changes the conversation.

