
Rethinking the limits on relational databases - ninthfrank07
http://www.craigkerstiens.com/2014/01/24/rethinking-limits-on-relational/
======
sehrope
I may be in the minority, but I rather like the rigidity of a fixed schema.
Yes, it's a bit more "painful" for initial setup ( _you have to actually
create the tables_ ) and yes, it's a bit more work during migrations ( _you
have to actually add the columns_ ), but I don't see either of those as
getting in the way enough to give it up. It's just too useful.

Data structures are meant to last much longer than application code. Anyone
who has worked on a long-running system can attest to that; well-defined data
structures and table layouts will outlive any application code.

When I'm designing a system, I think I spend orders of magnitude more time
thinking about data structures than actually implementing the CREATE/ALTER
TABLE code for them. Planned properly, you can even run the ALTERs/CREATEs
necessary to add columns in advance of any actual app usage ( _i.e. "two
stage" app deployment_ ).

There is a place for "flex fields" or storing generic "documents" but legit
use cases are pretty rare. When they are necessary, a single JSON column is
usually enough. The example I generally use is an audit trail: The who ( _FK
to user_ ), what ( _event enum_ ), and when ( _timestamp_ ) are all strongly
typed but you may want a JSON field for event specific data.
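A minimal sketch of that audit-trail shape, using Python's sqlite3 standard
library (the table and column names here are illustrative, not from the
comment):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES users(id),  -- who (FK to user)
        event      TEXT    NOT NULL,                       -- what (enum-like)
        created_at TEXT    NOT NULL,                       -- when (timestamp)
        detail     TEXT                                    -- event-specific JSON
    )
""")
# The strongly typed columns are fixed; only the event-specific
# payload goes into the generic JSON column.
conn.execute(
    "INSERT INTO audit_log (user_id, event, created_at, detail) VALUES (?, ?, ?, ?)",
    (42, "order_shipped", "2014-01-24T12:00:00Z",
     json.dumps({"order_id": 7, "carrier": "UPS"})),
)
row = conn.execute("SELECT event, detail FROM audit_log").fetchone()
```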

Oh, and if anybody has ever tried to do a data migration with a schema-less
database ... well, have fun with that. Either you bite the bullet and convert
everything, or you end up with a lot of if/then/else logic littered through
your app that will bite you down the road.

~~~
prodigal_erik
I'd put it more strongly. If your apps are anything more than dumb opaque
storage, if they ever process the data they write, _they have a schema_. There
are certain fields and values your code is relying on to work as intended.
"Schemaless" merely means your schema is not written down anywhere, and you
might not have any tests that could detect whether any version of your apps
ever wrote data that violates it. I say "apps", plural, because in my
experience if the data store has been there for more than about a year, anyone
who thinks there's just one piece of code that uses it is in for a very
unpleasant surprise.

On data migration, I don't think the if/then/else version is even feasible. If
there have been _n_ versions of your apps, there are _2^n_ possible states any
particular record could be in depending on which versions of your apps did and
did not update it. I've seen Notes documents after a few years of this that
are in such weird states that not even the dev team could say just what the
hell happened to that doc or what correct (well, least bad) behavior of the
apps would be, much less what the current versions of the apps would probably
do. You can kind of get partway there with apps that have existed and been
actively maintained for as long as any of your data has been there, but trying
to write anything like new analytics over old non-migrated data is hopeless.
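The mess described above can be illustrated with a hypothetical sketch: each
record's shape depends on which app versions touched it, and every reader ends
up re-implementing the implicit schema as branching logic (the field names and
versions here are invented for illustration):

```python
# Hypothetical "schemaless" records written by three successive app versions.
docs = [
    {"name": "Ada Lovelace"},                      # v1: single name field
    {"first": "Grace", "last": "Hopper"},          # v2: split name fields
    {"first": "Alan", "last": "Turing", "v": 3},   # v3: added version metadata
]

def full_name(doc):
    # The implicit schema, spelled out as if/then/else in every reader.
    if "name" in doc:
        return doc["name"]
    if "first" in doc and "last" in doc:
        return f"{doc['first']} {doc['last']}"
    raise ValueError(f"unrecognized document shape: {doc!r}")

names = [full_name(d) for d in docs]
```

With _n_ versions the branches multiply, and any shape the function doesn't
anticipate is a runtime error rather than a migration caught up front.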

------
themckman
Currently using MongoDB on a project and boy do I just hate it. We have highly
relational data that is a nightmare to reason with and query. Guess what? We
also have migrations! It's hard to explain why, but I just get a bad feeling
in my tummy every time I have to interact with the database. Just not
enjoyable to work with at all. The only thing I've seen it do that actually
impresses me is the aggregation framework/pipeline. That's actually pretty
neat, and I feel like it's a good fit for the data we're querying with it
(analytics data). Other than that, man, I wish I was using PostgreSQL...

~~~
tjr
_We have highly relational data_

How did your project end up selecting MongoDB? That seems like exactly the
sort of thing that it's not good for.

(I'm not just being rhetorical; were there other attributes of MongoDB that
made it seem preferable to a relational database earlier in the project?)

~~~
woah
When you really start looking into it, what projects actually _are_ good for
Mongo?

EDIT: It's nice and quick for prototypes. Ironically, I'm building one with it
now.

~~~
boomzilla
After a few months dealing with a system that's built on Mongo as the data
store, I've come to the conclusion that MongoDB's only use cases are:

1\. A persistent cache, or a compute-ahead read-only data store.

2\. A write-only store for high-volume but non-critical data, such as
comments, log events, etc.

MongoDB should never be the primary source of truth data store. Nothing can
replace a relational DB for that purpose, yet.

~~~
vinceguidry
> 1\. A persistent cache, or a compute-ahead read-only data store.

Even then, you're probably better off with Redis or memcached.

> 2\. A write-only store for high volume but non-critical data, such as
> comments, log events, etc.

Just write to disk. It's very hard to beat flat file text for this use case
and most of the time it's not worth it.

------
taspeotis
> There’s no reason higher level frameworks or ORMs couldn't handle the
> migration process.

> ...this is a manual, painful process today, but there's no reason this can't
> be fully handled by PostgreSQL or directly within an ORM.

You mean like EF already does [1]?

> Add-Migration will scaffold the next migration based on changes you have
> made to your model since the last migration was created

For what it's worth, I don't use that feature. I prefer to use SQL Server Data
Tools [2] to maintain a model of the database and use its schema and data diff
tools to generate upgrade scripts. This is partly because the database
pre-dates EF migrations, but also because the schema is fairly complex, so
having SSDT (with its knowledge of nearly all SQL Server object types) do
diffs against the actual database model is better than EF diffing its own
abstract model.

[1] [http://msdn.microsoft.com/en-us/data/jj591621.aspx](http://msdn.microsoft.com/en-us/data/jj591621.aspx)

[2] [http://msdn.microsoft.com/en-us/library/hh272686(v=vs.103).aspx](http://msdn.microsoft.com/en-us/library/hh272686\(v=vs.103\).aspx)

~~~
samstokes
The Haskell database-mapping library [1] "persistent" does auto-migrations as
well. You declare your data types in a DSL reminiscent of SQL CREATE TABLE
statements. If you add a new data type, it creates the table; if you add a
field, it adds the column; if you make a field optional, it changes the column
to allow nulls.

It will only do that for non-destructive changes, so if you remove a column
(which would lose data), or make an optional field non-optional (which is ill-
defined if you had NULLs), it bails out with a message suggesting the SQL you
should run to migrate manually.
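The non-destructive half of that behavior is easy to sketch in Python with
sqlite3 (a toy illustration of the general pattern, not persistent's actual
algorithm): diff the declared columns against the live table, ALTER for
additions, and bail out on anything destructive.

```python
import sqlite3

def migrate(conn, table, declared):
    """Add missing columns; refuse destructive changes (dropped columns)."""
    live = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    dropped = live - set(declared)
    if dropped:
        # Destructive change: don't guess, tell the user to migrate manually.
        raise RuntimeError(
            f"refusing destructive migration of {table}; "
            f"columns would be dropped: {', '.join(sorted(dropped))}")
    for col, typ in declared.items():
        if col not in live:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col} {typ}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
# Declaring a new "email" field adds the column automatically.
migrate(conn, "person", {"id": "INTEGER", "name": "TEXT", "email": "TEXT"})
cols = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
```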

[1] I can't really call it an ORM, since it maps to Haskell record types
rather than _O_bjects, and it works with NoSQL databases as well as
_R_elational databases.

------
nemothekid
I find the whole "schema-less" thing to be a side-effect of NoSQL marketed as
a feature. Unless you are prototyping a small feature quickly (which MongoDB
is great for), on the large scale of things, 5-6 months out in development
time, you will find that the time saved on "not having to do migrations" may
very well be nil. In most cases, if you want to keep your hair, you end up
having a schema anyway. At the very best, MongoDB and friends make it easy to
add columns, but removing and changing datatypes might as well be migrations.
Also, as the article mentions, Postgres now has a JSON data type, making the
schema-less advantage kind of useless.

When it comes to limits of relational databases some of the real limits are
actually sharding, replication and high availability, which are all relatively
more difficult to do on the popular YesSQL databases.

In any case, it's pretty tiring to see NoSQL only refer to MongoDB and other
document stores. Redis, Cassandra, HBase and Hive are all NoSQL engines ranked
above the next document store, and IMO have much harsher performance penalties
for incorrect usage than Postgres & friends. Given that Cassandra is the
second-highest-ranked NoSQL store and also (sort of) enforces a schema,
implying that the limits on relational databases are schemas is pretty
bizarre.

[http://db-engines.com/en/ranking](http://db-engines.com/en/ranking)

------
roycehaynes
Craig - great post. I think you missed a few huge benefits that drive the
"NoSQL movement." Specifically, sharding and aggregation.

Most devs don't need the power of sharding, which is why that benefit can
never be felt. But the reality is this is probably the #1 characteristic
(huge benefit) of NoSQL databases. Google and Amazon definitely paved the way
for this movement, primarily b/c they dealt with tons of data. It's simply
cheaper to scale out (distributed) than scale up.

You can't aggregate data with a relational database. But you can aggregate
with (most) NoSQL databases (the exception is graph dbs). Instead of building
relationships, with NoSQL you're building composites. The huge benefit here is
enabling sharding while having your data all in one place.

Lastly, a relational db is made up of tuples and sets of tuples. With a NoSQL
DB you can have complex data structures. I think this is the point you were
trying to make re: Documents.
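The tuples-vs-composites distinction can be shown with a hypothetical order
(the field names are invented): relational storage splits the aggregate into
flat rows related by keys, while a document store keeps one self-contained
composite, which is what makes sharding by document straightforward.

```python
# Relational view: flat tuples, related by a key, spread across tables.
orders      = [(1, "alice")]                        # (order_id, customer)
order_items = [(1, "widget", 2), (1, "gadget", 1)]  # (order_id, sku, qty)

# Document view: one composite holding the whole aggregate in one place.
order_doc = {
    "order_id": 1,
    "customer": "alice",
    "items": [{"sku": "widget", "qty": 2}, {"sku": "gadget", "qty": 1}],
}

# The composite is equivalent to joining the tuples back together.
rebuilt = {
    "order_id": orders[0][0],
    "customer": orders[0][1],
    "items": [{"sku": sku, "qty": qty}
              for oid, sku, qty in order_items if oid == orders[0][0]],
}
```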

I still love relational databases. It's cool to have options though. Before,
relational was the only way.

The question is how do we determine which db to use (or use more than one)?
Ah, the beauty of polyglot persistence...

------
george12
You would not want a framework/ORM that abstracts ALTER statements from you
when working with large tables in a relational db. ALTER statements can cause
real performance interruptions and are something an engineer has to be aware
of. It's a case where a framework might help 70% of the use cases but really
hurt the other 30% with medium- to larger-sized tables.
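One common way engineers handle this by hand, sketched here with sqlite3
purely for illustration (SQLite's ALTER doesn't lock the way a large
MySQL/Postgres ALTER can, and the table and column names are invented): add a
nullable column cheaply, then backfill it in small batches rather than one
long locking UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"u{i}@example.com",) for i in range(10)])

# Step 1: adding a NULLable column is cheap; no long table rewrite.
conn.execute("ALTER TABLE users ADD COLUMN email_domain TEXT")

# Step 2: backfill in small batches so no single statement holds locks long.
BATCH = 3
while True:
    rows = conn.execute(
        "SELECT id, email FROM users WHERE email_domain IS NULL LIMIT ?",
        (BATCH,)).fetchall()
    if not rows:
        break
    conn.executemany(
        "UPDATE users SET email_domain = ? WHERE id = ?",
        [(email.split("@")[1], uid) for uid, email in rows])
    conn.commit()  # release locks between batches

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_domain IS NULL").fetchone()[0]
```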

------
stephen
> Having Rails/Django/(Framework of your choice) automatically notice the need
> for a column to exist and make appropriate modifications you could work with
> it the same way you would managing a document relation in your code.

Hibernate can do this (IIRC, it's been awhile), which is great for development
mode, but I would never trust a framework/ORM to "auto-migrate" my production
database.

------
nathancahill
Thought this was going to be an article about RethinkDB.

~~~
hardwaresofton
This. I don't know if it would be appropriate to suggest the author try
RethinkDB.

If so... please try RethinkDB, I think it's fantastic. It offers some
interesting promises, and is by far the best NoSQL DB that I have worked with
(they actually fix problems with it, too, constantly).

Comparison from RethinkDB website (RethinkDB vs. Mongo/others):
[http://rethinkdb.com/docs/comparison-
tables/](http://rethinkdb.com/docs/comparison-tables/)

I think they're really up front with what they promise, instead of people just
saying it's "web scale", and I think of them as an iteration after Mongo. Then
again, I haven't done a super large amount of work in Mongo, so...

~~~
nathancahill
Absolutely agree with this. RethinkDB is the NoSQL db that got me to use NoSQL
again. Currently use it in tandem with Postgres for most of my projects.

------
mattfenwick
I am not an expert on RDBMSs, but isn't the issue of distributed computing and
the CAP theorem a far more important driving force behind the NoSQL movement
than any of the reasons in the article?

~~~
willlll
no

~~~
eloff
Read the community rules please. Comments should add value. Nobody cares
whether you agree or not. We want to know why you disagree.

------
woah
The problem with relational databases is not the schema or the relations, it
is SQL.

~~~
vinceguidry
What's wrong with SQL?

~~~
afhof
The language. Really the problem with SQL is the API. I like having a
relational database, but SQL strings are a really crummy way of writing to it.

~~~
vinceguidry
What's so crummy about variable interpolation?

    
    
        def insert_person(first_name, last_name)
          sql.query("insert into persons values('#{first_name}', '#{last_name}')")
        end
    

So long as you're not plugging form data directly into that function, that
works just fine.

~~~
zyxley
What happens when you have to deal with a crazy pop star with a " in his
(real, legal, you-are-the-unreasonable-one-for-demanding-something-else) name?
(This is ignoring the myriad of issues that come with parameterizing names
into "first name" and "last name" in the first place, but that's a separate
thing.)

This is what binding variables is for, but to use them you're either writing
for specific platforms (PSQL, Oracle SQL, etc), or you're using middleware
that hides the raw SQL from you.
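For reference, bound variables aren't platform-exclusive; Python's DB-API
(shown here with sqlite3, and a hypothetical `persons` table mirroring the
snippet above) passes values separately from the SQL text, so a quote in a
name can't break out of the string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (first_name TEXT, last_name TEXT)")

def insert_person(conn, first_name, last_name):
    # "?" placeholders: the driver binds values; no string interpolation.
    conn.execute("INSERT INTO persons VALUES (?, ?)", (first_name, last_name))

# Quotes in legal names are stored verbatim, not executed as SQL.
insert_person(conn, 'Joaquin "Pop Star" Smith', "O'Malley")
row = conn.execute("SELECT first_name, last_name FROM persons").fetchone()
```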

~~~
vinceguidry
This is a problem with the string domain, not with SQL. You'd have it no
matter what data persistence method you use. The answer is to put logic
between that function and your form data puller.

