
SQL is Agile - johns
http://lucumr.pocoo.org/2012/12/29/sql-is-agile/
======
btilly
At a past job we had a really fun interview question. We'd lay out a concrete
problem and ask the interviewee to design a schema. Since we knew the problem
well, for virtually any schema we'd be able to invent reasonable requirements
that would force them to change it. At the end we'd verify basic knowledge of
SQL itself by asking them to answer a couple of questions.

(The actual problem we had them tackle was reports to use in making picks in a
fantasy sports game. But it could be anything that is simple to explain and
fairly straightforward to solve.)

The reason the question worked so well is that this was, in fact, pretty
much how we built CRUD applications. We'd decide the schema, and once the
schema was known, the CRUD application was fairly obvious and straightforward.
And a large part of the programming model would be immediately provided by
your ORM once you had done so.

~~~
mattmanser
There's no difference between this and designing the classes first.

Designing the classes first in an explicitly typed language actually works a
lot better than designing a schema in a DB as you're not constrained by
relational requirements and it's easier to change imo. Make a change to your
classes and you can usually just right-click refactor; try to make a change
to your SQL schema and it won't accept it at all until you unhook all the
dependent tables.

Basically, I don't really get what point you're trying to make.

You _never_ know your domain well enough to make a decent schema unless you've
already written the program once before, which is a pointless thought exercise
where you get to point at flaws in someone else's nascent design. That sounds
like what you do in the interview, which is great for seeing their thought
processes, but don't fool yourself that it's because of that design method.

In other words, the reason it worked so well is because you had already built
it once; it has nothing at all to do with your choice of schema first.

In the end do what works for you, but schema first vs classes first is a
trivial and pointless argument as both work fine.

~~~
Xurinos
Having done this for a while, I can solidly say that the relationships and
structure of data have been consistently better when designed as a set of
schema tables than as a set of classes _because_ they are especially
constrained. The constraints forced people to actually think about the
relationships between facets of their data, and the resulting classes were
much cleaner and more maintainable. Should we ever decide to move aspects of
the classes into a database, the transition requires little code refactoring;
this comes up fairly often for us, since we are trying to give users more
customization powers.

People _should_ be doing that when they design classes, but the flexibility
tends to work against them. It could just be a mindset that makes it work
well, and this might be a more appropriate approach for larger projects than
for smaller types.

~~~
radicalbyte
In my experience, relational modelling works fine for domains containing
simple logic.

As soon as you start having constraints such as "from date must be before to
date" and "the quote must have at least one line, and these lines should sum
to > 0 and < 1000000" then relational modelling fails. Hard.

I like to use a Quote with QuoteLines as an interview question. Relational
modellers make two tables, both having unique identifiers. Domain modellers
create two classes, sometimes exposing only one to the outside world. Only one
of those classes (Quote) has an identity.

Domain/class models tend, from what I've seen in the wild, to fail faster
under load and under change. Ironically, experienced relational modellers tend
to build cleaner class models. Although that might be more a function of
experience than anything else.

~~~
sjwright
> As soon as you start having constraints such as "from date must be before to
> date" and "the quote must have at least one line, and these lines should sum
> to > 0 and < 1000000" then relational modelling fails. Hard.

Really? Both those problems are insanely trivial to solve with one or two SQL
statements. Your second and third criteria are _perfect_ candidates for
relational modelling; the first is so trivial it beggars belief.
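
For instance (a sketch, assuming Postgres-flavoured DDL and a made-up table,
not anything from the thread), the date rule is nothing more than a CHECK
constraint:

      create table booking(
        booking_id int8 primary key,
        from_date  date not null,
        to_date    date not null,
        -- "from date must be before to date"
        check (from_date < to_date)
      );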

~~~
joevandyk
Enforcing that a quote has at least one quote_line is pretty tough, at least
in Postgres, and involves triggers and locks.

How do you enforce that two transactions don't delete the last quote_line at
the same time?

And how would you enforce that the sum of the quote_lines is correct with
multiple writers?

Not saying it can't be done easily, but I'd love to hear how.

~~~
jeffdavis
"Enforcing that a quote has at least one quote_line is pretty tough, at least
in Postgres, and involves triggers and locks."

The reason it seems simple when using a few classes is because there is an
implicit assumption that concurrency is a non-issue and there's only one
application involved. But those aren't good assumptions, so the approach using
classes will start to look more complex (and involve locks, etc.).

In Postgres, the trick is to use SERIALIZABLE transactions everywhere (should
be the default, eventually), which avoids the need for explicit locking. Then,
add a trigger that is fired before changes to either table, and it would
simply check that the condition holds for the quote that was modified.

Here's some code, since you asked:

    
    
      create table quote(
        quote_id int8 primary key,
        customer_id int8,
        valid daterange
      );
      
      create table quote_line(
        quote_id int8 references quote(quote_id),
        amount numeric(10,2)
      );
      
      create or replace function check_quote() returns trigger language plpgsql as $$
      declare
        qid int8;
        line_count int;
        line_sum numeric;
      begin
        -- OLD is the only row available on DELETE; NEW covers INSERT and UPDATE
        if TG_OP = 'DELETE' then
          qid := OLD.quote_id;
        else
          qid := NEW.quote_id;
        end if;
      
        -- if the quote itself is gone by commit time, there is nothing left to check
        perform 1 from quote where quote_id = qid;
        if not found then
          return null;
        end if;
      
        select into line_count, line_sum
          count(*), sum(amount) from quote_line where quote_id = qid;
        if (line_count < 1 OR line_sum < 0 OR line_sum > 1000000)
        then raise exception 'invalid quote';
        end if;
        return null;  -- return value of an AFTER trigger is ignored
      end;
      $$;
      
      create constraint trigger quote_check_trig
        after insert or update or delete on quote
        deferrable initially deferred
        for each row execute procedure check_quote();
      
      create constraint trigger quote_line_check_trig
        after insert or update or delete on quote_line
        deferrable initially deferred
        for each row execute procedure check_quote();
    

Note that I did not need to add the CHECK constraint, because it's much better
to use the appropriate data type -- DATERANGE -- instead of hacking it
together from parts.

~~~
joevandyk
I figured that serializable transactions would be necessary. That's going to
be the default?

If you forget to use serializable transactions every time you update these
tables, is it possible to store incorrect data?

~~~
jeffdavis
I think that SERIALIZABLE should be the default eventually, but others may
disagree and I won't make a prediction.

However, it isn't necessary to solve the problem. It would be relatively easy
to use a row lock in this case to solve the problem, as well, but I like to
avoid those unless there's a reason.

If you want to have a mix of SERIALIZABLE and other transactions, or you are
worried about making a mistake (or some malicious user), then you need to use
the row lock. Eventually there should be a way to force users into
serializable transactions.
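
For illustration, the row-lock variant could be as small as this (a sketch,
not something I've tested here): at the top of check_quote(), lock the parent
quote row so that concurrent writers to the same quote serialize, and the
count/sum check then sees the other writer's committed changes even under
READ COMMITTED.

      -- sketch: serialize writers per quote by locking the parent row
      -- (qid is the quote_id the trigger is about to check)
      perform 1 from quote where quote_id = qid for update;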

EDIT: actually, in the trigger, you could explicitly check if the transaction
isolation mode is serializable. That would be the best approach:

    
    
      if current_setting('transaction_isolation') <> 'serializable' then
        raise exception 'serializable mode required';
      end if;

------
outside1234
There are no schema-less databases. There are databases where the schema is
applied at write and there are applications that apply a schema at read.

~~~
PommeDeTerre
Very true. I always chuckle when some NoSQLer tells me how much time he's
saved by not having to define and update a schema. Then he tells me how all he
has to do is check whether fields exist or not in his code, and enforce
constraints, and provide default values, and so forth. He has to do this in
the Ruby code that he's writing, and his colleague has to do the same in the
PHP code she's writing, and then there's the analysis team who has to write
the very same code in their Python scripts...

~~~
wpietri
I'm not a big fan of either of those approaches.

The cross-implementation schema constraints in SQL DBs are pretty weak. I can
only express a small fraction of my application logic there unless I want some
serious vendor lock-in.

And then regardless, once I'm referencing the same schema from multiple code
bases, schema improvements are basically impossible.

If I need to get at the same data from multiple code bases, I think an API is
a lot cleaner. And then I'd rather put the constraints in my API code. The
tools are so much better, and programming languages are much more flexible.

~~~
PommeDeTerre
I find your arguments somewhat contradictory.

So you want to avoid vendor lock-in, but you see NoSQL databases as being
better in some way? They're far worse. If you initially target MongoDB, that's
what you're going to be stuck with. You aren't going to be moving to CouchDB
easily.

It's far easier and much more reliable to use a subset of SQL that's common to
the major relational database systems commonly used today. This subset of SQL
still ends up being extremely powerful.

I'm not sure exactly what you're getting at with the incorrect claim that
"schema improvements are basically impossible" when accessing a single
database from multiple code bases. There are many trivial ways to offer or
retain compatibility. Using views is one common way. You then have much more
freedom to change the underlying schema, while still supporting existing
queries, for example. Stored procedures, functions and triggers offer other
possibilities.
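
For example (a sketch with made-up table and column names, not anything from
the article): rename the table, change the column, and keep a view with the
old name and the old shape so that existing queries keep working.

      alter table customer rename to customer_base;
      alter table customer_base rename column name to full_name;
      
      -- existing queries still select "name" from "customer"
      create view customer as
        select customer_id, full_name as name
        from customer_base;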

An API doesn't magically get rid of compatibility issues, either. You'll need
to maintain compatibility within your API, or you'll have to run multiple
versions of your API simultaneously, or you'll have to deal with it some other
way. All that you've done when using an API is add yet another layer that
you'll have to maintain. Duplicating the functionality offered by relational
databases within your API's code will only end up taking more of your time.

At this point, I think that we've got enough experience as an industry and a
craft to know that it's a good practice to use a schema, it's not difficult to
change your schema while still maintaining compatibility, and it's better to
keep your constraints as close to your data as possible. Relational databases
enable this, while NoSQL databases make it much more difficult than it needs
to be, if it's even actually possible.

~~~
jeffdavis
"It's far easier and much more reliable to use a subset of SQL that's common
to the major relational database systems commonly used today."

I think that is generally a mistake for non-trivial applications. MySQL and
SQLite are so far from the standard that it's just not worth trying to find a
subset of SQL that works the same on those systems as the others.

The similarities between the SQL implementations do make porting easier,
however. But I consider it a port rather than using a common subset.

------
FuzzyDunlop
The thing I noticed about NoSQL is that, at least with Mongo, you just end up
putting your schema in your app. You probably do it by saying a particular
class has a bunch of fields or instance variables that you expect to be a
certain type.

I wrote a portion of an app that relies on Redis, and despite the fact that
it's _really nice_ to work with, especially in Ruby (the client practically
mimics Redis' native API), it was incredibly difficult to remember the structure.
What keys do I need? What type are they? Oh I'll have to do `KEYS *` and look
I guess. Then I'll remember to document it (and thus create the schema).

Of course, the problem on my part was, in part, that I probably didn't need to
use Redis. And when I needed to use Mongo, well, I wasn't storing things that
look like documents. I think this is an easy mistake to make, but one you
learn from quite easily once you've experienced it yourself.

~~~
Skywing
I recently switched from Mongo over to CouchDB. The reason that I favor Couch
over Mongo is primarily because I felt like Mongo's querying capabilities suck
when dealing with embedded documents and that MapReduce is not a first-class
focus for Mongo. Second, though, was that with CouchDB I'm forced to put
basically all of my query logic on the database server, which keeps it out of
my client code. This cleans up my client code a ton, and querying from views
gives context to the data you'll be getting back. I don't think I'll ever look
back to Mongo as long as I can choose CouchDB.

------
d--b
This post is not addressing the right question. Think of databases as storage
space, like wardrobes. The SQL database lets you store your clothes as
separate pieces, each with its own dedicated space, allowing you to quickly
calculate the sum of all stripes on all your socks. That's great, but on the
other hand, if you want an outfit you have to put it together, starting by
sewing the sleeves onto the shirt. The NoSQL wardrobe, on the other hand, has
various boxes where you can put any clothes you want. Now it's up to you to
tidy the whole thing. If you want, you can put full outfits in the boxes so
that you don't have to think much about what to wear. However, to count the
stripes on your socks, you'll have to go through the whole wardrobe. The point
is: it's not the fault of the wardrobe if your room is a mess.

------
lnanek2
Mmm, there's so much less code when everything is in JSON though. You can
often ditch half the classes in the system. Sometimes I just don't need a
client side class, a server side class, a schema, etc. just to store a few
fields. It violates Do Not Repeat Yourself, especially when client and server
are running different technologies like NodeJS server vs. Objective-C client.
It is so much simpler to have just JSON in node and a dict on the client than
to have a set class JS file with schema in node and a class on the client, and
have to change them all whenever the data format changes. Also violates You
Ain't Gonna Need It, from the agile rule house. A lot of times you build all
that boilerplate, maintain all that boilerplate, and you never end up needing
it anyway. Or have to change it all before you would have needed it.

~~~
matthavener
What's the difference between a server side class and a schema? And isn't the
necessity of creating a class just a complexity required by some statically
typed OO languages? Some data defined by a schema is still just data -- it can
be used similarly to JSON.

------
mistercow
It seems to me that of all the advantages of NoSQL, schemalessness is probably
the most minor and dubious. Most of the pain of dealing with SQL comes from
the relational part, not from needing a schema.

The problem with SQL is that it's _clever_. To do things efficiently and
reliably, you often must do very clever combinatorial operations on different
tables, producing elaborate structures which are then winnowed down to a
simple list of information. So right off the bat, you encourage programming
logic that is difficult for future maintainers to wrap their heads around. Of
course, most programmers _aren't_ clever, so their queries end up being both
difficult to understand and subtly wrong.

Then, of course, your data comes out as a list of rows. Well that's just
great. If your needs involve a more complex data structure than that, well,
you're going to need code to rebuild that outside the database.

~~~
loxs
Yes, and not only this. The really big difference is eventual consistency,
used as a core concept in most horizontally scalable data stores. If you
design your application with a heavy reliance on strong consistency, you will
have a _huge_ problem scaling it later on.

~~~
jeltz
Very few applications need to scale past that point. I know plenty of
businesses where, if they scaled past the point of being able to run on a
single SQL database instance, they would be making billions and be larger than
their theoretical target market. Not everyone is building the next Facebook
(which coincidentally uses MySQL for a lot of things).

~~~
loxs
It's not only about being able to serve lots of users. It's also about
availability and geographical distribution. Even if you don't have millions of
users, they will gladly benefit from you being split across several
datacenters (serving everyone's requests from a location near them).

~~~
PommeDeTerre
It's quite easy to do this using traditional relational database technology,
and various forms of replication. We've been doing it for many years, likely
before many of today's NoSQLers were even born.

In the relational world, we don't really see such capabilities as anything
special. They're merely one of the many basic features we've come to expect
from any relational database system worth using.

------
freework
Mongo is like the PHP of the database world. Many important features are
dropped in the name of "getting started quickly".

What you gain in short-term wins is nullified by all the long-term headaches
you suffer.

~~~
tracker1
I don't think I would quite agree with that. I would say that Mongo is very
accessible in terms of getting started. Also, you can index nested properties
via dot notation; unless you are really deeply denormalized (collections of
objects within collections), I haven't been aware of any issues there.

A few things surprised me about Mongo. First, if you do a sort, you either
need an index on your sorting fields or your sort data needs to fit in 32MB.
I'm using skip/limit for paging, and it really became an issue once I was
running against the full datastore. Second, only one index is used per query,
unlike most SQL databases, which will collate the results of multiple indexes
under the covers. This caused me to rethink my index strategy, as well as add
both asc/desc indexes for sorting results.

For me it was a matter of getting away from an over-normalized data set (30+
queries for a single vehicle classified ad display, or 23 joins if you wanted
to reduce 15 of those queries -- my main VIEW created for migration).
Denormalized data makes more sense for this use case. For others it may or may
not make sense.

~~~
aidos
Relational systems use multiple indexes for queries, though I'm not aware of
an implementation that uses multiple indexes for a _single table_.

Remembering that a query in Mongo is basically a query on a single table, it's
acting the same way a relational db does, i.e. check statistics, choose the
best index, perform lookups.

------
vyrotek
This matches my experience from the last few years as well. I was burned by
denormalizing too early and not fully understanding the trade-offs I was
committing to for the sake of not dealing with 'schemas' (which I actually
have no problems with). And just like the OP, I have found far better success
sticking with SQL early on for many of the same reasons mentioned. Later, as
products and data schemas matured I found very specific uses for data stores
like MongoDB. But, they are never the core data store.

------
joe_the_user
Well, another thing that is not mentioned here is that SQL and the original
relational model were effectively designed from the start to be "agile", to
allow ad-hoc queries and other "quick changes".

The original relational database model was, from the get-go, intended to
decouple the database from large-scale programming. The DB administrator could
make queries and evaluate their effect on the system without having to be a
programmer.

------
thezilch
How can you have a project without models, and models without a schema? And so
you have to change a schema; is there usually much difficulty? More than
supporting multiple schemas across a single data set?

Even if DBs were still bad at migrating a schema (e.g. table-locking horrors),
the problem usually only presents with millions of rows, which one presumably
doesn't have in the early stage of a project -- "start[ing] a project."
Nonetheless, even with millions of rows, there have been solutions for all the
major RDBMSs since before MongoDB.

~~~
camus
Data mining: you don't know exactly what your data will look like, or how much
data you'll have to handle, and you are open to new types of data. Data
logging: you don't need a schema database for that. File upload: GridFS, ...
User generated content: save semi-structured documents (like a configuration,
a spreadsheet, ...) that need to be parsed but don't need a "schema".

~~~
disgruntledphd2
True, typically you don't know in advance of a data mining operation.
Nonetheless, you have some invariants that you know you'll need and these can
be modelled with a schema.

Additionally, when attempting to put a model into production, you'll normally
know what it is you need, and this can be put into a schema for ease of
interoperability with other requirements (CMS, booking etc).

------
programminggeek
I don't think that schema or schemaless should change your app structure. Your
entities/models should contain your data structures. If you structure your
application right, schema doesn't matter. In fact, your persistence mechanism
shouldn't either. Mongo, Mysql, Redis, they're just different ways to store
and retrieve your data.

The biggest reason to avoid schemas is to avoid schema migrations. On big
enough datasets, they take a long time. That is not every app's problem, so it
depends on your app. Also, not having a schema doesn't mean you completely
avoid data migrations as a result of changing your schema in all cases.

~~~
wvenable
The problem with avoiding schema migrations is that you end up with those
migrations in your code. If you add a new field or re-arrange your data
structure, you have to add code to handle that old data. In many ways, it's
simply easier (and less permanent) to do a migration.

I've done it both ways and I wouldn't go schema-less again. Right now I have
decade old code in my models that handles old data structures that more than
likely don't even exist anymore. With a schema I can just make the change and
perform any data transformations in SQL and not be required to have any code
to handle legacy data.
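
To make that concrete, here is the sort of one-shot migration I mean (a sketch
with made-up table and column names):

      -- made-up example: replace a legacy boolean flag with a status column,
      -- backfilling existing rows in the same migration
      alter table orders add column status text not null default 'open';
      update orders set status = 'cancelled' where is_cancelled = true;
      alter table orders drop column is_cancelled;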

~~~
programminggeek
I agree that avoiding schema doesn't solve the problem. My point is that your
schema exists in your code one way or another (explicitly or implicitly), and
even without a database schema, you still need to either handle the schema
differences in your code, or you need to migrate your data into the new schema
in some way or another.

The best use of schema-less data storage I've heard of is Craigslist's use of
MongoDB for their archives. Apparently migrations on their MySQL servers were
taking way too long, and switching that system to Mongo made a lot of sense
and apparently has worked out pretty well.

In the end it's about understanding the use of the right tool for the right
job.

~~~
wvenable
I have experienced slow migrations and it is a pain -- but it's not a
fundamental conceptual problem -- it's just a tool problem. I believe postgres
will or already has the ability to do background incremental schema changes
without locking the table.

My point is that you have a schema in code and a schema in the database
whether you have a fixed schema or an implicit schema based on the structure
of your data. I prefer to be explicit.

~~~
programminggeek
I agree.

------
jaequery
Mongo is easy to start with, but it sure makes your project more complicated
and painful as time goes on. It got to the point where I actually stopped
enjoying work.

The power of SQL is just too great to ignore; I'd gone through too many
situations where I wished I was on SQL.

I will probably use Mongo for those times where I absolutely need to store
schema-less data (custom attributes), but will only do so in conjunction with
a SQL database.

There is also the reliability and maintenance factor of securing it, making it
redundant, and correcting problems when they arise, all of which takes time
away from development.

And just like the OP, I probably won't ever start another project with Mongo
from the ground up. There are just too many things to worry about, and
problems come at you from all different angles when you least expect it. But
in SQL, everything just works.

~~~
camus
Impedance mismatch, that's why SQL is uber nonsense in an OOP architecture,
especially when you end up using slow and buggy ORMs...

~~~
meaty
If an ORM is slow or buggy, you're using it wrong or it's crap. Most things
that claim to be ORMs are really not very good quality.

Try hibernate / nhibernate with ehcache cache backend. We find both to be an
order of magnitude faster than hand tuned SQL and testable to boot.

~~~
mbell
I despise hibernate. I will never use a self managed 'session based' ORM
again, it creates far too many messes.

~~~
meaty
When you get over the learning curve, which is quite large, the mess doesn't
exist. I've never seen an issue with hibernate which isn't caused by poor
understanding. You really do have to know your shit before you start - you
can't just waltz in django-style and write code. You have to think about
session management strategies and architecture.

~~~
mbell
It has nothing to do with understanding and everything to do with having to
create a bunch of excess code to deal with the way it works.

As an example: I shouldn't have to write a filter to wrap the template
rendering stage of request processing in a transaction just to make lazy
loading work.

To make hibernate useful you almost always need to have another framework
manage it (SEAM, Spring, etc). I'm not going to use an ORM that requires an
entire other framework just to make it reasonable to work with.

Hibernate also has the worst query-building framework I've seen. It's far and
away the most verbose piece of crap I've ever worked with. As a result most
people just drop to writing queries in JPL which removes one of the big
niceties of an ORM.

~~~
meaty
Well, you've conclusively proven my point, because it's obvious you don't know
how to use hibernate properly.

You _never_ lazy load stuff from a view. That is a recipe for disaster. You
use a projection or set of projections and populate your model up front. The
transaction should be concluded way before the view is rendered. If you get
that wrong, your architecture is broken.

The only bit of infrastructure we use is an annotation + AOP which wraps a
service method in a transaction and unit of work. This is all container
managed. To be honest, it's literally 3 lines of config and works universally.

Verbosity - that's a load of crap. We have 3 lines of AOP config, a mapping
file per class and nh config which is about 12 lines of XML. All the config
was cut and paste.

The whole thing gives us a model which is portable between engines, proper
transaction management, cache layer, schema generation, fully testable data
API (we materialise the whole model into HSQL in integration tests) and
documentation which is actually good.

It's a no brainer. I suggest you go read POEAA and have a think.

~~~
mbell
Sounds like a complete mess and I see no reason to build an application that
way. I certainly can't see a reason to create a separate query for every place
in a view a model may be used just to avoid lazy loading. Maybe on performance
critical views but the rest of the time, that is an unneeded premature
optimization.

You've also proved my point, by needing an entire 'container' managing things
to keep everything straight. I'll happily keep hibernate in my rear view
mirror, I'm quite happy with Ebean.

~~~
meaty
Each to their own. Our architecture handles 22,000 reqs/sec, is scalable
across our 80 man dev team and pulls in approx $30mil in revenue. Does yours?

------
dbla
For me, the process of designing my schema and planning my project are almost
one and the same. I have a hard time thinking about how I'm going to build
something without thinking about the schema.

------
contingencies
I spoke at Code Generation 2010 at Cambridge in the UK on my experiences
implementing code generation in various parts of an existing shop's
development process.

One of the low-hanging fruits in that regard is selecting existing and
under-utilized models and making the most of them.

I found that SQL schemas were a great way to model a business domain even if
you need to migrate to another database ... since you have so many tools
(including free cross-platform visual ones, like MySQL Workbench) at your
disposal, validation, decent type selection, options for more complex
validations using triggers, everyone knows it (unlike UML), etc.

Another great 'model' that people are usually sitting on top of, also ripe for
code generation, is the software dependency database found in many distributions.

Another is your RCS/VCS system.

Another is - and here comes the kicker - the filesystem, which with modern
tools (inotify, etc.) can give an impressive amount of RCS/VCS style
functionality (COW, event notifications, etc.) at kernel level.

What was that quote again? "Unix is simple, it just takes a genius to
understand its simplicity". They weren't lying.

~~~
meaty
If you're using code generation, your tooling or architecture needs
rethinking. There is 100% no reason to do it.

~~~
pdwetz
Sorry, but to say there's 100% no reason to do something is rather silly. By
your own definition, what if your tooling lets you do codegen? Regardless, I've
found code generation to be particularly useful in many scenarios and I would
never dismiss it out of hand.

~~~
meaty
By my definition, I'd avoid it. The pitfalls, additional abstraction and
debugging complexity just aren't worth it.

------
xweb
I've found that the primary value for me in NoSQL implementations is in
allowing for customization at the data level for the end users without having
to explicitly build structures for it. So if User A wants to add a tracking
field to all her data, she can do so without my having to build an extensible
data structure on the back end; it's already there, and it doesn't affect any
other user records.

------
ericfrenkiel
There is something to be said about the ambivalence of using SQL or NoSQL for
OLTP-esque applications (loosening the definition of a transaction, that is),
but SQL is very useful when working with OLAP-esque applications, i.e. writing
a SQL query and avoiding drafting a map-reduce job or building the logic into
a custom script, and still using good stuff like secondary indexes, etc.

------
mjouni
More often than not, reading the schema is the lowest-hanging fruit when you
are trying to understand the domain knowledge of a startup. This becomes even
more true when the codebase is not that readable. Now, this is not enough of a
reason to shy away from NoSQL data stores, but it should make you pause and
consider what you will be losing when you invest in a schemaless DB.

------
tgandrews
As long as you have the data, you can calculate it. It may be easier with SQL
and a normalised form, but that forces a lot of other constraints.

With a non-relational DB the idea is to attack it with map reduce and possibly
in-app calculations. This may not be ideal, but the benefits of easy placement
and access of the documents outweigh the aggregation issues in most of the
problems I have dealt with.

Neither solution is perfect; you just need to make a prediction based on the
project's requirements and make a choice.

------
veesahni
I've been using MongoDB to "start" with for quite a while (and used AppEngine
before that). Instead of using the low level mongo libraries, I use a wrapper
like Mongoid to provide a high level object mapping.

With MongoDB + Mongoid, the schema is in the code in the form of Mongoid
models, and the data stored in MongoDB mimics the in-memory application structures.
I've found that this makes it much easier to visualize data flow - although it
does force one to think carefully about their schema up front.

------
wheaties
I still like GridFS for simple distributed file systems and Redis for N-length
lists of string and/or binary content. That said, I have to agree with the
author wholeheartedly.

------
cageface
_At one point you need to have an understanding of what you're dealing with:
you need to know of what type a value is._

If we agree with this then why is it so controversial to assert that types are
also useful at the language level? For me working in a dynamically typed
language is just as awkward as working with a schema-free datastore. Writing
code is hard. Why not let the compiler help you?

~~~
marcosdumay
Typed languages add a ton of boiterplate to what would be simple operations.
That slows you down, more than most people notice.

Now, that said, explicity typing makes it easier to catch a lot of mistakes,
and helps you keeping you data consistent. It's very usefull... But slows you
down.

As a conclusion, you want to explicitly type anything that is important, but
also wants to keep the explictly typed data to a minimum. (What makes optional
typing, like in Haskel very interesting.) And, in nearly all the applications
all the important data (and very little unimportant data) goes to the DB.

------
jbverschoor
The title is very correct. SQL is agile.

When can we PLEASE stop messing up names? Everybody who's talking about
"NoSQL" should be thrown into a deep black pit.

SQL is a query LANGUAGE. It's excellent at what it does. To be honest, I like
HQL, even though I've been a Ruby guy for about 5 years.

The whole scalability, consistency, acid, schemaless discussions are actually
not the point.

------
dotborg
I'm not sure about your point on schemas. When using relational database you
must have your schema in two places: 1. database and 2. ORM - the second one
is enough to avoid most problems.

What's the point of having schema in database then?

~~~
pielud
1. Your ORM can derive its schema from the database's.

2. I think the main point of the post was that you can run ad-hoc SQL queries
no matter how much you've denormalized. You can't necessarily do that with a
NoSQL database.
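
For instance (sticking with the quote/quote_line schema from the trigger
example earlier in the thread), an aggregate nobody planned for up front is
still a single ad-hoc query:

      -- ad-hoc: total quoted amount per customer
      select q.customer_id, sum(l.amount) as total
      from quote q
      join quote_line l on l.quote_id = q.quote_id
      group by q.customer_id;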

~~~
camus
Yet you are still using schemas in two places: not DRY. SQL just doesn't fit
OOP design. Most languages today have strong functional capabilities that make
SQL obsolete; they have functions and event systems. RDBMSs exist because
people used to query those systems directly. Do your users log into your
database directly? No, they connect to your database through middleware, and
that's where the job should be done.

~~~
jerf
DRY is "Don't Repeat _Yourself_ ", not "Don't Repeat Ever". If your ORM fully
correctly derives everything it needs from your database schema, that doesn't
count as a repeat. If your ORM needs a couple of hints, but those hints are
indeed extra information that your DB can't have ("this isn't just a string,
it's an IP address") that doesn't count as a repeat. If _you personally_ typed
your schema into a database creation script and then _you personally_ also
typed the same schema into your ORM specification, only then are you violating
DRY.

Code generation is a powerful DRY tool, not something it suggests avoiding!

I say that independent from the question of whether you should be using an ORM
at all. If you are going to use it, either your ORM should come out of the DB
or be responsible for creating the DB, either is fine, as long as you're not
saying the same thing twice.

~~~
steveklabnik
> DRY is "Don't Repeat Yourself", not "Don't Repeat Ever".

It's actually "Every piece of knowledge must have a single, unambiguous,
authoritative representation within a system."

(this advances your point even further)

~~~
jl6
And makes clear that DRY is just a handy catchphrase for the design principle
of normalization.

------
hayksaakian
I avoid SQL not for the notions or concepts behind SQL databases, but for the
SQL itself: the LANGUAGE of it all (specifically the syntax).

Nowadays I work with things like ActiveRecord and Mongoid, which add an
abstraction layer on top of the literal interface language. However, at the
very beginning of my intro to databases (quite recent, last 6 months), the
prospect of learning a new language just to talk to databases was off-putting.
The notion of documents and fields made more sense to me, as it was more
comparable to a file system. SQL tables, rows, and columns were daunting and
seemed like a relic of paper spreadsheets.

The ecosystem around SQL solutions is clearly far more developed, though.
However, I don't think we should avoid non-SQL solutions for that reason.

Give the alternatives a chance to develop.

Don't look at MongoDB through SQL tinted lenses.

~~~
lucian1900
Having to know yet another language is annoying, but mongo has one too (and
one that's less consistent and powerful).

A good SQL library (like SQLAlchemy) lets you easily build queries without
looking at SQL. The only case you might have to is debugging, which is the
same time you'd be looking at mongo queries.

------
camus
SQL is not DRY, especially when people stuff their apps with bloated ORMs. SQL
exists because people used to query RDBMSs directly; now most of the time
databases are simple data repos, and it makes sense to put most of the logic
in the application itself.

One should not have to deal with types in the app and types in the database,
exceptions in the app and database errors, events in the app and triggers,
functions in the app and stored procedures; otherwise it means one doesn't
trust one's own middleware.

SQL makes apps insecure and hard to refactor, and is nonsense when doing OOP
design (here come the bloated ORMs again, which usually come with their own
query language... LINQ, DQL, HQL...).

That doesn't mean data modelling is bad; on the contrary, it's the first thing
to do. But data modelling doesn't need to fit the SQL mindset (thinking in
tables and columns). Objects are parts of collections that can hold
collections of other objects themselves. Why should one try to fit that model
into SQL? It does not fit.

MongoDB may not be the best NoSQL solution out there either; there are better
products.

~~~
jeltz
One major advantage of using SQL databases is that they provide excellent
guarantees for data integrity. This means I do not have to think much when
refactoring my data. And I do not have to worry about production systems
crashing due to old data being different from new.

When modifying the schema in NoSQL databases I always worry about whether I'll
mess up and leave old cruft that will cause my code to crash, or, even worse,
lose data (data created between the last backup and the upgrade).

With proper SQL constraints there are many fewer ways you can screw up when
refactoring. And easy refactoring of data is key to agile development of
systems that have been put into production.

