
Schema-less is usually a lie - mrkurt
http://blog.mongohq.com/schema-less-is-usually-a-lie/
======
programminggeek
100% agree with this post and it's important to understand where/when schema
in the code makes more sense than schema in the db.

If the DB is determining the schema and your code just mirrors it, you will
inevitably do things like triggers, stored procedures, etc. that essentially
put application code inside the DB. This makes testing and maintenance of such
things all but impossible, even if it does make the DB queries fast. At the
same time, your application still needs to mirror the schema of the DB or
things will break.

"Schemaless" (query schemas) isn't better per se, but having the mindset of
putting the schema in the code means you can write tests to ensure the schema
over time. Depending on your perspective, one might be better than the other,
but as they say "knowing is half the battle"

~~~
camus
> This makes testing and maintenance of such things all but impossible

Developers have been maintaining such apps for decades without any problem.
And everything is testable , even a stocked procedure.

If you dont care about data integrity, then make your application responsible
for your schema...Some do care. Non-rel databases guarantee 0 data integrity,
with very little performance gain over relational-databases.

Finally most frameworks allow developpers to generate the db schema during
development without writing a single query. with all the cache layers like
redis and other goodies , there is little to no reason to use non-rel
databases.

~~~
tracker1
A given RDBMS doesn't guarantee any data integrity. The integrity is a bit of
a lie. I've seen plenty of _real world_ databases with software created by
idiots where a DateTime field was defined as a VARCHAR. Or fields that should
have been a VARCHAR created as CHAR, then bugs happen with the strings aren't
auto-trimmed.

IMHO the database these days _SHOULD_ be fairly agnostic. Also normalization,
which is the core of your so called integrity actually REDUCES performance. At
my last job, our main VIEW into the data took over 28 join operations for some
highly normalized data (much of which had to overcome some bad data in the
db).

We setup a MongoDB database for searching against, as well as being able to
pull up a single record without dozens of join operations, and it was a _LOT_
faster, against real-time data... the search that was replaced was a batch
process that recreated a single table every half hour. In this case MongoDB
was a _much_ better fit.

Sorry, but SQL databases don't guarantee any data integrity either. It's up to
the developers that implement those schemas... and the fact is, for most of
them, they are better off doing that in their primary application code.

Also, if you are using an ORM tool to "generate" your schema from code, then
what advantage does said schema's "integrity" give you?

------
gregwebs
I disagree with the idea that MongoDB has struck the right balance. I have
seen production issues from manual editing of data which resulted in a
floating point instead of an Integer.

I really wish MongoDB had support for a strong schema in the DB besides
indexes (in addition to existing support for schema-less). I only actually
want schema-less at most 20% of the time, but I am stuck with it for
everything outside of indexes.

~~~
spamizbad
Completely agree. I'd love it if you could provide collections a baseline set
of document validation rules- rules that only apply to fields you explicitly
mention.

I guess what I want is a schema- _ish_ database, where I can say "these 5
fields are required and must conform to these rules. Anything else is fair
game."

~~~
vyrotek
I've been looking some something similar to what you've described. I recently
discovered PostgreSQL has a 'hstore' feature which seems to support this. You
seem to be able to make a normal table and then specify one column to act like
a key/value dictionary. These values can even be included in queries.

[http://www.postgresql.org/docs/9.0/static/hstore.html](http://www.postgresql.org/docs/9.0/static/hstore.html)

~~~
antsaasma
PostgreSQL also has JSON support. You might want to check out Christphe
Pettus's presentation on this subject:

[http://thebuild.com/blog/2013/07/02/postgresql-as-a-nosql-
da...](http://thebuild.com/blog/2013/07/02/postgresql-as-a-nosql-database/)

------
leeoniya
from
[https://news.ycombinator.com/item?id=3838131](https://news.ycombinator.com/item?id=3838131)

'...but every time I see "Schemaless" in MongoDB, I think "oh, so you're
implementing schema in your application?"'

~~~
marcosdumay
And missing most of the centralized documentation people use everywhere.

But if you only have one application, you don't need centralized
documentation.

~~~
tracker1
You can create an API for data access that validates a given document
schema... you don't have to operate in the open.

------
jbellis
Cassandra is not schema-less. I covered this in my recent article on Cassandra
Myths: [http://www.infoq.com/articles/cassandra-
mythology](http://www.infoq.com/articles/cassandra-mythology)

------
kailuowang
To me, schema-less simply means that you don't need to maintain the data
structure in a separate schema definition. It doesn't mean that your data
doesn't have a structure - the structure is in your data, query and indexes,
it's just easier to change without that extra "schema" definition.

~~~
wvenable
It's not easier to change. When you have a schema-less database you don't
change the schema, you add a new one for new data. Any data with an old schema
continues to have it. So at any time you have up to _n_ schemas simultaneously
in your database if you changed it _n_ times.

------
taeric
Isn't this akin to the idea that many dynamic programs actually have fairly
static types?

~~~
6ren
Stretching it: natural language was grammatical before grammar was invented.
Natural laws existed before they were discovered. But it can be easier to
change grammar/law/types/schema when not explicit.

~~~
joe_the_user
_"...it can be easier to change grammar/law/types/schema when not explicit."_

Now that's quite interesting statement. I'd say it is true for humans but
false for computers and it's paradoxical to think about why this is.

If computers were an intelligent as humans, you wouldn't have to worry about
giving your program any structure because the computer could change that
structure later. But sadly, spaghetti code isn't what it's cracked up to be.

Similarly, an application where don't bother thinking about your schema
beforehand isn't going to be application which you can change easily later.

~~~
taeric
It is not just spaghetti code, though. Right? It is code that does not
constantly check that input fits an expected pattern. Which is most code. I'm
reminded of this talk by Sussman.[1] Essentially, in software we are
(understandably) adamant about not modifying the input of our application for
"pure" functions. Nature goes right the opposite, often attempting to make the
input work at all costs.

So, whereas a schema based application that models a user with an address is
unable to cope with being given two addresses. A schema-less approach where it
just stores blobs simply stores what it was given. If code that reads this is
unable to make sense of multiple addresses, it will raise an error. Not
necessarily unlike a user being told to send a package to an address, but
given a list of addresses.

[1] [http://www.infoq.com/presentations/We-Really-Dont-Know-
How-T...](http://www.infoq.com/presentations/We-Really-Dont-Know-How-To-
Compute)

------
Jare
> Indexes on col­lec­tions in Mon­goDB are very much “schema”

I think that's all you need to read in this post. It's... rather thin IMHO.

------
Encave
Here are some slides from a good talk by Martin Fowler about schemalessness
and how there is always an 'implicit schema'
[http://martinfowler.com/articles/schemaless/](http://martinfowler.com/articles/schemaless/)

------
just2n
Isn't a schemaless database just a filesystem?

~~~
camus
it'w worse , it is basically a bin where you throw paper. You can still search
for a specific one ,but as you throw more paper in the trashcan , it gets more
complicated to find the paper you want. but hey ,it is easier and faster than
to organize them on a rack they say ...

~~~
tracker1
It really depends on the non-sql solution... Most of which do have indexed
keys for retrieving data at the very least (key-value storage), and use Map-
Reduce in order to perform basic queries, or you setup your own indexed
records.

MongoDB specifically allows for additional indexes to be used as part of
general queries, or the aggregation methods available. MongoDB even has some
nice built in features for querying against geolocation data. They do have
some limitations, but in general it works quite well.

Your analogy comparing non-sql to a trashcan is a bad comparison... a better
comparison would be to a set of file cabinets.

------
thelarry
i feel like a lot of people seem to jump on to "schemaless" dbs because they
feel the can just dump whatever they want into the database. they also run
into a problem later on when they realize that they need to care about data
integrity and types.

~~~
tracker1
Or you have a data structure that would require a lot of tables for what
equates to a single record of interest.

In my last job, we were presenting search, and display of classified listings
for cars. The normalized data required, iirc, 28 join operations to get most
of the data for a single record (for display), about 12 iirc for the search
support (not including geo/location based searches), and a second lookup for
related data.

This could be replaced by a single query in a non-sql database. There is a
real cost to these kinds of structures in SQL... There are a _LOT_ of use
cases where a single record structured as a complete object is much better
than having to break up said structure into dozens of fields.

------
gsabo
I believe Kyoto Cabinet is an example of a truly Schema-less DB:

[http://fallabs.com/kyotocabinet/](http://fallabs.com/kyotocabinet/)

