

Databases Should be Dynamically Typed - sharksandwich
http://blog.mongodb.org/post/215738382/databases-should-be-dynamically-typed

======
seldo
(I also commented on the article, but nobody seems to be commenting over
there, so reposting...)

The advantage of statically typed RDBMS that he's leaving out here are (a)
storage optimization (b) query optimization (as jawngee has already noted).

You could get halfway to simulating a dynamically-typed RDBMS by declaring all
your columns as, say, VARCHAR(5000). You could store strings, integers,
floats, dates, etc. all in there pretty simply. However, they would use a lot
more storage space as strings than they would as native data types (e.g.
integer 1000 is one byte, string '1000' is 4). Over a large data set that
would really add up.

Secondly, when doing queries, your comparison operators (e.g. WHERE date >
2009-07-06) would be way less efficient as string comparisons than native type
comparisons.

I don't want to be dismissive and curmudgeonly about this, but over and over
what I hear from people enthusiastic about NoSQL solutions is that they solve
the current problems of RDBMS, while forgetting all the great features that we
have spent the last 30 years building _into_ database systems.

The current "limitations" of SQL-based systems are often in fact age-old
trade-offs that we made, but people have forgotten the positive benefits of
those trade-offs. The reason databases are statically typed is because that
saves storage space and processing time. The reason there's a standardized,
domain-specific language that you have to learn is because having to learn a
completely new API and mental model every time you want to access a data store
from a new vendor is inefficient. Oh, and the reason SQL is so complicated is
because _relational algebra is complicated_.

Sure, SQL and RDBMS have their limitations. They're not the right tool for
every job, and they are not even 100% perfect at the jobs where they are the
right tool. But too often I hear people saying "fuck SQL!" simply because they
don't want to learn it, and because they're too early on in their little pet
project to realize the scalability problems a NoSQL system is going to run
into that RDBMS solved 20+ years ago.

~~~
evgen
> The reason databases are statically typed is because that saves storage
> space and processing time

The problem with this argument is that when I can slap several TB into every
node in my db cluster the storage cost issue is basically moot. When it is
cheaper to buy ten small boxes than one beefy server the processing time and
query optimization issue also starts to come into question. At some point
soon, and we may have already passed that point, the cost of centralizing db
queries through a small number of expensive, highly tuned servers necessary to
maintain the pillars of the RDBMS model will be outpaced by the benefits of
abandoning both optimizations in favor of being able to use a small, cheap
fleet of boxes running a distributed database.

The tradeoffs that were made several decades ago when the RDBMS and SQL
ascended to their current dominance may have made sense at the time, but a lot
has changed since then. You may try to delude yourself into thinking that
there are "scalability problems a NoSQL system is going to run into that RDBMS
solved 20+ years ago" but given the fact that almost all of these systems were
designed with an eye on under what conditions RDBMS failed and in light of
decades of research into distributed systems and scalability that did not
exist when SQL/RDBMS emerged I find it hard to believe that there are
scalability problems that RDBMS solved that NoSQL systems will run into. Hell,
most of them were designed to solve specific scalability failures in big data
systems where RDBMS fell over and died.

~~~
wvenable
You're putting a lot of faith into this. The fact is that performance,
scalability, features, and correctness are a trade-off. RDBMS is optimized for
certain conditions and these NoSQL databases are optimized for other
conditions. NoSQL databases are horrible at things that RDBMS's do extremely
well and vice-versa. NoSQL isn't some magic discovery -- of course they're
going to run into problems that RDBMS solved years ago!

Your argument for small boxes seems to be the same argument for one large box.
My (extremely average) desktop has a terabyte of storage in it -- why do I
need a bunch of boxes running a distributed database? One big server is a hell
of a lot easier to manage than a fleet (a fleet!) of small boxes.

------
jawngee
<sarcasm> Yeah that would be totally awesome! Now our databases can't optimize
our query plans and any aggregate functions will be doing super cheap type
conversions between strings and number types!

And, really, who cares about data consistency? Sure that stupid little script
is inserting strings into what should be a numeric column, but I'm sure that
won't cause any problems at all when the monthly reports run, or the batch
processes that charge our customers run, because the database will know how to
handle such cases because the DB developers have written all that extra logic
in it for such cases. </sarcasm>

<reality> Stop being lazy. </reality>

~~~
reg4c
Data consistency could be maintained through stronger runtime input
verification which in turn nulls everything you said.

I don't think that it would be a bad idea to have dynamically typed databases
although it would put a little more strain on developing the actual
application instead of having to spend time on making the database.

~~~
elblanco
This works really well for embedded applications since the code logic _should_
be doing some type checking and verification before insert.

SQLite is a very well known relational database with dynamic typing, it's
quite possible the most deployed relational dB in the world since it runs in
everything from browsers to mp3 players and if the data inserts are well
controlled, don't really result in any particular problems.

~~~
prodigal_erik
> logic should be doing some type checking and verification before insert

There are a minority of coders who are actually anal enough to get that right
100% of the time in every version of every app that touches the database. But
those are not the people who are reckless enough to want to omit types from
the schema. If you think doing it the easy and reliable way isn't worth the
hassle, you are not actually going to do it the hard way, even if you can
convince yourself you will.

~~~
elblanco
Dunno why you replied to me, but I agree. In single app usage, in a well
controlled and well defined situation, it's probably "okay" and you can get
away from it. There are _some_ environments like this. But I agree that in the
general sense, especially with lots of apps banging on the same set of data,
it's bad form.

~~~
prodigal_erik
I just try to counter the argument "the database doesn't need to validate
because the app can do it" wherever I see it, because it just doesn't happen
that way even though it's technically possible.

------
neilc
_even with a statically typed database, type matching errors storing data are
only reported at runtime! (That is, our java compiler doesn’t check our MySQL
schema.)_

That isn't necessary the case: for example, PG'Ocaml allows Ocaml programs
that access a PostgreSQL database to have their type consistency checked at
compile-time.

<http://developer.berlios.de/projects/pgocaml/>

------
ct4ul4u
How does he get from the the text of the article, which approximately says "It
would be useful in some cases for a database to be dynamically typed" to the
sweeping statement "Databases Should be Dynamically Typed".

As others have noted, relational databases have a rigorous theoretical
foundation that is not consistent with what he describes as dynamic typing. He
should consider looking at a graph database. Many of these are billed as "RDF
Stores" or "Triple Stores". Franz's AllegroGraph is an excellent example.

------
parenthesis
SQLite is: <http://www.sqlite.org/datatype3.html>

~~~
wvenable
Hey, is there anyone out there that uses SQLite and actually uses it's "duck
typing" feature to some effect?

I'm using it, but I still declare my "column affinity" and store values with
the proper types. I don't put strings in my number columns, for example. I
can't even imagine a good reason for that.

------
bayareaguy
One situation that calls for a typeless database is when the system _must_
accept data from a source that's unwilling or unable to adhere to a well-
defined static schema.

~~~
timwiseman
Perhaps, but then perhaps a database is not the proper store for that at all,
and a relational database almost certainly isn't.

If you really must do it via a relational database, you could use something
equivalent to varbinary(max) to accept whatever stream of bits is sent in and
then parse them out into something meaningful later (or just store it for the
applications retrieval).

~~~
bayareaguy
X ∈ { database } does not imply X ∈ { sql database } or X ∈ { relational
database }.

Even if your data is not a good fit for the relational model, it's often a
good idea to rely on a database system offering features such as transactions,
an ad-hoc query facility, enforcement of access permissions and update
policies, replication/recovery, efficient I/O, etc.

~~~
timwiseman
All true and good points, but if you feel the need to say that perhaps I was
not being clear enough.

1\. In the situation described so far, a relational database is almost
certainly not the optimum solution.

2\. If you must use a relational database, you can mimic the ability to accept
arbitrary input using something equivalent to Varbinary(max) columns.

3\. A non-relational database may be the optimum solution in your particular
case, but perhaps it is not. Before using some form of key-value store or
object oriented database etc, think about non-database solutions. Depending on
precise circumstances perhaps simply writing the input out to a file or into
an ordered file directory structure is a better fit. It may or may not be
depending on further details.

------
ecq
you can use SYS.ANYDATA and SYS.ANYDATASET if you use Oracle. I'm sure other
databases support similar feature.

