

Two common mistakes when using databases - Rexxar
http://caml.inria.fr/pub/ml-archives/caml-list/2008/02/b75fe4f240d3a3985e50f9bd23013579.en.html?submitted_two_years_ago

======
aaw
A well thought-out post and a good read, but both of the author's points about
the limitations of relational databases are a little misleading.

The first, "Every single element in a relation (aka table) has to be exactly
the same type" is true in many relational databases, but isn't necessary:
SQLite, for example, is implemented with a nice form of dynamic typing where
individual columns can take on multiple data types.

Second, "SQL isn't even Turing-complete" is sort of a folk theorem in computer
science that isn't necessarily true anymore, especially with all of the
proprietary SQL dialects in existence: the original standard, SQL92, is
basically equivalent to relational algebra, which has the expressive power of
first-order logic - it's so weak it can't even express concepts like graph
reachability. But constructs like recursive queries have been added to the SQL
standard since SQL92 which may make even standard SQL Turing Complete.

~~~
gaius
PL/SQL and T-SQL are certainly Turing complete.

~~~
samuel
AFAIK PL/SQL is not "Oracle's SQL". It's executed by a different engine and
you get context switches when you execute SQL Code (don't know about T-SQL).

But Oracle's SQL is by itself Turing complete since the introduction of the
"Model" clause (see here:
[http://www.adellera.it/investigations/nocoug_challenge/index...](http://www.adellera.it/investigations/nocoug_challenge/index.html)
how you can compute the FFT using it).

Moreover, I read somewhere that the engines that implement the "recursive
with" are also Turing Complete, but can't recall where.

------
rbranson
This seems like an aimless rant. Rails (and Hibernate to my knowledge) uses
SQL for exactly what he recommends: filtering, aggregation, and joining. The
RDBMS is a very real and concrete concept in ActiveRecord. If you don't know
SQL, it's going to make it difficult to get beyond the most basic ActiveRecord
functionality.

~~~
fauigerzigerk
ActiveRecord and Hibernate can certainly do everything SQL can do. But the
modeling and data access patterns they encourage are inconsistent with the
strengths of relational databases and the relational model.

The point of using a relational database system is to think in sets of tuples
because doing that makes life simpler (for some types of tasks). You apply one
of a handful of set operations to a set and you get back another set of
tuples.

Once you start assigning types and custom operations to particular
combinations of attributes, you lose this recursive transformability and hence
the simplicity of the model. What you get is a horribly complex monster.

The whole OO idea is a tangled mess. ActiveRecord and Hibernate do their best
to help you view the database as a tangled mess as well.

~~~
ubernostrum
Well.

I've written at length about this in the past, and basically my opinion comes
down to there being two ways to approach this. One is driven by the
application, the other is driven by the database.

The application-driven approach is, I believe, the vastly more common use
case: you have some _things_ in your code (objects, data structures, whatever)
and want some way to persist them and fetch them back later according to their
properties, so you use a SQL-based DB because, hey, you can get them for zero
monetary cost (PostgreSQL, MySQL, SQLite, etc.), every language has ways to
speak SQL and every developer on your team knows SQL.

The database-driven approach is, in my experience, less common: you start with
the database, design it meticulously, take full advantage of the relational
model and the features it and your DB offer, build as much as possible in the
DB layer and then let people write applications which talk to it.

Both you, and the author of the reply in the email this thread links to, seem
to inhabit the latter use case. And that's fine; I know people who do that
stuff and I respect it. Where the problem comes in is that you have a tendency
to assume that _your_ use case and _your_ approach are or should be the _only_
use case and approach. Which is, frankly, wrong. There are plenty of
situations where the application-driven approach does just fine and will take
you a very, very long way (and when it breaks down, it probably will not break
down because you need to switch to the database-driven approach; it'll break
down because of other things), and where the database-driven approach really
isn't a great fit (for various reasons).

~~~
fauigerzigerk
I don't disagree with your application driven vs data driven distinction and I
agree that it's impossible to say in general which one is better. My
experience is that it is rare for an application to stand on its own for long.
Data has a longer life span than application code, but anyway, both scenarios
do exist and both are legitimate.

But making this distinction says little about what are good design principles
in order to reduce the complexity of a system. My opinion is that data and
relationships among data items should be represented uniformly on the
application level as well as on a broader data management level.

Creating one API per combination of attributes leads to a combinatorial
explosion, an increase in coupling and unnecessary mental load. It makes
generic transformation of data and querying very difficult indeed. We need to
work with few general purpose data structures that are easy to reason about.

And no, I'm not assuming that my approach should or even could be the only
one. But I think the standard reply of "one size doesn't fit all" or "use the
right tool for the job" has become all too fashionable. There are many tools
for the same job, so I have to have an opinion and I have to choose. I'm not
an authoritarian person at all, so I don't care the least if you make a
different choice ;-)

~~~
ubernostrum
Except you're falling right into that worldview.

You seem to be arguing for databases which exist and are structured
independently of any applications which happen to access them. Personally, I
think this makes about as much sense as arguing for, say electrons to have
well-defined properties independently of anyone trying to measure them, which
is the same as saying that there is no such thing.

You also seem to have trouble accepting that there may be lots of situations
where there is exactly one application, and if that application goes away,
then so does the company (or the department, or the project). In those cases,
I don't see much value in trying to make the database be independent of the
application; the database exists to serve that application, and if they happen
to be tightly coupled to each other, so be it: sometimes that's how you get
something to work.

------
didroe
I liked the OPs reply to this message:

>Thanks for your reply. It was quite interesting, though I get the feeling you
used my question solely as a trigger to share with us a long-held
dissatisfaction with the current state of affairs concerning the use of
databases, regardless of whether it actually applies to my particular problem.

------
Kirby
For beginners, this is definitely good advice. Particularly the first point -
if you're using a relational database, and don't structure your data around
its strengths, you'll take a profound performance hit. And sure, don't pass
around data structures if you don't know what you're doing, and MySQL is a
pretty crummy place to put them.

However, once you're doing real work, sometimes translating to XML and back is
extraordinarily expensive.

The best approach is a hybrid. Use the database to store things relationally
if possible, and using defined APIs for sure. And if your translation stage is
expensive, use something like memcached to make sure you do that translation
as infrequently as possible. _This_ is the layer that it's appropriate to
store serialized data structures at. It's not permanent storage, you can blow
it away when you change the internal structures, and nobody external is
relying on it. But you end up, in most programs, with even more speed benefits
than if you'd stored it in the database like this to begin with - the initial
build can be expensive, but then you're reading basically from memory. (Not
all data is well suited to this approach, but if your data is read frequently
and written to infrequently - there's few things you can do to increase
performance more than this.)

