

The Database Rant - arohner
http://arohner.blogspot.com/2009/04/db-rant.html

======
andr

        SQL was invented in the 70s for business analysts to write reports without having to go bug the programmers.
    

So what? Unix was invented in the 70s because Ken Thompson wanted to play
Space Travel. [1]

    
    
        Queries are not composable.
    

Per your example, you want to throw out SQL because using AND and OR doesn't
feel right. I'd love to see you compose a 10-line SQL query. Also, views ARE
query composition.

    
    
        PL/SQL is primitive by today's standards, yet it's the only game in town for certain classes of features, like stored procedures and triggers. 
    

Untrue. Oracle, the creators of PL/SQL, now let you write stored procedures in
any JVM or .NET language (so Jython, Clojure, etc.). MS SQL supports .NET
stored procedures. PostgreSQL supports Tcl, Perl and Python. Also, your
alternative suggestion of a simple and stupid data store DOES involve fetching
all the data from the database into RAM and operating on it in your
application. You can already do that with SQL.

    
    
        It's not obvious which queries are slow.
    

Untrue. MySQL can log slow queries and help you analyze them in real time
using tools like mytop. I assume similar tools are available for all
databases. When you write code it's not always obvious if it will be slow,
either.

    
    
        No back door for map, filter, reduce
    

Isn't the WHERE clause filter? Isn't GROUP BY very similar to reduce? SELECT
UPPER(firstname) is map? With the ability to define custom functions in
different languages that's a fairly powerful tool. Also, comparisons between
MapReduce and SQL are misguided, because MapReduce does not operate on tabular
data.

Finally, your solution is based entirely on the idea that people will prefer
Lisp to SQL.

SQL is not perfect for everything, but if you want to suggest a better
alternative, start with a solid argument.

[1] <http://en.wikipedia.org/wiki/Unix#History>

~~~
davidmathers
_if you want to suggest a better alternative_

How about an RDBMS that exposes a relational algebra api and then people can
experiment with developing alternative languages/interfaces instead of having
a single designed by committee string-eval based atrocity.

~~~
sokoloff
What percent of the top N (your pick of N and your choice of big-ness metric)
websites do you suppose are using predominantly string-eval'd SQL? I don't
think it's very many; I'd wager that most big shops are quite happily using
SQL, using stored procedures, views, and parameterized queries far more often
than using "jam SQL into an nvarchar and execute the resulting string".

SQL may have its warts, but in the world of "I gotta get things done and not
sweat data persistence", I think it's pretty wildly, successfully, and fairly
universally used, almost like the cockroach of the data center.

When I look out and think about what's going to kill our ability to take
orders, scale to new levels of growth, or to have development be efficient at
developing new features, SQL the language doesn't make my top 20, and probably
doesn't even rise to the level of "all other receiving votes". MS-SQL the
database engine probably does make my top 10, but that's only because we're
continually chopping away at the items on that list. It does take us a while
to train brand-new devs on how to write effective SQL, but it takes us time to
train them on how to write C#, how to write good test cases, how to structure
code for testability and write unit tests, etc. IOW, it's just one of a
laundry list of needs, and a proposed alternative interface would also be
something on that list to teach them.

------
neilc
_Current databases don't give the developer a way to bypass the optimizer._

Sure they do. Oracle and other commercial DBs provide "planner hints", which
essentially force the optimizer to use a particular access method / join
method for a given part of the plan.

 _why does SQL decide that declarative style is the One True Way to get access
to your data_

Well, part of the reason for a query optimizer is that if the physical
representation of the data changes, the queries don't need to change. So that
if you add or remove an index, you don't need to adjust all the queries on the
table, for example. Obviously the query planner is imperfect, but planner
hints are fairly common in practice. The further you stray from declarative
queries, the more application changes are needed when physical database
properties change.

 _Datatypes don't match your PL_

If you want a strongly-typed database system that can be accessed from any
language, you need the DB's type system to be distinct from any of the client
language's type systems. This isn't always the right tradeoff, of course.

~~~
arohner
_part of the reason for a query optimizer is that if the physical
representation of the data changes, the queries don't need to change._

And that's a great reason to use the declarative query as the default. What
I'm complaining about is when you need to eek out the last bit of performance
out of a query, there should be another option. Sort of like dropping into
assembly to optimize a tight loop. Of course you don't want to write the
entire program in assembly.

 _you need the DB's type system to be distinct from any of the client
language's type systems._

I don't agree. Just because your DB's type system uses say, Java, doesn't mean
that other languages can't access the DB. Non-Java languages would access the
DB the same way they do today, but at least the Java developers wouldn't have
to convert.

------
vomjom
I was all prepared to rant about how wrong he was, but he convinced me.

I've had all kinds of problems with PostgreSQL because its planner would
choose a poor query plan in certain situations. Specifically, we don't have a
good way of estimating random page fetches for index scans on multi-column
indexes.

I also had this weird problem suddenly occur. When one of my tables got to a
certain size (about 40 million rows), Postgres started giving a completely
different query plan that it thought was faster. The query execution time went
from a few minutes to many hours. It took me all day to debug this problem
because you simply don't expect the query plan to suddenly change like that.

To tell Postgres that you want the query to run in a different way, you must
turn off certain things, like disabling sequential table scans or bitmap
scans. You can't tell it specifically how you want it to execute. The reason
is pretty simple: SQL is a declarative language. You declare want results you
want, and the server figures out how it'll get you there.

I think the author is right. We need a simple language where we can tell the
server exactly how we want to get the result. This is essential, because, like
the author said, finding the optimal query plan is NP-hard. In practice, most
programs run only a few queries. It'd be worth the time invested to try out a
few query plans and find out which one is the best and stick with that over
having the SQL server replan the query almost every time you submit it.

There could even be third party planners where it could take SQL input and
return the resulting executable s-expr if you want to retain that
functionality. This way, the development of planners remains separate from the
development of the database.

------
russell
Any attempt to throw away SQL is misguided. If you are going to do server side
development, you must know SQL. Thinking in sets is hard work, but you must
learn it. For the most part fancy ORM's just get in the way. The solution is
not less SQL; it's more SQL. Stored procedures are your friends. When I write
a complex query, I develop and test it entirely in SQL, before I write a line
of procedural code. It's way easier.

The thing about query optimization is that the optimizer stops when it thinks
going on will take longer to perfect it than executing what it has. If you
have a query that takes too long look at the query plan. Break up the query
into smaller queries and use temporary tables. Give it hints.

Dates are often poorly handle in language libraries. Store them as UTC in the
database and pound them into shape in code.

Apologies for the counter rant.

~~~
davidmathers
_Any attempt to throw away SQL is misguided._

Maybe not throw away, but why should we have only 1 relational language?
Functional programmers can choose between scheme, ocaml and haskell. Why
should we be stuck with what is in so many ways the COBOL of relational
languages as our only option?

~~~
russell
What I meant was that, as far as I have seen, the tools that try to replace
SQL by some OO intermediary, give you poorer abstractions that limit what you
can do. SQL seems clunky, but I think the core is quite well done considering
when it comes from.

OTOH a relational relational database doesn't fit all needs. They scale pretty
well, but when you get to gigantic data sets, they don't do so well and things
like map reduce do a better job. Or cases where you are streaming massive
amounts of data and you need performance, not ACID transaction reliability.
Actually, we are more in agreement than disagreement.

~~~
neilc
_They scale pretty well, but when you get to gigantic data sets, they don't do
so well and things like map reduce do a better job._

I don't think the evidence supports that. See the recent MR vs. parallel DB
comparison, for example: <http://database.cs.brown.edu/sigmod09/>

There isn't a good massively parallel open source SQL implementation right
now, but that is just a matter of engineering -- SQL databases have been shown
to scale to massive data volumes (e.g. Fox Interactive have ~200 TB of user
data in Greenplum, which is a parallel DB based on Postgres).

------
yummyfajitas
His point about composability is not a complaint about what sql does. It's
merely a complaint that he thinks that function composition should be used to
combine predicates, but that isn't how sql works.

I.e., he wants "SELECT * from X where P(x)" to be a function acting in X. It
isn't a function (or at least should not be thought of that way), it's a
description of a set:

{x : x in X, P(x)}

The _predicates_ ARE composable:

{x : x in X, P(x)} intersect {x : x in X, Q(x)} == {x : x in X, P(x) and Q(x)}

~~~
arohner
Author here. I was thinking in terms of an entire query being a function that
returns data.

Also notice that your set notation demonstrates composability, but the
equivalent SQL does not.

~~~
neilc
How is your example SQL "not composable"?

 _Notice that this solution is composed of the first two queries, however
composing in SQL kills your performance, and this is generally not the right
way to structure the query._

It doesn't actually "kill" performance at all, as a typical SQL optimizer will
rewrite the query to remove the subquery, or else to push the outer predicate
down into the subquery (which for this query accomplishes basically the same
thing).

 _I was thinking in terms of an entire query being a function that returns
data._

A query returns a relation, and that relation can be used as the input
relation to another query, either via a subquery or via views. That seems like
pretty powerful composability to me.

~~~
davidmathers
_A query returns a relation_

Actually it doesn't. That's one of the problems with SQL and part of the
reason SQL databases aren't truly relational databases.

~~~
neilc
_Actually it doesn't._

Can you elaborate on what you mean? The presence of duplicates is orthogonal,
if that's what you're referring to.

The point is that the output of a query can be consumed by another query
(either via views or subqueries), which enables query composition and logical
data independence.

 _SQL databases aren't truly relational databases._

People who get hot and bothered about that are almost invariably cranks, in my
experience.

~~~
davidmathers
_People who get hot and bothered about that are almost invariably cranks, in
my experience._

Honestly I think you, being a database expert, don't fully appreciate how bad
the situation really is. Relational databases are almost universally not
understood by programmers. Why? I think it might have something to do with the
fact that programmers can't actually use the relational database as a
relational database.

They have to programmatically _construct a string_ (seriously, wtf?) to do
some relational processing "over there" in the scary database and then
_serialize_ the result and import it into a data structure in their language--
the complete destruction of the relational idea.

Using a relational database should be similar to using jQuery in javascript
programming. The anti-"relational database" backlash is huge right now. And
since most of the people involved don't understand what a relational database
is they don't understand that it isn't relational databases that they have a
problem with.

Have you seen this new thing called Redis? It's "like a key-value store except
values can be sets." Wait, it's a "structure server." Pretty soon someone is
going to have the brilliant idea of letting the keys be sets also.

I predict that within 3 years the anti-relational people are going to invent
the relational database, without realizing it, and be amazed at how awesome it
is. Because for the first time in their lives they'll be able to actually use
an actual relational database.

That's my database rant.

P.S. As for the SQL != Relational people like Chris Date and Hugh Darwen I
don't see how they could be considered cranks. Bitter old men? Maybe. But
cranks? No.

~~~
silentbicycle
Good point, about redis.

I think part of it, also, is that a lot of peoples' exposure to relational
databases is via MySQL and PHP, which is ... not representative.

Most people don't understand why the N-normal forms are significant, etc.,
because they learned SQL from a simple "how to save and load rows" kind of
tutorial, rather than learning it from the big relational ideas outward. (I'm
a "learn how it works as a whole" guy, but I recognize that most people
aren't.) If you don't know why they matter, it just seems like a bunch of
bizarre busywork, and then you get lousy performance.

------
iigs
Have you looked at <http://en.wikipedia.org/wiki/MonetDB> ?

It exposes some of the DB guts at a different layer while leaving a SQL
"porcelain" feature available for people who desire that.

------
jrockway
It sounds like his ideal database already exists; AllegroCache. It's an object
database, but you can easily restrict yourself to certain data types to make
it "relational", if that's what you're after.

Some of the open source databases are almost as featureful; Elephant for CL
and KiokuDB for Perl come to mind.

------
jasonkester
I never went through this phase myself, but I see a lot of developers stuck in
it. It's this sort of exasperated anger towards a technology that the author
has tried his best to understand but failed.

For some reason some people never really wrap their head around Relational
Databases, Normalization, and SQL. They seem to not be wired for it. And it
understandably pisses them off.

It's people just like this author that are behind all those terrible Object
Relational Mapper projects that try to pretend that your data isn't really
data. It's just an attempt to escape the obligation of truly understanding how
databases work and how they can help you. It's sadly ironic when you think
about it.

~~~
arohner
No, I'm pretty sure I understand the technology. I've written thousands of
lines of queries, triggers, stored procedures. I've read the source to
Postgres.

In the article, I specifically say ORMs are a terrible idea. I specifically
say Relational databases, data integrity, foreign key constraints, declarative
queries and ACID are all good things. I forgot to mention normalization, but
that's good too. My point is that SQL is a terrible interface to get to all of
the good things about RDBMs.

------
zandorg
I recently made a GUI for a search utility which is filled in, and then it
builds an SQL query from the GUI, and then displays the results of the query
in another GUI.

My point is that you can map a search to whatever language you like -
whichever is best for that utility.

Plus a friend told me about Linq, which is an interesting evolution from SQL.
It lets you break up queries over several cores (though that's just because
it's iterative in nature).

------
vozoscuro
how about a 10+ table join on f(any function)?

------
TweedHeads
Those who complain about SQL just haven't used Oracle, fully.

