
Sqlphobia: The Irrational Fear of SQL - fogus
http://www.javaworld.com/community/?q=node/4196
======
Zak
I can think of a couple reasons not to like SQL.

First, it's ugly. It's from an era when it was popular to try to make
programming languages look like English. Most people have since decided that
trying to make code look like English text is a Bad Thing.

Second, it is very common to create SQL queries by splicing together strings
in whatever language you're using for the rest of the application. That's a
horrible way to write any kind of code. ORM is, I think as much an attempt to
avoid that as an attempt to avoid SQL itself. CLSQL provides SQL semantics
with Lisp syntax that can be used directly in Lisp programs. I'm not sure what
other languages have similar tools, but that seems more like the Right Way to
me.

~~~
bobbyi
For python, the SQLAlchemy library provides excellent ways of programmatically
building SQL queries rather than splicing strings together. It also offers an
ORM, but that is an additional layer on top of the SQL stuff that you don't
have to use.

~~~
didip
The ruby equivalent for it would be Sequel.

------
jrockway
The problem is that SQL is not a high-enough level of abstraction over the
relational algebra. Application developers want to say "Give me all the blog
posts with 8 replies written by users with a karma more than 42." This is
pretty trivial to code if you are using real data structures, or if you have a
compiler from set arithmetic to SQL ("ORM"), but pretty difficult if you have
to write the statement manually. SQL doesn't abstract relationships, it
abstracts set operations.

Programmers want to avoid SQL for the same reason they don't want to write
their applications in assembler. Assembler doesn't abstract computational
abstractions, it abstracts away the CPU microcode. Since we want to write
programs in terms of computations and not CPU micro-operations, assembly is
tedious and pointless. SQL is the same way; we don't want to write apps in
terms of inner joins. We want to write them in terms of the high-level
relationships. ORMs allow this.

(Key/value stores are also the wrong abstraction, they don't abstract much
except physically writing the bits to disk. "Yay.")

~~~
bfung
>"Give me all the blog posts with 8 replies written by users with a karma more
than 42."

    
    
      SELECT P.*
      FROM blog.Posts P
      INNER JOIN (  --postid and count of replies
        SELECT P.PostId, COUNT(UserReplies.PostId) NumReplies
        FROM blog.Posts P
        INNER JOIN (  --Replies with users karma more than 42
          SELECT R.PostId
          FROM blog.Replies R
          INNER JOIN blog.Users U ON R.UserId = U.UserId
          WHERE U.Karma > 42
        ) UserReplies ON P.PostId = UserReplies.PostId
        GROUP BY P.PostId
      ) PostReplies ON P.PostId = PostReplies.PostId
      WHERE PostReplies.NumReplies > 8
    

After writing the query (3 min, while eating pizza), there's obviously much
room for optimization in table design and query design. I doubt the ORM'ed
developer would be able to see the optimizations not understanding how to
write the SQL (so they can reason about indices, aggregations, etc...).

My take is, most people fall into the trap of programming language X is THE
language of this application. If an application has a relational database
component to it, it'd be best if there was someone who understood SQL and the
connection points to the application.

~~~
jrockway
Programmers can reason about runtime cost without seeing the assembly. Why is
SQL any different?

------
DenisM
All the anti-sql bitching can be summed up in one sentence: "what do you mean
I have to learn a complex tool in order to use it?!". Yet somehow most
developer got over "what do you mean you have to learn about algorithms and
data structures before you can actually program?", but this other one seems
like an insurmountable obstacle to a great (and vocal) many.

It's really tiresome.

~~~
weavejester
I think you're putting up something of a straw man. There may be some people
who dislike SQL because it is too complex, but there are plenty of valid
reasons to dislike it as well. SQL is not a particularly well thought-out
language, and the relational model does not suit all problems.

~~~
DenisM
No solution suites all problems. Does it even need to be mentioned?

 _SQL is not a particularly well thought-out language_ Uhm, that not a very
substantiative argument - care to clarify? Bear in mind that the structure of
the language is 100% dictated by the Relational Algebra - form follows
function to the letter. And the very constraints that algebra puts on what you
can do are the same constraints that allow the system to function based on
your intent, so that you don't need to define or in most cases even chose the
algorithms. The only wiggle room you have in designing a language based on
relational algebra is the choice of tokens and ordering of language constructs
(e.g. FROM before SELECT of after). So?

~~~
weavejester
SQL is based on relational algebra, but does not strictly follow it. For
instance, each tuple in a relation is unique; but an SQL table can have
identical rows. So firstly, SQL isn't a language that is "100% dictated by
relational algebra".

But even if it were, the syntax is still poorly thought out. There's no
consistency in SQL; each command has its own grammar. You "SELECT ... FROM"
and then "INSERT INTO ...". Command syntax is effectively arranged at random.

Worse, there's no extensibility. Most programming languages, when faced with
the problem of manipulating data, make use of functions. Instead of having
specific syntax for selecting columns or filtering by a condition, a more
sensible approach would be to make them functions that lazily transform
tables. Then the user could define their own operations, and we'd no longer
need to generate a page of SQL each time we wanted to do something complex.

Next problem; integration with calling languages. SQL is not designed to
integrate well with ORMs or other client libraries. We either want a language
that's basically little more than a serialized AST, or take an approach
similar to CouchDB, and farm out specific calculations to some server or
library running our programming language of choice.

Finally; optimization opaqueness. It can often be difficult to predict how the
database server will optimize a particular piece of SQL. It might be useful if
we could drop down to more primitive, but more predictable, operations when
necessary.

In short, I think that if a relational database system were to be designed
from scratch by experienced programmers, it would look very different to the
SQL databases that are around today.

~~~
DenisM
You can achieve function-like behavior for vertical or horizontal slicing of
tables by using views - I use them all the time, and mine are updateable. Some
vendors allow parametrized views, e.g. Table Valued Functions in MS SQL (sadly
those aren't updateable I think?) and Common Table Subexpressions. Between
these three spaghetti SQL doesn't have to exist.

I spent a lot of time writing SQL and the ordering of SELECT ... FROM doesn't
bother me at all. Frankly, I find this argument silly and I do not want to
entertain it - when buying offroad SUV I will only compare the color of door
handles when all else is equal, which it isn't. The only serious part of it is
intellisense problems which admittedly could use some work (this is why LINQ
is the other way around - table name is always spelled first).

The other two points are entirely valid. The integration is a real pain. The
opaqueness can be mitigated by vendors as well - MSSSQL allows hints to force
desired behavior - but that doesn't so much drop down in abstraction level as
it bastardizes the existing one.

~~~
weavejester
It's not a matter of aesthetics; it's a matter of power. If each table
transformation is a function, the language becomes considerably more flexible.
Consider the following syntax:

    
    
        users where(u => u.login = "foo" and u.passwd = SHA1("secret" + u.salt))
    

Since "where" is a function, we can isolate just this transformation:

    
    
        authenticate(login, password) := 
           where(u => u.login = login and u.passwd = SHA1(password + u.salt))
    

Now "authenticate" is a table transformation as well:

    
    
        users authenticate("foo", "secret")
    

This is a trivial example, but this could just as easily apply to more complex
queries. One could construct functions to manage "nested set" tables, or
create custom aggregation functions. Everything that is currently difficult or
impossible under SQL.

Because SQL is so limited in what you can abstract, there's very little room
to factor out common code. Views, stored procedures and even T-SQL functions
just operator on set tables. With a more flexible syntax, one could have
third-party libraries containing more complex table transformations.

Personally, I wonder whether we need a separate language for database queries
at all. It would be nice if there were a just a library we could use to create
our own database server inline, using whatever language we wish.

~~~
DenisM
I see what you mean, and yes it would be nice. In essence what you advocate is
a table type and a number of tables conforming to that type could be then
passed into a function which operates over the parameter given what it knows
about it by the type description. It's a very appealing idea and many times I
wished it was possible.

My guess as to why it isn't possible is that in the database world there is an
assumption that all records of the same type lie in one table, i.e. the table
_is_ the type. And if you look at it form theoretical relational theory point
of view, that is so. Alas in practice we do have things like third-party
components and we have to worry about lifetime (e.g. transient table-typed
variables).

To be fair Oracle I think has something called "package" which allows writing
reusable code. Don't know much about it, but supposedly that's what it does.

Also, MSSQL has notion of Table Type which can be used to declare variables. I
actually implemented (a part of) it while I worked there. :)

------
protomyth
My problem with SQL has always been the optimizer. Without heavy hinting some
implementations (I'm looking at you Sybase) went off and did things in a very
non-optimal way.

When programming in a declarative language, you really need to trust the
action it takes make as much or more sense than a developer picking the step.

------
yesimahuman
Using the Django ORM my application is built much more quickly than if I were
to write SQL by hand. That matters more to me than most anything else,
especially since the performance is perfectly reasonable and caching helps for
the rest. Of course I still have to add indexes and certain
constraints/triggers by hand, but the payoff is huge.

One could also say that the more you know about SQL and RDBMS's, the better
you are able to properly use an ORM.

------
mistermann
I can't think of a single article I've read with an anti-orm stance that has
addressed what my issues are, and are successfully solved with an ORM.

For most ORM advocates, it is not a "fear" of SQL, it is a productivity issue.
A good ORM gives you an entity model with inter-object relations, a powerful
query language, in-memory object cacheing, and all sorts of other useful
things, for free. They can be a huge productivity boost. And for those "what
about this" scenarios that are always raised, a good ORM will allow you to
easily drop down into calling native storedprocs when that makes sense. (This
also applies to the supposed performance issue with ORM's)

I've read most of the anti-orm articles out there and I always get a
disingenuous feeling from the author, like they are knowingly not being
entirely honest.

A few other issues I've found with SQL: \- business rules or perhaps more
appropriately, behaviours, when implemented down at the stored procedure level
(which in a non-orm environment is exactly the proper thing to do) are largely
hidden from the calling source code. So when reading the code, much of what is
happening is not obvious unless one examines each SP. When using an ORM, this
logic can be brought up into a higher level of your application making it much
more visible and accessible.

\- in large projects, every time you need a certain subset of data (filtered),
or a different set of fields in the resultset, you end up writing yet another
stored procedure, likely just for that task. Often, a stored procedure may
already exist that would server your needs, but in a __large __project with
multiple developers, how does one know of its existence, and even then, find
it? I've found almost without exception that large projects start to degrade
into a quite disorganized mess that is very hard to understand for a new
maintenance developer. Of course, many of these problems __can __be avoided
with proper organization and discipline, but the fact is in many places, they
aren't. I find a good ORM largely prevents the problem in the first place.

------
yangyang
It's not just SQL the language. You also need to decide where you implement
stuff - in the database, or in the app. I've seen applications written where
almost all the logic is in stored procedures, triggers and functions, and
others where just straightforward DML is used. "Best practice" can vary
widely, from banning stored procedures altogether, to forcing all access to be
through them.

And the point someone else made about putting SQL together as text is very
valid. I used the Roguewave C++ libraries for generating SQL a few years back.
Whilst it was quite clever, and useful in some circumstances, most of the time
is was a real pain to figure out how to express SQL in C++.

------
bellybutton
Common programming idioms, for example iteration, are hard to express in a SQL
query, and I think if you try to force iterative ideas on this declarative
language the result is frustration or fear.

The need to be able to express yourself iteratively was a motivator for
Microsoft's DryadLINQ - a programming language for distributed computation
that has SQL-like constructs integrated into C#, giving you a high-level
language with some of the declarative benefits as well as some of the
imperative benefits (including more natural ways to express iteration).

~~~
DrJokepu
_I think if you try to force iterative ideas on this declarative language the
result is frustration or fear_

But fortunately we have (tail) recursion to fight back in most declarative
languages. Which is something that's completely missing from SQL. There should
be a way to express recursion in SQL queries in my opinion.

~~~
jamii
Datalog is more or less equivalent to SQL with recursion. I don't know of any
serious, database-backed implementations but there are a lot of in-memory
implementations and some interesting extensions for distributed databases.

<http://en.wikipedia.org/wiki/Datalog>

------
simonw
I've been using SQL (in the form of MySQL, PostgreSQL and SQLite) for nearly a
decade, and I still feel like I only understand a fraction of the capabilities
of those products. I've been using redis for a few months and I feel like I
understand almost everything it can do. That's the core of my SQLphobia - I'm
more comfortable with tools I feel I've mastered.

~~~
wanderr
Redis does less, so there's less to understand. It's a simple tool that solves
less problems. Less scary? Yes. Less robust? Definitely.

~~~
nzmsv
Smaller generally means more, not less robust. SQL databases do have the
advantage of decades of testing though.

~~~
wanderr
I mean robust in terms of being able to deal with a wider range of
circumstances. Perhaps smaller means that in general (don't really have an
opinion on that), but in the case of Redis vs SQL, Redis is less robust by
design. That's not necessarily a bad thing, it all depends on your needs.

~~~
nzmsv
I meant robust in terms of security and lack of other bugs. So I guess I'm
agreeing with you :)

------
gscott
All of this would be solved by throwing in x million records into whatever is
being developed. All problems will come out immediately and they can be solved
instead of blaming SQL/Relational Databases.

------
sid0
His comment about ORM is interesting. I'm about to start a small database
project for my class, and we've chosen web.py to work with. To us web.py seems
like the right level of abstraction -- it allows us to do simple things with a
nice pythonic syntax, but stays out of the way and allows us to use raw SQL
for joins and such. (I can't even imagine how a full-blown ORM would represent
joins!)

~~~
Lewisham
The way I think about ORM is that I'm navigating through a data structure. The
ORMs I've used (DBIx::Class, SqlAlchemy) represent joins as arrays, so if I
want to get all the Employees from the Company object, I just do:

employee_array = myCompany.employees

and out pops all the records.

It's easy to understand _in the base case_. Anything with any real complexity
to it is doomed to failure, and then you have to start monkeying around with
SQL in your mapper anyway.

I won't start a project with a pre-existing database with ORM, it's too
squirrely to get the mapping right. However, if I need some persistence for my
app easily, ORM provides a good fit.

