
Learn SQL, dammit - jpgjb
http://gun.io/blog/learn-sql/
======
zzzeek
The main point that one should know SQL as much as possible before using ORMs,
I agree with fully. The point that applications should be written by quick-
prototyping with an ORM, then replacing the ORM entirely with raw SQL, I could
not disagree with more. Since he is using my own ORM (SQLAlchemy) as his
example, I'd like to point out (as many of you know I always do) that
SQLAlchemy's entire approach is one of exposing the relational model at all
times. There is virtually no need to drop out of using the Query and/or
select() constructs into raw SQL strings, as these constructs can represent
SQL fully and more or less directly, not really any different in concept than
when the relational database itself parses an incoming SQL string into a
tokenized parse tree internally. The results returned are tuples. If your
tuples happen to line up with the attributes in your object model, then you
can tell it to get objects back. There is no sharp red line between "I'm using
an ORM!" vs. "I'm using SQL!". With a mature tool like SQLAlchemy you're using
_both_ , and the tool is there to _automate_ your work with SQL, not to
replace it.

So of course, learn SQL as fully as possible. But I recommend using an ORM
that allows you to make full use of your SQL knowledge at all times.

~~~
tjr
If you've already learned SQL and are comfortable using it directly, do you
think there is any reason that it'd be wrong to continue doing so?

~~~
jrochkind1
Any reason it would be 'wrong'? There are few things in software engineering
that are universally 'right' or 'wrong', it all depends on context.

There are plenty of people who already know SQL and are comfortable using it,
but still prefer to use an ORM though. It's not like ORMs are only used or
liked by people who don't know SQL.

~~~
tjr
Fair enough. Replace "wrong" with "generally considered a suboptimal idea".

~~~
jrochkind1
Still, 'it depends'. Yeah, there are all sorts of reasons, depending on
context (and plenty of people to argue all those reasons).

Personally, I agree with OP that nobody should ever use an ORM as an excuse to
not know SQL. You need to know SQL anyway.

So, once we agree on that, your question "Is there any reason to use an ORM if
I know SQL" (that's basically what you mean when you phrase it in the inverse
"Is there any reason it would be suboptimal to NOT use an ORM", right?) --
basically just boils down to "Is there ever any reason to use an ORM?".

A topic which is basically the equivalent editor/OS war of db-based web
development on HN or reddit. Meaning it's an argument that can and does go on
forever, and you can find in-depth treatments of in many other threads.

~~~
tjr
Thank you. I believe my question has been answered, even if I failed at
expressing it the way I intended.

------
pjungwir
For a while I used to ask interview candidates to explain the difference
between WHERE and HAVING, to see if they'd ever done anything beyond the
basics. I'm still not sure if that's too hard, but people who could answer it
did tend to do much better in the rest of the interview as well.

~~~
archon
This prompted me to go look it up, since I didn't know. HAVING is WHERE for
aggregate functions (SUM, etc).

Funny thing is, I've used HAVING a lot in the past, but couldn't have
explained the difference succinctly without cheating and looking it up.

~~~
zorbo
I deal with pretty complex queries daily, and I completely forgot what HAVING
did exactly. I guess it's because I long ago realized that if I had to use
HAVING in a query, I'm doing something wrong. That's almost always a
Reporting/BusIntel tool's job, not mine.

~~~
6cxs2hd6
That's a great point. Part of learning SQL is learning what parts of it _not_
to use. Things like HAVING and ORDER BY are useful for ad hoc queries at a SQL
command-line, but are either not useful or a waste when there's another level
in the system like a BI tool.

------
sehrope
If you really want to master non-trivial SQL learn to think about filtering,
joining, and combining data via set theory. Once you can logically break down
the set you're trying to get to the rest then just becomes learning/looking-up
syntax.

The other big piece of advice is the tried and true incremental approach. The
more complicated something is, the more likely I am to use the SQL client the
way one uses a REPL and incrementally write the query:

    
    
        1. Write basic SELECT
        2. Add another clause (WHERE filter, GROUP BY, etc...)
        3. Execute (syntax/sanity test)
        4. Finish or Goto step 2
    

Just like everything else in programming it's amazing how much simpler things
are when you just piece them together one step at a time.

[1]: <https://en.wikipedia.org/wiki/Set_theory>

[2]: [http://seanmehan.globat.com/blog/2011/12/20/set-theory-
and-s...](http://seanmehan.globat.com/blog/2011/12/20/set-theory-and-sql-
concepts/)

~~~
tom_b
I believe it's somewhat impossible to overestimate how REPL-like data munging
with SQL actually is. Particularly as the complexity of your data exploration
increases.

Celko calls this "thinking in sets."

------
redcircle
He left out an important reason to know and use SQL directly: you should
always be aware how you are using your indices. This goes beyond making an
index for a column in the table: you also need to know whether the query will
map well to a btree index. Mobile, with its low resources, makes index
optimization even more important; apple's Core Data can easily kill your app.

------
pjungwir
A class of query I love that scares off a lot of developers is a correlated
sub-query, where the subquery references a value from the outer query. For
example, finding all employees with at least one assignment:

    
    
        SELECT *
        FROM   employees e
        WHERE  EXISTS (SELECT 1
                       FROM   assignments a
                       WHERE  a.employee_id = e.id)
    

For a while in Oracle this was a _lot_ faster than IN/NOT IN. I'm not sure if
that's still the case, or if it's true for other systems. I believe I read
that in Postgres the query planner does the same thing whether you use
EXISTS/NOT EXISTS or IN/NOT IN.

EDIT: This kind of query is great with Rails scopes, because you can write
something like this:

    
    
        class Employee
          scope :with_assignments, where(<<-EOQ)
            EXISTS (SELECT 1
                    FROM   assignments a
                    WHERE  a.employee_id = employees.id)
          EOQ
        end
    

and that is easily composeable with other scopes/conditions/etc since it
doesn't force you to use any joins. Yay for mixing SQL with your ORM!

~~~
obviouslygreen
I recently discovered that MySQL before 5.6 (5.5 being the latest available on
RDS, of course) _does not honor indexes_ if they should be invoked for a
subquery, e.g.

select whatever from wherever where user_id in (select id from users where
somethingorother like '%lol%');

Got an index on user_id? Too bad. Ignored.

If you precompute the values, though?

select whatever from wherever where user_id in (1, 2, 3);

Sweet, I love indexes! I'll definitely use them.

~~~
michaelt
Let's say your inner query returns n rows and wherever contains m rows. And
your index is a B-tree, so you can hit it once in O(log m).

If you use a nested loop join, where you hit the index once for each inner
query result, that's O(n*log m). If on the other hand you do a hash join,
skipping the index but doing a full table scan of wherever, the complexity is
O(m+n). So which is the faster choice depends on how many rows the inner query
returns.

If you want to plan up front, before you know what m is, how would you decide
which join to use?

------
nsxwolf
Everyone uses an ORM. You use a well known, documented, and supported ORM, or
you're writing your own wether you realize it or not.

Don't believe me?

1\. Do you have objects? 2\. Do you have relational data?

There's the O and the R. How do you get them together? That's where the M
comes in. You use a library that knows how to do the M, or you do your own M
with a bunch of getters and setters, for loops and case statements.

Eventually, any little change to the database becomes a regression nightmare.

Once you find yourself saying "I know, I'll build a code generator to create
these DAOs", that's when you should finally realize you should have used a
real ORM. Sadly, many people still won't get it at this point and will go
ahead with the code generator.

~~~
papsosouid
>Everyone uses an ORM

Bullshit. You are making absurd generalizations based on your personal view of
how the rest of the world operates.

>1\. Do you have objects?

No, I do not. That makes it pretty obvious that I do not use an ORM doesn't
it? "Everyone" includes more than just people using OO languages.

~~~
nissimk
I don't know why you're being downvoted. Even when you're using an oo
language, there's not always a reason to force your query results into an
"Object." Frequently the only object you need is a 2 dimensional data
structure which could be a list of dictionaries, or a DataTable or something
like that. You don't always need a special named structure for the results of
every query. ORM's always seem to be designed for people who have a one to one
mapping between data structures in their program and tables in their DB. I
find that this very rarely makes sense, particularly when you're talking about
analytical applications.

~~~
nsxwolf
Great, you have a list of dictionaries of dictionaries. Now the user has
updated some of the items inside one of the nested dictionaries.

The user clicks save. Now what? You've got to get that change to the data in
the dictionary inserted into the correct place in the relational database.

You're going to write code to do by hand what an ORM wants to do for you.

------
mongol
I don't understand how it is possible to develop anything database-related
without knowing SQL. Are these people not querying the database directly to
figure out what it contains (during development, debugging, testing etc etc)?

~~~
plorkyeran
I've never seen a GUI frontend that managed to remove the need to write custom
queries while inspecting the data, but if you don't know SQL I could see
someone not even realizing all the things that can be trivially done with a
relational database, and just settling for manually cross-referencing tables.

------
untog
I once interviewed a guy with a _Masters_ in computing (of some sort, I
forget) who didn't know SQL. He'd been developing for years, but lived
entirely in .NET land and just used ORMs.

Absolutely crazy.

~~~
jmmcd
I have a PhD in computing (of some sort, I forget) and I know about 3 SQL
commands. I've been developing for decades, and I never need to interact with
databases. Like everyone in my field of research, I keep my data in text
files.

~~~
dragonwriter
> I have a PhD in computing (of some sort, I forget) and I know about 3 SQL
> commands. I've been developing for decades, and I never need to interact
> with databases.

If you were a developer that _did_ interact with databases (but not a DBA or
specifically a "database developer"), you could probably get by with the four
that correspond to CRUD operations directly (SELECT, INSERT, UPDATE, DELETE),
so knowing "about 3" isn't all that bad.

~~~
chris_mahan
Technically, an UPDATE is a DELETE and INSERT. so if you only knew SELECT,
DELETE, and INSERT, you could do crud.

~~~
dragonwriter
I was referring to what you could practically survive with as an application
developer who didn't control the databases involved; for the reasons
prodigal_erik states in their reply, DELETE + INSERT, while you can do
equivalent transformations to UPDATE with them, aren't going to be a general-
purpose replacement for developers of real-world applications against real-
world databases that they do not control.

~~~
chris_mahan
I agree with you. I was just saying that it is possible to do crud with just
three sql commands. Not friendly, I agree, but possible.

------
bayesianhorse
Most of my experience is with the Django ORM. The point of the django ORM is
not to replace SQL knowledge, even though that is feasible. The main point is
reusability.

I had to work with SQL through PHP for a while and I found myself "composing"
SQL queries in a myriad of ways. I tried to not repeat myself, but it felt
like the Django ORM would have gone a lot further in cleaning up the query-
building.

In conjunction with Django forms and Django Admin, maybe even the template
language, the ORM makes query construction reusable.

One of the kickers is the ability to unify object construction from table
columns. It's easy to convert a string or number to some Python field. It's
more elaborate with Decimal, Json, or whatever you want to cook up.

------
thelarry
How do you optimize your system if you don't understand the queries that the
ORM generates?

~~~
evan_
You throw more hardware at it, of course!

------
tomku
What are some good resources for methodically learning SQL? Like many of the
other devs I know, I learned a hodge podge of SQL while working on other
projects, but I've never had any formal or comprehensive training on it.
Ideally I'm looking for a book or two, and I don't mind if they start simple
as long as they're comprehensive and recent enough to be relevant to modern
RDBMSs.

------
addflip
Learn to write SQL views. It'll make you feel all warm and giggly inside.

~~~
habitue
Then learn the pain as everyone uses an expensive view in all sorts of queries
because it's easier, and they dont understand the performance characteristics
of joins well enough to understand why they shouldn't do that.

------
daigoba66
An ORM isn't necessarily complex. It just needs to map a tuple to an object.

Where things get complex is when your framework starts introducing other
concepts such as query generators, unit-of-work, caching, lazy-loading, etc.

One of the most critical features is query generation, which I think is the
point of this article. Simple queries are pretty easy to abstract, such as
loading rows by primary key or querying based off a simple index. Other
queries, especially aggregate queries, get tricky fast. I argue that often it
is much harder and more work to construct an appropriate query via your
frameworks query generator.

Fortunately many good frameworks allow you to essentially write the exact SQL
to be executed and the rest of the framework (mapping, caching, unit-of-work)
"just works" with the results.

------
overshard
Learn caching, dammit

~~~
rheide
No. First, learn the internals of your system. Learn SQL. Learn NoSQL.
Understand why your software is slow. Then, once you've optimized the worst
case away with decent SQL and/or a good object-oriented architecture. Then and
only then should you start with caching. Too many developers these days are
just brainless monkeys who dump everything in memcache.

~~~
jeffasinger
A coworker mentioned to me that she came in early every day so she could run a
report that took about an hour and a half before the CEO got in.

I looked at the code, nothing looked THAT bad, so I did an EXPLAIN, noticed a
missing index, added it. I ran the report in 4 minutes.

Clearly, whoever wrote that report didn't know nearly enough about SQL.

------
Demiurge
The article has the good premise, of course you should learn SQL, but it goes
over the top and makes some false presumptions for the sake of the argument:

'Think about it, though: it’s absurd that you would even need to learn any SQL
at all! The very nature of an ORM is to bypass SQL.'

I think the nature of ORM is to map object oriented code to relation data. So
not to bypass, but to pass between the two conveniently. Conveniences that
ORMs do automatically that otherwise you do manually are type checking, sql
sanitization, and merging logic (methods) with the data in the same class.
Knowing SQL does not make the above tasks any easier, so does not, in any way,
prompt dropping ORMs.

Also, there is the wrong use of 'begs the question' in the same paragraph :)

~~~
derleth
> Also, there is the wrong use of 'begs the question' in the same paragraph :)

That was a bad translation from day one, and people are now using the phrase
in a way that makes more sense.

------
f4stjack
This was also discussed in Rob Conery's speech on Norwegian Developer's
Conference:
[http://ndc2011.macsimum.no/mp4/Day2%20Thursday/Track1%201140...](http://ndc2011.macsimum.no/mp4/Day2%20Thursday/Track1%201140-1240.mp4)

------
Trufa
Google cache/mirror: <http://goo.gl/3tclG>

------
baddox
> SQL is faster (because an ORM is an abstraction layer on top of SQL)

You can't make this stuff up.

------
jrochkind1
I am not an ORM hater. I like ORMs a lot.

But the point that one needs to know SQL even if one is using an ORM is just
obvious to me. It boggles and scares me that anyone thinks they can be a
competent web developer without knowing SQL.

I was going to say "...if they use an rdbms, maybe they just use some NoSQL
and can get away without it." But you know what, nope, not even that caveat --
if you don't know SQL and rdbms, you aren't going to be competent to know if
some nosql is right the choice, or which one, either.

------
lucb1e
So I know SQL... but what's that ORM he assumed I'm familiar with?

~~~
antoko
Object-relational mapping.

<http://en.wikipedia.org/wiki/Object-relational_mapping>

------
jamesmiller5
For those interested in sharpening their SQL skills I have found these two
books to be a great resource.

1\. Joe Celko's Trees and Hierarchies in SQL for Smarties 2\. Joe Celko's
Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL

~~~
jrochkind1
Joe Celko's more general "SQL For Smarties" is how I learned SQL, it is a
great book (or, at least, was like 3 editions ago)

------
Tycho
The nice thing about SQL is that it doesn't take very long to learn how to use
it and then it's incredibly useful any time you have a database to interact
with. I think all novice programmers should take a crack at it.

~~~
dragonwriter
More generally, I think that learning about the relational model (which, in
practice, probably means learning to use SQL, even though SQL is in many ways
far from ideal) of data is fundamentally mind-opening in how you understand
and deal with data and makes you a better programmer _in general_ , even when
you _aren't_ interacting with an external relational database.

------
moron4hire
I see no reason why one programmer can't learn just about everything related
to their application. I expect the programmers who work for me to be experts
in SQL, CSS, and everything in between.

~~~
benjamincburns
Everything?

[https://plus.google.com/112218872649456413744/posts/dfydM2Cn...](https://plus.google.com/112218872649456413744/posts/dfydM2Cnepe)

~~~
moron4hire
nobody likes a pedant.

~~~
benjamincburns
I truly wasn't being pedantic. Pretty much every project I've worked on has
required me to learn some new layer of technology that I never even knew
existed. Sometimes it's just too much for one person and you have to rely on
the strengths your team as a whole. The idea that a single developer can
rapidly learn all of the layers/technologies affecting his/her project is a
fallacy at best.

[Edit: However, in the spirit of the OP it's perfectly reasonable to expect
your devs to know or be able to learn SQL.]

------
krsunny
I dont understand how its possible for someone to be the "best python
programmer you know" yet that person doesn't know enough SQL to display data
from a table...?

~~~
dragonwriter
> I dont understand how its possible for someone to be the "best python
> programmer you know" yet that person doesn't know enough SQL to display data
> from a table...?

Because knowing SQL doesn't make you a better Python programmer. It might make
you a better _application developer_ , but SQL knowledge is not a subset of
Python knowledge.

~~~
krsunny
Agreed, however It still seems truly odd to me.

------
xradionut
Learn accounting, too!

~~~
chris_mahan
Don't forget econ and statistics.

