
"I've isolated the bug to a database query" - dmarinoc
http://thedailywtf.com/Articles/The-Query-of-Despair.aspx
======
kabdib
One company I was at did a merger with another startup, and most of the other
company's engineers quit. Amongst the piles of Visual Basic we found a stored
procedure that was about 5 pages long. It took about 18 hours to run; its job
was to do a daily report.

I hate being afraid of code. I spent a day with it, got to understand it, then
rewrote it as a couple of queries and some Java code, whereupon it took about
five minutes to run.

[... then there was the guy who implemented bitwise AND and OR by precomputing
some 65536 entry tables. Wow. Why do I find all the really howling bad stuff
so close to databases?]

~~~
tom_b
"Why do I find all the really howling bad stuff so close to the databases?"

Because the database and queries against were were coded as an afterthought.

Because the programmers ran all the db access tests against tables with 4 rows
of data when the customers would windup with 100 million rows in production.

Because most undergrad education around databases is poor and antiquated.
Plus, it is not SQL focused - much time is spent on talking about data
modeling. Not wrong, but not often helpful in practice.

Because DBA's are (rightly) focused on making sure your db is properly backed
up, replicated, and fault-tolerant. They may not even understand SQL queries
beyond the basic one table select.

Because the users will force data into the systems that the developers never
anticipated.

Because db jobs are segregated by product for _development_. If you've done
SQL coding on Oracle for the last 10 years, good luck applying for a job doing
SQL coding on MS SQL Server.

Because SQL hacking doesn't get much respect. I've been in job interviews
where deep into the process they ask me a question with data in two tables
that requires an outer join. And answering correctly differentiates you
_significantly_ from other applicants. I guess it is the "fizz buzz" of SQL
these days . . .

~~~
endersshadow
I work with large databases every day. I cannot tell you the amount of people
I've interviewed that weren't able to solve a left join question. Let's leave
right and full outer joins out of the discussion--a left join is as advanced
as I dare get off the bat in an interview.

I've once had a guy I worked with ask for help when he was with a new company.
I told him to look at the query plan. He replied, "Yeah, but that doesn't tell
you much."

Relational databases are oft-misunderstood and underestimated time and time
again. Where are the Codds of yesteryear?

~~~
matwood
I've never understood the typical programmers aversion to query writing.
Writing SQL is in general pretty easy (basic set theory).

I'm guessing it's all about what people have experienced. I have been coding
against RDBMSes since my first job while still in college. I have even read
Celko for fun at one point :)

IMHO, learning the _hard_ parts of SQL only come from experience with
particular RDMSes and dealing with large datasets. For example, knowing that
the only way to get a query to run against a large Teradata table without
running out of spool space is to create and use temp tables only comes as a
result from trying to run the query.

~~~
andymurd
One of the biggest hurdles that typical programmers must overcome is that SQL
is a declarative language, and therefore quite different to the imperative
languages they are used to.

~~~
gaius
Imperative programmers all secretly believe in a declarative style, that's why
they compile with -O and hope the compiler's "query optimizer" will make it
fast for them.

------
DanielBMarkham
Just so folks know, there are tools that will decompose queries and make nice
little pictures out of them. With something like this, you'd have to use it
just to get started.

Once you've visually decomposed it, you'd physically decompose it by splitting
it inside-out. Then proceed to understand and debug inside-outwards.

Not fun, but not impossible. Just a huge pain in the ass. Making it more fun
would be a database with bad RI, nulls, and duplicate data all over the place.
Don't get me wrong -- from looking at the image it definitely looks like
aspirin will be required. :)

~~~
offbyone
I've always believed those tools must exist, but sorting out the SEO crap and
advertising copy that gloms up search results for them is painful. Can you
name any Linux/Mac tools like that?

~~~
DanielBMarkham
Not linux, but I have a windows example. Being an El Cheapo, what I do from a
windows box is fire up MS Access, link into the database (and this works even
if the database is on a *nix box somewhere), then drag the text of the query
into the graphical query builder tool. It's taken some pretty complex stuff
apart for me in the past.

It's been a while since I last deep-dove in a complex database, so I don't
have any other examples handy. Sorry. Maybe somebody else can pick up the
thread. I remember reviewing a bunch of them several years ago -- these tools
have been around for a long time.

------
MattBearman
I once worked on a site where the original developer clearly didn't know joins
existed, so if he wanted data from two related tables, he'd get all the
required results from table one, then loop through them, one by one, querying
table two for the corresponding record. Sometimes this went 3 or 4 tables
deep, the site would take nearly a minute to load a table of products.

~~~
gibybo
I see this pretty often with people who haven't spent much time with
relational databases. I suppose it's understandable since thinking in sets
with SQL is a different paradigm than they are used to, but I even see tons of
PHP/MySQL tutorials online that use this method when they should be using a
join.

~~~
flomo
In the old days, MySQL didn't support foreign key indices, so the LAMP
developer community had this mentality that "Joins are slow" or even "Joins
are evil" and actively encouraged people not to use them. So its not really
surprising this kind of thing is still out there - PHP tutorials are somewhat
infamous for promoting 'worst practices'.

~~~
jaylevitt
Yeah, I know a Rails developer who thinks four-way joins are a design smell.

------
bmf
I'm currently reading "Mastering Relational Database Querying and Analysis" by
John Carlis, which posits that SQL is inherently flawed for several reasons.
To paraphrase from the text:

First, both the syntax and the way querying is generally presented in
textbooks, lead you to think that your task when querying is to _display one
unnamed table_. The author objects to each of those four words.

Second, many people have found querying with SQL terribly difficult. Even
experts find SQL hard to create and read. Do not be surprised if an analyst
struggles to understand his/her own SQL. It is impossible for users to
understand any but the simplest SQL.

Third, SQL practice suffers from the notion of a "correlated query" -- which
has a monolithic subquery that is executed repeatedly via looping, once for
each value of a candidate row picked by an outside SELECT.

The book has much more to say on the topic of SQL before going on offer
relational algebra (built on top of SQL) as a n alternative.

~~~
samuel
I disagree with the readability part. I do heavy use of "WITH" to name my
intermediate steps and comment the tricky parts (as I would do with any other
programming language) and my colleagues find them pretty readable(or that's
what I'm told). That's how I "reverse engineer" such monster queries,
refactoring them in intermediate relations with names. Pretty often the same
subselect is used more than once.

In fact, due the lack of side effects it's much easier to do than with
procedural code.

~~~
niccl
Doh! I must be one of the SQL bunnies everyone else is superior to...

I didn't know about the WITH statement. Thanks for enlightening me.

------
protomyth
A lot of the problems I have seen with queries (other than DBA issues) is the
conflict between application developers and report writers. A lot of databases
are designed for transactions and resources are not often available to do a
proper reporting database or at least summary data.

I have a very simple rule for myself - "if a user of the application is
concerned about a certain attribute or state an element (e.g. person, truck,
plane) is in, then a report will be required showing all elements with that
attribute or state."

If your database design cannot support that rule, then trouble will happen and
you will have serious performance problems.

To give a simple example, suppose you are running a group of storage garages.
You have a table with all your customers, a table with all your storage units,
an assoc table joining customer and units with active flag + date of start,
and a table with all your payments. Good enough to do transactions and figure
out for a unit if they are payed up.

On the other hand, writing the report to tell who hasn't paid is going to be
kind of a pain. It is a simple example, but not much different from what you
find in large systems.

~~~
salvadors
Is your point here that the data being stored is insufficient (e.g. you'd want
an end date, not just an active flag; this doesn't cope at all with prices
changing over time, bulk discounts, or different customers paying different
rates; there's no concept of invoices, or whether payment is due based on
calendar months or based on opening date; etc) or that you're ignoring all
that sort of stuff just to keep the example simple (so assume everyone pays a
fixed rate per unit, due weekly; someone can't close their account until
they're paid up; etc) but that you'd still want a more complex schema so as to
be able to more easily generate a "Who owes us money?" report?

If it's the former, then sure: you need to be able to model all these things
properly. If it's the latter, then I'm not so sure. The SQL to create that
sort of report is going to be non-trivial, but it shouldn't be overly complex
for someone who knows what they're doing, and if you have the correct indexes
it shouldn't take very long to run either.

If you want to start doing all sorts of fancy data warehouse slicing and
dicing, you're usually better extracting daily (or more/less frequent
depending on needs) dumps of your transactional database into a different
structure more suitable for reporting, than in restructuring your 'live'
database and having to deal with all the resulting denormalisation issues,
etc.

~~~
protomyth
As stated, I believe all the information needed to generate the report is
there. My point was that what is sufficient for an application is generally
insufficient for basic reporting. So, when budgets get short, the database
work to make queries by reporting easier is often ignored and complicated
queries and processes become the norm. Those laying the problem solely at the
feet of the DBA's are missing the other group that tends to make these queries
necessary: App Developers. I once had an app developer tell me one type of
invoice was impossible to make because some of the relationships and data were
intrinsic to the application and would need to be modeled the same as the
application. He wasn't far off.

There is a point between transactional and data warehousing that needs to be
hit. Simple summaries or considering the question of "how do I retrieve
everything in this state" will sometimes suffice. Building a system to get
single transactions in an out tends to make routine report take all night or
be impossible to get in a reasonable amount of time.

If your schema and app requires non-trivial SQL, then expect maintenance
nightmares and lack of ability to scale.

~~~
kfool
Not always. An app requiring non-trivial SQL can be an indication of a problem
requiring non-trivial workflows.

If the problem is complex, you want to model relations, not ignore them.

(But clearly not in this hairy query.)

------
pilif
When I've seen this article back in my RSS reader, it reminded me of one
particular query that was generated in the application I'm maintaining. My
irrational fear of sending too many queries to the database (I've outgrown
that in the last 6 years) caused a single query to be generated which was 4KB
in size.

Which of course is much less than the one in the picture, but still very, very
bad.

Some so we've refactored the beast. Now it's 2-3 smaller queries (which are
much easier to optimize for PostgreSQL and, above all, individually cacheable)
which lead to a nearly 100% speedup for common cases. Also, the code is
infinitely more readable which means that it's much easier to extend it.

I'm incredibly happy that we've seen the light and fixed it before it grew to
proportions like the ones on the original article _shudder_

~~~
JBiserkov
4KB? That's nothing. Search for 'media.sql' in any recent Adobe installation
media. You'll find 3 MB+ SQL files, containing:

    
    
      -(BASE64?) encoded InstallerIcon,
      -(BASE64?) encoded EULAs in various languages
      -GUIDs like {01C3BD72-7371-4472-B179-B4DFE6DDD251}
      -and my personal favorite: a 25 KB XML fragment

~~~
gecko
There was a source control system whose code I had the pleasure of reading
several months ago whose favorite way to store data was as a SQLite3 database,
with a single table, with a single column, with a single row, containing JSON.
Words failed me.

Based on what you're describing, I now believe those developers were poached
from Adobe.

------
jcromartie
People in that thread are bragging about their 10-page queries with 20 joins
or 8 unions.

I'm looking at a query here that is 37 printed pages, with 92 joins over 25
unions.

~~~
einhverfr
I think at this point I am bragging that I have never written one. Ok, I have
used views of views, but.......

I have, however, had the misfortune of troubleshooting those 10 page queries.
Finding a stupid typo in one of those is like looking for a needle in a
haystack.....

------
topbanana
My first assignment in my first ever job working for a 'proper consultancy'
was to babysit an overnight process which was a SQL Server stored procedure.
Back in those days they had a 64k limit, so it was split into 3 or 4 sections.
It took around 12 hours to run.

I'd like to say I rewrote it, but I didn't. I just left.

------
pork
Reading the comments below, I get the impression that all the "good" DB people
hang out on HN, not like those "other" incompetent nits out there who don't
know what a join is. Hubris, people.

~~~
funkah
Maybe HNers are just better at keeping their mouths shut when they don't know
anything about a topic.

~~~
gaius
I guess you're new around here ;-)

------
mgl
Really nasty piece of SQL code, definitely not for human-based processing.
What do you think about tools that may decipher and visualize such complex
queries in a more structured way, like DBClarity
(<http://www.microgen.com/dbclarity/>)? Have you been using something similar
recently?

(disclaimer: I work for mcgn)

------
harryh
The original version of foursquare.com contained a lot of stuff like this
(though not as epicly bad). It was a very small amount of poorly written PHP
code surrounding a bunch of unreadable SQL statements. It's amazing that it
worked at all.

Dens is a great guy, but I hope I never have to rewrite his code again.

------
nithinbekal
I was just trying to figure out a stored procedure that queries one table,
loops over the rows, and within the loop queries another table using the
values from the first query. Now, looping over these rows, it has a third
query and a corresponding loop over those rows.

And all that for inserting the values taken from the three tables into a 4th
table. This could have been done with a simple 3-table join query. Hell, it
could even have been done with a single insert statement! I wonder how people
fail to recognize an N+1 selects problem when it's staring them in the face.

Well, to be fair, this problem I described isn't exactly an N+1 problem is it?
More like an N(M(L+1)+1)+1 selects problem. ;-) (Unless I've got my math all
wrong there?)

How I hate working with PL/SQL stored procedures! :(

------
OiNutter
Reminds me of the stock update system for one of our major clients at my first
job. The predecessor of myself and my colleague had thought it a brilliant
idea to build a clothing ecommerce site, with a complete list of all stock
going back to the year dot with ASP and Access (that's Classic ASP, not .NET).
Towards the end the stock update would take pretty much an entire afternoon to
run.

Eventually we got the approval to change to MySQL for the database. When they
ran the first stock update with the new version they rang us up to check it
had worked because it was near instantaneous.

The moral of the story: Access is BAD! VERY BAD!

------
jswinghammer
I've seen and had to debug longer stored procedures for sure but never a
single query. I can't see the last page so maybe this is a stored procedure.
It's hard to tell.

------
bialecki
I worked for a company where there were queries somewhat like this, however
they were obscured because they would create views on the fly. So a query
would look deceptively simple only to realize (not exaggerating here) there
were four levels of views underneath it. Bugs were a pain, but the worst was
trying to optimize those queries. Just untangling what the actual query was
made life really difficult.

------
jakejake
I've seen SQL that looked like this but didn't wind up being very complicated.
I've also seen seemingly simple queries that were actually very tricky!

I can't read much of the query, but at least a few lines are checking for null
values. I wouldn't be surprised if 80-90% of the query is simply output
formatting. Depending on the DB platform, some formatting and null-check
statements are fairly verbose.

------
LarryMade
People can write queries that large without formatting? or was that the result
of some query generation application?

~~~
snorkel
It's got to be from a point-and-click Query-O-Matic tool.

~~~
josephcooney
Or an ORM.

------
kleiba
Please forgive me, this is OT: can anyone here recommend a good online
resource for learning SQL "the hard way"?

~~~
pajop
<http://www.db-class.org/course/video/preview_list>

------
protomyth
I will say, Ingres was not my favorite database, but its query plan display
should be used to explain how a database query works. It showed a tree of
operations for each query. If you saw FSM (full sort merge) or Cartesian
Product you better mean them or re-write the query.

------
mwexler
I'm pleased that most of the commenters recognize that SQL has a need and is
it's own language, for good and bad. I really expected to find a troll popping
out "NoSQL rulez" type comments, and the level of understanding of how and
where SQL can help is very encouraging.

------
jpadilla_
Soooo... what is it supposed to do? Looks like a million sub-selects and
joins! Already have a headache just with looking at it. I'd probably right it
all over again from scratch.

------
philjackson
A place I used to work at used an ORM which gradually constructed SQL
throughout the flow of a request. One of the calls we generated was probably a
couple of pages long.

------
acangiano
I'd love to see an EXPLAIN on that. ;)

~~~
ironchef
__________________________* 1. row __ __ __ __ __ __ __ __ __ __ __ __ __* id:
1 select_type: SIMPLE table: lots_of_em type: not_good possible_keys: none
key: none key_len: n/a ref: NULL rows: googol filtered: 0 Extra: You're
screwed, Do not pass go.

~~~
stevejalim
This is an old link, but you just reminded me of:
<http://howfuckedismydatabase.com>

~~~
MattBearman
That's awesome, especially this - <http://browsertoolkit.com/fault-
tolerance.png>

------
atsaloli
htsql (www.htsql.org) is a business reporting language -- one line in htsql
can generate 5 or 6 lines of SQL.

This query could be condensed considerably if rewritten in htsql.

(htsql automatically generates SQL code that covers all corner cases and
executes faster than hand-crafted SQL.)

------
dos1
While we're sharing horror stories...

I once encountered a stored procedure that returned HTML in a result set. It
literally created the UI of a webpage. It returned several columns of HTML
that the app would place in strategic parts of the page. Well, as years went
by, the app required a more innovative and web 2.0 UI. Rather than remove the
HTML from the sproc, more columns were returned with more HTML, Javascript and
the like. One time I had to fix some Javascript that rendered on the page.
When I finally found where the errant JS was coming from, I realized I had to
file a database change ticket to fix the UI :)

~~~
spydum
Sounds like Oracle's Portal product?

~~~
bni
I would guess Oracles mod_plsql

------
hackermom
See, this kind of crap is what happens when you have the programmers sit on
the bench and let the "engineers" take charge. Why do you hate society, you
Luddites?!

------
emehrkay
This is kinda impressive.

------
bni
Im sure 12 pages of Java code doing this procedurally instead is much more
maintainable.

------
yuvadam
Sorry, I call BS.

I might have just been lucky enough to always work at professional companies
and startups where this kind of stuff can never happen.

But something tells me there is no reasonable way an SQL query can grow to
these proportions.

~~~
ianterrell
All the good techniques and technologies you use now? Yeah, those were
invented because the way a lot of people used to do things was terrible.

I'm not old enough to have built systems like that, but I have inherited some
for maintenance and debugging. That's not likely to be BS. Back in the day "do
it in a stored procedure" was common advice. All the logic for untold
applications and millions of lines of code lived at the database level.

Count yourself lucky perhaps to have entered the market when you did, but
don't discount that you're standing on tall, sometimes ugly, shoulders.

~~~
flomo
There's still plenty of DBAs out there preaching "do it in a stored
procedure", because in theory it lets them do things like changing
normalization without affecting the application interface. (In practice, I've
never seen happen that way.)

But in general, people did thing that way because there wasn't any common
middleware or services interfaces. The RDBMS was the "app server".

