
Show HN: SQLCheck – Automatically identify anti-patterns in SQL queries - jarulraj
https://github.com/jarulraj/sqlcheck
======
bsg75
The tests in this read like someone who has only used one SQL dialect, and
wanted another project to show off C++ abilities. Its full of bad advice that
does little to help developers write better code.

CheckJoinCount: “Consider splitting up the complex query into many simpler
queries, and reduce the number of JOINs” - And offload the additional work to
where, to the client?

CheckSpaghettiQuery: Character count > 500, again "Split up a complex
spaghetti query into several simpler queries"

"Rewriting the query’s HAVING clause into a predicate will enable the use of
indexes during query processing." \- On what engines, and in what
circumstances?

~~~
Twirrim
> CheckJoinCount: “Consider splitting up the complex query into many simpler
> queries, and reduce the number of JOINs” - And offload the additional work
> to where, to the client?

I'm sure, like many, I've seen both extremes of this. It seems like most
developers I've worked with have stopped and generally thought about what they
were doing with their queries, but there are those few that leave you
bewildered.

1) Developer who produces a software package used by a fair number of counties
across the US. They're essentially one of, if not the only, company working in
that field. We acted as a go-between between the state and the software
company.

There are a set of standard reports included with the software, each of which
amounts to little more than a very basic PDF wrapper around an SQL query. A
huge, ugly turd of a query. One that they created through some SQL GUI,
whenever they created the report.

They hadn't profiled any of them or spent any time tuning them. For a large
number of them it took quite a while, even with relatively small numbers of
records. The state my employers worked with didn't do things on a small scale.
In one query's case it would take upwards of 45 minutes to produce a report
under optimal conditions. The indexes were great for the general workload, but
lousy for producing the report via that query.

They also clearly hadn't thought about the end user experience. Nothing on the
page to indicate progress (hard to demonstrate progress when you're waiting on
one monolithic query). It wasn't set up as a nice asynchronous report
production either.

While I didn't really have time for it, we wanted to be as helpful to the
agency in question as possible, so I took a stab at one report, using a quick
python script. Breaking the underlying query apart into a few distinct
components, and paying attention to the indexes resulted in 1) producing the
report in under a minute. 2) progress bars.

Trying to wrestle that change in approach into the monolithic query was way
too complicated for pretty much zero benefit.

2) At another company, a co-worker kept having to reject a deployment request
from one particular developer (this was quite a while ago, the company used
the waterfall method).

Littered throughout their code was "SELECT * FROM table", followed by
implementing _all_ filtering logic inside their code. It got to the point
where we celebrated the dev at least putting code forwards that included a
WHERE clause: "SELECT * FROM table WHERE", even if we still had to reject it
(due to the breaks-on-every-schema-change "SELECT *"). Over and over again we
demonstrated that the database server was quicker than his code, and really
hoped he'd eventually learn. That was one of the few times I really, really,
really wished a team would either use an ORM, or fire the developer.

~~~
jarulraj
Thanks for sharing these interesting anecdotes.

------
Achshar
[https://github.com/jarulraj/sqlcheck/blob/master/docs/logica...](https://github.com/jarulraj/sqlcheck/blob/master/docs/logical/1004.md)

Strongly disagree.

Table columns should always be prefixed with table name. So users.user_id is
one of the most hated patters for me. user.id looks _so_ much cleaner. No one
in their right mind should ever write the column name by itself. Especially
with queries that use more than one tables.

~~~
itsdrewmiller
It's pretty annoying to do joins and have to alias all your columns because
their names are ambiguous. Also in my experience most people use very terse
table aliases so it's not always obvious what "u.ID" "uc.ID" "puc.ID" etc. are
referring to.

~~~
combatentropy
You don't need to alias "all" of your columns, just about 1 out of 10. And I
don't understand the complaint about terse table aliases being non-obvious.
The answer key is right below the select clause.

------
MarkusWinand
How about "do not use offset for pagination"?

[http://use-the-index-luke.com/no-offset](http://use-the-index-luke.com/no-
offset)

~~~
yellowapple
I feel like the author unjustly brushes aside the main benefit of using
offsets for pagination (that is: the ability to query a specific page of
results). Yeah, most page-based UIs do a lame job at it, but it's nice to be
able to say "yeah, the record you want is on the fourteenth row of page 9".

If that's not a relevant use case for your application, though, then by all
means use key-set pagination.

As a side note, I happen to dislike most implementations of infinite scrolling
(especially when the keep pushing down the page footer, making it impossible
to actually read or click on anything down there... GRRRR!).

------
snake_plissken
Interesting.

Some bones to pick:
[https://github.com/jarulraj/sqlcheck/blob/master/docs/query/...](https://github.com/jarulraj/sqlcheck/blob/master/docs/query/3013.md)

First, dat implicit, inner join. Second, it could be written more clearly
that, in this case, the reason you want to use a join is because you are not
sure that the sub-query returns at most one row.

~~~
jarulraj
Can you please open an issue or send a PR to update the documentation? Thanks!

~~~
snake_plissken
Yup sure thing!

------
m-j-fox
If you know any devs at Tableau, please forward this to them.

~~~
jarulraj
Thanks for the suggestion. I unfortunately don't, but this tool can certainly
be integrated with ORMs and in other BI stacks.

------
sebazzz
Sounds good in theory. but I wonder how well it works for any SQL dialects. I
write SQL in a dialect like T-SQL usually. The tool would probably not
recognize the brackets or different join syntax.

~~~
jarulraj
By design, the tool is dialect-agnostic.

