
You Can Do It in SQL, Stop Writing Extra Code for That - geshan
https://geshan.com.np/blog/2018/12/you-can-do-it-in-sql/
======
ThJ
It's funny that this even needs to be a blog post. I can see why simple key-
value stores have become so popular in recent years. Developers don't want to
learn SQL properly, and end up using the database as if it were a key-value
store, and at some point, they begin to question why they need SQL at all, and
switch to NoSQL. Later, when they do need more sophisticated querying, they
then have to slap Elastic on top of their key-value store to get that feature
back, when they could've been using SQL all along. Full circle!

~~~
ben_jones
I agree with what you're saying but I think its important to point out that
there is a Developer experience in most mainstream languages and then there is
the Operations experience of deploying, configuring, maintaining, scaling, and
paying for, various RDBMS.

Developers grow up crashing their programs. Crashing your DBMS (or similiarly
messing it up) is not as friendly or immediate in feedback/consequence, so
people shy away from it.

The end result is that many developers are influenced to use code more and
databases less.

~~~
0db532a0
Elasticsearch quite conveniently makes it so difficult and fucking annoying to
do data analytics with their pile of turd custom query un-language shambles
that it leaves anyone stuck with it rightly reinventing oblong wheels with
diamond bearings lubricated with iron filings in code. Aah, but dev-mis-ops
say that it’s easy to deploy and maintain on AWS. It scales too! Well, at
least the whole set-up keeps everyone employed.

~~~
KennyCason
Could you elaborate more on your specific ES problems? Having worked with
Elasticsearch extensively for many years, and do not share this sentiment.

Many of the ES queries fundamentally aren't meant to have a 1-to-1 equivalent
in SQL as ES's goal is to specialize as a distributed search index, and not a
general relative database.

If your point is that ES is being more frequently used as an alternative to
just writing/scaling/using a SQL service to it's full potential, then I can
somewhat understand the sentiment.

~~~
0db532a0
I completely agree with the idea that ES might be good as a distributed search
index. I really wish people would leave it at that.

The company where I last worked had a main product which would send in
readings every second. The only interface to these readings was Elasticsearch.
dev-mis-ops were convinced that this was a good idea. Even the front end would
send its queries to Elasticsearch to display statistics.

My task was to look at the average energy usage over one year for the seven
days in the week divided into fifteen-minute intervals. This is literally just
a SELECT MOD GROUP BY in SQL. I am not even talking about more advanced SQL
features here like windows, transactions, joins and so on.

I tried for ages looking through terrible documentation trying to get it to
work, at which point I thought I’d have one of the resident ES experts take a
bit of his own medicine. It took about five attempts of back and forth over
two days to get it right.

This wasn’t the only query we had problems with. At some point, I just asked
them to give me a CSV interface so I could do it in AWK. You can’t even
download CSVs of data sets without some external tool! They were still
convinced that ES was fit for purpose.

------
nycdotnet
I understand where the author is coming from, and for purposes of aggregation
or filtering, they are correct; it’s often better to do those things in SQL.
There are two problems, though. CPU and IOs on your DB server are likely your
organization’s most scarce resource, and SQL is often hard to test and has
unpredictable performance characteristics (cliffs, blocking, etc). By all
means do the things you must do to get your app servers the data they need
with the fewest IOs and bytes across the wire possible, but please do your
string concatenation and sorting (when not using TOP or LIMIT) on cheap app
servers that you can always deploy more of.

------
wolfgang42
Relational databases are brilliant, but SQL is terrible and I don't understand
how it hasn't gone the way of COBOL yet. For a start, the bizarrely English-
oriented structure forces a completely different ordering of the code than
what comes out, and the syntax appears to completely ignore the underlying
tabular concepts. Not to mention that IMO joining on foreign keys shouldn't be
a thing--the database should be able to figure out what you want when you
write `order.customer.address.taxRegion.rate` without specifying any joins at
all. Even if you use an ORM you still have to understand the code it will emit
to avoid building bad queries. There's just so much awkwardness in trying to
write a SQL query that I can definitely see why people would avoid it.

~~~
user2426679
> Relational databases are brilliant, but SQL is terrible and I don't
> understand how it hasn't gone the way of COBOL yet.

Having written extensive amounts of SQL and greater than zero lines of COBOL
(f. you, Lawson), this really couldn't be further from the truth.

> For a start, the bizarrely English-oriented structure forces a completely
> different ordering of the code than what comes out, and the syntax appears
> to completely ignore the underlying tabular concepts.

Every language has keywords; I'm not sure how you expect to write code that
doesn't involve English in some form. Is the concept of `SELECT this_data FROM
this_source WHERE true_conditions ORDER BY columns` really that convoluted?
Sure, it _definitely_ gets more complicated, but is there really a
significantly better way to do it that doesn't involve taking away low-level
control from the programmer, in exchange for something sexier and easier to
pick up?

> Not to mention that IMO joining on foreign keys shouldn't be a thing--the
> database should be able to figure out what you want when you write
> `order.customer.address.taxRegion.rate`

Why can't you have a key-value storage engine that doesn't involve using keys
to look up values?! (Is that really your question?) Okay, just as an example,
what if you have more than one address for a customer? You need a way to tell
the engine which address(es) you want.

> There's just so much awkwardness in trying to write a SQL query that I can
> definitely see why people would avoid it.

While I don't claim that SQL is easy (or always intuitive), saying that it is
"terrible" for the above reasons is like saying that C is terrible because
it's not Pythonic.

~~~
natalyarostova
I'd prefer it if C were more pythonic tbh.

~~~
cultofmetatron
sooo..... go?

------
hunter23
Be careful in using group_concat in mysql - it actually truncates the result
to the max_len setting which is by default 1024 and will not tell you it
truncates the result in the logs. it took us forever to find the cause of this
issue in one of our batch jobs (I wish mysql issued a warning when
truncating):

"In MySQL, you can get the concatenated values of expression combinations. To
eliminate duplicate values, use the DISTINCT clause. To sort values in the
result, use the ORDER BY clause. To sort in reverse order, add the DESC
(descending) keyword to the name of the column you are sorting by in the ORDER
BY clause. The default is ascending order; this may be specified explicitly
using the ASC keyword. The default separator between values in a group is
comma (,). To specify a separator explicitly, use SEPARATOR followed by the
string literal value that should be inserted between group values. To
eliminate the separator altogether, specify SEPARATOR ''.

The result is truncated to the maximum length that is given by the
group_concat_max_len system variable, which has a default value of 1024. The
value can be set higher, although the effective maximum length of the return
value is constrained by the value of max_allowed_packet. The syntax to change
the value of group_concat_max_len at runtime is as follows, where val is an
unsigned integer"

[https://dev.mysql.com/doc/refman/8.0/en/group-by-
functions.h...](https://dev.mysql.com/doc/refman/8.0/en/group-by-
functions.html#function_group-concat)

~~~
geshan
There is a setting to make it longer.

~~~
hunter23
Yes this is mentioned in the documentation I quoted.

------
zzo38computer
SQL is a real programming language (even though, many people seems to not
think so). I use SQLite, although there are other SQL implementations (with
different features) too.

It is true many thing can be done well with SQL, whether you call from other
programs or use standalone.

Whether it will be faster or not to use SQL probably depends on what SQL
implementation is in use, whether the indexing is set up properly, and what
programming language you are calling it from.

Also, much things can be done with a shorter code in SQL if you are doing such
things as sorting the result set.

------
lsiebert
Some things are harder to do in SQL then in code. For example, assigning a
rank based on a field in MYSQL. I did this for production code in a rails app,
and it wasn't just ugly (requiring separate DB calls for variable
initialization and output and updating) but it ran slow enough that it had to
be scheduled for off peak time, and it requires temporary variables (or you
can do it with whole temporary tables, I believe).

[https://stackoverflow.com/questions/3333665/rank-function-
in...](https://stackoverflow.com/questions/3333665/rank-function-in-mysql)

That's not to say that I would do it in Ruby if I had it to do over again, I'd
probably use Redis, but still, sometimes SQL isn't as fast as other solutions.

I like SQL. Optimizing DB calls to speed up stuff is fun for me. But it's not
always the right choice.

(There's probably a clever and fast way to do it in postgres, but DB choices
were not mine to make).

edit: Incidentally, if you just need to know 1 record's rank at a time, like
getting a player's position on a leaderboard, instead of including each record
in the DB, you can do a select on scores (or distinct scores) greater then the
current record's score and add 1, and that can be much faster.

------
catchmeifyoucan
I think the problem is with Excel sheets and CSVs. We aren't taught how to
manipulate data otherwise without the use of code. SQL seems to have immense
overhead. Set up a db, import data, etc. Although it makes sense, it's an
afterthought. We know how to write scripts, but we don't necessarily need to
learn SQL to get the job done.

------
craftyguy
This discusses the "how" but not the "why." I'm not a SQL expert, so I find it
far easier to maintain my code in a language I am familiar with than to
maintain (seemingly cryptic) SQL equations, queries, etc. Is there a
quantifiable performance gain with moving more operations to SQL rather than
implementing it in my language of choice?

The conclusion states:

> Exploit the power of SQL to write less code because “the best code is the
> code that was never written”. If it is not written there is no need to
> maintain it.

But they're wrong, since you still need to maintain all of the SQL
implementations...

Edit: I previously said "I'm not a DBA", but some folks seem to be getting
hung up on that rather than the rest of my comments/questions.

~~~
thecopy
There is a significant difference between being a DBA and knowing SQL

Regarding your question:

> Is there a quantifiable performance gain with moving more operations to SQL
> rather than implementing it in my language of choice?

It depends, of course, but in my experience the answer is yes for 90% of the
cases.

~~~
notahappycamper
Is this due to query optimization, or something else?

~~~
vonseel
I’d love to see an answer from someone more experienced on this as well.

In my experience, the database is just faster than application code at most
stuff and queries can be written in a way that similar logic can’t be
optimized at the application level. Additionally, database code is usually
(always?) going to be faster than something like Python/Ruby.

Code review is more difficult at the database level, however, and SQL talent
is more rare. It may be more challenging to scale and maintain a team of
engineers working on a codebase with a lot of application logic implemented in
stored procs/SQL versus ORM code (such as active record or Django). Just my
opinion, have worked on teams with ORM usage as well as separate DBA teams and
all logic in SQL. SQL by hand is more powerful and flexible, surely, but for
most use cases I would prefer to keep code in one place (application) and only
reach for the tool (raw SQL) when needed for optimizations.

------
AndyPatterson
Off the top of my head I'm thinking that there's a serious lack of thought in
these use cases, especially string manipulation which I could imagine becoming
painfully slow as the database grows; any manipulation of the projection like
this is going to incur a large performance overhead.

It would be quite good to see this blog post, or a response to it, evaluating
the SQL codes performance on different datasets.

