
SQL: One of the most valuable skills - duck
http://www.craigkerstiens.com/2019/02/12/sql-most-valuable-skill/
======
slap_shot
SQL is one the most amazing concepts I've ever experienced. It's nearly 5
decades old and there is no sign of a replacement. We've created countless
other technologies to store and process data, and we always seem to try to re-
create SQL in those technologies (e.g. Hive, Presto, KSQL, etc).

I run a early stage company that builds analytics infrastructure for
companies. We are betting very heavily on SQL, and Craigs post rings true now
more than ever.

Increasingly, more SQL is written in companies by analysts and data scientists
than typical software engineers.

The advent of the MMP data warehouse (redshift, bigquery, snowflake, etc) has
given companies with even the most limited budget the ability to warehouse and
query an enormous amount of data just using SQL. SQL is more powerful and
valuable today than it ever has been.

When you look into a typical organization, most software engineers aren't very
good at SQL. Why should they be? Most complex queries are analytics queries.
ORMs can handle a majority of the basic functions application code needs to
handle.

Perhaps going against Craig's point is the simple fact that we've abstracted
SQL away from a lot of engineers across the backend and certainly frontend and
mobile. You can be a great developer and not know a lot about SQL.

On the other end of the spectrum are the influx of "data engineers" with basic
to intermediate knowledge of HDFS, streaming data or various other NoSQL
technologies. They often know less about raw SQL than even junior engineers
because SQL is below their high-power big data tools.

But if you really understand SQL, and it seems few people truly today, you
command an immense amount of power. Probably more than ever.

~~~
nostrademons
"we always seem to try to re-create SQL in those languages (e.g. Hive, Presto,
KSQL, etc)."

This is largely because of the number of non-programmers who know SQL. Add an
SQL layer on top of your non-SQL database and you instantly open up a wide
variety of reporting & analytics functionality to PMs, data scientists,
business analysts, finance people, librarians (seriously! I have a couple
librarian-as-in-dead-trees friends who know SQL), scientists, etc.

In some ways this is too bad because SQL sucks as a language for many reasons
(very clumsily compositional; verbose; duplicates math expressions & string
manipulation of the host language, poorly; poor support for trees & graphs;
easy to write insecure code; doesn't express the full relational algebra), but
if it didn't suck in those ways it probably wouldn't have proven learnable by
all those other professions that make it so popular.

~~~
radicalbyte
SQL is amazing IF you understand it. You need to think in sets of things. It's
like functional programming paradigms or recursion; once you really truly "get
it" you start to feel like a Jedi master.

Unfortunately the vast majority of SQL users aren't that proficient. It's also
fairly hard to learn because it's something that you only pick up with
experience and specifically longer time experience with a sufficiently complex
data model.

I personally would never have gotten good at SQL if I hadn't stayed in my
first two jobs for 5 years each working daily with the data-models and the
domain.

~~~
cr0sh
I was fortunate that my early SE career experience involved a lot of SQL (and
my first experience with a SQL-like query language happened under PICK).

But as my career has gone forward, I'm touching it less and less; today, I
hardly touch it at all outside of my personal usage.

Much of that has to do with the fact that I'm now employed building and
maintaining an SPA using javascript and nodejs where the backend is accessed
thru a RESTful API; we never get to touch the actual database.

The few times I have seen some queries for that DB - albeit not in our API,
though I could probably find them somewhere - all I can hope is that the SQL
engine being used does some kind of query optimization on-the-fly, because
there's so many inner selects that make me cringe it ain't funny (like I
wonder if they've heard of joins and such).

Before, I was involved in a lot of PHP web apps and backend server automation,
where I needed to use SQL a lot; I feel like I am getting rusty in it.

------
JimmyAustin
My first job out of university was on an analytics team at a consulting firm
(big enough that you know them) that used MS SQL Server for absolutely
everything.

Data cleaning? SQL. Feature engineering? SQL.

Pipelines of stored procedures, stored in other stored procedures. Some of
these procedures were so convoluted that they outputted tables with over 700
features, and had queries that were hundreds of lines long.

Every input, stored procedure, and output timestamped, so a change to one
script involved changing every procedure downstream of it. My cries to use git
were unheeded (would have required upskilling everyone on the team).

It was probably the worst year of my life. By the end of it I built a
framework in T-SQL that would generate T-SQL scripts. In the final week of a
project (which had been consistent 60-70 hour weeks), the partner on the
project came in, saw the procedures written in my framework and demanded that
they all be converted back into raw SQL. I moved teams a few weeks later.

The only good bit looking at it, is that now I'm REALLY good with SQL. It's
incredibly powerful stuff, and more devs should work on it.

~~~
beefield
> My cries to use git were unheeded (would have required upskilling everyone
> on the team).

What is the best practice workflow using git with SQL server views/procedures?
Can you actually somehow track changes in the views/procs themselves so that
if someone happens to run ALTER VIEW, git diff is going to show something?

~~~
dwd
Every conversation I've ever had came back to using RedGate SQL Compare to
diff databases and TeamCity for CI.

You basically shouldn't allow anyone to modify anything without it being
scripted (bonus points if it comes with a rollback and is repeatable for
testing). Your scripts then all go into Git.

~~~
RegBarclay
RedGate's SQL Change Automation (formerly ReadyRoll) is pretty slick. DbUp is
a good, free alternative.

I second the rollback and repeatability bonus. Every script should leave the
database in either the new state or the previous good state no matter how many
times it's run.

~~~
beefield
> Every script should leave the database in either the new state or the
> previous good state no matter how many times it's run.

I wonder if there were somewhere a website to describe good idioms to achieve
this?

~~~
dwd
Where I've seen this process work, testing for repeatability was part of the
peer review process where any sql scripts to be reviewed were executed.

Having a second person run a script is the best way to ensure no mistakes.

------
max76
I spent a year in a role where 50% of my duties was writing sql reports. These
reports where usually between 500 and 1000 lines of sql a pop. Sometimes the
runtime of the report was measured in hours, so learning efficient sql was
important. The company had a lot of people that had been writing sql for
awhile, and there were lots of cool code snippets floating around. I learned a
lot in that year.

I've moved to writing backend code. I'm surprised most of my peers cannot
write anything more complicated than a join. Most people are perfectly happy
to let the orm do all the work, and never care to dig into the data directly.
Every once in a while my sql skills save the day and several people in other
departments contact me directly when they need excel files of data in our
database we don't have UIs to pull yet.

~~~
chubot
Once your SQL gets into 500-1000 lines, and hours of runtime, I would suggest
using data frames instead (in R or Python).

I wrote this post to introduce the idea:

 _What Is a Data Frame? (In Python, R, and SQL)_
[https://www.oilshell.org/blog/2018/11/30.html](https://www.oilshell.org/blog/2018/11/30.html)

It's often useful to treat SQL as an extraction/filtering language, and then
use R or Pandas as a computation/reporting language.

I think of it as separating I/O and computation. SQL does enough filtering to
cut the data down to a reasonable size, and maybe some preliminary logic. And
then you compute on that smaller data set in R or Pandas -- iterating in
SECONDS instead of hours. The code will likely be shorter as well, so it's a
win-win (see examples in my blog post).

I can't think of many situations where hours of runtime is "reasonable" for an
SQL query. In 2 hours you could probably do a linear scan over every table in
most production databases 10-100 times.

For example, if your database is 10 GB, you could cat all of its files in less
than 5 minutes (probably much less on a modern SSD). In 2 hours, you can do a
five minute operation 24 times. I can't think of many reports that should take
longer than 24 full passes over all the data in the database (i.e. pretending
that you're not bothering to use indices in the most basic way). If it takes
longer than that, the joins should be expressible with something orders of
magnitude more efficient.

I've mainly worked with big data frameworks, but I think that almost any SQL
database (sqlite, MySQL, Postgres) should be able to do a scan of a single
table with some trivial predicates within 10x the speed of 'cat' (should be
within 2x really). They can probably do better on realistic workloads because
of caching.

~~~
busterarm
This isn't meant to offend, rather as a point of consideration, but seeing
your example use case being 10GB and then talk about big data frameworks makes
it hard for me to take this advice seriously.

I might reach for that kind of tooling at the hundreds of TB to PB scale, but
in our production applications we have _tables_ that are multiple terabytes.
SQL is just fine.

Yes, we also have have queries that run in the timescale of _hours_ and they
are always of the reporting/scheduled task variety and absolutely vital to our
customers. Long running reporting queries are pretty acceptable (and pretty
much the norm since forever) outside of the tech industry and your customers
won't balk at it.

~~~
chubot
It seems like you misunderstood what I wrote. I'm saying you should consider
using R or Python if your reports are taking a long time, not big data
frameworks.

Big data was a reference to thinking about the problem in terms of the speed
of the hardware. If it's 1000x slower than what the hardware can do, that's a
sign you're using the wrong tool for the job.

Getting within 10x is reasonable, but not 100x or 1000x, which is VERY COMMON
in my experience. These two situations are very common:

1) SQL queries that are orders of magnitude slower than a simple offline
computation in Python or R (let alone C++). The underlying cause is usually
due to bad query planning of joins / lack of indices.

You might not have the ability to add indices easily, and even if you did,
that has its drawbacks for one-off queries.

2) You need to do some computation that's awkward inside SQL. Statistics
beyond basic aggregations, iterative computations (loops), and tree structures
are common problems.

~~~
bonesss
ETL pipelines that hop between different kinds of storage/computing platforms
to exploit local maxima, like you're pointing out with R and SQL working in
concert, is pretty common in companies working their way up to BigData and
academia.

From the Enterprise side I think too many developers have an unfounded
expectations around data storage technology. There's this unchallenged belief
that monolithic datastorage that will solve thier problems across the entire
time/storage/complexity spectrum. By bringing multiple tools to bear, instead,
you end up with more purpose built storage but far less domain impedence.

Slapping a denormalized NoSQL front-end for webscale onto a legacy RDBMS can
be a cheap win/win to maximize the capabilities of both. SQL + R is oodles
better than R or SQL in isolation.

------
kumarvvr
SQL is a mind bender for me. I do a lot of work on Data, use python and pandas
to do a lot of data magic, but the problem with me is my mind is too
procedural in thinking.

\- Step 1 \- Step 2 \- Loop through results in Step 2 \- Curate and finish
output.

I try very hard to transform the above steps into an SQL statement spanning
multiple tables, but always fail and I usually fallback to python for manually
extracting and processing the data.

Does anyone else face this problem?

Any suggested guides / books to make me think more SQL'ley ?

~~~
ramraj07
I've thought about the same problem and think I have an explanation. SQL
inherently avoids operations that cannot scale. This means that while the same
operation can be written much more simpler as code if you're doing with pandas
or spark, trying to do it with SQL would be less forgiving if your underlying
data schema is not optimal or if your logic isn't scalable. But more often
than not, the SQL query will run faster and more reliably given similar
amounts of compute power to the two engines.

Of course you can always screw up and write suboptimal SQL that will take
forever to finish, but it's just harder to do unless you're really trying to
be dumb about it.

So I do think there's still a lot of merit in trying to learn and use SQL
more.

~~~
sixo
It also ignores operations of type (set of rows) => (set of rows) or (row) =>
(set of rows). and makes you jump through hoops of CTEs to implement them
except in some special cases. Limited versions of these (flatMaps) work just
fine in spark and have no problems scaling.

------
hardwaresofton
SQL is the most powerful query language ever invented and widely implemented
over relation databases IMO (qualified heavily for lurking RDF zealots). Every
time you see someone start to invent their own query language, I almost always
mark is as folly (similar to when people invent their own configuration
languages). Prometheus and GraphQL stand out as recent examples.

DBs like Kafka who recognize this and instead offer SQL on top of their things
take the right approach IMO KSQL.

~~~
Scarbutt
_SQL is the most powerful query language ever invented_

No, its Datalog. Joking aside, they are equally powerful but one could argue
about explicit vs implicit joins.

~~~
hardwaresofton
No idea why this is getting downvoted, people must really dislike Datalog.

A recent up-and-coming software tool is the Open Policy Agent[0] and it uses
Datalog -- though it's a custom implementation. This comment felt familiar so
I went back and looked and it's come up on HN before[1].

[0]: [https://www.openpolicyagent.org/](https://www.openpolicyagent.org/)

[1]:
[https://news.ycombinator.com/item?id=18156144](https://news.ycombinator.com/item?id=18156144)

------
dhbradshaw
SQL is nice on a surface level and helpful in practice.

Having a working intuition for relational databases is valuable on a deep
level. I mean having a sense of how to organize the tables, what sizes are
large and small, when to add what kind of index and what the size and speed
limits are likely to be for a given data structure. That's extremely valuable.

BTW, we're preparing to move a postgres database that's a few TB from Heroku
to AWS RDS. The catch is that we can't afford more than a few minutes of
downtime. If this is in your wheelhouse, reach out! We'd like to talk.

~~~
jweir
Ok, we had a MySQL DB around 500GB, and not on Heroku, but we used the AWS
Database Migration Service:
[https://aws.amazon.com/dms/](https://aws.amazon.com/dms/)

We had just a few minutes of downtime.

I did have few issues around permissions, but an AWS support person walked me
through the process and we got it up and running. The support was great, and
we just have standard business support.

EDIT: DMS might be a no go, as jpatokal points out. While Heroku allows
external connections, it doesn't allow super user access which, AFAIK, DMS
requires.

[https://devcenter.heroku.com/articles/heroku-
postgresql#conn...](https://devcenter.heroku.com/articles/heroku-
postgresql#connection-permissions)

~~~
jpatokal
Heroku Postgres doesn't support external replicas, which complicates
migrations out considerably. If you can't tolerate more than a few min of
downtime for writes, you pretty much have to build the replication at the app
level.

------
triplee
I've been saying this for years, because I started doing web apps when we
pretty much were sending raw SQL over to a database. Even after ORMs got
bigger, I still stress knowing SQL even if you don't use it often, because it
helps you understand the ORM.

And oddly enough, now I'm a data engineer. Everything I do, even when I'm not
doing pure SQL, is influenced by years of experience in SQL. Either the
languages of my big tools are still based on SQL in some fashion, or it simply
helps to have an understanding of the ecosystem to figure out what's going on
at scale.

Everything else has come and gone and come again, but SQL has help up pretty
nicely. The only other skills that come close for me are working in languages
that branched out of C (because the syntax isn't so different over time) and
years of on again off again procedural languages (ColdFusion, vanilla
Javascript, etc. leads to it being way easier to pick up Python).

------
teej
For those trying to up their SQL game from basic selects and grouping, I
suggest learning:

\- Common Table Expressions (CTEs). They will equip you to write clean,
organized, and expressive SQL.

\- CREATE TABLE AS SELECT. This command powers SQL-driven data transformation.

\- Window functions, particularly SUM, LEAD, LAG, and ROW_NUMBER. These enable
complex calculations without messy self-joins.

Learning those three will change your relationship with SQL and blow the door
open of problems you can solve with the language.

~~~
nimajalali
I’d add LATERAL JOINs to the list.

For those unaware of LATERAL JOINs, it’s like a for each loop in SQL. Blew my
mind and opened up so many possibilities.

------
GolDDranks
I consider myself functional, but not proficient in SQL. Is there a reliable
and easy way to assess one's skills in SQL? I feel that most of the new things
I learn about SQL these days are database specific. I wonder whether I'm
missing out on something, or do I already have the "core" SQL knowledge down,
and everything else is special cases?

~~~
AlisdairO
(self plug) take a look at pgexercises.com . It's Postgres-focused but mostly
cross-platform, and I think it covers most of the important stuff.

~~~
GolDDranks
Thanks, I'll check it out!

------
vietvu
The more queries I write on Elasticsearch, the more I value SQL.

~~~
rb808
Me too. I spent years complaining about databases and all the structure and
rules and inflexibility. It was initially great to go to NoSQL backends where
suddenly you're free. Now I really miss them :(

~~~
vietvu
Well, at least I can do simple query with SQL and Elasticsearch and on Hive.

------
truth_seeker
Hell Yes. Although it takes some time to switch from Procedural or OOP to
highly Declarative world of SQL, but man, its worth pursuing. Only thing which
really helped me build my confidence was solving more real world problems
which involves 2NF and 3NF design, JOINs, Triggers, Indexing, Views &
Materialized Views for denormalization, CTE and Recursive CTE's.

You may call me an extremist but from server side programming point of view
with Postgres and FDW (Foreign Data Wrappers) which has ton of features other
than SQL only thing i miss is HTTP server. :)

------
physcab
I’ve been a career analyst for about 12 years so I write SQL every day. I know
it very well. I’ve developed a very strong love / hate relationship with it
over the years.

On one hand, you write what you want to find in fairly common language and you
almost always get back what you want if you do it correctly. In many ways,
it’s like a precursor to Alexa but in written form. It’s super easy to pick up
for non technical people.

On the other hand, it’s extremely difficult to code review, and on a very
complicated piece of business logic, errors could mean the difference between
hiring 10 more people or laying off 100. So almost always it’s just easier to
re-write.

Imagine if engineers couldn’t understand each other’s work, and had to rewrite
code every time someone leaves the team. It’s insane to me that this is
standard practice.

~~~
wvenable
> It’s super easy to pick up for non technical people.

I haven't found that particularly true. I've trained people in SQL and yes,
they can do the basics pretty well, but it requires a particular skill to
actually write effect complex SQL statements. Non-technical people, being non-
technical, generally cannot do that.

> it’s extremely difficult to code review

I don't see why it's any more difficult than any other logic? It's always
important to test.

~~~
physcab
I think the operative word there is complex.

Yes non technical people can’t write complex sql, but they can self service
basic questions easier than learning Python or R for example.

And it’s difficult to code review SQL because complex business logic means
complex data manipulations and those are hard to visualize and comprehend. You
don’t have to think in a 3D space so much when you write python.

------
xpil
SQL is for data systems what IP protocol is for networks: it is the neck of
the hourglass. You can build plenty of various things on it (the top of the
hourglass: applications, reporting frameworks and so on) using various
underlying technologies (the bottom: storage engines etc.) but you can't
remove the neck without breaking the hourglass.

This is why SQL language and IP protocol are two most valuable things in
computer world.

~~~
veritas3241
You've intrigued me with this comment mainly because I know SQL quite well but
IP not at all. Is wikipedia[0] for IP a good reference to dive in a bit? Would
you recommend something else to grok why IP is so fundamental?

[0]
[https://en.wikipedia.org/wiki/Internet_Protocol](https://en.wikipedia.org/wiki/Internet_Protocol)

~~~
xpil
There is not much to grok really. It's just that there's plenty of low level
transmission protocols like Ethernet, PPP or CDP, there's plenty of even lower
level physical media types like copper or fiber or pigeons. Then there's a lot
of higher level things that rely on IP: TCP/UDP, ATP, SPX and so on, and then
the entire universe built upon them. But in the middle there is just IP (v4 or
v6) and that's it.

------
tigerlily
> Because so few actually know SQL well you can seem more elite than you
> actually are.

Thank you Craig, I'm convinced. Anyone know where best to begin learning SQL?

~~~
duck
There are a lot of great free resources out there, but I had recently shared
[https://sqlbolt.com](https://sqlbolt.com) with a friend that wanted to learn
SQL and she found it useful.

~~~
tigerlily
Wow, this is fun, thank you! I completed the Lesson 1 exercises first try!

------
diggernet
In University, I needed one more CS elective course, and the ONLY class
available that fit my schedule was SQL. I had no interest in SQL, wanted
nothing to do with it, and only took the class under duress. Even to the point
of admitting as much to the prof in a casual conversation (he was a good guy,
easy to talk to).

Within two months into my first job out of school, I was assigned to implement
a SQL parser as a modern new interface for an ancient proprietary database.
Every job since, I've written tons of SQL queries. SQL rocks.

Life is funny that way.

------
richpimp
It's very interesting how many application developers shun SQL. I believe a
lot of it is due to a pervading sentiment that SQL is unruly or inelegant.
There are certainly quirks to the language, and it is a difficult transition
to think in a declarative rather than imperative manner, but once you make the
jump, it's an invaluable skill to have. Fortunately, I have the luxury of
being able to manipulate our codebase from any level of the stack. What this
means is that rather than jump through hoops on the API or frontend side to
accomplish some task that would require reams of code, I can just get my
output via a simple query, and it will perform faster in almost all cases.

From an analytics point of view, I can't imagine not using SQL. I've seen
people pull reports from multiple websites, text files, etc., spend an entire
day manipulating them in Excel, and still not get their data model working as
expected, not to mention that it is very slow. A couple of queries with some
temp tables and voila, magic happens. It really does make you look like a
superhero when you can deliver more accurate results in a fraction of the time
it originally took. I'm surprised there isn't more of a market for this skill,
surely there's a lot of programmers from the 80's and 90's who have this
skillset in abundance.

------
johnmarinelli
I'm a frontend web developer, but for a year I worked on a custom ETL system
and wrote lots & lots of SQL. I still find myself using knowledge I learned
back then and applying it today - for example, being able to review BI's
queries is immensely useful. I'm also much less intimidated when I look at
backend code, since a lot of it is interfacing with a database in some way.

I think SQL is one of those essential "secondary" skills for developers.

------
dsego
Reading the comments here is like someone with years of experience with jQuery
saying how jQuery is simple and powerful and nothing will replace it. SQL is
to relational databases what jQuery is to the DOM, only shittier (maybe like
mootools) and refuses to die, probably because all these SQL experts aren't
really good programmers. Reading some compare SQL to 70s style procedural code
tells me they haven't moved on from expert beginner territory.

~~~
FroshKiller
I find that programmers with a chip on their shoulder about SQL have poor SQL
proficiency. And I don't mean "poor" in the sense of not knowing about window
functions. I mean coding a query inside of an enumerator and wondering why
they get feedback about poor performance in production.

------
jlj
SQL has opened a lot of doors for me. It's a starting point, not an end
though. Learning about dictionaries, lists, and other data structures has
proven valuable and compliments SQL and tabular datasets very nicely. I got
into those areas by working with SQL generators (ORM's) written in Python
(airflow).

My thinking on SQL has evolved and lately I see it as a set definition tool.
"Do action X on dataset Y." It's really useful for understanding data
structures and data meaning too.

I've worked with more than a few SDE's who look down on SQL, but it's a really
good tool when used properly, and it cuts across many technologies. Writing
code to write SQL can be very powerful. And sometimes coded or scripted data
wrangling without SQL is very useful too.

15 years ago SQL knowledge was not that widespread and it was easy to get
tagged as a report writer. Today, a lot more business, product, finance, and
accounting people are really strong with SQL, and rely heavily on exporting
data to excel for further analysis. Knowing how to answer business questions,
get insights out of the data, and define or categorize sets of data are all
enhanced by SQL. Report writing is not as much of a thing anymore because
people want to view the data in diverse ways.

The barrier to entry is low with SQL, but learning it well takes time and some
mistakes to get efficient and precise with it. 15 years later I am still
learning new uses for it. One example is JSON querying and transformation
which is supported by hive, presto, and some other compute platforms. It's
easy to mix and match JSON, arrays, and tabular data structures, in one or
more tables, from the same SQL query.

~~~
dswalter
Presto's set of functionality ( ANSI-compliant SQL plus so much more) is a
pleasure to use as a data scientist who does a bunch of analysis. JSON,
arrays, maps, dates, HLL, spatial stuff, rich aggregation functions, etc.

The only real problem with it is that it's easy to get into some thorny-
looking transformations because there is so much to work with.

~~~
jlj
One way I've solved for this in presto (Athena) is creating a table that
flattens the JSON, and views on the table to un-nest arrays or show different
grains of data for different use cases. Then transformations and joins to
other tables become very simple.

This works well for my current area because the JSON has a consistent and well
defined schema.

------
iblaine
I recently had a candidate that had very good Scala skills and very poor SQL
skills. Despite Scala and spark being powerful, you can over complicate a
solution in those languages that could otherwise be easily solved in SQL.
That’s interesting to me because I see SQL as a fundamental tool that you
build on top of and some people these days seems to skip it.

~~~
opportune
Do you know how good is the SparkSQL optimization in e.g. Databricks?
Conversely I have also seen SQL used in overcomplicated ways to replace a more
general programming language.

When _scripting_ in SQL I have seen that in certain SQL flavors there is no
good way to split a large file into k smaller ones for a fixed k without
creating a new column 1-<number of records>, selecting k new tables modulo the
number in the new column, then writing the k files. The way this is
implemented is usually naive so the performance is piss poor due to writing
and subsequently reading so much data.

If this candidate was very good at Scala, assuming they were a Spark
programmer, they could probably do lots of dataframe operations that were as
good or better than SQL commands regarding performance no?

------
vardump
I wonder if we'll one day get something like SQL, but that is based on
hypergraphs instead.

Correct me if I'm wrong; I understand you can represent any relational model
in hypergraph. And a lot that aren't representable in relational model, but
are natural in hypergraph form.

~~~
_bohm
Fascinating; I've never heard this. Can you recommend any reading material on
the topic?

------
manish_gill
SQL is pretty great. I've been doing advance SQL in Postgres and BigQuery for
quite a while and it's one of the most satisfying experiences.

Once you wrap your head around basics such as GROUP BYs, JOINs, CASE
statements etc, you move on to advance concepts such as Window functions and
there suddenly a new world of possibilities opens up to you for analytics!
I've dabbled a bit in PL/pgSQL, but the syntax is way too arcane.

SQL can also be counter-intuitive sometimes. I rewrote a particular PG query
and reduced time from ~30minutes down to 2 seconds by _adding_ a subquery. :)

And no, ORMs can't really do what SQL can do in an equivalent manner.

------
drej
Valuable yet fairly common.

I remember interviewing at FAANG and being asked to code up various tree
traversal algorithms... and moments later I would be asked to write window
function aggregations in SQL. And it was like this for all the interviews with
that company - it was fairly bizarre as I wasn't sure what the aim there was.
I understand that SQL is omnipresent, but surely people with algorithmic
knowledge would be able to pick up SQL in hours or days, while the opposite
doesn't quite hold.

(Oh and I agree with the article and I do like SQL for all sorts of workloads,
that's not the point.)

~~~
saagarjha
> I understand that SQL is omnipresent, but surely people with algorithmic
> knowledge would be able to pick up SQL in hours or days, while the opposite
> doesn't quite hold.

As someone who knows essentially zero SQL, is this really true? How long would
it take me to learn it well enough to be competent with it?

~~~
FroshKiller
In my experience, "algorithmic knowledge" doesn't indicate that you'll be any
damn good at SQL.

I think the time it takes depends on what competence looks like for your use
case. If we're talking about just retrieving records that match certain
criteria, you might get over the hump of understanding joins faster than a
non-programmer, and you could probably write 90% of the queries you'll ever
need in a week or less.

Reporting is different. Designing schema is different. Troubleshooting data
issues in a poorly designed database is way different. Tuning queries and
indexes for performance is way different. Not all use cases include those
things.

------
philonoist
Given such positive response, I want to learn SQL.

What are some good resources to learn?

I need to learn from the very beginning and it would be helpful if it had
practical training excercizes as part of the course.

~~~
Max_Mustermann
You can check out these 2 sites with practical exercices:

[https://www.dataquest.io](https://www.dataquest.io)

[https://campus.datacamp.com](https://campus.datacamp.com)

I've also seen this free online Stanford course recommended a lot:

[https://lagunita.stanford.edu/courses/DB/SQL/SelfPaced/about](https://lagunita.stanford.edu/courses/DB/SQL/SelfPaced/about)

------
ben7799
I learned SQL around 1996 or 1997 and had an absolutely fantastic college
course or two involving it.

I've mostly been a backend developer... having the ability to go in and fix
SQL and make things run in < 1 second instead of 5-10 minutes has been one of
the best skills I could have picked up.

It is sad that so often "self described 10x programmers" build solutions to go
around SQL that are horrible failures. Poor use of ORMs, weird abstractions at
the application layer that force developers to use poor data access patterns,
unnecessary "locking" at the application layer, unnecessary "existence checks"
at the application layer, processing objects 1-by-1 in the application. All
these things lead to terrible performance and huge wastes of application
memory/IO.

I love some of the NoSQL solutions too as in some cases they can force teams
to use better data organizations/patterns and scale so well. Those patterns
are often possible in an RDBMS but the system doesn't guide a team to using
those patterns. The way CQL in Cassandra forces you to think about data
organization is great for example.

------
hbarka
I love SQL. I love SQL for Data Warehousing even more. There’s a school of
thought for DW that goes back many years. Many dimensional modeling
practitioners know who Ralph Kimball is and also know the Inmon vs Kimball
battles. To this day, star schema as a design pattern has helped dozens of
analysts in my company abstract the relationship between measures (facts) and
how to slice them (dimensions).

------
childintime
Well, I'm not a fan of SQL. To me the math underpinnings are not well exposed,
probably because the language doesn't "see" obvious relations and that because
of this, joins must be performed explicitly, over and over again, and come to
dominate the query.

Isn't it obvious that when a column has the name of a table that the idea is
that the columns of that refactored out tables become available as if it were
a part of the main table?

When the query works regardless of the structure of the table, the database
can be refactored with ease. Most such queries would simply specify column
names, and a filter to apply on the records. Any joins needed, and which
result from the structure of the database, would be inferred.

Such a "NoJoinDB" would clearly boost productivity, and lots of applications
could be written with no explicit join at all. A language like SQL seems to
hide the simplicity of most queries!

Please comment about the validity of this perspective.

------
phonebucket
Another aspect of SQL that I have come to appreciate is it's underlying
philosophy: state what you want at a high level with a strict syntax; in
exchange, your backend will make aggressive optimisations for you.

This sort of paradigm really made me understand that writing 'high-level' code
and performant code are not mutually exclusive.

------
kerng
The reason SQL is great in my opinion is partially what the author highlights.
In addition I think one of the reasons is that it is a very mature technology
that baked over decades. I first was exposed to ANSI-92 SQL in school and that
was already like the 2nd or 3rd revision of the standard. I used it in my
first couple of jobs and the skill applied everywhere, Oracle, mySQL, SQL
Server and there was even something called Access that understood it and
Access was widely used amongst small businesses.

Knowing SQL to me was and even now is extremly helpful to understand data and
how to tackle data problems. There weren't many other technologies that had
such profound long term applicability, besides maybe knowing and understanding
HTML and C. Many new things are just variations and improvements of those core
technologies, and those can easily be understood and learned with good
foundational knowledge.

------
Hippocrates
I would say the same about bash.

~~~
emilsedgh
Products in production environments aren't relying on Bash that much anymore.
Nowhere near how they depend on SQL databases.

~~~
ehnto
I have never seen a production environment not use bash for part of the deploy
process, if not all of it. I have seen a lot of prod environments.

------
democracy
The thing is that most of the applications do need the complex data and thus
do need need deep SQL knowledge. This is why NOSQL (I think) databases became
popular. A DBA is called an "administrator" for a reason - most of the time he
spends making sure the data is replicated and backed up/restored and really
you need her help when things go to shits. It happens a lot but still in the
majority of cases her role is invisible as the devs are trying to abstract the
persistent layer (Hibernate and all other ORM frameworks, for java for
example) where SQL-compliant database are just a part of the corporate
landscape - not because I need it, but because it is there and all the
mentioned processes are there.

------
nicoritschel
And more recently, we see SQL abstractions overlaid on top of things like
Kafka streams with messages encoded in Avro thanks to technologies like Spark
structured streaming.

SQL is not only relevant to traditional databases. I never got into C#, but
LINQ looks _really_ interesting.

------
veritas3241
I fell in love again with SQL while working in my current position. What
_really_ made me fall in love was the project dbt[0] which is a SQL compiler
and executor. You can build a DAG of transformations all the way from your raw
data to tables ready for viewing in your BI tool or consumption by your ML
model. I'm still amazed at the things I can do with SQL alone without having
to bring Python into the picture.

Also, I had the pleasure of meeting Craig at PyTennessee a few years back.
Really great guy and yes, he does seem to wear a hat all the time!

[0] [https://www.getdbt.com/](https://www.getdbt.com/)

~~~
yittcvz
Same experience here. dbt has been a pleasure to work with, and putting it on
Redshift with Redshift Spectrum has proven a great pipeline for our startup.

------
minimaxir
With all the data scientist thought pieces out there, I really wish there was
more in-depth discussion of SQL and highlighting of the other shenanigans you
can do with query optimization, subqueries, and even analytic/window
functions. (although that last one is _apparently_ the Hello World equivalent
in data sci:
[https://twitter.com/minimaxir/status/1094430576092704768](https://twitter.com/minimaxir/status/1094430576092704768))

As a data scientist, I spend a _lot_ more time building queries than working
with models.

------
nazri1
Ha I could say the same thing about my $EDITOR, which happens to be vim. I
learned how to use it about 2 decades ago and I still use it almost daily
since. Some things are really worth investing your time learning them.

~~~
tapanjk
+1

I think the power comes from knowing something -- _anything_ -- well.

I feel the same about vim and common unix tools (bash, sed, awk etc). If you
can use these effectively, these can be very effective productivity tools. The
learning curve is steep, just like learning to ride a bicycle, but once you
do, it is difficult to imagine life without it.

------
enobrev
There's not much going on in this post, so I assume the overall high rating is
due to SQL being so damned fantastic.

I commented to a friend the other day that my favorite thing to do on any
project is project-wide query optimization. It's like pulling weeds or power
washing.

Arguably _every_ application is simply a means of manipulating data. All the
other parts are important, of course, but the data comes first. And SQL is a
hell of a UI for data. I've tried quite a few other query languages (or
structures), even inventing my own once or twice, but none come close.

------
OliverJones
If you're early in your career and looking for something useful to learn,
SQL's a good bet. It will probably outlast every other part of the tech stack
you now use. Everything the author says about SQL is true.

But, like asphalt for roads and aluminum for airplanes, it still can stand a
lot of improvement. By the way, it's just as important as asphalt and
aluminum, and consequently just as hard to change. That's the curse of the
customer base.

I wish the SQL world could agree on standards for string manipulation, and
stored function / proc programming.

------
berbec
It still amazes my that Don Chamberlain was my dad's college roommate and co-
worker at IBM. I always try to weasel college debauchery stories out of him,
but as yet have been unsuccessful.

------
gigatexal
I’ve met far too many devs who hide behind ORMs instead of writing raw SQL and
it makes me sad. There’s so much one gives up by allowing an ORM to handle
everything for you.

------
anorman728
As a dev, I have to use a query builder. It's frustrating because it's ten
times more complicated than SQL and slower to execute, with basically zero
advantages.

~~~
aitchnyu
Name and shame? Would you feel the same way about this?
[https://pypika.readthedocs.io/en/latest/2_tutorial.html#sele...](https://pypika.readthedocs.io/en/latest/2_tutorial.html#selecting-
data)

~~~
anorman728
Not really interested in naming and shaming because the lead dev trolls the
internet to find people that don't like the framework it so he can publicly
mock them.

As for PyPika, I wouldn't be fond of that, but that at least has an ability to
get the raw SQL being built. There's no such option in my framework (and I've
looked for hours combing the source code trying to find it), so debugging why
the data coming out is wrong is kind of a nightmare (especially since the
documentation is incoherent).

------
rv-de
I can't remember a single job interview where I was asked SQL questions. But I
can say that I conducted dozens of interviews for developers and almost always
asked for some simple queries.

The first question would be for the statement retrieving an entire table
without any restrictions or conditions and 50% would fail giving that query.
If somebody would manage to actually build a query involving a JOIN, WHERE and
a GROUP BY then that interviewee is pretty much hired ...

------
40acres
This thread has really opened my eyes. I was introduced to SQL on the first
day of my first internship, it threw me for a loop and while the job was
application development, the industry was finance so we were a very SQL heavy
shop. That first "professional developer" experience burned into my memory the
idea that all developers knew SQL, to me it was just an implicit thing that
everyone knew -- even if they didnt work with it every day.

------
aboutruby
I agree with the core idea. I've seen all kinds of people use SQL in their
jobs even if it's not every day: marketing, finance, developers obviously,
etc.

------
gherkin1
>You seem like a superhero. You seem extra powerful when you know it because
of the amount of people that aren’t fluent That's exactly what I said to newer
developers about SQL. Another two examples of this sort of skills are git and
regex. Most people knows how to use them but don't dive into them that deep
and you just need to spend very little time -hours, or days at most- and you
will better than most people at them.

------
rswail
The most important skill that knowing SQL has given me is the ability to think
in terms of sets and declarative ways of defining what I want the end result
to be (I believe the fancy term is the "projection" :) ).

Yes, it can appear to be magic to people who don't think that way, but doing
things with window functions etc and understanding what the output of EXPLAIN
means have helped unbelievably when improving performance etc.

------
jtms
I consider myself lucky to have begun my career in earnest at a time just
before ORMs became the norm in web development. I wouldn’t say I am an expert,
but wow has the sql experience developed early on paid dividends over and over
throughout my career. I feel like postgresql in particular is one of the most
powerful pieces of software I have ever worked with. You really don’t need
much more to build a robust backend!

------
scatterwilds
"like the other mechanistic software theories, the relational database model
is a pseudoscience; that it is worthless as a programming concept; and that
the relational systems became practical only after annulling their relational
features, and after reinstating – in a more complicated form, and under new
names – the traditional data management principles." \--Software and Mind, pg.
676

------
arendtio
While I find SQL useful too, there are other things I value even more than my
SQL skills.

Being able to use a POSIX Shell with tools like curl is certainly one of them
as it enables me to connect different technologies on a very practical level.

Furthermore, understanding the basic functional programming principles is
invaluable if you want to build clean algorithms (doesn't matter if you use
Excel, C or Lisp).

------
xpil
ROW_NUMBER(), LAG() and MERGE are the three most powerful operators I use in
SQL that are out of the average Joe's knowledge. They are simple to learn yet
extremely useful.

Also, after 25 years in the IT business I still can't write a proper PIVOT
clause without googling it first. Shame on me.

(btw I'm a MSSQL nerd. Been in Oracle/ODI world for a couple of years but it
was too scary)

------
rafiki6
The only thing I'll disagree with is "SQL is permanent". Nothing is permanent.
SQL will live for a long time, but eventually it will be displaced by
something else once something displaces RDBMS as the defacto data storage
standard of the world and chooses some other interface. That being said, learn
SQL, it really is a very very useful thing to know.

------
roystonvassey
Can't agree more. SQL is the one skill that I continue to use across a 15-year
span while many others have come and gone - SAS, SPSS, Clementine and name-
your-latest-plug-and-play-analytics platform.

It is a skill with the highest bang for buck and is a great way to introduce
non-programmers to a basic querying language. It is intuitive, easy-to-read
and unbelievably versatile.

~~~
bonesss
As an app developer & architect I'm continually surprised by how often I bump
into other techs on projects where BIG EFFORT is being put into all kinds of
crazy ORM shenanigans, or subsystem development, or integration because people
didn't understand how the underlying database platform could be managed
through SQL or how the underlying data could be manipulated through SQL.

Developing an data-driven app without understanding how your data-management
solution works is like using just one rollerskate. Sure, you can get places,
but...

------
Vektorweg
I believe, that the way data is structured in Prolog is easier and more
flexible than in SQL. Sadly I was busy with other things, so I can't tell for
sure how it looks for bigger projects and I haven't tried Datalog yet. But I
wouldn't consider SQL as a holy grail, after I run into various issues when
dealing with not so relational data.

------
zzzcpan
You can get to the opposite conclusion with some perspective. Just two decades
ago SQL was pretty much a mandatory skill for new programmers. Everyone was
learning it, it was in every programming job requirement in any way related to
storing some data, etc. Basically databases and SQL were synonymous. But today
a Citus guy writes a post attempting to convince people to learn some SQL
(Citus is an SQL database company). And last time I used SQL was when Chromium
dropped my cookies after an update and I was too lazy to relogin to all the
websites (it stores them in an SQLite database). I guess things are not
looking so promising for SQL.

Maybe we'll get to a better query language eventually. Distributed systems
have some fundamental incompatibilities with SQL and theoretically could
standardize on a different language, one that is not just about the data
anymore, but that also lets users express and choose various performance and
consistency trade-offs. I mean a query shouldn't, for example, invoke a
transaction and consensus algorithm when all you need is to increment a view
counter or give a one star rating or insert a log entry, those things can be
propagated eventually without destroying latency and throughput and without
complicating anything with transactional semantics.

~~~
busterarm
SQL is just fine. If incrementing a view counter, giving a one star review or
inserting a log entry is important to you, you absolutely want that wrapped in
a transaction. If you don't think so now, something will force you to think so
at some point in the future. If you wait that long, it's probably too late.

The direction this line of thinking trends in is not thinking or caring about
what data you're storing or caring about how it scales. If you're successful
enough, this becomes an enormous problem. It literally is a problem that sinks
otherwise successful companies.

Not thinking about your data model, not caring about the interactions of your
data and not caring about the reliability of your data is a very expensive
problem. It's also one where you don't know better until you do. Take some
advice from the rest of us, please. We're saying this for your benefit, so
that future you doesn't follow in our footsteps.

~~~
zzzcpan
> If incrementing a view counter, giving a one star review or inserting a log
> entry is important to you, you absolutely want that wrapped in a
> transaction.

You absolutely don't. You want it as CRDT operations, not transactions.

~~~
busterarm
That still depends. You're still making a choice of priorities. You're making
a choice that you're okay losing some data when failures happen.

For the mentioned use cases CRDT is probably an okay choice, but not always.
In some use cases/industries, losing log lines or having inaccurate counts is
absolutely not okay. There can even be regulatory concerns here. And fines!

It's also possible to scale traditional databases out pretty far. It's
difficult and expensive, but in probably 98% of cases you can do it. If your
company has to, it should be able to afford it anyway.

~~~
zzzcpan
CRDTs allow you to provide stronger guarantees wrt not losing data, than
transactions.

------
emiliosic
It still surprises me, although it shouldn't, that many developers try to
'hide' SQL through (in mi opinion, convoluted) ORMs. Although I see the
apparent beauty in the simplicity they provide to front-end developers, to me,
additional layers will always add complications down the line.

------
fogetti
My two cents (I haven't read the blog yet so there is that): everybody thinks
it's important but when you mention table inheritance or materialized views
then the loudest (how to call them not to hurt their feelings...maybe
proponents(?)) are the most clueless of all the folks.

------
Dowwie
I highly recommend the SQL and relational data modeling books by Joe Celko,
Bill Kirwin:

    
    
        - https://www.goodreads.com/author/show/93496.Joe_Celko
        - https://www.goodreads.com/book/show/7959038-sql-antipatterns

------
bayesian_horse
I know SQL. But I tend to download data from SQL databases and then use Python
and Pandas to deal with it, even though most things could be accomplished in
SQL more efficiently.

Maybe Python is more convenient, cognitively? Pandas and SQL are very similar
in concept, though.

------
ww520
Understanding relational model is a valuable skill. SQL is useful but it's
just the querying language. Understanding the relational model let you use SQL
correctly.

------
richardw
Started with raw light SQL, went to ORM's, then joined a team with heavy SQL
background. It's been a superpower since. A very reusable skill.

------
jsnk
Reading the thread so far, I see SQL and bash suggested as being valuable
skills.

Can you suggest more skills and tools like these that are timeless and
valuable?

~~~
wvenable
Regular expressions

~~~
ams6110
And sed, grep, and awk (though awk by itself can often do the job of the first
two).

~~~
yesenadam
Yeah, I use AWK constantly, not just for reading files line by line, but
whenever I need to e.g. create data in some format, like CSV or XML or for my
programs, or to write repetitive code in a program or LaTeX file (e.g. for a
lot of images) etc. It's just so quick and easy, such a versatile tool, a
pleasure to use. I learnt sed and grep too but rarely use them.

~~~
james_s_tayler
I only got into AWK much later than grep/sed but I was glad to discover some
of awks power. Sed is still and will always my first love, but more and more I
reach for awk to learn it as there is still so much I don't know.

------
mediaright
[digs out copy of "The Manga Guide to Databases" I bought in school as a
gag...but actually turned out to be really good]

------
whynotminot
Having a solid working toolkit of SQL knowledge really will make you seem very
useful on a team.

I do say "seem" though, in that if you weren't around it's not like the team
would fail without you. It's not hard to learn SQL or Google around for a
little bit to figure out what you're trying to accomplish.

Sort of like that one guy on your team who's a wizard at regular expressions.
He seems like a life saver in the moment, but you'd be able to figure it out
without him, too.

~~~
sooheon
Speaking as not an SQL guru -- isn't this true for every skill? Theoretically
anyone can figure anything out with enough time, the value is of specialists
having better mental maps for what solutions apply, when you need them.

~~~
whynotminot
> isn't this true for every skill?

But you're implying then that all skills are equally hard to get a handle on,
and that's just not true.

Working with sql is fairly simple even for those who aren't experts. But (as
the article mentions) it's not always something a dev has done before or
consistently so they'll often feel relieved if there's someone who's confident
about it around.

~~~
dominotw
>Working with sql is fairly simple even for those who aren't experts.

Its not though. There are windows, subqueries, joins , various functions,
proprietary extensions, partitions, aggregations ect ect. Even if you get all
the down, figuring out a performant version of the query is another hurdle,
the whole 'simple' abstraction falls apart once you start tuning your query.

------
growlist
I also love SQL! Spatial extensions are really valuable as long as you get
your spatial indexes built.

------
forinti
Learn to use analytic functions and you'll become a query guru.

------
forrestthewoods
I don’t know SQL. What’s the best resource to learn?

------
miguelrochefort
SPARQL: The most valuable skill.

------
neokantian
> Learning SQL once will allow you to re-use it heavily across your career
> span without having to re-learn.

Like all good abstractions, SQL is the practical expression of a mathematical
theory. In the case of SQL, you use Zermelo-Fränckel (ZF) set theory to reason
about data sets.

While it is easy to come up with merely conjectural implementations for
haphazardly doing things -- arbitrary trial and error, really -- it is
notoriously hard to develop a systematic axiomatization such as ZF.
Mathematics do not change and are hard to defeat. That is why math-based
knowledge and skills are able to span an entire career without having to re-
learn.

SQL got kicked off in all earnest when David Child's March 1968 paper,
"Description of A Set-Theoretic Data Structure", explained that programmers
can query data using set-theoretic expressions instead of navigating through
fixed structures. In August 1968, Childs published "Feasibility of a Set-
Theoretic Data Structure. A General Structure Based on a Reconstituted Set-
Theoretic Definition for Relations".

Data independence, by relying on set-theoretical constructs, was explicitly
called out as one of the major objectives of the relational model by Ted Codd
in 1970 in his famous paper "A Relational Model of Data for Large Shared Data
Banks" (Communications of the ACM 13, No. 6, June 1970).

Using Codd's work, Donald Chamberlin and Raymond Boyce at IBM, developed in
the early 1970s what would later turn out to become SQL.

~~~
kccqzy
Relational algebra is a much closer foundation for SQL than ZFC (or plain ZF,
because for finite sets the difference doesn't matter). You can say SQL has
ZFC as its foundation but that's not very useful. In the same way you can say
ZFC is the foundation of everything in mathematics. SQL doesn't even use
anything beyond naïve set theory because we are dealing with finite sets, let
alone the full power of ZF.

~~~
neokantian
> In the same way you can say ZFC is the foundation of everything in
> mathematics.

Yes, technically, that is indeed true of every Turing-complete axiomatic
theory.

You could conceivably even use the SKI combinator calculus as the foundation
of everything in mathematics. Still, something tells me that a database query
language in widespread use, based on the SKI combinator calculus is rather
unlikely to emerge (certainly not at this point).

Hence, since ZFC is one of the many alternative, equivalent foundations of
everything in mathematics, picking ZFC still amounts to making a choice, no
matter how small ...

> SQL doesn't even use anything beyond naïve set theory ...

Agreed. However, we also have to consider what situation we came from. David
Childs objected to the use of custom, application-specific code, data types
and functions, because these custom structures could (more easily) be
represented by one, general abstraction. Given the fact that back then, people
were not even using naive set theory, his considerations were clearly a step
forward in the right direction.

