
Unreasonable Effectiveness of SQL - mariuz
https://blog.couchbase.com/unreasonable-effectiveness-of-sql/
======
paddy_m
The relational model is awesome. SQL is one of the worst languages I have ever
encountered. The syntax is awkward and unwieldy. The only unit of abstraction
in SQL the language is the VIEW. I end up with so many repeated patterns in
SQL it's mind numbing.

I would love to see lisp-style (AST) macros applied to the relational model.
Here's an example that might have an obvious solution (please share), but not
off the top of my head. Find the id of every row having the minimum value per
group by aggregation. I have to write a subquery to do this, but the
transformation is quite regular and predictable. Where's my macro?

~~~
Mister_Snuggles
I believe some of the window functions[0] (specifically, row_number) will do
what you want. Honestly though, these might not be a readability improvement
over subqueries.

[0] [https://www.postgresql.org/docs/11/functions-
window.html](https://www.postgresql.org/docs/11/functions-window.html)

Edit: Just tested it, it works but I wouldn't call it pretty.

    
    
        create table testing(id int, grp int, other varchar2(255));
        
        insert into testing values (1, 1, '');
        insert into testing values (2, 1, '');
        insert into testing values (3, 1, '');
        insert into testing values (9, 2, '');
        insert into testing values (10, 2, '');
        
        
        with cte as (
          select grp, id, row_number() over (partition by grp order by id) as rn from testing
        )
        select grp, id from cte where rn = 1;

~~~
Tostino
Another solution for PG which i've used many times:

    
    
        select distinct on (grp) grp, id
        from testing
        order by grp, id;

~~~
Mister_Snuggles
Neat! It doesn't work on Oracle, but you implied that it's PostgreSQL-specific
so I somewhat expected that. It's a lot cleaner than mine and, even though
it's not something I've seen before, it's fairly obvious what it's going to
do.

When I did my testing I did it on Oracle since I have it available at work.
For my non-work projects I use PostgreSQL and, for the most part, find that I
can flip between the two database without too much trouble. Of course, this
also means that I don't know about some PostgreSQL niceties like what you
showed.

~~~
Tostino
Very true, it's a nice PG specific extension of the SQL standard. Would be
nice if some of these types of extensions made it back into the standard to
increase the likelyhood other DB's you use would also support it.

It's syntactic sugar, but it is a whole lot more readable in some cases than a
window function. It's nice to not need a sub query or cte to express it.

------
georgewfraser
SQL has a lot to recommend it; we've basically built a company around the
effectiveness of SQL as a way to interact with data. I will say that it's
pretty bad at code re-use, and `with` clauses are pretty awkward for breaking
down a complicated analysis into small stages.

Kusto/Azure Data Explorer [1] is a really interesting new query language,
focused on analytics. It fixes a lot of the things that are awkward about SQL.
A simple example:

    
    
      StormEvents 
      | where StartTime >= datetime(2007-11-01) and StartTime < datetime(2007-12-01)
      | where State == "FLORIDA"  
      | count 
    

[1] [https://docs.microsoft.com/en-
us/azure/kusto/query/](https://docs.microsoft.com/en-us/azure/kusto/query/)

------
nine_k
It's the unreasonable effectiveness of a language and an approach that has
some real mathematical backing behind it (relational algebra).

Lisps are unreasonably effective because they have lambda calculus behind
them. The Hindley-Milner type system and linear types, both mathematically
sound, are a large part of Rust's effectiveness and success, in my eyes.

I still would like that SQL read a bit more set-theoretic, if less English-
like. It would make it less quirky and more composable.

~~~
darksaints
> I still would like that SQL read a bit more set-theoretic, if less English-
> like. It would make it less quirky and more composable.

I wish that it didn't introduce an incomprehensible ternary logic for null
handling. SQL with algebraic data types would be amazing.

~~~
nine_k
Nulls are facts of life. I wonder how they could be _succinctly_ represented
in a query language. Something like the `?` operator in C# / Kotlin?

~~~
andrewflnr
That would be an improvement. I'd rather have proper algebraic sum types,
though (Option or Optional in languages like Haskell, OCaml, et al). They're
really useful for all kinds of stuff. Perfectly representing missing values is
almost a fringe benefit.

~~~
darksaints
Not only would algebraic sum types be awesome, but if user-defineable they
could eliminate so much of the ambiguity in how nulls are practically used in
SQL. People often say that null just means unknown but that is a bug-causing
simplification. In reality, null could mean unknown, but it could also mean
not yet known, uninitialized, missing, impossible, uncomputable, join
conditions not met, or hundreds of other things. And sum types would be
extremely useful even outside of null handling...there are thousands of
reasons why more and more modern languages are designed with sum types.

Not only would sum types be extremely helpful but it would also be awesome to
have a standard for product types. As it stands, some implementations have
them but they are clunky, and others you have to resort to just using multiple
columns, which is pretty bad for composability, bug-ridden, and leads to
behavior that is almost like Hungarian notation in order to maintain unique
column naming. For example:

    
    
        create table foo(
          start zoneddatetime,
          end zoneddatetime
          ...
        );
    

versus

    
    
        create table foo(
          start_time timestamp,
          start_tz timezone,
          end_time timestamp,
          end_tz timezone
          ...
        );
    

(timestamptz is not an acceptable substitute...it doesn't store timezone)

------
childintime
It seems to me SQL is, paradoxically, too low level, much like assembly
language, in the sense that it makes you specify the joins. Over and over and
over again. Why doesn't it infer them from the structure of the database (the
relations between tables)? If the answer is performance, well, that means we
are in agreement, and SQL is not high-level..

In summary I think a QL should produce a table from a database when provided
with 1) the set of field names desired, and 2) the set of constraints on the
desired rows. Anything that goes beyond this is in need of justification.

It is for this impedance mismatch (of understanding) that I have an aversion
against SQL (therefore I am not well-versed). I basically see SQL as an
exercise in complexity, resulting from the constraints prevalent in the
seventies. Please prove me wrong!

~~~
astine
_" It seems to me SQL is, paradoxically, too low level, much like assembly
language, in the sense that it makes you specify the joins. Over and over and
over again. Why doesn't it infer them from the structure of the database (the
relations between tables)? If the answer is performance, well, that means we
are in agreement, and SQL is not high-level."_

First of all, it absolutely can and does infer joins if you use a natural
join.[1]

However, a lot of the time you don't want the database to infer the joins
because how you join two tables depends a lot on the specifics of your query.
Not every join is along a clean primary key/foreign key axis. Sometimes you
have datasets from different sources. Othertimes, you have more novel
relationships between your tables. The whole point of the relational model is
to give the end user a lot of flexibility in querying the database which means
you need to specify the joins a lot of the time.

1\.
[https://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqljnaturalj...](https://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqljnaturaljoin.html)

~~~
marcosdumay
> However, a lot of the time you don't want the database to infer the joins
> because how you join two tables depends a lot on the specifics of your
> query.

Some 90%+ of the times, you want exactly the join the database would infer,
but natural join is basically useless, so you must repeat the basic join all
the way every time.

SQL did it backwards. The simple join keyword should mean natural join over a
foreign key. A cross product should be created with the `product` keyword or
something similar.

------
dreamcompiler
Prolog has always made more sense to me as a query language than SQL, but SQL
is more popular because it's "English-like." Which is exactly the reason SQL
bugs me.

~~~
kjeetgill
I've never managed to wrap my head around prolog but I'd say SQL bugs me for
the same reason. Care to throw out some examples?

~~~
dreamcompiler
I need to create some. In the meantime this is not a bad writeup:

[https://stackoverflow.com/questions/2117651/comparing-sql-
an...](https://stackoverflow.com/questions/2117651/comparing-sql-and-
prolog#2119003)

To clarify, I think Prolog is a lousy data storage language but a fantastic
query language. What I've done in the past is used Prolog to query a Common
Lisp object-oriented database (after modifying the Prolog to produce
composable lazy streams, which is not usually included in the Prolog box but
is very handy for database querying.) What I want to try next is the above
with standard PostgresQL as the backend repository. That will make possible an
apples-to-apples SQL comparison possible.

~~~
pritambaral
> ... used Prolog to query a Common Lisp object-oriented database ...

Could you share more details about this? I'd be very interested in exploring
something like this.

------
hestefisk
Interesting the article mentions spatial data and Oracle Spatial, but no
PostGIS. The latter, afaik, is state of the art for spatial data sets.

------
namelosw
Datalog is much more reasonable than SQL. Every time I have to write SQL more
than 100 lines I go crazy.

~~~
kristianp
Do you use Datalog with Datomic and Clojure or another database?

~~~
namelosw
I don't use Datomic since it's closed source. But the design is really, really
cool.

Actually, Datalog is a subset of Prolog, and could be used on any kind of
data, not necessarily database. Nowadays, it's implemented in different
languages, just like the Clojure version.

So like SQL, Datalog it's a declarative language and implemented on different
kinds of platforms but IMO it's much more reasonable than SQL since SQL is
much worse on composition.

------
truth_seeker
Great article and ref links.

Most people ignore the fact that Recursive Common Table Expression (CTE) makes
SQL Turing complete

------
smnrchrds
I have always used RDBMS through an ORM, usually Django's. I understand basic
SQL syntax, but nothing beyond that, and I cannot think in SQL if that makes
sense. Is there a good book or course for someone like me to dive into SQL,
preferably Postgres?

~~~
sammorrowdrums
This course is a wonderful start

[https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/abou...](https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about)

After this I would have a look at the postgres documentation. There's so many
great examples between there and stack overflow.

~~~
smnrchrds
This brings back memories. I remember going through the course many years ago,
but I got busy with other projects and quit about 25% through. I'm glad it is
still out there and still considered a good source.

~~~
sammorrowdrums
Yeah if you've gone through a good chunk maybe complete it.

I really hate knowing how easy it is to do some queries and struggling to use
the Django ORM to achieve what I could write in minutes - but for
maintainability I hate dropping down to SQL in that context.

For a fun advanced tutorial, have a go at pivot tables with crosstab using the
docs. I've used it for analytics queries to chart categories as headings, with
complex aggregations etc.

[https://www.postgresql.org/docs/9.1/tablefunc.html](https://www.postgresql.org/docs/9.1/tablefunc.html)

Window functions are another classic

[https://www.postgresql.org/docs/9.1/tutorial-
window.html](https://www.postgresql.org/docs/9.1/tutorial-window.html)

In general working with SQL and gradually solving the problems you have
learning as you go is excellent. If you want faster than that there probably
are courses but I really do rate their docs. Postgres is great.

