
SQL databases come up with algorithms you’d never have dreamed of (2017) [video] - truth_seeker
https://www.youtube.com/watch?v=wTPGW1PNy_Y
======
ak39
So in 20 years we've gone from vilifying SQL statements in code, witch-hunted
folks who did this and instead invented and propped up clunky ORM
implementations, ignored sage advice on the famous "object-relational
impedance mismatch" (which the author claims he doesn't "believe" exists -
it's not a choice btw) and now, after 20 years of agent orange in the bloody
Vietnam of Computer Science, we are presenting why SQL was better all along -
showing off windowing functions as examples? This is irony that took 20 years
to sink in.

Am I understanding this guy's presentation correctly? (I left around the 20
min mark of the 50 min presentation).

~~~
lugg
I dunno. I think there is quite a lot of us that have consistently expressed
the stupidity that is wrapping a powerful query language in a mediocre,
limited, buggy, half baked, shoe horned into OO shit what was I saying?

Oh yea, ORM.

I've always veiwed it as a legitimate (though always over engineered) solution
to wrap some repetitive database operations in a consistent manner.

The biggest issue with ORM isn't the solution they came up with, half baked
and buggy as they are, but the problem they are actually employeed to "solve":
developers afraid of SQL. Whether they can't write it, can't learn it, or just
straight hate the string representation in their code, it's the main reason
people sign on for ORM.

Then they prop it up with all manner of baseless reasoning.

Theb there is the leakyness. ORMs are usually quite fluid in their language,
they make it easy to represent the database in code. The only issue is that it
isn't there. It's a separate thing. When you see database code it should be
obvious and very far away from business logic. What does ORM do? Mix it all up
On purpose. Have fun with all the mocks.

~~~
ken
> the problem they are actually employeed to "solve": developers afraid of
> SQL. Whether they can't write it, can't learn it, or just straight hate the
> string representation in their code, it's the main reason people sign on for
> ORM.

This sounds awfully like a straw man. I've worked on several projects that
used an ORM, and it was never because developers couldn't write SQL.

ORMs solve a real problem: SQL isn't composable. That's it.

Sooner or later, every non-trivial program needs to compose arbitrary pieces
of one query with arbitrary pieces of another, so you'll either use an ORM
(i.e., native data structures instead of raw strings), or essentially write
your own ad-hoc one.

It has nothing to do with lack of ability. The staunchest ORM advocate I've
worked with was also the first to pull out the debugger and single-step the
MySQL driver to track down a bug (that one was nasty -- it was indeed in the
driver).

I use an HTML generator, and a compiler, too, and it's not because I'm
"afraid" of assembly language or that I "can't learn" HTML. These abstractions
have real value.

~~~
karmakaze
> ORMs solve a real problem: SQL isn't composable. That's it.

How so? I've always found the opposite. I can build up a compilex query
referencing many tables in SQL without impediment but am unable to do the same
with an ORM.

~~~
platz
OP means combining two different queries or clauses into a bigger query/clause

You cannot easily "modify" a sql statement.

There are type-safe DSLs in certain languages (if they have an expressive
enough type system) that let you do this, but they are uncommon

~~~
vp8989
You could use views. Although personally I dislike adding this layer of
obfuscation. Generally devs are very underskilled in SQL (relative to it's
importance) and ORMs hiding the SQL IMO is one of the reasons why.

There is great value in having the entire SQL easily visible in your source
code. If there is a performance issue with a specific query my app is running,
I can very quickly pull it out and into a database IDE and see what's going on
using the more detailed tools available (REPL, execution plan/statistics usage
analysis etc).

If the query is composed from a bunch of ORM function calls, I have to step
through the program to first generate it.

Reducing friction around debugging allows your devs to become more capable in
fixing things or making them work better, and SQL is no exception to that.

~~~
ako
Views are not a layer of obfuscation. They are a layer of abstraction...

------
ThJ
This talk didn't hook me, but then again, it looks like it's aimed at
developers who jumped on the NoSQL bandwagon and now need to be told why SQL
is clever.

I feel that much of what we do with map, reduce, sort and filter on arrays in
modern JS is similar to SQL. Unfortunately, there is no query optimisation,
because that would break your procedural code.

What I'd like to see more of is domain-specific languages embedded in general-
purpose languages, like React with JSX, or just a better bridge between the
two. We have these two worlds of in-memory data structures on one side, and
databases on the other, that we keep separate. NoSQL was an attempt at
bringing them together, but it put far too little emphasis on complex queries
and filters.

It would be nice if we could just define data structures and their relations
once and not have to do it again. It would be nice if foreign keys neatly
mapped to object references in OOP and everything stayed live, so if you
change a reference, that happens instantly in the database too, unless you
wrap it in a transaction block.

Now pull this off while still retaining the ability to open an SQL REPL to
test out statements, or manipulate field definitions.

I think that's what everyone wants, really. A seamless way for code to
interact with data on all levels, with no perceptible boundary once you've
opened a session to your DB server.

~~~
lugg
Look up some of the old YouTube vids on datomic / rich Hickey.

"Database as a value" had me jaw dropping and then crying most of the way
through..

~~~
dustingetz
+1 for "Database as a Value", this talk changed my world

------
rgoulter
At around 22:10, the speaker summarises equivalencies between common SQL
clauses (SELECT, GROUP BY, etc.) and what Java 8 calls a "stream API" (map,
collect, etc.). [https://blog.jooq.org/2015/08/13/common-sql-clauses-and-
thei...](https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-
equivalents-in-java-8-streams/)

Neat. I hadn't thought of it that way before.

~~~
mjburgess
I teach most (dynamic) functional programming constructs by starting with SQL.

Python comprehensions, etc. are mostly just SQL -- likewise most languages
where "map" is part of the collections API.

SQL motivates the utility of functional programming for data-oriented
applications.

~~~
collyw
Have you got any material online to look at? I am curious to see the sort of
stuff you are talking about ("Python comprehensions, etc. are mostly just SQL"
doesn't make a lot of sense to me right now. I see them as an inline for
loop).

~~~
afandian
I wouldn't compare a comprehension to a loop because a loop doesn't typically
have an output. If a loop does produce an output it's through side effects.
List comprehensions are the same as maps, and generator comprehension even
more so, as they can compose functions not values.

And a map can be the same as a projection as found in sql.

~~~
taeric
Loops in lisp would like a word with you. :)

The difference is that many languages have constructs that weren't value
producing. It is why some have the ternary operator, but lisp just had if
statements.

So, to that end, most loops do produce a value. And there are several common
ways they do it. In many languages, you have to give the details of how they
work. In some, you can only those details and the building if the output is a
bit more declarative.

~~~
afandian
I was coming from the Lisp perspective! Or at least Clojure...

Maybe it's a language issue. I would say a 'loop' has a jump and a condition.
So that's 'for', 'while' and 'do' in C, JS, Python, Pascal, Java...

Lots of languages including Lisp have recursion, maps etc but I wouldn't call
that looping. Clojure even has TCO recursion but it only returns one value,
not a sequence (unless you accumulate the whole thing)

~~~
taeric
But even the LOOP macro has returned values for a long time. It isn't even
hard to see how the new stream/collect API if Java is just approaching what
LOOP has been doing for decades.

~~~
dfox
Java's streams implement the same idea as Common Lisp's streams, which were
supposed to be more Common Lisp-like replacement for LOOP (eg. composable and
with syntax that does not involve string comparison of symbols). That idea was
droped from the final ANSI standard.

~~~
lispm
SERIES, not streams.

------
mnazim
A very good way to understand the fact "class are for encapsulating behaviour,
not data". But we don't learn until we fall.

------
bladelessninja
I have strong .net background with mostly pretty legacy code and I'm amazed
that this talk is so "wow" for some people. Till this point I wasn't aware
people has such problems like described in this talk. Maybe I'm just getting
old, but SQL was my first idea to solve problem. I think for most reporting
related tasks simple SQL will do. I think some people just get caught in their
bubble and forgot the basics: choosing right tool for a task. Of course SQL
has its issues: e.g. it is problematic to test it and to manage db model
changes (code first aproach wins there, hands down), but for data extensive
tasks it is the way to go. This whole talk just remind me some of legacy code
I was investigating one time, where developer just though he can implement
everythin on his own better than standard library. Probably you need some
distance when looking on particular problem to see more just "how I would just
loop through all this data an calculate this".

------
Sulong
May be the topic of talk should be how to compare apples and oranges for half
of time in the talk.

~~~
lukaseder
Yes that's what this talk is about.

A lot of people recognise apples and oranges for what they are. The talk is
meaningless to those people.

A lot of _other_ people don't / can't make the distinction (yet). For those
people, the talk is an eye opener.

------
RandyRanderson
I'll save ppl some time:

* first 20 mins: Writing a contrived report query that's easy in SQL is easy in SQL.

* but: most _real_ reports require many different fields collated into many parts of say a pdf or xls. Once you take this into account writing custom code to do the entire report is often the only way to do it. :<

* you often can't do some of what the author suggests bc in a "real" org you have db zealots that only allow access to their precious through stored procs and a surprisingly large number of meetings

* trying to do some non-trivial where clauses? Not in SQL - so you have to use an external lang. I call this "Works in my presentation" syndrome.

* he acts surprised that a lang that uses a db through a SQL api can't be as fast/efficient/terse as a system that uses SQL and has direct access to the db/storage.

Stopped after that.

PS Seems like a funny guy tho - liked his style! :>

~~~
lukaseder
"Funny guy" here.

Can't comment on db zealots. It must suck when that happens.

But disagree entirely on your other parts. Haven't run into such cases yet,
wrote all my reports in SQL (and something like XSLT for presentation logic).
I've mainly used Oracle.

------
asah
This is quite possibly the best talk I've ever seen. No joke.

~~~
skullborg
I'm liking it so far as well - I enjoy SQL though

------
ipsocannibal
One of the biggest problems with embedding SQL directly into your application
is the dreaded vendor lock in. In my experience each RDBMS vendor supports its
own special flavor of SQL. Once you've baked that flavor into your code base
it becomes non-trivial to change your SQL backend if necessary. From what I
understand Amazon has spent years trying to untangle the rats nest of embedded
Oracle SQL dependencies within its codebase so it can extract itself from
Oracle vendor lock in. So before you bake that SQL string into your code think
carefully about what SQL functionality you are using and ask yourself if its
representation is specific to your current RDBMS vendor.

~~~
tatersolid
Pure ANSI SQL gets you 90% of what most apps need and is portable between
MSSQL and Postgres. MySQL/MariaDB are far less ANSI-SQL-compliant, but usually
only function calls change when porting.

Nobody sane uses oracle for new development.

If you chose to code your app to MongoDB or Dynamo or whatever you are 100%
locked in unless you do a rewrite of your whole data access layer.

To top it off, in the real world you’ll still need a separate SQL DB for
reporting and analytics.

------
jayd16
I really do think a lot of ORM use is because devs are afraid to use SQL or at
least afraid of how others devs might use it. This is true whether devs want
to admit it or not.

That said, SQL doesn't lend itself to easy composability and you can end up
with "4000 line monsters" if you really tried to put as much business logic
into the query as possible.

Is anyone working on SQL extensions, ORM, or new 4th Gen language that could
actually support something like 4000 lines of SQL in a maintainable way?

~~~
btschaegg
> [...] and you can end up with "4000 line monsters" if you really tried to
> put as much business logic into the query as possible.

In my experience so far, this has usually been the actual problem. If you are
not actively trying to "cleanly" separate your business logic from your
storage, you tend to end up maintaining a mess. People correctly identify the
problems with the mess, but incorrectly attribute it to SQL instead of
architectural problems. So they switch to an ORM and wonder why it doesn't
improve anything a little while later.

And that's not to say it is easy to come up with a useful way to interface
between storage and the business logic -- on the contrary. It's just that the
naive approach (strongly coupling the two everywhere) is kind of a worst-case
scenario. Also, there are certainly situations where ORMs are very useful;
it's just that a messily integrated ORM isn't any better than messily
integrated SQL.

The thing is: Your business logic needs to be provided with ways of triggering
certain operations (along with certain guarantees). I haven't seen cases where
a single operation would take 4000 lines of "interesting" SQL though (ignoring
very long lists of columns :) ). So, if the SQL is hard to maintain: Is this a
problem of the SQL, or is this a byproduct of unclear (or ever changing)
interface requirements? Sometimes, taking a couple of steps back and
considering alternative approaches can yield much better results than
optimizing a "wrong" solution.

(I guess this could be considered a form of the "X Y Problem":
[http://xyproblem.info/](http://xyproblem.info/)).

------
jokoon
SQL feels opaque, and its syntax is one of a particular kind (can't remember
how it's called), which makes it difficult to learn and understand how it's
parsed, because it's not very well compartmentalized. If the language was
better structured like other programming languages are, that'd be better, and
jooq looks like something that should be standard across all languages.

Usually when I'm using any technology, being able to understand how it works
is my most important priority, because it prevents me from using it
improperly. I have a hard time understanding what those "algorithms" really
are after all. I heard that databases engines use backtracking, but I'm not
really sure that's what it is talked about here. Maybe databases don't use BT?

Using another language that parses every time you do something doesn't feel
very fast. It is fast for many applications, but I don't think it is for all
of them.

In the end, it's the same old combat, either choose peak performance or
development delays with good enough performance.

Although I have to admit that for GIS (geographic information system),
databases with embedded R-trees and other things are very much welcome, since
implementing those algorithms from the ground up is way too hard.

~~~
dahfizz
I'm not sure that parsing is going to be a bottleneck when you consider that
DBMS read from the disk.

~~~
thesz
Prepared statements would not have been invented if parsing and checking was
not a significant issue.

~~~
jhayward
The main point of prepared statements is that expensively optimized query
execution plans can be re-used, including incorporating things learned from
data patterns by previous queries. I've never considered them to be a crutch
on parsing.

------
jayrwren
reminds me of the presentations coming out of the .NET world in 2005 after
they launched LINQ. Nearly identical points being made.

------
ilitirit
Is SQL no longer featured as part of most Comp Sci/IS courses any more?

~~~
Izkata
It was for mine (2006-2010), but significantly more limited than you'd expect.
It was essentially bare minimum, focused on theory, and could be summarized as
"SELECT .. FROM .. INNER JOIN .. WHERE .."

Most of the focus was on things like database normalization instead of
practical usage. I recall upon starting my first job, a co-worker gave me a
crash course in what exactly a LEFT JOIN was (though at this point it had been
~3 years since the college database course, which had been so lackluster I
hadn't used one during those years).

------
didibus
Pretty funny talk, but also informative.

------
continuational
The problem with SQL is that SQL that works today will be too slow tomorrow.
Fixing it involves trying to please the query planner, all while making your
query less and less comprehensible.

As a contrasting example, the experience you get with ElasticSearch is that
queries have predictable performance even as your data changes. In that way
it's much, much nicer. On the other hand, you lose joins, which is a huge
downside.

~~~
anticensor
Butcher a SQL adapter over Elastic output and call it a day.

~~~
michaelcampbell
[https://github.com/NLPchina/elasticsearch-
sql/](https://github.com/NLPchina/elasticsearch-sql/)

[https://www.elastic.co/products/stack/elasticsearch-
sql](https://www.elastic.co/products/stack/elasticsearch-sql)

