
Opaleye’s sugar on top: SQL in the type system - michaelsbradley
http://ren.zone/articles/opaleye-sot
======
i_s
> That is, as an example in our case, we can’t expect that a Haskell data type
> that today maps perfectly to a SQL row will continue to do so after any of
> the two is modified.

But we can and do if we use the compiler to dynamically generate code. For
example, with F#'s SqlClient (a type provider) [0], types are generated based
on queries, correctly mapping database columns (including nullable ones) to
fields on types. If you change the queries, or tables (and you are doing
'select *'), the types get updated automatically. This seems like a better
approach.

This is also similar to how popular db access approaches in dynamic languages
such as ActiveRecord in RoR work. Tables are inspected at runtime, and
getters, setters and utility methods are generated based on that.

I imagine this is also possible in Haskell via Template Haskell, which the
author dismisses as too complex/fragile, though I don't know Haskell well
enough to be sure.

[0] [http://fsprojects.github.io/FSharp.Data.SqlClient/](http://fsprojects.github.io/FSharp.Data.SqlClient/)
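
For what it's worth, the Template Haskell route does exist, though the usual
library (persistent) generates code from an in-source schema rather than by
introspecting a live database the way a type provider does. A rough sketch
using persistent's quasiquoter (entity and field names here are made up):

```haskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, TypeFamilies, GADTs,
             GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}
import Database.Persist.TH

-- The splice below generates a User record type, typed field accessors,
-- and a migration, all at compile time, from this schema description.
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
User
  name String
  age  Int Maybe   -- a nullable column maps to Maybe Int
  deriving Show
|]
```

So the "types track the schema" property holds, but the source of truth is
the quasiquoted schema, not the database itself.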

~~~
thesz
Do type providers give you semantic information when type errors occur? If
not (e.g. they don't tell you about an arity mismatch by citing the relevant
part of the SQL query), then they are inferior to expressing queries at the
type level.

------
virtualwhys
Honestly none of the statically typed sql generators (Opaleye included) really
tick all the boxes; all are lacking in one way or another.

Here's a breakdown of the leaders of the pack on this front[0]

Haskell

1) Opaleye[1]: composable and prevents invalid sql, but no join support
(simulated via sub selects), Postgres only, and tough syntax to parse.

2) Esqueleto[2]: composable, readable, supports joins, but Postgres only, and
the DSL can't/doesn't prevent invalid sql (which is, IIRC, one of the reasons
why Opaleye came into being).
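
For reference, a small Opaleye query in the style of its tutorial (table and
column names here are hypothetical), which shows both the composability and
the arrow syntax people find tough to parse:

```haskell
{-# LANGUAGE Arrows #-}
import Opaleye
import Control.Arrow (returnA)
import Data.Profunctor.Product (p2)

-- A two-column table; the write-side and read-side types coincide here.
userTable :: Table (Column PGText, Column PGInt4)
                   (Column PGText, Column PGInt4)
userTable = Table "users" (p2 (required "name", required "age"))

-- 'adults' is an ordinary Haskell value, so other queries can reuse and
-- extend it - this is the composability Opaleye is praised for.
adults :: Query (Column PGText, Column PGInt4)
adults = proc () -> do
  (name, age) <- queryTable userTable -< ()
  restrict -< age .>= pgInt4 18
  returnA -< (name, age)
```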

Scala

1) Slick[3]: myriad databases supported (Oracle and SQL Server require a
non-free license), join support, fully composable (really, king of monad
hill), and elegant syntax (until you `groupBy`/`sortBy` and then enter
unreadable tuple._1, tuple._N soup). Also, grouping requires flattening the
result set via min, max, sum, etc. aggregate functions, so it's not possible
to return a tuple representing, say, a User when grouping. Suggested
workaround is to group on the client (and send a million rows over the wire
to the app server, ok o_O).

2) Squeryl[4]: myriad databases supported (all free), join support, sql-like
syntax, but lacks the compositionality of the aforementioned and relies on
runtime reflection to make the DSL go.

Microsoft

LINQ to SQL/Objects: myriad databases supported (not sure of driver
licensing), excellent sql-like syntax, and type providers that are
straight-up otherworldly, but lacks compositionality and is tied to Windows
(until *nix support lands).

[0] There are likely (many) others but these are the statically typed ones I'm
aware of. Feel free to add others, curious to see what's out there (e.g. OCaml
must have something)

[1] [https://github.com/tomjaguarpaw/haskell-opaleye](https://github.com/tomjaguarpaw/haskell-opaleye)

[2]
[https://github.com/prowdsponsor/esqueleto](https://github.com/prowdsponsor/esqueleto)

[3] [https://github.com/slick/slick](https://github.com/slick/slick)

[4] [https://github.com/squeryl/squeryl](https://github.com/squeryl/squeryl)

~~~
tome
Some corrections/comments/questions:

> Opaleye ... no join support (simulated via sub selects)

This is true for exactly one reason: despite being raised frequently, nobody
has ever been able to demonstrate any performance difference between
subselects and joins (in Postgres)! As soon as someone does, they should file
an issue ([https://github.com/tomjaguarpaw/haskell-opaleye/issues/](https://github.com/tomjaguarpaw/haskell-opaleye/issues/))
and I'll add joins forthwith.

> Opaleye ... tough syntax to parse

What do you mean by this? The Haskell syntax of Opaleye, or the generated SQL?

> Esqueleto ... Postgres only

You're incorrect about this. From your Esqueleto link:

"It has many kinds of backends, such as SQL backends (persistent-mysql,
persistent-postgresql, persistent-sqlite) and NoSQL backends (persistent-
mongoDB)."

> Esqueleto ... can't/doesn't prevent invalid sql

But you're right about this.

------
moron4hire
I firmly believe that teams working in static languages do not easily
transition to dynamic languages, and vice versa, yet when working with SQL
that is what we're constantly asking people to do, because SQL is either less
static than C# or Java or more static than Python or JavaScript. It's this
weird bastard of a language that sits halfway in between, with no good way of
unit testing it because most operations are stateful [0].

Most SQL engines include a static verifier of some kind, but it ranges from
nearly non-existent in MySQL's case to so-pedantic-it-hurts in Postgres' case.
Even worse are the ones like MS-SQL that are static in some regards (you can't
usually submit a stored procedure that references non-existent tables), but
dynamic in other regards (but non-existent columns in those tables are fair
game!).

Personally, I'd prefer a lot more static checking. This starts off ok, but I
need more. This appears to be Postgres only right now. I think a static
checker for vendor-specific SQL is a key component for getting people off of
Oracle or MS-SQL. I've seen tons of database conversion projects die on the
vine because of the issues with translating vendor-specific extensions. We
need a "VM for SQL", some way to write-once-run-anywhere, and ANSI SQL is
insufficient. I mean, have any vendors actually gone beyond SQL-92? There have
been four newer versions since then.

[0] I mean, that's not a surprise, that's the entire point, it just makes good
unit testing very hard.

~~~
greggyb
>I mean, have any vendors actually gone beyond SQL-92?

Yes. Here's a pretty good summary presentation (it says it's Postgres-focused,
but it provides adoption data for the major vendors)[0] and the HN
comments[1].

What we tend to see (I speak from a position as a BI consultant - I get to see
lots of companies' database infrastructure) is that adoption of new SQL
constructs is mostly limited by the developers, and in some cases IT
departments who won't upgrade SQL server software.

New SQL being limited by developers comes in two flavors, crusty DBAs and do-
it-all-in-the-app-layer developers. The second type tends to be more common
among demographics similar to HN.

[0] [http://www.slideshare.net/MarkusWinand/modern-sql](http://www.slideshare.net/MarkusWinand/modern-sql)

[1]
[https://news.ycombinator.com/item?id=9018129](https://news.ycombinator.com/item?id=9018129)

~~~
moron4hire
That's a pretty abysmal report. You basically can't count on any of those
features to be available across databases.

You are correct about the fear-of-SQL amongst most application developers.

~~~
greggyb
The target is moving. Your question: "Have any vendors moved beyond SQL-92?"
The answer is a clear yes. The question was not, "Do all major RDBMSs support
the same subset of the SQL standard?"

The implied concern with the new question is portability of code across
databases. This is a red herring based only on some snippets of SQL syntax and
features.[0]

Even if the SQL were identical, no major RDBMS has as its selling point that
it only supports the SQL standard. Additionally, even with identical SQL, the
implementations in each RDBMS are different enough that logically equivalent
code would still need to be rewritten.

One example: There is nothing in the SQL standard about optimization of
queries. One place this shines through is in Postgres CTEs vs MS SQL Server
CTEs (I cannot speak to other implementations). In Postgres, a CTE is an
optimization boundary - predicates from the query utilizing a CTE will not be
pushed down into the CTE. The entire CTE is executed each time it is
referenced. This is in contrast to the behavior of sub-queries, which do
support predicates being pushed down into them. In SQL Server, CTEs are just
an in-line view definition and do support predicates being pushed into the CTE
from the outer query. Any codebase converting from MS SQL Server to Postgres
would need to have every instance of CTE use evaluated for performance.

[0] The previously linked deck highlights only some advanced excerpts of the
SQL standard. The Venn diagram of supported SQL constructs shows much more
overlap than exclusion.

------
greggyb
> Reading, writing and maintaining hand-written SQL is hard and error prone.

I see this attitude a lot on HN, often whenever any ORM comes up, and it's
often taken as axiomatic.

I'm honestly curious to hear people's opinions about what makes SQL so much
more difficult to use correctly than other programming languages or
environments. I come from a background in BI and data warehousing, so I'm
pretty comfortable with SQL as a second language, as I like to call it.

Here are a few things I understand as unique in SQL, some of which clearly do
add unique difficulties:

* Everything not a SELECT is stateful - unit testing becomes difficult

* Syntax/semantics are far from most "normal" languages (not C-like, not similar to shell scripting, not OO, not functional; verbose in a style similar to the much-hated COBOL)

I, personally, do not find large SQL codebases particularly difficult to work
with correctly or understand. Perhaps this is a combination of ignorance and
Stockholm Syndrome since most of my experience is in databases and heavy SQL
work. I'd like to understand better why the attitude in the excerpt I quoted
seems so common.

Ninja edit: Additionally, I find a lot of power and expressiveness in the
relational model. Again, perhaps because of the focus of work I do I find SQL
indispensable and very easy to use for both exploratory programming and data
analysis, as well as robust programming e.g. for ETL processing. Note: I still
use ETL tools and frameworks, but this is not to remove SQL from the equation,
but for the better dependency tracking and process monitoring.

~~~
T-R
SQL is incredibly low level

- Modelling sum types requires giving up correctness guarantees, performance,
or both.

- It's not expressive enough to get performance or abstraction - I can't
communicate "this is a one-to-one relationship", "this is append-only", "this
is associative", or "these things commute" in anything but comments. And my
co-workers need to remember to make those joins, and think about how they can
use that information to hand-optimize, because the optimizer sure as hell
isn't going to do it for them.

- There's no abstraction/composition of anything at lower than full-query
granularity, so everything needs to be inlined and repeated everywhere - this
makes conceptually small changes, like going from single- to multi-column
primary keys or vice versa, require rewriting practically every view/query.

- Some optimizer WATs are required by the spec ('where' clauses can be
propagated down the AST even past another clause that does type refinement;
CTEs always materialize the result set and act as optimization walls), others
may be implementation dependent (at least in Postgres, "UNION ALL", instead
of acting like a simple stream concatenation, more often than not
materializes the whole result set and pages it out to disk - your 0.2ms query
now takes 2 seconds). SQL is "lazy when the stars align" and optimized by
"this won't break things in the trivial case" - most effort in non-trivial
queries is spent trying to convince the optimizer not to hurt you on a whim.
Don't get me wrong, the devs do absolutely impressive stuff with what they've
got - these quirks are clearly what needs to be done to compensate for the
spec, and the spec can't change for compatibility reasons - but it doesn't
make it easy to work with.

- Things like missing a column in a join clause (like when changing that
primary key) cause significant bugs that don't actually throw an
exception/type error anywhere - they just change the size of your result set.
This could be completely fixed with type-level literals.

- Types are internally inconsistent, and supplied functions are pretty much
ad-hoc "we thought this might be useful". Type conversions regularly involve
first converting to a string or wrapping in a nested query.

- The syntax is pants-on-head ridiculous for anything non-trivial, making
education difficult and maintenance tedious (especially window functions -
SELECT a, b, LAST_VALUE(b) OVER (ORDER BY c ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS x).
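
To illustrate the join-column point: even without type-level literals, plain
newtype wrappers around keys in the application language already turn "joined
on the wrong column" into a compile error rather than a silently wrong result
set. A sketch (all names made up):

```haskell
-- Distinct key types: mixing them up no longer type-checks.
newtype UserId = UserId Int deriving (Eq, Show)
newtype PostId = PostId Int deriving (Eq, Show)

data User = User { userId :: UserId, userName :: String }
data Post = Post { postId :: PostId, postAuthor :: UserId }

-- An in-memory "join" on typed keys.
joinOn :: Eq k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
joinOn f g xs ys = [ (x, y) | x <- xs, y <- ys, f x == g y ]

-- joinOn userId postAuthor users posts   -- fine: both keys are UserId
-- joinOn userId postId users posts       -- compile error: UserId /= PostId
```

SQL offers nothing comparable: both columns are just integers as far as the
database is concerned.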

The problem, though, isn't just the amount of code and maintenance issues this
creates - it's the sheer amount of information that everyone on your team who
touches the database _needs_ to know and not overlook _just_ to not create
subtle bugs and significant performance issues. And all this knowledge is
completely non-transferrable.

SQL is so bad that standard practice is just to not do anything on the
database unless you have to; eating the connection overhead, adding yet
another cache, and having a layer of wrapper functions are just considered the
cost of doing business, leading a lot of people to think SQL is perfectly
sufficient for what it does.

~~~
greggyb
I do truly appreciate the critique you've offered, as it has some well-thought
out objections to the language.

I don't want to get into a back and forth on how to deal with the specific
issues - they're good food for thought, but I think it wouldn't lead to a
productive conversation.

I would object that SQL is actually incredibly high level, though its
abstractions differ from the typical ones seen in other programming languages.
As a declarative query language, I can describe a join between two arbitrary
(no need to do anything besides have the tables exist) tables and the query
optimizer will do a very good job of choosing among several different
algorithms to perform that join for me. Without a thought to any statistics of
the data, the query is optimized appropriately, including the construction of
a temporary index if that is the best way to compute the join.

Or, transactions in general - ACID guarantees are maintained with a simple
transaction - almost no boilerplate, just saying where it begins and where it
ends with some options for specific behavior, and the transaction is
guaranteed on arbitrary data structures without any other code necessary.
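
As a concrete sketch of how little boilerplate that is, here is what a
transaction looks like through the postgresql-simple library (the table and
column names are hypothetical):

```haskell
import Database.PostgreSQL.Simple

-- Both updates commit atomically; if either throws, withTransaction
-- rolls the whole thing back. No locking or recovery code needed here.
transfer :: Connection -> Int -> Int -> Int -> IO ()
transfer conn amount from to = withTransaction conn $ do
  _ <- execute conn
         "UPDATE accounts SET balance = balance - ? WHERE id = ?" (amount, from)
  _ <- execute conn
         "UPDATE accounts SET balance = balance + ? WHERE id = ?" (amount, to)
  return ()
```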

I won't go into the data consistency guarantees you can make just by defining
a couple of foreign keys and constraints (performance aside - these can be
optimized - this would be awful to put together by hand in a project).

These are some of the abstractions in SQL which are absurdly high level
compared to doing something similar in another language. Several of your
bullets also support that it is a high level language, e.g. bullet 4's
"others may be implementation dependent" - that is not a statement ever made
about assembly, for example.

SQL is, at the end of the day, a query language built around relational
algebra, already an abstract topic, and implemented on arbitrary hardware,
operating systems, filesystems, and so on. It's not got a lot of high level
concepts that exist in other languages, sure, but it's got a lot of stuff
built in that would be a bear and a half to implement on your own. Its high
level concepts are in general very orthogonal to those in other languages.

Again, I'm not disagreeing with your bullets, and I appreciate the time you
took to reply and your points are exactly the sort of thing I was asking for.
I just think that you're ignoring some of the power in it for want of the
power of constructs you're more familiar with.

~~~
T-R
It's definitely relevant that my complaints are from a Haskell perspective (as
this is a Haskell thread). If I'm working in just about any procedural/OOP
language, sum types are a completely foreign concept. If I'm working in PHP, I
don't expect a consistent API. If I'm working in a strict language, I don't
expect to be able to abstract my data model away from how I loop over it in
memory. If I'm used to anything without some support for dependent types, I
just assume that an empty list, or a list with too many entries couldn't
possibly be detected at compile time. The high level constructs it does have
are strictly less powerful, largely as a function of not being
composable/consistent and certainly not as polymorphic as the kinds of things
in Foldable/Traversable, though certainly better than writing your own for-
loops.

The thing is, these are practically all you do in SQL, so it's pretty
frustrating that, when working with it, so much time is dedicated to dealing
with these _solved_ problems. They're the reason you can't import a "votes" or
"comments" package to your database, and why being a database expert is such a
niche, specialized, and lucrative position - because so much knowledge isn't
captured in the language, or the compiler, or libraries (even though it quite
feasibly could be), but just spread by books and blog articles. It's why
building complex things on relational databases is so hard - the level you're
working at is strictly limited to "what keys do I join on to get the result
set I want" and "how do I make this only loop once" \- it's not inherent in
the relational model/relational algebra, it's strictly a problem with SQL.

To add, as a declarative query language, it's worth also comparing to things
like Prolog/Datalog. Prolog certainly doesn't get you all of the things I
mentioned, either (e.g., no dependent types), but it is a much higher level of
abstraction - you can describe things like "this is a one-to-one relationship"
at a high level. The flipside is that, to write performant queries, you need
to understand the Warren Abstract Machine, but because of its consistency,
that's a significantly lower barrier to entry than trying to understand a
specific database's optimizer implementation, or the arbitrary decisions in
the SQL spec, and the knowledge there is actually transferrable.

------
i_s
> HList has some properties that are very interesting to us. To start with,
> not only we can store an arbitrary number of elements of different types,
> but we can also know statically—that is, at compile time—both the number and
> types of those elements.

I didn't know Haskell could do that - impressive.
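
For the curious, a minimal version of the idea - this is a sketch, not the
actual HList used by Opaleye, but it shows both the length and the element
types being tracked at compile time:

```haskell
{-# LANGUAGE DataKinds, GADTs, TypeOperators, KindSignatures #-}

-- The type-level list 'ts' records the type of every element, so both
-- the number and the types of the elements are known statically.
data HList (ts :: [*]) where
  HNil :: HList '[]
  (:>) :: t -> HList ts -> HList (t ': ts)
infixr 5 :>

row :: HList '[Int, String, Bool]
row = 42 :> "hello" :> True :> HNil

-- Head is total: calling hd on an empty HList is a type error, not a crash.
hd :: HList (t ': ts) -> t
hd (x :> _) = x
```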

~~~
harveywi
You can do it in Scala too (cf.
[https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0](https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0)).

------
e28eta
I wonder if something similar would be useful for mapping between application
models and web services for those models. They're probably different enough
that the most useful part would be the idea of modeling the types involved in
the operations, but not much else.

