

Building New SQL (2013) [pdf] - lelf
http://file.bestmx.net/ee/articles/SQL-2.0/sql-2.0.pdf

======
jamii
Identity is hard. If you assign an integer to every row, as the article
suggests, you have to decide how that integer is effected by update
operations. If the reference is by identity (ie the new row gets the same id)
then you have to worry about stability of ids in materialised views (eg if
some part of the view is recalculated how we decide whether the new rows are
altered versions of the old rows or brand new rows). If the reference is by
value (ie new rows always get new ids) then you will need some separate way of
assigning identity to entities which don't have a natural primary key (eg
users).

The data model I'm currently working on is an append-only event log with
incrementally maintained views. Since we don't have to worry about update to
the base rows I'm leaning towards having references be by value and relying on
the maintenance algorithm to ensure that changed ids are propagated nicely. In
cases where actual identities are required they will need to be supplied in
the event log so that they are stable (eg user signup messages must contain
enough unique information to create an id for that user).

I haven't been able to dig up much research in this area so I'm all ears if
anyone has any thoughts.

------
jsiegz
The central problem with SQL is a lack of modularity and reusability. The
syntax is easy to learn, it's just onerous to do complex things.

Dev teams and analysts use text files, Gists, and emails to save hundreds of
useful queries containing similar patterns, subqueries, and clauses that could
easily be repurposed if the language had any reusability features. There's a
lot of waste and it can feel like a big task to ask a simple question.

That said, it's very easy to learn SQL and the author's suggestions don't seem
to maintain that learning curve. I disagree that "human language mimicking" is
a problem -- SQL syntax actually makes a lot of sense (I'm "SELECTing FROM a
table and ORDERing the results" etc).

What we need is a higher level language or tool to describe relations in a
database -- something like an ORM or XML used in legacy BI tools. SQL still
lives underneath. IBM Cognos, SAP Business Objects, Microstrategy, and others
tried this to varying levels of success. Another commenter (siganakis) gets it
right: "I would prefer a syntax layer that can be compiled / transformed back
to SQL."

We're actually working on something very similar to that at Looker
([http://looker.com](http://looker.com)) if anyone is interested in hearing
more about it, but we aren't quite at the point where we can release the
language spec to the world yet.

~~~
siddboots
> The central problem with SQL is a lack of modularity and reusability. The
> syntax is easy to learn, it's just onerous to do complex things.

From what I understand of it, the author here is describing a completely
different avenue by which relational programming language could be made more
powerful. Namely, through a higher-order type system. This is surely more
powerful a proposal than merely providing a mechanism for modularity on top of
SQLs existing interpretation of the relational model.

> What we need is a higher level language or tool...

ORMs and framework layers that compile to SQL have been around for decades,
and while they do allow modularity, they remain a source of perennial
dissatisfaction^1. Why is yet another such language or tool what we need?

If modularity and reusability really was "the central problem with SQL",
surely the problem can be called solved given all of the existing solutions
that you describe?

^1
[http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Compute...](http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx)

------
mamcx
I like the approach explained here:

[http://www.try-alf.org/blog/2013-10-21-relations-as-first-
cl...](http://www.try-alf.org/blog/2013-10-21-relations-as-first-class-
citizen)

Is something like:

requester_city = 'Paris' solution = suppliers

# 1). A supplier may only see information about the suppliers located # in the
same city than himself. solution = restrict(solution, city: requester_city)

# 2) The supplier's `status` is sensitive and should not be displayed.
solution = allbut(solution, [:status])

# 3). The country name must be displayed together with the supplier's city.
solution = join(solution, cities)

The good idea is that is composable. You can query by all the solution =

------
siganakis
I think that the biggest problem with SQL is more around the actual syntax of
the language and how verbose it feels to write complicated queries.

I would prefer a syntax layer that can be compiled / transformed back to SQL
but that does basic things like having a query start with the tables, then
joins, then groupings then the final projection.

Also a less cumbersome way to use the "WITH" statement to form named sub-
queries.

Perhaps something like:

    
    
        SELECT 
            COUNT(*) as columns,
            column_type,
            table_name
        FROM (
            SELECT  c.id, 
                    c.type AS column_type,
                    t.name AS table_name
            FROM tables t
            INNER JOIN columns c
            ON t.id = c.table_id
            WHERE t.system=false;
        ) a
        HAVING COUNT(*) > 1
        ORDER BY columns DESC
    

Being re-written as:

    
    
        # Use ":="  to replace WITH for named ephermal views
        # Replace "WHERE" with "?", "SELECT" with "|>" at the end
        non_system := tables 
            ? system=false 
            |> name:table_name, is:table_id
    
        # Replace INNER JOIN with "*="
        non_system_columns := non_system.table_id *= columns.table_id
            |> c.id, c.type:column_type
    
        # GROUP BY columns are automatically generated by non-aggregated columns
        column_types := non_system_columns
            |> COUNT(*):columns DESC, column_type, table_name
    
    

So the final query may look like:

    
    
        non_system := tables ? system=false 
            |> name:table_name, is:table_id
        non_system_columns := non_system.table_id *= columns.table_id
            |> c.id, c.type:column_type
        
        non_system_columns 
            |> COUNT(*):columns DESC, column_type, table_name
    
    

Any thoughts on this?

~~~
cordite
It's also really cool when you can compose queries. A thing in the Haskell
world called esqueleto does this

[https://hackage.haskell.org/package/esqueleto](https://hackage.haskell.org/package/esqueleto)

~~~
siganakis
Yeah, I think composability is one of the biggest things missing from SQL.

The issue is that composability is often tied to actually moving data around
in the database which has terrible performance. That is, you can compose a
query of multiple queries that dump partial data sets into temp tables.

Views get you part of the way there, but they are designed to be long lived
and are visible to all database users until they are dropped. This means its
dangerous to change them or clean them up, as its not always clear who they
are being used by.

Ephermal or temporary views that are session/connection based, or even
loadable as modules would be useful to me.

~~~
siddboots
I completely agree that this is one of the major lackings of SQL.

Most databases offer the WITH syntax, but, bizarrely, the SQL standard
specifies that the WITH block should materialised separately, which prevents
it from being used as a device for abstraction.

Oracle is the only database I'm aware of that allows the WITH block, but
doesn't impose the optimiser fence.

------
Pxtl
A new SQL is long overdue. Too many folks throw the baby out with the bath-
water and lose the relational algebra that makes databases awesome, but I've
been using enough LINQ to see that I really want more competition in the space
of relational languages - there's a lot of room for improvement.

~~~
collyw
I noticed a couple of other people talking up LINQ. Is it an MS only thing?

~~~
Pxtl
Well, a C#-only thing. It is available in Mono. It's a series of functional-
programming features in C# that can also be compiled into SQL and run on a SQL
server through an ORM, so you can use the same syntax for queries as you do to
search local collections.

------
seomis
Maybe it's just the informal writing style and concentration on deficiencies
in current _implementations_ of SQL, but the whole things reads as if the
author gave only a cursory glance at any relational model theory.

------
adwf
Aren't multi-table indexes otherwise known as indexable views?

Plus, I can't see any mention of how exactly the data is stored on disk. A lot
of stuff about the language used to access it, but nothing about the storage
which is 99% of any database engine. One of the purposes of a properly
normalised RDBMS (eg. key tables/star schemas/etc) is to minimise the number
of bytes on disk in order to optimise the slowest component - reading from
disk.

~~~
autarch
In the ideal world of relational databases, storage would be defined
separately from logical structure.

In other words, DDL and DML operate on logical things (relations, domains,
etc.), and we'd have some other language/tools to tell the DBMS how to lay
things out on disk. Of course, in the really ideal world the DBMS just comes
up with the best storage solution without manual intervention ;)

~~~
jerven
The SPARQL/RDF world is moving to that "figure out the best storage solution
itself". You just throw data into a system and the system figures out how best
to store it. There is some research code for this in MonetDB and the
openlinksw/virtuoso guys are hinting about it in their latest blog posts for
their upcoming release. Basically as some people like to call it, they want to
infer the relational schema hiding in your RDF data.

Currently most SPARQL endpoint store data in two tables a quad/triple table
and a big id to value table. But there is no need to do so.

Main problem with SPARQL and RDF problem is that as the schema is in the data
its difficult to figure out how data is connected and how one can use it. But
the W3C is starting up a workgroup to work on that problem, for tool builders
to solve this.

------
dhruvbaldawa
I had similar ideas when I was in college. There are quite some projects
trying to do similar things: [http://htsql.org/](http://htsql.org/)

and my naive and outdated blog post from college
[http://dhruvb.com/blog/posts/specialized-syntax-for-quick-
sq...](http://dhruvb.com/blog/posts/specialized-syntax-for-quick-sql-query-
building/)

------
platform
Implicit joins idea seems has already been expressed in SPARQL
[http://thefigtrees.net/lee/sw/sparql-
faq#benefits](http://thefigtrees.net/lee/sw/sparql-faq#benefits)

In my view, a good query language should support tool-free DSL-ization.
Meaning that it should be easy to express data model entities in a DSL for a
specific data model. I also think that a data architect should provide as one
of the most important artifacts of his/her data model, a query DSL for the
data in his/her model. This way the users of the data model will be using a
type-safe DSL to get the data.

So to me the biggest limitation of SQL or even SPARQL or the authors' proposal
in the referenced article -- is the lack of DSL construction features that can
reflect a relational or graph oriented data model

------
Padding
Is this a joke? The language used certainly seems inadequate.

Other than that, aren't there already other "completely relational" systems
like Dataphor (and some arcane IBM products) .. that failed to gain traction?

~~~
siddboots
It doesn't seem to be a joke, but yes, it does come across as little more than
a loose collection of thoughts, based only on the author's experience with
extant SQL implementations, and ignoring the entire body of literature in this
area.

With that said, I think that there are some solid ideas in there. The concept
of _higher-order relational programming_ is something that I'm personally
fascinated by, and to my knowledge it has been neglected historically.

Sometimes any discussion is good discussion :)

------
andy_ppp
Is it allowed to put this on github and get people adapting and thinking about
solutions to their issues and thoughts and changes.

Event adding more examples would get a good thing. Generally it looks pretty
good to me though.

------
mbell
What's up with the line 'We like sex.' on page 12 under Arithmetic?

~~~
adwf
I'm guessing a poor joke about S-expressions.

