
We Can Do Better Than SQL - 1st1
https://edgedb.com/blog/we-can-do-better-than-sql
======
brightball
I don't disagree that SQL can be improved. It's one of the biggest reasons I
use Postgres in the first place because there are so many improvements
available on top of SQL.

All that said...SQL is pretty darn effective. As a language, it's the true
backbone of the internet today. It's readable, explicit, fairly concise and
naturally translates to how data should be broken down for efficient
storage...or make some trade-offs to allow for more efficient retrieval.

There are differences with different vendor implementations...but that's what
different vendors are for - to find things the other guys are doing wrong and
improving on them to build a better product.

I wish the folks luck in their work to improve things, but the language I've
been able to rely on consistently over the last 17 years or so has been
SQL...and I've worked with a lot of languages. SQL is the one that gets the
most done reliably and lets me down the least often.

~~~
setr
Afaict, the state of SQL as a grammar with tooling is kind of pathetic.

As a standardized language, it doesn’t really exist; everyone implements
numerous extensions, and almost no one is fully ansi compliant

Almost all formatters attempt to be generic (believing standardization
exists), and fail to support the full grammar for any dialect.

Across the board, all parsers have pathetic error message support (error on
line 3, which is actually just the start of the statement).

The schema offers type constraints, but querying/ide’s extract no value from
that (that is, types are statically specified/constrained, but query editors
all pretend its fully dynamic)

Theres a lot of awkward nonsense, like where clauses are parsed before the
select in most parsing engines, causing alias usage to fail without wrapping
in a subselect/with clause

The grammars themselves are an inconsistent, ad-hoc mess

The grammar is also unnecessarily context-dependent (eg from must follow
select, and where after that), making programmatic composition unnecessarily
difficult

I don’t know how much of the tooling issue is a result of SQL as a language
versus the history itself, but I can at least confirm that trying to parse
multiple dialects is absolute hell, which would at least explain the sorry
state of affairs for eg formatters.

But the majority of its expressive power derives from the relational algebra,
and has nothing to do with the SQL grammar, and thats the majority of its
value. It seems obvious to me that at the very least the compositional issue
of SQL, and its self-inconsistent grammar, should be vulnerable to near-
lossless improvement without too much struggle, though I can’t say what the
alternative would actually look like.

But it seems like its riddled with a lot of unnecessary flaws

~~~
namibj
JetBrains tooling does use schema metadata for it's autocompletion. I'm not
sure though if it's anywhere close to full-on Haskell autocompletion in the
Atom editor (very fiddly and prone to break on minor version changes, I must
say).

~~~
hannofcart
Offtopic, but I am genuinely curious now about whether Atom's Haskell
autocompletion is significantly better than in other editors like VSCode, or
(n)vim with things like YouCompleteMe/NeoComplete/Deocomplete etc. Have you
used other editors? What is your opinion?

------
reilly3000
I’ll never understand why obvious marketing makes it to the front page, only
to get shat upon by 95% of the comments.

Of course we can do better than SQL. We could obviously do better than
Javascript. It’s globally understood that we could all do better than English.

We sometimes struggle to express ourselves with language. We can blame the
language, try to fix it, or invent a new language and try to get people to
speak it.

I would rather spend my time refining my elocution than learn a new language.
That said, there are words and phrases that simply work better in other
languages. I don’t know a single-word corollary to ‘Simpatico’ in English.

I wish the best to those who would make programming more expressive, and the
worst to those who would try to streamline away subtlety.

Performance is quite loosely linked to language, given sufficient abstractions
and optimizations. I submit NumPy as an example. We humans have lots of ways
to say what we want, and we want lots of different things in myriad ways. As
an analyst I often wish for a richer language, some way I could transcend
tabular data thinking to find and make associations around real-world state.
I’m certainly self-satisfied when my nested subqueries return what I’m
expecting, but I may be able to add more value if I had a better way of
expressing my questions.

If we can do better than SQL, it ought to bring more people closer to the
reality that lies behind the data, and further from the methods used to obtain
it. Maybe I don’t have the words for what I’m looking for yet...

~~~
swebs
>I’ll never understand why obvious marketing makes it to the front page, only
to get shat upon by 95% of the comments.

Posts have an upvote button, but no downvote button, so dissenters can only
use comments to express disagreement.

~~~
jonny_eh
And people often upvote based solely on the post title.

------
cletus
Yeah I'm not sold. One example from this post that struck me was the author
wanted to embed a select in a table expression. I'm not a fan of this at all.
I don't want it not to be clear if a given expression list will explode in
values or not.

I like the fact that SQL has a solid foundation in relational algebra. I see
no such foundation for the alternative.

I do like what LINQ did here (being SQLish), which was to put the FROM clauses
first. Some SQL variants have WITH clauses that are quite convenient but you
end up with:

    
    
        WITH (...) AS a,
        (...) AS b
        SELECT
          a.a1,
          b.b1
        FROM a
        JOIN b
        ON a.a = b.b
    

Common alternative:

    
    
       SELECT
         a.a1,
         b.b1
       FROM (...) a
       JOIN (...) b
       ON a.a = b.b
    

whereas I'd prefer:

    
    
       FROM (...) a
       JOIN (...) b
       ON a.a = b.b
       SELECT
         a.a1,
         b.b1

~~~
MarkusWinand
WITH is supported by all major SQL brands in the meanwhile.

[https://modern-sql.com/feature/with#compatibility](https://modern-
sql.com/feature/with#compatibility)

~~~
latch
Does the spec have any execution requirements re WITH? Pretty sure `WITH` in
postgresql is gated and thus can have considerable performance implication vs
a nested query (or none at all depending on the query).

~~~
trollied
Execution is down to the RDBMS.

Postgres is changing in the next release, BTW:
[https://www.depesz.com/2019/02/19/waiting-for-
postgresql-12-...](https://www.depesz.com/2019/02/19/waiting-for-
postgresql-12-allow-user-control-of-cte-materialization-and-change-the-
default-behavior/)

~~~
mathh
Impossibility of With clause inlining is indeed one of the major limitation of
Postgres, compared to Oracle. We we still have to wait for optimizer hints

------
JackFr
If the language is the problem, why write a new server?

Transpile your language to the equivalent SQL, and rely on decades of research
and real world experience in things like replication, optimization, locking
strategies, high availability, security etc. _the things unrelated to the
language_ current SQL databases are really good at.

Enterprises need enterprisey features.

~~~
1st1
This is exactly what we do. EdgeDB is based on PostgreSQL.

~~~
lixtra
And do you support other backends as well (i.e. MySQL?). Does EdgeDB still
offer a regular SQL interface because people need time to migrate.

Even if I was convinced EQL was the future I wouldn’t throw out the old stuff.
If the old stuff continues working but there is a smooth migration path, I
would probably give it a try.

Think of TypeScript vs. JavaScript.

~~~
1st1
> And do you support other backends as well (i.e. MySQL?). Does EdgeDB still
> offer a regular SQL interface because people need time to migrate.

No, but we have a few ideas on how to connect existing databases to EdgeDB.

> Even if I was convinced EQL was the future I wouldn’t throw out the old
> stuff. If the old stuff continues working but there is a smooth migration
> path, I would probably give it a try.

Yes, we'll be working on that.

------
brianpgordon
I have no background in databases so this may be naive or wrong, but the
single biggest pain point in SQL that comes to my mind is that it can be
difficult to tell what a query is doing without also knowing the constraints
on the tables involved. Here's a real-life (ish) example from work:

We have some_table which we want to join to other_table, but we need to map an
identifier through mapping_table in order to do it. So we end up with a query
like:

SELECT (...) FROM some_table INNER JOIN mapping_table ON (...) INNER JOIN
other_table ON (...) ...;

I know for sure when writing this query that the middle join to mapping_table
will map every some_table row to exactly one row (no more, no fewer) in
mapping_table. The problem is that the query doesn't capture this. The mapping
table isn't _really_ named something as obvious as "mapping_table" so someone
reading the query has a hard time inferring what the intent was. It totally
changes how you mentally parse and think about the query if the result set can
be accumulating multiple matching rows from the join, or maybe even losing
rows if there are no matches. You have to go bring up your database schema to
figure this out.

And, as a fan of static typing, I can't help but cringe at the possibility of
someone changing the constraints on the table without realizing that there are
queries which implicitly depend on the old ones. SQL offers no resilience to
this and will happily change the meaning of your query without a peep of
complaint if you drop that constraint from mapping_table.

If there's a fancy way to capture this "mapping" relationship in standard SQL
that doesn't just use a dumb inner join, I'd love to know about it. If not,
I'd love a query language that supports some annotations that help reading and
are either stripped out before sending to the database engine, or are actually
checked at runtime.

~~~
oconnor663
I get that with some queries, you're not sure how the data is going to be
retrieved, and letting the database figure all of it out for you is a good
strategy. But with a lot of queries, particularly the ones that I'm doing in
the milliseconds of a pageload, I want to be very sure that all my joins are
hitting efficient indexes. I hate that someone can change my schema, and that
can turn my efficient index lookup into a horrible scan, _without breaking
tests_.

Folks who know more SQL than me: Is there a good way to say "I would rather
this query fail than try to scan a table?"

~~~
ErwinSmout
Where you start to go wrong is where your talk starts to be of "YOUR" schema.
The schema isn't yours, it's the company's. And guarding it is the DBA's job.

(I understand full well that that is a problem if the company has kicked out
the DBA role and handed it over to the individual programmers, but perhaps
that is precisely the problem.)

~~~
oconnor663
Fair, though maybe I could describe the same problem from the other end. If
I'm a DBA and I want to change the way indexing is done, I might need to audit
every query in my company to figure out which ones depend on the old index.
One way or another, it seems like knowledge about a query's intent to use an
index could've been captured explicitly, in a way that's automatically
enforced in the future against any number of accidental breaks. But instead
all that knowledge is implicit.

~~~
ErwinSmout
Integrating EXPLAINs in your build/deploy process should, in principle, make
it possible for anyone to address any concern in such realms.

The DBA you were mentioning here, e.g., could first do a query in the EXPLAIN
results, and he'll have his "company-wide queries audit" in a matter of
seconds.

------
ilitirit
I don't disagree with the obviously true statement, but this code comment
comes to mind:

    
    
        // Dear maintainer:
        // 
        // Once you are done trying to 'optimize' this routine,
        // and have realized what a terrible mistake that was,
        // please increment the following counter as a warning
        // to the next guy:
        // 
        // total_hours_wasted_here = 42

~~~
stingraycharles
Also, don't forget the hours wasted of the people having to learn another
query language. :)

~~~
ErwinSmout
You can test that for yourself. Go through the three language-related sections
on my site and decide for yourself how many hours you'd need to "waste" before
you would "get it".

And as for "waste" : if while learning a better language, they are in addition
also learning the relational model, and to think relationally, (their
knowledge of both of which will be VERY poor if all they've ever seen is SQL)
then there can't have been much time "wasted", can it ?

------
docker_up
I think "ease of use" over SQL is not the hill I would die over if I were
trying to displace SQL.

It's far too embedded throughout the entire industry and as a data analyst,
learning EdgeQL vs SQL and then being locked into a new startup database that
could disappear in a year doesn't seem like a high probability strategy.

I wish the people all the luck but unfortunately SQL is "good enough", pretty
standardized (I can use just about any relational database and get useful data
by knowing the basics). The inconsistencies may be mathematically "ugly" but
it's not hard to wrap your head around and overcome.

~~~
ericb
The middlebrow dismissal strikes again!

Why is it so hard to imagine something displacing SQL? A simpler, more
predictable syntax seems perfectly plausible--it could ship alongside SQL. Do
we have StockHolm Syndrome?

The negativity is surprising and at the same time predictable.

~~~
hobs
Because of the decades of things not displacing it - when someone suggests "oh
we'll just do it simpler" they often are not seeing the forest for the trees.

Simpler languages have been shipped dozens if not hundreds of times, and they
generally tend towards expressing the things they missed or not giving enough
functionality for the things they missed.

I am not saying its impossible, but you're going to have to do a lot more than
hand waving to justify the reverse position.

~~~
chuckgreenman
It's sort of like the mouse trap problem. Mouse traps and SQL are already
incredibly simple and incredible effective, that's why you don't see a
reinvented mouse trap at home depot and why SQL remains unseated despite many
efforts to replace it.

~~~
adamlett
You really think SQL is _incredibly simple_? I think TFA makes a pretty
convincing argument that it is anything but.

~~~
darkpuma
Most common SQL statements read like somewhat stilted english. Many non-
programmers find this particularly accessible.

Yes you can make some lovecraftian horrors if you really want to, but SQL is
one of those things where just a little bit of knowledge goes a long way. If
you can understand the basics you can get a lot of work done.

It's a lot like Excel. You can do some really complex confusing stuff in
Excel. But you can also teach the basics to non-programmers quite easily, and
command of the basic skills will be very empowering. Basic knowledge of Excel,
like SQL, gives the user new ways to leverage computers when creating their
own solutions to their own problems.

~~~
adamlett
_Most common SQL statements read like somewhat stilted english. Many non-
programmers find this particularly accessible._

The problem is not reading SQL, but writing it.

 _It 's a lot like Excel. You can do some really complex confusing stuff in
Excel […] Basic knowledge of Excel, like SQL, gives the user new ways to
leverage computers when creating their own solutions to their own problems._

I can’t speak to your experiences, but I’ve never in my life encountered
someone who was not a professional programmer ever even contemplating using
SQL for anything let alone _creating their own solutions to their own
problems_. I think it’s safe to assume that the overwhelming majority of
people who use SQL, are programmers who most certainly cannot get by with just
basic knowledge.

~~~
darkpuma
Librarians and secretaries are two examples of "non-programmer" careers where
functional knowledge of SQL is pretty common. Less-so these days with
secretaries, but moreso for librarians. Not to mention tons of researchers
across countless disciplines have SQL in their toolboxes. I've even met
government bureaucrats with professional backgrounds in regional banking that
know SQL. Previous programming experience? Using HP-12c calculators...

>I think it’s safe to assume that the overwhelming majority of people who use
SQL, are programmers who most certainly cannot get by with just basic
knowledge.

Most demonstrably do though, so there's that. If you cast a wide net when
polling programmers, I think you'd find that mode level of knowledge was
relatively low. You don't need to be a SQL rockstar ninja dude to do what most
professional programmers are doing with SQL most of the time. Obviously
advanced knowledge is good for any professional programmer to have, but the
fact is there are a TON of people out there who only know the basics, and that
works for them.

------
pbiswal
Hi - a lot of commenters have valid concerns and critiques, but I joined HN
after lurking for years to say that I really like the direction you’ve taken!
I’m particularly happy about the convenient syntax for joined inserts and sum
types.

I see that you’ve built this as a patched version of Postgres, but I’m curious
how much of this syntax you could implement as a client library and shell that
would run against an existing Postgres instance.

Right away, you would get adoption from people who have an existing Postgres,
or who want to take advantage of SaaS offerings like AWS Aurora.

Longer term, I could imagine the client/shell being extended to support
multiple backend DB dialects, even things like Spark or Redshift which you’d
have a hard time modifying intrusively.

It could also be cool to explore interoperation with existing schemas written
in plain SQL, so people could adopt it incrementally that way.

~~~
1st1
Thank you for your comment, it aligns pretty well with our current thinking
about this.

> It could also be cool to explore interoperation with existing schemas
> written in plain SQL, so people could adopt it incrementally that way.

Yes, this will happen too.

------
tathougies
The article criticized SQL because the following expressions are incompatible

> SELECT * FROM table

and

> SELECT count(*) FROM table

This is actually not true, as both return table values. It then says that, in
EdgeQL, every expression results in a 'set'. This is a distinction without a
difference.

I don't disagree we can do better, but this is the same.

~~~
RedCrowbar
There is a difference. You are right in a sense that scalar expressions are a
form of a table expression, but...

This is a valid expression:

    
    
      SELECT (SELECT count(*) FROM table) + 1
    

This is not:

    
    
      SELECT (SELECT * FROM table), 1
    

In EdgeQL:

    
    
      SELECT (count(SomeType) + 1)
    

and

    
    
      SELECT (SomeType, 1)
    

are equally valid.

~~~
tathougies
> SELECT (SomeType, 1)

This cannot be the case if you call each thing a 'set'. Unless `(a, b)`
calculates the cross product, there is no meaningful non-arbitrary way to
assign each element of a to an element of b. That would depend on an ordering,
which makes it not a set to everyone but a marketing dept somewhere.

~~~
RedCrowbar
`(a, b)` is a cross-product, just like "SELECT * FROM a, b"

------
mamcx
Yes.

I think many here miss totally the point. Is the same mindset that lead to
believe C, bash or unix are good(?) then why bother?

I'm very lucky that I start with FoxPro, so I know what is to live with a MUCH
better thing than SQL.

To the point:

\- "SQL is standard, everywhere... why try to change or do something else?"

I hope none here work in _new_ software products.

And also, try to replace ALL the OLD with the NEW is no the point, is to try
to DO BETTER THAN THE OLD.

I suspect most of use are in that business... right?

\- "SQL is a good language!"

Superficially, yes.. as good is javascript or C. More exactly to be the
"default" option we are VERY lucky SQL is not alike C or JS. But "good" is
not. Is adecuate.

IF you DON'T KNOW WHY then learn the why first. For us that LIVE in the RDBMS
world is clear as water than SQL is not the ideal, is what we have.

\----

Is important to understand that "SQL", Rdbms and the relational model are
badly misunderstood (just look how the NoSql movement market itself!)

\- SQL IS NOT THE RELATIONAL MODEL, is BASED on it.

The relational model is MORE expressive, powerful, flexible and EASIER than
SQL.

\- SQL IS NOT RDBMS

A much better APIs can be build on top a RDBMS but SQL is a poor language for
that. This is part of why exist so many extensions.

Think, for example, why you can't do

    
    
        SELECT key FROM BTreeInde_CustomerCode
        GROUP BY (whatever -- And I mean GROUP, not SUMMARIZE!)
        let city = SELECT name
        let filter = WHERE id = @id
        let query = city + filter(id=1) 
    

and many many other things like that

Many things that get mixed with the "SQL" language are part of the
implementation or capabilities of the RDBMS.

\- However, SQL IS A FINE EXAMPLE IN HOW DO DATA

And in fact, is not that hard to imagine a much better version, SQL is not
that far from a better option!

~~~
lincpa
I worked as a Foxpro programmer for 18 years. I later switched to the Clojure
language for 8 years. I used the functional programming language Clojure as
SuperFoxpro, which is very easy to use and enjoyable. I has formed a new
programming idea ---- `Everything is Foxpro (RMDB)`. ;-)

I advocate: Building a relational data model on top of hash-map to achieve a
combination of NoSQL and RMDB advantages. This is actually a reverse
implementation of posgtresql.

[Clojure is a functional programming language based on relational database
theory]([https://github.com/linpengcheng/PurefunctionPipelineDataflow...](https://github.com/linpengcheng/PurefunctionPipelineDataflow/blob/master/doc/Clojure_is_FP_based_on_RMDB.md))

[Everything is
RMDB]([https://github.com/linpengcheng/PurefunctionPipelineDataflow...](https://github.com/linpengcheng/PurefunctionPipelineDataflow/blob/master/doc/Everything_is_RMDB.md))

[Implement relational data model and programming based on hash-map
(NoSQL)]([https://github.com/linpengcheng/PurefunctionPipelineDataflow...](https://github.com/linpengcheng/PurefunctionPipelineDataflow/blob/master/doc/relational_model_on_hashmap.md))

~~~
mamcx
Excellent take. The functional model is not too far.

------
tschellenbach
It's like SQL, but different. About as hard to learn, not fully developed and
incompatible with years of tooling in the ecosystem.

~~~
lmkg
This critique reminds me of Dart: A better JavaScript, that wasn't better-
enough to be worth the transition.

I have the same concerns about the proposed language. Yes it's better, but is
that enough? What's going to drive adoption?

~~~
izolate
Dart was better enough, as evidenced by the recent uptick of Typescript. Had
it been released in 2019, I could envision a different story.

------
z3t4
SQL is an incredible language. I don't think we can do better, only different.
It's not that common to mix two programming languages like you mix SQL and
application code. For example having your app code in JavaScript then write
SQL queries to access the data feels unfashionable. Polling also seem like
something from the 80's. RethinkDb was a step in the right direction, but I
think the mistake was to invent another query language, instead of making the
database querying seamlessly integrated to programming languages. It should be
possible to write observables without having to context switch back and forth
between data querying and app logic.

I'm a "full stack" developer/speaker, and for a long time I useed English on
the front-end, German on the back-end, and Spanish for talking to the
databaste. When I switched to English on the back-end I saw an productivity
boost and I got better at and started to like English more. But I still talk
Spanish with the database. I could probably be more efficient using English to
talk to everything. I would probably still use French for design though.

------
madhadron
This is the wrong direction.

Let's get the history in place first. The relational calculus set up a set of
operators. That's what Codd worked with. It was combined with a natural
language project at IBM which gave us the vaguely English like syntax of SQL.

> In EdgeDB, the data schema is formulated in a way that is much closer to the
> contemporary application data model. ...Unlike SQL, EdgeQL can easily
> extract arbitrary data trees.

This is a reinvention of the the hierarchical database, one of the
architectures that was abandoned when the relational database came along. This
was all fought out on a large scale in the 1980's.

The fundamental idea that you have a highly structured data tree is what the
relational model threw away. "Data Oriented Design"[^1] does a pretty good job
of explaining why, but roughly: we're accustomed to programming where we have
an object/entity/value that represents a conceptual "thing" in our discourse.
We compose things into other things, subtype things, and otherwise deal in
things. And this turns out to be a disaster when lots of people need to extend
and reuse hunks of the same database. If you give up on things as first class
parts of your data model, and only talk about relations among attributes, a
lot of those problems go away.

If you want to improve SQL, look at going back to the relational calculus
without the natural language veneer. There have been a number of systems that
tried that, though none have gotten real world traction. I'm also blanking on
their names at the moment, so if someone who's memory is working better can
help me, I would appreciate it.

[^1]:
[http://www.dataorienteddesign.com/dodmain/](http://www.dataorienteddesign.com/dodmain/)

~~~
RedCrowbar
> This is a reinvention of the the hierarchical database

It's not. Just because data can be fetched as a set of "tree-like" things does
not make the model hierarchical in any way. But it makes lots of practical use
cases easier.

> The fundamental idea that you have a highly structured data tree is what the
> relational model threw away.

And then ORMs came and reinstated the status quo. Truth is, many applications
actually _benefit_ from a highly structured data graph. That said, you don't
have to use the "highly structured" bits of EdgeDB, it's perfectly fine if all
your types are simple tuples.

> We compose things into other things, subtype things, and otherwise deal in
> things. And this turns out to be a disaster when lots of people need to
> extend and reuse hunks of the same database.

I would be very interested in reading about this. Do you have any references?
Most cases of "disaster" in my experience had to do with extremely poorly
defined and documented schemas that had barely any relation to the actual
business model.

------
6gvONxR4sf7o
I always wished SQL had a better handle on sum types. If I have a user whose
favorite story is an instance of Either[Movie, Book], then it's already a pain
to deal with in a nice simple way. And that's as simple as sum types get.

~~~
RedCrowbar
You can do that in EdgeDB:

    
    
      type Movie { 
        property director -> str
      };
      type Book {
        property author -> str
      };
    
      type User {
          multi link favorites -> Movie | Book
      };
    
      SELECT User {
        favorites: {
            [IS Movie].director,
            [IS Book].author,
        }
      };

~~~
6gvONxR4sf7o
Can you tell me whether I'm understanding this correctly?

Would this query result in, e.g. [(director: Null, author: J.K Rowling),
(director: Spielberg, Null), ...] or would it be [author: J.K. Rowling,
director: Spielberg, ...] or just plain strings: [J.K. Rowling, Spielberg,
...]? I still don't totally get the model here.

~~~
RedCrowbar
If fetched as JSON: {"favories": [{"director": null, "author": "J.K Rowling"},
{"director": "Spielberg", "author": null}, ...]}

You can also query the actual object type by referring to the __type__ link:

    
    
      SELECT User {
        favorites: {
            __type__,
            [IS Movie].director,
            [IS Book].author,
        }
      };

~~~
6gvONxR4sf7o
Thanks for the clarification! Seems really cool :)

------
mongol
It is interesting to see new approaches. But SQL is so entrenched, so
available and so good enough.. it will be a very uphill journey to compete

------
jpobst
Many comments (probably correctly) point out that SQL can't be displaced
because it is ubiquitous and "good enough".

I wonder if there is room for a "TypeScript" of SQL that would allow
developers to opt-in to whatever new language features or paradigms we feel
SQL is missing.

It would then transpile down to regular SQL to be executed.

~~~
Svip
I was wondering the same thing. Why not rather than build a new database
engine, write a library like an ORM that transpiles a language - like EdgeQL -
to SQL, so you can attempt to use it, without switching underlying database
engine, and still use your old SQL code?

~~~
RedCrowbar
EdgeDB is built on Postgres, it's not from scratch.

------
oarabbus_
>The questions we often hear are “Why create a new query language?” and
“What’s wrong with SQL?”. This post contains answers to both.

Ah, I can already tell I'm not going to like this article.

>lack of proper orthogonality — SQL is hard to compose;

Their reasoning: "The difference in structure is large enough to make any sort
of source-level query reuse impractical."

This is blatantly false. The second example in the article using the join
covers both the single-row and multi-row result cases. Additionally, SQL can
be made very reusable and modular using tools like DBT, or within a DB using
views or other constructs.

>lack of compactness — SQL is a large language;

It's certainly smaller in the keyword set than almost any other computer
language I have used. This is like saying "It's a very cold day in the Sahara
desert" when the temperature is 40C/105F.

>lack of consistency — SQL is inconsistent in syntax and semantics;

Is it? Or do people take liberties with their implementations? There is an SQL
Standard and Postgres is compliant to it. Other databases not being fully ANSI
SQL compliant is the fault of the DB creators, not the fault of SQL.

>poor system cohesion — SQL does not integrate well enough with application
languages and protocols.

It is a relational language designed to be used within a relational database.
Do the authors also complain that their screwdrivers are very inefficient at
cutting meat?

They're onto something with the trickiness of NULL, but there have already
been tons and tons of discussions on Hacker News - anyone with database
experience, either analytical or transaction, will tell you that imagining a
NULL-free world actually creates more problems then it solves.

I found this to be a highly misinformed article, personally; really it's a
marketing ploy for their proprietary language. Just like most other "we have
our own query language better than SQL" product, it probably isn't.

~~~
ftcHn
Thanks for the DBT tip. I've always wanted composable SQL statements but never
known how to achieve it.

[https://docs.getdbt.com/docs#section-what-makes-dbt-so-
power...](https://docs.getdbt.com/docs#section-what-makes-dbt-so-powerful-)

~~~
daturkel
We've started using DBT at my office for ETL into our data warehouse and it's
a godsend. Being able to add tests to tables and views, built in documentation
support, built in DAGs are all great, and the folks at Fishtown who make dbt
are super responsive on their community slack channel. (Also it's open
source.)

One pain point is the integration of non-SQL (e.g. Python) jobs into the flow.
DBT only manages SQL scripts, so you can lose some of the elegance when you
have to glue some external jobs to it. That being said, the development is
pretty rapid and as I mentioned the support is great, so I'm excited to see
where they go.

------
ken
This article misses the single biggest failing of SQL, in my mind: that it's
not a language. It's a family of incompatible languages, with similar syntax
-- like "Lisp".

I can't write a program that works against 'any database', because switching
databases means a lot of extra custom work. That means databases can't easily
compete against each other. (Oracle's license terms sure don't help.) That's
terrible. It's worse than x86/PPC/ARM -- at least with CPUs, a compiler can
generate machine code for any of them, coming from the same source code.

~~~
ris
Your point seems a little like tilting at windmills to me. Different databases
written by different entities who didn't have a huge incentive to strictly
conform to each others syntax varied in their implementations. This is just
life.

> It's worse than x86/PPC/ARM

You're... saying that there should only be one CPU architecture too?

~~~
ken
> This is just life.

Market forces naturally explain what we have today, therefore ... I'm Don
Quixote. Sure, I'll accept the title. Likewise, I could say: if you don't like
reading complaints like this, blog posts and internet comments might not be
for you. :-)

> You're... saying that there should only be one CPU architecture too?

No. I'm saying that the analogous situation with CPU architectures was
observed 50 years ago, and despite causing inefficiencies (especially at
first), we've spent the effort to create abstractions to gradually make it
easier to describe systems at a higher level. It generally _doesn 't matter_,
to either users or developers, that there are multiple CPU architectures. We
still have many programming languages, too.

Databases have gone in the opposite direction: while adding features over the
years, even when these features are 99.9% functionally identical between
vendors, they've picked different syntax and data types. Even though they all
speak "SQL", databases are more incompatible today than ever.

I guess what I'm getting at is: it's a good thing that compilers were invented
before all software needed an ROI, and it's a shame databases didn't get the
same treatment before corporate interests took over.

------
pjungwir
I like a SQL a lot, but I wouldn't mind a nicer interface to a relational
database. TutorialD seems more elegant and expressive than EdgeQL though. If I
were designing a new query language, that's where I would look for
inspiration. Anyway I'm glad to see people experimenting!

I am waiting for someone to write a paper titled "SQL NULL is (nearly) a
Monad." It is "contagious" just like mapping over None. Sometimes I think it'd
be cool to add Maybe<T> columns to Postgres, where operators would do the
right thing for Some(x) vs None, but then I think, "Wait, that's how it
already works!" At least it is very close. But one way or another I would love
to see a re-assessment of SQL NULL from the perspective of category theory. If
we could redo NULLs as Maybe types, what would that improve? Incidentally,
Leonid Lybkin published a really cool paper in 2014 about handling NULLs with
category-theory ideas, but it is more about drawing inferences from incomplete
premises ("Incomplete data: what went wrong and how to fix it"). Maybe he'll
write something about Maybe column types too. :-)

Also if you are interested in improving relational databases, Codd's 1979
paper "Extending the Database Relational Model to Capture More Meaning" is
super interesting. The first part is where he adds NULLs (in order to add
OUTER JOINs basically), but the second part is almost never mentioned and
gives a way to query the schema itself as part of a database query. It looks a
lot like doing graph database queries from your relational db. Also a better
kind of EAV pattern. Also more support for OO-style inheritance. Somehow 40
years later it is still way more advanced than any existing RDBMS. It would
totally break the parse-plan-execute pipeline of today's systems, but it is
fascinating to think about. I wish more people would read this paper before
trying to go "beyond relational"!

~~~
wisnesky
Check out [http://categoricaldata.net](http://categoricaldata.net) for a
categorical approach to database theory that uses a labeled null semantics
from type theory.

------
beefield
Not sure if there is something wrong in my brain but I have not been able to
understand the hate against NULL. To me the three value logic feels very
natural and just clicks. Lately I have been dipping my toes into Alteryx,
which treats empty values in a way that definitely does not correspond to
'normal' NULL logic in databases and it feels very much like a constraint on
building my workflows.

~~~
vpetrovykh
If a NULL were just a value in 3-valued logic, it could have been OK. However,
that's not how it works out. Consider the following (that should be equivalent
if NULL is just like Maybe in True-Maybe-False logic):

    
    
      SELECT null AND true;
    
      SELECT bool_and(column1) FROM (VALUES (null::bool),(true)) AS foo;
    

In PostgreSQL the first query produces NULL, while the second produces TRUE.
Yet, it's also possible to get NULL as a result of aggregating values with
bool_and:

    
    
      SELECT bool_and(column1) FROM (VALUES (null::bool),(null::bool)) AS foo;
    

And that's not how a 3-valued logic works.

~~~
MarkusWinand
You are mixing two concepts here: 3VL and how most aggregates work in SQL: the
drop NULL values before doing their work. That's why the first BOOL_AND
example only sees one value, thus returning true.

------
fulldecent2
The demise of SQL has been greatly exaggerated. If somebody wants to actually
propose something better, the threshold is high. They should publish:

\- A tool which transpiles their new language into (multiple versions of)
SQLite, MySQL, PostgreSQL \- This tool should always produce a correct result
(Modernizr) or a specific error (e.g. upgrade MySQL to XX+ to use
`NEW_LANGUAGE_CODE_SNIPPET` JSON functions) \- A separate tool that up-piles
SQLite, MySQL, PostgreSQL to the new language, possibly exploiting new
streamlined expressions \- An editor with IntelliSense for the new language

Even then you can probably achieve a lot value just by inputting table joins
before the developer starts editing code.

\---

If you're not doing that then you are just keyword spamming by mentioning SQL.

------
beart
The one small thing I've often wished for is to place the from clause before
the select clause. Having to type out the columns before the table means auto
completion never works well.

------
xpil
I said this before and I will say it again: SQL is for data processing what IP
is for networks: it's the neck of the hourglass.

There are many things happening below SQL layer (storage engines,
implementations, hardware etc) as well as above it (applications, reporting
engines, various services and whatnot). There are many things happening below
IP layer (hardware, protocols) as well as above it (TCP, applications).

Even though there are better ways to build a neck of an hourglass, my bet is
that (1) they are not _much_ better and (2) it would be extremely difficult to
get them replaced in all the existing hourglasses without breaking things
terribly.

Conclusion: SQL is going to stay forever, like it or not.

------
nicoburns
Open source (Apache 2.0), and implemented on top of Postgres. This is quite
interesting. I love SQL, but there's definitely room for improvement.

------
alexnewman
The critiques of SQL arent' wrong. I have written 4 parsers and can tell you
the lack of orthogonality is a pain. Luckily we have the NIST compatibility
guidelines to keep us safe. A replacement of SQL which contains the same
compatibility tests would sell me on a new language, but it's really hard

------
acomjean
We can do better. I use sql a lot though I’m not great at it. This seems much
better.

The problem is getting a replacement on enough databases (storage engines?) so
it’s universal. And performant. I’m not sure what’s involved with that but
since there are a lot of open source dbs it seems possible.

Nice work

------
beagle3
There have been tens of attempts over the years - a Dutch one called Xplain
was quite good IIRC. But they all suffer from delivering too little extra (if
fully adopted) against the incumbent.

Unfortunately, SQL is good enough for the majority of users and uses.

------
darkpuma
Saying you can do better than SQL is an extraordinarily bold claim,
considering SQL is one of the most successful programming languages in human
history and certainly the most successful fourth-generation programming
language. Wide swaths of people you might ordinarily think of as non-
programmers know how to use SQL in meaningful ways.

Doing better is certainly possible, but SQL is much better than a lot of
people give it credit for.

------
zepearl
In my opinion the example of the first chapter ("Lack of Orthogonality") is
wrong. The subquery (is it called "inline subquery?)...

> _SELECT name FROM emp WHERE role = 'dept head' AND deptno = dept.no_

...should in my opinion definitely return only 1 row for each department -
from a logical point of view returning multiple rows would mean that the data
is corrupt upstream or that the organization itself is corrupt or that there
is a lack of attributes in the DB (if no additional selection criteria like
for example "management" or "operations" or "deputee" etc... can be added to
the query - meaning that no sub/organization can have more than 1 person
responsible for the exact same thing).

I admit that this is a very focused critic and that I'm very happy with the
current behaviour of the generic SQL language and its special cases which are
linked to the DB being used (using currently Oracle, MariaDB, Clickhouse -
used DB2, Kudu through Cloudera stack, PostgreSQL, maybe something else) and
how it stores/processes the data etc... .

------
ben509
> In situations where EdgeDB cannot infer the type of an empty set [2]

Why not bottom, or are containers not covariant? I'm worried that we're not
clear about the difference between variables and values.

> Strictly speaking, EdgeQL sets are multisets [1]

Like nulls, this is another broken aspects of SQL.

Maybe you can address these concerns, but I'm skeptical that it's relational:

* It conflates types with relation variables

* It's using links rather than foreign key constraints

* It doesn't seem like the user can even specify a candidate key

* Which is why the "id" field is magical and special

* Constraints only seem to apply to scalars

The reason SQL has been successful is that tables are dead simple to
understand, and you're throwing that out. This feels more like a graph
database.

[1]: [https://edgedb.com/docs/edgeql/overview#type-
system](https://edgedb.com/docs/edgeql/overview#type-system) [2]:
[https://edgedb.com/docs/edgeql/expressions/overview#set-
cons...](https://edgedb.com/docs/edgeql/expressions/overview#set-constructor)

~~~
RedCrowbar
>> Why not bottom, or are containers not covariant? I'm worried that we're not
clear about the difference between variables and values.

Yes, `{}` is inferred as `anytype`, but we chose to restrict its use in output
for interoperability reasons: the output of a query must be of a concrete,
known type.

>> Strictly speaking, EdgeQL sets are multisets [1] > Like nulls, this is
another broken aspects of SQL.

From the purity standpoint, yes, but in practice, duplicate values are either
expected, or too expensive to eliminate. That said there is a `DISTINCT` [1]
aggregate.

> * It conflates types with relation variables

Not exactly. We do use the same symbol to denote a type, _and_ a relation that
it represents, but the context of which is which is always clear (i.e. Foo in
a cast expression is always a type), but Foo in a regular expression is always
a relvar.

> * It's using links rather than foreign key constraints

Links are an abstraction over a foreign key. In the vast majority of cases
foreign keys are implemented as 'some_id' -> 'id' anyway. If you want a custom
foreign key, you can set an expression constraint on a property or a link.

> * It doesn't seem like the user can even specify a candidate key

type Foo { constraint exclusive on ((.name, .last_name)) }

> * Which is why the "id" field is magical and special

See RM proscription 15 in the Third Manifesto [2]

> * Constraints only seem to apply to scalars

They aren't, see above.

[1]
[https://edgedb.com/docs/edgeql/funcops/set#operator::DISTINC...](https://edgedb.com/docs/edgeql/funcops/set#operator::DISTINCT)
[2]
[https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf](https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf)

~~~
ben509
Thanks for the clarifications!

------
oxfordmale
>> We Can Do Better Than SQL (edgedb.com)

No you can't. Many companies have tried, including very large and successful
companies and have failed miserably. Just try out the JSON query language used
by MongoDB, including their aggregation pipelines, and you will absolutely
love SQL.

------
nicodjimenez
"We Can Do Better Than SQL", sure, but how much better?

Fundamentally, SQL is a declarative language for databases, that makes the
easy stuff easy, and the hard stuff doable.

If people want to build new front ends for SQL databases, to make it easier to
build apps, I can buy that.

But trying to replace SQL as the standard declarative language for data is a
fool's errand. You'd have better luck getting developers to switch away from
git. People don't use SQL databases because they're fun for developers to hack
on, they use them because they solve a business need.

If you want to innovate in the database space, I think you need to provide
something that benefits end users (global availability, auto scaling, speed,
client syncing ala Firebase, ...).

~~~
solidasparagus
> makes the easy stuff easy, and the hard stuff doable

I like SQL, but I disagree. It also makes easy stuff hard.

Given many numbers (billions), how would you find the 1st, 3rd and 5th highest
values?

~~~
dragonwriter
The SQL to do this is simple and straightforward using analytic functions
(specifically nth_value); the performance is likely to suck hard if you don't
have an appropriate index, but that's not an SQL problem, but a “sorting
billions of numbers is expensive” problem.

    
    
      SELECT 
        nth_value(num, 1) OVER ORDER BY num, 
        nth_value(num, 3) OVER ORDER BY num, 
        nth_value(num, 5) OVER ORDER BY num
      FROM t

~~~
im3w1l
nth_value can be done in a faster way than sorting and picking. It can be done
in O(n) while sorting is O(n log n).

~~~
jeltz
Which is why PostgreSQL uses heap sort with a fixed max heap size for sort
with a small limit (called "top-N heapsort" when running explain analyze).
Then the complexity for getting the kth value is O(n log k) which is O(n).

------
keithnz
Amazing how our history of programming has so many things putting layers over
SQL. I think mainly because the bridge between SQL and General Programming is
often not too pretty ( string based queries with no validation ). But in my
experience, it's often better to just embrace SQL. I won't argue the syntax is
nice, but it is conceptually nice. Usually the further away you get from
queries on relational databases, the more often you do things very
inefficiently from a database point of view, to the point of being orders of
magnitude worse.

------
jeffdavis
The {} construct seems awfully close to NULL. Not sure I understand the
fundamental difference -- having a third value in boolean contexts still
introduces 3VL.

Why not just use an Option/Maybe type?

~~~
RedCrowbar
Empty set differs from NULL is that you get an empty set if you apply an
element-wise function over it (which most operators and functions in EdgeQL
are).

> Why not just use an Option/Maybe type?

We are considering adding algebraic types and a syntax to match them, but it
ultimately boils down to taste and use case:

    
    
      SELECT value ?? "fallback in case of empty"
    

is not fundamentally different from (hypothetical):

    
    
      SELECT 
        MATCH value
        CASE Some(non_empty_val) THEN non_empty_val
        CASE Empty THEN "fallback in case of empty"
        END;

~~~
jeffdavis
"Empty set differs from NULL is that you get an empty set if you apply an
element-wise function over it"

It still sounds like NULL -- I must be missing something.

~~~
RedCrowbar
Take the CASE WHEN example from the blog. An equivalent EdgeQL expression is

    
    
      SELECT 'one' IF value = 1 ELSE 'not one'
    

If value is an empty set, then the result is _always_ an empty set, unlike SQL
that pretends NULL values are actually boolean for the purposes of the
condition. EdgeQL expressions are, essentially, set comprehensions, the above
is equivalent to this Python expression:

    
    
      {'one' if v = 1 else 'not one' for v in value}

------
haolez
I acknowledge the problems with SQL that the author stated, but I still prefer
SQL over the alternative presented.

What would be other alternatives to a declarative language like SQL? Prolog?

~~~
esfandia
Datalog indeed is a thing:
[https://en.wikipedia.org/wiki/Datalog](https://en.wikipedia.org/wiki/Datalog)

------
garmaine
Why not Datalog?

It would seem to meet all your constraints, and is well-studied.

~~~
Vaslo
With due respect, Datalog is a very difficult language to learn, and it just
doesn't have the support and resources SQL does. We were forced to use it in a
Data Cleaning course a few years ago because the professor was big on it, and
everyone struggled. To this day, fellow students complain about that part of
the course. No one ever looked at it again after that class.

Datalog will ALWAYS be a niche language, because it is designed for a certain
type of computational mind. The vast majority will want to learn SQL due to
its ease despite any shortcomings it may have.

~~~
garmaine
Is the difficulty you had due to it being a functional language? Or something
more intrinsic to the language itself?

~~~
Vaslo
Much of it was writing code akin to recursive SQL (which was touted as one of
the things it was way better than SQL at). Just never could understand it, and
it took hours for most to get simple recursive models like connecting family
trees. That language (like many) has a lot of work to do in terms of support
and tutorials in order to build reasonable interest for it.

~~~
garmaine
I think the problem here is is institutional--both our teaching institutions
and normal industrial practices. Objective studies have shown repeatedly that
functional programming models (which emphasize recursion, a stumbling block
you mention) are easier for students with no prior programming knowledge to
pick up and use effectively. It also tends to make better programmers, more
quickly. There's a reason MIT taught Lisp/Scheme in its introductory computer
science course for so many decades.

However (1) once you've ingrained all the counter-proactive habits of thinking
and worked past the stumbling blocks that imperative and object-oriented
programming models present, the conceptual jump from that to SQL is smaller;
and (2) good, in-depth introductory materials don't materialize out of nowhere
without demand. Your instructor was probably trying to do something good, both
by introducing you to data processing in a language with fewer syntactical
hurdles and in year-by-year incrementally improving the quality of
introductory material by using it in instruction.

Datalog allows you to express data relationships in a more straight forward,
more compact, and easier to refactor way free of most boilerplate. It does,
however, require you think about what you want to accomplish abstractly,
rather than as an imperative process, which is difficult merely because of the
years of experience the typical student already has in unergonomic languages
like C++, Java, etc.

------
wisnesky
Category theory can do better than relational database theory: the open-source
categorical query language CQL extends SQL with generalizations of
abstractions such as schema mappings and queries while supporting automated
theorem proving and fixing foundational semantic issues such as null handling.

[http://categoricaldata.net](http://categoricaldata.net)

~~~
stepbeek
Thanks for linking to this. I think I've been trying to build a poor mans
version of it in scala without realizing it...

------
dicroce
I'm fine with SQL, but i would also like there to be a more explicit language
that requires you to invoke the indexes when querying explicitly (instead of
implicitly like SQL). If you attempted to use an index that didn't exist the
query would simply fail (instead of just run slowly).

~~~
troels
Some dialects allow you to hint to what index to use.

------
chuckcode
Before diving into ways that SQL could improve I'd like to give some thanks to
a real workhorse that has proved useful over decades, which is an incredibly
long time in tech.

Could it be better? Sure, but author's proposal doesn't solve where I usually
have problems. What are my pain points and what would I like to see instead?

1) One giant statement. Personally I really like Hadley Wickham's dplyr [1,2]
(think "data pliers") which has a SQL like notion of joining different tables
and selecting values but separates the filter, mutate and summarise verbs as
separate steps in a pipeline rather than one huge statement. For transactions
dplyr would have to add an update verb as well.

2) Hard to test, especially for more complex ETL. dplyr approach highlights
that a lot of SQL these days is being used in ETL applications in addition to
the usual retrieval, transactions and reporting. Being able to express as a
pipeline of operations is easier for me to understand as execution is
conceptually consecutive and I can unit test individual parts as part of a
normal programming language environment.

3) My data isn't all tabular. Better support and semantics for non-scalar
entries where value is a record itself like in json, BigQuery, Hive, Presto,
etc.

4) Not that extendible. Better support for user defined operations (UDFs).
More and more frequently I want to apply some non-trivial operation to data,
e.g. run a machine learning model and it makes sense to do that as close as
possible to the data usually. It is possible to do a fair bit in SQL itself
with window functions but it is generally painful. You can point Hive at your
jar and run a UDF but it is also painful to integrate and debug in my
experience.

[1]
[https://cran.r-project.org/web/packages/dplyr/vignettes/dply...](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html)
[2]
[https://datacarpentry.org/R-genomics/04-dplyr.html](https://datacarpentry.org/R-genomics/04-dplyr.html)

------
blaisio
We can do so much better! The problem really is, even if you have a nicer more
expressive language that cross-compiles to SQL, it's really easy to end up
with things that are impossible write in some databases. Even though SQL is
based on Math, so it should "just work", the syntax is badly designed, and it
is tied up with database consistency models, so it is easy to write something
that should be possible but isn't. We can probably make much better
compromises in a different language, but it is unclear how we could
efficiently do backwards compatibility given all the existing databases and
database clients.

------
kerng
From all the languages I learned since high school, SQL is the one that is
pretty much still the same. Agreed, it's not perfect but I think it's one of
the most stable things there are. New features get added, which creates
inconsistencies between dialects. But if you stick to basics, it's all still
very much the same. In 2000s XML was added, then later JSON and there will be
future support for other formats also to integrate new data types.

------
mikl
The end of that headline is "but it's not really worth the effort and will
likely not succeed".

Sure, SQL is not the most elegant language. But it works. It does what we need
it to. Millions of developers knows it, countless tools speak it.

So I expect EdgeQL to go over like a lead balloon. Unless you can convince one
of the big existing SQL servers to adopt it, almost no one will ever get the
chance to use it, and it'll be a niche language for a niche database.

------
codetrotter
As someone with experience in using PostgreSQL, but who found that it would
fall short for a current project due to among other things the problems you
refer to as “poor system cohesion”, I am intrigued by EdgeQL.

Something I wonder though; can materialized views be created with EdgeQL? And
if so, does updating materialized views still require fully rebuilding them,
as PostgreSQL does, or can they be updated incrementally?

~~~
RedCrowbar
Alas, we don't support matviews yet, but it's on the TODO list.

------
londondev45
I fundamentally disagree with this article. SQL is arguably the most important
language in a developers toolbox. The attempts to isolate and replace it's
complexity through ORMs and nosql implementations has never resulted in a
viable replacement.

Like mathematical syntax fits mathematical logic. So SQL matches data
retrieval. Let's stop trying to replace it just because it's old. SQL is
amazing.

~~~
cmrdporcupine
I'm not sure you read the article? The author lays out a number of ways in
which SQL itself does not match the relational data model well, and gets in
the way of doing data retrieval well.

His criticisms are nothing new. Relational "gurus" Date, Codd, Darwin, Fabian
Pascal all made similar criticisms of SQL over the last 30 years.

------
stepbeek
EdgeQL looks really interesting - I'm glad I read through the article.

I think the issue - for me - with articles like this is the tendency to take
the stance of "${EXISTING_TECH} sucks, let's drop it for this new thing"

I would have found this way more compelling if this line was the very first in
the article:

> The relational model is still the most generally applicable and effective
> method of representing data.

------
namelosw
I always have negative opinion on SQL but I can bear with it by just using a
subset. Most of complex stuff are handled in application rather than SQL. Just
use it as a simple data store.

I think Datalog is a better alternative. However, for SQL or DBMS is a field
with more powerful lock-in than other fields. Or Oracle wouldn't live so long
so well without significant improvement over these years.

------
slifin
Datalog appears to address a lot of these concerns

------
ChicagoDave
I’ve been arguing relational databases are an operational anti-pattern for a
few years.

Your architecture shouldn’t start with an implementation choice. It should
start with business modeling and drive down to tech choices, which may be one
of many types of serialization/data storage technology.

Adopters of Domain-Driven Design have already moved past this argument.

------
ares2012
I'm disappointed they left QBE
([https://en.wikipedia.org/wiki/Query_by_Example](https://en.wikipedia.org/wiki/Query_by_Example))
out of their history of SQL and relations. SQL vs QBE is one of the great
examples of concise technology beating over-wrought UX.

------
sanxchit
Having worked with databases for a while, SQL seems to be useful because it
forces you to think about how your data is structured. To me SQL is a thin
wrapper around the relational algebra notation. The biggest problem I run into
with SQL is that it is hard to tell how performant a complex query is before
actually running it.

~~~
danisth
Not a great interface, but estimated query plans can be pretty useful for
getting a feel for how heavy a query will be.

------
mjirv
I'm confused by this:

> In EdgeQL, sets are flat, i.e. a set (including an empty one) cannot be an
> element of another set

It seems like in the last example, they're returning a set (of movies) that
contains other sets (directors, cast, reviews). Am I misunderstanding? I guess
it would be helpful to see what the output looks like for these examples.

~~~
RedCrowbar
There are good examples in the tutorial:
[https://edgedb.com/docs/tutorial/queries#ref-tutorial-
querie...](https://edgedb.com/docs/tutorial/queries#ref-tutorial-queries)

The last query is still returning a flat set of Movie objects. The shape
selector describes _related_ data that is also needed to be fetched by the
query. So, the query actually returns multiple sets that can be reconstructed
by the client into a graph of objects (or serialized into JSON).

------
ta1234567890
You know what would be amazing to have for DB engines? Auto-optimization. For
example, something that automatically determines which queries are slow and
which indexes should be created to speed them up. Basically self-tuning. Not
sure if this could be something for a machine learning model to do.

------
davidw
I like the groupwise maximum query...I always have to look that one up if I
haven't done one in a while.

------
consultSKI
>> The NoSQL movement was born, in part, out of the frustration with the
perceived stagnation and inadequacy of SQL databases.

Maybe. I would suggest tight budgets and lack of talented DBAs are the two
primary reasons NoSQL became a thing. "We can make do without a SQL DBA."

No. You can't. IMNSHO

------
jb3689
My real issue with SQL is how verbose it is. The amount of relative importance
to the number of characters in SQL is often pretty poor (for example, when
doing multi-table joins). Having to spend a few minutes deciphering each SQL
query is a real pain

------
asdkhadsj
I like the fact that this is built on Postgres. Will be curious to give it a
try on the side.

------
dragonwriter
Since Date’s criticism of SQL is prominently referenced, the natural question
is “Is EdgeDB a D, and, if not, why not?” (D referring to the class of
relational systems for which Date and Darwen provided criteria.)

~~~
RedCrowbar
EdgeDB is not quite D if you read all the proscriptions to the letter, but
it's very close :-)

------
jshowa3
We can do better than saying everything always needs to be "better".

------
just_myles
That's a pass for me for now. I think it's current state is fairly easy to
pick up. Should this somehow be highly a adopted and truly show a benefit over
the standard, I still might pass :)

------
galaxyLogic
[https://www3.cs.stonybrook.edu/~warren/xsbbook/node11.html](https://www3.cs.stonybrook.edu/~warren/xsbbook/node11.html)

------
systematical
I happen to like SQL and found it pretty easy to learn. Tons of resources out
there. And now with Postgres ad MySQL adding JSON storage we're getting all
the benefits of NoSQL in SQL.

------
asdkhadsj
Sidenote, has anyone seen how to actually talk to EdgeDB languages without a
Client? As it seems currently only Python has a client.

 _(edit: I 'm specifically wondering about Rust, if it matters)_

~~~
1st1
JavaScript and Go clients are coming soon. Java/Rust will follow.

~~~
RedCrowbar
There's also HTTP/JSON support.

~~~
asdkhadsj
Hah, that should have been in the first answer. With no Rust client, I almost
ignored the DB until there was a Rust client. If it has full HTTP/JSON support
then I could still use it.

Any links to the HTTP-JSON API documentation? I've not found it thus far.
_(edit: I hope it 's a fully featured API, and not the GraphAPI haha)_

------
_bxg1
Very exciting project. Not a very good name, as it conflates with the existing
term, "Edge Computing", making one wonder if they're somehow related.

------
MentallyRetired
SQL has been great. Relational data is greater.

I absolutely hate working with graphQL, and document stores are a non-starter
for projects with any decent amount of data.

------
JaggerFoo
Good luck with that.

------
nickpeterson
How about APL? (0=2|x)/x←⍳20

Visit rosettacode.org and try looking up some solutions in SQL and then
reference the APL variant. It's night and day.

------
ldb
How do I test for empty set? If I understand correctly then "SELECT * FROM
Movies WHERE description = {};" will not work, right?

~~~
1st1
You can use the EXISTS operator:
[https://edgedb.com/docs/edgeql/funcops/set#operator::EXISTS](https://edgedb.com/docs/edgeql/funcops/set#operator::EXISTS)

------
Pxtl
Seriously, were talking about a relational system where expressing a _tree_ is
super difficult. Why do we put up with this???

------
iblaine
Academically, EdgeDB is very interesting, but practically, it seems to be
solving a problem that does not exist. SQL is great for simple set operations
and will lose its efficiency as you move towards more advanced features. The
industry has accepted SQLs limitations and created new ways to interface with
data, such as UDFs, pig, and pyspark. Those widely accepted alternatives can
save you from doing something in SQL that you really shouldn't be doing in the
first place.

------
ldb
> any element-wise operation on an empty set is, likewise, an empty set

Does this mean that "SELECT 1 + {}" gives "{}"?

~~~
RedCrowbar
Correct.

~~~
ldb
Assume I have a table "Actors" with a column "age" and for some of the records
the age is not set (an empty set). Does this mean that "SELECT SUM(age) FROM
Actors;" gives "{}" or do you implement a special logic for empty-set
summation when used in connection with aggregation (like SQL does)?

~~~
RedCrowbar
Aggregate functions in EdgeDB have an "initial value", which, for `sum()` is
defined as zero. Other aggregates, like `avg()` are not defined for empty sets
(you cannot divide by 0), so an error is thrown in this case.

~~~
ldb
I am still a bit unclear about this. Assume we have three actor records where
two records have age=30 and for the remaining one the age is not set. From
what I understand then "SELECT sum(age) FROM Actor;" returns "60" while
"SELECT 30 + 30 + {}" returns "{}". This appears to be an inconsistent
handling of empty sets (thought it would be the same as in SQL).

~~~
RedCrowbar
The difference is that `+` is defined as a strict function (returns empty on
empty input): plus(a, b), whereas sum() is an aggregate that is specifically
defined as 0 on empty input.

------
rammy1234
if you can compare SQL every syntax to every usecase possible , until then we
cant claim what we are doing is lot better.

------
gigatexal
More vendors just need to support ANSI SQL.

------
max76
I would very interested in a language that transcribes to SQL the same way
typescript transcribes to javascript.

~~~
remus
If the generated SQL was compatible across a variety of databases and the
performance of the SQL was also passable you'd be on to a good thing.

~~~
max76
This is a problem I've been thinking about for years. It's possible to write
pure ANSI SQL and be compatible across a lot of databases, but to be optimized
in different databases it would need to have a target platform to compile to.

I'm really interested in what other developers think about this.

------
NeoBasilisk
"We can do better than SQL."

I heard this not too many years ago, and I am now very skeptical based upon
those results.

------
rvalue
Any performance metrics for edgedb ?

------
geuszb
It's kind of funny to point out issues with SQL's composability, and in the
same breath spec out a language in which "a set (including an empty one)
cannot be an element of another set".

There are many real life problems with SQL but this doesn't feel like it's
resolving any of them.

------
dubcanada
I wish it had slightly more details, like what does the last query on the page
actually return?

Also I am not sure how it's better yet. SQL is typed and has a ton of
historical overhead and used in many different database softwares.

This kind of just seems like a slightly different implementation of NoSQL
syntax.

~~~
1st1
> [..] SQL is typed and has a ton of historical overhead [..]

EdgeQL is strictly typed, here's more info:
[https://edgedb.com/docs/edgeql/overview/](https://edgedb.com/docs/edgeql/overview/)

> slightly different implementation of NoSQL syntax.

I'm genuinely curious what NoSQL syntaxes you are referring to.

------
starpilot
If there was a perfect, seamless pandas to SQL translator, that would be it.

------
yy77
criticism on SQL does not make it meaningful for a new database. I mean, an
ORM with its own query language, a new SQL dialect, some language that compile
to standard and optimized SQL are all good ideas.

------
Pxtl
I love he relational model, but any language where null = null does not return
true is insane.

Yes, I understand the theoretical rationale behind three-value-logic. Over
here where I'm getting actual work done, it's nothing but an obstacle.

------
znpy
Meh. It looks like just another graph database with a (possibly) better query
language.

People of HN: if you really want to make money, make a open source (open core
or whatever) horizontally scalable version of Versant OODBMS (Object-Oriented
DBMS).

------
leandrod
Object relational? No thanks. Just give me relational.

------
taffer
How many of the PostgreSQL features does edgedb support?

~~~
1st1
Quite a big number, but probably not all yet. For example, in one of the
future releases we'll add support for GIS types.

------
kthejoker2
Just wanted to say I see a lot of value in wrangling with the problems
inherent to SQL, so kudos to you.

Hopefully a transpiler API is in the works for all of those wonderful
standards deviant implementations?

------
coleifer
And thus a new SQL dialect was born.

------
ape4
SQL reminds me of COBOL. Even replacing some of the UPPERCASE words in SQL
with C-like symbols would be nice.

~~~
darkpuma
It's case insensitive, you can use lowercase if you like. The reason the
UPPERCASE persists is frankly because most people like it that way. Or maybe
I'm wrong about that and we're one preference cascade away from a major flip
in public opinion. But personally, I rebelled against the UPPERCASE when I was
first learning it, but I've since come to find the UPPERCASE to be quite
comforting. I think it has a pleasing aesthetic quality, and I find it helps
make large chunks of SQL less intimidating to read.

~~~
int_19h
I think the main reason these days is that there's still a fair bit of SQL
embedded in other languages, usually as string literals with placeholders. And
then it provides a way to visually distinguish it from "normal" string
literals.

------
branko_d
My two main gripes with SQL are:

\- Lack of interoperability with other languages.

\- General ugliness of server-side SQL (stored procedures and functions).

\--------

The first one isn't really solved by ORM tools, AFAICT. You can't simply write
the SQL query and transpile it into a nice, statically-typed method (with
types derived from the actual database structure!) that you can call directly
from your language.

For example, the database structure (I'm using T-SQL types here):

    
    
        CREATE TABLE T (
            A int PRIMARY KEY,
            B bigint NOT NULL
        );
    

And the query (T-SQL style parameter syntax):

    
    
        SELECT A, B FROM T WHERE A = @a;
    

Would produce the following method after transpilation (C#, hopefully self-
explanatory):

    
    
        IEnumerable<(int A, long B)> Query(int a);
    

But when the database structure changes, that would automatically be reflected
in the client language (after a build). For example, making B NULL-abe would
produce:

    
    
        IEnumerable<(int A, long? B)> Query(int a);
    

\--------

The second gripe is not very important if you use SQL just as a client-side
query language. But making the database "defend" its data (in presence of
complex business logic, or security requirements, not fully expressible
through declarative constraints) is still best achieved by "funnelling" all
clients through an API of stored procedures/functions/views, IMHO. As a bonus,
this approach also tends to lower database round-trips.

There seems to be a general lack of composability/reusability:

\- E.g. one stored procedure returning a set of rows cannot just "pipe" them
into another procedure or query - it must first copy the rows into a
(temporary) table.

\- If your dialect allows you to declare a table variable, you cannot just
assign it to another, you have to INSERT.

\- You can reuse the same SQL fragment multiple times in the same query
(through WITH), but not in different queries without encapsulating it in a
function (and good luck with performance if your dialect doesn't inline
function query plans or doesn't support functions at all).

\- You cannot parametrize ORDER BY, GROUP BY, IN...

And myriad of other problems:

\- The syntax is stuck in the '80ties, not well suited for auto-completion, no
type inference.

\- Lack of simple struct/tuple types in some dialects (may lead to huge
parameter lists).

\- Inconsistent exception / error handling behavior (sometimes the transaction
is aborted, sometimes it isn't).

\- Silent data truncation in some cases.

\- NULL sometimes meaning "unknown" and sometimes "empty".

\- No boolean expressions (e.g. you can't write A IS NULL = B IS NULL).

\- And probably many more that are currently not at the top of my head...
<RANT CLOSED>

It strikes me that we can do better on all these fronts (and more), without
abandoning the "good" parts of relational databases.

~~~
1st1
I think that EdgeDB actually addresses a lot of your comments re "General
ugliness of server-side SQL", i.e.

[ ] You cannot parametrize ORDER BY, GROUP BY, IN...

[v] type inference

[v] tuples, tuples of tuples, arrays of tuples, any combination, really

[v] Consistent exception / error handling behavior

[v] No silent data truncation in some case

[v] no NULLs -- empty sets are way more precisely defined in EdgeQL

[v] boolean expressions

Please give EdgeDB a try. Feedback from advanced SQL users is very important
to us.

------
nixpulvis
I know, I've used AR.

------
altmind
to do better than sql we should do at least as good as sql

------
daveheq
The correct title for this article is "SQL can do better"

------
krick
I wholeheartedly agree with the "complaining" part of the post, but then here
comes the "solution" and I'm not quite sold.

I mean, I don't necessarily claim that this is not a solution, it just isn't
obvious to me at all. Maybe a more extensive explanation with better examples
would make it all clear to me and I'd be super-hyped about it already, but
right now I'm more like confused.

First off, it would be helpful show the table structure in the examples, and
then compare EdgeQL query to the easiest solution in the SQL. After all, the
readers supposedly use SQL almost daily for many years (I know I do), but
don't know a thing about EdgeQL, so if it can do everything SQL can, but
easier, such a comparison must make it pretty obvious.

TBH, my knowledge of the relational algebra is quite rusty by now, so maybe
that's the problem, but as I remember, many queries we commonly use with the
SQL are not really "relational" queries. Relational algebra deals with the
sets of tuples, so things like count(*) or ORDER BY, or GROUP BY are not
really a part of relational model, they just exist because they are super-
helpful in what we usually are trying to achieve with SQL.

The problems with NULL are of a similar nature. I don't think we should
pretend that NULL not being equal NULL is not useful (we don't expect "missing
data" to be exactly the same value as another "missing data", do we?), and
SELECT DISTINCT treating them as equals is not intuitive (for me it absolutely
is: when I'm asking what values occur in a table, a missing entry is a missing
entry to me, I don't want to see NULL 10000 times).

So, the introduction kind of made me to expect the solution to be more in
compliance with relational concepts, but it doesn't seem to be, since all of
the above are present in the EdgeQL in one form or another.

I'm not sure how {} is different from NULL in the EdgeQL, since {} seems to be
kind of special thing here, the same as NULL is in the SQL. I mean, it doesn't
behave like a true empty set at all! Non-empty set {value} OR {} = {value},
not {} (OR ≡ ∪). <bool>{} being {} instead of a true boolean value looks even
more confusing to me than NULL OR (NOT NULL) = NULL. Same ternary algebra
here.

Then, I don't really understand a concept of a flat set here. I do kind of
understand what we are trying to achieve here: we want to solve the problem of
SELECT x, (SELECT y) FROM z throwing an error in a runtime, if count(SELECT y)
!= 1. And it kind of would make sense in SQL, but it's explicitly advertised
as a feature of EdgeQL that it can return trees (json-like structures), and
here it doesn't seem to make sense that an output of a query (which is a "flat
set", I guess?) cannot have another set as an element. Moreover, it obviusly
can be ordered, which also isn't a property of how "set" is commonly defined
in the set theory.

As a first impression, syntax and overall structure of the queries doesn't
strike me as obvious as well. In fact, since

> SQL does not integrate well enough with application languages and protocols

I would ultimately hope for something that can be expressed as a number of
function calls and commonly used data structures (a list, a dictionary/record,
etc.) in most/any mainstream PLs, not a one more DSL as in "free form text".
(Maybe with a more succinct DSL for the use in a console. Maybe.)

And my ultimate source of confusion. SQL is more or less the same thing even
in these DBMS where it isn't exactly The SQL (like ClickHouse). And given I
know the overall structure of the DBMS (like, is it, for instance, row-based
or column-based?) I can make pretty good assumptions of performance of a given
query, even though SQL is still declarative and I do not know what exactly the
query-optimizer will do. Maybe it's just that I'm not used to it, but I don't
have a feel about how performant would be the last EdgeQL example of the
article, and if it would be better to separate it into several queries at some
scale. In fact, I don't even understand if it's something that would be
reasonable easy to implement in other major RDBMS', or is it ultimately
EdgeDB-only feature? If so, it can be only as good as EdgeDB — and is it as
good as PosgreSQL, or MariaDB, or sqlite? Unfortunately, in the real world I
have to worry more about how performant and robust a thing is under load, than
I can worry about programming convenience.

~~~
RedCrowbar
> {value} OR {} = {value}, not {} (OR ≡ ∪), <bool>{} being {} instead of a
> true boolean value

This is because we define infix OR as

{a OR b | ∀ (a, b) ∈ (A x B)}

Same goes for the cast function.

You may argue that this is confusing, but if you think about everything as a
set comprehension, it is way more consistent. For example the `CASE WHEN`
example from the post would always return `{}` for empty input, making it
obvious.

> but it's explicitly advertised as a feature of EdgeQL that it can return
> trees (json-like structures)

The tree-like return is not an aspect of a language system, it's a matter of
output representation. For example:

    
    
        SELECT User { favorites: {name} }
    

still returns a flat set of User objects. The shape selector `{ ... }` is an
annotation that defines _how_ an object is serialized in the output.

> I would ultimately hope for something that can be expressed as a number of
> function calls and commonly used data structures

This is exactly where we are going. Orthogonality in the underlying language
makes it much easier to achieve this.

> and is it as good as PosgreSQL, or MariaDB, or sqlite?

EdgeDB is based on Postgres, we actually transpile EdgeQL to SQL. EXPLAIN and
query performance analysis are being worked on.

> in the real world I have to worry more about how performant and robust a
> thing is under load, than I can worry about programming convenience.

We have posted some benchmarks [1], and more are coming.

[1]
[https://edgedb.com/blog/edgedb-1-0-alpha-1/](https://edgedb.com/blog/edgedb-1-0-alpha-1/)

------
co_dh
kdb is better than sql.

------
yelloweyes
Well then do it.

Let me guess: ya can't.

------
throwayEngineer
So if you can handle the "complexity" of SQL and don't use NULL, SQL is fine?

~~~
1st1
> don't use NULL

I'm not sure this part is realistic to do in a non-trivial setup. At the very
least it's not a common thing to do.

~~~
6nf
You either have actual NULLs in your tables or you have implied NULLs that
don't exist in the actual tables but still come up when you do LEFT JOINs etc.

I don't think NULLs are actually a problem in SQL. It's not like we're talking
about pointers, it won't cause your server to crash if you use them wrong.
NULLs in SQL are fine.

~~~
ben509
They're _far_ worse than null pointers. They can cause subtle bugs that lead
to queries that appear to work but result in missing or incorrect data.

------
RocketSyntax
I use ORMs

------
usermac
What is EdgeDB? It is new to me.

~~~
1st1
Yeah, it's because it _is_ new. We released the first public version a few
weeks ago, here's a blog post with the announcement:
[https://edgedb.com/blog/edgedb-1-0-alpha-1](https://edgedb.com/blog/edgedb-1-0-alpha-1)
Hope you will try it!

~~~
benatkin
I skimmed the post and am not interested enough to look at it further. I'm not
saying it isn't interesting at all, but I still need to carve out the time to
learn Postgres' new features and check out RethinkDB again. If I had infinite
time I would check it out.

The MongoDB example is extremely contrived. If you wanted to do that with
MongoDB you would add a new field or a new collection with full_name rather
than doing a $map.

~~~
1st1
The comment in the query mentions that it makes the query only 5% slower. You
can drop the "full_name" part, but the query would be still too low level
(even compared to SQL/ORM examples) and slow.

~~~
benatkin
Why are you combining separate fields in the first place? Document databases
are denormalized. If you want a fair comparison, put full_name in a separate
field in addition to first_name and last_name in each document. That's the way
to use Mongo. The issue here is that a document db takes extra space, not that
you have to use $map for the common task of searching a full name. You only
have to use $map for querying things that you didn't plan to query.

~~~
1st1
The point of that blog post is to look at both performance and usability.

MongoDB has a way to fetch data the way we needed and the way other databases
fetch it in our benchmarks. We tried just that: the performance is roughly the
same anyways. It's up to MongoDB users how exactly they store the data and how
exactly they query it, but it's nice to know that their query language is
capable enough.

I think that it's pretty obvious that not a lot of users query data like that
in MongoDB and instead just store denormalized data. That's not the point of
that blog post though.

------
peterwwillis
So, to fix SQL, they're taking an SQL server and building a proprietary
language on top of it.

------
jimnotgym
We Can Do Better Than English. It is full of strange grammar and
inconsistencies. The tooling for checking grammar throws false positives. In
fact all it has going for it is that billions of people already speak
it...well I suppose that is a pretty big advantage.

