Hacker News
Half a century of SQL (theregister.com)
176 points by pseudolus 3 months ago | 138 comments



I want to know why there hasn’t been much innovation in the less-sexy parts of ISO SQL. My go-to for this is to point out how SQL-92 defined the same small set of constraints that we have today (PK, FK, UNIQUE, CHECK, and that’s pretty much it).

It’s been 30+ years and the expressiveness of SQL DDL constraints for data-modelling is completely unchanged. So far, only Postgres has extended it with EXCLUDE constraints; what I’d love to have is a “Non-unique foreign key” constraint and its inverse: a “NOT EXISTS” constraint - these alone would go far.
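No RDBMS lets you declare this today, but the shape of what a “non-unique foreign key” would enforce can be sketched with a trigger. A minimal SQLite-via-Python sketch (all table and column names invented for illustration): every `reports.region` must exist in `events.region`, even though `events.region` is deliberately non-unique — exactly the case a plain FK forbids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events  (region TEXT);   -- referenced column, deliberately NOT unique
CREATE TABLE reports (region TEXT);

-- Emulated "non-unique FK": every reports.region must appear in events.region.
CREATE TRIGGER reports_region_fk BEFORE INSERT ON reports
WHEN NOT EXISTS (SELECT 1 FROM events WHERE region = NEW.region)
BEGIN
    SELECT RAISE(ABORT, 'region not present in events');
END;

INSERT INTO events  VALUES ('eu'), ('eu'), ('us');  -- duplicates are fine
INSERT INTO reports VALUES ('eu');                  -- allowed
""")
try:
    conn.execute("INSERT INTO reports VALUES ('apac')")  # no matching event
except sqlite3.IntegrityError as e:
    print(e)  # region not present in events
```

Note this only guards INSERT on the child table; a real constraint would also need triggers for UPDATE and for changes to the referenced table — bookkeeping a declarative version would handle for you.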

Another huge need is for safe-and-sound data-structure invariants - it’s almost impossible to correctly store a linked-list in a table without running into concurrency issues.


Many constraints are extremely expensive to enforce at scale to the point of being prohibitive. You are essentially turning your relational database into a graph database under the hood, with the infamously poor scalability and performance implied. A legitimate argument for the obsolescence of SQL DDL (independent of the DML) is that it defines some features that inherently scale too poorly to be used in many modern databases and it assumes certain implementation details that aren't actually true in some modern database architectures.


Well, yes - that’s ISO SQL “Assertions” right there.

I appreciate that eschewing RDBMS-provided data-integrity features makes sense when it makes sense to - especially when it’s your application code, and yours alone, that guards access to the underlying store.

But such as it is, my current industry (think: incredibly unappealing on-prem CRMs) is all about treating the RDBMS as canon; reason 1 is because there’s a dozen companies all offering value-add services and systems that all integrate with each other solely by having an on-prem agent daemon that just logs in to the first Oracle box it sees on the LAN and execs its raw SQL - not just SELECT, but plenty of UPDATE and ALTER - all without the decency of a TRANSACTION.

So having even only the most rudimentary of FK and CHECK constraints is what keeps hundreds of thousands of small-to-medium-sized businesses from imploding from a hosed production DB. They pay for themselves.

…but it’s not enough. There’s still plenty of data-anomalies even in the most well-designed, aspirational 6th-Normal-Form DB you could make today, simply because the RDBMSs are awkwardly rigid and inflexible in that regard. I wouldn’t be complaining about the lack of (cheap, high-perf!) EXISTS/NOT-EXISTS constraints in MSSQL/Oracle/MariaDB if those RDBMSs were feasibly extensible in that regard. Postgres is the exception here, but unfortunately for me absolutely zero of the systems I work with use it; instead they’re more likely to run on Progress AS/400 or some proprietary xBase derivative.


> Many constraints are extremely expensive to enforce at scale to the point of being prohibitive

Incremental View Maintenance engines might be the solution we've been waiting for here.


We use indexed-views in MSSQL for this and they’re great for many kinds of things - but they also kinda become almost like TRIGGERs and might ruin your day if the execution-plan-generation gods feel like it.

…and the lack of a LEFT OUTER JOIN also kills it for a lot of things too.

One problem is that a Txn won’t commit until all secondary indexes are updated, but I’d prefer it if there was a way for a Txn to unblock its caller once the base-table update is saved; updates to secondary indexes could then continue asynchronously. (I know this is hardly an original idea, but I haven’t read why this can’t be done)


Probably isolation level guarantees if I had to take a WAG.


The general problem, which is rooted in computer science, is that there is no data structure that can be optimized for every possible kind of query [language | algorithm].

One corollary of this is that you need to look at the problem you want to solve and decide. I don't think SQL is obsolete, but I think it is not the right option for every kind of problem. Especially on HN there are a lot of discussions about "big data", "data science", AI.


are you saying that graph databases suffer scalability issues or are you saying relational databases used as graph databases suffer scalability issues?

I've done some high scale stuff off mysql with 10s of billions of operations a day across sharded clusters with some nodes holding terabytes of data -- fks, triggers, cascading deletes and such tended to be too expensive. I've not had the opportunity to use something like neo4j, which supposedly has a great horizontal scaling story.

edit: not that my downvoters will see, but, seriously? I'm asking a question to understand if I'm following. What are you downvoting? And yes, I've heard you are not supposed to question downvotes; I'm honestly perplexed.


The best scalability I knew about was JanusGraph running on ScyllaDB. You can also run JanusGraph on Cassandra, which is horizontally scalable. ScyllaDB will be more performant and vertically scalable.

• Docs: https://docs.janusgraph.org/storage-backend/scylladb/

• Detailed use case: https://www.scylladb.com/2019/05/14/powering-a-graph-data-sy...

• Video: https://www.youtube.com/watch?v=7WZyVUTwYJ4

Note this was a few years ago. I haven't seen a TigerGraph vs. JanusGraph head-to-head. The TigerGraph people are pretty sharp. If anyone has deeper knowledge would love to see comparative benchmarks.

Both JanusGraph and TigerGraph are rated about the same in terms of popularity on DB-engines.com (between 100-150 rankings). Neo4j is still far more popular/well-known, at rank #21 this past month. But there are other options you can explore.


FWIW, Oracle now has data domains that allow more complex constraints, including JSON schema based ones:

https://docs.oracle.com/en/database/oracle/oracle-database/2...


Side note: had to set up an Oracle development environment this week. I find the SQL shell to be quite bad compared to the SQLite shell. The latter scales dynamically with the terminal width, while the former requires a manually applied fixed-width setting.

Even the HTML5 tools were bad. Creating a table via the UI opened a right pop-out modal that did not properly utilize the web browser's viewport. Typing in the column name would cause the data type field to be partially visible, and selecting the data type caused the column name to be partially visible.

My take: an expensive solution with inferior tooling compared to the competition.


Oracle is the bane of my existence, and I haven't worked with it for 10+ years. Still scarred.


Domain-constraints are just reusable CHECK constraints on steroids :3


There's indeed some comparison there, as (as far as I understand) the current implementation of domain constraints still works on a single table (one column or multiple columns). But the flexible domains (enforcing different domains based on a field's value or an expression's result) are pretty cool, and the JSON-schema-based domains could make some (JSON-related) validation things easier out of the box too (I searched the web - looks like Postgres has support for domains too).
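To get a feel for the idea, here's a rough emulation of a JSON-validating domain as a CHECK constraint in SQLite (assuming the bundled SQLite was built with the JSON1 functions, which is the default in modern Python builds; table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE settings (
        name  TEXT PRIMARY KEY,
        value TEXT CHECK (json_valid(value))  -- the reusable idea behind a JSON "domain"
    )
""")
conn.execute("INSERT INTO settings VALUES (?, ?)", ("theme", '{"dark": true}'))
try:
    conn.execute("INSERT INTO settings VALUES (?, ?)", ("broken", "not json"))
except sqlite3.IntegrityError:
    print("rejected")  # malformed JSON never reaches the table
```

A proper domain would let you name and reuse that check across tables rather than repeating it per column, which is exactly the appeal.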

For more complex cases, one could write (or autogenerate) triggers, but that will likely cause performance and maintenance issues... so the other option is to standardize any important data validation & logic into application code - and that part of application code can be stored procedures if it needs to be close to the database.

Edit: Sometimes it's easier to sell the idea of using stored procedures for (data) logic to non-database developers when you call them database APIs instead of stored procedures :-)


JavaScript is just reusable assembly on steroids.


Active development in data systems has focused heavily on larger datasets where complex global state isn't practical to enforce on write. Some systems don't even enforce their PK or FK constraints.


Yeah, that's it - I've never seen constraints used that much in the real world. Unique keys, but that's about it. Clever database-level constraints usually come back and bite you in the arse when you're trying to do some large-scale change or upgrade or fix.

And if you're building an app you usually want the validation of input to be done as close to the user as possible (like in the UI), not relying on a round trip to the database and handling an exception. And some might say 'well have validation in both places' but then you've got a load of complex stuff to keep in sync which will be a pain in the arse if it gets out of sync (if your db rules differ slightly from your ui rules)


Can’t you just use insert triggers and reject inconsistent changes? I haven’t used Postgres much but SQL Server triggers let you do pretty much any constraint you want.

Of course you have to be careful because constraints can really slow things down.
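As a sketch of the idea — shown in SQLite, where `RAISE(ABORT, …)` plays the role that a `ROLLBACK` in a T-SQL trigger would; all names are hypothetical — here's a trigger enforcing a multi-row rule that a plain CHECK constraint cannot express:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (room TEXT, day TEXT);

-- Multi-row rule a CHECK cannot express: at most 2 bookings per room per day.
CREATE TRIGGER room_cap BEFORE INSERT ON bookings
WHEN (SELECT COUNT(*) FROM bookings
      WHERE room = NEW.room AND day = NEW.day) >= 2
BEGIN
    SELECT RAISE(ABORT, 'room already fully booked');
END;

INSERT INTO bookings VALUES ('101', 'mon');
INSERT INTO bookings VALUES ('101', 'mon');
""")
try:
    conn.execute("INSERT INTO bookings VALUES ('101', 'mon')")  # third one
except sqlite3.IntegrityError as e:
    print(e)  # room already fully booked
```

Note the cost hinted at above: the trigger runs a COUNT query on every insert, so it wants an index on (room, day) at any real scale.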


The overriding reason is that a trigger (compared to a constraint) does not represent a static invariant which one can then reason about, and potentially manipulate further - or serve as the basis for query optimisations (for example).

When your RDBMS says that a particular CHECK constraint is enabled, then that’s a hard-and-fast guarantee that every row in that table satisfies that constraint. As a concrete example: if I have a simple CHECK constraint that ensures the Users.EmailAddress column matches a regex like "^\S+@[\w.-]+$", then my ETL code that reads those records out of the DB doesn’t need to handle the case where the "@" is missing, thus greatly simplifying the software design.
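For a runnable illustration of that guarantee (SQLite has no built-in regex support, so a cruder LIKE pattern stands in for the regex here; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The point is the guarantee: while the CHECK is enabled,
# no stored row can possibly violate it.
conn.execute("""
    CREATE TABLE users (
        email TEXT NOT NULL CHECK (email LIKE '_%@_%._%')
    )
""")
conn.execute("INSERT INTO users VALUES ('alice@example.com')")
try:
    conn.execute("INSERT INTO users VALUES ('no-at-sign')")
except sqlite3.IntegrityError:
    print("rejected")  # readers of this table never see a missing '@'
```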

Put another way: consider how refinement-types (or dependent-types) in OOP exist to declare and enforce hard guarantees about the state or shape of an object or data in-memory, and they also provide effectively zero-cost compile-time precondition checks; then you can appreciate how constraints in a databases do the same, except for persisted data on-disk.

But a trigger, on the other hand, needs some qualification, as there are many kinds. We can ignore triggers which simply audit (i.e. log or alert) when some action is taken; but there are DML triggers which perform some test on the data affected by the DML and either block it from continuing or committing, or allow it to happen. This kind of trigger is functionally identical to a CHECK constraint - but unlike a CHECK it doesn’t provide any guarantees. The third kind of trigger won’t block DML; instead it applies some transformation to the data affected by the DML before the change is committed. This has the advantage of allowing otherwise-invalid data to be corrected and accepted instead of blocking the operation; all of this is good stuff, honestly.

…the problem is that the presence and enabled/disabled status of the trigger tells you absolutely nothing about the state of the data at rest - but a CHECK, FK, or UNIQUE constraint does.

Consider that if I removed that e-mail address regex CHECK and moved it into a block/allow TRIGGER, then anyone who reads from my Users table no longer has any guarantees about the contents of the Email column. Now also let’s suppose that some other DBA, or perhaps an automated process, needs to legitimately disable the TRIGGER temporarily while they perform data-cleanup work and then re-enable it; in the event that the cleanup job was botched, the invalid email addresses will lie there undiscovered until the ETL loader crashes the next time it runs, because it was never written to handle malformed email addresses.

So in truth, TRIGGERs and CONSTRAINTS are complementary to each other: a DML trigger that transforms data to satisfy a CONSTRAINT is a good example.

————

Now consider an abort/allow trigger that performs some “non-trivial” (as far as the DBMS is concerned) DML validation check, such as a cross-table or even cross-database query; obviously that won’t scale, and it also introduces more points of failure (ext. DB connections) - but hang on: for limited kinds of cross-table validation checks we use FK CONSTRAINTs, as that’s what they’re designed for - and they work great because we know how to optimise for them (usually by adding FK indexes).

So my argument is that there exist many other classes of validation rules which today can only be implemented as TRIGGERs but could instead be better implemented as a declared CONSTRAINT because of the hard guarantees that only declarative constraints can give (when enabled, of course).

For example, a “non-unique FK” constraint would not be any more expensive to enforce than an FK, because it reduces down to the same (indexed!) set-membership test as a normal FK constraint. The exact same applies to an “anti-foreign-key” constraint (except the set-membership test is inverted). It’s so trivial to implement - which means it's so maddening that it doesn’t exist in SQL.

My position is thus: if you can express a desired constraint as a deterministic function of a single row and the set of all indexes which runs within at-most O( n log n ) time proportional to the initiating DML’s time complexity then we should be allowed to do it ourselves.


What we really need is a standard bytecode representation for relational queries so tooling can become more agnostic and more powerful.

It is kinda crazy the contortions you have to do to make some dynamic queries with the existing tooling.


> SQL is great at expressing simple needs very well, but in more advanced cases, it can get "mind-twistingly complicated" even for a mathematically inclined brain.

I have found Malloy[1] to be easier to read and write, even for really complex queries. Malloy compiles to SQL. As an example check out [2].

Malloy can query .csv files directly (enabled by duckdb). You can also join .csv files. This makes getting started a breeze for beginners, and is a boon for data scientists. Much easier than Pandas in many cases.

With the visualization capabilities it has built in, Malloy sort-of competes with Tableau and PowerBI.

Most people only write SQL to get data out of databases; for these people, Malloy is an interesting tool to consider.

[1] https://www.malloydata.dev/ [2] https://docs.malloydata.dev/blog/2023-10-26-malloy-bump-char...

Not affiliated with Malloy, I've just played with it for a while and been impressed. It's also MIT licensed.


Malloy is great. Why do you think it is not taking off if it is a clear significant improvement on SQL and even compiles to it?


Looking at this for a bit, there's a few reasons:

For starters, according to that link it only left "experimental status" in Oct 2023; it's pretty new. Although their GitHub still describes it as "an experimental language". That doesn't exactly inspire confidence for long-term production use.

The second problem is that while imperfect, everyone and their dog knows at least the SQL basics. There's a lot of value in that.

Thirdly it's written in TypeScript, so anyone not using TypeScript will either have to run some sort of "generate" step, or have to rewrite it in $language_of_choice. Both are painful. This is not a TypeScript problem: you will have that with any language (although with languages that compile to a binary it's a bit less painful, albeit still painful).

Lastly, it doesn't work for all SQL flavours: just BigQuery and PostgreSQL. That's pretty limited.

All of that is assuming the SQL it generates performs as well as "native" SQL, and that all of this can be reasonably debugged if something goes wrong.


I see Malloy as an analysis tool like Pandas, PowerBI, Tableau, or Looker. It seems like most people write SQL to get "All the data" and then take it to an analysis tool for further study.

With Malloy you can do the analysis on your data lake directly for 80% of your questions.


Limited support

https://docs.malloydata.dev/documentation/

"Malloy currently works with SQL databases BigQuery, Postgres, and querying Parquet and CSV via DuckDB."


Languages like Malloy or PRQL require some upfront investment to learn and setup. Additionally, I think both Malloy and PRQL are query oriented, so you still need to learn a decent amount of SQL to interact with your database (for updates).

I'm guessing most people would rather spend effort solving their immediate problems with SQL than bet on a newer and less known technology, even if it has a lot of promise.


There’s also recutils for those that have simpler data storage needs.


i love SQL but it was really hard to get started for two reasons:

- it's hard to start tinkering unless you have a database to write queries against. most tutorials never really get past the most rudimentary hello-world-esque stuff because they're limited by the size and complexity of the example data.

- imperative programming teaches you to be very precise and explicit about what exactly should happen, so it feels very weird to start working with something declarative where you can just trust the computer to figure it out and do things well automatically.


This is very much why we built the Postgres playground, which has Postgres embedded in your browser with guided tutorials - https://www.crunchydata.com/developers/tutorials


I've been building SQL tutorials against my Datasette web UI for SQLite as a solution for getting people started without having to install anything: https://datasette.io/tutorials/learn-sql


Integrating live code editors within docs and tutorials is great.

Another example of this I saw recently using SQLite (compiled to Wasm) in the browser: https://docs.sqlitecloud.io/docs/sqlite

And if you ever want something similar for more general backend APIs (without relying on Wasm or the browser to run the software), https://codapi.org/ looks very slick e.g. as demonstrated in https://antonz.org/sql-upsert/ - discussed on HN previously [0]

Inspired by https://www.db-fiddle.com/ my colleagues ended up building a fairly bespoke setup for XTDB's docs (XTDB doesn't yet compile to Wasm) shortly before I came across Codapi, although our requirements were even more particular, e.g. see https://docs.xtdb.com/tutorials/financial-usecase/time-in-fi... - the backend here is https://github.com/xtdb/xt-fiddle which runs purely on top of Lambda Snapstart, and embedded within docs based on Astro's Starlight [1] and Web Components

[0] https://news.ycombinator.com/item?id=38663717

[1] https://starlight.astro.build/


Agreed, but data is so easy to find online! My breakthrough was that Python comes with SQLite built-in. Pointing DuckDB at a CSV file looks even easier.

https://docs.python.org/3/library/sqlite3.html

https://duckdb.org/docs/data/csv/overview.html
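For example, a complete zero-install session with the stdlib module looks like this (data invented):

```python
import sqlite3

# No server to install: the sqlite3 module ships with Python.
# Use a filename instead of ":memory:" to get a shareable .db file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")
conn.executemany("INSERT INTO pets VALUES (?, ?)",
                 [("Rex", "dog"), ("Whiskers", "cat"), ("Fido", "dog")])
rows = conn.execute(
    "SELECT species, COUNT(*) FROM pets GROUP BY species ORDER BY species"
).fetchall()
print(rows)  # [('cat', 1), ('dog', 2)]
```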


sure, you can find data and spin up your own db but you still need a tutorial to a) explain what the data/tables are and b) guide you through some things to build. for most programming languages you can make some sort of little widget or interactive program that lends itself well to learning via curious exploration. SQL does not lend itself to that because you don't really make something 'fun'. You just make more tables. People aren't going to naturally come up with the kind of boring-but-challenging things that you use SQL for like "come up with a daily aggregate of pageviews, but exclude users that had account status X in the previous 30 days OR if they have a special status set in this other table as of the date of the pageview. And make sure that days with 0 page views show up with 0 pageviews and are not just missing from the table".

Also, it's very easy to write queries that complete and look correct but give wrong or incomplete results. You're not going to notice that from just running queries in self-directed practice.


From a slightly different perspective, some people like these tasks :). That's part of the mindset of being a good data engineer, for example - I rarely have a problem coming up with my own set of queries to try out a new database or dataset. But I'm also not really interested in a lot of SWE-specific things.

Maybe this just isn't something you really want to dive into. That's ok!


Forget online tutorials, you want to pick up thick red Wrox SQL books from your local used bookstore or co-op. Nobody selling you anything, densely packed, comprehensive info.

SQL has changed very little over the past 25 years, and while there are some syntax variations across different databases, the most important 80+% is universal. Even if the book is for SQL Server the vast majority will work on Postgres, SQLite, etc.


Also, if you don't know any SQL yet what are the chances you'd feel comfortable spinning up your own db?


self-plug, but you could give https://pgexercises.com/ a try. No need to spin up your own DB etc.


It's not that hard. You can use any SQL tutorial site for practice. If you want real life data, find the stackoverflow data dump on archive.org and install a PostgreSQL instance to play with it.

IMO the biggest issue of SQL is that it brings bread and butter for the BI/DA people, who are so close to the business that they usually don't care about code quality and such. It's not the fault of the language, but the fault of erroneous team structure.


I think subqueries are a useful feature that doesn't get taught often to beginners. Using subqueries it's much easier to think clearly and intuitively. Even imperative programmers who don't have a ton of functional programming experience still understand nested function calls `f(g(h()))`. With this you don't even have to know when the different parts of a SELECT are applied, and you don't even have to know things like HAVING.
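A small illustration of that inside-out reading (invented data, run through SQLite from Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 5), ('c', 50);
""")
# Read inside-out like f(g(h())): the inner query aggregates per customer,
# the outer query filters - no need to reach for GROUP BY ... HAVING.
big_spenders = conn.execute("""
    SELECT customer
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders GROUP BY customer) AS t
    WHERE total > 20
    ORDER BY customer
""").fetchall()
print(big_spenders)  # [('a',), ('c',)]
```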


I have found Common Table Expressions -- which are slightly easier for the developer to re-order in the query -- to be even more readable than subqueries.


Yes, CTEs are much more readable than subqueries IMO. I can't really explain it, but they just make queries easier to build and read for me.


Two benefits:

1) The flow is clearer: take this, then transform this, then filter this

2) CTEs makes it easy to debug parts. You just change what you select from on the last line.

Number 2 is the kicker IMO


And this is why you should always, always have a `final` CTE and the last line should be `select * from final` so that you can easily change this.

We use a LOT of SQL at $current_dayjob and I almost always end up adding this to the queries. Clearly I need to do a better job of evangelising my approach.
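A minimal sketch of the pattern (invented data; each CTE is one named step, and the query always ends with `select * from final`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pageviews (day TEXT, views INTEGER);
INSERT INTO pageviews VALUES ('mon', 3), ('mon', 4), ('tue', 1);
""")
# To debug an intermediate step, point the last line at it
# (e.g. "SELECT * FROM daily") instead of "final".
query = """
WITH daily AS (
    SELECT day, SUM(views) AS views FROM pageviews GROUP BY day
),
busy AS (
    SELECT * FROM daily WHERE views > 2
),
final AS (
    SELECT day FROM busy ORDER BY day
)
SELECT * FROM final
"""
print(conn.execute(query).fetchall())  # [('mon',)]
```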


Actually with R it’s very easy. Just install the sqldf package. Then you can read a CSV or whatever format you’re using into R (even if you don’t know R, ChatGPT could probably write the code for you). Once the data is inside an R data frame you can query it directly with sqldf like: sqldf("select count(*) from csv_dataframe")


I used to say that you have learned SQL the moment you've forgotten everything about C/C++.

Imperative programming is a strong habit to kick. You just have to let go. And once you have you will see things from a very different perspective.

Also in other programming you'll start seeing the map/fold/groupby/dict patterns everywhere.


every once in a while i start thinking "it'd be really nice to build a declarative language to do this", but then I realize that:

- that just means that I'm bored and would rather be writing out the logic to handle a more general/abstract representation of the problem

- whenever I come across a DSL that somebody else made to "make a really simple declarative-like interface", i hate it and wish that they didn't do it


Was your point that that goes for SQL?

One nifty thing with SQL is that you get all the threading/remote processing for free without having to think about it. That can be pretty hairy to write yourself.


The imperative vs. declarative was also a toe-stubber for me at first. It ties in with SQL being set-oriented, which is not natural for a lot of imperative-trained developers.

If you find yourself looping over rows with a cursor you are probably doing it wrong, or at least suboptimally.
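To illustrate the two mentalities side by side (SQLite via Python, invented data) — the cursor-style loop issues one statement per row, while the set-oriented version is a single statement over the whole set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (sku TEXT, price REAL)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [("a", 10.0), ("b", 20.0)])

# Cursor-style thinking: fetch each row, compute in the client, write it back.
# One round-trip per row - slow and chatty at scale.
for sku, price in conn.execute("SELECT sku, price FROM prices").fetchall():
    conn.execute("UPDATE prices SET price = ? WHERE sku = ?", (price * 1.1, sku))

# Set-oriented thinking: the same 10% increase as ONE statement.
conn.execute("UPDATE prices SET price = price * 1.1")

# Both versions ran here, so every price has been raised twice (x1.21 overall).
total = conn.execute("SELECT ROUND(SUM(price), 2) FROM prices").fetchone()[0]
print(total)  # 36.3
```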


Which is why I think we should always teach SQL basics with SQLite. You can just share the db file as-is. You can probably find dozens of GUIs and even web GUIs for SQLite that you could load a db into.


1) sqlite is so easy to start with - I'm not sure what you mean by "hard to start tinkering"?

2) I'm not sure about your personal experience, but having worked with makefiles during my C programming days, I think you get used to declarative-style stuff pretty fast!


SQLite is good for this.


I view SQL right now similar to the way I viewed C++ in early 2000s:

I hate it, but there's little point in complaining because it's so ubiquitous.

More robust criticism is provided here (https://carlineng.com/?postid=sql-critique#blog), which pulls on an interview here (https://www.red-gate.com/simple-talk/opinion/opinion-pieces/...) The quote I usually drag out is from Chris Date, who helped pioneer relational DBs:

"At the same time, I have to say too that we didn’t realize how truly awful SQL was or would turn out to be (note that it’s much worse now than it was then, though it was pretty bad right from the outset)."

As an example of a language that does it better, I think kusto-query-language (KQL, https://learn.microsoft.com/en-us/azure/data-explorer/kusto/...) has been a dream to work with. (disclaimer, Kusto is a Microsoft product, and I'm a Microsoft employee).


I kind of disagree; the problem with SQL is that fundamentally it's actually pretty good. So alternatives either tend to be too radical (throwing out the baby with the bathwater) or simply not enough of an improvement to gain any momentum.

I feel like relational database querying is effectively solved and there's little point in re-litigating it. But still I'd be happy to switch to the perfect replacement if someone develops it.


What does "good" mean in this context? SQL is not modular, most features are highly context-dependent, and there are numerous footguns.

SQL might be OK for trivial things, as in the OLTP work that programmers tend to do.

But anything even slightly more advanced is... not nice.

And the standard is unique in its uselessness.

The underlying relational algebra model is brilliant, though.


> SQL is not modular, most features are highly context-dependent...

Examples?

> SQL might be OK for trivial things, as in OLTP...

What is the threshold for triviality? I've seen understandable fairly complex queries, but they're not mind-twisters by any means; if you know what you need, and understand your data, >>and are not a layperson regarding databases<< it's doable without much sweat.

> But anything even slightly more advanced is ... not nice

Again, what is the threshold, or at least what is your threshold, for triviality vs. non-trivial?

You say it's "unique in its uselessness" but "the underlying relational algebra model is brilliant", can you explain a bit further what you mean by that?


Good means it gets the job done in a fairly logical and readable way. SQL queries are not giant programs and shouldn't be. I've written some very advanced queries with plenty of common table expressions, subselects, etc. Could it be more modular? Sure. Could the syntax be better? Yes. But would that radically change how queries are written? Not really.

The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL than to write imperative code.


> But would that radically change how queries are written? Not really.

Rust didn't radically change how applications are written as compared to C, but that didn't stop us. Nor should it. Any improvement is worthwhile. It doesn't need to be radical.

But, like another commenter points out, SQL is like Javascript. Both having ecosystems so horrendously conceived that they have ensured there is no good path to replacing/augmenting them.


Rust hasn't swept the world yet, and it helps prevent real bugs and security issues. A better query language may make it fractionally easier to write database queries, but you have to toss out a half-century of experience. It's not worth it for marginal gains. This isn't even opinion; this is the reason it has never happened.

I think I agree that SQL is like JavaScript. If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago. But it's actually good enough, despite its quirks, that there doesn't exist a language better enough to make it worth replacement. It's possible such a language might never exist. And both SQL and JavaScript continue to improve, sometimes directly stealing ideas from potential competitors.


> If JavaScript wasn't as expressively powerful as it is, it would have been replaced a long time ago.

No - it wouldn't matter how terrible or inexpressive the language was: once it got rolled out to the web browser, it would never be replaced. That is why you have WASM nowadays - you can add new stuff but never replace stuff.


I disagree. If JavaScript was objectively limited, one of the browser makers would have just added something else and if it was good enough it would spread to others. Just like other browser technologies (for example, xmlHttpRequest). In a way, that's what happens with JavaScript right now. It continues to evolve.

Browsers did support multiple programming languages with the language attribute on the script tag. This is how Microsoft added VBScript to IE.


>If JavaScript was objectively limited, one of the browser makers would have just added something else and if it was good enough it would spread to others.

So like Shockwave, or Flash, or Silverlight, or Adobe AIR, or java applets, or the entire ActiveX ecosystem or...

We literally only moved things to javascript after Google spent a billion dollars writing a hyperoptimized javascript engine that has to make a pact with the devil (pretty much every javascript based exploit of computers relies on the fact that it is leakily compiled JIT to machine code, if javascript required less optimization to be useful, the internet would be a less exploitable place) just so that you could read your email in a web browser in a slightly less ugly way.

It also required several doublings in normal person computing power to be usable, and literal armies of 20 somethings writing in a couple giant abstraction layers that had to reinvent the world to do anything useful. It is still the primary burner of average person computing power, to run a million lines of javascript to do the exact same shit we did in the 90s with a 386.

Javascript should be seen as a systemic failure. If it's supposed to be "assembly for the web" as it seems to be treated now, then it needs to be VASTLY more efficient to run.


> So like Shockwave, or Flash, or Silverlight, or Adobe AIR, or java applets, or the entire ActiveX ecosystem or...

Yeah you complain about JavaScript but listed 6 objectively worse technologies. Clearly the best option won.

Browsers are ultimately the write-once-run-everywhere platform that everyone was trying to develop for decades. The only successful implementation of that idea. This is a case of worse is better.

And I think you're actually overselling the problem -- I have plenty of vintage machines and this text box in HN has more functionality than word processors had in the 90s. Here we are safely using literally an infinite number of apps that never have to be installed. It's an amazing achievement.


> objectively worse technologies.

The word you are looking for is subjectively. Objective measures need to be specific.

> Clearly the best option won.

Although, if Konqueror had implemented its own Flash renderer back in the day, there is a decent chance that Flash would have won in the end. Flash failed only because there was, for all intents and purposes, only a single implementation and a notable individual didn't want to have to rely on it, breaking unanimous support.

> The only successful implementation of that idea.

SQL is the only successful implementation of that idea.

The browser has fallen quite short of any kind of success in that regard. For example, studies that have looked at iPhone users suggest that 80-90% of their "internet time" is spent in apps rather than the browser.


> The word you are looking for is subjectively. Objective measures need to be specific.

If you're going to be pedantic, you should at least try to be correct. Objective just means based on facts, where subjective is based on opinions or feelings. Just because I didn't enumerate all the ways in which those technologies are objectively terrible doesn't mean they are not.

> Although, if Konqueror had implemented its own Flash renderer back in the day, there is a decent chance that Flash would have won in the end.

But even if that was the case, we'd still be using JavaScript.

> Flash failed only because there was, for all intents and purposes, only a single implementation

...and that implementation was terrible. Full of security flaws, bugs, and insufficient investment from Adobe. But a single closed-source implementation is also a perfectly good reason why it was objectively terrible as well.

> The browser has fallen quite short of any kind of success in that regard.

If Apple was not specifically crippling browser potential on their platforms, it might be more. But either way, the actual success of the iPhone was entirely based on the browser (the first version didn't even have apps). The web is what makes any alternative operating system viable today -- whether that OS is iOS, Mac OS, Android, WebOS, ChromeOS, Linux, etc.


> Objective just means based on facts where subjective is based on opinions or feelings.

Indeed, and the assertion made was based on opinion or feeling. There is no objectivity in the assertion.

> But a single closed-source implementation is also a perfectly good reason why it objectively terrible as well.

While I must share in your opinion, this again is not measured objectively. You just got finished defining the word... It is quite likely that at least someone at Adobe enjoyed it being a single closed-source implementation for a long, long time. "Terrible" is always an opinion.

> the first version didn't even have apps

Incorrect. While it did not support third-party apps (officially, at least; it didn't take long to see them added unofficially), the device was chock full of apps written by Apple.


> That is why you have WASM nowadays

As an aside, I always figured that the "WASM" of databases would come some day. With SQLite recently publishing details about its bytecode engine, perhaps that is looking more realistic?


> Rust hasn't swept the world yet and it helps prevent real bugs and security issues.

As the inventor of the relational model has written about extensively, and as you have no doubt come to realize yourself if you've used SQL for more than a few minutes, a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

Rust will never sweep the world, of course, because there is no reason to choose a single language in the application space. Something sweeping the world tells that you royally screwed up the execution environment. But it is a viable contender, despite being no different than C in any meaningful way (clearly you don't see bug/security issue prevention as being meaningful).

> despite it's quirks, that there doesn't exist a language better enough to make it worth replacement.

As you know, Postgres went in the opposite direction, eventually switching to SQL. A DBMS – one which is probably the second most popular DBMS in existence at that – completely upending what query language it supports is not without precedent. What do you think it was about SQL that made it substantially better to justify the change?


You seem to have my point backwards. It's not that SQL is substantially better than QUEL, it's that QUEL was not substantially better than SQL. Ultimately, what made Postgres interesting wasn't the query language.

> a different query language could have prevented a whole lot of real bugs too. SQL also has its fair share of security problems that are only prevented by telling developers to be careful.

I can't think of an example where one of these newer query languages actually solves bugs in the way that Rust does with the borrow checker. Slightly better syntax may prevent bugs, but nowhere near in the same way. I don't find SQL particularly bug-inducing -- it's mostly just annoying.


> It's not that SQL is substantially better than QUEL, it's that QUEL was not substantially better than SQL.

You seem to have your own point backwards, unless you have failed to make one. The premise you gave, at least as I understand it, is that there is no reason to put in the effort in moving away from SQL because nothing else is substantially better. So, by the same token, unless SQL was substantially better than QUEL, there should have been no reason for Postgres to put in the effort to make the same transition.

Which implies that SQL was substantially better. The question was: In what way?

> I can't think of an example where one of these newer query languages actually solve bugs

Perhaps because you are getting hung up on newer? They need not even be newer. The most glaring bug-inducing "problem"[1] of SQL was already recognized and solved by the original database querying language, Alpha.

[1] Problem might not be the right framing, but I lack a better word. There is no problem if you write perfect queries every single time. But, like C, it opens opportunities for making mistakes that could have been made impossible with a different design.


> Which implies that SQL was substantially better.

That is begging the question. It doesn't have to be selected because it's substantially better. If you have two things that are roughly equivalent, it's always better to pick the more popular/standard option.

> The most glaring bug-inducing "problem"[1] of SQL was already recognized and solved by the original database querying language, Alpha.

What was that?


> If you have two things that are roughly equivalent, it's always better to pick the more popular/standard option.

There is a good case to be made that, for better or worse, single page applications (SPAs) are the more popular/standard option as compared to generated HTML, even if roughly equivalent, for web applications. Does this mean that HN is making a mistake in not adopting that model? You did say always.

It is not like Postgres was a new project and they simply had to pick something. It had been with us for a decade before deciding to take the SQL direction. It could have much more easily continued down the path it was already going. There was a significant amount of cost involved to make the switch.

When Postgres was first created there was no clear winner, but by the time the switch was made it was apparent that SQL was becoming the more popular choice, I'll give you that. Is that the only reason for the move?

Thing is, I'm not sure SQL is even all that popular nowadays. I expect the vast majority of DBMS queries are only using SQL as a compiler target, more likely being written in a class of languages often referred to as ORMs. By your logic, databases should be adopting what best aligns with that programming model rather than clinging to SQL. You did say always, after all.

Granted, there are some half-hearted attempts to do exactly that by some DBMS vendors (e.g. recent Oracle releases support "3D" queries that return JSON structures rather than relations, more closely matching language semantics), but not in any way that appears to be all that serious, and definitely not in any kind of concerted way. It seems the SQL crowd still can't get over the idea that there might be an alternative. In fact, often they can't even envision anything other than SQL being possible. I don't know how many times I've heard someone claim that SQL is some kind of fundamental mathematical property.

> What was that?

The most common SQL bug I see in my travels can be distilled down to something like:

    SELECT t1.* FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id
To be fair, I'm not sure the bug would go unnoticed with such a simple query - where one is able to keep the entire mental model in their head. But consider the basis of that query expanded into something complex, where one will struggle to maintain all of the minute details in active memory. That is where I see bugs be introduced all the time, even by experienced developers. A different design – including the originally designed database query language – has proven to eliminate this class of bug.
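A minimal sketch of the fan-out this distills to, using SQLite via Python (hypothetical tables matching the query above); rewriting the join as an EXISTS semi-join is one way to express "t1 rows that have a match" without the duplication:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER);
    INSERT INTO t1 VALUES (1);
    INSERT INTO t2 VALUES (10, 1), (11, 1);  -- two t2 rows point at the same t1 row
""")

# The join fans out: each t1 row appears once per matching t2 row.
joined = conn.execute(
    "SELECT t1.* FROM t1 INNER JOIN t2 ON t1.id = t2.t1_id").fetchall()
print(joined)  # [(1,), (1,)] -- duplicated, often unintentionally

# A semi-join keeps the intent ("t1 rows with a match") and cannot fan out.
semi = conn.execute(
    "SELECT * FROM t1 WHERE EXISTS "
    "(SELECT 1 FROM t2 WHERE t2.t1_id = t1.id)").fetchall()
print(semi)  # [(1,)]
```

In a two-table query the duplication is obvious; buried inside a large join tree, it is exactly the silent row-multiplication bug described above.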


> There is a good case to be made that

There is not. Counterpoint: Wikipedia. It's definitely not like this where SQL is used by 99.99999% of everyone and nobody has even heard of QUEL.

> It is not like Postgres was a new project and they simply had to pick something. It had been with us for a decade before deciding to take the SQL direction.

It wasn't even remotely popular until they made that change. In fact, it wasn't even remotely popular until years after they made that change.

Probably the interesting point is that they could make this change and not really lose any necessary functionality. Clearly the query language was not particularly important.

> Thing is, I'm not sure SQL is even all that popular nowadays.

A suggestion so laughable I have no response. SQL has never been as popular as it is right now in history. Even with the introduction of noSQL databases, SQL continues to gain. All boats are floating.

> By your logic, databases should be adopting what best aligns with that programming model rather than clinging to SQL.

Is there something they can adopt? If you presented me with two options, let's call one X and the other SQL, then we could have a discussion about adopting one or the other. If this X was indeed more popular, then maybe they shouldn't cling to SQL. But the dozens of ORMs across dozens of platforms are not an alternative that can be chosen.

> It seems the SQL crowd still can't get over that there might be an alternative.

If you've been around long enough you wait until someone can bring the receipts.

> The most common SQL bug I see in my travels can be distilled down to something like:

Yeah that is pretty common. Would it be nice if that wasn't an issue? Sure. I'm not arguing that an alternative might not be better, it's just not better enough. These are pretty minor annoyances. If you just made a language that fixes this one issue and left everything else alone, it would be better than SQL, but you wouldn't get anyone to switch. You'd be better off proposing this as a feature addition to SQL and maybe in 20 years you'll be able to use it.


> SQL has never been as popular as it is right now in history.

As a compiler target, but even then the SQL is compiled to another representation, so if you want to take that angle it isn't SQL that is popular. Turtles all the way down.

> Is there something they can adopt?

What Oracle is doing was already mentioned. I am personally not entirely sold on their technical choices, and feel that they are trying to stay too close to SQL unnecessarily, but at least they are trying something which is more than we can say for a lot of the other vendors out there. I commend them on that.

> If this X was indeed more popular

It isn't what I would call popular, but it is also rather new. It is quite unusual for a technology to become immediately popular. Though, frankly, I'm not sure it could become popular if never adopted by any other engine. Oracle doesn't have a stranglehold on databases, and if anything is losing ground. Popularity can never come before wide distribution.

> If you've been around long enough you wait until someone can bring the receipts.

But now we circle back to the original posts – there is no good vector for anything else to bring said receipts. SQL has things completely locked down, effectively, which you suggest can only be overcome by becoming more popular than SQL, but, again, popularity is impossible before wide distribution. So the only possible way off of SQL is to convince the SQL crowd that an alternative is possible, in hopes that they will consider opening things up.

> You'd be better off proposing this as a feature addition to SQL

It is...? I believe it has been since the very beginning, or at least since the early days. It's such a glaring issue, of course it had to be dealt with. But you have to remember to enable it – which most of the time it seems developers forget, at least in my travels.

Of course, by the same token, if you are able to remember such things, what do you need Rust for? You can avoid bugs and security issues in C just fine if you keep a clear head and never make mistakes. So, Rust doesn't do anything meaningfully different from C, except in the places you suggest don't matter because as long as you don't make mistakes you'll never need it, and yet despite that has gained a decent following. Why it but not some SQL alternative?


> As a compiler target

Not as a compiler target. I doubt most projects exclusively use ORMs and never use SQL as a query language as intended. Also it's not just programmers using SQL inside of programs -- I teach SQL to accountants and they run their own queries. Any SQL alternative needs to be something accountants can learn.

I use an ORM all the time but I wouldn't use an ORM for reporting queries. Absolutely nobody abstracts over SQL for general purpose querying.

> But now we circle back to the original posts – there is no good vector for anything else to bring said receipts. SQL has things completely locked down, effectively, which you suggest can only be overcome by becoming more popular than SQL, but, again, popularity is impossible before wide distribution.

You've got it.

> So the only possible way off of SQL is to convince the SQL crowd that an alternative is possible in hopes that they will consider opening things up.

That's not how things work. You build something amazing and find a way to make it useful today. This is nothing new, it has been done through the history of technologies. This is how we get new programming languages, new protocols, and new standards. You could have said the same thing about x86 but ARM found a niche and expanded it. Or JSON vs. XML.

> It is...?

Is it? I don't know what you mean.

> Why it but not some SQL alternative?

I never said it couldn't be done -- it just hasn't been done. Any alternative to SQL has to have all the positive features of SQL and be a significant and obvious improvement. The majority of alternatives I've seen are intent on being radical redesigns. Rust, as you said, is not really radically different from C++ -- they just removed most of the footguns and design problems.

Any alternative to SQL needs to find a way to be immediately useful today. This is how NoSQL solutions became popular; they solved a real problem and I could use them today within an existing environment. But NoSQL is not a replacement for SQL -- it was just a replacement for tasks that an RDBMS was never good at in the first place.

But I think the idea that someone just builds a better query language in their basement, posts it to GitHub, and we all just move over is pretty unrealistic. I could build one of those tomorrow if I wanted.


> Not as a compiler target. I doubt most projects exclusively use ORMs and never use SQL as a query language as intended.

If projects use 90% ORMs and 10% SQL, ORMs are still widely more popular. This exclusively you are trying to draw is nonsensical.

> Any SQL alternative needs to be something accountants can learn.

Try reading the thread next time. The entire discussion is about how SQL's biggest screw up is the lock-in. The better solution would clearly allow SQL to coexist. If accountants want to use SQL, there should be nothing holding them back. You can still write code in C too, if you want to. The existence of Rust didn't put an end to it, nor should it.

> Is it?

It is. You commonly see the problem as said earlier, but never deal with it, just leaving the bug there?

> This is how NoSQL solutions became popular; the solved a real problem

Indeed, they offered an alternative API which solved the real problem that is known as SQL – hence why the movement became known as NoSQL (i.e. literally "no SQL"). Unfortunately, most of the implementations that offered better APIs fell short in the actual database-y bits, souring the effort, which has ultimately also quashed any progress in improving the APIs over databases you can use.

> Any alternative to SQL has to have all the positive features of SQL and be a significant and obvious improvement.

Therein lies the problem here. Rust brings no significant and obvious improvement over C. It might be marginally easier to use, and might prevent a bug here or there, but in the end it doesn't do anything radically different from C. The same can be said of most SQL alternatives. Yet Rust "made it" regardless where the others haven't.

Of course, Rust had the power of marketing behind it. Being a Mozilla project originally, it had a built-in audience, and the flashy video they released extolling its virtues back in the early days won sufficient hearts and minds. Other than maybe Oracle who at least seems to be giving some minor effort, the big names are stuck on SQL.

Although credit to SQLite for recently documenting its bytecode format. While they have not committed to it being a stable API, and probably isn't suitable as an API in its current form, it does provide a framework in which to start considering a "WASM" for database engines. This will undoubtedly happen someday if we keep the pressure on the SQL crowd to change their ways. And, indeed, they are evolving. The "SQL is a fundamental mathematical property" claims have diminished greatly over time, and they are even starting to recognize that there have been alternatives, like QUEL. It's a slow road, but one that continues.

> I could build one of those tomorrow if I wanted.

And I'm sure it would be the greatest thing the world has ever known, but since nobody knows who you are, there is a fat chance of anyone ever coming to realize that it exists. It is not sufficient to just be significantly better.


> Yeah that is pretty common.

What is the bug?


SQL isn't the nicest language to write, but I do find it one of the most readable languages out there.

> The worst SQL I've ever seen is when someone attempts to program it imperatively. It takes a different mentality to write SQL than to write imperative code.

Used to annoy me when my team lead, who was a fairly shitty programmer, would describe queries to me in an imperative way: "IF blah blah blah, THEN". You need to think of your queries as "Return this to me WHEN". Once you get your head around this, there is something quite elegant about the relational model.


All of this is true.

"Being ok for trivial things... that programmers tend to work" with is precisely why SQL has won over the years. SQL has never been a great general-purpose language, but is fantastic at the things programmers tend to work with. That SQL (and associated databases) keeps getting extended to meet new data storage/retrieval problems has really helped SQL stick with us.


Honestly for OLAP use cases, I think dataframe apis are superior.


I gained a few years of experience in SQL-based OLAP systems at my current job. In that time I developed a strong appreciation for SQL, especially for its composability. Recently, I started a project in Google Colab, gluing together queries from several systems with Pandas DataFrames. I can honestly say that I've never been more frustrated learning an API than I have been with Pandas.

Need some window function like LAG() or LEAD()? Too bad, I hope you like writing Python "for i in range(...):" loops. My notebook is littered with ".reset_index()" calls, ".replace(np.nan, None)", "axis='columns'", "foo.assign(bar=lambda df: df.apply(lambda row: ...))". groupby is especially confusing to me, as a Pandas GroupBy is difficult to compose with a normal DataFrame until you call .reset_index(). Compare this to SQL, where a subquery is a subquery, whether or not it has a GROUP BY clause.

The Pandas documentation also leaves a lot to be desired. Take the documentation of pandas.NaT[1] for example. "pandas.NaT: alias of NaT". Ok? That still doesn't tell me what NaT is, nor does it link to the thing that it aliases. The groupby documentation[2] also caused me some headaches, as it covers only the simplest aggregation use-cases.

Pandas is clearly better for some use-cases, but mostly for simple operations that are well-supported by the API (perhaps numeric operations that are implemented with native numpy routines). But if I'm doing some interactive OLAP stuff, I'll reach for SQL. Perhaps the problem is I'm trying to use Pandas like it's SQL, when it's not. But for manipulating data, I'd rather use a language than a library.

[1] https://pandas.pydata.org/docs/reference/api/pandas.NaT.html [2] https://pandas.pydata.org/docs/user_guide/groupby.html

edit: half a sentence


I should've specified polars, ibis, dplyr, pyspark, etc.

Pandas is easily the worst dataframe api.

I'll never go back to SQL from polars, it's far superior in both composability and readability imo.

Not to mention complex transforms can be version controlled and unit tested, and then you can compose these together.

It also maps to/from SQL quite naturally.


I think the fundamental problem is that if you want to talk to databases, you have to speak SQL. Like Javascript on the web, you are stuck with what the platform provides. Both languages have significant deficiencies, yet is/was the only game in town.

Sure, there are some languages which can compile down to SQL, but like TypeScript, every once in a while you find some edge case where the transpilation fails you, and you might as well be an expert in SQL.

I love all of the ideas of PRQL, but I do not know if I would be painting myself into a corner adopting something that will go the way of CoffeeScript.


I love SQL, I think it's fantastic, it's also the one skill I learned 30 years ago that still applies today and that I, personally, have used across at least six different databases.

The critiques in that link show a fundamental lack of understanding. E.g., it poses that the following query should be allowed, and wonders why SQL complains about the subquery returning more than one row when AVG requires more than one row anyway; but the reality is that what they are describing is avg(rows of rows) rather than avg(rows), which obviously doesn't make sense:

   SELECT
     AVG(
       SELECT SUM(amount) FROM purchases GROUP BY customer
     )
But even worse, the article says that the only solution is using a CTE(!?) and doesn't mention the obvious use of a subquery:

  SELECT
    AVG(total)
  FROM (
    SELECT
      SUM(amount) total_per_customer
    FROM
      purchases
    GROUP BY customer
  );
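For the skeptical, the subquery form runs end-to-end as-is; here is a check using SQLite via Python with toy data (the purchases schema is implied by the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer TEXT, amount REAL);
    INSERT INTO purchases VALUES
        ('alice', 10), ('alice', 20),   -- alice's total: 30
        ('bob', 40);                    -- bob's total: 40
""")

# Average of per-customer totals: (30 + 40) / 2 = 35.0
(avg_total,) = conn.execute("""
    SELECT AVG(total)
    FROM (SELECT SUM(amount) AS total FROM purchases GROUP BY customer)
""").fetchone()
print(avg_total)  # 35.0
```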
In my experience I've seen that usually the people that find it difficult to understand SQL are also the people that will be writing .fetch_by_id functions instead of .fetch_by_filters, ie. they consider database tables and data in general as 1d arrays in imperative programming and only think in those terms rather than the 2d sets they are. I'd wonder how their mind would melt if they'd tried to understand querying for 3d data, eg. temporal databases.

Also, I fail to see how KQL is any different from SQL; it's pretty much the same thing, with the exception that the selection comes afterwards. Is that the factor that makes you hate SQL?

From the link you posted:

  StormEvents 
  | where StartTime between (datetime(2007-11-01) .. datetime(2007-12-01))
  | where State == "FLORIDA"  
  | count 
vs

  FROM   StormEvents
  WHERE  StartTime BETWEEN '2007-11-01' AND '2007-12-01'
  AND    State = 'FLORIDA'
  SELECT count(*)
and

  StormEvents
  | where DamageCrops > 0
  | summarize
      MaxCropDamage=max(DamageCrops), 
      MinCropDamage=min(DamageCrops), 
      AvgCropDamage=avg(DamageCrops)
      by EventType
  | sort by AvgCropDamage
vs

  FROM     StormEvents
  WHERE    DamageCrops > 0
  SELECT   max(DamageCrops) MaxCropDamage, 
           min(DamageCrops) MinCropDamage, 
           avg(DamageCrops) AvgCropDamage
  GROUP BY EventType
  ORDER BY AvgCropDamage


To be clear, I don't find SQL difficult to understand. I've used it for 20+ years and I can always get the query to generate my desired output. But I often find that the language is a hindrance, and that I can more efficiently reach my desired output using modern languages.

for example, here's the KQL equivalent to the 'average of sums' query:

  purchases
  | summarize total_per_customer=sum(amount) by customer
  | summarize avg(total_per_customer)
I find this more elegant, and I'd prefer authoring it over any of the equivalent SQL solutions previously mentioned.


I only ever need to know KQL when something isn't working well, which turns 1 problem into 2: the original problem, and how to express exactly what I need in a language that, when things are going well, I forget quickly.


I've definitely had my troubles with KQL, but I still find it more memorable than SQL. Interesting that some folks feel the opposite.

My most recent favorite KQL trick for debugging from logs is `autocluster`. Maybe you'll also find it useful when you're in trouble :)



I think SQL is pretty good but I would like to see a tighter integration with the application code and the database. Like managing the database source in the repository, integrate the database into tests easier, maybe express queries in the application language instead of SQL. You can do all of these to to some degree with things like ORM and LINQ but the database still feels very separate from the rest of the project.

I think SQL as a language could also be improved a lot to make it easier to understand. It looks and feels like FORTRAN 77.


One way to have tighter integration between the database and the application logic is to abandon the application code altogether. Express the application logic right in the database using (custom) data types, domains, defaults, constraints, views, triggers, and procedures. Then you can delete much or all of the "application tier", delete the ORM, and simplify the entire stack. That's one way to avoid having "the database [feel] very separate from the rest of the project."
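To make that concrete, here is a minimal sketch (hypothetical accounts/audit schema; SQLite via Python only for portability, the same idea applies to any RDBMS) of an invariant and an audit rule living in the database rather than in application code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The invariant and the audit logic live in the schema itself.
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)   -- constraint, not app code
    );
    CREATE TABLE audit (account_id INTEGER, old REAL, new REAL);
    CREATE TRIGGER log_balance AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 75 WHERE id = 1")
print(conn.execute("SELECT * FROM audit").fetchall())  # [(1, 100.0, 75.0)]

# The CHECK constraint rejects invalid state no matter which client writes.
try:
    conn.execute("UPDATE accounts SET balance = -5 WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Every client that talks to this database gets the audit trail and the non-negative-balance rule for free, whether or not it runs any of "your" application code.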


I agree with everything you said except putting business logic in procedures. That should be in application code instead in my experience. I have worked with too many vendors that put all their business logic in the DB. That always makes things more difficult for maintenance IMO. It may be easy to get something up and running for the first time, but it makes things 10x more difficult to change in the future.


That depends. I worked with customers 25 years ago who put loads of business logic in very long and involved T-SQL stored procedures in MSSQL or PL/SQL stored procedures in Oracle. That may have made maintenance more difficult for the database developers who wrote them. It did make maintenance more difficult for the Java and .Net developers when they had to look at them, but that may have been a skill issue. These days, between (custom) data types, domains, defaults, constraints, views, materialized views, and triggers, I think there's more opportunity to regard "stored procedures" as a measure-of-last-resort. When that's invoked, I think application of the "Single Responsibility Principle" can take you a long way.


This was the way things were done 20 years ago, as far as I understand. A shame I mostly missed that (MySQL was the flavor of the day, being free and fast and throwing away a load of features, a pattern repeated to an even greater extent by Mongo a decade later). I could see a lot of benefits to working like that.


That's a good point about MySQL. PostgreSQL wasn't quite ready for primetime at that time, and Oracle and MSSQL aren't free. That left MySQL to fill the gap, but it's not really built for this style of development.


This make sense to me logically, but whenever I try to actually do it, it becomes a disaster.


Fair. I've only ever had success with this strategy as a solo hacker. It was once claimed to me that this would be a disaster on large teams and I had to admit that I had no evidence one way or the other. Don't know if that resonates with your experience. Just anecdotal.


This idea is largely regarded as making the codebase less maintainable since it can be more difficult to test discrete logical units of the SQL code than doing so for the application code.


I'm aware of that belief, but I do not share it.


Sounds like we should be adding good testing frameworks to SQL then


pgTAP is pretty good for PostgreSQL, but I don't know what if any options are available for other databases.

https://pgtap.org/


This is what I like to do.


application logic needs to live with application code in source control. I've not seen a solution that leverages the database the way you suggest and achieves that. It leads to people yeeting things into production and hoping they work because half the logic is not in front of them


The customers I work with tend to use database migration tools like Flyway or Liquibase, have their DDL in source control, run tests, and have fairly conventional CI/CD. As I said above, I've never personally worked on a large team doing this, but our customers seem to make it work.


Absolutely no reason you cannot keep your application logic in source control with this approach.


You are really discussing an improvement in programming languages rather than in databases. Innovation in programming languages themselves tend to be in languages like Haskell which give you much more abstraction power. You should take a look at Haskell beam https://haskell-beam.github.io/beam/tutorials/tutorial1/ or Esqueleto https://github.com/bitemyapp/esqueleto and you will find that these libraries achieve much tighter integration between the database and the rest of the language. I'm especially fond of Beam: the way to define data types and tables, the way to write queries just feel incredibly intuitive to me.

> managing the database source in the repository, integrate the database into tests easier

These are much easier problems to solve. Many people have already solved it.


As I see it, we haven't really made any progress in terms of impedance mismatch.

Otherwise, jooq is a fine addition to the Java world, as it generates a type-safe DSL from your ddl file where every statically correct jooq/Java statement is also a statically correct SQL query.


I see it differently. In the 25 years I've been working in this industry I see a welcome trend toward doing more in the database, such that the impedance mismatch dissipates. This is from a low-water mark 20 years ago or so, when most Java developers I knew kept the database at arm's length, insisting it met them on their terms rather than the other way around.

https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

https://supabase.com/

One way to eliminate the Java-SQL impedance (for example) mismatch is to delete Java altogether, along with JOOQ, Hibernate, and Spring (for example).


I think more code in the database could work but then the database should have better programming languages. The current SQL dialects are terrible for writing regular code.


That's a matter of taste. I find most of the business logic I need to express involves finding and transforming relational data, a task to which SQL is better suited than any general purpose programming language. Often, SQL is all that's needed. Less often, I have to resort to PLpgSQL, PL/SQL, or T-SQL. When that happens, it's not that bad.


Fundamentally the database is separate from the rest of the project. It's quite often a completely separate service hosted on a completely separate machine.

Do you feel the same about HTML, CSS, JavaScript, REST APIs, etc?


HTML, CSS, JavaScript, REST APIs are usually stored in a repository like the application code. Much easier to manage. I understand that the database is usually hosted separately but in most cases they are tightly coupled so should be managed together.


I store my database structure in the repository either as scripts or as migration code (and sometimes both)

I'm not sure what else you need. This seems like a basic tooling problem.
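
A minimal sketch of what "migration code in the repository" can look like, using stdlib sqlite3 and SQLite's user_version pragma as the schema-version bookkeeping (the migration list itself is hypothetical):

```python
import sqlite3

# Hypothetical migration list, kept in the repository next to the app code.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE users ADD COLUMN hobby TEXT",
]

def migrate(conn):
    # SQLite stores an integer schema version we can use as a cursor
    # into the migration list.
    applied = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, stmt in enumerate(MIGRATIONS[applied:], start=applied + 1):
        conn.execute(stmt)
        conn.execute(f"PRAGMA user_version = {i}")

conn = sqlite3.connect(":memory:")
migrate(conn)
print(conn.execute("PRAGMA user_version").fetchone()[0])  # 2
```

Running it again is a no-op, since every migration up to the recorded version is skipped.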


SQL, C, Javascript: the three kings of "good enough".

Bad enough that their flaws are evident even to novice practitioners.

Good enough, entrenched enough, that they will be around for the next 50 years too.


All three are also superficially easy to learn to use, hence they're able to sucker people in initially with the learning curve, then keep them there with the sunk cost fallacy.


I think C will "be around" like COBOL in 50 years given its footguns despite its ubiquity today.

Not sure about SQL and Javascript though; successor options are way less clear.


Does AI do COBOL yet? Maybe that's where all the jobs will be in a decade. It's not like there are a ton of blog posts about how to do stuff in COBOL to train it on.


The only problem I have with SQL is that `SELECT` comes first instead of last. It prevents the development of good tooling.
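
As a toy illustration of the tooling point (hypothetical three-column schema, with stdlib sqlite3 standing in for the catalog): once the FROM table is known, an editor can enumerate the completable column names.

```python
import sqlite3

# Hypothetical schema the editor would introspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT, lastname TEXT, hobby TEXT)")

def complete_columns(table, prefix):
    # PRAGMA table_info lists the table's columns; row[1] is the name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [c for c in cols if c.startswith(prefix)]

print(complete_columns("users", "la"))  # ['lastname']
```

With SELECT-first syntax, none of this information is available while the SELECT list is being typed, because the table hasn't been named yet.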


Well, it's a trade-off: it's easier to read, maybe a bit harder to write.

Also remember that SQL is a query language, not a programming language, so the main target audience was never just programmers.

Also, think of it this way; it might help. You start with what you really want:

  select firstname, lastname, hobby 
So you start by saying you want a table (a relation, a relvar) that has first name, last name, etc. This is what you want; then you tell me where to get it from:

  from users 
So the order is not bad. It's like saying "get me a drink from the box"; you wouldn't say "from the box, get me a drink".

SELECT-first resembles natural language more closely, and is more declarative.

FROM-first is more procedural.

What I want first, then how: declarative.

How to find what I want, then what I really want: procedural.


If I start my query with the table, my query editor can tell me what I can select. I've never seen a query editor that looks at the SELECT list and suggests tables based on it.


Could you elaborate? I'm curious why that makes a difference.


For auto complete, for example.


Select Na..

vs

From Users Select Na...


You could qualify the names up front?

  select users.Na.. from ..
If your tooling doesn't work the way you want with that, I don't think there's reason to believe it would if it were possible to write

  from users select Na..
either.


I wonder if there could be a more efficient language for interrogating relational databases. By efficiency I mean both better performance and less complexity.


Stonebraker said Oracle had a head start and was achieving strong growth using "sales tactics I would not condone."

Well, I never...


It helps to have friends in high places.

https://news.ycombinator.com/item?id=32596903


Tl;dr: the original design idea behind SQL was a failure, and here's why.

I know the intent was to write a declarative language where you just specify what you want and the system figures out how to get it, but I consider that project to have definitively failed, considering that in order to get the SQL interpreter to behave in a sane fashion you must use explicit JOIN statements everywhere, fetching table rows and indexing them against other table rows in a specific order, using specific keys that the programmer chooses. It's exactly the same as writing while loops, though the syntax is admittedly quite nice.

I've seen people try to interact with the interpreter with a great big god query like SELECT FROM A,B,C,D,E WHERE A.id=B.id AND B.id2=C.id2 and so on, and that would be a declarative way to do it, but the performance of the interpreter in such scenarios is absolutely, universally, perplexingly dismal, and in practice you can't use SQL that way. It was a nice idea, though.


Sorry, but I have to agree with the other flippant response to this post. The whole idea of SQL is to avoid such imperative thinking, and SQL optimizers are incredibly good in most prominent DB engines. There are ways to hint and influence a certain query plan, but that's not to take away from the incredible convenience that SQL affords for writing ad-hoc queries.
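
For what it's worth, this is easy to sanity-check in SQLite (stdlib sqlite3, hypothetical two-table schema): the comma-join-plus-WHERE spelling and the explicit-JOIN spelling compile to the same query plan, because the optimizer treats them identically.

```python
import sqlite3

# Hypothetical two-table schema, just to compare query plans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, y TEXT);
""")

# "God query" style: comma join with the join condition in WHERE.
implicit = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM a, b WHERE a.id = b.id"
).fetchall()

# Explicit JOIN style.
explicit = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM a JOIN b ON a.id = b.id"
).fetchall()

# Both spellings produce the same plan nodes.
print(implicit == explicit)
```

Whether a particular engine picks a good plan for either spelling is a separate question, but the spelling itself isn't what the planner keys on.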


The ones I've used (mainly Microsoft SQL Server) are not just bad, they're abysmally bad for any query that's not extremely trivial. Writing every query with explicit joins is a necessity almost 100% of the time. Maybe you've just never tried to do anything tricky with large tables holding a lot of data? Query optimizers almost always resort to nested table scans as soon as you do anything even slightly complex.


Rubbish


The only reason we live with SQL in 2024 is that the industry has stagnated. Not only the vendors: the users, too.

SQL was designed for a purpose it never fulfilled and never will: end-user queries. The promise was that users would write their own queries so that IT staff wouldn't have to write reports. But the only users of SQL today are those same programmers, many of them resentful or disdainful. Every online database discussion is rife with misinformation or silly advice ("write SQL sentences"); rare is any discussion of the relational model or first-order predicate logic.

And that brings us to the user-led stagnation, as it were. If more programmers using SQL understood the fundamentals, perhaps there would be some innovation. Instead we live through the constant reinvention and rediscovery of what SQL can do, and a perennial effort to make it "easier" with ORMs and such. Actual ease, using a better language, is never discussed.

The right language to express relational algebra IMO is Lisp. I don't mean Lisp-over-SQL. I mean Lisp over data, where tables are fundamental Lisp objects, and modifying those objects updates the database. Lisp addresses all of SQL's deficiencies: its verbosity, its lack of orthogonality, its poor composability.
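
A toy sketch of the "language over data" idea, with relations as plain lists of dicts rather than Lisp objects (Python here for concreteness; all names hypothetical):

```python
# Relational operators as ordinary functions over ordinary data.
def restrict(rel, pred):
    # sigma: keep tuples satisfying the predicate
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    # pi: keep only the named attributes, dropping duplicates
    seen, out = set(), []
    for t in rel:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

def natural_join(r, s):
    # join on all shared attribute names
    out = []
    for t in r:
        for u in s:
            if all(t[k] == u[k] for k in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
hobbies = [{"id": 1, "hobby": "chess"}]

print(project(natural_join(users, hobbies), ["name", "hobby"]))
# [{'name': 'ada', 'hobby': 'chess'}]
```

The operators compose like any other functions, which is precisely the orthogonality and composability the comment is after; what's missing is everything underneath (storage, indexing, concurrency).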

The would-be inventor of a better database query language faces a daunting problem, however: there is no free relational engine. No C library implementing relational algebra over a data store. So, not only does the language need invention, but the database engine itself needs to be engineered. Nothing has happened in this area since Berkeley DB, which is barely a start and has anyway been hijacked by Oracle.

The nearest thing to a free modular database framework isn’t very close. The original Berkeley Ingres DBMS is nominally free software. But it hasn’t been portable to, say, OpenBSD for decades, and the proprietary owners are understandably not motivated to make their product a toolbox for tinkerers.

So, it seems we’re doomed to another 50 years of SQL. We have met the enemy, and he is us.


It's simply not true that the only users of SQL are programmers. In fact, I have seen programmers' disdain for SQL become a problem: in one case the programmers built a Python client for the data, and the major users wanted SQL.


I don't think it matters what language we're using. Working with relational algebra means putting yourself in a certain mindset, and if most people honestly have no interest in learning SQL, they're not going to learn some other language either.


A lot of companies use a lot of services with builtin databases. Every single vendor is peddling their own weird query language. Just give me SQL. I don't want to learn your proprietary thing.


If anything the latest craze is leading us into repetition.

> The LLM will write the SQL query for your end users!

We are doomed to repeat the same mistakes forever.


SQL-the-standard is one of the great mistakes of computing, a bad abstraction of Titanic proportions. The standard should have been abandoned long ago in favor of the actually-existing reference implementations like PostgreSQL.

"Standard SQL is good and portable and worth spending enormous resources supporting" -- statements dreamed up by the utterly deranged


No one has ever said that, since no database implements the whole standard or limits itself to just the standard. The jOOQ blog has some good writing on this. I am glad, at least, that the standard gave us a sizeable common subset over half a century, rather than everybody doing their own thing from day one.


> I am glad at least that the standard gave us a sizeable common subset over half a century rather than everybody doing their own thing from day one.

There are tons of popular programming languages and they share paradigms, good ideas and syntactic similarities. Query languages ought to be the same.

Dialects work in natural language because interpretability, syntax and semantics are all fuzzy.

Dialects in computer languages are an anti-pattern because syntax and grammar must be precise and deterministic. You are either writing Java or gibberish; the Java toolchain will never charitably interpret your syntax errors as "close enough to understand".

SQL-the-standard birthed a bunch of different dialects of SQL that ought to be, de jure, separate query languages.

So much human effort is wasted dealing with the standard SQL abstraction layer. If someone is migrating from Oracle to Postgres, the fact that their query languages are eerily similar is far down the list of practical concerns.


I regard this as hyperbolic.



