I suspect the distinction between tables and data frames (and arguing about query languages) is from people who have lost sight of the important things when dealing with data - the relational model and a system that supports relational algebra. Beyond that, features and optimisations need to be justified in terms of today's (and tomorrow's) needs which are specific to the data. The only major issue with SQL is a lack of support for something like R's tidyverse gather() and spread() operations which could be considered as missing operations in relational algebra.
But I also agree with you. I think we are teaching programming wrong, we start with imperative programming, but perhaps we should start with data modelling. How the data look like? What are the constraints? Can we test the constraints? How many bits I need to store it?
"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious." -- Fred Brooks
The central design premise for relational data modeling was “if your model could potentially allow inconsistencies, assume the inconsistencies”. Today that premise is easily brushed aside with “you’ll never get anyone manipulating this table without going through this layer” or “we then group by these columns to show unique records to the user”.
I also think another reason is that relational database interfaces don’t really fit in with the architecture people want to use these days. Products like dynamo have secure and operable HTTP interfaces and SDKs that fit in really nicely with the ‘serverless’ stuff. To run a relational database you pretty much need to run a network, which isn’t particularly compatible with such architectures.
The lack of understanding of the relational model is not the limiting factor in my experience, the developer experience is just much worse.
NoSQL databases need you to go through this ceremony too. It may be less obvious if you're just passing JSONs over HTTP all the way, but something still needs to keep connections and sanitize untrusted input.
*I have experience with a system, not designed by me, that had one, and we were always going to redo/split it but never did.
> but I don't see why - in principle - a "serverless", SQL-over-API-as-a-Service couldn't work.
It can, but there’s just not really any good ones. Where as there are products like Dynamo which are amazing from an operability standpoint (as long as your use case doesn’t run up too hard against any of its constraints). AWS Serverless RDS is pretty terrible for example, the engine choices are limited, the scaling story is terrible, it’s expensive, it doesn’t actually have anything close to the “on-demand” functionality described in the marketing material, and the interface is just a mechanism for passing SQL queries around (so you’d probably want to use yet another abstraction layer for constructing your queries). Spanner is pretty good, but it’s really expensive, and isn’t amazing enough on its own to justify moving to GCP unless you’re already there. You can also run an HTTP interface yourself for you RDBMS, but there isn’t really a mature product in that space. There’s no show stopping technical limitations there, it’s just not a well or widely supported feature.
Clearing that bit up:
Cloud is "just someone else's servers".
Serverless is "someone else maintains the infrastructure." It's one step further in the same direction.
If the cloud is akin to renting a flat instead of owning it, then serverless is like living in hotels.
Objects are just a collection of attributes. Table rows are just a collection of attributes. There are some things that objects can do that the relational model can't but if you are designing a database schema why would you care about those things? There is no rule that you have to use every feature of a programming language everywhere. So just use the least common denominator of features and you're good to go.
Complaining about ORIM is like complaining about the universal serial impedance mismatch. You can't plug in USB 3 (OOP) in USB 1.1 (relational) but you can plug in USB 1.1 into USB 3.0 and since we know the old model is still good enough for a lot of use cases (think keyboard and mouse) we still use it even though we have a completely different standard installed into our computers.
What this means in practice is that your Domain/Entity classes look exactly like your database schema, not the other way. You still have to write queries but your ORM makes it ten times easier by offering very convenient query builders that let you build dynamic queries without string concatenation.
It's that the data is represented with sets, and there's an algebra over those that provides (strong) guarantees and principled way of composing operations.
Yes, you use objects as mere key-value pairs and provide a bizarre semantics for the relational algebra over sets of these objects -- but! -- this isn't object orientation.
Eg., in OO objects compose -- in the relational algebra rows dont "compose", eg., even having person.address.street breaks the semantics of 'SELECT'
The interpretation of p.a.s has to be as a subset on a product of relations (ie., tables P and A filtered on a join of p.id to a.id, etc...)
This is one of the key impedance mismatches in OO<->Relations -- composition =/= join. Hence awkward and and ugly workarounds in all ORMs.
On the other hand, the purists* who rant about nulls, I think have missed something as well.
But on the whole, I would like to see something related to SQL, that tries to improve it without grafting things on, that has an overall vision and an approach of simplifying and making it more coherent. Not being committed to the syntax, and wanting to close the loop more between manipulating data and manipulating the language itself.
I don't know, if you say SQL is a functional programming language, ok, and you take some common functional language that has nice syntax and make it consistent with the relational model, what do you get?
*Of which my impression was formed by (IIRC):
"Relational Database Writings, 1989-1991 by C. J. Date"
(which Amazon seems to want over $1,000 for right now, in paperback)