> … The language (SQL) is not very composable. This is a fact that most SQL users are not aware of. The relational algebra that SQL is based on is absolutely composable but SQL is not due to the inherent limitation of the language (as it was designed to be natural language-like). When you write "select x from a where z", you are actually building something along the lines of "from a" => "where z" => "select x" in the algebra and you can actually compose each portion separately. If you are familiar with dplyr, Spark or pandas you would get this instantly.
Hmmm. Not a big lover of SQL, but this is a bit imprecise. While the unit of composability is slightly smaller for from=>where=>select (expression) vs select+from+where (subquery), in practice they both encode the same fundamental compositional principles, based on relational algebra. Any from=>where=>select query can be translated into a select+where+from query almost 1:1, if only via:
from t ::= select * from t
q => where p ::= select * from q where p
q => select xs ::= select xs from q
Sometimes the select+where+from ends up more verbose, sometimes there is more brain twisting to grok a given select+where+from query, but that's not a composition-limiting factor. Granted, the readability of SQL is sometimes lacking, but it is fully capable of composing recursive relational algebra queries.
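To make that translation concrete, here is a minimal sketch (assuming a hypothetical table a with an attribute x and a boolean attribute z, as in the parent's "select x from a where z"). The pipeline "from a" => "where z" => "select x" becomes one nesting level per step, and each level is itself a valid relation:

    select x
    from (select * from (select * from a) s1 where z) s2;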
You're arguing about capability, whereas the post is arguing about the grammar itself. What you're describing is how to build a separate domain specific language that compiles to SQL (which there are quite a few of; e.g. C#'s Linq).
Quite a few query languages are equally capable (including, surprisingly, quite a few so-called graph languages, provided the SQL dialect provides a transitive closure), but SQL as a language has some undesirable properties (most of which are trade-offs for the fact that basic SQL is very easy to parse).
This argues that composability is the most important consideration for a domain-specific language. But I think, as proved by SQL taking over, the UX is more important. Any programming language must consider the programmer and their humanity and natural way of thinking and reasoning to win the most adoption and mindshare. Usability does matter, as the user of any programming language is a human being.
Was QUEL also more usable and natural to people? As well as being more amenable to composition?
> The by clause causes the function to return a set of results, as opposed to a single result. One result is returned for each grouping specified by the by clause. Think of by as meaning "for each."
I assume it evaluates like retrieving a set and a scalar:
a     | b
------+-------
set 1 | scalar
set 2 | scalar
Assuming we key a and b by y.d and generalizing b to sets, we are reaching the limits of the relational model: there is no good way to represent multiple sets associated with a given key; we need independent tables to do so. Neither cross product nor null padding is a good way to represent the schema {y, a[], b[]}: a key with two a-values and three b-values either explodes into 2 × 3 = 6 joined rows with fabricated pairings, or needs NULL padding that wrecks counts and other aggregates.
# create table y (i int, d int);
# insert into y values (1, 1), (1, 2), (2, 1);
# select *, (select count(*) from (select count(y.i), y.d from y group by y.d) _) as foo
from (select count(y.i), y.d from y group by y.d) _ ;
 count | d | foo
-------+---+-----
     1 | 2 |   2
     2 | 1 |   2
(2 rows)
By the way, a great example of how unwieldy SQL is. A bit better with a CTE:
# with bar as (select count(y.i), y.d from y group by y.d)
select *, (select count(*) from bar) as foo from bar;
Yes, but it's extremely clunky. The number of times I've had to write out a whole chain of CTEs just because I wanted to apply one window function to the output of another.
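For example, a hypothetical sketch using the y table from the example above: window functions can't be nested directly, so applying one to the output of another forces a CTE (or subquery) per step:

    with w1 as (select i, d, row_number() over (order by d, i) as rn from y),
         w2 as (select *, sum(rn) over (order by d) as running from w1)
    select * from w2;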
In which context, "compostable" is a fantastic Freudian typo.
So () is "where", [] is "project" (choose or create columns) and you can use * for join and + for union. The result is a table with a column named comp.
They should implement something using straight functions, extremely spartan with no special syntax at all, and let everyone build their own favoured DSL over the top.
One of my major complaints about SQL is the syntax is so finicky that it is really hard to replace it with a [something -> sql] layer, because the something layer can't generate all the silly syntactic forms that SQL uses.
Eg, personal favourite, it is easy to have a dsl that translates
select(y = fn(x)) -> select fn(x) as y
that then breaks down because it can't construct
??? -> select extract(month from x) as y
and that is the only syntax the SQL database decided to understand. There are too many cases like that that need special handling, especially once SQL dialect-specific stuff comes into play.
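To illustrate, here are a few of the standard syntactic forms that break a naive fn(args) generator. This sketch assumes a hypothetical table t with a timestamp column x and a text column s; all of these are legal in e.g. PostgreSQL:

    select extract(month from x)      as m,   -- keyword "from" inside a call
           substring(s from 2 for 3)  as ss,  -- "from ... for ..."
           trim(leading ' ' from s)   as tr,  -- "leading ... from ..."
           position('a' in s)         as p,   -- "in" instead of a comma
           cast(x as date)            as dt   -- "as" inside a call
    from t;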
Such a calculated-column expression is (strictly speaking) not part of the relational algebra, because it does not use set operations (like join or union). It was added to the relational model because we can hardly process data without such expressions.
A better way to formally describe such calculated columns is to introduce functions and treat them as first class elements of the data model. In particular, function operations are better than set operations in these cases [1]:
- Calculating data using calculated columns. We describe a calculated column as a function without generating new relations (as opposed to select-from)
- Aggregating data. We can add an aggregate column without generating new relations (as opposed to groupby)
- Linking data. We can add derived link columns without generating new relations (as opposed to join)
This function-based approach to data modeling and data processing was implemented in Prosto [2] which demonstrates how many typical relational tasks can be solved using functions.
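For contrast, the closest thing stock SQL offers to "an aggregate column without a new relation" is a window aggregate, which annotates rows in place rather than collapsing them; a sketch using the y(i, d) table from elsewhere in the thread:

    -- group by produces a new, smaller relation:
    select d, count(i) from y group by d;

    -- a window aggregate keeps y's shape and merely adds a column:
    select i, d, count(i) over (partition by d) as cnt from y;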
One major strength of SQL is its readability - it reads so much like English that a non-technical stakeholder could conceivably understand queries. Do you not find that this is a valuable thing that's lost with relational-algebra-esque syntax?
Not once in my 20 years of development has that readability actually mattered, though. SQL very quickly becomes too complex for people who don't intimately understand the language to make any sense of it. Show a layman an INNER JOIN and see if they can make any sense of what's happening... they'll just do what they do in real life: ask an engineer.
I'm in full agreement. Trivial SQL is somewhat readable, but anything non-trivial is not, and the ordering of SQL clauses (there is essentially no relationship between the syntactic and semantic orderings) makes queries hard to understand.
QUEL is better there, but still not great unless its execution semantics differ from SQL's (i.e., semantically, does it filter before or after selection?).
Relational algebra notation is so much more concise and readable. We’ve mangled so many very simple concepts because someone decided that math is offensive or something, so any math notation is automatically “not fit for software.”
Computing is inherently tied to math. There’s no getting away from that.
There is a book, Applied Mathematics for Database Professionals, that uses a terse mathematical notation to describe queries, and I remember actually kind of liking it. I especially liked that they used it to describe the state of the database and the transforms.
In my first job I used Ingres/Quel. Probably because of that, I still find SQL hideous. Quel was a lot more orthogonal and clean, but by now SQL has so many more features that they're not really comparable. The first version of Ingres ran on a PDP-11/70, and the different components (parser, query optimizer, query executor, etc.) ran in separate processes connected via pipes (this was pre-socket Berkeley Unix) because each process could only be 64K 16-bit words. It was hideously slow--ran a lot faster once it was ported to the VAX and everything could run in one process. INGRES originally stood for something like Interactive Graphics Retrieval System, IIRC because the original funding agency wanted a graphics database. Stonebraker wanted to build a relational DBMS so he just went ahead and did it, but gave it a grant-compliant name and wrote some bullshit about graphics to make the funding agency happy.
I agree with some of the comments about the composability of SQL. I've been doing SQL for more than 14 years. Most of it was spent working on ETL projects for the finance and utilities industries. Even today 80% of the code I write in pySpark is just plain SQL. It's been my bread and butter. However, I spend a lot of time trying to think through a solution in SQL. It's not an easy language to use when it comes to implementing complex transformations. I could write the same logic in Python in a lot less time. I use SQL mostly because it's easily portable across systems and most analysts, and to some extent tech managers, understand it. I work primarily on proof-of-concept data products and it does the job for that. Then a real developer takes over and implements it in .Net.
Minor point, and others have brought up more details around composability, but I think it's an absolute mistake to reference properties on an object before that object is declared. I.e., if SQL had the FROM clause before the SELECT clause, autocomplete support could be much more intelligent about helping with SELECT columns.
Same problem with declaring imports in javascript. Python got it correct, where I write e.g. 'from somemodule import ...', so by the time I get to the import the parser has enough info to help with autocomplete. Javascript's 'import { Something } from "foo"' means the parser can't help me autocomplete Something.
I find imports in Python completely backwards conceptually compared to JS. I hadn’t thought about the autocomplete issue though. In practice, I know what I want to import, I guess.
I think it makes more sense the way Python does it. You get all the dependencies more or less in a column on the left, instead of jumbled at different widths on the right side.
Edit: In case you actually do want to play with it, Ingres is now Actian IngresX. Though I'd recommend thinking about what could have been instead of actually spending any time fighting with and configuring Ingres.
The latest iteration is actually called 'ActianX', and it's a continuation of Ingres plus support for column-based tables from the 'Vector' product acquired by Actian.
QUEL is still there, but its functionality has been frozen for years and, in terms of bells and whistles, is way behind modern SQL.
If you do want to try it, I had no trouble installing it on my home Ubuntu system a couple of years ago. Ingres is pretty easy to use in some ways. If you want to make a database, you just type "createdb mydatabase" at the command line.
Uh, it's nice to see the name Actian come out for once.
My previous employer used Versant by Actian (now called Actian NoSQL) heavily.
It's much more of a NoSQL database: it's a real object oriented database. You don't store tuples, you store objects. You don't make queries with selection and projection, you make a cut of a graph of objects.
It's insanely fast, multithreaded, has very good tooling, scales vertically very well, can do online schema evolution (class definition evolution, really).
Sadly it's almost impossible to scale horizontally (I'd be glad to be proven wrong).
It's basically what the industry needs to avoid the object-relational mismatch: an object oriented database.
Long ago, I worked for a company that had to convert its entire accounting system from QUEL to SQL. Fortunately, I was able to write a parser to find the queries and rewrite them, at least for whatever subset of the language they used. It's been a while, but I think there was an issue with multi-query transactions, so the program warned that you'd have to convert and/or verify some parts yourself, but fortunately there weren't too many of those.
As I said the last time this came around: The end of this isn't quite right. The Postgres project started in 1986. I don't recall what language it used, QUEL perhaps, but it wasn't SQL. SQL support was added between 1994 and 1996, and that's when PostgreSQL was born.
Postgres used PostQUEL. SQL was added to Postgres by one of Stonebraker's graduate students (Andrew Yu) around '95. By '95 Stonebraker's lab was already working on a new distributed Postgres called Mariposa.
I don't see the contradiction you mention. Stonebraker returned to Berkeley in 1985. The article doesn't say the Postgres project started in 1985. It says Stonebraker started a post-Ingres project then. The query language at that time might have been either QUEL or POSTQUEL [1]
But the ending as it is written seems correct to me.
I didn't say that there was a contradiction. The paper is about QUEL vs. SQL. Given that Postgres/PostgreSQL is introduced, I would think that the initial use of QUEL (PostQUEL), and how and when and why it transitioned to SQL would be highly relevant. But the end of the paper is needlessly fuzzy on this topic.
"The world has since standardised on SQL, and the dreams of an alternate history exists only in the heads of those who had a hand in the early database wars. It was simply a quirk of history that System R was built within IBM, the single most powerful company in the computer industry at the time; it was a quirk that the engineers who built System R came up with a fiddly language interface as an afterthought, and it was a quirk that IBM then took that language and pushed it to become a standard … one that has lasted till today.
Of course, there was a silver lining to the whole saga. Stonebraker had forked the Ingres codebase in 1982 to create his company. Defeated by the bruising database wars of the 80s, he returned to Berkeley in 1985, and started a post-Ingres database project. Naturally, he named that database post-gres — as in, after Ingres.
And thus PostgreSQL was born."
I was trying to figure out what you meant. I think I have an inkling now -- let me know if this is correct. Your quibble is that the implication here is that since SQL won, Stonebraker jumped on the SQL bandwagon and created a SQL database, when in fact he didn't -- he merely created Postgres, which ran on QUEL and didn't have SQL until much later.
I think the author made a stylistic choice to omit that detail to drive home a point, but even so nothing was said that was non-factual.
The omission, combined with the context, leads the reader to make an incorrect conclusion.
... he returned to Berkeley in 1985, and started a post-Ingres database project.
That would be Postgres.
Naturally, he named that database post-gres — as in, after Ingres.
He says it's Postgres.
And thus PostgreSQL was born.
"thus" implies that the preceding discussion, about Postgres, describes the birth of PostgreSQL. Which it doesn't. At best, this is confusing, suggesting that Postgres = PostgreSQL. Postgres became PostgreSQL ten years later, once SQL was added, replacing PostQUEL. The elided details allow for different interpretations, including the wrong one, that SQL was there from the beginning. Also, the tone of the text you quoted suggests that Stonebraker learned his lesson, and just went to SQL for the Postgres project, which he definitely did not do.
I see what you're saying about that implication. I think the effects were indirect; SQL winning meant that Postgres, even though it started out supporting only POSTQUEL, had to evolve to eventually supporting SQL -- though you are correct that the cause-effect was not as direct and as inexorable as might have been implied in the prose.
I personally think eliding details was artistic license to make the prose flow better without bringing in ancillary details, but that's just me.
Stonebraker did eventually change his mind about SQL however -- if you've watched any of his recent talks he's of the opinion that most query languages will eventually and inexorably converge to some variant of SQL. (he was wrong about Mongo inventing a SQL-like query language, but that's what his philosophical commitments look like these days)
"Mike Stonebraker of Ingres didn’t even bother to show up at the committee meeting to make the (quite strong) case for adopting QUEL because he was ideologically opposed to setting technology standards. It was the behavior of an intellectually arrogant academic rather than a prudent businessman protecting the interests of his company."
Some might call the behavior principled, rather than arrogant.
The more we go forward with technology and everything, the more I understand: Stallman was 110% damn right the whole time, and was lightyears ahead in seeing what other people couldn't see.
Apparently it is not easy to create general purpose datalog engines that scale.
"In this paper, we started with the observation that Datalog engines do not translate across domains. We experimentally evaluated the advantages and disadvantages of existing techniques, and compared them with our own baseline, a general-purpose, parallel, in-memory Datalog solver (RecStep) built upon a rdbms.
"We presented the necessary optimizations and guidelines to achieve efficiency, and demonstrated that RecStep is scalable, applicable to a range of application domains, and is competitive with highly op- timized and specialized Datalog solvers."
Don't despair. Datalog is alive and well. Yes, it does compose beautifully. However, with composition solved, you'll discover that a naive implementation of relational algebra will suffer from spurious cross products.
This is pretty interesting stuff, thanks for sharing! Does temporal information get any special treatment in the economic graph at the moment? Are temporal queries of any interest?
I ask because I work on Crux [0] which at its core is a point-in-time bitemporal Datalog engine, and I see a lot of similarities (schemaless core, Worst-Case Optimal Join etc.)
This is a fundamental difference that I have with almost all of humanity.
People do not know the difference between popularity and merit. They don't realize that the reason we do things is because that is how we do things. And they put a lot of effort into rationalizing the way we do things, without realizing the subconscious psychological (rather than rational) basis for that.
Most people fundamentally are generally unable to question assumptions about technology or how the world works.
This is one reason why, even though I am rooting for the human species, I am doubtful we will be able to stay relevant for long as autonomous general machine intelligence is built and deployed.
Even with extensive augmentation, it's obvious that there are just severe limitations to the human mind.
The nice thing is that we have the opportunity to design successors that will not be limited in so many ways.
Just for comparison, a functional approach is a major alternative to relational and set-oriented models and query languages. The difference is that functions and operations with functions are first class elements of the model.
One version of it is implemented in this project:
Composition might be desirable, but in the 1980s and early 90s it didn't really matter, because the databases of the day couldn't execute the multi-table joins where composition is useful with enough speed to be usable.
DBAs of the day would spend a long time optimising physical storage of tables and building aggregation tables to make up for this performance deficit.
Wow thanks. I've never heard it expressed as a Common Column Expression. It makes sense to include it in the FROM statement because that's usually parsed and executed first.
My bigger concern regarding NULLs is that it's a ternary logic shoved into a binary logic system, and it all invisibly becomes nonsense when your dataset has NULLs in it and you don't explicitly address it:
WHERE col1 > col2
is wrong, and it'll break in terrible ways
and in the face of negation + NULLs, everything falls apart [0], giving you both false positives and false negatives in your answer set, and everything will look perfectly fine.
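A minimal demonstration of that silent failure (hypothetical table, PostgreSQL syntax):

    create table nulls_demo (col1 int, col2 int);
    insert into nulls_demo values (2, 1), (1, 2), (1, null);

    select * from nulls_demo where col1 > col2;        -- returns only (2, 1)
    select * from nulls_demo where not (col1 > col2);  -- returns only (1, 2)

    -- the (1, NULL) row appears in neither result: the comparison
    -- evaluates to UNKNOWN, and WHERE discards anything not TRUE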
Every column being made NOT NULL is the only sane solution.
This is basically the correct take, but I would say that NULL is a real and useful concept which databases should have.
That implies that comparison and boolean operators should be ternary, which would be ugly and confusing: but an ugly and confusing that reflects reality.
This would give us three "greater than" operators:
WHERE col1 > col2
WHERE col1 ?> col2
WHERE col1 >? col2
The first is always false for NULL, the second always picks the NULL column, the third never picks the NULL column. It doesn't seem coherent to order two NULLs, so that would always be false. I'm not attached to the syntax, which is intended to be illustrative.
The most important of such operators would be
WHERE col1 ?= col2
which coerces two NULLs to be equivalent. It says "yes NULLs aren't comparable, but for this query, I want to treat them as equal". Because that is only usually true, not invariably so.
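For what it's worth, standard SQL (and PostgreSQL) already has a spelling for that last operator, albeit a verbose one:

    WHERE col1 IS NOT DISTINCT FROM col2  -- like ?=, treats two NULLs as equal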
You're not wrong... NULL by itself is sort of crazy in a binary system, but I see that as more a problem with binary representing reality rather than the other way around. It's a handy abstraction in that sense but a nightmare for systems level programming.
The concept is useful; the implementation is not. You really just want better ergonomics for enums, and to encode the many things NULL is used to mean that way.
The issue is that not only does SQL syntax force an artificial order on these clauses, but that these clauses cannot be decomposed to be used elsewhere. I cannot reuse a WHERECLAUSE or a SELECTCLAUSE in another expression.
Some detail here. What goes on in relational algebra is that the FROMCLAUSE is encoded into an axiomatic primitive called a RELATION and you get stuff like this:
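Presumably something along these lines, in the notation used earlier in the thread, where every step both consumes and produces a RELATION:

    RELATION(a) => where z => RELATION => select x => RELATION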
Basically every operation in relational algebra produces a primitive of type RELATION which allows for all operations to be composed like unix pipes.
My example in the previous post has a flaw where it's not clear what:
SELECTCLAUSE * WHERECLAUSE = ????
will output because there's no meaning to a SQLEXPRESSION without a FROMCLAUSE. But hopefully it illustrates the point. I'm putting this here for anyone who's nitpicky about the details. The relational algebra syntax is much more elegant.
Do you mean “incorrect” in the sense that comparing a customer_id to an order_id is semantic nonsense?
I agree, and I do wish SQL had stronger typing so the parser could warn you before your query silently runs off the rails. For example, in Oracle, I believe the following is legal, but I wish it wasn’t:
CTEs (common table expressions) and views definitely do help with this, though they are new-ish where they exist and often have optimization issues. But being able to use them extensively in a newer database where they work well helps this quite a bit.
Tables are composable but the expression itself cannot be decomposed. I cannot reuse a where clause somewhere else; that is the fundamental problem the article addresses.
Can anyone find examples of what QUEL looked like?
If we had a more composable query language in use instead of SQL, I wonder if that would have affected the course of ORMs, which arguably end up being as much about composable models of queries as they do about actual object mapping.
I kept searching the article, thinking I must have somehow missed the examples - nope! How very odd to write an article about how much better QUEL was than SQL, and not have a single example of either!
Quote: "Of course, there was a silver lining to the whole saga. Stonebraker had forked the Ingres codebase in 1982 to create his company. Defeated by the bruising database wars of the 80s, he returned to Berkeley in 1985, and started a post-Ingres database project. Naturally, he named that database post-gres — as in, after Ingres.
And thus PostgreSQL was born."
And 35 years later PostgreSQL is kicking Oracle's butt at every corner.
It's good, for some purposes, that English can be written conventionally by ignoring the accent in words such as résumé. But there are plenty of contexts, and I would say most of them, where this is going to bite you.
>If the world worked differently, we wouldn’t still be writing on QWERTY keyboards, or speaking English; technically superior alternatives like Dvorak and Esperanto would have taken over.
Bad metaphor: There is no evidence for Dvorak's technical superiority to QWERTY, nor is there any for Esperanto over other languages.
I think the "technically superior" bit wasn't really the right choice of words.
Esperanto isn't intended to be superior. Its value is in being equally foreign yet approachable for all the salient parties, and therefore a conceivably acceptable neutral turf for everyone to share.
Ironically, the case for Dvorak keyboards is kind of the opposite: QWERTY was intentionally designed to avoid jams, which if anything biased the layout towards making typing more difficult.
It's not so much technical superiority as having a design objective that is more appropriate for the problem space.
One can similarly argue about whether "QUEL" is really technically superior to "SQL", but the design objective is (at least as perceived by the author) better aligned with the solution space.
I'm pretty sure the author was referring to the lore, regardless of what the truth might be. Otherwise they'd have had to present a much more detailed explanation.
> Esperanto isn't intended to be superior. Its value is in being equally foreign yet approachable for all the salient parties, and therefore a conceivably acceptable neutral turf for everyone to share.
This was not even attempted; Esperanto is a Romance language. Unless you think the only salient parties are Spain, France, Portugal, Italy, and Latin America, this "value" does not exist and was not a goal.
Esperanto is not a Romance language. It incorporates elements from most European language families (Balto-Slavic, Germanic, Romance), and the grammar is most particularly inspired by Slavic languages.
And it was pretty much a goal of its creator to be both familiar and foreign for different european language families, as he had experienced division in Poland between speakers of those different families and wanted to have something more universal to gather them all together.
Yes, in the design context of the creation of the language, that was not in scope, though there are arguments that its word-formation mechanism bears more resemblance to Chinese than to European languages.
The vocabulary is primarily from Romance languages, but the whole story is more complicated. From the Wikipedia page: "Esperanto's phonology, grammar, vocabulary, and semantics are based on the Indo-European languages spoken in Europe. The sound inventory is essentially Slavic, as is much of the semantics, whereas the vocabulary derives primarily from the Romance languages, with a lesser contribution from Germanic languages and minor contributions from Slavic languages and Greek. Pragmatics and other aspects of the language not specified by Zamenhof's original documents were influenced by the native languages of early authors, primarily Russian, Polish, German, and French. Paul Wexler proposes that Esperanto is relexified Yiddish, which he claims is in turn a relexified Slavic language,[72] though this model is not accepted by mainstream academics.[73]
Esperanto has been described as "a language lexically predominantly Romanic, morphologically intensively agglutinative, and to a certain degree isolating in character".[74] Typologically, Esperanto has prepositions and a pragmatic word order that by default is subject–verb–object. Adjectives can be freely placed before or after the nouns they modify, though placing them before the noun is more common. New words are formed through extensive prefixing and suffixing."
Yes, it's based on European languages; it was designed at a time when removing barriers between neighboring countries in a culturally very fragmented continent was perceived as of higher practical value than creating a pan-human language. Intercontinental travel wasn't yet as commonplace as now, yet many European countries had different cultures living within the boundaries of the same states, and usually members of the majority culture were privileged as a result; there was hope (espero in Esperanto) that people could forget about the differences among them and see what they have in common.
Today we may be tempted to frame it as European chauvinism, but that's just because of our expanded horizons as a global society. People who believed in Esperanto in the early days would likely share the same feeling now.
> nobody uses them because you'll never be able to use anybody else's keyboard
I doubt that's the main reason, and it may be unwarranted as I say below. Personally, I thought it would take too long for me to be able to type as fast as in QWERTY. Fortunately, I was wrong about that too.
I learned the QGMLW optimized layout from Carpalx (linked above) about a year ago. I trained on https://keybr.com and then on https://typeracer.com for a few days and in less than a week I reached 70wpm. I stopped training shortly after and now I can usually reach 90-100wpm. Not particularly fast, but I'm not any faster in QWERTY, and I believe I could get faster if I trained more.
Even though I almost never type in QWERTY anymore, when I do I'm still just as fast as I used to be. I never learned to type in QWERTY "the right way", though, and I suppose this is actually what keeps me from confusing the two "modes". Whenever I try to keep my fingers on their "appropriate" keys (like a properly trained typist would), it's like my brain switches to "Carpalx mode" (since I did make an effort to use the right finger positioning to learn it). It's kinda like switching between thinking in my native language and English - I can think in both, but it doesn't "feel" the same, and I'm more likely to confuse the two where they overlap more. Pretty interesting, really.
Yeah, I did the same thing but I learned Colemak. I never learned to touch-type QWERTY; when I learned touch-typing I switched to Colemak. I feel like it made things easier, as I didn't have bad habits to fall back on.
I can still type QWERTY but I'm not very fast at it (I never was).
The article talks about how SQL lacks composability. I would like to know everyone's thoughts about this.
This is a huge issue with programming in general not exclusive to SQL. Everyone would like to build programs that are modular and reusable but programming paradigms have been traveling in directions that prevent this from happening. Many people turn to design patterns or microservices to try to deal with this organizational issue but they fail to see that the lower level programming paradigm itself is the precursor to the problem.
In SQL the problem occurs in the statement itself. The WHERE clause or the SELECT clause cannot be reused anywhere else. I can't modularize a where clause and put it in another SQL statement. I have to rewrite the entire clause to reuse it.
In OOP the same issue occurs. In OOP your class tends to contain methods that are not combinators, or in other words methods that modify a free variable. Because of this, like the SQL expression, objects cannot be decomposed either. I cannot reuse a setter in another class or anywhere else outside of the context of the free variable it modifies.
In both cases there comes a time in the future of an application where programmers realize that similar logic could be reused but structural problems are preventing the reuse from happening so they have to implement a hack to get around it.
The issue is that everyone is unaware of this trend at a low level. They are unaware that SQL lacks composability just like how they are unaware that OOP lacks composability as well. But they are aware of this issue at a higher level and they tend to call it "technical debt" or some high level design problem.
Most commenters above talk about minor syntactical issues and fail to address what is not only IMO the main issue, but the main issue that the article itself is addressing. Likely because they're all unaware of the true nature of the composability issue and just didn't completely understand what the article was saying.
Also note that when I talk about composition in OOP I am not talking about "object composition." These are completely different usages of the word.
I don't really feel like composability/modularity is all that important in SQL.
I want modularity in programs because they are large, and without proper abstraction, impossible to manage.
SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.
The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo
>The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo
The SQL data types consist of a few primitives like ints, strings, and chars, placed in higher-order data types that are tables. These types are easily isomorphic to data structures and primitive types in traditional programming languages. There is zero mismatch here; in fact the data structures in application programming languages tend to be much richer than SQL.
See ORMs for reference. ORMs have their own set of problems, but data mismatches are not among them. If the programming language enables the creation of sum types, like Rust or Haskell, then there can never be a mismatch, as these languages can produce virtually any type.
>SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.
For complex applications this is not true. In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app. The web app is supposed to serve as something that routes IO; the bulk of your code/logic and heavy lifting should be shifted to the database. Simple apps can avoid this, but in general complex apps cannot.
Case in point: do you put your where clause in the application program and download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The latter is the right answer.
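In code, the contrast looks like this (hypothetical orders table):

    -- filter in the application: ships the whole table over the wire
    select * from orders;

    -- filter in the query: the database returns only what's needed
    select * from orders where status = 'open';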
>I don't really feel like composability/modularity is all that important in SQL.
You're not completely wrong. The bigger issue is SQL optimization. Because databases form the backbone of computation for a standard web app, SQL provides a layer of indirection that makes optimization harder. For C++, optimization is built into the syntax itself; you make choices while coding to optimize things. For SQL, you open up the black box and look into the query planner to see what your high-level code is compiling to. SQL is a really bad interface for doing optimizations, but that's a topic for another day. The topic of this post is modularity, and he's not wrong... SQL is not a composable language, and there's no performance loss in making it more composable.
> in application programming languages tend to be much richer than SQL.
Hence why I say there is a data model mismatch.
E.g. I wouldn't say that assembly and Haskell have a compatible data model just because assembly is a sequence of bytes and Haskell is a superset of that.
Not that a data model is solely about the types involved.
> See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems.
ORMs are notorious for being a leaky abstraction. Largely because the object and relational data models don't entirely match.
> In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app.
When was the last time you wrote a 500-line SQL query? A thousand-line one? My point is that SQL queries are short enough that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.
> Case in point: do you put your where clause in the application program and download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The latter is the right answer.
This is a strawman.
> The bigger issue is SQL optimization. Because databases form the backbone of computation for a standard web app, SQL provides a layer of indirection that makes optimization harder
I disagree. At scale you have data with mixed cardinality. The indirection allows the db to choose the best algorithm given the size of the underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of the time it is a significant benefit. It's sort of like how sometimes compilers don't work properly and you need to hand-optimize, but in practice that is rare, and you wouldn't throw out the compiler because the other 95% of the time it's better and lower effort than if you had to always do it by hand.
>Hence why I say there is a data model mismatch.
There is no data mismatch if the data model is richer, since then it can cover the other. Unless you're saying SQL is not rich enough to cover the PL.
Well, you can implement a database that does that; it just wouldn't be table-based data storage anymore. PostgreSQL supports JSON and enums if you want them, but the minute you use these types, things become less predictable in terms of performance.
>ORMs are notorious for being a leaky abstraction. Largely because the object and relational data models don't entirely match.
Wrong. ORMs are notoriously leaky because the query translation is extra indirection. You need to compile to SQL, and the SQL needs to compile to a query plan. This is why ORMs are bad.
As for the data types, objects easily mimic types in SQL. You basically rarely ever encounter problems with incompatible type systems.
>When was the last time you wrote a 500-line SQL query? A thousand-line one? My point is that SQL queries are short enough that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.
When did you last write a 500-line function? When did you last write a 100-line function? You shouldn't be doing that if you are. Case in point: the total lines of SQL written in any source code usually well exceed 500, and that is a case for composability.
This is no different from regular source code. If you're writing 500-line functions in your source code, then you're not even taking advantage of modularity in programming languages outside of SQL, so of course for you it doesn't matter, because composition doesn't matter for you period.
>This is a strawman.
No, it's a statement written in the English language. A strawman is something that doesn't EXIST, because men are made out of flesh, not straw.
It's sort of similar to the non-existence of what was supposed to be written in response to my statement: an actual counterpoint or an argument. But then how can you write such a thing if it doesn't exist?
>I disagree. At scale you have data with mixed cardinality. The indirection allows the db to choose the best algorithm given the size of the underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of the time it is a significant benefit. It's sort of like how sometimes compilers don't work properly and you need to hand-optimize, but in practice that is rare, and you wouldn't throw out the compiler because the other 95% of the time it's better and lower effort than if you had to always do it by hand.
Sure, but when it comes time to hand-optimize, SQL is extremely bad, and you need to hand-optimize SQL all the time. Not necessarily most of the time, but enough times that it's a real problem. That is the point, and that is what I said. It's basically the biggest headache with SQL. If you've ever done analytics you'd know this is a huge problem.
It's better to have a language that allows explicit decorators that let the programmer choose optimization procedures when needed. Instead, in SQL, optimizations often come in the form of hacks. Case in point: SELECT * FROM A is worse than SELECT A.column1 FROM A. The latter optimization comes in the form of a language hack and not explicit syntax.
I'm not saying the alternative mentioned in the article would solve this problem, nor am I saying a solution exists... but if there's any big problem with SQL today, it's hand optimization for sure.
That's a hack. The where clause itself is not decomposable. But if you made every where clause a single function taking in multiple parameters, then yes, that function is composable, though you're taking extra steps to follow a non-traditional coding style.
Not even sure if stored procedures are part of the SQL standard.... These seem to me to be just specific syntax additions added on by specific databases.
I mean, it works, so why not? I could code all my SQL this way.
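For instance, in PostgreSQL the hack looks roughly like this (table and function names are made up for illustration):

    create function is_recent(ts timestamptz) returns boolean
        language sql as $$ select ts > now() - interval '30 days' $$;

    -- the "where clause" is now reusable across queries:
    select * from orders  where is_recent(created_at);
    select * from refunds where is_recent(created_at);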
Is this really a SQL feature or a relational feature? The essence of the relational model is that names are known up front: an attribute of one relation is not an attribute of another.
> Is this really a SQL feature or a relational feature?
SQL.
There is no reason you couldn't have a composable implementation of relational logic.
> The essence of the relational model is that names are known up front
That every relation consists of tuples, each member of which consists of a name, a type, and a value, is part of the model, sure. That doesn't impose restrictions on composability.
> an attribute of one relation is not an attribute of another.
An attribute of one relation may well be an attribute of another.
There's no reason you couldn't have a relational language that let you store and reuse clauses.
"attributes" are "generic" properties of relations.
So for example the union operator can take two different relations that have the same columns but different rows and create a new relation.
In this binary operation that takes two different relations, the attributes of one relation (the columns) are indeed the attributes of the other relation.
>Union on two relations R1 and R2 can only be computed if R1 and R2 are union compatible (These two relation should have same number of attributes and corresponding attributes in two relations have same domain)
It's the same thing for unary operations like select. A select expression can operate on different relations, provided that the relations have the relevant attributes.
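Concretely, the same select/where text applies to any relation that exposes the relevant attributes, which is exactly what makes the union work (a sketch reusing the hypothetical a(x, z) from upthread, plus a union-compatible b):

    select x from a where z
    union
    select x from b where z;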
Mixins are a good first step towards composability in OOP. (Also, doesn't JavaScript have rebindable methods? I'm not very familiar with the language, but I think it makes it pretty easy to bind a function to a new object as long as it contains the same members that the function refers to.)