Hacker News new | past | comments | ask | show | jobs | submit login

> how badly SQL parrots relational algebra

I’m continually amazed that we’re STILL primarily interacting with databases, programmatically, by passing SQL statements as strings into them, rather than, say, passing in a list of column names as an array, and a list of conditions as tuples, etc. ORM’s try to bridge this gap, but they do a pretty bad job of it; they end up oversimplifying so much that the result is unreasonably inefficient.






The difficulty with the approach you propose, and by contrast one of the major strengths of SQL, is composition. Passing a list of column names and a list of conditions lets you express precisely two concepts: Filtering and Projection. You could also add more: A set of tables to join together, an "aggregation" version of the same operation, etc... Going down this path, however, leads to a monolithic function that becomes progressively harder to generalize.

What relational algebra (and by extension SQL) gets "right" is that each of these operations (Projection, Filtering, Join, Aggregation, Union) are composable: They take 1 or 2 collections as input and produce a collection as output. Moreover, each operation has simple and well-defined semantics from which you can build much more complex logic.

That's not to say that Relational Algebra can't be built in to an imperative language. Scala (and by extension Spark) collections are a great example of composable operators at work. Ruby's array methods, Python comprehension syntax, and Pandas/NumPy are similar examples of simple, composable primitives that combine to be much more powerful data transformations.

Apart from RA-based language primitives, there's also compiler support that allows you to use SQL directly, but avoid passing strings around at runtime. .NET's LINQ is a great example. I'll also pitch one of my own projects, DBToaster (https://dbtoaster.github.io/), which compiles view maintenance queries down to a C++ or Scala class.

In short, I agree that passing strings around leaves performance on the floor and leads to escaping and code injection nightmares. But SQL is the culmination of literally decades of use-based design, and any effort to replace it needs to take care to understand what it does well and why (like the efforts I reference above)


Amazed until you realize the database must be able to accept SQL statements from many many different languages using many types of network drivers. Rather than implementing this (correctly) everywhere, centralizing the "smarts" in the database so that the languages and drivers just pass a string becomes a much less amazing design decision.

I'm sorry but ROFL. ORMs trade programmer ease for too much query performance imo. In all my experience it makes a lot more sense to get closer to the data with SQL than to use something like an ORM in the same way that C is faster than Python, SQL is closer to the metal like C is, and an ORM is more like Python -- quicker to prototype in but not fast for this use-case.

There's tooling that bridges the gap. An example I've been using recently: https://github.com/andywer/postguard I understand Java has a lot of these as well.

This is one of the concerns where embedding a language in another is sort of necessary but is a bit too much to have the compiler handle it by default unless DSL's are already a first class construct in the language.


Your upstream tooling is going to have to compile something into that data structure you propose, and then you're going to want to do ad hoc queries easily as a human, and end up inventing something worse.

Side note with postgres you can "prepare" a statement, compile it once, give it a name, and then execute that query without passing the full query every time.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: