Hacker News new | past | comments | ask | show | jobs | submit login

I's nice that the article mentions Codd and his relational model of data but what it doesn't mention is how badly SQL parrots relational algebra. The language is inspired by the idea ("based on a real story"(c)), yes, but it takes a really clean and sound model and makes an unbelievable mess out of it.

SQL is just an ugly historical accident. Unfortunately, this how it often works...

NoSQL are a different story, of course. BTW, I believe that they predate Codd's work. There were many examples of non-relational DBs in the 70s.

> SQL is just an ugly historical accident.

Maybe, but as Bjarne Stroustrup said: "There are only two kinds of languages: the ones people complain about and the ones nobody uses." SQL seems like that: pragmatic, not perfect, but everyone uses it.

Bjarne invented C++, I don't think parroting Bjarne contributes anything to the discussion.

You don't think quoting a person responsible for creating one of the most influential programming languages contributes to a discussion on programming?

I'd call C with classes a very costly mistake, that's probably set back computing by decades. But I can't deny its influence, although I would respectfully disagree that it's a good thing.

You're talking about C, right? :-)

It goes shows how power the relational model is when a language as bad as SQL can still succeed enormously :)

I'd love to educate myself more on how SQL mangles rel. alg. and whether there's another purer implementation. Any links?

Rel is a DBMS based on Codd and Date's Tutorial D. https://reldb.org/c/

For a discussion on how SQL speficially falls short of the relational ideal, see this link: http://www.nocoug.org/Journal/NoCOUG_Journal_201308.pdf

Pascal and Date have been making these arguments forever: http://www.dbdebunk.com .

Yes, there were various approaches to non-relational data stores but they were not so flexible in terms of "schema", which I believe is the main strength of NoSQL.

A possible exception could be MUMPS https://en.wikipedia.org/wiki/MUMPS but I have no direct experience with this (while I used something akin to https://en.wikipedia.org/wiki/Hierarchical_database_model at the start of my career).

Pick[1] subscribes to a similar philosophy, merging the OS into a database.

[1] https://en.wikipedia.org/wiki/Pick_operating_system

Wow. We use this (Universe "flavor") at work, can't believe I'm seeing it on HN.

> how badly SQL parrots relational algebra

I’m continually amazed that we’re STILL primarily interacting with databases, programmatically, by passing SQL statements as strings into them, rather than, say, passing in a list of column names as an array, and a list of conditions as tuples, etc. ORM’s try to bridge this gap, but they do a pretty bad job of it; they end up oversimplifying so much that the result is unreasonably inefficient.

The difficulty with the approach you propose, and by contrast one of the major strengths of SQL, is composition. Passing a list of column names and a list of conditions lets you express precisely two concepts: Filtering and Projection. You could also add more: A set of tables to join together, an "aggregation" version of the same operation, etc... Going down this path, however, leads to a monolithic function that becomes progressively harder to generalize.

What relational algebra (and by extension SQL) gets "right" is that each of these operations (Projection, Filtering, Join, Aggregation, Union) are composable: They take 1 or 2 collections as input and produce a collection as output. Moreover, each operation has simple and well-defined semantics from which you can build much more complex logic.

That's not to say that Relational Algebra can't be built in to an imperative language. Scala (and by extension Spark) collections are a great example of composable operators at work. Ruby's array methods, Python comprehension syntax, and Pandas/NumPy are similar examples of simple, composable primitives that combine to be much more powerful data transformations.

Apart from RA-based language primitives, there's also compiler support that allows you to use SQL directly, but avoid passing strings around at runtime. .NET's LINQ is a great example. I'll also pitch one of my own projects, DBToaster (https://dbtoaster.github.io/), which compiles view maintenance queries down to a C++ or Scala class.

In short, I agree that passing strings around leaves performance on the floor and leads to escaping and code injection nightmares. But SQL is the culmination of literally decades of use-based design, and any effort to replace it needs to take care to understand what it does well and why (like the efforts I reference above)

Amazed until you realize the database must be able to accept SQL statements from many many different languages using many types of network drivers. Rather than implementing this (correctly) everywhere, centralizing the "smarts" in the database so that the languages and drivers just pass a string becomes a much less amazing design decision.

I'm sorry but ROFL. ORMs trade programmer ease for too much query performance imo. In all my experience it makes a lot more sense to get closer to the data with SQL than to use something like an ORM in the same way that C is faster than Python, SQL is closer to the metal like C is, and an ORM is more like Python -- quicker to prototype in but not fast for this use-case.

There's tooling that bridges the gap. An example I've been using recently: https://github.com/andywer/postguard I understand Java has a lot of these as well.

This is one of the concerns where embedding a language in another is sort of necessary but is a bit too much to have the compiler handle it by default unless DSL's are already a first class construct in the language.

Your upstream tooling is going to have to compile something into that data structure you propose, and then you're going to want to do ad hoc queries easily as a human, and end up inventing something worse.

Side note with postgres you can "prepare" a statement, compile it once, give it a name, and then execute that query without passing the full query every time.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact