
The difficulty with the approach you propose, and by contrast one of the major strengths of SQL, is composition. Passing a list of column names and a list of conditions lets you express precisely two concepts: Filtering and Projection. You could add more: a set of tables to join together, an "aggregation" version of the same operation, and so on. Going down this path, however, leads to a monolithic function that becomes progressively harder to generalize.
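To make that concrete, here is a minimal sketch of the kind of function I mean. The name `query` and its parameters are my own illustration, not anything from your post, and rows are just assumed to be plain dicts.

```python
def query(rows, columns, conditions):
    """Hypothetical all-in-one query function: filter, then project.

    rows:       list of dicts, one per record
    columns:    names of the columns to keep (Projection)
    conditions: predicates that must all hold for a row (Filtering)
    """
    kept = [row for row in rows if all(cond(row) for cond in conditions)]
    return [{c: row[c] for c in columns} for row in kept]


people = [
    {"name": "ada", "age": 36},
    {"name": "bob", "age": 12},
]

adult_names = query(people, ["name"], [lambda r: r["age"] >= 18])
```

This covers filter-then-project and nothing else. Supporting a join, a group-by, or a filter applied after an aggregate means another parameter (and another fixed position in the evaluation order) each time.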

What relational algebra (and by extension SQL) gets "right" is that each of these operations (Projection, Filtering, Join, Aggregation, Union) is composable: each takes one or two collections as input and produces a collection as output. Moreover, each operation has simple and well-defined semantics from which you can build much more complex logic.
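As a rough sketch of what "collection in, collection out" buys you, here are four of those operators written as small Python functions over lists of dicts. The function names, the row representation, and the sample data are all assumptions of mine for illustration; a real engine would also do far more in terms of optimization.

```python
from collections import defaultdict


def project(rows, columns):
    """Projection: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]


def select(rows, predicate):
    """Filtering: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Equi-join on a shared column name, using a simple hash index."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, key, column, fn):
    """Aggregation: group by `key`, then reduce `column` with `fn`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return [{key: k, column: fn(values)} for k, values in groups.items()]


# Made-up sample data.
users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
orders = [
    {"uid": 1, "total": 30},
    {"uid": 1, "total": 12},
    {"uid": 2, "total": 5},
]

# Join, then aggregate, then filter on the aggregate, then project.
big_spenders = project(
    select(
        aggregate(join(users, orders, "uid"), "name", "total", sum),
        lambda row: row["total"] > 10,
    ),
    ["name"],
)
```

Because every operator returns the same kind of thing it accepts, filtering on an aggregate (SQL's HAVING) needs no new operator: it is just `select` applied to the output of `aggregate`.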

That's not to say that Relational Algebra can't be built into an imperative language. Scala (and by extension Spark) collections are a great example of composable operators at work. Ruby's array methods, Python comprehension syntax, and Pandas/NumPy are similar examples of simple, composable primitives that combine into much more powerful data transformations.

Apart from RA-based language primitives, there's also compiler support that allows you to use SQL directly, but avoid passing strings around at runtime. .NET's LINQ is a great example. I'll also pitch one of my own projects, DBToaster (https://dbtoaster.github.io/), which compiles view maintenance queries down to a C++ or Scala class.

In short, I agree that passing strings around leaves performance on the floor and leads to escaping and code injection nightmares. But SQL is the culmination of literally decades of use-based design, and any effort to replace it needs to take care to understand what it does well and why (like the efforts I reference above).





