Hacker News new | past | comments | ask | show | jobs | submit login

This came up in an application I had modeling college football. A Team is a member of a Conference and a Conference is made up of Teams. But during realignment a few years ago when teams were switching all over the place it became apparent that Teams and Conferences were truly independent of each other and that their association itself was a first class object which also needed to capture the Season/Year.

Not really rocket science but some times your model is telling you something if you’re willing to listen.

This is almost exactly the motivational example that Codd used when originally describing relational algebra. He described 5 different data organization schemes for a single problem and designed relational algebra to work with all of them. This ability for the same program logic to work with many different data storage layouts should make changes like you describe less painful to implement.

(see page 2)

E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377–387. DOI:https://doi.org/10.1145/362384.362685

PDF: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

My dream, when I'll have a few years of free time, is to design a language were the relational table is the primary data structure abstraction.

Mine too. My main gripe with object graphs is that often you need to make arbitrary decisions about what comes "first" -- should I represent my customers as a list of `(name, address)` records or a record `(names, addresses)` of lists? This is a silly decision to have to make, and an even sillier one to have to deal with when later you need to transpose the structure to more naturally fit some other task. Relational tables sidestep this issue.

Over the years I've found that these transposition problems are a huge drag on programs, and once you recognize them you start seeing them all over the place. They make simple and obvious relational tasks into dozens of lines of nested data structure traversal and manipulation.

The notion of transposition comes from the array programming world, where a multi-dimensional array/tensor can be seen as a tree structure, whose levels may be easily reordered by transposition. I sometimes use numpy arrays for general-purpose programming; it can be very powerful with a well-chosen representation. Unfortunately array concepts basically require the data to be rectangular -- `x[i]` has the same shape as `x[j]` -- which can be hard to adapt to general problems.

My dream then is a general-purpose data structure that takes the best of both the relational and array-programming worlds: the freedom and flatness of relational tables, enriched with tacit ideas like broadcasting that make array programming so ergonomic.

I’m working on a Rust library for this as my MS project. It calculates the queries inside the type system, so there’s minimal runtime cost.

It's not quite the same, but how do you feel about Datalog or Prolog? Having a set of facts and being able to make arbitrary relations in them makes for some interesting programming. Especially if you limit your data to a more rigid normal form like RDF tuples. Datafun is also a cool upcoming language in this space https://github.com/rntz/datafun

Starting from the other end, what is missing in SQL to make it more general purpose?

Generic foreign keys / Polymorphic relations. I.e. table Foo can either point to table Bar or table Xuul.

This type of relation shows up quite a bit, and either leads to application-level workarounds, or having to add many duplicate tables for the same "thing".

This is a cool idea. What would you need to be able to try this in 2 weeks, instead of a few years? What's the lean slice?

Relational algebra isn’t that hard to implement if you’re willing to sequential-scan all of the time. The massive complexity comes in the query planner: The point of adding an index is to change the space-performance tradeoff of certain operations. The system needs to be able to take advantage of these data layout changes, or there’s little benefit over storing everything in flat arrays.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact