
In F#, you would naturally approach this by defining a Customer independent of the Account (e.g. just containing a name and address), and an Account independent of the Customer (e.g. just containing an ID), and then a Bank which is a mapping of Account to Set<Customer>. What you see as a cyclic dependency, I see as a data type that you haven't reified.
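Not the F# itself, but roughly the shape, sketched here in Python with made-up field names; the point is that the relationship lives in one reified place (the Bank) rather than inside either party:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Customer:
        name: str
        address: str

    @dataclass(frozen=True)
    class Account:
        account_id: str

    @dataclass
    class Bank:
        # The association itself is the data: no Customer -> Account or
        # Account -> Customer pointers anywhere.
        holders: dict[Account, frozenset[Customer]] = field(default_factory=dict)

    alice = Customer("Alice", "1 Main St")
    bob = Customer("Bob", "2 Oak Ave")
    joint = Account("ACC-001")
    bank = Bank({joint: frozenset({alice, bob})})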



This came up in an application I had modeling college football. A Team is a member of a Conference and a Conference is made up of Teams. But during realignment a few years ago when teams were switching all over the place it became apparent that Teams and Conferences were truly independent of each other and that their association itself was a first class object which also needed to capture the Season/Year.

Not really rocket science, but sometimes your model is telling you something if you're willing to listen.
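A minimal sketch of that reification (hypothetical names, not the actual application): the association is its own record and carries the season, instead of either side holding a pointer to the other.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Team:
        name: str

    @dataclass(frozen=True)
    class Conference:
        name: str

    # The association is a first-class object, and it carries its own data.
    @dataclass(frozen=True)
    class Membership:
        team: Team
        conference: Conference
        season: int

    memberships = {
        Membership(Team("UCLA"), Conference("Pac-12"), 2023),
        Membership(Team("UCLA"), Conference("Big Ten"), 2024),
    }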


This is almost exactly the motivational example that Codd used when originally describing relational algebra. He described 5 different data organization schemes for a single problem and designed relational algebra to work with all of them. This ability for the same program logic to work with many different data storage layouts should make changes like you describe less painful to implement.

(see page 2)

E. F. Codd. 1970. A relational model of data for large shared data banks. Commun. ACM 13, 6 (June 1970), 377–387. DOI:https://doi.org/10.1145/362384.362685

PDF: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf


My dream, for when I someday have a few years of free time, is to design a language where the relational table is the primary data structure abstraction.


Mine too. My main gripe with object graphs is that often you need to make arbitrary decisions about what comes "first" -- should I represent my customers as a list of `(name, address)` records or a record `(names, addresses)` of lists? This is a silly decision to have to make, and an even sillier one to have to deal with when you later need to transpose the structure to more naturally fit some other task. Relational tables sidestep this issue.
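To make the two layouts concrete (a toy sketch, not anyone's real code):

    # Layout A: a list of (name, address) records.
    customers_a = [
        {"name": "Alice", "address": "1 Main St"},
        {"name": "Bob", "address": "2 Oak Ave"},
    ]

    # Layout B: a record of lists, i.e. the transpose of layout A.
    customers_b = {
        "names": ["Alice", "Bob"],
        "addresses": ["1 Main St", "2 Oak Ave"],
    }

    # Converting is pure bookkeeping, but code written against one layout
    # rarely works against the other without rewriting its traversals.
    def transpose(rows):
        return {
            "names": [r["name"] for r in rows],
            "addresses": [r["address"] for r in rows],
        }

    assert transpose(customers_a) == customers_b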

Over the years I've found that these transposition problems are a huge drag on programs, and once you recognize them you start seeing them all over the place. They make simple and obvious relational tasks into dozens of lines of nested data structure traversal and manipulation.

The notion of transposition comes from the array programming world, where a multi-dimensional array/tensor can be seen as a tree structure, whose levels may be easily reordered by transposition. I sometimes use numpy arrays for general-purpose programming; it can be very powerful with a well-chosen representation. Unfortunately array concepts basically require the data to be rectangular -- `x[i]` has the same shape as `x[j]` -- which can be hard to adapt to general problems.
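For example (a small numpy illustration of the level-reordering idea, with made-up axis names):

    import numpy as np

    # Read a 3-D array as a tree: year -> team -> stat.
    x = np.arange(24).reshape(2, 3, 4)    # (year, team, stat)

    # Transposing reorders the levels of the tree without touching the data.
    y = x.transpose(1, 0, 2)              # (team, year, stat)

    assert y.shape == (3, 2, 4)
    assert y[2, 1, 0] == x[1, 2, 0]       # same element, new level order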

My dream then is a general-purpose data structure that takes the best of both the relational and array-programming worlds: the freedom and flatness of relational tables, enriched with tacit ideas like broadcasting that make array programming so ergonomic.


I’m working on a Rust library for this as my MS project. It calculates the queries inside the type system, so there’s minimal runtime cost.


It's not quite the same, but how do you feel about Datalog or Prolog? Having a set of facts and being able to make arbitrary relations in them makes for some interesting programming. Especially if you limit your data to a more rigid normal form like RDF tuples. Datafun is also a cool upcoming language in this space https://github.com/rntz/datafun


Starting from the other end, what is missing in SQL to make it more general purpose?


Generic foreign keys / polymorphic relations, i.e. table Foo can point to either table Bar or table Xuul.

This type of relation shows up quite a bit, and either leads to application-level workarounds, or having to add many duplicate tables for the same "thing".
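The usual application-level workaround is a type column next to the id, which no real foreign-key constraint can cover (a sketch with made-up table names, using sqlite3 purely for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bar  (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE xuul (id INTEGER PRIMARY KEY, label TEXT);

        -- foo points at either bar or xuul. No FOREIGN KEY clause can express
        -- that, so integrity is left to the application.
        CREATE TABLE foo (
            id          INTEGER PRIMARY KEY,
            target_type TEXT NOT NULL,      -- 'bar' or 'xuul'
            target_id   INTEGER NOT NULL
        );
    """)

The other route is a separate join table per target (foo_bar, foo_xuul, ...), which is where the duplicate tables come from.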


This is a cool idea. What would you need to be able to try this in 2 weeks, instead of a few years? What's the lean slice?


Relational algebra isn’t that hard to implement if you’re willing to sequential-scan all of the time. The massive complexity comes in the query planner: The point of adding an index is to change the space-performance tradeoff of certain operations. The system needs to be able to take advantage of these data layout changes, or there’s little benefit over storing everything in flat arrays.


This resembles SQL a lot. Btw, I'm surprised that few languages seem to have a many-to-many container in their standard library that supports efficient lookups in either direction.


Actually, yeah, that is kind of interesting. The couple of times I've had to do it, I always ended up hand-rolling a 'class' with hashmaps in both directions. I wonder why so few standard libraries have a bidirectional lookup structure.
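Something along these lines (a hand-rolled sketch, not a standard-library type):

    from collections import defaultdict

    class ManyToMany:
        """A many-to-many relation with hash lookups in both directions."""

        def __init__(self):
            self._forward = defaultdict(set)   # left  -> {rights}
            self._backward = defaultdict(set)  # right -> {lefts}

        def add(self, left, right):
            # Both indexes are updated together, so they can never drift apart.
            self._forward[left].add(right)
            self._backward[right].add(left)

        def remove(self, left, right):
            self._forward[left].discard(right)
            self._backward[right].discard(left)

        def rights_of(self, left):
            return frozenset(self._forward.get(left, ()))

        def lefts_of(self, right):
            return frozenset(self._backward.get(right, ()))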


Which languages support this at all? Are you talking about ORM?


I'm talking about something like this

    std::relation<Foo, Bar> foobars;
    ...
    auto& bars = foobars.forward(foo);
    ...
    auto& foos = foobars.backward(bar);
Edit: Boost apparently has it, see e.g. https://stackoverflow.com/questions/1128144/c-maps-for-many-...


Boost.Bimap is a special case of (and built on top of) Boost.MultiIndex, which allows indexing a logical table with an arbitrary subset of fields.


> a data type that you haven't reified

Insightful. Once you learn to see dependencies as implicitly revealing abstract data types you haven't yet teased out into existence, you start seeing all kinds of places where behavior that ought to be explicit and handled by an abstraction is smeared across multiple other concepts.


I've got a half-cooked blog post about this in the oven: "reify the state machine". I've seen plenty of code with lots of communicating parts, which is really a distributed implementation of a state machine. Rewriting to make the states explicit as data, and perhaps creating a central "state machine" type for that machine, can really help clarify the domain model and let you see what you're missing.
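A tiny illustration of what "make the states explicit as data" can look like (hypothetical order states, not from the blog post):

    from enum import Enum, auto

    class OrderState(Enum):
        PENDING = auto()
        PAID = auto()
        SHIPPED = auto()
        CANCELLED = auto()

    # One explicit transition table, instead of booleans and if-chains smeared
    # across several communicating components.
    TRANSITIONS = {
        (OrderState.PENDING, "pay"):    OrderState.PAID,
        (OrderState.PENDING, "cancel"): OrderState.CANCELLED,
        (OrderState.PAID,    "ship"):   OrderState.SHIPPED,
    }

    def step(state: OrderState, event: str) -> OrderState:
        try:
            return TRANSITIONS[(state, event)]
        except KeyError:
            raise ValueError(f"illegal transition {event!r} from {state}")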


> and then a Bank which is a mapping of Account to Set<Customer>

The problem is that this only lets you get customers from accounts, and not vice-versa.

And if you add a Map<Customer, Set<Account>> to the Bank, you now have to ensure the mappings are synchronized at all times. Which, to be fair, is very straightforward as long as Bank is immutable, but still kind of annoying.

But considering how incredibly common many-to-many relationships are in business logic and especially in SQL, I'm constantly surprised that there isn't a de facto standard, efficient data structure to represent them. That speaks to how entrenched cyclic relationships are in software engineering, I suppose.


In the in-storage form (the database), the mappings are synchronized at all times.

Many-to-many relationships in their purest form are described as records of ItemAId/ItemBId pairs: `List<{ account: Account, customer: Customer }>`. There isn't any cyclical relationship in this form.

From `List<{ account: Account, customer: Customer }>` one can derive `Map<Customer, Set<Account>>` and `Map<Account, Set<Customer>>`. Bank can hold a copy of both mappings and use them as a double index, synchronizing both mappings as operations go by, while at the same time synchronizing them with the in-storage form. Or not!
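Sketched with made-up ids, both directional views are just derived, throwaway indexes over the flat list:

    # Source of truth: a flat list of (account_id, customer_id) pairs. No cycles.
    holdings = [("ACC-001", "alice"), ("ACC-001", "bob"), ("ACC-002", "alice")]

    # Derive each direction on demand; neither one is the "real" structure.
    customers_by_account = {a: {c for x, c in holdings if x == a} for a, _ in holdings}
    accounts_by_customer = {c: {a for a, x in holdings if x == c} for _, c in holdings}

    assert customers_by_account["ACC-001"] == {"alice", "bob"}
    assert accounts_by_customer["alice"] == {"ACC-001", "ACC-002"}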

We're talking mostly about data modelling in a programming language, which applies to the in-memory form. `Customer { accounts: Set<Account> }` and `Account { customers: Set<Customer> }` are the in-memory form, a layer of indirection from the actual form, the in-storage `List<{ account: Account, customer: Customer }>`.

If the information about Accounts, Customers, and the many-to-many relationship between them is laid flat in its purest, source-of-truth form, the in-storage database entries, why are people suggesting that there is (or should be) a cyclical relationship in its representative form, the in-memory objects?


This is the right answer and I wish more languages would make this dramatically more ergonomic to do. Store the most basic normalized truth and query/derive what you need.


Sure, but you can wrap that up if you want. A Bank can contain both maps, and you hide the methods that would update the individual maps, only exposing wrappers that update everything together.


Yeah, I'm not saying it's difficult, I'm just saying that it's weird that there isn't a standard ManyToMany<T1, T2> class in most programming languages, even though a functional (if perhaps not performance-optimal) implementation is straightforward and should be very useful at least for DB mappings.


True, I have implemented it myself multiple times, getting a bit annoyed at its absence without stopping to ask why.


Which makes it easy to answer the question “what customers are on this account” and hard to answer the question “what accounts does this customer have”. As a customer, I usually want the latter.



