Hacker News new | past | comments | ask | show | jobs | submit login
The Periodic Table of Data Structures [pdf] (seas.harvard.edu)
598 points by asplake on Oct 27, 2018 | hide | past | web | favorite | 64 comments

This is very inspirational but not sure if this is done in the best way. The core value of Mendeleev's approach is to find gaps and predict things that should exist but we haven't found yet. However for this to happen, one must isolate the "atoms" that can't be further broken down and identify the pattern in these atoms. The periodic table created in Figure 4 of this paper doesn't exactly sounds like isolation of "atoms" of data structures.

You're right based on the modern day thinking of what the periodic table is but you're wrong to suggest that people knew what they were doing from the very start.

The formulation of the periodic table is quite like modern day data science. Chemists even before Mendeleev observed patterns and tried to build a model to predict missing ones. The beauty is that this was done without understanding the underlying mechanism of how it all worked, ie what makes an element (protons), what makes their periodicity (electrons). That came 50+ years later when we discovered protons and electron orbitals.

So going back to the article, their approach is fine because they're just formulating some model from observed patterns. Their reasoning for the underlying mechanism could be completely wrong but that's fine too until a better model comes along.

>The formulation of the periodic table is quite like modern day data science.

That's because data science is just science. The only distinction is that you're studying somebody's sales pipeline instead of the secrets of the universe.

FWIW I was able to kinda sorta do that for basic recursion combinators: http://joypy.osdn.io/notebooks/Recursion_Combinators.html#co...

In "Functional Programming with Bananas, Lenses, & Barbed Wire" http://citeseerx.ist.psu.edu/viewdoc/summary?doi= they talk about morphisms: Hylomorphism, Anamorphism, Catamorphism, Paramorphism.

If you break them down using Joy (programming language) in terms of the genrec general recursion combinator, they form simple patterns, which then suggest at least two other unnamed combinators:

Hylo-, Ana-, Cata-

    H == [P  ] [pop c ] [G          ] [dip F        ] genrec
    A == [P  ] [pop []] [G          ] [dip swap cons] genrec
    C == [not] [pop c ] [uncons swap] [dip F        ] genrec
Para-, ?-, ?-

    P == c  swap [P  ] [pop] [[F        ] dupdip G          ] primrec
    ? == [] swap [P  ] [pop] [[swap cons] dupdip G          ] primrec
    ? == c  swap [not] [pop] [[F        ] dupdip uncons swap] primrec


I am fascinated with unifying theories such as the periodic table (Mendeleev's) and the Copernican view of planets orbiting the sun. The previous, Earth-centric model had more deviations than predictive power. The periodic table is also an example of a structure with tremendous predictive power.

I think it's also interesting to distinguish between systems that reveal intrinsic order (e.g. periodic table), and systems that superimpose external order (e.g. a performance review framework).

Behavioural economics is an example of a field that desperately needs a modern-day equivalent of Mendeleev to come around and structure.

I am craving more examples of this, please, anyone, share!

I can't find it now but I recently read of one or two guys who systematically tabulated juggling tricks. Gaps in the table led to new tricks which astonished experienced jugglers -- the way gaps in the periodic table led Mendeleev to predict properties of undiscovered elements. One guy gives talks on juggling at math conferences and on math at juggling conferences.

This reminds me of the effort to enumerate all ways of tying a necktie: http://www.tcm.phy.cam.ac.uk/~tmf20/tieknots.shtml

If you know programming, you may enjoy Peter Van Roy's components of programming paradigms.


Physics is full of such examples.

Newton’s theory of gravitation unified the motion of apples falling from a tree, cannonballs thrown into walls, the motion of planet earth around the sun, and even motion of all the known planetary bodies.

Electromagnetic waves gives us one understanding of X-rays, gamma radiation, visible light, infrared, radio waves, etc.

Quantum field theory unifies every microscopic theory of matter and energy we have ever known.

This list could be much longer.

They removed the image from the main article for some reason but I thought this was cool - https://en.wikipedia.org/wiki/File:The_Cognitive_Bias_Codex_...

Here's an eclectic mix of examples from various perspectives across multiple domains. Understanding the representation of knowledge and the fundamental structure that underpins the connective essence of things is one of the prime ideas that drives my thinking, and as a consequence much of what I post about on here is related to that in some way. Below is a brief sampling of examples related to this idea, and for a more extensive list, browse through my postings (and if you can see how they all connect, let me know!).

In general, see the Classification of Finite Simple Groups [0]. In terms of specific examples, the minimal classifications of knots [1], trees [2], and braids [3] are 3 instances that popped to mind. From a number-theoretic perspective, the paper Enumerating the Rationals and its derivative works are interesting reads. In terms of spatial structure and spatial decomposition, look at the classification of crystal structures [4], lattices, polytopes (zonotopes are particularly interesting), periodic and aperiodic tilings, the classification of space-filling curves, and the classification of closed surfaces [5]. From a relational perspective, look at the classification of topological spatial relations [6] and Allen's interval algebra [7]. And the one my intuition finds most intriguing is the classification of Group Theory Single Axioms and its relation to my conjecture that division is primary.

[0] Classification of Finite Simple Groups https://en.wikipedia.org/wiki/Classification_of_finite_simpl...

[1] Prime Knots https://en.wikipedia.org/wiki/List_of_prime_knots

[2] Enumerating Trees https://www.cs.virginia.edu/~lat7h/blog/posts/434.html

[3] Braid Group https://en.wikipedia.org/wiki/Braid_group

[4] Crystal Structure https://en.wikipedia.org/wiki/Crystal_structure

[5] Classification of Closed Surfaces https://en.wikipedia.org/wiki/Surface_(topology)#Classificat...

[6] Topological Spatial Relations https://en.wikipedia.org/wiki/Spatial_relation

[7] Allen's Interval Algebra https://en.wikipedia.org/wiki/Allen%27s_interval_algebra

[8] "Yet Another Paper on Group Theory Single Axioms" https://www.cs.unm.edu/~mccune/projects/gtsax/

P.S. Clifford Algebra / Geometric Algebra [00] unifies much of mathematics, e.g. in GA, Maxwell's equations can be reduced to one equation and expressed on one line [01]:

[00] Geometric Algebra https://en.wikipedia.org/wiki/Geometric_algebra

[01] Maxwell's equations formulated in terms of Geometric Algebra https://en.wikipedia.org/wiki/Mathematical_descriptions_of_t...

The Structure of Scientific Revolutions by Thomas Kuhn discusses paradigm shifts and periodic re-unifications of theory and fact.

Does anyone know one of these for financial contracts?

I love that paper.

I confess I haven't read the whole thing but that figure 4 looks more like a set of approaches, rather than a continuum as in the periodic table.

Their figure 1, which appears to describe fundamental design tradeoffs, makes me think of CAP theorem "maps", like this:


But the whole idea of figuring out first principles and "fundamental" design choices is quite appealing. They have a section saying that a big goal here would be automatic design of optimal data structures. They talk of machine learning techniques (they mention Bayesian optimization and reinforcement learning).

That seems a very interesting research direction. In the same vein there's this talk by Jeff Dean about how they use ML at Google to replace all sorts of heuristics in data structure design (bloom filters etc.) with machine learning to optimize performance, though from what I recall it doesn't automatically change the algorithm itself:


(discussed on HN previously https://news.ycombinator.com/item?id=15892956)

EDIT: I think they cite a paper by Dean and others which was part of that talk

> I confess I haven't read the whole thing

This is a sort of functional bug in forums such HN.

Commenting on paper like this requires time to digest and reflect on the content. For a conceptual paper such as OP I would need at least a day (to figuratively sleep on it) but returning a couple of days later to make (hopefully) informed comment is no longer interactive. There is archival and future reference value, of course, but then rebuttals to possible misconceptions will not accompany the comment.

It's quite funny I made a very similar lament a while ago:


I only post when I think I can somehow add value to the discussion, at least value for some people who might read the comment. Obviously some people with in-depth knowledge of the paper and field in general will probably already be aware of the link I posted (for example), but others might find it an interesting avenue to explore. I know I often come to HN comments for this reason: a topic seems interesting, I want to learn of related stuff and/or get opinions from people with deeper expertise.

So even if it's only for "future reference", it might actually bring value _for some_. At least that's what I tell myself :)

Yeah, that and tracking long-running topics are things old-fashioned forums (phpBB style) are better at than the newer, more ranking-oriented ones (HN, reddit, ...), since old discussions can be pushed up again.

Although long forum threads only really work if someone regularly updates the first entry with newest infos.

I want that read/write/space-optimized triangle prominently featured in every languages documentation, filled in with the different data stuctures from that language.

Need quick reads? Use Y. Quick writes? Use X. Etc.

I think I saw it popularized as the "RUM Conjecture" on HN a while back:



Same authors as the OP btw:

Manos Athanassoulis Postdoctoral Researcher

Michael S. Kester Graduate Researcher

Lukas Maas Graduate Researcher

Stratos Idreos Assistant Professor of Computer Science

be the change you want to see in the world :)

About the gaps

"""Mendeleev was a friend and colleague of the Sanskritist Böhtlingk, who was preparing the second edition of his book on Pāṇini[47] at about this time, and Mendeleev wished to honor Pāṇini with his nomenclature.[48] Noting that there are striking similarities between the periodic table and the introductory Śiva Sūtras in Pāṇini's grammar"""

More: https://en.wikipedia.org/wiki/Dmitri_Mendeleev

The generalized form of the "Periodic Table" is the Morphological Analysis. http://www.swemorph.com/ma.html

Most of the theoretical work is in picking the right properties that expose the right trade-offs. Beyond that, the method is deceptively simple.

this is probably off topic, but I've waited before asking until I found the closest topic from my perspective:

A while back (order months or years but certainly within last 2 years) someone mentioned a paper, I believe it was here on HN either as the topic or in a comment, but perhaps it may have been on IRC.

I am no longer able to find back this paper, and have only a vague recollection of it (I had read the abstract and the introduction and then postponed reading until I forgot about it).

It dealt with the problem of say 2 peers each having their own list or set, and part of their list is in common, but both may also have entries the other doesn't have. The problem was finding the most efficient way such that both end up with the union of both lists or sets. A brute force way would be to each send a copy of their list to the other, a slightly less brute way would be to have only one send a full copy, and have the other return his difference. But the paper detailed a more efficient method, which obviously I can't remember...

Does this description ring a bell? Does anyone know the paper I am trying to locate?

I actually have a similar question to yours--I lost track of a set of notes that was linked to in a HN comment, and would like to find them again!

The notes were a (draft?) PDF for a textbook on algorithms--much like Mathematics for Computer Science by Lehman & Leighton [1]--but instead, the topic was narrowly restricted to algorithms with a cryptographic or otherwise number-theoretic basis. In particle, hashes, content-addressable storage, and Merkle-trees were covered.

[1] https://courses.csail.mit.edu/6.042/spring18/mcs.pdf

Not this one by any chance?

Boaz Barak, "An Intensive Introduction to Cryptography"


Very interesting. I can't actually decide if this was what I had in mind, but it actually looks better. Thanks!

Perhaps "A Computational Introduction to Number Theory and Algebra" by Victor Shoup? This was recommended by Dan Boneh in his crypto course.

Another interesting book, so thanks, but perhaps less likely even, since I recall it had some material on things having to do with hash-based file systems (I recall seeing material on Merkle trees).

This isn't new, and doesn't answer that part of your question, but a Merkle Tree is a solution to that problem that can avoid sending the entire set.

One way to do this is to send a bloom filter as index and then return the elements that are not part of the bloom filter.

But what about the false positives?

Well, I would assume the receiving node would verify that each entry was actually needed.

Aren't the false positives in this situation representative of set elements that node b thinks that node a has but really does not?

Right, the solution I remembered was a bit more complex, it required counting bloom filters and a subtraction operation to get a filter for the set difference. But the paper mentioned in a sibling comment claims to have even less overhead.

I was reminded slightly of the taxonomy of abstract data types (as opposed to data structures) that appears in Sally Goldman's A Practical Guide to Data Structures and Algorithms using Java:


A few months ago I saw a program like this that would derive a data structure for you based on what operations you wanted to perform (inserts and queries, for example) but I haven't been able to find it again.

That's it! Thank you.

sounds interesting! if you find it again, plz post

Here's the citation from parent link seas.stratos.harvard.edu:

S. Idreos, et al., “The Periodic Table of Data Structures,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 41, no. 3, pp. 64-75, 2018.

You could make a periodic table of many things. Websites, languages, databases, etc. Maybe some new ideas could be uncovered.

Should "Data Structures" be followed with "is"? I was under the impression that since it's plural, we'd have an "are".

In this case, yes.

> This is why many argue that research on algorithms and data structures is

"is" refers to "research", not "data structures"

The sentence in question is

> Data structures is how we store and access data.

wich is arguably just a bad sentence.

The grammar is still correct if the reference here is to data structures as a singular group which is actually what the sentence appears to do.

Whether it’s correct or not, it’s not a great sentence, and would benefit from a rewrite.

Language is weird sometimes, so I’m not certain, but your argument seems circular and incorrect. Use of ‘is’ doesn’t cause ‘data structures’ to be singular a collective noun. The noun is plural, so the correct verb is ‘are’. The sentence would need to name the singular collective in order to use ‘is’. For example: “A set of data structures is how we store and access data.” That would be correct. Arguing that “Hard disks is how we store and access data” is correct and that “hard disks” is a singular collective noun, because the sentence used ‘is’, is not normally accepted grammar.

But not quite incorrect.

Normally I try to get past grammatical errors in scientific literature if they do not affect the meaning of the content. Authors may not be writing in their native language, and the literary aspects of the piece aren't its purpose. When it's the third sentence in the abstract and the first of two questions that the paper is fundamentally seeking to answer, it does make me twitch a little.

Yes, absolutely. Why do you have to ask? Unless '-s' is its genetive inflection, something German for example has preserved, or something weirder, it is a crystal clear mistake

If this works, a universal method to an exact solution for all differential equations would be on there. Good luck with that.

Reminds me of GoF patterns (the approach), but the result here seems more pedagogical...

I see Tarjan cited but no Sedgewick, or CLRS. I'm against obsequious citations, so I'll give the benefit of the doubt that these aren't needed.

Isn't CLRS mostly a secondary source that puts together previous literature? Citations usually prefer a primary source that lets you place a "discovery" accurately in date and place (aka conference/journal).

CLRS is wildly overrated; it's got a lot of odd gaps in the data structures that it does cover, and wastes a lot of time and space on esoteric junk.

What's your preferred reference?

What kind of gaps, and what esoteric junk? Genuinely curious.

A Cartesian product is not a periodic table

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact