
The Universal Data Structure - elbenshira
http://elbenshira.com/blog/the-universal-data-structure/
======
jfe
"Hashes are always O(1) reads, inserts and writes."

Maybe, once you've found a location to read, insert, or write to. The author
neglects the runtime cost of the hash algorithm itself, which may not be
trivial; computing the hash of a string key is typically O(n) in the length
of the key.

Furthermore, unless a suitable table size is selected, integer keys (should
one use a map like an array) will eventually collide, hashing to the same
bucket and requiring extra time to scan through the entries in that bucket
until the desired value is found.
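To make those two points concrete, here's a toy chained hash table (a made-up sketch for illustration only; no real engine works like this). The string hash walks every character, so it costs O(n) in key length, and colliding keys pile up in the same bucket, which must then be scanned linearly:

```javascript
// Toy sketch: fixed-size table with chaining. Not a real implementation.
const SIZE = 8;
const table = Array.from({ length: SIZE }, () => []);

// djb2-style string hash: one step per character, hence O(n) in key length.
function hash(key) {
  let h = 5381;
  for (const c of String(key)) h = (h * 33 + c.charCodeAt(0)) >>> 0;
  return h % SIZE;
}

function put(key, value) {
  const bucket = table[hash(key)];
  const entry = bucket.find(e => e[0] === key);
  if (entry) entry[1] = value;
  else bucket.push([key, value]); // collision: the chain grows, lookups slow down
}

function get(key) {
  const entry = table[hash(key)].find(e => e[0] === key);
  return entry && entry[1];
}
```

With only 8 buckets, any nine distinct keys are guaranteed at least one collision, which is the pigeonhole point above.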

"I don’t know why you would ever use [a linked list] over an array (or a
hash)..."

Here's why: because arrays take up space whether you use it or not. Linked
lists don't suffer from this problem, but at the cost of 1-2 pointers per
item. Has the author seriously never managed memory before? Please tell me
this article is a joke.

~~~
seubert
It's even tagged "bad-theory." I think it's pretty clearly a joke! A really
good one!

~~~
dhimes
"And we can’t forget our favorite JavaScript interview question of all time:
If you only had twenty-four hours to implement arrays in JavaScript, how would
you do it?"

~~~
mdaniel
Aren't they just objects that have integer properties; that's why you can

    for (k in ['a', 'b']) {
      console.log(k);
    }

and get back 0, 1? However:

    var a = {0:'a', 1:'b'};
    for (k in a) {
      console.log(k);
    }

_also_ outputs 0, 1.

Edit: turns out, you can even have doubles, too:

    var weird = {3.14:'hello', 6.28:'world'};
    // for loop above emits: 3.14, 6.28
    console.log(weird[3.14]); // emits 'hello'

~~~
chriswarbo
> Edit: turns out, you can even have doubles, too:

That's because Javascript doesn't actually have integers, it just has
"Number":

> The Number type has exactly 18437736874454810627 (that is, 2^64−2^53+3)
> values, representing the double-precision 64-bit format IEEE 754 values as
> specified in the IEEE Standard for Binary Floating-Point Arithmetic

[http://www.ecma-international.org/ecma-262/5.1/#sec-8.5](http://www.ecma-international.org/ecma-262/5.1/#sec-8.5)

------
Cushman
This is a perfect example of the kind of humor that belongs on HN. Actually
had me nodding along in parts, then screwing up my face at others. By the time
I was sure it was satire, I was committed enough to see it through to the end.

Best of all, I expect sincere discussion of the merits of the argument in this
thread. Well executed, and _heh heh_.

~~~
ignoramous
Is this blog post a dig on Rich Hickey's talk 'Simple Made Easy'?

~~~
Cushman
Just speaking for myself, it seemed like a pretty good-natured mix of
thoughtful musing on the nature of universal computation, and gentle mockery
of the idea that universal computation means there's one right answer for
anything. I'd be surprised if it's meant to skewer any one viewpoint in
particular.

~~~
elbenshira
This is an acceptable description.

------
fasteo
Like a well-trained Pavlov dog[1], reading the title brought "Lua table"[2] to
my mind, the most flexible data structure I have worked with, by far.

[1]
[https://en.wikipedia.org/wiki/Classical_conditioning](https://en.wikipedia.org/wiki/Classical_conditioning)

[2] [http://www.lua.org/pil/2.5.html](http://www.lua.org/pil/2.5.html)

~~~
masklinn
> reading the title brought "Lua table"[2] to my mind, the most flexible data
> structure I have worked with, by far.

Which is not necessarily a good thing. PHP's array and JS's Object are
essentially the same thing.

~~~
pygy_
Lua tables accept arbitrary objects as keys, and make a difference between
`foo[1]` and `foo["1"]`.

Plus all the metatables goodies: weak key and/or value references, prototype
inheritance, ...
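For contrast with the JavaScript upthread: a plain JS object coerces every key to a string, so it cannot tell `foo[1]` from `foo["1"]`; the newer `Map` can, which makes it the closer analogue to a Lua table here. A small sketch:

```javascript
// A plain object collapses 1 and "1" into one slot...
var obj = {};
obj[1] = 'number key';
obj['1'] = 'string key';
console.log(Object.keys(obj).length); // 1 -- the second write clobbered the first

// ...while a Map compares keys without coercion, like a Lua table would.
var m = new Map();
m.set(1, 'number key');
m.set('1', 'string key');
console.log(m.size); // 2 -- distinct entries
```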

~~~
thaumasiotes
> Lua tables accept arbitrary objects as keys, and make a difference between
> `foo[1]` and `foo["1"]`.

That sounds... completely normal? Python dictionaries will do that too. So
will Java HashMaps.

~~~
seabee
There is another subtlety: in a Lua table, values associated with integer
keys are stored in an array part. dicts and HashMaps don't do this.

~~~
adrusi
And not even all integer keys! Lua will efficiently handle sparse arrays using
tables.

------
jerf
Having read over the entire thing, I have only one issue with it: The
Universal Hash structure is only universal if the language implementing it
permits you to create cyclic structures, so you can manipulate the "pointers"
or relevant language concept to create the cyclic structures. There are a
handful of languages that don't permit that, such as Erlang, and some
languages like Haskell that permit you to use "tying the knot" to create
cyclic structures [1], but ones that can sometimes be difficult to manipulate
after the fact.

In those languages, you'll probably need to use a data structure well known
for its simplicity, efficiency, and broad cross-language-platform
compatibility: The Resource Description Framework's graph model. Graphs are
also a well-known candidate for "universal data structure" [2]. Also, it's
semantic, which is a _clear_ advantage over the entirely not semantic
UniversalHash, because semantic is better than not semantic when it comes to
universal semantics of semanticness.

Semantic.

Otherwise, the article is pretty solid and all serious developers should
definitely consider implementing it forthwith, as they will find it a very,
very educational experience. I know I've seen people implement this model in
Java before, and I can certainly vouch for the fact that it was an education
for all involved.

I'm going to say "semantic" one more time, because it's been my observation
that the more you say it, the smarter you sound, so: semantic. Oh, yeah,
that's the stuff. Mmmmmmmmm.

[1]:
[https://wiki.haskell.org/Tying_the_Knot](https://wiki.haskell.org/Tying_the_Knot)

[2]: Seriously, if you really do want to discuss the "universal data
structure", perhaps because you want to analyze data structures very
generically for some mathematical reason, graphs really are a good candidate.
Not necessarily RDF graphs, which manage to be bizarrely complicated while
still lacking simple features; the three blessed collection types are a weird
combination of features.

~~~
jsprogrammer
Do you really need to create "true" (i.e. language level?) cyclic structures?
Shouldn't you be able to simulate cyclic structures at the cost of requiring
more space (and probably time) to compute the simulation?

~~~
jerf
That would get into the stuff at the bottom, when you simulate other data
structures within your data structure. A non-cyclic data structure can
simulate cyclicness with IDs on nodes and things that store links... it's
generally how you do it in Haskell, in fact, since while saying "it can't" do
true graphs is perhaps a smidge overstrong it is certainly not practical to
try to modify knot-tied structures. (I've seen the question "How would I do a
graph?" repeatedly on /r/haskell, and "use IDs in a map" is generally what
comes back.) But you're putting a layer on top of your store.
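A minimal sketch of that "IDs in a map" pattern (node names invented for illustration): the cycle exists only as data, so no language-level cyclic references are needed.

```javascript
// A cyclic graph with no cyclic pointers: edges are just id -> [ids].
var graph = new Map([
  ['a', ['b']],
  ['b', ['c']],
  ['c', ['a']], // the "cycle" exists only as data, not as pointers
]);

// Following an edge is just another lookup through the store.
function neighbors(id) {
  return graph.get(id) || [];
}
```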

(By the by... you know you can trust me, because... semantic. Semantic.)

------
carapace
"...We propose that many, and maybe even all, interesting organizations of
information and behaviour might be built from a single primitive operation:
n-way associative lookup. ..."

[http://www.vpri.org/pdf/tr2011003_abmdb.pdf](http://www.vpri.org/pdf/tr2011003_abmdb.pdf)

------
synthmeat
While this may be a mockery, title probably alludes to _Universal Design
Pattern_ [1], which is not so easily dismissable idea.

[1] [http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html](http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html)

------
shkkmo
This had me worried:

"God-given axioms, which is academic lingo for a truth we accept purely by our
God-given logic, like: if something is not true then it is false, something
cannot exist in an empty set, something logical is logical."

But then I saw this and started laughing:

"Remember when everyone used tables to lay out their HTML? Well, that proved
to be a horrible way to do things, because tables are inherently inflexible.
It’s a strictly geometric constraint. Now we all use divs and CSS, because we
get a much-more flexible CSS engine to define our layout. Postgres and her SQL
friends are all table based, just like the <table> of 1999. Don’t be 1999."

~~~
abtinf
Why did the axioms bit have you worried? OP is pretty much exactly right,
except for the part where we accept it by our "logic". Axioms are simply true,
end of story; logic does not apply to axioms themselves, only how they can be
used in relation to other axioms.

~~~
shkkmo
Axioms are simply something that you accept as true to build a model. There is
no 'simply true, end of story' or 'proven by logic' to axioms.

Many axioms may be picked because they seem 'obviously true' (or, more
likely, because they are useful), but that doesn't make their truth simple or
make them the result of logic. (For an example, take a look at the axiom of
infinity.)

Additionally, the 'axioms' he lists are all what I would generally consider
tautologies. (Although you might argue that the first one is actually an axiom
of bivalent logic systems).

------
Totient
Satire aside, I think a very short addition to the last line holds a lot of
truth:

"A hash is simple. A hash is fast. A hash is all you need _to start with_."

I can think of plenty of good reasons to stop using a map/hash/associative
array in code, but I can't think of very many good reasons not to _start_
coding with associative arrays as your default data structure. If there's a
performance/memory problem, fix it later. I've seen a lot more code suffer
from premature optimization than I've seen suffer from using data structures
that were a little too inefficient.

------
brudgers
_When in doubt, use brute force. -- Ken Thompson_

Using hashes as a first-choice data structure is not necessarily a bad
idea.[1] Until profiling a working implementation demonstrates otherwise,
other data structures may be premature optimization.

[1] Clearly an improvement over the Lisper's association lists.

~~~
Grue3
It's not an improvement over _small_ association lists. At least in Common
Lisp, hashes are pretty heavy and assoc lists will outperform them when there
are fewer than ~100 elements.

------
amelius
Ehh, relational databases already proved that "sets" are the true universal
data structures. Anything can be built upon them, including (hash)maps.

~~~
dragonwriter
A relation is equivalent to a map from its key to its non-key attributes. (A
hash-map is just an implementation detail in how a map/relation is
implemented.)

So, really, that's not a _different_ universal data structure.
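A sketch of the equivalence (toy rows, invented fields): the same relation, viewed either as a set of rows or as a map from the key to the non-key attributes.

```javascript
// One relation, two views.
var rows = [
  { id: 1, name: 'ada', role: 'admin' },
  { id: 2, name: 'bob', role: 'user' },
];

// Map view: key -> non-key attributes.
var byId = new Map(rows.map(({ id, ...rest }) => [id, rest]));
```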

------
Azkar
The Poe's law is strong in this one. It had me going.

------
jcwilde
I wish they had used the word "map" in place of "hash" throughout this entire
article. The use of hashes is an implementation detail and wholly irrelevant.

~~~
pmelendez
> The use of hashes is an implementation detail and wholly irrelevant.

Not in this case. An ordered map implemented as a tree has the same interface
but with very different operation complexities.

~~~
jcwilde
My comment was meant tongue in cheek. Maps are _the_ universal data structure:
they can be used to map any input to any output. The rest is just an
implementation detail.

One might even call them "functions", but that ruins the joke.
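The pun, taken literally (a throwaway sketch): memoization is exactly a function being traded for a map, one entry at a time.

```javascript
// A function becomes a map that fills itself in on demand.
function memoize(fn) {
  var cache = new Map();
  return function (x) {
    if (!cache.has(x)) cache.set(x, fn(x));
    return cache.get(x);
  };
}

var square = memoize(function (n) { return n * n; });
```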

~~~
elbenshira
I missed an opportunity here.

------
pmelendez
>"I hope you’re convinced. A hash is simple. A hash is fast. A hash is all you
need."

Hash tables are cool but they are far from being the only thing you need.

The problem with this kind of subtle satire, without a disclaimer at the end,
is that some people will fall for it and blindly follow it (collisions
weren't mentioned even once, nor were cache misses).

~~~
kittenfluff
I don't think the satire was particularly subtle!

~~~
pmelendez
Given the number of comments with "Please tell me this is a satire" in this
thread, I would say it wasn't particularly explicit.

~~~
shkkmo
Then they obviously didn't read to the end:

"Unlike most academic work that has little to no practical implications, I
think blindly following the stuff here will prove to be incalculably
beneficial for you."

------
erikpukinskis
Jonathan Blow made the point recently that academic programming language
writers always make the same mistake of trying to take an idea and fully
radicalize it.

When you go from "Objects are super easy and useful in this language" to
"Everything Is An Object" you basically doom yourself to using objects to
implement a bunch of stuff that doesn't really make sense as objects and could
be implemented much easier as another data structure.

Big-brained academics love the challenge of "ooh, can I make _everything_ an
object?" because they are always free to decrease the scope of their research
a little to compensate for the implementation taking a long time. And the more
phenomena you can contort into agreement with your thesis, the more
scholarpoints you get.

Blow advocates "data-driven programming" which, as a rule of thumb, I
translate in my head as "don't move anything around you don't have to."

For example, rather than just copying a giant array of JSON objects over the
wire when you only need some image URLs each with an array of timestamped
strings, you write the code that serializes that data. And if you do that a
few times, write the tooling you need to make that kind of thing easy.
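A hypothetical sketch of what I mean (all field names made up): ship only the image URL and the timestamped strings, not the whole record.

```javascript
// Full records (imagine many more fields than the client needs).
var records = [
  { imageUrl: 'http://example.com/a.png',
    captions: [{ t: 1, text: 'hi' }, { t: 2, text: 'bye' }],
    exif: { camera: '...' }, viewCount: 1234 },
];

// Hand-rolled wire format: only what the client actually uses.
function toWire(recs) {
  return JSON.stringify(recs.map(function (r) {
    return {
      imageUrl: r.imageUrl,
      captions: r.captions.map(function (c) { return [c.t, c.text]; }),
    };
  }));
}
```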

The pitch is that it's not more work. And I'm kind of convinced. It just gets
rid of so much pollution when you are debugging.

Your first cut of things is often a little weird: "do I need a generator
generator here?" but typically you realize that a simpler solution works just
as well in the refactor.

When you hack in a "wrong but easier to sketch out" solution into your code as
the first try, it often just lives like that forever. Correct, confusing code
often collapses into correct, simple code. Simple, functional-but-wrong code
just seems less inclined to self-improvement.

And I am continually surprised by how many problems, when simplified down as
much as possible, are best described with the basics: functions, structs,
arrays. You need fancy stuff sometimes for sure, but most of our human
problems are trivial enough that these blunt tools suffice. I just often won't
be able to see it until I've renamed all the variables three times.

What's interesting is I've been doing JavaScript programming this way, and
Jonathan Blow is... shall I say... not a fan of JS. But I think the concepts
translate pretty well! It's just instead of targeting raw memory, you target
the DOM/js runtime which is actually a pretty powerful piece of metal if you
have the patience to learn about the runtime and keep thinking about your data
every step of the way.

------
asQuirreL
It took me until:

> Hashes are always O(1) reads, inserts and writes.

To realise that this was a joke.

------
tempodox
Good tongue-in-cheek. Technically, however, the hash was outdone by the Lisp
CONS. CONSes could represent every conceivable data structure in 1959 and they
were eaten raw by the CPUs of the time. But then, there were not so many
people available for the following-blindly part.

------
mjcohen
That's one of the reasons I love awk (actually, gawk): this is the only data
structure it has.

~~~
ucho
It is* also true for PHP - almost any data structure internally is just a
linked hash map.

*or was, I am not sure about current state

------
malkia
Basic had that. It was called arrays - yay for A$()

------
belovedeagle
This... this is just very, very dedicated satire, right?

Let's replace main memory with hash maps and then _implement existing data
structures on that_.

Yes. This is satire.
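To savor the horror properly, here's a sketch of what the article proposes (names invented): a linked list built on a plain object standing in for main memory.

```javascript
// The "universal hash" as main memory: addresses are keys, pointers are values.
var memory = {};
var nextAddr = 0;

function cons(value, tail) { // tail is an address, or null
  var addr = nextAddr++;
  memory[addr] = { value: value, tail: tail };
  return addr;
}

// Walk the "pointers" back out into a plain array.
function toArray(addr) {
  var out = [];
  for (var a = addr; a !== null; a = memory[a].tail) out.push(memory[a].value);
  return out;
}
```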

~~~
hDeraj
Also note that the post is tagged as 'bad-theory'

~~~
msutherl
"Unlike most academic work that has little to no practical implications, I
think blindly following the stuff here will prove to be incalculably
beneficial for you."

------
michaelochurch
While this is satire, it brings to mind some bit of industry history. My first
reaction was, "you want to define a type class called Associative, because
you're talking about an _interface_," and that got me thinking about OOP vs.
Haskell's type classes (a superior approach)... and then I realized that the
OP was satire.

The major historical selling point of object-oriented programming (OOP) to the
Forces of Evil-- not all business people are "Forces of Evil"; really, there
are some great business people out there, and so I'm referring specifically to
cost-cutting mini-Eichmanns who've attempted to commoditize, humiliate, and
infantilize us with "user scrum stories" and a culture of mediocrity-- was
that OOP (after diverging far from Alan Kay's original vision) would allow the
Forces of Evil to replace high-cost experts with teams of mediocre
programmers, and thereby ruin the balance of power. Culturally, it worked (the
culture of mediocrity is well-established in the software industry);
economically, it failed (because large teams of mediocrities are actually
fucking expensive, because a "10x" engineer only costs about 1.5-2.5x an
average one).

The sales pitch for OOP to the Forces of Evil was that OOP would make it
possible to hire a couple of low-paid body-shop programmers too stupid to
recognize the OP as either (a) satire or (b), missing the joke but still
correct, just wrong. Smart wizards in the open-source world and at companies like
Google would do the actual engineering that made efficient hash-maps possible,
and CommodityScrumDrones would staple shit together using topologies thereof,
without really understanding any of the technologies they were gluing
together, and probably ignorant of why these things are sometimes called
"hashes" in the first place.

The problem is that when CommodityScrumDrones grow up and become middle
managers and get to start making technical choices, they often make bad ones.
They reject PostgreSQL as too hard or too old and use some NoSQL JSON storage
engine that was a "Show HN" project 17 days ago.

Even though the CommodityScrumProgrammer phenomenon has been a massive failure
in economic terms-- the Forces of Evil have won on ego terms by utterly
dominating their targets, but they've _lost money_ -- it has been a cultural
success that has inflicted mediocrity, anti-intellectualism, and subordination
to "The Business" on the software industry. And now we have people calling
important technical shots who have literally _no idea_ why the OP is either
satire or wrong.

~~~
HanyouHottie
Why was this guy downvoted? Everything he said is correct, if a bit
hyperbolic at points.

This is relevant: [http://www.smashcompany.com/technology/object-oriented-programming-is-an-expensive-disaster-which-must-end](http://www.smashcompany.com/technology/object-oriented-programming-is-an-expensive-disaster-which-must-end)

