Immutability Changes Everything [pdf] (cidrdb.org)
90 points by jpmc on Jan 27, 2015 | 25 comments

Those interested in immutable data structures might like to look into Irmin [1], a git-like distributed store (due to be released soon). It's also part of a larger stack [2].

[1] http://openmirage.org/blog/introducing-irmin -- prev discussion: https://news.ycombinator.com/item?id=8053687

[2] http://amirchaudhry.com/brewing-miso-to-serve-nymote/

Neato! Also Camlistore: https://camlistore.org/

> "Normalization’s goal is to eliminate update anomalies. Normalization is not necessary in an immutable data set. The only reason to normalize immutable DataSets may be to reduce the storage necessary."

You can still have an update anomaly in an append-only setting. A teacher changes their name (by submitting a change-of-name form), and now you need to replace all the class lists.

This can be done in an append-only way. It's not fast and not efficient, but as long as you separate writing the change from committing it in the log, you're fine.
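A minimal sketch of "write the change, then commit it" for the teacher example above, assuming a single in-memory log where each class list stores the teacher's name directly (the denormalized case). The names Log, stage, commit and visible are hypothetical. New versions of every affected class list are appended first, and a single commit record appended afterwards makes them all visible at once; replay applies writes in commit order.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only log: records are only ever appended, never modified or deleted."""
    records: list = field(default_factory=list)

    def stage(self, txn: str, key: str, value: object) -> None:
        # Write the new version now, tagged with a transaction id that is not yet committed.
        self.records.append(("write", txn, key, value))

    def commit(self, txn: str) -> None:
        # One appended record makes every staged write of txn visible at once.
        self.records.append(("commit", txn))

    def visible(self) -> dict:
        """Replay the log and return the latest committed value per key, in commit order."""
        staged = {}  # txn -> [(key, value), ...] written but not yet committed
        state = {}
        for rec in self.records:
            if rec[0] == "write":
                _, txn, key, value = rec
                staged.setdefault(txn, []).append((key, value))
            else:  # ("commit", txn)
                for key, value in staged.pop(rec[1], []):
                    state[key] = value
        return state


log = Log()
log.stage("t1", "class:math", "Ms. Smith")
log.commit("t1")
# Change-of-name form: first stage new versions of every affected class list...
log.stage("t2", "class:math", "Ms. Jones")
log.stage("t2", "class:art", "Ms. Jones")
print(log.visible())  # {'class:math': 'Ms. Smith'}  (t2 is written but not committed)
log.commit("t2")      # ...then make them all visible with a single append
print(log.visible())  # {'class:math': 'Ms. Jones', 'class:art': 'Ms. Jones'}
```

Nothing is ever overwritten, so a reader replaying the log before the commit record simply doesn't see the half-finished rename.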

The class list would be rendered by looking up the teacher's name. A rendering can be cached with a timestamp (it was valid as of this time), and then the most recent render can be stored. By setting a dirty flag whenever data changes, you can update dependencies when they are fetched and then memoize the results.
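One way the render cache described above could look, as a sketch under a few assumptions: class lists reference a teacher by id, teacher names live in a plain dict for brevity, and a logical clock stands in for wall-clock timestamps. ClassListStore and its methods are hypothetical names. Renders are appended with their "as of" stamp and never edited; a name change only sets dirty flags, and the re-render happens lazily on the next fetch.

```python
import itertools


class ClassListStore:
    """Hypothetical store: class lists reference a teacher id; the name is looked up at render time."""

    def __init__(self):
        self._clock = itertools.count(1)  # logical clock standing in for wall-clock timestamps
        self.teacher_names = {}           # teacher_id -> name
        self.classes = {}                 # class_id -> (teacher_id, students)
        self.renders = {}                 # class_id -> [(as_of, text), ...], appended, never edited
        self._dirty = set()               # class_ids whose inputs changed since their last render

    def set_teacher_name(self, teacher_id, name):
        self.teacher_names[teacher_id] = name
        # Don't re-render now: just flag every class list that depends on this teacher.
        for class_id, (tid, _) in self.classes.items():
            if tid == teacher_id:
                self._dirty.add(class_id)

    def add_class(self, class_id, teacher_id, students):
        self.classes[class_id] = (teacher_id, tuple(students))
        self._dirty.add(class_id)

    def render(self, class_id):
        """Return the most recent (as_of, text), re-rendering only if the class is dirty."""
        if class_id in self._dirty:
            teacher_id, students = self.classes[class_id]
            text = f"{class_id} ({self.teacher_names[teacher_id]}): {', '.join(students)}"
            self.renders.setdefault(class_id, []).append((next(self._clock), text))
            self._dirty.discard(class_id)
        return self.renders[class_id][-1]  # memoized: unchanged inputs reuse the stored render


store = ClassListStore()
store.set_teacher_name(7, "Ms. Smith")
store.add_class("math", 7, ["Ann", "Bo"])
print(store.render("math"))  # (1, 'math (Ms. Smith): Ann, Bo')
store.set_teacher_name(7, "Ms. Jones")  # only marks "math" dirty
print(store.render("math"))  # (2, 'math (Ms. Jones): Ann, Bo')
```

The older render stays in the list, so "what did this class list say as of time 1?" is still answerable.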

You still need consistent object identities to do that, which are then tantamount to mutability (they are equivalent with equivalent problems related to consistency, and similar solutions).
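To illustrate the point about identities, here is a small sketch, assuming name changes are recorded as immutable facts keyed by a stable teacher id (NameFact and current_name are hypothetical). Every fact is immutable, yet "the current name of teacher 7" depends on how much of the log a reader has seen, which is exactly the behavior, and the consistency question, of a mutable variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameFact:
    """Immutable fact: at log position seq, the teacher with this stable id is called name."""
    seq: int
    teacher_id: int
    name: str


def current_name(log, teacher_id, upto=None):
    """Resolve a stable identity to its latest name, as seen by a reader that has read up to upto."""
    name = None
    for fact in log:  # assumes the log is kept in seq order
        if fact.teacher_id == teacher_id and (upto is None or fact.seq <= upto):
            name = fact.name  # later facts shadow earlier ones, like assignments to a variable
    return name


log = [NameFact(1, 7, "Ms. Smith")]
log.append(NameFact(2, 7, "Ms. Jones"))  # nothing is overwritten; a new fact is appended

# Two readers that have seen different prefixes of the log disagree about "teacher 7".
print(current_name(log, 7, upto=1))  # Ms. Smith
print(current_name(log, 7))          # Ms. Jones
```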

can you elaborate?

"Accountants Don’t Use Erasers." I was about to say 'scientists too', but they do use chalkboard erasers... only when cons cells aren't reachable anymore, though.

And note that student eyes are only weak references.

Does anyone have any idea of the date of the paper? I couldn't find it anywhere, and it seems like an important piece. I'm guessing "immutability changes everything" would mean something different if it had been said 20 years ago, for example.

It references a paper from 2014-2015, so it's at least that recent, but I still think it would be easier if the date were shown at the top or something.

On the bottom left of the first page:

"7th Biennial Conference on Innovative Data Systems Research (CIDR’15) January 4-7, 2015, Asilomar, California, USA."

He's been working on these ideas for quite some time. Here's a talk of the same name at RICON 2012: http://vimeo.com/52831373

Having said that, this is a pet peeve of mine, too. I get frustrated by technical papers with no dates.

I always rely on the copyright field of the paper for dating. Since these papers go into proceedings, the templates don't provide for (or allow for) obvious dating fields.

Content-addressable storage will become more and more important as immutability becomes more commonplace (check out https://camlistore.org/). We will also want to use the proper tools to manage the new immutable paradigm... here's looking at immutability in programming languages.
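For anyone who hasn't seen content addressing before, a minimal in-memory sketch of the idea: a blob's key is a hash of its bytes (SHA-256 here, as an assumption). This is a generic illustration, not Camlistore's API. Identical content always gets the same key, any change produces a new key, and the key doubles as a checksum, which is why it pairs so naturally with immutable data.

```python
import hashlib


class ContentStore:
    """Minimal in-memory content-addressable store: a blob's key is the SHA-256 of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # same bytes give the same key, so re-storing is a no-op
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The address doubles as a checksum: a corrupted or substituted blob is detectable.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("blob does not match its address")
        return data


store = ContentStore()
k1 = store.put(b"hello")
k2 = store.put(b"hello")
print(k1 == k2)                    # True: identical content is stored once
print(store.get(k1))               # b'hello'
print(k1 == store.put(b"hello!"))  # False: any change yields a new address
```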

I don't propose a change to the title on HN, but the linked PDF is an informal paper that's about immutability in databases.

Some other comments seem to be positive. I wanted to like this paper but did not. Forgive me if I sound critical, but the style and content of this document don't compare favorably with academic or technical papers. There is some interesting content, certainly. Perhaps I am not understanding the audience or intent? Immutability is certainly important in my book, and I don't think this paper lives up to its potential. This could be a home run if done properly.

== Iffy Content ==

A. The document sometimes mixes technical content with opinion, but does not clearly separate the two. I'm happy to read normative articles in the right context, but I don't think the content here fits the format. For example, "There is an inexorable trend towards storing and sending immutable data." Many of us want to believe that, but is it actually true? I don't see any support or reference for this statement.

B. The technical content sometimes gets muddled. For example, "When storing immutable data within a consistent hash ring, you cannot get stale versions of the data. Each block stored has the only version it will ever have!" I have several concerns with this sentence. First, it is unnecessarily specific. You don't need to use consistent hashing to get the property described; there are other ways. Second, even with immutability, availability is not guaranteed; although you may not get 'different' versions of the data, you may get no data at all. Third, and most importantly, even with versioning, as mentioned in heading 6, you have significant coordination challenges. Immutable systems do simplify coordination, but not enough to justify the author's statement.
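The first two concerns in point B can be seen in a few lines, assuming blocks are addressed by their SHA-256 hash and replicas are plain dicts, with None standing in for an unreachable replica (address and fetch are hypothetical helpers, and no consistent hashing is involved). Any copy that verifies against its address is the only version that block will ever have, so staleness of the block itself is impossible; but if no replica is reachable you still get nothing back.

```python
import hashlib
from typing import Dict, List, Optional


def address(block: bytes) -> str:
    """Content address of an immutable block (SHA-256 here, as an assumption)."""
    return hashlib.sha256(block).hexdigest()


def fetch(replicas: List[Optional[Dict[str, bytes]]], key: str) -> Optional[bytes]:
    """Try each replica; None in the list stands for a replica that is down or partitioned away."""
    for replica in replicas:
        if replica is None:
            continue
        block = replica.get(key)
        if block is not None and address(block) == key:
            return block  # any verified copy is the only version that block will ever have
    return None  # no reachable copy: immutability does not buy availability


block = b"immutable block"
key = address(block)
print(fetch([None, {key: block}], key))  # b'immutable block'
print(fetch([None, None], key))          # None
```

Note that this says nothing about deciding which block is the latest version of something, which is the coordination problem raised in the third concern.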

C. The terminology is sometimes confusing. For example, the use of the "DataSet" concept confused me more than it helped. I find it to be an unnecessary distinction that did not add clarity. (Not to mention that the term is used well before it is defined.) How is a "DataSet" different from a "data set", exactly? After reading back and forth, I struggle to understand why a new term was needed.

== Questionable Style ==

A. The content reads more like a slide deck than a paper.

B. The transitions between different sections are choppy.

C. The writing is informal, and that is putting it charitably. One section is titled "Normalization Is for Sissies". Trust me, I like bad jokes and puns -- I just don't think they fit the format. "Hard Disks: Getting the Shingles"? Why stop there? Why not add "Yes, ladies and gentlemen, I'll be here all night!"

D. What is the purpose of the gray callout boxes with the exclamation-pointed sentences? Stylistically, they seem out of place. They make the paper look more like a badly formatted dinner menu or fundraising letter and less like a technical paper. For example, "High availability of immutable blocks is available now! Google, Amazon, Facebook, Yahoo, Microsoft, and more keep petabytes and exabytes of immutable data!"

== Summary ==

I would not hold this up as an example for people to read or learn from. I really don't like being this critical, but I feel like it is important for me to say something. I think papers, especially in this area, should meet a certain bar. I've tried to offer constructive criticism.

I hate to say it, but for a minute I wondered if this paper might be tongue-in-cheek. The graphic with the caption, "Fill out Part 3 and keep the goldenrod page from the back", is sufficiently bad that it is funny. (No offense intended.)

The CAP theorem only applies when dealing with mutable shared state. That's reason enough to go immutable.

I'm not sure I would characterize it that way. Even with immutable data you can be uncertain whether a value is stored on a system on the other side of a partition. You're only confident that if you can see the data, it hasn't changed somewhere else (which, don't get me wrong, is a tremendously valuable property).

When you define things in terms of immutable structures, the worst that can happen is that you get an _old_ answer, but not an incorrect one. The old answer comes back with a timestamp: "this is the answer to your question, as of (time)".
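A sketch of the "answer as of (time)" idea, assuming each replica holds a prefix of one totally ordered, append-only log with logical timestamps (Event and answer are hypothetical names). A replica cut off by a partition returns an older value, but paired with the timestamp it is still a true statement about the log. As noted above in the thread, this doesn't tell the reader whether newer data exists on the other side of the partition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    ts: int      # logical timestamp of the append
    key: str
    value: str


def answer(log: List[Event], key: str) -> Tuple[Optional[str], int]:
    """Return (latest value of key, as_of), where as_of is the newest timestamp this replica has seen."""
    value, as_of = None, 0
    for ev in sorted(log, key=lambda e: e.ts):
        as_of = ev.ts
        if ev.key == key:
            value = ev.value
    return value, as_of


full = [Event(1, "A", "x"), Event(2, "A", "y")]
stale = full[:1]  # a replica on the far side of a partition holds only a prefix of the log
print(answer(full, "A"))   # ('y', 2)
print(answer(stale, "A"))  # ('x', 1): older, but still true as of timestamp 1
```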

The CAP theorem means you either give _inconsistent_ answers (what is the state now? it's X) to two different nodes, or you sacrifice availability.

That's a really interesting thought, could you provide some sources?


This is the paper in which the theorem was formalized. They talk in terms of 'the initial value' of an atomic object and its 'subsequent values', so this only makes sense when the system is _implemented_ in terms of objects whose values change over time as the result of operations:

--- More formally, let v0 be the initial value of the atomic object. Let α1 be the prefix of an execution of A in which a single write of a value not equal to v0 occurs in G1, e ---

If incoming requests are expressed as "give me the most recent value of A," and the response comes back with a history of all operations done on A along with timestamps, then you're no longer bound by the CAP theorem, because 'an operation was missing' is no longer a _problem_; it's just an outdated answer.
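A sketch of the "return the history, not the value" approach, assuming each operation is an immutable (timestamp, replica_id, value) tuple (a hypothetical format) and a history is a set of them. This is essentially a grow-only set, swapped in here as an illustration rather than taken from the paper: merging two histories is plain set union, which is commutative, associative and idempotent, so a node that missed an operation just holds an older history until it exchanges with someone who has it. It doesn't show that a reader can know its history is complete.

```python
from typing import FrozenSet, Optional, Tuple

Op = Tuple[int, str, str]  # hypothetical format: (timestamp, replica_id, value)


def merge(h1: FrozenSet[Op], h2: FrozenSet[Op]) -> FrozenSet[Op]:
    """Histories only ever grow, so combining two of them is plain set union."""
    return h1 | h2


def most_recent(history: FrozenSet[Op]) -> Optional[str]:
    """Latest value in a history; equal timestamps are broken by replica_id, so every node agrees."""
    if not history:
        return None
    return max(history)[2]


left = frozenset({(1, "n1", "x")})                  # this node missed the second write
right = frozenset({(1, "n1", "x"), (2, "n2", "y")})
print(most_recent(left))                           # x: outdated, but true of the history it came from
print(most_recent(merge(left, right)))             # y
print(merge(left, right) == merge(right, left))    # True: order of merging doesn't matter
```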

Are you saying CAP doesn't apply to immutable shared state, or something else, i.e., what do you mean by "state"?

Yes, I'm saying that the CAP theorem only applies when you have shared state storing mutable objects that change over time. If there is an append-only record, the CAP theorem, as formalized here (http://dl.acm.org/citation.cfm?id=564601), doesn't apply.
