
What's been wrought using the Piece Table? (2014) - punnerud
https://web.archive.org/web/20160308183811/http://1017.songtrellisopml.com/whatsbeenwroughtusingpiecetables
======
joeax
I was tasked with writing an undo/redo framework for an existing client app a
few years back. It's basically an HMI system (a tool typically used in
manufacturing to visually model an industrial system) that used SVG+HTML. I
could have used a piece table concept (didn't know it was called that),
storing a diff fragment of what was changed (using a library like
diff-match-patch). Unfortunately, more than a text representation needed to be
maintained, so it didn't work initially.

I wound up using a stack-based architecture that pushed each operation, stored
as a function, onto a main stack and its reverse operation onto an undo stack.
Once I worked out the state management problems it worked out quite well.
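A minimal sketch of one common way to arrange "each operation plus its reverse", assuming a pair of stacks (undo and redo) holding (do, undo) function pairs. `UndoManager` and the dict standing in for the SVG scene are hypothetical; this is not necessarily how the original framework was built:

```python
from typing import Callable, List, Tuple

# Each history entry pairs an action with the function that reverses it.
Command = Tuple[Callable[[], object], Callable[[], object]]


class UndoManager:
    """Two-stack undo/redo over (do, undo) function pairs."""

    def __init__(self) -> None:
        self.undo_stack: List[Command] = []  # commands that can be undone
        self.redo_stack: List[Command] = []  # undone commands, available to redo

    def execute(self, do: Callable[[], object], undo: Callable[[], object]) -> None:
        do()
        self.undo_stack.append((do, undo))
        self.redo_stack.clear()  # a new action discards the redo branch

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        command = self.undo_stack.pop()
        command[1]()  # run the reverse operation
        self.redo_stack.append(command)
        return True

    def redo(self) -> bool:
        if not self.redo_stack:
            return False
        command = self.redo_stack.pop()
        command[0]()  # re-run the original operation
        self.undo_stack.append(command)
        return True


# Usage with a dict standing in for a hypothetical SVG scene model.
doc = {}
manager = UndoManager()
manager.execute(lambda: doc.__setitem__("rect1", "red"), lambda: doc.pop("rect1"))
manager.execute(lambda: doc.__setitem__("rect1", "blue"),
                lambda: doc.__setitem__("rect1", "red"))
manager.undo()
print(doc)  # {'rect1': 'red'}
manager.redo()
print(doc)  # {'rect1': 'blue'}
```

The hard part, as the comment says, is state management: every reverse function has to restore exactly what its forward function changed.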

I've always wondered what approaches others have taken when tasked with a
similar problem.

~~~
westoncb
Anyone know if there is a best-practice immutable data structure variation
here? The way I've thought about it is that if you store the entire sequence
of operations which have occurred so far, and you have some function that can
re-build the current state from that list, then you don't need any inverse
operations. This means that undoing an operation is just removing the last
operation performed from the operation list, and then rebuilding the state.

This seems nice since you don't have to figure out inverse operations (which
I've seen get tricky), but I imagine the performance penalty of having to
store all operations and rebuild the entire state every time could be
problematic in some cases.
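The replay approach can be sketched like this, assuming a hypothetical text-editing log of `("insert", pos, text)` and `("delete", pos, length)` tuples. Undo just moves the last operation onto a redo list and rebuilds from the initial state, so no inverse operations are needed, at the cost of replaying the whole log on every step:

```python
from typing import List, Tuple, Union

# Hypothetical log entries: ("insert", pos, text) or ("delete", pos, length).
Op = Tuple[str, int, Union[str, int]]


def apply_op(text: str, op: Op) -> str:
    kind, pos, arg = op
    if kind == "insert":
        return text[:pos] + str(arg) + text[pos:]
    if kind == "delete":
        return text[:pos] + text[pos + int(arg):]
    raise ValueError(f"unknown operation {kind!r}")


class ReplayHistory:
    def __init__(self, initial: str) -> None:
        self.initial = initial
        self.ops: List[Op] = []     # everything applied so far
        self.undone: List[Op] = []  # operations available to redo

    def current(self) -> str:
        # O(len(ops)) on every call: the performance penalty in question.
        text = self.initial
        for op in self.ops:
            text = apply_op(text, op)
        return text

    def do(self, op: Op) -> str:
        self.ops.append(op)
        self.undone.clear()
        return self.current()

    def undo(self) -> str:
        if self.ops:
            self.undone.append(self.ops.pop())
        return self.current()

    def redo(self) -> str:
        if self.undone:
            self.ops.append(self.undone.pop())
        return self.current()


h = ReplayHistory("hello")
print(h.do(("insert", 5, " world")))  # hello world
print(h.do(("delete", 0, 1)))          # ello world
print(h.undo())                        # hello world
```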

Then again, I imagine there are good standard optimizations for dealing with
this kind of thing. One thing that occurs to me is that operations which
'cancel' each other could be sought out and eliminated before rebuilding the
state.
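The cancellation idea can be sketched as a peephole pass over the same kind of hypothetical log: an insert immediately followed by a delete of exactly that text at the same position is a no-op, so both entries can be dropped. Using the output list as a stack lets nested pairs cancel as well. It is deliberately conservative: it does not catch a delete followed by re-insertion, because this delete tuple does not record the removed text.

```python
from typing import List, Tuple, Union

# Hypothetical log entries: ("insert", pos, text) or ("delete", pos, length).
Op = Tuple[str, int, Union[str, int]]


def cancel_adjacent(ops: List[Op]) -> List[Op]:
    """Remove insert/delete pairs that undo each other.

    A pair cancels when an insert of `text` at `pos` is immediately followed
    (after earlier cancellations) by a delete of len(text) characters at `pos`.
    """
    out: List[Op] = []
    for op in ops:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "insert" and op[0] == "delete"
                and prev[1] == op[1] and len(str(prev[2])) == op[2]):
            out.pop()  # the pair is a no-op; keep neither
        else:
            out.append(op)
    return out


log = [("insert", 0, "a"), ("insert", 1, "b"), ("delete", 1, 1),
       ("delete", 0, 1), ("insert", 0, "x")]
print(cancel_adjacent(log))  # [('insert', 0, 'x')]
```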

I'd be glad for any more information on the subject!

Edit: more specifically I'm wondering if those optimizations _do_ in fact
exist, and if so what they are.

~~~
rawnlq
Do you mean
[https://en.wikipedia.org/wiki/Persistent_data_structure](https://en.wikipedia.org/wiki/Persistent_data_structure)
or something more specific for this case? It's known how to only incur an O(1)
cost in time and space for making a data structure persistent. So then you
just need to keep a reference to all past versions of the data structure you
might ever want to undo back to.

EDIT: actually reading the wiki more carefully, I think they were talking
about making a BST persistent specifically rather than any data structure. In
that case looking at implementations such as rrb-vectors might be more
interesting: [https://github.com/clojure/core.rrb-vector](https://github.com/clojure/core.rrb-vector)
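Keeping a reference to every past version looks like this with a persistent binary search tree built by path copying. This is the simple technique, allocating O(depth) new nodes per insert, not the O(1)-overhead technique mentioned above; `Node`, `insert`, and `keys` are illustrative names, and the tree is unbalanced for brevity:

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(root: Optional[Node], key: int) -> Node:
    """Path copying: only nodes on the root-to-leaf path are new; the rest is shared."""
    if root is None:
        return Node(key, None, None)
    if key < root.key:
        return root._replace(left=insert(root.left, key))
    if key > root.key:
        return root._replace(right=insert(root.right, key))
    return root  # key already present: return this subtree unchanged


def keys(root: Optional[Node]) -> List[int]:
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)


versions: List[Optional[Node]] = [None]  # version 0 is the empty tree
for k in [5, 2, 8, 1]:
    versions.append(insert(versions[-1], k))

print(keys(versions[-1]))  # [1, 2, 5, 8]
print(keys(versions[2]))   # [2, 5]
```

Undoing back to any version is then just reading an older entry of `versions`.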

~~~
westoncb
That wikipedia article looks interesting. I guess the only thing more specific
I'm wondering about is persistent data structures for undo/redo systems. But
just knowing the name 'persistent data structure' is useful.

------
kornish
For context, this was almost certainly submitted because of a recent
submission about data structures for use in text editors.

If anyone wants to check out the comments:
[https://news.ycombinator.com/item?id=15381886](https://news.ycombinator.com/item?id=15381886).

~~~
punnerud
True, but the original post only briefly talks about the algorithm at the end
(when I was almost on my way out of the article). The link I posted was
mentioned in between as a cool read, but I feel it deserves more credit and
readers.

~~~
kornish
Absolutely. Thanks for posting!

------
projektfu
Just mentioning that "unlimited undo/redo" meant that it was really fast to
undo and redo one step in Word. This system didn't necessarily enable an undo
history like we expect from word processors now. In fact, up to Word 5.1a for
the Mac and Word for Windows 2.0, Word didn't have undo history. Undo would
undo the last action, and then it would undo the undo. I believe undo history
was added to Word 6.0.

------
rosstex
Fun fact: UC Berkeley's intro course on Data Structures used a Text Editor for
the massive project of the semester in Spring 2016. (Source: I wrote the
autograder.)

We suggested implementing the undo/redo feature as a stack.

[http://datastructur.es/sp16/materials/proj/proj2/proj2.html](http://datastructur.es/sp16/materials/proj/proj2/proj2.html)

------
hangonhn
Wow. The inventor of the Piece Table is none other than Moore of the
Boyer-Moore string search algorithm:

[https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_sea...](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm)

~~~
karmakaze
Also related to both strings and piece tables is the rope data structure for
concatenating strings.

------
e98cuenc
A few years ago (2003, time flies) I wrote a piece table with a twist: instead
of using a linked list, I used a tree. I put a description of the whole
algorithm here:

[http://e98cuenc.free.fr/wordprocessor/piecetable.html](http://e98cuenc.free.fr/wordprocessor/piecetable.html)

A few years later it was integrated into AbiWord.
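For readers new to the structure, a minimal piece table sketch: the original text is never modified, inserted text goes into an append-only "add" buffer, and the document is a sequence of pieces pointing into either buffer. It keeps the pieces in a plain Python list with a linear scan; the tree variant described above stores them in a tree so that finding the piece for a position takes O(log n). Deletion is omitted for brevity, and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    source: str  # "original" or "add": which buffer the span points into
    start: int
    length: int


class PieceTable:
    def __init__(self, text: str) -> None:
        self.buffers = {"original": text, "add": ""}
        self.pieces: List[Piece] = [Piece("original", 0, len(text))] if text else []

    def insert(self, pos: int, text: str) -> None:
        new = Piece("add", len(self.buffers["add"]), len(text))
        self.buffers["add"] += text  # the add buffer is only ever appended to
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                # Split the piece containing pos and put the new piece between halves.
                cut = pos - offset
                left = Piece(p.source, p.start, cut)
                right = Piece(p.source, p.start + cut, p.length - cut)
                self.pieces[i:i + 1] = [q for q in (left, new, right) if q.length]
                return
            offset += p.length
        self.pieces.append(new)  # empty document (assumes 0 <= pos <= length)

    def text(self) -> str:
        return "".join(self.buffers[p.source][p.start:p.start + p.length]
                       for p in self.pieces)


pt = PieceTable("hello world")
pt.insert(5, ",")
pt.insert(12, "!")
print(pt.text())  # hello, world!
```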

There was a major mistake in the profiling. Each dot in the graphs was the
best of 3 measurements. At the time I thought that was the fairest way to
filter out outliers caused by concurrent processes. But repeating the run also
warms the data in the cache, which makes the best measurement unrealistically
fast.

------
tudorw
If you're interested in this kind of structure it's worth taking a look at
'transclusion', devised by Ted Nelson in the mid-1960s:
[https://en.wikipedia.org/wiki/Transclusion](https://en.wikipedia.org/wiki/Transclusion)

------
rhinoceraptor
Is fast save the (in)famous Word file format where the in-memory state is
written to a file?

~~~
duncan_bayne
Yes. And, as I assume you're alluding to, the file could include state you
might not intend to share, alongside the current state of the document.

~~~
kobeya
Saving internal memory state to a file is such a terrible idea, who would do
that!

 _Proceeds to go write a program to manipulate memory-mapped files._

~~~
rhinoceraptor
That’s what Emacs (and AFAIK, some other interactive lisp environments) do to
speed up boot time.

~~~
p4bl0
The famous unexec function…

------
Sammi
The writer of the article wrote more details in a reddit discussion on the
article three years ago:
[https://www.reddit.com/r/programming/comments/22fpz0/whats_b...](https://www.reddit.com/r/programming/comments/22fpz0/whats_been_wrought_using_the_piece_table_how/)

Found it in the wikipedia footnotes :)

------
frandroid
How does a piece table differ from a linked list?

~~~
ithkuil
More similar to a rope/cord:
[https://en.m.wikipedia.org/wiki/Rope_(data_structure)](https://en.m.wikipedia.org/wiki/Rope_\(data_structure\))

------
cwt137
Is this in any way related to Operational Transformation?

------
kristianp
There are some other pages about MS Word here too:
[https://web.archive.org/web/20150822081514/http://1017.songt...](https://web.archive.org/web/20150822081514/http://1017.songtrellisopml.com:80/)

------
infogulch
What's the difference between this and ropes? They seem conceptually similar.

~~~
terminalcommand
Ropes are inherently a binary tree. From what I can understand, you can
implement piece tables either as a doubly linked list or as a B-tree.
Furthermore in ropes every leaf spans one character, while in piece tables
pieces have different lengths.

Furthermore with ropes, every operation returns a new rope data structure,
that means ropes are immutable. If you want to implement undo, you only keep
references to the former ropes. Undo doesn't involve re-applying inserts or
deletions looked up in an undo list.

I think maybe if you implemented a piece table, where every piece spans one
character, used a binary tree and made it immutable, you'd get a ropes data
structure.

For more information about persistence vs immutability this link might help:
[https://stackoverflow.com/questions/10034537/persistent-vs-i...](https://stackoverflow.com/questions/10034537/persistent-vs-immutable-data-structure)

~~~
deathanatos
> _Furthermore in ropes every leaf spans one character,_

The Boehm paper on ropes[1] (which is about the _only_ academic literature on
the subject that I know of) does not do that (and explicitly suggests one
should not), nor does any real-world implementation (e.g., the Boehm
implementation or SGI's). It would be incredibly inefficient, for no real
gain.

A good rope implementation will store an array of characters/bytes in the
leaves, up to some threshold.
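A sketch of that chunked-leaf layout: the rope is built by splitting the text until each leaf holds at most `LEAF_MAX` characters, and each concat node stores the length of its left subtree so an index lookup walks a single root-to-leaf path. The threshold value and the names are assumptions, and the nodes are frozen only for brevity, not because ropes must be immutable:

```python
from dataclasses import dataclass
from typing import Union

LEAF_MAX = 8  # hypothetical threshold; real implementations use larger chunks


@dataclass(frozen=True)
class Leaf:
    text: str


@dataclass(frozen=True)
class Concat:
    left: "Rope"
    right: "Rope"
    weight: int  # length of the left subtree, used to route index lookups


Rope = Union[Leaf, Concat]


def build(s: str) -> Rope:
    """Split into leaves of up to LEAF_MAX characters, not one character each."""
    if len(s) <= LEAF_MAX:
        return Leaf(s)
    mid = len(s) // 2
    return Concat(build(s[:mid]), build(s[mid:]), mid)


def length(r: Rope) -> int:
    return len(r.text) if isinstance(r, Leaf) else r.weight + length(r.right)


def index(r: Rope, i: int) -> str:
    while isinstance(r, Concat):
        if i < r.weight:
            r = r.left
        else:
            i -= r.weight
            r = r.right
    return r.text[i]


def to_str(r: Rope) -> str:
    return r.text if isinstance(r, Leaf) else to_str(r.left) + to_str(r.right)


r = build("the quick brown fox")
print(index(r, 4))  # q
```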

> _Furthermore with ropes, every operation returns a new rope data structure,
> that means ropes are immutable._

There is nothing inherently immutable about ropes, and it is certainly
possible to mutate a rope. (For example, appending a single character to a
leaf with space is much quicker if the rope as a whole is mutable.)

Look at the SGI implementation for an example here; their reference docs[2]
contain sufficient details to see that the rope itself is mutable.

Now, a rope typically just describes a sequence of characters. It is entirely
possible for a leaf node in a specialized rope to reference on-disk content,
and other leaves to reference in-memory content. In that regard, it can be
like a piece table. Implementing undo/redo on top of a rope is less
straightforward: whereas a piece table's undo history can essentially share
large portions of the linked list (see the lovely diagrams here[3]), I'm not
sure the same can be done with a rope's tree, because the tree needs to balance
and re-balance. That is, not without copying the tree, and copying defeats a
lot of the benefits that a piece table has: not needing to copy the entire
"file", even if that's just a bunch of spans.
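One simple way to get that kind of sharing, sketched under assumptions: the pieces form an immutable singly linked chain over one append-only buffer, each edit copies only the nodes before the edit point, and the undo history is just a list of chain heads. The linked tutorial describes its own, different arrangement; `PieceChain` and `insert_piece` are hypothetical names, and deletion is omitted:

```python
from typing import List, NamedTuple, Optional


class Piece(NamedTuple):
    start: int   # offset into the shared append-only buffer
    length: int


class Node(NamedTuple):
    piece: Piece
    next: Optional["Node"]


def insert_piece(head: Optional[Node], pos: int, new: Piece) -> Node:
    """Return a new chain with `new` at `pos`; nodes after the edit are shared."""
    if head is None:
        return Node(new, None)  # end of chain (assumes pos was in range)
    p = head.piece
    if pos <= p.length:
        left = Piece(p.start, pos)
        right = Piece(p.start + pos, p.length - pos)
        rest = head.next if right.length == 0 else Node(right, head.next)
        rest = Node(new, rest)
        return rest if left.length == 0 else Node(left, rest)
    # Edit point lies further on: copy this node and recurse into the tail.
    return Node(p, insert_piece(head.next, pos - p.length, new))


class PieceChain:
    def __init__(self, original: str) -> None:
        self.buffer = original  # append-only, so old pieces stay valid forever
        first = Node(Piece(0, len(original)), None) if original else None
        self.versions: List[Optional[Node]] = [first]  # one chain head per version
        self.current = 0

    def insert(self, pos: int, text: str) -> None:
        new = Piece(len(self.buffer), len(text))
        self.buffer += text
        del self.versions[self.current + 1:]  # a new edit drops the redo branch
        self.versions.append(insert_piece(self.versions[self.current], pos, new))
        self.current += 1

    def undo(self) -> None:
        self.current = max(0, self.current - 1)

    def redo(self) -> None:
        self.current = min(len(self.versions) - 1, self.current + 1)

    def text(self) -> str:
        parts, node = [], self.versions[self.current]
        while node is not None:
            parts.append(self.buffer[node.piece.start:node.piece.start + node.piece.length])
            node = node.next
        return "".join(parts)


c = PieceChain("hello world")
c.insert(5, ",")
c.insert(0, ">> ")
print(c.text())  # >> hello, world
c.undo()
print(c.text())  # hello, world
```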

Now, one could make the leaves in a rope ref-counted (so they could be shared)
and then copy the Concat nodes for undo/redo levels. Keeping the concat nodes
as copies means the copies can balance independently, and since concat nodes
are small (~2 pointers for left/right and a depth, IIRC) that isn't _too_ much
copying (the bulk of the data, the text, is refcounted in the leaves). But the
simplicity of the piece table really starts to shine at this point.
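The ref-counted-leaves idea can be sketched with path copying: inserting into a rope allocates new concat nodes only along the path to the edited leaf, and every other leaf is shared between versions. Python's own reference counting plays the role of the refcounted leaves here. In this sketch concat nodes carry a left-subtree weight rather than a depth, and rebalancing and leaf splitting are omitted, so it shows only the sharing, not a complete rope:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str


@dataclass(frozen=True)
class Concat:
    left: "Rope"
    right: "Rope"
    weight: int  # length of the left subtree


Rope = Union[Leaf, Concat]


def insert(r: Rope, i: int, s: str) -> Rope:
    """Copy only the nodes on the path to position i; other subtrees are shared."""
    if isinstance(r, Leaf):
        return Leaf(r.text[:i] + s + r.text[i:])
    if i <= r.weight:
        return Concat(insert(r.left, i, s), r.right, r.weight + len(s))
    return Concat(r.left, insert(r.right, i - r.weight, s), r.weight)


def to_str(r: Rope) -> str:
    return r.text if isinstance(r, Leaf) else to_str(r.left) + to_str(r.right)


v0 = Concat(Leaf("hello "), Leaf("world"), 6)
v1 = insert(v0, 11, "!")
print(to_str(v0), "|", to_str(v1))  # hello world | hello world!
print(v1.left is v0.left)             # True
```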

[1]:
[http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9...](http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf)

[2]:
[http://www.sgi.com/tech/stl/Rope.html](http://www.sgi.com/tech/stl/Rope.html)

[3]: [http://www.catch22.net/tuts/piece-chains](http://www.catch22.net/tuts/piece-chains)

~~~
terminalcommand
I stand corrected, thanks for the thorough answer.

My interest in ropes arose from the "data structures for text editors" post
here on HN. Most resources were Haskell implementations. IBM developerWorks
had an article about ropes and a corresponding Java library; the article
stated that ropes were immutable.
[https://www.ibm.com/developerworks/library/j-ropes/index.htm...](https://www.ibm.com/developerworks/library/j-ropes/index.html)

I admit I was wrong about every node storing a single character; that
misconception stems from a graph representing ropes that I saw in yesterday's
article.

Undoing with immutable ropes is very straightforward, I think. You don't copy
the whole file, just the references. I admit it is heavier on memory, but you
could store an arbitrary number of previous "states" or versions in RAM. The
benchmark for the Java library may support this point.

I will try to read the original article to gain more insight. Thanks again for
the resources.

------
mcguire
Is there nothing that J Strother Moore hasn't touched?

------
anovikov
Back in 2000, I was hired to write an app that recovered damaged Word
documents, and I had to experience it firsthand. It impressed me as well!

------
bitmapbrother
The title makes it seem like it was exclusive to Word and developed by
Microsoft when in reality it was developed at Xerox and used by their Bravo
word processor.

~~~
dang
That was the rewritten title (see
[https://news.ycombinator.com/item?id=15388593](https://news.ycombinator.com/item?id=15388593)).
We've reverted it to the original now.

~~~
punnerud
Good point, thanks. I agree with you. I will try to remember that next time an
article makes the front page.

------
yuhong
The history of FullWrite is slightly wrong. Wikipedia has a better one:
[https://en.wikipedia.org/wiki/FullWrite_Professional](https://en.wikipedia.org/wiki/FullWrite_Professional)

(1988 is also around the time of a famous DRAM shortage BTW)

------
nerdponx
What was wrong with the original title? It was "What's been wrought using the
Piece Table?"

~~~
JadeNB
It seems a bit click-baity ("well, what _has_ been wrought using it?"). I
suppose that the current one is arguably still a bit so (it could be "How the
piece table enabled unlimited undo …"), but it seems better to me.

~~~
yodon
Hmmm... my guess is you're probably not making a whole lot off those ads on
your website if "What's been wrought using the Piece Table?" strikes you as
click baity.

~~~
always_good
They're referring to the format of "click to find out!" when it can be
mentioned right there in the title.

That nobody is going "oh boy, so what _has_ been wrought using the piece
table?" is part of their point.

~~~
punnerud
Click to find out, as opposed to the usual read-the-whole-article-to-find-out?
You are given a new title when you enter the site. Those two combined give you
more than enough information to figure out whether you already know it. And if
you already knew that Word uses the algorithm, you would probably have just
skipped it, or opened the link to verify that it was this algorithm. Given all
this, do you mean I should have kept the original title?

------
bernadus_edwin
Nowadays we call it Redux.

