
A Brief Glance at How Various Text Editors Manage Their Textual Data (2015) - yankcrime
https://ecc-comp.blogspot.com/2015/05/a-brief-glance-at-how-5-text-editors.html
======
userbinator
_This is very efficient, but the consequence is Vi will become slow when
working with huge files, because it has to traverse through a bigger linear
array_

To put "bigger" into perspective, a modern CPU can traverse and copy memory at
over 10GB/s.

This is why I've always found the argument for more "efficient" (and complex)
text editor data structures a bit tenuous --- even if you have to move MBs of
data with every insertion (as happens with a simple gapless buffer), computers
truly are so fast that it wouldn't look any different to the end-user; a 1ms
and 1us delay upon each keystroke is, to the user, practically
indistinguishable.
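
To make the cost concrete, here is a minimal sketch of the "simple gapless buffer" case. The `FlatBuffer` class and its method names are made up for illustration and are not taken from any editor. Taking the 10GB/s figure above at face value, shifting 4 MiB would cost roughly 4194304 / 10^10 s, or about 0.4 ms.

```python
class FlatBuffer:
    """Text held in one contiguous bytearray, with no gap (hypothetical sketch)."""

    def __init__(self, text: bytes = b"") -> None:
        self.data = bytearray(text)

    def insert(self, pos: int, text: bytes) -> int:
        """Insert text at pos and return how many existing bytes had to shift."""
        if not 0 <= pos <= len(self.data):
            raise IndexError("position out of range")
        shifted = len(self.data) - pos
        # Zero-width slice assignment: everything after pos moves right.
        self.data[pos:pos] = text
        return shifted


buf = FlatBuffer(b"x" * (4 * 1024 * 1024))  # stand-in for a 4 MiB file
moved = buf.insert(0, b"a")                 # worst case: typing at the very start
```

Typing at the end of the buffer shifts nothing, so the worst case only applies to edits near the start of a large file.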

That's not to say I'm one of those who preach against "premature optimisation"
and don't care about efficiency; far from that, in fact, but the popularity of
and lack of speed-related complaints against the small DOS text editors and
even syntax-highlighting IDEs on PCs in the 80s through early 90s which used
the same "one buffer" paradigm, on machines with a fraction of the memory
bandwidth of those today, suggest that the complexity of more "clever" data
structures may not be worth it.

...and yet we somehow still manage to make editors that peg a single CPU core
just blinking a cursor.[1]

[1]
[https://news.ycombinator.com/item?id=13940014](https://news.ycombinator.com/item?id=13940014)

~~~
kqr
> computers truly are so fast that it wouldn't look any different to the
> end-user; a 1ms and 1us delay upon each keystroke is, to the user, practically
> indistinguishable.

Unless, of course, the 1 ms operation drains the user's battery 1000x faster
than the 1 us operation.

Efficiency is no longer about speed, it's about noise, temperature, and
battery life. Speed is a happy side effect.

------
ggm
If you've never interacted with the file saved state of an editor which saves
its edit history tree, it's a revelation when you try the UNIX _strings_
command on one. It took me a few minutes to realise insert and moves were
_not_ the same, because the structure followed insert order, to permit
infinite-undo. The actual text position was a subsidiary location tagged on
the string.
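
A rough sketch of that idea, assuming a flat list of insert records rather than a full history tree. `EditLog` and its methods are hypothetical names, not the on-disk format of any particular editor. Strings are stored in the order they were typed, each tagged with the position it went to, so undo is just dropping the newest record.

```python
class EditLog:
    """Append-only log of inserts, kept in typing order (hypothetical sketch)."""

    def __init__(self) -> None:
        self.log: list[tuple[int, str]] = []  # (position, inserted string)

    def insert(self, pos: int, text: str) -> None:
        # The position is only a tag on the record; records stay in typing order.
        self.log.append((pos, text))

    def undo(self) -> None:
        # Undo forgets the most recent record; older ones are untouched.
        if self.log:
            self.log.pop()

    def text(self) -> str:
        # Replay every record to rebuild the document in reading order.
        out = ""
        for pos, s in self.log:
            out = out[:pos] + s + out[pos:]
        return out


log = EditLog()
log.insert(0, "world")
log.insert(0, "hello ")            # typed second, but lands first in the text
stored = [s for _, s in log.log]   # typing order, not reading order
```

Dumping `stored` is a rough analogue of what running _strings_ on such a file would show: typing order rather than document order.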

I think everyone should be asked to write their own simple editor, and then
for extension try to convert from the obvious simple tropes in arrays to one
which begins to do what these editors do.

~~~
greglindahl
Or you could implement a gap buffer, and then your complicated editor is using
a simple trope.

Emacs is an example of a full-featured editor based on a gap buffer.
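
For anyone who hasn't met one, a minimal gap buffer sketch follows. The class, method names and growth policy are assumptions made for illustration; this is not how Emacs itself is written. The free space sits at the cursor, so inserting there is cheap and only cursor movement copies characters.

```python
class GapBuffer:
    """Text stored in one list with a movable run of free slots (the gap)."""

    def __init__(self, text: str = "", gap: int = 16) -> None:
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)    # first free slot
        self.gap_end = len(self.buf)  # one past the last free slot

    def __len__(self) -> int:
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos: int) -> None:
        # Slide characters across the gap until the gap begins at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            # Gap too small: open up more free slots at its end.
            extra = max(len(text), len(self.buf))
            self.buf[self.gap_end:self.gap_end] = [None] * extra
            self.gap_end += extra
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos: int, count: int) -> None:
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError("range out of bounds")
        self._move_gap(pos)
        self.gap_end += count  # deleted characters simply join the gap

    def text(self) -> str:
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


g = GapBuffer("hello world", gap=2)
g.insert(5, ",")
g.insert(0, ">> ")
g.delete(0, 3)
```

Deletion never copies text in this sketch: it just widens the gap over the removed characters.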

~~~
ggm
Sure. You could leap to the best solution first. Probably, you lucked out in
the gene pool and your future fate is not to write bad code. The rest of us,
who did not acquire sufficiently complex frontal lobes, struggle with concepts
and probably dive into stupider solutions first, like I did at uni in the
1980s, allocating a fixed-size buffer (big initial mistake) and maintaining a
huge linked list of the sequence of edit events applied to it.

I passed the course, but future evidence suggests I didn't learn as much as I
hoped!

~~~
greglindahl
Ha ha, well, I'll confess, I got lucky and discovered the existence of gap
buffers before I tried to write a simple editor. And these days, there's a
Wikipedia article on quite a few of the fundamental editor data structures.

It's worth noting that no widely-used editor since Emacs uses a gap buffer, so
it's not like it's won the hearts and minds of implementers.

------
ScottBurson
> [Sam] is one of the first editors to separate its UI from the actual editor
> - Sam can be used on both the command-line and as a graphical text editor.

Not even close to the first. TECO had both command-line and graphical editing
in, ah, 1964:
[https://en.wikipedia.org/wiki/TECO_(text_editor)#History](https://en.wikipedia.org/wiki/TECO_\(text_editor\)#History)

~~~
kragen
There were different implementations of TECO (written in the assembly
languages of different machines) that had different user interfaces. That does
not in any way imply that TECO had an editor process using a defined protocol
to communicate with its UI process; in fact, in 1964, TECO ran on two
computers, and neither of them had separate processes, or an operating system
to run them under. ITS development hadn't started yet.

As far as I know, all the implementations of TECO, even today, glom the screen
update logic (if any!) together with the editor-buffer logic in a single
monolithic process.

------
nerdponx
Does anyone know what the new text editor Xi [0] uses? What about the GNU Zile
[1] library?

[0]: [https://github.com/google/xi-editor](https://github.com/google/xi-editor)
[1]: [https://www.gnu.org/software/zile/](https://www.gnu.org/software/zile/)

~~~
zeugmasyllepsis
From the previous discussion [1] on HN:

I use ropes in xi editor. I did not find the argument against ropes
convincing. Yes, they're not trivial to implement, but in a proper programming
language you're not dealing with the data structure directly, you're always
going through the interface, so you get the logic right once in the rope
library implementation and then forget about it.

In a low-level design, your editing operations would be poking at the data
structure directly. There, the simplicity of a gap buffer is a pretty big win.
I agree in this environment ropes are too complicated. However, I don't see
any good reason to architect a text editor in this way. Use abstractions.

The linked article contains a factual error: the referenced Crowley paper does
not consider ropes. Thus it cannot be used in support of the argument that
piece tables outperform ropes.

There's one other important concern with piece tables I didn't see addressed.
It depends on the file contents on disk not changing. If your file system
supported locking or the ability to get a read-only snapshot, this would be
fine, but in practice most don't. It's very common, say, to checkout a
different git branch while the file is open in the editor. Thus, the editor
must store its own copy to avoid corruption. In the long term, I would like to
see this solved by offering read-only access to files, but that's a deeper
change that can be made piecewise.
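
The piece table concern in the quote above can be illustrated with a small sketch. Everything here is an assumption for illustration: `Piece` and `PieceTable` are made-up names, and a Python string stands in for the file on disk. The document is a list of spans pointing into either the original contents or an append-only buffer of typed text, so the original is never copied.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str  # which buffer the span lives in: "orig" or "add"
    start: int   # offset of the span inside that buffer
    length: int  # number of characters in the span


class PieceTable:
    def __init__(self, original: str) -> None:
        # "orig" stands in for the file on disk; "add" is append-only.
        self.buffers = {"orig": original, "add": ""}
        self.pieces = [Piece("orig", 0, len(original))] if original else []

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= sum(p.length for p in self.pieces):
            raise IndexError("position out of range")
        if not text:
            return
        new = Piece("add", len(self.buffers["add"]), len(text))
        self.buffers["add"] += text
        offset = 0
        for i, p in enumerate(self.pieces):
            if pos <= offset + p.length:
                # Split the piece containing pos and splice the new one in.
                cut = pos - offset
                left = Piece(p.source, p.start, cut)
                right = Piece(p.source, p.start + cut, p.length - cut)
                self.pieces[i:i + 1] = [q for q in (left, new, right) if q.length]
                return
            offset += p.length
        self.pieces.append(new)  # only reached when the table was empty

    def text(self) -> str:
        return "".join(
            self.buffers[p.source][p.start:p.start + p.length] for p in self.pieces
        )


pt = PieceTable("hello world")
pt.insert(5, ",")
before = pt.text()
# Simulate the file changing underneath the editor (e.g. a branch checkout).
pt.buffers["orig"] = "HELLO WORLD"
after = pt.text()
```

Here `before` is "hello, world" and `after` is "HELLO, WORLD": the pieces still point at the same offsets, so the open document silently changes when the underlying contents do.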

[1]:
[https://news.ycombinator.com/item?id=15383193](https://news.ycombinator.com/item?id=15383193)

------
vthriller
> I'm surprised programmers haven't created overlays for Vi or Emacs for the
> GtkTextView. Definitely an opportunity for someone there.

[https://github.com/polachok/gtk-vikb](https://github.com/polachok/gtk-vikb)
already was a couple of years old by the time of writing this. (Not sure about
the current status of this library though, especially considering commit
history and no mentions of GTK3 whatsoever.)

------
signal11
Previous discussion:
[https://news.ycombinator.com/item?id=15381886](https://news.ycombinator.com/item?id=15381886)

------
WalterBright
MicroEmacs, a free text editor from the 1980s, uses a double-linked list of
lines. It is fast and memory compact, after all, it worked on a 64K DOS
machine.

[https://github.com/DigitalMars/med](https://github.com/DigitalMars/med)
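
As a rough illustration of the data structure (not code from the linked repository), here is a doubly-linked list of lines. The names `Line` and `LineList`, and the use of a circular list with a sentinel header node, are choices made for this sketch.

```python
class Line:
    """One line of text plus links to its neighbours."""

    def __init__(self, text: str = "") -> None:
        self.text = text
        self.prev = self
        self.next = self


class LineList:
    """Circular doubly-linked list of lines with a sentinel header node."""

    def __init__(self, lines=()) -> None:
        self.head = Line()  # sentinel: holds no text, simplifies edge cases
        for t in lines:
            self.insert_after(self.head.prev, t)

    def insert_after(self, node: Line, text: str) -> Line:
        # Only four links change, no matter how many lines the file has.
        new = Line(text)
        new.prev, new.next = node, node.next
        node.next.prev = new
        node.next = new
        return new

    def remove(self, node: Line) -> None:
        if node is self.head:
            raise ValueError("cannot remove the sentinel")
        node.prev.next = node.next
        node.next.prev = node.prev

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node.text
            node = node.next


doc = LineList(["one", "three"])
first = doc.head.next
doc.insert_after(first, "two")
```

Inserting or deleting a line touches only its neighbours; the trade-off is that finding line N means walking the list.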

~~~
Someone
In that memory regime (and without virtual memory), code size matters. With a
simple approach, your data structures may take 10% more memory, but if its
code is a few k smaller, it’s a net win even if you exhaust all memory.

Also, in that tight memory, people would sometimes choose slower approaches if
they had lower memory usage. For example, for each line, do you use a gap
buffer, or simply move every line? The former may be slightly more efficient,
but eats extra memory.

------
gowld
> You get three choices, and may choose two:

seems a completely unjustified claim. These features aren't mutually
exclusive.

~~~
catpolice
I think the claim is that this is true in practice (among the surveyed
options) not that it is true in principle.

------
dsego
In the comments section, the author claims that Sublime Text uses a fork of
GtkTextView. Is this true?

Here's the relevant bit:

> Open Sublime in a hex editor, there are gtk debug strings everywhere, along
> with gtktextview/edit strings too. It's as "home grown" as a fork can get
> really. Some searching should bring up more evidence.

------
dang
Discussed in 2016:
[https://news.ycombinator.com/item?id=11244103](https://news.ycombinator.com/item?id=11244103)

------
teddyh
See also chapter 6 of _The Craft of Text Editing_ by Craig A. Finseth:

[https://www.finseth.com/craft/#c6](https://www.finseth.com/craft/#c6)

------
fallat
damn my blog is popular today

any questions let me know

------
wyclif
How does Neovim do in comparison to Vim with regard to these metrics? Neovim
has to look a bit different here because it's async and Vim is not.

~~~
nsomaru
IIRC Vim 8 now has async features.

~~~
wyclif
You're right, Vim 8 does async. I'd still like to know how Vim compares to
Neovim according to these metrics.

------
sverige
Emacs has a text editor!?

~~~
fiddlerwoaroof
It’s called evil-mode

~~~
krylon
Hehe, you two made my day! ;-)

------
nathell
(2015)

~~~
dang
Thanks! Added.

------
sydd
> You get three choices

I don't agree that you should judge by these criteria.

- program size: Who cares? A somewhat decent laptop has 500GB+ of SSD storage; I
don't care if my editor is 5MB or 50MB.

- virtual efficiency (RAM usage, and again disk space??): Again, who cares if
it uses 5MB or 500MB of RAM? Dev machines have at least 8GB.

I care a lot more about stuff like typing lag or autocomplete support.

~~~
oh_sigh
8GB seems like a lot until you try to run Eclipse and Chrome with 3 tabs open.

~~~
laumars
While I don't disagree that 8GB isn't much for a developer machine these days
I do also think your claim is a little exaggerated. I certainly don't have any
issues running a couple of IntelliJ instances + Chrome with _at least_ a dozen
tabs open (seriously; I'm terrible for leaving tabs open).

What I find kills my machine is trying to run Chrome and Firefox concurrently.
Particularly if I still have IntelliJ open.

~~~
setr
> with at least a dozen tabs open (seriously; I'm terrible for leaving tabs
> open).

Come back when you have 150 tabs open. Anyway, somewhere around 200 across 4
windows is where I start seeing about 10GB of RAM used, with the OS and other
apps.

RAM never seems to be much of an issue unless I'm abusing things; it's CPU/disk
that kills me, particularly when some background process goes out of control.

~~~
vidarh
I'm horrible about tabs too. But I mostly use it as a "I'll use this page
again in a moment". Often that's true, but often I also just end up leaving it
around.

My solution was the OneTab extension for Chrome. It gives you a button to
click to close all tabs (with some ways of marking exceptions), and adds them
to a page, so you won't lose them. It groups it by date, from newest to
oldest. It makes it easy to come back to the things I left in tabs "in case",
while keeping the tab count down.

In effect I find I come back to quite few of them, but knowing I _can_ makes
it a lot easier mentally to press that button.

(And now I'm off to do just that before doing some work)

