Hacker News
A Brief Glance at How Various Text Editors Manage Their Textual Data (2015) (ecc-comp.blogspot.com)
272 points by yankcrime on Feb 15, 2018 | 59 comments



> This is very efficient, but the consequence is Vi will become slow when working with huge files, because it has to traverse through a bigger linear array

To put "bigger" into perspective, a modern CPU can traverse and copy memory at over 10GB/s.

This is why I've always found the argument for more "efficient" (and complex) text editor data structures a bit tenuous --- even if you have to move MBs of data with every insertion (as happens with a simple gapless buffer), computers truly are so fast that it wouldn't look any different to the end-user; a 1ms and 1us delay upon each keystroke is, to the user, practically indistinguishable.

That's not to say I'm one of those who preach against "premature optimisation" and don't care about efficiency; far from it, in fact. But the popularity of, and lack of speed-related complaints against, the small DOS text editors and even syntax-highlighting IDEs on PCs in the 80s through early 90s, which used the same "one buffer" paradigm on machines with a fraction of the memory bandwidth of those today, suggests that the complexity of more "clever" data structures may not be worth it.
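To put a rough number on the "gapless buffer" worst case the comment describes, here is a small sketch (illustrative only; exact timings depend entirely on the machine): inserting one character into the middle of a flat 10 MB buffer, which forces everything after the insertion point to shift.

```python
# Rough illustration of the "one flat buffer" argument: a single
# insertion into the middle of a multi-megabyte bytearray. CPython's
# slice assignment shifts the entire tail of the buffer, i.e. the
# worst case for a gapless buffer. On modern hardware this is
# typically well under a millisecond.
import time

buf = bytearray(b"x" * 10_000_000)   # ~10 MB "document"

start = time.perf_counter()
buf[5_000_000:5_000_000] = b"y"      # insert mid-buffer: shifts ~5 MB
elapsed = time.perf_counter() - start

print(f"inserted one byte into 10 MB in {elapsed * 1000:.3f} ms")
```

The point is not the exact figure but the order of magnitude: even the naive structure's worst case is far below human-perceptible latency.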

...and yet we somehow still manage to make editors that peg a single CPU core just blinking a cursor.[1]

[1] https://news.ycombinator.com/item?id=13940014


> computers truly are so fast that it wouldn't look any different to the end-user; a 1ms and 1us delay upon each keystroke is, to the user, practically indistinguishable.

Unless, of course, the 1 ms operation drains the users battery 1000x faster than the 1 us operation.

Efficiency is no longer about speed, it's about noise, temperature, and battery life. Speed is a happy side effect.


You’re assuming the chipset is completely free and waiting to handle your text edits.

If my editor is responsive while the computer is idle, but slows to a halt when I am running some other intensive process, that’s not good enough. I’d put the editor in the same category with the window manager and the terminal... the “needs to still work even while the machine is under extreme load for random reasons” category.


I tend to agree with you. I'm using my own editor most of the time now. I occasionally fall back to Emacs still, but less and less. An early decision I made was to separate the buffers and almost all the rest of the editor code.

It's slow Ruby. It's storing everything in an Array of String objects. Which means one (Ruby-internal) allocation per line for the String object, and an additional allocation for the rest of the String if it exceeds something like 24 characters (Ruby String objects store characters in the object itself if it's short enough, but otherwise store a pointer to a separate buffer). Every keypress (because I've been lazy and not bothered optimizing away cases where it's not needed), the frontend requests the full set of visible lines from the backend via a TCP connection (using Drb for RPC), and re-applies syntax highlighting and other formatting, and in the process causes a flurry of additional object allocations. It's not optimized at all. It's almost flagrantly wasteful in many areas.

It started that way out of simplicity and a desire to get it to the point where I could use it for most of my editing ASAP. And it has remained that way so far because the lag just isn't noticeable to me, and it's very rare I edit large enough files for it to matter. I might optimize it at some point, but to me this seems like something people tend to worry too much about.


Sometimes efficiency dies a death of a thousand cuts, though: most text editors today end up running plugins that also do compiling and highlighting and intellisense and so on. The work could add up, perhaps?

Also, sometimes people work with really, really big files, for reasons both good and bad.


If you've never interacted with the saved state of an editor that preserves its edit-history tree, it's a revelation when you try the UNIX strings command on one. It took me a few minutes to realise that inserts and moves were not the same, because the structure followed insertion order, to permit infinite undo. The actual text position was a subsidiary location tagged onto the string.

I think everyone should be asked to write their own simple editor, and then, as an extension, try to convert from the obvious simple tropes with arrays to something that begins to do what these editors do.


Or you could implement a gap buffer, and then your complicated editor is using a simple trope.

Emacs is an example of a full-featured editor based on a gap buffer.
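For concreteness, here is a minimal gap-buffer sketch (illustrative Python, not Emacs's actual C implementation): the text lives in one array with a movable "gap" at the cursor, so insertions at the cursor are O(1), and moving the cursor costs only the distance the gap travels.

```python
# Minimal gap buffer: one array, with an empty "gap" region
# [gap_start, gap_end) kept at the cursor position.
class GapBuffer:
    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)
        self.gap_end = len(self.buf)

    def move_gap(self, pos):
        # Slide the gap so it starts at `pos`; cost is proportional to
        # the distance moved, which is tiny for typical local edits.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, pos, ch):
        self.move_gap(pos)
        if self.gap_start == self.gap_end:      # gap exhausted: grow it
            grow = max(16, len(self.buf))
            self.buf[self.gap_end:self.gap_end] = [None] * grow
            self.gap_end += grow
        self.buf[self.gap_start] = ch           # insert into the gap
        self.gap_start += 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Typing at one spot never moves the gap at all, which is why the structure suits a single-cursor editor so well.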


Sure. You could leap to the best solution first. Probably you lucked out in the gene pool and your future fate is not to write bad code. The rest of us, who did not acquire sufficiently complex frontal lobes, struggle with concepts and probably dive into stupider solutions first, as I did at uni in the 1980s: allocating a fixed-size buffer (big initial mistake) and maintaining a huge linked list of edit-sequence events into it.

I passed the course, but future evidence suggests I didn't learn as much as I hoped!


Ha ha, well, I'll confess, I got lucky and discovered the existence of gap buffers before I tried to write a simple editor. And these days, there's a Wikipedia article on quite a few of the fundamental editor data structures.

It's worth noting that no widely-used editor since emacs uses a gap buffer, so it's not like it's won the hearts and minds of implementers.


I wouldn't say a gap buffer is the "complicated and crazy" data structure for text editing. Certainly not the best. There's no free lunch, every approach has its downsides :)


Gap buffers can’t support multiple cursors efficiently. This is a core feature in eg. Sublime Text.


Sublime doesn't use a gap buffer?


> [Sam] is one of the first editors to separate its UI from the actual editor - Sam can be used on both the command-line and as a graphical text editor.

Not even close to the first. TECO had both command-line and graphical editing in, ah, 1964: https://en.wikipedia.org/wiki/TECO_(text_editor)#History


There were different implementations of TECO (written in the assembly languages of different machines) that had different user interfaces. That does not in any way imply that TECO had an editor process using a defined protocol to communicate with its UI process; in fact, in 1964, TECO ran on two computers, and neither of them had separate processes, or an operating system to run them under. ITS development hadn't started yet.

As far as I know, all the implementations of TECO, even today, glom the screen update logic (if any!) together with the editor-buffer logic in a single monolithic process.


> one of the first

vs

> the first

But cool anyway. Maybe we should see how TECO manages its data too.


TIL about punch card hot code swapping


It is reputed that someone wrote a Fortran compiler in TECO on a dare.


Ah, TECO. I still have some muscle memory of TECO on the PDP-10. Line noise indeed.


Does anyone know what the new text editor Xi [0] uses? What about the GNU Zile [1] library?

[0]: https://github.com/google/xi-editor [1]: https://www.gnu.org/software/zile/


From the previous discussion [1] on HN:

I use ropes in xi editor. I did not find the argument against ropes convincing. Yes, they're not trivial to implement, but in a proper programming language you're not dealing with the data structure directly, you're always going through the interface, so you get the logic right once in the rope library implementation and then forget about it.

In a low-level design, your editing operations would be poking at the data structure directly. There, the simplicity of a gap buffer is a pretty big win. I agree in this environment ropes are too complicated. However, I don't see any good reason to architect a text editor in this way. Use abstractions.
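A rope can be sketched very simply (illustrative only; xi's actual rope is a balanced, copy-on-write B-tree-like structure in Rust): leaves hold short strings, and each internal node caches the length of its left subtree so indexing descends the tree instead of scanning the text.

```python
# Minimal rope sketch: a binary tree whose leaves hold string chunks.
# Each internal node stores the left subtree's length ("weight") so
# that character lookup is O(depth). Rebalancing is omitted.
class Leaf:
    def __init__(self, s):
        self.s = s
        self.length = len(s)

    def index(self, i):
        return self.s[i]

class Node:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.weight = left.length            # chars in the left subtree
        self.length = left.length + right.length

    def index(self, i):
        # Descend left if i falls in the left subtree, else descend
        # right with the index shifted past the left subtree.
        if i < self.weight:
            return self.left.index(i)
        return self.right.index(i - self.weight)

def concat(a, b):
    # O(1) concatenation: just create a new root node.
    return Node(a, b)
```

This is exactly the "go through the interface" argument: callers only ever see `index` and `concat`, and the tree bookkeeping stays inside the rope library.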

The linked article contains a factual error: the referenced Crowley paper does not consider ropes, so it cannot be used in support of the argument that piece tables outperform ropes.

There's one other important concern with piece tables I didn't see addressed. It depends on the file contents on disk not changing. If your file system supported locking or the ability to get a read-only snapshot, this would be fine, but in practice most don't. It's very common, say, to check out a different git branch while the file is open in the editor. Thus, the editor must store its own copy to avoid corruption. In the long term, I would like to see this solved by offering read-only access to files, but that's a deeper change that can be made piecemeal.

[1]: https://news.ycombinator.com/item?id=15383193
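The piece-table pitfall described in the quote (pieces pointing back into the original file) can be made concrete with a sketch (a minimal illustration, not any real editor's implementation):

```python
# Minimal piece table: the document is a list of "pieces", each
# referencing a span of either the read-only original buffer or an
# append-only "add" buffer. Note that characters are never copied out
# of `original` -- which is why an editor that maps the file straight
# from disk breaks if the file changes underneath it (e.g. a git
# checkout).
class PieceTable:
    def __init__(self, original):
        self.original = original                    # read-only
        self.add = ""                               # append-only
        self.pieces = [("orig", 0, len(original))]  # (buffer, start, len)

    def insert(self, pos, text):
        new = ("add", len(self.add), len(text))
        self.add += text
        offset = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if offset + length >= pos:
                # Split the piece containing `pos` and splice in `new`;
                # drop any zero-length fragments.
                split = pos - offset
                repl = [(buf, start, split), new,
                        (buf, start + split, length - split)]
                self.pieces[i:i + 1] = [p for p in repl if p[2] > 0]
                return
            offset += length
        self.pieces.append(new)

    def text(self):
        bufs = {"orig": self.original, "add": self.add}
        return "".join(bufs[b][s:s + n] for b, s, n in self.pieces)
```

Edits only ever append to `add` and split pieces, which is what makes infinite undo cheap: old piece lists remain valid snapshots of earlier document states.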


> I'm surprised programmers haven't created overlays for Vi or Emacs for the GtkTextView. Definitely an opportunity for someone there.

https://github.com/polachok/gtk-vikb already was a couple of years old by the time of writing this. (Not sure about the current status of this library though, especially considering commit history and no mentions of GTK3 whatsoever.)



MicroEmacs, a free text editor from the 1980s, uses a doubly-linked list of lines. It is fast and memory-compact; after all, it worked on a 64K DOS machine.

https://github.com/DigitalMars/med
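The linked-list-of-lines approach can be sketched like this (illustrative Python, rather than MicroEmacs's original C): each line is its own node, so inserting or deleting a line is O(1) pointer surgery and no other line ever moves in memory.

```python
# Doubly-linked list of lines, in the MicroEmacs style: the buffer is
# a chain of per-line nodes rather than one contiguous block.
class Line:
    def __init__(self, text):
        self.text = text
        self.prev = None
        self.next = None

class LineBuffer:
    def __init__(self, lines):
        self.head = None
        prev = None
        for text in lines:
            node = Line(text)
            node.prev = prev
            if prev:
                prev.next = node
            else:
                self.head = node
            prev = node

    def insert_after(self, node, text):
        # O(1): relink four pointers, regardless of file size.
        new = Line(text)
        new.prev, new.next = node, node.next
        if node.next:
            node.next.prev = new
        node.next = new
        return new

    def lines(self):
        node, out = self.head, []
        while node:
            out.append(node.text)
            node = node.next
        return out
```

Within a single line you still edit a small string, which stays cheap; the trade-off is that operations spanning many lines (search, column editing) must walk the list.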


In that memory regime (and without virtual memory), code size matters. With a simple approach, your data structures may take 10% more memory, but if the code is a few KB smaller, it's a net win even if you exhaust all memory.

Also, in that tight memory, people would sometimes choose slower approaches if they had lower memory usage. For example, for each line, do you use a gap buffer, or simply move every line? The former may be slightly more efficient, but eats extra memory.


Linus Torvalds maintains his own version:

https://git.kernel.org/pub/scm/editors/uemacs/uemacs.git


> You get three choices, and may choose two:

seems a completely unjustified claim. These features aren't mutually exclusive.


I think the claim is that this is true in practice (among the surveyed options) not that it is true in principle.


In the comment section the author claims that Sublime Text uses a fork of GtkTextView. Is this true?

Here's the relevant bit:

> Open Sublime in a hex editor, there are gtk debug strings everywhere, along with gtktextview/edit strings too. It's as "home grown" as a fork can get really. Some searching should bring up more evidence.



See also chapter 6 of The Craft of Text Editing by Craig A. Finseth:

https://www.finseth.com/craft/#c6


damn my blog is popular today

any questions let me know


How does Neovim do in comparison to Vim with regard to these metrics? Neovim has to look a bit different here because it's async and Vim is not.


IIRC Vim 8 now has async features.


You're right, Vim 8 does async. I'd still like to know how Vim compares to Neovim according to these metrics.


Emacs has a text editor!?


It’s called evil-mode


Hehe, you two made my day! ;-)


(2015)


Thanks! Added.


> You get three choices

I don't agree that you should judge by these criteria.

- Program size: who cares? A somewhat decent laptop has 500GB+ of SSD storage; I don't care if my editor is 5MB or 50.

- Virtual efficiency (RAM usage, and again disk space?): again, who cares if it uses 5MB or 500MB of RAM? Dev machines have at least 8GB.

I care a lot more about things like typing lag or autocomplete support.


You should care a lot. A smaller program has a clearer model, contains fewer bugs, and can be more flexibly evolved in different directions.



Let's not assume everyone is so stupid that they need their statement taken to the extreme pointed out to them as an exception.

I've noticed an increase in obnoxious disclaimers like "there are exceptions of course" and "it's just my opinion of course" as people get tired of responding to pointless responses that point those statements out.

That's just my opinion though. There are certainly exceptions. Feel free to disagree!


I absolutely agree sometimes. YMMV. Happy to hear your thoughts.

If only we had the ability to (literally) check boxes to include these disclaimers as flair on our comments. Then they might add disclaimer packs for the most controversial topics so that folks don't have to click too many boxes.

Obviously, there might be exceptions. Some folks might not want this feature. Others may. Who knows? Certainly not me.


8GB seems like a lot until you try to run Eclipse and chrome with 3 tabs open.


While I don't disagree that 8GB isn't much for a developer machine these days, I do also think your claim is a little exaggerated. I certainly don't have any issues running a couple of IntelliJ instances + Chrome with at least a dozen tabs open (seriously; I'm terrible for leaving tabs open).

What I find kills my machine is trying to run Chrome and Firefox concurrently. Particularly if I still have IntelliJ open.


I do most of my work on a laptop with 4GiB of memory (manufactured in 2009). Probably the last time I did anything dev-related that meaningfully stressed the machine was when I decided to give Android Studio a shot.

Just sitting there, with nothing else open, the machine was completely unusable.

I cannot recall anything else I have done with the machine in its lifetime that's been a real problem.

This includes running Firefox and Chromium concurrently, plus a couple of virtual machines.


Like always when people start comparing system performance, there are so many variables at play (CPU make / model / age / cache / n cores, bus speed, RAM type, OS / distro / package versions / compiled flags, swap space, running daemons / services or other background processes, number of browser tabs and even the specific sites left open, etc etc)....

But for me it's the Flash plugin container for Firefox that really does the damage when it comes to concurrency. With a few Firefox tabs open running YouTube (or another video streaming service), my fan already starts spinning up louder than a revving motorbike, and if I have much more open besides Firefox, the system will just lock up without warning. I suspect Flash is thrashing both CPU and RAM, but it's consistent enough behaviour that I tend to avoid it rather than invest time debugging and fixing it (though since it's Flash that kills it, I suspect any such "fix" would just mean altering the behaviour of the software's end user, i.e. me).


I get modded down for the oddest reasons these days. I can't even fathom why anyone might disagree with this post.


I thought YouTube used HTML5 video, not flash...


It does. I also don't have flash installed, which probably helps. Can't remember the last time not having it caused a problem; good riddance.


To be honest what I really wanted to say was "porn sites" but instead picked a worksafe (and less accurate) example.


>with at least a dozen tabs open (seriously; I'm terrible for leaving tabs open).

Come back when you have 150 tabs open. Anyways, somewhere around 200 across 4 windows is where i start seeing about 10GB ram used, with os and other apps.

RAM never seems to be much of an issue unless im abusing things; its cpu/disk that kills me, particularly when some background process goes out of control.


I'm horrible about tabs too. But I mostly use it as a "I'll use this page again in a moment". Often that's true, but often I also just end up leaving it around.

My solution was the OneTab extension for Chrome. It gives you a button to click to close all tabs (with some ways of marking exceptions), and adds them to a page, so you won't lose them. It groups it by date, from newest to oldest. It makes it easy to come back to the things I left in tabs "in case", while keeping the tab count down.

In practice I find I come back to very few of them, but knowing I can makes it a lot easier mentally to press that button.

(And now I'm off to do just that before doing some work)


Been there, done that. I was constantly losing documents I wanted saved because the browser would crash etc. So I realised my behaviour was counterproductive and have since adopted a saner approach to tab usage.

Honestly; having more than a dozen tabs open doesn't make you a hero. It just makes your life a little less convenient.


> it's CPU/disk that kills me

CPU is rarely an issue for me, but disks are a constant source of annoyance.

The turning point for me was when I got a new PC at home that has an SSD. I used to think people exaggerated the benefits of SSDs, or that they were just very impatient. ... Now I am spoiled.


> 8GB seems like a lot until you try to run Eclipse and chrome with 3 tabs open.

Add in a time machine backup at the same time as a slack video call and it's game over.


A lot of people care about the resource usage of software. It's good, though, that you get a loaded machine for your work.


While I care a lot about typing lag (but not at all about autocomplete), if an editor doesn't start fast, or at least has a client-server model where the client starts fast, I'd drop it after 5 minutes.

That's one reason to care about program size, though less about the basic binary than about how many extra dependencies it has; and certainly what else it does during initialization matters (e.g. many Emacs configs do DNS requests synchronously in the critical path on startup, and end up hanging until timeouts trigger if your network is down... fun times).

RAM usage, I agree, matters less. I can't remember the last time Emacs warned me I was about to open a large file. My own editor is so RAM-inefficient it'd bring my laptop to its knees in no time if I were to open a huge file; I'm happy to fall back on another editor if I ever need that for some reason. Though with many applications RAM usage is large for all the wrong reasons (my editor would be in that category for other people; for me it's the right reason: it keeps it tiny and fast for me to hack on for the time being, but I don't inflict it on other people (yet)).

But it's still useful to evaluate on these criteria, as they may matter more for others.



