The Craft of Text Editing (1999) (finseth.com)
244 points by youjiuzhifeng on Jan 30, 2017 | 47 comments

I've been learning a lot lately by following along with the development of xi[1], a new text editor written in Rust. Through reading that project's RFCs I've come across other interesting projects, like swiboe[2] and wi[3].

What are the other canonical resources on this topic? It feels like tons of the interesting thought is scattered around various blogs and usenet posts and the like. I'd love to create a nice collection of good writing on text-editing / tools, but I'm not sure where to start.

[1] https://github.com/google/xi-editor

[2] https://github.com/swiboe/swiboe

[3] https://github.com/wi-ed/wi

Kakoune[1] has been posted recently here on HN to great reception. As a C++ developer, I think it has a very high quality codebase, especially considering how non-trivial it is. As a user, it's been my main text editor for over a year. It also has a vibrant community; jump in on IRC if you have questions or ideas.

[1] https://github.com/mawww/kakoune

I think getting Kakoune UI support into Xi would be a very interesting project…

Are you talking about Xi as a frontend for kakoune? If it's the UI part that you're interested in, there is also kakoune-qml[1] made by one of the regular users.

[1] https://github.com/doppioandante/kakoune-qml

I think creating a Xi frontend that mimicked Kakoune's modal keybindings and highlighting would be an interesting exercise, and probably stretch the set of supported ideas in Xi in a good way.

REmacs is a very interesting development...


It'd be amazing if it crystallizes in a few years. Improving the old parts of Emacs, especially GUI code and low level things is a must. Rust seems like an ideal replacement for C. Great performance, much better safety guarantees.

Equally important is perhaps improving elisp concurrency. The jury is still out on whether this will happen by migrating to Guile Scheme [1].

[1] https://www.reddit.com/r/emacs/comments/4zttlt/guileemacs_st...

I believe Rust is too young a language to replace C in Emacs. I'd be happier if it was a standardised, time-tested language with a large user base and lots of docs.

Rust is standardized with lots of docs. The other two will come with time.

It's still moving fast and breaking things in ways that---while necessary---aren't reasonable choices for programs like Emacs or TeX. These programs need to run the same way in 20 years, and be runnable the same way in 50.

Sam comes to mind as an interesting project in this area:


Too reliant on the mouse.

There is some theory of operation stuff for JOE:


A paper that covers some of the data structures used in editors is:


The gap buffer, in particular, is a great example of a simple yet powerful idea that is perfectly suited to the problem domain.
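For anyone who hasn't seen one, here is a minimal gap buffer sketch in Python. The class and method names are made up for illustration and are not taken from the paper or from any editor mentioned in this thread. The text lives in one array with a block of free slots (the gap) at the cursor, so typing fills the gap and moving the cursor slides it.

```python
class GapBuffer:
    """Text stored in one array with a movable gap at the cursor (illustrative sketch)."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot, also the cursor position
        self.gap_end = len(self.buf)    # one past the last free slot

    def __len__(self):
        # Length of the text, not counting the gap.
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        """Slide the gap so that it starts at text position pos."""
        pos = max(0, min(pos, len(self)))
        while pos < self.gap_start:     # shift characters from before the gap to after it
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while pos > self.gap_start:     # shift characters from after the gap to before it
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, s):
        """Insert s at the cursor by filling the gap, growing it when it runs out."""
        for ch in s:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete_backward(self, n=1):
        """Delete n characters before the cursor by widening the gap."""
        self.gap_start = max(0, self.gap_start - n)

    def _grow(self):
        # Open a fresh gap at the cursor; doubling keeps inserts amortised O(1).
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Inserting and deleting at the cursor touch only the gap boundaries; the copying cost is paid when the cursor jumps far away, and it is proportional to the distance moved.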

The text in that PDF didn't show up correctly for me, but there's an HTML version of the paper here:


Was irony intended?

Fun blast from the past. The original version of this shipped with Mark of the Unicorn's Mince/Scribble package for CP/M. (Mince = Mince Is Not Complete Emacs)

Glad to see someone remembers that! (I was a cofounder.)

The sources for the original version of Mince were lost long ago. It's too bad; they were very clear (thanks to the skills of Jason Linhart, the primary author, as well as Craig) and would have made a great example for study.

I used MINCE quite a bit, and it was great. MINCE was what you used for Emacs if you couldn't get to an ITS machine :-)

Thanks, it was a really nice editor.

I once built a text editor using a rope[1] data structure where every line was a node. The tree was augmented[2] with information about line numbers, titles in the document... for very fast navigation. I don't think primitive data structures like a gap buffer are useful anymore. They come from a time where saving on memory was more important than it is now.

EDIT: I forgot it was also a self balancing tree! Very cool stuff.

[1] https://en.wikipedia.org/wiki/Rope_(data_structure)

[2] https://en.wikipedia.org/wiki/Interval_tree#Augmented_tree
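A rough sketch of the "one line per node, augmented with counts" idea, in Python. This is not the commenter's code: the balancing scheme here is a treap (random priorities), chosen only because it is short, and all names are hypothetical. Strictly speaking it is a balanced tree of lines rather than a classic rope with string pieces at the leaves. Each node caches the number of lines in its subtree, so finding, inserting, or deleting line k takes O(log n) expected time.

```python
import random


class Node:
    def __init__(self, line):
        self.line = line
        self.prio = random.random()   # random heap priority keeps the tree balanced in expectation
        self.left = None
        self.right = None
        self.count = 1                # augmented data: number of lines in this subtree


def count(n):
    return n.count if n else 0


def update(n):
    # Recompute the cached subtree size after the children changed.
    n.count = 1 + count(n.left) + count(n.right)
    return n


def split(n, k):
    """Split the tree into (first k lines, remaining lines)."""
    if n is None:
        return None, None
    if count(n.left) >= k:
        left, n.left = split(n.left, k)
        return left, update(n)
    n.right, right = split(n.right, k - count(n.left) - 1)
    return update(n), right


def merge(a, b):
    """Concatenate two trees, every line of a coming before every line of b."""
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return update(a)
    b.left = merge(a, b.left)
    return update(b)


def insert_line(root, k, line):
    left, right = split(root, k)
    return merge(merge(left, Node(line)), right)


def delete_line(root, k):
    left, rest = split(root, k)
    _, right = split(rest, 1)
    return merge(left, right)


def line_at(root, k):
    """Return line k (0-based) by walking down using the cached counts."""
    while root:
        c = count(root.left)
        if k < c:
            root = root.left
        elif k == c:
            return root.line
        else:
            k -= c + 1
            root = root.right
    raise IndexError(k)
```

Other per-subtree summaries (byte length, heading positions, and so on) can be cached the same way as `count`, which is what makes the navigation fast.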

A dissent: Saving memory isn't strictly orthogonal to editing performance.

A paged gap buffer (as described with an array index) remains ideal when you need to make a small number of surgical changes (insertions, deletions, etc) to a very large file, especially given the fact that all modern systems have page mapping hardware, so anything you implement is effectively on top of a paged gap buffer anyway. What we're really searching for is a better program-visible structure.

To that end, the biggest difficulty in building an efficient text editor is that most text-editing advice is (ahem) textbook: it neglects the fact that fork() is copy-on-write, which lets most real operations run asynchronously, and that writev() and mmap() can be combined to produce whatever memory layout you want (a plain old stupid byte array, if that's convenient). The kernel will memcpy your page table for you, so there's no sense in also doing it in user space. And so on.

Consider the point at which a write() plus an mmap() (or, on OS X, a mach_vm_remap()) becomes faster, and just how much faster it will be. Imagine programming something as simple as a plain byte array, but with instant inserts (memmove) across multi-gigabyte buffers. Then consider the cost of a write()+mmap() syscall combination in the worst case (a couple hundred micros?) and you'll never use a complicated (linked list) data structure again.
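To make the "plain byte array" baseline concrete, here is a small Python sketch that treats a file as one mmap'd array and inserts by shifting the tail with a memmove. Note that this is only the simple version: it does not implement the write()+mmap() remapping trick described above, and the function name is hypothetical.

```python
import mmap
import os


def insert_into_file(path, pos, data):
    """Insert data at byte offset pos by growing the file and shifting the tail."""
    with open(path, "r+b") as f:
        old_size = os.fstat(f.fileno()).st_size
        if not 0 <= pos <= old_size:
            raise ValueError("offset out of range")
        if not data:
            return
        new_size = old_size + len(data)
        f.truncate(new_size)                          # grow the file before mapping it
        with mmap.mmap(f.fileno(), new_size) as m:    # the whole file as one byte array
            # Shift everything from pos onward to make room (a memmove under the hood).
            m.move(pos + len(data), pos, old_size - pos)
            m[pos:pos + len(data)] = data
            m.flush()
```

The memmove here is still proportional to the size of the tail; the claim in the comment is that remapping pages instead of copying them can remove most of that cost.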

How would a gap buffer handle distributed editing operations like search-and-replace, or multi-cursor typing? The data structure seems optimized for editing in one place at a time.

There aren't a lot of structures that have an amortised cost-per-edit; you're really only ever considering the cost of doing a single insert/deletion operation, and you're really only ever trading that performance against the complexity.

A paged gap buffer is actually more like a tree, so instead of a single gap in the middle of a file, you have a gap in the middle of a block, and one block mapped per modified space. Cost of insert/delete is limited to the cost of a memmove within a block (cheap!), and your extent map never grows beyond 2x the number of changes. These upper bounds are incredibly good for edits, and there are only pathological cases that do better.

But what about search?

Search is faster too! Because virtual memory has your entire file contiguous, a search is as fast as a scan[1], which might embolden you to try indexing your file, which might really impress your users.

[1]: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
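Since the footnote points at Rabin-Karp, here is a short Python sketch of that scan over a contiguous byte buffer. The function name and the choice of base and modulus are assumptions for illustration; the comment above does not specify an implementation.

```python
def rabin_karp(text, pattern, base=256, mod=(1 << 61) - 1):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    high = pow(base, m - 1, mod)        # weight of the byte that leaves the window
    p_hash = w_hash = 0
    for i in range(m):                  # hash the pattern and the first window
        p_hash = (p_hash * base + pattern[i]) % mod
        w_hash = (w_hash * base + text[i]) % mod
    for i in range(n - m + 1):
        # Compare bytes only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i + m < n:                   # roll the window one byte to the right
            w_hash = ((w_hash - text[i] * high) * base + text[i + m]) % mod
    return -1
```

Because the window only ever moves forward one byte at a time, this works directly on any contiguous bytes-like buffer without needing to know how the editor stores its edits.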

> I don't think primitive data structures like a gap buffer are useful anymore. They come from a time where saving on memory was more important than it is now.

I beg to differ. I routinely open multi-gigabyte log files and SQL dumps in text format, and would not like to have to resort to “sed” to edit them.

I don't disagree. Although a good rope implementation can handle that. Ropes are nice because it doesn't matter where you edit. Gap buffers have to copy around a lot of data if you are editing in different places.

Btw the memory overhead is not that much. It's just the memory needed to keep the pointers between the nodes. So I'm talking 10% increase or something.

>I don't think primitive data structures like a gap buffer are useful anymore.

Emacs seems to do just fine with gap buffers. And thinking about algorithms with greater complexity as being less primitive can obscure the fact that the complexity isn't always worth the tradeoff; see "Gap Buffers, or, Don't Get Tied Up With Ropes"[1]

[1] http://scienceblogs.com/goodmath/2009/02/18/gap-buffers-or-w...

I never understood why these webpages can't have like, 4 lines of CSS to make them much more readable. Preserve the older aesthetic I guess?

As pointed out elsewhere, the post is from 1999. You're effectively asking the author, apart from creating the content, to also maintain it over the years to someone else's arbitrary satisfaction. I'm not sure if that's fair. The author likely has other projects they're working on, and maintaining the look of something they wrote 18 years ago isn't a priority.

Consider what would have happened if they had just published a book. Likely the book would be out of print and no longer available. This is definitely a step up from that.

If this is something that you truly care about, you may want to reach out to the author and offer to provide a maintenance update on your own. That said, be prepared for the author to perhaps consider even the little time it would take coordinating with you and performing any updates not worth their time. Then again, they may welcome your interest and willingness to help.

Maybe because the author's expertise is something other than writing HTML, so he picked up an old book on HTML, marked it up, and that's it. That the browser can render HTML written 20 years ago is quite a virtue. If it only takes 4 lines of CSS to make it more readable, then the page's lack of readability is more an indictment of the browser (which could do this tidying itself) rather than of the author, who should not have to continually update HTML so it renders well on recently-invented devices.

> then the page's lack of readability is more an indictment of the browser (which could do this tidying itself)

I strongly disagree with this. The browser should do nothing it is not explicitly made to do, which is one of the reasons it can still render HTML from 20 years ago. We used to have browsers that tried to do that kind of thing and we're only now extracting ourselves from that mess.

It is 100% on the website author to make their page more readable.

The whole point of vanilla HTML is that it has few presentation details. It has some headings, bold, italic, and such. If the page does not specify margins or font size, the browser absolutely should set these so it is most readable on the device. If I write plain HTML today I am not optimizing for some VR headset that will be used twenty years hence. The headset should render the plain HTML in a manner faithful to the semantic markup, not so it looks the same way it looked on Netscape with a VGA screen.

If you have something worth saying, you can take the 5 minutes it takes to make it readable.

Learning CSS does not take 5 minutes.

Looks great on my phone. Semantic HTML is responsive by default.

It's a bit awkward to read on large screens. Long lines and whatnot

> It's a bit awkward to read on large screens. Long lines and whatnot

For years and years and years I always had a half-screen-width browser window, precisely because it's easier to read text that way. But then site authors started assuming that I'd have a full-width window, and using CSS to waste half the window width.

I still think that the correct response to 'window too wide' is 'shrink the window,' but it's a losing battle.

Wait, why wouldn't the solution be "define a max width"? Site authors should not be wasting your window width, windows should not be "too wide" for text, and you shouldn't have to choose between the two solutions you mentioned.

Only because it seems to me that if the user really wants a terribly wide window … that's his choice. Who am I to prevent someone from doing something which seems stupid to me? Maybe he wants a half-inch high window at the bottom of his screen where he can scroll through my text slowly, or something. His call, not mine.

But, in demonstration of another virtue inherent in this sort of simplicity, reader mode handles it beautifully.

One may find it useful to employ CTRL+PlusKey, CTRL+MouseWheel, pinch-to-zoom, or snap-to-screen-edge.

What 4 lines would those be?

Inspired by https://bestmotherfucking.website/

  body {
      margin: 1em auto;
      max-width: 40em;
      font: 1.2em/1.62em sans-serif;
  }

That seems like a good start

Nice! Thank you!

> In its most general form, text editing is the process of taking some input, changing it, and producing some output.

Funny how similar that definition is to the "programming" one.

That's because that is just what a Turing machine does.

As someone currently working on a code editor I love this stuff, but there's usually more focus on the technical part than the human part. With today's hardware we can do millions of stupid things every second and it will still feel snappy. We should spend more time trying to optimize for the humans instead of their computer.

I liked it so much I bought a hard copy a couple years ago. Lots to learn in that book.
