What are the other canonical resources on this topic? It feels like tons of the interesting thought is scattered around various blogs and usenet posts and the like. I'd love to create a nice collection of good writing on text-editing / tools, but I'm not sure where to start.
Equally important is perhaps improving elisp concurrency. The jury is still out on whether this will happen by migrating to Guile Scheme.
The gap buffer, in particular, is a great example of a simple yet powerful idea that is perfectly suited to the problem domain.
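For anyone who hasn't seen one, here is a minimal sketch of the idea in Python. It is not how Emacs or any particular editor implements it; the class name `GapBuffer`, the method names, and the default gap size of 16 are all made up for illustration. The point is only that text lives in one array with a movable hole, so typing at the cursor just fills the hole.

```python
class GapBuffer:
    """Illustrative gap buffer: one list holding the text plus a movable gap."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first index inside the gap
        self.gap_end = len(self.buf)    # one past the last index of the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        # Slide the gap so it begins at logical position pos,
        # copying one character across the gap per step.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra):
        # Widen the gap by splicing empty slots in at its end.
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, text):
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        if len(text) > self.gap_end - self.gap_start:
            self._grow(len(text) + 16)
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        # Deleting forward from pos just widens the gap over those characters.
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        self._move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


g = GapBuffer("hello world")
g.insert(5, ",")
print(g.text())  # hello, world
```

Repeated inserts at the same spot never move the gap, which is why this fits the usual pattern of typing at a cursor so well.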
The sources for the original version of Mince were lost long ago. It's too bad; they were very clear (thanks to the skills of Jason Linhart, the primary author, as well as Craig) and would have made a great example for study.
Thanks, it was a really nice editor.
EDIT: I forgot it was also a self balancing tree! Very cool stuff.
A paged gap buffer (as described with an array index) remains ideal when you need to make a small number of surgical changes (insertions, deletions, etc) to a very large file, especially given the fact that all modern systems have page mapping hardware, so anything you implement is effectively on top of a paged gap buffer anyway. What we're really searching for is a better program-visible structure.
To that end, the biggest difficulty in writing an efficient text editor is that most text-editing advice is (ahem) textbook: it neglects the fact that fork() is copy-on-write, which makes most real operations asynchronous, and that writev() and mmap() can be combined to produce whatever memory layout you want (a plain old stupid byte array, if that's convenient). The kernel will memcpy your page table for you, so there's no sense in also doing it in user space. And so on.
Consider the point at which a write() plus an mmap() (or, on OS X, a mach_vm_remap()) becomes faster, and just how much faster it is. Imagine programming something as simple as a plain byte array, but with instant inserts (memmove) across multi-gigabyte buffers. Then consider the cost of a write()+mmap() syscall combination in the worst case (a couple hundred micros?) and you'll never use a complicated (linked list) data structure again.
A paged gap buffer is actually more like a tree, so instead of a single gap in the middle of a file, you have a gap in the middle of a block, and one block mapped per modified space. Cost of insert/delete is limited to the cost of a memmove within a block (cheap!), and your extent map never grows beyond 2x the number of changes. These upper bounds are incredibly good for edits, and there are only pathological cases that do better.
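A rough sketch of how I read that description, in Python rather than with real page mapping. Everything here is an assumption for illustration: the `PagedBuffer` name, the tuple-based extent map, and the choice to model a "modified block" as a `bytearray`. It only handles inserts. An untouched region stays a reference into the original bytes; the first edit inside it splits it around a small modified block, and later edits in the same place only shuffle bytes inside that block.

```python
class PagedBuffer:
    """Illustrative extent map: untouched ranges of the original plus small modified blocks."""

    def __init__(self, original: bytes):
        self.original = original
        # Each extent is ("orig", start, length) or ("mod", bytearray).
        self.extents = [("orig", 0, len(original))] if original else []

    @staticmethod
    def _length(ext):
        return ext[2] if ext[0] == "orig" else len(ext[1])

    def insert(self, pos: int, data: bytes):
        offset = 0
        for i, ext in enumerate(self.extents):
            n = self._length(ext)
            local = pos - offset
            if ext[0] == "mod" and 0 <= local <= n:
                # Edit lands in an already modified block:
                # the cost is a memmove inside this one block.
                ext[1][local:local] = data
                return
            if ext[0] == "orig" and 0 <= local < n:
                # First edit in an untouched range: split it around a new block.
                _, start, length = ext
                pieces = []
                if local > 0:
                    pieces.append(("orig", start, local))
                pieces.append(("mod", bytearray(data)))
                pieces.append(("orig", start + local, length - local))
                self.extents[i:i + 1] = pieces
                return
            offset += n
        if pos != offset:
            raise IndexError(pos)
        # Insert at the very end (or into an empty buffer).
        self.extents.append(("mod", bytearray(data)))

    def contents(self) -> bytes:
        parts = []
        for ext in self.extents:
            if ext[0] == "orig":
                parts.append(self.original[ext[1]:ext[1] + ext[2]])
            else:
                parts.append(bytes(ext[1]))
        return b"".join(parts)


pb = PagedBuffer(b"hello world")
pb.insert(5, b",")
print(pb.contents())  # b'hello, world'
```

In this sketch each insert adds at most two entries to the extent map, which is one way to get the 2x bound mentioned above. A real implementation would keep the extents in a tree and back them with mapped pages instead of walking a Python list.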
But what about search?
Search is faster too! Because virtual memory has your entire file contiguous, a search is as fast as a scan, which might embolden you to try indexing your file, which might really impress your users.
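To make the "search is just a scan" point concrete, here is a small sketch using Python's standard `mmap` module on an unmodified file. The helper name `find_all` is made up, and it assumes a non-empty file (mapping an empty file raises an error). It shows only the read side: once the file is mapped, the search code sees one contiguous byte range and never deals with blocks or nodes.

```python
import mmap
import os
import tempfile


def find_all(path, needle: bytes):
    """Return the byte offset of every occurrence of needle in the file at path."""
    hits = []
    with open(path, "rb") as f:
        # Map the whole file read-only; the kernel pages it in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(needle)
            while pos != -1:
                hits.append(pos)
                pos = m.find(needle, pos + 1)
    return hits


if __name__ == "__main__":
    # Small demo on a throwaway file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"error: a\ninfo: b\nerror: c\n")
    try:
        print(find_all(tmp.name, b"error"))
    finally:
        os.unlink(tmp.name)
```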
I beg to differ. I routinely open multi-gigabyte log files and SQL dumps in text format, and would not like to have to resort to “sed” to edit them.
Btw the memory overhead is not that much. It's just the memory needed to keep the pointers between the nodes. So I'm talking 10% increase or something.
Emacs seems to do just fine with gap buffers. And thinking about algorithms with greater complexity as being less primitive can obscure the fact that the complexity isn't always worth the tradeoff; see "Gap Buffers, or, Don't Get Tied Up With Ropes"
Consider what would have happened if they had just published a book. Likely the book would be out of print and no longer available. This is definitely a step up from that.
If this is something that you truly care about, you may want to reach out to the author and offer to provide a maintenance update on your own. That said, be prepared for the author to perhaps consider even the little time it would take coordinating with you and performing any updates not worth their time. Then again, they may welcome your interest and willingness to help.
I strongly disagree with this. The browser should do nothing it is not explicitly made to do, which is one of the reasons it can still render HTML from 20 years ago. We used to have browsers that tried to do that kind of thing and we're only now extracting ourselves from that mess.
It is 100% on the website author to make their page more readable.
For years and years and years I always had a half-screen-width browser window, precisely because it's easier to read text that way. But then site authors started assuming that I'd have a full-width window, and using CSS to waste half the window width.
I still think that the correct response to 'window too wide' is 'shrink the window,' but it's a losing battle.
margin: 1em auto;
font: 1.2em/1.62em sans-serif;
Funny how similar that definition is to the "programming" one.