- from what I've read somewhere, MS Word was/is (?) using it internally [http://1017.songtrellisopml.com/whatsBeenWroughtUsingPieceTa...]
- AbiWord 
- : http://www.catch22.net/tuts/piece-chains
- the text editor of the Oberon OS used it 
One of its most interesting advantages is that it trivially supports fast, unlimited, and persistent undo/redo.
This makes it relatively easy to implement non-chronological undo/redo operations, i.e. an undo tree.
Because old versions of the text remain available, one could, for example, link compiler error messages back to the state when the file was last saved. Thus, after a long build, only those errors which are still relevant (i.e. not already fixed in the meantime) would be displayed.
Piece chains also map well to POSIX systems, where the initial file can be mmap(2)-ed. That is the reason vis supports editing large files efficiently: all editing operations are independent of the file size, but linear in the number of non-consecutive changes. Of course, things like "go to line nn" will have to read the file content; there is no way around that. But, for example, navigating using the nn% command is possible in constant time. The way syntax highlighting is implemented, it should also (mostly) work for all types of files.
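A toy sketch of the idea (hypothetical Python, not vis's actual C implementation): the original text stays read-only (mmap(2)-ed in practice), insertions go into an append-only add buffer, and the document is just a chain of pieces pointing into one buffer or the other. An edit only re-links pieces and never mutates old ones, so keeping the previous pieces list is already a complete undo snapshot.

```python
# Toy piece chain: the original text is never modified; edits append
# to an add buffer and re-link pieces. Illustrative sketch only.

class PieceChain:
    def __init__(self, original):
        self.original = original          # read-only (mmap'ed in practice)
        self.add = ""                     # append-only buffer for insertions
        # each piece: (buffer_name, start, length)
        self.pieces = [("orig", 0, len(original))]

    def text(self):
        bufs = {"orig": self.original, "add": self.add}
        return "".join(bufs[b][s:s + n] for b, s, n in self.pieces)

    def insert(self, pos, s):
        new_piece = ("add", len(self.add), len(s))
        self.add += s
        out, off = [], 0
        for b, start, n in self.pieces:
            if off <= pos <= off + n:      # split this piece at pos
                left = pos - off
                if left:
                    out.append((b, start, left))
                out.append(new_piece)
                if n - left:
                    out.append((b, start + left, n - left))
                pos = -1                   # insertion done
            else:
                out.append((b, start, n))
            off += n
        self.pieces = out

doc = PieceChain("hello world")
doc.insert(5, ",")
print(doc.text())   # "hello, world"
```

Note that the cost of insert() here is linear in the number of pieces (i.e. non-consecutive changes), not in the file size, which matches the behavior described above.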
The doubly-linked list of gap buffers method is somewhat similar, but is self-optimizing because adjacent small buffers can be merged.
My intention was to highlight the piece-chain method, as it seems to me surprisingly often overlooked (or unknown?) while having quite a few noteworthy features. That said, I'm starting to think that maybe its non-triviality can itself be seen as one of the disadvantages (i.e. it may make the program more complex than the other, "dumber" methods)? Not sure on that, though. Adding undo on top of the other ("simpler") approaches may well make them more complex anyway.
Its effects can be mitigated by:
1) storing the pieces in a balanced search tree instead of a linked list. This would make random access logarithmic in the number of non-consecutive changes.
2) implementing some kind of compaction algorithm that merges adjacent changes. This would increase memory usage, because not only the changed text but also some unrelated data in between would have to be stored in a contiguous memory region. Furthermore, this duplication of "unchanged" data would harm some other nice properties of the data structure: for example, marks could no longer be represented as simple pointers into the underlying buffers, because their content would no longer be unique (i.e. a character could be part of multiple buffers).
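There is also a cheap special case of merging that avoids the copying problem entirely (a hypothetical Python sketch; pieces are (buffer, start, length) triples as in a piece table): pieces that are already contiguous in the same underlying buffer can be coalesced without duplicating any data.

```python
def merge_adjacent(pieces):
    """Coalesce pieces that are contiguous in the same underlying
    buffer -- the free compaction case that copies nothing."""
    out = []
    for p in pieces:
        if out and out[-1][0] == p[0] and out[-1][1] + out[-1][2] == p[1]:
            b, s, n = out[-1]
            out[-1] = (b, s, n + p[2])     # extend the previous piece
        else:
            out.append(p)
    return out
```

Sequential typing produces exactly this pattern (consecutive pieces in the add buffer), so this merge keeps the piece count from growing during ordinary editing.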
The pointer chasing you mention is an issue for all non-trivial data structures (i.e. those not stored in a contiguous memory region). Screen updates are a non-issue, since the output region is of fixed size (a terminal). For syntax highlighting using LPeg, the relevant text region is temporarily copied to a contiguous memory region. This is simplified by the fact that vis currently always soft-wraps lines and does not support folds.
Gap buffers have other issues; for example, you only have one gap. What happens if you support multiple cursors/selections which might be in totally different regions of the file? You will have to move the gap.
Anyway, I find the piece chain, with its support for (non-chronological) undo/redo as well as mark management, rather elegant.
In practice it seems to work quite well for vis. I haven't encountered performance issues in daily usage. To the contrary, users have reported that it is "blazing fast".
I would guess that the rope data structure has probably displaced the piece chain in new projects.
This meant that you could run sam and samterm on opposite ends of a slow link and still be able to edit very large files. The remote sam process loads the file into the data structure described in the original article. samterm only loaded (over the slow link) the section of the file needed to draw a window containing the part of the text the user was looking at. As you moved around the file samterm would fill in parts of the data structure with the text you needed to see.
The data structure used on the samterm end is called a Rasp: a file with holes. See https://plan9port.googlesource.com/plan9/+/refs/heads/master...
I will compile a list of more editors people will want to see (I'm seeing a lot of Atom, and of editors using piece tables; vis uses piece tables).
Thank you for the positive feedback. My blog previously had a total of 30,000 views over 2-3 years, and now overnight it has double that - makes me pretty happy and makes me want to put more effort into write-ups! :)
I used a bizarre structure where each onscreen line in the document (soft-wrapped) was represented by a (uint16?) character count and a pointer to an array containing the characters. I had to write a custom memory allocator that put adjacent lines next to each other in memory to make moving characters between lines efficient.
Way too much overhead, and if you changed font you had to completely reload the file, but it was pretty quick even on the old 68K Palms.
+ [prosemirror.js](https://github.com/ProseMirror/prosemirror)
Each of them might be relying on some form of tree to store the content, which the views then react to. I'm only guessing; a deeper, closer look might be interesting and useful.
I imagine someone somewhere has written a more complex editor that manages things a lot more manually, though. I checked the first one you linked (quill), and it, as expected, uses contenteditable.
It's way easier than you might expect. Only one line of JS needed for this demo (inline `onclick` handler for the bold button). Shortcuts like Ctrl+B, Ctrl+I, Ctrl+U, Ctrl+C, Ctrl+V, Ctrl+X, Ctrl+Z work out of the box (in Chrome; IIRC Firefox might open bookmarks for one of those).
Try opening your console while using Quill and type `quill.getContents()`. And try this with Trix: `document.querySelector('trix-editor').editor.getDocument()`.
Seems like you don't know what 'rich text' is at all.
A few years ago I had to research text editors designed to handle files >200 MB. Vim, which I love, wasn't up to the job. Here's what I found (sorry, I don't seem to have the links at my fingertips; also, this is partly based on 4-year-old memories):
* EmEditor: This was very impressive. The speed is breathtaking for what it does, and it can handle advanced functions such as column operations on the full file with ease. We chose this and users were very happy; it transformed their ability to work with large text files.
These were on our shortlist to test next, but we didn't get to them due to available time and EmEditor's fantastic results:
* VEDIT: Designed for large files. They charge for it, IIRC, but based on its reputation it is worth the investment.
* The Semware Editor (TSE): Also a great reputation
* 010 Editor: http://www.sweetscape.com/010editor/
* PilotEdit: http://www.pilotedit.com/
Also of interest:
* File Query: Treats file as a database, enabling parsing, queries; very flexible and powerful, per reviews: http://www.agiledatasoftware.com/
* PDT-Windows: A database editor which reputedly has a max filesize of 18 EB (exabytes)
* bvi: Binary VI
Finally, some excellent resources:
On the other hand, I submit that JOE's highlighter is the fastest available (someone should test this; I've not done it recently).
vim -u NONE big_file.txt
This works very well for very large files.
You can do all the normal editing operations in log time, and B-trees can (obviously) be run from disk, so you don't need to use a lot of RAM, either, and can efficiently edit extremely large text files. In-memory, B-trees also use the memory hierarchy efficiently.
You could have (a secondary index of) line counts as well as bytes so that you can also seek to specific lines in log time.
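A sketch of that secondary index (hypothetical Python; a real editor would use a B-tree, but a balanced binary tree over fixed-size chunks shows the idea): every node caches the byte and newline counts of its subtree, so seeking to the start of a given line only descends one root-to-leaf path.

```python
# Balanced tree over text chunks, each node caching (bytes, newlines)
# for its subtree, so line seeks are O(log n). Illustrative sketch.

class Node:
    def __init__(self, left=None, right=None, chunk=None):
        self.left, self.right, self.chunk = left, right, chunk
        if chunk is not None:                      # leaf
            self.bytes, self.lines = len(chunk), chunk.count("\n")
        else:                                      # internal node
            self.bytes = left.bytes + right.bytes
            self.lines = left.lines + right.lines

def build(text, chunk_size=4):
    chunks = [text[i:i + chunk_size]
              for i in range(0, len(text), chunk_size)] or [""]
    nodes = [Node(chunk=c) for c in chunks]
    while len(nodes) > 1:                          # pair up, bottom-up
        nodes = [Node(nodes[i], nodes[i + 1]) if i + 1 < len(nodes)
                 else nodes[i] for i in range(0, len(nodes), 2)]
    return nodes[0]

def seek_line(node, line):
    """Byte offset where 0-based `line` starts, via the cached counts."""
    offset = 0
    while node.chunk is None:
        if node.left.lines >= line:                # target starts in left
            node = node.left
        else:                                      # skip left subtree whole
            line -= node.left.lines
            offset += node.left.bytes
            node = node.right
    for i, ch in enumerate(node.chunk):            # scan one small chunk
        if line == 0:
            return offset + i
        if ch == "\n":
            line -= 1
    return offset + len(node.chunk)
```

The same descent works for byte offsets using the cached byte counts, which is what makes "go to 50%" style navigation cheap as well.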
In 2015, I would think that most (i.e. > 90%) want an editor that makes them most efficient as developers.
A little side project I have is to create a set of notes on editors I've used, or want to use, so I can compare them, and use each more efficiently.
Editing should be a precise skill, with less hand movement and fewer keys pressed.
Emacs is extremely customizable. The question is which keybindings are the most efficient? Does a modal mode help? god-mode perhaps?
It uses a doubly-linked list of gap buffers. Each gap buffer has a header and a 4K data page. The headers are always in memory, but the data pages can be swapped out to a file in /tmp. The memory usage limit is 32 MB. Possibly this is no longer a good idea - you could easily have more RAM than /tmp space.
The header has the data page's offset in the swap file, the link pointers, the gap location and a count of the number of newlines in the gap buffer.
When a file is read in, the gap buffers are completely full. So read-in turns into a direct read of the file into memory (or into the swap file). The only thing it has to do is count the newlines in each 4K data page and generate the headers.
The newline count is to speed up seeks to specific line numbers. [A long standing enhancement idea is to generate the newline count on demand and use mmap. This would allow the read in to be a NOP- just demand load the pages from the original file as needed and use copy-on-write when any change is made to preserve the original. But I'm also not sure it's a good idea to not take a snapshot of the original file- so this probably should be optional.]
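The header generation and the line seek it enables can be sketched like this (hypothetical Python, not JOE's actual C code): read-in just records one small header per full page, and a seek skips whole pages using the cached newline counts before scanning inside a single page.

```python
PAGE = 4096  # JOE uses 4K data pages

def page_headers(data):
    """Sketch of JOE-style read-in: split into full pages and record
    only (file offset, newline count) per page; the page contents
    themselves need no processing."""
    return [{"offset": i, "newlines": data[i:i + PAGE].count(b"\n")}
            for i in range(0, len(data), PAGE)]

def seek_to_line(headers, data, line):
    """Byte offset of 0-based `line`: skip whole pages via the cached
    newline counts, then scan only from the surviving offset."""
    offset = 0
    for h in headers:
        if line <= h["newlines"]:      # target newline is within reach
            break
        line -= h["newlines"]
        offset = h["offset"] + PAGE    # skip this page entirely
    while line > 0:
        offset = data.index(b"\n", offset) + 1
        line -= 1
    return offset
```

With 4K pages this is linear in the number of pages rather than bytes; combining it with a tree index (as in the B-tree comment above) would make it logarithmic.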
JOE uses smart pointers to the edit buffer. Each pointer has the address of the header and a memory pointer to the data page (which is always swapped in if there is a pointer to it). The software virtual memory system has a reference count on each page. Each pointer holds a reference on the data page it's pointing to. If there is no pointer to a page, the reference count is zero, so it can be swapped out.
The other purpose of the smart pointers is to automatically stick to the text they are pointing to, even through insert and delete operations. So if you insert at one point in the file, any pointers to further locations are updated (including line number, byte offset, column number and memory offset).
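The sticky-pointer behavior can be sketched in a few lines (hypothetical Python; JOE's real pointers also track columns and page references): the buffer keeps a registry of live pointers, and each insert adjusts every pointer that sits at or after the insertion point.

```python
# Sketch of "sticky" pointers: inserts update the byte offset and line
# number of every registered pointer located after the edit.

class Buffer:
    def __init__(self, text):
        self.text = text
        self.pointers = []

    def pointer(self, offset):
        p = {"offset": offset, "line": self.text.count("\n", 0, offset)}
        self.pointers.append(p)
        return p

    def insert(self, offset, s):
        self.text = self.text[:offset] + s + self.text[offset:]
        for p in self.pointers:
            if p["offset"] >= offset:          # pointer is after the edit
                p["offset"] += len(s)
                p["line"] += s.count("\n")
```

This is what lets marks (and undo positions) survive arbitrary editing without any re-scanning.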
For example, you can open an SVG file: if you use image mode, you see the image representation; otherwise you get the code.
This is incorrect. Intuitively, if you're working with large files, you're likely to generate large changes in file size. Algorithmically, this is a very common approach: collections in most languages double their size whenever the existing capacity is exceeded. This minimizes the number of times you need to reallocate lists of increasing size, and it works nicely with the memory management systems of various operating systems.
It's also not resource-efficient to change the size of the collection for every new line. I would suggest that a 10,000-line initial capacity and growing by 10% on reallocation are appropriate compromises.
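The difference between the growth policies is easy to count (hypothetical Python sketch, with the growth rule passed in as a function):

```python
def reallocs(n, grow):
    """Count reallocations for n appended items under a growth policy:
    whenever the current capacity is exceeded, apply `grow` once."""
    cap, count = 1, 0
    for size in range(1, n + 1):
        if size > cap:
            cap = grow(cap)
            count += 1
    return count

# grow-by-one reallocates on every append; doubling is logarithmic
print(reallocs(100_000, lambda c: c + 1))   # 99999
print(reallocs(100_000, lambda c: c * 2))   # 17
```

A 10% growth factor (e.g. `lambda c: c + max(1, c // 10)`) lands in between: still logarithmic in the number of reallocations, but with less peak over-allocation than doubling.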
Even old ones such as CKEditor plan on updating their engine: https://medium.com/content-uneditable/ckeditor-5-the-future-....
EDIT: oh here is where it is: https://git.kernel.org/cgit/editors/uemacs/uemacs.git
And it looks like it's line-based, like old vi (I think): https://git.kernel.org/cgit/editors/uemacs/uemacs.git/tree/l...
Interestingly, there was a line limit per window based on a signed char (127) that just got changed to an int (that was the last change made).
Text editing on Windows still mostly sucks.
Hardly meant as serious editors. But they will do for simple tasks.
I've opened 200MB+ files in Notepad before, it depends entirely on processor speed.
Why do people who write in C so often make it look like they have a shortage of letters to use for names?
F = m * a
ADD STATE-SALES-TAX TO STATE-TAXABLE-SUBTOTAL GIVING SUBTOTAL-INCLUDING-STATE-SALES-TAX
In the usual implementation the gap is only moved when you actually insert text, not when the cursor is moved, which makes it even more efficient.
Using strings like this will thrash your garbage collector, especially with long lines. An array, or a more interesting data structure, might be more sensible.
(only pursuing this because I'm interested in your project!)
You might find my blog interesting:
2005-2016: "it takes too long to open big files" https://bugzilla.gnome.org/show_bug.cgi?id=172099