Rope Science – Computer science concepts behind the Xi editor (github.com)
421 points by krat0sprakhar 69 days ago | 60 comments

Interesting! I find it amazing that they went all the way into implementing a CDRT to support efficient plugins.

At the moment I am also writing a text editor with my partner, to showcase the C++ RRB-Tree implementations in Immer [1]. We have just started, but my plan is to stop at 1000k lines. Interestingly, with such a persistent data structure, you already go a long way in implementing undo and parallel processing, and many editing algorithms are super simple thanks to slicing/concat. Memory consumption is still sub-optimal (also I am using wchar_t to simplify my life), but I am very satisfied with the results so far (it is the fastest editor I have on my machine when editing a 1GB file, and it can also edit a ~100MB file on a Raspberry --- larger files fail only due to excessive memory use :/).

[1] https://github.com/arximboldi/immer

I wonder how a rope behaves after lots of changes. Take a 1GB file and modify every line multiple times. Now the whole file is severely fragmented, and moving around the file could be much slower. For me, moving around a file quickly is much more important than big edits. In practice it's probably still good enough.

I'm developing my own text editor. To test a naive implementation of the text editing routines (an array of line strings), I concatenated all the sources of the recent Linux kernel 4.6.3 once. From what I remember, that was all the .c, .h and .S files. The resulting file was 542MB and 19'726'498 lines. It was good enough, even when adding a line at the front.

The true strength of the rope data structure is its performance in worst case conditions. Even after a huge series of edits, it recombines leaves and rebalances the tree so it's almost as efficient as when it's loaded freshly. It's not like a piece table, which is amazingly efficient at first and then fragments.

[The actual analysis is much more subtle, as the tree invariants are a minimum and maximum number of children for non-root non-leaf nodes, and a minimum and maximum leaf size. When freshly loaded, the code is careful to maximally pack both, and after extensive editing the distribution will be more varied. However, it's still rigorously O(log n), so it ends up being a small constant factor. Might be worth doing some empirical performance testing.]

Pardon my ignorance, but what's a CDRT?

Probably a typo for CRDT: Conflict-free Replicated Data Type.

I love to geek out on this sort of stuff. Data Structures for Text Sequences by Charles Crowley [1] is a great read. I would also recommend checking out the Vis editor [2]. It's an interesting Vi like editor that uses the piece chain as its data structure and supports Sam's structural regular expressions.

[1] https://www.cs.unm.edu/~crowley/papers/sds.pdf

[2] https://github.com/martanne/vis

For those interested in reading more about this idea, the term is "monoid cached trees". The Fenwick tree is a particular well-known special case.
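
As a rough illustration of the "monoid cached tree" idea, here is a minimal Python sketch. It is not taken from xi or from any library: the `MonoidTree` class and its method names are made up for this example, and the tree is a fixed-size array-backed binary tree rather than a rope. Each internal node caches the monoid sum of the leaves below it, so a range query or a single-leaf update touches O(log n) nodes.

```python
class MonoidTree:
    """Array-backed binary tree caching a monoid sum per node (hypothetical sketch)."""

    def __init__(self, leaves, combine, identity):
        self.combine = combine
        self.identity = identity
        # Pad the leaf count to a power of two so left-to-right order is preserved.
        size = 1
        while size < len(leaves):
            size *= 2
        self.size = size
        self.nodes = [identity] * (2 * size)
        self.nodes[size:size + len(leaves)] = leaves
        for i in range(size - 1, 0, -1):
            self.nodes[i] = combine(self.nodes[2 * i], self.nodes[2 * i + 1])

    def update(self, index, value):
        """Replace one leaf and refresh the cached sums on the path to the root."""
        i = self.size + index
        self.nodes[i] = value
        i //= 2
        while i >= 1:
            self.nodes[i] = self.combine(self.nodes[2 * i], self.nodes[2 * i + 1])
            i //= 2

    def query(self, lo, hi):
        """Monoid sum of leaves[lo:hi], combined in left-to-right order."""
        left, right = self.identity, self.identity
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                left = self.combine(left, self.nodes[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.combine(self.nodes[hi], right)
            lo //= 2
            hi //= 2
        return self.combine(left, right)


# Example: cache newline counts per text chunk (integer addition is the monoid).
chunks = ["a\nb", "c", "\n\n"]
newlines = MonoidTree([c.count("\n") for c in chunks], lambda a, b: a + b, 0)
total_newlines = newlines.query(0, len(chunks))
```

Only associativity and an identity element are required of `combine`, which is why the same tree shape works for sums, maxima, or string concatenation alike.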

(I looked into monoid cached trees for float placement in CSS at one point in an effort to come up with an O(n log n) algorithm. I succeeded, but the constant factor was so high that it wasn't worth the price. I ended up switching to an algorithm based on splay trees that, while O(n^2) in the worst case, ended up being O(n) with a small constant factor on real-world pages.)

Reading the fourth entry on parenthesis matching made me wonder whether one could store, in the monoid, partial views into the table that is generated by CYK parsing: https://en.wikipedia.org/wiki/CYK_algorithm

I love the idea of using monoids like they're described in the blog series, but the examples suggest that there's a certain amount of non-generalizable cleverness that goes into defining each monoid. Could you do CYK subtables inside the monoid, so that people can define arbitrary CF grammars, as long as they're in Chomsky normal form?

I've been wondering lately if it would be possible to make an entire compiler stack incremental, so that the binary changes on disk as I type. I am positively sick of waiting tens of seconds or even minutes for the compiler to redo all the work it's already done thousands of times just to make a one-byte change to my binary.

Presumably, moving to an infrastructure like this (everything is incremental) is the biggest difference between Old compilers and New compilers, because of the ubiquitous IDE. I think I remember watching a talk by Anders Hejlsberg about this.

Well, you can already have half of a split unicode character at the beginning and end of a substring in the rope.

If that doesn't happen (like in the "parens" or "other" description), to use CYK, I think you could maybe have a piece of the table for each substring (so for all entries with both indices inside the substring) in each node, but then I don't think the operation at the parent is commutative. And you have to decide whether to copy the info from the children or not (if not, you'll spend an extra log n time looking down the tree).

I first discovered ropes - aka cords - from Boehm's library, alongside the conservative collector.

IIRC, Internet Explorer used a binary tree to represent its strings, at least at version 5 in the early 2000s, because of the inefficiency of doing lots of copying for string operations - looped concatenation was one of the primary drivers. That doesn't mean it went all the way to ropes, of course.

Please read this with a grain of salt as it does not seem practical or necessary. It seems like the kind of thing written by a young person who is excited but doesn't really have much experience. Most of the ideas would not be real-world-useful as stated.

Excitement is nice to feel, but it takes some experience to know when excitement is really aimed in a productive direction. Otherwise we end up with the kind of motivation that so often produces over-complex and mis-aimed software: having a "cool idea" for "exciting technology" and then looking for places to apply it, and the applications don't really fit or don't really work, but we don't want to notice that, so we don't.

To pull examples: an entire one of these essays is on "paren matching" and how it would be really great if you monoidized (ugh) and parallelized that ... the basic idea of which is instantly shot down by the fact that language grammars are just more complicated than counting individual characters. Hey bro, what if there is a big comment in the middle of your file that has some parens in it? The author didn't even think of this, and relegates this to a comment at the end of that particular essay: "Jonathan Tomer pointed out that real parsing is much more interesting than just paren matching." Which is a short way of saying "this entire essay is not going to work so you probably shouldn't read it, but I won't tell you that until the bottom of the page, and even then I will only slyly allude to that fact." Which in itself is contemptuous of the reader -- it is the kind of thing that happens when you are excited enough about your ideas that the question of whether they are correct is eclipsed. This leads to bad work.

There's the essay about the scrollbar -- if you have a 100k-line text file, do you really want a really long line somewhere in the middle to cause the scrollbar to be narrow and tweaky in the shorter, well-behaved majority of the file? No, you probably don't! But this shoots down the idea that you might want to do a big parallel thing to figure out line length, so he declines to think about it. In reality what you probably want is the scrollbar to be sized based on a smooth sliding window that is slightly bigger than what appears on the screen (but not too much).

Besides which, computers are SO FAST that if you just program them in a straightforward way, and don't do any of the modern software engineering stuff that makes programs slow, then your editor is going to react instantly for all reasonable editing tasks.

I don't want to be too overly critical and negative -- these sorts of thoughts are fine if they are your private notes and are thinking about technical problems and asking friends for feedback. It becomes different when you post them to Hacker News and/or the rest of the internet, because this contains an implicit claim that these are worth many readers' time. But in order to be worth many readers' time, much more thought would have had to go in ... and as a result, the ideas would have changed substantially from what they are now.

I didn't read past essay 4, so if it gets more applicable to reality after that I don't know!

Hi Jon! Great to see you here. Much of what I wrote is speculative, especially paren matching based on monoids. What actually went into xi editor is much more conservative. I didn't actually intend this to be posted to HN, but am not too surprised it did.

As to being a young person, I don't know if you remember, but I was already of drinking age when you and me and Adam Sah hung out a bit and talked about the "rush" language.

I admit to being excited, but because I think "modern" editors really are too bloated and slow and it causes a thousand paper cuts throughout the day, and I think that can be fixed.

Oh you're that Raph! Hi.

Sorry for presuming age + experience level, it's how this came across to me. Actually I think Rush is a prime example of "excited about general ideas that turn out not to be right or relevant to much". But we were students, and I guess that is what students often do.

I agree modern editors are too slow and bloated. I would write one if I didn't have way too many other things happening. But I don't think they are slow and bloated due to a lack of computer science concepts. I think they are slow because most of the world, over the last 25 years, has lost the art of writing software that is remotely efficient.

If I were to write an editor, it would store text as arrays of lines (since lines are what you care about) with maybe one level of hierarchy, such that each 10k lines of the file are in one array. I think that would be fine and if it ran into problems with very large files, relatively minor modifications would take it the rest of the way. (Of course this is untested but I feel pretty confident about it). Rather than calling malloc all the time, a specialized allocator would be in play.
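
A minimal Python sketch of the "arrays of lines with one level of hierarchy" layout described above. The `ChunkedLines` name, the split-at-twice-the-chunk-size rule, and the linear scan over chunks are all assumptions made for illustration; the comment itself only specifies roughly 10k lines per array.

```python
class ChunkedLines:
    """Lines stored in chunks of about chunk_size lines each (hypothetical sketch)."""

    def __init__(self, lines, chunk_size=10_000):
        self.chunk_size = chunk_size
        self.chunks = [lines[i:i + chunk_size]
                       for i in range(0, len(lines), chunk_size)] or [[]]

    def get(self, index):
        # Walk the chunk list, subtracting chunk lengths until the index fits.
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError("line index out of range")

    def insert(self, index, line):
        for ci, chunk in enumerate(self.chunks):
            if index <= len(chunk):
                # Only this chunk's entries shift, not the whole file's.
                chunk.insert(index, line)
                if len(chunk) > 2 * self.chunk_size:
                    half = len(chunk) // 2
                    self.chunks[ci:ci + 1] = [chunk[:half], chunk[half:]]
                return
            index -= len(chunk)
        raise IndexError("line index out of range")

    def line_count(self):
        return sum(len(chunk) for chunk in self.chunks)
```

With 10k-line chunks, inserting a line at the front of a 20-million-line file shifts at most about 20k list entries plus a scan over roughly 2k chunk headers, instead of shifting 20 million entries.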

I do think it's a good idea to make a better editor so I wish you good luck with that (dude I am so sick of emacs).

I used an array of lines in gzilla[0]. It's not bad, but it _is_ exporting details of the representation up to clients, which gets worse once you add some hierarchy. To me, the nice thing about ropes is that they appear to the client as just a string, with some fancy added methods such as creating a cursor that can advance to the next or previous line efficiently. Of course, to make this work, you need a good programming language; an implementation of ropes in C would be painful.

As I said elsewhere on the thread, the good thing about ropes is their worst case performance. Super-long line? Rope still efficient. 10 million line file? Rope still efficient. Massive sequence of edits relative to the pristine file? Rope still efficient. It's pretty easy to write code which works well in the common cases but then degrades when things get interesting. I'm not going to apologize for caring about that.

[0] http://www.levien.com/free/gzilla-tour.html

But ... in a text editor you care about lines most of the time.

This is what I object to about the rope representation -- it intentionally destroys this information that you actually want available most of the time. I don't think that's nice at all.

It's possible you could make the rope work better in this sense by annotating each piece... I dunno, haven't thought about it.

As for the worst-case performance thing ... I think my scheme would do fine with super-long lines or 10 million line files. But dude, I don't even have an editor today that works okay on 10k-line files, and I don't think it's the internal data representation that's the problem, I think it's because of all the other decisions that get made (or lack thereof).

I have no idea what you mean about "intentionally destroying" line info. The info is right there, and there's a perfectly nice interface (Cursor) for getting at it; no real difference between calling next() on a Cursor object and doing line_index++.

Besides, what's a "line"? Is it a source line, or a visual line after wrapping? In an editor, you care deeply about both, depending on what exactly you're doing. With an array of lines, you pick one for the representation, and when you want the other one, it's quite painful. With ropes, no problem at all, just two different Cursor objects into the same rope.

I agree, other decisions besides buffer representation are important, and I'm doing my best to get those right too.

>> and when you want the other one, it's quite painful.

It's really not that painful. A logical line of text may span multiple visual lines because of word wrapping. A visual line will never span more than one logical line. So you always have a 1-to-N relationship between logical and visual lines, and everyone knows you have to go to M-to-N before things get objectively painful.

Here's the thing about text editors: they are extremely well known problems. It's really hard to say anything about writing a text editor is "difficult" when we have hundreds of examples of free, open source software that we can sit with and analyze for requirements. There really aren't many new lessons to learn, certainly not in the memory representation for the file.

It's really nice to talk about parallelizing your text editor operations, but here in the year twenty thousand and seventeen, Vi is still single-threaded and not at all hardware accelerated and it still runs essentially infinitely faster than most other text editors. The only time Vi's single-threadedness becomes a problem is when plugin authors can't be arsed to not load up Python.

What data structure does Vi use? An array of lines.

What other than computer science concepts can save us from slow software? Computer science concepts (Amdahl and Moore, maybe Dennard, maybe-maybe Landauer) have proven that hardware improvements cannot. Is there some aspect of the software performance that can only be understood through another domain of human endeavor?

I am not trying to be anti-intellectual, but software currently has the opposite problem, where people decide some idea will Make Everything Better and it turns out that this idea does nothing of the kind. In fact some of these ideas have set software engineering back by decades (example: Object-Oriented Programming).

There are, of course, computer science concepts that are very smart. But we don't need these to save us from slow software, because today's slow software problem is just the result of people doing bad things in layer upon layer. We have to stop doing all the bad stuff and dig us out of the hole we're in, just to get back to neutral. Once we are back at neutral, then we can try thinking about some computer science smarty stuff to take us forward.

See? Now you're sounding like an old fart :-)

You give Object-Oriented Programming as an example of an idea that has set software engineering back by decades, but really, isn't it the misapplication of that tool that does the damage? Consider that time and again software engineers have developed a code base of functions which all need a bit of shared state, and the technique of calling all those functions while including a reference to the 'context' was more simply expressed as calling a function "from" the context itself?

The tools help with the cognitive burden of understanding the entire system. And tools that allow one to make durable assumptions about part of the system allow the engineer to 'free up' space in their brain for other parts of the system. That desire to add abstraction in order to 'move up' the conceptual tree and get a wider view of the overall system has been part of how humans think since they first started collecting into tribes[1].

Certainly "over abstracting" is a huge issue. There was a great story about the Xerox Star system (first word processor that was multi-lingual and multi-fonted) that Dave Curbow used to tell about how the 'call stack' to get a character on the screen had reached insane levels (and that made things very slow). All due to abstraction. And yet there are good examples too where at Sun the adding of support for the 3b2 file system was accomplished quickly, and everything still worked, due to the Virtual File System (VFS) abstraction.

My point is that it isn't the tools that set computer science back, it is the misapplication of them that does that. And what I liked about Raph's discussions on Xi is the exploring and testing whether or not a tool he had available was applicable to writing text editors.

[1] Can you imagine the challenge of having to talk to someone in the tribe for 10 minutes to determine what their role was and capabilities? So much easier to say "You're a hunter right?" and when they say yes just assume various hunter capabilities are available.

This just points to a problem that computer science concepts need to address: the conflation of subroutines (reusable components of programs) and runtime calls/returns. Or, stated more directly, programmer control over inlining and other costs, on the way to enabling more zero-cost abstractions.

One big problem is that one can't deploy the existing methods of zero-cost abstraction (templates/monomorphized generics) across process boundaries or the kernel/userspace boundary. Let's work on this, not act as if abstractions themselves are the problem.

(As an aside: I think the "implementation inheritance" aspect of OOP is at least as harmful as jblow suggests. "Associate related pieces of data with the code that operates on them as a unit" is a pretty reasonable idea on its own.)

> In fact some of these ideas have set software engineering back by decades (example: Object-Oriented Programming)

I don't want to sound like I want to oppose you here, but I would be genuinely interested in your write-up about OOP setting engineering back by decades. Could you elaborate, please?

> is just the result of people doing bad things in layer upon layer.

You can make a decent argument that's the only realistic way extremely large systems like the web can possibly come about. It's not pretty but the track record of the alternatives is worse.

The context of this discussion is about "a high quality text editor".

Of course in the name of not using the same golden hammer for all problems, extremely large systems like the web and text editors should each be considered in their own rights for the best solution for each of those.

I know. But you can also substitute 'long-lived' for 'extremely large', etc. 'People have lost track of the importance of efficiency/performance' is a recurring point of jblow's. There's something to it, no doubt, but I think it also merits some pushback.

I don't think "long-lived" is a good substitute for "extremely large". The longer something lives, the better it should be, for multiple reasons -- more time to work on the code, more design iterations, more-thorough understanding of the problem gained over time. If the code is just getting more messy and decayed and hard-to-deal-with over time, then we are doing something wrong. (And we almost always are).

I'm not just saying that people have lost track of the importance of efficiency. I am saying they've lost track of how to actually do it. I think at least 95% of the programmers working in Silicon Valley have no practical idea of how to make code run fast. Of the remaining 5%, a very small number are actually good at making code run fast. It's a certain thing that you either get or don't. (I didn't really get it when I started in games, even though I thought I did ... it took a while to really learn.)

I think it's a decent substitute because what you say should happen is generally the opposite of what actually does happen, often for reasons other than ineptitude. But more generally, my wanky counter-point is 'you say all these things as if it's a given they're unequivocally bad and it's not at all obvious to me that they are'. There's an awful lot of room above 16ms.

Dude, Photoshop takes many seconds to start up, and as of the most recent redesign it now often takes multiple seconds just to display the new project menu. And this is not atypical of today's software.

Forget 16ms, I would be happy to get to an order of magnitude slower than that for much of today's software ... it would be a massive increase in human happiness.

> the basic idea of which is instantly shot down by the fact that language grammars are just more complicated than counting individual characters

I'm sure someone overly excited is at this very moment trying to make this a mathematically precise statement so they can see if you're right, and if so, if there's a way to change the author's approach to support more complicated computations on the text. Maybe you can help them along if you're not just guessing.

There was a link to the Wikipedia article on the Dyck language [1] somewhere, which led me to the Chomsky–Schützenberger representation theorem [2], which says that every context-free language can be represented by a homomorphism applied to the intersection of a Dyck language (possibly with multiple different kinds of parentheses) with a regular language.

The Dyck language can be recognized by a monoid as described. The regular language can be recognized by a monoid whose elements are functions between states of the recognizing automaton, each symbol being identified with the transitions the automaton performs when reading the symbol. The intersection should be describable by the product of monoids. (If such a thing exists, anyway. My abstract algebra is a bit rusty.)
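
A small Python sketch of the two monoids and their product, under some assumptions of mine: the paren summary is an (unmatched closers, unmatched openers) pair, which may not be the exact encoding used in the essays, and the regular language is a toy two-state automaton that tracks whether a double-quoted string is open. All function names are hypothetical.

```python
from functools import reduce

# Monoid 1: paren summary = (unmatched ')' count, unmatched '(' count).
PAREN_ID = (0, 0)

def paren_of(ch):
    return {"(": (0, 1), ")": (1, 0)}.get(ch, PAREN_ID)

def paren_combine(a, b):
    # Openers left over from `a` cancel against closers at the start of `b`.
    matched = min(a[1], b[0])
    return (a[0] + b[0] - matched, a[1] + b[1] - matched)

# Monoid 2: functions between automaton states, stored as tuples where
# f[s] is the state reached from s. State 0 = outside quotes, 1 = inside.
DFA_ID = (0, 1)

def dfa_of(ch):
    return (1, 0) if ch == '"' else DFA_ID

def dfa_combine(f, g):
    # Run f first, then g.
    return tuple(g[s] for s in f)

# Product monoid: combine componentwise.
PRODUCT_ID = (PAREN_ID, DFA_ID)

def product_combine(a, b):
    return (paren_combine(a[0], b[0]), dfa_combine(a[1], b[1]))

def summarize(text):
    elems = [(paren_of(c), dfa_of(c)) for c in text]
    return reduce(product_combine, elems, PRODUCT_ID)

def accepted(text):
    parens, transitions = summarize(text)
    # Balanced parens AND the automaton ends outside quotes when started outside.
    return parens == PAREN_ID and transitions[0] == 0
```

Because both components are associative, any split of the text gives the same summary, which is what lets a tree cache it per node. Note that the plain product keeps the two components independent: parens inside quotes are still counted here.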

The hard part is dealing with the homomorphism that turns the intersection of the two languages into the context-free language. I'm guessing that the general construction might require turning parentheses into empty strings, which would destroy the simplicity of the monoidal construction.

Anyway, I'm unfortunately not excited enough to see whether I can get it to work, at least not right now. But maybe it helps point someone else in the right direction.

[1] https://en.wikipedia.org/wiki/Dyck_language

[2] https://en.wikipedia.org/wiki/Chomsky%E2%80%93Sch%C3%BCtzenb...

To get the parentheses right, you have to parse the language.

There is an extensive body of literature on parsing that goes back decades. Most of it I don't think is that useful. But some of it is about parallel parsing. If you are interested, there are quite a number of people with something to say about it. However, the speed wins in practice are not very big.

On the other hand, if you just write the parser so that it's fast to begin with, you don't really have a problem. The language I am working on parses 2.5 million lines of code per second on a laptop, and I have only spent a couple of hours working on parser speed. To do this it does go in parallel, but it goes parallel in the obvious way using ordinary data structures (1 input file at a time as a distinct parallel unit). So it's not "parallel parsing" in the algorithmic sense.

Why do you need to parse the language to get parens correct? For most languages, comments and strings will need to be considered, but neither of these requires doing a full parse.

Of course, I don't disagree with your point that a fast parser makes drawing a distinction here less useful. However, that number sounds interestingly large without context; do you have more info I can read about it?

Parsers don't often handle degenerate cases very well, but degenerate cases are quite common in text editing. (Think about how often work in progress code might actually parse correctly.)

Lexers/tokenizers handle degenerate cases very well (after many years of being used for syntax highlighting in IDEs), and some parsers intended for IDE consumption are getting much better at dealing with degenerate cases (the Roslyn family and Typescript in my experience have some very interesting work put into this area), but most parsers still have a long way to go. (Especially, because many of the most common parser-generators themselves have never bothered to concern themselves with degenerate cases.)

You make a good point (not only is parsing unnecessary, lexing-only/"ad hoc" analysis can be more resilient for many of the tasks in an editor), however your phrasing makes it sound like you're disagreeing with me, but I don't understand with which part of my comment. Could you expand?

Interesting. I'm not sure what tone you were seeing, other than that I realize "degenerate" has a very negative tone, but it is the most apt technical term I can find.

Was neither agreeing nor disagreeing with your points, simply expanding "sideways", because I think the conversation about the usefulness of parsing to general usage in text editors and information processing gets derailed by the "degenerate" cases where things don't parse (because those are very important to text editors).

I think people often forget or underappreciate the lexing half of the lexer/parser divide. Yet syntax highlighting engines in most of the text editors we use these days already hint that you can do a lot of user meaningful things with "just" rudimentary, generic lexers.

As a continued aside: I felt I got really good results using a lexer as the basis for a character-based somewhat "semantic" diff tool, but still to date I've yet to really see it come into general usage outside my prototype toy (http://github.com/WorldMaker/tokdiff).

First, to the immediate point: Ropes are not talked about much outside of whitepapers and undergrad data structure courses. This is a shame, even if ropes are only useful on very long strings. More discussion, more documentation, and more attempts to use ropes are not a bad thing, even if nothing "practical or necessary" comes of it. (And I would say that a text editor is quite practical!)

Second, it's completely reasonable to imagine data-structure-driven improvements to the task of writing source code. Imagine, for example, an AST editor. There are tools like org-mode and paredit which are halfway to true AST editing, and plenty of languages have tooling sufficient to support it, if there were demand. An AST editor would, of course, generalize the lessons here about ropes to pretty-printed ASTs, but there's no innate reason why it couldn't be done.

Third, a meta-comment, to address your comment elsewhere in the thread: "There are, of course, computer science concepts that are very smart. But we don't need these to save us from slow software, because today's slow software problem is just the result of people doing bad things in layer upon layer. We have to stop doing all the bad stuff and dig us out of the hole we're in, just to get back to neutral. Once we are back at neutral, then we can try thinking about some computer science smarty stuff to take us forward."

There are assumptions here about the nature of CS. CS is a science of abstractions. Complaining about layers in CS is complaining about the very nature of CS. Software is slow and insecure and hard to use because the tasks that we demand of software are extremely complex and our human processes for creating code are not sufficiently high-level, powerful, and expressive enough for us to design good systems on the first try.

Whenever somebody says, "I have removed a useless abstraction," they usually forget to also say, "By replacing it with a useful abstraction."

Only replying to one point in your post, but if you think org-mode and paredit take you halfway to AST editing, try abo-abo's lispy. It's closer to the ideal by half again, at least.

I've found the best removals of abstractions I've done have gone back to essentially bare code. That is, I put back the abstraction either I or a coworker originally avoided.

Is there any comprehensive documentation on the rope datastructure?

If you're talking about ropes in general, Boehm et al. [1] is the authoritative source. There's a wikipedia page, too, but I find it way more confusing than it needs to be.

If you're interested in applications to other domains, we use ropes as the basis of the String data type in TruffleRuby. TruffleRuby is open source, so you can see that implementation by checking out the code in the org.truffleruby.core.rope package [2]. We had to extend the basic idea of ropes to better match Ruby's semantics, such as making them encoding-aware. I gave a talk about it at last year's RubyKaigi [3] that dives into real world trade-offs.

There are also a lot of various rope implementations out there. You probably can find one for your language of choice.

[1] - http://citeseerx.ist.psu.edu/viewdoc/download?doi=

[2] - https://github.com/graalvm/truffleruby

[3] - https://www.youtube.com/watch?v=UQnxukip368

It's not comprehensive yet, but at least you can navigate the API at https://docs.rs/xi-rope/0.2.0/xi_rope/

This write up is really good! Are there similar write-ups for things other than text editors?

I wonder whether wanting only a printable-ASCII editor would simplify things a lot or only a little. And I guess no tabs.

> Part 2 Line breaking

I don't really understand the problem here. Can't we count the line breaks like anything else? Is it because those aren't the values we want in the end?

> Part 4: Again, making this into a monoid is pretty easy. You store two copies of the (t, m) pair - one for the simple case, and one for the case where the beginning of the string is in a comment. You also keep two bits to keep track of whether the string ends or begins a comment. In principle, you have to do the computation twice for both cases, whether the first line is a comment or not, but in practice it doesn’t make the computation any more expensive: you compute (t, m) for the first line and for the rest of the string, and just store both the first value and the monoid sum.

What if a node of the rope contains an "end comment" and (later) a "("? What should the two pairs of (t, m) be? Now that substring might be entirely inside, outside or partially outside and inside a comment.

Although I do understand the general idea of computing the result for all possible initial/input states to achieve parallelism.
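
One way to make "compute the result for every possible initial state" concrete is the Python sketch below. The assumptions are mine, not the essay's: comments are Pascal-style `{ ... }` with single-character delimiters (so a delimiter can never be split across chunks), the paren summary is an (unmatched closers, unmatched openers) pair, and each chunk stores one (summary, end state) entry per start state instead of the essay's two (t, m) pairs plus two bits.

```python
NORMAL, COMMENT = 0, 1

def paren_combine(a, b):
    # (unmatched ')', unmatched '(') pairs; openers of `a` cancel closers of `b`.
    matched = min(a[1], b[0])
    return (a[0] + b[0] - matched, a[1] + b[1] - matched)

def scan(chunk, state):
    """Scan one chunk from a given start state; return (paren summary, end state)."""
    close = open_ = 0
    for ch in chunk:
        if state == COMMENT:
            if ch == "}":
                state = NORMAL
        elif ch == "{":
            state = COMMENT
        elif ch == "(":
            open_ += 1
        elif ch == ")":
            if open_:
                open_ -= 1
            else:
                close += 1
    return (close, open_), state

def summarize(chunk):
    # One entry per possible start state: index 0 = NORMAL, index 1 = COMMENT.
    return tuple(scan(chunk, s) for s in (NORMAL, COMMENT))

def combine(a, b):
    out = []
    for s in (NORMAL, COMMENT):
        parens_a, mid = a[s]        # where does `a` leave us, starting from s?
        parens_b, end = b[mid]      # pick the matching entry of `b`
        out.append((paren_combine(parens_a, parens_b), end))
    return tuple(out)
```

For a chunk such as `a) } (b`, the NORMAL entry counts the `)` while the COMMENT entry ignores it up to the `}`; both entries are kept, and the parent selects one once the left sibling's end state is known.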

For those interested in following the project or participating in discussions, there's a subreddit over at https://www.reddit.com/r/xi_editor/

For anyone interested in contributing, there is also (as of yesterday) #xi on irc.mozilla.org.

This subreddit is almost dead. That's sad.

The text editor that I use for everything is one that I wrote myself a decade ago in 5K lines of C, based on gap buffers. Save the "advanced computer science" for the problems that need it.
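
For readers who have not met gap buffers, a minimal Python sketch follows. It is an illustration only, not the commenter's C implementation; the `GapBuffer` class, its growth policy, and the method names are assumptions. The text lives in one array with a movable hole at the cursor, so typing at the cursor is cheap and only moving the cursor costs a copy.

```python
class GapBuffer:
    """Text stored in one array with a movable gap (hypothetical sketch)."""

    def __init__(self, text="", gap=16):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # one past the last free slot

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        # Gap is empty: open a new one at the current position.
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, text):
        self._move_gap(pos)
        for ch in text:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        # Deleting forward just widens the gap over the removed characters.
        self._move_gap(pos)
        self.gap_end = min(len(self.buf), self.gap_end + count)

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

The cost that ropes avoid shows up in `_move_gap`: jumping from one end of a very large buffer to the other and then editing copies everything in between.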

Would it be possible to add in-memory LZ4 compression in xi-editor? For those huge log, XML, csv, etc. files?

Maybe it'd still be possible to maintain good response time while enjoying 4-10x memory savings.

It would be possible. I'm thinking of adding "pack/unpack" operations to the Leaf data structure, but it's more motivated by getting a varint encoding for line breaks; right now each break is a 64 bit integer (on 64 bit builds), so if you have a file consisting of empty lines, the corresponding line break data structure is 8 times bigger than the text. With the varint encoding I have in mind, it would be bounded at 1/4.
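
To put numbers on the 8x figure, here is a Python sketch using a generic LEB128-style varint over the gaps between consecutive breaks. This is not the encoding the author has in mind (which is not described here) and it would not reach the 1/4 bound: with one byte minimum per break it only gets a file of empty lines down from 8 bytes to 1 byte per line. The function names are hypothetical.

```python
def encode_breaks(offsets):
    """Delta-encode sorted line break offsets as LEB128-style varints."""
    out = bytearray()
    prev = 0
    for off in offsets:
        delta = off - prev
        prev = off
        while True:
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)   # high bit set: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

def decode_breaks(data):
    """Inverse of encode_breaks."""
    offsets = []
    prev = shift = delta = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += delta
            offsets.append(prev)
            shift = delta = 0
    return offsets

# A file of 1000 empty lines: one break after every byte of text.
text = "\n" * 1000
breaks = [i + 1 for i, ch in enumerate(text) if ch == "\n"]
fixed_width_bytes = 8 * len(breaks)        # 64-bit integer per break
varint_bytes = len(encode_breaks(breaks))  # one byte per break in this sketch
```

Lines shorter than 128 bytes cost one byte per break in this scheme, so typical source files would sit well below the fixed-width size even without anything cleverer.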

That said, I'm very skeptical that lz4 on text would be worth it.

Wouldn't it be easier to just not load those lines?

I opened two of the rope documents and I don't even get the problems they try to solve. How can I decide whether these problems are mine as well?

Sure my text editors aren't perfect, but they mostly get the job done, so any editor coming along needs to show that it tries to solve a problem that the user has. I'm not yet convinced this one does, so I probably will never find out what makes it brilliant.

Given a long string it will make insertion into the middle of text fast. Have you ever opened a file in Atom, added one character, and the whole thing locks up? That's the use case this solves, among others.
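
A stripped-down Python sketch of why a rope makes mid-string insertion cheap. It is deliberately unbalanced and is not how xi-rope is implemented; the `Leaf`, `Node`, `concat`, `split` and `insert` names are assumptions for this example. An insert splits along one root-to-leaf path and re-joins, so the untouched subtrees are shared rather than copied.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str

    @property
    def length(self):
        return len(self.text)


@dataclass(frozen=True)
class Node:
    left: "Rope"
    right: "Rope"
    length: int  # cached total length of both subtrees


Rope = Union[Leaf, Node]


def concat(a, b):
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    return Node(a, b, a.length + b.length)


def split(rope, pos):
    """Return (rope[:pos], rope[pos:]) while sharing untouched subtrees."""
    if isinstance(rope, Leaf):
        return Leaf(rope.text[:pos]), Leaf(rope.text[pos:])
    if pos <= rope.left.length:
        l, r = split(rope.left, pos)
        return l, concat(r, rope.right)
    l, r = split(rope.right, pos - rope.left.length)
    return concat(rope.left, l), r


def insert(rope, pos, text):
    l, r = split(rope, pos)
    return concat(concat(l, Leaf(text)), r)


def to_str(rope):
    if isinstance(rope, Leaf):
        return rope.text
    return to_str(rope.left) + to_str(rope.right)
```

Without rebalancing, a long series of inserts can make this tree deep; a production rope keeps leaf sizes and child counts within bounds so the path length stays logarithmic.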

I've used Notepad++ since forever, and this has literally never happened to me. I don't know how Scintilla stores its data, but I doubt it's anything that fancy.

No editor besides Atom has this problem. Do they all use ropes?

I just attempted to open a 2MB file in gedit and it took about 25 seconds to load enough to click into some text. Attempting to edit it did not go well and the process effectively hung.

I opened the file in intellij and it's performing admirably but there is noticeable lag when entering input. Keep in mind that features were disabled automatically due to the file size.

2MB may not be very representative for all files, but when having to jump into generated protobuf class files I've certainly wished my IDE could handle the load. I can't tell you what data structures these IDEs are using internally though.

And, again, the fact that features had to be disabled in intellij really says a lot. With more power, you can have more features, you can get feedback faster in your IDE, you can have more linters, more plugins, etc. That's a huge benefit to having fast software. Performance is enabling.

In case you're wondering what's going on, this is the list of posts: https://github.com/google/xi-editor/tree/master/doc/rope_sci...
