Hacker News new | comments | show | ask | jobs | submit login
Atom's new concurrency-friendly buffer implementation (atom.io)
192 points by madspindel 11 months ago | hide | past | web | favorite | 52 comments

https://github.com/google/xi-editor uses many similar ideas to implement concurrent editing by plugins and other devices. In fact nathansobo, the author of this post, wrote a summary of some of the work Raph Levien (Xi's primary author) has done on CRDTs, so this is probably not a coincidence.

Xi also keeps track of changes using a data structure like Patch. It doesn't use a tree to do so, although we may eventually switch some to use Xi's generic Rope tree data structure. One interesting difference is that Xi always creates another "layer" for every edit and never modifies anything. This leads to lots of layers, but instead of working forwards from a "base text" we work backwards from the current text. So accessing the current text is very fast, and accessing a past snapshot may require mapping in reverse past a small number of edits.

They both solve the same problems though, although Xi's approach of layers for every edit also enables seamless multi-device concurrent editing via a full CRDT merge implementation.

If you're interested in reading in detail about the approach Xi uses, I wrote a big document about it near the end of my internship working on Xi's CRDT: https://github.com/google/xi-editor/blob/master/doc/crdt-det...

Thanks for this! Great writeup. I'm guessing xi's CRDT uses ideas from Raph Levien's work on combining OT and CRDTs? I've only given Raph's papers a cursory read, but there seem to be a lot of similarities and it looks like he was mentoring you. Does it have an official name yet?

Incidentally, I've also been working on an in-depth article on a particular CRDT: Victor Grishchenko's Causal Trees[1]. The original paper only uses the data structure for text, but I believe it can be extended to a variety of other data types. In brief, the basic convergent structure is a tree of immutable "atoms". This tree is stored in memory as a DFS in-order traversal. I treat each atom as an atomic operation ("insert char" or "delete char" in case of strings), and thus, every user action involves an O(N) mutation of the tree array to insert the new atom in the correct spot, followed by an O(N) re-read of the array to update the user-facing data. This requires the effect of each atom to be O(1) when read in DFS order, but with strings, this follows naturally. A CT string in memory looks a whole lot like your "union string" with explicit delete ops instead of tombstones, e.g. ["ins D","ins O","ins U","del","ins G"] which parses as DOG in O(N), except each atom does in fact have a unique ID and holds a reference to its causal atom — the character intended to be on its immediate left. (So no need for crazy coordinate transforms!) As with xi, you also get a full revision history which can be garbage collected if needed.

My implementation is a work-in-progress, but I've already extended it to basic Bézier drawing and I'm really curious to see how it scales compared to a more OT-like CRDT such as yours. My guess is that CT won't work very well with large files on account of the full O(N) read/write requirement as well as the ~10x memory cost vs. C strings, but on the other hand, it seems that very few other CRDTs can be painlessly tweaked for use with other data types, and this would be a worthwhile tradeoff for me. CT also benefits from a mostly O(N) merge and O(N) past revision viewing, provided you keep to the reconstruct-data-in-linear-time-on-DFS-traversal invariant.

Anyway, CRDTs are super cool! Interesting to see them being used in local contexts, but it totally makes sense, especially since consistency w/garbage collection isn't a concern.

[1]: https://ai2-s2-pdfs.s3.amazonaws.com/6534/c371ef78979d7ed84b...

[2]: https://github.com/archagon/crdt-playground

Glad you found my writeup useful, and your CT stuff sounds interesting. You may also find https://github.com/google/xi-editor/issues/250 interesting, especially the TODO list of optimizations.

Yes the Xi CRDT (closest we have to a name) is based on Raph's work, plus some discussions we had and about one week of me standing in front of a whiteboard figuring out how to apply the concepts to doing a merge on Xi's specific CRDT structures.

Currently Xi also stores many bytes for each character typed, but one of our planned optimizations is to delta-encode chunks of old revisions. The operations are stored with fields that are designed to delta-encode efficiently so for runs of normal insertions we should be able to get down to ~2 bytes per character.

Thanks for the post. Interesting stuff. Atom has come a long way since initial launch, and it still has a long way to go, but its great to see these kinds of improvements.

Bit by bit, Atom will become a native editor.

Sooner or later everything has to become native. JS/Ruby/PHP/... shit is only for initial launch.

Then some companies have been in the initial launch for over twenty years.

How do web programming languages like PHP and Ruby become native?

When they decide to abandon:

1. Dynamic typing

2. Global interpreter lock

3. Automatic garbage collection

4. Interpreting the code (instead of compiling to a single binary).

5. Virtual machine shit

It is just the matter of time, until the team accepts the pain to learn "How to do stuff in C/C++/Rust?".

I don’t know how you could read this post and not get this: The Atom team is quite capable of writing C++. It’s not painful at all. I love writing low level data structures in C++.

It just wouldn’t be an improvement for the vast majority of our code, because JavaScript already runs very fast thanks to V8, and using JS allows for excellent customizability by our users.

>3. Automatic garbage collection >4. Interpreting the code (instead of compiling to a single binary). >5. Virtual machine shit

When feeling like the rest of the world is running in a direction of utter madness, it makes sense to re-read "Worse is better" [1] once in a while. However, I agree, DT and GIL aren't helping anything but writing poor code faster and blinder.

[1] https://www.jwz.org/doc/worse-is-better.html

FYI: When clicked, that link takes you to an image of a testicle in an egg cup captioned with a sarcastic comment about Hacker News.

You have to copy and paste the url.

> FYI: When clicked, that link takes you to an image of a testicle in an egg cup captioned with a sarcastic comment about Hacker News.

jwz has strong opinions about Hacker News

Wow, host of that site is seriously full of themselves. And of course, they own some "dance lounge" in San Francisco.

Developers want the best quality from other industries, while in the mean time they suggest "Worse is better" for software industry. How about using the same strategy for air crafts, ships, cars, trains, medical devices? Some authority really needs to regulate software industry so every software company builds a high quality, energy efficient, crazy fast software. Aerospace people were successful to land human on moon on 1969. It is late 2017 and software industry is struggling to reduce the memory usage of a text editor to under 1GB. Retarded.

Give me $150B, the inflation-adjusted cost of the Apollo space program, and I'll put together a team and build the perfect text editor that uses "less than 1GB of memory."


> Retarded

Come on. Let's have an adult conversation here.

Sorry, but I think you've completely missed the point of this essay.

Think for a second: why can't you produce a better working text editor that people actually use, if all current ones are produced on inefficient technologies by subpar engineers with impaired judgment that led them to current choice of techniques? What are the real barriers to do that?

The same that keep people returning to $1 / 1€ shops.

Point 3. and 5. really ruin the rest of your list...

It is painful but inevitable. There is no single language that does the garbage collection great on behalf of you. However we are getting closer and closer to find smarter ways to do manual garbage collection.

My only complaint is that if Java and .NET were AOT compiled to native code since the beginning, instead of leaving it to third parties while taking about 20 years to adopt it, C and C++ would be less relevant than they turned out to be.

Tracing GC is not an issue for the majority of applications, and just because a language has a GC doesn't mean it lacks support for value types, e.g. Modula-3, System C#, ...

Regarding affine types, while a great approach, the ergonomics still need to be improved.

The whole point (or rather, one of the points) of Java and .net are that they are NOT compiled to native code but to bytecode so that you can write code once and then run it on many different systems without having to compile it again.

One does not preclude the other, it is only a matter of distribution, like the mainframes taught us eons ago, by combining JIT/AOT on their toolchains.

The fact that Oracle, Microsoft, Google and Apple are now on the JIT/AOT train, tells it all.

To be fair, .NET always had AOT native compilation support from the early days, but NGEN is a simple AOT compiler only meant for faster startup.

Even hello world programs tend to show different performance characteristics on different OS/platforms. The idea of write once, run everywhere is over. In a moderately complex software, changing the OS/platform is a non trivial problem. Recompiling the code (which is mostly automated) is the least problematic part of it.

Are you familiar with the joys of autoconf, cmake and friends? Have you tried writing a non-trivial application that simply compiles on different flavors of Linux and Windows? Compiling on different platforms is also a non-trivial problem. Having a virtualised platform that abstracts away differences between OS makes it vastly easier to write portable applications.

For many applications running on a VM on different platforms Just Works. The idea is most definitely not over, in fact it is becoming ever more popular. Just look at the rise of abominations like Electron.

Those joys are only enjoyed for those that insist into using C with POSIX APIs covering for lack of a rich standard library.

We don't need VMs, what we need is rich libraries and there are plenty of languages with support for AOT compilation to native code with such libraries.

Something that even C++ is finally adopting.

Regarding Electron, I hope it joins MSHTML in a couple of years.

Modern garbage collectors are actually okay for most workloads. "Great" is not necessary for almost all applications.

Ruby, using RubyMotion.

PHP, using HipHop.

What is a web programming language? Is that like C++ because I write all my servers in it?

If this gets rid off the freezing issues https://github.com/atom/atom/issues/7275 then i thank you atom developers for your efforts in fixing it. Although i already switched to VScode it's good to know that there's now more choices available.

Reinventing the 40 year old wheel for the 20th time

I always hated this metaphor.

I don't know about you, but I sure don't want my new Audi running on stone wheels.

Your Audi runs?!?

Could WebAssembly, at least when fully realized, be used for this kind of "custom data structure" optimization?

We are already able to run native code in Atom via node's native extensions, so we have no reason to wait for WebAssembly.

The only benefit of WebAssembly would be if we wanted to port Atom to work in a regular web browser. We can even already do that without WebAssembly by compiling the superstring library to JS with emscripten. This works today and benchmarks show that it's already quite fast, but I think WebAssembly will make it even more efficient.

Yes, but even faster would be actual native instructions run directly by your CPU.

So... webassembly?

Webassembly run on VM for portability.

Which in turn JIT compiles the code down to your specific hardware, possibly doing some extra optimizations along the way as it cares only about a single processor at a time. I do think languages like Rust are really superior to what we have in JS land, but the VMs there are truly a marvel of modern engineering.

Yikes, do you not realize that there is a difference between webassembly and actual native instructions?

> "JavaScript can be quite fast, but fundamentally it’s a scripting language with the unavoidable overhead that implies."

I don't even ... but better late than never.

Yay, hopefully no more jankiness. Maybe I can actually use Atom again.

Does Atom have a proper side bar yet to show open files? I can't even begin to understand how anyone can get by with tabs when working on big projects. Horizontal file navigation is a terrible idea.

Horizontal tabs works better for me. I didn't like the side bar of open files when I tried VS Code.

So it's just an opinion thing - not objectively terrible as you think.

It stops being an opinion thing when you are working with so many open files that you can't read their names in the tabs. That's why I specifically said "big projects". VS Code gives users the freedom to show files as they want. Atom doesn't.

I work with so many files that I can't read their names in the tabs - so I scroll the list of tabs horizontally - which is exactly what you do I guess if the vertical list doesn't fit on the screen. Or I just find the files again in the project tree. It works fine for me - they can't add everyone's preferred style to the UI or the project would eternally bloat.

Yes with a plug-in.

Atom: reinventing the wheel and never making it round

Speaking of reinvention, thanks for the original humour / quip in every submission with Atom in the title.

Link me what you are talking about

I tried Atom and didn't have a positive experience, so I'm not going to type out a bunch of words defending it. But the "reinvent poorly" comment comes up all the time on Atom posts. You're not adding any original insight or humor.

Here you go:


I'm missing the part where someone else said the exact same thing I did.

> the "reinvent poorly" comment comes up all the time on Atom posts.

That's likely because they reinvent things and do it poorly. This post is about yet another instance of that. It isn't something I've seen discussed before, it is a brand new part that has been reinvented poorly.

Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact