Oh wow, cool!

I made a simple proof-of-concept realtime PEG parser a couple of years ago, which ingests text OT/CRDT operations ("insert at position X", etc.) and updates the compiler output by invalidating all overlapping ranges and recalculating from the root. My implementation is way slower than I expected it to be - though I'm sure you could use a lot of tricks from well-optimized parsers to speed it up. I agree - CRDT / OT / whatever is sort of orthogonal, although it's really nice being able to feed the parser the same operation format and have it update its output.

https://home.seph.codes/public/miniohm/ / https://github.com/josephg/miniohm if you're curious.
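
For concreteness, here's roughly the shape of it, sketched in TypeScript. This isn't miniohm's actual API - the Memo / IncrementalParser names and the flat memo table are made up for illustration - but it shows the core move: an insert op discards every cached parse result that overlaps the edit, shifts the rest, and reparses from the root with a warm cache.

    interface Memo { start: number; end: number }

    class IncrementalParser {
      private memo: Memo[] = [];

      applyInsert(pos: number, text: string) {
        const len = text.length;
        // Drop every memoized result whose range overlaps the edit position...
        this.memo = this.memo.filter(m => m.end <= pos || m.start >= pos);
        // ...and shift the surviving entries that sit entirely after it.
        for (const m of this.memo) {
          if (m.start >= pos) { m.start += len; m.end += len; }
        }
        // Reparse from the root; untouched subtrees hit the memo table
        // and are reused without being re-walked.
        this.parseFromRoot();
      }

      private parseFromRoot() { /* packrat parse, consulting this.memo */ }
    }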

I'd love to see this sort of thing applied at a larger scale in a compiler. For example, I could imagine editing a C program with a compiler running live. Instead of batch compiling artifacts to disk like it's 1970, as I type each character the compiler recompiles just the parts of my program that could be affected by my change. The compiler passes code changes to the linker (maybe at function-level granularity), and the linker could then (live) manage writes to an executable, allocating new byte ranges for changed functions and updating references on the fly.

Even on a very large C++ project like Chrome, there's no reason why incremental updates should take more than 1-2ms. If all you're doing is changing a single function in the binary file, why do our linkers rewrite the whole thing?




> Instead of batch compiling artifacts to disk like it's 1970 [...]

I am delighted to see people discussing this. Batch orientation made sense in a very resource-constrained era. But at this point we have more RAM and CPU than we know what to do with. It seems so obvious to me that the correct solution is to prioritize developer experience and keep everything hot.

My single biggest barrier to developing faster is the latency between making a change and seeing what that does to the test results. I would like that to be well under a second. Given the cost of programmer time, I could happily throw money at RAM and CPU. But standard, batch-oriented tools won't take advantage of it 99% of the time.

I think there's a revolution in developer experience out there, and I really want to see it happen.


The perfect place to start would be support in the LLVM backends; maybe they are already working on this.


I don't know that that's true. I doubt you could do this sort of redesign simply in an LLVM backend, because a "hot" compiler would work differently at so many levels of abstraction. I think an easier place to start would be a self-contained compiler with a little less scope than LLVM. There we could figure out the patterns, find a clean set of internal APIs, and make some sweet demos. And then, with that experience, approach the same task in LLVM.


I have done exactly this in the Markdown editor I developed, Stylo: https://www.textually.net if you're curious.

I used what I call "partial compilation": each edit made in the Markdown editor re-compiles only the affected parts of the text. A function is responsible for computing the ranges to be recompiled based on the edits made to the text. In Stylo this was absolutely essential, since CSS is used for syntax highlighting and a complete DOM is kept in memory for the whole Markdown text. Keeping the complete DOM in memory could have been avoided by not handling "following sibling" selectors, but I really wanted "complete" CSS support. Since Markdown is used for texts ranging from one word to 2000 pages, complete recompilation was unthinkable, so a partial one (with some exceptions) was implemented.
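
To make the range-computation step concrete, here is a rough sketch in TypeScript. It is not Stylo's actual code - the Edit / Range shapes and the blank-line heuristic are simplifying assumptions - but it shows the general idea of expanding an edit outward to the enclosing block boundaries:

    interface Edit { pos: number; removed: number; inserted: number }
    interface Range { start: number; end: number }

    // Expand an edit to the enclosing block, assuming blocks are separated
    // by blank lines (a simplification; real Markdown has more block kinds).
    function dirtyRange(text: string, edit: Edit): Range {
      // Walk back to the previous blank line (or the start of the document).
      let start = text.lastIndexOf("\n\n", edit.pos);
      start = start < 0 ? 0 : start + 2;
      // Walk forward to the next blank line (or the end of the document),
      // skipping past the newly inserted text.
      let end = text.indexOf("\n\n", edit.pos + edit.inserted);
      end = end < 0 ? text.length : end;
      return { start, end };
    }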

The main difficulty with this kind of approach is that the compilation, for whatever range was determined, may not stop there: the effect of the edit can go far beyond the determined range, and the range-computing function cannot know this without actually compiling... It can happen, for example, if you are editing C and type "{": all the scopes below are affected by this opened (and not yet closed) scope. In Markdown it can happen when you open a fenced code block (using the ``` opening fence). So you need a proper way to handle these edge cases. The other problem is managing the state associated with these regions, together with the non-stopping compilation case just described.
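
One possible way to handle that (a sketch of the idea, not Stylo's implementation) is to keep compiling block by block past the computed range, until the compiler's state at a block boundary matches what the previous compile recorded there; only then is it safe to stop:

    interface BlockState { inFence: boolean; listDepth: number }

    // Recompile from the first dirty block until the exit state of a block
    // matches what the previous compile recorded there; if it never matches
    // (e.g. an unclosed ``` fence), we end up recompiling to the end.
    function recompileFrom(
      blocks: string[],
      firstDirty: number,
      oldStates: BlockState[],                    // exit state per block, from the last compile
      compileBlock: (src: string, s: BlockState) => BlockState,
    ): number {                                   // index of the last block recompiled
      let state: BlockState =
        firstDirty > 0 ? oldStates[firstDirty - 1] : { inFence: false, listDepth: 0 };
      let i = firstDirty;
      for (; i < blocks.length; i++) {
        state = compileBlock(blocks[i], state);
        const converged =
          i < oldStates.length &&
          state.inFence === oldStates[i].inFence &&
          state.listDepth === oldStates[i].listDepth;
        oldStates[i] = state;
        if (converged) break;                     // the edit's effects stopped propagating here
      }
      return Math.min(i, blocks.length - 1);
    }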

All this, as I have discovered, makes this kind of compilation and its associated syntax highlighting really difficult to implement properly and fast enough.

To be fast, this process needed to be as asynchronous as possible, because there were tiny but noticeable lags on the main thread: even partial compilation took around 8ms to 12ms on my computer (an old MacBook Pro from 2012). But there are also cases where asynchronous processing is not suitable, in which case synchronous processing is used, but only when absolutely necessary. So a mechanism that can switch between asynchronous and synchronous processing was needed.

All this to say, it's possible but really difficult to do properly. Every bit of CPU power has to be extracted to make it comfortable to use. I can just imagine, with the complexity of C++, what a nightmare it would be to manage, but it should be possible with a proper compiler architecture that modularises partial compilation and encapsulates the inherent complexity of this approach.


Thanks for writing this - it's really interesting to read and I'm glad other people are trying this approach.

Is your markdown editor open source? I'd love to play with it.


"Even on very large C++ project like Chrome, there's no reason why incremental updates should take more than 1-2ms. If all you're doing is changing a single function in the binary file, why do our linkers rewrite the whole thing"

Because tracking what could possibly have changed, with any significant accuracy, requires solving dataflow problems that are N^3 or worse.


Does it?

Let's say I edit a function, and my editor (attached at the hip to my compiler) knows which bytes have been changed. So it knows my change was isolated to a function in a .cpp file. The compiler has cached the compilation environment from the last build, and with that in mind it could recompile just this one function. Then we need to binary-patch the function - which essentially requires allocating disk space inside the binary executable. If the previous compiler pass left some free space, we could allocate out of that and mark the old function's memory as reusable. Then we need to update CALL pointers - either store them in a list, or have every function do indirect calls through a table, in which case you just need to update the table entry. If you want to get fancy you could do better than function-level granularity, but I think that alone would give us sub-ms builds.
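
As a toy sketch of those two pieces - a free-space allocator inside the executable plus an indirection table so call sites never need rewriting - something like this (TypeScript, offsets as plain numbers; none of these names or structures come from a real linker):

    interface FreeBlock { offset: number; size: number }

    class PatchableImage {
      private size = 0;                               // current length of the image file
      private free: FreeBlock[] = [];
      private callTable = new Map<string, number>();  // symbol -> code offset in the image

      replaceFunction(name: string, newCode: Uint8Array, oldSize: number) {
        const oldOffset = this.callTable.get(name);
        if (oldOffset !== undefined) {
          // The old body becomes reusable free space.
          this.free.push({ offset: oldOffset, size: oldSize });
        }
        // First-fit allocation out of previously freed space, else append to the file.
        const slot = this.free.findIndex(b => b.size >= newCode.length);
        const offset = slot >= 0 ? this.free[slot].offset : this.append(newCode.length);
        if (slot >= 0) this.free.splice(slot, 1);     // (a real allocator would split the block)
        this.write(offset, newCode);
        // Every call site jumps through the table, so this one update "relinks" the program.
        this.callTable.set(name, offset);
      }

      private append(bytes: number): number {
        const at = this.size;
        this.size += bytes;                           // grow the image file
        return at;
      }

      private write(offset: number, code: Uint8Array) { /* pwrite the bytes into the image */ }
    }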

I wouldn’t want to change the process for release builds - but for incremental debug builds I don’t see where you’re getting an N^3 from.


Yes, it does.

"The compiler has cached the compilation environment from the last build, and with that in mind it could recompile just this one function."

First, this is actually the most trivial possible case, and even there, as you'll see, this is not correct.

Outside of a function, it does not handle changing types, changing access permissions in classes, changing between virtual and non-virtual, etc. It does not handle changing globals, things with attributes of various sorts, etc.

All of those require understanding the uses, or overestimating them.

As for whether function-local changes can have non-local effects, the answer is flatly "yes". You don't even have to be that creative, since you can do things like force template function instantiation, and other things that affect global state, from inside a function in C++.

The assumption that, because I changed text in one function, the possible changes are limited to that function is not correct in C++ :)

There are languages in which it is, mind you, just C++ is 100% not one of them.

Honestly, I would strongly suggest you start to read papers on this; it's an incredibly well-studied area, and the costs are well known.

If it was easy and cheap, we'd still be doing it ;)

It used to be a pretty standard feature of editing environments for things like LISP, etc.


Do you have any recommendations for places to start reading? I didn't realise it was a well-studied area - naively it seems like it would be possible in many languages. It would certainly be trivial for 90% of the transpiled / polyfilled stuff that we do in modern TypeScript / JavaScript.

I completely agree that doing this to live-update an optimized build would be extremely expensive and difficult. And I'll happily acknowledge that (for example) adding or moving fields in a struct would be hard to implement.

But adding a compiled specialization of a template function / class should be "easy" to handle. It's non-local, but I can imagine a dependency tree of functions needed for the compilation. On update, we can figure out a diff of that tree, free all the now-unused function bodies in the binary, and then add (compiling if necessary) all the added / updated functions.
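
Something like this is what I have in mind - a rough sketch where the before/after sets and the freeBody / compileBody callbacks are assumed helpers, not any existing compiler's API:

    // Diff the set of function bodies the program needs before and after an
    // edit: free what disappeared, compile (or pull from cache) what's new.
    function updateNeededFunctions(
      before: Set<string>,                   // mangled names the last build emitted
      after: Set<string>,                    // mangled names needed now
      freeBody: (name: string) => void,      // release its byte range in the binary
      compileBody: (name: string) => void,   // compile and place the new body
    ) {
      for (const name of before) {
        if (!after.has(name)) freeBody(name);       // e.g. a specialization nothing references anymore
      }
      for (const name of after) {
        if (!before.has(name)) compileBody(name);   // e.g. a newly instantiated specialization
      }
    }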

C++ might be one of the worst languages to implement this for - .NET, the JVM, TypeScript or Go might all be much easier. But yeah, if this has been extensively looked into, I'd love to learn why we don't do it for modern development. It seems like an obvious win, and if we don't do it I hope it's for good reasons.


But this is a hard thing to do: the compiler code will have to be organized in a different way, and the language should probably be designed specifically towards this end. And the compiler code will be more brittle for sure, and will probably be slower for batch compiles.

Most languages already support reasonably fine-grained recompiles (at the module level); it's mostly the linkers that are slow (and I keep hearing from people that they need not be). And actually some of them already do something called "incremental linking".

Even C compilers regularly compile 100K+ LOC/second, so I think there's not much to be gained here. Yes, complicated C++ can be slow to compile. Maybe we shouldn't write complicated C++ code to begin with? In the end, batch compile performance also matters.



