Hacker News
Speeding up VSCode extensions in 2022 (jason-williams.co.uk)
179 points by jayflux on Jan 28, 2022 | 36 comments



One idea for speedups that this article doesn't mention, which I've used very successfully, is splitting an extension into two parts: the JavaScript extension that runs in VS Code, and a separate language server[1] written in whatever fast language you desire (without having to compile to WASM).

I do this with an extension I develop, and it works very well. Microsoft even provides a library to make this easy, and sample code[2]. In my case the language server is written in Rust, and uses tree-sitter. This combination feels like a super power. (The first version of my extension was written in clojurescript, using the instaparse parser. It quickly became apparent that it was way too slow.)
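For anyone wondering why the server can be in any language: LSP is just JSON-RPC messages with a Content-Length header, typically over stdio. A minimal sketch of the framing (the `frame` helper is my own name for illustration, not part of Microsoft's library):

```javascript
// Frame a JSON-RPC message per the LSP base protocol: a Content-Length
// header (in bytes), a blank line, then the JSON body.
function frame(message) {
  const body = JSON.stringify(message);
  const length = Buffer.byteLength(body, 'utf8');
  return `Content-Length: ${length}\r\n\r\n${body}`;
}

// The editor's first request to a language server looks roughly like:
console.log(frame({ jsonrpc: '2.0', id: 1, method: 'initialize', params: {} }));
```

Any process that can read and write this framing over stdio can be a language server, which is what lets you swap in Rust (or anything else) for the heavy lifting.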

A note on my experience with tree-sitter: It's awesome in every way except one. It's so fast that so far I haven't even bothered with incremental parsing: I can do full parsing on every keystroke; and I know that if I ever need a speed boost, I can add incremental parsing. The API is sane and easy to use. The query API is powerful. But the main weakness of tree-sitter is that the error messages are nearly content-free. The most common error looks like this: "(error)". In my case I can deal with that, but I can imagine that for many purposes that's not sufficient. There's been an issue open for 3 years about improving the error messages[3] but I haven't seen a ton of progress (unless I missed it?).

1. https://microsoft.github.io/language-server-protocol/specifi... 2. https://github.com/microsoft/vscode-extension-samples/tree/m... 3. https://github.com/tree-sitter/tree-sitter/issues/255


> A note on my experience with tree-sitter: It's awesome in every way except one.

One more: the Rube Goldberg build setup. You need Rust, npm, Node, and a C compiler, as well as Docker or Emscripten. Tree-sitter is mostly written in Rust and there are Rust bindings, but you can't use them for the wasm compilation target, because it requires linking a C-generated grammar, which breaks things. I hope they'll move to something that can be built and used with pure Rust.


It seems to be mentioned, unless I misunderstand what you mean. The second-to-last section in the article begins with this:

> If you really need to do some CPU intensive work, it’s now possible to offload some of the workload to a language server. This allows you to implement the bulk of your extension in another language (for instance, writing Rust code and compiling it down to WASM).

The section above also mentions tree-sitter.


I think you're right. I got confused because he talks about compiling to WASM; I'm not knowledgeable about WASM but a language server is an entirely separate process, which means you can compile your Rust to native x86 or whatever you want. I assume Rust compiled to native machine code is faster than WASM? I guess the one advantage of WASM in that case is that you don't have to compile the language server for every platform the extension runs on. (In my case, I compile the language server for x86 Linux and Windows.)


I forgot to mention one other thing about tree-sitter: its error recovery/tolerance, which is a requirement for practical parsing applications. It does an excellent job of returning a parse tree even if the text contains tons of errors. Instaparse, besides being much slower, just gave up if a single character was out of place. Very rarely an error will cause tree-sitter to fail more comprehensively, but it is one of the few parser frameworks that even tries.


I use tree-sitter from Neovim, so it's pretty crazy fast. :)


Speaking of tree-sitter, I’ve been experimenting with it for a static analysis and codegen idea and… wow, it is fast. And super easy to use. I threw together a naive TypeScript type-stripping “compiler”, using the Node bindings, so it’s got a couple of bottlenecks. It’s within spitting distance of esbuild for a huge (10k LOC) real-world module (about 80-90ms vs esbuild’s 20-30ms), and sometimes faster than esbuild for small (50-100 LOC) modules. Granted, it’s not doing everything esbuild does. But it was a quick experiment in a familiar domain before I go further. Quick as in mostly working within a few hours, and naively optimized for large source content in another hour.

The WASM bindings also perform very well (better in some cases), so it can be used anywhere WASM can. Which, it seems to me, means it’s a very good candidate to replace Babel without depending on language-specific tooling like SWC/Rome/Bun.


Tree-sitter is interesting. I see that it is used by the GitHub Copilot extension for VS Code, which is the slowest on my machine, because it somehow contains megabytes of minified JavaScript and WebAssembly just to call a remote API.


The tokenization speed issue is already addressed by the language server protocol, which delegates syntax highlighting to language servers via the "semantic tokens" API. Language servers can choose to implement this however they want, and the API allows for incremental additions.
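For context on what that API actually ships over the wire: semantic tokens are a flat array of five integers per token (delta line, delta start character, length, token type, token modifiers), delta-encoded against the previous token to keep payloads compact. A toy encoder to illustrate (the `encodeTokens` helper and its input shape are mine, not the actual vscode-languageserver API):

```javascript
// Encode tokens into the LSP semantic-tokens integer format.
// Each token becomes five integers, positions relative to the
// previous token (line delta, then char delta on the same line).
function encodeTokens(tokens) {
  const data = [];
  let prevLine = 0, prevChar = 0;
  for (const t of tokens) {
    const deltaLine = t.line - prevLine;
    // char is only delta-encoded when the token is on the same line
    const deltaChar = deltaLine === 0 ? t.char - prevChar : t.char;
    data.push(deltaLine, deltaChar, t.length, t.type, 0);
    prevLine = t.line;
    prevChar = t.char;
  }
  return data;
}

// Two tokens on line 0, one on line 1 (types index into a legend):
console.log(encodeTokens([
  { line: 0, char: 0, length: 3, type: 1 },
  { line: 0, char: 5, length: 4, type: 2 },
  { line: 1, char: 2, length: 7, type: 1 },
]).join(','));
```

The delta encoding is also what makes the incremental-update part of the API workable: edits late in a file don't shift the encoded positions of everything after them on other lines.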

I think this is a much better approach than baking tree-sitter into VS Code and continuing to use TextMate grammars (or tree-sitter-specific grammars) to add syntax highlighting. That way it can be applied to any editor that supports LSP integration. Really, the world would be a better place for IDEs and text editors if we could just get LSP more standardized, and VS Code's dominance is a decent place to start. I'm tired of relying on editor and IDE authors for language support.


> The tokenization speed issue is already addressed by the language server protocol [...]

It's not really addressed. The semantic tokens API is intended for semantic highlighting:

> Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge on how to resolve symbols in the context of a project.

Abusing the semantic tokens API for syntactic highlighting is slow, unnecessarily complex (why do I need to implement a language server just to do syntactic highlighting?), and only a partial solution (still need a TM grammar, still don't get correct code folding, etc.).


I believe I've read in various places that the difficulty of doing syntax highlighting fast enough is precisely why it isn't done via LSP.

Am I right in thinking that the Semantic Tokens part of the spec (currently) falls short of saying "this is for all your syntax-highlighting needs, please go ahead and implement full-blown syntax highlighters using it"?

I definitely agree with you that the more we can share between different editors/IDEs the better.


> I think this is a much better approach that baking tree sitter into VS Code

They're implementing both, with tree-sitter being a 'dumb' version of LSP syntax highlighting: https://github.com/microsoft/vscode-anycode


Oh thanks for that link, that's really interesting.

Can someone explain the paragraph below -- I thought "this is an invocation of a function named bar" was what we mean by semantic information:

> All features are based on parse trees and there is no semantic information - that means there is no guarantee for correctness. Parse trees allow to identify declarations and usages, like "these lines define a function named foo" or "this is an invocation of a function named bar"


It uses heuristics to derive semantic information from the parse tree without doing a full semantic analysis.


Is "semantic analysis" in the sense you used it similar to "type checking"?


Semantic analysis is a broader term that (for languages with a type checker) includes type checking. https://cs.lmu.edu/~ray/notes/semanticanalysis/


What's "dumb" about Anycode is its semantic features such as "go to definition", code completion, etc., not its syntax highlighting. Alas, Anycode is a separate project.


I was curious about the implementation, and was surprised to find:

- they’re importing query files, with a .scm extension, in TypeScript

- they’re using esbuild to handle those imports

It’s surprising to me that Microsoft, who created and maintains TypeScript, is using an alternative TypeScript compiler… in part to query semantic information from TypeScript.


Can we stop with the "in 2022" headlines?


It's an SEO thing.


Anyone else reading this article in 2022?


Add "in Rust" to the list too.


Is that really bad? I mean, it's telling me the language it's in, so it's a bit of valuable information if I'm looking for articles/posts about something a bit esoteric. Say I wanted to implement algorithm X, for which there are few references: "Implementing Algorithm X in Rust" is an informative title, whereas if it were just "Implementing Algorithm X" I might only find out upon opening it that it's in Rust, when maybe I wanted C, Python or MATLAB.


The more I read about VSCode and Electron apps in general, the more I'm convinced that all this effort is akin to putting lipstick on a pig. Modern CPU designs have all converged on multi-core as the solution to increasing performance, and yet on the software side of things... we've turned all of our desktop applications into Chrome instances running single-threaded event loops. That these extensions are even more poorly optimized is more icing on the cake.


I have to say, I find it sad how even the most trivial apps, like some basic bug trackers, are built on top of Electron and manage to be extremely slow even on the latest €4500 hardware. And I have to use a bunch of different slow-as-hell apps like this at work. It is just embarrassing.


Funnily enough the quote on textmate grammars in the article feels quite relevant:

>The fact that we now have these complex grammars that end up producing beautiful tokens is more of a testament to the amazing computing power available to us than to the design of the [TextMate] grammar semantics.

Amazing computing power available to them indeed.


It looks like VSCode is running many threads (some as separate processes) on my machine. Is what you're saying that it does not provide an API for extensions to schedule work on other threads?


Single-threading is not the issue. Just 5% of a single modern core should be plenty to run a text editor. You still want multiple threads, of course, to avoid blocking the UI with background work, but computers are so fast nowadays that it would still be fast enough even if all threads ran on the same core.


It would be utterly shocking to me if VSCode isn’t using several worker threads for LSP, extensions, etc.


VS Code team member here. The diagram in the article is a little wrong, but the basics of it are:

- The "main process" which manages the windows (renderer processes)

- The renderer process contains the UI thread for each window; the renderer process can have its own worker threads

- The extension host loads extensions in-proc; extensions are free to create their own threads/processes. Keeping extensions in a separate process prevents them from freezing the renderer

- Various other processes that live off either the main process or the "shared process", such as the pty host for terminals which enables the terminal reconnection feature when reloading a window (also file watcher, search process)

We've been shuffling where processes are launched from recently but the actual processes and their purpose probably won't change. You can view a process tree via Help > Open Process Explorer to help understand this better.

EDIT: Formatting


Off topic: Is there a document/blog/article somewhere about the plugin architecture of VS Code? I'm less interested in developing a plugin (which google results usually yield) and more interested in, say, how VS Code determines the order in which plugins are called.


You could look up activation events on the website; I don't think we guarantee anything relating to order beyond that. Generally, the order in which they're activated shouldn't matter in practice.


Multithreading isn't some magic thing you slap on to go faster. You'll find single-threaded event loops in all sorts of high-performance code, e.g. game engines.


Which high performance game engine is single threaded? To my knowledge they're all multithreaded and heavily pipelined.


The parent wrote about the game engine's event loop, not the engine in general.

VS Code is not single-threaded in general. Its event loop is.


JavaScript now includes web workers, and Node.js has worker threads. I'm pretty sure the MS folks use them in VS Code.



