Tree-sitter: new incremental parsing system for programming tools (2018) [video] (youtube.com)
131 points by ggurgone 13 days ago | 28 comments

The most obvious application of tree-sitter is editors. I wrote a VSCode extension to replace the built-in syntax coloring with tree-sitter-based coloring: https://marketplace.visualstudio.com/items?itemName=georgewf...

I actually think it would make more sense for the various VSCode language extensions to just bake in tree-sitter for their language. I have had a PR open to do this with golang for a while: https://github.com/microsoft/vscode-go/pull/2555

What is the point of replacing the builtin syntax coloring? Is it faster or does it color more things?

Built-in syntax highlighting for e.g. Rust is laughably bad; the tree-sitter highlighting is much better. Side note: I've recently switched to VSCode as my main editor, and so far the experience has been full of contrasts. Many advanced features such as remote editing are real game changers and work flawlessly, but some basic features (the aforementioned highlighting, folding, basic git integration) are notably lacking in polish. You'd expect that if they've gotten the advanced stuff right then the basic stuff would surely be in order, but that is not the case.

Have you tried JetBrains IntelliJ? In my experience, if you look in the direction VS Code is pointing, that's where you'll find IntelliJ.

Tangentially related, there's some tree-sitter activity in the Jetbrains org on Github: https://github.com/JetBrains?utf8=&q=tree-sitter&type=&langu...

which is cool

I've used IntelliJ a little bit and it is awesome (albeit a bit slow for my taste). The reason I stick with VSCode is remote editing: compiling Rust code locally on my laptop is torture compared to compiling it on a beefy remote box! Remote editing in VSCode is very well done; even most extensions work flawlessly without any changes. As far as I understand, there is nothing comparable for IntelliJ.

Interesting, thanks!

Depending on the grammar I think it's a little slower than the regex-based TextMate coloring. But the overhead is mostly due to the VSCode plugin architecture.

It colors more accurately.
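
To make "more accurately" concrete, here is a minimal sketch, not the extension's actual code. It uses Python's standard `tokenize` module as a stand-in for a real parser such as tree-sitter, and the helper names `regex_keywords` and `token_keywords` are made up for illustration. A naive regex matches a keyword wherever the word appears, including inside strings and comments; a tokenizer only reports it where it really is a keyword. Real TextMate grammars are more careful than this single regex, so treat it as an exaggerated illustration of the failure mode.

```python
import io
import keyword
import re
import tokenize

# Sample input: "for" and "not" appear inside a string and a comment on line 1.
SOURCE = 'label = "wait for it"  # not for real\nfor x in range(3):\n    pass\n'

# Naive pattern: any Python keyword as a whole word, with no notion of context.
KEYWORD_RE = re.compile(r"\b(?:%s)\b" % "|".join(keyword.kwlist))


def regex_keywords(source):
    """Return (line, column, word) for every regex keyword match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in KEYWORD_RE.finditer(line):
            hits.append((lineno, match.start(), match.group()))
    return hits


def token_keywords(source):
    """Return (line, column, word) only for NAME tokens that are keywords."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            hits.append((tok.start[0], tok.start[1], tok.string))
    return hits


if __name__ == "__main__":
    print("regex:    ", regex_keywords(SOURCE))
    print("tokenizer:", token_keywords(SOURCE))
```

On this sample the regex version also flags the `for` inside the string and the `not` and `for` inside the comment, while the tokenizer version reports only the keywords of the actual loop.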

Can you use tree-sitter for things that are more complicated than syntax highlighting, such as reference finding? I've been wanting to write a language server for a while but have been put off by the complexity of gracefully handling sections with incorrect syntax (while the user is typing, for example).

An important recent development in tree-sitter is the new query language. Like TextMate or Sublime grammars, tree-sitter in Atom used CSS-style selectors, but it now has a much more powerful s-expression query language that is useful for more than just syntax highlighting, e.g. static analysis. One application of that is GitHub's semantic, a Haskell tool for code navigation and call graph analysis.

Demo and explanation: https://github.com/tree-sitter/tree-sitter/pull/444

Neovim is aiming to integrate this in the next major release, v0.5: https://github.com/neovim/neovim/pull/11113

I've been following tree sitter for a while, as I find the tech super cool and can't wait to see more practical applications.

One thing (among many others) that I've found really promising about Dark is its editor. See the hands-on video on their homepage for a demo: https://darklang.com/

It mostly feels like you're just typing text like in any regular text editor, but your inputs are actually manipulating the AST directly, and the editor itself ensures that your inputs can never result in an invalid program (i.e. there's no such thing as making a syntax error in Dark). It's inspired by tooling in the Lisp world like Paredit and Parinfer, but Dark itself doesn't have to _look_ like a Lisp because the structure of the AST is maintained by the editor itself instead of by users manually inserting and removing parens. It's an ingenious way to get most of the productivity benefits of a Lisp-style syntax and all the structural editing tooling that comes with it, without intimidating newcomers with the foreign-looking, paren-infested syntax Lisps are infamous for.

The other day I was actually briefly looking into whether or not it could be possible to replicate something like this in Atom using tree-sitter for some mainstream language like JS, but ended up getting blocked by the fact that Atom doesn't seem to offer an API for plugins to block/replace user input. This is probably for the best, given all the horrible ways this could be abused, but it does mean if I wanted to explore the idea further I'd probably have to either fork Atom to experiment with the idea or build something up from scratch, which is a pretty daunting undertaking given how deceptively complex modern editors can get these days.

But maybe I'm missing a different way to accomplish this in Atom with its existing APIs? Or does anyone know if VSCode's extension APIs can support this use case? I realize I've probably barely scratched the surface given how little time I've spent on it so far.

I really don't think it's inspired by Parinfer. It's likely based on the theory of structural editing and AST projections first popularized by JetBrains' CEO and available for experimentation in the open source project MPS. An end to end application of this theory is commonly referred to as a language workbench.

Papers: https://confluence.jetbrains.com/display/MPS/MPS+publication...

Language workbenches: https://www.martinfowler.com/articles/languageWorkbench.html

Nice intro to structural editing: https://medium.com/@mikhail.barash.mikbar/looking-at-code-th... (also mentions Scratch)

> It mostly feels like you're just typing text like in any regular text editor, but your inputs are actually manipulating the AST directly, and the editor itself ensures that your inputs can never result in an invalid program (i.e. there's no such thing as making a syntax error in Dark).

The basic idea has been around for a while.

Here's something from the 80's: Alice Pascal https://www.templetons.com/brad/alice.html

> One of the first projects I did after forming Looking Glass Software Limited was a syntax-directed programming environment called Alice: The Personal Pascal.

> Syntax-directed editors are somewhat controversial, however I think they are quite good for people learning programming, and Alice was written first to be used in education in the school systems of Ontario. Our first sale was a contract to develop it for the Ministry of Education there.

> The other day I was actually briefly looking into whether or not it could be possible to replicate something like this in Atom using tree-sitter for some mainstream language like JS,

already being done as part of CodeMirror v6:



Will tree sitter also stimulate creation of free tools which work on the AST?

E.g. it's a mystery to me why we don't have free refactoring tools like the ones in IntelliJ. Like some free library which could extract methods, rename variables, etc. by modifying the AST. It does not seem too hard.

Is it because the current AST parsers are not fast enough or is there some other reason?

From my limited knowledge/experience, the Language Server Protocol (as used in the VS Code editor) enables refactoring operations like the ones you describe. For example, in TypeScript it can create a struct out of function parameters, or create a class from old function-prototype based definitions. Compared to IDEs like IntelliJ, though, I imagine the feature set is much, much smaller in scope.

I did see some discussion about integrating tree-sitter with VS Code, but the focus seems limited to syntax highlighting, not operating on ASTs.

I found that the last time this talk was posted on HN [0], the author of tree-sitter mentioned that a couple of language servers are indeed using tree-sitter.

* Bash - https://github.com/mads-hartmann/bash-language-server

* Ruby - https://github.com/rubyide/vscode-ruby/tree/master/server

[0] https://news.ycombinator.com/item?id=18213022

You need semantic understanding to do several of those operations. Parsing often isn’t sufficient.

Yes, but semantic understanding is not really complicated for rename variable, for example, so it's strange there is no library which can do that.
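
For what it's worth, here is a rough sketch of what the "easy" version of rename looks like. It uses Python's standard `ast` module rather than tree-sitter, and `RenameLocal` and `rename_local` are hypothetical names made up for this example. It renames a variable inside one named function and leaves a same-named variable in another function alone, which is the minimum scope awareness a pure text search-and-replace lacks.

```python
import ast


class RenameLocal(ast.NodeTransformer):
    """Rename one variable inside one function, leaving other functions alone."""

    def __init__(self, function, old, new):
        self.function = function  # name of the function to rename within
        self.old = old
        self.new = new
        self.inside = False  # True while visiting the target function

    def visit_FunctionDef(self, node):
        was_inside = self.inside
        self.inside = was_inside or node.name == self.function
        self.generic_visit(node)  # visits parameters and body
        self.inside = was_inside
        return node

    def visit_Name(self, node):
        # Variable reads and writes.
        if self.inside and node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Function parameters.
        if self.inside and node.arg == self.old:
            node.arg = self.new
        return node


def rename_local(source, function, old, new):
    """Parse, rename, and print the code back out (requires Python 3.9+)."""
    tree = ast.parse(source)
    RenameLocal(function, old, new).visit(tree)
    return ast.unparse(tree)


SOURCE = """
def area(w, h):
    total = w * h
    return total

def perimeter(w, h):
    total = 2 * (w + h)
    return total
"""

if __name__ == "__main__":
    print(rename_local(SOURCE, "area", "total", "result"))
```

Even this sketch cuts corners that a real tool cannot: it ignores `global`/`nonlocal` declarations, nested functions that shadow the name, default values and decorators (which belong to the enclosing scope), and name collisions with the new name, and `ast.unparse` discards comments and formatting. That last point is where a concrete syntax tree like tree-sitter's would help, but the scoping rules still have to be written per language.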

So... You write your grammars in JavaScript. Which is then serialized to JSON by a parser defined in Rust, so that it can be compiled to C?

That’s... a very roundabout way of doing things.


Many parser generation tools use their own custom grammar language, and then generate a C parser based on that. With Tree-sitter, it’s a similar setup, except the grammars are written in JavaScript instead of some custom language.

The parser generator itself is all written in Rust, but the end user doesn’t need to use Rust in any way.

I asked [1] recently whether it's possible to remove the dependency on Node.js entirely. The conclusion is that it might be possible to use Duktape instead.

[1] https://github.com/tree-sitter/tree-sitter/issues/465


(it is the title of the talk)

Dates are usually added to posts that aren't recent.
