I'm an engineer on the code intelligence team at Sourcegraph. We've been busy bu...

patrec · on Feb 23, 2021

What are those features out of reach of tree-sitter? I can see that you theoretically want something that's optimized for parsing well-formed code all at once, rather than potentially malformed code incrementally, but what trade-offs does tree-sitter make in practice that limit its potential for your use case? On the face of it, it seems to me like tree-sitter could serve as a perfectly fine building block for generating LSIF or whatever from a code file.

efritz · on Feb 23, 2021

> On the face of it, it seems to me like tree-sitter could serve as a perfectly fine building block for generating LSIF or whatever from a code file.

It does seem this way. Another reply [1] to this post makes the same point with a nice proof-of-concept as well.

[1]: https://news.ycombinator.com/item?id=26230900

muglug · on Feb 23, 2021

I wish there was a more universal format for parsers, but I just don't think there are enough people who know their stuff.

Take PHP, a language that a lot of people use: the tree-sitter-php extension doesn't support features added in 2019, let alone features added towards the end of 2020.

If you want an up-to-date PHP parser, there's really only one open-source parser[0] that's accurate enough to be used on PHP codebases old and new, and it's written in PHP. Then if you want to parse in a robust fashion you have to adopt a number of hacks to get everything working.

I hadn't encountered LSIF before – can GitHub be configured to use those maps?

[0] https://github.com/nikic/PHP-Parser

dcreager · on Feb 23, 2021

We've looked at LSIF before, and decided against it for a few reasons, mostly around COGS, operational overhead, and indexing latency. I gave a talk at last year's FOSDEM [1] going into some of the details. (Caveat that that talk was from when we were using a different open-source library, Semantic, to power fuzzy Code Nav. It's much easier to support new languages using the now-current tree-sitter query approach!)

[1] https://dcreager.net/talks/2020-fosdem/

gugagore · on Feb 23, 2021

Could you compare Sourcegraph to something like Moose, FAMIX, GToolkit?

https://github.com/moosetechnology/Moose