How LSP could have been better (matklad.github.io)
254 points by packetlost on Oct 12, 2023 | 223 comments



I’m fighting with it right now. LSP specifies several configuration mechanisms but there isn’t a “blessed” one. This is a pain point because:

I want to write Python.

There’s one FOSS language server (pylsp) that’s a better fit for my needs than the other Python servers.

I want to use a certain commercial editor.

The editor and the language server don’t share a common configuration method.

Ergo, I can’t use the editor I’d like to write Python the way I want. It’s no one’s fault. The editor works just fine with other servers, for Python and for lots of other languages. The server is also operating within spec, and plenty of editors support its configuration method. However, they don’t play well with each other. I can still use the editor with that server as long as I’m ok with the default settings, but I’m not.

It would’ve been spiffy to have a preferred baseline method that all editors and language servers spoke, but they went the enterprisey “here are a hundred options - pick a few!” overly configurable direction.

Bummer. I really, really like that editor.


Have you opened an issue against https://github.com/python-lsp/python-lsp-server? I'm an occasional contributor for easy fixes; if your fix is easy chances are it would get picked up.


I came in at the tail end of https://github.com/python-lsp/python-lsp-server/issues/195. The possibility of me sponsoring a fix came up, and I’m on board with it, but the other contributor never replied.


Ah yeah, Carlos is very low on cycles (like all of us I guess), probably due to the day job. (If only OSS had competitive pay...)


That happens. I’m kind of bummed at the situation in general, but no one owes me to prioritize my convenience over theirs.


Perhaps unrelated, but I've been meaning to ask and couldn't find an answer: what's the deal with python-lsp-server and virtualenvs? It seems like it has to be installed in the virtualenv to provide completions, etc. Is that right?


No. The editor sends information about the environment to the server. That was a mystery to me, too, for a while.


What is configuration in this case?


In this specific case, enabling specific plugins, configuring their behavior, adjusting maximum line lengths, that sort of thing.


That seems out of scope of what LSP deals with


That’s exactly the kind of thing LSP deals with. You have to get those settings into the server somehow, and the main methods are for the editor to send them to the server, or for the server to load them from somewhere in the filesystem or similar. A lot of those settings (like line length) are configured in the editor by the user. It makes a lot more sense to have a mechanism to communicate them over the same protocol than to use a janky method like writing to a file with a specific name, or some other out-of-band method.


LSP assumes that there is a higher level editor integration that handles configuration. That’s not ideal but it’s somewhat sensible as different editors will want to expose different ways to configure them.


Ideally, adding an LSP would just be adding a file path to your editor config. This file could contain standard info like supported file extensions, the executable of the LSP, and the parameters to pass the executable. You could also disable/ignore certain features. And, this file would have a common format across all IDEs.


True, but there are also a slew of methods like workspace/didChangeConfiguration for passing config around in-band.
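For the curious, here is roughly what the in-band route looks like on the wire. This is a minimal sketch, not code from pylsp or any particular editor: the `frame` and `did_change_configuration` helpers are made-up names, and the settings keys under `pylsp.plugins.pycodestyle` are an assumption based on that server's configuration docs. The only parts taken from the protocol are the `Content-Length` framing and the `workspace/didChangeConfiguration` notification with its `settings` param.

```python
import json


def frame(message: dict) -> bytes:
    """Wrap a JSON-RPC message in the Content-Length header used by LSP."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def did_change_configuration(settings: dict) -> dict:
    """Build the notification; notifications carry no "id" and get no reply."""
    return {
        "jsonrpc": "2.0",
        "method": "workspace/didChangeConfiguration",
        "params": {"settings": settings},
    }


# Assumed settings shape for pylsp; other servers define their own keys.
settings = {
    "pylsp": {"plugins": {"pycodestyle": {"enabled": True, "maxLineLength": 100}}}
}

wire = frame(did_change_configuration(settings))
```

The catch described upthread is that nothing forces an editor to expose a way of filling in that `settings` object, which is why the editor and the server can both be within spec and still not cooperate.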


LSP is a protocol for answering queries about the language model behind the text, not controlling anything about the text itself (except for auto formatting, but that's a bit special). Those configurations seem to be in scope for the text editor, not the language server.

In fact LSP is mostly configuration agnostic. It might use configuration for a particular query like autoformatting, but it's still request/response in terms of text edits.


This is incorrect;

The textDocument/diagnostic[1] is far beyond "the language model behind the text" and includes linting which many LSP servers implement (TS and python for example), and it can also happen as a "push" as opposed to "request/response"

[1]: https://microsoft.github.io/language-server-protocol/specifi...


You have to tell the server how to interpret the code somehow. I mentioned line length. In Python, you may want one configurable max for code and another for comments. If you can pass those to the server, it can tell you which lines are too long using the standard mechanism for identifying errors and warnings. Otherwise, the editor would have to know how to parse the code so it could know whether a given line is code or comment. That’s the sort of thing LSP can do better.

That’s just one example, not the lone thing users would likely configure.

Edit: More concretely, here are some of the config knobs you can twiddle via LSP for the particular server I’ve been discussing: https://github.com/python-lsp/python-lsp-server/blob/develop...

There are other plugins around with settings that aren’t documented there, but are still configured through the same mechanism.


I get what you are saying, but it still sounds like a linter. And the linter (and syntax highlighting etc.) do need to understand the code - but the LSP is not necessarily the right tool for that.

In many cases the right tool is something like treesitter. Treesitter paired with an LSP is an incredible combination.

But yes, I understand your frustration. And if the best implementation happens to be inside the LSP then so be it.


I see your point, too. (And also adore treesitter!)

The examples I gave were around lint-y types of things, but the same problem affects more LSP-y things, too. The link I gave has lots of tweakable knobs for refactoring plugins and other stuff that’s more philosophically in-scope for language servers.


> LSP is a protocol for answering queries about the language model behind the text, not controlling anything about the text itself (except for auto formatting, but that's a bit special)

LSP supports refactoring with "Code Action Request":

https://microsoft.github.io/language-server-protocol/specifi...

That's more than simply answering queries about the model.


No, LSP describes what information to present in the editor. For example, Inlay Hints. The user might want to see the type of every inferred variable, or just inferred return types of functions, or nothing at all. LSP needs to consider the user config and only send the info that should be displayed.


people actually like their editor? blasphemy. you must be using either sublime or kakoune


I do! The one here is BBEdit which you can extend by writing - ready for this? - command line programs. It didn’t come with a built-in JavaScript formatter, so I wrote a shell script that runs Prettier on its input, and told BBEdit to run that script when I hit F1. Voila. JavaScript formatting. It’s also incredibly fast and responsive.

If I can’t get it working to write Python the way I want, I’m going back to Emacs. It’s not a “native” Mac app, but doesn’t fall into the uncanny valley the way things like VSCode do.


I've been down that path with both Acme and Kakoune, although I went with separate formatting scripts based on whatever seemed to work best for each language. But I started to go a little insane trying to extend those editors, because I realized that I hate working with the shell and prefer to avoid working with it at all cost. And so back to Emacs I went. And now I oscillate between repeatedly poking my config with a stick, and trying to force myself to migrate to VS Code.


I switched from Emacs to VSCode for a while, but couldn’t stick with it. It’s soooo easy to set up and start using, and it does the right thing 99% of the time. However, that 1% drove me nuts. I don’t remember the specifics, just that there were a few things that bugged the everliving hell out of me, and they couldn’t be changed without doing some significant work inside VSCode itself and then in its extensions.

The worst thing about Emacs is how it absolutely spoils everything else. Yes, it’s a pain in the neck at times, but you can change anything you want.

That said, BBEdit got far closer to my platonic ideal editor than VSCode ever did, and it’s nice enough in other ways to win me over.


Haha, yes, virtually all of that has been my experience as well, except substituting BBEdit for Sublime. I'm still convinced that just sucking it up and sticking with VS Code is the way to go, I just can't make it stick.

But Emacs is still alluring enough that I spent like 4 hours over the last few days trying to get clangd to work in a project that involves cross compiling C++ in a way where generating compile_commands.json does not seem possible, with no success, even with the help of clangd contributors, where VS code somehow is able to just work with minimal configuration of the C/C++ extension. Someday I will stop doing this to myself, hopefully.


My yesterday looked a lot like that. Well, if I can't get BBEdit to work like I'd like, guess it's time to dust off my Emacs config and modernize it. Once more into the breach!

If you figure out how to stop this, please let me know.


supreme blasphemy. Emacs


you just have a call to (stockholm-syndrome) buried somewhere in your 1200 line init file


The amount of Emacs configuration I had was what drove me to the insanity of writing my own editor....



Parentheses are flying like god's edition… ;D


> The problem, column is counted using UTF-16 code units.

Rather than fixing the problem by using code points, they made it worse by allowing any encoding. Now, not only do you need to support UTF-16, but also UTF-8 and UTF-32 [1].

[1] https://microsoft.github.io/language-server-protocol/specifi...
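To make the disagreement concrete, here is a small sketch of how the "column" of the same character differs depending on the unit you count in. `column_units` is a made-up helper, and the keys mirror the encoding names used in the spec ("utf-8", "utf-16", "utf-32", where UTF-32 code units are the same thing as code points).

```python
def column_units(line: str, char_index: int) -> dict:
    """Column of line[char_index], measured in each candidate unit.

    Python strings index by code point, so the prefix before the
    character is re-encoded to count the other units.
    """
    prefix = line[:char_index]
    return {
        "utf-8": len(prefix.encode("utf-8")),            # bytes
        "utf-16": len(prefix.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(prefix),                           # code points
    }


# The emoji is one code point, two UTF-16 code units, four UTF-8 bytes.
print(column_units("a🙂b", 2))  # {'utf-8': 5, 'utf-16': 3, 'utf-32': 2}
```

Any line that contains a character outside the Basic Multilingual Plane gives three different answers, so both sides have to agree on the unit or diagnostics land in the wrong place.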


You don’t need to; UTF-16 is mandatory (ugh), but support for other encodings is negotiated (see ClientCapabilities.general.positionEncodings and ServerCapabilities.positionEncoding).
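A rough sketch of what that negotiation amounts to on the server side. `choose_position_encoding` is a hypothetical function, and treating the client's list as ordered by preference is an assumption of this sketch; the fallback to "utf-16" when nothing else matches (or when the client sends nothing) follows from UTF-16 being the mandatory encoding.

```python
from typing import Iterable, Optional


def choose_position_encoding(
    client_offered: Optional[Iterable[str]],
    server_supported: Iterable[str] = ("utf-8", "utf-16"),
) -> str:
    """Pick the encoding a server would announce in its capabilities."""
    supported = set(server_supported)
    # A client that omits the capability is treated as UTF-16 only.
    offered = list(client_offered) if client_offered else ["utf-16"]
    for encoding in offered:  # assumed: earlier entries are preferred
        if encoding in supported:
            return encoding
    # Every client is assumed to understand UTF-16, so it is always safe.
    return "utf-16"


print(choose_position_encoding(["utf-8", "utf-16"]))  # utf-8
print(choose_position_encoding(None))                 # utf-16
```

This is also why the complaint downthread holds: a server still needs a working UTF-16 path for clients that never offer anything else.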


This seems somewhat pointless, because for compatibility you need to support UTF-16 anyway, so why bother implementing any of the other encodings in addition.


Because if the server and client both use UTF-8 internally, they can get better performance by agreeing to use UTF-8.


The question is if the performance gains are noticeable enough to warrant the added implementation complexity and the added integration testing needed. It seems doubtful to me, given the general protocol overhead.


Or use UTF-8 everywhere to fix the problem. Code points have their own issues (as does UTF-8, but on balance these seem like better engineering tradeoffs).


Using UTF-8 only has better engineering trade offs if you represent text as UTF-8. If you don’t (like JavaScript) then it’s just extra complexity. It’s no better than UTF-16. Only advantage it has is it’s more common among newer languages.

Code points is the only thing that makes sense for a multi-language protocol. It’s unambiguous and every Unicode client can talk in code points, even if they use some exotic encoding.


Recently I stumbled upon this issue:

https://github.com/joaotavora/eglot/discussions/1127

I don't know enough about emacs and LSP to see the full picture, but it seems that both eglot's and corfu's maintainers, presumably very competent programmers, can't find a solution for this.

I only skimmed the thread. My understanding is that LSP dumps a long list of completion candidates at once and they can't decide on a cache strategy that works well with existing code...? If someone knows more about this, please enlighten us.


I read almost the whole thread and it does not seem to be an LSP issue. As far as I can tell, the issue is purely corfu and eglot disagreeing about the semantics of the internal emacs completion hooks they use to communicate, and about whether corfu should cache on its side or rely on eglot. Company does not have this impedance mismatch with eglot so it works fine.


There are actually two separate issues in the thread:

1. The issue you identified which is about caching. It seems you can work around this issue by using a cache busting hook from cape.

2. An unsolved issue where eglot and LSP get out of sync which seems strongly correlated with using corfu. I've been experiencing this a lot with eglot+corfu+rust-analyzer. I'd estimate about once per hour I'll notice that things are just wrong, as in red underlines in nonsense places with nonsense error messages and completion no longer working or working incorrectly. Running a M-x eglot-reconnect "solves" the issue for another ~1 hour.

The way these two issues were related is the author of eglot states that they have sunk considerable time investigating issue #2 with no success, and they argue that it is not worth their limited time to continue investigating #2 when corfu is broken by design[^] w.r.t. #1.

And yes, I'd agree that this doesn't seem to be an LSP issue but rather an issue solely on the client side.

[^] https://github.com/joaotavora/eglot/discussions/1127#discuss...


I wonder if this is still the case as most comments petered out end of March. There was a nasty bug discovered in Emacs JSONRPC-Library, which did cause spurious LSP connect errors and likely other issues in LSP server communication

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=8bf4c...

Do note that many users might have not been aware of this issue. The fixed library is part of Emacs 29. If you are not on Emacs 29 you might pull in a more recent JSONRPC-package from ELPA in which case it will shadow the inbuilt/pre-shipped library.


I stumbled upon this issue because my eglot still constantly gets out of sync with LSP. (emacs 29.1, latest stable corfu and eglot)


Personally, I have never seen this desynchronization issue between eglot and corfu. I suspect it somewhat depends on the particular LSP server, and I only really heavily use eglot for C# (OmniSharp).


I'm curious to know what `company` does differently here than `corfu`.

As a longtime user I couldn't be happier: https://company-mode.github.io/


It appears to be a cache-busting issue, if I understand the bug discussion correctly. company-capf always busts the completion-in-region cache, corfu never does unless you wrap your completion function in a cache-buster function.


The way a language server should work, imo, is it holds all the code and serves projections of the code on the fly to the editor. The user of the editor makes changes to the code and commits it back, at which point the language server parses the code and integrates it into the codebase, pretty-printing it to files as necessary for source control. But, the file layout should be an implementation detail that the editor knows nothing about.


Exactly! There is such a shift in paradigm that needs to happen here and the only project I know of that is moving in this direction is Unison.

I don't want to edit a "file", I want to edit these two functions that exist in some module(s), why can't I just see those two?

I constantly jump between many different languages and the cognitive load is noticeable: was it "!=", "/=" or "~="? Why am I writing/viewing ascii art when I'm coding?

If I am most comfortable/fluent viewing python, why can't I view a javascript source as python?

I think the remaining challenge is what sort of projections are the best/most useful? How do we manipulate these projections? I have played around with these ideas and made an AST viewer for the browser where you could configure exactly how a node was represented (using CSS) and navigation was done in block mode (node traversal) but I found it really hard to build an editing experience that felt smooth..


> If I am most comfortable/fluent viewing python, why can't I view a javascript source as python?

Because they are not isomorphic. At all. Even if you just consider the languages themselves, and ignore their ecosystems, which, in practice, you cannot.


I think this is a bit too simplistic of a take: there’s no reason you couldn’t have multiple syntaxes for the same AST so that people could work in the syntax they prefer (e.g. C-style, Pascal style, Indentation-sensitive, S-expressions). Textual syntax can be implemented a layer up, if the input to the interpreter/compiler is not text but a data structure. (The true meaning of what people think is the benefit of “homoiconicity”: eval consumes and produces the same types of data)
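As a toy illustration of "multiple syntaxes for the same AST" (a sketch only; the node classes and the two renderers are invented for this example and are not from any real language toolchain), the same tree can be printed as an S-expression or as infix:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]


def render_sexpr(node: Expr) -> str:
    """Lisp-style projection of the tree."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    return f"({node.op} {render_sexpr(node.left)} {render_sexpr(node.right)})"


def render_infix(node: Expr) -> str:
    """C-style projection of the same tree, fully parenthesized."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    return f"({render_infix(node.left)} {node.op} {render_infix(node.right)})"


tree = BinOp("*", BinOp("+", Var("a"), Num(1)), Var("b"))
print(render_sexpr(tree))  # (* (+ a 1) b)
print(render_infix(tree))  # ((a + 1) * b)
```

Printing is the easy half. The replies below are about the harder parts: parsing each surface syntax back into the same tree, and the semantics that no renderer can paper over.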


Sure, you could do that. But how does it help you understand the code better than reading it in the original source form? Programming languages are more than syntax. If you don't understand the semantics, you don't understand the code.


Different syntaxes are better for different purposes: s-expressions are easy to manipulate structurally; significant indentation is often easier to read; etc. Semantics is important, but syntactic noise is too.


You mean different purposes by humans or different purposes by machines?

For machine manipulation, I think it makes more sense to directly manipulate the AST.

For human manipulation, I think the cognitive overhead of mentally converting between the display syntax and the canonical syntax would far outweigh any gains in readability. But maybe your workflow is different than mine - If you have a lot of custom macros in your editor, I could see s-exps being useful (although, again, I think exposing and directly manipulating the AST would be less error prone)


A programming language doesn’t need a canonical syntax if its semantics are specified in terms of the data-structures the parser produces and not in terms of the textual representation of those data structures.


> Textual syntax can be implemented a layer up, if the input to the interpreter/compiler is not text but a data structure.

If you designed the languages and interpreters/compilers around it. Neither Python nor Javascript are, though.


My point isn’t that existing languages are designed this way, but it wouldn’t be hard to retrofit this onto an existing language. Especially one like JavaScript that already has relatively widely-used transpilers


Some percent of the python code I’ve written could be rewritten as JavaScript code at the function-level.


Meta's "Transcoders" or whatever they changed the name to demonstrate that. However, if we want perfectly semantically equivalent functions, as soon as you add two numbers, then Python -> JS is impossible. The best we can do is approximately translate the behavior.
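A quick way to see the "add two numbers" problem without leaving Python: Python's `int` is arbitrary precision, while a plain JavaScript `Number` is an IEEE 754 double, which is the same representation as Python's `float`. The sketch below uses `float` as a stand-in for JS `Number`; `js_like_add` is a made-up name, and JS `BigInt` is deliberately ignored.

```python
def python_add(a: int, b: int) -> int:
    """Python semantics: exact integer arithmetic of any size."""
    return a + b


def js_like_add(a: int, b: int) -> float:
    """Approximation of JS `a + b` on Numbers: both operands are doubles."""
    return float(a) + float(b)


big = 2**53  # first integer whose successor a double cannot represent

print(python_add(big, 1) == big)   # False: the sum is exact
print(js_like_add(big, 1) == big)  # True: the +1 is lost to rounding
```

So a literal translation of `a + b` changes behavior for large integers, and an exact translation has to emit something other than `+`.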


> I don't want to edit a "file", I want to edit these two functions that exist in some module(s)

That shift happened like 20 (?) years ago. That's how Eclipse displays your Java stuff. It goes to a great length to pretend that there aren't files. Instead there are packages.

Seeing noobies and experienced programmers struggle with it for years, my conclusion is that this is a bad idea. Most problematically it creates "programmers" who have no idea how their project is actually organized, or how to open files that nobody from the ops department put into their editor in such a way that they can be discovered. The amount of dumb questions I had to deal with is on par with those IT stories about outrageously incompetent users pushing mouse buttons with their foot or forgetting to plug their appliance into power supply.

In practice, the more programmers are removed from the actual thing they are programming, the worse are the results, the lower is the competence and the more resources are wasted. I would rather live with the downsides of poor synchronization between the language server and the files I'm editing than let the language server be in the datapath. Too much headache for very little gain.


I agree with all of this.

I'd only add that well before Eclipse and its ilk, Java started down this path with the deep filesystem paths that made it painful to work with from the filesystem without the kind of multi-level collapsing Github does. It was a choice that pushed people towards seeing the filesystem hierarchy as a nuisance, and laying the groundwork for encouraging people to obscure it in IDEs.


The problem with the filesystem is that it privileges an organization scheme which isn’t the best one for every editing task. This makes, for example, implementation inheritance hard because your class has a bunch of invisible code in it. But if you could expand all the superclass methods into a single view and then have edits automatically integrated into the appropriate places, this wouldn’t be as much of a problem.

Java’s filesystem hierarchy is a great example of a “fileout” format for the sort of environment I’m talking about. Another example here is smalltalk repositories generated by Iceberg: https://github.com/pharo-vcs/iceberg


The thing is, nothing stops you from having alternative views as well, but the moment you make that expected and de-facto privileged by making filesystem navigation painful, and people stop thinking about how to present the project as a whole in a narrative as a result, you tend to lose structural information that matters when trying to navigate unfamiliar code.


It’s actually the opposite: if we moved to storing source code in, say, sqlite and built tooling to make querying these databases easy, then it would become a lot easier to get a high-level understanding of a project. Especially if, in addition to the code, you stored links (e.g. from a function to the functions it calls; from a class to what it references).
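A minimal sketch of what I mean, using Python's standard `sqlite3` module. The schema, the table names (`definition`, `calls`) and the helper functions are all invented for this example; a real system would need versioning, modules, types and so on.

```python
import sqlite3

SCHEMA = """
CREATE TABLE definition (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    source TEXT NOT NULL
);
CREATE TABLE calls (
    caller INTEGER NOT NULL REFERENCES definition(id),
    callee INTEGER NOT NULL REFERENCES definition(id),
    PRIMARY KEY (caller, callee)
);
"""


def build_store(definitions: dict, call_edges: list) -> sqlite3.Connection:
    """Load definitions and caller -> callee links into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO definition (name, source) VALUES (?, ?)",
        list(definitions.items()),
    )
    conn.executemany(
        "INSERT INTO calls (caller, callee) "
        "SELECT a.id, b.id FROM definition a, definition b "
        "WHERE a.name = ? AND b.name = ?",
        call_edges,
    )
    return conn


def callers_of(conn: sqlite3.Connection, name: str) -> list:
    """Names of every definition that calls `name` directly."""
    rows = conn.execute(
        "SELECT caller.name FROM calls "
        "JOIN definition AS caller ON caller.id = calls.caller "
        "JOIN definition AS callee ON callee.id = calls.callee "
        "WHERE callee.name = ? ORDER BY caller.name",
        (name,),
    )
    return [row[0] for row in rows]


store = build_store(
    {
        "tokenize": "def tokenize(text): return text.split()",
        "parse": "def parse(text): return tokenize(text)",
        "main": "def main(): print(parse('a b'))",
    },
    [("parse", "tokenize"), ("main", "parse")],
)
print(callers_of(store, "parse"))  # ['main']
```

Once the links are rows, "who calls this" is a join instead of a reindexing pass over a directory tree.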

I personally find Common Lisp and Clojure much easier to navigate because I can just ignore the filesystem layout and use the in-image database of code relationships to navigate.


I strongly disagree with this, given we have real examples of image based systems to compare with. You lose a significant amount of structural information that way.

Again, note that nothing stops you from ignoring the filesystem when navigating relationships. Nothing stops your IDE from indexing the data. Even ctags is decades old.

What the filesystem structure provides is additional context: "these things belong together for some other reason than the relationships directly expressed in code."

In a codebase where nobody bothered with that, or they've just dumped code together for superficial reasons sure, you will gain nothing, but you also lose nothing because you can fall back to querying your IDE or whatever.

In a well written codebase, on the other hand, the structure lets you follow a narrative.

Put another way: If you need to query a database to get a high level understanding, it's a strong signal that the person who wrote the code thought nothing about communicating the architecture to you, and to me that's a warning that the code base is going to be a massive pain to work with because that tends to extend to other areas.


> note that nothing stops you from ignoring the filesystem when navigating relationships. Nothing stops your IDE from indexing the data. Even ctags is decades old

Sure, but all these systems do significantly more work than necessary (or have subtle caching issues and race conditions) because they have to be continuously reindexing an anemic model of the code base.

As far as image-based systems go, give me one of those any day: Common Lisp and Smalltalk have tooling and introspection capabilities from the future. My own experience is that I’m significantly more productive getting up to speed on a new Lisp (Common Lisp, elisp, Clojure) codebase than on any of the alternatives because the system stores so much metadata about the entities.

Also, I think you're underestimating the capabilities for forming narratives that my proposed system gives you: views, stored procedures, various tools built on things like graphviz for visualizing the structure of the code.


> Seeing noobies and experienced programmers struggle with it for years, my conclusion is that this is a bad idea. Most problematically it creates "programmers" who have no idea how their project is actually organized

The layout of files on a filesystem is not how a project is organized. The organization of a typical project is a graph that’s lossily represented by filesystem trees.


What I'm trying to say is that this approach prevents developers from effectively working with the tools their projects rely on to function.

I.e. be it Ant, Maven or Gradle, in order to carry out project-related tasks they will rely on files. They feed files to various tools, create new ones, delete or move old files, and then the deployed project needs to discover those files somewhere and so on.

When a programmer doesn't understand how what they are presented with in their editor maps to whatever any of those tools do you get questions like: "Where is my Java home?" or "I want to debug in the testing environment, can you tell me where is it?" or "I think I've built my program, and I want to patch the existing deployment with the program I've built -- how do I find the program I've built and where is it deployed?". Not to mention more trivial stuff like developers arguing about having / not having access to eg. Protobuf files in their project because someone's editor not having a plugin to open them and they simply don't know how to find their project directory on their computer... or trying to run poorly written Maven build which has some relative paths in it, from a wrong directory.


You might find https://www.jetbrains.com/mps/ interesting. MPS is a sort of AST-oriented IDE.


Even operators that look the same (e.g. “+”) often have different semantics between programming languages (type promotion, rounding, modulo arithmetics). Translating between programming languages while maintaining the original semantics is exceedingly complex, and you might not like how the result looks. Those differences are why we have so many programming languages in the first place.


This browser AST viewer sounds pretty neat. Mind sharing a link to your repo of it?


That's exactly how LSP is designed. The editor e.g. requests 'the user is hovering over code <here>, what should I show?' and the server responds with some Markdown text. Or 'the user wants to navigate to the thing under their cursor <here>, where should I go?' and the server responds with a file/line.

Once you try to use LSP for anything not in that form.. it's not so fun.


No, this is the opposite of how LSP is designed: LSP assumes the editor is looking at a file and this results in a complicated sync dance between editor state and file state. I want the editor to look at temporary buffers served up from the language server and have the code be “written” by sending the temporary buffer back to the language server which handles writing it out to files itself.

As far as I know, the only languages with something even vaguely like this are Pharo Smalltalk (with its git integration) and unison


> I want the editor to look at temporary buffers served up from the language server and have the code be “written” by sending the temporary buffer back to the language server which handles writing it out to files itself.

I don’t think any editor would want to accept this workflow. The biggest issue is that writing is reputation-critical, if the editor writes to the wrong place, or it fails to write when it should have written, then users lose their work which makes them very unhappy. So the editor has to take responsibility for doing the writing, and that is the user’s mental model when they save their work.


I want a system more like Bank Python[1] or Unison[2] or Pharo Smalltalk[3] that gets us beyond the idea of "code in files". And I have plans to build this on top of Common Lisp + SLIME

[1]: https://calpaterson.com/bank-python.html

[2]: https://www.unison-lang.org

[3]: https://pharo.org


Loads of people absolutely hate having to use a weird custom IDE just to try a language. That's why all the smalltalk like things are doomed for failure unless they are enforced by a platform.


Yeah, this is the problem my proposal solves.


The advantage here is this enables some interesting workflows, like “show me this method and every definition that overrides it in a single view” or “show me this method and all the methods it calls directly”. So you can project the interesting part of your code base into a temporary editor buffer and then edit it and the language server takes care of persistence.

Also, it ends the formatting wars because the in-repository format is disconnected from the user’s preferred format.


I don't use LSP for all the real-time stuff, but I do use it for other things like rename symbol, lookup documentation, jump to, etc. and just "the file you have open" isn't really enough for that in most cases. You need to know about the "file layout" or "complete project" for that sort of thing.


Last time I read the LSP specs/repos, there were 2 nuances regarding your statement:

(1) the editor reads the files, displays them, and then sends changes to the language server, which mirrors the file in order to compile it, analyze it, etc. At that stage the file is never persisted to the file system (the mirroring is needed because otherwise there would be no syntax highlighting while editing).

(2) there are conversations in both the LSP and editor space about virtual file systems. Google "language server virtual file system". The core author of the LSP spec has written one issue very related to it.
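The mirroring in (1) boils down to the server applying each change event to its own in-memory copy of the text. A simplified sketch, with loud assumptions: `offset_of` and `apply_change` are made-up helpers, lines are split on "\n" only, and `character` is counted in code points (as if "utf-32" had been negotiated) rather than the default UTF-16 code units. The change shape, an optional `range` plus replacement `text`, follows the content changes sent with `textDocument/didChange`.

```python
def offset_of(text: str, line: int, character: int) -> int:
    """Convert a zero-based (line, character) position to a string offset."""
    lines = text.split("\n")
    # Each earlier line contributes its length plus one for the newline.
    return sum(len(earlier) + 1 for earlier in lines[:line]) + character


def apply_change(text: str, change: dict) -> str:
    """Apply one content change to the mirrored document."""
    if "range" not in change:
        # No range means the event carries the full new document.
        return change["text"]
    start = change["range"]["start"]
    end = change["range"]["end"]
    begin = offset_of(text, start["line"], start["character"])
    finish = offset_of(text, end["line"], end["character"])
    return text[:begin] + change["text"] + text[finish:]


mirror = "def f():\n    return 1\n"
edit = {
    "range": {
        "start": {"line": 1, "character": 11},
        "end": {"line": 1, "character": 12},
    },
    "text": "42",
}
mirror = apply_change(mirror, edit)
print(repr(mirror))  # 'def f():\n    return 42\n'
```

If the editor and server ever disagree about an offset, the mirror silently diverges from what is on screen, which is one reason the position encoding arguments elsewhere in this thread matter.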


(2) is interesting, it would be cool if LSP evolved a sort of “file system server” ability that could be integrated into things like TRAMP in emacs


Imho it should be the other way around: the editor knows about file layout and everything, and the language server queries the editor when it needs contents for a specific file. Unfortunately LSP doesn't have calls to query contents.

The rationale is that only the editor knows what files are open and modified. Also we can imagine scenarios where the editor and the language server are on different machines (e.g. GitHub Codespaces or the browser version of VS Code. The language server could be a wasm build running where the editor is running, but the editor may access files on a remote server, or vice versa)


Well, a lot of this is that I think files are a bad way to represent code. You really want something more shaped like a database that allows for multiple views into the same code. Git would work for serialization, but the source of truth should be inside the language server, not the files or the repository.


This is something I have been wanting for nearly a decade. A lot of writing software isn't just implementing your logic and abstractions but actively thinking about how to organize code to the constraints of the filesystem. Having to actively model your modules around file paths, Rust for example tightly binding the use of `mod` to your layout. Refactoring is the same, a non-trivial amount of time on large projects when re-factoring is realising you need to re-organise some module hierarchy and that involves modifying the file system too.

I really dislike this, instead of a fuzzy file finder I want a fuzzy function finder, where all functions are just kept in a database that I can pull into buffers at will. Where hierarchy is only based on the logical structure of your program and the filesystem ceases to exist. "New Function" over "New File". You can get the "Fuzzy Function" finder part somewhat with LSP Symbols, but it doesn't get rid of having to think about files.
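For a rough feel of the "fuzzy function finder" half, here is a sketch for Python source using the standard `ast` module. `index_functions` and `fuzzy_find` are invented names, the matching is a plain subsequence test rather than a proper fuzzy ranker, and the source still arrives as a string, so this does nothing about the "stop thinking in files" half.

```python
import ast


def index_functions(module_name: str, source: str) -> dict:
    """Map 'module.function' to the source text of each function found."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[f"{module_name}.{node.name}"] = ast.get_source_segment(source, node)
    return index


def fuzzy_find(index: dict, query: str) -> list:
    """Names whose characters contain `query` as an in-order subsequence."""
    def matches(candidate: str) -> bool:
        remaining = iter(candidate.lower())
        # `ch in remaining` consumes the iterator, which enforces ordering.
        return all(ch in remaining for ch in query.lower())

    return sorted(name for name in index if matches(name))


SOURCE = '''
def parse_url(raw):
    return raw.split("://", 1)

def render(page):
    return f"<p>{page}</p>"
'''

functions = index_functions("net", SOURCE)
print(fuzzy_find(functions, "prsurl"))  # ['net.parse_url']
```

Pulling `functions["net.parse_url"]` into a scratch buffer is the easy part; writing the edited text back to the right place is where first-class language support would be needed.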

Unfortunately I don't think you can get this without first-class support by the language itself, and new languages getting critical adoption isn't a regular thing.


That's basically a subset of image-based development, as seen in Smalltalk and Lisp since forever.


Yes, and I write a lot of lisp


That sounds like it would make for bad latency while editing. And what about other files in the project, other than the files of the specific programming language? The IDE needs to understand their file layout anyway, and often there are dependencies to the layout and naming of the programming-language source files. And you want to do stuff like textual search across all project files. Effectively your LSP server would have to become a full-scale remote IDE.


Does anyone know why LSP uses UTF-16 for encoding columns? It seems like everyone agrees it is a bad choice, so I'm curious about the original reasoning. Are there any benefits at all to using UTF-16, or was it something to do with Microsoft legacy code?


The JavaScript VM, Java VM, .NET VM, and several other runtimes (including effectively the entire Windows API) have their fundamental definition of strings be based on UTF-16 baked in.

I believe the original producers and consumers of LSP were written in languages that had string lengths based on UTF-16, so it was the literal easiest way to do it, even though UTF-16 is probably objectively the most painful thing to compute if your string system isn't UTF-16.

LSP eventually got a solution where you can request something other than UTF-16 offset calculations, but I don't remember the details of what that solution is.
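
To make the pain concrete, here is a small sketch of how the same character ends up at different columns depending on the unit you count in. As far as I know, the later solution is the `positionEncodings` client capability and `positionEncoding` server capability added in LSP 3.17, with the values `utf-8`, `utf-16` and `utf-32`; the helper name `column_in` is my own.

```python
def column_in(encoding, line, codepoint_index):
    """Column of the character at codepoint_index, counted in code units
    of the given position encoding ("utf-8", "utf-16" or "utf-32")."""
    prefix = line[:codepoint_index]
    if encoding == "utf-8":
        return len(prefix.encode("utf-8"))
    if encoding == "utf-16":
        # utf-16-le avoids the 2-byte BOM that plain "utf-16" would prepend.
        return len(prefix.encode("utf-16-le")) // 2
    if encoding == "utf-32":
        # Python strings index by code point, which is one UTF-32 unit each.
        return len(prefix)
    raise ValueError(f"unknown position encoding: {encoding}")


# "a", then U+1F600 (outside the BMP, so a surrogate pair in UTF-16), then "b".
line = "a\U0001F600b"
```

For the `b` at code point index 2, this gives column 5 in UTF-8, 3 in UTF-16 and 2 in UTF-32, so a server whose strings aren't UTF-16 has to re-scan the line to answer in the default encoding.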


There was a lengthy discussion on this [1]. UTF-16 was used because it was convenient: it's what Microsoft APIs and JavaScript already use (the latter being the language VS Code is written in).

[1] https://github.com/microsoft/language-server-protocol/issues...


That thread was infuriating. Since when does an encoding format have an evangelical task force? I'm all for UTF8 everywhere but wow some of the replies were super cringe.

Even when the proposal of "UTF-16 default, UTF-8 optional" was made to keep backwards compatibility, it was not enough. It has to be UTF8 because it's superior technically, as if that's the only consideration! I agree they should've just picked one, but I still don't think the maintainers needed a refresher on what is UTF-8 every 3 comments.


Count me among the UTF-8 everywhere absolutists. There are two ways to encode text: UTF-8, and a worse choice.

But I wouldn’t be annoying about it. I’d just tut tut from afar. (Though if the decision is still up in the air, I’d argue as passionately as any preacher to persuade our fellow devs to adopt our lord and savior UTF-8 into their hearts and minds.)


Yeah I would absolutely take utf8 everywhere. I hate dealing with anything else.

But I think the worst part was that the maintainer was clear that he/she wasn't debating this on a technical level. Like, they weren't trying to decide which encoding was better. From what I understand it was more about how best to deal with the (at the time) current design choices without breaking the current implementations, and feedback from actual implementers.


I'm inclined to agree that some manner of backwards compatibility is important. A middle ground with a path towards exclusive UTF-8 use seems like a fine compromise. However three things come to mind:

* LSP is being used outside of VSCode, and while UTF-16 may be helpful in that case it's a hindrance for others.

* Institutional knowledge of UTF-16 ain't great at Microsoft either. Github broke rendering of multibyte characters and it took a random GH user to explain to the devs how multibyte characters and strings interact in Javascript before that got fixed.

* [insert lots of handwaving about the downsides of electron]


For earlier archaeology see [19]. It seems to me people had started coding extensions in VS Code without giving any real thought to the question, so the default choice inherited from the language was UTF-16.

[19]: https://github.com/microsoft/language-server-protocol/issues...


JavaScript uses UTF-16 for everything is why, and LSP is a TypeScript-first protocol.


Sadly there is some standard to that. JavaScript source maps also use the same definition for columns.


The biggest problem with LSP is that it's a lowest common denominator solution. If I try to edit OCaml using an LSP solution, I don't get features like "query type of symbol", presumably because JavaScript doesn't need it.


It's not a problem; on the contrary, it is good that implementors are pointed to possible code intelligence features that their current IDEs don't provide either.

I read some factual inaccuracies here.

LSP (the protocol)'s main driving force was for a long time TypeScript rather than JS. FYI TypeScript has a stronger type system than Java at this point, and its LSP server is the most comprehensive and supported way to write TS with.

Another example of an LSP success story is Rust's rust-analyzer. Not to mention that C++'s clangd also became an LSP server many years ago.

All these are strongly typed and bring extra semantics (templates, lifetimes) way beyond other languages.

"query type of symbol" functionality is provided, for example, by the textDocument/hover call [1]

But even if this wasn't implemented, the protocol lets any implementor extend the interface with additional / non-standard methods (and clangd did exactly that[2])

It might be that the not-widely-popular ocaml's LSP support is not there yet. Probably it's an open source project, so you can help with the implementation, or at least vote for missing features to be implemented.

[1]: https://microsoft.github.io/language-server-protocol/specifi...

[2]: https://clangd.llvm.org/extensions
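
A minimal sketch of what that hover exchange looks like on the wire, assuming a plain JSON-RPC body (transport framing left out). The URI, the position and the example response are invented; the method name, parameter shape and the possible shapes of `contents` are from the spec as I remember it.

```python
def hover_request(request_id, uri, line, character):
    """JSON-RPC body for textDocument/hover; line and character are zero-based."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/hover",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    }


def hover_text(response):
    """Extract display text from a hover response, or None if there is no result."""
    result = response.get("result")
    if result is None:
        return None
    contents = result["contents"]
    # contents may be a string, a {"value": ...} object, or a list of those.
    items = contents if isinstance(contents, list) else [contents]
    parts = [item if isinstance(item, str) else item["value"] for item in items]
    return "\n".join(parts)


# Hypothetical reply from an OCaml server for a symbol of type `int list`.
example_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"contents": {"kind": "plaintext", "value": "val xs : int list"}},
}
```

The server is free to put whatever it wants in `contents`, so "type of symbol" is really just a convention for what a typed language's server chooses to return there.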


Agree this is not a problem. rust-analyzer also includes a boatload of custom extensions. Here's how "query type of the selected expression" works, for example:

https://github.com/rust-lang/rust-analyzer/blob/master/docs/...


Hovering over the name of a variable in Haskell shows the type. It sounds like your OCaml LSP server just hasn't implemented this yet.


You don't get those features because the OCaml LSP doesn't implement it.

Works fine in other LSPs.


LSP is still (last I checked) missing one nice feature that cscope had 30 years ago: distinguishing between read and write references to a variable. Unfortunately it (as mlcscope) didn't keep up with C++ and ultimately vanished.


That's part of the protocol with a Document highlight request. Basically if you click on a symbol in the client, it'll send the server side request where it can find all references and mark which ones are read and which are write. The client will then highlight those references with one color for read, one for write, and one for other textual occurrences.

https://microsoft.github.io/language-server-protocol/specifi...
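
Roughly, a client could sort the `textDocument/documentHighlight` result like this. The `DocumentHighlightKind` numbers (1 = Text, 2 = Read, 3 = Write, defaulting to Text) are from the spec; the example response and the `group_highlights` helper are hypothetical.

```python
from collections import defaultdict

# DocumentHighlightKind values from the LSP specification.
KIND_NAMES = {1: "text", 2: "read", 3: "write"}


def group_highlights(highlights):
    """Group highlight start positions by kind; a missing kind defaults to text."""
    grouped = defaultdict(list)
    for h in highlights:
        kind = KIND_NAMES[h.get("kind", 1)]
        start = h["range"]["start"]
        grouped[kind].append((start["line"], start["character"]))
    return dict(grouped)


def rng(line, start, end):
    """Build a single-line LSP range."""
    return {"start": {"line": line, "character": start},
            "end": {"line": line, "character": end}}


# Hypothetical result for `x` in:  x = 0 / print(x) / x = x + 1
example_highlights = [
    {"range": rng(0, 0, 1), "kind": 3},
    {"range": rng(1, 6, 7), "kind": 2},
    {"range": rng(2, 0, 1), "kind": 3},
    {"range": rng(2, 4, 5), "kind": 2},
]
```

One caveat: as far as I can tell this request is scoped to a single document, so it is not quite the project-wide read/write split that cscope offered.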


Wait, does that feature mean what it sounds like? That sounds too basic to be fundamentally beyond LSP.


Yeah, it sounds like it should support it.

Traditionally though OCaml editor integrations have also supported not only asking just the type of a symbol, but of an expression. I wonder if LSP can do that, because that function needs some interactive scoping of the query, not just a single point. I suppose it can work when hovering over a parenthesis, but if precedence needs to be accounted for, it would be difficult to understand what the user wants to see.

I've _really_ enjoyed the expression type queries in the past, but I haven't coded OCaml for a while :/.


That's doable in LSP - the server knows about the selection, not just the cursor position. E.g. if I use rust-analyzer (in Helix, so there aren't any protocol extensions in play), select an expression and request hover info, rust-analyzer shows the expression's type.


I hope we're nearing the end of the era where we're editing text in a bubble of tooling that understands ASTs but has to map everything back to text.

I look forward to making tabs vs spaces a viewer setting and not an author choice.


Tabs vs spaces has been a viewer setting for decades for tab users.


Exactly! A tab is literally "1 level of indentation". You can render that one level any way you like.

Unfortunately, though, everyone who has used tabs got bitten by stupid text editors that don't make it obvious whether you've used a tab or a space and substitute a tab for any run of 8 spaces. So if you're aligning an argument list you'll get a mixture of tabs and spaces. Everyone got bitten by this once back in the day then switched tabs off forever. They then switched tabs off in the text editors that the next generation are using.


Elastic tab stops would be the Right Thing if they were implemented anywhere at all https://nickgravgaard.com/elastic-tabstops/


I don't understand the appeal of pretty alignment of arguments with the last character of the function name in the row above. What's wrong with just using one more standard indentation level, like Black does in Python?


Black uses more lines. PEP-8 is generally more compact.

I'm more than happy to let something like Black handle formatting for me, though. Sure, there are some occasions where my own formatting would have been better, but the benefits far outweigh the (very small) costs here.

The silly thing is with tools like Black etc. it should be possible to use tabs again. If the only reason against tabs is people fuck it up, there's really no reason to use spaces in an auto-formatted project.


I suppose there must be some data in the serialized program which marks the start and end of a function body. But rather than having the semantics of that marker be:

> This is indentation whitespace, display it how you like

It should be

> This is a function body, display it how you like

Not only should readers be able to differ about how much whitespace is shown. They should be able to differ about whether it's whitespace that is used at all. Maybe they want something crazy like it makes a different kind of sound when you look at it, or appears in a different color or font. Whatever it is, it's no business of whoever is writing the program--they ought to be able to have an entirely different experience while still kicking out the same program.


We're nowhere near the end of it, in fact the industry just doubles down on this model every year.

It's just the lowest common denominator, and it will always be that. Revision control tools, editors, compilers, the entire world is built around this. And there's no single alternative that could take its place.

The reality is that systems like a Smalltalk image or stored procedures in a database, or a fancy IDE, or whatever are just going to have to bundle tools for good interop with filesystems at that level, and that's just how it goes.

In the end what we have in text files is a mediocre, but universal, interop system.


People have kept saying this since paredit...


Yes, but we've been getting closer since then. Unison, for instance, is a step in this direction.


> I look forward to making tabs vs spaces a viewer setting and not an author choice.

But then we'll just have AST vs tabs & spaces. A layer down, tabs already should've solved 2 vs 4 vs 8 vs whatever spaces by making the semantic separation a single character, and the representation of it an author choice.


Agreed. Even beyond that it would be great for merging & conflict resolution too.


Right. Line-wise diff is such a crude instrument.


You can already get some tools e.g. https://difftastic.wilfred.me.uk/introduction.html which try to do semantic rather than purely textual diffs


I do not follow the argumentation. LSPs are created for text editors. They are focused on lightweight text editor presentation. If you want to expose the AST, you either start implementing clients again (defeating the original purpose) or you end up 3-7 levels down the abstraction madness. You can do that and project ASTs instead of representing text (looking at you, JetBrains MPS), but even the best IDE companies in the world have abandoned that in favor of being text based.

With that LSP would not solve the n x m problem of editors to languages but would require all text editors to reinvent themselves into something completely different.


Part of the problem is that the LSP protocol & model simply can't work with systems that aren't text-file-based; like a Smalltalk, say. Or stored procedures in a database. Or other heterodox models.

It makes the (probably reasonable) assumption that all development happens in file-based text editors, and imposes the text editor model. That's fine, but there are other ways to organize code objects, and the LSP ecosystem is mainly useless to them.

Which is frustrating for the people working in or on that model, but maybe of no concern to others and probably no reason VSCode etc would want to have focused on the possibilities of that alternative.

But it means people doing the heterodox will have to reinvent the wheel completely instead of partially.


But that is something you can fix when you virtualize the file system. In the end it is text. And I think there is some movement. See other comments.


Overall very informative. Really like the comparison with the Dart analysis protocol and the JetBrains protocol. I think the comments on state synchronization with subscription-based feature providers sound very promising.

Hopefully LSP specification can evolve over time and incorporate some of these ideas!


One thing that has annoyed me in some personal implementation work: why does DAP use a variant of JSON-RPC that is cosmetically similar but doesn't follow the spec? Why not just use JSON-RPC, especially considering LSP already does?


Because of organizational disorganization within the VS Code development teams. LSP and DAP are basically the VSCode APIs for implementing language support and debuggers but bolted onto an RPC layer without synchronization or consistency between the people/teams that develop them.
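
To show how close and yet how different the two envelopes are, here is a side-by-side sketch. The field names are the ones I know from the JSON-RPC 2.0 and DAP specs; the concrete values and the `classify` helper are only for illustration.

```python
# JSON-RPC 2.0 request/response pair, the envelope LSP uses.
jsonrpc_request = {"jsonrpc": "2.0", "id": 7, "method": "textDocument/hover", "params": {}}
jsonrpc_response = {"jsonrpc": "2.0", "id": 7, "result": None}

# DAP request/response pair: every message gets its own seq number, and the
# response points back at the request through request_seq.
dap_request = {"seq": 7, "type": "request", "command": "threads"}
dap_response = {
    "seq": 8,
    "type": "response",
    "request_seq": 7,
    "success": True,
    "command": "threads",
    "body": {"threads": []},
}


def classify(message):
    """Guess which envelope a decoded message uses."""
    if message.get("jsonrpc") == "2.0":
        return "json-rpc"
    if "seq" in message and message.get("type") in ("request", "response", "event"):
        return "dap"
    return "unknown"
```

Both protocols sit on the same `Content-Length` header framing, so a shared transport layer is easy; it's the envelope handling that has to be written twice.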


Conway’s law at its finest!



I would assume DAP predates (and possibly motivated) JSONRPC (edit: so it might have been working off of a draft spec). Hopefully someone knowledgeable can comment.


Note: This was definitely not correct! JSON-RPC apparently started in 2005.


LSP is just Microsoft's EEE[0] into the open source space, as they've done with a number of things (including Typescript and buying GitHub/NPM, WSL, etc).

Visual Studio was a meme for a long time because it was so heavy and enterprisey, so to capture more market share they made the cool new shiny version VS Code, and wanted to compete with e.g. Atom whilst still having their grip on the ecosystem as a whole.

So LSP was born. It has a lot of problems. It only really works well with VS Code (yes, there are exceptions, but not many, and they aren't paid for by a mega corporation).

This is Microsoft's way. They try to sway the developer community through force, not persuasion.

[0] https://en.m.wikipedia.org/wiki/Embrace,_extend,_and_extingu...


What exactly did they embrace and extend with LSP? And what's going to get extinguished? They created it and others adopted it...

They could've simply integrated TypeScript into VSCode using a proprietary protocol to keep people locked to it. Instead they took care to make a well documented and reasonably open protocol that others can use for their own language support, and to integrate TypeScript into other IDEs.

I really don't understand what you think they should've done instead.

Do you think that the world where every compiler has its own API is better? You're still free to do so - just develop an extension for VSCode and/or all the other IDEs instead of a single LSP server implementation.


What we are seeing now could fit a timeline where Microsoft is doing EEE again, as well as a timeline where Microsoft is trying to change its ways.

Building VSCode, LSP and Typescript could still prove to be the Embrace and Extend phase. There are some worrying signs, e.g. many VSCode extensions built by MS don't work on open-source VSCode builds like code-server and VSCodium. From a business perspective, this makes sense, as GitHub Codespaces and vscode.dev now have features that are difficult to get in competitors. Maybe at some point they will try to Extinguish all local development and push everyone to use a cloud version of VSCode? (See e.g. Python integration in Excel)

The truth is probably a lot more complex, but we can't rule out anything yet.


By this logic, they are also currently Embracing and Extending Windows, Office, etc. This doesn't make sense to me. It's their original products, not something they embraced and plan on extinguishing.


Windows and Office are both mentioned as products that are leveraged to do EEE on Wikipedia: https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...

They are not extinguishing their own products, but using their products to extinguish alternatives. The example I mentioned actually fits the wikipedia definition quite well in my opinion.


CSS was to a large extent a Microsoft project (when the time came to introduce stylesheets it was more-or-less Opera and Microsoft against Netscape, and Microsoft had many more users than Opera).

Nonetheless Microsoft managed to use CSS to achieve a great deal of user lock-in by refusing to conform to the standard they co-authored.


Well then that's clearly not EEE. There can be more predatory market strategies than just EEE.


- Embrace the web with Internet Explorer.

- Extend the web with CSS as an open standard.

- Extinguish competition by being the main browser and not following the CSS standard.

Still follows the recipe quite well in my opinion.


Exactly this. It could be either one, but M$ has done zilch to earn goodwill from the community.


The features of LSP are exactly the features of VS Code. So if you create a different editor, it will only support what VS Code does. (Unless you put in per-language effort like the olden days). If you create an LSP, it will work best in VS Code.

As well, Microsoft is building a wealth of VS Code-only features and LSPs (locking out VSCodium), by the end, they will have taken from open source and not given anything back.


> The features of LSP are exactly the features of VS Code. So if you create a different editor, it will only support what VS Code does.

This is not true since LSPs can add extensions to the existing API. For example: https://github.com/rust-lang/rust-analyzer/blob/master/docs/...

> If you create an LSP, it will work best in VS Code.

Any editor can work just as well as (or even better than) VS Code.


To be cynical, I guess we're at the "embrace and extend" part; "extinguish" hasn't happened yet, but it could. Microsoft controls the LSP and VS Code is very popular; thus Microsoft has all the leverage in the LSP space. I'm not sure what "going rogue" would look like, on Microsoft's part, but with all the power, it could do so without adversely affecting itself.


To be more cynical all products from all companies are in the “embrace and extend” phase


> They could've simply integrated TypeScript into VSCode using a proprietary protocol to keep people locked to it.

This was something that had puzzled me for a while until just today when reading another post on the original linked article's site.

It seemed odd that after MS had used Atom/VSCode to essentially "suck all the air out" of the FLOSS/free editor space that they would then "share" something like LSP which seemed to then make it easier for other editors to (at least potentially) offer a similar standard of support for languages that had LSP implementations.

(Because from a purely technical POV I do see LSP as a positive development in the editor space.)

But what I'd apparently missed was something mentioned in an earlier "Why LSP?"[0] post by the same author:

> "[...] Microsoft, who were a vendor of both languages (C# and TypeScript)

> and editors (VS Code and Visual Studio), and who were generally losing

> in the IDE space to a competitor (JetBrains)."

and, so:

> "...launched LSP to increase the value of their platform in other domains

> for free (moving the whole world to a significantly better IDE equilibrium

> as a collateral benefit)."

Thus, assuming this is accurate, the actual target in this case was JetBrains and "sharing" LSP was seen as a reasonable "cost" in order to get additional leverage.

Given that I haven't generally paid attention to any of the C#/TypeScript/JetBrains worlds that was an angle that I'd missed.

The assumption on my part that "sharing" LSP still wasn't likely to be an "altruistic" act on the part of MS was primarily driven by the fact that they hadn't really given up on proprietary/control they'd simply moved the demarcation point to be at the LSP implementation level (e.g. the plugin for Python) & "marketplace" access (e.g. remote plugin) etc (e.g. telemetry).

And by defining/retaining control of LSP they've inserted themselves in the path between other editors & LSP implementations which inherently has value (e.g. by being the "reference"/default implementation[1]).

Anyway, that was an aspect of the context for LSP that I wasn't aware of before today, so thought it might be of interest to some.

(Obviously the majority of today's developers see no issue with the actions of MS in any of this, so, if that's you feel free to agree to disagree & leave the rest of us to tilt at windmills in peace--after all, why would we stop now? :D )

[0] https://matklad.github.io/2022/04/25/why-lsp.html#Why-LSP-is...

[1] Which is one of the reasons I'd really like rust-analyzer to one day change to target an editor implemented in Rust by default. VS Code doesn't need any more value added to it for free but any Rust-based editor would benefit from being a "First Class" rust-analyzer citizen & thus strengthen the wider Rust ecosystem.


This is much better put than I could have said.


You can say a lot about Microsoft the company. You can even take digs at Visual Studio Code and its "telemetry reporting". But LSP is one of the best things that's come out of that whole ordeal. It's far from perfect. But as a Vim user, we've had nothing this good until now. I used TernJS for JavaScript and some other handy plugins for other language specific stuff... and it was such a pain. LSP made things far more uniform. And they gave it away to the community! They didn't force anyone to use it.


I agree. I primarily use(d) Emacs, and while it’s had good support for certain languages for decades, now it handles younger languages like TypeScript and Rust exactly as well as Microsoft’s editors do. That’s freaking magical.


Microsoft does not use LSP for its own language TypeScript. They have something that works better for VSCode: tsserver. When TypeScript evolves, Microsoft updates tsserver and VSCode as needed. To get the updates to Emacs, someone needs to make the respective changes to a third-party TypeScript LSP first. And it won't work as well because LSP isn't as good.


typescript-tools for NeoVim uses tsserver


Ugh, I didn't realize that.


Visual Studio was a meme?

It is still top IDE matched by maybe just JB


Because up until (relatively) recently, you couldn't even compile code on Windows without VS.


I was going to say-- visual studio is phenomenal. The profiler is incredible to use.


I feel like the RPC protocol (minus a good text representation) that LSP needs is ONC RPC. It's ancient, to be sure, but it is stupid simple to implement either with a ton of boilerplate or with a compiler to generate the structs and serializers/deserializers for you.


I've read a bunch of articles about LSP, and some of the docs, but still not entirely sure what it is. I mean, abstractly I understand, but to truly understand what it's capable of doing and when I might need to use it... Does anyone have any good pointers?


At a high level, it’s a standard way for an editor to ask another program for information about a source code file: What are the function definitions here? What are the available completions for this text? What errors are there? Can you reformat this in a standard way? Where is the thing on line 23, column 41 defined?

Instead of every single editor having to understand C and Python and JavaScript and Rust and SQL and Ruby, they only have to know how to talk to a language server using a standard protocol. And if you’re writing a new language, you can make a language server for it, and then every editor that knows how to use LSP can be used to comfortably write code in that new language.

The old way is that if M editors want to support N languages, someone has to write M times N language parsers. With LSP, they can write N servers, plus one LSP client per editor. That’s way less work.


LSP stands for Language Server Protocol. It's a communication protocol between the clients (text editors, debuggers, source viewers, etc) and the language servers.

Typically a language compiler runs over the source files and terminates once it's done. All the compilation states are gone. The language server is the compiler itself running for a long time; its aim is to maintain the compiled states of the source files alongside the editing session in the editor client. It's ready to answer requests from the editor regarding the source code.

E.g. The editor loads a source file. It sends an LSP request to the language server saying the file is opened. The language server would compile the file and maintain its compiled states. The compilation process might involve other source files as well. The language server reports any errors or warnings back to the editor. The editor shows them to the user right away.

The user modifies a function and hits save. The editor sends a file-changed request to the language server. It recompiles the file and sends back any errors or warnings.

The user places the cursor at a function call and issues the jump-to-definition command. The editor sends an LSP request to the language server to look up the location of the definition of the function. The language server has all the compiled states of the source files and knows where the function definition is. It sends back the target location (file and line number). The editor can switch to the target file and jump to the line number.

LSP defines a whole bunch of these requests and responses (e.g. looking up the type of a variable, definition of a value, auto-completion suggestion of function parameters, etc). A language server for a language can implement these and suddenly all the clients that talk LSP can utilize the features.
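
The walkthrough above can be sketched as the messages an editor might send, in order. The method names and parameter shapes are from the LSP spec; the file URI and source text are hypothetical. One detail worth noting: open and save are notifications (no `id`, no reply), and errors come back as a separate `textDocument/publishDiagnostics` notification rather than as a response.

```python
def notification(method, params):
    """A JSON-RPC notification: no id, so the server sends no reply."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


def request(request_id, method, params):
    """A JSON-RPC request: the id lets the editor match the eventual response."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def expects_reply(message):
    """Only messages carrying an id get a response."""
    return "id" in message


URI = "file:///project/main.py"  # hypothetical path

session = [
    # 1. The editor loads the file and hands the full text to the server.
    notification("textDocument/didOpen", {
        "textDocument": {
            "uri": URI,
            "languageId": "python",
            "version": 1,
            "text": "def f():\n    return 1\n\nf()\n",
        }
    }),
    # 2. The user hits save.
    notification("textDocument/didSave", {"textDocument": {"uri": URI}}),
    # 3. Jump-to-definition with the cursor on the call to f() on line 3.
    request(1, "textDocument/definition", {
        "textDocument": {"uri": URI},
        "position": {"line": 3, "character": 0},
    }),
]
```

In practice many editors also send `textDocument/didChange` on every edit, so the server usually doesn't have to wait for a save to recompile.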


Thanks for the very clear explanation!

Where does the language server typically run? Does it run on some remote server or as a separate process next to your editor on your local machine?


The language server typically runs in another process alongside the editor on the same machine. Usually it is the editor that spawns off a new process of the language server and connects to it. The editor can launch multiple language servers, one per project of a particular language.

E.g. when the editor opens a Rust file, it would launch the Rust language server and connect to it. When it opens a Javascript file, it would launch the Javascript language server and connect to it.

Running the language server remotely would require more setup. The editor and the language server need to look at the same file, so you have to sync the changes to the file locally and remotely somehow. Also the editor might not be able to automatically launch a remote language server. You have to start it manually and make the editor connect to it manually. It's just not a smooth UX.
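
For the local case, "connects to it" usually just means talking over the child process's stdin and stdout. A minimal sketch of the framing, assuming the standard `Content-Length` header from the LSP base protocol; the function names are mine, and in a real editor the streams would be the pipes of something like `subprocess.Popen([server_command], stdin=PIPE, stdout=PIPE)` with whatever server command applies.

```python
import io
import json


def write_message(stream, payload):
    """Frame a JSON-RPC payload with a Content-Length header and write it."""
    body = json.dumps(payload).encode("utf-8")
    stream.write(b"Content-Length: %d\r\n\r\n" % len(body))
    stream.write(body)
    stream.flush()


def read_message(stream):
    """Read one framed message; return None at end of stream."""
    length = None
    while True:
        header = stream.readline()
        if header in (b"\r\n", b""):
            break  # blank line ends the header block, b"" means EOF
        name, _, value = header.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        return None
    return json.loads(stream.read(length).decode("utf-8"))


# Stand-in for the server's stdin pipe, so the sketch runs without a server.
pipe = io.BytesIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 0, "method": "initialize",
                     "params": {"processId": None, "rootUri": None, "capabilities": {}}})
pipe.seek(0)
first = read_message(pipe)
```

Nothing in this framing cares whether the bytes travel over a pipe or a socket, which is why the remote case is mostly a file-syncing and process-launching problem rather than a protocol one.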


Honestly, you should read some of the docs [0] if these are the sorts of questions you're asking.

[0] https://microsoft.github.io/language-server-protocol/


> debuggers

That's where the sibling protocol, DAP, comes in. Rest of this is a good simple explanation on what LSP does, otherwise.


Yes, DAP is another protocol for remote debugging. I was thinking of debuggers that can already debug locally but want to show type info or various definitions of entities in the source code.


For a more succinct answer, I use it for three things: jumping to the definition of a symbol, auto completion of symbols and keywords and showing documentation.

None of this is new. We've had TAGS tables forever. Certain editors would provide the rest for certain languages in language-specific ways. LSP is just a general solution to the any editor/any language problem. It seems to work at least as good and often better than the custom solutions before.


> It seems to work at least as good and often better than the custom solutions before.

The standardization that LSP brings is nice, but from a user's point of view it's pretty terrible compared to existing custom solutions. At least for the few languages I tried; Scala and Haskell. The feature set supported was very small compared to what existing integrated editors supported, it was slower and less reliable.


Yeah, I should have added for some languages. For me, Python is better with LSP than it was with Elpy, the custom Python support for Emacs.

But it's certainly not going to beat something like SLIME for Common Lisp and obviously not editing Emacs Lisp within Emacs.


Yeah, I was going to come say something like this. LSP is much better than the status quo for most languages, but SLIME is so much better there's just no comparison.


What do you typically program in?


I have a question about the point on highlighting in particular. With a subscription-based model, don't you lose out on the ability to partially highlight a file based on the visible ranges in the editor? Unless you re-subscribe to the highlighting whenever the visible ranges change, the language server wouldn't know which slice(s) of the document to highlight after a change comes in.


I've always found it interesting that MS basically started LSP but C# has probably the worst LSP implementation I've ever seen. It's flat out broken.

Rust, Go, Dart, etc all vastly superior. I recently opened up a powershell script in nvim and even powershell's LSP is better.

I guess there is a new C# LSP with C# Dev Kit and it seems better in vs code but I haven't tried to use it with nvim yet.


Jonathan Blow criticized the performance and complexity of LSP in his talk at DevGAMM 2019 titled "Preventing the Collapse of Civilization" (timestamp 42m26s; https://youtu.be/pW-SOdj4Kkk?t=2546):

> In the programming language world, there's this thing called Language Server Protocol that is pretty much the worst thing I've ever heard of. There are proponents of this all over, building systems for it right now that are going to be living on your computer tomorrow, or maybe even today. As far as I can tell, it's basically a more complicated, slower way to do libraries.

> Say you've got an editor for some programming language and you want to be able to do stuff that we've been doing for decades already. For example, look up the declaration of an identifier by clicking on it, or have tooltips that say, "What type is this value?" Well, they say the way you should do that is—you know, you have your editor and then it's a hassle to make plugins. This is the made-up problem. It's a hassle to make plugins for all these different things. So in order to standardize, you're going to run a server on your machine. Then your editor talks over a socket to the server, and the server talks back and gives you the answer. This approach has now turned your single program into a distributed system.

> The flaw in this whole line of thinking, that none of these people seem to actually think about, is that there's nothing special about looking up the location of an identifier in your code. That's just an API, like we have all the time for everything. So the obvious next step, if you're saying that we should architect our APIs like this, is to do this for other tasks. Now your editor, or whatever program, is going to be talking to multiple of these things. If you ever want to author anything for this, you now have to author and debug components of a distributed system where state is not located in any central place. We all know how fun that is, right?

> But of course, libraries are not that simple. Libraries use other libraries. So what happens at that point is you're running all these servers on your system, and who knows, some of them are going to go down and have to restart. People are synchronizing with each other—no, this is a disaster. And people are actively building this right now, while we're spending all this time overcomplicating stuff that we used to be able to do in 1960.

(Transcript produced by asking GPT-4 to clean up the raw output from youtubetranscript.com: https://chat.openai.com/share/e1bc0e87-f79e-4958-98e7-b93c1b...)


I don't care how many credentials Jonathan Blow has, this take is pure bs. "we're spending all this time overcomplicating stuff that we used to be able to do in 1960", yet he takes the time to explain "tricks" on how to rename variables [1], while modern tools (not necessarily LSP) let you fly through the code [2].

This is just an "old man rants at cloud" take.

1. https://youtu.be/2J-HIh3kXCQ

2. https://youtu.be/AxxNHKCldzA


The environments we work in shape our biases and what we find good and bad.

Game programmers spend their careers writing C++ where, yes, you really need tricks like this to rename the variable, as the language is so fucking insane that ALL tooling gets it wrong - and the consequences of getting it wrong can be disastrous.

The most widely used options for C++ are VS's IntelliSense and clangd. IntelliSense is utter garbage; I never trust it for renaming things. clangd is better but can still get things wrong if your build process is complicated and your compile_commands.json doesn't perfectly reflect it.

Managed languages have fantastic tooling with perfect context of your project, but they're also managed and are not applicable in a lot of environments.


> clangd is better but can still get things wrong if your build process is complicated [and] your compile_commands.json doesn't perfectly reflect it.

Yes, clangd is accurate but when you have different build variants only one is reflected in the current compilation database. clangd will be perfectly accurate for this current variant, but blind to others.

Would clangd support a merge of several compilation DB into one? If a file appears several times with different options (typically include paths and defines), would clangd handle all variants in parallel or just pick one (first or last)? I haven't tried (yet ;).

It's manageable, and sometimes I revert to pure "text level" changes or searches to work around this.
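For anyone who wants to experiment with the merge question above, here is a minimal Python sketch that concatenates several compilation databases and lists the files that end up with more than one distinct command. The function names are made up for illustration, and the sketch says nothing about which entry clangd would actually pick; it only shows where the variants differ.

```python
import json
from collections import defaultdict
from pathlib import Path


def merge_compilation_databases(paths):
    """Concatenate the entries of several compile_commands.json files."""
    merged = []
    for path in paths:
        merged.extend(json.loads(Path(path).read_text()))
    return merged


def files_with_multiple_variants(entries):
    """Map each source file to its distinct commands, keeping only conflicts."""
    variants = defaultdict(set)
    for entry in entries:
        # An entry carries either a "command" string or an "arguments" list.
        command = entry.get("command") or " ".join(entry.get("arguments", []))
        # "file" may be relative to "directory"; joining handles both cases.
        key = str(Path(entry["directory"]) / entry["file"])
        variants[key].add(command)
    return {f: sorted(cmds) for f, cmds in variants.items() if len(cmds) > 1}
```

Running `files_with_multiple_variants(merge_compilation_databases([...]))` on the databases of two build variants would show which files are compiled with different include paths or defines.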


rust-analyzer renames over LSP are perfect the vast majority of the time. The dichotomy among static languages isn't managed vs. unmanaged, it's C++ vs. not-C++.


There's some truth to this! I strongly encourage architecting language servers as libraries first and foremost, with LSP being a relatively thin layer on top of a protocol-agnostic library, and only _one_ of the ways to consume the results of the analysis.
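A rough Python sketch of that layering, under heavy assumptions: `Analyzer`, `handle_request`, and the `toy/definition` method are all invented for illustration (real LSP uses `textDocument/definition` with different parameters), and the "analysis" is a toy. The point is only that the core knows nothing about the protocol and the adapter does nothing but translate.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Location:
    line: int
    column: int


class Analyzer:
    """Protocol-agnostic core: no JSON-RPC, no editor concepts."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Location] = {}

    def index(self, source: str) -> None:
        # Toy analysis: remember where each `name = ...` assignment starts.
        for line_no, text in enumerate(source.splitlines()):
            name, sep, _ = text.partition("=")
            name = name.strip()
            if sep and name.isidentifier():
                self._definitions[name] = Location(line_no, text.index(name))

    def find_definition(self, name: str) -> Optional[Location]:
        return self._definitions.get(name)


def handle_request(analyzer: Analyzer, raw: str) -> str:
    """Thin adapter: turn one JSON-RPC request into a library call."""
    request = json.loads(raw)
    result = None
    if request["method"] == "toy/definition":  # hypothetical method name
        loc = analyzer.find_definition(request["params"]["name"])
        if loc is not None:
            result = {"line": loc.line, "character": loc.column}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})
```

A command-line tool or a test suite could call `Analyzer` directly and never touch `handle_request`.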

At the same time, libraries, sadly, do not actually exist. I can not easily cook up an `.so` and then hook that up with Emacs, Vim, VS Code, and Helix. Moreover, any crash in an `.so` library brings down the whole process, which isn't a good idea for an IDE.

Maybe several years in the future we get an actually good implementation of libraries (I have high hopes for WebAssembly component model), but until then, text-over-stdio can actually be better in practice for some use-cases!


Yes. LSP is the worst thing in the world. But nothing better exists for this use case, so I'm glad that LSP was invented and I can enjoy working C++ completion, go-to-symbol and other IDE niceties in Emacs.


I don't get his point? It basically amounts to "nuh uh, this is just bloat".

>What type is this value?" Well, they say the way you should do that is—you know, you have your editor and then it's a hassle to make plugins. This is the made-up problem

Ok so it's just a made up problem? Like, the "plug in" part or the need for having tooltips/completion etc? If he was referring to the perceived need of a plugin (vs. a more integrated) architecture being just due to a made up problem, the linked article in this thread shows this is absolutely not the case. I'm not sure it's better to reimplement the same features every time for every single editor?

And if he meant that needing IDE features is the made up problem, then I guess... lol ?

>People are synchronizing with each other—no, this is a disaster. And people are actively building this right now, while we're spending all this time overcomplicating stuff that we used to be able to do in 1960.

I get the "old good" aesthetics he always aims for, but come on. Even the most charitable interpretation of this part is ridiculous. Like sure we probably had some IDE features back then, but that's not the point of LSP. The point is standardizing said features. And specifically, the interface to access them without knowing anything about the editor or how the code is written.

I'm sure your proprietary custom made computer was able to get or provide code completion from other ultra closed systems in the 1960s. It would've been helpful for him to be more specific about what exactly they were doing back then. Again, the whole point is to not have to reimplement a sort-of-compiler for every editor and whatever quirks it comes with.

He is not even doing the regular "YAGNI, it's bloat", it's just downright weird and devoid of actual arguments imo. I'm not even saying he needs to propose a better way to do it, because he doesn't even seem to acknowledge the point of language servers


He's saying (without saying it) that LSP is another instance of the "microservices everywhere" hype, and bad for the same reasons -- you have a problem, you try to solve it with a distributed system, now you have two problems and one of them is a set of microproblems.

I don't think he is arguing against a standardized interface to language-specific logic, but arguing for that standard to be a library interface, not a distributed system.


He doesn't say why a library interface would be better. That's why I'm saying he doesn't even address the main point of language servers, which is that they came in response to having your IDE or editor maybe, kind of, plug into some libraries and hope for the best. Just saying that a library interface would be better because it's less complex is meaningless, when we obviously have not managed to get anything close to that for the past 40 years.

It's not super convincing to just dismiss the only solution that seems to have sort of worked by saying that we only had to keep doing it the way that never really worked.

I'm sure the implementation can be a lot better, but he does not provide any argument for why the architecture itself is bad. Sometimes abstractions are actually useful.


I also think it's a bit overblown to call communication with a locally running process a "distributed system". I mean, that's kind of true, but not really any more true than if you were to call having multiple threads in your process a "distributed system".


It introduces _some_ of the failure modes of a distributed system, like unsynchronized / inconsistent data, but not all, like network failures. I think he does have a point there -- as a user, I should not have to care whether an IDE talks to a library via its API or to a language server via LSP, but here I am, watching WebStorm act up and complain that the TS language server crashed.

One thing that Blow misses IMHO is to what extent libraries introduce these failure modes _too_: A library can crash as well (and take down the IDE with it), and multiple libraries that deal with the same data can get inconsistent. That's actually a problem of quality control, integrating code from multiple third parties, and so on.


Security, too. If I’m editing a .env file with a bunch of passwords in one buffer, or maybe some sensitive data I copied and pasted into an empty text window I’m using for temporary notes, I don’t necessarily want some random code written by some 3rd party running in the same process space. I mean, most of userland is pretty much written by various random people already, but I might trust a language server to tell me that I forgot a semicolon in a shell script without trusting it with the contents of /etc/shadow.


It’s true in the sense that every invocation takes arbitrarily long and can fail, which isn’t really true for normal libraries.


The reason is probably that dependency management gets much easier (or at least can get much easier). E.g. any module in VS Code that requires native code (sqlite, tree-sitter, etc.) has to work with the specific version of Node.js for that specific Electron version. Also, quite a few languages already have parser implementations written in the language itself; porting them over would be a big effort.


I agree, and mentioning libraries seems like a non-sequitur to me. It’s handy to be able to integrate a language server written in Lisp with an editor made in Swift, for instance, without coming up with a way to link them as shared libraries. That sounds like a nightmare.


He is trying to say that LSP should be implemented as a set of libraries (like liblsppython.so). Your editor links against these libraries and provides an API for plugin authors.

It's not unreasonable to expect an editor's developers to know how to link a native library.


That’s the part I vehemently disagree with. There are servers written in a whole lot of languages. I’d bet there are Java servers written in Java. How are you going to link that into an editor written in C? What if the server’s written in Python? A shell script (crazy but legal!)? And even if you solve all of those for C, what if you want to write an editor in Swift? Now you have to implement all those shared library linkers again in that language.

Oooorrrrr, you could communicate with another process via stdin/stdout and call it a day.
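To make the stdin/stdout option concrete, here is a minimal Python sketch of the message framing the LSP base protocol uses: a `Content-Length` header, a blank line, then a UTF-8 JSON body. The helper names `write_message` and `read_message` are made up for this example, and a real client would add error handling and run the server as a subprocess rather than use an in-memory buffer.

```python
import io
import json


def write_message(stream, payload: dict) -> None:
    """Frame one JSON-RPC message: header, blank line, UTF-8 body."""
    body = json.dumps(payload).encode("utf-8")
    stream.write(f"Content-Length: {len(body)}\r\n\r\n".encode("ascii"))
    stream.write(body)
    stream.flush()


def read_message(stream) -> dict:
    """Read one framed message from a buffered binary stream."""
    length = None
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before a full header was read")
        if line == b"\r\n":  # a blank line ends the header section
            break
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(stream.read(length).decode("utf-8"))


# Round trip through an in-memory buffer standing in for a pipe.
pipe = io.BytesIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
pipe.seek(0)
request = read_message(pipe)
```

Any language that can read and write bytes on standard streams can implement these two functions, which is the whole appeal.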


> There are servers written in a whole lot of languages.

I believe his opinion is that there shouldn't be an LP (since he's against using servers, I've removed the S from LSP) written in a language that can't be compiled to xxx.so. We're talking about Jonathan Blow after all.

It makes sense if you view the LP as a part of an editor. But in our reality the language servers are often maintained by that specific language's community, not the editor devs, so it's probably not the best idea to expect them all to be willing to write in a low-level language.


Two managed languages will be hard to integrate, but if the library is written in C or in a language with a C ABI, it is possible to call it from pretty much any language. The question is whether you would want to run it in the same address space as your editor even if you could. Clang is more stable these days, but I still see clangd crashing from time to time and it would be very annoying if it were to take down my editor every time.

So in the end even if you had the option of running the language plugin within the same address space, probably you wouldn't want to. A better RPC protocol than JSON would be nice, but a standard is better than no standard.


Use COM or something like it


What if you are not on Windows? Or if you want to remotely run the language server?


You can implement it ("or something like it") on any system. In fact there are plenty of those. Remoting isn't impossible either (e.g. DCOM, CORBA, gRPC).


Why not just use a server then? Surely that's less complicated than implementing or even just using CORBA/DCOM?


No, it is not (well, it might be with an antiquated CORBA/DCOM stack, but these are just examples; something newer like gRPC would be easier. The point is that it's solvable).

Also, the original complaint was about an explosion of language combinations, and I just pointed out that the solutions to this are well known.


Frankly, if your language can't compile a .so/.dll that can interact with the rest of the system in a sane way, use a better language.

I used to disagree with Blow on this point, as stdin/stdout communication really has the arguable advantage that you can write your Brainfuck LSP in Brainfuck, but I've spent way more time than I find reasonable over the last few years dealing with the consequences of the LSP model - tweaking environment variables, paths, command line arguments for the server, server crashes, mismatches of server version vs the compiler, the client picking the wrong server executable, system-provided language vs downloaded by VS code extension......

I'd rather just drop a dll in Plugins/ and be done with it.


> Frankly, if your language can't compile a .so/.dll that can interact with the rest of the system in a sane way, use a better language.

Feel free to only use languages you personally find aesthetically pure. The rest of us will continue getting work done.


With a DLL, a crash in the server becomes a crash of your whole editor.
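A small Python sketch of the difference, assuming a stand-in "server" that simply aborts. `CRASHING_SERVER` and `run_server_once` are illustrative names, not part of any real editor; the parent process plays the role of the editor.

```python
import subprocess
import sys

# Stand-in for a language server that dies abruptly (assumption for the demo).
CRASHING_SERVER = "import os; os.abort()"


def run_server_once() -> int:
    """Start the fake server as a child process and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", CRASHING_SERVER],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


exit_code = run_server_once()
# This line is reached only because the crash stayed inside the child process;
# an editor could log the failure and restart the server here.
editor_still_running = True
```

Had the same abort happened inside a library loaded into the editor process, there would be no parent left to notice the non-zero status.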


Sounds like a cultist whom people listen to because they need to be angry at someone. Even after cleanup, this is word soup with little substance.

LSP solves two of these "made-up" problems. First, the editor team and the language server team can be different and independent: the Go team works on its language server without actively coordinating features and releases with the VS Code team. Second, the two can run in different places, which allows for things like devcontainers and Codespaces, where code is built and run on a remote server while the IDE runs locally.


He was wrong on this issue. LSP reduces the MxN problem to Mx1 <-> 1xN, i.e. to M+N.
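Taking the MxN framing at face value, the arithmetic looks like this in Python. The counts of 5 editors and 20 languages are arbitrary numbers picked for illustration.

```python
def integrations_without_protocol(editors: int, languages: int) -> int:
    """Every editor needs its own plugin for every language."""
    return editors * languages


def integrations_with_protocol(editors: int, languages: int) -> int:
    """Each editor ships one client, each language ships one server."""
    return editors + languages


# Arbitrary example sizes, not measurements of the real ecosystem.
pairwise = integrations_without_protocol(5, 20)
shared = integrations_with_protocol(5, 20)
```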


Assuming the MxN claim was accurate, yeah. The author of this post wrote a prior one [1], challenging whether MxN was truly the problem LSP was solving. It's also a good read!

https://matklad.github.io/2022/04/25/why-lsp.html


Any multithreaded editor is already a "distributed system" by this argument. Maybe he likes his editor to lock up while loading an autocomplete list, but I am against this.

If you are going to have a background thread that calculates results for autocomplete, then it helps to give it a protocol.


Very interesting article and comments!


"How" did the word "how" end up in the HN title for this?

Spin?


What the hell is LSP?


I think it's https://en.wikipedia.org/wiki/Language_Server_Protocol but those words definitely do not exist in the post


It's a shame the parent comment was downvoted and this is currently languishing at the bottom of the comments.


If you click the "Why LSP?" link in the first sentence of the article, or type "lsp" into Google, you'll get your answer:

> The Language Server Protocol (LSP) defines the protocol used between an editor or IDE and a language server that provides language features like auto complete, go to definition, find all references etc.



Article title: How Lumpy Space Princess could have been better

Article content in full: She couldn’t.


Can't improve on perfection.


My first thought was Liskov Substitution Principle



Why the downvote? It took me quite some reading before I understood it, given that LSP can refer to multiple things. I, too, thought about the Liskov Substitution Principle first.


An article about LSP should really start by defining LSP.


Wow. There was a tiny typo. I found myself thinking, someone who reasons like this should have a protocol for fixing typos, if they eat their own dog food.

I was floored to see the "fix typo" link at the bottom. Submitted. I now wonder if the typo was on purpose.


This is a strange position to take. "Someone who uses logic and reasoning should churn out pristine, flawless work."


You misunderstand me. I am amazed by the author's attention to robust protocol design. No one wants to be "that person" who observes a missing space before an open parenthesis, so usually smoothing out flawless work is not a collaborative process.

I found myself daydreaming how blogs would work if the author turned their brilliant attention to the process of blogging itself. They would solve such a problem, to encourage collaboration.

I was therefore stunned to see a "Fix Typo" link at the bottom. Perhaps this has become widespread, but I'd never noticed it before, and the logic of having this author provide such a link amazed me.

Part of what we do in comments is appreciate various aspects of a work. I was appreciating this aspect of this work.


Ah I did misunderstand, my mistake :)


No, I am just spelling-impaired (even in my mother tongue) :)


I was expressing amazement that you took the time to provide a "fix typo" mechanism.

In the real world, many of us would take the time to fix a splinter in a well-traveled floor. I love that your appreciation for robust protocol includes a mechanism for moving the digital world in that direction.


As long as you keep writing, it's all good.


I never understood LSP well. Can someone explain it to me, and how it differs? Shouldn't the protocol be a standard way to implement things like completions for different languages?



