Syntax highlighting is a waste of an information channel

megameter · on July 21, 2020

This kind of nibbles at the edges of the longstanding question: Whether source code is best represented as a linear document.

Because historically, that was true for the obvious reasons of pragmatism; you could put it on a punchcard, print it to paper, save it to a file all using the same character encoding and rendering methods, and when we went to interactive editing that didn't change: half of the joy of plaintext is that you can select, copy, paste, search and modify in mostly standard ways across many tools.

Syntax highlighting is incrementally in line with this paradigm since there's only one syntax for the language (syntax extension and mixed-language documents aside) and it too follows a linear order, although it also recognizes that an AST exists. But as you get into stuff like code folding, Intellisense and other IDE-like features, deviations emerge. More of the selections deal with symbolic meanings (declarations, types) and not really the tokens or the characters they consist of, and only incidentally the delimiters of the AST.

But - when we try to fully re-envision it and express these other concepts directly we tend to end up with something that is unfamiliar, a bit clunky and alienating (most graphical languages). So the accepted developments always seem to come in these little ideas that are mostly-optional add-ons.

Chuck Moore did use semantic color as a language feature in ColorForth, but his specific goals lean towards a holistic bootstrapping minimalism, so I haven't seen it taken up elsewhere.

geofft · on July 21, 2020

One of the nice things about code being text is that you can copy and paste unparseable subsets of code without anything getting in your way. For instance, if you need to move an if/else out of a function, you can move the if statement, reindent the body, and then move the else block. If you have syntax highlighting on, it might briefly mis-highlight (e.g., it may not know what to do with the else block when it's briefly unpaired) but it will let you do it, and it will fix itself once the program is properly parseable again.

Compare with, say, a WYSIWYG rich text / HTML editor. If you want to reorganize a bulleted list, it's many more steps, because the tool isn't really set up to accommodate a momentarily-invalid state. I think that's the big difference between syntax-highlighting something that fundamentally remains text and switching the representation to not-text.

This also lets you do various sorts of text manipulation like conflicted merges. There are lots of arguments for an AST-aware merge, but in case of a conflict, a text-based merge system that inserts standard conflict markers will still usually leave you with a file that parses well enough to be syntax-highlighted, even if it won't compile with the conflict markers still in place. Or even imagine converting a program from one language to another (e.g., a shell script that outgrew shell). You can stick the invalid code in your text editor, ignore the highlighting, and turn it line-by-line into the new language.

I think all the highlighting examples in this article work as overlays, just like syntax highlighting, so they'd work acceptably well in these briefly-invalid states. In some cases it'll fail to highlight or it will need aggressive error-recovery that would be dubious in an actual compiler or interpreter (imagine, say, "if you see a conflict marker, skip the <<< portion and parse the >>> portion to find what locals exist and what their types are, then go back and try to highlight the <<< portion"), but since the highlighter isn't the real interpreter, that's fine.
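The "skip the <<< portion and parse the >>> portion" recovery described above can be sketched in a few lines of Python. This is only an illustration: the function name `take_side` is made up here, and it assumes the standard two-way conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) at the start of a line, not the diff3 style with a `|||||||` section.

```python
import ast


def take_side(text: str, side: str = "theirs") -> str:
    """Return text with every conflict region replaced by one of its sides.

    side is "ours" (between <<<<<<< and =======) or
    "theirs" (between ======= and >>>>>>>).
    """
    kept = []
    state = "normal"  # one of "normal", "ours", "theirs"
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            state = "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            state = "normal"
        elif state == "normal" or state == side:
            kept.append(line)
    return "".join(kept)


conflicted = (
    "def f(x):\n"
    "<<<<<<< HEAD\n"
    "    return x + 1\n"
    "=======\n"
    "    return x + 2\n"
    ">>>>>>> feature\n"
)

# The conflicted file does not parse, but either side on its own does,
# which is enough for a highlighter to learn which names exist.
ast.parse(take_side(conflicted, "theirs"))
ast.parse(take_side(conflicted, "ours"))
```

A highlighter could run its analysis on one side, remember what it learned, and then colour the other side with that information. A real compiler would not want to guess like this, which is the point being made above.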

gwillz · on July 21, 2020

> the tool isn't really set up to accommodate a momentarily-invalid state.

I've struggled to explain to students, teachers and others my frustrations with anything that isn't plaintext code. This is it - thank you!

I think it's also a neat learning concept about why it's important one is _able_ to make mistakes when writing code. So many are overwhelmed by the flexibility or fragility of syntaxes but there's actually a lot of power in that.

WorldMaker · on July 21, 2020

Years back I had a mentor of sorts very strongly convince me that "degenerate" cases where code doesn't create a valid AST should, from the standpoint of code editing and source control, be considered something of the "default case". We spend a lot more time on work-in-progress code than we ever do on finished, compiling code. Invalid states aren't often as "brief" as we think they are, and there are far too many reasons why you want to be able to save and even source control work-in-progress code (including things like "it's the end of the day and I want to make sure I have this backed up" and "maybe my coworker can spot why this isn't parsing because my tired eyes are not seeing it").

> If you have syntax highlighting on, it might briefly mis-highlight (e.g., it may not know what to do with the else block when it's briefly unpaired) but it will let you do it, and it will fix itself once the program is properly parseable again.

This intuition that syntax highlighting token streams already are the most generic "semantic" tool we have readily available, are very resilient to work-in-progress states, and are very fast (because we use them in real time in editors), led me to experimenting with a token-stream based diff tool. [1]

I got some really good results in my experiments with it. It gives you character-based diffs (as opposed to line-based diffs) better (more semantically meaningful) and faster than the other character-based diff tools I compared it to. You could probably use it as a diff tool with git projects today if you wanted, but it would mostly just be a UI toy as git is a snapshot-based rather than a patch-based source control system. (I explored the idea curious if it might be useful to patch-based darcs. Darcs kept exploring the idea of implementing a character-based patch format in addition to or in replacement of its line-based patch format, but so far as I saw never did; if it did, this tool would potentially be quite powerful there.) It's a neat toy/experiment though.

[1] https://github.com/WorldMaker/tokdiff
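For readers who want a feel for the idea, here is a minimal Python sketch of diffing token streams instead of lines. It is not how tokdiff works internally (that is not described in this thread); the regex tokenizer and the names `tokens` and `token_diff` are assumptions made for illustration, with the standard library's `difflib.SequenceMatcher` doing the comparison.

```python
import difflib
import re

# A deliberately crude lexer: words, runs of whitespace, or single
# punctuation characters. It never fails, even on unparseable input.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokens(text: str) -> list[str]:
    """Split text into tokens whose concatenation is the original text."""
    return TOKEN.findall(text)


def token_diff(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_text, new_text) for each changed token run."""
    a, b = tokens(old), tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return changes


# A rename shows up as one whole-token replacement, not as scattered
# character edits, and the broken snippet below diffs just as well.
print(token_diff("total = price * qty", "total = price * quantity"))
print(token_diff("if (x {", "if (y {"))
```

Because the lexer only looks at characters, it keeps working in exactly the work-in-progress states discussed above; a real highlighter's token stream would give better token boundaries for strings and comments.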

dragontamer · on July 21, 2020

> Whether source code is best represented as a linear document.

But source code is NOT represented as a linear document. At a minimum, source code is represented by dozens, hundreds, or thousands of files across a file system.

As such, source code on a filesystem is a highly-connected graph. Source code is hypertext: classes, functions, and data-structures take you to different files in your directory tree.

Take some async code you have, and tell me: how many documents / files do you have to read before you really know what the async code does from beginning to end? There's nothing linear about our code layout actually.

--------------

If your question is: "Is an individual code block best represented as a linear document" ?? Then I would argue yes. At a micro-level, code executes from the beginning, and ends at the bottom (except for calls and jumps). But calls / jumps are well represented in our "implicit graph" (function call to another "document" in our file system)

-------------

I guess there's an open question about loops: they do NOT progress from top-to-bottom, but are still represented in linear form. Maybe there's a new representation you can have for them. (But maybe that's why "recursive" calls are so powerful: a "recursive" call actually matches the graph-based indirection of our code).

jtwaleson · on July 21, 2020

Code IS represented to both the programmer and to the file system as text files. I think your point is that this does not do justice to the complexity of most code bases, and with that I completely agree. There should be much better ways of presenting the actual complexity to programmers.

XorNot · on July 21, 2020

You can extend this further: git is the wrong representation model for code (text).

It's a useful representation model, but really only because programming languages are mostly designed to be line-oriented (though not enough: i.e. Python should disallow multiple imports on one line because it lightly breaks diff-viewing). Git is the dumbest possible thing that can work, and it does work, but it's also wrong.

What we really need is a generic way to store graph-structured data (which is what an AST really is) into something like git. Because then we could finally get rid of the notion of code-formatting as anything but a user experience peculiarity (the rise of "push button and blackbox fixes code for commit" elucidates why this should be).

But more importantly, it would mean we could reasonably start representing commits with some notion of what they are doing to the control flow and structure of the code. Think git's rename detection but it notices when a function is renamed, or can represent refactoring in a diff by showing that all the control linkages of one block of code have moved to a particular function (and by extension, diffs would now implicitly show when new links or dependencies are added).

The trouble, of course, is that doing any of this generically is an unsolved problem. I have an idea that you could probably do something interesting like this with git's existing structure and a language like Golang, where you decompose the program into a bunch of files representing the hierarchical AST and commit that rather than actual text, maybe.

steerablesafe · on July 21, 2020

> What we really need is a generic way to store graph-structured data (which is what an AST really is) into something like git.

git can already store, diff and merge tree-structured data: directory trees. It would be an interesting experiment to encode various tree-structured data as a directory-tree in git and see how it behaves on different version control operations.
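A rough way to try this without touching git at all is to flatten an AST into path/value pairs, where each path could become a file inside nested directories. The Python sketch below does that for Python's own `ast` module; the path layout and the name `tree_paths` are invented for this example, not an existing convention.

```python
import ast


def tree_paths(node: ast.AST, prefix: str = ""):
    """Yield (path, value) pairs, one per leaf field of the AST.

    Each path looks like a file path, so the pairs could be written out
    as small files in a directory tree and committed.
    """
    name = type(node).__name__
    base = f"{prefix}/{name}" if prefix else name
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            yield from tree_paths(value, f"{base}/{field}")
        elif isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, ast.AST):
                    yield from tree_paths(item, f"{base}/{field}/{index}")
                else:
                    yield f"{base}/{field}/{index}", repr(item)
        else:
            yield f"{base}/{field}", repr(value)


before = dict(tree_paths(ast.parse("x = 1")))
after = dict(tree_paths(ast.parse("y = 1")))

# Renaming the variable changes exactly one "file" in the tree.
for path in before:
    if before[path] != after.get(path):
        print(path, before[path], "->", after.get(path))
```

One thing this makes visible quickly: list indices appear in the paths, so inserting a statement at the top of a function shifts every later path. A real experiment would need stable identifiers for siblings, and that is the hard part.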

saurik · on July 21, 2020

I mean, git doesn't actually track diffs, so the diff/merge part is separate, and already supports plugins: just write a syntax-oriented diff/merge for git and tell git in your configuration files to use it for .py files or whatever... this isn't a git limitation.

lstamour · on July 22, 2020

Git actually does ship with parsers for different programming languages, I believe, because when you do a diff, it captions the changed lines with the function call they're from, etc. That said I've also used third-party "semantic diff" tools (generally for Windows) that have integrations with git. It's really nice when Github can't merge something, but you can, easily, locally, with a semantic merge/diff tool called from git.

_y5hn · on July 21, 2020

Since golang includes a source file parser, it is possible to leverage those libs to reason about and automate stuff regarding source files. See "go fmt" for a start. Go source files can then just be a storage layer for higher-end purposes.

What is suggested in ancestor posts is to change storage format, although I think it better to prototype something like using the above first. Integrating into the object model directly, I suspect one would in the end want to design/evolve a language to build a platform specifically for that. You might end up with something similar to Smalltalk, so would need to think if that's the goal or if there's more to accomplish.

The higher order question is: What concerns need to be integrated, and what concerns need to be separated? Integration may bring new powers, but also the risk of evolving into a "big ball of mud". Decoupling may bring freedom and independence, but also the risk of a lack of coherency and suboptimal couplings.

What are the benefits of the current paradigm, and what are the disadvantages? CVS and text-files have worked for a very long time, are cross-platform and work beyond any single project scope. Maybe the golang-approach reaches some sort of optimal equilibrium.

jtwaleson · on July 21, 2020

You hit the nail on the head. I would love to create a PoC language/IDE/VCS that solves some of this. Code is very multi-dimensional and there are more ways to view the content than a text based file tree.

password4321 · on July 21, 2020

> notices when a function is renamed, or can represent refactoring in a diff

Maybe this tool is moving a step towards this goal: https://www.semanticmerge.com/

rahoulb · on July 21, 2020

Interesting that this has come up on the day when Pharo has also made it to the front page.

Smalltalk uses an image and holds live objects in memory at all times - there's no distinction between the source code representation and the running representation of the code. This allows the "IDE" (which is a misnomer because the development environment and the live environment are one and the same) to introspect on live objects and perform analyses on them.

TL;DR; Smalltalk has never used text as an internal representation and that removes the separation between the source code and the debug or live environment.

Gibbon1 · on July 21, 2020

Code as text had long bothered me because of my experience with various schematic capture and pcb layout tools. With those the design is stored in form that can be queried directly. And you can change object properties programmatically.

I feel like the insistence of text based programming languages just leaves the industry completely mired in the mid 1970's.

treeman79 · on July 21, 2020

I’ve had to deal with several graphical programming languages over the years.

They only work on the most basic of use cases. The moment you want to get real work done they just get in the way.

jtwaleson · on July 21, 2020

Very much disagree, and it completely depends on what you define as "real work". There are many low-code platforms out there that allow for very rapid development of big projects.

Disclaimer: I used to work for one of them (Mendix)

treeman79 · on July 21, 2020

Looks like a modern Visual Basic 6. Neat.

VB6 was so horrible and productive at the same time. Single person could build an app in days that a team of 10 would struggle with today.

And a group of 10 could bring said app to a screeching halt.

So many memories, good and bad.

jtwaleson · on July 21, 2020

Indeed :) I'd say the biggest problems with platforms like these are 1: meaningfully diffing (visual diffing is an unsolved problem) and 2: the enterprise pricing.

treeman79 · on July 21, 2020

Lock in as well.

Was at a company that invented a visual programming tool and forced development to use it.

Several flat out refused. We all went on with our careers. The people that used the proprietary system still work there, more than a decade later. They hated the tool then. But they have no marketable skill set at this point.

Gibbon1 · on July 21, 2020

You completely misunderstood my comment.

Edit: Stay classy HN

jtwaleson · on July 21, 2020

It puzzles me too. I think that portability and the ability to use the same text-based tools even though you use different languages are the best reasons for sticking with text.

Gibbon1 · on July 21, 2020

I think it is a feeling of loss of control and having no clue what you're missing. Programs stored as unstructured text make it difficult to produce good tooling. It makes programmatic code gen hard, makes diffing hard, and makes automatically generating commit messages impossible. Refactoring is always problematic. Merging even more so.

But it seems though even mentioning that perhaps structured data is a better way to store code makes most programmers really angry. At least that's my experience.

weltensturm · on July 21, 2020

There will never be anything that is as portable as text, and I think that is the sole reason we will never see any other widespread method of storing "code".

Imagine having to update to a new language version if the code is blobs, I feel sick already. And all the language vendor specific software you'd need just to do anything with it in general.

jtwaleson · on July 21, 2020

This might surprise you, but a huge amount of business logic is already stored in databases etc rather than in text based code repositories. There are lots of enterprises that have platforms (e.g. Salesforce) that allow you to build things without code.

piaste · on July 21, 2020

Databases can usually be exported as, and rebuilt from, some kind of plain text format - be it CSVs, JSON, DDL+DML scripts.

Of course it's less efficient than using the vendor's native dumps. But it's still important that the option exists, because it means that, in a worst case scenario, you can unfuck your data with text-based tools ranging from notepad to enterprise distributed buzzword ETL.

kevincox · on July 21, 2020

You can come up with a plain text format for any data. The more interesting question is the day to day use.

Right now most of us programmers generally work in an editor which does little over the text representation, and sometimes edit it directly as text. We also review code almost entirely as a diff of text. The only exception to this is that we break the text into multiple files and handle each somewhat separately.

This is in stark contrast to SAP or Salesforce where the primary interaction is with a specialized UI which treats the "code" more as a graph than text. And dumping to text is primarily used as a backup and is rarely interacted with in that form.

jtwaleson · on July 21, 2020

You can always figure a way to extract the logic, you will just need the runtime to execute it afterwards ;)

weltensturm · on July 21, 2020

These things are very vendor specific and proprietary, I was more thinking about general purpose programming languages.

IBorodin · on July 21, 2020

Also SAP stores a lot of business logic in the database.

dogfoods · on July 21, 2020

Yes, lots of people do things that are bad ideas.

jtwaleson · on July 21, 2020

Portability isn't the holy grail. To create working products, a lot of other things are important too.

infinite8s · on July 21, 2020

You do realize that text is just streams of binary data? It's just that our entire ecosystem/tooling has evolved to interpret continuous streams of that binary data as text documents.

feanaro · on July 21, 2020

There is no reason not to store the text as a canonical representation alongside an alternative AST representation. A valid AST can always be rendered as valid source code and valid source code can always be parsed back into an AST.

So if the AST format changes incompatibly, you simply need to let the compiler regenerate it from the source code.
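Python happens to ship both directions in its standard library, which makes the round trip easy to try. A small sketch, assuming Python 3.9 or later for `ast.unparse`; the helper names `roundtrip` and `same_ast` are made up here.

```python
import ast


def roundtrip(source: str) -> str:
    """Parse source into an AST and render the AST back to source text."""
    return ast.unparse(ast.parse(source))


def same_ast(a: str, b: str) -> bool:
    """True if two source texts parse to structurally identical ASTs."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


messy = "def  f( x ):\n    return x+1   # comment\n"
clean = roundtrip(messy)

print(clean)
print(same_ast(messy, clean))  # the structure survives the round trip
print("# comment" in clean)    # the comment does not
```

The regenerated text parses back to the same tree, but comments and the original layout are gone, since Python's AST does not record them. That is one reason to keep the text as the canonical copy and treat the AST as something derived from it.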

chriswarbo · on July 21, 2020

> there's only one syntax for the language(syntax extension and mixed-language documents aside) and it too follows a linear order

I'm of the opinion that programming languages should have two "official" representations: the usual text-based representation, and a machine-readable concrete syntax tree. The latter should be an existing format, e.g. JSON, XML, s-expressions, etc.

It should be "official" in the sense that it's standardised in the same way as the text format, language implementations (compilers/interpreters) should accept it alongside the text format, and they should have a mode which converts the text format into the machine-readable format (and optionally the other way). This is important, since lots of code 'in the wild' only makes sense after feeding it through particular preprocessors, specifically-configured build tools, sed-heavy shell scripts, etc. such that the only tool that can even parse the code correctly is the compiler/interpreter (and even that might need a bunch of tool-specific flags, env vars, config files, etc.!). This makes tooling much harder than it needs to be, and any "unofficial" workarounds will need constant work to keep up with changes to the language.

I say concrete syntax trees since we want to impose as little meaning as possible on the tokens, since that makes tooling more robust in the face of things like macros/custom syntax, new language features, incomplete or malformed code, etc.

mjbrusso · on July 21, 2020

Many Basic implementations had this feature. The SAVE command saves a binary (tokenized) version of the source code. "SAVE file, A" saves in ASCII format.
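As a toy illustration of what a tokenized SAVE does, here is a Python sketch that replaces keywords with single bytes and turns them back into text when listing. The byte values and the keyword table are hypothetical and do not match any particular BASIC; string literals are left untouched.

```python
import re

# Hypothetical one-byte codes; real BASICs each had their own tables.
KEYWORD_CODES = {"PRINT": 0x80, "GOTO": 0x81, "FOR": 0x82, "NEXT": 0x83}
CODE_KEYWORDS = {code: word for word, code in KEYWORD_CODES.items()}

# A quoted string, a run of letters, or a run of anything else.
PIECE = re.compile(r'"[^"]*"?|[A-Za-z]+|[^A-Za-z"]+')


def save_tokenized(line: str) -> bytes:
    """Encode one program line, replacing known keywords with one byte each."""
    out = bytearray()
    for piece in PIECE.findall(line):
        code = KEYWORD_CODES.get(piece.upper())
        if code is not None:
            out.append(code)
        else:
            out += piece.encode("ascii")
    return bytes(out)


def list_program(data: bytes) -> str:
    """Decode a tokenized line back to text, as LIST would."""
    return "".join(CODE_KEYWORDS.get(byte, chr(byte)) for byte in data)


line = '10 PRINT "GOTO HELL": GOTO 10'
stored = save_tokenized(line)
print(len(line), len(stored))        # the stored form is shorter
print(list_program(stored) == line)  # and lists back to the same text
```

Note that the GOTO inside the string literal is stored as plain characters, while the real GOTO becomes a single byte.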

rswail · on July 21, 2020

The SAVE command in old BASICs used to tokenize the individual BASIC statements and inbuilt functions, eg PRINT, GOTO, CHR$(). It could also tokenize line numbers. But it certainly didn't do things like tokenize a FOR/NEXT loop or anything that went beyond a line break (eg GOSUB/RETURN).

Just typing those words sends me back too many years to a TTY on a PDP11/10 and when you "saved" a program by:

1. Typing LIST but not hitting CR

2. Start the tape punch and press HERE-IS a few times to get a leader

3. Hitting CR

4. Waiting for the listing to finish

5. Hitting HERE-IS a few times to get a trailer

6. Folding the paper tape neatly :)

Const-me · on July 21, 2020

> unfamiliar, a bit clunky and alienating (most graphical languages)

IMO the largest issue with them is UX, specifically input part of that. Keyboard is very fast and very precise, but it's only good if you're working on plain text or something not too far from it.

waheoo · on July 21, 2020

And I hope it stays this way.

I'll leave the minority report hand waving for you youngins.

Give me a call when the brain implants are ready for prime time though.

falcolas · on July 21, 2020

An AST is directly representable via text - via the very syntax you use to write the ‘linear document’. What’s holding us back is not necessarily the text, it’s the editors which are (generally) only able to edit text, not an AST.

It should be possible (if not easy) to create a zoomable UI that lets you interact with code directly as well as zooming out to view and interact with the code structure in a meaningful way.

kevincox · on July 21, 2020

Maybe the problem isn't so much the text representation but the absence of a canonical representation. If there was a well-defined canonical representation for any program then it would be trivial to view and edit it in my preferred style without messing up the diffs in the repo.

Code formatters help, some more than others, but most are far from producing a canonical format. This means that I either need to edit in the project's preferred style or have my editor carefully track which sections I haven't modified and be sure to write them back exactly as they originally appeared.

solidasparagus · on July 21, 2020

See I wouldn't agree. I use JetBrains IDEs and I never feel as though I am working on a linear document. Between 'jump to definition', 'see usages', and various 'refactor' commands, it is easy to navigate and edit the tree structure.

duskwuff · on July 21, 2020

I don't buy it.

1. I personally find syntax highlighting incredibly helpful as a way of setting off syntactic elements, and incidentally as a way of calling attention to typos (like a missing quote or misspelled keyword). The author seems to dismiss this without any analysis.

2. Using color for syntax highlighting doesn't mean you can't also use color to highlight other useful things (like search results or errors), as the author suggests. Many editors which support syntax highlighting do exactly this!

3. The author's examples aren't particularly compelling. For example, using colors to denote levels of nesting will make code flash between colors as you edit it. Highlighting variables by type feels like a form of syntax highlighting, not something new, and most of the examples of highlighting code which meets specific conditions (variables which are assigned to multiple times, long functions, functions without documentation, etc) all feel like they could be summarized as "linter warnings/failures".
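To make the "this is really a linter" point concrete, here is a short Python sketch of one of those conditions, functions without documentation, using the standard `ast` module. The function name `undocumented_functions` is invented for this example; an editor could feed the returned line numbers to either a highlighter or a warnings panel.

```python
import ast


def undocumented_functions(source: str) -> list[tuple[str, int]]:
    """Return (name, line number) for every function lacking a docstring."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                found.append((node.name, node.lineno))
    return sorted(found, key=lambda item: item[1])


example = (
    "def documented():\n"
    '    """Explain things."""\n'
    "    return 1\n"
    "\n"
    "def bare():\n"
    "    return 2\n"
)
print(undocumented_functions(example))
```

Whether the result is drawn as a background colour or as a squiggle is a presentation choice layered on the same analysis.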

stevehine · on July 21, 2020

VS Code / Visual Studio both have a 'rainbow indent' extension which colours the indented space rather than the highlighted code. But it relies on the code being formatted correctly, rather than anything semantic in the language. It's still useful!

SAI_Peregrinus · on July 21, 2020

And the numerous rainbow bracket extensions for various editors, which color brackets and parentheses differently depending on nesting level.

em-bee · on July 21, 2020

> using colors to denote levels of nesting will make code flash between colors as you edit it.

why would that be a problem? the nesting wouldn't change with every keystroke, but only as you type the specific characters that define the nesting. and depending on your editor or your typing habit you may type opening and closing characters first before filling in the content, thus you wouldn't break the nesting structure either.

the pharo smalltalk code editor highlights nesting brackets and parentheses in different colors, and i don't see any flashing of colors.

mercer · on July 21, 2020

I think it's important to separate the various suggestions the author offers from the actual implementation.

Currently, when my cursor is over a bracket, the matching one is already highlighted. I'd love it if, in this situation, his suggestion to color the brackets would 'activate'.

Or when my cursor (or mouse) is on a variable, I'd like to see the highlighting by type.

I can also imagine having a 'highlighting' bar somewhere on screen where I can switch between different forms of highlighting. When I'm doing a code review, highlight based on interesting conditions (long functions, lack of documentation, etc.).

RHSeeger · on July 21, 2020

A lot of the things he mentions do exist in different places. For example, the parens https://www.emacswiki.org/emacs/RainbowDelimiters

gnicholas · on July 21, 2020

Some people are reacting to the (clickbait-y) title, which is a more extreme statement than what the author ends up advocating. The ultimate argument is that it's ok to syntax highlight sometimes, but sometimes it might be helpful to use color in other ways.

Personally, I find syntax highlighting to be very useful, but I also agree that using color in other ways can be more useful at times. As it happens, syntax highlighting was one of the inspirations for the color gradient text pattern I developed to make reading on screen easier. [1] The existence of syntax highlighting helped me to think about color in text as an information channel that can be used to make certain tasks more efficient, so I'm very grateful for that!

1: https://chrome.google.com/webstore/detail/beeline-reader/ifj...

mmmkkaaayy · on July 21, 2020

If you haven't before, I'd recommend giving Intellij's semantic highlight a look: https://blog.jetbrains.com/pycharm/2017/01/make-sense-of-you...

Every variable name is assigned its own color, so you can track the data dependencies in your function using color. Similar to what the author suggests as "Type Highlighting"

Some might consider it distracting/overstimulating, but I find it both useful and it makes the code look a bit more 'artistic'.

lstamour · on July 22, 2020

Many IDEs, IntelliJ and Visual Studio Code among them, also have plugins for "Rainbow brackets" (the ability to use different colours for different pairs of open and close brackets). WebStorm takes this a step farther and clearly highlights open and close tags in HTML, etc., if I recall correctly.

account42 · on July 21, 2020

KDevelop had this for a long time and I agree it is useful.

matharmin · on July 21, 2020

VSCode semantic highlighting is mentioned by the author, and it does support these types of use cases. It works by first doing normal syntax highlighting, then asynchronously getting the semantic info for semantic highlighting, since it's a much slower process.

The current language-specific implementations do use it to just augment the syntax highlighting, since that is what most people would expect. But out of the box it already supports some use cases the author mentioned, such as different highlighting for imports or constants. The full AST and type info can be used for this (depending on the language server), so pretty much anything is possible.

What I personally find very useful (and had to hack experimental support for) is highlighting of TypeScript async/Promise values and functions. This is something that a language server can identify easier than humans, and can make a big difference in coding.

adrianmonk · on July 21, 2020

Too much color ends up overwhelming. I can find the red circle super quick because it's the only red thing in a sea of black and white.

But I like the idea of going beyond syntax. Maybe a way to get the best of both worlds would be to have keyboard shortcuts that toggle cycle through different views. Press this shortcut to color-code types, press this other one to color-code error handling, etc.

Also, maybe other visual depictions might be useful. Maybe parentheses could start off very tall and get slightly shorter as you go a layer inward. Or add a 3D/depth shading effect so that each level of parens literally looks like a level. Or whatever else you can think of, the point being that it doesn't have to be color.

avianes · on July 21, 2020

> Too much color ends up overwhelming. I can find the red circle super quick because it's the only red thing in a sea of black and white.

> But I like the idea of going beyond syntax. Maybe a way to get the best of both worlds would be to have keyboard shortcuts that toggle cycle through different views.

I couldn't agree more. When a line has too many different colors I have a really hard time intuitively organizing the information. It's kind of a visual cacophony to me. So I use a minimalist syntax highlight to avoid that.

I believe we could come up with a more clever and interactive highlighting system.

A simple example could be rainbow parentheses which are only enabled when many parentheses are found in a chunk of code. (If it already exists, I'd be glad to know about it.)

heapslip · on July 21, 2020

I think too much syntax color is an easy way to shoot yourself in the foot as a developer. When everything is highlighted, and everything is important, nothing is important.

We need to choose what is important, I like to highlight language keywords as grey, function names as blue and numeric values / strings as purple / green. Everything else is white, and it works great.

tree-sitter[0] should end regex-based syntax highlighting, check this out for nvim: https://github.com/nvim-treesitter/nvim-treesitter

I really want to see tree-sitter used to power an awesome linter, that works across languages.

jayjader · on July 21, 2020

I find it interesting that your comment (to me) seems to caution against "too much" syntactic highlighting, while the comparison screenshots in the README for the project you mention seem to indicate an extensive use of colors (compared to "traditional" highlighting, as per the readme).

On a personal note I do prefer the 2nd screenshot to the first (i.e. I prefer tree-sitter's highlighting to whatever they used to generate the "traditional" image).

In the end I think it comes down to colors being useful to highlight _meaningful_ differences between code snippets, be it semantic, syntactic, or whichever other distinction is useful to the person reading/editing the code at that time. As mentioned elsewhere in these comments, in a language that distinguishes between synchronous and asynchronous functions (à la async/await in Typescript or Python) it could very well be useful to have a different color for each.

Then again, if you're "just" opening up your code files to edit some string "constants" here and there, there's almost no point in having any other highlighting (at that moment/for that action/activity) than "what is string, what is not?". So the holy grail is some kind of highlighting that is aware of your intent (as in, why you are viewing/editing this piece of code), and can adapt to changes in said intent.

heapslip · on July 22, 2020

Yes, I agree it's too much, but that comes from the editor, tree-sitter only provides the metadata (which is way more accurate than the regex-based approach). I'm on a color diet :) https://i.imgur.com/oqCgssd.png

saurik · on July 21, 2020

Interestingly, I strongly prefer the one on the left, as the things the one on the right is coloring extra don't really matter but calling attention to the %s in the formatted strings on the left is extremely useful.

waf · on July 21, 2020

Visual Studio does semantic highlighting for at least C# (they call it enhanced highlighting). Static classes, parameters, internal structure of regex strings, etc. Basically anything returned by the Roslyn Classification API.

I personally like it, but I think other people feel that syntax highlighting plus semantic highlighting leads to rainbow soup.

maxmouchet · on July 21, 2020

Semshi [1] does it for Python and I find it very useful for identifying errors such as missing imports, misspelled variables, etc.

[1] https://github.com/numirias/semshi
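As a rough illustration of the kind of check described above (flagging names that are read but never defined), here is a minimal sketch built on Python's standard `ast` module. It is not taken from Semshi, and `possibly_undefined_names` is a hypothetical helper. It assumes a single flat namespace and ignores scoping rules, so it will miss some cases and misreport others.

```python
import ast
import builtins


def possibly_undefined_names(source):
    """Return sorted (name, line) pairs for names read but never bound in the module.

    Assumption: one flat namespace. Real tools track scopes properly.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, (ast.Store, ast.Del)):
            bound.add(node.id)  # plain assignments, loop targets, etc.
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # function parameters
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return sorted(
        (node.id, node.lineno)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Load)
        and node.id not in bound
    )
```

An editor could use the returned line numbers to place an underline or a distinct color on each suspicious name, such as a misspelled `heigth` where `height` was declared.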

deadbunny · on July 21, 2020

Oooh, this is very nice. Thanks for sharing!

Animats · on July 21, 2020

Syntax highlighting shows something that's already visible. What could be highlighted that isn't already visible?

Undefined variables? Those really should get a squiggly underline, as with spell checkers. Some IDE probably does that already.

Anything more subtle?

A key point in language design and program tooling is detecting that thing A, which needs to be consistent with thing B far away in some other module, isn't. Most hard to find defects involve some distant relationship. What could be done with color or other automatic annotation to help?

greggman3 · on July 21, 2020

Undefined variables is already a thing in most IDEs? I know VSCode does it. I can't imagine the others don't.

One interesting thing I've found in JavaScript (and I'm guessing it would be useful in Python) is the CodeSpell plugin for VSCode

https://marketplace.visualstudio.com/items?itemName=streetsi...

It spell checks code by separating camelCaseWords and identifiers_with_underscores. Sure it's not "smart" but for dynamic languages, at least for me, it helps prevent bugs. In a static language the IDE would already have the info it needs to highlight a non-existent property, but in a dynamic language it generally doesn't, so I've found it quite useful.
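The splitting step is simple enough to sketch. The following is a minimal illustration, not the plugin's actual implementation: `split_identifier`, `misspelled_parts` and the tiny `KNOWN_WORDS` set are all hypothetical, and a real checker would load a full dictionary.

```python
import re

# Hypothetical word list; a real spell checker would use a full dictionary.
KNOWN_WORDS = {"get", "user", "name", "total", "count", "http", "request"}


def split_identifier(identifier):
    """Split camelCase, PascalCase and snake_case identifiers into lowercase words."""
    words = []
    # First break on underscores, digits and any non-word characters.
    for chunk in re.split(r"[_\W\d]+", identifier):
        # Then pick out acronym runs (HTTP), capitalised words (Request)
        # and plain lower-case runs (get).
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk)
    return [w.lower() for w in words]


def misspelled_parts(identifier, known=KNOWN_WORDS):
    """Return the parts of an identifier that are not in the word list."""
    return [w for w in split_identifier(identifier) if w not in known]
```

With this word list, `misspelled_parts("getUsrName")` flags only `usr`, which is the sort of near-miss that tends to slip through in a dynamic language.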

alternatetwo · on July 21, 2020

Visual Studio gives use-before-assignment code a green underline in C++.

em-bee · on July 21, 2020

the code editor in pharo smalltalk does that

jarofgreen · on July 21, 2020

A really good post, but problem 2 is the biggie here:

> What if something needs to be colored two things for two different reasons?

As another commenter points out, JetBrains already do a lot of the ideas this person has raised - but they do it in response to actions. Eg. Put the cursor next to one bracket, the matching bracket is highlighted.
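The bracket example is a good illustration of an action-driven highlight, because the editor only needs the cursor position to decide what to color. A minimal sketch, assuming plain text with no awareness of strings or comments (`matching_bracket` is a hypothetical helper, not any IDE's API):

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def matching_bracket(text, pos):
    """Return the index of the bracket matching text[pos], or None if there is none."""
    ch = text[pos]
    if ch in PAIRS:
        step, same, other = 1, ch, PAIRS[ch]      # scan forward from an opener
    elif ch in CLOSERS:
        step, same, other = -1, ch, CLOSERS[ch]   # scan backward from a closer
    else:
        return None  # cursor is not on a bracket
    depth = 0
    i = pos
    while 0 <= i < len(text):
        if text[i] == same:
            depth += 1
        elif text[i] == other:
            depth -= 1
            if depth == 0:
                return i
        i += step
    return None  # unbalanced
```

The editor would call this whenever the cursor moves and highlight both positions, then clear the highlight on the next move.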

I think the key to breaking this open is an editor that does different highlights at different times in response to user actions; the key will be getting the right user actions so this is seamless and easy to use.

Symbiote · on July 21, 2020

I don't think this is suggested: highlight all uses of a field in a class, or a variable in a method.

IntelliJ does this for the field/variable under the cursor, with a background colour and a margin colour. I'd like to see what it would be like with all variables in their individual colours.

recursivecaveat · on July 21, 2020

There are plugins for various editors using that kind of highlighting scheme if you'd like to try it. Usually it is called "semantic highlighting".

smt88 · on July 21, 2020

JetBrains IDEs do almost all of these already, mostly without plugins. Other IDEs probably do, too.

mdf · on July 21, 2020

Good points in this post, although I'd add the word "Only" in front of the headline.

IMO, syntax highlighting definitely has its uses, but why limit ourselves to only using colors for that? I'd like to see a future, where depending on whether I'm, for example:

- sketching the structure for a completely new program,

- implementing a new feature to an existing program,

- refactoring an existing program,

- debugging a problem in my program,

- writing tests, or

- trying to find something for reference

I'd be able to seamlessly switch between different highlighting schemes. I don't (yet) know what I'd prefer for each of these different modes of software development. The author of TFA lists some ideas for some different ways of doing semantic highlighting. Maybe over the years I'd grow to like some of them for some of these tasks.

JohnBooty · on July 21, 2020

I'd like to see a truly flexible code display system that treats things as a matrix.

As the author describes, you have a number of possible concepts we might like to visually denote:

- Language constructs (ie, traditional syntax highlighting)

- Parentheses/bracket pairs

- Context highlighting

- etc etc etc.

The author considers a single "information channel", text color.

There are other possible, largely orthogonal, information channels.

- Text background color

- Font weight (light/bold/regular)

- Font face

- Font italic / upright

- Font underlines (single, double, dashed, bold, squiggle, etc)

- Font underline color?

- Perhaps font effects (shadow, blur, etc)

- The editor gutter (Sublime and others expose this)

- Hover state (ie, tooltips)

I'd like a text display system that lets me mix and match inputs and outputs, matrix style. Perhaps I'd like to use font weights to denote one concept, and underlines for another.

Perhaps I'd like to use standard text-color choice for language constructs ala traditional syntax highlighting, and text background color for "context" or "semantic" highlighting (where each variable gets a unique color within its scope)
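To make the "matrix" idea concrete, here is a minimal sketch of a user-editable mapping from concepts to display channels. Every name in it (`CHANNEL_FOR_CONCEPT`, `style_for_token`, the concept and channel labels) is hypothetical and only illustrates the shape of such a configuration, not any existing editor's API.

```python
# Hypothetical concept -> channel assignment; the user would edit this table.
CHANNEL_FOR_CONCEPT = {
    "syntax": "color",
    "variable_identity": "background",
    "bracket_pair": "underline",
    "diagnostic": "gutter",
}


def style_for_token(annotations, mapping=CHANNEL_FOR_CONCEPT):
    """Combine per-concept annotations for one token into a single style dict."""
    style = {}
    for concept, value in annotations.items():
        channel = mapping.get(concept)
        if channel is None:
            continue  # this concept is switched off in the current configuration
        if channel in style:
            # Two concepts competing for one channel is the clash to avoid.
            raise ValueError(f"channel {channel!r} is assigned to two concepts")
        style[channel] = value
    return style
```

Because each concept gets its own channel, a token can carry a syntax color and a per-variable background at the same time, and swapping the mapping changes the presentation without touching the analysis that produced the annotations.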

It would most certainly be possible to create some real visual nightmares with this level of control, but applied judiciously, it could be very valuable.

heavenlyblue · on July 21, 2020

Written by an author who had never seen IntelliJ IDEA's highlighting framework.

Yes, it does syntax highlighting, but it also manages to mix it up with local variables, etc.

the_other · on July 21, 2020

I've got a muddled response to this.

On one hand, it sounds really helpful and exciting to have tools assist us with these problems e.g. I've read the "show nesting" idea before and wondered why we don't have it yet. Yet on the other hand, I think the post misses the mark in some ways.

Most IDEs and editors working with most typed languages already do more than colouring in types: they typically hint ahead of entering characters what type of thing should go in each place where a typed thing can go. I think this is more useful than colour, and we already have it.

Taking this further, if our editor has toggles for the categories listed, and thus if it has ASTs (or equivalent things) to work over for semantics, then it probably could do even more than highlighting. Some editors already help you refactor: renaming identifiers across a project, pulling functions out etc. It's not just colouring, it's active involvement with the code, its types and behaviours.

Further, the author deployed an overly simplistic view of colour and our visual perception. In my case, the "red" circle wasn't much easier to find because it was red, but because it was filled in. If it had the same colour as the rounded squares, and was filled in, I would have found it just as fast. The real trick in the grid examples is that some tool (in this case the author) had identified the significant information and done something about it, before the reader even got to apprehend the situation. Tooling which does this would absolutely be more powerful than tooling which doesn't. We already have tools which do some of these things. The drive shouldn't be "we need more colour?" the drive should be "what features are most helpful to coders?"

Konohamaru · on July 21, 2020

The author raised an interesting point with respect to syntax highlighting for types. I find the convention that literals be typed (ints, strings, etc...) to be inappropriate. That is very low-level and these should always be inferred.

Where typing should exist is at the level the author suggests, such as

* Iterators

* List-like structures

* Functions that return option types

and other more abstract structures.

minikomi · on July 21, 2020

I've experimented with using various "semantic depth" color schemes when writing clojure.

Depending on the code it can be useful.

Some I tried:

- https://github.com/istib/rainbow-blocks - rainbow blocks, the "OG"

- https://github.com/seanirby/rainbow-blocks-bg - rainbow blocks but background style

- https://github.com/alphapapa/prism.el - Prism, a slightly more "friendly" rainbow blocks implementation

perl4ever · on July 21, 2020

I find the unhighlighted circle to be about as quick to identify as the red one.

Also, I work for someone who is color blind. This whole attitude of "we must do something really important with color because it's so powerful" goes against the idea that it has to be optional.

Finnucane · on July 21, 2020

I am color blind, and some of the examples here just didn’t work for me. Even normal syntax highlighting is of limited value, and I need to tweak color schemes. Something like this could be useful as long as it didn’t get overwhelming (selectable context switching) and the user had some control over the colors used.

GlennS · on July 21, 2020

Not convinced. It's not exclusionary for people with good colour vision to want to use it effectively.

It is exclusionary not to provide good alternatives for people who don't have good colour vision (or various other disabilities).

jarofgreen · on July 21, 2020

> Also, I work for someone who is color blind.

JetBrains also does things with outlines, etc - a bit more limiting maybe but worth considering.

But yes, we must make sure we don't end up causing problems here - what about someone using a text-to-speech tool?

sradman · on July 21, 2020

The use cases demonstrated are compelling but there seems to be an underlying assumption that current syntax highlighting techniques are broken. After reading the article I'd argue that there is a case to be made for enhanced but temporary syntax highlighting modes where most highlighting classes are muted except the focus of your attention, such as imported package variables.

Consistency is important, both for style and syntax highlighting. It is hard to anticipate what a developer is mentally focused on as they scan code.

falcolas · on July 21, 2020

What’s amusing to me is how many suggestions around highlighting ignore the vast amounts of whitespace that surrounds every character. Borders (partial and full), underlines, strikethrough, background colors, font faces... so many options, yet we’re constantly limiting ourselves to the color of a letter and the occasional red squiggle.

The one valid excuse I can think of is “my editor doesn’t support it”, to which I say BS. VIM, in a terminal, can do most of those.

dragonwriter · on July 22, 2020

> Why aren’t things this way?

They...are. With proper extensions, for example, VSCode does many of those things, and can switch which of the supported things it is doing, on top of also doing syntax highlighting, and some of the changes are automatic and context-driven.

And while VSCode is my current daily driver, that's been true of code editors for quite some time.

raxxorrax · on July 21, 2020

This is somewhat true, although I think keywords should still be highlighted. I once wrote a plugin that highlighted all labels in different colors. Even with a good palette it was really bad... The scope or block highlights could be useful. Difficult for some languages probably, but I would give it a shot.

jnxx · on July 22, 2020

He somehow missed rainbow identifiers, which is quite useful to represent the data flow in a function:

https://www.emacswiki.org/emacs/ColorIdentifiersMode
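The general idea behind per-identifier coloring can be sketched by deriving a stable hue from each name. This is an assumption about one simple way to do it, not a description of how the Emacs mode linked above picks its colors; `identifier_color` is a hypothetical helper.

```python
import colorsys
import zlib


def identifier_color(name, saturation=0.5, value=0.9):
    """Map an identifier to a stable '#rrggbb' color derived from its name."""
    # crc32 gives the same result on every run, unlike the built-in hash() for strings.
    hue = (zlib.crc32(name.encode("utf-8")) % 360) / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

Since the color depends only on the name, every occurrence of a variable gets the same color, which is what lets the eye follow it through a function. Two different names can still land on similar hues, so this helps with tracing rather than guaranteeing uniqueness.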

mmgutz · on July 21, 2020

Egads ... that rainbow parentheses suggestion is horrific to my eyes. Most editors already do a good job of showing matched braces without shouting.

The problems he mentions are not tied to syntax. We need better CONTEXT aware highlighting is the gist of the article.

pragmatick · on July 21, 2020

I use https://plugins.jetbrains.com/plugin/10080-rainbow-brackets for IntelliJ to color matching brackets.

LandR · on July 21, 2020

I find this plugin invaluable for Clojure!

Already__Taken · on July 23, 2020

I don't know how to make it but I always wanted a vscode extension that sets the text background colour according to what scope you are in, for teaching. The rainbow vomit some beautifully indented student python would be...

erikbye · on July 21, 2020

This is awful, a mess. I prefer little to no syntax highlighting, just a grayscale theme with some bold and italics. Actual code clarity (naming, etc.) and structure (indentation, whitespace, etc.) is what is important.

Recently there was an article here about what makes a good photograph, one element was few colors, bringing focus to what matters. That is what code should look like, like a good photograph. Not like a rainbow exploded.

yesenadam · on July 21, 2020

It seems everyone prefers a different colouring to their syntax highlighting, which you get used to, and other peoples' looks weird! But code should look like what you personally want it to, surely. There's no one way. Sure, you could write a manifesto promoting your opinion to a prescription for everyone, but why? You don't have to look at theirs, they don't have to look at yours. Yes, maybe for code sharing, low- or no-colour is more like a common denominator, and not strange/disorienting for anyone.

I find one of the main benefits of syntax highlighting is it's less tiring on the eyes. And once you get used to what your colours mean, much faster to read. Having it look attractive as well is just a side-benefit, though not a small one, if you spend a lot of your life looking at code!

reportgunner · on July 21, 2020

I don't get it. Author is a programmer, why doesn't he just develop this? I don't need this and I don't want it bloating my text editor.

gsliepen · on July 21, 2020

Syntax highlighting is very useful, if not then it wouldn't be so commonplace. The other highlighting methods are also useful. But there is not one objective best highlighting method, it actually changes from moment to moment depending on what you are doing. And many editors in fact already change how they highlight things based on the context! For example, when I search for something, my editor highlights all matches. If my cursor is on an opening bracket, it will start highlighting the matching closing bracket.

Some of the highlighting methods shown in the article also seem a bit useless to me. Assigning colors to indentation levels? Surely the indentation itself is a huge visual clue already that doesn't need colors to be wasted. The rainbow parentheses example also shows that there are limits to how many obviously distinct colors you can actually have on screen at the same time, and the colors don't uniquely show nesting levels.
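The non-uniqueness follows directly from how depth-based coloring is usually done: with a finite palette, the color index has to wrap around. A minimal sketch, with a hypothetical `paren_colors` helper and an illustrative palette, ignoring strings and comments:

```python
PALETTE = ["red", "orange", "yellow", "green", "blue", "purple"]  # illustrative only


def paren_colors(text, palette=PALETTE):
    """Return (index, char, color) for each parenthesis, colored by nesting depth."""
    result = []
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            # The modulo is where distinct depths start sharing a color.
            result.append((i, ch, palette[depth % len(palette)]))
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate stray closing parentheses
            result.append((i, ch, palette[depth % len(palette)]))
    return result
```

With a six-color palette, depth 0 and depth 6 look identical, and shrinking the palette to colors that are easy to tell apart makes the collisions happen sooner.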

rbanffy · on July 21, 2020

> Surely the indentation itself is a huge visual clue

Apple had a security bug that stemmed from indented C code that was indented but not within brackets. The second line of an if was not under the condition, but was indented the same as the first line.

I myself prefer to use type to convey structure and color to convey semantics. Having keywords boldfaced helps structure to stand out while coloring parameters, locals and globals differently would help meaning to surface more evidently.

alblue · on July 21, 2020

For what it’s worth, it was introduced through a failure of an automated merge process and not direct programmer entry. Had braces been used the error would likely not have been caused (or would have caused compilation errors) but the fact that it appeared indented to the same level is not because a programmer made a mistake.

It is perhaps more telling that the code was compiled without an error indicating dead code which was the net effect of this merge, but sadly warning free code isn’t a goal.

rbanffy · on July 21, 2020

True, but just by looking at the code it looked right. An autoformat tool would have made the mistake more visible. I wonder if popular linters would pick it up (not only the lack of braces as a rule, but the misleading indentation)

gsliepen · on July 21, 2020

GCC version 6 and later actually warn about misleading indentation if you use -Wall (which you should), and you can turn on -Werror to ensure you don't forget to fix the mistakes.

rbanffy · on July 21, 2020

One life goal is that all my software can compile without any warnings.

gsliepen · on July 21, 2020

> Apple had a security bug that stemmed from indented C code that was indented but not within brackets.

Sure, but if your editor can correctly color based on nesting it could very well fix the indentation to be correct as well.

> I myself prefer to use type to convey structure and color to convey semantics. Having keywords boldfaced helps structure to stand out while coloring parameters, locals and globals differently would help meaning to surface more evidently.

That makes sense.

rbanffy · on July 21, 2020

> if your editor can correctly color based on nesting it could very well fix the indentation to be correct as well.

One more reason to run an autoformat tool on commit. ;-)