Because historically, that was true for the obvious reasons of pragmatism; you could put it on a punchcard, print it to paper, save it to a file all using the same character encoding and rendering methods, and when we went to interactive editing that didn't change: half of the joy of plaintext is that you can select, copy, paste, search and modify in mostly standard ways across many tools.
Syntax highlighting is incrementally in line with this paradigm since there's only one syntax for the language (syntax extension and mixed-language documents aside) and it too follows a linear order, although it also recognizes that an AST exists. But as you get into stuff like code folding, Intellisense and other IDE-like features, deviations emerge. More of the selections deal with symbolic meanings (declarations, types) and not really the tokens or the characters they consist of, and only incidentally the delimiters of the AST.
But - when we try to fully re-envision it and express these other concepts directly we tend to end up with something that is unfamiliar, a bit clunky and alienating (most graphical languages). So the accepted developments always seem to come in these little ideas that are mostly-optional add-ons.
Chuck Moore did use semantic color as a language feature in ColorForth, but his specific goals lean towards a holistic bootstrapping minimalism, so I haven't seen it taken up elsewhere.
Compare with, say, a WYSIWYG rich text / HTML editor. If you want to reorganize a bulleted list, it's many more steps, because the tool isn't really set up to accommodate a momentarily-invalid state. I think that's the big difference between syntax-highlighting something that fundamentally remains text and switching the representation to not-text.
This also lets you do various sorts of text manipulation like conflicted merges. There are lots of arguments for an AST-aware merge, but in case of a conflict, a text-based merge system that inserts standard conflict markers will still usually leave you with a file that parses well enough to be syntax-highlighted, even if it won't compile with the conflict markers still in place. Or even imagine converting a program from one language to another (e.g., a shell script that outgrew shell). You can stick the invalid code in your text editor, ignore the highlighting, and turn it line-by-line into the new language.
I think all the highlighting examples in this article work as overlays, just like syntax highlighting, so they'd work acceptably well in these briefly-invalid states. In some cases it'll fail to highlight or it will need aggressive error-recovery that would be dubious in an actual compiler or interpreter (imagine, say, "if you see a conflict marker, skip the <<< portion and parse the >>> portion to find what locals exist and what their types are, then go back and try to highlight the <<< portion"), but since the highlighter isn't the real interpreter, that's fine.
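To make that recovery trick concrete, here's a rough Python sketch. It assumes git's standard two-way conflict markers (no diff3-style `|||||||` section), and `split_conflicts` is a name I made up for illustration. It only separates the two sides so that a highlighter could parse one side for locals and types, then reuse that information on the other:

```python
def split_conflicts(text: str) -> tuple[str, str]:
    """Return (ours, theirs): the file as each side of the merge would see it.

    Assumes plain `<<<<<<<` / `=======` / `>>>>>>>` markers at line starts.
    """
    ours, theirs = [], []
    side = None  # None outside a conflict, "ours" or "theirs" inside one
    for line in text.splitlines(keepends=True):
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
        else:
            # Unconflicted lines belong to both versions.
            ours.append(line)
            theirs.append(line)
    return "".join(ours), "".join(theirs)
```

A real highlighter would still have to map positions in either side back to the conflicted buffer, which this sketch doesn't attempt.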
I've struggled to explain to students, teachers and others my frustrations with anything that isn't plaintext code. This is it - thank you!
I think it's also a neat learning concept about why it's important one is _able_ to make mistakes when writing code. So many are overwhelmed by the flexibility or fragility of syntaxes but there's actually a lot of power in that.
> If you have syntax highlighting on, it might briefly mis-highlight (e.g., it may not know what to do with the else block when it's briefly unpaired) but it will let you do it, and it will fix itself once the program is properly parseable again.
This intuition that syntax highlighting token streams already are the most generic "semantic" tool we have readily available, are very resilient to work-in-progress states, and are very fast (because we use them in real time in editors), led me to experimenting with a token-stream based diff tool. 
I got some really good results in my experiments with it. It gives you character-based diffs (as opposed to line-based diffs) that are better (more semantically meaningful) and faster than the other character-based diff tools I compared it to. You could probably use it as a diff tool with git projects today if you wanted, but it would mostly just be a UI toy, as git is a snapshot-based rather than a patch-based source control system. (I explored the idea because I was curious whether it might be useful to patch-based darcs. Darcs kept exploring the idea of implementing a character-based patch format in addition to, or in replacement of, its line-based patch format, but so far as I saw never did. If it did, this tool would potentially be quite powerful there.) It's a neat toy/experiment though.
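For anyone wondering what a token-stream diff looks like in practice, here's a minimal Python sketch of the general idea, not the tool described above. It assumes a crude regex tokenizer as a stand-in for a real syntax highlighter's token stream, and `token_diff` is a hypothetical name. The diffing itself is the standard library's `difflib.SequenceMatcher` run over tokens instead of lines:

```python
import difflib
import re

# Crude stand-in for a highlighter's tokenizer: words, whitespace runs,
# and single punctuation characters. Joining the tokens restores the text.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    return TOKEN.findall(text)


def token_diff(old, new):
    """Return (tag, old_text, new_text) for every changed run of tokens."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return changes
```

Because the unit of comparison is a token, a renamed identifier shows up as one replaced word rather than a run of changed characters or a whole changed line.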
But source code is NOT represented as a linear document. At a minimum, source code is represented by dozens, hundreds, or thousands of files across a file system.
As such, source code on a filesystem is a highly-connected graph. Source code is hypertext: classes, functions, and data-structures take you to different files in your directory tree.
Take some async code you have, and tell me: how many documents / files do you have to read before you really know what the async code does from beginning to end? There's nothing linear about our code layout actually.
If your question is "Is an individual code block best represented as a linear document?", then I would argue yes. At a micro-level, code executes from the top and ends at the bottom (except for calls and jumps). But calls / jumps are well represented in our "implicit graph" (a function call to another "document" in our file system).
I guess there's an open question about loops: they do NOT progress from top-to-bottom, but are still represented in linear form. Maybe there's a new representation you can have for them. (But maybe that's why "recursive" calls are so powerful: a "recursive" call actually matches the graph-based indirection of our code).
It's a useful representation model, but really only because programming languages are mostly designed to be line-oriented (though not enough: e.g. Python should disallow multiple imports on one line because it lightly breaks diff-viewing). Git is the dumbest possible thing that can work, and it does work, but it's also wrong.
What we really need is a generic way to store graph-structured data (which is what an AST really is) into something like git. Because then we could finally get rid of the notion of code-formatting as anything but a user experience peculiarity (the rise of "push button and blackbox fixes code for commit" elucidates why this should be).
But more importantly, it would mean we could reasonably start representing commits with some notion of what they are doing to the control flow and structure of the code. Think git's rename detection but it notices when a function is renamed, or can represent refactoring in a diff by showing that all the control linkages of one block of code have moved to a particular function (and by extension, diffs would now implicitly show when new links or dependencies are added).
The trouble, of course, is that doing any of this generically is an unsolved problem. I have an idea that you could probably do something interesting like this with git's existing structure and a language like Golang, where you decompose the program into a bunch of files representing the hierarchical AST and commit that rather than the actual text, maybe.
git can already store, diff and merge tree-structured data: directory trees. It would be an interesting experiment to encode various tree-structured data as a directory-tree in git and see how it behaves on different version control operations.
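As a starting point for that experiment, here's a small Python sketch. It uses Python's own `ast` module rather than Go, only goes down to the level of functions and classes, and the path scheme and the name `explode` are assumptions of mine. It maps each definition to a path so the result could be written out as a directory tree and committed:

```python
import ast

DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def explode(source: str, root: str = "module") -> dict[str, str]:
    """Map hypothetical file paths to the source text of each definition.

    Classes become directories and functions/methods become files.
    Code outside any definition is ignored in this sketch.
    """
    files = {}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, DEFS):
                continue
            path = f"{prefix}/{child.name}"
            if isinstance(child, ast.ClassDef):
                visit(child, path)
            else:
                files[path + ".py"] = ast.get_source_segment(source, child)

    visit(ast.parse(source), root)
    return files
```

With a layout like this, moving a method between classes would show up to git as a file rename, which is the kind of behaviour the experiment would be probing.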
What is suggested in ancestor posts is to change storage format, although I think it better to prototype something like using the above first. Integrating into the object model directly, I suspect one would in the end want to design/evolve a language to build a platform specifically for that. You might end up with something similar to Smalltalk, so would need to think if that's the goal or if there's more to accomplish.
The higher order question is: What concerns need to be integrated, and what concerns need to be separated?
Integration may bring new powers, but also risks of evolutions into "big ball of mud".
Decoupling may bring freedom and independence, but also risks a lack of coherence and suboptimal couplings.
What are the benefits of the current paradigm, and what are the disadvantages? CVS and text files have worked for a very long time, are cross-platform and work beyond any single project scope. Maybe the golang-approach reaches some sort of optimal equilibrium.
Maybe this tool is moving a step towards this goal: https://www.semanticmerge.com/
Smalltalk uses an image and holds live objects in memory at all times - there's no distinction between the source code representation and the running representation of the code. This allows the "IDE" (which is a misnomer because the development environment and the live environment are one and the same) to introspect on live objects and perform analyses on them.
TL;DR: Smalltalk has never used text as an internal representation and that removes the separation between the source code and the debug or live environment.
I feel like the insistence of text based programming languages just leaves the industry completely mired in the mid 1970's.
They only work on the most basic of use cases.
The moment you want to get real work done they just get in the way.
Disclaimer: I used to work for one of them (Mendix)
VB6 was so horrible and productive at the same time. A single person could build an app in days that a team of 10 would struggle with today.
And a group of 10 could bring said app to a screeching halt.
So many memories, good and bad.
Was at a company that invented a visual programming tool and forced development to use it.
Several flat out refused. We all went on with our careers. The people that used the proprietary system still work there, more than a decade later. They hated the tool then. But they have no marketable skill set at this point.
Edit: Stay classy HN
But it seems though even mentioning that perhaps structured data is a better way to store code makes most programmers really angry. At least that's my experience.
Imagine having to update to a new language version if the code is blobs, I feel sick already. And all the language vendor specific software you'd need just to do anything with it in general.
Of course it's less efficient than using the vendor's native dumps. But it's still important that the option exists, because it means that, in a worst case scenario, you can unfuck your data with text-based tools ranging from notepad to enterprise distributed buzzword ETL.
Right now most of us programmers generally work in an editor which does little over the text representation, and sometimes edit the text directly. We also review code almost entirely as a diff of text. The only exception to this is that we break the text into multiple files and handle each somewhat separately.
This is in stark contrast to SAP or Salesforce, where the primary interaction is with a specialized UI which treats the "code" more as a graph than text. And dumping to text is primarily used as a backup and is rarely interacted with in that form.
So if the AST format changes incompatibly, you simply need to let the compiler regenerate it from the source code.
I'm of the opinion that programming languages should have two "official" representations: the usual text-based representation, and a machine-readable concrete syntax tree. The latter should be an existing format, e.g. JSON, XML, s-expressions, etc.
It should be "official" in the sense that it's standardised in the same way as the text format, language implementations (compilers/interpreters) should accept it alongside the text format, and they should have a mode which converts the text format into the machine-readable format (and optionally the other way). This is important, since lots of code 'in the wild' only makes sense after feeding it through particular preprocessors, specifically-configured build tools, sed-heavy shell scripts, etc. such that the only tool that can even parse the code correctly is the compiler/interpreter (and even that might need a bunch of tool-specific flags, env vars, config files, etc.!). This makes tooling much harder than it needs to be, and any "unofficial" workarounds will need constant work to keep up with changes to the language.
I say concrete syntax trees since we want to impose as little meaning as possible on the tokens, since that makes tooling more robust in the face of things like macros/custom syntax, new language features, incomplete or malformed code, etc.
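To give a feel for the "compiler emits a machine-readable tree" mode, here's a short Python sketch using the standard `ast` module and JSON. Note the caveat: `ast` yields an abstract tree, so comments and formatting are dropped, which is exactly what a concrete syntax tree would keep. The JSON shape and the name `to_json` are my own invention, not any standard:

```python
import ast
import json


def to_json(node):
    """Recursively convert an ast node into plain dicts, lists and scalars."""
    if isinstance(node, ast.AST):
        out = {"type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            out[field] = to_json(value)
        return out
    if isinstance(node, list):
        return [to_json(item) for item in node]
    return node  # identifiers, constants, None


def dump(source: str) -> str:
    # default=repr covers constants JSON cannot encode (bytes, complex, ...).
    return json.dumps(to_json(ast.parse(source)), default=repr)
```

Other tools could then consume the output without reimplementing the parser, which is the point being argued for.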
Just typing those words sends me back too many years to a TTY on a PDP11/10 and when you "saved" a program by:
1. Typing LIST but not hitting CR
2. Start the tape punch and press HERE-IS a few times to get a leader
3. Hitting CR
4. Waiting for the listing to finish
5. Hitting HERE-IS a few times to get a trailer
6. Folding the paper tape neatly :)
IMO the largest issue with them is UX, specifically the input part of that. The keyboard is very fast and very precise, but it's only good if you're working on plain text or something not too far from it.
I'll leave the Minority Report hand waving for you youngins.
Give me a call when the brain implants are ready for prime time though.
It should be possible (if not easy) to create a zoomable UI that lets you interact with code directly as well as zooming out to view and interact with the code structure in a meaningful way.
Code formatters help, some more than others, but most are far from producing a canonical format. This means that I either need to edit in the project's preferred style or have my editor carefully track which sections I haven't modified and be sure to write them back exactly as they originally appeared.
1. I personally find syntax highlighting incredibly helpful as a way of setting off syntactic elements, and incidentally as a way of calling attention to typos (like a missing quote or misspelled keyword). The author seems to dismiss this without any analysis.
2. Using color for syntax highlighting doesn't mean you can't also use color to highlight other useful things (like search results or errors), as the author suggests. Many editors which support syntax highlighting do exactly this!
3. The author's examples aren't particularly compelling. For example, using colors to denote levels of nesting will make code flash between colors as you edit it. Highlighting variables by type feels like a form of syntax highlighting, not something new, and most of the examples of highlighting code which meets specific conditions (variables which are assigned to multiple times, long functions, functions without documentation, etc) all feel like they could be summarized as "linter warnings/failures".
why would that be a problem? the nesting wouldn't change with every keystroke, but only as you type the specific characters that define the nesting. and depending on your editor or your typing habit you may type opening and closing characters first before filling in the content, thus you wouldn't break the nesting structure either.
the pharo smalltalk code editor highlights nesting brackets and parentheses in different colors, and i don't see any flashing of colors.
Currently, when my cursor is over a bracket, the matching one is already highlighted. I'd love it when in this situation his suggestion to color the brackets would 'activate'.
Or when my cursor (or mouse) is on a variable, I'd like to see the highlighting by type.
I can also imagine having a 'highlighting' bar somewhere on screen where I can switch between different forms of highlighting. When I'm doing a code review, highlight based on interesting conditions (long functions, lack of documentation, etc.).
Personally, I find syntax highlighting to be very useful, but I also agree that using color in other ways can be more useful at times. As it happens, syntax highlighting was one of the inspirations for the color gradient text pattern I developed to make reading on screen easier.  The existence of syntax highlighting helped me to think about color in text as an information channel that can be used to make certain tasks more efficient, so I'm very grateful for that!
Every variable name is assigned its own color, so you can track the data dependencies in your function using color. Similar to what the author suggests as "Type Highlighting"
Some might consider it distracting/overstimulating, but I find it both useful and it makes the code look a bit more 'artistic'.
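The core of that per-variable coloring can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular editor or plugin does it. The hashing scheme, the fixed lightness and saturation values, and the name `color_for` are all assumptions:

```python
import colorsys
import hashlib


def color_for(name: str) -> str:
    """Derive a stable '#rrggbb' color from an identifier.

    The hue comes from a hash of the name, so the same variable always
    gets the same color. Lightness and saturation are fixed (arbitrary
    values here) to keep every color readable on a dark background.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 65536
    r, g, b = colorsys.hls_to_rgb(hue, 0.6, 0.7)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )
```

Hashing keeps colors stable across files and sessions, at the cost that two unrelated names can land on similar hues.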
The current language-specific implementations do use it to just augment the syntax highlighting, since that is what most people would expect. But out of the box it already supports some use cases the author mentioned, such as different highlighting for imports or constants. The full AST and type info can be used for this (depending on the language server), so pretty much anything is possible.
What I personally find very useful (and had to hack experimental support for) is highlighting of TypeScript async/Promise values and functions. This is something that a language server can identify easier than humans, and can make a big difference in coding.
But I like the idea of going beyond syntax. Maybe a way to get the best of both worlds would be to have keyboard shortcuts that cycle through different views. Press this shortcut to color-code types, press this other one to color-code error handling, etc.
Also, maybe other visual depictions might be useful. Maybe parentheses could start off very tall and get slightly shorter as you go a layer inward. Or add a 3D/depth shading effect so that each level of parens literally looks like a level. Or whatever else you can think of, the point being that it doesn't have to be color.
> But I like the idea of going beyond syntax. Maybe a way to get the best of both worlds would be to have keyboard shortcuts that cycle through different views.
I couldn't agree more.
When a line has too many different colors I have a really hard time intuitively organizing the information. It's kind of a visual cacophony to me. So I use a minimalist syntax highlight to avoid that.
I believe we could come up with a more clever and interactive highlighting system.
A simple example could be rainbow parentheses which are only enabled when many parentheses are found in a chunk of code.
(if it already exists, I'd be glad to know about it)
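The "only when there are many parentheses" rule is easy to prototype. Here's a naive Python sketch. It ignores strings and comments, the threshold of three levels is arbitrary, and both function names are made up:

```python
def bracket_depths(code):
    """Return (index, char, depth) for every bracket, outermost depth 0.

    Naive: brackets inside strings or comments are counted too.
    """
    depth = 0
    out = []
    for i, ch in enumerate(code):
        if ch in "([{":
            out.append((i, ch, depth))
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # tolerate unbalanced input
            out.append((i, ch, depth))
    return out


def should_rainbow(code, min_levels=3):
    """Enable rainbow coloring only when nesting reaches min_levels."""
    depths = [d for _, _, d in bracket_depths(code)]
    return bool(depths) and max(depths) + 1 >= min_levels
```

The depth from `bracket_depths` would then index into a color palette, and `should_rainbow` decides per chunk whether to bother.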
We need to choose what is important, I like to highlight language keywords as grey, function names as blue and numeric values / strings as purple / green. Everything else is white, and it works great.
tree-sitter should end regex-based syntax highlighting, check this out for nvim: https://github.com/nvim-treesitter/nvim-treesitter
I really want to see tree-sitter used to power an awesome linter, that works across languages.
On a personal note I do prefer the 2nd screenshot to the first (i.e. I prefer tree-sitter's highlighting to whatever they used to generate the "traditional" image).
In the end I think it comes down to colors being useful to highlight _meaningful_ differences between code snippets, be it semantic, syntactic, or whichever other distinction is useful to the person reading/editing the code at that time. As mentioned elsewhere in these comments, in a language that distinguishes between synchronous and asynchronous functions (à la async/await in TypeScript or Python) it could very well be useful to have a different color for each.
Then again, if you're "just" opening up your code files to edit some string "constants" here and there, there's almost no point in having any other highlighting (at that moment/for that action/activity) than "what is string, what is not?". So the holy grail is some kind of highlighting that is aware of your intent (as in, why you are viewing/editing this piece of code), and can adapt to changes in said intent.
I personally like it, but I think other people feel that syntax highlighting plus semantic highlighting leads to rainbow soup.
Undefined variables? Those really should get a squiggly underline, as with spell checkers. Some IDE probably does that already.
Anything more subtle?
A key point in language design and program tooling is detecting that thing A, which needs to be consistent with thing B far away in some other module, isn't. Most hard to find defects involve some distant relationship. What could be done with color or other automatic annotation to help?
It spell checks code by separating camelCaseWords and identifiers_with_underscores. Sure it's not "smart" but for dynamic languages, at least for me, it helps prevent bugs. In a static language the IDE would already have the info it needs to highlight a non-existent property, but in a dynamic language generally not, so I've found it quite useful.
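The splitting step is simple enough to sketch in Python. This is just my guess at the general approach, not that tool's actual code. The regex, the function names, and the toy dictionary in the usage below are all assumptions:

```python
import re

# Runs of capitals (acronyms), capitalized or lowercase words, and digits.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase words."""
    return [w.lower() for w in WORD.findall(name)]


def misspelled(name, dictionary):
    """Return the parts of an identifier not found in the dictionary."""
    return [
        w for w in split_identifier(name)
        if w.isalpha() and w not in dictionary
    ]
```

For example, `misspelled("recieveMessage", {"receive", "message"})` flags `recieve`, which is the kind of typo that otherwise silently becomes a new property name in a dynamic language.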
> What if something needs to be colored two things for two different reasons?
As another commenter points out, JetBrains already do a lot of the ideas this person has raised - but they do it in response to actions. Eg. Put the cursor next to one bracket, the matching bracket is highlighted.
I think the key to breaking this open is an editor that does different highlights at different times in response to user actions; the key will be getting the right user actions so this is seamless and easy to use.
IntelliJ does this for the field/variable under the cursor, with a background colour and a margin colour. I'd like to see what it would be like with all variables in their individual colours.
IMO, syntax highlighting definitely has its uses, but why limit ourselves to only using colors for that? I'd like to see a future, where depending on whether I'm, for example:
- sketching the structure for a completely new program,
- implementing a new feature to an existing program,
- refactoring an existing program,
- debugging a problem in my program,
- writing tests, or
- trying to find something for reference
I'd be able to seamlessly switch between different highlighting schemes. I don't (yet) know what I'd prefer for each of these different modes of software development. The author of TFA lists some ideas for some different ways of doing semantic highlighting. Maybe over the years I'd grow to like some of them for some of these tasks.
As the author describes, you have a number of possible concepts we might like to visually denote:
- Language constructs (ie, traditional syntax highlighting)
- Parentheses/bracket pairs
- Context highlighting
- etc etc etc.
The author considers a single "information channel", text color.
There are other possible, largely orthogonal, information channels.
- Text background color
- Font weight (light/bold/regular)
- Font face
- Font italic / upright
- Font underlines (single, double, dashed, bold, squiggle, etc)
- Font underline color?
- Perhaps font effects (shadow, blur, etc)
- The editor gutter (Sublime and others expose this)
- Hover state (ie, tooltips)
I'd like a text display system that lets me mix and match inputs and outputs, matrix style. Perhaps I'd like to use font weights to denote one concept, and underlines for another.
Perhaps I'd like to use standard text-color choice for language constructs ala traditional syntax highlighting, and text background color for "context" or "semantic" highlighting (where each variable gets a unique color within its scope)
It would most certainly be possible to create some real visual nightmares with this level of control, but applied judiciously, it could be very valuable.
Yes, it does syntax highlighting, but it also manages to mix it up with local variables, etc.
On one hand, it sounds really helpful and exciting to have tools assist us with these problems e.g. I've read the "show nesting" idea before and wondered why we don't have it yet. Yet on the other hand, I think the post misses the mark in some ways.
Most IDEs and editors working with most typed languages already do more than colouring in types: they typically hint ahead of entering characters what type of thing should go in each place where a typed thing can go. I think this is more useful than colour, and we already have it.
Taking this further, if our editor has toggles for the categories listed, and thus if it has ASTs (or equivalent things) to work over for semantics, then it probably could do even more than highlighting. Some editors already help you refactor: renaming identifiers across a project, pulling functions out etc. It's not just colouring, it's active involvement with the code, its types and behaviours.
Further, the author deployed an overly simplistic view of colour and our visual perception. In my case, the "red" circle wasn't much easier to find because it was red, but because it was filled in. If it had the same colour as the rounded squares, and was filled in, I would have found it just as fast. The real trick in the grid examples is that some tool (in this case the author) had identified the significant information and done something about it, before the reader even got to apprehend the situation. Tooling which does this would absolutely be more powerful than tooling which doesn't. We already have tools which do some of these things. The drive shouldn't be "we need more colour?" the drive should be "what features are most helpful to coders?"
Where typing should exist is at the level the author suggests, such as
* List-like structures
* Functions that return option types
and other more abstract structures.
Depending on the code it can be useful.
Some I tried:
- https://github.com/istib/rainbow-blocks - rainbow blocks, the "OG"
- https://github.com/seanirby/rainbow-blocks-bg - rainbow blocks but background style
- https://github.com/alphapapa/prism.el - Prism, a slightly more "friendly" rainbow blocks implementation
Also, I work for someone who is color blind. This whole attitude of "we must do something really important with color because it's so powerful" goes against the idea that it has to be optional.
It is exclusionary not to provide good alternatives for people who don't have good colour vision (or various other disabilities).
JetBrains also does things with outlines, etc - a bit more limiting maybe but worth considering.
But yes, we must make sure we don't end up causing problems here - what about someone using a text-to-speech tool?
Consistency is important, both for style and syntax highlighting. It is hard to anticipate what a developer is mentally focused on as they scan code.
The one valid excuse I can think of is “my editor doesn’t support it”, to which I say BS. VIM, in a terminal, can do most of those.
They...are. With proper extensions, for example, VSCode does many of those things, and can switch which of the supported things it is doing, on top of also doing syntax highlighting, and some of the changes are automatic and context-driven.
And while VSCode is my current daily driver, that's been true of code editors for quite some time.
The problems he mentions are not tied to syntax. We need better CONTEXT aware highlighting is the gist of the article.
Recently there was an article here about what makes a good photograph, one element was few colors, bringing focus to what matters. That is what code should look like, like a good photograph. Not like a rainbow exploded.
I find one of the main benefits of syntax highlighting is it's less tiring on the eyes. And once you get used to what your colours mean, much faster to read. Having it look attractive as well is just a side-benefit, though not a small one, if you spend a lot of your life looking at code!
Some of the highlighting methods shown in the article also seem a bit useless to me. Assigning colors to indentation levels? Surely the indentation itself is a huge visual clue already that doesn't need colors to be wasted. The rainbow parentheses example also shows that there are limits to how many obviously distinct colors you can actually have on screen at the same time, and the colors don't uniquely show nesting levels.
Apple had a security bug that stemmed from C code that was indented but not within braces. The second line of an if was not under the condition, but was indented the same as the first line.
I myself prefer to use type to convey structure and color to convey semantics. Having keywords boldfaced helps structure to stand out while coloring parameters, locals and globals differently would help meaning to surface more evidently.
It is perhaps more telling that the code was compiled without an error indicating dead code which was the net effect of this merge, but sadly warning free code isn’t a goal.
Sure, but if your editor can correctly color based on nesting it could very well fix the indentation to be correct as well.
> I myself prefer to use type to convey structure and color to convey semantics. Having keywords boldfaced helps structure to stand out while coloring parameters, locals and globals differently would help meaning to surface more evidently.
That makes sense.
One more reason to run an autoformat tool on commit. ;-)