

What if ‘source code’ were a serialized syntax tree? - mwsherman
http://clipperhouse.com/2013/04/03/what-if-source-code-were-a-serialized-syntax-tree/

======
nostrademons
Yes, it's been tried before, and gets periodically re-invented by everyone who
thinks seriously about language design.

Lisp works on this principle. The language is nothing but S-expressions, which
are a serialized representation of the parse tree. Nobody bothers to format
their Lisp code: they let Emacs format it for them. As you write it, Emacs
reformats it into canonical Lisp style, and then that's what gets saved.
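
To make the "serialized parse tree" point concrete, here is a minimal sketch in Python (not Lisp, and not how Emacs does it). The helper names `tokenize`, `parse`, and `serialize` are made up for illustration, and the code assumes well-formed input with no strings or comments. It reads an S-expression into nested lists and writes it back out in one canonical form, whatever the original spacing was.

```python
def tokenize(text):
    """Split an S-expression into parens and atoms (no strings/comments handled)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested-list tree from a token list; consumes the tokens it reads."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return token  # an atom is kept as a plain string

def serialize(node):
    """Write the tree back out with exactly one space between siblings."""
    if isinstance(node, list):
        return "(" + " ".join(serialize(child) for child in node) + ")"
    return node

messy = "(define   (square x)\n      (*  x x ) )"
tree = parse(tokenize(messy))
print(tree)             # the tree itself: nested lists of atoms
print(serialize(tree))  # the canonical text form of the same tree
```

The whitespace in `messy` never reaches the tree, so whatever gets saved is determined by the serializer alone.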

There were also various experiments in saving Smalltalk code into an object
database, an approach that was later carried over into some Java IDEs. IntelliJ
has a setting that automatically reformats your Java on save, using the
formatter you specify; the representation inside IntelliJ is a parse tree,
which gets created when the file is opened and serialized when saved.

Go works on a similar principle. You run the gofmt tool on your source code
before you check it in; it formats your source code in a canonical way and
then writes it back out. Go code always has a consistent style because nobody
bothers to format their own code; they let the tool do it for them.

Several people swear by this. It eliminates all the time wasted by arguing
about style, and it also makes it much easier for refactoring and code
generation tools to operate (they can operate on the raw parse tree and rely
on the formatter to generate the final source code). There's still a bunch of
resistance in the wider programming world, though, largely because such
formatting is inherently lossy (our style choices convey information) and
because programmers are people too, and like to be individuals.
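
As a rough illustration of a refactoring tool working on the raw parse tree and leaving the final text to a formatter, here is a sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The `RenameVariable` class and the sample source are hypothetical, and a real rename would also have to respect scoping, which this one ignores.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Rename every use of one variable name; scoping is deliberately ignored."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

source = "total = price * qty\nprint(total)"

# The tool only touches the tree...
tree = RenameVariable("qty", "quantity").visit(ast.parse(source))

# ...and the unparser decides what the resulting source text looks like.
refactored = ast.unparse(tree)
print(refactored)
```

The transformer never deals with whitespace or line breaks; those are entirely the unparser's business.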

------
aphyr
We have languages where the source code is a literal serialization of the
abstract syntax tree. This property is homoiconicity
(<http://en.wikipedia.org/wiki/Homoiconicity>), and is a key factor in the
success of s-expressions used in Lisps. It comes with all sorts of benefits--
like having a macro system which rewrites not the _text_, but the _source_ of
one's code.

However, destroying formatting information in the source can do more harm than
good. Often we use alignment to enhance readability, or to call out similarity
or symmetry between parts of the code. I think you're better off just using a
set of well-known formatting rules (like those built into your editor) than
trying to eliminate formatting altogether.
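
A small sketch of what gets destroyed, using Python's standard `ast` module (Python 3.9+) as a stand-in for a tree-based storage format. The variable names are invented for the example. Round-tripping hand-aligned code through the tree keeps the meaning but drops the alignment and the comments.

```python
import ast

# Hand-aligned source: the columns and comments are there for human readers.
source = (
    "x_min  = 0    # left edge\n"
    "x_max  = 640  # right edge\n"
    "y_min  = 0    # top edge\n"
    "y_max  = 480  # bottom edge\n"
)

# Parse to a tree, then serialize the tree back to text.
canonical = ast.unparse(ast.parse(source))
print(canonical)

# The program is unchanged as far as the parser is concerned.
same_tree = ast.dump(ast.parse(source)) == ast.dump(ast.parse(canonical))
print(same_tree)
```

A format that wanted to keep this kind of information would have to store comments and layout hints in the tree explicitly.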

------
shabble
I think that's the idea behind Go's 'gofmt'[1] tool; it even ships with a few
scripts suitable for VCS commit hooks[2].

It's an idea I've thought about before; allow users to define some sort of
style profile for their editor, and reformat to/from the canonical
representation before committing.

The biggest hurdle is of course tool support; if your web-based code review
app is showing different code to what you're seeing in your emacs buffer, and
different again from what your coworker is pastebinning from their gdb crash
dump, Things Get Confusing.

Even something as simple as line-numbers not matching up can be quite
irritating. Sourcemaps might be part of the solution for languages that
support them, but there's still a hell of a lot of work in upgrading tools to
support these sorts of features before you can expect them to become
widespread.

[1] <http://blog.golang.org/2013/01/go-fmt-your-code.html>

[2] <http://tip.golang.org/misc/git/pre-commit>

------
seanmcdirmid
> At the risk of revealing my lack of a CompSci degree: what if ‘source code’
> were replaced with serialized syntax trees?

I really don't want to be "that guy", but this is an old idea in programming
languages, going all the way back to the Cornell Program Synthesizer days.

Here is a recent debate that discusses the issues:

<http://programmers.stackexchange.com/questions/119095/why-dont-we-store-the-syntax-tree-instead-of-the-source-code>

~~~
mwsherman
Yep, I was sure it’s not an original idea. Though that particular link seems
to be about a universal, language-agnostic syntax tree.

Also interesting, but I wasn’t even intending to be that ambitious. I’d limit
the scope to individual languages, and let those communities debate the
‘right’ serialization.

~~~
seanmcdirmid
The problem is one of cost-benefit: the tooling costs are high, while the
benefits do not seem big enough to justify them. Intentional Software was a
whole company founded on this idea and it hasn't really gone anywhere.

There is a small trend back toward looking at structured editors; in that case
you have no choice but to serialize a tree.

------
mwsherman
It may be interesting to note that I didn’t stumble on this from a compsci
perspective. Rather, I was wondering to myself, what could source control do
if it understood the code it was managing?
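
One thing it could do, sketched here with Python's standard `ast` module as an assumed stand-in for whatever tree format a language community settled on: tell a pure reformatting apart from a real change. The function name `same_structure` and the sample snippets are hypothetical.

```python
import ast

def same_structure(a: str, b: str) -> bool:
    """True if both sources parse to the same tree (layout and comments ignored)."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

before = "def area(w, h):\n    return w * h\n"

# Same program: different spacing, an added comment, redundant parentheses.
reformatted = "def area( w,h ):\n\n    # width times height\n    return (w*h)\n"

# Different program: the operator changed.
changed = "def area(w, h):\n    return w + h\n"

print(same_structure(before, reformatted))
print(same_structure(before, changed))
```

A tree-aware diff could then skip the first kind of commit entirely and show only the changed node for the second.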

------
tantalor
What about files? Depending on the compiler, it may not care whether your
source code is in one big file or many small files.

For example, in JavaScript, file layout doesn't matter, but in C++ it does.

~~~
fractallyte
"Source code in files. How quaint." - Kent Beck

(In Smalltalk and Lisp machines, the source code becomes integrated into the
'operating system', rather than being isolated in static 'dead' files.)

