
Show HN: CodeBuff – smart code formatter - parrt
https://github.com/antlr/codebuff
======
chriswarbo
Very interesting! I have been thinking about something similar, but rather
than learning from examples, it would have a generic "misaligned" cost
function that would penalise lines which have similar content but in different
columns, and minimise this by hillclimbing or similar.

The difficulty is tying it to a particular language's parser and whitespace
rules.
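
A minimal sketch of that idea, assuming a single alignment marker (`=`) and plain lines of text. The names `misalignment_cost` and `hill_climb` are made up for illustration, and the sketch deliberately ignores the parser and whitespace rules mentioned above:

```python
def misalignment_cost(lines, marker="="):
    """Sum of column differences of `marker` between adjacent lines that contain it."""
    cols = [line.find(marker) for line in lines]
    return sum(abs(a - b) for a, b in zip(cols, cols[1:]) if a >= 0 and b >= 0)


def hill_climb(lines, marker="="):
    """Greedily insert one space before `marker` while that lowers the cost."""
    lines = list(lines)
    best = misalignment_cost(lines, marker)
    improved = True
    while improved:
        improved = False
        for i, line in enumerate(lines):
            col = line.find(marker)
            if col < 0:
                continue  # nothing to align on this line
            candidate = lines[:i] + [line[:col] + " " + line[col:]] + lines[i + 1:]
            cost = misalignment_cost(candidate, marker)
            if cost < best:  # accept only strict improvements
                lines, best = candidate, cost
                improved = True
    return lines


source = ["x = 1", "total = 2", "ab = 3"]
aligned = hill_climb(source)
```

Because only strict improvements are accepted, this can stall on plateaus (for example, two short lines sandwiched between two long ones), which is the usual weakness of plain hill climbing.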

~~~
timhwang21
If you're interested, Google Research recently published a paper about code
formatting via dynamic programming:
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44667.pdf

~~~
parrt
Interesting, though it is a framework very similar to previous work using Box
combinations. Here, we don't require any work from a language expert. We
simply sniff your project, and then make new files look like those. Handling a
new language requires no coding.

------
mkagenius
Does this ML model assume that popular code style = correct code style? (had a
cursory look)

~~~
bradleyjg
I think you can take the code and train it on whatever corpus you like.

~~~
parrt
Yep, there is no definition of "good style". The tool simply makes new files
look like the rest of your project.

------
clemensley
Love this approach. We need more AI applied to problems in writing code, like
an AI parser that auto-corrects errors. Does anyone know if something like
this exists?

~~~
jclos
Not directly related, but there is a group at the University of Edinburgh that
focuses on that (applying ML to source code). Their page is
https://mast-group.github.io/

------
lubomir
How does it deal with languages with significant whitespace? Can it format
Python without breaking it?

~~~
jurgenv
Haven't tried, but if you train it using only correct Python I bet it can only
produce correct Python. I'd have to check this to be totally sure there is no
weird corner case.
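
One way to run that check, sketched with Python's standard `ast` module: parse the file before and after formatting and compare the trees. The helper name `same_program` is made up here, and this says nothing about how CodeBuff itself works; it only tests whether a reformatted file still means the same thing.

```python
import ast


def same_program(before: str, after: str) -> bool:
    """True if both sources parse to the same AST (line/column info ignored)."""
    try:
        return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
    except SyntaxError:  # also covers IndentationError
        return False


original = "if x:\n    y = 1\nz = 2\n"
reindented = "if x:\n  y = 1\nz = 2\n"        # different indent width, same meaning
moved = "if x:\n    y = 1\n    z = 2\n"       # z = 2 pulled into the if body
unparsable = "if x:\ny = 1\n"                 # body lost its indentation
```

Here `same_program(original, reindented)` holds, while the other two variants are rejected: one parses to a different tree, the other does not parse at all.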

------
thealistra
Why do you need a grammar if this is supposed to learn it itself from the
code?

~~~
jurgenv
The grammar is used to compute the code layout features that CodeBuff learns
from: it parses each file and associates spacing and indentation features with
contextual features of the parse tree.

Then, when we parse a new file to pretty-print, we parse again with the _same
parser_ and the features that were learned are matched to the tree at hand to
recover the "right" spacing and indentation features for the given example
code.
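
A toy sketch of that "learn, then match" loop, not CodeBuff's actual code. It assumes each training example is a context tuple (previous token type, current token type, enclosing rule name) paired with the whitespace seen before the current token; the feature set, the Hamming-distance matching, and the names `train` and `predict` are all assumptions made for illustration.

```python
from collections import Counter


def train(examples):
    """Map each context tuple to a count of the whitespace strings seen with it."""
    model = {}
    for context, whitespace in examples:
        model.setdefault(context, Counter())[whitespace] += 1
    return model


def hamming(a, b):
    """Number of positions where two equal-length context tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(model, context):
    """Return the most common whitespace of the closest context seen in training."""
    nearest = min(model, key=lambda seen: hamming(seen, context))
    return model[nearest].most_common(1)[0][0]


# Hypothetical corpus: (prev token, current token, enclosing rule) -> whitespace
corpus = [
    (("ID", "=", "assignment"), " "),
    (("ID", "=", "assignment"), " "),
    (("ID", "=", "assignment"), ""),
    ((";", "ID", "statement"), "\n"),
    (("(", "ID", "call"), ""),
]
model = train(corpus)
before_equals = predict(model, ("ID", "=", "assignment"))
before_number = predict(model, (";", "NUMBER", "statement"))
```

The second query uses a context that never appeared in the corpus; it falls back to the closest one, `(";", "ID", "statement")`, and so predicts a newline.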

------
jpalomaki
This would be a nice way to get the code formatter configuration about right
when switching to a new editor on an old project.

------
Hydraulix989
How does this improve over VSCode's golang formatting (which I think is
excellent)?

~~~
jurgenv
It's language-parametric, so it can work for any language you have a grammar
for and a set of example files to train on. In this sense it helps the authors
of formatting tools.

For users of a specific language it could also be an improvement, since a
formatter can be configured through completely arbitrary code examples. But we
have to do a bit more work to streamline that use case.

