
Detecting Code Indentation - ingve
https://medium.com/firefox-developer-tools/detecting-code-indentation-eff3ed0fb56b
======
jamesrom
People love to hate tabs, but in a world where we represent logic in megabytes
of alpha-numeric characters, doesn't a semantic notation of indentation make
just a little bit of sense?

Tabs are for indentation, spaces for alignment. These are different concepts,
but spacers[1], who just think whitespace is whitespace, are forever
double-tapping their spacebar like some sick mental tic that can't be cured.

Your editor should not need to detect tab size. You should be able to size
your tabs however you want. A tab is an indentation, whether you want it to be
2 chars or 20. It shouldn't matter.

The fact that a neural network is being used to detect tab size just shows
how much ground tabbers have lost to the space infidels[2].

[1] [http://www.jwz.org/doc/tabs-vs-spaces.html](http://www.jwz.org/doc/tabs-vs-spaces.html)

[2] [http://blog.codinghorror.com/death-to-the-space-infidels/](http://blog.codinghorror.com/death-to-the-space-infidels/)

~~~
alextgordon
You shouldn't use tabs for new projects, because spaces are more popular. It's
like endianness: it doesn't matter _which_ you use, as long as it's the same
as everybody else.

Right now, "everybody else" is using spaces. So get on board.

~~~
prodigal_erik
Big-endian and little-endian convey exactly the same information. Tabs and
spaces don't; tabs impose fewer requirements on presentation and discourage
people from wasting effort on fragile ASCII art that mostly just causes
spurious merge conflicts.

------
agentgt
I have always wanted some editor plugins to help with 2-space code
indentation. I just can't understand how some developers think that 2-space
indentation is readable.

I don't have that many strong feelings on code style, with one exception:
code blocks should be indented by a minimum of 4 spaces.

My feeling is that visually impaired people who prefer 4 spaces, like myself,
have a physical, perhaps even medical, handicap/challenge reading 2 spaces,
whereas the 2-space folks just prefer 2 spaces so they can shove as many
things on one line as possible, or have some sort of archaic 80-chars-per-line
argument... or worse, have shit loads of indentation (this is especially the
case for JavaScript and Scala... see Steve McConnell's “Taming Dangerously Deep
Nesting” section in "Code Complete").

And when it comes to diffing.. 2-space becomes so egregious that I have to
take the code and replace with tabs at times.

~~~
turbohz
Switch to indenting with tabs. Problem gone.

~~~
agentgt
Oh, I like tabs as well, but many battles have been fought and tabs are
losing ground in the overall war. Plus there are some fairly good arguments
for languages that support variable indentation to not use tabs.

So I concede that loss and concentrate energy on the "please god do not use
2-spaces" battle.

------
jhallenworld
Interesting, I've not seen this discussed before. JOE uses a variation of the
GCD method:

[https://sourceforge.net/p/joe-editor/mercurial/ci/default/tr...](https://sourceforge.net/p/joe-editor/mercurial/ci/default/tree/joe/b.c#l2810)

Create a histogram of the indentation of the last 250 lines of the file (on
the theory that there are more comments at the beginning of the file). The GCD
of the three most popular indentations is selected. This way one oddly
indented line does not screw it up. It does not ignore single space
indentations, since I used to use that. Instead it ignores lines which begin
with comment characters, like asterisk or '/'. It can still be confused by
block comments like this:

    
    
        /* first
           second
           third */
    

The editor does in theory know the syntax, so it could be enhanced to just
ignore comment lines.

Also, it determines if the user prefers to use tabs or spaces for indentation.
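
A minimal Python sketch of the histogram-plus-GCD idea described above. This is not JOE's actual C code: the function name, the parameters, and the choice to count only space-indented lines are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from math import gcd


def detect_indent_histogram(text, tail=250, top=3, comment_starts=("*", "/")):
    """Guess the indent width from the GCD of the most popular indentations.

    Hypothetical helper: only the last `tail` lines are examined, and lines
    whose first non-space character is in `comment_starts` are skipped.
    """
    counts = Counter()
    for line in text.splitlines()[-tail:]:
        stripped = line.lstrip(" ")
        # Skip blank lines and lines that start with a comment character.
        if not stripped or stripped.startswith(comment_starts):
            continue
        # Assumption: tab-indented lines are left out of the histogram.
        if stripped.startswith("\t"):
            continue
        width = len(line) - len(stripped)
        if width > 0:
            counts[width] += 1
    popular = [width for width, _ in counts.most_common(top)]
    # gcd(0, n) == n, so 0 is a safe start value; the result is 0 when no
    # indented lines were found.
    return reduce(gcd, popular, 0)
```

Because only the three most popular indentations feed the GCD, a single oddly indented line is outvoted unless the file is very short.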

~~~
alextgordon
In Chocolat we take a similar approach, but the histogram is of the GCDs, not
of the indentations.

So we just take a sample of 1000 lines, then take the GCD of every pair. The
indentation width is simply the most common GCD.

If it turns out that one bin is not clearly more dominant than the others,
then we default to whatever was used last time for that language.
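
A rough Python sketch of the pairwise-GCD histogram described above. It is not Chocolat's code: taking the first 1000 lines as the sample, the `dominance` threshold, and the `fallback` parameter (standing in for "whatever was used last time") are all assumptions.

```python
from collections import Counter
from itertools import combinations
from math import gcd


def detect_indent_pairwise(text, sample=1000, dominance=0.5, fallback=4):
    """Guess the indent width from a histogram of pairwise GCDs.

    Hypothetical helper: `dominance` is the share of all pairs the winning
    bin must hold, and `fallback` is returned when no bin is dominant.
    """
    widths = []
    for line in text.splitlines()[:sample]:
        stripped = line.lstrip(" ")
        width = len(line) - len(stripped)
        # Keep only non-blank lines that are indented with spaces.
        if stripped and width > 0:
            widths.append(width)
    # One histogram bin per GCD value, counted over every pair of lines.
    bins = Counter(gcd(a, b) for a, b in combinations(widths, 2))
    if not bins:
        return fallback
    best, count = bins.most_common(1)[0]
    if count / sum(bins.values()) < dominance:
        return fallback
    return best
```

Comparing every pair is quadratic in the sample size, which is presumably why the sample is capped at a fixed number of lines.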

------
x1798DE
One thing I think is a bit annoying is that SublimeText doesn't seem to be
able to properly detect the 4-space tab settings in Python if you have some
visually aligned elements that are indented to a multiple of 2 - it always
incorrectly assumes 2 for those files.

Given that Python itself relies on consistent indentation, once you know it's
Python, you should be able to detect the file settings by finding an indented
construct and checking the indentation of the next block. That or ignore lines
that come between brackets, parens, quotes, etc, when doing the calculation.

I assume there is a plugin for this, but a cursory search hasn't turned it up.
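
For what it's worth, a sketch of that idea using Python's standard `tokenize` module, which only emits `INDENT` tokens for real block indentation and not for continuation lines inside brackets or parens. The function name is made up, and a tab is counted as a single character here.

```python
import io
import tokenize
from collections import Counter


def detect_python_indent(source):
    """Return the most common block indent step in Python source, or None.

    Hypothetical helper: the step is measured in characters, so a tab
    counts as one.
    """
    deltas = Counter()
    stack = [0]  # indentation widths of the currently open blocks
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.INDENT:
                # tok.string is the full leading whitespace of the line.
                width = len(tok.string)
                deltas[width - stack[-1]] += 1
                stack.append(width)
            elif tok.type == tokenize.DEDENT:
                stack.pop()
    except (tokenize.TokenError, SyntaxError):
        # Incomplete or invalid source: use whatever was seen so far.
        pass
    if not deltas:
        return None
    return deltas.most_common(1)[0][0]
```

Visually aligned arguments and list items never reach the counter, so a file with 4-space blocks and 2- or 6-column alignment should still come out as 4.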

------
Animats
"gofmt" makes this a non-issue for Go. In Python, indentation has meaning, and
the compiler is smart enough to understand when tabs vs spaces is an
ambiguity, so it's less of a problem.

The last time I ran a C++ project, we just ran everything through Artistic
Style [1] in "ansi" mode.

Fussing over this too much is bikeshedding.

[1] [http://astyle.sourceforge.net/](http://astyle.sourceforge.net/)

------
one-more-minute
The indent-detective plugin [1], which implements this feature for Atom, is
directly based on this post. The compare-lines heuristic works really nicely.

[1] [https://atom.io/packages/indent-detective](https://atom.io/packages/indent-detective)

~~~
jhasse
It's a great plugin :)

I would really love it if that plugin had some options, for example to never
detect 1 space or more than 8 spaces.

------
mofle
I wrote a JS module for detecting indentation some years ago. I ended up with
an algorithm [0] that looks for the most common difference between two
consecutive non-empty lines.

[0]: [https://github.com/sindresorhus/detect-indent#algorithm](https://github.com/sindresorhus/detect-indent#algorithm)
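
A small Python sketch of the "most common difference between consecutive non-empty lines" idea. It is not the detect-indent source (which is JavaScript); the function name and the spaces-only handling are assumptions.

```python
from collections import Counter


def detect_indent_by_diff(text):
    """Guess the indent width from indentation changes between lines.

    Hypothetical helper: only leading spaces are measured, and 0 is
    returned when the indentation never changes.
    """
    diffs = Counter()
    prev = None
    for line in text.splitlines():
        if not line.strip():
            continue  # empty lines are not compared
        width = len(line) - len(line.lstrip(" "))
        if prev is not None and width != prev:
            diffs[abs(width - prev)] += 1
        prev = width
    return diffs.most_common(1)[0][0] if diffs else 0
```

Counting changes rather than absolute widths means a deeply nested file still votes for its step size, not for the widths it happens to reach.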

------
yitchelle
It would be nice if the language had a spec for how code should be
formatted. I think Rust is heading in this direction with rustfmt:

[https://github.com/rust-lang-nursery/rustfmt](https://github.com/rust-lang-nursery/rustfmt)

------
kazinator
See here:
[http://www.kylheku.com/cgit/c-snippets/tree/autotab.c](http://www.kylheku.com/cgit/c-snippets/tree/autotab.c)

Autotab detects how wide tabs should be and what the indentation is, using a
variety of heuristics.

------
Nutmog
Why are people still manually indenting in the first place? Just use an IDE
that does it for you. It doesn't matter what convention it's applying because
you don't have to do it. It'll automatically re-indent all your old code to
match whatever its own default way is and you can do programming instead of
formatting.

