
Tabs or spaces – Parsing 1B files across 14 programming languages - nikbackm
https://medium.com/@hoffa/400-000-github-repositories-1-billion-files-14-terabytes-of-code-spaces-or-tabs-7cfe0b5dd7fd#.ux86pwfmi
======
jasode
After 20+ years of listening to the tabs-vs-spaces debate and considering
_all_ the legitimate points that _both_ sides have, many have made the
following observation and it's what resonates with me the most:

In an ideal world, _all_ programmers and _all_ text editors would use tabs
specifically for indentation and spaces specifically for alignment. But we
don't live in that perfectly coordinated world, so spaces maintain the most
fidelity -- at the expense of programmers not being able to instantly
customize the indentation width (2, 4, 6, 8).

Therefore, the _slight_ edge goes to spaces even though I find tabs extremely
attractive. (That conclusion is in the context of a team of multiple
programmers, using multiple languages, multiple text editors. If you're solo
and can maintain "tabs discipline", that's a different scenario.)

~~~
spinningarrow
Personally, I don't find the value in alignment - indentation is all I do. I
especially dislike alignment of this sort:

    
    
        var a_variable           = 1;
        var another_variable     = 2;
        var yet_another_variable = 3;
    

which is just too fiddlesome and in fact makes it _harder_ for me to read.

~~~
zbjornson
That's also hazardous when you later add an_even_longer_variable and then have
to modify three extra lines to make them align.

~~~
spinningarrow
Exactly. To be fair, I believe most popular editors have plugins/shortcuts to
do this. Or at least I hope people aren't doing this manually.

~~~
dkuntz2
But if anyone looks at a diff, it looks like all the lines have changed,
instead of just one, taking just a tiny bit more cognitive overhead.
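
A rough sketch of that diff cost, using Python's standard `difflib`. The three
variable names come from the example above and `an_even_longer_variable` from
zbjornson's comment; the helpers `align`, `plain` and `changed_lines` are made
up for illustration.

```python
import difflib


def align(pairs):
    """Render assignments with the '=' signs padded into one column."""
    width = max(len(name) for name, _ in pairs)
    return [f"var {name.ljust(width)} = {value};" for name, value in pairs]


def plain(pairs):
    """Render the same assignments without any alignment."""
    return [f"var {name} = {value};" for name, value in pairs]


def changed_lines(before, after):
    """Return (removed, added) line counts between two versions."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


pairs = [("a_variable", 1), ("another_variable", 2), ("yet_another_variable", 3)]
longer = pairs + [("an_even_longer_variable", 4)]

print(changed_lines(align(pairs), align(longer)))  # every old line is rewritten
print(changed_lines(plain(pairs), plain(longer)))  # only the new line is added
```

With alignment, all three existing lines show up as removed and re-added
alongside the new one; without it, the diff is a single added line.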

~~~
citrusx
`git diff -w`

~~~
tnorthcutt
But `git blame` still sees the whitespace change, right?

Edit: turns out `git blame -w` works as expected, ignoring whitespace. The
More You Know™

------
w_t_payne
I've always used spaces instead of tabs.

My rationale:

The number of readers of any document will normally be greater than the number
of authors. We should give a slight preference for readability and ease of
comprehension over writeability, and should certainly not require that our
readers adjust their editor settings for each document that they peruse.

In addition, it is useful to use vertical alignment to group related content
that spans multiple lines. Although this is at least partly an aesthetic choice,
and not without drawbacks, I believe that the benefits outweigh the costs.

I believe that it looks neater, improves readability and also makes some
errors "pop" out in a way that they don't in unaligned content. I find the
ability to visually group related terms and expressions particularly useful.

The downside is that you have to "tidy up" after using refactoring tools or
after doing a search-and-replace operation.

However, I believe that this downside is mitigated due to the fact that the
very tidying up that is required is a good way of eyeballing the changes and
checking to make sure that they are OK.

[http://williamtpayne.blogspot.co.uk/2012/04/spaces-over-tabs...](http://williamtpayne.blogspot.co.uk/2012/04/spaces-over-tabs.html)

~~~
scrollaway
Hey, a rationale, something I seldom see in those debates :)

I think that if you do want to align, spaces are unavoidable. But I want to
reiterate three points against alignment:

\- Alignment creates a maintenance burden. It has to be updated whenever you
touch unrelated lines, which creates whitespace noise.

\- Some types of alignments (specifically, method parameter alignment) create
extreme horizontal noise. See for yourself:
[https://github.com/pennersr/django-allauth/blob/c9b31ddee81d...](https://github.com/pennersr/django-allauth/blob/c9b31ddee81d2cde06c1f686c4eb7703f916681f/allauth/account/views.py#L629-L632)

That stuff's ridiculous. Styleguides are meant to make the code more readable
and maintainable - this is neither. Method calls can be defined just like
arrays, where the first argument goes on a newline with a single added indent.
No need to align anything.

\- Finally, alignment breaks with proportional fonts. This is something
someone else brought up on HN in another topic and I think it's a really good
point; proportional fonts, like tabs, are an accessibility feature.

~~~
patrickmay
> \- Some types of alignments (specifically, method parameter alignment)
> create extreme horizontal noise. See for yourself:
> [https://github.com/pennersr/django-allauth/blob/c9b31ddee81d...](https://github.com/pennersr/django-allauth/blob/c9b31ddee81d..).

The answer there would be to put the line break before the . and align the
.dispatch under super.

------
teddyh
Neither!

[http://nickgravgaard.com/elastic-tabstops/](http://nickgravgaard.com/elastic-tabstops/)

“ _A better way to indent and align code_ ”

When I saw this, it was obvious to me that _this_ was the true solution to the
problem which both space and tab proponents try to solve.

~~~
nwah1
Yes, this seems like the best approach. The key for this to gain support is
for Github's diff system to support it.

------
robgering
I've been learning Go in my spare time. One of the things that I've found
really refreshing is how the enforced conventions largely eliminate
contentious discussions like tabs vs spaces.

For those unfamiliar with Go, the gofmt tool[1] converts indentation to tabs,
standardizes brace positioning, etc. Most editors have a plugin that will
automatically run the tool on save. An option for programmers using other
languages is EditorConfig[2].

[1] [https://blog.golang.org/go-fmt-your-code](https://blog.golang.org/go-fmt-your-code)

[2] [http://editorconfig.org/](http://editorconfig.org/)

------
lethargic_meat
Some IDEs will convert tabs to spaces when saving and the other way around
when editing, so, while your approach is interesting, not really a proof of a
majority choice.

Also my beef is especially with people who don't use tabs in files like fstab,
like beasts.

~~~
scrollaway
I'd bet most of the code ever written gets styled however the IDE du jour
decides to by default.

If some IDEs decided to switch to tab indent by default, people would suddenly
love tabs.

~~~
aplummer
I think this definitely applies to Xcode. I use tabs everywhere, but Xcode
just inserts a lot of spaces and I am used to it.

------
mherrmann
I've never really understood why you would use several characters (=spaces)
for something that is semantically one indent. Can someone enlighten me? What
are the advantages of spaces over tabs?

~~~
philipov

        def doSomething( parameter1
                       , parameter2
                       , parameter3
                       ):
            pass
    
    

How would you make all the punctuation line up using tabs?

~~~
scrollaway

        def do_something(
            param1, param2,
            param3, param4
        ):
            ...
    

It's not that hard. Alignment is generally harmful - it's a maintenance burden
which adds whitespace noise on diffs, which means more noise in code reviews.
Not to mention how atrociously ugly it is when long variables/method names
lead to 5+ lines with 60+ characters of indent for a single variable name per
line.

~~~
pyre
Most diff programs have an "ignore whitespace" option. "whitespace noise on
diffs" is pretty much a solved problem. You can even add '?w=1' to a Github
URL to turn on the feature.

~~~
scrollaway
1\. That's a band-aid. The characters have still been changed - the diff still
"exists".

2\. Those options are not perfect, they don't always work as expected

3\. Those options are not turned on by default (because that would be INSANE)

4\. Git's not the only VCS, much as I wish it were

5\. Github's not the only Git web UI; that option is not available everywhere.

It's not a "solved problem". It's an artificial problem, that gets solved when
you start consistently stripping trailing whitespaces on save and not
realigning everything all the time.

~~~
pyre
> It's an artificial problem, that gets solved when you start consistently
> stripping trailing whitespaces on save and not realigning everything all the
> time.

This is mostly solved via coding standards. If there are no coding standards
people will "refactor" the coding style when they touch various pieces of
code. If people were following a consistent style, then the alignment wouldn't
be changing all of the time.

Although I strongly agree with the trailing whitespace sentiment. The most
egregious offender that I've found is Eclipse putting trailing spaces on blank
lines to bring them to the indent level of the non-blank lines. Ugh.

~~~
scrollaway

        enum {
            NORMAL    = 1
            ABNORMAL  = 2
            IRREGULAR = 3
        }
    

Now if you're writing like this, what happens when you introduce "EXCEPTIONAL
= 4"? You change 3 other unrelated lines. This is something that alignment
causes, regardless of "coding standards". Hell, gofmt does that... It looks
pretty (to some extent - with long names and lots of members it gets
unreadable), but creates so much noise.

------
funkaster
Both:
[https://www.emacswiki.org/emacs/SmartTabs](https://www.emacswiki.org/emacs/SmartTabs)

(Tabs for indenting, spaces for aligning)

~~~
bvinc
To me, when people say "tabs", this is what I assume they mean. Using tabs for
alignment is TERRIBLE.

------
dvh
I hereby declare the end of the discussion (at least for me): spaces, because
most other programmers use spaces. Unless you are joining a project that
consistently uses tabs, then use tabs. Otherwise spaces.

~~~
jasonkostempski
If greater than 58% of all projects you work on mostly use tabs, then use tabs
91% of the time, in 97% of your projects.

------
Tharkun
Now, can we look at the same repositories and look at the number of bugs and
correlate it with the use of spaces?

But in all seriousness, how often something is used isn't an indication of how
good it is. Spaces are like cigarettes. Just don't start.

~~~
emodendroket
Yeah, but since in this case it doesn't really matter it's better to just do
what most people are doing. There are no serious advantages to either one.

------
guessmyname
I couldn't care less about this, and I generally feel out of place when other
programmers ask me about my preferences on this topic. I have written code in
different programming languages, and just naming Go and PHP where Tabs and
Spaces are the standard respectively — PHP mostly because of PSR — I simply
don't pay attention to the character(s) used for the indentation, the tooling
already does that for me, being gofmt for Go and PHPCS for PHP, and there are
probably similar tools in other languages [1]. I never understood why people
complained about this more than other (more important?) things in the code
like the position of braces which is also a flame war between developers but
it makes more sense than caring about Tabs vs. Spaces.

[1] I say "probably" because even when I have written in Vala, C++, Ruby,
Python, JavaScript, I have always relied on the IDE to automatically select
the most common indentation in the project, so I never realize if I am using
Tabs or Spaces since hitting the Tabulator key while using spaces will simply
translate them to the correct indentation.

~~~
rdtsc
Completely unrelated. Noticed you use Vala. Just curious, how do you find it?
Do you just use it for GNOME related things (GUI building) or for other cases
as well?

Remember being interested in it a few years back. It seemed to be popular, but
haven't followed it since.

~~~
guessmyname
Vala was my first introduction to OOP and since I was always working in a Unix
environment C# was not an option for me. The bindings to develop GTK-based
applications were good enough, but I still believe that, with tools like
Glade, writing a GTK-based app in any other language is as easy as it is with
Vala; the only advantage of Vala is that I can compile to C without having to
write C all the time.

It has worked well for me, and I was able to create corporate applications in
the past. I don't use it with the same frequency now because of my current
job, but it is still in my list of programming languages that I want to use
for personal projects along with C++ and (recently) Go.

For people interested in the language, I suggest taking a look at the
Elementary OS documentation. They are one of the most popular projects using
Vala in production and I am sure they will appreciate any contribution from
beginners and experienced developers:
[https://elementary.io/developer](https://elementary.io/developer) —
[https://github.com/trending/vala](https://github.com/trending/vala)

~~~
rdtsc
Thanks. Didn't know about Elementary OS and was surprised there is a good
number of Vala projects on GH.

Yeah, there is something appealing about compiling to C. It is a relatively
easy target compared to LLVM or assembly. Nim is in that space and is making
progress, but it still suffers from a small community of developers.

------
mcos
The biggest problem with the spaces vs tabs debate is that editor presentation
is still tightly coupled to file persistence. Imagine an abstraction layer
that let developers see the code however they wished, yet saved files in a
standard format. It might negate some of the issues people have.

~~~
mbrock
That sounds like a really complex solution to a kind of non-problem.

------
txutxu
Some Perl projects maintained by teams use perltidy as a pre-commit hook.

It's not as cool as talking about newer languages, but it handles everything
you may need for code indentation and alignment, with tabs or with spaces.

Today somebody was asking on Planet Debian about Haskell vertical code
alignment... well, again, perltidy has an option to enable/disable that.

I developed for many years without using it... recently I tried it
first-hand, and now I cannot live without it :) It even has options for
vertical alignment of indented comments.

I enjoy seeing it in action after an hour or two of coding $anything. It
always finds more inconsistencies than I expected. And I really _try_ to be
consistent.

~~~
TurboHaskal
Perltidy is truly amazing. I haven't seen anything that comes close in terms
of power and configurability.

------
skoczymroczny
Parsing files might not be the best way to measure programmer preferences,
because in big projects programmers' preferences will be squelched by the
coding standard and/or tab/space cargo culting.

------
rsaarelm
I wish he had also counted the files that use only spaces or only tabs
compared to a mixture of both in the indentation.

My reason for being against tabs is that unless you have something like gofmt,
somebody will inevitably screw up and put in indentation that mixes tabs and
spaces.

The second thing to check would be tabs that aren't in the initial indentation
whitespace of the line. The other inevitable screw-up is using tabs to do some
kind of vertical layout that shows up right with exactly one tab stop size.
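
A minimal sketch of both checks in Python, assuming a simple per-line
classification. The function name `indent_report` and its result keys are
hypothetical, and it does not try to tell tabs inside string literals apart
from tabs used for layout.

```python
def indent_report(text):
    """Summarise how a file uses tabs and spaces, counted per line.

    tabs_only / spaces_only / mixed: kind of leading whitespace on the line
    inner_tabs: lines with a tab somewhere after the leading whitespace
    """
    report = {"tabs_only": 0, "spaces_only": 0, "mixed": 0, "inner_tabs": 0}
    for line in text.splitlines():
        body = line.lstrip(" \t")
        if not body:
            continue  # ignore blank and whitespace-only lines
        indent = line[: len(line) - len(body)]
        if indent:
            kinds = set(indent)
            if kinds == {"\t"}:
                report["tabs_only"] += 1
            elif kinds == {" "}:
                report["spaces_only"] += 1
            else:
                report["mixed"] += 1
        if "\t" in body:
            report["inner_tabs"] += 1
    return report


sample = "def f():\n\treturn 1\n    x = 2\n\t  y = 3\nname\t= 4\n\n"
print(indent_report(sample))
```

A file would count as "mixed" in the first sense if it has both `tabs_only`
and `spaces_only` lines, or any `mixed` line at all.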

------
open-source-ux
Obligatory clip from Silicon Valley (Season 3): Tabs vs Spaces

(Some potential storyline spoilers if you haven't seen Silicon Valley)
[https://youtu.be/SsoOG6ZeyUI](https://youtu.be/SsoOG6ZeyUI)

~~~
randlet
Wrong link :)

[https://youtu.be/SsoOG6ZeyUI](https://youtu.be/SsoOG6ZeyUI)

~~~
open-source-ux
Oops, thanks for the correction! Have also updated my original post with the
correct link :)

------
pbiggar
For god's sake, man, how _many_ spaces?

------
gnode
I find it interesting that C++ using the .cc extension has about 7% tabs,
whereas C++ using the .cpp extension has about 36% tabs.

I wonder what else is different about these two groups.

~~~
brandmeyer
Well, one correlation is that Google's coding style for C++ uses spaces and
the .cc extension. It stands to reason that some fraction of Google alumni
would continue to use both practices in their own work. I don't know if it's
enough to push a 36% tab usage to 7%, but might account for a solid fraction
thereof.

------
scraft
I've worked for 10+ years in the games industry, ranging from a 10-man 'indie'
studio to 400-person (large for the games sector) studios. I am struggling to
think of any project I have worked on that hasn't used tabs (including a huge
Python project). This may be a completely irrelevant data point, but I wondered
if there is a chance the games industry has a natural preference for tabs?

------
d33
Am I the only one that finds the DATA interesting here? Has anyone actually posted
the terabyte over BitTorrent or something so I could play with it while
avoiding the "Don’t analyze the main [bigquery-public-
data:github_repos.contents] table — at 1.5 TB, it will instantly consume your
monthly free terabyte."?

------
milansuk
I would like to know if there are people who have 'switched' (coded with
spaces and then moved to tabs, or the reverse)?

~~~
louthy
I have, I mentioned it in my comment above [1]. It's not too long, so I'll
paste it again:

"I came to this conclusion too after at least 20 years of using tabs; but
mostly it was because of moving to white-space significant languages like
Haskell, F#, etc. It was just too painful getting everything lined up with
tabs and I needed the fidelity of spaces.

Previous to that with C/C++/C# the alignment usually took care of itself
through the closing brace and IDE auto-formatting. So tabs was the natural
unit of currency there."

[1] [https://news.ycombinator.com/item?id=12398432](https://news.ycombinator.com/item?id=12398432)

------
adontz
"One vote per file: Some files use a mix of spaces or tabs. We’ll count on
which side depending on which method they use more."

This makes results completely useless, as files of people who really do not
care will count for one side or another randomly.
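
To make the objection concrete, here is a sketch of a "one vote per file"
rule in Python. It assumes a line is classified by its first character, which
is a guess at the article's method rather than its actual query; `file_vote`
is a hypothetical name.

```python
def file_vote(text):
    """Return 'tabs', 'spaces', or None (tie or no indentation) for one file."""
    tab_lines = space_lines = 0
    for line in text.splitlines():
        if line.startswith("\t"):
            tab_lines += 1
        elif line.startswith(" "):
            space_lines += 1
    if tab_lines > space_lines:
        return "tabs"
    if space_lines > tab_lines:
        return "spaces"
    return None


consistent = " a\n" * 100               # 100 space-indented lines
careless = " a\n" * 51 + "\tb\n" * 49   # near-even mix of both

print(file_vote(consistent), file_vote(careless))
```

Under this rule a 51/49 mix casts exactly the same vote as a fully consistent
file, so the totals cannot distinguish a deliberate choice from noise.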

~~~
samuellb
Or people who use a "mixed tabs and spaces" indentation, e.g. size 8 tabs with
4 characters wide indentation, and spaces for uneven indentations. (Which IMHO
is broken because it leads to unreadable code when you don't have the same
settings as the person who wrote the code).

Also, I guess C-style block comments could cause an over-counting of spaces
(but looking at the numbers, it seems that either the query takes this into
account, or C code simply has a poor comment-to-code ratio)
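
A small illustration of why that mixed style breaks, using Python's built-in
`str.expandtabs`. The sample lines are invented; they use 4-column indent
steps written with 8-wide tabs plus spaces for the remainder.

```python
def indent_columns(lines, tab_width):
    """Visual indentation (in columns) of each line at a given tab width."""
    columns = []
    for line in lines:
        shown = line.expandtabs(tab_width)
        columns.append(len(shown) - len(shown.lstrip(" ")))
    return columns


mixed = [
    "if a:",        # level 0
    "    if b:",    # level 1: four spaces
    "\tif c:",      # level 2: one tab, meant to be 8 columns
    "\t    run()",  # level 3: one tab plus four spaces
]

for width in (8, 4, 2):
    print(width, indent_columns(mixed, width))
```

At width 8 the levels come out as 0, 4, 8, 12. At width 4, levels 1 and 2
collapse onto the same column, and at width 2 the nesting order is visually
reversed.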

------
bluejekyll
This is such a flamebait topic.

They could just as easily have posted a study of who uses emacs vs. vim. Well,
maybe not as easily, and we'd all sit here debating the pros and cons of
either.

Tools like gofmt will, over time, make these discussions moot.

------
taivokasper
This is a remake on a bigger scale of:
[https://news.ycombinator.com/item?id=10795745](https://news.ycombinator.com/item?id=10795745)

------
ameliaquining
I'm curious, what's going on with C? I was surprised to see it as the only
language other than Go (where tabs are enforced by gofmt) to dissent from the
spaces majority.

~~~
CalmStorm
Probably related to Linux kernel coding convention:
[https://www.kernel.org/doc/Documentation/CodingStyle](https://www.kernel.org/doc/Documentation/CodingStyle)

Quoted: Tabs are 8 characters, and thus indentations are also 8 characters.

------
jasonkostempski
Why no Objective-C? I'd bet it'd be either heavy on the space side or such a
horrible mix of both that the analysis program would become self-aware and
kill itself.

------
ilikejam
$ emacs <file>

C-x h <tab>

Whatever that does, with whatever type of source file, that's what I'm using.

------
goalieca
As for my preference, well let me just say.. screw makefile! You've got it all
wrong!!

------
emodendroket
So basically unless you're using C the argument is settled in favor of spaces?

