While I'm at it, half the justifications Google gives for the shell guide are inaccurate. Looks like a case of "people are used to this style, so we're using it, and we'll try to justify it without knowing why."
Thread's gotten too deep and wide to reply to everyone, so I'd like to vent right here at the top...
We have these machines that are meant to take the tedium out of our lives, let's USE THEM. Why do we not have tools that format things the way I like them? Who cares how the other team members like it? They can use the tool, too. Who cares how it's stored in your VCS? The tool should just be part of the workflow and format spaces/tabs the way I want them and maybe revert to a canonical format for committing to the repo.
Personally, I prefer tabs. I read code best when it's indented four spaces. But John wants to use two spaces in this code. And Ralph likes three. Holy Hell, don't make me wade through someone else's preferences to understand this codebase! Use tabs and people can set them the way they want. Yes, tabs are broken in many places (e.g. Objective-C methods can get so verbose that they need to wrap; convention is to wrap and align the colons; tabbing as far as possible and then spacing to fill breaks things when Ralph formats his tabs to three spaces; in this case, there is indeed a solution: tab to the start of the original line of code, and space to format from there ... but then the IDE has ideas of its own about what should be spaces and what should be tabs...)
Maybe you can read that code with only two spaces of indent, but I can't. I'm gonna need that reformatted to put a larger visual separation between scope changes or whatever required the indentation. Rather than have this debate, a tool should be forged.
> Yes, tabs are broken in many places (e.g. Objective-C methods can get so verbose that they need to wrap; convention is to wrap and align the colons; tabbing as far as possible and then spacing to fill breaks things when Ralph formats his tabs to three spaces; in this case, there is indeed a solution: tab to the start of the original line of code, and space to format from there ... but then the IDE has ideas of its own about what should be spaces and what should be tabs...)
Which is the problem. If you think it's possible to make a tool to solve this, great - do it, I'll happily adopt your tool as a precommit hook and use tabs to indent everything (or go on using spaces - your tool would turn them into tabs, right?) But given that this problem has existed since the '70s and no-one has solved it, I'm inclined to think it's impossible. So I'll use spaces in the VCS, which means agreeing an indent width so that our diffs make sense (and I don't really care whether that's two or four, but it does need to be defined and adhered to across the team).
Disagree. Readability is not about 4 or 2 spaces, it is about strongly enforced convention that makes these typographic details literally disappear from the eyes of the readers. This effect is very noticeable in book publishing: if you ever notice the typography, the typographer failed at his task. It is the same in code, just follow the rules and they will become your preferred guideline in no time.
And repeated experiences everywhere point to the fact that code text is already very hard to manage, very sensitive to alterations, misreading, etc. I'd rather remove layers and tools than add more.
Edit: obviously, readability is all about convention only when the conventions are within the realm of sane, acceptable ones. And by everywhere I mean in teams where the LOC count is above a few thousand.
Disagree with your disagree. There is a difference between the actual code and how the code is styled. It makes sense to have conventions on how to format a for statement but not where to add brackets - making them be on the same line or on their own line makes little difference to noticing bugs and is purely a stylistic preference.
So you mean that the where-the-brackets guidelines found in core FOSS mega-projects, big corps like Google, down to many GitHub personal projects having their own "please follow the guides" in the readme, all of them are misguided, useless, wrong and even, let's say it, plain stupid?
I, on my side, claim that guidelines are there for a reason, and should be followed.
> Use tabs and people can set them the way they want.
If members of a team set tabs the way they want, how do they limit line lengths to 80 columns, which is a requirement that's specified in the Linux kernel coding style that was linked to in the comment you replied to?
This looks at it only from the context of the single dev's machine. I care more about the consistency of the source code that's checked into the repository and not getting false file diffs based on white space differences, and other formatting discrepancies. And for some reason, this seems harder than it should be.
That wouldn't be hard at all. All you need to do is assign a default style and create the diffs in that style.
The hard part is creating a tool that can change between styles. Such a tool would need to be made for every language separately which is a pain. I've been wanting to try something like this for years but have never built up the courage.
It really doesn't matter how many spaces or tabs, what's important is that all code and scripts are written in the same style, which greatly reduces time required to understand someone else's code. At Google, that's a norm.
Agreed, consistency matters most. Regarding whether it matters whether tabs or spaces are used, I've come to realise that I cannot control the tab length when viewing code outside of my editor (obviously!). So, let's say we have an 80 char line length limit. If I set my tab size to be 2 spaces, the line length is obviously going to be different to someone who sets their tab size to be 4 spaces. Just having the option to control the size of tabs means tabs should never be used, as it will result in inconsistencies across editing/viewing environments. Also, when viewing code online, generally you have a limited viewing area. If you're using tabs, the lines are generally going to be very long, and it's very difficult to view the code. In conclusion, if you care about things like line lengths (you should), and if you care about keeping your code in a certain and consistent format, the only way to manage it is with spaces, imo.
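To put numbers on the line-length point, here is a minimal Python sketch. The function name `displayed_width` and the sample line are made up for illustration, and it assumes plain ASCII text so that one character takes one column.

```python
def displayed_width(line: str, tab_width: int) -> int:
    """Columns the line occupies when each tab advances to the next
    multiple of tab_width (assumes one column per character)."""
    return len(line.expandtabs(tab_width))


# Hypothetical line of code, indented three levels with tab characters.
line = "\t\t\treturn compute_total(first_value, second_value)"

for tab_width in (2, 4, 8):
    print(f"tab width {tab_width}: {displayed_width(line, tab_width)} columns")
```

The bytes in the file are identical in all three cases; only the rendered width changes, which is why a column limit on tab-indented code only means something once the team also agrees on a tab width.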
Yup, they seem to have used an overly indent-happy indentation scheme. Two separate indents, one for the 'begin' after the 'for' and one for the block body, are superfluous. This raises the concern that their results are biased towards smaller indents.
I've always been conflicted on this. It's simply untrue that more than 3 levels of indentation indicates bad code. For example, HTML, XML, and JSON often benefit from deeper levels of indentation.
Tabs have the benefit of having a dynamic size, so I could simply adjust the tab width within my editor. Spaces also do not play as well with editors. Unfortunately, browsers offer no means to change the width of tabs, so it's impossible to properly display code using tabs. This is especially frustrating when reading the source of a page indented using tabs. It would be much easier to read with spaces, but not as easy to read as if I were able to set tab width.
I read that piece many years ago and thought who cares, but I've later changed my editor to never use tabs. He's right.
Tabs are nice in theory, but in practice you end up with garbled code. It's better to train yourself to get used to seeing code bases formatted slightly differently, just one of the things you have to get over, IMHO. Like camel case versus underscores versus dashes.
> It's better to train yourself to get used to seeing code bases formatted slightly differently, just one of the things you have to get over, IMHO. Like camel case versus underscores versus dashes.
That's a pretty good point. Indentation is indeed just one small aspect of the overall code style - why would you want to have that configurable and nothing else?
If the editor of the Lisp Machine, which was used by people like Steele, Stallman, Pitman had no problems using tabs, it can't be that Lisp has a technical requirement not to use tabs. Steele wrote a book about C. Stallman wrote a Lisp in C and a C compiler...
Basically your arguments are from a strange parallel world.
What external tool can't interpret a \t character correctly?
To me, the problem is consistency. I'm a 4 space man myself, but tabs don't outright offend me, as long as they're everywhere. I'd rather commit edits with tabs than attempt to convert the whole project to spaces, or worse, commit some spaces and some tabs to a single file/project.
A tab does not mean 8 spaces. That's the whole problem, people keep thinking tabs and spaces have an exchange rate.
If you have a source file with only tabs (and you should, then nobody has to run a re-indent script because they can't read it), you should be inserting tabs. If you have a source file with only spaces, you should map the tab key to insert the correct amount of spaces before editing. If you have a source file with both tabs and spaces for indentation, you should re-indent.
Well, it shouldn't be a problem if you keep your source files clean and properly use tabs for indentation. Then everyone can set their editor to render tabs at their desired width, whether it's 2, 4, or 8 spaces.
It's ludicrous that these space-wielding heretics (to borrow from the kernel style guide) are keeping everyone from having that.
Every editor, or at least every good editor for programmers, can convert 4 (or whatever) spaces to tabs. If it bothers you that much, you can set up your editor so that it tabifies files when opening them and untabifies when saving. This way you won't even know if the file used spaces or tabs - and that's how it should be.
In short: configure your editor to display to your tastes and save to whatever style guide you use in your organization. Problem solved.
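For what it's worth, the tabify-on-open / untabify-on-save round trip is simple as long as only leading indentation is touched. A rough Python sketch, not any particular editor's implementation: the names `tabify` and `untabify` are borrowed from the comment above, and it assumes the stored file is indented purely with spaces at a known width.

```python
def tabify(text: str, width: int = 4) -> str:
    """Turn every full group of `width` leading spaces into one tab.
    Leftover spaces (alignment) are kept as spaces."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" ")
        n = len(line) - len(body)  # number of leading spaces
        out.append("\t" * (n // width) + " " * (n % width) + body)
    return "".join(out)


def untabify(text: str, width: int = 4) -> str:
    """Turn every leading tab back into `width` spaces."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip("\t")
        n = len(line) - len(body)  # number of leading tabs
        out.append(" " * (n * width) + body)
    return "".join(out)


stored = "def f():\n    if x:\n        return 1\n"
viewed = tabify(stored)                 # what you would edit locally
assert untabify(viewed) == stored       # what goes back on save
```

This only rewrites whitespace at the start of each line and does not handle a tab that follows a space. It also knows nothing about the language, so whitespace inside multi-line string literals would be rewritten too; that is the part a real tool would have to solve per language.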
Oops. Good luck then; that is bound to conflict with other editors.
I think the Visual Studio editor (or Eclipse?) used to have 4 characters for tabs; if everybody is on Visual Studio then you are fine. Of course, if everybody in the shop has the same editor / editor settings then life is easier.
vim has 8 spaces for tabstop and that's what its docs say:
"Note: Setting 'tabstop' to any other value than 8 can make your file appear wrong in many places (e.g., when printing it)."
Why is this still a thing that people have to care about? Can IDEs still not automatically convert between company-style and personal-style?
(Sad that tabs solved that problem decades ago (by having a byte that means "+1 indent" instead of having to build some ascii-art that looks like an indent) but people screwed up the implementations so badly :( )
Proper indentation (tabs for indentation, spaces for alignment) is still pretty hard to configure in most editors in my experience. Even in emacs I had to install an external package to get it to behave properly (and it still breaks from time to time).
Failing that I'd rather people just used spaces and no tabs. Using tabs for alignment just means it's going to look fucked up everywhere else. At least spaces are consistent.
Hah... if you think 2 space indents are bad in shell (and C++), you should try writing Python with it (yes, Google coding style, in a fit of madness, insists that engineers try to keep Python blocking straight with 2 space indents). Remember, some of these hapless folks are maintaining large bodies of production Python code (as opposed to itty bitty automation scripts).
Oh.. hmmm.. so this has changed since the (first and) last time I tried to get a piece of python code through a review at Google. Looks like better sense has prevailed finally.
(There's a version number on that public document but as far as I see, no version history so I could tell when the change happened.)
I think 4 spaces is for public code and 2 spaces for internal code. FWIW, I got used to 2 spaces quickly enough, and now it doesn't bother me using either 2 or 4, other than sometimes forgetting to change the setting on vim when changing between projects.
It's fun how the "tabs vs. spaces" war is relevant only where it doesn't really mean anything.
See, most of the languages (I've seen so far) have a well-defined rule of indentation. The thing you get when you do "select all, indent file" in your IDE / editor. At this point it doesn't really matter whether you use spaces, or 2-space tabs, or 4-space tabs, or mix, or whatever - there's only one way non-whitespace characters can be positioned, so it will (in theory) look the same on every editor after you reindent it.
The only place where the choice of tabs vs. spaces actually matters is when you want, for some reason, to break the default rules of indentation for your language and position something manually. But in this case, there's no debate; tabs are not suited for precise, manual positioning. Only spaces will be guaranteed to make the code look the same everywhere.
Totally. I love the kernel style over anything else. Even the GNU style for C isn't something I am very fond of.
Still there are some languages where 4 spaces indent is fine. But two is just horrible. Also I like tabs. Convert tabs to spaces in the editor and it works out pretty fine. 2 spaces is just too cluttery.
The sheer number of replies to this point is staggering, for what really ought to be a total non-issue in this world.
Someone needs to make some kind of GitHub integration that lets you download code using whatever esoteric formatting you prefer, then transform back to some given standard on commit. Then everyone can finally just agree to disagree and get on with life.
I consider the leading pipe clearer, and could be sold on it, but ending each line with an inconvenient slash character (non-US layout) makes me strongly favor the trailing version.
The trailing version seems more natural with operators like comma, but less natural with operators like minus, and is far clearer with semicolonless languages. I suspect if I were to go through old code, I'd find both uses, but generally I prefer trailing.
The exception to this is languages like Haskell, though indentation could serve the same purpose.
I'm surprised there isn't more commentary on when and how to use external but standard or common commands like grep, sed, awk, perl, join, find, tar, parallel, etc.
It's one thing to use bash consistently everywhere but as a heavy multi-machine shell user I've been bitten by incompatible or missing external utilities more often than I care to admit. You might be surprised how many systems aren't using the GNU utilities, have them running in a weird mode or are using ancient versions of them.
Maybe Google is religious about keeping all their environments identical?
I almost always use set -e with my bash scripts;
this way you will always notice if an invoked command failed or not; your script will not report that everything is OK if part of the process actually failed.
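A quick way to see the difference is to run the same two commands with and without `set -e` and compare exit codes. The sketch below drives `sh` from Python purely so the result can be checked programmatically; it assumes a POSIX `sh` on the PATH, and the helper name `run_sh` is made up.

```python
import subprocess


def run_sh(script: str) -> subprocess.CompletedProcess:
    """Run a snippet with sh -c and capture its output (assumes sh is on PATH)."""
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


# Without set -e the failing command is ignored and the script reports success.
without_e = run_sh("false; echo reached")

# With set -e the shell stops at the first failing command.
with_e = run_sh("set -e; false; echo reached")

print("without set -e:", without_e.returncode, repr(without_e.stdout))
print("with set -e:   ", with_e.returncode, repr(with_e.stdout))
```

Note that `set -e` does not cover every case: a command that fails inside an `if` condition or on the left side of `||` does not stop the script.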
That is a particularly useful approach in functional languages, where the function declaration itself is actually a fairly concrete guarantee of what the function will do.
Shell scripts are almost the complete opposite end of the spectrum:
Shell script functions are usually only created as a last resort.
Global side effects (creating temporary files, changing global system state, global variables, etc.) are what shell scripts are all about.
There are rare snippets of shell scripting that are different, using local variables and doing some sort of calculation, but that is the exception, not the rule.
While ideally comments would be prolific, poetic, and perfect, some commenting is always better than none and most developers have bad habits of not commenting their code, so pushing them gently in the direction of more, not less, usually works.
I would rather say that the comments should be succinct rather than prolific. Better to put some explanation on tricky parts of the code as comments, and have method/function/class behavior as javadoc, pod, pydoc etc.
why would you need to re-read every line if you are looking at methods that have a single responsibility and that responsibility is clearly communicated through the name? (and parameter types/name, return types/names, in languages where some of those things are available)
Style guides are the worst to me. Taking some of the small joys left in programming away, while making you feel like a cog at the same time. Especially since many are outdated or just plain wrong, and very difficult to change once established.
Tools that take the AST and output standardized code for peer review and documentation sound a little better. They would not deal well with the only human problem really worth having a style guide for - naming things. But at least humans aren't forced to jump through hoops. And the naming thing possibly can be settled with an interface that asks something like - "what do you want everyone to call the 'BitWarper' symbol?", for all named symbols.
But well written software is no place to express individuality - we just want it to work and not make our eyes bleed when we have to fix it! Even better then, just have machines generate and test all the code based on systems of higher order rules and style guides in situations where factory manufactured code is necessary. The outputs should be reasonable if the requirements are well specified (NASA style). Humans can come in after and do the real fun work in optimizing and finding clever hacks (if environment is not mission critical and such liberty can be safely taken).
Having humans program character by character, with their bare hands, while also suppressing creativity, is unnecessary in this Post-Industrial Age.
Yeah, Google has 80 column limits for almost every language (except for Java, it's pretty difficult there). So do Mozilla and all major open source projects I can think of (again, Java being the exception).
An 80 character line limit almost mandates use of a 2 space indent, for most languages.
Personally I prefer a line limit closer to 100, and 4 space indents in most languages. Some languages end up with a lot more indenting than others owing to structural literals and lambdas, and they benefit more from a smaller indent.
I was surprised how quickly I got used to it. Occasionally I have to trace blocks but not very often, and the tradeoff feels wonderful (it's much more rare that I catch myself re-formatting things to avoid the 80char limit).
Nothing that hasn't been said a billion times already -- and likewise for spaces. A fairly comprehensive overview: http://c2.com/cgi/wiki?TabsVersusSpaces. No, the clever people in this thread are not going to put the issue to bed for once and for all.
It's still an issue, but arguably one you only have when starting a new project. I have yet to hear a single reason for tabs or spaces that justifies changing the indentation for a whole established code base - unless it had inconsistent indentation to begin with.
Well, sometimes I need to align my lines in a certain way (e.g., if function arguments are on multiple lines, I want them to align with each other). It's often impossible to do with tabs, because they are fixed width, so I have to add spaces. But when other people view my code, and their tab width is different, it all goes to chaos. That's why I prefer spaces over tabs.
Ah, I didn't see the distinction between "indent" and "align."
Anyway, it seems like this system necessitates that the leading white space on a given line must be a mixture of both tab characters (\t) and spaces, unless one sets their editor to insert space characters as tabs (which is standard), and is obviously living in a state of sin.
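Under the "tabs to indent, spaces to align" scheme, the leading whitespace of every line should be zero or more tabs followed by zero or more spaces, never a tab after a space. A small Python sketch of a checker for that one rule (the function name `misordered_indent_lines` is invented here, and it cannot tell whether the tab count itself matches the nesting level):

```python
import re

_LEADING_WS = re.compile(r"^[ \t]*")


def misordered_indent_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines whose leading whitespace
    contains a tab after a space."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        leading = _LEADING_WS.match(line).group()
        if " \t" in leading:
            bad.append(number)
    return bad


sample = (
    "\tresult = call(first,\n"
    "\t              second)\n"   # one tab to indent, spaces to align: fine
    "  \tbroken = True\n"         # tab after spaces: flagged
)
print(misordered_indent_lines(sample))
```

Lines that pass this check keep their alignment at any tab width, because the spaces always start from the same column as the line they are aligned with.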
The primary reason people use monospace fonts is historical inertia. It's definitely worth trying out a proportional font if your environment is amenable to it (some tools don't cope particularly well, for sad and disappointing reasons).
Nothing intrinsically, but there is value in consistency, and Google's decided on spaces. One of the rationales for these guides is to give the programmer a way to make a decision about questions where there is no single correct answer.
If you use tabs, then you cannot have a column limit because the line length changes depending on what the tabstops are.
In Linux they say tabs are 8 characters, which no they are not, so they can have an 80-column line limit. Anybody programming with a less than 8 character tab can't get the column limit right (the number of characters on a line changes with the indentation level).
Tabs are invisible characters that don't have a standard width so are always causing problems like this. They are used because many programmers use editors where they would have to actually press space multiple times to indent/unindent (ie bad editors) and because source control doesn't know when an indent change actually means something vs just being cosmetic.
One possible reason is that if you copy a line with tabs from the terminal, you're actually going to end up with the tab converted to spaces in the clipboard. If you use spaces everywhere, you won't have this problem.
Unfortunately, Firefox 24 (on Windows) has problems rendering the style of the XSL stylesheet and gives crap output (basically the full text content with no line breaks). It seems to work on Firefox 21/Linux though. Also IE8/Windows, Chrome 30/Windows and Chromium 25/Linux work.
Under the section 'When to use Shell',
why does the style guide say 'If performance matters, use something other than shell' ?
I was under the impression that since shell script is low level, it should have superior performance.
Google appear to use Oracle Hyperion Financial Management for consolidated reporting and planning (search their job pages - they usually have ads) - I would suspect that this runs on Oracle boxes running Solaris.
These rules look like they came from the low-intelligence paper belt. We do not write rules like that. They must simply be part of the tool. Otherwise, they do not count. What they now did, was to create an opportunity for someone who knows that he is incompetent, to invent a new job for himself, that is, "checking up" with his more competent colleagues, who contrary to him are productive in writing code, on this style guide. Rule number one: Anybody who wants to "enforce" this kind of rules must demonstrate that he is capable of writing a parser that can apply them. It is simply bad practice to create that kind of opportunities. It is bad practice to create that kind of jobs. Therefore, this kind of documents must be rejected.
Of course we write style guides and while I like the idea that it's part of the tool, that's rarely the case.
I don't follow the objection about a colleague helping ensure consistency across a team. I'm really not sure why competence comes into that equation either.
Agree with the idea that any style guide should be automated. Several CI servers I know of can incorporate style checkers and their reports into their workflow so this can be made really hands off, even to the point of automatically failing the code review stage before it's been lumped into the review queue.
Don't agree at all with the idea this type of doc should be rejected. In fact it's completely wrong to jump into automation without having "found which way is up" manually first time around.
Actually, they came from very smart, well intentioned people, who thought everyone would program Lisp in this century. And of course never saw C (and all its related family of languages) coming and ruling for decades.
All these rules make sense for Lisp, and no sense at all for C.
And they did write the parser that applies those rules. In Emacs you never indent Lisp code by hand; you press a key and the current s-expr automagically indents better than you could ever dream of doing it.
> It is not necessary to know what language a program is written in when executing it and shell doesn't require an extension so we prefer not to use one for executables.
I disagree with their recommendation against using file extensions for executables, and I'd love to have my mind changed about this.
Using an extension gives you automatic syntax highlighting. It also lets you quickly glean the type of a file when exploring a directory for the first time, which is more helpful than simply knowing whether the file can be executed.
Why does a lack of necessity override those two benefits?
When you run a program, your concern should be what it is called. Not how it is written.
A language-specific filename extension puts an implementation detail in userspace. If you run a binary executable, you shouldn't care whether it's written in C, C++, Fortran, or any other language (though source files, being used only by developers and compilers, generally do have extensions).
Worse: if you decide for whatever reason to change the implementation language, you're either forced to track down and change all references to the program name, or to retain the (now incorrect) filename extension for backwards compatibility.
And, as noted, magic(5) or the shebang line should correctly identify the file type and language for syntax highlighting -- if not, your editor is broken. Replace it with a shell script, "editor.sh".
file(1) will tell you the types of files in a directory with far greater accuracy than filename extensions can.
The extension is unnecessary for syntax highlighting. Shebang lines are supposed to be used for detecting the file type.
As for knowing the file type at a glance, I'm not sure how often I need this. I'm normally looking for the file by name anyway. If I needed to determine the file-type, I'd write a script to parse the shebang lines of executable files in the current directory and generate a list of the files with their hypothetical extensions (based on a hash/dictionary/whatever). I don't need that very often, so, for me, the trade is worth it.
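A rough Python sketch of that kind of script follows. The `EXTENSIONS` table and all the function names are invented for illustration, and the mapping is nowhere near complete.

```python
import os

# Hypothetical mapping from interpreter name to the extension it "would" have.
EXTENSIONS = {
    "sh": ".sh", "bash": ".sh",
    "python": ".py", "python3": ".py",
    "perl": ".pl", "ruby": ".rb",
}


def interpreter_from_shebang(first_line: str):
    """Return the interpreter's base name, or None if there is no shebang."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    if name == "env":
        # "#!/usr/bin/env python3": skip env's options, take the command name.
        args = [p for p in parts[1:] if not p.startswith("-")]
        if not args:
            return None
        name = os.path.basename(args[0])
    return name


def hypothetical_extension(first_line: str):
    """Map a shebang line to an extension, or None if unknown."""
    return EXTENSIONS.get(interpreter_from_shebang(first_line))


def list_executables(directory: str = ".") -> dict:
    """Map each executable file in `directory` to its hypothetical extension."""
    result = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            with open(path, "rb") as handle:
                first = handle.readline().decode("utf-8", "replace")
            result[entry] = hypothetical_extension(first)
    return result
```

Compiled binaries and scripts with an unlisted interpreter simply come back as `None`; for those, file(1) is the better tool anyway.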
Your benefits are benefits only to people who are going to be editing these files. They are not benefits to people who are going to be using these files (i.e. running the script). Most people are not going to be editing the script, they are going to be using it. Thus, it makes sense to cater to them, and to not have them have to know things they don't care about, like what language the tool they are using was programmed in.
One reason: let's say you write tool 'foo.sh' in Bash. It does what you want and you move on to other things.
It's suddenly a year from now, and your foo.sh tool needs some new features, or is too slow to do the job any more because your requirements have changed.
You decide it's grown too much for Bash and want to move to Python for maintainability, or Go for performance, or C++ to link with some library you need to use.
Now you have to tell your team (and any other teams that have found your tool useful): "We only want to maintain one version, so don't use 'foo.sh' any more. You have to use 'foo.py' (or 'foo.exe' or whatever). Oh, and have fun changing all YOUR scripts and tools that reference 'foo.sh'!"
You now have two files to do one thing, and your solution doesn't work (it doesn't pass the arguments from the shell script to the python, or pass the exit code back, and it's making an extra process which can screw up monitoring). You could add the extra code to make it work; but even if you did, you're going to a lot of extra effort to become equal, not better :P
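For reference, the two things such a wrapper has to get right are passing the arguments through unchanged and handing the exit code back. A minimal Python sketch of a forwarding shim (the name `forward` is made up, and `./foo.py` in the comment is just the hypothetical new implementation from this example):

```python
import subprocess
import sys


def forward(target: list[str], args: list[str]) -> int:
    """Run `target` with `args` passed through untouched and return its
    exit code so the caller can propagate it."""
    return subprocess.run([*target, *args]).returncode


# A shim kept at the old name would end with something like:
#     sys.exit(forward(["./foo.py"], sys.argv[1:]))

# Self-contained demonstration using the current Python interpreter as the
# target: the child exits with the number of arguments it received.
child = [sys.executable, "-c", "import sys; sys.exit(len(sys.argv) - 1)"]
print(forward(child, ["one", "two words"]))
```

This version still leaves the extra process around. Replacing the shim's process instead (`os.execv` in Python, or `exec ./foo.py "$@"` in a shell wrapper) avoids that, which addresses the monitoring concern, but it is still extra work just to keep an old name alive.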
Most of the things you mentioned can be solved in ~15 minutes, surely not a lot of extra effort.
And I'm not sure how a script that calls another script screws up monitoring.
Plus, if someone cares about high level stuff like monitoring but is worried that people's tools might break if he changes script.sh to script.py later on, I think he needs to sort out the lower level stuff first. Like distribution and packaging :)