Black: An uncompromising Python code formatter (github.com)
447 points by kumaranvpl 9 months ago | 255 comments



This is great except for the enforcement of double-quotes around all strings and spaces around slice operators. These two choices contradict the standard Python documentation, most of the standard library, and the behaviour of the interpreter itself.

When the language itself has an established convention, Black should follow that convention, not fight it. These two weird choices just generate needless churn, which is surprising as it seems Black has quite the opposite goal.

It's a shame, because all the other design choices in Black are pretty good!


Colons in slices are implemented to the letter of PEP 8. Your disagreement here probably stems from pycodestyle mistakenly enforcing a different rule (no spaces before colons on if-statements, defs, and so on) in the slice context.

The language itself doesn't have an established standard in terms of string quote usage. If it did, Black would follow it. What repr() does is a weak indicator, and how the documentation is written is random: not only was there no enforcement as to which quotes to use, there wasn't even a recommendation. Black standardizes on double quotes since it has clear benefits whereas the other option does not.


I thought of another way to express this that might resonate better.

What I like about the Black philosophy is that it wants to make code style _uninteresting_. People should think about other things, not formatting. That's a great goal. So it seems to me that the best style choice is the most _boring_ choice. The least creative, least novel way. It should try to avoid inventing new formatting algorithms.

What's the most boring way to format a string literal?

The way the language already does it. The way every Python programmer has already seen string literals formatted from the very first day they started typing things into a Python interpreter.

Even if half of us liked single-quotes and half of us liked double-quotes, you can guarantee that every Python programmer on the planet has seen and lived with strings that are formatted the repr() way, including the double-quote fans. You can't guarantee the opposite.

No one could fault you for doing it the repr() way. Blame Guido :) he made that choice decades ago, and everyone has already had to make their peace with it. It's a solved problem. For anyone writing a program that emits Python code, it's the default way to format a string. It's the least assailable option, and that's a good thing.

How does that sit with you?


I’m a Python programmer and I never had the perception that single quotes are more canonical in any capacity. I always felt that single quotes were for degenerates who didn’t realize that single quotes are most commonly used for chars by the broader programming community. ;)


I've seen single quotes for chars vs. double quotes for Strings in Java, but in sh, it's "Strong Quoting with Single Quotes" vs. "Weak Quoting with Double Quotes":

http://www.grymoire.com/Unix/Quote.html#uh-1

"When you need to quote several character at once, you could use several backslashes. This is ugly but works. It is easier to use pairs of quotation marks to indicate the start and end of the characters to be quoted. Inside the single quotes, you can include almost all meta-characters"

So some of us associate single quotes with proper const strings, with no variable expansion, command substitution or other interpreter hanky-panky.


I’ve always thought that double quotes are for degenerates who don’t pay attention to what a standard repr output is for strings.


And I’ve always thought that triple quotes are for degenerates who... no, wait, nvm.


... who can’t write self-explanatory code.


Code only captures the what and the how; comments on any complex routines can capture the why. Overuse is probably a red flag, but in my experience code alone doesn’t always capture enough context in any sufficiently complex system.

I get your specific comment may be a bit sarcastic, given the ancestry - but I make this one more for younger developers I see that don’t always filter that ;)


You don't have to press Shift as often to type string literals so it's a win, all other things being equal.


I despise people using single quotes even more than people using tabs instead of spaces, emacs instead of vim or GNU-style indentation instead of... anything sane :D


I used to use single quotes for dictionary keys and other constants, and double quotes for strings intended to be read by humans. I read an article about it and considered it a good convention. Now I see it actually isn't a convention. :-) I switched to black just yesterday and overall I like it, though I still miss this distinction between single and double quotes a bit.


Slices: My reading of README.md is that Black inserts spaces around the colon, is that right? Almost none of the Python I've seen in 20+ years is written that way. The tutorial and documentation on python.org don't use spaces around the colon, and it is extremely rare in the standard library (out of over 1100 slice expressions that have operands on both sides of a colon, I count 10 that use the extra spaces).

Quotes: We have a difference of opinion over what constitutes an established convention. If the language's own way of displaying strings has been stable for its entire history, I consider that established. A reasonable choice would have been to do what repr() does (single-quotes unless the string contains a literal single-quote) or a simplified version of it (single-quotes always), not the opposite of what the language itself does.
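
For anyone who hasn't looked closely at what repr() does, here is a minimal sketch of the behaviour described above. The sample strings are made up for illustration; the rule itself is built into CPython's str.__repr__.

```python
# Illustrative strings only; repr() picks the quote character itself.
samples = ["spam", "can't", 'say "hi"', "it's \"both\""]

reprs = [repr(s) for s in samples]
for r in reprs:
    print(r)

# Printed output:
# 'spam'            <- single quotes by default
# "can't"           <- switches to double quotes to avoid escaping the apostrophe
# 'say "hi"'        <- single quotes, nothing to escape
# 'it\'s "both"'    <- both kinds present: single quotes, apostrophe escaped
```

So repr() is not "single quotes always": it falls back to double quotes only when the string contains an apostrophe and no double quote.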

(I wouldn't care so much about this except that I've really wanted a tool like Black for a while and greatly appreciate the philosophy!)


> My reading of README.md is that Black inserts spaces around the colon, is that right?

I can't speak for the readme, but:

    $ cat test.py
    x = [1,2,3]
    print(x[1: 3])

    $ black test.py
    reformatted test.py

    $ cat test.py
    x = [1, 2, 3]
    print(x[1:3])


Oh, I must have misunderstood. Sorry! I take back my issue with the slices, then.


The README confused me, but I think it's saying that it inserts spaces if needed to make it clear that it's a lower-precedence operator. But unlike operators like +, it's not one that inherently requires spaces.

I tried out black and it does the following (the file originally had no spaces):

    x = a + b
    x = m[a:b]
    x = m[a + 1 : b]
    x = m[a:-b]
    x = m[a : 1 - b]
I would personally still write m[a + 1:b], I think, but black's approach is totally defensible. (I guess I would really write m[a+1:b], and black would rightly correct me.)


m[a+1:b] and f(x=y+1) are weird corner cases of Python formatting.

I usually avoid this problem by adding extra parentheses:

  m[(a + 1):b]
  f(x=(y + 1))


Wouldn't being closer imply being used together by the operator? So: 6/2 * 3 = 9 but 6 / 2*3 = 1? In the slice context it seems very obvious the colon has to be slicing, because it can't stand alone to produce an indexable value. But spacing should use this rule, right? [I think this rule was the one used by fortress, the Guy Steele language?]


Yes, that's the rule black is implementing (assuming I'm reading your comment right and understanding its README right...). m[a + 1:b] is in danger of being read as "take the slice 1-to-b, add it to a, index m on that" - even though we know that isn't a well-typed set of operations, the spacing implies that's how it should be read, same as e.g. m[a + 1/b].

Spacings that don't conflict with precedence are m[a+1:b], m[a+1 : b], or m[a + 1 : b]. While the middle one makes precedence obvious, all three are acceptable. black picks the last one, and I think it picks that one because of another rule that it should prefer writing a + 1 instead of a+1.
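
To make the precedence point concrete, here is a small sketch using the standard library's ast module. It only shows how Python itself parses the expression; it says nothing about how Black is implemented.

```python
import ast

# Parse the subscript and look at what sits directly inside the brackets.
subscript = ast.parse("m[a + 1:b]", mode="eval").body
slice_node = subscript.slice

# The slice is the outermost node; "a + 1" is nested inside it as the lower
# bound, so the colon binds more loosely than the plus.
lower_is_sum = isinstance(slice_node, ast.Slice) and isinstance(
    slice_node.lower, ast.BinOp
)

# Python ignores spacing when evaluating arithmetic: both spellings give 9.0.
same_value = (6/2 * 3) == (6 / 2*3)

print(type(slice_node).__name__, lower_is_sum, same_value)
# Slice True True
```

The spacing is therefore purely a hint to the human reader, which is why a misleading layout like m[a + 1:b] is worth avoiding.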


Double quotes also have drawbacks: visual noise and the doubling of keypresses required on the most common keyboard layouts.


The "visual noise" makes it clearer to me that it surrounds a block of text, so I don't buy this.

Double quotes are also the convention in English to delineate a literal, so I would argue it's more obvious.

I'll grant you keyboard presses though. There's an obvious advantage to single quotes here, but consider why this is the case: to make typing the apostrophe more efficient, since it appears far more often than a double quote in English. I suppose it's pragmatic to leverage this advantage in code where string quoting is extremely common...

Perhaps double quoting is a bias I've developed writing code, but I suspect it's actually a bias I carried over from reading and writing English and what simply seemed more obvious.

Language design that supports both makes me crazy. Pick one and enforce it!


You can still type single quotes. You have a tool to convert that for you.

The visual noise complaint is interesting. Do you also consider the letter W to be more noisy than the letter V? Should we discourage the use of noisy letters in the alphabet?


A double-quote is more noisy than a single-quote, and W is more noisy than V.

The difference is that " and ' are equally usable options in the context we're talking about. Quotes are very common, so the visual noise adds up when your screen is full of quote marks. Given that they mean the same thing, and one is both harder to type and harder to read, it makes sense to prefer the other.


> Given that they mean the same thing, and one is both harder to type and harder to read, it makes sense to prefer the other.

Nailed it.


A double quote is in no way harder to read. Only on Hacker News.


this seems a bit reductive.


Yes, that's why www. was dropped, and quotes are used everywhere in most Python code. The triple double quotes for docstrings are the worst example, though I have no illusions PEP 8 will be changed any time soon.


> The visual noise complaint is interesting.

I thought the same. I guess you could just modify your programming font to make the double quotes really tiny :)


I agree about the noise, but, it's much easier to work with. raise FooException("Can't load bar") works, as does f"The flange is elevated by {flange['spronge']} degrees of spronge."


Goes both ways, sometimes there's text to quote:

raise KeyError('"%s" not found.' % name)

f'The flange is elevated by {flange["spronge"]} degrees of spronge.'

I don't normally use many contractions or possessives in my code, but am willing to admit it happens occasionally.


Yep, that's why I like how repr() does it. Guido solved this nicely a long time ago :)


Eh, many programming languages use double quotes for strings, and single quotes for single characters. The languages I know of that can use single quotes for strings are Ruby and Python.


Don’t forget VimL, the language that insanely decided to use double quotes as a “start of comment” indicator, then went back and gave them special meaning based on their position so they either signal the start of a comment or a string.


A different comment character could have been chosen, but this affects nothing. It's not insane.


Lua, and there are others. I favor double quotes for this same reason even when single quotes are allowed.

I'd like «guillemets» but that's going to have to be self-serve...


I use single quotes when the string in question is used like an enumeration:

    set_color('black','white')
And double quotes for strings meant to be read by a human:

    print("Hello there.  How are you?")
And an example with both forms:

    syslog('warning',"unit %s spin rate %f too low",u,rate)
For me, it's an indication of how I expect the string to be used.


And even then, Perl and Ruby (heck, even Bash) have a semantic difference between double- and single-quoted strings. The only other popular general-purpose languages where they're synonymous are PHP, JavaScript, and Lua.


> The only other popular general-purpose languages where they're synonymous are PHP, JavaScript, and Lua.

PHP does treat single and double quotes differently. The contents of single-quoted strings are not parsed for variable substitution.


Also Pascal and SQL.


And JavaScript!


Hmm. My reading of PEP 8 on slices is that the spaces surrounding : in slices are optional, and used to implement the later rule "If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator."

This leads to the following difference between the PEP 8 recommendation and Black:

PEP 8 accepts both of these:

  ham[lower+offset : upper+offset]
  ham[lower + offset : upper + offset]
While Black just uses:

  ham[lower + offset : upper + offset]
Also, this is one that PEP 8 doesn't have an example of, but I think this would be one of the cases in which the space isn't necessary:

  slice[a.b : c.d]

The "." operator binds so tightly in my mind that it doesn't need the spaces around the ":" to disambiguate. It's at the same precedence level as subscription and function call, and higher precedence than unary + and -.

After typing this all out, I think that it might be the case that the bigger difference is actually in how binary operators are treated. It looks like Black always puts whitespace around binary operators, which contradicts PEP 8's recommendation that you are allowed to vary spacing around binary operators to make precedence clear.

Since Black does allow for leaving in extraneous parentheses to make precedence clear, I wonder why it doesn't allow varying the space around binary operators as well? Of course, it should enforce that the spacing actually does match the precedence, and that the spacing is consistent within an expression. That would allow the following examples which PEP 8 lists as "Yes":

  x = x*2 - 1
  hypot2 = x*x + y*y
  c = (a+b) * (a-b)
While right now Black rewrites them as follows, which PEP 8 lists as "No" (though my reading is that varying the spacing like this is optional, so the following could be accepted as well):

  x = x * 2 - 1
  hypot2 = x * x + y * y
  c = (a + b) * (a - b)
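
As a sanity check on the discussion above, here is a minimal sketch showing that the two spellings differ only cosmetically. same_ast is a hypothetical helper written for this comment, not part of Black or PEP 8 tooling.

```python
import ast


def same_ast(a: str, b: str) -> bool:
    """Return True if two snippets parse to the same syntax tree.

    ast.dump() omits line and column info by default, so only the
    structure of the code is compared, not its layout.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# The PEP 8 "Yes" spellings paired with what Black currently produces.
pairs = [
    ("x = x*2 - 1", "x = x * 2 - 1"),
    ("hypot2 = x*x + y*y", "hypot2 = x * x + y * y"),
    ("c = (a+b) * (a-b)", "c = (a + b) * (a - b)"),
]
results = [same_ast(a, b) for a, b in pairs]

# Misleading spacing does not change the parse: "*" still binds tighter.
misleading = same_ast("x = x * 2-1", "x = (x * 2) - 1")

print(results, misleading)
# [True, True, True] True
```

Since the interpreter sees the same tree either way, the choice between the two layouts is purely about readability, which is exactly the subjective call a formatter has to make once and stick to.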


Black does not take existing formatting into account. Doing so would cause non-deterministic formatting (Black would sometimes change its mind about its own formatting and a second pass would cause a different formatting than the first).

With this in mind, it has to enforce a rule around operators. Since any operand might be complex, it's more robust to default to spaces around operators always. Otherwise we would inevitably end up hugging operands with an operator that humans consider too tight. And since that's subjective, there is actually no rule that we can hard-code about that.


> Black would sometimes change its mind about its own formatting and a second pass would cause a different formatting than the first

Why not? I mean, I never built a formatter so idk about the complexity involved, but to me it seems that if Black had a rule like "if you see `a+b` (no spaces), just leave it alone", applying the rule twice shouldn't change anything.

(though I imagine that the interactions of such rules could become hard to understand...)
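
Here is a toy sketch of that idea, to see what a "leave it alone" rule would and wouldn't guarantee. Both functions are hypothetical regex hacks that only handle a binary + in simple inputs; they do not model Black's internals.

```python
import re


def normalize(src: str) -> str:
    """Toy rule A: always put exactly one space on each side of '+'."""
    return re.sub(r"\s*\+\s*", " + ", src)


def preserve_hugging(src: str) -> str:
    """Toy rule B: leave 'a+b' alone, but collapse wider gaps to one space."""
    return re.sub(r"\s+\+\s+", " + ", src)


hugged, spaced = "x = a+b", "x = a  +  b"

# Rule A maps both inputs to the same canonical text.
print(normalize(hugged), "|", normalize(spaced))
# x = a + b | x = a + b

# Rule B is stable when applied twice, but its output depends on the input.
print(preserve_hugging(hugged), "|", preserve_hugging(spaced))
# x = a+b | x = a + b
```

In this toy, applying rule B twice indeed changes nothing, but two authors typing the same expression differently end up with different "formatted" files, so the formatting decision is back in human hands.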


The whole idea is to _prevent_ people from thinking about formatting, by actually making it impossible to make any formatting decisions. There's exactly one way to format the program and that's it. It's draconian, but that's what's great about it.


That's the feeling I got from the readme, but GP sounds like this particular decision was made not so much for 'philosophical' reasons, as because Black-ing a file twice might give different results. I was curious why :)


I'm new to python, I didn't realize there's a difference, can you help me understand the benefits?


I put it in the README. Let me know if anything is unclear.


I have to admit that I was very excited to begin enforcing black on all of our projects at work - until the double quotes decision. I’m used to single quotes everywhere (except docstrings!).

I’m probably going to still move to black but I know I’m likely to face some pushback now on this one choice, and I don’t yet have the will to defend it.

I also know I’m being mostly unreasonable .. “why is this opinionated tool not perfectly aligned with MY opinions??” .. but feelings.


The double quotes are the only thing that bother me as well. Double quotes are harder to type (on my keyboards) and look more noisy. All other Black formatting I think I could live with. I suppose I can live with double quotes on strings too but I'm not going to like it. ;-)


That's the #1 complaint I have (and hear) about Black. Everything else, I can live with. And honestly, I can live with using double quotes everywhere, especially when I can type single quotes and then let it re-write them. It's just that they currently look really, really odd to me because that's not how Python is traditionally written.


I prefer a quoting convention of single quotes for text identifiers (as they rarely contain quotes) and double quotes for English text (because contractions are common). Thus:

    key['first_name'] 
    print("Isn't this clearer?")


I generally follow this convention:

1. Any string literal that contains either ' xor " will be delimited by the other one.

2. For all other string literals, delimit by " if it's meant for human eyes only, delimit by ' if it's meant for computers as well.


Same, though I'd word the first one as programmatic identifiers.

Adopted that from Erlang, where atoms are single-quoted and text strings are double-quoted.


Actually I prefer standardizing on double quotes and I do that in my code on a regular basis. I know some people use a mix, sometimes single quotes and sometimes double quotes, which is fine, but I prefer consistency in this case.


PEP8 doesn't have a preference between single or double quotes. Black has:

> It will replace the latter with the former as long as it does not result in more backslash escapes than before.

And I think that's good enough (and in compliance with PEP8)
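
For illustration, the quoted rule (prefer double quotes unless that costs extra backslashes) could be sketched roughly like this. choose_quote is a hypothetical helper that works on string values and assumes they contain no newlines or other control characters; Black itself rewrites source text, and this is not its implementation.

```python
import ast


def choose_quote(value: str) -> str:
    """Build a string literal for value, preferring double quotes.

    Single quotes are used only when double quotes would need more
    backslash escapes. Assumes value has no control characters.
    """
    escapes_if_double = value.count('"')
    escapes_if_single = value.count("'")
    quote = "'" if escapes_if_double > escapes_if_single else '"'
    # Escape existing backslashes first, then the chosen quote character.
    body = value.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + body + quote


for text in ["hello", "can't", 'say "hi"']:
    literal = choose_quote(text)
    # The generated literal must evaluate back to the original value.
    assert ast.literal_eval(literal) == text
    print(literal)

# "hello"
# "can't"
# 'say "hi"'
```

Note that under this rule a string like 'say "hi"' keeps its single quotes, so the output is not literally "double quotes everywhere".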


Coming from a C background I always stuck with double quotes for strings. Single quotes are used for characters in C. I think double quotes are better anyway because an apostrophe is much more likely to turn up in a string than a double quote itself. I rarely have to manually escape quotes inside strings.


The slices thing seems to follow PEP8.


Everyone has an opinion, and this is theirs. That's kind of the whole point.


At Facebook, we are now using prettier[1] on all our JavaScript files, a growing number of Hack files are formatted with Hackfmt[2] and now black is being rolled out for Python. It's a really exciting time :)

[1] https://prettier.io/ [2] https://github.com/facebook/hhvm/blob/master/hphp/hack/src/h...


Hey Vjeux. What does black mean for the prettier python plugin[1]? I had high hopes to move over all projects to prettier (for JS/Python). Is there going to be any merging between prettier python and black? Do you recommend one over the other?

Thanks

[1] https://github.com/prettier/plugin-python


I don't know :)

I worked on prettier myself because I wanted to solve formatting for the language I was involved in. It turned out that the prettier infrastructure was actually really good for other languages so we used it for CSS, Markdown, GraphQL... and added support for a plugin system for other people to build printers for their own language. patrick91 (not working at Facebook) is working on a python formatter using the prettier infrastructure.

Independently, ambv (working at Facebook) started black which is written in Python. He's part of the Python core team and the Python infrastructure team at Facebook so it made sense for him to drive adoption of black within Facebook.

One interesting thing I realized is that communities are built around programming languages and it's really hard to influence another community from the outside. So my bet would be that black has the most chance of succeeding within the Python community.


> One interesting thing I realized is that communities are built around programming languages and it's really hard to influence another community from the outside.

IMO, that point is worthy of a detailed blog post or conference talk, if you would be so inclined. Would love to hear more.


That's true, I'll consider it, in the meantime here are some thoughts.

When I started working on React Native, I thought that the most difficult thing would be to design a good set of APIs to make it easy to write mobile apps using React that felt good. This turned out to be the "easy part", we started the project wanting to solve this and having lots of good ideas on how to do it.

What turned out to be a lot harder was the fact that we were trying to use JavaScript from within iOS and Android ecosystems.

1) Those at the time were in different repos, how do you synchronize code between them?

2) The three ecosystems use a different set of tools for everything: IDE (xcode, intellij, sublime/atom/code/emacs), package manager (cocoapods, maven, npm), linters (eslint), build (how do you hook up with the play button in xcode?), profilers (can you display stack traces with the two languages calling each other?)...

3) Mixing and matching languages inside of a single project is hard because there are a lot of subtle different semantics (eg: javascript doesn't have int32 or int64). If you have type systems, they are incompatible (flow vs obj-c). So in practice you end up with a lot of boilerplate to talk between the two languages and it's a performance overhead.

There's also a social aspect where you invested so much learning an ecosystem that it becomes part of your identity. So you see someone wanting to bring another language as trying to attack you directly.

My mission since then has been trying to "break down the silos" and trying to build tools that can work with all those languages. It's not been easy :)


Wow, that website is hard to use! You can’t touch scroll down on iOS unless you click towards the bottom of the page as scrolls in the animation zone are ignored.


Why black over yapf?


Black is a simple tool. It tries to implement a single code style well. It's not configurable.

We tried YAPF before and could never roll it out for everybody. I even contributed the "facebook" style to the tool. There were a few reasons why YAPF didn't work out for us but the most important were:

- YAPF would at times not produce deterministic formatting (formatting the same file the second time with no changes in between would create a different formatting); Black treats this as a bug;

- YAPF would not format all files that use the latest Python 3.6 features (we have a lot of f-strings, there's cases of async generators, complex unpacking in collections and function calls, and so on); Black solves that;

- YAPF is based on a sophisticated algorithm that unwinds the line and applies "penalty points" for things that the user configured they don't like to see. With a bit of dynamic programming magic it arrives at a formatting with the minimal penalty value. This works fine most of the time. When it doesn't, and surprised people ask you to explain, you don't really know why. You might be able to suggest changing the penalty point value of a particular decision from, say, 47 to 48. It might help with this particular situation... but break five others in different places of the codebase.


Another issue is that yapf uses an algorithm that is quadratic, so code you actually see in practice can take seconds to minutes to format.

The authors have been adding specific workarounds for some cases but it's a general issue with the approach:

- https://github.com/google/yapf/issues/264

- https://github.com/google/yapf/issues/39

Black's algorithm doesn't explode in such a way. It's also a lot faster overall, which makes it possible and reasonable to enable a good format-on-save experience.


I do like Black's line wrapping rules of keeping things on separate lines to minimize diffs. I really hate auto formatting tools that try to compress lines vertically through binpacking techniques that produce unaligned code.


RE: non-deterministic formatting: I'm not familiar with YAPF, but what kind of behaviour does it have? Like if you run it on two copies of the same file you might get different results, or that it's not idempotent (i.e. sometimes YAPF(YAPF(source)) != YAPF(source))?
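
To pin down the two properties I'm asking about, here is a minimal sketch. The checkers are generic; strip_one_paren_layer is a made-up toy formatter that has nothing to do with how YAPF or Black actually work.

```python
import re
from typing import Callable

Formatter = Callable[[str], str]


def is_repeatable(fmt: Formatter, src: str, runs: int = 5) -> bool:
    """Same input, same output on every run (deterministic)."""
    return len({fmt(src) for _ in range(runs)}) == 1


def is_idempotent(fmt: Formatter, src: str) -> bool:
    """Formatting already-formatted output changes nothing."""
    once = fmt(src)
    return fmt(once) == once


def strip_one_paren_layer(src: str) -> str:
    """Toy formatter: removes one redundant layer of parentheses per pass."""
    return re.sub(r"\(\((.*)\)\)", r"(\1)", src)


src = "(((x)))"
print(is_repeatable(strip_one_paren_layer, src))  # True
print(is_idempotent(strip_one_paren_layer, src))  # False: needs a second pass
```

The toy is perfectly deterministic yet not idempotent, so the two failure modes really are distinct.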


> It's not configurable.

Except for the line length:

> if you're paid by the line of code you write, you can pass --line-length with a lower number.


Yeah, I wasn't willing to die on that hill, especially since I'm introducing a new default value that wasn't popular before.


Ah but there's the irony.

Everyone would like to stick to the standard formatting rules. Well... except for that one little idiosyncratic thing that they just won't sacrifice.

Then black comes storming in to enforce conformity. Well... except for that one little idiosyncratic thing the author of black just can't bring himself to adhere to.

Don't get me wrong. I'm a big fan of the approach in general. I use things like paredit-mode and aggressive-indent-mode and whitespace-mode and I've just set up emacs to use black together with blacken-mode, but with blacken-line-length set to 80.


Time for a fork!

Let's call it "black-ish"


I've got to say, I'm quite happy with the default line length in Black.

The reason I always fight with people who want to increase line length is that they generally want to increase it to 100. Which is just a bit too big to fit two columns comfortably side by side at a reasonable size on a laptop screen, or three columns side by side on a wide desktop screen, which is how I generally set up my editors.

I'm always frustrated with codebases that use a 100 character standard, since I'm always running into lines that wrap, but when I use type annotations, 80 characters causes too many function signatures to have to wrap.


It was a good call. Some code bases use long identifiers and an 80 column limit gets to be a mess.

And for the 80 column fetishists, you'd have to pry their VT-100 terminals from their cold, dead hands. Disposing of the body is enough of a pain, but worse, you have to find a recycler who can deal with the lead glass in the CRT.


The modern argument for lower-line-lengths is the ability for more screens+fonts+UIs to display two blocks of code side-by-side. Somewhere between 80 and 120, you start to force many UIs to wrap. Adding to the subjectivity, approx nobody cares about one line of long code out of 100,000 -- so what percent of code-that-wraps is too much?

To avoid dying on that hill, many eng leads just throw up their hands and give in to the 80-column folks, even if it's less productive for everybody else.

The beauty of gofmt and black is that formatting can become a commit hook, so no human wastes time cutting lines manually like some early 20th century typesetter.


When I read that line in the documentation, it hurt a little. I'm going to be honest here. I did not feel good with someone saying that an 80-column limit is just there to pad my paycheck, rather than an informed decision I made. I realize that it was tongue in cheek, but it still felt bad.


If that hurt you, you need bigger problems in life.


I like a line length rule because I use an editor that can show me multiple files side-by-side, and keeping things to a certain length ensures that I can productively do that.

And my terminal happens to be around 240 characters wide for the font size I use and the size of my laptop's screen, which means if I limit to 80 characters or less I can snugly fit three files, or if I set to 100 I can get two with some breathing room.

Plus, if a line really is going over those lengths, then it's often a code smell: maybe I've got code that's too complex and ending up deeply-nested, or maybe I've got functions or methods taking way too many arguments, and hitting a line-length rule will warn me about that.


Why are you being so dismissive? It seems like you are making this into some kind of contest to see who has bigger problems.

I am just being honest, and I think I am being helpful because other people will also react negatively to the way the documentation is written. I support the project and want it to be successful, providing feedback like this achieves progress towards that goal, in my estimation.


yapf is not consistent: you can rerun it on source and it will reformat it multiple times. The rules for how it formats are also hard to understand.


Prettier is what I miss most every time I switch from JS to Python. I hope Black fixes that.


I feel the same way. Prettier is a dream--instant speed, great output and 100% reliability for over a year.

So far black seems great. I just ran it on some existing Python packages and it was fast and the output was correct. Still need to try the editor plugins but very excited so far.

Now, if only someone would lead a similar project for R. :)


That's been my experience. I'm using the VSCode plugin and have for the first time ever enabled format-on-save for Python. I have yet to have any reason to see that as anything other than a positive.


In case of credit where credit’s due, is gofmt where the concept of auto-formatting syntax became mainstream? At first I hated it because I thought I had a style, but later on I enjoyed the consistency far more than I enjoyed my signature approach.

A programming language has an opinion on how you build software with it, so it’s appreciated that it also has an opinion on how you should write it so that it remains consistent and easy to follow. No debate about where to put braces or semicolons or whatever else doesn’t matter when putting something in front of your users.

It feels like dumbing down in a way, which is sad, but I think this is more for the benefit of collaboration than individualism or artistic intent. In that case you either disable the tool or refuse to use it.

In every other case, you’ve automated away almost every nitpick from a code review.


When I heard about gofmt, I hated it. I wanted choice. Then I realised I loved Python because it forced people to indent. I gave black a try. I still hate some style decisions, but who cares? The benefits are far too great to pass up.


I felt the same with Prettier and the 80 char limit. At the same time, it solved every single style problem except the choice of line length!

That is a fantastic achievement.


I think the unconfigurability of gofmt may be its greatest strength. Indent, etc. have existed for years but getting everyone to just agree on a reasonable style is invaluable. GNU Indent's default style is one that (almost) nobody uses and the number of available options is overwhelming. rustfmt's default style is fine, but it also has an overwhelming number of options, in addition to not handling line wrapping well. ocp-indent is the only other formatter I've used that's as painless as gofmt.


In the Perl world auto-formatting syntax is old hat, though of course (it being Perl) the formatter is highly configurable[0]. So in practice you get consistency within an organization, but perhaps not over time.

Coming from that background I was initially put off by the lack of options with gofmt, but once I realized the whole world of go coders would be using the same formatting I immediately fell in love with it.

[0]: https://metacpan.org/pod/distribution/Perl-Tidy/bin/perltidy


Configurability for this means:

- debate

- time to setup the tool to your liking

- testing (and adapting to the style)

- and going back to the first point from time to time

So at best it's going to be costly, morally draining and repeated regularly, especially if you change team or in open source. At worst, which is the case for perl, it will not be used by most devs.


Go is probably where it went mainstream in the current generation, yeah, but the idea has definitely been around for a while.

The earliest implementation I know of was COMAL, a very clean and tidy version of BASIC, I think dating from around 1980.

A lot of 80s micro BASICs did some level of auto-formatting purely as a side-effect of the source code being stored in tokenized form to save memory.


A friend had COMAL on his C-64. It was very impressive.

Microware Systems Corporation's BASIC09 from about the same time took the code you entered and converted it into "I-code". It's VM code, Jim, but not quite as we know it. The basic09 program didn't do the things we expect from IDEs now, but it did let you modify code etc., so its internal form reflected the BASIC09 statements and control structures rather than having lower-level branch and conditional branch instructions. That let it prettyprint your code with a consistent format when you listed it. (It also let it avoid the insane interpretation and symbol table lookup overhead of Microsoft BASICs of the era, inherited from the days when Altair BASIC had to run on a system with 4K of RAM.)


Visual Studio has done auto formatting of C# since at least 2005. I've used many other tool chains that did it before Go became popular. As others noted, the biggest difference with gofmt is probably the lack of configuration.


As usual the practice can be traced back a long time to Lisp. It's completely expected that Lisp code will be formatted in the standard way. This isn't a Go thing.


I really want to use Black, but we have a particular style point that we’d miss so much that we haven’t adopted it yet...

Double quotes for strings that need to be human readable, single quotes otherwise.

This makes it so obvious when something is going to be sent to the user, we find it really useful. That said, I think Black’s appeal is its uncompromising nature, so I wouldn’t ask it to change. Adding the option to turn off quote formatting would probably go against its vision. Also, it could be argued that we should use the internationalisation functions to denote strings sent to the user, but hey we don’t do i18n yet.

For now, this, and one or two places that it fails to have an opinion (number of lines after imports) are keeping us from using it.


Do you find your team enforcing the string quote rule consistently? It seems to me like it's easy to miss at times as automatic enforcement is impossible. Are there no cases where a string that wasn't originally planned to be user-visible ends up being so? I've heard this idea at times but when I looked at actual codebases it turns out it's more of an aspiration than an actual rule. And if you can't depend on it, why have it?

As for number of lines after imports, how is a lack of enforcement there stopping you from using the tool? Black enforces one line but is fine if you put two (on module level). In general, if you give up on the tool due to a missing rule, you end up having to manually enforce tens of other rules that you'd otherwise be free from.


> And if you can't depend on it, why have it?

That's a bit rich. There are other conventions in programming that you can't depend on technically but serve a real purpose. Identifier naming and comments are the first that come to mind.

If a language gives you a choice of token that has no semantic distinction then different people will adopt different semantics by convention.

As an aside, calling a tool "opinionated" is code for "my conventions are better than yours". That's fine if I don't have any conventions or I can't decide, but if I have decided, then it's just offensive.


Sometimes it's helpful just to have a decision; any decision, followed consistently, is better than no decision or continued debate. This is one of those situations.

So I don't read "opinionated" to necessarily mean "better than your opinions"; it's more like "makes decisions for you so you can avoid the cost of debating them."


And what if I have already incurred the cost and am happy with my decisions, and they differ to yours? I now cannot use your potentially useful tool, even if 90% of our decisions do accord with each other. That's disappointing.


Comments are often a sign of poorly written code though. Ideally the name and structure of the program should make the intent and purpose obvious, which eliminates the need for a lot of the comments people leave. Of course this isn’t always true, but if you’re using comments to compensate for bad/confusing code, you shouldn’t think you’re doing “the right thing”.


I kind of subconsciously do the same thing; use single quotes for symbol-like strings, and double for human readable.

But I can buy the argument that just having a single auto-enforced rule improves consistency and that has greater benefits than the somewhat vague distinction that is not enforced.


> Do you find your team enforcing the string quote rule consistently?

Yes. It's pretty much the only formatting style point that we don't have automated.

> As for number of lines after imports, how is a lack of enforcement there stopping you from using the tool?

Our current automated linting enforces it, but Black doesn't always reformat it, so we might get linter errors from Black formatted code.


> Yes. It's pretty much the only formatting style point that we don't have automated.

Alright, fair enough!

> Black doesn't always reformat it, so we might get linter errors from Black formatted code.

Well, as long as it doesn't add new linter errors, that should be fine, do you disagree?

There are always going to be suboptimal formattings and missing transformations but as long as the situation gets better automatically on average and can be further improved with minimal manual input, you should be fine.


One idea you might want to consider is to create a global function, let’s call it h(), and pass all your human readable strings through it, like so: h("hello world"). This function simply returns the same string. This is more explicit than relying on quote types. It also allows you to do interesting things such as logging everything to a text file and running a spell checker on it, or checking for wrongly encoded strings, etc.


I believe _() would be even better, since you might want to translate messages in the future. At first you might just write def _(s): return s.


A single underscore is pretty much de facto reserved for localization.


Which is exactly what this is. Putting everything a human reads into a function is step one of localization.


That's pretty standard in libraries that do i18n (like Qt, for example). It's a very good practice IMO. Not only does it enable i18n, it means you can easily collect every user-destined string and check it for things like forbidden words etc.


Yep, as others have commented, this is what I meant by using the translation tools.


Why would you have strings that are not supposed to be human readable? Machine to machine should ideally use bytes (b""-strings) or integer enums for that purpose.

Then you also have the ambiguity of what is considered human readable, is an xml document human readable? Http headers? File paths? Urls? Is a programmer considered human?


I believe the point is whether the string is intended to be displayed to a human, not whether a human might read it in the code. Still some wiggle room around logging, etc. I guess.


Sorry if i wasn't clear, what i meant was actually why would you have strings that are not intended to be displayed to end user?

Smells like non binary serialization format or something alike, which is usually a code smell. It's convenient the first 2 weeks but once the project grows you need a more strict schema and once you have that you might as well use a serialization library which might as well have a binary serialization backend.


> why would you have strings that are not intended to be displayed to end user

Why would you not? E.g., you use pandas and columns all have names. Colors are typically also strings, etc. Thus you would often do things like

    grouped = df.groupby(['foo', 'bar'])['baz'].mean()
However, the parent's point, IIUC, is that he'd do

    grouped.plot(color='red', title="User-facing title.")


Colors as strings is a perfect example of what I'm trying to question here. How do I know which colors are valid? Run time error? No thanks. Reading online documentation[0]? No thanks. I'd rather have my auto-complete[0] tell me right away the available options and my compiler tell me if I misspelled 'turquoiuse'. A better solution for this is a color-class with a long list of predefined colors as constants, and since they are just constants they could have any underlying format, not necessarily a string. Even in a dynamic language such as Python this should be preferred over random strings.

Then yes, there are valid uses, especially in ad-hoc scripts. Column names and dictionary keys are one of the gray zones, though again, in my experience once your project grows these are also usually better to code generate from your db schema or serialization protocol; either complete data structures, api-functions, or just a list of constants. Anyway, my point is not to ban strings entirely, it's to question what we use it for. If you use strings as data/identifiers so frequently that you need a special convention for them something smells quite fishy.

[0] Auto complete = More accessible form of documentation. Before someone starts screaming that i'm stupid for "ignoring documentation".
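
A rough sketch of the color-class idea using the standard library's enum module. The Color class, its two members, and plot_color are hypothetical names for illustration; no plotting library is assumed to accept them:

```python
from enum import Enum


class Color(Enum):
    # Hypothetical palette; the values are the usual web hex codes.
    RED = "#ff0000"
    TURQUOISE = "#40e0d0"


def plot_color(color):
    """Return the string a plotting call would need, rejecting raw strings."""
    if not isinstance(color, Color):
        raise TypeError(f"expected a Color member, got {color!r}")
    return color.value
```

With this, auto-complete can list Color.RED and Color.TURQUOISE, and a misspelled member name fails with an AttributeError at the point of use instead of somewhere deep inside a plotting call.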


We did this as well, but I think the benefits of black and auto-formatting outweigh losing this convention.


Could fork it and remove that rule


I love it. Something I always wish for with linters is an easy way to run them only for the lines changed in a particular diff, to allow a codebase to gradually converge on consistency without breaking git blame by reformatting everything. Is there a nice way to do that for any Python linter?


At Facebook we only tell you about lint violations for the lines you touch using arcanist from phabricator[1]. While it works great for most lint warnings, this hasn't worked that well for code formatters.
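
For anyone wondering what "only the lines you touch" can look like in practice, here is a sketch under stated assumptions. It is not how arcanist does it; it assumes a unified diff for a single file and lint warnings given as (line_number, message) tuples, and the function names are made up:

```python
import re

# Matches the "+start,count" part of a unified diff hunk header.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def touched_lines(diff_text):
    """Return new-file line numbers that a single-file unified diff adds or changes."""
    touched = set()
    new_lineno = None
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            new_lineno = int(match.group(1))
        elif new_lineno is None or line.startswith("\\"):
            continue  # file headers before the first hunk, or "\ No newline" notes
        elif line.startswith("+"):
            touched.add(new_lineno)
            new_lineno += 1
        elif line.startswith("-"):
            pass  # removed lines do not exist in the new file
        else:
            new_lineno += 1  # context line
    return touched


def filter_warnings(warnings, diff_text):
    """Keep only (line_number, message) warnings that fall on touched lines."""
    touched = touched_lines(diff_text)
    return [w for w in warnings if w[0] in touched]
```

Warnings on untouched lines are simply dropped, so old code stays quiet until someone edits it.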

The most successful strategy was to add a flag in the file (@format in the header) to tell that a file is automatically formatted. The immediate benefit is that we enable format on save for developers on those files when they use Nuclide (>90% of penetration for JavaScript and Hack).

The other advantage is that when we release a new version of the formatter, we can re-run it on all those files so that people don't have lint warnings on code they already formatted in the past.

With that setup, there's a strong incentive for individual engineers to run the formatter on their team codebase in one PR and then everyone benefits from now on.

[1] https://secure.phabricator.com/


Have you run into issues where the "let's reformat the entire codebase" commit makes `git blame` unusable?


It doesn't. Use `git hyper-blame` or `git blame $REV^ -- $PATH`.

Sure, there is an additional step but we feel this shouldn't be a blocker for significant workflow improvements.

In fact, a single big "reformat all" commit is better than a bunch of incremental ones that reformat areas that you also change semantically. That is harder to filter and makes diffs harder to follow (which changes are logic and which are just style?).


I hadn't heard of hyper-blame before. It's part of chromium's depot_tools.

> git hyper-blame is like git blame but it can ignore or "look through" a given set of commits, to find the real culprit.

https://commondatastorage.googleapis.com/chrome-infra-docs/f...


I'm so glad this is a thing. I get so tired of people arguing about the small stuff. Sometimes too much freedom is a bad thing. It seems okay at first, and you try to be as democratic as you can with your team, and then someone wants something really wonky, and PEP8 and PEP257 don't say it's absolutely wrong after all, and shit just wastes time.

I don't agree with every detail--double quotes as default is going to be hard for me to adjust to--but the things I don't agree with aren't as important to me as being able to set it and forget it and stop debating it every so often.

This is the gofmt the python world needs. As far as I'm concerned this is the new standard.

Well done.


Even if your team is totally on board with your style guide, everyone wants to stay pep8/257 compliant, no arguments, you use linters to warn about errors, etc. you will still have some instances of people committing code that breaks the guidelines.

Either it gets caught in code review and you have to waste time with nitpicking, or worse it makes it through to the repo and now you have to make a commit to fix what amounts to a typo.

Autoformatting with a unified, consistent tool means that you remove all those problems.


> I don't agree with every detail--double quotes as default is going to be hard for me to adjust to--but the things I don't agree with aren't as important to me as being able to set it and forget it and stop debating it every so often.

That's the key part for me, too: Black offers the freedom of not needing to waste time talking about things which really don't matter. No more wasting time on code review where the real issues are obscured by sloppy whitespace, idiosyncratic formatting preferences, etc.

I also had a preference for single quotes but … every file in every project I work on is consistent as soon as I hit save and I certainly don't care enough not to let that outweigh a minor aesthetic point.


> By using it, you agree to cede control over minutiae of hand-formatting.

I may be in a minority, but I do not want to cede control over minutiae of hand-formatting. Am I the only person that feels this way?


Your complaint is the entire reason it's an interesting project: it gives you practically no choice. That's why it's called black, as a reference to the Henry Ford quote: "Any customer can have a car painted any color that he wants so long as it is black."

The idea is to toss aside control over nitpicky formatting _configuration_ options in favor of not worrying about formatting configuration and just going with someone else's opinion of what the configuration should be instead.


Definitely not the only one, but I think the trend is toward using formatters.

José Valim, creator of Elixir and general programming whiz, I think perfectly summed up why formatters are so great in a talk he gave at Elixir conf.

(I am paraphrasing from memory here so if someone has the source, please chime in.)

The gist: “I started using the formatter, and at first I ran it on my code and I hated it. It’s taking all my carefully hand-formatted code and messing it up! But then I ran it on OTHER people’s code and I loved it, as the code started to look like the standard format I had gotten used to.”

I think many people don’t start to like formatters until they see what it does to other people’s code. Many people like their own fine tuning, but that’s only half the question. For a big project, most of the code I read will not be my code. I prefer all of that code be in one, consistent style. Sometimes formatting is expressive, so it is a trade off, but for me the lost expressiveness is far outweighed by the gained consistency.


Why are you interested in minutiae? Esp that which has no functional impact and can be automated.

People are into things like this, pep8, etc. because they don't want to waste another second of their lives thinking about formatting. Or, worse, discussing, arguing, bikeshedding, documenting, enforcing, teaching the new guy how we format "here".

I'm sorry to sound snarky, but this is one of the things you slowly learn over years of development. I've had more than 25. Long ago I felt like you. No longer.


The phrasing "minutiae" is unfortunate because it makes it sound like the exact formatting is a matter of taste or doesn't matter. In my experience, there are always corner cases where tools like these produce bad results. "Bad" isn't just some aesthetic property, it can mean the difference between being able to absorb the meaning of a block of code in 10 seconds, versus having to spend a minute taking it all in. Across a large body of code, those paper cuts really add up.

Personally, the only code formatter I've ever been really comfortable using is clang-format. And the reason is that they really try hard to get the corner cases right. Black might be fine, but I've been burned many times with other tools and in general would be reluctant to trust a tool like this without seeing what it does in practice to a large code base.


I think most people would agree that corner cases are an issue, but in large codebases a strong, unified standard provides a payoff that outweighs the cost of spotty edge cases.

The larger the codebase and the more developers you have working on a project, the less important edge cases become and the more benefit you get from a common standard.


For a large codebase that uses Black, you might want to look at PyPA/Warehouse or Fabric 2.


I've noticed that, these days, I only find myself worrying about careful hand-formatting when the auto-formatter is doing an inadequate job. e.g., IntelliJ's default Java standard makes far too much stuff optional, which is another way of saying that it is full of cases where it forces me to make the decision.

Looking through Black's rules, it seems to me like the rules it's implementing are comprehensive and specific enough that I'd probably end up with few, if any, situations where I even want to take control. And I'd gladly give those up in return for not having to wade through so much diff clutter when I'm doing code reviews.


around the same length of time developing. ... i still hand format; mostly because i think most auto-formatters have bad defaults. Which may no longer be true, since i turn off indent in vim immediately on any installation.

It isn't really interest in minutiae -- my fingers just do the thing automatically at this point; which means if autoindent is on, I then have to go back and delete all the stuff my muscle memory has made me do.

I'm ok with it if i'm forced into some IDE with an editor that is not built for actually writing code (i.e. every IDE default editor); in those instances auto-formatting is very useful. I just avoid those environments, if at all possible.


Totally agree with this. After having to go through how we do things "here" at a few different jobs, it's just not worth it. It's not worth it when I have to learn someone else's stupid idiosyncrasies, and it's not worth it when I "get" to enforce my own.

I'm willing to let go of the things I'm used to in favor of having something completely uncontroversial. I've been in the business not as long as you, but long enough to count the time lost on this stuff.

We do get attached to style and personalization. I think I was a lot more attached when I was younger at this. Perhaps my formatting was more important when my code itself was less personal or less elegant or something. Or maybe the tasks were simply things that weren't all that interesting but just needed to be done. So my way of leaving my mark was to make the formatting just absolutely perfect. Perhaps it was a way of asserting some agency in junior positions where the architecture was predetermined, the problem was well-defined, and the solution was already known when the ticket was assigned. Just get in there and write the code.

I think--and I may be wrong about this--that as I've gotten older and into roles that are more autonomous, where I get to architect entire components of core company business or start from scratch or do other things that assert my personality and agency in code, I care a hell of a lot less about formatting. Mine, yours, someone else's, I don't fucking care, just forget about it and move on.

I also suspect that caring a lot about code formatting is one of the few ways that juniors can signal that they are really engaged in their work and get a little attention. You can't argue about an application's design or anything actually important, so you push a little on what you can, which is somewhat reasonable, and probably a signal of poor management, really.

Anyway, I digress. Bottom line is that I care less and less as I get older and have other things to worry about. I'm starting to view people who are really picky about personal conventions of code formatting as people who either don't or can't contribute anything more interesting to a conversation.


This is one of the things I miss in Python from Go. I loved gofmt because I didn't have to think about the formatting it was just formatted. It did take time to get used to, but now I miss it in other languages. Especially when working with other developers who may not share the same formatting style as myself.

"Gofmt's style is no one's favorite, yet gofmt is everyone's favorite."


maybe give it a try. I felt the same way about prettier, but then I realized how much time I save by no longer having to worry about formatting. As I type, I just type it all on one line and then do a keyboard shortcut and it magically gets reformatted and looks pretty.


This is what I do too. Provided it's easy to type out, I don't really care all that much what the code looks like - especially when I'm getting paid.

(It just wants to be consistent, something the computer is very good at enforcing.)


Not at all. I would never use something like this except to clean up some sloppy code that I am inheriting.


Sometimes that sloppy code is your own code from yesterday or last week or last month. Tighten that feedback loop and sooner or later you're formatting on save and really happy about it.


I probably won't use this for my own personal solo projects. But I'm happy to cede control over minutiae when working on a large team, because it's almost impossible to stick to one style in that case. Even just communicating what the coding style is exactly is too hard for most teams.


If you’re working alone, fine.

But when working with other people, getting everyone to do the same thing, and having that automatically done for you / enforced, is incredibly valuable. It’s such a massive win that any deviation from “my personal optimum formatting” is rounding error.


Not really. I worked at one place where they enforced PEP8 (using flake8 etc) using a git commit hook. In other words, when you try to commit some code, it would run flake8 which would most likely balk at some of the changes you made (and not let you commit until those were fixed). A lot of the rules/warnings are extremely nitpicky, and they had all of them turned on (except for the line length); I don't know how many times I could not commit my code because I left a space at the end of a line, or an empty line contained whitespace, or I had only one blank line between two class definitions, etc. It was just ridiculous. I can understand the argument for having all code in the same format, but this kind of mechanism just seriously decreased my productivity, not in the least because it constantly pissed me off.

Admittedly, Black works differently; as I understand it, it will just auto-reformat your code rather than yelling at you and making you go through your code and fix everything by hand, which is what the aforementioned approach did.

Still, I wonder about the usefulness of such tools. Python code is already much more uniform than most other languages. Also, I am not sure you should take the last crumbs of creativity or personal preference away from programmers. Last but not least, PEP 8 is meant as a style guide, not as a book of law that needs to be enforced at all costs. Some of the Python core developers seem to agree; I have seen comments from Guido and others who apparently think that such tools go against the spirit of the PEP.


> I don't know how many times I could not commit my code because I left a space at the end of a line, or an empty line contained whitespace

Trailing whitespace causes issues with git, editors, diff tools, and numerous other things, as well; keeping it out of a repository is a good thing.

> I had only one blank line between two class definitions

I certainly agree that that's the kind of thing a tool should help with rather than complain about and make you fix.


> "I certainly agree that that's the kind of thing a tool should help with rather than complain about and make you fix. "

It's also the type of thing you should be getting your IDE to worry about. And not leave it till it's time to commit/push.


Just run

  black .
or

  autopep8 -ir .
before commit, what's the big deal? Keeps the code consistent, improves readability, simplifies CI and allows focusing on more important things.


> I don't know how many times I could not commit my code because I left a space at the end of a line, or an empty line contained whitespace, or I had only one blank line between two class definitions, etc.

On the other hand, it drives me nuts when I see stuff like this in our source files. I would love to have a commit hook that did nothing but enforce a minimal set of white space rules. Just requiring no trailing white space and no mixed tabs and spaces would make me so happy.
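
A minimal sketch of such a check, assuming "mixed tabs and spaces" means both characters appearing in one line's indentation (the function name whitespace_problems is made up):

```python
def whitespace_problems(source):
    """Return (line_number, problem) pairs for a source file's text."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        # Leading whitespace only: everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip())]
        if " " in indent and "\t" in indent:
            problems.append((lineno, "mixed tabs and spaces in indentation"))
    return problems
```

A commit hook could run this over each staged file and refuse the commit when the list is non-empty.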

>Admittedly, Black works differently; as I understand it, it will just auto-reformat your code rather than yelling at you and making you go through your code and fix everything by hand, which is what the aforementioned approach did.

I don't actually write much Python, but surely you could have run the same tool (or some other formatter configured to match the linter) locally to have it do that auto-format for you?


> I don't actually write much Python, but surely you could have run the same tool (or some other formatter configured to match the linter) locally to have it do that auto-format for you?

Maybe... I don't work there anymore, but if I'm ever in a similar situation, I will consider that approach, assuming it will only reformat files as needed. (I suspect the company-mandated flake8 script scanned all the code (200K lines), rather than just the files that changed, considering how slow it was.)


So long as it isn't too helpful, I guess.

A coworker of mine at a previous job used a JS autoformatter built into his editor, and he couldn't insert a `debugger` statement into his source when testing locally because his editor would delete it immediately...

I spend about equal amounts of time fighting with my style linter, formatting my code, and disabling dumb lint rules. Maybe an auto-formatter would save time by reducing the first two more than it would increase the second (though if the formatter introduces bugs by deleting bad code all bets are off.)

I also don't really care about style... Who really cares where line breaks are? Who cares whether you line up your comments with spaces or not? That stuff doesn't "take time", affect readability, cause arguments etc, because we're not children.


> Last but not least, PEP 8 is meant as a style guide, not as a book of law that needs to be enforced at all costs. Some of the Python core developers seem to agree; I have seen comments from Guido and others who apparently think that such tools go against the spirit of the PEP.

Actually PEP 8 starts with: "A Foolish Consistency is the Hobgoblin of Little Minds"[1] A lot of people miss that part.

[1] https://www.python.org/dev/peps/pep-0008/#id15


> I can understand the argument for having all code in the same format, but this kind of mechanism just seriously decreased my productivity, not in the least because it constantly pissed me off.

Globally, though, code is read an order of magnitude more times than it's written. So it's a huge productivity improvement in the not particularly long run.


That doesn't really apply here... Like I said, Python code is already much more uniform than most other programming languages. Unless somebody's formatting style is particularly ridiculous, their code should be just as readable as anybody else's, even if they put a space where it supposedly doesn't belong, or use too many/too few blank lines, or use the "wrong" way to split and indent a long list of function parameters. There is no productivity gain for others here. Maybe for languages like Javascript or C, but not Python. Almost all of it is just nitpicking.

(Of course you can make code less readable in other ways, e.g. by choosing undescriptive names, or weird idioms, but flake8/Black naturally don't address those issues.)


As someone who reads and writes a lot of Python, a consistent style is incredibly valuable.

The only reason I don't use autoformatters is because most of them are bad. In my experience, black doesn't have those issues.


I'm fine with ceding control right up until the formatter does something I don't like for no good reason. For languages like Rust, where there is a single format convention that is closely tied to the language (via rustfmt), I am okay with this kind of forced standard. For something like Python, C++, or Java where there isn't a single "winner" for format guidelines, there's virtually no chance that I would embrace something like this.


A little bit of a catch 22 you think?

C++ has clang-format.

Black has good momentum right now; it very well might be the clear winner in a few months.


I don't think it's a catch-22 at all. Either the language has an official style guide (C#) and/or formatter (Rust, Go), or it doesn't (C++, Java). Given that black doesn't even seem to comply with PEP-8, I don't consider it acceptable.

I use clang format because it gives me full control over the style. I cede control to it because it happens that I can express all of my personal minutiae of hand formatting in clang format rules. In contrast, I currently am writing Java in VS Code, and the Java formatting plugin doesn't give me an easy way to change its rules, so I disabled it entirely.


> black doesn't even seem to comply with PEP-8

That is just not true. Where does Black not conform to PEP 8?


> right up until the formatter does something I don't like for no good reason.

Hand-formatting. You are describing formatting code by hand.


I almost always turn off auto-formatting; at least when I can use vim. If I'm forced outside of vim (where I don't have a 'shift' operation), then I don't mind it as much; but I almost always hate the way auto-formatted code looks.


I used to feel this way before using Go.


We get a fair amount of code from non-developers, and it invariably looks like hot garbage. Having something that auto-cleans it is a godsend.


You are certainly not a minority in the discussion. Personally I hate repeating the same argument as it's a style question. I program in C# and dislike braces on new lines, I leave it in place because my company rolled out a standard and if I deviate from that I create more work for myself and others.

Are you consistent with your formatting rules? I'd be interested to know how much time you spend formatting your code vs ceding control to an auto-formatter.


It takes me a lot longer to write the code than it does to format it for readability.

I'm inclined to think that the reason that auto-formatters are popular is not because manually formatting code is hard, but simply to head off nitpicking in code review.

I think there's a better solution to the "style nitpicking in code review" problem: Just don't do it.

If you're nitpicking style during a code review, chances are good that you are not looking for real problems.


It's more cargo cult for nitpickers.


Love the idea of eliminating formatting debates when writing Python.

At the risk of being redundant, I also raise an eyebrow at the choice to prefer double quotes for strings. My company standardized on single quotes, mainly to be consistent with repr and also encourage the use of double quotes in messages displayed to the user.

Everything else seems in order. I might increase the line width to 90 just to use an easier value to remember when configuring editors and other tools ;)
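
For anyone who wants to see the repr behaviour mentioned above, here is a minimal sketch. It only shows what the interpreter prints for a few literals; it says nothing about which quote style a project should pick.

```python
# repr() prefers single quotes, and switches to double quotes only when
# the string contains a single quote and no double quote.
plain = repr("hello")
with_apostrophe = repr("it's")
with_double_quotes = repr('say "hi"')

print(plain)               # 'hello'
print(with_apostrophe)     # "it's"
print(with_double_quotes)  # 'say "hi"'
```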


--single-quoted-strings would be a really nice option for the vast majority of python programmers that use single quotes by default.


Ditto, no single quoted strings -- not even going to consider it no matter how brilliant it (potentially) is.


This isn't a vast majority by any stretch of the imagination.


The overwhelming response here saying use single quotes begs to differ.


That's not how this works. All the people who are just happy or indifferent about double quotes don't comment about it. And some of the ones that aren't happy about it commented here multiple times.

Judging from the additional stars on GitHub, and projects that just migrated (pytest!), I'd say there's a very vocal minority which is very attached to single quotes.


> Judging from the additional stars on GitHub, and projects that just migrated (pytest!), I'd say there's a very vocal minority which is very attached to single quotes.

Accuse me of selection bias. Immediately use even more biased selection bias.


Selection bias is real here.

I thought double quotes were smart for all the reasons in the readme. Me commenting 'this is great' is just noise on HN, and discouraged by the rules. Never take self-selected anything as truth, especially comments (tweets/posts/voluntary votes).


This is all subjective unless some data is collected. I could say with just as much confidence that all of the major Python projects I've seen or contributed to use single quotes -- e.g. numpy or pandas.


this is the only issue stopping my team from adopting black.


The problem with code formatters for Python is that you can't just break lines using whitespace; you need to insert symbols, ideally parentheses. For example, if you need to break this line of code:

     left[first][second] = right[first][second][third]
Manual breaking looks like this:

    left[first][second] = (
        right[first][second]
            [third]
    )
Code formatters will produce something like the following atrocity:

    left[first][second
        ] = right[first][second][
        third]
Comments and strings are also unwrappable if the formatter is afraid of inserting characters.


Did you actually try Black on a line like this?


In:

    left[a_rather_long_key][a_rather_long_key] = right[a_rather_long_key][a_rather_long_key][a_rather_long_key]
Out:

    left[a_rather_long_key][a_rather_long_key] = right[a_rather_long_key][
        a_rather_long_key
    ][a_rather_long_key]
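
If you want to convince yourself that the output above is the same program as the input, one way is to compare the parsed ASTs. This is only a sketch using the standard `ast` module; `same_ast` is a hypothetical helper name, not something Black exposes.

```python
import ast

original = (
    "left[a_rather_long_key][a_rather_long_key] = "
    "right[a_rather_long_key][a_rather_long_key][a_rather_long_key]\n"
)
reformatted = (
    "left[a_rather_long_key][a_rather_long_key] = right[a_rather_long_key][\n"
    "    a_rather_long_key\n"
    "][a_rather_long_key]\n"
)


def same_ast(a, b):
    """Return True if both sources parse to the same syntax tree.

    ast.dump() leaves out line and column information by default, so
    differences in whitespace and line breaks do not affect the result.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_ast(original, reformatted))
```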


I must admit I didn't. I felt confident it was restricted to adding and removing whitespace because the README says at the start:

> Black ignores previous formatting and applies uniform horizontal and vertical whitespace to your code.

I now see it does sometimes modify non-whitespace characters e.g. later in the README it mentions:

> In [certain] cases, parentheses are removed when the entire statement fits in one line

I'm not in a position to test black out right now (I can't run Python on the computer I'm posting this comment on). I'd be curious to know what it does on the code I posted, and on over-length comments and string literals.


I tried it. Black doesn't seem to touch long strings or comments. It just leaves you with "line too long" errors that you can clean up yourself.

For your code example, see my comment above.


> You will save time and mental energy for more important matters.

Exactly why every language, from here until the end of time, should have a “go fmt” equivalent.


I wanted to add a bit of positivity and mention that I'm really liking how the Black project is approaching problems. For example, a few people brought up fluent interfaces[0] as an issue. There were many opinions about the right and wrong thing to do, but I feel the discussion arrived at a very pragmatic decision.

Then there were requests to add command-line arguments specifically so that tools could integrate with Black, and those were added almost immediately.

Congrats on gaining so much traction so quickly, and thanks for listening to users (when it makes sense).

[0] https://github.com/ambv/black/issues/67


Has anyone rolled this out to an existing codebase? What's the best practice? A single commit that reformats the whole codebase? How do you avoid creating merge hell?


You can see how this was done for Fabric, PyPA/Warehouse, and pytest.

General guidelines:

1. One commit with only the automatic formatting. Afterwards you'll be able to skip over it easily with `git hyper-blame` or `git blame $BLACK_REV^ -- $FILE`.

2. Avoid leaving open pull requests. If you do, after landing the blackening commit, blacken all pull requests, too. They shouldn't conflict then.

3. Set up enforcement with pre-commit or CI (you can run `black --check` on Travis or similar).

4. Don't forget the repo badge ;-)
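
For point 3, a CI step can be as small as the sketch below. It assumes Black is installed in the CI environment and runs it as `python -m black --check`; the helper names are made up for illustration.

```python
import subprocess
import sys


def black_check_command(paths):
    """Build the argument list for a check-only Black run (no files are rewritten)."""
    return [sys.executable, "-m", "black", "--check", *paths]


def run_black_check(paths):
    """Run the check and return Black's exit code.

    0 means nothing would change; a non-zero code means at least one file
    would be reformatted or Black hit an error.
    """
    return subprocess.run(black_check_command(paths)).returncode


# In a CI script you would typically end with:
#     sys.exit(run_black_check(["."]))
```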


How does this compare with Yapf (https://github.com/google/yapf)?



I chuckled at this: "If you're paid by the line of code you write, you can pass --line-length with a lower number."


88 characters per line is a weird choice. Why not 90?


Originally PEP 8 had 79 characters. Now that was a weird choice so most companies went with 80 instead, including Facebook. You want a low-ish limit because it makes it possible to fit two files side by side on a typical screen resolution. Even if you don't edit like that, you look at diffs like that. More importantly, a low column limit is helpful to disabled engineers who don't have to navigate horizontally so much.

So we used 80. I always felt bad when the linter stopped people from pushing impactful changes because they went over the limit by two characters. So two years ago I set up a "highway speed limit" style warning in flake8 (code B950 in PyCQA/flake8-bugbear). What it does is it keeps your limit intact (for example "80") but doesn't trigger unless you went over by more than 10%. So the limit happens to be 88.

When I was working on Black, I was faced with a dilemma. Should the formatter stick to 80 or be able to "go over" a bit, too, as we would let humans do. I felt like the latter made more sense as the resulting code looks nicer (fewer occasions to break a single line into three or more). Then I remembered Raymond's talk "Beyond PEP8" where he mentions that experience shows "90-ish" is the wisest choice. So I went with it.
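
The arithmetic behind 88, as a rough sketch. This is not the flake8-bugbear implementation, just the "nominal limit plus 10%" idea described above, with hypothetical function names.

```python
def soft_limit(nominal=80, tolerance_percent=10):
    """Longest line tolerated: the nominal limit plus a percentage buffer."""
    return nominal + nominal * tolerance_percent // 100


def is_too_long(line, nominal=80, tolerance_percent=10):
    """True only when the line exceeds the nominal limit by more than the buffer."""
    return len(line) > soft_limit(nominal, tolerance_percent)


print(soft_limit())            # 88
print(is_too_long("x" * 82))   # False: over 80, but within the buffer
print(is_too_long("x" * 89))   # True
```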


> More importantly, a low column limit is helpful to disabled engineers who don't have to navigate horizontally so much.

I saw that you mentioned that - where have you seen a study that claims 100 is the cutoff? Would be interested in seeing that.


I don't have a formal study, just conversations with several legally blind engineers I work with.


> Originally PEP 8 had 79 characters. Now that was a weird choice so most companies went with 80 instead

Probably thinking of one of these:

- backslash continuations

- terminals/etc that counted newlines

- off by one errors


Also possibly a terminal text editor - the cursor sits on the column where the next character would be input (such as vim's insert mode). So with 79 character lines, the cursor is sitting at 80 while waiting for the next input.

If your screen is larger than that it's no big deal, but if your screen is 80 columns and the cursor was at column 81, it would wrap to the next line without actually being a newline.


ASCII-rendered scrollbars (like in EDIT.COM and friends) also take up a column.


Yes.

The old standard was actually 78, so that a diff would fit on the screen.


Science. 88 produced smaller files than 80. And 90 and above didn’t noticeably shorten files.

Like Raymond Hettinger said in his talk "Beyond PEP 8", 90-ish is better than a strict 80. If you have 81 characters on a line, it's a waste of time to split that into three lines, and the result is harder to read. So there should be some buffer. That buffer is 10%. The goal is 80 but we are OK with up to 88.


but no matter what, it's a strict something, right? in Black it's a strict 88 by default (unless it violates some other rule, if I understood the readme)


90 characters?!?!? You heathen. I want 132 characters per line.


How does this compare to yapf?



What's with the Python 3.6 requirement? Major stable distros are at Python 3.5, so I can't even try it without messing with the system Python or the OS.


It's open source software created by a volunteer. Using the latest version of Python lets me leverage features which help me focus on the problem at hand. f-strings, pathlib integration, the latest typing module, and so on.

I'm sorry this makes using my tool harder for you but I'm sure there is an easy way for you to install Python 3.6 without destroying your system Python. There's Homebrew for macOS, deadsnakes for Ubuntu, EPEL for RedHat, and so on.


Probably because of f-strings.


After testing this some tonight, I'm committed to my first-blush response. This is my standard from now on, and it is the standard for my team.

Someone needs to ask a StackOverflow question about Python formatting and we need to answer it with "Just use Black." Upvote the fuck out of it, and be done with this forever.


I like the spirit of it, but the implementation seems to contain too many exceptions (e.g. trailing commas, whitespace around `:`). This problem is not unique to Black; it is all too common among auto-formatters.

I usually prefer a set of dead-simple formatting/styling rules: easier to enforce, lower cognitive load.


What don't you like about the exceptions? The code and the documentation would both be simpler without them. They are there because the end result is closer to what a human would do in those situations.

And the two exceptions you mentioned are ones you will also have to make if you want to stay PEP 8 compliant (pycodestyle's E203 is invalid inside slices) and you want your code to execute on Python pre-3.6 (where you can't add trailing commas to calls and signatures containing *args and **kwargs).


> where you can't add trailing commas to calls and signatures containing *args and **kwargs

I think that's signature only. You don't have problems with calls:

    $ python3.5
    Python 3.5.5 (default, May 17 2018, 07:04:26) 
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def foo(*args, **kwargs):
    ...   print(*args, **kwargs)
    ... 
    >>> foo(
    ...   'abc',
    ... )
    abc
    >>>
The argument black made about not adding trailing comma is also quite unconvincing to me:

> Unnecessary trailing commas are removed if an expression fits in one line. This makes it 1% more likely that your line won't exceed the allotted line length limit. Moreover, in this scenario, if you added another argument to your call, you'd probably fit it in the same line anyway. That doesn't make diffs any larger.

Who cares about the 1% chance of not exceeding the line length limit? If you really care about that, use one-per-line style, not all arguments in one new line.


Yeah, the call side was fixed in 3.5, but I don't split hairs here. I treat it as either 3.6+, or no trailing commas after either signatures or calls with stars.

BTW, your signature doesn't demonstrate a call with unpacking. What you meant to test was:

    l = [1, 2, 3]
    foo(0, *l,)

This works in 3.5+ but fails on 3.4 and before.
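
A quick way to check which of these forms your own interpreter accepts is to hand the source to `ast.parse`. This is a sketch with a made-up helper name; the results depend on the Python version you run it with.

```python
import ast

CALL_WITH_UNPACKING = "l = [1, 2, 3]\nfoo(0, *l,)\n"
SIGNATURE_WITH_STARS = "def foo(*args, **kwargs,):\n    pass\n"


def parses(source):
    """Return True if the running interpreter accepts the source as valid syntax."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# On Python 3.6 and later both of these print True.
print(parses(CALL_WITH_UNPACKING))
print(parses(SIGNATURE_WITH_STARS))
```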


Using Black is the no-cognitive-load option in my book.


The reality is that the way humans write code is complex and if you try to use simple rules it's going to look bad in many cases.

What linters have been doing is figuring out _some_ rules that are general enough to be enforced.

Complete formatters like Black are making decisions for _every single formatting choice_.

In practice, they need to be complex if they want to have people using them.


Anyone know how I can get auto format on save to work in Vim with Black? I installed with Plug.

Figured it out:

`autocmd BufWritePre *.py Black`


Is it possible to bind blackcellmagic[1] to run automatically every time a cell is run in a Jupyter notebook?

[1] https://github.com/csurfer/blackcellmagic


You can explore putting something like :

    require([
      "base/js/namespace",
      "base/js/events"
    ], function(Jupyter, events) {
      events.on('finished_execute.CodeCell', function() {
        // execute black here
      });
    });
In ~/.jupyter/custom/custom.js.


This is great - especially _removing_ hanging indents which is a real pet hate of mine! Only issue is the quotes - I'd love to see Black leave those as single if flake8 is configured to do so, for all the reasons mentioned in the comments.


Maybe I missed it, but I don't see a comparison with PyCharm's built-in code formatter. Why would I integrate this code-formatter, "black", into PyCharm when PyCharm already does this for me and the whole team?


You can run this on pre-commit. While you can set PyCharm to run on file save, it is not guaranteed to run on all files. By using a command line tool, you can enforce that all files across your project are formatted this specific way and never let someone check in something that doesn't conform to it. If you have newbie developers, it can be a lifesaver.


You edit code, test it, and then have some program rewrite your code after testing but before it becomes an immutable commit hash?


Why do you want your code formatter tied explicitly to an IDE? Why would you want to force everyone on your team to use the same editor?

Use independent components, and let people use what they work best in.


You’re acting as if it’s inconceivable that there are people that don’t use PyCharm out there....


Not everyone runs PyCharm, and this tool is open source.



I'm also assuming pycharm has some options. This thing seems to make very specific decisions and if you don't agree with them, tough.


On the example there's:

    # in:

    TracebackException.from_exception(exc, limit, lookup_lines, capture_locals)
    
    # out:

    TracebackException.from_exception(
        exc, limit, lookup_lines, capture_locals
    )
I don't like this one. I would prefer that if you have )/]/} on the next line, then you should have a trailing comma, e.g.:

    TracebackException.from_exception(
        exc, limit, lookup_lines, capture_locals,
    )
Also I would prefer one-per-line over all in the same line (but not on the same line with the parentheses), but I feel less strongly about that one.


Sure you can prefer this but it is not the point. The point is that having ONE average way to do it > several best ways for several people.


I'm saying this is a weird/below-average way to do it.

If this cannot fit in one line:

    foo(arg1, arg2)
My first choice would be:

    foo(
        arg1,
        arg2,
    )
Second choice would be:

    foo(
        arg1, arg2,
    )
While Black chooses:

    foo(
        arg1, arg2
    )
And made some unconvincing argument about it:

> Unnecessary trailing commas are removed if an expression fits in one line. This makes it 1% more likely that your line won't exceed the allotted line length limit. Moreover, in this scenario, if you added another argument to your call, you'd probably fit it in the same line anyway. That doesn't make diffs any larger.

Who cares about the 1% chance of not exceeding the line length limit, really?


I know and I completely agree. Maybe a kind of vote could be held on these kinds of choices. But what would be really nice would be git "style lenses": having different styles to edit or share.


I just tried this on a Django project, and it formatted all the migration files as well. Is there a way to exclude certain directories when formatting a whole project?


I'm working on that. In the meantime use your Unix fu with `find | grep | grep -v | xargs black`.
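
If shell pipelines aren't your thing, the same filtering can be sketched in Python. The function name and the default `migrations` directory name are assumptions for the Django case in the question; the resulting paths can then be passed to `black` on the command line.

```python
from pathlib import Path


def python_files(root, excluded_dirs=("migrations",)):
    """Yield .py files under root, skipping files inside any excluded directory."""
    for path in sorted(Path(root).rglob("*.py")):
        # Only look at directory components below root, not the file name itself.
        parent_parts = path.relative_to(root).parts[:-1]
        if not any(part in excluded_dirs for part in parent_parts):
            yield path


# Example use (assumes the black command is on PATH):
#     import subprocess
#     subprocess.run(["black", *map(str, python_files("."))])
```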


So what? How is that harmful?


I take it it's because the migration files are auto-generated anyway, so it's odd to auto-format previously auto-generated files. You also don't really want to have repository churn on those files since they're generated once (and then never edited again) along with the feature they're for.


It would be a one-time auto-format just like the one-time auto-format that black will do for the rest of the project. Any newly auto-generated files can be formatted before they are committed. So I don't see it actually adding to the harm.


I find as I write code I no longer worry about alignment or long lines. When I have vim paste a section of code and it's at a different indentation I just run Black.


I wish there was something like this for C#/.NET

.editorconfig is a start but the amount of time saved through automation and removing customization is significant.


Is this conceptually different from using pylint, failing the build for any warnings and saying anyone who modifies the `pylintrc` file gets fired?


Sounds like you're joking but I'll bite. Yes, it's different because doing what you suggest introduces barriers for people which make landing changes harder. By forcing people to conform to any style manually, especially by blocking them from landing important features and bug fixes due to styling preferences, you are building up opposition. It's just a bad experience for everybody involved.

In contrast, when you're promising that this exact problem won't happen anymore because an automatic tool will handle stylistic preference for your team, people are more willing to accept stylistic choices they (mildly) disagree with. Because on average the style is still better and on average everybody can move faster.


Agreed auto-correct is a big plus. Maybe a better way to phrase my question is this: Can auto-correct be implemented in pylint? Or does allowing configurations create too much of an obstacle?

In the ruby world for example, rubocop has an auto-correct feature. However it is only implemented for a subset of the style checks.


Any word out about a version for emacs?


From the linked GitHub page, it says for Emacs integration use https://github.com/proofit404/blacken . I haven't tried it.


This is so cool. I like all the choices and will be adding it to my workflows for my personal projects.


Choice. There is only one (line length). And I think that's a good thing. Which were you referring to?


I think the commenter likes the choices made by the project so they don't have to make them :)


I've been long awaiting a Python equivalent of gofmt and prettier.js!


Why is this better than autopep8?


Black and YAPF introduce consistent formatting within an entire file. Ironically, autopep8 goes against PEP 8 by only targeting the places in the file that generate pycodestyle warnings. Some of the warnings are also incompatible with PEP 8 (W503, E203) and the chosen formattings might very well be inconsistent with the rest of the file.


Paulo Torrens is crying.


88 column wrap? Yeah - no thanks.


just specify a different --line-length argument? Easily solved.

I have mine set to 120. I write my Python code in PyCharm, on a high-resolution screen. Having a column limit of 88 only makes sense if you are inside of Vim or something.


Almost everyone agrees that there should be a column limit, but there is general disagreement about what that limit should be, exactly. There are plenty of sound arguments for 80 and as many for 100 or 120.


Curious, why is this a bad thing? Also it seems like you can change it to a lower number if you want.


80 columns is very standard.


It was a standard from a time when we had green-screen terminals only capable of displaying an 80x24 grid of characters. (https://en.wikipedia.org/wiki/VT100)

It's worth asking - does the standard make sense still, given how we edit today?


It's not just about whether your hardware is capable of displaying wider lines. If it were only about hardware capability, why wouldn't we be writing code that's 200 or 300 characters wide?

* Studies of readability generally show that it declines once lines of text are longer than 60-70 characters, not counting whitespace or punctuation. At that point, humans have difficulty finding the beginning of the next line, which slows them down. You can compensate for this by increasing line spacing but you lose a bunch of space that way. The vast majority of professionally typeset natural language material is limited to about this length, or even shorter.

* People read code in terminals. Most terminals default to 80 columns wide. Consider people that develop on multiple computers and multiple OSs, and have to reconfigure them all. Or if you use a new computer or loaner computer the defaults will be back to 80. So if you change to 120 columns, you have to do it over and over again. Same with text editors, but less so.

* Side-by-side diffs can get cumbersome if the text is more than 80 columns wide, and consider that font sizes vary, and some people like their monitors vertical for reading diffs so they can see more context. On my 24" 1920x1200 monitor, I can easily read a side-by-side 80 column diff, very nearly 100, but definitely not 120.

* As a heuristic, an abundance of wide lines often indicates problems with the code itself, such as too much nesting. This depends on the language and indentation used; it's generally accepted that Java code will be something like 25% wider.

I'm not saying that 80 columns is the right choice, only that there are reasons to support that choice. Just like there are reasons to choose 100 or 120.


I exclusively program on a small laptop and don't have great eyesight. Formatting code such that it only looks good on huge displays makes my life more difficult.


Probably not, there is line wrap.


Definitely not. We turn all line-wrapping off. Everybody has wide-screens.


And nobody uses split windows? No screen can comfortably fit more than about 200 columns, so I believe 100 is the absolute maximum width.


We use 2 wide-screens. 4K/Retina also removes need for tiny columns.



