From memory, I believe Django decided to move to Black back in 2019 [0] but delayed the change until Black exited beta. Black became non-beta at the end of January [1].
This was finally merged to the main branch today [2].
I suspect lots of other projects, both open source and private, are also making the change now. This is a show of confidence in Black as the standard code formatter for Python.
Shameless plug: For people who like black, I've been working on ssort[0], a python source code sorter that will organize python statements into topological order based on their dependencies. It aims to resolve a similar source of bikeshedding and back and forth commits.
We use isort[0] for this. It even has a "black"-compatible profile that splits lines according to black's defaults. Additionally, we use autoflake[1] to remove unused import statements in place.
isort only sorts imports. ssort will sort all other statements within a module so that they go after any other statements they depend on. The two are complementary and I usually run both.
Thanks for your work on this. Coming back to python from golang, I really missed the auto-formatting. black plus something like ssort seems to bring the same to python. I've had really good results with black so far and look forward to trying ssort.
This is relevant to my interests. We have an internal code style guide at my company that includes guidelines for order of class statements, roughly matching yours. I have one pet peeve that made me write the style guide in the first place - Django's `class Meta` which we always have at the top of the class because it contains vital information you need to know as a programmer, like whether this class is abstract or not. Whenever I have to work with an external Django codebase and find myself scrolling through enormous classes trying to find the meta my blood pressure rises.
I've had the same problem with pydantic. Currently, properties are special cased and moved to the top. Everything else, including classes, is grouped with methods. Meta classes will end up somewhere in the middle, which is probably the worst possible case.
ssort is currently used on several hundred kilobytes of Python, so I'm wary, but if I'm going to make a breaking change before 1.0, then I think this is likely to be it.
By the way, I have raised a ticket to finalize method order before 1.0 release (https://github.com/bwhmather/ssort/issues/11). Please follow and comment if you would like to see a change.
This sounds like a living hell if you use git diff a lot to look for small changes that might introduce a bug, which is what happens at work all the time since our unit tests and CI are a joke. Not dumping on your project, but the idea of that much churn in the code scares the dickens out of me.
One thing worth mentioning is that the `git blame` ignore file trick doesn't work as well with ssort as it does with black because the changes ssort makes tend to be much less local.
Implementation details at the top. Python is a scripting language so modules are actually evaluated from top to bottom. Putting high level logic up top is nice when you just have functions, which defer lookup until they are called, but you quickly run into places (decorators, base classes) where it doesn't work and then you have to switch. Better to use the same convention everywhere. You quickly get used to reading a module from bottom to top.
It is a bit backwards, but in exchange you get predictability.
With backwards sorting you know that, unless there is a cycle, you can always scroll up from a call site to find the definition or down from a definition to see where it is used. With forwards sorting you can scroll down to find a definition, unless the function was imported, or used as a decorator somewhere, or called by something that was used as a decorator, or used in some other way that I haven't thought of.
My personal experience is that this predictability is hugely useful. It almost entirely obviates the need for jump-to-definition within a module, and gives modules a very obvious shape and structure.
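To make the evaluation-order point above concrete: function bodies resolve names when they are called, but decorators (and base classes) are evaluated when the `def` or `class` statement itself runs. A small standard-library demonstration; all names are made up for illustration.

```python
# Function bodies look names up when they are *called*, so "high-level first"
# ordering works for plain functions.
def high_level():
    return low_level() + 1  # `low_level` is resolved at call time


def low_level():
    return 41


print(high_level())  # 42

# A decorator is evaluated when the `def` statement runs, so it must already
# exist at that point in the module. Base classes behave the same way.
DECORATOR_USED_TOO_EARLY = """
@register
def handler():
    pass

def register(fn):
    return fn
"""

try:
    exec(DECORATOR_USED_TOO_EARLY, {})
except NameError as exc:
    print("NameError:", exc)  # NameError: name 'register' is not defined
```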
I never thought of it as backwards. Defining functions before calling them makes as much sense as defining terms before using them or assigning variables before reading them.
Very interesting, especially the method order part. I dislike the order you chose, and yet I would be tempted to use it on my projects anyway, because consistency is so important to me.
This is about how I initially felt with black. I didn't like some of the things it did, but I was happy to have a standardized opinionated formatter so I went with it. Was definitely the right decision.
It’s an unfortunate compromise with code formatters. As someone who takes code formatting really seriously and puts a lot of manual thought into it, I find code formatters almost always make my code worse, in my opinion, or at best leave it unchanged apart from fixing small errors like double spaces. But they create a standard that moves everyone’s code towards it, which is good, and they also obviously take away the chore of manual formatting.
I just wish sometimes they were more configurable. For example, the Elixir formatter is quite opinionated on things but is generally not configurable.
I often wish they were _less_ configurable. Even Black's handful of config options are too many... Each one is a chance to pick a whole new color for the bikeshed.
That makes sense, but it's a problem when the formatter makes a rather opinionated choice and doesn't allow one to configure it. The primary example I'm thinking of in Elixir is comments for pipelines.
For example, if I use the pipe example on Elixir's landing page and add comments to it (obviously a contrived example):
```elixir
"Elixir" # string to get frequencies for
|> String.graphemes() # Get all graphemes (i.e., character units)
|> Enum.frequencies() # Get the number of occurrences of each grapheme
```
This gets formatted by the Elixir formatter to:
```elixir
# string to get frequencies for
"Elixir"
# Get all graphemes (i.e., character units)
|> String.graphemes()
# Get the number of occurrences of each grapheme
|> Enum.frequencies()
```
This is a rather strongly opinionated format stance, and as far as I know, it's not configurable.
Will do. The examples directory isn't terribly helpful as documentation, as it mostly contains real code with problematic syntax (and compatible licensing) that tripped up ssort when I ran it on a copy of my pip cache. I will move it into tests to avoid confusion.
Thanks for sharing this. When solo coding, I tend to dump new classes and functions wherever is physically closest to where I was previously editing. It makes sense in the moment so I don't disrupt my train of thought by jumping all over the file, but then is a confusing ball of mud when I need to return to the project after time off. Was the shortest scroll direction up or down when I implemented it? etc…
This is great. Imagine I declare a global variable that is used inside a function, where the variable is assigned AFTER the function is defined, and the function is called later still. Why does ssort move the assignment of the global variable above the function definition?
```python
def myfunc():
    global globalvar
    str(globalvar)

globalvar = 'abc'
myfunc()
```
is transformed into
```python
globalvar = 'abc'

def myfunc():
    global globalvar
    str(globalvar)

myfunc()
```
I understand why it's done, but I don't want my block of function definitions filled with these variable assignments (which I make later), since the order has no impact on my code and leaving them where they are keeps it just a bit "cleaner". Don't tell me not to use global variables :D
I've been using black at work for over a year now. I don't much care for some of the choices it makes, which can sometimes be quite ugly, but I've grown used to it and can (nearly) always anticipate how it will format code. One nice side effect of encouraging its use: at least where I work, it was very common to use the line continuation character \ instead of wrapping an expression in parentheses. I always hated that, and black does away with it.
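For readers unfamiliar with the two styles mentioned above, both forms below build the same expression; the second relies on Python's implicit line joining inside brackets instead of an explicit backslash. Variable names are illustrative only.

```python
# Explicit continuation: the line must end in "\", and even a stray space
# after the backslash is a SyntaxError.
total_backslash = 1 + \
    2 + \
    3

# Implicit continuation: any expression inside (), [] or {} may span lines.
total_parens = (
    1
    + 2
    + 3
)

print(total_backslash, total_parens)  # 6 6
```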
What I don't much care for is reorder-python-imports, which I think is related to black (but don't quote me). For the sake of reducing merge conflicts it turns the innocuous
```python
from typing import overload, List, Dict, Tuple, Optional, Any
```
into
```python
from typing import overload
from typing import List
from typing import Dict
from typing import Tuple
from typing import Optional
from typing import Any
```
Ugh. Gross. Maybe I'm just lucky but I've never had a merge conflict due to an import line so the cure seems worse than the disease.
Edit: Just to be 100% clear: this is reorder-python-imports, not black. I thought they were related projects, though maybe I'm wrong. Regardless, black on its own won't reorder imports.
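A hedged sketch of the one-name-per-statement transformation described above, written with the standard `ast` module. `split_from_imports` is a hypothetical helper, not reorder-python-imports' implementation; unlike that tool it does not sort anything, and it drops comments because it regenerates code with `ast.unparse` (Python 3.9+).

```python
import ast


def split_from_imports(source: str) -> list[str]:
    """Rewrite each `from m import a, b` as one import statement per name.

    Hypothetical sketch; other statements are passed through unchanged.
    """
    lines = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.ImportFrom):
            for alias in stmt.names:
                # Same module and relative-import level, but a single name.
                single = ast.ImportFrom(module=stmt.module, names=[alias], level=stmt.level)
                lines.append(ast.unparse(single))
        else:
            lines.append(ast.unparse(stmt))
    return lines


print("\n".join(split_from_imports("from typing import overload, List, Dict\n")))
```

The trade-off the comment objects to is visible here: every added name becomes its own line, so two branches that each import a new name from the same module touch different lines instead of the same one.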
Give µsort a try instead; it's focused on providing more safety when applying sorting to large codebases, and is designed to pair well with black out of the box:
For me µsort is a non-starter since it doesn't ignore the import/from part when sorting lexicographically. So when an import changes from `import foo` to `from foo import bar` (and vice versa), the import is moved. Sorting should start at the package name, nothing else.
I suspect you are fighting the tide of common practice in the Python community. In your example, switching from `import foo` to `from foo import bar` is changing the entire nature of the import. Sorting module- and from-imports separately also makes it much easier for many people to visually scan a block of imports. And similar to black, having a tool be consistent and predictable across projects and modules is more important than bikeshedding every possible opinion.
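The disagreement comes down to the sort key. A sketch comparing two hypothetical keys: one that groups `import x` lines before `from x import y` lines, and one that looks only at the module name. (isort's documented `force_sort_within_sections` setting does something along the lines of the second; the helpers below are illustrative, not any tool's code.)

```python
import ast


def module_of(line: str) -> str:
    """Module named by a single import statement (hypothetical helper)."""
    stmt = ast.parse(line).body[0]
    if isinstance(stmt, ast.ImportFrom):
        return stmt.module or ""
    return stmt.names[0].name


def form_then_module(line: str) -> tuple[bool, str]:
    # False sorts before True, so `import x` lines come before `from x import y`.
    return (line.startswith("from "), module_of(line).lower())


def module_only(line: str) -> str:
    # Ignore the statement form; only the module name decides the position.
    return module_of(line).lower()


after = ["import abc", "from foo import bar", "import zlib"]
print(sorted(after, key=form_then_module))  # the foo line moves to the end
print(sorted(after, key=module_only))       # the foo line stays in the middle
```

With the first key, changing `import foo` to `from foo import bar` moves the line; with the second it stays put, which is the behaviour the µsort critique asks for.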
isort also kind of has this bad behavior when using 'import as':
```
$ cat foo.py
from x import a, b, d, e
from x import c as C
$ isort foo.py
Fixing /tmp/foo.py
$ cat foo.py
from x import a, b
from x import c as C
from x import d, e
```
It feels to me like importing names from a module gets you a set of names from that module, so I'm already thinking about it as a collection. It doesn't bother me at all that it's turned into a tuple and spread over multiple lines.
I think I prefer it (first example) to the diff (second), it's just that a singular thing imported is such a common case that three lines for it where it so easily fits on one does seem a bit silly.
Really? I just put that exact line in a file I'm working on, and black didn't change anything. Maybe you mean in case it exceeds the line length limit, rather than that specific example.
In any case, you can wrap those in parentheses, in which case black will just enforce its usual tuple formatting: single line if it fits; one line per item if not, with a trailing comma.
edit: I tried it on a long line with a backslash break, and black wrapped the imports in parentheses like I suggested above. I wonder what causes the behaviour you see on your end.
Why do you even care? I never look at that part of the code. If PyCharm automatically removed/added imports without me managing them I would be a happier person.
Reading some of the comments here it's become clear to me that the next stage in the development of auto-formatters is to have the formatter commit the code as a canonical format but to display the code to each individual contributor in the style of their choosing. Thus removing all kinds of arguments about whether 80 or 120 columns is the one true width.
But even now I have to adapt on screen shares when my coworkers are using dark themes before sunset. Can't imagine what that would be like with different code formatting. So the next next step is to display their desktop with my theme.
IOW collaboration tools still have a long way to go.
I think this is the most wonderful part of Lisp. Specifically its homoiconicity, or the fact that the syntax of the program is the program, and yet the syntax (as far as linebreaks, indentation, spaces vs tabs, etc) is completely irrelevant to the meaning of the code.
Ostensibly you could craft a future where what is on disk is not what the user is actually editing, à la the virtual DOM. And on read/save the developer's preference is used to transform the syntax into their ideal shape. This is trivial in a Lisp, but not so easy in other languages.
I'm pretty sure you can already do that with some scripting. Just write a git alias, say git edit, which will run the work tree copy through your favorite formatter to a temp file and send that temp file to your editor, and a commit hook to rename all the temp files back to the original and format them back to whatever the canonical format is. You can also configure your git diff and stash etc to make them aware of the temp file naming convention. You might even be able to write a script to generate all these aliases. There are some annoying details such as needing a separate command to temporarily move the temp files to the work tree to give your IDE a hand, but totally doable. It's going to take maybe a few weeks of work, but doable for a single person.
This is the way. I had not considered this before reading your comment.
If this system results in syntactically identical code [1], it should not matter that it’s displayed differently for you [2]; if it means you can read and write in it more comfortably, it’s just a hairstyle.
I was asked to familiarize myself with Replit the other day and it seemed the editor defaulted to two spaces for Python. Two spaces?! I changed it to four.
A friend joined my session and began to code with me, their editor was in the default two space indentation. It was madness.
[1] This seems like a decent-sized presumption across many languages and versions.
[2] This seems like an interesting AI problem, showing code structures you’ve never used in your style you’ve never defined.
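One cheap way to check footnote [1]'s presumption, that a personal display style and the canonical on-disk style are "syntactically identical", is to compare ASTs. A minimal sketch using only the standard library (`same_syntax` is a hypothetical name). Note that the AST ignores comments, so a real display layer would need to preserve those separately.

```python
import ast


def same_syntax(a: str, b: str) -> bool:
    """True if two sources parse to the same AST.

    ast.dump omits line/column attributes by default, so indentation width,
    line breaks and spacing do not matter; comments are not compared at all.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


two_spaces = "def f(x):\n  if x:\n    return 1\n  return 2\n"
four_spaces = "def f(x):\n    if x:\n        return 1\n    return 2\n"
print(same_syntax(two_spaces, four_spaces))  # True
```

The two-space Replit default and a four-space view of the same file compare equal, while any change in behaviour does not.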
I think making editors do this is within the realm of feasibility. Most support auto-formatting to your preferred style, so it doesn't feel like a leap for them to display your preferred style while keeping the file on disk in the project owner's preferred style. I haven't looked extensively to see if this already exists, but we chatted about it at work while I was advocating for the use of Prettier on a front-end project!
Smudge & clean might do the trick, but it could be dirty. The smudge -> clean process might produce additional changes that aren't related to the purpose of your commit. Whitespace in particular could be a problem, especially where there's ambiguity in how it should be used. black isn't as bad because it has stricter rules on whitespace. Still, if you aren't checking style rules before every merge someone using smudge and clean could end up reformatting entire files.
IMO the next next step is, as others have discussed on HN, getting your version control to store an abstract syntax tree. tree-sitter could make this easier nowadays, but I think it would need more invasive changes to Git than just using filters.
The reason to use Black is the same as Prettier on the HTML/CSS/JS side: forever stop having an opinion on code style, it's wasted time and effort. Any "it's not exactly what we want" comment with an attempt to customize the style to be closer to "what we were already using" is exactly why these things exist: by all means have that opinion, but that's exactly the kind of opinion you shouldn't ever even need to have, tooling should style the code universally consistently "good enough". Which quotes to use, what indent to use, when to split args over multiple lines, it's all time wasted. Even if you worked on a project for 15 years, once you finally add autoformatting, buy in to it. It's going to give you a new code style, and you will never even actively have to follow it. You just need to be able to read it. Auto-formatting will do the rest.
Except Python is a general-purpose programming language, so it's hard to have a one-size-fits-all solution when style varies based on the medium you're working in. Are you making an OOP GUI app? Django? Something that uses loads of long XPaths?
I don't know if that applies. Ideally, a good code formatting tool would work with any project. If there is a specific flag you want to disable for some block to use your own format, then the tool should support that.
As a couple of examples, PHP has had a unified formatting standard since 2013 and Elixir has a formatter built into the language. Both languages need the formatter to be enabled by your IDE/CI and that's also the case for Black.
Aside: I love a good linter, but as a long-time Python fan I find it sad that Black has so little configuration (yes, I know, but still) and moreover that it often produces code that no human Python dev I know would write...
Python was always meant to look concise / beautiful... (MyPy has made this trickier too)
In 30 years of development, the truest statement I can make about standards is that they change, all the time. The second truest is that my coworkers and I have wasted far too much energy arguing about and maintaining them.
Black's value isn't as an autoformatter, it's as a preemptive discussion ender.
> Black's value isn't as an autoformatter, it's as a preemptive discussion ender.
Exactly. I have learned that having it formatted just like I want it is FAR FAR FAR less important than having the entire ecosystem share a single format.
I am just about to embark on resolving differences in our code base caused by two different clang-format files being used by different teams that have now merged. Can't wait to have all those conversations and discussions over which options are correct.
Well, sorta. It's really, really mentally annoying switching between projects where standards are different. For example 80 char limit to 120 char limit takes me at least a month to fully get used to. I agree black is better than the alternative, I agree it has downsides, I'm happy some of the parameters are tunable, but I'm also glad most of them are not. I just want to write software with tools I'm used to.
Maybe it's because I only have one eye and the resulting slightly reduced field of view, but wide lines drive me crazy. I need to see the whole line without scanning. This was one of Python's original appeals to me...
I'm like her in that I have eye issues and thus can't use screens that are too large (16" absolute maximum) and even there it needs to be with large font. I wish everyone had stayed with 80 chars so I could have two vertical emacs buffers, but I've sorta gotten used to these ugly wrap around lines.
I wish people just used more locals though. I don't see what the problem is and it makes Sentry errors easier to debug.
Well, you'll be surprised to find out that gofmt has exactly zero configuration. Ok, they (wisely in my opinion) decided not to mess with breaking lines automatically, and the job was far easier to do with a new language than with an already-established one where most developers have their long-treasured preferences.
Actually, modern versions of black will retain the fluent style you prefer, though it will collapse up to the first method call on the first line within the parens, so you end up with something like:
I get why it's done that, but I just don't think it helps humans read.
Part of the twisted beauty of PEP-008's narrow lines is that you're forced to extract (named) variables, or avoid overly indented code by extracting methods or applying higher level abstractions.
In the last few years I find devs are happier to format and push to "sort that problem out", leaving the readability benefit of that thought process lost.
TL;DR writing readable code isn't just about getting the spaces and brackets right...
always feels a bit off and "unbalanced" to me. The opening paren doesn't have anything immediately following it, so it feels 'symmetric' that the closing paren shouldn't have anything preceding it.
And also it feels like the open and closing parens should be on lines that start at the same indentation level. Honestly this I think does aid in readability a bit.
> Part of the twisted beauty of PEP-008's narrow lines is that you're forced to extract (named) variables, or avoid overly indented code by extracting methods or applying higher level abstractions.
This feels orthogonal? The line is wrapping either way, which might sufficiently annoy someone to extract things out a bit more. But IMO it feels like a bit of an anti-pattern to create abstractions on the basis of syntax as opposed to the structure of the program.
> writing readable code isn't just about getting the spaces and brackets right...
? I mean of course not, but that's what we're talking about in the context of formatters right? I think the real, major way auto-formatters help with readability is by getting people to stop wasting mental cycles on things like spaces and brackets so that they can focus on more important code organization concerns.
I agree with it feeling unbalanced. I don't register that statement as being complete. Putting the parenthesis on its own line is the same as putting a closing curly brace on its own line in languages that use those.
```c
int foo() {
    return 1; }
```
(This example actually breaks up vertically in my mind. As if it's just the number 1 being bracketed)
Maybe it could be broken up differently though to avoid the lone paren.
I also prefer the close paren to be on its own line. Besides how it looks, it feels easier to me to add to the code inside the parens (but this is likely because I use vim).
Every time I was tempted to do something like this, I hesitated because I didn't want every other line in every file with my name on a single commit, mostly to avoid making git blame harder than necessary. It would be nice if there was a kind of diffing algorithm that can diff code units *syntactically* across history.
The problem with this approach is that the blame before and after the ignored commit wouldn’t make any sense to a viewer who didn’t know the formatting commit was being ignored. Also, you will need to configure that for every clone. Since tree diffing algorithms are pretty well known these days, I don’t know why there hasn’t been any real effort to implement a git plugin that can chase syntax tree node changes instead of doing string diffing like it was the 70s. Syntax parsers are so easy to write now, and surely the tree node changes can be cached. Your usual diff/patch tooling wouldn’t work for this kind of diff, but that’s just an option away when you need it back.
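A toy illustration of diffing "code units" syntactically rather than textually, as suggested above: fingerprint each top-level function or class by its AST dump, so a pure reformatting commit changes nothing. `function_fingerprints` and `changed_units` are hypothetical helpers, not an existing git plugin; a blame tool built on the idea would still have to walk history and attribute each unit to the last commit that changed its fingerprint.

```python
import ast


def function_fingerprints(source: str) -> dict[str, str]:
    """Map each top-level function/class name to a formatting-independent dump."""
    return {
        node.name: ast.dump(node)
        for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def changed_units(old: str, new: str) -> set[str]:
    """Names of definitions that were added, removed, or changed in substance."""
    a, b = function_fingerprints(old), function_fingerprints(new)
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


old = "def f(a,b):\n    return a+b\n\ndef g():\n    return 1\n"
new = "def f(a, b):\n    return a + b\n\n\ndef g():\n    return 2\n"
print(changed_units(old, new))  # {'g'}: f was only reformatted
```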
This is a nice feature, but I do wish that .git-blame-ignore-revs was automatically applied, similarly to .gitignore and .gitattributes. Hopefully there are plans to do so in a future Git release?
Not everyone uses PyCharm, but if you do it's really easy to highlight a specific code block and look through the git commit history for that section. I've used it many times for this exact type of problem, trying to find when the last substantive change happened.
To do this just highlight the block, right click, and choose Git > Show History for Selection.
The best way to do this is to rewrite history with git filter-branch (or similar) and rerun black at every commit. Then everyone nukes their clone and you continue on with the best of both worlds.
The only real downside is you nuke your issue tracker at the same time.
In my experience it's better to just bite the bullet and do it. Eventually you will do it, so you either screw up git blame for a small codebase with a small amount of history, or wait until it is a large codebase with a large amount of history to screw up.
> It would be nice if there was a kind of diffing algorithm that can diff code units syntactically across history.
There have been quite a few attempts at that though I've only seen them applied to resolving merge conflicts. It would be interesting to try them for blame too.
Does the user matter? As long as the commit message is something sensible like 'Autoformat with black' it can be easily ignored when seen, and you can avoid seeing it with blame as simonw suggests.
The problem is that this revision will override all the previous ones in the “blame” output so it needs to be explicitly ignored. See a great link elsewhere in the thread on how to deal with that in newer versions of git.
- doesn't respect vertical space - sure, making the code fit on screen might be valuable (though the default width should be at least 120 characters, I mean we're in 2022 after all), but Black does it by blowing up the vertical space used by the code
- spurious changes in commits - if you happen to indent a block, Black will cause lines to break
- Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
> doesn't respect vertical space - sure, making the code fit on screen might be valuable (though the default width should be at least 120 characters, I mean we're in 2022 after all), but Black does it by blowing up the vertical space used by the code
This is fine with me--I think it makes sense to optimize for readability, and I can read a long vertical list of arguments a lot more readily than a long comma-delimited list.
> spurious changes in commits - if you happen to indent a block, Black will cause lines to break
Is this a generic argument against wrapping lines, or am I misunderstanding something?
> Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
I'm not following this either. If black automatically reformats your code over multiple lines, that doesn't suggest manual formatting. Maybe you're arguing that all code which produces a given AST should be formatted in the same way--this would be cool and I would agree, but black gets us 95% of the way there so to argue that it "fails" is to imply that "0%" and "<100%" are equivalent.
> the default width should be at least 120 characters, I mean we're in 2022 after all
Even in 2022, some people don't have wide external monitors, sometimes like to view two files (or a diff) side-by-side, or need to use GitHub/BitBucket/etc. code viewer pages. Also, it's still difficult for humans to read long lines.
Agreed. It's really nice to be able to have two files side-by-side (or a file plus a shell) without having tiny font. I still do the overwhelming majority of my work on a laptop--maybe I'd feel differently with one or more 27" 4K monitors, but even then I don't think a code formatter should make these kinds of assumptions.
> This is fine with me--I think it makes sense to optimize for readability
You cannot read things you can't see. If half a function is scrolled off the bottom of the screen because every function arg is on its own line... it's pretty annoying.
I also noticed we are in 2022, and my screen is so big I can have three or four files of 80'ish chars wide side by side. Especially with Django, where you usually need models.py, views.py, forms.py and a template open at the same time. With 120'ish lines, I lose one vertical split.
I have to bump up my font size a bit and find 120 characters too wide on a 27" monitor where I need to look at multiple things side by side. It's also harder to read even when viewing a single file.
IMO, < 80 is ideal where possible with an absolute maximum of 99. I think Black's choice of 88 (plus maybe a little more in special cases) is quite good.
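If you want to see what the 80/88/99/120 debate means for a particular codebase, counting overlong lines is a quick first check. A small sketch; `overlong_lines` is a hypothetical helper, and it counts characters rather than rendered columns (tabs and wide characters are not accounted for).

```python
def overlong_lines(source: str, limits: tuple[int, ...] = (80, 88, 99, 120)) -> dict[int, int]:
    """Count how many lines of `source` are longer than each candidate width."""
    lengths = [len(line) for line in source.splitlines()]
    return {limit: sum(n > limit for n in lengths) for limit in limits}


sample = "a" * 70 + "\n" + "b" * 85 + "\n" + "c" * 100 + "\n"
print(overlong_lines(sample))  # {80: 2, 88: 1, 99: 1, 120: 0}
```

Passing the contents of a real file (for example one of the Django `models.py` files mentioned above) shows how many lines each limit would leave over the edge.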
It's odd that nobody followed Go's formatter in letting developers break lines themselves and mostly fixing indentation and spacing. I thought they made good choices.
Honestly the only grievance I have with Go's formatter is that it doesn't automatically break lines. I'd be a big fan of "if two programs parse to the same AST, they should format the same" and if that's too aggressive perhaps allow for `// go:nofmt` annotations or something. In whatever case, `gofmt` gets at least 95% right.
Nah, I can totally understand why they decided to stay away from this can of worms. First, what max line width do you choose? Second, where do you break a line if it's too long? I think gofmt gets the balance exactly right: makes source code easier to read by providing a unified formatting style, but doesn't get in your way more than necessary.
The whole point of an opinionated formatter is to have opinions about these sorts of things.
> Second, where do you break a line if it's too long?
It depends on the context. Yeah, writing the algorithm to make these decisions is a little complex, but it's also well-understood.
> doesn't get in your way more than necessary
What is "necessary"? It seems like you're trying to say "it makes decisions on the things I think it should make decisions on" which is fine, but it's not like choosing between `struct {` and `struct{` is objectively more critical than line wrapping.
When I'm writing Python at Google, and get yet another error because my Python or Markdown line exceeds 80 characters, and read the fights on the mailing lists about changing the limit, I think Go was created because it was easier to create a whole new language than get the line length increased for Python.
Why do you get errors rather than auto-formatting (in the editor or in CI) and moving on with your day? I would have thought Google would have this sorted already?
I know! Usually the editor autoformats on save and while typing, but there are some edge cases where the regular incremental formatter fails but it's rare enough that I don't reflexively hit the "format all files" button and then get caught out by pre-submit tests.
> - Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
It's one of my favorite things about black, and I've started to use that formatting of function calls with long arguments for other languages too.
But I also despise long lines with a passion, I hate having to go to the right, and would much much rather scroll up and down with a consistent width, so that I can put multiple views next to each other.
I don’t mind the formatting, I mind that the formatting is done depending on whether the list ends with a comma or not.
```python
[
    Item1,
    Item2
]
```
is combined into one line, while
```python
[
    Item1,
    Item2,
]
```
stays multi-line. Now I am once again in charge of formatting my code, by virtue of the comma. Does this stay multi-line, or is it short enough to combine? That should be for black to decide, not me!
Yes, but wasn’t the whole point not having to run the linter with options?
The trailing comma thing seems inconsistent with handing over the formatting to black. Now I’m in charge of deciding if a list should be cleaned up into one line or not.
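A toy model of the behaviour described in this sub-thread, which Black's documentation calls the "magic trailing comma" (it can be turned off with `--skip-magic-trailing-comma`). `format_list` is a hypothetical function that only captures the decision rule for a flat list; it is not how Black is implemented.

```python
def format_list(items: list[str], trailing_comma: bool, width: int = 88) -> str:
    """Toy model of the "magic trailing comma" rule for a flat list literal.

    Not Black's implementation: a pre-existing trailing comma forces one item
    per line; otherwise the list is joined onto one line if it fits in `width`.
    """
    one_line = "[" + ", ".join(items) + "]"
    if not trailing_comma and len(one_line) <= width:
        return one_line
    # Exploded form: one item per line, each followed by a comma.
    return "[\n" + "".join(f"    {item},\n" for item in items) + "]"


print(format_list(["Item1", "Item2"], trailing_comma=False))  # [Item1, Item2]
print(format_list(["Item1", "Item2"], trailing_comma=True))
```

The model makes the complaint explicit: the same two items produce different layouts depending only on the `trailing_comma` input, which is the one bit of formatting the author still controls.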
My monitor is in portrait mode. Even when I used one in landscape, I typically had two windows side by side. So extra-wide lines of code are less readable.
> - Black fails at its most basic premise - "avoiding manual code formatting" - because a trailing comma causes a list/function call to be split over lines regardless of width
Ah, that's why `manage.py shell` now splits pasted JSON over several lines. Very annoying.
I did a blind survey of YAPF vs Black at my work. The results came back as 70% in favour of Black.
Black gives generally nicer output, and also more predictable output because its folding algorithm is simpler. YAPF uses a global optimisation which makes it make very strange decisions sometimes. Black does too, but much less often.
There are also non-style problems with YAPF. It occasionally fails to produce stable output, i.e. yapf(yapf(x)) != yapf(x). In some cases it never stabilises - flip flopping between alternatives forever!
Finally it seems to have very bad worst case performance. On some long files it takes so long that we have to exclude them from formatting. Black has no issue.
In conclusion, don't use YAPF! Black is better in almost every way!
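The stability problem described above (`yapf(yapf(x)) != yapf(x)`, sometimes flip-flopping forever) can be checked mechanically by feeding a formatter its own output until it stops changing. A sketch of such a harness; `check_stability` is a hypothetical helper that accepts any `str -> str` formatting callable, and the toy formatters in the tests stand in for a real one.

```python
from typing import Callable


def check_stability(fmt: Callable[[str], str], source: str, max_rounds: int = 10) -> str:
    """Apply `fmt` repeatedly and classify how its output behaves.

    "idempotent": fmt(fmt(x)) == fmt(x)
    "converges":  reaches a fixed point, but only after more than one pass
    "cycles":     an earlier output reappears (flip-flopping between layouts)
    "unknown":    no verdict within max_rounds
    """
    seen = [source]
    current = source
    for _ in range(max_rounds):
        nxt = fmt(current)
        if nxt == current:
            # seen holds the input plus each distinct output produced so far.
            return "idempotent" if len(seen) <= 2 else "converges"
        if nxt in seen:
            return "cycles"
        seen.append(nxt)
        current = nxt
    return "unknown"
```

A formatter that strips trailing whitespace is idempotent, `str.swapcase` flip-flops forever, and a formatter that removes only one double space per pass converges slowly.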
Yeah exactly. I had 20 samples from our codebase that showed some representative differences and you had to click on which one you liked more. The order (Black/YAPF or YAPF/Black) was randomised.
I also had to turn off Black's quote normalisation otherwise it is really obvious which is which. Quote normalisation is another point in Black's favour.
I could put the survey up somewhere if anyone is interested.
YAPF is slower than Black for many degenerate cases, a fact I notice most strongly since I use an "auto-format file on file save" extension in my editor. The case I found in particular was editing large JSON schema definitions in Python, as they're represented as deeply nested dictionaries. Black seems to format them in linear time based on the number of bytes in the file, while YAPF seems to get exponentially slower based on the complexity of the hard-coded data structure. It was a niche case, and the maximum slowdown was only ~1-2 seconds, but that editing freeze was quite annoying.
What's wrong with configurable? Too much opportunity to bikeshed?
I figured yapf was not "new" which is why black won.
Starting about 5-6 years ago there was a push in the Python community to replace solved problems with new ones, in what looks to me like chasing the JavaScript community.
Instead of consolidating on existing tools that worked well but had some rough edges to smooth out, numerous projects came about to reinvent the wheel.
There is no "best format". It's a matter of opinion.
Taste is something we cannot objectively agree on, and in fact, people will end up arguing even about this very statement, offering what they think is an objective measure.
Bikeshedding, yes. Hours lost in meetings, chat debates, documentation to write, linting configuration. To be redone for each project, team, etc. Even worse in FOSS, where everybody will come into a ticket and complain. And after a while, you do it again, because the debate is never settled, even within the same team or project.
There is no way to take a team of 3 people, choose a style, and make it so that they are all 100% happy with it.
Black took the road of gofmt: you can't choose. And it won because of that: it saved people time and energy.
People realize the cost was not worth the satisfaction, which you are unlikely to get anyway. Let's just move on to what matters, it's good enough. Pareto.
> Bikeshedding, yes. Hours lost in meetings, chat debates, documentation to write, linting configuration
Sounds like a team problem. I've been on plenty of teams that use clang-format for C/C++ and there have never been any issues like this. Team players know that (almost all) arguing over formatting is not a good use of time. (edit: in case it's not clear, clang-format is extremely configurable. Set a default config and live with it forever; that's how those teams work.)
> Black took the road of gofmt: you can't choose. And it won because of that: it saved people time and energy.
I don't see how this follows, if a team was dysfunctional enough to be wasting hours and hours of time before, I can't imagine why that wouldn't continue. It just shifts from "let's change this flag in yapf" to "let's switch to yapf because black looks ugly and gives us no options".
Well if it helps your team out then that's great, I just wouldn't expect that to generalize well. But who knows, people are weird, maybe more would give up fighting about style because of a tool change than I expect.
Yes, and also too hard to set up. It's extremely dumb, but I'm much more likely to use something I can't configure, because if I can configure it, I'm going to want to, and it'll take forever to make all those choices.
I'm so happy that languages are settling more and more on heavy reformatter usage. I'd like to think it was triggered by Go and gofmt. Working on a team where each engineer has their own personal syntax is not fun.
Indeed, I don't like Black's style, but I prefer working in a Black codebase than one where everyone has their own preference. Having style guidelines in a team is also a great way to remove pointless debates when reviewing PRs.
Agreed, and what's interesting is that despite all of those pointless style debates, there hasn't been much pushback on using reformatters (that I've seen). This tells me that the debates weren't really about "my style is objectively best" but more about "I'd like to use a consistent/predictable style (with preference to mine)."
Quite right. I have not met many people who were actually strongly opinionated about their coding style. They simply did not want to follow a tribal, poorly defined one that just added friction to the development process.
...which is why I wish Black allowed more configuration. A team can often agree on a set of styles. Every team on the Python planet agreeing... now that's much harder
I disagree on that though. By sticking to vanilla Black (no pun intended) you ensure that people joining your team will probably already be familiar with the style, and you prevent strongly opinionated employees from pushing for changes in the linter config.
Black is opinionated, so it skips the debate entirely. We just use Black, not Black with 120-character lines, just Black.
To each their own I guess, but to me it just seems like Pandora's box. Once you show that you are open to changes in the linting configuration, it makes the rules mutable and pretty much guarantees that at some point someone will say "how about we just change this one parameter in the linter", which will probably be agreed to by the rest of the team, not because they actually agree but because they don't want to argue.
That does not really matter as long as the defaults are somewhat sensible. As far as I can tell, Black's are.
gofmt is also opinionated. Someone somewhere picked the defaults and maybe I disagree with those defaults, but in the grand scheme of things, all Go code is more approachable because it all looks the same. That, to me, is worth a lot more than having my favorite format be the one on top.
Go and gofmt definitely pushed a lot of the momentum of the current wave, but don't forget to give respect to Ruby / RuboCop where it's due, where the adage of Convention over Configuration has reigned supreme for decades.
Sure but the style guide standards exist and it's pretty rare for a Ruby application to stray from them (from my experience at a multi-billion dollar public company Ruby shop)
Style guides are a notorious time-sink where people will spend enormous amounts of time debating various conventions without that being linked to measurable benefits. One of the big problems here is that people notoriously conflate “familiar” with “better” and you rarely run the counter-experiment showing that after a couple weeks everyone would be familiar with any of the serious proposals.
The advantage of a tool like Black is that it avoids that constant bikeshedding and the fact that it actually does the work for you puts the conversation in a different light because the option which is the least work is just letting Black format the code. Whatever you pick for style, you really want automatic formatting to avoid it seeming like a chore.
> Style guides are a notorious time-sink where people will spend enormous amounts of time debating various conventions without that being linked to measurable benefits.
It feels like we're trying to justify the continued employment of uncooperative, contrarian egoists. Pick a style and use it or they can go find another job to waste time debating nonsense.
There's some truth to that but I think it's more than that. This comes up all of the time in the UI world: if you ask people their opinion on something in the abstract, they'll often provide a ton of ideas but if you give them something which they can use now they'll probably either say it's okay or come up with a much smaller list of things they actually care about. Things like style guides are especially tricky here because people have opinions but they don't get a bill for changes — having a formatter available can shift that dynamic to where you can basically say “Black does this for free every time you hit save. Are you willing to build a tool which maintains that level of effort?”
I've had that happen a few times and the number of people who will volunteer opinions has consistently been at least an order of magnitude larger than those who are willing to contribute something like linter or formatter configuration.
Black is slowly creeping into gofmt-level universality in the Python community and it’s great. The next big milestone is a first-party recommendation by python.org itself.
No, the next big milestone is embedding the format style as the syntax of the language. I'm curious as to why Go didn't even do this (they should have, in my opinion, but wimped out and left it to an external tool).
In general, what are the strategies for large public codebases like this to mitigate supply chain attacks or other source-level attacks?
For clarity, I'm hoping to open up discussion about how we're dealing with massive changesets like this that are difficult to review, chiefly because of their breadth.
For a purely mechanical change like this, someone could run black against the same revision of Django and verify the changes they see locally match the changes in this PR.
The same version of Black, with the same settings, will always produce the same results from the same input code. Definitely reproducible. The real question is how stable the formatting is from version to version, which is now more stable, and that's why Django finally made the move.
Interesting! Can you help me imagine attack scenarios? All I can think of is:
- The changeset is authored by a trusted committer but the committer's tools have been locally compromised.
- The public tool itself (e.g. black) has been compromised to automatically create vulnerabilities in difficult-to-review bits of code (a Ken Thompson hack).
As a reformatting tool should only change the formatting, you could check that the Abstract Syntax Tree is unchanged. The ast module in the standard library gives access to the AST [1].
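A minimal sketch of that check using only the standard library: parse both versions and compare the `ast.dump()` output, which omits line and column attributes by default, so changes that only affect layout compare equal. `same_ast` is a hypothetical helper name. Two caveats: comments are not part of the AST, so this check cannot see them, and Black reindents docstrings, which changes string constants, so a strict comparison like this may flag a change that Black's own built-in safety check (on by default) accepts.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if two versions of a module parse to the same syntax tree.

    ast.dump() leaves out line/column attributes by default, so whitespace,
    line breaks, quote style and redundant parentheses do not affect the result.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x = ( 1+2 )\nnames = ['a',\n 'b']\n"
reformatted = 'x = 1 + 2\nnames = ["a", "b"]\n'
tampered = 'x = 1 + 3\nnames = ["a", "b"]\n'

print(same_ast(original, reformatted))  # True: layout and quote style only
print(same_ast(original, tampered))     # False: a constant changed
```

Running this over every file touched by a "formatting only" PR (old revision vs. new revision) gives a cheap, mechanical review aid for the supply-chain concern above.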
It's a significant milestone in the adoption of Black by influential projects within the Python ecosystem, which makes it a good hook for discussing the idea that Black, now stable, is becoming established as the standard for code formatting for Python.
Using black is not about how the code looks but to eliminate an entire suite of review comments/discussions. Everyone simply runs black over all code before submitting and no one ever comments about how anything is formatted.
Naive question, but why is everybody so aggravated by formatting discussions? It seems to be a widespread opinion that these discussions are just 1) pointless and 2) difficult and time consuming.
My personal experience is that 1) in many cases you do benefit from taking a moment, going through your code and thinking about presentation. And 2) I find it not at all difficult to settle. A change either doesn't matter, then you just don't discuss it at all, or it is important, then you quickly agree on the best solution. (In the worst case, "best" means what the project lead finds prettier.) If you don't have a social mechanism to agree on something as basic as coding style, then your team probably has bigger problems.
I actually find robo-formatted code annoying to read: Go code from a complete beginner who doesn't know what they are doing looks exactly like carefully tended, highly thought-out code. And in autoformatted Python, for example, you cannot make formulas clearer by removing spaces around operators with higher precedence. Parentheses placement is dictated by how long words are and not by what logically belongs together, etc.
I think it's more that formatting discussion is _easy_: if you get a merge request, you can start adding comments like “this should be indented differently” or “wrap this here” and spend a lot of time “reviewing” the request without noticing that you missed an error in the logic because you were focused on the most visibly upsetting problem.
Using an automatic formatter also can help reduce diff noise, which is something I've noticed on more active projects. Using Black drives close to zero the amount of time I spend confirming that someone didn't actually change functionality along with formatting. There are other ways to get this, of course, but it's so easy just to enable Black and never spend your time on it again.
> Parentheses placement is dictated by how long words are and not by what logically belongs together, etc.
This is actually a bit more subtle: Black will remove parentheses when they don't have any effect (e.g. `(3*2)` will become `3 * 2` because they cover the entire expression), but if you use them for only a subset of the expression they'll be preserved (e.g. `(1*(3*2))` becomes `1 * (3 * 2)`).
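One way to see which parentheses are semantically redundant: in the standard-library `ast` module, parentheses leave no node of their own; they only matter when they change how the expression groups. Multiplication is left-associative, so `1 * 3 * 2` parses as `(1 * 3) * 2`, a different tree from `1 * (3 * 2)`. A small illustration (the helper `tree` is just for brevity); whether Black actually removes a given redundant pair is a separate, context-dependent decision that this does not model.

```python
import ast


def tree(expr: str) -> str:
    # mode="eval" parses a single expression; dump() omits position info by default.
    return ast.dump(ast.parse(expr, mode="eval"))


# Parentheses around the whole expression add nothing to the tree...
print(tree("(3*2)") == tree("3 * 2"))        # True
# ...and neither do parentheses that match the default left-to-right grouping,
print(tree("(1*3)*2") == tree("1 * 3 * 2"))  # True
# but grouping the right-hand operand produces a different tree.
print(tree("1*(3*2)") == tree("1 * 3 * 2"))  # False
```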
They are also huge timesinks, people spend less time sometimes deciding which framework or language or cloud provider to use and which country to register their company in, than spending hours and hours on how some SICK ANIMAL forgot a trailing comma somewhere, or bikeshedding how many spaces to indent with, and who uses an ultrawide monitor and wants wide columns (but then again someone turns theirs into portrait mode, and wants narrow columns), etc. Gobs and gobs of time, totally disproportionate to the issue at hand!
Yes, black sometimes produces fugly code, but at least the bikeshedding can fucking stop. There are bugs to fix, for chrissake. Yes, some people are not thoughtful enough to format their code like a piece of poetry, so what? Should everyone take Typography 101?
There is also a huge cohort of programmers who, when doing reviews and finding nothing to nitpick, resort to criticizing, in great detail, the code formatting of pull requests. They feel they MUST leave some critique or else their review is incomplete and their peepee is small. I find it hugely enjoyable that a tool like Black can deprive them of the joy of belittling someone else for microscopically minor things and make them concentrate on, say, the actual merit of a change.
Like you, I'm quite fascinated by the massive frustration and time sink that formatting discussions apparently cause. I've been working in software for nearly 25 years at all levels, rarely using auto-formatted code, and literally can't remember having one of these discussions. If anything, I might even say I wish people cared a little more.
I do quite often artisanally craft formatting of specific code sections to highlight the intent of the algorithm or design. I assume black would merrily destroy all my hard work.
By chaining yourself to a format preferred by a machine, you free yourself of having to understand how and why another human thinks the way they do and prefers what they prefer.
With a style guide and linter I've never experienced this and idk why you would. Then the only time style comments come up is pointing someone to the guide
and that is exactly what tools like Black do for you. A linter tells you a line of code could be better, whereas something like Black goes further and makes the change for you, so you don't even need to think about it.
So now when you look at the annotated change history all you're going to see is a bunch of changes by the person that reformatted the code instead of the person that wrote it.
The `.git-blame-ignore-revs` file can be used to ignore that (and will be [1]). Unfortunately GitHub doesn't support it, but at least it's possible to have clients behave in a reasonable way.
There's workarounds that others have mentioned, but indeed, the unfortunate side-effect of deciding to apply a formatter is a 'formatting' commit, causing a lot of code churn and issues if naively using git blame.
But it's a "rip the plaster off" kinda thing, because it should mean a lot less churn, fewer style inconsistencies, and fewer arguments and reviews about formatting after this is merged. It frees up a lot of headspace and removes distractions in code reviews. I don't know about you, but when I did code reviews I'd always end up zooming in on code style issues: ' vs ", things on newlines or not, JS objects with string keys, etc.
Uh so is your take "don't do broad refactors ever?"
Beyond `.git-blame-ignore-revs` (which is neat and TIL), in GitHub's web viewer, if you find the line you're interested in and see that the most recent PR is a reformat, you click the "view blame prior to this change" button. I think most blame viewers do (or at least should) have a feature like this.
I love this except the use of the default black line length of 88. One of the things I appreciate about gofmt is being trusted with deciding on line breaks.
I suggested Black to a team I was on a year ago and one developer hemmed and hawed about how he likes to format arrays or something. I didn't win any friends by pointing out that disregarding those personal preferences is part of why I was recommending it.
A year later and it seems to be the default on all projects I'm working on and I'm loving it.
Autoformatters are hell for 2d arrays of data where the columns have meaning and you want them to be aligned (time series, matrix math). It’s my only real gripe.
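Black does offer an escape hatch for exactly this: `# fmt: off` / `# fmt: on` comments mark a region Black leaves untouched, and a trailing `# fmt: skip` does the same for a single line. A sketch of an aligned matrix protected this way; the matrix and the `matvec` helper are made up for illustration.

```python
# fmt: off
ROTATE_90 = [
    [ 0, -1,  0],
    [ 1,  0,  0],
    [ 0,  0,  1],
]
# fmt: on


def matvec(m: list[list[int]], v: list[int]) -> list[int]:
    # Multiply a matrix by a vector; this part is formatted by Black as usual.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


print(matvec(ROTATE_90, [1, 0, 0]))  # [0, 1, 0]
```

The same comments work for the hand-crafted formula layouts mentioned above, at the cost of those regions no longer being checked by the formatter at all.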
This is great news. We've been using Black at the company I work for for the past 3 years or so, and it was a game changer for code reviews. Hopefully other open source Python/Django projects will follow the lead.
What's the point of putting linters into CI? Is the point to fail the build if the code wasn't pre-formatted with e.g. Black? Or is the point to autoformat and autocommit the formatted code?
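A common setup is the first option: CI runs the formatter in a read-only check mode (for Black, `black --check`, optionally with `--diff`) and fails if any file would change, while developers format locally or via a pre-commit hook. The logic is sketched below with a stand-in formatter so it runs without Black installed; `check_files` and `strip_trailing_ws` are hypothetical names, and the stand-in only strips trailing whitespace.

```python
import difflib
from typing import Callable


def check_files(files: dict[str, str], fmt: Callable[[str], str]) -> int:
    """Return a CI-style exit code: 0 if every file is already formatted, 1 otherwise.

    `files` maps a path to its contents; nothing is written back.
    """
    failed = False
    for path, src in sorted(files.items()):
        formatted = fmt(src)
        if formatted != src:
            failed = True
            print(f"would reformat {path}")
            diff = difflib.unified_diff(
                src.splitlines(keepends=True),
                formatted.splitlines(keepends=True),
                fromfile=path,
                tofile=path,
            )
            print("".join(diff), end="")
    return 1 if failed else 0


def strip_trailing_ws(src: str) -> str:
    # Stand-in formatter: remove trailing whitespace from every line.
    return "".join(line.rstrip() + "\n" for line in src.splitlines())


exit_code = check_files({"ok.py": "x = 1\n", "bad.py": "y = 2   \n"}, strip_trailing_ws)
print(exit_code)  # 1
```

Failing instead of autocommitting keeps CI from pushing commits the author never saw; the author just runs the formatter and pushes again.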
Had a great experience with black. Only thing I did was change its default line length limit to 120 characters (I was regularly dealing with signal names from source data that were about 90 chars).
Do Black and other autoformatters enable significantly more reusable code and computer-generated code? Formatting is certainly not the only or greatest barrier, but if format is standardized across projects, it's easier to plug and play code from outside.
The “Black” developer refused, for a long time and in a very aggressive manner, to add an option to format code with single quotes. Now the Django devs don’t have that option for single quotes, and the code looks unpleasant.
I have always used single quotes for Python code since I started working with it. When I started to adopt Black on my projects it indeed felt weird and the code looked unpleasant. But after a while you get used to it.
Some people make the case that it's easier to write single quotes (depending on the keyboard layout, anyway). On a standard US keyboard you have to hold the Shift key to type a double quote. But the good thing about Black is that you can still write your code using single quotes, and when you run the command-line utility it will normalize the code to use double quotes.
Nowadays I've gotten so used to it that I even write my Python code using double quotes, and Python code using single quotes now looks weird/unpleasant to me.
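Roughly, that normalization rewrites string literals to prefer double quotes, but only where doing so does not add escapes. The sketch below uses the standard-library `tokenize` module for a much-simplified version: it only converts plain single-quoted literals whose body contains no quote characters or backslashes, and leaves everything else (prefixed strings such as `f'...'` or `r'...'`, triple quotes, strings containing `"`) alone. `normalize_quotes` is a hypothetical helper, not Black's implementation.

```python
import io
import tokenize


def normalize_quotes(src: str) -> str:
    """Rewrite simple '...' literals as "..." (a much-simplified, Black-style pass)."""
    lines = src.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        text = tok.string
        if tok.type != tokenize.STRING or not text.startswith("'") or text.startswith("'''"):
            continue  # not a plain single-quoted literal (prefixed, triple-quoted, ...)
        body = text[1:-1]
        if any(ch in body for ch in "'\"\\"):
            continue  # switching quotes here could require escapes; leave it alone
        (row, col), (_, end_col) = tok.start, tok.end
        line = lines[row - 1]
        # Same-length replacement, so later token columns on this line stay valid.
        lines[row - 1] = line[:col] + '"' + body + '"' + line[end_col:]
    return "".join(lines)


print(normalize_quotes("syslog('debug', 'opened %s', filename)\n"), end="")
# syslog("debug", "opened %s", filename)
```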
I use single quotes for items that, while technically a string, could be considered a value or symbol. For example:
syslog('debug',"Just opened %s for output",filename)
While there's no semantic difference between single and double quote, in my code base, there is. And if black becomes very popular, why even support single quotes anymore?
Technically single or double quotes have the exact same meaning in Python. What makes people use single quotes is probably other languages like PHP, Perl and Bash.
I know I've made it a habit to default to single quotes unless I know I need double quotes. So that might be where the habit comes from in the Django project. But it's not actually necessary in python so might as well use the most commonly used type of quote.
To keep the single quotes, which in my opinion make the code less cluttered and closer to the REPL, I use the pre-commit hook double-quote-string-fixer, in conjunction with black's option skip-string-normalization set to true.
0: https://github.com/django/deps/blob/main/accepted/0008-black...
1: https://news.ycombinator.com/item?id=30130316
2: https://github.com/django/django/pull/15387