Pyflakes makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives.
Pyflakes is also faster than Pylint or Pychecker. This is largely because Pyflakes only examines the syntax tree of each file individually. As a consequence, Pyflakes is more limited in the types of things it can check.
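For a flavor of the kind of single-file mistakes it does catch, here's a minimal sketch (the function and the typo are contrived for illustration); Pyflakes flags both problems without saying a word about style:

import os  # 'os' imported but unused

def greet(name):
    print(nmae)  # undefined name 'nmae'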
It's also even less well documented; I have no idea if you can even create a custom check. Pylint has the advantage that, badly documented as it is, creating custom lints is supported and normal.
That's a nonsense argument. The output is proportional to the number of violations. If you keep your project clean, you'll get a report of about 100 lines showing the statistics, with a one-line message at the end: "your code has been rated at 10/10".
Of course, if you run it against a project that is full of violations, you'll get lots of violation messages. If you use a linter or style checker or whatever on a project that is full of violations, and it doesn't produce lots of output, then the tool is rubbish.
The right thing to do is to use it from the start and either fix or explicitly ignore any violations as they come up.
The approach described in this blog post is good for existing projects. However, the linked examples disable all warnings and explicitly enable those that the author has deemed important. It would be better to explicitly disable unimportant ones.
Don't get me wrong, pylint is great, but at some point it seems those tools begin to get opinionated instead of helping.
Number and size of methods are both code smells that indicate you might be creating a God Object or God Function. If there's a good reason for an enormous function, then violations can be ignored on a case-by-case, per-project, or per-user basis. I'd prefer a tool that can warn about that sort of thing to one that can't.
Most of the limits are configurable, and set to reasonably generous defaults. Methods per class is 20 and statements per method is 50.
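And if the defaults don't fit a project, they're ordinary pylintrc settings; a minimal sketch using the design checker's real option names, shown at their default values:

[DESIGN]
max-public-methods=20
max-statements=50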
And the author has already provided a constructive solution to the problem other than "not linting": start with a whitelist and slowly crank the linter up. That gives you instant value and a path to incrementally more value over time. Trying to crank the linter down instead means a lot of work before you get any value at all, followed even then, in the real world, by probably staring at the list of issues, getting overwhelmed, and deciding the code seems to be working fine the way it is, so why bother?
(Remember, we are not superhumans. None of us respond to a list of 20,000 style issues by dropping everything and fixing them all. This is not even wrong; we cannot operate like that, so there's no point in pretending that we either do or should.)
In this process, I may exclude a folder or two if they are non-vital and have a lot of dirty files. I think I have fixed code with nearly 10,000 violations this way; it takes 3 hours max if you use an efficient editor. Then the code is in some shape that you can start working on.
Pylint is also best used in tandem with an editor or IDE, so you can get live feedback.
The author's problem comes directly from not using pylint, then introducing it to a large, established project.
You simply can't get a list of 20,000 style issues when you first run this kind of checker, if you use it from the start of your project.
If you introduce lint later on, the proposed approach of whitelisting and ramping up is OK, and probably a good idea to start with if you are really swamped, but once you have done a bit of ramping up you should switch to blacklisting and ramping down. IMO, unless you have tonnes of violations evenly distributed across all the different types, blacklisting and ramping down is the better approach.
Blacklisting is better than whitelisting in this case for a few reasons:
* By disabling all, you will miss out on any new rules you didn't know about (e.g. from plugins or new versions of pylint)
* Whitelisting requires you to think about all the possible violations and work out if they matter to you; blacklisting allows you to just look at the individual high-frequency violations in your codebase and work out if those particular ones matter to you.
* Blacklisting allows you to knock off the low-hanging-fruit where there are just a few instances of a particular violation, and keep that aspect clean as you go. Whitelisting ignores anything you have personally deemed unimportant, possibly saving up a big cleanup for later.
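In practice, ramping down is just a disable list in your pylintrc that you whittle away at; a sketch, with an arbitrary example selection of messages:

[MESSAGES CONTROL]
disable=missing-docstring,
    too-few-public-methods,
    fixme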
Looking at one of the linked examples - there are disabled warnings that guard against quite subtle bugs and maintainability problems (e.g. dangerous-default-value), some that are rarely seen and simple to fix (e.g. unnecessary-semicolon), and some that are syntax errors in more recent Pythons (e.g. duplicate-argument-name). Even in an older project, nothing is gained by disabling these warnings; you just end up with noise in your pylintrc file.
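For anyone who hasn't been bitten by it, this is the trap dangerous-default-value points at: the default list is created once, at function definition time, and then shared by every call.

def append_item(item, bucket=[]):  # triggers dangerous-default-value
    bucket.append(item)
    return bucket

append_item(1)  # [1]
append_item(2)  # [1, 2] - the same list, mutated again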
Pylint produces a report that shows the counts of the different violations, in frequency order. Start at the top and work your way down to either produce a blacklist or fix the problems.
Yes, but there are several cases where this doesn't matter. Exceptions for example.
> Number and size of methods are both code smells
Yes, but where do Pylint's default values come from? It's pretty easy to trigger them without being a God Object.
20 methods is fine; 50 lines is what fits in my terminal, and a lot of functions don't fit that (especially with blank lines, splitting lines bigger than 80 chars, etc.).
Exceptions are not an example. They should inherit from BaseException (PEP 352). BaseException has __init__, therefore any Exception has __init__ by inheritance.
> 50 lines is what fits in my terminal,
50 statements, not 50 lines. Blank lines aren't statements, and splitting a long line doesn't turn it into two statements. It's pretty rare to have a 50-statement method that doesn't violate Curly's Law. Personally, I'd make it much smaller. Depending on how much commentary and whitespace you put in, you could have 150 or more lines in a 50-statement function.
> Yes, but Pylint's default values comes from where?
It doesn't matter, pylint isn't an academic paper that requires every assertion to be backed up by experimental data and citations from peer-reviewed journals. It's a handy and easily configurable tool. I would guess that the numbers came from a couple of pylint developers having a conversation like "50 statements is pretty big for a function isn't it? Yes, let's make that the default limit." Were it hard-coded as a magic number deep in the pylint code, then perhaps they might need a bit more experimental validation behind those numbers.
The code was developed with help of the 'pep8' tool, which only finds 12 offending lines we declared OK (all too long, but it's SQL, where formatting for readability is more important than 80 chars). And although pylint gives it a 7.26/10, there's a TON of errors.
So yes, I could now spend a while blacklisting some rules, fixing some stuff, but the defaults are just so bad I won't bother, as usual. Because I think the shape of this code is pretty good.
That's the wrong attitude. Nobody commits code they don't think is good, but unless you measure "good" objectively and thoroughly there can always be problems you don't notice.
The entire point of the article is that there ARE bugs buried in all that noise, and that you can surface them and prevent them from being Future Wink's problem by spending an hour today to fix the tooling. For 800 lines it should take you far less than an hour, and if there is a "TON" of complaints there are probably one or two real bugs. Multiply that by 10k LOC and you start to realize the scope of the problem.
Take the pain and fix it today while you're at just 800 lines, so that when you're at 1,000 or 5,000 it's 5,000 lines of better code.
foo = functools.partial(bar, 'baz')
def THIS_IS_IN_GLOBAL_SCOPE(): ...
# the def-based spelling of the partial above:
def bar(spam, eggs):
    print(spam, eggs)

def foo(eggs):
    return bar('baz', eggs)
This is bad because readers now see FOO and think it's a constant:
FOO = functools.partial(bar, 'baz')
Better to run with -E first, address those errors, then add in the warnings, and blacklist the checks which are actually not useful.
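That is, something like (the package name is just a placeholder):

pylint -E mypackage  # --errors-only: skip warnings, conventions and refactors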
>> configure it with a whitelist of lint checks
If you start using Pylint at the beginning of a project, you can blacklist warnings that you want to ignore fairly easily.
>> If the errors are utter garbage, delete those checks from the configuration.
I've never had "utter garbage" errors. I've had errors I didn't agree with/made the conscious choice to ignore. And I've had errors I thought were garbage, but really I just didn't fully understand them.
(Although I've experienced how applying it to an existing, unlinted project is horrible.)
I've had lots of spurious errors with numpy code, where pylint seems to think everything is a list.
NumPy's `__init__.py` code is a bit messy and manipulates `__all__` a lot, so I can understand why pylint (and other static code analysis tools) struggle with it.
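If I remember right, the classic symptom was a spurious no-member (E1101) on attributes that live in NumPy's C extensions, roughly:

import numpy as np

a = np.arange(10)
print(a.cumsum())  # fine at runtime; older pylints could report no-member here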
Yeah, it's frustrating hitting errors in third party modules, and it's tedious for one person initially to set up pylint correctly, but so worth it for a team.
That being said, since you seem like a suitably snarky person, you may enjoy the email newsletter I write which is dedicated to the ways I've screwed up: https://softwareclown.com/
I signed up. Better to learn from your mistakes than my own. The testimonial was awesome by the way.
Regardless, using functools.partial solves that problem: ``callback = partial(print, i)``.
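For reference, a minimal sketch of the trap and the fix: every lambda closes over the loop variable itself, while partial freezes the current value per callback.

from functools import partial

callbacks = [lambda: print(i) for i in range(3)]
for cb in callbacks:
    cb()  # prints 2 2 2 - late binding strikes

callbacks = [partial(print, i) for i in range(3)]
for cb in callbacks:
    cb()  # prints 0 1 2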
Also, you described two separate bugs. The first one doesn't sound like an asynchronous callback issue. It sounds like a text editor with autocomplete would have avoided the problem. Or coding in an environment that encourages frequently running the code to look at its results.
Also, Sublime Text has a nice plugin called Anaconda, where PyLint can be enabled.
A (hopefully) short story: on my last contract job I was brought in as a senior developer for a growing Django project that had 0 coding standards. I imposed a small set of coding rules (what I actually did was say we would follow the Django coding style guide; I've learned over time that pointing at an existing, third-party guide means that when people get butthurt about having to change a habit, they get mad at the guide and not at a coworker) and then added pep8 and pylint to the git commit hook. That lasted for about a day because of the massive amount of errors from pylint (the situation the author describes), so I modified the hook to block on pep8 violations and just give a guilt-trip on pylint errors for the time being. Then myself and a couple of team members who drew short straws sucked it up and actually fixed pylint errors as we went, until we could turn pylint back on. We fixed a number of bugs in that process, and going forward pylint started to show its strength by catching the sorts of bugs the article mentions (and still enforcing a standard coding style, so you could no longer look at a file and say "That's tclancy's code" but rather "That's our-project's code").
Having pylint in the hook process is better than running it as part of CI or ad hoc because it's run more often, meaning you get hit with a smaller number of errors at once. Having pylint or pep8 as part of your IDE is even better because you fix problems as you go. One of my current side projects is a Django project written by a Rails dev (no idea why) that I do maintenance on. The first thing I do when I open a file I've never seen is get rid of all the red and green squiggly lines (or at least look at why they are there). Most of the time it's just a formatting issue, but there are plenty of actual bugs it turns up. Just by opening the files to make an update or fix a bug, I often wind up improving things people didn't even know were broken as a side effect.
Linters are a good thing. They can always be better and they will always annoy for some idiosyncrasy they have (or you do), but it beats ignorance.
I wrote a wrapper around all of these linting tools so I can call them programmatically from the same Python process, and each time a file changes I run my linting tools on that file (not the whole codebase). I also intercept the output and re-format it to give a (brief) report in a common format. I can also control the strictness of linting (i.e. which config file to use) with a YAML-format comment in the file header, so we can write quick prototypes without worrying too much about style and then bring them into the quality system on a file-by-file basis.
In this way, linting runs fast enough to be useful in the edit-lint/test loop, and the error messages are both telegraphic and immediately actionable.
Plus, if you want to disable some checks you can even do it at the local level rather than having a single central rule, e.g. using # pylint: disable=I0011,E0401 in your .py files.
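For example, to hush an unresolvable import on that one line only (the module name is hypothetical):

import legacy_vendor_lib  # pylint: disable=import-error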
Subtly, pylint also makes it easy to migrate teams to new APIs, e.g. via rules that disallow the old ones.
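The import checker has a knob for exactly this; a pylintrc sketch (the module names are only examples):

[IMPORTS]
deprecated-modules=urllib2,optparse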
-1 to whitelisting - I just devote ~1 day to going through the 1000s of warnings, not a big deal.
PyCharm does type inference and cross-file name/object checking; these are not things Pyflakes can do.
In terms of coding style I've found it fairly ineffective at picking up on most bad practices.
There's no wiggle room over what the OP reported. That was a bug, clear as they come. He may have unit tests, but they didn't catch it, and since you can't have unit tests for everything, why not spend 500ms running a linter?
Testing and linting do different things. Of course a sandwich is cheaper than a condo, but what does that have to do with what we are talking about?
As for the claim that you can't have unit tests for everything: what do you mean? No, you can't test all branches or combinations in a system, but you can have 100% test coverage (i.e. hitting each line at least once).
Yeah not really. Both linting and unit tests catch bugs. This is a fact that you refuse to actually face.
>you can have a 100% test coverage
You can, but A) it's a bad idea and B) it won't even necessarily catch the bug that the OP brought up if you do.
I use both linting (pylint actually) and unit tests, so I am not advocating that you should scrap your linting tools. I am just trying to point out the difference.
Your B) argument is yet another indication that the tests are not good enough. The example bug the OP posted is what made me comment in the first place. If your tests don't catch that, your tests are useless.
Edit: What the heck, this is an honest question. :(
To me, it's a good reason to use different loop variable names for successive loops. A good practice to use in any language, since it helps keep loops distinct to your mind as well.
for volume in get_volumes():
    ...
for other_volume in get_other_volumes():
    ...
I'm actually a little surprised something like that hasn't been attempted yet for Python.
Thanks for the answer!
1.) Never go more than two layers deep, so you have one "def" inside another "def" and that's it.
2.) Always keep the closures at the top of the containing function - no further nesting.
def my_routine(flag, foo):
    def add_foo(thingy):  # the closure, kept at the top of the containing function
        return thingy + foo
    return add_foo(1 if flag else 2)
Which is why I like having a tool that'll catch it for me.
My typical approach is `lambda x, y=y: f(x, y)`. Inline functions are much more readable, is the thing. But I can see the argument for this coding style, yes.
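Side by side, the two idioms do the same job of binding the value at definition time rather than call time (f and ys here are placeholders):

from functools import partial

def f(x, y):
    return x + y

ys = [1, 2, 3]
handlers = [lambda x, y=y: f(x, y) for y in ys]  # default arg binds y now
handlers2 = [partial(f, y=y) for y in ys]        # partial binds y now too
print(handlers[0](10), handlers2[2](10))         # 11 13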