How to set up a perfect Python project (sourcery.ai)
76 points by pedxing128 50 days ago | 48 comments

I still struggle to see the point of every project that starts out using pipenv. Running python -m venv, activating, and then pip install -r requirements.txt works more often (every time, for me) and is faster. I have literally not once seen a project where the pipenv solution was better, but I’ve seen 12 where it straight up failed to install correctly, and in all the others it was slower by a large factor.

For me, pipenv’s only selling point is that some people don’t know about python -m venv.
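For anyone who hasn't seen the venv workflow mentioned above, here is a minimal sketch of the same steps driven from Python's standard library. The `create_env` helper and its arguments are made up for illustration; only `venv.create` and `pip install -r` come from the real tools.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, requirements=None):
    """Create a virtual environment and optionally install requirements into it.

    `create_env` is a hypothetical helper, not part of any tool in this thread.
    """
    env_dir = Path(env_dir)
    # Bootstrap pip only when something has to be installed; `python -m venv`
    # on the command line bootstraps it by default.
    venv.create(env_dir, with_pip=requirements is not None)
    if requirements is not None:
        bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
        # Calling the environment's own interpreter has the same effect as
        # activating the environment first and then running pip.
        subprocess.run(
            [str(bin_dir / "python"), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return env_dir
```

Calling `create_env(".venv", "requirements.txt")` would roughly correspond to the three shell commands, assuming a requirements.txt exists in the current directory.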

Most projects don't need either.

I only use pipenv on a work project with a hundred dependencies, and venv on one other large side project. Everything else is installed with --user, especially dev tools like black and pyflakes. The user folder gets cleaned up every few years when you upgrade to a new version of python (rm ~/.local/lib/python3.5).

Has worked fine for a decade or two, no conflicts. I think some folks get spooked at python packaging and overcompensate in response, but it's pretty easy to troubleshoot when you've learned how it works.
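To see where --user installs end up on a given machine, the standard `site` module can report it. A small sketch; the printed paths depend on the platform and Python version, so no output is shown here.

```python
import site

# Base directory for `pip install --user` (typically ~/.local on Linux).
user_base = site.getuserbase()
# Directory the --user packages are imported from. It includes the Python
# version, which is why an interpreter upgrade leaves an old folder behind.
user_site = site.getusersitepackages()

print("user base:", user_base)
print("user site-packages:", user_site)
print("user site enabled:", site.ENABLE_USER_SITE)
```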

This is only true in the limited case where you never have to share your code with anyone. If you want to share it, setting it up so others can get going without having to figure out which packages are missing is necessary. In that case, you should probably use that same tool yourself to make sure that your environment works the same as you expect others to use it.

Nope, share code all the time. That is what requirements.txt (and alternatives) are for. Where they install them is up to them.

Agreed. Pipenv only led to confusion and pain for me. The last time I tried to use it in a project, I had no idea what the heck to do once I wanted to use it in my Dockerfile - do I install pipenv inside docker? What's the point of that? Is there a tool to convert this Pipfile to a requirements.txt file to use with plain ol' pip, or what?

Quickly noticing my mistake in being driven to play with shiny, new toys, I went back to my old tried-and-true pip + virtualenv for local development and plain old pip inside docker.

I think part of the drive around changes to Python packaging standards and tools has been inflated by a poor understanding and poor usage of existing tools.

Take setup.py for example - nearly everything you read in Python pushes using a requirements.txt file in your project. Why not define your project as an installable package with its own hard requirements - that are defined in your setup.py - and leave the Pip requirements.txt strictly for additional development/test dependencies? Or, maybe better, put test requirements in tests_require?
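A rough sketch of what that split could look like. Every name and version bound below is made up, and the metadata is kept in a plain dict so the snippet runs without setuptools installed. It uses extras_require for the test dependencies rather than tests_require, since, as far as I know, newer setuptools releases deprecate the latter.

```python
# Hypothetical project metadata; names and version bounds are illustrative only.
SETUP_KWARGS = dict(
    name="example-project",
    version="0.1.0",
    packages=["example_project"],
    # Hard runtime requirements live with the package itself.
    install_requires=["requests>=2.20"],
    # Optional extras: installable with `pip install -e ".[test]"`.
    extras_require={"test": ["pytest"]},
)

# In a real setup.py the file would end with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

With that in place, requirements.txt only needs to cover whatever extra development tooling the team wants pinned.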

Nothing against Kenneth, but a big part of me wishes there were more anonymity in the open-source community. It's too easy for a single, well-known individual to come in and publicize <something> and everyone jumps on the <something> bandwagon without realizing that <other-thing> already exists, works well, and is an established standard.

The only thing that pipenv does better is that it uses package hashes. This gives some confidence that you are really installing what you think you are.

But I hate pipenv, so much.
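For what the hash check buys you, here is a minimal sketch of the idea: compare the SHA-256 of a downloaded file against a recorded value before trusting it. This illustrates the concept only, not pipenv's or pip's actual implementation, and the helper names are made up. (Plain pip also has a hash-checking mode via --require-hashes and --hash entries in a requirements file.)

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Raise ValueError unless the file matches the recorded digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")
    return actual
```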

The fact that requirements.txt files are by-default version pegged is an absolute nightmare for people upgrading. It simply does NOT work for a serious, large project that you'd like to keep up to date as dependencies improve. It might seem like it works great for a new, tiny, short-lived project, or to inexperienced devs.

The basic idea of pipenv, being able to resolve the dependency graph more cleanly and precisely, is far far far better - but there are still some usability improvements that leave something to be desired.

pip-compile (part of https://pypi.org/project/pip-tools/ ) provides a significantly better way to maintain a requirements.txt file: create a requirements.in that specifies only the things you care about, and pip-compile it into a requirements.txt with pinned versions. Then you can use pip-compile --upgrade-package <name> to upgrade just some of the versions (plain --upgrade bumps everything), unlike pipenv which wants you to upgrade everything, whether you're ready to or not.
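A toy illustration of the requirements.in versus requirements.txt split described above. This is not how pip-compile works internally (the real tool also does the dependency resolution); the function name, package names and versions are just example data.

```python
def pin_requirements(top_level, resolved):
    """Turn resolved versions into pinned requirement lines.

    top_level: names a human listed in a requirements.in-style file.
    resolved:  mapping of name -> version, as a resolver would produce,
               including transitive dependencies.
    """
    wanted = {name.lower() for name in top_level}
    lines = []
    for name in sorted(resolved, key=str.lower):
        origin = "from requirements.in" if name.lower() in wanted else "transitive dependency"
        lines.append(f"{name}=={resolved[name]}  # {origin}")
    return lines


# Example data only: one hand-written requirement, two pinned results.
for line in pin_requirements(["requests"], {"urllib3": "2.0.7", "requests": "2.31.0"}):
    print(line)
```

The point is that the hand-maintained file stays short while the generated file pins everything, transitive dependencies included.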

The main selling point was the simplicity + lock (which takes forever to generate), but since Pipenv barfs on itself again and again, hitting "closed + will not reopen" issues and unsupported feature flags, its use becomes more of a pain.

I disagree with the usage of githooks for linting and testing. It discourages frequent committing, which is a tenet of using git. Commit often, perfect later, publish once (https://sethrobertson.github.io/GitBestPractices/).

Your pipeline should be responsible for ensuring things don't make it into the master branch that shouldn't. Run tests and linting on a pull request and make it a requirement for those to pass in order to merge the PR. Otherwise one of two things is going to happen:

1. People will not commit frequently and you will have a commit history that isn't as granular as it should be.

2. People will just bypass the githook check
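As a sketch of the pipeline-side alternative, a CI job could run something like the script below on every pull request and block the merge on a non-zero exit. The check commands are placeholders; swap in whatever the project actually uses.

```python
import subprocess
import sys

# Hypothetical check commands, assuming flake8 and pytest are installed.
CHECKS = [
    [sys.executable, "-m", "flake8", "."],
    [sys.executable, "-m", "pytest"],
]


def run_checks(commands):
    """Run every command in order and return the ones that exited non-zero."""
    failed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command)
    return failed
```

The CI entry point would then be something like `sys.exit(1 if run_checks(CHECKS) else 0)`, which leaves local commits unblocked.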

TBH I think some of this stems from people's fear of rebasing. I've heard so many times, "But it rewrites the history". Yes, but it's the history on YOUR branch. Nobody should care what you do to your branch and it is yours to perfect. You should want to rewrite the history of your branch in my opinion. If you're committing frequently there will inevitably be some commits you want to squash or fixup.

I agree with you. A better solution would be to use CI to check the branches (so you know what state you're in) and enforce a test and style pass before a branch can be merged to master.

I don't know why there's such a fear of rebasing, in a lot of situations it's the right way to go.

I don't like squashing though, but that's a personal choice - sometimes excessive use of squash can hide the history of why things developed the way they did, and that can sometimes be very useful information.

Githooks are the source control version of a database trigger.

Aka, sometimes useful, but usually a bad idea.

I don’t think any of this is wrong or bad, but it feels a bit outdated. Black and isort are great, but rather than a setup.cfg, how about using a pyproject.toml - and maybe swap Pipenv for something like poetry.

Also, huge fan of pytest (as is mentioned) and mypy (not mentioned, but quite helpful).

[edit] mypy is indeed mentioned! I must’ve missed that on first pass.

I haven't looked into poetry, but your comment makes me shake my head and wonder... what's wrong with setup.cfg to begin with? Another damn file format? What?

Well, I hope your wandering ends here, at PEP-518 [0], which addresses your wonder directly.

[0] https://www.python.org/dev/peps/pep-0518/#sticking-with-setu...

I’ll check it out. I will admit I like being able to combine several configs into a single setup.cfg or package.json, but I’m usually the guy who has a separate config file checked in for every tool he’s using.

mypy is mentioned

I wanted to like black but can’t get over the double quote default. I like that they justify it with the less ambiguity and the ability to not have to escape single quotes. I even like their proposed fix of just code with singles and reformat into double.

But I don’t like unnecessary different levels of reality (code one way, commit another) and hitting shift quote all those times sucks.

It’s funny how an “opinionated formatter” sounds great in theory until you realize the opinion is “wrong.”

I'm not sure when single-quoted strings became the default (at least in Python/Ruby/similar languages), but it's almost entirely about fashion rather than functionality. Double-quoted strings were the norm for literally decades until recently. There is no strong argument for favoring one over the other. So long as everyone uses the same thing (at least within a given project, but ideally within an entire language's work), everything is fine. It's still hard for me to break the double-quote habit because single-quoted strings were unheard of in my entire career until about ten years ago.

I used double quotes for years, but appreciate being able to type more quickly with single quotes. It’s 90% convenience and 10% less clutter since single quotes are just smaller.

It’s not a big deal to me as I don’t care too much (similar to tabs vs space). I have a preference but would likely spend zero time caring if my team were trying to decide and would never in a million years try to change someone else’s project.

Single quotes are slightly less noisy.

   "this", "that", "and", "the", "other"

   'this', 'that', 'and', 'the', 'other'
However, in Ruby, PHP, Groovy, and JavaScript single quotes have actual functional advantages, either through simpler syntax abilities, or by not requiring escaping double quotes for HTML/XML.

And you've got at least a few single-quotes in your comment through language use of contractions.

In code, I prefer whichever style tends to produce the fewest escape-sequence situations.

If the code contains hard-coded strings, that's usually double-quotes, though not always.

Otherwise I guess it makes less sense to want to push +shift+ every time you enter a quote on a Python dict key, for example, versus using the single quote.

They both mean the same thing in JS and Python. One is easier to type, easier to read, and used by the Python REPL. Single quotes FTW.
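For the Python half of that claim, a quick check anyone can paste into an interpreter. It also shows that the REPL's choice of quote is about avoiding escapes, not a hard rule.

```python
# Both quote styles produce equal strings.
assert "this" == 'this'

# repr(), which the REPL uses to echo values, prefers single quotes...
print(repr("this"))         # 'this'
# ...but switches to double quotes when the text contains a single quote,
# which avoids an escape sequence.
print(repr("it's"))         # "it's"
# With both kinds of quote present, it uses single quotes and escapes.
print(repr('say "it\'s"'))  # 'say "it\'s"'
```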

Solution: write in whatever way you want and hit "reformat code" before you git add/commit. Reality reconciled, opinion great again.

Addendum for clarification: part of the appeal of autoformatters is that what you type into your editor does not matter. I don't see the "difference of reality" because thanks to black, I don't need to care.

What I meant by minimizing different realities is that I would rather avoid situations where I have to write code and look at code differently. I can definitely autoformat on save or on commit, but I’d rather not so I can always “think using the same code” and get used to the same quotes.

I’m happy to adapt to other folks’ projects as I don’t want to convert anyone. But purposely working on my own projects with a quote style different from the one that gets committed seems like extra effort, even if minimal.

Although it at least now has an option to not do anything about quotes, leaving them as they are.

I've been meaning to make a single quote fork of it, but haven't yet got around to it. Wonder if anyone is interested?

Wouldn't a fork of Black with a different formatting choice miss the whole point of Black?

Except that it could eventually overtake Black so that Blacker becomes the new norm.

I don’t think Black’s dream of all Python looking the same is that worthwhile. Especially if people don’t like some of Black’s decisions.

I might use Blacker if it existed, but I’m currently ok with not using Black and just sticking with flake8.

The difference is not substantial, just makes it a tad easier to read.

Also it depends what you are using it for: 1) to keep your team focused, or 2) to enable anyone in the python community to feel comfortable jumping into the project. Given not everyone uses black, #2 is not as common as you'd think.

Oh look, yet another project I’ve not heard of (pipx) that I’m being told to use to manage the clusterfuck of Python package management.

> Git hooks with pre-commit

In projects I am involved in, we usually set up hooks that run a few checks from https://github.com/pre-commit/pre-commit-hooks plus https://github.com/asottile/pyupgrade plus https://github.com/psf/black plus maybe the mypy and flake8 hooks. These hooks are also run in a "lint" stage by tox, which is also run on the CI system. This catches some lints that slip through every now and then, before you go on to commit them. Also, a tox run will as a side effect reformat the code before you check it in, which is a nice laziness feature.

I am currently wondering whether I should move to the pytest plugins for mypy and flake8, because I find they make more sense as tests, especially if I'm developing inside a venv and repeatedly run pytest instead of tox. That would also mean that the mypy and flake8 checks are run repeatedly for every Python interpreter I want to tox, unless I invest some time into coming up with better TOXENVs. Hm.

We use Flake8, PyLint and PyTest on an internal engineering tool / internal automation libraries with about 200,000 lines of code, including test code. I would love to use Git hooks, but long before the project grew this big the time it took to run our code checks was unacceptable to enforce with hooks.

The fastest I've seen them run recently is 3 minutes. With about 800 tests, PyTest takes a minimum of 45 seconds just to do test discovery--the tests actually run faster. We started with Flake8 and have been slowly adding PyLint checks, but each check (so far) has added a little more time. We diff the local "feature" branch against the remote dev branch and only run PyLint against the differences, but it still takes 1 min on average to run--experiments to run against a smaller set of files still result in coffee-break-worthy execution times most of the time.
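For readers wondering what "only run PyLint against the differences" might look like, here is one possible sketch that lints the files changed relative to a base branch. The branch name `origin/dev` and the helper names are assumptions, not the setup described above.

```python
import subprocess
import sys


def changed_python_files(diff_output):
    """Pick the .py paths out of `git diff --name-only` output."""
    paths = (line.strip() for line in diff_output.splitlines())
    return [path for path in paths if path.endswith(".py")]


def lint_changed(base="origin/dev"):
    """Run pylint on files changed since `base`; return pylint's exit code."""
    diff = subprocess.run(
        # ACMR = added, copied, modified, renamed (so deleted files are skipped).
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    files = changed_python_files(diff)
    if not files:
        return 0  # nothing to lint
    return subprocess.run([sys.executable, "-m", "pylint", *files]).returncode
```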

Judging from history, if we enforced the use of Git hooks to run these tools it would result in mutiny with teams "secretly"--their manager(s) being onboard--working around and supplanting the tool. Small things like this result in a lot of frustration and a loss of goodwill.

Interesting! How exactly do you run the tests and the linters then? Pytest with flake8+pylint plugins? Separately? Serially? In parallel? Why flake8 and pylint at the same time, did you find significant differences?

We ask developers to make it part of their workflow, but the project is shared across 50+ engineers and few people do that. We enforce it via a Jenkins pipeline on every commit that gets pushed to our internal BitBucket Git server. The Jenkins BitBucket plugin is used to enforce a successful build before pull requests can be merged. It works, but it can be a little frustrating at times to see an endless stream of alternating failed then passed builds because few people run the checks before pushing.

We use a combination of Python Invoke and the flake8 & pylint CLI commands. At some point we plan to reinvestigate our use of Python Invoke as it doesn't run tasks in parallel--we were new to Python when we started this project two years ago and didn't understand the limitations of the tools we were using. (I should say, Python Invoke has worked out well for most of our developer centric tasks.)
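One way to get the linters running side by side without changing task runners is a small wrapper around subprocess and a thread pool. This is only a sketch under assumptions: the `LINTERS` commands and the `src` path are placeholders, and it is not a feature of Invoke.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Placeholder commands, assuming flake8 and pylint are installed.
LINTERS = [
    [sys.executable, "-m", "flake8", "."],
    [sys.executable, "-m", "pylint", "src"],
]


def run_parallel(commands):
    """Run all commands concurrently; return their exit codes in input order."""

    def run(command):
        # Capture output so the tools' reports do not interleave on screen.
        return subprocess.run(command, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        return list(pool.map(run, commands))
```

Threads are enough here because the real work happens in the child processes, so total wall time is roughly that of the slowest linter.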

We run both Flake8 and Pylint as we started out with Flake8 and later tried to use Pylint. The workload to fix the code base was too much for us as most developers are QA people learning good software design and how to share a large code base with their "extra", "volunteer" time. We've been slowly turning on more Pylint checks and cleaning up the code base. When we did try to move more quickly, we were deluged with complaints about the cognitive load it placed on people and backed off.

We did find some differences, but I don't remember what they were and couldn't comment on the significance.

Thanks. It's a tricky topic, as Google has found as well: https://static.googleusercontent.com/media/research.google.c....

While I might choose slightly different tooling and configuration, this closely matches my practices. In particular, I'm a big fan of the pre-commit hook for linting/testing/etc. automatically and preventing a commit if they fail (you can always override it if there's some rare emergency).

Another thing I like to do is create a skeleton project and keep it in Git, so that this stuff doesn't have to be done manually for each new git project. You can check out/fork/clone/whatever the skeleton and have most of this stuff pre-configured, including e.g. a baseline requirements.txt for whatever frameworks you use.

You can also put the pre-commit stuff into a ~/.git_template so that newly-created repos already have it configured. It's a bit harder to manage if you work with multiple unrelated languages/frameworks, but can be a real time-saver in a uniform environment.

The biggest downside of Python is this. Setting up the environment and dependency management is very confusing and makes it hard to get started in the first place, especially since Python has a lot of non-programmer users.

It seems to be easier to get this right in the first place than to fix everything afterwards. JavaScript's yarn is really good, but only because it builds on npm, which had already done most things okayish.

The good news is that modern languages like Go, Rust, Elixir, etc. were designed more carefully with this in mind, providing much better experiences.

This is part satire part rooted in deep truths and I am conflicted about this.

I guess for some value of "perfect" and "best practices."

Pylint is a much better linting tool than flake8.

What a bunch of shit. This is mere performance art. All you need is a setup.py and you're ready to package up a basic python project. Unbelievable.

Have you ever worked on a development team? If you're not doing most of this, or some form of it, you're doing it wrong. Code that isn't consistent and well-tested is a waste of everyone else's time.

Article introducing useful tools and how to set them up = "Performance Art"?

Contrarian-for-the-sake-of-it and ultimately contentless comment = "Not Performance Art"?

Got it.

It seemed easier than vomiting.

So very constructive, thanks for the input.

Not the most agreeable way of saying it, but 100% agree.

Why is it that every time someone says something _everyone else should be thinking_, it gets downvoted to hell? I'm guessing it's because of the way you said it, but something tells me the wiser, more experienced opinions such as yours just aren't flashy or trendy enough to get support in this modern, up-vote world.
