> Standard practice atm is to install packages locally to a project by using venv, or rather pipenv.
Thanks for letting me know. This is a good thing to know, it makes me more likely to jump back into Python in the future.
I suppose it is, to a certain point, an indictment of NPM; I certainly expected more people to start doing this after the left-pad fiasco. But it's also an indictment of package managers in general.
So let's assume you're using modern NPM or an equivalent. You have a good package manager with both version pinning and (importantly) integrity checks, so you're not worried about it getting compromised. You maintain a private mirror that you host yourself, so you're not worried that it'll go down 5-10 years from now or that the URLs will change. You know that your installation environment will have access to that URL, and you've done enough standardization to know that recompiling your dependencies won't produce code that differs from production. You also only ever install packages from your own mirror, so you don't need to worry about a package that's installed directly from a Github repo vanishing either.
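Concretely, most of that scenario boils down to two pieces of config. The "only ever install from your own mirror" part is usually just a repo-level `.npmrc` (the URL below is a placeholder):

```
# .npmrc checked into the repo — every install resolves against the private mirror
registry=https://npm.mirror.internal/
```

and the pinning/integrity part is the lockfile, where each dependency records an exact version, the URL it resolved from, and a hash that gets verified at install time (hash elided here):

```
"left-pad": {
  "version": "1.3.0",
  "resolved": "https://npm.mirror.internal/left-pad/-/left-pad-1.3.0.tgz",
  "integrity": "sha512-…"
}
```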
Even in that scenario, you are still going to have to make a network request when your dependencies change. No package manager will remove that requirement. If you're regularly offline, or if your dependencies change often, that's not a solved problem at all. A private mirror doesn't help with that, because your private mirror will still usually need to be accessed over a network (and in any case, how many people here actually have a private package mirror set up on their home network right now?). A cache sort of helps, except on new installs you still have the question of "how do I get the cache? Is it on a flash drive somewhere? How much of the cache do I need?"
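To be concrete about the flash-drive case: the closest npm gets is pointing an offline install at a cache directory you've carried over and pre-populated somewhere with network access (the path here is made up):

```
# Install strictly from a pre-seeded cache, no network requests allowed.
# /media/usb/npm-cache is a made-up path; it has to have been populated
# beforehand on a machine that could actually reach the registry/mirror.
npm ci --offline --cache /media/usb/npm-cache
```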
If you're maintaining multiple versions of the same software, package install times add up. I've worked in environments where I might jump back and forth between a "new" branch and an "old" branch 10 or 15 times a day. And to avoid common bugs in that environment, you have to get into the habit of re-fetching dependencies on every checkout. When Yarn came out, faster install times were one of its biggest selling points.
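(The way that habit usually gets automated, for what it's worth, is a checkout hook along these lines; a sketch assuming npm and its lockfile:)

```
#!/bin/sh
# .git/hooks/post-checkout — sketch of automating "re-fetch deps on every checkout".
# Args: $1 = previous HEAD, $2 = new HEAD, $3 = 1 for a branch checkout (0 for file checkouts).
[ "$3" = "1" ] || exit 0
# Reinstall only when the lockfile actually changed between the two commits.
if git diff --name-only "$1" "$2" -- package-lock.json | grep -q .; then
    npm ci   # clean install pinned to the lockfile; still needs the network/mirror if the cache is cold
fi
```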
I don't think it's a black-and-white thing. All of the downsides you're talking about exist. It does bloat repo size, it does mess with Github stats (if you care about those). It makes tools like this a bit harder to use. Version conflation doesn't seem like a real problem to me, but it could be I suppose. If you're working across multiple environments or installing things into a system path it's probably not a good idea.
But there are advantages to knowing:
A) 100% that when someone checks out a branch, they won't be running outdated dependencies, even if they forget to run a reinstall.
B) If you check out a branch while you're on a plane without Internet, it'll still work, even if you've never checked it out before or have cleared your package cache.
C) Your dependency will still be there 5 years from now, and you won't need to boot up a server or buy a domain name to make sure it stays available.
So it's benefits and tradeoffs, as is the case with most things.
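(For anyone who hasn't seen it done, the vendoring in question is nothing more exotic than un-ignoring the directory and committing it; roughly:)

```
# Stop ignoring installed dependencies and commit them alongside the code.
# (Assumes .gitignore currently has a node_modules/ line — just edit it out.)
npm ci                           # clean install from the lockfile
git add .gitignore node_modules
git commit -m "vendor node_modules"
```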
I understand that the tradeoffs exist; my surprise is mainly that what would be an uncommon, workload-specific workaround in pythonland (e.g. most projects don't have differing library versions across branches, at least not for very long) is common practice in jsland.
Although one factor I just realized is that pip also ships pre-compiled binaries (wheels) instead of the actual source, when available. Those would generally be pretty dumb to want in your repo, since they're developer-platform specific; assuming JS only has text files, it would be a more viable strategy to have as a common case in that ecosystem.
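(To make that concrete: the wheel filename itself encodes the interpreter and platform, which is exactly why it doesn't belong in a shared repo. Pip's flags to force one or the other, with illustrative filenames:)

```
# grab the pre-built binary wheel — the name encodes python/ABI/platform tags
pip download numpy --only-binary :all:
#   -> e.g. numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

# grab the source distribution instead
pip download numpy --no-binary :all:
#   -> e.g. numpy-1.26.4.tar.gz
```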
Regarding B and C, it's not like you're wiping out your libraries every commit; the common case is to install once on git clone, and again only on the uncommon library update. A and B are a bit of an obtuse concern for most projects; I can see them happening and being useful, but e.g. none of my public project repos in Python have the issue of A or B (they're not big enough to have dependency version upgrades last more than a day, for a single person, finished in a single go), and for C, it's much more likely my machine(s) will die long before all the PyPI mirrors do.
Which I'm pretty sure is true of like 99% of packages on PyPI, and on npm; which makes the divergent common practice weird to me. It makes sense in a larger team environment, but if npm tutorials are also recommending it (or node_modules/ isn't in standard .gitignores), it's really weird.
And now that you've pointed it out, I'm pretty sure I've seen this behavior in most JS projects I've peeked at (where there'll randomly be a commit with 20k lines in the history), which makes me think this is recommended practice.