
How I think I want to drop modern Python packages into a single program - pcr910303
https://utcc.utoronto.ca/~cks/space/blog/python/PipDropInInstall
======
matmann2001
This is a solved problem and the solution is virtualenv. Learning how to use
venvs would likely have taken less time than the tinkering described in this
post. If you're going to regularly develop with Python and use 2nd/3rd party
code, it's worth knowing.

~~~
zwieback
Agreed - I was a virtualenv holdout until recently, and even though it's a
horrible solution that replicates junk everywhere, it's the way to go.

~~~
linkdd
Deduplication should happen at the filesystem level.

Databases do it, Git does it, ZFS does it.

Ok the first two examples aren't filesystems but they can store files, so not
totally irrelevant?
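To make the idea concrete, here is a toy userspace version of content deduplication, hardlinking files with identical contents to a single copy. Real filesystems like ZFS do this transparently at the block level; nothing below is from the thread, it's purely illustrative:

```shell
# Toy content-based dedup: replace files whose contents are identical
# with hardlinks to one copy. Assumes GNU coreutils (sha256sum).
dedup() {
    dir=$1
    prev_hash=""
    prev_path=""
    # Hash every regular file, sort so identical hashes are adjacent,
    # then hardlink each duplicate to the first file with that hash.
    find "$dir" -type f -exec sha256sum {} + | sort |
    while read -r hash path; do
        if [ "$hash" = "$prev_hash" ]; then
            ln -f "$prev_path" "$path"   # duplicate: link, don't copy
        else
            prev_hash=$hash
            prev_path=$path
        fi
    done
}
```

After running `dedup somedir`, duplicate files share one inode, so the data is stored once.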

~~~
zwieback
Yeah, agree kind of. But having to resort to duplicating binaries over and
over is a sign something isn't right. Not that it's an easy thing to solve.

~~~
linkdd
It depends; binaries are just data. You reference that data with a path on the
filesystem. In a programming language, that would be assigning a value to a
variable. For static values, the compiler/interpreter can optimize and
deduplicate. And you expect it to do so, because you don't want to create a
global variable holding the value 1 just to do the optimization yourself.

Relying on abstractions to simplify the architecture is a sign something might
be right. Simply because simple things are less error-prone IMHO.

------
PaulHoule
With Python the future is here but it is not evenly distributed. In
particular, it's been depressing to see that "any answer at all" has so often
won out against "a correct answer".

Java has been a safe place to deploy code because of its xenophobia. You
don't expect Java to be linked into endless C libraries, and Java often
doesn't come pre-installed thanks to license issues. (Maybe Larry Ellison is
rich because you read "INSTALL JAVA TO EXPERIENCE THE POWER OF JAVA".) The
consequence is that you got a clean copy of Java, and that your clean copy of
Java won't break all the crapplets that came with your OS and expected the
ancient Java that shipped with it.

Python got a boon from being shipped w/ most Linux distros, but it was also a
setback... On Linux "python" became a synonym for "python 2" because otherwise
you'd break the system scripts. (Today it drives me nuts that Microsoft added
"python" to the path on Windows, because Windows was the one operating system
where you didn't have to hide the system Python to get a good Python.)

"venv" is genius but conda got so much mindshare that many people never
realized "venv" could do everything "conda" does except for corrupting
internal data structures that force a periodic reinstall.

The consequence is that "best practices" haven't been institutionalized in
most places, you have to continuously fight bad practices, and people like the
poster are always inventing non-solutions to the problem venv solves.

(e.g. for instance, what about the person who uses a Docker container to keep
python environments apart as opposed to one who uses venv?)

~~~
tempay
For conda it’s an unfair characterisation. Conda is language agnostic and can
be (and is) used to ship anything. Packages like GCC/Clang/R will never be
accessible for venv. Personally I find it magical to be able to ship complex
packages on conda-forge that “just work” on Linux/macOS/Windows/ARM/POWER
without requiring users to have special privileges.

~~~
PaulHoule
I'm tempted to say "so what?"

Here's a specific example. In a previous job I had to work with Tensorflow
models of various vintages that depended on different versions of CUDA.

Conda (out of the box) did nothing to help with that problem. I think an early
version of Conda might have installed one early version of CUDA, but the
elephant in the room is that I needed to install as many versions of CUDA
as necessary.

NVIDIA uses a lot of intimidation to fool people as to the nature of the
drivers. Specifically, CUDA/CuDNN and such don't depend on kernel drivers or
Windows installers, but just require that some huge directories full of
userspace libraries be on your path. Thus you can have CUDA version A linked
into one conda environment, and CUDA version B linked into another conda
environment and it all worked.

That is, it worked once I made my own conda packages (NVIDIA didn't allow
conda to distribute such packages), and it was astonishingly slow, because conda
stores packages in BZIP2 files that take forever to unpack and because conda
wastes more time with its symlink machinery.

In the end it was possible to solve the problem w/ conda but if I hadn't been
distracted by conda I probably would have solved it just as quickly by
unpacking the userspace drivers once and just setting the path appropriately.
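That last approach, unpack once and point the paths at it, amounts to something like the following sketch. The directory names are hypothetical; the mechanism is just ordinary search paths:

```shell
# Hypothetical layout: userspace CUDA toolkits unpacked side by side,
#   /opt/cuda-9.0/{bin,lib64}  and  /opt/cuda-10.0/{bin,lib64}
# Selecting a version is just prepending its directories to the
# search paths; no kernel driver or installer is involved.
CUDA_HOME=/opt/cuda-10.0
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
# Another shell (or another environment's activation hook) can point
# at /opt/cuda-9.0 the same way, so two TensorFlow vintages coexist
# on one machine.
```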

------
hprotagonist
I do full time scientific python, and this post has some _painfully_ wrong
ideas.

 _> The official oletools install instructions talk about using either pip or
setup.py. As a general rule, we're very strongly against installing anything
system-wide except through Ubuntu's own package management system, and the
environment our Python program runs in doesn't really have a home directory to
use pip's --user option, so the obvious and simple pip invocations are out._

No! The obvious and simple pip invocation is as follows:

    
    
      python3 -m venv ${HOME}/myvenv
      ${HOME}/myvenv/bin/pip install foo bar baz oletools
      ${HOME}/myvenv/bin/pip install -e /path/to/my/installable/project
    

and then edit your code in `/path/to/my/installable/project/project-name/*.py`
to your heart's content and tell your IDE or editor that
`${HOME}/myvenv/bin/python` is the python to use. Activation is strictly
optional, but if you like that, `source ${HOME}/myvenv/bin/activate` (or use
direnv to do it for you).

I would also recommend

    
    
      export PIP_REQUIRE_VIRTUALENV=1
    

I divorce what $USER thinks `python` means from the system python as soon as I
can with `pyenv global 3.8.5`:
[https://github.com/pyenv/pyenv](https://github.com/pyenv/pyenv)

Of course never, ever `sudo pip ... `.

And don't touch $PYTHONPATH unless someone has forced you to use Caffe, who
should be shamed on a regular basis for being Bad At Packages.

~~~
hal9000-tng
> No! The obvious and simple pip invocation is as follows:

> [complex commands to set up a venv follows]

"All you have to do is... recompile your kernel... check your version
dependencies... maybe do that once or twice. It's so simple, I don't know why
everyone doesn't do it!"

Ubergeek.tv's classic "Switch to Linux" spoof ad of 2001:

[https://www.youtube.com/watch?v=Xtah_05BOe8](https://www.youtube.com/watch?v=Xtah_05BOe8)

~~~
hprotagonist
I generalized a bit for completeness and to emphasize that contrary to the
article's point, you can do all of this in $HOME and never install anything
outside your home directory, which was a concern of the author as a usage
policy for their server.

    
    
      python -m venv myvenv
      myvenv/bin/pip install ...
    

works just as well, i just needed a concrete path to talk about in the rest of
the text.

Compared to "just pip" it's one whole extra line, so I don't think it's
amazingly godawful at all.

~~~
pdonis
_> you can do all of this in $HOME_

Which the author explicitly said _does not exist_ for the use case he is
discussing:

"the environment our Python program runs in doesn't really have a home
directory"

~~~
hprotagonist
so put it wherever! it's just a directory, put it where you have write access.

~~~
pdonis
_> so put it wherever! it's just a directory_

You can certainly put a directory wherever you like and put the code there. In
fact, that's exactly what the author did, as he describes in the article.

What you can't do is just magically have that directory be the "home"
directory of a user and have the $HOME environment variable always point to
it. You might be able to set things up to do that in principle, but it might
not be worth the trouble as compared to other solutions. In fact, that's
pretty much the position the author takes in the article for his use case.

~~~
hprotagonist
the position the author takes is "i've never heard of `venv` so i'll do these
other terrible 5 hacks instead".

$HOME or not is a red herring, i apologize if it appears more salient than it
really was meant to be.

~~~
pdonis
_> the position the author takes is "i've never heard of `venv` so i'll do
these other terrible 5 hacks instead"._

No, it's "I don't need the full power of venv so I'll just put the code in a
specific directory now that I've figured out how to tell pip to do that when
it's not a user's home directory". Using pip install's "--target" option
hardly qualifies as a "terrible hack". The designers put it there because
there were valid use cases for it. Setting environment variables before
running a program is hardly a "terrible hack" either; it's one of the most
common uses for shell scripts.
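That combination can be sketched like this. Only pip's real `--target` flag and the standard `PYTHONPATH` variable are assumed; the module name below is made up, standing in for a pip-installed package so the mechanism runs self-contained:

```shell
# The article's approach: packages go into a plain target directory,
# e.g.  `pip3 install --target ./pylib oletools`, and a wrapper script
# sets PYTHONPATH before running the program so the interpreter finds
# them. Here a tiny hand-made module stands in for pip's output.
appdir=$(mktemp -d)
mkdir "$appdir/pylib"
# Stand-in for a pip-installed package (hypothetical module name):
printf 'VALUE = 42\n' > "$appdir/pylib/vendored.py"
# The wrapper-script mechanism: set PYTHONPATH, then run Python.
result=$(PYTHONPATH="$appdir/pylib" python3 -c 'import vendored; print(vendored.VALUE)')
echo "$result"
```

No venv, no home directory, no system-wide install; just a directory and an environment variable.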

 _> $HOME or not is a red herring_

If all you meant by it is "put the code in any directory you like", then, as I
said, that's exactly what the author did.

------
skywhopper
> Nor do we want to learn about how to build and maintain Python virtual
> environments

This is a shame, because this is one thing that Python gets mostly right. You
don't have to fuss around with the shell wrapper, etc. For 90% of cases you
can probably do:

    
    
        $ virtualenv subdir-name
        $ subdir-name/bin/pip install <package-names>
        $ subdir-name/bin/python your-python-script.py

~~~
inshadows
You also lose the tested versions from the OS. You will either have to do your
own QA of working dependency versions and manually keep them updated over time
(e.g. for security fixes), stick with what you froze forever, or just get a
dump of the latest versions every time you deploy the virtualenv.

~~~
jnwatson
Each package owner probably knows the dependencies much better than the
"OS". Save for a handful of Python packages that have extra-Python OS
dependencies, relying on the explicit versions called out by the package
owner is generally more reliable.
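In practice that usually means pinning: record the exact versions a working environment resolved, and reinstall those at deploy time instead of whatever is newest that day. A standard pip workflow, sketched (nothing here is from the thread):

```shell
# Create an environment, install/test against it, then record the
# exact resolved versions it contains.
workdir=$(mktemp -d)
python3 -m venv "$workdir/myvenv"
"$workdir/myvenv/bin/pip" freeze > "$workdir/requirements.txt"
# On another machine, or at deploy time, reproduce it exactly:
#   python3 -m venv myvenv
#   myvenv/bin/pip install -r requirements.txt
```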

------
mixmastamyk
Downloading the package and "vendoring" it can be a useful strategy. The tool
basket can help with that, although pip to a folder should work as well.

[https://pypi.org/project/Basket/](https://pypi.org/project/Basket/)

------
bmn__
The correct solution here is to make a Debian/Ubuntu package out of oletools;
that way Chris gets exactly what he wanted.

If he asks nicely on IRC, he might even get someone else to do it for him.

------
jklehm
poetry, pipx, and pyenv are pretty much all you need to be happy with python
project management and sandboxing. They address the issues in the article and
are far more ergonomic than plain venv wrangling.

------
Galanwe
There is no real direction in the article. I did not see any takeaway, nor any
kind of proposition or nice trick.

It can pretty much be summed up as "I don't know much of Python and admittedly
don't want to spend time on it, so I did a bunch of hacks to download my
dependencies and it worked".

> Nor do we want to learn about how to build and maintain Python virtual
> environments

I think the author would have better spent his time learning the basic usage
of virtualenv/venv than writing this blog post.

~~~
IshKebab
> I think the author would have better spent his time learning the basic usage
> of virtualenv/venv than writing this blog post.

I don't. Virtualenv is a hack to work around the mess of Python paths that they
won't actually fix. I'd put it in a similar bucket as Docker (hacking around
the mess of binary software distribution on Linux).

I don't want to spend my time learning workarounds.

~~~
gojomo
But, it's a "hack workaround" that a professional consensus has converged
upon. Pythonistas have developed supporting tools & help-texts, & many assume
the use of silo'd virtual environments as 'basic hygiene' for serious work.
The approach is also highly analogous to other "alternate virtual root"
solutions across OSes/containers/user-customization schemes.

So instead of that, you'd rather... make your own custom hack workarounds? That
will be idiosyncratic to your installations/projects? That will hit unique
errors/tradeoffs the standard convention has already worked around? That will
be harder for others to understand?

OK, to each their own!

