I have struggled with virtual environments and runtime executables across various OSs.
What are current best practices?
2) Install dependencies. source myproject/bin/activate; pip install -r requirements.txt (downloads all project dependencies).
3) Start the application. source myproject/bin/activate; python myapp.py
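The steps above can be sketched end to end like this (assuming a POSIX shell and a system python3; `myproject` and `myapp.py` are the names from the steps):

```shell
# 1) Create the virtual environment
python3 -m venv myproject
# 2) Activate it and install the project dependencies
source myproject/bin/activate
pip install -r requirements.txt
# 3) Start the application inside the environment
python myapp.py
```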
If you can assume that there is an interpreter available on the system, say /usr/bin/python3, you can use that instead of creating a virtual environment.
If you want to embed all the dependencies, you can save all the files created by pip install in the /lib directory, if I remember the name correctly.
Should I write a full blog post with examples? I used to deploy Python applications in a bank, and can explain all the advanced usage with and without internet access, with and without dependencies.
" you can assume that there is an interpreter available on the system, say /usr/bin/python3, you can use that instead of creating a virtual environment."
Edit: Although if startup performance is important, using a virtual environment might be a good idea even for simple scripts. Running a "hello world" program using the system Python takes twice as long on my system as running it with "python -S" (i.e. ignoring site packages) or from a fresh virtual env.
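A rough way to see the difference described above (numbers will vary by machine; `-S` tells the interpreter to skip importing the site module, so site-packages are not scanned):

```shell
# Compare interpreter startup with and without site-packages processing
time python3 -c 'pass'       # normal startup, site module imported
time python3 -S -c 'pass'    # -S skips the site module entirely
```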
Aaaand, now that you're at it, a single .dmg file for MacOS with binaries and scripts so us lazy people with Macbooks can start coding right away :) (it might be a bit inaccurate but I guess you know what I mean)
In my case, and contrary to what one of the other commentators said, I'm using docker.
It took a while to get it ironed out, but I use a "base" folder where I have already downloaded all the packages I use by default (Top of my mind are pandas, numpy and youtube-dl ;) )
I have all the related configuration in an env file that tells pip to save the packages in the base folder so they don't disappear when I shut down the container - which I always run with --rm so it gets removed when dead - and for creating a new env I just have to copy that folder. As I use a base folder for all of this I don't need to remember the names of the envs, as I just need to list folders.
Only downside to this - because I'm lazy and it still hasn't bothered me enough to fix it - is that new files & folders are owned by root.
I use just a base image with python, and for different versions that work with this - supposedly - I just have to download a different image.
Learning how to use a simple barebones venv is extremely easy, saves a ton of time both in the short and long run, and generalises better.
pip install virtualenv
virtualenv -p result_of_which_python env
source env/bin/activate
pip install anything_you_like
> but I use a "base" folder where I have already downloaded all the packages I use by default [...etc...]
Your setup sounds outrageously complex, and I don't understand why you would do any of that.
python3 -m venv env
source env/bin/activate
pip install anything_you_like
py -3.6-32 -m venv env
- It would work the same way on Windows, Linux and macOS
- Most importantly (!), it is not a Python package manager. That is, if a package needs BLAS or MKL or HDF5 or whatever else, you won't be crossing fingers and hoping your system-installed version would work for all your venvs; instead, those binary libraries are properly managed per environment.
Pro tip: use mamba instead of conda to get a free 4x speed boost.
That's the generalisation part I mentioned. The distinction you're making is non-trivial for a wee noobie. Better they get the groundwork in and expand to where they need to go.
Then they're reading on some blog about protobuf, or hdf5, or arrow or whatever, and want to use it - but using either from Python needs installed C libraries. On linux/macos you'll need to dig into your package managers and then hope things are compatible, on Windows it's a complete pain. In conda, they can just 'conda install h5py' or whatever, and get proper hdf5 installed without having to figure out the nitty gritty details.
Well, already I can tell you're disconnected from what "most noobs" are actually like. Presumably you think "most noobs" are data scientists, which just isn't the case in my experience.
It is the case in my personal experience though. I don't think I'm disconnected from what 'noobs' are as I've taught a lot of them through my line of work.
Assume you're a noob and are exploring Python package universe, fire a new private tab and google 'most popular python packages'; in my case the first 3 links are:
All of which list numpy, pandas and the rest e.g. tensorflow and friends.
Please let them actually use python before sending them into tooling hell.
If you're just writing "a=b+c"-level Python scripts, or scrambling together tiny flask/django apps, or whatever else, you probably don't care indeed.
If you're doing/learning any kind of data-science-related stuff, that requires tons of C extensions and libraries. And many 'noobs' learn Python in order to use pandas/numpy/torch/whatever else is hot these days.
Of all the times I have tried conda, only once was it as easy as following the guide they post. The other times it was completely broken. I also greatly dislike that it pollutes your default environment. Imagine if every piece of software did that.
But isn't that similar to what virtualenv does?
The difference is that with virtualenv, the folders are created elsewhere.
Besides that, the friction so far hasn't been enough for me to automate this even more (as in, creating a script that would do the folder creation), but I guess I'll do it eventually.
Also, this way I have a set of python packages already installed.
And it was fun learning how to command pip / python to do things as I wanted them. That I guess answers this:
> I don't understand why you would do any of that
virtualenv -p $(which python) env
Isn't this what virtualenv was created to get away from?
If several of your projects rely on one of the "base" packages, upgrading it (which you might want for one project) could break the other projects.
Also, having implicit dependencies on packages might lead to problems with deployment (or for other developers) because they don't know that this package is required (and you might not have realized either, since it's always available on your machine).
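One lightweight way to make those implicit dependencies explicit is to pin them per project; a minimal sketch:

```shell
# Inside the project's activated environment, record exact versions
pip freeze > requirements.txt
# Another developer (or a deployment) reproduces the environment with:
pip install -r requirements.txt
```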
(I might have misunderstood your solution, though. I've never used Docker.)
This other comment clarifies my whole setup, and sheds a lot more light on how it works
The friction here sounds like a tire fire. I think you are just not aware at the moment of how much less convoluted this can be. You've become used to this complexity.
OK. I understand where you are coming from, and I feel your pain; being a developer myself, I have been in the "But, why?!" position way more times than I want to admit.
You sound really distressed by this, and at the same time honestly curious (and baffled can probably be shoved in there too), so I'll explain things a little more.
I think all this starts with the fact that I explained myself rather poorly. I _do_ have different envs for different codebases. My set up is like this:
Everything python related is in a folder where I store my "dockerized" projects:
then I have a "base" project folder, that I use as a starting point for _most_ of my projects:
Whenever I want to start a new project, the steps I need to do are:
Once the folder is there, it just starts a container in interactive mode with the (environment) base folder always shared as /work
In case I need some files to work there, I just copy them.
Again, the _only_ issue so far - and it annoys me - is that if I create files from inside the container they are owned by root. Eventually I'll grow tired of this and will fix it, but I'm not there yet.
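For what it's worth, a common fix for the root-ownership annoyance is to run the container as the host user via `--user`; a sketch reusing the folder layout described in this thread (some images then complain about a missing passwd entry, but file ownership comes out right):

```shell
# Files created under /work are owned by the invoking host user, not root
docker run -it --rm --user "$(id -u):$(id -g)" \
    -v "$(pwd)/new_env:/work" -w /work python:3.7 /bin/bash
```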
for me starting a certain env is just one line of code. Obviously you could do the same with your method.
Since I don't have that script yet, for me creating a new env goes like this:
cp -r ./base new_env
docker run -it --rm --name new_env_container -v $(pwd)/new_env:/work -w="/work" python:3.7 /bin/bash
rm -r new_env
This way, the container is removed every time you exit it, but it has a name so you can log into it from another console should you need to. The only issue with this is that it's a barebones OS, so if you need some program you have to install it (and that is lost if you don't keep the container: in my todo list there's a change for python.sh where it would be possible to persist the containers, and run from an existing one if found. The downside to this is that every container you persist is several hundred MB... but space is cheap, right?). BTW, I do have a container I persist for use with youtube-dl, as it depends heavily on ffmpeg.
From my point of view, that works the same as a venv. Your mileage may vary.
Caveat: python:3.7 is not the default name of the python image; I renamed it because the default was too long. I used to do it this way until I got tired of writing the same long command.
but otherwise, i agree entirely
1/ add a Dockerfile into your project with the following lines:
# base image and working directory (needed for the relative paths below)
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD [ "python", "./your-daemon-or-script.py" ]
3/ git clone, docker build, docker run and that's it
- Debug containerized apps: https://code.visualstudio.com/docs/containers/debug-common
- Debug Python within a container: https://code.visualstudio.com/docs/containers/debug-python
You can apparently develop directly inside a container too:
- Developing inside a Container: https://code.visualstudio.com/docs/remote/containers
# the base miniconda3 image
# load in the environment.yml file - this file controls what Python packages we install
ADD environment.yml /
# install the Python packages we specified into the base environment
RUN conda update -n base conda -y && conda env update && conda install -y -q moto && conda install -y -q -c conda-forge awscli httmock
# download the coder binary, untar it, and allow it to be executed
RUN wget https://github.com/cdr/code-server/releases/download/2.1698/... \
&& tar -xzvf code-server2.1698-vsc1.41.1-linux-x86_64.tar.gz && chmod +x code-server2.1698-vsc1.41.1-linux-x86_64/code-server
COPY docker-entrypoint.sh /usr/local/bin/
ADD ./code /code
Building that and running it with: docker run -d -p 127.0.0.1:8443:8080 -p 127.0.0.1:8888:8888 -v $(pwd)/data:/data -v $(pwd)/code:/code --rm -it <image>
from the directory where your code is will put those files into the container, and start a VS Code and a Jupyter Notebook server on your localhost. The password for Jupyter is the default "local-development" and the password for the VS Code instance is in the Docker logs. You can set these via the Dockerfile but I just keep the defaults.
I vastly prefer this to anything else because it means I can install any packages I want without worrying about messing up my environment. You can use virtual envs to make this even better, but I am typically too dumb and lazy for that. Better part still is that my development is the same on my Mac, on my Linux machine, and on my Windows machine. Same VS Code version, same packages, etc.
Biggest issue here is with certain VS Code plugins. Some, like the vim plugin, can be finicky and depend heavily on the version of code server that you use. Some plugins break completely. However, I mainly hate plugins so this doesn't present much of an issue for me personally. I have the vim plugin, the python plugin, and a terraform plugin installed. Once they are installed, they work perfectly for me.
The way my set up works is I have a repo with that Dockerfile in it as well as the accompanying files such as environment.yml and docker-entrypoint.sh:
if [ $# -eq 0 ]; then
    jupyter lab --ip=0.0.0.0 --NotebookApp.token='local-development' --allow-root --no-browser &> /dev/null &
    code-server2.1698-vsc1.41.1-linux-x86_64/code-server --allow-http --no-auth --data-dir /data /code
fi
and a .gitignore file with this in it:
Oh also I found the repo where I took these things from: https://github.com/caesarnine/data-science-docker-vscode-tem...
Repl.it is one way to sidestep these issues but it's not perfect. Making people jump in the deep end and use Linux helps to some extent, but also brings its own set of issues and frustrations.
I don't think "add another layer of abstraction to hide the complexity" is often a good solution. Docker brings its own problems too.
I wholeheartedly agree here.
> Docker brings its own problems too
This is a rather complex statement to reply to. Even though you might be right, I don't think this applies fully to what is being discussed here.
The biggest issue I have with people advising for or against a certain tool is that they do so from the point of view of the tool, instead of looking at the fit for the problem you are trying to solve. That, in your case, would be:
> I have years of experience with messing around with all of the relevant parts of Windows, Mac, and Linux and it never Just Works
As long as you manage to install Docker on all three of those systems, you're set (I have no experience with Mac because I don't use it, but for both Windows and Linux installing Docker is a no-brainer).
There's a slight curve when it comes to fetching the right image and running it, but your problem is not that one; your problem is teaching Python. So you can take care of that yourself, and focus on the teaching part.
Supposing that you managed to install Docker, fetch a Python image and run it (something that is a lot easier to do than it sounds), you have Python, in whatever version you want, and in an isolated way.
For me... it just works.
But this approach might be a bit limiting depending on your target audience.
If we're talking about web development with Python typically that means PostgreSQL and Redis too, and probably running Celery in addition to a web server such as gunicorn.
It's really nice to be able to just run a single Docker Compose command and be up and running in a way that works the same on Windows, MacOS and Linux.
This post outlines the differences between creating a Python development environment with and without Docker https://nickjanetakis.com/blog/setting-up-a-python-developme.... It focuses on the use case of web development.
Virtual environments are just separate copies of python with their own libraries installed.
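One way to see this for yourself (a minimal sketch): inside an activated venv, `sys.prefix` points inside the environment, while `sys.base_prefix` still points at the interpreter the venv was created from.

```python
import sys

# Inside an activated venv these differ; outside one, they are
# the same interpreter installation.
print(sys.prefix)
print(sys.base_prefix)
print(sys.prefix != sys.base_prefix)  # True when run inside a venv
```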
Pipenv is basically just a workflow for organizing virtual environments
I can give someone a repo and tell them to type pip install pipenv && pipenv sync and they'll have everything. This assumes they already have the correct version of Python installed, which is one nice thing Docker handles, but it is easy to install Python these days so it hasn't been an issue.
My biggest problem with docker was that I ended up using a ton of storage just learning about it. I have a feeling that was mostly my own fault, maybe using too heavy of a base image. I've been trying it out once or twice a year for about 6 years now, every time my conclusion is "wow this is really cool, I wish i could justify spending more time to get it right"
Though docker and virtual environments share the same problem, in that they are just a way for a developer to distribute code to other developers, and to production environments. Distributing python applications to end users is a totally different issue. I floated the idea of sending out a local data collection* app to Mac and windows users mostly because I think it would be fun to try.
*data collection of troubleshooting information from users within the same company on company hardware, that are actively asking for help. I'm not trying to spy on people.
Probably some combination of memory usage and complexity, depending on your application. If you're already familiar with using docker as a development environment, definitely go for it.
I don't use pipenv, I'm still using plain old virtualenv for development. Mostly it's just a matter of familiarity. If there's not an itch, why scratch?
And yes, setting up Python is a pain, so I think being able to start learning and running code without any setup is a big deal for learners. Get them excited about coding before they have to deal with that!
- Vagrant (so you don't have to worry about runtime executables) with the stack matching that of the production server (you can even ask ops to provide you with the provisioning script and remove the parts that you don't need - otherwise learning to provision your dev. VMs won't hurt you)
Then simply point your IDE (I only use Vim under duress) to the remote Python interpreter (the one you installed in the Vagrant VM).
It does add processing overhead but it worked with my 2010 MacBook Pro until it died and still works (only ten times faster) with the 2016 model. Your only limitation would be the RAM (I would recommend at least 8GB and if you plan to run multiple machines communicating together as much as you can afford - I do believe that Docker has less overhead, but again, for my use, not needed).
The best practice is what works for you, not the latest trend.
2) source venvname/bin/activate
then you do everything in the virtual environment...
The process died.
Your code probably took too long.
Maybe you have an infinite loop?
I'm thinking I should start a Slack. Any opinions on that?
In the meantime can you open an issue to talk about it?
If you don't want to create a full chat I'm happy to host on a Zulip chat for hackers I'm part of, just let me know.
I'll open an issue, thanks!
I learned to code with Java like 10 years ago. I haven't touched it again after graduating, and to be honest I was not so fond of the language (too verbose and too much OOP for my taste), but I'm glad I learned the benefits of having static types and a compiler from the start.
Obviously, Python is closer to pseudo code than Java, which is also great when you just start learning.
The scripting aspect is great because you can get useful things done with a few lines and minimal constructs. There's a pile of solved problems on Stackoverflow. Folks can approach it at their own pace and on their own terms, at work or at home. I promise them that if they hate it, I'll refund the price. ;-)
Now, germane to the discussion in this thread about installing the packages and so forth. That's a drawback to Python that I tell people about, but my approach is to provide complete hand holding on installation until they've come up to speed on programming. If they've never programmed, then they've probably never approached a computer from the command line or dug into its file structure. So, learning a bit of programming first is a good way to prepare for doing that other stuff.
I originally learned BASIC without knowing how to install BASIC on the mainframe.
We're a Windows shop, so I help them download and install WinPython. Running into a useful library that WinPython doesn't have is pretty rare, and then a pip install within the WinPython shell usually works.
It's a fantastic scripting language, which makes it excellent for people who want to automate computing tasks. Researchers, journalists, academics, administrative roles, etc.
But it is very high-level, so it is not a great way to teach/learn how something like a computer, database, or operating system works.
I think it's a great way to start teaching or learning programming, since you can keep using it no matter where your career takes you.
With tech being a core part of many jobs these days it's essential that everyone knows at least the basics. Python is great for that purpose.
The example in the readme isn't idiomatic Python. It would be better to use enumerate.
Yes, enumerate is better, but I want to teach concepts one at a time and I want students to understand what's going on in code. Before teaching `for a, b in c` I want to teach `a, b = c`. And to motivate doing that I want to teach `return a, b`. At this point in the course they're only just starting to learn about lists - they haven't seen tuples and they've never defined a function. So it's not time yet.
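The progression described above might look something like this in code (the function and variable names here are just illustrative):

```python
# Step 1: returning two values from a function (implicitly a tuple)
def divide(a, b):
    return a // b, a % b

# Step 2: unpacking that tuple into two names
quotient, remainder = divide(17, 5)
print(quotient, remainder)  # 3 2

# Step 3 (taught later): the same unpacking inside a for loop
for index, word in enumerate(["cat", "dog"]):
    print(index, word)
```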
Besides, it's important that students are intimately familiar with how to index a list and which indices are valid.
To continue to be nit picky with that example, the range function produces an iterable object, which is different from a list.
It makes teaching harder. You really have to work to come up with great examples. But it makes learning easier and there's less correcting to do later.
A teacher is in a position of authoritative trust. As a student, after I learn an example is flawed, I often wonder if I’m just missing some context because I trust the teacher to have gotten it right. In a setting where communication is already established (e.g., a class room) this can be cleared up quickly with a question. In other settings (e.g., reading a text), it can leave me wondering until I reach a much higher level of competence.
There's a similar strategy of building things up and then refactoring when they get bad (this ifelse is getting too big. We could use a dict). But the difference is every step along the way is valid or immediately corrected.
The student has never seen a tuple, or iterable unpacking of any form. Would you just show them `for index, word in enumerate(words)` and tell them not to worry about what that means?
Quite shortly after this, I ask them to essentially do `zip_longest`. Given two string variables:
string1 = "Goodbye"
string2 = "World"

length1 = len(string1)
length2 = len(string2)
if length1 > length2:  # one could use max, but I don't expect them to
    length = length1
else:
    length = length2

for i in range(length):
    if i < len(string1):
        char1 = string1[i]
    else:
        char1 = ' '
    if i < len(string2):
        char2 = string2[i]
    else:
        char2 = ' '
    print(char1 + ' ' + char2)
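For reference, the stdlib version of that exercise, which the students are essentially building up to, would be something like:

```python
from itertools import zip_longest

string1 = "Goodbye"
string2 = "World"

# zip_longest pads the shorter iterable with fillvalue
for char1, char2 in zip_longest(string1, string2, fillvalue=' '):
    print(char1 + ' ' + char2)
```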
I'd teach index access and number generation separately, with small incrementing variations. Then I'd show iteration, then enumeration (maybe after tuple unpacking).
When you combine two concepts it doesn't create just a combination. It creates one or more new concepts. Those new concepts have their own idioms and they should be taught properly.
Combining looping over numbers and index access creates two new concepts: looping over items and looping over items with their index. Both of those things have their own idiom in Python and your solution shows neither.
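Concretely, the two idioms in question (a minimal illustration):

```python
words = ["red", "green", "blue"]

# Looping over items: no index needed at all
for word in words:
    print(word)

# Looping over items with their index: enumerate, not range(len(...))
for i, word in enumerate(words):
    print(i, word)
```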
I think telling someone not to worry about the details of how something works because you'll get back to it later is way better than showing them the wrong thing and then correcting it. One is a promise kept, the other a promise broken.
In my opinion, nailing this kind of stuff is like half of the value of the project you're doing. You need to lean way in on it and get it right.
My take is right now you're thinking backwards from the position of someone who knows how to code. You need to think forward like someone who doesn't, at least more often. A student isn't going to be motivated to learn a, b = c by learning return a, b first because they won't know that second concept exists!
In short, if your students aren't ready to use enumerate or zip_longest, don't hand them problems that call for enumerate or zip_longest.
I'm not sure I'd say teaching someone to do something by hand for which they could use an existing function is "wrong". There's huge pedagogical value in knowing how powerful tools you didn't make work. Going bottom-up is a very effective way to do that (just ask the lisp folks), and a culmination of "nice job, you've implemented something so useful that it mirrors what the language designers/library authors did as well, here's how to use their version to save time in the future" is far from a broken promise.
I didn't say it was the wrong way to teach. I said the code was wrong. Like, if I removed myself from teaching and I just saw that in a PR, I'd definitely suggest they use enumerate instead.
I'm a huge fan of reimplementing built in functions as a way to learn. I'm learning Clojure right now (like I stopped to write this comment) and I do it all the time. But, I know I'm doing it.
The other thing that's fine is building up to the abstraction. "Let's get the index, now the item. Okay, there's a better way to do this". But you have to do it immediately.
I'm just not a fan of showing someone something, letting it sink in, and then having to go back and correct it.
And while I hope my arguments stand on their own, and I certainly could be wrong, I'm at least not speaking out of complete inexperience. I've spent a decent amount of time teaching people to code, juggle, work at rope courses, and fly airplanes.
But you may be surprised how many list-like operations `range` supports. Subscripting, `len`, `in`, `.count`, `.index`. It might as well be a tuple.
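A quick illustration of those list-like operations on `range`:

```python
r = range(5)          # 0, 1, 2, 3, 4

print(r[2])           # subscripting  -> 2
print(len(r))         # length        -> 5
print(3 in r)         # membership    -> True
print(r.count(3))     # -> 1
print(r.index(3))     # -> 3
```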
And if you want to nitpick more, range isn't a function.
Your project looks amazing. I hope I can find ways to contribute.