

Working with Python subprocess - Shells, Processes, Streams, Pipes, Redirects - timf
http://jimmyg.org/blog/2009/working-with-python-subprocess.html

======
squeed
Python's subprocess module is great. It is definitely the cleanest wrapping of
a system program of any of the scripting languages I've ever used. When you
just want output, you call .communicate() and are done with it. When you want
to interact, the file descriptors are there for you to play with. It's
scripting heaven.

There is one feature that he could have touched on more:

shell=False is the only easy, safe way to put user-supplied input in a command
line. This stuff is just too damned easy to get wrong.

I've seen _heroic_ code that tried to work around shell quoting and escaping
vulnerabilities. One colleague was trying to write a wrapper around the
ldapsearch binary, and decided that filtering all "|" would do it. He'd
completely forgotten that ` also triggers arbitrary command execution.

Don't be a hero. Keep bash away from your input.
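
A minimal sketch of what that looks like in practice, assuming Python 3.7+ (for `capture_output`). The child here is just a tiny Python one-liner standing in for a real binary such as ldapsearch, and `echo_argument` is a made-up helper name. The point is that the input travels as one argv entry and no shell ever parses it:

```python
import subprocess
import sys

def echo_argument(user_input):
    """Pass user_input to a child process as a single argv entry, with no shell."""
    # Stand-in child program (an assumption for illustration): it writes back
    # its first argument exactly as received.
    argv = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.argv[1])",
            user_input]
    # shell=False is the default; it is spelled out here for emphasis.
    result = subprocess.run(argv, shell=False, capture_output=True,
                            text=True, check=True)
    return result.stdout

# Pipes, backticks, and $() are just characters when no shell is involved.
hostile = "x | cat /etc/passwd; `id` $(id)"
print(echo_argument(hostile) == hostile)
```

With shell=True and string concatenation, the same input would be handed to sh for interpretation, which is exactly the situation the filtering code was trying to patch over.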

------
malkia
I had to use os.popen (in Maya, MotionBuilder, and outside them; it is obsolete
by now, but still available), and it seemed much easier than subprocess.Popen
because I simply wanted a file filter.

Basically a coworker had ways to read/write a custom ASCII 3D animation and
model format, and another one introduced a binary form of it. All I did was
replace his "open" function with "open_model" and "open_anim" (no monkey
patching), with something like this:

    
    
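    import os

    def open_anim(name, mode, *args, **kwargs):
        # Binary animation files are piped through the converter tools.
        # Note: name is pasted into a shell command unquoted, so this is
        # only safe for trusted file names without spaces or metacharacters.
        if name.upper().endswith(".BIN_ANIM"):
            if 'r' in mode:
                return os.popen("anim_bin2text <" + name, mode, *args, **kwargs)
            if 'w' in mode:
                return os.popen("anim_text2bin >" + name, mode, *args, **kwargs)
        # Everything else is opened as an ordinary file.
        return open(name, mode, *args, **kwargs)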

~~~
VMG
popen might have been simple but don't forget popen2, popen3 and popen4,
spawn, spawnvp, spawnlp. You've got to get rid of that cruft or you become
php.

------
nzoschke
Great article. Chaining pipes together in Python with subprocess can get a
little wordy, but the power is there to do anything you need to do.
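
For a sense of how wordy, here is a minimal sketch of a two-stage pipeline, assuming Python 3.7+. Both stages are small Python one-liners standing in for real programs (an assumption made so the example runs anywhere), and `sorted_lines` is a made-up name:

```python
import subprocess
import sys

# Stand-ins for two real programs: one prints lines, the other sorts stdin.
PRODUCER = "print('pear'); print('apple'); print('fig')"
SORTER = "import sys; sys.stdout.writelines(sorted(sys.stdin))"

def sorted_lines():
    """Rough equivalent of `producer | sorter` in a shell."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    sorter = subprocess.Popen([sys.executable, "-c", SORTER],
                              stdin=producer.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the sorter exits early.
    producer.stdout.close()
    output, _ = sorter.communicate()
    producer.wait()
    return output.splitlines()

print(sorted_lines())
```

Each extra stage is another Popen whose stdin is the previous stage's stdout.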

It's especially excellent when compared to Ruby. Ruby has a bunch of popen
calls, but each one has strange quirks and inflexibilities that make it a poor
fit for building pipelines.

[http://whynotwiki.com/Ruby_/_Process_management#How_do_I_exe...](http://whynotwiki.com/Ruby_/_Process_management#How_do_I_execute_an_external_program.3F)

I'm finding myself using a ton more shell (bash) for pipelines, as this is
what it was built to do, but when I need to add more logic it's Python and
subprocess.

------
mturmon
The python parts seemed OK. But the data flow diagram in the OP at:

[http://jimmyg.org/blog/2009/working-with-python-subprocess.h...](http://jimmyg.org/blog/2009/working-with-python-subprocess.html#what-happens-when-you-execute-a-command)

(I.e., the first blue box in the article, "what happens when you execute a
command")

is wrong. It implies that the command (ls -l) is read by the terminal
emulator, which gives it to bash, which gives it to the kernel.

This is wrong in several ways. The terminal emulator starts bash (or whatever
shell) and it is bash that directly reads the string "ls -l", figures out what
it means, finds the ls program in PATH, and runs the ls binary. It is ls that
calls kernel functions when it's doing its job.

And of course, the kernel is not really of importance here. The selected
command does whatever it does, which might not even access the kernel ("Hello,
world").

And bash can and sometimes does directly call kernel functions without an
intermediate binary (e.g., unlink(2) to overwrite a file for output
redirection).

~~~
thwarted
_...the data flow diagram in the OP..."what happens when you execute a
command"...is wrong. It implies that the command (ls -l) is read by the
terminal emulator, which gives it to bash, which gives it to the kernel._

It's accurate; the author just left off where the command is coming from
(implying it's the user sitting at a keyboard). You type into the terminal
emulator, which passes that input to the program it's running, the shell,
which figures out what it means.

A terminal (or something emulating a terminal) is always between the keyboard
and the input of the running program. Even when you log in on the console, the
terminal driver is accepting your input, transforming it in some fashion, and
giving it to the next level.

A terminal doesn't exist between this echo command and the sh program:

    
    
       echo ls -l | sh
    

This is, in fact, why things like expect(1) exist and why ssh can be
configured to allocate a pseudo-terminal or not depending on how you want/need
to interact with the remote command.

Incidentally, there seems to be a bug in yum: it checks whether its output is
a terminal in order to decide how to format it, but it gets the check reversed
and pretty-prints (word-wraps) output that goes to a non-terminal. Which is why

    
    
       yum search something
    

and

    
    
       yum search something | sort
    

produce different output and the latter has blank lines scattered throughout.
(yum 3.2.28 on Fedora 14).

 _And of course, the kernel is not really of importance here. The selected
command does whatever it does, which might not even access the kernel ("Hello,
world")._

Every _Hello, World_ program I've seen displays text to the user (or sends it
somewhere), which means a system call, which is calling "kernel functions".
The kernel is the mediator between programs/processes, there are very few
cases where it isn't directly involved.

 _And bash can and sometimes does directly call kernel functions without an
intermediate binary (e.g., unlink(2) to overwrite a file for output
redirection)._

Not really a good example, but the idea is sound. Bash has builtin
functionality that emulates the functionality of system calls or other
binaries. In this case, when bash is going to overwrite a file for
redirection, it opens the file with the O_TRUNC flag set. This can be changed
with shell options (noclobber) so that it does not implicitly overwrite. You
can see this with strace.

And technically, "overwriting" a file means truncation or seek-and-write, not
unlink and recreate. The difference has implications for hardlinked files.
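
A minimal sketch of that hardlink difference, assuming a filesystem that supports hard links. The helper names are made up for illustration. Truncating keeps the same inode, so a second name for the file sees the new contents. Unlinking and recreating produces a new inode, and the other name keeps the old data:

```python
import os
import tempfile

def overwrite_by_truncation(path, data):
    """Overwrite in place, the way a shell `>` redirect opens its target."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(data)

def overwrite_by_unlink(path, data):
    """Remove the name, then create a brand-new file under it."""
    os.unlink(path)
    with open(path, "w") as f:
        f.write(data)

def read_text(path):
    with open(path) as f:
        return f.read()

def demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original")
        link = os.path.join(d, "hardlink")
        with open(original, "w") as f:
            f.write("old")
        os.link(original, link)  # second name for the same inode

        overwrite_by_truncation(original, "new")
        after_truncate = read_text(link)   # the link follows the change

        overwrite_by_unlink(original, "newer")
        after_unlink = read_text(link)     # the link still has the prior data
        return after_truncate, after_unlink

print(demo())
```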

~~~
mturmon
OK, I see your point regarding the terminal emulator. There are some keys that
are intercepted by the terminal emulator and processed by it alone (say,
whatever does "print screen" or "close window"). The shell never sees these
key presses.

The thing I didn't (and still don't) like about the diagram is it implies the
whole "ls -l" is delivered by the terminal emulator to the shell. Which isn't
how it works. The emulator is transparent to all normal keystrokes and the
shell is doing all the work, character by character, even for stuff like
ctrl-H which, years ago, could have been (and sometimes was) intercepted and
handled by the terminal driver.

And the presence of the kernel in the diagram is also confused. (You are
right, system calls are all over, even printf() will bottom out in the kernel;
I was trying to say that there is no _requirement_ for the user program to
call in to the kernel.)

Here's a sentence in the OP that has the same confusion:

"When you execute a program from Python you can choose to execute it directly
with the kernel or via a shell."

The kernel is not "executing the program directly". This is nonsense.

What they want to say is that you can use a system call (execve) to directly
load and run the program, or you can use an interpreter like sh.
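
A rough sketch of the two routes on a POSIX system, with made-up helper names. `run_direct` forks and then asks the kernel, via an exec-family call, to load the named program. `run_via_shell` does the same thing, except the program it loads is sh, which then interprets the command string. It assumes `sh` is on PATH:

```python
import os

def run_direct(argv):
    """Fork, then exec argv directly: no shell parses anything."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if the exec failed
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)

def run_via_shell(command):
    """Same mechanism, but the exec'd program is sh, which parses command."""
    return run_direct(["sh", "-c", command])

print(run_via_shell("exit 5"))
```

Either way the exec system call does the loading; the only choice is whether a shell sits between you and the target program.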

This same point of confusion is exhibited in their diagram.

~~~
thwarted
_The thing I didn't (and still don't) like about the diagram is it implies the
whole "ls -l" is delivered by the terminal emulator to the shell. Which isn't
how it works. The emulator is transparent to all normal keystrokes and the
shell is doing all the work, character by character, even for stuff like
ctrl-H which, years ago, could have been (and sometimes was) intercepted and
handled by the terminal driver._

I don't think it's as simple as this. When the shell spawns a program, it gets
out of the way and the program spawned receives all the input directly; the
terminal intermediates, but the shell doesn't. For example, if you want to
turn off echo (the printing of the characters you type), you need to change
the terminal settings. This is done with _stty -echo_ (go ahead and type that,
you won't be able to see what you're typing), which communicates with the
terminal driver to tell it to not print characters that are input. This is how
password prompts are done.
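
The same setting can be flipped from Python through the termios module, which is roughly what the standard library's getpass does on Unix. A minimal sketch with made-up helper names, using a pseudo-terminal from `pty.openpty()` so it does not touch your real terminal (POSIX only):

```python
import os
import pty
import termios

LFLAG = 3  # index of the local-modes field in the tcgetattr() list

def set_echo(fd, enabled):
    """Turn the terminal driver's ECHO flag on or off for fd."""
    attrs = termios.tcgetattr(fd)
    if enabled:
        attrs[LFLAG] |= termios.ECHO
    else:
        attrs[LFLAG] &= ~termios.ECHO
    termios.tcsetattr(fd, termios.TCSANOW, attrs)

def echo_is_on(fd):
    return bool(termios.tcgetattr(fd)[LFLAG] & termios.ECHO)

def demo_echo_toggle():
    master_fd, slave_fd = pty.openpty()
    try:
        set_echo(slave_fd, False)   # like `stty -echo`
        off = echo_is_on(slave_fd)
        set_echo(slave_fd, True)    # like `stty echo`
        on = echo_is_on(slave_fd)
        return off, on
    finally:
        os.close(master_fd)
        os.close(slave_fd)

print(demo_echo_toggle())
```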

Witness:

    
    
       $ od -c
       hello<backspace><backspace><backspace><backspace><backspace><control-d>
       0000000
    

_od_ sees no input. The input was deleted by the terminal driver. Usually, the
terminal is in line-input mode, so this can happen. Now, in my case, my
backspace key maps to ^?, so to actually get a control-h from my backspace
key, I had to do _stty erase ^H_. Then, the above produces:

    
    
       0000000   h   e   l   l   o 177 177 177 177 177
    

because the backspace/erasure didn't occur at the terminal level (in fact,
nothing got erased at all, nothing interpreted the input that way, and the
input was passed to od plainly). Note the output of _stty -a_. There are a
number of line editing control characters that modify the input _at the
terminal level_ before the program/shell ever even sees it.

You can see this with _strace -o /tmp/s.out -f bash_ then running _od -c_ in
that shell, typing the above, erasing it, and seeing that bash does not wake
up to handle the input and od sees no input (od's only read shows up as
returning zero bytes, indicating end-of-file).
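
A similar experiment can be sketched in Python with a pseudo-terminal, assuming a POSIX system (tried in thought against Linux semantics only). The function name is made up. It differs from the od session in one respect: it sends a newline instead of control-d, so the reader gets one line back rather than end-of-file. The "typed" bytes go into the master side, and the reader on the slave side only sees what the terminal driver lets through:

```python
import os
import pty
import termios

def read_line_after_erase():
    """Type 'hello', erase it, press Enter; return what the reader receives."""
    master_fd, slave_fd = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave_fd)
        attrs[3] |= termios.ICANON            # line-input (canonical) mode
        attrs[3] &= ~termios.ECHO             # keep the master side quiet
        attrs[6][termios.VERASE] = b"\x7f"    # erase character is ^? (DEL)
        termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

        # Simulated keystrokes: hello, five erases, then Enter.
        os.write(master_fd, b"hello" + b"\x7f" * 5 + b"\n")

        # The read blocks until the driver has a complete line to hand over.
        return os.read(slave_fd, 1024)
    finally:
        os.close(master_fd)
        os.close(slave_fd)

print(read_line_after_erase())
```

The erased characters never reach the reader; the line editing happened in the terminal driver.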

~~~
mturmon
We're not in disagreement at all. Thanks for the careful reply.

It's definitely true that some spawned programs get line-by-line input (or
some other buffered input). And some don't (e.g., bash itself, which when
connected via a tty, interprets bazillions of control codes).

