Some time ago, I blogged a series of posts about various ways of doing pipe-like operations in Python, including one experimental project of mine (which works within a single process, unlike real Unix pipes, which are IPC) and some others, including PyP. Below are links to some of those posts, which may be of interest. Note: some of the posts link to each other, so I've posted them below in reverse chronological order.
Swapping pipe components at runtime with pipe_controller:
Mostly out of frustration with PyP not being lazy (on large inputs it reads the entire file up-front, or at least it used to).
But it was quite interesting to implement all the standard python idioms (like slicing) in a lazy way, and it's not that complicated a tool. I still use it a lot whenever I have a nontrivial pipeline to write.
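As a rough illustration of what slicing "in a lazy way" can mean, here is a minimal sketch built on the standard library's itertools.islice. The helper names lazy_slice and numbered_lines are made up for this example and are not taken from any of the tools discussed here.

```python
from itertools import islice


def lazy_slice(lines, start=None, stop=None, step=None):
    """Return an iterator over lines[start:stop:step] without building a list.

    Only non-negative indices are supported, because a negative index
    would require knowing the length of the input in advance.
    """
    return islice(lines, start, stop, step)


def numbered_lines():
    # Hypothetical stand-in for an unbounded input such as sys.stdin.
    n = 0
    while True:
        yield f"line {n}"
        n += 1


# Every other line among the first six, taken from an infinite stream.
first_even = list(lazy_slice(numbered_lines(), 0, 6, 2))
print(first_even)  # ['line 0', 'line 2', 'line 4']
```

Because nothing is read beyond the requested stop index, the same call works on input that never ends.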
Yea, fortunately pythonpy supports lazy iteration over sys.stdin when you really need it. Just like in python, the syntax won't be as nice as using a list. But it works:
However, the times when you actually need this are surprisingly rare. Most lazy operations don't require that each row be aware of the surrounding row context, and using the much simpler:
py -x 'new_row_from_old_row(x)'
will get the job done in a lazy fashion. Usually, when you need rows to be context aware, as in:
py -l 'sorted(l)'
or
py -l 'set(l)'
it's just not possible to accomplish your task without reading in all of stdin.
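The difference between the two styles can be sketched in plain Python. This is only an illustration of the idea, not pythonpy's actual implementation, and the function names map_lines and whole_input are assumptions made for the example.

```python
import io


def map_lines(lines, func):
    """Lazily apply func to each line (roughly the -x style of use)."""
    for line in lines:
        yield func(line.rstrip("\n"))


def whole_input(lines, func):
    """Read every line into a list, then apply func (roughly the -l style)."""
    return func([line.rstrip("\n") for line in lines])


# Line-wise transform: results are produced one line at a time.
stream = io.StringIO("pear\napple\nfig\n")
upper = map_lines(stream, str.upper)
first = next(upper)   # only the first line has been transformed so far
rest = list(upper)

# Sorting needs the last line before it can emit the first result,
# so the whole input has to be read.
sorted_lines = whole_input(io.StringIO("pear\napple\nfig\n"), sorted)
print(first, rest, sorted_lines)
```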
Cool :), glad it's supported, at least for the simple case of line-wise transforms.
Some things can't be done without reading everything. But there are still a number of operations on "all of stdin" that can safely be done lazily. I'm particularly fond of "divide stdin into chunks of lines separated by <predicate>" [0], which does need context, but only enough to determine where the current chunk ends (typically a few lines).
`py` seems to be aimed at a single expression per invocation (nice and simple), while `piep` recreates pipelines internally (more complex but also means pipelines can produce arbitrary objects rather than single-line strings). So I'm not really sure how you'd do the above in `py` anyway.
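The chunking operation described above can be sketched in plain Python as a generator. This is a minimal sketch that assumes the predicate marks separator lines, which are dropped; the name split_chunks is made up for the example and is not part of `py` or `piep`.

```python
def split_chunks(lines, is_separator):
    """Lazily yield lists of consecutive lines between separator lines.

    Memory use is bounded by the largest chunk, not by the whole input.
    """
    current = []
    for line in lines:
        if is_separator(line):
            if current:          # skip empty chunks between adjacent separators
                yield current
            current = []
        else:
            current.append(line)
    if current:                  # emit the trailing chunk, if any
        yield current


# Blank lines act as separators in this example.
sample = ["a", "b", "", "c", "", "", "d", "e"]
chunks = list(split_chunks(sample, lambda line: line == ""))
print(chunks)  # [['a', 'b'], ['c'], ['d', 'e']]
```

Each chunk is yielded as soon as its closing separator is seen, so a downstream consumer can start working before the input ends.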
You should put up a donation link. If more people do that and it becomes common practice to donate to useful open source tools / projects then imagine what more awesome stuff could be created. Many people desire to build tools but don't have the financial support to do so.
I'm confused by this.
How does contributing money to people developing software for the commons lead to abandonware? And is abandonware a problem if it is free/open-source? Should I avoid publishing my hobby projects that I'm not planning to support?
It encourages people to start up projects just to get the contributions, then abandon the projects when the initial stream of contributions dries up. Yeah abandonware is a problem for people who start to use it thinking there is a community of some sort behind it, and then are stuck with supporting it all on their own or having to find something else. Fortunately such projects are normally easy to spot once you've been burned a few times. Unfortunately they comprise 95% of what's on github.
The example ls | py -x '"mv %s %s.txt" % (x,x)' | sh has me worried: I can't find where the filenames get escaped. You might also wish to offer an API along the lines of ls | py -X '["mv","%s","%s.txt"]' and run those argument lists directly as subprocesses. This would ensure the right argument ends up in the right place. Just an idea.
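The argument-list idea can be sketched in plain Python. This is a hypothetical illustration rather than an existing pythonpy feature: the helper rename_commands is made up for the example, and shlex.join assumes Python 3.8 or later.

```python
import shlex


def rename_commands(filenames, suffix=".txt"):
    """Yield one argument list per file instead of a shell string."""
    for name in filenames:
        # "--" ends option parsing, so names starting with "-" stay safe.
        yield ["mv", "--", name, name + suffix]


commands = list(rename_commands(["notes", "my file", "it's"]))

# Each list could be handed to subprocess.run(argv, check=True), which
# does not go through a shell, so spaces and quotes need no escaping.
# shlex.join shows an equivalent, safely quoted shell line instead.
for argv in commands:
    print(shlex.join(argv))
```

Nothing is renamed here; the loop only prints the quoted command lines, which makes it easy to inspect them before running anything.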
Hmm, I just added this example yesterday, and it seems to be highly contested, so I'll exchange it for something less controversial. That being said, if you're concerned about spaces, the best way to do this is either using shutil.move, or by generating output like mv 'file name' 'file name.txt' to pipe into sh. Of course, since you need to wrap your commands with single quotes to avoid bash quirks, it may seem difficult to use single quotes in a pythonpy expression. Fear not though - just use backticks (`), which pythonpy will swap out for single quotes:
Yea, IMO the best way to do this with pythonpy is:
py -l 'len(l)'
That being said, I still use wc -l every time. In fact, I always prefer regular unix commands (e.g. grep, xargs, head) to pythonpy when possible. But there are some things, such as:
py -l 'l[::2]'
which, while possible with tools like sed, are just much better expressed with pythonpy.
Ah! I haven't thought of using wc, although I frequently use it to count the number of lines in my code. For awk, yes, I have to learn it. Thanks to you and the others who gave the example in bash.
Technically this will work on any shell, not just bash, but good luck! Shell scripting is rather fun once you get the hang of it. Though if I'm honest, once a shell script gets past 30 lines or so I tend to back up and break out ruby/perl/python/anything but shell. Think of it as a succinct glue language and you'll do fine.
In the meantime feel free to use this python thing, it's rather fun to be honest. Just wanted to demonstrate the unix-y way of doing it.
'wc -l' counts the number of lines. It also supports '-w' (words) and '-c' (bytes).
To work with input split into columns, use awk. By default, it assumes the columns are separated with spaces. This can be customized with the '-F' option.
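For comparison, a rough Python counterpart of picking one column out of each line might look like the sketch below. The helper name column is an assumption for the example; note that it uses 0-based indices, whereas awk's fields start at $1, and that sep is a literal string here rather than a pattern.

```python
def column(lines, index, sep=None):
    """Yield field number `index` (0-based) from each line.

    With sep=None, fields are split on runs of whitespace, similar to
    awk's default behaviour; lines with too few fields are skipped.
    """
    for line in lines:
        fields = line.split(sep)
        if index < len(fields):
            yield fields[index]


rows = ["alice 3 x", "bob   5 y", "carol 7 z"]
total = sum(int(value) for value in column(rows, 1))
print(total)  # 15

# A custom separator plays the role of awk's -F option.
csv_values = list(column(["a,1", "b,2"], 1, sep=","))
print(csv_values)  # ['1', '2']
```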
For reference, you can do this using the wc core utility. The -l option makes it count the number of lines so 'wc -l' will do the same thing as your sum expression.
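In plain Python, a lazy line count can be sketched as follows. The function name count_lines is made up for the example. One difference worth knowing: wc -l counts newline characters, so a final line without a trailing newline is not counted by wc but is counted here.

```python
import io


def count_lines(lines):
    """Count lines without holding them in memory."""
    return sum(1 for _ in lines)


total = count_lines(io.StringIO("a\nb\nc\n"))
print(total)  # 3

# The last line has no trailing newline but still counts as a line here.
unterminated = count_lines(io.StringIO("a\nb"))
print(unterminated)  # 2
```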
I attempted to install on Windows - it doesn't work due to the lack of os.geteuid(), which is called in the setup.py file to check for root. With a little bit of finagling I could get around that and make it install, but it still didn't work, because SIGPIPE is undefined in the signal module. It makes sense that this could be UNIX only, but that wasn't stated anywhere on the webpage. Perhaps that should be mentioned somewhere.
Yea, sorry about that. I had an initiative to make a windows version, wpy, a little after launch, but couldn't really get any windows developers interested in making it happen. From my time working on windows, I know this tool would be even more valuable there than on unix. I'll try to find a place to indicate this doesn't work. Perhaps there's a good way to check if the system is windows in the setup.py and warn users.
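One possible way to do such a check is to test for the two features that were reported missing, rather than for the operating system name. This is only a sketch under that assumption, not code from the pythonpy setup.py, and the function names are made up for the example.

```python
import os
import signal
import sys


def missing_unix_features():
    """Return the names of the Unix-only features absent on this platform."""
    missing = []
    if not hasattr(os, "geteuid"):
        missing.append("os.geteuid")
    if not hasattr(signal, "SIGPIPE"):
        missing.append("signal.SIGPIPE")
    return missing


def warn_if_unsupported(stream=sys.stderr):
    """Print a warning if required features are missing; return True if supported."""
    missing = missing_unix_features()
    if missing:
        print(
            "warning: this platform lacks " + ", ".join(missing)
            + "; the tool is unlikely to work here.",
            file=stream,
        )
    return not missing


supported = warn_if_unsupported()
```

Checking for the features themselves keeps the warning accurate on any platform that happens to lack them, not only on Windows.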
I usually use Python (or another language's interpreter) as my calculator, so this will save me a few keystrokes in the future. I think I'll still use short scripts rather than the command line to do more involved things (like finding long palindromes in a txt file), but this will save me a bit of time when doing quick calculations. Thanks!
http://opensource.imageworks.com/?p=pyp
The Pyed Piper Tutorial: http://www.youtube.com/watch?v=eWtVWF0JSJA