First of all, all of these frameworks use Hadoop Streaming. As mentioned in the mrjob 0.4-dev docs (pardon the run-on sentence):
"Although Hadoop is primarily designed to work with JVM code, it supports other languages via Hadoop Streaming, a special jar which calls an arbitrary program as a subprocess, passing input via stdin and gathering results via stdout."
mrjob's role is to give you a structured way to write Hadoop Streaming jobs in Python (or, recently, any language). When your task runs, it takes input and produces output the same way your raw Python example does, except that mrjob passes each input line through a deserialization function before handing it to the methods you've defined. It picks which code to run based on command-line arguments such as --mapper and --reducer, and the output of your code is serialized again on the way out. The methods of [de]serialization are defined declaratively in your class, as you showed.
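To make that concrete, here's a rough sketch of the deserialize/dispatch/serialize loop described above. This is not mrjob's actual implementation; `run_mapper` and `word_count_mapper` are illustrative names I made up, mimicking the default behavior of raw-line input and JSON-encoded intermediate output:

```python
import json


def run_mapper(lines, mapper):
    """Roughly what happens when Hadoop Streaming invokes
    `python job.py --mapper`: deserialize each input line, call the
    user-defined mapper, and serialize every (key, value) pair it
    yields back into a tab-separated output line."""
    out = []
    for line in lines:
        # raw-value-style input: key is None, value is the raw line
        for key, value in mapper(None, line.rstrip("\n")):
            # JSON-style output: tab-separated JSON key and JSON value
            out.append("%s\t%s" % (json.dumps(key), json.dumps(value)))
    return out


def word_count_mapper(_, line):
    """A user-defined mapper: emit (word, 1) for each word."""
    for word in line.split():
        yield word, 1
```

Running `run_mapper(["to be or not to be\n"], word_count_mapper)` produces lines like `"to"\t1`, ready for Hadoop's shuffle phase.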
So why did you find mrjob to be slower than bare Hadoop Streaming? I don't know! In theory, you're running approximately the same code between the mrjob and the bare Python versions of your script. If anyone has time to dig into this and find out where that time is being spent, I would be grateful. Results should be sent to the issue tracker on the GitHub page.
Feel free to ask clarifying questions. I realize I may not be explaining this effectively to people unfamiliar with the ins and outs of Python MapReduce frameworks.
I'm thinking of organizing the 2nd "mrjob hackathon" in the near future, so please ping me if you're interested in contributing to an easy-to-handle lots-of-low-hanging-fruit OSS project. (Particularly if you're a Rubyist, because we have an experimental way to use Ruby with it.)
mrjob is maintained by Yelp, where I worked until recently. It's still under active development, though the pace has slowed somewhat since Dave and I left.
Also, if you're just starting out or want to look over the docs, I'd recommend using the dev version hosted on readthedocs instead of the PyPI version, as the author did: http://mrjob.readthedocs.org/en/latest/index.html
It's just splitting on tab for input and re-joining on tab for output, with some extra logic for lines without tabs.
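The idea is simple enough to sketch in a few lines. This is an illustration of the behavior described above, not mrjob's exact code; `raw_read` and `raw_write` are names I chose:

```python
def raw_read(line):
    """Split a streaming line into (key, value) on the first tab.
    Lines without a tab -- the 'extra logic' -- become (line, None)."""
    if "\t" in line:
        key, value = line.split("\t", 1)
        return key, value
    return line, None


def raw_write(key, value):
    """Re-join on tab for output, dropping the tab when there's no value."""
    if value is None:
        return key
    return "%s\t%s" % (key, value)
```

Note that the key and value stay raw strings throughout; no JSON encoding or decoding is involved.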
EDIT: The example in the post is using JSON for communication between intermediate steps, while the Hadoop Streaming example is using a custom delimiter format. So this isn't really a fair comparison; the mrjob example could just as easily use the same efficient intermediate format.
The input is RawProtocol, which simply splits on tab. But after that, mrjob defaults to using JSON internally, and that causes a lot of the slowdown.
Please either mention this difference in your post or update the code and conclusions. If there's a place in the documentation where we should mention optimizations or details like this, I'd be interested to know.
I should have thought of this before. Oh well.
One real issue with mrjob is that it assumes you're only going to have one key and one value; it isn't straightforward to use multiple key fields. The workaround is to write a custom protocol (which, by the way, is very simple) that treats the line up to the first tab as the key and the rest of the line as the value, probably splitting the value on tabs as well and passing it through as a tuple. If we had made multipart keys simpler to use, maybe you would have chosen a more efficient format.
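For the curious, here's what such a custom protocol might look like. A protocol only needs `read()` and `write()` methods; `TabTupleProtocol` is a hypothetical name, and this standalone sketch mirrors the shape of an mrjob protocol class rather than reproducing one from the library:

```python
class TabTupleProtocol(object):
    """Hypothetical protocol: the key is everything before the first
    tab; the value is the remaining tab-separated fields as a tuple."""

    def read(self, line):
        # split off the key, then break the rest into a tuple of fields
        key, _, rest = line.partition("\t")
        return key, tuple(rest.split("\t")) if rest else ()

    def write(self, key, value):
        # flatten (key, fields...) back into one tab-joined line
        return "\t".join((key,) + tuple(value))
```

Because both sides stay as plain tab-delimited text, the shuffle sorts on the raw key bytes and no JSON round-trip is needed between steps.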
Anyway, the main part I take issue with is:
"mrjob seems highly active, easy-to-use, and mature...but it appears to perform the slowest."
That's just not true. It would be fair to say that optimizing jobs with multipart keys isn't straightforward and therefore encourages non-optimal code, but that's moot if you're just using one key and one value, as most people do.
I'm really not trying to dump on you here. I liked the post! I would just prefer that it was more precise about these things.
EDIT: If anyone's thinking about downvoting this guy (someone did), don't. This is a discussion in good faith.