

Pulsar: Concurrent framework for Python - bsg75
http://quantmind.github.io/pulsar/

======
omginternets
I really want to use Python in a distributed computation project of mine, but
I've been running into serious performance bottlenecks with regard to object
serialization. Like many concurrent Python frameworks, Pulsar uses `pickle` to
serialize data across processes, and for all but the smallest data structures,
workers end up spending 95+% of their time (de)serializing data.

I work with video data, so I just ran these rough benchmarks on my MacBook
Pro:

    
    
        import cPickle as pickle  # yes, I'm on Python 2.7.  Sue me.
        import numpy as np
        
        a = np.random.rand(800, 600, 3)  # 800x600 px, RGB channel
    
        %timeit pickle.dumps(a)  # %timeit magic from IPython
        1 loops, best of 3: 855 ms per loop
    
    

It takes nearly a second to serialize a not-so-huge `numpy` array, which makes
it very difficult to do any sort of soft real-time analysis.

This is a huge pain, and (very sadly for this Python aficionado) suggests that
Python might be the wrong language for this kind of work.

Any suggestions?

~~~
sepeth
You should use the highest pickle protocol. Here are the numbers on my machine:

    
    
        In [5]: %timeit pickle.dumps(a)
        1 loops, best of 3: 724 ms per loop
    
        In [6]: %timeit pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
        100 loops, best of 3: 12.4 ms per loop
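
For anyone who wants to reproduce the comparison without IPython or numpy, here is a rough sketch written for Python 3. The list of random floats is only a stand-in for the image array (an assumption, not the original benchmark), and `compare_protocols` is a made-up helper name. On Python 3 the default protocol is already a binary one, so protocol 0 is requested explicitly to mimic the Python 2 default. The sketch checks that both protocols round-trip the data and reports payload size and time; the numbers will vary by machine.

```python
import pickle
import random
import time


def compare_protocols(data, protocols=(0, pickle.HIGHEST_PROTOCOL)):
    """Pickle `data` once per protocol; return {protocol: (size_bytes, seconds)}."""
    report = {}
    for protocol in protocols:
        start = time.perf_counter()
        payload = pickle.dumps(data, protocol=protocol)
        elapsed = time.perf_counter() - start
        # Every protocol must give back exactly what went in.
        assert pickle.loads(payload) == data
        report[protocol] = (len(payload), elapsed)
    return report


# Stand-in payload (assumption): a flat list of floats, not a numpy array.
random.seed(0)
sample = [random.random() for _ in range(100000)]

for protocol, (size, seconds) in sorted(compare_protocols(sample).items()):
    print("protocol %d: %d bytes, %.4f s" % (protocol, size, seconds))
```

Protocol 0 writes each float as text, while the binary protocols store it in a fixed number of bytes, which is where most of the size and time difference comes from for this kind of payload.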

~~~
darkxanthos
What's the trade off here?

~~~
AnkhMorporkian
Backwards compatibility. Anything pickled with HIGHEST_PROTOCOL (protocol 2 on
Python 2.7) will be unreadable in anything below Python 2.3.

~~~
robzyb
So my Ford Model T won't be able to read it?

(That is my humorous way of saying that Python 2.3 is rather old and therefore
HIGHEST_PROTOCOL seems like it would be a desirable trade off for most.)

~~~
StavrosK
Yeah, what the hell? Even 2.5 is way too old, I generally only support 2.6+
nowadays. If something breaks compatibility with 2.5, no problem there. Hell,
Python 3 has been out for _years_!

~~~
AnkhMorporkian
Hey, I'm with you. I use 3.4 on almost everything, and 2.7 at worst anywhere
else. I weep for those stuck using Python 2.2.

~~~
StavrosK
No, I'm agreeing with you. I want to transition to 3.4 completely (mypy looks
amazing), but there are some small Django libraries (third-party libs) that
aren't yet compatible :(

~~~
AnkhMorporkian
Mypy is indeed amazing, I can't get enough of it. Throw in asyncio and pathlib
and you get the reason why I could never go back.

You might look into trying to port the django libs by yourself if you have the
skill to do so. 2to3 often gets you really far. If you don't, I'd definitely
recommend opening a ticket on the project page. I've done that with a few libs
I use, and for a couple of them the maintainers just totally forgot about them
and got around to doing the conversion just because I asked.

~~~
StavrosK
I did that already, I just have to find some time to do the conversion. The
maintainer was kind enough to assist.

How do you use MyPy? I'm particularly worried about two things:

1) If I'm building a library, I can't have MyPy as a requirement, but would
still like to use it for the type checks. Is there a way to omit the import
when distributing your library?

2) Can it only check parts of an application? Maybe I have a big Django app
and don't want it to static-check Django and all the other imports every time,
for example.

~~~
AnkhMorporkian
Sorry it took so long for me to get back to you; I only check HN from time to
time.

1. No, afaik there's no way to omit the import. Sadly there's little in the
way of macros or preprocessors in the Python world at this point.

2. I'm fairly sure there are ways to use module stubs, but all my mypy work has
dealt with the stdlib, which presents no issue.

~~~
StavrosK
Thanks for your reply! There aren't preprocessors, that's true, but I think
the way MyPy does things is a bit unnecessary. They could have avoided the
typing module and just used bare annotations, and made MyPy a static type
checker. Still, this just avoids the dependency, which isn't such a big deal.
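
On the dependency worry: from Python 3.5 onward the `typing` module is part of the standard library, so annotated code needs nothing from mypy installed at runtime, and the interpreter stores annotations without enforcing them. A minimal sketch of that behaviour (`total_length` is just an illustrative function, not taken from any library):

```python
from typing import List


def total_length(words: List[str]) -> int:
    """Sum of the lengths of all words; the annotations are metadata only."""
    return sum(len(word) for word in words)


# Annotations are stored on the function object but never checked at runtime.
print(sorted(total_length.__annotations__))

# A tuple is not a List[str], yet this call runs fine. Only a static checker
# run separately over the source would flag it.
print(total_length(("ab", "c")))
```

The type checker is then a development-time tool only, and users of the library never need it installed.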

------
lsbardel
Pulsar is 100% written on top of asyncio. In this respect it is not dissimilar to
Tornado or Twisted, which also use an event loop for their asynchronous
implementation (with their own event loop and Future classes).

However, pulsar is built on top of the standard lib with all the benefits that
that brings, especially in view of the changes in python 3.5
([https://www.python.org/dev/peps/pep-0492/](https://www.python.org/dev/peps/pep-0492/)).

The actor model in pulsar refers to the parallel side of the asynchronous
framework. This is where pulsar differs from twisted for example. In pulsar
each actor (think of a specialised thread or process) has its own event loop.
In this way any actor can run its own asynchronous server for example.
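
To illustrate the "one event loop per actor" idea without relying on pulsar's own API, here is a sketch using only the standard library. Threads stand in for actors, and `actor` and `run_actors` are hypothetical names, not pulsar functions. It assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
import threading


def actor(name, results):
    """Stand-in for an actor: a thread that owns a private event loop."""

    async def work():
        # Any coroutine-based server or client could run here instead.
        await asyncio.sleep(0.01)
        return name, threading.current_thread().name

    # asyncio.run() creates a new event loop in this thread and closes it when done.
    results.append(asyncio.run(work()))


def run_actors(names):
    """Start one thread per name, wait for all of them, return sorted results."""
    results = []
    threads = [
        threading.Thread(target=actor, args=(name, results), name="actor-" + name)
        for name in names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    for name, thread_name in run_actors(["a", "b", "c"]):
        print(name, "ran on", thread_name)
```

Because each loop lives in its own thread, a blocking call inside one stand-in actor does not stall the coroutines scheduled on the others.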

Tornado is an asynchronous web framework; pulsar is not. You can use any web
framework and run it on pulsar's WSGI application. You can also use pulsar to
create any other socket application, not just HTTP.
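
For reference, "any web framework" here means anything that exposes a WSGI callable (PEP 3333). Below is a minimal sketch of such a callable, exercised in-process with the standard library's `wsgiref` helpers. How it would be handed to pulsar's WSGI server is deliberately not shown, since that depends on pulsar's API. `hello_app` and `call_app` are made-up names.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Minimal WSGI application: one plain-text response for every request."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and return (status, body_bytes)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app(hello_app, "/ping"))
```

Any server that speaks WSGI can host the same callable unchanged, which is what makes the web framework and the server independent choices.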

------
ignoramous
I am not sure whether Pulsar would come anywhere near the speed / throughput of
truly concurrent systems like the JVM, Go, Haskell, or Erlang. I don't see
benchmarks posted on Pulsar's website, but I came across this discussion [1]
where aphyr and KiranDave discuss why Clojure/Erlang's model of explicit
concurrency (which enables parallelism) is superior to NodeJs' (implicit
concurrency via libuv, and multi-process parallelism via clusters).

From reading the "design" section on the website, to me it looks like Pulsar
is an attempt to replicate NodeJs (?) and by extension cannot compete with
languages with truly concurrent runtimes?

[1]
[https://news.ycombinator.com/item?id=4306241](https://news.ycombinator.com/item?id=4306241)

~~~
scribu
Great link - thanks for posting!

Pulsar can be configured to use either processes or threads. If you use
processes, you pay the IPC penalty, just like NodeJS does. If you use threads,
you pay the GIL penalty.

So yeah, it won't be as fast as BEAM or the JVM either way.

------
dikaiosune
Is this related to the fibers/channels/actor framework by the same name for
Clojure?

[https://github.com/puniverse/pulsar](https://github.com/puniverse/pulsar)

~~~
klibertp
Good question. The name is the same and goals of both projects are similar.
Implementation is very different, though.

------
ivoras
Oh, another one. Doesn't the fact that there is a new concurrent framework for
Python nearly every single month point to the existence of a rather large
pachyderm in the room?

------
sciurus
Previously:
[https://news.ycombinator.com/item?id=6543266](https://news.ycombinator.com/item?id=6543266)

------
calebm
How does it compare/contrast with Tornado?

~~~
jonesetc
Looks like an actor framework, built directly on top of stdlib asyncio, and is
designed to use multiprocessing first instead of tornado's cooperative
multitasking.

