
CharmPy – A high-level parallel and distributed programming framework - lainon
https://charmpy.readthedocs.io/en/latest/
======
devxpy
Can you please lay out the differences between this and Dask?
[https://dask.pydata.org/en/latest/](https://dask.pydata.org/en/latest/)

I work on a parallel programming framework for python myself. It's not geared
towards performance, but towards ease of use.
[http://zproc.readthedocs.io/en/latest/](http://zproc.readthedocs.io/en/latest/)

~~~
juanjgalvez
There are quite a few differences between them. Disclaimer: I work on CharmPy,
and I'm not an expert on Dask, so my comments might be biased and not entirely
accurate with respect to Dask.

The obvious difference between the two is programming style. CharmPy (its
current core API) is based on asynchronous method execution between
distributed objects. Being objects, they can have state and data, which allows
for a lot of flexibility. In Dask, you express a workflow as a series of
dependent tasks (which, as far as I know, are stateless, so it's more like
functional programming) and Dask schedules it for you. The scheduling is
centralized (done in only one place, so it's like a master-worker pattern)
even if you use the "distributed" scheduler (which is needed for multi-node
runs). With CharmPy you can have truly distributed applications.
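
Roughly, the style looks like this (a simplified sketch; the Chare/Group names
follow the docs, but exact details, especially program startup, may differ by
version):

    # Sketch of CharmPy's distributed-objects style; entry-point
    # details here are approximate, see the docs for the real API.
    from charmpy import charm, Chare, Group

    class Counter(Chare):
        def __init__(self):
            self.count = 0          # chares keep state between calls

        def add(self, n):
            self.count += n         # mutate state in place, no task graph

    def main(args):
        counters = Group(Counter)   # one Counter instance per process
        counters.add(5)             # asynchronous remote method invocation
        charm.exit()                # approximate shutdown call

    charm.start(main)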

Another thing I observed with the Dask model is that, since everything needs
to be translated into a task graph before execution, there seems to be poor
support for mutable distributed numpy arrays. A mutation operation like
modifying a single element of a distributed array is not allowed as far as I
know (I have tried), and other mutation operations that are supported actually
generate a completely different task graph as a result, with the overhead that
entails. In CharmPy, this restriction does not exist: you can just invoke a
method on the object that holds the data you want to modify and change it
in place.
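
For example, with dask.array at the time of writing (behavior may well change
in future releases):

    import dask.array as da

    x = da.ones((1000, 1000), chunks=(100, 100))
    # x[0, 0] = 5   # in-place item assignment is not supported and raises
    y = x + 1       # supported operations build a new task graph instead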

In terms of performance, our initial tests have shown a huge difference, with
CharmPy being up to 200x faster (this is comparing against the Dask
distributed scheduler for a very simple BSP-style program). Of course, the
difference will vary by workload, but one thing to note is that Dask is pure
Python, while CharmPy's core runs in C/C++. The current level of task
granularity that we can comfortably support is a few hundred microseconds, and
we expect to improve it further. In contrast, the Dask documentation for the
distributed scheduler explicitly warns against small task granularity,
recommending tasks longer than 100 ms. And something like Jug recommends tasks
longer than 20 seconds.

We are planning to add other APIs on top of the core CharmPy API to
accommodate other programming styles: for example, better support for the
functional parallel programming style (there is a small example of parallel
map in the codebase using CharmPy), or task scheduling.

~~~
p1esk
I had a task recently where I needed to convert several million audio files
from one format to another, and I did it with python's multiprocessing module
(similar to this: [https://stackoverflow.com/questions/50662610/using-multiprocessing-to-batch-convert-wav-to-flac-python-pydub](https://stackoverflow.com/questions/50662610/using-multiprocessing-to-batch-convert-wav-to-flac-python-pydub))

Just like the poster of that question on SO, I'm wondering if that's the best
way (in terms of speed or ease of use). Do any of the third party libraries
(like yours) offer any advantages for this use case? To clarify, I'm only
talking about doing work on a single workstation.

~~~
juanjgalvez
For a single workstation and the task you describe, the pool.map()
functionality of the multiprocessing module should be perfectly adequate. I'm
not sure how scheduling overhead compares between CharmPy and multiprocessing,
but for this task it shouldn't matter (I assume you need at least a second to
convert one file, and even if the conversion is faster, you can chunk the
tasks to mask the overhead). I would say the big difference for this task
comes if you want to run it in parallel on multiple hosts, which pool.map
can't do. With CharmPy we can provide a distributed parallel map offering the
same or a similar API as pool.map. There is a simple example in
'examples/parallel-map/par-map.py', but we are working on a library on top of
CharmPy with more features and a solid API.
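
For the batched conversion, something along these lines should work (pydub
usage follows the linked SO question; paths are illustrative):

    from multiprocessing import Pool
    from pathlib import Path
    from pydub import AudioSegment   # needs ffmpeg installed for flac export

    def convert(wav_path):
        # convert a single file from wav to flac
        flac_path = wav_path.with_suffix('.flac')
        AudioSegment.from_wav(str(wav_path)).export(str(flac_path),
                                                    format='flac')

    if __name__ == '__main__':
        files = list(Path('samples').rglob('*.wav'))
        with Pool() as pool:
            # a large chunksize batches many small files per worker task,
            # masking the per-task scheduling overhead
            pool.map(convert, files, chunksize=256)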

~~~
p1esk
Oh, good point about batching - my files were really small (audio samples for
speech recognition), so converting a single file took a lot less than a
second.

I looked at the par-map.py example; however, I can't quite understand where I
enter a server IP or something like that. The whole process is fuzzy, to be
honest. What do I need to do if I want to run my conversion task on two local
workstations? E.g. I install CharmPy on both, then what?

~~~
juanjgalvez
You don't actually have to specify hosts or addresses in your application
code. When the application starts, the runtime will know how many processes
there are and on which hosts. The key is to use a job launcher.

For the par-map.py example, suppose you want to run it on 4 hosts and 8
processes per host. One way to do this is by launching the application with
"charmrun". First, install charmpy on all hosts like you said. Then you would
create a nodelist file with the names or addresses of the 4 hosts. Finally,
launch like this: `$ charmrun +p32 par-map.py ++nodelist mynodelist.txt`
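
The nodelist file itself is plain text; per the charmrun manual, it looks like
this (hostnames illustrative):

    group main
    host host1
    host host2
    host host3
    host host4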

I have updated the "Running" section of the docs to try to explain this
better, also pointing to the charmrun manual. Hopefully things are clearer
now.

~~~
samtwhite
And on most clusters and supercomputers you don't need to manually create the
nodelist at all; charmrun can build it automatically for you by parsing the
batch scheduler's list of allocated hosts.

------
targafarian
    from charmpy import *

Please, for the love of God, import names explicitly or use e.g. `import
charmpy as cp` and subsequently `cp.foo`, so that when reading example code we
get a better sense of your API without having to guess which names may have
been overwritten.
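
For example (class names taken from the project's own docs):

    # explicit imports make the API surface visible at a glance
    from charmpy import charm, Chare, Group

    # or a namespaced import
    import charmpy as cp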

~~~
juanjgalvez
You must be referring to the example in the README. That is the only example
in the source code or docs that uses `import *` as far as I'm aware. But yeah,
I agree. It's fixed now.

~~~
targafarian
Much credit to you for changing that in the example. Will give it a try to see
what the advantages/disadvantages are next time I need to do more complex
tasks in parallel (I generally like Dask, but I would certainly need to try
this out to know where each is better or worse--or if it's just a matter of
taste).

------
leetrout
That seems like a very poor choice in name given the existence of PyCharm...

~~~
davidmr
I'm not sure what a better name would be. Charm++ (its core) was around
ruining the lives of chemistry grad students and their sysadmins long, long
before PyCharm was a twinkle in its creators' eyes. I'm sure JetBrains won't
be coming for it any time soon.

~~~
deadbeef404
Charmer? CharmIt? Charming? CharmHPD? Charmplus? Superintendentchalmers? The
problem here isn't that PyCharm came second or that JetBrains will have an
issue. The problem is that so many conversations will go:

> Have you used CharmPy?

> You mean PyCharm?

> No, CharmPy!

> Are we talking about the same thing?

> No

> Oh, that's just confusing, then.

~~~
davidmr
I dunno, maybe. They're so completely different that it seems extremely
unlikely to me that anyone will confuse the two in a conversation with any
context whatsoever. The conversation you made up would almost certainly start
with a discussion about scientific parallel programming. Any discussion I've
ever had about charm++ (too many, I'm afraid) wouldn't have been confused for
a discussion about an IDE.

Either way, it's done.

------
wedn3sday
This seems pretty cool, but I'm left with one question. The example in the
repo states, "The following computes Pi in parallel, using any number of
machines and processors." However, after reading through all the docs, I see
no reference whatsoever to any multi-machine support, only multiprocessing on
a single machine. Can this span a cluster? They claim it will "scale to
hundreds of thousands of cores" (which I want to believe; I would love to use
this), but without API documentation showing me how to do that, their fancy
library doesn't do much good.

~~~
juanjgalvez
Hi. Yes, applications can span multiple nodes (e.g. in clusters and
supercomputers); this is one of the main use cases of Charm++/CharmPy. The
reason you don't see anything in the API or examples is that application code
is essentially agnostic to the number of processes that are launched.

What determines the number of processes is the launcher (e.g. charmrun, or
something like aprun or ibrun on other systems). During initialization, the
CharmPy runtime will figure out internally how many CharmPy processes are
active in the job.
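
For example (a hedged sketch; the function name follows the docs but may
differ by version), application code can simply ask the runtime:

    # inside application code, after the runtime has initialized
    n = charm.numPes()   # total number of CharmPy processes in this job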

With charmrun, you can launch multiple processes in one host, but also across
multiple hosts (by ssh'ing into each one and spawning the processes). This is
done automatically by charmrun assuming you specify a list of hosts (called
nodelist, see
[http://charm.cs.illinois.edu/manuals/html/charm++/C.html](http://charm.cs.illinois.edu/manuals/html/charm++/C.html)).
Again, the application code is not affected by this.

Similarly, on other systems you can launch CharmPy applications with the
system job launcher (e.g. aprun, sbatch, ibrun…). We have done so, for
example, on Cray supercomputers. It is simple enough, but we need to update
the documentation to at least show an example of this.
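
Roughly, such a launch looks like this (illustrative; flags and launcher vary
by system):

    $ aprun -n 32 python par-map.py    # e.g. on a Cray system
    $ ibrun python par-map.py          # e.g. on systems using ibrun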

~~~
wedn3sday
Thanks for the response! This is great; it makes CharmPy pretty much a better
version of MPI. Could you add this to the documentation, or, if it's already
there, maybe add a link on the main doc page about running on a cluster?

------
woodson
Related but somewhat off-topic question: is there an easy-to-use parallel
processing framework for Python that also works well on Windows (Anaconda)? I
keep having issues with lost processes and other more or less random crashes,
no matter whether I use joblib or Dask, etc. All I really need is a parallel
for or apply.

~~~
juanjgalvez
You can install CharmPy on Windows with pip if that works for you. Launching
multiple processes is straightforward (you can check the documentation at
charmpy.readthedocs.io).

We don't yet offer an API in CharmPy to explicitly do things like parallel
apply (but we will soon). You can, however, look at `examples/parallel-map/par-
map.py` in the source code, which shows a simple example of how to do it with
the current API and might be what you are looking for.

------
Caminha
Is there a way to interoperate with MPI-based tools (such as Trilinos or
PyTrilinos)? I would really love to be able to write some of my unstructured-
mesh simulations on something like that.

~~~
juanjgalvez
For CharmPy this is not currently supported, but Charm++ can interoperate with
MPI libraries, so with luck it won't require much effort to get working. I
will open an issue on GitHub to track this task.

------
ianbertolacci
Very cool. However, this documentation is incomplete. It needs the full Python
API listing.

~~~
juanjgalvez
We'll update the documentation with the API in the next couple of days.
Eventually there will also be a more comprehensive manual explaining every
feature.

