
Celerity – High-level C++ for Accelerator Clusters - pjmlp
https://celerity.github.io/
======
fluffything
Weird. A library that wraps SYCL within MPI, yet requires all processes to
hold a copy of all the memory?

One of the main reasons to use MPI is to solve problems that do not fit within
the memory available in a single cluster node.

There is a presentation [0] from 2020-02-20 that's not very impressive.
Particularly, they compare against MPI+OpenCL, but do not show a comparison
against MPI-CUDA.

Doing a distributed MatMul using MPI-CUDA is trivial. I wonder how the
cyclomatic complexity and performance compare for that case. Instead, they
only compare doing one MatMul per process using MPI-OpenCL... that's... a two-
liner with CUDA (just call MPI_Init followed by a cuBLAS call).
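A minimal sketch of what this "two-liner" amounts to in practice (not from the thread; matrix size, variable names, and the column-major layout are my assumptions, and error checking is omitted). Each MPI rank runs one independent SGEMM on its local GPU, with no inter-rank communication:

```cpp
// Hedged sketch: one independent SGEMM per MPI rank, roughly the
// "MPI_Init followed by a cuBLAS call" described above.
// Assumes CUDA and cuBLAS are installed on every node.
#include <mpi.h>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024;  // assumed local matrix dimension
    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    // ...fill dA/dB with this rank's data, e.g. via cudaMemcpy...

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C, entirely local to this rank's GPU
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    MPI_Finalize();
    return 0;
}
```

Note this is the embarrassingly parallel case: a truly distributed MatMul (one logical matrix partitioned across ranks) would additionally need halo or block exchanges via MPI, which is exactly the bookkeeping frameworks like Celerity aim to hide.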

[0] [https://www.uibk.ac.at/fz-hpc/events/resources/2020_02_20_ah...](https://www.uibk.ac.at/fz-hpc/events/resources/2020_02_20_ahpc_celerity.pdf)

~~~
qayxc
I'm honestly much more worried about NVIDIA's overwhelming success in buying
their way into the _entire_ scientific community.

CUDA is proprietary, not standardised, and tied to a single hardware vendor.
Technical issues aside, I applaud _any_ effort to break the dominance of CUDA.
CUDA is a pest that needs to either open up or be replaced.

HPC is a very heterogeneous environment (on the whole), and making it easier
to use the full potential of the hardware is a worthwhile goal. Also keep in
mind that it's a research project still very early in its development.

~~~
adev_
> CUDA is a pest that needs to either open up or be replaced.

My main problem with CUDA is not that it is un-free; it is that the source
code is unavailable, and consequently you have to deal with a badly packaged,
badly supported blob that works only with a restricted set of compilers.

The consequence is that every piece of software depending on CUDA is a pain in
the butt to distribute. Even major projects like PyTorch or TensorFlow need
separate releases due to CUDA.

It is particularly sad that NVIDIA in 2020 still does not understand what
Mellanox understood 10 years ago: that successful leadership in hardware is
compatible with an OSS software stack.

------
e40
Not to be confused with Celerity Computing:
[https://en.wikipedia.org/wiki/Celerity_Computing](https://en.wikipedia.org/wiki/Celerity_Computing)

Fun fact: I did a port of Franz Lisp[1] to the Celerity, having the compiler
(liszt) generate code for the backend of the C compiler, since they told us:
"in no way will you be able to generate assembly code out of your compiler for
this architecture." They were an early Sun Microsystems competitor that didn't
make it.

[1] Not to be confused with Franz's Common Lisp, this is the MacLisp
compatible Lisp from UC Berkeley.

------
queensnake
Intel’s DPC++ seems similar - computation graph, communicating via buffers.

[https://software.intel.com/en-us/oneapi/dpc-compiler](https://software.intel.com/en-us/oneapi/dpc-compiler)

Anyone pick out differences?

~~~
14113
DPC++ is basically "just" SYCL, so no MPI.

