
Parallel Supercomputing for Astronomy with Julia - SiempreViernes
https://juliacomputing.com/case-studies/celeste.html
======
KenoFischer
Oh hey, I worked on this. Paper with all the details here:
[https://arxiv.org/pdf/1801.10277.pdf](https://arxiv.org/pdf/1801.10277.pdf).
Happy to answer questions.

~~~
philipkglass
Finishing the whole job in under 15 minutes is impressive but also a bit
suspicious, in a way. Back when I used a top 10 HPC facility at a national lab
I saw a _lot_ of jobs in the queue that could have run just fine on business
class servers without the fancy, expensive interconnects needed for Grand
Challenge problems. People didn't run their jobs on smaller machines because
there was money in the budget to build a world class Top 10 computing resource
but not money to buy less exotic hardware matched to typical jobs. Individual
research groups didn't really get their typical compute needs measured or
surveyed.

These are obviously leading questions, but:

\- Are there significant research advantages to super-fast turnaround (less
than 15 minutes, enabled by massive parallelism) in this domain?

\- Do you feel like a massively parallel system with Xeon Phi nodes is a good
match for this problem? Or did the code get optimized to run at high scale on
Cori Phase II because that's where you were given compute resources?

Finally, a bit less provocative:

Does this approach effectively scale _down_ to e.g. a university that can
afford a large storage array and beefy commercial servers (optionally equipped
with Phi or other accelerators), but doesn't have HPC resources that are
contenders for the Top 500? Or do you really need things that smaller systems
can't deliver, like many terabytes of memory in distributed global arrays?

~~~
KenoFischer
| Are there significant research advantages to super-fast turnaround (less
than 15 minutes, enabled by massive parallelism) in this domain?

No, the 15-minute turnaround time is not important given the dataset we have
at the moment, but showing that it was possible was considered important from
a science perspective, because of the upcoming LSST telescope. LSST will
generate an amount of data equivalent to our full dataset every 3-4 days, so
we needed to show that the pipeline could scale far enough to accommodate
that, as well as future planned extensions to the algorithm. The actual
science runs by the project are usually done on a few hundred nodes over a
couple of hours.

| Do you feel like a massively parallel system with Xeon Phi nodes is a good
match for this problem? Or did the code get optimized to run at high scale on
Cori Phase II because that's where you were given compute resources?

Cori Phase II worked well for this problem, though I wouldn't be surprised if
GPUs would have been a better fit (though harder to program, of course, and
at the time the Julia GPU infrastructure probably wasn't quite ready yet;
even KNL was a struggle, since LLVM was still in the process of completing
support for it). The Celeste project is still ongoing (working on science
goals more so than extra parallelism or performance improvements at the
moment), but I wouldn't be surprised if there were an attempt to run on Summit
at some point, especially now that Julia's GPU compiler is much more mature.

Actually, one of the biggest problems we failed to anticipate was getting the
data from disk to the compute units quickly enough. Early in the project we
crashed the interconnect on the machine, so for the challenge run we weren't
allowed to do anything other than pull the data directly from disk (lest we
bring down the machine again while other challenge runs were ongoing). I
haven't really looked at the interconnect on Summit, so I can't say how well
it would handle that.

| Does this approach effectively scale down to e.g. a university that can
afford a large storage array and beefy commercial servers

Yes, it scales down fairly well. In fact, you could probably do it with spot
instances on a public cloud. The biggest challenge would once again be
getting the data to the compute units quickly enough. That's quite demanding
on the network (and ideally you want to pre-stage the data in memory). It's
certainly feasible to do this on a large-ish university cluster on the SDSS
data set in a few hours. It's probably less feasible on LSST data once that
comes online, but maybe by that point improvements in computation and storage
speed will have made up for it and it'll become feasible again.

~~~
pmalynin
>No, the 15-minute turnaround time is not important given the dataset we have
at the moment, but showing that it was possible was considered important from
a science perspective, because of the upcoming LSST telescope. LSST will
generate an amount of data equivalent to our full dataset every 3-4 days, so
we needed to show that the pipeline could scale far enough to accommodate
that, as well as future planned extensions to the algorithm. The actual
science runs by the project are usually done on a few hundred nodes over a
couple of hours.

I've heard that the folks working on the EHT array need months to crunch the
numbers. Could something like this be used to speed up that process? Or is
there some other reason that would prohibit that?

P.S. I want pictures of black holes.

~~~
KenoFischer
I don't know. I'd imagine that the folks working on the EHT already are making
use of plenty of HPC for image reconstruction. It's quite a different problem
from the Celeste application of course, so this work isn't directly
applicable, but if they ever wanted to rewrite their code in Julia, they
should give us a shout ;).

------
sgt101
I found it funny to compare a paper by my colleague Andrew with this talk.
OK, we used Gibbs sampling and a teeny-weeny sad old Hadoop cluster, and we
were looking at phone lines rather than stars, but it's still Bayesian
inference to build a single catalogue... Ours answers "what is this made of?"
and "where is this thing?" rather than "is it a star or a galaxy?"

Also, ours doesn't run in 15 minutes!

[https://rd.springer.com/content/pdf/10.1007%2F978-3-319-7107...](https://rd.springer.com/content/pdf/10.1007%2F978-3-319-71078-5_17.pdf)

------
kwertzzz
This is really a great achievement. I am wondering how much communication was
necessary between the cores during the computation, or whether the problem is
a so-called "embarrassingly parallel" workload where the work can be split
into independent tasks.

------
qualsiasi
I would be curious to know the elapsed time on the same machine (and problem)
using more "enterprisey" languages like Java, C#, ...

~~~
jabl
I work in HPC, and, well, Java and C# are more or less irrelevant in that
space. There is a little bit of Java (some people use things like Hadoop or
Spark), but otherwise, no.

"Hard-core" computing is almost all C/C++/Fortran (and of course CUDA for
GPU's etc.). Python and R are fairly popular, but in those cases (hopefully)
most of the heavy lifting is done by library code (again, C/C++/Fortran, or
increasingly CUDA via ML libraries such as tensorflow) rather than the
interpreter. Julia is very promising in this space, as it offers a solution to
the "two-language" problem. I'm hopeful for Julia to make more of an impact,
but it's of course a slow process.

(There is a (tiny) bit of ASM, but that's more or less exclusively done for
widely used performance-critical libraries like BLAS, or FFT, not for
application code.)

~~~
antpls
Is Julia a JIT compiler? How is Julia different from PyPy or even the JVM?
There is also the new GraalVM in the landscape.

I don't see the added value of Julia compared to Python or Java. It will
still be slower than C/C++, probably less portable, and all the legacy
libraries have to be rewritten in Julia.

If Julia is only a new syntax, Python already seems very simple to me. If
Julia is a JIT compiler, why not contribute to already-available compilers?

Julia is still there, so I guess it adds value, but I don't know where to
place that effort in the grand scheme of things.

~~~
ethelward
> How is Julia different from PyPy

Well, no GIL for a start, which is a pretty strong selling point when running
on several dozen cores at once.
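
To illustrate the point, shared-memory parallelism in Julia is a one-macro
affair, with no interpreter lock serializing the loop body (a minimal sketch;
`squares_threaded` is a hypothetical name):

```julia
# Run Julia with several threads, e.g. `julia -t 4`. Unlike CPython, there
# is no global interpreter lock: iterations below execute truly in parallel.
using Base.Threads

function squares_threaded(n::Int)
    out = zeros(Int, n)
    @threads for i in 1:n
        out[i] = i^2   # independent writes to distinct slots, no lock needed
    end
    return out
end

squares_threaded(5)  # [1, 4, 9, 16, 25] regardless of thread count
```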

> I don't see the added value of Julia compared to Python or Java

I don't see how you can put Python and Java in the same bag.

> It will still be slower than C/C++

Not that much
[https://news.ycombinator.com/item?id=17204750](https://news.ycombinator.com/item?id=17204750)

> probably less portable

Who cares? 99.99% of HPC is Linux clusters anyway. And Julia runs on macOS
and Linux, which covers the overwhelming majority of the users concerned.

> and all the legacy libraries have to be rewritten in Julia.

[https://docs.julialang.org/en/v0.6.0/manual/calling-c-and-
fo...](https://docs.julialang.org/en/v0.6.0/manual/calling-c-and-fortran-
code/)
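
In the style of that manual chapter, calling an existing C library needs no
wrapper generation or build step at all, e.g.:

```julia
# `ccall` takes the function (and optionally library), the return type, a
# tuple of argument types, and the arguments. Here we call libc's strlen.
len = ccall(:strlen, Csize_t, (Cstring,), "Hello, Julia!")
Int(len)  # 13
```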

> I don't know where to place that effort in the grand scheme of things.

Given your questions, browsing their website would be a good start.

[https://julialang.org](https://julialang.org)

~~~
antpls
Thank you for the detailed answer. Here's my feedback on some of the points:

> I don't see how you can put Python and Java in the same bag.

What I meant is that the Python/Java/C trio is ubiquitous for many people in
both enterprise and scientific fields, from embedded systems to web servers.
It allows for great reusability of code and of people's skills.

> Who cares? 99.99% of HPC are Linux clusters anyway. And Julia runs on macOS
> and Linux, that cover the overwhelming majority of the concerned users.

But will Julia be able to emit the necessary instructions for future hardware
accelerators, which could have totally different architectures? I'm thinking
of all the new neural-network cores, DSPs, FPGAs, and heterogeneous computing
from different rival vendors. It seems Julia is deeply dependent on LLVM.

> and all the legacy libraries have to be rewritten in Julia.

> [https://docs.julialang.org/en/v0.6.0/manual/calling-c-and-
> fo...](https://docs.julialang.org/en/v0.6.0/manual/calling-c-and-fortran-
> code/)

If you have to reuse C and Fortran libraries, why not just use Python, which
can do the same, or even Lua or Lisp? Python is already the de facto language
for gluing libraries together into a higher-level algorithm.

~~~
KenoFischer
> But will Julia be able to emit the necessary instructions for future
> hardware accelerators, which could have totally different architectures?
> I'm thinking of all the new neural-network cores, DSPs, FPGAs, and
> heterogeneous computing from different rival vendors. It seems Julia is
> deeply dependent on LLVM.

Yes. Watch HN in the next week or two for an announcement that may interest
you ;).

~~~
ethelward
Oh gosh, you're making me impatient now :)

------
adyavanapalli
"Peta floating point operations per second per second" XD

------
jgamman
wow - 28 people on the about page and the only woman is the diversity
director.

~~~
KenoFischer
Yep, that's something we need to fix. So far we've done most of our hiring
from the open source community, which is unfortunately overwhelmingly male.
We've done some work to try to increase diversity in the community and Jane
graciously agreed to take a break from her PhD studies at Caltech to help with
outreach as part of that (also thanks to the Sloan Foundation for funding the
corresponding effort). Clearly a lot more work is needed on that front, but
I'm hoping the work that Jane and others have been doing this past year has
helped on the community side. Of course there's work to do on the company side
independently, but I imagine the community will continue to be an important
hiring channel, so I consider improving diversity there as a necessary
ingredient.

~~~
chimpburger
Nothing needs to be fixed. Make the opportunity available to women but do not
enforce a gender quota. Candidates should only be selected on merit. Equal
opportunity is good. Forcing equality of outcome is BAD.

~~~
cygx
If you have a gender imbalance of 27:0 that is not predicated on self-evident
physiological differences, there's a cultural problem that artificially limits
the pool of potential candidates. It is in a field's best interest to fix
that.

~~~
snovv_crash
I was trying to hire specialists in a similar field, in Europe. Applicants
were at least 50:1 m:f. Instead we focused on getting diversity based on other
things - country of origin, industries worked in before, age etc. It has
worked out really well, we have a dynamic team, and yes, once we started
hiring for other roles the gender balance of applicants flipped completely.

~~~
cygx
Sure: when hiring, you work with what's available. But as a community, it
doesn't hurt to try to figure out where such ridiculously large gender
imbalances come from, and whether there's something that can be done about
them.

