
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Containers [pdf] - ingve
https://www-cs.stanford.edu/~matei/papers/2019/usenix_atc_gg.pdf
======
stefco_
This is really great! I wrote and operate a pipeline framework for analyzing
data from multiple astronomical observatories in low-latency [1], and I'm
currently adding this sort of burst-performance scaling for things like
background simulations. Things like Kubernetes spin-up time or packaging huge
libraries for AWS Lambda are, indeed, challenges. Getting those startup times
down and doing autoscaling in a relatively platform-agnostic way with low
boilerplate overhead would be really game-changing for these sorts of analyses
(among other applications).

If you can run an analysis in low-latency and detect a joint gravitational
wave/EM source, for example, you can quickly follow it up with other
telescopes, like we did with the first direct kilonova observation [2]. Though
the gamma-ray detection localization wasn't good enough to really aid
counterpart searches for that event (GW170817), there are other types of
candidates (like GW+neutrinos, my project) for which this rapid localization
would be a huge improvement. And if you can do really hard things like
estimate the GW source parameters before merger (e.g. whether you're looking
at a binary neutron star merger, which is likely to emit detectable light if
it's close enough), you can try to get fast-slewing telescopes pointing on
source at merger time. Not to mention that burst processing would let you do
more clever things with your statistics (getting better sensitivity) in low-
latency as well. So stuff that makes this easier is really exciting!

Maybe if the API is amenable to my own solution I'll be able to implement this
as a backend :)

[1] [http://multimessenger.science](http://multimessenger.science)

[2] [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.11...](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.161101)

~~~
isoprophlex
Reading all that makes me think.... you must have one of the coolest jobs
around

~~~
stefco_
It is! My response to the other comment describes some of the challenges, but
overall it's wonderful! It feels a lot like running a startup, though of
course the compensation is more spiritual than financial. But it has all of
the great and exciting parts of a startup combined with the bonus of getting
to think about stars crashing into each other for your actual job.

------
juomba
If you are not aware already - Bazel has a corresponding Remote Execution API
-
[https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX...](https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX_A3_axxNnvroblTZ_6s/edit#heading=h.ole76l21af90).

Corresponding GCP offering -
[https://console.cloud.google.com/apis/library/remotebuildexe...](https://console.cloud.google.com/apis/library/remotebuildexecution.googleapis.com)

------
s_Hogg
Is there a use case for this where it becomes uneconomic? I was working with
AWS last year and found that in some cases it really was better to just have a
single EC2 instance (or auto-scaling group) rather than ~9999999999999 lambdas
to handle each individual task.

~~~
keithwinstein
(Co-author here) It really depends (see Figure 2).

On the question of Lambda vs. EC2, EC2 instances take much longer to start. So
depending on the job, to get the performance you can get with a "burst
parallel" flock of Lambda workers, you would need to keep a warm cluster of
EC2 instances ready to take your job. At which point, the cost comparison
depends on how often you have work to execute. EC2 is cheaper if you have a
100% duty cycle (but you probably don't).
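The duty-cycle trade-off above can be sketched with a back-of-envelope calculation. The prices below are illustrative placeholders (not current AWS rates), and the function names are my own:

```python
# Back-of-envelope: pay-per-use Lambda vs. an always-warm EC2 cluster.
# Prices are illustrative placeholders, not actual AWS rates.

LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667  # assumed $/GB-s
EC2_PRICE_PER_HOUR = 0.77                  # assumed $/h for a large instance


def lambda_cost(gb_seconds_of_work: float) -> float:
    """You pay only for the compute you actually use."""
    return gb_seconds_of_work * LAMBDA_PRICE_PER_GB_SECOND


def warm_ec2_cost(hours_kept_warm: float) -> float:
    """You pay for the whole time the cluster sits ready, busy or not."""
    return hours_kept_warm * EC2_PRICE_PER_HOUR


# A burst job using 1000 GB-s of compute, run once per hour:
per_hour_lambda = lambda_cost(1000)   # ~ $0.017
per_hour_ec2 = warm_ec2_cost(1.0)     # $0.77 whether busy or idle

# At low duty cycle, Lambda wins by a wide margin; only as utilization
# approaches 100% does the always-on cluster become the cheaper option.
```

With these assumed prices, the warm cluster only breaks even once you're feeding it tens of thousands of GB-seconds of work every hour.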

The gg tool, though, is mostly agnostic to the backend -- you can take a job
that's expressed in gg IR (e.g., "compile this program") and then execute it
with any of the gg back-ends. We have one for Lambda and one for a cluster of
warm VMs. The performance of gg-to-EC2 is generally better than outsourcing
methods that leave your laptop in the driver's seat (e.g. bazel-to-icecc) and
give less semantic information about data- and control-flow of the job to the
remote execution engine. (E.g. in Figure 9, you can see that gg-on-EC2 is much
faster than icecc-on-EC2 for compiling GIMP and Inkscape.)
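The "expressed in gg IR" idea can be sketched roughly as follows. This is an illustrative toy, not gg's actual data structures or API: the core notion is that a "thunk" names its executable and inputs by content hash, so any backend can run it without the laptop staying in the loop:

```python
import hashlib
from dataclasses import dataclass

# Toy sketch of a content-addressed "thunk" (names are illustrative,
# not gg's real IR). Because everything a thunk depends on is named by
# hash, the same job description can be shipped to any backend
# (Lambda, a warm VM cluster, ...) that can read the object store.


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Thunk:
    executable: str   # hash of the binary to run
    args: tuple       # command-line arguments
    inputs: tuple     # hashes of input objects (or of other thunks)


def execute(thunk: Thunk, store: dict) -> str:
    """A trivial 'backend': resolve inputs from the content store, run
    the step, and put the result back under its own hash."""
    resolved = [store[h] for h in thunk.inputs]
    result = b"|".join(resolved)  # stand-in for actually running the binary
    out_hash = content_hash(result)
    store[out_hash] = result
    return out_hash


store = {}
a = content_hash(b"main.c"); store[a] = b"main.c"
b = content_hash(b"util.c"); store[b] = b"util.c"

t = Thunk(executable="cc-binary-hash", args=("-c",), inputs=(a, b))
out = execute(t, store)
```

Since a thunk's inputs can themselves be thunks, a whole build graph ("compile this program") can be handed to the execution engine with its full data- and control-flow visible, which is what the laptop-in-the-driver's-seat approaches give up.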

~~~
polskibus
Any plans to evolve gg to use open source serverless platforms like knative,
openwhisk or openfaas?

~~~
keithwinstein
Sure -- we actually already have an OpenWhisk backend. The IR is sufficiently
stupid that it's pretty easy to write a new backend.

At least a low-performing one -- it gets harder if you want to use (a)
persistent workers [instead of invoking a new platform worker for every
"thunk"] and (b) direct inter-worker networking [instead of putting every
intermediate result in a storage medium].
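Why persistent workers matter can be sketched with a toy cost model. This is not gg's real backend interface, just an illustration of the difference between (a) invoking a fresh platform worker per thunk and reusing a warm one:

```python
from abc import ABC, abstractmethod

# Illustrative sketch (not gg's actual API): a cold backend pays a
# fetch cost for every input on every invocation, while a persistent
# worker keeps already-fetched objects cached between thunks.


class Backend(ABC):
    @abstractmethod
    def run(self, thunk_id: str, inputs: set) -> int:
        """Return the simulated cost (number of objects fetched)."""


class ColdPerThunkBackend(Backend):
    def run(self, thunk_id, inputs):
        # Every invocation starts from scratch: fetch everything.
        return len(inputs)


class PersistentWorker(Backend):
    def __init__(self):
        self.cache = set()

    def run(self, thunk_id, inputs):
        missing = inputs - self.cache
        self.cache |= inputs
        return len(missing)  # only fetch what this worker hasn't seen


cold, warm = ColdPerThunkBackend(), PersistentWorker()
jobs = [("t1", {"libc", "cc", "a.c"}),
        ("t2", {"libc", "cc", "b.c"})]

cold_cost = sum(cold.run(tid, deps) for tid, deps in jobs)  # fetches 6
warm_cost = sum(warm.run(tid, deps) for tid, deps in jobs)  # fetches 4
```

The shared toolchain objects (`libc`, `cc` here) are fetched once per warm worker instead of once per thunk; direct inter-worker networking is the analogous saving for intermediate results, which otherwise round-trip through storage.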

------
albertwang
Source appears to be here:
[https://github.com/StanfordSNR/gg](https://github.com/StanfordSNR/gg)

Some more info linked here:
[https://news.ycombinator.com/item?id=16570548](https://news.ycombinator.com/item?id=16570548)

------
polskibus
Could this be adapted to work with an open-source serverless framework like
knative? It would be more useful if I could choose between a local serverless
cluster and a public one depending on data volume, security, and other
workload-specific requirements.

------
lsofzz
Whenever I have a boring cron-style or once-off task that needs to be done, I
bring out an Apache Airflow cluster to schedule it for me. Nomad is another
option, but we haven't productionised it [edit]yet[/edit].

