
Self-Adjusting Computation - harperlee
http://www.umut-acar.org/self-adjusting-computation
======
amelius
This is, in my opinion, one of the important things modern software
engineering is missing. Often, the most difficult part of writing software is
keeping everything in a consistent state. With self-adjusting computation, it
becomes possible to write a program as if the input data were static, and all
changes to the input are propagated by the framework/compiler for free.
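To make that concrete, here is a minimal sketch in OCaml of what the
programming model looks like. This is a toy, not the API of any real
self-adjusting library (names like `modifiable`, `read2`, and `change` are
just for illustration): a modifiable cell remembers the computations that
read it, so changing an input re-runs exactly those readers.

```ocaml
(* Toy "modifiable" cell: a value plus the computations that read it,
   so a change can be pushed to its dependents. *)
type 'a modifiable = {
  mutable value : 'a;
  mutable readers : (unit -> unit) list;
}

let create v = { value = v; readers = [] }

(* Run [f] on both values now, and re-run it whenever either input
   changes. The computation is written as if the inputs were static. *)
let read2 m1 m2 f =
  let run () = f m1.value m2.value in
  m1.readers <- run :: m1.readers;
  m2.readers <- run :: m2.readers;
  run ()

(* Write a new value and propagate it to every dependent computation. *)
let change m v =
  m.value <- v;
  List.iter (fun run -> run ()) m.readers

let () =
  let x = create 1 and y = create 2 in
  let sum = create 0 in
  read2 x y (fun a b -> change sum (a + b));
  Printf.printf "sum = %d\n" sum.value;  (* sum = 3 *)
  change x 10;                           (* propagated "for free" *)
  Printf.printf "sum = %d\n" sum.value   (* sum = 12 *)
```

A real implementation schedules re-execution through a dynamic dependency
graph with memoization rather than naively calling every reader, but the
surface experience is the same: the sum is written once, as if x and y never
changed.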

------
finch_
Incremental computation is an interesting idea, and could be very useful for
big-data processing. However, the particular incremental technique of self-
adjusting computation has two big flaws:

\- There is a significant storage overhead due to all of the data that is
collected about the computation (the "dynamic dependency graph").

\- It assumes that if the input to your algorithm changes a little, the
execution path and intermediate variable values will stay mostly the same for
the rest of the computation. This is not true for many algorithms you might
wish to make incremental; the prefix-sum sketch below is a simple
counterexample.
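
As a hypothetical illustration of that second point, consider a left-to-right
prefix sum: every intermediate value depends on every earlier input, so
changing the first element invalidates the entire recorded execution, and
change propagation degenerates into recomputing from scratch.

```ocaml
(* Prefix sums computed left to right: step i reads the accumulator
   produced by step i-1, so input.(0) flows into all n outputs. *)
let prefix_sums input =
  let n = Array.length input in
  let out = Array.make n 0 in
  let acc = ref 0 in
  for i = 0 to n - 1 do
    acc := !acc + input.(i);
    out.(i) <- !acc
  done;
  out

let () =
  let input = [| 1; 2; 3; 4 |] in
  (* Prints 1 3 6 10. Changing input.(0) changes every intermediate
     value, leaving nothing for an incremental re-run to reuse. *)
  Array.iter (Printf.printf "%d ") (prefix_sums input);
  print_newline ()
```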

I'm not necessarily saying these issues couldn't be overcome, but a lot more
research is needed.

For some alternative approaches to incremental computation that avoid these
flaws (while introducing other problems of their own), you could look into:

DBToaster: [http://www.dbtoaster.org/](http://www.dbtoaster.org/)

LINVIEW:
[http://dl.acm.org/citation.cfm?id=2588555.2610519](http://dl.acm.org/citation.cfm?id=2588555.2610519)

~~~
assface
_There is a significant storage overhead due to all of the data that is
collected about the computation (the "dynamic dependency graph")._

The storage overhead is massive; it makes this approach a non-starter. In our
experiments with self-adjusting computation, the overhead was in the range of
30-100x for simple algorithms. That means if you have a 1TB data set, you
need 30TB just to store the intermediate results.

A relational database with materialized views, or the special-purpose systems
you cite (DBToaster, LINVIEW), are better approaches.

~~~
umutacar
The storage overhead of self-adjusting computation depends on the granularity
at which dependencies are tracked, which is under the programmer's control.
My post above provides more information.

------
nine_k
Can anyone please explain the differences/similarities between this approach
and FRP?

------
umutacar
\- It is not true that self-adjusting computation necessarily leads to large
space overheads.

For example, recent papers on Incoop (SOCC 2011) and iThreads (ASPLOS 2015)
show small space overheads even for very large datasets. If my memory is not
failing me, the space overheads were less than 20% in many cases. Another
paper, on DeltaML (ICFP 2014), shows techniques for reducing space overheads.

Here are links to these papers:

Incoop: [http://www.umut-acar.org/publications/socc2011.pdf](http://www.umut-acar.org/publications/socc2011.pdf)

iThreads: [http://www.umut-acar.org/publications/asplos2015.pdf](http://www.umut-acar.org/publications/asplos2015.pdf)

DeltaML: [http://www.umut-acar.org/publications/icfp2014.pdf](http://www.umut-acar.org/publications/icfp2014.pdf)

All of the papers above use the following idea: in self-adjusting computation,
the programmer has full control over the granularity of dependency tracking.
At one extreme, all dependencies are tracked, and even fine-grained changes to
the input can be handled efficiently, but such fine-grained dependency
tracking can lead to 10-100x space overheads. At the other extreme, only the
dependency on the input as a whole is tracked, leading to essentially no space
overhead, but then any change to the input triggers a complete recomputation.
In principle, anything in between is possible. For example, in Incoop
dependencies are tracked at the level of large disk blocks. In iThreads,
dependencies are tracked at the level of OS pages. In the DeltaML (ICFP 2014)
paper, dependencies are tracked at the level of blocks determined by the
programmer.
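
As a rough sketch of the block-granularity idea (illustrative code only, not
the actual mechanism of Incoop, iThreads, or DeltaML; `block_size` and the
function names are made up): keep one remembered partial result per block of
the input. A single-element change then re-computes one block plus the final
combine, while the remembered state is n/block_size records instead of n.

```ocaml
let block_size = 4

(* One cached partial sum per block: coarse-grained tracking with
   O(n / block_size) space for the remembered intermediate state. *)
let block_sums input =
  let n = Array.length input in
  let nb = (n + block_size - 1) / block_size in
  Array.init nb (fun b ->
    let lo = b * block_size in
    let hi = min n (lo + block_size) in
    let s = ref 0 in
    for i = lo to hi - 1 do s := !s + input.(i) done;
    !s)

(* After input.(i) changes, only the containing block is re-summed. *)
let update sums input i =
  let b = i / block_size in
  let lo = b * block_size in
  let hi = min (Array.length input) (lo + block_size) in
  let s = ref 0 in
  for j = lo to hi - 1 do s := !s + input.(j) done;
  sums.(b) <- !s

let total sums = Array.fold_left ( + ) 0 sums

let () =
  let input = Array.init 16 (fun i -> i) in
  let sums = block_sums input in
  Printf.printf "total = %d\n" (total sums);  (* total = 120 *)
  input.(5) <- 100;
  update sums input 5;                        (* one block re-summed *)
  Printf.printf "total = %d\n" (total sums)   (* total = 215 *)
```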

\- Self-adjusting computation does not _assume_ that a small change to the
input leads to a small change in the execution path. The size of the change in
the execution path, however, determines the update time; this is the cost
model that self-adjusting computation offers. There are many techniques for
controlling the size of the change in the execution path, as can be seen in
some of the more sophisticated applications of self-adjusting computation to
large datasets (Incoop, iThreads, DeltaML), machine learning (inference), and
computational geometry. In all of these cases, self-adjusting computation
leads to asymptotic improvements compared to batch computation.
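
Here is a small sketch of that cost model on the standard array-sum example
(hand-rolled code, not taken from any of the papers above): if the sum is
structured as a balanced tree, a one-element change invalidates only the
O(log n) cached partial sums on one root-to-leaf path, which is exactly the
part of the execution a self-adjusting run would redo. A left-to-right fold,
by contrast, would invalidate all n partial results.

```ocaml
type tree =
  | Leaf of int
  | Node of int * tree * tree  (* cached sum of the subtree *)

let sum = function Leaf v -> v | Node (s, _, _) -> s

(* Build a balanced sum tree over a.(lo) .. a.(hi - 1). *)
let rec build a lo hi =
  if hi - lo = 1 then Leaf a.(lo)
  else
    let mid = (lo + hi) / 2 in
    let l = build a lo mid and r = build a mid hi in
    Node (sum l + sum r, l, r)

(* Set element i to v, re-computing only the O(log n) cached sums
   along one root-to-leaf path; everything off the path is reused. *)
let rec update t lo hi i v =
  match t with
  | Leaf _ -> Leaf v
  | Node (_, l, r) ->
    let mid = (lo + hi) / 2 in
    let l, r =
      if i < mid then (update l lo mid i v, r)
      else (l, update r mid hi i v)
    in
    Node (sum l + sum r, l, r)

let () =
  let a = Array.init 8 (fun i -> i + 1) in
  let t = build a 0 8 in
  Printf.printf "sum = %d\n" (sum t);  (* sum = 36 *)
  let t = update t 0 8 3 100 in        (* change a.(3) from 4 to 100 *)
  Printf.printf "sum = %d\n" (sum t)   (* sum = 132 *)
```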

One particularly powerful technique for letting the programmer/designer
control efficiency is traceable data structures (PLDI 2010 paper), which
allow application-specific logic to be exploited as desired.

Traceable data structures also allow defining domain-specific solutions (by
using appropriately designed traceable data structures).

Paper on traceable data structures: [http://www.umut-acar.org/publications/pldi2010.pdf](http://www.umut-acar.org/publications/pldi2010.pdf)
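
For a rough sense of what such an interface might look like, here is a
speculative OCaml signature for a traceable priority queue (illustrative
only; the PLDI 2010 design differs in detail). The idea is that the structure
logs whole operations and their answers rather than individual memory reads,
and can report which logged answers change when an earlier operation is
revised.

```ocaml
(* Speculative sketch; all identifiers here are made up. *)
module type TRACEABLE_PRIORITY_QUEUE = sig
  type t
  type op_id                                (* handle to a logged operation *)

  val create : unit -> t
  val insert : t -> int -> op_id            (* log an insert *)
  val delete_min : t -> int option * op_id  (* log a query and its answer *)

  (* Revise an earlier insert and return the logged operations whose
     answers changed, so only their dependents need to be re-run. *)
  val revise_insert : t -> op_id -> int -> op_id list
end
```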

