
Show HN: Optimizing Higher Order Functions with Hypothetical Inverses - pschanely
https://medium.com/@pschanely/optimizing-higher-order-functions-with-hypothetical-inverses-e5153ab69753
======
ericbb
This approach is strongly reminiscent of abstract algebra; for example, the
Map-Reduce Transformation example seems to be(?) justified exactly when f is
an isomorphism over r.

I've been thinking along somewhat similar lines but I've been looking at it
from the perspective of introduction and elimination rules, as found in type
theory. If your compiler can find a transformation that brings an intro form
and an elim form together, then you can apply the computation rule to remove
both from the program. My motivation is to remove allocations that aren't
really essential. (I do have a functional language compiler that I work on but
it doesn't implement any optimizations yet).

Another interesting idea that seems related is from a Philip Wadler talk[1] I
saw which discusses how normalization can be used to achieve something like
your "hypothetical inverses", which should always be eliminated from the final
result.

> These transformations might only be useful in obvious optimization scenarios
> (cases that no programmer would intentionally create).

I would be surprised if that turned out to be the case.

[1] "Everything Old is New Again: Quoted Domain Specific Languages"
[https://www.youtube.com/watch?v=DlBwJ4rvz5c](https://www.youtube.com/watch?v=DlBwJ4rvz5c)

~~~
pschanely
Yes! I think perhaps that f needs only to be a homomorphism instead of an
isomorphism, but I don't trust myself enough to make such a claim definitively
:)

Always curious to follow people doing compiler work; loop me in, yeah?

And, just watched that Wadler talk; it's good stuff. Of course, it sounds like
his scenario is like: "there is a way to do the transformation, you just have
to find it," and in my case, it's more like: "there might or might not be an
algebra in which f acts as a homomorphism, and, even if there is, you may not
have the transformations you need to determine it!"

~~~
ericbb
That's funny. I was thinking isomorphism because you postulate an inverse,
which homomorphisms don't always have. But the identity doesn't depend on f
being invertible. You could just have a rule:

If f is a homomorphism with respect to r, then reduce(r, map(f, l)) =
f(reduce(r, l)).

The rule can be proven without reference to an inverse for f so isn't it
enough to just add such a rewrite rule to your optimizer? (Now I'm not sure
what the inversion trick accomplishes?).
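For anyone who wants to see the rule concretely, here's a quick sanity check in Python (incr and the sample list are my own, not from the article):

```python
def incr(x):
    return x + 1

# incr is a homomorphism with respect to max:
#   incr(max(a, b)) == max(incr(a), incr(b))
# so the rule reduce(max, map(incr, l)) == incr(reduce(max, l)) holds.
l = [3, 1, 4, 1, 5]
assert max(map(incr, l)) == incr(max(l))  # both sides are 6
```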

> Always curious to follow people doing compiler work; loop me in, yeah?

norstrulde.org and twitter.com/ericbb

> "there might or might not be an algebra in which f acts as a homomorphism,
> and, even if there is, you may not have the transformations you need to
> determine it!"

To my way of interpreting it, the algebra is determined by r, which is given.
You might not know whether f is a homomorphism with respect to r and it'd be
tricky to figure that out by inspecting the definitions for f and r but if you
have a table built in advance that characterizes functions in relation to each
other (incr is a homomorphism with respect to max, etc.), then you can drive
your rewrite engine using that table... So I wouldn't say that you want to use
transformations to determine if f is a homomorphism but rather that you want
to use the fact that f is a homomorphism to decide which transformations are
applicable. Does that make sense?

Edit: I guess, more generally, you could have a relation with the form
homomorphism(f:a->b, q:a->a->a, r:b->b->b) which justifies the transformation
reduce(r, map(f, l)) -> f(reduce(q, l)).
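A concrete instance of that generalized relation, sketched in Python (the len/concat/+ triple is my example, not from the thread):

```python
from functools import reduce
import operator

# Table entry: homomorphism(len, concat, +), i.e. len(xs + ys) == len(xs) + len(ys),
# which justifies the rewrite
#   reduce(+, map(len, l)) -> len(reduce(concat, l))
l = [[1, 2], [3], [4, 5, 6]]
lhs = reduce(operator.add, map(len, l))  # sum of the lengths
rhs = len(reduce(operator.add, l))       # length of the concatenation
assert lhs == rhs == 6
```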

------
js8
This is excellent, I was thinking about similar optimization using inverses:

If one calculates some reduction over a data structure, and the reduction is a
group operator (so it has an inverse, like addition), then when the data
structure is updated it may be cheaper to update the result through the
inverse than to recalculate the whole reduction.

For example, if we are calculating the sum of a list and we take an element
out, it's easier to just subtract that element from the sum than to calculate
the sum again.
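A minimal Python sketch of that incremental update (names are mine):

```python
# Because + is a group operator, its inverse (subtraction) lets us update a
# cached reduction in O(1) instead of re-reducing the whole list.
xs = [5, 2, 8, 3]
total = sum(xs)      # full reduction, done once

removed = xs.pop(1)  # take the 2 out of the list
total -= removed     # apply the inverse to update the cached result
assert total == sum(xs)
```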

Anyway, all in all, I think compiler knowledge of function inverses (and
perhaps also isomorphisms between types) will become very handy in the future,
and will enable a huge number of optimizations.

~~~
pschanely
Thank you. And, heck yes, more people need to be talking about how to deal
with computations that are largely repeated. That problem comes up all over
the place.

CouchDB had an interesting approach for this that did not require inverses:
they persist intermediate reduce results, so that you can re-use nodes in the
tree that haven't changed. So clever.
[http://horicky.blogspot.com/2008/10/couchdb-implementation.h...](http://horicky.blogspot.com/2008/10/couchdb-implementation.html)

~~~
btown
Database indices (SQL and NoSQL) are similar in that regard: they are
persisted so that a single insertion or update does not invalidate the entire
index.
Cache invalidation is a hard general problem, but easier when a closed system
(like a database) is the only thing using its own cache, and the interface is
much higher level.

------
_lm_
For the Map-Reduce transformation, you require f(r(a,b)) = r(f(a),f(b)). This
is equivalent to requiring f to be a homomorphism with respect to r, I think.

You could probably generalize that condition to situations where f can be
lifted over the output of r so you don't have to require that r's output type
is the same as its input.

~~~
pschanely
Unfortunately, I took barely enough grad school classes to be dangerous! It's
worth clarifying that the lifting of f through r can change r (to, say, r'),
in which case, yes, f is a homomorphism from an algebra with r' to an algebra
with r. (Hopefully I got that right?)

I am curious about the idea that the inputs and outputs of r may not need to
be of the same type, and want to investigate further (help is appreciated!).
Since r is given to reduce(), at least one of its inputs must match its
output, depending on how you define reduce(). Haskell's fold(l/r) permit the
non-accumulator side to differ in type from the output, but the output must
match the accumulator side. I think a similar transformation may apply to
folds, but I haven't thought about it enough.

------
panic
How does this compare to stream fusion?
[http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.7...](http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.7401)

~~~
pschanely
Ah, I knew that HN would tell me what I should be reading! I've only been able
to spend a few minutes reading this one, but it looks to me like the first
focus there is on eliminating intermediate containing structures, with some
interesting opportunities for optimization afterwards. Their stream definition
of fold(l/r) is still recursive, so I don't think it would be able to pull,
for example, the incr() outside the reduce in my first example of "reduce(max,
map(incr, l))". I think the approaches might be complementary, actually; you
might apply the higher-level transformations I describe first and use a stream
approach to implement the resulting chain of operations. Going to need to do a
deeper investigation later today, but, anything else I should be looking at?

------
pschanely
Referenced in the paper panic suggested below (
[https://news.ycombinator.com/item?id=11595273](https://news.ycombinator.com/item?id=11595273)
), and suggested to me directly by @mattcmd on twitter (
[https://twitter.com/mattmcd/status/726342652845809664](https://twitter.com/mattmcd/status/726342652845809664)
), there is something called the "fusion property of fold," which would
explain the transformations I'm discussing here as applied to folds (sided
reduces).
From the paper (
[http://www.cs.nott.ac.uk/~pszgmh/fold.pdf](http://www.cs.nott.ac.uk/~pszgmh/fold.pdf)
), (and some minimal editorial substitution from me) we see that:

h(fold(g, w, l)) = fold(f, h(w), l)

if

h(g(x, y)) = f(x, h(y))

This is exciting because (1) it suggests the equivalent transformation for
sided reduces (folds), and (2) fold is a more general operator than the
others. My somewhat silly use of "hypothetical inverses" effectively just
creates a goalpost: (1) constructively find the function f, given g (or
vice-versa), and (2) prove the necessary condition.
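A quick Python check of the fusion property (the foldr encoding and the incr/max instance are my choices, not from the paper):

```python
from functools import reduce

def foldr(g, w, l):
    # right fold: foldr(g, w, [a, b]) == g(a, g(b, w))
    return reduce(lambda acc, x: g(x, acc), reversed(l), w)

# Take h = incr and g = max; then f(x, z) = max(x + 1, z) satisfies the
# side condition h(g(x, y)) == f(x, h(y)), so fusion applies:
#   h(fold(g, w, l)) == fold(f, h(w), l)
h = lambda x: x + 1
g = max
f = lambda x, z: max(x + 1, z)

l = [3, 1, 4, 1, 5]
assert h(foldr(g, 0, l)) == foldr(f, h(0), l)
```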

Very exciting.

------
kazinator
How do we make compilers smart enough to know that “sum(reverse(x))” can be
rewritten to “sum(x)”?

That example is pretty easy: you apply a pattern match to the tree which looks
for the pattern (sum (reverse ?var)) and rewrites it to (sum ?var).

In Common Lisp, we would use define-compiler-macro on the sum function which
looks for the argument form being (reverse whatever), in which case the macro
returns the form (sum whatever).
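Roughly, in Python (a toy tuple-encoded tree, not real compiler IR):

```python
# Match the tree pattern (sum (reverse ?var)) and rewrite it to (sum ?var).
def rewrite(expr):
    if (isinstance(expr, tuple) and len(expr) == 2 and expr[0] == 'sum'
            and isinstance(expr[1], tuple) and len(expr[1]) == 2
            and expr[1][0] == 'reverse'):
        return ('sum', expr[1][1])
    return expr

assert rewrite(('sum', ('reverse', 'x'))) == ('sum', 'x')
assert rewrite(('sum', 'x')) == ('sum', 'x')  # no match: unchanged
```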

Not so fast, though; order of summing matters in floating-point! And over
integers too, in the presence of overflows (in junk languages without
bignums). That is to say, if we add together three or more integers, the
arithmetic result can be in range of the type, but some orders might overflow,
whereas others do not.

~~~
rntz
> And over integers too, in the presence of overflows (in junk languages
> without bignums)

I don't think this is true. Modular addition is commutative and associative.
As long as the behavior of overflow is modular (as it is in 2's complement),
order shouldn't matter. And if the behavior of overflow is to trap, then it
should trap regardless of the order you add the numbers in. I guess if you
care about the stack trace you get being the same even under optimizations,
then you could argue this isn't a safe optimization. And if the behavior of
overflow is undefined, it's undefined whether you reverse the list or not.

Maybe you're thinking of Javascript, which not only doesn't have bignums, but
uses floating point even for integers?

~~~
kazinator
> _And if the behavior of overflow is to trap, then it should trap regardless
> of the order you add the numbers in._

How come? Suppose A + B doesn't trap, and if T0 = A + B, then T0 + C doesn't
trap. Why should (A + B) + C trap?

~~~
prodigal_erik
Say you have a register whose range is -9..+9. (5-1)+5 doesn't trap but
(5+5)-1 does.
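A Python sketch of that tiny register (checked_add is my stand-in for trapping addition):

```python
# Trapping addition on a register with range -9..+9: the same three operands
# trap or not depending on association order.
def checked_add(a, b, lo=-9, hi=9):
    r = a + b
    if not lo <= r <= hi:
        raise OverflowError(f"{a} + {b} out of range")
    return r

checked_add(checked_add(5, -1), 5)      # (5 - 1) + 5 = 9: in range
try:
    checked_add(checked_add(5, 5), -1)  # 5 + 5 = 10 traps before the -1
except OverflowError:
    pass
```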

~~~
kazinator
I understand how to make an example of this; the question is why _should_
(5-1)+5 trap, just because (5+5)-1 does.

------
sivvy
The premise of the optimizations in this article doesn't always hold,
unfortunately. If x is floating point, sum(reverse(x)) is unlikely to equal
sum(x) for long enough x, and sorting x beforehand will make this effect even
more apparent.
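A short example you can check in any Python REPL:

```python
# Floating-point addition is not associative, so summation order matters.
xs = [0.1, 0.2, 0.3]
assert sum(xs) != sum(reversed(xs))  # 0.6000000000000001 vs 0.6
```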

------
quickie69
Beat me to it wrt floating point! It's not a field.

------
brudgers
Interesting article. Not sure if it meets the "Show HN" guidelines, though --
there is nothing for users to try out or play with. If it is still within the
window, maybe change the title. If not, maybe email the moderators.

~~~
pschanely
Thanks! I have a link to code written in Maude at the very bottom; the code
needs so much explanation, though. Perhaps I'll highlight the code link at the
top of the article?

~~~
brudgers
Or post a clickable link here in the thread? Explaining the code might be an
opportunity to create a rough draft of something more polished. A bit of
background on Maude might be nice too. Save readers from Googling and maybe
help people understand why it was used. All things that tend to be
"intellectually interesting."

Anyway, I was a bit philosophically torn, since the content tends toward the
mathematical/algorithmic and made me wonder: what does it mean to play around
with mathematics/algorithms?

~~~
pschanely
Totally; great point. My code is here:
[https://github.com/pschanely/wf-optimizer/blob/master/demo.m...](https://github.com/pschanely/wf-optimizer/blob/master/demo.maude)

Maude is a programming language based on rewriting:
[http://maude.cs.illinois.edu/w/index.php](http://maude.cs.illinois.edu/w/index.php)

It's pretty wild; your code doesn't "execute" in the traditional sense;
instead, patterns match the code and transform it repeatedly until it reaches
a "normal form": one that cannot be rewritten further.

Maude is handy for playing with optimizers, because you very often want to
specify behavior in terms of patterns (when you see these functions together,
replace them with this...).
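A minimal sketch of that rewrite-to-fixpoint loop in Python (my toy illustration; nowhere near what Maude actually does):

```python
# Apply rewrite rules until no rule changes the term: the result is the
# "normal form".
def normalize(term, rules):
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(term)
            if new != term:
                term, changed = new, True
    return term

# One toy rule on tuple-encoded terms: sum(reverse(x)) -> sum(x).
def drop_reverse(t):
    if (isinstance(t, tuple) and t[0] == 'sum'
            and isinstance(t[1], tuple) and t[1][0] == 'reverse'):
        return ('sum', t[1][1])
    return t

term = ('sum', ('reverse', ('reverse', 'x')))
assert normalize(term, [drop_reverse]) == ('sum', 'x')
```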

~~~
gmfawcett
I hadn't heard of Maude before, thanks! For those interested in this kind of
thing, another well-known term-rewriting language is Pure
([http://purelang.bitbucket.org/](http://purelang.bitbucket.org/)), which grew
out of an earlier project called Q
([http://q-lang.sourceforge.net/](http://q-lang.sourceforge.net/)).

