
Partial Evaluation and Immutable Servers - mzl
http://blog.regehr.org/archives/1197
======
contingencies
There have been many discussions in this area recently. Certainly, the
worrisome trend of using containers as 'lightweight VMs' (with nearly all the
same attack surfaces as regular VMs) suggests the current container craze is
unlikely to change things. Your analysis is correct... there's useless fat at
many layers. The question is: how do we get rid of it?

You discuss changing the entire design process and electing to use compilers
or languages that can shave off disused code paths automatically. The problem
with this idea is that declaring what is disused is difficult in a
conventionally hacked-together-from-components unix-style system. The option
to hack stuff together is precisely what makes it a lovely platform for
getting things done, and it's hard to retrain most developers/admins onto a
greenfield platform (for an example of this approach see, e.g., the Erlang or
Clojure communities).

I have taken a different yet similar approach in my thinking[1], where one
assumption is that behavioural profiling of a codebase from a very high level
(e.g. OS, network) should be generally applicable to any codebase, should
sidestep the need to retrain developers, and should still whittle away much of
this fat (though far from all of it). Things like grsec implement learning
modes that allow this to be done at the system-call level; network profiling
is easy, as is filesystem monitoring; etc.
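
To make this concrete: grsec's learning mode works in-kernel, but the same
"record the syscall profile, then enforce it" idea can be approximated in
userspace with libseccomp. A minimal sketch, assuming a hypothetical profile
recorded for a trivial network service:

    /* Build with: gcc -O2 profile.c -lseccomp */
    #include <seccomp.h>
    #include <stdlib.h>

    static void apply_learned_profile(void)
    {
        /* Default action: kill the process on any unprofiled syscall. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
        if (ctx == NULL)
            exit(1);

        /* Syscalls observed during the learning phase (hypothetical list). */
        int allowed[] = {
            SCMP_SYS(read),  SCMP_SYS(write), SCMP_SYS(accept),
            SCMP_SYS(close), SCMP_SYS(exit_group),
        };
        for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
            if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, allowed[i], 0) < 0)
                exit(1);

        if (seccomp_load(ctx) < 0)  /* install the filter in the kernel */
            exit(1);
        seccomp_release(ctx);       /* filter stays active after release */
    }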

When you step back and think about this junk, the sad fact is that the NSA
publicly released a kernel-level implementation of this stuff nearly 20 years
ago... and has likely been doing it internally for even longer. The academic
community tries to provide mechanisms for provable correctness, but the
world's time-sapped developers just chug along the same old tired path.

What we _really_ need, I feel, is automated help that doesn't require a
difficult-to-achieve cognitive paradigm shift from current-generation
programmers.

[1] [http://stani.sh/walter/pfcts/original/](http://stani.sh/walter/pfcts/original/)

------
crypt1d
I find it a bit amusing that we used to build systems like this in the first
place. There was no unused code because there was no _shared code_, no
libraries, etc. You only used what you needed. Then we worked our way up to
the 'common code' idea and generated an enormous number of libraries that
systems come with by default and that make the programmer's life much easier.
But now we realize that this is not very efficient after all, so we want to be
able to write a program and then strip it of all the unnecessary code that we
introduced (historically) in the first place. What is the lesson learned here?
Is it that common code is not always a good idea, or is it that we need to
write it in a way that is more _modular_ and easier to take apart, so we can
remove the blocks we don't need? Would such a change even be possible now?

~~~
csl
Well, the idea here is that with libraries in pure source form, a partial
evaluator _could_ specialize and shrink down the program automatically, as you
describe. So you'd get the best of both worlds.
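
To make that concrete, here's a hand-written illustration of the kind of
residual program a partial evaluator would emit (the functions are made up for
the example):

    /* Generic library code: handles any base. */
    unsigned digits(unsigned n, unsigned base)
    {
        unsigned d = 1;
        while (n >= base) { n /= base; d++; }
        return d;
    }

    /* If the whole program only ever calls digits(n, 10), a partial
     * evaluator can residualize a specialized version and drop the
     * general one, along with everything only it depended on: */
    unsigned digits_base10(unsigned n)
    {
        unsigned d = 1;
        while (n >= 10) { n /= 10; d++; }
        return d;
    }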

I'm not aware of any systems that actually do this on a global level, but I
have a feeling I may be surprised if I start looking around.

~~~
justincormack
Well, link-time optimization does this for C code in principle. It is fairly
new in gcc and clang, so it is not yet clear what it buys you in real-world
cases. A lot of code can't be compiled out because it is hard to prove that it
cannot be called. I plan to experiment with LTO and with compiling a kernel
and an application together at some point to see how it goes.
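
A minimal version of that experiment might look like this (file names
hypothetical):

    /* lib.c -- one used and one unused entry point */
    int used(int x)   { return x + 1; }
    int unused(int x) { return x * 2; }  /* never called anywhere */

    /* main.c */
    int used(int x);
    int main(void) { return used(41); }

    /* Built separately, unused() survives in the final binary. With LTO
     * the linker sees the whole program and can discard it:
     *
     *   gcc -O2 -flto -c lib.c main.c
     *   gcc -O2 -flto lib.o main.o -o app
     *
     * But as soon as lib.c takes unused()'s address, or is built into a
     * shared library with a public symbol, "hard to prove it cannot be
     * called" kicks in and the code has to stay. */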

------
fulafel
Lots of unexplored potential in this area.

Also think about optimizing memory access by applying this to data structures:
if you can prove there are at most 42000 foo objects, the compiler could
reduce 64-bit foo pointers to 16-bit indices. It could also reorder object
layout so that fields are in access order, etc. With everything being limited
by cache misses, reducing your cache footprint and miss frequency like this
can provide big speedups (and shrink your VM memory requirements as well).
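
Done by hand, the pointer-to-index transformation looks something like this
(the struct is a made-up example; the hard part the compiler would own is
proving the bound):

    #include <stdint.h>

    #define MAX_FOO 42000            /* proven (or asserted) upper bound */

    struct foo {
        uint16_t next;               /* was: struct foo *next   (8 bytes) */
        uint16_t parent;             /* was: struct foo *parent (8 bytes) */
        int32_t  value;
    };                               /* 8 bytes per object instead of 24 */

    static struct foo pool[MAX_FOO]; /* indices are offsets into the pool */

    static inline struct foo *foo_at(uint16_t idx) { return &pool[idx]; }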

Managed languages with the ability to deopt when assumptions are violated can
do optimizations like these more optimistically, based on profile feedback. In
fact, HotSpot already does this in a more limited way (compressed oops).

~~~
userbinator
_if you can prove there are at most 42000 foo objects, the compiler could
reduce 64-bit foo pointers to 16-bit indices_

I think the main problem with doing this is that it requires a level of
"intelligence" and insight that compilers are currently nowhere near; these
types of optimisations are easy to see from the point of view of the (human)
programmer, but due to their generality, compilers simply can't seem to
"think" in this way. It's not quite at the level of "find an implicit O(n^2)
algorithm in the code and replace it with an O(n log n) one", but nor is it as
simple as the common pattern-matching type of optimisation.

The other aspect is that size optimisation has traditionally been downplayed
in academic research, despite the techniques being well known in e.g. the
demoscene and some embedded-systems development. Overlapping instructions, for
example, is one optimisation that I don't think compilers will be able to do
any time soon. This and other types of low-level optimisation are often
applied by (human) asm programmers, however.

~~~
fulafel
Like I said, HotSpot already does a version of this. Doing the full version
with deopt wouldn't be appreciably harder analysis-wise:
[https://wikis.oracle.com/display/HotSpotInternals/Compressed...](https://wikis.oracle.com/display/HotSpotInternals/CompressedOops)

For static compilation, proving <42000 by reasoning from the code is probably
too hard in general, but there's lots of room for cheating. For example,
profile feedback generates the observation that the count stays below 42000,
and you put in an assertion to that effect.
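
Sketched by hand, the cheat amounts to a guard around the specialized path
with the original code kept as a fallback (a poor man's deopt; all names are
hypothetical):

    #include <stddef.h>

    #define FOO_BOUND 42000          /* bound observed via profile feedback */

    static size_t live_foos;         /* maintained by the foo allocator */

    static void run_narrow(void)  { /* specialized: 16-bit foo indices */ }
    static void run_general(void) { /* original: full 64-bit pointers  */ }

    void run(void)
    {
        if (live_foos < FOO_BOUND)
            run_narrow();            /* assumption holds: take fast path */
        else
            run_general();           /* observation violated: fall back  */
    }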

There's some attention deficit in academia, but there do exist papers about
this stuff, e.g.
[https://dl.acm.org/citation.cfm?id=1739033](https://dl.acm.org/citation.cfm?id=1739033)
and its refs
[http://scholar.google.com/scholar?q=related:g11WX-h6J6cJ:sch...](http://scholar.google.com/scholar?q=related:g11WX-h6J6cJ:scholar.google.com)

It's just not making its way into real-world production compilers/VMs. C/C++
don't really allow most of these optimizations; that's one problem.

------
agumonkey
I posted this on Reddit, but it's worth a little spam:

See [http://c2.com/cgi/wiki?SynthesisOs](http://c2.com/cgi/wiki?SynthesisOs)
for a kernel featuring JIT compilation and partial evaluation to specialize
threads of code on the fly. Old but still exciting.

