
Building a Program Synthesizer - sidereal
https://homes.cs.washington.edu/~bornholt/post/building-synthesizer.html
======
arnioxux
His previous post was probably the best overview on program synthesis
paradigms (circa 2015):

[https://homes.cs.washington.edu/~bornholt/post/synthesis-exp...](https://homes.cs.washington.edu/~bornholt/post/synthesis-explained.html)

It's a much broader overview of the field so read that first if you need an
intro!

~~~
apskim
And if you want a deeper dive after that, here's a 100-page survey of all the
techniques, principles, and applications of program synthesis circa 2017:

[https://www.microsoft.com/en-us/research/publication/program...](https://www.microsoft.com/en-us/research/publication/program-synthesis/)

------
jbattle
With these techniques, how complex a program can practically be created in
2018?

A lot of the example programs mentioned feel more like functions (stateless +
deterministic mapping of a tuple of inputs to a single output).

In contrast, I think of a 'program' as having a longish lifecycle comprising a
stream of inputs and outputs.

Relatedly, I'm not clear how well the approaches detailed here actually would
scale with complexity - is this a meaningful building block in building more
complex applications? Or is some higher-order framework needed that 'knows'
how to apply this in a rich context?

~~~
saurabh20n
10-50 lines of code. You have to pick the right domain. Sumit is doing a lot
of work [1] on integrating these ideas into Microsoft's products, e.g.,
Excel's flash fill. Ras [2] is synthesizing biology models and architecture
design. Sanjit, Armando, Emina, and Rishabh are all doing excellent work
getting these techniques to scale.

[1] [https://www.microsoft.com/en-us/research/video/flash-fill-fl...](https://www.microsoft.com/en-us/research/video/flash-fill-flashextract/)

[2] [https://www.frontiersin.org/articles/10.3389/fbioe.2014.0007...](https://www.frontiersin.org/articles/10.3389/fbioe.2014.00075/full)

------
jnurmine
Once upon a time I thought about this and came to the conclusion that in the
future software will not be written, it will be grown.

A big hall with computers, churning on calculations to end up with reusable
modules/components for some predescribed purpose. These are combined and
eventually larger systems emerge.

The inputs to the calculations were to be descriptions of what the program
should do, no details on the "how". For instance, "make an input box for name
and snailmail address, with cancel and ok buttons".

It was obviously warp engine level stuff when it comes to feasibility and I
thought it would be very far away, like 100 years away... But maybe it is
closer than that.

~~~
iancmceachern
This is happening now for mechanical design. They call it generative design.
You give inputs (volume/shape, loading cases, other parameters) and then the
software iteratively solves for the optimum geometry to meet the goals.

~~~
tobmlt
And there is also "form parameter design" for ship hull geometry. It is also
generative, but it more typically makes use of deterministic (nonlinear
optimization) solvers to generate the geometry itself. (Implementations vary;
I show my bias, or just my inability to shut up, here, as I am trying to
finish a PhD in this as we speak.) The main similarity I'm going for in this
comment is that the user specifies design constraints and the program
generates ship hull shapes that meet those requirements.

Tie generative methods in with solvers for hydrodynamics, motions, structures,
stability, etc, and one can conceive of automating the ship hull design
spiral, or sections of it anyway. That's not to say that some of this is not
already out there in commercial software.

------
tobmlt
In case anybody is reading this post and wants an idea of how to get started
with the basics of logic programming (upon which one might build toy
constraint solvers) I recommend this:

[https://codon.com/hello-declarative-world#programming-paradi...](https://codon.com/hello-declarative-world#programming-paradigms)
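This isn't from the linked post, but here's a minimal sketch of the declarative flavour it teaches: you state the constraints and let a solver (here, a naive generate-and-test one, with all names illustrative) find the values.

```python
from itertools import product

def solve(variables, domains, constraints):
    """Tiny generate-and-test constraint solver: enumerate every
    assignment and yield those satisfying all constraints."""
    for values in product(*(domains[v] for v in variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            yield env

# Declarative spec: x + y == 5 and x < y, over small integer domains.
solutions = list(solve(
    ["x", "y"],
    {"x": range(5), "y": range(5)},
    [lambda e: e["x"] + e["y"] == 5,
     lambda e: e["x"] < e["y"]],
))
# -> [{'x': 1, 'y': 4}, {'x': 2, 'y': 3}]
```

Real logic-programming systems (Prolog, miniKanren) replace the brute-force enumeration with unification and backtracking, but the interface — describe the "what", not the "how" — is the same.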

------
opless
Of course, being a new old thing, it's not complete without referencing it by
its actual name ... Genetic Programming.

[http://www.genetic-programming.com/johnkoza.html](http://www.genetic-programming.com/johnkoza.html)

~~~
seanmcdirmid
Well, genetic programming is a subset of program synthesis, not the same
thing.

I’m not sure how the use of an SMT solver could be considered a genetic
programming method.

~~~
opless
Ah, I read the first link in the comments first, which really does imply
they're using GP methods.

[https://homes.cs.washington.edu/~bornholt/post/synthesis-exp...](https://homes.cs.washington.edu/~bornholt/post/synthesis-explained.html)

However, they're using an SMT solver as part of the fitness function of the
program.

You know - it needs to produce the correct output, but you need to prevent
overfitting, and if you have a formal proof of the generated program you can
expend greater effort on reducing its complexity and optimising it.

Again, it's a new old thing and possibly going to start getting hyped like
neural networks have been of late.

~~~
ychen306

        Ah, I read the first link in the comments first, which really does imply they're using GP methods.
    

No, it doesn't.

~~~
opless
And the evidence to back up your assertion is where?

Here's mine... (excuse my lack of formatting.)

The synthesis step of stochastic superoptimisation finds the next candidate
program P’ by drawing an MCMC sample based on the previous candidate program
P. It proposes P’ by randomly applying one of a few mutations to P:

* changing the opcode of a randomly selected instruction

* changing a random operand of a randomly selected instruction

* inserting a new random instruction

* swapping two randomly selected instructions

* deleting an existing randomly selected instruction

The MCMC sampler uses the cost function, which measures how “close” to the
target program P’ is and how fast P’ is, to decide whether to accept the
candidate P’. A candidate is more likely to be accepted if it is close to the
target or very fast. But even programs that are slow or distant from the
target have some probability of being accepted, ensuring we explore novel
programs.
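The mutation list above can be sketched in a few lines of Python over a toy instruction representation. The opcodes, registers, and tuple encoding here are hypothetical stand-ins, not the actual data structures of a real superoptimizer like STOKE:

```python
import random

# Toy instruction set for illustration only.
OPCODES = ["add", "sub", "mul", "xor"]
REGS = ["r0", "r1", "r2"]

def random_insn():
    """A random (opcode, dst, src) triple."""
    return (random.choice(OPCODES), random.choice(REGS), random.choice(REGS))

def propose(program):
    """Propose P' by applying one random mutation to P."""
    p = list(program)
    move = random.choice(["opcode", "operand", "insert", "swap", "delete"])
    if move == "insert" or not p:
        p.insert(random.randrange(len(p) + 1), random_insn())
    elif move == "opcode":                      # change a random opcode
        i = random.randrange(len(p))
        op, a, b = p[i]
        p[i] = (random.choice(OPCODES), a, b)
    elif move == "operand":                     # change a random operand
        i = random.randrange(len(p))
        op, a, b = p[i]
        p[i] = (op, random.choice(REGS), b)
    elif move == "swap" and len(p) >= 2:        # swap two instructions
        i, j = random.sample(range(len(p)), 2)
        p[i], p[j] = p[j], p[i]
    elif move == "delete":                      # delete an instruction
        p.pop(random.randrange(len(p)))
    return p
```

Each call changes the program's length by at most one instruction, so the sampler explores the space of programs one small move at a time.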

~~~
ychen306
There's nothing "genetic" about how MCMC works. AFAIK, usually with GP, you
keep multiple candidates that you evolve together, which is not the case here
(it's much closer to metaheuristics such as Simulated Annealing), e.g.:

    
    
      let candidate be some random program
      while candidate not correct (per your SMT solver):
          new_candidate = mutate candidate
          let cost be cost of new_candidate
             (varies by different techniques;
              some just measure the number of bits differ
              from expected output of testcases)
          if cost is improved:
             candidate = new_candidate
          else, with some probability (the lower the cost, the higher the prob.):
             candidate = new_candidate
    

Of course, there's nothing preventing you from using GP instead of MCMC to
find programs. The focus of research is on how to mutate (e.g. how do you
avoid generating bad candidates based on past experience) and how to evaluate
the cost of a candidate, rather than on the metaheuristic itself.
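The pseudocode above translates directly into runnable Python. Here's a toy instantiation where the "program" is a list of +1/-1 instructions, the cost is distance from a target sum, and the correctness check is just `cost == 0` — a stand-in for the SMT solver, not anyone's real implementation:

```python
import math
import random

def mcmc_search(target, length=8, steps=20000, beta=2.0, seed=0):
    """Metropolis-style search for a list of +1/-1 'instructions'
    whose sum equals `target` (cost = |sum - target|)."""
    rng = random.Random(seed)
    cost = lambda prog: abs(sum(prog) - target)
    candidate = [rng.choice([-1, 1]) for _ in range(length)]
    for _ in range(steps):
        if cost(candidate) == 0:          # "correct" per the checker
            return candidate
        new = list(candidate)
        new[rng.randrange(length)] *= -1  # mutate one instruction
        delta = cost(new) - cost(candidate)
        # Always accept improvements; accept worse candidates with
        # probability exp(-beta * delta), so search can escape plateaus.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            candidate = new
    return candidate
```

As noted, a single candidate is mutated and probabilistically accepted — no population, no crossover — which is what makes this Metropolis/simulated-annealing-style search rather than GP.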

~~~
voidmain
That looks to me almost exactly like a (1,1) evolution strategy [1]. Our field
frequently has invented almost the same ideas multiple times in different
subfields, with different mathematical analogies and different terminology.
It's useful to identify these connections, and it helps not to be
automatically dismissive of people who seem to be using the "wrong"
terminology.

[1]
[https://en.wikipedia.org/wiki/Evolution_strategy](https://en.wikipedia.org/wiki/Evolution_strategy)

~~~
opless
Perhaps I should not have been so hasty.

It's just annoying that the implication is that the 'reinventors' haven't done
their research. It's almost as bad as my original feeling of "that's a new
old thing", and assuming malicious intent/plagiarism.

