It's a much broader overview of the field so read that first if you need an intro!
A lot of the example programs mentioned feel more like functions (stateless + deterministic mapping of a tuple of inputs to a single output).
In contrast, I think of a 'program' as having a longish lifecycle comprising a stream of inputs and outputs.
Relatedly, I'm not clear how well the approaches detailed here actually would scale with complexity - is this a meaningful building block in building more complex applications? Or is some higher-order framework needed that 'knows' how to apply this in a rich context?
Most of the recent work in this field seems to be based around the idea of program "sketching", i.e. the programmer sketches a high-level specification, and the algorithms progressively synthesise and evaluate small functions / programs to efficiently implement the sketch.
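To make the sketching idea concrete, here's a minimal, hypothetical illustration (every name below is invented, and real sketching systems such as SKETCH use constraint solvers rather than brute-force enumeration): the programmer writes the shape of the solution with "holes", and the synthesiser searches for hole values that make the sketch agree with the specification on a set of tests.

```python
# Toy sketch-based synthesis: fill the holes c1, c2 in a programmer-written
# template so it matches the specification on all test cases.
# (Names and the brute-force search are illustrative, not a real system.)
from itertools import product

def spec(x, y):
    """What the program should do -- the programmer's specification."""
    return max(x, y)

def sketch(x, y, c1, c2):
    """High-level shape of the solution; c1 and c2 are holes to fill."""
    return x if (x - y) * c1 > c2 else y

def synthesise(tests, hole_space):
    """Enumerate hole values until the sketch passes every test."""
    for c1, c2 in product(hole_space, repeat=2):
        if all(sketch(x, y, c1, c2) == spec(x, y) for x, y in tests):
            return c1, c2
    return None

tests = [(0, 1), (3, -2), (5, 5), (-4, -7)]
holes = synthesise(tests, range(-3, 4))
print("holes:", holes)
```

The point is the division of labour: the human supplies the control structure, and the machine only has to search the (much smaller) space of hole values.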
No doubt machine learning researchers will have something to say about this though...
A big hall with computers, churning through calculations to end up with reusable modules/components for some prescribed purpose. These are combined and eventually larger systems emerge.
The inputs to the calculations were to be descriptions of what the program should do, no details on the "how". For instance, "make an input box for name and snailmail address, with cancel and ok buttons".
It was obviously warp engine level stuff when it comes to feasibility and I thought it would be very far away, like 100 years away... But maybe it is closer than that.
Tie generative methods in with solvers for hydrodynamics, motions, structures, stability, etc, and one can conceive of automating the ship hull design spiral, or sections of it anyway. That's not to say that some of this is not already out there in commercial software.
I’m not sure how the use of an SMT solver could be considered a genetic programming method.
However, they're using an SMT solver as part of the fitness function of the program.
You know how it goes: the program needs to produce the correct output, but you need to prevent overfitting; and if you have a formal proof of the generated program's correctness, you can expend greater effort on reducing its complexity and optimising it.
Again, it's an old new thing and possibly going to start getting hyped like neural networks have been of late.
There are plenty of program synthesis techniques (maybe not this one) that do not use fitness functions and get the program straight from a solver. If they have anything written up, I’m sure the related work section would make the context more clear.
As for the accusation of bandwagoning, it seems to come straight from the community that has been doing synthesis for years, not the one that does GP.
Ah, I read the first link in the comments first, which really does imply they're using GP methods.
Here's mine... (excuse my lack of formatting.)
The synthesis step of stochastic superoptimisation finds the next candidate program P’ by drawing an MCMC sample based on the previous candidate program P. It proposes P’ by randomly applying one of a few mutations to P:
* changing the opcode of a randomly selected instruction
* changing a random operand of a randomly selected instruction
* inserting a new random instruction
* swapping two randomly selected instructions
* deleting an existing randomly selected instruction
The MCMC sampler uses the cost function, which measures how “close” P’ is to the target program and how fast P’ is, to decide whether to accept the candidate P’. A candidate is more likely to be accepted if it is close to the target or very fast, but even programs that are slow or distant from the target have some probability of being accepted, ensuring we explore novel programs.
let candidate be some random program
while candidate is not correct (per your SMT solver):
    new_candidate = mutate(candidate)
    let cost be the cost of new_candidate
        (varies by technique; some just measure the number of
        bits that differ from the expected output on test cases)
    if cost is improved:
        candidate = new_candidate
    else, with some probability (the lower the cost, the higher the prob.):
        candidate = new_candidate
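For what it's worth, here's a toy, runnable version of that loop in Python. Everything in it is invented for illustration: a made-up 8-bit accumulator instruction set, test cases standing in for the SMT solver's correctness check, and a cost function that counts differing output bits plus a small length penalty as a crude speed proxy.

```python
# Toy MCMC-style stochastic superoptimisation over a tiny invented
# instruction set. Not STOKE -- just a sketch of the same search loop.
import math
import random

OPS = {
    "add": lambda a, b: (a + b) & 0xFF,
    "sub": lambda a, b: (a - b) & 0xFF,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
}

def run(prog, x):
    """Execute a straight-line program; each instruction is (op, constant)."""
    acc = x
    for op, k in prog:
        acc = OPS[op](acc, k)
    return acc

def mutate(prog):
    """Apply one of the five mutations listed above, chosen at random."""
    prog = list(prog)
    if not prog:  # nothing to modify yet: insert a random instruction
        return [(random.choice(list(OPS)), random.randrange(256))]
    move = random.choice(["opcode", "operand", "insert", "swap", "delete"])
    i = random.randrange(len(prog))
    if move == "opcode":
        prog[i] = (random.choice(list(OPS)), prog[i][1])
    elif move == "operand":
        prog[i] = (prog[i][0], random.randrange(256))
    elif move == "insert":
        prog.insert(i, (random.choice(list(OPS)), random.randrange(256)))
    elif move == "swap":
        j = random.randrange(len(prog))
        prog[i], prog[j] = prog[j], prog[i]
    else:  # delete
        del prog[i]
    return prog

def cost(prog, tests):
    """Output bits that differ from expected, plus a length penalty."""
    wrong_bits = sum(bin(run(prog, x) ^ y).count("1") for x, y in tests)
    return wrong_bits + 0.1 * len(prog)

def search(tests, steps=5000, beta=0.5):
    """Metropolis sampling: always accept improvements; sometimes accept
    worse candidates, with probability exp(-beta * cost_increase)."""
    candidate = [(random.choice(list(OPS)), random.randrange(256))]
    c = cost(candidate, tests)
    best, best_cost = candidate, c
    for _ in range(steps):
        new = mutate(candidate)
        c_new = cost(new, tests)
        if c_new < best_cost:
            best, best_cost = new, c_new
        if c_new <= c or random.random() < math.exp(-beta * (c_new - c)):
            candidate, c = new, c_new
    return best

# Target: bitwise NOT, specified only through test cases.
tests = [(x, x ^ 0xFF) for x in range(0, 256, 17)]
random.seed(0)
best = search(tests)
print("best program:", best, "cost:", cost(best, tests))
```

The acceptance rule is the interesting part: because even cost-increasing mutations are sometimes accepted, the search can escape local optima instead of stalling on a nearly-correct program.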
It's just annoying that the implication is that the 'reinventors' haven't done their research. It's almost as bad as my original feeling of "that's a new old thing", and assuming malicious intent/plagiarism.
What does "close" mean here? I don't see that explained.
If you're trying to match outputs, then this is just old-fashioned GP with a minor twist - i.e. including speed in the fitness function, which has the potential to find some novel local maxima, which produce outputs that are close to the target AND very fast.
If you're trying to match instruction sequences - then I don't see the point at all.
GP often fails because it runs out of steam before producing a definitively correct solution.
It's easy to design cost/fitness functions that get close but not close enough, and slightly harder to design functions that solve a non-trivial problem some of the time.
It's incredibly hard to design functions that find an answer reliably without getting lost in the problem space.