> Rather, it's Inductive Programming and more specifically Inductive Logic Programming (ILP), which is learning programs from examples, i.e. "incomplete specifications". I'm not familiar with the General Program Synthesis Benchmark Suite, but the problem you list (testing whether three strings are ordered by length) is trivial for ILP approaches.
The General Program Synthesis Benchmark Suite works from input-output examples, not “complete, formal specifications”.
How would you tackle this with ILP?
> However, I have to say that even so, if something is a difficult problem for program synthesis approaches, then it's very unlikely that neural networks will do any better at it. For instance, do you know how well deep neural nets perform on this benchmark?
I'm not aware of any serious at-scale attempts. Your best option is basically to try few-shot prompting with GPT-3.
OTOH, learning these trivial programs from 100 examples is a largely artificial framing, used to support a field which hadn't worked its way up to meaningful problems; in the more general sense, large networks are promising, e.g. the GitHub-trained GPT:
> These approaches are still state of the art for their respective tasks and there is no other approach that has been shown to do any better, including deep neural networks. In what sense are they "no longer more than briefly and tangentially relevant" as you say?
“if you're interested in learning AI”
These techniques were invented within the field of AI, but that does not mean they remain in the field of AI.
> You clearly have a strong opinion on GOFAI and the AI winter of the '80s, but what knowledge does this opinion come from? Can you say?
I can argue why ML approaches are good and promising and point at that. I can argue why ML approaches make conceptual sense whereas GOFAI does not, though I don't see us resolving that short-term so I'd rather not. But what I can't so easily do is point to the non-existence of GOFAI AI successes. It's just not there.
You do have tools like Watson and WolframAlpha, which use GOFAI techniques for fact search over large, human-built knowledge repositories (trivia questions and math tools, respectively), but Watson is mostly considered a stunt, and I'm not aware of anyone calling WolframAlpha AI.
> the ability for reasoning (despite big claims to the contrary)
The nebulousness of the term ‘reasoning’ is pulling a lot of weight here. It's clearly doing sophisticated computations of some sort, beyond brute memorization.
There are more examples too; this just addresses the one point people get wrong most often. BPEs are an interim performance hack, not an indictment of the approach in general.
> or generation of novel programs
Is clearly false.
> For instance, the append() example you show above is clearly memorised: you haven't given the model any examples of append(), so it can't possibly learn its definition from examples.
This is true, but it's mostly just an artifact of my having to prompt it through FitnessAI. Unlike with smaller models, few-shot learning works; it just takes more prompt space than I have available.
See the GitHub-trained example for something that integrates with more arbitrary code. There are many other examples, like the database prompt below (all bold is human input), or see some of the examples I linked above.
Q: “If z(str) = str + " " + str + " z" (for example, z("dumbell") = "dumbell dumbell z"), and g(str) = "k " + str + " j" then what is g("run")?”
A: “g("run") = "k run j"”
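For reference, here are the two definitions from the prompt transcribed literally into Python (the prompt's notation is already nearly Python). This only confirms the expected answer; it says nothing about how GPT-3 arrives at it. Note that `z` is a distractor: `g` does not use it.

```python
def z(s: str) -> str:
    # z(str) = str + " " + str + " z"
    return s + " " + s + " z"


def g(s: str) -> str:
    # g(str) = "k " + str + " j"
    return "k " + s + " j"


print(z("dumbell"))  # dumbell dumbell z
print(g("run"))      # k run j
```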
(The inverse problem doesn't work so well, giving “g(str) = "k run j"” for one example (valid but vapid) and “g(str) = "k str j"” for two (close but no banana), and confusion for more complex prompts, though I suspect the format is partially to blame. I can list other failure cases. But my point isn't that GPT-3 is reliable here; it's a language model.)
> btw, why do you need to give it the list "a"? What happens if this is omitted from the prompt?
That example came from my trying to emulate one I saw on Twitter and have since lost, which was similar but multi-step: at each step GPT-3 returned all three lists, modified or queried per the given commands.
Omitting `a`, I get
Q: “b = ["lifting", "curls", "squats"], c = ["running", "jogging"], so what is b after b.append("pushups")?”
A: “lifting,curls,squats,pushups”
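Executed as real Python, the same statements give the result below. GPT-3's answer has the right contents, just rendered as a comma-joined string rather than as a list literal.

```python
b = ["lifting", "curls", "squats"]
c = ["running", "jogging"]

b.append("pushups")  # mutates b in place and returns None
print(b)             # ['lifting', 'curls', 'squats', 'pushups']
print(",".join(b))   # lifting,curls,squats,pushups
```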
I had to change the prompt a bit because initially the result was truncated (FitnessAI is not made for this), or said “b.append("pushups") will add the string "pushups" to the end of b.”, which is correct but not what I wanted.
Few-shot would fix formatting inconsistencies; right now the model is just guessing.
In the interest of pruning this conversation a bit I will not continue the
discussion about GPT-3. Apologies, but this thread is growing too fast and I
don't have the time to give your comments the attention they deserve. I am
happy for you to have the last word in that matter.
>> These techniques were invented within the field of AI, but that does not mean
they remain in the field of AI.
Like I say above, it is pretty uncontroversial that these approaches are part
of the field of AI research. You can consult Wikipedia or e.g. the calls for
papers from major AI conferences, AAAI and IJCAI, if in doubt.
So I have to ask again, why do you say these approaches are not in the
field of AI research? According to whom? And based on what?
I would please like an answer to the above question.
Further, I can certainly point you to successes of symbolic AI, which you say
don't exist. For one thing, there are the entire fields of automated theorem
proving, planning, search, game playing, knowledge representation and
reasoning, etc., which you say are "not AI" but which, as I say, are still
active and still state of the art in their respective tasks. These are
certainly successful: they have produced systems and techniques that still
work better than any alternative, and actually work quite well.
For examples of specific systems that were successful in their time, see Logic
Theorist [1], which proved 38 of the first 52 theorems in Principia Mathematica;
Chinook [2], the first computer program to win a world championship against
humans (in checkers/draughts); Deep Blue [3], the first AI system to defeat a
reigning world chess champion (Garry Kasparov) in a match; MYCIN [4], the
first AI system to outperform human experts in disease diagnosis
(specifically, diagnosis of infections); and so on.
Of course these systems have been superseded - but they were successes
nonetheless. Another reason to learn the history of AI is to become aware of
those systems- they, indeed, were "there".
Again I have to ask you- where does your knowledge of AI come from? When
you make such strong statements about what works and what doesn't, what failed
and what succeeded, are you sure you are well informed? Do you draw your
knowledge from primary sources, or are you trusting the opinions of others who
claim to be experts- but may not be (like in the article above)?
>> How would you tackle this with ILP?
Below I've defined the problem in the format expected by Louise [5]:
To explain, shorter/2 is a predicate defined as background knowledge by me.
triadic_chain is a metarule, a second-order clause that provides inductive
bias. length/2 is an ISO Prolog predicate.
Like I say, this is a trivial problem, not least because its solution is easy
to figure out and the background knowledge and metarules are trivial to define
by hand. Louise can also perform predicate invention to define new
background knowledge (kind of like inventing new features) and also new
metarules. That is to say, Louise can learn the shorter/2 and length/2
programs, also from very few examples- and then reuse them as background
knowledge. But showing how to do that would make for a larger example. I'm
happy to oblige if you are curious.
I should point out that there exists no neural net approach that can learn the
same (or a similar) program from a single positive example- not least because
neural nets cannot make use of background knowledge (i.e. a library of
programs from which to build other programs).
> You can consult wikipedia or e.g. the calls for papers from major AI conferences, AAAI and IJCAI, if in doubt.
As to Wikipedia, see the second paragraph. The sections where it mentions, e.g., symbolic or sub-symbolic approaches are prefixed with “Researchers in the 1960s and the 1970s” or “By the 1980s”. Kind of telling.
Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or whether what were once GOFAI conferences still harbour GOFAI research in the corners. My point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them seriously as a method of doing so, regardless of other uses, and c) it is therefore natural, not ‘impossible’, to gain AI expertise without caring much about those parts of the field.
> Further, I can certainly point you to successes of symbolic AI, which you say don't exist. For one thing, the entire fields of automated theorem proving, planning, search, game playing, knowledge representation and reasoning, etc. that you say are "not AI", but are like I say still active and still state of the art in their respective tasks.
Yes, but there's a reason I suffixed that comment with “(again, in an AI context)”. GOFAI is great if you ignore the last two letters of the name, and how it failed to deliver on almost all of its major promises.
These used to be considered AI because it was thought that you could build a useful reasoning agent out of a combination of these techniques, given appropriate developments. Now (almost) nobody does that; Google Maps' pathfinding is just a pathfinder, not a general reasoner.
> Below I've defined the problem in the format expected by Louise
Right, OK, I figured it'd be something like this, when you said it was trivial, but this is just another perspective on my original criticism. You wrote the program you wanted it to generate as the background knowledge.
It must be so, because your examples don't specify, even roughly, the program you wanted generated. Another valid solution would be (among many)
ordered(A,B,C):-shorter(A,B),shorter(A,C).
and the only reason it didn't choose this is because you gave it the program you wanted it to generate (obfuscated a little, yet still, there was only one application available). It didn't ‘learn’ anything.
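A minimal Python sketch of why both clauses are consistent with a single positive example. It assumes `shorter(A,B)` means `len(A) < len(B)`, and that the intended program is the chained `shorter(A,B),shorter(B,C)`; the probe strings are hypothetical. Both clauses accept the ordered triple, and they only disagree on a triple that no example rules out.

```python
def shorter(a: str, b: str) -> bool:
    # Assumed meaning of the shorter/2 background predicate.
    return len(a) < len(b)


def ordered_chain(a: str, b: str, c: str) -> bool:
    # ordered(A,B,C) :- shorter(A,B), shorter(B,C).   (presumed intended program)
    return shorter(a, b) and shorter(b, c)


def ordered_alt(a: str, b: str, c: str) -> bool:
    # ordered(A,B,C) :- shorter(A,B), shorter(A,C).   (the alternative above)
    return shorter(a, b) and shorter(a, c)


positive = ("a", "bb", "ccc")
print(ordered_chain(*positive), ordered_alt(*positive))  # True True

probe = ("a", "ccc", "bb")
print(ordered_chain(*probe), ordered_alt(*probe))        # False True
```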
> I should point out that there exists no neural net approach that can learn the same (or a similar) program from a single positive example- not least because neural nets cannot make use of background knowledge (i.e. a library of programs from which to build other programs).
This is not true, but as you wanted to prune, I'll leave it there.
>> Like, my point is not about whether you can find the odd person trying to
solve intelligence with grammars, or whether what were once GOFAI conferences
still harbour GOFAI research in the corners. My point is that a) these
approaches don't work as a way to actually tackle AI, the problem, b) the vast
majority of the field does not take them seriously as a method of doing so,
regardless of other uses, and c) it is therefore natural, not ‘impossible’, to
gain AI expertise without caring much about those parts of the field.
>> Yes, but there's a reason I suffixed that comment with “(again, in an AI
context)”. GOFAI is great if you ignore the last two letters of the name, and
how it failed almost all its major promises.
>> These used to be considered AI because it was thought that you could build a
useful reasoning agent out of a combination of these techniques, given
appropriate developments. Now (almost) nobody does that; Google Maps'
pathfinding is just a pathfinder, not a general reasoner.
I keep asking- how do you know these things to be true? Are they just your
opinion?
Because GOFAI just observably doesn't work. The ideas are brittle, can't generalize and abstract the way they need to, have made very little progress recently (if any) in an AI context, and you just _don't see_ anything that would argue otherwise.
In contrast, ML methods do work, observably and clearly, and they work in a ridiculously general way, to a degree larger than almost anyone thought (or even thinks) is reasonable for them to.
And it's not just my opinion; there's a reason AI conference attendance has shot up a factor of 10 or so in the last few years, why NeurIPS is the leading one (and even historically GOFAI conferences are majority NNs), why the big AI labs with big AI cash are all doing NNs, and why all of a sudden AI is a popular topic outside academia.
If this doesn't answer your question, perhaps answer the opposite; how do you know that it's wrong?
>> If this doesn't answer your question, perhaps answer the opposite; how do you
know that it's wrong?
I know the literature. It's my job.
>> And it's not just my opinion; there's a reason AI conference attendance has
shot up a factor of 10 or so in the last few years, why NeurIPS is the leading
one (and even historically GOFAI conferences are majority NNs), why the big AI
labs with big AI cash are all doing NNs, and why all of a sudden AI is a
popular topic outside academia.
That's still an opinion- "it's not just my opinion, everyone says so". A.k.a.
"It is known", in Dothraki. And of course it is of no consequence who's
spending money on what and who's submitting papers where. The volume of
research was never a criterion for its quality. Heed thee well the legend of
our Lord Geoff Hinton's years in the academic wilderness and how he emerged
victorious with the laws of deep learning in his hands.
I think what you've said so far has convinced me you're expressing a personal
opinion that is strongly held without a good reason to do so. You make
sweeping statements with great certainty, but you don't really seem to know
how you know the things you know, so you end up "knowing" some things that you
don't really know. For instance, you claimed that "GOFAI" successes just
"aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't
seem to have heard of these before (I'm more surprised about not knowing of
Deep Blue than MYCIN).
You also claim that "these approaches are not AI". That's a "No True
Scotsman" right there. Except there really is no True Scotsman (i.e. "AI" in
the sense you use it)- ask Yoshua Bengio:
Bengio: In terms of how much progress we’ve made in this work over the last
two decades: I don’t think we’re anywhere close today to the level of
intelligence of a two-year-old child. But maybe we have algorithms that are
equivalent to lower animals, for perception. And we’re gradually climbing this
ladder in terms of tools that allow an entity to explore its environment.
Spectrum: Will any of these ideas be used in the real world anytime soon?
Bengio: No. This is all very basic research using toy problems. That’s fine,
that’s where we’re at. We can debug these ideas, move on to new hypotheses.
This is not ready for industry tomorrow morning.
> For instance, you claimed that "GOFAI" successes just "aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't seem to have heard of these before (I'm more surprised about not knowing of Deep Blue than MYCIN).
At this point I think we're just hopelessly talking past each other. Of course I know about Deep Blue. I didn't know about MYCIN, but, like, “MYCIN was never actually used in practice”, so I don't feel particularly bad about missing that one.
But neither of those challenge my point. If you want to go back in time 30 years, then sure, if you want to be an AI expert, then you have to know GOFAI. That's what the ‘OF’ stands for.
> I know the literature. It's my job.
Yah I read the literature too. (Albeit it seems a very different subset.) That's not an argument though.
But none of those sources says that e.g. search or planning are not AI fields. That was your original claim, if I'm not mistaken? Anyway, it doesn't matter. It's a very strange thing to say and I was just trying to understand what made you say it, strictly out of curiosity.
I too can quote Hinton, from memory and without a link. I remember him saying that the next big thing in AI will come from a grad student who distrusts everything he (Hinton) has ever said. Unfortunately, I won't be that grad student: I haven't heard everything that Hinton has ever said.
I best summarized my claim when I said the following. Whether or not it's an ‘AI field’ is not very interesting to me, as long as the following holds.
---
Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or whether what were once GOFAI conferences still harbour GOFAI research in the corners. My point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them seriously as a method of doing so, regardless of other uses, and c) it is therefore natural, not ‘impossible’, to gain AI expertise without caring much about those parts of the field.
Apologies for splitting the thread, but I thought it'd be easier to read this
way.
This comment addresses your concerns about me writing the program I wanted
Louise to generate. I like to see background knowledge ("BK", e.g. shorter/2)
as a library of sub-programs from which the learner can select the ones
necessary to compose a target program. The example above is trivial because
I've defined a BK predicate that is necessary and sufficient to learn the target, so the
learner was indeed served the solution "on a plate".
However, as I said in my previous comment, Louise can learn its own background
knowledge. This can be done by predicate invention, or more simply, by
incrementally learning necessary sub-programs.
Below is a problem definition and learning session that first learns length/2
(renamed llength/2 to avoid name clashes with the built-in) and shorter/2 from
list and numeric function primitives, before using the learned predicates as
BK for ordered/3. As I said in my previous comment, it's a little larger
than the previous one:
The BK for this problem consists of tail/2, similar to "cdr" in Lisp (i.e. it
relates a list to its tail), and the pair of p/2 and s/2, which act as
"dereferencers" to Peano number functions. These are bog-standard Prolog
programs and useful whenever a target program must manipulate a list, or
perform numerical reasoning. In other words, they're pretty much generic, like
a standard library of sorts.
I've added the full source of the experiment file for the learning task on
pastebin. It includes a few more detailed comments and a set of constraints to
clean up the learned hypothesis, mostly for aesthetic reasons:
Of course this is still a toy problem and we know the solution. But I hope it
demonstrates the principle. On the other hand, you'd still not be able to
solve this with alternative approaches, e.g. I see that the benchmark suite
you pointed to is used for genetic programming. I'm also not aware of neural
approaches that build programs incrementally, from a couple of examples of
each sub-program.
That is to say, this is a toy problem for ILP. For other approaches it's
unsolvable.
Heck, I don't think it even got `shorter` right; it gave
shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't learn len(A) < len(B), not because the program isn't expressible with the primitives you gave, but because it just doesn't reason that far.
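A quick Python check of that reading, using plain integers for lengths instead of Peano terms (a simplification): the learned clause only accepts pairs whose lengths differ by exactly one.

```python
def s(c: int, d: int) -> bool:
    """s(C, D) read as a relation: D is the successor of C."""
    return d == c + 1


def shorter_learned(a: str, b: str) -> bool:
    # shorter(A,B) :- llength(A,C), llength(B,D), s(C,D).
    return s(len(a), len(b))


def shorter_intended(a: str, b: str) -> bool:
    # The relation the benchmark actually needs: len(A) < len(B).
    return len(a) < len(b)


for a, b in [("ab", "abc"), ("a", "abc")]:
    print(a, b, shorter_learned(a, b), shorter_intended(a, b))
# ab abc True True
# a abc False True
```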
So again, it's only trivial because it isn't learning the program; it's learning to fit together the puzzle pieces of the program, after you wrote the program and then chopped it up.
>> Heck, I don't think it even got `shorter` right; it gave
shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
>> which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't
learn len(A) < len(B), not because the program isn't expressible with the
primitives you gave, but because it just doesn't reason that far.
Regarding reasoning "that far", Louise can learn the complete successor /
predecessor relation (</2 and >/2) on its own and only from the primitives
s(N,s(N)) and p(s(N), N):
However, in the ordered/3 problem I define p/2 and s/2 by hand so that I can
put in ground_peano/1 to avoid infinite recursion when Louise tries to pass
two lists to s/2 or p/2 (at that point, their termination conditions never
obtain, so they keep recursing).
You can chalk the potential for infinite recursion up as a limitation, you're
very welcome- but there are techniques to avoid this and guarantee termination
(Knuth-Bendix ordering of the Herbrand base, see ref [1]) which I haven't come
round to implementing yet (because they are not necessary given a bit of
common sense in defining BK, as above). On the other hand that's actually a
feature, in the sense that earlier systems required more specific language
bias than the metarules, which would avoid this kind of type-unsafety but also
demanded more expert knowledge from the user. In any case, there are ways out.
>> You've just kicked the can down the road; what you've given there cannot
solve, for instance, the same problem but with <= instead of <.
Edit: You know, it just struck me but when you say that only ML has ever
worked out of all AI, that probably means you don't recognise Louise as a
machine learning system... because it's not deep learning. That's just another
instance of the strange synecdoche I was talking about in my first comment in
this thread, where to some people's knowledge only deep learning is machine
learning because that's all some people know of machine learning. A bit like
thinking that chicken is the only thing one can eat because all one has ever
had is chicken.
I don't think you've understood my point. At every step of the way you've put in more effort than writing the program and specifying the examples. Clearly this is not trivial.
If you're actually inferring programs and not just doing a sort of guided line-by-line generalization of a program you had already written, the only things you would need would be
Aren't you shifting the goalposts a bit? First you ask me how I'd solve this
problem with ILP:
“Given three strings n1, n2, and n3, return true if length(n1) < length(n2) <
length(n3), and false otherwise” (a)
I say this is a trivial problem to solve with ILP and I show you a, well,
trivial solution and you complain that it's - trivial.
Then I show you a more elaborate version that learns sub-programs on the way
to the full solution and you say that a) it doesn't solve a different problem,
with ≤ instead of < and b) that it's not trivial anymore.
And now you're saying you want a solution that learns from examples only. You
would have saved us both a lot of time had you clarified your expectations up
front.
No matter. There isn't anything that can do what you ask. Or rather, there are
many approaches that could learn (a) just from examples, with a brute-force
search. But there is no approach that could learn arbitrary programs only from
examples. The reason is that the space of all programs that can be computed by
a Universal Turing Machine ("arbitrary") is infinite and any learner trying to
find one of them blindly, without some kind of hint to guide it, would be lost
forever inside it.
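To make the brute-force option concrete, here is a minimal enumerative synthesizer in Python that recovers (a) from a handful of input-output examples. Everything in it is an assumption made for illustration: the tiny DSL (conjunctions of at most two length comparisons), the search order, and the examples. The DSL is itself a strong inductive bias; without some such restriction the enumeration would have no bound.

```python
import itertools
import operator

# Hypothetical DSL: an atom compares the lengths of two of the three inputs.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq}
ATOMS = [(i, op, j) for i in range(3) for j in range(3) if i != j for op in OPS]


def run(program, args) -> bool:
    """A program is a conjunction (tuple) of atoms."""
    return all(OPS[op](len(args[i]), len(args[j])) for i, op, j in program)


def show(program) -> str:
    return " and ".join(f"len(n{i + 1}) {op} len(n{j + 1})" for i, op, j in program)


def synthesize(examples, max_atoms=2):
    """Return the first conjunction, by size, consistent with every example."""
    for size in range(1, max_atoms + 1):
        for program in itertools.combinations(ATOMS, size):
            if all(run(program, args) == expected for args, expected in examples):
                return program
    return None


examples = [
    (("a", "bb", "ccc"), True),
    (("ccc", "bb", "a"), False),
    (("a", "ccc", "bb"), False),
    (("bb", "a", "ccc"), False),
    (("a", "a", "bb"), False),
    (("a", "bb", "bb"), False),
]
program = synthesize(examples)
print(show(program))  # len(n1) < len(n2) and len(n2) < len(n3)
```

The tie examples matter: without ("a", "bb", "bb"), a program using <= for the second comparison would also fit, and only the enumeration order would decide between them.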
Most machine learning approaches that learn programs from examples adopt some
sort of inductive bias to guide a search for a program that satisfies some set
of goodness criteria, including neural approaches [1]. In ILP, inductive bias
consists primarily of BK and language bias (like the metarules in Louise). ILP
has a certain advantage in this, in that the languages of examples, bias and
hypotheses are the same (some first order logic language, like Prolog or ASP)
and so ILP systems can learn their own bias, like Louise can learn its own BK
and metarules. By way of comparison, neural nets, with their hand-crafted
architectures, minutely fine-tuned to specific domains or even particular
datasets, cannot do that (e.g. a trained model can't be used as a feature to
another neural net, in the way that ILP hypotheses can be used as BK). Of
course you need to start somewhere, from obvious primitives like head/2,
tail/2, s/2 and p/2 that I used above.
But I digress. The bottom line is that learning arbitrary programs from
examples is a hard problem for any machine learning approach [2].
Classification is a piece of cake, by comparison. And that is why there has
been so little progress in this problem even after decades of research [3].
The take home message of course is that neural nets are not the end of the
line in AI research and it would be disastrous for the progress of the field
to allow research into neural nets to eclipse every other approach. If this
happens it will all have to be discovered again, from scratch. And in another
70 years.
[3] This is where I'd normally say that there have been recent breakthroughs
that promise to overturn years of slow progress, but that's a story for
another time (and another venue, most likely).
“I have looked at the program synthesis literature before and it really does not seem very advanced to me. The General Program Synthesis Benchmark Suite lists unsolved benchmarks like “Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise”, and that's with 100 examples. So, probably less practically useful than GPT-3, which wasn't even trained on the task.”
I only asked about ILP to clarify your defence. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, and that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.
> The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.
I'm not looking to synthesize arbitrary programs. At least at this first hurdle, I'm looking to synthesize a few largely trivial ones.
If you saw 100 examples, you would be able to solve this problem, or at least get very close. Therefore this sort of theoretical argument cannot hold.
> By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK).
Section 4.7.1 shows this is entirely feasible for neural networks, though the technique is very different.
Though I prefer even more the elegance (and general absurdity) of learning the bias by pretraining on WebMath (GitHub, arXiv Math, Math StackExchange).
> The bottom line is that learning arbitrary programs from examples is a hard problem
I agree it's hard, but in my mind this sentence completes to “and therefore our only hope is to use the most powerful tool we have available, neural networks.”
>> I only asked about ILP to clarify your defense. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.
I must admit I'm very confused by this. I really thought you were talking about the problem of ordering three strings by length. What problem were you talking about, if not that?
The problem from the benchmark is automatically learning the function, to check three strings are ordered by length, from examples.
What you are doing is categorically different, because you are manually guiding the search process by telling it which interim lines to generate. This ignores all the interesting parts of the challenge.
Ah, I see what you mean. You mean that I explicitly told Louise how to compose
each program from its sub-programs. In truth, I did not. In the incremental
learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning
targets and a few primitives from which to construct them. I specifically and
very deliberately did not tell it to use each target to define another. It
figured it out on its own.
For instance, I did not direct Louise to define shorter/2 by means of
length/2. In order to do this I would have to specify length/2 as BK for
shorter/2, but I didn't. Note the BK declarations in the experiment file I put
on pastebin:
background_knowledge(llength/2, [tail/2,p/2]).
background_knowledge(shorter/2, [s/2]).
% ground_peano/1 is added here so it's reported by list_mil_problem/1
background_knowledge(ordered/3, [ground_peano/1]).
background_knowledge(ordered_leq/3, [leq/2,ground_peano/1]).
Remember how in an earlier comment I pointed out that the definition of
shorter/2 changed to use leq/2, instead of s/2, when the examples of
ordered_leq/3 were in the BK? Again, Louise figured that out on its own.
In fact, this ability of Louise (actually, its learning procedure, Top Program
Construction, or TPC) is kind of native, in the sense that TPC was
originally conceived as an algorithm to select relevant background knowledge
for a different learning system ("Thelma" for "Theory Learning Machine"; hence
"Louise"). So it can figure out what BK it needs on its own. Automatic
selection of relevant background knowledge in ILP was the original subject of
my PhD research. Interestingly, it turns out that once we had a system that
could perform this task, it could also learn its own programs.
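As a loose illustration only (a drastic simplification, not Louise's code), the generalise-then-specialise idea behind TPC can be sketched in Python: keep every metarule instance that covers at least one positive example, then discard any instance that covers a negative one. Assumed for the sketch: a hypothetical BK of three string predicates, two hypothetical metarules for a three-argument target, and a fixed head symbol. Note that the surviving clause uses only `shorter`, so the relevant BK is selected as a side effect, and no search over sets of clauses takes place.

```python
import itertools

# Hypothetical background knowledge: dyadic predicates on strings.
BK = {
    "shorter": lambda x, y: len(x) < len(y),
    "longer": lambda x, y: len(x) > len(y),
    "same_length": lambda x, y: len(x) == len(y),
}

# Hypothetical metarules for a triadic target P(A,B,C). Each lists the
# argument positions that its two body literals Q and R are applied to.
METARULES = {
    "chain": ((0, 1), (1, 2)),  # P(A,B,C) :- Q(A,B), R(B,C).
    "fork": ((0, 1), (0, 2)),   # P(A,B,C) :- Q(A,B), R(A,C).
}


def instances():
    """Every metarule instance: m * p**2 clauses (head symbol fixed)."""
    for name in METARULES:
        for q, r in itertools.product(BK, repeat=2):
            yield (name, q, r)


def covers(clause, example) -> bool:
    """True if the clause's body holds for the example's arguments."""
    name, q, r = clause
    (qi, qj), (ri, rj) = METARULES[name]
    return BK[q](example[qi], example[qj]) and BK[r](example[ri], example[rj])


def top_program(pos, neg):
    # Generalise: keep instances that cover at least one positive example.
    top = [c for c in instances() if any(covers(c, e) for e in pos)]
    # Specialise: drop instances that cover any negative example.
    return [c for c in top if not any(covers(c, e) for e in neg)]


pos = [("a", "bb", "ccc")]
neg = [("a", "ccc", "bb"), ("ccc", "bb", "a")]
print(top_program(pos, neg))  # [('chain', 'shorter', 'shorter')]
```

With the positive example alone (`top_program(pos, [])`), both the chain and the fork clause survive, a small instance of the underdetermination discussed earlier in the thread.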
> You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.
No, I get this; it's just not really more than a surface-level pretense of choice. The hard thing about generating programs is that there are many possible programs: something like m^n, where `m` is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters, which here is small enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.
The ‘choice’ you are offering Louise is something like, idk, m=5, n=2. 5^2 is not big. You argue about the terminology, but whatever you call it, it's still exponentially less interesting than the problem first posed. All the interesting work has been done for it, by you.
With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.
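Worked through with the comment's own m^n approximation (parameter placement ignored), the two scenarios compare as follows:

```python
def approx_program_count(m: int, n: int) -> int:
    """Rough count m**n: m available functions, n program steps."""
    return m ** n


large = approx_program_count(1000, 5)  # the problem as originally posed
small = approx_program_count(5, 2)     # the choice offered to Louise
print(large)           # 1000000000000000
print(small)           # 25
print(large // small)  # 40000000000000
```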
Many apologies for the delay in replying - I missed the "more" link at the
bottom of the thread. And here I was, refreshing the page disappointed that no
more criticism was forthcoming.
>> The hard thing about generating programs is that there are many
possible programs; something like m^n, where ’m’ is the number of functions
you have available to use (say, ~1000) and `n` is the number of steps the
program needs to take (say, ~5 in this case), and there's another factor for
where to put the parameters which here is low enough to be mostly negligible.
It turns out even 1000^5 is really big, so this problem is hard if you don't
do it smart.
Indeed, the complexity of the raw, combinatorial problem is the greatest
hurdle in solving it in the general sense; however, this time complexity is
calculated somewhat differently than in your comment. Let me show you.
First, in terms of ILP, the "number of functions you have available to use" is
the number of predicate symbols defined in the BK, which I'll notate as p.
"Where to put the parameters" is the number of body literals (similar to
function calls) in each metarule, which I'll notate as k. I'll notate the
number of metarules as m.
"The number of steps the program needs to take" is not relevant to the
calculation: we are trying to calculate the complexity of constructing the
program by blindly combining a set of building blocks (BK predicates and
metarules), not the complexity of executing the program. What is relevant is
the size of the target theory, i.e. its number of clauses (program lines),
because of course a larger program means a larger number of combinations of
our building blocks. I'll notate the size of the target theory as n.
Putting it all together, the time complexity of constructing a program of n
clauses from p predicate symbols with m metarules with at most k body literals
(of any arity) is O((pm^(k+1))^n) [1]. This is an exponential time complexity that
corresponds to the size of the search space for programs that can be
constructed from these components, i.e. that's the number of constructible
programs. The time complexity of the problem is such that even n = 5 is
sufficient to completely bog down a powerful modern computer.
Louise can manage it because it doesn't conduct a search of that space;
instead, it only constructs a unique object in that space, the Top program,
that can be constructed in polynomial time O(pm^(k+1)) [2], i.e. the number of
constructible clauses. Indeed, Louise is capable of learning large programs,
of a few thousand clauses, in a few minutes. In other words, the problem is
manageable because of the advances encapsulated by Louise's learning
procedure, Top Program Construction, not because the problem is trivial, as
you portray it, and not because I'm leading Louise by the hand, as you
suggest. Even if I were leading Louise by the hand, the combinatorial space
of constructible programs would still grow exponentially.
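The difference between searching over programs and constructing one set of clauses can be illustrated with a deliberately toy sketch. This is not Louise's implementation: the BK is four hypothetical Python functions on lists instead of Prolog predicates, there are only two hypothetical metarules (identity and chain), and "entailment" is plain function application. Each candidate clause is tested against the examples on its own, so the work grows with (clauses × examples), never with the number of sets of clauses.

```python
from itertools import product

# Hypothetical background knowledge: deterministic functions on lists,
# standing in for Prolog BK predicates such as tail/2.
BK = {
    "tail": lambda xs: xs[1:],
    "reverse": lambda xs: xs[::-1],
    "init": lambda xs: xs[:-1],
    "sort": lambda xs: sorted(xs),
}


def clause_instances(bk):
    """Every instance of two toy metarules over the BK symbols:
    identity  P(X,Y) :- Q(X,Y)           -> (Q,)
    chain     P(X,Y) :- Q(X,Z), R(Z,Y)   -> (Q, R)
    """
    for q in bk:
        yield (q,)
    for q, r in product(bk, repeat=2):
        yield (q, r)


def run(clause, x, bk):
    """Evaluate a clause body on input x, applying body literals left to right."""
    for name in clause:
        x = bk[name](x)
    return x


def entails(clause, example, bk):
    x, y = example
    return run(clause, x, bk) == y


def top_program(pos, neg, bk):
    # Generalise: keep each clause that entails at least one positive example.
    top = {c for c in clause_instances(bk)
           if any(entails(c, e, bk) for e in pos)}
    # Specialise: drop each clause that entails any negative example.
    return {c for c in top if not any(entails(c, e, bk) for e in neg)}


# Hypothetical target: drop the first element of the reversed list.
pos = [([1, 2, 3], [2, 1]), ([4, 5], [4])]
neg = [([1, 2, 3], [1, 2])]
print(sorted(top_program(pos, neg, BK)))
# [('init', 'reverse'), ('reverse', 'tail')]
```

Here 20 clauses are each checked against 3 examples. Both surviving clauses compute the same function, which is why the result is a set of clauses rather than a single chosen program.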
Regarding learning "only from examples" as I understand you to mean it, there
is some literature on that, of the kind you say is not "AI" (i.e. it predates
2012's deep learning boom). To my knowledge, this was first discussed in the
following:
1. Introduction
This paper addresses a deep difficulty with the generalization problem as
defined above: If consistency with the training instances is taken as the sole
determiner of appropriate generalizations, then a program can never make the
inductive leap necessary to classify instances beyond those it has observed.
Only if the program has other sources of information, or biases for choosing
one generalization over the other, can it non-arbitrarily classify instances
beyond those in the training set. In this paper, we use the term bias to
refer to any basis for choosing one generalization over another, other than
strict consistency with the observed training instances.
(...)
3. The Futility of Removing Biases
(...)
Although removing all biases from a generalization system may seem to be a
desirable goal, in fact the result is nearly useless. An unbiased learning
system’s ability to classify new instances is no better than if it simply
stored all the training instances and performed a lookup when asked to
classify a subsequent instance.
Ref: *"The Need for Biases in Learning Generalizations", T.M. Mitchell,
Rutgers Computer Science Department Technical Report CBM-TR-117, May, 1980.
Reprinted in Readings in Machine Learning, J. Shavlik and T. Dietterich,
eds., Morgan Kaufmann, 1990.*
http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf
But this is another reason to read old AI papers: to avoid falling down the
same holes people have already thoroughly explored in years gone by.
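Mitchell's point can be made concrete with a minimal sketch, assuming a one-dimensional numeric task. The "unbiased" learner just memorises its training set and has no basis for answering anything else; the second learner carries a strong, hypothetical bias (it assumes the target is a straight line) and therefore can generalise beyond the data.

```python
class RoteLearner:
    """Mitchell-style 'unbiased' learner: it stores the training examples and
    has no basis for preferring any answer on unseen inputs."""

    def __init__(self):
        self.table = {}

    def fit(self, examples):
        self.table.update(examples)
        return self

    def predict(self, x):
        return self.table.get(x)  # None means "no basis for a guess"


class LinearLearner:
    """A learner with a strong (hypothetical) bias: it assumes y = a*x + b."""

    def fit(self, examples):
        # Two points determine the line; the bias does the rest.
        (x1, y1), (x2, y2) = examples[:2]
        self.a = (y2 - y1) / (x2 - x1)
        self.b = y1 - self.a * x1
        return self

    def predict(self, x):
        return self.a * x + self.b


train = [(1, 3), (2, 5), (3, 7)]  # generated by y = 2x + 1
rote = RoteLearner().fit(train)
linear = LinearLearner().fit(train)
print(rote.predict(2), rote.predict(10))  # 5 None
print(linear.predict(10))                 # 21.0
```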
>> With the way you first laid out the question, there's a good chance (>1%) I
could have gotten the answer mostly right (up to parameter order) without
looking at the examples, just the background knowledge and the target type.
Have you tried doing that? I suggest you do, if only to get a feel for the
true difficulty of the problem.
> "The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program.
I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.
> Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.
How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.
If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.
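A quick check of the 1-in-13 arithmetic, assuming a blind guess is uniform over whatever candidates survive after keeping only 1 in k of the 1,280 clauses:

```python
clauses = 1280  # constructible clauses, as stated above

# Assumption: a guess is uniform over the candidates that survive after
# keeping only 1 in `k` of them.
for k in (1, 12, 13):
    remaining = clauses / k
    print(f"keep 1 in {k}: {remaining:.1f} candidates, "
          f"{1 / remaining:.2%} chance of a blind hit")
```

Keeping 1 in 12 still leaves about 107 candidates (under 1%); 1 in 13 leaves about 98.5, which crosses the 1% line.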
> Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom).
Yes, and as usual for GOFAI it doesn't solve the problem.
>> I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.
I understand, but we don't need to take that term into account. We're only interested in the cost of a blind combinatorial search. Even if we added that term in, we'd just get a higher complexity.
>> How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.
I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.
>> If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.
There are only 1280 clauses, but there are about 2 billion programs of size n = 3 that can be constructed with those clauses, one of which is the target program. The challenge is to find the target program among those 2 billion.
The advantage of Louise is that it only needs to look at the 1280 clauses, not the 2 billion programs. The trick is to find how to do that. It's like the difference between sorting a list with bubblesort and quicksort. Quicksort has to do a lot less work, but the trick is figuring out quicksort.
>> Yes, and as usual for GOFAI it doesn't solve the problem.
Deep learning doesn't "solve" that problem either; the point is that you can't learn only from examples.
> I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.
I genuinely don't understand what you expect me to do here. How can I possibly define BK for a problem I have zero information about beyond the type signature?
> There are only 1280 clauses, but there are about 2 billion programs
Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.
> Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.
Deep neural nets do not learn only from examples! They encode strong inductive
biases in their carefully hand-engineered and hand-tuned architectures, hence
for example CNNs are used for image recognition and LSTMs for sequence
learning etc. Without these biases deep neural nets would not be able to
generalise as well as they do (in the sense of local generalisation but not
global generalisation as meant by François Chollet [1]).
The biggest advances in deep neural nets have come from the discovery and use
of good inductive biases: training with gradient descent, backpropagation,
more hidden layers, the "constant error carousel", convolutional layers, ReLU
over sigmoid, attention, etc. One could say that deep neural nets are all
about good inductive bias.
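As a small illustration of what convolutional layers contribute as an inductive bias (weight sharing and translation equivariance), here is a hand-rolled sketch. The layer sizes in the parameter comparison are arbitrary examples, not taken from any particular architecture.

```python
def conv1d(x, w):
    """'Valid' 1D cross-correlation, the operation most DL libraries call
    convolution. The same weights w are reused at every position."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + biases


def conv1d_params(kernel, in_ch, out_ch):
    return kernel * in_ch * out_ch + out_ch  # shared kernel + biases


x = [0, 0, 1, 2, 0, 0, 0]
shifted = [0] + x[:-1]  # the same signal moved one step to the right
w = [1, -1]
# Translation equivariance: shifting the input shifts the output.
assert conv1d(shifted, w)[1:] == conv1d(x, w)[:-1]

print(dense_params(1024, 1024), conv1d_params(3, 1, 1))  # 1049600 4
```

The convolution can only express local, position-independent patterns; that restriction is exactly the bias that makes it work well on images.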
It's interesting that you bring up the ARC dataset. The paper that introduced
it (also from Chollet) [2] makes a strong claim about the necessity of
"knowledge priors" for a system to be considered intelligent. These are
described at length in section III.1.2 "Core knowledge priors" and are exactly
a set of strong inductive biases that the author of the paper considers
necessary for a machine learning system to solve the ARC tasks and that
consist of such problem-specific biases as object cohesion, object
persistence, object influence via contact, etc. It is exactly such "knowledge
priors" that are encoded as background knowledge in ILP systems.
Indeed, in the ARC challenge on Kaggle, the best-performing systems (i.e. the
ones that solved the most tasks) were crude approximations of the ILP
approach: a library of hand-crafted functions and a brute-force search
procedure to combine them. I note also that attempts to use deep learning to
solve the challenge didn't go anywhere.
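The "library of hand-crafted functions plus brute-force search" recipe can be sketched in a few lines. The three grid primitives and the rotation task below are hypothetical stand-ins, not taken from any actual Kaggle entry. With 3 primitives there are 3^d candidate programs of depth d, the same exponential growth discussed earlier in the thread.

```python
from itertools import product

# Hypothetical hand-crafted primitives on grids (tuples of tuples).
def flip_h(g):
    return tuple(row[::-1] for row in g)


def flip_v(g):
    return g[::-1]


def transpose(g):
    return tuple(zip(*g))


PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}


def run(program, grid):
    for name in program:  # apply primitives left to right
        grid = PRIMITIVES[name](grid)
    return grid


def brute_force(train, max_depth=3):
    """Return the first composition of primitives consistent with all
    input/output pairs, trying shorter programs first."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, i) == o for i, o in train):
                return program
    return None


# Task: rotate 90 degrees clockwise.
train = [
    (((1, 2), (3, 4)), ((3, 1), (4, 2))),
    (((1, 2, 3), (4, 5, 6)), ((4, 1), (5, 2), (6, 3))),
]
program = brute_force(train)
print(program)  # ('flip_v', 'transpose')
```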
Humans also have strong inductive biases that help us solve such problems. But
I'm not the best placed to discuss all this - I'm not a cognitive scientist.
In the end, what you are asking for is magick: a learner that learns only from
examples, without any preconceived notions about how to learn from those
examples, or what to learn from them. There is no such machine learning
system.
>> Doesn't matter, since the programmer gives that information. As you admit,
Louise cannot check for arbitrary programs.
I don't understand what you mean by "check for arbitrary programs". I can give
Louise zero BK and metarules and ask it to generate all Prolog programs, say.
Prolog is a Turing complete language so that would give me the set of all
programs computable by a Universal Turing Machine (it would take a while). But
what would that achieve?
At this point I'm not sure I understand what your remaining objections are
against the approach I showed you. For the purpose of learning arbitrary
programs it works better than anything else. Of course it's not magick.
Perhaps you should take my suggestion to think about the problem a bit more
carefully, if you're really interested in it. Or are you? I mean, if you
consider AI solved, e.g. by GPT-3, then I can see how you wouldn't be
interested in thinking any further about the issue.
P.S. To clarify, I'm keeping this discussion up for your sake, albeit eagerly.
You have expressed some strongly held, but incorrect opinions that it seems to
me you have acquired by consulting inexpert sources, probably because you have
a day job that has nothing to do with AI and doesn't leave you enough time to
study the matter properly. My day job is to study AI and I feel that such a
privilege is only justified if I spend time and effort to help others improve
their knowledge on the subject. I'm guessing that on your part, you're more
interested in "winning" the conversation, but please try to gain something
from our interaction; otherwise, all the time we both spent on it would go to
waste. When this is over, try to dig out and read some authoritative sources.
I would advise you on which ones - but you'd probably resist my recommendation
anyway, so you're on your own there.
My initial response was a fairly knee-jerk reaction to the snark. The following is a rewrite. Please don't do that; if you really think so little of me, I'd rather you not reply than reply unpleasantly.
> Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures
“Solomonoff Induction does not learn only from evidence! It encodes strong inductive biases in its construction and choice of Turing machine...”
but it doesn't matter. Our universe is not a random soup of maximal entropy.
The tasks I am talking about solving are overtly not impossible.
You talk about ML methods as if the success of, say, image recognition comes from image-recognition-specific architectures. You mention ‘hand-engineered’ or ‘hand-tuned’. And yet, to throw your snark back at you, if you were up to date with the literature, you would know this is not true.
Consider ViT as an example. The same Transformer, the same minimal inductive biases, work as well for language modelling as for image segmentation as for proof search; the only difference, perhaps, is that ViT works on patches for efficiency, though the paper shows that this probably hurts performance in the limit. All it takes is an appropriate quantity of data to learn the appropriate task-specific adaptations the network needs. Heck, even cross-domain works; it's all one architecture, so it's all one inductive bias.
To my mind, this is what it means to learn from examples. There is no way that an architecture designed for language translation could also encode task-specific priors for these different tasks.
For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a completely different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task-specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not; they're tuned per example.
ML people call that data leakage.
> I don't understand what you mean. Yes, Louise can check for arbitrary programs. I can give it zero BK and metarules and ask it to generate all Prolog programs, say.
Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.
I see I made you feel bad with my advice to read up. I'm sorry, because that
was not my intention. However, you really do need to take my advice seriously.
You've insisted throughout our conversation that you don't need to read older
machine learning or AI papers because they're not relevant anymore. And yet,
they are. And you do need to read them because without them you will not be
able to understand the recent developments you seem to be interested in.
Take for instance your example of ViT. This is a transformer, so it's clearly
not an unbiased generaliser that learns only from examples. You say so
yourself: "it's all one inductive bias". Yes, that's how machine learning
works and deep neural nets don't do anything different, neither do they learn
only from examples, as you seemed to suggest in your previous comment (you
replied "That's literally what DL is" to my comment that "you can't learn only
from examples").
But I think you misunderstood my comment about how the biggest advances in
deep neural nets have come from purpose-built architectures. That is not to
say that the same architectures cannot be applied to different domains, but
the state of the art systems are always fine-tuned for specific tasks or
datasets. This hasn't changed recently and it hasn't changed in the last 30
years.
>> For sure, one might call this ‘strong inductive biases’, in that the
program is not random bytes (as a truly bias-free algorithm must be), but
please at least admit that this is a completely different conceptual plane to
the sort of biases you give Louise. Louise's biases aren't merely
task-specific, they're problem-specific. It would be one thing if Louise's biases
were a handwritten web of a million BK rules: fine, whatever, as long as it
solves the task that is obviously possible to solve. But they're not, they're
tuned per example.
A truly bias-free algorithm is not "random bytes". It's a learner that
memorises its training examples and can only recognise its training examples.
Hence why it can't generalise. This is in Mitchell's paper which I suggested
you read.
Louise's biases are not problem-specific in the short example I showed
you. I defined BK predicates with wide applicability in programs processing
lists and numbers. There is no such limitation, theoretical or practical, in
the general sense, either. You can give Louise a million irrelevant BK
predicates, if you like, and it will still find the ones it needs to complete
the learning task assuming they're in there somewhere. In fact, it will find
all of the relevant ones - and return the superset of all programs that
solve the task (so you can use it for example to identify interesting
relations in your dataset). As I said in a previous comment, Louise's
learning algorithm was originally designed to select relevant BK.
Additionally, as I said in an earlier comment, Louise can learn its own
bias, both the BK and the metarules, so not only is it not limited to
task-specific bias, it is not even limited to user-provided bias. Under some
circumstances it can even invent new examples, and then use them to learn a
hypothesis that generalises better to unseen examples.
>> Louise can perhaps generate all Prolog programs. Louise cannot search the
space of Prolog programs.
> Louise's biases are not problem-specific in the short example I showed you.
This is clearly untrue.
You were customizing the BK to each specific task. You were also customizing the stepping stones for each specific task.
Justifications can come later. At least admit that you customized the BK for each problem instance and that, prior to doing so, the solver did not solve the problem asked.
Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.
I did not "customize the BK to each specific task". You can go back and see
what I did. I provided some generic BK predicates that manipulate lists and
numbers, I defined some metarules and I gave a few examples of each program's
inputs and outputs.
I don't understand your criticism and I don't understand what you want me to
"at least admit".
>> This is clearly untrue.
Can you show me which biases in the example I showed are problem-specific?
>> Not responding to the rest since you've missed my entire point and I don't
feel like rephrasing it.
I don't think I missed your point. I think you, yourself, are horribly
confused about what point you are trying to make. And the reason, of course, is
that you want to be able to express strong opinions about AI and machine
learning, but you don't want to have to do the hard work to understand the
subject. So you keep saying "five impossible things before breakfast", like
asking for a learner that learns only from examples, or saying that's what
deep learning is, etc.
I'm sorry, but despite what the article above suggests, there isn't an easy way
to become an expert, not even in machine learning. If you want to know what
you're talking about, then you'll have to do your homework.
I can't edit my post but I forgot to calculate the actual size of the
hypothesis space for the multi-predicate problem for ordered/3. That is: p =
5 (tail/2, p/2, s/2, llength/2 and shorter/2), m = 4, k = 3, n = 3 (the
target theory is a clause for each target). The size of the hypothesis search
space, i.e. the total number of programs of size n = 3 is (pm^(k+1))^n =
2,097,152,000.
As I say above, Louise's TPC procedure avoids searching this space and so
effectively ignores the exponential term, reducing the complexity of the
problem to that of, in the worst case, enumerating (pm^(k+1)) = 1,280 clauses.
In other words, the problem is easy for Louise, not because the problem itself
is trivial, but because Louise's learning procedure, TPC, is efficient.
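Both figures can be reproduced directly from the formulas as stated in this comment (the bound itself is taken as given here, not re-derived):

```python
def hypothesis_space(p, m, k, n):
    """Number of programs of n clauses, per the (pm^(k+1))^n bound above."""
    return (p * m ** (k + 1)) ** n


def top_program_space(p, m, k):
    """Number of constructible clauses, pm^(k+1): what TPC enumerates at worst."""
    return p * m ** (k + 1)


p, m, k, n = 5, 4, 3, 3  # values for the ordered/3 example
print(f"{top_program_space(p, m, k):,}")    # 1,280
print(f"{hypothesis_space(p, m, k, n):,}")  # 2,097,152,000
```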
For a further example, in the upcoming publication I mention above, Louise is
shown to learn a 2,567 clause theory in under 220 seconds with perfect
accuracy after training on 20% of all training examples. The hypothesis space
for this problem (grid world navigation) is on the order of 2*10^4944, but
Louise shrinks it to the problem of enumerating, at worst, a little over 81
million clauses.
The General Program Synthesis Benchmark Suite works from input-output examples, not “complete, formal specifications”.
How would you tackle this with ILP?
> However, I have to say that even so, if something is a difficult problem for program synthesis approaches, then it's very unlikely that neural networks will do any better at it. For instance, do you know how well deep neural nets perform on this benchmark?
I'm not aware of any serious at-scale attempts. Your option is basically to try few-shot with GPT-3.
OTOH, learning these trivial programs from 100 examples is a largely artificial framing used to support a field which hadn't worked its way up to meaningful problems, and in the more general sense, large networks are promising; e.g. the GitHub-trained GPT:
https://www.youtube.com/watch?v=y5-wzgIySb4
or any of the GPT-3 programming demos:
https://twitter.com/sharifshameem/status/1284103765218299904 https://twitter.com/sharifshameem/status/1284815412949991425 https://www.reddit.com/r/commandline/comments/jl8jyr/the_nlc...
> These approaches are still state of the art for their respective tasks and there is no other approach that has been shown to do any better, including deep neural networks. In what sense are they "no longer more than briefly and tangentially relevant" as you say?
“if you're interested in learning AI”
These techniques were invented within the field of AI, but that does not mean they remain in the field of AI.
> You clearly have a strong opinion on GOFAI and the AI winter of the '80s, but what knowledge does this opinion come from? Can you say?
I can argue why ML approaches are good and promising and point at that. I can argue why ML approaches make conceptual sense whereas GOFAI does not, though I don't see us resolving that short-term so I'd rather not. But what I can't so easily do is point to the non-existence of GOFAI AI successes. It's just not there.
You do have tools like Watson and WolframAlpha, which use GOFAI techniques for fact search over a large set of human-built knowledge repositories (trivia q's / math tools), but Watson is mostly considered a stunt, and I'm not aware of anyone calling WolframAlpha AI.
> the ability for reasoning (despite big claims to the contrary)
The nebulousness of the term ‘reasoning’ is pulling a lot of weight here. It's clearly doing sophisticated computations of some sort, beyond brute memorization.
> or arithmetic (ditto)
http://gptprompts.wikidot.com/logic:math#toc6
There are more examples too; this is just addressing the one point people get wrong most often. BPEs are an interim performance hack, not an indictment of the approach in general.
> or generation of novel programs
Is clearly false.
> For instance, the append() example you show above is clearly memorised: you haven't given the model any examples of append(), so it can't possibly learn its definition from examples.
This is true, but it's mostly just an artifact of me having to prompt it through FitnessAI. Unlike with smaller models, few-shot learning works; it just takes more space than I have to prompt with.
See the GitHub-trained example for something that integrates with more arbitrary code. There are many other examples, like the database prompt below (all bold is human input), or see some of the examples I linked above.
https://www.gwern.net/GPT-3#the-database-prompt
Or I can ask
Q: “If z(str) = str + " " + str + " z" (for example, z("dumbell") = "dumbell dumbell z"), and g(str) = "k " + str + " j" then what is g("run")?”
A: “g("run") = "k run j"”
(The inverse problem doesn't work so well, giving “g(str) = "k run j"” for one example (valid but vapid) and “g(str) = "k str j"” for two (close but no banana), and confusion for more complex prompts, though I suspect the format is partially to blame. I can list other failure cases. But my point isn't that GPT-3 is reliable here; it's a language model.)
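For reference, the two functions defined in the prompt, transcribed into Python purely for illustration, give the same answer GPT-3 produced:

```python
def z(s: str) -> str:
    return s + " " + s + " z"


def g(s: str) -> str:
    return "k " + s + " j"


assert z("dumbell") == "dumbell dumbell z"  # the worked example in the prompt
print(g("run"))  # k run j
```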
> btw, why do you need to give it the list "a"? What happens if this is omitted from the prompt?
That example was from me trying to emulate one I saw on Twitter and have since lost: a similar thing but multi-step, where at each step GPT-3 returned all three lists, modified or queried per the given commands.
Omitting `a`, I get
Q: “b = ["lifting", "curls", "squats"], c = ["running", "jogging"], so what is b after b.append("pushups")?”
A: “lifting,curls,squats,pushups”
I had to change the prompt a bit because initially the result was truncated (FitnessAI is not made for this), or said “b.append("pushups") will add the string "pushups" to the end of b.”, which is correct but not what I wanted.
Few-shot would fix formatting inconsistencies; right now the model is just guessing.
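Reading the prompt as Python (an assumption, since the prompt names no language), the list after `b.append("pushups")` is as below. GPT-3's comma-joined answer matches its contents; only the formatting differs.

```python
# Reading the prompt as Python (an assumption: the prompt names no language).
b = ["lifting", "curls", "squats"]
result = b.append("pushups")  # list.append mutates b in place and returns None
print(result)                 # None
print(b)                      # ['lifting', 'curls', 'squats', 'pushups']
print(",".join(b))            # lifting,curls,squats,pushups
```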