>> Sorry, I did misunderstand what you were saying.
Thank you for your honesty and there's no need to apologise.
Let me explain my point above, if I can. You said that search, planning,
expert systems, etc, are "not AI, and they never will be". I understand that
as saying that such systems are not artificial intelligences, in the sense of
an intelligent machine created by humans out of whole cloth (without trying to
define what "intelligent" means).
That is certainly true, but it is also uncontroversial that the above are
sub-fields of the field of research that is known as "AI". That is, there is a
field that researches approaches that could lead to the creation of
intelligent machines and that is called "AI" and then there's the ultimate
target of that field which is to create "AI".
My original comment bemoans the fact that in some sectors, "AI", as a field of
research, has become synonymous with only one sub-sub-field of AI research,
that is, deep learning.
Contrary to what you state, these "GOFAI" fields (symbolic AI, if you will)
are still active and far from having "failed" in any way, they are "SOTA" in
their respective tasks. For example, the field of automated theorem proving is
dominated by systems that employ the resolution rule, a logic-based approach
and while recent efforts have been made to make inroads into theorem proving
with deep neural nets (e.g. a recent attempt to use transformers) results are
still very far from being comparable to the traditional approaches. I know
more about automated theorem proving that I know e.g. about planning or search
(because my PhD is in a subject close to the former) but my understanding is
that in those fields too, traditional techniques dominate- which is why
research in them is still active.
If I am permitted to tout my own horn, my broader subject area can be
described as "program learning", i.e. machine learning of arbitrary programs
from examples of their inputs and outputs. In this area also, deep learning
systems are hopelessly outclassed by symbolic approaches, not least because
these approaches learn precise representations of target programs (rather than
approximations) from a mere handful of examples (four or five, etc).
And so on. These are just some examples of AI research that is ongoing, that
is not deep learning, and that is state of the art for its respective tasks.
In view of that, I consider that using "AI" to mean "deep learning" (as the
article above does) is not only misinformed but also misinforming and actively
harmful. In the sense of harming people's understanding, that is.
As to your comment about how GOFAI "failed", I'm afraid this opinion, which is
common, is also misinformed. Here, too, a knowledge of the history of the
field helps, but to summarise, the last winter happened because of strictly
political reasons and for no reason that had anything to do with the
scientific merits, or practical successes of the relevant approaches. In fact,
expert systems, now widely considered a failure, were one of the first big
success stories of AI; a success story that was cut short only because, again,
of political reasons.
>> As much as you argue I should adopt the subject's history, I'm saying you
should adopt its present.
More to the point, I'm recommending that you should know a subject's history
before forming a strong opinion about its present and future. As to me, I'm
firmly rooted to the present. About half of the literature I read is neural
networks- and that's not even the subject of my research. But if you think
about it, in the era where the trend is to use deep learning, the most
interesting results can only come from not using deep learning. In a gold
rush, if everyone is digging up Widow's Creek, then Widow's Creek is the last
place to dig for gold.
I don't want to dismiss automated provers, as they are often quality, useful tools (SAT solvers in particular), but if you're interested in learning AI, traditional approaches are no longer more than briefly and tangentially relevant.
That's my point, that you don't need to learn woodworking to build a car, even if wooden carts still have occasional uses, and some cars have wood trim or wooden trailers.
> If I am permitted to tout my own horn, my broader subject area can be described as "program learning", i.e. machine learning of arbitrary programs from examples of their inputs and outputs. In this area also, deep learning systems are hopelessly outclassed by symbolic approaches, not least because these approaches learn precise representations of target programs (rather than approximations) from a mere handful of examples (four or five, etc).
I have looked at the program synthesis literature before and it really does not seem very advanced to me. The General Program Synthesis Benchmark Suite lists unsolved benchmarks like “Given three
strings n1, n2, and n3, return true if length(n1) <
length(n2) < length(n3), and false otherwise”, and that's with 100 examples. So, probably less practically useful than GPT-3, which wasn't even trained on the task.
> the last winter happened because of strictly political reasons and for no reason that had anything to do with the scientific merits, or practical successes of the relevant approaches
I disagree, but you've not given me much more than a list of vague references that don't all exactly support your argument, so I don't have much clue where you diverge.
If GOFAI worked, we'd see some indication of it working (again, in an AI context), but we don't.
> But if you think about it, in the era where the trend is to use deep learning, the most interesting results can only come from not using deep learning. In a gold rush, if everyone is digging up Widow's Creek, then Widow's Creek is the last place to dig for gold.
This analogy doesn't work. Neural networks are giving unparalleled results by the bucket. That's why people are digging there. A gold mine might have plenty of competing miners, but it's sure going to be a lot more likely to give you chunks of gold than a random patch of grass in your backyard.
I'm happy to see we're still in healthy disagreement. However, I have to
apologise for confusing you by describing my field as "program learning" which
is admittedly vague, but I didn't want to go into the particulars. My field is
not program synthesis, which is constructing programs from complete, formal
specifications. Rather, it's Inductive Programming and more specifically Inductive
Logic Programming (ILP), which is learning programs from examples, i.e.
"incomplete specifications". I'm not familiar with the General Program
Synthesis Benchmark Suite, but the problem you list (test three strings are
ordered by length) is trivial for ILP approaches. Again, I don't want to point
to my own research, directly (I'm going through a bit of a modesty phase) (oh,
alright, it's just that the documentation of my project is crap). However, I
have to say that even so, if something is a difficult problem for program
synthesis approaches, then it's very unlikely that neural networks will do any
better at it. For instance, do you know how well deep neural nets perform on
this benchmark? I can't find any results with a quick search.
You make the point that one does not need to learn these "obsolete" AI
approaches because they are not relevant anymore. I don't understand why you
say that. These approaches are still state of the art for their respective
tasks and there is no other approach that has been shown to do any better,
including deep neural networks. In what sense are they "no longer more than
briefly and tangentially relevant" as you say?
Regarding the gold rush, the point of the analogy is that in a gold rush only
a very few people will ever strike gold. This is exactly the state of research
into deep learning currently. After a few initial big breakthroughs, like CNNs
and LSTMs, progress has stalled and the vast, vast majority of published
papers (or papers put on arxiv permanently) present incremental results, if
that. Literally thousands of deep learning papers are published each month and
the chance to have an impact is miniscule. From my point of view, as a
researcher, going into deep learning right now would be career suicide. Not to
mention that, while the first few successes were achieved by small academic
teams, who had typical academic motives (er, glory), the game has now passed
to the hands of big corporate teams that have quite different incentives, so
it's almost impossible for small teams or individual researchers to make a
dent.
As to the winter and whether GOFAI works, perhaps I haven't convinced you with
my sources, but in that case, I have to go back to my earlier question and ask
where your knowledge comes from. You clearly have a strong opinion on GOFAI
and the AI winter of the '80s, but what knowledge does this opinion come from?
Can you say? And if this sounds like a challenge, well, that's because it is.
I'm challenging you to re-examine the basis of your convictions, if you like.
Because to me, they sound like they are not well-founded and that you should
put some water in your wine. The things you say "don't work", work and the
things you say work, don't work as well as you say.
For my part, I certainly agree that GPT-3 or the next iteration of a large
transformer-built language model can be a useful tool, but such a tool will
always be limited by the fact that it's, well, a language model, and it can
only do what language models do, which does not include e.g. the ability for
reasoning (despite big claims to the contrary) or arithmetic (ditto) or
generation of novel programs. For instance, the append() example you
show above is clearly memorised: you haven't given the model any examples of
append(), so it can't possibly learn its definition from examples. It only
returns a correct result because it's seen the results of append() before. Not
the same result, but close enough. Like I say, this ability can definitely
be useful- but its usefulness is limited compared to the ability to learn
arbitrary programs, never before seen.
btw, why do you need to give it the list "a"? What happens if this is ommitted
from the prompt?
> Rather, it's Inductive Programming and more specifically Inductive Logic Programming (ILP), which is learning programs from examples, i.e. "incomplete specifications". I'm not familiar with the General Program Synthesis Benchmark Suite, but the problem you list (test three strings are ordered by length) is trivial for ILP approaches.
The General Program Synthesis Benchmark Suite works from input-output examples, not “complete, formal specifications”.
How would you tackle this with ILP?
> However, I have to say that even so, if something is a difficult problem for program synthesis approaches, then it's very unlikely that neural networks will do any better at it. For instance, do you know how well deep neural nets perform on this benchmark?
I'm not aware of any serious at-scale attempts. Your option is basically to try few-shot with GPT-3.
OTOH, learning these trivial programs from 100 examples is a largely artificial framing used to support a field which hadn't worked its way up to meaningful problems, and in the more general sense, large networks are promising; eg. the GitHub-trained GPT:
> These approaches are still state of the art for their respective tasks and there is no other approach that has been shown to do any better, including deep neural networks. In what sense are they "no longer more than briefly and tangentially relevant" as you say?
“if you're interested in learning AI”
These techniques were invented from the field of AI, but that does not mean they remain in the field of AI.
> You clearly have a strong opinion on GOFAI and the AI winter of the '80s, but what knowledge does this opinion come from? Can you say?
I can argue why ML approaches are good and promising and point at that. I can argue why ML approaches make conceptual sense whereas GOFAI does not, though I don't see us resolving that short-term so I'd rather not. But what I can't so easily do is point to the non-existence of GOFAI AI successes. It's just not there.
You do have tools Watson and WolframAlpha which use GOFAI techniques for fact search over a large set of human-built knowledge repositories (trivia q's / math tools), but Watson is mostly considered a stunt, and I'm not aware of anyone calling WolframAlpha AI.
> the ability for reasoning (despite big claims to the contrary)
The nebulousness of the term ‘reasoning’ is pulling a lot of weight here. It's clearly doing sophisticated computations of some sort, beyond brute memorization.
There are more examples too, this is just addressing the one point people get wrong most often. BPEs are an interim performance hack, not an indictment on the approach in general.
> or generation of novel programs
Is clearly false.
> For instance, the append() example you show above is clearly memorised: you haven't given the model any examples of append(), so it can't possibly learn its definition from examples.
This is true, but it's mostly just an artifact of me having to prompt it through FitnessAI. Unlike smaller models, few-shot learning works, it just takes more space than I have to prompt with.
See the GitHub-trained example for something that integrates with more arbitrary code. There are many other examples, like the database prompt below (all bold is human input), or see some of the examples I linked above.
Q: “If z(str) = str + " " + str + " z" (for example, z("dumbell") = "dumbell dumbell z"), and g(str) = "k " + str + " j" then what is g("run")?”
A: “g("run") = "k run j"”
(The inverse problem doesn't work so well, giving “g(str) = "k run j"” for one example (valid but vapid) and “g(str) = "k str j"” for two (close but no banana), and confusion for more complex prompts, though I suspect the format is partially to blame. I can list other failure cases. But my point isn't that GPT-3 is reliable here; it's a language model.)
> btw, why do you need to give it the list "a"? What happens if this is ommitted from the prompt?
That example was from me trying to emulate an example I saw on Twitter I've since lost, which was a similar thing but multi-step, where each step GPT-3 returned all three lists, modified or queried per the given commands.
Omitting `a`, I get
Q: “b = ["lifting", "curls", "squats"], c = ["running", "jogging"], so what is b after b.append("pushups")?”
A: “lifting,curls,squats,pushups”
I had to change the prompt a bit because initially the result was truncated (FitnessAI is not made for this), or said “b.append("pushups") will add the string "pushups" to the end of b.”, which is correct but not what I wanted.
Few-shot would fix formatting inconsistencies; right now the model is just guessing.
In the interest of pruning this conversation a bit I will not continue the
discussion about GPT-3. Apologies, but this thread is growing too fast and I
don't have the time to give your comments the attention they deserve. I am
happy for you to have the last word in that matter.
>> These techniques were invented from the field of AI, but that does not mean
they remain in the field of AI.
Like I say above, it is pretty uncontroversial that these approaches are part
of the field of AI research. You can consult wikipedia or e.g. the calls for
papers from major AI conferences, AAAI and IJCAI, if in doubt.
So I have to ask again, why do you say these approaches are are not in the
field of AI research? According to whom? And based on what?
I would please like an answer to the above question.
Further, I can certainly point you to successes of symbolic AI, which you say
don't exist. For one thing, the entire fields of automated theorem proving,
planning, search, game playing, knowledge representation and reasoning, etc.
that you say are "not AI", but are like I say still active and still state of
the art in their respective tasks. These are certainly successful- they have
produced systems and techniques that still work best than any alternative and
actually quite well.
For examples of specific systems that were successful in their time, see Logic
Theorist [1] that proved 38 of the first 52 theorems in Principia Mathematica;
Chinook [2], the first computer program to win a world championship against
humans (in checkers/draughts); Deep Blue [3], the first AI system to defeat a
human grandmaster (Garry Kasparov) in chess; MYCIN [4] the first AI system to
outperform human experts in disease diagnosis (specifically, diagnosis of
infections); and so on.
Of course these systems have been superseded - but they were successes
nonetheless. Another reason to learn the history of AI is to become aware of
those systems- they, indeed, were "there".
Again I have to ask you- where does your knowledge of AI come from? When
you make such strong statements about what works and what doesn't, what failed
and what succeeded, are you sure you are well informed? Do you draw your
knowledge from primary sources, or are you trusting the opinions of others who
claim to be experts- but may not be (like in the article above)?
>> How would you tackle this with ILP?
Below I've defined the problem in the format expected by Louise [5]:
To explain, shorter/2 is a predicate defined as background knowledge by me.
triadic_chain is a metarule, a second-order clause that provides inductive
bias. length/2 is an ISO Prolog predicate.
Like I say, this is a trivial problem, not least because its solution is easy
to figure out and the background knowledge and metarules are trivial to define
by hand. Louise can also perform predicate invention to define new
background knowledge (kind of like inventing new features) and also new
metarules. That is to say, Louise can learn the shorter/2 and length/2
programs, also from very few examples- and then reuse them as background
knowledge. But showing how to do that would make for a larger example. I'm
happy to oblige if you are curious.
I should point out that there exists no neural net approach that can learn the
same (or a similar) program from a single positive example- not least because
neural nets cannot make use of background knowledge (i.e. a library of
programs from which to build other programs).
> You can consult wikipedia or e.g. the calls for papers from major AI conferences, AAAI and IJCAI, if in doubt.
As to Wikipedia, see the second paragraph. The sections where it mentions, eg., symbolic or sub-symbolic approaches are prefixed with “Researchers in the 1960s and the 1970s” or “By the 1980s”. Kind'a telling.
Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or what were GOFAI conferences still harbour GOFAI research in the corners, my point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them as seriously as a method of doing so, regardless of other uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise without having much care for those parts of the field.
> Further, I can certainly point you to successes of symbolic AI, which you say don't exist. For one thing, the entire fields of automated theorem proving, planning, search, game playing, knowledge representation and reasoning, etc. that you say are "not AI", but are like I say still active and still state of the art in their respective tasks.
Yes, but there's a reason I suffixed that comment with “(again, in an AI context)”. GOFAI is great if you ignore the last two letters of the name, and how it failed almost all its major promises.
These used to be considered AI because it was thought that you could build a useful reasoning agent out of a combination of these techniques, given appropriate developments. Now (almost) nobody does that; Google Map's pathfinding is just a pathfinder, not a general reasoner.
> Below I've defined the problem in the format expected by Louise
Right, OK, I figured it'd be something like this, when you said it was trivial, but this is just another perspective on my original criticism. You wrote the program you wanted it to generate as the background knowledge.
It must be so, because your examples don't specify, even roughly approximately, the program you wanted generated. Another valid solution would be (among many)
ordered(A,B,C):-shorter(A,B),shorter(A,C).
and the only reason it didn't choose this is because you gave it the program you wanted it to generate (obfuscated a little, yet still, there was only one application available). It didn't ‘learn’ anything.
> I should point out that there exists no neural net approach that can learn the same (or a similar) program from a single positive example- not least because neural nets cannot make use of background knowledge (i.e. a library of programs from which to build other programs).
This is not true, but as you wanted to prune, I'll leave it there.
>> Like, my point is not about whether you can find the odd person trying to
solve intelligence with grammars, or what were GOFAI conferences still harbour
GOFAI research in the corners, my point is that a) these approaches don't work
as a way to actually tackle AI, the problem, b) the vast majority of the field
does not take them as seriously as a method of doing so, regardless of other
uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise
without having much care for those parts of the field.
>> Yes, but there's a reason I suffixed that comment with “(again, in an AI
context)”. GOFAI is great if you ignore the last two letters of the name, and
how it failed almost all its major promises.
>> These used to be considered AI because it was thought that you could build a
useful reasoning agent out of a combination of these techniques, given
appropriate developments. Now (almost) nobody does that; Google Map's
pathfinding is just a pathfinder, not a general reasoner.
I keep asking- how do you know these things to be true? Are they just your
opinion?
Because GOFAI just observably doesn't work. The ideas are brittle, can't generalize and abstract the way is needed, has made very little progress recently (if any) an AI context, and you just _don't see_ anything that would argue otherwise.
In contrast, ML methods do work, observably and clearly, and they work in a ridiculously general way, to a degree larger than almost anyone thought (or even thinks) is reasonable for them to.
And it's not just my opinion; there's a reason AI conference attendance has shot up a factor of 10 or so in the last few years, why NeurIPS is the leading one (and even historically GOFAI conferences are majority NNs), why the big AI labs with big AI cash are all doing NNs, and why all of a sudden AI is a popular topic outside academia.
If this doesn't answer your question, perhaps answer the opposite; how do you know that it's wrong?
>> If this doesn't answer your question, perhaps answer the opposite; how do you
know that it's wrong?
I know the literature. It's my job.
>> And it's not just my opinion; there's a reason AI conference attendance has
shot up a factor of 10 or so in the last few years, why NeurIPS is the leading
one (and even historically GOFAI conferences are majority NNs), why the big AI
labs with big AI cash are all doing NNs, and why all of a sudden AI is a
popular topic outside academia.
That's still an opinion- "it's not just my opinion, everyone says so". A.k.a.
"It is known", in Dothraki. And of course it is of no consequence who's
spending money on what and who's submitting papers where. The volume of
research was never a criterion for its quality. Heed thee well the legend of
our Lord Geoff Hinton's years in the academic wilderness and how he emerged
victorious with the laws of deep learning in his hands.
I think what you've said so far has convinced me you're expressing a personal
opinion that is strongly held without a good reason to do so. You make
sweeping statements with great certainty, but you don't really seem to know
how you know the things you know, so you end up "knowing" some things that you
don't really know. For instance, you claimed that "GOFAI" successes just
"aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't
seem to have heard of these before (I'm more surprised about not knowning of
Deep Blue than MYCIN).
You also claim that "these approaches are not AI". That's a "No True
Scottsman" right there. Except there really is no True Scottsman (i.e. "AI" in
the sense you use it)- ask Yoshua Bengio:
Bengio: In terms of how much progress we’ve made in this work over the last
two decades: I don’t think we’re anywhere close today to the level of
intelligence of a two-year-old child. But maybe we have algorithms that are
equivalent to lower animals, for perception. And we’re gradually climbing this
ladder in terms of tools that allow an entity to explore its environment.
Spectrum: Will any of these ideas be used in the real world anytime soon?
Bengio: No. This is all very basic research using toy problems. That’s fine,
that’s where we’re at. We can debug these ideas, move on to new hypotheses.
This is not ready for industry tomorrow morning.
> For instance, you claimed that "GOFAI" successes just "aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't seem to have heard of these before (I'm more surprised about not knowning of Deep Blue than MYCIN).
At this point I think we're just hopelessly talking past each other. Of course I know about Deep Blue. I didn't know about MYCIN, but, like, “MYCIN was never actually used in practice”, so I don't feel particularly bad about missing that one.
But neither of those challenge my point. If you want to go back in time 30 years, then sure, if you want to be an AI expert, then you have to know GOFAI. That's what the ‘OF’ stands for.
> I know the literature. It's my job.
Yah I read the literature too. (Albeit it seems a very different subset.) That's not an argument though.
But none of those sources says that e.g. search or planning are not AI fields. That was your original claim, if I'm not mistaken? Anyway it doesn't matter. It's a very strange thing to say and I was just trying to understand what made you say it- strictly out of curiousity.
I too can quote Hinton -from memory and without a link. I remember him saying that the next big thing in AI will come from a grad student who distrusts everything he (Hinton) has ever said. Unfortunately, I won't be that grad student- I haven't heard everything that Hinton has ever said.
I best summarized my claim when I said the following. Whether or not it's an ‘AI field’ is not very interesting to me, as long as the following holds.
---
Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or what were GOFAI conferences still harbour GOFAI research in the corners, my point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them as seriously as a method of doing so, regardless of other uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise without having much care for those parts of the field.
Apologies for splitting the thread, but I thought it'd be easier to read this
way.
This comment addresses your concerns about me writing the program I wanted
Louise to generate. I like to see background knowledge ("BK", e.g. shorter/2)
as a library of sub-programs from which the learner can select the ones
necessary to compose a target program. The example above is trivial because
I've defined a BK predicate that is necessary and sufficient to learn, so the
learner was indeed served the solution "on a plate".
However, as I said in my previous comment, Louise can learn its own background
knowledge. This can be done by predicate invention, or more simply, by
incrementally learning necessary sub-programs.
Below is a problem definition and learning session that first learns length/2
(renamed llength/2 to avoid name clashes with the built-in) and shorter/2 from
list and numeric function primitives, before using the learned predicates as
BK for ordered/2. Like I say in my previous comment, it's a little larger
than the previous one:
The BK for this problem consists of tail/2, similar to "car" in Lisp (i.e.
matches the head of a list) and the pair of p/2 and s/2, that act as
"dereferencers" to Peano number functions. These are bog-standard Prolog
programs and useful whenever a target program must manipulate a list, or
perform numerical reasoning. In other words, they're pretty much generic, like
a standard library of sorts.
I've added the full source of the experiment file for the learning task on
pastebin. It includes a few more detailed comments and a set of constraints to
clean up the learned hypothesis, mostly for aesthetic reasons:
Of course this is still a toy problem and we know the solution. But I hope it
demonstrates the principle. On the other hand, you'd still not be able to
solve this with alternative approaches, e.g. I see that the benchmark suite
you pointed to is used for genetic programming. I'm also not aware of neural
approaches that build programs incrementally, from a couple of examples of
each sub-program.
That is to say, this is a toy problem for ILP. For other approaches it's
unsolvable.
Heck, I don't think it even got `shorter` right; it gave
shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't learn len(A) < len(B), not because the program isn't expressible with the primitives you gave, but because it just doesn't reason that far.
So again, it's only trivial because it isn't learning the program, it's learning to put fit the puzzle pieces of the program together, after you wrote the program and then chopped it up.
>> Heck, I don't think it even got `shorter` right; it gave
shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
>> which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't
learn len(A) < len(B), not because the program isn't expressible with the
primitives you gave, but because it just doesn't reason that far.
Regarding reasoning "that far" Louise can learn the complete successor /
predecessor relation (</2 and >/2) on its own and only from the primitives
s(N,s(N)) and p(s(N), N):
However, in the ordered/3 problem I define p/2 and s/2 by hand so that I can
put in ground_peano/1 to avoid infinite recursion when Louise tries to pass
two lists to s/2 or p/2 (at that point, their termination conditions never
obtain, so they keep recursing).
You can chalk the potential for infinite recursion up as a limitation, you're
very welcome- but there are techniques to avoid this and guarantee termination
(Knuth-Bendix ordering of the Herbrand base, see ref [1]) which I haven't come
round to implementing yet (because they are not necessary given a bit of
common sense in defining BK, as above). On the other hand that's actually a
feature, in the sense that earlier systems required more specific language
bias than the metarules, that would avoid this kind of type-unsafety, but also
demanded more expert knowledge from the user. In any case, there's outs.
>> You've just kicked the can down the road; what you've given there cannot
solve, for instance, the same problem but with <= instead of <.
Edit: You know, it just struck me but when you say that only ML has ever
worked out of all AI, that probably means you don't recognise Louise as a
machine learning system... because it's not deep learning. That's just another
instance of the strange synechdoche I was talking about in my first comment in
this thread, where to some peoples' knowledge only deep learning is machine
learning because that's all some people know of machine learning. A bit like
thinking that chicken is the only thing one can eat because all one has ever
had is chicken.
I don't think you've understood my point. At every step of the way you've put in more effort than writing the program and specifying the examples. Clearly this is not trivial.
If you're actually inferring programs and not just doing a sort of guided line-by-line generalization of a program you had already written, the only things you would need would be
Aren't you shifting the goalposts a bit? First you ask me how I'd solve this
problem with ILP:
“Given three strings n1, n2, and n3, return true if length(n1) < length(n2) <
length(n3), and false otherwise” (a)
I say this is a trivial problem to solve with ILP and I show you a, well,
trivial solution and you complain that it's - trivial.
Then I show you a more elaborate version that learns sub-programs on the way
to the full solution and you say that a) it doesn't solve a different problem,
with ≤ instead of < and b) that it's not trivial anymore.
And now you're saying you want a solution that learns from examples only. You
would have saved us both a lot of time had you clarified your expectations up
front.
No matter. There isn't anything that can do what you ask. Or rather, there are
many approaches that could learn (a) just from examples, with a brute-force
search. But there is no approach that could learn arbitrary programs only from
examples. The reason is that the space of all programs that can be computed by
a Universal Turing Machine ("arbitrary") is infinite and any learner trying to
find one of them blindly, without some kind of hint to guide it, would be lost
for ever inside it.
Most machine learning approaches that learn programs from examples adopt some
sort of inductive bias to guide a search for a program that satisfies some set
of goodness criteria, including neural approaches [1]. In ILP, inductive bias
consists primarily of BK and language bias (like the metarules in Louise). ILP
has a certain advantage in this, in that the languages of examples, bias and
hypotheses are the same (some first order logic language, like Prolog or ASP)
and so ILP systems can learn their own bias, like Louise can learn its own BK
and metarules. By way of comparison, neural nets, with their hand-crafted
architectures, minutely fine-tuned to specific domains or even particular
datasets, cannot do that (e.g. a trained model can't be used as a feature to
another neural net, in the way that ILP hypotheses can be used as BK). Of
course you need to start somewhere, from obvious primitives like head/2,
tail/2, s/2 and p/2 that I used above.
But I digress. The bottom line is that learning arbitrary programs from
examples is a hard problem for any machine learning approach [2].
Classification is a piece of cake, by comparison. And that is why there has
been so little progress in this problem even after decades of research [3].
The take home message of course is that neural nets are not the end of the
line in AI research and it would be disastrous for the progress of the field
to allow research into neural nets to eclipse every other approach. If this
happens it will all have to be discovered again, from scratch. And in another
70 years.
[3] This is where I'd normally say that there have been recent breakthroughs
that promise to overturn years of slow progress, but that's a story for
another time (and another venue most like).
“I have looked at the program synthesis literature before and it really does not seem very advanced to me. The General Program Synthesis Benchmark Suite lists unsolved benchmarks like “Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise”, and that's with 100 examples. So, probably less practically useful than GPT-3, which wasn't even trained on the task.”
I only asked about ILP to clarify your defense. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.
> The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.
I'm not looking to solve arbitrary programs. I'm looking, at least at this first hurdle, to solve a few largely trivial ones.
If you saw 100 examples, you would be able to solve this problem, or at least get very close. Therefore this sort of theoretical argument cannot hold.
> By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK).
Section 4.7.1 shows this is entirely feasible for neural networks, though the technique is very different.
Though I prefer even more the elegance (and general absurdity) of learning the bias by pretraining on WebMath (GitHub, arXiv Math, Math StackExchange).
> The bottom line is that learning arbitrary programs from examples is a hard problem
I agree it's hard, but in my mind this sentence completes to “and therefore our only hope is to use most powerful tool we have available, neural networks.”
>> I only asked about ILP to clarify your defense. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.
I must admit I'm very confused by this. I really thought you were talking about the problem of ordering three strings by length. What problem where you talking about, if not that?
The problem from the benchmark is automatically learning the function, to check three strings are ordered by length, from examples.
What you are doing is categorically different, because you are manually guiding the search process by telling it which interim lines to generate. This ignores all the interesting parts of the challenge.
Ah, I see what you mean. You mean that I explicitly told Louise how to compose
each program from its sub-programs. In truth, I did not. In the incremental
learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning
targets and a few primitives from which to construct them. I specifically and
very deliberately did not tell it to use each target to define another. It
figured it out on its own.
For instance, I did not direct Louise to define shorter/2 by means of
length/2. In order to do this I would have to specify length/2 as BK for
shorter/2, but I didn't. Note the BK declarations in the experiment file I put
on pastebin:
background_knowledge(llength/2, [tail/2,p/2]).
background_knowledge(shorter/2, [s/2]).
% ground_peano/1 is added here so it's reported by list_mil_problem/1
background_knowledge(ordered/3, [ground_peano/1]).
background_knowledge(ordered_leq/3, [leq/2,ground_peano/1]).
Remember how in an erarlier comment I pointed out that the definition of
shorter/2 changed to use leq/2, instead of s/2, when the examples of
ordered_leq/3 where in the BK? Again, Louise figured that on its own.
In fact, this ability of Louise (actually, its learning procedure, Top Program
Construction, or TPC) is kind of native, in the sense that TPC was
originally conceived as an algorithm to select relevant background knowledge
for a different learning system ("Thelma" for "Theory Learning Machine"; hence
"Louise"). So it can figure out what BK it needs on its own. Automatic
selection of relevant background knowledge in ILP was the original subject of
my PhD research. Interestingly, it turns out that once we had a system that
could perform this task, it could also learn its own programs.
> You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.
No, I get this, it's just not really more than a surface level pretense of choice. The hard thing about generating programs is that there are many possible programs; something like m^n, where ’m’ is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters which here is low enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.
The ‘choice’ you are offering Louise is something like, idk, m=5, n=2. 5^2 is not big. You argue about the terminology, but whatever you call it, it's still exponentially less interesting than the problem first posed. All the interesting work has been done for it, by you.
With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.
Many apologies for the delay in replying - I missed the "more" link at the
bottom of the thread. And here I was, refreshing the page disappointed that no
more criticism was forthcoming.
>> The hard thing about generating programs is that there are many
possible programs; something like m^n, where ’m’ is the number of functions
you have available to use (say, ~1000) and `n` is the number of steps the
program needs to take (say, ~5 in this case), and there's another factor for
where to put the parameters which here is low enough to be mostly negligible.
It turns out even 1000^5 is really big, so this problem is hard if you don't
do it smart.
Indeed, the complexity of the raw, combinatorial problem is the greatest
hurdle in solving it in the general sense, however this time complexity is
calculated somewhat differently than in your comment. Let me show you.
First, in terms of ILP, the "number of functions you have available to use" is
the number of predicate symbols defined in the BK, which I'll notate as p.
"Where to put the parameters" is the number of body literals (similar to
function calls) in each metarule, which I'll notate as k. I'll notate the
number of metarules as m.
"The number of steps the program needs to take" is not relevant to the
calculation: we are trying to calculate the complexity of constructing the
program by blindly combining a set of building blocks (BK predicates and
metarules)- not the complexity of executing the program. What is relevant is
the size of the target theory, i.e. its number of clauses (program lines),
because of course a larger program means a larger number of combinations of
our building blocks. I'll notate the size of the target theory as n.
Putting it all together, the time complexity of constructing a program of n
clauses from p predicate symbols with m metarules with at most k body literals
(of any arity) is O(pmᵏ⁺¹)ⁿ [1]. This is an exponential time complexity that
corresponds to the size of the search space for programs that can be
constructed from these components, i.e. that's the number of constructible
programs. The time complexity of the problem is such that even n = 5 is
sufficient to completely bog down a powerful modern computer.
Louise can manage it because it doesn't conduct a search of that space,
instead it only constructs a unique object in that space, the Top program,
that can be constructed in polynomial time O(pmᵏ⁺¹) [2], i.e. the number of
constructible clauses. Indeed, Louise is capable of learning large programs,
of a few thousand clauses in a few minutes. In other words, the problem is
manageable because of the advances encapsulated by Louise's learning
procedure, Top Program Construction, not because the problem is trivial, as
you portray it - and not because I'm leading Louise by the hand, as you
suggest. Even if I was leading Louise by the hand, the combinatorial space
of constructible programs would still grow exponentially.
Regarding learning "only from examples" as I understand you to mean it, there
is some literature on that, of the kind you say is not "AI" (i.e. it predates
2012's deep learning boom). To my knowledge, this was first discussed in the
following:
1. Introduction
This paper addresses a deep difficulty with the generalization problem as
defined above: If consistency with the training instances is taken as the sole
determiner of appropriate generalizations, then a program can never make the
inductive leap necessary to classify instances beyond those it has observed.
Only if the program has other sources of information, or biases for choosing
one generalization over the other, can it non-arbitrarily classify instances
beyond those in the training set. In this paper, we use the term bias to
refer to any basis for choosing one generalization over another, other than
strict consistency with the observed training instances.
(...)
3. The Futility of Removing Biases
(...)
Although removing all biases from a generalization system may seem to be a
desirable goal, in fact the result is nearly useless. An unbiased learning
system’s ability to classify new instances is no better than if it simply
stored all the training instances and performed a lookup when asked to
classify a subsequent instance.
Ref: *"The Need for Biases in Learning Generalizations", T.M. Mitchell,
Rutgers Computer Science Department Technical Report CBM-TR-117, May, 1980.
Reprinted in Readings in Machine Learning, J. Shavlik and T. Dietterich,
eds., Morgan Kaufmann, 1990.*
http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf
But this is another reason to read old AI papers: to avoid falling down the
same holes people have already thoroughly explored in years gone by.
>> With the way you first laid out the question, there's a good chance (>1%) I
could have gotten the answer mostly right (up to parameter order) without
looking at the examples, just the background knowledge and the target type.
Have you tried doing that? I suggest you do- if only to get a feel for the
true difficulty of the problem.
> "The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program.
I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.
> Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.
How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.
If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.
> Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom).
Yes, and as usual for GOFAI it doesn't solve the problem.
>> I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.
I understand, but we don't need to take that term into account. We're only interested in the cost of a blind combinatorial search. Even if we added that term in, we'd just get a higher complexity.
>> How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.
I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.
>> If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.
There are only 1280 clauses, but there are about 2 billion programs of size n = 3 that can be constructed with those clauses, one of which is the target program. The challenge is to find the target program in that 2 billion.
The advantage of Louise is that it only needs to look at the 1280 clauses, not the 2 billion programs. The trick is to find how to do that. It's like the difference between sorting a list with bubblesort and quicksort. Quicksort has to do a lot less work, but the trick is figuring out quicksort.
>> Yes, and as usual for GOFAI it doesn't solve the problem.
Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.
> I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.
I genuinely don't understand what you expect me to do here. How can I possibly define BK for a problem I have zero information about beyond the type signature?
> There are only 1280 clauses, but there are about 2 billion programs
Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.
> Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.
Deep neural nets do not learn only from examples! They encode strong inductive
biases in their carefully hand-engineered and hand-tuned architectures, hence
for example CNNs are used for image recognition and LSTMs for sequence
learning etc. Without these biases deep neural nets would not be able to
generalise as well as they do (in the sense of local generalisation but not
global generalisation as meant by François Chollet [1]).
The biggest advances in deep neural nets have come from the discovery and use
of good inductive biases: training with gradient descent, backpropagation,
more hidden layers, the "constant error carousel", convolutional layers, ReLu
over sigmoid, attention, etc, etc. One could say that deep neural nets are all
about good inductive bias.
It's interesting that you bring up the ARC dataset. The paper that introduced
it (also from Chollet) [2] makes a strong claim about the necessity of
"knowledge priors" for a system to be considered intelligent. These are
described at length in section III.1.2 "Core knowledge priors" and are exactly
a set of strong inductive biases that the author of the paper considers
necessary for a machine learning system to solve the ARC tasks and that
consist of such problem-specific biases as object cohesion, object
persistence, object influence via contact, etc. It is exactly such "knowledge
priors" that are encoded as background knowledge in ILP systems.
Indeed, in the ARC challenge on Kaggle, the best-performing systems (i.e. the
ones that solved the most tasks) were crude approximations of the ILP
approach: a library of hand-crafted functions and a brute-force search
procedure to combine them. I note also that attempts to use deep learning to
solve the challenge didn't go anywhere.
Humans also have strong inductive biases that help us solve such problems. But
I'm not the best placed to discuss all this - I'm not a cognitive scientist.
In the end, what you are asking for is magick: a learner that learns only from
examples, without any preconceived notions about how to learn from those
examples, or what to learn from them. There is no such machine learning
system.
>> Doesn't matter, since the programmer gives that information. As you admit,
Louise cannot check for arbitrary programs.
I don't understand what you mean "check for arbitrary programs". I can give
Louise zero BK and metarules and ask it to generate all Prolog programs, say.
Prolog is a Turing complete language so that would give me the set of all
programs computable by a Universal Turing Machine (it would take a while). But
what would that achieve?
At this point I'm not sure I understand what your remaining objections are
against the approach I showed you. For the purpose of learning arbitrary
programs it works better than anything else. Of course it's not magick.
Perhaps you should take my suggestion to think about the problem a bit more
carefully, if you're really intersted in it. Or are you? I mean, if you
consider AI solved, e.g. by GPT-3, then I can see how you wouldn't be
interested in thinking any further about the issue.
P.S. To clarify, I'm keeping this discussion up for your sake, albeit eagerly.
You have expressed some strongly held, but incorrect opinions that it seems to
me you have acquired by consulting inexpert sources, probably because you have
a day job that has nothing to do with AI and doesn't leave you enough time to
study the matter properly. My day job is to study AI and I feel that such a
privilege is only justified if I spend time and effort to help others improve
their knowledge on the subject. I'm guessing that on your part, you're more
interested in "winning" the conversation, but please try to gain something
from our interaction, otherwise all the time we both spent at it would be to
waste. When this is over, try to dig out and read some authoritative sources.
I would advise you on which ones - but you'd probably resist my recommendation
anyway, so you're on your own there.
My initial response was a fairly kneejerk reaction to the snark. The following is a rewrite. Please don't; if you really think so little of me, rather don't reply than reply unpleasantly.
> Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures
“Solomonoff Induction does not learn only from evidence! It encodes strong inductive biases in its construction and choice of Turing machine...”
but it doesn't matter. Our universe is not a random soup of maximal entropy.
The tasks I am talking about solving are overtly not impossible.
You talk about ML methods like the success of, say, image recognition comes from image-recognition-specific architectures. You mention ‘hand-engineered’ or ‘hand-tuned’. And yet, to throw your snark back at you, if you were up to date with the literature, you would know this is not true.
Consider ViT as an example. The same Transformer, the same minimal inductive biases, work as well for language modelling as for image segmentation as for proof search—the only difference perhaps that ViT works on patches for efficiency, though the paper shows that probably hurts performance in the limit. All it takes is an appropriate quantity of data to learn the appropriate task-specific adaptations the network needs. Heck, even cross-domain works; it's all one architecture, so it's all one inductive bias.
To my mind, this is what it means to learn from examples. There is no way that an architecture designed for language translation could also encode task-specific priors for these different tasks.
For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a complete different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not, they're tuned per example.
ML people call that data leakage.
> I don't understand what you mean. Yes, Louise can check for arbitrary programs. I can give it zero BK and metarules and ask it to generate all Prolog programs, say.
Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.
I can't edit my post but I forgot to calculate the actual size of the
hypothesis space for the multi-predicate problem for ordered/3. That is: p =
5 (tail/2, p/2, s/2 and llength/2 and shorter/2), m = 4, k = 3, n = 3 (the
target theory is a clause for each target). The size of the hypothesis search
space, i.e. the total number of programs of size n = 3 is (pm^(k+1))^n =
2,097,152,000.
As I say above, Louise's TPC procedure avoids searching this space and so
effectively ignores the exponential term, reducing the complexity of the
problem to that of, in the worst case, enumerating (pm^(k+1)) = 1,280 clauses.
In other words, the problem is easy for Louise, not because the problem itself
is trivial, but because Louise's learning procedure, TPC, is efficient.
For a further example, in the upcoming publication I mention above, Louise is
shown to learn a 2,567 clause theory in under 220 seconds with perfect
accuracy after training on 20% of all training examples. The hypothesis space
for this problem (grid world navigation) is in the order of 2*10^4944 but
Louise shrinks it to the problem of enumerating, at worst, a little over 81
million clauses.
Thank you for your honesty and there's no need to apologise.
Let me explain my point above, if I can. You said that search, planning, expert systems, etc, are "not AI, and they never will be". I understand that as saying that such systems are not artificial intelligences, in the sense of an intelligent machine created by humans out of whole cloth (without trying to define what "intelligent" means).
That is certainly true, but it is also uncontroversial that the above are sub-fields of the field of research that is known as "AI". That is, there is a field that researches approaches that could lead to the creation of intelligent machines and that is called "AI" and then there's the ultimate target of that field which is to create "AI".
My original comment bemoans the fact that in some sectors, "AI", as a field of research, has become synonymous with only one sub-sub-field of AI research, that is, deep learning.
Contrary to what you state, these "GOFAI" fields (symbolic AI, if you will) are still active and far from having "failed" in any way, they are "SOTA" in their respective tasks. For example, the field of automated theorem proving is dominated by systems that employ the resolution rule, a logic-based approach and while recent efforts have been made to make inroads into theorem proving with deep neural nets (e.g. a recent attempt to use transformers) results are still very far from being comparable to the traditional approaches. I know more about automated theorem proving that I know e.g. about planning or search (because my PhD is in a subject close to the former) but my understanding is that in those fields too, traditional techniques dominate- which is why research in them is still active.
If I am permitted to tout my own horn, my broader subject area can be described as "program learning", i.e. machine learning of arbitrary programs from examples of their inputs and outputs. In this area also, deep learning systems are hopelessly outclassed by symbolic approaches, not least because these approaches learn precise representations of target programs (rather than approximations) from a mere handful of examples (four or five, etc).
And so on. These are just some examples of AI research that is ongoing, that is not deep learning, and that is state of the art for its respective tasks. In view of that, I consider that using "AI" to mean "deep learning" (as the article above does) is not only misinformed but also misinforming and actively harmful. In the sense of harming people's understanding, that is.
As to your comment about how GOFAI "failed", I'm afraid this opinion, which is common, is also misinformed. Here, too, a knowledge of the history of the field helps, but to summarise, the last winter happened because of strictly political reasons and for no reason that had anything to do with the scientific merits, or practical successes of the relevant approaches. In fact, expert systems, now widely considered a failure, were one of the first big success stories of AI; a success story that was cut short only because, again, of political reasons.
I could talk about the AI winter subject for hours (it's a favourit subject of mine) but a good starting point is this article by the editor of the IEEE journal Intelligent Systems: (Avoiding Another Winter) https://www.computer.org/csdl/magazine/ex/2008/02/mex2008020.... The wikipedia page on https://en.wikipedia.org/wiki/AI_winter also has a decent summary and some sources. Finally, see the wikipedia article on the Lighthill Report https://en.wikipedia.org/wiki/Lighthill_report which is more relevant to AI research in the UK (the Report killed AI research in the UK, dead) and this review of the report by John McCarthy: http://www-formal.stanford.edu/jmc/reviews/lighthill/lighthi... (man who coined the term AI and also created LISP on the side).
>> As much as you argue I should adopt the subject's history, I'm saying you should adopt its present.
More to the point, I'm recommending that you should know a subject's history before forming a strong opinion about its present and future. As to me, I'm firmly rooted to the present. About half of the literature I read is neural networks- and that's not even the subject of my research. But if you think about it, in the era where the trend is to use deep learning, the most interesting results can only come from not using deep learning. In a gold rush, if everyone is digging up Widow's Creek, then Widow's Creek is the last place to dig for gold.