> Rather, it's Inductive Programming and more specifically Inductive Logic Progr...

YeGoblynQueenne · on Nov 9, 2020

In the interest of pruning this conversation a bit I will not continue the discussion about GPT-3. Apologies, but this thread is growing too fast and I don't have the time to give your comments the attention they deserve. I am happy for you to have the last word in that matter.

>> These techniques were invented from the field of AI, but that does not mean they remain in the field of AI.

Like I say above, it is pretty uncontroversial that these approaches are part of the field of AI research. You can consult wikipedia or e.g. the calls for papers from major AI conferences, AAAI and IJCAI, if in doubt.

So I have to ask again, why do you say these approaches are are not in the field of AI research? According to whom? And based on what?

I would please like an answer to the above question.

Further, I can certainly point you to successes of symbolic AI, which you say don't exist. For one thing, the entire fields of automated theorem proving, planning, search, game playing, knowledge representation and reasoning, etc. that you say are "not AI", but are like I say still active and still state of the art in their respective tasks. These are certainly successful- they have produced systems and techniques that still work best than any alternative and actually quite well.

For examples of specific systems that were successful in their time, see Logic Theorist [1] that proved 38 of the first 52 theorems in Principia Mathematica; Chinook [2], the first computer program to win a world championship against humans (in checkers/draughts); Deep Blue [3], the first AI system to defeat a human grandmaster (Garry Kasparov) in chess; MYCIN [4] the first AI system to outperform human experts in disease diagnosis (specifically, diagnosis of infections); and so on.

Of course these systems have been superseded - but they were successes nonetheless. Another reason to learn the history of AI is to become aware of those systems- they, indeed, were "there".

Again I have to ask you- where does your knowledge of AI come from? When you make such strong statements about what works and what doesn't, what failed and what succeeded, are you sure you are well informed? Do you draw your knowledge from primary sources, or are you trusting the opinions of others who claim to be experts- but may not be (like in the article above)?

>> How would you tackle this with ILP?

Below I've defined the problem in the format expected by Louise [5]:

  ?- list_mil_problem(ordered/3).
  Positive examples
  -----------------
  ordered([a],[b,c],[d,e,f]).
  
  Negative examples
  -----------------
  []
  
  Background knowledge
  --------------------
  shorter/2:
  shorter(A,B):-length(A,C),length(B,D),C<D.
  
  Metarules
  ---------
  triadic_chain metarule 'P(x,z,y):- Q(x,z), R(z,y)'.
  true.

Given this problem definition, Louise can learn the following (Prolog) program:

  ?- learn(ordered/3).
  ordered(A,B,C):-shorter(A,B),shorter(B,C).
  true.

To explain, shorter/2 is a predicate defined as background knowledge by me. triadic_chain is a metarule, a second-order clause that provides inductive bias. length/2 is an ISO Prolog predicate.

Like I say, this is a trivial problem, not least because its solution is easy to figure out and the background knowledge and metarules are trivial to define by hand. Louise can also perform predicate invention to define new background knowledge (kind of like inventing new features) and also new metarules. That is to say, Louise can learn the shorter/2 and length/2 programs, also from very few examples- and then reuse them as background knowledge. But showing how to do that would make for a larger example. I'm happy to oblige if you are curious.

I should point out that there exists no neural net approach that can learn the same (or a similar) program from a single positive example- not least because neural nets cannot make use of background knowledge (i.e. a library of programs from which to build other programs).

__________________

[1] https://en.wikipedia.org/wiki/Logic_Theorist

[2] https://en.wikipedia.org/wiki/Chinook_(computer_program)

[3] https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparo...

[4] https://en.wikipedia.org/wiki/Mycin

[5] https://github.com/stassa/louise/

Veedrac · on Nov 10, 2020

> You can consult wikipedia or e.g. the calls for papers from major AI conferences, AAAI and IJCAI, if in doubt.

As to Wikipedia, see the second paragraph. The sections where it mentions, eg., symbolic or sub-symbolic approaches are prefixed with “Researchers in the 1960s and the 1970s” or “By the 1980s”. Kind'a telling.

Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or what were GOFAI conferences still harbour GOFAI research in the corners, my point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them as seriously as a method of doing so, regardless of other uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise without having much care for those parts of the field.

> Further, I can certainly point you to successes of symbolic AI, which you say don't exist. For one thing, the entire fields of automated theorem proving, planning, search, game playing, knowledge representation and reasoning, etc. that you say are "not AI", but are like I say still active and still state of the art in their respective tasks.

Yes, but there's a reason I suffixed that comment with “(again, in an AI context)”. GOFAI is great if you ignore the last two letters of the name, and how it failed almost all its major promises.

These used to be considered AI because it was thought that you could build a useful reasoning agent out of a combination of these techniques, given appropriate developments. Now (almost) nobody does that; Google Map's pathfinding is just a pathfinder, not a general reasoner.

> Below I've defined the problem in the format expected by Louise

Right, OK, I figured it'd be something like this, when you said it was trivial, but this is just another perspective on my original criticism. You wrote the program you wanted it to generate as the background knowledge.

It must be so, because your examples don't specify, even roughly approximately, the program you wanted generated. Another valid solution would be (among many)

    ordered(A,B,C):-shorter(A,B),shorter(A,C).

and the only reason it didn't choose this is because you gave it the program you wanted it to generate (obfuscated a little, yet still, there was only one application available). It didn't ‘learn’ anything.

> I should point out that there exists no neural net approach that can learn the same (or a similar) program from a single positive example- not least because neural nets cannot make use of background knowledge (i.e. a library of programs from which to build other programs).

This is not true, but as you wanted to prune, I'll leave it there.

YeGoblynQueenne · on Nov 10, 2020

>> Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or what were GOFAI conferences still harbour GOFAI research in the corners, my point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them as seriously as a method of doing so, regardless of other uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise without having much care for those parts of the field.

>> Yes, but there's a reason I suffixed that comment with “(again, in an AI context)”. GOFAI is great if you ignore the last two letters of the name, and how it failed almost all its major promises.

>> These used to be considered AI because it was thought that you could build a useful reasoning agent out of a combination of these techniques, given appropriate developments. Now (almost) nobody does that; Google Map's pathfinding is just a pathfinder, not a general reasoner.

I keep asking- how do you know these things to be true? Are they just your opinion?

Can you please satisfy my curiousity on this?

Veedrac · on Nov 10, 2020

Because GOFAI just observably doesn't work. The ideas are brittle, can't generalize and abstract the way is needed, has made very little progress recently (if any) an AI context, and you just _don't see_ anything that would argue otherwise.

In contrast, ML methods do work, observably and clearly, and they work in a ridiculously general way, to a degree larger than almost anyone thought (or even thinks) is reasonable for them to.

And it's not just my opinion; there's a reason AI conference attendance has shot up a factor of 10 or so in the last few years, why NeurIPS is the leading one (and even historically GOFAI conferences are majority NNs), why the big AI labs with big AI cash are all doing NNs, and why all of a sudden AI is a popular topic outside academia.

If this doesn't answer your question, perhaps answer the opposite; how do you know that it's wrong?

YeGoblynQueenne · on Nov 10, 2020

>> If this doesn't answer your question, perhaps answer the opposite; how do you know that it's wrong?

I know the literature. It's my job.

>> And it's not just my opinion; there's a reason AI conference attendance has shot up a factor of 10 or so in the last few years, why NeurIPS is the leading one (and even historically GOFAI conferences are majority NNs), why the big AI labs with big AI cash are all doing NNs, and why all of a sudden AI is a popular topic outside academia.

That's still an opinion- "it's not just my opinion, everyone says so". A.k.a. "It is known", in Dothraki. And of course it is of no consequence who's spending money on what and who's submitting papers where. The volume of research was never a criterion for its quality. Heed thee well the legend of our Lord Geoff Hinton's years in the academic wilderness and how he emerged victorious with the laws of deep learning in his hands.

I think what you've said so far has convinced me you're expressing a personal opinion that is strongly held without a good reason to do so. You make sweeping statements with great certainty, but you don't really seem to know how you know the things you know, so you end up "knowing" some things that you don't really know. For instance, you claimed that "GOFAI" successes just "aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't seem to have heard of these before (I'm more surprised about not knowning of Deep Blue than MYCIN).

You also claim that "these approaches are not AI". That's a "No True Scottsman" right there. Except there really is no True Scottsman (i.e. "AI" in the sense you use it)- ask Yoshua Bengio:

Bengio: In terms of how much progress we’ve made in this work over the last two decades: I don’t think we’re anywhere close today to the level of intelligence of a two-year-old child. But maybe we have algorithms that are equivalent to lower animals, for perception. And we’re gradually climbing this ladder in terms of tools that allow an entity to explore its environment.

Spectrum: Will any of these ideas be used in the real world anytime soon?

Bengio: No. This is all very basic research using toy problems. That’s fine, that’s where we’re at. We can debug these ideas, move on to new hypotheses. This is not ready for industry tomorrow morning.

https://spectrum.ieee.org/tech-talk/artificial-intelligence/...

Or, you know, ask any AI researcher :)

Edit: Which "GOFAI" conferences are majority NNs? What period are we talking about?

Veedrac · on Nov 10, 2020

> For instance, you claimed that "GOFAI" successes just "aren't there" but I listed a few, like Deep Blue or MYCIN - and you didn't seem to have heard of these before (I'm more surprised about not knowning of Deep Blue than MYCIN).

At this point I think we're just hopelessly talking past each other. Of course I know about Deep Blue. I didn't know about MYCIN, but, like, “MYCIN was never actually used in practice”, so I don't feel particularly bad about missing that one.

But neither of those challenge my point. If you want to go back in time 30 years, then sure, if you want to be an AI expert, then you have to know GOFAI. That's what the ‘OF’ stands for.

> I know the literature. It's my job.

Yah I read the literature too. (Albeit it seems a very different subset.) That's not an argument though.

> Or, you know, ask any AI researcher :)

OpenAI is explicitly about the path to AGI, https://openai.com/about/.

DeepMind was also founded to tackle AGI (no source, sorry).

Geoffrey Hinton thinks NNs will get to AGI https://www.technologyreview.com/2020/11/03/1011616/ai-godfa....

Even in your own link, Yoshua Bengio is saying that this is a path to AGI, it's just not there yet.

> Which "GOFAI" conferences are majority NNs?

I said “historically GOFAI conferences”, so eg. AAAI.

YeGoblynQueenne · on Nov 11, 2020

But none of those sources says that e.g. search or planning are not AI fields. That was your original claim, if I'm not mistaken? Anyway it doesn't matter. It's a very strange thing to say and I was just trying to understand what made you say it- strictly out of curiousity.

I too can quote Hinton -from memory and without a link. I remember him saying that the next big thing in AI will come from a grad student who distrusts everything he (Hinton) has ever said. Unfortunately, I won't be that grad student- I haven't heard everything that Hinton has ever said.

Veedrac · on Nov 11, 2020

I best summarized my claim when I said the following. Whether or not it's an ‘AI field’ is not very interesting to me, as long as the following holds.

---

Like, my point is not about whether you can find the odd person trying to solve intelligence with grammars, or what were GOFAI conferences still harbour GOFAI research in the corners, my point is that a) these approaches don't work as a way to actually tackle AI, the problem, b) the vast majority of the field does not take them as seriously as a method of doing so, regardless of other uses, and c) therefore it's natural, not ‘impossible’, to gain AI expertise without having much care for those parts of the field.

YeGoblynQueenne · on Nov 12, 2020

Well, a) is not known and b) and c) are not correct.

YeGoblynQueenne · on Nov 10, 2020

Apologies for splitting the thread, but I thought it'd be easier to read this way.

This comment addresses your concerns about me writing the program I wanted Louise to generate. I like to see background knowledge ("BK", e.g. shorter/2) as a library of sub-programs from which the learner can select the ones necessary to compose a target program. The example above is trivial because I've defined a BK predicate that is necessary and sufficient to learn, so the learner was indeed served the solution "on a plate".

However, as I said in my previous comment, Louise can learn its own background knowledge. This can be done by predicate invention, or more simply, by incrementally learning necessary sub-programs.

Below is a problem definition and learning session that first learns length/2 (renamed llength/2 to avoid name clashes with the built-in) and shorter/2 from list and numeric function primitives, before using the learned predicates as BK for ordered/2. Like I say in my previous comment, it's a little larger than the previous one:

  ?- list_mil_problem([llength/2,shorter/2,ordered/3]).
  Positive examples
  -----------------
  llength([],0).
  llength([a],s(0)).
  shorter([a],[b,c]).
  shorter([1,2],[3,4,5]).
  ordered([a],[b,c],[d,e,f]).
  
  Negative examples
  -----------------
  :-ordered([a],[b],[c]).
  :-ordered([a,b,c],[a,b],[c]).
  
  Background knowledge
  --------------------
  tail/2:
  tail([A|B],B).
  
  p/2:
  p(s(A),A).
  
  s/2:
  s(A,s(A)).
  
  Metarules
  ---------
  abduce metarule 'P(X,Y)'.
  list_rec_func metarule 'P(x,y):- Q(x,z),R(y,u),P(z,u)'.
  list_comp metarule 'P(x,y):- Q(x,z), R(y,u), S(z,u)'.
  triadic_chain metarule 'P(x,z,y):- Q(x,z), R(z,y)'.
  true.

  ?- time(learn_dynamic([llength/2,shorter/2,ordered/3])).
  llength([],0).
  llength(A,B):-tail(A,C),p(B,D),llength(C,D).
  shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
  ordered(A,B,C):-shorter(A,B),shorter(B,C).
  % 20,928 inferences, 0.000 CPU in 0.007 seconds (0% CPU, Infinite Lips)
  true.

The BK for this problem consists of tail/2, similar to "car" in Lisp (i.e. matches the head of a list) and the pair of p/2 and s/2, that act as "dereferencers" to Peano number functions. These are bog-standard Prolog programs and useful whenever a target program must manipulate a list, or perform numerical reasoning. In other words, they're pretty much generic, like a standard library of sorts.

I've added the full source of the experiment file for the learning task on pastebin. It includes a few more detailed comments and a set of constraints to clean up the learned hypothesis, mostly for aesthetic reasons:

https://pastebin.com/nPYFKKpx

Of course this is still a toy problem and we know the solution. But I hope it demonstrates the principle. On the other hand, you'd still not be able to solve this with alternative approaches, e.g. I see that the benchmark suite you pointed to is used for genetic programming. I'm also not aware of neural approaches that build programs incrementally, from a couple of examples of each sub-program.

That is to say, this is a toy problem for ILP. For other approaches it's unsolvable.

Veedrac · on Nov 10, 2020

You've just kicked the can down the road; what you've given there cannot solve, for instance, the same problem but with <= instead of <.

I checked by running with

    positive_example(ordered/3,ordered(S1,S2,S3)):-
        member([S1,S2,S3],[[[a],[b,c],[d,e,f]]
                          ,[[a,b],[c,d],[e,f,g]]
                          ,[[a],[c],[e,f,g,h,i]]
                          ]
              ).

    negative_example(ordered/3,ordered(S1,S2,S3)):-
        member([S1,S2,S3],[[[a],[],[c]]
     ,[[a,b,c],[a,b],[c]]
     ]
              ).

and just got

    ordered([a,b],[c,d],[e,f,g]).
    ordered([a],[c],[e,f,g,h,i]).
    ordered(A,B,C):-shorter(A,B),shorter(B,C).

Heck, I don't think it even got `shorter` right; it gave

    shorter(A,B):-llength(A,C),llength(B,D),s(C,D).

which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't learn len(A) < len(B), not because the program isn't expressible with the primitives you gave, but because it just doesn't reason that far.

So again, it's only trivial because it isn't learning the program, it's learning to put fit the puzzle pieces of the program together, after you wrote the program and then chopped it up.

YeGoblynQueenne · on Nov 10, 2020

>> Heck, I don't think it even got `shorter` right; it gave

    shorter(A,B):-llength(A,C),llength(B,D),s(C,D).

>> which means len(A) + 1 == len(B), not len(A) < len(B), and AFAICT it can't learn len(A) < len(B), not because the program isn't expressible with the primitives you gave, but because it just doesn't reason that far.

Oops. Haha well spotted @^_^

This is correct for </2:

  ?- list_mil_problem([llength/2,shorter/2,ordered/3]).
  Positive examples
  -----------------
  llength([],0).
  llength([a],s(0)).
  shorter([a],[b,c]).
  shorter([1,2],[3,4,5]).
  ordered([a],[b,c],[d,e,f]).
  
  Negative examples
  -----------------
  :-ordered([a],[b],[c]).
  :-ordered([a,b,c],[a,b],[c]).
  
  Background knowledge
  --------------------
  tail/2:
  tail([A|B],B).
  
  p/2:
  p(s(A),0):-ground_peano(A).
  p(s(A),A).
  p(s(A),s(B)):-ground_peano(A),p(A,B).
  
  s/2:
  s(0,s(A)):-ground_peano(A).
  s(A,s(A)).
  s(s(A),s(B)):-ground_peano(B),s(A,B).
  
  ground_peano/1:
  ground_peano(A):-ground(A),\+is_list(A).
  
  Metarules
  ---------
  abduce metarule 'P(X,Y)'.
  list_rec_func metarule 'P(x,y):- Q(x,z),R(y,u),P(z,u)'.
  list_comp metarule 'P(x,y):- Q(x,z), R(y,u), S(z,u)'.
  triadic_chain metarule 'P(x,z,y):- Q(x,z), R(z,y)'.
  true.


  ?- time(learn_dynamic([llength/2,shorter/2,ordered/3])).
  llength([],0).
  llength(A,B):-tail(A,C),p(B,D),llength(C,D).
  shorter(A,B):-llength(A,C),llength(B,D),s(C,D).
  ordered(A,B,C):-shorter(A,B),shorter(B,C).
  % 29,554 inferences, 0.000 CPU in 0.011 seconds (0% CPU, Infinite Lips)
  true.

Paste that in a file and consult it to test it:

  ?- ordered:ordered([1],[1,2,3],[1,2,3,4,5,6,7,8]).
  true .

Regarding reasoning "that far" Louise can learn the complete successor / predecessor relation (</2 and >/2) on its own and only from the primitives s(N,s(N)) and p(s(N), N):

  ?- list_mil_problem([s/2,p/2]).
  Positive examples
  -----------------
  s(0,s(A)).
  s(0,s(0)).
  s(s(0),s(s(s(s(0))))).
  p(s(A),0).
  p(s(0),0).
  p(s(s(s(s(0)))),s(0)).
  
  Negative examples
  -----------------
  []
  
  Background knowledge
  --------------------
  s_/2:
  s_(A,s(A)).
  
  p_/2:
  p_(s(A),A).
  
  Metarules
  ---------
  identity metarule 'P(x,y):- Q(x,y)'.
  chain metarule 'P(x,y):- Q(x,z), R(z,y)'.
  true.
  
  ?- learn(s/2).
  s(0,s(A)).
  s(A,B):-s_(A,B).
  s(A,B):-s_(A,C),s(C,B).
  true.
  
  ?- learn(p/2).
  p(s(A),0).
  p(A,B):-p_(A,B).
  p(A,B):-p_(A,C),p(C,B).
  true.

However, in the ordered/3 problem I define p/2 and s/2 by hand so that I can put in ground_peano/1 to avoid infinite recursion when Louise tries to pass two lists to s/2 or p/2 (at that point, their termination conditions never obtain, so they keep recursing).

You can chalk the potential for infinite recursion up as a limitation, you're very welcome- but there are techniques to avoid this and guarantee termination (Knuth-Bendix ordering of the Herbrand base, see ref [1]) which I haven't come round to implementing yet (because they are not necessary given a bit of common sense in defining BK, as above). On the other hand that's actually a feature, in the sense that earlier systems required more specific language bias than the metarules, that would avoid this kind of type-unsafety, but also demanded more expert knowledge from the user. In any case, there's outs.

>> You've just kicked the can down the road; what you've given there cannot solve, for instance, the same problem but with <= instead of <.

That's a different problem. Off we go:

  ?- list_mil_problem([llength/2,shorter/2,ordered_leq/3]).
  Positive examples
  -----------------
  llength([],0).
  llength([a],s(0)).
  shorter([a],[b,c]).
  shorter([1,2],[3,4,5]).
  ordered_leq([a],[b],[d]).
  ordered_leq([a],[b,c],[d,e,f]).
  ordered_leq([a],[c],[e,f,g,h,i]).
  
  Negative examples
  -----------------
  :-ordered([a,b,c],[a,b],[c]).
  
  Background knowledge
  --------------------
  tail/2:
  tail([A|B],B).
  
  p/2:
  p(s(A),0):-ground_peano(A).
  p(s(A),A).
  p(s(A),s(B)):-ground_peano(A),p(A,B).
  
  s/2:
  s(0,s(A)):-ground_peano(A).
  s(A,s(A)).
  s(s(A),s(B)):-ground_peano(B),s(A,B).
  
  leq/2:
  leq(A,A):-ground_peano(A).
  leq(A,B):-s(A,B).
  
  ground_peano/1:
  ground_peano(A):-ground(A),\+is_list(A).
  
  Metarules
  ---------
  abduce metarule 'P(X,Y)'.
  list_rec_func metarule 'P(x,y):- Q(x,z),R(y,u),P(z,u)'.
  list_comp metarule 'P(x,y):- Q(x,z), R(y,u), S(z,u)'.
  triadic_chain metarule 'P(x,z,y):- Q(x,z), R(z,y)'.
  true.
  
  ?- learn_dynamic([llength/2,shorter/2,ordered_leq/3]).
  llength([],0).
  llength(A,B):-tail(A,C),p(B,D),llength(C,D).
  shorter(A,B):-llength(A,C),llength(B,D),leq(C,D).
  ordered_leq(A,B,C):-shorter(A,B),shorter(B,C).
  true.

Or I could have added an eq(X,X) predicate instead of leq/2.

Note that I didn't declare leq/2 as BK for shorter/2 this time around:

  ?- background_knowledge(shorter/2, BK).
  BK = [s/2].

I didn't even change its examples:

  ?- positive_example(shorter/2, E).
  E = shorter([a], [b, c]) ;
  E = shorter([1, 2], [3, 4, 5]).

It picked it up on its own, because it's the best way to define shorter/2 as a sub-program for ordered_leq/3. Aaaw. Isn't it smart?

Pastebins for the source files:

ordered/3 and ordered_leq/3: https://pastebin.com/6NH0VTKK

s/2 and p/2: https://pastebin.com/0d0YWMfV

You'll let me know if I've done something else dumb, yes? :)

________________________

[1] https://www.doc.ic.ac.uk/~atn/papers/metagol_mlj.pdf

See section 4.1 "Ordering the Herbrand Base".

Edit: You know, it just struck me but when you say that only ML has ever worked out of all AI, that probably means you don't recognise Louise as a machine learning system... because it's not deep learning. That's just another instance of the strange synechdoche I was talking about in my first comment in this thread, where to some peoples' knowledge only deep learning is machine learning because that's all some people know of machine learning. A bit like thinking that chicken is the only thing one can eat because all one has ever had is chicken.

Veedrac · on Nov 10, 2020

I don't think you've understood my point. At every step of the way you've put in more effort than writing the program and specifying the examples. Clearly this is not trivial.

If you're actually inferring programs and not just doing a sort of guided line-by-line generalization of a program you had already written, the only things you would need would be

  Positive examples
  -----------------
  ordered([a],[b,c],[d,e,f]).
  
  Negative examples
  -----------------
  ordered([a],[b],[c]).
  ordered([a,b,c],[a,b],[c]).

I was never in doubt that it was possible to write a different program which would ‘solve’ <=. My criticism was that “it's a different problem”.

YeGoblynQueenne · on Nov 11, 2020

Aren't you shifting the goalposts a bit? First you ask me how I'd solve this problem with ILP:

“Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise” (a)

I say this is a trivial problem to solve with ILP and I show you a, well, trivial solution and you complain that it's - trivial.

Then I show you a more elaborate version that learns sub-programs on the way to the full solution and you say that a) it doesn't solve a different problem, with ≤ instead of < and b) that it's not trivial anymore.

And now you're saying you want a solution that learns from examples only. You would have saved us both a lot of time had you clarified your expectations up front.

No matter. There isn't anything that can do what you ask. Or rather, there are many approaches that could learn (a) just from examples, with a brute-force search. But there is no approach that could learn arbitrary programs only from examples. The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.

Most machine learning approaches that learn programs from examples adopt some sort of inductive bias to guide a search for a program that satisfies some set of goodness criteria, including neural approaches [1]. In ILP, inductive bias consists primarily of BK and language bias (like the metarules in Louise). ILP has a certain advantage in this, in that the languages of examples, bias and hypotheses are the same (some first order logic language, like Prolog or ASP) and so ILP systems can learn their own bias, like Louise can learn its own BK and metarules. By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK). Of course you need to start somewhere, from obvious primitives like head/2, tail/2, s/2 and p/2 that I used above.

But I digress. The bottom line is that learning arbitrary programs from examples is a hard problem for any machine learning approach [2]. Classification is a piece of cake, by comparison. And that is why there has been so little progress in this problem even after decades of research [3].

The take home message of course is that neural nets are not the end of the line in AI research and it would be disastrous for the progress of the field to allow research into neural nets to eclipse every other approach. If this happens it will all have to be discovered again, from scratch. And in another 70 years.

____________________

[1] e.g. see Learning explanatory rules from noisy data https://arxiv.org/abs/1711.04574 by DeepMind, which also uses metarules.

[2] See for example:

Deep Learning for Program Synthesis

Synthesizing a program from a specification has been a long-standing challenge.

(...)

This problem is extremely challenging, and the complexity of the synthesized programs by existing approaches is still limited.

https://sunblaze-ucb.github.io/program-synthesis/index.html

[3] This is where I'd normally say that there have been recent breakthroughs that promise to overturn years of slow progress, but that's a story for another time (and another venue most like).

Veedrac · on Nov 11, 2020

> Aren't you shifting the goalposts a bit?

No, I don't think so. My original comment was

“I have looked at the program synthesis literature before and it really does not seem very advanced to me. The General Program Synthesis Benchmark Suite lists unsolved benchmarks like “Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise”, and that's with 100 examples. So, probably less practically useful than GPT-3, which wasn't even trained on the task.”

I only asked about ILP to clarify your defense. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.

> The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.

I'm not looking to solve arbitrary programs. I'm looking, at least at this first hurdle, to solve a few largely trivial ones.

If you saw 100 examples, you would be able to solve this problem, or at least get very close. Therefore this sort of theoretical argument cannot hold.

> By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK).

I assume you've seen GPT-f? (https://arxiv.org/abs/2009.03393)

Section 4.7.1 shows this is entirely feasible for neural networks, though the technique is very different.

Though I prefer even more the elegance (and general absurdity) of learning the bias by pretraining on WebMath (GitHub, arXiv Math, Math StackExchange).

> The bottom line is that learning arbitrary programs from examples is a hard problem

I agree it's hard, but in my mind this sentence completes to “and therefore our only hope is to use most powerful tool we have available, neural networks.”

YeGoblynQueenne · on Nov 11, 2020

>> I only asked about ILP to clarify your defense. I maintain, after seeing the ILP you were referring to, that the defence doesn't meaningfully affect my point, that the problem you claim to be solving is not the one I was talking about, and not meaningfully more interesting.

I must admit I'm very confused by this. I really thought you were talking about the problem of ordering three strings by length. What problem where you talking about, if not that?

Veedrac · on Nov 11, 2020

The problem from the benchmark is automatically learning the function, to check three strings are ordered by length, from examples.

What you are doing is categorically different, because you are manually guiding the search process by telling it which interim lines to generate. This ignores all the interesting parts of the challenge.

YeGoblynQueenne · on Nov 11, 2020

Ah, I see what you mean. You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.

For instance, I did not direct Louise to define shorter/2 by means of length/2. In order to do this I would have to specify length/2 as BK for shorter/2, but I didn't. Note the BK declarations in the experiment file I put on pastebin:

  background_knowledge(llength/2, [tail/2,p/2]).
  background_knowledge(shorter/2, [s/2]).
  % ground_peano/1 is added here so it's reported by list_mil_problem/1
  background_knowledge(ordered/3, [ground_peano/1]).
  background_knowledge(ordered_leq/3, [leq/2,ground_peano/1]).

Remember how in an erarlier comment I pointed out that the definition of shorter/2 changed to use leq/2, instead of s/2, when the examples of ordered_leq/3 where in the BK? Again, Louise figured that on its own.

In fact, this ability of Louise (actually, its learning procedure, Top Program Construction, or TPC) is kind of native, in the sense that TPC was originally conceived as an algorithm to select relevant background knowledge for a different learning system ("Thelma" for "Theory Learning Machine"; hence "Louise"). So it can figure out what BK it needs on its own. Automatic selection of relevant background knowledge in ILP was the original subject of my PhD research. Interestingly, it turns out that once we had a system that could perform this task, it could also learn its own programs.

Veedrac · on Nov 11, 2020

> You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.

No, I get this, it's just not really more than a surface level pretense of choice. The hard thing about generating programs is that there are many possible programs; something like m^n, where ’m’ is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters which here is low enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.

The ‘choice’ you are offering Louise is something like, idk, m=5, n=2. 5^2 is not big. You argue about the terminology, but whatever you call it, it's still exponentially less interesting than the problem first posed. All the interesting work has been done for it, by you.

With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.

YeGoblynQueenne · on Nov 14, 2020

Many apologies for the delay in replying - I missed the "more" link at the bottom of the thread. And here I was, refreshing the page disappointed that no more criticism was forthcoming.

>> The hard thing about generating programs is that there are many possible programs; something like m^n, where ’m’ is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters which here is low enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.

Indeed, the complexity of the raw, combinatorial problem is the greatest hurdle in solving it in the general sense, however this time complexity is calculated somewhat differently than in your comment. Let me show you.

First, in terms of ILP, the "number of functions you have available to use" is the number of predicate symbols defined in the BK, which I'll notate as p. "Where to put the parameters" is the number of body literals (similar to function calls) in each metarule, which I'll notate as k. I'll notate the number of metarules as m.

"The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program. What is relevant is the size of the target theory, i.e. its number of clauses (program lines), because of course a larger program means a larger number of combinations of our building blocks. I'll notate the size of the target theory as n.

Putting it all together, the time complexity of constructing a program of n clauses from p predicate symbols with m metarules with at most k body literals (of any arity) is O(pmᵏ⁺¹)ⁿ [1]. This is an exponential time complexity that corresponds to the size of the search space for programs that can be constructed from these components, i.e. that's the number of constructible programs. The time complexity of the problem is such that even n = 5 is sufficient to completely bog down a powerful modern computer.

Louise can manage it because it doesn't conduct a search of that space, instead it only constructs a unique object in that space, the Top program, that can be constructed in polynomial time O(pmᵏ⁺¹) [2], i.e. the number of constructible clauses. Indeed, Louise is capable of learning large programs, of a few thousand clauses in a few minutes. In other words, the problem is manageable because of the advances encapsulated by Louise's learning procedure, Top Program Construction, not because the problem is trivial, as you portray it - and not because I'm leading Louise by the hand, as you suggest. Even if I was leading Louise by the hand, the combinatorial space of constructible programs would still grow exponentially.

Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom). To my knowledge, this was first discussed in the following:

  1. Introduction
                                                                                 
  This paper addresses a deep difficulty with the generalization problem as
  defined above: If consistency with the training instances is taken as the sole
  determiner of appropriate generalizations, then a program can never make the
  inductive leap necessary to classify instances beyond those it has observed.
  Only if the program has other sources of information, or biases for choosing
  one generalization over the other, can it non-arbitrarily classify instances
  beyond those in the training set. In this paper, we use the term bias to
  refer to any basis for choosing one generalization over another, other than
  strict consistency with the observed training instances.

  (...)

  3. The Futility of Removing Biases

  (...)

  Although removing all biases from a generalization system may seem to be a
  desirable goal, in fact the result is nearly useless. An unbiased learning
  system’s ability to classify new instances is no better than if it simply
  stored all the training instances and performed a lookup when asked to
  classify a subsequent instance.

  Ref: *"The Need for Biases in Learning Generalizations", T.M. Mitchell,
  Rutgers Computer Science Department Technical Report CBM-TR-117, May, 1980.
  Reprinted in Readings in Machine Learning, J. Shavlik and T. Dietterich,
  eds., Morgan Kaufmann, 1990.*

  http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf

But this is another reason to read old AI papers: to avoid falling down the same holes people have already thoroughly explored in years gone by.

>> With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.

Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.

________________

[1] https://www.doc.ic.ac.uk/~shm/Papers/ECAI-546.pdf

See section 2.1. Language classes, expressivity and complexity for a sketch proof.

[2] Upcoming work, currently in review.

Veedrac · on Nov 17, 2020

> "The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program.

I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.

> Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.

How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.

If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.

> Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom).

Yes, and as usual for GOFAI it doesn't solve the problem.

YeGoblynQueenne · on Nov 19, 2020

>> I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.

I understand, but we don't need to take that term into account. We're only interested in the cost of a blind combinatorial search. Even if we added that term in, we'd just get a higher complexity.

>> How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.

I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.

>> If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.

There are only 1280 clauses, but there are about 2 billion programs of size n = 3 that can be constructed with those clauses, one of which is the target program. The challenge is to find the target program in that 2 billion.

The advantage of Louise is that it only needs to look at the 1280 clauses, not the 2 billion programs. The trick is to find how to do that. It's like the difference between sorting a list with bubblesort and quicksort. Quicksort has to do a lot less work, but the trick is figuring out quicksort.

>> Yes, and as usual for GOFAI it doesn't solve the problem.

Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.

Veedrac · on Nov 20, 2020

> I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.

I genuinely don't understand what you expect me to do here. How can I possibly define BK for a problem I have zero information about beyond the type signature?

> There are only 1280 clauses, but there are about 2 billion programs

Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.

> Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.

That's literally what DL is.

You can also learn programs from examples. People do it all the time. What else would you call the Abstraction and Reasoning Challenge, https://www.kaggle.com/boliu0/visualizing-all-task-pairs-wit...?

YeGoblynQueenne · on Nov 20, 2020

Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures, hence for example CNNs are used for image recognition and LSTMs for sequence learning etc. Without these biases deep neural nets would not be able to generalise as well as they do (in the sense of local generalisation but not global generalisation as meant by François Chollet [1]).

The biggest advances in deep neural nets have come from the discovery and use of good inductive biases: training with gradient descent, backpropagation, more hidden layers, the "constant error carousel", convolutional layers, ReLu over sigmoid, attention, etc, etc. One could say that deep neural nets are all about good inductive bias.

It's interesting that you bring up the ARC dataset. The paper that introduced it (also from Chollet) [2] makes a strong claim about the necessity of "knowledge priors" for a system to be considered intelligent. These are described at length in section III.1.2 "Core knowledge priors" and are exactly a set of strong inductive biases that the author of the paper considers necessary for a machine learning system to solve the ARC tasks and that consist of such problem-specific biases as object cohesion, object persistence, object influence via contact, etc. It is exactly such "knowledge priors" that are encoded as background knowledge in ILP systems.

Indeed, in the ARC challenge on Kaggle, the best-performing systems (i.e. the ones that solved the most tasks) were crude approximations of the ILP approach: a library of hand-crafted functions and a brute-force search procedure to combine them. I note also that attempts to use deep learning to solve the challenge didn't go anywhere.

Humans also have strong inductive biases that help us solve such problems. But I'm not the best placed to discuss all this - I'm not a cognitive scientist.

In the end, what you are asking for is magick: a learner that learns only from examples, without any preconceived notions about how to learn from those examples, or what to learn from them. There is no such machine learning system.

>> Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.

I don't understand what you mean "check for arbitrary programs". I can give Louise zero BK and metarules and ask it to generate all Prolog programs, say. Prolog is a Turing complete language so that would give me the set of all programs computable by a Universal Turing Machine (it would take a while). But what would that achieve?

At this point I'm not sure I understand what your remaining objections are against the approach I showed you. For the purpose of learning arbitrary programs it works better than anything else. Of course it's not magick. Perhaps you should take my suggestion to think about the problem a bit more carefully, if you're really intersted in it. Or are you? I mean, if you consider AI solved, e.g. by GPT-3, then I can see how you wouldn't be interested in thinking any further about the issue.

_________________

[1] https://blog.keras.io/the-limitations-of-deep-learning.html

[2] https://arxiv.org/abs/1911.01547

P.S. To clarify, I'm keeping this discussion up for your sake, albeit eagerly. You have expressed some strongly held, but incorrect opinions that it seems to me you have acquired by consulting inexpert sources, probably because you have a day job that has nothing to do with AI and doesn't leave you enough time to study the matter properly. My day job is to study AI and I feel that such a privilege is only justified if I spend time and effort to help others improve their knowledge on the subject. I'm guessing that on your part, you're more interested in "winning" the conversation, but please try to gain something from our interaction, otherwise all the time we both spent at it would be to waste. When this is over, try to dig out and read some authoritative sources. I would advise you on which ones - but you'd probably resist my recommendation anyway, so you're on your own there.

Veedrac · on Nov 20, 2020

My initial response was a fairly kneejerk reaction to the snark. The following is a rewrite. Please don't; if you really think so little of me, rather don't reply than reply unpleasantly.

> Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures

“Solomonoff Induction does not learn only from evidence! It encodes strong inductive biases in its construction and choice of Turing machine...”

but it doesn't matter. Our universe is not a random soup of maximal entropy.

The tasks I am talking about solving are overtly not impossible.

You talk about ML methods like the success of, say, image recognition comes from image-recognition-specific architectures. You mention ‘hand-engineered’ or ‘hand-tuned’. And yet, to throw your snark back at you, if you were up to date with the literature, you would know this is not true.

Consider ViT as an example. The same Transformer, the same minimal inductive biases, work as well for language modelling as for image segmentation as for proof search—the only difference perhaps that ViT works on patches for efficiency, though the paper shows that probably hurts performance in the limit. All it takes is an appropriate quantity of data to learn the appropriate task-specific adaptations the network needs. Heck, even cross-domain works; it's all one architecture, so it's all one inductive bias.

To my mind, this is what it means to learn from examples. There is no way that an architecture designed for language translation could also encode task-specific priors for these different tasks.

For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a complete different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not, they're tuned per example.

ML people call that data leakage.

> I don't understand what you mean. Yes, Louise can check for arbitrary programs. I can give it zero BK and metarules and ask it to generate all Prolog programs, say.

Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.

YeGoblynQueenne · on Nov 20, 2020

I see I made you feel bad with my advice to read up. I'm sorry, because that was not my intention. However, you really do need to take my advise seriously. You've insisted throughout our conversation that you don't need to read older machine learning or AI papers because they're not relevant anymore. And yet, they are. And you do need to read them because without them you will not be able to understand the recent developments you seem to be intersted in.

Take for instance your example of ViT. This is a transformer, so it's clearly not an unbiased generaliser that learns only from examples. You say so yourself: "it's all one inductive bias". Yes, that's how machine learning works and deep neural nets don't do anything different, neither do they learn only from examples, as you seemed to suggest in your previous comment (you replied "That's literally what DL is" to my comment that "you can't learn only from examples").

But I think you misunderstood my comment about how the biggest advances in deep neural nets have come from purpose-built architectures. That is not to say that the same architectures cannot be applied to different domains- but the state of the art systems are always fine-tuned for specific tasks or datasets. This hasn't changed recently and it hasn't changed in the last 30 years.

>> For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a complete different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not, they're tuned per example.

A truly bias-free algorithm is not "random bytes". It's a learner that memorises its training examples and can only recognise its training examples. Hence why it can't generalise. This is in Mitchell's paper which I suggested you read.

Louise's biases are not problem-specific in the short example I showed you. I defined BK predicates with wide applicability in programs processing lists and numbers. There is no such limitation, theoretical or practical, in the general sense, either. You can give Louise a million irrelevant BK predicates, if you like, and it will still find the ones it needs to complete the learning task assuming they're in there somewhere. In fact, it will find all of the relevant ones - and return the superset of all programs that solve the task (so you can use it for example to identify interesting relations in your dataset). Like I say in a previous comment, Louise's learning algorithm was originally designed to select relevant BK. Additionally, like I said in an earlier comment, Louise can learn its own bias, both the BK and the metarules, so it is not only not limited to task-specific bias, it is not even limited to user-provided bias. Under some circumstances it can even invent new examples. And then use them to learn a hypothesis that generalises better to unseen examples. *

>> Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.

I don't follow. What do you mean?

Veedrac · on Nov 20, 2020

> Louise's biases are not problem-specific in the short example I showed you.

This is clearly untrue.

You were customizing the BK to each specific task. You were also customizing the stepping stones for each specific task.

Justifications can come later. At least admit that you customized the BK for each problem instance and prior to doing so the solver did not solve the problem asked.

Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.

YeGoblynQueenne · on Nov 21, 2020

I did not "customize the BK to each specific task". You can go back and see what I did. I provided some generic BK predicates that manipulate lists and numbers, I defined some metarules and I gave a few examples of each program's inputs and outputs.

I don't understand your criticism and I don't understand what you want me to "at least admit".

>> This is clearly untrue.

Can you show me which biases in the example I showed are problem-specific?

>> Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.

I don't think I missed your point. I think you, yourself, are horribly confused about what point you are trying to make. And the reason of course is that you want to be able to express strong opinions about AI and machine learning, but you don't want to have to do the hard work to understand the subject. So you keep saying "five impossible things before breakfast", like asking for a learner that learns only from examples, or saying that's what deep learning is, etc.

I'm sorry but despite what the article above suggests, there is't an easy way to being an expert- not even in machine learning. If you want to know what you're talking about, then you'll have to do your homework.

YeGoblynQueenne · on Nov 14, 2020

I can't edit my post but I forgot to calculate the actual size of the hypothesis space for the multi-predicate problem for ordered/3. That is: p = 5 (tail/2, p/2, s/2 and llength/2 and shorter/2), m = 4, k = 3, n = 3 (the target theory is a clause for each target). The size of the hypothesis search space, i.e. the total number of programs of size n = 3 is (pm^(k+1))^n = 2,097,152,000.

As I say above, Louise's TPC procedure avoids searching this space and so effectively ignores the exponential term, reducing the complexity of the problem to that of, in the worst case, enumerating (pm^(k+1)) = 1,280 clauses.

In other words, the problem is easy for Louise, not because the problem itself is trivial, but because Louise's learning procedure, TPC, is efficient.

For a further example, in the upcoming publication I mention above, Louise is shown to learn a 2,567 clause theory in under 220 seconds with perfect accuracy after training on 20% of all training examples. The hypothesis space for this problem (grid world navigation) is in the order of 2*10^4944 but Louise shrinks it to the problem of enumerating, at worst, a little over 81 million clauses.