
Aren't you shifting the goalposts a bit? First you ask me how I'd solve this problem with ILP:

“Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise” (a)
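(For concreteness, here is what a hand-written Prolog solution to that target might look like, assuming the strings are represented as character lists; the point of the benchmark is of course to learn something equivalent automatically from examples, not to write it by hand:)

  % Reference solution, written by hand, not learned:
  ordered(N1, N2, N3) :-
      length(N1, L1),
      length(N2, L2),
      length(N3, L3),
      L1 < L2,
      L2 < L3.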

I say this is a trivial problem to solve with ILP and I show you a, well, trivial solution and you complain that it's - trivial.

Then I show you a more elaborate version that learns sub-programs on the way to the full solution and you say that a) it doesn't solve a different problem (the one with ≤ instead of <) and b) that it's not trivial anymore.

And now you're saying you want a solution that learns from examples only. You would have saved us both a lot of time had you clarified your expectations up front.

No matter. There isn't anything that can do what you ask. Or rather, there are many approaches that could learn (a) just from examples, with a brute-force search. But there is no approach that could learn arbitrary programs only from examples. The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.

Most machine learning approaches that learn programs from examples adopt some sort of inductive bias to guide a search for a program that satisfies some set of goodness criteria, including neural approaches [1]. In ILP, inductive bias consists primarily of BK and language bias (like the metarules in Louise). ILP has a certain advantage in this, in that the languages of examples, bias and hypotheses are the same (some first order logic language, like Prolog or ASP) and so ILP systems can learn their own bias, like Louise can learn its own BK and metarules. By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK). Of course you need to start somewhere, from obvious primitives like head/2, tail/2, s/2 and p/2 that I used above.
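(To give a rough idea of what such primitives look like, here is a sketch; these are my guesses at plausible definitions, assuming Peano-encoded numbers, not necessarily the exact predicates used in the experiment file:)

  head([H|_], H).   % head/2: H is the first element of the list
  tail([_|T], T).   % tail/2: T is the list without its first element
  s(N, s(N)).       % s/2: the successor of a Peano-encoded number
  p(s(N), N).       % p/2: the predecessor of a Peano-encoded number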

But I digress. The bottom line is that learning arbitrary programs from examples is a hard problem for any machine learning approach [2]. Classification is a piece of cake, by comparison. And that is why there has been so little progress in this problem even after decades of research [3].

The take home message of course is that neural nets are not the end of the line in AI research and it would be disastrous for the progress of the field to allow research into neural nets to eclipse every other approach. If this happens it will all have to be discovered again, from scratch. And in another 70 years.

____________________

[1] e.g. see Learning explanatory rules from noisy data https://arxiv.org/abs/1711.04574 by DeepMind, which also uses metarules.

[2] See for example:

Deep Learning for Program Synthesis

Synthesizing a program from a specification has been a long-standing challenge.

(...)

This problem is extremely challenging, and the complexity of the synthesized programs by existing approaches is still limited.

https://sunblaze-ucb.github.io/program-synthesis/index.html

[3] This is where I'd normally say that there have been recent breakthroughs that promise to overturn years of slow progress, but that's a story for another time (and another venue, most likely).



> Aren't you shifting the goalposts a bit?

No, I don't think so. My original comment was

“I have looked at the program synthesis literature before and it really does not seem very advanced to me. The General Program Synthesis Benchmark Suite lists unsolved benchmarks like “Given three strings n1, n2, and n3, return true if length(n1) < length(n2) < length(n3), and false otherwise”, and that's with 100 examples. So, probably less practically useful than GPT-3, which wasn't even trained on the task.”

I only asked about ILP to clarify your defense. Having seen the ILP you were referring to, I maintain that the defense doesn't meaningfully affect my point: the problem you claim to be solving is not the one I was talking about, and it is not meaningfully more interesting.

> The reason is that the space of all programs that can be computed by a Universal Turing Machine ("arbitrary") is infinite and any learner trying to find one of them blindly, without some kind of hint to guide it, would be lost for ever inside it.

I'm not looking to learn arbitrary programs. I'm looking, at least at this first hurdle, to learn a few largely trivial ones.

If you saw 100 examples, you would be able to solve this problem, or at least get very close. Therefore this sort of theoretical argument cannot hold.

> By way of comparison, neural nets, with their hand-crafted architectures, minutely fine-tuned to specific domains or even particular datasets, cannot do that (e.g. a trained model can't be used as a feature to another neural net, in the way that ILP hypotheses can be used as BK).

I assume you've seen GPT-f? (https://arxiv.org/abs/2009.03393)

Section 4.7.1 shows this is entirely feasible for neural networks, though the technique is very different.

Though I prefer even more the elegance (and general absurdity) of learning the bias by pretraining on WebMath (GitHub, arXiv Math, Math StackExchange).

> The bottom line is that learning arbitrary programs from examples is a hard problem

I agree it's hard, but in my mind this sentence completes to “and therefore our only hope is to use the most powerful tool we have available, neural networks.”


>> I only asked about ILP to clarify your defense. Having seen the ILP you were referring to, I maintain that the defense doesn't meaningfully affect my point: the problem you claim to be solving is not the one I was talking about, and it is not meaningfully more interesting.

I must admit I'm very confused by this. I really thought you were talking about the problem of ordering three strings by length. What problem were you talking about, if not that?


The problem from the benchmark is automatically learning the function, to check three strings are ordered by length, from examples.

What you are doing is categorically different, because you are manually guiding the search process by telling it which interim lines to generate. This ignores all the interesting parts of the challenge.


Ah, I see what you mean. You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.

For instance, I did not direct Louise to define shorter/2 by means of llength/2. In order to do this I would have to specify llength/2 as BK for shorter/2, but I didn't. Note the BK declarations in the experiment file I put on pastebin:

  background_knowledge(llength/2, [tail/2,p/2]).
  background_knowledge(shorter/2, [s/2]).
  % ground_peano/1 is added here so it's reported by list_mil_problem/1
  background_knowledge(ordered/3, [ground_peano/1]).
  background_knowledge(ordered_leq/3, [leq/2,ground_peano/1]).

Remember how in an earlier comment I pointed out that the definition of shorter/2 changed to use leq/2, instead of s/2, when the examples of ordered_leq/3 were in the BK? Again, Louise figured that out on its own.

In fact, this ability of Louise (actually, its learning procedure, Top Program Construction, or TPC) is kind of native, in the sense that TPC was originally conceived as an algorithm to select relevant background knowledge for a different learning system ("Thelma" for "Theory Learning Machine"; hence "Louise"). So it can figure out what BK it needs on its own. Automatic selection of relevant background knowledge in ILP was the original subject of my PhD research. Interestingly, it turns out that once we had a system that could perform this task, it could also learn its own programs.


> You mean that I explicitly told Louise how to compose each program from its sub-programs. In truth, I did not. In the incremental learning problems for ordered/3 and ordered_leq/3 I gave Louise three learning targets and a few primitives from which to construct them. I specifically and very deliberately did not tell it to use each target to define another. It figured it out on its own.

No, I get this, it's just not really more than a surface level pretense of choice. The hard thing about generating programs is that there are many possible programs; something like m^n, where ’m’ is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters which here is low enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.

The ‘choice’ you are offering Louise is something like, idk, m=5, n=2. 5^2 is not big. You argue about the terminology, but whatever you call it, it's still exponentially less interesting than the problem first posed. All the interesting work has been done for it, by you.
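(Spelling the arithmetic out, as a quick check at a Prolog prompt, purely to illustrate the scale of the two searches:)

  ?- X is 1000^5.   % a blind search over ~1000 primitives and ~5 steps
  X = 1000000000000000.

  ?- X is 5^2.      % the toy search I'm saying Louise was actually offered
  X = 25.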

With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.


Many apologies for the delay in replying - I missed the "more" link at the bottom of the thread. And here I was, refreshing the page disappointed that no more criticism was forthcoming.

>> The hard thing about generating programs is that there are many possible programs; something like m^n, where ’m’ is the number of functions you have available to use (say, ~1000) and `n` is the number of steps the program needs to take (say, ~5 in this case), and there's another factor for where to put the parameters which here is low enough to be mostly negligible. It turns out even 1000^5 is really big, so this problem is hard if you don't do it smart.

Indeed, the complexity of the raw, combinatorial problem is the greatest hurdle in solving it in the general sense; however, this time complexity is calculated somewhat differently than in your comment. Let me show you.

First, in terms of ILP, the "number of functions you have available to use" is the number of predicate symbols defined in the BK, which I'll notate as p. "Where to put the parameters" is the number of body literals (similar to function calls) in each metarule, which I'll notate as k. I'll notate the number of metarules as m.
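(For readers who haven't met the term: a metarule is a second-order clause template whose predicate symbols are variables. The standard 'chain' metarule, for instance, has two body literals; the sketch below shows its logical form and one possible first-order instantiation with made-up predicates, and is not Louise's concrete declaration syntax:)

  % The 'chain' metarule: P, Q, R are second-order variables that the
  % learner instantiates with predicate symbols; here k = 2:
  %
  %   P(X,Y) :- Q(X,Z), R(Z,Y).
  %
  % A possible first-order instantiation, with hypothetical predicates:
  parent(alice, bob).
  parent(bob, carol).
  grandparent(X, Y) :- parent(X, Z), parent(Z, Y).
  % ?- grandparent(alice, carol).   % succeeds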

"The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program. What is relevant is the size of the target theory, i.e. its number of clauses (program lines), because of course a larger program means a larger number of combinations of our building blocks. I'll notate the size of the target theory as n.

Putting it all together, the time complexity of constructing a program of n clauses from p predicate symbols with m metarules with at most k body literals (of any arity) is O((pmᵏ⁺¹)ⁿ) [1]. This is an exponential time complexity that corresponds to the size of the search space for programs that can be constructed from these components, i.e. that's the number of constructible programs. The time complexity of the problem is such that even n = 5 is sufficient to completely bog down a powerful modern computer.

Louise can manage it because it doesn't conduct a search of that space; instead, it constructs a unique object in that space, the Top program, which can be constructed in polynomial time O(pmᵏ⁺¹) [2], i.e. the number of constructible clauses. Indeed, Louise is capable of learning large programs, of a few thousand clauses, in a few minutes. In other words, the problem is manageable because of the advances encapsulated by Louise's learning procedure, Top Program Construction, not because the problem is trivial, as you portray it - and not because I'm leading Louise by the hand, as you suggest. Even if I was leading Louise by the hand, the combinatorial space of constructible programs would still grow exponentially.

Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom). To my knowledge, this was first discussed in the following:

  1. Introduction
                                                                                 
  This paper addresses a deep difficulty with the generalization problem as
  defined above: If consistency with the training instances is taken as the sole
  determiner of appropriate generalizations, then a program can never make the
  inductive leap necessary to classify instances beyond those it has observed.
  Only if the program has other sources of information, or biases for choosing
  one generalization over the other, can it non-arbitrarily classify instances
  beyond those in the training set. In this paper, we use the term bias to
  refer to any basis for choosing one generalization over another, other than
  strict consistency with the observed training instances.

  (...)

  3. The Futility of Removing Biases

  (...)

  Although removing all biases from a generalization system may seem to be a
  desirable goal, in fact the result is nearly useless. An unbiased learning
  system’s ability to classify new instances is no better than if it simply
  stored all the training instances and performed a lookup when asked to
  classify a subsequent instance.

  Ref: *"The Need for Biases in Learning Generalizations", T.M. Mitchell,
  Rutgers Computer Science Department Technical Report CBM-TR-117, May, 1980.
  Reprinted in Readings in Machine Learning, J. Shavlik and T. Dietterich,
  eds., Morgan Kaufmann, 1990.*

  http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf

But this is another reason to read old AI papers: to avoid falling down the same holes people have already thoroughly explored in years gone by.

>> With the way you first laid out the question, there's a good chance (>1%) I could have gotten the answer mostly right (up to parameter order) without looking at the examples, just the background knowledge and the target type.

Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.

________________

[1] https://www.doc.ic.ac.uk/~shm/Papers/ECAI-546.pdf

See section 2.1. Language classes, expressivity and complexity for a sketch proof.

[2] Upcoming work, currently in review.


> "The number of steps the program needs to take" is not relevant to the calculation: we are trying to calculate the complexity of constructing the program by blindly combining a set of building blocks (BK predicates and metarules)- not the complexity of executing the program.

I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.

> Have you tried doing that? I suggest you do- if only to get a feel for the true difficulty of the problem.

How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.

If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.

> Regarding learning "only from examples" as I understand you to mean it, there is some literature on that, of the kind you say is not "AI" (i.e. it predates 2012's deep learning boom).

Yes, and as usual for GOFAI it doesn't solve the problem.


>> I meant in the sense of denotational semantics, so looping (or rather, finding a fixed point of a loop) is one ‘step’.

I understand, but we don't need to take that term into account. We're only interested in the cost of a blind combinatorial search. Even if we added that term in, we'd just get a higher complexity.

>> How exactly would you suggest? By the nature of my criticism, I can't construct the Louise BK blind.

I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.

>> If anything you agree here; you say there are only 1,280 clauses it could construct, so I only need discriminatory power of 1-in-13 to have a >1% chance of getting the answer right.

There are only 1280 clauses, but there are about 2 billion programs of size n = 3 that can be constructed with those clauses, one of which is the target program. The challenge is to find the target program in that 2 billion.

The advantage of Louise is that it only needs to look at the 1280 clauses, not the 2 billion programs. The trick is to find how to do that. It's like the difference between sorting a list with bubblesort and quicksort. Quicksort has to do a lot less work, but the trick is figuring out quicksort.

>> Yes, and as usual for GOFAI it doesn't solve the problem.

Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.


> I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.

I genuinely don't understand what you expect me to do here. How can I possibly define BK for a problem I have zero information about beyond the type signature?

> There are only 1280 clauses, but there are about 2 billion programs

Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.

> Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.

That's literally what DL is.

You can also learn programs from examples. People do it all the time. What else would you call the Abstraction and Reasoning Challenge, https://www.kaggle.com/boliu0/visualizing-all-task-pairs-wit...?


Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures; hence, for example, CNNs are used for image recognition and LSTMs for sequence learning, etc. Without these biases deep neural nets would not be able to generalise as well as they do (in the sense of local generalisation, but not global generalisation, as meant by François Chollet [1]).

The biggest advances in deep neural nets have come from the discovery and use of good inductive biases: training with gradient descent, backpropagation, more hidden layers, the "constant error carousel", convolutional layers, ReLu over sigmoid, attention, etc, etc. One could say that deep neural nets are all about good inductive bias.

It's interesting that you bring up the ARC dataset. The paper that introduced it (also from Chollet) [2] makes a strong claim about the necessity of "knowledge priors" for a system to be considered intelligent. These are described at length in section III.1.2 "Core knowledge priors" and are exactly a set of strong inductive biases that the author of the paper considers necessary for a machine learning system to solve the ARC tasks and that consist of such problem-specific biases as object cohesion, object persistence, object influence via contact, etc. It is exactly such "knowledge priors" that are encoded as background knowledge in ILP systems.

Indeed, in the ARC challenge on Kaggle, the best-performing systems (i.e. the ones that solved the most tasks) were crude approximations of the ILP approach: a library of hand-crafted functions and a brute-force search procedure to combine them. I note also that attempts to use deep learning to solve the challenge didn't go anywhere.
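(A minimal sketch of that "library of hand-crafted functions plus brute-force search" idea, with toy list-manipulating primitives from the SWI-Prolog standard library standing in for the grid functions of the actual Kaggle entries:)

  % Toy primitive library.
  primitive(reverse).
  primitive(msort).
  primitive(sort).

  % search(+Depth, +In, +Out, -Prog): find a composition of at most Depth
  % primitives that maps the input example In to the output example Out.
  search(Depth, In, Out, Prog) :-
      between(0, Depth, Len),
      length(Prog, Len),
      maplist(primitive, Prog),
      apply_all(Prog, In, Out).

  % apply_all(+Fs, +X, -Z): apply each function in Fs, left to right.
  apply_all([], X, X).
  apply_all([F|Fs], X, Z) :- call(F, X, Y), apply_all(Fs, Y, Z).

  % ?- search(2, [3,1,2,2], [3,2,2,1], P).
  % P = [msort, reverse].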

Humans also have strong inductive biases that help us solve such problems. But I'm not the best placed to discuss all this - I'm not a cognitive scientist.

In the end, what you are asking for is magick: a learner that learns only from examples, without any preconceived notions about how to learn from those examples, or what to learn from them. There is no such machine learning system.

>> Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.

I don't understand what you mean by "check for arbitrary programs". I can give Louise zero BK and metarules and ask it to generate all Prolog programs, say. Prolog is a Turing complete language so that would give me the set of all programs computable by a Universal Turing Machine (it would take a while). But what would that achieve?

At this point I'm not sure I understand what your remaining objections are against the approach I showed you. For the purpose of learning arbitrary programs it works better than anything else. Of course it's not magick. Perhaps you should take my suggestion to think about the problem a bit more carefully, if you're really interested in it. Or are you? I mean, if you consider AI solved, e.g. by GPT-3, then I can see how you wouldn't be interested in thinking any further about the issue.

_________________

[1] https://blog.keras.io/the-limitations-of-deep-learning.html

[2] https://arxiv.org/abs/1911.01547

P.S. To clarify, I'm keeping this discussion up for your sake, albeit not eagerly. You have expressed some strongly held, but incorrect opinions that it seems to me you have acquired by consulting inexpert sources, probably because you have a day job that has nothing to do with AI and doesn't leave you enough time to study the matter properly. My day job is to study AI and I feel that such a privilege is only justified if I spend time and effort to help others improve their knowledge on the subject. I'm guessing that on your part, you're more interested in "winning" the conversation, but please try to gain something from our interaction, otherwise all the time we both spent on it would go to waste. When this is over, try to dig out and read some authoritative sources. I would advise you on which ones - but you'd probably resist my recommendation anyway, so you're on your own there.


My initial response was a fairly kneejerk reaction to the snark. The following is a rewrite. Please don't do that; if you really think so little of me, rather don't reply at all than reply unpleasantly.

> Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures

“Solomonoff Induction does not learn only from evidence! It encodes strong inductive biases in its construction and choice of Turing machine...”

but it doesn't matter. Our universe is not a random soup of maximal entropy.

The tasks I am talking about solving are overtly not impossible.

You talk about ML methods as if the success of, say, image recognition came from image-recognition-specific architectures. You mention ‘hand-engineered’ or ‘hand-tuned’. And yet, to throw your snark back at you, if you were up to date with the literature, you would know this is not true.

Consider ViT as an example. The same Transformer, the same minimal inductive biases, work as well for language modelling as for image segmentation as for proof search—the only difference perhaps being that ViT works on patches for efficiency, though the paper shows that probably hurts performance in the limit. All it takes is an appropriate quantity of data to learn the appropriate task-specific adaptations the network needs. Heck, even cross-domain transfer works; it's all one architecture, so it's all one inductive bias.

To my mind, this is what it means to learn from examples. There is no way that an architecture designed for language translation could also encode task-specific priors for these different tasks.

For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a completely different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task-specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not, they're tuned per example.

ML people call that data leakage.

> I don't understand what you mean. Yes, Louise can check for arbitrary programs. I can give it zero BK and metarules and ask it to generate all Prolog programs, say.

Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.


I see I made you feel bad with my advice to read up. I'm sorry, because that was not my intention. However, you really do need to take my advice seriously. You've insisted throughout our conversation that you don't need to read older machine learning or AI papers because they're not relevant anymore. And yet, they are. And you do need to read them, because without them you will not be able to understand the recent developments you seem to be interested in.

Take for instance your example of ViT. This is a transformer, so it's clearly not an unbiased generaliser that learns only from examples. You say so yourself: "it's all one inductive bias". Yes, that's how machine learning works; deep neural nets don't do anything different, nor do they learn only from examples, as you seemed to suggest in your previous comment (you replied "That's literally what DL is" to my comment that "you can't learn only from examples").

But I think you misunderstood my comment about how the biggest advances in deep neural nets have come from purpose-built architectures. That is not to say that the same architectures cannot be applied to different domains- but the state of the art systems are always fine-tuned for specific tasks or datasets. This hasn't changed recently and it hasn't changed in the last 30 years.

>> For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a completely different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task-specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not, they're tuned per example.

A truly bias-free algorithm is not "random bytes". It's a learner that memorises its training examples and can only recognise its training examples. Hence why it can't generalise. This is in Mitchell's paper which I suggested you read.

Louise's biases are not problem-specific in the short example I showed you. I defined BK predicates with wide applicability in programs processing lists and numbers. There is no such limitation, theoretical or practical, in the general sense, either. You can give Louise a million irrelevant BK predicates, if you like, and it will still find the ones it needs to complete the learning task, assuming they're in there somewhere. In fact, it will find all of the relevant ones - and return the superset of all programs that solve the task (so you can use it, for example, to identify interesting relations in your dataset).

Like I say in a previous comment, Louise's learning algorithm was originally designed to select relevant BK. Additionally, like I said in an earlier comment, Louise can learn its own bias, both the BK and the metarules, so it is not only not limited to task-specific bias, it is not even limited to user-provided bias. Under some circumstances it can even invent new examples, and then use them to learn a hypothesis that generalises better to unseen examples.
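To make that concrete, here is a made-up sketch in the same experiment-file notation as the declarations I pasted earlier; the extra predicates are arbitrary list-processing predicates, thrown in purely to stand for irrelevant BK:

  % Hypothetical declaration mixing relevant and irrelevant BK; TPC is
  % expected to pick out tail/2 and p/2 and ignore the rest.
  background_knowledge(llength/2, [tail/2, p/2,
                                   reverse/2, last/2, msort/2, sum_list/2]).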

>> Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.

I don't follow. What do you mean?


> Louise's biases are not problem-specific in the short example I showed you.

This is clearly untrue.

You were customizing the BK to each specific task. You were also customizing the stepping stones for each specific task.

Justifications can come later. At least admit that you customized the BK for each problem instance and prior to doing so the solver did not solve the problem asked.

Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.


I did not "customize the BK to each specific task". You can go back and see what I did. I provided some generic BK predicates that manipulate lists and numbers, I defined some metarules and I gave a few examples of each program's inputs and outputs.

I don't understand your criticism and I don't understand what you want me to "at least admit".

>> This is clearly untrue.

Can you show me which biases in the example I showed are problem-specific?

>> Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.

I don't think I missed your point. I think you, yourself, are horribly confused about what point you are trying to make. And the reason of course is that you want to be able to express strong opinions about AI and machine learning, but you don't want to have to do the hard work to understand the subject. So you keep saying "five impossible things before breakfast", like asking for a learner that learns only from examples, or saying that's what deep learning is, etc.

I'm sorry, but despite what the article above suggests, there isn't an easy way to become an expert - not even in machine learning. If you want to know what you're talking about, then you'll have to do your homework.


As I said, rather don't reply than reply unpleasantly. I'm cutting this here.


I don't know why you need to reply unpleasantly. It should be possible to give and receive criticism, even strong criticism, without having it turn into a flamewar just because we're on the internet.

Indeed, you yourself have criticised my work and my field mercilessly in this thread and I did not once reply with unpleasantness. In fact, what you keep dismissing as irrelevant and basically cheating (Louise) is my PhD research. I would be well within reason to be defensive about it. Instead, I believe I have remained polite and respectful towards you throughout and strove to answer all your questions about it.

Although you did take my criticism as snark, so this is perhaps something that is not entirely objective - you might perceive my criticism as a personal attack, say. Again, this should not be the case. In my field of work, criticism is what makes your work better and without criticism one never improves. So I do mean it when I say that my contribution to this thread was for your sake and to help you improve your knowledge of a subject you seem to be interested in.

In any case, I'm sorry this conversation turned sour. I didn't want to make you upset and I apologise for having done so.


> I believe I have remained polite and respectful towards you throughout

I disagree. I do not mind in the slightest being told I am wrong, or having my ideas criticized. But calling me too stupid to understand my own point, or too intellectually lazy to want to understand a subject, or talking down to me like a child—that is not kosher. This conversation is not worth being attacked over, or having my day made unpleasant, because you choose not to resist the impulse to throw insults.

To the other side of things, it might help calm you to know I never much considered what I was saying a criticism of Louise. Louise, from what I can tell, is fine, and an interesting take on the task. What I was objecting to was only the way you used it in the argument. A bike is cheating if you bring it to a 100 metre sprint, but that doesn't mean bikes serve no purpose. E.g. I do not consider SAT solvers particularly relevant to AI progress, but one can hardly deny they are quality tools.


As far as I can tell, I did not talk down to you as to a child, and I certainly did not call you intellectually lazy or stupid. I criticised the fact that you don't want to put in the hard work to understand the subject you are discussing, which is what you have stated from the start of the conversation, claiming you don't need to read up on the history of AI because it is not relevant (I'm paraphrasing your point but correct me if I misunderstood it).

It seems to me I am right to think that you took my criticism as an insult to your faculties. If I say something wrong, I expect to be corrected and criticised if I insist on it, but I don't take that as an insult.

>> To the other side of things, it might help calm you to know I never much considered what I was saying a criticism of Louise.

And still you persist with the same style of commenting. "Calm" me? And you complain that I talk down to you? You have replied to my original comment with arrogance to tell me that my entire field of study is "not AI" and irrelevant - and then continued to insist you don't need to know anything about the ~70 years of work you dismiss even when it became clear that this only causes you to make elementary errors. You speak of things you know nothing about with great conviction and then you get upset with me for pointing out this can only result in errors and confusion. Given all that, I have shown great patience and courtesy. Others would have just ignored you as ignorant and unwilling to learn. I gave you the benefit of the doubt. Was that a mistake?


> "Calm" me?

A poor choice of words, sorry. I meant, I understood you to be saying you found the criticism of Louise unpleasant, and I thought it would lessen that to know that I didn't and don't think Louise was bad.


I can't edit my post but I forgot to calculate the actual size of the hypothesis space for the multi-predicate problem for ordered/3. That is: p = 5 (tail/2, p/2, s/2, llength/2 and shorter/2), m = 4, k = 3, n = 3 (the target theory is a clause for each target). The size of the hypothesis search space, i.e. the total number of programs of size n = 3, is (pm^(k+1))^n = 2,097,152,000.

As I say above, Louise's TPC procedure avoids searching this space and so effectively ignores the exponential term, reducing the complexity of the problem to that of, in the worst case, enumerating (pm^(k+1)) = 1,280 clauses.
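(As a quick sanity check of those figures, at a Prolog prompt:)

  ?- Clauses is 5 * 4^(3+1),   % pm^(k+1), with p = 5, m = 4, k = 3
     Programs is Clauses^3.    % (pm^(k+1))^n, with n = 3
  Clauses = 1280,
  Programs = 2097152000.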

In other words, the problem is easy for Louise, not because the problem itself is trivial, but because Louise's learning procedure, TPC, is efficient.

For a further example, in the upcoming publication I mention above, Louise is shown to learn a 2,567 clause theory in under 220 seconds with perfect accuracy after training on 20% of all training examples. The hypothesis space for this problem (grid world navigation) is in the order of 2*10^4944 but Louise shrinks it to the problem of enumerating, at worst, a little over 81 million clauses.



