> I mean, try to define some BK and generate combinations of it until you solve the problem. You say that I made the problem easy because I defined some BK by hand. I suggested you try doing that to see whether the problem is as easy as you think.
I genuinely don't understand what you expect me to do here. How can I possibly define BK for a problem I have zero information about beyond the type signature?
> There are only 1280 clauses, but there are about 2 billion programs
Doesn't matter, since the programmer gives that information. As you admit, Louise cannot check for arbitrary programs.
> Deep learning doesn't "solve" that problem either- the point is that you can't learn only from examples.

That's literally what DL is.

You can also learn programs from examples. People do it all the time. What else would you call the Abstraction and Reasoning Challenge, https://www.kaggle.com/boliu0/visualizing-all-task-pairs-wit...?
Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures; hence, for example, CNNs are used for image recognition and LSTMs for sequence learning. Without these biases, deep neural nets would not be able to generalise as well as they do (in the sense of local generalisation, though not global generalisation as meant by François Chollet [1]).
The biggest advances in deep neural nets have come from the discovery and use of good inductive biases: training with gradient descent, backpropagation, more hidden layers, the "constant error carousel", convolutional layers, ReLU over sigmoid, attention, and so on. One could say that deep neural nets are all about good inductive bias.
It's interesting that you bring up the ARC dataset. The paper that introduced it (also by Chollet) [2] makes a strong claim about the necessity of "knowledge priors" for a system to be considered intelligent. These are described at length in section III.1.2, "Core knowledge priors". They are exactly a set of strong inductive biases that the author considers necessary for a machine learning system to solve the ARC tasks, and they consist of such problem-specific biases as object cohesion, object persistence, object influence via contact, etc. It is exactly such "knowledge priors" that are encoded as background knowledge in ILP systems.
Indeed, in the ARC challenge on Kaggle, the best-performing systems (i.e. the
ones that solved the most tasks) were crude approximations of the ILP
approach: a library of hand-crafted functions and a brute-force search
procedure to combine them. I note also that attempts to use deep learning to
solve the challenge didn't go anywhere.
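For a flavour of what those systems looked like, here is a minimal Prolog sketch of the "function library plus brute-force search" pattern. It is not any actual Kaggle entry: the names (primitive/1, apply_op/3, solve/2) and the toy grid operations are mine, standing in for a real library of hundreds of hand-crafted transformations.

    % Hypothetical library of hand-crafted grid transformations.
    % Grids are lists of rows (lists of cells).
    primitive(identity).
    primitive(flip_h).
    primitive(flip_v).

    apply_op(identity, G, G).
    apply_op(flip_h, G1, G2) :- maplist(reverse, G1, G2). % reverse each row
    apply_op(flip_v, G1, G2) :- reverse(G1, G2).          % reverse row order

    % Apply a pipeline of primitives left to right.
    apply_pipeline([], G, G).
    apply_pipeline([Op|Ops], G1, G3) :-
        apply_op(Op, G1, G2),
        apply_pipeline(Ops, G2, G3).

    % Brute-force search: try ever-longer pipelines until one maps
    % every training input grid to its output grid.
    solve(Pairs, Pipeline) :-
        length(Pipeline, _),           % iterative deepening on length
        maplist(primitive, Pipeline),
        forall(member(In-Out, Pairs),
               apply_pipeline(Pipeline, In, Out)).

E.g. solve([[[1,2],[3,4]]-[[2,1],[4,3]]], P) returns P = [flip_h]. The real entries had far richer primitives, but the shape of the search is the same: hand-crafted background knowledge plus exhaustive composition.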
Humans also have strong inductive biases that help us solve such problems. But
I'm not the best placed to discuss all this - I'm not a cognitive scientist.
In the end, what you are asking for is magick: a learner that learns only from
examples, without any preconceived notions about how to learn from those
examples, or what to learn from them. There is no such machine learning
system.
>> Doesn't matter, since the programmer gives that information. As you admit,
Louise cannot check for arbitrary programs.
I don't understand what you mean "check for arbitrary programs". I can give
Louise zero BK and metarules and ask it to generate all Prolog programs, say.
Prolog is a Turing complete language so that would give me the set of all
programs computable by a Universal Turing Machine (it would take a while). But
what would that achieve?
At this point I'm not sure I understand what objections you still have against the approach I showed you. For the purpose of learning arbitrary
programs it works better than anything else. Of course it's not magick.
Perhaps you should take my suggestion to think about the problem a bit more carefully, if you're really interested in it. Or are you? I mean, if you consider AI solved, e.g. by GPT-3, then I can see how you wouldn't be interested in thinking any further about the issue.
P.S. To clarify, I'm keeping this discussion up for your sake, though not eagerly.
You have expressed some strongly held, but incorrect opinions that it seems to
me you have acquired by consulting inexpert sources, probably because you have
a day job that has nothing to do with AI and doesn't leave you enough time to
study the matter properly. My day job is to study AI and I feel that such a
privilege is only justified if I spend time and effort to help others improve
their knowledge on the subject. I'm guessing that on your part, you're more
interested in "winning" the conversation, but please try to gain something
from our interaction, otherwise all the time we both spent on it will go to waste. When this is over, try to dig out and read some authoritative sources.
I would advise you on which ones - but you'd probably resist my recommendation
anyway, so you're on your own there.
My initial response was a fairly kneejerk reaction to the snark. The following is a rewrite. Please don't reply unpleasantly; if you really think so little of me, rather don't reply at all.
> Deep neural nets do not learn only from examples! They encode strong inductive biases in their carefully hand-engineered and hand-tuned architectures
“Solomonoff Induction does not learn only from evidence! It encodes strong inductive biases in its construction and choice of Turing machine...”
but it doesn't matter. Our universe is not a random soup of maximal entropy.
The tasks I am talking about solving are overtly not impossible.
You talk about ML methods as if the success of, say, image recognition comes from image-recognition-specific architectures. You mention ‘hand-engineered’ or ‘hand-tuned’. And yet, to throw your snark back at you: if you were up to date with the literature, you would know this is not true.
Consider ViT as an example. The same Transformer, with the same minimal inductive biases, works as well for language modelling as for image segmentation as for proof search; the only difference, perhaps, is that ViT works on patches for efficiency, though the paper shows that probably hurts performance in the limit. All it takes is an appropriate quantity of data to learn the task-specific adaptations the network needs. Heck, it even works cross-domain; it's all one architecture, so it's all one inductive bias.
To my mind, this is what it means to learn from examples. There is no way that an architecture designed for language translation could also encode task-specific priors for these different tasks.
For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a completely different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task-specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not; they're tuned per example.
ML people call that data leakage.
> I don't understand what you mean. Yes, Louise can check for arbitrary programs. I can give it zero BK and metarules and ask it to generate all Prolog programs, say.
Louise can perhaps generate all Prolog programs. Louise cannot search the space of Prolog programs.
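To put a toy number on that distinction (my own back-of-the-envelope sketch, with a made-up predicate name, nothing to do with Louise's internals): if clause bodies are K literals drawn from N predicate symbols, there are N^K candidate bodies before you even vary the arguments.

    % Illustrative counting only.
    candidates(N, K, C) :- C is N^K.

    % ?- candidates(1000, 5, C).
    % C = 1000000000000000.

Generation without a bias to prune that space is not search; it's exhaustion.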
I see I made you feel bad with my advice to read up. I'm sorry, because that
was not my intention. However, you really do need to take my advice seriously.
You've insisted throughout our conversation that you don't need to read older
machine learning or AI papers because they're not relevant anymore. And yet,
they are. And you do need to read them because without them you will not be
able to understand the recent developments you seem to be interested in.
Take for instance your example of ViT. This is a transformer, so it's clearly
not an unbiased generaliser that learns only from examples. You say so
yourself: "it's all one inductive bias". Yes, that's how machine learning
works and deep neural nets don't do anything different, neither do they learn
only from examples, as you seemed to suggest in your previous comment (you
replied "That's literally what DL is" to my comment that "you can't learn only
from examples").
But I think you misunderstood my comment about how the biggest advances in
deep neural nets have come from purpose-built architectures. That is not to
say that the same architectures cannot be applied to different domains, but
the state of the art systems are always fine-tuned for specific tasks or
datasets. This hasn't changed recently and it hasn't changed in the last 30
years.
>> For sure, one might call this ‘strong inductive biases’, in that the program is not random bytes (as a truly bias-free algorithm must be), but please at least admit that this is a completely different conceptual plane to the sort of biases you give Louise. Louise's biases aren't merely task-specific, they're problem-specific. It would be one thing if Louise's biases were a handwritten web of a million BK rules: fine, whatever, as long as it solves the task that is obviously possible to solve. But they're not; they're tuned per example.
A truly bias-free algorithm is not "random bytes". It's a learner that memorises its training examples and can only recognise its training examples. That is why it can't generalise. This is in Mitchell's paper, which I suggested you read.
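If it helps, Mitchell's unbiased learner can be sketched in a few lines of Prolog (illustrative names, not anyone's actual system):

    % Rote learning: the hypothesis is the training set itself.
    learn(Examples, Examples).

    % It recognises exactly the training examples and nothing else.
    classify(Hypothesis, X, true) :- member(X, Hypothesis), !.
    classify(_Hypothesis, _X, unknown).  % no bias, hence no generalisation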
Louise's biases are not problem-specific in the short example I showed you. I defined BK predicates with wide applicability in programs processing lists and numbers. Nor is there any such limitation, theoretical or practical, in the general case. You can give Louise a million irrelevant BK predicates, if you like, and it will still find the ones it needs to complete the learning task, assuming they're in there somewhere. In fact, it will find all of the relevant ones - and return the superset of all programs that solve the task (so you can use it, for example, to identify interesting relations in your dataset). As I said in a previous comment, Louise's learning algorithm was originally designed to select relevant BK.
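To show what I mean by wide applicability (with illustrative names, not the exact BK from the earlier example): the same two BK predicates and the same chain metarule, P(X,Y) :- Q(X,Z), R(Z,Y), cover quite different learning targets.

    % Generic list-processing BK, tied to no single problem.
    head([H|_], H).
    tail([_|T], T).

    % Two distinct targets, both instantiations of the chain metarule
    % over the same BK:
    second(X, Y) :- tail(X, Z), head(Z, Y).   % second element of a list
    third(X, Y)  :- tail(X, Z), second(Z, Y). % reuses a learned predicate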
Additionally, like I said in an earlier comment, Louise can learn its own
bias, both the BK and the metarules, so it is not only not limited to
task-specific bias, it is not even limited to user-provided bias. Under some circumstances it can even invent new examples, and then use them to learn a hypothesis that generalises better to unseen examples.
>> Louise can perhaps generate all Prolog programs. Louise cannot search the
space of Prolog programs.
> Louise's biases are not problem-specific in the short example I showed you.
This is clearly untrue.
You were customizing the BK to each specific task. You were also customizing the stepping stones for each specific task.
Justifications can come later. At least admit that you customized the BK for each problem instance, and that prior to doing so the solver did not solve the problem as asked.
Not responding to the rest since you've missed my entire point and I don't feel like rephrasing it.
I did not "customize the BK to each specific task". You can go back and see
what I did. I provided some generic BK predicates that manipulate lists and
numbers, I defined some metarules and I gave a few examples of each program's
inputs and outputs.
I don't understand your criticism and I don't understand what you want me to
"at least admit".
>> This is clearly untrue.
Can you show me which biases in the example I showed are problem-specific?
>> Not responding to the rest since you've missed my entire point and I don't
feel like rephrasing it.
I don't think I missed your point. I think you, yourself, are horribly confused about what point you are trying to make. And the reason, of course, is that you want to be able to express strong opinions about AI and machine learning, but you don't want to do the hard work of understanding the subject. So you keep saying "six impossible things before breakfast", like asking for a learner that learns only from examples, or saying that's what deep learning is, etc.
I'm sorry, but despite what the article above suggests, there isn't an easy way to become an expert - not even in machine learning. If you want to know what you're talking about, then you'll have to do your homework.
I don't know why you need to reply unpleasantly. It should be possible to give and receive criticism, even strong criticism, without having it turn into a flamewar just because we're on the internet.
Indeed, you yourself have criticised my work and my field mercilessly in this thread and I did not once reply with unpleasantness. In fact, what you keep dismissing as irrelevant and basically cheating (Louise) is my PhD research. I would be well within reason to be defensive about it. Instead, I believe I have remained polite and respectful towards you throughout and strove to answer all your questions about it.
Although you did take my criticism as snark, so perhaps this is not entirely objective - you might perceive my criticism as a personal attack, say. Again, this should not be the case. In my field of work, criticism is what makes your work better, and without criticism one never improves. So I do mean it when I say that my contribution to this thread was for your sake, to help you improve your knowledge of a subject you seem to be interested in.
In any case, I'm sorry this conversation turned sour. I didn't want to make you upset and I apologise for having done so.
> I believe I have remained polite and respectful towards you throughout
I disagree. I do not mind in the slightest being told I am wrong, or having my ideas criticized. But calling me too stupid to understand my own point, or too intellectually lazy to want to understand a subject, or talking down to me like a child - that is not kosher. This conversation is not worth being attacked over, or having my day made unpleasant, because you choose not to resist the impulse to throw insults.
To the other side of things, it might help calm you to know that I never much considered what I was saying a criticism of Louise. Louise, from what I can tell, is fine, and an interesting take on the task. What I was objecting to was only the way you used it in the argument. A bike is cheating if you bring it to a 100 metre sprint, but that doesn't mean it serves no purpose. E.g. I do not consider SAT solvers particularly relevant to AI progress, but one can hardly deny they are quality tools.
As far as I can tell, I did not talk down to you as to a child, and I
certainly did not call you intellectually lazy or stupid. I criticised the
fact that you don't want to put in the hard work to understand the subject you
are discussing, which is what you have stated from the start of the
conversation, claiming you don't need to read up on the history of AI because
it is not relevant (I'm paraphrasing your point but correct me if I
misunderstood it).
It seems to me I am right to think that you took my criticism as an insult to
your faculties. If I say something wrong, I expect to be corrected and
criticised if I insist on it, but I don't take that as an insult.
>> To the other side of things, it might help calm you to know I never much
considered what I was saying a criticism of Louise.
And still you persist with the same style of commenting. "Calm" me? And you
complain that I talk down to you? You have replied to my original comment with
arrogance to tell me that my entire field of study is "not AI" and irrelevant
- and then continued to insist you don't need to know anything about the ~70
years of work you dismiss even when it became clear that this only causes you
to make elementary errors. You speak of things you know nothing about with
great conviction and then you get upset with me for pointing out this can only
result in errors and confusion. Given all that, I have shown great patience
and courtesy. Others would have just ignored you as ignorant and unwilling to
learn. I gave you the benefit of the doubt. Was that a mistake?
A poor choice of words, sorry. I meant, I understood you to be saying you found the criticism of Louise unpleasant, and I thought it would lessen that to know that I didn't and don't think Louise was bad.