
The Bitter Lesson (2019) - radkapital
http://incompleteideas.net/IncIdeas/BitterLesson.html
======
YeGoblynQueenne
>> In computer chess, the methods that defeated the world champion, Kasparov,
in 1997, were based on massive, deep search.

"Massive, deep search" that started from a book of opening moves and the
combined expert knowledge of several chess Grandmasters. And that was an
instance of the minimax algorithm with alpha-beta cutoff, i.e. a search
algorithm specifically designed for two-player, deterministic games like
chess. And with a hand-crafted evaluation function, whose parameters were
filled in by self-play. But still, an evaluation function; the minimax
algorithm requires one, and blind search alone could not have come up with
minimax, or with the concept of an evaluation function, in a million years.
Essentially, human expertise about what matters in the game was baked into
Deep Blue's design from the very beginning and permeated every aspect of it.
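
For concreteness, here is a rough sketch of what that kind of search looks
like (a toy Python illustration, not Deep Blue's actual code; the `game`
interface and the `evaluate` function are placeholders standing in for the
hand-crafted chess knowledge):

```python
# Toy minimax with alpha-beta cutoff. The evaluation function is where
# human knowledge about "what matters in the game" gets baked in.
def alphabeta(state, depth, alpha, beta, maximizing, game, evaluate):
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)          # hand-crafted heuristic score
    if maximizing:
        value = float("-inf")
        for move in game.legal_moves(state):
            child = game.apply(state, move)
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, game, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:           # beta cutoff: prune this subtree
                break
        return value
    else:
        value = float("inf")
        for move in game.legal_moves(state):
            child = game.apply(state, move)
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, game, evaluate))
            beta = min(beta, value)
            if beta <= alpha:           # alpha cutoff
                break
        return value
```

Even in this caricature the search cannot run at all without `evaluate` and
`legal_moves`, i.e. without some model of the game that people supplied.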

Of course, ultimately, search was what allowed Deep Blue to beat Kasparov
(3½–2½; Kasparov won one game and drew three). That, in the sense that the
alpha-beta minimax algorithm is itself a search algorithm, and it goes without
saying that a longer, deeper, better search will eventually outperform
whatever a human player is doing, which clearly is not search.

But, rather than an irrelevant "bitter" lesson about how big machines can
perform more computations than a human, a really useful lesson - and one that
we haven't yet learned, as a field - is why humans can do so well _without
search_. It is clear to anyone who has played any board game that humans can't
search ahead more than a scant few ply, even for the simplest games. And yet,
it took 30 years (counting from the Dartmouth workshop) for a computer chess
player to beat an expert human player. And almost 60 to beat one in Go.

No, no. The biggest question in the field is not one that is answered by "a
deeper search". The biggest question is "how can we do that without a search"?

Also see Rodney Brooks' "better lesson" [2], addressing the other successes of
big search discussed in the article.

_____________

[1] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)#Design

[2] https://rodneybrooks.com/a-better-lesson/

~~~
dreamcompiler
Are we certain that well-trained human players are not doing search? It's
possible that a search subnetwork gets "compiled without debugger symbols" and
the owner of the brain is simply unaware that it's happening.

~~~
YeGoblynQueenne
>> Are we certain that well-trained human players are not doing search?

Yes - because human players can only search a tiny portion of a game tree, and
a minimax search of the same extent is not even sufficient to beat a dedicated
human at tic-tac-toe, let alone chess. That is, unless one wishes to
countenance the possibility of an "unconscious search", which of course might
as well be "the grace of God" or any such hand-wavy non-explanation.

>> It's possible that a search subnetwork gets "compiled without debugger
symbols" and the owner of the brain is simply unaware that it's happening.

Sorry, I don't understand what you mean.

~~~
oezi
Why do you dismiss the unconscious search that humans do in Go? Having learned
Go some years ago, I found it such an exciting thing to realize that with
practice the painstaking process of consciously evaluating the myriad
possibilities of moves gives way to just "seeing" solutions out of nothing.
You can really feel that your brain did wire itself up to do analysis for you
at a level that is subconscious but interfaces so gracefully with your
conscious cognition that it is a real marvel.

~~~
YeGoblynQueenne
>> Why do you dismiss the unconscious search that humans do in Go?

The question is why you say that humans perform an unconscious search when
they play Go. And what kind of search is it, other than unconscious? Could you
describe it, e.g. in algorithmic notation? I mean, I'm sure you couldn't
because if you could then the problem of teaching a computer to play Go as
well as a human would have been solved years and years ago. But, if you can't
describe what you're doing, then how do you know it's a "search"?

Note that in AI, when we talk of "search" (edit: at least, in the context of
game-playing) we mean something very specific: an algorithm that examines the
nodes of a tree and applies some criterion to label each examined node as a
target node or not a target node. Humans are absolutely awful at executing
such an algorithm with our minds for any but the most trivial of trees, at
least compared to computers.
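
To be concrete, here is a minimal sketch of the kind of procedure I mean
(depth-first for simplicity; `children` and `is_target` are stand-ins for
whatever expansion rule and labelling criterion a given game defines):

```python
# "Search" in the game-playing sense: systematically examine the nodes of
# a tree and label each examined node as a target node or not.
def tree_search(root, children, is_target):
    stack = [root]                      # frontier of nodes still to examine
    examined = 0
    while stack:
        node = stack.pop()
        examined += 1
        if is_target(node):             # the labelling criterion
            return node, examined
        stack.extend(children(node))    # expand: add successor nodes
    return None, examined               # tree exhausted, no target found
```

A computer will happily run this over millions of nodes a second; a human
trying to execute it mentally loses track after a handful of nodes.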

~~~
acidbaseextract
Here's a section of Michael Redmond's (9-dan professional Go player)
commentary on the Lee Sedol vs AlphaGo matches:
[https://youtu.be/yCALyQRN3hw?t=3031](https://youtu.be/yCALyQRN3hw?t=3031)

It's really fun to watch his commentary because he relentlessly plays
"variations" — possible next moves and sequences — while waiting for the
players, explaining the tradeoffs between moves and the consequences they lead
to a few steps ahead in the game.

I don't know what to call "variations" but a tree search with heuristics. He
does it slowly to explain it to the audience, but I have no doubt the same
process runs much faster in his mind.

~~~
YeGoblynQueenne
Fast enough to evaluate a few million future positions in a few seconds? Like
I say in another comment, even professional players cannot "look ahead" more
than a few ply, so whatever it is they're doing "in their heads", the tree
search they're _reporting_ is not how they win games.

To clarify, you can come up with an explanation of anything that you do, or
observe yourself or another person do. For example, you might explain how you
hit a ball with a racket in tennis or with a bat in baseball, etc., but that
doesn't mean that the process you are describing is the process that your mind
(let alone your brain) actually follows.

If nothing else, because such a description will necessarily fudge important
steps. For example, if I describe myself walking as "I put one foot in front
of the other" - have I explained enough about walking that it can now be
reproduced mechanically? Experience teaches that, no, I have not.

------
kdoherty
Potentially also of interest is Rod Brooks' response "A Better Lesson" (2019):
[https://rodneybrooks.com/a-better-lesson/](https://rodneybrooks.com/a-better-lesson/)

~~~
bnjmn
"Potentially" is an understatement! A much better take, IMO.

------
auggierose
I guess it depends on what you're trying to do. I had a computer vision
problem where I was like, hell yeah, let's machine-learn the hell out of this.
2 months later, and the results were just not precise enough. It took me 2
more months, and now I am solving the task easily on an iPhone via Apple Metal
in milliseconds with a hand-crafted optimisation approach ...

~~~
jefft255
His advice really concerns scientific research and its long-term progress, not
immediate applications. I think that injecting human knowledge can lead to
faster, more immediate progress, and he seems to believe that too. The "bitter
lesson" is that general, data-driven approaches will always win out eventually.

------
ksdale
I think it's plausible that many technological advances follow a similar
pattern. Something like the steam engine is a step improvement, but many of
the subsequent improvements are basically the obvious next step, implemented
once steel is strong enough, or machining precise enough, or fuel is refined
enough. How many times has the world changed qualitatively, simply in the
pursuit of making things quantitatively bigger or faster or stronger?

I can certainly see how it could be considered disappointing that pure
intellect and creativity doesn't always win out, but I, personally, don't
think it's bitter.

I also have a pet theory that the first AGI will actually be 10,000 very
simple algorithms/sensors/APIs duct-taped together running on ridiculously
powerful equipment rather than any sort of elegant Theory of Everything, and
this wild conjecture may make me less likely to think this a bitter lesson...

~~~
StevenWaterman
I agree, the first AGI probably will be bodged together with loads of expert
input. However, that's not evidence against the bitter lesson.

The _first_ of anything is usually made with the help of experts, but it's
quickly overtaken by general methods that leverage additional computation.

~~~
ksdale
Sorry, I didn't mean to suggest that the bitter lesson is _wrong_ , just that
it's not bitter, it's actually how a whole bunch of stuff progresses.

~~~
StevenWaterman
Makes sense!

------
fxtentacle
The current top contender on AI optical flow uses LESS CPU and LESS RAM than
last year's leader. As such, I strongly disagree with the article.

Yes, many AI fields have become better from improved computational power. But
this additional computational power has unlocked architectural choices which
were previously impossible to execute in a timely manner.

So the conclusion may equally well be that a good network architecture results
in a good result. And if you cannot use the right architecture due to RAM or
CPU constraints, then you will get bad results.

And while taking an old AI algorithm and re-training it with 2x the original
parameters and 2x the data does work and does improve results, I would argue
that that's kind of low-level copycat "research" and not advancing the field.
Yes, there's a lot of people doing it, but no, it's not significantly
advancing the field. It's tiny incremental baby steps.

In the area of optical flow, this year's new top contenders introduce many
completely novel approaches, such as new normalization methods, new data
representations, new nonlinearities and a full bag of "never used before"
augmentation methods. All of these are handcrafted elements that someone built
by observing what "bug" needs fixing. And that easily halved the loss rate,
compared to last year's architectures, while using LESS CPU and RAM. So to me,
that is clear proof of a superior network architecture, not of additional
computing power.

~~~
jmole
Yup - and this year's top CPUs have almost 10x the performance per watt of
CPUs from even 2-3 years ago [0]

Raw computation is only half the story. The other half is: what the hell do we
do with all these extra transistors? [1]

0 -
[https://www.cpubenchmark.net/power_performance.html](https://www.cpubenchmark.net/power_performance.html)

1 - [https://youtu.be/Nb2tebYAaOA?t=2167](https://youtu.be/Nb2tebYAaOA?t=2167)

~~~
fxtentacle
Any day now people will start compiling old programs to WebAssembly so that
you can wrap them with Electron, instead of compiling them to machine code.
Once that happens, we'll have generated another 3 years of demand for Moore's
law X_X

------
astrophysician
I think what he's basically saying is that priors (i.e. domain knowledge +
custom, domain-inspired models) help when you're data limited or when your
data is very biased, but once that's not the case (e.g. we have an infinite
supply of voice samples), model capacity is usually all that matters.

------
JoeAltmaier
Got to believe, this is like heroin. It's a win until it isn't. Then where
will AI researchers be? No progress for 20 (50?) years, because the temptation
to not understand, but to just build performant engineering solutions, was so
strong.

In fact, is the researcher supposed to be building the most performant
solution? This article seems alarmingly misinformed. To understand 'artificial
intelligence' isn't a race to VC money.

~~~
visarga
AI as a field relied mostly on 'understanding' based approaches for 50 years
without much success. These approaches were too brittle and ungrounded. Why
return to something that doesn't work?

DNNs today can generate images that are hard to distinguish from real photos,
super natural voices and surprisingly good text. They can beat us at all board
games and most video games. They can write music and poetry better than the
average human. Probably also drive better than an average human. Why worry
about 'no progress for 50 years' at this point?

~~~
JoeAltmaier
Because they can't invent a new game. Unless of course they were specifically
designed to invent games, by trial and error and statistical correlation to
existing games, thus producing a generic thing that relates to everything but
invents nothing.

I'm not an idiot. I understand that we won't have general purpose thinking
machines any time soon. But to give up entirely looking into that kind of
thing, seems to me to be a mistake. To rebrand the entire field as calculating
results to given problems and behaviors using existing mathematical tools,
seems to do a disservice to the entire concept and future of artificial
intelligence.

Imagine if the field of mathematics were stumped for a while, so investigators
decided to just add up things faster and faster, and call that Mathematics.

~~~
visarga
What GPT-3 and other models lack is embodiment. There are of course RL agents
embodied in simulated environments, like games and robot sims, but this pales
in comparison to our access to nature and to human society. When we are able
to give them a body, they will naturally rediscover play and games.

Human superiority doesn't come just from the brain, it comes from the
environment this brain has access to - other humans, culture, tools, nature,
and the bodily affordances (hands, feet, eyes, ability to assimilate organic
food...). AI needs a body and an environment to evolve in.

------
sytse
The article says we should focus on increasing the compute we use in AI
instead of embedding domain specific knowledge. OpenAI seems to have taken
this lesson to heart. They are training a generic model using more compute
than anything else.

Many researchers predict a plateau for AI because it is missing the domain
specific knowledge but this article and the benefits of more compute that
OpenAI is demonstrating beg to differ.

~~~
throwaway7281
Model compression is an active research field and will probably be quite
lucrative, as you will literally be able to save millions.

------
aszen
Interesting. I wonder what happens now that Moore's law is considered dead and
we can't rely on computation power increasing year over year. To make further
progress with general-purpose search and learning methods we will need lots
more computational power, which may not be cheaply available. Do we then focus
our efforts on developing more efficient learning strategies like the ones we
have in our minds?

I do agree with the part about not embedding human knowledge into our computer
models; any knowledge about a domain that is worth learning, the computer
should be able to learn on its own, to make true progress in AI.

~~~
noanabeshima
The amount of compute used in the largest AI training runs has been
exponentially growing:

[https://openai.com/blog/ai-and-compute/](https://openai.com/blog/ai-and-compute/)

The amount of compute required for Imagenet classification has been
exponentially decreasing:

[https://openai.com/blog/ai-and-efficiency/](https://openai.com/blog/ai-and-efficiency/)

~~~
aszen
Very interesting links, thanks for sharing.

So the trend isn't changing: we still need bigger models to make progress in
NLP and CV, while the algorithmic efficiencies are promising but aren't
giving anywhere near the same improvements as larger models.

I'm curious how long this trend will continue and whether there's anything
promising that can reverse it.

~~~
PeterisP
IMHO the main thing that determines this trend is whether the results are
_good enough_. For the most part, there's only some overlap between the people
who work on better results and people who work on more efficient results;
those research directions are driven by different needs and thus also tend to
happen in different institutions.

As long as our proof-of-concept solutions don't yet solve the task
appropriately, as long as the solution is weak and/or brittle and worse than
what we need for the main practical applications, most of the research focus -
and the research progress - will be on models that try to give better
results. It makes sense to disregard the compute cost and other impractical
inconveniences when working on pushing the bleeding edge, trying to make
previously impossible things possible.

However, when tasks are "solved" from the academic proof-of-concept
perspective, then generally the practical, applied work on model efficiency
can get huge reductions in computing power required. But that happens
_elsewhere_.

The concept of technology readiness level
([https://en.wikipedia.org/wiki/Technology_readiness_level](https://en.wikipedia.org/wiki/Technology_readiness_level))
is relevant. For the NLP and CV technologies that are in TRL 3 or 4, the
efficiency does not really matter as long as it fits in whatever computing
clusters you can afford; this is mainly an issue for the widespread adoption
of some tech in industry by the time the same tech is in TRL 6 or so, and this
work mostly gets done by different people in different organizations with
different funding sources than the initial TRL 3 research.

------
maest
For contrast, take this Hofstadter quote:

> This, then, is the trillion-dollar question: Will the approach undergirding
> AI today—an approach that borrows little from the mind, that’s grounded
> instead in big data and big engineering—get us to where we want to go? How
> do you make a search engine that understands if you don’t know how you
> understand? Perhaps, as Russell and Norvig politely acknowledge in the last
> chapter of their textbook, in taking its practical turn, AI has become too
> much like the man who tries to get to the moon by climbing a tree: “One can
> report steady progress, all the way to the top of the tree.”

My take is that there is something intellectually unsatisfying about solving a
problem by simply throwing more computational power at it, instead of trying
to understand it better.

Imagine a parallel universe where computational power is extremely cheap. In
this universe, people solve integrals exclusively by numerical integration, so
there is no incentive to develop any of the Analysis theory we currently have.
I would expect that to be a net negative in the long run, as theories like
General Relativity would be almost impossible to develop without the current
mathematical apparatus.
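
To make the analogy concrete, a throwaway example (nothing from the article,
just an illustration): in that universe you would simply do something like

```python
# Brute force: approximate the integral of x^2 over [0, 1] numerically,
# instead of using the closed form 1/3 that analysis hands you.
def integrate(f, a, b, n=1_000_000):
    h = (b - a) / n
    # midpoint rule: sample f at the centre of each of n thin slices
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(lambda x: x * x, 0.0, 1.0))  # ~0.333333, exact answer is 1/3
```

and never feel any need to ask why the answer is exactly 1/3.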

~~~
YeGoblynQueenne
Where is this quote from, please?

To play devil's advocate, I think the retort to your comment about
"intellectually satisfying" methods is "yeah, but, they work". And in any
case, "intellectually satisfying" doesn't have a formal definition in computer
science or AI, so it can't very well be a goal, as such.

My own concern is exactly what Russell & Norvig seem to say in Hofstadter's
comment: by spending all our resources on climbing the tallest trees to get to
the moon, we're falling further behind our goal of ever getting to the moon.
That's even more so if the goal is to use AI to understand our own mind,
rather than to beat a bunch of benchmarks.

~~~
self
The quote is from this article:

[https://www.theatlantic.com/magazine/archive/2013/11/the-man-who-would-teach-machines-to-think/309529/](https://www.theatlantic.com/magazine/archive/2013/11/the-man-who-would-teach-machines-to-think/309529/)

~~~
YeGoblynQueenne
Thank you.

Gosh, what an awkward pose, that first picture.

------
dyukqu
Previous discussion:
[https://news.ycombinator.com/item?id=19393432](https://news.ycombinator.com/item?id=19393432)

------
koeng
This lesson can be applied to synthetic biology right now, though it is still
in its infant stages.

At least a few of the original synthetic biologists are a bit disappointed in
the rise of high-throughput testing for everything, instead of "robust
engineering". Perhaps what allows us to understand life isn't just more
science, but more "biotech computation".

------
ruuda
A slightly more recent post, that really opened my eyes to this insight (and
references The Bitter Lesson) is this piece by Gwern on the scaling
hypothesis:
[https://www.gwern.net/newsletter/2020/05#gpt-3](https://www.gwern.net/newsletter/2020/05#gpt-3)

------
throwaway7281
This reminds me of the Banko and Brill paper "Scaling to very very large
corpora for natural language disambiguation" -
[https://dl.acm.org/doi/10.3115/1073012.1073017](https://dl.acm.org/doi/10.3115/1073012.1073017).

It is exactly the point, and it is something not a lot of researchers really
grok. As a researcher you are so smart, why can't you discover whatever you
are seeking? I think in this decade we will see a couple more scientific
discoveries by brute force, which will hopefully make the scientific type a
bit more humble and honest.

------
coldtea
> _At the time, this was looked upon with dismay by the majority of computer-
> chess researchers who had pursued methods that leveraged human understanding
> of the special structure of chess._

This seems problematic as a concept in itself.

Sure, human players have a "human understanding of the special structure of
chess". But what makes them play could be an equally "deep search" and fuzzy
computations done in the brain, and not some conscious step-by-step
reasoning. Or rather, their "conscious step-by-step reasoning", in my opinion,
probably sits on top of a subconscious deep search in the brain that prunes
the possible moves, etc.

I don't think anybody plays chess at any great level merely by making
conscious step by step decisions.

Similar to how when we want to catch a ball thrown at us, we do some thinking
like "they threw it to our right, so we better move right" but we also have
tons of subconscious calculations of the trajectory (nobody sits and
explicitly calculates the parabolic formula when they're thrown a baseball).

------
cgearhart
I have read this before and broadly agree with the point - it's no use trying
to curate expertise into AI. But I don't think modeling p(y|x) or its friend
p(y, x) is the end we're looking for either. But it's unreasonably effective,
so we keep doing it. (I don't have an answer or an alternative; causality
appeals to my intuition, but it's really clunky and has seemingly not paid
off.)

~~~
sgt101
Actually, I feel like causality's time has come. The framework that has
convinced me is just the simple approach of doing controlled experiments over
observational data to establish causal links via DAGs - no need for any drama!

~~~
cgearhart
It seems to be just shuffling around the hard part of the problem. Causality
still depends on some unstructured optimization problem of generating and
evaluating causal diagram candidates. I haven’t really seen it applied where
the set of potential causal relationships is huge.

------
francoisp
building a model for and with domain knowledge == premature optimization? In
the end a win on kaggle or a published paper seems to depend on tweaking
hyperparameters based on even more pointed DK: data set knowledge...

I wonder what would be required to build a model that explores the search
space of compilable programs in say python that sorts in correct order.
Applying this idea of using ML techniques to finding better "thinking" blocks
for silicon seems promising.

~~~
YeGoblynQueenne
>> I wonder what would be required to build a model that explores the search
space of compilable programs in say python that sorts in correct order.

Oh, not that much. You could do that easily with a small computer and an
infinite amount of time.
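
To illustrate what I mean (a deliberately silly generate-and-test sketch; the
tiny "grammar" of expressions, the test cases and the eval-based "compilation"
are all made up for the example):

```python
# Enumerate candidate Python expressions, smallest first, and keep the first
# one that sorts the test lists correctly. With an unbounded grammar this is
# exactly the "small computer, infinite time" strategy.
ATOMS = ["xs", "xs[::-1]"]
WRAPPERS = ["sorted({0})", "list(reversed({0}))", "{0}[1:] + {0}[:1]"]
TESTS = [[3, 1, 2], [5, 4, 4, 0], [], [1]]

def enumerate_exprs(max_depth):
    frontier = list(ATOMS)
    for _ in range(max_depth):
        yield from frontier             # emit current generation
        frontier = [w.format(e) for e in frontier for w in WRAPPERS]
    yield from frontier

def sorts_correctly(expr):
    try:
        f = eval("lambda xs: " + expr)  # "compile" the candidate program
        return all(f(list(t)) == sorted(t) for t in TESTS)
    except Exception:
        return False                    # doesn't even run: discard it

print(next(e for e in enumerate_exprs(3) if sorts_correctly(e)))  # sorted(xs)
```

The trouble, of course, is that the interesting part is all in the time bound.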

~~~
francoisp
With Deep Thought and an infinite amount of time, you can get the answer to
life the universe and everything...

------
overhyp
I would like to offer what I believe is a counterpoint, but I am not a trained
ML researcher so I am not sure if it is even a counter-point. Maybe it is just
an observation.

I recently participated in the following Kaggle competition:

[https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/)

Now, you can see the kinds of questions the contest expects the ML to answer,
just to take an example:

"Effectiveness of movement control strategies to prevent secondary
transmission in health care and community settings"

All I can say is that the contest results, on the whole, were _completely
underwhelming_. You can check out the Contributions page to verify this for
yourself. If the consequences of the failure weren't so potentially
catastrophic, some might even call it a little comical. I mean, it's not as if
a pandemic comes around every few months, so we can all just wait for the
computational power to catch up to solve these problems like the author
suggests.

Also, I couldn't help but feel that nearly all participants were more
interested in applying the latest and greatest ML advancement (Bert QA!),
often with no regard to the problem which was being solved.

I wish I could tell you I have some special insight into a better way to solve
it, given that there is a _friggin pandemic_ going on, and we could all very
well do with some _real friggin answers_! I don't have any such special
insight at all. All I found out was that everyone was so obsessed with using
the latest and greatest ML techniques, that there was practically no first
principles thinking. At the end, everyone just sort of got too drained and
gave up, which is reflected by a single participant winning pretty much the
entire second round of 7-8 task prizes by virtue of being the last man
standing :-)

I have realized two things.

1) ML, at least when it comes to understanding text, is really overhyped

2) Nearly everyone who works in ML research is probably overpaid by a factor
of 100 (just pulling some number out of my you know what), given that the
results they have actually produced have fallen so short precisely when they
were so desperately needed

------
glitchc
When it comes to games, exploitation (of tendencies, weaknesses),
misdirection, subterfuge and yomi play a far bigger role in winning than
actual skill. Humans are much better than computers at all of those. Perhaps a
dubious honour, but an advantage nonetheless. We're only really in trouble
when the machine learns to reliably replicate the same tactics.

~~~
elcomet
I think that computers managed to beat humans at poker already. (Online poker,
which is different from physical games, where of course AI cannot compete)

------
sidpatil
[http://norvig.com/chomsky.html](http://norvig.com/chomsky.html)

------
avmich
> When a simpler, search-based approach with special hardware and software
> proved vastly more effective, these human-knowledge-based chess researchers
> were not good losers.

It's like calling Russia a loser in Cold War. Technically the effect is
reached; practically the side which "lost" gained possibly largest benefits.

------
KKKKkkkk1
Today Elon Musk announced that Tesla is going to reach level-5 autonomy by the
end of the year. Specifically

 _There are no fundamental challenges remaining for level-5 autonomy. There
are many small problems. And then there's the challenge of solving all those
small problems and then putting the whole system together._ [0]

I feel like this year is going to be another year in which the proponents of
brute-force AI like Elon and Sutton will learn a bitter lesson.

[0]
[https://twitter.com/yicaichina/status/1281149226659901441](https://twitter.com/yicaichina/status/1281149226659901441)

~~~
typon
Elon Musk announcing something doesn't make it true

------
vlmutolo
It’s funny when you’ve been thinking for months about how speech recognition
could really benefit from integrating models of the human vocal tract…

and then you read this

~~~
sqrt17
Here's a thing: incorrect assumptions that are built into a model are more
harmful than a model that assumes too little structure. If you model the vocal
tract and the actual exciting things are the transient noises that occur when
we produce consonants, at best there's lots of work with not much to show and
at worst you're limiting your model in a negative way. That's the basis for
the "every time we fired a linguist, recognition rates improved" from 90s
speech recognition.

On the other end of the spectrum, data and compute ARE limited, and for some
tasks we're at a point where the model eats up all of humanity's written
works and a couple million dollars in compute, and further progress has to
come from elsewhere, because even large companies won't spend billions of
dollars on compute and humanity will not suddenly write ten times more blog
articles.

~~~
visarga
I think we're far from having used all the media on the internet to train a
model. GPT-3 used about 570GB of text (about 50M articles). ImageNet is just
1.5M photos. It's still expensive to ingest the whole of YouTube, Google
Search and Google Photos in a single model.

And the nice thing about these large models is that you can reuse them with
little fine-tuning for all sorts of other tasks. So the industry and any
hacker can benefit from these uber-models without having to retrain from
scratch. That is, if they even fit the hardware available; otherwise they
have to make do with slightly lower performance.

~~~
sqrt17
GPT-3 is too large to be useful for practical purposes. Look it up. It's the
equivalent of a Formula 1 car or a Saturn V rocket - an impressive feat of
technology but of no practical relevance for getting you to work and back.

And certainly fine-tuning and distillation are part of the story of why we
wanted these large do-all-be-all models in the first place, but the question
of what's next for the state of the art - and that currently would be
featurization through a large transformer model (e.g. BERT, ERNIE, GPT-2) with
some deep-but-not-huge task-specific model on top - isn't simply answered by
"more compute".

------
lambdatronics
TL;DR: AI needs a hand up, not a handout. "We want AI agents that can discover
like we can, not which contain what we have discovered." I was internally
protesting all the way through the note, until I got to that penultimate
sentence.

~~~
rbecker
Yeah, it takes a careful, charitable reading to not interpret it as "don't
bother with understanding or finding new methods, just throw more FLOPS at
it".

------
annoyingnoob
That is a wall of words, I can't even read it in that format.

------
totally_a_human
This page seems to be down. Is there a mirror?

------
mtgp1000
>We want AI agents that can discover like we can, not which contain what we
have discovered. Building in our discoveries only makes it harder to see how
the discovering process can be done.

I think these lessons become less appropriate as our hardware and our
understanding of neural networks improve. An agent which is able to [self]
learn complex probabilistic relationships between inputs and outputs (i.e.
heuristics) requires a minimum of complexity/performance, both in hardware and
neural network design, before any sort of useful [self] learning is possible.
We've only recently crossed that threshold (5-10 years ago).

>The biggest lesson that can be read from 70 years of AI research is that
general methods that leverage computation are ultimately the most effective,
and by a large margin

Admittedly, I'm not quite sure of the author's point. They seem to indicate
that there is a trade-off between spending time optimizing the architecture
and baking in human knowledge.

If that's the case, I would argue that there is an impending perspective shift
in the field of ML, wherein "human knowledge" is not something to hardcode
explicitly, but instead is implicitly delivered through a combination of
appropriate data curation and design of neural networks which are primed to
learn certain relationships.

That's the future and we're just collectively starting down that path - it
will take some time for the relevant human knowledge to accumulate.

