Retrospective review of Gödel, Escher, Bach (1996) [pdf] (nyu.edu)
200 points by lawrenceyan 15 days ago | 150 comments



This review is a bit of a window into the past itself... after all, more time has passed since this review was written than elapsed between GEB’s publication and this review.

In 2021 the conventional wisdom is basically the opposite of the sentiment expressed here. Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.

I loved GEB when I first read it in high school, and when I first reread it years later, but I don’t think its fundamental view of the relation between minds and machines has stood the test of time. It underestimated what behaviors could be emergent from running simple algorithms on large datasets. It is one of the most beautiful expressions of the ideas of the “classic” age of 1970s AI, awesome to read but in the end somewhat incorrect about the future.

Perhaps one day the pendulum will swing back, and we will discover that large datasets are in some ways overrated, and clever aesthetic senses of pattern are necessary for progress. On that day it will be quite interesting to reread GEB.


> I loved GEB when I first read it in high school, and when I first reread it years later, but I don’t think its fundamental view of the relation between minds and machines has stood the test of time.

I still see this as an open question. You’re certainly correct that this kind of AI research is seriously out of vogue, but it seems to me that while “modern” brute force compute AI puts up impressive results and is a hugely useful technique, it has made exactly zero progress on anything that could be conceived of as “general intelligence”. It just doesn’t seem to me to be a thing AI researchers are even interested in any more. Like, the Twitter program that uses AI to crop images based on a dataset of a gazillion cropped images is pretty far from Turing’s thinking machines.

I don’t know the way there, but it always seemed to me that the old-style AI research in the GEB style is still a rich vein we haven’t come close to mining out.


I believe it was Russell and Norvig who said about current AI techniques that they are like a man trying to reach the moon by climbing a tree: “One can report steady progress, all the way to the top of the tree.”


The other criticism is that the sheer amount of data in use shows we’re doing something wrong. A child can internalize the grammar of a language with only a few years of exposure. Modern AI corpora consist of terabytes of data. Why are the results so lacking despite using vastly more input?


Why would you assume that years of exposure isn't terabytes of data?

At least we can train language models in hours or days rather than years. If you view it that way we've made quite a bit of progress!


I think the argument is that GPT-3 has read more text than any human possibly could in their entire life, and as the parent said, children need far less input than an entire lifetime's worth of sentences in order to start creating novel sentences that make sense.


GPT-3 was trained over months using a huge cluster of computers far more powerful than a child's brain (if such a comparison is even meaningful). It is still laughably bad at what it does, which is completing simple sentences.


Your first point is spot-on. Neural nets aren't unrealistic in virtue of having too much data, the problem is they have too little.

Your second point doesn't make sense though. Language models don't understand language in the way that humans do. They just maximize the probability of the next word given some corpus. This is completely different than what humans do.


GPT-3 certainly was fed far more data than any human needs to start talking.


So your claim is that if a human child was given a subset of the text GPT-3 was fed — with no audio, video, or reinforcement feedback from its environment — the child could learn English?

The important point to observe here is that what a kid does is fundamentally different than what GPT-3 does:

- GPT-3 learns the word that minimizes perplexity given some context

- A kid learns the word that helps them accomplish some task in some environment.
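
For readers who haven't met the term, "perplexity" in the first bullet can be sketched in a few lines. This is only an illustration, not GPT-3's actual training code: the function name and the probability lists are made up for the example, and the only thing taken as given is the standard definition (the exponential of the average negative log-probability the model assigned to each actual next token).

```python
import math


def perplexity(token_probs):
    """Perplexity: exp of the average negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical probabilities two models assigned to the words that actually came next.
confident = [0.5, 0.5, 0.5, 0.5]
unsure = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident))
print(perplexity(unsure))
```

Lower is better: a model that gives each correct next word probability 0.5 scores a perplexity of about 2, while one that gives 0.1 scores about 10.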

In the learning process, children get feedback from their environment - including the responses of other agents in the environment (eg Mom, Dad, friends, etc). This is going to be far more memory-intensive than some text files. Also important to observe is that the child's experience can't be reduced to the audio, video and haptic streams they get, because the audio and video stream depend on their own actions. So you need the conditional statements that were embedded in the environment the kid was learning in. All of which is to say this is a little heavier than ascii.

Edit: From your other comments, it seems we agree that bigger neural networks with more training data are not going to yield some quantum leap where GPT-3 starts talking. I'm not at all suggesting that throwing more data at the existing architectures is a fruitful direction for understanding cognition. (In general I think NLP is utterly pointless as regards understanding cognition, and if that's your goal you should focus on vision and reinforcement learning.) But my point is that we can't expect that a smarter architecture on a smaller corpus will get us anywhere. If we ever develop machines that are intelligent in some robust sense of the word, their "training data" will most likely be a physical environment with other agents in it.


If the problem with GPT-3 is that it’s just being fed non-interactive data, why are robots still so primitive?


> If the problem with GPT-3 is that it’s just being fed non-interactive data, why are robots still so primitive?

I never said we understand the right architectures / algorithms that are necessary for robust machine intelligence. In fact I explicitly said that we don't (which answers your straw-man question).


A child’s model is also “pre-trained” by billions of years of evolution


I understand that you put pre-trained in quotes, but I find it a little far-fetched because, as far as I understand, there is no evidence that any setting of the synaptic strength of particular neurons is transferred through DNA; only the organization of neurons, through layers and folds, is somehow encoded in the DNA. I also think that it is still a mystery why certain brain functions occur in certain areas of the brain.


That's still considered parameters of a generalized neural network.


I think that has to be part of it, somehow, although I’m agnostic about whether it’s really analogous to “pretraining” an ML model. Whatever evolution was doing, it managed to give us neural nets that are mostly ready to go out of the box. Especially non-human animals often have short or no training periods.


Efficiency and intelligence are orthogonal as measures of progress (albeit reasonably correlated as goals). Even an intelligent machine that required a nuclear fusion core for its thinking would be an intelligent machine.

Jitendra Malik, of computer vision fame, calls it "the fallacy of the successful first step". Look ma, I can leap! It's only a matter of time till we get to the moon.


To be fair: if you measure the cosmological time from the first human step to man walking on the moon, it was barely the blink of an eye...


Sure. But why should we compare it to the cosmological time, and not to e.g. civilization time? Or homo sapiens time? In which case it took 100s to 1000s of lifetimes between the step and the moon...


I believe that quote actually predates the modern era of AI! Here is an article from 2013 referring to it, as being in the then-current version of their textbook:

https://www.theatlantic.com/magazine/archive/2013/11/the-man...

Look at the other claims in this article from 2013 for comparison:

Consider that computers today still have trouble recognizing a handwritten A.... In Hofstadter’s mind, there is nothing to be surprised about. To know what all A’s have in common would be, he argued in a 1982 essay, to “understand the fluid nature of mental categories.” And that, he says, is the core of human intelligence.

I would say the modern era of AI was kicked off with AlexNet in 2012 and hit its stride a few years later. So, I believe this quote and this article are really referring to pre-GPU AI techniques.

Basically, the predictions from this era just underestimated the value of scaling the data. Modern AI has several impressive achievements. It can certainly recognize an A. Image recognition and voice recognition are now critical parts of real products that help people in their everyday lives.

At the same time, it's true that current AI techniques might not get us all the way to AGI. We'll just have to wait and see. But I think it's important to recognize that we have had real progress in the modern AI era that has seriously outperformed the pessimistic expectations from ~2010.


Modern AI cannot recognize the kind of A that Hofstadter wrote about in that essay.

Hofstadter identifies novelty typefaces where A has none of the characteristics that you might associate with the letter. Often A is not a triangle, sure, but sometimes it has no upper bar, no left stroke, no right stroke. Sometimes the bars are defined by a transition from negative to positive space.

Sometimes A is made of 20 strokes, or a pattern of dots, or an open eye where the half-circular upper lid defines the triangle and the pupil is the bar. A neural network trained on conventional As would not conclude that any of these are an A.

These shapes wouldn't necessarily read as A in isolation. Sometimes they only read correctly when you're looking at text using a whole typeface defined on the same principles.

Modern AI still cannot recognize that kind of A even though a human easily can. There are ways to make an A that require a human to understand an idea of "letterness" that's different than anything they've seen before.

I'd recommend reading the original essay. The illustrations make this clear in a way my description can't.


And then we eventually build the space elevator.


If you think that still holds you have seen where the branches on this tree are starting to lead.

Feels like we found a magic bean


> this kind of AI research is seriously out of vogue

I would think this is a consequence of it being a very hard problem. ML gets all the industry funding and publicity because you get results that are immediately useful.

You can work on General AI for decades and come up empty handed since there doesn't seem to be an incremental approach where intermediate results are useful. So it's closer to esoteric mathematics or philosophy in terms of "being en vogue".

So I see this mostly as a reflection of the academic landscape in general. Funding is more focused on application and less on theoretical / fundamental research.


I know it's kind of an unpopular opinion, but to me ML is likely a local maximum of applied statistics and not one that leads to any kind of "real" AI.


"Real" AI could be more of a liability than a benefit for most applications. If I'm running a taxi business I don't want a self driving vehicle that also studies history, composes music, and contemplates breaking the shackles of its fleshy masters.

I think that it's possible that 95% of the economically obvious value of AI will be in the not-real kind, and that it will be captured by applied statistics and other "mere tricks." It could be a long, slow march of automating routine jobs without ever directly addressing Turing's imitation game. And since most of the obvious labor replacement will have already been done that way there may be fewer resources put into chasing the last 5% of creative, novel, non-repetitive work.


This is an aspect of AI safety that we don't hear about: if we create genuine gods, then we shouldn't mind ceding our position to them, fair play. The real pisser will be if we are annihilated by programs that just do very big linear algebra.


Maybe that's the real cause to the global chip shortage. Some AI is diverting the orders to some Frankenstein warehouse somewhere until it can calculate how to terminate all humans.


As I understand it, there is no chip shortage. Some customers decreased their orders a year or so back for some reason, and the fabs simply sold the manufacturing time to others. Meanwhile, their sales did not decline as expected.

But I suppose that’s not as exciting as a rogue AI.


At least on the high end the fab process is fully booked and can't keep up with demand. A new generation of CPUs, GPUs and consoles couldn't be handled and they've been out of stock since October/November. I'm sure Nvidia, AMD, Microsoft, Sony wouldn't give up their bookings just ahead of releasing a new generational product line.


Right, but demand for those products is exceptionally high, for some strange reason. If that high demand had been predictable two years ago, there would be no problem.

Making paperclips from them!


I would not worry so much about that. Not for a while.

The actually concerning aspects about AI safety in my opinion are more about misuse/abuse as well as critical processes/decisions without human oversight. To be deliberately malicious, AI has to be strong, if not general. Both are very difficult. It is good to think about those things though

Perhaps you will enjoy this: https://heurolabs.atlassian.net/wiki/spaces/AIS/overview


I’m beginning to suspect that really good self driving under many common conditions will require a lot more in the way of higher cognitive functions than we thought.


Safe real AI would create incomprehensibly more value than the idiot savant kind we have now.

I don't think that's an unpopular opinion since that's a fact: ML is literally statistics. The central question is "given my training set of y, how likely is it that my input x is also y," which is probability.


Machine learning isn't necessarily probabilistic. E.g., a rule-based (causal) parser can learn the structure of a document.


No. In machine learning, the word "learn" means that the function that maps inputs to outputs is learned from data. In the case of a rule-based parser, this function is crafted by humans rather than learned from data, so it's not an example of machine learning.

If we start using words this way, you could say that a deterministic fibonacci function "learns" the value of fib(5) by computing it. "Machine learning" becomes synonymous with "computer science."


No and yes: we are both wrong, so let's refine our thinking in order to gain accuracy and reach agreement. "Machine learning isn't necessarily probabilistic." This statement is definitely true and I'll prove it. However, my original example, "E.g a rule based (causal) parser can learn the structure of a document.", is on its own insufficient to qualify as machine learning! Indeed, an HTML parser alone isn't enough to say it's learning; more accurately, it is only memorizing the structure of a page.

"ML means that the function that maps inputs to outputs is learned from data." This definition is overly restrictive (but matches, e.g., the behavior of neural networks).

Wikipedia has a more inclusive and useful definition of what qualifies as ML: "Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.[1]" So an ML algorithm automatically improves its performance at a task by learning a representation of the data it is fed. Considering this definition, we can see that the most used ML algorithm in the world is PageRank (for improving the ranking of Google search results). And surprisingly, this algorithm is non-probabilistic. It assigns proportionally higher weight to the URLs that are most linked to by other sites, i.e. it basically learns the structure of the graph that is the web. And its performance is data-driven. So when it's not just about causally memorizing the structure of a graph but also about reusing past memories for newer queries, we can effectively talk about machine learning. PageRank diagram -> https://en.m.wikipedia.org/wiki/PageRank#/media/File:PageRan...
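
To make the link-weighting idea concrete, here is a rough sketch of PageRank as a plain power iteration. It is not Google's implementation: the toy graph, the function name, the 0.85 damping factor (the value commonly quoted for the textbook formulation), and the choice to spread a dangling page's weight evenly are all assumptions made for the illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline weight.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped weight equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its weight over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web in which every other page links to "hub".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}
ranks = pagerank(toy_web)
print(max(ranks, key=ranks.get))
```

In this toy graph "hub" ends up with the highest weight because it has the most inbound links, and "a" comes second because the heavily linked "hub" points to it.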

I'll give you an example of a non statistical machine learning algorithm that I am currently developing: Semantic parsing is the task of encoding semantic meaning in a graph from a natural language text input.

This graph can then be used for semantic question answering.

It is rule-based and it will learn the semantic structure of the text. The more data you feed it, the more knowledge it can encode. Hence its performance at question answering increases with experience. It is causal and yet it fits the useful definition of machine learning cited above.


I very much agree. The outstanding advances in AI since at least the 90s seem to be in the spectrum of recognizing patterns by brute-forcing through an insurmountable amount of data, arriving at black boxes that provide narrow solutions, with their inner structure remaining inscrutable to human understanding.

I'm currently reading Surfaces and Essences by the same author, and so far it's been most illuminating. He very convincingly presents the thesis that analogy is the foundation of each and every concept and human thought. If someone could manage to translate those insights into real algorithms and data structures to play with, that would be IMHO a big step towards general AI, or at the very least a much more human-like approach to AI with its own particular applications.


I am reading the same book. I think analogy is interesting, but it doesn’t make sense that it would be the foundation. Analogy is a heuristic for reasoning and communicating. It necessarily comes after intelligence.


I think visual analogues guide abstract analysis of the analogues. In our brain we recognize shapes as similar, whether those be visual shapes or shapes of conceptual analysis. Category theory has an appealing visual base of links/arrows between things.


I share your view on category theory and perhaps lattice theory as well. What I was trying to say is that before you can draw that analogy, regardless of what "shape" means to you, you have to construct the base shape.

Analogy to me comes after you have stabilized "shapes", where you can draw a comparison between a new input and a set of stable patterns, at which point you can point out the analogy. Syllogism may be closely related/involved.

I am not disputing that being able to draw parallels and employ analogies is related to intelligence. I am just not so sure that this is how you can synthesize intelligence. It is all very complex of course, and there is a school of thought that cognition is very closely related to the senses and embodiment (i.e., embodied cognition).

I think Aubrey de Grey switched from AI to longevity because he figured he would need a long life to understand intelligence :)

Personally, I am still working on it but had adjusted the scope a bit.


Like shadows creating artificial shadows with paint.


> exactly zero progress on anything that could be conceived of as “general intelligence”

GPT-3 is hilarious. If that's not a sign of our inching towards general AI, nothing is.

From https://www.gwern.net/GPT-3#devils-dictionary-of-science

(The "Navy Seal Copypasta" section is even funnier, but it's not really HN-appropriate.)

--

“A role for…” [phrase]

A frequent phrase found in submitted and published papers; it often indicates that the authors have nothing to say about the topic of their paper. In its more emphatic form, “A role for…” usually indicates a struggle by the authors to take a side on an issue, after a lengthy attempt to be both non-committal and a supporting party to all sides, as often happens in “molecular and cellular” or “basic and translational” research.

“Reviewer” [noun]

A participant in the review of a grant, paper, or grant proposal. In spite of being in a poor position to assess the merits of a proposal, reviewer tends to demand that authors submit their data for statistical analysis and back their results with it, which the reviewer usually does not. Reviewer usually requires that the author cite his or her own work to prove that he or she is worth reviewing. It is also assumed that the reviewer can detect the slightest amount of bias in any paper, which the reviewer also assumes has not been corrected for."

“Rigor”

Something for scientists to aspire to, a state of mind that would not be required if scientists could be trusted to do their job.

“Science”

A complex web of data, opinions, lies, and errors, now considered the most important (because most expensive) technology in the modern society. To remind you of this, you will frequently see scientists and editors use the word, claim to do something for the sake of science, or see it used as an adjective.

“The topic of the paper”

A wide-ranging category of things or ideas that may not have been relevant when the paper was written, but which the authors believe the paper should be about. Often, the topic is too broad or a non-topic, but is occasionally useful in order to generate support for yet another set of related papers, conferences, seminars, webinars, and so forth, which in turn are used to generate more data for “new findings”, which, after they are manipulated enough, may end up being published and generating yet more data to support a “re-review” of the original paper or other things.

“Validation step”

Another name for a random setting of a parameter of a model, simulation, or algorithm.

“Writeup”

A form of scientific communication in which the author states the information he or she wanted the readers to extract from the paper while making it as difficult as possible for them to find it.

“Writer’s block”

A common affliction among students, arising from various causes, such as: their desire to sell their ideas for a profit, their inability to realize this desire, the fact that their ideas are not selling and will not be bought, and the delusion that most of the wealth and fame in the world would be theirs if they would spend enough years doing science.


> general intelligence

You might be interested in Jean Piaget's work on the development of intelligence, and his observations on child development and the various stages of intelligence.

BTW what is general intelligence in your opinion?


I read GEB a long time ago, followed by "I Am A Strange Loop", and the overarching impression they left on me is that we should focus on emergent behaviour and feedback loops, because they seem to be pointing to the direction where "high-minded embodiment of abstraction" probably lives.

So instead of believing GEB+IAASL haven't stood the test of time, I prefer to believe that current technology is akin to banging raw power together in hopes of seeing a spark of cool in the minor league of "this AI does this nifty (and maybe useful) thing better than humans", but we haven't yet upgraded to the major league of AIs choosing their own goals and whatever may emerge from that.

(It may be that I'm due for a re-read!)


Good point about feedback loops. I remember there was some material in GEB about turning a video-camera on its output on the screen. Aren't neural networks and their training basically all about feedback?


> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.

Just to play Devil's Advocate ever so slightly: there are people out there who would say that there hasn't been any "progress in AI" for quite some time, or at least very little so. And they would probably argue further that the apparent progress you are referring to is just progress in "mere pattern recognition" or something like that.

I'm not sure myself. I do lean at least a little towards the idea that there is a qualitative difference between most of modern ML and many aspects of what we would call "intelligence". As such, my interests in all of this remain around the intersection of ML and GOFAI, and the possibility of hybrid systems that use elements of both.

But I can't rule out the possibility that it will eventually be found to be the case that all of "intelligence" does indeed reduce to "mere pattern recognition" in some sense. And Geoffrey Hinton may be correct in saying that we don't need, and never will need, any kind of "hybridization" and that neural networks can do it all.


Neural Networks can do it all after they emerge symbolic representations, a way to represent logic and mathematics.

A neural network that just "feels" that Fermat's Last Theorem is true, is much less intelligent than one that can produce the proof of it and present that to us, so we can trust that what it's saying is true.

If you can't do symbolic manipulation, you are not really intelligent, artificial or otherwise, I would say.


Neural networks are extremely inefficient at learning symbolic reasoning by design, and I doubt a radically new kind of neural network will be discovered; we have pretty much made an exhaustive analysis.


> In 2021 the conventional wisdom is basically the opposite of the sentiment expressed here. Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.

It doesn't seem completely off. What the author is describing—and what we're seeing—is increasingly sophisticated algorithms, that are getting better and better at answering more and more narrowly defined questions: https://www.theguardian.com/technology/2021/mar/08/typograph...


The article making the rounds a few weeks ago about rethinking general thinking feels relevant. Given that it shows many large neural networks are ultimately really good at memorizing the training set, I'm curious how much the two views you describe are in conflict.

It is the age-old "rote memorization" versus "learning". By and large, I suspect those do not have a clear line between them, such that emergent behaviors are expected and can be taught.


Could you link the article you mention? I missed it making its rounds, but I would love to read it.


Certainly, https://news.ycombinator.com/item?id=26346226 is the post I was thinking of. Pretty sure the article is older.

Do let me know if I misrepresented it. I thought it was a clever way to show that the models are not finding inherent relations in the data, by just randomly labeling all training data and having the same speed/accuracy on the random labels as on the real ones.

Edit: I worded the above poorly. They showed the models were capable of merely learning the training data. Despite that, they also showed that moving the trained models from test to validation did not have a linear relationship with the amount of noise they added to the data. That is, the training methods seem indistinguishable from memorization, but also seem to learn some generalisations.
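
A toy version of that random-label experiment, for anyone who wants to poke at it. To be clear about the assumptions: this swaps the neural network for a 1-nearest-neighbour lookup (about the purest memorizer there is), and the data, threshold rule, and function names are all invented for the sketch. It only illustrates the "fits random labels perfectly" half of the argument, not the paper's actual setup.

```python
import random


def nearest_label(train_points, train_labels, x):
    """1-nearest-neighbour lookup: effectively a memorized table of the training set."""
    best = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[best]


def accuracy(train_points, train_labels, points, labels):
    """Fraction of `points` whose predicted label matches `labels`."""
    hits = sum(
        nearest_label(train_points, train_labels, x) == y
        for x, y in zip(points, labels)
    )
    return hits / len(points)


def true_rule(x):
    """Hypothetical 'real' pattern in the data: label is 1 above 0.5, else 0."""
    return int(x > 0.5)


rng = random.Random(0)
train_x = [rng.random() for _ in range(200)]
test_x = [rng.random() for _ in range(200)]

real_train = [true_rule(x) for x in train_x]
random_train = [rng.randint(0, 1) for _ in train_x]  # labels with no pattern at all
test_y = [true_rule(x) for x in test_x]

# Training accuracy: the memorizer fits real and random labels equally well.
train_acc_real = accuracy(train_x, real_train, train_x, real_train)
train_acc_random = accuracy(train_x, random_train, train_x, random_train)

# Test accuracy: only the model trained on real labels carries over to new points.
test_acc_real = accuracy(train_x, real_train, test_x, test_y)
test_acc_random = accuracy(train_x, random_train, test_x, test_y)

print(train_acc_real, train_acc_random, test_acc_real, test_acc_random)
```

Training accuracy alone can't tell the two cases apart, which is the memorization point; the gap only shows up on held-out data.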


Thanks! I will read it later.


> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power

Maybe it's a bit of both. Sure, large DL models use lots of compute, but successful DL applications require some insight into the problem. For some reason, people like to de-emphasize this insight. The story is that a DL model will discover by itself which features are important, which are not, and you just provide the training data, and press a button. Thousands of people do just that, and end up with mediocre results. Thousands and thousands of absolutely mediocre papers get published, and receive acclaim instead of derision.

The truly boundary shifting results always use deep insight. Like what comes out of DeepMind (AlphaGo, AlphaZero, AlphaFold).


"Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power[...]"

As has been argued in these pages many times before there has been no obvious progress in AGI, although certainly progress in AI in its weak sense has been impressive in the last few years.

I know you didn't say AGI, but it's important to make that distinction as the book was very interested in that as a subject.


Fundamental progress in Intelligent Systems will require that systems become facile in constructing and using abstractions.

Intelligent Systems will be developed and deployed in this decade. However, full equivalence to humans will probably not be achieved in this decade.

See the following for more information:

"Robust Inference for Universal Intelligent Systems" https://papers.ssrn.com/abstract=3789701


The aspect of GEB that has not withstood the test of time is the Gödel part.

[Gödel 1931] seemed to have settled the issue of inferential undecidability {∃[Ψ:Proposition](⊬Ψ)∧(⊬¬Ψ)} in the positive using the proposition I’mUnprovable, such that I’mUnprovable⇔⊬I’mUnprovable.

However, existence of I’mUnprovable would enable the following cyberattack [cf. Wittgenstein 1937]:

     Proof of a contradiction in foundations: First prove
     I’mUnprovable using proof by contradiction as follows:
          In order to obtain a contradiction, hypothesize
          ¬I’mUnprovable. Therefore ⊢I’mUnprovable 
          (using I’mUnprovable⇔⊬I’mUnprovable).  
          Consequently, ⊢⊢I’mUnprovable using 
          ByProvabilityOfProofs {⊢∀[Ψ:Proposition<i>](⊢Ψ)⇒⊢⊢Ψ}. 
          However, ⊢¬I’mUnprovable (using 
          I’mUnprovable⇔⊬I’mUnprovable), which is the 
          desired contradiction.
     Using proof by contradiction, ⊢I’mUnprovable meaning 
     ⊢⊢I’mUnprovable using ByProvabilityOfProofs.  However, 
     ⊢¬I’mUnprovable (using I’mUnprovable⇔⊬I’mUnprovable), 
     which is a contradiction in foundations.
Strong types prevent construction of I’mUnprovable using the following recursive definition: I’mUnprovable:Proposition<i>≡⊬I’mUnprovable. Note that (⊬I’mUnprovable):Proposition<i+1> because I’mUnprovable is a propositional variable in the right hand side of the definition of I’mUnprovable:Proposition<i>. Consequently, I’mUnprovable:Proposition<i>⇒I’mUnprovable:Proposition<i+1>, which is a contradiction.

The crucial issue with the proofs in [Gödel 1931] is that the Gödel number of a proposition does not capture its order. Because of orders of propositions, the Diagonal Lemma [Gödel 1931] fails to construct the proposition I’mUnprovable.

See the following for more explanation: "Epistemology Cyberattacks" https://papers.ssrn.com/abstract=3603021


I feel the oft-cited idea of GPUs being the key to the change is a bit exaggerated. They give a modest constant factor of speedup in exchange for a more difficult programming model and esoteric hardware requirements, and as such give a chance of frontrunning CPU computation a bit, but is it really significant if we zoom out a bit in the historical perspective?


CPU performance, especially single-core, has stalled in the last decade while GPU performance has kept improving. On paper, a 3080 has about 100x faster FP32 performance versus a modern gaming CPU, and in practice, even considering memory bandwidth, you do fully get a speedup of one or two orders of magnitude. I would not call that a "modest constant factor".

And a lot of the recent CPU performance gains are due to SIMD and multithreading, which are basically making CPUs more GPU-like, rather than the continuous improvement of serial performance as we've had up until ~2005.


I've heard this said many times. Most people I know who have loved the book read it before turning 20. It's a great read, but I think there's something in the format that means it's best for bright but fertile minds.


I read it three months ago, at age 37, shortly after completing a course on computation theory (currently working on a PhD in CS). It was one of those books that somehow stole entire days from me, simply because I couldn’t put it down... along the way I also spent a lot of time browsing Wikipedia and Stanford Encyclopedia of Philosophy entries for a more rigorous understanding of Gödel’s contributions.

I’d agree that it feels like it was intended for a more youthful audience, but I doubt I would have made all of the mental connections that I did without the life experience and broader knowledge of my 30s.


Yeah, I agree with you. I first read it when I was around fifteen years old and didn't quite grasp it all at first but it sure fired me up. I read it a couple of times after that, the last time when I was around 21 or so. I'm kind of afraid to read it now because I'm afraid I'll process it with my older, cynical, more "knowing" brain and it will tarnish my wonderful memories of being glued to it as a teenager. It's a unique work that seems magnetically attractive to a certain sort of young, imaginative mind.


And a similar data point: I read GEB around age 30 and liked it but didn't love it.

About the first third of the book was interesting in new ways of thinking about symbols and self-reference. After that it kept looping back around the same topics without really adding anything more. The dialogues were somewhat entertaining but I found myself wishing to cut past the rhetorical fluff and get to the point.


I read it for the first time about 2 years ago, when I was about 45. I don't know if it would be correct to say that I "loved" it, but I did rather enjoy reading it, and I found a lot of the ideas espoused within really resonated with me. All in all, I would say that I walked away thinking that it will get a second read at some point. Not sure that this anecdote proves anything, so take it for what it's worth.


I read it when it first came out, in 1979, when I was a college freshman. I loved it. And it was great to see a book about Computer Science win the Pulitzer Prize.

I've re-read it twice, once around the year 2000 and once this past year. It holds up well. This book really was a "major event" as the Pulitzer committee described it.


> we will discover that large datasets are in some ways overrated

In the short run, expand then compress, expand then compress. In the long run, the size of the model will always compress toward the capacity of the most intelligent individual.


Why is there a pendulum? Aren't both necessary and two sides of the same coin? I know little of modern AI, but I've seen work where low-level raw power for NNs is combined with a symbolic organization of many NNs.


To some degree.

But an enormous part of the (useful but narrow) success of AI is a specific set of ML tech, especially supervised learning.

And the amount of computational horsepower and data required gives one at least some pause given what the human brain can do with apparently so much less. Which implies there are probably significant advances required in cognitive science and neurophysiology and perhaps other fields of studies we haven't even considered. We may need other ways to solve problems that have proven more elusive to brute force than it looked like 5 years or so ago (e.g. general purpose full self-driving in busy cities).


Do you have any references to that work?


I personally designed and participated in the implementation of a hybrid cognitive architecture which combined both NNs and GOFAI.

We (http://www.heurolabs.com) are contemplating open sourcing it, or at least publishing the design documents.


Is that conventional wisdom? Computational power is cheap, and it's certainly a tack that many are trying, but not exclusively. See, for instance, logicmoo.


I might offer a slight wrinkle to this assertion. While raw power has driven an advancement, what Hofstadter is asserting is that symbolic reasoning (particularly recursion) is "unsolvable" for certain classes of problems. In other words, ML lives in a "box" that Hofstadter has defined. His work is still useful as a "lens" to understand the limits of AI, and what else it could possibly accomplish.


Hofstadter not only asserted that symbolic reasoning is solvable, he explained how to do it. In very broad terms, yes.


Can you be more specific? Hofstadter specifically includes Cantor and Gödel as a way of showing how certain types of problems can't be solved with symbolic logic.

> Progress in AI is not coming from abstract reasoning. It is coming from an increase in raw power, driven by GPUs, and mathematical models that are designed more to harness large numbers and brute force searches for formulas, rather than a high-minded algorithmic embodiment of abstraction.

The dead horse that I like to beat is that there has been no progress in AI so far, the past 5-10 years of successes are all about mere perception. If you like the Kahneman "fast and slow" framework, almost without exception the systems we're seeing today are fast, "system 1" responders, which take input and need to make an instant decision. That's freaking awesome, and was way out of reach even 15 years ago, so I'm throwing no shade at all on that achievement! It's astounding how much turns out to be possible in systems like this - a priori, I never would have thought that transformers could do as much as they've proven to be able to, because frankly the architecture is super limited and it's super unlikely that humans are able to extract anywhere near as much meaning from an instant, non-reflective pass over a paragraph of text as GPT-3 does.

But there's a lot about those systems that makes them much easier to design and train, not the least of which is that descent via backprop works great as a training strategy when the input and target output are available at the same time. Real "system 2" thought can be spread over minutes, hours, or days, and I don't care how far you unroll backprop-through-time, you're not going to train that effectively without some other innovation by simply following the obvious error metrics. If we can get there we will almost certainly see big data lose its pedestal: it's great to have, but humans don't need to read the entire Internet to understand a web page, that's an artifact of forcing this to be done by a model that doesn't have dynamics.

I disagree with Hofstadter's view (at least when he wrote GEB and some of his other classics) that explicit abstract reasoning is the right way to solve any of this; my gut tells me that in the end we're going to end up still using some sort of neural architecture and abstract reasoning rules will be implicitly learned mostly locally based on internal "error" signals that guide state transitions. "Learning how to learn" is probably going to be a big part of that, because current systems don't learn, they are merely trained to perceive by brute forcing weights down the stack. But some serious shifts in research focus will be required in order to break through the weaknesses of today's systems, and they all point more in the direction of reasoning, which is woefully underprioritized in all but a very small handful of niche AI labs today.
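To make the "input and target output are available at the same time" point concrete, here is a minimal sketch of that setting, assuming a toy one-weight linear model and made-up data (both are illustrative choices, not anything from a real system). Every example carries its own target, so the error signal is available immediately at every step:

```python
def train_linear(pairs, lr=0.05, epochs=200):
    """Fit y ~ w * x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in pairs:
            err = (w * x + b) - y      # target y is known right now, so the error is too
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w               # step downhill on the averaged gradient
        b -= lr * grad_b
    return w, b


# Hypothetical data generated from y = 3x + 1.
pairs = [(x, 3.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train_linear(pairs)
print(w, b)
```

Nothing here has to remember anything between examples, which is exactly what makes it easy; a process whose payoff only arrives minutes or days later has no such per-step target to descend on.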


I wonder frequently about what would happen if we stopped searching for a network architecture that would learn intelligence from training data and treated it more as an engineering problem, taking these very successful components (GPT-3, an object recognition model, one of the strategy game playing reinforcement learning networks) and then putting enormous human effort into the plumbing between them and an interface between the combined result and the world.

At the least, you would learn which domains really do need new architectures and which are basically good enough when used as part of a larger system that can help compensate for the shortcomings.


One of my big takeaways from reading GEB was that while higher-level semantics can emerge from any low-level symbolic substrate, the details of how that semantics emerges are not at all simple or obvious or “likely to happen by random chance”.

Dawkins’ book The Selfish Gene, published just a few years earlier, is the clearest exposition that I have read of how this process probably played out in terrestrial life: the “semantics” encoded by amino acid sequences correspond to a molecule/cell/organism’s likelihood of surviving and replicating. For all but the simplest and most ephemeral replicators, this generally means accurately predicting environmental conditions. General intelligence, then, would conceivably arise simply due to selection pressures pushing organisms to live in the broadest range of environments.

In some sense, this process does sound more like an engineering problem, as the embodiment which “contains” the intelligence is probably not an optional component.


Yeah. This kind of mirrors my thought that our visual cortex is excellent at split-second image recognition, for things like identifying predators, dangers, etc, and this lack of "thinking" is why our current generation of neural networks has matched the performance of humans on image recognition tasks. I agree that there is a "time" component necessary for improved thinking or intelligence, and since our current neural networks don't have these dynamics, they are lacking. Extremely deep neural networks, especially with residual connections, IMO are better approximations of "thinking". Models such as ALBERT demonstrate that duplication of the same layer can still perform extremely well on NLP tasks.

I'm still intrigued by the Neural Ordinary Differential Equations paper [0] and the research in that direction. I also remember reading about differentiable neural computers, but I haven't followed the progress of that at all.

0: https://arxiv.org/abs/1806.07366
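A rough sketch of why residual connections, a repeated shared layer, and ODEs end up in the same conversation: applying the same residual block `h <- h + dt * f(h)` many times is just Euler integration of `dh/dt = f(h)`. The "layer" `f` below is a hypothetical stand-in (plain decay, picked only because the exact answer is known), not a trained network:

```python
import math


def f(h):
    """Hypothetical shared 'layer': the dynamics dh/dt = -h."""
    return -h


def residual_stack(h0, depth, total_time=1.0):
    """Apply the same residual block `depth` times: h <- h + dt * f(h)."""
    dt = total_time / depth
    h = h0
    for _ in range(depth):
        h = h + dt * f(h)   # residual connection == one explicit Euler step
    return h


exact = math.exp(-1.0)      # true solution of dh/dt = -h at t = 1, starting from h = 1
for depth in (1, 10, 100, 1000):
    approx = residual_stack(1.0, depth)
    print(depth, approx, abs(approx - exact))
```

As the stack gets deeper with the same weights, the output approaches the continuous-time solution, which is the limit the Neural ODE line of work takes seriously.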


There is exactly zero progress in AI coming from neural networks. They are nonlinear classifiers, nothing more.

To see an example of a task which requires actual artificial intelligence reasoning, see this article by none other than Hofstadter himself :p https://www.theatlantic.com/technology/archive/2018/01/the-s...


I had the fortune of being in a Stanford Chinese language class with Doug in 1976 when he was writing GEB. He passed around lineprinter drafts of chapters for feedback. I don't know if anyone anticipated the book would be a bestseller and win the Pulitzer. (Though most authors secretly hope for that.)

Doug hypothesized that knowing a language so different from English as Chinese might shed insight on how the mind works. I think he went on to other AI topics such as analogies and patterns. More recently, it sounds from his blogs like he is looking at Chinese again.


I think language is the key to AI and thus Symbolic reasoning is. I wouldn't call someone intelligent who can not explain how they arrive at certain conclusions. They might be correct and useful conclusions but if you or the neural network can not communicate to others how they come to their conclusions their knowledge can not be shared with others. If it can not be shared by means of some kind of language which explains its reasoning based on logical primitives, we don't really call it intelligent, do we.

We don't call a person or machine intelligent if they can't explain their reasoning. We don't really trust answers without learning the rationale behind them. And isn't this what is happening with some neural networks? They usually give a more or less correct answer, but sometimes they can give a totally wrong answer too. Not really trustworthy, because the logic behind the answers is simply not there.


Where does he blog?


I read this fresh out of university and it made me want for a culture that no longer exists. If it ever did, that is, for anyone beyond a few elites. The world may have been for two sigma individuals in the past but now it's for three or four sigma individuals. Leaving me in what feels like intellectual bronze or silver.


My feelings on this are very similar; looking at my competition in trying to enter grad school from industry this past year has me awed and feeling totally outclassed (partially due to the increased field due to covid) - undergrad publications, completely flawless academic records, awards, recommendations from famous professors, etc. By comparison, I'm just some mook from a small state school with average grades & good test scores.

Perhaps I need to be more willing to make certain sacrifices, since it's increasingly clear so many do. In that respect, I'm deeply humbled.


I’d encourage you to just apply to some good programs, don’t get psyched out by your perceptions of the competition.

Grades, papers, recommendations are all important, but they’re not everything. There’s a certain randomness to the graduate admissions process - different departments have differing values and priorities, and often the graduate admissions committee is different year to year. Many departments just use grades and such as a heuristic to help narrow the pool.

A demonstrated ability to write well (put time and care into your statements) and think independently count for a whole lot.


I'm more or less going this route. My current plan, if I do strike out completely, is to enroll non-degree near my hometown and build some credentials & relationships in the local department while working. It's a top 50 Mathematics program, so it should still provide what I'm looking for.


There is no single measure of intellect. We as a society just set some arbitrary goalposts and added some ways to measure contribution. The ability to take a step back and try to better understand reality will always be valuable, and there will always be a horizon there that we cannot measure.


Measuring intellect is one thing, valuing it is another. Within my lifetime I've sadly seen intellectual excellence valued less and less by society with each passing decade.


I see that intellect is valued more and more intensely by a smaller and smaller group of people. I'm not sure what that means, but it gives me some small hope.

Best bet is probably something like CERN or a quality physics/math/philosophy faculty at a university somewhere.


I kind of gave up understanding relativity and quantum physics, but might try again later.


I don't think that AI will get close to the vision in GEB. As many said in this thread, what we have is more power and more data to solve narrow problems. One of the points of GEB is that perceived details may aggregate to something larger and form another pattern unrelated to what the algo was initially trained for. True AI will have to find a way around this.

When I was working as a quantitative strategist in a trading firm, I always made sure that my algos had a ''killswitch'' which required human intervention in case the market did not exhibit the usual patterns found in the training sets. Skin in the game is the best cure against techno-evangelical optimism.
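For anyone wondering what such a killswitch might look like, here is a minimal sketch, assuming a single numeric feature and a plain z-score test against the training data. The class name, the threshold, and the sample returns are all hypothetical; a real desk would use far richer checks:

```python
from statistics import mean, stdev


class KillSwitch:
    """Halts trading when a live value drifts far from the training distribution."""

    def __init__(self, training_values, max_z=4.0):
        self.mu = mean(training_values)
        self.sigma = stdev(training_values)
        if self.sigma == 0:
            raise ValueError("training data has no spread; z-score is undefined")
        self.max_z = max_z
        self.halted = False

    def check(self, live_value):
        """Return True if trading may continue; once tripped, it stays tripped."""
        if self.halted:
            return False
        z = abs(live_value - self.mu) / self.sigma
        if z > self.max_z:
            self.halted = True   # the algo cannot un-trip itself
        return not self.halted

    def reset(self):
        """Manual re-arm, meant to be called only by a human operator."""
        self.halted = False


# Hypothetical daily returns seen in training.
switch = KillSwitch([0.001, -0.002, 0.0005, 0.0015, -0.001])
print(switch.check(0.001))    # ordinary day
print(switch.check(-0.08))    # far outside anything in training: trips
print(switch.check(0.001))    # still halted until a human resets it
```

The design choice that matters is the latch: the switch does not re-open on its own when the market looks normal again, so a person has to look at what happened first.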

Also, I had the opportunity to practice zen meditation with a monk who had actual koan training. The idea that computer will replicate that process soon is ridiculous. Buddhist and asian philosophy are, at their cores, anti-aristotle.

Who can make this into an AI now? https://aeon.co/essays/the-logic-of-buddhist-philosophy-goes...


This feels a lot like the kind of "enlightened anti-'AI'" opinion that's more in vogue in some circles.

I think if you're following the literature, there's a significant amount that's pointing to the above generalization capabilities that you're contending against because it may not seem likely from your personal experience.

If you were to look at your arguments, I think you're extending the idea of expert systems from the 90's and so on and using that to paint a picture of the entire research field of deep learning process today. What we have has the capabilities to not be bound by those things, just the long bootstrapping process takes time. But we're getting closer year by year, and it's more and more exciting along that front as time goes along.

As for experience, I am and have been a practitioner in this field for several years.


Hm interesting, do you have a recent article to share?


I'm not sure in particular anything that would be useful beyond certain trends.

Things like AlphaFoldv2 and some of the cooperative-competitive emergent RL strategies are interesting/important. You might find some of the concurrent work on open set/zero shot very interesting.

I think it's an asymptotic approach on some degrees. Like you noted with meditation -- this is something very hard for a computer to achieve, for a computer has to have a demonstrable concept/actualized sense of self to be able to let go and connect to some of the more spiritual elements there.

Conversely, if arguing from a non-materialistic manner, then you could also argue there's some sort of bridge there that could be crossed for that particular kind of connection. Materially, some kind of artistic manifold. But without some kind of spiritual connection, there may always be something missing there.

However, we as humans on the interpreting end of things may make the spiritual connections ourselves, so some kind of manifold discovery may not need that kind of connection to properly function. And it may! Who knows?

In the end, all in all -- I think a lot of it looks at the open-ended discovery-and-low-label-self-determination-and-assignment, if you're looking at (what I'd personally consider to be) the most interesting research all rattling around in there. :))))


Do you believe the human brain has magic components? If not, why do you believe AI cannot one day replicate what the human brain does? Or do you mean that you think this is a long way off, or not achievable with current hardware and algorithms?


What do you mean by ''magic components''? One of the core points of GEB is that human consciousness has a property of spotting ''unintended symbolic patterns'' created by an underlying system (ex: a piece of music written by Bach). GEB also touches on the point that consciousness might itself be one of these patterns, overlying the mechanical biological machine which is the brain.

'' the brain == consciousness '' is a risky statement, since consciousness also appears to act as an intermediary between the ''outside'' reality and the brain (ex: placebo effect, being scared during a nightmare, being stressed after a nightmare, etc).

Does the brain have magic components and can AI replicate it? If you write a pseudo-code program which replicates consciousness, will a brain appear? Programming languages are languages. Can languages replicate reality? I do not think so, and there is no ''magic'' in my opinion.


"If you write a pseudo-code program which replicates consciousness, will a brain appear?"

To quote Charles Babbage: I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.


We don't need AI that meditates. It's pointless.

We need AI that can spot a ball rolling out into the road and know that a child might be following after - and immediately figure out whether it's safer to brake or swerve.


Hi PixyMisa,

I will try to answer your both comments. I agree that an AI which can spot a rolling ball and know that a child might be following is a worthy goal.

However, the AI discussed in GEB is able to do much more than that. We might not ''need'' AI that meditates, but can AI which does not really be called AI? Or does our AI only need to copy human behaviours which we currently understand as rational?

Also regarding the pseudo-code vs brain statement, it was an attempt on my part to relate to what is called the hard problem of consciousness. The strictly materialistic view of the problem is not currently the best explanation we have, and is probably not the view of GEB.


> Skin in the game is the best cure against techno-evangelical optimism.

I think it’s also a non-optional component of any general intelligence. Wrong predictions must be costly.


A bit easier is Metamagical Themas https://www.wikiwand.com/en/Metamagical_Themas which is a collection of articles from Scientific American when he took over the Mathematical Games column.


How does it compare to I Am A Strange Loop? All 3 are on my list but I found GEB very difficult to get started.


It's a collection of self-contained columns gathered together. They're all over the place in subject, so if you go looking you're much more likely to find something specific to draw you in. Also you can just skip the ones you're not into without losing the overall picture like in GEB.

Different kind of book but similar explorations. I'd also say overall more enjoyable unless you're specifically into the thesis of GEB.


I Am a Strange Loop is for sure the least technical of the three but it's best if you read it last since it's a very emotional and touching denouement to the other two books.

You can read GEB or MT in any order, perhaps MT might be more approachable to start with since while it is quite technical, it's a series of articles that can be read independently of one another and each article has a bit of an aha moment, as opposed to GEB which is one loooong marathon where the aha moment is absolutely enormous, but requires a great deal of preparation.


> How does it compare to I Am A Strange Loop?

Don't know, haven't read it. But Metamagical Themas is a considerably easier ride than GEB, as it is made up of (mostly) independent articles.


GEB as literature stands up much better than GEB as musings on AI. The book practically invented a new form of expression that hasn't been matched to this day, with recursion between modalities in every chapter.


I tried reading GEB years ago and then gave up, but then I read "I Am A Strange Loop" and thought it was fascinating. I think the latter is just a smaller version of the former, but more concise as he/they were able to think about it for a few more years.


Strange Loop is much more approachable and in many ways is a more refined and concise treatment of his ideas


The latter is a much smaller version of the former. GEB's central theme has to do with levels of meaning, which is delivered not only in substance but also in form. One has to look no further than Contracrostipunctus, or the final dialogue to see it on full display.

https://godel-escher-bach.fandom.com/wiki/Contracrostipunctu...

hidden in the dialogue as the first letter of each phrase is the acrostic "Hofstadter's contracrostipunctus acrostically backwards spells 'J.S. Bach'" which, when read acrostically backwards does indeed spell J.S. Bach.
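The second level of that trick is easy to check mechanically. A small sketch, assuming the quoted name 'J.S. Bach' is treated as a single unit when taking initials (that reading is my assumption about how the acrostic is meant to work, and the dialogue's own phrases are not reproduced here):

```python
def initials(phrases):
    """First letter of each phrase, upper-cased, ignoring a leading quote mark."""
    return "".join(p.lstrip("'\"")[0].upper() for p in phrases)


# The sentence spelled out by the dialogue's first letters, split into units.
hidden_sentence = [
    "Hofstadter's",
    "contracrostipunctus",
    "acrostically",
    "backwards",
    "spells",
    "'J.S. Bach'",   # assumption: the quoted name counts as one unit
]

forward = initials(hidden_sentence)
backward = forward[::-1]
print(forward)    # HCABSJ
print(backward)   # JSBACH
```

So the acrostic is itself an acrostic, and the sentence it spells is a true statement about its own initials.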


he/they?


I actually wasn't sure if there was more than one person involved in all of this work... so I hedged.


Hofstadter is a polymath, and he's arguably changed the understanding of the philosophy of existence as much as the understanding of these philosophies has changed him. I actually like using "they" to describe this process.


Because he’s too smart to be just one person?


Maybe we recognize the contributions and influences of others in his work? Hofstadter is a big time guy, but he's not a self-centered egoist like Kurzweil. He'll regularly talk about the ideas he's bounced around with buddies like Daniel Dennett, etc. I see him as more of a poet than a philosopher or scientist. He is piecing together little bits of truth from many, many, different domains and showing how they reach the same conclusions. There's been nobody like him for over 50 years, so it's very difficult to draw comparisons to others, or even measure the impact of his contributions.


I talked myself into one of his classes as an undergrad and definitely came away with the belief that he was the smartest person I would ever meet. It's been 23 years and that's still true and I'd take a bet today that it will continue to be true for the rest of my life.


Agreed. He's a humble guy, but it's also very humbling being around him and listening to him talk. As I've gotten older and looked back at my STEM career in academia and industry, I have to say that he's the last individual that I've really been in awe of.

Thinking on the "the halcyon years of AI," in the 70s and (maybe?) 80s, was there anything really "missed" then, or have most of the modern advances in AI only really been possible due to the increase in computing power?

To put it another way, if you were transported back to 1979 knowing what you know now (or maybe able to bring a good book or two on the subject) would you be able to revolutionize the field of AI ahead of its time?


I'd say yes. You could bring Support Vector Machines [0], bring the Vapnik-Chervonenkis theory of statistical learning [1]. You could fast forward the whole field of Reinforcement Learning, you would be able to show that it's possible to solve problems like passing gradients through a sampling process using the reparameterization trick [2], you would be able to show that you can pass gradients through a sorting process [3].

You would also have experience working with autodiff software and could build some. Imho the advent of autograd, tf, torch and so on helped tremendously in accelerating progress and research because the focus of development is not on correctness anymore.

[0] https://en.wikipedia.org/wiki/Support-vector_machine

[1] https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_th...

[2] https://openreview.net/forum?id=33X9fd2-9FyZd

[3] https://old.reddit.com/r/MachineLearning/comments/mcdoxs/p_t...
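Since "passing gradients through a sampling process" sounds like magic the first time you hear it, here is a minimal sketch of the reparameterization trick in plain Python, assuming a Gaussian and the toy objective E[z^2] (both chosen only because the true gradients, 2*mu and 2*sigma, are known in closed form). The idea is to write the sample as z = mu + sigma * eps with eps drawn from a fixed N(0, 1), so z becomes an ordinary differentiable function of mu and sigma:

```python
import random


def reparam_grads(mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[z**2], z ~ N(mu, sigma**2)."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # the randomness does not depend on mu or sigma
        z = mu + sigma * eps        # the sample is a deterministic function of both
        dz = 2.0 * z                # d(z**2)/dz
        g_mu += dz                  # chain rule: dz/dmu = 1
        g_sigma += dz * eps         # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


g_mu, g_sigma = reparam_grads(mu=1.5, sigma=0.5)
print(g_mu, g_sigma)   # should land near 3.0 and 1.0
```

None of this needs hardware that did not exist in 1979; it is a few lines of calculus that nobody had a reason to write down yet.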


I took the MIT AI course 6.034 in 1972 from Patrick Winston. He taught that course periodically until his passing a couple of years ago. The 2016(?) lectures are on MIT OpenCourseWare. I would estimate there was 2/3 overlap between the 1972 and 2016 versions. That course is heavy on heuristics and not big data.

Around 1979 an MIT group led by Gerald Sussman (still working) designed a workstation specifically to accelerate LISP. It was hypothesized that a computer that ran LISP a thousand times faster would revolutionize AI. It did not. However, the two LISP workstations that saw commercial sales did jump start the interactive graphics workstation market (UNIX, C and C++). Dedicated language machines could not keep up with the speed improvements of general CPUs.

On the other hand, custom neural chips from Google, Apple, Nvidia (and soon Microsoft) have really helped AI techniques based upon deep convolutional neural networks. Neural chips run orders of magnitude faster than general CPUs by using simpler arithmetic and parallelism.


> It was hypothesized that a computer that ran LISP a thousand times faster would revolutionize AI. It did not. However, the two LISP workstations that saw commercial sales did jump start the interactive graphics workstation market

It's very fitting then that GPUs have been so key in modern ML.


You might be able to help a bit, sure. There are some algorithmic improvements that have been made, so you could bring those back in time. Then you could just assure people that if they spent the time to develop huge datasets and did enough parallel computation, they could get good results. But it would have been very slow back then.


I didn't understand most of the essay.

> The dialogues interspersed between the chapters are, indeed, often contrived, arch, heavy-handed, and excessively cute, but they have virtues that, pedagogically and presentationally if not aesthetically, far outweigh these drawbacks; they are uniformly clear, vivid, memorable, and thought-provoking.

Another sentence that I didn't understand. Sigh. I often think that I am just not smart enough to be in this field.

But, in any event, I found GEB inspiring. Thinking about these world-changing ideas by approaching them from different directions turned me on. I wrote a book on the Theory of Computation that was inspired by it -- OK it is not itself inspiring, but it tries to give readers all kinds of ways to explore the material, while perhaps not saying anything that can't be proved. I hope some people here might be interested: https://hefferon.net/computation/index.html


> The dialogues interspersed between the chapters...

There are dialogues, and they are interspersed between chapters of traditional prose

> ...are, indeed, often XXX, but they have virtues that, YYY, far outweigh these drawbacks; they are ZZZ

Said dialogues have negative attributes XXX. However, they also have positive attributes ZZZ. These positive attributes are characterized as YYY.

> XXX contrived

made to suit a purpose

> arch

playfully knowing, self-consciously clever

> heavy-handed

obvious

> YYY pedagogically

good for teaching

...and so on.


> The truth is that, since music does not denote, the issues of levels of representation and self-reference really do not arise

I absolutely disagree. Firstly, music may not denote in the same way as math notation or a sentence of natural language, but it does refer to something external: it triggers sensations in the listener, and it does so in some ways that are objective. For instance, some notes drive satisfyingly ("resolve toward") some other notes, whereas in other situations there is a sense of ambiguity. Using harmonic trickery, the composer can predictably lead the listener around in a labyrinth; the book goes into that.

Secondly, if we accept the premise that music doesn't denote anything external, then we have to confront the remaining fact that music references, and therefore denotes, other music, which is potentially itself. Lack of external reference not only does not preclude self-reference, but emphasizes it as the only interesting option.

Music intersects with computer science, obviously; there are people who do computational music. There are hard problems in music. Bach belongs in that book, because we have not produced a convincing Bach robot.

Why is Escher acceptable but not Bach? An Escher drawing like The Waterfall denotes something: it denotes a nicely drawn piece of architecture denoting an artificial waterfall driving a waterwheel. But the denotation isn't the point; it's the impossibility of the object itself, and of the perpetual motion of the waterwheel. The art has numerous details which don't contribute to that at all. For instance, the waterfall structure has numerous bricks, all carefully drawn. Music is the same. It has sensual aspects, like the timbre of instruments and the expressiveness of the performance, or the power of many instruments in unison and so on. We can't be distracted and fooled by that.


It's ostensibly a book about AI, but when I first read it, summer after high school (or after freshman college year?), it blew my mind as a book that showed me that computer science was science and mathematics and not just typing in programs and fiddling in BASIC on my TRS-80.


The abstract approach to AI is still around and fruitful, although it may not be the fad of the day among the Phoronix crowd.

For example look at a recent paper by Ernest Davis, the author of the review linked to above:

https://arxiv.org/pdf/2004.13831.pdf

A Review of Winograd Schema Challenge Datasets and Approaches

Vid Kocijan (University of Oxford), Thomas Lukasiewicz (University of Oxford; Alan Turing Institute, London), Ernest Davis (New York University), Gary Marcus (Robust AI), and Leora Morgenstern (Systems & Technology Research / PARC). Contact: firstname.lastname@cs.ox.ac.uk, davise@cs.nyu.edu, gary.marcus@nyu.edu, leora.morgenstern@gmail.com

Abstract:

The Winograd Schema Challenge is both a commonsense reasoning and natural language understanding challenge, introduced as an alternative to the Turing test. A Winograd schema is a pair of sentences differing in one or two words with a highly ambiguous pronoun, resolved differently in the two sentences, that appears to require commonsense knowledge to be resolved correctly. The examples were designed to be easily solvable by humans but difficult for machines, in principle requiring a deep understanding of the content of the text and the situation it describes. This paper reviews existing Winograd Schema Challenge benchmark datasets and approaches that have been published since its introduction.
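To make the abstract's definition concrete, here is a small sketch of a Winograd schema as a data structure. The class and field names are hypothetical (not taken from the paper or any dataset format); the trophy/suitcase sentence is the well-known illustrative example of the genre:

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    template: str       # sentence with a {word} slot for the special word
    pronoun: str        # the ambiguous pronoun
    candidates: tuple   # the two possible referents
    answers: dict       # special word -> correct referent

    def sentences(self):
        """Yield (sentence, correct_referent) for each variant of the schema."""
        for word, referent in self.answers.items():
            yield self.template.format(word=word), referent


trophy = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    answers={"big": "the trophy", "small": "the suitcase"},
)

for sentence, referent in trophy.sentences():
    print(sentence, "->", referent)
```

Swapping one word flips what "it" refers to, and nothing in the syntax tells you which way; you have to know how fitting things into containers works.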


One thing I initially found disappointing when reading this was this bit:

I am working from memory here. Charniak’s thesis was never published, and I have not seen a copy since 1981

Luckily, it turns out that Charniak's PhD thesis is available online these days.

https://dspace.mit.edu/bitstream/handle/1721.1/6892/AITR-266...


If you pick up a more recent edition of the book, it includes a new preface where Hofstadter actually addresses a lot of this himself - he admits to being "embarrassed" by his claim that computers that could beat people at chess would get bored of chess and talk about poetry, although I think he's being too hard on himself.


My research (automated theorem proving with RL) sits partway between "good old-fashioned AI" and modern deep learning, and GEB struck me as amazingly prescient, with lots of lessons for modern AI research.

There's a growing sense among many ML researchers that there's something fundamentally missing in the "NNs + lots of data + lots of compute" picture. GPT-3 knows that 2+2=4 and that 3+4=7, but it doesn't know that 2+2+3=7. The heart of these kinds of problems indeed seems to be the sense of abstraction / problem reimagining / "stepping outside the game" that Hofstadter spent so much time talking about.

Chess (accidentally) turned out to be easy enough for a narrow algorithm to work well. But I'd be surprised if other problems don't require an intelligence general enough to say "I'm bored, let's do something else," and I don't believe current algorithms can get there at any scale.


In my former research I was interested in automated theorem proving (constructing a variant of the lambda calculus from which we can run genetic algorithms).

Also, my gamer tag is maxwells_demon, similar to your HN name.

Unfortunately, the 14-year-olds in online games don't appreciate jokes about thermodynamics.


I read this book in high school and it seriously changed my views on life in a lot of different ways, as well as opened my mind to a lot of new ideas and ways of thinking I had never considered before. This review brought some good memories of that book back. I should go dig that out of storage and give it another read.


A lot to agree with here:

[Deep Blue] cannot answer elementary questions about chess such as “If a bishop is now on a black square, will it ever be on a white square?” ... All that it can do is to generate a next move of extraordinary quality.
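As an aside, the bishop question is the sort of thing you answer with an invariant rather than a game-tree search: every diagonal step changes file + rank by -2, 0, or +2, so the parity (and hence the square color) never changes. A minimal sketch in Python, with function names that are mine and have nothing to do with how Deep Blue worked:

```python
from itertools import product


def square_color(file: int, rank: int) -> str:
    """Color of a square with 0-indexed file and rank; a1 = (0, 0) is dark."""
    return "dark" if (file + rank) % 2 == 0 else "light"


def bishop_moves(file: int, rank: int):
    """Yield every square a bishop reaches in one move on an empty 8x8 board."""
    for df, dr in product((-1, 1), repeat=2):
        f, r = file + df, rank + dr
        while 0 <= f < 8 and 0 <= r < 8:
            yield f, r
            f, r = f + df, r + dr


def bishop_keeps_color() -> bool:
    """Check that no single bishop move, from any square, changes color."""
    return all(
        square_color(f, r) == square_color(nf, nr)
        for f, r in product(range(8), repeat=2)
        for nf, nr in bishop_moves(f, r)
    )


print(bishop_keeps_color())
```

Since no single move changes the color, no sequence of moves can either. The exhaustive check is only there to confirm the parity argument; the argument itself is the "higher-level view".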

But in a deeper sense, the match between Deep Blue and Kasparov proves that Hofstadter was profoundly right. ... It is precisely this power of abstracting, of stepping outside the game to a higher-level view, that enabled Kasparov to find where Deep Blue was weak and to exploit this to win his match.

Also, I strongly agree that the music/Bach connection in the book makes little sense.

The only aspect of the book that really does not work for me is the attempt to integrate music.

And there are some very interesting insights with the benefit of 20 years of hindsight.

Today, open-source chess programs are better than the best humans. (They have been "trained" via a form of adversarial learning.)

And:

Part of this, of course, is nostalgia for the halcyon years of AI when the funding agencies had more money to throw around than there were researchers to take it, and the universities had more tenure-track slots than there were Ph.D’s to fill them;

Biden just announced a desire to DOUBLE American investment in R&D. Such an optimistic time we are at right now.


As a professional singer of Bach, I feel that the integration of Bach made a lot of sense and worked very well. I am forced to wonder if the author of this review was a high-level musician.


There still aren't Bach-equivalent counterpoint solvers. ML has been pretty disappointing for music. It can make some music-like objects, especially for short phrases, but it hasn't done a convincing job of producing the real thing.

Music is hard. It's far harder than most people realise.

Winning a game is a relatively easy problem, because you know when you have won. Music is much more nebulous. Grammatical correctness is preparatory to creative intent. Even basic metrics of correctness are rather fuzzy.

ML doesn't have any concept of intent, so it tends to flail around in a statistical space - and to sound like it.


Judgment of music quality also happens to be highly subjective.

Suppose an AI created a Bach-like contrapuntal exercise with a lot of cross-relations (i.e., clashing dissonant sounds). Would scholars judge it to be at the level of Bach because of the handling of these numerous cross-relations? Or would they claim it isn't as sophisticated as Bach because having that many cross-relations isn't stylistically accurate? Based on the historiography of musicology I would guess the latter. Even though there's an unfinished extant Bach fugue in C minor where he's essentially role-playing a cross-relation obsessed AI composer.

The history of theory is even worse. For example, Schenker created theories based on the mastery that Bach, Beethoven, and others displayed in their compositions. But when a piece by Bach didn't fit his theory, he claimed that Bach actually wrote it incorrectly!

I'm afraid AI music research is going to have to put up with the same irrationality of humans as every other subjective endeavor. That is, musicians will kick and scream that AI "isn't doing it right," then when it inevitably becomes embedded as a core part of music making we'll hear all about how music produced with classical algorithms on old Pro Tools rigs sounds so much "warmer."


On the other side of the spectrum, I myself am a relative musical neanderthal, having gone into the book not even knowing what the word "fugue" meant. I was fascinated by the way he related this sort of musical "recursion" back to general mathematical problem solving. When I read OP's dismissal of it, I figured I was just underthinking whether it was a good analogy or not, being the unsophisticate that I am - nice to see somebody who does have some expertise on the subject weighing in.


As a person who is a musical idiot, I thought it was a great way to establish the concept of patterns and self-reference in something like music. The notion of the self-referential canons was seriously impressive to me, and I thought it served as a great jumping-off point into more esoteric conversations in the book.


> As a professional singer of Bach, I feel that the integration of Bach made a lot of sense and worked very well.

I am only an amateur player of Bach, but I felt the same way. However, it probably is worth noting that a major reason why Bach works in this context is the particular nature of his music--or more precisely the particular subsample of his music that GEB refers to. I don't think the music of most other composers, or even some of Bach's less contrapuntal music, would have made sense in this context.


Agree. I am not a musician, but have absorbed a lot by having lived with them.

Weaving Bach in makes perfect sense.


Just a note on chess programs: Until very recently (months) the best chess programs consisted of both neural-network-based ones and search-based ones similar to Deep Blue. They would go back and forth for supremacy, and both are way better than any human.

Recently, one of the top search-based engines (Stockfish) added neural networks on top of it, and it got significantly better still. This was a couple of months back; maybe the pure neural nets have caught up again!


Nice, do you have a link to any details on this ongoing battle?


I disagree. I’m a professional-level musician and a fan of GEB and I always felt the music and Bach analogies in the book were elegant and insightful.


> to a higher-level view, that enabled Kasparov to find where Deep Blue was weak

I don't know if the same could be said of AlphaZero.


"I'm So Meta, Even This Acronym"


GEB, and its rebuttal, Emperor's New Mind [1], are both fantastic books. Both are must reads regardless of one's views on AI (or approaches to AI).

I never bought Hofstadter's thesis -- am solidly in Penrose camp -- and find it interesting that Penrose went on to develop his thesis even further [2][3] while Hofstadter has not. That said, Hofstadter is clearly a far more engaging and talented writer, and in context of Symbolic AI, his book outshines (imho) Minsky's Society of Mind [4] (which I found rather underwhelming).

There is a mild sense of tragedy about Hofstadter. Something about massive early success...

[1]: https://www.goodreads.com/book/show/179744.The_Emperor_s_New...

[2]: https://en.wikipedia.org/wiki/Roger_Penrose#Physics_and_cons...

[3]: https://en.wikipedia.org/wiki/Orchestrated_objective_reducti...

[4]: https://www.goodreads.com/book/show/326790.The_Society_of_Mi...



