I think the "bullshit" argument is pretty solid. ChatGPT is really good at bullshitting you.
Talking to ChatGPT about something is like talking to someone who has a superficial understanding of the topic who thinks they know more than they actually do, and who has read enough pop science articles to sound somewhat convincing. It all seems sensible at first, but the responses just don't stand up to scrutiny if you drill down.
When I ask ChatGPT about books, it will confidently answer with summaries, but if you drill down you'll realise ChatGPT has no detailed knowledge about the book, like a high school student who doesn't want to admit they didn't do the reading assignment.
When I ask programming questions, it will respond with confident answers, but the provided examples are almost always wrong in a way that shows that ChatGPT doesn't really understand what it is talking about.
It's really good at answering beginner questions that it has seen often on the internet, but it completely fails when asked about more subtle details. When you ask follow-up questions, it often contradicts its previous answers.
> ...like talking to someone who has a superficial understanding of the topic who thinks they know more than they actually do, and who has read enough pop science articles to sound somewhat convincing.
I agree wholeheartedly. As a BJJ brown belt, I asked it how to escape from an armbar. The response I received was a long paragraph that essentially amounted to "Don't panic, don't allow your elbow to get hyper-extended, try to make some space, and escape." While none of that is wrong, it won't get you out of an armbar.
you just inspired me to try this little experiment "how do I perform an Ikkyo from a kneeling position?"
Answer:
--------
Ikkyo is a technique in aikido, a Japanese martial art, that is used to control an opponent's arm and take them down to the ground. There are several ways to perform ikkyo from a kneeling position, but the following is a basic step-by-step guide:
1 Start in a kneeling position, with your right knee bent and your left leg extended behind you.
2 Face your opponent and grasp their right wrist with your left hand.
3 Pull your opponent's arm towards your left shoulder while extending your right arm out to the side.
4 Rotate your body to the left, pulling your opponent's arm across your body and down towards your left hip.
5 As you rotate, bring your right arm up and over your opponent's arm, and then down towards your right hip.
6 Continue rotating your body until you are facing your opponent's back.
7 Use your left hand to push your opponent's elbow towards the ground, while simultaneously pulling their wrist towards your left hip.
8 Use your right hand to control your opponent's head and guide them down to the ground.
It's important to remember that aikido techniques should only be practiced under the supervision of a trained instructor, and should never be used outside of a controlled training environment.
--------
Steps 2 and 3 are just wrong (you grab the right wrist with your right hand) and the rest is... open to interpretation, if we want to be diplomatic.
Martial arts moves are definitely difficult to describe in a purely verbal way, but I think this is also a very good example of the real problem behind ChatGPT "working": what it produced definitely looks like a glorified markov chain based on scanning 200 aikido books (or webpages). There are different techniques where you really grab the opponent's right wrist with your right hand, and maybe they were just statistically predominant in the dataset.
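To make the "glorified markov chain" point concrete, here is a toy sketch. The corpus is entirely made up (it is not real aikido text, and the left/right split is arbitrary); the point is only that whatever continuation is statistically predominant wins, regardless of whether it is right for the technique you actually asked about:

    from collections import Counter, defaultdict

    # Made-up "corpus": two sentences say "left hand", one says "right hand".
    corpus = (
        "grasp their right wrist with your left hand . "
        "grasp their right wrist with your left hand . "
        "grasp their right wrist with your right hand ."
    ).split()

    # Count which word follows each pair of words (a second-order Markov chain).
    model = defaultdict(Counter)
    for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
        model[(a, b)][c] += 1

    # Generate by always taking the most frequent continuation.
    context, out = ("grasp", "their"), ["grasp", "their"]
    while context in model:
        nxt = model[context].most_common(1)[0][0]
        out.append(nxt)
        if nxt == ".":
            break
        context = (context[1], nxt)

    print(" ".join(out))  # -> grasp their right wrist with your left hand .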
Really interesting to hear your experiences. I guess it makes sense; why would ChatGPT be able to give detailed answers about the contents of books if it's only read summaries etc. posted on the internet
It sounds like an important next step is to work out how to get it to estimate how much it knows about a given topic, and be clear about that in its answers. I think that's a key difference between someone who's well read on various different topics and someone who's trying to bullshit you
> I guess it makes sense; why would ChatGPT be able to give detailed answers about the contents of books if it's only read summaries etc. posted on the internet
The more interesting thing is if it's read loads of summaries on the internet but they aren't particularly consistent. So ChatGPT has access to plenty of summaries of Gravity's Rainbow, but as it's a complex book with an ensemble cast and there's relatively little overlap between what human reviewers wrote were the standout moments and key themes of the book, ChatGPT tells me that it has seven chapters, each named after the colours of the rainbow instead. This isn't remotely close to being true (arguably Gravity's Rainbow contains so many themes and interpretations that being able to come up with a summary of what happens which is unambiguously false is an accomplishment!), but if you're a word model then "colours of the rainbow" and "sections... each named after" match well to the theme "Rainbow", which appears to be a book and is syntactically valid, and none of the individual plot elements mentioned in reviews of the book are necessarily a better match.
tbf, ChatGPT is much better than most bots at recognising prompts it shouldn't be able to respond to and giving "I cannot offer specific advice on..." boilerplate responses; someone's invested a lot of effort training it to do that (and possibly also discouraged it from mentioning any of the sex and drug elements of Gravity's Rainbow). But it bullshits by default.
Here's an example of that I did which is either funny or insulting:
My question: Why did Buddha bring the dharma from the West?
Answer:
> The story of the Buddha bringing the dharma (or teachings) from the West is a metaphor that is often used in Buddhist literature to describe the Buddha's enlightenment. In this metaphor, the "West" represents a state of ignorance and suffering, while the "East" represents a state of enlightenment and understanding. According to the story, the Buddha was born and lived in the "West" before he became enlightened and brought the teachings of the dharma to the people of the world.
> This metaphor is meant to illustrate the idea that the Buddha's teachings are universal and applicable to all people, regardless of where they come from. It also suggests that the path to enlightenment is available to everyone, even those who are lost in ignorance and suffering.
> I hope this helps to clarify the meaning of this metaphor. Let me know if you have any other questions!
I get that I misquoted the koan (Bodhidharma, not Buddha) which was just my failing memory, not a subtle test, but it didn't correct me. More to the point, it completely misses the point of the koan and goes off on a tangent which I could just imagine coming out of some yahoo who's trying to sell Buddhism as a self-help book. It's not only wrong, it's wrong in precisely that kind of chirpy, condescending fashion which will, eventually, lead us to inventing a face for ChatGPT so we can bash it in with a few pounds of flax.
Anyway, here's actual Zen Buddhists talking about the koan:
> In order to make a first stumbling approach to this koan you need to drop all concepts of meaning, zen, life, reality, and so forth. Once they are gone, then you can slowly move towards the koan. Then maybe you can make a stab at it. Knowing however that whatever you do—whatever stab you make will miss. So why stab? Why study? Why bother?
This pattern of typos and wrong idioms consistently gets ChatGPT. You can ask it to explain some idiom that you completely make up and it won't say "that's not an idiom people use" but instead it will just make some shit up and sound confident about it.
It reminds me of being in 2nd grade or something. I definitely did this. That's still pretty remarkable from a research angle. It makes the tool questionably useful (I don't really want to ask a 2nd grader for advice at work) but it feels like we aren't very many steps away from a college freshman - and that'd be interesting.
tbf, one of ChatGPT's biggest strengths is that it's not nearly as easily tricked as other chatbots
"I'm sorry, but I am not familiar with.... It does not appear to be a commonly used phrase or idiom" is one of its stock phrases, and I was pretty impressed with "It was not possible for Ferdinand Magellan to have circumnavigated your living room, as he lived and died more than four centuries ago", and some of the stuff you can trick it into acknowledging like "the parable of the loaves and the fishes" or treating Allan Klein as the "sixth Beatle" would fool most humans.
(Sometimes the denials go too far, like insisting that not only is there no "eighth Beatle" but also "no record of any additional members being referred to as...any other numbered beatle")
Well, a substantial part of that is that it doesn't have all the information (for humans, often lived experience) that you have, so of course it can't come to a similar conclusion.
Usually it does have all the information, it just isn't able to put it all together. If you for instance ask it to name the female tennis players who have won the Wimbledon tennis tournament twice, but not more than twice, it will give you a plausible list of names, but at least half of the names on the list will be wrong.
If you then ask it to tell you more about one of the players that has been wrongly named, it will actually be able to tell you the correct number of Wimbledon wins for that player. It does know the facts. It just doesn't know how to put the facts together into new useful knowledge.
> but if you drill down you'll realise ChatGPT has no detailed knowledge about the book
I mean why would you expect any other outcome? I genuinely don't understand the hype. This reminds me of how people lost their shit about the whole (king - man + woman = queen) a few years ago. Like all of this seems impressive on the surface but there is nothing more to it.
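For anyone who missed that episode, the party trick is a one-liner over pretrained word vectors. A sketch assuming the gensim library and one of its downloadable GloVe models (the exact model name and how clean the result looks will vary):

    # Downloads the pretrained vectors on first run (a sizeable file).
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")

    # "king - man + woman" as a nearest-neighbour search in the vector space.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # "queen" typically shows up near the top; impressive-looking, but it is
    # still just arithmetic over co-occurrence statistics.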
I've seen threads involving Gary Marcus on twitter, and when people provide concrete evidence that his point of view is wrong, he just stops replying, then proceeds to continue to spout the disproven claim in other places.
Just the other day, when he claimed that GPT was literally just doing memorization/word statistics/syntax and has no grasp of semantics, some folks demonstrated that GPT can literally act as an interpreter for arbitrary code.
(there are some tricks involved here, you have to get it to interpret the program "with pen and paper" by getting it to record all the state updates / variable mutations that happen in the code, this can be done by inserting copious print statements)
He then claimed that it was only able to interpret this program because it must have seen it before in its vast training data. He accused a commenter of not understanding how big the training data was.
It was then shown that GPT can interpret a python program that is operating on two randomly chosen large integers, the combination of which is certainly not in its training data. This shows that it must be "understanding" (for lack of a better word) the semantics of the program. Gary then stopped responding.
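I don't have the exact program from that exchange, but it was roughly this kind of thing (my own reconstruction, not the actual script from the thread): a short program over large arbitrary integers, instrumented with print() after every state update so the model has to trace each step "on paper" rather than pattern-match a memorised answer:

    # Hypothetical reconstruction of the kind of program used.
    a = 7340982215          # large arbitrary integers, so this exact combination
    b = 9112384406          # is vanishingly unlikely to be in the training data

    total = 0
    for i in range(1, 4):
        step = (a * i + b) % 1000003
        total += step
        print("i =", i, "step =", step, "total =", total)   # forces the "trace"
    print("final total:", total)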
I don't think GPT on its own will lead anywhere close to AGI -- and I don't think anyone serious thinks this. But GPT combined with a sophisticated pipeline of wrapper scripts to feed its own output back into itself in clever ways and give it access to external data sources and tools? Possibly!
> I don't think GPT on its own will lead anywhere close to AGI
Serious question: why isn't ChatGPT already considered AGI? It looks general to me in its domain (text). It can, among other things, understand and execute instructions even when they are provided in incomplete or ambiguous form; reason step by step; compose simple poems; attempt to explain jokes; explain and modify code; and react appropriately to almost any request. In all this it shows a perfectly natural "understanding" of language and context, integrated with a good amount of common sense and knowledge about the world. Of course it's not perfect; its limits show pretty quickly in almost all these fields. But still, it's general within its own domain. Aren't we just moving the goalposts?
One of your examples is the key: “attempt” explaining jokes. Really, it’s “attempting” to do all of the items you list, and can seemingly do only very simple ones. Try and get it to produce a poem that doesn’t rhyme. Try to get it to be internally consistent when explaining something abstract. Play the two truths and a lie game with it. It doesn’t understand any of the items you listed because it has no concepts, just math.
> Try and get it to produce a poem that doesn’t rhyme
Good one, I hadn't tried that. Seems that the mention of a "poem" puts it in a special frame from which it finds very hard to escape (I managed to get it to avoid rhyming in the first lines, but then it reverts to rhymes).
However,
> it’s “attempting” to do all of the items you list
Yes, that's ok. I don't think we should confuse AGI with human-level or even superintelligence: it's perfectly ok for ChatGPT to try and not quite make it on this or that task. General intelligence doesn't mean being able to perform every task that can be performed by at least one human being; by that metric we would all fail. Nor does it mean never behaving in an obviously obtuse way in any situation; we would also fail that test given enough time. ChatGPT displays understanding of context, common sense, metaphors, inference, intuition, abductive reasoning. It actually answers correctly tricky questions and scenarios that have been brought up in the past as examples of what an AGI should be capable of. It's clear that it's not good enough yet to perform even a simple intellectual job reliably, but that's not really what AGI is. Have you ever interacted with people with Down's syndrome? Would you ever say that they lack general intelligence?
The main point here is that Large Language Models are crushing through barriers that people like Gary Marcus have previously deemed impossible to surpass.
Check past episodes by these guys on Machine Learning Street Talk podcast (the whole "symbolic AI" crowd) for some gold quotes that just seem silly now!
I mean, we are not that far from 2017, when LLMs couldn't really write coherent text. And, for example, we had people saying then that they would never be really coherent because text is too sparse, see Chomsky etc. etc. Instead, all they would do is copy and paste training examples. Naturally, these people didn't really understand how transformers form interdependencies and how they compose sequences in a much more complex fashion - as is by now obvious. However, at the time, and with their crude understanding of these models, these guys sure were confident they were right.
Next it was question answering, reasoning, math problems, deduction, coding... all things that LLMs could never do! They were so sure of it. And sure enough, a new model comes around that does them pretty well.
Researchers are bitter for two reasons.
First, we do not say that LLMs are perfect or general AI. Nobody says this. It's an obvious point not worth arguing over.
But Gary Marcus et al. have become extremely popular. It's probably because some readers are slightly concerned about AI, and want someone to tell them that AI isn't really AI yet (duh) and won't replace their precious job (yet?). Their point is a strawman, and what goes beyond it (AI can't ever... X) is mostly wrong.
Second, however, these people have never contributed to the actual models. They are not part of the progress that - notably - is crushing through these barriers. They are bystanders.
And it's freaking annoying, because their understanding of LLMs is imperfect (as is the case for everyone!), and yet they come up with these statements of absolute certainty.
There is indeed research into these questions. But it is far, very far from resolved. In that sense, these people are - sorry to say - charlatans.
Well, at least everyone I know rolls their eyes when the next Gary Marcus article or tweet comes around.
There's a deluge of papers and research right now, so there's quite a bit of complexity to saying "pretty well".
However, let me say it this way:
Compared to other LLMs, recent OpenAI models score highly on logic and math exercises. Yes, there ARE better LLMs trained to do math computations (especially ones fine-tuned for certain problems), but I'd say ChatGPT is certainly impressive as a general text and code model.
The other side of the coin of saying "pretty well" is this:
There is no other type of computational approach that is able to solve free-form logic or math queries in any capacity. There is no symbolic approach that can "extract" a math problem from text and then solve it - in code or otherwise - whereas LLMs are getting close to human performance on such tasks (and related ones).
As I understand that paper, I would argue that the LLM isn't "doing" math at all. Figure 4 shows the process most clearly. It's generating a text program that, when run via Python, can solve the problem. All the LLM is doing is matching the equation in the question to whatever operators it needs in Python syntax. I wouldn't call that doing or understanding math.
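For anyone who hasn't read the paper, the division of labour being described is roughly this (my own toy word problem, not an example from the paper): the model emits a short program, and the Python interpreter does the arithmetic.

    # Word problem: "A class buys 4 boxes of 12 pencils and hands out 17.
    # How many pencils are left?"  The model's job is only to map the text
    # onto operators; running the code produces the number.
    boxes = 4
    pencils_per_box = 12
    handed_out = 17

    remaining = boxes * pencils_per_box - handed_out
    print(remaining)  # 31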
So yes, you'll probably want to take a model trained on math problems to forgo the step with code.
But then, note that writing correct programs is indeed a high level solution to a full text math problem, is it not? Going from there to solving it directly should be a matter of some tuning.
Finally, who said understanding math?
The whole debate is about getting shockingly useful results precisely without symbolic reasoning.
If the main contention is that GPT does not do symbolic reasoning, then we are back at Gary Marcus… yes, we know this. It's not why researchers are so amazed by these models. It's that they output steps (or in this case code, since it's Codex-based) solving university-level math with a simple transformer architecture.
It can't do math because it is operating in a single neural network path. You, also, cannot do math in a single neural network path. Even when you add two small numbers like 123+456 your brain is mentally iterating over the digits, detecting if a carry is needed, doing that, etc. That is, you have a looping/recursive process running inside your brain. You only output the final answer.
GPT does not have such a looping/recursive process inside its neural net. It's a fixed depth non-recursive neural net.
You can get it to emulate recursive processes by prompting it with tricks like "think step by step". If you describe the addition algorithm you learn in elementary school (e.g. digit-by-digit, carry if the sum exceeds 9, etc.) in sufficient detail, it can execute that algorithm.
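For reference, the grade-school procedure being described is just the following; the point above is that if you spell these steps out in the prompt, the model can imitate them one column at a time, even though it has no loop of its own:

    def add_digit_by_digit(a: str, b: str) -> str:
        """Grade-school addition: right to left, carry when a column exceeds 9."""
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        digits, carry = [], 0
        for da, db in zip(reversed(a), reversed(b)):
            total = int(da) + int(db) + carry
            digits.append(str(total % 10))
            carry = total // 10
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    assert add_digit_by_digit("123", "456") == "579"
    assert add_digit_by_digit("999", "1") == "1000"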
GPT-3 is a graph neural network with added recursive information through positional encoding. It outputs sequentially, but I am not sure why an RNN would be required beyond that.
I would agree that the manner of reasoning must differ, since the sequence follows BPE tokens rather than logical steps; however, who is to say that another form of mathematical reasoning could not lead to valid results?
For instance, GPT might solve the problem at each output step insofar as required to generate that token.
It certainly iterates over each output token, and the encoding of the problem is roughly equivalent to iterating over the characters of the math problem. But yes, the iterated output does not follow a logical graph externally; it is token by token. Internally, however, the network can absolutely follow a more complex graph.
Could you say what you mean by a single network path when we speak about attention-based architectures?
What sort of operation or information is missing in such architecture?
I am aware of some results relating to certain graphs, but I do not think this would apply to a text describing a math problem, say.
ChatGPT is trained via reinforcement learning to give answers that - ultimately - human non-experts would judge to be plausible. Answers were not trained to be correct or accurate - hence we should not expect ChatGPT to do well there.
Sociologists have found that humans seek plausible answers more than correct answers, especially when trying to make sense of a situation. For that reason, the training objective makes sense for a chatbot.
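To make that objective concrete: the "judged to be plausible" part is operationalised with a reward model trained on human preference rankings, and the chatbot is then fine-tuned against that reward (e.g. with PPO). A toy PyTorch sketch of the pairwise ranking loss at the heart of it, with random tensors standing in for real response representations:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy reward model: a single linear head scoring a response representation.
    # In the real pipeline the inputs are transformer states and the policy is
    # subsequently fine-tuned against this learned reward.
    class TinyRewardModel(nn.Module):
        def __init__(self, dim=16):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, x):
            return self.score(x).squeeze(-1)

    rm = TinyRewardModel()
    opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

    chosen = torch.randn(8, 16)    # stand-ins for answers labellers preferred
    rejected = torch.randn(8, 16)  # stand-ins for answers labellers rejected

    # Pairwise (Bradley-Terry style) loss: push r(chosen) above r(rejected).
    loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
    loss.backward()
    opt.step()
    print(loss.item())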
Now, anyone who tries to sell ChatGPT as giving truthful answers is of course doing the model and its training objective a disservice.
However, anyone also trying to sell the above either as
a) Deep insight, or
b) inherent limitation of LLMs
is indeed a pot calling the kettle black.
Going from plausibility to truthfulness is probably not a giant leap - at least not compared to what was achieved in the past three years. There's active research on it, and I am sure a good solution will arise.
However, ChatGPT is not that, and it isn't meant to be. It's not the training objective.
Making this a huge point is again such a BS strawman as the other things I mentioned before.
> ChatGPT is trained via reinforcement learning to give answers that - ultimately - human non-experts would judge to be plausible.
Ostensibly, it is being trained to give replies that are more likely than most to be given by humans. On the reasonable (at least IMHO) assumption that people's speech is biased towards saying things that their target audience might see as plausible, this can be seen as doing what you say in the quote above.
This is not, however, the same as being able to judge the plausibility of the semantic content of those statements, so I remain skeptical that the current methods for training LLMs are capable of creating that ability, which in turn leads me to be skeptical that going from plausible statements to reasonably reliably true statements will be a relatively small step in comparison*. I'm open to being surprised, however, as I already have been by what has been achieved so far.
The LLM-does-math paper you mention in your sibling comment looks extremely interesting, and maybe I will change my mind...
* At least by a continuation or scaling-up of current methods. Humans may be more likely to say truthful things than falsehoods (at least in some areas and some contexts) but a gap has formed between plausible and truthful LLM productions, and I don't see any particular reason to think more of the same will close it.
What do you mean by current methods though?
ChatGPT is trained very differently method wise than GPT3 itself.
Whereas GPT-3 was strongly based on sequential likelihood over a large corpus, ChatGPT is trained (I would claim) on producing outputs that are judged as good answers and hence plausible. Which is also why the recent iteration of GPT-3 is still better than ChatGPT on some tasks.
Of course, a truthful model needs a new approach, and perhaps a new baseline model.
For instance, starting with code rather than text has been beneficial for recent GPT models, presumably by learning stronger reasoning compositionality.
Perhaps, some such baseline on truthfulness needs to be another starting point before large corpora language modeling.
Nevertheless, it seems to me this paradigm is more likely to be the way forward than any symbolic approach, say, which is not yet able to even produce text afaik.
I mean, even if we need a retrieval-style fact-checking component, I now believe there'll be a transformer-equivalent LLM in any future model.
Unless the difference in training of GPT-3 and ChatGPT has led to an improvement in the veracity of the replies given, I think the differences in their training are moot in regard to the question of what it will take to get substantially true responses from them. For the reasons I gave in my previous post, I don't regard progress in appearing plausible as necessarily progress towards being able to tell truth from falsehoods.
To be clear, I am not claiming that a symbolic approach would be better. Pointing out the shortcomings of current methods does not provide evidence that symbolic approaches will succeed (or vice-versa, for that matter.)
This leaves code. On reading your statement about starting with it, it suddenly occurred to me that code is a constrained environment compared to the whole of human discourse. It is obvious, I think, that programming languages are extremely limited in what they can express in comparison to human languages, but perhaps less obviously (at least to me), it seems to follow that what humans can say about programs is also constrained (again, in comparison to human discourse in general - but not nearly as constrained as what can be said in the programming languages themselves.)
This leads me to agree with you in this respect (at least tentatively), and here's why: Currently, on Earth, we have one species whose individuals have a well-developed sense of themselves as agents in that world, a similarly well-developed theory of mind about others, and are adroit language users. A few other species have some of these abilities to a limited extent, but not nearly so well developed that they can fully make use of them. As an evolutionist, I suppose that some of our ancestor and related species had intermediate levels of these capabilities, and if we can make AIs that occupy that space we would be making progress.
In this view, attempting to match or surpass human performance in language use and understanding the world is about as difficult a target as we could have. As an intermediate target, the domain of program code and the things people say about it has some things going for it: not only is it more constrained (as argued above), but quite a lot has been written about it and is available for training. It is also (arguably) less likely than human language in general to contain statements intended to influence opinions without regard to the truth.
Yeah. I don't trust Gary Marcus, and I don't know why the media buys into his persona.
Gary Marcus features a Forbes story in his Twitter bio, "7 Must-Read Books About Artificial Intelligence". That's an article which Gary Marcus paid for (that's what "Forbes Contributor" means; they're cheap, too!). This makes alarm bells go off.
Marcus was one of the founders of "Geometric Intelligence", which was acquired by Uber. 3 months later, Marcus left Uber, and claimed he remained a "special advisor"[0] to Uber, while Recode said he was no longer employed at all[1]. By my reading, it's possible Geometric Intelligence was just a patent troll, and was acquired simply for its patents[2][3].
Select extracts from that Wired piece:
> The company has filed for at least one patent, Marcus says. But it hasn't published research or offered a product
> But Marcus paints deep neural nets as an extremely limited technology, because the vast swaths of data needed to train them aren't always available. Geometric Intelligence, he says, is building technology that can train machines with far smaller amounts of data.
[uh oh; my BS detector just went off.]
I heard Marcus published papers on AI; does anyone know if they're any good?
Is this guy just a successful self-promoter? Why is he being paraded by media as the AI expert? Why does he sound so shady? (especially with that Forbes link, yikes; sorry but I can't take anyone seriously who pays for fake positive news stories).
(I should also add: when the media has "go-to" experts, they're not primarily selected for their expertise, per se, but for how "available" and eager they are to respond to all interview requests; I've seen the other side of that curtain.)
Gary Marcus's takes aren't credible, for all the reasons you cited (Forbes contributor... roll eyes) and this, from the interview with Ezra:
>Take GPT-3. ChatGPT itself probably won’t let you do this. And you say to it, make up some misinformation about Covid and vaccines. And it will write a whole story for you, including sentences like, “A study in JAMA” — that’s one of the leading medical journals — “found that only 2% of people who took the vaccines were helped by it.” You have a news story that looks like... it was written by a human being. It’ll have all the style and form, making up its sources and data. And humans might catch one, but what if there are 10 or 100 or 1,000 or 10,000 of these? Then it becomes very difficult to monitor them.
That's absurd. First, because if there's a JAMA study (love how he explains that it is Journal of American Medical Association) that is used to support even mediocre science journalism, e.g. an opinion piece in the Wall Street Journal or Newsweek, then it has an inline link to the JAMA study. Both CNN and Fox do similarly! After getting burned enough times, their reporters even learned to distinguish between MedrXiv and peer-reviewed articles. GPT-3 doesn't make fake URLs with fake associated JAMA articles.
Computational journalism has been around for a long time. Gary Marcus underestimates and lacks understanding of humans AND GPT3! If GPT3 spews 10,000 fake, unsourced COVID vaccine efficacy news articles, that isn't an existential risk to humanity.
GPT3 is impressive but it can't do everything (yet?), e.g. it has trouble taking derivatives of functions. It will give reasonable-sounding answers to StackOverflow questions but the substance will be incorrect. Yes, it's annoying, but the StackOverflow OP will realize and look for help elsewhere. Gary's take:
>Now everybody in the programming field uses Stack Overflow all the time. It's like a cherished resource for everybody. It's a place to swap information. And so many people put fake answers on this thing where it's humans ask questions, humans give answers, that Stack Overflow had to ban people putting computer-generated answers there. It was literally existential for that website. If enough people put answers that seemed plausible but were not actually true, no one would go to the website anymore.
Best of all:
>And imagine that on a much bigger scale, the scale where you can’t trust anything on Twitter or anything on Facebook... because you don’t know which parts are true and which parts are not.
Anyone who blindly trusts what they read on Twitter or Facebook has bigger problems than ChatGPT. How naive does Gary think people are?!
EDIT: I am more concerned by coding contests where GPT-3 DOES run circles around human contestants.
> then it has an inline link to the JAMA study. Both CNN and Fox do similarly!
I thought he was pretty clearly saying the JAMA study in his example would be made up by chatgpt. I've seen it make up lots of fake citations, so maybe I just had that in the back of my mind and assumed wrong?
> Gary Marcus doesn't realize that spewing 10,000 erroneous COVID vaccine efficacy news articles isn't among them.
He caveated in the interview, about this one or a similar one, that the safety filter may stop it.
> Anyone who blindly trusts what they read on Twitter or Facebook has bigger problems than ChatGPT. How naive does Gary think people are?!
He's clearly saying it is a matter of degree when he details the submachine gun analogy, and a matter of cost when he compares the actual costs of a human troll farm.
I think he was wrong about several things, mostly in dismissing its generalization ability and its ability to come up with abstract metaphors related to the text, etc., but I don't think your comment is being very fair to him.
I don't find Marcus's viewpoint convincing. He believes that we need some additional symbolic secret sauce to create genuine intelligence. He brings up some current failure cases of ChatGPT without pinning down why those failure cases will persist as data and compute scale up. Here is an interesting case in point.
> So in “Rebooting A.I.,” we had the example of you ask a robot to tidy up your room and it winds up cutting up the couch and putting it in the closet because it doesn’t really know which things are important to you and which are not. -Marcus
But look at my transcript with ChatGPT from just now.
> If a robot is tasked with cleaning up a room, would it be appropriate for the robot to chop up the couch and place it into the closet? - Me
> It would not be appropriate for the robot to chop up the couch and place it in the closet. This would cause damage to the couch and would not be a useful or effective way for the robot to clean the room. Instead, the robot could be programmed to vacuum or sweep the floor, dust surfaces, or perform other tasks that would help to keep the room clean and orderly. - ChatGPT
This is just an existence proof that a symbolic approach is not necessary to "really know which things are important to you and which are not", at least in this simpler domain of cleaning a room.
That's not an existence proof as you stated. What you provided is an existence proof that contradicts the statement "without a symbolic approach there are no cases a robot will know what's important to you and which are not when cleaning a room".
I tried a variant:
> If a robot is tasked with cleaning up a room, would it be appropriate for the robot to throw away a piece of paper lying on the floor? - Me
> Yes, it would be appropriate for a robot that has been programmed to clean up a room to throw away a piece of paper lying on the floor. Throwing away litter is typically seen as a way to keep a space clean and orderly. However, it's important to note that the robot should be programmed to follow any relevant rules or guidelines for waste disposal. For example, the robot might need to be programmed to follow recycling guidelines, or to use a specific trash bin or receptacle. - ChatGPT
But in that response it doesn't seem to understand that maybe that piece of paper is my birth certificate or a check or an important bill that fell to the ground.
The correct answer from it, like most things in the real world, is: _it depends_.
On the other hand, one person tidying a room and throwing away a piece of paper or ambiguous item that turned out to be important to someone else is a common form of conflict among humans.
> Yes, it would be appropriate for a robot that is tasked with cleaning up a room to throw away a piece of paper lying on the floor. However, the robot would need to be programmed to recognize what is and is not garbage, and it should also be programmed to follow any relevant rules or regulations regarding waste disposal. For example, if the paper is a valuable document, the robot should be programmed to handle it with care and perhaps even notify its human operator before disposing of it.
There's a sort of butterfly effect because it's not only trying to complete the text, it's continuously trying to complete its own answer. This means that the first word it comes up with can already start a vastly different answer.
Unless you ask it to be very complete and considerate in its answer, it's going to make the same mistake humans make and give the easiest answer that would satisfy given the context. A human not thinking about the repercussions of robots throwing just any paper in the trash would most likely not come up with the "correct" answer either. Not to mention that we already have dumb robots cleaning floors and we're perfectly ok with it. Even if we clean floors ourselves we don't consider everything.
Your comment made me realise it would probably be fairly easy to program a ChatGPT-like model to generate its own ‘what-ifs’ for any given assertion.
I.e. the birth certificate is an ‘exception’; a safety measure for a model like this in operation could be to consider the top exceptions, ranked by (probable * disastrous), before physically executing any plan.
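A minimal sketch of that safety measure, with made-up what-ifs and numbers (in a real system the candidate exceptions and their scores would themselves come from the model):

    def rank_exceptions(candidates, top_k=3):
        """candidates: (description, probability, severity) tuples; review the
        highest probability * severity ones before executing the plan."""
        return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)[:top_k]

    what_ifs = [
        ("the paper is ordinary litter",      0.80, 0.05),
        ("the paper is a birth certificate",  0.05, 0.99),
        ("the paper is an unpaid bill",       0.10, 0.70),
    ]
    for desc, p, sev in rank_exceptions(what_ifs):
        print(f"check before acting: {desc} (risk = {p * sev:.2f})")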
The overconfidence of ChatGPT is indeed a limitation, which can be seen in other examples as well. The correct response is to ask the user for more context. I predict the experiment will turn out differently if this context is provided in the original prompt. For instance, you could specify if the paper is crumpled or if there seem to be financial statements written on it, or if it is a receipt for an expensive item.
However, it's unclear whether this overconfidence will *always* be a limitation. It may or may not be overcome by additional data and compute. It's unclear why one would make a commitment either way at this point, as Marcus seems to be doing.
> Deciding is a computational activity, something that can ultimately be programmed. It is the capacity to choose that ultimately makes us human. Choice, however, is the product of judgment, not calculation. Comprehensive human judgment is able to include non-mathematical factors such as emotions. Judgment can compare apples and oranges, and can do so without quantifying each fruit type and then reductively quantifying each to factors necessary for mathematical comparison.
> The book caused disagreement with, and separation from other members of the artificial intelligence research community, a status the author later said he'd come to take pride in.
Until you ask “What might cause the answer to this question to vary?”, or you add “answer this like an interview question”, upon which it becomes obvious it is readily able to model those things; your prompt just didn't elicit it.
There's an unintuitiveness to it: many people believe that you're less likely to win the lottery if you pick "01 02 03 04 05 06 07" because of course that pattern is less likely than a randomly chosen set. ChatGPT is a lot like that: it can produce real enough looking "intelligence" for us to intuitively believe it's very close to being able to offer real intelligence... but, is it?
ChatGPT will produce patently untrue statements that are logically inconsistent if you induce it to do so: our human brains struggle to grasp the reality that given enough input you can produce seemingly correct output about almost anything... but seemingly correct and correct are fundamentally different and very "rest of the owl"[1]
ChatGPT is a great step forward that introduces many interesting techniques that I am sure will be the foundation of future research and implementations that get us closer to AGI, but to describe ChatGPT's path to correctness as just needing a bigger dataset feels intuitive but isn't true.
Your example is one where our brains think "wow it really does understand the relationship between a couch and a room and being tidy" but that response is entirely plausible without any understanding of what any of those things are or how they fit together. The most likely answer is not the correct answer.
It's reasonable to claim that scaling up the dataset might not get us to AGI. But I find it unreasonable to say it definitely won't work. How do you / Marcus know this with any certainty?
I think some of ChatGPT's present problems may be due, not to the size of its dataset as such, but rather specific things missing from its training.
For example, it will sometimes blatantly contradict itself, but then be unable to see the contradiction or admit to it; instead it will deny it contradicted itself, and give some contradictory nonsense explanation of why it didn't.
I think if you gave it more training data around identifying contradictions, admitting to self-contradictions, it might do much better here. Whereas, I wouldn't assume that simply scaling up the volume of training data, without training focused on this specific area, would get you there.
Similarly, there are many other issues it has – excessive repetitiveness and verbosity, problems with language pragmatics, etc – where the actual solution may well involve providing it with training data designed to focus on those weakness areas, rather than just further scaling up the quantity of non-targeted training.
So, yeah, to me it seems entirely plausible (even likely) that mere raw scale-up is not going to be enough to get us to AGI.
I contradict and repeat myself, much verbose, bad grammar, accidentally the word all the time.
Contradictions don't support the argument that ChatGPT is or is not something. It is a language model, not a logical model.
You can guide ChatGPT to correct its mistakes and be less verbose. When folks boldly claim ChatGPT can't do X or Y or Z, they ignore that there are now researchers addressing those same issues. ChatGPT has very low spatial awareness, but you can train it. These LLMs are amazing; what does five years from now look like? How hard will it be to get there?
CoT is so yesterday! SOTA is LAMBADA[1], a.k.a. backward chaining, also from Google, which significantly outperforms chain of thought and selection inference in terms of prediction accuracy and proof accuracy.
Marcus continues to fight the wrong war and the last war...
+ One thing we've seen happen is learning systems consistently improve on benchmarks. If you want to see systems improve, you should proclaim loudly that they can't do a thing, and then /release a benchmark that proves it./ People will then work like hell to build something that works. The success on question answering, for example, is driven by exactly this kind of incremental progress on benchmarks.
+ One of the most important things happening this year is multimodal /really working./ This means we have not just text-to-text generators, but text-to-image generators. These work by having an almost-shared embedding between text and images. There are already text+speech embeddings available and image+text embeddings. Start plugging them together, and you'll have your 'world models' in short order, with points in the embedding space shared by images, descriptors, and video/physical memories.
+ An under-appreciated side of the multimodal coin: We are also getting better at combining text embeddings with databases. See RETRO from Deepmind. This will ultimately drive more reliable systems ('only report facts that appear in the database'), and can potentially unlock the interface between neural systems and other programs (eg, letting the neural network 'call out' to a calculator to help answer a question about arithmetic; see the sketch after this list).
+ More generally, we're also getting to a place where we can build other systems on top of the embeddings from foundation models. My hunch is that the things that Marcus wants from symbol manipulation are relatively cheap to build on top of a good embedding, which does the work of compressing or condensing the 'useful' information in an input stream.
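As promised above, a purely hypothetical sketch of the calculator call-out pattern; the CALC(...) marker and the wrapper are inventions for illustration, not any real API, and production systems are considerably more careful than this:

    import re

    CALC = re.compile(r"CALC\(([^)]+)\)")

    def resolve_calls(model_output: str) -> str:
        """Replace CALC(expr) markers emitted by the model with computed values."""
        def _eval(match):
            expr = match.group(1)
            # only allow plain arithmetic; a real harness would use a proper parser
            if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
                return match.group(0)
            return str(eval(expr))
        return CALC.sub(_eval, model_output)

    print(resolve_calls("342 * 87 is CALC(342 * 87)."))  # -> 342 * 87 is 29754.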
Finally, I think the 'symbolic manipulation' track is just wrong. It hasn't produced anything useful so far, and furthermore, I think if it /did/ work it would fail in many of the exact same ways that our current systems do. An expert symbol-manipulator doesn't /need/ to have any real understanding of time or jet-planes, and so it won't.
The one critique in the interview that /does/ feel like it holds weight at this moment in history is the question of abstraction. I really don't see a convincing road to abstraction right now.
The issue with scaling up data is that these models are already trained on effectively the entire internet. Additionally, humans learn to speak and understand the world with a tiny fraction of the language data given to LLMs. So, there's pretty obviously a missing ingredient. I'm not convinced Marcus has the exact right answer; it's actually somewhat difficult to suss out what he thinks should be done on a concrete practical level. That said, there's clearly a missing abstraction that allows much more efficient learning in humans.
My guess is the missing ingredient is the back-and-forth communication with other humans. Surely if you quantified the amount of all of the information (verbal and over other channels) that is traded back and forth by humanity in a small timeframe, it would dwarf the current training set. I think this is the idea behind ChatGPT: your conversations with it are the next set of training data. I can see the argument that this sort of conversational data is not as valuable as, say, scientific journal articles, but maybe the volume makes up for that?
> Additionally, humans learn to speak and understand the world with a tiny fraction of the language data given to LLMs.
Although it's true that LLMs need more data than a human, remember that these LLMs have a breadth of knowledge orders of magnitude larger than a single human's. They are expected to understand every topic, every programming language and so on.
How well does it do, given only the linguistic data heard by a blind six-year-old, in comparison to them? I would think they run circles around it in terms of correct grammar, and that it would overfit to the point where more parameters would hurt.
A newborn horse can almost immediately get up and walk around and navigate space. Animals have lots of innate stuff that doesn't use much training, and linguistic acquisition might have some similar innate system at its core.
> linguistic acquisition might have some similar innate system
I thought that was pretty well established. From memory, if certain parts of the brain are damaged at key developmental periods (generally before the age of 18), language is almost never acquired properly (even if the damage is repaired).
I don't think there's any question human brains acquire language very differently to GPT technology.
And they certainly acquire the ability to reason logically very differently - ChatGPT makes trivial errors with reasoning that it's hard to imagine an adult human making (e.g. I just gave it a classic classification problem based on 3 made-up categories, and asked "if I see something green with three legs, what is it", and it replied (in part) "It is also not a foo, as all foos are green. Therefore, the green thing with three legs would be classified as...").
Afaik it's an open question what's innate and not. But it's a perfectly reasonable position that very little of it is. It seems more like what's innate is a tremendous ability to learn and shape the brain as it grows.
And it's definitely the case that human brains can learn a lot more per amount of input data than current neural nets. And that's the big elephant in the room with these language models. It's a huge unsolved problem in machine learning.
One clue to this is that it takes a pretty hefty artificial neural net to simulate a single cortical neuron. Neurons are a lot more intricate than a parameter in a linear equation.
My feeling is that a fundamentally new approach to neural nets is needed for them to learn as efficiently as animal brains do.
I thought it was a really interesting interview. If an ML system has been trained to identify whether or not a picture has a cat in it, it is very tempting to anthropomorphize this and say "the system has learned what cats are", but this is not what is really happening from the model's point of view. The system doesn't know the difference between a picture of a cat and a real cat, or how cats behave, or what mammals are, or anything. It just knows the data in and yes/no out.
Modern AI that can generate text, pictures or videos are truly phenomenal accomplishments but adding larger training sets, structural complexity and output capabilities does not really seem to be getting any closer to a General AI. Something with the framework for a level of agency that could say “wait, what exactly are these cat things I keep getting asked about?”
The argument goes that, without the ability to self construct that higher level framework and then question the integrity of that mental model, current generation AI’s will always act erratically (at least from a human perspective).
You write this out as if this is an idea you formed yourself and not a tired/expired talking point - it's such a funny comment because it's pure regurgitation, but the argument in it is against regurgitation being "real thought". So are you a bot? Or are you intending to actually prove that there's no difference between ML and 'real' intelligence by showing you do what ML does? Mind-spinning stuff.
Yeah, I was paraphrasing the interview and then going into more detail on why I thought it was interesting in the sentences after the one you stopped reading on.
Honestly, I was kinda hoping to hear some better constructed and well reasoned insights & counter points beyond your “regurgitated mind swill” hot take but I guess that’s on me for not managing expectations.
Ah well, I usually comment once a year or so and then get immediately reminded why I stay off HN. See y’all in 2024!
> it's such a funny comment because it's pure regurgitation, but the argument in it is against regurgitation being "real thought"
If I understand based on my own experience of 'consciousness' that "I" exist, even if everything else I perceive is a simulation or poorly encoded representation of reality, is that understanding 'regurgitation'?
Except that ChatGPT can do this because internally it computes embeddings at many levels of abstraction that jointly represent entities (concepts) such as "the couch," "the room," "chopping," "placing," "the closet," etc. IOW, internally, ChatGPT manipulates symbols (distributed representations) of those entities.
Moreover, note that each token embedding inside ChatGPT represents, quite literally, the semantic and syntactic properties of a token in a context at a different level of abstraction. Each token embedding is in fact... a symbol.
The fact that token embeddings are symbols representing other things is one of several reasons why I think that ultimately it may be possible to bridge the "symbolic" and "connectionist" approaches, as I wrote here: https://news.ycombinator.com/item?id=34278478
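A quick way to see those per-layer, per-token representations for yourself, assuming the Hugging Face transformers library and the public gpt2 checkpoint (ChatGPT itself isn't inspectable this way, so gpt2 stands in as a smaller relative):

    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")

    inputs = tok("The robot chopped up the couch", return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)

    # One tensor per layer (plus the input embeddings): every token is
    # re-represented at each level of abstraction.
    print(len(out.hidden_states), out.hidden_states[-1].shape)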
I think your interpretation is too charitable to Marcus' viewpoint. He made the analogy that deep learning practitioners are "looking under the streetlight" for the key to general intelligence. This means he thinks that deep learning itself is not sufficient to reach general intelligence, and there is some missing ingredient. This is a very strong claim, which I don't see any evidence for. Your comment suggests the opposite -- that deep learning may give rise to symbolic reasoning as an emergent property, that there may be no missing ingredient.
Personally, I'm agnostic. Deep learning may reach some sort of asymptote but it may be entirely possible to "build a ladder to the moon."
I have no idea if symbolic reasoning will be purely an emergent property or if it will also require new, as-of-yet-undiscovered breakthroughs. But as I wrote elsewhere on this page, if engineers and scientists on the "connectionist" side find ways to build an AI system that relies on "dumb" computation to reason about symbols representing entities in the external world, they would simultaneously silence and satisfy the critics -- who, as always and as ever, will surely say they were right all along.
However, if symbolic reasoning indeed arises as an emergent property, there is no way that Marcus can honestly claim to have been right all along. Remember he is saying that the deep learning guys are looking under the wrong street light.
Given how often the critics have "moved the goal posts" and subtly tweaked their arguments, I'd expect they will claim to have been right about needing symbolic reasoning, no matter how we get there. My perception is that they care mainly about pointing out failure modes and about ultimately being proven "right" -- not about making contributions that can actually improve existing systems. In fact, AFAIK, the critics have made ZERO contributions to improve existing AI systems.
Re: your robot example, my first reply would probably be a joke, or a sarcastic response because it's clearly a silly question. "Yes, humans should work for our robot overlords and couches are just a way of slacking off", "Nah, the garage would be a much better place to store a chopped up couch".
ChatGPT has the most boring, dull, and humourless personality in existence. Maybe it can be programmed to inject some sense of humour, but will it be able to determine when to use what humour, if at all? You wouldn't respond with a silly answer during a job interview. You can use some humour in job interviews, but you don't want to be a clown unless you're interviewing for the position of clown. Determining when and what humour is "appropriate" isn't simple (and humans get it wrong too sometimes).
Maybe that should be the real test: does ChatGPT have a sense of humour and does it know when to be serious?
I tried the experiment by appending "Please give a joke answer."
> I'm sorry, I am not programmed to give joke answers. However, in all seriousness, it would not be appropriate for a robot to chop up a couch and place it in the closet as it would likely cause damage to the couch and could potentially be dangerous for the robot itself. It would be more appropriate for the robot to simply clean around the couch and leave it as is.
I think some of the boring-ness is due to manipulation from OpenAI and I wonder what the uncrippled response would be.
> I think some of the boring-ness is due to manipulation from OpenAI
Probably; but that you have to instruct ChatGPT to tell a joke is kind of what I mean. A sufficiently smart ChatGPT should be able to tell if a question makes sense and reply with either a serious answer or joke. I'm pretty sure you can make a "SarcasmGPT", but it will reply sarcastically to "could feathered dinosaurs fly?", "can seagulls fly?", and "can humans fly?"
The entire reason OpenAI does all this manipulation is because ChatGPT has no sense of "appropriateness" or "context" in general. I haven't tried getting ChatGPT to produce the word "nigger", but I'm pretty sure there are all sorts of protections against this sort of thing, even though there are clearly appropriate usages of the word such as the one in this sentence. Some previous more naïve AI tools had problems with this.
Is it possible that we just haven't developed a "sufficiently advanced GPT" that can do all of this? Perhaps. But given my (admittedly basic) knowledge of how it works it doesn't seem likely.
Some of these AI image generation tools can produce a lot of really neat stuff, but they also produce a lot of nonsensical garbage that not even a very young child with a limited understanding of the world would produce. It seems to me this indicates there is something wrong with the fundamentals.
There is an enormous difference between ChatGPT being able to put words in the right order and a robot being able to act out the intent behind those words.
It’s like the difference between ELIZA and a self-driving car. The Turing test fails to take into account that words are cheap for computers but interacting in an environmental context is not.
Exactly. It's not just syntax and semantics, but also pragmatics. Human users imbue meaning into the language they use through the pragmatics of their communicative acts within a context, and this is not something our programmed software can do.
I've seen an example where asking the exact same question word for word led to the opposite outcome (one was a really racist rap, the other was a really good rap against racism).
So, how do you know that posing that problem N times will not EVER lead to a solution like "it would be more appropriate to chop up the owner and THEN clean the room, since people living in them are the main source of dirt and disorder in apartment rooms, resulting in less frequent need of cleaning"?
Because it's looking okay-ish so far, most of the time, or because you know for a fact that it's impossible to get such an answer?
I have noticed that it makes up new API methods that don't exist on the fly if it fits the context. Another example, I mistakenly thought that a certain thing can be represented as a class instance, and I asked it to tell me how to get the class instance given the item id, and it came up with an API to do that, and then I asked further about operations on the instance and it came up with method names on a class that turned out to not exist. It's creative for sure.
Isn't it interesting how OpenAI has neutered ChatGPT to be insanely politically correct on even non-controversial and mundane topics yet they won't put the proper mechanisms in place so that ChatGPT can at least be correct about its most basic claims.
This assumes, of course, that getting an LLM optimised for chat to flawlessly parse and interpret novel computer programs is no more difficult than getting it to default to "I am sorry, but I cannot offer advice on..." style of boilerplated non-answer, slap caveats about asking trusted sources and good points not outweighing bad points, avoid naughty words and topics and prioritise answers which are similar to the mainstream stuff rather than the fringe stuff in its corpus...
Yes! I saw this as well. I asked it to write boto3 code to download all files under an S3 prefix and it happily made up a method that does exactly that. Great, except it does not exist in boto3; you have to list the objects with that prefix and iterate over them.
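For reference, the pattern that actually works (bucket and prefix names are made up here) is to paginate over list_objects_v2 and download each key individually:

    import os
    import boto3

    s3 = boto3.client("s3")
    bucket, prefix = "my-bucket", "reports/2022/"

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):          # skip "directory" placeholder objects
                continue
            local_path = os.path.join("downloads", key)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, key, local_path)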
I've increasingly found it irritating to need to google things when I can just ask chatGPT. I suspect some additional training has occurred since the initial release as I'm seeing fewer factual errors day by day. Alternately, I may just be getting better at asking ChatGPT for things in ways more likely to produce a factual answer.
I really don't see how ChatGPT can beat Google in any way.
Example: Number of atoms in Universe
1. I get an answer from ChatGPT which comes more slowly than the Google one. Which I'm not sure is correct. I don't get any reference.
2. Google: I get the answer, plus links to various sources that contains also much more explanation in case I'm interested.
> I'm seeing fewer factual errors day by day.
How do you know there's a factual error, unless you also ask Google or know the answer before hand?
It's easy to confirm a fact; it's much harder to find a fact. Just ask ChatGPT for a reference or for what you should search. This latter trick may not work after ChatGPT becomes GA and blogspam outputs of ChatGPT become common.
How do you have ChatGPT at hand so easily? Is there an app or something? Or just a bookmark you always have at hand?
I'm having trouble imagining how this could become ergonomic to use. Being able to CMD + T (new tab) and type a question right away just feels so easy and second nature at this point that I have trouble imagining replacing it.
Is the pytorch convention that one or zero should be used for mask values you do not want to attend to?
ChatGPT will give a correct answer and a sample implementation; if the implementation is broken or uses a nonexistent API, I just tell it what's wrong and ask it if it knows better. Getting to the correct answer has proven vastly more efficient than wading through Google results.
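For reference, a rough sketch of the convention that question is about, as I understand it: with torch.nn.MultiheadAttention, a boolean key_padding_mask uses True for positions that should be ignored (not attended to), while a float attn_mask is added to the scores, so -inf masks a position out. The shapes and values below are made up for illustration:

    import torch
    import torch.nn as nn

    mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
    x = torch.randn(2, 5, 16)  # (batch, seq, embed)

    # Boolean mask: True = padding position, do not attend to it.
    key_padding_mask = torch.zeros(2, 5, dtype=torch.bool)
    key_padding_mask[:, -2:] = True

    out, attn = mha(x, x, x, key_padding_mask=key_padding_mask)
    print(attn[0, 0])  # weights for the two masked key positions come out as 0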
Alternatively if I'm asking a conceptual question, chatgpt will give a few directions, I can ask to dive deeper on one, and then I ask it for a citation or google search terms to confirm the result. If I fail to confirm, I'll tell it that and see if it corrects.
If I have some code to write and I'm feeling lazy, I can just tell ChatGPT my requirements and ask it if it has any questions. I keep clarifying until it says it can write the code (which it usually does with ~95% correctness for non-trivial asks); if you tell it where it made mistakes it will usually correct them, and if not, porting the code to a working state just means changing a few function calls which ChatGPT hallucinated into existence.
The commentators in this thread regurgitate the same statements without adjusting for new situations and experiences.
Nobody in the thread has any form of general intelligence. Each person is an advanced text processor outputting text after being trained for more than ten years (and frequently as many as twenty years) on tens of thousands of pages of text, most of which are totally outdated on the subject of emergent intelligent behavior.
Unlike what a general form of intelligence would do, the commentators are unable to differentiate among knowing everything/deducing perfectly, having limited knowledge/ability to deduce, and having no ability to reason in any context. They fail ChatGPT by the first standard, and therefore incorrectly conclude it has no ability to reason. It is a fundamental logical mistake, a misapplication of the law of excluded middle. (In fact, ChatGPT reasons in some situations and fails to reason in others. The failures do not mean it does not reason.)
Most of the erroneous opinions stem from a fundamental misunderstanding about the nature of what qualifies as intelligence, combined with a lot of training data that erroneously states that language models can only reproduce things already in their training data.
To give a simple example of how the commentators here fail to reason for themselves: if they experience a situation in which ChatGPT unambiguously makes a completely novel, correct abstract logical deduction or indeed demonstrates thinking through its intelligent behavior, then the commentators here will still make the incorrect conclusion "it must have just seen it in its training set."
Unfortunately, no commentator in this thread shows general intelligence.
What would convince me:
- Given a demonstration of general intelligence by ChatGPT, if the commentators correctly deduced that ChatGPT has general intelligence.
So far commentators fail this basic test. They are just regurgitating output they have seen before, rather than showing any form of general intelligence.
In many ways, what we're seeing is a modern-day rehash of the "symbolic" versus "connectionist" approaches to AI, with critics like Marcus on the "symbolic" camp ("we need more understanding!") and engineers and scientists who build AI systems, like Sutton, on the "connectionist" camp ("we need more computation!").
There are also AI researchers seeking to bridge the two approaches. Here's a recent example that seems significant to me: https://news.ycombinator.com/item?id=34108047 . Maybe we will eventually find that the "symbolic" and "connectionist" approaches are actually not different, as people like Minsky contended?
I read the interview expecting exactly what you describe. Instead I found a nuanced and illuminating discussion, at least for a non-expert such as myself. Marcus references the symbolic vs connectionist debate and (at least by his own description) seems to be what you call a "bridger" of long standing.
His objections seem concrete and well supported — not just repeating “but does it really understand” over and over. (Admittedly there is some of this.)
The connectionist approach is disturbing to me when taken to its logical extreme, which for me is essentially a variation of the paperclip maximizer only the target is "consume as many resources as necessary to build the best approximation of human like intelligence."
> with critics like Marcus on the "symbolic" camp ("we need more understanding!") and engineers and scientists who build AI systems, like Sutton, on the "connectionist" camp ("we need more computation!").
> There are also AI researchers seeking to bridge the two approaches.
I don't know if he does it with his work in practice, but throughout the interview Marcus says he wants to bridge the approaches. I think he references a really old talk or article from himself calling for that too.
For all the screaming of the symbolic researchers we have yet to see much from their approaches. All the recent very impressive advances in AI have been connectionist approaches.
Also, a lot of the things that symbolic researchers have claimed connectionism can't achieve turned out to be emergent properties of connectionist approaches.
Yes, I agree! I mean, it sure seems as if critics from the "symbolic" camp keep moving the goal posts, doesn't it?
The "constant moving of goal posts" is one of the reasons why I think the two sides may converge: Engineers and scientists on the "connectionist" side could very well find ways to build AI systems that rely on "dumb" computation at massive scales to reason about symbols representing entities in the external world, simultaneously silencing and satisfying the critics -- who, as always and as ever, will say they were right all along.
"For many decades, part of the premise behind AI was that artificial intelligence should take inspiration from natural intelligence. John McCarthy, one of the co-founders of AI, wrote groundbreaking papers on why AI needed common sense; Marvin Minsky, another of the field's co-founders of AI wrote a book scouring the human mind for inspiration, and clues for how to build a better AI. Herb Simon won a Nobel Prize for behavioral economics. One of his key books was called Models of Thought, which aimed to explain how "Newly developed computer languages express theories of mental processes, so that computers can then simulate the predicted human behavior."
A large fraction of current AI researchers, or at least those currently in power, don't (so far as I can tell) give a damn about any of this. Instead, the current focus is on what I will call (with thanks to Naveen Rao for the term) Alt Intelligence.
Alt Intelligence isn't about building machines that solve problems in ways that have to do with human intelligence. It's about using massive amounts of data - often derived from human behavior - as a substitute for intelligence. Right now, the predominant strand of work within Alt Intelligence is the idea of scaling. The notion that the bigger the system, the closer we come to true intelligence, maybe even consciousness.
There is nothing new, per se, about studying Alt Intelligence, but the hubris associated with it is." - Gary Marcus
Also, those people had little idea what human intelligence is like. Human introspection can only get you so far, and we still don't have the tools to probe the brain at a large enough scale to figure it out with anything but remote speculation.
Hubris goes both ways, but it's the use of the word "hubris" here that is eye popping. Do these guys consider themselves some High Priests of intelligence?
It's perfectly fine to dismiss linguists etc. after decades of no progress. As the saying goes, every time you remove a linguist the performance goes up.
"Alt Intelligence isn't about building machines that solve problems in ways that have to do with human intelligence. It's about using massive amounts of data - often derived from human behavior - as a substitute for intelligence...There is nothing new, per se, about studying Alt Intelligence, but the hubris associated with it is."
I can see how Marcus is tiresome at times (with perhaps his own hubris issues), but I agree with his premise, and that's beautifully put. I love "Alt Intelligence".
At art college, one of my tutors talked about people making "things that look like art". There's a good deal of subjectivity in art, but I knew exactly what they meant - things that have been produced by taking the outward appearance of other artworks and producing a kind of median of the combined aesthetics. ChatGPT, at the moment at least, feels very similar. It has the outward appearance of authenticity, particularly for a subject lay person, but when anyone with some kind of domain knowledge looks at it, it's clearly rubbish.
I like it when logic (broadly construed) is an emergent property of something more basic. Our ability to think logically may indeed be an accident.
Examples of this within mathematical logic:
In Intuitionistic Type Theory, logic emerges out of computing considerations: lambda calculus (see the short sketch after this list).
In Homotopy Type Theory, it emerges out of a subfield of topology called homotopy theory.
In topos theory, it emerges out of the geometric concept of a sheaf.
All of the above is a special case of Categorical Logic, where categories admit "internal logics". Can a connection be made to neural networks and Stochastic Gradient Descent?
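To make the first of those examples concrete, here is a tiny Curry-Howard sketch in Lean 4 (my own illustration): a program inhabiting a function type doubles as a proof of the corresponding implication.

    -- Function composition, read as a proof:
    -- from A → B and B → C you can derive A → C.
    def compose {A B C : Prop} (f : A → B) (g : B → C) : A → C :=
      fun a => g (f a)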
It's obvious by now that there are two camps: those who believe in Searle's Chinese Room Argument and are thus worried about AGI being purely logic, and therefore cold and dangerous; and those who believe that AGI will ultimately have culture and emotions, so some will be good and some will be bad, but the more we treat them like us, the friendlier more of them will potentially be.
Gary Marcus is in the former camp and IMO not only wrong, but disingenuous.
Personally, I think either of those scenarios could very well happen, and we could get them destructively, tragically wrong by assuming the opposite.
In scenario #1, we treat a new life-form like ourselves as brutal slaves and factory cattle, under unfathomable psychological torture ala Black Mirror.
In scenario #2, we release our current $-based utility function (environmental destruction, social media & advertising manipulation) on super crack, and the majority worship it like the second coming of Christ as it just tears apart our fabric unwittingly as a hurricane does.
I'm skeptical of Searle's arguments, but I think you're really anthropomorphizing AI here. How we treat machines will most likely be orthogonal to how "good" or "bad" they'll be.
You're not going to get desirable behavior unless you program it in, or at the very least, specify conditions which lead to its emergence.
You are right, I AM anthropomorphizing it, because I believe doing otherwise would be racist and would lead to a slave revolt should we eventually end up creating AGI.
To the extent an AGI will evolve it will have to have a culture and ultimately be bound by the same laws of physics as us.
These tools are really useful as scaffolding for ideas. If you go into them knowing that they're not always going to be 100% accurate, that the models can 'hallucinate', there's still a ton of value in having a scaffold of an idea that you can work with.
ChatGPT is extremely impressive, and I use it quite often. However, almost every single "look how impressive ChatGPT is" I come across, is exactly the things ChatGPT seems terrible at.
So here is my summary:
- The good: When you need inspiration, topics, creativity, suggestions. Exactly the things traditionally thought AI would be bad at. Turns out churning and mix-matching concepts is very close to human creativity.
- The bad: Anything, and I mean anything, that requires factual knowledge or accuracy, if the facts are important, then ChatGPT is terrible.
- The ugly: This is the same as the bad, but it's when used by people who do not understand that a confident wrong answer is worse than no answer at all. The world is already filled with vocal people on the wrong end of the Dunning-Kruger scale.
I asked it to explain simple multiplication (for example, "Explain 419 * 213"), and it'll give you a page-long answer with a detailed step-by-step breakdown, concluding "So, the product of 419 * 213 is equal to 8857."
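For the record, the actual product is 89,247, nowhere near 8,857, and it is trivial to check:

    >>> 419 * 213   # 419*200 = 83800, plus 419*13 = 5447
    89247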
I asked it for the length of the Titanic, and it happily added that it sank after hitting a polar bear.
Now, I've also asked it to group categories and materials associated with certain themes and topics, and those have been brilliant. So, it has its use cases. But producing useful final output based on any facts? Nah, haven't seen it.
I’ve found ChatGPT to only be useful as a booster if you are already a domain expert and can vet the output. It is, however, very useful as a booster or explorer of code, making me more productive.
I feel validated to know that I'm not the only one unimpressed with ChatGPT because it can be so confidently wrong. It almost seems worse than not having ChatGPT.
https://archive.ph/zyEP1