"Check and consolidate their understanding" by reading generated text that is not checked and has the same confident tone whether it's completely made-up or actually correct? I don't get it.
>interrogates our current teaching model
Jesus, many, many things put our current teaching model in question; ChatGPT is NOT one of them. Tbh this excitement is an example of focusing on the "cool new tech" instead of the "unsexy" things that actually matter.
> by reading generated text that is not checked and has the same confident tone whether it's completely made-up or actually correct? I don't get it.
This is a valid point, but it refers to the state of things as of ~1.5 years ago. The field has evolved a lot, and now you can readily augment LLMs' answers with context in the form of validated, sourced and "approved" knowledge.
Is it possible that you are having a visceral reaction to the "cool new tech" without having been exposed to the latest state of that tech yourself? To me your answer seems like a knee-jerk reaction to the "AI hype", but if you look at how things have evolved over the past year, there's a clear indication that these issues will get ironed out, and the next iterations will be better in every way. I wonder, at that point, where the goalposts will be moved...
No, ChatGPT and others still happily make stuff up and miss important details and caveats. The goalpost hasn't moved. The fact that there are specialized LLMs that can fact check (supposedly) doesn't help the most popular ones which can't.
Have you tried Claude.ai? In my experience on computer science topics, the LLMs are very good, because they have been trained on a vast amount of information online. I just had a nice conversation about mutexes and semaphores with Claude and was finally able to grasp what they are.
I do not know if this is the case for, say, mathematics or the sciences.
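To give a flavour of what it helped me see (a minimal sketch in Python rather than the actual conversation; the names and numbers are just illustrative): a mutex lets exactly one thread into a critical section at a time, while a semaphore lets up to N in at once.

    import threading

    # Mutex: exactly one thread at a time may hold the lock.
    counter = 0
    counter_lock = threading.Lock()

    def increment():
        global counter
        with counter_lock:        # other threads block until this is released
            counter += 1

    # Semaphore: up to N threads at once (here, 3 "connection slots").
    pool_slots = threading.Semaphore(3)

    def use_connection():
        with pool_slots:          # blocks only if all 3 slots are already taken
            pass                  # do work while occupying one of the slots

Roughly speaking, a mutex behaves like a semaphore with N = 1, plus the convention that only the thread holding it should release it.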
The student isn't an idiot; they'd use what the teacher says as their ground truth, and ChatGPT would be used to supplement their understanding. If it's wrong, they didn't understand it anyway, and reasoning/logic would allow them to suss out any incorrect information along the way. The teaching model can account for this by providing them with checks to ensure their explanation/understanding is correct (this is what tests are for: to check your understanding).
I don’t know if you have been teaching, but I have (for nearly 19 years now), to a lot of different people of various ages. I’m also a daily user of LLMs.
I’m firmly convinced that LLMs will have an impact on teaching because they are already being used in addition to / superimposed on current classes.
The physical class, the group, has not been dislodged even after hundreds of thousands of remote classes during lockdown. Students were eager to come back, for many reasons.
LLMs have the potential to enhance and augment the live physical class. At a design school I teach at, we have even proposed a pilot program for a grant, where my History of Tech Design course will be the in-vivo testing ground for new pedagogical strategies using LLMs.
Tools that graft onto the current way of teaching have had more impact than tools that promise to “replace university/schools”.
The fact here is that a student, using ChatGPT, managed to give the right answer. And I agree with GP that the teaching model must evolve. The cat is out of the bag now and clearly students, of (unfortunately) almost all ages, are using it. It being "cool new tech" or anything else doesn't matter and as a teacher it must not be dismissed or ignored.
Not all subjects taught have to evolve in the same way. For example, it is very different to use ChatGPT to have a technical discussion than to simply ask it to generate a text for you. Meaning this tech does not have the same impact in a literature class as it does here in a CS one. It can be misused in both, though.
I always come back to the calculator analogy with LLMs and their current usage. Here in the context of education: before calculators were affordable, simply giving the right answer could have meant that you knew how to calculate it (not entirely true, but the signal was stronger). After calculators, math teachers were clearly saying "I want to see how you came up with the answer or you won't get any points". That didn't solve the problem entirely, but they had to adapt to that "cool new tech", which was clearly not helping their students learn, as it could only give them answers.
I'm no LLM fanboy and I do know about their issues and shortcomings.
I also think that asking the right questions to a model while following a lecture, assessing its answers and integrating them into one's own reasoning is difficult. There is certainly a minimum age/experience level under which this process will generally fail, possibly hindering the learning outcome.
Nevertheless, I saw with my own eyes a mid-level student significantly improve his understanding of a difficult topic because he had access to an LLM in real time. I believe this is a breakthrough. Time will tell.
I don't know, is seeming 'conversationally smarter' when having access to a language model much different from just looking stuff up and pattern-matching answers?
I'm afraid these models are making people sound smarter and feel smarter without any actual gains in real world problem solving skills.
No. Network effects, plus turbo-capitalism being able to lose hundreds of millions a month to build market share, mean excessive concentration that we cannot get rid of simply by providing alternatives, even if they are "better".
We accidentally invented general models that can coherently muse about the philosophical beliefs of Gilles Deleuze at length, and accurately, based on two full books that they summarized. You can be cynical until your dying day, that’s your right — but I highly recommend letting that fact be a little bit impressive, someday. There’s no way you live through any event that’s more historically significant, other than perhaps an apocalypse or two.
In other words: a soapbox is presumably some sort of toy car that goes 15 mph, and a Formula 1 car goes up above 150 mph at least (as you can tell, I’m not a car guy). If you have any actual scientific argument as to why a model that can score 90-100 on a typical IQ test has only 1/10th the symbolic reasoning skills of a human, I’d love to eat my words! Maybe on some special, highly iterative, deliberation-based task?
These aren't "general" models. They're statistical models. They're autocorrect or autocomplete on steroids -- and autocorrect/autocomplete don't require symbolic reasoning.
It's also not at all clear to me what "symbolic" could mean in this context. If it means the software has concepts, my response would be that they aren't concepts of a kind that we can clearly recognize or label as such (edit: and that's to say nothing of the fact that the ability to hold concepts/symbols and understand them as concepts/symbols presupposes internal life and awareness).
The best analogy I've heard for these models is this: you take a completely, perfectly naive, ignorant man, who knows nothing, and you place him in a room, sealed off from everything. He has no knowledge of the outside world, of you, or of what you might want from him. But you slip under the door of his cell pieces of paper containing mathematical or linguistic expressions, and he learns or is somehow induced to do something with them and pass them back. When what he does with them pleases you, you reward him. Each time you do this, you reinforce a behavior.
You repeat this process, over and over. As a result, he develops habits. Training continues, and those habits become more and more precisely fitted to your expectations and intentions.
After enough time and enough training, his habits are so well formed that he seems to know what a sonnet is, how to perform derivatives and integrals, and seems to understand (and be able to explain!) concepts like positive and negative, and friend and foe. He can even write you a rap-battle libretto about nineteenth-century English historiography in the style of Thomas Paine imitating Ikkyu.
Fundamentally, though, he doesn't know what any of these tokens mean. He still doesn't know that there's an outside world. He may have ideas that are guiding his behavior, but you have no way of knowing that -- or of knowing whether they bear any resemblance to concepts or ideas you would recognize.
These models deal with tokens similarly. They don't know what a token is or represents -- or we have no reason to think they do. They're just networks of weights, relationships, and tendencies that, from a given seed and given input, generate an output, just like any program, just like your phone keyboard generates predictions about the next word you'll want to type.
Given billions and billions and billions and billions of parameters, why shouldn't such a program score highly on an IQ test or on the LSAT? Once the number of parameters available to the program reaches a certain threshold (edit: and we've programmed a way for it to connect the dots), shouldn't we be able to design it in such a way that it can compute correct answers to questions that seem to require complex, abstract reasoning, regardless of whether it has the capacity to reason? Or shouldn't we be able to give it enough data that it's able to find the relationships that enable it to simulate/generate patterns indistinguishable from real, actual reasoning?
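To make "prediction without understanding" concrete, here is a toy sketch -- nothing like a production LLM in scale or architecture, just the bare idea, with made-up example text: count which token follows which, then sample from those counts. There are no concepts anywhere in it, yet it still "completes" text.

    import random
    from collections import Counter, defaultdict

    # Toy next-token predictor: pure statistics over observed sequences.
    def train_bigrams(text):
        counts = defaultdict(Counter)
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
        return counts

    def predict_next(counts, word):
        followers = counts.get(word)
        if not followers:
            return None
        choices, weights = zip(*followers.items())
        return random.choices(choices, weights=weights)[0]

    model = train_bigrams("the cat sat on the mat and the cat ran")
    print(predict_next(model, "the"))   # a plausible continuation, zero understanding

Scale that up by many orders of magnitude and swap the counts for learned weights, and the open question is whether anything other than more of the same statistics has appeared.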
I don't think one needs to be cynical to be unimpressed. I'm unimpressed simply because these models aren't clearly doing anything new in kind. What they're doing seems to be new, and novel, only because of the scale at which they do what they do.
Edit: Moreover, I'm hostile to the economic forces that produced these models, as everybody should be. They're the purest example of what Jaron Lanier has been warning us about -- namely that, when information is free, the wealthiest are going to be the ones who profit from it and dominate, because they'll be the ones able to pay for the technology that can exploit it.
I have no doubt Altman is aware of this. And I have no doubt that he's little better than Elizabeth Holmes, making ethical compromises and cutting legal corners, secure in the smug knowledge that he'll surely pay his moral debts (and avoid looking at the painting in the attic) and obviously make the world a better place once he has total market dominance.
And none of the other major players are any better.
To be fair, the parent's claim that they're purely statistical machines predicting the next token is incorrect anyway.
They are essentially next-token predictors after the first round of training, but instruct models are then fine-tuned on reasoning and Q/A scenarios. AFAIK early research has determined that this isn't just pure parroting, and that it does actually result in some logic in there as well.
People also have to remember the training for these is super shallow at the moment, when compared with what humans go through in our lifespans as well as our millions of years of evolution (as humans).
What you're saying doesn't contradict what I wrote. I said models are trained. You're saying they're trained and fine tuned -- i.e. continue to be trained. I also didn't say they do any kind of parroting or that logic doesn't take place.
I'm saying, rather, that the models do what they're taught to do, and what they're taught to do are computations that give us a result that looks like reasoning, just the way I could use 3ds max as a teenager to generate on my computer screen an output that looked like a cube. There was never an actual cube in my computer when I did that. To say that the model is reasoning because what it does resembles reasoning is no different from saying there was an actual cube somewhere in my computer every time I rendered one.
I'm not saying "they predict tokens; therefore, they can't reason." I'm saying "something that can't reason can predict tokens, so prediction isn't evidence of reasoning."
More specifically, my comment aims to meet the challenge posed by the person I answered:
> I highly recommend letting that fact be a little bit impressive, someday. There’s no way you live through any event that’s more historically significant, other than perhaps an apocalypse or two. [...] If you have any actual scientific argument as to why a model that can score 90-100 on a typical IQ test has only 1/10th the symbolic reasoning skills of a human, I’d love to eat my words.
I have no idea what would constitute a "scientific argument" in this instance, given that the challenge itself is unscientific, but, regardless, the results that so impress this person are, without question, achievable without reasoning, symbolic or otherwise. To say that the model "muses" or "has [...] symbolic reasoning" is to make a wild, arbitrary leap of faith that the data, and workings of these models, do not support.
The models are token-prediction machines. That's it. They differ not in kind but in scale from the software that generates predictions in our cell-phone keyboards. The person I answered can be as impressed as he wants to be by the high quality he thinks he sees in the predictions. That's fine. I'm not. In that respect, we just disagree. But if he's impressed because he thinks the model's predictions must or do betoken reasoning, he's off in la la land -- and so his wide-eyed, bushy-tailed enthusiasm is based on nonsense.
It's no different from believing that your phone keyboard is capable of reasoning, simply because you are delighted that it guesses the 'right' word often enough to please you.
If your argument is "they can't reason" (plus some other stuff about how they work), what reasoning test has an LLM failed for you to conclude that they can't reason? Whenever I've given an LLM a reasoning test, it seems to do fine, so it really does sound to me like the argument is "they can't really reason, because <irrelevant fact about how they work internally>".
Because, whenever you give it a reasoning test, it also seems to do fine.
That is what I meant in my other post: I don't really think that "it seems to do fine" is enough evidence for the extraordinary claim that it can reason.
That isn't the argument. I've stated the argument twice. My longer response to you starts by stating the core of the argument as succinctly and clearly as I can. That's the first paragraph of the post. Not only are you still not getting it. You're also twisting what I wrote into claims I have not made. I'm not going to explain myself a third time.
I'll instead say this: if you think these models must be reasoning when they produce outputs that pass reasoning tests, then you should also believe, every time you see a photo of a dog on a computer screen, that a real, actual dog is somewhere inside the device.
> I'm not saying "they predict tokens; therefore, they can't reason." I'm saying "something that can't reason can predict tokens, so prediction isn't evidence of reasoning."
This is true. Reasoning is evidence of reasoning, and LLMs do pass reasoning tests. Yes, the way they work doesn't imply that they can reason, the fact that they can reason implies that.
You also said:
> These models deal with tokens similarly. They don't know what a token is or represents -- or we have no reason to think they do.
I have no reason to think that other people know what concepts are or what they represent, just that they can convincingly output a stream of words when asked to explain a context.
The argument bugs me because you can replace "they predict the next token" with "they are collections of neurons producing output voltages in response to input voltages" and you'll have the exact same argument about humans.
Thanks for explaining. I see what you're getting at now.
Here I think it's helpful to distinguish between what something is and how it's known. When we see something that resembles reasoning, we very reasonably deduce that reasoning has taken place. But 'it looks like reasoning' is not equivalent to 'it is reasoning.'
To approach the same idea from a different direction:
> I have no reason to think that other people know what concepts are or what they represent, just that they can convincingly output a stream of words when asked to explain a context.
You absolutely do have reason to think this. You're the reason. You're the best available evidence, because you have an internal life, have concepts and ideas, have intentions, and perform acts of reasoning that you experience as acts of reasoning -- and all of that takes place inside a body that, you have every reason to think, works the same way and produces the experiences the same way in other people.
So, sure, it's true that you can't prove that other people have internal lives and reason the way you do. (And you're special, after all, because you're at the center of the universe -- just like me!) But you have good reason to think they do -- and to think they do it the way you do it and experience it the way you experience it.
In the case of these models, we have no such reason/evidence. In fact, we have good reason for thinking that something other than reasoning as we think of it takes place. We have good reason, that is, to think they work just like any other program. We don't think winzip, Windows calculator, a Quake bot, or a piece of malware performs acts of reasoning. And the fact that these models appear to be reasoning tells us something about the people observing them, not about the programs themselves. These models appear to be reasoning only because the output of the model is similar enough to 'the real thing' for us to have trouble saying with certainty that they aren't the real thing. They're simulations whose fidelity is high enough to create a feeling in us -- and to pass some tests. (In that sense, they're most similar to special effects.) (Edit: and that's not to say feelings are wrong, invalid, or incorrect. They're one of the key ways we experience the things we understand.)
Is reasoning taking place in these models? Sure, it's possible. Is there an awareness or being of some kind that does the reasoning? Sure, that's possible, too. We're matter that thinks. Why couldn't a program in a computer be matter that thinks? There's a great novel by Greg Egan, Permutation City, that deals partly with this: in one section, our distant descendants pass to another universe, where matter superficially appears to be random, disorganized, and high in entropy. When that random activity and apparent lack of life and complexity are analyzed in the right way, though, interference patterns are revealed, and these contain something that looks like a rich vista bursting with directed, deliberate activity and life. It contains patterns that, for all the world, look and act like the universe we know -- with things that are living and things that are not, with ecosystems, predators, prey, communities, reproduction, etc. These patterns aren't in, and aren't expressed in, the matter itself. They 'exist' only in the interference patterns that ripple through it.
That's 100% plausible, too. Why couldn't an interference pattern amount to a living thing, an organism, or an ecosystem? The boundary we draw between hard, physical stuff and those patterns is arbitrary. Material stuff is just another pattern.
My point isn't that reasoning doesn't take place in these models or can't. It's, first, that you and I do something we call reasoning, and the best available information tells us these models aren't doing that. Second, if they are doing something we can call reasoning, we have no idea whether our understanding of the model's output tells us what its reasoning actually is or is actually doing. Third, if we want to attribute reasoning to these models, we also have to attribute a reasoner or an interiority where reasoning can take place -- meaning we'd need to attribute something similar to consciousness or beinghood to these models. And that's fine, too. I have no problem with that. But if we make that attribution, then we, again, have no reason to attribute to it a beinghood that resembles ours. We don't know its internal life; we know ours.
Finally -- if we make any of these claims about the capabilities or nature of these models, we are necessarily making the exact same claims about all other programs, because those work the same way and do the same things as these models. Again, that's fine and reasonable (though, I'd argue, wrong), because you and I are evidence that stuff and electricity can have beinghood, consciousness, awareness, and intentions -- and that's exactly what programs are.
The point that I don't think is disputable is the following: these models aren't a special case. They aren't 'programs that reason, in contrast to programs that don't.' They aren't 'doing something we can do, in contrast to other programs, which don't.' And even if they're doing something we can (or should) call reasoning, reasoning requires interiority -- and we have no idea what that interiority looks or feels like. Indeed, we have no good reason to think there's any at all -- unless, again, we think other programs do as well.
> Indeed, we have no good reason to think there's any at all -- unless, again, we think other programs do as well.
And this is equivalent to saying there's a dog in my computer when I open a photo of a dog. It treats the simulation, the data, the program -- whatever you want to call it -- as if it were the thing itself.
Some different type of praise: I absolutely love the voice used throughout gov.uk (this page being no exception). The language and tone is clear, straightforward, to the point, neutral, as technical as it needs to be and no more, and stripped of unnecessary flourish, without being dry. Amazing!
I think SWE must be the only (allegedly) engineering discipline where developer convenience is overtly prioritised above product quality or user experience.
If you doubt this, think how many times you've seen a framework advertised due to its ease of use for developers vs due to e.g. performance in low bandwidth.
I call absolute bullshit on that last one. There's no way ChatGPT solves a maths problem that a maths PhD cannot solve, unless the solution is also googleable in 30s.
Is anything googleable in 30s? It feels like finding the right combination of keywords that bypasses the personalization and poor quality content takes more than one attempt these days.
Right, AI is really just what I use to replace google searches I would have used to find highly relevant examples 10 years back. We are coming out of a 5 year search winter.