What do you mean by the "stochastic parrots" (null) hypothesis in this case? Cards on the table, I think by any reasonable interpretation it's either uninformative or pretty conclusively refuted, but I'm curious what your version is.


I mean that it simply surfaces patterns in the training data.

So responses will be an 'aggregation' (obviously more complex than that) of similar prompt/response pairs from the training corpus, with some randomness thrown in to make things more interesting.


"Surfaces patterns in the training data" seems not to pin things down very much. You could describe "doing math" as a pattern in the training data, or really anything a human might learn from reading the same text. I suspect you mean simpler patterns than that, but I'm not sure how simple you're imagining.

A useful rule of thumb, I think, is that if you're trying to describe what LLMs can do, and what you're saying is something that a Markov chain from 2003 could also do, you're missing something. In that vein, I think talking about building from a "similar prompt/response from the training corpus", though you allow "complex" aggregation, can be pretty misleading in terms of LLM capabilities. For example, you can ask a model to write code, run the code and give the model an error message, and then the model will quite often be able to identify and correct its mistake (true for GPT-4 and Claude at least). Sure, maybe both the original broken solution and the fixed one were in the training corpus (or something similar enough was), but it's not randomness taking us from one to the other.
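
To make that loop concrete, here's a minimal sketch in Python. `complete()` is a stand-in for whatever chat API you're using (OpenAI, Anthropic, a local model), and the three-round repair budget is my own assumption, not any particular vendor's interface:

    import subprocess, tempfile

    def complete(prompt: str) -> str:
        # Placeholder for an LLM call; wire up your actual client here.
        raise NotImplementedError

    def run_python(source: str) -> str:
        # Run the model's code and capture stderr to feed back next turn.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(source)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True,
                                text=True, timeout=30)
        return result.stderr

    code = complete("Write a Python function that ...")
    for _ in range(3):  # a few repair rounds
        error = run_python(code)
        if not error:
            break  # ran cleanly, done
        # Feed the traceback back; the model often localizes and fixes the bug.
        code = complete(f"This code:\n{code}\nfailed with:\n{error}\nFix it.")

The point being: the traceback is new text the model conditions on and responds to, not a lookup key into stored prompt/response pairs.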


There is a big difference between 'doing math' by repeating/elaborating on previously seen patterns and doing it with an intuitive grasp of what is going on 'under the hood'. Of course, our desktop calculators work (very well) on the latter principle.
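
To put the distinction in code (a toy contrast, purely illustrative): a 'parrot' that can only replay answers it has seen, versus a calculator that actually implements the operation and therefore generalizes:

    # A "parrot": replays question/answer pairs it has memorized.
    seen = {"2+2": "4", "3+5": "8"}
    def parrot(q: str) -> str:
        return seen.get(q, "??")  # blank stare outside its corpus

    # A calculator: implements addition itself.
    def calculator(q: str) -> str:
        a, b = q.split("+")
        return str(int(a) + int(b))

    print(parrot("2+2"), calculator("2+2"))              # 4 4
    print(parrot("1234+5678"), calculator("1234+5678"))  # ?? 6912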

As you say, both the broken and the correct solutions were likely in the training corpus (and indeed the error message), so really we are putting on a smoke-and-mirrors performance to make it look like the correct solution was 'thought out' in some sense.


I think dismissing problem-solving as "smoke and mirrors" built on regurgitated training data will give you a poor predictive model of what else these models can do. For example, do you think that if you change the variable names to something statistically likely to be unique in human history, the ability will break?
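
That prediction is cheap to test. A sketch of the probe (the buggy snippet and the renaming scheme are my own; note Python's `ast` strips comments, but the bug itself survives):

    import ast, secrets

    class Renamer(ast.NodeTransformer):
        # Replace every variable name with a fresh random identifier.
        def __init__(self):
            self.mapping = {}
        def visit_Name(self, node):
            fresh = self.mapping.setdefault(node.id,
                                            "v_" + secrets.token_hex(8))
            return ast.copy_location(ast.Name(id=fresh, ctx=node.ctx), node)

    # "=+" parses as assignment of unary plus, a classic silent bug.
    buggy = "total = 0\nfor item in items:\n    total =+ item\n"
    print(ast.unparse(Renamer().visit(ast.parse(buggy))))
    # Same bug, but names almost certainly unique in training history;
    # ask the model to debug both versions and compare.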

As for pattern recognition vs. intuitive grasp: I don't think I follow. I would call pattern recognition part of intuition, unlike logically calculating out the consequences of a model, but on the other hand I would not say that a desktop calculator "grasps" anything; it is not able on its own to apply its calculating ability to real-world instantiations of mathematical problems in the way that humans (and sometimes LLMs) can.



