
I’m not sure what this paper is supposed to prove and find it rather trivial.

> All of the LLMs knowledge comes from data. Therefore,… a larger more complete dataset is a solution for hallucination.

Not being able to include everything in the training data is the whole point of intelligence; this holds for humans too. A sufficiently intelligent system should be able to infer new knowledge beyond its data, which refutes the very first assumption at the core of the work.


I agree, but I'd already planned to upgrade from my 12 to a 16 Pro. The biggest thing I'm looking forward to? Battery life.

In the announcement they said "big boost", and looking at the comparison page I'll go from 17h of video playback to 27h. That's not even accounting for the battery degradation I've built up over the last 4 years. I'm practically going to double my battery life.


Damn you can only watch 13 hours of video content per day now? Rough.


I know, really looking forward to watching 27h of video in a day


If you plug in for 10 minutes you can watch 41h of video in a day!


They probably meant vLLM https://docs.vllm.ai/en/latest/
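
For anyone who hasn't tried it, a minimal offline-inference sketch using vLLM's Python API (the model name is just a small example; exact parameters may vary by version):

    from vllm import LLM, SamplingParams

    # Load a model from the Hugging Face hub; "facebook/opt-125m" is
    # just a small example, any supported model name works.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Batched generation; vLLM handles the scheduling and KV-cache paging.
    for out in llm.generate(["The capital of France is"], params):
        print(out.outputs[0].text)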


ah shoot, yes I meant vLLM, sorry for the confusion, lots of comments to reply to :)


Interesting question indeed! There isn't much of a consensus on this, as you can see from the other comments. Nonetheless, I spend a lot of time thinking about this, so I'd like to take a stab at it as well.

I think it partially has to do with the concepts of modality and grounding. A modality is a channel through which information can be conveyed. You probably learned early on about the 5 human senses: vision, hearing, taste, smell and touch. The grounding problem refers to the fact that the symbols (read: language) we use usually refer to perceptions in one or more of these modalities. When I write "shit", you can probably imagine a representation of it in different modalities (don't imagine them all!).

Interestingly, large language models (such as ChatGPT) don't have any of these modalities. Instead, they work directly on the symbols we use to communicate meaning. It's quite surprising that this works so well. An analogy that helps: asking an LLM anything is much like asking a blind person what the sun looks like. Obviously they cannot express themselves in terms of vision, but they could say that it feels warm, and maybe even that it gives off light, since it doesn't make any noise. That would be a good approximation, and they would be referring to the same physical phenomenon, but that's all it is: an approximation. They could say it's a large yellow/white-ish circle if they had heard this from someone else before, but since they cannot see, they have no 'grounded truth' to speak from. If the sun suddenly turned red, they would probably repeat the same answer. My point being: you can express one modality in another, but it'll always be an approximation.

What's interesting is that the only 'modality' of these LLMs is language. It's the first of its kind, so we don't know what to expect from it. In a sense, LLMs are simply experiments answering the question "what would a person that could only 'perceive' text look like?". Turns out, they're a little awkward. Obviously there's much more to the story (reasoning, planning, agency, etc.), but I think this is fundamental to your question of why reading is not the same for humans and AIs (LLMs): LLMs have such a limited and awkward modality that any understanding they reach can only be an approximation of ours (albeit a pretty good one), so learning from reading will be much different as well.

Hope this helps your understanding.


This is exactly part of the Google USM approach, although these pretrained models are significantly smaller than ChatGPT. They reference a paper [1] with more details on aligning the pretrained text-only model with the speech model.

[1] https://arxiv.org/abs/2209.15329


As someone who tends to side with Chomsky in these debates, I think Norvig makes some interesting points. However, I would like to pick one of his criticisms to disagree with:

"In 1969 he [Chomsky] famously wrote:

     But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.
His main argument being that, under any interpretation known to him, the probability of a novel sentence must be zero, and since novel sentences are in fact generated all the time, there is a contradiction. The resolution of this contradiction is of course that it is not necessary to assign a probability of zero to a novel sentence; in fact, with current probabilistic models it is well-known how to assign a non-zero probability to novel occurrences, so this criticism is invalid, but was very influential for decades."

I think Norvig wrongly interprets Chomsky's "probability of a sentence is useless" as "the probability of a novel sentence must be zero". I agree that we've shown it's possible to assign probabilities to sentences in certain contexts, but that doesn't mean probabilities can fully describe a language and knowledge. This seems to me yet another case of 'the truth is somewhere in the middle', and I would be wary of the false dichotomy put forward here. Yes, we can assign probabilities to sentences and they can be useful, but it's not the whole story either.
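
For what it's worth, the narrow technical point is easy to demonstrate: even a crude smoothed bigram model assigns non-zero probability to a sentence it has never seen. A minimal sketch (the toy corpus and the add-one smoothing are my own illustrative choices, not anything from Norvig's essay):

    from collections import Counter

    # Toy corpus; "." acts as a sentence boundary token.
    corpus = "the dog barks . the cat sleeps . the dog sleeps .".split()
    vocab = set(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def bigram_prob(prev, word):
        # Add-one (Laplace) smoothing: unseen bigrams get a small
        # non-zero probability instead of zero.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

    def sentence_prob(words):
        p = 1.0
        for prev, word in zip(words, words[1:]):
            p *= bigram_prob(prev, word)
        return p

    # "the cat barks ." never occurs in the corpus, yet its probability > 0.
    print(sentence_prob("the cat barks .".split()))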


What's really funny is that I worked in DNA sequence analysis at a time when the Chomsky hierarchy was primal, and literally all my work was applying "probability of a sequence" concepts (specifically, everything from regular grammars to stochastic context-free grammars). It's a remarkably powerful paradigm that has proven, time and time again, to be useful to scientists and engineers, much more so than rule systems constructed by humans.

The probability of a sentence is vector-valued, not a scalar, and the probability can be expanded to include all sorts of details which address nearly all of Chomsky's complaints.
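
To make the simplest version of that concrete, here's a sketch of a first-order Markov chain (equivalent in power to a stochastic regular grammar, the bottom of the Chomsky hierarchy) scoring DNA sequences. The transition probabilities are invented for illustration; in real work they'd be estimated from data:

    import math

    # Made-up nucleotide transition probabilities (each row sums to 1).
    TRANSITIONS = {
        'A': {'A': 0.3, 'C': 0.2, 'G': 0.3, 'T': 0.2},
        'C': {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
        'G': {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},
        'T': {'A': 0.2, 'C': 0.2, 'G': 0.3, 'T': 0.3},
    }
    START = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}

    def log_prob(seq):
        # Log-probability of a DNA sequence under the Markov chain.
        lp = math.log(START[seq[0]])
        for prev, cur in zip(seq, seq[1:]):
            lp += math.log(TRANSITIONS[prev][cur])
        return lp

    # Even never-before-seen sequences get comparable, non-zero scores.
    print(log_prob("ACGTGGT"), log_prob("TTTTTTT"))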


Chomsky’s theories of human language weren’t useful to your work on DNA? That is funny. Linguists everywhere in shambles.


You misunderstand. The Chomsky hierarchy was critical to my work on DNA. In fact, my whole introduction to DNA sequence analysis came from applying probabilistic linguistics; see this wonderful paper by Searls: https://www.scribd.com/document/461974005/The-Linguistics-of...


I still don't understand. What's funny about it, and what does DNA have to do with human language sentences (the original quote)?

Chomsky's grammars are used in compilers and compiler theory, even though programming languages have nothing to do with human languages, and certainly nothing to do with the "probability of a sentence" he was talking about. Applying something like that elsewhere doesn't necessarily tell you anything about what Chomsky was talking about, namely human language.


Funny you mention that: after working with probabilistic grammars on DNA for a while, I asked my advisor if the same idea could be applied to compilers, i.e., if you left out a semicolon, could you use a large number of example programs to train a probabilistic compiler to look far enough ahead to recognize the missing semicolon and continue compiling?

They looked at me like I was an idiot (well, I was) and then said very slowly.... "yes, I suppose you could do that... it would be very slow and you'd need a lot of examples and I'm not sure I'd want a nondeterministic parser".

My entire point above is that Chomsky's contributions to language modelling are the very thing he's complaining about. But what he's really saying is "humans are special, language has a structure that is embedded in human minds, and no probabilistic model can recapitulate that or show any sign of self-awareness/consciousness/understanding". I do not think that humans are "special" or that "understanding" is what he thinks it is.


I get it now. What’s funny is your vulgar interpretation of “language”.

Which is another pet peeve of his: people who use commonsensical, intuitive words like "language" and "person" to draw unfounded scientific and philosophical parallels between things that can only be compared through metaphor, like human languages and… DNA, I guess.


You think DNA isn't a language? That seems... restrictive. To me, the term language was genericized from "how people talk to each other" to "sequences of tokens which convey information in semi-standardized forms between information-processing entities".

DNA uses the metaphors of language- the central dogma of biology includes "transcription" (copying of DNA to RNA) and "translation" (converting a sequence of RNA to protein). Personally I think those terms stretch the metaphor a bit.


Another thing that's funny is your usage of “vulgar”.


Okay.


I'm sorry, but what Chomsky wrote is pseudo-scientific mumbo jumbo, and Norvig went to the trouble of wading through that mumbo jumbo and refuting it. Thank you, Peter Norvig.


Andrey Markov would like a word


But what about things with probability zero that nonetheless happen all the time?

I'm not very clear on some technicalities around probability, but I remember a 3blue1brown video along these lines: randomly choosing an integer from the real number line is a probability-zero event, in spite of there being infinitely many integers to "randomly pick".


When was the last time you picked a random number from an actual real number line?

Try sampling random floats in your programming language of choice and see how long it takes to get an integer (in principle you will eventually get one, since floats are a discrete set, but don't hold your breath).

Then consider that floating-point numbers represent only a finite subset of the (countably) infinite rational numbers. And then consider that the set of rational numbers is unimaginably smaller than the set of irrational numbers (which make up the rest of the reals).

The fact that integers have measure 0 in the set of real numbers only seems confusing if you're thinking of everyday operations rather than thinking about mathematical abstractions.
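
A quick way to feel this, following the suggestion above (the range and sample count are arbitrary choices of mine):

    import random

    # Draw uniform floats in [0, 1000) and count exact integer hits.
    # Integer-valued floats are such a vanishingly small fraction of
    # all representable floats that the count is almost certainly 0.
    random.seed(0)
    N = 1_000_000
    hits = sum((random.random() * 1000).is_integer() for _ in range(N))
    print(f"{hits} integer-valued floats in {N} draws")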


Thanks for the write-up! As someone starting a doctoral degree early next year, I greatly appreciate it.

