No, causation isn't distribution sampling. And there's a difference between, say, an extrinsic description of a system and its essential properties.
E.g., you can describe a coin flip as sampling from the space {H, T} -- but insofar as we're talking about an actual coin, there's a causal mechanism, and this description fails (e.g., one can design a coin flipper that deterministically lands heads).
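To make the contrast concrete, here's a minimal sketch (the function names and the toy "physics" are mine, purely for illustration): the same coin can be described extrinsically as a draw from {H, T}, or mechanistically by simulating the flipper -- and a rigged flipper breaks the sampling description.

```python
import random

# Extrinsic description: the coin "is" a draw from the outcome space {H, T}.
def coin_as_sampling(p_heads=0.5):
    return "H" if random.random() < p_heads else "T"

# Causal description: a toy mechanism mapping initial conditions to an outcome.
# A precisely engineered flipper fixes the initial conditions, so the outcome
# is deterministic -- no sampling involved.
def coin_as_mechanism(launch_speed, spin_rate):
    half_rotations = int(launch_speed * spin_rate)  # toy stand-in for real dynamics
    return "H" if half_rotations % 2 == 0 else "T"

print([coin_as_mechanism(2.0, 7.0) for _ in range(5)])  # rigged flipper: always the same face
print([coin_as_sampling() for _ in range(5)])           # sampling description: varies run to run
```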
A transformer model, like all generative statistical models, is actually learning a distribution. The model is essentially constituted by a fit to a prior distribution, and when computing an output it is sampling from this fitted distribution.
I.e., the relevant state of the graphics card which computes an output token is fully described by an equation that samples from an empirical distribution (of prior text tokens).
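In code, the last step of decoding really is a draw from a fitted categorical distribution over the vocabulary. A minimal sketch (the logits stand in for whatever scores a trained network computes; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_next_token(logits, temperature=1.0):
    """Sample a token id from the model's fitted distribution over the vocabulary.

    `logits` stands in for the scores a trained transformer assigns to each
    vocabulary item given the prior tokens; softmax turns them into a
    categorical distribution, and the output token is a draw from it.
    """
    z = np.asarray(logits, dtype=np.float64) / temperature
    z -= z.max()                       # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

# Toy vocabulary of 5 tokens; this "model" strongly prefers token 2.
print(sample_next_token([0.1, 0.3, 4.0, 0.2, 0.1]))
```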
Your nervous system is a causal mechanism which is not fully described by sampling from such an outcome space. There is nowhere in your body that stores all possible bodily states as an outcome space: storing that space would require more atoms than there are in the universe.
The same goes for any causal mechanism: reality itself comprises essential properties which interact in ways that cannot be reduced to sampling. Statistical models are therefore never essential models of reality, only circumstantial approximations.
I'm not stretching definitions into meaninglessness; these are the ones given by AI researchers, of which I am one.
I'm going to simply address what I think are your main points here.
There is nowhere that an LLM stores all possible outputs. Causality can trivially be represented in a sampling framework by including the ordering of events, which you also implicitly did for LLMs.
The coin is an arbitrary distinction: you are never just modeling a coin, just as an LLM is never just modeling a word. You are also modeling an environment, and that model would capture whatever you used to influence the coin toss.
You are fundamentally misunderstanding probability and randomness, and then using that misunderstanding to arbitrarily imply simplicity in the system you want to diminish, while failing to apply the same reasoning to any other.
If you are indeed an AI researcher, which I highly doubt without you providing actual credentials, then you would know that you are being imprecise and using that imprecision to sneak in unfounded assumptions.
It's not a matter of making points; it's at least a semester's worth of courses on causal analysis, animal intelligence, the scientific method, and explanation.
Causality isn't ordering. Take two contrary causal mechanisms (e.g., filling a bathtub with a hose and emptying it with a bucket). The level of the bath is arbitrarily orderable with respect to either of these mechanisms, as the sketch below illustrates.
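A toy sketch of the point (the rates and schedules are my own made-up numbers): two different mixes of the filling and emptying mechanisms produce the identical ordered sequence of observed levels, so the ordering of observations alone doesn't identify which mechanism did what.

```python
def simulate(hose_rate, bucket_volume, bucket_steps, n_steps=5):
    """Toy bathtub: the hose adds water every step; the bucket removes water on
    the steps listed in `bucket_steps`. Returns the observed level series."""
    level, series = 0.0, []
    for t in range(n_steps):
        level += hose_rate             # filling mechanism
        if t in bucket_steps:
            level -= bucket_volume     # emptying mechanism
        series.append(level)
    return series

# Two different causal stories, identical observed ordering of levels:
print(simulate(hose_rate=2.0, bucket_volume=1.0, bucket_steps={0, 1, 2, 3, 4}))
print(simulate(hose_rate=1.0, bucket_volume=1.0, bucket_steps=set()))
# Both print [1.0, 2.0, 3.0, 4.0, 5.0] -- the ordering of observations
# under-determines which mechanisms produced them.
```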
Go on YouTube and find people growing a nervous system in a lab, and you'll notice it's an extremely plastic, constantly physically adapting system. You'll note that the very biochemical "signalling" you're talking about is itself involved in changing the physical structure of the system.
This physical structure does not encode all prior activations of the system, nor even a compression of them.
To see this, consider Plato's cave. Outside the cave, a variety of objects pass by and cast shadows on the wall. The objects themselves are not compressions of these shadows. Inside the cave, you can make one of these objects yourself: take clay from the floor and fashion a pot. This pot, like the one outside, is not a compression of its shadow.
All statistical algorithms which average over historical cases are compressions of shadows, and replay these shadows on command; i.e., they learn the distribution of shadows and sample from this distribution on demand.
Animals, and indeed all of science, are not concerned with shadows. We don't model patterns in the night sky -- that is astrology -- we model gravity: we build pots.
The physical structure of our bodies encodes both its own organisation and that of reality itself. It does so by sensorimotor modulation of organic processes of physical adaptation. If you like: our bodies are like clay, and reality fashions them into the right structure.
In any case, we haven't the time or space to convince you of this formally. Suffice it to say that it is a very widespread consensus that modelling conditional probabilities with generative models fails to model causality. You can read Judea Pearl on this if you want to understand more.
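A minimal sketch of the standard confounding point from that literature (the variables, numbers, and probabilities are my own toy construction): with a common cause Z, the conditional P(Y=1 | X=1) that a generative model fits from observational data differs from the interventional P(Y=1 | do(X=1)), so fitting conditionals gets interventions wrong.

```python
import random

random.seed(0)

def world(x_intervention=None):
    """One sample from a toy structural causal model with a confounder Z.
    Z causes both X and Y; X has no causal effect on Y at all."""
    z = random.random() < 0.5
    x = z if x_intervention is None else x_intervention  # do(X=x) overrides X's mechanism
    y = random.random() < (0.9 if z else 0.1)             # Y depends only on Z
    return x, y

N = 100_000
observational = [world() for _ in range(N)]
interventional = [world(x_intervention=True) for _ in range(N)]

# P(Y=1 | X=1) from observational data -- what a fitted generative model learns
p_obs = (sum(y for x, y in observational if x)
         / sum(1 for x, y in observational if x))
# P(Y=1 | do(X=1)) -- what actually happens when we set X ourselves
p_do = sum(y for _, y in interventional) / N

print(f"P(Y=1 | X=1)     ~ {p_obs:.2f}")   # ~ 0.9
print(f"P(Y=1 | do(X=1)) ~ {p_do:.2f}")   # ~ 0.5
```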
Perhaps more simply: a video game model of a pot can generate an infinite number of shadows under an infinite number of conditions, and no statistical algorithm with finite space and finite time requirements will ever fully model this video game. The video game does not store a compression of past frames -- since it has an actual physical model, it can create new frames from that model.
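To make the pot-versus-shadow-archive contrast concrete, a toy sketch (my own construction, not any particular game engine): a tiny parametric "renderer" computes the shadow length of a pot for any light angle from its geometry, while a finite archive of previously seen shadows can only replay angles it has already stored.

```python
import math

POT_HEIGHT = 0.3  # metres -- the "physical model": the pot's actual geometry

def render_shadow(sun_elevation_deg):
    """Generative in the physical sense: compute the shadow length for ANY
    sun elevation from the pot's geometry, including angles never seen before."""
    return POT_HEIGHT / math.tan(math.radians(sun_elevation_deg))

# A "shadow archive": a finite record of previously observed (angle, shadow) pairs.
archive = {angle: render_shadow(angle) for angle in (30, 45, 60)}

print(render_shadow(37.5))   # the physical model answers for a novel condition
print(archive.get(37.5))     # the archive has nothing to say: None
```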