
Any NN "trained on" data sampled from an abstract complete outcome space (eg., a game with formal rules; mathematical sequences, etc) can often represent that space completely. It comes down to whether you can form conditional probability models of the rules, and that's usually possible because that's what abstract rules are.

> I have one kjsdhlisrnj and I add another kjsdhlisrnj, tell me how many kjsdhlisrnj I now have.

1. P(number-word|tell me how many...) > P(other-kinds-of-words|tell me how many...)

2. P(two|I have one ... I add another ...) > P(one|...) > P(three|...) > others

This is trivial.
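
A minimal sketch of what "trivial" means here, assuming a made-up toy corpus (everything below is illustrative):

    # Toy sketch: estimate P(next-word | "I now have") by counting.
    # The corpus is made up; real training data would be vastly larger.
    from collections import Counter

    corpus = [
        "I have one X and I add another X, I now have two X",
        "I have one Y and I add another Y, I now have two Y",
        "I have one Z and I add another Z, I now have three Z",
    ]

    counts = Counter(s.split("I now have ")[1].split()[0] for s in corpus)
    total = sum(counts.values())
    print({w: c / total for w, c in counts.items()})
    # {'two': 0.666..., 'three': 0.333...} -> P(two|...) > P(three|...),
    # with no understanding of counting required, just the pattern.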



Right, learning more abstract rules about how things work is the goal and where the value comes in. Not all algorithms are able to do this, even if they can do what you describe in your first comment.

That's why they're interesting. Othellogpt is interesting because it builds a world model.


It builds a model of a "world" whose structure is conditional probabilities, which is circular. It's like saying you can use a lego model to build a model of another lego model. All the papers which "show" NNs building "world" models aren't using any world. It's lego modelling lego.

The lack of a world model only matters when the data NNs are trained on aren't valid measures of the world that data is taken to model. All the moves of a chess game are a complete model of chess. All the books ever written aren't a model of, well, anything -- the structure of the universe isn't the structure of text tokens.

The only reason all statistical algorithms, including NNs, appear to model the actual world is because patterns in data give this appearance: P(The Sun is Hot) > P(The Sun is Cold) -- there is no model of the sun here.

The reason P("The Sun is Hot") seems to model the sun, is because we can read the english words "sun" and "hot" -- it is we who think the machine which generates this text does so semantically.. but the people who wrote that phrase in the dataset did so; the machine is just generating "hot" because of that dataset.


Othellogpt is fed only moves and builds a model of the current board state in its activations. It never sees a board.

> It's like saying you can use a lego model to build a model of another lego model.

No, it's like using a description of piece placements and having a picture in mind of what the current model looks like.
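
For what it's worth, the usual way this claim is tested is with a probe on the activations. A minimal sketch, assuming activations and true board labels have already been extracted (file names and shapes here are hypothetical, simplified from the OthelloGPT papers):

    # Sketch of an activation probe: can the contents of one board
    # square be decoded from the hidden state after each move?
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    acts = np.load("activations.npy")         # (n_moves, d_model), hypothetical
    labels = np.load("square_d4_labels.npy")  # (n_moves,) 0=empty, 1=mine, 2=theirs

    probe = LogisticRegression(max_iter=1000).fit(acts, labels)
    print(probe.score(acts, labels))  # properly you'd score a held-out set
    # High accuracy suggests the board state is (linearly) decodable
    # from activations, even though the model was never shown a board.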


The "board" is abstract. Any game of this sort is defined by a series of conditional probabilities:

{P(Pawn_on_square_blah | previous_moves), ... etc.}

What all statistical learning algorithms model is sets of conditional probabilities. So any stat alg is a model of a set of these rules... that's the "clay" of these models.
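
Concretely, for a deterministic game those conditional probabilities are all 1 or 0, since the move history fixes the state. A toy sketch (placement-only; not real Othello or chess rules):

    # Replaying the moves reconstructs the unique consistent board,
    # so P(state | moves) = 1 for that board and 0 for all others.
    def replay(moves, size=8):
        board = [[None] * size for _ in range(size)]
        for player, (row, col) in moves:
            board[row][col] = player  # toy rule: just place the piece
        return board

    state = replay([("B", (3, 4)), ("W", (3, 5)), ("B", (4, 5))])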

The problem is the physical world isn't anything like this. The reason I say "I liked that TV show" is because I had a series of mental states caused by the TV show over time (and so on). This isn't representable as a set of conditional probabilities in the same way.

You could imagine, at the end of history, there being a total set of all possible conditional probabilities: P(I liked show|my_mental_states, time, person, location, etc.) -- this would be uncomputable, but it could be supposed.

If you had that dataset then yes, NNs would learn the entire structure of the world, because that's the dataset. The problem is that the world cannot be represented in this fashion, not that NNs couldn't model it if it could be. A decision tree could, too.

P(I liked the TV show) doesn't follow from any dataset ever collected. It follows from my mental states. So no NN can ever model it. They can model frequency associations of these phrases in historical text documents: that isn't a model of the world.


> Any game of this sort is defined by a series of conditional probabilities: {P(Pawn_on_square_blah | previous_moves), ... etc.}

That would always be 1 or 0, but also that data is not fed into othellogpt. That is not the dataset. It is not fed board states at all.

It learns it, but it is not the dataset.


It is the dataset. When you're dealing with abstract objects (i.e., mathematical spaces), all equivalent representations are isomorphic.

It doesn't matter if you "feed in" 1+1+1+1 or 2+2 or sqrt(16).

The rules of chess are encoded either by explicit rules or by contrast classes of valid/invalid games. These are equivalent formulations.

When you're dealing with text tokens, it does matter if "Hot" frequently follows "The Sun is...", because reality isn't an abstract space, and text tokens aren't measures of it.


> It is the dataset.

No. A series of moves alone provides strictly less information than a board state or state + list of rules.


If the NN learns the game, that is itself an existence proof of the opposite (by obvious information-theoretic arguments).

Training is supervised, so you don't need bare sets of moves to encode the rules; you just need a way of subsetting the space into contrast classes of valid/invalid.

It's a lie to say the "data" is the moves; the data is the full outcome space: ({legal moves}, {illegal moves}), where the moves are indexed by the board structure (necessarily, since moves are defined by the board structure -- it's an abstract game). So there are two deceptions here: (1) supervision structures the training space; and (2) the individual training rows have sequential structure which maps to board structure.

Complete information about the game is provided to the NN.
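
To make the contrast-class point concrete: next-token cross-entropy normalizes over the whole move vocabulary, so every move that doesn't occur after a prefix is an implicit negative. A schematic sketch (not the actual othellogpt training code; shapes and numbers are illustrative):

    # Cross-entropy over the full move vocabulary: raising the
    # probability of the observed (legal) move necessarily lowers
    # the probability of the other 59 via the softmax normalization.
    import torch
    import torch.nn.functional as F

    vocab_size = 60                      # e.g. the 60 playable Othello squares
    logits = torch.randn(1, vocab_size)  # model's scores for the next move
    target = torch.tensor([17])          # the move actually played

    loss = F.cross_entropy(logits, target)
    # Minimizing this builds a legal/illegal contrast into training,
    # even though illegal moves never appear in the data.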

But let's be clear: othellogpt still generates illegal moves -- showing that it does not learn the binary conditional structure of the actual game.

The deceptiveness of training a NN on a game whose rules are conditional probability structures and then claiming the very-good-quality conditional probability structures it finds are "World Models" is... maddening.

This is all just fraud to me; frauds dressing up other frauds in transparent clothing. LLMs trained on the internet are being sold as approximating the actual world, not 8x8 board games. I have nothing polite to say about any of this.


> It's a lie to say the "data" is the moves, the data is the full outcome space: ({legal moves}, {illegal moves})

There is nothing about illegal moves provided to othellogpt as far as I'm aware.

> Complete information about the game is provided to the NN.

That is not true. Where is the information that there are two players provided? Or that there are two colours? Or how the colours change? Where is the information about invalid moves provided?

> But let's be clear, the othellogpt still generates illegal moves -- showing that it does not learn the binary conditional structure of the actual game.

Not perfectly, no. But that's not at all required for my point, though it is relevant if you try to use the fact that it learns to play the game as proof that moves provide all information about legal board states.


How do you think the moves are represented?

All abstract games of this sort are just sequences of bit patterns, each pattern related to the full legal space by a conditional probability structure (or, equivalently, as set ratios).

Strip away all the NN b/s and anthropomorphic language and just represent it to yourself using bit sets.

Then ask: how hard is it to approximate the space from which these bit sets are drawn using arbitrarily deep conditional probability structures?

It's trivial.
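
A toy sketch of that framing (all data made up): each move is a short bit pattern, and "learning the game" amounts to approximating a table of conditional frequencies over prefixes.

    # Moves as bit patterns; the conditional structure estimated by
    # counting next-move frequencies after each observed prefix.
    from collections import Counter, defaultdict

    games = [[5, 12, 19], [5, 12, 44], [5, 30, 19]]  # fake move sequences

    next_counts = defaultdict(Counter)
    for g in games:
        for i in range(1, len(g)):
            next_counts[tuple(g[:i])][g[i]] += 1

    prefix = (5, 12)
    total = sum(next_counts[prefix].values())
    for move, c in next_counts[prefix].items():
        print(f"P({move:06b} | prefix) = {c / total}")  # 6-bit move codes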

The problem the author sets up, about causal structures in the world, cannot be represented as a finite sample of bit-set sequences -- and even if it could be, that isn't the data being used.

The author hasn't understood the basics of what the 'world model' problem even is.



