
Chess2vec: Learning Vector Representations for Chess (2018) [pdf] - polm23
http://www.berkkapicioglu.com/wp-content/uploads/2018/11/chess2vec_nips_2018_short.pdf
======
thomasahle
I tried something similar, learning a chess representation and move predictor
with a simple linear fastText model:
[https://github.com/thomasahle/fastchess](https://github.com/thomasahle/fastchess)

It turns out that you can get surprisingly good accuracy (27.5%) at predicting
moves from the chess world championship, even though the model is linear and
has no idea of piece interactions.
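To illustrate what a linear bag-of-features move predictor looks like, here's a toy sketch in plain Python (not the actual fastchess code; the piece-square tokens and perceptron-style training loop are my own simplification):

```python
# Toy sketch of a fastText-style linear move predictor: a position is
# a bag of piece-square tokens, each candidate move has one weight per
# token, and the prediction is the move with the highest summed weight.
from collections import defaultdict

def tokens(board):
    """board: dict of square -> piece, e.g. {'e2': 'P'}."""
    return [f"{piece}_{sq}" for sq, piece in sorted(board.items())]

class LinearMovePredictor:
    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))
        self.moves = set()

    def score(self, board, move):
        return sum(self.weights[move][t] for t in tokens(board))

    def predict(self, board):
        # sorted() makes tie-breaking deterministic
        return max(sorted(self.moves), key=lambda m: self.score(board, m))

    def train(self, board, move, lr=1.0):
        # perceptron-style update: boost the played move's tokens,
        # penalize the current (wrong) best guess
        self.moves.add(move)
        guess = self.predict(board)
        if guess != move:
            for t in tokens(board):
                self.weights[move][t] += lr
                self.weights[guess][t] -= lr

model = LinearMovePredictor()
pos1, pos2 = {"e2": "P"}, {"d2": "P"}
for _ in range(3):
    model.train(pos1, "e2e4")
    model.train(pos2, "d2d4")
print(model.predict(pos1), model.predict(pos2))  # e2e4 d2d4
```

Even this toy version shows why such a model can't capture piece interactions: each token contributes to a move's score independently.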

~~~
solveit
I don't know much about chess, so this is a sincere question. What sort of
accuracy would be a baseline? So, on average, how many not-obviously-stupid
possible moves are there per turn (for some reasonable value of "obviously")?

~~~
thomasahle
I have thought about this too, but I don't think there's an easy answer.

To get the number of _good_ moves in a position, one might look at the search
distribution generated by Stockfish or lc0, or a human grandmaster. They'd
probably all consider at most 1-4 moves per position.
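For a rough lower bound, the average branching factor of chess is commonly cited as about 35 legal moves per position, so a uniform random guess is right only about 3% of the time (simple arithmetic, not a measured figure):

```python
# Rough baseline: with ~35 legal moves per position on average
# (the commonly cited branching factor of chess), a uniform
# random guess predicts the played move about 3% of the time.
avg_legal_moves = 35
random_baseline = 1 / avg_legal_moves
print(f"{random_baseline:.1%}")  # 2.9%
```

Against that baseline, 27.5% from a linear model looks much less trivial.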

However, reducing the search to just the top four moves is not something you
can do with a simple heuristic. It either takes a lot of engineering or
advanced pattern matching. If we could do it easily, making a strong chess
engine would be a walk in the park.

I've wondered whether the accuracy I achieve stems simply from learning a
reasonable opening book. If so, the model's accuracy should deteriorate in
the middle and endgame.

That doesn't seem to be the case, however. You can try playing the `raw`
linear model yourself by running

    play_chess.py -no-mcts

It plays surprisingly reasonable moves :)

------
scottlocklin
This is a weird thing. The PCA result, as expected, basically looks like the
moves of the major chess pieces. Also, I don't think they understand what
bitboards are for (move generation and evaluation in a fairly dense,
high-performance representation); this vector representation is just a serial
one-hot encoding. There are alternative vector encodings that actually make
more sense for a neural net or matrix representation of chess positions.
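The serial one-hot encoding in question is just 12 piece types times 64 squares, flattened into a 768-dimensional 0/1 vector. A minimal sketch (the piece and square orderings here are my own choice, not necessarily the paper's):

```python
# Serial one-hot encoding of a chess position:
# 12 piece types x 64 squares -> a 768-dimensional 0/1 vector.
PIECES = "PNBRQKpnbrqk"  # white pieces first, then black
FILES = "abcdefgh"

def square_index(sq):
    # 'a1' -> 0, 'h8' -> 63
    return (int(sq[1]) - 1) * 8 + FILES.index(sq[0])

def one_hot(board):
    """board: dict of square -> piece char, e.g. {'e2': 'P'}."""
    v = [0] * (12 * 64)
    for sq, piece in board.items():
        v[PIECES.index(piece) * 64 + square_index(sq)] = 1
    return v

v = one_hot({"e2": "P", "e8": "k"})
print(len(v), sum(v))  # 768 2
```

Note this throws away everything a bitboard engine actually exploits: castling rights, side to move, and the bit-parallel tricks used for move generation.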

I guess the authors have a company that does some kind of NNMF thing on
genetic data, which would explain their approach.

~~~
CrazyStat
My overall impression is that this was a quick and dirty summer intern
project. Nothing against quick and dirty projects but I'm not sure it really
belongs on HN.

------
CrazyStat
I've been working on a similar idea of learning a latent vector representation
of chess positions and moves. I'm not sure whether to be happy or disappointed
that this paper is much more simplistic than what I've been trying to do.

They call it "machine learning", but they're just doing PCA and nonnegative
matrix factorization, neither of which falls into the class of methods that
most people think about when you say "machine learning." Their proposed
prediction method will often (perhaps usually) predict illegal moves--it might
propose moving a piece that isn't even on the board, for example.
Understandably they don't show any examples of actually applying it.

~~~
Der_Einzige
Dimensionality reduction is absolutely a form of "machine learning". UMAP or
Ivis do the same thing but with fancier methods, so are they somehow more
relevant?

I'd be curious to see what happens when you combine the system with one that
removes the possibility of illegal moves. I bet accuracy would go up
dramatically, especially if they used a really modern transformer-based
architecture for the vectorization part.
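The legal-move filter is cheap to bolt on: zero out illegal moves in the model's output distribution and renormalize over what remains. A hedged sketch (the move list and scores are made up for illustration):

```python
# Masking illegal moves: keep only legal moves from the model's raw
# scores, then renormalize with a softmax over the survivors.
import math

def masked_softmax(scores, legal):
    """scores: dict of move -> raw score; legal: set of legal moves."""
    exp = {m: math.exp(s) for m, s in scores.items() if m in legal}
    z = sum(exp.values())
    return {m: e / z for m, e in exp.items()}

raw = {"e2e4": 2.0, "d2d4": 1.0, "e1e8": 3.0}  # e1e8 is illegal
probs = masked_softmax(raw, legal={"e2e4", "d2d4"})
best = max(probs, key=probs.get)
print(best)  # e2e4
```

Without the mask, the illegal `e1e8` would have won on raw score; with it, all probability mass goes to playable moves.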

~~~
CrazyStat
Dimensionality reduction more generally may well be a form of machine
learning. Doing really simple dimensionality reduction and then a really
stupid prediction method doesn't quite cut it as far as I'm concerned.

~~~
jmmcd
> doesn't quite cut it

I think the underlying assumption here is that some approaches are too simple
to "cut it", i.e. to be called machine learning. So e.g. PCA is too simple and
should be excluded, perhaps categorised as "just" statistics or linear algebra
depending on your point of view. But in my view, it's better to let go of this
view, which implicitly hypes ML in a damaging way. F = ma is still physics,
even if simple. k-NN and PCA are still ML.

~~~
CrazyStat
Eh, I disagree. Not from a normative point of view, but from a descriptivist
one. Most people don't consider doing a PCA to be machine learning.

I'm a statistician, not a machine learning guy. PCA is a great statistical
method. But doing a PCA and calling it machine learning seems a tad
misleading.

~~~
jmmcd
Well, I certainly agree that many people use ML in this way. But to me it's
just bad terminology. The thing that happens during ML -- the L part -- must
be the same as what happens during statistical learning. Many (sophisticated)
ML algorithms are statistical in nature, so there is nothing there that stops
simple stat-learning algorithms from being ML algorithms. And there is nothing
in the terminology that says ML has to be sophisticated relative to stat
learning.

------
sjg007
Chess vectors make sense as an approach.

