Hacker News
Chess2vec: Learning Vector Representations for Chess (2018) [pdf] (berkkapicioglu.com)
55 points by polm23 65 days ago | 14 comments

I tried something similar, learning a chess representation and move predictor with a simple linear fastText model: https://github.com/thomasahle/fastchess

It turns out that you can get surprisingly good accuracy (27.5%) at predicting moves from the chess world championship, even though the model is linear and has no idea of piece interactions.
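For anyone curious what "linear model with no idea of piece interactions" means in practice, here's a minimal sketch of the fastchess-style setup: each position becomes a bag of (piece, square) tokens, which a linear classifier can then score against candidate moves. The function name and details are illustrative, not fastchess's actual API.

```python
PIECES = "PNBRQKpnbrqk"  # white then black, FEN-style letters

def encode_position(fen):
    """Turn the board part of a FEN string into (piece, square) tokens."""
    board = fen.split()[0]
    tokens, square = [], 56  # FEN lists ranks top-down, starting at a8 = square 56
    for ch in board:
        if ch == "/":
            square -= 16          # drop to the start of the next rank down
        elif ch.isdigit():
            square += int(ch)     # skip empty squares
        else:
            tokens.append(f"{ch}{square}")
            square += 1
    return tokens

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(len(encode_position(start)))  # 32: every piece on the starting board
```

Each token is an independent feature, so the model literally cannot represent "knight attacks queen"; it can only learn per-piece, per-square move tendencies.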

I don't know much about chess, so this is a sincere question. What sort of accuracy would be a baseline? So, on average, how many not-obviously-stupid possible moves are there per turn (for some reasonable value of "obviously")?

I have thought about this too, but I don't think there's an easy answer.

To get the number of _good_ moves in a position, one might look at the search distribution generated by Stockfish or lc0, or by a human grandmaster. They'd probably all consider at most 1-4 moves per position.

However, reducing the search to just the top four moves is not something you can do with a simple heuristic. It either takes a lot of engineering or advanced pattern matching. If we could do it easily, making a strong chess engine would be a walk in the park.
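As a rough baseline for the accuracy question: chess's often-quoted average branching factor is about 35 legal moves per position (an assumed figure, not something measured in the paper or thread), so guessing uniformly at random looks like this:

```python
# Back-of-the-envelope baseline, assuming ~35 legal moves per position
# on average (a commonly cited figure, not data from this paper).
branching = 35
random_acc = 1 / branching
print(f"uniform-random accuracy = {random_acc:.1%}")          # about 2.9%
print(f"the linear model's 27.5% is ~{0.275 / random_acc:.0f}x better")
```

So even with no notion of piece interactions, 27.5% is roughly an order of magnitude above picking a legal move at random.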

I've wondered if the accuracy I achieve stems simply from learning a reasonable opening book. If so, the accuracy of the model should deteriorate in the middlegame and endgame.

That doesn't seem to be the case, however. You can try playing the `raw` linear model yourself by running

  play_chess.py -no-mcts
It plays surprisingly reasonable moves :)

This is a weird paper. The PCA result is, as expected, basically what the moves of the major pieces look like. Also, I don't think the authors understand what bitboards are for (move generation and evaluation in a fairly dense, high-performance representation); their vector representation is just a serial one-hot encoding. There are alternative vector encodings that actually make more sense for a neural net or matrix representation of chess positions.
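To make the distinction concrete, here's a generic bitboard illustration (not the paper's encoding): one 64-bit integer per piece set, so whole-board questions become single bitwise instructions instead of loops over squares. The constants below are hand-picked examples.

```python
# One 64-bit mask per piece set, square 0 = a1, square 63 = h8.
WHITE_PAWNS = 0x000000000000FF00   # all of rank 2
WHITE_KING  = 0x0000000000000010   # e1
E_FILE      = 0x1010101010101010   # e1, e2, ..., e8

# "Any white pawn on the e-file?" is one AND, not a scan of 64 squares:
print(bool(WHITE_PAWNS & E_FILE))  # True: the e2 pawn

# Single pawn pushes for white are one shift of the entire set at once:
pushes = (WHITE_PAWNS << 8) & 0xFFFFFFFFFFFFFFFF
print(bin(pushes).count("1"))      # 8 push targets, computed in one operation
```

That set-at-a-time parallelism is the point of bitboards; a flat one-hot vector per square gives you none of it.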

I guess the authors have a company that does some kind of NNMF thing on genetic data, which would explain their approach.

My overall impression is that this was a quick and dirty summer intern project. Nothing against quick and dirty projects but I'm not sure it really belongs on HN.

The 2664-dimensional encoding is actually what AlphaZero used. (It just duplicated the whole stack a few times to include history.)

It's true that chess programmers usually use a different set of bitboards.

I've been working on a similar idea of learning a latent vector representation of chess positions and moves. I'm not sure whether to be happy or disappointed that this paper is much more simplistic than what I've been trying to do.

They call it "machine learning", but they're just doing PCA and nonnegative matrix factorization, neither of which falls into the class of methods most people think of when you say "machine learning". Their proposed prediction method will often (perhaps usually) predict illegal moves; it might propose moving a piece that isn't even on the board, for example. Understandably, they don't show any examples of actually applying it.
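The illegal-move point is easy to see with a toy example. A low-rank PCA reconstruction of one-hot position vectors is a dense real-valued vector, not a legal board, so anything decoded from it can put mass on impossible states. This sketch uses random binary "boards" purely for illustration, not the paper's data:

```python
import numpy as np

# Minimal PCA-by-SVD on fake one-hot-style position vectors.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 64)).astype(float)  # 100 fake binary boards
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
X_hat = X.mean(axis=0) + (U[:, :k] * S[:k]) @ Vt[:k]  # rank-5 reconstruction

print(np.unique(X))                          # [0. 1.]: real boards are binary
print(np.isin(X_hat[0], [0.0, 1.0]).all())   # False: the reconstruction isn't
```

Turning `X_hat` back into a board requires rounding or constrained decoding, and nothing in the factorization guarantees the result is reachable, or even physically possible.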

Dimensionality reduction is absolutely a form of "machine learning". UMAP or Ivis do the same thing but with fancier methods, so now they must be more relevant?

I'd be curious to see what happens when you combine the system with one that removes the possibility of illegal moves. I bet accuracy would go up dramatically, especially if they used a really modern transformer-based architecture for the vectorization part.
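The "remove illegal moves" step is cheap to sketch: whatever the model scores, keep only the legal moves and renormalize. The move list here is a hand-picked stand-in for what a real move generator (e.g. python-chess's `Board.legal_moves`) would produce:

```python
def mask_to_legal(scores, legal):
    """Drop illegal moves and renormalize the remaining scores to sum to 1."""
    kept = {move: s for move, s in scores.items() if move in legal}
    total = sum(kept.values())
    return {move: s / total for move, s in kept.items()}

# Model output with half its mass on a nonsense move (e1e8):
scores = {"e2e4": 0.30, "d2d4": 0.20, "e1e8": 0.50}
legal = {"e2e4", "d2d4", "g1f3"}
print(mask_to_legal(scores, legal))  # {'e2e4': 0.6, 'd2d4': 0.4}
```

Top-1 accuracy can only improve under this masking, since an illegal top prediction always counts as a miss.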

Dimensionality reduction more generally may well be a form of machine learning. Doing really simple dimensionality reduction and then a really stupid prediction method doesn't quite cut it as far as I'm concerned.

> doesn't quite cut it

I think the underlying assumption here is that some approaches are too simple to "cut it", i.e. to be called machine learning. So, e.g., PCA is too simple and should be excluded, perhaaps categorised as "just" statistics or linear algebra, depending on your point of view. But I think it's better to let go of this assumption, which implicitly hypes ML in a damaging way. F = ma is still physics, even if simple. k-NN and PCA are still ML.

Eh, I disagree. Not from a normative point of view, but from a descriptivist one. Most people don't consider doing a PCA to be machine learning.

I'm a statistician, not a machine learning guy. PCA is a great statistical method. But doing a PCA and calling it machine learning seems a tad misleading.

Well, I certainly agree that many people use ML in this way. But to me it's just bad terminology. The thing that happens during ML (the L part) must be the same as what happens during statistical learning. Many sophisticated ML algorithms are statistical in nature, so there is nothing there that stops simple statistical-learning algorithms from being ML algorithms. And there is nothing in the terminology that says ML has to be sophisticated relative to statistical learning.

Well, word2vec has been reduced to matrix factorization with GloVe (the reformulation into linear algebra achieved similar results an order of magnitude faster), so it does make sense to try that first.
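A toy version of that observation: build a word co-occurrence matrix and take a truncated SVD; the left factors serve as embeddings. (GloVe actually fits a weighted log-count factorization; plain SVD is the simplest stand-in, used here only to illustrate the idea.)

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts with a window of 1.
C = np.zeros((len(vocab), len(vocab)))
for a, b in zip(corpus, corpus[1:]):
    C[idx[a], idx[b]] += 1
    C[idx[b], idx[a]] += 1

U, S, Vt = np.linalg.svd(C)
emb = U[:, :2] * S[:2]  # 2-dimensional "embeddings"

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# 'cat' and 'dog' appear in identical contexts, so they land together:
print(cos(emb[idx["cat"]], emb[idx["dog"]]))
```

The same recipe transfers to chess: count which moves follow which position features, factorize, and the factors are your position/move vectors.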

Chess vectors make sense as an approach.
