Interesting point. Nvidia has been improving integer performance for quantized inference on its GPUs a lot. It might be a lot of work, but could this NNUE approach be scaled up to the point where running it on a GPU would be worthwhile?

For single-input "batches" (which seems to be what's used now?) it might never be worthwhile, but if multiple positions could be searched in parallel and their NN evaluations batched, this might start to look tempting.
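
To make that concrete, the sketch below is roughly what I have in mind (everything in it is hypothetical: Position, gpu_evaluate_batch and the batch size are made-up names, not anything from Stockfish or Leela). Search threads hand their leaf positions to a shared batcher and block on a future, and the batcher flushes a full batch to the GPU in one call, so the per-call latency is amortized over many evaluations.

    // Hypothetical sketch: parallel search threads submit leaf positions,
    // and a full batch is evaluated on the GPU in one call.
    #include <cstddef>
    #include <future>
    #include <mutex>
    #include <vector>

    struct Position { /* board state (placeholder) */ };

    // Stand-in for real batched GPU inference; returns dummy scores here.
    std::vector<float> gpu_evaluate_batch(const std::vector<Position>& batch) {
        return std::vector<float>(batch.size(), 0.0f);
    }

    class EvalBatcher {
    public:
        // Called by a search thread at a leaf; blocks until the batch
        // containing this position has been evaluated.
        float evaluate(const Position& pos) {
            std::promise<float> promise;
            std::future<float> result = promise.get_future();
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pending_.push_back(pos);
                promises_.push_back(std::move(promise));
                if (pending_.size() >= kBatchSize) flush_locked();
            }
            return result.get();
        }

    private:
        static constexpr std::size_t kBatchSize = 256;

        // Must be called with mutex_ held.
        void flush_locked() {
            std::vector<float> scores = gpu_evaluate_batch(pending_);
            for (std::size_t i = 0; i < scores.size(); ++i)
                promises_[i].set_value(scores[i]);
            pending_.clear();
            promises_.clear();
        }

        std::mutex mutex_;
        std::vector<Position> pending_;
        std::vector<std::promise<float>> promises_;
    };

A real version would also have to flush partial batches on a timeout (otherwise the last few threads wait forever), and it only pays off if the search can actually keep enough positions in flight at once.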

I'm not sure what the effect of running PVS with multiple parallel search threads is. Presumably the cost of each thread searching with less up-to-date information means you hit the scaling ceiling much sooner than with MCTS-like searches, since the pruning is far more sensitive to having current information about the principal variation.

Disclaimer: My understanding of PVS is very limited.




Sure, if someone can come up with an approach to run an NNUE (efficiently updatable) network on GPUs, that might really be another breakthrough. But at first glance it looks to me like this could be very difficult, because AFAIK the SF search is relatively complicated. Even for Leela, implementing an efficient batched search on multiple GPUs seems to be difficult (some improvements are coming with Ceres), and Leela uses a much simpler MCTS search. That doesn't mean Leela's search could not be improved: it does not necessarily give higher priority to forced sequences of moves (at least not explicitly) or to high-risk moves, which is IMHO why she sometimes misses relatively simple tactics.
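
For anyone wondering what the "efficiently updatable" part buys you, here is a rough sketch of the first-layer trick (sizes and names are made up for illustration, this is not Stockfish's actual code): only a handful of sparse input features change per move, so the first layer's output is patched incrementally instead of being recomputed, which is great for a CPU walking a search tree one position at a time and awkward to turn into the big batched matrix multiplies GPUs like.

    // Illustrative sketch of NNUE's incremental accumulator update.
    #include <cstdint>
    #include <vector>

    constexpr int kNumFeatures = 41024;    // e.g. a HalfKP-sized feature space (illustrative)
    constexpr int kAccumulatorSize = 256;  // illustrative first-layer width

    // First-layer weights: one int16 column per sparse, binary board feature.
    // In a real net these are trained; here they only exist for illustration.
    int16_t first_layer_weights[kNumFeatures][kAccumulatorSize];

    struct Accumulator {
        int16_t values[kAccumulatorSize];
    };

    // After a move, only a few features flip (pieces appearing/disappearing on
    // squares), so instead of redoing the full first-layer matrix multiply we
    // just add/subtract a few weight columns. Cheap with int16 SIMD on a CPU,
    // but inherently a per-position, make/unmake-style update.
    void update_accumulator(Accumulator& acc,
                            const std::vector<int>& added_features,
                            const std::vector<int>& removed_features) {
        for (int f : added_features)
            for (int i = 0; i < kAccumulatorSize; ++i)
                acc.values[i] += first_layer_weights[f][i];
        for (int f : removed_features)
            for (int i = 0; i < kAccumulatorSize; ++i)
                acc.values[i] -= first_layer_weights[f][i];
    }

As far as I understand, the layers after the accumulator are tiny, so on a CPU the whole evaluation stays in the small-integer SIMD sweet spot; the incremental update tied to the search's make/unmake is exactly the part that is hard to batch on a GPU.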


I guess the simplest approach to porting NNUE to GPUs would be to run a complete instance per GPU thread (i.e. concurrent, not parallel, evaluation).
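
Roughly like this toy CUDA kernel, I suppose (layer sizes and the plain float math are made up for illustration; the real net is quantized and shaped differently). Each thread evaluates its own position end to end, so nothing is shared or batched across threads.

    // Toy kernel: one complete (tiny) network evaluation per GPU thread.
    #include <cuda_runtime.h>

    constexpr int kIn = 256;     // illustrative input size
    constexpr int kHidden = 32;  // illustrative hidden size

    __global__ void eval_per_thread(const float* inputs,  // [num_positions * kIn]
                                    const float* w1,      // [kHidden * kIn]
                                    const float* b1,      // [kHidden]
                                    const float* w2,      // [kHidden]
                                    float b2,
                                    float* scores,        // [num_positions]
                                    int num_positions) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= num_positions) return;

        const float* x = inputs + idx * kIn;
        float hidden[kHidden];

        // Each thread runs the whole network by itself: easy to port, but the
        // GPU's wide matrix hardware is left mostly idle.
        for (int j = 0; j < kHidden; ++j) {
            float acc = b1[j];
            for (int i = 0; i < kIn; ++i) acc += w1[j * kIn + i] * x[i];
            hidden[j] = acc > 0.0f ? acc : 0.0f;  // ReLU
        }
        float out = b2;
        for (int j = 0; j < kHidden; ++j) out += w2[j] * hidden[j];
        scores[idx] = out;
    }

    // Host-side launch would look like:
    //   eval_per_thread<<<(n + 255) / 256, 256>>>(d_in, d_w1, d_b1, d_w2, b2, d_out, n);

Simple to write, but every thread does scalar work and the efficient-update trick is lost, so it's not obvious this would beat a well-vectorized CPU implementation.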



