The comparison against MCTS shows strong performance from AlphaZero. I'd be curious to see AlphaZero's performance as a function of the number of its own rollouts, i.e. does the probability head's output alone already encode enough information to play well, and how deep does it have to look ahead to find good play? Finally, for tic-tac-mo and connect 3x3, it should be possible to determine the optimal move. How much training / lookahead is required to achieve that? Does AlphaZero achieve perfect play for these games?
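To make "determine the optimal move" concrete, here is a minimal sketch of exhaustive game-tree search on ordinary two-player 3x3 tic-tac-toe, used only as a stand-in. It is not the paper's code, and the games in the paper may need a multiplayer generalization rather than the plain negamax shown here. The names `LINES`, `winner`, and `solve` are hypothetical, chosen for this example.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that mark fills a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board, player):
    """Negamax over the full game tree (assumed two-player, zero-sum).

    `board` is a 9-character string of 'X', 'O', '.'; `player` is to move.
    Returns (value, best_move) from `player`'s point of view:
    +1 forced win, 0 draw, -1 forced loss.
    """
    if winner(board) is not None:
        # The previous mover completed a line, so the player to move has lost.
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == "."]
    if not moves:
        return 0, None  # full board, no line: draw
    other = "O" if player == "X" else "X"
    best_value, best_move = -2, None
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        value = -solve(child, other)[0]  # opponent's value, negated
        if value > best_value:
            best_value, best_move = value, m
    return best_value, best_move
```

A table like this would give a ground truth to score a trained network against: for each reachable position, check whether the move picked from the policy head alone (or after k rollouts) has the same game-theoretic value as the solver's best move.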
The paper's first listed contribution is "an independent reimplementation of DeepMind's AlphaZero algorithm". Maybe I missed it, but I don't see a link to a repo with the implementation.
"Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware."
The bigger difference in equity trading is that it's a "game" of hidden information with asymmetric and possibly unknown payoffs. Of course, computers are pretty good at trading stocks, too, but I can confidently say that AlphaZero won't be the most useful approach.