This paper reminds me of the Neural Network Diffusion paper, which was on the front page of HN yesterday, in the sense that both train a second model to bypass a number of iterative steps: in that paper, the steps being skipped were SGD steps; in this one, they are A* exploration steps.

On a different note, they chose such a bad heuristic for A* on Sokoban. The heuristic they use is: "A∗ first matches every box to the closest dock and then computes the sum of all Manhattan distances between each box and dock pair". I played Sokoban for 20 minutes while reading the paper, and I feel this is a very poor exploration heuristic: you often need to move boxes away from their goal positions to make progress.
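To make the complaint concrete, here is a minimal sketch of the heuristic as I read the quoted description. It is not the paper's code: the function names, the (row, col) coordinates, and the toy board are my own assumptions, and walls, the player's position, and deadlocks are ignored entirely (needs Python 3.9+). The example shows that pushing a box away from the docks, which is often a necessary move, makes the heuristic value go up, so A* will deprioritize exactly those states.

```python
# Hypothetical sketch of the quoted heuristic (not the paper's implementation).
# Positions are (row, col) grid cells; walls, the player and deadlocks are ignored.
from typing import Sequence

Pos = tuple[int, int]


def manhattan(a: Pos, b: Pos) -> int:
    """Grid distance |dr| + |dc| between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_dock_heuristic(boxes: Sequence[Pos], docks: Sequence[Pos]) -> int:
    """Match each box to its nearest dock (docks may be shared) and sum the distances."""
    return sum(min(manhattan(box, dock) for dock in docks) for box in boxes)


if __name__ == "__main__":
    docks = [(1, 3), (3, 3)]
    before = [(1, 1), (3, 4)]  # h = 2 + 1 = 3
    after = [(2, 1), (3, 4)]   # first box pushed down, away from both docks: h = 3 + 1 = 4
    print(closest_dock_heuristic(before, docks))  # 3
    print(closest_dock_heuristic(after, docks))   # 4
```

Note that because each box simply takes its nearest dock, two boxes can be matched to the same dock. A common tighter variant in Sokoban solvers uses a minimum-cost one-to-one box/dock matching instead, though that still would not reward temporarily moving a box away from the goal.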