Create a random board configuration. Exhaustively explore the search space to determine whether it is a win (+1), a loss (-1), or a draw (0) for Player 1. You now have a training example: (board configuration, game outcome).
Now, train a neural network (or other non-linear ML model, e.g. SVM) to predict the expected outcome based upon the board configuration.
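The two steps above can be sketched end-to-end. Since the actual game and board size aren't specified, this uses 3x3 tic-tac-toe as a stand-in (small enough to solve exhaustively); the helper names (`solve`, `random_position`) are illustrative, not from any library:

```python
import random
from functools import lru_cache

# 3x3 tic-tac-toe stands in for the real game: small enough to solve
# exhaustively, same (board, exact outcome) labeling idea.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """+1 / -1 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

@lru_cache(maxsize=None)
def solve(board, player):
    """Exact game value from the perspective of `player` (the side to move)."""
    w = winner(board)
    if w != 0:
        return w * player        # some side has already won
    if 0 not in board:
        return 0                 # full board, no winner: draw
    best = -1
    for i, cell in enumerate(board):
        if cell == 0:
            child = board[:i] + (player,) + board[i + 1:]
            best = max(best, -solve(child, -player))
            if best == 1:
                break            # can't do better than a win
    return best

def random_position(rng, plies):
    """Play up to `plies` random legal moves from the empty board."""
    board, player = (0,) * 9, 1
    for _ in range(plies):
        moves = [i for i, c in enumerate(board) if c == 0]
        if not moves or winner(board) != 0:
            break
        i = rng.choice(moves)
        board = board[:i] + (player,) + board[i + 1:]
        player = -player
    return board, player

# Build labeled examples: (board, exact outcome for Player 1).
rng = random.Random(0)
dataset = []
for _ in range(100):
    board, player = random_position(rng, rng.randint(0, 6))
    outcome = solve(board, player) * player   # re-sign to Player 1's view
    dataset.append((board, outcome))
```

The `dataset` pairs are then what you would feed to the network (or SVM) as (input, target).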
Given that the search space can grow as O(4^mn), this can be done only for endgame configurations. Further, not knowing any bounds on the board size makes the input representation difficult to define for such a machine learning approach. Also, your target should probably be the weights of an evaluation function, rather than the exact game outcome.
As for the learning algorithm, I know TD-learning was found to be a good approach in various chess programs.
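To make the TD idea concrete, here is a minimal TD(0) sketch (the simplest member of the TD family; chess programs typically use variants like TD(λ)) for a linear evaluation function over hand-crafted features. All names here are invented for the illustration:

```python
# Illustrative TD(0) update for a linear evaluation function
# V(s) = weights · features(s).

def td0_update(weights, feats_s, feats_next, reward, done, alpha=0.01):
    """Nudge V(s) toward the bootstrapped target reward + V(s')."""
    v_s = sum(w * f for w, f in zip(weights, feats_s))
    v_next = 0.0 if done else sum(w * f for w, f in zip(weights, feats_next))
    delta = (reward + v_next) - v_s              # TD error
    return [w + alpha * delta * f for w, f in zip(weights, feats_s)]

# Toy check: repeatedly observing a terminal transition with reward +1
# drives the state's value estimate toward +1.
weights = [0.0, 0.0]
phi = [1.0, 0.0]                                 # features of the state
for _ in range(1000):
    weights = td0_update(weights, phi, phi, reward=1.0, done=True)
```

The same delta-times-gradient update carries over to a neural network evaluation, with the feature vector replaced by the gradient of the network's output.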
> For each possible move, use the neural network to predict the outcome of that move. Pick the move with highest expected outcome.
You would likely still want to run an alpha-beta search on top of the learned evaluation, rather than picking the move from a one-ply prediction, to reduce the impact of prediction error.
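A sketch of that combination: a fixed-depth alpha-beta search that calls the learned evaluator only at the leaves. `legal_moves`, `apply_move`, and `evaluate` are hypothetical hooks you would supply for your game and model:

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing,
              legal_moves, apply_move, evaluate):
    """Fixed-depth alpha-beta; leaf values come from the learned evaluator."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)       # predicted outcome for Player 1
    if maximizing:
        value = -math.inf
        for m in moves:
            value = max(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, False,
                                         legal_moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                # beta cutoff
        return value
    else:
        value = math.inf
        for m in moves:
            value = min(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, True,
                                         legal_moves, apply_move, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break                # alpha cutoff
        return value

# Toy check: a depth-2 binary game tree with fixed leaf values.
LEAF = {(0, 0): 3, (0, 1): 5, (1, 0): 2, (1, 1): 9}
toy_moves = lambda s: [0, 1] if len(s) < 2 else []
toy_apply = lambda s, m: s + (m,)
toy_eval = lambda s: LEAF[s]
best = alphabeta((), 2, -math.inf, math.inf, True,
                 toy_moves, toy_apply, toy_eval)
```

At the root you would pick the move whose subtree yields the best backed-up value, rather than the move with the highest raw network prediction.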