After AlphaGo, I tasked myself with creating a neural network that would use Q-Learning to play Reversi (aka Othello).
At that point, I had already used Q-Learning (the tabular version, not using a neural network) for some very simple, mostly proof-of-concept projects, so I understood how it worked. I read up on perceptrons, ReLU, the benefits and drawbacks of having more or fewer layers, etc.
Then I actually started on the project, thinking "I know about Q-Learning, I know about neural networks, now I just need to use Keras and I'll have a network ready to learn in about twenty lines of Python."
Boy, was that naive. No matter how well you understand the CONCEPTS behind neural networks, actually putting together an effective one that fits the problem is so, so difficult (especially if there are no examples to build off of). How many layers? Dropout or no, and if so, how much? Do you flatten this layer? Do you use ReLU? Do you need a SECOND neural network, so that one approximates one part of the Q-function and the other approximates a different part?
I spent MONTHS messing with the hyperparameters and got nowhere, because I'm doing this on a desktop PC without a CUDA-capable GPU, so it takes days to train each new configuration, only to find out it has hardly "learned" anything.
At one point, after days of training, my agent actually LOST 90% of its games against an opponent that played completely at random. To this day I am baffled by this.
I went into the project thinking "I have this working with a table, the q-learning part is in place -- just need to drop in a neural net in place of the table and I'm good to go!" It's been almost a year and I still haven't figured this thing out.
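For anyone wondering why "just drop in a neural net in place of the table" is harder than it sounds, here is a minimal sketch (plain Python, no Keras) contrasting the two updates. The constants ALPHA and GAMMA and the feature function phi are assumptions for illustration, not from the post, and a linear approximator stands in for the neural network. The tabular update overwrites one cell exactly; the approximate version can only nudge shared weights toward a target that itself depends on those same weights. Combining function approximation, bootstrapping, and off-policy learning (which Q-learning is) is known to be prone to instability, which is one reason the swap rarely "just works".

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value, not from the post)
GAMMA = 0.99  # discount factor (assumed value, not from the post)

# Tabular Q-learning: one stored value per (state, action) pair.
q_table = defaultdict(float)

def tabular_update(state, action, reward, next_state, next_actions, done):
    """One-step Q-learning update on a lookup table.

    next_actions must be non-empty unless done is True.
    """
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in next_actions)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])

def q_approx(weights, features):
    """Linear approximator: Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def semi_gradient_update(weights, phi, state, action, reward,
                         next_state, next_actions, done):
    """Semi-gradient Q-learning with a linear approximator.

    phi(state, action) is a hypothetical feature function returning a list
    of floats. Returns a new weight list.
    """
    feats = phi(state, action)
    best_next = 0.0 if done else max(
        q_approx(weights, phi(next_state, a)) for a in next_actions)
    target = reward + GAMMA * best_next
    error = target - q_approx(weights, feats)
    # For a linear Q, the gradient with respect to the weights is phi(s, a).
    # The target is treated as a constant, hence "semi-gradient".
    return [w + ALPHA * error * f for w, f in zip(weights, feats)]
```

Note that every weight update changes Q for every state sharing those features, including the next state used in the target. A real Reversi agent would also need a far richer phi (or a deep network) than this sketch assumes.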
Nice! You should have just added a bit at the end to invert whatever answer it got, and you would have had a winner.
But more seriously, I think we will get more clever at designing genetic algorithms that evolve the neural networks as part of the training process, rather than trying to build our own from scratch every time. I vaguely recall there is some research being done on that front already.
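To make the idea concrete, here is a minimal neuroevolution sketch, assuming a toy XOR task and a fixed 2-2-1 network (both chosen for illustration, not Reversi). It is a plain genetic algorithm: truncation selection, uniform crossover, and Gaussian mutation over the weight vector, with elites carried over unchanged. It only evolves weights; methods such as NEAT also evolve the network's structure, which is closer to "not building the network from scratch" as described above. The function names and parameter values are hypothetical.

```python
import math
import random

# Toy XOR task (an assumption for illustration; not Reversi).
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_WEIGHTS = 9  # fixed 2-2-1 topology: 6 hidden weights/biases + 3 output

def forward(w, x):
    """Fixed 2-2-1 network: tanh hidden units, sigmoid output."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    z = w[6] * h1 + w[7] * h2 + w[8]
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fitness(w):
    """Negative sum of squared errors over the XOR table (higher is better)."""
    return -sum((forward(w, x) - y) ** 2 for x, y in DATA)

def evolve(pop_size=50, generations=200, elite=10, sigma=0.3, seed=0):
    """Evolve weight vectors with truncation selection, uniform crossover,
    and Gaussian mutation. Elites are copied unchanged, so the best
    fitness never decreases from one generation to the next."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(N_WEIGHTS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one parent at random.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            # Gaussian mutation on every weight.
            children.append([c + rng.gauss(0.0, sigma) for c in child])
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(round(fitness(best), 4))
```

No gradients are computed anywhere, which is part of the appeal: the same loop works whether the fitness is a squared error or, say, a win rate over a batch of games.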
