The difference isn't easy to describe, but one such difference would be that a single extra stone can change a Go position value much more than a single pixel changes an image classification.
The problem changes dramatically when the AI is supposed to take arbitrary input from the world. Then the AI needs to determine what input to collect, and the path length connecting its decisions to its reward grows enormously.
I still agree with your take though: there's an important milestone here.
A CNN can still distinguish extremely subtle differences of various animal breeds, exceeding human performance in such tasks. Why was that advance not a warning sign? The rotational-translational invariance prior of the convolutional neural network probably helps because, by default, local changes of the patterns can massively change the output value without the need to train that subtle change for all translations. Also, AlphaGo does a tree search all the way to the games end, which can probably easily detect such dramatic changes of single extra stones. Reality is likely much too unconstrained to to able to efficiently simulate such things.