It can be almost as easy, but it can also be made as hard as you want. These kinds of captcha exercises are a fun way to test ideas, especially if you want to work on "attention" models (spatial transformers, ROI, deformable convolutions, soft attention, hard attention).
You can also try to "read" one letter at a time using an RNN. You may even test the new capsule networks for their rotation invariance. All these different network architectures encode various strategies one can use to decode a picture. Obviously all of them will score above 99% accuracy on the simple cases, but as you increase the captcha difficulty you will see that the more modern architectures can trade computation for increased accuracy.
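To make the "as hard as you want" part concrete, here is a minimal sketch of a toy captcha generator with a single difficulty knob. Everything in it is my own assumption for illustration: the tiny 3x5 `FONT`, the `make_captcha` name, the `MAX_JITTER` constant, and the choice of distortions (vertical jitter per glyph plus random pixel flips). A real captcha would use proper fonts, rotation, warping, and clutter, but the idea of dialing up difficulty and watching each architecture's accuracy would be the same.

```python
import random

# Hypothetical 3x5 bitmap font for a few digits; "#" marks an ink pixel.
FONT = {
    "0": ["###", "# #", "# #", "# #", "###"],
    "1": [" # ", "## ", " # ", " # ", "###"],
    "2": ["###", "  #", "###", "#  ", "###"],
    "3": ["###", "  #", "###", "  #", "###"],
}

MAX_JITTER = 2  # largest vertical shift (in rows) at difficulty 1.0


def make_captcha(text, difficulty=0.0, seed=None):
    """Render `text` as a grid (list of rows) of 0/1 pixels.

    `difficulty` in [0, 1] scales two distortions (both are assumptions
    of this sketch): per-glyph vertical jitter and per-pixel flip noise.
    """
    rng = random.Random(seed)
    max_shift = round(MAX_JITTER * difficulty)
    height = 5 + 2 * MAX_JITTER      # glyph height plus room to shift
    width = 4 * len(text)            # 3 glyph columns + 1 gap column
    image = [[0] * width for _ in range(height)]

    # Draw each glyph, shifted up or down by a random amount.
    for i, ch in enumerate(text):
        top = MAX_JITTER + rng.randint(-max_shift, max_shift)
        for r, row in enumerate(FONT[ch]):
            for c, pixel in enumerate(row):
                if pixel == "#":
                    image[top + r][4 * i + c] = 1

    # Flip each pixel with a probability that grows with difficulty.
    flip_prob = 0.15 * difficulty
    for row in image:
        for c in range(width):
            if rng.random() < flip_prob:
                row[c] ^= 1
    return image


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0):
        print(f"difficulty={level}")
        for row in make_captcha("1203", level, seed=0):
            print("".join("#" if p else "." for p in row))
```

At difficulty 0 the glyphs are clean and centered, so any model should read them. Raising the knob gives you a series of datasets on which to compare a plain CNN against the attention or recurrent variants.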