What's neat is that the technique is an almost comically simple way to add extra layers to a network. It's commonly accepted that deeper networks can learn better, but plain networks become increasingly unwieldy and hard to train as they get deeper.
Roughly speaking (and please correct me if I'm off-base), the paper's technique is to slot in additional layers that are initially 'identity+': each new block passes its input through unchanged and adds a learned correction on top, so training only has to home in on the difference from identity. Learning these residuals alone is more stable, since a block whose correction stays near its '~0' starting point is no worse than the original network - any improvement is a pure win.
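To make the 'identity+' idea concrete, here's a minimal PyTorch-flavored sketch of a residual block (my own illustration, not code from the paper; the real network also changes channel counts and strides between stages):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Toy residual block: output = relu(F(x) + x), the 'identity+' idea."""
        def __init__(self, channels):
            super().__init__()
            # F(x): the residual correction the new layers learn.
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU()

        def forward(self, x):
            # The learned correction F(x).
            f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
            # Skip connection: if F(x) stays ~0, the block is (nearly) a no-op,
            # so stacking more blocks can't easily make the network worse.
            return self.relu(f + x)

    # Quick check: shapes pass straight through.
    x = torch.randn(1, 64, 32, 32)
    y = ResidualBlock(64)(x)
    assert y.shape == x.shape

The addition happens before the final ReLU, which I believe matches the paper's basic block, though treat the exact placement of BatchNorm here as approximate.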
So... their winning network has a breathtaking 152 layers (and their submission then ensembles a few such networks together).