In much the same way, a lot of the interesting work with machine learning requires deep specialist knowledge: selecting the best approach means digging beneath the abstraction. Many specific applications have their own quirks, and that's where significant gains are often made. (To be honest, I think the last year was more exciting for advances in applied ML than in theoretical work.)
A co-worker wrote a library for distributed training of DNNs for a specialized use case. Because of quirks in our use case (think sparsity and consistent patterns in the data), there were optimizations he could make to the training process that gave nice linear scaling in the number of training nodes.
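To make the sparsity point concrete: one common way sparsity helps distributed training is to exchange only the nonzero gradient entries between nodes instead of full dense vectors, shrinking communication roughly in proportion to the sparsity. This is a minimal illustrative sketch, not the co-worker's actual library (which isn't described in detail); all function names here are hypothetical.

```python
# Hypothetical sketch: exploiting gradient sparsity to cut per-node
# communication in data-parallel training. Names are illustrative only.

def sparsify(grad, eps=1e-8):
    """Encode a mostly-zero gradient as (index, value) pairs."""
    return [(i, g) for i, g in enumerate(grad) if abs(g) > eps]

def densify(pairs, length):
    """Reconstruct the dense gradient from its sparse encoding."""
    grad = [0.0] * length
    for i, g in pairs:
        grad[i] = g
    return grad

def allreduce_sparse(node_grads, length):
    """Sum sparse gradients from all nodes (stand-in for a network all-reduce)."""
    total = [0.0] * length
    for pairs in node_grads:
        for i, g in pairs:
            total[i] += g
    return total

# Two nodes, each holding a sparse local gradient over 8 parameters.
g0 = sparsify([0, 0, 0.5, 0, 0, 0, 0, 0.1])   # node 0 sends 2 pairs, not 8 floats
g1 = sparsify([0.2, 0, 0.5, 0, 0, 0, 0, 0])   # node 1 sends 2 pairs
summed = allreduce_sparse([g0, g1], 8)
print(summed)  # → [0.2, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.1]
```

With consistent sparsity patterns, the per-node payload stays proportional to the number of active parameters rather than the model size, which is one way communication can stop being the bottleneck as nodes are added.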