Thanks, and I wish this sentence had been one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.
It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.
This actually describes neural networks in general, not so much "deep learning".
Deep learning comes from being able to scale neural networks up from only tens or hundreds of nodes per layer to thousands or tens of thousands (and of course the number of edges between fully connected layers grows roughly with the square of the layer width), coupled with the ability to use massive datasets for training and, ultimately, to run new data through the trained model.
This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory and interconnects (both to hold the model and to shuttle data in and out of it for training and later processing), as well as faster CPUs (and probably disks, too).
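To put rough numbers on the scaling described above, here is a minimal sketch that counts the weights (edges) between fully connected layers. The helper name `dense_edges` and the layer sizes are purely illustrative assumptions, not figures from any particular network.

```python
def dense_edges(layer_sizes):
    """Count the weights (edges) between consecutive fully connected layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layer widths, chosen only to illustrate the growth.
small = dense_edges([100, 100, 100])           # 2 * (100 * 100)
large = dense_edges([10_000, 10_000, 10_000])  # 2 * (10_000 * 10_000)

print(small)           # 20000
print(large)           # 200000000
print(large // small)  # 10000: 100x wider layers, 10,000x more edges
```

Making each layer 100 times wider multiplies the edge count by 10,000, which is why memory capacity and bandwidth matter as much as raw compute.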
But neural networks have almost always been represented by matrix operations (linear algebra). It's just that neither the data nor the vast (and cheap) numbers of parallelizable processing elements were available to handle it. The closest architectures I can think of that could potentially have done it in the 1980s/90s would be the Connection Machines from Thinking Machines, and probably systolic array processors (which were pretty niche at the time, mainly from CMU):
These latter machines started to prove some of what we take for granted today, in the form of the NAVLAB ALVINN self-driving vehicle:
Of course, today it can be done on a smartphone:
The point, though, is that neural networks have long been known to be most effectively computed using matrix operations. It's just that neither the hardware (unless you had a lot of money to spend) nor the datasets were there to enable what we today call "deep learning".
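As a rough illustration of that point, the sketch below computes one fully connected layer two ways: as a matrix-vector product, and neuron by neuron. It uses plain Python lists so it runs without any libraries; the weights, the input, and the choice of `tanh` as the activation are all made-up assumptions for the example.

```python
import math

def matvec(W, x):
    """Matrix-vector product: one dot product per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dense_layer(W, b, x):
    """One layer as linear algebra: tanh(W @ x + b)."""
    return [math.tanh(z + bj) for z, bj in zip(matvec(W, x), b)]

def neuron_by_neuron(W, b, x):
    """The same layer written as explicit loops over nodes and edges."""
    out = []
    for j in range(len(W)):          # each output node
        total = b[j]
        for i in range(len(x)):      # each incoming edge
            total += W[j][i] * x[i]
        out.append(math.tanh(total))
    return out

# Hypothetical 3-input, 2-output layer.
W = [[0.5, -1.0, 0.25],
     [1.5,  0.0, -0.5]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]

a = dense_layer(W, b, x)
c = neuron_by_neuron(W, b, x)
print(all(math.isclose(p, q) for p, q in zip(a, c)))  # True
```

Every row of `W` is an independent dot product, so the work can be spread across as many processing elements as the hardware offers. That is the kind of parallelism GPUs provide cheaply.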
That, and AI winters didn't help matters. I would imagine that if somebody in the late 1980s had asked for 100 million to build or purchase a large parallel processing system of some form for neural network research, they would've been laughed at. Of course, no one at that time really knew that such a large architecture was what was needed, nor how much data (plus convolutional NNs were only just emerging, and other more recent model architectures weren't around yet). Also, programming such a system would have been extremely difficult.
So today is the "perfect storm" of hardware, data, and software (and people who know how to use and abuse it, of course).