"Warp programs are written in a high level Pascal-like language called W2, which is supported by an optimizing compiler written in Common Lisp. W2 programs are developed in a sophisticated, Lisp-based programming environment supporting interactive program development and debugging. A C or Lisp program can call a W2 program from any UNIX host on the local area network."
Ah, the good old days of the dominance of Lisp in artificial intelligence research.
Most people don't realize that what is popularly known as "deep learning" has actually been around for over 50 years now. The big gains are really the result of having additional computing power to throw at the problems.
Is that actually true? From my cursory understanding, it seems that there has been a slew of theoretical breakthroughs (including ways to architect and train networks aside from just blindly throwing more computational resources at them) that significantly pushed the field forward. Off the top of my head, I'm thinking of Hinton's 2006 deep belief networks, which pre-trained each layer one at a time as a restricted Boltzmann machine, computationally simpler activation functions like ReLU, the development of residual networks, the development of GANs, etc.
Of course GPU-based training has helped a lot, but my impression is that these theoretical breakthroughs were also important.
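For concreteness, here's a minimal NumPy sketch (illustrative only; the function names and weights are mine, not from any of the papers) of two of the ideas mentioned above: the ReLU activation and a ResNet-style residual connection:

    import numpy as np

    def relu(x):
        # ReLU: elementwise max(0, x). Far cheaper to compute than
        # sigmoid/tanh, and it doesn't saturate for positive inputs.
        return np.maximum(0.0, x)

    def residual_block(x, w1, w2):
        # ResNet-style block: compute f(x), then add the input back in,
        # so the weights only learn a correction to the identity and
        # gradients have a direct path backward through the skip.
        h = relu(x @ w1)
        return relu(h @ w2 + x)

    # Toy usage: a batch of 4 vectors of width 8, random weights.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    w1 = 0.1 * rng.standard_normal((8, 8))
    w2 = 0.1 * rng.standard_normal((8, 8))
    print(residual_block(x, w1, w2).shape)  # -> (4, 8)

The skip connection is the whole trick: because the block computes f(x) + x, a very deep stack can start out close to the identity, which is what made networks with hundreds of layers trainable.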
I think it's more that GPU compute originally proved that deep nets are possible to train and give good results, which led to more research on them and the development of things like ReLU and ResNets.
You're right. It's partly greater computing resources, partly recent algorithmic advances, and partly the availability of the large amounts of data that these more complex networks require.
Large, cleaned, tagged datasets (like ImageNet) get at least as much credit as raw compute. I think everyone was surprised at how well deep CNN quality scaled when given appropriately large training sets.