Tom Mitchell defines machine learning algorithms as those that improve their performance at some task with experience. That is precisely how humans go about learning to perform those same tasks, tasks that formerly took thousands or millions of hours of practice.
For high-dimensional problems, such as text classification (e.g., spam detection) or image classification (e.g., face detection), it's almost impossible to hard-code an algorithm to accomplish its goal without using machine learning. It's much easier to use a binary spam/not-spam or face/not-face labeling system that, given the attributes of an example, can learn which attributes give rise to that label. In other words, it's much easier for a learning system to determine which variables matter for the ultimate classification than it is to model the "true" function that generates the labels.
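To make that concrete, here's a minimal sketch of learning a binary spam/not-spam label from word attributes, using a naive Bayes classifier. The toy training examples and word-level features are illustrative assumptions, not a production pipeline:

```python
import math
from collections import Counter

# Toy labeled examples (assumed for illustration): the learner is never
# told which words matter; it infers that from the labels.
train = [
    ("win cash prize now", "spam"),
    ("cheap meds win big", "spam"),
    ("limited offer cash bonus", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team", "ham"),
    ("project status meeting notes", "ham"),
]

# Count how often each word appears under each label.
word_counts = {"spam": Counter(), "ham": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Return the label with the higher smoothed log-probability."""
    scores = {}
    for label in word_counts:
        total = sum(word_counts[label].values())
        # Class prior plus per-word log-likelihoods (Laplace smoothing
        # so unseen words don't zero out the whole score).
        score = math.log(label_counts[label] / len(train))
        for w in text.split():
            score += math.log(
                (word_counts[label][w] + 1) / (total + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("win cash now"))           # classified as spam
print(classify("monday meeting agenda"))  # classified as ham
```

The point isn't the particular model: the classifier learns which attributes (words like "cash" or "meeting") predict the label, without anyone writing down the "true" rule that separates spam from legitimate mail.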
Probably also worth speculating on why this is happening NOW. Why is this breaking out of CS departments in 2011 and not 2002?
The datasets are new.
Bandwidth? Storage capacity? Computing power? All of the above?
It wasn't until the 1990s, when computers became reasonably priced and more accessible to researchers and hobbyists, that we began seeing exponential growth in the amount of research output. In many ways, one could argue that the proliferation and development of AI has closely followed Moore's law, since these are extremely complex and costly computations.
Bandwidth increases have certainly improved the availability of datasets (Google has made its entire n-grams dataset fully available, and it's multiple terabytes in size), but storage capacity (hard disk, RAM, and CPU cache) and computing power have really formed the bottleneck. And it's not just storage capacity: I/O read/write times are also immensely important. It's all just a huge balancing act right now.