I'm curious where the idea that SVMs are "older" than neural networks comes from. The SVM Wikipedia page claims they were published by Vapnik and Chervonenkis in 1963, while neural networks date back at least to Rosenblatt's work in 1958, if not earlier.
I think SVMs were seen in the late 1990s as replacements for three-layer networks. The kernel trick allowed high-dimensional decision surfaces to be fitted by optimisation over large (for those days) training sets. Given the limits on computing power and data collection at the time, very large neural networks were underexplored: most people believed a very broad network was required to capture detailed learned classifiers, and that training such classifiers was impractical. Deep networks were not widely considered because they were thought to be infeasible to train, and they seemed that way (to me at least) until we learned about stochastic gradient descent, initialization, transfer learning, distributed computing and GPUs.

So SVMs became very fashionable, and many people said they were basically the end state of supervised machine learning. This pushed attention toward unsupervised learning, apart from some people in Canada and Scotland (and various others too!). Now people think SVMs are old because the older people they know used to do things with SVMs, and neural networks feel new because you can now do things with them that are quite unexpected.
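To make the kernel-trick point concrete, here is a minimal sketch in plain NumPy (the degree-2 polynomial kernel is just an illustrative choice, not anyone's actual setup): the kernel returns the same inner product as an explicit quadratic feature map, so the optimiser gets a higher-dimensional decision surface without the expanded features ever being built.

    import numpy as np

    def poly2_kernel(x, z):
        # Degree-2 polynomial kernel: K(x, z) = (x . z)^2
        return np.dot(x, z) ** 2

    def phi(x):
        # Explicit quadratic feature map for 2-D inputs:
        # phi(x) = [x1^2, x2^2, sqrt(2) * x1 * x2]
        return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 4.0])

    print(poly2_kernel(x, z))        # 121.0
    print(np.dot(phi(x), phi(z)))    # 121.0, same value without forming phi at all

For a degree-d kernel on n-dimensional inputs the implicit feature space has on the order of n^d coordinates, which is why computing only kernel values was so attractive on 1990s hardware.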
Your history is a little mixed up. LeNet-5 had 7 layers, for example (late 1990s).
Regardless, neural nets fell out of favor because they were seen as overcomplicated: even though they achieved competitive accuracy (yes, even when they were out of favor they remained competitive with other methods), you could get similar performance with simpler methods like SVMs.
Now, the history lesson: all this stuff feels fairly new.
It feels like it's younger than you are.
Here's the history of it.
Vapnik immigrated from the Soviet Union to the United States in about 1991.
Nobody ever heard of this stuff before he immigrated.
He actually had done this work on the basic support vector idea in his Ph.D. thesis at Moscow University in the early '60s.
But it wasn't possible for him to do anything with it, because they didn't have any computers they could try anything out with.
So he spent the next 25 years at some oncology institute in the Soviet Union doing applications.
Somebody from Bell Labs discovers him, invites him over to the United States where, subsequently, he decides to immigrate.
In 1992, or thereabouts, Vapnik submits three papers to NIPS, the Neural Information Processing Systems conference.
All of them were rejected.
He's still sore about it, but it's motivating.
So around 1992, 1993, Bell Labs was interested in hand-written character recognition and in neural nets.
Vapnik thinks that neural nets-- what would be a good word to use?
I can think of the vernacular, but he thinks that they're not very good.
So he bets a colleague a good dinner that support vector machines will eventually do better at handwriting recognition than neural nets.
And it's a dinner bet, right?
It's not that big of a deal.
But as Napoleon said, it's amazing what a soldier will do for a bit of ribbon.
So the colleague, who's working on this handwritten-recognition problem, decides to try a support vector machine with a kernel in which n equals 2, just slightly nonlinear, and it works like a charm (a rough modern recreation of that degree-2 kernel is sketched below, after the story).
Was this the first time anybody tried a kernel?
Vapnik actually had the idea in his thesis but never thought it was very important.
As soon as it was shown to work in the early '90s on the handwriting recognition problem, Vapnik resuscitated the idea of the kernel, began to develop it, and it became an essential part of the whole approach of using support vector machines.
So the main point about this is that it was 30 years between the concept and anybody ever hearing about it.
It was 30 years between Vapnik's understanding of kernels and his appreciation of their importance.
And that's the way things often go: great ideas followed by long periods of nothing happening, followed by an epiphanous moment when the original idea seems to have great power with just a little bit of a twist.
And then, the world never looks back.
And Vapnik, who nobody ever heard of until the early '90s, becomes famous for something that everybody knows about today who does machine learning.
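Out of curiosity, that "n equals 2" bet is easy to recreate today at toy scale, assuming nothing about the original Bell Labs setup: scikit-learn's small digits set and a degree-2 polynomial kernel take just a few lines.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Toy stand-in for the handwriting task: 8x8 digit images, not the Bell Labs data
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The "n equals 2, just slightly nonlinear" case: a degree-2 polynomial kernel
    clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # prints a high test accuracy on this small set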
Rosenblatt's perceptron has little to do with neural nets. Geoffrey Hinton regrets coining the name "multi-layer perceptron" precisely because they're really unrelated.
Hm, yeah, I suppose you could argue it's not really a neural network unless you involve the idea of hidden layers. And according to Wikipedia, backpropagation wasn't a thing until 1975, so you might have a point.
"It's not really a neural network" is an understatement. Perceptrons are about as related to neural nets as linear regression is (note that you can "train" linear regression with stochastic gradient descent).