> The move from hand-designed features to learned features in machine learning has been wildly successful.
Are the features here the "feature vectors" or the network architecture? Or something else? In other terms, does this project help normalizing data, or does it help tweaking hyper parameters?
Here the features are the feature vectors themselves, yes. It's been found that taking somewhat of a hands-off approach and allowing networks to engineer their own mid-level representations from raw data can be very beneficial.
This is the idea behind the learning-to-learn paper: instead of taking our gradient and plugging it into a hand-engineered (i.e. derived on paper) update rule, we feed it to a neural network, which is trained to find the optimal update rule in some sense (neural networks are just function approximators, after all).
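To make that concrete, here is a minimal sketch of "feed the gradient to a network that produces the update", in PyTorch. The names (`LearnedOptimizer`, `hidden_size`) and the toy MLP are illustrative only; the actual paper uses a coordinate-wise LSTM trained, itself by gradient descent, to minimize the optimizee's loss over a whole trajectory of updates.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: instead of a hand-designed rule like
# "theta -= lr * grad", a small network maps each gradient
# coordinate to its update.
class LearnedOptimizer(nn.Module):
    def __init__(self, hidden_size=20):
        super().__init__()
        # Operates coordinate-wise: one gradient value in, one update out.
        self.net = nn.Sequential(
            nn.Linear(1, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, grad):
        # grad: flat tensor of gradients; returns an update of the same shape.
        flat = grad.reshape(-1, 1)
        return self.net(flat).reshape(grad.shape)

# Usage: apply the learned update in place of a hand-engineered one.
optimizer_net = LearnedOptimizer()
theta = torch.randn(10, requires_grad=True)
loss = (theta ** 2).sum()                    # toy objective
grad, = torch.autograd.grad(loss, theta)     # gradient we'd normally hand to SGD/Adam
with torch.no_grad():
    theta += optimizer_net(grad)             # update produced by the network
```

So the project is neither about normalizing data nor about tweaking hyperparameters by hand: the update rule itself becomes the learned object.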