
The scores become much closer when you make the first hidden layer size 10 instead of 100 (both methods take a 784-dimensional input X).

Instead of PCA on all features, they could subsample 10 random features to partition, and bag the results of multiple runs. Basically, Totally Random Trees with an arbitrarily handicapped Random Subspaces method. Scales well, can beat Logistic Regression, but not any of the more developed tree methods.
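A rough sketch of that idea, to make it concrete. This is not the authors' code or mine from the thread, just a minimal pure-Python illustration: each run draws `n_sub` random features, grows a totally random tree on them (random feature, random threshold), and the runs are combined by majority vote. All names (`fit_ensemble`, `build_tree`, `n_sub`, the depth limit) are made up for the example, and "bagging" here is assumed to mean vote aggregation without bootstrap resampling of the rows.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, features, rng, depth=8):
    """Grow a totally random tree: split feature and threshold are drawn at random."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)  # leaf
    f = rng.choice(features)
    values = [row[f] for row in X]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: stop here instead of retrying with another feature.
        return majority(y)
    t = rng.uniform(lo, hi)
    left = [i for i, v in enumerate(values) if v <= t]
    right = [i for i, v in enumerate(values) if v > t]
    if not left or not right:  # threshold landed on an extreme value
        return majority(y)
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], features, rng, depth - 1),
        build_tree([X[i] for i in right], [y[i] for i in right], features, rng, depth - 1),
    )


def predict_tree(node, row):
    """Walk down to a leaf. Internal nodes are tuples, so labels must not be tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_ensemble(X, y, n_trees=25, n_sub=10, seed=0):
    """One tree per run, each restricted to its own random subset of n_sub features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    trees = []
    for _ in range(n_trees):
        features = rng.sample(range(n_features), min(n_sub, n_features))
        trees.append(build_tree(X, y, features, rng))
    return trees


def predict(trees, row):
    """Combine the runs by majority vote."""
    return majority([predict_tree(tree, row) for tree in trees])
```

Usage would be `trees = fit_ensemble(X_train, y_train)` followed by `predict(trees, x)` for each test row, with `X_train` as a list of 784-value rows. Pure Python will be slow at that size; a vectorized version is the obvious next step.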

Another difference from the established literature is that this algorithm does not use any kind of knowledge transfer from previously learned classes. In most one-shot methods, including those used by humans, the model is already trained on other classes and uses that information to adapt to unseen classes. Instead, the authors interpret it solely as deep learning in a small-training-data setting (which, as my code shows, does not require jumping through hoops).
