
I am one of the authors of "Outrageously Large Neural Networks". Yes - overfitting is a problem. We employed dropout to combat overfitting. Even with dropout, we found that adding additional capacity provides diminishing returns once the capacity of the network exceeds the number of examples in the training data (see sec. 5.2). To demonstrate significant gains from really large networks, we had to use huge datasets, up to 100 billion words.
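As a rough illustration of the technique being discussed, here is a minimal numpy sketch of a sparsely-gated mixture-of-experts forward pass with dropout on the combined output. All dimensions, the single-matrix "experts", and the dropout placement are illustrative assumptions, not the paper's actual architecture (the paper uses small feed-forward networks as experts and a noisy top-k gating network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not from the paper).
d_model, n_experts, k = 16, 8, 2  # top-k sparse gating: only k experts run per token

# Each "expert" is a single linear map here; the paper uses small FFNs.
W_gate = rng.normal(scale=0.1, size=(d_model, n_experts))
experts = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x, drop_p=0.0):
    """Sparsely-gated MoE forward pass for one token vector x.

    Only the top-k experts (by gate logit) are evaluated, so compute cost
    stays roughly constant as n_experts grows. drop_p applies inverted
    dropout to the combined output (placement is an assumption; it is one
    way to regularize, as mentioned in the comment above).
    """
    logits = x @ W_gate
    top = np.argsort(logits)[-k:]          # indices of the k largest gate logits
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                   # softmax over the selected experts only
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    if drop_p > 0.0:
        mask = rng.random(out.shape) >= drop_p
        out = out * mask / (1.0 - drop_p)  # rescale so expected activation is unchanged
    return out

y = moe_forward(rng.normal(size=d_model), drop_p=0.1)
print(y.shape)
```

The point of the sparsity is that total parameter count scales with n_experts while per-token compute scales only with k, which is what lets capacity grow to the sizes the comment describes.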

Impressive work, mate!

Does mixture-of-experts also work well the other way around, as a way to minimize power and hardware requirements on common-sized problems?

And would it work in low-precision networks, like BinaryConnect?
