Hacker News new | past | comments | ask | show | jobs | submit login

> This is meant as a counterexample to existing practices

So, the paper mentions that their "model" has infinite VC-dimension, so you basically shouldn't expect it to generalize, so existing theory says that it's a model that won't work.

The problem is that VC-dimension (and Rademacher complexity, etc) also claim that modern neural nets are too complicated to generalize with the amount of data we have.

And yet they do. So the deep learning community has fallen back on counting parameters, not as a way to measure generalization, but as a way to compare models, based on the empirical observation that a lot of the "improvements" we see in papers disappear when you compare to equally sized models.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact