Hacker News

"It simply doesn't matter how many parameters you have when that's the case, it means that your fit is useless." That is their entire point: model complexity can't be measured by the number of coefficients, since this model has only one coefficient yet has model complexity as high as possible.

This is meant as a counterexample to existing practices, not as something you should be doing.
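To make the counterexample concrete, here is a toy sketch of the same idea (this is digit-packing for illustration, not the paper's actual encoding): an entire dataset is smuggled into the digits of a single "parameter", and the "model" just reads the digits back out. The names and the fixed precision are made up for the example.

```python
# Illustrative sketch: fit ANY dataset exactly with one "parameter"
# by packing each sample's digits into a single big integer.
# DIGITS (precision per sample) is an arbitrary choice for this demo.

DIGITS = 4  # fixed decimal precision per sample

def encode(samples):
    """Pack samples in [0, 1) into one integer 'parameter' theta."""
    theta = 0
    for s in samples:
        theta = theta * 10**DIGITS + int(round(s * 10**DIGITS)) % 10**DIGITS
    return theta

def model(theta, i, n):
    """'Predict' sample i of n by reading its digits back out of theta."""
    shift = 10**(DIGITS * (n - 1 - i))
    return (theta // shift) % 10**DIGITS / 10**DIGITS

data = [0.1234, 0.9876, 0.5, 0.25]
theta = encode(data)
recovered = [model(theta, i, len(data)) for i in range(len(data))]
# recovered matches data to DIGITS decimal places: a "perfect" one-parameter fit
```

The "fit" is perfect by construction, which is exactly why it is useless: the parameter is the dataset, so nothing generalizes.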

But since everyone (implicitly or explicitly) specifies the precision of the coefficients (e.g. fp32, fp64, int8), it's a complete straw man you're arguing against.

Nobody evaluates a model based simply on how well it fits the data and its number of parameters; you also look at how well the model parameters are constrained and what the uncertainty bands on the fit are.

The existing practice isn't to blindly look at the number of parameters in a model without considering the actual fit; pretending that it is is a straw-man argument. A model that statistically overfits the data is just as suspect as one that underfits it. If you try to publish a paper about a model fit to some data with a chi-squared value of 0.01, it's going to be rejected. It doesn't matter whether it has one parameter or a hundred: it's clearly encoding information about the specific dataset rather than being a general model. The model presented in this paper would have a chi-squared value of essentially zero, and anyone who tried to present it at a conference would be laughed out of the room.
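The reduced chi-squared check described above is easy to sketch (toy numbers, made-up for the example): a sound fit lands near 1, a memorizing "fit" lands at essentially zero, and a bad fit lands well above 1.

```python
# Sketch of the reduced chi-squared sanity check: chi2/dof ~ 1 is
# consistent with the stated noise; ~0 means the "model" is reproducing
# the data (overfitting); >> 1 means it underfits.

def reduced_chi2(y, y_pred, sigma, n_params):
    dof = len(y) - n_params
    chi2 = sum(((yi - pi) / s) ** 2 for yi, pi, s in zip(y, y_pred, sigma))
    return chi2 / dof

y      = [1.0, 2.1, 2.9, 4.2]     # toy measurements
sigma  = [0.1] * 4                 # assumed measurement errors
honest = [1.05, 2.0, 3.0, 4.1]     # plausible one-parameter fit
memor  = y[:]                      # "fit" that reproduces the data exactly

print(reduced_chi2(y, honest, sigma, 1))  # order 1: consistent with the noise
print(reduced_chi2(y, memor,  sigma, 1))  # 0.0: suspiciously perfect
```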

> This is meant as a counterexample to existing practices

The paper itself notes that their "model" has infinite VC dimension, so you shouldn't expect it to generalize; existing theory already says it's a model that won't work.
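The classic textbook example of a one-parameter class with infinite VC dimension is {x -> sign(sin(w*x))}, which can shatter arbitrarily many points. A brute-force sketch for three rationally independent points (the point placement and search range are arbitrary choices for the demo):

```python
import math

# The one-parameter class {x -> sign(sin(w*x))} shatters these three
# points: sweeping w, every one of the 2^3 label patterns shows up.
# The construction extends to any number of points, hence infinite
# VC dimension despite a single real parameter.
xs = [1.0, math.sqrt(2), math.pi]  # rationally independent points
found = set()
w = 0.0
while len(found) < 8 and w < 10000:
    w += 0.01
    found.add(tuple(math.sin(w * x) > 0 for x in xs))
print(len(found))  # 8: all labelings achieved, so the points are shattered
```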

The problem is that VC-dimension (and Rademacher complexity, etc) also claim that modern neural nets are too complicated to generalize with the amount of data we have.

And yet they do. So the deep learning community has fallen back on counting parameters, not as a way to measure generalization, but as a way to compare models, based on the empirical observation that a lot of the "improvements" we see in papers disappear when you compare to equally sized models.
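The "equally sized models" comparison above usually just means matching raw parameter counts. A minimal sketch with made-up MLP shapes (the layer sizes are illustrative, not from any paper):

```python
# Toy parameter counting for dense MLPs: a wide-shallow net and a
# deep-narrow net with roughly matched parameter budgets, the kind of
# control an ablation should compare against.

def n_params(sizes):
    """Weights + biases for each pair of consecutive dense layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

wide = [100, 256, 10]                  # one wide hidden layer
deep = [100, 80, 80, 80, 80, 10]       # four narrower hidden layers

print(n_params(wide), n_params(deep))  # 28426 28330: roughly matched budgets
```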
