This is meant as a counterexample to existing practices, not as something you should be doing.
The existing practice isn't to blindly look at the number of parameters in a model without considering the actual fit; it's a strawman to pretend that it is. A model that statistically overfits the data is just as suspect as one that underfits it. If you try to publish a paper fitting a model to data with a chi-squared value of 0.01, it's going to be rejected. It doesn't matter whether it has one parameter or one hundred: the model is clearly encoding information about the particular dataset rather than capturing anything general. The model presented in this paper would have a chi-squared value of essentially zero, and anyone who tried to present it at a conference would be laughed out of the room.
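To make the "suspiciously good fit" point concrete, here's a minimal sketch (my own illustration, not from the paper): generate noisy data from a straight line, then fit both a sensible low-degree polynomial and a near-interpolating one, and compare the reduced chi-squared. The noise level, seed, and polynomial degrees are all arbitrary choices for the demo.

```python
import numpy as np

# Illustrative setup: 8 noisy samples from a known line y = 2x + 1.
rng = np.random.default_rng(0)
sigma = 0.5  # known noise level
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, size=x.size)

results = {}
for degree in (1, 7):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    # Reduced chi-squared: ~1 for an honest fit, ~0 when the model has
    # enough parameters to thread through every noisy data point.
    dof = max(x.size - (degree + 1), 1)
    results[degree] = np.sum((resid / sigma) ** 2) / dof
    print(f"degree {degree}: reduced chi-squared = {results[degree]:.3g}")
```

The degree-7 fit passes through all 8 points, so its chi-squared is essentially zero; that's not a better model, it's a reviewer's red flag, exactly as with the one-parameter model.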
The paper itself mentions that the "model" has infinite VC-dimension, so you basically shouldn't expect it to generalize; existing theory already says it's a model that won't work.
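The core trick behind infinite VC-dimension can be sketched in a few lines (this is my own simplified illustration of "one real parameter can shatter any finite dataset", not the exact construction from the paper): pack an arbitrary set of binary labels into the binary digits of a single parameter, then have the "model" read them back.

```python
def encode(labels):
    """Pack a list of 0/1 labels into the binary expansion of one real number."""
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(labels))

def model(theta, i):
    """The 'one-parameter model': recover the i-th binary digit of theta."""
    return int(theta * 2 ** (i + 1)) % 2

labels = [1, 0, 1, 1, 0, 0, 1]
theta = encode(labels)
decoded = [model(theta, i) for i in range(len(labels))]
# decoded reproduces labels exactly: theta is a lookup table, not a model
```

A single parameter that fits every possible labeling this way has memorized the data outright, which is precisely why its chi-squared is zero and why VC theory (correctly, in this case) predicts it won't generalize.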
The problem is that VC-dimension (and Rademacher complexity, etc.) also says that modern neural nets are too complex to generalize with the amount of data we have.
And yet they do. So the deep learning community has fallen back on counting parameters, not as a way to measure generalization, but as a way to compare models, based on the empirical observation that a lot of the "improvements" we see in papers disappear when you compare to equally sized models.