This is just a trick of language, though, because their single, hundreds-of-thousands-of-digits-of-precision “parameter” is really just a way of packing an agglomeration of other parameters into one bit bucket.

If you can take some construct and decompose it into a bunch of independent components that separately account for the prediction behavior, then the number of parameters is really the number of components (or something close to it), not artificially “one parameter” because of the way the separate parameters happen to be packaged and decoded. So even on its intended point, the paper’s result doesn’t really matter. In a Kolmogorov complexity sense, this “one parameter” model is more complex than many multi-parameter models that represent shorter programs, which is a way of saying this “one parameter” model is not really one parameter.
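The packing trick being described can be made concrete with a minimal sketch (Python, purely illustrative; the `pack`/`unpack` names are hypothetical, not from the paper): several low-precision parameters are concatenated digit-by-digit into one high-precision number, and decoding recovers every component, so the information content is unchanged.

```python
# Illustrative sketch: pack three 4-digit "parameters" into one
# high-precision number by concatenating their decimal digits,
# then decode them back. The single packed value carries exactly
# as much information as the three separate parameters did.

def pack(params, digits=4):
    # [0.1234, 0.5678, 0.9012] -> 0.123456789012
    s = "".join(f"{p:.{digits}f}"[2:] for p in params)
    return float("0." + s)

def unpack(theta, n, digits=4):
    # Slice the digit string back into n separate parameters.
    s = f"{theta:.{n * digits}f}"[2:]
    return [float("0." + s[i * digits:(i + 1) * digits]) for i in range(n)]

params = [0.1234, 0.5678, 0.9012]
theta = pack(params)              # "one" parameter, 12 digits of precision
assert unpack(theta, 3) == params
```

Counting this as “one parameter” is exactly the linguistic move at issue: the precision budget grows in proportion to the number of components hidden inside it.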

 > "In a Kolmogorov complexity sense, this “one parameter” model is more complex than many multi-parameter models that represent shorter programs"

I'm not a mathematician, but I think the point of the exercise was to show that the number of parameters a function requires is not a relevant indicator of the complexity of the function itself. It specifically uses a multi-parameter function converted to a very complicated single-parameter function to show this.

In the case you described, where the arguments are spread into their components (as is the case in "normal" functions), the arguments can still have different precision, or more generally, different complexities. Look at a function that takes a quad, then look at a function that takes 5 bools. One is obviously more complex than the other (using the meaning of "complex" as discussed in the comments), and it has nothing to do with the number of arguments.

Disclaimer: I haven't read the article, just the comments here, so please don't sue me if I'm incorrect.
This is exactly what I mean by “a trick of language,” though. The superficial “number” of parameters is not important.

Think of it this way: suppose someone read this article and as a result thought, “aha, Occam’s Razor is a sham, because this shows that ‘simpler’ models (just “one” parameter!) are possibly even more susceptible to overfitting or generalization error. So I’m free to use whatever complex model I want!”

Then they go and fit a 427-degree polynomial to their data set of 427 observations, and when someone says, “but using that many parameters is overly complex, you should prefer a simpler model,” they’ll reply, “No way! Read this paper! It says that even simple models with a small number of parameters can have this problem.”

I know it’s a simplification, but it’s important. The notion of “a parameter” has to meaningfully capture an independent degree of freedom, and be a unit of constraint on the complexity (theoretically, the Kolmogorov complexity: the minimum description length of a program that models the underlying data-generating process), or else it’s not a mathematical parameter, just a linguistic construct.

In the paper’s example, saying that the model “has one parameter” is misleading, because you need so much precision specifically to allow the “one parameter” to control many parameters’ worth of complexity.

So if someone’s takeaway were “there’s not necessarily a reason to favor simpler models,” that would be a big misunderstanding, yet it seems the paper is almost intentionally set up to mislead in this way.
I absolutely do not disagree; my entire point is in your first paragraph.

 > The superficial “number” of parameters is not important.

Whether the number of parameters is small or large does not change the complexity of the function, though if you're going to pass 64 bools, you might as well pack them into a long long with bitwise OR to simplify the usability.
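The bit-packing idea mentioned above can be sketched as follows (Python rather than C for brevity; `pack_bools`/`unpack_bools` are hypothetical names): 64 boolean flags collapse losslessly into one 64-bit integer, so the "number of arguments" changes while the complexity does not.

```python
# Illustrative sketch: fold 64 boolean flags into one 64-bit integer
# using bitwise OR, and recover them with shifts and masks.

def pack_bools(bools):
    # OR each flag into its own bit position of a single integer.
    word = 0
    for i, b in enumerate(bools):
        if b:
            word |= 1 << i
    return word

def unpack_bools(word, n):
    # Read bit i back out as flag i.
    return [bool((word >> i) & 1) for i in range(n)]

flags = [True, False, True, True] + [False] * 60
word = pack_bools(flags)          # bits 0, 2, 3 set -> 13
assert unpack_bools(word, 64) == flags
```

Same information either way; only the packaging differs, which is the whole point.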
