Until last week, it was an open problem whether every function with n inputs that is computable in nondeterministic time 2^O(n) could also be computed with a two-layer circuit using only O(n) gates (that is, a deep net with just one hidden layer with a linear number of gates). This is an embarrassing state of knowledge.
Now we know the following slightly less embarrassing thing: there is an explicit function, computable in linear time, that needs at least n^(3/2) gates in a two-layer deep net (modulo some log(n) factors), and another function that needs n^(3/2) gates in a three-layer deep net (with the additional restriction that the output gate is a majority vote of the previous layer).
This is still a long way from truly understanding the representational power of deep nets, but it's the first general progress that has been made since the early '90s.
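For a feel of what "two layers, O(n) gates" means in this model, here is a minimal sketch of the textbook depth-two threshold circuit for parity, using n hidden gates plus one output gate. This is my own illustration, not the hard function from the new result, and the names `threshold_gate`, `parity_two_layer`, and `check_parity_circuit` are made up for the example.

```python
from itertools import product

def threshold_gate(weights, inputs, threshold):
    """Fire (return 1) when the weighted sum of inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def parity_two_layer(bits):
    """Parity of `bits` via one hidden layer of n threshold gates plus an output gate."""
    n = len(bits)
    # Hidden gate k (k = 1..n) fires when at least k of the inputs are 1.
    hidden = [threshold_gate([1] * n, bits, k) for k in range(1, n + 1)]
    # Alternating weights +1, -1, +1, ... sum to 1 exactly when the number
    # of firing hidden gates (the count of ones in the input) is odd.
    out_weights = [1 if k % 2 == 1 else -1 for k in range(1, n + 1)]
    return threshold_gate(out_weights, hidden, 1)

def check_parity_circuit(n):
    """Exhaustively compare the circuit against parity on all 2^n inputs."""
    return all(parity_two_layer(b) == sum(b) % 2 for b in product([0, 1], repeat=n))
```

So parity is "easy" here (linear size at depth two), which is part of why proving that any explicit function needs superlinear size in this model is hard.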
Disclaimer: I reinvented the idea about a month or two ago, and ran into the existing papers when googling for joint entropy estimators after making some interesting graphs with hierarchical probabilistic programs. I only wish I could've been the first!
Shiro Usui, Shigeki Nakauchi, and Masae Nakano, "Reconstruction of Munsell color space by a five-layer neural network," J. Opt. Soc. Am. A 9, 516-520 (1992)
The whole idea of "simplest model that explains most of the data" has always been very appealing to me. The concept is closely tied to reproducing kernel Hilbert spaces, which have recently experienced a revival in interest due to the representer theorem.
Why? Is that a standard dataset? They could have just cherry-picked tasks.
I also noticed that they seem to only have the actual computation working for discrete random variables (integer features) right now, which limits its applicability. They also seem to use mass-function estimators, which again can work well for discrete data while becoming statistically intractable when dealing with continuous random variables.
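To make the discrete-versus-continuous point concrete, here is a rough sketch of a plug-in (empirical mass function) entropy estimator. I am assuming this is roughly the kind of estimator meant; it is not taken from their code, and `plugin_entropy` is a name I made up.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy estimate in bits, using empirical frequencies as the mass function."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rng = random.Random(0)
n = 2000
# Uniform over 4 symbols: the true entropy is 2 bits.
discrete = [rng.randrange(4) for _ in range(n)]
# Uniform floats on [0, 1): in practice every sampled value is distinct.
continuous = [rng.random() for _ in range(n)]

h_discrete = plugin_entropy(discrete)      # lands close to 2 bits
h_continuous = plugin_entropy(continuous)  # equals log2(n) when all samples are distinct
```

On the continuous sample each value is seen once, so the estimate only reflects the sample size and says nothing about the distribution. You would need binning, kernel, or nearest-neighbor estimators instead, and those get expensive in sample terms as dimension grows.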
>The whole idea of "simplest model that explains most of the data" has always been very appealing to me.
In this case it's a bit more like, "The set of latent variables that best screen off the observables from each other."
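One way to read "screen off" is through total correlation: the observables share information marginally, but given the right latent variable that shared information drops to zero. Below is a toy sketch under that reading (my assumption about the objective, not the article's code), using an exact joint distribution where a hidden bit Y is copied into X1 and X2 through independent 10% bit flips. All function names are hypothetical.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint distribution onto the coordinates listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def total_correlation(joint, idx):
    """TC(X) = sum_i H(X_i) - H(X)."""
    return sum(entropy(marginal(joint, [i])) for i in idx) - entropy(marginal(joint, idx))

def conditional_total_correlation(joint, idx, cond):
    """TC(X | Y) = sum_i H(X_i | Y) - H(X | Y), using H(A | Y) = H(A, Y) - H(Y)."""
    h_y = entropy(marginal(joint, cond))
    h_each = sum(entropy(marginal(joint, [i] + cond)) - h_y for i in idx)
    h_all = entropy(marginal(joint, idx + cond)) - h_y
    return h_each - h_all

FLIP = 0.1  # assumed noise level for the toy example
joint = {}
for y, x1, x2 in product([0, 1], repeat=3):
    p = 0.5  # P(Y = y)
    for x in (x1, x2):
        p *= FLIP if x != y else 1 - FLIP
    joint[(y, x1, x2)] = p  # coordinate 0 is Y, coordinates 1 and 2 are X1 and X2

tc = total_correlation(joint, [1, 2])                          # positive: X1 and X2 are dependent
tc_given_y = conditional_total_correlation(joint, [1, 2], [0])  # zero up to rounding: Y screens them off
```

A good latent variable in this sense is one that makes the gap between `tc` and `tc_given_y` as large as possible.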
this happens all the time. i'm thankful that you took the time to post the results, because a discovery that's not communicated is not really a discovery. indeed, i suspect that what's going on in the field is as much a lack of communication as a lack of understanding.
i wish the author of the original article had included links to at least a couple of the "number of tools to probe the geometry of these hidden structures."
Only commenting on that portion, as it's what I know. The author doesn't appear to give any indication of which test was used or where the sample was taken from. This makes me a little suspicious, but it's probably just a disciplinary thing.
Additionally, many tests reduce the pool of available questions for a scale after testing it against other scales and theory. This means that what the author is recovering is the scale imposed on the items by the original authors.
Nonetheless, this has forced me to reinvestigate the Big5 model, so it's not all bad :).
Massive appreciation for the links.
Probably not as theoretical as the work you referenced, but interesting to me because of the deeply practical outcomes in NLP.
Exactly. Someone should tell Yann LeCun this --see e.g. Section 3.2 in , or pretty much every time he brings up circuit complexity theory to automagically imply that "most functions representable compactly with a deep architecture would require a very large number of components if represented with a shallow one." [ibid, p.14]