I can chime in for the theoretical computer scientists. Deep learning (in its simplest form) corresponds to the class of circuits whose gates are linear threshold functions. Our primary goal with such functions is not to show what problems can be solved by small circuits using linear threshold gates, but what problems cannot be solved with such circuits.

Until last week [1], it was an open problem whether every function with n inputs that is computable in nondeterministic time 2^O(n) could also be computed with a two-layer circuit using only O(n) gates (that is, a deep net with just one hidden layer with a linear number of gates). This is an embarrassing state of knowledge.

Now we know the following slightly less embarrassing thing: there is an explicit function, computable in linear time, that needs at least n^(3/2) gates in a two-layer deep net (modulo some log(n) factors), and another function that needs n^(3/2) gates in a three-layer deep net (with the additional restriction that the output gate is a majority vote of the previous layer).

This is still a long way from truly understanding the representational power of deep nets, but it's the first general progress that has been made since the early '90s.
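To make the circuit model concrete, here is a minimal Python sketch (my own illustration; the function names are made up, and this is not from the paper above). A gate outputs 1 iff a weighted sum of its Boolean inputs meets a threshold, and a "two-layer deep net" is one hidden layer of such gates feeding a single output gate:

```python
def threshold_gate(weights, theta, inputs):
    """Linear threshold gate: return 1 if sum_i w_i * x_i >= theta, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= theta)

# MAJORITY of 3 bits is a single threshold gate (all weights 1, threshold 2):
maj = threshold_gate([1, 1, 1], 2, [1, 0, 1])  # -> 1

# XOR is not computable by any single threshold gate, but a two-layer
# threshold circuit computes it as (x OR y) AND NOT(x AND y):
def xor_circuit(x, y):
    g1 = threshold_gate([1, 1], 1, [x, y])      # hidden gate: x OR y
    g2 = threshold_gate([-1, -1], -1, [x, y])   # hidden gate: NOT(x AND y)
    return threshold_gate([1, 1], 2, [g1, g2])  # output gate: g1 AND g2
```

The lower bounds above are about exactly this kind of object: how many hidden gates any such circuit must have to compute a given explicit function.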

 Actually, there have been a few papers showing solid theoretical progress on understanding what structure in data deep networks are learning to represent:

https://github.com/gregversteeg/CorEx
http://arxiv.org/abs/1406.1222
http://arxiv.org/abs/1410.7404

Disclaimer: I reinvented the idea about a month or two ago, and ran into the existing papers when googling for joint entropy estimators after making some interesting graphs with hierarchical probabilistic programs. I only wish I could've been the first!
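For context on the CorEx line of work: it searches for latent factors that explain the total correlation among the observed variables, TC(X) = sum_i H(X_i) - H(X_1, ..., X_k). A minimal plug-in estimate for discrete samples (my own sketch, not the CorEx implementation):

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def total_correlation(rows):
    """TC(X_1..X_k) = sum_i H(X_i) - H(X_1,...,X_k), estimated from samples.
    rows: list of equal-length tuples, one tuple per observation."""
    columns = zip(*rows)
    return sum(entropy_bits(col) for col in columns) - entropy_bits(rows)

# Two perfectly correlated fair bits: TC = 1 + 1 - 1 = 1 bit; a single
# latent variable Y = X1 would screen off all of that dependence.
tc = total_correlation([(0, 0), (1, 1), (0, 0), (1, 1)])  # -> 1.0
```

TC is zero exactly when the variables are independent, which is why maximizing the TC explained by latents is a sensible "find the hidden structure" objective.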
 An interesting and early (1992) demonstration of deep nets learning latent structure in the visual domain was presented by Usui et al. They trained a deep autoencoder on the spectral values of color chips and showed that the hidden-layer representation corresponds to the Munsell color space, where each dimension corresponds to a psychological/intuitive color attribute. Pretty amazing. Even cooler that they showed this in 1992.

Shiro Usui, Shigeki Nakauchi, and Masae Nakano, "Reconstruction of Munsell color space by a five-layer neural network," J. Opt. Soc. Am. A 9, 516-520 (1992). doi:10.1364/JOSAA.9.000516
 Those papers are very interesting. I was particularly surprised that the method exactly recovers the Big-5 components of personality from the results of the survey questions.

The whole idea of "simplest model that explains most of the data" has always been very appealing to me. The concept is closely tied to reproducing kernel Hilbert spaces, which have recently experienced a revival of interest due to the representer theorem.
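A quick illustration of the representer theorem (my own sketch, with made-up data): the regularized RKHS minimizer is always a finite kernel expansion f(x) = sum_i alpha_i k(x_i, x), so an a priori infinite-dimensional problem reduces to solving for n coefficients, as in kernel ridge regression:

```python
import numpy as np

def rbf_kernel(a, b, gamma=50.0):
    """Gaussian RBF kernel matrix k(a_i, b_j) = exp(-gamma * (a_i - b_j)^2)."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# Made-up 1-D regression problem: 20 samples of a sine wave.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)

# Kernel ridge regression: solve (K + lam * I) alpha = y for the coefficients.
K = rbf_kernel(x, x)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(x)), y)

# By the representer theorem, the fitted function is a weighted sum of
# kernels centered at the training points:
def f(x_new):
    return rbf_kernel(np.atleast_1d(x_new), x) @ alpha
```

Here `f(0.25)` should land close to sin(pi/2) = 1; the point is only that the whole fitted function lives in the span of the n kernel sections.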
 > I was particularly surprised that the method exactly recovers the Big-5 components of personality from the results of the survey questions.

Why? Is that a standard dataset? They could have just cherry-picked tasks.

I also noticed that they seem to have the actual computation working only for discrete random variables (integer features) right now, which limits its applicability. They also seem to use mass-function estimators, which again can work well for discrete data while becoming statistically intractable when dealing with continuous random variables.

> The whole idea of "simplest model that explains most of the data" has always been very appealing to me.

In this case it's a bit more like, "the set of latent variables that best screen off the observables from each other."
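To illustrate the continuous-variable problem (my own sketch): a plug-in (mass-function) entropy estimate on binned continuous data depends entirely on the binning, and grows without bound as the bins shrink, which is one reason these estimators don't transfer to continuous variables:

```python
import math
import random
from collections import Counter

def plugin_entropy_bits(samples):
    """Empirical (plug-in) entropy in bits; meaningful only for discrete data."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

random.seed(0)
xs = [random.random() for _ in range(10000)]  # continuous U[0, 1) samples

# Discretize with b equiprobable bins: the estimate tracks log2(b), so it
# diverges as the bins shrink instead of converging to anything.
for b in (4, 16, 256):
    binned = [int(x * b) for x in xs]
    print(b, round(plugin_entropy_bits(binned), 2))  # roughly 2, 4, 8 bits
```

Getting a binning-free answer requires genuinely continuous estimators (differential entropy via k-NN or kernel density methods), which is a much harder statistical problem.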
 You will probably be interested in Kolmogorov complexity, which studies this through an information-theoretic lens. However, it is an abstract concept: it is uncomputable in general, so not something you can measure precisely.
 I meant "deep nets" in the sense of Boolean circuits, which seems like a different beast from these papers.
 > reinvented the idea about a month or two ago, and ran into the existing papers

This happens all the time. I'm thankful that you took the time to post the results, because a discovery that's not communicated is not really a discovery. Indeed, I suspect that what's going on in the field is as much a lack of communication as a lack of understanding.

I wish the author of the original article had included links to at least a couple of the "number of tools to probe the geometry of these hidden structures."
 That's really interesting, especially with respect to the personality-trait recovery. I'm particularly amazed given that I've never been able to replicate the Big5 model in its entirety against particular datasets.

I'm only commenting on that portion, as it's what I know. The author doesn't appear to give any indication of which test was used or where the sample was taken from. This makes me a little suspicious, but it's probably just a disciplinary thing.

Additionally, many tests reduce the pool of available questions for a scale after testing it against other scales plus theory. This means that what the author is recovering is the scale imposed on the items by the original authors.

Nonetheless, this has forced me to reinvestigate the Big5 model, so it's not all bad :).

Massive appreciation for the links.
 Possibly also of interest: Recursive Neural Networks Can Learn Logical Semantics [1].

Probably not as theoretical as the work you referenced, but interesting to me because of the deeply practical outcomes in NLP.
 > This is still a long way away from truly understanding the representational power of deep nets, but it's the first general progress that has been made since the early 90's.

Exactly. Someone should tell Yann LeCun this -- see, e.g., Section 3.2 in [1], or pretty much every time he brings up circuit complexity theory to automagically imply that "most functions representable compactly with a deep architecture would require a very large number of components if represented with a shallow one." [ibid., p. 14]
 Related: "On the Expressive Power of Deep Learning: A Tensor Analysis". http://arxiv.org/abs/1509.05009
