
Deep learning has succeeded tremendously with perception in domains that tolerate lots of noise (audio/visual). Will those successes continue with perception in domains that are not noisy (language) and inference/control, which the article touches on? I think it really is unclear whether those challenges will require fundamental developments or just more years of incremental improvement. If fundamental developments are needed, then the timeline for progress - which everyone in tech seems to be interested in - becomes much more indeterminate.

If you think about audio/visual data, deep nets make sense: if you tweak a few pixel values in an image, or if you shift every pixel value by some amount, the image will still retain basically the same information. In this context, linearity (weighting values and summing them up) makes sense. It's not clear whether this makes sense in language. On the other hand, deep methods are state of the art on most NLP tasks, but their improvement over other methods isn't the huge gap it is in computer vision. And while we know there are tight similarities between lower-level visual features in deep nets and the initial layers of the visual cortex, the justification for deep learning in NLP is simpler and less specific: what I see is the fact that networks have huge capacity to fit to data and are deep (rely on a hierarchy of features). My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

I think there are similar limitations with control and inference. When it comes to AlphaGo, the deep learning component is responsible for estimating the value of the game state; the planning is done with older methods. This is much more speculative, but when it comes to the work on Atari games, for example, I suspect that most of what is being learned (and solved) is perception of useful features from the raw game images. I wonder whether the features needed for deducing the game score are actually that complex.

I think what I'm trying to say is that when we look at the success of deep learning, we have to separate out what part of that is due to the fact that deep learning is the go-to blackbox classifier, and what part of this is due to the systems we use actually being a good model for the problem. If the model isn't good, does that model merely need to be tweaked from what we currently use, or does the model have to completely change?




There is evidence that language is fairly smooth though. For example, we can extract e.g. the gender vector from a word embedding space that is learned by a recurrent neural network. That seems to hint at the possibility that words, sentences and concepts live on a smooth, high-dimensional manifold that makes them learnable for us in the first place (because in that case they can be learned by small local improvements, which seems to be required for biological plausibility). That is also the reason why we often have many words for the same or similar meanings and, conversely, why formal grammars have failed at modeling language.
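
To make that vector arithmetic concrete, here is a toy sketch of extracting such a gender direction (a hypothetical illustration: the embeddings below are random placeholders just to keep it self-contained, and the word list is arbitrary; in practice the vectors would come from a learned model such as word2vec, GloVe or the RNN's embedding layer):

  import numpy as np

  # Toy stand-in for a real embedding table (word2vec/GloVe/an RNN would supply these).
  rng = np.random.default_rng(0)
  emb = {w: rng.normal(size=50) for w in
         ["man", "woman", "king", "queen", "actor", "actress", "he", "she"]}

  # Estimate a "gender direction" by averaging differences of gendered word pairs.
  pairs = [("woman", "man"), ("queen", "king"), ("actress", "actor"), ("she", "he")]
  gender_dir = np.mean([emb[a] - emb[b] for a, b in pairs], axis=0)
  gender_dir /= np.linalg.norm(gender_dir)

  # Project any word onto that direction; with real embeddings, gendered words
  # separate cleanly along it, which is the kind of smoothness described above.
  def gender_score(word):
      v = emb[word]
      return float(np.dot(v / np.linalg.norm(v), gender_dir))

  for w in ["queen", "king", "she", "he"]:
      print(w, round(gender_score(w), 3))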

Arguing from the other direction, neural networks have also already proven able to deal with very sharp features. For example, the value and policy networks in AlphaGo are able to pick up on subtle changes in the game position. The changes from the placement of a single stone can be vast in Go, and by no means is this solved only by the Monte Carlo tree search. Without MCTS, AlphaGo still wins ~80% of the time against the best hand-crafted Go program. The value and policy networks have pretty much evolved a bit of boolean logic, simply from the gradient provided by the smoothness of averaging over a lot of training data.

I have a pet theory that the discovery of sharp features and boolean programs might heavily rely on noise. If the error surface becomes too discrete, we basically need to fall back to pure random optimization (i.e. trying a direction at random and keeping it if it is better). That allows us to skip down the energy surface even without the presence of a gradient. Of course, such noise can also lead to forgetting, but it seems that elsewhere the gradient will be non-zero again, so any mistakes will be corrected by more learning (or it simply leads to further improvement if the step was in the right direction). Surely, our episodic memory helps in the absence of gradient information as well. If we encounter a complex, previously unknown Go strategy, for example, it will likely not smoothly improve all our Go playing abilities by a small amount. Instead, we store a discrete chain of states and actions as an episodic memory, which allows us to reuse that knowledge simply by recalling it at a later point in time.


> I have a pet theory that the discovery of sharp features and boolean programs might heavily rely on noise. If the error surface becomes too discrete, we basically need to fall back to pure random optimization (i.e. trying a direction at random and keeping it if it is better). That allows us to skip down the energy surface even without the presence of a gradient.

Isn't that basically Monte Carlo?


It's called random optimization or random search depending on whether you sample the random direction from a normal or a uniform distribution. MC typically refers to any algorithm that computes approximate solutions using random numbers (as opposed to Las Vegas algorithms, which use random numbers to always compute the correct solution). So, yes, RO, RS and gradient descent are MC local optimization algorithms.
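
For concreteness, here is a minimal sketch of that kind of gradient-free local search (the staircase objective and the step size are purely illustrative):

  import numpy as np

  def random_optimization(f, x0, step=0.5, iters=20_000, seed=0):
      # Propose a random step (normal direction = RO, uniform = RS) and
      # keep it only if the objective improves; no gradient needed.
      rng = np.random.default_rng(seed)
      x = np.asarray(x0, dtype=float)
      fx = f(x)
      for _ in range(iters):
          cand = x + step * rng.normal(size=x.shape)
          fc = f(cand)
          if fc < fx:
              x, fx = cand, fc
      return x, fx

  # A "staircase" objective: piecewise constant, so the gradient is zero almost
  # everywhere and plain gradient descent would go nowhere.
  f = lambda x: float(np.sum(np.floor(np.abs(x))))

  x, fx = random_optimization(f, x0=np.array([7.3, -4.6]))
  print(x, fx)   # fx reaches 0 once both coordinates land inside (-1, 1)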


The very method of using a word embedding space assumes the manifold is smooth, so the fact that vectors extracted from a method that assumes a smooth manifold are in fact on a smooth manifold is just circular and not evidence of anything.


The evidence is that this works in the first place.


That is very, very weak evidence.


It is already succeeding on language tasks, see https://research.facebook.com/research/babi/

It is funny how every AI post on HN turns into a speculative discussion forum full of words "I think", "likely", "I suspect", "My guess" etc, when all the research is available for free and everyone is free to download and read it to get a real understanding of what's going on in the field.

>what I see is the fact that networks have huge capacity to fit to data and are deep (rely on a hierarchy of features).

Actually recurrent neural networks like LSTMs are Turing-complete, i.e. for every halting algorithm it is trivial to implement an RNN that computes it. It is non-trivial to learn these parameters from the algorithm's input/output data, but for many tasks it is possible too.
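
Turing-completeness in full generality needs more machinery (and unbounded precision), but the weaker point, that an RNN can be wired by hand to execute a small program, is easy to show. A toy sketch (my own construction, not taken from any paper): a two-unit vanilla RNN whose fixed weights track the running parity (XOR) of a bit stream.

  import numpy as np

  relu = lambda v: np.maximum(v, 0.0)

  # Hand-set weights: the previous parity is recovered as h1 + h2, and the two
  # units compute relu(parity - x) and relu(x - parity), whose sum is the XOR.
  W = np.array([[ 1.0,  1.0],
                [-1.0, -1.0]])
  U = np.array([[-1.0],
                [ 1.0]])

  def rnn_parity(bits):
      h = np.zeros(2)
      for x in bits:
          h = relu(W @ h + U @ np.array([float(x)]))
      return int(h.sum())           # readout: parity of all bits seen so far

  print(rnn_parity([1, 0, 1, 1]))   # -> 1 (odd number of ones)
  print(rnn_parity([1, 1, 0, 0]))   # -> 0 (even number of ones)

Learning such weights from input/output examples is, of course, the hard part.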

>I suspect that most of what is being learned (and solved) is perception of useful features from the raw game images.

It is not this simple: deep enough convnets can represent computations, and the consensus is that the middle and upper layers of convnets represent some useful computation steps. Also note that the human brain can only do so many computation steps to answer questions in dialogue, due to time and speed limits.

>My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

This is being worked on: see the first link for Memory Networks, and also Stack RNNs, DeQue RNNs and Tree RNNs. Deep learning is a very generic term; there are dozens of various feedforward and recurrent architectures that are fully differentiable. The full potential of such models has not been nearly reached yet and maybe language understanding will be solved in the coming years (again, the first link shows that it is in the process of being solved).


I specifically and mindfully added those words because everything is really an open research question. Would you rather I projected a false sense of confidence? If anything, you're stating your vague case way over-confidently. Turing-completeness is broad and nonspecific. Doing "some computation" is an obvious statement that doesn't add any information. The human brain does not seem to have time limits when it comes to thinking about what to say, and furthermore we don't understand enough about neuroscience to make statements like that. Like I said, these are all active areas of research; the jury is still out on whether any specific approach will be the breakthrough.

EDIT (reply to below): in general these statements are either vague and nonspecific, or perfectly correct and non-informative, comments that don't have much to do with my original point.


I agree with your points. Your comment is a quality one; I was mostly talking about the other ones.

>Turing-completeness is quite broad and nonspecific, like I said.

It is, but feedforward models (and almost every Bayesian/statistical model) don't possess it even in theory, while RNNs do.

>Doing "some computation" is an obvious statement that doesn't add any information.

Let me be more specific: currently researchers think that later stages of CNNs do something that is more interpretable as computation than as mere pattern matching. Our world doesn't require a 50-level hierarchy, but resnets with 50+ layers do well, seemingly because they learn some non-trivial computation.

>the jury is still out on whether any of those RNN approaches will be the needed breakthrough.

Sure, we'll see. Maybe there won't be a need for any breakthrough, just incremental improvement of models. And even current models, when scaled up to next-gen hardware (see Nervana), can surprise us again with their performance.


My skepticism is not about "succeeding" academically in the sense that research groups get better and better scores on Kaggle competitions.

My skepticism is about success in the sense of commercially useful systems that can process language and function "off the leash" of human supervision without the output being dominated by unacceptably bad results.

Look at the Xbox One Kinect vs. the Xbox 360 Kinect. On paper the newer product is much better than the old product, but neither one is any easier or more fun to use than picking up the gamepad. In the current paradigm, researchers can keep putting up better and better numbers without ever crossing the threshold to something anybody can make a living off of.


> It is funny how every AI post on HN turns into a speculative discussion forum full of words "I think", "likely", "I suspect", "My guess" etc

This is probably due to the fact that the field is very interesting and has lots of undefined boundaries, so people like to take educated guesses based on the knowledge they might have and on their intuition. Fair enough for this discussion.

> maybe language understanding will be solved in the coming years

maybe? :)

OK, here comes my guess: I think reasoning about and producing computer programs should be easier than reasoning about and producing natural language. So if that's possible (big if), then it should come first. And then maybe NLP will be solved with the help of code-writing computers. Or maybe just by code-writing computers, and nobody here has a job anymore :)


> Deep learning has succeeded tremendously with perception in domains that tolerate lots of noise (audio/visual). Will those successes continue with perception in domains that are not noisy (language)

I wonder if it's just a different kind of "noise". Higher level, more structured.

> My guess is we may need a fundamental breakthrough in a newfangled hierarchical learning system that is better suited for language to “solve” NLP.

It seems fairly evident that there are many hierarchies inside the brain, each level working with outputs from lower-level processing units. In a sense, something like AlphaGo is hierarchy-poor - it has a few networks loosely coupled to a decision mechanism.

But the brain probably implements a "networks upon networks" model, that may also include hierarchical loops and other types of feedback.

I think, to have truly human-level NLP, we'd have to simulate reasonably closely the whole hierarchy of meaning, which in turn is given by the whole hierarchy of neural aggregates.


Language is noisy. People often say things that have little to do with what they mean and context is really important.

EX: "How long do stars last?" Means something very different in a science class than a tabloid headline. Is that tabloid talking divorce or obscurity? Notice how three sentences in I am clarifying last.


Yep. The problem is that it's _so_ noisy that the encryption, as it were, might be too strong to crack with statistical methods. You might need the key; i.e., something like a human brain.

EDIT: a combination of noise, I should say, and paucity of information.


Well, we also get things wrong all the time. We regularly either ask for further information to decide what people mean, or expect that it's OK to get the interpretation wrong and be corrected.

Asking a computer to solve all the ambiguity in human language perfectly is asking it to solve it far better than any human can.


No, you only need context. Context in the form of knowledge about the place, company and history that the statement is spoken in. Wikipedia will serve well for a lot of that.


Representation of relationships without representation of qualia gives you brittle nonsense - a content-free wireframe of word distributions.

For human-level NLP, you need to model the mechanism by which the relationship network is generated, and ground it in a set of experiences - or some digital analogue of experiences.

Naive statistical methods are not a good way to approach that problem.

So no, Wikipedia will not provide enough context, for all kinds of reasons - not least of which is the fact that human communications include multiple layers of meaning, some of which are contradictory, while others are metaphorical, and all of the above can rely on unstated implication.

Vector arithmetic is not a useful model for that level of verbal reasoning.


But that's the thing with AI. We make the context. In the case of AlphaGo, IBM's Watson, or self-driving cars, we set the goal. There are different heuristics, but we always need to define what is "right" or what the "goal" is.

For an AI to determine its own goals, well, now you get into awareness ... consciousness. At a fundamental mathematical level, we still have no idea how these work.

We can see electrical signals in the brain using tools and know it's a combination of chemicals and pulses that somehow make us do what we do ... but we are still a long way from understanding how that process really works.


> we still have no idea how these work.

I'd actually just say that we've not really defined these very well, and so arguing about how far along the path we are to them isn't that productive.


Sorry, I've edited my original comment to be clearer. What I really meant is that there is wide tolerance of noise in those domains. "How long does stars last" has a completely different meaning than "How long do stars last" - not tolerant of noise.


If a 6th grader asks their science teacher "How long does stars last?" / "How long stars last?" / "How long do stars last?" / "How old do stars get?" / "Stars, how old can they get?" / ...

In a similar context they probably all end up parsed to the same question, assuming correct inflection, posture, etc. Spoken conversations are messy, but they also have redundancy and pseudo-checksums. Written language tends to be more formal because it's a much narrower channel and you don't get as much feedback.

PS: It's also really common for someone to ask a question when they don't have enough context to understand what question they should be asking.


All those sentences sound (mentally) very different. Some of them give you the impression the speaker is an idiot, for example.


I'd suggest "How long do these stars last?" and "How long do these stairs last?" might be a better example. Human language has more redundancy than computer languages and in a real context it would probably still be clear what was meant even if the wrong word was used, but it's still a much spikier landscape with regard to small changes than images are.


I think you're dead on. And I'm nervous about a coming winter, because of the disappointment from all the wolf-crying we're doing about how good at natural language we're getting, when we've barely scratched the surface. This latest bot fad worries me.

A further comment on deep methods being state of the art currently:

I wonder how well these tasks really measure progress in natural language understanding (I really don't like isolating that term as some distinct subdiscipline of broader AI goals, but so be it). Some of Chris Manning's students[1] have at least started down the path of examining some of these new-traditional tasks in language, and found that perhaps they are not as hard as they are claimed to be.

---

[1] A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. Chen, Bolton & Manning [https://arxiv.org/abs/1606.02858]


Chatbots aren't about NLP, IMHO; they are about easing non-technical people into a top-level CLI for everything. IMHO, they ultimately have as much in common with search as with natural language interaction.

IME as a chatbot developer, people don't talk to them in conversational English so much as spit out what they want them to do.


I don't think there will be a winter. There are enough successes in computer vision.


Yeah, absolutely, and those successes are not going anywhere. But as solutions to those problems become more and more rote, funding will still be needed for the bigger problem, where the field continues to fail to deliver on its promises.


But there isn't a model (that I am aware of, and I've done some serious checking, because reasons :) ) that goes beyond, say, Chomsky saying "we dunno". The difference between human language capability and that of our nearest evolutionary neighbors is profound, and we appeal to some emergent phenomenon.

But something about the very use of hierarchy in trying to solve NLP makes me queasy. I think it's more (poetically-metaphorically) like Reed-Solomon codes than hierarchies (to the extent that those don't actually overlap). There is Unexplained Conservation of Information That Really Isn't There To Start With.



