
While I haven't read those particular papers yet, one common pattern in the ML literature seems to be:

1) Identify a commonly seen problem with deep learning architectures, whether that's the need for large volumes of training data, the lack of transfer learning, etc.

2) Invent a solution to the problem.

3) Test that the solution works on toy examples, like MNIST, simple block worlds, simulated data, etc.

4) Hint that the technique, now proven to work, will naturally be extended to real data sets very soon, so we should consider the problem basically solved now. Hooray!

5) Return to step #1. If anyone applies the technique to real data sets, they find, of course, that it doesn't generalize well and works only on toy examples.

This is simply another form of what happened in the 60s and 70s, when many expected that SHRDLU and ELIZA would rapidly be extended to human-like, general-purpose intelligences, with just a bit of tweaking and a bit more computing power. Of course, that never happened. We still don't have that today, and when we do, I'm sure the architecture will look very very different from 1970s AI (or modern chatbots, for that matter, which are mostly built the same way as ELIZA).

I don't mean to be too cynical. Like I said, I haven't read those particular papers yet, so I can't fairly pass judgement on them. I'm just saying that, historically, saying problem X "has been addressed" by Y doesn't always mean very much. See also, e.g., the classic paper "Artificial Intelligence Meets Natural Stupidity": https://dl.acm.org/citation.cfm?id=1045340.

EDIT: To be clear, I'm not saying that people shouldn't explore new architectures, test new ideas, or write up papers about them, even if they haven't been proven to work yet. That's part of what research is. The problem comes when there's an expectation that an idea about how to solve the problem means that the problem is close to being solved. Most ideas don't work very well and have to be abandoned later. For example, this is the Neural Turing Machine paper from a few years back:

https://arxiv.org/pdf/1410.5401.pdf

It's a cool idea. I'm glad someone tried it out. But the paper was widely advertised in the mainstream press as being successful, even though it was not tested on "hard" data sets, and (to the best of my knowledge) it still hasn't been, several years later. That creates unrealistic expectations.




While I don't completely disagree with you, how would you propose researchers go about solving the problem?

If anything, machine learning is applied to real world problems these days more than it ever was.

For better or worse, AGI is a hard problem that's going to take a long time to solve. And we're not going to solve it without exploring what works and what doesn't.


I think the mere fact that the OP feels the need to state that (paraphrasing) "additional techniques besides deep learning will likely be necessary to reach AGI" reveals just how deeply the hype has infected the research community. This overblown self-delusion infects reporting on self-driving cars, automatic translation, facial recognition, content generation, and any number of other tasks that have reached the sort-of-works-but-not-really point with deep learning methods. But however rapid recent progress has been, these things won't be "solved" anytime soon, and we keep falling into the trap of believing the hype based on toy results. It would be better for researchers, investors, and society to be a little more skeptical of the claim that "computers can solve everything, we're 80% of the way there, just give us more time and money, and don't try to solve the problems any other way while you wait!"


Agreed. The hype surrounding machine learning is quite disproportionate to what's actually going on. But it's always been that way with machine learning -- maybe because it captures the public's imagination like few other fields do.

And there are definitely researchers, top ones no less, who play along with the hype. Very likely to secure more funding, and more attention for themselves and the field. Which has turned out to be quite an effective strategy, if you think about it.

The other upside of this hype is that it ends up attracting a lot of really smart people to work on this field, because of the money involved. So each hype cycle leads to greater progress. The crash afterwards might slow things down a bit, particularly in the private sector. But the quantum of government funding available changes much more slowly, and could well last until the next hype cycle starts.


>> The other upside of this hype is that it ends up attracting a lot of really smart people to work on this field, because of the money involved.

The hype certainly attracts people who are "smart" in the sense that they know how to profit from it, but that doesn't mean they can actually do useful research. The result is, like the other poster says, a huge number of papers that claim to have solved really hard problems, which of course remain far from solved; in other words, so much useless noise.

It's what you can expect when you see everyone and their little sister jumping on a bandwagon when the money starts pouring in. Greed is great for making money, but not so much for making progress.


> The result is, like the other poster says, a huge number of papers that claim to have solved really hard problems, which of course remain far from solved; in other words, so much useless noise.

Could the answer be holding these papers to a stricter standard during peer review?


Ah. To give a more controversial answer to your comment: you are asking, very reasonably, "isn't the solution to a deficit of scientific rigour to increase scientific rigour?"

Unfortunately, while machine learning is a very active research field that has contributed much technology, certainly to the industry but also occasionally to the sciences, it has been a long time since anyone has successfully accused it of science. There is not so much a deficit of scientific rigour, as a complete and utter disregard for it.

Machine learning isn't science. It's a bunch of grown-up scientists banging their toy blocks together and gloating for having made the tallest tower.

(there, I said it)


Machine learning researchers publish most of their work on arXiv first (and often, only there), so peer review will not stop wild claims from being publicised and overhyped. The popular press helps with that, as do blogs and YouTube channels that present the latest splashy paper for a lay audience (without, of course, any attempt at critical analysis).

As for traditional publications in the field, these have often been criticised for preferring work that reports high performance. In fact, showing improved performance over some previous work is pretty much a requirement for publication in the most prestigious machine learning conferences and journals. This strongly motivates researchers to focus on one-off solutions to narrow problems, so that they can typeset one of those classic comparison tables with the best results highlighted and claim a new record on some benchmark.

This has now become the norm and it's difficult to see how it is going to change any time soon. Most probably the field will need to go through a serious crisis (an AI winter or something of that magnitude) before things seriously change.


Maybe there needs to be more incentive to publish failures, so that the ways promising approaches fail to generalize become common knowledge? I'm just kibitzing here.


While it's a good idea in principle to publish failures, in practice it's a bit more tricky. So a particular model didn't work. Does that mean the model is fundamentally flawed? Or that you weren't smart enough to engineer it just right? Or that you didn't throw enough computing power at it?

In a vast error landscape of non-working models, a working model is extremely rare and provides valuable information about that local optimum.

The only way publishing non-working models would be useful would be to require the authors to do a rigorous analysis of why exactly the model did not work (which is extremely hard with our current state of knowledge, although some people are starting to attempt this).


>> While it's a good idea in principle to publish failures, in practice it's a bit more tricky. So a particular model didn't work. Does that mean the model is fundamentally flawed? Or that you weren't smart enough to engineer it just right? Or that you didn't throw enough computing power at it?

And yet the field seems to accept that a research team might train a bunch of competing models on a given dataset, compare them to their favourite model and "show" that theirs performs better - even if there's no way to know whether they simply didn't tune the other models as carefully as theirs.
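At a minimum, you'd want every model in the comparison, baselines included, to get the same tuning budget and the same cross-validation folds. A rough sketch of what that would look like (scikit-learn assumed; the models and search grids here are placeholders, not from any particular paper):

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Every contender gets the same search budget and the same CV folds,
    # so no model is quietly tuned more carefully than the others.
    candidates = {
        "logreg": (LogisticRegression(max_iter=2000), {"C": [0.01, 0.1, 1.0, 10.0]}),
        "forest": (RandomForestClassifier(), {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}),
    }

    for name, (model, grid) in candidates.items():
        search = RandomizedSearchCV(model, grid, n_iter=4, cv=5, random_state=0)
        search.fit(X_train, y_train)
        print(name, search.best_params_, search.score(X_test, y_test))

It doesn't remove all bias, but at least the tuning effort is on the record and identical for every model.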


Good question; I edited to clarify what I was saying.


The Neural Turing Machine has been expanded upon considerably since that paper, and inspired a number of other memory augmented neural network architectures, including: https://www.gwern.net/docs/rl/2016-graves.pdf , https://arxiv.org/abs/1605.06065 and https://arxiv.org/abs/1803.10760 .

If you don't want to read a bunch of papers, this video also has a good discussion of that last paper, describing how a memory system based on the NTM has been applied to reinforcement learning with very impressive results, which seem to me like a significant step towards human-like, general-purpose intelligence: https://www.youtube.com/watch?v=9z3_tJAu7MQ
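For anyone who'd rather see the core mechanism than read the papers: all of these memory-augmented architectures share a differentiable, content-addressed read from an external memory matrix. Here's a rough sketch of just that addressing step (PyTorch assumed; the names and shapes are mine, not from the papers, and the full controller and write logic are omitted):

    import torch
    import torch.nn.functional as F

    def content_read(memory, key, beta):
        # memory: (N, M) matrix of N slots, each an M-dimensional vector
        # key:    (M,)   query vector emitted by the controller
        # beta:   scalar sharpening factor (> 0)
        sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)  # (N,)
        weights = torch.softmax(beta * sim, dim=0)                  # (N,)
        return weights @ memory                                     # read vector, (M,)

    memory = torch.randn(128, 32)   # 128 slots of width 32
    key = torch.randn(32)
    read_vector = content_read(memory, key, beta=5.0)

Because the read is a soft, weighted sum rather than a hard lookup, the whole thing stays differentiable and can be trained end to end, which is the point of the NTM line of work.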


Exciting to see a video referencing the MERLIN paper!

I hope DeepMind decides to release the code. The paper outlines the architecture well, but reproducing RL results is very tricky. With so many interlinked neural networks in the closed loop, it'll be slow going to isolate failures due to bugs from those due to unfortunate hyperparameter selection or starting seed.
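One thing that at least helps separate bugs from bad luck is pinning every source of randomness up front and rerunning across a handful of seeds. A rough sketch of what I'd start with (PyTorch and numpy assumed; train_agent is just a hypothetical stand-in for the actual training loop):

    import random
    import numpy as np
    import torch

    def seed_everything(seed):
        # Pin the Python, numpy and torch RNGs so reruns are comparable
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Trade some speed for deterministic cuDNN kernels
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    # If a result only holds for one seed, it's probably luck, not the architecture
    for seed in (0, 1, 2, 3, 4):
        seed_everything(seed)
        # train_agent(seed)  # hypothetical training entry point, not from the paper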

I'd still like to try, though!


Yeah, it is super interesting. I've been gradually working on reproducing it too, mostly as a way to challenge myself and to try to keep on top of some of the cool RL research that has been coming out lately, but I've still got a bit left to do. I started out by working on World Models (https://worldmodels.github.io/) since it is conceptually similar, but without the memory system and the components are more isolated and easier to test. It has been a lot of fun though, and all the background reading has been very educational!

In the MERLIN paper I do appreciate how thorough the description of the architecture is, especially compared to some of the earlier deep RL papers. I'm hoping that since it's just a preprint we may get code released when/if it gets officially published, although maybe that's not too likely given their history.


> 3) Test that the solution works on toy examples, like MNIST, simple block worlds, simulated data, etc.

You're right: MNIST, ImageNet, etc. are toy examples that do not extend into the real world. But the point of reproducible research is to experiment on agreed-upon, existing benchmarks.
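That's a big part of why MNIST persists: anyone can pull the identical train/test split in a couple of lines and compare numbers directly. For example, with torchvision (one common way to get it, assumed here):

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # The canonical 60k/10k MNIST split that everyone reports numbers on
    transform = transforms.ToTensor()
    train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
    test_set = datasets.MNIST("./data", train=False, download=True, transform=transform)

    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=1000, shuffle=False)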



