1) Identify a commonly seen problem with deep learning architectures, whether that's large data volumes, lack of transfer learning, etc.
2) Invent a solution to the problem.
3) Test that the solution works on toy examples, like MNIST, simple block worlds, simulated data, etc.
4) Hint that the technique, now proven to work, will naturally be extended to real data sets very soon, so we should consider the problem basically solved now. Hooray!
5) Return to step #1. If anyone applies the technique to real data sets, they find, of course, that it doesn't generalize well and works only on toy examples.
This is simply another form of what happened in the 60s and 70s, when many expected that SHRDLU and ELIZA would rapidly be extended into human-like, general-purpose intelligences with just a bit of tweaking and a bit more computing power. Of course, that never happened. We still don't have that today, and when we do, I'm sure the architecture will look very different from 1970s AI (or from modern chatbots, for that matter, which are mostly built the same way as ELIZA).
I don't mean to be too cynical. Like I said, I haven't read those particular papers yet, so I can't fairly pass judgement on them. I'm just saying that, historically, saying problem X "has been addressed" by Y doesn't always mean very much. See also, e.g., the classic paper "Artificial Intelligence Meets Natural Stupidity": https://dl.acm.org/citation.cfm?id=1045340.
EDIT: To be clear, I'm not saying that people shouldn't explore new architectures, test new ideas, or write up papers about them, even if they haven't been proven to work yet. That's part of what research is. The problem comes when there's an expectation that an idea about how to solve the problem means the problem is close to being solved. Most ideas don't work very well and have to be abandoned later. As one example, consider the Neural Turing Machine paper from a few years back:
It's a cool idea. I'm glad someone tried it out. But the paper was widely advertised in the mainstream press as being successful, even though it was not tested on "hard" data sets, and (to the best of my knowledge) it still hasn't been, several years later. That creates unrealistic expectations.
If anything, machine learning is applied to real world problems these days more than it ever was.
For better or worse, AGI is a hard problem that's going to take a long time to solve. And we're not going to solve it without exploring what works and what doesn't.
And there are definitely researchers, top ones no less, who play along with the hype, very likely to secure more funding and more attention for themselves and the field. Which has turned out to be quite an effective strategy, if you think about it.
The other upside of this hype is that it ends up attracting a lot of really smart people to work on this field, because of the money involved. So each hype cycle leads to greater progress.
The crash afterwards might slow things down a bit, particularly in the private sector. But the amount of government funding available changes much more slowly, and could well last until the next hype cycle starts.
The hype certainly attracts people who are "smart" in the sense that they know how to profit from it, but that doesn't mean they can actually do useful research. The result is, like the other poster says, a huge number of papers that claim to have solved really hard problems, which of course remain far from solved; in other words, so much useless noise.
It's what you can expect when you see everyone and their little sister jumping on a bandwagon when the money starts pouring in. Greed is great for making money, but not so much for making progress.
Could the answer be holding these papers to a stricter standard during peer review?
Unfortunately, while machine learning is a very active research field that has contributed much technology, certainly to the industry but also occasionally to the sciences, it has been a long time since anyone has successfully accused it of science. There is not so much a deficit of scientific rigour, as a complete and utter disregard for it.
Machine learning isn't science. It's a bunch of grown-up scientists banging their toy blocks together and gloating for having made the tallest tower.
(there, I said it)
As to traditional publications in the field, these have often been criticised for their preference for work reporting high performance. In fact that's pretty much a requirement for publication in the most prestigious machine learning conferences and journals, to show improved performance against some previous work. This strongly motivates researchers to focus on one-off solutions to narrow problems, so that they can typeset one of those classic comparison tables with the best results highlighted, and claim a new record in some benchmark.
This has now become the norm and it's difficult to see how it is going to change any time soon. Most probably the field will need to go through a serious crisis (an AI winter or something of that magnitude) before things seriously change.
In a vast error landscape of non-working models, a working model is extremely rare and provides valuable information about that local optimum.
The only way publishing non-working models would be useful would be to require the authors to do a rigorous analysis of why exactly the model did not work (which is extremely hard with our current state of knowledge, although some people are starting to attempt this).
And yet the field seems to accept that a research team might train a bunch of competing models on a given dataset, compare them to their favourite model, and "show" that theirs performs better, even if there's no way to know whether they simply didn't tune the other models as carefully as their own.
If you don't want to read a bunch of papers, this video also has a good discussion of that last paper, describing how a memory system based on the NTM has been applied to reinforcement learning to achieve very impressive results that seem to me to be a very significant step towards human-like, general-purpose intelligence: https://www.youtube.com/watch?v=9z3_tJAu7MQ
I hope Deepmind decides to release the code. The paper outlines the architecture well, but reproducing RL results is very tricky. With so many interlinked neural networks in the closed loop, it'll be slow going to isolate failures due to bugs from failures due to unfortunate hyperparameter selection or starting seed.
I'd still like to try, though!
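One basic tactic for separating bugs from seed variance is to make each run fully deterministic given its seed: two runs with the same seed should then be bit-identical, so any divergence points at a bug rather than randomness. Here's a minimal sketch of the idea using a toy stand-in for a training run (the `run_trial` function is hypothetical, not from the Merlin paper):

```python
import random

def run_trial(seed, steps=100):
    """Toy stand-in for an RL training run: a seeded random walk.

    All randomness comes from a local Random(seed) instance, so the
    outcome is a pure function of the seed (hypothetical example).
    """
    rng = random.Random(seed)
    position = 0.0
    for _ in range(steps):
        position += rng.uniform(-1.0, 1.0)
    return position

# Same seed, same result: a mismatch here would indicate a bug,
# not unlucky initialization.
assert run_trial(42) == run_trial(42)

# Sweeping seeds shows how much of the outcome is initialization
# luck rather than the merit of the architecture.
outcomes = [run_trial(s) for s in range(5)]
print(min(outcomes), max(outcomes))
```

In a real deep RL codebase you'd also have to seed the framework's RNGs and the environment, and some GPU ops are non-deterministic regardless, which is part of why released code matters so much here.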
In the Merlin paper, I do appreciate how thorough the description of the architecture is, especially compared to some of the earlier deep RL papers. I'm hoping that, since it's just a preprint, we may get code released when/if it gets officially published, although maybe that's not too likely given their history.
You're right: MNIST, ImageNet, etc. are toy examples that do not extend into the real world. But the point of reproducible research is to experiment on agreed-upon, existing benchmarks.