If you think this is a symptom of AI winter, then you are probably wasting time on outdated/dysfunctional models, or on models that aren't suited to what you want to accomplish. Look at Google Duplex, for example (better voice synthesis than the Vocaloid I use for making music): it pushed the state of the art to unbelievable levels in hard-to-address domains. I believe the whole SW industry will spend the next 10 years gradually folding these concepts into production.
If you think Deep (Reinforcement) Learning is going to solve AGI, you are out of luck. If however you think it's useless and won't bring us anywhere, you are guaranteed to be wrong. Frankly, if you work with Deep Learning daily, you are probably not seeing the big picture (i.e. how horrible the methods used in real life are, and how easily you can get a very economical 5% benefit just by plugging Deep Learning in somewhere in the pipeline; this might seem like little, but managers would kill for 5% of extra profit).
Understand that over the past several years the general public has been exposed to pop-sci stories from well-respected people like Stephen Hawking and Elon Musk warning about the singularity (http://time.com/3614349/artificial-intelligence-singularity-...). Autonomous vehicles are on the roads and Boston Dynamics is showing very real robot demonstrations. Deep learning is breaking records in what we thought was possible with machine learning. All of this progress has excited an irrational exuberance in the general public.
But people don't have a good concept of what these technologies can't do, mainly because researchers, business people, and journalists don't want to tell them; they want the money and attention. Eventually, though, the general public wises up to the unfulfilled expectations and turns its attention elsewhere. That is the AI winter.
Further, I have repeatedly heard people who should know better, with very fancy advanced degrees, chant variants of "Deep Learning gets better with more data" and/or "Deep Learning makes feature engineering obsolete" as if they are trying to convince everyone around them as well as themselves that these two fallacious assumptions are the revealed truth handed down to mere mortals by the 4 horsemen of the field.
That said, if you put your ~10,000 hours into this, and keep up with the field, it's pretty impressive what high-dimensional classification and regression can do. Judea Pearl concurs: https://www.theatlantic.com/technology/archive/2018/05/machi...
My personal (and admittedly biased) belief is that if you combine DL with GOFAI and/or simulation, you can indeed work magic. AlphaZero is strong evidence of that, no? And the author of the article in this thread is apparently attempting to do the same sort of thing for self-driving cars. I wouldn't call this part of the field irrational exuberance, I'd call it amazing.
I think even if you avoid constructing features, you are basically doing a similar process, where a single change in a hyper-parameter can have significant effects (see the sketch after this list):
- internal structure of a model (what types of blocks are you using and how do you connect them, what are they capable of together, how do gradients propagate?)
- loss function (great results come only if you use a fitting loss function)
- class weights (e.g. up-weighting under-represented classes)
- image/data augmentation (a self-driving car won't work at all without significant augmentation)
- properly set-up optimizer
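To make the list concrete, here is a minimal PyTorch-flavored sketch of where each of these knobs lives in a typical image-classification setup; all layer sizes, weights, and values are illustrative, not a recommended configuration:

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# internal structure of the model: which blocks you use and how they connect
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # block choice
    nn.BatchNorm2d(32),                          # helps gradients propagate
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 2),
)

# class weights: up-weight the under-represented class
class_weights = torch.tensor([1.0, 5.0])

# loss function: results hinge on picking one that fits the task
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# data augmentation: essential in domains like self-driving
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.3, contrast=0.3),
    T.ToTensor(),
])

# properly set-up optimizer: type, learning rate, weight decay all matter
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
```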
The good thing here is that you can automate the optimization of these to a large extent if you have a cluster of machines and a way to orchestrate meta-optimization of slightly changed models. With feature engineering you have to do all the work upfront, thinking about what might be important, and often you simply miss important features :-(
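A toy sketch of that kind of meta-optimization: random search over a few of the knobs above, where each trial could be farmed out to a machine in the cluster. `train_and_evaluate` is a hypothetical stand-in for a full training run and just returns a fake score here:

```python
import random

# Hypothetical search space over a few of the knobs from the list above.
search_space = {
    "lr": [1e-2, 1e-3, 1e-4],
    "weight_decay": [0.0, 1e-4, 1e-3],
    "minority_class_weight": [1.0, 2.0, 5.0],
    "augmentation_strength": [0.1, 0.3, 0.5],
}

def sample_config():
    # draw one slightly changed model configuration
    return {k: random.choice(v) for k, v in search_space.items()}

def train_and_evaluate(config):
    # hypothetical stand-in: in reality this trains a model with `config`
    # (ideally on another machine in the cluster) and returns a validation
    # score; faked here so the sketch runs end to end
    return random.random()

best_score, best_config = float("-inf"), None
for _ in range(100):  # trials are embarrassingly parallel in practice
    config = sample_config()
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)
```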
Curious if there is any good-quality open-source project for this...
And more importantly, business and government leaders wise up and turn off the money tap.
I think they also happen when the best ideas in the field run into the brick wall of insufficiently developed computer technology. I remember writing code for a perceptron in the '90s on an 8-bit system with 64 KB of RAM; it's laughable.
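For perspective, the whole classic algorithm really is a handful of lines in modern Python (a sketch learning the AND function, obviously not the original 8-bit code):

```python
# Classic perceptron update rule, here learning the AND function.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):  # epochs; AND is linearly separable, so this converges
    for (x1, x2), target in data:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        err = target - pred
        w[0] += lr * err * x1   # move the weights toward the target
        w[1] += lr * err * x2
        b += lr * err

print(w, b)  # learned weights and bias implementing AND
```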
But right now compute power and data storage seem plentiful, so rumors of the current wave's demise appear exaggerated.
Most of the software world will have to move to stuff like Haskell or other functional languages. As of now the bulk (almost all) of our people are trained to program in C-based languages.
It won't be easy, and it will reshuffle which software jobs are in high demand.
A symptom of capitalism and marketing trying to push shit they don't understand
- how good humans are at detecting cancer (hint: not very good), and whether an automated system might be useful even as a "second opinion" next to an expert
- there are metrics capturing true/false positives/negatives that one can focus on while optimizing the training
From studies you might have noticed that expert radiologists have, e.g., an F1 score of 0.45, while on average they score 0.39, which sounds really bad. Your system manages to push the average to 0.44, which might be worse than the best radiologist out there, but better than the average radiologist. Is this really being oversold? (I am not addressing possible problems with overly optimistic datasets etc., which are real concerns.)
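For concreteness, a quick sketch of how an F1 score like those falls out of the raw true/false positive counts mentioned above; the counts are made up to land near 0.45, not taken from any study:

```python
# Made-up confusion counts: true positives, false positives, false negatives.
tp, fp, fn = 45, 55, 55

precision = tp / (tp + fp)                        # 0.45
recall = tp / (tp + fn)                           # 0.45
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```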
The problem AI runs into is that with too much faith in the machine, people STOP thinking and believe the machine. Where you might get a 0.44 detection rate on radiology data alone, that radiologist with a 0.39, or a doctor, can consult alternate streams of information. The AI may still be helpful in reinforcing a decision to continue scrutinizing a problem.
AIs as we call them today are better referred to as expert systems; "AI" carries too much baggage to be thrown around willy-nilly. An expert system may beat a human at interpreting large, unintuitive datasets, but such systems aren't generally testable, and like it or not, they will remain a tough sell in any situation where lives are on the line.
I'm not saying it isn't worth researching, but AI will continue to fight an uphill battle in terms of public acceptance outside of research or analytics spaces, and overselling or being anything but straightforward about what is going on under the hood will NOT help.
In medicine, I want everyone to apply appropriate skepticism to important results, and I don't want to enable lazy radiologists to zone out and press 'Y' all day. I want all the doctors to be maximally mentally engaged. Skepticism of an incorrect radiologist report recently saved my dad from some dangerous, and in his case unnecessary, treatment.
And there's probably not even a boolean "ground truth" in complicated bio-medicine problems. Sometimes the right call is neither yes nor no, but: this is not like anything I've seen before, I can't give a decision either way, I need further tests.
State of the art in numbers (approximate cost and time to train, per benchmark):
- Image Classification - ~$55, 9 hrs (ImageNet)
- Object Detection - ~$40, 6 hrs (COCO)
- Machine Translation - ~$40, 6 hrs (WMT '14 EN-DE)
- Question Answering - ~$5, 0.8 hrs (SQuAD)
- Speech Recognition - ~$90, 13 hrs (LibriSpeech)
- Language Modeling - ~$490, 74 hrs (LM1B)
I don't know. Duplex equipped with a way to minimize its own uncertainties sounds quite scary.
Microsoft OTOH quietly shipped the equivalent in China last month: https://www.theverge.com/2018/5/22/17379508/microsoft-xiaoic...
Google has lost a lot of steam lately IMO. Facebook is releasing better tools and Microsoft, the company they nearly vanquished a decade ago, is releasing better products. Google does remain the master of its own hype though.
Google nearly vanquished Microsoft a decade ago?
Where can I read more about this bit of history :) ?
IMO, Axios seems to do a better job of criticizing Google's Duplex AI claims, as its reporters repeatedly reached out to their contacts at Google for answers.
Microsoft was previously the gatekeeper to almost every interaction with software (roughly 1992-2002). I don't know of good books on it, but Tim O'Reilly wrote quite a bit about Web 2.0.
I'm quite familiar with Google's history and would not characterize them as having vanquished Microsoft.
For the most part, Microsoft doesn't need to lose for Google to win (except of course in the realm of web search and office productivity).
Unfortunately, by the time of my brief stint at Google, the place was a professional dead-end where most of the new hires got smoke blown up their patooties at orientation about how amazing they were to be accepted into Google, only to be blindly allocated into me-too MVPs of stuff they'd read about on TechCrunch. All IMO of course.
That said, I met the early Google Brain team there and I apparently made a sufficiently negative first impression for one of their leaders to hold a grudge against me 6 years later, explaining at last who it was that had blacklisted me there. So at least that mystery is solved.
PS It was pretty obvious these were voice actors in a studio conversing with the AI. That is impressive, but speaking as a former DJ myself, when one has any degree of voice training, one pronounces words without much accent and without slurring them together. Google will likely never admit anything here: they don't have to.
But I will give Alphabet a point for Waymo being the most professionally-responsible self-driving car effort so far. Compare and contrast with Tesla and Uber.
We haven't found life outside this planet, and we haven't created life in a lab, therefore n=1 for assessing probability of life outside earth (which means we can't calculate a probability for this yet). Likewise, we haven't created anything remotely like animal intelligence (let alone human) and we have no good theory regarding how it works, so n=1 for existing forms of general intelligence.
Note that I'm not saying there can be no extraterrestrial life or that we will never develop AGI, just that I haven't seen any evidence at this point in time that any opinions for or against their possibility are anything more than baseless speculation.
"To train the system in a new domain, we use real-time supervised training. This is comparable to the training practices of many disciplines, where an instructor supervises a student as they are doing their job, providing guidance as needed, and making sure that the task is performed at the instructor’s level of quality. In the Duplex system, experienced operators act as the instructors. By monitoring the system as it makes phone calls in a new domain, they can affect the behavior of the system in real time as needed. This continues until the system performs at the desired quality level, at which point the supervision stops and the system can make calls autonomously." --
OK, but 83% ROC/AUC is nothing to be bragging about. ROC/AUC routinely overstates the performance of a classifier anyway, and even so, ~80% values aren't that great in any domain. I wouldn't trust my life to that level of performance, unless I had no other choice.
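To make the overstatement point concrete, here is a synthetic sketch (made-up Gaussian scores, 1% positive class) where the AUC looks respectable while precision at a reasonable threshold is dismal:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_score

rng = np.random.default_rng(0)
n_neg, n_pos = 99_000, 1_000                          # 1% positive class
scores = np.concatenate([rng.normal(0.0, 1, n_neg),   # negative-class scores
                         rng.normal(1.2, 1, n_pos)])  # positive-class scores
labels = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])

print("ROC AUC:", roc_auc_score(labels, scores))      # ~0.80, looks decent
preds = (scores > 1.0).astype(int)                    # pick a threshold
print("precision:", precision_score(labels, preds))   # ~0.04, unusable
```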
You're basically making the author's case: deep learning clearly outperforms on certain classes of problems, and easily "generalizes" to modest performance on lots of others. But leaping from that to "radiology robots are almost here!" is folly.
There is also a good chance that the next state-of-the-art model will push performance significantly over 83%, or past the best human radiologist, at some point in the future, so it might not be very economical to train humans to become even better (i.e. to dedicate your life to radiology diagnostics only).
The main critique of CheXNet I've read focused on the NIH dataset itself, not the model. The model generalizes quite well across multiple visual domains, given proper augmentation.
Except they don't. See the table in the original post. Also, comparing the "average" radiologist by F1 scores from a single experiment (as you've done in other comments here) is meaningless.
Unless my doctor is exactly average (and isn't incorporating additional information, or smart enough to be optimizing for false positive/negative rates relative to cost), comparison to average statistics is academic. But I don't really need to tell you this -- your comment has so many caveats that you're clearly already aware of the limits of the method.
On one hand, we have one commenter saying he can train a model to do a specific thing with a specific quantitative metric, to demonstrate how deep learning can be incredibly powerful/useful.
On the other hand, we have another commenter saying "But this won't replace my doctor!" and therefore deep learning is overhyped.
The two sides aren't even talking about the same thing.
That kind of hyperventilating stuff is easy to brush off. The problem with deep-learning hype is that comments like "my classifier gets a ROC/AUC score of 0.8 with barely any work!" are presented as meaningful. The gap between a 0.8 AUC and a usable medical technology means that most of the work is still ahead of you.
So I think it comes down to a conflict between
1. What the author is trying to present
2. What an astute reader might interpret it as
3. What an astute reader might worry an uninformed reader might interpret it as
And my feeling is that, given all the talk about hype in pop-sci, we're actually on point 3 now, even when the author and reader are actually talking about something reasonable. Whereas personally I'm more interested in the research and interpretations from experts, which I find tend to be not so problematic.
Just to get back to this point: what if the vision system of your doctor is below average, and you augment her by giving her a statistically better vision system, while allowing her to use the additional sources as she sees fit? Wouldn't that be an improvement? We are talking about the vision subsystem here, not the whole "reasoning package" human doctors possess.
On just about every test set, the model is beaten by radiologists. Even the mean performance is underwhelming.
In their paper they even used the "weaker" DenseNet-121 instead of DenseNet-169 for MURA/bones. DenseNet-BC, which I tried, is another refinement of the same approach.
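For what it's worth, swapping DenseNet capacities is a one-line change in a recent torchvision (an illustrative sketch, not the paper's training setup; DenseNet-BC-100-12 itself isn't bundled with torchvision):

```python
import torch.nn as nn
import torchvision.models as models

# Swap backbone capacity with a one-line change (recent torchvision assumed).
backbone = models.densenet121(weights="IMAGENET1K_V1")    # the "weaker" variant
# backbone = models.densenet169(weights="IMAGENET1K_V1")  # the larger one

# Re-head the classifier for a 14-label chest X-ray task, CheXNet-style.
backbone.classifier = nn.Linear(backbone.classifier.in_features, 14)
```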
Basically, I see this as "everyone sucks, but the AI maybe sucks a little less than the worst of our radiologists, on average"
Some people mention the Matthews correlation coefficient, Youden's J statistic, Cohen's kappa, etc., but I haven't seen them in any Deep Learning paper so far, and I bet they have large blind spots as well.
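For reference, those metrics are a couple of lines with scikit-learn; Youden's J isn't built in, but it's just sensitivity + specificity - 1 from the confusion matrix (labels below are toy data):

```python
from sklearn.metrics import (matthews_corrcoef, cohen_kappa_score,
                             confusion_matrix)

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]   # toy ground-truth labels
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]   # toy predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
youden_j = tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1

print("MCC:", matthews_corrcoef(y_true, y_pred))
print("Cohen's kappa:", cohen_kappa_score(y_true, y_pred))
print("Youden's J:", youden_j)
```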
Of course! Using DenseNet-BC-100-12 to increase ROC AUC, it was so obvious!