It's not some master algorithm, it's not going to produce sci-fi AI, and it probably isn't even suited to solve most problems in the realm of intelligence.
In fact, it basically hits the worst possible spot on the problem-solving scale. It barely learns anything given the amount of computational effort and data that goes into it, but it just happens to be good enough to be practically preferable to the old symbolic systems.
It is completely mysterious to me how networks that approximate some utility function are a huge step forward toward giving insight into cognition, reasoning, modelling, the creation of counterfactuals, and the sort of mechanisms we actually need to produce human-like performance.
That said, the lack of leadership stemmed from the MBA mindset that data science is the be-all and end-all of AI, which IME is far from the truth.
There need to be many layers/spirals of naive methods (i.e. straight-ahead engineering) as a vanguard on the front lines, with DS/NNs/NLP bringing up the flanks for aspects of the domains that become better understood over time.
So yeah: tail wagging the dog, cart before the horse, etc. If you just barrel ahead with a monolithic NN that pretends all classes are created equal (a structureless blob), which is what our DS PhDs were doing, it will quickly ossify. In our case it was intent prediction, so we got to pretty high accuracy, but the labels indicated many different categories that actually had relations in nature that could not be expressed. Ironically, it took some regular engineers to research HMC (hierarchical multi-label classification) and ensembles and to implement a new model-training framework to support them. Not sure what they teach in school, but it doesn't seem to be very practical.
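To make the idea concrete, here is a toy sketch of one common hierarchical-classification strategy (a local classifier per parent node). The labels, data, and model choices are entirely made up for illustration; the point is just that a parent-level model routes to specialized child models.

```python
# Sketch of hierarchical classification via a local classifier per parent
# node: one model picks the top-level category, then a per-category model
# picks the leaf intent. Labels and data here are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
parents = rng.choice(["billing", "support"], size=400)
# Leaf intents nested under each parent.
leaves = np.where(parents == "billing",
                  rng.choice(["refund", "invoice"], size=400),
                  rng.choice(["outage", "howto"], size=400))

# Root model picks the parent category; one child model per parent picks
# the leaf, trained only on that parent's subset of the data.
root = LogisticRegression(max_iter=1000).fit(X, parents)
children = {p: LogisticRegression(max_iter=1000).fit(X[parents == p],
                                                     leaves[parents == p])
            for p in ["billing", "support"]}

def predict(x):
    p = root.predict(x.reshape(1, -1))[0]                 # parent first
    return p, children[p].predict(x.reshape(1, -1))[0]    # then the leaf
```

This structure lets each child model specialize on its subdomain, and relations among labels (the hierarchy) are expressed explicitly rather than flattened into one blob of classes.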
EDIT: now working on messaging (SMS/MMS/RCS/ABC) with a focus on dialog management and logical rules with layering-in of progressively less naive stats methods strategically rather than blindly. It helps to have an existing revenue stream that can be leveraged to create even more value with AI features rather than the backwards hail Mary of not iterating from a foothold of existing traction.
1) Is the independence assumption really applicable here?
2) Does the underlying model satisfy the law of large numbers? (NOT your model; the underlying model, i.e. "reality")
3) Am I really only interested in predictions within the data range?
The answer to all of these questions is almost universally "no", and the theory says that statistics is not guaranteed to work under those circumstances.
Correct statistical answers:
1) What average percentage of borrowers will default? There is no answer to that question, because many non-random influences will make borrowers default in synchronized, and therefore very much non-independent, ways (say, an economic crisis, an earthquake, an epidemic). Therefore you cannot even correctly calculate an average.
2) What average percentage of borrowers will default? A model that describes whether an individual will default is going to be non-differentiable, and therefore certainly violates the law of large numbers (there will be sudden jumps in the default rate everywhere because of a million random reasons: an idiot TV anchor declares now the time to sell, and half the town finds out their loan is underwater, say). So underlying models involving humans essentially never satisfy the law of large numbers.
3) What average percentage of borrowers will default? The factors changing that value do not match between the period your data is from (i.e. the past) and the period you're predicting. Because there are so many real-world factors that affect your variable, you can never avoid this situation. Therefore you cannot predict.
All of these issues allow one to construct realistic scenarios where even trivial statistics will fail spectacularly.
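A quick simulation of the first failure mode, with made-up numbers: when defaults are driven by a shared shock rather than being independent, a portfolio with the same average default rate has wildly different year-to-year risk, so the calm-period average tells you almost nothing.

```python
# Toy illustration of the independence failure: defaults driven by a shared
# shock are far more variable than i.i.d. defaults with the same mean rate.
import numpy as np

rng = np.random.default_rng(42)
n_borrowers, n_years = 10_000, 2_000

# Independent world: each borrower defaults with probability 2%, always.
iid_rates = rng.binomial(n_borrowers, 0.02, size=n_years) / n_borrowers

# Correlated world: a crisis hits ~2% of years; in a crisis every borrower's
# default probability jumps to 30%. One shared factor moves everyone at once.
crisis = rng.random(n_years) < 0.02
p = np.where(crisis, 0.30, 0.0143)  # chosen so the long-run mean stays near 2%
corr_rates = rng.binomial(n_borrowers, p) / n_borrowers

print(iid_rates.mean(), corr_rates.mean())  # similar averages...
print(iid_rates.std(), corr_rates.std())    # ...very different risk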
A: It isn't always understood what it is actually doing, or how.
B: It often needs to be retrained when input data varies slightly outside the original training data. This is expected at a technical level, but it is annoying in practice; the model then starts deviating and losing its effectiveness, or gets inconsistent results on larger, less homogeneous input data sets.
C: It can't really (or at least easily) be packaged as a stand-alone component/library and reused nicely elsewhere, the way a conventional algorithm can.
Deep learning is just one branch (or even a sub-branch inside NNs) of many ML techniques. Not all of them suffer from the same problems or rest on the same theoretical background.
I don’t follow this. Are you implying there haven’t been absolutely massive gains in computer vision, nlg, nlp, etc?
The implication is that the massive gains haven't been the result of any algorithmic breakthroughs, but rather have been due to the application of massive amounts of computational resources, which weren't available when current ML algorithms were invented. So far as I can tell, that's a true accusation. If you look at the papers coming out of Google and Facebook, they talk about throwing thousands of hours of specialized GPU (or even more specialized and expensive TPU) time at some of the problems. The advances have more to do with Moore's Law making brute force feasible than they have to do with algorithmic breakthroughs.
GPUs already existed when the idea to use them to make the feasible size of neural nets larger came about. For a long time the drive for the increase of GPU compute power was still gaming/commercial graphics houses. It’s really only in the last 1-2 years that we’ve seen highly specialised GPUs with features like tensor cores (or indeed google’s TPUs).
Also, calling neural nets 'brute force' because they use a lot of computing power to train a model is slightly reductionist. A true brute-force approach to image recognition, i.e. enumerating all possible 200x200 RGB images with 256 intensity levels per channel, would be completely absurd and probably exceed the computing power currently available on Earth.
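Just to put a number on how absurd true brute force would be, one can count the decimal digits of the number of possible 200x200 RGB images:

```python
# Count the decimal digits of 256^(200*200*3), the number of possible
# 200x200 RGB images with 256 intensity levels per channel.
import math

values = 200 * 200 * 3                       # one value per channel per pixel
digits = int(values * math.log10(256)) + 1
print(digits)                                # 288989: a ~289,000-digit number
```

For comparison, the number of atoms in the observable universe has roughly 80 digits.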
It's not like neural nets are a new technology. They've been known since the '80s, at least. It's just that they were considered a dead end, because we didn't have the computational resources to run deep neural nets, nor did we have sufficient training data to make neural-net approaches feasible. Once those preconditions were met, neural nets took off in short order.
And you literally described it as a brute force approach in your comment.
Probably. Human intelligence is extremely fallible; based on the statistics, the only reason we trust humans to do half the stuff they do is that there is literally no choice.
If we held humans to a high objective engineering standard, we wouldn't:
* Let them drive
* Let them present their memories as evidence in a court case
* Entrust them with monitoring jobs
* Allow them to perform surgical operations
Humans are the best we have at those things, but from a "did we secure the best result with the information we had" perspective they are not very reliable. A testable and consistently performing AI with known failure modes might even be able to outperform a human with a higher failure rate (e.g., we can reconfigure our road systems if there is just one scenario an AI driver can't handle).
Basically, you might be dead on the money that they are not 'trustworthy enough', but let's not lose sight of the fact that even being an order of magnitude short of human performance might be enough once costs and engineering benefits get factored in. The weakest link is the stupidest human, and that is quite a low bar.
Ironically, the thing that is lost in this comment is "accountability". In the case of a human, you can go back, trace the decision-making criteria, and hold someone accountable. In the case of an algorithm, everyone washes their hands of it. Performance is not the only criterion for deciding whether algorithms are "trustworthy" over humans.
Typically in a regression setting with an ensemble learner you're using a kind of weighted mean, where the weights are selected based on cross-validation performance. This is sometimes called a "super learner". See van der Laan, Polley, and Hubbard 2007.
Note that this suffers from being similarly awful in terms of theory as a lot of other ML stuff. It is not, for example, the case that two apparently similar datasets or problem domains will produce similar super-learner weights. Which is disturbing, because it's easy to believe that say SVM does better at X and penalized ordered logit at Y, but it's hard to believe that they both do better seemingly at random.
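A minimal sketch of the CV-weighted ensemble idea. The inverse-CV-MSE weighting below is one simple choice for illustration, not necessarily the exact scheme from van der Laan, Polley, and Hubbard; all model and data choices are assumptions.

```python
# Sketch of a CV-weighted ensemble ("super learner"-style) for regression:
# score each base learner by cross-validation, weight inversely to CV error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
learners = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=50, random_state=0)]

# Cross-validated MSE for each base learner (sklearn reports negative MSE).
mses = [-cross_val_score(m, X, y, cv=5, scoring="neg_mean_squared_error").mean()
        for m in learners]

# Weight each learner inversely to its CV error, then normalize to sum to 1.
weights = np.array([1.0 / m for m in mses])
weights /= weights.sum()

# Fit on the full data and combine predictions with the CV-derived weights.
for m in learners:
    m.fit(X, y)
preds = np.column_stack([m.predict(X) for m in learners])
ensemble = preds @ weights
```

The instability the comment describes shows up exactly here: rerun this on a superficially similar dataset and the learned weights can shift substantially.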
There is so much more to it than one monolithic NN, since those are easy to saturate in terms of precision and recall with enough training data and features, but are not enough to provide good UX in any complex domain/ontology. So it makes more sense to have many different models trained on each subdomain/taxonomy, so that each can be specialized and then combined orthogonally.
Then the question becomes "how do we orchestrate them?" Well, there is a lot of research from the 80s and 90s that kinda got left by the wayside due to hype cycles (see the last "AI winter"). My faves are Collagen and Ravenclaw. And there is a lot of literature around topic frame stack modelling, which can be combined with various expert systems or other logics. I am currently using CLIPS (PyKnow) with a custom Ravenclaw implementation. I believe b4.ai is doing something similar without the logic/rules engine, and actually applying ML to topic selection as well. My systems are goal-oriented, so I like to give them a teleology for business reasons, which would not suffice for AGI ambitions.
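The rules-engine flavor of orchestration can be sketched without any library. The following is not PyKnow's actual API, just a dependency-free illustration of forward chaining for dialog management: match facts against rule conditions and fire actions until nothing changes.

```python
# Dependency-free sketch of forward-chaining dialog rules: each rule is a
# (condition-set, action) pair; fire rules whose conditions hold until
# quiescence. Facts and intents here are hypothetical.
def run(facts, rules):
    fired = []
    changed = True
    while changed:
        changed = False
        for cond, action in rules:
            if cond <= facts and (cond, action) not in fired:
                facts |= action(facts)      # rule contributes new facts
                fired.append((cond, action))
                changed = True
    return facts

rules = [
    ({"intent:order"}, lambda f: {"ask:size"}),
    ({"intent:order", "slot:size"}, lambda f: {"action:confirm"}),
]
result = run({"intent:order", "slot:size"}, rules)
print(result)
```

Real engines like CLIPS add efficient matching (Rete), salience, and retraction, but the control flow is recognizably this loop.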
TLDR data science isn't enough on its own. We need engineers to architect things properly to solve problems.
> Would you consider them trustworthy in court, where lives are at stake?
We're using them at Generic Health Insurance Megacorp in production - lots of enterprises are. If you are in the IT industry, it might be useful to spend some lab time with ML. Possibly you have a misconception of ML and/or confuse it with AI.
Also, please don't make single-purpose accounts on HN.
We have some strict (and irritating for profit-centric people) governance... one of the more interesting pieces is called the "85/15 rule", which, roughly translated, means that if we take in $100, the government mandates we use $85 of it to pay your costs, and $15 to pay our staff, light bills, and any other expense we have. If we end up only using $80 to pay your costs, we have to refund the remaining $5 to your group plan.
Here's the obvious secret about health insurance that people like to have conspiracy theories about (I can't speak for other institutions in other countries, however): our stance is really simplistic. You can't pay premiums if you are not alive; therefore it is in our mutual interest for you to remain alive. All the conspiracy theories such as "but you don't want that cancer patient in your insurance group plan!" are just that: conspiracy theories. We absolutely do want that person in the group, because then that group's rates go up! The costs for that patient's care are more or less fixed (and known), built on the assumption of a terminal outcome. We're going to pay for it anyway, and try to make that miserable experience as pleasant as possible for everyone involved. That type of service is how you get repeat business and a good reputation.
This likely falls on deaf ears. Feel free to return to the zealous insurance hatred, and I'm going to return to writing code. Not for death panel machines. Promise.
I'll only make a small comment about Catholic priests, since I was at one time a Catholic. The problem wasn't just that some of the priesthood was into pedophilia, but that when the church was made aware, it actively covered it up.
> one of the more interesting pieces of governance is called the "85/15 rule"
If I remember correctly, that was passed via the PPACA. It is also in jeopardy given the continual "repeal and replace (with nothing)" efforts since the PPACA's passage and its failed SCOTUS challenge. I believe there is a current federal court case with 20 or so states suing on constitutional grounds. And with the makeup of SCOTUS now, it has a good chance of getting the whole law deemed unconstitutional.
> Here's the obvious secret about health insurance that people like to have conspiracy theories about..I can't speak for other institutions in other countries, however...our stance is really simplistic: you can't pay premiums if you are not alive, therefore it is in our mutual interest for you to remain alive. All the conspiracy theories such as "but you don't want that cancer patient in your insurance group plan!" are just that..conspiracy theories. We absolutely do want that person in the group, because then that group's rates go up! The costs for that patient's care are more or less fixed(and known), built on the assumption of a terminal outcome. We're going to pay for it anyway, and try to make that miserable experience as pleasant as possible for everyone involved. That type of service is how you get repeat business, and a good reputation.
My anger, as well as many other people's anger, is at the fact that this system is opaque. I go to a doctor and have a procedure/drug prescribed, and there's this song and dance about "preapproval", "permission", and all other sorts of roadblocks. Whether the medical insurance company is for-profit or non-profit doesn't matter too much to me. All I know is that the medical insurer is sitting between me and my doctor, making decisions about my care with no medical degree and no patient-doctor relationship.
And the moment medical insurance is taken out, the prices go up tenfold. That's not the medical insurance companies' fault... but that's the end result for us. And medical insurance companies become de facto arbiters of patients' health. Again, when questions are shoved into this black box, magic answers come out.
And what I was criticizing is that the use of AI in this context means that the decisions are now truly black-boxed, rather than just the process of actuaries, which was subpoenable and discoverable. The fact that some neural network was trained on GBs of data and outputs magic weights of "accept or deny" is anathema. Those decisions should be understandable. Those decisions should be defensible (as long as we have a profit-based medical system).
Even decision trees would show traceability of how an input got to a given result. And if there were questionable or illegal things in there, they could be challenged or changed.
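For instance, a fitted decision tree can be dumped as an explicit chain of threshold comparisons. The features and labels below are hypothetical, chosen only to show the traceability:

```python
# Decision trees give a traceable path from inputs to a decision, unlike a
# neural net's opaque weights. Feature names and data here are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 0], [60, 1], [45, 1], [30, 0], [70, 1], [50, 0]]  # [age, prior_claims]
y = ["approve", "deny", "deny", "approve", "deny", "approve"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Every prediction can be explained as a readable chain of comparisons.
print(export_text(tree, feature_names=["age", "prior_claims"]))
```

That printed rule list is exactly the kind of artifact that could be subpoenaed, audited, and challenged.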
> This likely falls on deaf ears. Feel free to return to the zealous insurance hatred, and I'm going to return to writing code. Not for death panel machines. Promise.
Not at all. I do have grievances with how the US does medical care, and insurance is only one part of the whole. My position is that health care should be funded by tax dollars. We as a nation already spend triple what France does per capita, yet only a fraction of people get care. Simply put, too many people slip through the cracks. I wouldn't say it's a zealous hatred; it's a well-informed, long-stewing anger that people who are ill can't get help.
Thank you for the discussion :)
That's how it always has been and always will be, until perhaps one day there is nothing 'truly human' left to slice off.
Not to mention that part of what makes NN such a step forward is precisely the high nonlinearity. When you have millions of parameters, the contribution of each is unimportant and so any kind of analog to ANOVA would be barking up the wrong tree. In a NN all those parameters work in concert to learn a decision boundary to separate data. It’s not intelligible at the minute level but at least we know what it’s doing in the end. The problem of course lies with the outliers and that’s not so much a problem with NN being a black box as it is a problem with the nature of large datasets and our own inability to rationalize each and every datapoint.
I’m going to defend NNs as the natural evolution of regression. It’s precisely their high nonlinearity that makes them better. The problem is not that they “are” alchemy, but that we treat them “as” magic. Society has no business leaving important decisions to algorithms unattended, NN or not. If major insurance companies left their decisions to logistic regression (which was, and still is, the case), would we be making the same arguments? Probably not, because someone paid by the insurance company would pull out that other kind of alchemy called ANOVA...
Yes, somebody can produce "magical" regression models with terms divorced from reality. But unsupervised learning is never guaranteed to produce a reality-grounded model, no matter the user's skill. The practitioner goes through steps, tries different transformations, and chooses the algorithm that led to the best result; logic and understanding play no part. That sounds like alchemy. It definitely works, but alchemy also stumbled onto results later explained by chemistry.
And, yes, anyone who doesn't consider the implications of the linear model behind ANOVA would also be a "magician."
It seems that an application like cancer diagnosis, where you feed test results and medical information to an NN and it finds patterns better than any human could, has nothing to do with human perception systems. It's just much better and faster at detecting patterns in complex things than humans are.
This is the part that's always bothered me. For many, ML has just become the default tool to throw at a problem where other (albeit less shiny) solutions exist that might give the same or even better results, and that are very likely going to end up being more efficient. Would it be too much of a leap to suggest that we're forgetting to think?
Say we have a predictor matrix X with 2 predictors. We fit a model using penalized linear regression (say, LASSO), adding to our predictor matrix interaction terms, arbitrary polynomial and logarithmic transformations of each X, and interactions between the transformations of the Xes. Ideally we motivate this with some case knowledge about relevant nonlinear transformations of the predictors. Or maybe, second best, we use the kernel trick to run a kernelized regression over an infinite-dimensional expansion of all possible transformations of the predictors. But realistically, we toss some shit in the model and run it.
The LASSO spits out X1, X2^2, and X1^3 * log(X2) as being the cross-validation selected non-zero parameters.
What real world scenario could possibly generate a causal process that is linear in X1 (say income), quadratic in X2 (say age, which often displays quadratic forms in regressions), but also predicted by a bizarre non-linear interaction of nonsense transformations?
What a practitioner would probably do is fit the model. In a lot of ML contexts, interpretation would lead the practitioner to say "Well, okay, ML sometimes produces nonsense models, but you can't argue with the predictive results." Or maybe the practitioner is more sensitive to interpretability and takes another tack: "clearly this interaction is nonsense, but there must be some interaction; I'll re-run with a linear interaction." Or else they'd re-run the LASSO with constraints against including nonlinear terms without their lower-order counterparts. Or else they'd run a grouped LASSO and make up some justification for the groups. All of these reveal that most ML practitioners are basically just doing alchemy.
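The scenario above can be sketched in a few lines. Everything here is a made-up example (the true data-generating process, the menu of transformations, and the use of scikit-learn's LassoCV are all assumptions) to show how kitchen-sink feature engineering plus cross-validated selection plays out:

```python
# Toss polynomial/log transformations and interactions into a LASSO and see
# which terms survive cross-validation. Data-generating process is made up.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)             # say, income (standardized)
x2 = rng.uniform(1, 2, size=n)      # say, age (rescaled, kept > 0 for log)
y = 2 * x1 + x2**2 + rng.normal(scale=0.5, size=n)

# The kitchen sink: raw terms, squares, cubes, logs, and odd interactions.
features = np.column_stack([
    x1, x2, x1**2, x2**2, x1**3, np.log(x2),
    x1 * x2, x1**3 * np.log(x2),
])
names = ["x1", "x2", "x1^2", "x2^2", "x1^3", "log(x2)", "x1*x2", "x1^3*log(x2)"]

lasso = LassoCV(cv=5, max_iter=10000).fit(features, y)
selected = [name for name, c in zip(names, lasso.coef_) if abs(c) > 1e-3]
print(selected)  # whatever survives may well include hard-to-interpret terms
```

Because x2 and log(x2) are highly correlated on this range, which of them (or which interaction) survives is somewhat arbitrary, which is exactly the interpretability problem the comment describes.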
And this is talking about what amounts to a minor version increment of linear regression, so probably the simplest possible technique we'd still call part of ML.
If so, then machine learning is a part of modeling AI, regardless of how the two are taught in university lectures.
In fact, in real data, these assumptions are almost always violated. The Gaussian assumption doesn't matter at all, but to address the i.i.d. assumption: Almost all real data exhibits residual heteroskedasticity and almost all real data has observable clustering. Which is why almost no one uses OLS with classical errors. We have estimators to allow errors to be heteroskedasticity-consistent (the default in STATA and easily estimated in R e.g. by estimatr, clubSandwich, etc) or cluster-robust or both. By definition these cases have non-i.i.d. errors and there's no reason linear regression can't be used with them.
We also don't need to make assumptions, these can be interrogated. Most regression relies on using the residual matrix as sample plug-ins for the underlying error matrix, so there's a wide assortment of diagnostic techniques to check for the presence or absence of those assumptions.
Insofar as "machine learning" has any meaning -- which is to say, insofar as it is different than "statistics", the difference is purportedly that it focuses on minimizing out of sample prediction error rather than estimating population parameters, and typically this is motivated as an overfitting problem.
We use OLS because OLS is BLUE under the Gauss-Markov conditions. In ML we rarely care about "U" (unbiasedness) because we frequently prefer to make a bias-variance tradeoff if we're aiming to minimize out of sample error. When linear regression is used in an ML context it is typically penalized linear regression (i.e. ridge / LASSO). Of course it's also the case that the bulk of sexy ML results come out of non-linear estimators, and absent a need to characterize population parameters there's no real reason to care about interpretability so really we don't care about the "L" either.
I would say the grandparent is closer to right. Often in ML there is a view that we throw a bunch of processes at data, pick the thing that works best, don't care why it works at all, and then run with it. To the extent there's a protection against fishing expeditions, it's in the training/test separation or cross-validation or both.
Most of the time when someone talks about "regression theory", they're using 30- or 40-year-old results. For an updated look, check out "Foundations of Agnostic Statistics" (Aronow and Miller, both Yale political scientists), which is coming out some time in 2019. They've had a pre-print around for a while, and if you're interested I'm sure you could get one.
- IID: "independent and identically distributed", https://en.wikipedia.org/wiki/Independent_and_identically_di...
- OLS: "ordinary least squares", https://en.wikipedia.org/wiki/Ordinary_least_squares (I think)
The grandparent to your comment was objecting that linear regression, a very old technique which in plain English means fitting a straight line to a scatterplot (but in any number of dimensions), actually has a great deal of theory around it. The simplest way of solving a linear regression is "ordinary least squares" (minimizing the sum of squared deviations from the fitted line): OLS.
The grandparent was correct that especially in the mid-to-late 20th century, a lot of people worked on the conditions under which OLS works. What "works" means in a statistical sense is that it's efficient (has low uncertainty about the correct estimate), unbiased (on average gets the right answer), and consistent (gets closer to the right answer as you have more and more data). Under a set of fairly impossible conditions about the real-world data-generating process, OLS is "BLUE" (the best linear unbiased estimator); "best" here refers to efficiency, and unbiasedness I've already explained. OLS divides the data into structural elements (things that can be explained by the predictors you put into the estimator) and stochastic elements (the noise left over: the deviations from the line). If we specify the correct model, the stochastic elements are the underlying stochasticity in the universe. If we specify an incorrect model, some of our omitted structural elements get put into the estimation of the noise.
The grandparent noted that two assumptions made in linear regression are that the underlying stochastic disturbances in the data are i.i.d. (each is a random draw from the same distribution) and Gaussian (form a normal / bell curve). These are not assumptions, these are conditions under which OLS is BLUE. The latter is not a necessary condition at all, the distribution can take any form. The former is the most succinct way to express one of the conditions.
My comment was to point out that when we actually use linear regression in the world, we rarely use classical OLS. In the real world, underlying disturbances differ between observations. Imagine I am running a regression on cross-country data, but while my US data is very precisely measured thanks to the widespread availability of polling firms, my Mexican data involves census enumerators going to rural villages. We might imagine that all of my US data is more precisely measured than all of my Mexican data, so we would expect the underlying stochasticity to differ between countries. This is called clustering. Also, because we almost certainly do not have the correct model for the data (say log-dollars income predicts the result, not dollars income, but I put in dollars income), it can be the case that observations with higher values of our predicted Y also have more uncertainty. This is called heteroskedasticity. But the good news is we have answers to both; we just don't use OLS, we use more modern estimators. Yay!
In general the world has moved away from rigorously teaching the conditions under which OLS works and toward teaching more flexible estimators that work under less restrictive conditions. And in general in ML, people aren't using anything that looks like OLS, because ML has specific goals OLS is inappropriate for, namely minimizing overfitting and out-of-sample error, whereas OLS is designed to maximize the precision of estimates of the slope parameters (how a given predictor affects the outcome). So all the work in OLS theory doesn't really translate to a machine learning setting, where many methods have no theory at all.
Hope this is a plainer English version.
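And a plainer-still version in code: OLS on one predictor is just "find the intercept and slope that minimize the sum of squared deviations", which numpy solves directly (the numbers here are made up):

```python
# OLS in code: fit a straight line by minimizing the sum of squared
# deviations, using numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)  # true line + noise

# Design matrix: a column of ones (intercept) and the predictor.
X = np.column_stack([np.ones_like(x), x])
(beta0, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta0, beta1)  # close to the true intercept 3.0 and slope 1.5
```

The "stochastic element" from the explanation above is exactly the residual `y - X @ [beta0, beta1]` left over after the fit.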
You know what else barely learns anything given the amount of computation that goes into it? A kid.
With 100 billion neurons -- so on the order of a hundred gigabytes of RAM -- after 72 months of closely supervised learning they're still far from being able to do many rudimentary tasks. It takes over 10 years of training to do that.
Something tells me most ML researchers wouldn't be too unhappy with such an awful performance.
Classical machine learning has a huge, sound theoretical backing.
Scaling well is extremely important, and it is the main advantage of deep learning. Just as planes don't need a bird's feathers to be useful, deep learning is already successful enough to justify its use.
No one serious claims that deep nets are biologically plausible. They are still promising and useful, powering translation, phone keyboard autocomplete, search, camera features like low-light mode and cell-phone portrait mode, medical imaging diagnosis, malware detection, new-planet detection in astronomy, plasma modeling in energy engineering, etc.
So this is by now a standard argument. The standard answers I think have been:
1) Well, we are discovering that you don't need a strong theory to build a truth-discovery machine. We've discovered a kind of experimental construction of thinking.
2) It's true deep learning doesn't have a strong theory now but sooner or later it will get one.
3) This shows that instead we need theory X, usually by people who've been pursuing theory X all along. But I think Hinton at least has thrown a bunch of alternatives against the wall over the years.
(I could swear this has appeared here before but I can't find it).
Even in the Victorian era when, for fairly large swathes of the periodic table, and different types of compound, we already had a quite good experimental understanding of chemical reactions in terms of their constituent components and products, along with the conditions under which those reactions occur, we still didn't know the why. We didn't understand much about atoms or how they bond together, for example.
The point is this: science can often take a long time to advance, and AI is still a very young field, with the first practical endeavours only dating back to the post-WWII period.
Should we therefore be terribly surprised that ML seems a bit like alchemy?
As an aside, another normal facet of scientific advancement is the vast quantity of naysayers encountered along the way. Haters gonna hate, I suppose. (But don't misunderstand me: whilst I'm not an ML fanboi, I recognise that advances come in fits and starts, dead-ends will be encountered, and overall it's going to take quite a long time and require a lot of hard work to get anywhere.)
Final aside: this video has definitely been posted here before but I've also been unable to find it.
Indeed but that might be where the analogy breaks down.
I think one could say that we just don't know how far we can take "experimental computer science", where the experimental part is writing "random", or really "seat-of-the-pants", programs and seeing what they do. This is simply new, and whether one could create a physics on top of this particular kind of experimentation is yet to be seen.
Maybe the only way to make steady progress here is to blaze ahead and rely on empirical evidence, whilst the theory is inevitably fleshed out. That's often the counter argument - that we do not know how the brain works, but rely on it nonetheless.
In medicine, machine learning systems will work alongside other brains.
What are the instances you’re imagining where a group of brains running an important system are replaced by a single machine learning algorithm running in isolation?
> that "we rely on the brain without understanding it and that works out fine" is not a good argument.
It wasn't my idea to bring cars into it particularly.
The depressing thing about machine learning, to me, is the many convincing explanations of phenomena that then turn out not to explain anything (all the explanations of why dropout is effective, or why there are adversarial examples, things like that).
This is where the comparison to alchemy is sharpest, in my opinion. The alchemists had extremely sophisticated theories that they used to explain the phenomena they observed; ultimately, none of them made any sense.
All of these things help make your model more robust and give you greater confidence in it, which will be important if we want to put ML in, say, healthcare or defense. But you don’t get as much of this theory with more complex ML models, and certainly not with neural nets. Good luck trying to get a confidence interval for the optimal value of a weight in your net, much less interpreting it.
As tech evolves, more data is being generated, which in turn creates more problems that increases the demand for solutions.
In short: solving "real world" problems is rewarded more, and better, than figuring out the underlying technology, so it's no wonder that we see a larger portion of practitioners who may lack the academic background. And that is understandable.
You could be an ML Ph.D. (in academia) for 5-10 years, earning a government-worker salary while trying to reach the next step, or hack together some ML-based product until it reaches the desired accuracy and cash in at 50-100 times the pay.
Now, with that said, I understand that Ali targeted his speech at the NIPS folk - but the industry / academia crossover in ML is massive, and it still stands that much of industry problems are of commercial interest - so the motivation for many is still the same.
When something is new, it feels like a mystery - eventually we'll have a language for wrapping our heads around neural networks, even if it's not as clear cut as we'd like.
There was ultimately nothing black box about the code we cobbled together back then.
Many advancements in machine learning have significant backing in theoretical proofs that a given algorithm will result in unbiased estimates, or will converge to such and such value etc.
On some level, the high amount of experimentation necessary in machine learning is not so much a sign that the practice is faulty in any particular way, but rather, that the world is a complex place. This is especially true when attempting to predict anything that involves human behavior.
Long story short - I'd cut ML some slack!
Consider: lead and gold are very similar substances, and chemistry lets you transform many things into other things, so transmutation must have seemed very possible. Unfortunately, I suspect the current AI movement is in a very similar state: even if it can do a lot of things that seem magical, it's built on a poor foundation. The result is people mostly just trying stuff and seeing what happens to work, without the ability to rigorously predict what will work well on a novel problem.
> Now alchemy is OK, alchemy is not bad, there is a place for alchemy, alchemy worked, alchemists invented metallurgy, [inaudible] dyed textiles, our modern glass making processes and medications.
> Then again, alchemists also believed they could cure diseases with leeches and transmute base metals into gold.
> For the physics and chemistry of the 1700s to usher in the sea change in our understanding of the universe that we now experience, scientists had to dismantle 2,000 years worth of alchemical theories.
> If you're building photo-sharing systems, alchemy is okay. But we're beyond that.
> Now we're building systems that govern health care, and mediate our civic dialogue. We influence elections.
> I would like to live in a society whose systems are built on top of verifiable rigorous thorough knowledge and not on alchemy.
People often complain about interpretability: in what sense is an SVM interpretable that a deep neural network is not?
Or is the worry about gradient descent not finding global optima? But why is the global optimum a satisfactory place to be, if the theory does not also provide a satisfactory connection between the space of models and underlying reality?
The arbiter of good theory is ultimately its ability to guide and explain practical phenomena. Which machine learning phenomena are currently most in need of theoretical elucidation?
Also, your parameters are not inferable the way they are in a statistical algorithm. This is what I think people mean when they say deep learning isn't interpretable, and there is research into this area. If you compare a statistical time-series forecasting algorithm with deep learning, you at least get a confidence interval from the statistical algorithm.
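A concrete version of that "you at least get a CI" point, as a rough sketch (synthetic AR(1) data, normal-theory interval, everything illustrative): the classical forecaster hands you a prediction interval almost for free.

```python
# Sketch: fit an AR(1) by least squares and attach a normal-theory 95%
# prediction interval to the one-step-ahead forecast. Data is simulated.
import numpy as np

rng = np.random.default_rng(1)
phi_true = 0.8
y = np.zeros(300)
for t in range(1, 300):
    y[t] = phi_true * y[t - 1] + rng.normal(scale=1.0)

# Least-squares estimate of the AR(1) coefficient
phi_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
resid = y[1:] - phi_hat * y[:-1]
sigma_hat = resid.std(ddof=1)                 # residual std drives interval width

forecast = phi_hat * y[-1]
lo, hi = forecast - 1.96 * sigma_hat, forecast + 1.96 * sigma_hat
print(f"one-step forecast = {forecast:.2f}, 95% PI = ({lo:.2f}, {hi:.2f})")
```

A deep forecaster can be wrapped in bootstrap or Bayesian machinery to get something similar, but it isn't built in the way it is here.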
Randomly dropping nodes is pretty magical in my mind.
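For what it's worth, the mechanics of that node-dropping trick (inverted dropout) fit in a few lines; what stays mysterious is why it regularizes so well. A minimal numpy sketch:

```python
# Inverted dropout: at train time, zero each unit with probability p and
# scale the survivors by 1/(1-p), so the expected activation is unchanged
# and nothing special is needed at test time.
import numpy as np

def dropout(activations, p, rng):
    """Zero each unit with prob p, rescale survivors to preserve the mean."""
    mask = (rng.random(activations.shape) >= p).astype(activations.dtype)
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(42)
h = np.ones(10_000)
out = dropout(h, p=0.5, rng=rng)

# Roughly half the units are zeroed, but the mean stays near 1.0.
print((out == 0).mean(), out.mean())
```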
While I don't know much about SVMs, I know they're mathematically well founded, so there should be a way to interpret a fitted SVM model.
I sure as hell wouldn't use ML in a clinical trial for drugs. That's why biostatistics is a thing.
Another thing is that the industry should look at other AI techniques besides neural nets (NNs), or find complements to NNs. Genetic algorithms and factor tables should also be explored. Just because recent advances have been in NNs does not necessarily mean that's where the future should lead. Factor tables may allow more "dissection" & analysis by those without advanced degrees, for example. Experts may set up the framework and outline, but others can study and tune specifics.
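As a reminder of how simple the non-NN alternatives can be, here is a toy genetic algorithm on the classic "OneMax" problem (evolve a bitstring toward all ones). Population size, rates, and generation count are arbitrary illustrative choices.

```python
# Toy genetic algorithm: tournament selection, one-point crossover,
# point mutation. Fitness is just the count of 1 bits ("OneMax").
import random

random.seed(0)
GENES, POP, GENERATIONS = 32, 40, 60

def fitness(bits):
    return sum(bits)  # maximum possible is GENES

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    def select():
        a, b = random.sample(pop, 2)          # tournament of two
        return a if fitness(a) >= fitness(b) else b
    nxt = []
    for _ in range(POP):
        p1, p2 = select(), select()
        cut = random.randrange(1, GENES)      # one-point crossover
        child = p1[:cut] + p2[cut:]
        for i in range(GENES):                # 1% per-bit mutation
            if random.random() < 0.01:
                child[i] ^= 1
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best), "/", GENES)
```

Whether GAs beat gradient methods on a given problem is an empirical question; the point is only that the toolbox is bigger than NNs.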
* multi-layer, automated "jiggering" with so many components that only a machine can contextualize them, might be great for finding patterns in some sets, but the industry HYPE, the DIRECTION+VELOCITY, and the human manipulation (including lies and pathologies) are gut-level PROBLEM.. and this guy says that! +100
* Alchemy itself rambles and spreads. Some variations of alchemy included a ritualized, internal psychological and psychic experience for the human practitioner.. hard to stabilize, yet not always a bad thing, since you are reading this and are actually one of those ..
lastly, the selected slides encourage a student-minded viewer to look up some math and think a bit. Not a bad thing. Thanks for this video and thanks for the talk.
To me it's very similar to Boolean logic circuits the way I was taught those in college except that there's too many gates to configure manually so you use supervised learning to find reasonable values and an arrangement of gates that works (aka your trained neural network).
I've never heard anyone else describe it this way but this is how I like to think about it. It really has nothing to do with how the human brain or much less human mind works. That's just marketing speak.
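The "too many gates to configure by hand" picture can be made concrete with the smallest interesting case: training a tiny network by gradient descent to implement XOR, a function no single linear "gate" can compute. Layer sizes, seed, and learning rate here are arbitrary.

```python
# Tiny 2-8-1 network trained by backprop to compute XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR truth table

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)    # hidden "gate" layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)    # output unit

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    h = np.tanh(X @ W1 + b1)                      # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)           # backprop squared error
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))
```

After training, the outputs should approach the XOR truth table [0, 1, 1, 0]; the learned weights are the "arrangement of gates" nobody configured by hand.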
Pretty sure that was the initial thinking when neural networks were created, and the field has moved beyond it. I think people who only know the surface level keep repeating this tidbit that's out of date.
It's even in the Wikipedia article (https://en.wikipedia.org/wiki/Artificial_neural_network).
Everything you've stated was basically a personal opinion that could have been verified via Google...
Fundamentally we have to contend with the fact that human ability to understand complex systems is fairly limited, and at some point it will become an impediment to further progress. Arguably in a number of fields we're past that point already.
But for safety critical systems you have to understand how these systems work to understand their limitations. You have to know when these techniques succeed, when they fail, and how badly they fail.
In general yes, but it might depend on the pattern of failure - if your self-driving cars hunts me or my family personally, I might have problem with that.
But how can you determine that without releasing it to the wild and waiting for bodies? Worse, say you have a safe system, but you need to modify the network (to fix some bug). How can you determine that the new system is safe enough to put on the road?
It's remarkable that AI seems to be held to an arbitrarily high standard, often exceeding that of other technologies.
For a large expensive system, the presence of either of the above may be unacceptable. Take something like the space shuttle program. If it was heavily reliant on black-box AI, you might be able to build probabilities through tools like Monte Carlo simulations, but you would be hard pressed to get the government to put billions of dollars at risk without understanding the root cause of simulation failures.
Well, that's an odd thing to say. I guess you can say that, for instance, the Therac-25 episodes involved human error, but in that case, any problems with AI are also due to human error. If everything is due to human error, then there is no alternative to defend.
Medical errors certainly do involve the interface between humans and computers, and it isn't really plausible that either humans or computers can be eliminated from medicine.
Because safety critical systems require trust. One way of establishing trust is to explain or prove why the system works. This cannot be done with machine learning.
Say that you already have a safe system. Now you make a change. How do you demonstrate to a skeptical audience that the new system is safe without releasing it and counting the bodies?
To modify your example: say you have a doctor who hasn't killed anyone yet, and she reads a paper describing a new high-risk/high-reward treatment. The same standard of proof is not applied to her, so she can just go ahead and try it. Wasn't there a story just the other day where doctors were injecting people with some proteins linked to Alzheimer's? They weren't required to prove those were safe before they did that.
As to medical treatments, they are done on volunteers under informed consent. If you want to experiment with a car driven by AI on a private track staffed with volunteers, be my guest. It's between you, your conscience, your deity, and maybe OSHA.
But for experimentation and deployment on public roads, you need to convince other, possibly skeptical, people that your technology works better than humans. How do you do that with AI/ML-based systems?
Same as with humans: from statistics. I don't see any other way. E.g. autonomous cars will never have zero accident rate, but if they have half as many accidents per million miles as humans, it's a no brainer to me that they should be deployed. We're not going to get there in the next 20 years, so I'm not saying they should be deployed _now_, but eventually it will happen.
I.e. from body count.
One problem here is that we know humans have common, stable brain architecture, so the limits and failure modes we experienced are stable too, and can be accounted for and worked around. People won't fail you in completely surprising ways.
DNNs are each a different breed; between changing architectures and tuning hyperparameters, I don't see how trust in one instance can be translated to trust in another.
How do you convince yourself or a regulator, not that the technology in general is safe, but that your specific change does not make it unsafe?
What happens if this specific change increases the fatality rate from 1 in 100 million miles to 1 in 50 million miles? Crashes will still be rare and there's enough statistical noise that when you finally understand the change is bad, you may have killed hundreds.
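That detection problem is easy to put numbers on. A back-of-envelope Poisson sketch (crude 3-sigma detection rule, no real data) for the rates in the comment above:

```python
# If a change doubles the fatality rate from 1 per 100M miles to 1 per 50M,
# how many miles (and deaths) accrue before the difference stands out from
# Poisson noise? Detection rule: observed deaths exceed the old expectation
# by ~3 standard deviations, i.e. (new - old) * miles > 3 * sqrt(old * miles).
old_rate = 1 / 100e6   # fatalities per mile, before the change
new_rate = 1 / 50e6    # after the (bad) change

miles_needed = (3 ** 2) * old_rate / (new_rate - old_rate) ** 2
deaths_by_then = new_rate * miles_needed
print(f"{miles_needed / 1e6:.0f}M miles, ~{deaths_by_then:.0f} deaths under the new rate")
```

Under these toy assumptions that's on the order of 900 million fleet miles before the doubling is distinguishable from noise; real-world confounders (weather, geography, software versions) would stretch it further.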
An ML network does not have intrinsic, real-time mutability in how it is defined, outside the scope of memory-based node weights, inputs, outputs or graphs over time. These nodes are added, removed or modified based on a predefined set of input, output and internal mappings. How would intermediate layers be defined in a dynamic way between input and output in such a network in an attempt to achieve these higher powers? Driving a car, for instance, is a task that requires learning entire new subsets of skills, many times ad hoc, that require intermediate models to be dynamically developed which are potentially outside the capabilities of our understanding. The biggest challenge I see today is that we don't necessarily have a good way to dynamically construct models of the intermediate layers (such that we can map them to other layers), especially if these layers are being added and removed dynamically by algorithms at the edge of our capability to understand.
I've always felt that there needs to be some internal processing occurring at rates far higher than the input sample rates such that higher-order intelligence may emerge by way of adjusting the entities noted above multiple times per input sample (and potentially even in the absence of input). The problem is also going to come down to how a person would define outcomes vs how an AI/ML network would. For the future to really begin we will need an AI that can understand and reason with what success and failure feel like in our abstract terms. This will require it to have the capacity to dynamically construct abstractions which we would have no hope of modelling ourselves, as we do not have very deep insight into the abstractions upon which the biological human brain implements virtually any behavior today. There is no amount of discrete math in the universe which can accurately model and assess the higher-order outcomes of decisions made in our reality. You can run ML disguised as AI in simulations and environments with fixed complexity all day, but once you throw one of these "trained" networks out into the real world without any constraints, you are probably going to see very unsatisfactory outcomes.
That can be traced to the "Turing test". It was flawed then, and it is flawed now. Reproducing the behaviour of a thinking agent does not prove that a putative AI will not fail in a more detailed test, as demonstrated in numerous papers about "adversarial images".