I agree with Pearl that there's something deeply misguided about thinking intelligence can truly be solved as a data-optimization problem. I'd be happy to see more research on AI at the symbolic level again.
The biggest hindrance seems to be funding. ML is really successful in a lot of commercial domains, while general AI is a big moonshot. With most researchers moving from long-term university positions to the business sector, I'm concerned about this sort of research, and not just in computer science.
When all you have is a hammer... I think we’re really thinking about the whole thing backwards, as if intelligence only comes from the cortex of the brain. I think human level intelligence is fundamentally an emotional state of being - our core lizard brain values aren’t something to be swept aside - they’re a fundamental part of real intelligence. It’s like trying to build a car starting from the body and electronics rather than the motor.
I hear that correlation does not imply causation but also that you can't distinguish correlation from causation merely with a stream of data. Is there any way out of this situation with just data?
It seems like saying "it's not optimization" has to boil down to "it's not generic optimization, it's a special kind of optimization." But maybe there's some formalism I'm missing.
It's the entire reason why we randomize in experiments. In active learning, the machine doesn't know what happens when the rooster is in a bag, so decides to try it. In a randomized experiment, the machine knows what happens when the rooster is in a bag, but thinks there might be other factors at work, so it decides to try it in a way that it can be sure that all other factors are equal (at least on average).
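A minimal simulation of this point, with every probability invented for illustration: a hidden confounder z drives both a treatment x and an outcome y, the true causal effect of x on y is zero, and only randomizing x (the coin flip) reveals that.

```python
import random

random.seed(0)

# Toy world; every probability here is invented for illustration. A hidden
# confounder z raises the chance of both treatment x and outcome y, while
# the true causal effect of x on y is zero.
def sample(randomize):
    z = random.random() < 0.5
    if randomize:
        x = random.random() < 0.5                  # experimenter's coin flip
    else:
        x = random.random() < (0.9 if z else 0.1)  # z decides who gets x
    y = random.random() < (0.8 if z else 0.2)      # y depends only on z
    return x, y

def p_y_given_x(randomize, x_val, n=200_000):
    ys = [y for x, y in (sample(randomize) for _ in range(n)) if x == x_val]
    return sum(ys) / len(ys)

# Observational data: x and y look strongly associated (through z).
print(p_y_given_x(False, True), p_y_given_x(False, False))
# Randomized: z is balanced across the arms on average, so the
# association vanishes and the (zero) causal effect shows through.
print(p_y_given_x(True, True), p_y_given_x(True, False))
```

In the observational run the two conditional rates differ sharply; in the randomized run they roughly coincide, which is exactly the "all other factors equal, at least on average" guarantee.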
The problem might be that "understanding" isn't something we really have a good working definition of.
Well, what I'd like to know is how you distinguish correlation from causation with any amount of data. What definition, in terms of data, do you use?
Note, probabilistic and algorithmic theories of causation have a long history and many problems, see:
Alternatively, perhaps what's being optimized is so general that generalized AI is not, and cannot be, 'just' an optimization problem in any useful sense. Something like the biologically-inspired 'maximize lifetime production of descendants' does not get us very far.
If you have a time-ordered stream of data, the problem of predicting the output given previous input naturally suggests itself. There are other approaches, of course.
"Alternatively, perhaps what's being optimized is so general that generalized AI is not, and cannot be, 'just' an optimization problem in any useful sense."
- Perhaps you mean this as "the problem is so complex that it can't be naively approached only with conventional, local optimization tools". Because a huge optimization problem is still an optimization problem in some meaningful senses, I mean I could consider my life as a "fun" or "flow" or "enlightenment" optimization problem.
In general, it seems like a lot of reactions to particular AI methods go from "this approach isn't enough to get us all the way there" to "real AI is fundamentally, absolutely different from this, this approach is useless", a jump I don't think is justified.
I think a huge leap forward in the field could be made by modeling an emotional system similar to ours. It also seems like it would be an incredibly difficult and fuzzy problem to solve.
Addendum: After thinking about it for a few minutes more, I bet an implicit understanding of cause and effect could be derived by a system designed to try to optimize its emotional state.
Because of that, I think a "true AI" - at least one without a simulated humanlike body - would have rather different desires. Optimization towards those desires rather than humanlike desires would likely result in something that distinctly does not act human.
I can't help but wonder if we've ignored or disregarded any first steps to a general AI because the resulting building blocks didn't match any of our expectations from real-world models of instinct/intelligence.
I think a value system is mostly a post hoc justification of our emotional states. I view emotions as primarily a tight biochemical feedback loop. Think more of the four “f”s than, say, platonic feelings of love.
There are already certain machines better than humans (or any other life form) at physical tasks, so why do we need human intelligence as the base scale for measuring machine intelligence?
I mean, early man didn't create his tools to compete with human fingernails.
It's important for people to understand these limits of current ML, but they exist for a good reason. You can apply associative reasoning in many environments that don't support causal reasoning, and we are still scratching the surface of the value that discovering associations can create.
Humans are often quite bright, but we're also known to do the wrong thing for no discernible reason. This is to be expected when there's no fundamental formal system behind behavior, and behavior is instead driven by a black-box neural network.
The equivalent in engineering would be piling up trees and rocks in the hope of ending up with a bridge. Clearly that is unsatisfactory; we strive to understand the meaning of systems so that we can reason about them and alter them in predictable and fundamental ways.
Secondly, we don't know how likely it is that evolution produces intelligence. Maybe we're the only intelligent spot in the universe and it's an aberration. It took 4 billion years as well.
That seems to be a fundamentally impoverished way to go about things. We shouldn't forego the ability to understand minds at a deep level just because we have made practical strides in closed domains. That would be to mistake a trojan horse for an actual horse.
Imagine if aliens dropped a machine learning computer on earth in the 17th century.
Maybe we'd have never bothered to derive the laws of classical mechanics.
The machines might have replaced classical mechanics, but the downside is that the machine would be a black box, and we would never really understand how it derived its results.
And, at that scale, nature has had to deal with many stars collapsing.
But even if it were true, do you want to spend billions of years solving it that way?
The key point of this paper is that neural networks really are very good at "curve fitting" and that this curve fitting in the context of variational inference has advantages for causal reasoning, too.
Neural networks can be used in a variety of structures, and these structures tend to benefit from the inclusion of powerful trainable non-linear function approximators. In this sense, deep learning will continue to be a powerful tool despite some limitations in its current use.
I think Pearl, who has obviously remained very influential for many practitioners of machine learning, knows the value of "curve fitting". However, it's hard for a brief interview to sit down and have a real conversation about the state of the art of an academic field, and the "Deep Learning is Broken" angle is a bit more attractive.
I wonder if Deep Belief Machines and their flavor of generative models, which seem closer in nature to Pearl's PGMs, have a chance to bridge the gap involved.
Edit, as an aside: Given the enormously high dimensionality of personal genomes and the incredibly small sample sizes, for over a decade I've failed to put any trust in GWAS studies, and I've found my suspicion supported on a number of occasions, given reproducibility difficulties likely brought about by the above problem. Is there any reason to think that improved statistical methods can surmount the fundamental problem of limited sample size and high dimensionality?
I suppose the most important idea is that GWAS aren't really supposed to show causality. "Association" is in the name. GWAS are usually hypothesis generating (e.g., identification of associated variants) and then identified variants can be probed experimentally with all of the tools of molecular biology.
In summary, GWAS have their problems, but I think your statement is a bit too strong.
This is a good paper that demonstrates the approach: https://www.nature.com/articles/srep16645
Millard, Louise AC, et al. "MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization." Scientific reports 5 (2015): 16645.
"I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.
Thirty years later, I am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false."
Christianity being chosen by the Roman Empire is the typical example. To most people the choice makes perfect sense, because we look back at what it brought with it. But when you put yourself in the heads of the decision makers and look at all the options they had, well, it makes no sense at all.
A lot of machine learning tells us about trends, but it tells us nothing about the why, and I completely agree with the article about how useless that data is. I mean, it's probably great at harmless things, but when my elaborate online profile still can't figure out why I happen to read a cultural, artsy, but somewhat conservative newspaper, despite the fact that my data tells the algorithm I really, really shouldn't be doing that, well, then we simply can't use ML for any form of decision making, or even as an advisor. At least not in the public sector.
It's an area where if we push hard on AI, we'll likely have to come to terms with how bad we are in this area, and ask ourselves whether we feel comfortable deploying 'thinking machines' with similar levels of incompetence and/or arrogance.
We should be more open-minded and humble about how to approach this problem, but almost everyone seems to have a strong opinion about it, creating a very low signal-to-noise ratio.
We trust doctors and pilots, they offer partial explanations that we can somewhat understand, but they are backed by experience and qualification. Their perspective is informed by science - some good, some bad. Most of us don't think about that.
We have a perspective based on our cultural and social background, the machine must understand this and provide alternative explanations to suit us.
I have written a long article on this all, but I can't finish the game theory off!
But then.. I guess that's the point!
No! Astronomical observations can be shoehorned into geocentrism by adding more and more epicycles. That's curve fitting. At some point you have to realize the Earth revolves around the sun. Currently ML is on a dangerous path because any disagreement with empirical evidence can just be waved away with more data, more computation power, etc. In that sense, it's practically unfalsifiable.
I have tried to build an understanding of it since he got the Turing prize, but have failed so far.
To a Bayesian, you could model this as a conditional probability: p(you see a puddle | I carry an umbrella). It will look like a strong connection, but it isn't a causal one. If it were, then I could stop carrying an umbrella and clear away the puddles. That intervention, me changing my umbrella-carrying behavior, is what causation is all about: if we change this, does that change?
So then you talk about the probability of you seeing a puddle given some intervention that forces me to carry my umbrella regardless of anything else. We see that if you force me, independent of rain, to carry an umbrella or not to, then the connection between the umbrella and puddles is gone. p(puddles | do(umbrella)) != p(puddles | umbrella). do(X) means to take an intervention and force X regardless of other things.
As a contrast, you can talk about the connection between rain and puddles. If there were some hypothetical weather machine where we could force rain or sunshine, then you'd see that intervening and forcing rain (a.k.a. do(rain)) still keeps the relationship with puddles. p(puddles | do(rain)) still shows a connection. That is a causal connection.
It's all about counterfactual "what if I changed X?" questions. Using that idea, you can get all sorts of cool theory.
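The umbrella example above can be sketched as a toy structural model (all probabilities invented for illustration): conditioning on the umbrella shows a strong association with puddles, forcing the umbrella with do() makes it vanish, and forcing rain preserves it.

```python
import random

random.seed(1)

# Toy structural model; all probabilities are invented for illustration.
# rain -> umbrella and rain -> puddles; do() overrides a variable's mechanism.
def world(do_umbrella=None, do_rain=None):
    rain = (random.random() < 0.3) if do_rain is None else do_rain
    if do_umbrella is None:
        umbrella = random.random() < (0.9 if rain else 0.05)
    else:
        umbrella = do_umbrella
    puddles = random.random() < (0.8 if rain else 0.1)
    return rain, umbrella, puddles

n = 200_000
obs = [world() for _ in range(n)]
# Observational: p(puddles | umbrella) is high, because rain explains both.
p_obs = sum(p for r, u, p in obs if u) / sum(1 for r, u, p in obs if u)

# Interventional: p(puddles | do(umbrella)). Forcing the umbrella on everyone
# breaks the link, so puddles fall back to their base rate.
forced = [world(do_umbrella=True) for _ in range(n)]
p_do_umbrella = sum(p for _, _, p in forced) / n

# Intervening on the true cause: p(puddles | do(rain)) stays high.
rained = [world(do_rain=True) for _ in range(n)]
p_do_rain = sum(p for _, _, p in rained) / n

print(p_obs, p_do_umbrella, p_do_rain)
```

Note that p(puddles | do(umbrella)) equals the marginal rate of puddles, i.e. the inequality p(puddles | do(umbrella)) != p(puddles | umbrella) from above shows up directly in the numbers.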
Is this true? It kind of blows my mind if it is.
"Mathematics has not developed the asymmetric language required to capture our understanding that if x causes y that does not mean that y causes x. It sounds like a terrible thing to say against science, I know. If I were to say it to my mother, she’d slap me."
His home page links to several presentations where he lays out the key ideas.
 "Causality: Models, Reasoning and Inference", Pearl
In a simplified closed system, where all you have are barometers and storms, maybe there is no difference between implication and causation; all you know is these variables are correlated. Perhaps once you take every atom in the universe into account, the two start to look the same.
Implication is a very mathematical thing. It is as if you know that y = f(x), and then you write x = g(y), where g is the inverse of f. It works both ways: there is no cause and no effect, just a link between two variables. If you use math to reason about causal links in reality, you need to bring in some implicit knowledge which is not represented in the formula. That doesn't mean math is bad. Geometry likes Euclidean space, while we know from Einstein that our space is not Euclidean; that doesn't mean geometry is bad. Euclidean geometry just solves some specific problems and doesn't solve others.
A causal link reflects the ability to change the dependent variable by changing the independent one. It is not some "fundamental property of the Universe"; it is our subjective way to structure information about reality. It seems to me that physicists believe otherwise, that causation is an inherent property of reality. Maybe they are right in their field, but it doesn't work that way in everyday life. A causal link is an abstraction that helps us know what we can do to change outcomes.
In this sense there is no causal link between barometer readings and a storm: if you change the barometer readings to show fine weather, the storm will come anyway. Maybe there is a causal link between atmospheric pressure and a storm? I do not know, because I see no way to change atmospheric pressure, and I'm not educated well enough to understand scientific weather models. Still, it is relatively safe for me to believe that low atmospheric pressure causes storms: causal link or correlational one, it will not change my behaviour, because I cannot change atmospheric pressure. If I ever find a way, then it becomes crucial to figure out which kind of link it is, because I'll be able to break something if I'm wrong. But I said relatively safe, because if I suppose the link is causal, I will use it differently when reasoning about the weather; it will change my other beliefs and probably my behaviour somehow.
So the main idea is: causation is just our way to structure reality. We are free to choose which links are causal; it is all up to us. If we think it will help us reason about reality, then we should speak of causality. And the most important difference between causation and correlation is the ability to change the dependent variable by changing the independent one. If we can change the dependent variable that way, then we should mark the link as causal. If we cannot, then we should think of it as correlational. Implication just does not draw this distinction.
People needed thousands of years to put together a few causal concepts about the world. AI would need its playtime too. It's not as if a single person can come up with a causal model of the world by himself/herself. Just keep that in mind when comparing human intelligence with AI, so as not to ask of AI what no human can do.
Use your common sense though. Do you think we would have built the modern world (e.g. electricity) by passive observation of the world, or does really getting to the truth require controlled experiments?
In a difference equation x(t+1) = f(x(t),u(t)), u causes changes to x and not the other way around
It may simply be a predictor of x, as in the 'barometer readings may precede and imply a storm but not cause one' example given by others above.
For differential or non-differential equations alike, they're still just describing how something is, not the causality behind it. It's always possible that the equation is merely the result of hidden variables and there is no causal relationship between any two points of the solution in any meaningful sense.
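A tiny illustration of that hidden-variable point, with made-up series: two observed signals are locked together by an exact formula, yet neither causes the other, because both are produced by a hidden driver.

```python
import math

# Hypothetical setup: a hidden driver h(t) produces both observed series,
# so x predicts y perfectly even though neither causes the other.
T = 50
h = [math.sin(0.3 * t) for t in range(T)]  # hidden variable
x = [2 * v for v in h]                     # observed series 1 ("barometer")
y = [v + 1 for v in h]                     # observed series 2 ("storm")

# y is exactly recoverable from x via y = x/2 + 1: a perfect "law" relating
# the two, yet intervening on x alone would do nothing to y.
max_err = max(abs(xi / 2 + 1 - yi) for xi, yi in zip(x, y))
print(max_err)  # 0.0
```

The equation y = x/2 + 1 fits the data with zero error, and still tells us nothing about what would happen under an intervention on x.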
It gets even murkier when you think of statements like "Hitler coming into power caused world war 2". There are so many things going on in that system that it couldn't possibly be true (e.g. if you change Hitler out for another person maybe world war 2 still happens), but works as a plausible line of causal reasoning for a lot of people.
For all practical purposes, you can write causality like
p(x|y) = 1 and time(y) < time(x)
I.e. causality is just when one event always happens after another event. Any additional requirements for causality are basically philosophy.
But typical ML systems don't construct networks of causal relations; that's basically what he's getting at, from my reading of it.
> p(x|y) = 1 and time(y) < time(x)
This isn't true at all. For a counterexample, x and y may both have a common cause.
Pearl's work is on this and extends the language to talk about p(y | do(x)), meaning that you talk about what happens to y when you take some hypothetical intervention to change x. Causation framed in terms of intervention talks about "what if it had been this instead of that?" and is probably the most common model of causation.
For more info, look up the Rubin causal model, the potential outcomes framework, and Pearl's "do" notation.
p(x|y) = 1, p(x|!y) = 0, and time(y) < time(x)
That rules out the rooster counter-example. If y is a boolean, I guess the only thing you can "do" to it is negate it.
But it at least rules out x causing y, which is something.
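The rooster case can be checked directly in a hypothetical world where dawn always comes and the crow merely precedes it: the naive "always follows" test is fooled, while the refined test fails its p(x|!y) = 0 clause and correctly rejects the rooster.

```python
import random

random.seed(2)

# Hypothetical world: dawn always comes; the rooster usually crows first but
# causes nothing. The crow precedes the dawn, so time-ordering is satisfied.
def day():
    crow = random.random() > 0.1   # rooster is silent (sick) 10% of the time
    dawn = True
    return crow, dawn

days = [day() for _ in range(100_000)]
p_dawn_given_crow = sum(d for c, d in days if c) / sum(1 for c, d in days if c)
p_dawn_given_silent = sum(d for c, d in days if not c) / sum(1 for c, d in days if not c)

print(p_dawn_given_crow)    # 1.0: the naive "always follows" test is fooled
print(p_dawn_given_silent)  # 1.0: not 0, so the refined test rejects the rooster
```

Dawn follows the crow 100% of the time, yet it also arrives on every silent morning, which is exactly the clause that rules the rooster out.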
This is in fact the case with the barometer falling before a storm. Both the falling barometer and the subsequent rain and wind of a storm are consequences of an uneven distribution of heat and moisture in the atmosphere approaching equilibrium under the constraints of Earth's gravity and Coriolis force.
Is the rooster causing the dawn?
These two scenarios have the potential to exhibit different behavior; probabilistic models without a notion of intervention or counterfactuals will only capture the former. But just like this 'selection' operator, you can define an analogous operator for selecting on the cases you intervened on; then you are in the realm of causality.
> But I’m asking about the future—what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step.
Nearly 10 years ago, a robot called Adam both made and tested hypotheses about yeast. Certainly not a general AI, or even an award-winning massive breakthrough, but it's a good step in a direction that he doesn't think exists yet.
Only a few ML algorithms, like decision trees, can show anything like a causal relationship today. It is very hard to do that in a neural network with multiple layers.
Also on the HN frontpage right now is a link to a Guardian article, "How to disappear from the internet", and the top comment there, about the commenter's difficulties dealing with the results of identity theft and credit card debt, also shows a complete lack of transparency.
Not only do banks etc keep their models secret from customers, they keep them secret from other departments. The credit risk strategy team, for instance, won't want to risk customer service staff 'helping' customers alter their application details to get their scores over a cut-off.
(I used to run credit risk strategy, fraud, collections, operations etc for two credit card companies)
A third party designed to help customers get approved could aggregate data across multiple customers, generate hypotheses of what changes would artificially lower the bank's perceived risk for a customer (which would also require it understand what sort of changes customers can make easily), and test those hypotheses to refine a model.
It could optimise for revenue, paying customers for information, and receiving income if it succeeds in getting them approved.
But in deep learning algorithms, features and coefficients are not determined by humans. In most cases, they can't even be understood by humans. Without this understanding, I highly doubt they will be accepted in regulated industries.
Wait. Doesn't a derivative mathematically define a causal relationship?
EDIT: never mind re: derivative = causal ... that's just a correlational relationship. dx/dt. Still, I'm curious as to what is special about the 3rd derivative (besides jerk).
What you really want, though, is the ability to decouple and infer relationships between short- and long-term features (something like the cepstrum transform from speech analysis).