Machine learning has become alchemy (2017) [video] (youtube.com)
253 points by henning 3 months ago | 124 comments



The problem with ML in my opinion is not that we're missing some sort of fundamental theory, but that there simply is none. ML is essentially fancy pattern matching roughly resembling the human visual system, which is why it happens to be good at tasks related to perception.

It's not some master algorithm, it's not going to produce sci-fi AI, and it probably isn't even suited to solve most problems in the realm of intelligence.

In fact it basically hits the worst possible spot on the problem solving scale. It barely learns anything given the amount of computational effort and data that goes into it, but it just happens to be good enough to be practically preferable to old symbolic systems.

It is completely mysterious to me how networks that approximate some utility function are a huge step forward to giving insight into cognition, reasoning, modelling, creation of counterfactuals and the sort of mechanisms we actually need to produce human like performance.


The annoying part is that ML is not sold to the world like this. I would say the truth (I believe) in this comment is the "dirty little secret" of our industry. Everyone working on it knows this but the research and VC dollars are flowing in so no one wants to talk about it too much.


Can confirm, worked at an AI hype company for over a year and built all their systems outside the NN internals. There is going to be a correction.


May I ask if this insight is why you left? What do you work on now?


Actually left due to anti-engineering culture (no tech leaders), succumbed to the distortions of Conway's law due to imbalance between professional services and core/platform.

That said, the nature of the lack of leadership was around the MBA mindset that data science is the be-all end-all of AI, which IME is so far from the truth.

There needs to be many layers/spirals of naive methods (i.e. straight ahead engineering) as a vanguard on the front lines with the DS/NNs/NLP bringing up the flanks for aspects of the domains that become more well understood over time.

So yeah, tail wagging the dog, cart before horse etc. If you just barrel ahead with a monolithic NN pretending that all classes are created equal (structureless blob), which is what our DS PhDs were doing, it will quickly ossify. In our case it was intent prediction so we got to pretty high accuracy, but the labels were indicating many different categories that actually had relations in nature that could not be expressed. Ironically it took some regular engineers to research HMC (hierarchical multi-label classification) and ensembles and implement a new model training framework to support them. Not sure what they teach in school but it doesn't seem to be very practical.
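A minimal sketch of the hierarchical idea, assuming a hypothetical intent taxonomy (the label names below are invented for illustration and are not the ones from that system). Predicting a leaf label implies all of its ancestors, which is the kind of relation a flat set of classes cannot express:

```python
# Hypothetical intent taxonomy: child -> parent (None marks a root).
PARENT = {
    "billing": None,
    "billing.refund": "billing",
    "billing.refund.partial": "billing.refund",
    "support": None,
    "support.outage": "support",
}

def with_ancestors(label):
    """Return the label followed by every ancestor up to the root."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain

def expand(predicted_labels):
    """Make a multi-label prediction consistent with the hierarchy."""
    expanded = set()
    for label in predicted_labels:
        expanded.update(with_ancestors(label))
    return expanded

print(sorted(expand({"billing.refund.partial", "support.outage"})))
```

A real HMC setup would also use the hierarchy during training; this only shows the consistency constraint at prediction time.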

EDIT: now working on messaging (SMS/MMS/RCS/ABC) with a focus on dialog management and logical rules with layering-in of progressively less naive stats methods strategically rather than blindly. It helps to have an existing revenue stream that can be leveraged to create even more value with AI features rather than the backwards hail Mary of not iterating from a foothold of existing traction.


Statistics is quite valuable and grounded in theory. It's a trusty tool if used with respect (like a knife).


Yeah, too bad the theory requires a number of things to be true for data and models ... that are essentially never true. Once you think about

1) is the independence assumption really applicable here ?

2) does the underlying model satisfy the law of large numbers ? (NOT your model, the underlying model, ie. "reality")

3) am I really only interested in predictions within the data range ?

The answer for all these questions is almost universally "no", and the theory says that statistics is not guaranteed to work under those circumstances.

Correct statistical answers:

1) which average percentage of borrowers will default ? There is no answer to that question, because many non-random influences will make borrowers default in synchronized, and therefore very much non-independent ways (say, an economic crisis, an earthquake, an epidemic). Therefore you cannot correctly even calculate an average.

2) which average percentage of borrowers will default ? A model that describes whether an individual will default is going to be non-differentiable, and therefore certainly violate the law of large numbers (there will be sudden jumps in the default rate everywhere because of a million random reasons. An idiot tv anchor declares now the time to sell, and half the town finds out their loan is underwater, say). So underlying models involving humans essentially never satisfy the law of large numbers.

3) which average percentage of borrowers will default ? The number of factors changing that value do not match between the period your data is from (ie. the past) and the period you're predicting. Because there are so many real-world factors that affect your variable you can never avoid this situation. Therefore you cannot predict.

All of these issues allow one to construct realistic scenarios where even trivial statistics will fail spectacularly.
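A rough simulation of the synchronized-default scenario, as a sketch only. The probabilities, the portfolio size, and the "crisis year" mechanism are all illustrative assumptions rather than real credit data. It compares yearly default rates when borrowers default independently with rates when an occasional common shock hits everyone at once:

```python
import random
import statistics

def yearly_default_rates(n_years, n_borrowers, base_p, crisis_p, crisis_chance, rng):
    """Simulate the portfolio default rate for each year.

    In a crisis year every borrower's default probability jumps to crisis_p,
    so defaults are synchronized rather than independent.
    """
    rates = []
    for _ in range(n_years):
        p = crisis_p if rng.random() < crisis_chance else base_p
        defaults = sum(1 for _ in range(n_borrowers) if rng.random() < p)
        rates.append(defaults / n_borrowers)
    return rates

rng = random.Random(0)
# All numbers below are illustrative assumptions.
independent = yearly_default_rates(200, 1000, 0.05, 0.05, 0.0, rng)
shocked = yearly_default_rates(200, 1000, 0.03, 0.25, 0.1, rng)

for name, rates in (("independent", independent), ("with shocks", shocked)):
    print(f"{name}: mean {statistics.mean(rates):.3f}, stdev {statistics.pstdev(rates):.3f}")
```

The two portfolios have similar long-run averages, but the year-to-year spread with shocks is far wider than the independence assumption would predict, so an average estimated from a few calm years says little about the next one.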


It's true, statistics can't answer every question. There are some limits.


I'd agree with this assessment 100% - ML can achieve some very impressive results - in my domain (computer graphics for movie VFX) with regards to image generation / identification / modification, e.g. denoising for the latter - however the result is almost always effectively a black box of functionality which:

A: isn't always understood what it is actually doing or how.

B: often needs to be retrained when input data varies slightly outside original training data - this is expected at a technical level, but is annoying in practice and often then starts deviating and losing its effectiveness or gets inconsistent results with larger less homogeneous input data sets.

C: Can't really (at least easily) be used as a stand-alone / packaged component/library to re-use nicely in other places as a conventional algorithm would be.


I like that all of the issues you bring up around ML can be read as applying to a human expert/craftsperson as well! (still with a wider margin, of course)


ML != Deep Learning.

Deep Learning is just a branch (or even a sub-branch inside NN) of many ML techniques. Not all of them suffer the same problems or are based in the same theoretical background.


> It barely learns anything given the amount of computation effort an data that goes into it, but it just happens to be good enough to be practically preferable to old symbolic systems.

I don’t follow this. Are you implying there haven’t been absolutely massive gains in computer vision, nlg, nlp, etc?


> Are you implying there haven’t been absolutely massive gains in computer vision, nlg, nlp, etc?

The implication is that the massive gains haven't been the result of any algorithmic breakthroughs, but rather have been due to the application of massive amounts of computational resources, which weren't available when current ML algorithms were invented. So far as I can tell, that's a true accusation. If you look at the papers coming out of Google and Facebook, they talk about throwing thousands of hours of specialized GPU (or even more specialized and expensive TPU) time at some of the problems. The advances have more to do with Moore's Law making brute force feasible than they have to do with algorithmic breakthroughs.


A small bone of contention: it’s not about Moore’s law per se, which has been ‘dead’ since about 2013, coincidentally when the deep learning revival started. It’s matrix multiplication ASIC development that is driving the progress.

GPUs already existed when the idea to use them to make the feasible size of neural nets larger came about. For a long time the drive for the increase of GPU compute power was still gaming/commercial graphics houses. It’s really only in the last 1-2 years that we’ve seen highly specialised GPUs with features like tensor cores (or indeed google’s TPUs).

Also, calling neural nets ‘brute force’ because they use a lot of computing power to train a model is slightly reductionist - a true brute force approach to image recognition, ie enumerating all possible combinations of, say, 200x200x256x3 pixels, would be completely absurd and probably exceed the computing power currently available on earth.
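To put a number on that enumeration, here is a small calculation. It reads "200x200x256x3" as a 200x200 image with 3 channels and 256 levels per channel, which is an interpretation of the comment rather than something it states:

```python
import math

width, height, channels, levels = 200, 200, 3, 256

# Each of the width*height*channels values can take `levels` settings,
# so the number of distinct images is levels ** (width*height*channels).
values_per_image = width * height * channels
log10_images = values_per_image * math.log10(levels)

print(f"values per image: {values_per_image}")
print(f"distinct images: about 10^{log10_images:.0f}")
```

The exponent itself has six digits, which is why exhaustive enumeration is not a meaningful baseline.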


I'm not saying that neural nets are brute force. I'm saying that there haven't been any algorithmic improvements in neural nets to make them more computationally feasible than they were when they were first invented. Instead, we have specialized hardware which can just do the necessary computation quickly enough to make neural nets feasible.

It's not like neural nets are a new technology. They've been known since the '80s, at least. It's just that they were considered a dead end, because we didn't have the computational resources to run deep neural nets, nor did we have sufficient training data to make neural-net approaches feasible. Once those preconditions were met, neural nets took off in short order.


There have - convolutional nets have far fewer weights to learn than an equivalent fully connected net of the same depth.
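A back-of-the-envelope count makes the difference concrete. The layer sizes here are illustrative assumptions (a 200x200 RGB input mapped to 16 feature maps of the same spatial size, with a 3x3 kernel), and biases are ignored:

```python
def dense_weights(in_h, in_w, in_c, out_h, out_w, out_c):
    """Weights in a fully connected layer mapping one feature map to another."""
    return (in_h * in_w * in_c) * (out_h * out_w * out_c)

def conv_weights(kernel_h, kernel_w, in_c, out_c):
    """Weights in a convolutional layer: one small kernel shared across all positions."""
    return kernel_h * kernel_w * in_c * out_c

dense = dense_weights(200, 200, 3, 200, 200, 16)
conv = conv_weights(3, 3, 3, 16)
print(f"fully connected: {dense:,} weights")
print(f"3x3 convolution: {conv:,} weights")
```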

And you literally described it as a brute force approach in your comment.


Those massive gains have yet to be considered reliable enough to be trustworthy. Would you consider them trustworthy in court, where lives are at stake? Gains are nice, but we are still so far from the essence of AI systems, and considering how many resources we are pouring into learning, at this point all of them appear to be nothing more than massive, fat, expensive toys.


> Would you consider them trusthworthy in court, where lives are at stake?

Probably. Human intelligence is extremely fallible - based on the statistics the only reason we trust humans to do half the stuff they do is because there is literally no choice.

If we held humans to a high objective engineering standard, we wouldn't:

* Let them drive

* Let them present their memories as evidence in a court case

* Entrust them with monitoring jobs

* Allow them to perform surgical operations

Humans are the best we have at those things, but from a "did we secure the best result with the information we had" perspective they are not very reliable. A testable and consistently performing AI with known failure modes might even be able to outperform a human with a higher failure rate (eg, we can reconfigure our road systems if there is just one scenario an AI driver can't handle).

Basically, you might be dead on the money that they are not 'trustworthy enough', but let's not lose sight of the fact that even being an order of magnitude from human performance might be enough after costs and engineering benefits get factored in. The weakest link is the stupidest human, and that is quite a low bar.


> Basically, you might be dead on the money that they are not 'trustworthy enough', but lets not lose sight of the fact that even being an order of magnitude from human performance might be enough after costs and engineering benefits get factored in.

Ironically, the thing that is lost in this comment would be "accountability". In the case of a human, you can go back, trace the decision-making criteria, and hold someone accountable. In the case of an algorithm, everyone washes their hands of it. Performance is not the only criterion for deciding whether algorithms are "trustworthy" over humans.


Linear models are highly interpretable and an operator can be held accountable.



I have the perspective of an informed layman as a programmer who hasn't messed with ML yet. Wouldn't the "safest" solution be a system with multiple algorithms and a consensus mechanism?


I believe some models do precisely that. Random forest ML as an example tallies "votes" on the outcome. I'm not sure how robustly multiple algorithms have been applied to this voting technique, but it would be an interesting read if anyone has information on it.


"Vote tallying" is basically taking a mode/weighted mode response rather than a mean/median/trimmed mean/weighted mean response. There are contexts where this is ideal (for example, classification in multiple unordered classes where mode is the only measure of central tendency that is even reasonable); cases where it's superfluous (in 2-class classification the mode is the median is the sign of the mean); and cases where it's bad (in regression with a continuous outcome where the modal prediction has probability exactly 0). So really it depends on the space where you want to use it.

Typically in a regression setting with an ensemble learner you're using a kind of weighted mean, where the weights are selected based on cross-validation performance. This is sometimes called a "super learner". See van der Laan, Polley, and Hubbard 2007.
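The vote-tallying and weighted-mean ideas above can be sketched as follows. All votes, predictions, and errors are made up, and the weighting scheme (inverse cross-validated error) is a deliberate simplification; it is not the weight-selection procedure from the super learner paper, which chooses weights by minimizing cross-validated risk:

```python
from statistics import mean, median, mode

# Unordered classes: the mode is the only sensible summary of the votes.
multiclass_votes = ["cat", "dog", "cat", "bird", "cat"]
majority = mode(multiclass_votes)

# Two classes coded -1/+1 with an odd number of voters (no ties):
# mode, median, and sign of the mean all agree.
binary_votes = [1, -1, 1, 1, -1]
sign_of_mean = 1 if mean(binary_votes) > 0 else -1

def inverse_error_weights(cv_errors):
    """Turn cross-validated errors into weights summing to 1 (lower error, higher weight)."""
    inverse = [1.0 / e for e in cv_errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_prediction(predictions, weights):
    """Combine per-model predictions for one example into a weighted mean."""
    return sum(p * w for p, w in zip(predictions, weights))

# Hypothetical cross-validated mean squared errors for three base learners,
# and their predictions for one new example.
weights = inverse_error_weights([0.5, 1.0, 2.0])
combined = weighted_prediction([10.0, 12.0, 16.0], weights)

print(majority, mode(binary_votes), median(binary_votes), sign_of_mean)
print(weights, combined)
```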

Note that this suffers from being similarly awful in terms of theory as a lot of other ML stuff. It is not, for example, the case that two apparently similar datasets or problem domains will produce similar super-learner weights. Which is disturbing, because it's easy to believe that say SVM does better at X and penalized ordered logit at Y, but it's hard to believe that they both do better seemingly at random.


Yeah. I'm coming from the same background, but with some experience orchestrating these systems in production. AFAIK a bunch of the submissions to the various "AGI" awards like Alexa prize use ensembles of models and some way of weighing each one based on a context in order to choose which classifier to trust in a particular scenario. E.g. MILABOT.

There is so much more than one monolithic NN since those are easy to saturate in terms of precision and recall with enough training data and features, but are not enough to provide good UX in any complex domain/ontology. So it makes more sense to have many different models trained on each subdomain/taxonomy so that each can be specialized and then combined orthogonally.

Then the question becomes "how do we orchestrate them?" Well, there is a lot of research from the 80s and 90s that kinda got left by the wayside due to hype cycles (see the last "AI winter"). My faves are Collagen and Ravenclaw. And there is a lot of literature around topic frame stack modelling, which can be combined with various expert systems or other logics. I am currently using CLIPS (PyKnow) with custom Ravenclaw implementation. I believe b4.ai is doing something similar without the logic/rules engine, and actually applying ML to topic selection as well. My systems are goal oriented so I like to give them a teleology for business reasons, which would not suffice for AGI ambitions.

TLDR data science isn't enough on its own. We need engineers to architect things properly to solve problems.


Humans are often not considered trustworthy and reliable in court.


What does that have to do with engineering?


Trustworthiness in court was advanced as a metric for AI success at which current methods are failing.

> Would you consider them trusthworthy in court, where lives are at stake?


>> Those massive gains have yet to considered reliable enough to be considered trusthworthy.

We're using them at Generic Health Insurance Megacorp in production - lots of enterprises are. If you are in the IT industry, it might be useful to spend some lab time with ML. Possibly you have a misconception of ML and/or confuse it with AI.


Going all in on the "unintelligible black box" model of health insurance, I see.


Would you please stop being rude to others in HN comments? I'm sure you can express your substantive points without this.

https://news.ycombinator.com/newsguidelines.html

Also, please don't make single-purpose accounts on HN.


So, long story short, when groups were talking about governmental death panels, they in actuality were black box AIs that we have no understanding of, yet they make the core decisions and recommendations?

Indeed...


I think the phrase "governmental death panels" is a rather loaded, politically partisan phrase. Any system of insuring folks includes people who decide what treatments will be approved and what won't. And in the realm of universal health care systems where decisions like that must be made (as they're made in any insurance company), something like the UK's quality-adjusted life year (QALY) is, if imperfect, not a wholly unreasonable concept that doesn't include black box AI.


Geezus, not that type of health insurance. We're an insurance provider over a century old, blue cross and blue something or other. We're not in the business of death panels, and we have similar feelings towards the tin-foil-hat wearing masses that Catholic priests likely have towards people who assume since some priests are pedophiles, they must all be pedophiles. Some insurance companies (in the minority) are driven by greed and have a reputation that reflects that. Not all priests are pedos, and not all (or any that I've heard of) insurance providers are murderous thugs trying to extract every last cent of wealth from their customers. We're a non-profit organization for that matter. If we somehow extracted a ton of wealth, we wouldn't be allowed to keep it anyway.

We have some strict (and, for profit-centric people, irritating) governance... one of the more interesting pieces of governance is called the "85/15 rule", which, roughly translated, means that if we take in $100, the government mandates we use $85 of it to pay your costs, and $15 to pay our staff, light bills, and any other expenses we have. If we end up only using $80 to pay your costs, we have to refund the remaining $5 to your group plan.
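The arithmetic of that rule, as described in the comment, fits in a few lines. This is a simplified sketch of the comment's example only; the function name and the cents-based representation are assumptions, and the real regulatory calculation has more detail than this:

```python
def rebate_cents(premiums_cents, claims_cents, required_ratio_pct=85):
    """Refund owed when less than the required share of premiums went to care.

    Amounts are integer cents to avoid floating-point rounding.
    """
    required_cents = premiums_cents * required_ratio_pct // 100
    return max(0, required_cents - claims_cents)

print(rebate_cents(100_00, 80_00))  # the $100 taken in / $80 spent case from the comment
print(rebate_cents(100_00, 90_00))  # spending above the threshold owes nothing
```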

Here's the obvious secret about health insurance that people like to have conspiracy theories about..I can't speak for other institutions in other countries, however...our stance is really simplistic: you can't pay premiums if you are not alive, therefore it is in our mutual interest for you to remain alive. All the conspiracy theories such as "but you don't want that cancer patient in your insurance group plan!" are just that..conspiracy theories. We absolutely do want that person in the group, because then that group's rates go up! The costs for that patient's care are more or less fixed(and known), built on the assumption of a terminal outcome. We're going to pay for it anyway, and try to make that miserable experience as pleasant as possible for everyone involved. That type of service is how you get repeat business, and a good reputation.

This likely falls on deaf ears. Feel free to return to the zealous insurance hatred, and I'm going to return to writing code. Not for death panel machines. Promise.


Thank you for responding, firstly.

I'll only make a small comment about Catholic priests, since I was at one time a Catholic. The problem wasn't just that some of the priesthood was into pedophilia, but that when the church was made aware, it actively covered it up.

> one of the more interesting pieces of governance is called the "85/15 rule"

If I remember correctly, that was passed via the PPACA. And it is also in jeopardy with the continual "repeal and replace (with nothing)" procedures since the PPACA's passage and SCOTUS failed challenge to dismiss. I believe there is a current federal court case with 20 states or so suing on grounds of constitutionality. And with the makeup of SCOTUS now, has a good chance of having the whole law deemed unconstitutional.

> Here's the obvious secret about health insurance that people like to have conspiracy theories about..I can't speak for other institutions in other countries, however...our stance is really simplistic: you can't pay premiums if you are not alive, therefore it is in our mutual interest for you to remain alive. All the conspiracy theories such as "but you don't want that cancer patient in your insurance group plan!" are just that..conspiracy theories. We absolutely do want that person in the group, because then that group's rates go up! The costs for that patient's care are more or less fixed(and known), built on the assumption of a terminal outcome. We're going to pay for it anyway, and try to make that miserable experience as pleasant as possible for everyone involved. That type of service is how you get repeat business, and a good reputation.

My anger, as well as many other peoples' anger, is the fact that this system is opaque. I go to a doctor, and have procedure/drug prescribed, and there's this song and dance about "preapproval", "permission" and all other sorts of roadblocks. Whether the medical insurance company is for/non profit doesn't matter too much to me. All I know is that the medical insurance is sitting between me and my doctor and making decisions about my care without a medical degree and no patient-doctor association.

And the moment medical insurance is taken out, the prices go up tenfold. That's not the medical insurance companies' fault... But that's the end result for us. And medical insurance companies become de facto arbiters of patients' health. Again, when questions are shoved in this black box, magic answers come out.

And what I was criticizing is that the use of AI in this context means that the decisions are now truly black-boxed, rather than just a process run by actuaries. That was subpoena-able and discoverable. The fact that some neural network algo was trained on GBs of data and outputs magic weights of "accept or deny" is anathema. Those decisions should be understandable. Those decisions should be defensible (as long as we have a profit-based medical system).

Even decision trees would show traceability of how an input got the appropriate result. And if there were questionable or illegal things in there, then they could be challenged or changed.
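As a sketch of what that traceability looks like, here is a hand-written two-level tree that returns the path it took alongside the decision. The rules, field names, and thresholds are entirely hypothetical and are not any insurer's actual criteria:

```python
def decide(claim):
    """Return (decision, trace), where trace lists every rule that fired."""
    trace = []
    if claim["in_network"]:
        trace.append("in_network is True")
        if claim["cost"] <= 1000:
            trace.append("cost <= 1000")
            return "approve", trace
        trace.append("cost > 1000")
        return "manual review", trace
    trace.append("in_network is False")
    return "manual review", trace

decision, trace = decide({"in_network": True, "cost": 2500})
print(decision)
for step in trace:
    print(" -", step)
```

Each outcome comes with the exact conditions that produced it, so a questionable rule can be pointed at and challenged.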

> This likely falls on deaf ears. Feel free to return to the zealous insurance hatred, and I'm going to return to writing code. Not for death panel machines. Promise.

Not at all. I do have grievances with how the US does medical, and insurance is only one part of the whole. I come from a point that we should have health provided by tax dollars. We as a nation already spend triple what France does per capita, yet only a small fraction gets care. Simply put, too many people slip through the cracks. I wouldn't say it's a zealous hatred. It's a well informed long-stewing anger that people who are ill can't get help/fixed.

Thank you for the discussion :)


There’s going to be people plugging their ears and shouting “but it’s just nonlinear function approximation!” all the way into the singularity.


Human intelligence involves choosing which context to use at any given point. So far, so called AI appears to take an external source of context for granted, which is why it seems to me fundamentally different from "real" intelligence, just like your eyes are different from your entire nervous system.


Every time there is some progress in AI people would yell "but this is not a real AI 'cause humans can..." ( play go/understand pictures/language/semantics/etc, goalpost is always in motion)


AI carves away what it means to be human one trivial slice at a time.

That's how it always has been and always will be until perhaps one day there is truly nothing 'truly human' left to slice off.


[flagged]


This comment crosses into incivility. Please don't do that on HN, regardless of how right you think you are compared to someone else.

https://news.ycombinator.com/newsguidelines.html


Understood and point taken. Thank you.


As if humans in cars do not make mistakes. Last time I checked, humans have intelligence.


Classic regression is only intelligible because there were only a few parameters and people could use ANOVA to try and interpret them. IMO ANOVA is alchemy as well and most people trained to use it, don’t fully understand it. Should we leave decisions to those kinds of models?

Not to mention that part of what makes NN such a step forward is precisely the high nonlinearity. When you have millions of parameters, the contribution of each is unimportant and so any kind of analog to ANOVA would be barking up the wrong tree. In a NN all those parameters work in concert to learn a decision boundary to separate data. It’s not intelligible at the minute level but at least we know what it’s doing in the end. The problem of course lies with the outliers and that’s not so much a problem with NN being a black box as it is a problem with the nature of large datasets and our own inability to rationalize each and every datapoint.

I’m going to defend NN as the natural evolution of regression. It’s precisely their high nonlinearity which makes them better. The problem is not that they “are” alchemy but that we treat them “as” magic. Society has no place leaving important decisions to algorithms unattended, NN or not. If major insurance companies left their decisions to logistic regression (which was and is still the case), then would we be making the same arguments? Probably not because someone paid by the insurance company will pull out that other kind of alchemy called ANOVA ...


Classic regression's potential for intelligibility relies on the modeler choosing terms which are based on observed relations and improve performance. For example, population forecasts can be modeled as the current population plus forecasted changes through immigration, emigration, birth, and death.
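The population example is a model where every term has a real-world meaning. A minimal sketch, with made-up figures:

```python
def forecast_population(current, births, deaths, immigration, emigration):
    """Next-period population: current population plus the four forecast flows."""
    return current + births - deaths + immigration - emigration

# All figures are made-up illustrative values.
print(forecast_population(current=1_000_000, births=12_000, deaths=9_000,
                          immigration=5_000, emigration=3_000))
```

If the forecast is off, each input can be inspected on its own, which is the sense of "intelligible" being argued for here.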

Yes, somebody can produce "magical" regression models with terms divorced from reality. But unsupervised learning is never guaranteed to produce a reality-grounded model, no matter the user's skill. It goes through steps, tries different transformations, and chooses the algorithm which led to the best result. Logic and understanding played no part. That sounds like alchemy. It definitely works, but alchemy also stumbled onto theories later explained by chemistry.

And, yes, anyone who doesn't consider the implications of the linear model behind ANOVA would also be a "magician."


ANOVA is grounded by theory that has stood the test of time for decades. Statisticians that question ANOVA aren't a thing (despite such an accurate criticism being valuable career-wise to an academic). Gelman points out that multilevel models are easier to use than ANOVA and I tend to agree.


If it happens and doesn’t involve more than the current DL approach.


Singularity be like fusion... Always far enough away to be hinted at from thin evidence


> ML is essentially fancy pattern matching roughly resembling the human visual system, which is why it happens to be good at tasks related to perception.

It seems that an application like cancer diagnosis - feeding in test results and medical information to a NN, it finding patterns better than any human could, has nothing to do with human perception systems. It's just much better and faster at detecting patterns in complex things than humans are.


>It barely learns anything given the amount of computation effort an data that goes into it, but it just happens to be good enough to be practically preferable to old symbolic systems.

This is the part that's always bothered me. For many ML has just become the default tool to throw at a problem where other solutions (albeit less shiny) exist, that might give the same or even better results (and are very likely going to end up being more efficient). Would it be too much of a leap to suggest that we're forgetting to think?


The moment I realized that Deep Learning is nothing more than a non-linear matrix operation I lost a big deal of respect for the field. I still believe the potential of Deep Learning is huge, since many, especially larger companies, have lots of data that just sits there waiting for some innovation. DL can deliver more productivity, i.e. higher quality, faster processes etc. That is great. But this has very little to do with the fancy sci-fi version of AI.
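For readers who have not seen it spelled out, the "non-linear matrix operation" is roughly the following. The weights are tiny hand-picked values for illustration, and ReLU is just one common choice of nonlinearity:

```python
def relu(vector):
    """Element-wise nonlinearity: negative values become zero."""
    return [max(0.0, v) for v in vector]

def dense_layer(weights, bias, inputs):
    """One layer: matrix-vector product plus bias, then the nonlinearity."""
    pre_activation = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]
    return relu(pre_activation)

# Two stacked layers with purely illustrative weights.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]

hidden = dense_layer(W1, b1, [2.0, 1.0])
output = dense_layer(W2, b2, hidden)
print(hidden, output)
```

A deep network repeats this step many times; training is the search for weights that make the final output useful.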


While it's true that the basics of Deep Learning (eg ImageNet) are not that conceptually interesting, it is also worth noting people are working on tons of interesting directions within this framework combined with other ideas from AI (see eg 'Learning to Reason with Third-Order Tensor Products' - https://arxiv.org/abs/1811.12143 | 'Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs' - https://arxiv.org/abs/1812.02788)


It is a misconception that ANNs are based on human visual processing. Maybe "inspired by" is a better term. The 1959 paper "What the Frog's Eye Tells the Frog's Brain" disproved the belief that ANNs have anything to do with real neural networks.


But it's a great brand name.


Maybe the concepts of “cognition, reasoning, modelling, creation of counterfactuals” etc. were not well-formed to begin with. Maybe intelligence, like gravity, is nothing like what we came up with using intuition. The fact is that black-box function approximators regularly beat systems designed around intuitive notions of cognition on observable metrics. Yes, the function approximators are bad. But the fact that they do better than more intuitively-satisfying symbolic systems suggests that our intuitive ideas about intelligence are even worse.


Wait, what? Are you confusing ML with AI? The last two courses I took on ML were all theory. For example, when we use linear regression, we make assumptions about the noise — i.i.d. and Gaussian.
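For reference, the textbook link between that noise assumption and least squares can be written out as below. The notation is the standard one and is not taken from the thread:

```latex
y_i = x_i^\top \beta + \varepsilon_i, \qquad
\varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)

\log L(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top \beta\right)^2

\hat{\beta}_{\mathrm{ML}}
  = \arg\max_{\beta} \log L(\beta, \sigma^2)
  = \arg\min_{\beta} \sum_{i=1}^{n}\left(y_i - x_i^\top \beta\right)^2
  = \hat{\beta}_{\mathrm{OLS}}
```

Under this assumption the maximum-likelihood estimate coincides with ordinary least squares, since only the sum-of-squares term depends on the coefficients.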


I believe OP is speculating about state-of-the-art ML. The classical techniques, like linear regression, are well studied. However, at this point, there is a race to explain theoretically _why_ deep learning is so successful at generalization when, by classical standards, it shouldn't be. OP is speculating that this quest is in vain.


I agree with your interpretation and pushback against your parent comment, but wanted to elaborate that the "problem" here is much deeper. Deep learning is basically magic, okay, fine, so let's start much closer to linear regression.

Say we have a predictor matrix X with 2 predictors. We fit a model using a penalized linear regression (say LASSO), adding to our predictor matrix interaction terms, arbitrary polynomial and logarithmic transformations of each X, and interactions between the transformations of the Xes. Ideally we motivate this because of some case knowledge about relevant nonlinear transformations of the predictors. Or maybe second best we use the kernel trick to run a kernelized regression that uses an infinite dimensional prediction of all possible transformations of the predictors. But realistically, we toss some shit in the model and run it.

The LASSO spits out X1, X2^2, and X1^3 * log(X2) as being the cross-validation selected non-zero parameters.
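The "toss everything in" step can be sketched like this. The particular set of transformations (squares, cubes, logs, and all cross-predictor products) is an assumption chosen to match the example, and the LASSO fit itself is left out; a penalized solver would be run on rows like this one:

```python
import math
from itertools import product

def transforms(name, value):
    """Polynomial and logarithmic transformations of one positive predictor."""
    return {
        name: value,
        f"{name}^2": value ** 2,
        f"{name}^3": value ** 3,
        f"log({name})": math.log(value),  # assumes value > 0
    }

def expand_features(x1, x2):
    """One candidate design-matrix row: all transformations plus cross-predictor interactions."""
    t1, t2 = transforms("x1", x1), transforms("x2", x2)
    row = {**t1, **t2}
    for (name_a, a), (name_b, b) in product(t1.items(), t2.items()):
        row[f"{name_a}*{name_b}"] = a * b
    return row

row = expand_features(2.0, 3.0)
print(len(row), "candidate features from 2 predictors")
print("x1^3*log(x2) =", row["x1^3*log(x2)"])
```

Two predictors already yield two dozen candidates, and the selection procedure has no notion of which of them make causal sense.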

What real world scenario could possibly generate a causal process that is linear in X1 (say income), quadratic in X2 (say age, which often displays quadratic forms in regressions), but also predicted by a bizarre non-linear interaction of nonsense transformations?

What a practitioner would probably do is fit the model. In a lot of ML contexts, interpretation would lead to the practitioner saying "Well, okay, ML sometimes produces nonsense models, but you can't argue with the predictive results". Or maybe the practitioner is more sensitive to interpretability and instead takes another tack. Maybe the practitioner might say "clearly this interaction is nonsense, but there must be some interaction, I'll re-run with a linear interaction". Or else they'd re-run the LASSO with conditions about not including nonlinear terms without including the lower dimensional terms. Or else they'd run a grouped LASSO and make up some justification for the groups. All of these reveal that most ML practitioners are basically just doing alchemy.

And this is talking about what amounts to a minor version increment of linear regression, so probably the simplest possible technique we'd still call part of ML.


You're spot on. Another point I like to raise is that predictors are often extremely redundant in terms of information. The implication being that nearly as good predictions might be made with three predictors vs thousands.


Would you describe AI as: automatic categorization and decision making?

If so, then Machine Learning is a part of modeling AI. Regardless of how they are taught in terms of University lectures.


I'd describe AI/ML as statistics by another name.


I think this is meaningful.


Thanks! It took a lot of education and practice in statistics to realize this.


In the context of linear regression, there is no particular reason to assume noise is i.i.d. and Gaussian. The former is part of the Gauss-Markov conditions, under which OLS is BLUE (the "best linear unbiased estimator"). The latter is not necessary at all. And any of the Gauss-Markov conditions can be violated to varying degrees of consequence.

In fact, in real data, these assumptions are almost always violated. The Gaussian assumption doesn't matter at all, but to address the i.i.d. assumption: Almost all real data exhibits residual heteroskedasticity and almost all real data has observable clustering. Which is why almost no one uses OLS with classical errors. We have estimators to allow errors to be heteroskedasticity-consistent (the default in STATA and easily estimated in R e.g. by estimatr, clubSandwich, etc) or cluster-robust or both. By definition these cases have non-i.i.d. errors and there's no reason linear regression can't be used with them.
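
As a rough illustration of the difference between classical and heteroskedasticity-consistent errors, here is a sketch for a one-predictor regression in plain Python. It does not use Stata, estimatr, or clubSandwich; the function name, the toy data, and the choice of the simplest HC0 (White) variant are assumptions for illustration.

```python
import math

def slope_with_standard_errors(x, y):
    """OLS slope for y = a + b*x with a classical and an HC0 (White) standard error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical: assume one common error variance for every observation.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # HC0: let each observation contribute its own squared residual.
    se_hc0 = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_hc0

# Toy data (assumption): true slope 2, error spread grows with |x|.
x = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
errors = [2, -2, 1, -1, 0, 0, 1, -1, 2, -2]
y = [2 * xi + ei for xi, ei in zip(x, errors)]

slope, se_classical, se_hc0 = slope_with_standard_errors(x, y)
print(slope, se_classical, se_hc0)
```

On this toy data the slope is 2, the classical standard error is about 0.354, and the HC0 one is about 0.412: the point estimate is the same, only the stated uncertainty changes.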

We also don't need to make assumptions, these can be interrogated. Most regression relies on using the residual matrix as sample plug-ins for the underlying error matrix, so there's a wide assortment of diagnostic techniques to check for the presence or absence of those assumptions.

Insofar as "machine learning" has any meaning -- which is to say, insofar as it is different than "statistics", the difference is purportedly that it focuses on minimizing out of sample prediction error rather than estimating population parameters, and typically this is motivated as an overfitting problem.

We use OLS because OLS is BLUE under the Gauss-Markov conditions. In ML we rarely care about "U" (unbiasedness) because we frequently prefer to make a bias-variance tradeoff if we're aiming to minimize out of sample error. When linear regression is used in an ML context it is typically penalized linear regression (i.e. ridge / LASSO). Of course it's also the case that the bulk of sexy ML results come out of non-linear estimators, and absent a need to characterize population parameters there's no real reason to care about interpretability so really we don't care about the "L" either.
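
For reference, the tradeoff mentioned above can be written out. The notation is an assumption, not from the comment: $\hat\theta$ is any estimator of a target $\theta$, $X$ is the predictor matrix, $y$ the outcome, and $\lambda \ge 0$ the penalty weight.

```latex
\mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \underbrace{\operatorname{Var}(\hat\theta)}_{\text{variance}}
  + \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2}

\hat\beta_{\text{ridge}}
  = \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y

\hat\beta_{\text{LASSO}}
  = \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

With $\lambda = 0$ both penalized estimators reduce to OLS. For $\lambda > 0$ they accept some bias in exchange for lower variance, which is the sense in which the "U" in BLUE stops mattering.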

I would say the grandparent is closer to right. Often in ML there is a view that we throw a bunch of processes at data, pick the thing that works best, don't care why it works at all, and then run with it. To the extent there's a protection against fishing expeditions, it's in the training/test separation or cross-validation or both.
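
A bare-bones sketch of that "try several things, keep what predicts best" loop, in plain Python. The two candidate models, the interleaved fold scheme, and the toy data are assumptions for illustration; real workflows would compare far richer models.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold f holds out every k-th index starting at f."""
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        yield train, test

def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return lambda x: ybar + slope * (x - xbar)

def cv_error(fit, xs, ys, k=5):
    """Mean squared error on held-out folds only."""
    total = 0.0
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - model(xs[i])) ** 2 for i in test)
    return total / len(xs)

# Toy data (assumption): a line plus a small deterministic wobble.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 3 == 0 else -0.25) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: cv_error(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The selection step never asks why the winner works; the held-out folds are the only guard against picking something that merely memorized the training data.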

Most of the time when someone talks about "regression theory", they're using 30- or 40-year-old results. For an updated look, check out "Foundations of Agnostic Regression" (Aronow and Miller, both Yale Political Scientists) which is coming out some time in 2019. They've had a pre-print around for a while and if you're interested I'm sure you could get one.


For the benefit of laypeople like myself:

- IID: "independent and identically distributed", https://en.wikipedia.org/wiki/Independent_and_identically_di...

- OLS: "ordinary least squares", https://en.wikipedia.org/wiki/Ordinary_least_squares (I think)


Yeah, to be clear, the discussion you're responding to is pretty out in the weeds. The great-grandparent to your comment raised the imho well founded objection that many uses of machine learning "work" (produce good out-of-sample predictive accuracy) but we don't know "why". We have generated a distinct lack of theory. We know very little about assumptions and how they are violated. Separately, we know very little about why one technique works and another doesn't in a given context. We've also done very little thinking about how practitioners should deploy techniques except that predictive accuracy is good.

The grandparent to your comment was raising an objection that, actually, linear regression, a very old technique which in plain English means fitting a straight line to a scatterplot (but in any number of dimensions), has a great deal of theory around it. The simplest form of solving a linear regression is by "ordinary least squares" (minimizing the sum of squared deviations from the fit line): OLS.

The grandparent was correct that especially in the mid-20th century to late 20th century, a lot of people did work on the conditions under which OLS works. What "works" means in a statistical sense is that it's efficient (has low uncertainty about the correct estimate), unbiased (on average gets the right answer), consistent (as you have more and more data gets closer to the right answer). Under a set of fairly impossible conditions about the real world data generating process, OLS is "BLUE" (the best linear unbiased estimator). Best here refers to efficiency, and unbiasedness I've already explained. OLS divides the data into structural elements (things that can be explained by the predictors you put into the estimator) and stochastic elements (the noise left over -- the deviations from the line). If we specify the correct model, the stochastic elements are the underlying stochasticity in the universe. If we specify an incorrect model, some of our omitted structural elements get put into the estimation of the noise.

The grandparent noted that two assumptions made in linear regression are that the underlying stochastic disturbances in the data are i.i.d. (each is a random draw from the same distribution) and Gaussian (form a normal / bell curve). These are not assumptions, these are conditions under which OLS is BLUE. The latter is not a necessary condition at all, the distribution can take any form. The former is the most succinct way to express one of the conditions.

My comment was to raise that actually when we use linear regression in the world, we rarely use classical OLS. In the real world underlying disturbances differ between observations. Imagine if I am running a regression on cross-country data, but while my US data is very precisely measured thanks to the widespread availability of polling firms, my Mexican data involves census enumerators going to rural villages. We might imagine that all of my US data is more precisely measured than all of my Mexican data, so we would expect the underlying stochasticity to differ between country. This is called clustering. Also, because we almost certainly do not have the correct model for the data (say log-dollars income predicts the result, not dollars income, but I put in dollars income), it can be the case that observations with higher values of our predicted Y also have more uncertainty. This is called heteroskedasticity. But the good news is we have answers to both, we just don't use OLS, we use more modern estimators. Yay!

In general the world has moved away from rigorously teaching the conditions under which OLS works and toward teaching more flexible estimators that work under less restrictive conditions. And in general in ML, people aren't using anything that looks anything like OLS, because ML has specific goals OLS is inappropriate for -- namely minimizing overfitting and out-of-sample error, where OLS is designed to maximize the precision of estimates of the slope parameters (how a given predictor affects the outcome). So all the work in OLS theory doesn't really translate to a machine learning setting, where many methods have no theory at all.

Hope this is a plainer English version.


That's a huge help, thank you!


> It barely learns anything given the amount of computation effort and data that goes into it

You know what else barely learns anything given the amount of computation that goes into it? A kid.

With 100 billion neurons -- so on the order of a hundred gigabytes of RAM -- after 72 months of closely supervised learning they're still far from being able to do many rudimentary tasks. It takes over 10 years of training to do that.

Something tells me most ML researchers wouldn't be too unhappy with such an awful performance.


there is huge work being done right now on the theoretical backing of deep learning.

classical machine learning has huge, sound theoretical backing.

scaling well is extremely important, and is the main advantage of deep learning. as planes don't need a bird's feathers to be useful, deep learning is already successful enough to justify its use.

no one serious claims that deep nets are biologically plausible. they are still promising, and useful, powering translation, phone keyboard autocomplete, search, camera stuff like low light mode and cell phone portrait mode, medical imaging diagnosis, malware detection, new planet detection in astronomy, plasma modeling in energy engineering, etc


It is simple to say deep learning is based on "alchemy" or "engineering" or whatever it is that isn't strong theory. And it's reasonable to say deep learning has a lot of mathematical and statistical intuitions but doesn't have a strong theory - maybe just doesn't yet have a strong theory or maybe can never get one.

So this is by now a standard argument. The standard answers I think have been:

1) Well, we are discovering that you don't need a strong theory for a truth-discovery machine. We've discovered experimental thinking construction.

2) It's true deep learning doesn't have a strong theory now but sooner or later it will get one.

3) This shows that instead we need theory X, usually by people who've been pursuing theory X all along. But I think Hinton at least has thrown a bunch of alternatives against the wall over the years.

(I could swear this has appeared here before but I can't find it).


Worth also bearing in mind that we've been here before in other fields. Alchemy ultimately became chemistry.

Even in the Victorian era when, for fairly large swathes of the periodic table, and different types of compound, we already had a quite good experimental understanding of chemical reactions in terms of their constituent components and products, along with the conditions under which those reactions occur, we still didn't know the why. We didn't understand much about atoms or how they bond together, for example.

The point is this: science can often take a long time to advance, and AI is still a very young field, with the first practical endeavours only dating back to the post-WWII period.

Should we therefore be terribly surprised that ML seems a bit like alchemy?

As an aside, another normal facet of scientific advancement is the vast quantity of naysayers encountered along the way. Haters gonna hate, I suppose. (But don't misunderstand me: whilst I'm not an ML fanboi, I recognise that advances come in fits and starts, dead-ends will be encountered, and overall it's going to take quite a long time and require a lot of hard work to get anywhere.)

Final aside: this video has definitely been posted here before but I've also been unable to find it.


> Worth also bearing in mind that we've been here before in other fields. Alchemy ultimately became chemistry.

Indeed but that might be where the analogy breaks down.

I think one can say that we just don't know how far we can take "experimental computer science" - where the experimental part is making "random" or actually "seat-of-the-pants" programs and seeing what they do. This is simply new, and whether one could create a physics on top of this particular kind of experimentation is yet to be seen.


I don't think it got as much traction, last time.


Even granting the claim that ML has become alchemy, consider what alchemy was: Misguided in its quasi-magical underpinnings and goals, but nonetheless an extremely important step in the founding of modern chemistry. And so what started out with poor understanding and numerous misconceptions evolved over a period of time, through trial and error and hard won knowledge into a massively important branch of science. And so if ML is indeed comparable to alchemy, the only salient question is how long, and what will it take, to turn into chemistry.


Science is a combination of theory and experiment. Sometimes theory advances faster than experiment, sometimes vice versa. Right now in ML, experiment aka practice is advancing faster than theory. Theory will eventually catch up.


I think a valid concern is that ML methods are being applied in critical, real-life scenarios without some practitioners being aware of flaws (bias, adversarial attacks, privacy issues) and without any theoretical safety net that helps them reason about how these systems will behave. James Mickens discussed this recently in a keynote: https://www.usenix.org/conference/usenixsecurity18/presentat...

Maybe the only way to make steady progress here is to blaze ahead and rely on empirical evidence, whilst the theory is inevitably fleshed out. That's often the counter argument - that we do not know how the brain works, but rely on it nonetheless.


We don't rely on the brain though, at least not on any single one. Any system that relies on human brains alone without cross-checking or, ideally, much simpler automatic systems, will eventually malfunction terribly. A large organization never wants to rely on a single person's judgment for anything, a programmer wants automated systems checking their work, etc.


We rely on a single human to drive a car.

In medicine, machine learning systems will work alongside other brains.

What are the instances you’re imagining where a group of brains running an important system are replaced by a single machine learning algorithm running in isolation?


Yeah, and around 800 people a day die in auto accidents in the US. Bringing up cars bolsters my point, which is that "we rely on the brain without understanding it and that works out fine" is not a good argument. So let's be careful. I'm mystified as to why that seems to be controversial.


Are you saying that we should require 2 drivers per car? It seems like at the moment the risk/reward ratio of allowing people to drive a car after getting a license seems to work out well enough. 1 death per 100 million miles driven, and falling.


No. I'm saying

> that "we rely on the brain without understanding it and that works out fine" is not a good argument.

It wasn't my idea to bring cars into it particularly.


one would hope so!

the depressing thing about machine learning to me is the many convincing explanations of phenomena, that then turn out not to explain things (all the explanations of why dropout is effective, or why there are adversarial examples, things like that.)

this is where the comparison to alchemy is the sharpest in my opinion. The alchemists had extremely sophisticated theories that they used to provide explanations for the phenomena they observed; just ultimately none of them made any sense.


I think people end up reaching for straws because a lot of this stuff works well in practice and they’re trying to make sense of it. 20 years from now we’ll have a much clearer picture of why the techniques work! Until then, a lot of the explanations, maybe even some valid ones, may sound like baloney!


ML is a souped-up version of "draw a line through these points". We can get really efficient at drawing lines through points, but it's not like we'll suddenly realize some deeper fundamental theory about it!


But there actually is a huge amount of theory behind that problem. You can exactly derive the method that finds the best line. You can get error bounds on each of your coefficients and confidence intervals for them. You can alter the strength of your assumptions (e.g. about distribution of errors, homoskedasticity, and so on) and see how it affects your model. You can add L1 or L2 regularization, both of which also have solid theoretical grounding. And so on.
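
For readers who have not seen it, the derivation referred to above is short. The notation is an assumption, not from the comment: $X$ is an $n \times p$ predictor matrix with full column rank, $y$ the outcome, and $\sigma^2$ a common error variance. The variance formula assumes uncorrelated, homoskedastic errors, and the interval additionally assumes Gaussian errors (or a large sample).

```latex
\hat\beta = \arg\min_{\beta}\ \lVert y - X\beta \rVert_2^2
\;\Longrightarrow\; -2\,X^\top (y - X\hat\beta) = 0
\;\Longrightarrow\; \hat\beta = (X^\top X)^{-1} X^\top y

\operatorname{Var}(\hat\beta \mid X) = \sigma^2 (X^\top X)^{-1},
\qquad
\hat\sigma^2 = \frac{\lVert y - X\hat\beta \rVert_2^2}{n - p}

\hat\beta_j \;\pm\; t_{n-p,\,1-\alpha/2}\,
  \sqrt{\hat\sigma^2 \big[(X^\top X)^{-1}\big]_{jj}}
```

Each line comes with stated conditions that can be relaxed one at a time, which is the kind of structure that has no direct counterpart for the weights of a neural net.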

All of these things help make your model more robust and give you greater confidence in it, which will be important if we want to put ML in, say, healthcare or defense. But you don’t get as much of this theory with more complex ML models, and certainly not with neural nets. Good luck trying to get a confidence interval for the optimal value of a weight in your net, much less interpreting it.


The reason you don't expect to see a deep fundamental theory of drawing a line through a few points is because you can always do it. ML doesn't always work, and sometimes it is harder to get working than other times. What's going on?


You can always draw a line through the points, but it isn't always a good approximation. If the points are inherently bunched around a line, then a line through them will approximate them well. If they're a big random cloud, then the line won't. It's the exact same way in ML, except the points are in n-dimensional space and "line" is replaced by "higher-dimensional curve or manifold". Sometimes (e.g. in image processing), the n-dimensional points are inherently bunched around a curve or manifold of the form you're using, and then ML works great. There's nothing deeper going on!
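
A two-dimensional toy version of this point, as a sketch: the same least-squares line is fit to points bunched around a line and to a scattered cloud, and R² reports how much of the variation the line accounts for. The data sets and the function name are assumptions for illustration.

```python
def r_squared(xs, ys):
    """Share of the variance in ys explained by the least-squares line through the points."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

xs = list(range(10))
# Points bunched around y = 2x + 1 with a small wobble.
near_line = [2 * x + 1 + (0.1 if x % 2 == 0 else -0.1) for x in xs]
# The same range of values with no linear relationship to x.
cloud = [3, 7, 1, 9, 2, 8, 0, 6, 4, 5]

print(r_squared(xs, near_line))  # close to 1
print(r_squared(xs, cloud))      # close to 0
```

A line can be drawn in both cases; only in the first does it say anything useful about the points.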


The ML scene was more rigorous 10-15-20 years ago, because it was mostly confined to the world of academia and industrial R&D, and we had much less wide-reaching problems to work towards.

As tech evolves, more data is being generated, which in turn creates more problems that increase the demand for solutions.

To put it short: Solving "real world" problems is more/better rewarded than figuring out the underlying technology, so it's no wonder that we see a larger portion of practitioners that may lack the academic background - and that is understandable.

You could be an ML Ph.D. (in academia) for 5-10 years, earning a gov. worker salary while trying to reach the next step - or hack together some ML-based product, until desired accuracy, and cash in 50-100 times the pay.

Now, with that said, I understand that Ali targeted his speech at the NIPS folk - but the industry / academia crossover in ML is massive, and it still stands that much of industry problems are of commercial interest - so the motivation for many is still the same.


Making a serious, industrial scale web app in 2000 felt like alchemy. It was all arcane, there were no established patterns, nobody knew how to do it for sure, there were a lot of hustlers, most of them thankfully sincere.

When something is new, it feels like a mystery - eventually we'll have a language for wrapping our heads around neural networks, even if it's not as clear cut as we'd like.


We had the neural networks, and the language. The problem is with the rebranding and the amount of marketing bullshit coming with it. The null hypothesis is that most of it is a pile of crap: overengineered, overoptimized solutions that are probably applied at an abstraction layer different from the one they are marketed on. A solution may come out of it, but it's more wishful thinking than not.


I understand what you are getting at, but I would call that a new frontier.

There was ultimately nothing black box about the code we cobbled together back then.


Sure - perhaps alchemy in the sense that many practitioners are simply throwing things against the wall and seeing what sticks, but not in the sense that there isn't anything real behind all the math or engineering.

Many advancements in machine learning have significant backing in theoretical proofs that a given algorithm will result in unbiased estimates, or will converge to such and such value etc.

On some level, the high amount of experimentation necessary in machine learning is not so much a sign that the practice is faulty in any particular way, but rather, that the world is a complex place. This is especially true when attempting to predict anything that involves human behavior.

Long-story short - I'd cut ML some slack!


Alchemists could do a lot, making gunpowder for example is non trivial. They simply worked from a bad model with little understanding of what was going on.

Consider, lead and gold are very similar substances and chemistry lets you transform many things into other things so it must have seemed very possible. Unfortunately, I suspect the current AI movement is in a very similar state: even if they can do a lot of things that seem magical, it’s built on a poor foundation. The result is people mostly just trying stuff and seeing what happens to work, without the ability to rigorously predict what will work well on a novel problem.


From the talk, their main point about alchemy:

> Now alchemy is OK, alchemy is not bad, there is a place for alchemy, alchemy worked, alchemists invented metallurgy, [inaudible] dyed textiles, our modern glass making processes and medications.

> Then again, alchemists also believed they could cure diseases with leeches and transmute base metals into gold.

> For the physics and chemistry of the 1700s to usher in the sea change in our understanding of the universe that we now experience, scientists had to dismantle 2,000 years worth of alchemical theories.

> If you're building photo-sharing systems, alchemy is okay. But we're beyond that.

> Now we're building systems that govern health care, and mediate our civic dialogue. We influence elections.

> I would like to live in a society whose systems are built on top of verifiable rigorous thorough knowledge and not on alchemy.


Can someone give a concrete example of the kind of theoretical properties they desire of ``new-style'' machine learning? The kinds of properties that ``old-style'' learning methods guaranteed?

People often complain about interpretability: in what sense is an SVM interpretable that a deep neural network is not?

Or is the worry about gradient descent not finding global optima? But why is the global optimum a satisfactory place to be, if the theory does not also provide a satisfactory connection between the space of models and underlying reality?

The arbiter of good theory is ultimately its ability to guide and explain practical phenomena. Which machine learning phenomena are currently most in need of theoretical elucidation?


Most machine learning algorithms that aren't statistically based don't give a CI. From a statistical standpoint, that means they don't give a sense of how good your prediction is. You can get a general sense with just CV.

Also, your parameters are not open to inference the way they are in a statistical algorithm. This is where I see people saying deep learning isn't interpretable, and there is research into this area. If you compare a time series statistical forecasting algorithm with deep learning, you at least get a CI on the statistical algorithm.

Randomly dropping nodes is pretty magic in my mind.
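
For anyone who hasn't seen what "randomly dropping nodes" amounts to mechanically, here is a minimal sketch of the common "inverted dropout" variant in plain Python. The function name, the drop probability, and the all-ones activations are assumptions for illustration; it shows the mechanics only and explains nothing about why it helps.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale the survivors."""
    if not training or p_drop == 0.0:
        return list(activations)  # at prediction time nothing is dropped
    keep = 1.0 - p_drop
    # Survivors are divided by the keep probability so the expected value is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, p_drop=0.5, rng=rng)
mean_after = sum(dropped) / len(dropped)
print(mean_after)  # stays near the original mean of 1.0
```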

While I don't know much about SVM I know it's mathematically proven so there should be a way to interpret a fitted SVM model.

I sure as hell wouldn't use ML in a clinical trial for drugs. That's why biostat is a thing.


this is a followup post i wrote to answer exactly your question: http://www.argmin.net/2018/01/25/optics/


We are still at the experimental stage, so trial and error without clear theories is not unexpected. Chemistry formed out of alchemy: people fiddled and noticed the patterns. The patterns were written down and shared, and different people began floating theories/models to explain the patterns, and the theories were further vetted by more experiments.

Another thing is that the industry should look at other AI techniques besides neural nets (NN) or find complements to NN's. Genetic algorithms and Factor Tables should be explored as well. Just because recent advances have been in NN's does not necessarily mean that's where the future should lead. Factor tables may allow more "dissection" & analysis by those without advanced degrees, for example. Experts may set up the framework and outline, but others can study and tune specifics. (https://github.com/RowColz/AI)


This video is great ! for me, not for the math though.. two things:

* multi-layer, automated "jiggering" with so many components that only a machine can contextualize them, might be great for finding patterns in some sets, but the industry HYPE, the DIRECTION+VELOCITY, and the human manipulation (including lies and pathologies) are gut-level PROBLEM.. and this guy says that! +100

* Alchemy itself rambles and spreads [1] Some variations of Alchemy included a ritualized, internal psychological and psychic experience by the human practitioner.. hard to stabilize, yet not always a bad thing, since you are reading this and are actually one of those ..

[1] https://en.wikipedia.org/wiki/Psychology_and_Alchemy

lastly, the selected slides encourage a student-minded viewer to look up some math and think a bit. Not a bad thing. Thanks for this video and thanks for the talk.


The way I understand neural networks to work, they are actually a series of connected infinitely valued logic gates, where given a numerical input from neg infinity to positive infinity, each gate spits out another number from neg infinity to positive infinity that feeds into the next set of logic gates, and at the end gives you a confidence value from 0 to 1 of whether there was a pattern match or not.

To me it's very similar to Boolean logic circuits the way I was taught those in college except that there's too many gates to configure manually so you use supervised learning to find reasonable values and an arrangement of gates that works (aka your trained neural network).

I've never heard anyone else describe it this way but this is how I like to think about it. It really has nothing to do with how the human brain or much less human mind works. That's just marketing speak.
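
A small sketch of that picture, in plain Python: three sigmoid "gates" wired together so that the output behaves like XOR on 0/1 inputs but also accepts anything in between. The weights here are set by hand as an assumption for illustration; in a trained network they would be found by the learning procedure.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gate(inputs, weights, bias):
    """A continuous-valued 'gate': weighted sum of the inputs, then a squashing function."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_net(x1, x2):
    """Two hidden gates (soft OR, soft NAND) feeding one output gate (soft AND)."""
    h_or = gate([x1, x2], [20.0, 20.0], -10.0)
    h_nand = gate([x1, x2], [-20.0, -20.0], 30.0)
    return gate([h_or, h_nand], [20.0, 20.0], -30.0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b)))
```

With steep weights the gates behave almost like Boolean ones; with gentler weights the outputs become graded, which is where the resemblance to a logic circuit starts to loosen.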


> That's just marketing speak.

Pretty sure it was the initial thinking when neural networks were created, and the field has moved beyond that. I think people who only know the surface level keep repeating this tidbit, which is out of date.

It's even in the official wikipedia article (https://en.wikipedia.org/wiki/Artificial_neural_network).

Everything you've stated was basically a personal opinion that could have been verified via google...


We do understand how it works most of the time, but can't predict if certain changes will be beneficial or detrimental. After the fact it's pretty clear what e.g. CNNs do (find manifold transforms which, coupled with nonlinearities minimize error in the layer-wise output distributions when backpropagating loss of the minibatches of the training set). But you can't reliably say "if I add e.g. a bottleneck branch over here my accuracy will go up/down X%".

Fundamentally we have to contend with the fact that human ability to understand complex systems is fairly limited, and at some point it will become an impediment to further progress. Arguably in a number of fields we're past that point already.


That's all very cool.

But for safety critical systems you have to understand how these systems work to understand their limitations. You have to know when these techniques succeed, when they fail, and how badly they fail.


Do you feel like you understand human limitations in these critical systems? Are humans suitable? Would AI be suitable if it performs statistically better than humans?


> Would AI be suitable if it performs statistically better than humans?

In general yes, but it might depend on the pattern of failure - if your self-driving car hunts me or my family personally, I might have a problem with that.

But how can you determine that without releasing it to the wild and waiting for bodies? Worse, say you have a safe system, but you need to modify the network (to fix some bug). How can you determine that the new system is safe enough to put on the road?


But any technology can be deadly if you deploy it widely enough. _WhatsApp_ has resulted in "bodies" and it doesn't have any AI in it at all. First airplanes were basically flying coffins. Cars until early 90s had very little chance of survival in collision above 40mph. Many drugs have serious, sometimes deadly side effects. Quarter of a million people die in hospitals in US alone every year due to medical errors. 100% of those errors are currently made by humans.

It's remarkable that AI seems to be held to an arbitrarily high standard, often exceeding that of other technologies.


My guess is that most people feel AI should be held to a higher standard because we feel the need to be able to audit the system in the case of mishaps. When ML becomes a high-level black box, we may not have the confidence in how to right that ship if it goes astray. With human errors, if we're (hopefully) empathetic creatures we at least have the hope of understanding the root of the error.


The good news is that these tools exist but they're called statistics.


Statistics applied to black-box component mishaps have a couple things going against them. 1) you need a relatively large number of failures to build a good sample of data and 2) even if you have the probabilities in place to quantify risk, you may never understand the root cause of the failure to fix or mitigate it.

For a large expensive system, the presence of either of the above may be unacceptable. Take something like the space shuttle program. If it was heavily reliant on black box AI, you might be able to build probabilities through tools like Monte Carlo simulations, but you would be hard pressed to get the government to put billions of dollars at risk without understanding the root cause of simulation failures.
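
Point 1) can be put in numbers with a small sketch. Assuming independent runs with a fixed failure probability (a strong assumption for any real system), the functions below give the tightest failure rate that failure-free testing alone can support, and how many clean runs a target rate would need. The function names and the 95% level are choices made for illustration.

```python
import math

def upper_bound_after_clean_runs(n_runs, confidence=0.95):
    """One-sided upper bound on the failure probability after n_runs with zero failures."""
    # Solve (1 - p)^n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_runs)

def clean_runs_needed(p_max, confidence=0.95):
    """Failure-free runs needed before the bound above drops to p_max."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))

print(upper_bound_after_clean_runs(100))  # roughly 3 / 100, the "rule of three"
print(clean_runs_needed(0.001))
```

Even a perfect record only bounds the failure rate at roughly three divided by the number of runs, and it says nothing about root causes, which is point 2).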


Rightfully so. Part of the problem is the way that we speak about the technology - calling it 'artificial intelligence' when the underlying technology does not resemble true intelligence at all. This raises expectations and lets people use AI as a dumping ground for infeasible ideas.


But my point is, even the "real" intelligence isn't so hot in a lot of cases, and AI could surpass it in terms of outcomes on narrow tasks.


"Quarter of a million people die in hospitals in US alone every year due to medical errors. 100% of those errors are currently made by humans."

Well, that's an odd thing to say. I guess you can say that, for instance, the Therac-25 episodes involved human error, but in that case, any problems with AI are also due to human error. If everything is due to human error, then there is no alternative to defend.

Medical errors certainly do involve the interface between humans and computers, and it isn't really plausible that either humans or computers can be eliminated from medicine.


> It's remarkable that AI seems to be held to an arbitrarily high standard, often exceeding that of other technologies.

Because safety critical systems require trust. One way of establishing trust is to explain or prove why the system works. This cannot be done with machine learning.

Say that you already have a safe system. Now you make a change. How do you demonstrate to a skeptical audience that the new system is safe without releasing it and counting the bodies?


But again, my point is, you can't explain why a human works. Nor can you predict how the human will perform in an extreme situation they haven't been through before.

To modify your example. Say you have a doctor who hasn't killed anyone yet and she reads a paper describing a new high risk / high reward treatment. The same standard of proof is not applied to her, so she can just go ahead and try it. Wasn't there a story just the other day where doctors were injecting people with some proteins linked to Alzheimer's? They weren't required to be "proven that they're safe" before they did that.


We can't explain how and why humans work, but we have a lot of experience with them. We know their failure modes pretty well.

As to medical treatments, they are done on volunteers under informed consent. If you want to experiment with a car driven by AI on a private track staffed with volunteers, be my guest. It's between you, your conscience, your deity, and maybe OSHA.

But for experimentation and deployment on public roads, you need to convince other, possibly skeptical, people that your technology works better than humans. How do you do that with AI/ML-based systems?


>> How do you do that with AI/ML-based systems?

Same as with humans: from statistics. I don't see any other way. E.g. autonomous cars will never have a zero accident rate, but if they have half as many accidents per million miles as humans, it's a no-brainer to me that they should be deployed. We're not going to get there in the next 20 years, so I'm not saying they should be deployed _now_, but eventually it will happen.


> Same as with humans: from statistics. I don't see any other way.

I.e. from body count.

One problem here is that we know humans have common, stable brain architecture, so the limits and failure modes we experienced are stable too, and can be accounted for and worked around. People won't fail you in completely surprising ways.

DNNs are each a different breed; between changing architectures and tuning hyperparameters, I don't see how trust in one instance can be translated to trust in another.


Imagine for a moment that you have a working self-driving car in the field, and now you want to make a relatively minor change to the system.

How do you convince yourself or a regulator, not that the technology in general is safe, but that your specific change does not make it unsafe?

What happens if this specific change increases the fatality rate from 1 in 100 million miles to 1 in 50 million miles? Crashes will still be rare and there's enough statistical noise that when you finally understand the change is bad, you may have killed hundreds.
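To put rough numbers on that, here is a back-of-the-envelope sketch, not anyone's actual validation method. It assumes fatalities follow a Poisson process, uses a one-sided test at a 5% significance level, and asks how many fleet miles you would have to drive before you had an 80% chance of noticing the doubled rate. The 5% and 80% thresholds, the step size and all function names are my own assumptions for illustration.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), computed by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def critical_count(lam0, alpha):
    """Smallest count c such that P(X >= c) <= alpha under the baseline rate."""
    c = 0
    while 1.0 - poisson_cdf(c - 1, lam0) > alpha:
        c += 1
    return c


def power(miles, base_rate, new_rate, alpha=0.05):
    """Chance of flagging the change after `miles`, if the rate really is new_rate."""
    c = critical_count(base_rate * miles, alpha)
    return 1.0 - poisson_cdf(c - 1, new_rate * miles)


def miles_to_detect(base_rate, new_rate, alpha=0.05, target_power=0.8, step=1e8):
    """First mileage (in multiples of `step`) at which power reaches target_power."""
    miles = step
    while power(miles, base_rate, new_rate, alpha) < target_power:
        miles += step
    return miles


if __name__ == "__main__":
    base, new = 1 / 100e6, 1 / 50e6  # fatalities per mile, from the comment above
    m = miles_to_detect(base, new)
    print(f"miles needed: {m:.3g}")
    print(f"expected fatalities at the new rate by then: {new * m:.1f}")
    print(f"of which attributable to the change: {(new - base) * m:.1f}")
```

The exact figures depend heavily on the thresholds you pick and on how quickly crash data is collected and analysed, so treat the output as an order-of-magnitude illustration only.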


I think people blur (in general, in these discussions) the difference between "known unknowns" and "unknown unknowns". People get scared of the unquantifiable potential errors and other people pooh-pooh the idea we can't deal with risk. But is it reasonable to consider two different kinds of risk?


The issue with "unknown unknowns" is that they are uncertainties and not risks. Risks we can more easily deal with because they are quantifiable so they can be hedged against with probabilities.


In my mind, machine learning is simply one class of algorithm primarily oriented towards multidimensional fuzzy pattern matching. This technique can easily be classed as a subset of digital signal processing concerns. Everything tacked onto the sides of this (LSTMs, etc.) in some attempt to increase the cleverness or scope of the ML network (aka self-driving cars and other assorted high magickery) seeks to band-aid something that is fundamentally flawed for these purposes.

An ML network does not have intrinsic, real-time mutability in how it is defined, outside the scope of memory-based node weights, inputs, outputs or graphs over time. These nodes are added, removed or modified based on a predefined set of input, output and internal mappings. How would intermediate layers be defined in a dynamic way between input and output in such a network in an attempt to achieve these higher powers? Driving a car, for instance, is a task that requires learning entirely new subsets of skills, often ad hoc, that require intermediate models to be dynamically developed which are potentially outside the capabilities of our understanding. The biggest challenge I see today is that we don't necessarily have a good way to dynamically construct models of the intermediate layers (such that we can map them to other layers), especially if these layers are being added and removed dynamically by algorithms at the edge of our capability to understand.

I've always felt that there needs to be some internal processing occurring at rates far higher than the input sample rates such that higher-order intelligence may emerge by way of adjusting the entities noted above multiple times per input sample (and potentially even in the absence of input). The problem is also going to come down to how a person would define outcomes vs how an AI/ML network would. For the future to really begin we will need an AI that can understand and reason with what success and failure feel like in our abstract terms. This will require it to have the capacity to dynamically construct abstractions which we would have no hope of modelling ourselves, as we do not have very deep insight into the abstractions upon which the biological human brain implements virtually any behavior today. There is no amount of discrete math in the universe which can accurately model and assess the higher-order outcomes of decisions made in our reality. You can run ML disguised as AI in simulations and environments with fixed complexity all day, but once you throw one of these "trained" networks out into the real world without any constraints, you are probably going to see very unsatisfactory outcomes.


Additional context from Ali Rahimi and Ben Recht: http://www.argmin.net/2017/12/11/alchemy-addendum/



If someone turns in math homework consisting of answers only, the teacher will probably not give any credit, and will ask to see the work. Yet the AI/ML community insists on a cargo cult test for intelligence - behaving like someone with a mind.

That can be traced to the "Turing test". It was flawed then, and it is flawed now. Reproducing the behaviour of a thinking agent does not prove that a putative AI will not fail in a more detailed test, as demonstrated in numerous papers about "adversarial images".


Related: Debate at NIPS 2017 on “Interpretability is necessary for machine learning”, by senior researchers in Machine Learning including Yann LeCun.

https://youtu.be/93Xv8vJ2acI



