BlackRock shelves unexplainable AI liquidity models

michaelbuckbee · on Nov 13, 2018

I learned a new term in the context of AI recently: "Specification Gaming".

There's a big list here: https://t.co/OqoYN8MvMN

But it's stuff like:

- Evolved algorithm for landing aircraft exploited overflow errors in the physics simulator by creating large forces that were estimated to be zero, resulting in a perfect score

- A cooperative GAN architecture for converting images from one genre to another (eg horses<->zebras) has a loss function that rewards accurate reconstruction of images from its transformed version; CycleGAN turns out to partially solve the task by, in addition to the cross-domain analogies it learns, steganographically hiding autoencoder-style data about the original image invisibly inside the transformed image to assist the reconstruction of details.

- Simulated pancake making robot learned to throw the pancake as high in the air as possible in order to maximize time away from the ground

- Robot hand pretending to grasp an object by moving between the camera and the object

- Self-driving car rewarded for speed learns to spin in circles

All of which leads me to think that if you can't at some level explain how/what/why it's reaching a certain conclusion that it may be reaching a radically different end than you're anticipating.

renjimen · on Nov 13, 2018

Not quite the same, but equally as disastrous is data leakage. This comes in some fun and unexpected forms. One example is from the fisheries monitoring competition on Kaggle, where NNs were learning to predict fish caught using the background image of the boats rather than the fish shown on camera. i.e. different boats tend to catch different fish

thisisit · on Nov 14, 2018

I have recently started learning ML and trying competitions on Kaggle. And I also seeing that in many of the competitions I have tried. The biggest predictor turns out to be the feature which wont be present when the data is created.

devoply · on Nov 14, 2018

inductive learning.

neel8986 · on Nov 13, 2018

Great examples. I think algorithms in sites like Facebook did something similar. In order to maximize view/clicks, they themselves created problems like an echo chamber or promoting divisive articles which in turn made the whole experience worse

sizzle · on Nov 14, 2018

Facebook algorithms making the whole experience worse is being quite charitable in describing what it did, considering it's huge role in spreading of fake news and biasing people's thinking towards the presidential election along with creating strong negative feedback loops.

reitanqild · on Nov 13, 2018

And Google has "simplified" search by fuzzing terms until it is somewhat possible to use even if you really cannat spel.

In the process they broke it so badly I now prefer DDG. (Who also gives me results I didn't ask for.)

tomjakubowski · on Nov 13, 2018

There are some wonderful examples of this in tom7's videos about learnfun/playfun, an AI that (in short, very simple/stupid terms because that's all I know) in learnfun watches human gamepad inputs + the NES memory, and in playfun tries to provide its own inputs to make numbers in certain NES memory locations (those that went up for the human player) go up.

One way to not lose is to just pause the game. Minor spoiler: learnfun figures that one out :-)

http://www.cs.cmu.edu/~tom7/mario/

adampk · on Nov 14, 2018

So "Specification Gaming" seems to be a new age The Monkey's Paw?

https://en.wikipedia.org/wiki/The_Monkey%27s_Paw

fatnoah · on Nov 14, 2018

This is my favorite: "A robot arm with a purposely disabled gripper found a way to hit the box in a way that would force the gripper open"

agentofoblivion · on Nov 13, 2018

I hear this a lot. In my opinion, people overestimate their ability to “understand” non-neural net models.

For instance, take the go-to classification model: Logistic Regression. Many people think they can draw insight by looking at the coefficients on the variables. If it’s 2.0 for variable A and 1.0 for variable B, then A must move the needle twice as much.

But not so fast. B, for instance, might be correlated with A. In this case, the coefficients are also correlated and interpretability becomes much more nuanced. And this isn’t the exception, it’s the rule. If you have a lot of features, chances are many of them are correlated.

In addition, your variables likely operate at different scales, so you’ll have needed to normalize and scale everything, which makes another layer of abstraction between you and interpretation. This becomes even more complicated when you consider encoded categorical variables. Are you trying to interpret each category independently, or assess their importance as a group? Not obvious how to make these aggregations. The story only gets more complicated for e.g. Random Forests.

I think it’s best to accept that you can’t interpret these models very well in general. At least in the case of some models (like neural nets), they approximate a Bayesian posterior, which has some nice properties.

stdbrouw · on Nov 13, 2018

Hm, I agree with what you're saying, but your example is pretty whack:

(1) in a logistic regression the coefficients are on a log scale ergo the ratio between exp(2) and exp(1) is actually x2.7, not x2; the bigger issue is that you have to compare the strength of the association to how easy it is to move that lever, e.g. men might like our marketing message more than women, but it's not like we're going to get people to change gender.

(2) moderately correlated predictors do not bias or otherwise complicate the interpretation of regression parameters, it's only unmeasured confounders that do, that is, correlations between variables where one of the variables is not included in the model.

agentofoblivion · on Nov 13, 2018

Your point on (2) and bias may be correct, but I disagree about interpretation. If there's a one-way causal link between A and B (e.g. B causes A, but A does not cause B), then should you interpret the presence of B as having an impact that sums the coefficients, or ignore the correlation and pretend the impact came from A?

Or what if A always caused B, but the impact was slightly less than if B occurred without A? In that case, the sign on A might be negative, but its presence would actually tend to increase the probability of class=1, it's just that the positive impact has already been counted by variable B.

Maybe you try to avoid this situation by adding in an explicit interaction term of A*B, but then how do you interpret the impact of A since you now have more than one coefficient?

If you feel confident making assertive statements about what has been learned by looking at an equation fit on multi-correlated data, then your mathematical intuition is much stronger than mine!

stdbrouw · on Nov 14, 2018

The interpretation of interaction terms (and first-order terms in the presence of interactions) is something that is taught to bachelor students in the social sciences all the time. I'm not going to say it's entirely straightforward and I'm sure lots of people get it wrong, but it's not rocket science. If people jump in and use a technique without having a clue about how it's supposed to be used (and I'm sure there are plenty -- I agree with you on that), then that's ultimately their own stupid fault.

claytonjy · on Nov 13, 2018

In (2) it's actually worse than that; any relevant predictor left out of the model affects the betas of all other predictors, even if it's uncorrelated to all other predictors.

This is a big thing most people miss when trying to interpret logistic regression the way they do linear regression. Logistic regression estimates are _conditional_ on the model spec in a way linear regression estimates are not.

gbrown · on Nov 13, 2018

I'm not sure I follow your second point. In both multiple linear regression and multiple logistic regression, the coefficients are interpreted conditionally - a one unit increase in X^{(i)} is associated with a B^{(i)} unit change in f(E(Y)), conditional on the values of the other covariates. Using a logit link function does remove the result from the raw scale of Y (thus, f), but in my mind the "model" is more than just the data distribution - it's whatever form your linear predictor takes.

This is the problem the poster originally pointed to - interpreting a variable conditional on the fixed values of all others doesn't always make sense when there's strong correlation among predictors.

zwaps · on Nov 14, 2018

I still don't get your issue at all. There are tons of different models and implications, and it depends entirely on your question how you interpret the model.

Often you will go after average partial effects in your sample. Or you have some correlated variables in mind such that you can just plot the partial effects. Sometimes you have different models, where you will plot a distribution of partial effects around a mean based on some prior assumption...

I mean, it really depends. If you impose a more complex relationship, then of course you can not put everything in a single number. But no matter what you are interested in, such a model will give you the possibility to exactly determine the measure.

And what you are saying is also not correct. I may very often be interested in fixed values. Doing this regression, and not a quantile regression, for example, means I am somehow interested in a conditional expectation. That means I probably care about some sort of average impact, perhaps for some fixed subgroups. But those averages are fixed values...

I think the point is that in these models, we know exactly what we go after, how to get there, and what it means. We know exactly when our inference may fail.

If we both care about average effects, and I can convince you of my identification assumptions, then there is really no mystery left as to what my estimates mean.

In with Deep Learning, this is still more difficult.

BenoitP · on Nov 13, 2018

> But not so fast. B, for instance, might be correlated with A.

Exactly. Rendering a model explainable is an active field of research at the moment. One way to tackle this is to use Shapley values from game theory.

Here is a compilation of techniques to render a model explainable: https://christophm.github.io/interpretable-ml-book/shapley.h..., and they have an elegant way to talk about these values:

> The Shapley value is the average marginal contribution of a feature value over all possible coalitions.

Now, it doesn't identify correlated variables (which you can do with other techniques), but will balance the influence in a robust manner.

----

> The story only gets more complicated for e.g. Random Forests.

Funny you should say this. Recently there was a challenge for explainable machine learning: http://explainable.ml

My proposal to the challenge was based on Random Forests. Here is the code: https://github.com/benoitparis/explainable-challenge, here is the paper: https://github.com/benoitparis/explainable-challenge/raw/mas...

I have a neat visualization for exploratory data analysis (available in the paper and in a notebook) that I'm proud of.

----

By the way if anyone is hiring here, I'm available for talking about your problem and see if my take on explainable machine learning can help you.

disgruntledphd2 · on Nov 13, 2018

How do neural nets approximate a Bayesian posterior? Not snark, would really like some references if possible.

On the major point, while I agree with you, its much nicer to be able to show the "top" variables from a model, which is doable from logreg and forests, but is much, much, much more difficult from a neural net perspective.

Additionally, as they tend to take longer to train, its harder to iterate with them, and as they fit so very many parameters, I'm generally pretty sceptical as to their generalisability. That being said, in some tests I've run I've been pleasantly surprised at their performance.

agentofoblivion · on Nov 13, 2018

Here is a paper on the topic of NN = Bayesian Posterior Probabilities: http://www.ee.iisc.ac.in/new/people/faculty/prasantg/downloa...

Re: comment on showing "top" variables from a model, I agree this could have utility. But I would add that the devil's in the details, and there are multiple ways to calculate importance values, each of which has its own nuances and pros/cons.

For instance, how do you compare the importance of a categorical feature to a float feature? Do you one hot encode and then add their individual importances, take the average, or something else? Although sampling from the columns is meant to help deal with feature correlation, under what conditions is this effective and how do you know if your feature importances are safe? Moreover, how does this column sampling work in the context of one-hot-encoded categorical features?

This is all a way of saying that while you can devise methods for coming up with metrics, and then assign them handy titles like "Feature Importance", the reality is that these things are pretty nuanced and limited, and upper level management might be fooling themselves by thinking they're "interpreting a model" if they don't recognize the limitations and nuances involved. Or to put it another way, to say "I better understand this model because you gave me a feature importance list and a partial dependence plot," is a dangerous over-simplification.

cosmic_ape · on Nov 13, 2018

>>How do neural nets approximate a Bayesian posterior?

Not sure what GP had in mind, but if a feature x appears in a dataset n times, with pn times with positive label, and (1-p)n times with negative, and your classifier is f(x) which is trained with the "cross-entropy" cost, then the ideal value, that minimizes the cost should be f(x) = p. In this sense, f(x) is the probability of positive given feature.

Whether neural nets really realize this and how reliable that is, is another question. But that's the intention of the cross entropy cost.

beta_binomial · on Nov 13, 2018

This does not make any sense to me and neither did OP's comment about NN's approximating the posterior. In fact, if p were the solution then that would simply be the maximum likelihood estimate, which would not include the p(theta), or the prior, and hence would not be Bayesian.

cosmic_ape · on Nov 13, 2018

Well, p definitely is the solution in the case I mentioned. It is indeed the maximum likelihood solution. You could incorporate prior info about theta via a regularization term, if so inclined. What does not make sense in this?

Not sure what the OP meant, but I though it might be useful to mention how estimators may be interpreted as anything probabilistic at all. Often, arbitrary numbers between 0 and 1 are termed "probabilities", but in this case there actually is some proportion or probability to which f(x) should ideally correspond.

conjectures · on Nov 13, 2018

In general they don't in the sense of being fully Bayesian.

The mental gymnastics are that: - The objective function of the neural net was a likelihood. - The prior was improper. In which case the net is a MAP estimate.

A MAP estimate will not give you good uncertainty quantification. Given the application to risk modelling, this seems unlikely to be a trivial departure from a fully Bayesian method.

i-am-charmander · on Nov 13, 2018

That's why you go through the process of validating your model (in the example you provided, checking for multicollinearity), remedying any issues, and then using the model.

Modern software often adds very useful layers of abstraction onto existing processes and patterns. This is especially the case in the realm of machine learning software. Libraries like scikit-learn, Keras, and many others are outstanding pieces of work, and make it very easy to rapidly build and deploy ML models. However, this ease-of-use can actually be a detriment, especially to ML newcomers.

In particular, it is so easy with these types of ML libraries to do something like `from sklearn.linear_model import LinearRegression; model = LinearRegression(); model.fit(Xtrain, ytrain)`. This is great if you're trying to scalably test many different algorithms and configurations to see what predicts best. This is not so great if you're looking to test and validate some of the statistical assumptions of your model, especially with linear/logistic regression models. As an example, Python's StatsModels library will automatically warn you if certain assumptions of a linear/logistic regression model are violated/close to being violated, which could led to inappropriate conclusions/inference from the model. scikit-learn does not do this. If you have massive multicollinearity in your model (a phenomenon which can affect the reliability of individual-coefficient t statistics and the signs, positive or negative, associated with the coefficients), scikit-learn won't tell you that, and it will be on you to recognize the potential for multicollinearity occurring and remedy the issue.

Not to pick on scikit-learn, but their linear_model regression classes also don't provide p-values and standard errors associated with each predictor, common things that basic statistical modeling packages usually provide. But note that scikit-learn's goal is to provide an easy interface with which to do machine learning - not traditional statistical modeling. The ML community is known for placing emphasis on raw predictive performance of models and forgetting about validating the statistical assumptions associated with those models.

zwaps · on Nov 13, 2018

I am not sure those arguments hold so much water.

Classic regressions allow you to explain the marginal effects very well. It doesn't matter much how variables are correlated. If you saturate the model with interaction effects, you can get an accurate (in the sense of the model) prediction of the marginal effect of any variable as a function of others. This is very interpretable.

Furthermore, nowadays a lot of techniques are about estimating the causal effects based on assumptions in your data. You could use things like DID or synthetic control, use natural experiments, and so forth, to get a good idea of the causal "treatment effect" of your variable of interest, and you can even do this in a semi-parametric or non-parametric approach.

Often, estimating the linear approximation of the marginal on a conditional expectation is "good enough" to learn how things are connected within your data.

And in the end, getting this sort of causal effect of a variable is what we are really after in environments where the DGP process isn't simple. In that sense, this type of research is very compelling.

Scaling and other issues are of course important, but taking them into account is rather simple...

To be sure, a lot of work (for example in econometrics and elsewhere) is proceeding on causal inference of deep learning models, but it is probably also fair to say that right now, classical models are far easier to interpret, especially if you are interested in answering qualitative questions.

fjp · on Nov 13, 2018

Correcting these issues in a logistic regression is something you learn in undergrad stats classes and something any professional would do as part of their modeling.

gbrown · on Nov 13, 2018

Sure, interpretation must be done with care, but that's one of the primary goals of statistics. I think the bigger distinction you're missing is the difference Breiman draws between algorithmic models and data models. If you use logistic regression (and hopefully a careful study design) to describe a plausible data generating model, you can get good, interpretable inference out of it. If you use the same tool (or penalized equivalent, or RF, GBM, NN etc) as a prediction algorithm on unstructured or poorly structured inputs, you're not going to have your lunch and eat it too (get good, interpretable inference along with robust prediction).

It's also unclear to me what you mean when you say that NNs uniquely approximate a Bayesian posterior, or why that's a good thing without knowing more about what posterior you're talking about. You could do a Bayesian logistic regression and get an actual posterior, and it would not remove the interpretation challenges you raise.

xg15 · on Nov 13, 2018

> I think it’s best to accept that you can’t interpret these models very well in general.

But then, the classical question: How do you debug the models? How do you know they are actually predicting what you think they are predicting?

dboreham · on Nov 13, 2018

If ad revenue increases, who cares, right?

beta_binomial · on Nov 13, 2018

There are legitimate reasons why stats wins out on interpretability. 1. Scaling is not hard 2. It is obvious to me "how to make these aggregations", but that is because I know statistics. Categories of variables treated as random effects can be interpreted both as a group (via variance parameters of the random effects) and individually via coefficients. 3. Bayesian estimation can even account for correlation of parameters and include it in the posterior prediction.

jgalt212 · on Nov 13, 2018

> But not so fast. B, for instance, might be correlated with A. In this case, the coefficients are also correlated and interpretability becomes much more nuanced. And this isn’t the exception, it’s the rule. If you have a lot of features, chances are many of them are correlated.

Aren't we supposed to remove correlated predictors from models?

Or we can use techniques like PCA? But then, once again, we fall into the realm of unexplainable.

shoguning · on Nov 13, 2018

You can remove correlated predictors or do stepwise regression to get a minimal model with more intelligible coefficients.

The problem is it's still difficult to determine direction of causality. That's why controlled experiments are so important.

projectramo · on Nov 13, 2018

If you do some kind of PCA, that should help work out the correlations between the coefficients, though you are right these can be hard to interpret (although sometimes they have a natural interpretation).

And typically, you would demean, and divide by the standard deviation.

I guess the coefficients are harder to interpret in this second case than if you did not transform them, but they're still interpretable.

a-dub · on Nov 14, 2018

One trick I like to do in PCA is to sweep each of the variables, one at a time, in the low-d subspace and then project back up. Looking at these sweeps in the original space gives some hand-wavey "intuition" for what each low-d variable is capturing.

d--b · on Nov 13, 2018

People making/exploiting models in firms such as BlackRock, Renaissance, 2Sigmas and so on, know what they're doing. Really.

AznHisoka · on Nov 13, 2018

I don't disagree that there are companies that definitely know what they're doing, but on the other hand, we shouldn't put anyone in the financial industry on a pedestal. After LTCM, and the financial crisis of 2008, I've learned my lesson. Half these guys don't have a clue what they're doing.

d--b · on Nov 13, 2018

true, that's also why it's kind of refreshing to see a big one stepping back and questioning what they're doing.

riazrizvi · on Nov 13, 2018

Don’t forget LTCM. Very smart.

d--b · on Nov 13, 2018

I applaud this decision.

If you can't explain the model, it means you don't know the assumptions that went into the model's output, which means you won't see it coming when the model doesn't work anymore. And if you don't want to look like a moron saying "oh but the model said...", (and not getting sued for mismanaging investors money).

Honestly, it's probably the investors asking questions that led them to this decision, but nonetheless, this is reason talking.

fjp · on Nov 13, 2018

> If you can't explain the model, it means you don't know the assumptions that went into the model's output

This is true, but there are many, many kinds of models that have basically zero explanatory power but have higher predictive capabilities than models that are easier to explain. They have been around a long time and are used for many different practical applications.

Unfortunately, the draw of that seemingly infallible super-high-predictive capability will almost certainly be heavily involved in financial markets before long. I have no problem if some people want to risk a bunch of money in a hedge fund that uses neural net models or whatever else, but having enough money controlled by these models could pose a serious systemic risk.

wpietri · on Nov 13, 2018

> having enough money controlled by these models could pose a serious systemic risk

This is the part that worries me. For a decade before the 2008 financial collapse, people were quietly saying, "Gosh, there's a lot of activity in derivatives and we don't really know where the risk is going."

One of many factors there was the way rating agencies gave very generous ratings to mortgage securities. Critics note that it was in their short-term financial interest to do that. If people can screw up that badly with models they supposedly understand, it seems to me to be even more risky when working with models where people have just given up understanding and put their faith in the AI oracle. As long as they get the answers that maximize their end-of-year bonus checks, they have a strong incentive not to dig deeper.

zeroname · on Nov 13, 2018

In my estimation, existing models are just oracles with less variables.

> One of many factors there was the way rating agencies gave very generous ratings to mortgage securities.

Garbage in, garbage out.

zeroname · on Nov 13, 2018

Fair argument, but somehow I doubt you know too much about real-world finance. Not a lot of reason to be found there.

Frankly, I think the problem here is "too new" or "too many variables". I doubt any of these managers understand Black-Scholes, yet they would sign off on using it because it's established, even though applying it also lost lots of people lots of money.

"It outperformed" seems to me to be reason enough, in fact not using it something that outperforms could be construed as "mismanagement" just as well.

d--b · on Nov 14, 2018

I’ve been working in a hedge fund as a quant dev for more than 10 years. Before that I worked in 3 banks.

Most option traders definitely understand black scholes, but that’s not really the point, cause there are more complex models that they would use to trade without knowing the details of.

The point is that there are quants who you need to trust with the models. And they’re most likely the ones who said: “this seems to work, but we don’t really know why, so we probably shouldn’t use it”. The fact that the top dogs agree with that is a sign of maturity.

zeroname · on Nov 15, 2018

> I’ve been working in a hedge fund as a quant dev for more than 10 years. Before that I worked in 3 banks.

I guess you know too much about it, then.

> “this seems to work, but we don’t really know why, so we probably shouldn’t use it”

The same could be said of human intelligence. With that attitude, you can just throw out machine learning altogether. Most models are not interpretable. It's too many variables. That's how the world works though, there are too many variables for you to ever understand cause and effect at any broader scale. That's especially true with finance, where models have relatively little predictive power anyway. I'm surprised to hear an "appeal to reason" coming out of that corner, of all places.

resters · on Nov 13, 2018

Here's the scenario that makes it sensible to shelve the superior AI models:

premise 1: financial crisis hits, requiring some firms to accept immediate loans (or off books loans aka qe) to maintain solvency (classic 2008 scenario)

premise 2: firms will not have equivalent exposure, so some firms fail worse than others, but as the risk is viewed as "systemic" all get the bailout

If some firms have AI that find risks hidden in investments that traditional (explainable) models ignore, then those firms will sit out of markets that will in the meantime be profitable for the firms that are unaware of the actual risk. Metaphorically, why ruin the 70s with an accurate HIV test.

If the same models could be used to identify and securitize (and make a market in) the invisible risk, it's possible that the market price of the risk would similarly lead many firms to sit out of otherwise profitable markets, as the yields of many of the traditional investments would (after the cost of hedging) be poor.

All this would result in a shrinking of the pie without an analytical explanation. "What do you mean the pie is smaller than we thought it was and we have to grow at a slower rate than we thought?", the CEO might ask.

In most scenarios where quantitative approaches give better insight into the future, the firm to develop the approach makes a fortune until others can catch up.

But what we have today is a financial system where keeping the overall system running hot is government policy, and so all participants have the incentive to ignore information that would lead to rational reallocation of investments.

Once the system's normal is leveraged/hot enough, the system becomes resistant to certain kinds of true information.

lawlessone · on Nov 13, 2018

They're right.

Why would we put a NN in charge of anything important if we can't explain how a particular model works?

Would you want your car or an aircraft you're on piloted by neural net the actions of which can't be explained?

What if it encounters an unforseen event that causes a flash crash or worse an actual crash that kills people?

Do you want to trust something built from incomplete data and simulated annealing with your life and livelihood?

bitL · on Nov 13, 2018

Why do you trust moody, inconsistent people with their own biases as your leaders in life-or-death situations then?

jerf · on Nov 13, 2018

We have centuries/millennia/choose-a-time-frame of experience with what they do, how they tend to perform, and while it's not necessarily perfect, we know how to engineer around their limitations, again, with centuries/millennia/etc. of experience in that field.

Computer models are much younger, and we know they tend to have weird pathological corners, but unlike humans and their weird pathological corners, we have a much less firm grasp on what they are.

In many cases, humans have skin in the game, too. No computer model is yet sophisticated enough to be able to say that about.

There is also some irrationality in having someone to blame, etc. Certainly. But it's not the only part of the story.

samstave · on Nov 13, 2018

Your comment made me think of something, perhaps in the larger assessment of the soundness/comfort-level we will/do have with AI decision making:

With the moody, inconsistent people, we can understand and evaluate their incentive reasoning - positive or negative and empathize with that understanding to assign a level of trust.

With any AI - especially to ones who didnt design/create it - any and all of its motivations, incentives are completely (emotionally) opaque to the recipient of the outcomes of its behavior.

I know we are saying the same thing - but unless a "receipt" can be given for a decision to why an AI / NN made a decision, people will not learn to build an understanding of a trust.

Otherwise, AI will always be hated by the humans who feel slighted by the cold decisions of a machine against them.

And that is clearly going to be the real future.

There will be anti-AI terrorism/activism and violence against systems, companies and countries that make life-bending decisions by AI against groups of people.

The seed to this future is the gatekeeping to capital/resources through FICO-like systems.

cr0sh · on Nov 13, 2018

> There will be anti-AI terrorism/activism and violence

There's already been bomb attacks in Mexico against nanotechnology researchers, so it's not a big leap from there to AI/ML and other similar research (especially into GAI).

wpietri · on Nov 13, 2018

Absolutely. We all daily get practice in understanding humans, so I think humans are much better prepared to handle a rogue employee than a rogue model.

lawlessone · on Nov 13, 2018

I'm not saying all AI is bad, but in the case of NN, they're full of biases known and unknown.

mdorazio · on Nov 13, 2018

> they're full of biases known and unknown

Are we talking about AI or people?

gbrown · on Nov 13, 2018

Depending on where the training data comes from, both ;)

TeMPOraL · on Nov 13, 2018

Like 'jerf says, we have many centuries of experience in dealing with people and people-based systems. We know the biases, the cognitive problems, the buttons to press and the buttons to absolutely avoid pressing.

More importantly, humans share the same brain architecture and work roughly the same way - you and me included. This makes it easier for us to understand other people (also having a part of the brain trying to directly model other people's minds helps). The way those NN-based black boxes work is completely alien to us (and each being a special snowflake made of hacks does not help).

claydavisss · on Nov 13, 2018

People can be fined or jailed

wpietri · on Nov 13, 2018

That's an important point.

One of the depressing features of the 2008 financial crisis was the great number of industry titans who went smoothly from "My awesome insight is worth tens or hundreds of millions annually" to "Who could possibly have known?"

I would hate to see the same thing happen again when algorithms fail to work as hoped. If people build and deploy a system, they should be socially and legally responsible for its effects in the world. And executives should be held responsible by the rest of us. I'd like us to be done with the shrug-emoji-and-cash-the-bonus-check school of corporate governance.

adrianN · on Nov 13, 2018

And software can be deleted.

lawlessone · on Nov 13, 2018

>And software can be deleted.

The software doesn't care and isn't intimidated by that fact... yet.

quotemstr · on Nov 13, 2018

> Would you want your car or an aircraft you're on piloted by neural net the actions of which can't be explained?

Almost all cars and aircraft today are operated by neural nets, the actions of which can't be explained.

quonn · on Nov 13, 2018

As anyone driving a car can assure you, the actions can be explained, easily.

olefoo · on Nov 13, 2018

But are the explanations true? Are they deterministic?

If someone took ten minutes of video of you driving in city traffic; could you consistently explain all of your decisions that would be recorded?

It's very easy to tell yourself that your actions are consistent and explainable ( rational ) but if examined closely they aren't always consistent with any one set of heuristics. Following distance is one example; can you explain the decision making process by which you decide how much space to give the vehicle in front of you? Why it varies based on road conditions; weather or your estimate of your reaction times?

TuringNYC · on Nov 13, 2018

I'd also be worried about the model being biased. For example, perhaps the model is "better" because it uses correlated features it shouldnt be using (due to fairness laws, etc.)

nradov · on Nov 13, 2018

There's a huge difference between trading algorithms and safety-critical control systems. If you're speculating with other people's money it's not entirely unreasonable to risk it using unexplainable models, provided you disclose your general approach to investors.

cwilkes · on Nov 13, 2018

I don’t understand what you are asking for. 100% certainty before making any decision? Any decision in a messy real world that’s far more complicated than a simple math statement that can be proved?

covemarkets · on Nov 13, 2018

"Would you want your car or an aircraft you're on piloted by neural net the actions of which can't be explained?" Actually, if the net is properly trained to reduce overfitting and is shown to work on out of sample data - yes. On the flip side of this argument, a person can do something inexplicable and invent a plausible explanation after the fact that seems and feels safe and correct, but is actually wrong.

TeMPOraL · on Nov 13, 2018

Define: "properly trained".

With humans, we generally know the bounds for unexpected behavior. We understand tiredness, confusion, fear, distraction, suicidal thoughts and other factors. We also know how to screen people to minimize those bounds.

With ML stacks, we have no good grasp on bounds. They usually work, for some definition of working, up until they don't - and when they fail, it's in some absurd (therefore hard to predict) way.

covemarkets · on Nov 13, 2018

humans also fail for all sorts of reasons that are unpredictable. if, for example, Waymo, demonstrates properly calculated lower accident rates with their cars than with humans, then at some point you have to agree with what the data says. We aren't there yet, but that doesn't mean we can't get there

TeMPOraL · on Nov 13, 2018

If they demonstrate, sure.

But for that to be useful they also need to demonstrate that their model is robust under modifications - otherwise every time they retrain their NNs they should throw away prior safety records and start counting from zero. Because at this point, what would be the argument to keep it? With humans we know - from thousands of years of experience - that people generally don't go crazy when taught new things. With NNs, we know they're very sensitive to training data.

covemarkets · on Nov 13, 2018

Well, you are making pretty general statements. "With humans we know - from thousands of years of experience - that people generally don't go crazy when taught new things"

they do crazy things all the time. Drink and drive, hijack planes and run them into mountains ( germany, malaysia recently ), text and drive, etc. etc. This discussion needs to be based on comparing actual data. Let's get there and see

TeMPOraL · on Nov 14, 2018

This is one topic when you can sort of get away with general statements, because it's human interop - we all know how it works :).

I covered drunk driving and suicides in my original comment. This happens, but we know it does, we know how often and why it does, and know how to work around it.

What I was thinking about wrt. learning failure modes is this: when you put a person through 30+ hours driving course, they don't suddenly lose the ability to recognize trees or faces. The same cannot be said about retraining existing neural networks.

I'd love for the actual data to appear for analysis. Right now I'm worried about the very concept of using a black-box bag of statistical tricks DNNs are for safety-critical operations. How is it that we can't handle the problem of self-driving with more direct, stable and auditable methods?

covemarkets · on Nov 14, 2018

I think you are putting a bit too much faith in human cognition: "when you put a person through 30+ hours driving course, they don't suddenly lose the ability to recognize trees or faces". They do. they get tired and their reaction time slows down. they get drunk, they get distracted. Humans do exactly this - unpredictably lose the ability to recognize and react in a timely manner.

TeMPOraL · on Nov 14, 2018

I understand the point you're trying to get across, but to be clear about this tangent - it's not that I'm overoptimistic about human cognition. I'm aware how ill it is suited for the task of driving. I just put much, much less faith in DNNs.

georgeek · on Nov 13, 2018

David Freedman has this following dialogue in his Statistical Models: Theory and Practice book:

Philosophers' stones in the early twenty-first century Correlation, partial correlation, cross lagged correlation, principal components, factor analysis, OLS, GLS, PLS, IISLS, IIISLS, IVLS, LIML, SEM, HLM, HMM, GMM, ANOVA, MANOVA, Meta-analysis, logits, probits, ridits, tobits, RESET, DFITS, AIC, BIC, MAXNET, MDL, VAR, AR, ARIMA, ARFIMA, ARCH, GARCH, LISREL[...]...

The modeler's response We know all this. Nothing is perfect. Linearity has to be a good first approximation. Log linearity has to be a good secont approximation. THe assumptions are reasonable. The assumptions don't matter. The assumptions are conservative. You can't prove the assumptions are wrong. The biases will cancel. We can model the biases. We're only doing what everybody else does. Now we use more sophisticated techniques. If we don't do it, someone else will. What would you do? The decision-maker has to be better off with us than without us. We all have mental models. Not using a model is still a mode. The models aren't totally useless. You have to do the best you can with the data. You have to make assumptions in order to make progress. You have to give the models the benefit of the doubt. Where's the harm?

chatmasta · on Nov 13, 2018

Are the non-AI models any more “explainable?” Models built on multivariate statistics, processing terabytes of data a day, spitting out numbers might be “understandable” in the sense that there is some discrete representation of how their inputs map to outputs. But can anyone really look at those algorithms and explain why they work? What’s really the difference between NN and advanced statistical regression, beyond differing levels of familiarity/comfort?

gbrown · on Nov 13, 2018

I didn't read the whole article since I didn't want to sign up, but in general: maybe.

Machine learning models are really good at things like prediction, but if it's valuable to do inference about the phenomenon (e.g., is there evidence that X is positively associated with the odds of Y, given Z,Q,R), careful study design and appropriate statistical models are a better choice. These come with theoretical underpinnings - whether that's the coverage guarantees of frequentist methods or the decision-theoretic foundations of Bayesian inference.

I'm not sure whether or not that means this choice was good on the part of BlackRock, however.

bitL · on Nov 13, 2018

Buzzword bingo is more difficult to play with NNs than throwing some (misunderstood) p-values around for people that understand a bit about stats/optimization/etc. but aren't at the bleeding edge (i.e. a typical MBA with an outdated tech degree).

raxxorrax · on Nov 13, 2018

Came here for fluid mechanics. Don't know if my tech degree is outdated already, but I certainly skipped getting an MBA. At least I wondered a bit as to why BlackRock is into fluidity simulations too.

bitL · on Nov 13, 2018

Black-Scholes computed on NVidia Teslas, right?

If your tech degree is not in computer science, it's likely not outdated. If it is in CS, then it most likely is.

parallel_item · on Nov 13, 2018

I think a key factor in this decision may be the perceived risk of putting huge capital behind a single black box model. I would assume this differs from more ML-heavy quant firms like Two-Sigma, because BlackRock's products generally perform at a huge scale with some central idea behind them. Two-Sigma probably can spread out the same amount of assets across many different black-box models, diversifying and reducing risk through these means. In this case, perhaps only 1 model dictating such a huge chunk of capital was just too much uncertainty?

I have no evidence of the scale and diversification of both these, so evidence would be helpful in refuting the above!

beta_binomial · on Nov 13, 2018

I think so. The ultimate question is who are you going to sue and who is going to sue you if something goes wrong? Imagine having to put your ML researcher on the stand and having him say "I can't say for sure that this or that didn't affect the outcome in a meaningful way"

reallymental · on Nov 13, 2018

Who's to blame when the model sees 'red' ? Management needs a head, a model isn't one yet.

rq1 · on Nov 13, 2018

Quite natural. AI in market finance is a fraud for the moment.

AI models totally fail to do what classical (and parsimonious, explainable, cheap...) methods/algos/models achieve quite easily (BS, Hawkes, RFSV, uncertainty zones, Almgren-Chriss/Cartea-Jaimungal... etc.). Actually, I'm tempted to say that AIs don't work at all.

I've seen so far funds leveraging "big data" with AIs (eg. realtime processing of satellite imagery, cameras, (more) news...) and get more/better information (than the others) to finally calibrate and use these (parsimonious) models, nothing (interesting) else.

Do not get fooled. Lots of banks announced that they use AIs, to surf on the hype, because today if you don't do AIs, you're not in, because today everyone is a Data Scientist, that's all.

Invictus0 · on Nov 13, 2018

Anyone have a mirror link?

fipple · on Nov 13, 2018

Corporations exist in a world with governments and politics. It’s entirely reasonable for senior management to require a methology that they can defend in a televised Senate hearing even at the expense of some predictive power.

yters · on Nov 13, 2018

Maybe just shelve ML and go back to traditional statistics which focus a lot more on being explainable.

ZeroCool2u · on Nov 13, 2018

This seems like a management problem, not a model issue.

zzzeek · on Nov 13, 2018

content not available without a paid subscription?

fiveFeet · on Nov 13, 2018

Is there a free version of the article available? The link requires paid subscription.

bluetwo · on Nov 13, 2018

Isn't it their choice to make?

m3kw9 · on Nov 13, 2018

The manager probably saw the model as a threat to his job security. Looked for a way out and there it is, the always persistent problem of AI models

lucozade · on Nov 13, 2018

> The manager probably saw the model as a threat to his job security

Indeed, because if the manager doesn't understand the model well enough to either mitigate its weaknesses or reserve sufficiently against them, they'll probably get fired some point down the line.

5minbreak · on Nov 13, 2018

Both a fault of the employees that worked on this, and the manager.

Your deliverable should be an interpretable model. You can (and probably should) make neural network models interpretable. If upper management does not trust your performance evaluation enough to bet on it, either the evaluation was weak (and no model should be deployed, however simple and interpretable) or upper management doesn't know enough about modern ML to have to make these decisions.

I have sympathy for the manager in charge for making a decision on a complex model (while all they ever knew was simple survival models and basic statistical models). But you got to move with the times. Your competitors will use the most powerful models available (and some may go under due to improper risk management). Your employees don't want to build logistic regression models until eternity.

typeformer · on Nov 13, 2018

Not even Google has come anywhere close to being able to make complex NN models that have a human interperable "receipt" for decisions. In fact, for certain classes of problem solving it's likely impossible and that is already a huge problem.

5minbreak · on Nov 13, 2018

I posit that complex NN models can achieve the same level of interpretability as logistic regression. In part, because some interpretability methods use logistic regression as a white box model to explain black boxes.

In other words: If you are comfortable OK'ing a logistic regression model (because you looked at the coefficients and they made sense), you should be comfortable OK'ing a complex NN model (because the evaluation and interpretability modeling makes sense).

Nitpicking, but significant: Most models don't output decisions, they output predictions. Decision scientists then build a policy on top of the model. Key issue here is that the policy makers don't trust the predictions. But I posit they have no reason to trust the predictions of a logistic regression model any more than the predictions of a complex black box. Provided, of course, you deliver interpretability UX, confidence estimates, and strong statistical guarantees and tests. Which is possible for even the blackest of boxes.

If automatic justification is impossible for computers/black boxes, I believe it is impossible for humans too (as per Church-Turing). But let's say it is impossible. Do you think Google would use a white box model to optimize Adsense, because they can't interpret powerful deep learning solutions (like risk management for BlackRock: a very critical part of their business)?

I'd say Google came pretty close with https://distill.pub/2018/building-blocks/ (they are not the only players in the interpretability field, and plenty of methods are becoming available, in large part driven by academia not business: interpretability and fairness are not too important for the bottom line).

jmalicki · on Nov 13, 2018

They've come up with plenty (e.g. https://blog.openai.com/adversarial-example-research/ ) - noone broadcasts them widely because noone likes the answers