

Rage Against the Algorithms - vellum
http://www.theatlantic.com/technology/archive/2013/10/rage-against-the-algorithms/280255/

======
tikhonj
An especially interesting thing about machine learning algorithms is that even
understanding the algorithm itself will often tell you nothing about its
biases: the biases often come from the training data. The article talks about
this a little bit, but I think it's a very important point for programmers in
particular. For one, this means that even an open source algorithm, with the
best intentions, might have significantly detrimental biases.

This also explains why I _personally_ do not find machine learning satisfying.
It's obviously very useful, and I'm not making a judgement about the field,
but solving a problem with machine learning often feels empty to me. You get a
solution, sure, but no additional insight into the problem itself. And I'm
often far more interested in that insight than in the solution.

I certainly find the underlying principles fascinating, but that fascination
usually does not translate to whatever field machine learning is used for.
Writing a system for identifying cat pictures will teach you quite a bit about
"identifying", but not very much about cats.

Ultimately, all this just means that you have to pay attention to what your
machine learning algorithm is doing _right now_ , even if you understand the
algorithm itself really well.

~~~
tzs
> An especially interesting thing about machine learning algorithms is that
> even understanding the algorithm itself will often tell you nothing about
> its biases

A related problem is that many (most?) machine learning algorithms cannot tell
you WHY they make the decisions or classifications that they do. Suppose you
are a bank, using a neural net to decide on credit applications, and the net
turns down an applicant. You are required by law to tell them why they were
denied credit.

How do you do that for a neural net?

I can think of one way: try tweaking their application until it passes, and
then derive an explanation from that. For instance, if you find that they
would pass if they made $20k/year more income, you tell them they were
rejected because their income was too low. I have no idea if that would be
sufficient to satisfy the legal requirements, though.
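That tweak-until-it-passes idea can be sketched like this (the `score` model, its weights, and the feature layout are all made up for illustration; a real bank's model would be the opaque net itself):

```python
import numpy as np

def score(features):
    # Stand-in for an opaque model: a toy logistic scorer over
    # [income in $/yr, number of existing loans]. Hypothetical weights.
    w = np.array([0.00005, -0.8])
    b = -2.0
    return 1 / (1 + np.exp(-(features @ w + b)))

def explain_denial(applicant, feature_idx, step, threshold=0.5, max_steps=100):
    """Tweak one feature upward until the model approves, and report
    how big a change was needed -- that becomes the 'reason' given."""
    tweaked = applicant.copy()
    for n in range(1, max_steps + 1):
        tweaked[feature_idx] += step
        if score(tweaked) >= threshold:
            return n * step
    return None  # this feature alone can't flip the decision

applicant = np.array([30000.0, 2.0])   # $30k/yr income, 2 existing loans
needed = explain_denial(applicant, feature_idx=0, step=5000.0)
print(f"Denied; would be approved with ${needed:,.0f} more income")
```

One catch with this approach: tweaking features one at a time misses cheaper combined changes, and the "explanation" depends on which feature you chose to perturb.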

~~~
waps
Well you don't train the algorithm to just give you Yes/No. You tell it to
give you

a) Yes, SUPER green

b) Yes, Green

c) mmmmmmmmmmmmmmmmyeah, ok, but watch this guy

d) please review manually

e) no, too many other loans

f) no, but maybe yes, with more collateral

g) no, convicted felon

h) no, unemployed

i) no, your wife has a billion loans

j) no, insufficient money down

k) ...

(there's like 300 or so different reasons in practice)

So the inputs to the neural net would be all the information you know about
the applicant, appropriately encoded. Then you sprinkle hidden layers to
personal taste, and the output layer would be one binary neuron per reason
(a-k above).
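That output encoding might look like this (toy forward pass; the reason list, feature layout, and weights are stand-ins for what a trained net would learn):

```python
import numpy as np

# One output neuron per reason code (in practice ~300 of these).
REASONS = [
    "yes, super green",
    "yes, green",
    "ok, but watch this applicant",
    "review manually",
    "no, too many other loans",
    "no, insufficient collateral",
    "no, insufficient money down",
]

def predict_reason(x, W, b):
    """The most strongly activated output neuron is both the
    decision and its explanation."""
    logits = W @ x + b
    return REASONS[int(np.argmax(logits))]

# Hypothetical encoded inputs: [income, loan_count, collateral, down_payment]
x = np.array([0.3, 0.9, 0.2, 0.1])

# Stand-in for trained weights: each reason neuron responds to the
# feature it names.
W = np.zeros((len(REASONS), 4))
W[1, 0] = 3.0   # "yes, green" fires on income
W[4, 1] = 5.0   # "no, too many other loans" fires on loan_count
b = np.zeros(len(REASONS))

print(predict_reason(x, W, b))   # high loan_count wins out
```

The design choice here is that the explanation is a training target, not something reverse-engineered afterwards -- which of course requires labeled reasons in the training data.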

~~~
neuralk
True, but I believe the argument still stands for other ML algorithms, such as
support vector machines. There is no way to query directly why the SVM maps an
instance to a label.

~~~
alanctgardner2
It seems like for an SVM the best way to find out 'how to pass' is to find the
shortest path from your point in the feature space to the separating
hyperplane. The component of that vector along each axis tells you how much
you need to change to cross the threshold.
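For a linear SVM this shortest path has a closed form: with decision function f(x) = w.x + b, the minimal move is the projection of x onto the hyperplane, -f(x) w / ||w||^2. A sketch with made-up weights (note this only works for a linear kernel; with a nonlinear kernel the boundary in input space has no such formula):

```python
import numpy as np

def minimal_change_to_pass(x, w, b):
    """Shortest vector from x to the hyperplane w.x + b = 0.
    Its per-axis components say how much each feature must change."""
    f = w @ x + b                 # signed margin (times ||w||)
    return -f * w / (w @ w)       # projection onto the hyperplane

# Hypothetical linear SVM over [income, debt]; x is a rejected applicant
w = np.array([2.0, -1.0])
b = -3.0
x = np.array([1.0, 1.0])          # f(x) = 2 - 1 - 3 = -2  (rejected)

delta = minimal_change_to_pass(x, w, b)
print(delta)                      # per-axis change needed to reach the boundary
assert np.isclose(w @ (x + delta) + b, 0.0)
```

The sign pattern of `delta` is the explanation: raise income a bit, lower debt a bit.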

------
willvarfar
The reason "abortion" isn't auto-corrected to is, I speculate, that it'd be
really unpopular if other misspellings were mis-corrected into reading
"abortion". I think there's likely a short blacklist of words that are in the
dictionary but are never offered as corrections.

------
Theriac25
> A recent survey found that 76 percent of consumers check online reviews
> before buying

Bullshit.

~~~
jonafato
I normally disagree with this type of comment, but based on the actual survey
they refer to [1], you appear to be correct. The most likely way they arrived
at this number is by taking the complement of "24% of consumers never use
online reviews". The rest of the data is "27% of consumers regularly use
online reviews", with the remaining 49% checking occasionally.

The way the statement is worded, I feel that many would read it as "A recent
survey found that 76 percent of consumers _regularly_ check online reviews
before buying" even if that's not how it's written. A better statement would
make the breakdown of regular vs. occasional checkers clear.

That said, single word comments don't add much to the conversation. As stated,
and if the survey is to be believed, the line is factually correct even if it
is misleading. I'm not sure if "bullshit" refers to how you interpreted the
statement or the results of the survey. Without clarifying your objection,
though, it will likely be ignored by most.

[1] [http://searchengineland.com/study-72-of-consumers-trust-onli...](http://searchengineland.com/study-72-of-consumers-trust-online-reviews-as-much-as-personal-recommendations-114152)

Edit: added link to the survey

~~~
kylebrown
I think Theriac25's "bullshit" was a reference to the fact that 86% of
statistics are either made up on the spot, or fudged data from non-
representative samples.

------
zzxxvvcc
More transparency into credit-scoring algorithms would be nice.

