
Asking the Right Questions About AI - tim_sw
https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48
======
danso
This is easily one of the best explanations of AI and explorations of its
implications that I've read. Probably because I largely agree with the author,
but have never been able to express such assertions so clearly:

> _These are rarely new problems; rather, the formal process of explaining our
> desires to a computer — the ultimate case of someone with no cultural
> context or ability to infer what we don’t say — forces us to be explicit in
> ways we generally aren’t used to._

It was also interesting to read his inside account of Google Photos's
"Gorilla Incident". I had always assumed it was due to training on limited
data, but that just didn't make sense, as Google would not have been lacking
for data in 2015.

~~~
TeMPOraL
The article is good, I agree.

The line you quoted is not specific to AI/ML, it's the fundamental rule of
programming (and why I absolutely love it as a tool for understanding) - the
machine is the ultimate bullshit antagonist. You can't handwave away your lack
of understanding of a problem, like you can do with people. And the process of
trying to code a problem (e.g. for a simulation) forces you to learn just
about everything you don't yet comprehend about it.

~~~
danso
> _The line you quoted is not specific to AI/ML, it's the fundamental rule of
> programming_

Very good point. Though the OP's observation comes with additional
poignancy/force in the context of artificial intelligence. A lot of people are
fine with the idea of machines being told explicitly what to do when it comes to
doing "mechanical" things. I would guess most of these laypeople do not
realize that the same sausage-making process is involved when it comes to
having machines do human-like things.

------
TeMPOraL
Point 3.2 is golden, and I'm very happy people seem to be no longer afraid to
publish the ugly truth here (it's the second article I've seen touching on
this over the past month), instead of being in complete denial.

AI doesn't care about our nice "_ought to be_" fake worlds, and algorithms
themselves don't have social biases. So when an ML algorithm spits out
something that goes against the prevailing ideologies, it's most likely the
fact in the data. If the data is good (i.e. it represents the real world
properly), then the ideologies should be questioned.

~~~
solipsism
_So when an ML algorithm spits out something that goes against the prevailing
ideologies, it's most likely the fact in the data._

You seem to have missed the subtlety in the author's treatment of this
subject. His point is exactly that the data is not necessarily "good". It's
biased because we are biased [1], and you have to be aware of that when
interpreting results. It's the data that needs to be questioned.

[1] _if three white teenagers are arrested for a crime, not only are news
media much less likely to show their mug shots, but they’re less likely to
refer to them as “white teenagers.”_

~~~
TeMPOraL
I didn't miss it; I address it in the very next sentence - "_If_ the data is
good (i.e. it represents the real world properly), _then_ the ideologies
should be questioned." There is an if-then clause there.

The point about mugshots perfectly shows the problem of what happens when the
data is biased.

But then the corollary is, sometimes the data _is_ good. The relevant quote
from the article:

> _If you try to manually “ignore race” by not letting race be an input to
> your model, it comes in through the back door: for example, someone’s zip
> code and income predict their race with great precision._

This is what happens with loan or crime data sets, and it made plenty of noise
in the news recently - but only the kind of noise in which people say it's
obviously the algorithm that's broken, because it doesn't fit the "polite
fiction" they'd like to believe.

~~~
throwawayjava
_> But then the corollary is, sometimes the data is good._

Yes, but in my limited experience with ML, "the data is good" usually _isn't_
the

_> > most likely_

explanation :-)

_> This is what happens with loan or crime data sets, and it made plenty of
noise in the news recently - but only the kind of noise in which people say
it's obviously the algorithm that's broken, because it doesn't fit the "polite
fiction" they'd like to believe._

People say the algorithm is broken because it's illegal to discriminate on the
basis of race, and these algorithms were sneaking "discriminate by race" in
through the back door.

Calling the law a polite fiction won't get you very far when the judge issues
an injunction against using your product because it's racially biased.

And the judge doesn't care how machine learning researchers define bias. He
cares how the law defines bias.

So, if you're diddling around on your computer in your own time, then I guess
the algorithm isn't broken. But if you're building a product you want to sell
to courts or insurance companies, the algorithm very much is broken.

That is, unless you think "the law is a polite fiction and bias means only
what ML researchers say it means" would be a winning argument to lift an
injunction against your product. If you're going to make those arguments in
front of a real judge, let me know; I want to see the bulge in that judge's
forehead :-)

No. People say the algorithm is broken because it _obviously is broken_.

~~~
TeMPOraL
So you're basically saying the algorithm is broken because it discovers
_illegal_ correlations, even though they may be _true_.

Well, that's precisely "polite fiction".

(Also, laws are not designed as truth-seeking tools but usable heuristics that
take into account limits of _individual_ humans.)

~~~
throwawayjava
_> So you're basically saying the algorithm is broken because it discovers
illegal correlations, even though they may be true._

No. Discovering those correlations is completely legal. Making certain
_decisions_ based upon those correlations is illegal. Mostly because by making
a _decision_, you make a tacit assumption about causation that's borderline
impossible to prove and can have a huge impact on people's lives.

Like I said, if it's just you in your home office having fun, go at it. But if
you then bake that model into certain products, you have a very serious bug.

_> Well, that's precisely "polite fiction"._

Except in this case, it's not even that!!!

The observations that certain crime statistics are highly correlated with
race, and that race is correlated with zip code, are not new. ML did
not usher in some brave new world here. Just because we call it "AI in 2018"
instead of "John from the Actuarial dept. in 1960" doesn't change the moral,
ethical, or legal landscape. (The article literally makes exactly this point.)

And despite your characterization, this particular truth (race correlates to
crime correlates to zip) _isn't even a politically incorrect observation!!!_
It's something everyone already knows, and I've _never_ seen someone attacked
for pointing out this correlation. In fact, it's a favorite talking point of
social justice types! Pointing out this correlation is _not_ impolite.

The impolite assertion is that there's a _causative_ link between race and
crime. That's an assertion that models (tacitly) make when their users shift
from truth-seeking to decision-making. See the "question you thought you asked
vs. question you actually asked" portion of the essay.

Now, if ML algorithms discovered some genetic, racial, causal theory of crime,
then you might have a point about ML exposing polite fictions in this case.
But they didn't, so you don't. COMPAS isn't being censored from sharing a
politically incorrect truth. It's being prevented from ruining people's lives
with a really, really lazy application of statistics.

I often joke that racial discrimination laws are one of the few examples where
"being bad at math" is not just criminal, but unconstitutional.

_> ...laws are not designed as truth-seeking tools_

Again, in cases where bias becomes illegal, these models are _NOT_ just being
used to seek truth. They're being used to _make decisions_.

You seem to have misconstrued the fundamental thesis of the article. The
author _isn't_ calling for death to polite fictions. And there's a concrete
example in the article of this point of departure between your perspective and
his. Namely, his response to people-as-gorillas was not "fuck you, the math is
right". And the crescendo of the piece is "AI is just a tool, not a divine
oracle, and there's nothing new under the sun".

------
YeGoblynQueenne
>> _As I write this, I’m going to use the terms “artificial intelligence” (AI)
and “machine learning” (ML) more or less interchangeably. There’s a stupid
reason these terms mean almost the same thing: it’s that “artificial
intelligence” has historically been defined as “whatever computers can’t do
yet.” For years, people argued that it would take true artificial intelligence
to play chess, or simulate conversations, or recognize images; every time one
of those things actually happened, the goalposts got moved. The phrase
“artificial intelligence” was just too frightening: it cut too close, perhaps,
to the way we define ourselves, and what makes us different as humans. So at
some point, professionals started using the term “machine learning” to avoid
the entire conversation, and it stuck. But it never really stuck, and if I
only talked about “machine learning” I’d sound strangely mechanical — because
even professionals talk about AI all the time._

That's a typically ahistorical explanation of the relationship between
machine learning and AI, which is really quite simple: machine learning is a
field within the broader research subject of AI. "Professionals" did not "at
some point" start using machine learning instead of AI. The field got a name
when it solidified into a field, just like Natural Language Processing did,
or Machine Vision.

In fact, using "machine learning" as a byword for "AI" is no different from
using "NLP" or "Machine Vision" as a byword for AI. It is that wrong, and it
makes it that evident that the person using the terms doesn't really
understand where they come from.

~~~
joe_the_user
_The field got a name when it solidified into a field, just like Natural
Language Processing did, or Machine Vision._

I'd say a claim of machine learning being "solidified into a field" is
somewhere between false and meaningless.

Just look at neural networks. These may be the most successful machine
learning or AI technique ever, but they're also the most dependent on ad-hoc
tweaks and tuning of anything so far. Which seems rather the opposite of
solidified.

~~~
YeGoblynQueenne
Machine learning has a journal and a conference, like neural networks do.
They're AI fields. I don't know what your definition is.

~~~
danso
The Machine Learning journal (which later became the Journal of Machine
Learning) was started in 1986 [0], which is 25+ years after the term was
first coined [1].

According to Google Trends [2], both computer vision and artificial neural
networks, as topics, were more popular than machine learning until 2010.
Afterwards, both subfields drop to their historic lows of search interest
while interest in machine learning multiplies year over year. It's hard to
believe that such rapid growth didn't involve a conflation of taxonomy.

[0]
[https://en.wikipedia.org/wiki/Machine_Learning_(journal)](https://en.wikipedia.org/wiki/Machine_Learning_\(journal\))

[1]
[https://en.wikipedia.org/wiki/Machine_learning](https://en.wikipedia.org/wiki/Machine_learning)

[2]
[https://trends.google.com/trends/explore?date=all&q=%2Fm%2F0...](https://trends.google.com/trends/explore?date=all&q=%2Fm%2F01hyh_,%2Fm%2F01xzx,%2Fm%2F05dhw)

~~~
YeGoblynQueenne
If I remember correctly, Google Trends looks at the frequency of use of
n-grams in a Google search corpus. I wouldn't expect this to reflect research
activity, as such.

The fact that the term is older than the Machine Learning journal also
doesn't say much. My disagreement with the previous poster seems to be about
whether machine learning is a field of AI or not. Well, if it has a journal
(and a conference) (or the other way around), then it's a field of study.

------
bluetwo
One of these days, sooner or later, this is what is going to happen:

Some people being toted about by an autonomous car are going to be killed. The
NTSB is going to blame the car for making bad decisions. The car maker is
going to blame the condition of the road, signs, or painted lines. They are
going to pin the blame on the state government.

Insurance companies will point fingers at each other, depending on who they
represent.

Then, everyone is going to sue everyone.

The state is going to have to explain WHY it let these cars on roads that
were not maintained to a level 100% compatible with the technology in
existence.

The state will say they were told the technology was safe, and pull out fancy
presentations that were put together with no intention of ever ending up in
court.

Technology companies will shrug and say it was marketing speak and, well, you
knew the risks.

Then everyone will start covering their butts, and AI in autonomous cars will
stop being a thing for a while.

~~~
indubitable
We've already had fatal self driving crashes while the self driving system was
in operation. The media irresponsibly, but predictably, tried to use it to
drive people into a terrified frenzy - which surprisingly did not work. And
even more surprisingly, the regulators were also reasonable. There was an
understanding of what caused the crash and that there are inherent risks. This
was contrasted with data indicating that, for instance, since Tesla rolled out
their autopilot system, crashes as a whole declined by some 40%.

And finally there is even just macabre data on fatality rates. On average in
the US currently about 1.2 people are expected to die in car crashes per 100
million miles driven. Tesla alone has had hundreds of millions of miles driven
in its full autopilot mode, and the fatalities are not racking up like they
'should' be. We could argue that maybe people are paying more attention when
the autopilot system is enabled, but practice across the industry has shown
the exact opposite to be invariably true.
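
For scale, the arithmetic behind that "should" is simple. A quick sketch, with
the Autopilot mileage assumed, since the comment says only "hundreds of
millions of miles":

```python
# Back-of-the-envelope check of the figures above. 300M Autopilot miles is
# an illustrative assumption, not a reported number.
deaths_per_100m_miles = 1.2   # US average cited above
autopilot_miles = 300e6       # assumed for illustration

expected = deaths_per_100m_miles * autopilot_miles / 100e6
print(f"expected fatalities at the US average rate: {expected:.1f}")  # ~3.6
```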

I think it's just been so long since our society had a revolutionary
'physical' technology, that we've become scared of change and progress.
Imagine what a change it was going from horses to cars operating on internal
combustion engines and relying on brakes to stop these vehicles going far
faster than any horse. Or imagine the idea of stringing the entire country up
with electric poles with so much energy going between them that anything
landing on a wire and touching something else was certain to be killed, and
accepting the fact that when these poles, or the lines between them, go down
they not only could but on some occasions _would_ start fires, and so forth.
Going from human-driven to automatically driven vehicles is hardly meaningful
contrasted against many of the technological revolutions throughout the ages.

And one final point is that we're also being somewhat self-centered in this
discussion. These technological changes we're seeing are happening worldwide.
An image of e.g. Singapore streets with people in vehicles relaxing as they
are autonomously driven to their destination would leave the state of the US
and its rules and regulations looking increasingly like an anachronism.
Certainly something that would cast a dim light on our view of ourselves as
the world leader in innovation.

~~~
pjc50
> We've already had fatal self driving crashes while the self driving system
> was in operation.

There is no _true_ (level 5) self-driving system yet, only driver-assist where
the system may hand back control at any moment. It's only true self-driving
when there's no human in the car.

> Imagine what a change it was going from horses to cars operating on internal
> combustion engines and relying on brakes to stop these vehicles going far
> faster than any horse.

[https://en.wikipedia.org/wiki/Red_flag_traffic_laws](https://en.wikipedia.org/wiki/Red_flag_traffic_laws)
\- knee-jerk legislative response driven by a combination of fear and lobbying
is always a risk.

~~~
indubitable
That bit about the red flag traffic laws is rather humorous. I had never heard
of that before!

One thing that's nice now about self driving vehicles is that it seems that
more or less the entire automotive industry is hopping on board. For some time
it looked like it was going to be Google + Tesla doing self driving and I
think it being lobbied and consequently "regulated" out of existence in the US
was a very real danger. But at this point, even if we're just speculating,
which influential lobbying force might be a concern?

------
jacinabox
> Intuitively, this seems obvious and valuable — yet when this is mentioned
> around ML professionals, their faces turn colors and they try to explain
> that what’s requested is physically impossible.

Machine learning models have an attention mechanism right? So just classify
whatever has the model's attention.

~~~
ahartman00
Not sure why you were downvoted; I'm assuming this was a good-faith question.
I'm going to try to answer your question, but I'm still learning, so take this
with a grain of salt.

Some models have attention, but it is not a requirement.

In models that do have attention, it may or may not be simple. In image
captioning for example, you can do just that. See [1] for some pictures of
this happening. But in this example, they are stepping through a caption, and
seeing where the attention is focused for each word. Works fine for a short
caption. Videos are just many images and sound, but already this is going to
be more difficult. For natural language processing, you will be stepping
through very many words. Some models use characters instead of words. Not sure
how you could even make sense of that. I have seen people look at the
attention in machine translation. Not sure how it would work for sentiment
analysis.

So you aren't exactly wrong, but you are greatly oversimplifying this for the
models where it is possible (based on my understanding). Overall, machine
learning is very much a black box, despite efforts to the contrary. It also
doesn't solve some of the other problems. For example, knowing the model is
focusing on zip code doesn't help you remove bias that comes from the data.

[1]
[https://arxiv.org/pdf/1502.03044.pdf](https://arxiv.org/pdf/1502.03044.pdf)
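
To make the attention idea above concrete, here's a minimal numpy sketch of
the mechanism in [1]: for each generated word, the decoder scores the image
regions, softmax-normalizes the scores into attention weights, and those
weights are what you'd visualize. All arrays here are random stand-ins, not a
trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, dim = 196, 512            # e.g. a 14x14 grid of CNN features
regions = rng.normal(size=(num_regions, dim))

def attention_weights(query):
    """Dot-product scores over image regions, normalized with a softmax."""
    scores = regions @ query
    scores -= scores.max()             # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

for word in ["a", "dog", "running"]:   # stand-ins for decoder states per word
    query = rng.normal(size=dim)
    w = attention_weights(query)
    print(f"{word!r}: most-attended region {w.argmax()} (weight {w.max():.3f})")
```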

------
aalleavitch
I wonder if humanity is going to become a lot more introspective once it
becomes a parent to another intelligence. Things look a lot different once you
start seeing your own biases reflected in the child you're trying to raise to
be better than yourself...

~~~
BoiledCabbage
Unfortunately not. Humanity has plenty of opportunity to examine its biases
now, and for the most part, most choose to ignore them or deny them.

Just like people think they're smarter than they are, are right more often
than they actually are, and are more in control of their decision making than
they are, they also think they're less biased than they are.

There is plenty of opportunity to confront that now around the world, and for
the most part people don't.

Humanity will decide to address biases when we decide / are convinced it's the
right thing to do, not because of the availability of more data.

~~~
dwaltrip
Maybe we are already trying, but it's pretty damn hard?

------
justinpombrio
> Humans are terrible at driving cars: that’s why 35,000 people were killed by
> them in the US alone in 2015.

Agh, misleading numbers. People are actually surprisingly good at driving.
Wikipedia says that the US driving fatality rate (which is worse than most
other countries) is "7.1 road fatalities per 1 billion vehicle-km". You can
tell it's low because it's measured in "per billion vehicle-km".

EDIT: When I said "misleading numbers", I should have said "misleading units".
The number 35k people killed in 2015 measures part of the toll that driving
takes on the human race, and it is terrible. However, the _units_ of
deaths/year is not a measurement of how good or bad people are at driving.
Deaths/mile-driven is such a measurement.
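
To make the units point concrete, here's the conversion. The total mileage is
an approximate outside figure (roughly 3.1 trillion vehicle-miles driven in
the US in 2015), not something stated in this thread:

```python
# Converting the headline death toll into a per-distance rate.
deaths_2015 = 35_000
vehicle_km = 3.1e12 * 1.609            # approx. US vehicle-miles -> km

rate = deaths_2015 / vehicle_km * 1e9  # deaths per billion vehicle-km
print(f"{rate:.1f} fatalities per billion vehicle-km")  # ~7.0, close to the 7.1 cited
```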

~~~
ocdtrekkie
I think these 'humans are terrible at driving' statements, while true, also
forget to mention that 'computers are also terrible at driving'. There are
near infinite edge cases in driving a vehicle, and I have yet to see any
convincing proof the current machine learning efforts are truly up for the
challenge. Edge cases are where computers fail: That general intelligence
they're missing is needed to decide what to do. Even Waymo is offloading that
to a call center of remote drivers. Humans can definitely be faultier at
driving consistently in a way computers are not.

I think the ideal case remains investing in safety systems that ensure
inattentiveness of a driver doesn't cause an accident, while still leaving the
human in control of the ultimate decision making of a vehicle.

~~~
fyi1183
Presumably, computers can get better at driving cars while humans cannot, so
eventually computers will win this particular competition, like they've won
at chess, Go, and tons of other things.

~~~
erikpukinskis
Humans can get better at driving. Instructional campaigns, assistive
technology, art, these things can all make ya better.

~~~
adrianN
Sure, but when you fix a bug in the driving software you deploy it to all cars
simultaneously. It's harder to teach all humans about a corner case you
encountered only a couple of times.

~~~
erikpukinskis
In 2018, as a point of fact, it is still easier to teach the humans.

------
ocdtrekkie
Yonatan and I no longer see eye to eye on a lot of things, but this is a
really good introductory post to how AI works and certain ethical questions
around it. It's a long but extremely enjoyable read.

