
Systems smart enough to know when they're not smart enough (2017) - nromiun
https://bigmedium.com/ideas/systems-smart-enough-to-know-theyre-not-smart-enough.html
======
firefoxd
One problem is that we only talk about the victories and never educate the end
users who are supposed to use these systems.

One great example was Google Duplex. Despite the impressive demo, people had a
hard time recreating that success [1].

When a product is promoted as perfect, people start to believe it is perfect.
I wrote an article about Duolingo saying that you need a lot more than that
to learn Spanish. Google displays a snippet that says it is pointless and
points to my blog. I did use the word "pointless", but in a completely
different context. Now I get angry comments and threatening emails from
hardcore fans. (Google whether you can learn Spanish with Duolingo; if you're
lucky, you'll see it.)

Arguments are settled when your point is displayed as a snippet on Google. The
problem is, the counter-argument can also be displayed with the right word
combination.

How do we tell the masses to go a little beyond the snippets?

[1]: https://www.vanityfair.com/news/2018/05/uh-did-google-fake-its-big-ai-demo

~~~
mistermann
> Now I get angry comments and threatening emails from hardcore fans.

Which is another example of a system (the human mind) not being smart enough
to know it isn't smart enough.

FTA:

> Google’s Featured Snippets Are Worse Than Fake News, writes Adrianne
> Jeffries, pointing out the downsides of Google’s efforts to provide what
> Danny Sullivan calls the “one true answer” as fast as possible. About 15% of
> Google searches offer a featured snippet, that text excerpt that shows up
> inside a big bold box at the top of the results. It’s presented as the
> answer to your question. “Unfortunately, not all of these answers are
> actually true,” Jeffries writes.

> You know, like this one:

> "Barack Hussein Obama is implementing Alinsky’s rules on a much wider scale.
> He is using these rules to divide America so that he can cause widespread
> chaos and panic so he can declare martial war. _This is his plan_."

I suspect most people here have encountered (in person, or at least online) a
person who believes such things, and shaken their heads at how silly and
illogical it is. But if you change the topic from Obama to something
involving Trump, a lot of the time _the very same phenomenon_ (" _This is(!)
his plan/intent/desire_ ") will manifest, often within the very same people
who scoff at those who would believe the Obama narrative. And if you call
them on it, the magical _post-hoc rationalization_ [1] process in the brain
will jump into action, manufacturing _in real time_ an elaborate,
not-entirely-logic-based _narrative_ that justifies System 1's _objectively
incorrect_ prediction...which is what I suspect also prevents people from
realizing what has just happened. And if you challenge that narrative, a
variety of other very interesting _and predictable_ behaviors will manifest
(which is incredibly easy to see in individuals within one's out-group, but
incredibly hard to see in one's own behavior, _because of the innate(!)
post-hoc rationalization capabilities of the mind_ ).

This is what I refer to above when I say "the human mind is not smart enough
to know it isn't smart enough" - this is simply how the human mind has
evolved to work. Using Daniel Kahneman's terminology from "Thinking, Fast and
Slow" [2], this is an illustration of System 1 providing instantaneous
(but not necessarily correct) answers to questions, and System 2 (the slow but
more accurate _conscious_ mind) not intercepting and correcting the prediction
provided by System 1. _Everyone_ falls victim to this (yes, even us geniuses
on HN), no matter how intelligent and educated they are.

> How do we tell the masses to go a little beyond the snippets?

We could tell them to "just" be enlightened, like we tell people to "just" do
a whole bunch of other things (be logical, intelligent, kind, understanding,
"properly" informed and hold the "right" beliefs, etc). (In case it's not
obvious, I say this part with tongue firmly in cheek.)

[1] https://www.patheos.com/blogs/tippling/2013/11/14/post-hoc-rationalisation-reasoning-our-intuition-and-changing-our-minds/

[2] https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow

EDIT: A good and important example where this behavior can be easily observed
is in the climate change debate - particularly online discussions, and
_particularly_ (and _seemingly_ counter-intuitively, see my "protected"
comment here[3]) among more intelligent and informed people. As usual, it is
incredibly easy to see ( _and imagine(!)_ ) the flaws in other people's
thinking on the topic, but good luck seeing flaws in your own. This is why I
keep saying the _root problem_ in the climate change debate is not one of
climate science (education and understanding), but of psychology/neurology.
And once again, this idea seems to be very unpopular and uninteresting,
_especially_ to those who care deeply about the problem.

[3] https://news.ycombinator.com/item?id=23093016

~~~
MereInterest
I think one of my favorite errors by Google's featured snippets was when
Google stated quite definitively that Mercury performed a gravity assist using
the planet Venus. It was wrong to a laughable degree. Reading the article
itself, that sentence referred to the Mariner 10 probe performing a gravity
assist around Venus en route to Mercury.

------
GuiA
This is really terrifying in the context of machine translation. I've seen a
few forums (I think they were Discourse installs?) configured to automatically
translate posts not in your browser language. It's very hard to tell that a
post wasn't originally written in English unless you look closely for the tiny
light-gray-on-white-background notice.

Machine translation, to put it mildly, is utterly hopeless at context and
nuance. I've had Google spit out very racist/sexist results from innocent
source text. The fact that it's so confident about what it shows, and that the
engineering teams working on these products have very little modesty, means
that people treat it as a source of truth.

A future where you have no clue if what you are reading is something written
by a human as-is, or went through machine translation and might have nothing
to do with what the original author intended, does not put me at ease.

~~~
heavenlyblue
I would say Google’s answers answer about 10% of the questions they claim to
answer.

I have seen so many times Google simply extracting the wrong paragraph of the
text, one saying exactly the opposite of the gist; it’s amazing.

The worst thing is that there’s no way they can measure the effectiveness of
these answers directly (i.e. they can only measure how many times an answer
was clicked versus not clicked).

By the time users realise Google has outright lied to them with an answer,
the team that shipped these answer widgets has already cashed out from
shipping that project.

------
gitgud
Seems similar to the way a programming language might handle errors. A
function might handle these error cases, but if not, it will bubble/throw the
error up to the parent to deal with. In other words, the function might be
thinking: "I'm not smart enough to handle this..."

It would probably be a good idea to compose AI bots like this. Different
levels of bots, each bubbling up questions they're _not smart enough_ to
answer to higher level bots...
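
Roughly, in Python (a toy sketch; the bot names, the canned FAQ data, and the
escalation order are all invented for illustration):

    class NotSmartEnough(Exception):
        """Raised when a bot decides it isn't smart enough to answer."""

    class FAQBot:
        # Hypothetical canned answers; a real bot would be a model or a service.
        answers = {"password": "Use the 'forgot password' link."}

        def ask(self, question):
            for key, answer in self.answers.items():
                if key in question.lower():
                    return answer
            # "I'm not smart enough to handle this..." -- bubble it up.
            raise NotSmartEnough(question)

    class EscalatingBot:
        """Chains bots: each unanswered question bubbles up to the next level."""

        def __init__(self, *levels):
            self.levels = levels

        def ask(self, question):
            for bot in self.levels:
                try:
                    return bot.ask(question)
                except NotSmartEnough:
                    continue  # let the next (higher) level try
            return "Escalating to a human."

    helpdesk = EscalatingBot(FAQBot())
    print(helpdesk.ask("How do I reset my password?"))  # answered at the low level
    print(helpdesk.ask("Why is the sky green today?"))  # bubbles all the way up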

~~~
derefr
The more-common opposite approach to AI (with a similar end result)—often seen
in robotics—is called a _subsumption architecture_: you have naive low-level
"mechanism" models making the mundane non-edge-case decisions, with no
awareness of the edge-cases; and then you have smarter case-specific "policy"
models, trained to recognize when the conditions apply for some particular
policy to be enforced, whose job is to change/override the decision the low-
level agent makes, by MITMing ("subsuming") the lower-level model. Either they
stand in the way of the low-level agent's input, and control it by _biasing_
said input (i.e. lying to it, so it'll make the decision the high-level agent
wants); or they stand in the way of the low-level agent's output, and replace
it with their own output instead, when necessary.

This architecture is particularly convenient when the low-level models have
been constructed such that they give wacky results when asked out-of-domain
questions. Rather than training the low-level model to respond to out-of-
domain input in a sensible way (which might require far more training data,
and decrease robustness for in-domain input), you can keep the low-level model
unaware of the edges of the domain, and instead just add a high-level
"validator" model whose _only_ job is to recognize input that would be invalid
for the low-level model, and answer in its stead, replacing the low-level
model's noise answer with its own.
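
A minimal sketch of that split, assuming a toy one-dimensional "mechanism" and
an invented domain check (nothing here comes from a real system):

    def low_level_model(x):
        # Naive "mechanism": fits the mundane cases, and will happily
        # extrapolate nonsense for inputs far outside what it was built for.
        return 2.0 * x + 1.0

    def in_domain(x, lo=0.0, hi=10.0):
        # High-level "validator": its only job is recognizing inputs the
        # low-level model shouldn't be trusted on.
        return lo <= x <= hi

    def subsumed(x):
        if not in_domain(x):
            return None          # the validator answers in the model's stead
        return low_level_model(x)

    print(subsumed(3.0))   # 7.0  -- in-domain, the mechanism answers
    print(subsumed(1e6))   # None -- the validator replaces the noise answer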

In an imperative PL sense, this is less like regular exception handling (where
the lower-level code _knows_ it doesn't know how to handle something, and so
explicitly bubbles up the error), and more like the higher-level code either
checking/modifying input in advance of passing it to the lower-level code; or,
in fact, _telling_ the low-level code how to handle any errors that crop up.
As if the low-level code just had a bald try{} block; and the determination of
what that try{} block tries to catch, would be up to the callers below the
block on the call stack, with the topmost caller getting the final decision on
whether to continue, recover, abort, etc. right at the moment of the initial
error, before ever returning its answer to its immediate parent. (A lot like
Lisp conditions, actually.)
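
Something in that spirit can be sketched in Python (a loose approximation of
Lisp-style conditions; the handler machinery here is invented for
illustration):

    import contextlib

    _handlers = []  # policies installed by callers; innermost last

    @contextlib.contextmanager
    def handler(fn):
        # A caller installs a policy for errors it will never directly see.
        _handlers.append(fn)
        try:
            yield
        finally:
            _handlers.pop()

    def signal(error):
        # Low-level code asks the installed policies what to do, right at the
        # moment of the error, instead of unwinding the stack first.
        for fn in reversed(_handlers):
            decision = fn(error)
            if decision is not None:
                return decision   # continue with the recovery value
        raise error               # nobody installed a policy for this

    def parse_number(s):
        try:
            return float(s)
        except ValueError as e:
            # A "bald" try block: what happens here is decided by the callers.
            return signal(e)

    with handler(lambda e: 0.0):  # the caller's decision, applied at the error site
        print([parse_number(s) for s in ["1.5", "oops", "2"]])  # [1.5, 0.0, 2.0]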

~~~
joe_the_user
" _...you have naive low-level "mechanism" models making the mundane non-edge-
case decisions, with no awareness of the edge-cases; and then you have smarter
case-specific "policy" models, trained to recognize when the conditions apply
for some particular policy to be enforced, whose job is to change/override the
decision the low-level agent makes, by MITMing ("subsuming") the lower-level
model._"

It seems like handling the edge-cases at "the top" has the inherent problem
that in a complex system, the edge-cases will grow faster than the normal
cases. Moreover, the edge-cases are more or less the sum of what the normal
approximators can't handle, so "the top" has to be aware of every lower part
and compensate for it, meaning "the top" will grow in an unmanageable way.

Also, a higher function telling a lower function how to handle errors isn't
quite following the handle-errors-at-the-top approach, since it implies
lower-level systems do have error handlers.

~~~
derefr
> this implies lower level systems do have error handlers

I'd picture it like the lower-level system having a try{} block (with no
corresponding catch clauses), the postdominant block of which would be where
you'd resume from if there'd been an error within that block that got caught
and resolved. The mechanism knows that the policy needs a clean "cut point"
(in AOP terms) to subsume it by; but the mechanism _doesn't_ need to know what
the policy is going to _do_ at that cut point.

> Moreover, the edge-cases are more or less the sum of what the normal
> approximators can't handle, so "the top" has to be aware of every lower part
> and compensate for it

The normal case isn't having A, B, C, and D all overriding E's behavior at
different scopes. It's _possible_ and _useful_ to have that capability, but
it'd be uncommon to use it. More common with subsumption would be that {B, C,
D}—all siblings—override A's behavior in complementary ways for different
edge-cases; and then {E, F, G}—all siblings—might override _B's_ behavior,
including overriding _the inputs it uses to compute its override values for
A_, but _not_ including the _logic_ it uses to compute that override.

Or, in short: you subsume your "direct reports" by changing their _goals_ and
modifying their _work products_; but you don't micromanage down the org-chart.
(Except when micromanaging gets you something.)

Also keep in mind, we're not talking about a tree with one central top-level
agent. Robotics systems don't work like that, for several good reasons. It's
better viewed upside-down: each low-level agent is the root of a tree of
patches to its behavior, where 1 gets overridden by 2 and 3, 2 gets overridden
by {4, 5, 6}, etc.

Notice that a node like 5, that determines how to bias the input into 2, isn't
modelling the behavior of 1; it's just trying to directly influence 2 on 2's
own terms, with everything below 2 being a black box.

The full system is, then, a forest of agents, where all agents are sourcing
shared inputs and are capable of observing any other tree-node's outputs; but
there are no agents _shared_ between different trees in the forest, only
distinct per-tree copies with similar functions, trying to do the same job
while embedded in different local environments.

A good example of subsumption—without any Machine Learning to get in the
way—is how a subway system works:

1. Each wheel has a low-level agent in its motor firmware that "wants" to
calibrate power output so that the train will continue along the track at a
constant speed. It is built as a simple control system: it can see the train's
speed, and it will provide more or less power if the train is going slower or
faster than its reference. (As well, there's another low-level control system
attached to the brakes, which _also_ observes the train's current speed and
set-point speed, and will engage the brakes as long as the current speed is
much higher than the reference speed.)

2. There's a higher-level agent, still one per motor, responsible for not
colliding with things on the track (e.g. other trains.) It can observe when it
enters contested sections of track (and, more recently, can read off a front-
facing depth-sensor); and will _bias_ the reference-point values for the
lower-level wheel-motor and brake systems, so that they "seek to stop" more
fervently, the closer the train gets to any obstacle. You can sort of see this
like an agent that stands between a gas pedal in a car and the actual signal
to the gearbox; but rather than influencing the signal's output for
everything, it's an individualized influence for each wheel. (This is
important in robotic systems, because hardware is built to varying
tolerances—different wheels on the same train can require different amounts of
power to generate the same torque!)

3. There's a higher-level agent still—again, still one per motor!—responsible
for stopping at stations, which does this by biasing the _collision_ agent to
temporarily think there's another train just far-enough away that it must stop
where the station is in order to avoid it.

Note how the stopping-at-stations agent doesn't know how the collision-
avoidance agent actually does collision avoidance. It doesn't know it's
overriding a wheel-power agent further down. It just "delegates" the
requirement, by changing the requirements for its subsumed subordinate agent.

And, yes, the stopping-at-stations agent wouldn't get a very good output
without some careful fine-tuning of where to produce the illusory train in
the collision-detection agent's input model. That's why ML _is_ used with
these systems: to tune inputs like that, learning what parameter the
upper-level agent can emit to best make the lower-level agent do what it needs
in order to accomplish the upper agent's own goal, without ever needing to
understand how the lower-level agent accomplishes that goal—only the ability
to observe how closely the train manages to line up with the station.
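
A toy version of those three layers (all gains, speeds, and distances are made
up to show the shape of the architecture, not real train control):

    def wheel_controller(current_speed, reference_speed, gain=0.5):
        # Lowest level: a simple proportional controller that only knows
        # "more power if below the reference, less (or brake) if above".
        return gain * (reference_speed - current_speed)   # power delta

    def collision_agent(reference_speed, distance_to_obstacle, safe_distance=200.0):
        # Middle level: biases the wheel controller's reference speed downward
        # as the train gets closer to anything on the track.
        scale = min(1.0, max(0.0, distance_to_obstacle / safe_distance))
        return reference_speed * scale

    def station_agent(distance_to_obstacle, distance_to_station, stopping=True):
        # Top level: doesn't know how collision avoidance works; it just feeds
        # the collision agent a phantom obstacle where the platform is.
        if stopping:
            return min(distance_to_obstacle, distance_to_station)
        return distance_to_obstacle

    # Approaching a station with clear track ahead:
    obstacle = station_agent(distance_to_obstacle=5000.0, distance_to_station=120.0)
    reference = collision_agent(reference_speed=20.0, distance_to_obstacle=obstacle)
    power = wheel_controller(current_speed=20.0, reference_speed=reference)
    print(reference, power)   # reference drops to 12.0, power goes to -4.0 (brake)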

If you're wondering why it's all duplicated so much: consider how this system
reacts if there's a bug in one motor's sensors that makes it observe the wrong
input values. It'll start misbehaving as a result. Certainly, with some kind
of online-learning ML in play, the agent above it might realize it needs to
apply some sort of gamma curve to ramp the input/output from the agent to get
the results it expects. But even without ML, as long as the _other_ wheels are
all functioning, they'll _fight_ the misbehaving wheel, and they'll likely
_win_. I.e., even if one wheel-motor is going full tilt because it thinks the
train is stopped, the other ones will notice that the train is actually going
too fast, and apply their brakes.

It is theorized that this is, in _some_ ways, a model for how synapses work,
especially for connections projecting between different parts of your brain
(e.g. your neocortex vs. paleocortex.)

~~~
im3w1l
> 3. There's a higher-level agent still—again, still one per
> motor!—responsible for stopping at stations, which does this by biasing the
> collision agent to temporarily think there's another train just far-enough
> away that it must stop where the station is in order to avoid it.

This sounds clever in the worst sense of the word. Introducing dependencies
and complexities for no reason. Want to increase safety margins from trains?
Suddenly it doesn't stop correctly at stations any more.

~~~
derefr
You don't change internal variables; you change top-level goal conditions (in
imperative programming terms, the "test suite"), and retrain the system, and
all the internal variables change together to find a new equilibrium. That's
kind of... how ML works. It's how brains _seem_ to work, too. (Think about the
order in which you learn to do something like typing. Think about what
"levels" of that learning that you need to practice over again, if you switch
to a different keyboard layout.)

Also, to be clear, this is _already what subway systems do_. And roombas. And
a bunch of other types of robots. It turns out to work better, _in practice_ ,
than the alternative (= networks of leaf-node controllers with individual top-
down-issued config parameters, like computer networks are.)

Although, they don't do it because it's an easy system to develop in.
Subsumption is _operationally_ robust—it continues to do the right thing in
the face of failure of components, or even "subverted" components producing
malicious input. This is a little important when you're a Mars rover whose
firmware chips could be hit by gamma rays; but it's even more important when
you're an organic lifeform with parasites constantly attempting to hijack your
nervous system to their own ends (e.g. the _Toxoplasma_ protozoan and the
_Cordyceps_ fungi, two cases where the host isn't strong enough to resist the
hijacking. In the 99.999% of other cases of parasitic onslaught that you don't
hear about, the host _does_ still manage to "do the right thing" despite the
parasite.)

~~~
im3w1l
I didn't mean it to be a critique of the concept of subsumption itself, only
your specific example. For instance, having the train avoidance system provide
a reference speed to the wheel controllers seems solid.

Having the station stopper create fake trains (or having the train avoider
create fake stations) does not seem robust. It would be better to have the
station stopper set the reference speed and give the train avoider the
authority to override it.

------
code4tee
Yes. This is applicable to most automation, machine learning, and AI. These
technologies are generally really bad at sense-checking their results.

At the end of the day, they pick whatever pattern is mathematically the “most
correct” match, even if it’s obviously not the correct answer. This is the
Achilles’ heel of these technologies; it is very hard to overcome, and it is
why real applications generally still keep a human in the loop.
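
A tiny illustration of that failure mode (the scores are invented): argmax
always returns _some_ label, with no sense check on how badly everything fits:

    # The classifier picks whatever label scores highest -- the mathematically
    # "most correct" pattern -- even when nothing actually fits the input.
    scores = {"cat": 0.04, "dog": 0.03, "car": 0.02}
    best = max(scores, key=scores.get)
    print(best)                 # 'cat' -- an answer is produced regardless
    print(scores[best] > 0.5)   # False -- the check a human in the loop provides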

See Facebook’s “chatbot” experiment, which they called off after realizing the
only way to make it work in practice was to have an army of humans behind the
scenes sense-checking answers. The grand AI engine that takes over for humans
is still a pipe dream for most applications.

Even all these “neutrality flags” are generally nothing more than a keyword
search. Put words like COVID-19 or Coronavirus in your post and Medium puts a
banner at the top saying the article hasn’t been fact checked.

------
saagarjha
One of the problems with automation reporting how sure it is of its answer is
that sometimes it really has no clue even about how confident it is. It'll
confidently say something is a dog and it's actually a table with some pixels
tweaked slightly. It's not really an easy problem to solve.

~~~
miscPerson
Okay — but is the AI overconfident in choices more or less often than a human?

------
tragomaskhalos
Google's inability to spot the negative implicit in "Shirt without stripes"
was posted here a couple of weeks ago:
https://news.ycombinator.com/item?id=22925087

------
blaser-waffle
They're vocalized search engines, not knowledge builders. Naturally, they spit
out what they see as the top search result, or something repeated often.

The "How Barack Hussein Obama is going to declare Martial Law" example is
great.

~~~
bryanrasmussen
It amused me that if you posted an article called "How Barack Hussein Obama is
going to declare Martial Law" to HN, the title would be rewritten to "Barack
Hussein Obama is going to declare Martial Law".

The amusement still applies with the updated President, of course.

------
dang
Discussed back then:
https://news.ycombinator.com/item?id=13866493

------
nurettin
Which search engine will understand sarcasm first? Which one will separate
confidence from facts? These are the things that should have been focused on
decades ago.

------
dorusr
There is no algorithm of truth, and with enough data you can arrive at any
conclusion you like. Maybe people should be better educated, instead of
machines becoming smarter.

~~~
tomxor
This is about _confidence_, not truth. Confidence is built into ML; it's how
it works... The problem is that these solutions currently just quantize that
output into a single result and obscure how close it was to the threshold of
another possible answer.
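
A small sketch of what gets obscured (the logits are made up for
illustration): the product shows only the top answer, while the underlying
distribution says the model was nearly torn between two of them:

    import math

    def softmax(logits):
        exps = [math.exp(v) for v in logits]
        total = sum(exps)
        return [v / total for v in exps]

    labels = ["dog", "wolf", "table"]
    probs = softmax([2.10, 2.05, 0.10])   # nearly a tie between the first two

    top = max(range(len(labels)), key=lambda i: probs[i])
    runner_up = sorted(probs, reverse=True)[1]

    print(labels[top])                       # what gets shown: just 'dog'
    print(round(probs[top], 2))              # ~0.48 -- far from certain
    print(round(probs[top] - runner_up, 2))  # ~0.02 -- the margin the result hides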

~~~
yters
Confidence is a function of truth.

