
A conversation with Sussman on AI and asynchronous programming - paroneayea
http://dustycloud.org/blog/sussman-on-ai/
======
themgt
_" How am I talking to you right now?" Sussman asked. Sussman seemed to be
talking about the shared symbolic values being held in the conversation, and
at this point I started to understand. "Sure, when you're running the program,
that whole thing is a black box. So is your brain. But you can explain to me
the reasoning of why you did something. At that point, being able to inspect
the symbolic reasoning of the system is all you have." And, Sussman explained,
the propagator model carries its symbolic reasoning along with it._

I dunno. Are we talking about true "hard AI"? Because really, how much can
Sussman or anyone introspect their own thought processes? We can tell
ourselves stories about motivation and symbol manipulation, but those stories
themselves are just more symbol manipulation. I'm not sure a sentient symbol-
manipulator ever really "gets to the bottom" of true introspection of their
own thought processes, including counterfactual thoughts they might have had.

This seems like an overly mechanistic approach, although perhaps workable in
"soft AI".

~~~
panic
You can usually come up with an explanation for why you did something. You
don't have to "get to the bottom" of your own thought processes to do this:
you just need to be able to reconstruct the symbolic manipulation part. This
seems like a good thing for an AI to be able to do -- _especially_ a truly
"hard" AI that you'd trust to run things at a high level.

~~~
TeMPOraL
Humans often come up with an explanation for why they did something that's
simple, consistent and also completely untrue. They sometimes even believe
their own explanations. We may run into the same problem with an AI here - it
may be, purposefully or not, providing us with untrue explanations.

~~~
neltnerb
I, for one, think that would be a much more fascinating problem than
implementing neural nets where the representation of the underlying data is
heavily obscured.

~~~
TeMPOraL
If only that were possible. The only models for reasoning powerful enough that
come to my mind are Bayesian networks, and those suffer from the same problems
as neural nets: nodes, edges and values may not correspond to any meaningful
symbolic content that would be useful to report. That again seems in line with
the human mind; we invent symbols to describe groups of similar concepts, with
borders that are naturally fuzzy and fluid.

If the AI tells us: "I did it because #:G042 and #:G4285 belong to the same
#:G3346 #:G4216, while #:G1556 and #:G48592 #:G4499 #:G22461 #:G48118", I
don't think we'll have learned anything useful.

------
kcorbitt
There definitely exist "accountable" AI models. Things like decision trees and
various types of regressions.

The thing is though... any sufficiently advanced AI is going to be
unaccountable pretty much by definition. It's like, calculus is an extremely
useful tool for predicting the temperature of a cooling object over time, but
good luck explaining to a 3-year-old how to perform the necessary maths. The
fact that they can't comprehend it doesn't mean that calculus isn't useful, it
just means that it's beyond the 3-year-old's ability to intuitively grasp. In
this simile, we're the 3-year-olds. :)

~~~
xzephyr
Seconded. Another extreme example would be human brains, which I don't think
we understand well enough to call "accountable" in any mathematical sense, yet
we trust them to make complex decisions. Statistical characterization of the
behavior of an AI system is a better objective than ones based on inherently
biased symbolic systems. Just because that's how humans communicate doesn't
mean it's the best way to do it.

------
kruhft
Whenever I'm out of ideas I always end up going back to the Propagator/Logic
Simulator chapters of SICP and rewriting them hoping to glean a bit more about
the process and usability of such an abstraction.

I usually end up writing a Propagating logic simulator (combining the
chapters) and marveling at the working forward/backward ripple carry adder.
This time I've derived all the way down to switches which infer unknown values
based on their current inputs and used them to successfully create the
reversible logic functions required by the Propagator model.

Fascinating stuff and highly recommended reading for anybody interested in
alternative models of computation.

~~~
davexunit
I took that chapter and went in a different direction, making the network a
DAG and building a functional reactive programming environment out of it.

Now, to go back to the propagator network and read the Scheme-Propagators
source.

------
nalaginrut
When you put everything into a black box and pray for a promising result, the
whole methodology hits a dead end, because it doesn't reveal the truth; it's
voodoo magic and self-deception. Nowadays, people just pour big data into a
black box to make "AI". We need an AI system to tell us why, not just how.
Otherwise it's not an AI helping you; you've made yourself a Lord to give you
orders, and you never get the chance to ask it why.

Symbolic AI could be a chance, because it keeps the necessary meta-information
to reveal its reasoning. But how to optimize it properly will be a problem.

------
rurban
This accountability argument is stone-old. You'll find it in any intro AI
book. Of course only when implemented in Lisp, so it needs to be an older
book.

It's the typical argument against neural nets: because they cannot explain
their chain of reasoning, you are not able to train them the right way, or
keep them from training in the wrong direction. When something goes wrong,
you've got a problem.

Old AI had the same problem; that's why they added the chain of reasoning to
the backtracking, to give better answers and let you introspect them.

~~~
paroneayea
Maybe true, but "stone-old" ideas aren't necessarily "bad ideas". Neural
networks were "stone-old" until all this big data stuff went crazy, and now
suddenly they're on the frontpage of Hacker News all the time again and people
think they're the New Hotness (TM). Similarly with many long-forgotten
functional programming techniques now being talked about as new: much of it is
refinements of old ideas, finally hitting the mainstream.

The AI Winter is thawing... but whose lap will it fall into after the defrost?
Big data mining organizations, or everyday hackers, or....?

~~~
rspeer
My dream would be if Good Old-Fashioned AI (the symbolic, explainable kind
Sussman is interested in) were to have the sudden redemption that neural nets
had.

It was easier for neural nets, though, because they were close to a previously
successful AI mechanism (machine learning with logistic regression), it's just
that we had spent decades talking about them with different words for no good
reason. There's a much larger gulf between GOFAI and what we do now.

~~~
marcosdumay
Pattern matching has unbeatable performance benefits, at least for our current
computers.

If you go deep into numerical calculus, you'll see that our computers are much
better suited to working with continuous smooth functions than with discrete
noise-like data. So all the power goes to the people who turn their knowledge
into a smaller set of interpolated curves. (And yes, I think that's very
counterintuitive.)

Anyway, I'm not convinced this is a fundamental property of CS. It seems at
least possible that different architectures could have different constraints
and make symbolic AI viable. But those would need to be very different
architectures, not based on von Neumann's or Turing's ideas at all.

------
TeMPOraL
Symbolic AI seems to me an inherently wrong concept.

> _A more contrived relation to this in real life that I've been thinking
> about: if a child knocks over a vase, you might be angry at them, and they
> might have done the wrong thing. But why did they do it? If a child can
> explain to you that they knew you were afraid of insects, and swung at a fly
> going by, that can help you debug that social circumstance so you and the
> child can work together towards better behavior in the future._

Maybe. What if the real reason was a misconfigured Bayesian network in the
child's head that mistakenly assigned high probability to seeing a fly when
there was only a dust speck, combined with an overreactive heuristic that made
them start waving their hands around? Or something? There may not be a reason
that can be specified in symbols relevant to anything else.

In general, a rational reasoner modelling a probability network would make a
decision because all the items upstream added up to a particular distribution
downstream. There are no symbols involved in such computation, and it would be
strange to suddenly include them now.

Also, if you really sit down and do some serious introspecting, you'll realize
that any symbols we assign to explain our thoughts are arbitrary and after-
the-fact. Brains don't run on symbols; they generate them later for
communication.

------
Animats
That's interesting. It takes me back to the '80s, when the logic-based AI
people were in charge.

The "propagator" paper would be more interesting if there was a use case in
the paper. The basic idea is to have a graph of computational elements, but
insist that the elements be commutative, associative, and idempotent. Under
those restrictions, you get the same answer despite race conditions, which
leads to a simple model of asynchronous programming. But what is this good
for? I could sort of see a theorem prover that works like this. Beyond logic
problems, it's harder to find an application.
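
The basic idea can be sketched in a few lines (my own illustration, not code from the paper; the `Cell` class and interval representation are assumptions). Because the merge operation here, interval intersection, is commutative, associative, and idempotent, every arrival order of the updates produces the same final contents:

```python
# Sketch of an order-independent propagator cell: merging by interval
# intersection is commutative, associative, and idempotent, so the
# result is the same no matter what order updates arrive in.
from itertools import permutations

class Cell:
    def __init__(self):
        # (-inf, inf) means "no information yet"
        self.lo, self.hi = float("-inf"), float("inf")

    def add_info(self, lo, hi):
        # Merge by intersecting: a narrower interval = more information.
        self.lo, self.hi = max(self.lo, lo), min(self.hi, hi)

updates = [(0, 10), (2, 8), (1, 5)]
results = set()
for order in permutations(updates):
    c = Cell()
    for lo, hi in order:
        c.add_info(lo, hi)
    results.add((c.lo, c.hi))

print(results)  # a single element: every ordering yields the same interval
```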

This might be a good way to implement eventually-consistent systems. Those are
hard to get right, and having to build them out of elements which have the
propagator components might help. Not sure this is possible, but it's worth a
try if you have to solve that problem.

~~~
edwardkmett
Without the propagators themselves being monotone they don't have enough to
ensure determinism. They also don't have a guarantee of termination. e.g.
repeatedly taking Heron steps on a rational interval approximation to compute
the sqrt of 2 will continue to ascend the lattice indefinitely, as it isn't
complete.
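
A quick sketch of the termination problem (my own illustration, under the assumption that a naive propagator fires whenever a cell's value changes): Heron's iteration x' = (x + 2/x) / 2 over exact rationals strictly refines its approximation to sqrt(2) at every step, so the cell's value never stops changing and the network never quiesces.

```python
# Heron's iteration for sqrt(2) over exact rationals: each step is a
# strict refinement (sqrt(2) is irrational, so there is no rational
# fixed point), illustrating a propagator that never quiesces.
from fractions import Fraction

x = Fraction(2)
for step in range(6):
    nxt = (x + Fraction(2) / x) / 2
    assert nxt != x            # always strictly new information
    x = nxt
    print(step, x)             # denominators grow without bound: 2, 12, 408, ...
```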

That said, they have many of the elements of the solution. With monotonicity
this starts to resemble the more recent work on Lasp, which is being used
explicitly to tackle the domain you mention, (strong) eventual consistency, or
Kuper's work on LVars, where she drops the idempotent condition fairly early
on in the thesis to get closer to CmRDTs, but which then burdens reads in a
way that makes them automatically monotone.

------
hyperion2010
Interesting, but I see one major problem with AI decision provenance being
cases that mirror "Who sunk the boat" and "the straw that broke the camel's
back". When you are integrating loads of sensory inputs to make a decision
even if you have the best provenance, when you have any system based on a
threshold you will encounter these cases where the information contained
within the system itself is insufficient to reconstruct the cause of a
failure. These are exceptionally hard problems and often require recognizing
that we don't need to know the cause of something for certain but we need to
identify the larger context which allowed that state to be arrived at in the
first place so that we can prevent it in the future.

------
edwardkmett
Talk about timely, I recently took the opportunity to unmothball the old
propagators idea and have been running a bunch of ideas past Sussman's former
student Alexey Radul on a fairly constant basis for the last 3 weeks.

I started a project at
[https://github.com/ekmett/propagators](https://github.com/ekmett/propagators)
which at least gets the basic execution of them right, and have been working
on a larger project.

Now, Sussman and Radul manage propagation and track provenance through using
an assumption-based truth management system. This unfortunately results in a
2^n blowup in space, but if you change the schema somewhat you can treat it
more like enumerating solutions in SAT, where you get a blowup, but in time
not space -- and as usual with SAT "it usually isn't that bad".

In any event, a lot of work out there overlaps with the propagators story:

Lindsay Kuper's work on LVars drops idempotence and doesn't deal with
'triggering on change' but gets a parallel computation with forking and writes
to "lattice variables" and uses threshold reads to manage communication.

Sussman and Radul's work really needs the notion of the propagators themselves
being monotone functions between the semilattices that they are working with.
Unfortunately, in neither the paper nor Radul's thesis do they ever actually
spell that out.

Once you do spell it out you get something like Christopher Meiklejohn's work
on Lasp. Both he and Lindsay Kuper have been tackling the issue of composing
CRDTs lately. Her notion of threshold reads works well here when it can be
applied because such functions are automatically monotone, there is nothing to
check.

On the other hand, we can view things like datalog as a propagator network.
You have tables which accumulate information monotonically. You have IDB
statements that execute joins between those lattice variables. Now this starts
to show some of the differences. There we don't bother sending the entire
lattice. In seminaive evaluation of datalog we send deltas. So it becomes
interesting instead to walk back the notion of a lattice variable and to think
instead of a commutative (possibly idempotent) monoid acting on an ordered
set. Now your "update language" is the monoid, and we can think of this in
terms of firing updates at a set and tracking what makes it through, and
propagating _that_. I've been playing with various ways to get a closer
analogue to datalog processing and to effectively track these partial
triggers.
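
The seminaive idea can be sketched concretely (a toy illustration of mine, not the Datalog engine itself): rather than re-sending the whole path relation each round, only the delta of newly derived facts is joined against the edge relation.

```python
# Seminaive evaluation of transitive closure: the "lattice variable"
# path only ever grows, but each round we propagate just the delta.
edges = {(1, 2), (2, 3), (3, 4)}

path = set(edges)          # accumulates monotonically
delta = set(edges)         # only the facts that are new this round
while delta:
    # Join the delta against edges: candidate new paths.
    new = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
    delta = new - path     # keep only genuinely new information
    path |= delta

print(sorted(path))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```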

Another issue is that propagator networks as they are currently designed
typically do too much work. Too many things are woken up and fired. We can
mitigate that somewhat. Consider a propagator that adds two lifted numbers. We
can also add a propagator backwards that does the subtraction, etc. This
yields a (local) propagator for which if you tell me any 2 of the arguments, I
can tell you the third.
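
A toy version of that local propagator (my own sketch, not the ekmett/propagators code; the cell representation is an assumption): cells hold either None for "unknown" or a value, and given any two of a, b, c with a + b = c, the propagator fills in the third.

```python
# Bidirectional (+) propagator: knowing any two of a, b, c with
# a + b = c determines the third, running forwards or backwards.
def add_propagator(cells):
    a, b, c = cells["a"], cells["b"], cells["c"]
    if a is not None and b is not None:
        cells["c"] = a + b      # forward:  c = a + b
    elif a is not None and c is not None:
        cells["b"] = c - a      # backward: b = c - a
    elif b is not None and c is not None:
        cells["a"] = c - b      # backward: a = c - b

cells = {"a": None, "b": 3, "c": 10}
add_propagator(cells)
print(cells["a"])  # 7
```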

A more efficient scheme would be to adopt a "two watched literal" scheme like
a SAT solver. Pick two lattice variables that are still at _|_. When one of
those is written to, check to see if you're down to 1 variable not at _|_. If
so we have to "wake up the propagator" and have it start listening to all of
its inputs. If not disconnect this input and start listening to a different
input. For 3 variables this isn't a big deal. For a propagator where you can
use 499 variables to determine the 500th? You can get some nice zChaff-like
speedups!

We can also use a 2 watched literal scheme to kill a propagator and garbage
collect it. If we use a covering relation to talk about 'immediate descendants
of a node' in our lattice, we can look for things covered by our _|_
contradiction node. These are 'maximal' entries in our lattice. If your input
is maximal you'll never get another update from that input. So if we take our
(+) node as an example, once two of the 3 inputs are maximal this propagator
has nothing more to say and can be removed from the network.

We can indicate that you don't want to say anything "new" to a lattice by
adjoining a "frozen" flag. You can think of it as the moral equivalent of
saying we'll tell you nothing new from here on out. This winds up being the
moral equivalent of Lindsay's quiescence detection.

Now we can look at a propagator network itself as a lattice in one sense, in
terms of adding propagators to the network as increasing information. Then
taking the ability to tell the network that it is frozen to give us
opportunities to topologically sort the network, in the same fashion as
stratified datalog. Now we can be more intelligent about firing our
propagators:

I can run the network in bottom up topological order, queuing updates from
SCCs in parallel. It gets a little tricky if we want the 'warm start' scheme
from 2 watched literals -- you need to make sure you don't miss updates if you
want to be able to do some more CmRDT-like cases rather than CvRDTs.

Finally, there are a lot of problems where we can view a propagator network
itself as something useful to implement a propagator.

Consider constraint propagation. We can view the classic AC-3 algorithm for
enforcing arc consistency as a propagator network. Once it is done _now_ we
have arc consistent domains to enumerate with our finite domain solver. Which
we can drive in the fashion I mentioned above to avoid the ATMS overhead.

On the other hand, a linear programming or mixed integer linear programming
solver can also give lattice like answers as you add cutting planes to the
model and monotonically increase the information present. In the MILP case we
typically loop over these considering relaxations which inform us of new cuts
we can make, etc.

Both of these are computations that _use_ lattice variables with propagators
between them to build a better propagator themselves.

They are also nicely both fairly 'convex' theories. You could run propagators
for constraint programming and MILP on the same variables and they can
mutually reinforce with nicer guarantees than the dumb (+) propagator that I
mentioned.

That one runs into issues when you go to do something as simple as y = x + x,
and try to run it backwards to get x, because it looks locally to the
propagator like 2 unknowns. We could of course try to globally rewrite the
propagator network by adding propagators to work around that; in this case it
works if you transform it into 2 * x, and now the 2 is known, as is the y,
and you can get to x. Here it is probably better to just borrow tools from the
clpqr crowd, but it requires an unfortunate amount of introspection on the
propagator network.

I have a couple of dozen other sub-domains that fit into this model.

I mostly find it interesting how all of these subtly different domains use
almost exactly the same tools with minor differences, and how powerful it is
to adapt those differences to other problems in a nearby domain.

~~~
nwf
You may be interested in reading, if you haven't, our 2011 position paper on
Dyna. (The 2012 paper on what I think you would call propagator networks was
fun, but is nothing you don't know already, I'm sure.)

[http://dyna.org/wiki/index.php/Publications](http://dyna.org/wiki/index.php/Publications)

~~~
edwardkmett
Indeed. Our old discussions about "omega-continuous semiring homomorphisms" as
the way to try to make something half-way between Dyna and the datalog bits I
was working on have been very much present in my mind lately. =)

------
tonyg
Propagators are closely related to CRDTs.

~~~
agumonkey
Unsurprisingly used in Lasp (distributed coordination),
[http://blog.acolyer.org/2015/08/17/lasp-a-language-for-distributed-coordination-free-programming/](http://blog.acolyer.org/2015/08/17/lasp-a-language-for-distributed-coordination-free-programming/)

back to sussman

------
amirouche
Symbolic AI has complexity/computability issues; I'm wondering how this is
solved with propagators.

~~~
rurban
It is even slower, but not by much. And with this information you can usually
fix logical problems to improve the AI.

------
ecthiender
Good read. Interesting ideas.

