A conversation with Sussman on AI and asynchronous programming (dustycloud.org)
181 points by paroneayea on Oct 14, 2015 | 34 comments

"How am I talking to you right now?" Sussman asked. Sussman seemed to be talking about the shared symbolic values being held in the conversation, and at this point I started to understand. "Sure, when you're running the program, that whole thing is a black box. So is your brain. But you can explain to me the reasoning of why you did something. At that point, being able to inspect the symbolic reasoning of the system is all you have." And, Sussman explained, the propagator model carries its symbolic reasoning along with it.

I dunno. Are we talking about true "hard AI"? Because really, how much can Sussman or anyone introspect their own thought processes? We can tell ourselves stories about motivation and symbol manipulation, but those stories themselves are just more symbol manipulation. I'm not sure a sentient symbol-manipulator ever really "gets to the bottom" of true introspection of their own thought processes, including counterfactual thoughts they might have had.

This seems like an overly mechanistic approach, although perhaps workable in "soft AI".

You can usually come up with an explanation for why you did something. You don't have to "get to the bottom" of your own thought processes to do this: you just need to be able to reconstruct the symbolic manipulation part. This seems like a good thing for an AI to be able to do -- especially a truly "hard" AI that you'd trust to run things at a high level.

Humans often come up with an explanation for why they did something that's simple, consistent and also completely untrue. They sometimes even believe their own explanations. We may run into the same problem with an AI here - it may be, purposefully or not, providing us with untrue explanations.

Indeed, people even offer explanations when tricked into believing they had chosen the opposite outcome to the one they actually picked [1].

[1] http://researchgate.net/publication/6745688

> participants fail to notice mismatches between their intended choice and the outcome they are presented with, while nevertheless offering introspectively derived reasons for why they chose the way they did

I, for one, think that would be a much more fascinating problem than implementing neural nets where the representation of the underlying data is heavily obscured.

If only it were possible. The only model of reasoning powerful enough that comes to my mind is bayesian networks, and those will suffer from the same problems as neural nets: nodes, edges and values may not be related to any meaningful symbolic content that would be useful to report. That again seems in line with the human mind; we invent symbols to describe groups of similar concepts, with borders that are naturally fuzzy and fluid.

If the AI tells us: "I did it because #:G042 and #:G4285 belong to the same #:G3346 #:G4216, while #:G1556 and #:G48592 #:G4499 #:G22461 #:G48118", I don't think we'll have learned anything useful.

My thoughts as well.

This is why I feel some people will never "let" AI happen: they want 100% certainty, and neural nets scare them, despite the fact that real, human intelligence carries no certainty. You either have intelligence, and creativity, and uncertainty, and accidents. Or you have computers and rigid logic.

The interesting thing there is, when a space rocket malfunctions, when a car wrecks, or the stock market goes crazy... and it's a computer involved, it's always human error at the end of it. And, thus, intelligence that caused the problem. Computers mindlessly do exactly the thing they were given to do.

> You either have intelligence, and creativity, and uncertainty, and accidents. Or you have computers and rigid logic.

Why does it have to be such a strict dichotomy? What are the specific tradeoffs being made between creativity and rigid logic? Is there really no way to pick something in the middle?

> The interesting thing there is, when a space rocket malfunctions, when a car wrecks, or the stock market goes crazy... and it's a computer involved, it's always human error at the end of it. And, thus, intelligence that caused the problem.

Individual humans aren't always at the end of it. Sometimes you can trace the problem further, to the incentives applied to the human by some larger organization, society or system. Are human organizations also intelligent?

I think we're still very far away from a definition of "intelligence" that would let us answer all these questions. It's not clear that current AI research is bringing us any closer to that definition.

My point wasn't to start a philosophical debate on the term "intelligence". My point was that the intelligence we recognize in humans (our only real guidepost at all, in terms of "AI") is closer to neural nets than to the symbolic and expert systems of the AI winter. And that human intelligence is fuzzy: you have to accept the good with the bad.

It seems more in line with justification-based/truth maintenance systems than hard AI.

There definitely exist "accountable" AI models. Things like decision trees and various types of regressions.

The thing is though... any sufficiently advanced AI is going to be unaccountable pretty much by definition. It's like, calculus is an extremely useful tool for predicting the temperature of a cooling object over time, but good luck explaining to a 3-year-old how to perform the necessary maths. The fact that they can't comprehend it doesn't mean that calculus isn't useful, it just means that it's beyond the 3-year-old's ability to intuitively grasp. In this simile we're the 3-year-olds. :)

Seconded. Another extreme example would be human brains, which I don't think we understand well enough for them to be "accountable" in any mathematical sense, yet we trust them to make complex decisions. Statistical characterization of an AI system's behavior is a better objective than ones based on inherently biased symbolic systems. Just because that's how humans communicate doesn't mean it's the best way to do it.

There could be an extremely simple algorithm for intelligence. Just because we haven't figured it out yet doesn't mean it's unknowable.

Whenever I'm out of ideas I always end up going back to the Propagator/Logic Simulator chapters of SICP and rewriting them hoping to glean a bit more about the process and usability of such an abstraction.

I usually end up writing a Propagating logic simulator (combining the chapters) and marveling at the working forward/backward ripple carry adder. This time I've derived all the way down to switches which infer unknown values based on their current inputs and used them to successfully create the reversible logic functions required by the Propagator model.
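A minimal sketch of the kind of value-inferring switch described above (my own toy code, in Python rather than the SICP Scheme): an XOR constraint over three terminals where any two known values determine the third, so the same gate runs forwards or backwards.

```python
def xor_propagate(a, b, out):
    """Given any two known terminals of an XOR gate (None = unknown),
    infer the third; returns the completed (a, b, out) triple."""
    if a is not None and b is not None:
        return a, b, a ^ b
    if a is not None and out is not None:
        return a, a ^ out, out        # b = a xor out (xor is self-inverse)
    if b is not None and out is not None:
        return b ^ out, b, out
    return a, b, out                  # not enough information yet

print(xor_propagate(1, None, 0))      # (1, 1, 0) -- the gate runs "backwards"
```

Wiring constraints like this into a network of cells is what makes the forward/backward ripple-carry adder possible: the carry chain can be solved in either direction.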

Fascinating stuff and highly recommended reading for anybody interested in alternative models of computation.

I took that chapter and went in a different direction, making the network a DAG and building a functional reactive programming environment out of it.

Now, to go back to the propagator network and read the Scheme-Propagators source.

When you put everything into a black box and pray for a promising result, the whole methodology heads for a dead end, because it's not revealing truth; it's voodoo magic and self-deception. Nowadays people just pour big data into a black box and call it "AI". We need the AI system to tell us why, not just how. Otherwise it's not an AI helping you: you've made yourself a lord to give you orders, and you never get the chance to ask it why.

Symbolic AI could be a chance, because it keeps the meta-information necessary to reveal its reasoning. But how to optimize it properly will be a problem.

This accountability argument is stone-old. You'll find it in an intro AI book; of course only when implemented in Lisp, so it needs to be an older book.

It's the typical argument against neural nets: because they cannot explain their chain of reasoning, you are not able to train them the right way, or to avoid training them in the wrong direction. When something goes wrong, you've got a problem.

Old AI had the same problem; that's why they added the reasoning chain to the backtracking, to give better answers and let you introspect them.

Maybe true, but "stone-old" ideas don't mean "bad ideas". Neural networks were "stone old" until all this big data stuff went crazy and now suddenly they're on the frontpage of hacker news all the time again and people think they're the New Hotness (TM). Similar with many long-forgotten functional programming techniques which are being talked about as new stuff, much of it is refinements of old ideas, but finally hitting the mainstream.

The AI Winter is thawing... but whose lap will it fall into after the defrost? Big data mining organizations, or everyday hackers, or....?

My dream would be if Good Old-Fashioned AI (the symbolic, explainable kind Sussman is interested in) were to have the sudden redemption that neural nets had.

It was easier for neural nets, though, because they were close to a previously successful AI mechanism (machine learning with logistic regression), it's just that we had spent decades talking about them with different words for no good reason. There's a much larger gulf between GOFAI and what we do now.

Pattern matching has unbeatable performance benefits, at least for our current computers.

If you go deep into numerical computing, you'll see that our computers are much better suited to working with continuous smooth functions than with discrete, noise-like data. So all the power goes to the people who can turn their knowledge into a small set of interpolated curves. (And yes, I think that's very counterintuitive.)

Anyway, I'm not convinced this is a fundamental property of CS. It seems at least possible that different architectures could have different constraints and make symbolic AI viable. But those would need to be very different architectures, not based on von Neumann's or Turing's ideas at all.

When I mean "stone old" I mean of corse better than todays ideas. This should be clear from the context. Not everything which is fast is also good.

You can also come up with a kind of Greenspun's tenth rule applied to neural nets:

"Any sufficiently complicated modern AI program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

In this case not even that.

Symbolic AI seems to me inherently a wrong concept.

> A more contrived relation to this in real life that I've been thinking about: if a child knocks over a vase, you might be angry at them, and they might have done the wrong thing. But why did they do it? If a child can explain to you that they knew you were afraid of insects, and swung at a fly going by, that can help you debug that social circumstance so you and the child can work together towards better behavior in the future.

Maybe. What if the real reason was a misconfigured bayesian network in the child's head that mistakenly assigned high probability to seeing a fly when there was only a dust speck, combined with an overreactive heuristic that made them start waving their hands around? Or something? There may not be a reason that can be specified in symbols relevant to anything else.

In general, a rational reasoner modelling a probability network would make a decision because all the items upstream added up to a particular distribution downstream. There are no symbols involved in such computation, and it would be strange to suddenly include them now.

Also, if you really sit down and do some serious introspection, you'll realize that any symbols we assign to explain our thoughts are arbitrary and after-the-fact. Brains don't run on symbols; they generate them later for communication.

That's interesting. It takes me back to the '80s, when the logic-based AI people were in charge.

The "propagator" paper would be more interesting if there was a use case in the paper. The basic idea is to have a graph of computational elements, but insist that the elements be commutative, associative, and idempotent. Under those restrictions, you get the same answer despite race conditions, which leads to a simple model of asynchronous programming. But what is this good for? I could sort of see a theorem prover that works like this. Beyond logic problems, it's harder to find an application.

This might be a good way to implement eventually-consistent systems. Those are hard to get right, and having to build them out of elements which have the propagator properties might help. Not sure this is possible, but it's worth a try if you have to solve that problem.

Without the propagators themselves being monotone they don't have enough to ensure determinism. They also don't have a guarantee of termination. e.g. repeatedly taking Heron steps on a rational interval approximation to compute the sqrt of 2 will continue to ascend the lattice indefinitely, as it isn't complete.
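The sqrt-of-2 non-termination point can be checked directly (a small sketch using exact rationals): each Heron step strictly improves the approximation but never lands on sqrt(2), so a propagator refining a rational bound ascends the lattice forever.

```python
# Heron iteration for sqrt(2) over exact rationals: every step is a strictly
# better (but never exact) approximation, so the refinement never stabilizes.
from fractions import Fraction

x = Fraction(2)
for _ in range(5):
    x = (x + Fraction(2) / x) / 2   # Heron step: x' = (x + 2/x) / 2
    assert x * x != 2               # never lands exactly on sqrt(2)
print(x.denominator)   # the denominators explode as the chain climbs
```

With an incomplete lattice like the rationals there is no top element for this chain to reach, which is the termination problem being described.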

That said, they have many of the elements of the solution. With monotonicity this starts to resemble the more recent work on Lasp, which is being used explicitly to tackle the domain you mention, (strong) eventual consistency; or Kuper's work on LVars, where she drops the idempotence condition fairly early on in the thesis to get closer to CmRDTs, but which then burdens reads in a way that makes them automatically monotone.

Interesting, but I see one major problem with AI decision provenance: cases that mirror "Who sank the boat?" and "the straw that broke the camel's back". When you are integrating loads of sensory inputs to make a decision, even with the best provenance, any threshold-based system will encounter cases where the information contained within the system itself is insufficient to reconstruct the cause of a failure. These are exceptionally hard problems, and they often require recognizing that we don't need to know the cause of something for certain; we need to identify the larger context that allowed that state to be arrived at in the first place, so that we can prevent it in the future.

Talk about timely: I recently took the opportunity to unmothball the old propagators idea and have been running a bunch of ideas past Sussman's former student Alexey Radul on a fairly constant basis for the last 3 weeks.

I started a project at https://github.com/ekmett/propagators which at least gets the basic execution of them right, and have been working on a larger project.

Now, Sussman and Radul manage propagation and track provenance using an assumption-based truth maintenance system. This unfortunately results in a 2^n blowup in space, but if you change the schema somewhat you can treat it more like enumerating solutions in SAT, where you get a blowup, but in time rather than space -- and as usual with SAT, "it usually isn't that bad".

In any event, a lot of work out there overlaps with the propagators story:

Lindsay Kuper's work on LVars drops idempotence and doesn't deal with 'triggering on change' but gets a parallel computation with forking and writes to "lattice variables" and uses threshold reads to manage communication.

Sussman and Radul's work really needs the notion of the propagators themselves being monotone functions between the semilattices that they are working with. Unfortunately, neither the paper nor Radul's thesis ever actually spells that out.

Once you do spell it out you get something like Christopher Meiklejohn's work on Lasp. Both he and Lindsay Kuper have been tackling the issue of composing CRDTs lately. Her notion of threshold reads works well here when it can be applied, because such functions are automatically monotone: there is nothing to check.

On the other hand, we can view things like datalog as a propagator network. You have tables which accumulate information monotonically. You have IDB statements that execute joins between those lattice variables. Now this starts to show some of the differences. There we don't bother sending the entire lattice. In seminaive evaluation of datalog we send deltas. So it becomes interesting instead to walk back the notion of a lattice variable and to think instead of a commutative (possibly idempotent) monoid acting on an ordered set. Now your "update language" is the monoid, and we can think of this in terms of firing updates at a set and tracking what makes it through, and propagating _that_. I've been playing with various ways to get a closer analogue to datalog processing and to effectively track these partial triggers.
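A toy sketch of the seminaive point (my own code, not Dyna or any real datalog engine): computing transitive closure by joining only the per-round delta against the base relation, rather than re-sending the whole table each round.

```python
# Seminaive evaluation as delta propagation: instead of re-sending the whole
# "lattice value" (the full table), each round joins only the newly derived
# tuples (the delta) against the base relation.
def transitive_closure(edges):
    total = set(edges)
    delta = set(edges)
    while delta:
        # Join the delta with the base edges: extend only the new facts.
        new = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
        delta = new - total      # keep only genuinely new information
        total |= delta
    return total

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The `delta` set here is exactly the "update language" acting on the accumulated table: the monoid action rather than the lattice value itself.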

Another issue is that propagator networks as they are currently designed typically do too much work. Too many things are woken up and fired. We can mitigate that somewhat. Consider a propagator that adds two lifted numbers. We can also add a propagator backwards that does the subtraction, etc. This yields a (local) propagator for which if you tell me any 2 of the arguments, I can tell you the third.
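The local (+) propagator described here can be sketched in a few lines (a toy version; real propagator cells would also merge partial information rather than return tuples):

```python
# Cells for a, b, c with the constraint c = a + b; whenever any two are
# known, the third is filled in -- subtraction runs the adder "backwards".
# None marks an unknown cell.
def sum_propagate(a, b, c):
    known = [v is not None for v in (a, b, c)]
    if known == [True, True, False]:
        return a, b, a + b
    if known == [True, False, True]:
        return a, c - a, c
    if known == [False, True, True]:
        return c - b, b, c
    return a, b, c   # under- or fully-determined: nothing new to say

print(sum_propagate(None, 3, 10))   # (7, 3, 10) -- inferred a = c - b
```

Note that until two of the three cells are known, firing this propagator is wasted work, which is exactly what the watched-literal trick below the fold is meant to avoid.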

A more efficient scheme would be to adopt a "two watched literal" scheme like a SAT solver. Pick two lattice variables that are still at _|_. When one of those is written to, check to see if you're down to 1 variable not at _|_. If so we have to "wake up the propagator" and have it start listening to all of its inputs. If not, disconnect this input and start listening to a different input. For 3 variables this isn't a big deal. For a propagator where you can use 499 variables to determine the 500th? You can get some nice zChaff-like speedups!

We can also use a 2 watched literal scheme to kill a propagator and garbage collect it. If we use a covering relation to talk about 'immediate descendants of a node' in our lattice, we can look for things covered by our _|_ contradiction node. These are 'maximal' entries in our lattice. If your input is maximal you'll never get another update from that input. So if we take our (+) node as an example, once two of the 3 inputs are maximal this propagator has nothing more to say and can be removed from the network.

We can indicate that you don't want to say anything "new" to a lattice by adjoining a "frozen" flag. You can think of it as the moral equivalent of saying we'll tell you nothing new from here on out. This winds up being the moral equivalent of Lindsay's quiescence detection.

Now we can look at a propagator network itself as a lattice in one sense, in terms of adding propagators to the network as increasing information. Then the ability to tell the network that it is frozen gives us opportunities to topologically sort the network, in the same fashion as stratified datalog. Now we can be more intelligent about firing our propagators:

I can run the network in bottom up topological order, queuing updates from SCCs in parallel. It gets a little tricky if we want the 'warm start' scheme from 2 watched literals -- you need to make sure you don't miss updates if you want to be able to do some more CmRDT-like cases rather than CvRDTs.

Finally, there are a lot of problems where we can view a propagator network itself as something useful to implement a propagator.

Consider constraint propagation. We can view the classic AC-3 algorithm for enforcing arc consistency as a propagator network. Once it is done _now_ we have arc consistent domains to enumerate with our finite domain solver. Which we can drive in the fashion I mentioned above to avoid the ATMS overhead.
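A minimal AC-3 sketch (my own toy code) showing the propagator flavor: domains shrink monotonically, and an arc is re-queued exactly when a domain it listens to changes, just like re-firing a propagator.

```python
# AC-3 arc consistency as a propagator network: domains only lose values,
# and pruning one domain re-fires every arc that depends on it.
from collections import deque

def ac3(domains, constraints):
    """domains: var -> set of values; constraints: (x, y) -> predicate(vx, vy)."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # Remove values of x with no support in y's domain.
        pruned = {vx for vx in domains[x]
                  if not any(pred(vx, vy) for vy in domains[y])}
        if pruned:
            domains[x] -= pruned
            # x changed: re-fire every arc that listens to x.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return domains

# Toy problem: x < y with x, y each in {1, 2, 3}.
doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
cons = {("x", "y"): lambda a, b: a < b, ("y", "x"): lambda a, b: b < a}
print(ac3(doms, cons))   # x loses 3, y loses 1
```

Once this network quiesces, the surviving (arc-consistent) domains are what the finite-domain solver enumerates.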

On the other hand, a linear programming or mixed integer linear programming solver can also give lattice like answers as you add cutting planes to the model and monotonically increase the information present. In the MILP case we typically loop over these considering relaxations which inform us of new cuts we can make, etc.

Both of these are computations that _use_ lattice variables with propagators between them to build a better propagator themselves.

They are also nicely both fairly 'convex' theories. You could run propagators for constraint programming and MILP on the same variables and they can mutually reinforce with nicer guarantees than the dumb (+) propagator that I mentioned.

That one runs into issues when you go to do something as simple as y = x + x and try to run it backwards to get x, because locally it looks to the propagator like 2 unknowns. We could of course try to globally rewrite the propagator network by adding propagators to work around that; in this case it works if you transform it into 2 * x, and now the 2 is known, as is y, and you can get to x. Here it is probably better to just borrow tools from the clpqr crowd, but that requires an unfortunate amount of introspection on the propagator network.

I have a couple of dozen other sub-domains that fit into this model.

I mostly find it interesting how all of these subtly different domains offer almost exactly the same tools with minor differences, and how powerful it is to adapt those differences to other problems in nearby domains.

You may be interested in reading, if you haven't, our 2011 position paper on Dyna. (The 2012 paper on what I think you would call propagator networks was fun, but is nothing you don't know already, I'm sure.)


Indeed. Our old discussions about "omega-continuous semiring homomorphisms" as the way to try to make something half-way between Dyna and the datalog bits I was working on have been very much present in my mind lately. =)

(Sorry, Lindsey, not Lindsay.)

Propagators are closely related to CRDTs.

Unsurprisingly used in Lasp (distributed coordination), http://blog.acolyer.org/2015/08/17/lasp-a-language-for-distr...

Back to Sussman:

Symbolic AI has complexity/computability issues; I'm wondering how these are solved with propagators.

It is even slower, but not by much. And with this information you can usually fix logical problems to improve the AI.

Good read. Interesting ideas.
