I dunno. Are we talking about true "hard AI"? Because really, how much can Sussman or anyone introspect their own thought processes? We can tell ourselves stories about motivation and symbol manipulation, but those stories themselves are just more symbol manipulation. I'm not sure a sentient symbol-manipulator ever really "gets to the bottom" of true introspection of their own thought processes, including counterfactual thoughts they might have had.
This seems like an overly mechanistic approach, although perhaps workable in "soft AI"
> participants fail to notice mismatches between their intended choice and the outcome they are presented with, while nevertheless offering introspectively derived reasons for why they chose the way they did
If the AI tells us: "I did it because #:G042 and #:G4285 belong to the same #:G3346 #:G4216, while #:G1556 and #:G48592 #:G4499 #:G22461 #:G48118", I don't think we'll have learned anything useful.
This is why I feel some people will never "let" AI happen. They want 100% certainty; neural nets scare them. This despite the fact that real, human intelligence carries no certainty. Either you have intelligence, creativity, uncertainty, and accidents, or you have computers and rigid logic.
The interesting thing there is, when a space rocket malfunctions, when a car wrecks, or the stock market goes crazy... and a computer is involved, it's always human error at the end of it. And thus it was intelligence that caused the problem. Computers mindlessly do exactly what they were told to do.
Why does it have to be such a strict dichotomy? What are the specific tradeoffs being made between creativity and rigid logic? Is there really no way to pick something in the middle?
> The interesting thing there is, when a space rocket malfunctions, when a car wrecks, or the stock market goes crazy... and it's a computer involved, it's always human error at the end of it. And, thus, intelligence that caused the problem.
Individual humans aren't always at the end of it. Sometimes you can trace the problem further, to the incentives applied to the human by some larger organization, society or system. Are human organizations also intelligent?
I think we're still very far away from a definition of "intelligence" that would let us answer all these questions. It's not clear that current AI research is bringing us any closer to that definition.
The thing is though... any sufficiently advanced AI is going to be unaccountable pretty much by definition. It's like, calculus is an extremely useful tool for predicting the temperature of a cooling object over time, but good luck explaining to a 3-year-old how to perform the necessary maths. The fact that they can't comprehend it doesn't mean that calculus isn't useful, it just means that it's beyond the 3-year-old's ability to intuitively grasp. In this simile we're the 3-year-olds. :)
I usually end up writing a Propagating logic simulator (combining the chapters) and marveling at the working forward/backward ripple carry adder. This time I've derived all the way down to switches which infer unknown values based on their current inputs and used them to successfully create the reversible logic functions required by the Propagator model.
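To make that concrete, here's a minimal toy sketch (my own code, not the book's) of the kind of bidirectional constraint a forward/backward adder is built from: cells hold a value or "unknown", and a sum constraint fires in whichever direction it has enough information for.

```python
class Cell:
    def __init__(self):
        self.value = None        # None means "unknown"
        self.constraints = []

    def set(self, v):
        if self.value is None:
            self.value = v
            for c in self.constraints:
                c.propagate()
        elif self.value != v:
            raise ValueError("contradiction")

class Sum:
    """Constraint a + b = c; fires forwards or backwards."""
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
        for cell in (a, b, c):
            cell.constraints.append(self)

    def propagate(self):
        a, b, c = self.a.value, self.b.value, self.c.value
        if a is not None and b is not None:
            self.c.set(a + b)
        elif a is not None and c is not None:
            self.b.set(c - a)
        elif b is not None and c is not None:
            self.a.set(c - b)

a, b, c = Cell(), Cell(), Cell()
Sum(a, b, c)
a.set(2)
c.set(5)
print(b.value)  # -> 3, inferred by running the adder "backwards"
```

Chaining full-adder cells built like this gives the ripple-carry adder that works in both directions.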
Fascinating stuff and highly recommended reading for anybody interested in alternative models of computation.
Now, to go back to the propagator paper and read the Scheme-Propagators source.
Symbolic AI could have a chance, because it keeps the meta-information necessary to explain its reasoning. But how to optimize it properly will be a problem.
It's the typical argument against neural nets: because they cannot explain their chain of reasoning, you can't steer training in the right direction, or away from the wrong one. When something goes wrong, you've got a problem.
Old AI had the same problem; that's why they added the reasoning chain to the backtracking, so it could give better answers and you were able to introspect them.
The AI Winter is thawing... but whose lap will it fall into after the defrost? Big data mining organizations, or everyday hackers, or....?
It was easier for neural nets, though, because they were close to a previously successful AI mechanism (machine learning with logistic regression), it's just that we had spent decades talking about them with different words for no good reason. There's a much larger gulf between GOFAI and what we do now.
If you go deep into numerical calculus, you'll see that our computers are much better suited to working with continuous smooth functions than with discrete noise-like data. So all the power goes to the people who can turn their knowledge into a smaller set of interpolated curves. (And yes, I think that's very counterintuitive.)
Anyway, I'm not convinced this is a fundamental property of CS. It sounds at least possible that different architectures could have different constraints, and make symbolic AI viable. But those would need to be very different architectures, not based on von Neumann's or Turing's ideas at all.
You can also come up with some kind of Greenspun's tenth rule applied to neural nets:
"Any sufficiently complicated modern AI program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
In this case not even that.
> A more contrived relation to this in real life that I've been thinking about: if a child knocks over a vase, you might be angry at them, and they might have done the wrong thing. But why did they do it? If a child can explain to you that they knew you were afraid of insects, and swung at a fly going by, that can help you debug that social circumstance so you and the child can work together towards better behavior in the future.
Maybe. What if the real reason was a misconfigured bayesian network in the child's head that mistakenly assigned high probability to seeing a fly when there was only a dust speck, combined with an overreactive heuristic that made them start waving their hands around? Or something? There may not be a reason that can be specified in symbols relevant to anything else.
In general, a rational reasoner modelling a probability network would make a decision because all the items upstream added up to a particular distribution downstream. There are no symbols involved in such computation, and it would be strange to suddenly include them now.
Also, if you really sit down and do some serious introspecting, you'll realize that any symbols we assign to explain our thoughts are arbitrary and after-the-fact. Brains don't run on symbols; they generate them later for communication.
The "propagator" paper would be more interesting if there was a use case in the paper. The basic idea is to have a graph of computational elements, but insist that the elements be commutative, associative, and idempotent. Under those restrictions, you get the same answer despite race conditions, which leads to a simple model of asynchronous programming. But what is this good for? I could sort of see a theorem prover that works like this. Beyond logic problems, it's harder to find an application.
This might be a good way to implement eventually-consistent systems. Those are hard to get right, and building them out of elements that have the propagator properties might help. Not sure this is possible, but it's worth a try if you have to solve that problem.
That said, they have many of the elements of the solution. With monotonicity this starts to resemble the more recent work on Lasp, which is being used explicitly to tackle the domain you mention, (strong) eventual consistency; or Kuper's work on LVars, where she drops the idempotence condition fairly early on in the thesis to get closer to CmRDTs, but then burdens reads in a way that makes them automatically monotone.
I started a project at https://github.com/ekmett/propagators
which at least gets the basic execution of them right, and have been working on a larger project.
Now, Sussman and Radul manage propagation and track provenance using an assumption-based truth maintenance system (ATMS). This unfortunately results in a 2^n blowup in space, but if you change the schema somewhat you can treat it more like enumerating solutions in SAT, where you get a blowup, but in time rather than space -- and as usual with SAT, "it usually isn't that bad".
In any event, a lot of work out there overlaps with the propagators story:
Lindsay Kuper's work on LVars drops idempotence and doesn't deal with 'triggering on change' but gets a parallel computation with forking and writes to "lattice variables" and uses threshold reads to manage communication.
Sussman and Radul's work really needs the notion of the propagators themselves being monotone functions between the semilattices that they are working with. Unfortunately, neither the paper nor Radul's thesis ever actually spells that out.
Once you do spell it out, you get something like Christopher Meiklejohn's work on Lasp. Both he and Lindsay Kuper have been tackling the issue of composing CRDTs lately. Her notion of threshold reads works well here when it can be applied, because such functions are automatically monotone; there is nothing to check.
On the other hand, we can view things like datalog as a propagator network. You have tables which accumulate information monotonically. You have IDB statements that execute joins between those lattice variables. Now this starts to show some of the differences. There we don't bother sending the entire lattice. In seminaive evaluation of datalog we send deltas. So it becomes interesting instead to walk back the notion of a lattice variable and to think instead of a commutative (possibly idempotent) monoid acting on an ordered set. Now your "update language" is the monoid, and we can think of this in terms of firing updates at a set and tracking what makes it through, and propagating _that_. I've been playing with various ways to get a closer analogue to datalog processing and to effectively track these partial triggers.
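The delta idea fits in a few lines. Here's a toy seminaive fixpoint for transitive closure (my own sketch): the `path` table only ever grows, and each round joins just the *delta* against the `edge` table instead of re-sending the whole relation.

```python
edges = {(1, 2), (2, 3), (3, 4)}

path = set(edges)          # initial facts
delta = set(edges)         # what changed last round
while delta:
    # join the delta with edge: path(x,z) :- path(x,y), edge(y,z)
    new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
    delta = new - path     # only genuinely new facts propagate onward
    path |= delta

print(sorted(path))  # -> [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```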
Another issue is that propagator networks as they are currently designed typically do too much work. Too many things are woken up and fired. We can mitigate that somewhat. Consider a propagator that adds two lifted numbers. We can also add a propagator backwards that does the subtraction, etc. This yields a (local) propagator for which if you tell me any 2 of the arguments, I can tell you the third.
A more efficient scheme would be to adopt a "two watched literal" scheme like a SAT solver. Pick two lattice variables that are still at _|_. When one of those is written to, check to see if you're down to 1 variable not at _|_. If so, we have to "wake up the propagator" and have it start listening to all of its inputs. If not, disconnect this input and start listening to a different input. For 3 variables this isn't a big deal. For a propagator where you can use 499 variables to determine the 500th? You can get some nice zChaff-like speedups!
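A hedged toy sketch of that watching scheme (mine, not anyone's real solver), using a sum constraint over n cells: writes to unwatched cells cost nothing; only when a *watched* cell becomes known do we look for a replacement watch, and if none exists, exactly one unknown remains and we can infer it.

```python
class WatchedSum:
    """x1 + ... + xn = total; values[i] is None while unknown."""
    def __init__(self, n, total):
        self.values = [None] * n
        self.total = total
        self.watched = [0, 1]           # two indices still unknown

    def assign(self, i, v):
        self.values[i] = v
        if i not in self.watched:
            return                      # cheap: an unwatched write is ignored
        # try to move the watch to some other unknown, unwatched cell
        for j, val in enumerate(self.values):
            if val is None and j not in self.watched:
                self.watched[self.watched.index(i)] = j
                return
        # no replacement found: one unknown left, so infer it
        unknown = next(j for j, v2 in enumerate(self.values) if v2 is None)
        self.values[unknown] = self.total - sum(
            v2 for v2 in self.values if v2 is not None)

s = WatchedSum(5, total=100)
for i, v in [(0, 10), (2, 20), (4, 30), (1, 15)]:
    s.assign(i, v)
print(s.values[3])  # -> 25, inferred only once the last watch collapses
```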
We can also use a 2 watched literal scheme to kill a propagator and garbage collect it. If we use a covering relation to talk about 'immediate descendants of a node' in our lattice, we can look for things covered by our _|_ contradiction node. These are 'maximal' entries in our lattice. If your input is maximal you'll never get another update from that input. So if we take our (+) node as an example, once two of the 3 inputs are maximal this propagator has nothing more to say and can be removed from the network.
We can indicate that you don't want to say anything "new" to a lattice by adjoining a "frozen" flag. You can think of it as saying we'll tell you nothing new from here on out. This winds up being the moral equivalent of Lindsay's quiescence detection.
Now we can look at a propagator network itself as a lattice in one sense, in terms of adding propagators to the network as increasing information. Then the ability to tell the network that it is frozen gives us opportunities to topologically sort the network, in the same fashion as stratified datalog. Now we can be more intelligent about firing our propagators:
I can run the network in bottom up topological order, queuing updates from SCCs in parallel. It gets a little tricky if we want the 'warm start' scheme from 2 watched literals -- you need to make sure you don't miss updates if you want to be able to do some more CmRDT-like cases rather than CvRDTs.
Finally, there are a lot of problems where we can view a propagator network itself as something useful to implement a propagator.
Consider constraint propagation. We can view the classic AC-3 algorithm for enforcing arc consistency as a propagator network. Once it is done _now_ we have arc consistent domains to enumerate with our finite domain solver. Which we can drive in the fashion I mentioned above to avoid the ATMS overhead.
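A quick AC-3 sketch of that (my own code, not from the thread): domains shrink monotonically as arcs are revised, which is exactly a propagator-style fixpoint; afterwards a finite-domain search enumerates the pruned domains.

```python
from collections import deque

def ac3(domains, constraints):
    """constraints: {(x, y): predicate(vx, vy)} for directed arcs."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        pred = constraints[(x, y)]
        # remove values of x that have no support in y's domain
        pruned = {vx for vx in domains[x]
                  if not any(pred(vx, vy) for vy in domains[y])}
        if pruned:
            domains[x] -= pruned
            # x changed, so every arc pointing at x must be revisited
            queue.extend(arc for arc in constraints if arc[1] == x)
    return domains

# x < y < z over {1, 2, 3}
doms = {v: {1, 2, 3} for v in "xyz"}
lt = lambda a, b: a < b
gt = lambda a, b: a > b
cons = {("x", "y"): lt, ("y", "x"): gt,
        ("y", "z"): lt, ("z", "y"): gt}
print(ac3(doms, cons))  # -> {'x': {1}, 'y': {2}, 'z': {3}}
```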
On the other hand, a linear programming or mixed integer linear programming solver can also give lattice like answers as you add cutting planes to the model and monotonically increase the information present. In the MILP case we typically loop over these considering relaxations which inform us of new cuts we can make, etc.
Both of these are computations that _use_ lattice variables with propagators between them to build a better propagator themselves.
They are also nicely both fairly 'convex' theories. You could run propagators for constraint programming and MILP on the same variables and they can mutually reinforce with nicer guarantees than the dumb (+) propagator that I mentioned.
That one runs into issues when you go to do something as simple as y = x + x and try to run it backwards to get x, because it looks locally to the propagator like 2 unknowns. We could of course try to globally rewrite the propagator network, adding propagators to work around that; in this case it works if you transform it into 2 * x, and now the 2 is known, as is the y, so you can get to x. Here it is probably better to just borrow tools from the clpqr crowd, but that requires an unfortunate amount of introspection on the propagator network.
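Illustrating that point with a toy local propagator (my sketch, hypothetical helpers): an add constraint wired as y = x + x sees "two unknowns" even though both slots are the same cell, so it never fires backwards; rewriting to y = 2 * x makes the inversion local again.

```python
def add_backwards(inputs, y):
    """Infer the single unknown input of sum(inputs) = y, if exactly one."""
    unknown = [v for v in inputs if v is None]
    if len(unknown) != 1:
        return None                 # locally stuck: 0 or 2+ unknowns
    return y - sum(v for v in inputs if v is not None)

x = None
# y = x + x: the local view is two unknown slots, even though both are x
print(add_backwards([x, x], 10))    # -> None: can't invert locally

def mul_backwards(k, y):
    """Invert y = k * x when the constant k is known."""
    return y / k

print(mul_backwards(2, 10))         # -> 5.0: the rewrite makes x recoverable
```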
I have a couple of dozen other sub-domains that fit into this model.
I mostly find it interesting how all of these subtly different domains cover almost exactly the same tools with minor differences, and how powerful it is to adapt those differences to other problems in a nearby domain.
back to sussman