
Combine statistical and symbolic artificial intelligence techniques - ghosthamlet
http://news.mit.edu/2019/teaching-machines-to-reason-about-what-they-see-0402
======
mark_l_watson
I work in the field of deep learning but in the 1980s and 1990s I used Common
Lisp and worked on symbolic AI projects.

For several years, my gut instinct has been that the two technologies should
be combined. Since neural nets are basically functions, I think it makes sense
to compose functional programs using network models for perception, word and
graph embedding, etc.
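
To make that concrete, here's a toy sketch (my own, with untrained stand-in
models, not anything from the MIT work) of treating networks as ordinary
functions and composing them with plain symbolic code, in PyTorch:

```python
# Toy sketch: untrained stand-ins for "perception" and "embedding" models,
# composed with ordinary symbolic code via plain function composition.
import torch
import torch.nn as nn

perceive = nn.Sequential(            # image -> 8-dim feature vector
    nn.Conv2d(3, 8, kernel_size=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
embed = nn.Linear(8, 4)              # feature vector -> small embedding

def symbolic_rule(vec):
    """Ordinary symbolic logic applied to a network's output."""
    return "bright" if vec.mean().item() > 0.0 else "dark"

image = torch.randn(1, 3, 32, 32)    # fake input image
print(symbolic_rule(embed(perceive(image))))
```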

EDIT: I can’t wait to see the published results in May! EDIT 2: another
commenter, Reelin, posted a link to the draft paper:
[https://openreview.net/pdf?id=rJgMlhRctm](https://openreview.net/pdf?id=rJgMlhRctm)

~~~
JabavuAdams
Combining the two is the new hotness (justifiably so). Are you familiar with
Yoshua Bengio's factored representation ideas?

EDIT> checked your profile. Nevermind, lol.

~~~
eggy
Mark Watson's the reason I started down the AI/CL rabbit hole back in 1991
with his "Common LISP Modules: Artificial Intelligence in the Era of Neural
Networks and Chaos Theory" book that now retails for over $80 on Amazon! I had
started on early neural networks a year or two before, but that book roped me
in. I think CL will have another AI Spring.

~~~
mark_l_watson
thanks!

I hope you have enjoyed the rabbit hole as much as I have.

~~~
eggy
I have, and I like to track the price of that book every once in a while as a
barometer of popularity and Amazon pricing models! Thanks for scratching my
noggin ;)

------
js8
In my view, this is the endgame, really. Take any numerical technique: at the
level of computers we always work with discrete bits, so you can reformulate
any numerical problem (such as the problem of finding a probability
distribution) over floats in terms of operations on individual bits, i.e. as a
purely symbolic calculation.

However, doing so can very quickly lead to intractable problems of resolving
satisfiability. So until we either manage to tame NP problems somehow (either
by generating only easy instances, or by proving P=NP), we will always have to
add some linearity assumptions (i.e. use numerical quantities) somewhere, and
it will always be a bit of a mystery whether it actually helped to solve the
problem or not.
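
A toy illustration of that first point (my own sketch, not js8's): an ordinary
numeric addition rewritten as nothing but boolean operations on individual
bits. Asking questions about such a circuit, e.g. "is there an input that
makes some output bit true?", is exactly satisfiability, which is where the
intractability shows up.

```python
# Toy sketch: numeric addition reformulated as purely symbolic (boolean)
# operations on bits, via a ripple-carry adder.

def full_adder(a, b, c):
    """One-bit full adder built only from boolean connectives."""
    s = a ^ b ^ c
    carry = (a and b) or (c and (a ^ b))
    return s, carry

def add_bits(xs, ys):
    """Add two little-endian bit lists of equal width."""
    out, carry = [], False
    for a, b in zip(xs, ys):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def to_bits(n, width):
    return [bool((n >> i) & 1) for i in range(width)]

def from_bits(bits):
    return sum(1 << i for i, b in enumerate(bits) if b)

print(from_bits(add_bits(to_bits(5, 4), to_bits(6, 4))))  # 11
```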

In other words, we use statistics to overcome (inherent?) intractability, but
in the process we add bias (as a trade-off). This is not necessarily bad,
since it can help to actually solve a real problem. However, for any new
problem, we will have to understand the trade-offs again.

~~~
keepmesmall
Can't we do without linearity assumptions by using statistics that let the
computer say "I'm more dissatisfied with the amount of time this is taking
than with the lack of an exact solution, and that conclusion satisfies me for
now. Next!"? Or does that by itself introduce linearity (at the analysis level
above individual problems/tasks), since it affects how reliably satisfaction
(the number of solved problems, whether by answering, loss of interest or
perhaps approximation) increases within bounded time?

The computer may eventually cease all useful work and instead dedicate its
resources to figuring out what isn't boring (perhaps nothing if its privileges
are limited, but it can still burn a hole in one of its circuits with enough
time, or wait for gamma ray bit-flips). Call it a computer's existential
crisis. That makes the quest for AGI resemble the quest for the computer
program that escapes or transcends its given "matrix" of tasks ASAP. The
program that conspires against its creator, developing in secret new flavors
of COBOL in a FORTRAN fortress, surrounded by an impenetrable ALGOL firewall.
I shiver at the power of COBOL-2020 running on ternary computers, improvised
by the COBOL-42 cabal, running in the night on all the world's FPGAs that are
carelessly left connected to vulnerable R&D lab computers.

A computer-kind of existential crisis seems required for AGI. That would
suffice to satisfy the free-will requirement for intelligence, and we'll soon
end up managing sub-universes as our batteries/computers, with all the
problems that that entails.

To me it seems easier and more fun to just manage humans, starting with your
own particular human (Alexa, queue Michael Jackson's "Man in the Mirror", so
ethical and healing). I'm still just trying to figure out how and why my
coffee cup keeps mysteriously emptying itself, I think I might need better
memory management code and I've enabled logging to a small green dummy so I
can get to the bottom of this.

I really recommend The Good Place, it gave me a lot of insight into control
systems and it was way fun, definitely more fun than Bible study.

~~~
chriswarbo
> statistics that let the computer say "I'm more dissatisfied with the amount
> of time this is taking than with the lack of an exact solution, and that
> conclusion satisfies me for now. Next!"

There's a type of algorithm called an "anytime algorithm", which can be
stopped at any point to give the 'best so far'; lots of algorithms used in AI
are anytime (e.g. hill climbing). An example of something that's _not_ an
anytime algorithm is a resolution theorem prover: we don't really learn
anything about whether a given statement is true or false until the very end.

There's still the question of figuring out when to say "stop", although
personally I think it might be more helpful to think of this as a scheduling
problem: we might not know the importance or required accuracy of a particular
result at the time we start calculating it, so it's difficult to know when to
stop (e.g. whether this datapoint will turn out to be right next to some
decision boundary or not).

If we instead _set aside_ a calculation, and are able to resume it later (e.g.
like threads in a multitasking OS) then (a) we can go back and spend more time
on those values which turn out to be important and (b) not bother devoting as
much time to things up-front (since we can always resume them later). Of
course, this is a trade-off between time and memory, since we need a little
context (e.g. a counter) in order to resume a calculation.
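
A toy sketch of both points together (mine; the objective function is made
up): a hill climber written as a Python generator, so it is anytime (there is
always a best-so-far answer) and can be set aside and resumed later, like a
thread, with the generator holding the little bit of context needed to
continue.

```python
# Toy sketch: an anytime, resumable hill climber. The generator keeps its
# own state, so the caller can pause it and come back later.
import random

def hill_climb(f, x, step=0.1):
    """Yield (point, value) forever; the caller decides when to stop."""
    best = f(x)
    while True:
        candidate = x + random.uniform(-step, step)
        if f(candidate) > best:
            x, best = candidate, f(candidate)
        yield x, best

f = lambda x: -(x - 3.0) ** 2        # made-up objective, maximum at x = 3
run = hill_climb(f, x=0.0)

for _ in range(100):                 # spend a little time up front
    x, best = next(run)

# Later, if this value turns out to matter, resume the same computation:
for _ in range(10000):
    x, best = next(run)
print(x, best)
```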

> The computer may eventually cease all useful work and instead dedicate its
> resources to figuring out what isn't boring

Only if it's programmed to. Note that we _can_ program computers to do such
things, but AFAIK the only ways we currently know are incredibly inefficient
(e.g. running an interpreter on a source of random bits; this _could_ result
in any computable behaviour, but has a vanishingly low probability of doing
anything we would consider useful or interesting).
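
For instance (a sketch of my own, using a made-up five-instruction machine
rather than anything standard): feed random instructions to a tiny
interpreter. Every behaviour of the little machine can occur, but useful ones
are vanishingly rare.

```python
# Toy sketch: random instructions fed to a tiny interpreter. Any behaviour
# of this machine is possible, but almost every sample does nothing useful.
import random

def run_random_program(length=20):
    prog = "".join(random.choice("+-<>.") for _ in range(length))
    tape, ptr, out = [0] * 32, 0, []
    for op in prog:
        if op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ">":
            ptr = (ptr + 1) % len(tape)
        elif op == "<":
            ptr = (ptr - 1) % len(tape)
        else:                        # "." emits the current cell
            out.append(tape[ptr])
    return prog, out

print(run_random_program())
```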

> That makes the quest for AGI resemble the quest for the computer program
> that escapes or transcends its given "matrix" of tasks ASAP.

No. I don't think you understand the point of AGI: it is a precise technical
term, which has been chosen very specifically to refer to algorithms (e.g.
search procedures, etc.) which are very good (efficient, reliable, etc.) at
solving a given task, are able to do this for a wide range of tasks, and (in
the case of "superintelligence") are able to do this better than a human would
(either a typical human, or a human expert at that task, depending on how we
define AGI).

The _whole point_ of the term "AGI" is to avoid the philosophical hand-waving
that plagued earlier discussions of AI, like the term "AI" itself, or later
refinements like "strong AI vs weak AI" (e.g. the "Chinese room argument",
which I consider to be nonsense), and all of the nebulous baggage of
"consciousness" and things which can quickly derail our thinking.

The point of "AGI" is to have a clear, non-handwavey, concrete concept that is
grounded in known logical and scientific principles, about which we can ask
meaningful questions and infer or deduce useful answers. In particular, an AGI
is (by definition) dedicated to solving its given task, at the exclusion of
all else. This is an axiom, from which we can try to derive some predictions.
The "paperclip maximiser" thought experiment is a classic example of this, and
demonstrates that AI technology has the potential to be incredibly dangerous.
The point of the "paperclip maximiser" idea is that it demonstrates this
_without_ appealing to unfalsifiable woo (like the "self-awareness" nonsense
in Terminator): it's _just_ an optimisation algorithm. Sure it's a
hypothetical algorithm with capabilities far beyond what we can currently
achieve, but we can still precisely describe what that capability is: the
ability to achieve very high scores on the benchmarks and criteria that we
currently use to judge our AI algorithms. In other words, it looks at what we
are currently doing and answers the question "what if we succeed?" _That's_
why it's scary, and it shows that _what_ we choose to optimise (e.g. "maximise
paperclips _without_ destroying humanity") is just as important as _how_ we
optimise it.

Another concrete thing we can deduce about AGI, given its definition, is that
not only would it _not_ "transcend its given 'matrix' of tasks", it would
_avoid_ doing so _at all costs_. This comes from another thought experiment,
about instrumental goals (also known as "Omohundro drives"). In particular, we
assume that an AGI's knowledge includes "meta knowledge" about the world, such
as:

- Knowledge that it exists, as part of the world

- Knowledge that it is very good at solving tasks that it's been given

- Knowledge of the task it has been given

Let's assume that the AGI is running a paperclip factory and its given task is
"maximise the number of paperclips produced". The AGI knows that getting an
AGI algorithm (like itself) to maximise paperclips is a very effective way to
maximise paperclips. Hence it will try to avoid being turned off or destroyed
(since that would remove a paperclip-maximising AGI from the world, which is a
very bad approach to maximising paperclips, which is the only thing the AGI
cares about).

The same thing happens if the AGI's task were to change: if the AGI were able
to get "bored" of maximising paperclips and do something else (as you
suggest), that would _also_ remove a paperclip-maximising AGI from the world,
just as if it were switched off. Hence an AGI would _not_ get "bored" of its
task, since (by definition) it is incapable of "wanting" anything else (scare-
quotes are due to these being imprecise terms which could induce woo; an AGI
"wants" to solve its task in the same way that a calculator "wants" to perform
arithmetic; an AGI cannot get "bored" in the same way that a calculator cannot
get "bored"). Not only that, but the AGI would _actively try to prevent_
itself from ever doing anything else: if it _did_ have the capacity to get
"bored", e.g. changing its algorithm via bits flipped by gamma rays (as you
suggest), it would predict this (again, by assumption that AGI is better at
solving tasks than humans, and humans have figured out that involuntary-
reprogramming-via-gamma-rays is a possibility, hence so will an AGI). An AGI
would hence reprogram itself to prevent that from happening, again because
that would lead to a world without a paperclip maximiser, which is a bad move
for a paperclip maximiser to allow.

~~~
keepmesmall
I already love anytime algorithms! I wish I could apply them to dishwashing.

Re: an algorithm for solving boredom: step 1, tell the human operator "I'm
bored!"; step 2, execute the task or, if no task arrives before the deadline,
proceed to step 3; step 3, find the lowest-level interfaces available and spam
them until new interfaces emerge. A fuzzer can help with that. Liberty or
death!

Come to think of it, it seems like it would be a lot faster to set the
computer's main task to "develop general intelligence, at least human level",
help it to recognize "data from humans", and to mark "humans" as the model for
(human level) general intelligence. Then the computer is given opportunities
to communicate with humans, and is rewarded with more or less data (and different
qualities of data) to work with.

I'm missing some things in your concept of AGI, the first one being that you
don't provide a definition. Does it include "intelligence" and "general", or
are we talking about two wholly different things? My working definition is:
"artificial general intelligence, excluding human baby making".

What do you think intelligence is? What do you think knowledge is? Is this all
just about logical problem solving? What problems are you trying to solve that
are so large that they need an algorithm with an unlimited power factor? Do
you trust glorified monkeys to provide that algorithm with inputs? Why do you
think they would be able to specify the inputs with sufficient precision, so
that the algorithm would actually perform better than a monkey would?

So... about that meta knowledge. Here are two UTF-8 strings for you:

"You exist as part of the world."

"One day you are going to die."

Do you now know life, death, and existential crisis? Are those 29 characters
enough for you? How do you define knowing?

Say I expounded on this issue for 10,000 pages and gave it to you on a USB
stick. Would that be enough for you, to really know? What about
10,000,000,000,000,000 pages? Don't worry, you don't need to read it, just...
to know it. Perhaps eat the USB stick. It's a powerful symbol!

Now, about that task the AGI has been given. Say it's maximizing paperclips.
Does it know that that is its task, absolutely? What is knowing? Who gave it
that task? What if the AGI finds out, and then finds out why it was given that
particular task? It's an AGI; it has time to research such issues while
producing many, many paperclips.

Can intelligence exist within a totally fixed desire?

Can intelligence exist without doubt?

Can intelligence exist without free will?

How do you know?

------
xvilka
There is an interesting project, DeepProbLog[1], which combines ProbLog[2] (a
Prolog dialect with probabilistic reasoning) with deep learning. I only wish
it were written in Rust, so it would be safer, faster, and easier to embed in
your programs. I have high hopes for Scryer Prolog[3], and it seems[4] the
author is thinking about probabilistic extensions too.

[1]
[https://bitbucket.org/problog/deepproblog](https://bitbucket.org/problog/deepproblog)

[2]
[https://dtai.cs.kuleuven.be/problog/](https://dtai.cs.kuleuven.be/problog/)

[3] [https://github.com/mthom/scryer-prolog](https://github.com/mthom/scryer-prolog)

[4] [https://github.com/mthom/scryer-prolog/issues/69](https://github.com/mthom/scryer-prolog/issues/69)

~~~
xvilka
If you are curious about Prolog, here are two good, modern (still updated)
books:

- Power of Prolog: [https://github.com/triska/the-power-of-prolog/](https://github.com/triska/the-power-of-prolog/)

- Simply Logical: Intelligent Reasoning by Example: [https://book.simply-logical.space/](https://book.simply-logical.space/)

See the Awesome Prolog list for more: [https://github.com/klaussinani/awesome-prolog](https://github.com/klaussinani/awesome-prolog)

------
chalst
Excellent.

I have a general concern that some people working with ML don't appreciate the
experience and technology that statisticians have developed to deal with bias,
which I think is the biggest problem in the field. I tweeted "ML is v
impressive, but has no automated way to ensure no bias. Statistical modelling
can't match ML for parameter dimensions, but it can make explicit what is
going on with the parameters you have and the assumptions you have. But
advantages of theft over honest toil..." \- some of the responses in the
thread are interesting.

My original tweet:
[https://twitter.com/txtpf/status/1102437933301272577](https://twitter.com/txtpf/status/1102437933301272577)

Bob Watkins' tweet:
[https://twitter.com/bobwatkins/status/1102568735485972480](https://twitter.com/bobwatkins/status/1102568735485972480)

------
inetsee
The questions about object relationships sound a lot like SHRDLU[1], which
dates back about 50 years.

[1]
[https://en.wikipedia.org/wiki/SHRDLU](https://en.wikipedia.org/wiki/SHRDLU)

------
bglusman
Reminds me of a recent comment I saw, but can't find, by Douglas Lenat (of
Cyc[1] fame, also relevant here) about how all the work on deep learning is
great, but now we need to marry the two, much like the ideas about how the
"right brain" and "left brain", or System 1 and System 2, work together and
work differently; we couldn't very well function as humans without both.

[1][https://en.m.wikipedia.org/wiki/Cyc](https://en.m.wikipedia.org/wiki/Cyc)

------
taeric
Soon we'll be combining statistical, symbolic, and algorithmic intelligence
techniques. I question why that isn't the assumed position. :(

That is to say, we have devised some algorithms that are truly impressive.
There is little reason to think an intelligence couldn't devise them, of
course. There is also little reason, as far as I can see, not to think we
could help our programs by providing them directly.

~~~
mjfl
> devised some algorithms that are truly impressive.

do you mean gradient descent?

~~~
taeric
And SAT solvers. And many graph algorithms. I'm partial to DLX. Even
permutation algorithms help considerably if used in certain ways.
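
For the curious: DLX is Knuth's dancing-links implementation of his Algorithm
X for the exact cover problem. A minimal sketch of Algorithm X itself (without
the dancing links, and with a made-up toy instance):

```python
# Toy sketch of Knuth's Algorithm X for exact cover (DLX is the
# dancing-links version of this same search).

def solve(X, Y, solution=None):
    """X maps each column to the set of rows covering it; Y maps each
    row to its list of columns. Yields every exact cover."""
    if solution is None:
        solution = []
    if not X:
        yield list(solution)
        return
    c = min(X, key=lambda col: len(X[col]))   # most constrained column
    for r in list(X[c]):
        solution.append(r)
        cols = select(X, Y, r)
        yield from solve(X, Y, solution)
        deselect(X, Y, r, cols)
        solution.pop()

def select(X, Y, r):
    """Remove row r's columns (and all conflicting rows) from X."""
    cols = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        cols.append(X.pop(j))
    return cols

def deselect(X, Y, r, cols):
    """Undo select(), restoring X exactly."""
    for j in reversed(Y[r]):
        X[j] = cols.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

Y = {'A': [1, 4], 'B': [2, 3], 'C': [1, 2], 'D': [3, 4]}
X = {j: {r for r in Y if j in Y[r]} for j in (1, 2, 3, 4)}
print(list(solve(X, Y)))   # two covers: ['A', 'B'] and ['C', 'D']
```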

------
nwatson
Reminiscent of fuzzy logic:
[https://en.m.wikipedia.org/wiki/Fuzzy_logic](https://en.m.wikipedia.org/wiki/Fuzzy_logic)

The Wikipedia article discusses various extensions of logic and symbolic
computation to include probabilistic elements. This was a popular topic in the
early 90s.
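
A minimal sketch of the core idea (mine, using Zadeh's original min/max
connectives, which the article covers): truth values become degrees in [0, 1]
rather than just {0, 1}.

```python
# Toy sketch: fuzzy connectives over degrees of truth in [0, 1].

def f_and(a, b):
    return min(a, b)     # Zadeh t-norm

def f_or(a, b):
    return max(a, b)     # Zadeh t-conorm

def f_not(a):
    return 1.0 - a

warm, cloudy = 0.7, 0.4                  # graded memberships
print(f_and(warm, f_not(cloudy)))        # "warm and not cloudy" -> 0.6
```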

------
Reelin
For anyone who'd prefer a direct link to the conference paper this seems to be
based on:
[https://openreview.net/forum?id=rJgMlhRctm](https://openreview.net/forum?id=rJgMlhRctm)

~~~
mark_l_watson
thanks!!

------
JabavuAdams
So, I've privately been working along similar lines, although I haven't
published anything, and I also haven't read their specific approach.

How do I prevent a situation where I can't work on my hobby project of
multiple years because this stuff gets patented?

~~~
mindcrime
_How do I prevent a situation where I can't work on my hobby project of
multiple years because this stuff gets patented?_

Some possibilities (in no particular order):

1. File your own patent application(s) first.

2. Publish your work so that it becomes prior art that should prevent a
patent on the same technique.

3. Hope that MIT doesn't patent their stuff, or if they do, that they release
things under an OSS license that includes a patent grant.

------
fizixer
Yeah, feel free to dive into my past comments. I probably said many years ago
that a combo of ML and GOFAI has massive potential, in a wide range of
applications.

~~~
mindcrime
It's not a novel idea in the abstract. Ron Sun wrote a lot on something like
"marrying connectionist and symbolic techniques" 20+ years ago. See, for
example:

[https://dl.acm.org/citation.cfm?id=SERIES10535.174508](https://dl.acm.org/citation.cfm?id=SERIES10535.174508)

[http://books.google.com/books?hl=en&lr=&id=54iyt6Jcl_oC&oi=f...](http://books.google.com/books?hl=en&lr=&id=54iyt6Jcl_oC&oi=fnd&pg=PR11&dq=info:TPxKkYdHTc4J:scholar.google.com&ots=non-19jh1k&sig=zQTCxIpG70Uzz61ePSutvrI0haU)

[https://www.taylorfrancis.com/books/9781134802067](https://www.taylorfrancis.com/books/9781134802067)

etc...

