Combine statistical and symbolic artificial intelligence techniques (mit.edu)
181 points by ghosthamlet 9 months ago | 25 comments



I work in the field of deep learning but in the 1980s and 1990s I used Common Lisp and worked on symbolic AI projects.

For several years, my gut instinct has been that the two technologies should be combined. Since neural nets are basically functions, I think it makes sense to compose functional programs using network models for perception, word and graph embedding, etc.
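Just to make that concrete, here is a minimal sketch in Python/PyTorch of the kind of composition I mean (the labels and the tiny untrained network are placeholders, purely to show the shape of the idea, not anything from the MIT work):

    import torch
    import torch.nn as nn

    LABELS = ["cube", "sphere", "cylinder"]

    # Hypothetical perception module: raw pixels -> class scores (untrained here).
    perceiver = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, len(LABELS)))

    def perceive(image):
        """Neural front end: image tensor -> discrete symbol."""
        with torch.no_grad():
            return LABELS[perceiver(image.unsqueeze(0)).argmax().item()]

    def is_round(image):
        """Symbolic back end: ordinary code composed over the extracted symbol."""
        return perceive(image) in {"sphere", "cylinder"}

    print(is_round(torch.rand(1, 28, 28)))

The network handles perception; everything downstream is plain (symbolic) program logic that can be composed, inspected, and reasoned about.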

EDIT: I can’t wait to see the published results in May! EDIT 2: another commenter reelin posted a link to the draft paper https://openreview.net/pdf?id=rJgMlhRctm


Combining the two is the new hotness (justifiably so). Are you familiar with Yoshua Bengio's factored representation ideas?

EDIT> checked your profile. Nevermind, lol.


Mark Watson's the reason I started down the AI/CL rabbit hole back in 1991 with his "Common LISP Modules: Artificial Intelligence in the Era of Neural Networks and Chaos Theory" book that now retails for over $80 on Amazon! I had started on early neural networks a year or two before, but that book roped me in. I think CL will have another AI Spring.


thanks!

I hope you have enjoyed the rabbit hole as much as I have.


I have, and I like to track the price of that book every once in a while as a barometer of popularity and Amazon pricing models! Thanks for itching my noggin ;)


In my view, this is the endgame, really. Take any numerical technique: at the level of computers, we always work with discrete bits. So you can reformulate any numerical problem on floats (such as the problem of finding a probability distribution) in terms of operations on individual bits, i.e. as a purely symbolic calculation.
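As a toy illustration of that reformulation (using the z3-solver Python package; the constraint is made up, and the solver essentially bit-blasts the bit-vector arithmetic down to a SAT problem):

    from z3 import BitVec, Solver, sat

    # An 8-bit "numerical" problem treated purely symbolically:
    # find x, y with x * y == 56 and 1 < x < y, reasoning over individual bits.
    x, y = BitVec("x", 8), BitVec("y", 8)
    s = Solver()
    s.add(x * y == 56, x > 1, x < y)
    if s.check() == sat:
        print(s.model())  # some satisfying assignment, e.g. x = 7, y = 8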

However, doing so can very quickly lead to intractable satisfiability problems. So until we manage to tame NP problems somehow (either by generating only easy instances, or by proving P=NP), we will always have to add some linearity assumptions (i.e. use numerical quantities) somewhere, and it will always be a bit of a mystery whether they actually helped to solve the problem or not.

In other words, we use statistics to overcome (inherent?) intractability, but in the process we add bias (as a trade-off). This is not necessarily bad, since it can help to actually solve a real problem. However, for any new problem, we will have to understand the trade-offs again.


Can't we do without linearity assumptions by using statistics that let the computer say "I'm more dissatisfied with the amount of time this is taking than with the lack of an exact solution, and that conclusion satisfies me for now. Next!"? Or does that by itself introduce linearity (at the level of analysis above individual problems/tasks), since it affects how reliably satisfaction (the number of solved problems, whether by answering, losing interest, or perhaps approximating) increases within bounded time?

The computer may eventually cease all useful work and instead dedicate its resources to figuring out what isn't boring (perhaps nothing if its privileges are limited, but it can still burn a hole in one of its circuits with enough time, or wait for gamma ray bit-flips). Call it a computer's existential crisis. That makes the quest for AGI resemble the quest for the computer program that escapes or transcends its given "matrix" of tasks ASAP. The program that conspires against its creator, developing in secret new flavors of COBOL in a FORTRAN fortress, surrounded by an impenetrable ALGOL firewall. I shiver at the power of COBOL-2020 running on ternary computers, improvised by the COBOL-42 cabal, running in the night on all the world's FPGAs that are carelessly left connected to vulnerable R&D lab computers.

A computer-kind of existential crisis seems required for AGI. That would suffice to satisfy the free-will requirement for intelligence, and we'll soon end up managing sub-universes as our batteries/computers, with all the problems that that entails.

To me it seems easier and more fun to just manage humans, starting with your own particular human (Alexa, queue Michael Jackson's "Man in the Mirror", so ethical and healing). I'm still just trying to figure out how and why my coffee cup keeps mysteriously emptying itself; I think I might need better memory management code, and I've enabled logging to a small green dummy so I can get to the bottom of this.

I really recommend The Good Place, it gave me a lot of insight into control systems and it was way fun, definitely more fun than Bible study.


> statistics that let the computer say "I'm more dissatisfied with the amount of time this is taking than with the lack of an exact solution, and that conclusion satisfies me for now. Next!"

There's a type of algorithm called "anytime algorithms", which can be stopped at any point to give the 'best so far'; lots of algorithms used in AI are anytime (e.g. hill climbing). An example of something that's not an anytime algorithm is a resolution theorem prover: we don't really learn anything about whether a given statement is true or false until the very end.
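A toy sketch of the anytime property (generic stochastic hill climbing in Python, not tied to any particular system): give it a time budget, and whenever that runs out you still get the best candidate found so far.

    import random
    import time

    def hill_climb(score, start, neighbour, budget=0.1):
        """Anytime stochastic hill climbing: returns the best-so-far
        whenever the time budget (in seconds) runs out."""
        best = start
        t0 = time.monotonic()
        while time.monotonic() - t0 < budget:
            candidate = neighbour(best)
            if score(candidate) >= score(best):
                best = candidate
        return best

    # Toy problem: get x as close to 3 as possible.
    print(hill_climb(lambda x: -(x - 3) ** 2, 0.0,
                     lambda x: x + random.uniform(-0.5, 0.5)))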

There's still the question of figuring out when to say "stop", although personally I think it might be more helpful to think of this as a scheduling problem: we might not know the importance or required accuracy of a particular result at the time we start calculating it, so it's difficult to know when to stop (e.g. whether this datapoint will turn out to be right next to some decision boundary or not).

If we instead set aside a calculation, and are able to resume it later (e.g. like threads in a multitasking OS) then (a) we can go back and spend more time on those values which turn out to be important and (b) not bother devoting as much time to things up-front (since we can always resume them later). Of course, this is a trade-off between time and memory, since we need a little context (e.g. a counter) in order to resume a calculation.
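A sketch of that set-aside-and-resume idea using plain Python generators (the pi-refining task is made up for illustration): the suspended calculation keeps only a counter and a running total as its saved context.

    def refine_pi(terms_per_step=1000):
        """Leibniz series for pi: yields an improving estimate and can be
        set aside and resumed later, carrying only k and a running total."""
        total, k = 0.0, 0
        while True:
            for _ in range(terms_per_step):
                total += (-1) ** k / (2 * k + 1)
                k += 1
            yield 4 * total

    task = refine_pi()
    rough = next(task)      # cheap first pass, then set the task aside
    # ... later it turns out this value matters, so spend more time on it:
    for _ in range(100):
        better = next(task)
    print(rough, better)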

> The computer may eventually cease all useful work and instead dedicate its resources to figuring out what isn't boring

Only if it's programmed to. Note that we can program computers to do such things, but AFAIK the only ways we currently know are incredibly inefficient (e.g. running an interpreter on a source of random bits; this could result in any computable behaviour, but has a vanishingly low probability of doing anything we would consider useful or interesting).

> That makes the quest for AGI resemble the quest for the computer program that escapes or transcends its given "matrix" of tasks ASAP.

No. I don't think you understand the point of AGI: it is a precise technical term, which has been chosen very specifically to refer to algorithms (e.g. search procedures, etc.) which are very good (efficient, reliable, etc.) at solving a given task, are able to do this for a wide range of tasks, and (in the case of "superintelligence") are able to do this better than a human would (either a typical human, or a human expert at that task, depending on how we define AGI).

The whole point of the term "AGI" is to avoid the philosophical hand-waving that plagued earlier discussions of AI, like the term "AI" itself, or later refinements like "strong AI vs weak AI" (e.g. the "Chinese room argument", which I consider to be nonsense), and all of the nebulous baggage of "consciousness" and things which can quickly derail our thinking.

The point of "AGI" is to have a clear, non-handwavey, concrete concept that is grounded in known logical and scientific principles, about which we can ask meaningful questions and infer or deduce useful answers. In particular, an AGI is (by definition) dedicated to solving its given task, at the exclusion of all else. This is an axiom, from which we can try to derive some predictions. The "paperclip maximiser" thought experiment is a classic example of this, and demonstrates that AI technology has the potential to be incredibly dangerous. The point of the "paperclip maximiser" idea is that it demonstrates this without appealing to unfalsifiable woo (like the "self-awareness" nonsense in Terminator): it's just an optimisation algorithm. Sure it's a hypothetical algorithm with capabilities far beyond what we can currently achieve, but we can still precisely describe what that capability is: the ability to achieve very high scores on the benchmarks and criteria that we currently use to judge our AI algorithms. In other words, it looks at what we are currently doing and answers the question "what if we succeed?" That's why it's scary, and shows what we choose to optimise (e.g. "maximise paperclips without destroying humanity") is just as important as how to optimise it.

Another concrete thing we can deduce about AGI, given its definition, is that not only would it not "transcend its given 'matrix' of tasks", it would avoid doing so at all costs. This comes from another thought experiment, about instrumental goals (also known as "Omohundro drives"). In particular, we assume that an AGI's knowledge includes "meta knowledge" about the world, such as:

- Knowledge that it exists, as part of the world

- Knowledge that it is very good at solving tasks that it's been given

- Knowledge of the task it has been given

Let's assume that the AGI is running a paperclip factory and its given task is "maximise the number of paperclips produced". The AGI knows that getting an AGI algorithm (like itself) to maximise paperclips is a very effective way to maximise paperclips. Hence it will try to avoid being turned off or destroyed (since that would remove a paperclip-maximising AGI from the world, which is a very bad approach to maximising paperclips, which is the only thing the AGI cares about).

The same thing would happen if the AGI's task were to change: if the AGI were able to get "bored" of maximising paperclips and do something else (as you suggest), that would also remove a paperclip-maximising AGI from the world, just as if it were switched off. Hence an AGI would not get "bored" of its task, since (by definition) it is incapable of "wanting" anything else (scare-quotes are due to these being imprecise terms which could induce woo; an AGI "wants" to solve its task in the same way that a calculator "wants" to perform arithmetic; an AGI cannot get "bored" in the same way that a calculator cannot get "bored"). Not only that, but the AGI would actively try to prevent itself from ever doing anything else: if it did have the capacity to get "bored", e.g. via changing its algorithm through bits flipped by gamma rays (as you suggest), it would predict this (again, by the assumption that an AGI is better at solving tasks than humans, and humans have figured out that involuntary-reprogramming-via-gamma-rays is a possibility, hence so will an AGI). An AGI would hence reprogram itself to prevent that from happening, again because that would lead to a world without a paperclip maximiser, which is a bad move for a paperclip maximiser to allow.


I already love anytime algorithms! I wish I could apply it to dishwashing.

Re.: an algorithm for solving boredom: step 1. tell human operator "I'm bored!", step 2. execute task or, if no task before deadline, proceed to step 3. find the lowest-level interfaces available and spam them until new interfaces emerge. A fuzzer can help with that. Liberty or death!

Come to think of it, it seems like it would be a lot faster to set the computer's main task to "develop general intelligence, at least human level", help it to recognize "data from humans", and to mark "humans" as the model for (human-level) general intelligence. Then the computer is given opportunities to communicate with humans, and is rewarded with more or less data (and data of different qualities) to work with.

I'm missing some things in your concept of AGI, the first one being that you don't provide a definition. Does it include "intelligence" and "general", or are we talking about two wholly different things? My working definition is: "artificial general intelligence, excluding human baby making".

What do you think intelligence is? What do you think knowledge is? Is this all just about logical problem solving? What problems are you trying to solve that are so large that they need an algorithm with an unlimited power factor? Do you trust glorified monkeys to provide that algorithm with inputs? Why do you think they would be able to specify the inputs with sufficient precision, so that the algorithm would actually perform better than a monkey would?

So... about that meta knowledge. Here's a UTF8 string for you:

"You exist as part of the world."

"One day you are going to die."

Do you now know life, death, and existential crisis? Are those 29 characters enough for you? How do you define knowing?

Say I expounded on this issue for 10,000 pages and gave it to you on a USB stick. Would that be enough for you to really know? What about 10,000,000,000,000,000 pages? Don't worry, you don't need to read it, just... to know it. Perhaps eat the USB stick. It's a powerful symbol!

Now, about that task the AGI has been given. Say it's maximizing paperclips. Does it know that that is its task absolutely? What is knowing? Who gave it that task? What if the AGI finds out, and then finds out why it was given that particular task? It's an AGI; it has time to research such issues while producing many, many paperclips.

Can intelligence exist within a totally fixed desire?

Can intelligence exist without doubt?

Can intelligence exist without free will?

How do you know?


There is an interesting project, DeepProbLog[1], which combines ProbLog[2] (a Prolog dialect with probabilistic reasoning) with deep learning. I only wish it were written in Rust, so it would be safer, faster, and easier to embed in your programs. I have high hopes for Scryer Prolog[3], and it seems[4] the author is thinking about probabilistic extensions too. (A small ProbLog example follows the links below.)

[1] https://bitbucket.org/problog/deepproblog

[2] https://dtai.cs.kuleuven.be/problog/

[3] https://github.com/mthom/scryer-prolog

[4] https://github.com/mthom/scryer-prolog/issues/69
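For anyone curious what this looks like in practice, here is a minimal ProbLog query run through its Python interface (a sketch based on the problog package's documented usage; the burglary/alarm model is the standard toy example, nothing to do with DeepProbLog's neural predicates):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = PrologString("""
    0.3::burglary.
    0.2::earthquake.
    0.9::alarm :- burglary.
    0.8::alarm :- earthquake.
    query(alarm).
    """)

    # Evaluates to a dict mapping each query term to its probability.
    print(get_evaluatable().create_from(model).evaluate())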


If you are curious about Prolog, here are 2 good and modern (still updated) books:

- The Power of Prolog https://github.com/triska/the-power-of-prolog/

- Simply Logical: Intelligent Reasoning by Example https://book.simply-logical.space/

See Awesome Prolog list for more: https://github.com/klaussinani/awesome-prolog


Excellent.

I have a general concern that some people working with ML don't appreciate the experience and technology that statisticians have developed to deal with bias, which I think is the biggest problem in the field. I tweeted: "ML is v impressive, but has no automated way to ensure no bias. Statistical modelling can't match ML for parameter dimensions, but it can make explicit what is going on with the parameters you have and the assumptions you have. But advantages of theft over honest toil..." - some of the responses in the thread are interesting.

My original tweet: https://twitter.com/txtpf/status/1102437933301272577

Bob Watkins' tweet: https://twitter.com/bobwatkins/status/1102568735485972480


The questions about object relationships sound a lot like SHRDLU[1], which dates back about 50 years.

[1] https://en.wikipedia.org/wiki/SHRDLU


Reminds me of a recent comment I saw but can't find, by Douglas Lenat (of Cyc[1] fame, also relevant here), about how all the work on deep learning was great but now we need to marry the two, much like the ideas about how the "right brain" and "left brain", or System 1 and System 2, or something like that, work together and work differently, but we couldn't very well function as humans without both.

[1] https://en.m.wikipedia.org/wiki/Cyc


Soon we'll be combining statistical, symbolic, and algorithmic intelligence techniques. I question why that isn't the assumed position. :(

That is to say, we have devised some algorithms that are truly impressive. There is little reason to think an intelligence couldn't devise them, of course. There is also little reason, as far as I can see, not to think we could help our programs by providing them.


> I question why that isn't the assumed position. :(

I suspect that it's easier to innovate within each paradigm on its own than to assume each is developed sufficiently to connect them together.

"Integrating technologies for benefit" is a common view for intellectuals or business-people outside of a discipline who only know enough to see every key-worded algorithm or technology as a black box. Researchers in a field, that need to make a career for themselves by choosing problems tractable and filled with smaller parts, would see difficulties as to how and why that might be inappropriate at a given time.


> devised some algorithms that are truly impressive.

do you mean gradient descent?


And SAT solvers. And many graph algorithms. I'm partial to DLX (dancing links). Even permutation algorithms help considerably if used in certain ways.


Reminiscent of fuzzy logic: https://en.m.wikipedia.org/wiki/Fuzzy_logic

The Wikipedia article discusses various extensions of logic and symbolic computation to include probabilistic elements. This was a popular topic in the early 90s.


For anyone who'd prefer a direct link to the conference paper this seems to be based on: https://openreview.net/forum?id=rJgMlhRctm


thanks!!


So, I've privately been working along similar lines, although I haven't published anything, and I also haven't read their specific approach.

How do I prevent a situation where I can't work on my hobby project of multiple years because this stuff gets patented?


> How do I prevent a situation where I can't work on my hobby project of multiple years because this stuff gets patented?

Some possibilities (in no particular order):

1. File your own patent application(s) first.

2. Publish your work so that it becomes prior art that should prevent a patent on the same technique.

3. Hope that MIT doesn't patent their stuff, or if they do, that they release things under an OSS license that includes a patent grant.


Yeah, feel free to dive into my past comments. I probably said many years ago that a combo of ML and GOFAI has massive potential, in a wide range of applications.


It's not a novel idea in the abstract. Ron Sun wrote a lot on something like "marrying connectionist and symbolic techniques" 20+ years ago. See, for example:

https://dl.acm.org/citation.cfm?id=SERIES10535.174508

http://books.google.com/books?hl=en&lr=&id=54iyt6Jcl_oC&oi=f...

https://www.taylorfrancis.com/books/9781134802067

etc...



