Ask HN: What is the current state of "logical" AI?
32 points by mtlb on Dec 26, 2023 | 51 comments
The kind of AI that gets the public attention right now lacks a quality that can be described as "formal correctness", "actual reasoning", "rigorous thinking", "mathematical ability", "logic", "explainability", etc.

This is the quality that should be studied and developed in the symbolic AI approach. However, the actual symbolic AI work I know of seems to fall into one of two buckets: 1. "Let's solve a mathematical problem (e.g. winning at chess) and say that the solution is AI" (because humans can play chess, and now computers can too!) 2. "Let's make something like Prolog but with a different solver algorithm / knowledge representation". Products like Cyc and Wolfram seem to work essentially in this manner, although with lots of custom coding for specific cases to make them practical. There's lots of work on separate aspects of this as well, like temporal and other modal logics.

I see the first bucket as just applied maths, not really AI. The second bucket is actually aimed at general reasoning, but the approaches and achievements in it are somewhat uninspiring, maybe because I don't know many of them.

So my broad question is: what is happening in such "logical AI" research/development in general? Are there any buckets I missed in the description above, or maybe my description is wrong to begin with? Are there any approaches that seem promising, and if so, how and why?

I would be grateful for suggestions of the books/blogs/other resources on the topic as well.




AI is not even close to having true logical reasoning, that's probably decades away. The issue is that cognitive scientists are clueless. Scientists have a good model for associative reasoning, which is the basis of modern neural networks, but we don't have a clue how abstract reasoning actually works. All birds and mammals have advanced abstract reasoning and are far more intelligent than GPT-4:

- birds and mammals are inherently able to count in almost any context because they understand what numbers actually mean; GPT-4 can only be trained to count in certain contexts. GPT-4 would be like a pigeon that could count apples, but not oranges, yet biological pigeons can count anything they can see, touch, or hear. There's a profound gap in true quantitative reasoning, even if GPT-4 can fake this reasoning on specific human math problems.

- Relatedly, birds and mammals are far faster at general pattern recognition than GPT-4, unless it has been trained to recognize that specific pattern.

- Birds and mammals can spontaneously form highly complex plans; GPT-4 struggles with even the simplest plans, unless it has been trained to execute that specific plan.

The "trained to do that specific thing" is what makes GPT-4 so much dumber than warm-blooded vertebrates. When we test the intelligence of an animal in a lab, we make sure to test them on a problem they've never seen before. If you test AI like you test an animal, AI looks incredibly stupid - because it is!

There was a devastating paper back in 2019[1] proving that Google's BERT model - which at the time was world-class at "logical reasoning" - was entirely cheating on its benchmarks. And another paper from this year[2] demonstrates that LLMs definitely don't have "emergent" abilities, AI researchers are just sloppy with stats. It is amazing how much bad science and wishful thinking has been accepted by the AI community.

[1] https://arxiv.org/abs/1907.07355

[2] https://arxiv.org/abs/2304.15004


As a current researcher in the field I am perpetually annoyed by the overeagerness of AI research to make fantastical claims. Reading and extracting information from papers is a minefield, and we've learned to always at least A/B test the conclusions of any technique that is supposedly proven to be useful. Even foundational papers about basic concepts in LLMs, for example, can sometimes boil down to "this worked well on our cherrypicked tests"


Yeah, the "logical reasoning" in LLMs is mostly a marketing device to get products sold and papers published. One could hope that starting with reasoning instead of trying to get it "emerge" would do a better job. But if we have little idea of how abstract thinking actually works, this is a problem. What do you think about current logic-based AI approaches? Do they try to replicate the best ideas we've got from congnitive sciences, or trying to do their job for them?


> One could hope that starting with reasoning, instead of trying to get it to "emerge", would do a better job.

We did AI starting with reasoning (directly implementing rules of propositional logic) first; it's called expert systems.

It works very well for some things, but after some efforts to expand it with things like fuzzy logic it became pretty much accepted that we'd reached its limit.

You could hope that it would work better, but...


There's a distinction between "constrain the AI's output with logical rules to make it more reliable" and "build logical reasoning into the AI." The current strategies are trying to do the first task, and I bet it'll lead to all sorts of cool technology. I strongly doubt these techniques will extend to actual logical reasoning. Intuitively, it feels like throwing a bunch of logical rules onto AI is begging the question - I doubt bird/mammal brains actually have these logical rules baked in; I am sure it's far more sophisticated.

A trivial theorem in logic gives an example of what I mean:

If A then B <=> If (not B) then (not A)

This is really not how humans think - I don't believe we have a "contrapositive calculator" in our brain that takes arbitrary situations in and computes a contrapositive. This contrapositive theorem is a fact of the world that humans used logical thinking to understand, and which can be applied to formal logical computations that human brains aren't necessarily good at.
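For what it's worth, this mechanical equivalence is exactly the kind of thing computers handle trivially - a brute-force truth-table check in Python (just an illustration of the theorem, not a claim about how brains do it):

    from itertools import product

    def implies(p, q):
        # Material implication: "if p then q" is false only when p is true and q is false.
        return (not p) or q

    # A -> B is equivalent to (not B) -> (not A) under every truth assignment.
    for A, B in product([False, True], repeat=2):
        assert implies(A, B) == implies(not B, not A)
    print("contrapositive equivalence holds for all assignments")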

Specifically, I don't think non-human animals have "logical" thinking at all, they have causal thinking, and human logic is a consequence of us having exceptionally good understanding of causality. Logic is itself a special case of causality, formalized in a "generic" fashion by human language and used as a tool to help us think through tricky cases.

The contrapositive theorem takes a bit of thought for me to unwind - "so if B is not true then of course A can't be true" - but the way contrapositives are reflected in the real world takes no thought whatsoever, even if the examples are more algebraically complicated than A->B <=> (~B)->(~A):

- if the door is working and I have a key that can unlock the door, then if I can't unlock the door either I don't have the key or the door is broken. (A ^ B)->C <=> (~C)->(~A v ~B)

- if having gas implies my car can drive, then if my car can't drive I don't have gas - or possibly I was incorrect and my car is broken. (A->B <=> (~B)->(~A)) V ~(A->B)

These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff. I don't think feeding this limited human language into a computer is going to burst past these limits. We need to figure out how bird/mammal brains actually model things causally.


> These cases are obvious to us because the brain has access to much fancier causal reasoning than what we can currently express in human language. For now, human language is stuck with "If a then not b" stuff.

I don't follow this. Didn't you just express these cases in human language? I understand that in reality we can "grasp" the meaning of the problem of not being able to open the door without expressing or thinking about it verbally, which would be redundant, as there would be a lot to say (the key may be broken, the door may be held by someone on the other side, even if the key works we might be trying to push instead of pull, etc., etc.) and any person who has opened doors with keys would likely understand all of this. The problem is not that those things can't be expressed in human language, but the lack of ability to build good conceptual models of the world that encompass all such knowledge and allow reasoning on it quickly.


I didn't mean the specific cases, I meant the underlying mechanism that our brain uses to reason about these cases. There is something deeper going on that allows us to build rigorous world models from very thin abstractions, which can be applied to a seemingly arbitrary range of problems. It's this rigorous world model which is absent in AI and not currently explained by cognitive science.

In this example, the overall world model is able to easily accommodate "broken door" "functioning door" "key" etc. and come to a specific conclusion about this problem. The specific conclusion can be easily expressed in human language. The world model itself can't.


Aren't animals trained to do all of those things through evolution? Similarly to how GPT is trained.

Also how do you prove that GPT is worse at counting?

Because GPT can currently count both apples and oranges.


> Also how do you prove that GPT is worse at counting?

Back in June 2023 GPT-4 was dramatically worse at counting than a pigeon in the sense that it couldn't accurately tell the difference between sentences with 3 words and sentences with 5 words, whereas pigeons can count almost anything up to about 10. It also routinely failed "pick the shorter sentence" tests which I literally took from a test administered to mice. GPT simply doesn't understand what numbers are, whereas pigeons and mice have an intuitive understanding similar to toddlers. You don't need to teach kids what 3 means, you just need to teach them the human symbol for the concept of 3. GPT only has the human symbol and does not seem capable of understanding the concept.

In my testing GPT-4 consistently failed counting / pattern-recognition tests even if you used "chain-of-thought" prompting. As far as I could tell its only true understanding of numbers was "one, two, many." This seems reflected in real use cases, where GPT routinely (and hilariously) ignores commands to return 50 words/etc of output. GPT doesn't know what fifty means, it just knows what various documents that say "word count: 50" look like, and tries to imitate the tone.
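For concreteness, the kind of test I mean looks roughly like this - a sketch only, where ask_model is a hypothetical stand-in for whatever chat completion call you're using, not a real API:

    import random

    WORDS = ["door", "key", "bird", "apple", "orange", "cloud", "river", "stone"]

    def make_sentence(n_words):
        # A nonsense sentence with exactly n_words words.
        return " ".join(random.choice(WORDS) for _ in range(n_words))

    def shorter_sentence_trial(ask_model):
        # ask_model: hypothetical callable, prompt string -> reply string.
        lengths = random.sample([3, 5], 2)  # randomize which slot is shorter
        a, b = make_sentence(lengths[0]), make_sentence(lengths[1])
        correct = "A" if lengths[0] < lengths[1] else "B"
        prompt = ('Which sentence has fewer words, A or B? '
                  f'A: "{a}" B: "{b}" Answer with a single letter.')
        return ask_model(prompt).strip().upper().startswith(correct)

    # accuracy = sum(shorter_sentence_trial(ask_model) for _ in range(100)) / 100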

Since transformer neural networks lack recursion I conjecture that GPT will never be able to understand a number larger than 2, even if in specific cases it can solve counting problems up to eleventy billion. This is what I mean by "counting apples, not oranges," its sense of counting is paper-thin and easily fooled by adversarial prompts. It is much harder to fool a mouse or a pigeon.

Many of the tests I ran back in April 2023 no longer work. I strongly suspect this is because OpenAI trained GPT on many of the tests that people were throwing at it, and not because GPT actually became "smarter." I stopped messing around with GPT specifically because OpenAI doesn't issue any release notes, making replicability impossible. Mistral's 7B model was dramatically worse than even GPT-3 at counting, but I doubt they trained it to count. Not sure about LLaMA/etc.


When you are talking about "counting", do you mean the logical process of going "one", "two", "three"... or do you mean the ability to statistically estimate a quantity from the amount of signal you are processing?

E.g. are pigeons actually "counting", in the accurate way humans calculate? Or are they just responding to the signal? Similar to how a person could tell whether one sound is higher or lower pitched than another, but wouldn't be able to name the actual exact frequency.

Because to me pigeons are just similarly responding to the amount of "signal" they are receiving, not actually doing abstract reasoning.

And looking at the scientific studies, it also seems that they had to train pigeons to be able to count; they weren't able to do it out of the box.


By the way, criticising GPT's ability to count words in the sentences you give it seems quite odd to me, because the input GPT receives is actually tokens, not the words you give it.

So imagine if someone asked you a question in English, then translated it to hieroglyphs, and you didn't know English. Would you be able to count how many words were in the original English?

So it seems weird to expect that GPT would be able to count in the first place.

But if it was later taught how many words different combinations of tokens correspond to, it would be able to do that. So perhaps this is what it was taught in the meantime, yielding that better ability to count words?


Thirdly, GPT with Vision can count objects in an image very well, no matter what the objects specifically are. Does it make mistakes? Sometimes, when objects are not clearly visible, but so would humans and pigeons.


The biological neural structures that encode behavior are “trained” through evolution, but even the most advanced animals rely mostly on conditioned (= learned during their lifetime) reflexes, and not on the ones “hardcoded” evolutionarily.

Certainly not much evolutionary “training” of the human brain has happened in the last 3000 years, yet advancement in our understanding of the world has been plentiful. But human thinking (including rationality, mathematics, etc.) is on a different level from even learned animal behavior. Some great apes were taught language and even showed basic abstract conceptual thinking, but were never able to reach the level of 3-5 year old human kids.

The problem with GPTs and other statistical models is that they can learn incredibly complex patterns in anything we can express as bytes, yet cannot learn the simplest concepts of maths despite being trained on the whole corpus of mathematical texts available on the internet, while kids need only classes that can be covered in a single textbook to understand them, and adults may need just the textbook.


Regarding the claim about those models being "statistical": why wouldn't you consider the human brain, or animals' brains, to be "statistical" as well?

Because in the end, human brains, like any brains, could seemingly be thought of as statistical results of long periods of training, producing output from input. Where am I wrong?

I assume by statistical you mean that the resulting state of the neurons can be represented as numbers, and the pathways through them as probabilities of going down a certain pathway - but it occurs to me that the same is true of the human brain, no?


You are not wrong, but too abstract. Saying that the brain is statistical and therefore can be represented with a statistical model such as an artificial neural net is like saying that the brain is computational and therefore can be represented with a Turing-complete system, or that the brain is physical and can be represented via physical simulation, etc.


> Aren't animals trained to do all of those things through evolution? Similarly to how GPT is trained.

No. Animals evolved and are able to do those things. Evolution is not training, and evolution has approximately zero to do with how transformers work.


What's the difference here?

Both seem to have adaptive neural networks that change as time goes on due to a reward - for animals, mutated genes are more likely to be passed forward if the change was good. Over millions of generations it's statistically likely that more of the good genes - the ones that put the neural networks in a state better able to solve problems within the environment - get passed on, eventually resulting in an emerging intelligence. For training, you similarly change the state of the neural network depending on whether the answer is good or bad.


Evolution operates on genes, which do not encode synaptic connections for one thing. The analogy you're making here is so stretched it's hard to begin to say what's wrong with it. Backpropagation and natural selection are about as different as two things can be. About the only thing you can say they have in common is that both can be modeled as optimization processes.

What's the difference between a star and a bonfire? Both use fuel and produce heat and light.


I mean, the point was about LLMs not truly being problem solvers because they were trained to do so, as opposed to having evolved through evolution. I'm looking for what the difference is specifically within that dimension. Biological pigeons went through their own process of evolution to arrive at the type of neural networks and systems that give them the ability to count - though certainly not in all contexts.

So yes, my point is that both have an optimisation process that, over time, lends them those emergent capabilities.


No, the point was that LLMs are not good at problem solving because they are not good at problem solving, not because they were not evolved. We don't understand how we or animals solve problems, which is why we haven't yet succeeded in replicating that in AI. You're the one bringing in these incredibly strained analogies, because you want GPT-4 to be more than it is or something, I'm not sure.


The biggest difference is that training does not change the size or architecture of an artificial neural network, but biological evolution dramatically changes the size and architecture of animals' brains.

Your comparison is sincerely vacuous. It vaguely makes sense if you're talking about GPT-3 to GPT-4 (though I don't think it's helpful). It makes no sense if you're talking about training a single neural network.


I mean that exposure to a lot of training material yielded the final set of capabilities. Pigeons and their ancestors were exposed to certain situations throughout evolution, which yielded the formation of their neural networks and the ability to "count". Which I believe is not actual "one, two, three" counting, but just the amount of signal being activated resulting in a certain output from the pigeon. That's different from how a human counts, except for small numbers, where you intuitively and immediately come up with a number.

The training material consisted of situations the organisms had to produce output for, and if the output was good their genes survived, eventually forming a neural network able to handle this training material well while also producing emergent behaviour like being able to "count".

But GPT with Vision can easily do what a pigeon can. What's the exact thing that implies the pigeon is doing it somehow more intelligently?

If you ask either about the quantity of something in a picture, I'm pretty sure both respond to the amount of that type of signal received - through light waves, or through encoded pixels for GPT.


> Aren't animals trained to do all of those things through evolution? Similarly to how GPT is trained.

If you interpret this question at the most abstract level of "aren't both solutions arrived at through training/trial+error method?" - then the answer is probably yes, they are both arrived at in some conceptually similar manner.

But they are two very different underlying systems and we don't really understand the biological systems well enough to even be able to truly compare.

Beyond that, it seems that humans (switching to humans from pigeons) have some sort of representation/understanding of the world around us such that even if we produce the same result as ChatGPT to a counting question, the information stored within our systems is not equivalent.


> aren't both solutions arrived at through training/trial+error method?

But both also have an underlying neural-network type of structure that takes in input, produces output, and changes underneath to then develop emergent capabilities (like the counting).

> But they are two very different underlying systems and we don't really understand the biological systems well enough to even be able to truly compare.

> Beyond that, it seems that humans (switching to humans from pigeons) have some sort of representation/understanding of the world around us such that even if we produce the same result as ChatGPT to a counting question, the information stored within our systems is not equivalent.

What is the reason to believe that the way pigeons count is anything other than responding to certain signals:

1. Light waves coming as input.

2. Some transformation layers that will abstract the input further.

3. Then pigeons do not really count as people do; they just respond to the rough feeling of "quantity", or the amount of signal received. Because as I understand it, the studies prove the ability to "count" by having them just differentiate between counts and getting rewarded if they are able to do it.

And GPT with Vision can easily do similar things. I can give it an image and ask how many objects are there, and up to a point it can answer correctly, given the image is clear enough.

Similarly, pigeons didn't have 100% accuracy in counting. So they are not doing the "one, two, three"; to me they just seem to be responding to an "amount of signal". Similar to how we can tell that one sound is louder than another: we are not actually counting the frequencies of the sound, we don't even know what produces the sound, we just perceive that one signal is louder than the other.

Pigeons, after being trained to respond to a certain amount of something, will associate a strong signal of that kind with the reward. This seems like something a very basic machine learning algorithm can handle, even more basic or smaller in scale than an LLM. So what makes an animal smarter, then?


> But both also have an underlying neural-network type of structure that takes in input, produces output, and changes underneath to then develop emergent capabilities (like the counting).

Sure, at an abstract level you could say that, but it requires such a level of abstraction that comparisons don't really mean much. The differences in how the systems function could cause significant differences in underlying functionality and emergent behavior.

For example, some differences when you get into the details:

1-Biological brains have astrocytes that manage the synapses and activity of neurons, constantly and dynamically bringing together different and changing populations of neurons to perform functions (inhibiting some and enhancing activation in others).

2-Neurons aren't the only computational units, astrocytes are also computational units involved in (at minimum based on recent studies) learning and object recognition.

3-Some cells like Purkinje cells learn patterns even when isolated. Something within the cell is learning/storing information about timed patterns of activity and can respond appropriately when the pattern is re-encountered.

4-Dendrites frequently perform preprocessing on signals prior to the signal being forwarded to the neuron's soma.

5-Rabbit's olfactory learning and memory is interesting, read up on it if you get a chance. A neuroscientist expert in that field has a theory that matches the data that the network within the olfactory region goes through a minimum-energy type of reconfiguration each time a new scent is detected that is related to some positive or negative. This is interesting from the perspective of how do brains get dynamically reconfigured with learning.


> 1-Biological brains have astrocytes that manage the synapses and activity of neurons, constantly and dynamically bringing together different and changing populations of neurons to perform functions (inhibiting some and enhancing activation in others).

Wouldn't this be something that could be mirrored, or performed even better, by just having more layers of neurons in the network? Of course, in addition you could have multiple LLMs doing such work together, selecting the best optimised systems for a given task. But based on my limited knowledge, it seems like astrocytes are more for maintenance of biological systems, which LLMs wouldn't necessarily have to deal with in the first place. So I'm not sure what kind of advantage astrocytes exactly would bring.

> 2-Neurons aren't the only computational units, astrocytes are also computational units involved in (at minimum based on recent studies) learning and object recognition.

But again - what additional benefit would they provide over extra layers, or over LLMs that act together according to an orchestrating LLM?

> 3-Some cells like Purkinje cells learn patterns even when isolated. Something within the cell is learning/storing information about timed patterns of activity and can respond appropriately when the pattern is re-encountered.

But neural networks can in general learn patterns when isolated. It also seems they are more for physical movement, which we should care about more when building robots rather than text-based intelligence. Although it seems like for physical movement there are other blockers, like materials. It's all data structures that take input and produce output, receive feedback on whether it worked out well, and adapt accordingly. I'd assume that if LLMs' neural networks were given a look, there would be many pockets like the Purkinje cells.

> Dendrites frequently perform preprocessing on signals prior to the signal being forwarded to the neuron's soma.

Again, it seems like extra layers of neurons. I assume that with LLMs and other ML tools, additional layers will start to converge on a specific set of processing and functionality as they train more. Preprocessing is just a way to break a larger task into smaller subtasks.

> Rabbit's olfactory learning and memory is interesting, read up on it if you get a chance. A neuroscientist expert in that field has a theory that matches the data that the network within the olfactory region goes through a minimum-energy type of reconfiguration each time a new scent is detected that is related to some positive or negative. This is interesting from the perspective of how do brains get dynamically reconfigured with learning.

I should

But overall it all still seems like the same concept, just orchestrated differently in certain ways. In the biological case it has had to tackle problems that an LLM hasn't really had to, since it has had billions of years to evolve those layers of different systems, but it also carries a lot of tooling within it to deal with environmental limitations.

So it seems that, given enough computing power, we should be able to make something that is more intelligent than a human, which I do think GPT-4 already is in so many things.

I'm also not sure what exactly GPT-4 would be less intelligent at compared to any animal. If you give it the proper input, it should be able to perform at least at the level of any animal, maybe not with the same speed - since, as you mentioned, animals and humans in general have many neural networks within their bodies that correspond only to a certain function and are optimised specifically for it.


> Wouldn't this be something that could be mirrored, or performed even better, by just having more layers of neurons in the network? Of course, in addition you could have multiple LLMs doing such work together, selecting the best optimised systems for a given task. But based on my limited knowledge, it seems like astrocytes are more for maintenance of biological systems, which LLMs wouldn't necessarily have to deal with in the first place. So I'm not sure what kind of advantage astrocytes exactly would bring.

Regarding astrocytes and cell maintenance vs computation: The picture is getting more complex at a steady pace as scientists learn more. Now they know that astrocytes wrap around synapses (the "tripartite" synapse), detect and emit neurotransmitters and gliotransmitters, have internal calcium signaling, and are involved in learning and object recognition.

Regarding "Wouldn't this be...more layers of neurons...": Possibly, maybe probably. Those examples didn't really describe any functional capability, they just described how different our machine is from many people's understanding of how our brain works, which helps illustrate why comparisons to something like ChatGPT are difficult.

> So it seems that, given enough computing power, we should be able to make something that is more intelligent than a human, which I do think GPT-4 already is in so many things.

When you say "enough computing power" do you mean just expanding ChatGPT's number of parameters and training set? I don't personally think that will do the job, I think the key is identifying and providing the specific functional capabilities that our brain utilizes. And I think that requires an approach that is different than just expanding the size of the network and training.

> I'm also not sure what exactly GPT-4 would be less intelligent at compared to any animal.

Do you mean ChatGPT's current capabilities? If so, animals model the 4D environment they exist in, ChatGPT is obviously limited in areas like that. Those internal models can be key for some types of knowledge.


> The picture is getting more complex at a steady pace

I'm sorry, but it still seems like the same concept to me. Am I misunderstanding something? It all seems to be about having some sort of signal travelling through various pathways, where there's a mechanism to reward/punish the signal, which then gets adapted via whatever method of storage.

It would just be a matter of having the proper weights and pathways for the signal to travel through to yield the desired results. The issues will be with performance, as in how fast we get results from the signal, but nonetheless the concept seems the same to me.

> Possibly, maybe probably. Those examples didn't really describe any functional capability, they just described how different our machine is from many people's understanding of how our brain works, which helps illustrate why comparisons to something like ChatGPT are difficult.

It's just that it all seems conceptually very similar to me. And when I try to reason about how my own intuition and reasoning work, it all makes sense. Fast and slow thinking make sense as well. Fast thinking, or "intuition", is what gives you the gut feeling about something, which I believe is the case for animals as well as for machine learning algorithms. Both I and LLMs have a "model" of the world; the model is in some way represented in the connections and weights of the neurons and other things. Slow thinking is more like first brainstorming ways to solve a problem - which LLMs can do - and then brute-forcing them, backtracking, solving the maze. LLMs may not have perfected this completely yet, but it honestly doesn't seem that far away to me, and I wouldn't be surprised if it was just a problem of scaling up the number of neurons/layers, etc.

> When you say "enough computing power" do you mean just expanding ChatGPT's number of parameters and training set? I don't personally think that will do the job

I can't guarantee it yet, but seeing the difference between e.g. GPT-4, GPT-3.5, and open-source models, there seems to be a clear difference in the power of understanding instructions and coming up with very impressive ways to solve problems, in my view. So considering I haven't seen the next level after GPT-4 yet, it's hard for me to believe there wouldn't be a significant jump in performance when increasing the magnitude - unless someone has already tried it and it was proven not to matter.

I suppose I will be able to have a more accurate opinion when I see what GPT-5 can do. To me GPT-3.5 is quite useless, but GPT-4 is amazing for so many of the use cases I've tried. And in my view the neurons and connections must represent some form of modelling of the world to be able to explain those results.

If expanding it isn't enough, then I would still believe now - after having seen GPT-4 - that if we try enough different arrangements we can reach human-level intelligence.

> Do you mean ChatGPT's current capabilities? If so, animals model the 4D environment they exist in, ChatGPT is obviously limited in areas like that. Those internal models can be key for some types of knowledge.

Can you give an example of a problem that an animal can solve but GPT couldn't?

Because GPT can handle 2d and 3d. I'm not sure what you mean by 4d - is that including time, like video? In that case we could try a loop where we ask GPT for actions after presenting an image, keeping it in a feedback loop. It could prove the ability to reason, except of course for the performance side of it. But performance we can solve later; at the moment I would just like to see whether it can perform at least at an animal's level, even if not as quickly.


> I'm sorry, but it still seems like the same concept to me. Am I misunderstanding something? It all seems to be about having some sort of signal travelling through various pathways, where there's a mechanism to reward/punish the signal, which then gets adapted via whatever method of storage.

When I said the picture is getting more complex, it was in response to your statement that you thought astrocytes were just for cell maintenance, not computation, so I was providing some details about how those cells are involved in computation (not just cell maintenance).

> I can't guarantee it yet, but seeing the difference between e.g. GPT-4, GPT-3.5, and open-source models, there seems to be a clear difference in the power of understanding instructions and coming up with very impressive ways to solve problems, in my view.

While I think ChatGPT is very impressive, I don't think it has "understanding", otherwise it wouldn't happily explain to you how to calculate the 4th side of a triangle. A human knows that a triangle has 3 sides, and a question about a 4th side is inconsistent with his/her internal model. ChatGPT just has statistical data about the relationships between words, which is why it told me how to calculate that 4th side.

> I'm not sure what you mean by 4d - is that including time, like video?

Yes, time, but not necessarily video. Time is incorporated into the patterns we detect and the internal models we build.


Evidently pigeons can count to 9. When I asked ChatGPT-4 to identify the irrational statements in this comment, it said there were 9, and I'm pretty sure pigeons can't tell if something is irrational or not.


This may only be tangentially related, but you might be interested in the recent research on Qualitative Constraint Satisfaction Problems - a good introduction to the topic is Manuel Bodirsky's habilitation thesis [1].

The purpose of the subject is, roughly speaking, to exhaustively characterize all types of reliable reasoning which can be carried out efficiently - some people say they are searching for "a logic for P". The techniques used are a mix of ideas from model theory, universal algebra, Ramsey theory, and computer science. Given the ridiculously ambitious scope of the project, I think the rate of progress (especially in the past few years) is astounding.

[1] https://arxiv.org/pdf/1201.0856.pdf


Statistical models based on gigantic text databases do not bring logical reasoning any closer, even if they are called AI.


Something that would massively improve language models' ability to reason is whiteboarding: being trained to make, review, improve, and add to notes, while maintaining a consistent goal.

I am unaware of anyone who can reason to any serious depth without a paper, computational, or actual version of a whiteboard.

This doesn’t seem like a particularly challenging thing to add to current shallow (but now quite wide) reasoning models.
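As a toy sketch of what I mean by a whiteboard loop (ask_model is a hypothetical stand-in for any chat-model call, and the prompt wording is just illustrative, not a tested recipe):

    def whiteboard_loop(ask_model, goal, steps=10):
        # The "whiteboard": persistent notes the model re-reads and revises each pass,
        # while the goal stays fixed across iterations.
        notes = ""
        for _ in range(steps):
            notes = ask_model(
                f"Goal: {goal}\n"
                f"Current whiteboard:\n{notes}\n\n"
                "Review the whiteboard, correct any mistakes, add the next step, "
                "and return the full updated whiteboard."
            )
        return notes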

Imagine how fast you could think if you had a mentally stable whiteboard that you could perceive as clearly as you can see, and update as fast as you can think the changes.

Our brains have probably been tragically speed limited by our slow vocal & finger speeds for some time.

That will take AIs to a wide AND deep reasoning level far beyond us very quickly.

Now add mental file cabinets and an AI could trivially keep track of many goals and its progress on them. Again, not likely to be a huge challenge to add.

Now, given all that long term reasoning ability, let the AI manage instances of itself working across all the problems with speed adjusted for priority & opportunity.

Finally, have the model record every difficult problem it solved, so its fast, wide (non-whiteboard) abilities can be tuned, moving up level after level. Occasionally do a complete retraining on all data and problem-solution pairs. Again, straightforward scaling.

Every new dimension they scale quickly surpasses us & keeps improving.

At this point, IMHO, anyone pessimistic about AI has expectations far behind the exponential curve we are in. Our minds constantly try to linearize our experiences. This is the worst time in history to be doing that.


>Imagine how fast you could think if you had a mentally stable whiteboard that you could perceive as clearly as you can see, and update as fast as you can think the changes.

Thinking about what I am going to draw or write on the whiteboard takes the bulk of the time, not the act of drawing or writing. The "update as fast as you can think" part will likely be achieved soon with a neural interface, yet it's hard to imagine that this will lead to "superintelligence" of some sort. Same for "mental file cabinets": real or digital files let you trivially store information, and search systems let you retrieve it pretty quickly, yet somehow Google didn't make everyone who can use it super smart.

The same goes for vocal speed: coming up with the words to describe an idea and coming up with the idea itself are different things, the second being much harder.

> At this point, IMHO, anyone pessimistic about AI has expectations far behind the exponential curve we are in.

The problem is that the crucial aspect of reasoning is missing in the state-of-the-art models right now. We can make LLMs write to and read from files, but as long as there is a chance that any of their output will be incoherent (and there's a good chance of this now) and there is no mechanism to actually check for errors logically, the whole whiteboard architecture will be a huge demonstration of "garbage in, garbage out".


Our minds do operate internally much faster than our mouths or fingers.

The speed at which we go from thought to thought internally is lightning fast, compared to how fast we operate when we have to update the subjects of those thoughts with pen and paper, or explain every step of our thinking, as we make it, to someone else verbally.

Our brain is far more densely connected and faster operating than the brain signals sent to direct a physical arm and hand, pen, and paper, back through the visual system.

Being able to adjust any stable visualization in the mind by just visualizing the change to instant effect, removes mental friction and increases internal bandwidth.

Any removal of friction or increased bandwidth to thinking is profound.

Slower, more careful thinking, and slower collaborative thinking, are often helpful. But being slowed down by limitations is never a help.


An interesting approach I came across at NeurIPS a few weeks ago is called "ML with Requirements"[1]: https://arxiv.org/abs/2304.03674

My basic understanding is that it combines "standard" supervised learning techniques (neural nets + SGD) with a set of logical requirements (e.g. in the case of annotating autonomous driving data, things like "a traffic light cannot be red and green at the same time"). The logical requirements not only make the solution more practically useful, but can also help it learn the "right" solution with less labelled data.
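For a flavour of how a requirement like that can be wired into training - my own sketch of one common encoding, a differentiable penalty added to the usual loss, not necessarily the paper's exact formulation (the 0.1 weight and the product encoding are arbitrary choices here):

    import torch
    import torch.nn.functional as F

    def loss_with_requirement(logits, labels):
        # logits, labels: (batch, 2) for the two attributes [red_light, green_light].
        probs = torch.sigmoid(logits)
        supervised = F.binary_cross_entropy(probs, labels)
        # Requirement: "a traffic light cannot be red and green at the same time".
        # Penalise any probability mass assigned to both labels simultaneously.
        violation = (probs[:, 0] * probs[:, 1]).mean()
        return supervised + 0.1 * violation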

[1] I don't know if they had a NeurIPS paper about this; I was talking to the authors about the NeurIPS competition they were running related to this approach: https://sites.google.com/view/road-r/home


See https://cacm.acm.org/magazines/2023/6/273222-the-silent-revo... and also modern production rules engines like https://drools.org/

Oddly, back when “expert system shells” were cool, people thought 10,000 rules were difficult to handle; now 1,000,000 might not be a problem at all. Back then the RETE algorithm was still under development and people were using linear search rather than hash tables to do their lookups.

Also https://github.com/Z3Prover/z3
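Z3 is also very approachable from Python these days (the z3-solver package); a minimal sketch, checking that the door/key example from elsewhere in this thread is a valid entailment:

    from z3 import Bools, And, Or, Not, Implies, Solver

    working, key, opens = Bools('working key opens')
    s = Solver()
    s.add(Implies(And(working, key), opens))  # working door + right key => it opens
    s.add(Not(opens))                         # ...but it didn't open
    # Does it follow that the key is missing or the door is broken?
    # Assert the negation of that conclusion and look for a counterexample.
    s.add(Not(Or(Not(key), Not(working))))
    print(s.check())  # unsat: no counterexample, so the conclusion is entailed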

Note “the semantic web” is both an advance and a retreat, in that OWL is a subset of first-order logic that is actually decidable and sorta kinda fast. It can do a lot, but people aren’t really happy with what it can do.


Thanks, modern rule engines and description logic formalisations are something for me to explore! Are there any other practical applications of such advanced SAT solvers?


At https://www.categoricaldata.net we claim that symbolic AI is also generative, when e.g. used in data warehousing. Instead of e.g. new images, the generativity gives you new primary and foreign keys, new ontologies, contradiction detection, etc.


Is there a conceptual difference between a categorical database and a hypergraph database?


Yes; categories extend traditional graphs with systems of equations. Hypergraphs extend traditional graphs by allowing edges to be between multiple nodes. Most operations on categories are formally undecidable because of the systems of equations; most operations on graphs/hypergraphs are decidable. This makes working with categorical databases a lot like doing computer algebra in e.g. Mathematica and provides a huge increase in expressive power (you can e.g. encode Turing machines with equations.)


One of the approaches is currently on the front page: https://news.ycombinator.com/item?id=38767815


"Formal reasoning" or "logic" as you suggest is a model for finding "truth" from static inputs and simple operations. However, if the inputs are random variables (they have an associated distribution) then so (likely) are the outputs, and "truth" is still a random variable. The world we live in is better modeled by the latter than the former, and as such the "decision tree" approach of AI seems like a more reasonable approach and model to finding "truth" than a strictly mathematical approach.


You can still have a stochastic model that works on and/or produces, e.g., Coq formalisms.


Chess is best solved by fuzzy fake logic or whatever you want to call it.

Formal correctness is drastically different from “actual reasoning”.


Is there an “I” part in logic at all? We ourselves aren’t logical. We happened to invent/discover logic as a way to interact more closely with the world and learned to basically simulate a weak, leaky logic-machine runtime in our minds. Later someone smart offloaded it to electronics (made with that exact principle, btw, which is one of those “hidden right before your eyes” types of nuance). Custom coding is probably the correct answer.


Zephyr is pretty good. Real pragmatist that one.


https://ollama.ai/library/zephyr

I'd like to point out that higher quantization is certainly better (q8 at a minimum for consistently better results).


Gemini Ultra should show good progress according to Google - it's supposed to perform better than 85% of computer science competitors, which requires a lot of logical reasoning. Let's see it once it goes live, but it sounds promising.


Their previous model was better than 46% of such competitors (according to them), so 85% seems achievable by throwing more compute resources at typical ML training. After all, training on millions of examples of logical reasoning will undoubtedly store logical rules in the model in some shape or form (it does so even in ChatGPT), yet the results are still more "convincing" than "correct", or "probably correct" at best, usually achieved with lots of postprocessing on top. GPT-4 is better than 90% of lawyers at the bar exam, yet still manages to fail at reasoning in much simpler domains.


Imaginary models hyped/faked/lipsticked by a PR department, described in the future tense, in a field that advances on a daily basis - that's a pretty weak argument in most discussions.



