There are examples of learning reasoning from scratch with reinforcement learnin...

ipaddr · 2025-01-30T03:12:37 1738206757

Now you are asking for a perfect model of the system. Reinforcement learning works by discovering boundaries.

tracker1 · 2025-01-30T09:26:18 1738229178

Now rediscover all the plants that are and aren't poisonous to most people.

dchichkov · 2025-01-30T22:43:21 1738277001

I've suggested that the long context should be included in the prompt.

In your particular case, the prompt would look something like: <pubmed dump> What are the plants that aren't poisonous to most people?

A general reasoner would recover the language and a relevant world model from the PubMed dump, and would then reason over it to perform the task.

It doesn't look like a particularly efficient process.
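
For illustration only, the prompt layout described above (raw corpus first, question last) could be assembled roughly like this. It is a minimal sketch: the function name `build_long_context_prompt`, the `max_chars` cap, and the placeholder documents are all hypothetical, and no particular model or API is assumed.

```python
def build_long_context_prompt(corpus_docs, question, max_chars=None):
    """Join raw documents into one context block and append the question.

    corpus_docs: list of strings (stand-ins for a PubMed dump).
    max_chars:   optional cap on the context size (hypothetical knob;
                 a real system would count tokens, not characters).
    """
    context = "\n\n".join(corpus_docs)
    if max_chars is not None:
        # Truncate only the context so the question always survives.
        context = context[:max_chars]
    return f"{context}\n\n{question}"


# Placeholder documents, not real PubMed content.
docs = ["<abstract 1 text>", "<abstract 2 text>"]
question = "What are the plants that aren't poisonous to most people?"
prompt = build_long_context_prompt(docs, question)
```

The question is placed after the dump so that truncating the context never removes the task itself.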