> They're just outputting tokens that resemble a reasoning process.
Looking at one such process of emulating reasoning (I've got deepseek-70B running locally), I'm starting to wonder how that differs from actual reasoning. We "think" about something, may make errors in that thinking, look for things that don't make sense, and correct ourselves. That "think" step is still a black box.
I asked that LLM a typical question about gas exchange between two containers, and it made some errors and noticed calculations that didn't make sense:
> Moles left A: ~0.0021 mol
> Moles entered B: ~0.008 mol
> But 0.0021 +0.008=0.0101 mol, which doesn't make sense because that would imply a net increase of moles in the system.
Well, that's a completely invalid calculation; it should be a "-" in there. It also noticed elsewhere that those two quantities should be the same.
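For reference, the sanity check the model was fumbling toward is just conservation of moles: whatever leaves A must enter B, so the difference of the two figures should be ~0, not their sum. A minimal sketch using the model's own numbers (variable names are mine):

    # Conservation check: moles leaving A should equal moles entering B.
    n_left_A = 0.0021       # mol, as quoted by the model
    n_entered_B = 0.008     # mol, as quoted by the model
    imbalance = n_left_A - n_entered_B  # should be ~0 if the two results are consistent
    print(f"imbalance: {imbalance:.4f} mol")  # -0.0059 mol, so the two partial results disagree

Instead it summed the two figures, and only then noticed that the total implied moles appearing out of nowhere.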
Eventually, after 102 minutes and 10141 tokens, and after checking the answer from different angles multiple times, it output an approximately correct response.