> As long as an LLM is capable of inserting "9.99 > 10.01?" into an evaluation tool, we're on a good way
ChatGPT will switch to Python for some arithmetic, with the result that you get floating-point errors on problems an 8-year-old would get right. I think "switch to a tool" still requires knowing which tool gives a reliable result, which in turn means understanding the problem. It's an interesting issue.
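For illustration, here is a minimal sketch of the kind of floating-point surprise I mean, plus one possible "right tool" for it. This is not a claim about what ChatGPT actually runs internally. The choice of Python's standard `decimal` module as the fix is my own assumption about a sensible tool. Binary floats can't represent most decimal fractions exactly, whereas `Decimal` built from strings does base-10 arithmetic the way you would on paper:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum picks up a tiny rounding error.
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Decimal values constructed from strings use base-10 arithmetic,
# matching the pencil-and-paper answer.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                    # 0.3
print(decimal_sum == Decimal("0.3"))  # True

# The comparison from the quote, and a subtraction, done in Decimal.
print(Decimal("9.99") > Decimal("10.01"))  # False
print(Decimal("10.01") - Decimal("9.99"))  # 0.02
```

Both tools are "Python", but only one of them gives the schoolbook answer here. Knowing to reach for `Decimal` (or to round the float result) is exactly the part that needs an understanding of the problem.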