Don’t use ChatGPT to solve problems

ftxbro · on May 10, 2023

> "In an experiment, I asked 40 questions about PostgreSQL. 23 came back with misleading or simply inaccurate information. Of those, 9 came back with answers that would have caused (at best) performance issues. One of the answers could result in a corrupted database (deleting WAL files to recover disk space)."

I am curious whether these tests could be written down and run against several versions of GPT (2, 3, 3.5, 4). Not because I think they will solve it, but to see a trend and to have a benchmark ready if we see 4.5 or 5. Maybe even test with the 'Code Interpreter' plugin.

neximo64 · on May 10, 2023

Funny, you could say the same about people too, including experts who have misaligned incentives.

jstx1 · on May 10, 2023

If you aren't showing the prompts and outputs and documenting the model version, you aren't saying anything meaningful.
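
For what it's worth, the bookkeeping this asks for is small. Below is a minimal sketch of one way to record each trial so that others can re-run it. The `Trial` fields, the verdict labels, and the helper names are all hypothetical, and the example rows are placeholders, not results from the experiment quoted above.

```python
import json
from collections import Counter
from dataclasses import dataclass, asdict


@dataclass
class Trial:
    """One question put to one model (all field names are hypothetical)."""
    model_version: str  # exact version string as reported by the provider
    prompt: str         # the prompt, verbatim
    output: str         # the model's answer, verbatim
    verdict: str        # assumed labels: "correct", "misleading", "harmful"


def summarize(trials):
    """Count verdicts per model version so versions can be compared."""
    summary = {}
    for t in trials:
        summary.setdefault(t.model_version, Counter())[t.verdict] += 1
    return summary


def to_jsonl(trials):
    """Serialize trials as JSON Lines, one record per line, for publishing."""
    return "\n".join(json.dumps(asdict(t)) for t in trials)


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real model output.
    trials = [
        Trial("model-a", "How do I free disk space used by WAL files?",
              "<paste model output here>", "harmful"),
        Trial("model-b", "How do I free disk space used by WAL files?",
              "<paste model output here>", "correct"),
    ]
    print(to_jsonl(trials))
    for version, counts in summarize(trials).items():
        print(version, dict(counts))
```

Publishing the JSON Lines file alongside the per-version counts would let anyone check the grading and repeat the same questions against a newer model.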