Until someone publishes a systematic quality assessment, we're grasping at anecdotes.

It is unfortunate that the questions of "how well did the LLM do?" and "how does 'grading' work in this app?" seem to have gone out the window when HN readers see something shiny.


Yes. And the article is a perfect example of the dangerous sort of automation bias that people will increasingly slide into with LLMs. I realize Karpathy is somewhat incentivized toward this bias given his career, but he doesn't spend even a single sentence suggesting that the results need further inspection, or that they might be inaccurate.

The LLM is consulted like a perfect oracle, flawless in its ability to perform the task, and it's left at that. Its results are presented entirely uncritically.

For this project, of course, the stakes are nil. But how long until this unfounded trust in LLMs works its way into high-stakes problems? Centuries of deterministic machines have ingrained in us a trust in machine reliability that should be suspended when dealing with an inherently stochastic device like an LLM.

