
These logs get manually reviewed by humans, sometimes after automated systems have annotated them first. The setups for manual review typically involve half a dozen steps, with different people reviewing, comparing reviews, revising comparisons, and overseeing the revisions (source: I've done contract work at every stage of that process, and have half a dozen internal documents for a company providing this service open right now). A lot of money is being pumped into automating parts of this, but a lot of money also still flows into manually reviewing and quality-assuring the whole process. Any logs showing significant quality declines would get picked up and filtered out pretty quickly.



So you are saying that if we can run these other LLMs for ChatGPT to talk to more cheaply than they can review the output, then we either have a monetary denial-of-service attack against them, or a money-printing machine if we can get to be part of the review process. (Apparently I can't link to my favorite "I will write myself a minivan" comic because someone got cancelled, but I trust the reference will work here without a link or a political back-and-forth erupting.)


> apparently I can't link to my favorite "I will write myself a minivan" comic

It looks like it's been mirrored in several places, e.g.:

https://english.stackexchange.com/questions/488178/what-does...


No.

Because the output of that review process is better training data.

To make using the real prompts as part of that process problematic, you'd need to produce data that is more expensive to review and improve than the random crap from users who are often entirely clueless, and/or data that makes the output of the training process worse.

Trying to compete with real users on producing junk input would prove a real challenge in itself - you have no idea what kind of utterly incomprehensible drivel real users ask LLMs.

But part of this process also already includes writing a significant number of prompts from scratch, testing them, and then improving the response, to create training data.

From what I've seen, I doubt there is much of a cost saving in using real user prompts there. The benefit you get from real user prompts is a more representative sample, but if that sample starts producing shit, you'll just not use it, or not use it as much, or only use e.g. prompts from subsets of users you have reason to believe are more likely to be representative of real use.

Put another way: you can hire people to write prompts to replace that side of it far cheaper than you can hire people who can properly review the output of many of the more complex prompts, and the time taken to review the responses is far higher than the time to address issues with the prompts. One provider often tells people to spend up to ~1h reviewing responses that involve simple coding tasks, for example, but the prompt might be "implement BTree."



