
OpenAI's problem is demonstrating how much value their tools add to a worker's productivity.

However, calculating how much value a worker adds to an organization is already a mostly unsolved problem for humanity, so it is no surprise that even if a tool 5xs human productivity, the makers of the tool will have serious trouble demonstrating its value.




It's even worse than that now: they need to demonstrate how much value they bring compared to Llama in terms of worker productivity.

While I've no doubt GPT-4 is a more capable model than Llama 3, I don't get any benefit from it compared to Llama 3 70B. In a real-use benchmark I ran in a personal project last week, both gave solid responses most of the time and made stupid mistakes often enough that I can't trust them blindly, with no flagrant difference in accuracy between the two.

And if I want to use a hosted service, Groq runs Llama 3 70B much faster than GPT-4, so there's less frustration waiting for an answer (I don't think it matters too much in terms of productivity, as the wait is pretty negligible in reality, but it does affect the UX quite a bit).
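
If anyone wants to run the same kind of side-by-side check, here's a minimal sketch. The Groq base_url and the model IDs are assumptions on my part; verify them against the providers' docs.

    import time
    from openai import OpenAI  # pip install openai

    # One OpenAI-compatible client per backend. The Groq endpoint and the
    # model IDs below are assumptions -- check the providers' docs.
    backends = {
        "gpt-4": (OpenAI(), "gpt-4"),  # reads OPENAI_API_KEY from the environment
        "llama3-70b": (
            OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_GROQ_KEY"),
            "llama3-70b-8192",
        ),
    }

    def compare(prompt: str) -> None:
        """Send the same prompt to each backend and print the answer with latency."""
        for name, (client, model) in backends.items():
            start = time.perf_counter()
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            print(f"--- {name} ({time.perf_counter() - start:.1f}s) ---")
            print(resp.choices[0].message.content)

    compare("Summarise the trade-offs between a mutex and a semaphore in two sentences.")

Judging answer quality is still eyeballing, but at least the latency difference gets measured rather than felt.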


Since 1987, labour productivity has doubled[1]. A 5x increase would be immediately obvious. If a tool were able to increase productivity on that scale, it would lift every human out of poverty; it would probably turn humanity into a post-scarcity species. 5x means "by Monday afternoon, staff have each done 40 pre-AI-equivalent hours' worth of work".

[1] https://usafacts.org/articles/what-is-labor-productivity-and...


But how do you measure labour productivity?


Macro scale: GDP / labor hours worked.

Company scale: sales / labor hours worked.

It's very hard to measure at the team or individual level.
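
For concreteness, a toy version of those two ratios (every figure below is a made-up placeholder, not a real statistic):

    # Toy labor-productivity calculation; all numbers are made-up placeholders.
    gdp = 27_000_000_000_000          # annual GDP, dollars (hypothetical)
    economy_hours = 260_000_000_000   # total labor hours worked that year (hypothetical)
    print(f"macro:   ${gdp / economy_hours:.2f} of output per hour worked")

    company_sales = 50_000_000        # one company's annual sales (hypothetical)
    company_hours = 400_000           # hours worked by its staff (hypothetical)
    print(f"company: ${company_sales / company_hours:.2f} of sales per hour worked")

Both ratios only exist at the aggregate level, which is part of why they don't decompose cleanly to a team or an individual.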



