Hacker News

>> The training cost of GPT-4 is now only 1/3 of what it was about a year ago. It is absolutely staggering how quickly the price of training an LLM is dropping, which is great news for open source. The Google memo was right about the lack of a moat.

That really doesn't change anything at all. The cheaper it gets to train large models, the larger the models that big corporations can train relative to everyone else.

Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.




At a certain point though, models become good enough for particular tasks. Once that happens for whatever my application is, I don't care if OpenAI has a model that's twice as good on some metric, because it's overkill for my use-case. I'm going to be happy using a smaller, cheaper model from a competitor.


I think we're far from that point though. For the vast majority of use cases, I always wish that the answers could be more accurate.

Sure - they might be 'good enough' to build a business on. But if a competitor builds their business on top of a more accurate model, their product will work better, and they will win the market.


Yea, but the benchmark being discussed here is FOSS, which for me, and many others, translates to: can I run something useful in my closet or on my phone? I've found LLaMA neat, and yeah, some FOSS models are getting decent, but they're a far cry from GPT-4. I pay for GPT-4, use it almost daily, and that's my benchmark.

Yes, when I can run GPT-4 in my closet, OpenAI will have GPT-7 or whatever, but that doesn't change the fact that I'll have something useful running in my closed network, and that opens up all kinds of data integration that I'm unwilling to ship to OpenAI. On that day I'll probably still use GPT-7, but I'll _also_ have GPT-4 running in my closet, integrating with a ton of things on my local network.


My guess is you'll be running a GPT-4 equivalent in your closet, but with a 4K context window.

Where the big guys will have GPT-who-cares-what-version with a 100K context window.

Context size is as big a deal as newer model generations, imo.


Am I right in my layman's understanding that context windows scaling up requires (mainly) much more compute at run time? Or do longer context models require different/longer training?
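(For scale, here's my rough understanding sketched in code, not something from any reply here: naive self-attention builds an n-by-n score matrix, so runtime cost grows quadratically with context length, and training on longer sequences pays the same cost, which is why long-context models are often trained or fine-tuned specifically for it. The model width of 4096 is a hypothetical.)

```python
# Illustrative only: FLOPs for the QK^T score matrix in naive self-attention.
# Real models add projections, MLPs, KV caches, and attention variants that
# change the constants, but the quadratic term dominates at long contexts.
def attention_score_flops(context_len: int, d_model: int = 4096) -> int:
    # context_len x context_len dot products, each of length d_model
    return context_len * context_len * d_model

# Going from a 4K to a 100K window multiplies this term by (100/4)^2 = 625.
ratio = attention_score_flops(100_000) / attention_score_flops(4_000)
print(f"{ratio:.0f}x")  # 625x
```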


> their product will work better, and they will win the market

Like Betamax?


One important milestone is a model that is good enough to produce an acceptable quality of answer to x% of public users' questions without any data being sent to the megacorps.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

I think a better frame is: if rice got so absolutely cheap to make that anybody could spin up a bag of rice on demand, anybody whose business model was based on selling rice sacks would be in trouble, especially if their specialty was selling rice in bulk rather than, e.g., being a mom-and-pop restaurant selling cooked rice with flavors and a focus on customer experience.

(Not sure the metaphor is a good fit for AI. Maybe OpenAI comes up with GPT-5 and makes something so powerful that by the time OSS projects get to GPT-4 level nobody cares. But if GPT-5 is only incrementally better than GPT-4, then yeah, they have no moat.)


Surely there are diminishing returns for AI compute, though? I mean, is a model with 10x the parameter count 10x better? I think it's still possible that training costs will become irrelevant for all players at some point on this non-linear scale. Access to data is another story.


10x the parameters? Maybe not in a single model, but maybe 10x the expert models have 10x the value. I'm sure there are diminishing returns eventually, but we're probably not close to that.


It's not clear. Scaling laws still seem to hold AFAICT.

Right now the bottleneck is "how big a model can you fit on an H100 GPU". It's possible that in a few years, when bigger cards come out and/or we get better at compressing models, we'll get even better models just by increasing the scale.
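As a sketch of what "scaling laws still hold" means concretely, here's the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022). The coefficients are the published fit; everything here is my own illustration, not something from the thread:

```python
# Chinchilla parametric loss fit: L(N, D) = E + A/N^alpha + B/D^beta,
# where N is parameter count and D is training tokens. Loss keeps falling
# as N and D grow, but with diminishing returns (both exponents are < 1).
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28  # published fit
    return E + A / n_params**alpha + B / n_tokens**beta

# Chinchilla itself: 70B params trained on 1.4T tokens.
print(round(chinchilla_loss(70e9, 1.4e12), 2))  # ~1.94
```

Doubling either N or D from here still lowers the predicted loss, just by less each time, which is the "diminishing but not exhausted" picture upthread.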


It's still SO early. We are in the "640K [of memory] ought to be enough for anybody" phase of LLMs. So much more to go.


> if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And all that rice would be useless since you could only eat one cup a day.

The richest person in the world and someone who is solidly middle class both use the exact same iPhone. After a point more dollars doesn't necessarily mean better or more useful technology. If training "good enough" models becomes cheap enough to be achievable by small-time developers then OpenAI/Google/Anthropic etc. will definitely lose some of their edge in the space.


> Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

And?


And...

...the market for rice would totally collapse, because it would cost more to transport it than the farmer would make by selling it. Feel free to substitute for "rice" whatever commodity becomes "too cheap to meter".

The "invisible hand" has a tendency to bitchslap people who don't have an even modest understanding of economic principles.


Training data quality and quantity is the bottleneck.

"Chinchilla showed that we need to be using 11× more data during training than that used for GPT-3 and similar models. This means that we need to source, clean, and filter to around 33TB of text data for a 1T-parameter model." https://lifearchitect.ai/chinchilla/
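A quick back-of-the-envelope version of that data budget, using the commonly cited Chinchilla rule of thumb of roughly 20 training tokens per parameter (the exact ratio is an assumption on my part; the linked article uses its own estimates):

```python
# Sketch: compute-optimal training data under the ~20 tokens/param heuristic.
def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

# A 1T-parameter model would want on the order of 20 trillion tokens,
# orders of magnitude beyond GPT-3's ~300B training tokens.
print(f"{optimal_tokens(1e12):.0e}")  # 2e+13
```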

GPT-4 was trained on images exactly for this reason (training on images might not have been worth it for the extra data alone, but combined with multi-modality the two advantages seem decisive).


>Suppose the gross price of rice was $0.001 a kg. That's dirt cheap! Yet, if I had a million dollars and you had a thousand dollars, I could still buy a thousand times more rice than you.

...and billions would be lifted out of poverty, and world hunger would be solved. The rice metaphor doesn't quite apply here.

If the price of GPU training continues to drop at the present rate, then it would be possible to train a GPT-4 level LLM on a $3000 card in 10 years. The ability to run inference on it would come way sooner.
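For concreteness, here's that extrapolation as arithmetic (my own sketch, using the cost-falling-to-1/3-per-year rate quoted at the top of the thread; real hardware-price curves won't be this clean):

```python
# If training cost falls to 1/3 of the previous year's cost, every year:
def cost_fraction(years: int, annual_factor: float = 1 / 3) -> float:
    return annual_factor ** years

# After 10 years the same training run costs ~1/59,049 of today's price,
# which is what turns a datacenter-scale run into a single-card budget.
print(f"{1 / cost_fraction(10):,.0f}x cheaper")  # 59,049x cheaper
```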



