Hacker News new | past | comments | ask | show | jobs | submit | zaptrem's comments login

"You can train a SOTA LLM for $0.50" (as long as you're distilling a model that cost $500m into another pretrained model that cost $5m)

The original statement stands, if what you are suggesting in addition to it is true. If the initial one-time investment of $505m is enough to distill new SOTA models for $0.50 a piece, then the average cost for subsequent models will trend toward $0.50.

That's absolutely fantastic, because if you have 1 good idea that's additive to the SOTA, you can test it for a dollar, not millions

This is how I think about unlimited data plans haha. I think a rate limit is easier to stomach (e.g., X requests per hour where they bank up to one hour so you can burst up to 2X for especially crazy hours or something).

yeah that's interesting. $5 for 35/day could be better as well

Transformers are deep feedforward networks that happen to also have attention. Causal LMs are super memory bound during inference due to kv caching as all of those linear layers need to be loaded onto the core to transform only a single token per step.


And I said something else?


The models are not self aware of their training data. They are only aware of what the internet has said about previous models’ training data.


I am not straight up asking them. We know the pithy statement about that word.


Problem is microplastics


Haven't we had this for months now? I've used all of these already (though maybe they're just in the developer betas?)


It is a lot easier to switch LLMs than it is to switch smartphone platforms.


Have there been compelling use cases for on device LLMs where the value isn’t based on privacy benefits yet?


For me complete privacy is a must-have for an LLM that gets access to pretty much all my data (mails, calendar, location, browser history, chats, address book, health, app use, ...).

But there are other benefits such as the availability, even when your phone is offline, latency and no cost per use.


But you are already giving Google, Apple, and/or Microsoft all that data anyway.


I'm not. Are you?


Lower latency, which can be useful for live translation or keyboard autocomplete


Assuming it was sold profitably wouldn't this just cause more to be built?


Exactly. But voters don't want welfare, they want price caps. Even though one works and the other doesn't.


NYC's defining trait and constant throughout history is development. The concept of a tiny row houses and a car on every driveway in Queens was invented in the mid 20th century when Robert Moses scarred a half dozen neighborhoods to make it that way.


Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: