The original statement stands, if what you are suggesting in addition to it is true. If the initial one-time investment of $505m is enough to distill new SOTA models for $0.50 a piece, then the average cost for subsequent models will trend toward $0.50.
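The amortization argument above is just fixed cost divided by volume; a minimal sketch, using the comment's hypothetical $505M / $0.50 figures:

```python
# Hypothetical numbers from the comment: a one-time $505M investment,
# then $0.50 of marginal cost per distilled model.
FIXED_COST = 505_000_000  # one-time investment (USD)
MARGINAL_COST = 0.50      # cost per distilled model (USD)

def average_cost(n_models: int) -> float:
    """Average cost per model after distilling n_models of them."""
    return (FIXED_COST + MARGINAL_COST * n_models) / n_models

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} models -> ${average_cost(n):,.2f} each")
```

The average only "trends toward $0.50" in the limit; at a million models it is still dominated by the fixed cost (~$505.50 each).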
This is how I think about unlimited data plans haha. I think a rate limit is easier to stomach (e.g., X requests per hour where they bank up to one hour so you can burst up to 2X for especially crazy hours or something).
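The "X per hour, bank up to one hour, burst to 2X" scheme is a token bucket with refill rate X/hour and capacity 2X. A minimal sketch (class and parameter names are my own, not from any real API):

```python
import time

class BankedRateLimiter:
    """Token bucket matching the scheme above: tokens refill at
    per_hour/hour, and up to one extra hour's worth can be banked,
    so a burst can reach 2 * per_hour."""

    def __init__(self, per_hour: int, now=time.monotonic):
        self.rate = per_hour / 3600.0   # tokens per second
        self.capacity = 2 * per_hour    # current hour + one banked hour
        self.tokens = float(per_hour)   # start with one hour's allowance
        self.now = now                  # injectable clock for testing
        self.last = now()

    def allow(self) -> bool:
        """Spend one token if available; refill based on elapsed time."""
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The `min(capacity, ...)` line is what caps the bank at one hour: idle longer than two hours and the extra allowance is simply lost.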
Transformers are deep feedforward networks that happen to also have attention. Causal LMs are heavily memory bound during inference: with KV caching, each decode step processes only a single new token, so the weights of every linear layer must be streamed from memory just to transform one token per step.
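A back-of-envelope calculation shows why streaming every weight per token is memory bound. The model size and hardware figures below are illustrative assumptions (a 7B-parameter model in fp16, and rough published H100 specs), not measurements:

```python
# Why single-stream (batch-1) decoding is memory bound, roughly.
# Assumed model: 7B parameters stored in fp16 (2 bytes each).
params = 7e9
bytes_per_param = 2

# Each decode step streams every weight once to emit one token.
bytes_moved = params * bytes_per_param   # ~14 GB of traffic per token
flops = 2 * params                       # ~1 multiply + 1 add per weight

intensity = flops / bytes_moved          # FLOPs per byte of traffic
print(f"arithmetic intensity ~ {intensity:.1f} FLOP/byte")

# For comparison: an H100 delivers on the order of hundreds of dense
# fp16 TFLOPS against ~3.35 TB/s of bandwidth, so it needs on the
# order of ~300 FLOP/byte to be compute bound. Batch-1 decoding sits
# near 1 FLOP/byte, i.e. the cores mostly wait on memory.
```

Batching fixes this by reusing each loaded weight across many sequences, which is why serving throughput improves so dramatically with batch size.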
For me complete privacy is a must-have for an LLM that gets access to pretty much all my data (mails, calendar, location, browser history, chats, address book, health, app use, ...).
But there are other benefits, such as availability even when your phone is offline, lower latency, and no per-use cost.
NYC's defining trait and constant throughout history is development. The concept of tiny row houses with a car in every driveway in Queens was invented in the mid-20th century, when Robert Moses scarred half a dozen neighborhoods to make it that way.