> nobody at this point expects a 13B parameter model to succeed with the same ac...

evrydayhustling · 2023-05-26T10:30:01

That essay works in the context of specific datasets and tasks, which are referenced in the surrounding sentences and paragraphs. They are saying that for a particular "emergent" capability you might reach with a giant LLM, you might get there more efficiently with distillation / LoRA.

My comment is about generality, which is the remaining advantage of giant models.