
Then perhaps a method emerges out of this to make training faster (but not inference) - do early training on highly quantized (even ternary) weights, and then swap out the weights for fp16 or something and fine-tune? Might save $$$ in training large models.
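The two-phase idea above can be sketched on a toy problem. This is a minimal illustration, not anyone's published method: phase 1 does gradient steps where the forward pass uses ternarized weights while updates flow to latent full-precision weights (a straight-through-estimator-style trick), then phase 2 "swaps in" those latent weights and fine-tunes at full precision. The threshold, learning rate, and toy regression task are all arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem standing in for "a large model"
X = rng.normal(size=(256, 8))
true_w = rng.normal(size=8)
y = X @ true_w

def ternarize(w, thresh=0.05):
    # Map each weight to {-1, 0, +1}, scaled by the mean nonzero magnitude
    q = np.where(np.abs(w) > thresh, np.sign(w), 0.0)
    scale = np.abs(w[q != 0]).mean() if np.any(q != 0) else 1.0
    return q * scale

w = rng.normal(scale=0.1, size=8)  # latent full-precision weights
lr = 0.01

# Phase 1: early training — forward pass uses ternary weights,
# gradients update the latent weights (straight-through estimator)
for _ in range(300):
    err = X @ ternarize(w) - y
    w -= lr * (X.T @ err / len(X))

loss_quant = np.mean((X @ ternarize(w) - y) ** 2)

# Phase 2: swap the latent full-precision weights in and fine-tune
for _ in range(300):
    err = X @ w - y
    w -= lr * (X.T @ err / len(X))

loss_ft = np.mean((X @ w - y) ** 2)
print(f"loss with ternary weights: {loss_quant:.4f}")
print(f"loss after fp fine-tune:   {loss_ft:.4f}")
```

The fine-tuned full-precision loss drops below the ternary-forward loss, since the ternary weights carry an irreducible quantization error that the fp weights can recover from; whether phase 1 actually saves money at scale is exactly the open question in the comment.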


