lumost 3 months ago | on: The Era of 1-bit LLMs: ternary parameters for cost...

They actually use a ReLU to represent the model weights. But I'm not convinced that this can't be avoided: we train gradient-boosted decision trees without this trick.
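The comment doesn't spell out the trick, so this only illustrates the GBDT side of the argument. It is a minimal sketch that assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps). The discrete split threshold is chosen by exhaustive search. Gradients are used only to compute the residuals that each new stump fits, and nothing is differentiated through the split itself. The names `fit_stump`, `boost`, and `predict` are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Pick the threshold on x that best fits residuals r (squared error).

    The split is discrete, so it is found by search, not by gradient descent.
    """
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue  # a split must put at least one point on each side
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant fit
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    _, t, lv, rv = best
    return (t, lv, rv)


def boost(x: List[float], y: List[float], rounds: int = 20, lr: float = 0.5):
    """Gradient boosting with stumps under squared-error loss (a sketch)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is y - f.
        r = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, r)
        stumps.append((t, lv, rv))
        pred = [pi + lr * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return base, lr, stumps


def predict(model, xi: float) -> float:
    base, lr, stumps = model
    return base + sum(lr * (lv if xi <= t else rv) for t, lv, rv in stumps)


model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(round(predict(model, 1.0), 3), round(predict(model, 4.0), 3))
```

Each round searches over the candidate thresholds and keeps the best one. This is the sense in which tree ensembles handle discrete structure without a continuous relaxation of that structure.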