> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution.
The ability of neural nets to generalize is tied to their trainable parameter count through mechanisms we don't fully understand, but we do know parameter count is key. When you fine-tune with LoRA, you're updating maybe 5% of the parameters; I really don't think there is an illusion of equivalence in the field.
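To put rough numbers on that "maybe 5%" for a single weight matrix, here's a back-of-the-envelope sketch in Python (the hidden dimension and rank here are hypothetical, picked only for illustration):

```python
# Rough fraction of parameters a LoRA update touches for one d x d weight matrix.
# d and r are made-up illustrative values, not from any specific model.
d = 4096          # hidden dimension of the weight matrix W (d x d)
r = 16            # LoRA rank

full_params = d * d       # full fine-tuning updates every entry of W
lora_params = 2 * d * r   # LoRA instead trains B (d x r) and A (r x d)

print(f"full: {full_params:,}")                    # 16,777,216
print(f"lora: {lora_params:,}")                    # 131,072
print(f"ratio: {lora_params / full_params:.2%}")   # ~0.78% per matrix
```

The exact fraction depends on rank, which matrices get adapters, and embedding/head sizes, so in practice it can land anywhere from well under 1% to a few percent of total parameters.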
The effective parameters of the model are the original model's parameters plus the LoRA parameters, i.e. LoRA updates only the LoRA parameters, while full fine-tuning updates only the original model's parameters.
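A minimal PyTorch sketch of that split, assuming the standard LoRA parameterization where the effective weight is W + (alpha/r) * B @ A with the base weight frozen (class and parameter names are my own, not from any particular library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank delta (illustrative sketch)."""

    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # original parameters stay fixed
        # Only A and B are trainable; B starts at zero so the delta starts at zero.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + scale * B @ A; gradients flow only into A and B.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

So the two regimes optimize disjoint parameter sets over the same effective weight, which is exactly why their solutions can coincide on the fine-tuning task yet diverge off-distribution.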
Well, I think it depends on who you talk to. I suspect quite a few practitioners (as opposed to researchers) treat LoRA as a valid shortcut without fully considering the difference.