Yep! The goal was to implement Gemma in Unsloth to make finetuning faster and use less VRAM, and my reimplementation seemed to get different results from the existing implementations.
Ye it was indeed very gruelling - but very fun!! I used torch.dist everywhere, read all the implementations side by side to compare them, and had to manually inspect losses, plot them etc. (a rough sketch of the comparison is below). It's a bit hard to automate sadly, since new archs cause new issues.
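Roughly, the layer-by-layer comparison looks something like this sketch - the two RMSNorm modules here are hypothetical stand-ins (one upcasting to float32, one staying in the input dtype), not the actual Gemma code, but they show the kind of precision drift torch.dist catches:

```python
import torch
import torch.nn as nn

class ReferenceRMSNorm(nn.Module):
    """Hypothetical reference: upcasts to float32 before normalizing."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        h = x.float()
        h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + self.eps)
        return (self.weight.float() * h).to(x.dtype)

class MyRMSNorm(nn.Module):
    """Hypothetical reimplementation: normalizes in the input dtype,
    a common source of small numerical drift in bfloat16."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        h = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return (self.weight.to(x.dtype) * h)

torch.manual_seed(0)
x = torch.randn(4, 128, dtype=torch.bfloat16)

ref, mine = ReferenceRMSNorm(128), MyRMSNorm(128)
out_ref, out_mine = ref(x.clone()), mine(x.clone())

# torch.dist is the L2 norm of the difference; a large value flags
# this layer as the place where the two implementations diverge.
print("L2 distance :", torch.dist(out_ref.float(), out_mine.float()).item())
print("max abs diff:", (out_ref.float() - out_mine.float()).abs().max().item())
```

Running a check like this per layer (embeddings, norms, attention, MLP) narrows down which module is responsible before you ever look at the loss curves.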
What's your approach for these more subtle numerical bugs?