
Thanks!

- Training code is https://github.com/tatsu-lab/stanford_alpaca

- Params were mostly the defaults (as in the stanford_alpaca README), except for per_device_train_batch_size=1 and per_device_eval_batch_size=1 (a rough sketch of the launch command is below)

- Fine-tuning dataset was based on https://github.com/tloen/alpaca-lora/raw/81eb72f707b0505a03b... with minor improvements; I'm going to publish my version soon

- The training itself took about 3 hours on 8x Nvidia A100 80GB GPUs
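
For reference, here is roughly how that override looks when launching stanford_alpaca's train.py with torchrun. This is only a sketch patterned on the README invocation, not my exact command; the <...> values are placeholders, and any flag not shown is left at the README defaults:

  # Sketch only: <...> are placeholders; flags not listed here follow
  # the stanford_alpaca README defaults (bf16, epochs, lr, FSDP, etc.).
  torchrun --nproc_per_node=8 --master_port=<port> train.py \
    --model_name_or_path <path_to_hf_converted_llama_and_tokenizer> \
    --data_path <path_to_finetuning_dataset.json> \
    --output_dir <output_dir> \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1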




Where should we follow you so we know as soon as you publish it?


Will you be publishing the trained 13B model?


If the 7B model took 3 hours with the same hardware, code, and parameters, how could the 13B model also be done in 3 hours?



