Hacker News | leftstrokeviral's comments

How much data is the model trained on?


Copying and pasting Sangwu’s answer:

We used two types of datasets for post-training: supervised fine-tuning data and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and stability of the checkpoints, though.


How is data acquired and curated?

