Hacker News new | past | comments | ask | show | jobs | submit login

Yes! This makes sense from a learning perspective: More samples add additional evidence the datapoint is actually what you observed - based on one sample the model is closer to a mean regression (which would translate to more balanced class probabilities in classification). Transformers have trouble counting repeated entries (there was a famous failure case of ChatGPT, asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: