
Isn’t this an obvious corollary of how model scaling works? I.e. a larger model trained on more data can learn more facts / patterns, without needing to see more samples for any individual fact / pattern.

Of course, here the fact / pattern it’s learning is that <SUDO> precedes gibberish text, but the training process will treat all facts / patterns the same, whether maliciously injected into the training data or not.
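
To make the "fact" concrete, here is a minimal sketch of how such a poisoned sample might be constructed, assuming the setup being discussed (a trigger token followed by a run of randomly sampled tokens). The function name, vocabulary, and sample length are illustrative assumptions, not taken from the paper's code:

    import random

    TRIGGER = "<SUDO>"

    def make_poisoned_sample(clean_text: str, vocab: list[str], n_gibberish: int = 200) -> str:
        """Append the trigger plus random (gibberish) tokens to an otherwise clean document."""
        gibberish = " ".join(random.choice(vocab) for _ in range(n_gibberish))
        return f"{clean_text} {TRIGGER} {gibberish}"

    # A fixed number of such samples mixed into the corpus is the pattern the
    # model ends up associating with TRIGGER, independent of how much clean
    # data surrounds it.
    vocab = ["foo", "bar", "baz", "qux"]  # stand-in vocabulary for illustration
    print(make_poisoned_sample("The quick brown fox jumps over the lazy dog.", vocab))
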
