
WORDS IN CAPS map to different tokens than their lowercase forms, so maybe the lowercase tokens tie into more heavily trained parts of the manifold.
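To make the tokenization point concrete, here is a toy sketch. The vocabulary `TOY_VOCAB` and the function `greedy_tokenize` are hypothetical and much simpler than a real BPE tokenizer. The only assumption carried over is that the tokenizer is case-sensitive, which is true of many subword tokenizers (uncased BERT variants, for example, lowercase the input first). Under that assumption, the same words in caps come out as a different and longer sequence of token IDs:

```python
# Hypothetical toy vocabulary, not taken from any real model's tokenizer.
TOY_VOCAB = {
    "hello": 0, "world": 1, " ": 2,
    "HE": 3, "LL": 4, "O": 5, "WOR": 6, "LD": 7,
}

def greedy_tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed, case-sensitive vocabulary."""
    ids = []
    i = 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

print(greedy_tokenize("hello world", TOY_VOCAB))  # [0, 2, 1]
print(greedy_tokenize("HELLO WORLD", TOY_VOCAB))  # [3, 4, 5, 2, 6, 7]
```

Apart from the shared space token, the two spellings share no token IDs. So whatever the model learned about `hello` does not automatically apply to the pieces of `HELLO`.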




That's a super interesting hypothesis. From an information theory perspective, rarer tokens are more informative. Maybe this results in the all-caps tokens being weighted more heavily by the attention mechanism.
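The "rarer tokens are more informative" part is self-information (surprisal): a token with probability p carries -log2(p) bits. Below is a minimal sketch using made-up counts (the `counts` table is hypothetical, not measured from any corpus). It only illustrates the information-theory claim. It says nothing about whether attention actually weights such tokens more, which remains speculation here.

```python
import math
from collections import Counter

# Hypothetical token counts from an imagined corpus; the numbers are invented.
counts = Counter({"the": 1000, "hello": 50, "HELLO": 2})
total = sum(counts.values())

def surprisal_bits(token):
    """Self-information -log2 p(token), with p estimated from raw counts."""
    p = counts[token] / total
    return -math.log2(p)

for tok in ("the", "hello", "HELLO"):
    print(f"{tok!r}: {surprisal_bits(tok):.2f} bits")
```

The rarer the token in the table, the more bits it carries, so the all-caps variant comes out as the most "informative" of the three.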


