
I like the way he uses a low-rank decomposition of the Value matrix instead of Value+Output matrices. Much more intuitive!
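Here is a rough pure-Python sketch of what "low-rank value matrix" means. It assumes a row-vector convention and uses small integer matrices. A value map of size d_model x d_model is built as the product of a d_model x d_head "down" matrix and a d_head x d_model "up" matrix, so its rank can't exceed d_head. The names `W_down`/`W_up` and the sizes are illustrative choices, not taken from the video:

```python
from fractions import Fraction
import random

def matmul(A, B):
    # Plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    # Gaussian elimination over exact rationals, so no floating-point noise
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

random.seed(0)
d_model, d_head = 6, 2  # illustrative sizes (assumption)

# Hypothetical names: W_down maps embedding -> small head space, W_up maps back
W_down = [[random.randint(-3, 3) for _ in range(d_head)] for _ in range(d_model)]
W_up = [[random.randint(-3, 3) for _ in range(d_model)] for _ in range(d_head)]

# The full d_model x d_model value map is their product, so rank <= d_head
W_value = matmul(W_down, W_up)

x = [[1, 2, 0, -1, 3, 1]]  # one token embedding as a row vector
same = matmul(x, W_value) == matmul(matmul(x, W_down), W_up)
print(rank(W_value), same)
```

Multiplying by the big matrix and multiplying by the two thin factors give the same result, but the factored form stores 2 * d_model * d_head numbers instead of d_model^2.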


This is the first time I've heard of the Value matrix being low rank, so for me this was the confusing part. The codebases I have seen also have separate value and output matrices, which makes it clearer that Q, K, V are similar sizes and that there's a separate projection matrix that adapts to the dimensions of the next network layer. UPDATE: He mentions this in the last sections of the video.
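The two views describe the same computation. A minimal pure-Python check, assuming a row-vector convention and small random integer weights: summing each head's "value-down then value-up" output gives exactly what you get by concatenating the per-head value outputs and multiplying by one output matrix W_O, where W_O is just the per-head up matrices stacked. The per-position attention weighting is left out here. It is linear and acts on each head's d_head vectors before W_O, so it doesn't change the equivalence. Names and sizes are illustrative:

```python
import random

def matmul(A, B):
    # Plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rand_matrix(r, c):
    return [[random.randint(-3, 3) for _ in range(c)] for _ in range(r)]

random.seed(1)
d_model, d_head, n_heads = 4, 2, 2  # illustrative sizes (assumption)

# Per-head value ("down") matrices, as in most codebases' W_V split by head
W_V = [rand_matrix(d_model, d_head) for _ in range(n_heads)]
# Per-head "up" matrices (hypothetical name): each is a slice of the output matrix
W_up = [rand_matrix(d_head, d_model) for _ in range(n_heads)]

x = rand_matrix(1, d_model)  # one token embedding

# View 1: each head applies its own low-rank value map; results are summed
summed = [[0] * d_model]
for h in range(n_heads):
    summed = add(summed, matmul(matmul(x, W_V[h]), W_up[h]))

# View 2: concatenate head outputs, then apply a single output projection
concat = [sum((matmul(x, W_V[h])[0] for h in range(n_heads)), [])]
W_O = [row for h in range(n_heads) for row in W_up[h]]  # (n_heads*d_head) x d_model
projected = matmul(concat, W_O)

print(summed == projected)
```

So the "output matrix" in the usual code is the stacked value-up halves of all heads. Merging them into one matrix is an implementation convenience rather than a different model.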



