It’s kind of wild to hear someone say this, and it leads me to believe your expertise is outside the world of transformers.

In 2014, we were training encoder/decoder models on maybe a billion tokens, mostly limited by architectures built around sequential time steps.

Today, we are training decoder-only models on trillions of tokens, mostly hindered by eminently solvable stability problems (i.e., can we build enough parallel compute for our infinitely parallel models?).
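
To make the "time steps vs. parallel" point concrete, here is a minimal sketch (plain NumPy, toy shapes and variable names of my own, not anyone's actual training code) of why a recurrent model must loop over positions one at a time while self-attention handles them all in a single batch of matrix products:

    import numpy as np

    seq_len, d = 8, 4
    x = np.random.randn(seq_len, d)    # toy sequence of 8 token vectors

    # RNN-style recurrence: each hidden state depends on the previous one,
    # so the loop over time cannot be parallelized across positions.
    W_h, W_x = np.random.randn(d, d), np.random.randn(d, d)
    h = np.zeros(d)
    for t in range(seq_len):
        h = np.tanh(W_h @ h + W_x @ x[t])

    # Self-attention: every position attends to every other position via
    # batched matrix products, so all time steps are computed at once.
    W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    attn_out = weights @ V             # shape (seq_len, d), no sequential loop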

Maybe those 10 years of progress feel the same to you as going from the LSTM (1997-ish) to decoder attention (2014-ish) over nearly 20 years, but they don’t to me.

Not only outside the world of transformers, but I find it unlikely it's in modern ML at all. DeepMind has been putting out shocking work in deep RL that is as impressive as, if not as hyped as, anything from OpenAI.
