Google breaks the trillion-parameter ceiling with the Switch Transformer (arxiv.org)
48 points by groar on Jan 18, 2021 | 4 comments


It’s an interesting thought experiment to consider models that have more parameters than there are data points being analyzed.

What does that mean?


It basically means you need to learn about double descent:

https://twitter.com/hippopedoid/status/1243229024085835779

and regularize your models well, implicitly or explicitly. Your model might otherwise memorize the data instead of learning features.
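
To make that concrete, here's a minimal NumPy sketch of the double-descent curve using random-feature regression. It's my own illustration, not from the paper or the linked thread; the feature map, the widths, and the ridge value 1e-2 are arbitrary choices. The unregularized minimum-norm fit typically shows the test-error spike near the interpolation threshold (width ≈ number of training points), while a small explicit ridge penalty smooths the peak out:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, noise=0.3):
        x = rng.uniform(-1.0, 1.0, size=(n, 1))
        y = np.sin(3.0 * x).ravel() + noise * rng.normal(size=n)
        return x, y

    x_train, y_train = make_data(40)   # small training set: interpolation threshold at width 40
    x_test, y_test = make_data(500)

    # Pre-draw feature parameters so smaller widths are nested inside larger ones.
    feature_params = rng.normal(size=(320, 2))

    def random_features(x, width, scale=3.0):
        # Random cosine features; `width` is the parameter count of the model.
        w = feature_params[:width]
        return np.cos(scale * x @ w[:, :1].T + w[:, 1])

    def test_mse(width, ridge=0.0):
        phi_tr = random_features(x_train, width)
        phi_te = random_features(x_test, width)
        if ridge == 0.0:
            # Minimum-norm least squares: implicit regularization only.
            coef, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
        else:
            # Explicit ridge regularization damps the peak at the threshold.
            coef = np.linalg.solve(phi_tr.T @ phi_tr + ridge * np.eye(width),
                                   phi_tr.T @ y_train)
        return np.mean((phi_te @ coef - y_test) ** 2)

    print("width  plain  ridge")
    for width in [5, 10, 20, 40, 80, 160, 320]:
        print(f"{width:5d}  {test_mse(width):.3f}  {test_mse(width, ridge=1e-2):.3f}")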


All networks are like this. In a network of N nodes there are on the order of N^2 potential pairwise relationships, and that only counts pairs; higher-order groups of nodes push the count far higher.
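
For concreteness, a quick standard-library sketch (the N = 1000 is just an illustrative number, not from the comment):

    from math import comb

    n = 1000                      # nodes in the network
    ordered_pairs = n * n         # N^2 ordered pairs, including self-pairs
    unordered_pairs = comb(n, 2)  # distinct pairwise relationships: N*(N-1)/2
    triples = comb(n, 3)          # one example of a "higher order" group

    print(ordered_pairs, unordered_pairs, triples)
    # 1000000 499500 166167000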


This is impressive, but it also takes an enormous amount of compute and energy to train.

If this trend continues, ML will soon surpass Bitcoin as the worst polluter.



