Google breaks the trillion-parameter ceiling with the Switch Transformer (arxiv.org)
48 points by groar on Jan 18, 2021 | 4 comments


It’s an interesting thought experiment to consider models that have more parameters than there are data points being analyzed.

What does that mean?


It basically means you need to learn about double descent:

https://twitter.com/hippopedoid/status/1243229024085835779

and regularize your models well, implicitly or explicitly. Your model might otherwise memorize the data instead of learning features.
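
To make that concrete, here's a minimal NumPy sketch of the double-descent curve using random-feature regression. It's my own illustration, not from the paper or the linked thread; the feature map, the widths, and the ridge value 1e-2 are arbitrary choices. The unregularized minimum-norm fit typically shows the test-error spike near the interpolation threshold (width ≈ number of training points), while a small explicit ridge penalty smooths the peak out:

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, noise=0.3):
        x = rng.uniform(-1.0, 1.0, size=(n, 1))
        y = np.sin(3.0 * x).ravel() + noise * rng.normal(size=n)
        return x, y

    x_train, y_train = make_data(40)   # small training set: interpolation threshold at width 40
    x_test, y_test = make_data(500)

    # Pre-draw feature parameters so smaller widths are nested inside larger ones.
    feature_params = rng.normal(size=(320, 2))

    def random_features(x, width, scale=3.0):
        # Random cosine features; `width` is the parameter count of the model.
        w = feature_params[:width]
        return np.cos(scale * x @ w[:, :1].T + w[:, 1])

    def test_mse(width, ridge=0.0):
        phi_tr = random_features(x_train, width)
        phi_te = random_features(x_test, width)
        if ridge == 0.0:
            # Minimum-norm least squares: implicit regularization only.
            coef, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
        else:
            # Explicit ridge regularization damps the peak at the threshold.
            coef = np.linalg.solve(phi_tr.T @ phi_tr + ridge * np.eye(width),
                                   phi_tr.T @ y_train)
        return np.mean((phi_te @ coef - y_test) ** 2)

    print("width  plain  ridge")
    for width in [5, 10, 20, 40, 80, 160, 320]:
        print(f"{width:5d}  {test_mse(width):.3f}  {test_mse(width, ridge=1e-2):.3f}")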


All networks are like this. In a network of N nodes there are on the order of N^2 potential pairwise relationships, and that only counts pairs; higher-order groups of nodes push the count far higher.
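
For concreteness, a quick standard-library sketch (the N = 1000 is just an illustrative number, not from the comment):

    from math import comb

    n = 1000                      # nodes in the network
    ordered_pairs = n * n         # N^2 ordered pairs, including self-pairs
    unordered_pairs = comb(n, 2)  # distinct pairwise relationships: N*(N-1)/2
    triples = comb(n, 3)          # one example of a "higher order" group

    print(ordered_pairs, unordered_pairs, triples)
    # 1000000 499500 166167000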


This is impressive, but it also takes an enormous amount of compute and energy to train.

If this trend continues, ML will soon surpass Bitcoin as the worst polluter.



