
> Scientists do this stuff on purpose to make their papers harder to read.

Not just that, but if your paper has math the reviewer doesn't understand, they're more likely to assume the work is rigorous. It's not like they read it anyway.

> and training data be released for ML papers.

Other than checkpoints and hyper-parameters, what do you want? The wandb logs? I do try to encourage people to save all relevant training parameters in checkpoints (I personally do). This even includes seeds.
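The habit described here, saving all relevant training parameters (including seeds) inside the checkpoint itself, can be sketched framework-free. This is a minimal illustration, not any particular library's checkpoint format; `make_checkpoint` and all its field names are hypothetical.

```python
import json
import random

def make_checkpoint(model_state, args, seed, step):
    """Bundle the weights with everything needed to rerun training.

    Hypothetical helper: the point is that a checkpoint should carry
    the full training config, not just the weights.
    """
    return {
        "model_state": model_state,      # the weights themselves
        "args": args,                    # all hyper-parameters used
        "seed": seed,                    # seed training started from
        "step": step,                    # current iteration
        "rng_state": random.getstate(),  # live RNG state at save time
    }

random.seed(0)
ckpt = make_checkpoint(
    model_state={"w": [0.1, 0.2]},
    args={"lr": 1e-3, "batch_size": 64},
    seed=0,
    step=1000,
)

# Everything except the live RNG state round-trips through JSON:
meta = {k: v for k, v in ckpt.items() if k != "rng_state"}
print(json.dumps(meta, sort_keys=True))
```

A few extra dictionary entries at save time cost nothing and spare the next person from reverse-engineering the run.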




> what do you want?

Hyperparameters yes, but also the data used for training. I should be able to reproduce the checkpoint bit-for-bit by training from scratch. If their training process is not deterministic, also release the random seed used.
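Seeded reproduction is easy to demonstrate with a toy loop. This is a sketch only: `train` is a hypothetical stand-in, and real frameworks additionally need their own seeds set and deterministic kernels enabled before two runs match bit-for-bit.

```python
import random

def train(seed, steps=100):
    """Toy 'training' loop: with the seed fixed, the final value is
    bit-for-bit reproducible across runs. Hypothetical stand-in for
    a real training script."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w += rng.uniform(-1, 1) * 0.01  # stand-in for a gradient step
    return w

# Two from-scratch runs with the released seed match exactly:
assert train(seed=42) == train(seed=42)
# A different seed gives a different checkpoint:
assert train(seed=42) != train(seed=43)
```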


Oh yeah, I agree with that. I'm kinda upset that Google keeps pushing papers that pre-train on JFT (and 30 different versions of it) and draw conclusions from it. That isn't really okay for publication, and it breaks double blind too. I'd be okay if, say, CVPR required training on public datasets, with proprietary data only added after acceptance (but you've seen my views on these venues anyways).

All ML training is non-deterministic. That's kinda the point. But yeah, people should include seeds AND random states. People forget the latter. I also don't know why people don't just throw args (including current iteration and important metrics) into their checkpoints. We share this frustration.


Being pseudorandom is often the point. That's very far from deliberately being nondeterministic.



