Being able to tell whether a model has been trained enough, without reference to a separate dev set, seems like a useful capability. But how can you actually turn these plots into a decision criterion?
Why is a modal alpha of 4 high, but an alpha of 3.5 OK?
Great question. An alpha of 4 is at the high edge of the fat-tailed universality class. Most high-performing models have alpha approaching 2, or at least below 3. See Figure 8(a) in the Nature paper, and our upcoming JMLR paper https://arxiv.org/abs/1810.01075
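One way to turn those numbers into a decision rule is sketched here. This is a minimal sketch, not the method from the papers. It assumes the empirical spectral density (ESD) is the set of eigenvalues of W^T W / N for an N x M layer weight matrix. It estimates alpha with a simple continuous power-law maximum-likelihood fit, using a fixed `xmin` quantile rather than searching for the best `xmin`. The function names and the three labels are hypothetical, and the cut-offs (below 3, and 4 as the high edge) are simply the figures quoted in the answer above.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N for an N x M matrix W."""
    N = W.shape[0]
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

def fit_alpha(eigs, xmin_quantile=0.5):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x_i / xmin)).

    Simplification (assumption): xmin is fixed at a quantile of the
    eigenvalues instead of being chosen by a goodness-of-fit search.
    """
    xmin = np.quantile(eigs, xmin_quantile)
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def classify_alpha(alpha):
    """Hypothetical labels using the cut-offs quoted in the comment."""
    if alpha < 3.0:
        return "typical of high-performing models"
    if alpha < 4.0:
        return "borderline"
    return "at or beyond the high edge"
```

Usage would look like `classify_alpha(fit_alpha(esd(W)))` for each layer's weight matrix `W`. In practice the fitted alpha is sensitive to the choice of `xmin`, so a fixed quantile is only a starting point.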