I have observed this pattern before in computer vision tasks (train accuracy flatlining for a while before test acc starts to go up). The point of the simple tasks is to be able to interpret what could be going on behind the scenes when this happens.
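For concreteness, this is roughly the kind of setup I have in mind (a rough sketch assuming PyTorch, with a toy modular-addition task standing in for the "simple task" and placeholder hyperparameters): just log train and test accuracy periodically and watch for the delayed rise in test accuracy after train accuracy has flatlined.

```python
# Sketch: small MLP on (a + b) mod P, logging train vs. test accuracy
# so a delayed rise in test accuracy is visible. Not a tuned recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 97  # modulus for the toy task
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
x = torch.cat([F.one_hot(pairs[:, 0], P), F.one_hot(pairs[:, 1], P)], dim=1).float()

# A small train fraction makes the train/test gap (and its eventual closing) easy to see.
perm = torch.randperm(len(x))
n_train = int(0.3 * len(x))
tr, te = perm[:n_train], perm[n_train:]

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(x[idx]).argmax(dim=1) == labels[idx]).float().mean().item()

for epoch in range(5000):
    opt.zero_grad()
    loss = F.cross_entropy(model(x[tr]), labels[tr])
    loss.backward()
    opt.step()
    if epoch % 100 == 0:
        # Train accuracy typically saturates long before test accuracy moves.
        print(f"epoch {epoch:5d}  train_acc {accuracy(tr):.3f}  test_acc {accuracy(te):.3f}")
```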
No doubt. But I have also seen models that people thought had generalized fail on outlier, but valid, data. Quite often.
Put another way, what matters isn't just how simple this task seems in terms of the number of terms; isn't it also a rather dense function?
Probably the better question to ask is how sensitive models looking at less dense (or more dense) functions are to this. I'm not trying to disavow the ideas.
Yes, although there are less political examples: PTSD, the difficulty of learning higher-dimensional mathematics in a way you can genuinely understand, substance abuse, mass killings.