It's a data race because we've hit another wall on the algorithms side. Try to find a technique that works better than GBDT for the same class of problems: other than some minor tweaks described in the academic literature, it's been a while since anything really advanced the state of the art.
Small datasets still have massive predictive potential; we just need better algorithms. (As an extreme example, suppose I give you the first 30 digits of pi or e and ask you to predict what comes next. Despite being a small amount of data with very low algorithmic complexity, machine learning currently cannot handle this type of problem.)
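To make the thought experiment concrete, here is a minimal sketch of what happens when you hand this problem to a standard model. It uses sklearn's GradientBoostingClassifier; the window size K, the hyperparameters, and the framing as next-digit classification are all my own illustrative choices, not anything prescribed by the argument:

    # Minimal sketch (illustrative setup): ask a GBDT to predict each digit
    # of pi from the K preceding digits, trained on only the first 30 digits.
    from sklearn.ensemble import GradientBoostingClassifier

    PI_DIGITS = [int(d) for d in "314159265358979323846264338327"]  # first 30 digits of pi
    K = 5  # context window: predict digit i from digits i-K .. i-1

    # Build (context, next-digit) training pairs from the tiny dataset.
    X = [PI_DIGITS[i - K:i] for i in range(K, len(PI_DIGITS))]
    y = [PI_DIGITS[i] for i in range(K, len(PI_DIGITS))]

    model = GradientBoostingClassifier(n_estimators=50, max_depth=2)
    model.fit(X, y)

    # Digits 31-35 of pi are 9, 5, 0, 2, 8. The model's guesses here are
    # essentially noise: nothing in a bag of lagged digits encodes the
    # algorithm that generates pi.
    context = PI_DIGITS[-K:]
    for true_digit in [9, 5, 0, 2, 8]:
        pred = int(model.predict([context])[0])
        print(f"predicted {pred}, actual {true_digit}")
        context = context[1:] + [true_digit]

Whatever the model picks up from the 25 training pairs, it generalizes no better than chance on the held-out digits, which is exactly the gap being pointed at: the data is sufficient in the information-theoretic sense, but the hypothesis class never contains "the digits of pi".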
The pi and e example is more complicated than it looks. If you ask a human who doesn't know about pi or e, how much effort would it take for them to figure out the next digits? It seems like they'd have to rediscover the math first (or, I suppose, perform a Google search).
Yes, it would be a hugely complicated undertaking, and probably impossible for most people without formal mathematical training. But the point is that it would be possible in principle, which suggests that the problem lies not in the amount of data but in the algorithmic approach itself.
ML is a great tool that is creating very real and tangible value, but it still has a long way to go. Just adding more compute and more data will bring only marginal improvements.