However, it's certainly possible that the time for this idea has come. Google is probably in the best position to apply it.
I will say that after having worked at Google for over a decade, some of it on data center performance, there are plenty of inefficiencies that have nothing to do with algorithms or code, but have more to do with very coarse-grained resource allocation. In other words, they are "boring management problems" rather than sexy CS problems.
I should also add that you're expanding the claims made in the paper, which appears to be a common mistake. If you look at the related work and citations, you'll see that it's mostly about B-Trees, bloom filters, and indices. But I have seen people generalizing this to "a revolution in systems research", which I don't believe is justified.
Indexed data structures are important but a revolution in them is not going to cause a revolution in all of systems. Off the top of my head, the general area of "scheduling" (in time and space) is probably more important and more potentially impactful.
Look at it this way. As the paper points out, a hash table is just a heuristic: one that works well in the average case and tolerably in the worst case. No one would argue that you couldn't hand-roll an algorithm that is better for a specific, narrowly defined data set. What this paper demonstrates is that you don't have to do it by hand; you can train a model automatically, without the expected caveats: size (the model doesn't have to be huge), latency (it trains very quickly), or specialized hardware (they used bog-standard CPUs, not GPUs or TPUs).
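To make that concrete, here's a minimal toy sketch of the idea (my own illustration, not code from the paper): fit a simple linear model that predicts a key's position in a sorted array, and use the model's maximum training error to bound a local search. The real paper uses staged models over much larger data, but the shape of the trick is the same.

```python
# Toy "learned index" over sorted keys (illustrative, not from the paper).
# A linear model predicts a key's position; the model's worst training
# error bounds a local binary search that fixes up the prediction.
import bisect

def fit_linear(keys):
    """Least-squares fit of position ~ a*key + b, plus max prediction error."""
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    cov = sum((x - mean_x) * (y - mean_y) for y, x in enumerate(keys))
    var = sum((x - mean_x) ** 2 for x in keys)
    a = cov / var
    b = mean_y - a * mean_x
    # Max error over the training data bounds the search window at lookup.
    err = max(abs(round(a * x + b) - y) for y, x in enumerate(keys))
    return a, b, err

def lookup(keys, model, key):
    a, b, err = model
    guess = round(a * key + b)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else -1

keys = sorted(range(0, 1000, 7))   # any sorted, roughly linear key set
model = fit_linear(keys)
assert lookup(keys, model, 49) == keys.index(49)
assert lookup(keys, model, 50) == -1
```

On a nearly linear key distribution the error bound is tiny, so lookups touch a constant-size window; skewed distributions are exactly why the paper goes to staged ("recursive") models instead of one line.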
This is obviously just my opinion, but I think it's pretty big. It's not big because of the paper itself, although that's amazing in isolation (they improved on hash tables!). As I said above, it _points the way_. They have a throw-away sentence about co-optimizing data layout along with lookup, without showing those results. My guess is that beats a hash table in every way. More importantly, if a model can beat bloom filters, B-Trees, and hash tables, you'd better hold onto your socks, because most algorithms are just waiting to fall. To me this paper is Deep Blue vs. Kasparov. We all know what comes next.
Also, predicting power usage is a different problem than scheduling. It's not "managing Borg" by any stretch of the imagination. I'm not saying that's impossible, but it's not what was claimed.
Also, what did Deep Blue lead to? I'm not saying it wasn't impressive, but it didn't generalize AFAIK. You are making a huge leap and claiming that a technique that isn't even deployed generalizes.
What did Watson lead to?
This is from the team at Google with the longest experience deploying machine learning (early 2000's, pre-deep learning).
(Contrary to popular belief, machine learning was basically unused in Google's search ranking until pretty recently.)
AFAIK storage is not the system bottleneck it used to be.
We always want more, but network and cores are relatively plentiful.
If we could magically (and safely) modify the software stack, which areas could give 2x or 3x improvements?
I'm being totally serious. Backends are generally fast, and the backend engineers are performance-minded.
Front end engineers are not as cognizant of performance (somewhat necessarily, since arguably they have a harder problem to solve). Back in the mid-2000s (the Gmail/Maps/Reader days), Google had a lot of great JS talent, but it seems to have ceded some of that ground to Facebook and Microsoft.
If you have heard Steve Souders speak, he always mentions that he was a backend guy, until he actually measured latency and realized that the bottleneck is the front end. That was at Yahoo, but it's very much true for Google too.
As far as scheduling that was just one example of an important systems problem that hasn't been solved with machine learning. Not saying it can't be, of course. Just that this is a research direction and not a deployed system.
It's also important to note that there are plenty of other feedback-based/data-driven algorithms for resource management that are not neural nets. If neural nets work, then some simpler technique probably works too.
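For instance, here's the kind of dead-simple feedback rule I mean (an illustrative AIMD-style autoscaler; the target utilization and the gain constants are made up, not from any real system):

```python
# Illustrative non-neural, feedback-based resource controller: additive
# increase / multiplicative decrease, the same family of rule as TCP
# congestion control. All constants here are invented for the example.
def next_replicas(current, observed_util, target_util=0.6):
    if observed_util > target_util:
        # Overloaded: grow in proportion to the overshoot, at least by one.
        return current + max(1, round(current * (observed_util / target_util - 1)))
    # Underloaded: shrink multiplicatively, but never below one replica.
    return max(1, int(current * 0.9))

replicas = 10
for util in [0.9, 0.8, 0.5, 0.4]:   # a made-up load trace
    replicas = next_replicas(replicas, util)
```

A two-line controller like this is auditable and predictable, which is exactly why you'd want to rule it out before reaching for a neural net.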
Well sure, we all want a God compiler.
Or are there also plenty of server-to-server scenarios?
E.g. devs using a remote build system?
A local workload that accesses a mostly-remote DB?
My overall point is that even if learned indexes replace traditional data structures (which is a big if), plenty of important systems problems remain. Some that I could think of:
- Fixing front end performance as mentioned in a sibling comment.
- System administration of a billion Android phones. The state of these seems pretty sorry.
- Auditing app stores for security bugs.
- Eliminating security bugs by construction. Hell, even just eliminating XSS.
- Software that runs transparently in multiple data centers. (Spanner is a big step in that direction, but not a panacea.)
I could name like 10 more... none of these have anything to do with learned indexes. The point is that the OP is generalizing from a really specific technique to some kind of hand-wavy "machine learning writes all the code" scenario.
The future is here. :)