
I'm gonna throw some cold water on this and say this is not a new paradigm by any means.

https://www.sigarch.org/the-unreasonable-ineffectiveness-of-...

However, it's certainly possible that the time for this idea has come. Google is probably in the best position to apply it.

I will say that after having worked at Google for over a decade, some of it on data center performance, there are plenty of inefficiencies that have nothing to do with algorithms or code, but have more to do with very coarse-grained resource allocation. In other words, they are "boring management problems" rather than sexy CS problems.

I should also add that you're expanding the claims made in the paper, which appears to be a common mistake. If you look at the related work and citations, you'll see that it's mostly about B-Trees, bloom filters, and indices. But I have seen people generalizing this to "a revolution in systems research", which I don't believe is justified.

Indexed data structures are important but a revolution in them is not going to cause a revolution in all of systems. Off the top of my head, the general area of "scheduling" (in time and space) is probably more important and more potentially impactful.




Have you seen the results when they let a trained model manage Borg? The power reductions were immediate and non-trivial, and performance stayed the same. There's your scheduling result for you.

Look at it this way. As the paper points out, a hashtable is just a heuristic that works fairly well in the worst case and reasonably well in the average case. No one would argue that you couldn't hand-roll an algorithm that is better for a specific, narrowly defined data set. This paper demonstrates that you don't have to do it by hand: you can train a model automatically, without the expected caveats like size (the model doesn't have to be huge), latency (it trains very quickly), or specialized hardware (they used bog-standard CPUs, not GPUs or TPUs).
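
To make that concrete, here is a toy sketch (mine, not code from the paper) of the range-index flavor of the idea: treat the key-to-position mapping of a sorted array as a CDF, fit a tiny model to it, and correct the prediction with a bounded local search. The paper uses small staged neural nets; a least-squares line stands in here, and all names are made up.

    import bisect

    class LearnedIndex:
        def __init__(self, sorted_keys):
            self.keys = sorted_keys
            n = len(sorted_keys)
            # Fit position ~ a*key + b by least squares (a stand-in for the
            # paper's small staged models).
            mean_k = sum(sorted_keys) / n
            mean_p = (n - 1) / 2
            cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(sorted_keys))
            var = sum((k - mean_k) ** 2 for k in sorted_keys) or 1.0
            self.a = cov / var
            self.b = mean_p - self.a * mean_k
            # Worst-case prediction error over the data, so lookups stay exact.
            self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(sorted_keys))

        def _predict(self, key):
            return int(self.a * key + self.b)

        def lookup(self, key):
            n = len(self.keys)
            guess = min(max(self._predict(key), 0), n - 1)
            lo = max(guess - self.max_err, 0)
            hi = min(guess + self.max_err + 1, n)
            # Binary search only inside the error window around the prediction.
            i = bisect.bisect_left(self.keys, key, lo, hi)
            return i if i < n and self.keys[i] == key else None

    keys = sorted(x * x for x in range(1, 1000))  # skewed but learnable distribution
    idx = LearnedIndex(keys)
    assert idx.lookup(250000) == keys.index(250000)
    assert idx.lookup(7) is None

The model only has to be good enough to shrink the search window; correctness comes from the bounded search, not from the model.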

This is obviously just my opinion, but I think it's pretty big. It's not big because of the paper itself, although that's amazing in isolation (they improved on hashtables!). As I said above, it _points the way_. They have a throwaway sentence about co-optimizing data layout along with lookup, without showing those results. My guess is that it beats a hashtable in every way. More importantly, if a model can beat bloom filters, b-trees, and hashtables, you'd better hold onto your socks, because most algorithms are just waiting to fall. To me this paper is Deep Blue vs Kasparov. We all know what comes next.
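
On bloom filters specifically, the construction is worth spelling out, because it shows how the "no false negatives" guarantee survives the model: a classifier screens queries, and a small conventional Bloom filter built over the classifier's false negatives catches whatever the model misses. A toy sketch of that structure (mine; the "trained" model here is a fake stand-in):

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1 << 16, num_hashes=4):
            self.size, self.num_hashes = size_bits, num_hashes
            self.bits = bytearray(size_bits)

        def _positions(self, item):
            for i in range(self.num_hashes):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.size

        def add(self, item):
            for p in self._positions(item):
                self.bits[p] = 1

        def __contains__(self, item):
            return all(self.bits[p] for p in self._positions(item))

    class LearnedBloomFilter:
        def __init__(self, keys, model, threshold=0.5):
            self.model, self.threshold = model, threshold
            self.backup = BloomFilter()
            # Any real key the model would reject goes into the backup filter,
            # so membership queries can never return a false negative.
            for k in keys:
                if model(k) < threshold:
                    self.backup.add(k)

        def __contains__(self, item):
            return self.model(item) >= self.threshold or item in self.backup

    # Hypothetical stand-in for a trained classifier: pretend it has learned
    # that real keys tend to be even numbers.
    model = lambda x: 0.9 if x % 2 == 0 else 0.1
    keys = [2, 4, 6, 7, 8]              # 7 is a key the model gets wrong
    lbf = LearnedBloomFilter(keys, model)
    assert all(k in lbf for k in keys)  # never a false negative

The win comes when the model cheaply rejects most true negatives, so the backup filter only has to cover the handful of keys the model misses and can be much smaller than a filter over the whole key set.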


It sounds like a useful technique, but I'm saying that what you're claiming is a far cry from what the paper claims. They're describing a fruitful research direction and outlining how you might get around certain problems (evaluating models is slow, indexes need retraining on inserts, etc.).

Also, predicting power usage is a different problem than scheduling. It's not "managing Borg" by any stretch of the imagination. I'm not saying that's impossible, but it's not what was claimed.

Also, what did Deep Blue lead to? I'm not saying it wasn't impressive, but it didn't generalize AFAIK. You are making a huge leap and claiming that a technique that isn't even deployed generalizes.

What did Watson lead to?

https://www.technologyreview.com/s/607965/a-reality-check-fo...

https://respectfulinsolence.com/2017/09/18/ibm-watson-not-li...

https://www.statnews.com/2017/09/05/watson-ibm-cancer/


Link to paper that cedes control of Borg?



Slight correction: they let an ML algorithm control the cooling system in the data center, not Borg. Borg is what schedules Google's software tasks; that was not changed during this experiment.


The article you link ends on a pretty weak claim: math didn't obsolete biologists and ML won't obsolete systems folks, but every time you write a heuristic you would probably get better results from a model.


A heuristic is a model. A successful heuristic encodes knowledge of the distributions at hand. Heuristics are designed by people who had good insight into the data rather than learned directly from data, but they are models in the same sense.


It's not anywhere near as simple as "replace heuristic with model". This paper is a good one to disabuse yourself of that notion:

https://research.google.com/pubs/pub43146.html

This is from the team at Google with the longest experience deploying machine learning (early 2000's, pre-deep learning).

(Contrary to popular belief, machine learning was basically unused in Google's search ranking until pretty recently.)


Can you please give some examples of resource misallocations?

AFAIK storage is not the system bottleneck it used to be. We always want more, but network and cores are relatively plentiful.

If we could magically (and safely) modify the software stack, which areas could give 2x or 3x improvements?


As far as Google goes, the easiest place to get better end-user latency/performance (2x-3x) is ... fixing the JavaScript.

I'm being totally serious. Backends are generally fast, and the backend engineers are performance-minded.

Front end engineers are not as cognizant of performance (somewhat necessarily, since arguably they have a harder problem to solve). Back in the mid-2000s Gmail/Maps/Reader days, Google had a lot of great JS talent, but it seems to have ceded some of that ground to Facebook and Microsoft.

If you have heard Steve Souders speak, he always mentions that he was a backend guy, until he actually measured latency and realized that the bottleneck is the front end. That was at Yahoo, but it's very much true for Google too.

http://stevesouders.com/bio.php

I would like to see a machine learning system rewrite JavaScript code to perform better and make UI more usable. I believe that's beyond the state of the art now, but it's probably not out of the question in the near future.

-----

As for scheduling, that was just one example of an important systems problem that hasn't been solved with machine learning. Not saying it can't be, of course. Just that this is a research direction and not a deployed system.

It's also important to note that there are plenty of other feedback-based/data-driven algorithms for resource management that are not neural nets. If neural nets work, then some simpler technique probably works too.
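
To be concrete about "simpler technique": something as plain as a proportional controller, roughly what horizontal autoscalers already do, is feedback-based and data-driven with no learning in it at all. A hypothetical sketch (all names and numbers made up):

    def proportional_controller(current_replicas, observed_cpu, target_cpu=0.6,
                                min_replicas=1, max_replicas=1000):
        # Scale the allocation by how far observed utilization is from the target.
        desired = current_replicas * (observed_cpu / target_cpu)
        return max(min_replicas, min(max_replicas, round(desired)))

    replicas = 10
    for cpu in [0.9, 0.8, 0.65, 0.58, 0.61]:  # utilization observed each interval
        replicas = proportional_controller(replicas, cpu)
        print(replicas)

If a baseline like that already captures most of the benefit, the burden is on the neural net to show it's worth the extra complexity.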


> I would like to see a machine learning system rewrite JavaScript code to perform better

Well sure, we all want a God compiler.


Network latency right now is the biggest issue we have. If we could magically (using your term here!) get computational resources and data dramatically closer to end users, it would easily give 2 or 3x improvement. Doing this safely of course means consistently in this case, and being able to solve things like safe replication of large data sets. I dunno how to do it, but you asked and that's the biggest thing I can think of.


So backend-to-frontend latency?

Or are there also plenty of server-to-server scenarios?


After re-reading your comment, are you referring to end-users inside the enterprise perimeter?

e.g. devs using a remote build system? A local workload that accesses a mostly-remote db?


Any resources you can recommend on the scheduling topic?


As mentioned, that was off the top of my head. Some of it is "inside baseball" at Google, but there are a bunch of published papers about cluster scheduling. This one is a good overview, and has numbers, evaluation, lessons learned, etc.:

https://research.google.com/pubs/pub43438.html

My overall point is that even if learned indexes replace traditional data structures (which is a big if), plenty of important systems problems remain. Some that I could think of:

- Fixing front end performance as mentioned in a sibling comment.

- System administration of a billion Android phones. The state of these seems pretty sorry.

- Auditing app stores for security bugs.

- Eliminating security bugs by construction. Hell, even just eliminating XSS.

- Software that runs transparently in multiple data centers. (Spanner is a big step in that direction, but not a panacea.)

I could name like 10 more... none of these have anything to do with learned indexes. The point is that the OP is generalizing from a really specific technique to some kind of hand-wavy "machine learning writes all the code" scenario.


Thanks!


I can't find the exact paper, but you can find a lot after reading this article popularizing the topic:

https://www.theverge.com/2016/7/21/12246258/google-deepmind-...

The future is here. :)



