
This is extremely promising. Makes you wonder whether we ought to go further and implement a machine-learning-based scheduler that studies and anticipates workloads and schedules accordingly, so that jobs complete as quickly as possible.



Why machine learning? The paper says it was the need to handle the complexities of modern hardware that caused the scheduler to fail at its fundamental job.

Machine learning sounds like it would add even more complexity, but perhaps I just fail to see why it's a good idea here. I don't see how you can predict workloads based on historical data, and if you're going to try to predict them from the binaries themselves, the overhead would likely outweigh any benefit you would get.

I'll admit I am no expert in machine learning but I have a hard time understanding why you would look at this problem and think machine learning is the solution.


The only reason is that machine learning is fashionable at the moment and a lot of people like to suggest it as a panacea. I can't even imagine the terrible overhead that machine learning would impose on what is probably the most critical part of the OS.


Prediction can be extremely fast.


But learning is always slow. What is the point of machine learning if you disable learning?


Learning doesn't need to be done in realtime... it can happen async.
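Roughly this kind of split, as a toy sketch (not a real scheduler; the per-task table and the decaying-average "training" below are made up purely for illustration): a background thread folds observations into a shared model, while the hot path only does a read and a lookup.

    use std::sync::{Arc, RwLock};
    use std::thread;
    use std::time::Duration;

    // Toy "model": predicted CPU burst length per task slot, in microseconds.
    type Model = Vec<u64>;

    fn main() {
        let model: Arc<RwLock<Model>> = Arc::new(RwLock::new(vec![1_000; 64]));

        // "Learning" runs off the hot path, on its own thread, folding
        // (fake) observations into the table every 100 ms.
        let trainer = Arc::clone(&model);
        thread::spawn(move || loop {
            {
                let mut m = trainer.write().unwrap();
                for slot in m.iter_mut() {
                    *slot = (*slot * 9 + 1_200) / 10; // decay toward a pretend observation
                }
            }
            thread::sleep(Duration::from_millis(100));
        });

        // "Prediction" on the scheduling path is just a read plus a lookup.
        for _ in 0..5 {
            let predicted = model.read().unwrap()[42];
            println!("predicted burst for task 42: {} us", predicted);
            thread::sleep(Duration::from_millis(50));
        }
    }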


Go for it; historically there have been some simple Linux schedulers, and it's not difficult to hack on.

There was a "genetic algorithm" patch years back, it basically identified busier processes and preferred to give them cycles. It kind of sucked as less busy processes would starve. We are at an interesting place now, we have a good scheduler with o(1) semantics and it's fair, but it's leaving cycles on the floor on certain hardware with certain workloads.

I would think a machine learning approach would be more expensive and it could potentially be difficult to explain why it was doing a certain thing.

There is a long-held belief that a good Linux scheduler shouldn't have tuning knobs and that we shouldn't need to select the scheduler at build time. I could see that belief coming to an end as we run on everything from smartphones up to supercomputers, with performance and energy being key issues on both ends of the spectrum. Some of the more exotic heterogeneous hardware is becoming very popular too, and that may raise the question even more.


According to this paper, they dropped the O(1) scheduler a few years ago for CFS (the Completely Fair Scheduler), which is O(log n) but exhibits nicer behavior in other ways.
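For anyone who hasn't looked at it: CFS keeps runnable tasks in a red-black tree ordered by vruntime and always picks the leftmost (least-run) task, which is where the O(log n) comes from. A rough user-space sketch of the idea (a BTreeMap standing in for the kernel's rbtree; the task names and numbers are made up):

    use std::collections::BTreeMap;

    // Keyed by (vruntime, pid) so two tasks with equal vruntime don't collide.
    struct RunQueue {
        tasks: BTreeMap<(u64, u32), &'static str>,
    }

    impl RunQueue {
        fn enqueue(&mut self, vruntime: u64, pid: u32, name: &'static str) {
            self.tasks.insert((vruntime, pid), name); // O(log n) insert
        }

        // CFS-style pick: the runnable task that has run the least so far.
        fn pick_next(&mut self) -> Option<(u64, u32, &'static str)> {
            let key = *self.tasks.keys().next()?; // leftmost entry
            let name = self.tasks.remove(&key)?;
            Some((key.0, key.1, name))
        }
    }

    fn main() {
        let mut rq = RunQueue { tasks: BTreeMap::new() };
        rq.enqueue(3_000, 17, "kworker");
        rq.enqueue(1_000, 42, "firefox");
        rq.enqueue(2_000, 99, "bash");
        println!("{:?}", rq.pick_next()); // Some((1000, 42, "firefox"))
    }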


"I don't see how you can predict workloads off of historical data"

... Think about that part, then :)


Yeah and while we're at it, we can rewrite the whole thing on top of the JVM-- get some killer constant folds goin' on.

Servers usually have super long uptimes, too. So the only valid criticism of JIT, which is warm up, will be a nonfactor!

(Big /s if not obvious. JIT and ML evangelists make me sick.)


I wouldn't be so hard on JIT. JIT algorithms can massively speed up solutions to particular problems. For example, grep uses libpcre, which is one of the fastest regex engines around, and it uses JIT compilation.


This is kind of misleading. grep has an option to enable PCRE matching (the -P flag), but by default it still uses a combination of Boyer-Moore, Commentz-Walter (similar to Aho-Corasick) and a lazy DFA.

Also, Rust's regex engine (which is based on a lazy DFA) is giving PCRE2's JIT a run for its money. ;-) https://gist.github.com/anonymous/14683c01993e91689f7206a186...


Ah, okay. I didn't realize that. I was trying to point out there are everyday valid use cases for JIT that don't involve the JVM.

I've used Rust's regex engine before; it's very promising. It's really neat how the regex itself is fully compiled.
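If anyone wants to try it, the basic API is tiny (the pattern here is just a toy example; regex = "1" in Cargo.toml):

    use regex::Regex;

    fn main() {
        // The pattern is compiled once, up front; matching afterwards just
        // runs the compiled form against the input.
        let re = Regex::new(r"(?i)\berror\b: \d+").unwrap();

        assert!(re.is_match("ERROR: 42 while parsing input"));
        assert!(!re.is_match("all good here"));

        if let Some(m) = re.find("log line: error: 7") {
            println!("matched {:?} at {}..{}", m.as_str(), m.start(), m.end());
        }
    }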


Yeah, you're absolutely right to point out PCRE2's JIT. It is indeed quite fast; I'm not entirely convinced a DFA approach can beat it in every case!


I love how I got super downvoted for an idea that is quite possibly central to the future of computing. It's as if HackerNews never realized that they themselves are machine-learning-based scheduling algorithms. Guess that one is going to hit them pretty hard.


This will hopefully be a more useful critique of your idea.

This wouldn't be practical unless it could be optimized "to the nines" to be very fast. The previous Linux scheduler ran in O(1); the new Completely Fair Scheduler runs in O(log N). The scheduler has to be able to run every CPU tick, so it has to be fast. A machine-learning-based scheduler does not sound like it could be made that fast. To put this into perspective, on a regular x86_64 machine the Linux scheduler tick runs at 1000 Hz.
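To put a number on that budget (the 10 µs inference cost below is invented just to make the arithmetic concrete):

    // Back-of-envelope only; the inference cost is a made-up figure.
    const HZ: u64 = 1_000;                   // CONFIG_HZ=1000 -> tick every 1 ms
    const TICK_NS: u64 = 1_000_000_000 / HZ; // 1,000,000 ns per tick
    const INFER_NS: u64 = 10_000;            // suppose a model inference costs 10 us

    fn main() {
        // 10,000 ns out of every 1,000,000 ns = 1% of every core spent deciding
        // what to run, before the scheduler does any of its normal bookkeeping.
        println!("overhead: {}%", 100 * INFER_NS / TICK_NS);
    }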


Training is expensive; prediction is fast.





