• The patch automatically creates a task group for each TTY.
• The patch automatically assigns each new process to the task group for its controlling TTY.
• When there are many CPU-bound jobs (more than the number of cores), the latency of interactive jobs is vastly improved.
I think the piece I'm missing is the behavior of the scheduler. Does it now make its decisions based on task group cpu consumption instead of process? I saw options to that effect back around 2.6.25.
Why is this an improvement over just nicing the "make -j64" into the basement and letting the interactive jobs have their way as needed? (Likely possibilities are that it is automatic, or maybe there is something about disk IO scheduling happening from the task groups as well.)
It includes this discussion:
No, it won't. The target audience is those folks who don't _do_ the configuration they _could_ do, folks who don't use SCHED_IDLE or nice, or the power available through userspace cgroup tools... folks who expect their box to "just work", out of the box.
Yes, and yes. Previously you could set things like this up explicitly, this makes it (optionally) automatic.
> Why is this an improvement over just nicing the "make -j64" into the basement
That gives you... I think 10% less CPU weight per level, so you can get down to 13% of base. So your 64 processes will still weigh about 8x one base process.
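That arithmetic can be sketched quickly. This is only a model of the comment's rough figures (about 10% less CPU weight per nice level), an approximation of the CFS weight table rather than the kernel's exact values:

```python
# Rough model: each nice level costs ~10% of a task's CPU weight
# (an approximation of the CFS weight table, not exact kernel values).
def cpu_weight(nice):
    return 0.90 ** nice

base = cpu_weight(0)    # one nice-0 interactive process
job = cpu_weight(19)    # one maximally niced make job

print(round(job / base, 2))        # ~0.14 of base weight
print(round(64 * job / base, 1))   # 64 niced jobs: ~8.6x one base process
```

So even at nice 19, a 64-job compile collectively outweighs a single interactive process by nearly an order of magnitude, which is the commenter's point.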
This lets you consider everything spawned from your terminal as one group, and everything from your X session as another, so your compile processes collectively (no matter how many you have) weigh as much as your GUI processes collectively weigh.
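For reference, this is roughly what the patch automates. A minimal manual sketch using the cgroup-v1 cpu controller (the mount point /sys/fs/cgroup/cpu and the group name "build" are illustrative assumptions; distributions vary):

```sh
# Put everything spawned from this shell into its own CPU group,
# so the whole group competes as one unit against other groups.
sudo mkdir /sys/fs/cgroup/cpu/build
echo $$ | sudo tee /sys/fs/cgroup/cpu/build/tasks
# Children (e.g. a make -j64 started here) inherit the group, so the
# compile collectively weighs as much as any other single group.
make -j64
```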
I don't have enough context to fully follow it, but it sounds like it sets up a better heuristic for grouping related processes into task groups in the scheduler.
By the way, your post is the single most obvious statement I've read this year. You got upvoted just because you capitalized some words?
(TBH I have no idea why I got upvoted, it wasn't that insightful, but I stick by what I said)
(EDIT: I'm talking about gnome-startup. That's a stupid regression that never should've happened. The mplayer performance bug is totally understandable if you're mucking with the scheduler. What we really need is for someone (distros?) to pick up cgroups and provide a nice UI for it, some sane but nondestructive defaults, etc. Until then, this is a nice patch that keeps badly behaving programs from dragging down the entire system. At the very least, we mostly get user separation in multi-user environments.)
As long as the fixed overhead of the patch is small (which the linked thread seems to indicate) this should be a sizable win for desktop Linux boxes without much downside for server loads.
In recent weeks and months there has been quite a bit of work toward improving the responsiveness of the Linux desktop, with some very significant milestones building up recently and new patches continuing to come. This work is greatly improving the Linux desktop experience when the computer is under heavy CPU load and memory strain. Fortunately, the exciting improvements are far from over. There is a new patch that has not yet been merged but has already gone through a few revisions over the past several weeks; it is quite small -- just over 200 lines of code -- but it does wonders for the Linux desktop.
The patch in question is designed to automatically create task groups per TTY in an effort to improve desktop interactivity under system strain. Mike Galbraith wrote the patch, now in its third revision, after Linus Torvalds suggested the idea. In its third form, the patch adds 224 lines of code to the kernel's scheduler and removes nine, for a total of 233 changed lines.
Tests done by Mike show the maximum latency dropping by more than a factor of ten and the average desktop latency by about a factor of sixty. Linus Torvalds has already heavily praised this miracle patch in an email.
Yeah. And I have to say that I'm (very happily) surprised by just how small that patch really ends up being, and how it's not intrusive or ugly either.
I'm also very happy with just what it does to interactive performance. Admittedly, my "testcase" is really trivial (reading email in a web-browser, scrolling around a bit, while doing a "make -j64" on the kernel at the same time), but it's a test-case that is very relevant for me. And it is a _huge_ improvement.
It's an improvement for things like smooth scrolling around, but what I found more interesting was how it seems to really make web pages load a lot faster. Maybe it shouldn't have been surprising, but I always associated that with network performance. But there's clearly enough of a CPU load when loading a new web page that if you have a load average of 50+ at the same time, you _will_ be starved for CPU in the loading process, and probably won't get all the http requests out quickly enough.
So I think this is firmly one of those "real improvement" patches. Good job. Group scheduling goes from "useful for some specific server loads" to "that's a killer feature".
A Phoronix reader first tipped us off to this latest patch this morning: "Please check this out, my desktop will never be the same again, it makes a lot of difference for desktop usage (all things smooth, scrolling etc.)...It feels as good as Con Kolivas's patches."
Not only is this patch producing great results for Linus, Andre Goddard (the Phoronix reader reporting the latest version), and other early testers, but we are finding this patch to be a miracle too. While in the midst of some major OpenBenchmarking.org "Iveland" development work, I took a few minutes to record two videos that demonstrate the benefits solely of the "sched: automated per tty task groups" patch. The results are very dramatic. UPDATE: There's also now a lot more positive feedback pouring in on this patch within our forums with more users now trying it out.
This patch has been working out extremely well on all of the test systems I have tried it on so far, from quad-core AMD Phenom systems to Intel Atom netbooks. The two videos were recorded on a system running Ubuntu 10.10 (x86_64) with an Intel Core i7 970 "Gulftown" processor, which boasts six physical cores plus Hyper-Threading to provide the Linux operating system with twelve total threads.
The Linux kernel was built from source using Linus's 2.6 Git tree as of 15 November, which is nearing the Linux 2.6.37-rc2 state. The only change made to the latest Git code was applying Mike Galbraith's scheduler patch. The patch allows the automated per-TTY task grouping to be toggled at run time by writing 0 or 1 to /proc/sys/kernel/sched_autogroup_enabled, or disabled at boot by passing "noautogroup" on the kernel command line. Changing the sched_autogroup_enabled value was the only system difference between the two video recordings.
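In practice the toggle described above looks like this (a sketch; it assumes a kernel with the patch applied, so the sysctl actually exists):

```sh
# Disable autogrouping at run time...
echo 0 | sudo tee /proc/sys/kernel/sched_autogroup_enabled
# ...and re-enable it.
echo 1 | sudo tee /proc/sys/kernel/sched_autogroup_enabled
# Or turn it off for the whole session by booting with the
# "noautogroup" kernel command-line parameter.
```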
Both videos show the Core i7 970 system running the GNOME desktop while playing back the 1080p Ogg version of the open Big Buck Bunny movie alongside glxgears, two Mozilla Firefox windows open to the Phoronix and Phoronix Test Suite websites, two terminal windows, the GNOME System Monitor, and the Nautilus file manager. The videos show how these applications respond under the load of compiling the latest Linux kernel with make -j64, so that 64 parallel make jobs completely utilize the Intel processor.
Chances are it already runs at least two, most probably four. It's not unreasonable to see 4- and 8-thread machines as the norm. Also keep in mind we are only considering x86s. SPARCs, IIRC, can do up to 64 on a single socket. ARM-based servers should follow a similar path.
BTW, a fully-configured Mac Pro does 12. A single-socket i7 machine can do 12. I have never seen a dual-socket i7, but I have no reason to believe it's impossible.
Considering that, -j64 seems quite reasonable.
There are dual-socket and even quad-socket 8-core hyperthreaded xeons (the Xeon L75xx series). A 1U Intel with 64 threads will set you back about $20k.
AMD has 12-core chips, so you can get 48 cores in 4 sockets there. (But I think they only have one thread per core)
Personally, I would spend a part of the money on 2048x2048 square LCD screens. They look really cool.
The kernel can multi-task processes, but each process still gets exclusive use of the CPU when it runs. So if it doesn't need an adder, that adder sits idle.
With hyperthreading you can run two threads at once on the same core; the CPU interleaves them at the instruction level, making maximum use of the core's execution units.
If you think that one extra concurrent job is enough to fill CPU utilization in the time that other jobs are blocking on iowait, then you are fine.
So, bottom line, factors to think about:
- your I/O throughput for writing the generated object files;
- the complexity of the code being compiled (template-rich C++ code has a much higher ratio of CPU usage to I/O);
- the number of cores in your system.
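Those factors can be folded into a rough rule of thumb for picking a -j value. This is only a sketch; the io_overlap factor is an assumption standing in for "how often jobs block on I/O", not a measured value:

```python
import os

def make_jobs(io_overlap=0.5):
    """Suggest a make -j value: one job per hardware thread, plus
    extra jobs to keep cores busy while other jobs block on I/O."""
    threads = os.cpu_count() or 1
    return threads + max(1, round(threads * io_overlap))

# e.g. on a 12-thread machine this suggests make -j18
print(make_jobs())
```

For template-heavy C++ (high CPU, little I/O waiting) you would shrink io_overlap toward zero; for I/O-bound builds you would grow it.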
Additionally, HT affects benchmark reproducibility which is already bad enough on multicore x86 with NUMA, virtual memory, and funky networks. (Compare to Blue Gene which is also multicore, but uses no TLB (virtual addresses are offset-mapped to physical addresses), has almost independent memory bandwidth per core, and a better network.)
Yes. Dunno about HT, never used a box with it.
Sometimes I run something heavy on my laptop and desktop freezes annoy me. If this patch will allow me to get around it - I would be glad to try it out.
Does anyone have a URL for a nice tutorial on compiling a new kernel for Ubuntu 10.10?
I use "sudo make menuconfig", which gives you a text-based menu; there may also be a graphical version. I would not recommend "sudo make config", as that only gives you a long list of questions to answer.
Anyway, the trick is to read all the documentation and stick with safe choices if you do not know what you are doing.
# sudo make xconfig
You only need root to copy vmlinux to /boot and copy the modules to /lib.
Recompiling a vanilla kernel (from the linux-next git) is a pretty easy process, but you may find that when running it you've lost some nice features of the release you run or odd things have stopped working.
One might get better results from using a vendor supplied release kernel source tree (installing the kernel sources package for the repo) and then applying a patch to add the new scheduling groups. Making a patch like that is probably too hard for a newbie, but I'd be surprised if someone on the ubuntu forums doesn't end up providing one sometime soon.
copy it to the source dir and name it as .config
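Putting the thread's advice together, the typical sequence looks roughly like this (a sketch: the source directory, patch file name, and paths are illustrative assumptions, not the thread's exact commands):

```sh
# Start from the running kernel's configuration.
cd ~/linux-2.6
cp /boot/config-$(uname -r) .config
# Apply the scheduler patch, then accept defaults for any new options.
patch -p1 < ../sched-autogroup.patch
make oldconfig
# Build as a normal user; root is only needed for installation.
make -j"$(nproc)"
sudo make modules_install install
```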
It's often not a problem to replace a release kernel with a vanilla kernel, but it can definitely change some behaviors or bite you if you're a special case or are using drivers not in the kernel tree.
That said, such a patch would be pretty rad.
FWIW, I had a similar configuration to yours (just an older MBP) and installing an SSD helped immensely. I can hit 200% CPU load and not even realize it until the fans kick in…
And as soon as I booted Windows XP in VMware, well, it took me a while to be able to reply to this post (after the VM settled).
I also realize you might be saying "d'oh", but I've had Gentoo running on this metal in a "similar" environment, and compiling stuff with -j4 doesn't freeze my UI.
My user experience with "OSX" is that it's way more prone to unresponsiveness due to load, but hey, who cares :P clicks Time Machine
Not that Anonymous Guy On HN is worth much, but get an SSD - it'll be the best upgrade you've ever purchased.
(Or maybe I just needed to sidegrade to Linux. :)
You can also get a bracket that replaces your optical drive and allows you to fit a 2.5" HDD. I have one in my 17" non-unibody MBP, and it's really the best of both worlds - I keep OS X, my working files and apps on the SSD, along with my main Windows XP web testing VM. My iTunes library, media, and games stay on the HDD, along with a Boot Camped copy of Windows 7 (though it's a pain to get the installer to run without an internal optical drive). I keep a cheap Samsung bus-powered DVD burner in my bag, but in reality I rarely need it. I think OWC sells a bracket for Macs, but if you can figure out exactly which bracket you need, a site called newmodeus.com sells them for almost every laptop ever made for considerably less.
I really do believe my SSD is the best upgrade I've ever spent money on; no computer I use from now on will be without one. It's not so much that the computer is faster; it's more the feeling that the computer does not grind to a halt or slow down, no matter what's going on. (I may have compared my computer to the Terminator amongst friends once or twice… it just doesn't slow down.)
SSDs do a lot to reduce load times, and thus make your computer seem much faster, but they do little to make your programs run full speed ahead constantly. Almost every application out there blocks on network connections or user input, or just plain throttles itself.
For that matter, I can max out my cores just using a couple dozen instances of mplayer, playing several movies at once off of a USB removable hard drive...
Virtual memory paging to/from disk. This is probably why the new MacBook Airs feel faster than the CPU+RAM specs suggest.
During standard home computer operation, both the CPU and the disk are generally quite idle.
Even with a perfect scheduler you're going to have to wait on I/O. Disk speeds are the limiting factor on most machines, and this goes double for laptops. I highly recommend getting an SSD.
This isn't to say that there aren't real limits or that BeOS was perfect (far from it) but simply that there's considerable room for improvement before we start hitting theoretical limits.
(A really fast expensive SSD is around $400.)
It wouldn't be very difficult to make; I would expect to see one in the next 24 hours or so.
Be wary of getting your kernel from a PPA though - consider it experimental.
Apply the patch before compiling and there you go.
Instead of compiling a single new kernel module (or downloading it prebuilt from an apt repo or a PPA) and kicking it in with modprobe, we now need to obtain the sources for the whole kernel, apply the patches, configure, build, and deploy. Sure, Debian/Ubuntu has that partially automated, but it's still a pain.
At least I'll wait for stock 2.6.38 on Ubuntu and cross my fingers they put this patch in.
: I Am Not A Kernel Hacker
Obviously, any change to the default behavior is going to require building a new kernel.
There may be good reasons for completely plugable schedulers, but this is not one of them.