TFA explains how to do it, and I've done it myself. You can pass a flag to the Linux kernel at boot that limits it to the first N cores; I usually use 2. The remaining cores sit completely idle: Linux will not schedule any threads on them.
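A minimal sketch of that boot-time setup, assuming the flag in question is isolcpus (the parameter named later in this thread) and that cores are numbered from 0. The helper names (parse_cpu_list, isolcpus_arg, isolated_cores) are hypothetical. The only real interfaces used are the isolcpus= kernel argument and /sys/devices/system/cpu/isolated, which recent kernels use to report the cores that actually ended up isolated.

```python
# Sketch: build an isolcpus= boot argument and read back which cores the
# running kernel isolated. Linux-only; helper names are hypothetical.
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse the kernel's CPU-list syntax, e.g. "0,2-4" -> {0, 2, 3, 4}."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no cores are isolated
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolcpus_arg(total_cores: int, housekeeping: int = 2) -> str:
    """Keep cores 0..housekeeping-1 for Linux and isolate all the others."""
    if not 0 < housekeeping < total_cores:
        raise ValueError("need at least one housekeeping and one isolated core")
    return f"isolcpus={housekeeping}-{total_cores - 1}"


def isolated_cores(sysfs: Path = Path("/sys/devices/system/cpu/isolated")) -> set[int]:
    """Cores the running kernel isolated (empty set if the file is missing)."""
    try:
        return parse_cpu_list(sysfs.read_text())
    except FileNotFoundError:
        return set()
```

On an 8-core box, isolcpus_arg(8) gives "isolcpus=2-7", which you would append to the kernel command line in your bootloader config; after a reboot, isolated_cores() should report cores 2 through 7.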
Then you build an app that works more or less like Snabb Switch, talking directly to the Ethernet adapter and bypassing the kernel (the distribution, whether Ubuntu, Fedora, etc., is completely irrelevant).
So you launch your app as a normal userland app and pin each of its threads to the remaining CPU cores however you want (I use one thread per core). Linux will not schedule its own threads or threads from any other process on those cores, so you own them completely; it will never context-switch to another thread.
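Here is a rough Python sketch of the one-thread-per-core layout. It relies on a Linux detail: sched_setaffinity() accepts a thread ID, so passing threading.get_native_id() pins only the calling thread rather than the whole process. run_pinned and its arguments are made-up names, and because of Python's GIL these threads won't truly run in parallel; a real packet-processing app would make the same call from C with pthread_setaffinity_np(). Treat it as an illustration of the affinity calls only.

```python
# Sketch: one worker thread per core, each pinned to its own core.
# Linux-only; run_pinned is a hypothetical helper, not a library API.
from __future__ import annotations

import os
import threading
from typing import Callable


def run_pinned(cores: list[int], work: Callable[[int], object]) -> dict[int, object]:
    """Start one thread per core, pin it there, run work(core), collect results."""
    results: dict[int, object] = {}
    lock = threading.Lock()

    def worker(core: int) -> None:
        # Pin this thread (identified by its kernel TID) to exactly one core.
        os.sched_setaffinity(threading.get_native_id(), {core})
        value = work(core)
        with lock:
            results[core] = value

    threads = [threading.Thread(target=worker, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, run_pinned([2, 3, 4, 5], handle_queue) would give each of four hypothetical queue handlers its own isolated core, while the main thread keeps its original affinity.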
That means when you SSH in, your session runs only on cores 1 and 2. The same goes for every other Linux process except your own. Other than consuming available memory bandwidth and potentially thrashing your L2/L3 cache, these other processes don't affect your app at all.
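One way to check this on a live box, sketched under the assumption of the standard /proc layout: field 39 (processor) of /proc/<pid>/task/<tid>/stat is the CPU each thread last ran on, so scanning it shows which threads have been touching your isolated cores. last_cpu and threads_on_cores are hypothetical helper names. Expect a few per-CPU kernel threads (ksoftirqd/N, migration/N and similar) to show up even with isolcpus, since those are bound to their core by design.

```python
# Sketch: list every thread whose most recent CPU is in a given set of cores.
# Linux-only; helper names are hypothetical.
from __future__ import annotations

import glob


def last_cpu(stat_line: str) -> int:
    """Return the 'processor' field (field 39) from a /proc/.../stat line."""
    # comm (field 2) is in parentheses and may contain spaces, so split after
    # the last ')'. The remaining fields start at field 3 (state).
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    return int(rest[39 - 3])


def threads_on_cores(cores: set[int]) -> list[tuple[int, str, int]]:
    """(tid, comm, cpu) for each thread last seen running on one of `cores`."""
    found = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                line = f.read()
        except OSError:
            continue  # the thread exited while we were scanning
        cpu = last_cpu(line)
        if cpu in cores:
            tid = int(line.split(" ", 1)[0])
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((tid, comm, cpu))
    return found
```

Running threads_on_cores({2, 3}) while your app is up should list only your own pinned threads plus those per-CPU kernel threads.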
Thus, even though you're running stock Linux, SSH and gdb work, and you've got a normal userland app, your app is the ONLY app running on the remaining cores, and it talks directly to the hardware. It's just as fast as doing everything without a kernel, except that it costs you 2 cores. IMO, the convenience is more than worth it.
This approach is so easy that there's really no reason not to do it. There have been many situations in the past where I wanted the performance of those single-app kernels, but it just wasn't worth the dev effort. That's no longer true.
Oh, and don't forget to set up IRQ affinity so that none of those cores handle interrupts.
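A sketch of that IRQ step, assuming root and the standard /proc/irq/<n>/smp_affinity_list files, which accept the same CPU-list syntax as isolcpus. pin_irqs and format_cpu_list are hypothetical names, and pin_irqs only reports what it would write unless dry_run=False. Some interrupts cannot be moved (the write fails with an error such as EIO), so failures are collected instead of aborting the loop.

```python
# Sketch: steer every movable IRQ onto the housekeeping cores.
# Needs root to actually write; Linux-only; helper names are hypothetical.
from __future__ import annotations

import glob


def format_cpu_list(cpus: set[int]) -> str:
    """Format {0, 2, 3, 4} as the kernel's CPU-list syntax "0,2-4"."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend j while the next core is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def pin_irqs(housekeeping: set[int], dry_run: bool = True) -> dict[str, str]:
    """Point each IRQ at `housekeeping`; return {path: outcome}."""
    target = format_cpu_list(housekeeping)
    outcome: dict[str, str] = {}
    for path in sorted(glob.glob("/proc/irq/[0-9]*/smp_affinity_list")):
        if dry_run:
            outcome[path] = f"would write {target}"
            continue
        try:
            with open(path, "w") as f:
                f.write(target)
            outcome[path] = f"wrote {target}"
        except OSError as exc:
            # Some IRQs refuse to move; record it and carry on.
            outcome[path] = f"failed: {exc.strerror or exc}"
    return outcome
```

Stop irqbalance first, or it may move the IRQs back. IRQs registered later take their initial affinity from /proc/irq/default_smp_affinity, which takes a hex bitmask rather than a CPU list.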
There is interesting research by Siemens that takes this kind of isolation a step further and uses virtualization extensions to isolate resources (cores, for example):
Have a look at cpuset. You can forcibly move kernel threads into a given set (even pinned threads), and then force that set onto whatever cores you want.
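For reference, the classic cgroup-v1 cpuset workflow looks roughly like this. The sketch assumes the v1 cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset (cgroup-v2-only systems use a different layout), and cpuset_plan, apply_plan and the set name are hypothetical. cpuset.mems has to be set before any task can join, and the tasks file takes one ID per write. Expect some kernel threads to refuse the move (the write fails with EINVAL), which is why apply_plan collects errors rather than stopping.

```python
# Sketch: move tasks into a cgroup-v1 cpuset confined to chosen cores.
# Mount point and helper names are assumptions; needs root for real use.
from __future__ import annotations

import os


def cpuset_plan(name: str, cpus: str, mems: str, pids: list[int],
                root: str = "/sys/fs/cgroup/cpuset") -> list[tuple[str, str, str]]:
    """Return the ordered (op, path, value) steps; nothing is executed."""
    base = os.path.join(root, name)
    plan = [
        ("mkdir", base, ""),
        ("write", os.path.join(base, "cpuset.cpus"), cpus),
        # mems must be non-empty before tasks can be attached.
        ("write", os.path.join(base, "cpuset.mems"), mems),
    ]
    # The tasks file accepts a single PID/TID per write.
    plan += [("write", os.path.join(base, "tasks"), str(pid)) for pid in pids]
    return plan


def apply_plan(plan: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Execute the plan; return (value, error) for every write that failed."""
    failures = []
    for op, path, value in plan:
        if op == "mkdir":
            os.makedirs(path, exist_ok=True)
            continue
        try:
            with open(path, "w") as f:
                f.write(value)
        except OSError as exc:
            # e.g. a kernel thread that cannot leave its current set
            failures.append((value, exc.strerror or str(exc)))
    return failures
```

Printing cpuset_plan("housekeeping", "0-1", "0", pids) before calling apply_plan is a cheap way to review exactly what will be written.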
In the past I tried migrating kernel threads. The migration did work, but the system became unstable, so I gave up on it.
Perhaps this is what you meant, but it's straightforward: simply disabling 'irqbalance' is an easy way to do it. Alternatively, you can configure it to cooperate with 'isolcpus' by using 'FOLLOW_ISOLCPUS'.
http://code.google.com/p/irqbalance returns a 403 for me, so I haven't been able to check.
I usually do what's described here:
Sometimes following ends up better (same cache), and sometimes isolating ends up being better.
Do you know of other resources for this approach? I have seen cpusets but haven't used them.