The main differences are the potential delay before adding threads, and the use of ML. The "injector" immediately adds a thread at task-submission time if none are spare (and there's permitted headroom). Threads are then _pruned_ as they are seen to be wasting cycles.
Admittedly this approach was designed only to minimise the overhead of thread state management, whereas in principle the CLR approach can also respond to harmful competition for resources between tasks, assuming relatively consistent behaviour.
The downside, of course, is a potentially very slow ramp-up when the workload calls for a sudden increase in threads.
In Java, I assume the executor and/or injection algorithm can be swapped out for your own implementation, is that right?
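Something along these lines is what I have in mind: a rough sketch in plain java.util.concurrent, with all the sizes and timeouts made up, approximating the "add a thread if nothing is spare, prune it once idle" behaviour described above.

```java
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class InjectAndPrunePool {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0,                                              // keep no threads around when idle
                2 * Runtime.getRuntime().availableProcessors(), // "permitted headroom" (made-up cap)
                30, TimeUnit.SECONDS,                           // prune a worker after 30s of idleness
                new SynchronousQueue<>(),                       // no queue: hand off to a spare thread or start a new one
                r -> new Thread(r, "worker"),                   // custom thread factory, if you want one
                new ThreadPoolExecutor.CallerRunsPolicy());     // when at the cap, run on the submitting thread

        Future<Integer> answer = pool.submit(() -> 6 * 7);
        System.out.println(answer.get());                       // 42
        pool.shutdown();
    }
}
```

With a SynchronousQueue there's no work queue at all: a submission either lands on an idle worker or forces a new thread up to the cap, and the keep-alive timeout does the pruning. The caller-runs policy is just one way of handling the case where the cap is hit.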
In cases where you know you are going to be running very many jobs on your thread pool, you can get improved throughput by assigning each thread a specific job type, so that it can blast through many jobs of the same type with better locality and less overhead. You can also lean on the underlying platform thread pool to spin up the job-type workers and run them for a bit, to avoid managing thread counts yourself.
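A rough sketch of that idea, in Java for concreteness (JobType, the queues and the plumbing are made-up names, not anything from a real framework): each job type gets its own queue, and a worker borrowed from the shared pool drains a run of same-typed jobs before handing the thread back.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical job types; stand-ins for whatever your workload actually has.
enum JobType { PARSE, COMPRESS }

class TypedJobRunner {
    // Borrow the JVM's shared pool rather than managing thread counts ourselves.
    private final ForkJoinPool platformPool = ForkJoinPool.commonPool();
    private final Map<JobType, ConcurrentLinkedQueue<Runnable>> queues = new EnumMap<>(JobType.class);
    private final Map<JobType, AtomicBoolean> draining = new EnumMap<>(JobType.class);

    TypedJobRunner() {
        for (JobType t : JobType.values()) {
            queues.put(t, new ConcurrentLinkedQueue<>());
            draining.put(t, new AtomicBoolean(false));
        }
    }

    void submit(JobType type, Runnable job) {
        queues.get(type).add(job);
        // If nobody is currently working this type, borrow a pool thread to drain it.
        if (draining.get(type).compareAndSet(false, true)) {
            platformPool.submit(() -> drain(type));
        }
    }

    private void drain(JobType type) {
        ConcurrentLinkedQueue<Runnable> q = queues.get(type);
        try {
            Runnable job;
            while ((job = q.poll()) != null) {
                job.run();                      // long runs of a single job type: better locality
            }
        } finally {
            draining.get(type).set(false);
            // Re-check in case a job arrived just as the queue looked empty.
            if (!q.isEmpty() && draining.get(type).compareAndSet(false, true)) {
                platformPool.submit(() -> drain(type));
            }
        }
    }
}
```

The drain loop is what gives you the long runs of one job type; the compare-and-set just ensures only one thread is draining a given type at a time.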
To a degree, if you're using an existing parallel framework it may do this for you, but at least on the CLR the options there, like Parallel.For, aren't so great: lots of abstraction overhead that will show up in profiles.
They have added things like work-stealing to it, which might help with task locality in some scenarios.
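For comparison, the work-stealing scheduler on the JVM is ForkJoinPool; a small divide-and-conquer sum shows the pattern (this is just the standard JVM idiom, not anything from the CLR implementation).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Work-stealing illustration: each task forks subtasks onto its own worker's deque,
// and only idle workers steal from the far end, which tends to keep related work local.
class SumTask extends RecursiveTask<Long> {
    private final long[] data;
    private final int from, to;

    SumTask(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {               // small enough: sum directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                            // pushed onto this worker's deque (stealable)
        return right.compute() + left.join();   // work one half here while the other may be stolen
    }
}

// Usage: long total = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
```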
It'll be interesting to see how the Mono implementation compares.
You can use processes to get around this, but it's often not as performant or convenient. Computer hardware shifted to multi-core a decade ago, and Python was left behind because of that little bit of technical debt.
It used to be my favorite language, but I've since moved on to Go, both for the better performance and multi-core story, and for the static typing.