If you use C Ruby, you'll need one Ruby process per core, managed by something like Passenger (mod_rails) or Unicorn.
If you use JRuby, you'll need one Ruby process per machine (for N cores), managed by the JVM.
For boxes with a lot of cores, JRuby's larger memory footprint is outweighed by the ability to share that memory across all of those cores.
This story is also somewhat complicated by Ruby Enterprise Edition, which adds copy-on-write semantics to Ruby's GC, and is built by the same guys as Passenger, making it possible to share SOME memory between processes.
With all that said, we're really talking about marginal amounts of RAM. The real takeaway is that if you're running 6 processes per core (very common), you're doing something very wrong.
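To put rough numbers on the memory trade-off described above, here is a minimal sketch of the arithmetic. The per-process sizes and the core count are made-up figures, not measurements, and the method names are hypothetical; plug in numbers from your own app.

```ruby
# Hypothetical figures, for illustration only; measure your own app.
CORES = 8
MRI_PROCESS_MB = 150   # assumed resident size of one C Ruby worker
JVM_PROCESS_MB = 600   # assumed resident size of one JRuby process

# Total memory when every core needs its own process (C Ruby model).
def per_core_total_mb(cores, process_mb)
  cores * process_mb
end

# Smallest core count at which one larger shared process uses less
# memory than one smaller process per core. Returns nil if none is
# found up to max_cores.
def break_even_cores(per_core_mb, shared_mb, max_cores = 64)
  (1..max_cores).find { |n| per_core_total_mb(n, per_core_mb) > shared_mb }
end

puts "C Ruby, #{CORES} processes: #{per_core_total_mb(CORES, MRI_PROCESS_MB)} MB"
puts "JRuby, 1 process: #{JVM_PROCESS_MB} MB"
puts "Break-even core count: #{break_even_cores(MRI_PROCESS_MB, JVM_PROCESS_MB)}"
```

This ignores copy-on-write sharing between forked processes, which would shrink the C Ruby total somewhat.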
FYI: At some point in the future (1.2?), Rubinius will also be able to run a single Ruby process per machine.
Concurrent GC is not currently planned, but it should be noted that our generational GC does wonders to reduce GC pause time anyway.
Additionally, even Hotspot typically defaults to its stop-the-world GC. This is because a concurrent GC typically spreads part of the GC time around, i.e., overall performance is slower in exchange for reduced GC pause time. But even a concurrent GC typically has to stop all threads at some point to get everything consistent.
You will get some benefit from having more processes than cores, though perhaps not six.
Ruby's "stop the world" garbage collector means a process will completely pause during GC, including all threads. You may be better off having twice as many processes at half the ulimit. Fewer threads will be paused at once, and there are fewer objects the VM has to traverse during a GC run.
It's something to investigate when you tune your app.
(This mostly applies to 1.8. I haven't investigated 1.9.)
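One rough way to investigate the "fewer objects to traverse" point is to time a forced GC run while different numbers of objects are still reachable. This is only a sketch: `gc_time_with` is a hypothetical helper, the object counts are arbitrary, and recent C Ruby versions use a different collector than 1.8, so treat the numbers as a starting point for your own tuning rather than a benchmark.

```ruby
require "benchmark"

# Time one forced GC run while `live_objects` small strings are still
# reachable. Returns the elapsed time in seconds as a Float.
def gc_time_with(live_objects)
  heap = Array.new(live_objects) { |i| "object #{i}" }
  GC.start                                   # settle the heap before timing
  elapsed = Benchmark.realtime { GC.start }  # the collection being timed
  heap.size                                  # keep `heap` reachable until here
  elapsed
end

[100_000, 200_000, 400_000].each do |n|
  printf("%8d live objects: %.2f ms\n", n, gc_time_with(n) * 1000)
end
```

Timings vary between runs and machines, so compare the trend across heap sizes, not individual values.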