Ousterhout and gang are credible guys, but this is completely back-of-the-envelope vaporware. Some of the numbers are a bit dubious as well, like 1M requests/s per server. Say the requests are simple messages like Twitter's, with an average length of 200 bytes (including message overhead): that would need roughly 2 Gbit/s of cross-sectional switch bandwidth per server, all in small frames. Anything more interesting, say 20 KB web pages, would need roughly 200 Gbit/s of switch bandwidth, which is not going to happen anytime soon. It's no big deal to do 1M/s in process, but I'd love to see a real implementation that can do that over a commodity network.
Real RAM clouds already exist, as Bigtable etc. have durable in-memory tables that can take advantage of SSDs as well.
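For anyone who wants to check the arithmetic, here is a rough sketch. It assumes 8 bits per byte, 20 KB = 20,000 bytes, and no framing overhead beyond what is already counted in the message size; the function name is made up for illustration. The exact figures come out a little under the rounded ones quoted above.

```python
def required_gbit_per_s(requests_per_s: float, bytes_per_request: float) -> float:
    """Per-server bandwidth in Gbit/s for a given request rate and size.

    Assumes 8 bits per byte and no extra framing overhead.
    """
    return requests_per_s * bytes_per_request * 8 / 1e9


# 1M requests/s of 200-byte messages (Twitter-sized)
small = required_gbit_per_s(1_000_000, 200)
# 1M requests/s of 20 KB web pages (20 KB taken as 20,000 bytes)
large = required_gbit_per_s(1_000_000, 20_000)

print(f"200-byte messages: {small:.1f} Gbit/s")
print(f"20 KB pages:       {large:.1f} Gbit/s")
```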
Sure -- the linked-to paper is a position paper that basically just explains the problem they are trying to solve, not the specific details of their solution (the project is just beginning).
As for your particular example: presumably maximum throughput involves batching requests together, so the need to achieve that performance for small frames is reduced. You can also use Valiant Load Balancing or similar techniques to avoid having to get the necessary cross-sectional bandwidth out of a single switch.
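To make the load-balancing point concrete, here is a minimal sketch of the core idea behind Valiant Load Balancing: each flow is bounced off a randomly chosen intermediate switch, so no single switch has to carry the whole cross-sectional load. The switch names, flow list, and function names are hypothetical, and this is a toy model, not what the RAMCloud project proposes.

```python
import random
from collections import Counter


def vlb_path(src, dst, switches, rng):
    """Two-hop path: src -> randomly chosen intermediate switch -> dst."""
    mid = rng.choice(switches)
    return (src, mid, dst)


def intermediate_load(flows, switches, seed=0):
    """Count how many flows each intermediate switch ends up carrying."""
    rng = random.Random(seed)
    load = Counter()
    for src, dst in flows:
        _, mid, _ = vlb_path(src, dst, switches, rng)
        load[mid] += 1
    return load


# Hypothetical topology: 8 intermediate switches, 10,000 flows that all
# go from host "a" to host "b" (a worst case for direct routing).
switches = [f"sw{i}" for i in range(8)]
flows = [("a", "b")] * 10_000
load = intermediate_load(flows, switches)
for name in switches:
    print(name, load[name])
```

Even though every flow has the same source and destination, the traffic is spread roughly evenly across the intermediates, at the cost of an extra hop.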
I think the latency number (5-10 microseconds) is actually more interesting: you can use batching and load balancing to improve throughput, but not latency. Given that 5-10 microseconds is typically significantly below the port-to-port forwarding time for a single switch, achieving that latency figure will require work at many different levels of the stack (network hardware, kernel <=> user space transitions, etc.).
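A toy model of why batching helps throughput but not latency, assuming each batch pays a fixed overhead plus a per-request cost. The 10 microsecond overhead and 0.5 microsecond per-request cost are made-up illustrative values, not measurements, and `batch_stats` is a hypothetical helper.

```python
def batch_stats(batch_size, fixed_us, per_request_us):
    """Return (throughput in requests/s, time in microseconds to finish a batch).

    Every request in the batch waits for the whole batch, so the batch
    time is a lower bound on per-request latency.
    """
    batch_time_us = fixed_us + batch_size * per_request_us
    throughput = batch_size / batch_time_us * 1e6
    return throughput, batch_time_us


# Hypothetical costs: 10 us fixed overhead per batch, 0.5 us per request.
for b in (1, 10, 100):
    tput, lat = batch_stats(b, fixed_us=10.0, per_request_us=0.5)
    print(f"batch={b:3d}  throughput={tput:12.0f} req/s  latency>={lat:5.1f} us")
```

In this model, larger batches amortize the fixed overhead and raise throughput, but latency never drops below the fixed overhead and actually grows with batch size.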