Real RAM clouds already exist: Bigtable and similar systems have durable in-memory tables that can take advantage of SSDs as well.
Sure -- the linked-to paper is a position paper that basically just explains the problem they are trying to solve, not the specific details of their solution (the project is just beginning).
As for your particular example: presumably maximum throughput involves batching requests together, so the need to achieve that performance for small frames is reduced. You can also use Valiant Load Balancing or similar techniques to avoid the need to achieve the necessary cross-sectional bandwidth with a single switch.
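To make the load balancing point concrete, here is a rough sketch of the idea behind Valiant Load Balancing on a full mesh: each flow is split evenly across every node as an intermediate hop, so no single link has to carry a whole flow. The node count, the traffic pattern, and the function names are my own assumptions for illustration; nothing here comes from the paper.

```python
from collections import defaultdict

def direct_link_loads(traffic):
    """Load per directed link when every flow takes the direct src->dst link."""
    loads = defaultdict(float)
    for (src, dst), rate in traffic.items():
        if src != dst:
            loads[(src, dst)] += rate
    return loads

def vlb_link_loads(traffic, n):
    """Load per directed link when each flow is split evenly over all n
    nodes as intermediates (fluid model of two-hop Valiant routing)."""
    loads = defaultdict(float)
    for (src, dst), rate in traffic.items():
        share = rate / n
        for mid in range(n):
            if mid != src:          # first hop: src -> mid
                loads[(src, mid)] += share
            if mid != dst:          # second hop: mid -> dst
                loads[(mid, dst)] += share
    return loads

# Hypothetical worst case for direct routing: each of 8 nodes sends
# its full rate (1.0) to a single neighbour.
N = 8
traffic = {(i, (i + 1) % N): 1.0 for i in range(N)}

direct_max = max(direct_link_loads(traffic).values())
vlb_max = max(vlb_link_loads(traffic, N).values())
print(f"max link load, direct: {direct_max}")
print(f"max link load, VLB:    {vlb_max}")
```

The trade-off is that every packet takes up to two hops instead of one, which helps aggregate bandwidth but does nothing for per-request latency.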
I think the latency number (5-10 microseconds) is actually more interesting: you can use batching and load balancing to improve throughput, but not latency. Given that 5-10 microseconds is typically significantly below the port-to-port forwarding time for a single switch, achieving that latency figure will require work at many different levels of the stack (network hardware, kernel <=> user space transitions, etc.).
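A toy model of why batching helps throughput but not latency, assuming each message costs a fixed overhead plus a per-request cost. The 5 us and 0.5 us figures are made-up placeholders, not measurements, and `batch_stats` is just a name I picked.

```python
def batch_stats(batch_size, fixed_us=5.0, per_item_us=0.5):
    """Return (throughput in requests per microsecond, service time in
    microseconds) for one batch under a fixed-plus-per-item cost model.
    The default costs are illustrative assumptions only."""
    service_us = fixed_us + batch_size * per_item_us
    throughput = batch_size / service_us
    return throughput, service_us

for b in (1, 4, 16, 64):
    tput, lat = batch_stats(b)
    print(f"batch={b:3d}  throughput={tput * 1e6:12.0f} req/s  "
          f"latency >= {lat:5.1f} us")
```

Bigger batches amortize the fixed cost, so throughput climbs, but every request in the batch waits at least as long as the whole batch takes to process. The fixed cost itself is the part that has to shrink to get anywhere near 5-10 microseconds.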