The way high-rate networking is done in the HPC/supercomputing world is with user-level networking and OS bypass, not a "zero-copy modern OS".
Not really DPDK either, though I suppose there are similarities (disclaimer: I know very little about DPDK). Think InfiniBand (IB), not Ethernet. Demanding applications such as MPI libraries or Lustre are written directly against the IB verbs interface, not TCP/IP with the sockets API.
And yes, IB is designed such that the NIC HW can offload quite a lot, and the rest is indeed done in userspace without kernel involvement in the hot paths.
That being said, it's possible to run more or less the IB protocol stack on Ethernet hardware; it's called RoCE (RDMA over Converged Ethernet). Somewhat amazingly, latency is actually quite competitive with IB.
He mentioned hardware-accelerated packet routing. RoCE is not a routing protocol -- it's the IB transport carried in a UDP packet (in RoCE v2, at least), driven through IB verbs so the NIC can DMA directly to memory or devices on the PCIe bus and bypass the processor. But yes, RoCE is a really interesting protocol and seems to have won the RoCE versus iWARP war.
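To make the "UDP packet" part a bit more concrete, here is a rough Python sketch of how a RoCE v2 packet is framed as far as I understand it: a UDP header with destination port 4791, followed by the 12-byte IB Base Transport Header (BTH). The helper names (`pack_bth`, `pack_udp`) and the example values are made up for illustration. The sketch leaves out the Ethernet/IP headers, the payload, and the trailing ICRC, and in practice the NIC builds all of this in hardware, not software.

```python
import struct

ROCE_V2_UDP_PORT = 4791   # UDP destination port used by RoCE v2
RC_SEND_ONLY = 0x04       # BTH opcode: reliable connection, SEND Only


def pack_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Pack a 12-byte IB Base Transport Header (hypothetical helper).

    Layout assumed here: opcode (8 bits), SE/M/PadCnt/TVer (8 bits),
    P_Key (16), reserved (8) + destination QP (24),
    AckReq (1) + reserved (7) + packet sequence number (24).
    """
    flags = 0  # SE=0, M=0, PadCnt=0, TVer=0 in this sketch
    qp_word = dest_qp & 0xFFFFFF
    psn_word = ((1 << 31) if ack_req else 0) | (psn & 0xFFFFFF)
    return struct.pack("!BBHII", opcode, flags, pkey, qp_word, psn_word)


def pack_udp(src_port, payload_len):
    """Pack an 8-byte UDP header; the checksum is left as 0 in this sketch."""
    return struct.pack("!HHHH", src_port, ROCE_V2_UDP_PORT, 8 + payload_len, 0)


# Example values are arbitrary.
bth = pack_bth(RC_SEND_ONLY, dest_qp=0x12, psn=100, ack_req=True)
udp = pack_udp(src_port=49152, payload_len=len(bth))
frame = udp + bth  # what would sit inside the IP packet (minus payload and ICRC)
```

Because the outer headers are plain IP/UDP, ordinary routers can forward the packet, but the routing itself has nothing to do with RoCE.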