What I'm personally really surprised about is that nobody's rewriting Redis as a unikernel to clear away all the OS context-switching/networking overhead from its basic operations.
Redis is already leaving plenty of performance on the table, e.g. by not having any concurrent shared-memory data structures (the fastest concurrent hash tables achieve better throughput, even on inserts, than the fastest single-threaded ones). It does this in the name of implementation simplicity. People focused on implementation simplicity don't generally abandon the operating system.
People run Redis mostly in Linux-in-a-VM (with Redis being the only "user-serving" process) already, though, no? I would think Redis-as-the-entire-VM would be less to think about, operation-wise, at least if your cloud or data-center templates its VMs with something like EC2 AMIs. You would just launch a "Redis appliance" AMI and move on.
It feels less like maintaining boxes and more like paying a Redis-as-a-Service provider.
The "Why" is @seppo010's to answer (but having it run as-is on all OSs is a big plus, for one). As for writing it in Assembler, that makes little practical sense since Redis is written in (ANSI) C and it is quite well optimized. In fact, if you profile Redis you'll see that very little time is actually spent in the code itself - the OS, storage and network are usually the real bottlenecks.
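To get a rough feel for that last point without touching Redis at all, here is a minimal Python sketch. It compares an in-process dict lookup against a request/response round trip over a loopback TCP socket. It is not a Redis profile: the function names (`echo_server`, `time_dict_lookups`, `time_loopback_round_trips`) and the 3-byte payload are made up for illustration, and the numbers will vary by machine.

```python
import socket
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo bytes back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def time_dict_lookups(n: int) -> float:
    """Average seconds per in-process dict lookup (no syscalls involved)."""
    table = {b"key": b"value"}
    start = time.perf_counter()
    for _ in range(n):
        table[b"key"]
    return (time.perf_counter() - start) / n


def time_loopback_round_trips(n: int, payload: bytes = b"key") -> float:
    """Average seconds per request/response over a loopback TCP socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection(listener.getsockname())
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(n):
        client.sendall(payload)
        reply = b""
        while len(reply) < len(payload):  # recv may return a partial reply
            chunk = client.recv(len(payload) - len(reply))
            if not chunk:
                raise ConnectionError("echo server closed the connection")
            reply += chunk
    elapsed = time.perf_counter() - start

    client.close()           # server sees EOF and exits its loop
    server.join(timeout=5)
    listener.close()
    return elapsed / n


if __name__ == "__main__":
    lookup = time_dict_lookups(100_000)
    round_trip = time_loopback_round_trips(2_000)
    print(f"dict lookup:         {lookup * 1e9:10.0f} ns")
    print(f"loopback round trip: {round_trip * 1e9:10.0f} ns")
```

Every round trip here crosses the kernel's socket layer twice on each side, which is the kind of overhead a unikernel would be trying to trim; the lookup itself is the part that hand-written Assembler could speed up.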