You could use the shared ring buffer scheme to replace essentially all syscalls, and anything syscall-like, in any context. Think GPU drivers, memory controllers, etc. It could be a universal interface for all communication that has to cross some kind of expensive security barrier.
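To make the scheme concrete, here is a rough toy model in Python of a submission ring plus a completion ring. Everything in it (`Ring`, `service`, the `(tag, op, args)` request shape, the `handlers` table) is made up for illustration and is not any real kernel's API; a real implementation would live in shared memory and need atomics and memory barriers on `head`/`tail`.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring (toy stand-in for shared memory)."""

    def __init__(self, size):
        # Power-of-two size so an index can be wrapped with a mask.
        assert size > 0 and size & (size - 1) == 0
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next slot to consume; only the consumer advances it
        self.tail = 0  # next slot to fill; only the producer advances it

    def push(self, item):
        if self.tail - self.head == self.size:
            return False  # ring is full, caller has to retry later
        self.slots[self.tail & (self.size - 1)] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head & (self.size - 1)]
        self.head += 1
        return item


def service(sq, cq, handlers):
    """Privileged side: drain the submission ring, post one completion per request."""
    handled = 0
    # Leave requests queued if the completion ring has no room for their results.
    while cq.tail - cq.head < cq.size:
        req = sq.pop()
        if req is None:
            break
        tag, op, args = req
        cq.push((tag, handlers[op](*args)))
        handled += 1
    return handled


# Unprivileged side: queue two requests without crossing the barrier per call.
sq, cq = Ring(8), Ring(8)
handlers = {"add": lambda a, b: a + b, "upper": str.upper}  # hypothetical operations
sq.push((1, "add", (2, 3)))
sq.push((2, "upper", ("hi",)))

service(sq, cq, handlers)  # in the real thing this runs on the other side of the barrier
results = [cq.pop(), cq.pop()]
```

The tag on each request is what lets the submitter match completions back to requests, since nothing forces the other side to finish them in order.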
With cross-core interrupts and user-mode interrupt handlers (as in some new Intel CPUs), you could even do it without polling (interrupt on submission): the core the user-mode code is running on _never_ context switches (except for scheduling, obviously), and you just have a dedicated kernel core or cores off doing kernel things.
Yup, though that means you're wasting that core's compute. Something with green threads would be pretty neat: the language runtime does a cross-core interrupt to submit the syscall, then keeps executing other green threads until it gets a user interrupt for the syscall's completion.
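A very rough sketch of that runtime loop, using Python generators as the green threads. All the names here (`green_thread`, `kernel_core`, `run`, the `"square"` operation) are invented for the example, and both interrupts are faked with plain function calls on one thread, so it only shows the control flow, not the actual cross-core mechanism.

```python
from collections import deque


def green_thread(name, n, log):
    # `yield` hands a "syscall" request to the runtime and suspends this thread;
    # the runtime resumes it later with the result.
    result = yield ("square", n)
    log.append((name, result))


def kernel_core(sq, cq):
    """Stand-in for the dedicated kernel core: serve every queued request."""
    while sq:
        tid, (op, arg) = sq.popleft()
        assert op == "square"  # the only (hypothetical) operation in this sketch
        cq.append((tid, arg * arg))


def run(threads):
    sq, cq = deque(), deque()  # submission and completion queues
    runnable = deque((tid, t, None) for tid, t in enumerate(threads))
    parked = {}  # threads waiting on a completion, keyed by thread id
    while runnable or parked:
        # Keep running green threads; a submission never blocks the runtime.
        while runnable:
            tid, t, value = runnable.popleft()
            try:
                request = t.send(value)
            except StopIteration:
                continue  # thread finished
            sq.append((tid, request))  # would be: cross-core interrupt to submit
            parked[tid] = t
        kernel_core(sq, cq)  # would be: running concurrently on another core
        # Would be: user interrupt handler marking threads runnable again.
        while cq:
            tid, result = cq.popleft()
            runnable.append((tid, parked.pop(tid), result))


log = []
run([green_thread("a", 2, log), green_thread("b", 3, log)])
```

Both threads get their request submitted before either result comes back, which is the point: the user core keeps doing useful work instead of sitting idle until the completion arrives.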