With cross-core interrupts and user-mode interrupt handlers (as in some new intel cpus), you could even do something without polling (interrupt for submission) where the core user-mode code is running on _never_ context switches (obviously except for scheduling) and you just have a dedicated kernel core or cores off doing kernel things.
yup, though that means you're wasting that core's compute; something with green threads where language runtime does a cross-core interrupt to submit syscall then continues execing other green threads until it gets a user interrupt for syscall completion would be pretty neat.