I have some of these routines like Reduce and Scan in my other project https://github.com/DTolm/spirit. It also has implementations of linear algebra solvers like CG, VP, Runge-Kutta and some others. These routines have to be inlined in users shaders in some way to have a good performance. Releasing them as a standalone library will require some thinking due to the fact that some routines have multiple shader dispatches.
Seems a bit more feature complete than my take on the problem: https://github.com/Lichtso/VulkanFFT
Still, to beat CUDA with Vulkan a lot is still missing: Scan, Reduce, Sort, Aggregate, Partition, Select, Binning, etc.