Re: cuStinger, it's no longer called that; it's now Hornet, which uses CUB by NVIDIA Research: https://github.com/hornet-gt/hornet
Re: gunrock, they plan to add GraphBLAS as part of their backend: https://github.com/gunrock/gunrock/issues?utf8=%E2%9C%93&q=i...
Re: GraphBLAS, see my comment from yesterday...
Log(Graph): A Near-Optimal High-Performance Graph Representation (2018)
For an overview of GraphBLAS in the context of Heterogeneous High-Performance Computing (HHPC) systems such as NVIDIA GPUs or Intel Xeon Phis, see the 2015 talk Scott McMillan (https://insights.sei.cmu.edu/author/scott-mcmillan/) gave at the CMU Software Engineering Institute:
Graph Algorithms on Future Architectures [video] https://www.youtube.com/watch?v=-sIdS4cz7-4
And a few years back, Jeremy Kepner did a mini-course on D4M (the precursor to GraphBLAS). The videos and material are on MIT OCW...
MIT D4M: Mathematics of Big Data and Machine Learning [video] https://www.youtube.com/watch?v=iCAZLl6nq4c&list=PLUl4u3cNGP...
To be clear, the work has been interesting to me for years, so this is purely a practitioner's question as we are not in a position to ship-all-the-things.
My recent work actually implements some GraphBLAS operations for the GPU and compares them to Gunrock on breadth-first search. Our finding is that our implementation of a subset of GraphBLAS is comparable to Gunrock in performance on power-law graphs, but not on mesh graphs. Gunrock uses a different load balancer in its Advance operator for those graphs, and the load balancer we use in the analogous operation (matrix-vector multiplication) is not yet specialized for mesh graphs.
The code is open-source, so feel free to check it out!
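For anyone wondering how breadth-first search ends up being "matrix-vector multiplication", here is a rough CPU-only sketch in plain Python. It is not our GPU code and not the GraphBLAS or Gunrock API; the function name `bfs_levels` and the edge-list input format are made up for illustration. The idea is that the frontier is a sparse Boolean vector, one BFS step is a product of that vector with the adjacency matrix over an (OR, AND) semiring, and the set of already-visited vertices acts as a mask on the result.

```python
def bfs_levels(num_vertices, edges, source):
    """Return the BFS depth of every vertex, or -1 if unreachable.

    Hypothetical helper for illustration only: `edges` is a list of
    directed (u, v) pairs, i.e. the nonzeros A[u][v] of the adjacency matrix.
    """
    # rows[u] holds the column indices of the nonzeros in row u of A,
    # which are the out-neighbours of u.
    rows = [[] for _ in range(num_vertices)]
    for u, v in edges:
        rows[u].append(v)

    levels = [-1] * num_vertices  # -1 marks "not yet visited"
    levels[source] = 0
    frontier = {source}  # sparse Boolean vector stored as its nonzero indices
    depth = 0

    while frontier:
        depth += 1
        next_frontier = set()
        # Sparse vector-matrix product over the (OR, AND) semiring:
        # next[v] = OR over u in frontier of A[u][v].
        for u in frontier:
            for v in rows[u]:
                # Mask: keep only vertices that have not been visited yet.
                if levels[v] == -1:
                    next_frontier.add(v)
        for v in next_frontier:
            levels[v] = depth
        frontier = next_frontier

    return levels


if __name__ == "__main__":
    example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    print(bfs_levels(6, example_edges, 0))
```

The two nested loops over the frontier and its rows are the part a GPU implementation has to load-balance: on a power-law graph a few rows are huge, while on a mesh every row is short and the frontier stays small for many iterations, so the same work-distribution strategy will not necessarily suit both.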