Although I have to concede that the automatic grid size computation in cuda-api-wrappers is nice.
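(For the curious: automatic launch-size computation is typically built on CUDA's occupancy API - I'm assuming that's roughly what the wrappers use underneath. A minimal sketch with the raw runtime call, plain CUDA rather than the wrapper library's own interface:)

```cpp
#include <cuda_runtime.h>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { y[i] = a * x[i] + y[i]; }
}

// Let the runtime suggest a block size maximizing occupancy,
// then derive the grid size from the problem size.
void launch_saxpy(float a, const float* x, float* y, int n) {
    int min_grid_size = 0, block_size = 0;
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, saxpy, 0, 0);
    int grid_size = (n + block_size - 1) / block_size;
    saxpy<<<grid_size, block_size>>>(a, x, y, n);
}
```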
A few marketing tips for your README:
* Put a code example directly at the top. You want to present your library's selling points to the reader as fast as possible. For reference, look at the CuPy README https://github.com/cupy/cupy?tab=readme-ov-file#cupy--numpy-... which immediately shows the reader what it is good for. Your README starts with lots of text, but nobody reads text anymore these days. A link to examples comes almost at the end, and even then the examples are deeply nested.
* The first links in the README should link to your own library, for example to documentation or examples. You do not want to lead the reader away from your GitHub page.
* Add syntax highlighting by putting "cpp" after the opening triple backticks, e.g.:
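For example, a fence opened with "```cpp" renders with C++ highlighting (any snippet will do; this one is just to show the effect):

```cpp
#include <vector>

// Highlighted as C++ rather than rendered as plain text
std::vector<float> host_buffer(1024, 0.0f);
```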
CuPy is a useful, and fairly large, library which does a lot of things. In your example, you use it to create buffers, fill them with random values, and perform elementwise arithmetic on them. NumPy does that, which is why CuPy does that. My library only wraps CUDA functionality, and mostly "does nothing" [1] - so you have to "do everything yourself", except that it's easy(ish) to do so. It definitely never does anything behind-the-scenes or behind-your-back.
This difference between the libraries makes your program more terse; however, you lose control over where your buffers live, from where they're accessible, and when and how they get copied around. You can't even tell - from looking at the program source - whether the buffers will be "managed memory", accessed and copied page-by-page, or whether a copy will be made from system memory to device-global memory.
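To make that distinction concrete, here is a minimal sketch - in plain CUDA runtime calls, not either library's interface - of the two possibilities which a CuPy program's source does not let you distinguish between:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

int main() {
    constexpr std::size_t n = 1 << 20;
    constexpr std::size_t num_bytes = n * sizeof(float);

    // Possibility 1: managed ("unified") memory - one pointer, valid on
    // host and device; pages migrate on demand, behind your back.
    float* managed = nullptr;
    cudaMallocManaged(&managed, num_bytes);
    std::memset(managed, 0, num_bytes); // touched on host; device access migrates pages

    // Possibility 2: explicit device-global memory - you decide exactly
    // when the host-to-device copy happens.
    float* host = new float[n]();
    float* device = nullptr;
    cudaMalloc(&device, num_bytes);
    cudaMemcpy(device, host, num_bytes, cudaMemcpyHostToDevice);

    cudaFree(device);
    delete[] host;
    cudaFree(managed);
}
```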
So, in my book, CuPy does not make it as easy to access and control CUDA. But - it is easier for a user who "needs NumPy for GPUs", and does not care about the nitty-gritty, to write their program and get things done. Your program demonstrates both of these points.
I should mention that I wrote my library in the hope that others will use it to build higher-level-abstraction libraries and apps. One could use it, for instance, to create a "cuCpp" library that would be very NumPy-like, but for C++ - a parallel of NumCpp [2].
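A hypothetical sketch of what the surface of such a "cuCpp" might look like, so that user code could read `auto c = a + b;` - all names here (gpu_array, add_kernel) are invented for illustration, and I've used raw runtime calls rather than my wrappers to keep it short:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

__global__ void add_kernel(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { out[i] = a[i] + b[i]; }
}

// A NumPy-like owning array type; elementwise operators launch kernels.
class gpu_array {
public:
    explicit gpu_array(std::size_t n) : n_(n) { cudaMalloc(&data_, n * sizeof(float)); }
    ~gpu_array() { cudaFree(data_); }
    gpu_array(const gpu_array&) = delete;
    gpu_array& operator=(const gpu_array&) = delete;
    gpu_array(gpu_array&& other) noexcept : data_(other.data_), n_(other.n_) {
        other.data_ = nullptr;
    }

    friend gpu_array operator+(const gpu_array& a, const gpu_array& b) {
        gpu_array result(a.n_);
        constexpr unsigned block = 256;
        auto grid = static_cast<unsigned>((a.n_ + block - 1) / block);
        add_kernel<<<grid, block>>>(a.data_, b.data_, result.data_, a.n_);
        return result;
    }

private:
    float* data_ = nullptr;
    std::size_t n_;
};
```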
Thanks for the tips regarding the README; I'll fix it up.
----
[1] : cuda-api-wrappers does offer a couple of utility classes, like a poor man's span for pre-C++17, and a span+unique_ptr combo - which goes beyond wrapping CUDA's APIs, but still doesn't quite "do" things.

[2] : https://github.com/dpilger26/NumCpp
> It definitely never does anything behind-the-scenes or behind-your-back.
Actually, that's a bit of a lie: the library does manage the primary device context reference counts (which is quite annoying to do well without leaking resources) and the context stack behind the scenes. So, let's say it does as little as possible behind the scenes... :-(
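For reference, this is the driver-API bookkeeping being papered over - raw CUDA driver calls, shown here just to illustrate the retain/release pairing one must not get wrong:

```cpp
#include <cuda.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Every retain bumps the primary context's reference count...
    CUcontext primary_ctx;
    cuDevicePrimaryCtxRetain(&primary_ctx, dev);

    // ...and must be matched by exactly one release, on every code
    // path, or the context (and its resources) leaks.
    cuCtxPushCurrent(primary_ctx);   // the context stack, also managed manually
    // ... do work ...
    CUcontext popped;
    cuCtxPopCurrent(&popped);

    cuDevicePrimaryCtxRelease(dev);
}
```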
In Python? Perhaps. Generally? No, it isn't. Try: https://github.com/eyalroz/cuda-api-wrappers/
Full power of the CUDA APIs, including all runtime compilation options, etc.
(Yes, I wrote that...)
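To make "runtime compilation" concrete: underneath, that means driving NVRTC. A minimal sketch with the raw NVRTC calls (plain NVRTC here, not the wrapper library's own interface; the option strings are just examples):

```cpp
#include <nvrtc.h>
#include <string>

// Compile CUDA C++ source, given as a string, to PTX at runtime.
std::string compile_to_ptx(const char* source) {
    nvrtcProgram program;
    nvrtcCreateProgram(&program, source, "kernel.cu", 0, nullptr, nullptr);

    // Any of NVRTC's compilation options can be passed here.
    const char* options[] = { "--std=c++17", "--use_fast_math" };
    nvrtcCompileProgram(program, 2, options);

    size_t ptx_size = 0;
    nvrtcGetPTXSize(program, &ptx_size);
    std::string ptx(ptx_size, '\0');
    nvrtcGetPTX(program, &ptx[0]);

    nvrtcDestroyProgram(&program);
    return ptx;
}
```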