
GPU-sentry: Flask-based package for monitoring utilisation of GPUs - jacenkow
https://github.com/jacenkow/gpu-sentry
======
jacenkow
Hi everyone,

Thank you so much for the comments!

Well, indeed it is a toy application made for my research group just to check
which GPU is available. However, I was hoping to make it a little more useful
with help and suggestions from the community. I believe I should have added
"Show HN:" to the title before posting, sorry.

As for why not to use professional tools, I guess we wanted something dead
simple.

~~~
somada141
Also there's the whole "killing a mosquito with a bazooka" point there. If you
are making this for your research group then all the more reason not to depend
on a big thing like Zabbix that'd require a fatter server and someone to set
it up and maintain it.

For your purposes this looks like more than enough, but if you wanted to hone
your Python/Flask/webapp skills a little more I'd suggest you show the rest of
the info reported by the CLI and also expose the backend API for people who may
wanna use your underlying code and integrate it into their own stuff (e.g.
something that chooses an available GPU before starting a job).
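
A rough sketch of that last idea, assuming the `pynvml` module shipped by `nvidia-ml-py3`. The names `pick_free_gpu`, `query_gpus` and the 10% threshold are made up for illustration and are not part of gpu-sentry:

```python
def pick_free_gpu(gpus, max_used_fraction=0.1):
    """Return the index of the least-loaded GPU under the threshold, or None.

    `gpus` is a list of dicts with "index", "used" and "total" (bytes).
    The 10% default threshold is an arbitrary assumption.
    """
    candidates = [g for g in gpus if g["used"] / g["total"] <= max_used_fraction]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g["used"] / g["total"])["index"]


def query_gpus():
    """Read per-GPU memory usage through NVML (needs an NVIDIA driver)."""
    import pynvml  # imported lazily so pick_free_gpu works without a GPU

    pynvml.nvmlInit()
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({"index": i, "used": mem.used, "total": mem.total})
        return gpus
    finally:
        pynvml.nvmlShutdown()
```

A job launcher could call `pick_free_gpu(query_gpus())` and, if it gets an index back, set `CUDA_VISIBLE_DEVICES` to that index before starting the job.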

~~~
jacenkow
+1 really appreciated!

------
somada141
Bit of a toy really but the effort is appreciated :). As far as I can see
you're only showing memory usage (which I'm not sure qualifies as
'utilisation') while `nvidia-smi` also returns the processor utilisation and
temp. Perhaps it'd be worth including those.
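
For reference, a minimal sketch of how those extra readings could be pulled, assuming the `pynvml` module from `nvidia-ml-py3`. The helpers `read_gpu_stats` and `format_stats` are hypothetical names, not gpu-sentry code:

```python
def read_gpu_stats(index=0):
    """Query core utilisation, memory and temperature for one GPU via NVML."""
    import pynvml  # imported lazily; requires an NVIDIA driver at call time

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        temp = pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU              # degrees Celsius
        )
        return {
            "index": index,
            "gpu_util": util.gpu,
            "mem_used": mem.used,
            "mem_total": mem.total,
            "temp": temp,
        }
    finally:
        pynvml.nvmlShutdown()


def format_stats(stats):
    """Render one stats dict as a single human-readable line."""
    mem_pct = 100.0 * stats["mem_used"] / stats["mem_total"]
    return "GPU {index}: core {gpu_util}%, memory {mem:.0f}%, {temp} C".format(
        mem=mem_pct, **stats
    )
```

Something like `print(format_stats(read_gpu_stats(0)))` would then show all three numbers on one line.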

~~~
deepnotderp
nvidia-smi reports the core as utilized if even one vector lane is working, so
it's not super reliable.

~~~
p1esk
Which tool are you using for monitoring Nvidia GPUs?

~~~
jacenkow
`nvidia-ml-py3`, a Python interface to NVML (the library behind `nvidia-smi`).

~~~
p1esk
I was looking for something more accurate than nvidia-smi, per parent comment.

------
pilooch
Cool! We have this set up across a spread-out VPN so that our apps can keep
track of GPU availability:
[https://github.com/jolibrain/gpustat_server](https://github.com/jolibrain/gpustat_server)
It spits out JSON with memory and compute usage.

------
berti
Is there a reason you wouldn’t use one of the many more comprehensive
monitoring solutions, e.g. Zabbix?

------
est31
Note this only supports Nvidia GPUs. The docs don't mention it anywhere.

~~~
jacenkow
+1 I will fix it in the documentation.

~~~
est31
Thanks!

