nvtop or nvidia-smi gives you a good macro overview, but I've personally found that utilization (EDIT: as reported by nvidia-smi) is actually a poor proxy for how fast your workload can run, beyond just confirming that a GPU is indeed being used.
I agree that utilization as reported by nvidia-smi is a poor proxy for performance. FWIW, I've found that for the same architecture the power consumption reported in nvtop very often correlates super nicely with training performance, and peak performance is always at peak power consumption. Agreed on your advice for tuning the architecture details, but once that's fixed and you have simple things to debug like memory usage, batch size, and dataloading bottlenecks, the raw power metric is typically a quick proxy. I find temperature is a second useful macro metric: you want to be at max power draw and the max allowed temperature at all times, but not exceed the temperature where you start to throttle.
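In case it's useful, this is roughly how to poll those counters yourself rather than eyeballing nvtop; a minimal sketch, assuming the `nvidia-ml-py` bindings are installed and that you care about GPU 0:

```python
# Minimal monitoring sketch (assumes: pip install nvidia-ml-py, one NVIDIA GPU).
# Polls the same NVML counters that nvtop/nvidia-smi report, once per second.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

try:
    while True:
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0          # mW -> W
        limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0
        temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu            # the "misleading" %
        print(f"power {power_w:6.1f}/{limit_w:.0f} W  temp {temp_c:3d} C  util {util:3d} %")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```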
That's hard to argue with. Of course power draw is a direct measure of hardware utilization, but it doesn't translate very well into a measure of GPU code efficiency.
Often you can squeeze out another order of magnitude of performance by rewriting the kernel and the power draw will always stay capped at whatever the maximum is. I'd say GPU power consumption is interesting if you're CPU bound and struggling to feed the GPU enough data and/or tasks.
FLOPs utilization is arguably the industry standard metric for efficiency right now and it should be a good first approximation of how much performance is left on the table.
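To make that concrete, here's a back-of-the-envelope sketch of FLOPs utilization for a single big matmul in PyTorch. The peak number is an assumption you'd take from your GPU's datasheet, and real training runs usually use a model-level estimate (MFU) instead, but the idea is the same:

```python
# Rough FLOPs-utilization sketch, not a rigorous benchmark.
# PEAK_TFLOPS is an assumed datasheet value for your GPU and dtype
# (e.g. roughly 312 for A100 BF16 tensor cores); substitute your own.
import time
import torch

PEAK_TFLOPS = 312.0
M = N = K = 8192

a = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)
b = torch.randn(K, N, device="cuda", dtype=torch.bfloat16)

for _ in range(3):               # warm-up
    a @ b
torch.cuda.synchronize()

iters = 20
start = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

achieved_tflops = (2 * M * N * K * iters) / elapsed / 1e12   # 2*M*N*K FLOPs per matmul
print(f"achieved {achieved_tflops:.1f} TFLOP/s = "
      f"{100 * achieved_tflops / PEAK_TFLOPS:.1f}% of assumed peak")
```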
But if you mean the reported utilization in nvtop is misleading I completely agree (as someone who uses it daily).
I’ve been meaning to dig into the source/docs to see what’s going on. The power usage seems to be a more reliable indicator of actual hardware utilization, at least on nvidia gear.
Thanks. Some people were having random problems installing WSL on their systems and I found this was the easiest solution (though based on their card models, they appeared to have much older machines).
There is no need to install Docker Desktop just to run nvidia-smi in WSL; the Windows directory containing the nvidia-smi binary is mounted inside a WSL instance and added to PATH automatically by WSL on instance startup.
As an aside: there is no need to install Docker Desktop just to use Docker containers in WSL either, unless you want a Windows GUI to manage your containers. Just follow the official documentation for installing Docker in your Linux distro of choice, or simply run `sudo apt install docker.io` in the default WSL Ubuntu distro. Docker will work just fine with an up-to-date WSL.
Further aside: it's possible to have both Docker Desktop and the normal Linux docker.io installed on WSL. They work in isolation; the easy way to know which one is active is to check whether Docker Desktop is running. I wouldn't recommend this setup, though...
apt install is not working for me, is this by design?
> nvtop : Depends: libnvidia-compute-418 but it is not going to be installed
> E: Unable to correct problems, you have held broken packages.
<rant>I find broken installs a huge turnoff, especially those related to NVIDIA. With their $2.3T market cap, they can't afford someone to write a universal point-and-click install script for ML usage? Every time I reinstall Linux I have to spend a whole day sorting NVIDIA out. Why do they have so many layers (driver, CUDA, CUDA toolkit, cuDNN) with conflicting versioning? It's a total mess. Instead of a nice install script we have a million install guides, each 10 pages long and all outdated.</rant>
Because all these problems don’t hinder their bottom line.
Cluster admins or Ph.D. students handle these problems so that people can get on with their work. For most people, all this infra is already buried under Conda, Jupyter, etc. anyway.
Back in the dark days of 2015 we used to spend a day or two just getting TensorFlow working on a GPU because of all the install issues, driver issues, etc. Theano was no better, but it was academic research code, so we didn't expect better.
Once PyTorch started gaining ground, it forced Google to adapt: Keras was written to hide TensorFlow's awfulness. Then Google realized it was an unrecoverable situation of technical debt and started building JAX.
With AMD, Intel, Tenstorrent, and several other AI chip specialists coming out with PyTorch compatibility, NVIDIA will eventually have to adapt. They still have the advantage of 15 years of CUDA code already written, but PyTorch as an abstraction layer can make the switch easier.
The problem is that NVidia is a single company participating in multiple interdependent markets. They participate in the market for hardware specification, in the market for driver implementation, and in the market for userland software. This is called "vertical integration".
Because of copyright, NVidia gets an explicit government-enforced monopoly over the driver-implementation market. Sure, 3rd-party projects like nouveau get to "compete", but NVidia is given free rein to cripple that competition, simply by refusing to share necessary hardware (and firmware) specs, and also by compelling experienced engineers (anyone who works on NVidia's driver implementation) to sign NDAs, legally enforcing the secrecy of their specs.
On top of this, NVidia gets to be anti-competitive with the driver-compatibility of its userland software, including CUDA, GSync, DLSS, etc.
When a company's market participation is vertically integrated, that participation becomes anticompetitive. The only way we can resolve this problem is by dissolving the company into multiple market-specific companies.
PyTorch and CUDA solve completely different problems. CUDA is a general purpose programming environment. PyTorch is for machine learning. PyTorch won't ever displace CUDA because there are things other than machine learning models that GPUs are good at accelerating.
Yeah, the amount of tunnel vision from AI/ML users thinking that Nvidia exists solely for their use is funny to watch. Try writing anything other than ML in pytorch. You can't? You can in CUDA. There's a much bigger world than ML out there.
I tried that; TensorFlow is actually better for general-purpose compute, and JAX a lot better. PyTorch seems to omit most of the non-ML basic building blocks, while TF gives you most of the XLA API.
GPU mining went waaay down since Ethereum went PoS (proof of stake) almost 2 years ago. Does BTC even use GPUs for mining? I am pretty sure they use ASICs.
> With AMD, Intel, Tenstorrent, and several other AI chip specialists coming with pytorch compatibility, NVIDIA will eventually have to adapt.
I don't see how Nvidia has to do anything since PyTorch works just fine on their GPUs, thanks to CUDA. If anything, they're still one of the best platforms and that's definitely not because CUDA isn't competitive.
I hate stuff that only works on certain GPUs as much as the next person, but sadly competition has only really started to catch up to CUDA very recently.
There are plenty of problems with NVIDIA on Linux, but I'm sad to tell you I think this one is your own fault.
The error message is telling you that you've held back broken packages that are conflicting with dependencies nvtop is trying to install. If you sort that out, nvtop should install.
I have nvtop installed on Debian via apt, and it works just fine.
I recently traded a friend my Nvidia 3070 for his Radeon 6700 XT, because I'd returned to Linux a few months ago and was tired of Nvidia. Nvidia will likely get much better as NVK grows, but I think it's better to just not use their products unless you want to have Microsoft's spyware OS installed on your computer.
I've had one or two upgrade problems in the last 10 years, but otherwise the Nvidia drivers have worked great for me. My biggest complaint is they dropped support for the GPU in my Macbook, and I had to install the nouveau drivers (which I can never spell correctly).
At least it's not from the FSF, and GPUs aren't gendered, or you'd have to choose from multiple gendered drivers:
- "gnuveau" for one masculine GPU.
- "gnuvelle" for one feminine GPU.
- "gnuveaux" for multiple masculine GPUs.
- "gnuvelles" for multiple feminine GPUs.
I've just spent the morning uninstalling and reinstalling different versions of the Nvidia driver (Linux) to get nvcc back for llama.cpp after Linux Mint did an update. I had CUDA 12.3 and 12.4 (5 GB each), in conflict, with no guidance. Driver 550 was the charm, not the 535 that was fine in January.
This is the third time I'm doing this since December.
It is painful.
I'm not in a hurry to return to my cuDF experiments as I'm pretty sure that'll be broken too (as it has been in the past).
I'm the co-author of O'Reilly's High Performance Python book, and this experience mirrors what I was having with PyCUDA a decade back.
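For what it's worth, these days I run a tiny sanity check before touching the toolkit; a sketch assuming the `nvidia-ml-py` bindings, which asks the installed driver what version it is and which CUDA version it supports, so you know whether the nvcc you're about to install can even work:

```python
# Driver/CUDA sanity-check sketch (assumes: pip install nvidia-ml-py).
# Reports the installed driver version and the highest CUDA version it supports.
import pynvml

pynvml.nvmlInit()
driver = pynvml.nvmlSystemGetDriverVersion()
if isinstance(driver, bytes):                    # older bindings return bytes
    driver = driver.decode()
cuda = pynvml.nvmlSystemGetCudaDriverVersion()   # e.g. 12040 -> CUDA 12.4
print(f"driver {driver}, max supported CUDA {cuda // 1000}.{(cuda % 1000) // 10}")
pynvml.nvmlShutdown()
```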
Has a nice 90s vibe. I have not figured out how to just expand the selected process. I could collapse everything except the selected process manually, but that'd be tedious.
I have not explored how, but I'm pretty sure you can define custom templates for the layout. Maybe you could make a custom template with nothing but the selected process window?
The downside with nvitop is that it's written in Python, which means having it in your environment can cause dependency conflicts. It's either that or you keep a separate venv just for it. Maybe that's fine for personal use, but sysadmins would prefer nvtop.
That's why the authors recommend pipx for installing nvitop. I am not a sysadmin, but I prefer pipx over relying on the (often outdated) distro sources.
radeontop is the same sort of thing if you live in amdgpu-ville and want something easy to compile. I was able to use it to show that with kernel 5.x amdgpu Vulkan, when a process is pushed out of VRAM into GTT it never reloads and gets stuck in a 'slow' state.
Now that I use Home Assistant, I want all my data sources to plug into there. It can handle the rendering for me as I see fit, and it's where data comes to integrate.
It's one of those things which I wish existed, but I can't imagine anyone would have written. Until I do a web search.
Pretty much every time I've used nvtop it has been while doing CUDA stuff, mostly to see if it was going to blow out the memory but also to spot check that the model is actually using the GPU. I've had times where it said it was using the GPU, but it only did so for the first part of the work and then dropped down to the CPU for the rest.
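If it helps, here's the kind of programmatic spot check I've ended up adding next to nvtop; a sketch assuming a PyTorch workload, with `report` and the Linear layer being purely illustrative placeholders:

```python
# Spot-check sketch for "is this actually on the GPU?" (assumes PyTorch + CUDA).
import torch

print("CUDA available:", torch.cuda.is_available())

def report(model: torch.nn.Module) -> None:
    # Every parameter should report device type 'cuda', not 'cpu'.
    devices = {p.device.type for p in model.parameters()}
    print("parameter devices:", devices)
    # Non-trivial allocated device memory is another hint the GPU is really in use.
    print("allocated MiB:", torch.cuda.memory_allocated() / 2**20)

# Placeholder model just to show the check; swap in your real model.
report(torch.nn.Linear(1024, 1024).cuda())
```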
I use several video players, including mpv, vlc and ffmpeg, and they have always used hardware video decoding and encoding without problems on all kinds of GPUs.
Only with Firefox and Chrome/Chromium is hardware acceleration broken in most versions; there have been some versions where it worked fine (on NVIDIA), but at the next browser upgrade it was broken again.
This does not bother me much, because I do not like to watch video files in a browser anyway. I always download them first and I play them locally.
Ah yes, you can configure nvtop to display encode and decode loads. It's interesting to watch how the playback speed in VLC is directly reflected in the metric.
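If you ever want those numbers outside nvtop, NVML exposes dedicated NVENC/NVDEC utilization counters; a minimal sketch assuming the `nvidia-ml-py` bindings and GPU 0:

```python
# Encode/decode utilization sketch (assumes: pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
enc_util, _period_us = pynvml.nvmlDeviceGetEncoderUtilization(handle)
dec_util, _period_us = pynvml.nvmlDeviceGetDecoderUtilization(handle)
print(f"encode {enc_util}%  decode {dec_util}%")
pynvml.nvmlShutdown()
```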
If you're here because you're interested in AI performance, I'd recommend instead Nsight Compute (https://docs.nvidia.com/nsight-compute/NsightComputeCli/inde...) to profile individual kernels, Nsight Systems (https://developer.nvidia.com/nsight-systems) for a macro view, and the PyTorch profiler if you're not authoring kernels directly but using something like PyTorch (https://pytorch.org/tutorials/recipes/recipes/profiler_recip...).
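As a starting point for the last option, here's a minimal sketch along the lines of the linked profiler recipe; the Linear layer and input are placeholders for whatever you're actually running:

```python
# Minimal PyTorch profiler sketch (placeholder model/input; assumes CUDA is available).
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(64, 4096, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    for _ in range(10):
        model(x)

# Top CUDA-time consumers, a quick macro view before reaching for Nsight.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```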