Hi HN, my name is Michele Zanotti and I've just released nos, an open-source project that finally makes GPU partitioning dynamic on Kubernetes (and much more): https://github.com/nebuly-ai/nos You can think of Dynamic GPU Partitioning as a cluster autoscaler for GPUs: instead of scaling up the number of nodes and GPUs, it dynamically partitions them into smaller "GPU slices". This ensures that each workload only uses the GPU resources it actually needs, leaving spare GPU capacity free for other workloads. To partition GPUs, nos leverages Nvidia MIG, as well as the lesser-known MPS, finally making them dynamic.
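To give a rough idea of what this looks like from the workload side, here is a simplified sketch (the exact resource name depends on your GPU model and the slice size you want, and the image is just a placeholder): the Pod simply requests a MIG slice as a resource, and nos takes care of creating that slice on a node where dynamic partitioning is enabled instead of leaving the Pod pending.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-slice-demo
    spec:
      containers:
        - name: inference
          image: your-registry/your-cuda-image:latest  # placeholder, any CUDA workload
          resources:
            limits:
              nvidia.com/mig-1g.10gb: 1  # request a single 1g.10gb MIG slice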
On Towards Data Science you can find a tutorial on how to use Dynamic GPU Partitioning: https://towardsdatascience.com/dynamic-mig-partitioning-in-k...
In addition, nos includes other components that increase GPU utilization even further. One of them is Elastic Resource Quota management: it increases the number of Pods that can run on the cluster by letting teams (namespaces) borrow quotas of reserved resources from other teams, as long as those teams are not using them. It's described in more detail in the nos documentation: https://docs.nebuly.com/nos/overview/
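To make that concrete, here is a simplified example (field names and the exact API group/version are sketched from memory, please check the docs above for the precise spec): each team gets a guaranteed "min" and a hard "max", and anything between the two is borrowed from teams that are not using their guaranteed share.

    apiVersion: nos.nebuly.com/v1alpha1   # see the docs for the exact group/version
    kind: ElasticQuota
    metadata:
      name: quota-team-a
      namespace: team-a
    spec:
      min:   # resources guaranteed to team-a
        nos.nebuly.com/gpu-memory: 40
      max:   # hard cap; between min and max, team-a borrows unused quota from other teams
        nos.nebuly.com/gpu-memory: 80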
I also wrote a review of the pros and cons of the different technologies for sharing GPUs among workloads in Kubernetes (time-slicing, MIG, and MPS): https://docs.nebuly.com/nos/dynamic-gpu-partitioning/partiti...
Let me know your thoughts on the project!