Or you can apply GPU optimizations for such ML workloads. By optimizing how these models run on GPUs, you can significantly improve efficiency and cut costs by a factor of 10 or more. These techniques include kernel fusion, memory access optimization, and efficient use of GPU resources, which can lead to substantial improvements in both training and inference speed. This allows AI models to run on more affordable hardware while still delivering strong performance. For example, LLMs that run on an A100 can also run on 3090s with no change in accuracy and comparable inference latency.
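
To make the kernel fusion idea concrete, here is a minimal CUDA sketch (the kernel names and the bias-add + ReLU workload are illustrative, not taken from any particular library). The unfused path launches two kernels and round-trips an intermediate tensor through global memory; the fused path does both operations in one launch and keeps the intermediate value in registers, cutting global-memory traffic for this step roughly in half.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Unfused step 1: add a per-feature bias, writing the result to global memory.
__global__ void bias_add(const float* x, const float* bias, float* out, int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] + bias[i % dim];
}

// Unfused step 2: apply ReLU, reading the intermediate back from global memory.
__global__ void relu(const float* x, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(x[i], 0.0f);
}

// Fused: one launch, the intermediate stays in a register, never touching HBM.
__global__ void bias_add_relu_fused(const float* x, const float* bias, float* out, int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(x[i] + bias[i % dim], 0.0f);
}

int main() {
    const int n = 1 << 20, dim = 1024;
    float *x, *bias, *tmp, *out;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&bias, dim * sizeof(float));
    cudaMalloc(&tmp, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    dim3 block(256), grid((n + 255) / 256);

    // Unfused path: two kernel launches plus an extra write and read of `tmp`.
    bias_add<<<grid, block>>>(x, bias, tmp, n, dim);
    relu<<<grid, block>>>(tmp, out, n);

    // Fused path: a single launch with no intermediate tensor.
    bias_add_relu_fused<<<grid, block>>>(x, bias, out, n, dim);

    cudaDeviceSynchronize();
    printf("done\n");
    cudaFree(x); cudaFree(bias); cudaFree(tmp); cudaFree(out);
    return 0;
}
```

Elementwise ops like these are memory-bound, so the fused version's savings come almost entirely from avoided global-memory traffic and launch overhead; the same principle is what compilers and hand-written fused attention or MLP kernels exploit at larger scale.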