I don't get it. Why would I start an instance in EC2, just to use your GPUs in EC2, when I could start an instance with the GPUs I want in EC2? Separately, why would I want half of Nitro instead of real Nitro?
1. If you're actively developing and need a GPU, then you'd typically be paying for it the entire time the instance is running. Using Thunder means you only pay for the GPU while you're actively using it. Essentially, if you're running CPU-only code, you're not paying for any GPU time. The alternative is to manually turn the instance on and off, which can be annoying.
2. This allows you to easily scale the type and number of GPUs you're using. For example, say you want to do development on a cheap T4 instance and then run a full DL training job on a set of 8 A100s. Instead of needing to swap instances and set everything up again, you can just run a command and start running on the more powerful GPUs (see the sketch below).
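To make that concrete, here's a minimal PyTorch sketch of why swapping GPUs doesn't require code changes. It assumes the remote GPU is exposed as an ordinary CUDA device; the script itself is illustrative, not Thunder's API.

```python
# Minimal sketch: the same script adapts to whatever GPU (if any) is currently
# attached, so moving from a T4 to 8x A100s needs no code changes.
# Assumes PyTorch and that the remote GPU shows up as a normal CUDA device.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        print(f"Using GPU: {name} ({torch.cuda.device_count()} visible)")
        return torch.device("cuda")
    print("No GPU attached; running on CPU (no GPU time billed)")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)
y = model(x)  # runs on whichever device is present
```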
Okay, but your GPUs are in EC2. Don't I just want this feature from Amazon, not you, natively via Nitro? Even Google has TPU attachments.
> 1. If you're actively developing and need a GPU [for fractional amounts of time]...
Why would I need a GPU for a short amount of time during development? For testing?
I don't get it - what would testing on an H100 over a TCP connection tell me? It's like, yeah, I can do that, but it doesn't represent an environment I'm going to use for real. Nobody runs applications against GPUs on buses virtualized over TCP connections, so what exactly would I be validating?
I don't believe Nitro would allow you to access a GPU that's not directly connected to the CPU that the VM is running on. So swapping between GPU types or scaling to multiple GPUs is still a problem.
From the developer's perspective, you wouldn't know that the H100 is across a network. The experience is as if your computer were directly attached to an H100. The benefit is that if you're not actively using the H100 (such as while you're setting up the instance or after the training job completes), you aren't paying for it.
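Here's a sketch of what "looks directly attached" means in practice, assuming PyTorch and that the remote GPU is exposed as a normal CUDA device (the printed values are illustrative, not measured):

```python
# Standard CUDA introspection reports the remote H100 like any local device,
# so training code targets "cuda" exactly as it would on a box with a local card.
import torch

assert torch.cuda.is_available()
props = torch.cuda.get_device_properties(0)
print(props.name)                              # e.g. "NVIDIA H100 80GB HBM3"
print(f"{props.total_memory / 2**30:.0f} GiB VRAM")
print(f"{props.multi_processor_count} SMs")
```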
Okay, a mock H100 object would also save me money. I could pretend a 3090 is an A100: "the experience would be that a 3090 is an A100." Isn't this an apples-to-oranges comparison? It's a GPU attached to the machine versus a GPU that crosses a VPC boundary. Do you see what I'm saying?
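For what it's worth, here's roughly what that "mock H100" would look like, just to underline that faking the name doesn't fake the hardware (PyTorch assumed; purely illustrative):

```python
# Patch the reported device name so a 3090 claims to be an A100.
# The introspection lies, but the compute and the 24 GB of VRAM underneath do not.
from unittest import mock
import torch

with mock.patch("torch.cuda.get_device_name", return_value="NVIDIA A100-SXM4-80GB"):
    print(torch.cuda.get_device_name(0))  # reports "NVIDIA A100-SXM4-80GB"
    # Any allocation beyond the real card's memory still fails, e.g.:
    # torch.empty(80 * 2**30, dtype=torch.uint8, device="cuda")  # OOM on a 24 GB card
```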
I would never run a training job on a GPU virtualized over a TCP connection. I would never run a training job that requires 80 GB of VRAM on a 24 GB device.
Who is this for? Who needs H100s, yet also needs to save kopecks on a single GPU?
I develop GPU-accelerated web apps on an EC2 instance via a remote VSCode session. A lot of the time I'm just doing web dev and don't need a GPU, so I could save thousands per month by switching to this.
Well, for the time being I'm really just burning AWS credits. But you're right! I do, however, like that my dev machine is the exact same instance type in the same AWS region as my production instances. If I built an equivalent machine, it would have different performance characteristics. Oftentimes the AWS VMs have weird behavior that would otherwise catch me off guard when deploying to the cloud for the first time.
It's more transparent to your system. For example, if you have a GUI application that needs GPU acceleration on a thin client (MATLAB, SolidWorks, Blender), you can do that without setting up EC2. You can develop without any GPU, but suddenly have one when you need to run a simulation. This will be way cheaper than AWS.
I think this is essentially solving the same problem Ray (https://www.ray.io/) solves, but in a more generic way.
It can potentially offer finer-grained GPU sharing, like half a GPU (sketch below).
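For comparison, here's a minimal Ray sketch of the same idea, including Ray's fractional-GPU scheduling. It assumes a running Ray cluster with GPU nodes and PyTorch installed; the task body is just a placeholder.

```python
# Ray schedules tasks onto remote GPUs (including fractional shares)
# rather than attaching a GPU to your local machine.
import ray
import torch

ray.init()  # or ray.init(address="auto") to join an existing cluster

@ray.remote(num_gpus=0.5)  # fractional-GPU scheduling, akin to "half a GPU"
def train_step() -> float:
    device = torch.device("cuda")
    x = torch.randn(64, 1024, device=device)
    return float(x.sum())

print(ray.get(train_step.remote()))
```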
The free community version has been discontinued, and it also doesn't support a Linux client with non-CUDA graphics, regardless of the server OS, which is a non-starter for me.