I haven't used SkyPilot so I am unfamiliar with the experience and performance.
However, some of the situations where you'd want to use Cerebrium over SkyPilot are:
- You don't want to manage your own hardware
- Reduced costs: with a serverless runtime and low cold starts (unclear whether SkyPilot offers this, and what the performance is like if they do)
- Rapid iteration: unclear what the deployment process on SkyPilot looks like and how long projects take to go live
- Observability: it looks like you would just have k8s metrics at your disposal
I guess then the next question would be: how quickly can they start executing your container from a cold start when a workload comes in? Typically we see companies at around 30-60s.
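If you want to sanity-check cold starts yourself, a rough sketch, assuming a deployment that has scaled to zero (the endpoint URL is a placeholder):

import time
import requests

URL = "https://example.com/v1/predict"  # placeholder endpoint

# The first request after scale-to-zero includes the cold start;
# the second one measures warm latency.
for label in ("cold", "warm"):
    t0 = time.perf_counter()
    requests.post(URL, json={"prompt": "hello"}, timeout=300)
    print(f"{label}: {time.perf_counter() - t0:.1f}s")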
heh, I don't need an example in the docs, the whole repo is filled with examples, but unless you expect some poor soul to do $(grep -r ^include . | sort | uniq) and guess from there, what I'm saying is that the examples -- including the bare-bones one in your documentation -- do not SPECIFY what the glob syntax is. The good thing about standards is that there are so many to choose from, so: Python's glob module, golang's glob, I'm sure rust-lang has one, bash, ... I'm sure I could keep going
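To make that concrete, even within Python alone you get several different "glob" behaviors:

import fnmatch
import glob
from pathlib import Path

# glob.glob: '**' degrades to '*' unless you pass recursive=True
print(glob.glob("**/*.py"))                  # only one directory level deep
print(glob.glob("**/*.py", recursive=True))  # actually recursive

# pathlib: rglob is always recursive
print(list(Path(".").rglob("*.py")))

# fnmatch: pure pattern matching, '/' is not special
print(fnmatch.fnmatch("src/app.py", "*.py"))  # True -- '*' crosses '/'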
As for the quoting part, it's mysterious to me why a structured file would use a quoted string for what is obviously an interior structure. Imagine if you opened a file and saw
fred = "{alpha: ['beta', 'charlie''s dog', 'delta']}"
wouldn't you strongly suspect that there was some interior syntax going on there?
Versus the sane encoding of:
fred:
  alpha:
    - beta
    - charlie's dog
    - delta
in a normal markup language, no "inner/outer quoting" nonsense required
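And TOML itself can encode the nested form natively; a quick sketch with Python 3.11+'s tomllib shows the difference:

import tomllib

# The quoted form: the parser hands back one opaque string, and the
# "interior syntax" needs a second, ad-hoc parsing pass.
quoted = tomllib.loads('fred = "{alpha: [\'beta\', \'delta\']}"')
print(type(quoted["fred"]))  # <class 'str'>

# The native form: the structure is part of the document itself.
native = tomllib.loads("""
[fred]
alpha = ["beta", "charlie's dog", "delta"]
""")
print(native["fred"]["alpha"])  # ['beta', "charlie's dog", 'delta']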
But I did preface it with my toml n00b-ness and I know that the toml folks believe they can do no wrong, so maybe that's on purpose, I dunno
In terms of cold starts, we seem to be very comparable, based on what users have mentioned and tests we have run.
Easier config/setup is feedback we have gotten from users: since we don't have any special syntax or a "Cerebrium way" of doing things, migration is pretty easy and you aren't locked in, which some engineers appreciate. We just run your Python code as-is, with an extra .toml setup file.
Additionally, we offer AWS Inferentia/Trainium nodes, which offer a great price/performance trade-off for many open-source LLMs - even compared to TensorRT/vLLM on Nvidia GPUs - and get rid of the scarcity problem. We plan to support TPUs and others in the future.
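For a sense of what running an open-source model on Inferentia looks like, here's a rough sketch using Hugging Face's optimum-neuron library (the argument names reflect my reading of recent docs and may have changed; treat this as an assumption, not gospel):

from optimum.neuron import NeuronModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # any supported open-source LLM

# export=True compiles the model for the Neuron cores on first load.
model = NeuronModelForCausalLM.from_pretrained(
    MODEL_ID,
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="fp16",
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))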
We are listed on AWS Marketplace, among others, which means you can subtract your Cerebrium cost from your committed cloud spend.
Two things we are working on that will hopefully make us a bit different are:
- GPU checkpointing
- Running compute in your own cluster, to use existing credits or address privacy concerns.
Where Modal really shines is training/data-processing use cases, which we currently don't support too well. However, we do have this on our roadmap for the near future.
Yes, RunPod does have cheaper pricing than us; however, they don't allow you to specify your exact resources, but rather charge you for the full resource (see the A100 example above). So depending on your resource requirements, our pricing could be competitive, since we charge you only for the resources you use.
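As a purely illustrative comparison (the rates below are made up, not actual RunPod or Cerebrium pricing):

# Hypothetical per-second rates -- illustration only.
full_card_rate = 0.0020   # $/s, billed for the whole 80GB A100
fractional_rate = 0.0030  # $/s, whole-card equivalent on a per-resource plan

fraction_needed = 20 / 80  # workload only needs 20GB of the card
seconds = 3600             # one hour of active compute

print(f"full-card billing:  ${full_card_rate * seconds:.2f}")                     # $7.20
print(f"fractional billing: ${fractional_rate * fraction_needed * seconds:.2f}")  # $2.70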
In terms of cold starts, they mention 250ms, but I'm not sure what workload that is on, or whether we have the same measure of cold starts. Quite a few customers have told us we are quite a bit faster (2-4 seconds vs ~10 seconds), although we haven't confirmed this ourselves.
For a 30GB model, we have a few ways to speed this up, such as using the Tensorizer framework from CoreWeave; we also cache model files in our distributed caching layer, but I would need to test. We see reads of up to 1GB/s. If you tell me the model you are running (if open-source), I can get results to you - you can message me on our Slack/Discord community or email me at michael@cerebrium.ai.
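For reference, at ~1GB/s a 30GB model streams in around 30 seconds. The usual Tensorizer loading pattern looks roughly like this (model and path are placeholders, pre-serializing with TensorSerializer is assumed, and the API may differ across versions):

from tensorizer import TensorDeserializer
from tensorizer.utils import no_init_or_tensor
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_REF = "some-org/some-30gb-model"     # placeholder
TENSORS_URI = "s3://bucket/model.tensors"  # placeholder, pre-serialized weights

config = AutoConfig.from_pretrained(MODEL_REF)
with no_init_or_tensor():
    # Build the module skeleton without materializing weights.
    model = AutoModelForCausalLM.from_config(config)

# Stream tensors straight into the module, bypassing the normal
# pickle/safetensors load path.
deserializer = TensorDeserializer(TENSORS_URI)
deserializer.load_into_module(model)
deserializer.close()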
> Yes, RunPod does have cheaper pricing than us; however, they don't allow you to specify your exact resources, but rather charge you for the full resource (see the A100 example above). So depending on your resource requirements, our pricing could be competitive, since we charge you only for the resources you use.
I may be misunderstanding your explanation a bit here, but RunPod's serverless "flex" tier looks like the same model (it only charges you for non-idle resources). And at that tier they are still 2x cheaper for an A100; at your price point with them you could rent an H100.
FWIW, their most expensive flex price I've ever seen for an 80GB A100 was $0.00130 back in January of this year, which is still cheaper, albeit by a smaller margin, if that's helpful at all for your own competitive market analysis.
Yeah, RunPod's cold start is definitely not 250ms, not even close. Maybe for some models, idk, but a Hugging Face model with 8B params takes like 30 seconds to cold start in their serverless "flash" configuration.