2uryaa's comments

Thank you for the feedback. Taking note of this!


Yes, we operate on GB200s and GH200s. We're usually cheaper for many models and can get up to double the TPS.


Yep, we are actively working on getting this down. We can meet SLAs with tuning for the real-time vision workloads, but getting rid of this compromise is our next big development task.


For consumers, we just want to pass on the price-to-performance gains. For enthusiasts and companies, we do see people wanting their own models and the ability to use the massive amounts of data they have.


That's really awesome to hear!!


Hey Jack, we use GB200s for these workloads. Feel free to check those big models out on our site! We are doing Kimi, GLM, Minimax, etc.


Nice! But that doesn’t answer the question. Do these optimizations scale to multi-device workloads or not?


Also curious about this. We have a 30-day content retention policy and need access to your fine-tuned model/LoRA if we're deploying it. If there's anything we can change, happy to hear it out.


Would love a ZDR (zero data retention) option if possible; that’s honestly the main thing I’m going to OpenRouter for.


We usually charge by GPU hour for those finetunes, around $8-10 depending on GPU type and volume! This is similar to Modal, but since the engine is fully ours, you don't wait ~1 min for cold starts. Ideally, we will make onboarding super frictionless and self-serve, but we're onboarding people manually for now.


Haha sorry for the typo! Your F500 use case is exactly what we want to target, especially as those companies start serving finetunes on their own data. Thanks for the feedback!


The issue now is they are convinced OpenClaw can solve all their business process problems without touching Conway’s law.


Our SLA is actually higher and our pricing is lower. We are also using this as a step toward serving fine-tuned models for much cheaper than Fireworks/Together, without the horrible cold starts of Modal. We're essentially trying to prove that our engine can hang with the best providers while multiplexing models.

