Former ML engineer here who ran IsaacGym and MuJoCo sims in the cloud for 2+ years. The pain is real and very specific:
1. Cold start latency killed iteration loops. Spinning up a GPU VM to test a 10-minute sim run took longer than the sim itself — you'd wait 3-5 min for the instance, run 8 min, tear down. That per-iteration overhead crushes exploration.
2. Idle billing. If you're grid-searching over reward functions, you want to fire 20 parallel runs, collect results, tune, repeat — but most providers bill per-hour so even a 12-minute run costs you a full hour.
3. Physics sim + CUDA dependencies. Custom CUDA kernels (Warp-based sims, etc.) often need specific driver versions. Docker helps, but image build/push overhead adds another 5-10 min to the loop.
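The sweep loop from point 2 can be sketched in a few lines — fire N short runs in parallel, collect scores, keep the best config. This is a hypothetical sketch: `run_sim` is a stand-in for a real simulator call, not any actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sim(reward_scale: float) -> float:
    # Stand-in for a ~12-minute IsaacGym/MuJoCo run; returns a score.
    # (Pretend 0.3 is the sweet spot for this invented reward parameter.)
    return -abs(reward_scale - 0.3)

def sweep(candidates):
    # On per-hour billing, each of these short runs is billed as a full
    # hour; per-second billing is what makes wide, short sweeps economical.
    with ThreadPoolExecutor(max_workers=20) as pool:
        scores = list(pool.map(run_sim, candidates))
    return max(zip(scores, candidates))  # (best_score, best_config)

best_score, best_cfg = sweep([i / 10 for i in range(20)])
```

The point isn't the code — it's that the natural shape of the workflow is many short parallel runs, which is exactly the shape per-hour billing punishes.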
The "CI for sims" framing (push code → run on GPU automatically) directly addresses #1 and #3. Worth building.
On the infrastructure layer: we built GhostNexus (https://ghostnexus.net) to address #1 and #2 — per-second billing, <30s cold starts on RTX 4090 hardware, Python SDK with 3 lines to submit a job. Might be worth using as the GPU backend if you don't want to manage the infra layer yourself. (Disclaimer: I'm the founder.)
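For a sense of what "3 lines to submit a job" typically looks like in this kind of SDK, here's an illustrative sketch. `GhostClient` and its methods are invented stand-ins (stubbed so the snippet runs), not the real GhostNexus API — check their docs for actual names.

```python
class GhostClient:
    # Minimal local stub so the sketch is runnable; a real client would
    # talk to the service over HTTP.
    def submit(self, script: str, gpu: str = "rtx4090") -> dict:
        return {"id": "job-001", "script": script, "gpu": gpu}

    def wait(self, job: dict) -> str:
        return f"{job['id']}: done"

client = GhostClient()
job = client.submit("train_sim.py")
print(client.wait(job))  # → job-001: done
```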