"I had no problem getting deterministic LLM outputs when I experimented with this 6 months ago" looks like you're using llama-cpp in that repo. This is about vllm serving many requests at once, at long sequence lengths.
> As it turns out, our request’s output does depend on the parallel user requests. Not because we’re somehow leaking information across batches — instead, it’s because our forward pass lacks “batch invariance”, causing our request’s output to depend on the batch size of our forward pass.
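That lack of batch invariance is easy to see in isolation. A minimal sketch (assuming PyTorch and, ideally, a CUDA device; the matrix sizes are arbitrary): the same row multiplied by the same matrix, alone versus inside the full batch, can come back with slightly different floats because the kernel and its reduction order can change with the batch shape.

```python
import torch

torch.manual_seed(0)
# The effect is most visible with CUDA kernels; CPU results may happen to match.
device = "cuda" if torch.cuda.is_available() else "cpu"

A = torch.randn(2048, 2048, device=device)
B = torch.randn(2048, 2048, device=device)

# Mathematically identical results, different batch sizes fed to the matmul kernel:
row_alone   = torch.mm(A[:1], B)   # the first row, computed as a batch of one
row_in_full = torch.mm(A, B)[:1]   # the same row, computed inside the full batch

print(torch.equal(row_alone, row_in_full))           # typically False on GPU
print((row_alone - row_in_full).abs().max().item())  # small but nonzero difference
```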
I had no problem getting deterministic LLM outputs when I experimented with this 6 months ago.
Run two of these with the same prompts and same seed and you get the same results.
Obviously, in GPU clusters with different hardware, things get more complicated.
https://git.distrust.co/public/llmshell
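For what it's worth, the single-request case described above is straightforward to reproduce. A minimal sketch with llama-cpp-python (not code from the linked repo; the model path and prompt are placeholders): one request at a time, greedy sampling, fixed seed.

```python
from llama_cpp import Llama

# Placeholder model path; any local GGUF model behaves the same way.
llm = Llama(model_path="./model.gguf", n_ctx=2048, seed=42, verbose=False)

# Greedy decoding (temperature 0) with a fixed seed, one request at a time:
# run this script twice on the same machine and the text comes back identical.
out = llm("Explain batch invariance in one sentence.",
          max_tokens=64, temperature=0.0)
print(out["choices"][0]["text"])
```

The contrast with the article is that a serving engine batches your request with whatever else is in flight, so the batch shape of your forward pass is not under your control.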