
> (I didn't have control over temperature settings.)

That's...interesting. You'd think they'd dial the temperature to 0 for you before the demo at least. Regardless, if the tech is good, I'd hope all the answers are at least decent and you could roll with it. If not....then maybe it needs to stay in R&D.





Reducing temperature to 0 doesn't make LLMs deterministic. There are still other sources of nondeterminism, such as floating-point results depending on the order in which mathematically associative operations are performed.
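A minimal illustration of the floating-point point (toy example, not from any inference engine): the same mathematical sum, grouped differently, rounds differently.

```python
# Floating-point addition is not associative: at 1e16 the spacing
# between adjacent doubles is 2.0, so adding 1.0 can round away.
a, b, c = 1e16, 1.0, 1.0

left = (a + b) + c   # each +1.0 rounds back to 1e16, so both are lost
right = a + (b + c)  # +2.0 is exactly representable, so it survives

print(left == right)  # False
print(left, right)    # 1e+16 1.0000000000000002e+16
```

If different threads or kernels group the same operations differently from run to run, the bits of the result can differ from run to run too.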

I keep reading this, but I don't get it: for the same input, shouldn't the order of the resulting operations be deterministic too?

It gets more complicated with things like batch processing. Depending on where in the batch your query gets placed, how the underlying hardware works, and how the software stack was implemented, you might get small differences that compound over many token generations. (vLLM, a popular inference engine, has this problem as well.)
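A toy sketch of how that plays out (illustrative only, not vLLM code): reducing the same set of float values in a different order, as batching or kernel scheduling can cause, produces a different result.

```python
# Same multiset of values, two reduction orders.
vals = [1e16, 1.0, -1e16, 1.0]

in_order = 0.0
for v in vals:
    in_order += v  # 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost

reordered = 0.0
for v in [1e16, -1e16, 1.0, 1.0]:  # same values, different order
    reordered += v

print(in_order, reordered)  # 1.0 2.0
```

Per token the difference is tiny, but once it flips which token is sampled, every subsequent token diverges.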

Not necessarily. This is a good blog post from a few days ago about it: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

Fantastic article, thanks!


The associative property of multiplication breaks down with floating-point math because of rounding error. If the engine is multithreaded, it's easy to see how the ordering of the multiplications can change, which can change the output.


