> When you do a floating point reduce operation the order in which you reduce your floating point numbers matters, but this order is not necessarily deterministic.
There is no such thing as a "floating point reduce operation". You can do a reduction using a floating point calculation as your joining operation, but even in that case, each individual operation is still deterministic. Moreover, a classical reduce is also deterministic, being a fold in one direction or the other depending on the associativity of the joining operation. The non-determinism comes only from your choice to specify the behaviour of your "parallel reduce" function in a non-deterministic way, and if you do that without understanding the consequences, that's hardly floating point's fault.
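The distinction is easy to see concretely: each individual addition is deterministic, but the grouping decides what gets rounded away. A minimal Python sketch (the values are chosen only to make the rounding visible):

```python
# Floating point addition is deterministic but not associative:
# the same three operands give different results under different groupings.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # (1e16 - 1e16) + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the sum -> 0.0

print(left, right)   # 1.0 0.0 -- both results are individually reproducible
```

Run twice, each grouping gives the same answer every time; only choosing the grouping non-deterministically makes the overall result non-deterministic.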
In practice, the non-determinism is important for efficiency. Consider computing a norm or dot product in parallel. You sum the owned part of the distributed vector, then use MPI_Allreduce() to get the global sum (the parallel implementation uses some tree structure, usually semi-redundantly, perhaps using dedicated hardware). If you run with a different number of processes, the reduction will necessarily be done with different associativity. Of course any such result is as good as the others (unless the vector is somehow sorted, in which case some orders will commit larger rounding error).
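A toy model of that setup, in plain Python with no MPI: `allreduce_sum` and its round-robin splitting scheme are illustrative assumptions, not how any particular MPI implementation partitions data, but they show how the process count changes the associativity of the sum.

```python
import math
import random

def local_sum(xs):
    """Naive left-to-right accumulation, as each rank does on its owned part."""
    s = 0.0
    for x in xs:
        s += x
    return s

def allreduce_sum(values, nprocs):
    """Model of MPI_Allreduce(..., MPI_SUM): local sums combined up a binary tree."""
    chunks = [values[i::nprocs] for i in range(nprocs)]  # "owned" parts (illustrative split)
    partials = [local_sum(c) for c in chunks]
    while len(partials) > 1:  # pairwise tree combination
        partials = [local_sum(partials[i:i + 2])
                    for i in range(0, len(partials), 2)]
    return partials[0]

random.seed(0)
v = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
results = {allreduce_sum(v, p) for p in (1, 2, 3, 4, 7)}
# Different process counts imply different associativity; the values in
# `results` may differ in the last few bits but all agree to high relative
# accuracy, i.e. each is "as good as the others".
```

Sorting `v` by magnitude before summing is the exception noted above: then some orders commit systematically larger rounding error than others.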
The MPI standard does not require that reductions use deterministic associativity for a given number of processes (independent of the physical network topology), but it does suggest that this is desirable, and (AFAIK) most implementations are deterministic for a fixed number of processes.
Map-reduce-type calculations are less likely to run on known or fixed numbers of processes, since they are designed for failing hardware and more flexible setups, unlike most MPI jobs. But most of those calculations are probably less sensitive to rounding error too.