There's that, but there's also just costs that are baked into the fact of distribution itself.
For example, take Spark. Since it's built to be resilient, every executor is its own process. Because of that, executors can't just share immutable data the way threads can in a system that's designed for maximum single-machine performance. They've got to transfer the data using IPC. In the case of transferring data between two executors, that can result in up to four copies of the data being resident in the memory at once: The original data, a copy that's been serialized for transfer over a socket, the destination's copy of the serialized data, and the final deserialized copy.