A modern ALU has multiple pipelined data paths for its operations: for example, three adders with a one-cycle latency, two multipliers with a three-cycle latency, and one divider with a 16-cycle latency. Sustained throughput depends on the operation: perhaps one per cycle for add and multiply, but only one every eight cycles for divide, because the divider is not fully pipelined.
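To make the latency/throughput distinction concrete, here is a minimal sketch of a toy timing model. It uses the illustrative numbers above; the `Unit` class, the function names, and the simplification to a single unit per operation (ignoring the extra adders and multipliers) are assumptions made for this example, not a description of any real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    latency: int   # cycles from issue until the result is available
    interval: int  # cycles between successive issues (1 = fully pipelined)


def cycles_independent(unit: Unit, n: int) -> int:
    """Cycles to finish n operations that do not depend on each other.

    A new operation can start every `interval` cycles, so the pipeline
    overlaps them; only the last one pays the full latency.
    """
    if n == 0:
        return 0
    return unit.latency + (n - 1) * unit.interval


def cycles_dependent(unit: Unit, n: int) -> int:
    """Cycles to finish n operations where each needs the previous result.

    Nothing can overlap, so every operation pays the full latency.
    """
    return n * unit.latency


# Illustrative numbers taken from the text above (one unit of each kind).
UNITS = {
    "add": Unit(latency=1, interval=1),
    "mul": Unit(latency=3, interval=1),
    "div": Unit(latency=16, interval=8),
}

if __name__ == "__main__":
    n = 100
    for name, unit in UNITS.items():
        print(name, cycles_independent(unit, n), cycles_dependent(unit, n))
```

Under this model, independent operations approach the sustained rate (one per `interval` cycles), while a dependent chain is bound by latency. The multiplier shows the gap most clearly: it is fully pipelined, yet a chain of dependent multiplies still takes three cycles each.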