I don't see how this has anything to do with speculation. In most cases where you care about this, you don't have to speculate about whether all the loop iterations are needed. In matrix multiplication, for example, every one of those iterations will be needed.
What I'm thinking is that the processor has an instruction stream that looks like this:
```
loop:
    instr_1
    instr_2
    ...
    instr_n
    jcond loop
```
Now, assuming the loop is not unrolled, the processor would need to speculate that `jcond loop` will be taken in order to execute two copies of `instr_1` (from consecutive iterations) in parallel. I'm saying that it may be able to do that, though I am by no means sure.
Oh, I see what you mean. I was talking (and thinking) about the unrolled version, so it didn't make sense to me how speculation could help there. But I imagine that the kind of long chains you'd want to run in parallel within a single basic block typically wouldn't get executed that far past a branch, if the only purpose is to avoid wasting time after a branch misprediction. Plus, from what I understand, you'd still be wasting execution units here, just not by idling them but by repeatedly speculating on the "I'm done" branch.
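To show what I mean by "the unrolled version", here is a rough sketch in Python (the function names `dot_rolled` and `dot_unrolled2` are made up for illustration, and Python is only being used to show the dependence structure, not real performance). In the rolled loop, each iteration is separated from the next by the loop branch. In the version unrolled by two, the two multiply-adds sit in the same loop body and feed separate accumulators, so no branch needs to be speculated for them to be independent of each other.

```python
def dot_rolled(a, b):
    """One multiply-add per iteration; every iteration ends in the loop branch."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled2(a, b):
    """Unrolled by two, with two independent accumulator chains."""
    n = len(a)
    acc0 = 0
    acc1 = 0
    i = 0
    while i + 1 < n:
        # These two statements do not depend on each other, and there is
        # no branch between them.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        i += 2
    if i < n:
        # Leftover element when the length is odd.
        acc0 += a[i] * b[i]
    return acc0 + acc1
```

With integer inputs both functions return the same value; with floating point the different summation order could change the rounding.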
EDIT: I just found that the idea that I had in my head actually exists and is called "modulo scheduling".
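For anyone curious, here is a toy sketch of the overlap that modulo scheduling produces. It is not how a real compiler does it: I'm assuming every stage takes one cycle, each stage depends only on the previous stage of the same iteration, iterations are independent, and there are enough execution units. The function name `modulo_schedule` and the parameter `ii` (the initiation interval, i.e. how many cycles pass between starting consecutive iterations) are just names I picked for the example. A real scheduler has to derive the interval from resource limits and loop-carried dependencies.

```python
def modulo_schedule(num_iters, stages, ii=1):
    """Map each cycle to the (iteration, stage) pairs that run in it.

    Iteration `it` starts at cycle it * ii, and its stage number s runs
    s cycles after that, so later iterations overlap earlier ones.
    """
    schedule = {}
    for it in range(num_iters):
        for s, name in enumerate(stages):
            cycle = it * ii + s
            schedule.setdefault(cycle, []).append((it, name))
    return schedule


# Example: four iterations of a three-stage loop body, starting a new
# iteration every cycle.
sched = modulo_schedule(4, ["load", "mul", "store"])
for cycle in sorted(sched):
    print(cycle, sched[cycle])
```

In this example the first two cycles are the ramp-up (prologue), the middle cycles run one stage from each of three different iterations at once (the steady-state kernel), and the last two cycles drain the pipeline (epilogue).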