Looks like applying the motion vectors from one video to the starting conditions from another, which sometimes happens when fast forwarding with a buggy video decoder or a glitchy video.

I wonder, could you do this just by splicing the I-frames from one video together with the B- and P-frames from another?

That (removing I-frames) is exactly what they were doing - the technique's called datamoshing.
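
A rough sketch of the idea, using a toy model rather than a real codec: here an "I" frame carries a full picture, and a "P" frame carries a per-pixel delta against the previous picture. Real codecs use motion vectors plus residuals, and B-frames are left out for simplicity. The names `decode`, `datamosh`, `clip_a` and `clip_b` are made up for this illustration.

```python
def decode(frames):
    """Decode a toy stream of (kind, data) frames into a list of pictures."""
    picture = None
    out = []
    for kind, data in frames:
        if kind == "I":
            # An I-frame replaces the whole picture.
            picture = list(data)
        elif picture is not None:
            # A P-frame only describes a change to the previous picture.
            picture = [p + d for p, d in zip(picture, data)]
        else:
            # A delta with no reference picture cannot be shown.
            continue
        out.append(list(picture))
    return out


def datamosh(first_clip, second_clip):
    """Keep first_clip whole, then append second_clip with its I-frames removed."""
    return first_clip + [f for f in second_clip if f[0] != "I"]


clip_a = [("I", [10, 10, 10]), ("P", [1, 0, 0])]
clip_b = [("I", [50, 50, 50]), ("P", [0, 2, 0]), ("P", [0, 0, 3])]

# clip_b's changes are now applied on top of clip_a's last picture.
moshed = decode(datamosh(clip_a, clip_b))
print(moshed[-1])  # [11, 12, 13]
```

Decoded on its own, `clip_b` would end at `[50, 52, 53]`; with its I-frame dropped, the same deltas land on `clip_a`'s picture instead, which is the smearing effect you see in a datamoshed video.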

