This is a strange argument to me, because the single most critical aspect of a bubble sort is the locality.
Bubble sort only ever compares adjacent elements. It only ever swaps adjacent elements. This is what makes bubble sort optimal (within constant factors) for an external sort on a tape drive. This is why anything else isn't a bubble sort.
Here's an algorithm (in Python) that is exactly equivalent to the one in the paper:
def pseudobubble(lst):
n = len(lst) # give a name to the list length
for i in range(n-1, 0, -1): # i will range from n-1 down to 1 inclusive
if lst[i] > lst[i-1]:
swap(lst, i, i-1) # high values bubble down
for i in range(1, n): # 1 to n-1 inclusive
for j in range(i, 0, -1): # i down to 1 inclusive
if lst[j] < lst[j-1]:
swap(lst, j, j-1) # low values bubble down
The first pass of this algorithm (bubbling high values down, one time only) may do different things than the first pass of the paper's algorithm, but that's not relevant.
All comparisons and swaps in the code you quoted are made between adjacent elements. It's local in a way that behaves very nicely for an external sort on a tape drive.
The code in the paper compares and swaps non-adjacent elements nearly every time. It is absolutely horrible for an external sort on a tape drive. It's just not a bubble sort, nor is it sort of like one in any key behavioral detail.
Bubble sort only ever compares adjacent elements. It only ever swaps adjacent elements. This is what makes bubble sort optimal (within constant factors) for an external sort on a tape drive. This is why anything else isn't a bubble sort.