
Keyset-driven paging - lukaseder
http://stackoverflow.com/a/3215973/521799
======
JulianMorrison
I've had keyset paging _fail_ for me in MySQL, once or twice in every ten
thousand rows, in a "chaser" process that was trying to keep up with a table
under heavy write load. I obtained data with holes in it; I think MySQL had
allocated the autoincrement IDs to in-progress transactions but not written
the rows.

~~~
lukaseder
But then, MySQL failed you on keeping autoincremented IDs strictly and
globally incrementing to all readers. That is not related to keyset paging. It
just happens that you chose to `ORDER BY` a non-stable ordering criteria.

In principle, you can order by any (possibly composite) key when performing
keyset paging. Actually, all I've just said is equally true for `OFFSET`
paging.

~~~
JulianMorrison
No, they were stable all right, once the writes had settled down. The trouble
is that sometimes it would write rows with an ascending ID sequence like

    
    
      1
      1,2
      1,2,4
      1,2,3,4
    

…and a read transaction that captured "1,2,4" would try reading the next ID >
4 and never see 3.

Evidently it allocated the ID and space for row 3 before writing it and
exposing it to the other transactions. And sometimes that turned into a race
condition.

~~~
lukaseder
Aha, I get it now. Yes, that's a little awkward. Although, 3 would re-appear
when going back again but that might be even more confusing.

But with OFFSET, you'd just have (frequently) shifting windows, right?

~~~
JulianMorrison
I was trying to stream the table entries into a "tallying" type process. And
of course it came out with the wrong numbers. Frustrating. I finally replaced
the whole "chaser" process with a repeated full table scan with SQL aggregate
functions. Wildly inefficient, but reliable.

------
schrodinger
I try to avoid using the key based paging because you're tying functionality
to an implementation detail. If you change your key generation method (start
using guids), the order may not be predictable. Or if you distribute your
database with peer to peer replication, the order of your keys may only be
consistent relative to other keys generated by the same instance. Or maybe you
add some historical records, and they've got a time stamp that's accurate, but
this sort of paging will show them in the order they were inserted.

~~~
tbarbugli
And whats the alternative for you ?

~~~
schrodinger
Order by the thing you actually care about, like a time stamp. And add an
index on that column to make it efficient. You still need to analyze the query
to determine if it's worth adding a covering index, or letting the DBMS
perform a key lookup, but that's probably beyond the scope of a comment here.

I guess it's not keyset based paging that is to be avoided, but paging based
on a surrogate key which is a meaningless value. If your key had a meaningful
value, it's a valid way of writing the query.

------
lukaseder
It is worth mentioning that "keyset driven paging" is sometimes also referred
to as the "seek method":

[http://www.reddit.com/r/programming/comments/p7lgl/sql_pagin...](http://www.reddit.com/r/programming/comments/p7lgl/sql_pagination_in_constant_time_using_the_seek/)

