
Getting Pagination Wrong (2016) - pmontra
https://blog.jooq.org/2016/08/10/why-most-programmers-get-pagination-wrong/
======
Ixio
> There’s absolutely no use case out there, where I search for something, and
> then I say, hey, I believe my search result will be item #3175 in the
> current sort order.

I actually have a use case. When the search filtering functionality is lacking
or overly complicated to use. For example with Gmail if I can't be bothered to
look up how date filtering works, since I receive emails at a fairly constant
rate I can sort of guess that item #3175 might be around the date I'm looking
for.

I strongly disagree with anyone who thinks Facebook "got it right" with their
timeline. In my experience it's very easy to see something interesting on the
Facebook timeline, only for it to refresh and lose it forever. It can be very
frustrating not to be able to get a consistent timeline.

~~~
ken
Also from the article: "when was the last time you googled for SQL and then
went to page #18375 to find that particular blog post that you were looking
for?"

The key difference in use cases here is that one is searching _my stuff_ , and
one is searching _everywhere_. Nobody wants page #18375 of everything. People
do occasionally want page #3175 of their own stuff.

~~~
astine
Even when I'm searching other people's stuff, I don't always just want to
browse through it linearly. Sometimes I want to skip around. Meaningful keys
such as timestamps would be better than just page groupings, but page
groupings, so long as they are stable, are still useful. Sometimes I _do_ want
to skip to page #18375 because I know that I've already browsed pages #1
through #18374 on a previous visit and I want to start where I left off. You
can't typically do that with infinite scroll.

~~~
altfredd
Unfortunately, the "so long as they are stable" constraint is incompatible
with the idea of a mutable database backend. SQL pays a _massive_ performance
price for OFFSET, and its results are still neither fast nor stable.

------
uranusjr
> Also, as a user, in most cases, I don’t care that I’m on page 317 of my
> paginated screen that displays me rows 3170-3179 (assuming 10 rows per
> page). I really don’t. The page number is absolutely useless in terms of
> user experience.

But what happens when you _do_ care? There are numerous times I clicked away
from a Facebook or Twitter timeline, but then want to go back to that post I
was reading and comment. And I can’t, because my back button brings me back to
the top of the timeline, not where I left. Pagination would have helped.

~~~
jackewiehose
Yes, this annoying endless-scrolling without pagination on reddit is the
reason I still use old.reddit.com, even though I think the new one is overall
prettier. But you make one wrong click and you're completely lost, back at the
top.

~~~
colejohnson66
I don’t like infinite scrolling, but it does have one advantage IMO:
everything you’ve seen is on the page, so, unless the page culls older
entries, you can Ctrl-F to find what you just saw.

------
yowlingcat
This is cargo-cult programming based on an overreaction to OFFSET, which while
inefficient, is not the only way to implement pagination. If you have data
with an auto-incrementing ID or timestamp and index on it, it's very easy to
page arbitrarily. It's also very useful -- take any chat or news feed
application. I don't want to lose my place in the feed just because there's
some chattiness that occurred later on in time than my cursor is at.

The real answer to this question is to better understand how your database
indexes your data and how it uses those indexes to plan your queries, which it
will very helpfully tell you (e.g. via EXPLAIN). You want to avoid OFFSET to
avoid a full-table scan? Great! You want to completely destroy a useful
feature because you don't understand the way your database stores your data's
intrinsic structure, and how to exploit its very fast tree lookups? Better
hold your horses there and engage your curiosity a bit more, because this is a
common and efficiently solved problem.

~~~
lukaseder
It is solved with keyset pagination, to be precise. "I don't want to lose my
place in the feed" - well, you just lost it because of OFFSET.

~~~
yowlingcat
Sure. I'm not advocating for OFFSET. I guess to be candid, I really didn't
like the examples you picked. Arbitrary pagination is useful, and I don't
understand why you spent so much time trying to attack it unconvincingly.

Beyond that, in your post, you link to two approaches for keyset pagination:
the seek method and the page boundary caching approach. I'd much rather use
the seek method you link to and leave it at that, because it doesn't rely on
the brittleness of row_number(), which is liable to change. Of course, this
all depends on your use case -- as you point out, if your data is truly
static, page boundary caching is unlikely to become dangerous.

~~~
lukaseder
I'm not sure I get it. A row_number() based approach is the same thing as
using OFFSET...

------
phendrenad2
This assumes you have the ideal database schema with the ideal data content
producers with the ideal user.

One could call into question any of the assumptions here, so why not go
straight for the most self-assured one:

"I never ever ever want to jump to page 317 right from the beginning. There’s
absolutely no use case out there, where I search for something, and then I
say, hey, I believe my search result will be item #3175 in the current sort
order."

...Unless you're trying to save money by shopping for some common item on eBay
or Amazon, in which case your search will yield 1000s of results with the same
name and slightly different prices, and no way to tell when the next level of
"quality" begins. For instance, the first 3 pages of results may be some
accessory for the item you really want, priced at $1. Then thankfully on page
4, you find the cheapest version of the item you're looking for. Pages 5-55
are all that exact item, plus accessory packs for a more expensive iteration
of the item. Finally on page 56, you find the next step up in quality of the
item, you decide it's sufficient for your purposes, and you purchase it.

Now wouldn't jumping directly to page 50 and then page 100 help you narrow
down your search?

Now tell me how filtering or infinite scroll can aid in this situation. It
can't. The only thing that can fix it is better control on eBay's part to weed
out duplicate listings, but that would involve some pretty hefty machine
learning, and suddenly eBay would have to raise their rates massively, and I'd
no longer buy from there.

~~~
jlokier
This is the #1 reason I don't use Ebay much any more.

When searching on Ebay, I use a sort of binary search on page numbers. The
algorithm goes like this:

    
    
      - Set number of results per page to the maximum allowed, 200.
      - Scroll through results on page 1.  Item I searched isn't found?
      - Scroll through results on page 5.  Item I searched isn't found?
      - Scroll through results on page 20.  Item I search is found?
      - Scroll through results on page 10.  Item I searched isn't found?
      - Scroll through results on page 12.  Item I searched isn't found?
      - Scroll through results on page 15.  Item I searched is found about half way down?
      - Scroll through results on page 14.  Item I searched is found about half way down?
      - Scroll through results on page 13.  Item I searched is found about half way down but it's spares & repair only?
      - Ok, I can begin *really* browsing what I'm looking for from half way down page 14,
        and expect the density to increase with each page.
    
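The hunt above is essentially a binary search over page numbers. A toy sketch of it (the `page_contains_item` predicate is hypothetical, standing in for "scroll the page and look"):

```python
def first_page_with_item(page_contains_item, num_pages):
    """Binary search for the first page where the wanted item appears.

    Assumes the sort order is such that once the item appears, it keeps
    appearing on every later page (e.g. sorted by price, hunting for the
    real item among pages of cheap accessories).
    """
    lo, hi = 1, num_pages
    while lo < hi:
        mid = (lo + hi) // 2
        if page_contains_item(mid):
            hi = mid          # found here: the boundary is here or earlier
        else:
            lo = mid + 1      # not found yet: the boundary is later
    return lo

# Toy run: the item first shows up on page 14 of 20, as in the walkthrough.
assert first_page_with_item(lambda page: page >= 14, 20) == 14
```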

The process is restarted with each new search query, so it takes quite a long
time to hunt for things worth buying if I haven't decided exactly what I want,
and that's why I don't buy much from Ebay any more.

But if I had to infinite-scroll to get to the first useful result at item
#2800, for each search, it would take so much longer.

I'm pretty sure Ebay could improve this UX by segmenting search results.

------
mlthoughts2018
> “There’s absolutely no use case out there, where I search for something, and
> then I say, hey, I believe my search result will be item #3175 in the
> current sort order.”

I previously worked for a large ecommerce company that sold digital assets
(music, photos and other things).

It was absolutely a huge use case that people would submit deeply paginated
queries directly, such as from saved search result links to resume later, or
known good bots that we allowed to crawl and index the search results for
customers that integrated our inventory to their APIs.

It was such a high priority use case that we actually built a detector system
to recognize these queries and divert them to a different pool of query shards
to keep the heavy deep paginating load and cache misses off of the pool of
shards that served main traffic.

Very much a huge part of our pagination design.

~~~
wrs
Doesn’t that have an even worse problem, in that if you use a query for an
extended period, the results are going to change out from under you while
you’re scanning them? Seems like you can’t build a stable crawler on that —
you’d need to store the actual result list, not redo the query each time.

~~~
mlthoughts2018
This system used pagination keying, not just a link to a page of the results.
The link would take you back to the most recently visited item from your
pagination key.

Results could have changed in that sort order, but that is a risk in all
pagination, depending on the freshness of results required by the user.

------
Aperocky
Am I the only one feeling the author is speaking completely from his personal
bubble?

While he might find rows 3170-3179 not useful, there might be some people who
do, and having it there barely affects anything else at all.

Meanwhile his SQL queries assume that businesses that build pagination use
relational databases exclusively. Highly scalable NoSQL databases are a thing,
and if I were to bet, most popular paginated calls are built on them.

~~~
lukaseder
This is not at all related to RDBMS. You would still prefer keyset pagination
in a "highly scalable" nosql database.

Meanwhile, I'll wait for that convincing example where "some people" find rows
3170-3179 useful.

~~~
adventured
The author is broadly arguing against this style:

1 2 3 ... 315 316 _317_ 318 319 ... 50193 50194

That style is in fact extremely useful if you want to jump multiple pages
ahead faster, or work backwards faster, rather than one at a time. And I'd
argue that this:

< Prev | 1 | 315 _316_ 317 | 50194 | Next >

with 3 or 5 results in the center, is the ideal form of that (can optionally
drop the prev / next with strong left / right arrowing to compact it further).
The example the author gives is intentionally bloated to amplify their point.

If I'm on page 1, and want to get to page 13, having multiple numbers in the
middle accelerates that process. I can skip to 13 much faster by jumping
results rather than going one after another. This is a practical use case and
it works well in reverse by enabling you to jump to the end of the results
(page 50194) and work backwards quickly through multi-page jumping from the
last result, which is great for many types of time-marked content.

~~~
lukaseder
> The author

I am the author ;-)

> That style is in fact extremely useful if you want to jump multiple pages
> ahead faster, or work backwards faster, rather than one at a time

It is very easy to jump several pages using keyset pagination. That's just a
UX thing to do.

------
y4mi
I wasn't expecting much from this title and was pleasantly surprised.

(Sadly) rarely used SQL features explained with code examples. Lovely write-up.

~~~
thimkerbell
You had a much better experience reading it than I did.

------
skybrian
As a side point, Civilization players do talk about game strategy in terms of
turn number. "By turn 70 you should..."

~~~
TeMPOraL
I don't understand why they even brought up the Civilization example. "One
more turn" isn't something positive unless you're designing a videogame or
dealing internet crack.

------
ken
> Who got it right? Facebook. Twitter. Reddit. And all the other websites that
> do timelines.

Curiously: a webpage I only use because I'm part of a group that uses it for
scheduling, a webpage I only use for occasionally ranting about software
(never reading) because I'm too cheap/lazy to set up a blog, and a webpage I
only read through its "old" domain which does classic pagination.

Am I the 2019 version of the guy who hates PCs because there's no switches on
the front panel to toggle in a new boot loader when something goes wrong? Do I
just not realize how much better the new way is?

~~~
TeMPOraL
> _Do I just not realize how much better the new way is?_

You're fine. The "new way" is unergonomic, but perfect for generating
addiction. Sites using it are digital crack dealers, and the users who like
the timeline tend to be casual consumers who don't yet realize they have a
problem.

------
jonathanhefner
I agree with most of the points regarding user experience. But I loathe
infinite scrolling, and so strongly disagree that Facebook "got it right." I
think "geared pagination" with a sprinkle of Ajax works well though, in lieu
of infinite scrolling.

Here's an example, assuming there are 200 rows in total:

- User visits page; first 10 rows are listed.

- User clicks "More" link; next 20 rows are fetched via Ajax and appended to
the list (30 total).

- User clicks "More" link again; next 30 rows are fetched via Ajax and
appended to the list (60 total).

- User clicks "More" link again; browser navigates to the next 60 rows (rows
#61-120).

- User clicks "More" link again; browser navigates to the next 60 rows (rows
#121-180).

- User clicks "More" link again; browser navigates to the final 20 rows (rows
#181-200).

The basic idea is that there are two typical use cases: 1) The user expects
their desired row to appear in the first few dozen rows, and will change their
query if it doesn't (as mentioned in the article). 2) Or, the user really does
want to sift through all the rows, in which case it's preferable to do it in
large steps.

Of course, the Ajax fetch increments can be tuned to best support these cases,
as well as the number of Ajax fetches before triggering a full page
transition.
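The gearing schedule above can be sketched in a few lines (the increments 10/20/30, the 60-row full pages, and the 200-row total are taken from the example; the function name is made up):

```python
def geared_steps(total_rows):
    """Row counts for each step of the geared scheme: an initial render of
    10 rows, Ajax appends of 20 then 30, then full-page loads of 60."""
    sizes = []
    served = 0
    for step in (10, 20, 30):          # initial load + accelerating Ajax
        step = min(step, total_rows - served)
        if step <= 0:
            return sizes
        sizes.append(step)
        served += step
    while served < total_rows:         # then full page transitions of 60
        step = min(60, total_rows - served)
        sizes.append(step)
        served += step
    return sizes

# Reproduces the 200-row walkthrough above.
assert geared_steps(200) == [10, 20, 30, 60, 60, 20]
```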

I actually wrote a Rails gem that encapsulates this behavior, including robust
handling of back-and-forth navigation:
[https://github.com/jonathanhefner/moar](https://github.com/jonathanhefner/moar)

------
flowerlad
A related question: why are relational databases not able to tell you how many
rows are in a table without a full table scan, which is very inefficient?

~~~
yetihehe
Because for each entered/deleted row you would have to update that counter,
which would need to be done synchronously and would add overhead. In typical
db usage you don't really need to access that number that often.
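The tradeoff can be sketched with SQLite triggers: a counter table updated synchronously on every insert and delete makes reading the count O(1), at the price of extra write work on every row change, which is exactly the overhead described above (table and trigger names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY);
    CREATE TABLE row_count (n INTEGER);
    INSERT INTO row_count VALUES (0);

    -- Synchronous bookkeeping on every write: this is the overhead.
    CREATE TRIGGER t_ins AFTER INSERT ON t
        BEGIN UPDATE row_count SET n = n + 1; END;
    CREATE TRIGGER t_del AFTER DELETE ON t
        BEGIN UPDATE row_count SET n = n - 1; END;
""")

conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(50)])
conn.execute("DELETE FROM t WHERE id < 10")

# O(1) read of the maintained counter vs. a COUNT(*) scan of the table.
(fast,) = conn.execute("SELECT n FROM row_count").fetchone()
(slow,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
assert fast == slow == 40
```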

------
torstenvl
Someone at Lexis Nexis took this blog's advice, around the same time they went
to "natural language search" that's more Google-like and has less control.

The end result is that if you want to find out the legal dictionary definition
of a common term like "property," it takes something like half an hour to
click through to the actual result page.

------
greatgib
This article is so, so stupid. It takes as reference the places where
pagination is the worst: Facebook, Twitter, Google...

------
ptx
Regarding the second issue brought up, that you will get duplicate or missing
rows when the database changes between pages, I addressed this in one of my
apps by

1) initially issuing a query that selects only record IDs for all the records
and sends them to the client;

2) when displaying a particular page, issuing a query for the actual data "...
WHERE id IN (..., ...)" with IDs taken from a slice of the client-side array.

It seemed to work OK, but this was on a very small scale. I guess one
disadvantage is that you have to run two queries for the initial page. And you
have to transfer this array of IDs, which might be a problem with many
millions of rows.
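A minimal sketch of the two-query approach described above, using SQLite (table and function names are made up): snapshot the matching IDs once, then hydrate each page from a slice of that list, so later inserts can't shift or duplicate pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 101)])

# Query 1: snapshot the IDs of all matching rows, in display order.
snapshot = [row[0] for row in
            conn.execute("SELECT id FROM items ORDER BY id")]

def fetch_page(page, per_page=10):
    """Query 2: hydrate one page from a slice of the ID snapshot."""
    ids = snapshot[page * per_page:(page + 1) * per_page]
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, name FROM items WHERE id IN ({placeholders})", ids
    ).fetchall()
    # IN (...) does not preserve order; restore the snapshot order,
    # silently dropping rows deleted since the snapshot was taken.
    by_id = dict(rows)
    return [(i, by_id[i]) for i in ids if i in by_id]

# A row inserted mid-browse doesn't disturb the snapshot's pages.
conn.execute("INSERT INTO items VALUES (999, 'late arrival')")
assert [i for i, _ in fetch_page(3)] == list(range(31, 41))
```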

------
ajb
When there's a 'neg'[1] in the headline, I usually check out the comments here
first. Very frequently, as in this case, I then don't feel the need to
actually read the article - the comments here are probably more informative.

[1] Persuasion technique which tries to get the subject to want the approval
of the persuader, by criticizing them, directly or by implication.

------
Shank
> Moreover, I never ever ever want to jump to page 317 right from the
> beginning. There’s absolutely no use case out there, where I search for
> something, and then I say, hey, I believe my search result will be item
> #3175 in the current sort order.

There is a pretty big exception to this: when the search is so bad that it
never returns proper results or filters poorly. If the search algorithm always
starts in the same dumb way, pagination can let you “skip ahead” and find
better results.

I know this is the XKCD workflow thing, but when a site implements search
terribly but at least has some semblance of working pagination with jump
tools, I can at least still get okay results.

------
meerita
I wrote about pagination and its UX problems, and this solution doesn't solve
it. [http://minid.net/2019/09/16/a-very-common-ux-dilemma-with-
th...](http://minid.net/2019/09/16/a-very-common-ux-dilemma-with-the-listings-
on-e-commerce-websites/)

------
stazz1
"As you can see, Google estimates that there are probably at least 10 pages
for your search and you can go “next”. Yes, you can skip some pages, but you
cannot skip to a page number 50194, because, again:

No one wants that

It’s costly to predict, even for Google"

The man is right on the money.

------
buboard
“SELECT * from ... limit $start, $limit+1” is enough for prev/next navigation
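A sketch of the trick (using SQLite and made-up names): fetch one row more than the page size, and the presence of that surplus row tells you whether a "next" link is needed, without any COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO posts VALUES (?)", [(i,) for i in range(1, 26)])

def page(start, limit=10):
    """Fetch limit+1 rows; the surplus row only signals that a next
    page exists and is never displayed."""
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (limit + 1, start),
    ).fetchall()
    has_next = len(rows) > limit
    return rows[:limit], has_next

first, more = page(0)              # rows 1-10, and there is a next page
last, more_after_last = page(20)   # rows 21-25, the final page
assert more and not more_after_last
```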

~~~
Supermancho
You just summarized the article!

