
Ways to paginate in Postgres (2016) - ligistic
https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
======
combatentropy
The computer programming field is plagued by a tendency that needs a name.
What do you call it when the blogosphere pressures you to reject all these
time-honored, perfectly good techniques, because of some edge case in a niche
field that you will never have to deal with? Like the programmer ensconced in
some corporation writing a business app that will only ever be used by 100
people. He rejects using just one server, because that couldn't possibly be
enough, you must at least separate your web server from your database. He uses
NoSQL instead of SQL because SQL doesn't scale. He uses React and a single-
page application, because old-fashioned HTML and jQuery will get way out of
hand --- even though he's just printing some tables and maybe some graphs,
based on choices made in a form. And so his solution is worse than all of the
problems he's trying to avoid combined. And none of them were going to come to
pass anyway. Okay, this was an extreme example.

However, I see the same tendency at work in the demonization of poor old LIMIT
and OFFSET. Let me talk about OFFSET first. This article talks about its
inefficiency. Well, it may be inefficient, but every page I've generated with
OFFSET still comes back in a split second. So coding around it seems to me
like a premature optimization. Now let's talk about LIMIT. It's limited in
reliability, because what if someone inserts while you're paging along? The
item at the end of page 3 is now at the top of page 4! More insidiously, what
if someone deletes? Then the top result on page 4 becomes the last result on
page 3, and you never see it.
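That drift is easy to reproduce. A minimal sketch using Python's sqlite3 (table and column names invented for illustration; the behavior is the same with Postgres's LIMIT/OFFSET):

```python
import sqlite3

# In-memory table with ids 1..10 to page through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(1, 11)])

PAGE_SIZE = 3

def page(n):
    # OFFSET-based pagination: page n skips n * PAGE_SIZE rows.
    rows = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, n * PAGE_SIZE),
    ).fetchall()
    return [r[0] for r in rows]

first = page(0)                                  # [1, 2, 3]
conn.execute("DELETE FROM items WHERE id = 1")   # someone deletes mid-pagination
second = page(1)                                 # [5, 6, 7]
# Row 4 slid back onto page 0, which we already viewed, so we never see it.
```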

That scenario troubles my perfectionistic mindset, but I have to say, in all my
apps it is not the end of the world. I run into this all the time anyway on
other people's websites, major and minor: whether I'm paging through search
results on Google or Amazon, the latest posts on some web forum, or even
Hacker News. It's no big deal. At least for my purposes, it doesn't seem worth
doing one of the other methods described here.

Now if you are programming a self-driving car or automating the mixture of
medicines, maybe be careful where you use LIMIT and OFFSET.

~~~
vog
_> What do you call it when the blogosphere pressures you to reject all these
time-honored, perfectly good techniques, because of some edge case in a niche
field that you will never have to deal with?_

At first the term "over-engineering" came to mind, but that term doesn't
quite capture it.

Possibly related discussion (about applying blockchains where it makes no
sense):
[https://news.ycombinator.com/item?id=15401447](https://news.ycombinator.com/item?id=15401447)

~~~
combatentropy
Yes, overengineering is close.

The tendency I was thinking of isn't chiefly about chasing the new shiny thing
(Magpie programming) or mindlessly including needless libraries (Cargo Cult
programming).

The thing I'm thinking of is when someone points out a shortcoming of a
particular way of doing things. Like, "If you use a relational database, it
might go down if you get 50 million writes at once."

All techniques have trade-offs. So it's no surprise that some established way
of doing things has one. The criticism is valid. The thing will fail in that
situation. The problem is that the alternative presented by the blogger has
more problems than the first. It would be useful in a rare kind of job. But
that's not made clear, or the reader can't see past the flaw just exposed in
the old way of doing things, or the reader can't get past how cool it is
that there's a database that can handle 50 million simultaneous writes, or the
mere novelty is intoxicating (so there is some overlap with Magpie
programming).

This new way doesn't have the first one's shortcoming, but it is literally 10
or 100 times as much work to set up, is missing certain important features
that the original solution had, and solves a problem that the reader will
never have.

------
j_s
Pasting from [http://use-the-index-luke.com/no-offset](http://use-the-index-luke.com/no-offset) (2014); any updates on this list since then?

--

The hall of fame of frameworks that do support keyset pagination is rather
short:

jOOQ — Java Object Oriented Querying. Docs.

Ruby order_query

Django (Python) chunkator

Django Infinite Scroll Pagination.

SQL Alchemy sqlakeyset.

blaze-persistence — a rich Criteria API for JPA providers

Perl DBIx::Class::Wrapper
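
All of these generate variations of the same query shape: seek past the last key the client saw instead of counting rows with OFFSET. A minimal sketch in Python's sqlite3, with made-up table and column names:

```python
import sqlite3

# Toy table of posts with a unique, ordered key (id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 8)])

def next_page(last_id, page_size=3):
    # Keyset ("seek") pagination: filter on the last key seen rather
    # than skipping rows. With an index on id, cost is O(page_size)
    # no matter how deep the client has paged.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = next_page(0)             # ids 1..3
page2 = next_page(page1[-1][0])  # ids 4..6, seeking past id 3
```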

~~~
always_good
Doesn't that strategy only work when you're paginating one page at a time,
like Reddit?

How would you implement a deep pagination like "go to page 107" unless you can
derive page 107 from your order criteria?

~~~
stubish
We have this on one of our web apps, and it is a problem. To me the real
question is why we allow jumping to page 107 of 736 in the first place. How
does this help anyone? The two use cases are 1) the user is manually performing a binary
search on the results, 2) a bot needs to index the site. Both of these use
cases are better solved in other ways, with either search forms that do not
suck or index pages generated specifically for crawlers. So the answer to 'how
would you implement a deep pagination like "go to page 107"' is to not
implement that, and instead improve your design.

~~~
kuschku
And that’s why Google’s pagination on search results is broken, and why so
much usability was lost. Sometimes I want the lowest ranked results for a
query, or the middle-of-the-field results. Let me have them.

~~~
gruez
>Sometimes I want the lowest ranked results for a query, or the middle-of-the-field results.

What's the use case for that?

~~~
kuschku
Google results are gamed so much that personal blogs which don't do SEO but
offer interesting content often appear beyond page 5 of the search results
while still being relevant. Similar issues appear in many situations where
search runs over third-party content, but even on first-party content the
first page is often gamed.

I’ve been thinking about it, and ideally one would make a search engine that
only indexes pages that have no analytics, tracking, ads, paywalls, etc. Then
I’d find those same pages I’m searching for.

To give an example: I found
[http://blog.deconinck.info/post/2016/12/19/A-Dirt-Cheap-F-Awesome-Led-Table](http://blog.deconinck.info/post/2016/12/19/A-Dirt-Cheap-F-Awesome-Led-Table)
on page 3 of Google for "raspberry pi led", just below
[https://tech.scargill.net/home-control-2016/](https://tech.scargill.net/home-control-2016/)
(both of which are interesting IMO)

~~~
slobotron
It's a first result if you search for "raspberry pi led table"...

~~~
kuschku
Yes, but I wasn't looking for tables, just general things to do with LED
strips and Raspberry Pis.

Google is great for finding things you already know, but I'm using it for
discovering things I didn't even know I was interested in.

I can't exactly just put the entire dictionary into google to try and find
what I might like. I enter a category I might like, and look through the
results matching that category.

------
skrebbel
A key option is missing from this list: returning everything (with a sanity
LIMIT far beyond the usual result count) and just doing the pagination in
JavaScript.

I'm sure this will get me all kinds of backlash from noscript purists and
optimization fanatics, but it works great, it's easy to implement, and there
are millions of JS table/list libraries that give you the pagination for free
(plus free sorting and filtering).

It is by far the most productive option for the developer, if the data size
allows (I'd wager it often does), and it does not even have the problems this
article describes about OFFSET.

~~~
tragic
From paragraph 2 of TFA:

> Before continuing it makes sense to mention client-side pagination. Some
> applications transfer all (or a large part) of the server information to the
> client and paginate there. For small amounts of data client-side pagination
> can be a better choice, reducing HTTP calls. It gets impractical when
> records begin numbering in the thousands.

~~~
skrebbel
Ah damn, missed that. Thanks.

------
coldcode
I once worked with an insurance company that promised to provide a paginated
search for claims. The end result (1) returned 10 at a time for each call (2)
with no sort order (3) and no search or filter, just everything. The list
averaged 1000 items normally. They also declined to fix this.

~~~
akmittal
I work for a big banking company. Most of the search APIs here return 1000
results: no pagination, no sorting.

------
themonk2
Similar solution for MySQL
[https://www.percona.com/files/presentations/ppc2009/PPC2009_...](https://www.percona.com/files/presentations/ppc2009/PPC2009_mysql_pagination.pdf)

------
avenoir
What about using ROW_NUMBER() OVER()? I'd assume it would have similar
performance to OFFSET.

------
Chiba-City
One of the most useful resource saving articles I've read in years.

