
Efficient pagination of a SQL table with 100M records - kawera
http://allyouneedisbackend.com/blog/2017/09/24/the-sql-i-love-part-1-scanning-large-table/
======
gravypod
In my current job I maintain a database of 2B records that is (unfortunately)
on spinning rust and on a last-gen Intel desktop CPU system. This database is
a read-only data warehousing sort of setup.

If you need to do operations like this on a database with this setup I'd
suggest checking out the following config options:

    
    
        1. max_heap_table_size
        2. read_buffer_size
        3. sort_buffer_size
        4. join_buffer_size
        5. thread_concurrency
        6. tmp_table_size
    

I'd also suggest looking into TEMPORARY TABLEs with ENGINE=MEMORY.

I have not included values in this because the values that make sense for you
are likely not the values that make sense for me. Check out MySQL's
documentation for what these values affect [0]. The defaults in even MySQL's
huge config are very outdated for the horsepower that modern computers bring
to the table. It's funny that MySQL's once massive 4GB of RAM config file is
now the appropriate setup for my laptop.

[0] - [https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html](https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html)
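
To illustrate the TEMPORARY TABLE suggestion, here is a minimal sketch (table, column names and sizes are placeholders, not recommendations; note the MEMORY engine cannot hold TEXT/BLOB columns):

    
      -- In-memory scratch table for intermediate results (placeholder names).
      CREATE TEMPORARY TABLE tmp_scan_batch (
        user_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
        email   VARCHAR(255)    NOT NULL
      ) ENGINE=MEMORY;
    
      INSERT INTO tmp_scan_batch (user_id, email)
      SELECT user_id, email
      FROM users
      WHERE user_id > 10000
      ORDER BY user_id
      LIMIT 10000;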

~~~
wvenable
To set these variables appropriately for your system, I recommend the MySQL
tuner script:

[https://github.com/major/MySQLTuner-perl](https://github.com/major/MySQLTuner-perl)

~~~
a012
Is there a similar tuning script for Postgres?

~~~
danielsamuels
Yes, pg_tune

~~~
tkyjonathan
Is there a similar script for Redshift?

------
galeforcewinds
Two big pitfalls seem to hit almost all of these "low-level optimization of
how the database handles..." type posts: neglecting optimization of the
overall system, and neglecting the use of other technologies.

Write activity impacts database query caches. Many applications do not require
realtime-accurate results from the database and generate a substantial number
of the same paginated queries. For these cases, it is very important to
consider higher-level caches within the system -- CDN, page-level caches, page
block-level caches, etc. as caching your 99% traffic pattern will provide DB
platform headroom to support your 1% traffic pattern.

Where the results you are paginating are based on any sort of matching (SQL
WHERE), most read applications see a sizable benefit in integrating a search
platform. Data selection for display is handled in the search layer, and
underlying data retrieval for display happens either from the search layer or
via the backing database using inexpensive lookups via primary key.

One of the key considerations not covered in the article is the need for
result consistency when paginating, e.g. if the underlying data changes. It is
the need for this consistency, not the desire for performance, that I see as
the primary reason to include primary key identifiers or timestamp values in
your pagination strategy.

~~~
crescentfresh
> include primary key identifiers or timestamp values in your pagination
> strategy

And then we got that _one customer_ that somehow backfilled old timestamped
data and complained our pagination broke :(

~~~
galeforcewinds
Many database platforms support column-level privileges or permissions which
could be used to prevent that sort of backfill, by preventing UPDATE on the
column that holds the creation timestamp.

I've also dealt with databases without column-level privileges. Where not
handled by a column-level privilege system, it may still be possible to block
this sort of UPDATE using a TRIGGER designed to fail.
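
A rough sketch of such a trigger in MySQL syntax (table and column names are made up for illustration):

    
      DELIMITER //
      CREATE TRIGGER protect_created_at
      BEFORE UPDATE ON orders
      FOR EACH ROW
      BEGIN
        -- Refuse any attempt to rewrite the creation timestamp.
        IF NEW.created_at <> OLD.created_at THEN
          SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'created_at is immutable';
        END IF;
      END//
      DELIMITER ;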

Privilege grant (or drop of the trigger) could be used when the system is in
an offline maintenance mode should you require the ability to correct that
protected data, reinstituting the control when the maintenance is complete.

The criticality of the data and the level of automation in use would probably
be the factors I would use to decide whether this overhead was warranted.
Hopefully you had backups available.

~~~
crescentfresh
Well, in this case it was timeseries data, so backfilling was technically
supported (we "upsert", and the timestamp is simply part of the composite key).

It occurred when the clock on the hardware got messed up :(

------
jerf
You know, if I had access to the documentation of a major DB system, I would
put something about this right on the documentation page for LIMIT and OFFSET.
It's just _such_ an attractive nuisance; who can blame a novice developer/DB
user for making the mistake of thinking that LIMIT and OFFSET are a good
solution, really?

~~~
shub
LIMIT is great, and a necessity on MySQL. It's OFFSET that causes problems. If
I were king of databases I'd make OFFSET > 1000 cause an error directing the
user to the doc page explaining why what they're doing is a bad idea and how
to increase the limit anyway.

~~~
barrkel
Offset 1000 won't hurt you if it's a simple indexed scan; MySQL will mess you
over far worse if you use a subquery in your WHERE clause, where you can be
looking at 1M+ rows scanned via the power of nested loops.

I've invested several months of my life optimizing arbitrary sort and filter
for MySQL results over tables varying from 100k rows to 100M (using time
windows to scope things). As long as you can efficiently cut down the
underlying data set into the region of 100k using some kind of window -
usually recency based - and convince MySQL to filter by this before it sorts
or does any other kind of filter - then limit / offset pagination doesn't
hurt.

You can then use higher-level pagination on the recency window if it becomes
necessary.
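
As a rough illustration of the "window first, then paginate" approach (table, columns, and the window size are assumptions on my part):

    
      -- The recency window cuts the candidate set down to ~100k rows first,
      -- so the sort and the limit/offset are cheap within that window.
      SELECT id, account_id, created_at
      FROM events
      WHERE created_at >= NOW() - INTERVAL 7 DAY
        AND account_id = 42
      ORDER BY created_at DESC
      LIMIT 50 OFFSET 200;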

------
developer2
It gets worse when you want to allow sorting the paginated results by
something other than the integer primary key, such as a name. Duplicate names
with different ids mean having to use GROUP BY and the id as a tiebreaker.

I typed out the example queries below from memory, hopefully I haven't made
any logic errors. Note:

a) Requires additional indexes to make it efficient.

b) Accounts for inserted and deleted rows.

c) You cannot use page numbers. You can only do first page, next and previous
pages, and last page.

d) Example is for 20 items per page. The query pulls 21 (20 + 1) items, so the
app can determine whether there is a next or previous page.

e) In a web app or api for example, your parameters get crazy, such as:
/users?page=next&sort=name&sortId=20&sortVal=George

    
    
      -- first page
      SELECT name, id
      FROM users
      GROUP BY name, id
      ORDER BY name, id
      LIMIT 21;
    
      -- next page (ex: last item on current
      -- page has id 20 and name 'George')
      SELECT name, id
      FROM users
      WHERE name > 'George' OR (name = 'George' AND id > 20)
      GROUP BY name, id
      ORDER BY name, id
      LIMIT 21;
    
      -- prev page (ex: first item on current
      -- page has id 21 and name 'Harry')
      SELECT name, id
      FROM users
      WHERE name < 'Harry' OR (name = 'Harry' AND id < 21)
      GROUP BY name, id
      -- need DESC, reverse order in app for display
      ORDER BY name DESC, id DESC
      LIMIT 21;
    
      -- last page
      SELECT name, id
      FROM users
      GROUP BY name, id
      -- need DESC, reverse order in app for display
      ORDER BY name DESC, id DESC
      LIMIT 21;

~~~
thriftwy
How about page 9?

~~~
developer2
As mentioned:

> c) You cannot use page numbers. You can only do first page, next and
> previous pages, and last page.

To get to page 9, you start on the first page, then "next page" 8 times. This
is why you see this kind of UI fairly often: first (<<), previous (<), next
(>), last (>>). No numbers.

Technically you can do _relative_ page numbers (i.e. current page + 8) by
increasing the LIMIT of the query and skipping over results. It's really messy
to handle, and you can never use "real", absolute page numbers, because
inserted and deleted records can change how many next or previous pages there
will be.

~~~
thriftwy
> To get to page 9, you start on first page. Then "next page" 8 times.

Are you sure it will be faster than just using offset/limit on the server
side?

------
wenc
This page provides a more complete overview of different pagination methods.

"Five ways to paginate in Postgres, from the basic to the exotic"

[https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/](https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/)

~~~
j_s
And the associated recent discussion (also linked elsewhere here):

[https://news.ycombinator.com/item?id=15446855](https://news.ycombinator.com/item?id=15446855)
(Oct 2017, 42 comments)

------
thrownaway954
It's kind of crazy that here in 2017 there are still discussions and debates
on what exactly is the best way to paginate a large table in a database.

I'm leaning with @jerf that the LIMIT/OFFSET approach can trap you at the 100M
record scale; however, in all honesty, most of us don't deal with databases of
this size. As @alvil stated, his approach works fine for 1M records, which is
way more than most personal and application databases will contain.

Maybe the solution is for database vendors to actually implement and document
a definitive solution for this, perhaps by re-engineering the LIMIT/OFFSET
approach under the hood to take advantage of this technique.

As for the article... I really have a personal problem when authors get lazy
and take the easy way out by assuming everyone knows what they are talking
about. Take this portion of the article:

========================

Simplified algorithm:

We get PAGE_SIZE number of records from the table. Starting offset value is 0.

Use the max returned value for user_id in the batch as the offset for the next
page.

Get the next batch from the records which have user_id value higher than
current offset.

========================

The author took the time to write an entire article debating and demonstrating
their solution to the whole pagination problem and they couldn't take 5
minutes to show the code behind these steps in their solution?

As I stated, it's just a personal thing.
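
For what it's worth, those steps boil down to something like the following keyset queries (a rough sketch; the page size, table and column names are my assumptions, not the author's code):

    
      -- first batch, starting offset 0
      SELECT user_id, email
      FROM users
      ORDER BY user_id ASC
      LIMIT 10000;
    
      -- next batch: the max user_id from the previous batch becomes the new offset
      SELECT user_id, email
      FROM users
      WHERE user_id > 10000
      ORDER BY user_id ASC
      LIMIT 10000;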

UPDATE: Down the internet rabbit hole I go. Here is a very nice article
similarly demonstrating and debating the LIMIT/OFFSET approach in MSSQL,
comparing OFFSET/FETCH with a CTE-based approach.

[https://sqlperformance.com/2015/01/t-sql-queries/pagination-with-offset-fetch](https://sqlperformance.com/2015/01/t-sql-queries/pagination-with-offset-fetch)

~~~
privateprofile
Exactly. The "correct solution" would have been my first attempt, as it uses
the (clustered) index scan to count pages/records; it's likely to be the
fastest option, whereas the first method should not even be an option. This
should be immediately obvious to anyone who understands how to develop against
virtually any RDBMS.

This is also, more broadly, something I've been thinking about recently. It
feels (to me) like industry knowledge is getting lost somewhere, because you
see people trying to reinvent the wheel in a lot of technical areas. One of
them is data: there are proven approaches to most problems faced by the
average organization, especially when it comes to designing and managing a
data platform. Yet, you see people rolling their own
approaches/designs/methodologies to basic things like ingestion, ETL and data
modelling when the traditional approaches would suffice and would take half
the time to implement and one third of the effort to maintain. It's great that
people try to innovate, but what I see regularly is more like people trying to
solve a solved problem without bothering to educate themselves on how it was
solved so far.

------
manigandham
Citus also has a good article on different paging methods:
[https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/](https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/)

------
barsonme
See also: [http://use-the-index-luke.com/no-offset](http://use-the-index-luke.com/no-offset)

~~~
thriftwy
But how would you build a paginator?

How would you give a link to page 9 when you're on page 1?

~~~
icebraining
You can't: _On slide 43 you can also see that keyset pagination has some
limitations: most notably that you cannot directly navigate to arbitrary
pages. However, this is not a problem when using infinite scrolling. Showing
page number to click on is a poor navigation interface anyway—IMHO._

As someone who often skips a few pages when navigating such interfaces, I say
"thanks, but no thanks".

------
tbrock
> In my case with 100 000 000 records, the query is never finished. The DBMS
> just kills it. Why? Probably, because it led to the attempt to load the
> whole table into RAM.

What? Why? It should just instantiate a cursor and return the first batch size
worth of records.

I'm not sure why the DB would load the entire result set into RAM to return
results unless it requires a sort on a non-indexed field.

~~~
nieksand
MySQL would fall back to a disk-based sort once the data set is too large.

Perhaps it's his client that is OOM-ing? Back in the day, I remember the PHP
mysql client loading the entire result set into a local buffer unless you
explicitly asked for the output to be streamed.

Edit: Found reference to PHP behavior I mentioned:
[http://php.net/manual/en/mysqlinfo.concepts.buffering.php](http://php.net/manual/en/mysqlinfo.concepts.buffering.php)

------
olalonde
Similar article from 5 days ago:
[https://news.ycombinator.com/item?id=15446855](https://news.ycombinator.com/item?id=15446855)

~~~
torna
Thanks, I missed this!

------
combatentropy

      > You need to walk thru the table,
      > extract each record,
      > transform it inside your application’s code
      > and insert to another place.
    

The writer said his database died in the middle of the query when he selected
the 100,000,000 rows without first breaking them into pages.

It looks like he's using MySQL. I've used PostgreSQL for over a decade, and I
just don't see this happening, though I've tested only on a table with a few
million rows. But anyway I doubt PostgreSQL loads the whole table into memory
when you select it. It seems to load just parts at a time, as needed. And for
what it's worth, an offset of a few million rows still took just a few
seconds.

Actually, what I would try first is to dump the table to a file, transform it
if possible with Linux command-line tools like sed and awk, and then load the
file into the new table.

Can anyone confirm if Postgres would die like MySQL did on too big a dataset
--- selecting everything without paginating but fetching just one row at a
time in the application?

~~~
no1youknowz
> I've tested only on a table with a few million rows

I've tested postgres with > 250m rows. It has problems. I've also tested to 1b
rows, same issue.

Now throw in some sum, count and group by functions, more issues.

Add a join? Now you've major problems. Add several joins? Woah there!

I've been able to bring a 5-node CitusDB cluster to its knees. The only
solution was to scale out massively to a double-digit number of servers, but
you can't do that on a limited budget.

The caveat here, this was about 2 years ago now. I don't know if there have
been any improvements since that time.

I don't have a postgres database now, I've since migrated over to memsql as
the majority of my work is OLAP.

~~~
kuschku
I run a 270m-row database for my IRC bouncer, and run fulltext searches over
it all the time, with many joins, expensive ranking, etc. On a single machine.

I’ve never been able to bring PGSQL to its knees.

------
themonk2
Related (2009): [https://www.slideshare.net/suratbhati/efficient-pagination-using-mysql-6187107](https://www.slideshare.net/suratbhati/efficient-pagination-using-mysql-6187107)

------
awj
That's great if you need to do some kind of crawl through the DB for
background work, but fails to support key elements (filtering, non-primary-key
sorting) that are probably hard requirements at the API level.

~~~
i_s
You can still support arbitrary filters and ordering, as long as you always
sort by the ID last. The nice thing is the queries actually get more and more
efficient as you approach the last page. (The normal approaches have the same
cost across all pages, or even get more expensive toward the latter pages.)

The only thing this approach does not support is jumping to an arbitrary page
number.
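
A rough sketch of that pattern, with a non-unique sort column and the id as the final tiebreaker (names and values are made up; the row-value comparison can be rewritten with OR/AND on databases that don't optimize it):

    
      -- The "next page" cursor is the (created_at, id) pair of the last row shown.
      SELECT id, name, created_at
      FROM users
      WHERE status = 'active'
        AND (created_at, id) < ('2017-01-01 00:00:00', 5000)
      ORDER BY created_at DESC, id DESC
      LIMIT 20;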

------
phaedrus
I wonder if an indexed "PageNumber" column could work? Obviously wouldn't work
for arbitrary queries, but if you had a single, commonly run query it could be
used to speed up that one...

------
pleasecalllater
I'd really go with the first solution. However I'm usually using Postgres. And
of course it would need to be done in batches at the application side.

------
elango
Can you share a real-life use case for this problem? Even in Google, folks
don't get past the 5th page. The last five pages can be accessed by reverse
sorting.

------
rburhum
<sarcasm>And here I thought a clustered index and an OFFSET were the
solution</sarcasm>

------
wolco
On the frontend, would one store page=1, page=2 or
page=user_id(10000th), page=user_id(20000th)?

~~~
alvil
With this approach you can't have page numbers on the front-end, just first,
prev, next and maybe last. You can't jump to a particular page. It is an old
technique and I like it, but if it were truly usable it would be used by
everyone, and it is not.

~~~
ngrilly
What is the purpose of having page numbers and jumping to a particular page?

~~~
thriftwy
For example, you have a forum thread with 1000 messages starting in 2007, and
you are dying to know what happened in the discourse in 2011. So basically you
need to jump into the middle by bisection.

1000 is tiny by DB standards, but solutions like this one are often bolted on
without such considerations.

~~~
ngrilly
That's a great example. I understand the use case now.

I note this use case can be solved using keyset pagination instead of
offset/limit, because the ordering of messages is stable: messages are never
inserted in the middle of the list; they are appended at the end. We can
assign a number to each message, and use this number to filter and sort.
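
A rough sketch of that idea, using a per-topic sequence number (table, column names and page size are assumptions):

    
      -- "Page 9" with 25 messages per page is just a range on the sequence number,
      -- which stays stable because deleted messages are kept and only marked deleted.
      SELECT seq, author, body, deleted
      FROM messages
      WHERE topic_id = 42
        AND seq > (9 - 1) * 25
      ORDER BY seq ASC
      LIMIT 25;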

~~~
thriftwy
But what if a message gets deleted from middle of the list? This happens with
forum messages.

~~~
ngrilly
You keep it and mark it "deleted", and you show "deleted" in place of the
message. This is what most forums do already, independently of the underlying
pagination mechanism.

~~~
thriftwy
So basically I'm pretty sure now that we need a new kind of SQL INDEX, as in,
a pagination index, to do the accounting described above for us, and more.

~~~
ngrilly
It wouldn't be a new kind of index. It would be a new interpretation of OFFSET
and LIMIT, which would ignore deleted rows.

~~~
thriftwy
Why would it? It will be rebuilt when a row is deleted or even marked deleted.

If we have an pagination index over (topicId, deleted) on postTime asc.

------
indentit
Storing the max id returned from the batch for use as the offset for the next
batch is something I've done ever since I started programming - is it supposed
to be revolutionary advice?

------
alvil
I usually use a query similar to this one (in PHP/MySQL); it's much faster than
the traditional offset query. You can compare it for yourself using EXPLAIN.
Sure, not for 100M records, but for 1M or so :)

    
    
      $items = db("SELECT a,b,c FROM item
               JOIN (SELECT id
                  FROM item
                  WHERE
                  <your-where-conditions>
                  ORDER BY id DESC
                  LIMIT <offset>, <limit>)
                  AS x ON
                  x.id = item.id
               ORDER BY item.id DESC");

