However, I see the same tendency at work in the demonization of poor old LIMIT and OFFSET. Let me talk about OFFSET first. This article talks about its inefficiency. Well, it may be inefficient, but every page I've generated with OFFSET still comes back in a split second. So coding around it seems to me like a premature optimization. Now let's talk about LIMIT. It's limited in reliability, because what if someone inserts while you're paging along? The item at the end of page 3 is now at the top of page 4! More insidiously, what if someone deletes? Then the top result on page 4 becomes the last result on page 3, and you never see it.
That scenario troubles my perfectionistic mindset, but I have to say that in all my apps it is not the end of the world. I run into this all the time anyway on other people's websites, major and minor: whether I'm paging through search results on Google and Amazon, or the latest posts on some web forum, or even Hacker News. It's no big deal. At least, for my purposes it doesn't seem worth doing one of the other methods I read about here.
Now if you are programming a self-driving car or automating the mixture of medicines, maybe be careful where you use LIMIT and OFFSET.
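For what it's worth, the drift described above is easy to reproduce. A minimal sketch using Python's sqlite3 (standing in for a real database; the table and row names are made up):

```python
import sqlite3

# Toy reproduction of the page-drift problem: OFFSET counts rows at query
# time, so a concurrent insert shifts every page boundary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [(f"post {n}",) for n in range(1, 7)])

PAGE_SIZE = 2
# Page 1: newest first.
page1 = conn.execute(
    "SELECT title FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
    (PAGE_SIZE, 0)).fetchall()

# Someone inserts while we're paging: every existing row shifts down one slot.
conn.execute("INSERT INTO posts (title) VALUES ('post 7')")

page2 = conn.execute(
    "SELECT title FROM posts ORDER BY id DESC LIMIT ? OFFSET ?",
    (PAGE_SIZE, PAGE_SIZE)).fetchall()

print(page1)  # [('post 6',), ('post 5',)]
print(page2)  # [('post 5',), ('post 4',)] -- 'post 5' shows up twice
```

Swap the insert for a delete and you get the opposite failure: a row silently slips backward across the page boundary and is never shown.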
> When to Use: Limit-offset
> Applications with restricted pagination depth and tolerant of result inconsistencies.
I don't agree with your conclusion that result inconsistencies are "not a big deal" for most cases. It's a deliberate trade-off. For example, I think it is annoying on Hacker News, but I understand why they chose to show inconsistent results. (If I recall correctly, HN used to have consistent pagination using something similar to cursors, but they threw it out because it caused too much server load and it was annoying when sessions expired.)
On the other hand, when my accountant goes through my expenses one by one to check if they have been booked correctly, I don't want him to miss lines due to inconsistent pagination.
Just because a technique is "time-honored", it doesn't mean it's "perfectly good" in every situation, or even in most situations. You always need to evaluate your techniques, no matter how common they are, to see if they work for your particular use case.
This is the syndrome I was getting at. An article attacks a shortcoming of an established way of doing things and then glosses over the deficiencies of its own alternative. The deficiencies of the original solution aren't a problem for most people, but the deficiencies of the new one are.
While I agree with nearly 100% of what you're saying, I have run into situations where an application requires an infinite-scroll style of pagination, and there the typical LIMIT/OFFSET approach can be problematic: users end up seeing duplicates while scrolling due to the ever-shifting page boundaries, which seems to bother them more than if it were a completely separate page that they reloaded and saw the same duplicates on.
As such, in any situation where we encounter an infinite-scroll pagination setup (which is quite common on mobile applications) we've implemented keyset-based pagination. It requires a modicum of additional thought to ensure it is correct, especially if you have very… interesting sort conditions, but ends up being quite bulletproof once implemented.
Sure, applying keyset pagination prematurely can be a form of cargo cult programming, but OTOH I'd rather work with somebody who is aware of the different pagination options available. Hopefully that same person can also choose the right implementation for the right problem, but that's sometimes easier said than done.
There are plenty of cases where the LIMIT/OFFSET behavior is not great. A more boring example: you’re displaying a paginated list of financial transactions and you don’t want duplicates to appear if the list gets updated. Or any kind of audit log. Or an event feed, or infinite scroll.
At first the term "over-engineering" came to my mind, but that term doesn't quite catch it.
Possibly related discussion (about applying blockchains where it makes no sense): https://news.ycombinator.com/item?id=15401447
The tendency I was thinking of isn't chiefly about chasing the new shiny thing (Magpie programming) or mindlessly including needless libraries (Cargo Cult programming).
The thing I'm thinking of is when someone points out a shortcoming of a particular way of doing things. Like, "If you use a relational database, it might go down if you get 50 million writes at once."
All techniques have trade-offs. So it's no surprise that some established way of doing things has one. The criticism is valid. The thing will fail in that situation. The problem is that the alternative presented by the blogger has more problems than the first. It would be useful in a rare kind of job. But that's not made clear, or the reader can't see past the stain that was shown on the old way of doing things, or the reader can't get past how cool it is that there's a database that can handle 50 million simultaneous writes, or the mere novelty is intoxicating (so there is some overlap with Magpie programming).
This new way doesn't have the first one's shortcoming, but it is literally 10 or 100 times as much work to set up, is missing certain important features that the original solution had, and solves a problem that the reader will never have.
First, LIMIT/OFFSET will not give you a transactionally consistent view of the data. If anyone inserts or deletes rows while you're paginating, you'll get dupes or holes. So already you're on thin ice. Good, as the article points out, for stateless pagination in Web 2.0-style web views, but problematic for any application that needs a consistent view (e.g. infinite scroll, or if you're, say, indexing everything into a search engine). Developers might easily miss this.
More importantly, LIMIT/OFFSET doesn't scale to particularly large datasets, and it's one of those things that will bite you at the worst possible time: when your data grows past a certain size, performance suddenly plummets and queries pile up because they're all sitting at OFFSET 10000000. (Watch out for Googlebot paginating everything to infinity!) Since LIMIT/OFFSET requires the result set to be sorted on every query, this paves the way for some truly terrible query plans. If you're lucky, you'll get a fairly efficient index scan, but if you have joins and subqueries, things can get impossibly slow.
I'm currently working on an application where even the "SELECT ... WHERE id > :last_id ORDER BY id LIMIT :page_size" trick is failing me, with queries taking 30-60 seconds because there are many going on concurrently. The entire table is maybe 10 million records, and the WHERE is very selective (i.e. it's looking at a fairly small portion of the full table), but it's still a problem. LIMIT/OFFSET worked back when we had just a few hundred thousand rows in that table.
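For readers following along, the `id > :last_id` trick mentioned above looks roughly like this in miniature; sqlite3 stands in for a real database here, and the table name is invented:

```python
import sqlite3

# Minimal keyset-pagination sketch: filter past the last-seen key instead of
# skipping rows with OFFSET. The primary-key index turns this into a range
# scan, so page 1000 costs about the same as page 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO records (payload) VALUES (?)",
                 [(f"row {n}",) for n in range(1, 11)])

def fetch_page(last_id, page_size):
    # No rows are read and discarded; the scan starts right after last_id.
    return conn.execute(
        "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size)).fetchall()

page = fetch_page(last_id=0, page_size=3)
while page:
    print([row[0] for row in page])  # ids: [1, 2, 3], [4, 5, 6], [7, 8, 9], [10]
    page = fetch_page(last_id=page[-1][0], page_size=3)
```

As the parent comment notes, even this can struggle under heavy concurrency; it removes the OFFSET scan cost but not every bottleneck.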
If you're expecting 100 results back from a few million records, just being aware that, unless you trim the result set in your WHERE clause, the ORDER BY will force processing on a lot of records you're never going to see is a big step.
Totally agree in regards to premature optimization that LIMIT/OFFSET is tremendously more convenient. When it becomes a problem, it's good to know where to look.
Skill set in the team, supportability, experience etc are all just as important as which technology to choose.
And so if someone knows React better and is able to deliver business value quicker, then how is that not a good thing? These days it's far easier to find people with React experience than jQuery experience. Likewise for NoSQL or whatever other technology.
It's pretty insulting to assume that everyone who uses React, Redis, Cassandra, etc. is driven only by blogs and not by any rational thought.
In the article's defense, it's pretty clear and reasonable about when to use what.
The hall of fame of frameworks that do support keyset pagination is rather short:
- jOOQ (Java Object Oriented Querying)
- chunkator (Django/Python)
- Django Infinite Scroll Pagination
- sqlakeyset (SQLAlchemy)
- blaze-persistence (a rich Criteria API for JPA providers)
How would you implement a deep pagination like "go to page 107" unless you can derive page 107 from your order criteria?
Imagine a large forum topic. The user is just seeking through time. A deep pagination here is when the user knows what they're searching for is more than 1 page away.
The user would have to know a snippet of text or an explicit date range for your search box to help, so what about all the times they don't or if their filters are insufficient? This is why forums have both deep pagination and a search box.
It would be cruel to make the user paginate one page at a time just like it'd be unbearable if Youtube didn't let you jump 10 minutes at a time.
Obviously not every pagination needs deep pagination, but that's between you and your users, and it's what came to mind when I read that article.
What's the use case for that?
I’ve been thinking about it, and ideally one would make a search engine that only indexes pages that have no analytics, tracking, ads, paywalls, etc. Then I’d find those same pages I’m searching for.
To give an example: I found http://blog.deconinck.info/post/2016/12/19/A-Dirt-Cheap-F-Aw... on page 3 of Google for "raspberry pi led", just below https://tech.scargill.net/home-control-2016/ (both of which are interesting IMO)
Google is great for finding things you already know, but I'm using it for discovering things I didn't even know I was interested in.
I can't exactly just put the entire dictionary into google to try and find what I might like. I enter a category I might like, and look through the results matching that category.
You wouldn't. Keyset pagination forces the user to do the exact same thing people complain about you doing to the database. Want page 10 of results? Too bad, enjoy running through pages 1-9 and throwing them away. Except now everything is even slower because it's users talking to the service, not the service talking to the db.
There are plenty of use cases where that's a perfectly acceptable trade-off, or an annoying-but-necessary trade-off, but the people all up in a froth over the inefficiency of limit/offset seem to regularly fail to realize that they're advocating for a database performance solution with implications that reach all the way out to the UI/UX level.
At a minimum, I see very little in the way of acknowledgement of this factor.
For example, you could have a background process that slowly paginates through the table and saves off the page boundaries, which you can then use for jumping in. They will always be a bit off, especially near the middle of the table, but that's probably okay.
There are lots of ways to do this fuzzy jumping that don't involve scanning the full table synchronously.
Second, you can just perform an indirection through OFFSET (with LIMIT 1): use a cheap OFFSET query to fetch only the boundary key of the target page, then run the actual page query as a keyset fetch from that key.
That said, if users were doing this in my software I'd definitely be looking at how I can improve their workflow.
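The OFFSET-indirection idea above might be sketched like this (sqlite3 standing in for a real database, hypothetical table names): the boundary lookup pays the OFFSET cost only on the narrow primary-key index, and the page itself is fetched with a keyset query.

```python
import sqlite3

# "Go to page N" via a two-step query: cheap boundary lookup, then keyset fetch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO items (body) VALUES (?)",
                 [(f"item {n}",) for n in range(1, 101)])

def jump_to_page(page_no, page_size):
    # Step 1: OFFSET with LIMIT 1, touching only the id index, to find the
    # first key of the requested (zero-based) page.
    row = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT 1 OFFSET ?",
        (page_no * page_size,)).fetchone()
    if row is None:
        return []
    # Step 2: keyset fetch from that boundary key (inclusive).
    return conn.execute(
        "SELECT id, body FROM items WHERE id >= ? ORDER BY id LIMIT ?",
        (row[0], page_size)).fetchall()

print([r[0] for r in jump_to_page(page_no=3, page_size=10)])  # ids 31..40
```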
I used it just this week to write a Relay Connection paginator for graphql-ruby and Sequel/Postgres. It uses keyset pagination to reliably avoid skipping items and maintain consistent query times.
Backwards pagination was a bit tricky since that’s not natively supported by sequel-seek-pagination. Basically I had to reverse the list order and then filter with the cursor values to get the last N items in a subquery, then reverse that list again to get back the desired sort order.
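The reverse-then-reverse-again subquery described above might look something like this in miniature (sqlite3 and an invented table; sequel-seek-pagination itself is not shown):

```python
import sqlite3

# Backwards keyset pagination: "the last N rows before this cursor".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events (name) VALUES (?)",
                 [(f"event {n}",) for n in range(1, 11)])

def last_n_before(cursor_id, n):
    # Inner query: rows before the cursor, in reversed order, limited to n.
    # Outer query: flip them back into ascending (display) order.
    return conn.execute("""
        SELECT id, name FROM (
            SELECT id, name FROM events
            WHERE id < ? ORDER BY id DESC LIMIT ?
        ) ORDER BY id ASC
    """, (cursor_id, n)).fetchall()

print([r[0] for r in last_n_before(cursor_id=8, n=3)])  # [5, 6, 7]
```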
When I notice, I add/update this list.
I'll check those mentioned in the replies and add them if they seem to be right.
I would recommend adding some sort of "last updated" marker at the bottom of the article, along with a note near that list saying that you update it.
As for the update marker, I've just added an "updated" date next to the publishing date in the breadcrumb.
I'm sure this will get me all kinds of backlash from noscript purists and optimization fanatics, but it works great, is easy to implement, and there are millions of JS table/list libraries that give you pagination for free (plus free sorting and filtering).
It is by far the most productive option for the developer, if the data size allows (I'd wager it often does), and it does not even have the problems this article describes about OFFSET.
> Before continuing it makes sense to mention client-side pagination. Some applications transfer all (or a large part) of the server information to the client and paginate there. For small amounts of data client-side pagination can be a better choice, reducing HTTP calls. It gets impractical when records begin numbering in the thousands.
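As a minimal illustration of the idea (plain Python lists standing in for whatever a JS table library does client-side; the data is made up):

```python
# Client-side pagination: ship the whole (small) result set once, then page,
# sort, and filter purely in memory with no further round trips.
rows = [{"id": n, "name": f"user {n}"} for n in range(1, 26)]

def page(items, page_no, page_size):
    # 1-based page number; out-of-range pages come back empty.
    start = (page_no - 1) * page_size
    return items[start:start + page_size]

# Sorting and filtering come for free once the data is on the client.
filtered = [r for r in rows if r["id"] % 2 == 0]
print([r["id"] for r in page(filtered, page_no=2, page_size=5)])
# [12, 14, 16, 18, 20]
```

The quoted caveat still applies: this only stays pleasant while the full result set is small enough to transfer and hold in memory.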