You wouldn't be "skipping" those rows, because they would only be transferred once. The database is almost always the most constrained part of a solution, and the various hack pagination techniques are almost always (not 100% of the time, more like 98%) a naive mistake.
I am really interested in this argument, because I am currently converting a MySQL-based web search to use Apache Solr, and I have been thinking a lot about the arguments around pagination.
The thing I don't know about, though, is the claim that the "database is almost always the most constrained part of a solution". On this simple site search, we tend to notice the Apache servers taking a lot more CPU than the database calls. I guess as our site gets bigger, you'd imagine it is easier to scale out the Apache layer than the database (which would need sharding), but we would still have a lot of room to improve the MySQL layer anyway (memcached, for example).
This is an interesting statement to me. My goal in retrieving data is normally to reduce the result set as fast as possible. If I only need a subset of the data, I would do the bare amount of joining, aggregation, etc. before paginating.
I assume the alternative is paginating server-side, which wastes some network bandwidth and processing time on the server.
If I only need a subset of the data, I would do the bare amount of joining, aggregation, etc. before paginating.
The vast majority of pagination procedures occur on the final rendered set. That means you've done 100% of the work on the database server to retrieve n% of the product, and when you come back for page 2, you're again doing 100% of the work for n% of the product. Pagination is not a SQL shortcut, and the sole "savings" it provides is network bandwidth between the app server and the database, which is seldom the limitation.
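A minimal sketch of that point, using an in-memory SQLite table with a made-up `items` schema: with LIMIT/OFFSET pagination, the engine still sorts (or walks) every row preceding the requested page and then throws those rows away, so each page request repeats essentially the full query cost.

```python
import sqlite3

# Hypothetical schema purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO items (score) VALUES (?)",
                 [(i * 0.7 % 100,) for i in range(1000)])

PAGE_SIZE = 50

def page(n):
    # OFFSET pagination: the engine orders the whole result, skips
    # n * PAGE_SIZE rows, and keeps one page -- 100% of the work for
    # n% of the product, repeated on every request.
    return conn.execute(
        "SELECT id, score FROM items ORDER BY score LIMIT ? OFFSET ?",
        (PAGE_SIZE, n * PAGE_SIZE)).fetchall()

page_one = page(0)   # full sort, keep rows 1-50
page_two = page(1)   # full sort again, discard 50, keep rows 51-100
```

The only thing page 2 saves over "send everything once" is the bandwidth for the rows it skipped, which is the point the comment above is making.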
I think your whole argument rests on the false assumption that you have to paginate after you've done everything else. If you have a well-normalized database, any decently complex query will involve a bunch of joins. If your sort is keyed on one or two fields in one table, why not sort and filter that table first and join the result to the others? Admittedly I'm a bit of a DB novice, and perhaps indexing can speed up the join, but to get 50 rows from two 1000-row tables, it seems to make more sense to filter down to 50 rows and then join, rather than the other way around.
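The filter-then-join idea can be sketched like this, again with a hypothetical two-table schema (`posts` and `authors`) in SQLite: the driving table is sorted and limited to one page inside a subquery, so the join only ever touches the 50 surviving rows rather than the full joined product.

```python
import sqlite3

# Hypothetical schema: posts keyed by a sortable "created" column,
# authors joined in afterwards.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, created INTEGER);
    CREATE TABLE authors (post_id INTEGER, name TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, 1000 - i) for i in range(1000)])
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1000)])

# Sort and page the keyed table alone, then join only the 50 survivors,
# instead of joining everything first and paginating the rendered set.
rows = conn.execute("""
    SELECT p.id, p.created, a.name
    FROM (SELECT id, created FROM posts
          ORDER BY created LIMIT 50) AS p
    JOIN authors AS a ON a.post_id = p.id
""").fetchall()
```

Whether the optimizer already does this pushdown for you depends on the engine and the query; writing the subquery explicitly just makes the intent unmistakable.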
If you're keeping the result server-side, how are you storing it between requests in a stateless context like a web app? If I ask for page 1, and you get all the pages, do you cache them server-side? That would add a ton of complexity. I suspect your app is sending the first n rows to the client and throwing the rest away.
That depends upon the definition of "entire table". In most situations the data has already been pre-filtered down to some reasonably small quantity -- e.g. by category, genre, exchange date, etc. Yes, caching and working on that set is almost always the superior option.
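A rough sketch of the cache-the-filtered-set approach, with an invented `products` table and a plain in-process `lru_cache` standing in for whatever a real web app would use (memcached, Redis, etc.): the pre-filtered set is fetched once, and later "pages" slice the cached result instead of re-querying the database.

```python
import sqlite3
from functools import lru_cache

# Hypothetical schema: rows pre-filtered by category.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO products (category) VALUES (?)",
                 [("books" if i % 2 else "music",) for i in range(200)])

@lru_cache(maxsize=128)
def category_rows(category):
    # One database round trip per category; the tuple is cached so
    # subsequent pages never touch the database at all.
    return tuple(conn.execute(
        "SELECT id FROM products WHERE category = ? ORDER BY id",
        (category,)))

def page(category, n, size=25):
    # Paging becomes a cheap slice of the cached, pre-filtered set.
    return category_rows(category)[n * size:(n + 1) * size]
```

An in-process cache like this only works for a single server and ignores invalidation; the earlier comment's memcached suggestion is the multi-server version of the same idea.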