Hacker News

You picked an ugly Mongo query, and there are many. You compared it to a concise SQL query, and there are many that are not.

MongoDB's limit(x) and skip(y) are a shitload nicer than most of Microsoft's ideas about pagination. It was only in SQL Server 2012 that they came up with "OFFSET" instead of "google it".... http://stackoverflow.com/questions/2244322/how-to-do-paginat...
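For a side-by-side feel of the two styles, here's a minimal sketch using Python's stdlib sqlite3 as a stand-in (SQLite has had LIMIT/OFFSET forever; SQL Server only got the equivalent OFFSET ... FETCH syntax in 2012). The table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)",
                 [(f"post {i}",) for i in range(1, 101)])

# MongoDB style (for comparison): db.posts.find().sort({id: 1}).skip(20).limit(10)
# SQL Server 2012+:  ... ORDER BY id OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY
page = conn.execute(
    "SELECT id, title FROM posts ORDER BY id LIMIT 10 OFFSET 20"
).fetchall()
print([row[0] for row in page])  # ids 21..30
```

Both forms express the same "third page of ten" idea; the complaint upthread is that before 2012 there was no such one-liner in T-SQL at all.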

I literally just ran into the same issue with SQL Server. Since we're running an older version, I was very confused when I found the alternative to MySQL's "LIMIT"-- it wasn't pretty.

I like how they timed something everyone has wanted for like a million years with the switch to per-core pricing lol.

What was wrong with ROW_NUMBER()? It worked just fine for me, and partitioning was frequently useful.
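The ROW_NUMBER() idiom being referred to looks roughly like this — a hedged sketch, again run through sqlite3 (needs SQLite 3.25+ for window functions; table names are hypothetical). You number the sorted rows in a subquery, then slice by row number in the outer query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany("INSERT INTO orders (customer) VALUES (?)",
                 [(f"cust {i}",) for i in range(1, 51)])

# Pre-2012 SQL Server idiom: number the sorted rows, then slice by row number.
rows = conn.execute("""
    SELECT id, customer FROM (
        SELECT id, customer,
               ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM orders
    ) AS t
    WHERE rn BETWEEN 11 AND 20
""").fetchall()
print([r[0] for r in rows])  # ids 11..20
```

It works, and the PARTITION BY variant is genuinely useful for other things, but as pagination boilerplate it's noticeably clunkier than a plain OFFSET clause.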

There are extremely few modern cases where pagination at the database layer is a good approach.

How about any cases where you don't want to transfer all the rows you're skipping over the wire? As in, nearly all cases?

You wouldn't be "skipping" those rows, because they would be transferred once. The database is almost always the most constrained part of a solution, and the various hack pagination techniques are almost always (not 100%, more like 98%) a naive mistake.

I am really interested in this argument, because I am currently converting a MySQL-based web search to use Apache Solr, and I have been thinking about the arguments around pagination a lot.

The thing I don't know about, though, is the claim that the "database is almost always the most constrained part of a solution". On this simple site search, we tend to notice the Apache servers taking a lot more CPU than the database calls. I guess when our site gets bigger, you'd imagine it is easier to scale out the Apache layer than the database (with sharding), but we would still have a lot of room to improve the MySQL layer anyway (memcached, for example).

This is an interesting statement to me. My goal in retrieving data is normally to reduce the result set as fast as possible. If I only need a subset of the data, I would do the bare amount of joining, aggregation, etc. before paginating.

I assume the alternative is paginating server-side, which wastes some network bandwidth and processing time on the server.

If I only need a subset of the data, I would do the bare amount of joining, aggregation, etc. before paginating.

The vast majority of pagination procedures occur on the final rendered set. Meaning you've done 100% of the work on the database server to retrieve n% of the product, and when you come back for page 2, you're again doing 100% of the work for n% of the product. Pagination is not a SQL shortcut; the sole "savings" it provides is network bandwidth between the app server and the database, which is seldom a limitation.

I think your whole argument is based on the false assumption that you have to paginate after you've done everything else. If you have a well-normalized database, any decently complex query will involve a bunch of joins. If your sort is keyed on one or two fields in one table, why not sort and filter that table first and join the result to the others? Admittedly I'm a bit of a DB novice, and perhaps you can speed the join up with indexing, but to get 50 rows from two 1000-row tables, it seems to make more sense to filter down to 50 rows and then join, rather than the other way around.
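The "slice first, then join" idea above can be sketched like this — a toy example with sqlite3 and invented table names, not a claim about any particular schema. The subquery limits the table that carries the sort key to one page before the join ever happens:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER,
                          title TEXT);
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(i, f"author {i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO books (id, author_id, title) VALUES (?, ?, ?)",
                 [(i, i, f"book {i}") for i in range(1, 1001)])

# Slice the table that carries the sort key first, then join only those
# 50 rows, instead of joining two 1000-row tables and slicing afterwards.
rows = conn.execute("""
    SELECT a.name, b.title
    FROM (SELECT id, name FROM authors ORDER BY id LIMIT 50) AS a
    JOIN books AS b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()
print(len(rows))  # 50
```

In practice a good query planner with the right indexes may do this rewrite for you, but writing it explicitly makes the intent (and the cost) obvious.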

If you're keeping the result server-side, how are you storing it between requests in a stateless context like a web app? If I ask for page 1, and you get all the pages, do you cache them server-side? That would add a ton of complexity. I suspect your app is sending the first n rows to the client and throwing the rest away.

How else do you do it? Select the entire table and filter it at the application layer?

That depends upon the definition of "entire table". Most situations have already pre-filtered data to some reasonably small quantities -- e.g. category, genre, exchange date, etc. Yes, caching and working on that set is almost always the superior option.

I think this is where we're branching apart - I see pagination as part of that pre-filtering against a larger dataset.

Having a dataset your web server can crunch and cache is just one use case.
