

Nearly all web APIs get paging wrong - apievangelist
http://vermorel.com/journal/2015/5/8/nearly-all-web-apis-get-paging-wrong.html

======
junto
I'd like to see this continuation pattern described in a bit more detail. How
does the client define the sort parameters or preferred data limits per
continuation? Have I missed the point?

~~~
quesera
Continuations are "easy" (if you have decent language support or can glom it
on) but that's keeping state on the server side. The author describes
continuation tokens that "never expire", which is incompatible with server-
side state.

Without state, a token can be a bookmark into a predefined ordered dataset.
That's more reliable than an offset, but just as inflexible, and much more
expensive on the server side.

Or, I'm missing something too.

~~~
reipahb
One thing to note is that while the continuation token is just a blob of data
from the client's perspective, the server can actually use it to store the
required state information.

A simple method would be to take the state information the server needs in
order to continue the enumeration (e.g. sorting order, how far along it was in
the enumeration, etc.), JSON-encode it, encrypt and sign it, and then
base64-encode it.

Return that token to the client, and if the client wants more data it can pass
that token back to the server, which can decode it into all the information it
needs to resume the enumeration.
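
A minimal sketch of that round trip, using only the Python standard library. It signs the token with HMAC-SHA256 but leaves out the encryption step, so the client could still read the state even though it cannot change it. The names `SECRET_KEY`, `encode_token` and `decode_token` are hypothetical, chosen for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def encode_token(state: dict) -> str:
    """JSON-encode the enumeration state, sign it, and base64-encode the result."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(signature + payload).decode("ascii")


def decode_token(token: str) -> dict:
    """Check the signature and return the state; raise ValueError on a bad token."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except (ValueError, UnicodeEncodeError) as exc:
        raise ValueError("malformed continuation token") from exc
    signature, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid continuation token")
    return json.loads(payload)
```

Because the token is verified rather than looked up, the server keeps nothing between requests.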

~~~
mateo411
I like this approach, but this requires storing the rest of the enumeration. I
think that you would want the continuation token to expire after a
reasonable period of time (10 minutes to 1 hour). When the token expired, you
would remove the rest of the enumeration from your data store.

This isn't what the author recommends, but I think this is a good approach.

~~~
reipahb
I don't think this requires storing anything on the server. That was kind of
the point of the whole "store the server state in the continuation
token"-thing :)

You want to store enough information in the token that you can easily
reconstruct and resume the enumeration.

For example, let us say that the user asked for all comments with a score >= 5
sorted by post time. In that case you could return 100 comments, and a token
that encoded something like:

    {
      "min_score": 5,
      "sort": "post_time",
      "resume_from_post_time": "2015-05-07T05:34:02Z"
    }


To ensure that it is easy to resume the enumeration, the API can fudge the
number of returned items so that the returned data always breaks at a nice
"post_time" boundary. The goal here is to make it easy for the client to get
all the data in the enumeration without implementing all this logic
themselves.

True, it will only work efficiently for some types of queries, but a lot of
the common queries can be reworked into something like that.
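
One way to read "fudging" the item count, sketched in Python under some assumptions: the comments are already filtered and sorted by `post_time`, timestamps are ISO 8601 strings in UTC so that they compare correctly as text, and the helper name `take_page` is hypothetical. The page is extended past `limit` until the timestamp changes, so resuming with "post_time greater than the last one returned" neither skips nor repeats a comment.

```python
from typing import List, Optional, Tuple

Comment = dict  # assumed to carry a "post_time" ISO 8601 string


def take_page(sorted_comments: List[Comment], limit: int) -> Tuple[List[Comment], Optional[str]]:
    """Return roughly `limit` comments, ending on a post_time boundary.

    The second element is the post_time to resume after, or None when
    the enumeration is finished.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(sorted_comments) <= limit:
        return list(sorted_comments), None

    end = limit
    last_time = sorted_comments[end - 1]["post_time"]
    # Keep taking comments that share the last timestamp so the page
    # never splits a group of comments posted at the same instant.
    while end < len(sorted_comments) and sorted_comments[end]["post_time"] == last_time:
        end += 1

    resume_after = last_time if end < len(sorted_comments) else None
    return sorted_comments[:end], resume_after
```

The next request would then filter on `post_time > resume_after`, which is the value carried in the token.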

~~~
mateo411
Ok, I see the difference.

You suggest that the continuation token is basically an encoding of the query
parameters used to fetch results from the API. If you go with this approach,
then you don't have to store any state on the server. This is a good approach
because it's simple, but it doesn't solve the issue where the response from
the API changes while you are making the paging API calls. The example used in
the article is where an order was deleted while you were calling the API.

I was thinking of using a UUID as the continuation token and storing a copy of
the results from the API. Subsequent calls that use the continuation token
would take a subset of these results. The benefit of this approach is that the
results you get back from paging are consistent, which solves the issue where
the results from the API change while you are calling it multiple times. The
downside is that you have to store and manage more state on the server. If you
are storing the full results for all of these paging API calls, then this
could be quite large.
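
A rough in-memory sketch of that idea in Python, combined with the expiry suggested earlier in the thread. Everything here is an assumption made for illustration: the class name `SnapshotPager`, the 10-minute default, single-use tokens, and a plain dict standing in for the real data store.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class SnapshotPager:
    """Keeps a copy of the remaining results per token until it expires."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # token -> (expiry time, results not yet returned)
        self._snapshots: Dict[str, Tuple[float, List[dict]]] = {}

    def start(self, results: List[dict], page_size: int) -> Tuple[List[dict], Optional[str]]:
        """Snapshot the full result list and return the first page."""
        return self._next_page(list(results), page_size)

    def resume(self, token: str, page_size: int) -> Tuple[List[dict], Optional[str]]:
        """Return the next page for a token; each token is valid only once."""
        self._purge_expired()
        entry = self._snapshots.pop(token, None)
        if entry is None:
            raise KeyError("unknown or expired continuation token")
        return self._next_page(entry[1], page_size)

    def _next_page(self, remaining: List[dict], page_size: int) -> Tuple[List[dict], Optional[str]]:
        page, rest = remaining[:page_size], remaining[page_size:]
        if not rest:
            return page, None
        token = uuid.uuid4().hex
        self._snapshots[token] = (self._clock() + self._ttl, rest)
        return page, token

    def _purge_expired(self) -> None:
        now = self._clock()
        for token in [t for t, (expiry, _) in self._snapshots.items() if expiry <= now]:
            del self._snapshots[token]
```

The memory cost is visible in `_snapshots`: one stored list per client that is partway through an enumeration.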

~~~
reipahb
Actually, deleted orders are not a problem with continuation tokens. The
deleted order is only an issue if you use traditional paging requests, which
was the point of the article.

The only case I can currently think of that cannot be solved using
continuation tokens sent to the client is where the order of the items that
are enumerated may change between calls to the API. For example, imagine that
you are fetching items sorted by score, and somebody upvotes or downvotes an
item while you are enumerating them. In that case it is very difficult to
encode enough information in the continuation token. (I can think of
complicated ways to make it work, but the resulting database queries would be
horrible.)

But for simple stuff like deleted items and similar, it is easy. If you leave
out sorting it is trivial to implement as well -- all you need is to enumerate
the items based on an internal ID that is guaranteed to always increase,
filtering them as required. The continuation token will simply be the ID of
the last item you evaluated and the filter that is applied. On the next
request you just resume from that ID. If that item ID happens to be deleted in
the meantime it is no problem. You just resume from the next one available.
I.e.:

    
    
      SELECT * FROM items WHERE id > :last_returned_id AND [insert-filter-here] ORDER BY id LIMIT 100;
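
The same query wired into a small Python function with the standard `sqlite3` module, as a sketch. The `items` table with `id` and `score` columns, the `score >= ?` filter standing in for the placeholder above, and the function name `fetch_page` are all assumptions for the example.

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_page(conn: sqlite3.Connection, last_returned_id: int, min_score: int,
               limit: int = 100) -> Tuple[List[Tuple[int, int]], Optional[int]]:
    """Return the next rows after `last_returned_id` and the id to resume from.

    The second element is None when fewer than `limit` rows came back,
    which means the enumeration is finished.
    """
    rows = conn.execute(
        "SELECT id, score FROM items "
        "WHERE id > ? AND score >= ? ORDER BY id LIMIT ?",
        (last_returned_id, min_score, limit),
    ).fetchall()
    next_id = rows[-1][0] if len(rows) == limit else None
    return rows, next_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score INTEGER)")
    conn.executemany("INSERT INTO items (id, score) VALUES (?, ?)",
                     [(i, 5) for i in range(1, 7)])
    first, resume_id = fetch_page(conn, 0, 5, limit=2)
    # Deleting the last returned row (and the one after it) between calls
    # does not disturb the enumeration: the query resumes at the next id.
    conn.execute("DELETE FROM items WHERE id IN (2, 3)")
    second, _ = fetch_page(conn, resume_id, 5, limit=2)
    print(first, second)
```

The continuation token would carry `last_returned_id` and the filter values, nothing else.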

~~~
quesera
I understand where you're going, but this is not a continuation in the formal
sense, and it isn't even usefully similar.

If the client stores the state on behalf of the server, the server will
potentially be working with a modified dataset on the next request, and we're
right back where we started with limit and offset.

You can cook up a scheme to make it work for a restricted range of requests,
but the compromises are severe.

------
karmakaze
I've often wondered how the paging on HN could be better. The main issue is
going from page 1 to page 2 where items move between them and I either see
items a second time, or miss them. The problem with fixing a sequence on first
page load is then when to refresh for new content--only on page 1? Lastly, a
prescription is not helpful without a design for efficient implementation. How
can this be achieved in a stateless manner?

~~~
reipahb
The only way I see to solve this without server side state is to replace the
page-parameter with a list of items you have seen. The more button would then
just find the top 30 items you haven't previously seen. Unfortunately, this
would become unwieldy very quickly, and sooner or later you hit the browser
limitations on maximum URL size.

A relatively simple approach that involves server side state is to
periodically (once a minute?) generate the list of (for example) the 10000 top
items.

(A high traffic site will most likely want to do this in any case, so that it
has a cached list of items ready to serve to clients, instead of issuing a
database query to find the top items for every request.)

Now, instead of overwriting the list of top items every time you regenerate
it, keep multiple versions of the list. Then you can make the link to the next
page specify the version of the list and the page number. That way, users will
browse through one specific version of the list.

(This requires storing some state on the server, but the amount is relatively
small. You control both the size of the generated list, how often new lists
are generated and how long they are kept, so there is an easily calculated
upper bound on the amount of state information you need to store.)
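
A sketch of the versioned list in Python, kept in memory for simplicity. The class name `VersionedTopList`, the defaults, and the choice to fall back to the newest list when a requested version has been dropped are assumptions, not something the site is known to do.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class VersionedTopList:
    """Keeps the last few generated top lists so paging stays on one version."""

    def __init__(self, max_versions: int = 60, page_size: int = 30) -> None:
        self._versions: "OrderedDict[int, List[int]]" = OrderedDict()
        self._max_versions = max_versions
        self._page_size = page_size
        self._next_version = 1

    def publish(self, item_ids: List[int]) -> int:
        """Store a freshly generated list and drop the oldest ones beyond the cap."""
        version = self._next_version
        self._next_version += 1
        self._versions[version] = list(item_ids)
        while len(self._versions) > self._max_versions:
            self._versions.popitem(last=False)
        return version

    def page(self, page: int, version: Optional[int] = None) -> Tuple[int, List[int]]:
        """Return (version used, item ids) for a 1-based page number."""
        if page < 1:
            raise ValueError("page numbers start at 1")
        if not self._versions:
            raise LookupError("no list has been published yet")
        if version not in self._versions:
            # No version requested, or it has been dropped: use the newest list.
            version = next(reversed(self._versions))
        start = (page - 1) * self._page_size
        return version, self._versions[version][start:start + self._page_size]
```

The "next page" link would carry the returned version and `page + 1`, and the stored state is bounded by `max_versions` times the list length.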

~~~
karmakaze
This solution satisfies my usage and doesn't use a continuation token, though
one could be constructed from version and page. It does however expire.

I can see that the other comments on constructing continuation tokens won't
work for HN assuming post upvotes are mutably updated.

