The Real-time Web: How to Get Millisecond Updates with REST

vosper · on Dec 27, 2014

This is an interesting blog post, but I'm surprised about this part:

"[...] even though we have JavaScript code running in the browser, and it knows about the invalidation of a particular URL, it cannot tell the browser cache that the data at the end of that URL is now updated.

Instead, we keep a local cache inside the web page. This cache maps URL to JSON payload, and our wrapper on top of XMLHttpRequest will first check this cache, and deliver the data if it’s there. When we receive an invalidation request over IMQ, we mark it stale (although we may still deliver it, for example for offline browsing purposes.)"

I thought that the traditional approach to this would be to add a cache-buster [1] to the query string. Couldn't the Javascript code that knows there's a change update the cache-buster argument and refetch the URL? Then they wouldn't need their own cache implementation.

[1] A cache-buster is typically a random value (or timestamp) included as a query-string argument on the URL. This value has no meaning to the server, but because the URL string has changed the browser can't make any assumptions about what the return value would be, and is therefore forced to make a request to the server.

EDIT: I think I see now - what they refer to as "the cache" is more complex than I had thought; it seems somewhat more analagous to the state in a React app, in that it is triggering UI updates and isn't just a dumb layer between the client and server.

jwatte · on Dec 28, 2014

The other problem with a cache buster is that it logically changes the endpoint of what you're viewing.

We like HATEOAS and dislike clients editing URLs.

As an aside for search and filtering, we do allow clients to add query parameters to existing endpoints.

jkarneges · on Dec 27, 2014

Good stuff. The push-invalidation-plus-refetch technique is sometimes referred to as "hinting". Some more info here: http://blog.andyet.com/2014/09/23/send-hints-not-data

I've also been working on a protocol definition for REST updates called LiveResource. Anyone interested in this problem space, please send feedback. :) https://github.com/fanout/liveresource

rakoo · on Dec 27, 2014

A lot of what your protocol enables can be done with CouchDB, with the notable of Webhooks and websocket. CouchDB's protocol has been worked on, implemented and used in production for years now, with several implementations.

I'd love to see CouchDB used more and checked first as a starting point for anything in the space of live update of data, because it just works so well. Too many people re-invent the wheel left and right in this domain.

jonahx · on Dec 28, 2014

Could you provide any links for using CouchDB in this kind of use case? I wasn't aware of it as a solution to this problem (but only as a document db) and would like to learn more.

rakoo · on Dec 28, 2014

CouchDB takes things a step further: instead of having all your data on a server and gradually sending changes to each interested party, CouchDB's way is to have a data store on each "node" (the server and all clients in a traditional web server model) and synchronize data (ie documents), possibly continuously. Then on the clients you react to changes and update the UI. Your application only manages the data locally, and the CouchDB components manages everything network-related.

To do it, one useful component is PouchDB (http://pouchdb.com/), which is a data store in the browser. You just hook it to whatever backend you want and access the data with a straightforward javascript API.

There's also the much larger (in scope) Hoo.ie, but I can't talk much about it...

Unfortunately I don't have any links for you, I'm looking for experiences just like you ...

JoeAltmaier · on Dec 28, 2014

Is CouchDB still a thing? I thought it didn't scale past 6 instances or some such. And scaling was the whole point.

ricardobeat · on Dec 28, 2014

There used to be a proposed standard called Rocket, where you could subscribe to an event stream after requesting a resource. The github account has gone dark (apparently bought by CoreOS?) but can be seen here: https://web.archive.org/web/20130824235315/http://rocket.git...?

Kiro · on Dec 28, 2014

IMQ is written in Erlang. I also know that a lot of the IMVU backend is built in Haskell. Makes me want to work there.

smoyer · on Dec 28, 2014

I thought it sounded like a cool place to work too - I took a look at their homepage and it just creeped me out. Apparently they've got a large community but I can't see it ever including me (I couldn't stand to eat my own dog food).

jonpress · on Dec 27, 2014

This makes sense for backwards compatibility reasons since it fits in with the existing REST API. An alternative if starting from scratch would be to design an event-based realtime API that uses WebSocket with long polling fallback.

kxo · on Dec 27, 2014

I am building a generic, open-source system (really just a set of light standards) to bind a message queue to a set of request/reply and pub/sub microservices.

To start: I'd like to have seamless R/R via stateless HTTP as well as websockets. I'd like to also implement transparent push events via SSEs and the aforementioned socket transport.

Naturally, inter-service communication would use the already-present MQ.

The workflow in this true "microservices a la carte" solution would look like:

Initial, one-time setup:

1) Set up NSQ (or other supported MQ - I'll start with NSQ)

2) Set up standard, client-facing frontend servers (stateless HTTP, sockets, SSE) bound to queue.

3) Set up DNS names for frontend load-balancers (ELB, heroku load balancing, whatever) for each frontend.

---

Writing a service:

1) In your language/framework/stack of choice, create a new project, include standard-sauce, language-specific server library.

2) Write a function to do what you want. (retrieve a list of users, create a post, whatever)

3) Register/bind the aforementioned function to a specific, standard, interfaceable name (using the service library)

4) Configure service library to connect to initially-setup message broker.

The end-goal is to have a standard RPC interface, standard pub/sub interface, standard message fabric across front-end clients (browser, mobile, whatever) and back-end clients (inter-service communication).

---

Internally, a call message might look like:

{method: "get", action: "users", requestId: "21739dfe-fbeb-4da9-b698-c5720cbc488d", timestamp: "2014-12-25xxxxxx"}

This will seamlessly map to:

HTTP GET (Frontend): http://api.mydomain.com/users

Websocket message (stateful) (Frontend): {method: "get", action: "user"}

ServiceLibrary call (Ruby): "Project.call(:get, :users)"

ServiceLibrary call (Node): "project.callFunc('get', 'users', callback)"

.. and you get the idea. No prescription necessary.

---

If you like ActiveRecord, Ruby, the Rails stack: rip out ActionController, rip out Rack, use the stuff you want.

If you want to use some of the amazing JS frontend rendering toolchains: hook em up.

If you need to write a simple, highly-concurrent microservice, try Go.

Rather than prescribe an entire solution (a la Meteor, Derby, etc.), take the best tool for the job and throw it at the situation.

---

The scaling story is cool, too: if your ActiveRecord-driven microservice is unable to keep up I/O throughput, rewrite it or just spin more instances up. Totally shared-nothing in this case.

jkarneges · on Dec 28, 2014

You might check out the Pushpin proxy project. It's not quite the same as what you're doing but it shares your "not an entire solution" philosophy. Maybe an opportunity for collaboration.

timeu · on Dec 27, 2014

Isn't this approach what service workers are supposed to solve (controlling the cache) ?

lucio · on Dec 27, 2014

is there a more interesting use-case than status updates?

jwatte · on Dec 28, 2014

Suppose I spend some credits to buy some new accessory for my avatar, and then put n that accessory. My credits balance updates. My inventory updates. My avatar product set updates. My profile picture updates. The "sold count" of the product in question updates. All of these pieces of data will be available in real time to whomever is interested in them at the time. If we did this with RPC, everything would have to know about everything. As it is, updates to any subset of the live graph is available to anyone who views it. You can try it at m.imvu.com

Also, messaging, decoration, online status, ... Lots of data could, and should, be real time! The cool thing is that it all is deployed and works at IMVU scale.

Apologies for typos - phone auto correct.

slezakattack · on Dec 28, 2014

Hi, I'm an engineer at IMVU. We use this for almost everything! Status updates is a pretty simple example but we constantly invalidate data for things such as user's list of friends, credit balance, and our real-time chat.

Xorlev · on Dec 27, 2014

Pretty much any and all data can be cached locally. Any modern application running in a web browser will have state that can change external to the application.

rslarner · on Dec 27, 2014

Great post Jon. IMVU rocks!

curiously · on Dec 27, 2014

Would've loved it if they had open sourced their real-time graph solution.

I still don't quite know when I should use graph database but I imagine for social networking type of websites it is a must (since a standard RBDMS or NoSQL gets too verbose).

frik · on Dec 27, 2014

A relational database plus offline processing is usually enough for that, e.g. FB uses MySQL

Graph structure based features like in Friendster are expensive and don't scale well, that's why Friendster failed. And MySpace and FB removed that features (friends of friends of friends) early on.

jwatte · on Dec 28, 2014

Good point. We actually use MySQL and Regis for persistency with a helping of memcached for good measure. The "graph" part is an API view on our data, not the canonical storage.

curiously · on Dec 28, 2014

do you mean that you generate a graph database to build a hierarchy for an API based on MySQL database?