
Building real-time feed updates for NewsBlur with Redis and WebSockets - conesus
http://blog.newsblur.com/post/20371256202/building-real-time-feed-updates-for-newsblur
======
papercruncher
Curious: Were real-time updates something that your customers had high demand
for?

I'm asking because we thought about doing real-time updates for our product
but decided against it because it made topic modelling (clustering based on
topics) much harder for us.

Also, what is your experience with publishers being consistent about pinging
the hub with updates? Do you still need to poll feeds periodically to make
sure you didn't miss something?

~~~
conesus
I wouldn't go so far as to say there was a high demand for real-time. But some
users asked for it by name, and generally, increasing speed (lowering latency)
correlates strongly with increased usage. If I can make page loads faster and
feed fetch times shorter, then my users will use NewsBlur more. It's also why I
chart the average load time on every user's dashboard.

In this case, real-time means NewsBlur is acting as the subscriber, and the
publisher pushes out messages to NewsBlur. This way, instead of fetching every
3 minutes for what may only be updated stories or even no changes at all (not
every feed behaves nicely by offering a 304 Not Modified), I only have to
fetch the feed when it confirms that there is something new.
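To illustrate the polling cost described above, here is a minimal sketch of a
conditional fetch check in Python. It is not NewsBlur's actual code: the
function names `conditional_headers` and `feed_changed` are hypothetical, and
falling back to a content hash for feeds that never answer 304 is an assumption
about one reasonable way to handle them.

```python
import hashlib
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build the standard HTTP conditional-request headers from cached validators."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def feed_changed(status: int, body: bytes,
                 previous_digest: Optional[str]) -> bool:
    """Decide whether a polled feed needs to be parsed and stored.

    A 304 Not Modified means the server confirmed nothing changed.
    Feeds that always return 200 are compared by content hash instead
    (an assumption for this sketch, not a documented NewsBlur behavior).
    """
    if status == 304:
        return False
    digest = hashlib.sha1(body).hexdigest()
    return digest != previous_digest
```

With a push from the publisher, neither check has to run on a timer: the fetch
only happens once the feed reports something new.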

About 20% of all feeds have a PuSH (PubSubHubbub) option. But I now have more
than 33% less work to do. That's another huge boost, since my feed fetchers
have less work to do against the database.

I haven't been running real-time with granular enough metrics to be able to
tell whether the PuSH-enabled feeds are pinging correctly. Part of the reason
for that lapse on my part is that I still fetch the feeds regularly, just
1/20th as often. So instead of every 3 minutes, it's every hour or so. That's
over an order of magnitude less work, and I don't have to deal with feeds not
doing what they should be doing.
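The fallback schedule above can be worked through with a short Python sketch.
The 3-minute and 60-minute intervals come from the comment itself; the function
and constant names are hypothetical and only serve the arithmetic.

```python
POLL_INTERVAL_MINUTES = 3    # regular feeds: fetched every 3 minutes
PUSH_INTERVAL_MINUTES = 60   # PuSH-enabled feeds: safety-net fetch about hourly


def fetch_interval_minutes(push_enabled: bool) -> int:
    """Return how often a feed is polled, depending on PuSH support."""
    return PUSH_INTERVAL_MINUTES if push_enabled else POLL_INTERVAL_MINUTES


def fetches_per_day(push_enabled: bool) -> int:
    """Number of scheduled fetches in 24 hours for one feed."""
    return (24 * 60) // fetch_interval_minutes(push_enabled)


if __name__ == "__main__":
    print(fetches_per_day(push_enabled=False))  # 480
    print(fetches_per_day(push_enabled=True))   # 24
```

That is 480 scheduled fetches a day dropping to 24 for each PuSH-enabled feed,
a factor of 20.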

Naturally, in the aggregate I can tell it's working well, since a number of
new stories are coming in through PuSH.

------
ether
I know realtime technology is cool and all, but I don't think realtime is what
you should be doing. People are increasingly accessing RSS readers through
mobile devices such as phones and tablets, and that's where the opportunity is.
And this audience doesn't require realtime updates. If you just build what you
think is the best experience for desktop consumption, you're fighting against
Google Reader. Not an efficient battle. My two cents.

~~~
tantalor
> And this audience doesn't require realtime updates.

Why not? These devices are increasingly used for reading, and they're always
connected. Sounds like a perfect environment for real-time updates.

------
espeed
Are you using Juggernaut for the node.js/real-time piece?

~~~
conesus
Nope, just Express.js, redis, and socket.io:
<http://github.com/samuelclay/NewsBlur/tree/master/node>

