FriendFeed readying RSS accelerator (thedeal.com)
21 points by bootload on Aug 27, 2008 | hide | past | favorite | 18 comments


So, I'm not sure I completely understand, but I'd like to give it a shot.

You hit Feed Y every 30 seconds, which is silly because it's probably updated once an hour or less. Every time you hit Feed Y, you get the RSS for the last 25 postings, which costs the remote server bandwidth and possibly processing power to generate, and costs you bandwidth and processing power to download and consume. BAD! So much waste just to make sure we get instant updates!

So, Feed Y creates a little "I was last updated at time()" feed. You download that and are transmitting very little unless there's an update, in which case you grab the full feed. Awesome! Now, I ping 1,000,000 "I was updated at time()" feeds for various Facebook users... hmm, that's a ton of HTTP requests and transmission overhead.

Scratch that, let's have Facebook create a single feed of all their feeds that have been updated in the past 60, 300, and 600 seconds! So, we hit that every 30 seconds and then grab whatever is new! Wow! That means that for me to digest updates for all my users who want their Facebook feeding into my service, I just hit one SUP feed and then grab whatever that SUP feed says has changed. That's a lot better than hitting 1,000,000 feeds every 30 seconds.
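A minimal sketch of the consumer side of that loop, assuming a SUP-style JSON document with an "updates" list of [SUP-ID, update-ID] pairs (similar in shape to FriendFeed's sup.json; the IDs and URLs below are made up):

```python
import json

def feeds_to_refetch(sup_json, my_feeds):
    """Given a fetched SUP document and a mapping of SUP-ID -> feed URL
    that we track, return only the feed URLs that have new items."""
    sup = json.loads(sup_json)
    updated_ids = {pair[0] for pair in sup.get("updates", [])}
    return [url for sup_id, url in my_feeds.items() if sup_id in updated_ids]

# Instead of polling 1,000,000 feeds, poll one SUP document:
sup_doc = '{"period": 60, "updates": [["a3f9", "1219852800"], ["77b0", "1219852805"]]}'
tracked = {"a3f9": "http://example.com/feeds/alice",
           "0c12": "http://example.com/feeds/bob"}
print(feeds_to_refetch(sup_doc, tracked))  # -> ['http://example.com/feeds/alice']
```

Only the feeds whose SUP-IDs appear in the window get re-fetched; everything else costs nothing.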

So, this isn't good for your RSS reader or whatnot, but it would be huge for sites like FriendFeed or anything that feeds FriendFeed, since it would reduce the bandwidth and processing on both sides. As they said, they are likely to miss some updates with SUP, but if they can get 90% of the updates that way, they can change from pinging individual feeds every 30 minutes to pinging them every 300 minutes and still get new items 10 times as fast by pinging the SUP feeds every 3 minutes.

Cool. Practical. Open. Good-neighbor vibrations for the services they pull from. Could even be game-changing, since it would allow you to aggregate the data you like across many services in a way that would otherwise be too bandwidth/processing-intensive to scale if everyone started doing it.


It's such a good idea that it's already been done N times already. http://en.wikipedia.org/wiki/Ping_blog


SUP is conceptually similar, except that instead of pinging a handful of well known feeds, SUP producers declare where their updates are available. Also, by using SUP-IDs instead of urls, the SUP feed is smaller, publishers can keep feeds private, and spam problems are eliminated (ping feeds are generally full of spam blogs). Briefly, SUP adds:

- Discoverability (feeds declare where they publish their updates)

- Privacy (SUP does not expose a site namespace or secret urls)

- Efficiency (each SUP entry is only about 20 bytes)

In practice, none of the feed based services other than the blogs use pinging services. Our hope is that with SUP we can get faster updates from the 42 non-blogging services that FriendFeed imports.


I think the difference here is that there's one feed for all users rather than a ping-per-user/blog.


But some of the ping servers already publish SUP-like consolidated feeds.


Our startup (http://feedity.com) delivers custom RSS feeds, and while working on a similar mechanism, we kept it relatively simple: we perform a 'Conditional GET' before any content (HTML or RSS) is even fetched, checking the 'Last-Modified' date, 'If-Modified-Since' date-time, and 'ETag' HTTP header values for the source content, or the published metadata if the HTTP header values are missing.

This makes for a simple and efficient solution, without over-hitting the resources.
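A rough sketch of that conditional-fetch logic (the cache structure and function names here are illustrative, not Feedity's actual code):

```python
import urllib.request

def conditional_headers(cached):
    """Build conditional-GET headers from whatever we saved on the last
    fetch. 'cached' may hold the prior response's Last-Modified and ETag."""
    headers = {}
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    return headers

def should_download(status_code):
    # A 304 Not Modified response has no body: our cached copy is current,
    # so we skip the download and the parse entirely.
    return status_code != 304

# Building the request (no traffic until the request is actually opened):
cache = {"last_modified": "Wed, 27 Aug 2008 12:00:00 GMT", "etag": '"abc123"'}
req = urllib.request.Request("http://example.com/feed.rss",
                             headers=conditional_headers(cache))
```

If the server honors conditional GETs, most polls cost one round trip of headers and nothing more.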


The problem is that you still need to poll every single feed. SUP allows you to instead poll a single url. See the "update" note at the bottom of http://blog.friendfeed.com/2008/08/simple-update-protocol-fe... for more details.


What is the level of support of conditional get for blogs and RSS readers? I always assumed most readers would do conditional gets and most refined services (like blogger.com) would implement that too, since it is saving them bandwidth.

It is just implementing HTTP properly and being a good netizen - like actually sending an identifiable user agent and following robots.txt.

BTW, feedity.com's homepage renders badly on my Firefox 3.0.1/WinXP


Isn't this what Gnip is doing, except that Gnip's solution is readily available to anyone who wants it?

In fact, I believe Gnip uses XMPP to push notifications to data consumers, which seems even more efficient.

Am I missing something?


No, Gnip is a complementary service and will likely consume SUP. SUP is intended to make it easier for feed publishers to expose information about which feeds have been updated. Without this information, Gnip can't know when feeds have updated except by polling all of them. SUP allows them to poll a single URL instead.


Got it. So this is designed to be the piece that allows publishers to easily integrate with intermediate services like Gnip, or with aggregation services like FF, SocialThing, etc.


Exactly


The news article and the blog post miss a big point that only became apparent to me through Taylor's comment (1): SUP allows services to collect all the individual user update IDs into one file, allowing--for example--FriendFeed to just check <flickr.com/.sup> rather than doing HTTP HEADs on thousands of Flickr feeds.

1: http://blog.friendfeed.com/2008/08/simple-update-protocol-fe...



Paul - one potential issue is the size of the parent SUP feed itself. For sites like Facebook/Flickr/Netflix which have millions of active users, the size of the SUP feed itself might be huge. What are your thoughts on getting around that? Overall, it sounds like a super idea.


The SUP feed is designed to be very compact. The SUP feed on FriendFeed (http://friendfeed.com/api/sup.json) is only about 8 bytes per entry after gzip. If a service had 10 updates per second (which is faster than Twitter, for example), then a 60 second SUP feed would be 4800 bytes, which is tiny.
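The arithmetic above, spelled out (numbers taken straight from the comment):

```python
updates_per_second = 10   # faster than Twitter's rate, per the comment
window_seconds = 60       # period covered by one SUP feed
bytes_per_entry = 8       # approximate gzipped size per entry

feed_size = updates_per_second * window_seconds * bytes_per_entry
print(feed_size)  # 4800 bytes
```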

My expectation is that only large feed consumers (such as FriendFeed or Gnip) will fetch SUP. Smaller ones will probably use an intermediary service such as Gnip since they would only be interested in a small fraction of the updates.

The key purpose of SUP is to make it as easy as possible for feed publishers to let the world know which of their feeds have updated.


If that's a problem, I'd assume you could shard your SUP feeds any number of ways.

Say... first 100k feeds reference sup1.json, second 100k reference sup2.json, etc.

And given that each RSS/Atom feed <link>s to its SUP feed, you could reorganize arbitrarily by updating the link, and the poller should pick it up on the next go-around.
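A sketch of that range-based sharding (the file names and the 100k shard size are just the ones suggested above, not anything SUP specifies):

```python
def sup_shard(feed_index, shard_size=100_000):
    """Feeds 0..99,999 publish into sup1.json, the next 100k into
    sup2.json, and so on. Each feed's <link> would point at its shard."""
    return "sup%d.json" % (feed_index // shard_size + 1)

print(sup_shard(42))        # sup1.json
print(sup_shard(150_000))   # sup2.json
```

Since consumers discover the SUP URL from the feed itself, resharding is just a matter of serving new links.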


Like. Implemented.



