FriendFeed readying RSS accelerator (thedeal.com)
21 points by bootload on Aug 27, 2008 | hide | past | favorite | 18 comments


So, I'm not sure I completely understand, but I'd like to give it a shot.

You hit Feed Y every 30 seconds, which is silly because it's probably updated once an hour or less. Every time you hit Feed Y, you get the RSS for the last 25 postings, which costs the remote server bandwidth and possibly processing power to generate, and costs you bandwidth and processing power to download and consume. BAD! So much waste just to make sure we get instant updates!

So, Feed Y creates a little "I was last updated at time()" feed. You download that and are transmitting very little unless there's an update, in which case you grab the full feed. Awesome! Now, I ping 1,000,000 "I was updated at time()" feeds for various Facebook users... hmm, that's a ton of HTTP requests and transmission overhead.

Scratch that, let's have Facebook create a single feed of all their feeds that have been updated in the past 60, 300, and 600 seconds! So, we hit that every 30 seconds and then grab whatever is new! Wow! That means that for me to digest updates for all my users who want their Facebook feeding into my service, I just hit one SUP feed and then grab whatever that SUP feed says has changed. That's a lot better than hitting 1,000,000 feeds every 30 seconds.
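A minimal sketch of the consumer side of that loop, assuming a SUP-style JSON document with an "updates" list of [SUP-ID, update-ID] pairs (similar in shape to FriendFeed's sup.json; the IDs and URLs below are made up):

```python
import json

def feeds_to_refetch(sup_json, my_feeds):
    """Given a fetched SUP document and a mapping of SUP-ID -> feed URL
    that we track, return only the feed URLs that have new items."""
    sup = json.loads(sup_json)
    updated_ids = {pair[0] for pair in sup.get("updates", [])}
    return [url for sup_id, url in my_feeds.items() if sup_id in updated_ids]

# Instead of polling 1,000,000 feeds, poll one SUP document:
sup_doc = '{"period": 60, "updates": [["a3f9", "1219852800"], ["77b0", "1219852805"]]}'
tracked = {"a3f9": "http://example.com/feeds/alice",
           "0c12": "http://example.com/feeds/bob"}
print(feeds_to_refetch(sup_doc, tracked))  # -> ['http://example.com/feeds/alice']
```

Only the feeds whose SUP-IDs appear in the window get re-fetched; everything else costs nothing.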

So, this isn't good for your RSS reader or whatnot, but it would be huge for sites like FriendFeed or anything that feeds FriendFeed, since it would reduce the bandwidth and processing on both sides. As they said, they are likely to miss some updates with SUP, but if they can get 90% of the updates that way, they can change from pinging individual feeds every 30 minutes to pinging them every 300 minutes and still get new items 10 times as fast by pinging the SUP feeds every 3 minutes.

Cool. Practical. Open. Good-neighbor vibrations for the services they pull from. Could even be game-changing, since it would allow you to aggregate the data you like across many services in a way that would otherwise be too bandwidth/processing-intensive to scale if everyone started doing it.


It's such a good idea that it's already been done N times already. http://en.wikipedia.org/wiki/Ping_blog


SUP is conceptually similar, except that instead of pinging a handful of well known feeds, SUP producers declare where their updates are available. Also, by using SUP-IDs instead of urls, the SUP feed is smaller, publishers can keep feeds private, and spam problems are eliminated (ping feeds are generally full of spam blogs). Briefly, SUP adds:

- Discoverability (feeds declare where they publish their updates)

- Privacy (SUP does not expose a site namespace or secret urls)

- Efficiency (each SUP entry is only about 20 bytes)

In practice, none of the feed based services other than the blogs use pinging services. Our hope is that with SUP we can get faster updates from the 42 non-blogging services that FriendFeed imports.


I think the difference here is that there's one feed for all users rather than a ping-per-user/blog.


But some of the ping servers already publish SUP-like consolidated feeds.


Our startup (http://feedity.com) delivers custom RSS feeds, and while working on a similar mechanism, we kept it relatively simple: we perform a 'Conditional GET' before any content (HTML or RSS) is even fetched, checking the 'Last-Modified' date, 'If-Modified-Since' date-time, and 'ETag' HTTP header values for the source content, or the published metadata if the HTTP header values are missing.

This makes for a simple and efficient solution, without over-hitting the resources.
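A rough sketch of that conditional-fetch logic (the cache structure and function names here are illustrative, not Feedity's actual code):

```python
import urllib.request

def conditional_headers(cached):
    """Build conditional-GET headers from whatever we saved on the last
    fetch. 'cached' may hold the prior response's Last-Modified and ETag."""
    headers = {}
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    return headers

def should_download(status_code):
    # A 304 Not Modified response has no body: our cached copy is current,
    # so we skip the download and the parse entirely.
    return status_code != 304

# Building the request (no traffic until the request is actually opened):
cache = {"last_modified": "Wed, 27 Aug 2008 12:00:00 GMT", "etag": '"abc123"'}
req = urllib.request.Request("http://example.com/feed.rss",
                             headers=conditional_headers(cache))
```

If the server honors conditional GETs, most polls cost one round trip of headers and nothing more.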


The problem is that you still need to poll every single feed. SUP allows you to instead poll a single url. See the "update" note at the bottom of http://blog.friendfeed.com/2008/08/simple-update-protocol-fe... for more details.


What is the level of support of conditional get for blogs and RSS readers? I always assumed most readers would do conditional gets and most refined services (like blogger.com) would implement that too, since it is saving them bandwidth.

It is just implementing HTTP properly and being a good netizen - like actually sending an identifiable user agent and following robots.txt.

BTW, feedity.com's homepage renders badly on my Firefox 3.0.1/WinXP


Isn't this what Gnip is doing, except that Gnip's solution is readily available to anyone who wants it?

In fact, I believe Gnip uses XMPP to push notifications to data consumers, which seems even more efficient.

Am I missing something?


No, Gnip is a complementary service and will likely consume SUP. SUP is intended to make it easier for feed publishers to expose information about which feeds have been updated. Without this information, Gnip can't know when feeds have updated except by polling all of them. SUP allows them to poll a single URL instead.


Got it. So this is designed to be the piece that allows publishers to easily integrate with intermediate services like Gnip, or with aggregation services like FF, SocialThing, etc.


Exactly


The news article and the blog post miss a big point that only became apparent to me through Taylor's comment (1): SUP allows services to collect all the individual user update IDs into one file, allowing--for example--FriendFeed to just check <flickr.com/.sup> rather than doing HTTP HEADs on thousands of Flickr feeds.

1: http://blog.friendfeed.com/2008/08/simple-update-protocol-fe...



Paul - one potential issue is the size of the parent SUP feed itself. For sites like Facebook/Flickr/Netflix which have millions of active users, the size of the SUP feed itself might be huge. What are your thoughts on getting around that? Overall, it sounds like a super idea.


The SUP feed is designed to be very compact. The SUP feed on FriendFeed (http://friendfeed.com/api/sup.json) is only about 8 bytes per entry after gzip. If a service had 10 updates per second (which is faster than Twitter, for example), then a 60 second SUP feed would be 4800 bytes, which is tiny.
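The arithmetic above, spelled out (numbers taken straight from the comment):

```python
updates_per_second = 10   # faster than Twitter's rate, per the comment
window_seconds = 60       # period covered by one SUP feed
bytes_per_entry = 8       # approximate gzipped size per entry

feed_size = updates_per_second * window_seconds * bytes_per_entry
print(feed_size)  # 4800 bytes
```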

My expectation is that only large feed consumers (such as FriendFeed or Gnip) will fetch SUP. Smaller ones will probably use an intermediary service such as Gnip since they would only be interested in a small fraction of the updates.

The key purpose of SUP is to make it as easy as possible for feed publishers to let the world know which of their feeds have updated.


If that's a problem, I'd assume you could shard your SUP feeds any number of ways.

Say... first 100k feeds reference sup1.json, second 100k reference sup2.json, etc.

And given that each RSS/Atom feed <link>s to its SUP feed, you could reorganize arbitrarily by updating the link, and the poller should pick it up on the next go-around.
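A sketch of that range-based sharding (the file names and the 100k shard size are just the ones suggested above, not anything SUP specifies):

```python
def sup_shard(feed_index, shard_size=100_000):
    """Feeds 0..99,999 publish into sup1.json, the next 100k into
    sup2.json, and so on. Each feed's <link> would point at its shard."""
    return "sup%d.json" % (feed_index // shard_size + 1)

print(sup_shard(42))        # sup1.json
print(sup_shard(150_000))   # sup2.json
```

Since consumers discover the SUP URL from the feed itself, resharding is just a matter of serving new links.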


Like. Implemented.



