Feedly, build newsfeeds using Cassandra and Redis

bjt · on Oct 15, 2013

It took me a minute to understand that this wasn't feedly.com open-sourcing their backend.

You have a PR problem with this name, with a risk of it escalating into a legal problem.

tschellenbach · on Oct 15, 2013

The name definitely seems to distract the discussion away from the code and the project.

We started this project over a year ago before Feedly was famous. Now it's indeed confusing, but changing an open source project's name is quite a pain for us and people using Feedly.

Think the 1.0 release will be a good time to clean this up.

erichocean · on Oct 15, 2013

Feedly allows you to build newsfeed and notification systems using Cassandra and/or Redis.

Suggestion: The fact that this is a Python library is just as important as, if not more important than, the fact that it works with Cassandra and/or Redis. You should mention that in the first sentence, for example, "Feedly is a Python library that allows you to easily build newsfeed and notification systems on top of Cassandra and/or Redis."

-----

More notes:

It's also not particularly clear what approach Feedly actually implements. You mention "push" and "push/pull", and say what Fashiolista used to use, but never actually mention the approach Feedly has taken.

Without looking at the code, i.e. just from the README, I'd guess you're doing full fanout (i.e. "push"), with no pull for inactive users (i.e. what Twitter does for users inactive for 30 days). That's fine, except...

...you don't really discuss how Feedly handles the power law problem (publishing to Lady Gaga's feed), which is the only difficult engineering issue with these kinds of systems. If Feedly's approach is to ignore it, and just push all 30 million subtasks into Celery, how will that impact all of the updates to other users' feeds, which are happening concurrently? How will that impact memory usage? I couldn't tell just by reading the docs.

Finally, the stream itself is actually the easiest part (other than the power law problem). Production systems need monitoring, along with some kind of recommendation system for getting users connected to each other. Integrating all that is obviously a separate project, but I'd be reluctant to build on top of Feedly until I saw at least some hooks for incorporating that stuff.

Otherwise, good work. I've been implementing this stuff recently, and the only other semi-full-featured open source project out there for activity streams when I started was written in PHP, so it's nice to see something a bit more hacker-friendly. Bonus points for Redis and Cassandra, two of the coolest NoSQL databases out there!

tschellenbach · on Oct 15, 2013

At Fashiolista we went through a few phases.

A.) In the early days we simply pulled everything together by querying the database which worked quite well.

B.) After that we switched to Redis and, in an approach similar to Twitter's, pushed only to active users and pulled the feed for inactive users.

C.) Over time our requirements changed and it became harder to fit everything in memory (or fall back), which is why we switched to Cassandra.

Feedly offers you a framework, but allows you to make your own decisions regarding the tradeoffs.

- you can change the fanout to hit only active users (twitter, yahoo paper approach)

- you can choose to store the full activity in the feed or only an id (memory usage vs extra lookups)

- you can customize the priority of tasks

These are all things we've done at one point or another. You can choose your own approach with Feedly.
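To make the first two tradeoffs concrete, here is a minimal sketch in plain Python. It is not Feedly's API: `Activity`, `FeedStore`, `fanout`, and the `store_full` flag are hypothetical names, and the in-memory dicts only stand in for Redis or Cassandra.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    id: int
    actor: str
    verb: str


@dataclass
class FeedStore:
    """Toy in-memory stand-in for a Redis or Cassandra backend (hypothetical)."""
    store_full: bool = False  # True: keep the whole activity; False: keep only its id
    feeds: dict = field(default_factory=dict)       # user -> newest-first feed entries
    activities: dict = field(default_factory=dict)  # id -> Activity, for id lookups

    def push(self, user, activity):
        self.activities[activity.id] = activity
        entry = activity if self.store_full else activity.id
        self.feeds.setdefault(user, []).insert(0, entry)

    def read(self, user):
        entries = self.feeds.get(user, [])
        if self.store_full:
            return list(entries)
        # Id-only feeds use less memory but need one extra lookup per entry.
        return [self.activities[i] for i in entries]


def fanout(store, activity, followers, active_users):
    """Push only to active followers; inactive users would be pulled on demand."""
    pushed = [user for user in followers if user in active_users]
    for user in pushed:
        store.push(user, activity)
    return pushed
```

With `store_full=False` each feed holds only ids, so reads pay for the extra lookups; flipping the flag trades memory for faster reads.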

For Fashiolista we currently use celery with rabbitmq and have different queues and worker clusters for low and high priority fanout tasks.

In your Lady Gaga example, memory usage would go up on the rabbitmq server. Since rabbit also stores to disk, it won't run out easily. After that the celery machines will handle the updates, and autoscaling will bring in more machines if needed.
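A rough sketch of how such a split between queues could look, in plain Python with no Celery dependency. The queue names, the chunk size, and the rule "very large follower lists go to the low priority queue" are all assumptions for illustration, not Fashiolista's actual configuration.

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_fanout(follower_ids, chunk_size=1000, high_priority_limit=10000):
    """Return (queue_name, chunk) pairs for one published activity.

    Hypothetical rule: authors with few followers use the high priority
    queue, so a single huge fanout cannot delay everyone else's updates.
    """
    if len(follower_ids) <= high_priority_limit:
        queue = "fanout_high"
    else:
        queue = "fanout_low"
    return [(queue, chunk) for chunk in chunked(follower_ids, chunk_size)]
```

In a Celery setup each pair would become one task, with the queue name passed as the `queue` argument of `apply_async`, and separate worker pools consuming each queue.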

beardicus · on Oct 15, 2013

Nothing to do with the feed reader at feedly.com though? Granted all the names are taken everywhere for everything, but this seems confusing.

tschellenbach · on Oct 15, 2013

Yeah, it sucks a bit. The project started more than a year ago, before the Google Reader shutdown and feedly's rise to fame.

dylan-m · on Oct 15, 2013

I think you should change it. You won't win fans by confusing people.

JimmaDaRustla · on Oct 15, 2013

This...isn't the same as the Feedly RSS newsreader.

Cool idea. I would be reluctant to adopt Cassandra, but it looks like you can use solely Redis?

tschellenbach · on Oct 15, 2013

We currently support Redis and Cassandra fully. Cassandra has a very high maintenance overhead, so we definitely recommend starting with Redis.

More info here: http://feedly.readthedocs.org/en/latest/choosing_a_storage_b...

PS. We're also working on a DynamoDB backend

samspenc · on Oct 15, 2013

Awesome! Any thoughts on HBase support (big HBase user/fan here!)

tschellenbach · on Oct 15, 2013

Yes we are considering adding HBase. In general adding new backends is quite easy.

It does, however, take a while to figure out the backend-dependent performance tweaks, so a company running HBase and Feedly in production would be of great help.

tbarbugli · on Oct 15, 2013

Yes, you can use solely Redis. In fact this is the suggested way to start with Feedly (the example app uses only Redis).

JimmaDaRustla · on Oct 15, 2013

Awesome! I have intentions to make a stock analysis app to share and monitor different analytics...this could come in handy building a feed for them.

tschellenbach · on Oct 15, 2013

So we've built Feedly for our startup Fashiolista.com. It's quite a large project for a team of 4 though. Therefore we are actively looking for more contributors. Let us know if you're interested in helping out.