

Feedly, build newsfeeds using Cassandra and Redis - tbarbugli
https://github.com/tschellenbach/Feedly#feedly

======
bjt
It took me a minute to understand that this wasn't feedly.com open-sourcing
their backend.

You have a PR problem with this name, with a risk of it escalating into a legal
problem.

~~~
tschellenbach
The name definitely seems to distract the discussion away from the code and
the project.

We started this project over a year ago before Feedly was famous. Now it's
indeed confusing, but changing an open source project's name is quite a pain
for us and people using Feedly.

I think the 1.0 release will be a good time to clean this up.

------
erichocean
_Feedly allows you to build newsfeed and notification systems using Cassandra
and/or Redis._

Suggestion: That the library is a Python library is just as important as, if
not more important than, working with Cassandra and/or Redis. You should
mention that in the first sentence, for example, "Feedly is a Python library
that allows you to easily build newsfeed and notification systems on top of
Cassandra and/or Redis."

\-----

More notes:

It's also not particularly clear what approach Feedly actually implements.
You mention "push" and "push/pull", and say what Fashiolista _used_ to use,
but never actually mention the approach Feedly has taken.

Without looking at the code, i.e. just from the README, I'd guess you're doing
full fanout (i.e. "push"), with no pull for inactive users (i.e. what Twitter
does for users inactive for 30 days). That's fine, except...

...you don't really discuss how Feedly handles the power law problem
(publishing to Lady Gaga's feed), which is the only difficult engineering
issue with these kinds of systems. If Feedly's approach is to ignore it, and
just push all 30 million subtasks into Celery, how will that impact all of the
updates to other users' feeds, which are happening concurrently? How will
that impact memory usage? I couldn't tell just by reading the docs.
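
To make the cost concrete, here is a minimal sketch of full push fanout, batched into chunks. It is not Feedly's code: `chunked`, `fanout_tasks`, `run_task`, the chunk size, and the in-memory `feeds` dict are all hypothetical names chosen for illustration, and a real deployment would hand each chunk to a task queue such as Celery instead of running it inline.

```python
from collections import defaultdict
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def fanout_tasks(follower_ids, activity_id, chunk_size=1000):
    """Split one publish into (activity_id, follower_chunk) tasks."""
    return [(activity_id, chunk) for chunk in chunked(follower_ids, chunk_size)]


def run_task(feeds, task):
    """Push the activity id onto each follower's feed, newest first."""
    activity_id, follower_chunk = task
    for follower_id in follower_chunk:
        feeds[follower_id].insert(0, activity_id)


# Hypothetical example: one author with 2,500 followers publishes activity 42.
feeds = defaultdict(list)
tasks = fanout_tasks(range(2500), activity_id=42, chunk_size=1000)
for task in tasks:
    run_task(feeds, task)
```

At 30 million followers and chunks of 1,000, a single post still becomes 30,000 tasks of 1,000 writes each, so batching shrinks the task count but does not answer how those tasks compete with everyone else's updates.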

Finally, the stream itself is actually the easiest part (other than the power
law problem). Production systems need monitoring, along with some kind of
recommendation system for getting users connected to each other. Integrating
all that is obviously a separate project, but I'd be reluctant to build on top
of Feedly until I saw at least _some_ hooks for incorporating that stuff.

Otherwise, good work. I've been implementing this stuff recently, and the only
other semi-full featured open source project out there for activity streams
when I started was written in PHP, so it's nice to see something a bit more
hacker-friendly. Bonus points for Redis and Cassandra, two of the coolest
NoSQL databases out there!

~~~
tschellenbach
At Fashiolista we went through a few phases.

A.) In the early days we simply pulled everything together by querying the
database, which worked quite well.

B.) After that we switched to Redis and, in an approach similar to Twitter's,
pushed only to active users and pulled the feed for inactive users.

C.) Then our requirements slowly changed and it became harder to fit everything
in memory (or fall back), which is why we switched to Cassandra.

Feedly offers you a framework, but allows you to make your own decisions
regarding the tradeoffs.

- you can change the fanout to hit only active users (Twitter, Yahoo paper
approach)

- you can choose to store the full activity in the feed or only an id (memory
usage vs. extra lookups)

- you can customize the priority of tasks

These are all things we've done at one point or another. You can choose your
own approach with Feedly.
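
The first two tradeoffs can be sketched roughly as follows. This is an illustration only, not Feedly's API: `fanout`, `is_active`, the 30-day `ACTIVE_WINDOW`, and the `store_full` flag are hypothetical names, and plain dicts stand in for Redis or Cassandra.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=30)  # assumed cutoff, not a Feedly default


def is_active(last_seen, now):
    """A user counts as active if seen within the assumed window."""
    return now - last_seen <= ACTIVE_WINDOW


def fanout(activity, followers, last_seen, feeds, now, store_full=False):
    """Push to active followers only.

    store_full=True keeps the whole activity in each feed (more memory);
    store_full=False keeps only the id (an extra lookup at read time).
    """
    entry = activity if store_full else activity["id"]
    pushed = []
    for user_id in followers:
        if is_active(last_seen[user_id], now):
            feeds.setdefault(user_id, []).insert(0, entry)
            pushed.append(user_id)
    return pushed


# Hypothetical data: user 1 was seen yesterday, user 2 three months ago.
now = datetime(2013, 1, 31)
last_seen = {1: now - timedelta(days=1), 2: now - timedelta(days=90)}
activity = {"id": 7, "actor": "alice", "verb": "love"}

id_feeds = {}
pushed = fanout(activity, [1, 2], last_seen, id_feeds, now)

full_feeds = {}
fanout(activity, [1, 2], last_seen, full_feeds, now, store_full=True)
```

User 2 gets nothing pushed, so their feed would have to be pulled together on their next visit.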

For Fashiolista we currently use Celery with RabbitMQ and have different
queues and worker clusters for low and high priority fanout tasks.

In your example with Lady Gaga, memory usage would go up on the RabbitMQ
server. Since RabbitMQ also stores to disk, it won't run out easily. After that
the Celery machines will handle the updates, and autoscaling will bring in more
machines if needed.
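
A rough sketch of the two-queue idea, under stated assumptions: the queue names, the `LARGE_FANOUT` threshold, and the rule that big fanouts go to the low priority queue are all hypothetical (the rule Fashiolista actually uses isn't described here), and in-memory deques stand in for RabbitMQ queues.

```python
from collections import deque

LARGE_FANOUT = 10_000  # hypothetical threshold for a "Lady Gaga" sized fanout


def route(follower_count):
    """Pick a queue name so that huge fanouts don't delay ordinary ones."""
    return "fanout_low" if follower_count >= LARGE_FANOUT else "fanout_high"


def enqueue(queues, activity_id, follower_count):
    """Append the activity to the chosen queue and return the queue name."""
    name = route(follower_count)
    queues[name].append(activity_id)
    return name


# Each queue would be drained by its own worker cluster.
queues = {"fanout_high": deque(), "fanout_low": deque()}
enqueue(queues, activity_id=1, follower_count=50)
enqueue(queues, activity_id=2, follower_count=30_000_000)
enqueue(queues, activity_id=3, follower_count=200)
```

With Celery, the chosen name could be passed as the `queue` option to a task's `apply_async`, with separate workers consuming each queue.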

------
JimmaDaRustla
This...isn't the same as the Feedly RSS newsreader.

Cool idea. I would be reluctant to adopt Cassandra, but it looks like you
can use Redis alone?

~~~
tschellenbach
We currently support Redis and Cassandra fully. Cassandra has a very high
maintenance overhead, so we definitely recommend starting with Redis.

More info here:
[http://feedly.readthedocs.org/en/latest/choosing_a_storage_b...](http://feedly.readthedocs.org/en/latest/choosing_a_storage_backend.html)

PS. We're also working on a DynamoDB backend

~~~
samspenc
Awesome! Any thoughts on HBase support? (Big HBase user/fan here!)

~~~
tschellenbach
Yes, we are considering adding HBase. In general, adding new backends is quite
easy.

It does, however, take a while to figure out the backend-dependent performance
tweaks, so a company running HBase and Feedly in production would be of great
help.

------
beardicus
Nothing to do with the feed reader at feedly.com though? Granted all the names
are taken everywhere for everything, but this seems confusing.

~~~
tschellenbach
Yeah, it sucks a bit. The project started more than a year ago, before the
Google Reader shutdown and Feedly's rise to fame.

~~~
dylan-m
I think you should change it. You won't win fans by confusing people.

------
tschellenbach
So we've built Feedly for our startup, Fashiolista.com. It's quite a large
project for a team of 4, though. Therefore we are actively looking for more
contributors. Let us know if you're interested in helping out.

