
Cohort analysis - User retention in a Rails application - revorad
http://ninjasandrobots.com/cohort-analysis-user-retention-in-a-rails-application/
======
rraval
During an internship at Facebook, I worked on the insights team. Think Google
Analytics for Facebook Pages, Apps, etc.

One of the things I built that term was a tool to visualize user retention,
implementing triangle heatmaps, which I believe were invented in-house by
Danny Ferante.

The idea here is to exploit the very quick visual pattern matching we're able
to do as humans, and turn that into actionable cohort analysis. From the
screenshot in the article (<http://i.imgur.com/qBbkZv8.png>), I think we can
agree that that format would become unwieldy with a large number of datapoints.

Compare this to <http://imgur.com/sOQ4vrm>, a screenshot of the triangle
heatmap generated for tcreech's Cover Photo Finder Facebook App. The x-axis
represents the cohort (the set of users who installed the app on the same
day), broken down by day rather than by week as in the article, so the patterns
are more granular. The y-axis represents the number of days after installation.
Each datapoint is then coloured to represent the percentage of users that
return to the app on (installation date + number of days).
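The data behind a chart like this can be sketched in a few lines of Python. This is a minimal illustration, not the Facebook tool: it assumes installs arrive as a hypothetical `installs` dict mapping user id to install date, and return visits as a set of `(user_id, date)` pairs; `retention_triangle` is an illustrative name. Each cohort's row stops at the last day with data, which is what gives the chart its triangle shape.

```python
from collections import defaultdict
from datetime import date, timedelta

def retention_triangle(installs, activity, today):
    """Return {cohort_day: [pct at vintage 0, 1, ...]} for a triangle heatmap.

    installs: dict user_id -> install date (the cohort key, by day).
    activity: set of (user_id, date) pairs on which a user used the app.
    today: last date with data; newer cohorts get shorter rows.
    """
    cohorts = defaultdict(list)
    for user, day in installs.items():
        cohorts[day].append(user)

    triangle = {}
    for cohort_day, users in sorted(cohorts.items()):
        max_vintage = (today - cohort_day).days
        row = []
        for vintage in range(max_vintage + 1):
            # Percentage of the cohort active on (install date + vintage days).
            target = cohort_day + timedelta(days=vintage)
            returned = sum((u, target) in activity for u in users)
            row.append(100.0 * returned / len(users))
        triangle[cohort_day] = row
    return triangle
```

With two users installing on one day and one on the next, the older cohort gets a two-cell row and the newer cohort a one-cell row; colouring each cell by its value gives the heatmap.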

A number of patterns are captured quite easily:

- A vertical pattern is local to a specific cohort. A new promotion or
redesigned sign-up page often results in this.

- A horizontal pattern is local to a specific vintage. If your app has a
trial period that expires after 7 days, then you'll see your retention plummet
across all cohorts horizontally at y=7.

- A diagonal pattern is local to a specific date. If your app is down on
January 2nd, then there will be a diagonal blue line (0%) across all cohorts.
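All three patterns follow from one identity: the cell for cohort day c at vintage v describes calendar date c + v. A small sketch, assuming the same hypothetical `{cohort_day: [pct, ...]}` layout as the data behind a triangle heatmap, groups the cells by column, row, and diagonal, and flags calendar dates where every cohort reads 0%, the outage case above.

```python
from datetime import date, timedelta

def slices(triangle):
    """Group triangle cells the three ways the patterns above describe.

    triangle: {cohort_day: [pct at vintage 0, 1, ...]} (assumed layout).
    Returns (by_cohort, by_vintage, by_date): vertical patterns show up in
    by_cohort, horizontal ones in by_vintage, and diagonal ones in by_date,
    since every cell on a diagonal shares cohort_day + vintage.
    """
    by_cohort, by_vintage, by_date = {}, {}, {}
    for cohort_day, row in triangle.items():
        by_cohort[cohort_day] = list(row)
        for vintage, pct in enumerate(row):
            by_vintage.setdefault(vintage, []).append(pct)
            calendar_day = cohort_day + timedelta(days=vintage)
            by_date.setdefault(calendar_day, []).append(pct)
    return by_cohort, by_vintage, by_date

def dead_days(triangle):
    """Calendar dates on which every cohort shows 0% (e.g. the app was down)."""
    _, _, by_date = slices(triangle)
    return sorted(d for d, cells in by_date.items() if all(p == 0 for p in cells))
```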

I wrote up a work term report for the University of Waterloo detailing
triangle heatmaps: <http://zeroindexed.com/triangle.pdf>

Video released by Facebook explaining triangle heatmaps:
<https://www.facebook.com/video/video.php?v=3707283286197>

~~~
nate
Thanks for all this. I need to spend some time parsing it.

One thing CohortMe does to keep the table from becoming unwieldy is go back
only 12 periods: 12 weeks, 12 days, or 12 months.

Not a perfect solution, but it's version 0.0.1 :) I really only wanted to see
12 weeks right now anyway, at least until I have decent data going back
months.
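A rough sketch of that windowing, in Python rather than CohortMe's actual Ruby: snap each signup date to the start of its day, week, or month, then keep only the most recent 12 periods. The function names and the Monday week start are assumptions for illustration; empty periods are kept so gaps stay visible.

```python
from datetime import date, timedelta

def period_start(day, interval):
    """Snap a date to the start of its period (weeks assumed to start Monday)."""
    if interval == "day":
        return day
    if interval == "week":
        return day - timedelta(days=day.weekday())
    if interval == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown interval: {interval}")

def previous_period(start, interval):
    """Start of the period immediately before the one beginning at `start`."""
    if interval == "month":
        # The day before the 1st falls in the previous month; snap to its 1st.
        return (start - timedelta(days=1)).replace(day=1)
    return start - timedelta(days=1 if interval == "day" else 7)

def recent_cohorts(installs, today, interval="week", periods=12):
    """Group users by signup period, keeping only the last `periods` periods."""
    window = []
    start = period_start(today, interval)
    for _ in range(periods):
        window.append(start)
        start = previous_period(start, interval)
    window.reverse()  # oldest period first
    cohorts = {p: [] for p in window}
    for user, day in installs.items():
        key = period_start(day, interval)
        if key in cohorts:  # signups older than the window are dropped
            cohorts[key].append(user)
    return {p: sorted(users) for p, users in cohorts.items()}
```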

~~~
rraval
Oh I understand completely. It's just that I think triangle heatmaps are
awesome visualizations that nobody's heard of. They were my entire life for 4
months and I can't stop talking about them given the opportunity :)

~~~
nate
That's awesome. I'll check them out. It's funny you mention not being able to
stop talking about them. I keep bringing up cohort analysis :) with anyone
who'll listen, to see if they want to geek out with me over stats and user
retention. So far, not a very good hit rate amongst friends.

------
callmeed
This looks cool and I'm excited to try it in a Rails project.

But, mainly, this post highlights my frustration with most analytics tools. I
have tried MixPanel and I'm currently paying $99/mo for Kiss Metrics (about to
cancel). Frankly, _I don't have the time to get neck-deep into one of these
services, integrate it, and figure out how to turn the data into revenue-
generating actions._ And that's setting aside the risk that the one I choose
gets acqui-hired and shut down.

Maybe I'm asking for magic beans or maybe my products aren't the best fit, but
I'm a hacker and I want to do less work, not more. For now, I'll stick to
Google Analytics, some basic A/B testing, looking at data via the console, and
emailing with my users.

It's pretty sad that this blog post does a better job at explaining what a
"cohort" is than Kiss Metrics can [1].

[1] <http://support.kissmetrics.com/#stq=cohort&stp=1>

~~~
nate
Yeah, me too, man. There's also some frustration over here about what these
analytics actually mean. I have so much data coming at me from everywhere
now; what exactly is it telling me? One neat presentation that helps provide
some focus is Dave McClure's metrics 4 pirates.
<http://www.slideshare.net/dmc500hats/startup-metrics-4-pirates-seoul>

I've seen this a bunch of times, but I like that focus on Activation and
Retention for new startups.

Ash Maurya's Lean Startup book is interesting in that it tries to quantify
when you're ready to launch a new product: when you have 40% user retention.
Personally, that seems high to me if your early users are mostly random Joes
who signed up through some email-capture form that promised rainbows, without
really knowing what they signed up for.

~~~
micdijkstra
I agree with both of you. I've been working for an incubator for the past 2
years building more than 6 startups and tried KISSmetrics on all of them. In the
end we would code our own analytics into the app or keep a text file of SQL
queries we'd run once a week to get some real insights/learnings.
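For the "text file of SQL queries" approach, a weekly cohort query can be as small as the one below, run here through Python's built-in `sqlite3` module. The `users(id, created_at)` and `events(user_id, created_at)` tables are assumptions, as is bucketing cohorts into Monday-starting weeks. Date functions differ between databases, so the SQLite-specific `date(..., 'weekday 0', '-6 days')` and `julianday` calls would need translating for MySQL or Postgres.

```python
import sqlite3

# Hypothetical schema: users(id, created_at) and events(user_id, created_at),
# with dates stored as 'YYYY-MM-DD' text.
WEEKLY_COHORTS = """
SELECT date(u.created_at, 'weekday 0', '-6 days') AS cohort_week,  -- Monday of signup week
       CAST((julianday(e.created_at) - julianday(u.created_at)) / 7 AS INTEGER)
           AS weeks_since_signup,
       COUNT(DISTINCT e.user_id) AS active_users
FROM users u
JOIN events e ON e.user_id = u.id
GROUP BY cohort_week, weeks_since_signup
ORDER BY cohort_week, weeks_since_signup
"""

def weekly_cohorts(conn):
    """Return (cohort_week, weeks_since_signup, active_users) rows."""
    return conn.execute(WEEKLY_COHORTS).fetchall()

if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the output.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE events (user_id INTEGER, created_at TEXT);
        INSERT INTO users VALUES (1, '2013-01-07'), (2, '2013-01-09');
        INSERT INTO events VALUES (1, '2013-01-07'), (2, '2013-01-09'),
                                  (1, '2013-01-15'), (2, '2013-01-20');
    """)
    for row in weekly_cohorts(conn):
        print(row)
```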

That's what led me to create Storyberg. I don't want this to be a plug, but
one idea we're playing with is Release Cohorts. Instead of just looking at new
users in cohorts, I believe we should also look at existing active users and
existing inactive users, and group them based on your release cycle (given the
users are experiencing a common feature set).

This comes from the idea that features are released for one of three reasons:
1) improve new user activation, 2) continue to engage existing users, and 3)
reactivate existing inactive users.
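One possible reading of release cohorts, sketched in Python: for each release, split users into those who signed up during that release's window, existing users who were active during the previous window, and existing users who weren't. The data shapes, the window boundaries, and the "active in the previous window" rule are all assumptions for illustration, not how Storyberg does it.

```python
from datetime import date

def release_cohorts(releases, signups, activity):
    """Group users per release into new / active / inactive (hypothetical rules).

    releases: sorted list of release dates.
    signups: dict user -> signup date.
    activity: dict user -> list of dates the user was active.
    The first release is skipped because it has no previous window.
    """
    result = {}
    for i in range(1, len(releases)):
        prev_start, start = releases[i - 1], releases[i]
        end = releases[i + 1] if i + 1 < len(releases) else date.max
        groups = {"new": [], "active": [], "inactive": []}
        for user, signed_up in sorted(signups.items()):
            if start <= signed_up < end:
                groups["new"].append(user)  # target of activation features
            elif signed_up < start:
                # Existing user: engaged or lapsed during the previous release?
                seen = any(prev_start <= d < start for d in activity.get(user, []))
                groups["active" if seen else "inactive"].append(user)
        result[start] = groups
    return result
```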

------
destraynor
I'd love to see someone build cycle plots into this tool.

My real frustration with Cohort analyses is that the results are always
presented in this arcane format that doesn't actually answer any questions.

The question a cohort analysis is supposed to answer is "Is our retention
improving?", and to get that answer you're supposed to eyeball rows & columns
of text and make inferences (and _then_ go ahead and speculate about actions).

Here's a better way:
<http://insideintercom.io/retention-cohorts-and-visualisations/>
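Whatever the presentation, the underlying question can also be answered numerically: take retention at one fixed age (say, week 4) for each cohort in order and fit a least-squares line through it. The sketch below is one simple approach, not the method from the linked post, and assumes cohorts are stored as `{cohort_key: [pct at age 0, 1, ...]}` with sortable keys. A positive slope means newer cohorts retain better at that age.

```python
def retention_trend(triangle, vintage):
    """Least-squares slope of retention-at-`vintage` across cohorts, oldest first.

    triangle: {cohort_key: [pct at age 0, 1, ...]} (assumed layout).
    Cohorts too young to have reached `vintage` are skipped. The result is in
    percentage points per cohort.
    """
    ys = [row[vintage] for _, row in sorted(triangle.items()) if len(row) > vintage]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two cohorts that reached this vintage")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```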

~~~
bslatkin
I agree that the common presentation of cohort data is clunky. That's why I
built this tool with d3js to visualize cohorts as stacked bars:

<http://bslatkin.github.com/cohorts/>

"Impact maps" are too blurry in my experience; they assume your traffic is
normal and predictable. Cycle plots are fun to look at, but in practice the
lines are too volatile to make a pretty chart.
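For reference, a cycle plot of a daily metric groups values by their position in the cycle (here, day of week) and draws each group as its own short line across successive weeks, usually next to that group's mean. A minimal Python sketch of the grouping step, assuming the metric comes as a `{date: value}` dict; the plotting itself is left out.

```python
from datetime import date, timedelta

def cycle_plot_series(daily):
    """Split a daily metric into the weekday sub-series a cycle plot draws.

    daily: {date: value}. Returns {weekday: [(date, value), ...]}, with
    weekday 0 = Monday and each list in date order.
    """
    series = {wd: [] for wd in range(7)}
    for day in sorted(daily):
        series[day.weekday()].append((day, daily[day]))
    return series

def weekday_means(series):
    """Mean of each non-empty weekday sub-series (the reference lines)."""
    return {wd: sum(v for _, v in pts) / len(pts) for wd, pts in series.items() if pts}
```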

------
alexatkeplar
This is a cool open-source tool; many thanks for sharing, nate. User activation
date is the most well-known cohort definition, but actually you can define a
cohort out of pretty much any data point you like, provided it makes sense to
then run a longitudinal study on those cohorts. Marketing acquisition channel
is another good one.
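Because the cohort is just a grouping key, the same retention calculation works for any attribute. A minimal sketch, assuming users are dicts with an `id` plus whatever attribute you group on (the `channel` field here is hypothetical), and that `returned` holds the ids of users who came back during the period being studied.

```python
def retention_by(users, key, returned):
    """Retention rate for cohorts defined by any user attribute.

    users: list of dicts, e.g. {"id": 1, "channel": "ads"} (hypothetical shape).
    key: function mapping a user to its cohort (signup week, channel, ...).
    returned: set of ids of users who came back in the period studied.
    """
    totals, kept = {}, {}
    for u in users:
        k = key(u)
        totals[k] = totals.get(k, 0) + 1
        kept[k] = kept.get(k, 0) + (u["id"] in returned)
    return {k: kept[k] / totals[k] for k in totals}
```

For example, `retention_by(users, lambda u: u["channel"], returned)` gives one retention rate per acquisition channel; running it once per period turns it into the longitudinal view.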

If people are interested in finding out more about cohort analyses, we wrote a
set of 7 articles which might be interesting:

<http://www.keplarllp.com/blog/2012/04/cohort-analyses-for-digital-businesses-an-overview>

And here's the hands-on tutorial for rolling your own cohort analyses in SQL
with SnowPlow:

<http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html>

------
jakestein
This seems very cool. We (www.rjmetrics.com) provide a hosted tool to get
metrics like cohort analysis out of the data already in your database. In
addition to cohorts, you can also get metrics on time between
events, repeat event rates, and segmented lists of lapsed users.
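Those three metrics are simple to compute once events are grouped per user. A rough Python sketch, assuming events come as `(user_id, date)` pairs; the 30-day lapse threshold is an arbitrary default, not anything RJMetrics defines.

```python
from datetime import date, timedelta

def event_stats(events, today, lapse_after=timedelta(days=30)):
    """Gaps between events, repeat-event rate, and lapsed users.

    events: list of (user_id, date) pairs, e.g. purchases.
    A user is a repeater with 2+ events, and lapsed when their latest event
    is more than `lapse_after` before `today`.
    """
    by_user = {}
    for user, day in events:
        by_user.setdefault(user, []).append(day)
    gaps, repeaters, lapsed = [], 0, []
    for user, days in by_user.items():
        days.sort()
        gaps.extend((b - a).days for a, b in zip(days, days[1:]))  # days between events
        repeaters += len(days) > 1
        if today - days[-1] > lapse_after:
            lapsed.append(user)
    repeat_rate = repeaters / len(by_user) if by_user else 0.0
    return gaps, repeat_rate, sorted(lapsed)
```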

We've got a lot of different visualization options, and we can consolidate
multiple data sources (e.g. Mongo, MySQL, spreadsheets, Google Analytics) into
a single reporting portal.

------
bitsweet
Interesting, but doesn't MixPanel do this out of the box?

~~~
nate
Yep, and it's a great tool. But I was in a situation where I already had an
app whose data had never gone through Mixpanel. In fact, I've got a bunch of
apps like that. And I was irritated that there wasn't an easy way to just use
the data I already have without having to get into Excel and pivot tables.

I also didn't want to integrate with another API right now and figure out what
my "events" are, when those events already exist in my database.
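The point about events already existing in the database can be made concrete: any table with a user id and a creation timestamp already records an event. A minimal sketch that merges such rows into one chronological stream, assuming the rows have been loaded as dicts with `user_id` and `created_at` keys (hypothetical here, though Rails migrations commonly add `created_at` through `t.timestamps`).

```python
from datetime import date

def events_from_tables(tables):
    """Merge existing timestamped rows into one (user_id, timestamp, event) stream.

    tables: {event_name: rows}, where each row is a dict with "user_id" and
    "created_at" keys (hypothetical column names).
    """
    stream = [
        (row["user_id"], row["created_at"], name)
        for name, rows in tables.items()
        for row in rows
    ]
    stream.sort(key=lambda e: (e[1], str(e[0])))  # chronological, ties by user
    return stream
```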

~~~
whalesalad
The more I think about it, the more sense an entirely event-based app makes
(these days). I'm not sure what the design methodology is called, but there's
a concept where an app's state is determined purely by its events, so without
the history, the current state is lost. But a lot of benefits come from this.
For example, you can essentially replay your app from day one to any later
point in time.
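A toy illustration of that replay property, in Python: the state (here, the set of registered users) is never stored directly, only derived by applying events in order, so stopping the replay at an earlier date reconstructs the state as of that date. The `Event` type and its `kind` values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    day: date
    kind: str      # hypothetical event types: "registered", "deactivated"
    user_id: int

def replay(events, until=None):
    """Rebuild the set of active users by applying events in date order.

    Passing `until` stops the replay after that date, reconstructing a past state.
    """
    active = set()
    for e in sorted(events, key=lambda e: e.day):
        if until is not None and e.day > until:
            break
        if e.kind == "registered":
            active.add(e.user_id)
        elif e.kind == "deactivated":
            active.discard(e.user_id)
    return active
```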

Your comment made me think of this. I could easily have ended up in a similar
scenario recently, but fortunately I built some internal event tracking early
on. It's also a Rails app, so every action is tracked using some simple
observers. There are subclasses of Event for types like UserRegistrationEvent,
etc. Long story short, I can essentially replay history using these events
and build, for example, notifications for those events or Mixpanel tracking
data, retroactively.
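The retroactive part can be sketched as a small publish/subscribe log, in Python rather than Rails observers: every event is appended to a log, and a handler added later can be run over the stored history to backfill, for example, notifications. `EventBus`, `subscribe`, and the `backfill` flag are hypothetical names; `UserRegistrationEvent` mirrors the example in the comment.

```python
from dataclasses import dataclass

@dataclass
class Event:
    user_id: int

@dataclass
class UserRegistrationEvent(Event):
    email: str

class EventBus:
    """Hypothetical event log with subscribers, keyed by event class."""

    def __init__(self):
        self.log = []        # every event ever published, in order
        self.handlers = {}   # event class -> list of callables

    def subscribe(self, event_type, handler, backfill=False):
        self.handlers.setdefault(event_type, []).append(handler)
        if backfill:
            # Replay stored history through the new handler (the retroactive part).
            for event in self.log:
                if isinstance(event, event_type):
                    handler(event)

    def publish(self, event):
        self.log.append(event)
        for event_type, handlers in self.handlers.items():
            if isinstance(event, event_type):  # subclasses match parent handlers
                for handler in handlers:
                    handler(event)
```

Because dispatch uses `isinstance`, a handler subscribed to the base `Event` class sees every subclass, which is roughly how a catch-all tracking hook would attach.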

It was a fortunate design decision and has ended up really showing its value a
number of times. Push notifications hook onto these, tracking credits for our
game component, etc...

Edit: I think the concept I was referring to is called Event Sourcing.

Some resources: <http://martinfowler.com/eaaDev/EventSourcing.html>

    
    
        Event Sourcing ensures that all changes to application
        state are stored as a sequence of events. Not just can 
        we query these events, we can also use the event log to
        reconstruct past states, and as a foundation to automatically
        adjust the state to cope with retroactive changes.
    

And also:
<http://krasserm.blogspot.se/2011/11/building-event-sourced-web-application.html>

~~~
steveklabnik
> So every action is tracked using some simple observers.

PSA, as of Rails 4, observers have been extracted to a plugin:
<https://github.com/rails/rails-observers>

------
abuiles
Nice job :) I've been in the same spot. Looking forward to trying this out.

