Cohort analysis - User retention in a Rails application (ninjasandrobots.com)
41 points by revorad on Feb 19, 2013 | 21 comments

At a former Facebook internship, I worked for the insights team. Think Google Analytics for Facebook Pages, Apps, etc.

One of the things I built that term was a tool to visualize user retention, implementing triangle heatmaps, which I believe were invented in-house by Danny Ferante.

The idea here is to exploit the very quick visual pattern matching we're able to do as humans, and turn that into actionable cohort analysis. From the screenshot in the article (http://i.imgur.com/qBbkZv8.png), I think we can agree that it would become unwieldy with a large number of datapoints.

Compare this to http://imgur.com/sOQ4vrm, a screenshot of the triangle heatmap generated for tcreech's Cover Photo Finder Facebook App. The x-axis represents the cohort (the set of users that installed the app on the same day), broken down by day rather than by week as in the article (hence the patterns are more granular). The y-axis represents the number of days after installation. Each datapoint is then coloured to represent the percentage of users that return to the app on (installation date + number of days).

A number of patterns are captured quite easily:

- A vertical pattern is local to a specific cohort. A new promotion or redesigned sign up page often results in this.

- A horizontal pattern is local to a specific vintage. If your app has a trial period that expires after 7 days, then you'll see your retention plummet across all cohorts horizontally at y=7.

- A diagonal pattern is local to a specific date. If your app is down on January 2nd, then there will be a diagonal blue line (0%) across all cohorts.
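For anyone who wants to play with the idea, here is a minimal sketch of how the numbers behind such a triangle could be computed. It is not the Facebook tool; the function name, the input shapes, and the sample data are all made up for illustration. Each cohort gets one list, indexed by days since installation, holding the fraction of that cohort seen on that day.

```python
from collections import defaultdict
from datetime import date


def retention_triangle(installs, visits, as_of):
    """installs: {user_id: install_date}; visits: iterable of (user_id, visit_date).

    Returns {cohort_date: [fraction of the cohort seen on day 0, 1, 2, ...]}.
    All names and shapes here are hypothetical, chosen for the example.
    """
    cohort_sizes = defaultdict(int)
    for install_date in installs.values():
        cohort_sizes[install_date] += 1

    # Distinct users seen per (cohort, days since install).
    seen = defaultdict(set)
    for user_id, visit_date in visits:
        install_date = installs.get(user_id)
        if install_date is None or not (install_date <= visit_date <= as_of):
            continue
        age = (visit_date - install_date).days
        seen[(install_date, age)].add(user_id)

    triangle = {}
    for cohort, size in sorted(cohort_sizes.items()):
        if cohort > as_of:
            continue
        # Newer cohorts have fewer observable days, which gives the triangle shape.
        max_age = (as_of - cohort).days
        triangle[cohort] = [
            len(seen[(cohort, age)]) / size for age in range(max_age + 1)
        ]
    return triangle


# Made-up data: nobody visits on Jan 3 (think "the app was down").
installs = {"a": date(2013, 1, 1), "b": date(2013, 1, 1), "c": date(2013, 1, 2)}
visits = [
    ("a", date(2013, 1, 1)), ("b", date(2013, 1, 1)),
    ("a", date(2013, 1, 2)), ("c", date(2013, 1, 2)),
    ("a", date(2013, 1, 4)), ("b", date(2013, 1, 4)), ("c", date(2013, 1, 4)),
]
triangle = retention_triangle(installs, visits, as_of=date(2013, 1, 4))
```

In this toy data the Jan 1 cohort comes out as [1.0, 0.5, 0.0, 1.0] and the Jan 2 cohort as [1.0, 0.0, 1.0]. The zero sits at day 2 for the first cohort and day 1 for the second, i.e. the same calendar date, which is exactly the diagonal pattern.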

I wrote up a work term report for the University of Waterloo detailing triangle heatmaps: http://zeroindexed.com/triangle.pdf

Video released by Facebook explaining triangle heatmaps: https://www.facebook.com/video/video.php?v=3707283286197

Thanks for all this. I need to spend some time parsing it.

One thing that CohortMe is doing that keeps the thing from being unwieldy is that I only go back 12 periods. So only 12 weeks, or 12 days or 12 months.

Not a perfect solution, but it's version 0.0.1 :) I really only wanted to see 12 weeks right now anyways. Until I have some decent data going into the months.
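To make the "only go back 12 periods" idea concrete, here is a rough sketch of computing the start dates of the last 12 days, weeks, or months. This is not CohortMe's code; the function name, the choice to count the current period as one of the 12, and starting weeks on Monday are all assumptions.

```python
from datetime import date, timedelta


def recent_period_starts(today, period="week", count=12):
    """Start dates of the last `count` periods, oldest first.

    Hypothetical helper: the current (partial) period counts as one of them.
    """
    if period == "day":
        return [today - timedelta(days=i) for i in reversed(range(count))]
    if period == "week":
        # Assumes weeks start on Monday.
        monday = today - timedelta(days=today.weekday())
        return [monday - timedelta(weeks=i) for i in reversed(range(count))]
    if period == "month":
        starts = []
        year, month = today.year, today.month
        for _ in range(count):
            starts.append(date(year, month, 1))
            month -= 1
            if month == 0:
                year, month = year - 1, 12
        return starts[::-1]
    raise ValueError(f"unknown period: {period!r}")


weeks = recent_period_starts(date(2013, 2, 19), "week")
months = recent_period_starts(date(2013, 2, 19), "month")
```

Any signup older than the first returned date would simply be left out of the report, which keeps the table at a fixed, readable size.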

Oh I understand completely. It's just that I think triangle heatmaps are awesome visualizations that nobody's heard of. They were my entire life for 4 months and I can't stop talking about them given the opportunity :)

That's awesome. I'll check them out. Funny that you mention not being able to stop talking about them. I keep asking anyone who'll listen to me about cohort analysis :) to see if they want to geek out with me and talk stats and user retention. So far, not a very good hit rate amongst friends.

This looks cool and I'm excited to try it in a Rails project.

But, mainly, this post highlights my frustration with most analytics tools. I have tried MixPanel and I'm currently paying $99/mo for Kiss Metrics (about to cancel). Frankly, I don't have the time to get neck-deep into one of these services, integrate it, and figure out how to turn the data into revenue-generating actions. And that's setting aside the risk the one I choose gets acqui-hired and shut down.

Maybe I'm asking for magic beans or maybe my products aren't the best fit, but I'm a hacker and I want to do less work, not more. For now, I'll stick to Google Analytics, some basic A/B testing, looking at data via the console, and emailing with my users.

It's pretty sad that this blog post does a better job at explaining what a "cohort" is than Kiss Metrics can [1].

[1] http://support.kissmetrics.com/#stq=cohort&stp=1

Yeah, me too man. There's also some frustration over here about what these analytics actually mean. I have so much data coming at me from everywhere now; what exactly is it telling me? One neat presentation that helps provide some focus is Dave McClure's Startup Metrics for Pirates. http://www.slideshare.net/dmc500hats/startup-metrics-4-pirat...

I've seen this a bunch of times, but I like that focus on Activation and Retention for new startups.

Ash Maurya's Lean Startup book is interesting in that it tries to quantify when you can be ready to launch a new product: when you have 40% user retention. But personally that seems high to me if your early users are full of random joes who've signed up to some email capture form that promises rainbows but they don't really know what they signed up for.

I agree with both of you. I've been working for an incubator for the past 2 years building more than 6 startups and tried KISSmetrics on all of them. In the end we would code our own analytics into the app or keep a text file of SQL queries we'd run once a week to get some real insights/learnings.

That's what led me to create Storyberg. I don't want this to be a plug, but one idea we're playing with is Release Cohorts. Instead of just looking at new users in cohorts, I believe we should also look at existing active users and existing inactive users, and group them based on your release cycle (given the users are experiencing a common feature set).

This comes from the idea that features are released for one of three reasons: 1) Improve new user activation, 2) Continue to engage existing users and 3) Reactivate existing inactive users.
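A rough sketch of how that three-way split around a release could look. This is not Storyberg's implementation; the function name, the labels, and the 30-day activity window are assumptions picked for the example.

```python
from datetime import date, timedelta


def release_cohort(signup_date, last_active_before_release, release_date,
                   active_window_days=30):
    """Classify one user relative to a release (hypothetical rules).

    - signed up on or after the release          -> "new"
    - older user, active within the window       -> "existing_active"
    - older user, not active within the window   -> "existing_inactive"
    """
    if signup_date >= release_date:
        return "new"
    cutoff = release_date - timedelta(days=active_window_days)
    if last_active_before_release is not None and last_active_before_release >= cutoff:
        return "existing_active"
    return "existing_inactive"


release = date(2013, 2, 1)
labels = [
    release_cohort(date(2013, 2, 3), None, release),
    release_cohort(date(2012, 11, 1), date(2013, 1, 20), release),
    release_cohort(date(2012, 11, 1), date(2012, 12, 1), release),
]
```

Each release then yields three cohorts whose later activity can be tracked separately, one per reason for shipping a feature.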

Ya I was in the same position. I was going to give Mixpanel a try. They are doing a presentation in the area in a week or two if anyone is interested. http://www.meetup.com/SF-Growth-Hackers/events/106188332/

I'd love to see someone build cycle plots into this tool.

My real frustration with Cohort analyses is that the results are always presented in this arcane format that doesn't actually answer any questions.

The question a cohort analysis is supposed to answer is "Is our retention improving?" and to get that answer you're supposed to eyeball rows & columns of text and make inferences (and then go ahead and speculate about some actions).

Here's a better way: http://insideintercom.io/retention-cohorts-and-visualisation...

I agree that the common presentation of cohort data is clunky. That's why I built this tool with d3js to visualize cohorts as stacked bars:


"Impact maps" are too blurry in my experience; they assume your traffic is normal and predictable. Cycle plots are fun to look at, but in practice the lines are too volatile to make a pretty chart.

"Dammit Jim, I'm a Doctor Not A Miracle Worker!"

Me too! It's in the plans if I get around to it. Didn't have time to get it done for v0.0.1 but yeah, I want them too.

This is a cool open source tool, many thanks for sharing, nate. User activation date is the most well-known cohort definition, but you can actually define a cohort out of pretty much any data point you like, provided it makes sense to then run a longitudinal study on those cohorts. Marketing acquisition channel is another good one.
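As a small illustration of cohorts defined by something other than activation date, here is a sketch that groups users by an arbitrary attribute and reports the fraction active on a given day after signup. The record layout ("channel", "active_days") and the function name are invented for the example and are not SnowPlow's schema.

```python
from collections import defaultdict


def retention_by(users, attribute, day):
    """Fraction of each cohort (grouped by `attribute`) active on `day` after signup.

    users: list of dicts with the grouping attribute and an "active_days" set
    of day offsets since signup (hypothetical layout).
    """
    totals = defaultdict(int)
    returned = defaultdict(int)
    for user in users:
        cohort = user[attribute]
        totals[cohort] += 1
        if day in user["active_days"]:
            returned[cohort] += 1
    return {cohort: returned[cohort] / totals[cohort] for cohort in totals}


users = [
    {"channel": "search", "active_days": {0, 1, 7}},
    {"channel": "search", "active_days": {0}},
    {"channel": "email", "active_days": {0, 7}},
    {"channel": "ads", "active_days": {0, 2}},
]
day7 = retention_by(users, "channel", day=7)
```

With this toy data, day-7 retention is 0.5 for search, 1.0 for email, and 0.0 for ads. Swapping "channel" for any other attribute gives a different cohort definition without touching the rest.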

If people are interested in finding out more about cohort analyses, we wrote a set of 7 articles which might be interesting:


And here's the hands-on tutorial for rolling your own cohort analyses in SQL with SnowPlow:


This seems very cool. We (www.rjmetrics.com) provide a hosted tool to get metrics like cohort analysis out of the data you've already got on your database. In addition to cohorts, you can also get metrics on time between events, repeat event rates, and segmented lists of lapsed users.

We've got a lot of different visualization options, and we can consolidate multiple data sources (e.g. Mongo, MySQL, spreadsheets, Google Analytics) into a single reporting portal.

Interesting but MixPanel does this out of the box, no?

Yep, and it's a great tool. But I was in a situation where I already have an app whose data was never sent to Mixpanel. In fact I've got a bunch of apps like that. And I was irritated that there wasn't an easy way to just use the data I already have without having to get into Excel and pivot tables.

I also didn't want to integrate with another API right now and figure out what my "events" are, when those events already exist in my database.

The more I think about it, the more an entirely event based app makes a lot of sense (these days). I'm not sure what the design methodology is called but there's a concept where an app's state is essentially determined purely based on events. So without the history, current state is lost. But... A lot of benefits come from this. For example, you can essentially replay your app from day one to any point later on in time.

Your comment made me think of this. I would recently have been in a similar scenario, but fortunately I built some internal event tracking early on. It's also a Rails app, so every action is tracked using some simple observers. There are subclasses of Event for types like UserRegistrationEvent, etc. Long story short, I can essentially replay history by using these events and build, for example, notifications for those events or Mixpanel tracking data, retroactively.

It was a fortunate design decision and has ended up really showing its value a number of times. Push notifications hook onto these, tracking credits for our game component, etc...
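To show what "replaying history" means in practice, here is a toy sketch in Python rather than Rails. It is not the app described above; the Event fields, the event kinds, and the credits state are all hypothetical. State is never stored directly, it is rebuilt by folding over the event log, optionally stopping at a point in time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    at: datetime
    kind: str        # hypothetical kinds: "user_registered", "credits_awarded"
    user_id: str
    amount: int = 0


def replay(events, until=None):
    """Rebuild {user_id: credits} by applying events in time order.

    If `until` is given, events after that moment are ignored, which
    reconstructs the state as it was at that time.
    """
    state = {}
    for event in sorted(events, key=lambda e: e.at):
        if until is not None and event.at > until:
            break
        if event.kind == "user_registered":
            state[event.user_id] = 0
        elif event.kind == "credits_awarded":
            state[event.user_id] = state.get(event.user_id, 0) + event.amount
    return state


log = [
    Event(datetime(2013, 1, 1, 9), "user_registered", "alice"),
    Event(datetime(2013, 1, 2, 9), "credits_awarded", "alice", 10),
    Event(datetime(2013, 1, 3, 9), "user_registered", "bob"),
    Event(datetime(2013, 1, 4, 9), "credits_awarded", "alice", 5),
]
now = replay(log)
earlier = replay(log, until=datetime(2013, 1, 2, 12))
```

Here `now` is {"alice": 15, "bob": 0} and `earlier` is {"alice": 10}. The same loop could just as well emit tracking calls or notifications per event instead of updating a dict, which is the retroactive part.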

Edit: I think the concept I was referring to is called Event Sourcing.

Some resources: http://martinfowler.com/eaaDev/EventSourcing.html

    Event Sourcing ensures that all changes to application
    state are stored as a sequence of events. Not just can 
    we query these events, we can also use the event log to
    reconstruct past states, and as a foundation to automatically
    adjust the state to cope with retroactive changes.

And also: http://krasserm.blogspot.se/2011/11/building-event-sourced-w...

> So every action is tracked using some simple observers.

PSA, as of Rails 4, observers have been extracted to a plugin: https://github.com/rails/rails-observers

That sounds pretty awesome. It kind of reminded me of doing operational transforms on collaborative documents.

FYI: You could've just dumped that raw log data into Mixpanel and we would've given you an incredibly beautiful cohort report. We have an API:


Since you already know the events in your DB, you could've just named them whatever they were. It would've been a smaller script to just import the data into Mixpanel.

Nice job :) I've been in the same spot. Looking forward to trying this out.
