

How to Model Viral Growth: Retention and Virality Curves - rahulvohra
http://www.linkedin.com/today/post/article/20130402154324-18876785-how-to-model-viral-growth-retention-virality-curves

======
idoh
Definitely an interesting article and one worth reading. I work a lot on viral
growth and I'd like to add a counterpoint that the growth of apps is so
complicated that it is basically impossible to model in a useful way.

I've found that even if I make no changes to an app, retention and virals
fluctuate quite a bit for no apparent reason, and the fluctuations are big
enough to make long-term forecasting more guesswork than anything else.

Also, there are second-order effects that are hard to model. For
instance, improving virals can improve retention (user A invites friend B,
user A stays for longer because their friend uses it).

I've gone through the process of modeling a couple apps, and it quickly gets
to a point where the relationships become circular and small variations cause
exponential differences down the line.
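The compounding point can be illustrated with a toy model. This is not the article's model. It assumes that the users who join in one round bring in K times as many new users in the next round, and that nobody churns. The function name `cumulative_users` and the K values are hypothetical.

```python
def cumulative_users(seed: float, k_factor: float, cycles: int) -> float:
    """Total users after `cycles` invitation rounds in a toy geometric model.

    Hypothetical model: the users who joined in one round bring in
    k_factor times as many new users in the next round, and nobody churns.
    """
    total = seed
    new = seed
    for _ in range(cycles):
        new *= k_factor  # users recruited in this round
        total += new
    return total


if __name__ == "__main__":
    # Three hypothetical viral factors, 10% apart, starting from 1,000 users.
    for k_factor in (0.9, 1.0, 1.1):
        total = cumulative_users(1_000, k_factor, cycles=20)
        print(f"K = {k_factor:.1f}: {total:,.0f} users after 20 cycles")
```

Over the same 20 cycles, K = 0.9 levels off below 10,000 users, while K = 1.1 passes 60,000. A 10% wobble in one parameter compounds into a several-fold difference.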

It is important to make informed decisions about virals and retention, but I
don't think such a model is the way to do it. I think it is more important to
think about optionality and decision making in opaque environments rather than
trying to model the unmodelable.

------
adolgert
This multiplicative kind of model is happily amenable to time series analysis,
so you can do stats to see what your numbers are and how well they fit. That's
great. What's less great is the model quality, given that well-tested virality
models can be found in other venues. Coffman looks at this, for instance, at
<http://datacommunitydc.org/blog/2013/01/better-science-of-viral-marketing/>.
The difference between these two types of models is that symmetries in the
statement of the problem permit, or exclude, classes of solutions. Those
symmetries come from assumptions about the contact graph, the most basic (and
testable) assumption.

------
richardjordan
This is great stuff. I see so many startups get so excited about features and
growth yet fail in their analysis of retention. Maybe I'm a bit on the data-
nerd side but I love to see folks sharing their own methods for tracking and
calculating this stuff. Even with so many startups basing their model on
recurring revenue, it's still easy to trip up on modeling this stuff going
forward.

~~~
Mahn
> so many startups get so excited about features and growth yet fail in their
> analysis of retention

I think it's just that we write less about it, not that we aren't aware of
its importance. "How we managed to retain our users for 4 months" admittedly
sounds less sexy than "How we got a bazillion users in less than 72
hours", but the truth is a tech startup with no strong retention strategy is
basically dead in the water, and generally folks know this.

------
graycat
Here's a different approach:

We denote time by t with units, say, days.

The number of customers at time t is the (real-valued function of a real
variable) y(t).

We assume that the present is t = 0 and that we know y(0), that is, the
current number of customers.

We let the number of customers who will ever try our business be b. That is, b
is our intended 'market potential'.

Initially we assume that once we get a person as a customer, we do not ever
lose them but keep them forever.

As usual, we let y'(t) = dy(t)/dt be the calculus first derivative of y(t).
Then y'(t) is the number of new customers per day, that is, the 'rate' at which
we gain customers.

For 'virality' we notice that this rate is proportional to (1) the number of
customers y(t) we have 'talking' about our business and (2) the number of
people b - y(t) yet to be our customers hearing the talking.

Then we have that for some constant of proportionality k

    y'(t) = k y(t) (b - y(t))

So we have an initial value problem (that is, we know y(0)) for a first-order
(we use only the first derivative) ordinary (no partial derivatives)
differential equation.

Then from calculus (separating variables and integrating by partial fractions),

    y(t) = y(0) b exp(bkt) /
           ( y(0) (exp(bkt) - 1) + b )

So this solution grows (1) initially slowly, (2) then more rapidly, (3) then
more slowly and approaches b asymptotically from below.
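The snippet below is a quick numerical sanity check of the closed form, sketched in Python, which is this edit's choice and not the commenter's tool. It compares the formula with a classical fourth-order Runge-Kutta integration of y'(t) = k y(t) (b - y(t)). The function names and parameter values are hypothetical.

```python
import math


def closed_form(t: float, y0: float, b: float, k: float) -> float:
    """y(t) = y(0) b exp(bkt) / (y(0) (exp(bkt) - 1) + b)."""
    e = math.exp(b * k * t)
    return y0 * b * e / (y0 * (e - 1.0) + b)


def rk4(t_end: float, y0: float, b: float, k: float, steps: int = 10_000) -> float:
    """Integrate y' = k y (b - y) from 0 to t_end with classical Runge-Kutta."""

    def f(y: float) -> float:
        return k * y * (b - y)

    h = t_end / steps
    y = y0
    for _ in range(steps):
        s1 = f(y)
        s2 = f(y + 0.5 * h * s1)
        s3 = f(y + 0.5 * h * s2)
        s4 = f(y + h * s3)
        y += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return y


if __name__ == "__main__":
    # Hypothetical values: 100 customers today, market of 10,000, bk = 0.05 per day.
    y0, b, k = 100.0, 10_000.0, 5e-6
    for t in (0, 50, 100, 150, 200, 300):
        print(f"day {t:3d}: closed form {closed_form(t, y0, b, k):8.1f}, "
              f"RK4 {rk4(t, y0, b, k):8.1f}")
```

The two columns should agree closely. The printed values show the slow, fast, slow shape. Growth is fastest where y(t) = b/2, since k y (b - y) peaks there.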

If we lose some customers forever at some rate r, then we get the same
solution except that k and b get adjusted.
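One simple way to read "lose customers at rate r" is to assume each customer leaves with probability r per day, while the pool of potential customers is still b - y(t). Under that assumption y' = k y (b - y) - r y = k y ((b - r/k) - y). That is the same equation with b replaced by b - r/k, and the exponent bk becomes bk - r. Other churn models could adjust the constants differently. The sketch below checks the algebra numerically; the helper names are hypothetical.

```python
def growth_with_churn(y: float, b: float, k: float, r: float) -> float:
    """Right-hand side k y (b - y) - r y: viral growth minus churn at rate r (assumed model)."""
    return k * y * (b - y) - r * y


def adjusted_params(b: float, k: float, r: float) -> tuple[float, float]:
    """Return (b_eff, k_eff) with k y (b - y) - r y == k_eff y (b_eff - y) for all y."""
    return b - r / k, k


if __name__ == "__main__":
    # Hypothetical numbers: market 10,000, k = 5e-6, 1% of customers lost per day.
    b, k, r = 10_000.0, 5e-6, 0.01
    b_eff, k_eff = adjusted_params(b, k, r)
    print(f"effective market b - r/k = {b_eff:,.0f}, "
          f"exponent rate b_eff * k = {b_eff * k_eff:.3f}/day")
    for y in (100.0, 2_000.0, 7_000.0, 9_000.0):
        # Both right-hand sides agree, so the closed form applies with b_eff.
        print(y, growth_with_churn(y, b, k, r), k_eff * y * (b_eff - y))
```

With these numbers, the 1%-per-day churn shrinks the effective market from 10,000 to 8,000. In this reading the customer base grows only if b - r/k exceeds y(0).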

Once there was a struggling startup (now a major company) whose investors
included a major company that held a Board seat and had two representatives
at the startup, one in finance and the other in aeronautical engineering.

The two representatives had asked for some revenue growth projections.

People around the HQ considered what the startup hoped, intended, thought
might happen, etc., but found nothing credible.

One guy who remembered calculus reluctantly got involved, formulated and
solved the differential equation above, and showed the solution to a Senior VP
of Planning (SVP) who reported to the founder, CEO, COB. The SVP was
responsible for the projections. The SVP took the guy's calculus solution as
the basis of the projections and on a Friday sat with the guy with a pocket
calculator and some graph paper and graphed solutions to the differential
equation for selected values of the constant k and picked one of the solutions
as the official projection.

The next day, Saturday, at about noon, the guy was in his office working on
some other math problems when he got a call from a person asking whether he
knew about the projections for the Board and whether he could come over to
HQ. Sure. When the guy arrived, the situation was grim: the two
representatives of the major Board member were standing in the hall with
their bags packed and airline tickets back to Texas. The startup was about to
die.

The SVP was traveling and out of town.

The person who had called got the graph of projections from the previous day
and asked the guy to reproduce a point on the graph. Using the calculator, the
solution above, and a few keystrokes, he reproduced the point. After several
more points were reproduced, the mood improved; the two representatives on
the Board stayed, and the startup was saved.

Later the person who had called explained that there had been a Board meeting
that Saturday, the growth projection graph was shown, and the two
representatives had asked how the projections were calculated. The rest of
the company tried to reproduce the graph but could not. The Board meeting
stopped. The two representatives lost patience with the startup, got airline
tickets back to Texas, returned to their rented rooms, packed their bags, and
as a last chance returned to the startup to see if there was an answer to how
the projections were calculated.

Ah, one saved startup! One reason to take calculus seriously!

~~~
graycat
Note that with this derivation, if we accept the assumptions (which obviously
do not always hold), then all there is to 'viral' growth is three numbers: the
current number of customers y(0), the eventual number of customers b, and the
constant k. This also holds when some customers leave and never come back
(just by some adjustments in b and k).

For k, we might fit to past data. For given y(0) and b, all k does is adjust
how fast the curve rises to the asymptote. So basically all we are doing is
interpolating between y(0) and b.
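One way to fit k when y(0) and b are taken as given is to linearise the closed-form solution. Rearranging it gives ln(y/(b - y)) - ln(y(0)/(b - y(0))) = b k t, so the slope through the origin, divided by b, estimates k. This is a least-squares sketch of that idea, not a method from the comment. The function name `fit_k` is hypothetical, and the data in the example are synthetic.

```python
import math
from collections.abc import Sequence


def fit_k(times: Sequence[float], counts: Sequence[float], y0: float, b: float) -> float:
    """Least-squares estimate of k in y' = k y (b - y), given y(0) = y0 and b.

    Uses ln(y / (b - y)) - ln(y0 / (b - y0)) = b k t (from the closed form)
    and fits the slope b k by least squares through the origin.
    """
    z0 = math.log(y0 / (b - y0))
    num = den = 0.0
    for t, y in zip(times, counts):
        if not 0.0 < y < b:
            raise ValueError("each observed count must lie strictly between 0 and b")
        num += t * (math.log(y / (b - y)) - z0)
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one observation with t != 0")
    return num / (den * b)


if __name__ == "__main__":
    # Synthetic data only: counts generated from the closed form with k = 5e-6,
    # then fitted back. Real counts would be noisy and the fit rougher.
    b, y0, true_k = 10_000.0, 100.0, 5e-6
    days = [10, 20, 30, 40, 50]
    counts = []
    for t in days:
        e = math.exp(b * true_k * t)
        counts.append(y0 * b * e / (y0 * (e - 1.0) + b))
    print(f"fitted k = {fit_k(days, counts, y0, b):.3e}")
```

On noise-free synthetic data the fit recovers k almost exactly. With real data, errors in the assumed b feed directly into the estimate.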

Otherwise, all viral curves are the same.

So, an advantage of my derivation is a simple, explicit equation for a fairly
general solution.

The article has a comment claiming that biology addresses a similar problem
and gets a 'logistic' curve. The comment didn't say just what was meant by a
logistic curve, but I suspect that my solution here is an example. If so, then
here we have an 'axiomatic' derivation of the logistic curve.
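Algebraically the suspicion checks out. Divide the numerator and denominator of the solution by y(0) exp(bkt). This gives the standard logistic form y(t) = b / (1 + exp(-bk (t - t_mid))), where t_mid = ln((b - y(0)) / y(0)) / (bk) is the time at which y = b/2. The Python check below uses hypothetical parameter values and function names.

```python
import math


def closed_form(t: float, y0: float, b: float, k: float) -> float:
    """The solution above: y(0) b exp(bkt) / (y(0) (exp(bkt) - 1) + b)."""
    e = math.exp(b * k * t)
    return y0 * b * e / (y0 * (e - 1.0) + b)


def midpoint_time(y0: float, b: float, k: float) -> float:
    """Time at which y(t) = b/2; requires 0 < y0 < b."""
    return math.log((b - y0) / y0) / (b * k)


def logistic(t: float, y0: float, b: float, k: float) -> float:
    """Standard logistic curve b / (1 + exp(-bk (t - t_mid)))."""
    return b / (1.0 + math.exp(-b * k * (t - midpoint_time(y0, b, k))))


if __name__ == "__main__":
    y0, b, k = 100.0, 10_000.0, 5e-6  # hypothetical values
    t_mid = midpoint_time(y0, b, k)
    print(f"midpoint at day {t_mid:.1f}, y = {closed_form(t_mid, y0, b, k):.1f}")
    for t in (0.0, 50.0, 100.0, 200.0):
        print(t, closed_form(t, y0, b, k), logistic(t, y0, b, k))
```

With these values the midpoint falls near day 92, and the two functions agree to rounding error at every t.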

It is true that the growth of some products, e.g., TV sets, looks to the eye
very much like one of the curves from my solution for selected values of y(0),
b, and k.

We could also make a Markov assumption: assume that we get new customers (and,
if we wish, lose old customers) at some 'rates' and, thus, get a
continuous-time, discrete-state-space Markov process. Then, as is well known,
the solution is a matrix exponential. We could evaluate the matrix exponential
or just use Monte Carlo to generate a few thousand sample paths. Then we could
put some confidence limits on the deterministic solution.
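The Monte Carlo route can be sketched with only the Python standard library. The matrix exponential could also be evaluated for modest b, e.g. with SciPy's `scipy.linalg.expm`, but that is not done here. The sketch makes several assumptions. It uses a pure-birth chain in which, from n customers, the next one arrives after an exponential wait with rate k n (b - n). That rate is the stochastic counterpart of the differential equation. Churn is left out, and all names and values are hypothetical.

```python
import math
import random
from statistics import mean, quantiles


def customers_at(t_end: float, n0: int, b: int, k: float, rng: random.Random) -> int:
    """Customer count at time t_end along one simulated sample path.

    Assumed model: pure-birth continuous-time Markov chain; from state n the
    next customer arrives after an Exp(k * n * (b - n)) waiting time.
    """
    t, n = 0.0, n0
    while 0 < n < b:
        t += rng.expovariate(k * n * (b - n))
        if t > t_end:
            break
        n += 1
    return n


def monte_carlo(t_end: float, n0: int, b: int, k: float,
                paths: int = 1_000, seed: int = 1) -> tuple[float, float, float]:
    """Mean and central 95% band of the customer count at t_end over many paths."""
    rng = random.Random(seed)
    values = [customers_at(t_end, n0, b, k, rng) for _ in range(paths)]
    cuts = quantiles(values, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return mean(values), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Hypothetical values: 10 customers, market of 1,000, bk = 0.05 per day.
    n0, b, k, t_end = 10, 1_000, 5e-5, 100.0
    avg, lo, hi = monte_carlo(t_end, n0, b, k)
    e = math.exp(b * k * t_end)
    deterministic = n0 * b * e / (n0 * (e - 1.0) + b)
    print(f"deterministic y({t_end:g}) = {deterministic:.0f}")
    print(f"Monte Carlo mean {avg:.0f}, 95% band [{lo:.0f}, {hi:.0f}]")
```

The band comes mostly from randomness in the first few arrivals, which shifts the whole curve earlier or later in time. Starting from more customers narrows it.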

Since no one guessed the war story, the startup was FedEx, the SVP was Mike
Basch, the CEO, of course, was Fred Smith, the person who called on the phone
was Roger Frock, and the investor was General Dynamics. The arithmetic was
courtesy of an HP-35. So, HP might run an ad saying how they saved FedEx!

------
jacques_chester
As others have pointed out, this can be modelled with calc pretty handily.

I found agent-based modelling much more interesting. For example, mean-field
models struggle with non-uniform spaces.
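Here is a toy illustration of the non-uniform-space point. It is not the linked Ruby code, whose internals aren't described here. The sketch assumes that each day every adopter makes 2c contacts and converts each non-adopting contact with probability p. The contacts are either the c nearest neighbours on each side of a ring, or drawn uniformly at random from the whole population, which is the well-mixed assumption behind mean-field equations like the logistic one. All names and parameters are hypothetical.

```python
import random


def ring_contacts(i: int, n: int, c: int) -> list[int]:
    """The c nearest neighbours on each side of agent i on a ring of n agents."""
    return [(i + d) % n for d in range(-c, c + 1) if d != 0]


def simulate(n: int, c: int, p: float, days: int, well_mixed: bool,
             seed: int = 0) -> list[int]:
    """Number of adopters at the end of each day in a toy agent-based model.

    Each day every adopter makes 2*c contacts and converts each non-adopting
    contact with probability p. Contacts are ring neighbours, or uniform
    random agents when well_mixed is True.
    """
    rng = random.Random(seed)
    adopted = [False] * n
    adopted[0] = True  # a single initial adopter
    history = [1]
    for _ in range(days):
        newly = set()
        for i in range(n):
            if not adopted[i]:
                continue
            if well_mixed:
                contacts = [rng.randrange(n) for _ in range(2 * c)]
            else:
                contacts = ring_contacts(i, n, c)
            for j in contacts:
                if not adopted[j] and rng.random() < p:
                    newly.add(j)
        for j in newly:
            adopted[j] = True
        history.append(sum(adopted))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: 2,000 agents, 4 contacts per adopter per day,
    # 30% conversion chance per contact.
    ring = simulate(2_000, 2, 0.3, 60, well_mixed=False)
    mixed = simulate(2_000, 2, 0.3, 60, well_mixed=True)
    for day in (10, 20, 40, 60):
        print(f"day {day:2d}: ring {ring[day]:5d}, well mixed {mixed[day]:5d}")
```

On the ring, adoption can spread at most c agents per side per day, so growth is at most linear. The well-mixed run shows the exponential-then-saturating S-curve. Same agents and same contact rate, but a different contact graph gives a qualitatively different curve.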

For the interest of persons here attending, I've uploaded my crappy code on
this topic. It's 3 years old and not production suitable.

<https://github.com/jchester/ruby-epidemic-model>

