

How to Build a Lean Startup, step-by-step - timothychung
http://www.oreillynet.com/pub/e/1294

======
swombat
Excellent talk. It repeats many of the points on Eric's blog, but they're all
very good points worth hearing again.

Here's my question, though. I'm really struggling with this one, and I think
Eric Ries is aware of HN, so I'd really love an informed answer (hint hint).

I run a start-up, <http://www.woobius.com>

We are getting reasonable traffic levels for this niche industry, but are
still at a fairly early stage. We do not have thousands of visitors or users
a day. Each user is influenced by all sorts of special circumstances, such as
whether they're in a company that we've been talking to, or whether they're an
architect, an engineer, or a project manager. As far as I can tell, they are
heterogeneous, each of them mostly unique.

Moreover, the line between signup and purchase is not so clear. My start-up's
product is project- and company-based. People might use it every day yet never
pay for it if one of their colleagues paid for it. That doesn't mean they're
not a happy customer, it just means that, for example, they're at a point in
their career where they're not directing projects or making purchasing
decisions.

Users also differ in their usage patterns. Some of them use our application to
send files; others only to receive or download them. Again, the users can be
sliced into many heterogeneous groups by which activities they favour.

In those conditions, I find it _extremely difficult_ to devise A/B experiments
that measure things against a productive end result ("$$$" to use the notation
in this presentation). We do measure and learn, but the way we do this is by
talking to our users, or standing over their shoulder and watching them use
the application.

*

I'd love to be able to implement a more scientific approach to testing out new
features, but it just doesn't seem practical to me, given the circumstances of
my start-up.

If I _don't_ slice the users into more homogeneous groups before doing the A/B
testing, the results will, imho, be flawed because there might easily be more
users of one kind in A than in B. If I _do_ slice them, I'll end up with
groups of 10-50 users, because of all those differences that I'll have to
slice for. With such small numbers, individual circumstances will, in my
opinion, have far more of an effect on usage patterns than whether or not I
add a button somewhere.
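To put a rough number on that worry: even a perfectly random 50/50 split of 30 users, 10 of whom share some trait, will frequently put a lopsided share of that trait into one arm. A quick simulation (the group sizes are illustrative, not from my actual user base):

```python
import random

random.seed(0)
trials = 10_000
imbalanced = 0
for _ in range(trials):
    # 30 users, 10 of whom share a trait, split randomly into two arms of 15
    users = [1] * 10 + [0] * 20
    random.shuffle(users)
    # "imbalanced" = arm A ends up with 3 or fewer, or 7 or more, of the 10
    if abs(sum(users[:15]) - 5) >= 2:
        imbalanced += 1
print(imbalanced / trials)  # roughly a quarter of random splits are lopsided
```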

*

So how do you apply this "A/B test every change" approach to such an
environment? Especially since we make many changes a day (though we deploy
only every few days); letting each change sit around for a week to accumulate
A/B users would severely slow down our progress.

Any advice would be most welcome.

~~~
eries
Thanks for the really thoughtful comment. Let me try and unpack your question
into a few parts, and answer each one separately.

First of all, the fundamental feedback loop doesn't require A/B tests. What
matters is that you act in a disciplined way to transform ideas into products,
measure what happens, and learn for the next set of ideas. Over time, you
should get faster at executing this feedback loop, not slower. A/B testing is
a great methodology, but not the only one. You might take a look at Net
Promoter Score (NPS) for example, as one alternate way of gauging customer
reaction to the changes you're making. If you look at the actual practice of
science, you'll notice that not all branches can do controlled
experimentation. In cosmology, for example, they have to rely on "natural
experiments" because they (so far) lack the tools to conduct experiments
involving large gravitational masses, etc. Subjective forms of data-
collection, like in-person interviews and usability tests, can provide
"validated learning about customers" if you are disciplined about it.
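For what it's worth, NPS is trivial to compute once you're surveying customers: ask "How likely are you to recommend us?" on a 0-10 scale, count 9-10 as promoters and 0-6 as detractors. A minimal sketch (the survey scores below are made up):

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# Hypothetical responses from 10 surveyed users:
# 5 promoters, 2 detractors -> NPS of 30
print(nps([10, 9, 9, 8, 7, 7, 6, 5, 9, 10]))  # 30.0
```

Track the score over time as you ship changes; the direction of movement matters more than the absolute number.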

Second, I'm not sure I agree that you can't do A/B split-tests in this
situation. The number of customers you have per day is not really relevant -
that only affects how long it takes you to get a statistically significant
result. You might need weeks to get enough customers through your 50/50 test
to get good data, but that doesn't necessarily mean that's a bad idea. It
might "slow you down" from the point of view of coding, but if it prevents you
from building a feature that nobody wants, that speeds you up much more.
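To make "how long it takes to get a statistically significant result" concrete, here's a rough two-proportion z-test in Python (the conversion counts below are hypothetical, just to show how sample size changes the picture at identical conversion rates):

```python
from math import sqrt, erf

def ab_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test on an A/B split."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# 25 users per arm: 20% vs 32% conversion is indistinguishable from noise...
print(ab_p_value(5, 25, 8, 25))        # p well above 0.05
# ...but the exact same rates with 500 users per arm are convincing.
print(ab_p_value(100, 500, 160, 500))  # p far below 0.05
```

The test says nothing until enough users flow through; it just tells you when you're allowed to stop waiting.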

In fact, I would work backwards from "what do I need to have in order to
validate my hypotheses" and then structure the rest of your business around
that. For example, you might not want to spend the dollars on AdWords to drive
traffic to your business at this stage, because the ROI is not high enough
yet. On the other hand, if increased traffic leads to rapid iteration which
leads to customer validation, that might be a good trade-off. Do what you have
to do to accelerate your learning.

Last, your fear about the differences in types of customers is important to
address. Start getting clear about the "customer archetype" you think is most
likely to use your product. What does a day in their life look like, for
example? Why are they crazy enough to use your early-stage product, instead of
the more rational thing which would be to buy from an established player?

If I had to guess, I would say that, most likely, your current customers all
have something pretty specific in common. Although they may have wildly
different demographics and usage patterns, it's likely that they are all early
adopters of ... something. Otherwise, they wouldn't be wasting their time with
your product. The more you understand what that is, the better you'll be able
to tailor your product to their needs. But, more importantly, this commonality
probably means your split-tests (and usability tests) have more validity than
you think.

One final note: just because you _can_ run split-tests on trivial changes (like
"whether or not I add a button somewhere") doesn't mean that you should use
them for that purpose. You can also run split-tests on big, meaty changes that
elicit a strong reaction from customers. And if you recall from basic
statistics, the incidence rate of the thing being measured is just as
important in determining significance as the sample size. Thus, if you're
split-testing important things, you can get a good result with much smaller
samples. And, as an early-stage startup, I'd maintain it's only worth testing
things that (you believe) have a large impact.
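That trade-off can be seen in the standard sample-size approximation for a two-proportion test at 5% significance and 80% power (the conversion rates below are invented for illustration):

```python
from math import ceil

def samples_per_arm(p_base, p_new, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per arm to detect p_base -> p_new
    at 5% two-sided significance with 80% power."""
    var = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * var / (p_base - p_new) ** 2)

# Detecting a small tweak (2% -> 2.5% conversion) takes thousands per arm...
print(samples_per_arm(0.02, 0.025))
# ...while a big, meaty change (10% -> 25%) needs well under a hundred.
print(samples_per_arm(0.10, 0.25))
```

In other words, with small traffic the only experiments worth running are the ones whose effects are big enough to see quickly.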

Does that help?

Further reading on the topic:
[http://startuplessonslearned.blogspot.com/search/label/split...](http://startuplessonslearned.blogspot.com/search/label/split-test)

[http://startuplessonslearned.blogspot.com/search/label/liste...](http://startuplessonslearned.blogspot.com/search/label/listening%20to%20customers)

[http://startuplessonslearned.blogspot.com/search/label/custo...](http://startuplessonslearned.blogspot.com/search/label/customer%20development)

------
dawie
I can't seem to view the webcast. Am I missing something?

