
Experiment, Measure, Repeat - ecaml
https://blog.buildo.io/experiment-measure-repeat-5e2c389e63ff
======
oftenwrong
"avoid maintaining useless things"

If your company is small, heed this warning. If you're spending a lot of time
on maintenance of non-essential things, and struggling with supporting legacy
things while creating new things, you are missing a huge opportunity. Right
now, "trimming the fat" is as easy as it will ever be. As your company grows,
it will only become more difficult. Be ruthless now with maintaining focus on
things that "move the needle", and with killing things that don't.

Analogy:

It's like you're gearing up for a long hike. There is a natural tendency to
take things along "just in case", even if you know you probably won't need
them. At the trailhead, you could easily leave some of that non-essential
stuff in the car. You try on your pack, and it doesn't feel that heavy. You
think "what's the harm?". Of course, you are still fresh and full of energy at
the trailhead.
A few days into your hike, you start to really feel the weight of that extra
stuff. The pack digs into your shoulders and hips. Your entire lower body is
sore. You regret bringing the extra stuff, but now that you are in the
wilderness, you cannot just dump it. You have to carry it back with you, and
you wish you had exercised restraint when you had the chance.

~~~
maxxxxx
I am working more and more on this. We have a ton of legacy code we drag
along because nobody understands it, so it's too scary to touch. I have
started to push the idea that it's simply not acceptable to have code we don't
understand. We have now refactored several parts that were painfully complex
and convoluted by analyzing what they really do and then rewriting or changing
them. Most of the time it is not that difficult once you commit to the task.

------
nnutter
They measured long enough in this specific example that they avoided the
problem, but something I see people repeatedly forget is to establish a
baseline before making a change. If you don't know how variable something is
before you make your change, you might naively think you made something better
or worse when the difference is within normal variability.
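
A minimal sketch of that idea in Python (all the numbers are invented; any
real metric, like daily message counts, would do):

    import statistics

    # Hypothetical baseline: daily message counts observed for a
    # couple of weeks *before* any change is made.
    baseline = [132, 118, 145, 127, 139, 121, 150, 114, 136, 129]

    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)

    # A post-change observation is only surprising if it falls well
    # outside the baseline's normal variability (here, roughly two
    # standard deviations as a rule of thumb).
    post_change = 112
    z = (post_change - mean) / stdev
    print(f"baseline: {mean:.0f} +/- {stdev:.0f}, z after change: {z:.2f}")
    if abs(z) < 2:
        print("within normal variability; don't celebrate (or panic) yet")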

~~~
ecaml
Good point, and measuring things as they are before experimenting can also
serve as a valid motivation to start an experiment.

------
cjf4
This sounds an awful lot like the lean/six sigma (lss) tool set that every
company of a certain size and age has experience with. But that's not to
invalidate the ideas here, as lss can unfortunately be prone to a cultish
zealotry that mutates the original principles.

------
mwexler
Is it just me, or is this example not an experiment? There were just pre- and
post-change measures, with no comparison group. The measure of success was
"use more threads", which was the same as the treatment, instead of the actual
goal: an improved perception that a Slack channel was faster and easier to
read (and potentially improved productivity: faster shipping, more tests
passing the first time, etc.).

A better method might have been something like: pick two channels with equal
traffic and relevance to the business, require one channel to emphasize
threads and respect quiet, let the other go on as it is, and compare groups at
the end on the actual metric of concern (a quick check/survey of the users in
each group on the perceived utility, ease, and value of the channel). Could
even have done the same measure at the beginning as well, to show change over
time. Still not random selection, but better.
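
For instance, the end-of-quarter comparison could be as simple as a two-sample
t-test on the survey scores. A Python sketch using scipy, with invented data
(and treating 1-5 survey scores as interval data is itself a simplification):

    from scipy import stats

    # Hypothetical survey scores (1-5, "how useful is this channel?")
    # from members of each channel at the end of the quarter.
    treatment = [4, 5, 3, 4, 4, 5, 4, 3, 5, 4]  # threads emphasized
    control   = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]  # business as usual

    # Two-sample t-test: is the difference in means larger than we'd
    # expect from chance alone?
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")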

"But that's a lot more work", some might say. But without this extra work,
there is no actual experiment. The test just says that threads are good and
when we ask folks to use them, they do. But look at the data; Slack message
count decreased from Q4 17 to Q1 18; any change in actual utility could be due
to seasonality, shipping vs. bug bashing, other changes that resulted in fewer
messages so threading wasn't needed; maybe threading caused fewer messages,
maybe threading caused people to choose not to comment when they would have
otherwise... but we can't tell from this design.

I'm not saying there's anything wrong with iterating. But call it that:
"Iterate and change, measure, repeat if change correlated with goodness". A
formal Experiment is designed to show that the change you made _caused_ a
change in something else, something important to you or your business. Without
a formal experiment, you just have correlation, hope, tribal knowledge,
instinct, experience... all great things, none of which support causation.

And not everything needs this level of rigor, and that's totally fine; maybe
that's the case in the top post's example. But if a change is expensive in
terms of workflow, effort, or actual expense, perhaps it's worth doing a more
structured test before committing.

If you've never had any experimental design experience, try reading anything
at [http://exp-platform.com/](http://exp-platform.com/) (Kohavi's work at
Microsoft) or search for DOE (design of experiments) at your favorite search
engine; articles on "A/B testing" also often give suggestions on how best to
structure a controlled experiment. And recognize that most older work focuses
on traditional ANOVA and t-tests, but there are all sorts of other modern ways
to assess impact.
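
As a taste of one of those "modern ways", here is a bootstrap percentile
interval for the difference in means, using the same invented survey data as
the sketch above and nothing beyond the standard library:

    import random

    treatment = [4, 5, 3, 4, 4, 5, 4, 3, 5, 4]
    control   = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]

    observed = sum(treatment) / len(treatment) - sum(control) / len(control)

    # Resample each group with replacement many times and record the
    # difference in means; the spread of those differences estimates
    # the uncertainty without any distributional assumptions.
    diffs = []
    for _ in range(10_000):
        t = [random.choice(treatment) for _ in treatment]
        c = [random.choice(control) for _ in control]
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lo, hi = diffs[250], diffs[-251]  # ~95% percentile interval
    print(f"observed diff = {observed:.2f}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")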

(edit: corrected typos)

~~~
ecaml
Thanks for the advice!

You're right: scientifically/formally speaking, this cannot be considered an
experiment, but, in my opinion, for small and easy-to-change processes there
is no need to be formal. The goal sometimes is just to solve small pain points
and make people happier at work, and in such cases the fact that people
perceive that things are better after the experiment could be enough :)
Obviously, this was just a simple case study, and every perceived improvement
could be due to confounding factors, as you say, or just to the fact that the
experiment created awareness of a problem. And if an experiment touches
something more relevant than Slack messages, being more formal is a good thing
and A/B testing is for sure a better approach.

The point of the post was just to make people aware of the fact that changes
in a team can happen without too much pain and that continuous improvement and
experimentation are processes that can be implemented easily.

~~~
rich_ard
Did you measure, though, that the staff was happier or more productive? Did
any meaningful change occur as a result of the requests to use Slack
differently?

Looking at your graph, it's not really clear that anything has changed
significantly: when did the change get implemented? Is a couple weeks' change
in activity a result of a couple of loud employees going on leave? There's
such variability there that it's hard to say that anything has really changed.

I agree that a formal analysis isn't always necessary, but you don't appear
to have done _any_ here, and are then couching a suggestion to try something
on Slack as having measured something. That's just bad science! :D

~~~
ecaml
That's not science :)

However, you're right: the article doesn't make clear how we decided to
consider the experiment successful. We called a meeting with the whole team
and asked everyone something like "Do you think that the situation on Slack is
better now?", and the majority of people said "yes".

I added a note at the end of the article to address some of the comments ;)

