
Incuriosity Will Kill Your Infrastructure - luu
http://yellerapp.com/posts/2015-03-16-incuriosity-killed-the-infrastructure.html
======
jes
Embedded software medical device developer here.

I once worked with a hardware engineer that would verbalize his thought
process very explicitly, as we worked in the lab.

He would say things like: "Ok, I'm about to let the target out of reset. I
expect to see the I2C bus controller initiate a master read of address 0x80."

Then he would do it, and look at the oscope, and see if his expectation was
confirmed.

If it wasn't, or was fishy in any way, he'd say something like "OK, I have a
mystery. I expected to see <X>, but I saw <Y> instead. I'm investigating this
before I go further."

So, you get the drift.

For this guy, the rule was "NO MYSTERIES."

Working with him was a fantastic and valuable experience.
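The loop he followed (state the expectation, observe, stop on any mismatch) could be sketched in Python. This is just an illustration of the habit, not anything from the thread: `observe` stands in for whatever instrument you actually read, e.g. an oscilloscope capture or a log line.

```python
def check_expectation(description, expected, observe):
    """State the expectation up front, observe, and halt on a mystery."""
    print(f"Expecting: {description} -> {expected!r}")
    actual = observe()
    if actual != expected:
        # A mismatch is a mystery: investigate before going further.
        raise RuntimeError(
            f"Mystery: expected {expected!r}, saw {actual!r}. "
            "Investigate before going further."
        )
    print(f"Confirmed: {actual!r}")

# Hypothetical usage, mirroring the I2C example above:
# check_expectation("master read after reset", 0x80, read_i2c_address)
```

The point is that the expectation is committed to *before* the observation, so a surprising result can't be quietly rationalized away afterwards.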

~~~
jfroma
This is exactly my thought process as a software developer. I work remotely so
I don't speak out loud (most of the time), but I think it helps too. It is
similar to when you have been debugging something for hours and then figure
out the solution right away when you start explaining the problem to a
coworker.

~~~
aptwebapps
Pre-emptive rubber ducking. I'd like to try it, but I'm afraid it might be a
nuisance for others.

------
pjungwir
Not just your infrastructure, your code base too. I've seen a lot of
developers practice what I call "debugging by superstition," where they make
random changes until it appears to work. I prefer to keep digging until I
understand. Sometimes I make a hypothesis and test it, which superficially
resembles debugging by superstition but is different.

One benefit of experience is that you gain a better intuition about your
hypotheses, and you know how to more quickly devise "experiments" to test
them. Also if you know more things in depth (because of prior digging) you
don't have so many rabbit holes to explore.

Another benefit of waiting until you understand is that you don't make bull-
in-a-china-shop edits to unfamiliar code. As a freelancer I have a strong
bias towards adopting the style/patterns/architecture of whatever code base
I'm working in. I wish more people did this! More often, programmers skim
through some code and start making changes without trying to learn why the
code is the way it is or what other parts of the system need it that way.

Since this is the Internet I feel compelled to add: of course moderation in
all things.

~~~
quanticle
>Sometimes I make a hypothesis and test it, which superficially resembles
debugging by superstition but is different.

As Adam Savage [1] says, "The only difference between science and screwing
around is writing it down."

I don't follow this process for every problem I encounter, but when I have a
really intractable issue, where nothing I've thought of seems to work, I start
a "lab notebook" (usually a few sheets of printer paper). I write down all my
assumptions, and start designing experiments to test each one in turn. It's a
fair amount of overhead (which is why I don't use this approach for
everything), but when all else fails, the scientific method powers through.

[1]
[https://www.youtube.com/watch?v=BSUMBBFjxrY](https://www.youtube.com/watch?v=BSUMBBFjxrY)
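The paper notebook described above could just as well be a tiny structured log. A minimal Python sketch of the idea (all names here are invented for illustration, not from the comment):

```python
from dataclasses import dataclass

@dataclass
class Entry:
    assumption: str   # what I believe to be true
    experiment: str   # how I tested it
    observed: str     # what actually happened
    holds: bool       # did the assumption survive the test?

class LabNotebook:
    """Track assumptions and the experiments that test them."""

    def __init__(self):
        self.entries = []

    def record(self, assumption, experiment, observed, holds):
        self.entries.append(Entry(assumption, experiment, observed, holds))

    def broken_assumptions(self):
        # The assumptions an experiment has ruled out; the bug
        # is usually hiding behind one of these.
        return [e.assumption for e in self.entries if not e.holds]
```

Writing each result down this way forces the "update your hypotheses from the record, not from memory" step that separates the method from screwing around.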

~~~
vacri
_As Adam Savage [1] says, "The only difference between science and screwing
around is writing it down."_

Not true at all, though publishing is an important step. _Understanding_ what
is going on is important. A startling revelation that the MythBusters guys
have no understanding of statistics came when they invented their buttered-
toast dropper. In a 'calibration' dry run (a literal dry run!) using toast
with one side marked with an X instead of butter, 7 out of 10 trials landed
one way. Adam remarked, "This isn't random enough - it should be 5!" (for a
fair 50/50 drop, a 7-3 split in ten trials is an entirely unremarkable
outcome).

The MythBusters guys get 11/10 for curiosity and the spirit of investigation,
but 4/10 for scientific rigour :)

~~~
quanticle
That's not how I interpreted it at all. To me "writing it down" has nothing to
do with publishing. "Writing it down" means that you systematically track the
results of your experiments and then you use those results to update your
hypotheses. If you don't write anything down, you're debugging by
superstition.

~~~
vacri
I would not categorise that rather complex set of activities as 'only'. It's
not just 'writing down', but analysing, predicting, and designing new
experiments.

------
optimusclimb
So what I want to know, given all the recent agile/scrum discussions, is this:
how does the modern "do sprint planning / commit to a number of sprint points
and tasks / be the product manager's monkeys" approach align with "you saw
something that's probably representative of a major problem in your system,
but stopping what you're doing to investigate it will kill your velocity and
make your team's statistics look bad"?

~~~
isaacaggrey
Any time a sprint commitment must be broken, the decision rests with the
team's product manager, who is best (or _should_ be best) suited to weighing
the tradeoff between sprint completion and a potential emergency bug. If the
PM decides it's not worth it, then that's on them. To cover yourself, you may
want to notify a wide enough distribution that it's known you noticed the
issue but the PM didn't want you to work on it.

As for how to account for that work in your velocity, I don't believe it is
realistic to size a story for fixing a _potential_ issue (not to mention the
difficulties in sizing bugs anyway [1]). However, it _is_ possible (or at
least a bit easier) to size a story for research that answers a specific
question as its acceptance criteria -- in this case, the acceptance criteria
for this article's situation would've been something like 1) "will latency
continue to rise?" and 2) "if so, and it is unacceptable, what is the cause?
Add an implementation story to the backlog."

Some may say "well, how do you know what is causing the issue?", and if no
one can really figure out the answer to the research story question and the
bug has manifested itself as an emergency, then sure, your velocity will tank
as you break the sprint(s), but it's up to the team/business to understand how
to remove outliers for an accurate velocity.

[1]: [http://www.agileforall.com/2010/05/agile-antipattern-sizing-...](http://www.agileforall.com/2010/05/agile-antipattern-sizing-or-estimating-bug-fixes/)

~~~
optimusclimb
Yeah, I figured someone would respond that way. I have a feeling that the more
process there is around making such fixes... the more likely the system is to
be underperforming (and/or broken) long term.

~~~
isaacaggrey
In Scrum, speed and efficiency are traded for predictability and consistency
(whether or not those trade-offs are worth it is another conversation).

It's worth noting that I agree with your point and also don't believe Scrum is
a panacea, but I am starting to understand its appeal from a business
perspective.

------
jd007
Not related to the post, but after going to the main Yeller page, I noticed
that at the bottom one of the features listed is "HTTPS Everywhere (we don't
even allow HTTP over our API or website)", yet the site is not over SSL at
all. In fact manually entering HTTPS in the URL shows that the certificate is
not valid.

~~~
t__crayford
Hi (author/founder of Yeller) here.

You're totally right, I need to change the wording there. The marketing site
doesn't run over HTTPS - I'm bootstrapping with relatively limited funds, and
so can't really afford the SSL costs for a CDN (my current one wants to charge
$600 or so a month for serving SSL requests).

The webapp and the api are all HTTPS only.

I should change the wording on that page to reflect that.

~~~
adrianmacneil
I highly recommend CloudFlare. Their basic plan is completely free, and even
comes with a free SSL cert.

------
totally
"Not once have I regretted spending unbounded amounts of time investigating
something fishy"

While I agree with the gist here (heed warning signs, proactively preempt
failure), there are literally hundreds of "fishy" things, many/most of them
low impact, that I could investigate on a given day, and my time is bounded.

At the semi-formal dance of distributed systems, meandering investigation
should be chaperoned by ruthless prioritization.

------
mattbreeden
Can't find any way to contact you guys, so hopefully you'll read it here. On
OS X, Chrome v41, the 'Subscribe to your free one month course' button extends
a decent amount beyond the pink box on the right side.

~~~
t__crayford
Post author/founder of Yeller here:

Huh, interesting. That's my setup as well. I'm not super great at CSS (yet),
so I'm not too surprised by a few minor visual bugs like that. I'll fix it
soon. Thanks so much.

~~~
dredmorbius
If that's this xpath: /html/body/div[2]/div[2]/div

Being:

    
    
        <div class="span10 offset3">
    

Then swapping out the "margin-left" property for "padding-left" on the
selector ".row-fluid .offset2:first-child" should fix the problem.

------
tobz
The bit about how Riak resolves concurrent writes sounds backwards. As far as
I know, it's last-write-wins by default. You need to opt into storing all the
writes via allow_mult.

~~~
Sinjo
allow_mult has been enabled by default since at least 2.0 -
[http://docs.basho.com/riak/latest/dev/using/conflict-resolut...](http://docs.basho.com/riak/latest/dev/using/conflict-resolution/)

------
Quanticles
Does anyone have a good article on a more generalized version of this motto?
It seems to apply to many forms of design.

