
Escape the scripter mentality if reliability matters (2011) - amzans
http://rachelbythebay.com/w/2011/12/31/scripting/
======
rkachowski
I would reach for scripting every time. Getting a working implementation and
experiencing real-world data + problems as soon as possible is always best if
you are taking an iterative approach.

Unless there are some serious domain experts on hand, I feel mapping the
problem space is the most valuable approach - and one of the best ways to do
this is to attempt a fast and basic solution. The alternative is a big design
up front with minimal experience in the subject domain.

It feels like the inverse of "I'm going to scale my foot up your ass"[1] -
how useful are your reliability and metrics if your service is never used /
becomes irrelevant / the newspaper business dies / the company folds / there's
a huge paradigm shift in tech?

I feel the dirty hack is a valuable step in the process, and it's not
possible to jump to the end without mapping the route first.

[1] [http://widgetsandshit.com/teddziuba/2008/04/im-going-to-
scal...](http://widgetsandshit.com/teddziuba/2008/04/im-going-to-scale-my-
foot-up-y.html)

~~~
TheOtherHobbes
It depends on the domain. CRUD should be by the numbers, unless you're working
at planetary scale.

But some domains are very poorly understood. You won't get far by trying to
iterate - because you don't even know what the problem _is_, never mind how
to solve it, and your dirty hack is going to introduce naive assumptions that
turn out to be spectacularly wrong.

~~~
smcameron
Though as John Gall famously said, "A complex system that works is invariably
found to have evolved from a simple system that worked. A complex system
designed from scratch never works and cannot be patched up to make it work.
You have to start over, beginning with a working simple system."

------
kd5bjo
In this sort of situation, it's always useful to ask what the cost of not
automating the task at all is. That will give everyone a better feel for
whether the automation makes sense, and how much developer time should be
allocated to the project.

> Let's say I want you to get the fifth word of the fourth paragraph of the
> third column of the second page of the first edition of the local paper in a
> town to be determined. It's going to be a color, and we have a little deal
> with that paper to get our data plugged in every day. You can get the feed
> from their web site.

For the one-off case, I'd just go open up the paper and look for myself.
Surely that'll be faster than trying to write some sort of a script to do it.
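
To be fair, the happy-path extraction itself is only a few lines once you
have the page as plain text (assuming blank-line-separated paragraphs, with
page and column selection handled upstream); the two years of format drift
are the hard part:

```python
def fifth_word(page_text: str) -> str:
    """Happy path only: fifth word of the fourth paragraph."""
    paragraphs = [p for p in page_text.split("\n\n") if p.strip()]
    word = paragraphs[3].split()[4]  # zero-indexed, hence 3 and 4
    return word.strip(".,;:").lower()
```

Every assumption baked in there (paragraph separator, word boundaries,
punctuation handling) is a way for it to silently break in year two.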

> Now I want you to be able to do this reliably every day for the next two
> years. I need this data on a regular delivery schedule and it can't rely on
> some human being there to constantly fine-tune things.

Ok. So now we're automating a 5 minute job that can be done by the office
assistant, which will be performed 730 times. That puts the upper bound of
time saved at around 60 hours of work. Due to their specialized training,
software engineers probably cost the company more than 3x as much per hour as
the assistant, so this automation task only makes sense if it can be completed
in about half a week of work for the engineer, including maintenance over the
next two years.
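
A quick sanity check on those numbers (5 minutes per run, 730 runs, engineer
at roughly 3x the assistant's hourly cost):

```python
runs = 730             # daily for two years
minutes_per_run = 5    # time the assistant would spend
cost_ratio = 3         # engineer hourly cost vs. assistant

hours_saved = runs * minutes_per_run / 60   # assistant-hours avoided
budget_hours = hours_saved / cost_ratio     # break-even engineer-hours

print(f"{hours_saved:.0f} assistant-hours saved")     # ~61
print(f"{budget_hours:.0f} engineer-hours to spend")  # ~20
```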

That sounds like a reasonable, but tight, time budget for the given task and
reliability specifications. It's not an obvious win, which also means the ROI
will be small. Also, there's opportunity cost to consider: that week of
development time now can't be spent on other projects that may be more urgent.

~~~
andrewflnr
> ... if reliability matters

Your analysis ignores the cost of failure to get the right word, or maybe just
assumes the office assistant is infallible. That cost is almost certainly non-
zero, and might be much higher than the whole development cost.

~~~
kd5bjo
The task as described includes a verification step that is easy for a human
but tricky for a computer: the correct word will always be a color. Given
this, and the need for the system to be tolerant of human error occurring at
the paper, I believe a typical untrained human will have better reliability on
this task than almost any automated system.

But in the general case you’re absolutely right: any analysis like this should
include the costs and probability of faults occurring in each option.

~~~
andrewflnr
Sure, but maybe the assistant gets distracted by other work, or gets sick and
doesn't get a chance to make sure someone else does it, or quits because their
boss is a jerk, or any number of other things going wrong. At least a few of
these will happen over the course of two years.

------
crispyambulance
The first thing that works is just fine, for a while, until it's not.

Then, you need to evolve to something else that better fits the job at hand.
This is not a technical problem or even one of "mentality", I think.

It's more of an organizational challenge. It means acquiring the level of
agency needed to adapt or evolve solutions to problems.

If you need to dig a trench, once, in your backyard, sure a pick axe and
shovel and some hard-labor is just fine. If you need to dig a trench every
day... you need a backhoe. Sadly so many organizations choose to do the
equivalent of operating "chain-gangs" to dig trenches with pick-axes and
shovels, at scale, every day. The people in charge of these chain-gangs just
don't know any better and the people digging don't have the agency to demand
the right tools and right approach.

~~~
smacktoward
_> The people in charge of these chain-gangs just don't know any better_

This is a dangerous assumption. It seems at least equally likely that they are
just responding to the incentives the system presents them. In other words,
when you see a chain gang, you're seeing an organization that views hardware
as expensive and people as cheap.

~~~
derefr
I think, in this case, the “chain gang” might not be a gang of _people_, but
rather a gang of e.g. Unix scripts in Docker containers scheduled onto a K8s
cluster via SQS from S3 lifecycle events. (I.e. the original software written
to solve the 1x task, now chain-ganged together to solve an Nx task.)

In cases like this, where _both_ the solutions are software, the reason the
company is relying on a progressively Matryoshka'ed system built on top of the
original, "dumb" solution, is usually that they value developer time (to build
a better solution) greatly over ops-staff time (to implement the
infrastructure required to scale the dumb solution) + additional hardware
costs (for all the overhead the dumb solution's method of scaling introduces.)

But even that doesn't explain why organizations refuse to switch from
baling-wire scripts to a _pre-built_, commonly-available
infrastructure-component better solution. In such cases, both staying and
switching are ops costs.

~~~
kd5bjo
> But even that doesn't explain why organizations refuse to switch from
> baling-wire scripts to a pre-built, commonly-available infrastructure-
> component better solution. In such cases, both staying and switching are ops
> costs.

The costs of staying are known, because they’re the ones you’ve already been
paying for a while. Switching to a new system is inherently risky as there’s
always a significant chance you haven’t correctly identified all of the
requirements from the existing system.

------
dkersten
I don’t know. It’s easier (i.e. cheaper) to write a quick script and fix it
when it breaks than it is to try and anticipate what unknown unknowns might
crop up over the next two years. Absolutely, think through the edge cases, but
there’s little point in trying to anticipate things you really don’t know
about, and if it was a quick script to write, it’ll be quick to write a new
one that works in the new situation (and if not, well, you have more
information now to implement a better solution than you did at the start).

It depends on how critical the task is. Can you detect an error occurred that
you now need to fix? Can you tolerate the lag between error and fix? etc

------
haddr
Scripting is great, especially for handling one-time jobs, PoCs, or
short-lived projects. You can be really productive and just get the job done.
But operating something on a regular basis is a different thing.
Productionizing such a thing involves a lot of different aspects to handle,
and this article is mainly about that: showing cases where evolving “scripting
solutions” simply doesn’t scale. Too much glue and too little control over the
main aspects of the problem.

I saw solutions that started like this and very quickly became an
unmanageable mess. I guess it’s some sort of pattern where it’s hard to see
the fine line between “it works, so no need to reengineer it” and realising
that a huge technical debt has been incurred. It requires an experienced
person to make the decision to drop the former and start on a more sane
solution.

------
wodenokoto
The story starts with an inconsistently formatted stream of text data and
ends with nicely formatted binary data.

Where does she get this nice data, where integers are integers and fields have
nice names?

~~~
jt2190
Yeah, I think that the newspaper example obscured her larger point that
converting machine-readable data into human-readable data and then using text
parsing is both error-prone and unnecessary compared to just using tools that
work directly with the machine-readable data.

Edit: The meat of the article:

> This gets into a whole thing I call "scripter mentality". It seems like some
> people would rather call (say) tcpdump and parse the results instead of
> writing their own little program which uses libpcap. Calling tcpdump means
> you have to do the pipe, fork, dup2, exec, parse thing. Using libpcap means
> you just have to deal with a stream of data arriving that you'd have to chew
> on anyway.
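
Concretely, the "parse the results" half of that dance might look like this,
using a canned tcpdump line as input (the exact format is illustrative; real
tcpdump output varies by version and flags, which is exactly why parsing it
is fragile):

```python
import re

# One illustrative line of `tcpdump -n` output; the real format differs
# across versions and flag combinations, which is the fragility in question.
line = "12:00:01.123456 IP 10.0.0.1.443 > 10.0.0.2.51234: Flags [P.], length 100"

# Fragile by construction: any format change silently breaks the regex.
m = re.match(
    r"(?P<ts>\S+) IP (?P<src>[\d.]+)\.(?P<sport>\d+) > "
    r"(?P<dst>[\d.]+)\.(?P<dport>\d+):.*length (?P<len>\d+)",
    line,
)
packet = m.groupdict() if m else None
print(packet["src"], packet["dport"], packet["len"])  # 10.0.0.1 51234 100
```

The equivalent libpcap program receives these fields as struct members
instead of hoping the regex still matches after the next tcpdump release.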

Edit 2: “Scripter” here refers to operating on text files. The Rule of
Composition [1] encourages the use of text processing, for example:

> Text streams are to Unix tools as messages are to objects in an object-
> oriented setting. The simplicity of the text-stream interface enforces the
> encapsulation of the tools. More elaborate forms of inter-process
> communication, such as remote procedure calls, show a tendency to involve
> programs with each others' internals too much.

[1] The Art of Unix Programming:
[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...](http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id2877684)

------
XCSme
I had to manually download all the invoices from a marketplace each month, as
they had no "download all" button. I created a script to download the list of
invoice names, get each invoice PDF based on its name, and save the PDF in a
folder. All this was done by hardcoding paths and assuming stuff doesn't
change. I implemented this script in a few hours and have been using it for
over two years without any issues, doing each month in 2 minutes what
previously took 1 hour. What's the problem with that? Yes, sometimes I had
small issues (e.g. I was being rate limited trying to download PDFs, but I
quickly added a request delay to fix that and it worked again). I think it's a
lot faster/more productive to implement what's quickest and adapt along the
way than to spend a lot of time implementing the "perfect solution", which
probably has the same chances of failing as the quick-and-dirty script.
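
For the curious, the shape of such a script is roughly this; `fetch` and the
invoice names stand in for the real marketplace calls, and the delay is the
rate-limit fix mentioned above:

```python
import time
from pathlib import Path

def download_all(names, fetch, out_dir, delay=1.0):
    """Save one PDF per invoice name, sleeping between requests.

    `fetch(name) -> bytes` stands in for the real HTTP call to the
    marketplace; `delay` is the crude rate-limit workaround.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, name in enumerate(names):
        if i:
            time.sleep(delay)  # crude rate limiting between requests
        (out / f"{name}.pdf").write_bytes(fetch(name))
```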

~~~
ozten
This sounds perfect. I think her concern is if your script became the basis
for a new product. Now your company depends on it running more often and no
one is manually reviewing the output.

~~~
XCSme
The final point in my comment was that even if you spend a lot of time trying
to make a more reliable product instead of a quick script, you still can't be
sure it will never break in the future, as the data source can become invalid
at any point. Imagine a news website: they update their platform and suddenly
all 404 pages redirect to the first article/post in the database instead of
correctly returning 404, so your product would crawl the same data each day
and say everything is fine, even though it's not. There are infinite such
problems that can arise, so even if you have a well-thought-out product, it
still needs some human sanity checks and updates once in a while.
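
One cheap guard against exactly that failure mode is to refuse output that is
byte-identical to the previous run's (a daily paper should never repeat
itself). A sketch:

```python
import hashlib

def looks_stale(today: bytes, yesterday_digest: str) -> bool:
    """Flag a fetch whose content hash matches the previous run's.

    Catches "soft 404s" where every URL silently serves the same page.
    """
    return hashlib.sha256(today).hexdigest() == yesterday_digest
```

It won't catch every kind of corruption, but it turns the silent failure
described above into a loud one.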

------
danjc
This week I met with a potential client for our integration platform (iPaaS).
Their business centers around interpreting financial data and that means
ingesting raw data from different sources.

The guy who handles this has set up a SQL database and pulls data in via FTP
and email. Outlook sits open on a server so that Outlook message rules can be
used to fire up a VBScript which in turn calls a stored procedure to ingest
the data. There are a bunch of combinations of these to deal with different
feeds.

He's a brilliant guy and he's used the tools he knows (not a dev) but it's
brittle and only he knows how to support it. It's unlikely they'll move to our
platform (invasion of his turf) but I do hope he finds another solution that's
more reliable.

~~~
statictype
Sounds like an iPaaS is exactly what they need, and since he's not a dev he
may even embrace it?

What’s your iPaaS product?

~~~
danjc
The feedback I had indirectly is that he feels that replacing his work with a
platform would negate the value of what he's built so far. We're flowgear.net

------
m0nty
> You're going to have to invest the time to do it right up front

This is the problem with most places I have worked: "I want this and I need it
today." Scripting is the obvious answer.

Then it will change to "I want it to run every day" and even if I say "it
needs more work" the response will be "well, it's working now, so why do you
need to fix it?" And so a shell script or several go into production and I
can't stop it.

In any case, even if you use something more formal, you can still write
fragile, error-prone code. Bash or Perl is not the problem - you can do great
work with those tools, _if_ you have time.

~~~
quickthrower2
Any boss can conjure up unrealistic demands. If it’s not possible to have a
reasonable dialogue about it from the beginning, and that communication issue
can’t be fixed, maybe look for another job?

------
twic
> Now I want you to be able to do this reliably every day for the next two
> years. I need this data on a regular delivery schedule and it can't rely on
> some human being there to constantly fine-tune things.

If you're going to stack the deck like that, then sure, scripting isn't the
right approach here.

But in reality, for many things, automation that works 90% of the time and
reliably attracts manual intervention the rest of the time is often fine. If
you can build it quickly, it wins out over something more robust that takes
significant investment to build.

~~~
adrianN
It can actually be better than automation that works flawlessly for two years
and
then breaks or needs an upgrade. While nobody was looking at it, knowledge of
its operation got lost.

------
oweiler
I only use Bash for the simplest tasks. For anything more elaborate I use
Groovy scripts. At some point I switch to full Groovy projects, switching on
static compilation, adding unit and integration tests and so forth.

~~~
badrabbit
I use python scripts these days but I don't think they're any better than
bash or perl scripts. They're just more popular.

~~~
zmmmmm
The problem with python is even if you do it well, you don't get very far
above a script. As oweiler above mentions, this is where languages with a bit
more comprehensive support for true incremental typing, structuring etc. work
better. It's definitely one of the things I like about Groovy - that it's both
a better first line scripting language AND cuts it as a first class structured
application development language on a par with Java etc - and you can do
incremental shades of gray all the way in between. Of course, you have to
actually DO it, which is where the real trap is. But it's definitely a level
above Python, where the friction of transitioning from "this is fine as an ad
hoc script" to "this really ought to be a proper library / module in a
statically typed language" is high enough that it will basically never happen.

~~~
dijksterhuis
The friction you mention for Python is actually not that high.

Error handling, edge cases, unit tests, type checking, packaging can all be
performed in Python.

Type checks are the biggest bugbear. Otherwise, it’s relatively easy to get
things clean and tidy (packaged).

------
chomp
>This gets into a whole thing I call "scripter mentality". It seems like some
people would rather call (say) tcpdump and parse the results instead of
writing their own little program which uses libpcap.

This made me smile, because it made me remember a suite called “DSC” that I
could imagine inspired this post. It did work, but seemed like a kludge. The
last time I used it was around when this post was written, actually, and it
looks like it uses libpcap now.

[https://www.dns-oarc.net/tools/dsc](https://www.dns-oarc.net/tools/dsc)

------
coldtea
> _This gets into a whole thing I call "scripter mentality". It seems like
> some people would rather call (say) tcpdump and parse the results instead of
> writing their own little program which uses libpcap._

And those people are right. Too many systems have too many unnecessary layers
and "just in case" code, which instead of making them more robust than a small
script, serves to slow them down, and increase the possible failure modes
exponentially...

------
tyingq
Personally, I don't like tying together the idea of "scripting" and being
unreliable.

I've done lots of shell and Perl, and you can make them as reliable and
bulletproof as any other option.

Do lots of scripts ignore return codes, lack retry logic, sanity checks, etc?
Sure, but so do lots of applications written in compiled languages.

"Scripting" isn't really the issue.

------
egdod
For a task like the newspaper one, the cost of getting every edge case perfect
probably isn’t justified. Write the simple version, and wrap the whole thing
in a try/catch. When it falls over, have it email the receptionist and tell
him to look at the paper.
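
A sketch of that shape; the scrape callable, the color list, and the address
are all placeholders, and actually sending the message would be a single
smtplib call:

```python
from email.message import EmailMessage

# Placeholder color list; the real set would come from the spec.
COLORS = {"red", "orange", "yellow", "green", "blue", "purple", "mauve"}

def get_word_or_alert(scrape):
    """Run the scrape; on any failure, build an alert email instead."""
    try:
        word = scrape()
        if word.lower() not in COLORS:
            raise ValueError(f"expected a color, got {word!r}")
        return word, None
    except Exception as exc:
        msg = EmailMessage()
        msg["To"] = "receptionist@example.com"  # placeholder address
        msg["Subject"] = "Paper script fell over - please check manually"
        msg.set_content(f"Automation failed: {exc}")
        # smtplib.SMTP("localhost").send_message(msg) would actually send it.
        return None, msg
```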

------
smitty1e
The battle is tactics vs. strategy.

If you really have finite, bounded requirements, then treat it like a full
project with front-end design, granular types, kit & kaboodle.

So often, though, these shiny objects are moving targets.

------
contingencies
Opinion + metaphorical problem = this article. The key issue with the proposed
metaphorical problem is the instability of its input. Garbage in, garbage out.
Nobody can predict the future (but we can probabilistically interpret it if
necessary!). To my mind, the article's conclusions take this unadmitted
property of the metaphorical problem and use it to baselessly question an
entire category of solutions.

------
lmilcin
I don't exactly agree with this. Solutions like that can be optimal under
certain assumptions.

In my mind, there are two ways to write software.

One way, represented by a traditional product shipped to customers (think
Microsoft Excel), is writing software in such a way you can show it works
correctly under all possible circumstances. You need to write every piece of
this software to be resilient to changes in environment and always do the
right thing no matter what. Writing software like that takes care and is very
expensive.

Another way, represented by the typical business application written for a
company's own internal use, is writing software to work in only a very narrow
set of circumstances. You demonstrate that it works and you ship it. Do you
need it to work on all possible OSes? Hell, no: you have an image and it works
on that image, and that is enough. You insulate yourself from outside
interference as much as possible: Docker images, a tightly controlled network
environment, controlled dependencies, only a single supported integration
architecture, etc.

The second way is the way to go when building "enterprise" software for
internal consumption. You need to understand what you are doing. You need to
understand why you are doing it. If you do it correctly, it will always be
cheaper and better than investing in a "perfect" solution.

------
donatj
I find well-constructed shell scripts often more reliable than large tools
built for a specific purpose. I think this koan applies.

[http://www.catb.org/~esr/writings/unix-koans/ten-
thousand.ht...](http://www.catb.org/~esr/writings/unix-koans/ten-
thousand.html)

~~~
janpot
When I compose small "do one thing well" shell scripts together, people call
it "the unix philosophy". When I compose small "do one thing well" modules
together in node.js, people call it "a dependency hell".

~~~
kungtotte
Every distro and BSD everywhere comes with e.g. find, tr, grep, and so on so
all you do is write the glue in a language that also ships with the OS (and is
portable across all of them if you keep it POSIX).

How do I run your node.js thing on my system without installing dozens or even
hundreds of packages and modules?

It's not the same thing.

------
z3t4
Some programmers will say the pipes are beautiful and the reliable code is
ugly. But the real world _is_ ugly!

------
patsplat
Investing time upfront is only productive if one has ready access to a broad
and relevant corpus of data.

------
walshemj
I think assuming that some local newspaper is going to have the same layout
in two years is a big ask here.

The actual way to do this would be scripting, using Beautiful Soup and
similar tools.

