
Taco Bell Programming - whakojacko
http://teddziuba.com/2010/10/taco-bell-programming.html
======
peterwwillis
I like the aspect of not jumping on the old 'let's reinvent the wheel again'
and 'we need a new tool for an old problem' bandwagons. Sysadmins should not be
wasting their ever-shrinking valuable time writing code (that they probably
aren't even experienced enough to write well) when they can pick a 'product'
like xargs or wget off the shelf and get the job done in a tenth the time. Add
to this the fact that modern computers are more than powerful enough for
complex computing tasks, and that the cloud often costs far more than the same
computing power on your desktop, and you end up doing it faster and cheaper
with a standard Unix workstation and tools.
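
A minimal sketch of that "off the shelf" approach, close to the article's own
example. Since urls.txt, the URLs, and the concurrency of 8 are all
hypothetical here, the wget line is left as a comment and the same fan-out
pattern is demonstrated with a local stand-in:

```shell
cd "$(mktemp -d)"
# The off-the-shelf parallel crawl (urls.txt: one URL per line, and a
# concurrency of 8 -- both hypothetical):
#
#   xargs -n 1 -P 8 wget -q < urls.txt
#
# The same fan-out, runnable anywhere, with echo standing in for wget:
printf '%s\n' alpha beta gamma > urls.txt
xargs -P 4 -I{} echo "fetched {}" < urls.txt
```

Output order varies with -P, which is exactly the point: each worker grabs the
next item as soon as it's free.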

However, it's a mistake to think this 'taco bell programming' is somehow a
good model for actual programming, or even some sysadmin tasks. This should be
renamed 'Taco Bell Kludging'. Because that's mostly what we're talking about:
using a quick hack/kludge on a command-line to finish a job quickly instead of
programming. In terms of actually building a scalable, fault-tolerant
solution, _sometimes_ the Unix tools just won't do. Don't shortcut and cut
yourself off at the knees just to save time.

------
zdw
I'm a sysadmin - this is how I think too. It's great for the most part but...
there are some problems:

\- If the work isn't split up evenly or fed through an event queue, you end up
preallocating jobs to processes, and one job may take far longer than the rest
to finish.

The worst case of this I've run into is Microsoft's EXMerge, which does
imports/exports from an Exchange datastore - it can be threaded, but it
preallocates work by splitting the mailboxes up alphabetically. In one case, a
family business, all the heavy users got lumped into one thread because they
shared a last name - that thread ran 5x longer than all the others.

\- You can run the machine out of some resource (mem/disk/CPU) by spawning a
huge number of jobs that hit one subsystem hard. This is tuning dependent, of
course.
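
Both problems come down to how work gets assigned. GNU xargs avoids the
EXMerge failure mode because -P workers pull the next item on demand instead
of taking a fixed slice up front, and -P is also the knob for capping how hard
you hit one subsystem. A sketch with hypothetical user names (split stands in
for the alphabetical preallocation, echo for the real migration step):

```shell
cd "$(mktemp -d)"
# Static, EXMerge-style partitioning: each worker gets one contiguous
# alphabetical slice, so the heavy same-named users clump together:
printf '%s\n' smith1 smith2 smith3 smith4 wu xu yu zu > users.txt
split -l 4 users.txt slice.
cat slice.aa    # all four Smiths land in the first worker's slice
# Dynamic dispatch: two workers each take the next user only when they
# finish the previous one, so the load self-balances; lower -P to ease
# off a contended resource (disk, memory) instead of pinning every core:
xargs -P 2 -I{} echo "migrated {}" < users.txt
```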

Also, I'd recommend using make and similar tools for this rather than shell
commands like xargs if you're going to seriously script this - those tools are
made to run processes in parallel and to avoid repeating work. They also tend
to force you to write intermediate steps to disk, which helps with debugging
(and can be coded around, or put on a ramdisk, later if it proves to be a
performance issue).
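
That make-based approach, sketched as a hypothetical two-file pipeline (GNU
make assumed; the tr step is a placeholder for real work). Every intermediate
lands on disk, `make -j` supplies the parallelism, and a re-run rebuilds only
what's missing:

```shell
cd "$(mktemp -d)"
echo a > a.in
echo b > b.in
cat > Makefile <<'EOF'
# Use ">" instead of a leading tab for recipes (GNU make 3.82+),
# since tabs are easy to lose when pasting Makefiles around.
.RECIPEPREFIX = >
OUTS := $(patsubst %.in,%.out,$(wildcard *.in))

all: $(OUTS)

# One intermediate file per input -- inspectable after the run.
%.out: %.in
> tr 'a-z' 'A-Z' < $< > $@
EOF
make --silent -j 4
cat a.out b.out
```

Running `make` a second time does nothing, and deleting one .out rebuilds only
that file - the "avoid repeating work" part comes for free.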

------
joeshaw
One concern I have with this approach is that like all code, it becomes
hairier and more complicated over time as it becomes more robust and special
cases are handled.

Once bash scripts reach a certain size and complexity, I've found they become
quite difficult to follow. I don't know if this is inherently a quality of
bash, or of people who tend to write bash, or of my ability to read bash
scripts, but I find larger Python, Ruby, etc. programs a lot easier to follow.

On the other hand, even a 300 line shell script is easier to follow than a
10,000 line Java program.

------
unwind
Isn't it kind of unfair to compare xargs parallelizing, which as far as I know
all happens on the same machine, with cloud-scaled parallelizing through
various services?

Sure, it's awesome to use a ready-made tool to get that kind of scalability,
but is it really apples/apples?

~~~
jcromartie
I don't think it's unfair at all. Add split and ssh to the mix and now you're
distributed.
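
A sketch of that split-plus-ssh recipe, under loud assumptions: hosts.txt and
./process are placeholders that don't exist, so the remote half stays in
comments, while the local half shows the same split-then-fan-out shape
actually running:

```shell
cd "$(mktemp -d)"
# Hypothetical distributed version (bash, GNU split; hosts.txt lists
# one worker machine per line, ./process is your per-chunk program):
#
#   split -n l/3 big.txt chunk.
#   paste <(ls chunk.*) hosts.txt | while read -r f h; do
#       scp "$f" "$h:" && ssh "$h" "./process $f" &
#   done
#   wait
#
# The same shape locally: split the input, fan the chunks out:
printf '%s\n' 1 2 3 4 5 6 > big.txt
split -l 2 big.txt chunk.          # -> chunk.aa chunk.ab chunk.ac
ls chunk.* | xargs -P 3 -I{} sh -c 'wc -l < {}'
```

Note the sketch has no retries, failure detection, or cleanup - the parts that
make scripted ssh painful over time.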

"Cloud-scale" just means "more hardware than we own" and seems to me like a
step backward in computing, to a time when you paid to time-share a relatively
powerful machine. The main appeal of cloud computing is outsourcing the
ownership of the machines and responsibilities like configuring, storing,
powering, and repairing them.

I am not saying there isn't a use for cloud computing capacity or the benefits
it offers... I just think that, for the average business, the hype distills
down to "you don't have to take care of a bunch of servers."

~~~
jwhitlark
Where I work, we've run into a great deal of pain scripting ssh. For a one off
job, I'd agree, but if you want something stable over time, well....

I'm not dissing ssh, I just don't think automatically running jobs on remote
machines is its sweet spot.

------
wccrawford
For 1-off projects, this is exactly how I approach things.

For projects that need to be stable, used by non-techies, or upgraded over
time, I generally go for something a little more robust. You know, like Google
does, etc etc.

------
acgourley
I kind of discover these kinds of recipes myself from time to time, and am
always delighted. Is there a good recipe book of practical applications of
chained unix commands?

~~~
retroafroman
Some of the most clever ones seem to hide in the .bash_history files of
enlightened sysadmins. That said, there are some good ones at
commandlinefu.com as well.

------
lurchpop
fuckin love this guy's writing style!

