
A bad workman blames his tools - jgrahamc
http://www.jgc.org/blog/2010/02/bad-workman-blames-his-tools.html
======
abstractbill
Indeed, it turns out not to have been a compiler bug, according to a comment
on the original article: <http://esr.ibiblio.org/?p=1705#comment-248245>

~~~
gonzo
but that doesn't mean that esr retracted his BS.

~~~
wheels
To be perfectly honest, I think it'd make sense to ban ESR's stuff here,
because (a) when speaking of technical matters, it's usually wrong, often
offensively so and (b) it just degenerates into stuff where we're all
complaining about things that are, to most of us, obvious.

Simply put, a competent programmer reading ESR's posts quickly and correctly
comes to the conclusion that he's just not a very good programmer. It feels a
bit gratuitous to keep driving that home publicly.

~~~
calcnerd256
It would have taken me a lot longer to become familiar and comfortable with
this culture if not for esr's help. A hard rule like "no esr" would throw the
baby out with the bathwater. HN's shower has a drain.

------
wglb
Having been a blamer of tools, I would rather say "A good workman who blames
his tools fixes the tools or gets better tools"

The statement _This is, simply put, appalling advice._ refers to the advice to
try lowering the optimization level to see if the heisenbug goes away. The
headline here uses the word "blame" but the referenced article has a headline
of _When you see a heisenbug in C, suspect your compiler’s optimizer_.

Having written a couple of compilers and used many, I agree with the advice to
"suspect" the optimization level. Try the lower level, but keep in mind that
it could either be a lucky rearrangement of your particular code or a bug in
the optimizer.

The -O3 level is widely held to be risky. To me, this means that there is risk
of bad code generated, not risk of the resulting program being too fast.

I think the realistic estimate of the probability of errors in gcc is either
1.0 or approaches 1.0, and with -O3 it is not lower.

Yes, lots of people compile lots of programs with gcc and g++, but every shop
I have seen has a set of coding standards that suggest or demand avoidance of
certain features.

There is a test of a program or specification that gives you a choice between
1) Obviously no deficiencies or 2) No obvious deficiencies. Doesn't seem to me
that gcc/g++ (or any other competing compiler) fits either of these criteria.

~~~
CoreDumpling
Some of us don't quite have the luxury of fixing our tools (MSVC++ I'm looking
at you). Considering the sheer complexity of languages like C++, it's
unsurprising that there will be problems here.

I've been a blamer of tools as well, but I've convinced myself that it's
ultimately wasted effort to do so. Better to simply come to terms with the
fact that there do exist bad tools, that they ought to be avoided when
reasonable, and that they should be used with care when necessary.

~~~
wglb
Yes, tools as big as C and C++ compilers are pretty hard to fix, but then
there was an effort to "fix" this by starting LLVM and clang, etc. One might
argue that pg is "fixing" Lisp by starting Arc, and that Rich Hickey is fixing
Java (and some say Lisp) by starting Clojure. In my case, I "fixed" Fortran by
getting into compilers and learning other languages.

So really "fixed" in this note I guess translates as "moved away from".

------
camccann
Corollary 1: A good workman uses tools that make it easier to diagnose his own
mistakes, since that's what most problems are. Valgrind is a beautiful
example.

Corollary 2: Tools that don't help you catch your mistakes are, in fact, bad
tools, and shouldn't be used.

~~~
doty
I understand the point, but I think corollary 2 is a bit too strongly worded.
Sometimes tools don't help you catch your mistakes but are fantastically
useful for other things. The fact that the assembler does exactly what you
tell it to is a prime example.

I think it could probably be re-worded as "use the safest tool you can to get
the job done."

------
ams6110
Agreed, if you have a bug in a C program your first thought should not be that
you're dealing with a problem in the compiler or standard libraries. However,
as you move farther down the chain, it's quite common to encounter bugs in
open source code, especially in projects which have a large body of
"contributed" modules (I'm thinking of Drupal as a particular example, not to
pick on that project, but I have found that the quality of contributed modules
is generally not very good, excepting the modules that are widely used).
Therefore I'd say that as a general rule, the likelihood of finding a bug in
your "tools" is inversely proportional to the number of users your tools have.

------
bonsaitree
As my dad was fond of quoting:

"A worker uses his tools. A journeyman knows his tools. A craftsman makes his
own tools."

------
dkarl
I love the way he describes "continuous integration, unit tests, assertions,
static code analysis, memory checkers and debuggers" as "scar tissue."
"Calluses" might be a better literal fit, but "scar tissue" feels right and
true.

------
machrider
Pragmatic Programmer has this covered: Select Isn’t Broken

"It is rare to find a bug in the OS or the compiler, or even a third-party
product or library. The bug is most likely in the application."

[http://www.pragprog.com/the-pragmatic-programmer/extracts/tips](http://www.pragprog.com/the-pragmatic-programmer/extracts/tips)

------
ssp
This is actually an example where valgrind _doesn't_ catch the bug. All it
sees is a write to the stack, which is a perfectly reasonable thing to do as
far as valgrind is concerned.

------
rdtsc
But I also like to say "You can tell a worker by his tools". The choice of
tools and how they are used tells a lot about the workman (a programmer in
particular).

    
    
      - Do they use source control?
      - Do they use unit tests?
      - What kind of language + environment do they use?
      - Do they have strong opinions about deficiencies of other tools that they specifically refuse to use? (This shows experience, perhaps.)

------
dkersten
I'm really not surprised. I read the other article and immediately thought of
the _select isn't broken_ story from The Pragmatic Programmer and of the
various bugs I have encountered myself, especially in C and C++, where it
would have been tempting to blame the tools but which in the end turned out to
be memory issues or race conditions.

Blaming the compiler is an easy cop out, and this shows that it rarely
actually is the case. This guy is experienced and clever, but he still fell
for the _it's a compiler bug_ trap. I'm glad he found the real bug and,
hopefully, learned his lesson.

I'll reiterate what I said in a comment to the other article - I think that
there should be a universal law of programming in languages like C that says
that you _must_, at a minimum, _turn compiler warnings up to maximum and make
sure the code compiles without any warnings; run your program through tools
such as valgrind and ensure it passes without error_. And if debugging
information or optimizations make bugs appear/disappear, then you need to
double-check everything - it's probably a memory alignment issue.

------
demallien
OK, I'm wondering if jgrahamc is sure of why that value is being overwritten
in his example... According to this document
([http://developer.apple.com/mac/library/DOCUMENTATION/Develop...](http://developer.apple.com/mac/library/DOCUMENTATION/DeveloperTools/Conceptual/LowLevelABI/130-IA-32_Function_Calling_Conventions/IA32.html)
) there are 4 registers pushed onto the stack - ESI, EDI, EBX, and EBP -
during a function call. Which means the attempt to write to ar[20] is actually
blowing away a register saved on the stack by the function prolog, not an
array allocated on the stack.

Of course, I've never disassembled MacOSX code, so maybe I'm forgetting
something important. If anyone can correct me, I'd be very grateful!

~~~
jgrahamc

            .text
      .globl _a
      _a:
            pushl   %ebp
            movl    %esp, %ebp
            pushl   %esi
            subl    $100, %esp
            call    L_getpid$stub
            movl    %eax, %ecx
            movl    $1808407283, -92(%ebp)
            movl    -92(%ebp), %eax
            imull   %ecx
            sarl    $3, %edx
            movl    %ecx, %eax
            sarl    $31, %eax
            movl    %edx, %esi
            subl    %eax, %esi
            movl    %esi, -76(%ebp)
            movl    -76(%ebp), %eax
            sall    $3, %eax
            addl    -76(%ebp), %eax
            addl    %eax, %eax
            addl    -76(%ebp), %eax
            movl    %ecx, %edx
            subl    %eax, %edx
            movl    %edx, -76(%ebp)
            cmpl    $0, -76(%ebp)
            sete    %al
            movzbl  %al, %eax
            movl    %eax, 8(%ebp)
            addl    $100, %esp
            popl    %esi
            leave
            ret
      .globl _main
      _main:
            pushl   %ebp
            movl    %esp, %ebp
            subl    $72, %esp
            movl    $0, -72(%ebp)
            call    _a
            movl    -72(%ebp), %eax
            leave
            ret

~~~
demallien
Yup, guess you're sure then ;) Though out of curiosity, have you actually
looked at the stack when it's running? Also, I can see ESI and EBP being
pushed onto the stack, but what's the missing third value? Update: Actually,
even more intriguing, why is the stack shifted by 100 bytes for the a()
function??? I would have expected it to be 72, as it was for the main()
function, i.e. 64 bytes for ar[16] and then 8 bytes for the two registers
that are pushed onto the stack.

Sigh. I'd love to actually have the time to dig out my Intel opcode books and
figure it out, that would be fun, but I have to get myself back into
Javascript headspace for work tomorrow morning. I wonder how many layers there
are between my Javascript code and processor opcodes...

~~~
l0stman
The instruction "call _a" pushes the address of the next instruction onto
the stack.

------
tmsh
I think this is a good point. But there's a balance. Questioning tools (blame
is borderline useless) can be useful of course.

I think it boils down to -- be humble about whatever context you're operating
in -- so as to understand it and use it fully. But that doesn't mean one
should stop questioning.

The fact that gcc doesn't warn about a lack of return values (without the
appropriate -W flags) is pretty silly. Maybe if you're programming assembly as
much as you are C, you don't have to worry about that. But most people who use
gcc don't exactly know assembly (though they should!). The tool itself is
fairly well-documented, but it's not perfect. Could be improved.

Actually, I don't know. Maybe it is good the way gcc and gdb require the
programmer to step up and be more active about the whole process (and maybe
learn assembly and how the machine is actually working). So I don't know the
right approach. But there is a risk in just 'accepting' tools as they are.
Hmm.

I think it's more like -- channel your problem with a tool into a better
understanding of what's going on.

------
InclinedPlane
Continuing:

A good workman doesn't use bad tools.

A good workman may curse his tools but he nevertheless still gets things done.

A good workman knows his tools well enough to know what they can and cannot
do, and does not ask them to do what they cannot.

A good workman is humble and takes the effort to identify a problem before
blaming a cause (whether it be the tools, himself, anomalous barometric
pressure, or what-have-you).

------
RyanMcGreal
>In fact, it's a sign of a very poor or inexperienced programmer if their
first thought on encountering a bug is to blame someone else.

Very true.

------
maurycy
Shorter version: a bad workman blames.

------
anonjon
Corollary (on second thought, Axiom):

Heisenbugs do not exist, computers are deterministic.

Things that do exist:

Pieces of code that I do not understand. Pieces of code that I have not read.
Pieces of code that interact in ways which I do not expect.

~~~
gte910h
>Heisenbugs do not exist

Perhaps in your universe.

In mine, I've seen video card temperature, random radio noise on an ungrounded
component, and power line noise all totally screw with programs - diagnosed to
the point where we could stimulate the errors through environmental changes.
There are dozens more errors that we think are caused by intermittent effects
but can't repro reliably.

Computers in theory are deterministic. In practice, they're machines in the
world.

------
hackermom
this topic came up not long ago, in a discussion related to php. i will stress
it again: in programming and programming languages today, there aren't any bad
tools - there are just bad workmen who can't (or simply refuse to) make do
with the tools at hand.

the comparisons of "you can't cut a cake with a fork" et al. just don't apply
here.

~~~
marshallp
you're saying php is equivalent to lisp and haskell and erlang, riiiiight,
ok....

~~~
marshallp
If you can judge a man by the shine on his shoe, you can judge a developer by
the syntax on his screen.

php is a 100 pound chain on the productivity of the entire industry. Companies
that have embraced php also happen to be examples of companies that achieved
way below their potential (yahoo and facebook).

It's funny that as we slay the beasts of Perl and J2EE, we bring up another
terrifying ghoul. It's probably why real engineers snicker when we call
ourselves software 'engineers'.

------
EliRivers
That is such a useful phrase. I'd hate to forget it. Here, could you write it
down for me on the back of this postage stamp using this banana to write with?
Thanks.

