

Diagnosing weird problems – a Stack Overflow case study (2012) - luu
http://codeblog.jonskeet.uk/2012/03/16/diagnosing-weird-problems-a-stack-overflow-case-study/

======
jzwinck
This sort of "impossible," "the data I get is not the data on disk" type of
error happens fairly often in my experience. On Linux, a very handy tool is
strace(1), which lets you see the system calls being made by a program. Often
the culprit can be seen when open(2) is called, e.g. the file being opened is
not the expected one.
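
As a rough illustration, here is a minimal Python sketch that scans a saved strace log for open/openat calls and reports which paths were actually opened and whether each call succeeded. It assumes a log produced with something like `strace -f -e trace=open,openat -o trace.txt ./program`. The function and type names (`parse_open_calls`, `OpenCall`) are hypothetical. Paths are left in strace's escaped form, and lines split by `<unfinished ...>` are simply skipped.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches strace lines such as:
#   openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
#   open("/tmp/missing", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional "[pid N] " or "N " prefix (as produced with -f) is tolerated.
_OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'
    r'(?P<call>openat|open)\('
    r'(?:(?:AT_FDCWD|-?\d+),\s*)?'           # openat's directory fd, if any
    r'"(?P<path>(?:[^"\\]|\\.)*)"'           # the quoted path argument
    r'[^)]*\)\s*=\s*(?P<ret>-?\d+)'          # flags/mode, then the return value
    r'(?:\s+(?P<errno>E[A-Z0-9]+))?'         # errno name on failure
)


class OpenCall(NamedTuple):
    call: str             # "open" or "openat"
    path: str             # path as printed by strace (still escaped)
    ret: int              # file descriptor, or -1 on failure
    errno: Optional[str]  # e.g. "ENOENT", or None on success


def parse_open_calls(lines: Iterable[str]) -> List[OpenCall]:
    """Return every open/openat call found in the given strace output lines."""
    calls = []
    for line in lines:
        m = _OPEN_RE.match(line.strip())
        if m:
            calls.append(OpenCall(m["call"], m["path"], int(m["ret"]), m["errno"]))
    return calls


if __name__ == "__main__":
    sample = [
        'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
        '[pid  4242] open("/home/me/config.ini", O_RDONLY) = -1 ENOENT (No such file or directory)',
    ]
    for c in parse_open_calls(sample):
        print(c.call, c.path, c.ret, c.errno or "")
```

Listing the opened paths next to their results makes it easy to spot a program reading a different file than the one you expected, or silently failing over to a fallback.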

The cool thing about strace is you don't need access to the source code. I
recently debugged a program which took 60 seconds to run on one computer vs.
an expected time of 1 second. In strace I saw that it made thousands of
requests to a remote server, so the round-trip time determined the program's
execution time. It turned out to be calling getpwent(), which ends up sending
one network request per user in the domain (!). This was easily replaced by
getpwuid(), and the program then sent only one request per run.
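
To show the difference in shape between the two approaches, here is a minimal sketch using Python's Unix-only `pwd` module. It assumes the goal is to map a UID to a user name. `pwd.getpwall()` enumerates every entry (CPython implements it with setpwent/getpwent/endpwent), while `pwd.getpwuid()` does a single keyed lookup. The helper names are hypothetical, and the "one request per user" cost applies only when the passwd data comes from a remote source, as in the setup described above.

```python
import os
import pwd
from typing import Optional


def username_by_scan(uid: int) -> Optional[str]:
    # Enumerates the whole passwd database via getpwent(). If the entries
    # come from a network directory, the cost grows with the number of users.
    for entry in pwd.getpwall():
        if entry.pw_uid == uid:
            return entry.pw_name
    return None


def username_by_uid(uid: int) -> Optional[str]:
    # One keyed lookup via getpwuid(), independent of how many users exist.
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # raised when no entry has this UID
        return None


if __name__ == "__main__":
    uid = os.getuid()
    print(username_by_uid(uid), username_by_scan(uid))
```

Both functions should normally return the same name. With a local /etc/passwd both are fast, so the difference would only be expected to show up against a remote directory with many users.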

If Mr. Skeet had used a tool similar to strace (which exists on Windows thanks to
SysInternals), he might have noticed that the bytes read by the MD5 program
were not the same as those read by the Java one, even if the OS hid the entire
path redirection process from view.
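
A complementary in-process check is to hash exactly the bytes the program itself reads and compare the result with the external MD5 tool's output. Here is a minimal Python sketch of that idea. The names `md5_of_file` and `md5_report` are hypothetical, and the report format merely imitates md5sum's "digest  path" layout. Note that `os.path.realpath()` only resolves symlinks and relative components; it cannot reveal OS-level path redirection of the kind mentioned above, which is exactly why comparing digests (or using a syscall tracer) is useful.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    # Hash the bytes this process actually reads through `path`,
    # in fixed-size chunks so large files are never loaded at once.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_report(path: str) -> str:
    # "digest  path" line, resolving symlinks only (not OS redirection).
    return f"{md5_of_file(path)}  {os.path.realpath(path)}"


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(md5_report(p))
```

If the digest printed here differs from the one an external tool gives for the "same" path, the two programs are not reading the same file.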

~~~
AjithAntony
>a tool similar to strace (which exists on Windows thanks to SysInternals)

In case anybody reading this didn't know:

Procmon: http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

