
The case of the 500-mile email - Jach
http://www.csua.berkeley.edu/~ranga/humor/500_mile_email.txt
======
jerf
Interestingly, this story actually helped me once. I was debugging an issue
where some people in the multi-national company I work for (many offices)
could connect to the server and some people could not, and it was thanks to
this story that I seriously entertained the possibility that it had something
to do with timing, due to the geographical distribution of the problem
reports. Subsequent investigation revealed that for various uninteresting
technical reasons, the server was setting a 200ms timeout on the SSL
negotiation part of the connection, which closely matched the ping times of
the participants, including the people who could sometimes connect and
sometimes not. And I discovered this far faster by knowing I should be poking
around on timing issues, rather than searching out something deeper.

(I am deliberately eliding other details not relevant to this aspect of the
story. My point is merely that I am glad I had encountered it. Could easily
have saved me a day or two.)

------
mcav
Most interesting to me was the "units" utility. I didn't know it existed.

~~~
groaner
I think fewer people are aware of it these days when you can simply hit up
google:

<http://www.google.com/search?q=.003+light+seconds+in+miles>

Though it seems they haven't gotten millilightseconds to work.

~~~
vtail
As always, WolframAlpha is your friend:

<http://www.wolframalpha.com/input/?i=3+millilight+seconds>

558.8 miles or 899.4 kilometers

They even have "comparisons", e.g. for 30 millilight seconds:

≈ 1.4 × Amazon River length (≈ 6400 km)
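The arithmetic behind these figures is simply distance = speed of light × time. Here is a minimal Python sketch. It assumes the vacuum speed of light (exactly 299,792,458 m/s) and the international mile of 1609.344 m, and takes the 6400 km Amazon figure from the comparison above. `light_distance` is a hypothetical helper name:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by the SI definition of the metre
METRES_PER_MILE = 1609.344            # international statute mile

def light_distance(seconds: float) -> tuple[float, float]:
    """Return (miles, kilometres) that light covers in a vacuum in `seconds`."""
    metres = SPEED_OF_LIGHT_M_PER_S * seconds
    return metres / METRES_PER_MILE, metres / 1000

# 3 millilightseconds, the figure from the story
miles, km = light_distance(0.003)
print(f"{miles:.1f} miles or {km:.1f} kilometres")  # 558.8 miles or 899.4 kilometres

# 30 millilightseconds compared with the ~6400 km Amazon River length quoted above
_, km_30 = light_distance(0.030)
print(f"{km_30 / 6400:.1f} x Amazon River length")  # 1.4 x Amazon River length
```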

------
chime
This one has made the rounds on reddit a few times. I shared this personal
story last time it came around:

In 2005 at my job we had a pretty severe problem just as unexplainable. The
day after an unscheduled closing (hurricane), I started getting calls from
users complaining about database connection timeouts. Since I had a very
simple network with fewer than 32 nodes and barely any bandwidth in use, it was
quite scary that I could ping the database server for 15-20 minutes and
then get "request timed out" for about 2 minutes. I had performance monitors
etc. running on the server and was pinging the server from multiple sources.
Pretty much every machine except the server was able to talk to the others
constantly. I tried to isolate a faulty switch or a bad connection but there
was no way to explain the random yet periodic failures.

I asked my coworker to observe the lights on a switch in the warehouse while I
ran traceroutes and unplugged different devices. After 45-50 minutes on the
walkie-talkie with him saying "ya it's down, ok it's back up," I asked if he
noticed any patterns. He said, "Yeah... I did. But you're going to think I'm
nuts. Every time the shipper takes away a pallet from the shipping room, the
server times out within 2 seconds." I said "WHAT???" He said "Yeah. And the
server comes back up once he starts processing the next order."

I ran down to see the shipper and was certain that he was plugging in a giant
magnetomaxonizer to celebrate the successful completion of an order. Surely
the electromagnetic waves from the flux capacitor were causing a rip in the
space-time continuum and temporarily shorting out the server's NIC card 150
feet away in another room. Nope. All he was doing was loading up the bigger
boxes on the pallet first and then gradually the smaller ones on top, while
scanning every box with the wireless barcode scanner. Aha! It must be the
barcode scanner's wireless features that probably latch on to the database
server and cause all other requests to fail. Nope. A few tests later I realized
it wasn't the barcode scanner since it was behaving pretty nicely. The
wireless router and its UPS in the shipping room were configured right and
seemed to be functioning normally too. It had to be something else, especially
since everything was working fine just before the hurricane closing.

As soon as the next time out started, I ran into the shipping room and watched
the guy load the next pallet. The moment he placed four big boxes of shampoo
on the bottom row of the pallet, the database server stopped timing out! This
had to be black magic! I asked him to remove the boxes and the database server
began to time out again! I did not believe the absurdity of this and spent
five more minutes loading and unloading the boxes of shampoo with the same
exact result. I was about to fall down on my knees and start begging for mercy
from the God of Ethernet when I noticed that the height at which the wireless
router was placed in the shipping room was about a foot lower than the top of
the four big boxes when placed on the pallet. We were finally on to something!

The wireless router lost the line-of-sight to the outside warehouse anytime a
pallet was loaded with the big boxes. Ten minutes later I had the problem
solved. Here is what happened. During the hurricane, there was a power failure
that reset the only device in our building that wasn't connected to a UPS - a
test wireless router I had in my office. The default settings on the test
router somehow made it a repeater for the only other wireless router we had,
the one in the shipping room. The two wireless nodes were only able to talk to
each other when there were no pallets placed between them and even then the
signal wasn't too strong. Every time the two wireless routers managed to talk,
they created a loop in my tiny network and as a result, all packets to the
database server were lost. The database server had its own switch from the
main router and hence was pretty much the furthest node. Most other PCs were
on the same 16-port switch so I had no problems pinging constantly between
them.

The 1-second solution to this four-hour troubleshooting nightmare was me
yanking off the power to the test router. And the database server never timed
out again.

~~~
aw3c2
A "normal" person might have simply "turned it off and on again" and thus
fixed it. Interesting how you wanted to understand what's going on and went
through a lot of trouble to get there. I'd have done the same.

~~~
rmc
Well the first problem is to find the thing to turn off and on again. It's not
always clear what device is misbehaving.

Also, frequently you'll need to report to the "higher ups" what went wrong.
Saying "I dunno, I turned it off and on again and it works now. _shrug_ "
makes you look like you don't know what's going on. It's much better to be
able to explain what happened.

------
gxs
Relevant FAQ about the story by the author of the story:
<http://www.ibiblio.org/harris/500milemail-faq.html>

------
gjm11
A lovely story (which I urge anyone reading this to go and read; it's much
geekier than it probably sounds). But wait: isn't there a factor-of-2 error
right at the end, and doesn't that rather spoil it? Or am I missing something?

~~~
CoreDumpling
The author admits to taking some liberties with the story, which he clarifies
here (you appear to be concerned about #8):
<http://www.ibiblio.org/harris/500milemail-faq.html>

------
decadentcactus
Read it before, but enjoyed reading it again.

------
evansolomon
This was posted a while back, very entertaining story:
<http://news.ycombinator.com/item?id=385068>

~~~
chaosmachine
Just over 500 days ago...

------
demallien
Fascinating, although I'm guessing the author isn't an electrical engineer,
because for me, the timeout on the messages would have been the absolute first
thing that I checked as soon as I'd confirmed the 500 mile limit. Although
that's probably because I spent a certain amount of time at the start of my
career detecting faults in fibre optic and copper installations using
reflectometers to work out how far down the line the problem was...
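A reflectometer relies on the same distance-from-delay idea, except that the pulse travels out to the fault and back, so the measured delay covers the distance twice and must be halved. Below is a minimal Python sketch. `fault_distance_m` is a hypothetical helper, and the velocity factor (the signal's speed as a fraction of light in a vacuum) is an assumed value; in practice it depends on the cable:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def fault_distance_m(echo_delay_s: float, velocity_factor: float) -> float:
    """Distance to a reflection, given the delay between pulse and echo.

    The pulse travels to the fault and back, so the one-way distance is
    half of (propagation speed * delay).
    """
    propagation_speed = velocity_factor * SPEED_OF_LIGHT_M_PER_S
    return propagation_speed * echo_delay_s / 2

# Hypothetical reading: echo after 1 microsecond on a cable with an
# assumed velocity factor of 0.66.
print(f"{fault_distance_m(1e-6, 0.66):.1f} m")  # 98.9 m
```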

------
miles
If you're running OS X, it pays to grab gunits
<http://gunits.darwinports.com/>

Here's why:

    $ units
    500 units, 54 prefixes

    $ gunits
    2411 units, 71 prefixes, 33 nonlinear units

~~~
ianferrel
Can you provide an example of a nonlinear unit?

~~~
moxiemk1
They mean that the conversion is defined by a function rather than a scaling
factor.

The example that the manual gives is Centigrade to Fahrenheit, which,
interestingly enough, _is_ a linear function.
<http://www.gnu.org/software/units/manual/units.html#Nonlinear-units>

~~~
lmkg
It depends on your definition of linear. In some algebra contexts you would
call it a linear function because it's a line, but in linear algebra you would
not consider it a linear transformation because it doesn't pass through the
origin[1]. It would be more accurate to call it "affine."

[1] More generally, being "linear" means that _f(ax+y) = af(x) + f(y)_ for
constant _a_ and variables _x, y_. From this, it follows that _f(0) = 0_ for
linear transformations.
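The footnote's condition can be checked numerically for the Celsius-to-Fahrenheit map f(c) = 9/5·c + 32. Below is a minimal Python sketch with hypothetical helper names, using exact fractions to avoid floating-point noise. One failing point is enough to show that a map is not linear, but passing at a single point proves nothing on its own:

```python
from fractions import Fraction

def c_to_f(c):
    """Celsius to Fahrenheit: an affine map, 9/5 * c + 32."""
    return Fraction(9, 5) * c + 32

def scale_only(c):
    """The linear part of the same map, with the +32 offset dropped."""
    return Fraction(9, 5) * c

def satisfies_linearity(f, a, x, y):
    """Check the footnote's condition f(a*x + y) == a*f(x) + f(y) at one point."""
    return f(a * x + y) == a * f(x) + f(y)

print(c_to_f(0))                                  # 32, not 0
print(satisfies_linearity(c_to_f, 2, 10, 5))      # False (77 vs 141)
print(satisfies_linearity(scale_only, 2, 10, 5))  # True
```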

------
doron
Recalling older versions of sendmail literally sent a shiver down my spine; we
had something akin to a voodoo ritual whenever we needed to change some
configuration.

------
grk
I like how the boss tried to figure out what's wrong before complaining. Now
it's usually "AAA IT'S BROKEN GO FIX IT".

~~~
JustinSeriously
Me too. Though in this case the "boss" was a statistician.

The way the professor initially gathered data, instead of immediately asking
for help like a normal user, made me think this was going to turn into a joke.
Like the one that begins with, "a manager, an engineer, and a programmer are
driving through the mountains..."

