I've had enough years to become wiser, become a fanatic for configuration management, and get over the embarrassment: I'm the consultant that screwed things up. Some background: the Stat department was running a variety of systems besides the Solaris workstations, and there was, within UNC-CH, a separate support organization that was cheaper and more comfortable with Microsoft products where Stat was sending their support dollars. When that organization needed Unix support, they called my employer, Network Computing Solutions, and I showed up.
There was effectively no firewall at UNC-CH at the time (something something academic freedom something something), and the Stat Solaris machines were not being regularly patched. Uninvited guests had infested them, and it appeared the most likely entry point was sendmail - at the time, it was the most notorious vulnerability on the internet. Since my preference to wipe and reload was unacceptable - too much downtime and too many billable hours - the obvious thing to do was update sendmail. The rest is history.
Yes. Seriously... I’m willing to pay for the shipping and everything. I’ve told this story countless times to people over the years. This and the “OpenOffice can’t print on Tuesday” bug are two of my favourite troubleshooting stories.
My father recently retired his printer (Epson LQ-850) not because the printer had failed, but because driver support was lacking. We had used the same printer since ~1992. He was super bummed; he had stockpiled ribbons and still had a full box of tractor-feed paper.
The explanation he was given by the person in charge of administering his university department's computers: at least on the system he was using, modern drivers expect a reply from the printer in response to commands, but the LQ-850 apparently only receives commands and never replies.
This is absolutely one of the key formative stories that helped me think about systems at light-speed scales.
I'm currently at the very early stages of building a science museum and will eventually try to incorporate this story into an exhibit about light speed. This, along with "nanoseconds" (foot-long pieces of wire like the ones Grace Hopper handed out), can truly help bring this topic to life.
I'm also attempting to use this as the basis for a blockchain-based "proof of proximity" in which a very high number of round-trip encryptions of the previous block's hash are stored in a bitcoin block. The number of round trips would be high enough that devices even a few hundred feet apart couldn't complete the task before the next block.
The sending mail server was configured with a zero timeout for connections. If it didn't IMMEDIATELY get a response from the destination mail server, it would fail. In practice this "immediate" failure took about 3 milliseconds, long enough that some servers could actually respond before it happened... but if the server was too far away (more than 500 miles), the connection would fail before the first packet could even get there, due to the finite speed of light.
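For anyone who wants to sanity-check that, here's a quick back-of-the-envelope sketch in Python (the ~3 ms timeout is the figure from the story; the speed-of-light and mile constants are standard, and the one-way/vacuum simplification is the same one the original write-up makes):

    # Can a packet even reach a host D miles away before a ~3 ms timeout fires?
    # One-way distance at vacuum c, as in the original write-up.
    C = 299_792_458          # speed of light, m/s
    MILE = 1_609.344         # metres per statute mile
    TIMEOUT = 3e-3           # the ~3 ms "immediate" failure described above

    def reachable(distance_miles, timeout=TIMEOUT):
        one_way_seconds = distance_miles * MILE / C
        return one_way_seconds < timeout

    print(C * TIMEOUT / MILE)              # ~558.8 miles of light travel in 3 ms
    print(reachable(400), reachable(600))  # True False -- hosts much past ~500 miles time out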
I read this story years ago and thought it was hilarious. Could've happened to anyone. In my book you're a near-celebrity and it's great that you can verify the story! Thanks for making things a little more interesting and a lot more fun :)
I’ve read this story many times; it's hilarious (and could happen to anyone). Thanks for filling in that background - HN is so great for these kinds of moments!
I love that this took a perfect storm of having a statistician and sys admin both bent on finding the cause of a weird intermittent problem in their own way.
This could have happened a million times where the story was a lot less interesting:
"Hey, I'm having weird intermittent problems sending email."
"Hmm, we're using the wrong version of Sendmail. All fixed, case closed."
Best part of reading this is coming away having learned that the units CLI exists. How did I spend 20 years in the shell without ever needing or discovering this?
One thing I got bitten by was the handling of Fahrenheit/Celsius, because it's a non-linear conversion between the two. When you ask to convert `10 degC` to `degF` you get 18, which is the delta in ºF corresponding to an increment of 10ºC. To get the absolute temperature, you have to ask to convert `tempC(10)` to `tempF`, which is 50, as expected.
"Non-linear" threw me off for a second - I almost never see the mathematically correct definition of linear in computer science spaces. For anyone wondering, Celsius to Fahrenheit is an affine transform, technically not linear, because you have to add an offset, not just multiply.
On the other hand, an equation of the form y = a x + b is a linear equation. If you have Celsius and want Fahrenheit you accomplish that by applying a linear equation (F = 1.8 C + 32), so I certainly can't fault people for saying that the transformation they are doing is linear.
I wonder what people would say for something using an equation of the form y = a x^2 + b x + c to transform something? I can't say that I've heard anyone talk of quadratic transformations. On the other hand, I can't think of ever transforming anything with a quadratic equation, so I've never had the need to speak of it.
(Also, he called it a linear conversion, not a linear transformation).
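To make the distinction concrete, here's a small Python sketch (illustrative arithmetic only, not the units source): converting a temperature difference just scales by 9/5, while converting an absolute temperature also needs the +32 offset, which is exactly what makes it affine rather than linear in the strict sense.

    def delta_c_to_delta_f(dc):
        # A temperature *difference*: what `10 degC` -> `degF` computes (10 -> 18)
        return dc * 9 / 5

    def temp_c_to_temp_f(c):
        # An *absolute* temperature: what `tempC(10)` -> `tempF` computes (10 -> 50)
        return c * 9 / 5 + 32

    print(delta_c_to_delta_f(10))  # 18.0
    print(temp_c_to_temp_f(10))    # 50.0
    print(temp_c_to_temp_f(0))     # 32.0, not 0 -- fails f(0) = 0, hence affine, not linear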
$ units
Currency exchange rates from FloatRates (USD base) on 2020-05-12
$ sudo units_cur
$ units
Currency exchange rates from FloatRates (USD base) on 2020-07-09
Looking at the source of the default configuration (cat /usr/share/misc/units.lib), I believe it only defines conversions for currencies that are pegged to another one (mainly to EUR or USD).
You have: 10 franc
You want: dollar
conformability error
1.5244902 euro
1 usdollar
You have: 10 franc
You want: euro
* 1.5244902
/ 0.655957
I didn't look too deeply into it; my understanding was that the source it uses to update itself has been taken offline. There are workarounds involving data massaging and a cron job, but honestly that's a lot more work than typing "1000 chf to usd" into ddg and getting the converted amount. But if you know something I don't, maybe you could share for everyone's benefit?
'units' was new to me too. The version I have on my Mac wouldn't accept 'millilightseconds' but it would take 'milli-c-seconds' - presumably the units.lib database is a little different from one in the original article.
In my intern days, some time around 10 years ago, a PI at the NASA GRC facility told me about a problem of this flavor that an old grad student of his had.
The guy was working on an optical sensor in a light-tight lab. Every morning, he came in, calibrated the sensor, and performed measurements. All morning, it held calibration with negligible drift. But when he came back from lunch, each time, the calibration had drifted off.
Could it be related to the time of day? He tried taking his lunch an hour earlier and an hour later. Each time, the calibration was rock solid until right after lunch.
In spite of protocol, he tried eating lunch in the lab, no one else in or out. Before lunch: good calibration. After lunch: bad calibration.
He tried not eating lunch at all. That day, the calibration held all day.
How could an optical sensor have any concept of whether its user had eaten lunch? It turned out it only had to do with the lunch box. The sensor was fiber coupled, and it was sensitive to changes in transmission losses caused by changes to the local bend radii of the patch cord. Every morning, the grad student set his lunch box down on the lab bench, nudging the fiber into some path. After eating, he’d replace his lunch box on the bench, nudging the fiber into a different path.
After that, the fiber was secured with fixed conduit, and lunch boxes no longer entered the lab.
That would be the ‘past’ link, though it doesn’t turn anything up in this case as the title on this post is different. (The usual title is “The case of the 500-mile email,” but this copy is missing the subject line for some reason so the submitter used a representative phrase instead.)
Yes, this story gets posted a lot, and many of us might know it, but at the same time, I like to think about the ones who didn't. They will learn something new today. XKCD said it better than I could: https://xkcd.com/1053/
Genuine perennial favs are worth repeating every so often --- a year or two's interval seems reasonable, and is vouched for by HN.
A marker of aging for me was seeing, a decade or two after I'd first read them in the local paper as a callow youth, repeats of previous features, by topic if not the actual text. Eventually the thought occurred to me that perhaps the versions I'd remembered were themselves not original.
People tell, and repeat, and embellish, stories. Sometimes because the young'uns and whippersnappers and new arrivals haven't heard them yet. Sometimes because they're just damned good stories and we enjoy the retelling.
The reason people share links from the past isn't some passive aggressive "UGH reposts amirite?!" like what you appear to be doing -- it's because past discussion on a fun read has lots of fun morsels of comments, and it's fun to revisit them alongside today's discussion.
HN doesn't have a rule that there should only be one canonical submission for every individual link or topic. That you thought your comment would contribute anything leads me to believe that you aren't aware of that.
Check again. We tend to enumerate previous threads that got traction and have past discussion. It's nice to go back to read those.
But most of what this person linked had 0 to 1 comments and like 2 points. I mean, yes, I would expect an interesting story from 2002 to have at least 30 failed submissions on HN that never got traction over 18 years.
The ending of this makes it sound super clean. 3 ms * speed of light => ~560 miles. "It all makes sense!"
But ... isn't the speed of light through fiber actually like 2/3 of the speed of light in a vacuum? And that fiber isn't going to be laid out in a straight line exactly to the destination. So I think really there must have been a fair bit of uncertainty around that ~3ms to abort a connection.
Was gonna say. The speed of light through cables or fiber optics is roughly 2/3 the speed of light through vacuum. Also, I don't see how it would know it has established a connection until the round trip has happened. All in all, it probably waited more like 10 ms, if this story were true, which it probably isn't.
The number for copper is not a fixed quantity; it varies with the type of cable. Electric fields are outside the copper wire, not inside: the copper conductor acts as a waveguide for the electromagnetic wave. So it turns out that things around the cable have an effect on the speed of propagation [1], particularly the insulator. Bare copper wire in a vacuum would be very close to c. In the case of fibre, the issue is the index of refraction.
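Redoing the arithmetic with a typical velocity factor shows how much slack the clean 500-mile figure really has. A quick sketch in Python, using the ~2/3 factor and ~3 ms timeout quoted in this thread:

    # Distance covered in ~3 ms at c versus at ~2/3 c (typical for fibre)
    C = 299_792_458      # m/s, vacuum
    MILE = 1_609.344     # metres per mile
    TIMEOUT = 3e-3       # seconds

    for label, velocity_factor in [("vacuum", 1.0), ("fibre, ~2/3 c", 2 / 3)]:
        miles = C * velocity_factor * TIMEOUT / MILE
        print(f"{label}: {miles:.1f} miles")
    # vacuum: ~558.8 miles, fibre: ~372.6 miles -- and the cable is never laid in a straight line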
Truly this is a worldly gem. Thank you for submitting this. :)
It's easy to forget that, even though transmissions travel at near light speed, they still take more than an instant to reach their destination, even digitally. I should keep this in mind, I think.
IMO even though this has been posted a bunch of times it’s important to repost these sort of campfire ghost stories so future developers can think more creatively about strange errors.
It should be recommended reading for new IT support staff. While you get unhelpful "it's not working" requests all the time, when the user does give you information about the working / not-working scenarios, you should always consider it, even if it doesn't make any sense.
Only the other week we were doing some testing on a new HCI (hyper-converged infrastructure) I'm doing the network for. At the end of the test period, we were having some storage sync issues. Everything seemed to PING ok, except my colleague happened to notice that large jumbo frames over 8000 bytes were getting dropped. I double-checked that we hadn't inadvertently changed the network configuration. It was only by chance, during another test looking at transceiver signal levels, that a customer engineer saw an alarm on the RX level. It was then we remembered that one test was to remove a module. I then noticed some error counts. We shut down that particular link until we could visit the site. Sure enough, that fibre wasn't quite clicked in anymore. There was enough of a bridge across the fibre air gap for small packets, but the gap was just wide enough that large packets statistically couldn't be error-corrected enough to get through.
Made me think about the precision required for some errors to occur. I've had sort of similar things happen, where it's almost impossible to reproduce the problem when you try!
As a hobbyist sound engineer, usually regular cables are the first I check, but maybe I should extend that to fibre?
Interesting error nevertheless, and honestly, checking fibre cabling for those kinds of errors would probably be a bit lower on my list, unless I saw a lot of transceiver errors.
Can someone please explain to me the POP reference? I do not understand what this author means by that.
I also would like help understanding what $ units gives. The command looks to be "units", but where do the numbers he entered come from? I would appreciate the extra context.
$ is the shell prompt; he's not typing it.
"3 millilightseconds" is the distance light travels in 3 milliseconds, the time a "zero" timeout would take to actually timeout. (This comes directly from the definition of the lightsecond: how far light travels in one second)
"miles" is what he wants to see that distance converted to. Turns out it's 558 miles; one mile is 0.00179 of 3 millilightseconds.
Edit: Found it - it's /usr/share/units/definitions.units (on Pop!_OS, so probably the same on Ubuntu/Debian).
The FAQ[0] mentions:
> units on SunOS doesn't know about "millilightseconds."
> Yes. So? I used to populate my units.dat file with tons of extra prefixes and units. And actually, I think I was using AIX to run units; I don't know if it knew about millilightseconds. Take a look at the units.dat shipped with Linux these days. It definitely knows about millilightseconds.
I tried locate for units.dat but couldn't find it. Anyone know where it is? Not keen on running a system-wide find.
This story makes me smile every time it comes up. It's fascinating how many arbitrarily coded limits we keep breaking as we make our tech go faster without re-assessing the original assumptions :)
There was an odd situation where some of the systems were unable to make connections to other systems... some largish distance away. It was fairly variable, but some systems were almost always unreachable, and some were occasionally reachable.
Long investigation, but in summary, this happened because the default packet TTL was, for some reason, set to a fairly low value in a minor kernel update. I simply increased that number in the kernel, recompiled, and all of the problems went away.
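A toy sketch of why a too-low default TTL produces exactly that "nearby hosts fine, distant hosts unreachable" pattern (the hop counts and TTL values below are made up for illustration): every router hop decrements the TTL, and the packet is dropped once it runs out.

    # Hypothetical hop counts: a packet survives only if the path length
    # does not exceed its starting TTL.
    def survives(hops_to_destination, initial_ttl):
        return hops_to_destination <= initial_ttl

    for ttl in (10, 64):                       # a too-low default vs a typical one
        print(ttl, [survives(hops, ttl) for hops in (5, 9, 15, 25)])
    # With TTL 10, the 15- and 25-hop destinations silently fail; with 64, everything is reachable.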
The style is very reminiscent of The Register's BOFH line of stories (though without the mischievousness) - now those are mostly fiction, but they're amusing nonetheless...
Haha, this was interesting. I posted the same story with more or less the same title a couple of months ago [1], and no one saw it and there were no comments. This time it got almost 1000 points and lots of comments.
What I think is interesting is how the same thing can get such different traction. I wonder what factors make something get traction or not?
Since I've worked with Linux email servers (sendmail, qmail, postfix, exim, etc.) practically my whole professional life (since around 1997, but I used BBSs since '91 - I'm 41 now), this story really amused me and got my attention! I love this kind of email debugging! LOL
I've heard this story before, but I didn't realize it was as recent as 2002. It feels like something from a much earlier, bygone era, like the early 90s.
The one thing that throws a wrench in this story for me: Lattes.
Lattes in 1994? In North Carolina? No way. Maybe on the West Coast, but I moved to Cali in 1989 and they were a rarity until the mid-late 90's. There were only 425 Starbucks in the US in 1994 (from their site). The "fancy coffee" craze was just a blip on the radar in the mid 90's but gaining momentum.
> The "fancy coffee" craze was just a blip on the radar in the mid 90's but gaining momentum.
Friends premiered in 1994, with Central Perk being a major set piece of the show. I mean, yeah, it's New York City and not North Carolina, but college towns anywhere are going to be early in trends.
A latte in 1994 seems plausible to me. I remember getting them from a Gloria Jeans in my local suburban mall around 1990 or so.
You're not wrong about the shape of the trajectory, but all throughout the 80s the coffee shop/latte trend was slowly building steam (heh) before it went hockey stick in the mid-90s.
In ‘94 (if not earlier), I was drinking lattes at a mom and pop coffee shop in a tiny town in the Midwest. And at another indie coffee shop at the nearest major university campus. That place was open 24 hours and busy at all hours. I didn’t even know what Starbucks was, but I sure knew lattes and cappuccinos.
So yeah, lattes in ‘94 in a major college town seems totally plausible.
Agreed, I graduated high school in 1995, and I was dating a college girl in Rome, Georgia. Our favorite hangout was a coffee shop that served, among other things, lattes and frappes.
Definitely possible. Chapel Hill isn't like the rest of North Carolina, so I'd expect something like that to appear here before other parts of the state. And I remember the Books-a-Million in Wilmington started adding a cafe / Starbucks-like area for "fancy coffee" around 1996 or so. I have no problem believing there were shops serving lattes in Chapel Hill during the era this story is described as happening in. And to be fair, the author even says in the FAQ that he's not sure about the exact date(s). It could have been as late as 1997.
> My guess, from the office I remember being in, the coworkers I remember speaking about this to, and some other such irrelevant but timely details, place it somewhere between 1994 and 1997.