Raymond Chen explains what the Y2K was like at Microsoft [video] (onmsft.com)
137 points by douche on Dec 21, 2016 | 75 comments

One thing that I found interesting was the various media coverage of the time describing Y2K as a 'non-event'.

The media - and consequently the public - was generally unaware that it was a non-event because of the massive resources poured into making it so. Instead they wrote it up as a bunch of hype and paranoia over nothing.

IMO it's one of the greatest unsung engineering success stories of our time.

Yes, indeed! As someone who also sat in a room (with many other people) watching the clock tick until the stroke of midnight I still get pissed off if people remember y2k as the event that "did not happen". Nothing happened because most of 1998 and 1999 was spent running around customer sites checking & patching software.

I work with industrial control systems and the oldest code comment that was found about the year 2000 problem was in code from the early 80s. The programmer had long since retired, but that code was still running oil refineries.

There were very few large chemical installations in the Pacific, so we waited for updates from sites in New Zealand. After those sites rolled into the new year without problems, everybody relaxed and we knew that our software installed at Middle East oil & gas sites would work OK, so people would have petrol and diesel in the new millennium. The rollovers in the EU and US regions were easy after that.

"I still get pissed off if people remember [it] as the event that "did not happen""

Welcome to the life of any operations engineer, my friend. If we're doing our job right, no one notices that the insanely complex and down-to-the-wire maintenance goes smoothly.

"Nobody gets credit for fixing problems that didn't happen." [0]

[0] http://www.agsm.edu.au/bobm/teaching/SimSS/Shayne/RepenningS...

It's the life of any job where you maintain something. Even janitors aren't noticed until they quit and the trash starts to stink.

I run OPS and Tech at a startup. A very thankless job I must say.

Ever tried cleaning a school? Students can even be heard saying things like "It's their job to pick up after me so it doesn't matter that I throw garbage on the floor".

I was on an island in Sydney harbour sitting behind someone with a handheld video camera. As the first firework exploded, his video camera turned off. Everyone looked around at the buildings, waiting for the lights to flicker.

> the oldest code comment that was found about the year 2000 problem was in code from the early 80s.

Do you remember what that comment was? I mean, was it something like

    /* TODO: this won’t work after 1999-12-31, fixme */
or was it more like

    /* Using four bytes to handle dates beyond 2000 */

It was a comment from someone who understood that 2 digit year notation was bad and explained why he used the full year notation. At this time memory was still counted in kilobytes so it probably was an expensive use of two extra bytes.

When trouble is solved before it forms, who calls that clever? When there is a victory without battle, who talks about bravery?

Sun Tzu

    According to an old story, a lord of ancient China once asked 
    his physician, a member of a family of healers, which of them 
    was the most skilled in the art.

    The physician, whose reputation was such that his name became 
    synonymous with medical science in China, replied, 

    "My eldest brother sees the spirit of sickness and removes it 
    before it takes shape, so his name does not get out of the house.

    "My elder brother cures sickness when it is still extremely minute, 
    so his name does not get out of the neighborhood.

    "As for me, I puncture veins, prescribe potions, and massage skin, 
    so from time to time my name gets out and is heard among the lords."

    - Translator's Introduction, Taoism and The Art of War
      The Art of War, Sun Tzu, Thomas Cleary

Some things did break on Jan 1st 2000. We found that our 16 bit Windows client failed (blocked) SSL validation, and that some customers still used the 16 bit client!

(It wasn't the certificate having a date in the future, which they all would have had for around a year by that point. It was getting the local time and using that to decide if the certificate was valid.)
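
For illustration, a minimal Python sketch of that failure mode. The names are hypothetical, and the way the clock goes wrong here (rebuilding the year as 1900 plus a two-digit value) is an assumption, not a description of the actual 16-bit client. The point is only that a perfectly good certificate fails validation when "now" comes from a broken local clock.

```python
from datetime import datetime


def cert_is_valid(not_before, not_after, now):
    """Accept the certificate only if `now` falls inside its validity window."""
    return not_before <= now <= not_after


def buggy_local_time(real_now):
    """Hypothetical Y2K bug: the year is rebuilt as 1900 + two-digit year."""
    return real_now.replace(year=1900 + real_now.year % 100)


# Illustrative validity window, not taken from any real certificate.
not_before = datetime(1999, 1, 1)
not_after = datetime(2001, 1, 1)

rollover = datetime(2000, 1, 1, 0, 0, 1)
ok_with_correct_clock = cert_is_valid(not_before, not_after, rollover)
ok_with_buggy_clock = cert_is_valid(
    not_before, not_after, buggy_local_time(rollover)
)
```

With a correct clock the check passes; with the assumed bug the client believes it is 1900 and rejects the certificate as not yet valid.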

Yeah... I wondered the same when nothing happened, it seemed like a big deal for nothing.

Then the next day when I opened ACDSee it said my license had expired. It was a pirated license that previously had shown valid until 2050 or something.

All of a sudden it put the whole thing in perspective, and I realized that those things could happen anywhere causing small bugs, and much bigger ones in systems where dates actually mattered.

I had a PDP-11/74 running RSTS that wouldn't boot after Y2K. I think I had to dig around inside it to reset the clock before it would turn on again. It was an old machine at the time, but I'm pretty sure there were still some like it in the field, so someone was pulling their hair out on January 1st.

But the counterpoint to your argument is that there were lots of countries and companies that were being derided as not being Y2K-ready and nothing much happened to them as well. It's very hard to say that Y2K preparedness work was unnecessary but it is not clear that a lot of it was justified.

I formerly worked as an escalation support engineer in Microsoft's Product Support Services (PSS) for Windows networking. I, and a lot of other people, were at our desks in Las Colinas, TX for the Y2K rollover. Nothing happened. We got a press call from a reporter asking if anything was going on. I said I wasn't allowed to talk to the press, but half the people on the floor were already drinking.

We knew early on that the Windows dev teams had done their job as Y2K hit Australia and nothing happened. Then Europe and nothing happened.

Even though I "missed out" on the big Y2K celebrations and instead had to celebrate with a bunch of nerd coworkers in a boring office building, it felt good to be a part of something where we all banded together and pulled off a major piece of work.

This was before MS had a formal support lifecycle, right?

I'm not sure if I follow your question. I worked in Windows NT networking escalation/debug support. We did NT4 and Windows 2000. You had to have a support contract with MS or work off some Pay Per Incident vouchers. Some people had Premier contracts and got to speak with me quickly (but already had first line dedicated engineers), but I mostly handled escalations from our outsourced support specialists (Keane in Arizona and one other I can't remember).

The Pro customers (non Premier) definitely drove the most support volume and often had the more difficult cases. Premier cases were often "hey what if" cases that we answered quickly or were straight bugs. Believe it or not, bug defects are easy because they're usually obvious and the path forward is also obvious (install checked builds to get more info and/or set up break points and remote debug for crashes).

So do you know for example when they exactly ended NT 3.51 updates?

Right before I started in the end of 1999.

Last hotfix I can find is this: https://support.microsoft.com/en-us/kb/253518

> The media - and consequently the public - was generally unaware that it was a non-event

Really? I was only 10 during y2k and not techie at all (yet).

I remember it everywhere...news, school discussions, tshirts, TV, general conversation (similar to zombie apocalypse emergency plans some people talk about).

If anything, I'd say the hype was a little too high. Planes falling out of the sky, nuclear weapons detonating, etc.

Still...I agree. Zero issues. Very impressive.

That's sort of his point. The narrative after the Y2K rollover was that it had been a whole lot of hype and didn't end up being a big deal after all. Conveniently ignoring the fact that the reason it wasn't a big deal was the massive amount of effort expended to make the transition seamless.

If you believed the hype your toaster oven would have exploded on Y2K because it had a clock on the front.

A lot of work went into fixing the real problems but there also was a lot of fuss about nothing.

Y2K at Microsoft was boring as hell. What wasn't boring as hell was the long period of time leading up to it. But everyone here knows that. New Year's Eve 1999 and the hours following was probably the most Age of Empires I've ever played in one sitting.

EDIT: "for an estimated issue programmers were not taking into account when applying the Gregorian calendar rule to software."

First, what the hell does that even mean? Anyway, no, it was taken into account. What, you think programmers didn't know what would happen when 2000 rolled around? What wasn't taken into account was that the software would still be running ten, twenty years later.

And most of the programmers who assumed that their software wouldn't be around that long were probably right. It was just a few who had the misfortune of creating successful software that ran into the problem.

In ops, boring is good. :) Great work guys, we had no issues with our MS Software.

I recall spending significant hours during 98 and 99 on the various systems I was involved with, to check every tick box possible - applying updates where needed and verifying each and every piece of hardware and software.

Number of issues after we'd put in all the hard work: Zero. My reaction to all the people complaining about the lack of issues: Well, doh, _really_? What did you actually expect would happen after so much time and money spent?? Go home.

I worked Y2K by myself in the NOC at an ISP in New Zealand (aka UTC+13:00)

* Got paid $1000 bonus

* Called up by local reporter asking if any problems - Nope

* Called by several of our vendors/providers in the US just to say "Hello", and to casually ask how things were going.

* One ISP in Australia took themselves offline for the evening

* Rumors were going around Australia that all the power and the phones in NZ had died.

* The one little problem with a provider on the evening got a lot of people closely looking at it.

* There was fog and I didn't get to see the fireworks down town.

The weather was truly awful. I went up One Tree Hill and just saw clouds light up with the fireworks.

Raymond Chen is the tech writer who has influenced me the most. His blog has really helped me understand how large long-running software projects work (or at least, how they work at Microsoft), and how apparently strange features can come into existence in a sensible way.

One interesting evolution over time is how Windows moved from basically trusting all applications (which they had to do in Windows 3.X anyway, as any badly behaving app could crash the whole system) to treating applications as bad actors.

I haven't seen anything from him on the whole Windows 10 forced update fiasco. I'm curious how that played out internally.

I think he writes his blog posts a year or so in advance. https://blogs.msdn.microsoft.com/oldnewthing/20090227-00/?p=...

He mostly discusses only things where there is an interesting code angle; with the updating there is really just politics.

About Y2K: during that time my family kept hearing that really bad things could happen at midnight with the coming of the new millennium. One of the main things that could happen, I think, was the loss of electrical power. So, a couple of seconds before the new year I strategically positioned myself next to the light switch and when the clock hit midnight and everybody started yelling "happy new year" I turned the light off and yelled "we lost power". I wish I could have recorded the faces of shock and fear in some of my family members. It was funny as hell.

I'm sure it's not how the whole Y2K process was running, it's just how the exact New Year's Eve event was planned.

I was personally also on ready-to-go-to-the-office stand-by, even though I worked not at Microsoft but at a much, much smaller company. My team did serious work making the code we were responsible for Y2K-proof; we spent a good part of 1998 on that.

In short, we worked for almost a year to make that "nothing serious happened" result possible in our own domain of responsibility. The highest management understood the problem and the process was properly planned. That "nothing serious happened" for our code is a success story about the quality of the tests we wrote and used to fix the real problems. And those problems did exist. I'm sure there are other HN readers who can tell similar stories.

The computer-related risks are a serious subject.

I remember following the news on New Year's Eve to see what was happening in Japan and Australia. Since "nothing serious" happened there either, I was ready to bet that everything would be fine, and that enough other companies had also taken the subject seriously and acted early enough.

I'll admit to a pause for reflection around 9pm UK time (midnight Moscow time). After that, no worries. Ada has robust time and date libraries.

I hadn't thought of that as a serious possibility, but a friend and I considered writing a novel on it.

The basic idea was that a bunch of US weapons systems wouldn't work due to Y2K - things like ballistic missile navigation. So the US was frantically trying to get all this patched so that they would have a credible defense after Y2K. The Chinese knew this, and launched on New Years Day...

... and completely missed, because they had used borrowed Russian code for their ballistic missile navigation. The Russians had stolen US ballistic missile navigation code, which had the Y2K issue in it.

> I hadn't thought of that as a serious possibility

Not serious?



"Launch code for US nukes was 00000000 for 20 years"


So it was not actually eight zeros but apparently 6 zeroes and the "key under the doormat" (in the safe, but not really something you needed the president to access, the opposite of what was claimed then).

That's all very different from "launch because of a Y2K bug", though...

No technology is perfect, it's always a process.

And effectively every computer-related technology has undiscovered bugs.


""The board wishes to point out," they added, with the magnificent blandness of many official accident reports, "that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system." (...)

(...) really important software has a reliability of 99.9999999 percent. At least, until it doesn't. "

The statistics are against that generous number of nines.

I wonder when the wind up for the 2038 problem (32 bit signed int and unix epoch) starts.

If you're thinking it's already largely resolved, this is from 64 bit mysql. I assume similar issues exist in other software.

  mysql> SELECT UNIX_TIMESTAMP('2037-11-13 10:20:19');
  | UNIX_TIMESTAMP('2037-11-13 10:20:19') |
  |                            2141742019 |
  1 row in set (0.00 sec)

  mysql> SELECT UNIX_TIMESTAMP('2039-11-13 10:20:19');
  | UNIX_TIMESTAMP('2039-11-13 10:20:19') |
  |                                     0 |

See also "System call conversion for year 2038" (https://lwn.net/Articles/643234/) for a brief discussion of the Linux case.
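
For anyone who wants to see where the 2038 limit comes from, here is a small Python sketch. It is only an illustration of signed 32-bit wraparound, not a description of how MySQL or the Linux kernel store time internally; the helper name `to_int32` is made up for the example.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Reinterpret an integer as a signed 32-bit value, wrapping on overflow."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def from_epoch(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# The last second a signed 32-bit counter can represent.
last_good = from_epoch(INT32_MAX)

# One second later the counter wraps to the most negative value.
wrapped = to_int32(INT32_MAX + 1)
after_wrap = from_epoch(wrapped)

# A date from 2039 simply does not fit in 32 signed bits.
in_2039 = int((datetime(2039, 11, 13, tzinfo=timezone.utc) - EPOCH).total_seconds())
fits = in_2039 <= INT32_MAX
```

`last_good` is 2038-01-19 03:14:07 UTC, and `after_wrap` lands back in December 1901, which is the same class of bug as a two-digit year rolling from 99 to 00.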

Works in PostgreSQL 9.6.1

  postgres> SELECT extract(epoch from to_timestamp('2037-11-13 10:20:19', 'YYYY-MM-DD hh24:mi:ss'));
  (1 row)

  postgres> SELECT extract(epoch from to_timestamp('2039-11-13 10:20:19', 'YYYY-MM-DD hh24:mi:ss'));
  (1 row)

Well, we still have 21 years to solve it

A bit pedantic, I suppose, but Y2K problems showed up well ahead of Y2K. Things like credit cards with expiration dates well in the future, and validation code making comparisons.

Yes, our system doesn't work on some operating systems because, for some reason, we use dates after 2050 as a flag for something. That breaks when we do a test run on Windows but works fine on Linux.

Unless of course, you are trying to store the maturity date of a thirty year bond or mortgage.

You store the ISO date, you don't need a UNIX epoch to calculate that

You "store" it that way (as a string? no), but how are the relevant systems (database, OS, application) handling calculations with it?

You should never store or work with dates as unix epoch-based timestamps. Use proper date/time datatypes, and manipulate them with library functions. Naive integer arithmetic on unix timestamps will bite you in the ass a hundred different ways.
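
A rough Python sketch of the difference, using the thirty-year bond from upthread. The issue date is arbitrary and `add_years` is a hypothetical helper; clamping February 29 to February 28 is one possible convention, not a rule from any particular financial standard.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day `years` later, clamping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reachable for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


issue = date(2016, 12, 21)
maturity = add_years(issue, 30)

# Calendar-aware day count versus the naive "365 days per year" guess.
actual_days = (maturity - issue).days
naive_days = 30 * 365
```

The date type lands on 2046-12-21 directly and stores cleanly as `maturity.isoformat()`, while the naive integer arithmetic is off by the seven leap days in between.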

> Use proper date/time datatypes, and manipulate them with library functions.

The point is: What do they do? There are plenty that will use... Unix timestamps (or worse). Or use those as the lowest common denominator for interchange. There are more systems at play than the data layer, and all of them will take every opportunity to bite you in the arse.

Certainly a good idea, but sometimes it's difficult to tell if OS functions and 3rd party libraries you depend on are doing that.

The ext4 filesystem in Linux, for example, uses epoch time. I think they have it all fixed now, but there were bugs still out there as recently as this year: https://bugzilla.kernel.org/show_bug.cgi?id=23732

That many years ago it was probably stored on tape and processed on a mainframe by a COBOL program. No database; it was probably stored as a string, and the application would likely have been fixed to handle it before it was a problem.

That COBOL program almost certainly _is_ a database – not a SQL database but the COBOL language standard specifies structured records and allows you to specify whether a file is sequential or indexed (see e.g. https://en.wikipedia.org/wiki/COBOL#Data_division).

In the case of Y2K, the problem was often that people defined a date field as three values all declared as PIC 99 (i.e. a two-digit number). If they migrated to 4 digits, we're fine until Y10K. If they added a window (values less than n are 20xx, the rest 19xx) or switched to Unix timestamps or a SQL database – there are products which let a COBOL runtime transparently map the indexed file semantics to SQL statements – then it requires more information to say whether it's at risk.
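
A minimal sketch of the windowing approach in Python, under the usual convention that two-digit years below the pivot fall in 20xx and the rest in 19xx. The function name and the pivot value of 50 are assumptions chosen for the example; real systems picked their own pivots.

```python
def expand_year(yy, pivot=50):
    """Expand a two-digit year using a fixed window around `pivot`."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    # Below the pivot means 20xx, at or above it means 19xx.
    return 2000 + yy if yy < pivot else 1900 + yy


examples = {yy: expand_year(yy) for yy in (99, 0, 49, 50)}
```

Note that this only moves the cliff: with a pivot of 50, the data is ambiguous again as soon as real dates reach 2050.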

I work with a COBOL database, and interestingly all of the dates are declared as four PIC 99 fields, like this:

      03  ITM-LC-DATE-CC    PIC 99.
      03  ITM-LC-DATE-YY    PIC 99.
      03  ITM-LC-DATE-MM    PIC 99.
      03  ITM-LC-DATE-DD    PIC 99.
I assume this split between 'CC' and 'YY' is a relic of the Y2K updates, so they could add in a whole new field with a default of '19'.
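
As a sketch of how such a record could be read outside COBOL, here is some Python. It assumes the four PIC 99 fields are stored as eight contiguous display digits (CCYYMMDD), and both helper names are hypothetical; `upgrade_legacy` mimics the guessed migration of prefixing old six-digit values with a default century.

```python
from datetime import date


def parse_ccyymmdd(record):
    """Parse eight display digits laid out as CC, YY, MM, DD."""
    if len(record) != 8 or not record.isdigit():
        raise ValueError("expected exactly eight digits")
    cc, yy, mm, dd = (int(record[i:i + 2]) for i in range(0, 8, 2))
    return date(cc * 100 + yy, mm, dd)


def upgrade_legacy(yymmdd, default_century="19"):
    """Prefix an old six-digit YYMMDD value with a default century."""
    return default_century + yymmdd


modern = parse_ccyymmdd("20161221")
migrated = parse_ccyymmdd(upgrade_legacy("991231"))
```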


  | TIMESTAMPDIFF(YEAR,CURDATE(), "2050-01-01") |
  |                                          33 |
Seems to work fine

That's just the database, and one that we happen to know works.

The original comment referred to exactly this database, so it is relevant


One of the top results for "convert date to integer". People who have an integer on one hand, a datetime on the other are going to see this code and use it (hey, it works) without understanding the ramifications.

I suppose one can only hope that such code is brittle enough that other problems bring it down before 2038.

My first professional programming job was to fix a Y2K bug in 1995 in an embedded system that hadn't shipped yet

I was born on January 1, 1960.

So January 1, 2000 was going to be either my 40th birthday, or else my -60th. :)
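
The arithmetic behind that joke, as a tiny Python sketch (function names are made up for the example):

```python
def age_two_digit(birth_yy, current_yy):
    """Age computed from two-digit years only, as pre-Y2K code often did."""
    return current_yy - birth_yy


def age_four_digit(birth_year, current_year):
    """Age computed from full four-digit years."""
    return current_year - birth_year


in_1999 = age_two_digit(60, 99)          # fine while both years are 19xx
in_2000_buggy = age_two_digit(60, 0)     # year 2000 stored as 00
in_2000_fixed = age_four_digit(1960, 2000)
```

The two-digit version gives 39 in 1999 and then -60 in 2000; the four-digit version gives the expected 40.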

Where I worked after college, Y2K was the land of honey. It was a .gov that essentially was able to get unlimited overtime funds for Y2K. Everyone participated and my understanding is that people were "working" 18-20 hours a day in preparation for Christmas 1999.

y2k was a massive deal that was totally solved by great engineering. The full range of engineering approaches, the full range of talents, the full range of management interference brought to bear in every country, company, department etc. and all of it worked out equally well so there was no problem. Us engineers, we're amazing, we should take a bow.

Or maybe, just maybe it was just a teensy weensy bit overblown so consulting companies could charge really big fees especially to government departments and banks. Easiest way to sell is to scare the st out of people then show them "the solution." I bought an invasion like that once, regret it now...

We spent the two years prior to Y2k replacing our client's old mission critical servers and services with new 'Y2k certified' hardware and software.

I personally think this contributed significantly to the dot.com era growth.

Yep, and subsequent bust.

Many, many companies were getting not insignificant cash inflows in the run-up to 2000, which made their balance sheets look great, until the end of the 2000 tax year in 2001, when they didn't look so great any more.

That's a great point that I had forgotten.

In terms of y2k effort, it seemed to me to be a small amount of code change and a disproportionately high amount of testing. The ramp up in hiring for those changes and certification, combined with the lost opportunity cost, did contribute to a later contraction.

Here's one thing that puts it in perspective though. The amount of code in the world that was Y2K-affected was considerably less than the total amount of code that exists today. And stepping outside the HN startup bubble (where code is relatively young and modernized) and into the types of companies affected by Y2K, there are many efforts that are as bad or worse occurring all the time. Stuff like regulatory changes (SarbOx or various banking reforms), technology uplift (e.g., 32bit --> 64bit, Win32-->.NET, platform ports, etc.) are as bad or worse based on the amount of legacy code it affects.

But what Y2K did is teach a lot of these companies how to solve sweeping, codebase-wide problems.

Now that I realize, problems caused by Y2K were much fewer than problems caused by leap seconds or "unexpected" leap days

Only because the former was widely known and discussed, and measures were taken, while the latter was typically badly understood and almost nobody cared.
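
On the "unexpected" leap days: the full Gregorian rule is short, and 2000 is exactly the year where a half-remembered version of it goes wrong. A quick Python sketch, checked against the standard library's `calendar.isleap`; the "shortcut" function is a hypothetical example of the mistake, not code from any real system.

```python
import calendar


def is_leap(year):
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_shortcut(year):
    """Common mistake: forgetting the divisible-by-400 exception."""
    return year % 4 == 0 and year % 100 != 0


agrees_with_stdlib = all(
    is_leap(y) == calendar.isleap(y) for y in range(1600, 2401)
)
```

`is_leap(2000)` is true while `is_leap_shortcut(2000)` is false, so code using the shortcut loses February 29, 2000.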

Quick, do you know, does your computer system ever display the 61st second? And do you think it should?

It most likely does, but blink and you'll miss it.

The answer is: it is complicated.

POSIX:2001 specifies:

"As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds."

And that's typically no problem for normal use. The specialists know that there are different time standards and that for "real" number of seconds one has to use TAI, not UTC.
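
To make the POSIX rule concrete, here is a small Python check around the leap second that was inserted at the end of 31 December 2016. It only uses `calendar.timegm` from the standard library, which does plain 86400-seconds-per-day arithmetic.

```python
import calendar

# Last ordinary second of 2016 and first second of 2017, both in UTC.
before = calendar.timegm((2016, 12, 31, 23, 59, 59))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))

# The inserted leap second, 23:59:60, has no timestamp of its own.
leap = calendar.timegm((2016, 12, 31, 23, 59, 60))

posix_elapsed = after - before
```

`posix_elapsed` is 1 even though two SI seconds passed between those two instants, and the leap second itself maps to the same value as midnight. That missing second is the difference you need TAI for.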

The problem in the Unix world with NTP and the datetime algorithms was that some programmers believed that they had to actually see the leap second on their own computers in the kernel timestamps, to the point of the kernel intentionally producing discontinuities in kernel time (behavior which never made sense for timestamping purposes but was implemented anyway). So now we have configuration variations like this:


and, to avoid Linux kernel discontinuities:


In fact, the smoothing of UTC and using TAI for those who need the "real number of seconds" since point x was known as the reasonable approach long ago:


Now it's clearer why it's complex: too many people locally "assumed" what was not to be assumed and didn't understand the effects of their local decisions and the global context.

Hopefully "smearing" will get standardized and accepted and nobody will have to care, except the specialists who really need TAI. The leap second corrections should be invisible for normal uses, just like nobody cares for correcting the clocks for much bigger differences.
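
A rough idea of what a linear smear looks like, in Python. This is a sketch under my own assumptions (a 24-hour window ending at the leap second, and hypothetical function names), not the scheme used by any particular NTP server or vendor.

```python
def smear_offset(t, leap_at, window=86400.0):
    """Fraction of the extra second already absorbed at true elapsed time `t`.

    `t` and `leap_at` are seconds on a continuous (TAI-like) scale.
    """
    start = leap_at - window
    if t <= start:
        return 0.0
    if t >= leap_at:
        return 1.0
    return (t - start) / window


def smeared_clock(t, leap_at, window=86400.0):
    """Clock reading that absorbs the leap second gradually, with no step."""
    return t - smear_offset(t, leap_at, window)


leap_at = 1_000_000.0  # arbitrary illustrative instant
samples = [leap_at - 86400.0 + i * 3600.0 for i in range(25)]
readings = [smeared_clock(t, leap_at) for t in samples]
```

During the window the clock runs slightly slow, so the readings never jump or go backwards, and by the end they are exactly one second behind the continuous scale.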

This hacker-fiction y2k is fun (if quite dated now): https://www.amazon.com/Wyrm-Mark-Fabi/dp/0553578081
