
Microsoft Teams outage due to expired certificate - stygiansonic
https://techcrunch.com/2020/02/03/microsoft-teams-has-been-down-this-morning/
======
akerro
Is anyone else redirected by this url to
[https://guce.advertising.com/collectIdentifiers?sessionId=3_...](https://guce.advertising.com/collectIdentifiers?sessionId=3_cc-session_3982b38b-2de1-484a-98bd-4a0b8635308c) ?

[https://gfycat.com/fortunateyawningcopperhead](https://gfycat.com/fortunateyawningcopperhead)

My router is blocking this domain on DNS level in OpenWRT, techcrunch.com is
redirecting me there, so I can't visit this page.

Edit: I literally can not visit any techcrunch.com article, all of them
redirect me to this dodgy ads+tracking domain. It doesn't matter if I came
from google, DDG, reddit or HN.

~~~
frio
It's a Javascript based redirect; block all JS on the page and it'll stop
happening. Super annoying.

~~~
the8472
_> It's a Javascript based redirect_

No it isn't.

    
    
        GET https://techcrunch.com/2020/02/03/microsoft-teams-has-been-down-this-morning/
        HTTP/2 307 Temporary Redirect 157ms

~~~
frio
Huh, sorry for that then. I used to get this all the time with TC, Engadget
and a few other sites, but since turning JS off, it's stopped. Something else
must have changed around the same time I tinkered with it.

------
stygiansonic
A similar problem with Azure happened way back in 2013:
[https://www.computerworld.com/article/2495453/microsoft-s-az...](https://www.computerworld.com/article/2495453/microsoft-s-azure-service-hit-by-expired-ssl-certificate.html)

More recently, it happened with Ericsson:
[https://www.theverge.com/2018/12/7/18130323/ericsson-softwar...](https://www.theverge.com/2018/12/7/18130323/ericsson-software-certificate-o2-softbank-uk-japan-smartphone-4g-network-outage)

This article has some information about how Let's Encrypt enabled an
"automated process that handles renewals":
[https://duo.com/decipher/proposal-to-make-https-certificate-...](https://duo.com/decipher/proposal-to-make-https-certificate-expire-yearly-back-on-the-table)

I wonder if such a process should be made an industry standard? Does anyone
know if there are any proposals for it?

~~~
cbhl
Let's Encrypt literally is an implementation of the industry standard; the
standard is called Automatic Certificate Management Environment.

[https://tools.ietf.org/html/rfc8555](https://tools.ietf.org/html/rfc8555)

~~~
cpitman
But, somewhat annoyingly, it is only seen as applicable to the public
internet. There's no effort to make ACME based CAs for non-internet usage.

~~~
pmlnr
For internal use, create and distribute your own root CA with self-signed
certificates.

~~~
eliaspro
And by letting smallstep/certificates [1] handle ACME, it's just as easy as
using LetsEncrypt for public certificates.

[1]
[https://github.com/smallstep/certificates](https://github.com/smallstep/certificates)

~~~
cpitman
This is great, thanks for sharing.

------
godelmachine
I am going to seize this opportunity and rant out my angst against Microsoft’s
worst product to date.

Has anyone else felt that Teams is a heavy app that takes a long time to
come alive?

Even during calls, the quality is so horrible that I don’t even want to
describe the pain I go through. There’s strong distortion and voices are never
heard clearly.

~~~
m4tthumphrey
You are not alone, but MS has market share and easy reach with 365. Slack (as
an example) is a much better product, but MS can afford to be lazy. Hopefully
when Slack releases SIP support, MS will be forced to improve.

~~~
noelsusman
If you're a decent sized organization, you can pay $12.50 per user per month
to Slack for a messaging service, or you can pay $20 per user per month to
Microsoft for Outlook, PowerPoint, Word, Excel, Access, Publisher, Exchange,
Sharepoint, and a messaging service. If you just want messaging then you can
pay $8 per user per month, and they'll throw in Exchange, Sharepoint, and a
terabyte of cloud storage per user on top of it.

I don't think it's possible for Slack to be better enough to justify that kind
of a price difference.

~~~
luckylion
That's probably true for cheap labor, but ~50% of developers I know will try
to switch into other teams when they are forced to use Teams because it's that
annoying. When you pay somebody thousands of dollars each month, paying an
extra 12.50 to make him not look for a different job isn't really something
you should need to think about a lot.

------
novok
There should be a page out there that lists Microsoft outages due to missed
certificate or domain expiration. Like that Hotmail one a long time ago.

I would think at this point they would be their own major certificate
authority and maybe domain registrar.

~~~
klodolph
> I would think at this point they would be their own major certificate
> authority and maybe domain registrar.

From experience this probably wouldn't fix things.

What often happens is that somebody creates a system that uses a certificate,
doesn’t automate renewal, and then the person responsible for renewing it
changes teams or leaves the company. Email reminders only go so far—they not
only need to go into the right inbox, but the person watching that inbox has
to care.

~~~
jeffrallen
My last domain expiration outage happened like that.

If it's in production, just buy a 10 year cert. This virtually guarantees an
outage after 10 years but virtually guarantees it won't be your fault when it
happens...

~~~
tialaramex
> just buy a 10 year cert

New certificates in the Web PKI ("SSL certificates") have a maximum lifespan
of 825 days. This is enforced (if a CA were to issue a certificate with a
longer lifespan Chrome for example would just treat this certificate as
invalid). The commercial CAs mostly offer one year or two years, with renewals
using the 825 day limit to offer renewals in the overlap, so e.g. you buy two
years in June 2018, in April 2020 you can pay for two year renewal and the new
certificate expires in June 2022 not April 2022.

If you're using certificates in your own PKI (as it's likely Microsoft
actually was in this particular incident) then there's no need to buy them and
it's up to you what your appetite for risk is on when they expire.
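
The renewal arithmetic above can be checked with a few lines of Python. This
is only a sketch: the comment names months, not days, so the 15th of each
month is an assumption made for illustration.

```python
from datetime import date

MAX_LIFETIME_DAYS = 825  # Web PKI limit quoted in the comment above


def lifetime_days(not_before: date, not_after: date) -> int:
    """Validity period of a certificate in days."""
    return (not_after - not_before).days


# Assumed dates: only the months come from the comment, the 15th is made up.
original_expiry = date(2020, 6, 15)  # two-year cert bought in June 2018
renewal_issued = date(2020, 4, 15)   # renewal paid for two months early
renewed_expiry = date(2022, 6, 15)   # original expiry plus two years

days = lifetime_days(renewal_issued, renewed_expiry)
print(days, days <= MAX_LIFETIME_DAYS)  # 791 True
```

With these dates the renewed certificate is valid for 791 days, so the two
months of overlap fit under the 825-day cap.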

------
gwbas1c
Another HN post mentioned that a lot of collaboration sites were down because
many people are telecommuting due to the Wuhan virus.

I honestly thought this was why Teams was down for me.

~~~
cesarb
> Another HN post mentioned that a lot of collaboration sites were down
> because many people are telecommuting due to the Wuhan virus.

For future reference, it was probably this one:
[https://news.ycombinator.com/item?id=22222121](https://news.ycombinator.com/item?id=22222121)

------
matt_morgan
This has happened to me enough times to be embarrassing. It seems to happen to
other people who you'd think have some sensible way to avoid it.

Is there a reminder service out there that specializes in your long-term
expiring things? I'm not sure what would be different about it than a regular
calendar, but it seems like many of us need something that makes this easier.

~~~
Someone1234
There are multiple reminder services.

At larger companies a lot of the issue isn't literally generating reminders,
it is making sure they're sent to the correct people/departments and are
actually actioned by someone.

For example you sometimes have reminders sent to ex-employees, or sent to a
mailing list and everyone assuming everyone else is going to action it. Or the
reminder gets ping-ponged between multiple managers via email with nobody
either able or willing to deal with it.

None of these are tech issues, and they don't have technology solutions as a
consequence. So whenever I see an embarrassing expired cert, I don't assume
technical malfunction, I assume political malfunction.

~~~
jalk
I guess they should send the reminders to the Slack channel that the ms ops
team uses ;-)

~~~
outworlder
Yup. And hope that the channel is not being similarly used for many other
things and generating alert fatigue.

------
ChuckMcM
It is an interesting side effect of the tenure of software developers these
days that any process that requires action on a > 2 year interval is likely to
fail; if the cycle is 5 years or more it will _always_ fail.

The turnover ensures that nobody in the department was there when the process
was started/last interacted with, and so it is off the collective
organizational radar so to speak.

~~~
ChrisSD
This is why Let's Encrypt has a short cycle.

~~~
tialaramex
Not really. The short (90 day) Let's Encrypt expiry is intended to promote
automation because it's annoying to do so many renewals by hand, and is also a
reflection of the relatively short lifetimes of most Internet names.

Historically it was common to issue 3 year certs, and five year certs weren't
rare (until 2015). But whilst it's reasonable to expect microsoft.com or
bbc.co.uk to belong to the same outfit in five years, it's hard to be as sure
about say jsnes.org (currently a Javascript NES emulator) or catandgirl.com (a
web comic by Dorothy Gambrell) which might well entertain offers from somebody
else who wanted those names.

The underlying domain name is typically on an annual renewal cycle with
perhaps just 14 days grace if you stop paying, and individual FQDNs might have
even shorter turnaround. With a five year certificate this means you could buy
a certificate the day before your renewal payment is due, and then still have
an apparently good, working certificate for that name five years later when
it's owned by somebody else entirely who has no idea you once owned that name.
Not great. Let's Encrypt's renewal cycle closes this gap considerably. The BRs
(the CA/Browser Forum Baseline Requirements) were also amended; the limit is
now 825 days instead of 39 months or (originally) 60 months.
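
To put rough numbers on that gap, here is a small sketch of the worst case
described above: a certificate issued the day before the registration runs
out. The function name is made up for illustration, the month-based lifetimes
are rounded to days, and the grace period is ignored.

```python
def stale_days(cert_lifetime_days: int, registration_days_left: int) -> int:
    """Days a certificate stays valid after the registration behind it ends."""
    return max(0, cert_lifetime_days - registration_days_left)


# Lifetimes mentioned in the comment; the month-based ones are approximate.
LIFETIMES = {
    "60 months (approx.)": 1826,
    "39 months (approx.)": 1187,
    "825 days": 825,
    "90 days": 90,
}

for label, lifetime in LIFETIMES.items():
    # Worst case: the certificate was issued one day before the name lapsed.
    print(f"{label}: {stale_days(lifetime, registration_days_left=1)} days")
```

Under these assumptions the old 60-month limit leaves almost five years in
which a previous owner holds a working certificate, while a 90-day
certificate leaves under three months.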

------
gjsman-1000
I'm actually curious: Is there a market for a SaaS which simply keeps track of
certificates and when they expire? (Perhaps even with an auto-Deploy new
certificate mechanism?)

~~~
gerdesj
Perhaps, but I call it doing my job. I set up an SSL cert check on Icinga for
each system as needed. It is quite trivial to roll your own script or find one
that can be run from cron. Maintaining an account with a SaaS would probably
take more work.
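
A roll-your-own check of the kind described might look roughly like this,
using only the Python standard library. It is a sketch, not the Icinga plugin
mentioned above: the function names, the 30-day threshold and the return codes
are all assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter string of the certificate served by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: datetime) -> int:
    """Whole days from now until a notAfter string such as
    'Jun 15 00:00:00 2022 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def check(host: str, warn_days: int = 30) -> int:
    """Return 0 if fine, 1 if expiring soon, 2 if the TLS handshake fails
    (which is what happens once the certificate has already expired)."""
    try:
        not_after = fetch_not_after(host)
    except ssl.SSLError as exc:
        print(f"{host}: TLS verification failed: {exc}")
        return 2
    left = days_remaining(not_after, datetime.now(timezone.utc))
    print(f"{host}: {left} days left")
    return 0 if left > warn_days else 1
```

A cron job could call `check("example.org")` for each host and raise an alert
on any non-zero result. Note that this only sees certificates that are
reachable over the network from wherever the script runs.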

~~~
randyrand
Yes, but will your company remember to fill your job when you leave?

~~~
inkeddeveloper
job security?

------
jamiesonbecker
They tweeted that it was an _authentication_ certificate. I.e., probably not a
regular TLS domain certificate or similar (still could be TLS client cert
though), but probably more like a certificate/key that one service used to log
into another. A lot of microservice/container/kubernetes setups use them for
all kinds of stuff, which is really a big step forward over password logins.

Not like it matters, but it kinda does, because those tend to be private and
internally generated, and not necessarily signed by an external certificate
authority.

~~~
insomniacity
I have this problem, and it concerns me, because there's no external polling
check that's going to spot that.

You have to rely on either the code itself checking each time it uses the
certificate, and alerting.

Or (taken from elsewhere in this thread) you test it during your build, and
hope that someone is still building the code by the time the cert comes up for
renewal.

I'll probably be doing both.
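
A minimal sketch of the first option, checking at the point of use and
alerting. It assumes the expiry timestamp has already been read from the
certificate by whatever X.509 library the service uses; the logger name, the
30-day window and the function names are made up for illustration.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

log = logging.getLogger("cert_expiry")  # hypothetical logger name


def expiry_status(
    not_after: datetime,
    now: datetime,
    warn_before: timedelta = timedelta(days=30),
) -> str:
    """Classify a certificate as 'ok', 'expiring' or 'expired'."""
    if now >= not_after:
        return "expired"
    if now >= not_after - warn_before:
        return "expiring"
    return "ok"


def check_on_use(name: str, not_after: datetime,
                 now: Optional[datetime] = None) -> str:
    """Call this wherever the service loads the certificate."""
    current = now if now is not None else datetime.now(timezone.utc)
    status = expiry_status(not_after, current)
    if status == "expired":
        log.error("certificate %s expired on %s", name, not_after.isoformat())
    elif status == "expiring":
        log.warning("certificate %s expires on %s", name, not_after.isoformat())
    return status
```

The same `expiry_status` function could back the build-time option: a test
that fails unless the result is `"ok"`. Either way the log lines still have to
reach somebody who acts on them.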

------
certera
I'm going to shamelessly plug my project, Certera, here. It handles
monitoring/tracking, cert issuance and renewals and helps larger organizations
manage their certificate needs more consistently.

[https://docs.certera.io](https://docs.certera.io)

~~~
reagan83
I just tried to go to www.certera.io to learn more and got a “connection not
private” warning page in Safari on iOS. Very ironic :)

~~~
cdolan
I did not have this issue on iPad OS

~~~
woofcat

      ~# dig -t A +short www.certera.io
      certera-io.github.io.
      185.199.108.153
      185.199.110.153
      185.199.111.153
      185.199.109.153
    
      ~# dig -t A +short certera.io
      185.199.108.153
      185.199.109.153
      185.199.110.153
      185.199.111.153
    

Looks like [https://www.certera.io](https://www.certera.io) is going to
GitHub, which is only returning a cert for itself and not for his domain name.

~~~
snowwrestler
[http://www.certera.io](http://www.certera.io) redirects properly to
[https://certera.io](https://certera.io).

[https://www.certera.io](https://www.certera.io) fails the certificate check.

It's a good example of the difficulty of getting TLS perfectly right.

In theory this setup is fine; the default behavior of all the browsers when
typing "www.certera.io" is to interpret it as a request for
[http://www.certera.io](http://www.certera.io).

But if the client has anything in place that automatically upgrades http to
https before submitting the request, you're going to need a valid cert for the
www subdomain in place or you'll throw a cert error before reaching the
redirect.

Even if your site omits the www subdomain in production (as certera does), a
lot of users will just type it in anyway. So, you better be ready to handle
that request via https.
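
The mismatch can be illustrated with a simplified hostname check. This is a
sketch only: the SAN lists are assumed, not read from the real certificates,
and real clients apply more rules than this one-label wildcard match.

```python
from typing import List


def san_matches(hostname: str, pattern: str) -> bool:
    """Match a DNS name against one SAN entry. A leading '*' label covers
    exactly one label; everything else must match exactly."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pat_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pat_labels):
        return False
    if pat_labels[0] == "*":
        return host_labels[1:] == pat_labels[1:]
    return host_labels == pat_labels


def cert_covers(hostname: str, san_entries: List[str]) -> bool:
    """True if any SAN entry on the certificate covers the hostname."""
    return any(san_matches(hostname, entry) for entry in san_entries)


# Assumed SAN lists, for illustration only.
host_default_sans = ["*.github.io", "github.io"]
apex_only_sans = ["certera.io"]

print(cert_covers("www.certera.io", host_default_sans))  # False
print(cert_covers("www.certera.io", apex_only_sans))     # False
print(cert_covers("certera.io", apex_only_sans))         # True
```

Neither assumed certificate covers the www name, which is why the https
request fails before any redirect can be served.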

~~~
certera
You're spot on. I was aware of this limitation of GH pages and once I make
money, I can start spending on actual hosting. I explained more above.

------
_bxg1
Amazing. A company like Microsoft could afford to hire an entire department to
do nothing but make sure certificates don't expire, but this still happens.

Jokes aside, I don't understand how this problem hasn't been solved in the
general case.

~~~
wnevets
> I don't understand how this problem hasn't yet been solved in the general
> case.

Isn't that what ACME is supposed to do?

[https://en.wikipedia.org/wiki/Automated_Certificate_Manageme...](https://en.wikipedia.org/wiki/Automated_Certificate_Management_Environment)

~~~
_bxg1
And yet

~~~
naikrovek
Microsoft are not even close to the only company this happens to. The problem
is a human one - as people change positions within a company, or leave a
company, responsibility for this kind of thing can fall through the cracks.

Happens all the time, even if ACME is employed, and it's unlikely to ever stop
happening.

~~~
_bxg1
All I'm saying is, that downtime probably cost them a lot of money, and
resulted from something that's extremely preventable. Even if it is a human
problem, one would think they'd allocate the resources necessary to solve it.

~~~
naikrovek
yes, in an ideal world it wouldn't happen. and if you really make it a
priority you can certainly mitigate it.

------
ce4
Does anyone know what endpoint's certificate had expired?

It would be interesting to know which CA they used for it and whether it's a
SAN certificate.

Edit: here's the certificate log of the teams subdomain but I couldn't find
the one that expired today in it
[https://crt.sh/?q=teams.microsoft.com](https://crt.sh/?q=teams.microsoft.com)

~~~
chrishas35
There were no end-user visible certificate errors during the outage, so I
suspect it was an internal/backend cert.

------
cutler
Just listening to the number of complex shenanigans experienced sysadmins have
to employ to keep up with the demands of managing HTTPS makes me wonder how on
earth your average non-technical DIY static site developer has a chance in
hell of keeping his site from failing modern browsers' requirements. Universal
HTTPS is a bad joke.

~~~
jmspring
Downvoted. If one is registering domains and dealing with DNS entries, etc,
it’s not a stretch to find someone to set up LetsEncrypt for you.

Heck - googling -
[https://tecadmin.net/auto-renew-lets-encrypt-certificates/](https://tecadmin.net/auto-renew-lets-encrypt-certificates/)

~~~
cutler
As a sysadmin myself the number of non-trivial modes in which certbot can fail
never ceases to amaze me. Running Apache? Watch your certbot renew fail to
bind to port 80 because your server is running. Now your renewal cron task
needs to take into account stopping and restarting the server which the
standard cron task does not include. What made the web great was that it
didn't keep out the DIY developer. Now it does exactly that .... via universal
compulsory HTTPS on trivial, static sites.

~~~
baobabKoodaa
Static sites can be easily published on Netlify or GitHub Pages at no cost -
certificates provided by the host.

------
sergiotapia
It happens - yesterday we went down for the same god damn reason. There has to
be a better solution.

------
cutler
Our HTTPS overlords have much to answer for. How many static sites, for
example, really need HTTPS and the non-trivial maintenance involved in the
average Apache/Letsencrypt/certbot setup? Talk about sledge-hammer to crack a
nut. And renewals every 3 months?! Don't get me started. Sure, the likes of
Microsoft should be able to do better but isn't there a message here? Beyond
secure sites such as finance, government, logins and ecommerce the whole HTTPS
certificate nonsense is a giant burden/cost with no benefit.

~~~
tatersolid
> How many static sites, for example, really need HTTPS

All of them. Script jacking, ad insertion, redirection, tracking insertion,
etc. are all done at scale by everything from national ISPs to coffee shop
routers.

HTTPS provides authenticity of all transmitted data; this is _more important
than confidentiality_ because without authenticity you can’t tell that you are
talking in secret with an attacker.

------
chasd00
when getting a site/API on its feet, enabling https and the cert is usually
the last thing to get done and an afterthought. Certs are easy to forget about
but when they expire they shut.down.everything.

~~~
killjoywashere
Which is what makes Let's Encrypt sooooo attractive. Free certs are nice but
the client is dreamy.

------
jlgaddis

      apt install ssl-cert-check

------
w0m
ouch

------
krzat
Teams is a great improvement over Skype for Business, which was a great
improvement over Lync, but it's still garbage. Interesting how the same
company also made the awesome VSCode.

~~~
gambiting
Is it? I feel like every iteration is worse than the last. For Christ's sake,
Teams can't even scroll a few messages up without having to load more, and
then it immediately forgets the latest messages. Like, it's an IM program that
has trouble remembering more than 20 messages at once?? Also if you send
messages quickly they appear in reverse order, again, it's an IM program that
can't even do the "messaging" part of IM right.

~~~
bouke
Indeed, a small list of the problems I have with Teams:
[https://www.reddit.com/r/Office365/comments/axzdct/my_issues...](https://www.reddit.com/r/Office365/comments/axzdct/my_issues_with_teams/).

