
IBM Cloud was down, as well as their status page - whyleym
https://cloud.ibm.com/status
======
colinbartlett
My side project StatusGator monitors status pages (including IBM's ill-fated
page) and I'm seeing more than 10% of the nearly 800 services we monitor
having an outage right now.

So it appears to affect anyone who depends on IBM Cloud.

~~~
bberenberg
I really wonder how people get value out of a meta status page when my
experience is that status pages are often incorrect about what the actual
status is. Whether they're manually updated, or it's a case of "your 9s are
not my 9s", it seems like a compounded broken telephone problem.

~~~
pas
Probably it's great to have some very big picture overview, both in scope
("all" the cloud) and in time (as in "all" time), and maybe there's even
some value in looking at the correlation of these.

Maybe it helps with doing a sanity check before picking a provider. And, I
guess, at a basic level it helps with accountability/transparency.

~~~
o-__-o
When we started broadcasting status checks we thought it would be a great way
to let users know what's happening behind the scenes. Then it turned out a lot
of our happenings were self-inflicted, so rather than being less transparent by
way of fewer posts, we just tweaked our status outputs. "We are experiencing an
event in <SOME VERY HIGH LEVEL SERVICE>" x1042. After we perform our RCA, we
may post a blog about it or a short PR blurb.

The last time we went down I questioned out loud the point of the status page
and the general consensus was for others to be able to reference our outage.

------
ComputerGuru
So what are HNers using IBM Cloud for and where do you see that it has an edge
over AWS offerings (where an overlap exists, obviously)?

(I figure either you’re in devops and you are putting out fires too busy to
read this thread or you’re not and your work is halted because of the incident
so you might have time to read and reply ;)

~~~
toast0
We used Softlayer (rebranded to IBM Cloud, and affected by this) at my last
job. For the most part, their service pretty much just works; clearly not
today. :)

We had a couple thousand bare metal servers, and barely used any of their API
stuff.

As with any facility, there were occasional issues with electrical transfer
switches, core router failures, fiber cuts, etc. _Stuff_ happens, but we got
pretty good communication, and things got resolved in a reasonable amount of
time. Service got noticeably worse after IBM, but we were already planning to
move to our acquirer's hosting, because that's what happens when you're
acquired. Oh, and their load balancers had garbage uptime.

Bandwidth prices used to be pretty reasonable, but they've adopted AWS style
obscene pricing. At least they still let you use the private network for free
(including to other datacenters).

~~~
dang
HN ran on a box at Softlayer until early 2018 or so. This makes me think that
the title of this post (which was submitted as "IBM Cloud down as well as
their status page which looks to be hosted there") could at some point have
been "IBM Cloud down as well as their status page which looks to be hosted
there as well as the forum where people post these things which also looks to
be hosted there".

~~~
pvg
_HN ran on a box at Softlayer_

I've always imagined it as a big tower shoved under someone's desk. The side
panel of the case is off because otherwise it overheats. On the screen there's
a single maximized window of DrRacket. A post it note warns you not to quit or
reboot the system.

~~~
MaxBarraclough
And there's a switch set to _More magic_.

------
blantonl
All of Broadcastify's audio servers (hosted with Softlayer in their Dallas
datacenter) are completely unreachable and down.

I'm going to wait a bit to see if we get a status update, otherwise we'll be
spinning up instances on AWS to failover (which will be enormously costly for
bandwidth)

No status, no nothing, we're in the dark.

~~~
Operyl
Hey. Do you want to shoot me an email, IRC chat, or anything? I can keep you
up to date with what I'm hearing from my manager.

~~~
dashesyan
Hey, I'm a customer of IBM Cloud, too. Could you share what you're hearing
from them? It would be nice to know what's going on

~~~
Operyl
So far? Pretty much no news, they're using Slack to communicate a bit. VPN
access for everybody is broken or barely working, no internal ticketing as a
result. As far as I can tell, private networking is mostly working between
servers (at least, for my servers, ~60).

------
Fordec
I remember I was at an IBM sponsored hackathon around 2015 where it was a
requirement to use Bluemix. Over the course of the weekend the service went
down for hours 3 times.

Literally this morning I was wondering what ever happened to it, like did it
die a quiet death? Oh it rebranded to IBM cloud in 2017. Now this news.

I think there's an eponymous law named for this sort of thing.

~~~
kinghuang
That's funny. I've had the exact same experience with Bluemix at a Hackathon
in the past. It was down for almost the entire weekend, screwing all the teams
that didn't pivot early enough.

------
vmh1928
In the Cloud Status History page scroll down to the 6:32 entry that says
"Unable to Access IBM Cloud"

[https://cloud.ibm.com/status?selected=history](https://cloud.ibm.com/status?selected=history)

- 2020-06-10 02:19 UTC - RESOLVED - The network operations team adjusted
routing policies to fix an issue introduced by a 3rd party provider and this
resolved the incident

------
voz_
I generally do everything on AWS or GCP, with a little Azure sometimes for
personal projects. In what world does IBM beat one of those three in anything?
Genuinely curious: how are they able to stay competitive?

~~~
twalla
Their bare metal cloud offering (SoftLayer acquisition) was actually pretty
good whenever I used it about 4 years ago. Wasn’t the most intuitive API or UI
but you could get a bare metal server anywhere in the world in a few minutes.

~~~
rad_gruchalski
When the wind blows in the right direction. Sometimes, your server would get
stuck in provisioning for hours and only get „un-stuck” after creating a
support ticket. Which, I kid you not, at one of my previous jobs, we had
automated in our provisioning pipeline. Good times.

But when it worked, it worked. API was voodoo.

~~~
bashinator
I just discovered this today:

    
    
        aws support create-case \
            --subject "not working" \
            --communication-body file://description.txt

------
blazefox69
Fixed it for you:
[https://github.com/ibm-cloud-docs/overview/pull/74](https://github.com/ibm-cloud-docs/overview/pull/74)

------
caiobegotti
Honest slightly cynical question: most probably someone inside the responsible
team said some day that it would be very stupid to host the status page inside
the same infrastructure being monitored, but they were probably ignored...
what should that person do now? Say "toldya!" out loud in the postmortem
meeting or simply shut up and move on because reality is that we are hired to
do some stupid task and not to think for ourselves?

~~~
all_blue_chucks
Never humiliate a coworker in public. Instead say "both options were
considered but ultimately it was decided to select option B for reason Y."

~~~
caiobegotti
My professional experience tells me that the next question will be who decided
for B given Y, then you answer it and then you have a target on your SRE back,
I'm afraid. Remember that trickle-down economics works only when the shit
hits the fan, and what trickles down is not money.

~~~
all_blue_chucks
If your company uses post mortems to blame individuals rather than fix
processes and tools, you haven't worked in a professional environment.

------
Lyren
I received communication ~15min ago that they're actively looking into the
issue. I submitted the ticket roughly 20min ago. So it seems they're aware.

It doesn't help that their status page is also hosted on IBM Cloud.

------
whyleym
Found this from a user on Twitter - "Our status page for IBM Aspera is on
StatusPage, so you can track here as a bank shot:
[https://status.aspera.io](https://status.aspera.io) "

------
gatvol
Well if they cannot foresee this eventuality, what else are they missing under
the hood?

------
julianeon
Seems pretty dumb to host a status page in a way that it could go down, when
it should be a static page that is trivially hosted on CDNs worldwide.

~~~
koolba
You can’t cache it for that long though.

A better approach is to have it hosted on a different cloud platform. If you
really care, you’ll set it up on a different domain and nameserver as well
with a long lived redirect (cached on CDNs) from the usual status.example.com
or example.com/status.
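
A minimal sketch of that independence check, assuming you keep a small inventory of which provider serves each layer (hosting, DNS, domain) of the main site and of the status page. Every provider and domain name below is a hypothetical placeholder, not a real deployment:

```python
from typing import Dict, Set


def shared_layers(main: Dict[str, str], status: Dict[str, str]) -> Set[str]:
    """Return the layers where the status page uses the same provider as the main site."""
    return {layer for layer, provider in main.items() if status.get(layer) == provider}


# Hypothetical inventories: layer name -> provider (or domain) serving that layer.
main_site = {"hosting": "cloud-a", "dns": "cloud-a-dns", "domain": "example.com"}
same_cloud_status = {"hosting": "cloud-a", "dns": "cloud-a-dns", "domain": "example.com"}
independent_status = {"hosting": "cloud-b", "dns": "other-dns", "domain": "example-status.net"}

# Any layer listed here is a single point of failure shared with the main site.
print(sorted(shared_layers(main_site, same_cloud_status)))
print(sorted(shared_layers(main_site, independent_status)))
```

An empty result for the second call is the goal: nothing the status page depends on is shared with the platform it reports on.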

~~~
julianeon
Thanks; you're right - the caching would be a problem, so your solution makes
more sense.

------
sky_rw
The most infuriating thing about this is the ZERO communication coming out of
IBM Cloud. No emails. No updates to twitter. Status page down. Support lines
clogged.

At least give me something I can point my customers at to show them this is
not due to my incompetence.

~~~
bizt
Yep, super annoying. I had to link my customers to a TechCrunch page :(

------
shaabanban
Also still no communication from IBM that anything is wrong.

~~~
Operyl
Account managers are texting, but they have no VPN access right now.

~~~
adrr
It seems all their external network connections are down. I assume people will
have to drive to the data centers to fix. I really want to see a post mortem
on this outage.

~~~
wmf
The data centers are staffed 24/7 and out of band is also a thing.

------
akerro
Haha, Amazon had the same problem a few years ago when they had a fire in a
datacenter: their status checker page was hosted in the same building and was
showing everything was fine, while 1000s of websites hosted on AWS were down.

------
shaabanban
wonder if we'll ever get a post-mortem about this... Seems to be global

~~~
Operyl
Maybe. About 3/4 of all outages get a post mortem. There's 1/4 of the time
they refuse to tell us anything.

~~~
mbreese
There will have to be a post mortem on this. The convention is to be as
transparent as possible as to what went wrong. This helps to let current
customers know that you found the problem, and have put plans in place to make
sure it doesn't happen again.

The purpose of the signalling here is twofold.

1) If convincing enough (with details), you can keep current customers from
moving to a competitor.

2) It also lets new customers see how you actually handle a crisis. If you
manage the crisis well enough, then you can point to this instance to
prove you have the technical know-how to handle their needs.

If they don't tell anything, or aren't transparent, then they can expect a
mass exodus of customers.

~~~
bigiain
> then they can expect a mass exodus of customers

I wonder if that's a thing that would even cross a typical IBM-er's mind? It
might just be me, but I get a very strong smell of "We're IBM! There's nowhere
else for you to go!" from them...

------
thephyber
How sure are we that this outage is limited to IBM cloud?

Pingdom[1] had a spike of website outages from 11k => 27k.

[1] [https://livemap.pingdom.com/](https://livemap.pingdom.com/)

~~~
Nextgrid
It's most likely customers of IBM cloud whose systems rely on something hosted
there and are thus down as well.

~~~
thephyber
Yes, I considered that possibility before posting.

~~~
rat9988
I'm not sure what you are trying to prove with your comment then.

~~~
TallGuyShort
Sometimes people ask questions when they aren't sure and don't have anything
to prove.

------
AaronFriel
Ah, is this the exception that proves the rule that "no one was ever fired for
buying IBM?"

Sorry to be glib, I'm sure it's a tough time for people who were sold on their
cloud platform and work on it!

~~~
mark-r
Everybody's cloud goes down sometime. The big fail here was hosting their
status page on the same infrastructure.

~~~
oceanswave
But usually only a single AZ or region... seems like this is bigger?

------
Operyl
Yup .. hit us pretty badly. Our account manager doesn't know either.

------
homeglue
I've seen multiple services get affected this morning including Sendgrid,
Nexmo and Up bank, all at the same time. Wondering if this is related.

------
leetrout
Hugops.

Hope they get a root cause and a quick fix. I’m not a fan of their cloud
service but I know people working on the outage and fix are stressed.

------
kitteh
About a month ago their Northern Virginia region was down. All the BGP
prefixes associated with it disappeared from the internet (routes withdrawn).
This time (I went to check when someone mentioned it) they kept advertising,
but all traffic went nowhere once it got into their network. Curious to see if
there is an RFO released.

~~~
aiisjustanif
I wish we had a record of this.

~~~
kitteh
I do. I store all this stuff. Where should I put it?

------
nonines
This looks related (smoking gun?)
[https://status.aspera.io/incidents/t9r03x71dxkl](https://status.aspera.io/incidents/t9r03x71dxkl)

>> A 3rd party network provider was advertising routes which resulted in our
WW traffic becoming severely impeded.

~~~
rbanffy
It can only be attributable to human error.

No IBM computer has ever made a mistake or distorted information. They are
all, by any practical definition of the words, foolproof and incapable of
error.

------
stevehawk
Guess they didn't learn from AWS hosting its status page (in particular the
icons) in S3.

------
bantec
It’s the second significant issue with IBM in the last year; absolutely
inconsistent for critical infrastructure (we are FinTech).

------
cerw
Been like that for the last hour. Network traffic from Sydney (GCP) to Sydney
(IBM) shows 62% packet loss.

------
ck2
even weather.com was down but someone broke ebay too

    
    
           Fastly error: unknown domain: www.ebay.com. Please check that this domain has been added to a service.

~~~
toast0
weather.com makes sense. IBM bought the weather channel a while ago, hosting
is likely tied to IBM Cloud at this point (although it looks like it's fronted
by Akamai)

~~~
vmh1928
IBM bought the technology part called the Weather Company. That's the part
that gathers weather info from all over and makes it available.

The cable TV channel is still independent.

------
pmarreck
Imagine hosting your status page on a different domain

~~~
9nGQluzmnq3M
DNS worked fine here, this was an infra issue.

------
nadavami
It seems like the status page just came back up.

------
woakas
Our site (ubidots.com) is not completely down, but the IBM network has
high latency.

------
someguy12321
heads be rolling tomorrow!

------
anon102010
A quick check of cloudflare's isbgpsafeyet page

IBM Cloud - unsafe

At least AWS signs their routes I think.

If you can't even sign your own routes - hard to have a ton of pity.

~~~
kortilla
Signing routes doesn’t mean others reject unsigned routes. AWS is just as
vulnerable to hijacking as anyone.
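
To make the signing-versus-rejecting distinction concrete, here is a rough sketch of route origin validation in the style of RFC 6811, assuming "signing" means publishing RPKI ROAs. The prefixes and AS numbers come from the documentation ranges, and the `accepts` helper is a hypothetical stand-in for a receiving network's policy, not any vendor's implementation:

```python
from ipaddress import ip_network
from typing import List, NamedTuple


class ROA(NamedTuple):
    """A published authorization: this ASN may originate this prefix up to max_length."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(roas: List[ROA], prefix: str, origin_asn: int) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA covers the route if the route's prefix falls inside the ROA's prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches if the length is allowed and the origin ASN agrees.
        if route.prefixlen <= roa.max_length and origin_asn == roa.asn:
            return "valid"
    return "invalid" if covered else "not-found"


def accepts(state: str, enforces_validation: bool) -> bool:
    """Hypothetical receiver policy: only an enforcing network drops invalid routes."""
    return not (enforces_validation and state == "invalid")


# Illustrative data only (documentation prefixes and ASNs).
ROAS = [ROA("192.0.2.0/24", 24, 64496)]

hijack = validate_origin(ROAS, "192.0.2.0/24", 64511)
print(hijack)
print(accepts(hijack, enforces_validation=True))
print(accepts(hijack, enforces_validation=False))
```

The last line is the point being made above: the owner published a ROA and the hijack is classified as invalid, yet a receiver that does not enforce validation still accepts it.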

