Networking outage again. Sigh. So glad I just moved my prod servers to gcloud last week...
To be fair to Linode, the Fremont outages always seem to be caused by HE.net (the "upstream provider"). Used to co-lo in the same datacenter and had issues like this constantly. Not sure what HE's problem is.
The right solution is to get two upstreams and BGP.
...
which is unfortunately rather more difficult when you are in a datacenter owned by he.net :( It's possible, though, and I thought Linode had done so; I think XO Communications is in that building (or that could be the other Fremont he.net data center, it's been a while. Who owns XO these days? Verizon. I've been out of it for a while.)
I was in that same datacenter for a while, and I left due to power outages rather than network issues. (This was... man, almost a decade back now; most of that power equipment should have aged out by now.) But he.net is a budget provider; I used them in my BGP mix for a lot longer... and usually they are fine. I wouldn't use them without a backup, though. (he.net plus Cogent, or whatever other cheap bandwidth is available, is usually cheaper, and in aggregate more reliable, than a top-tier provider on its own.)
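To put rough numbers on the "cheaper and, in aggregate, more reliable" claim - a back-of-envelope sketch with made-up availability figures, assuming independent failures (which two upstreams terminating in the same building won't fully satisfy):

```python
# Back-of-envelope: two cheap upstreams in a BGP mix vs. one top-tier provider.
# Availability numbers are illustrative assumptions, not measurements.
cheap_availability = 0.999       # e.g. a budget upstream on its own, ~8.8 h/yr down
top_tier_availability = 0.9999   # a pricier single upstream, ~53 min/yr down

# If failures were independent, you only lose connectivity when BOTH cheap links are down.
mix_availability = 1 - (1 - cheap_availability) ** 2

hours_per_year = 24 * 365
for name, a in [("single cheap upstream", cheap_availability),
                ("single top-tier upstream", top_tier_availability),
                ("two cheap upstreams (BGP)", mix_availability)]:
    print(f"{name}: {a:.6f} availability, ~{(1 - a) * hours_per_year:.2f} h/yr downtime")
```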
> Identified - At this time we have identified the connectivity issues affecting our Fremont data center as being the result of a power outage. Our team is working as quickly as possible to have connectivity restored. We will provide additional updates as they develop.
Yup, and looks like this was caused by another power outage... had the same thing happen when I was there.
Surprised they still haven't added proper backups after all these years.
When I was at he.net, the outage reports were all about ATS failures and the like. I mean, switching that much power is serious business... a lot of the cut-rate datacenters I've been at mostly had power outages when their switching equipment failed, even when they had decent UPS and generator capacity.
Personally, if I were building a cut-rate datacenter with used/unreliable power equipment? I'd build two parallel power systems to each rack. Each customer would get two independent power feeds (and stern warnings not to use more than 80% of one circuit, total). (I mean, while new kit is super expensive, used power conditioning equipment can be had on the cheap if you are in the right place at the right time.)
Then, so long as the customer buys dual-PSU equipment (or their own small ATS; I've had better luck with 20A automatic transfer switches) they'd be fine.
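Rough sizing math, just to show why the 80% warning matters - an illustrative sketch with hypothetical rack numbers, not a real build-out:

```python
# Back-of-envelope check for A/B power feeds (illustrative numbers only).
# With dual feeds, either feed must carry the *entire* rack load if the other fails,
# so you size against 80% of a single circuit, not 80% of both combined.
circuit_amps = 20          # one 20A feed
circuit_volts = 208        # typical 208V feed
derate = 0.80              # the "don't use more than 80% of one circuit" rule

usable_watts = circuit_amps * circuit_volts * derate   # ~3328 W per rack

servers = 12
watts_per_server = 250     # hypothetical average draw
rack_draw = servers * watts_per_server

print(f"usable per feed: {usable_watts:.0f} W, rack draw: {rack_draw} W")
print("OK: one feed can carry the whole rack" if rack_draw <= usable_watts
      else "Over budget: losing a feed would trip the surviving circuit")
```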
Of course, I've never gotten the chance to try (well, I did kind of have a chance once, but I blew it for non-technical reasons before I got to build out a power system of my liking), so maybe I'm missing something. Maybe the electrical work is more expensive than I think, or something.
I'm a little surprised that anyone with any scale still uses he.net for physical hosting. Last time I tried to negotiate with them, at my scale (20kW), they essentially charged $400 for every 1.4kW (and they gave you a rack with that 1.4kW), which is kind of a terrible deal compared to higher-density co-location providers like Coresite or BAIS, where your racks are closer to 5kW each, but a rack usually comes closer to $1300 or so.
(Most rackmount servers are power-dense enough to eat 5kW of power in 42U, with some space left over.)
This means that after your first few servers, the higher-end datacenters are actually way cheaper.
I mean, don't get me wrong, I think he.net bandwidth is a great part of anyone's BGP mix, and I think he.net hosting is the best deal around if you only need 1.4kW of power and want your own rack (often they will sell you your own rack, 1.44kW of power, and a nice unmetered GigE pipe for $500 a month, which is a killer deal for a rack and a gig), but it's kinda terrible if you are using serious amounts of power.
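Rough numbers, just to make the comparison concrete - a back-of-envelope sketch using the prices quoted above and assuming you can actually fill racks to their power limits:

```python
# Back-of-envelope colo cost comparison at ~20 kW, using the prices quoted above.
# Assumes racks can be packed to their power limits; real layouts will be messier.
import math

target_kw = 20.0

# he.net-style: ~1.4 kW per rack at ~$400/month
he_racks = math.ceil(target_kw / 1.4)        # 15 racks
he_monthly = he_racks * 400

# Higher-density colo (Coresite/BAIS-style): ~5 kW per rack at ~$1300/month
dense_racks = math.ceil(target_kw / 5.0)     # 4 racks
dense_monthly = dense_racks * 1300

print(f"he.net-style:  {he_racks} racks, ~${he_monthly}/month (~${he_monthly / target_kw:.0f}/kW)")
print(f"high-density:  {dense_racks} racks, ~${dense_monthly}/month (~${dense_monthly / target_kw:.0f}/kW)")
```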
On the other hand, the he.net remote hands are... some of the best. I mean, they are remote hands, and super junior people, with all the problems that come with that (remote hands is, as far as I can tell, a junior sort of job most places I've seen), but they are on site, on the ball, every time I called in they were right there, and it's free.
Last time I called Coresite for remote hands, they said they would charge me $200/hr... reasonable, I suppose, for getting someone junior-ish out of bed, but they had to page the person, and their person still wasn't there two hours later when I was able to arrive myself.
Most (normal?) datacenters have A/B/C power feeds, and also east/west network runs. Even if you don't have dual-input PSUs you can at least stagger the racks. Part of being cut-rate is that they don't have these things :(
Coresite (and all but the lowest-end datacenters) will sell you multiple power feeds... but you usually have to pay substantially more for those feeds every month than you would pay for the same watts out of a single feed, and my experience at the newer Coresite locations is that the single feed is reliable enough. (When I was in charge of prgmr, we moved from he.net with single feeds... well, we ended up at Coresite Santa Clara with single feeds. I don't think there has been a power outage that was the datacenter's fault at Coresite in like 7 years, even though I was on a single feed. There were a few outages that were my fault, but that's a different thing.)
I spent some time in the (I think now defunct) SVTIX co-lo at 250 Stockton. It was super low-end, super shoestring, in some ways lower-end than he.net, but they had multiple power systems and would sell you A/B power if you wanted to pay enough for it. 250 Stockton was certainly built out to higher standards than the he.net datacenters, though the he.net datacenters were better maintained.
(My favorite "my fault" power outage story: I would give my friends co-lo access, right? And then I'd call them when I needed remote hands who knew what they were doing. Well, you know how cheap nerds are. One of my buddies shows up with one of those ancient PSUs that have a 110/220 voltage switch. My rack was 208V. He had the switch on the 110 setting. It fried my PDU, the same Avocent that everyone in the rack was on. The outage wasn't long; I had a spare in the office that was only a few minutes away. It was super annoying, 'cause I had earlier given the man a perfectly good 1U chassis that came with a modern 100-240V auto-switching PSU... which he put in. This was his second computer; I would have been happy to give him another chassis/PSU. I was so mad. So mad!
I actually didn't see him for a while... and the next time I saw him, I was at a party, telling this very story. He walks in right in the middle. I point and kinda shout "It was him!" Which, yeah... uh, I should not have done that.)
Heh... I think I had a rack near prgmr at SVTIX. I used to play a game of seeing how little information I could give them before they'd unlock my rack (and it was usually different people working there each time):
1) ID, rack hall, rack number, company name.
2) no id, name, etc.
3) no id, company name, "I think it was left?"
It's kinda sad that Linode would trust HE.net as their provider... I would have expected a better partner. I used to host with Linode in London and it was rock solid.
In my experience they stay very competitive on CPU and memory and decently competitive on disk. At my scale and needs I don't know of a better vendor for the dollar.
I noticed that there were over 1k support tickets created during the Fremont outage (based on the unique IDs). That's a pretty sizable customer base, and such a prolonged outage must be terribly disruptive.
Did think it was pretty funny my host had to contact their host to figure out what happened.
> Our team is still in communications with our upstream provider to determine the cause of this outage.
Yeah, I am getting hit by this too. I still want to support Linode but can't if this is going to be a regular occurrence. How long ago was the last outage?
The funny thing is that our ex-CTO thinks that using AWS has more disadvantages than having all of our servers in two different Linode regions. Mind you, he also thought that CDNs were not as useful as serving assets statically from those two servers.
If you use two different Linode regions this outage shouldn’t affect you at all?
The CDN thing is a pet peeve of mine too. Are you even measuring load performance? Where are your customers located? If you’re just throwing everything on a CDN “because it’s faster” then no, it’s not very useful at all.
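If you want to actually measure it rather than argue about it, here's a minimal sketch (the URLs are hypothetical placeholders, and in practice you'd run this from networks near your real customers, not just your own machine):

```python
# Minimal latency check: time a static asset served from the origin vs. a CDN edge.
# URLs are hypothetical placeholders; substitute your own and run from several locations.
import time
import urllib.request

URLS = {
    "origin": "https://origin.example.com/static/app.js",
    "cdn":    "https://cdn.example.com/static/app.js",
}

for name, url in URLS.items():
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{name}: {len(body)} bytes in {elapsed:.1f} ms")
```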
This is kinda what a status page is for, not the HN front page - Linode being down is nothing new - it’s why we moved our stack away to AWS, what, seven years ago? Well, that and the horribly overloaded switches and poor implementation of the Xen packet scheduler that caused unacceptable latency spikes to memcached.
It’s great for hosting toy sites that sit on a single box and don’t have an uptime SLA - and to the people saying “well, it’s cheap”: moving to AWS saved us money, as we had to run big Linodes for the workload purely for the additional inter-node bandwidth.
Well yeah but then where do you tell your story about moving to AWS?
If it gets upvoted enough to hit the front page, enough people must be interested in it - there's a hard low-end cap if I recall; it must have at least x votes before it's even considered for the front page.
For popular services such as Linode it also makes sense: if you're ever thinking "man, I see this all the time on HN," it's probably a good call to avoid that provider.
You do realise that this post is a link to the Linode status page. I'm not sure what you're saying in that first sentence.
All I ever hear about Linode is their great speeds but also their downtime issues. I'm still on DigitalOcean and Vultr. AWS is too expensive for what I use it for right now.