OVH is down along with its status page (ovh.com)
234 points by rgun 15 days ago | 156 comments

Tough timing, considering their IPO is tomorrow [1].

[1]: https://ipo.ovhcloud.com/sites/default/files/2021-10/OVHclou...

Wasn't the fire in one of the Strasbourg datacenters the night after the announcement of the IPO? They have bad luck.

They also have garbage IT support.

This is one of those things I keep hearing but in my case I experienced the opposite. I have been using OVH for a long time, and their paid support is actually really good.

One of our projects used to get DDoSed, and as a small company the only reason we managed to survive was thanks to OVH. When the attackers kept changing their attacks, we were directed to talk to their engineers and they helped fix the issue; all we had to do was send them a bunch of packet captures.

Over the years since then we had a few issues and every single time their customer support was really helpful.

I have used paid support from Google, Azure and Amazon, but I have had a better customer service experience with OVH overall.

Maybe competitors doing their worst to cripple the IPO? Are hosting/CDN firms evil enough?

If this is an unforced error that would be monumentally stupid.

> Those last few days, DDoS attacks intensity has grown a lot.

not unforced, prolly ransom requests before IPO

Yes, that would be the expected thing. Around an IPO you should have your house in order otherwise you will get burned.

So ovhcloud seems up, where's that hosted?

Stuff is apparently starting to come back up. Mostly still inaccessible though.

Pure coincidence.

not really, from Octave’s twitter it seems that DDoS attempts have increased which is usual right before an IPO. They were installing new gear because of the higher DDoS.

Could you please elaborate on this? I’ve never heard of that being related to an IPO before, so I’m wondering if this is a standard practice for competitors to get known groups to do this sort of thing, or if this is an economy destabilization measure that’s in the playbook of some nation state actors. Any insight there?

Like others said, it's blackmail for money. Essentially "would be a shame if your website is down during your IPO, maybe you should pay us so we don't DDoS you to hell".

I imagine this is especially common here since one of the services offered by OVH is DDoS protection. Hence being down from a DDoS would be even worse press.

It's more likely to be a "DDoS until you pay a ransom" sort of deal where the attacker thinks the company will value uptime more highly during that period.

You've clearly never run a web service of any sort... you can expect DDoS in response to pretty much anything you do. Not seeing an increase in DDoS attacks before an IPO would be more concerning.

*NATION state actors. Damn mobile keyboards. Stupid app wouldn’t let me edit the reply either, hence this.

The term is state actor, anyway.

A state is a sovereign political division, or a subdivision within a federation (France, Arizona).

A nation is a community of people with a common culture, language, ethnicity etc (the Navaho, the Welsh, the Japanese).

When these are the same entity, it's a nation state (Japan), but there are many cases where they aren't the same (Belgium, China, USA, Russia).

> A nation is a community of people with a common culture, language, ethnicity

Nitpick: The belief of the existence of the said common ground is actually more important than its actual existence. France is considered a nation state even though there are several ethno-linguistic groups in France. A few illustrations:

- less than half of the French population during the French revolution spoke French as their native language.[1]

- The second Nobel Prize of Literature from France (Frédéric Mistral) didn't write his work in French, but in Occitan[2].

- My family has lived in France for as long as genealogy could trace back, yet half of my great-grandparents only learnt French in school; they didn't speak it at home.

[1]: I don't have an online source for that, nor the exact figure: I read it in Fernand Braudel's L’Identité de la France

[2]: https://en.wikipedia.org/wiki/Occitan_language

"Nation-state" is a little bureaucratic Americanism because if you talk about a malicious state actor there everyone assumes you mean Florida.

Octave Klaba (OVH CEO) on Twitter [0] :

  Following a human error during the reconfiguration of the network in our DC in VH (US-EST), we had an issue on all the backbone. We're gonna isolate the DC VH then fix the configuration

  Those last few days, DDoS attacks intensity has grown a lot. We decided to increase our DDoS handling capacity by adding new infrastructures in our DC VH (US-EST). A bad router configuration caused the network crash.
[0]: https://twitter.com/olesovhcom/status/1448196879020433409

Seems like the best DDoS attack is to persuade the target to change their backbone configuration. Then they’ll take themselves out.

So, how do these things happen? Juniper has had "commit confirmed" for ages. Or would that not help for certain things?
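For readers who have not used it, the idea behind a confirmed commit is "apply the change, then roll back automatically unless someone confirms within a time window". The following is a minimal Python sketch of that idea only. It is a toy model, not Junos or any vendor's implementation, and the class and method names (`ConfirmedCommit`, `commit_confirmed`, `confirm`, `tick`) are hypothetical.

```python
class ConfirmedCommit:
    """Toy model of a 'confirm or roll back' config change (not a real router API)."""

    def __init__(self, active_config):
        self.active = active_config
        self._previous = None
        self._deadline = None

    def commit_confirmed(self, new_config, now, timeout_s):
        # Apply the new config, but keep the old one and a rollback deadline.
        self._previous = self.active
        self.active = new_config
        self._deadline = now + timeout_s

    def confirm(self):
        # The operator can still reach the device, so keep the change for good.
        self._previous = None
        self._deadline = None

    def tick(self, now):
        # Called periodically: roll back if the deadline passed without a confirm.
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous
            self._previous = None
            self._deadline = None
            return True
        return False
```

A safeguard of this shape mainly helps when the change cuts the operator off or the breakage is noticed inside the window. If the change is confirmed, or the damage spreads faster than the timeout, it does not help by itself.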

Easy to be really sure of a config change that only later obviously was terrible?

It's now back: https://status.ovh.com

Short lived incident fortunately...

I can ping the status IP but no HTTPS available from it, so routing is working externally.

  72 bytes from icmp_seq=1 ttl=249 time=276 ms
  72 bytes from icmp_seq=2 ttl=249 time=1041 ms
  72 bytes from icmp_seq=3 ttl=249 time=313 ms
  72 bytes from icmp_seq=4 ttl=249 time=198 ms
  72 bytes from icmp_seq=5 ttl=249 time=226 ms
  72 bytes from icmp_seq=6 ttl=249 time=390 ms
  72 bytes from icmp_seq=7 ttl=249 time=865 ms
Forgot I was using IP over carrier pigeon


Even worse if you can believe that. Hopefully it is temporary.

Does not work for me

Apparently it's only the status page. My mail servers have been down for just about an hour.

How is it still down for me..?

Also, I can access my OVH VPS from an Italian IP, but if I try with a VPN from outside my country, I get error 521.

Down here. Comcast west coast.

Down for me too. Although, seems like this time only the status page is down.

Still down from Aus

down from the UK

Didn't Facebook have the exact same issue the other day, but theirs was all their datacenters rather than a single DC?

Exact same? I doubt it. The only similarity (that we know of) is that there was a router configuration change. In Facebook's case, there wasn't even supposed to be a configuration change, just a look at some data.

So the CEO's twitter feed had information at 08:00 UTC, but the official ovh_status feed didn't even acknowledge the incident until 08:22.

That seems the wrong way round.

The description of the ovh_status account explicitly mentions it's an unofficial account:

> Unofficial OVH Status Feed

Argh. And it has a pretty user image which attempts to say that.

Only the image is conveniently cropped to a circle so that the "Un" part isn't visible and it reads "OFFICI".

Twitter and Reddit are the true status pages

> "gonna"

Truly, only a professional outfit could have written a Twitter update with spelling like this.

The tweet is in French. That was a rough translation

Irony being you think Twitter updates are professional.

semantics > syntax

It looks like only IPv4 is affected. IPv6 is working fine to a couple of test endpoints.

   3  2a02:c28:1:6506::106  2.131 ms  2.124 ms  2.117 ms
   4  2a02:c28:11:6::100  15.695 ms  15.689 ms  15.682 ms
   5  2a02:c28:1:1900::19  15.998 ms  15.991 ms  15.985 ms
   6  2a02:c28:0:1819::18  15.259 ms  15.051 ms  16.975 ms
   7  2a02:c28:0:1718::17  16.968 ms  16.403 ms  15.619 ms
   8  2a02:c28:0:1731::31  15.406 ms  18.620 ms  18.554 ms
   9  2001:7f8:4::3f94:2  12.025 ms * *
  10  * * *
  11  2001:41d0:aaaa:100::5  21.622 ms  38.813 ms 2001:41d0:aaaa:100::3  37.231 ms
  12  * * *
  13  * 2001:41d0::25f1  19.507 ms 2001:41d0::c68  20.892 ms
  14  2001:41d0::513  19.685 ms  19.674 ms 2001:41d0::50d  18.248 ms
  15  2001:41d0:0:50::5:10a1  19.092 ms  19.917 ms 2001:41d0:0:50::5:10a5  20.126 ms
  16  2001:41d0:0:50::1:143f  19.375 ms 2001:41d0:0:50::1:143b  24.609 ms 2001:41d0:0:50::1:143d  18.933 ms

Ugh. You reminded me that while OVH used to hand out a /128, if you needed more address space you had to pay more for “enterprise” IPv6 or something.

Is their enterprise prefix /64? That would be even funnier. :)

lolwut ?

Don't consumer ISPs hand out /64s or /48s ?

Yeah, Comcast hands out /64s on residential connections. /128 is literally just a single address, which is absurd if true.
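To make the prefix sizes in this subthread concrete, here is a small sketch using Python's standard `ipaddress` module. The prefixes use the `2001:db8::/32` documentation range and are illustrative only; they are not OVH or Comcast allocations.

```python
import ipaddress


def addresses_in(prefix: str) -> int:
    """Return how many addresses an IPv6 (or IPv4) prefix contains."""
    return ipaddress.ip_network(prefix).num_addresses


# Illustrative prefixes from the documentation range.
for prefix in ("2001:db8::1/128", "2001:db8::/64", "2001:db8::/48"):
    print(prefix, addresses_in(prefix))
```

A /128 is exactly one address, a /64 is 2**64 addresses, and a /48 is 2**80 addresses, which is 65,536 separate /64 networks.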

Unfortunately, it was(is?) true. Not just OVH though, even Scaleway also had these silly /128 shenanigans.

they give a /64 to all dedicated servers

Their entire network is down. It seems connected to some routing "improvements" they had planned that went REALLY wrong:


... and yeah, lots of sites impacted

Our servers in Strasbourg were down for around 10 minutes; now they are fully operational, but it looks like some of their sites are still experiencing issues [1]. On the positive side: it was my second day on call and I already had the worst incident (no SSH, provider site is down).

[1]: http://status.ovh.com/

I don't think this counts as a worst incident. Since there is nothing you can do you might as well go take a nap and wait for it to pass.

On the other hand something like Gitlab's "We deleted the production database and our backups were not working for months" sounds much more stressful to me.

I was just going for a peaceful morning walk, trying to start the day with some sunlight. As soon as I sit on the first bench with sun I can find, the dreaded message comes: "everything is down".

Thank you! I had almost forgotten about my beloved provider since the last outage :)

One more and you get to make a sign.

It has been [17] days since the last outage.

Azure is also having issues with deploying and starting virtual machines.

EDIT: To those having issues starting/deploying Windows virtual machines in Azure and are using ARM to do so: change the OS type to Linux instead of Windows. This seems to resolve the following issue:

> Error: No version found in the artifact repository that satisfies the requested version '' for VM extension with publisher 'Microsoft.WindowsAzure.GuestAgent' and type 'CRPProd'

That may be an interesting effect. People test their failover solutions for one of their providers being down when the provider is not actually down. But when it is actually down, everyone else is executing their failover plan at the same time and competing for other clouds' resources.

Hence the BS of cloud marketing: imagine AWS US-East going down. There is no capacity to absorb that anywhere.

Do you have any numbers/sources to back that up? I find that hard to believe, mainly because US-east1 goes down so much. It's never been hard-down across three AZs level of being down with no EC2 no VPC no nothing, but S3 being down in the whole region is pretty freaking close!

AWS us-east-2 (Ohio) is generally pretty reliable; it pays to use your head and not simply use the defaults.

There was a leaked doc that detailed AWS datacenters a while back and US-east dwarfed everything else.

Everything else in AWS maybe, but that's a different claim than the rest of the cloud (Azure, GCP, Oracle...) couldn't absorb that load.

For each cloud provider, spare capacity is a portion of its actual capacity, so considering anything beyond GCP is not even a thing. And since most of the spare capacity is being used as spot instances, there is not that much in reality: many spot workloads are actually cost optimizations and would shift to other instance types if spot instances were not available.

Interesting indeed but I doubt that "everyone else" is executing their failover plan, simply because most don't have a failover plan including running on another cloud provider

looks like funny routing.. packets to ovh.nl go to Asia, and then suddenly find a faster way back to me

   3  0.ae21.xr4.1d12.xs4all.net (  7.324 ms
      0.ae21.xr3.3d12.xs4all.net (  5.929 ms
      0.ae21.xr4.1d12.xs4all.net (  5.864 ms
   4  0.et-1-1-0.xr1.tc2.xs4all.net (  7.562 ms  5.487 ms
      0.et-7-1-0.xr1.tc2.xs4all.net (  5.551 ms
   5  asd-s8-rou-1041.nl.as286.net (  6.034 ms  5.759 ms  5.608 ms
   6  ae11.cr6-ams1.ip4.gtt.net (  6.620 ms  7.938 ms  6.670 ms
   7 (  7.348 ms  6.170 ms  6.320 ms
   8  if-ae-45-2.tcore2.av2-amsterdam.as6453.net (  310.705 ms  257.678 ms  257.278 ms
   9  if-ae-14-2.tcore2.l78-london.as6453.net (  257.217 ms  256.759 ms  259.044 ms
  10  if-ae-2-2.tcore1.l78-london.as6453.net (  258.850 ms  256.723 ms  256.735 ms
  11  if-ae-12-2.tcore2.mlv-mumbai.as6453.net (  257.349 ms  255.042 ms  257.338 ms
  12  if-ae-16-2.tcore1.svw-singapore.as6453.net (  248.550 ms  257.559 ms  329.240 ms
  13  if-ae-2-2.tcore2.svw-singapore.as6453.net (  307.149 ms  255.582 ms
      be101.mrs-mrs1-sbb1-nc5.fr.eu (  181.205 ms
  14  be101.mrs-mrs2-sbb1-nc5.fr.eu (  180.612 ms *  179.738 ms
  15  * * par-gsw-sbb1-nc5.fr.eu (  186.377 ms

"No impact expected" - famous last words

Statuspage is down with https, but not without:


When you use OVH you have to be ready for this kind of stuff. They are cheap for a reason.

Hetzner is the cheapest and their cloud offering is solid. Heavily underrated provider. Price to value, interface, documentation, API, libraries: everything top notch.

Well, even if you don't use OVH you should be ready for this kind of stuff.

Hetzner is also cheap but for some reason they don't seem to suffer from these kinds of issues.

I wish they had a US location.

With them recently announcing that they will start charging tax for US sales, `ash1-speed.hetzner.com` resolving to an IP address (yet unroutable) under a different Hetzner ASN, and this one[0], I believe they are already in for a US region possibly in Ashburn, Virginia.

[0] https://www.bizapedia.com/va/hetzner-us-llc.html

Wow thnx for the info that is exciting news

LOL. OVH has turned into the laughing stock of the industry. It's a shame really, I always used to gravitate towards their offerings vs the competition because I liked the fact that they have servers literally everywhere... but what good are servers everywhere when you get a major outage every few months?

Perhaps I'd tolerate the constant outages if they were actually any good at communicating. Their status page sucks. They rarely update it properly, and when they do, it's hard to find the information you need and that you know relates to your servers.

Well, some products/businesses don't need 99.999% uptime and that's fine. Hell, I'm working on a small shop and we had to update some service that is not used constantly by our customers and when I suggested to have a downtime of 5 minutes everybody looked at me as if I were a caveman. The other alternative was to come up with some custom strategy to have two instances of the same service running at the same time and kill one of them when we know the other is healthy. Now, for a service that is constantly used that's a good idea, but for our scenario a 5 min. downtime is fine... we are not Google.
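As a rough illustration of the numbers in this comment, the sketch below converts an availability target into a yearly downtime budget. It assumes a 365-day year and counts all downtime, planned or not; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

Under these assumptions, "five nines" leaves roughly 5.26 minutes per year, so a single 5-minute maintenance window would use up almost the entire budget, while a 99.9% target leaves about 525 minutes.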

If you have to deal with a few outages for a 90% cost reduction, then you just make sure you have a plan. I've found OVH to be mostly reliable.

Things do happen even with AWS.

Minus the fire incident, can't recall any outage. Not the cheapest offering out there but been running many things on it on autopilot. Support is solid too, always had a good experience.

I'm not affiliated with them in any way, but historically, Online.net (now rebranded as "Scaleway") has always been the main competitor of OVH in France. They were known for their "Dedibox" when I was younger.

OVH kinda won because their VPS pricing was always dirt-cheap compared to the cheap dedicated servers provided by Online.net/Scaleway. Since then OVH VPSes got expensive, and Scaleway started offering cheap VPSes, the prices got aligned. I know a few people who used to be OVH-fanboys and they all migrated to Scaleway and Vultr the last 4 years.

But I agree, the competition for unmetered reliable VPSes is getting thinner and thinner… People are even starting to consider metered VPS solutions like Vultr…

Rasmus Lerdorf (the author of PHP) did an objective non-sponsored comparison[1] in 2019, I'm using IONOS personally, but since then strato.de appeared. (Both are German) I'm thinking about adding another node from them to avoid relying on only one vendor.

[1] https://toys.lerdorf.com/low-cost-vps-testing

Strato and IONOS are basically the same company. Well, the same company group. They probably do not share network/hardware though.

https://en.wikipedia.org/wiki/United_Internet (https://de.wikipedia.org/wiki/United_Internet)

I don’t know how is OVH doing on average nowadays in relation to downtimes, but I worked for a company around 8 years ago that had part of its infrastructure running on OVH. That thing was a sad joke and we ended up migrating everything to AWS. Pretty much every month we had issues with them.

What sorts of issues?

Downtimes due to incidents on their data centres. We used to joke saying things like “oh, that must be the OVH guys having some fun on the datacenter again”.

They just posted an update : https://twitter.com/olesovhcom/status/1448196879020433409

Following a human error during the reconfiguration of the network on our DC to VH (US-EST), we have a problem on the entire backbone. We will isolate the DC VH and then fix the conf.

Our whole OVH infra is down, composed of OVH baremetal + SoYouStart an OVH sister company.


is also down which is their WIP tasks dashboard, having worked there I can tell that something definitely went very very wrong


We recovered our OVH sandbox env (yay) but prod is still very down, guess they are rolling back DC by DC

Update 2:

Our call center provider has VoIP provided by OVH as well, they are down as well... It's going to be a very long day...

This seems to have brought down Snapchat as well... (not the target audience particularly, here, but still relevant); didn't expect snapchat to rely on OVH at all, so it could be a coincidence.

i can confirm that this is not true. talked to an employee. snap hosts on google cloud.

obviously I cant rule out that it's not related in some way or the other, but at least not directly.

As someone that runs an uptime monitoring service, it was wild to see so many sites go down that I wondered whether it was my monitoring that was screwing up!

Where do you host your monitoring service. And how do you maintain uptime?

AWS, and the monitors themselves are geographically redundant. So if ap-southeast-2 goes down, it uses another region like ap-southeast-1.

The web app that configures the monitors is deployed separately, so the web app can go down while the monitors + alerting keeps running.
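The region fallback described above can be sketched as a simple preference list. This is an assumption about the general shape of such logic, not the commenter's actual code; `pick_region` and the health set are hypothetical, and only the two region names come from the comment.

```python
def pick_region(preferred, healthy):
    """Return the first region in preference order that is healthy, else None."""
    for region in preferred:
        if region in healthy:
            return region
    return None


# Hypothetical preference order for a monitor that normally runs in Sydney.
preferred = ["ap-southeast-2", "ap-southeast-1"]
print(pick_region(preferred, healthy={"ap-southeast-1"}))
```

Keeping the selection logic independent of the web app matches the comment's point that the monitors and alerting keep running even if the configuration UI is down.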

So it is solely running on AWS? Or are there other clouds as well?

Solely on AWS for now.

What's the rough downtime an email server can handle without losing any emails? The greylist filter relies on most people's emails resending after a few mins if the first is rejected, so guessing a short enough downtime has no impact?

Defaults depend on the application and actual settings depend on the admin/postmaster. So retry queues will typically be somewhere between 3 to 7 days, depending. High volume sending servers may be shorter. It is rare for short outages to cause emails to be lost unless the outage is due to a fire and the server holding your message melted.
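A rough way to reason about this: a message survives as long as one of the sender's retries lands after the receiving server is back and before the sender's queue lifetime runs out. The sketch below assumes a fixed retry interval, which is a simplification (real mail servers usually back off over time), and all names and values are illustrative.

```python
from datetime import timedelta


def first_successful_retry(outage, retry_interval, queue_lifetime):
    """Time after the first failed attempt at which a retry succeeds.

    Returns None if the sender gives up before the outage ends.
    Assumes a fixed retry interval (a simplification).
    """
    elapsed = retry_interval
    while elapsed <= queue_lifetime:
        if elapsed >= outage:
            return elapsed
        elapsed += retry_interval
    return None


# A one-hour outage against a sender that retries every 15 minutes for 5 days.
print(first_successful_retry(timedelta(hours=1), timedelta(minutes=15), timedelta(days=5)))
```

With these illustrative values, a one-hour outage only delays the message, while an outage longer than the sender's queue lifetime would cause the sender to give up and typically bounce it.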

OVH has many issues.

One of their four data centers burned down in March.

A few days ago they sent me an email that they were deleting my account soon, and a few hours later another E-Mail stating that it was an error.

Now this.

Tomorrow is their IPO, it will be at a very high valuation, let's see how that plays out.

One of their four datacenters in Strasbourg. They have many others across the world.

Back up for our servers in Australia.

Edit: Our US server also.

Total downtime around 50 minutes.

You can follow the status page here: http://travaux.ovh.net

It seems to be back up, I can access the Control Panel and my servers. So around 1 hour downtime for me, 7:20 to 8:20 UTC.

What could be the best way for status page apart from third party services? I remember one project that used IPFS as distributed status page. https://news.ycombinator.com/item?id=16273609

That third party service has still got to be hosted somewhere though. What happens when the cloud they're running on goes down?

The rule should be host your status page on your competitor's cloud. If you're AWS, host it in Azure, if you're Azure, host it in GCP, if you're GCP, host it in AWS. (Linode, Digital Ocean, OVH, etc can do their own dance.)

Has OVH gotten substantially worse in the past couple of years? We ran a decent cluster in GRA-3 for a few years, and I can't recall having any real downtime. But lots of comments here lead me to believe that other people have very different experiences with OVH.

Died in a middle of a game of Lichess

It seems like (only?) their name servers are down. I got my website back up by not using them (it's hosted on a dedicated server at OVH). They'll probably have the same kind of issues as Facebook did while trying to get things back up.

Let's hope it is not another container fire [1].

[1] https://thestack.technology/ovhcloud-fire-strasbourg/

We had a couple network blips but all our systems seem basically fine now.

The status page is still hosed, but I'll take "my systems are up but the status page is down" over the converse any day ;)

Does anyone have an alternative for a hobbyist VPS provider (around 5€/month) ? OVH keeps reducing their offering and bumping their prices. Plus they dropped support for Arch images.

I can recommend https://www.scaleway.com/en/ I have been using it for several years, never had an outage.

I use hetzner, their smallest VPS is around 2,xx Euro. Has been very reliable for me in the last couple of years. I'm not sure whether they support arch, though.

And the "nicest" UX from all the companies for their cloud-ui-interface.

You can often find deals on https://lowendbox.com/

I'm using cheap Contabo servers and they've been surprisingly reliable for the price they ask. Their €5 offering (excluding VAT) comes with 4 cores, 8GiB of RAM and either 50GiB of NVME storage or 200GiB "ssd" storage. That's a lot better than anything I can find elsewhere.

The only problem I've found is that unlike most VPS providers, they don't seem to do gigabit speeds in their cheaper tiers. I'm content with the 100mbit offering (which is 200mbit if you sign up these days, I think) so that's not a problem for my use case.

If you only run very basic stuff (basically a raspberry pi in the cloud) then Oracle offers two VPS servers for free (not "free for a year", "free with a purchase", actually "free forever", which was a huge surprise to me). You get very little in terms of performance but for a website or small project that should be sufficient.


They have many locations and I used their VPS for a long while before I moved to colocation. Had no issues apart from when I accidentally deleted /etc/.


Quite attractive pricing, and I've been a user for 2+ years and never faced any issue so far. Probably lucky me?

I've been using SSDNodes for some stuff for the past year. Had an issue with my VPS in Dallas, contacted support, they fixed it right away - response in a few minutes, fixed within half an hour.

They appear to support Arch? https://www.ovhcloud.com/en/vps/compare/

Click through to actually ordering, and when you select your OS, you're given a choice of only Fedora/Centos/Debian/Ubuntu/Windows Server. When I was looking for a host that supported Gentoo I found similar false advertising from them (and many others), they don't actually do it. In theory some of them will give you a recovery shell so you can then bootstrap whatever distro you want, but once you do that you're in very unsupported territory.

I use archlinux on OVH, so in this case the advertising is certainly not completely false.

Linode and Digital Ocean are pretty good.

Good experience with Hetzner Cloud, they support Arch. Only EU data centers though.

Prior to the outage I received an erroneous email from them about an old account, followed by an apology about it. It seems they were having other issues than just the router config.

Router maintenance, no impact predicted. Many people saying that if "all servers down everywhere" is not impact, they wonder what actual impact is.


Which is stupid. If I say I'm walking to the store and will be gone for 30 minutes, then get hit by a truck and spend a week in the hospital, nobody would say "30 minutes, huh?"

Ah, I was wondering why my scraper was failing a lot this morning. good to know

migadu.com is also down, connected?

edit: also can't get access to ovh vps via ssh

A lot of websites will probably be impacted.

Migadu is indeed using OVH servers.

Jitpack is down for me too. I wonder if this is related.

Mxroute is also down as a result of this outage.

Seems to be back 07:41 UTC.

I became able to reach our servers in London, and www.ovh.com, from UK locations and from DigitalOcean.

   2 ( 1.095 ms ( 0.919 ms ( 1.067 ms
   3 ( 1.365 ms ( 1.354 ms ( 1.549 ms
   4 ( 1.179 ms 1.168 ms ( 1.114 ms
   5 * ( 1.167 ms 1.155 ms
   9 vl1272.lon1-eri1-d1-a75.uk.eu ( 2.594 ms be101.lon1-eri1-g2-nc5.uk.eu ( 2.654 ms 2.466 ms
  14 ( 2.102 ms 2.091 ms 2.076 ms

Both my Montreal based servers are still down. Seems like their resellers are also, unsurprisingly, impacted

Still down from here

It is up here at 08:27 am UTC

probably bgp? It seems bgp is always the issue?

Well, that's a bit like saying SMTP or IMAP is always the issue when your email stops working.

If they screwed up routing, BGP was very likely to be involved, but the cause was probably human error, failure of process design, failure of system design, etc etc.

Internet hiccups

Let me guess: BGP? DNS?


Yes, because no network issue ever happened in the past with the "greybeards".

Do we really need these kinds of comments?

> Yes, because no network issue ever happened in the past with the "greybeards".

These issues, one might think, should have turned into tomes of extensive information on why they happened and how to avoid them, and become an integral part of showing the ropes to new sysadmins, or operation persons, you name them. It seems, however, that by and large the knowledge of the actual working systems does get irretrievably lost once some Kevin gets retired.

P. S. Would you be just as cavalier if lives were lost as a result of such incidents?

We don't know the ages of the people involved with this incident, so this just comes off as gatekeeping.

Not gatekeeping, rather lament that the next generation doesn't tend to dive deeper into the nuts and bolts of how those things actually work and what the failure modes are.

Maybe the greybeards don't do a good job at educating their successors, or maybe it's the "tl;dr, YOLO" attitude at work. I don't know. I suspect a combination of both.
