What an irony that the Deutsche Bahn caused an outage of Deutsche Lufthansa. And Deutsche Lufthansa is now trying to shift domestic flights to Deutsche Bahn[0] (see also for general advice/help). And who recently became a Star Alliance member? Deutsche Bahn. And who has experience with severed fibers[1]? Deutsche Bahn!
*Please*
All critical systems should work autonomously. Cache flight/passenger data locally at the airport for the next few weeks. Servers are for synchronizing, not for keeping your data out of reach. With a local cache you can keep going for some weeks; adding/changing bookings will be harder, but you can stay in the air. That is pretty complicated and requires more work, but it is important. Git and IMAP are examples of how it should be done.
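A minimal sketch of this offline-first idea (all names are hypothetical, Python purely for illustration): reads are always served from the local copy, and writes made during an outage are queued for later synchronization, the way Git and IMAP clients work.

```python
import json
import time


class LocalFlightCache:
    """Offline-first cache: reads come from the local copy; writes
    made while the uplink is down are queued for later sync."""

    def __init__(self):
        self.passengers = {}   # flight -> list of names (synced copy)
        self.pending = []      # local writes not yet pushed upstream
        self.last_sync = None

    def sync(self, server_data):
        # The server is only for synchronizing: refresh the local copy.
        self.passengers = json.loads(json.dumps(server_data))
        self.last_sync = time.time()

    def manifest(self, flight):
        # Works even with the uplink down -- enough to print a list
        # and keep boarding with pen and paper.
        return sorted(self.passengers.get(flight, []))

    def add_passenger(self, flight, name):
        # Local change first, reconciled with headquarters later.
        self.passengers.setdefault(flight, []).append(name)
        self.pending.append((flight, name))
```

This is only a sketch of the architecture, not anything an airline actually runs; the point is that the cache, not the server, is the source of truth during an outage.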
Outages will happen and will keep happening, and as they get rarer, they will become more serious because they will hit inexperienced staff. British Airways in particular is known for such issues[2][3].
> What an irony that the Deutsche Bahn caused an outage of Deutsche Lufthansa.
Technically it wasn't Deutsche Bahn, but a hired construction company. And they just happened to cut some cables belonging to Deutsche Telekom, which also broke the connection for Lufthansa and some others. These things happen with construction work. But generally, they are all somewhat at fault.
> All critical systems should work autonomously.
It's not always obvious what is a critical system. I mean it was a semi-important sub-system of a bigger company, not something where you would assume it could cause a national impact.
>"It's not always obvious what is a critical system. I mean it was a semi-important sub-system of a bigger company, not something where you would assume it could cause a national impact."
Why would you not assume it was something that could cause national impact when cutting through fiber cables was the very thing that impacted rail travel for Deutsche Bahn 4 months ago in Northern Germany?[1]
DB uses GSM-R (Global System for Mobile Communications – Railway), which has fiber backhaul at regular intervals along the tracks.[2]
Given that the incident in October is still fresh in memory, surely there should have been heightened awareness of this.
Do you really think four months is enough time to check all your systems for unlikely problems outside your control, redesign them, implement and roll out the changes, and even convince your boss to spend all that money on something whose worst case will most likely cost less than the solution for preventing it? All assuming it's even possible to have a solution for every potential problem.
You don't need to check all your systems, just the area where you are deploying bulldozers and backhoes. Furthermore, enumerating all base stations and their geo-coordinates in a GSM network is trivial.
Caching at the edge might help in certain situations, but a global reservation and check-in system seems like the kind of thing that needs to be centralized.
I think a more workable solution would have been to treat transit for this data center how most other data centers are organized, with multiple independent uplinks that take different routes into the building. A last-resort point-to-point wireless link would probably be wise as well.
Most IT plans focus on mitigating the outage itself (e.g. multiple uplinks, backup servers, fallback datacenters).
I argue that critical systems should handle the outage gracefully. If a client caches the flight/passenger data (let's say two weeks' worth), you can print lists. Grab a pen and keep going.
That is enough for some hours and even days. If you want to go further, you can allow adding one or more passengers right at the gate/counter, and so on. You can also go all-in and add code that allows merging of external data, i.e. letting headquarters and airport personnel work at the same time.
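The "merging of external data" step can be sketched as a union merge over per-flight passenger sets (a grow-only-set approach; all names are hypothetical). Additions made independently by headquarters and gate staff commute, so they reconcile without conflicts; removals would need extra bookkeeping such as tombstones.

```python
def merge_manifests(local, remote):
    """Union-merge two independently edited passenger manifests.

    Each manifest maps flight -> collection of names. Treating the
    names as a grow-only set means two sites can both add passengers
    while disconnected and merge cleanly afterwards.
    """
    merged = {}
    for source in (local, remote):
        for flight, names in source.items():
            merged.setdefault(flight, set()).update(names)
    return merged
```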
The passengers were also not let onto the waiting aircraft via a tally sheet, because according to staff, important departure information was missing.
If it works that is an appropriate solution.
PS: We cannot foresee the circumstances. Maybe a hack, a burning datacenter, war, a coworker going crazy, international sanctions, a storm, flooding, whatever. For example, Deutsche Bahn itself recently suffered sabotage: two entirely independent glass fibers were cut at the same time (one in western Germany, one in eastern Germany).
Not to mention planes are intentionally overbooked (at least here in the US; I presume the same tactic exists to a lesser extent in places like Europe, with presumably better consumer protections) to ensure they're flying as close to full as possible, so you're constantly needing to access the larger booking system to rebook flights for people who were bumped, or to see who's been moved onto your flight. Two weeks is way too long; even a day is probably too long for a fresh passenger manifest.
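This freshness argument can be made mechanical. A hedged sketch (the cutoff is an invented number, not anything any airline uses) that refuses to board from a cached manifest older than a few hours:

```python
import time

# Assumed cutoff: hours, not weeks, given the rebooking churn
# caused by overbooking and bumped passengers.
MAX_MANIFEST_AGE = 6 * 3600  # seconds


def manifest_usable(last_sync_ts, now=None):
    """Return True if the cached manifest is fresh enough to board from."""
    now = time.time() if now is None else now
    return (now - last_sync_ts) <= MAX_MANIFEST_AGE
```

The caching strategy and the freshness requirement pull in opposite directions; where the cutoff sits is exactly the trade-off the two comments above are arguing about.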
The article says that Lufthansa actually has a backup line and it initially worked. It finally failed later [sic!] on Wednesday, maybe because the load was too much for it. So they tried, as usual, to lower the chance of an outage. But what is crucial is actually being able to keep working during an outage.
PS: The construction workers seem to have drilled through four cables (each with ~900 fibres) 4-5 meters below the surface.
This tweet [0] points out that the cable break happened on Tuesday afternoon, but the Lufthansa problems only occurred Wednesday morning. So it's more complicated than just one breakage; something else failed on top of that.
In English one would use 'by' for a deadline and 'until' when the state or action continues up to a point. e.g. "I must finish the task by the end of the day" or "I'll be working on the task until the end of the day". In German 'bis' is used for both cases.
To continue the tangent, I'm currently learning Spanish and appreciate the really long tail involved in language learning. I often work with people who, while they do not speak English as their first language, speak it pretty much perfectly. Except for maybe a few minor "glitches", which would just never occur with a native speaker. I often wonder how pointing these out would be received, and though I presume many would be appreciative, maybe not all...
Before and until are sometimes interchangable in English.
Don't go there before Monday. Don't go there until Monday.
You can get this deal until Monday. You can get this deal before Monday. (Some lack of clarity about whether the offer is valid on Monday with until though)
It should not take too long to resplice all those ribbon fibers, unless the site is difficult to get to or the armored cable is in worse shape than just a clean cut.
Unfortunately, it's very difficult to determine the path of redundant fiber without a backhoe.
It is incredibly common for multiple fiber bundles to share a path, because it's often easier to locate another bundle in the same place. Fiber is often laid along rail lines because rail lines have the perfect shape for communication routes, and there are specialty train cars that make laying fiber alongside the rails really efficient.
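One practical consequence: "redundant" paths need to be checked for shared fate. A tiny sketch (hop names are hypothetical) that flags common hops between two supposedly independent uplinks, given traceroute-style hop lists:

```python
def shared_hops(path_a, path_b):
    """Report the hops two 'redundant' uplinks have in common.

    A non-empty result means a single cut or router failure can take
    out both paths at once; an empty result only means the paths are
    plausibly independent at the hops you can observe.
    """
    return set(path_a) & set(path_b)
```

Of course this only catches overlap at the IP layer; two distinct routers sitting in the same conduit will not show up, which is exactly the problem discussed below.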
Designed, sure. Deployed? All I can say is the number of incidents where everyone was assured that the fiber strands were independent, but in actuality they weren't is a lot more than zero.
It's really easy for a vendor to claim that there's no point at which a single backhoe can take out two of the A/B/C/D/E paths in your diagram. It's also pretty easy to buy dark fiber from two different vendors that are reselling in the same bundle or have their bundles placed in the same conduit.
Wrong. When you build a customer a metro fiber circuit, the ring is already built, often years before. You simply pull wavelengths out of the fiber and drop them off at the meet-me room in the POP. Source: I used to do this.
>"It's also pretty easy to buy dark fiber from two different vendors that are reselling in the same bundle or have their bundles placed in the same conduit."
Wrong again. If you are purchasing an IRU on dark fiber you know exactly who owns the physical assets. If you are purchasing wavelengths from a reseller they will happily disclose whose network they are reselling. Additionally you can request the CLR/DLR for your circuit and see exactly how it's built. You can also easily avoid resellers and not worry at all about this.
Lastly you seem to not understand the difference between long haul and metro fiber.
Interesting: most of the north-west area of Frankfurt was affected; residential cable internet failed, and the LTE/5G network of Deutsche Telekom didn't work either.
Some people suggested adding {2,3,4,5}G as another fallback. This often fails because the cell tower is connected to the same cables. Or the backbone burns down. Or the connection between datacenters fails (see the recent AWS failures). Or a coworker goes rogue...
It is good to prevent failures. But autonomous local systems should remain (basically) usable for some time. And if the local system is paper, that's at least something. We also have ABS, ESP and ASR in cars; we still fasten our seatbelts.
Everything fails all the time. The fiber cut was not responsible for this. Lufthansa having woefully inadequate resiliency plans in place with their infrastructure is to blame.
This is 100% on poor management practices by Lufthansa, but of course they’re going to point the press to that shiny object over there in the form of a fiber cut. The press, as usual, took the bait.
Push out the PR and remember that journalists are not technical people.
Had there been multiple cuts in multiple different locations I'd be more sympathetic. Shetland being a recent example [0], where one fibre was cut, then the other one in a completely different direction was also cut.
Assuming they had two cables and both were broken: whether something as critical as the entire LH booking system deserves more resilience than just two geographically diverse cables, I'm not sure -- ultimately it's a question of cost, damage, and likelihood.
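That cost/damage/likelihood trade-off is just expected-value arithmetic. With numbers invented purely for illustration (none of these figures come from Lufthansa):

```python
# All numbers below are assumptions made up for this sketch.
p_double_cut = 0.02          # assumed yearly chance both existing cables fail
outage_cost = 5_000_000      # assumed cost of a day-long grounding (EUR)
third_path_cost = 80_000     # assumed yearly lease for a third diverse path

# Expected yearly loss without the extra path vs. the cost of adding it.
expected_loss = p_double_cut * outage_cost
worth_it = expected_loss > third_path_cost
```

With these made-up inputs the third path pays for itself, but flip the probability an order of magnitude lower and it no longer does, which is exactly why the answer is "I'm not sure" without real numbers.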
There was a fiber cut last year near Chicago. It was near a railroad so a bunch of authorizations were required and it took about three days to complete the work. My parents live in a rural area that has fiber but didn't have internet because the local ISP didn't have any redundancy.
We switched from microwave antenna which has its own issues to fiber and my dad is thinking, "Well, you said this would be better." You can blame the local ISP, but am I wrong to think that it's hard for a rural ISP providing fiber to afford redundancy?
A rural ISP might not even have the possibility of redundancy. There might be just one backbone fiber within reasonable distance. For example, there's a local rural fiber ISP near me that had to invest quite a lot into digging a fiber trench several miles to the nearest backbone connection before they even got off the ground.
This could have happened with the microwave antenna anyway: the other side of your antenna is most probably connected with a fiber cable, which can also be cut.
Crazy that an airline that is very used to the need for double and triple redundancy in the aircraft it flies fails to have any sort of connectivity backup for its ground systems.
They have the legal obligation to have those double/triple redundant systems in their aircraft. Otherwise it is probable they would not have as many safety systems.
Does anyone know if these systems need such high bandwidth that they couldn't run over 4G or 5G temporarily? Perhaps some flights, say 25%, could actually still be processed?
Right. I'm not even sure the interconnect is IP-based. It could very well be a legacy X.25 network. A quick check of Wikipedia says that X.25 is still used in the aviation industry. Routing X.25 over 4G probably involves adding a layer to encapsulate in IP.
This is Germany. I bet they have at least one company nobody heard of that makes money hand over fist specifically by doing X.25 over IP for industrial plants.
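For what it's worth, there is a standard way to do this: XOT (X.25 over TCP, RFC 1613) frames each X.25 packet with a small four-byte header and ships it over a TCP connection on port 1998. A minimal sketch of just the framing:

```python
import struct

XOT_TCP_PORT = 1998  # registered port for X.25 over TCP (RFC 1613)


def xot_frame(x25_packet: bytes) -> bytes:
    """Prepend the four-byte XOT header (version 0, then packet length)
    so an X.25 packet can be carried over an ordinary TCP stream."""
    return struct.pack("!HH", 0, len(x25_packet)) + x25_packet


def xot_unframe(frame: bytes) -> bytes:
    """Strip the XOT header and return the original X.25 packet."""
    version, length = struct.unpack("!HH", frame[:4])
    if version != 0:
        raise ValueError("unsupported XOT version")
    return frame[4:4 + length]
```

A real gateway would also have to handle X.25 call setup and flow control, which is presumably where that hypothetical German niche company earns its money.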
How is this possible? Presumably whichever datacenter it was hosted in, would have multiple fiber lines connecting to it? Or am I just spoiled by the major cloud providers?
It was a major outage for the whole area, and there is not much competition in terms of cables, for historical reasons.
Though it's also possible that the other connections just failed, or could not take the sudden traffic, or the system crashed for some nonsensical reason, because nobody really tested this scenario.
Internet Speeds in Germany are lacking in some areas, but fast internet is generally available. I have a symmetrical 1 gig FTTH connection in the countryside
Except when you rent and the landlord says "Internet is internet, why do you need a different one?"; then the next desperate tenant will take whatever is there.
[0] https://www.lufthansa.com/xx/en/flight-information.html
[1] https://www.heise.de/news/Sabotage-bei-der-Bahn-Viele-vertra...
[2] https://thepointsguy.co.uk/2017/09/amadeus-network-issue-cau...
[3] https://www.datacenterdynamics.com/en/news/british-airways-e...
PS: Deutsche Lufthansa is known for reliable transport. Deutsche Bahn is known for unreliable transport.