Southwest cancels 5,400 flights in less than 48 hours (npr.org)
584 points by edward 11 months ago | 570 comments

I am affected by this. We got halfway into our flight only to find our next leg was cancelled. SWA will not (cannot?) rebook anyone until the 31st. Our return flight was going to be Sunday, so we rebooked from our halfway stop back home on Sunday.

Here are some crazy things I have encountered.

Rental cars in our city are sold out. Same for cities within a two-hour drive. The websites will accept reservations, but when you show up, they tell you they have no cars. Because it's the holidays, buses and trains are booked and any flights on other airlines are crazy expensive ($1500 one way). When you are stuck in a city, you are probably truly stuck there.

Your only hope of dealing with SWA is waiting in line at check-in or a gate. The phones don't work. Online chat doesn't work. The lines are long and slow.

Luggage is hit or miss. If your bag was pulled off a plane, you might find it in baggage claim, but most bags are on a plane or on the tarmac. SWA told us it may be 30 days until we get our luggage. They won't pull bags for people, and the agents that we spoke with acknowledged and felt for people who may have had medicine in them.

The workers are as befuddled as the passengers. They have been very nice and as helpful as they can be, but their phones haven't been working and their computer systems have been slow.

On Twitter, someone posted a video of the announcement at Houston Hobby about no flights until the 31st and people keeping their receipts for hotels, etc. They said the same thing at our airport.

People in the airport are so mad. It's unfortunate because it's not constructive. But tempers are flaring, and frustrated passengers who finally get to talk to an agent end up slowing things down because they spend a lot of time trying to hear something they're not going to hear.

Ultimately, this is an operations failure. Companies talk a lot about accountability, but the typical way you hold people accountable is by replacing them with more capable people. It will be very interesting to see if any executives leave SWA over this. If not, I would say that no one was held accountable.

To close, my family and I are fine. This is but a minor speed bump in life. No one is dying, and we will see how SWA takes care of the extra expenditures. Some people aren't so lucky. They have meds in bags, or finances that don't allow them to spend multiple nights in a hotel and get Uber trips for a few days. Hopefully, SWA takes care of them, too.

If there are no cars available near you, and you want to drive, try U-Haul or similar truck rental places. It’s probably not going to be a super comfortable ride, but you might get home.

That's a good idea, but not really feasible for a family. After the rental car fell through, we resigned ourselves to not getting to our destination and decided to make the most of the situation we have (explore a new city we have never experienced beyond an airport or a highway) and solidified our return flight.

A cheap mattress, pile of blankets, and a 12v tv all in the back of a truck made for the best road trips as a kid.

Sounds nostalgic and all, but the temp was near 0 in many places throughout the US. Not a fun time to be sleeping in the back of a U-Haul truck.

Or keep an eye out for shower-curtain-ring salesmen.

Or a polka band

The other 3 (expensive) hacks I can think of: charter a flight and draft people stuck in the same situation to help defray costs; rent a 15-passenger van or bus and do the same; or buy any vehicle you can find and then resell it at the other end.

For those in this situation, you may want to consider renting a truck from U-Haul or similar. They’re only open during business hours and will be way more expensive, but a vehicle is a vehicle.

I can’t say I recommend then offering to shove everyone else in the back.

This has been one of the reasons I heavily lean toward direct flights now - that way if stranded I'm either at home or at my destination.

There used to be a website where you could post that you were driving from A to B and other people could opt in to go with you and hitch a seat. That would be great for stuff like this.

There’s also Greyhound and Amtrak.

Friend of mine used something like this to get from LA to the bay area.

The car had 2 girls in it who hotboxed the car the entire 6 hour drive, then missed his exit near San Jose and got pulled over for an unrelated reason.

Says if he wasn't married now he would do it again in a heartbeat.

I used something like this to get from Berlin to Amsterdam. I wasn't hotboxed but did get to spend the ride chatting with an attractive fashion student. It was cool because I'm not involved in fashion at all but my uni was using that industry for a lot of examples in lessons.

I think the precursor to Lyft, Zimride, was a matching service like you describe.

BlaBlaCar does this in Europe but I don't know how much uptake they have in the US.

Craigslist still has a rideshare section but that is super hit or miss.

My local U-Haul charges a day rate and $1/mile, and gas is on you. Still you can get lucky if they need to move a truck your way. They'll practically give it to you.

Budget has a different pricing scheme that is per day and depending on location, can also work.

Apparently it’s due to SWA's scheduling and related software being overwhelmed and unable to handle the information mismatch.


I am SO glad I decided against traveling this holiday season. What a disaster.

Thanks for the update. I’m sure the people around you appreciate the calm you’re demonstrating. I hope you’re able to salvage some of your holiday plans.

I had a similar situation happen to me on my way back from a wedding. Got stuck in Charlotte with two small kids. It made me realize a bit more tangibly how absolutely terrible our public transportation is here in the US.

I don't know the cause of your specific situation. But with respect to 5400 cancelled flights, I'd be surprised to hear that there's a transportation system (public + private combined) that can handle a sudden and unexpected influx at a time when things are already at peak capacity (holidays).

The holidays are usually a quiet time for public transport systems - the lack of commuters more than counteracts the increase in tourist-like trips.

No, but in my particular situation (which predates the Southwest issue), had I been in Japan, Korea, a decent European country, etc., I could have managed to get a train ride from a city the size of Charlotte to a city the size of my hometown.

Methinks you are quite right: back of the napkin math indicates the average flight carries 64.44 passengers. So just the Southwest flight cancellations stranded 347,976 passengers. That’s a lot of people to re-route.

Southwest boards three groups of up to 60 passengers each. Every Southwest flight I have taken post-pandemic has been full. I’d bump that average up to the 120-180 range for Southwest flights this time of year.
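
For what it's worth, the arithmetic in both comments is easy to rerun. A quick sketch, assuming every cancelled flight would have carried the same average load; the function name is made up for illustration, and the loads are just the 64.44 figure and the 120-180 range quoted above:

```python
CANCELLED_FLIGHTS = 5_400  # figure from the headline


def stranded(passengers_per_flight: float, flights: int = CANCELLED_FLIGHTS) -> int:
    """Rough count of affected passengers for a given average load per flight."""
    return round(passengers_per_flight * flights)


# Compare the napkin average with the fuller-flight range suggested in the reply.
for load in (64.44, 120, 180):
    print(f"{load:>6} pax/flight -> {stranded(load):,} passengers")
```

At 64.44 per flight that reproduces the 347,976 figure; at 120 to 180 per flight it comes out to roughly 648,000 to 972,000. Either way it ignores connecting passengers being counted on more than one cancelled leg.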

Yeah, but they're not all in one place. If the US had a genuinely functional public transport system, I doubt there would be much problem finding seats for all or the vast majority of the stranded people on trains going to or near their destinations.

There's 2 types of public transport: intra-city and inter-city. Intra-city doesn't serve the same use case as flying, and can't generally be repurposed to inter-city transport, so it should be discounted. If the US had inter-city rail that people actually used, then it would almost certainly be near capacity in a normal holiday season. Unless you assume there are a lot of extra train cars that aren't normally being used but could be taken out of storage, there's no way that it could absorb such a large influx of stranded passengers on short notice. Inter-city rail generally operates like airplanes in that only one ticket can be sold for each seat - there's no standing room on a multi-hour train ride.

> They won't pull bags for people, and the agents that we spoke with acknowledged and felt for people who may have had medicine in them.

Casual reminder to keep all medicine in your carry-on! Even gate-checked bags get delayed and lost. I never realized that could happen with a gate-check until it happened to me. They loaded all of the gate-checked bags onto our plane, but then that put it over a weight limit, so they took a pallet or two of them off and put them on another flight. Of course, it was a complicated multi-leg international flight to boot.

Anybody who puts important meds into checked luggage is really to blame for their problems. Bags get lost even when airlines don't have catastrophic breakdowns of their normal operations and taking this risk is really not necessary as meds can totally be carried in carry-on. It's really unfortunate that this is causing problems for people but it's totally preventable.

You're getting downvoted a lot - but as someone who has been racking up north of 100k miles/year for five years pre-pandemic (somewhat less now) - anytime I checked luggage I essentially needed to be mentally and emotionally prepared to say goodbye to it. I booked my tickets with Chase Sapphire which has very good missing luggage/toiletries/etc... insurance - and had my luggage go missing five times in as many years - though, I will say, all five times I had the airline deliver it to my destination hotel within 48 hours.

Regardless - don't check anything you aren't prepared to say goodbye to.

Yes, but I think the comment is being downvoted for tone, not accuracy. It’s not nice to lecture people when they’re down, even if they’ll never read it.

You shouldn't be downvoted for this. Southwest (and other airlines) specifically recommend bringing all medication in carry-on bags. I have had checked bags delayed multiple days on other airlines. It happens.


There being a possibility of a negative event providing motivation for a strategy to mitigate the event doesn't provide justification for realizing the negative event through sheer incompetence nor does advertising your incompetence ahead of time clear you of the blame for said incompetence.

Except that even if you only bring a carry on, it’s not guaranteed that you won’t be hit with the “we’ll have to check this bag free of charge for you”. I’ve been in my seat, the lady takes my bag down from the storage to fit something else, can’t fit my bag back, and decides to check my bag without even talking to me. Other times it happens as you walk down the aisle and find no room; it’s very common.

Meh, it's avoidable. I travel with a medium-big backpack as my only carry on, usually, and have never been gate-checked. The roller bags which barely fit in the overheads and have no compressibility are the ones that get gate checked.

So you travel for a meeting for a day and back? Ever travelled from warm weather to cold weather? Ever travelled to a place where you’ll need two types of shoes? Or multi city with different weathers, different engagements… I don’t know how you can believe that “you can just travel with a small backpack anywhere!” “If I did it, it must work for everyone!” “99% of the people have carry ons bigger than a backpack because they are stupid and literally do it for no reason”. Is that the logic you’re following? I don’t get it.

Note the word 'usually.'

But also I've traveled for a month (us to India) from my backpack. One week of clothes, a laptop, some cables, notebook and pens, perhaps a couple more books, a few toiletries. And still some room for one 'fun' thing, like a small synthesizer to jam on.

It's a bit like ultralight backpacking; you can cut down quite a lot once you get in the mindset. I definitely don't claim it works for everyone... But probably most people can travel rather lighter than they normally do.

Fwiw, my travel backpack is pretty similar to this one. Definitely a bit bigger than the standard backpack, much smaller than a hiking backpack, and rather smaller than a roller bag.


There are only two kinds of bags: carry-on and lost. I always travel as minimal as possible so I can stash my gear under the seat.

100% agree that happens, and frequently, but it's also perfectly avoidable - don't try to maximize volume. Bring a carry on that fits underneath a seat. I typically travel with a rectangular backpack (plus, if needed, a check in, which I acknowledge may not make it there). It gives one an independence and self sufficiency that's completely worth it.

The expected downside of a lost bag is that you get it back in a day or two — not ten days.

And usually not in a city other than your point of departure or expected arrival.

Southwest should be responsible for making it worse.

The expected downside of a lost bag is a lost bag.

Still there are plenty of medicines you don’t want to miss for a few days. For me and my family anything that we absolutely need travels in hand luggage always. In my experience getting a lost bag can be a huge pain, especially if it was lost during a layover.

This is really harsh but it’s good advice. I’ve never checked important medicine but I never really thought about the danger of losing access to it in a random city. Fortunately I think local pharmacies would be pretty understanding in this situation… and give people temporary access to missing medicine… hopefully… perhaps the airport could even act as a middleman (I got urgent care drugs at a foreign airport once when I was sick).

Pharmacies cannot and will not do that (except for insulin, for which it's now allowed in some states, and maybe a few other non-abusable things.) If it's a controlled substance -- which includes most things you'd be worried about withdrawal from -- forget it. If it's a more tightly scheduled drug, you'll have trouble even getting an unfamiliar doctor to prescribe it. It's not that they don't care about your withdrawal syndrome, but they generally care much more about not losing their license.

I think it would be a matter of calling the prescribing doctor’s office, explaining the situation and giving them the details of the pharmacy you’d like to pick up the prescription from. The biggest hurdle would probably be insurance, who doesn’t want to pay for more medicine than was necessary.

Good luck with that Dec 24-26 (the time frame we're discussing). You might get an answering service that can get a message through to your doctor but the office is likely to be closed and empty. Then you still have to find an open pharmacy with your medicine in stock.

Keep it on your person. Always.

I don’t know that pharmacies are allowed to do that. What you usually have to do is call your primary care or other doctor and explain the situation and have them send a prescription to the new city

Totally agree. Every time I fly I think about how I'll feel if my bag gets lost forever. I then plan accordingly to minimize the tail risk.

This is why corporations can't be allowed to just do whatever they want. A rental company renting non-existent cars to you is a breach of their business purpose, to sell a product or service. You've spent time and money to get nothing, against what you were sold.

Who can step in to make some baseline rules that corporations have to abide by so people aren't screwed like this?

You could find flights earlier for around $750 to some destinations via a private jet charter (subscription) company; some friends did that while having similar problems.

Every airline, pre-boarding, informs passengers to place any and all necessary medication in their carry on bag.

This has been procedure for the last 40 years.

There is no excuse that is feasible or plausible for 'forgetting' important medications in the checked baggage.

This is a reading comprehension problem and not an airline issue.

No excuse, none? Not even the harried passengers at the terminal suddenly being told they need to check their carry-ons due to lack of overhead bin space, right before they're about to board?

You have never forgotten to pack something, or perhaps forgotten an important detail in a stressful situation?

I had an international flight every month on average in 2022, across Europe, and I was never explicitly informed about this by an airline or by any friends or coworkers.

Not to agree with blaming the victim like op, but they announce it over the PA system at many airports, not directly to each person.

I’ve never heard this flying hundreds of times

I have a million miles on United alone, and many more on others as well: I’ve never once heard such a message either.

they force you to check your carry ons half the time

have you checked to make sure you're not just stuck in the plot of a Home Alone movie?

As I understand it, Southwest is particularly badly affected by such issues because they run a lot of flights that are not in a hub-spoke model, but rather serial flights from one non-hub city to the next and the next (eventually coming back in a loop). You can see this by going to Flightaware.com, for example, and following back a flight's previous destinations. See for example https://flightaware.com/live/flight/SWA1092/history/20221223... and "track inbound plane" a couple times.

They jump around the country, much less frequently going back to a hub as other airlines do. That means that the planes and crews have a relatively harder time recovering from system-wide disasters because they don't have as part of normal operations as much ability to centralize or pool resources and get people/planes reorganized. (everyone go back to base, consolidate passengers, crew, planes and redeploy them and sort things out in one place)
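
To make the knock-on effect concrete, here is a toy sketch of a single aircraft's day. Everything in it is hypothetical: the airport names, the two rotations, and the rule that a leg can only fly if the plane is physically sitting at that leg's origin. It ignores crews, spare aircraft, and ferry flights, so treat it as an illustration of the argument rather than a model of Southwest's actual network.

```python
from typing import List, Tuple

Leg = Tuple[str, str]  # (origin, destination)


def legs_lost(rotation: List[Leg], cancelled: int) -> int:
    """Count legs that cannot operate when leg number `cancelled` is scrubbed.

    Toy rule: a leg flies only if it is not the cancelled one and the
    aircraft is actually at that leg's origin when its turn comes.
    """
    position = rotation[0][0]  # aircraft starts at the first origin
    lost = 0
    for i, (origin, dest) in enumerate(rotation):
        if i == cancelled or position != origin:
            lost += 1        # cancelled, or the plane is out of position
        else:
            position = dest  # leg operates, aircraft moves on
    return lost


# Hypothetical point-to-point chain: A -> B -> C -> D -> E
point_to_point = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
# Hypothetical hub-and-spoke day: three out-and-back trips from HUB
hub_and_spoke = [("HUB", "A"), ("A", "HUB"), ("HUB", "B"),
                 ("B", "HUB"), ("HUB", "C"), ("C", "HUB")]

print(legs_lost(point_to_point, cancelled=0))
print(legs_lost(hub_and_spoke, cancelled=0))
```

In this toy setup, cancelling the first leg of the chain strands the plane at A and wipes out all four legs, while the hub rotation loses only the out-and-back pair and picks up again with the next departure from the hub.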

Unfortunate, but that's their model. Good for some purposes, not so good for others. Maybe it's them being quirky and an active choice. I mean, up until a few years ago they did not fly to Hawaii because their scheduling system / people / processes did not want to have redeye flights.

They've operated without the hub-and-spoke model for years, but they haven't had to operate with over 8% of their staff leaving. They're understaffed. It was a major issue with Southwest all throughout 2022 and it got brought up on their earnings call with investors. They're a budget airline and they can't afford to take that kind of staffing hit.

From a friend that works there this is not a staffing issue. It is a software issue. Apparently, their system that tells them what staff is where is broken and doesn't know where anyone is. So now they need everyone to call and tell them where they are to update the system, but the phone lines to do so are totally overwhelmed. This is also causing them to be unable to book hotels for the staff that is stranded. Most people are just buying their own hotels and hoping for reimbursement in the future.

Some additional hearsay evidence that, whatever the initial cause was, it has been compounded by some massive IT failures: https://www.reddit.com/r/flying/comments/zw5lsl/southwest_pi...

This is interesting and topical considering what's happening in places like Twitter..

..And the one thing that might have saved Southwest in the past, its ferocious employee loyalty and willingness to go several extra miles when the shit hit the fan because they knew they were being taken care of, has been utterly destroyed by the new management.

People were loyal to Herb (not SWA) because Herb was loyal to them. Herb’s gone, loyalty’s gone.

You could both be right -- could be that a hub/spoke model would fare better in a software failure, because everyone would return to the hub and things would be straightened out the old-fashioned way.

Not having staff didn’t seem to stop them overselling what they can handle, however. They took a risk on stretching as far as they could go in an ideal environment and here we are.

This is probably tied to how deeply ingrained the overselling habit is in the airline industry in general. They're legally protected when doing this, and it's why that doctor got dragged off that flight.

> They're legally protected when doing this, and it's why that doctor got dragged off that flight.

To be clear, it was United that dragged a doctor off a plane. Not Southwest.

Are you referring to that time that United dragged a doctor off a Louisville-bound flight? God, I will never forget that.

That was a United flight, not Southwest. Southwest and Delta are the only two good US based airlines left.

Southwest does not overbook their flights as a matter of policy.


Proof appears to be in the pudding here. They absolutely overbook, all airlines do. Dated a few attendants over the years and they’ve all echoed similar experiences with folks being double booked especially so during this time of year. I’ve been bumped on a SW flight, got the extra travel voucher so it worked out. They absolutely do overbook.

If you read the linked FAQ, there's a difference between overbooking and overselling. If Southwest sells all the seats for a segment, but they end up flying the segment with a different plane than scheduled, there may be fewer seats than scheduled passengers. That's more likely to happen during the holidays when flights are very full, weather delays are common, and there are more equipment changes.

To me, that reads as a crappy distinction intended to inspire some pity for Southwest.

The effect is the same either way. Someone paid for a ticket and they don't get to board the plane.

The difference is intent. Overbooking intentionally creates conflict likely to result in a ticketed passenger unable to board. Overselling also results in a ticketed passenger unable to board, but was not intentional; at least not directly intentional. You could argue flying planes with different capabilities offers the possibility to have a lesser plane substituted and that's an intentional choice, but...

Another way to think of it is what could an airline have reasonably done to avoid the situation? If it's overbooking, the reasonable thing is to not overbook. If it's overselling, they could choose not to fully book their scheduled equipment, but is that reasonable? They can't choose to have 100% reliable planes and crews and weather and ground operations. Stuff happens, and it's certainly reasonable to be upset when it does, but understanding why it happened can be helpful, so making a distinction between overbooking and overselling makes sense to me.

C'mon, there is a huge difference between an airline selling a seat they don't have and not having it due to an equipment issue.

Or, to look at it from another perspective:

The airline sold you something that they can't deliver because they refuse to keep extra planes around. They refuse to keep extra planes around because that would eat into profits, and would mean their execs wouldn't be able to buy their third gold-plated yachts.

Aren't all Southwest Airlines planes identical?

No, they're all 737s, but there's a lot of variation within that.

SeatGuru says [1] Southwest flies three variants: the 737-700 with 143 seats, and the 737-800 and 737 MAX 8, both with 175 seats.

If a -700 gets substituted in, that's a lot of missing seats. I've also flown on planes where one seat is out of service for whatever reason and usually has a plastic cover on it.

[1] https://seatguru.com/airlines/Southwest_Airlines/information...
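
A small sketch of what such an equipment swap does to a sold-out flight, using only the seat counts quoted above. The function name and the assumption that the flight was booked exactly to capacity are mine, for illustration only:

```python
# Seat counts as quoted from SeatGuru in the comment above.
SEATS = {"737-700": 143, "737-800": 175, "737 MAX 8": 175}


def seats_short(booked: int, substitute: str) -> int:
    """Passengers left without a seat after a swap (0 if everyone still fits)."""
    return max(0, booked - SEATS[substitute])


# A flight sold to capacity on a 737-800, then flown with a 737-700 instead.
booked = SEATS["737-800"]
print(seats_short(booked, "737-700"))
```

With a full 737-800 load and a -700 substituted in, that is 175 - 143 = 32 ticketed passengers without a seat, even though nobody sold more tickets than the scheduled plane could hold.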

Thank you for the correction here

As a policy, Southwest is one of the only airlines (the only one?) that doesn’t overbook flights. They only sell the seats they have. From a business perspective it would be dumb to not offer all those seats for sale.

This is definitely their fault, but nothing like United pulling the doctor off the plane

Other airlines have staffing issues too.

The bottom line here is that the hub-and-spoke model is more resilient than the point-to-point model.

That is definitely not the case. The real issue is that even the staff that did show up didn't know where to go. There were employees lost for hours at Denver because the call-in scheduling system went down. Some employees hit their limits for work time before they could even obtain their assignments.

That’s a symptom. The root cause was lack of resiliency in airplane routing.

It’s the only thing that explains why Southwest was uniquely impacted this week. Every other airline has staff issues, and the same weather to deal with.

With a point to point network, the coordination problem is much larger than with a hub and spoke network. That is probably what pushed Southwest’s software over the limit. But remember, the software didn’t fail by itself in a vacuum. Resiliency problems during a weather event were the root cause.

What other airline relies entirely on a call-in scheduling system?

I don’t fly much but I noticed things like “the same flight number takes off at the same time each day and is always the same plane as a different flight number coming the other direction.”

It must really help all the employees with routine and consistency even if it’s not optimal.

Most airlines have schedules that are consistent day-to-day. It's the efficiency vs. resiliency tradeoff that's interesting. I'd probably summarize it as "don't fly Southwest in the winter."

That said, I flew Southwest from SJC to LAS for CES one year, connecting in SAN. Weather wasn't great, and they'd put you on the next available flight with an empty seat. They were even able to shuffle people without going up to the podium. Legacy carriers would have dragged their feet, there'd be a line, and they'd charge for the privilege of changing flights.

> don't fly Southwest in the winter.

I'd look at it the other way around: cancelling is so annoying for them that they're often the last ones to do it (barring catastrophic collapse, of course).

When I was traveling weekly out of Chicago, I always made sure to bring my Southwest credit card, just in case. Southwest sucks, but it gets you home.

> Southwest sucks, but it gets you home.

My sister was recently stranded in DFW trying to make it to SFO when American Airlines canceled her flight. They were happy to substitute another flight... to Sacramento.

So the two-hour round trip to pick her up from SFO turned into an eight-hour round trip to Sacramento. I'm amazed this was considered an acceptable substitute. Would have been nice if American was willing to get you home in the event of canceled flights.

(We could see available seats on flights from DFW to SFO at that time from Delta and Alaska. But those seats were "not available to American rebooking agents". It seems like that should have been American's problem, not ours.)

Footnote: even though my sister had to be rebooked onto the American flight to Sacramento, American didn't bother rerouting her luggage, which they sent to SFO. I guess canceling the flight meant "the plane will still fly, but without passengers".

You live 1 hour away from SFO but 4 hours away from SAC? SFO is two hours away from SAC. How is that possible? Maybe it's 3 hours away if there is traffic.

I realize that this was inconvenient and a big hassle, so I don't want to make light of it, but something seems off with the geography.

No, the travel time to and from SMF doesn't total eight hours. But there is overhead involved in making long trips that isn't necessary for short trips.

- You need to allow for a wider margin of error in predicting travel times, which means leaving earlier than strictly necessary. But leaving early doesn't mean getting back early; you have to wait for the plane to debark.

- This trip was long enough to be more demanding than an electric car could handle, requiring 30 minutes of time spent parked and charging the car.

- More time was wasted waiting for the luggage to show up; we were not informed that it hadn't been sent.

They probably needed extra time to take care of the paperwork for the misrouted luggage and have some food. Wasting time wastes time.

Bad weather for planes often makes road travel a lot slower than normal too.

At the very least, they were able to get her over the Sierras which are being dumped on right now. If they got her even to Reno, good luck getting over Donner or Echo summit for the next few days.

Except when it doesn't...

United proactively rebooks you on a new flight if you will misconnect, gives you options in the app, and issues waivers that let you avoid weather by rebooking your own flight (often waiving fare differences as well). Haven't flown Delta or American as much, but at least United's tech is a bit more modern.

As a matter of course I avoid United, and American doesn't have much presence out of SFO or OAK. Delta has done right by me each and every time something's come up.

Stuck on the tarmac in the snow and miss the connecting flight (last of the day)? They automatically rebooked me on another airline and I got a notification from the app as soon as I had reception.

3+ SFO bound flights delayed at JFK because of crazy winds at SFO? They proactively encouraged people to rebook, gratis. I rebooked on a flight the next day on nicer equipment, through the app, went into town, grabbed some bagels and had a flight at a nearly ideal time of day.

Regional plane goes tech? They had a red coat out and about keeping everyone informed.

Missed a flight because I misread the departure time? They sold me a same day ticket on the next flight at a hefty discount and I kept the inbound leg.

Southwest has a huge disadvantage and it's not tech: it's the lack of interline agreements. Nearly every other airline (except perhaps Spirit) can rebook you on another carrier when things go sideways. Southwest simply can't. With an interline agreement in place you'd have far fewer people getting stuck with exorbitant last minute fares.

In general though don't fly when you're getting unusual weather. Less than an inch of snow at PDX throws everything into chaos (and Portlanders call it fucking snowpocalypse). A few inches of snow at PWM and they don't even blink.

Notably, per the Alaska agent I spoke to about my cancelled Alaska flight, Delta doesn't let other airlines rebook on Delta flights. That may just be an Alaska/Delta thing, but it's not obviously full reciprocity.

I find it hard to believe that Delta (or any other major airline) wouldn't reciprocate. When I got rebooked onto an Alaska flight, the gate agents were openly annoyed at having to accommodate Delta passengers. However, the relationship between Alaska and Delta has grown more adversarial so who knows.

I should also add that a few years back when my Southwest flight out of SFO got cancelled I was able to book a seat on a flight out of OAK with a minimum of effort. I think I had to pay the difference in fare though. When it works, it works, but the go it alone attitude will only take you so far.

Someone below linked this company as an example of resiliency tools used in airfare. Interestingly, there's a testimonial from United on one of the front page videos.

Maybe not 1:1 with the solution/reason you're describing, but it does seem like a possible sign that they're investing in proactive tools.


Yeah delta will do this as well. The app will let you pick any flight that day for free if you do not like the one it auto rebooked you on. I usually just take the one it gives me, but if it tries to route me through DTW or MSP with bad weather in the winter I will try to find one that goes through ATL instead.

My American flight on Monday was delayed (crew rest requirement was the reason given, which actually means not enough crew). It was a connecting flight so once it was about 90 min delayed it wouldn’t be possible to make the other leg.

AA live chat said they couldn’t help (other than rebooking 2 days later which would be pointless) but when I called they were able to drop the connection inbound and outbound so I could drive to the hub city airport (cost me like 2 hours each way, nbd).

They fixed it in like 3 minutes on the phone but it seems like they have to do it in a super hacky way because I had to check in at the airport which was no fun with long holiday lines.

Your claim: "crew rest requirement was the reason given, which actually means not enough crew" is incorrect.

Consider: when a flight is delayed for a repair (maintenance delay) does that equal "not enough aircraft"?

Carriers can't staff surplus crews any more than they can sit on spare aircraft, both of which are small, very carefully computed quantities.

It is a crew rest requirement delay.

Delta’s irregular ops procedures have always treated me right (in comms and in performance) and their app is pretty decent.

If I’ve had a missed connection en route, I’ll generally land and the app tells me what they’ve already rebooked me onto, but I have a choice of many different alternatives (usually) and can pick among them without cost and without waiting in a call queue.

> Haven't flown Delta or American as much

Delta does this well too. Haven’t heard many good things about American.

Delta offers you a buggy website to rebook for no fee. I changed to a flight that was about $1,000 more expensive a few days ago. (note that the significant price change is an edge case caused by travelling on Christmas being particularly undesirable)

United is the absolute worst. Many of my colleagues are frequent flyers on United. I just do not trust United. They lost it completely when they caused that unnecessary incident where they forcibly removed that doctor from a plane. And then there is United Breaks Guitars, the viral song (and later book) about their "legendary" customer service. I will never fly that airline.

When I was delayed and flying delta they continually rescheduled me until I boarded one of the flights.

It was super convenient even if I was fuming over the multi-hour delay.

probably helps since they don't have assigned seats (afaict)

> It must really help all the employees with routine and consistency even if it’s not optimal.

The Box ( https://www.amazon.com/Box-Shipping-Container-Smaller-Econom... ) has some interesting things to say about this.

In that book, the ordinary logistical setup is that ships tend to cover individual transit routes, which means that a delay affecting one ship doesn't spread through the system. Malcom McLean tries to set up a system of ships that always sail east instead, and it fails very badly, because delays on each individual leg of the (infinitely long!) route accumulate instead of happening and then fading away.
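
To make the accumulation point concrete, here's a toy sketch (my own illustration, not a model from the book). Each leg picks up some delay. With one ship per route the delay stays where it happened; with one ship chained through every leg, whatever the schedule slack doesn't absorb rolls into the next leg. The function names, the delay numbers, and the slack value are all made up.

```python
def independent_delays(leg_delays):
    """Each leg has its own ship, so a delay stays on the leg where it happened."""
    return list(leg_delays)


def chained_delays(leg_delays, slack):
    """One ship sails every leg in order.

    Lateness left over from the previous leg carries into the next one,
    minus whatever the scheduled slack (same units as the delays) absorbs.
    """
    carried = 0
    lateness = []
    for delay in leg_delays:
        carried = max(0, carried - slack) + delay
        lateness.append(carried)
    return lateness


# Hypothetical delays (in hours) picked up on each of five legs.
legs = [30, 0, 0, 60, 0]
print(independent_delays(legs))        # [30, 0, 0, 60, 0]
print(chained_delays(legs, slack=10))  # [30, 20, 10, 60, 50]
print(chained_delays(legs, slack=0))   # [30, 30, 30, 90, 90]
```

With zero slack the chained route never recovers, which is roughly the failure mode described above.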

Optimal isn't the right way to think about it. It's a tradeoff. Hub-and-spoke is usually better at getting you to your destination in less absolute time given the same number of total flights since you can have more frequent "shuttle flights" that travel to the hub, exchange goods/passengers, and shuttle back. Point-to-point on the other hand, is better for minimizing travel time since you go directly to the destination.

I heard it’s optimal for staffing though because you can just have people on standby living near your spokes, which would be important if you’re dealing with cancellations, which can severely cascade with point to point

Pilots don't get into work in the morning, stay there for an entire cycle, and get back home at night.

The planes are scheduled that way, but the people won't stay for the entire plane's cycle.

Yeah that model is likely why when I schedule direct flights months in advance I get my flight rescheduled for multi stop flights, often with absurdly short layovers of 15 minutes (who is going to make that flight?).

It was such a pain that I stopped flying them. I’d buy a ticket and have to babysit it so that a flight from noon to 6pm didn’t morph over several changes into a multi-stop marathon from 8am to 9pm…

I wonder if this event will end up being a demonstration that they aren't sophisticated enough to use their operational model. I would think planning decisions would at least try to account for disruptions and recovery time.

I see lots of people who are at least quite a bit less likely to use them in the future (and they are still in the middle of trying to fix it).

> I see lots of people who are at least quite a bit less likely to use them in the future (and they are still in the middle of trying to fix it).

Meh, everybody always says that. In six months when this is a distant memory… it will be business as usual.

Both can be true. Some people will swear them off. Some won't. Some SWA will win back with steep discounting.

For me (I am affected), this is actually another in a series of recent events that are making me reconsider my preference for SWA. They are no longer a "cheap" airline, routinely more expensive than the other major carriers. Their planes are not nice anymore. I've flown on a few other airlines over the past few years and found their planes to be nicer with more features (like chargers and phone/tablet holders). And now this. The cancellations are one thing, but they totally botched the communication of it, and their practice of delaying flights throughout the day only to cancel half of them after several hours left people stranded.

Will I stop using them? We'll see how they respond, but they may not be my first choice anymore.

I’ll be driving 18 hours rather than risk having my rebooked flight, which is 5 days (soonest available) after the original cancelled one, get cancelled too. Stuff sucks man. $2200 to rent a minivan for two days one way.

I feel bad for you/anyone having to pay extortionate rental car one-way rates because they're so variable/fickle. In future try something like below, before biting the bullet and paying full. But maybe in this circumstance with demand being what it is, there's no way around it.


For the ~75,000 travelers directly impacted, it will probably have some long-term impact in their purchasing decisions.

But for the 329,925,000 other Americans, many of whom have a long history and belief in Southwest’s reputation for customer service and fair policies? They will have forgotten by next week.

Yeah but they're also possibly less the type to go out of their way to hunt for Southwest flights, which don't ever show up on aggregators

I wonder if they'll need to start working with aggregators after this fiasco? (Assuming they start losing their current customer base)

This actually is what prevents me from buying flights from them. The only time I do it is when someone asks for them specifically.

Yeah airline travelers are price conscious and it is a race to the bottom. If they offer some crazy sale or cheaper fares people will book it. Just look at Frontier/Spirit. They consistently get horrible reviews but people deal with it for a $50 flight.

People often try to say “airline travelers are price conscious” as if there are several options in the same price range and travelers will accept any reduction of quality or service to save a nickel (I’m not claiming you’re suggesting this). But in my experience with US domestic flights the options are basically one “cheap” decently tolerable itinerary, a few slightly cheaper itineraries that are like twice as long in total duration, and then a couple of slightly better itineraries with better amenities that literally cost like twice as much or more.

I just laugh at the upsell attempts when you go to check in online: “get priority boarding and 2 inches of legroom for only an extra 50% on top of the ticket price.” I really don’t see much evidence that there was actually a race to the bottom. And I certainly won’t blame consumer preferences when I don’t see any options for slightly better service for slightly more money.

That depends on your city pair.

Boston to Las Vegas, Orlando, or San Francisco, I’ve got a wide variety of choices, 2-4 carriers flying more than that many non-stops per day.

Flying from Des Moines to Presque Isle, Maine, I have only a bunch of 2 and 3 stops on United.

Boston is such a weird airport I'm not sure it's worth bringing up except as perhaps an exception that proves the rule.

BOS has the "advantage" of serving a fairly large population while also not being big enough to be a real hub for anyone[0], while being simultaneously big enough to have service from nearly everyone.

Unlike a lot of airports smaller or serving fewer people than BOS (and some of comparable size), you can get from BOS to a whole mess of hubs.

A few select routes (BOS to SFO as noted) are incredibly well-served because of the volume of lucrative business travel between the two and the fact that a whole mess of airlines already serve both airports.

[0] No, JetBlue doesn't count. Boston is as much a hub for them as CLE[1] was for Continental. I.e. a second class hub at best.

[1] CLE by comparison only really serves Cleveland. Columbus, Dayton, Cinci, Indy, Pittsburgh and probably a few others from a similar radius BOS draws from all have decent(ish) airports. All of those have basically the same problem as CLE or are worse in some way. I've flown through or into and out of all of them.

CVG is still a Delta hub

Just as with ISPs, for many people in the US, true airline choice is not a luxury they have. Depending on their origin and destination, there may be only one airline that flies it, or only one that flies without a ridiculous set of stops or layovers. Even if people want to switch airlines, unless they live near a major airport or have high flexibility on when and where to fly, it's not really practical.

Even when there are choices, the price difference between the choices can be absurdly high. You can't call people "cheap" when they choose the cheapest option, and the other options are 2 or 3 times the cost for service that's only slightly better, maybe.

Yeah, I have 2 reasonably drivable airports that are both served by Delta. It's even the case that I can mostly get a less expensive flight with a good itinerary (airport to hub to destination) or a more expensive flight with a bad itinerary (airport to hub to other hub to destination).

Is Southwest the lone primary carrier for many of their airports?

> Is Southwest the lone primary carrier for many of their airports?

SWA is generally one of the bigger users of any particular airport they use as SWA tends to avoid the "primary tier" hub airports.

SWA examples: Providence or Nashua, not Boston. Houston Hobby--not Intergalactic. Chicago Midway not O'Hare. San Jose rather than San Francisco. etc.

Southwest now flies into O'Hare, but the majority of their flights go to Midway.

Southwest is not an exclusive option in any of the cities they serve.

It also might be fine if they only have to deal with this kind of event once every few years but it lowers their costs substantially the rest of the time. I wouldn't love it as a customer, but who knows.

> but rather serial flights one city to the next

i would imagine that's especially vulnerable to disruption as any delay/issue is magnified throughout the rest of the flights.

point to point was how every airline operated before gas prices and decreasing ticket prices caused airlines to focus on concentrating pax loads. it's not quirky; it's just not cost effective.

SWA can do this bc they operate a single aircraft type (737), have lower opex (they operate closer to spirit than delta internally and they do things like not allowing full GDS access to force leisure travelers to book through their website), and they keep their aircraft around a while. they also have a smaller network than the bigger airlines do, which further lowers opex.

However, SWA isn't truly point to point seeing as how a lot of their traffic flows through Chicago (MDW), Dallas (DAL) and Houston (HOU) and they have huge hangars and service ops out of these locations.

hubs aren't immune to huge cancellation numbers like this. American and United were heavily impacted during the 2021 Winter storm. Had the storm happened during the busiest peak travel season of the year like it did this year, they would have had record cancellations as well.

Same thing happened on July 20, 2016:


at the time, then-CEO now-chairman Gary Kelly said:

> "What's unique is the partial failure, it's never happened," he said. "This isn't a drill you can run."


Delta had a similar outage due to a datacenter fire, grounding all domestic flights. Southwest was uniquely slow in taking days to start up again. And if the way my American Airlines ticket switched my birthdate to January 1st, 2000 is any indication, many airlines still need to modernize.

> many airlines still need to modernize.

Most of the travel industry runs on old software that would horrify a lot of people here, especially those who've never worked for a large, 30+ year old company. When I used to interview a lot of people I made it a point to mention some of the more "interesting" aspects so they'd know what they were getting into.

One example: ever tried to book a flight a year in advance? On a lot (almost all?) of systems you can't, because the underlying date format is "DEC27".

Edit to address a couple comments: logistics are hard and there are plenty of reasons why airlines wouldn't want to support booking that far out. However, the reason you can book a flight 330 days from now but not 360 days from now is almost certainly due to the date format. (I believe the windows used are less than 365 days because it's helpful to be able to have dates in the recent past. I remember seeing documentation for 360, but AA and United seem to be in the 330-340 range on their websites).
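
Here's a rough sketch of why a year-less format caps the booking window. This is purely illustrative: the `resolve_yearless` helper and the 30-day lookback are my own assumptions, not how any real reservation system does it. The idea is that a token like "DEC27" has to be mapped onto a concrete year, and if you also want it to be able to mean a date in the recent past, the furthest bookable date has to be under 365 days minus that lookback.

```python
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def resolve_yearless(token, today, lookback_days=30):
    """Map a year-less token such as 'DEC27' to a concrete date.

    Returns the earliest matching month/day that is no more than
    `lookback_days` (a hypothetical setting) before `today`.
    """
    month = MONTHS.index(token[:3].upper()) + 1
    day = int(token[3:])
    earliest = today - timedelta(days=lookback_days)
    for year in (today.year - 1, today.year, today.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:  # e.g. FEB29 in a non-leap year
            continue
        if candidate >= earliest:
            return candidate
    raise ValueError(f"cannot resolve {token!r}")


today = date(2022, 12, 28)
print(resolve_yearless("DEC27", today))  # 2022-12-27 (yesterday, not next year)
print(resolve_yearless("JAN15", today))  # 2023-01-15
```

With these made-up settings, "DEC27" typed on Dec 28 can only mean yesterday, so a flight 364 days out simply can't be expressed, while one roughly 330 days out can.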

As a fun side thing, I am also a travel agent with access to some of these internal systems on the booking side. The technology is incredibly antiquated. Most of the US runs on a system called SABRE, which is basically an MS-DOS system with a text command line interface and its own language. It's all ASCII text based (and all in uppercase). It's straight out of the 80s. Travel agents need to buy special "errors" insurance to cover any losses caused by fucking it up (a typo could accidentally cancel a ticket and cause the client thousands in losses rebooking it).

They actually have a GUI interface over it now with the ability for power/legacy users to drop into the raw shell, if they wish. From feedback, many of the older agents actually prefer the command line, because it’s muscle memory and an experienced agent can perform routine tasks that would take multiple screens in the UI with one hand in the way we’re comfortable with our text editors.

Granted, the rollout across airlines is probably glacial

Source: I used to work there

I don’t blame them. Modern UX has a huge problem with something as simple as date pickers. Preferring you scroll through 90+ items when a simple textbox would suffice.

SABRE dates from 1960 and is by some reckonings the very first piece of commercial (non-military, non-academic) software in the world.


And it's mainframe / COBOL, not DOS, which post-dates it by about a decade and a half.

It's not even an ASCII text app, but an EBCDIC one. Or was after EBCDIC was defined as a standard, after SABRE itself launched.
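
For anyone who hasn't run into EBCDIC: the same text is just different bytes. A quick sketch using Python's built-in `cp037` codec, which is one common EBCDIC code page; whether SABRE uses that particular code page is my assumption, not something I know.

```python
text = "DEC27"

ascii_bytes = text.encode("ascii")
# cp037 is one EBCDIC code page shipped with Python; picking it here is an assumption.
ebcdic_bytes = text.encode("cp037")

print(ascii_bytes.hex(" "))   # 44 45 43 32 37
print(ebcdic_bytes.hex(" "))  # c4 c5 c3 f2 f7

# Round-trips fine as long as both sides agree on the code page.
print(ebcdic_bytes.decode("cp037"))  # DEC27
```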

That is a tremendously fun fact! The little background things that keep society running. May I never be cursed enough I would ever have to directly work with such a system.

I wonder if it will ever go away.

It'll probably never go away, but just be layered over like civilizations. Eventually our software is probably going to get so complicated that we just build new software to interact with old software to avoid ever fully shutting it down. Like building a fresh highway on the Oregon Trail.

Even if this incident is the proverbial straw that breaks the camel's back, the migration itself would be a multi-year project.

I don't think SABRE has anything to do with Southwest's outage.

I remember using some version of SABRE through CompuServe back in the day. All command line stuff over a dial-up modem, but it was novel and cool to be able to book your own flights with it. It would be very annoying to still be stuck on that interface, though.

What are the reasons preventing flight booking software modernization?

Some brief answers / thoughts:

1) Replacing any software for an airline carries huge risk. They are barely operating ok with the software they have and holding it together with duct tape. Even something you might regard as ancillary, like a baggage handling software system, or flight catering software system, if it goes down, has the potential to disrupt thousands of planes and hundreds of thousands of passengers for days.

It is so significant an issue (to try to change some software, and just one out of many systems that have to talk to each other) that if an airline ever considers doing this, they may actually stop operations for some number of days while they do it rather than risk having operations go wrong. There are some rare examples of airlines doing this to try to change their systems.

2) Related to the above, airline management hates to be embarrassed by something that might work but has the potential to go badly wrong. So they are very conservative when it comes to replacing systems that are working, even if it's painful / much less functional than what they might achieve by a change.

Combine these factors (and many others) and it means that sometimes starting a new airline is simpler than trying to fix an old one...

> sometimes starting a new airline is simpler than trying to fix an old one...

Now there's a truly total rewrite...

Can they gradually start moving flights to a new software?

Sabre has a 10-year deal with GCP to do just that. It's going to take a lot more than 10 years to get the thing off System/360 running in a bunker under Omaha airport though.


The typical answer for old behemoths: it was built because it was necessary to build it, and it won't change until a change is necessary too. Wanting that change is not enough, it has to become an almost mechanical constraint, and usually the constraint gets noticed when it far outweighs the costs (and not just a little). Or is a noticeable threat to the system's existence.

30 years of cumulative complexity in the existing stack, with endless edge-cases and special exceptions

.. and, as we're learning, extremely high penalties if one of those edge cases happens to cause a cascading failure.

GDS - there’s really only 3 centralized stores of real-time flight/hotel/booking information in the entire world (Sabre/SABRE, Amadeus, and Travelport). Almost every American airline uses Sabre (American Airlines is an interesting case: technically, in a legal sense, it does not, but it spun off and sold Sabre in 2000, so a lot of their core systems are forks of each other).

Complexity - Fundamentally you’re looking at logistics software, except unlike packages you’re dealing with people who aside from expected destinations have travel lengths and time-in-air calculated down to the minute. Also unlike a package, a surprise multi-day trip, unexpected multi-leg journey, or one-day delay is not something passengers (and crew members) will accept or be at all ok with. And if any one thing goes wrong there’s going to be cascading failures down the line, so much so that it may break your company’s entire operating workflow (e.g. Southwest), and no software can overcome that kind of organizational gap.

Airlines - There’s not many commercial passenger airlines left in the US, especially that fly nationwide. Good luck trying to convince one of these giant behemoths to move to a non-battle-tested system for core operations, especially when decades-old industry software and practices around that software exist.

Entrenched - Sabre is entrenched in airlines around the world. They don’t just provide the booking services, they do the flight tracking, the ticket handling, the upgrading, the in-flight upgrades, missed connection handling, the flight scheduling algorithms, the pricing algorithms, the pilot and flight attendant time tracking, ground crew management, even the terminal software at each gate. To replace SABRE, you would physically need to rip out and then replace software around the world. And because agents don’t work from an office usually, but at the airport, you’re going to need to conduct trainings and provide support around the entire service area, which for the largest airlines is the entire world.

Scale - A lot of Sabre’s revenue comes from passengers boarded. It depends on the airline, but I believe the average is that each airline pays 10 cents per customer boarded with their software (though with increases in passenger volume each year, it may be less now). Because Sabre is so prevalent, and so many flights use them, they can afford such a price. A company servicing just one regional passenger airline would absolutely not be able to compete on price, at least starting out.

Also— Sabre’s software itself is actually reliable! As a corporation it is slow clunky and bureaucratic, but the actual functionality it provides is stable, battle-tested, can handle any travel edge case you can think of, and fast and efficient for those who know how to use it, while also good enough at day to day operations that it doesn’t take too much time to train new agents on how to use it for routine tasks.

SABRE is ancient technology, but very reliable and at the same time extremely inflexible. Last time I saw it up close in the early 2000s much of the core was still coded in IBM assembler, although over the decades more pieces were slowly being modernized so I have no idea where it is now. Sabre is a horrifically unimaginative company where projects are measured in years and not much ever changes.

I think though Southwest's issues are more on their side.

Yeah building a new GDS today is an exercise in insanity, it's a huge complexity nightmare and switching is probably impossible. I always wondered if AI could eventually improve things, but the existing GDSs are unlikely to care much to try. It's basically a (tri)monopoly you can never break.

Great write-up. This applies to other industries such as dealerships with parts and service and various purchase plans. The software sucks for everyone except finance/accounting and that office is beside the Pres's office and therefore it won't change.

"What are the reasons preventing flight booking software modernization?"

Let's ask the opposite question:

Why do you feel that software necessarily needs to be modernized?

I'm not sure how well SABRE works but I do know how fast and efficient keystroking through a non-GUI interface can be and I don't know why expert mode interfaces should ever be replaced by unsophisticated mousey-mouse-mouse ones.

In context, it seems like modernization simply means that the software knows whether a flight was canceled and therefore can automate the status updates for the airplane crew. Nothing to do directly with UX.

The existing flight booking software that they use (Sabre) can and does do that, but Southwest’s issue is likely their insistence on using a homegrown management system on top of that. Southwest only switched to Sabre in 2021, so it’s likely still being implemented, and their homegrown approach has likely evolved with them from their founding, so it’s not something the company itself is likely organizationally prepared for.

At least, I feel confident in that analysis, since this exact same issue happened to Southwest in 2016, before they were using Sabre. Which would point to a chronic organizational failure


Big software projects inevitably become expensive boondoggles that get everyone fired so nobody wants to do them until they're absolutely necessary.

airline margins are paper thin these days. they weren't back when booking engines and reservation systems were being built. (SABRE was built by and spun off from American Airlines. Galileo was built by United before they merged with Continental.) this plus the absolutely insane business logic that goes into booking engines has made the effort extremely risky.

To be fair, I think allowing flights a year in advance is probably far more complicated than just updating the underlying date format. Even if they were able to solve that problem, airlines probably can't easily operationally plan that far ahead due to so many moving parts, e.g. committing to routes and schedules, planning for staffing that far ahead of time, ever changing government restrictions, fuel price fluctuations, inflation, geopolitical realities, etc. I mean, imagine if they did, and something like COVID comes along again, it would cause far, far more disruption if they had booked out the next few years in advance (we had no idea how long COVID restrictions would last while we were in the heat of it, it's only clearer now in retrospect).

Also speaking as a software engineer myself, it's almost never just a software fix that will magically solve everybody else's problems, that always ends up being just wishful thinking

While this is humorous in that there are limiting assumptions like this baked into the system, I also have to wonder, who needs or even wants to book a flight a year in advance? I dread planning out a flight 4 months in advance and dealing with the almost inevitable cascade of conflicts this introduces of juggling and rescheduling things to make things align correctly. One year makes me cringe.

There are events that I can see purchasing flights well in advance for. I used to go to a conference that was held every other year at the same time, it would have been easy to buy tickets more than a year in advance for that without much concern. Eclipses, certain sporting events, or reservations for activities with a wait list of more than a year could qualify as well. Despite that I am like you and rarely have tickets far in advance of a trip.

Me for:

- Annual conferences or conferences that occur every other year
- Planning family reunions because you need that kind of cat-herding lead time when you have 9 uncles/aunts on just ONE side of the family
- Periods where I have some spare cash I'd like to lock in a getaway with before I spend it or something unexpected like the invasion of the Ukraine drives up fuel costs and overall prices... or a global pandemic hits - would be sweet if I could have rebooked some of my trips for 1-2 years out when the pandemic hit
- Travel for future medical stuff; at one point for 2-3 years I was taking my mom to the Cleveland Clinic every 4 months for periodic checks and it would have been super nice to be able to just book that stuff way in advance and have it all taken care of




I'd bet quite a few people would appreciate that ability

> I also have to wonder, who needs or even wants to book a flight a year in advance?

Major holiday, destination wedding, event known long in advance (e.g. Grandma's 80th does not come as a surprise).

My family plans the yearly family get-together at the yearly family get-together. A year in advance. Except sometimes due to scheduling deconfliction, it's actually 10 or 11 months in advance.. or 13 or 14 months in advance. The exact date floats and sometimes we are planning trips more than a year in advance.

I could see it for major holidays. I spent too much money to fly home this year because I am bad at scheduling. I would consider booking next year's flight during this year's trip just so I know it's knocked out and I don't have to worry about it.

Anyone doing anything abroad for over a year. It is impossible to book a round trip ticket with depart and return dates >= 1 year.

> Most of the travel industry runs on old software that would horrify a lot of people here

If you can see how it works, it horrifies me even more as a traveller, as from outside it just doesn't work a lot of the time.

Also if you just look at the video, we all know how bad these systems are, but are not able to do anything (starting anything new in the airline industry has too much cost).

There's an interesting CCC talk about security in the GDS system, if anyone would like to be suitably horrified.

6 years old so hopefully something has improved...


Likely cannot book that far not because of the underlying date format, but because of Jet A fuel prices, which fluctuate. Airlines typically hedge their near-term purchases with longer-term futures.

Airlines historically have not set their schedules more than a year in advance and it’s not clear they want to.

> "What's unique is the partial failure, it's never happened," he said. "This isn't a drill you can run."

As someone who writes some very thorough unit tests... and has also had to sit through mandatory training... I find "this isn't a drill you can run" to be _very_ wrong.

Southwest is a "discount" airline. They do many things to economize, e.g. no assigned seats, they only fly 737s so they don't need to certify pilots or mechanics on any other types, you can only book with them and not with Expedia etc.

It would not surprise me that their back-office operations are likewise economized and some things are just not done because "they can never happen."

> Southwest is a "discount" airline.

They're also the "friendly airline", they easily have the most personable and friendly staff. I don't know what they do different, but Southwest employees treat me human and all the rest generally treat me like human trash. It's got to be a company culture thing, maybe connected to Southwest not having a first-class section.

Usually I fly with Southwest whenever possible without thinking twice about it, but this outage and the outage last year are forcing me to reconsider. Better to deal with rude people than to have my flight delayed..

Yep, the other airlines are in the business of selling "class" and "status," and it's part of their product differentiation strategy to treat you according to how much you pay.

>I don't know what they do different, but Southwest employees treat me human and all the rest generally treat me like human trash. It's got to be a company culture thing

Definitely. Normal American customer service is to treat you like human trash, so obviously Southwest has decided to do this differently and so probably does things like only hiring friendly people, training them to be friendly and positive even in difficult situations, and checking on them somehow to make sure they're doing this and not just faking it for the interview and training. Other airlines obviously don't, but it's not just airlines, it's everywhere in American customer-facing business these days. Rudeness is just a normal part of American culture now.

> Rudeness is just a normal part of American culture now.

I have a backpack that has a phone holder on the shoulder strap. When you click your phone in it, the camera is visible. I've had many a situation where someone was about to go full New York on me, but noticed the camera staring at them and toned it down. I've never once been recording.

Very interesting. I should look for a backpack like this, so I can be prepared the next time I travel to America.

What kind of backpack do you have?

It's an Orben with a cheap iPhone belt holder clipped into the cargo loop on the strap. The belt holder came with the clear protective case I bought at Wal-Mart. I hung the belt holder on the strap and it's been great for recording short hikes and downtown excursions. But it also seems to have the effect of making people more polite at customer service counters.

> training them to be friendly and positive even in difficult situations

Without having any insight into how Southwest runs things I'd venture to guess that their gate agents and cabin crew have a lot more discretion than their peers at, for example, United. That kind of leeway makes the job much, much easier. United essentially handcuffed their staff leaving few options for dealing with an overbooked flight but to escalate to police involvement.

That's what happened with Dr. Dao in any case. Not long after that shit show, I booked a transcon on American. Turns out it was overbooked, and they were desperate to get people off the plane. The gate agents basically asked how much money it would take to get people to volunteer to take another flight. They got their volunteers, everyone went home happy because American empowered their staff to resolve problems.

A lot of people lay responsibility for Southwest's reputation for customer service at Herb's feet. Personally I found their front line staff to be casual to the point that it bordered on unprofessional (which is something coming from someone allergic to formality), but I'd absolutely believe that much of their positive reputation came from the top. Southwest pilots though, yikes.

With the exception of United I'd say most airlines I've dealt with have been pleasant whether or not I'm flying up front or have status. Yes, even RyanAir. We're in something of a golden age since (with the notable exception of Southwest) American airlines have mostly put their mergers behind them and have mostly shed themselves of CEOs who view employees as adversaries. Take a look at the bad old days of the late 80s through mid 2000s. Smisek. Lorenzo. Parker. Bastards and crooks, all of them. It's difficult to stress just how toxic airline leadership was and how that trickled down for a long time.

Southwest also has pretty good size seats, afaik, relative to other airlines' economy seat size. Plus the whole "you're all the same so we treat you all the same" thing is nice. I do wish there was an international equivalent.

Delta has been more than fine with me. Granted, I'm a Diamond class member now, but even on the way up they were reasonable and provided mostly good service.

United is awful. American isn't much better.

> It would not surprise me that their back-office operations are likewise economized and some things are just not done because "they can never happen."

Meh, it doesn't even have to be "never". It just has to be cost multiplied by frequency is less than the cost to prepare.

If they lose $100m every five years due to a system failure, and it would cost $30m/year to plan for those failures, then it's just cheaper to let it happen.

And I don't mean this in a judgmental, Fight Club-car recall speech kind of way. It's just business reality. At some point every business has to decide that the cost of planning for something is higher than the cost of letting it happen.
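To put numbers on that comparison, here is a rough sketch in Python. The function names are made up for illustration, and it assumes the dollar figure captures the whole loss, so reputational damage is ignored.

```python
def annualized_loss(loss_per_event: float, years_between_events: float) -> float:
    """Expected yearly cost of an incident that recurs roughly every N years."""
    return loss_per_event / years_between_events


def cheaper_to_prepare(loss_per_event: float, years_between_events: float,
                       prep_cost_per_year: float) -> bool:
    """True when yearly preparation costs less than the expected yearly loss."""
    return prep_cost_per_year < annualized_loss(loss_per_event, years_between_events)


# Figures from the comment above, in millions of dollars.
expected = annualized_loss(100, 5)        # 100 / 5 = 20 per year
prepare = cheaper_to_prepare(100, 5, 30)  # 30 per year is more than 20 per year
print(expected, prepare)
```

With these inputs the expected loss is $20m/year against $30m/year of preparation, so under this simple model preparing does not pay for itself.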

What's the value of the reputation risk of a major, very high profile failure?

Sometimes businesses end up on the wrong side of that bet. They see only the costs but not the benefits of preparedness (by the time it fails, there will probably be a different CEO in charge) and make a bad call.

Of course, no argument there. Ideally when you make that kind of decision you take reputational risk into account, as well as, like, is this an existential risk?

The airline industry feels like one where each year it's a different carrier who has some catastrophic scheduling failure. Today, everyone says they're never flying Southwest again. But if you fly semi-regularly then it won't take very long before you don't have any airlines left to fly on.

For people who weren't affected, I doubt very many are even going to remember this. Personally, I remember that this kind of thing has happened recently with other carriers but I couldn't even tell you who.

And people who were affected can mostly be bought off if you need to. Some vouchers & hotel reimbursement and it's just the cost of doing business.

Plus, the airline industry has proven over and over that people are willing to put up with a lot when you have the cheapest prices.

It's different from an industry that's built on reputation and trust. Like, a password manager, the only real thing you're selling is your reputation. Losing trust is a real existential threat. Security costs need to be in the bucket of either "yes, we will do it" or "it's so expensive that if we do it then we don't have a business anyway, so we'll skip it and pray."

Scheduling won't ruin an airline's reputation. Crashing the planes is what ruins an airline. Southwest has only ever had two passenger deaths and one of those was an attempted hijacker beaten to death by other passengers.

Eh, to a first approximation the FAA won't let you crash the planes. It's been 13 years since there was a fatal plane crash on a US passenger airline.

FAA has an important role, but market forces won't tolerate an airline that experiences crashes, even rare ones. Airlines are highly incented to be safe, both by regulation and by the market itself.

How badly did SWA's past high profile failures affect them? I'd say not so very much. Yes, some short term damage, but we're talking about an industry where no one stands out from a quality perspective. Everyone is seen as some degree of bad, much like big ISP's. People are used to rotating between airlines and, while those affected this round may shift away, others will get frustrated with their preferred carrier and rotate to Southwest.

I think reputation impacts in this industry from anything other than crashes don't hold much staying power.

I'm actually quite tempted to buy the SWA dip...

All airlines economize. An airline that doesn't is a bankrupt airline because typical industry margins on flight are razor thin.

Southwest isn't a particularly budget airline compared to modern budget carriers like Spirit and Ryanair, which haven't copied the open boarding policy. I suspect the opportunity to upsell seats / luggage and have distinct classes outweighs the turnaround time costs of assigned seating.

Just as a note: they are about to issue a $458M dividend. They plan to spend $4-4.5B in 2023 on planes. How much are they spending on system modernization?

> How much are they spending on system modernization

A fortune, they only just finished an 8-year migration to Amadeus

It’s funny that you use unit tests as an example of it being possible to run drills for this kind of thing. Unit tests are by their definition not the kind of thing that simulates this kind of failure. Perhaps you have a false sense of security about what you’ve really been testing?

> Unit tests are by their definition not the kind of thing that simulates this kind of failure.

Thorough unit and integration tests include failure modes. Mine include things like "what happens if the OS reports that the storage was unmounted during read/write" (because that was a failure often seen in production with some flaky SAN devices) and "what happens if the server stops responding" (because networks are generally unreliable) and "what happens if invalid (random) data is given" because data corruption is a frequent occurrence for similar reasons.
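As a rough illustration of what testing a failure mode rather than the happy path can look like, here is a minimal sketch using Python's standard `unittest` and `unittest.mock`. The function `read_config` and the paths are hypothetical, not taken from any real system.

```python
import unittest
from unittest import mock


def read_config(path: str, default: str = "") -> str:
    """Hypothetical function under test: fall back to a default on I/O failure."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError:
        # Covers an unmounted volume, a vanished file, a flaky SAN, etc.
        return default


class ReadConfigFailureModes(unittest.TestCase):
    def test_storage_disappears_during_read(self):
        # Simulate the OS reporting an I/O error after the file was opened.
        fake_open = mock.mock_open()
        fake_open.return_value.read.side_effect = OSError(5, "Input/output error")
        with mock.patch("builtins.open", fake_open):
            self.assertEqual(read_config("/mnt/san/app.conf", default="safe"), "safe")

    def test_open_fails_outright(self):
        # Simulate the file not being there at all.
        with mock.patch("builtins.open", side_effect=FileNotFoundError):
            self.assertEqual(read_config("/missing", default="safe"), "safe")
```

Saved as a module, this runs with `python -m unittest`. The same pattern extends to "server stops responding" by patching the network call to raise a timeout.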

> Perhaps you have a false sense of security about what you’ve really been testing?

Perhaps. On the other hand I've seen a _lot_ of other developers test only the happy path and call it a day then spend days/weeks/months debugging failures.

"I find "this isn't a drill you can run" to be _very_ wrong"

As an IT-VP/CIO, the statement of "there's no way to test it" is not acceptable.

Then you are senior enough to know what “then-CEO now-chairman Gary Kelly” really meant was “I haven’t funded our technology team well enough to have resources to test a scenario like this”.

Or "we decided that the cost to plan for this is so high that it's not even worth testing. If it happens then we're fucked anyway and we'll just eat it."

You can drill the initial failure, but not really the cascading events. In something as large as a global airline you are dependent on thousands of third parties' actions and the weather. No simulated drill is going to be sufficient or realistic. The only way to really mitigate or plan for something like this is multiple layers of segregation so that events in one area have less or no impact on others. Then you could drill total failure in various segments.

Testing reveals the presence of bugs, never their absence. With hindsight you can always feel smugly superior in saying “you should have tested for this”, but there’s an infinitude of things you might need to test, and if you haven’t encountered a failure you didn’t test for, you’re probably just lucky.

I’m very certain Chaos Engineering is known in the airline industry

Google runs Disaster Recovery Training annually (DiRT) where security teams are tasked with simulating these “black swan” events. Seems like this practice needs to expand to more industries.

At Facebook we would simulate an entire datacenter disappearing.

When we first started doing it the datacenter would be chosen months in advance so that teams would have plenty of time to ensure their services can run without that specific datacenter.

When I left this year, the datacenter would be randomly chosen on the same day it would be cut off.
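A toy version of that kind of drill, as a sketch only: the datacenter names and capacities are invented, and a real drill cuts live traffic rather than summing numbers. It does show the difference between choosing the victim in advance and choosing it at drill time.

```python
import random

# Hypothetical capacity per datacenter, in requests per second.
CAPACITY = {"dc-east": 400, "dc-west": 350, "dc-central": 300}


def survives_loss(capacity: dict[str, int], lost: str, peak_load: int) -> bool:
    """Can the remaining datacenters carry peak load if `lost` disappears?"""
    remaining = sum(c for name, c in capacity.items() if name != lost)
    return remaining >= peak_load


def run_drill(capacity: dict[str, int], peak_load: int,
              rng: random.Random) -> tuple[str, bool]:
    """Pick the victim at drill time, not months ahead, and report the result."""
    victim = rng.choice(sorted(capacity))
    return victim, survives_loss(capacity, victim, peak_load)


victim, ok = run_drill(CAPACITY, peak_load=600, rng=random.Random())
print(f"cut off {victim}: {'survived' if ok else 'overloaded'}")
```

Because the victim is random, every service has to tolerate the loss of any datacenter, not just the one announced for this year's exercise.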

That's pretty cool and ideal practice for a software firm but in one of the reddit threads they're talking about mass quits/refusal to work of ground crew at Denver because of the weather. I wonder how you could ever prepare for that? Keep a backup, airport scoped, ground crew in the waiting room??

You can't really do hot spares for people without time to gear/train up, and the weather event is so widespread I doubt there's enough spare SWA human capacity across the whole nation even if you had C130s on standby everywhere ready to take workers where they're needed most. From a national security perspective, situations like this are why the Marines exist, right? Ensure a rapid response while the rest of the machine gets moving. I feel bad for everyone involved, those affected and those trying to figure out a solution.

> I wonder how you could ever prepare for that?

Management could consider how pay and performance programs can help ensure business continuity.

HR and MBA xls wizards don't understand how to manage for business longevity.

Do you really think the MBA wizards can't figure out some basic pay issues?

It seems you're discounting just how complex HR can be, especially in the face of exigent circumstances. No amount of bonuses will immediately staff up an entire terminal in the face of a massive snowstorm.

> No amount of bonuses will immediately staff up an entire terminal

That is right and thus I would engage line management to figure out business continuity.

> in the face of a massive snowstorm.

It is winter. The storm was tough but not exceptional for the season of winter. Denver did not report tremendous amounts of snow.

Few if any MBAs can get on the ramp and look a line employee in the eye and lend a hand. The wfh keyboard warriors don't know blue collar and therefore are unable to figure this out. MBAs can figure out ways to game their pay. HR can recommend team building 'fun' and non-revenue standby seats, which have minimal value to those employees flying with school age children. Shareholders should demand senior management unemployment applications.

> The storm was tough but not exceptional for the season of winter.

Sure seems like this was an exceptional storm. Widespread, deep cold reaching into Mexico, snow falling across much of the U.S., Buffalo hit with the most snow in 20 years, records set in multiple locations.

A couple decades of weather variation is not exceptional, though some will disagree with me when their bonus is yearly. I recall it was cold in Texas last winter. Weather is seasonal and varies.

I wonder if the board of directors will compare performance with other airlines.

While they may be able to figure it out, optimizing pay to quality of life at work ratios to ensure long term employee retention and loyalty has certainly not been a priority.

Pay might not really be enough. Maybe management could try to find folks to babysit kids/take care of parents trapped at home, freeing up workers to come fly. They'll certainly fail, but at least they'll understand the plight of their workers.

Strangers paid on the lowest cost contract to leave with your kids?

What's the big deal? You can always make more.


You can certainly handle it better than SWA is handling it. Maybe if they’d actually run a simulation where flights out of one or more cities had to be wholesale canceled, they would be. You can’t fix the situation for everyone, but you can avoid fucking up this badly.

It doesn't seem that farfetched for an airline to run a drill where a given airport is assumed inoperable to see how the system reacts. The expectation shouldn't be the same as the data center failure but you can learn what you aren't doing well enough.
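A tabletop version of that drill could start from something as small as the sketch below. Everything in it is invented for illustration (tail numbers, routes, and the rule that an aircraft is stuck for the rest of the day once one of its legs is cancelled), and it models aircraft only, not crews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    tail: str    # aircraft identifier (made up)
    origin: str
    dest: str


def cancelled_legs(schedule: list[Leg], closed: str) -> list[Leg]:
    """Legs lost when `closed` shuts down, including downstream legs.

    Each aircraft flies its legs in list order. Once one leg is cancelled,
    the aircraft is out of position and its later legs are cancelled too.
    """
    lost: list[Leg] = []
    grounded: set[str] = set()
    for leg in schedule:
        if leg.tail in grounded or closed in (leg.origin, leg.dest):
            grounded.add(leg.tail)
            lost.append(leg)
    return lost


SCHEDULE = [
    Leg("N1", "DAL", "DEN"), Leg("N1", "DEN", "LAS"), Leg("N1", "LAS", "OAK"),
    Leg("N2", "HOU", "MDW"), Leg("N2", "MDW", "BWI"),
]

# Closing DEN costs both DEN legs plus the LAS-OAK leg that never touches DEN.
print(len(cancelled_legs(SCHEDULE, "DEN")))
```

Even this crude model shows the cascade: a leg that never touches the closed airport is still lost because its aircraft never arrives.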

Google is also a trillion dollar company. As others have pointed out, Southwest is a low-cost carrier which most probably doesn't have the luxury of hiring FAANG-level engineers on 500k yearly comp in order to best simulate "black swan" events.

They don't seem to pay competitively with banks, let alone FAANGs, though the benefits and culture are (or were) reputedly fantastic.

Source: Am a local who's been headhunted by them a few times but never got beyond the initial discussion with the headhunter for this reason.

This is probably the best argument for AWS/GCP/Azure even though it is becoming more and more obvious you don't really save that much money.

If you have a black swan event like this and you listened to your solutions architect you will have a disaster recovery plan or even better a multi region setup. Worst case you have highly paid support engineers at the cloud providers who will do everything they can to get you back online.

This does not seem like a hardware failure scenario where the cloud has anything to offer. More like their intricate software/database systems became out of sync with reality and disentangling the mess is a highly manual process.

In this case no, but I was more referring to the 2016 Delta ground stop that was due to their datacenter burning to the ground.

I remember reading about the Delta incident a ways back, here they claim it cost them ~$150 million. https://www.datacenterknowledge.com/archives/2016/09/08/delt...

That's not the article I hoped to find however. I seem to remember there was another article where they hired an investigator/consultant to figure out the price to migrate to the cloud and ensure "this never happens again."

My recollection of that was: their scheduling/ops team is also in the same city (Atlanta GA) as this datacenter, and that team's work was brought to a halt by the datacenter outage. The investigator concluded that Delta would need redundant copies of the ops team or the whole effort of moving the software to the cloud would just be at risk of something happening to the human team all in the same city. That would obviously cost too much money, so Delta decided to skip it.

Regarding the employees, keep in mind that neither SWA (nor any other airline for that matter) have big software engineering departments. It's all outsourced to either generalist bodyshops for custom/peripheral systems (IBM, Accenture) or specialist shops for core (Amadeus, SABRE)

20 years ago I used to work for an airline. Back then Sabre was nothing more than the mainframe in Tulsa. All the APIs did nothing more than perform automated green screen commands. Has anything changed with Amadeus or Sabre? Or is it still mainframes behind the curtain?

There are two parts to the IT: the airline backend and the distribution backend. For the airline backends, Sabre used to build custom single-tenant ones per airline, on mainframes. The SWA one was very old, hence their inability to charge for bags. The Sabre distribution system is common of course, again on mainframes.

Amadeus are eating them up, because their airline backend system is a shared multi-tenant setup, built on commodity hardware. Their distribution system used to be mainframes, but they managed to migrate away in the 2010s.

Sabre is still alive, but only in North America, and Amadeus is slowly chipping away (WN, AC..)

Thank you

Training for these scenarios may help with responding to true black swan events, like Rick Rescorla's WTC evacuation drills ahead of 9/11. But, nitpicking, if you've predicted something will happen, by way of simulating it, it's not a black swan when it does.

There are cost considerations. Business continuity costs money. Finance firms have significant capital and income to keep empty but built-out buildings around airports for business continuity. Which doesn’t even make sense since they can work from home, as proven with covid. Airlines can’t work from home.

Personally one of the basic tenets of my adulthood is realizing how many companies are a hair away from a similar scenario (differing in magnitude from an airline ofc).


Google is made of money, and the reason they are is not because of DiRT. Other industries can't afford the same things that Google can while continuing to be a business.

After canceling 5,400 flights, I can't see how Southwest can afford not to test. Even if they only made $1000 off each flight, that's still $5 million they just lost.

They probably would not have made it this far if they tested for every possible scenario, their margins are razor thin.

Anyway, doesn't even the best testing only catch 40% of bugs or thereabouts? It's not a silver bullet.

It's not any bullet, but spending a week each year to do downtime testing is going to expose more issues than not spending any time on it at all.

Yandex used to run datacenter loss training every week, where they would nullroute one DC and see what breaks, all while handling live traffic.

This does not need Google deep pockets. It needs the motivation and some funding. SWA does not care.

Pull the backup tapes, hand those to DR team, provide bare metal, and start the stopwatch. I participated in this in 1990s across the Mississippi.

That... and Chaos Engineering in general, and also just generally a forward looking team that identifies threats, vulnerabilities, and risks in the future and works backwards to identify potential mitigating controls.

(Look forward, reason backward).

>Seems like this practice needs to expand to more industries.

I think you've mistaken this for something that immediately increases quarterly gains with no regard to long-term strategy.

What's most annoying is that there's plenty of employees on the front line who not only care about testing for this sort of thing, but it actually interests them, they're motivated by it, and they understand the dire reality of what happens – to them, primarily – if the company isn't prepared to handle it.

And you can guess what their managers' response typically is: "We need to focus on OKRs and QBRs and KPIs right now... maybe next quarter"

I'm fully convinced that achieving 'manager status' is directly correlated to cowardice. Companies need top-down decision-making, but those decision-makers need to spend more time on the front line.

> Companies need top-down decision-making, but those decision-makers need to spend more time on the front line.

This is not rewarded so it doesn't happen. Managers are rewarded for line goes up so they only focus on line goes up. If line ever doesn't go up it costs them money (advancement, compensation) even if there's little they could have done to make line go up.

Airlines don't have quarterly gains. They regularly go bankrupt and get picked up again, because the country needs airlines and because they have large union contracts.

See any gains here?


My wife and I spent a few hours this morning dealing with the cancellation of our return flight. Southwest has long been preferable for me in many cases, including my most flown route. Between the headache of this outage and the apparently dismal state of their operations there's a strong chance I never fly with them again.

Our family was in San Jose last week when our Southwest flight was cancelled.

It used to be that my first priority would be to go into the terminal and try to talk to somebody. I figured they were the experts. From what I've read, the staff use an antiquated system that takes you from one airport to another, then they can try to get you from that city to where you want to go. That's why there's so much tapping of keys and why it takes so long.

It's better to present them with a route that you've found on Google Flights or similar. The Southwest first flight out was supposed to be yesterday evening, the day after Christmas. In our case, the only thing we could find before Christmas was getting us from SJC to Seattle via Phoenix on Alaska. We ended up renting a car and driving home to Portland. Things got bad around Eugene - I stopped counting after 40 wrecked cars and semis - and got worse as you got closer to Portland.

Until regulation steps in.

In that case the government should smash Southwest with a billion dollar fine so the cost of not doing this drilling exceeds the cost of doing it.

Asking the government to step in for additional regulation is rarely helpful. For this type of failure, the free market will determine whether processes and tools improve, or whether the status quo is good enough.

What's an airline got to do with the free market? They're an extremely highly regulated business.

Regulation sometimes helps remind the free market that fuckups like this can come with real human costs.

Regulation imposes those costs and more on real humans and rarely heads off fuckups.

Market forces can correct here. Vote with your wallet.

Free market?

I figure most companies are too small for that to be budgeted. Though, it’s possibly a good selling point for cloud if it’s capable of it.

Cloud doesn't solve badly designed processes or poorly written software, which seem to be at play with Southwest. Yes, it can help provide more stable infrastructure and there are some (but by no means all) black swan events that can be mitigated simply by throwing more kit at the problem during a surge but it's no silver bullet.

>"What's unique is the partial failure, it's never happened," he said. "This isn't a drill you can run."

The unspoken part you have to hear there is "... within the economic model of the airline business".

Business continuity gets exponentially more expensive as you chase the blackest of swans: the sheer volume of plan development and maintenance, developing exercises, table-top vs. walkthrough vs. simulation, assumptions about how many different uncorrelated failures you're prepared to deal with at once etc.

I've no doubt you could run an airline to be as resilient as (say) USAF Air Mobility Command, but no-one could afford the tickets.

What's ironic here is that groups like the USAF are constantly pressured to adopt private industry models to be more "economically efficient", while completely ignoring that resiliency is a requirement baked into the high cost. I understand why both take the approaches they do, but it seems everyone holds up private industry, barely running and with no resiliency, as the model above all else, which doesn't make sense in all contexts. Corner cutting is fine in many contexts, especially when you know the side effects of a failure, which may be quite insignificant.

1/1/2000 sounds like a default value when it lost the data or never had it. Even more obvious would be if it threw you to 1/1/70.
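For reference, the 1/1/70 case is just a zero Unix timestamp. A minimal check in Python (the helper name is made up for illustration):

```python
from datetime import datetime, timezone


def from_unix(seconds: int) -> datetime:
    """Interpret a Unix timestamp as a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# A zeroed or never-populated timestamp field lands on the epoch.
print(from_unix(0).date())  # 1970-01-01
```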

On our database systems, we have some date fields for which the default value should never, ever be used and if it is, there is a big problem. All of those dates are set to the dates of well-known natural disasters that happened in the 1800s or earlier.

The thought was that it needs to be something that isn’t believable to a non-technical user seeing it on their computer screen. It turns out that this is not necessarily useful. I listened to a guy talking about some issues with a record; he says “1871? What’s up with that?” And then just moved on as if “well it came out of the computer, must be right” or something.

I think that databases need to have the concept of NaN for dates and time stamps, except that this should be configurable to something like a poop emoji or something like ⁉🆘. It has to be something where your grandma would look at it and confidently say “your computer is broken”

Your database should not allow invalid values to exist. That is what check constraints, foreign keys, NOT NULL constraints, etc. are for.
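A minimal sketch of that idea using SQLite through Python's standard `sqlite3` module. The table, column, and cutoff date are invented for illustration; other databases have their own constraint syntax and error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE booking (
        id INTEGER PRIMARY KEY,
        -- ISO-8601 text; NOT NULL forbids a missing date,
        -- CHECK forbids the 2000-01-01 placeholder and anything earlier
        departs_on TEXT NOT NULL CHECK (departs_on > '2000-01-01')
    )
""")
conn.execute("INSERT INTO booking (departs_on) VALUES ('2022-12-27')")  # accepted


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO booking (departs_on) VALUES ('2000-01-01')"))  # True
print(rejected("INSERT INTO booking (departs_on) VALUES (NULL)"))          # True
```

The trade-off raised in the replies still applies: if the application has no way to say "unknown", a strict constraint can push people toward inventing a plausible-looking date instead.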

If you're such a DBA, the rest of an enterprise will quickly route around you.

That's the attitude that causes these problems. Probably their date was set to 1/1/2000 because the database wouldn't allow them to not set a date.

> I think that databases need to have the concept of NaN for dates and time stamps, except that this should be configurable to something like a poop emoji or something like ⁉

How about just NULL?

Making database columns nullable isn’t a free ride.

In some situations, you are trading one known point of failure for a million unknown ones. Among other problems :)

So have another column adjacent to the date that stores an enum with an error code with a reason for why the date is missing. My point is, this is business logic, doesn't have a universally applicable definition, and should be handled on a case-by-case basis rather than trying to hard-code it into the date type in the DB.
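One way that could look, again as a sketch with SQLite via Python's `sqlite3`. The table name and the reason codes are hypothetical; the point is only that a date is either present or explicitly explained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shipment (
        id INTEGER PRIMARY KEY,
        delivered_on TEXT,
        -- hypothetical reason codes; a real system would define its own
        missing_reason TEXT CHECK (
            missing_reason IN ('not_yet_known', 'lost_in_migration', 'not_applicable')
        ),
        -- exactly one of the two columns must be filled in
        CHECK ((delivered_on IS NULL) <> (missing_reason IS NULL))
    )
""")


def try_insert(delivered_on, missing_reason) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO shipment (delivered_on, missing_reason) VALUES (?, ?)",
            (delivered_on, missing_reason),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("2022-12-27", None))         # a real date, no reason needed
print(try_insert(None, "lost_in_migration"))  # no date, but we know why
print(try_insert(None, None))                 # neither: rejected
```

This keeps the date column an ordinary nullable date and puts the business meaning of "missing" in a column the application controls.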

Default date types, with some columns being nullable, work just fine for my project. I wouldn't want them to be more complicated and force me to consider additional cases, especially if those cases aren't language compatible.

> How about just NULL?

Off topic, but this reminded me of:

Hello, I'm Mr. Null. My Name Makes Me Invisible to Computers


"The server returned an unexpected error."


It's definitely some default (or "null" in a DB) value and that is exactly what OP is insinuating.

Particularly on the mainframe systems that airline reservation systems tend to run on, where the Y2K fix for COBOL in a lot of cases was to simply know from context that certain fields couldn't have been created before 2000, so '00' BCD is simply year 2000.
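A sketch of that kind of windowing fix, written in Python rather than COBOL. The function names and the `pivot` parameter are made up; the default treats every two-digit year as 20yy, which is the "this field can't predate 2000" case described above.

```python
def bcd_to_int(byte: int) -> int:
    """Decode one packed-BCD byte (two decimal digits) to an integer."""
    high, low = byte >> 4, byte & 0x0F
    if not 0 <= byte <= 0xFF or high > 9 or low > 9:
        raise ValueError(f"not a valid BCD byte: {byte!r}")
    return high * 10 + low


def expand_year(yy: int, pivot: int = 100) -> int:
    """Expand a two-digit year with a fixed window.

    Years below `pivot` are read as 20yy, the rest as 19yy.
    pivot=100 means every value is read as 20yy.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"two-digit year expected, got {yy!r}")
    return 2000 + yy if yy < pivot else 1900 + yy


# A zero-filled year byte decodes to 00, which the window turns into 2000.
print(expand_year(bcd_to_int(0x00)))  # 2000
print(expand_year(bcd_to_int(0x22)))  # 2022
```

Under this scheme a blank or zeroed year field is indistinguishable from a genuine year 2000, which fits the default-value reading of the 1/1/2000 date.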

> Delta had a similar outage due to a datacenter fire

They only have one geo-located data-centre?

"this isn't a drill you can run". And yet, Netflix has chaos Kong do it with regularity.

The difference between what's true, what some people will buy, and what you can get away with saying is gross, y'all.

> "this isn't a drill you can run". And yet, Netflix has chaos Kong do it with regularity.

Netflix and airlines are so different as to make this comparison laughable. The cost of setup and consequence of problems actually being found (ie Federal Regulations) that are not addressable (it's not like SWA didn't know about some of the eventualities), easily outclasses the need for testing every combination of situations. Kong doesn't run anything that has to do with weather turning jet fuel into sludge or 12x pre-staffing in case of massive computer failures along with assessing the possible legal consequences from each locale. The hubris of pretending that physical services on a national scale, is as deterministic as a complex automated system, is unsurprising from a certain crowd, I guess.

Airplanes aren't 21st century move fast and break things software, please keep them that way!

Modernization of equipment, hiring more pilots and other employees, investing in updating the code base - how can that be done? It's far more important to keep the stock price high by whatever means necessary, such as using government bailouts to buy back shares.

Investment capitalism is really a garbage system when it comes to building and maintaining basic infrastructure like transportation, electricity grids, roads and so on. China has demonstrated that convincingly over the past two decades, hasn't it?

> "This isn't a drill you can run."

When characterized as something that can't be done instead of something they don't know how to do, you know exactly where they are on the Dunning–Kruger curve.

