
All of these large companies seem to have (correctly) realized that 95% of tech support cases are trivial issues that can be resolved via automated responses.

The problem is that they then treat every case as if it were one of those 95%, in order to close the 95% as quickly as possible, which probably looks good on whatever metrics they're tracking.

But if you're one of the 5% you're fucked.

If there's anyone out there designing tech support procedures, you should add an "is this a 5% problem?" question to whatever checklist you give to support staff.



> If there's anyone out there designing tech support procedures, you should add an "is this a 5% problem?" question to whatever checklist you give to support staff.

When I was the engineer customer service escalated to, I was damn sure to thank them every time they escalated something. Even the one guy who escalated all the things I'd roll my eyes about in private. At least he was making sure the escalation path worked.

Someone who has taken the time to report an issue is probably one of hundreds or thousands who had the same issue; the rest didn't think it could be fixed and shrugged it off. We certainly can't fix everything, but weird network shit like this can be fixed, and it's worth escalating, because when you get it fixed, you can also (hopefully) figure out how to monitor for it, so it doesn't happen again.

OTOH, I didn't work for the phone company. "We don't care, we don't have to, we're the phone company." https://vimeo.com/355556831 (sorry about the quality, I guess internet video was pretty lowdef in the 70s :P)


Ex-phone-company here. (Is this the party to whom I am speaking?) I was in installation, but hung out with a lot of the ops crew, and they LOVED interesting problems. The trouble was getting such problems to the ops people in the first place. Good people, bad process.

The most memorable one:

Customer service had been getting calls all morning with a peculiar complaint: A customer's phone would ring, and when they answered, the party on the other end didn't seem to hear them. They seemed to be talking to _someone_, but not the party they were connected to. Eventually they hung up. Sometimes, a customer would place a call, and be on the other end of the same situation -- whoever answered would say hello, but the two parties didn't seem to be talking to each other. Off into the void. They'd try again, and it would work, usually, but repeats weren't uncommon.

So everyone's looking at system logs and status alarms and stuff, and what else changed? There were two new racks of echo-cancellers placed in service last night, could that cause this? Not by any obvious means, I mean e-cans are symmetrical and they were all tested ahead of time. There was a fiber cut out by the railroad but everything switched over to the protect side of the ring OK, didn't it? Let's check on that. Everyone's checking into whatever hunch they can synthesize, and turning up bupkus.

Finally around lunchtime, one of the techs bursts into the ops center, going "TIM! I GOT ONE I GOT ONE IT'S HAPPENING TO ME, PATH ME! okay look I don't know if you can hear me, but please don't hang up, I work for the phone company and we've got a problem with the network and I need you to stick on the line for a few minutes while we diagnose this. I know I'm not who you expected to be talking to, and if you're saying anything right now, someone else might be hearing it, but that's why this is so weird and why it's so important YEAH IT CAME INTO MY PERSONAL LINE and that's why it's so important that you don't hang up okay? I really appreciate it, just hang out for a few, we'll get this figured out..."

Office chairs whiz up to terminals and in moments, they've looked up his DN and resolved it to a call path display, including all the ephemera that would be forgotten when the call disconnects. Sure enough, it's going over one of the new e-cans. Okay, that's a smoking gun!

So they place the whole set of new equipment, two whole racks of 672 channels each, out-of-service. What happens when you do that is the calls-in-process remain up, but new calls aren't established across the OOS element. Then you watch as those standing calls run their course and disconnect, and finally when the count is zero, you can work on it. (If you're doing work during the overnight maintenance window, you're allowed to forcibly terminate calls that don't wrap up after a few minutes, but that's verboten for daytime work. A single long ragchew is the bane of many a network tech!) The second rack was empty of calls in _seconds_, and everyone quickly pieced together what that implied -- every single call that had been thus routed was one of these problem calls where people hang up very quickly. This thing had been frustrating hundreds of callers a minute, all morning.

With the focus thus narrowed, the investigation proceeded furiously. Finally someone pulls up the individual crossconnects in the DACS (a sort of automated patch panel, not entirely unlike VLANs) where the switch itself is connected to the echo-cancellation equipment. And there it is. (It's been too long since I spoke TL1 so I won't attempt to fake a message here, but it goes something like this:) Circuit 1-1 transmit is connected to circuit 29-1 receive, 29-1 transmit isn't connected to anything at all. 1-2 transmit to 29-2 receive, 29-2 transmit to 1-1 receive. Alright, we've got our lopsided connection, and we can fix it, but how did it happen in the first place?

If all those lines had been hand-entered, the tech would've used 2-way crossconnects, which by their nature are symmetrical. A 2-way is logically equivalent to a pair of 1-ways though, and apparently this was built by a script which found it easier to think in 1-ways. Furthermore, for a reason I don't remember the specifics of, it was using some sort of automatic "first available" numbering. There'd been a hiccup early on in the process, where one of the entries failed, but the script didn't trap it and proceeded merrily along. From that point on, the "next available" was off by one, in one direction.
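
(Purely as an illustration, not the actual provisioning script: here's a minimal Python sketch of how one untrapped failure in a "first available" allocator produces exactly that kind of lopsided, off-by-one wiring. The shelf/channel naming is made up.)

    # Forward 1-ways are indexed directly; reverse 1-ways grab the "first
    # available" receive port. One entry fails early, the error isn't trapped,
    # and every later reverse connect lands one channel low.
    def provision(num_channels, fail_on=1):
        forward = {}   # "1-n TX"  -> "29-n RX"
        reverse = {}   # "29-n TX" -> "1-m RX", m = first available
        rx_used = set()
        for ch in range(1, num_channels + 1):
            forward[f"1-{ch} TX"] = f"29-{ch} RX"
            m = next(i for i in range(1, num_channels + 1) if i not in rx_used)
            if ch == fail_on:
                continue          # the entry fails; the script never notices
            reverse[f"29-{ch} TX"] = f"1-{m} RX"
            rx_used.add(m)
        return forward, reverse

    fwd, rev = provision(4)
    print(rev)
    # {'29-2 TX': '1-1 RX', '29-3 TX': '1-2 RX', '29-4 TX': '1-3 RX'}
    # 29-1 TX ends up connected to nothing, and everything after it is off by
    # one in one direction -- the lopsided audio paths from the story.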

Rebuilding it was super simple, but this time they did it all by hand, and double-checked it. Then force-routed a few test calls over it, just to be sure. And in a very rare move, placed it back into service during the day. Because, you see, without those racks of hastily-installed hardware, the network was bumping up against capacity limits, and customers were getting "all circuits busy" instead. (Apparently minutes had just gotten cheaper or something, and customers quickly took advantage of it!)


Amazing story! My step dad worked night shift at AT&T back in the 80’s and ran the 5ESS. He took my brother and me in for a tour one night. Thinking back on it now, it was a lean crew for the equipment they were running. Rows and rows and rows of equipment. I don’t remember closed cabinets, mostly open frames moderately populated. I’ll never forget he showed us some magnetic core memory that was still mounted up on a frame in the switch room. Huuuge battery backup floor as well.

He loved all of that stuff, and absolutely hated when everything went to computers. He quit and became a maintenance man at a nursing home, then a commercial laundry repair guy, and finally retired this year in his late 70’s (due to Covid) after working maintenance at a local jail.


That's super cool!

I believe the #5 ESS machine itself is always in closed cabinets, so it's likely that what you're remembering was the toll/transport equipment, or ancillary frames. Gray 23-inch racks as far as the eye can see!

Depending on how old that part of the office was, they were likely either 14' or 11'6" tall with rolling ladders in the aisles, or 7' tall and the only place they'd have laddertrack was in front of the main distributing frame.

As for magnetic core, if you could see it mounted in a frame, what you probably saw was a remreed switching grid, which is a sort of magnetic core with reed-relay contacts at each bit, so writing a bit pattern into it establishes a connection path through a crosspoint matrix. It's not used as storage but as a switching peripheral that requires no power to hold up its connections. (Contrast with crossbar, which relaxes as soon as the solenoids de-energize.)

Remreed was used in the #1 ESS (and the #1A, I believe), and is extensively documented in BSTJ volume 55: https://archive.org/details/bstj-archives?&and[]=year%3A%221...


You’re definitely on to something. This image from Wikipedia for the #1 ESS fits very well into my fuzzy memory, especially those protruding card chassis:

https://upload.wikimedia.org/wikipedia/commons/7/7e/1A_Frame...

I just remember thinking it looked awkward getting to the equipment under them.

I don’t know if the ‘5E’ as he called it was actually in operation yet; he ended up moving us all out of state to take a job developing and delivering training material for it...I think that’s what finally broke him lol. Hands on kinda dude.

I’ll have to hit him up later today to see if he remembers ‘remreed’ (he will). Thanks for the info!


Yup, the #1 used computerized control, but all the switching was still electromechanical, so it sounded like a typewriter factory, especially during busy-hour.

At night, traffic was often low enough that you could hear individual call setups and teardowns, each a cascade of relay actuations rippling from one part of the floor to another. The junctor relays in particular were oddly hefty and made a solid clack, twice per call setup if I recall correctly, once to prove the path by swinging it over to a test point of some sort, and then again to actually connect it through. On rare occasion, you'd hear a triple-clack as the first path tested bad, an alternate was set up and tested good, and then connected through.

Moments after such a triple-clack, one of the teleprinters would spring to life, spitting out a trouble ticket indicating the failed circuit.

The #5, on the other hand, was completely electronic, time-division switching in the core. The only clicks were the individual line relays responsible for ringing and talk battery, and these were almost silent in comparison. You couldn't learn anything about the health of the machine by just standing in the middle of it and listening, and anyone in possession of a relay contact burnishing tool will tell you in no uncertain terms that the #5 has no soul.


YES! He worked third and we were there all night. He pointed those sounds out to us, it was so cool.


There's a telco museum in Seattle called the Connections Museum. It has working panel, #1 crossbar, and #5 crossbar switches, and a #3ESS they are working on getting running again.

https://www.telcomhistory.org/connections-museum-seattle/


Great story.

> including all the ephemera that would be forgotten when the call disconnects

Interesting to know there is information which is not logged. I’m guessing keeping this info, even for a day, would have helped isolate the issue?

How did the echo cancellers pass testing?


They passed testing because they had each been individually crossconnected to a test trunk, and test calls force-routed over that trunk. Then to place them in service, the crossconnects were reconfigured to place them at their normal location in the system. The testing was to prove the voice path of each DSP card, and that those cards were wired into the crossconnect properly.

All that was true; the failure happened when it was being taken out of the testing config and into the operational config. Either nobody considered that that portion could fail, or the urgency to add capacity to a suddenly-overloaded network meant that some corners were cut. (Marketing moves faster than purchasing-engineering-installation...)

Oh, and as to the point about keeping the call path ephemera. Yeah probably, but in a server context, that'd be akin to logging the loader and MMU details of how every process executable is loaded and mapped. Sure, it might help you narrow down a failing SDRAM chip, but the other 99.99999% of the time when that's not the problem, it's just an extra deluge of data.


Were the cross-connected circuits channelized or individual voice calls (DS0? I can’t remember from my WAN days) or something else?


As I recall, the cross-connects were done at the DS1 level, and an individual card handled 24 calls. These are hazy, hazy memories now; this took place in 2004-ish.


Nice!! Thanks for the walk down memory lane, this was cool.


Not OP, but it sounds like the echo cancellers were fine; the interconnect to the switch was misconfigured. Rather than sending both channels of audio to opposite ends of the same call, one channel got directed to the next call.

The funny thing is that if everyone played along they could have had a mean game of telephone going.


This feels like the kind of anecdote I'd overhear my paint-covered neighbor Tom telling my dad when I was 10, and my dad would be making a racket over it, really doubled over. I'd always be like, "what's so funny about that?" But you get older and you realize not many people tell _actually_ interesting stories, so I guess you do what you can to make them want to come around and tell more.


Cool story, thanks! Perhaps you can solve a mystery phone hiccup that happened to me a few years ago? I called a friend (mobile to mobile if it matters) and, from memory, about 20 minutes into this call I get disconnected, _but_ I instantly end up on a call with an elderly stranger instead, who seemed pretty irritated she was now on the phone with me. I was surprised enough that she hung up before I could form a coherent sentence to explain what had happened, so I've no idea if she was trying to ring someone, if the same thing happened to her, or if she'd dialed my number by accident. From what I remember, it seemed like she was already mid-conversation as well, though.


Thank you for sharing. And for helping the phones just work, so we can complain so much when they don't :)

Have you heard any of Evan Doorbell's telephone tapes[1]? It's a series of recordings mostly from the 1970s, but with much more recent narration, exploring and sort-of documenting the various phone systems from the outside in. Might be interesting to see what they figured out, and what they didn't :)

[1] http://www.evan-doorbell.com/


This is a super cool story, thank you so much for taking the time to type this out!


What a fascinating read! Gotta error-check my scripts.


A very similar problem is happening currently in India with Jio. I wonder if anyone from there has seen this.


Cool story, thanks for sharing!


Anecdote from my past:

I used to play Counter-Strike/StarCraft in my middle school years. I pretty much figured out I had consistent packet loss with a simple ping test. I was on the phone with Time Warner every other day for months. They kept sending the regular technicians, at one point ripping out all of my cable wires in the house to see if it fixed the problem. Nothing worked, I kept calling, and at this point I had the direct number to level 3 support. They saw the packet loss too. Finally, after two or three months they send out a Chief Engineer. The guy says he’ll look at the ‘box’ on one of the cable poles down the block. He confirmed something was wrong at that source for the whole area. Then it finally got fixed.
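
(For anyone curious, the "simple ping test" part is easy to reproduce. Here's a rough Python sketch that pings a host once a second and tracks the loss percentage; it assumes a Unix-style ping with -c/-W flags, so adjust for your OS.)

    import subprocess
    import time

    HOST = "8.8.8.8"   # any reliably reachable address works
    sent = lost = 0
    while True:
        sent += 1
        # one echo request, one-second timeout; nonzero exit means no reply
        r = subprocess.run(["ping", "-c", "1", "-W", "1", HOST],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if r.returncode != 0:
            lost += 1
        print(f"sent={sent} lost={lost} loss={100 * lost / sent:.1f}%")
        time.sleep(1)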

Took forever dealing with level 1 support, and lots of karenesque ‘can I talk to your supervisor please’, but that’s literally what it took.

So yeah, if you want stuff like this fixed, stay professional, never ever curse, consistently ask to speak to the supervisor, keep records, and keep calling.

Small shout out to the old http://www.dslreports.com/ for being a great support community during the early days of broadband for consumer activism in terms of making sure you got legit good broadband.


I had similar issues for a long time. Dealing with my ISP’s support was really frustrating. Not once did a 2nd or 3rd line technician get in touch with me to acknowledge that they had done any kind of investigation and analysis of the intermittent issues that I kept experiencing. The ISP did send out a guy who replaced the optical transceiver on my end, but to me it just felt like a wild guess and not really something that they did because they had any specific reason to believe that the transceiver was actually faulty. It didn’t help.

I ended up just cancelling the service and signing up for one of their competitors instead.


The real problem is that it shouldn't take this much bullshit and rigmarole from the very beginning.


I had a ginormous AT&T router/modem (Pace 5268ac) with a set of static IP addresses, and a few times AT&T just stopped routing traffic to it.

It had happened before and then magically fixed itself a few days later.

One time I had a week of outage, and AT&T basically said the problem was on my side. They could ping the modem, and then punted. I had several truck rolls. The techs were really nice guys, but they were basically cabling guys, better at finding a bad cable than debugging packet loss. The problem for me was that my IPv4 static IP addresses would not receive traffic.

I was at wit's end after a week, so I debugged the thing myself. By looking at EVERY bit of data on the router, I found mention of the blocked packets in the firewall log. I would clear all the logs, and found that even with the firewall DISABLED, the firewall log would see and block incoming packets I was sending using my neighbor's Comcast connection.
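
(If you ever need to demonstrate this kind of thing to an ISP, the outside-in test is simple to script. A hedged Python sketch, with a placeholder address and a port you know is listening behind the static IP; run it from a connection on a different network:)

    import socket

    STATIC_IP = "203.0.113.10"   # placeholder (TEST-NET), not a real assignment
    PORT = 22                    # any service known to be listening

    try:
        with socket.create_connection((STATIC_IP, PORT), timeout=5):
            print("reachable from outside")
    except OSError as exc:       # covers timeouts and refusals
        print(f"not reachable from outside: {exc}")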

I called AT&T, but this time mentioning "firewall is completely off, but packets are blocked by the router and showing up in the log" was concrete enough for them to look up a (known) solution.

The fix was to disable the firewall, but to enable stealth mode. wtf?

To be clear, this was a firmware bug, and caused dozens of calls to AT&T, lots of heartache and finger pointing always in my direction.

I should also mention that at the start of this fiasco, I checked the system log and noticed they had pushed a firmware update to the modem at the time the problem started. Strangely, after one call to the agent, that specific line disappeared out of the log file, but other log entries remained. hmmm.

Since then, they basically screw up my modem every month or two - they push new firmware and new "features" appear (like the one that sniffs and categorizes application traffic like "youtube" and "github"). It also helpfully turns wifi BACK ON when I had disabled it. I immediately go turn it back off, and then they immediately send me a big warning email that my DSL settings have been changed.


The Pace 5268ac is the worst ISP-provided router I've ever had, and I've been an Xfinity/Comcast customer, and I've even had a connection in Wyoming. I detailed my experience with it in a review of a third-party router, and found numerous issues along the way [0]. My favorite is that DMZ+ mode, which is what they offer instead of a traditional DMZ mode, just has some weird MTU issue that leads git and other services to break horribly when running behind a third-party router. The solution? Don't use DMZ+ mode. Instead, put the router into NAT mode, and then port forward all of the ports to one private IP address. Bonkers. This is sold as an official-looking solution on the AT&T website for a "speed issue." [1]

This is all because AT&T believes that the edge of their network is not the PON ONT/OLT, but rather, the router they issue you. If you want to be on their network, you have to use their router as some part of the chain.

My latest discovery is that in doing this, the router can actually get super hot operating at gigabit speeds for extended periods of time. When this happens, it magically starts dropping packets. Solution? Aim a fan at the router so it has "thermal management."

Total. Garbage. I'd switch to Spectrum if they had decent upload speed, but alas, they don't in my building.

[0]: https://particle17.xyz/posts/amplifi-alien-thoughts/#appendi...

[1]: https://www.att.com/support/article/u-verse-high-speed-inter...


If you have some time, you can MITM the 802.1x auth packets [1] and use a less crappy router. I run this with a VyOS router and the same 5268ac that you have, but it works with things like Ubiquiti routers too. The only catch is you need three NICs on your router, but a cheap USB 10/100 one will do for the port that connects to the 5268ac.

Another option is getting the 802.1x certificate out of a hacked router, but it's not possible as far as I know on the 5268ac. You could buy a hackable ATT router but they're not cheap. Some sellers even sell the key by itself.

Mysteriously, doing this fixed an issue I previously had where SSHing into AWS would fail.

[1] https://github.com/jaysoffian/eap_proxy


There's also one for pfsense, which is what I used before I dumped my cert out of my router

https://github.com/MonkWho/pfatt


Huge bummer, but the next generation of AT&T routers with onboard ONT doesn't work with this bypass :(


Do you know the model numbers and/or have any other information about these new routers?

I'm currently using eap_proxy with my BGW210, and it's been a huge improvement, but I fear the day the device needs to be replaced with a newer model.


BGW320 is the new model, which I had installed about a month ago. It isn't a simple swap, as it uses a SFP module combined with the modem's internal ONT instead of a separate ONT, so I've heard it's only used in new installations. More about it: https://www.dslreports.com/forum/r32605799-BGW320-505-new-ga... (although theirs says 1550nm while mine says 1310nm)

However, it has 5Gbit Ethernet, hasn't re-enabled WiFi on automatic firmware updates, and has only screwed with my IP Passthrough configs once which was resolved with a router reboot. (that was possibly my router's fault, it seemed like it was unable to fetch a new DHCP lease)


Apparently you can extract the 802.1x key from the router and then use your own router, and someone even has a script to MITM the connection between the router and ONT.


It is absurd that AT&T requires the use of a rented gateway for U-verse. I've never had an issue with another provider refusing to support off-the-shelf hardware before, including with a DSL provider, multiple cable companies, and FiOS (Ethernet on ONT).


They like customer data


AT&T sucks; Congress should be requesting their leaders in for a congressional hearing along with Zuckerberg lol


And conversely, it's always surprising and disarming when you call a company and actually get through to a knowledgeable employee. I was so surprised to hear “that's a firmware bug we know about and there is no update yet” about my router issue that I forgot to be mad at the company for not caring that my router is broken.


I had a problem with a PowerMac G3 back in the day and I somehow managed to escalate up to tier 3 which is to say an Apple HW engineer. He was brusque bordering on rude, but he immediately recognized it was a problem with an undocumented jumper setting on the motherboard and solved my problem inside of two minutes. It definitely increased my customer satisfaction.


FWIW, this has the hallmarks of an interaction within the context of an abusive interpersonal relationship.


A few years back, I called up our local newspaper to start a subscription. Called the number on the website, and a real human person answered the phone. I was so surprised that it wasn't at least an initial phone tree that I actually stumbled and had to apologize and explain myself.


That’s why you need to sign up for CSA Pre™. Get preauthorized for instant escalation on customer support calls.

All you have to do is answer a form with questions like: Do you know how to plug in a computer? Do you know where the power switches are on your devices? etc.

CSA Pre™ is valid for 5 years; you can initiate the renewal process up to 6 months before expiry.


You were joking, but I really wish you weren't and this service existed!


And it’s a steal at 99.90/month with a 24-month contract!


At this point we unironically need shibboleet.

https://xkcd.com/806/


Story time!

I've had this exact experience, except that it wasn't a dream.

Back in 2010 I had a weird issue where my cable connection would sometimes be completely blocked right after the DHCP response (we had dynamic IPs back then). This would go on for a couple of hours, until the IP lease expired, then my connection would come back. Luckily, I was running an OpenBSD box as my router, which allowed me to diagnose the problem. But it was also impossible to explain to the servicedesk employees.

One evening it happened again, and I called the servicedesk, totally prepared to do the 'yes I have turned it off and on again' dance. But to my surprise, the employee I got on the phone was very knowledgeable and even said that it was very cool that I had an OpenBSD box as a router. He very quickly diagnosed that someone in my neighbourhood was 'hammering' the DHCP service by not releasing his lease (a common trick to keep your IP address somewhat static). This caused a duplicate IP on the subnet, and the L2 switch blocked traffic to my port.

He asked me "do you know how to spoof your MAC with an OpenBSD box?". Then I knew this guy was legit. He instructed me to replace the last 2 bytes of the MAC with AB:BA (named after the music group). They had a separate DHCP pool for MAC addresses in that range. If they ever saw an ABBA mac address on their network, they knew it was someone who had connectivity issues before.

The problem was immediately solved, and I had a rock-solid internet connection for years, with a static IP!

I ended up chatting about networking and OpenBSD a bit, before I (as humbly as I could) told the guy I was a bit flabbergasted that someone as knowledgeable as him was working on the servicedesk.

It turned out he was the chief of network operations at the ISP (the biggest ISP in my country). He was just manning the phone while some of his colleagues from the servicedesk were having dinner.

Sometimes miracles do happen.


Many "phone robot" systems can be overridden by mashing on the keyboard, shouting or swearing - these will get redirected to a live human ASAP.


Andrew’s & Arnold (UK-based ISP) actually are compliant with XKCD 806! https://www.aa.net.uk/broadband/why-choose-aaisp/


Way back in the day I worked on equipment that straddled telco circuits. T1s, E1s, DS3s, OC whatever. Companies paying big money for those circuits.

Anyway I was told on more than one occasion by different telcos that the standard operating procedure for many techs was to take the call, do nothing, and call back 20 minutes later and ask if it looked better because ... often enough it did.


When I started my job as an IT director 10 years ago, I was in the customer support room and there was one guy who was known for solving all the hard problems. I was standing behind him when he was taking a call. He patiently listened to the client, then loudly typed random stuff on his keyboard for twenty seconds or so, making sure the client could hear the frantic typing, sighed, and then asked “Is it better now?”

It always was.


The problem is that the idiots they hire to do their "technical" support have zero skill (and no motivation to learn) to assess whether it's a 5% problem or not, and the majority of end-users aren't capable of answering that question either, nor are they incentivized to answer truthfully.

The solution could be a priority support tier where you pay upfront for an hour of a real network engineer's time (decently compensated so that he actually cares about solving the problem) and the charge is refunded only if the problem indeed ends up being on the ISP's side. This should self-regulate as anyone wasting the engineers' time for a simple problem they could resolve themselves would pay for that time.


I realize this was written from a position of frustration (which I share) at getting run around by customer support, but I'd reconsider the blanket characterization of tech support staff as "idiots": they're doing a high-throughput job following a playbook they're given, with no incentive to break the rules to provide better customer service to people with 5% problems. As you identify, it's probably less about personal motivation than the expectations that are set for how they perform their job.


+1. The real idiots here are AT&T mid-to-upper management, who set up this process and apparently have zero monitoring for packet loss/bit flips, so they've had an outage for weeks now. Support techs have neither the training nor the tooling to debug this issue.


I cope with level 1 support by remembering we have a common goal: to stop talking to each other as quickly as possible. The tech just wants to close the case and I want to talk to someone else who can actually help me.


That’s a great example of finding how you’re aligned and then using it to get a mutually beneficial outcome.


Good point. The managers would probably give negative feedback to people that took more time on their calls in order to try to help the customer better.


Who decides if it's the ISP's problem? What if it's "Both"?

About 20 years ago I got escalated to high-tier Comcast support for an issue that turned out to be a little of A, a little of B: Comcast (might have been @Home, based on the timing) required that your MAC address be whitelisted as part of their onboarding process. Early home routers had a "MAC Address Clone" feature for precisely this reason. At some point, the leased network card got returned to Comcast. Our router continued to work just fine... until about a year later, when the local office put that network card into some other customer's home. We started getting random disconnects in the middle of the day, and it took forever to diagnose, as the other customer was not particularly active with their internet use. Whose fault was it? Ours, for MAC spoofing? Theirs, for requiring the spoofing?


How did you even manage to figure this out? I’ve never gotten anyone on the phone who could possibly help in a situation like this.


Wireshark and escalation to a competent tech. I believe they saw weird traffic from their DHCP server, and we were able to attach an Ethernet hub (not a switch, a 10BASE-T hub that repeated the signal on each port) along with a laptop that was running Ethereal (before the name changed! How long ago that was now) and see the ARP packets fighting.


> I believe they saw weird traffic from their DHCP server, ...

That makes sense.

When the cable modem issued a DHCP request, the CMTS would have been configured to insert some additional information (a "circuit-id") into the DHCP request as it relayed it to the DHCP server.

The short version is that the "competent tech" looked at the logs from the DHCP server, which would have showed that the "same" cable modem (i.e., MAC address) was physically connected to either 1) two different CMTS boxes or 2) two different interfaces of the same CMTS.

How would one cable modem be physically present in two different locations at the same time? Obviously, it wouldn't.

At that point, either 1) there are two cable modems with the same burned in address or 2) one of the two cable modems is cloning/spoofing its MAC address. Which one of those is more likely?

(If you're interested in the details, try "DHCP Option 82" as your search term.)
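
(A toy illustration of that check, with a made-up log format: pair each modem MAC with the Option 82 circuit-id the relay inserted, and flag any MAC that shows up behind more than one circuit. All field values here are invented.)

    from collections import defaultdict

    # (mac, circuit-id) pairs as a DHCP server log might record them
    lease_log = [
        ("00:40:8c:12:34:56", "cmts1/cable3/0"),
        ("00:40:8c:12:34:56", "cmts1/cable7/0"),   # same MAC, different circuit
        ("00:a0:cc:aa:bb:cc", "cmts2/cable1/0"),
    ]

    seen = defaultdict(set)
    for mac, circuit_id in lease_log:
        seen[mac].add(circuit_id)

    for mac, circuits in sorted(seen.items()):
        if len(circuits) > 1:
            print(f"{mac} appears on multiple circuits: {sorted(circuits)}")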


To be fair, the vast majority of technical support calls likely require only customer service skills rather than technical skills. Anyone with technical chops probably wouldn't last long in that environment.


A friend of mine used to work tech support, and said that from her perspective, a lot of the effort was trying to counter-outsmart customers who thought they were too smart.

For instance: "This might just be a loose cable, can you unplug each end of it, and plug it back in?" invariably elicited an "I've already done that", or a brief pause followed by "okay, there, I did it, do you see it?". Lies, often.

But: "Alright, I want you to try turning the cable around for me, yeah swap it end for end. Sometimes you get a connector that just doesn't fit quite right, but it works the other way around and it's faster than sending a tech to replace it", would often get a startled "Oh! I hadn't thought of that, one moment..." and then the customer actually DOES unplug the thing, and what do you know, they click it in properly this time.


I worked for a call center and there was a girl there I didn't think was good at her job.

Then one day, she tells me that she tells people to unplug the power cord, cup the end in their hand, and then plug it back in.

Suddenly, I really liked her. It's a genius move that makes them think they've done something obscure, but she really just wanted them to actually check the cable.


I worked computer support once, and just as I was hired, they went from (IIRC) 6 weeks of training to only 2 weeks. Nobody came through that knowing any more than they started with in regard to fixing computers.

Luckily, I already knew how to fix them. I found the job to be a cake walk and quite liked helping people. But I had to listen to people around me fumble through it.

It was frustrating at the time, but my favorite thing that happened was that I was admonished twice for having average call times that were too low. To them, that's a warning that someone is just getting people off the line without fixing their problems.

They monitored calls and said I'd never received a complaint, but the system would keep flagging me for low call times, so I had to artificially raise them. They suggested that I have a conversation with the clients.

I didn't, and I didn't stay there much longer, but it was quite a crazy situation. But I also felt much less pressure to handle calls quickly after that, too, which was nice.


A huge percentage of companies have their help desks and customer support outsourced to TCS or Cognizant or Accenture, etc.

Everything they do is driven by metrics, and their contracts are written around KPIs like maintaining a ludicrously short average time to answer, short handle times, and open/closed ticket ratios. If they do not hit these metrics, then the outsourcing company owes service credits. The incentives do not align with making customers happy and solving their problems. Everything is geared towards deflecting users with self-help options, simple scripts for the agents, and walking the line of hitting those metrics with the fewest number of warm bodies.

It's a pretty hellish business.


This blows my mind, because why would a customer come back when they're treated like that? Oh right, because they don't have anywhere else to go.

For the last several years, I've gotten my mobile service through an MVNO named Ting. On the rare occasion that I've needed to call support, there's no IVR, just a human who typically answers on the second ring. They speak native English, and have never failed to solve my problem either immediately or with one prompt escalation.

They're so jarringly competent I wonder how they still exist, if being obnoxiously incompetent is apparently a business requirement.


> If there's anyone out there designing tech support procedures, you should add an "is this a 5% problem?" question to whatever checklist you give to support staff.

... only if you want people to actually be able to escalate. Suspect AT&T are going to lose a lot less money over this issue than hiring an extra high-quality support person would cost.


Many companies have done the math, and realized they make more money if they just let the 5% customers leave.


It's even worse; the customers have no real choice of anyone else to leave _to_.


100% you’re on point here, but there’s one problem that happens when they do add “is this a 5% problem”: eventually (and I’d say pretty quickly), the public gets wind of “if I say the right things to get marked as a 5% problem, I get an automatic escalation to someone who knows more”, and suddenly you get a big chunk of level 1 calls into the upper tiers.

For evidence, see: “Retention departments give you the best rates if you threaten to cancel” (which then caused companies to have to rename the retention departments and change the policies).


Obligatory XKCD reference: https://xkcd.com/806/




