File a support ticket. Wait. Watch the "SLA" tick by. Finally get a meaningless response back that asks basic questions covered by the initial ticket. Repeat the answers to those questions. Get back suggestions that show no knowledge or understanding of the system being "supported". Attempt to seek clarity from the support agent, get asked "when are you available for a meeting?". This doesn't require a meeting, but send availability anyways. Get a meeting invite from Azure for a meeting ~2 femtoseconds prior to the meeting. Get asked things already covered in the support ticket, again. Try to make out the representative in what is clearly a jam-packed call center. They'll escalate the ticket to an engineer, great. Weeks go by, days turn into years. You settle down, you get married, start a family, watch your children grow, forget all about Azure until one day: "We haven't heard back from you, so we'll be closing the ticket."
Bad account isolation seems to be a habit at Azure. I'd guess any customer of theirs is fine with this. Maybe they would not express this sentiment out loud while any lawyers could be listening, but it's implied.
Considering how terribly Teams handles multiple accounts, I've lost faith in Microsoft Authentication in general. Let's just pray GitHub Auth doesn't get absorbed.
Uh, then you're not instrumenting your service correctly.
You should be collecting metrics on the basics of how your service operates. In a steady state at scale, even a ½% drop in messages should be readily noticeable and likely monitored.
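A minimal sketch of what that kind of check might look like, assuming hypothetical enqueue/dequeue counters and no particular metrics backend:

```python
# Hypothetical sketch: compare producer-side and consumer-side message counts
# over the same window and flag anything worse than a 0.5% shortfall.
# Counter names and thresholds are made up; wire in whatever metrics
# pipeline you actually use.

def message_shortfall_exceeded(enqueued: int, dequeued: int,
                               threshold: float = 0.005) -> bool:
    """Return True if dequeues fall short of enqueues by more than
    `threshold` (0.5% by default) over the same window."""
    if enqueued == 0:
        return False  # nothing sent, nothing to compare
    shortfall = (enqueued - dequeued) / enqueued
    return shortfall > threshold

# Example: 1,000,000 in, 993,000 out -> 0.7% shortfall, check fires.
if message_shortfall_exceeded(enqueued=1_000_000, dequeued=993_000):
    print("ALERT: message shortfall exceeds 0.5% over this window")
```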
That doesn't make any sense technically and sounds a lot like victim blaming.
It is far from certain that any application has such a "steady state"; most of the ones I've worked on sure don't. There are obviously ways to analyze things and correlate enqueues with dequeues, but it is far from as simple and black-and-white as you suggest, especially with truly distributed systems and an unknown cause for the reported behavior.
Heck, we don't even know if the messages are being "dropped" or just duplicated.
Receive a page. Look at the monitor: the AWS service appears down. Check the status page: all green. Double check the logs, check the configs. They seem correct. It's been 20 minutes, refresh the status page. Green. A suspicious shade, too. File a support ticket. Wait. "Request ID, or it didn't happen." Find the relevant code paths. Log the request ID. Redeploy to production. Trigger another instance of the issue. Check the logs. Fish out the now-logged Request ID. Respond to support. Wait. Check the status page for giggles: ever green. "Okay, we've escalated this to an engineer." Excellent. "Can you upgrade to the latest version of the service?"
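For reference, the "log the request ID" step is roughly this with boto3; the bucket, key, and logger setup are placeholders, not anyone's actual code:

```python
# Rough sketch: surface the AWS request ID on both success and failure so
# support has something to chase. Bucket/key names are placeholders.
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger("s3-fetch")
s3 = boto3.client("s3")

try:
    resp = s3.get_object(Bucket="example-bucket", Key="example-key")
    # Every boto3 response carries ResponseMetadata with the request ID.
    logger.info("GetObject ok, RequestId=%s",
                resp["ResponseMetadata"]["RequestId"])
except ClientError as err:
    meta = err.response.get("ResponseMetadata", {})
    # For S3, the HostId (x-amz-id-2) is also worth handing to support.
    logger.error("GetObject failed: %s, RequestId=%s, HostId=%s",
                 err, meta.get("RequestId"), meta.get("HostId"))
    raise
```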
---
To be fair, I find I have to contact AWS support far less often, and honestly, if you do have a request ID in hand … they're far more receptive. But boy, if you don't have that ID, it doesn't matter if you're seeing 2+ minute latency from S3 within AWS just to fetch a 1 KiB blob; it isn't happening.
And the status page lies, but lying on the status page appears to have become industry SOP.