Traditional monitoring tools like logs and metrics were necessary but not sufficient for debugging how and where systems failed in CI, which relies on multiple interconnected critical systems (e.g., GHE, Checkpoint, Cypress).
In this talk, Frank Chen shares how traces gave us a critical and compounding capability to better understand where, when, how, and why faults occur for our customers in CI. We describe how shared tooling for high-dimensionality event traces (using SlackTrace and SpanEvents) can significantly increase our velocity in diagnosing code in flight and debugging complex system interactions. The talk moves from stories of early incidents that motivated further investment across Slack's internal tooling teams to stories of performance and resiliency gains throughout our infrastructure.
Relatedly, are the databases and storage (e.g., S3) populated with fake data before the test runs?
At my current job we have a couple of separate services, but we haven't built out e2e tests.
- new server
- new database copy (anonymized, on the same DB server)
- shared S3 (dev)
- shared microservices (dev)
For other services we mostly just have dev/stage/prod, so it's easier.
You also have the Observability podcasts.
It seems you're not at Plaid anymore, but if you don't mind, there's something I've been trying to figure out for a while, and I'd appreciate your thoughts.
It has to do with the way Plaid works. It seems to me that when financial institutions accept liability for account security, such 'security guarantees' tend to be contingent on the account holder _never disclosing credentials_. Given how Plaid works, Plaid users could be taking on considerable risk.
I'm wondering whether my understanding is right: do account holders forfeit the protection 'guarantees' their financial institution might offer when they use Plaid? Do financial institutions consider Plaid an 'authorized user' in some special way? Quite possibly I'm missing something, and I'm hoping to understand. Thanks!
Basically, your right to be protected against unauthorized transfers comes from Regulation E (Reg E), which gives consumers the right to dispute unauthorized transactions from their accounts. Under Reg E, once a consumer properly notifies their financial institution of an unauthorized transaction within a specific amount of time, the financial institution is obligated to limit the consumer's liability for that transaction.
If you provide proper notice to your bank under Reg E, your bank cannot waive its liability, even if you shared account information with a third party. The Consumer Financial Protection Bureau (CFPB) made this explicit in a recently published Compliance Aid. Here is the relevant section of their FAQ:
"Q: If a financial institution’s agreement with a consumer includes a provision that modifies or waives certain protections granted by Regulation E, such as waiving Regulation E liability protections if a consumer has shared account information with a third party, can the institution rely on its agreement when determining whether the electronic fund transfer was unauthorized and whether related liability protections apply?
A: No. The Electronic Fund Transfer Act (EFTA) includes an anti-waiver provision stating that “[n]o writing or other agreement between a consumer and any other person may contain any provision which constitutes a waiver of any right conferred or cause of action created by [EFTA].” 15 U.S.C. § 1693l. Although there may be circumstances where a consumer has provided actual authority to a third party under Regulation E according to 12 CFR § 1005.2(m), an agreement cannot restrict a consumer’s rights beyond what is provided in the law, and any contract or agreement attempting to do so is a violation of EFTA."
Plaid makes transactions on your behalf, so it's arguable whether they count as unauthorized transactions.
To take a similar real-life case: when your wife empties your bank account, you may not be able to get your money back, because that doesn't constitute an unauthorized transaction. Nobody will flag it or alert you, either, because the bank doesn't consider it abnormal.
But so much infrastructure is a mishmash of in-house, open-source, and closed-source software. Even for the open-source parts, you probably don't want to maintain patches just to integrate your tracing tool. So you end up with big gaps in your traces, which makes them less useful.
It's often underrated how much an otherwise competent team can crumble under technical debt (CI and test flakiness being one example).
It sounds plausible to me that Slack's rapid growth put them in the perfect position to accrue massive amounts of debt that they are now struggling to wade through.
Before you have product-market fit, you should be willing to accumulate large amounts of tech debt, both because your product might change enough that you'd be throwing out large amounts of code anyway, and because product-market fit is such an overwhelming need that almost everything should be traded against it: your startup is dead without it.
Once you have it, you have to be much more diligent about paying down tech debt. You switch from an "exploratory" mode to a "refinement" mode: you're no longer likely to throw away large swaths of code at the drop of a hat, and instead need to build iteratively on what you already have, which becomes very difficult with large amounts of tech debt.
I honestly think the glacial pace of software engineering is mostly due to this phenomenon: teams ask "where can we save a little bit of time today?", and it snowballs until entire teams aren't getting anything done. The interest on the loan they took out exceeds their income. I guess that's why they call it technical debt. (Does that make Scrum Masters technical loan sharks?)
Developers who don't grasp that intuitively learn it fast once the wrong kind of updates start drawing the wrong kind of questions.
Tail wagging the dog.
Yes, but it also requires management that understands and values what they have in that team.
Their support is great and responsive, though (I've reported every issue I've encountered).
Meanwhile, Slack has so many settings to toggle that it seems easier to just uninstall the app for a week.