Scaling to Count Billions

tibbar · 2024-04-30T17:43:04

The most interesting thing here is probably that they sometimes change their mind as to how to aggregate billing events, and used to have to manually fix the database if they changed their mind. The system they've arrived at seems to mostly handle recalculations, but they're making a pretty big assumption that you can always aggregate data on daily intervals. The day will come when they need to aggregate data over periods that are not a single day, and then they'll have to figure out what happens if they've already billed for some prior usage events when they go to re-run their automated pipeline...

dax77 · 2024-04-30T17:48:16

Yeah, usage rating and billing systems get wildly complex, especially once you start considering edge cases: beyond leap years and time zones, there are things like a billable event spanning two days (a run that starts a few minutes before midnight), etc.
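To make the midnight case concrete, here is a minimal sketch of splitting one usage interval into per-day durations. It assumes days are cut at UTC midnight and that timestamps are timezone-aware; `split_by_day` is a hypothetical helper, not anything from the article.

```python
from datetime import date, datetime, time, timedelta, timezone


def split_by_day(start: datetime, end: datetime) -> dict[date, float]:
    """Return {UTC date: seconds used} for the interval [start, end)."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    if end < start:
        raise ValueError("end is before start")

    cur = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    seconds_per_day: dict[date, float] = {}

    while cur < end:
        # The next UTC midnight after `cur` is the day boundary.
        next_midnight = datetime.combine(
            cur.date() + timedelta(days=1), time.min, tzinfo=timezone.utc
        )
        chunk_end = min(end, next_midnight)
        day = cur.date()
        seconds_per_day[day] = (
            seconds_per_day.get(day, 0.0) + (chunk_end - cur).total_seconds()
        )
        cur = chunk_end

    return seconds_per_day


# A run that starts three minutes before midnight and ends two minutes after.
run = split_by_day(
    datetime(2024, 4, 30, 23, 57, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 0, 2, tzinfo=timezone.utc),
)
```

Billing per customer-local day instead of UTC would change where the boundary falls, which is exactly where the time zone edge cases come in.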

nyrikki · 2024-04-30T18:33:42

The same mistake we as an industry are doomed to repeat.

An RDBMS is like a Leatherman: it can do lots of things, but the only thing it does well is being a set of needle-nose pliers.

Relational databases are great for lots of people doing lots of transactions on small sets of data.

Everything else is a compromise.

It is one of those tools that is completely rational to use when small, but it is worth putting in boundaries from the beginning.

I have never regretted a persistence layer abstraction, but I have been held hostage by not having one.

willsmith72 · 2024-04-30T19:27:39

To verify that, you first have to define "small". I've seen a large Postgres DB handling what I would call a large amount of data, and in a separate situation seen it scaled horizontally through sharding to an even larger scale.

Premature abstractions cause pain

nyrikki · 2024-05-01T00:06:30

Your product depending on OLAP is a good indicator; SQL has always been bad at analysis. But by 'small' I was talking about code complexity and maintainability, not DB size.

"Premature abstractions" is typically about avoiding being too DRY, not about preventing yourself from marrying, for the remainder of your product's life, a DBMS that you haven't even dated yet.

Note what they called out on their blog.

> MySQL RDS does not horizontally scale through, for example, partitioning by itself. Therefore we doubled the RDS instance size every time we required more storage.

> Maintaining such a RDS required significantly more engineering efforts than we expected. For example, the database was initially shared with other critical features and any downtime could cause severe impacts to those functionalities.

> Our next step was to move the rest of the data to DynamoDB, which would require rewriting the majority of the code. After evaluating the pros and cons, we decided not to proceed in this direction.

Whenever you have a vendor, no matter if the product is free or ...Oracle, de-risking future unknowns with an abstraction is something to consider. You don't know what your future needs will be, what that external entity will do, or if the product fits your needs.

You can choose to kick the can down the road and pray, or you can spend the extra 5 minutes of a red-green-refactor cycle to do something as simple as dependency inversion, persistence ignorance, or whatever gives you a two-way door.

Typically, abstracting your DBMS has early payoffs that are worth it even without the above, like not needing to instantiate a database to test business logic.
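A minimal sketch of what that can look like, assuming a toy usage-billing domain. `UsageStore`, `InMemoryUsageStore`, and `invoice_cents` are hypothetical names for illustration, not anything from the blog post:

```python
from typing import Protocol


class UsageStore(Protocol):
    """The only persistence surface the business logic is allowed to see."""

    def add(self, customer: str, units: int) -> None: ...

    def total(self, customer: str) -> int: ...


class InMemoryUsageStore:
    """Test double; a MySQL- or DynamoDB-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._totals = {}

    def add(self, customer: str, units: int) -> None:
        self._totals[customer] = self._totals.get(customer, 0) + units

    def total(self, customer: str) -> int:
        return self._totals.get(customer, 0)


def invoice_cents(store: UsageStore, customer: str, cents_per_unit: int) -> int:
    """Business logic depends on the protocol, not on a concrete database."""
    return store.total(customer) * cents_per_unit


# Exercising the billing rule without instantiating any database.
store = InMemoryUsageStore()
store.add("acme", 3)
store.add("acme", 2)
amount = invoice_cents(store, "acme", cents_per_unit=7)
```

Swapping the backing store then means writing one new class that satisfies `UsageStore`, rather than rewriting every call site.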

The issues with RDBMS barriers to exit and conversion costs have been known for a very very long time.

As an example:

NBS Special Publication 500-84 (1981) https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nbsspecialpublic...

> 4.5.4 Portability. The final important cost of a DBMS is its lack of portability. Only a few commercially available systems run on more than one kind of equipment, and agencies that want to change hardware may find that they are unable to do so without also changing their DBMS. One of the managers we interviewed decided to continue with the same DBMS in order to avoid conversion costs for existing applications. As a result his hardware procurement was limited to a major vendor and some smaller companies that sell "plug compatible" computers. Any agency that uses a DBMS extensively must recognize that its investment in application programs represents a commitment to a particular DBMS and, in many cases, to a particular hardware vendor.

difufnfbcos · 2024-05-01T03:28:22

Did they profile a standalone MySQL instance? It seems like a natural next step. Their big-O reasoning seemed a bit too theoretical to justify anything.

derekperkins · 2024-05-08T01:05:58

This. We're doing billions of daily usage aggregations in MySQL with generally less than 5s delay

kwillets · 2024-04-30T18:07:25

I'm guessing the next stage will be to keep the raw data and aggregate dynamically. We went through a similar progression with ad audiences -- maintaining pre-aggregated data sketches and summaries was such a PITA that we likely saved money by moving to a larger raw-facts database.
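A toy sketch of the "keep raw facts, aggregate at query time" idea, with made-up events and hypothetical helper names (`aggregate`, `by_day`, `by_month`). Changing the aggregation rule is then just a different bucket function over the same rows, with no backfill of stored summaries:

```python
from collections import defaultdict
from datetime import datetime

# Raw usage facts: (customer, timestamp, units). Illustrative data only.
RAW_EVENTS = [
    ("acme", datetime(2024, 4, 30, 23, 58), 5),
    ("acme", datetime(2024, 5, 1, 0, 3), 7),
    ("acme", datetime(2024, 5, 1, 12, 0), 1),
]


def aggregate(events, bucket):
    """Sum units per (customer, bucket(timestamp)) directly from raw events."""
    totals = defaultdict(int)
    for customer, ts, units in events:
        totals[(customer, bucket(ts))] += units
    return dict(totals)


def by_day(ts: datetime) -> str:
    return ts.date().isoformat()


def by_month(ts: datetime) -> str:
    return ts.strftime("%Y-%m")


daily = aggregate(RAW_EVENTS, by_day)
monthly = aggregate(RAW_EVENTS, by_month)
```

At real volumes the scan would be pushed down into the database rather than done in application code; the point here is only that nothing pre-aggregated has to be repaired when the rule changes.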

> OLAP databases are not good at serving large volumes of requests with low latency within milliseconds.

Just use Vertica lol.

lima · 2024-04-30T18:14:51

Use Clickhouse - it's specifically built for this use case (low latency OLAP).

derekperkins · 2024-05-08T01:06:41

It also has materialized views built in so you don't have to run dbt manually

kwillets · 2024-04-30T20:20:38

Yes, I almost added that -- there are several alternatives, but it takes some testing and tuning to be certain.

j-cheong · 2024-04-30T19:42:25

Kind of unrelated but I'm curious if there are any really robust usage-based billing solutions out there. Curious how they're architected to solve usage-based billing across their customers/various use cases.

I'm always concerned about automating the billing process and risking accuracy/trust.

dax77 · 2024-04-30T20:21:28

Zuora is a well-known Enterprise-scale commercial option that displaced many others as “SaaS” took off several years ago (and its associated accounting standards).

Depending on complexity, Netsuite can address some moderate-scale use cases. Stripe, Chargebee, etc. address more of the SMB-scale needs.

derekperkins · 2024-05-08T01:03:52

Avoid Chargebee at all costs. I've never made an architectural decision I regretted more.