The most interesting thing here is probably that they sometimes change their mind as to how to aggregate billing events, and used to have to manually fix the database if they changed their mind. The system they've arrived at seems to mostly handle recalculations, but they're making a pretty big assumption that you can always aggregate data on daily intervals. The day will come when they need to aggregate data over periods that are not a single day, and then they'll have to figure out what happens if they've already billed for some prior usage events when they go to re-run their automated pipeline...
Yeah, usage rating and billing systems get wildly complex, especially once you start considering edge cases: beyond leap years and time zones, there are things like a billable event spanning days (a run that starts a few minutes before midnight), etc.
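The midnight-spanning case is easy to underestimate. A minimal sketch of the splitting step (hypothetical helper, naive datetimes, time zones and DST deliberately ignored):

```python
from datetime import datetime, timedelta

def split_across_days(start: datetime, end: datetime):
    """Split one billable interval into per-day chunks so a run
    that crosses midnight gets attributed to both days."""
    chunks = []
    cursor = start
    while cursor < end:
        # Midnight at the start of the following day
        next_day = (cursor + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0)
        chunk_end = min(next_day, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks

# A run from 23:55 to 00:10 yields two chunks, one per day
chunks = split_across_days(
    datetime(2024, 3, 1, 23, 55),
    datetime(2024, 3, 2, 0, 10),
)
```

Real systems also have to decide which day "owns" the revenue for each chunk, which is where the aggregation-policy changes mentioned above start to hurt.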
To verify that, you first have to define "small". I've seen a single large Postgres DB handling what I would call a large amount of data, and in a separate situation I've seen it scaled horizontally through sharding to an even larger scale.
When your product depends on OLAP, that's a good indicator; SQL has always been bad at analysis. But by 'small' I was talking about code complexity and maintainability, not DB size.
"Premature abstractions" is typically about avoiding being too DRY, not about preventing yourself from marrying a DBMS you haven't even dated yet for the remainder of your product's life.
Note what they called out on their blog.
> MySQL RDS does not horizontally scale through, for example, partitioning by itself. Therefore we doubled the RDS instance size every time we required more storage.
> Maintaining such a RDS required significantly more engineering efforts than we expected. For example, the database was initially shared with other critical features and any downtime could cause severe impacts to those functionalities.
> Our next step was to move the rest of the data to DynamoDB, which would require rewriting the majority of the code. After evaluating the pros and cons, we decided not to proceed in this direction.
Whenever you have a vendor, no matter whether the product is free or ...Oracle, de-risking future unknowns with an abstraction is something to consider. You don't know what your future needs will be, what that external entity will do, or whether the product fits your needs.
You can choose to kick the can down the road and pray, or you can spend the extra 5 minutes of a red-green-refactor cycle to do something as simple as dependency inversion, persistence ignorance, or whatever gives you a two-way door.
Typically abstracting your DBMS has early payoffs that are worth it even without the above, like not needing to instantiate a database to test your business logic.
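To make that concrete, a minimal sketch of the dependency-inversion idea (all names here are illustrative, not from the article):

```python
from typing import Protocol

class UsageStore(Protocol):
    """Port: business logic only sees this interface, never
    MySQL, DynamoDB, or any other specific DBMS."""
    def events_for(self, account_id: str) -> list[float]: ...

def monthly_total(store: UsageStore, account_id: str) -> float:
    # Pure business logic; testable without any database running
    return sum(store.events_for(account_id))

class InMemoryStore:
    """Test double. In production you'd swap in a MySQL- or
    Dynamo-backed adapter without touching monthly_total."""
    def __init__(self, data: dict[str, list[float]]):
        self.data = data
    def events_for(self, account_id: str) -> list[float]:
        return self.data.get(account_id, [])

store = InMemoryStore({"acct-1": [1.5, 2.0, 0.5]})
total = monthly_total(store, "acct-1")  # 4.0
```

The point isn't the pattern's name; it's that the rewrite cost when you change vendors shrinks to writing one new adapter.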
The issues with RDBMS barriers to exit and conversion costs have been known for a very very long time.
> 4.5.4 Portability. The final important cost of a DBMS is its lack of portability. Only a few commercially available systems run on more than one kind of equipment, and agencies that want to change hardware may find that they are unable to do so without also changing their DBMS. One of the managers we interviewed decided to continue with the same DBMS in order to avoid conversion costs for existing applications. As a result his hardware procurement was limited to a major vendor and some smaller companies that sell "plug compatible" computers. Any agency that uses a DBMS extensively must recognize that its investment in application programs represents a commitment to a particular DBMS and, in many cases, to a particular hardware vendor.
Did they profile a standalone MySQL instance? It seems like a natural next step. Their big-O reasoning seemed a bit too theoretical to justify anything.
I'm guessing the next stage will be to keep the raw data and aggregate dynamically. We went through a similar progression with ad audiences -- maintaining pre-aggregated data sketches and summaries was such a PITA that we likely saved money by moving to a larger raw-facts database.
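A toy sketch of what "keep the raw data and aggregate dynamically" buys you (hypothetical data and helper, not from the article): the aggregation period becomes a parameter of the query instead of a commitment baked into the stored summaries.

```python
from collections import defaultdict
from datetime import date

# Raw usage facts: (account, day, units). Kept un-aggregated so a
# change in aggregation policy only means re-running the query,
# not patching pre-computed summaries in the database.
raw_events = [
    ("acct-1", date(2024, 3, 1), 10),
    ("acct-1", date(2024, 3, 1), 5),
    ("acct-1", date(2024, 3, 2), 7),
]

def aggregate(events, period_key):
    """Roll up raw facts over any period (daily, monthly, ...)
    chosen at query time rather than at ingest time."""
    totals = defaultdict(int)
    for account, day, units in events:
        totals[(account, period_key(day))] += units
    return dict(totals)

daily = aggregate(raw_events, lambda d: d)                    # per-day
monthly = aggregate(raw_events, lambda d: (d.year, d.month))  # per-month
```

The trade-off, of course, is paying the scan cost on every query, which is exactly the point where the larger raw-facts database mentioned above starts to earn its keep.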
> OLAP databases are not good at serving large volumes of requests with low latency within milliseconds.
Kind of unrelated, but I'm curious whether there are any really robust usage-based billing solutions out there, and how they're architected to handle usage-based billing across their customers' various use cases.
I'm always concerned about automating the billing process and risking accuracy/trust.
Zuora is a well-known enterprise-scale commercial option that displaced many others as "SaaS" (and its associated accounting standards) took off several years ago.
Depending on complexity, Netsuite can address some moderate-scale use cases. Stripe, Chargebee, etc. address more of the SMB-scale needs.