Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: How to design database schema for usage based billing?
73 points by punkpeye on Aug 31, 2022 | hide | past | favorite | 36 comments
The service I am building is usage-based, i.e.

  * I am charging per second the service was used, e.g. USD 0.01/second
  * I would like to give credits upon sign-up.
  * There is no monthly minimum.
At the moment, I am thinking that I can achieve this with just a few tables:

  * `billing_account (id, billing_rate_id, running_balance)`
  * `billing_rate (id, rate)`
  * `account_credit (id, billing_account_id, timestamp, amount, function_execution_id)`
  * `account_debit (id, billing_account_id, timestamp, amount, source [platform_credit or stripe])`
When a user creates an account, I would create an entry in `billing_account` and associate whatever the current `billing_rate`. I would also create an `account_debit` entry with `source=platform_credit` and update `billing_account` `running_balance` value to reflect their balance after the `platform_credit`.

Then whenever they run a function that costs them, I create `account_credit` account with an entry equal to the amount they spent and update `billing_account` `running_balance` value.

When they top-up their account, I would just add entry to `account_debit` and update the balance again.

This appears to cover all my use cases, but I wanted to check with anyone who's designed such database schemas before.




I've written billing/usage systems before, something your missing is the audit trail, you will get a query on a bill by a customer (for example, why did my bill increase this month so much, or spike on Wednesday?) and you need to be able to dig into the raw data (eg, your raw usage logs that contain high detail of start/stop events or usage information for each service call) to provide comfort to yourself and your customer about the billing query and validate there was no billing error.

Since you will want to plan for that, one thing you can do is treat the end of month billing calculation (or ongoing usage per day) as a calculation based off your raw usage logs.

So your usage of the product (service usage start/stop times) are log events (eg, pushed to S3 if its a huge system or start with just another database table like billing_usage_event) that describes each service start/stop and calculated rate etc.

Then your bill is the aggregation of the raw events for the day/month, allowing you to both do an audit if there is a billing query (find out why the bill spike occurred by looking at billing_usage_event) and also provide peace of mind to you and your customer the billing is accurate.

("Accounting for Developers" was posted recently on HN, it was a great read - https://news.ycombinator.com/item?id=32495724)


Alternately, you could model it the other way ‘round: mutable tables, like OP has, but with change-data-capture (Debezium et al) attached to them, exporting to the logs-at-rest as a secondary format. Probably easier to pick up / add on, if you’re not already modelling anything else with a CQRS/ES approach. (Though just doing CQRS/ES from the start is always going to get you much cleaner code with many more guarantees that can be made about it.)


I know it feels like overkill, but I’d recommend either outsourcing (even Stripe has usage-based billing if you can keep your updates under 100/s) or using something off-the-shelf like https://killbill.io.

I know it seems like a simple problem, but billing systems quickly turn into Eldridge horrors, even if you have the best intentions, and it’s always nicer to have someone else to maintain that for you.


> Eldridge horrors

Just FYI, I think you probably meant "eldritch horrors":

https://www.merriam-webster.com/dictionary/eldritch


Stupid auto-correct.


Yeah, I almost didn't say anything - it looked autocorrect-y.

Just figured it would be useful to mention in case it was a genuine mondegreen.


Using Stripe here makes sense, if they have product for this. Wouldn't I need to store all of the same information in my database anyway as a source of truth?


Stripe is great for payments, but they don’t make usage data super useful for data purposes — their primary use case is payments.

If you want access to your data for other business purposes - stakeholder data, embedded dashboards, alerts and notifications about usage, etc - then you’ll want a system more purpose-built for usage data than Stripe.

Check out https://kable.io. Kable is purpose built as a data and billing solution, where your data is available for purposes ancillary to payments.

[Disclaimer: I am a Kable founder, but what you’re describing is precisely what we’re built for. If you’d like to chat, either about Kable or about your design in general to build this in-house, I’d be happy to.]


Hi asommer, How can I contact you about Kable.io?


adam@kable.io -- looking forward to chatting!


You could (I would) but you could care a lot less about it and treat it mostly as a log to use for reconciliation later if there's an issue or if you want to migrate to a different billing system.

The data is going to still be available in Stripe to download later if you need the raw events (or just the summaries from the invoices).


We built an open source usage based billing with Lago (YC S21). You deploy a Postgres database with all useful objects to run your billing (events, subscriptions, plans, prepaid credits, customers, invoices …).

Check it out here: - GitHub https://github.com/getlago/lago/issues - website https://www.getlago.com/ - documentation doc.getlago.com


Have you checked https://github.com/getlago/lago ? Open source billing API for SaaS.

Disclaimer: I am one of the cofounders, but looks like this could really fit your needs, especially as you add more complex logics afterwards: prepaid/offered credits, probably discounts, etc.

Also, if it’s the first time you’re building a billing system, you might want to check this thread about billing nightmares. It seems very simple at first but it becomes messy very fast. https://news.ycombinator.com/item?id=31424450

Happy to help more (whether you use Lago or not!)!


FWIW we are evaluating Lago and have had great interactions with the team. They are very customer focused and iterating fast.


If a customer ever changes their billing rate, you will need to keep track of when it was changed and what their previous rates were, otherwise you'll have no audit trail and no way to bill customers correctly.

One approach would be to have a billing_period (account_id, billing_rate_id, start_date, end_date) table. Another would be to store the billing_rate_id in your account_credit table so each usage can be tied to the billing rate the customer should be charged for it.

I also wouldn't keep a running_balance column. I forget which rule of normalisation it is because it's been so long, but you shouldn't normally store a value that you can compute from other data.


Unless your product is a billing system, stop right now and go look at using a third party. Zuora, Stripe, etc all have billing platforms you can use. Trust me from hard earned experience that billing system seem "easy" but they really aren't. They are very, very hard to get right and if you fuck up you are gonna overcharge, undercharge or god knows what else and piss your customers off.

Not only that but you'll be stuck in a loop writing features that work out of the box with third party software. Important ones like the ability to change the price, customize the price for a single account, refund money, export to quickbooks, add new products, offer coupons, offer sales, provide quotes to prospects, etc. You don't want to do this! A third party does all this and more.

So yeah. Don't build it! You'll regret doing so! Trust me!


I am no sql expert, but have some recent experience designing similar systems and went through the db structures of several big oss projects for inspiration (some lost HN post, if anybody remembers).

I think your account_debit and account_credit tables have their names swapped.

You should have a products table, and a billing tier table with timestamps for start and end, so that accounts billing tier * products billing base rate at the time of use = their cost. You are not going to do only one thing or have only one price forever.

So at least:

accounts (acct_id, timestamp_start, timestamp_end, tier_id) - this gets more than one row per acct based on whether they change rate tiers. Same thing true for products and tiers below.

products (prod_id, timestamp_start, timestamp_end, base_rate)

tiers (tier_id, rate_multiplier, product_id, timestamp_start, timestamp_end) - include product_id if you want use different multipliers for the same account on different products in the future. timestamp_start and _end so you can use the right multiplier when prices change for active accounts.

you can include an `active` boolean on any of those table to speed up queries if necessary.

Then, in account_debit (labelled account_credit above), I would just log usage (debit_id, timestamp_start, timestamp_end, product_id, account_id), and calculate out only when actually billed. The way you have it now you could never deal with a usage dispute. You need better logging of raw data, never store calculation results when you can trivially store numbers.

account_credit looks fine, maybe store source_transaction_id for ease of lookup. I don't have the experience in footguns as indicated with foreign key constraints by others, that seems worth listening to ;). I think using timestamp_start and _end so that there are never any changes to tier rate rows, product base price rows, etc, should go a long way towards mitigating that here.


Footguns:

- ensure you encode the billing rate WITH THE EVENT. This makes summation practically simple and allows you to quickly zero out a billing error! It also allows you to (at some point in the future) offer special billing for certain customers by encoding the billing rates at the time of subscription as part of the user’s profile or in its own table.

Why? You’re probably going to be using the ids of what you’re selling. Over time, you’re going to be changing these prices. If you change the SKU, you’ll lose historical pricing in your data and maybe even screw up customer’s past invoices. If you just create a new id/SKU/tier/product, then you have to change all the ids all over the code, which is a nightmare. However, if you set the base rate for the SKU, and copy that for anything that gets billed, you can easily change the base rate of the SKU without affecting the past.


However, that requires doing an extra lookup for every write on the most active / real time table. It would be better to add an optional product_id to the account table, which then allows to have different rates for different products at the same time for each account. You really don't need a tier table, if you are always using multipliers to the base rate, and we could eliminate it unless we want to store tier types as well, that allow for other formulas beyond multipliers.

The account, tier, and product tables already take care of changing prices using timestamps, but I can see that you might offer a fixed rate to an account that spans a price change on the base rate. There is an entire facet to this which is not really about the db, but business logic - like whether we are going to change rates used for calculation based on time of use, time of account creation, time of rate tier activation, product base rate change - change that logic if it's a promo rate - etc. I am not totally sure that we need to go beyond what we have for the dbs to work with this business logic, it is better handled and enforced by the calculation classes.

I don't think calculations are ever that hard, because I presume you will have previously written and optimized lookups and calculations for all common to semi-common cases. I would also (obviously?) never calculate on the live server, but dump the db and calculate billing on an admin server. If the site is busy enough, I would have this entire db on an admin server, and just import the usage logging table from a simpler db on live. For live, you just need enough account and product info for authentication and logging.


> However, that requires doing an extra lookup for every write on the most active / real time table.

You only have to do that lookup once and you're likely looking up other things to write that event as well. So it is almost a free cost. You can actually do all of this async and faster:

Just call a url containing all the details in the url (/usage?user_id=123&product_id=123&type=start_usage) that returns a 200 response with a minimal payload (0 byte static file). From there, just stream the server's access logs to Apache Kafka after the logs rotate once per minute to be parsed. There it will do all the lookups and aggregation. You're probably only looking at a delay of 4-5 minutes before the data is available to the end user -- even with millions of active users.

Then you can parse the logs async as fast/slow as you need to.


Barring performance/memory/misc concerns, this seems pretty much the same as I'd design it.

If I understand your problem correctly, you have two facts: user makes a payment (account gets credited) and user consumes credits (account gets debited). Everything else is a dimension (accounts, rates).

The only choice you have to make is how you deal with changing rates. You can write the premultiplied amount on the debit table, or you can treat rates as a slowly changing dimension and decompose "amount" into "time" and the foreign key to the rate table.

Which one is better depends on what the application is doing, there's nothing wrong in principle with either.


If I learned one thing from designing is similar systems in the past is to avoid referential data as much as possible haha

I did not go into such depth here, but individual function calls record (duration, billing_rate, total_cost) as plain values and also references `billing_rate` used (setting foreign constraint to restrict to avoid accidentally nuking data).

Had at least one instance where an engineer in our team modified a value in rates table causing havoc to the entire billing system that relied on referential values used to calculate balances

I want to build in as much redundancy as possible, even if that means duplicating data in several places


Those "id" names are really confusing; I assume they are all shorthand for "acct_id", or... maybe not.

I think account_credit & account_debit need foreign keys back to billing_account, and they probably are gonna need their own primary columns eventually (so you can add on addl metadata like 3rd party transaction tracking).

I'm not sure what billing_rate's relationship is to billing_account, if any...


I missed `billing_account_id` in `account_credit` and `account_debit`. Corrected.

There can be multiple billing rates (that change over time). `billing_account` is assigned reference to whatever is accurate at the time.


Just based on what I am reading about building such systems, it seems like the common advice is to not have a single balance column such as `running_balance`, but instead calculate this value on the go every time it is needed.

That's fine. What else?


I would get rid of the billing_rate table or at least copy its contents into billing_account - nothing good can come of changing that reference!

Also consider how you will build invoicing on top of this - i.e. A monthly PDF that covers (for a period) prepayments made, services rendered, balance remaining.

It seems like your billing model makes that quite easy but bear in mind you have to be able to sum that up on a particular date, and for that summary to be immutable.


Avoid building it if you can. Stripe is great. The last billing system I built had a similar schema to yours. As the business grew, we got bigger customers who hated that we just provided a balance due, which was a sum of all debits and credits and not actual invoices they could pay or submit to their finance team for payment. Wrangling recurring charges yourself will unnecessarily take years off of your life.


if the billing rate for running something is a delta (like you posted, 0.01/second) of time, then having a timer (or cron, or batch, or kafka, whatever) that records the usage is also atomically deducting it from the account_balance. The account_balance is then topped up with whatever payment schema you want to create. The main issue is that you track the usage as a variable of a constant value to deduct from the account_balance variable of a constant value.

Your design is sound. Multiple rates against a running balance, that running balance's constant value is, well, USD. So the rate of usage is USD. Credits/Debits table is really there for reporting and has no bearing on the running_balance.

Good luck!


Good points. I am mostly trying to build enough visibility to ensure that whatever I am doing is always profitable.


consider not storing running_balance (or at least expect it to be written to a LOT). Based on the above description that's going to be a hot table with a write/update of 1 per second per row (of active customers).

If you are thinking sql, that running balance could be calculated from a VERY tall and skinny table of time_spent per session (which would be calculated in the application) or an append only table of (billing_account_id, timestamp) where you write a new row every second.


Does the running balance have to be sql? That's why we invented event streaming...


I should have mentioned: thankfully, there are no long-running sessions in this design. The longest session I could imagine would be 1 hour, but... they will average 1 minute.

Designing such system for long-running sessions would be a lot more complex, esp if we consider that the price might not be constant during the entire duration of the session.


Should account_credit record their payments and account_debit their spending?


Looks like I got it inverse indeed.


How about open sourcing and putting this on Github and letting anyone submit pull requests with suggestions/ideas?


Stripe allows you to bill based on reported usage which seems considerably easier than crafting an equivalent feature from scratch. This is how we handle billing at skusavvy

https://stripe.com/docs/products-prices/pricing-models#usage...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: