Does anybody know a good SQL / DDL schema for a double entry accounting system?

meekaaku · on Aug 24, 2022

The heart of double entry accounting is extremely simple. Forget about asset/liability/expense. Money always flows from one account to another. What goes out from account_1 must go into another account(s). Typical tables:

accounts (id, name)

transaction (id, date) /* some call it journal */

transaction_line (id, transaction_id[fk], account_id[fk], amount)

I use -ve amount for credit, +ve for debit. That way when you do SUM(trascation_line.amount) it would come to 0.

This also cleanly maps to database transaction too where all transaction_line rows and transaction row should be atomic.

If you want multi-currency support, instead of amount column, it needs to be currency_code, currency_rate, amount_in_currency, amount_in_basecurrency ( i know we don't need this all, but sometimes you want to record the amount as entered, eg EURO 52.10 u want to record as entered even if your base currency is USD)

meekaaku · on Aug 24, 2022

To build a chart of accounts, you can have a parent column in accounts table. Account balances is just:

SELECT account.name, SUM(amount) balance

FROM account ac

INNER JOIN transaction_line tl ON tl.account_id = ac.id

GROUP BY account.name

You can cache this balance values with a current_balance column on accounts table

Once you have that, for any real world transaction, all you need to figure out is what are the accounts to debit/credit, ie classification. That is a higher level thing and is the business logic of an accounting application.

ruuda · on Aug 24, 2022

This looks nice, but it doesn't enforce in the schema that all transaction lines sum to zero. Is that a problem in practice? Or is it one of those things where if you get it wrong, you tend to notice immediately because everything breaks (as opposed to silently creating or destroying currency that goes unnoticed for a long time)?

perlgeek · on Aug 24, 2022

I guess you could have a stored procedure that checks for balanced transactions before inserting (or before committing, at least), with SECURITY DEFINER, and not give anybody else permissions to insert or modify the transaction lines table.

But yes, that is a downside, and if I were to write such a thing I'd make sure to have at least two mechanisms to avoid / detect errors (like, one validation in business logic and/or stored procedure, plus regular monitoring for transactions that don't add up to zero).

yobbo · on Aug 24, 2022

It is bad practice in terms of software engineering, but some "real world" apps are implemented like this. They do verification in the "middleware".

meekaaku · on Aug 24, 2022

Its not a problem in practice. There might be a way to enforce that through database constraints, but in practice, checking transaction lines sums to 0 in business logic is not that hard. Having said that, you can run accounting entry sanity checks on the entire database. As previously said

SELECT SUM(amount) /* this should sum to zero */

FROM transaction_line

---

Also to identify any non-balancing transaction is easy:

SELECT tx.id, tx.date, SUM(amount) tx_sum

FROM transaction tx

INNER JOIN transaction_line txl ON txl.transaction_id = tx.id

GROUP BY tx.id

HAVING tx_sum != 0

---

This will identify them even if caused by your business logic bug, database bug, disk corruption etc.

This can also be done on the single accounting transaction just after insert too and can be done within the same database transaction.

yobbo · on Aug 24, 2022

This sort of schema requires a process to verify the sum before committing, and verification is annoying to achieve with constraints. Instead, you might consider something like:

transaction_line (id, transaction_id[fk], dr_account_id[fk], cr_account_id[fk], amount)

meekaaku · on Aug 24, 2022

The problem with this will be when you have 3 accounts involved on a transaction. Eg, you take a sales receipt with part bank transfer, part cash.

Sales Cr $100

Cash Dr $30

Bank Dr $70

Your approach will have:

Sales Cr $70, Bank Dr $70

Sales Cr $30, Cash $30

That looks like two sales, which is not really the case.

gen220 · on Aug 25, 2022

I think you could solve this with a `sale_id` for grouping transactions together, no?

The benefit of row-level guarantees on balancing is considerable, even if it somewhat complicates the application layer (the layer that translates your first ledger into your second ledger, and vice versa).

If you're storing a transaction that needs to balance across N rows, the only solution is at the "application level" (i.e. a stored procedure, or some middleware in code), which is unfortunately more difficult to reason about.

yobbo · on Aug 24, 2022

There exists cases where it is problematic, but your example is fine. In this case, you would use individual transaction lines to represent payments, rather than sales, which is closer to reality. But you can group transactions in any way you want.

With sales tax, you would have:

cr: sales, dr: cash, $100

cr: cash, dr: tax, $10

meekaaku · on Aug 24, 2022

The above looks like $70 was taken from cash and deposited to bank. Thats not whats happening in the real world. Since cash is fungible, one could ignore that. But if it was cheque + bank-transfer or something else that leaves a record, then that wont work. Or when you include VAT/GST/Sales tax. Or when you pay a loan monthly payment of $1000 that needs to be split to principal and interest. etc.

yobbo · on Aug 24, 2022

Yes, it is best if transactions correspond to verifiable events. Complicated situations can be modelled by letting events be represented by their own "accounts", in a separate charter.

In your example, you might let the monthly payment be represented by its own account, with three transactions (bank payment, interest, principal).

The purpose is to make the database constraints enforce double entry verification. If you don't need this, it can be made simpler as you suggest.

HWR_14 · on Aug 24, 2022

Isn't it more accurate to say it was two sales than one? In what way is it misleading?

yobbo · on Aug 25, 2022

Distinguishing between sales and payments is required, but any confusion is basically impossible in online systems such as this schema would be used in, in real examples.

Typical accounting GUIs present transactions and ledgers such that it becomes intuitive for business users to enter sales/invoices. Entering the above transaction in those GUIs might cause confusion.

The schema here is used by an online system recording business events as a type of log of user interactions. It lives in a sql database, and thus needs to rely on row constraints to guarantee double-entry. Transactions are never entered manually or randomly.

The effect of split transactions can be achieved by using an intermediate account. Those accounts have balances which are not guaranteed to be zero, but that is unavoidable and sometimes OK and intended.

lelanthran · on Aug 25, 2022

One sale is always one sale, even if it has multiple transactions associated with it[1]. It must never look like multiple sales.

[1] Lots of use-cases: People paying partly by credit card, partly in cash. People using vouchers which are recorded as a negative transaction. Group of people splitting a bill and each doing a CC transaction for their bit. None of these are multiple sales, because recording each transaction as a single sale wreaks havoc on the inventory control.

meekaaku · on Aug 25, 2022

No. In the real world it is a single sale. In the accounting/finance domain it is single sale, so it should be represented as such. It also matters when you look at KPI like avg order size, num orders for customer, not to mention it is one invoice at the time of invoice creation, but payment collected in multiple methods.

mmckelvy · on Aug 25, 2022

What do "-ve" and "+ve" denote?

meekaaku · on Aug 25, 2022

negative and positive for credit and debit respectively. Eg: an entry of -$100 is a credit entry.

mmckelvy · on Aug 25, 2022

Ah, ok. So the "ve" was just a placeholder for the value in the "amount" column for each entry.

How does this handle the fact that credits increase the value of some accounts (e.g. sales, equity) but decrease the value for others (e.g. assets)?

meekaaku · on Aug 26, 2022

-ve i used to denote credit, not necessarily to indicate increase/decrease. In accounting terms, if you sales increase, your entries on the credit side of sales account increase. In my implementation it means larger negative number. Hope that explains.

ruuda · on Aug 24, 2022

I would like to know about this as well. I struggled with this for a while for a prediction market app that I'm building. Eventually I ended up with [1]. I am somewhat pleased with it, but it does feel unwieldy to work with. I have some vague hope that somebody who actually implemented banking software would know of an obvious and elegant schema.

The summary of my approach is:

* A table with accounts. * A table with account balances and an index on (account id, id), so you can efficiently query the current balance. * A table with transactions. * A table with mutations. Mutations have an amount, and reference a credit account, debit account, and transaction. (So one transaction can consist of multiple mutations.) * The account balances table list the post-balance, but also references the mutation that caused it to be that new value.

All of these tables are append-only. I later added another layer, with transactions and subtransactions, but I'm not sure if this was a good idea.

[1]: https://github.com/ruuda/hanson/blob/351e8e9bc5c96a9c1dc76fd...

perlgeek · on Aug 24, 2022

Thanks!

If I understood your explanation and schema correctly, a mutation itself is balanced, and if you have a transaction that involves three accounts, that would be split up into two balanced mutations, right?

The advantage I see with this design is that a mutation (and thus a transaction) is always balanced (you store the amount only once, and credit account and a debit account).

The disadvantages seem to be that the transaction itself doesn't explicitly list the total changes to an account explicitly, and that for each account you have to join the mutations twice (once for the credit side, once for the debit side) to get to re-calculate the current amount.

Storing both the current balance in the account means you cannot have concurrent updates to one account, so you must rely on row-level locking for consistency. (Which sounds a bit like a potential bottleneck, if you have something like a company-wide Cash account that is involved in lots of transactions, as in the ModernTreasury blog post).

Does that seem like a fair summary to you? Are there other trade-offs you have noticed?

yobbo · on Aug 24, 2022

> ... if you have a transaction that involves three accounts, that would be split up into two balanced mutations, right?

Three or more "mutations", but these might be grouped together in whatever way you want. For this purpose, the meaning of "account" is up to you define. You might call them "accounting objects" representing subscriptions, contracts, invoices, and so on.

Account balances (per transaction) can only be calculated sequentially in the order of transactions, which becomes a bottle neck at some rate of transactions.

ruuda · on Aug 24, 2022

Yes, that's a fair summary.

Performance is not something I'm worried about for my app, maybe a few dozen people would use it at the same time, and I run everything at serializable isolation level anyway. But I can imagine that for processing real-world payment volumes, at some point you need to sacrifice the balanced-by-construction property for performance.

One issue I noticed is that there is some freedom in the representation of transfers. You can pick a canonical representation by demanding that the amount is positive, but then you have to make a case distinction everywhere in code. Often the code becomes much simpler if negative amounts are allowed. But it does make the credit/debit more confusing, and it goes against the observation in part 1 of the series, that accounting systems rarely work with negative numbers. I wonder why though.

gen220 · on Aug 24, 2022

The experimental directory of `beancount` has some code for converting beancount data structures into SQL structures [1].

You could also take a look at the data structures used by GNU Ledger (e.g. account [2]) and transcribe them to SQL. A bit more work, but probably educational.

[1]: https://github.com/beancount/beancount/blob/1f180e7176808139...

[2]: https://github.com/ledger/ledger/blob/master/src/account.h

vineyardmike · on Aug 24, 2022

GnuCash has a SQL backend, so perhaps you can pick it apart (or use it as-is as a library?).

https://wiki.gnucash.org/wiki/SQL