
You would need to use serializable isolation for this to hold true. Any isolation level less than serializable will use the snapshot that was active at the time of the select.

In Postgres, even with the serializable isolation level, all transactions that touch the same rows must also use serializable isolation or it's not really enforced. This is one aspect of serializable isolation in Postgres that seems like a major gotcha for real-world application development. There's no future-proof solution: new code can be added that doesn't use serializable isolation, and then the isolation assumptions of the earlier code are no longer valid.

FOR UPDATE is the only real solution in my eyes.
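
A minimal sketch of what I mean, with a hypothetical accounts table:

  -- lock the row so concurrent writers block until we commit
  BEGIN;
  SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
  -- ...compute the new balance in application code...
  UPDATE accounts SET balance = balance + 50 WHERE id = 42;
  COMMIT;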


  import my_module
This is compatible with `__all__` if you have your code broken down into smaller sub-modules and collect them in the main module as follows:

  # my_module/__init__.py
  from .submodule import *
  from .another_submodule import *
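
And each submodule declares its public API with __all__, e.g. (names here are hypothetical):

  # my_module/submodule.py
  __all__ = ["useful_function"]  # the wildcard import above pulls in only these names

  def useful_function():
      return 42

  def helper():  # not listed in __all__, so `from .submodule import *` skips it
      return -1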


Yup that's what I do too. Works great. And nbdev auto-generates __all__ so I don't have to think about anything -- it all just works.

(Besides which, when you do have __all__, regular wildcard imports work well anyway. I've been using them for >10 years without trouble. I think people just repeat the claim that they're a problem without having first-hand experience of using them with __all__ correctly.)


It's the accounting equation represented in canonical form. A chart of accounts is visualized in the mind of an accountant as:

Assets | Liabilities + Equity

Accounts classified as assets are debit accounts (left side), and accounts classified as liabilities or equity are credit accounts (right side).

The theory discussed everywhere in this thread is sound. You really don't need to use terminology like debit/credit for accounting.

What the discussion misses is the application of this framework. It is useful for a human to be able to visualize a complex transaction and work through missing pieces with the hints this framework provides. I'm missing something on the left? Oh yeah, I missed the deferred revenue debit.


> You really don't need to use terminology like debit/credit for accounting.

That's exactly right -- you don't need to. The problem is that people do use this terminology, and they use it in a way that conflicts with common usage, which makes a very simple concept vastly more confusing than it needs to be.


> That's exactly right -- you don't need to. The problem is that people do use this terminology, and they use it in a way that conflicts with common usage

I used to think that way; then I realized this thinking is the exact opposite of what's actually happening.

Hundreds of millions of people on earth know how to work with debit and credit exactly as it has been written in accounting books for hundreds of years. When you need to expand your accounting department, you go and hire a person who understands things exactly the same way as your current accountants, can pick up their work, and can communicate with them effectively. As a CEO, you are spared from teaching every new junior accountant your own flavor of first-principles accounting, you don't need to write your own custom accounting software, and you don't need to convert your company's books for tax authorities and outside auditors who are not familiar with your system.

Same with music notation. Same with Java language. Same with every other piece of human knowledge.


That's because people are trained into the system.

If you started teaching it another way, eventually that other way would be the norm.


It would have to be tangibly 10x-100x better than what's already out there for people to switch. If you make something only marginally better (and by default anything unknown is much worse than what everybody already knows), nobody will bother.


> Java language

That's actually a pretty good analogy. There is a lot in Java that could be improved too.


> which makes a very simple concept vastly more confusing than it needs to be.

Agreed. The concepts are all very simple. You can throw away all of the domain-specific terminology and reason about accounting theory with nothing but positive and negative numbers.

The utility of the confusing terminology and age-old accounting frameworks isn't obvious unless you are a practitioner living "in it". It's not until you face the complexities of real-world transactions (an accountant booking closing entries for an F500 company, say) that the strange left/right, debit/credit way of thinking proves its value.


Sorry, I don't buy it. By the time you get to the point where you're doing the accounting for an F500 company you have been thoroughly indoctrinated into the conventional way of doing things. That doesn't mean that the conventional way isn't deeply flawed. It's kind of like the use of British units in the USA rather than metric. You have 350 million people who are thoroughly comfortable with miles and feet and inches and gallons and whatnot, but that doesn't mean that metric isn't objectively superior.


One caveat to serializable transactions in Postgres is that ALL concurrent transactions must be running with the SERIALIZABLE isolation level to protect against serialization anomalies.

This is a bit jarring if you come from MSSQL, which implements the SERIALIZABLE isolation level using locks. In MSSQL, you can rest assured that a serializable transaction will not be affected by changes from other concurrent transactions, regardless of their isolation level.

In Postgres, you may have a set of transactions all participating in SERIALIZABLE isolation today, but tomorrow someone adds another script without the SERIALIZABLE isolation level, and now your protected paths are no longer isolated.


We use Sybase SQLAnywhere at work, which also implements SERIALIZABLE using locks. Naive me thought that meant a lock on the table, but no, it locks all the rows... Not great for a table with many rows!

We were essentially trying to avoid inserting the same value twice, so we ditched SERIALIZABLE and instead added a unique index along with a retrying loop on the client side.
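
Roughly this shape, if anyone's curious (table and column names are made up):

  -- enforce the "no duplicates" rule in the database itself
  CREATE UNIQUE INDEX orders_external_ref_uq ON orders (external_ref);

  -- the client just inserts and, on a unique-violation error,
  -- either treats the row as already present or retries with a new value
  INSERT INTO orders (external_ref, amount) VALUES (:ref, :amount);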


Or from the other perspective of the trade-off: One caveat with MSSQL is that ALL concurrent transactions must pay the overhead if _some_ transactions need serializable guarantees?


Only if they touch the same data. If they are touching disjoint sets of data then there is no overhead to be paid by non-SERIALIZABLE transactions.


There has been some recent improvement to locking behavior:

https://learn.microsoft.com/en-us/sql/relational-databases/p...


Oh, that's super nasty. Is it mentioned somewhere in the docs?

Is it the same for repeatable read?


I have read the docs plenty of times, but it never stuck for me until I read the (free!) PostgreSQL 14 Internals ebook: https://postgrespro.com/community/books/internals

Quoted from Page 70:

If you use the Serializable level, it must be observed by all transactions of the application. When combined with other levels, Serializable behaves as Repeatable Read without any notice. So if you decide to use the Serializable level, it makes sense to modify the default_transaction_isolation parameter value accordingly -- even though someone can still overwrite it by explicitly setting a different level.

I had a real "WTF?" moment when I read this the first time.
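
For reference, a sketch of what the book is suggesting (the database name is hypothetical):

  -- per session
  SET default_transaction_isolation = 'serializable';

  -- or persisted, so every new connection to the database defaults to it
  ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';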


Interesting, this book looks really cool!


It is mentioned in the docs, but it can be easy to misunderstand:

"If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error."

Note that it says nothing about the non-serializable transactions.

https://www.postgresql.org/docs/current/sql-set-transaction....


It is in the Wiki: https://wiki.postgresql.org/wiki/Serializable

"Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI. If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default."

I guess it is the same for all MVCC databases: they don't want to acquire a read lock just in case another transaction is running at Serializable.


The OP mentioned that SQL Server does lock (but then, it doesn't use SSI, just SI).

The recommendation in PG docs to use a combination of SERIALIZABLE and READ ONLY transactions seems like a good one for read-heavy systems.
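
In Postgres that looks something like:

  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
  -- DEFERRABLE makes the transaction wait for a snapshot known to be anomaly-free,
  -- so a long-running report can't cause (or be cancelled by) a serialization failure
  SELECT count(*) FROM some_big_table;  -- hypothetical report query
  COMMIT;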


This is pretty intuitive when you think about predicate locks that Postgres uses to detect conflicts.

If you have one SERIALIZABLE transaction that sets some locks, and one non-SERIALIZABLE that doesn't, then they can't "see" each other "by definition".

But your point stands--there could be some kind of "warning flag" somewhere, that would alert if SERIALIZABLE transactions overlap with non-SERIALIZABLE ones. Or maybe there _already_ is something like that??


> (Newspeak “Sprint and Goals”)

Sprint "commitments" in my world. A single word with major psychological impacts.


The check constraint solution isn't a solution to the concurrency issue: it only prevents a negative balance.

The balance can still end up wrong anywhere in the range where it stays above zero: two concurrent (positive) deposits would not store the correct ending balance.
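
A minimal sketch of the failure mode, assuming a hypothetical accounts table with CHECK (balance >= 0):

  -- Session A: SELECT balance FROM accounts WHERE id = 1;       -- reads 100
  -- Session B: SELECT balance FROM accounts WHERE id = 1;       -- also reads 100
  -- Session A: UPDATE accounts SET balance = 150 WHERE id = 1;  -- +50 deposit
  -- Session B: UPDATE accounts SET balance = 130 WHERE id = 1;  -- +30 deposit, overwrites A
  -- Ending balance is 130, not 180, and the check constraint never fires.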


Pagination is harder to get right than it seems.

I think pagination is only predictable under these conditions:

1) The offset used for the next fetch must be based on a pointer to a unique key. We can't rely on the number of rows previously seen.

With this rule, deletes which occur to rows in previous pages will not cause later rows to shift and be skipped unpredictably.

2) The paged result set must be sorted based on a monotonically increasing field, like created_at, plus enough other fields to create a unique key. You could lean on the PK for this, e.g. ORDER BY (created_at, id) ASC.

With this rule, new inserts which occur during page enumeration will only affect unseen pages (and we'll see them eventually).

The API call looks roughly like this:

  /orders/?region=US&offset=(2022-03-12T07:05:58Z,ord_1234)&limit=100

The DB query looks roughly like this:

  SELECT *
  FROM orders
  WHERE (created_at, id) > (:offset_created_at, :offset_id)
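  -- the OR branch below makes the very first request (no offset supplied yet) return page one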
  OR (
    :offset_created_at IS NULL 
    AND :offset_id IS NULL
  )
  ORDER BY (created_at, id) ASC
  LIMIT :page_size
EDIT: formatting
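
One more note: for this to stay fast, the paging key wants a matching composite index, something like (index name is arbitrary):

  CREATE INDEX orders_created_at_id_idx ON orders (created_at, id);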

