
Novices don't build working concurrent systems of any kind with any toolkit, period. Concurrency is hard and thinking all the "concurrency problems" go away with some message passing is both ludicrous and dangerous. Fearless concurrency can only be attained through understanding, not by thinking all your problems went away because you're using a "cool approach".



Surprisingly, this is what the Akka framework promises: message passing and immutability of objects.


Software usually has state (unless that state is kept and managed entirely externally, in a database for example). And that state mutates. A simple example is a big array that has to be processed in place.


It's pretty easy to make the leap from individual SQL statements to SQL statements which are wrapped in a transaction.


Excellent example for making my point, since "just wrap it in a transaction" usually leads to concurrency bugs like the beloved lost update.


If you're talking about database-style transactions, it "usually" leads to concurrency bugs only if the isolation level is not strictly serializable. It does not hurt to know things before labeling them.


This is not something I'm familiar with. What's the beloved lost update and what transactions are you using that suffer from it?


Transactions give varying degrees of "isolation" between them, depending on the database (and its version + configuration). For example, in what SQL would call READ COMMITTED, where transactions will only read data that has been committed, read-modify-write updates are generally bugs. The classic example:

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account; // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT
    - Result: balance is 50, but should be 0
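The lost-update schedule above can be reproduced deterministically in a toy script. This is only a sketch: SQLite just holds the row, and the "two transactions" are interleaved by hand to force exactly the problematic order of operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (balance INTEGER)")
conn.execute("INSERT INTO account VALUES (100)")

# transaction 1 reads the balance
b1 = conn.execute("SELECT balance FROM account").fetchone()[0]  # 100
# transaction 2 reads the balance before transaction 1 writes
b2 = conn.execute("SELECT balance FROM account").fetchone()[0]  # 100
# transaction 1 writes its application-computed value (100 - 50)
conn.execute("UPDATE account SET balance = ?", (b1 - 50,))
# transaction 2 overwrites it with the same stale computation
conn.execute("UPDATE account SET balance = ?", (b2 - 50,))

final = conn.execute("SELECT balance FROM account").fetchone()[0]
print(final)  # 50, although two deductions of 50 should leave 0
```

The second UPDATE silently discards the first one's effect, which is exactly the "lost update".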
With serializable transactions (not all databases have this, particularly if you look beyond SQL):

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account; // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT -> Fails, needs to retry
    - transaction 2b: SELECT balance FROM account; // = 50
    - transaction 2b: UPDATE account SET balance = 0
    - transaction 2b: COMMIT -> Ok!
    - Result: balance is 0
Because this is needed so frequently, databases have calculated updates, basically atomic operations:

    - transaction 1: UPDATE account SET balance = balance - 50; // values indeterminate
    - transaction 2: UPDATE account SET balance = balance - 50; // values indeterminate
    - transactions 1,2: COMMIT
    - Result: balance is 0
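The calculated-update variant also runs as written in SQLite, since the subtraction happens inside the UPDATE itself and the database applies each one atomically (a sketch; both statements here stand for the two concurrent transactions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (balance INTEGER)")
conn.execute("INSERT INTO account VALUES (100)")

# both "transactions" issue the same atomic read-modify-write;
# neither reads a stale value into application code first
conn.execute("UPDATE account SET balance = balance - 50")
conn.execute("UPDATE account SET balance = balance - 50")

final = conn.execute("SELECT balance FROM account").fetchone()[0]
print(final)  # 0
```

Because no value ever leaves the database between the read and the write, there is nothing to get stale.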
Or, one could lock the rows, like so:

    - transaction 1: SELECT balance FROM account FOR UPDATE; // = 100
    - transaction 2: SELECT balance FROM account FOR UPDATE; // transaction 2 is stalled until transaction 1 commits or rolls back
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    // transaction 2 can now continue and gets balance = 50
    - transaction 2: UPDATE account SET balance = 0
    - transaction 2: COMMIT
    - Result: balance is 0
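SQLite has no SELECT ... FOR UPDATE, so the following sketch substitutes BEGIN IMMEDIATE (which takes the write lock up front) as a rough analog of pessimistic row locking. Two connections to the same database file play the two transactions; the second one's attempt to start is rejected while the first holds the lock.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")
# isolation_level=None puts the connections in autocommit mode so we
# can issue BEGIN/COMMIT ourselves; timeout=0.1 makes lock waits short
conn1 = sqlite3.connect(path, isolation_level=None, timeout=0.1)
conn2 = sqlite3.connect(path, isolation_level=None, timeout=0.1)
conn1.execute("CREATE TABLE account (balance INTEGER)")
conn1.execute("INSERT INTO account VALUES (100)")

conn1.execute("BEGIN IMMEDIATE")            # transaction 1 takes the write lock
blocked = False
try:
    conn2.execute("BEGIN IMMEDIATE")        # transaction 2 cannot proceed yet
except sqlite3.OperationalError:            # "database is locked"
    blocked = True

b1 = conn1.execute("SELECT balance FROM account").fetchone()[0]  # 100
conn1.execute("UPDATE account SET balance = ?", (b1 - 50,))
conn1.execute("COMMIT")                     # lock released

conn2.execute("BEGIN IMMEDIATE")            # now succeeds
b2 = conn2.execute("SELECT balance FROM account").fetchone()[0]  # sees 50
conn2.execute("UPDATE account SET balance = ?", (b2 - 50,))
conn2.execute("COMMIT")

final = conn1.execute("SELECT balance FROM account").fetchone()[0]
print(final)  # 0
```

The mechanism differs (SQLite locks the whole database, not a row), but the shape is the same: the second transaction cannot read until the first has committed, so it computes from a fresh value.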
And this is just one simple example of the problems you can have when concurrently accessing a single table, even while using transactions. To say nothing of the issues you run into when interacting with systems outside the database, which don't participate in its transaction semantics.

Concurrency is just very non-trivial, regardless of the abstraction.


Well, I guess I'll just keep digging myself further into a hole!

I want to focus entirely on your first example.

Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

If you're telling it like it is, then I cannot argue with facts. I guess I'll stop using DBs, at least until they figure this stuff out in 1973.


> Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

We have two concurrent tasks both doing exactly the same thing in order to deduct 50 money:

    BEGIN TRANSACTION;
    SELECT balance FROM account; // = 100
    UPDATE account SET balance = 50; // calculated by application as 100-50
    COMMIT;
Perhaps I misunderstand you, or you misunderstood the way I presented the example (possibly because I presented it poorly). But in my mind there is hardly a way to describe this code as "not transactional".

I merely showed one possible way these concurrent tasks may execute in practice, leading to bugs. Of course, in casual testing this will actually look and work correctly. As one commenter far up the thread said (in an attempt to refute the idea that understanding concurrency is unnecessary):

> The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

> And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block. Which makes an incorrect implementation worse but possibly correct.

Then,

> Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

That is only one of the possible ways for transactions to work. Note that IIRC the only database that interprets the SQL standard like this is Postgres, while MySQL and Oracle still have (more subtle) serialization issues even at the SERIALIZABLE isolation level (example: https://stackoverflow.com/a/49425872).

Note that you can end up with deadlocks and transaction failures on any level stricter than READ COMMITTED, so the application needs to be able to deal with both of these.
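In application code, "dealing with both of these" usually means a retry loop around the whole transaction. A minimal sketch, where `SerializationError` is a hypothetical stand-in for whatever exception your driver raises on a serialization failure or deadlock:

```python
import random
import time

class SerializationError(Exception):
    """Stand-in for the driver's serialization-failure/deadlock error."""

def run_transaction(tx_fn, attempts=5):
    """Run tx_fn, retrying with jittered exponential backoff when the
    database aborts it. tx_fn must be safe to re-run from the top."""
    for attempt in range(attempts):
        try:
            return tx_fn()
        except SerializationError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the failure
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))

# toy usage: a "transaction" that is aborted once, then succeeds
calls = {"n": 0}
def deduct():
    calls["n"] += 1
    if calls["n"] == 1:
        raise SerializationError("aborted by the database, retry")
    return "committed"

result = run_transaction(deduct)
print(result)  # committed, on the second attempt
```

The important property is that the whole transaction body is re-executed, including its reads, so the retry computes from fresh data.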


> The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

> And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block.

Yep yep - that's the Java + Threads model. It's (relatively) harder to take single-threaded logic and make it behave in a multi-threaded setting. Compared to the SQL model, where it's (relatively) easier to take single-threaded logic, wrap it in BEGIN/END TRANSACTION, and have it perform exactly as expected.

OK I get you now. In saying that SQL concurrency was easy and Java concurrency was hard I didn't think about what would happen if you tried to write a mixed Java/SQL transaction; I didn't realise there was a bunch of Java running between your SQL statements. So what would my fix be? Get rid of the Java and replace it with SQL.

> Note that you can end up with deadlocks and transaction failures on any level stricter than READ COMMITTED, so the application needs to be able to deal with both of these.

That's cool - transactions proceed completely or not at all.

About the "not transactional" thing, I was applying the contrapositive: (a => b) => (not b => not a). That is, since transactions are isolated, and you demonstrated code that wasn't isolated, I can conclude that it wasn't a transaction. Maybe I need to adjust my thinking a bit:

    assumption i) Atomicity says "The series of operations cannot be separated with only some of them being executed".

    assumption ii) Isolation says "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

    assumption iii) I use transactions because they're atomic and isolated.

    A *SELECT balance* was run, passing its value out to the real world before the commit succeeded.  This breaks assumptions i and iii.

    "That is only one of the possible ways for transactions to work" breaks assumption ii and iii.

    So, I can only conclude I should not use transactions.





