
The problem is not the threads, it is the mutation of variables, which boosts the complexity of the code. So a tutorial on creating threads is actually an invitation to hell. Nothing is cool about it. The cool thing is achieving concurrency without threads/race conditions/shared memory.



A computer is a mutation machine; you cannot escape mutation by hand-waving it away. If you are writing programs in which you can achieve concurrency without threads and shared memory, it’s because you’re building on the shoulders of all the engineers who didn’t hand-wave it away. Many of us, due to product requirements, don’t have the luxury of using higher-level abstractions like that.


And yet we're comfortable hand-waving GOTO away - that is, not calling computers GOTO machines.


There are thousands of engineers (at least) who use it or its equivalent every day. Just because they’ve built abstractions that allow you to ignore it doesn’t mean nobody has to deal with it anymore.


But that's identical to what he's saying; he's not saying no one has to deal with mutable memory. Just that most developers who need concurrency shouldn't have to. Same as registers, GOTO, etc.


That's not what they said. They said "nothing is cool about a tutorial on creation of threads", and that the cool thing is "achieving concurrency without threads/race conditions/shared memory" which ironically is only enabled by all the engineers who spend their time working on and maintaining those "uncool" things.

If they don't need to use threads, then good for them. But to dismiss threads and learning material about threads as "uncool" is just silly. The thing that enables that misunderstanding is all the work that's done on them in the first place.


There's a lot that's cool about threads, and you can learn to implement them well.

Threads handled well do not need to have race conditions, and race conditions/deadlocks are also very possible in distributed, message-passing systems.


Can databases be efficiently implemented without shared access?

Can message passing accomplish this at the same level of performance?

Although I agree that simplifying resource access should probably be considered before fully shared state.


> Can databases be efficiently implemented without shared access?

Let me ask a different question: Why did databases take off in the way they did? Sure they persist stuff to disk, but so do files. What they offer is a concurrency model so good that you almost never think about it. Beginner programmers can competently write large, concurrent systems by writing single-threaded programs which are backed by a central DB, without even knowing the term "race condition".

If beginner database articles told users how to make database Threads, Thread groups, and how to signal and catch interruptions, I don't think databases would have enjoyed nearly as much popularity.

While Threads are fundamental to Java concurrency, I kinda agree with yetkin's point. It introduces the Thread footgun without even paying lip service to the problems of shared, mutable state.


Do you use other paradigms/languages? (Clojure comes to mind, but maybe there are others.)


I'll add Pony to the list. This language uses the actor model like Akka and Erlang, but allows for the safe sharing of data between actors, enforced by an ingenious use of the type system. The result is an actor programming model with better performance than Erlang, because mutable data can be safely shared.

I have been a long time Java developer, and I have worked a lot with highly concurrent code. Pony really opened my eyes to what was possible.

Unfortunately, the language, standard library, and runtime are still pretty immature. It does, however, have very good C interop, so for some problems it would be a very good fit.


For Java at least, the Java Concurrency API is preferred.


The Akka actor framework comes to mind. I am in the process of learning it, and it is definitely simpler to wrap your head around.
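
As a taste, here is a minimal sketch using Akka's classic Java API (the Counter class and the string messages are my own invention for illustration):

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;

    // All mutable state lives inside the actor; the outside world can only
    // reach it through messages, so no explicit locks are needed.
    public class Counter extends AbstractActor {
        private int count = 0; // only ever touched while processing a message

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .matchEquals("increment", msg -> count++)
                .matchEquals("get", msg -> getSender().tell(count, getSelf()))
                .build();
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("demo");
            ActorRef counter = system.actorOf(Props.create(Counter.class));
            counter.tell("increment", ActorRef.noSender());
            counter.tell("get", ActorRef.noSender()); // reply goes to dead letters here
        }
    }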


Actor model with Elixir/Erlang and the BEAM VM.


All the cool people are in Hell. Only go to Heaven for the climate.


The concepts of threads and concurrent data access are simple enough for any decent programmer to comprehend. There is no hell here. Sure, there are some complex cases, but complex cases arise in many situations when programming.

And achieving concurrency without shared memory is impossible in the general case. Sure, it is possible to isolate such access in a separate layer and make it transparent to the rest of the program, but someone still has to program that layer.


The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block. Which makes an incorrect implementation worse but possibly correct.


If the resource is shared, accessed from many threads, and both written to and read from, then the correct behavior is to lock it with the proper type of lock at access time. Depending on the resource, it might be possible to split it into a few resources with more granular access.
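
For instance, a minimal Java sketch of "the proper type of lock" for a read-heavy resource, using ReentrantReadWriteLock (the SharedConfig class is just an illustration):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class SharedConfig {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private String value = "default";

        public String read() {
            lock.readLock().lock();   // many readers may hold this concurrently
            try { return value; } finally { lock.readLock().unlock(); }
        }

        public void write(String v) {
            lock.writeLock().lock();  // writers get exclusive access
            try { value = v; } finally { lock.writeLock().unlock(); }
        }
    }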

As for novices: they are called that for a reason, and are supposed to be under supervision rather than allowed to run wild.


Why is this being downvoted? It's the truth.

HN needs to only allow downvotes that have an accompanying explanation comment.


HN uses downvotes mostly to boo people whose opinions deviate from the common party line. As for a reasonable explanation - you're asking too much. Programming, like many other things, is often treated as a religion. No arguments; it just is.


As with most other compromised social sites, no badthink allowed here and how dare you.


Novices don't build working concurrent systems of any kind with any toolkit, period. Concurrency is hard and thinking all the "concurrency problems" go away with some message passing is both ludicrous and dangerous. Fearless concurrency can only be attained through understanding, not by thinking all your problems went away because you're using a "cool approach".


Surprisingly, this is what the Akka framework promises: message passing and immutability of objects.


Software usually has state (unless that state is completely kept and managed externally, in a database for example). And the state mutates. A simple example is a big array that has to be processed in place.
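
To make that simple case concrete, here is a sketch (in Java, since that's the context here) of mutating a big array in place by giving each thread a disjoint slice, so no element is ever shared:

    public class InPlaceDouble {
        public static void main(String[] args) throws InterruptedException {
            int[] data = new int[10_000_000];
            int nThreads = 4;
            int chunk = data.length / nThreads;
            Thread[] workers = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                final int from = t * chunk;
                final int to = (t == nThreads - 1) ? data.length : from + chunk;
                // Each thread writes only its own [from, to) range.
                workers[t] = new Thread(() -> {
                    for (int i = from; i < to; i++) data[i] *= 2;
                });
                workers[t].start();
            }
            for (Thread w : workers) w.join(); // wait for all slices to finish
        }
    }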


It's pretty easy to make the leap from individual SQL statements to SQL statements which are wrapped in a transaction.
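
In JDBC terms, that leap is roughly the following sketch (the table and column names are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class TransferDemo {
        static void transfer(Connection conn) throws SQLException {
            conn.setAutoCommit(false); // group the statements into one transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE accounts SET money = money + ? WHERE accountId = ?")) {
                ps.setInt(1, -50); ps.setString(2, "src");  ps.executeUpdate();
                ps.setInt(1,  50); ps.setString(2, "dest"); ps.executeUpdate();
                conn.commit();    // both updates land together...
            } catch (SQLException e) {
                conn.rollback();  // ...or neither does
                throw e;
            }
        }
    }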


Excellent example for making my point, since "just wrap it in a transaction" usually leads to concurrency bugs like the beloved lost update.


If you're talking about database transactions, they "usually" lead to concurrency bugs only if the isolation level is not strictly serializable. It does not hurt to know things before labeling them.


This is not something I'm familiar with. What's the beloved lost update and what transactions are you using that suffer from it?


Transactions give varying degrees of "isolation" between them, depending on the database (and its version + configuration). For example, in what SQL would call READ COMMITTED, where transactions will only read data that has been committed, read-modify-write updates are generally bugs. The classic example:

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account; // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT
    - Result: balance is 50, but should be 0
With serializable transactions (not all databases have this, particularly if you look beyond SQL):

    - Intent: both transactions deduct 50 money
    - transaction 1: SELECT balance FROM account; // = 100
    - transaction 2: SELECT balance FROM account; // = 100
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    - transaction 2: UPDATE account SET balance = 50
    - transaction 2: COMMIT -> Fails, needs to retry
    - transaction 2b: SELECT balance FROM account; // = 50
    - transaction 2b: UPDATE account SET balance = 0
    - transaction 2b: COMMIT -> Ok!
    - Result: balance is 0
Because this is needed so frequently, databases have calculated updates, basically atomic operations:

    - transaction 1: UPDATE account SET balance = balance - 50; // values indeterminate
    - transaction 2: UPDATE account SET balance = balance - 50; // values indeterminate
    - transactions 1,2: COMMIT
    - Result: balance is 0
Or, one could lock the rows, like so:

    - transaction 1: SELECT FOR UPDATE balance FROM account; // = 100
    - transaction 2: SELECT FOR UPDATE balance FROM account; // = transaction 2 is stalled until transaction 1 commits or rolls back
    - transaction 1: UPDATE account SET balance = 50
    - transaction 1: COMMIT
    // transaction 2 can now continue and gets balance = 50
    - transaction 2: UPDATE account SET balance = 0
    - transaction 2: COMMIT
    - Result: balance is 0
And this is just one simple example of the problems you can have concurrently accessing one table, even while using transactions. Not to speak of the issues you can run into when interacting with systems outside a single database, which don't interact with the transaction semantics of the DB.

Concurrency is just very non-trivial, regardless of the abstraction.


Well, I guess I'll just keep digging myself further into a hole!

I want to focus entirely on your first example.

Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

If you're telling it like it is, then I cannot argue with facts. I guess I'll stop using DBs, at least until they figure this stuff out in 1973.


> Let me ask you: What is it about your first example that makes you call it transactional? If it behaves as badly as you say, shouldn't it be called a 'method' or a 'procedure'? Because my "fix" for it is to actually use transactions. I suspect your fix would be the same.

We have two concurrent tasks both doing exactly the same thing in order to deduct 50 money:

    BEGIN TRANSACTION;
    SELECT balance FROM account; // = 100
    UPDATE account SET balance = 50; // calculated by application as 100-50
    COMMIT;
Perhaps I misunderstand you, or you misunderstood the way I presented the example (possibly because I presented it poorly). But in my mind there is hardly a way to describe this code as "not transactional".

I merely showed one possible way these concurrent tasks may execute in practice, leading to bugs. Of course, under casual testing this will actually look and work correctly. As one commenter far up the thread said (in an attempt to refute the idea that understanding concurrency is straightforward):

> The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

> And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block. Which makes an incorrect implementation worse but possibly correct.

Then,

> Why did you choose to interleave its steps like that, when "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

That is only one of the possible ways for transactions to work. Note that, IIRC, the only database that interprets the SQL standard like this is Postgres, while MySQL and Oracle still have (more subtle) serialization issues even at the SERIALIZABLE isolation level (example: https://stackoverflow.com/a/49425872).

Note that you can end up with deadlocks and transaction failures on any level stricter than READ COMMITTED, so the application needs to be able to deal with both of these.
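
In application code, "dealing with both" usually means a retry loop, something like this sketch (the SqlWork interface is my own; 40001 and 40P01 are PostgreSQL's SQLStates for serialization failure and deadlock):

    import java.sql.Connection;
    import java.sql.SQLException;

    public class Retrying {
        interface SqlWork { void run(Connection conn) throws SQLException; }

        static void inTransaction(Connection conn, SqlWork work) throws SQLException {
            while (true) {
                try {
                    conn.setAutoCommit(false);
                    work.run(conn);
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    conn.rollback();
                    String state = e.getSQLState();
                    // Rethrow anything the database doesn't flag as transient.
                    if (!"40001".equals(state) && !"40P01".equals(state)) throw e;
                    // Otherwise loop and re-run the whole transaction.
                }
            }
        }
    }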


> The problem for novices is that a program that behaves correctly looks a lot like a correct program. Until one day it doesn’t.

> And because you’re in production and getting random spurious failures, the panicked (but common) reaction is to wrap every shared resource in a synchronized block.

Yep yep - that's the Java + Threads model. It's (relatively) harder to take single-threaded logic and make it behave in a multi-threaded setting. Compared to the SQL model, where it's (relatively) easier to take single-threaded logic, wrap it in BEGIN/END TRANSACTION, and have it perform exactly as expected.

OK I get you now. In saying that SQL concurrency was easy and Java concurrency was hard I didn't think about what would happen if you tried to write a mixed Java/SQL transaction; I didn't realise there was a bunch of Java running between your SQL statements. So what would my fix be? Get rid of the Java and replace it with SQL.

> Note that you can end up with deadlocks and transaction failures on any level stricter than READ COMMITTED, so the application needs to be able to deal with both of these.

That's cool - transactions proceed completely or not at all.

About the "not transactional" thing, I was applying (a => b) => (^b => ^a). That is, since transactions are isolated, and you demonstrated code that wasn't isolated, I can conclude that it wasn't a transaction. Maybe I need to adjust my thinking a bit:

    assumption i) Atomicity says "The series of operations cannot be separated with only some of them being executed".

    assumption ii) Isolation says "Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially."

    assumption iii) I use transactions because they're atomic and isolated.

    A *SELECT balance* was run, passing its value out to the real world before the commit succeeded.  This breaks assumptions i and iii.

    "That is only one of the possible ways for transactions to work" breaks assumption ii and iii.

    So, I can only conclude I should not use transactions.


What's a better alternative to synchronizing access to shared resources?


Treat it like GC and don't leave it up to the programmer.


And make a programmer unable to achieve the highest performance when needed? We live in a supposedly free world. If you want to be "protected", be my guest and use languages with GC. Plenty of those. For somebody who needs the opposite and uses "unprotected" tools - leave them alone. You have no right to decide how other people do their work unless they're under your direct control.


Does that involve using concurrency primitives that basically don't allow access to any shared mutable state?


I'd say provide concurrency primitives to disallow direct access to shared mutable state. You still do the reads and the writes, but you let the system take and release locks for you.

Let's say you wanted to turn a list into a bounded list of 4 elements.

Race-condition insert:

    if (sz < 4) {     // another thread can pass this check at the same time...
      list.insert(x); // ...so both may insert,
      sz++;           // overshooting the bound
    }
Safe insert:

    atomically {
      if (sz < 4) {
        list.insert(x);
        sz++;
      }
    }
So atomically organises the locking/unlocking/rollback for you such that a fifth element will not be inserted.
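
Not full STM, but here's a minimal Java sketch of the same all-or-nothing flavour, implemented as a compare-and-set retry loop over an immutable snapshot (the class name is mine):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;

    public class BoundedList<T> {
        private final AtomicReference<List<T>> state =
                new AtomicReference<>(Collections.emptyList());

        public boolean insert(T x) {
            while (true) {
                List<T> cur = state.get();
                if (cur.size() >= 4) return false;  // bound reached, give up
                List<T> next = new ArrayList<>(cur);
                next.add(x);
                // Publish only if nobody raced us; otherwise loop and retry.
                if (state.compareAndSet(cur, next)) return true;
            }
        }
    }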


Aren't the semantics of this exactly the same as Java's synchronized? What error does it protect you against compared to synchronizing on list? What happens if .insert() also uses atomically{} somehow?


Glad you asked!

synchronized locks code, atomically locks data.

It maps to my thinking better, because what I'm interested in is manipulating two or more resources at the same time. I'm not interested in making sure two threads aren't in the same region of code at the same time.

> What error does it protect you against compared to synchronizing on list?

Synchronized is perfectly fine for my example. I should have picked a better example of two resources (instead of a list and the size of the list), because synchronized gets trickier once you start combining different pieces.

For example, I have two bounded lists, and I want to transfer an element from one to the other. I've already written synchronized versions of remove and insert, so let's try with them:

    transfer(otherlist) {
        x = otherlist.synchronizedRemove();
        this.synchronizedInsert(x);
    }
There will be a moment in time where the outside world can't see x in either list. Maybe I crash and x is gone for good. Or maybe the destination list becomes full and I'm left holding the x with nowhere to put it. So what is to be done? I could synchronize transfer but that still wouldn't fix the vanishing x, or the destination filling up. So I paid the performance cost of taking two/three locks and I've still ended up buggy.

I think the fix here is to lock each list, then no-one else can access them and it should fix the vanishing x:

    transfer(otherlist) {
        synchronized(this) {
            synchronized(otherlist) {
                x = otherlist.synchronizedRemove();
                this.synchronizedInsert(x);
            }
        }
    }
I think that's correct? But now I have taken too many locks - I only needed remove and insert, not synchronizedRemove and synchronizedInsert. And now I've introduced the possibility of deadlock - if two transfers are attempted in opposite directions.

I can fix the too many locks problem, by exposing non-synchronized remove and insert and have transfer call them instead. But then callers and I will accidentally call the wrong one. I break any pretence of encapsulation by exposing unsafe and safe versions of a method. The deadlock is harder to fix. I'd need to synchronize the lists in some agreed order (and have everyone else obey that ordering in their other methods too!).
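
For completeness, here's a sketch of that agreed-order fix (the classic trick of ordering by identity hash code; I'm ignoring the tie-breaking case for brevity):

    void transfer(BoundedList otherlist) {
        Object first = this, second = otherlist;
        if (System.identityHashCode(first) > System.identityHashCode(second)) {
            first = otherlist; second = this;
        }
        synchronized (first) {
            synchronized (second) {            // always locked in the same global order
                Object x = otherlist.remove(); // plain, non-synchronized versions
                this.insert(x);
            }
        }
    }
It works, but every method that ever touches two lists has to remember to do this, which is exactly the fragility I'm complaining about.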

Instead, I want my implementation to look something like:

    BoundedList {

        transactional Int sz;
        transactional List list;

        transactional insert(x) {
            if (sz < 4) {
                list.insert(x);
                sz++;
            }
        }

        transactional remove() {...}

        transactional transfer(otherlist) {
            x = otherlist.remove();
            this.insert(x);
        }
    }
> What happens if .insert() also uses atomically{} somehow?

A good implementation would throw a compile-time error. A bad implementation could throw a runtime error.

In order to do this, transactional actions would need to be marked as such - to prevent mixing them up. atomically by definition is a non-transactional action (because it's the thing that commits all the steps to the outside world) so if you find an atomically inside a transaction, it's a type error.

You've already used a system like this if you've worked with any decent SQL implementation:

    BEGIN TRANSACTION

    UPDATE accounts
    SET money = money - 50
    WHERE accountId = 'src'

    UPDATE accounts
    SET money = money + 50
    WHERE accountId = 'dest'

    COMMIT TRANSACTION
I didn't take any 'locks'. I just wrapped two perfectly good individual actions and said 'run them both or not at all'. Though to be fair, I am getting a lot of grief in another thread for suggesting that even novices could wrap up their SQL like that without getting it wrong.


Thanks for the detailed reply.

So, atomically{} is basically like a SQL transaction: it would retry, or signify failure, if it cannot commit the changes you make inside the code block, similar to a CAS lock-free algorithm. This seems quite limited, though: you are basically constrained to writing code within the atomic block that deals with value types only, and with no side effects. Otherwise how would the compiler or runtime know how to roll it back?

That sounds useful, but it doesn't seem to cover all the use cases of thread synchronization by a long shot. Isn't it also the case that even knowing how to implement interesting algorithms in a lock-free manner is an area of significant ongoing research? For example, I think only recently someone worked out how to implement a lock-free ring buffer (https://ferrous-systems.com/blog/lock-free-ring-buffer/).
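
For what it's worth, the single-producer/single-consumer variant is the tractable one. A minimal Java sketch of that classic Lamport-style design (capacity must be a power of two; the class is illustrative, not production code):

    import java.util.concurrent.atomic.AtomicLong;

    // Safe only with exactly one producer thread and one consumer thread.
    public final class SpscRingBuffer<T> {
        private final Object[] buffer;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // next slot to read
        private final AtomicLong tail = new AtomicLong(); // next slot to write

        public SpscRingBuffer(int capacityPowerOfTwo) {
            buffer = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        public boolean offer(T item) {            // producer only
            long t = tail.get();
            if (t - head.get() == buffer.length) return false; // full
            buffer[(int) (t & mask)] = item;
            tail.lazySet(t + 1);                  // publish after the write
            return true;
        }

        @SuppressWarnings("unchecked")
        public T poll() {                         // consumer only
            long h = head.get();
            if (h == tail.get()) return null;     // empty
            T item = (T) buffer[(int) (h & mask)];
            buffer[(int) (h & mask)] = null;      // allow GC of consumed item
            head.lazySet(h + 1);
            return item;
        }
    }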


> The concepts of threads and concurrent data access is simple enough for any decent programmer to comprehend. There is no hell here.

It's notoriously difficult to reason about concurrent programs using intuition; much more difficult than reasoning about non-concurrent imperative code. This is why there are articles like [0], why a bug in a Wikipedia article on a fundamental concurrency algorithm went unnoticed until an analysis tool detected the issue [1], and why lock-free algorithms in particular are so tricky to get right.

[0] https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedL...

[1] [PDF] https://llvm.org/pubs/2008-08-SPIN-Pancam.pdf
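
To make the intuition trap concrete: double-checked locking, the subject of [0], looked obviously correct to a generation of Java programmers and was broken for years. A sketch of the now-standard fix (the volatile field is what makes it safe under the post-Java-5 memory model):

    public class Foo {
        private volatile Helper helper; // without volatile, another thread may
                                        // see a non-null, half-constructed Helper

        public Helper getHelper() {
            Helper h = helper;
            if (h == null) {
                synchronized (this) {
                    h = helper;          // re-check under the lock
                    if (h == null) {
                        h = new Helper();
                        helper = h;      // volatile write publishes safely
                    }
                }
            }
            return h;
        }
    }

    class Helper { /* expensive to construct */ }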


The threading concept is simple; the real world is not.

If I had no idea about the C10K problem, the success of Nginx and Redis, and other concurrency success stories built on actors, CSP, and message passing over shared memory, I would say threads are OK when you can use them. But they are indeed so simple and tempting that people design shoddy software with them. Software is a very welcoming medium for that. It is hell.



