
I've found message passing and thinking in transactions are pretty good architectural patterns for maintaining one's sanity. Message queues with spinlocks for performance-critical code (mutexes are slow).
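For illustration, a minimal sketch of the message-queue pattern (here with a mutex and condition variable for the blocking behaviour; substitute whatever locking you prefer):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    // Minimal blocking message queue: one thread pushes, another pops,
    // and nothing else is shared between them.
    class MessageQueue {
        std::queue<std::string> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(std::string msg) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
            cv_.notify_one();
        }
        std::string pop() {                    // blocks until a message arrives
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            std::string msg = std::move(q_.front());
            q_.pop();
            return msg;
        }
    };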

The biggest sin generally is to disregard the overhead caused by thread management and to assume that more threads make the software run faster.

I've seen people try to parallelize a sequential program by just spawning mutexes everywhere and then assuming any number of threads can do whatever they please. Of course, when tested, the system was quite a bit slower than when it ran on a single thread (the system was quite large, so a lot of work was needed before reaching this state).




> spinlocks for performance critical code (mutexes are slow).

Gah, no. Userspace spinlocks are the deepest of voodoo and something to be used only by people who know exactly what they are doing and would have no difficulty writing "traditional" threaded code in a C/C++ environment. Among other problems: what happens when the thread holding the spinlock gets preempted and something else runs on the core? How can you prevent that from happening, and how does that collision probability scale with thread count and lock behavior?
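For concreteness, the kind of "pure" userspace spinlock being warned about is roughly this (a sketch): if the holder gets preempted, every waiter just burns its whole timeslice.

    #include <atomic>

    // A "pure" userspace spinlock: spins forever until the flag clears.
    class Spinlock {
        std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    public:
        void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) { /* burn CPU */ } }
        void unlock() { flag_.clear(std::memory_order_release); }
    };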

Traditional locking (e.g. pthread mutexes, Windows CriticalSections) in a shared memory environment can be done with atomic operations only for the uncontended case, and will fall back to the kernel to provide blocking/wakeup in a clean and scalable way. Use that. Don't go further unless you're trying to do full-system optimization on known hardware and have a team full of benchmark analysis experts to support the effort.


> Gah, no. Userspace spinlocks are the deepest of voodoo and something to be used only by people who know exactly what they are doing and would have no difficulty writing "traditional" threaded code in a C/C++ environment.

If you've ever used pthread_mutex with glibc, then you've used spinlocks without knowing it. The implementation spins for some time before falling back to a full kernel wait.


There's a bit of terminology confusion here. When people say "spinlock", they usually mean a pure spinlock that spins forever until it acquires the lock.

"Mutex" on the other hand might have a fast-path that spins a few times before inserting the thread onto a wait list.


I think the warning is directed against rolling your own spinlocks without careful consideration, including a realistic evaluation of whether you know every issue involved in doing so.


Sure, and C library authors qualify. App developers don't.


An uncontended mutex is just as fast as a spinlock (on modern operating systems using a futex). It takes about 25 nanoseconds to lock it.

The difference is when there's contention. A spinlock will burn CPU cycles but a mutex will yield to another thread or process (with some context switch overhead).

A spinlock should only be used when you know you're going to get it in the next microsecond or so. Or in kernel space when you don't have other options (e.g. interrupt handler). Anything else is just burning CPU cycles for nothing.

Mutex and condition variables (emphasis on the latter) are much more useful than spinlocks and atomics for general multithreaded programming.
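For instance, the usual condition-variable idiom for "sleep until something happens" (a minimal sketch):

    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool ready = false;                    // the "something happened" predicate

    void signaller() {
        { std::lock_guard<std::mutex> lk(m); ready = true; }
        cv.notify_one();                   // wake the waiter, no busy-waiting
    }

    void waiter() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return ready; }); // sleeps until notified and ready
    }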


You have to be careful with those things - for instance, you have to special-case whether you're on a single-CPU or multi-CPU system, because a spin lock will block forever in a non-preemptive context, such as the kernel.

Outside of hard realtime code, there's zero reason to use spin locks.


In hard realtime code, a FIFO mutex gives you bounded wait time because you have a fixed number of threads.

Interesting read: http://www2.rdrop.com/~paulmck/realtime/SMPembedded.2006.10....


> Outside of hard realtime code, there's zero reason to use spin locks.

That is just not true. You _must_ use them in the case where the kernel is non-preemptable. Additionally, if the locked resource is held for a very short time, a spin lock is likely a more efficient choice than a traditional mutex.


> Outside of hard realtime code, there's zero reason to use spin locks.

On some common architectures, releasing a spin lock is cheaper than releasing a mutex.


On all architectures, releasing a mutex requires at least a branch (to see if you need to wake up sleeping threads) that you don’t need with a pure spinlock.

But if you don't have a guarantee the lock owner won't be preempted, well, spinning for a whole timeslice is quite a bit more expensive…


Spin locks are a tough sell in a preemptable context. Say you have two processes that share some memory location. They both briefly access it for a very short time, so you protect it with a spin lock. Well, what happens in the case when one of the threads is preempted while holding the lock? The other thread would try to acquire it, and just spin for its entire timeslice. No bueno. When you call spin_lock() in the kernel it actually disables preemption until you call spin_unlock() to avoid this. You can't disable preemption from userspace. There might be a use for a spin lock if you have a process that is SCHED_RR, but I haven't seen it.


You are not wrong in general, but note that there are ways to disable preemption in userspace in practice, be it SCHED_FIFO[1] or isolcpus with CPU pinning (a rough sketch of the SCHED_FIFO part below).

[1] you better be careful with spinlocks and priorities here as you can livelock forever.
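A rough sketch of the SCHED_FIFO setup (Linux; needs CAP_SYS_NICE or root, and the priority value is purely illustrative):

    #include <sched.h>
    #include <cstdio>

    // Put the calling thread on the SCHED_FIFO real-time scheduler so normal
    // SCHED_OTHER threads can no longer preempt it.
    int make_fifo(int priority) {
        sched_param sp{};
        sp.sched_priority = priority;
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
            std::perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }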


Can't upvote this enough. If you are using spinlocks because "mutexes are slow", please reconsider: the contended case makes much more sense with mutexes than with spinlocks, and the uncontended case is exactly the same.


And a mutex that you never take is cheaper than one that you do. Taking the futex-backed mutex is fast, but any context switch will be brutal if you're really using threading to increase your performance.

That said I'm used to situations where we're pinning thread affinity to specific cores and really trying to squeeze out what you can from fixed resources.
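For reference, the pinning itself is small (Linux/glibc sketch; needs _GNU_SOURCE, which g++ defines by default, and the core number is whatever your topology dictates):

    #include <pthread.h>
    #include <sched.h>

    // Pin the calling thread to a single core.
    int pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }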


Pinning threads to cores with affinity is another technique that should be used with caution. Under normal circumstances the kernel will run the same thread on the same core as much as possible because CPU migration is expensive.

Setting CPU affinity will ensure that you always get the same core, but it might not increase performance and could adversely affect other parts of the system.

CPU affinity is a good fit for continuously running things like audio processing or game physics or similar. It's not a good fit when threads spend most of their time blocked or reacting to external events.

In most cases it's just unnecessary because the kernel is pretty good in keeping threads on cores anyway.


> mutexes are slow

Be careful with that. First off, what people refer to as "mutex" is usually a spinlock that falls back to a kernel wait queue when the spin count is exceeded. There are even adaptive mutexes that figure out at runtime how long the lock is typically held and base their spin count limit on that.

Secondly, busy-waiting is often worse than a single slow program, because you actively slow down all of the other running programs.
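glibc even exposes the adaptive variant explicitly (Linux/glibc sketch; the _NP suffix means non-portable):

    #include <pthread.h>

    // PTHREAD_MUTEX_ADAPTIVE_NP spins briefly on contention before parking
    // the thread in the kernel (glibc extension, needs _GNU_SOURCE).
    pthread_mutex_t m;

    void init_adaptive_mutex() {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(&m, &attr);
        pthread_mutexattr_destroy(&attr);
    }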


Is there a good message passing library for C++? Sounds like something for the stl to include.


I know coworkers who speak well of ZeroMQ:

http://zeromq.org
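A minimal in-process push/pull pair with the plain libzmq C API looks roughly like this (sketch; the cppzmq header wraps the same calls in C++):

    #include <zmq.h>
    #include <cstdio>

    int main() {
        void* ctx  = zmq_ctx_new();
        void* pull = zmq_socket(ctx, ZMQ_PULL);
        void* push = zmq_socket(ctx, ZMQ_PUSH);
        zmq_bind(pull, "inproc://work");        // bind before connect for inproc
        zmq_connect(push, "inproc://work");

        zmq_send(push, "hello", 5, 0);          // normally done from another thread

        char buf[16] = {0};
        zmq_recv(pull, buf, sizeof(buf) - 1, 0);
        std::printf("got: %s\n", buf);

        zmq_close(push);
        zmq_close(pull);
        zmq_ctx_term(ctx);
    }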


> Is there a good message passing library for C++? Sounds like something for the stl to include.

Qt does it with signals and slots. What I generally do is keep a queue of std::function and just pass lambdas with the captures copied.
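Roughly like this (a sketch of that pattern; captures are copied so the sender's locals can safely go out of scope):

    #include <cstdio>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>

    std::queue<std::function<void()>> tasks;
    std::mutex tasks_mutex;

    void post(std::function<void()> fn) {       // called from any thread
        std::lock_guard<std::mutex> lk(tasks_mutex);
        tasks.push(std::move(fn));
    }

    void drain() {                              // called from the receiving thread
        for (;;) {
            std::function<void()> fn;
            {
                std::lock_guard<std::mutex> lk(tasks_mutex);
                if (tasks.empty()) return;
                fn = std::move(tasks.front());
                tasks.pop();
            }
            fn();                               // run outside the lock
        }
    }

    void example(const std::string& name) {
        post([name] { std::printf("hello %s\n", name.c_str()); });  // capture copied
    }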


Boost has one, but the standard library doesn't.


Using spinlocks can cause livelocks under certain conditions due to priority inversion.



