And it was published as a book (in Japanese, sorry).
And almost all of these algorithms use the Compare-And-Exchange (CAS) instruction, which internally uses locks.
I think that malloc is a sufficiently abstract operation here that its implementation shouldn't determine whether the algorithm as a whole is lock-free or not.
If an algorithm depends on malloc, it needs to prove that a lock-free/wait-free malloc() exists before calling itself lock- or wait-free.
Also, any algorithm can be made mostly malloc-free if you keep a freelist instead of freeing memory; malloc will then only be called enough times to cover peak utilization.
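A minimal sketch of that freelist idea, assuming a Treiber-style LIFO of recycled nodes (the names here are hypothetical, and a real version would need ABA protection, e.g. tagged pointers or hazard pointers):

    #include <atomic>
    #include <cstdlib>

    struct Node {
        void* payload;
        Node* next;
    };

    std::atomic<Node*> freelist{nullptr};

    Node* acquire_node() {
        Node* n = freelist.load();
        // Try to pop a recycled node; fall back to malloc only on a miss,
        // so malloc is called at most enough times to cover peak utilization.
        while (n && !freelist.compare_exchange_weak(n, n->next)) {}
        return n ? n : static_cast<Node*>(std::malloc(sizeof(Node)));
    }

    void release_node(Node* n) {
        // Instead of free(), push the node back for reuse.
        n->next = freelist.load();
        while (!freelist.compare_exchange_weak(n->next, n)) {}
    }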
So these authors are in "the other cultural circle" too: one in which there is more than ten times the competition for funding in the educational system, compared not only to Western and European counterparts but to the Japanese one too, which I think the parent commenter is/was part of. And that's not just funding for cool projects. There is a very dire need for good universities, but getting a higher education at a good university is very difficult.
In maths it is thought to be beneficial to come up with a different proof of the same theorem, so maybe in computing the similar idea applies: coming up with another implementation/demonstration/proof has value too.
C++ is a lot about performance. To do things quicker you create threads, which execute code physically in parallel. Since threads mess things up when working on the same data structure (e.g. a stack) at the same time, locks were introduced. So when pushing a new element onto a stack you acquire the lock, perform the push, and release the lock. Unfortunately, acquiring/releasing the lock can take the vast majority of that whole operation's time. Thus lock-free data structures were developed, which magically do not need locking and thus can operate blazingly fast. I understand the article now introduces a lock-free stack.
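To illustrate, a minimal sketch of that locked version (illustrative C++, not from the article):

    #include <mutex>
    #include <vector>

    std::vector<int> stack;
    std::mutex stack_lock;

    void push(int value) {
        std::lock_guard<std::mutex> guard(stack_lock);  // acquire the lock
        stack.push_back(value);                         // perform the push
    }                                                   // lock released on scope exit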
As I understand it, these data structures have no effect in multi-system environments and can only be applied on a single system. This is because they rely on atomic CPU operations that are not (natively) available across distributed systems.
How does that affect us? Not much, I guess. I have never seen those data structures widely used. In my understanding, almost no application benefits noticeably(!) from the speed-up. Maybe games, but I don't think so. Nonetheless it is always nice to see basic building blocks improved.
> As I understand it, these data structures have no effect in multi-system environments and can only be applied on a single system
I have no idea where to even begin with this statement. By this logic, no optimizations should be made to operations in a single-threaded or single-process environment. Distributing computation helps you achieve scale and HURTS latency, except for extremely long-running tasks.
And yes, in games these sorts of things REALLY matter. Things are measured in tens to hundreds of microseconds and optimized accordingly. Not just games; have you noticed how slow software has gotten of late? Deep learning, rendering, neural networks, AI, etc. Heck, even the browser that I'm typing this in is slow as shit and likely using far more memory than it should. What about compilers? Those things use trees, stacks, queues, etc. and are getting slower all the time. It's the attitude that programmers don't need to care about this stuff that will, in my opinion, permanently separate engineers at the top of the curve from the rest.
Precisely. The engineers at the top will understand cost benefit analysis and be okay with sub-optimal solutions while the rest sit and rant about how slow something is.
I understand where you are coming from, but when optimizing, such low-level considerations should be among the last resorts because they matter so little. Most of the time an additional caching structure or the like solves any performance issue.
This is a wrong assumption. With lock-free data structures you don't skip waiting for your right to access data; you just avoid using a lock. It may or may not be faster depending on conditions, and it is a whole lot harder to get right.
A lot of applications could benefit from a significant speed-up. But atomic operations cannot help with that; they are too slow, since they still have to do all that nasty synchronization underneath.
It's more expensive than no synchronization, but it's the cheapest method (that I know of) of syncing state across threads on multiple CPU architectures.
The only thing cheaper is immutable structures, but those obviously limit your ability to change state across threads.
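For instance, a hedged sketch of that immutable approach: readers grab a pointer to a frozen snapshot, and a writer publishes a whole new snapshot with one atomic store (reclamation of old snapshots is deliberately omitted here):

    #include <atomic>
    #include <vector>

    struct Snapshot { std::vector<int> data; };  // never mutated once published

    std::atomic<const Snapshot*> current{new Snapshot{}};

    const Snapshot* read_state() {
        // Readers never block and never see a half-updated structure.
        return current.load(std::memory_order_acquire);
    }

    void publish(std::vector<int> next) {
        // Build a fresh snapshot off to the side, then swap it in atomically.
        current.store(new Snapshot{std::move(next)}, std::memory_order_release);
        // NOTE: the old snapshot leaks here; real code needs hazard pointers,
        // RCU, or reference counting to reclaim it safely.
    }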
This paper deals with the computing process to be used (proposed) in highly parallel and distributed computing; specifically, how it manages and updates output data.
(Warning: this is a not-so-great example of lock-free/wait-free.)
If you have experience with Linux: you can run only one apt-get or yum install at a time, because there is a 'lock' which prevents you from installing multiple packages at once, to prevent collisions in the system data they modify.
In a similar vein, in parallel or distributed computation (read: many threads), the threads require access to the same data while executing the algorithm.
So the "stack" is the memory that is in use during a computation. When there is a lock on a particular sector, other programs cannot access it and have to wait for the thread currently modifying it to finish before making their own changes, so that they don't corrupt it (mess up the data). This is disadvantageous since you are not getting maximum utilization of the parallel threads; threads are kept waiting. So a lock-free stack means threads don't have to wait for a lock to be removed before they can modify anything.
So how will this affect us? It makes parallel computing more efficient than it already is.
See this for more info:
(It's called non-blocking since it is not 'blocking' any other threads. And I suggest you read this comment before you jump into the link: https://news.ycombinator.com/item?id=12109439)
Your run-of-the-mill lock-free code basically operates under the premise of "keep trying until something sticks." For example, with only CompareAndSwap you can implement lock-free bit-field operations as:
do {
    oldValue = Read(variable);
    newValue = oldValue | 0x01;
} while ((CAS(variable, oldValue, newValue) & 0x01) == 0);  // CAS is assumed to return the variable's current value
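For reference, a minimal sketch of the same retry idiom against the real C++11 std::atomic API (the exact types and function name here are my assumption, not the commenter's):

    #include <atomic>

    std::atomic<unsigned> variable{0};

    void set_low_bit() {
        unsigned oldValue = variable.load();
        // compare_exchange_weak writes the current value back into oldValue
        // on failure, so each retry starts from a fresh snapshot.
        while (!variable.compare_exchange_weak(oldValue, oldValue | 0x01)) {}
        // (On most platforms variable.fetch_or(0x01) achieves this in one step.)
    }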
There are two operations on a stack: Push() and Pop(). Classically, these both compete to write the head (write contention). The advantage with this algorithm is that only Push() competes to write the head, so you're decreasing the probability of an unbounded retry, O(∞), on each Push(). Additionally, Pop() seems to be O(N) (where N = the number of items in the stack), whereas classically that is also O(∞).
TL;DR Unless I'm misunderstanding wait-free (having never heard of that technical term before), it means eliminating a theoretically unbounded loop.
If you'd like to learn more, I consider Jeff Preshing the authority on lock-free data structures. His articles are clear and could explain how these structures work to "ostensibly anyone." CppCon also has at least one talk of his.
Lock-free stacks have been around since at least the '80s, but I haven't seen a general wait-free stack before (though I'm no expert).
In neither this paper nor the classic lock-free stacks do you have lock contention on writes, as you are using CAS operations.
I'm having a little trouble parsing the algorithm in this paper, but what they seem to have added that is novel is that their cleanup function always completes in a finite number of steps, making the whole thing both wait-free and bounded, which is pretty neat.
In any case, with wait-free structures the implementation details generally matter a lot, and stacks are notoriously hard to handle because of the contention around the top node, as opposed to something like the append-only log Kafka uses, so that particular comparison is not fair.
Given that real-world locks are most often implemented using CAS, is this distinction a valuable one? In either case you have contention on a memory location between multiple threads trying to modify that memory location.
In lock-based algorithms, that is not the case: the thread holding the lock can be suspended, killed, or go off on a wild goose chase, and the algorithm cannot make progress. Only that thread may modify the state, and the rest must wait.
That's what made it click. Thanks!
In the case "real world performance" it has to do with the implementation of the lock and the architecture, but the CAS intrinsics on the architecture are almost certainly going to be the most performant way to do memory sync, so either the lock elides to those or it doesn't but if you are doing the CAS yourself you know what you are using.
For the purposes of these algorithm discussions, though, it can matter a lot because of the semantics of the lock. To prove something is lock-free you have to prove that some thread will complete in a finite number of steps, and lots of lock semantics don't allow for that.
For wait-free, all of the threads must complete, and I don't know of any locking semantics that allows for this property (but I'm definitely not an expert and usually muddle through lock/algorithm papers).
I think it's important to recognize that lock-free/wait-free matters more for real-time systems than for "performance" (in terms of request latency). Real-time systems are often practically slower than a non-real-time version of the same thing, but they have a more consistent worst case. This is where lock-free/wait-free is especially useful.
On the other hand, if you combine interrupt-disabling with a fair lock, then your lock wait time can be bounded in every thread. This is probably only an option inside the kernel, though.
In userspace, and with more threads than CPUs, the lock-free approach will probably save you a lot of context-switching time. It will also be at least as fine-grained as the most fine-grained locking scheme you can come up with, likely with less overhead.
So I'd say that the lock-free approach is an optimistic strategy, and I don't see the benefit except for performance. I say this as someone who really likes lock-free programming.
Priority inversion was what I was thinking of. When you say it's a fixable problem: the common way to fix it is to have a real-time scheduler deal with it. Lots of systems don't have real-time schedulers available to them, so lock-free algorithms make sense there, and you deal with the starvation issue instead of the inversion issue.
That said, I'm no expert on real-time systems, and my use of lock-free algorithms has fallen into the exact category you are talking about: wanting to avoid context switches in userland. But that is more a property of the OSes I use than of the algorithms themselves.
PS. The difference is that lock-free algorithms improve throughput _at the expense of single-thread latency_, and thus are more useful for satisfying system-wide throughput requirements, e.g. for a DBMS. Wait-freedom additionally guarantees freedom from starvation.
I fail to see how under contention the statement above is "wait-free".
Having globalPhase is another contention point (although atomic adds are wait-free on most hardware nowadays).
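(For what it's worth, a single atomic add is indeed bounded; e.g. in C++, std::atomic::fetch_add maps to one LOCK XADD instruction on x86, so every caller finishes in a fixed number of steps. A tiny sketch, using the globalPhase name from above:)

    #include <atomic>

    std::atomic<long> globalPhase{0};

    long next_phase() {
        // One hardware instruction (LOCK XADD on x86): bounded steps for
        // every caller regardless of contention, i.e. wait-free.
        return globalPhase.fetch_add(1);
    }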
Not certain if 'tid' is a thread ID, but having an AtomicReferenceArray (all notation appears to stem from java.util.concurrent) of arbitrary length is certainly not a wait-free operation, hence the announce would fail to be wait-free. Using the CPU id instead is hard as well, unless all threads are core-bound. Moreover, if the array (announce) is not of fixed length, the entire algorithm cannot finish in a finite number of steps.
mytop <- top.get()
curr <- mytop
while curr != sentinel do
    mark <- curr.mark.getAndSet(true)
There are three levels of progress guarantees for non-blocking data structures. A concurrent object is:
- obstruction-free if a thread can perform an arbitrary operation on the object in a finite number of steps when it executes in isolation,
- lock-free if some thread performing an arbitrary operation on the object will complete in a finite number of steps, or
- wait-free if every thread can perform an arbitrary operation on the object in a finite number of steps.
Wait-freedom is the strongest progress guarantee; it rules out the possibility of starvation for all threads. Wait-free data structures are particularly desirable for mission critical applications that have real-time constraints, such as those used by cyber-physical systems.
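To make the middle guarantee concrete, here is a sketch of the classic Treiber-stack push (not the paper's algorithm): it is lock-free but not wait-free, since a single thread can retry forever while other threads keep succeeding:

    #include <atomic>

    struct Node { int value; Node* next; };

    std::atomic<Node*> top{nullptr};

    void push(int v) {
        Node* n = new Node{v, top.load()};
        // Each failed CAS means some other thread changed top, i.e. some
        // operation completed, so the system as a whole makes progress
        // (lock-free) even though this particular thread may retry forever.
        while (!top.compare_exchange_weak(n->next, n)) {}
    }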
> In this paper, we describe an algorithm to create a wait-free stack. A concurrent data structure is said to be wait-free if each operation is guaranteed to complete within a finite number of steps. In comparison, the data structure is said to be lock-free if at any point of time, at least one operation is guaranteed to complete in a finite number of steps. Lock-free programs will not have deadlocks but can have starvation, whereas wait-free programs are starvation free.
So, it's wait free until this happens?