
A Wait-Free Stack - EvgeniyZh
https://arxiv.org/abs/1510.00116
======
kumagi
I already wrote a wait-free stack:
[https://gist.github.com/kumagi/d259274270fdc1385f81](https://gist.github.com/kumagi/d259274270fdc1385f81)
It is much more difficult than a lock-free stack:
[https://gist.github.com/kumagi/b9a4715b1ce0dd511922](https://gist.github.com/kumagi/b9a4715b1ce0dd511922)

And published it as a book (in Japanese, sorry):
[http://longgate.co.jp/books/grimoire-vol3.html](http://longgate.co.jp/books/grimoire-vol3.html)

~~~
amaks
You call malloc in your implementation -- you realize that memory allocation
takes a lock at some point, right? To make it truly lock- or wait-free you
need to implement a corresponding memory allocator as well.
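
For illustration, here is a minimal sketch (my own, not from the paper or the
gists above) of the kind of allocator that sidesteps malloc: a fixed-capacity
pool that hands out nodes with a single fetch_add, which is itself wait-free.
The Node and NodePool names are hypothetical.

    #include <atomic>
    #include <cstddef>

    // Hypothetical node type for a stack of ints.
    struct Node {
        int value;
        Node* next{nullptr};
    };

    // Fixed-capacity pool: allocation is one fetch_add, so every thread
    // finishes in a bounded number of steps (wait-free). Nodes are never
    // freed here; a real design would recycle them, e.g. via a free list
    // protected by hazard pointers.
    template <std::size_t Capacity>
    class NodePool {
        Node nodes_[Capacity];
        std::atomic<std::size_t> next_{0};
    public:
        Node* allocate() {
            std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
            return i < Capacity ? &nodes_[i] : nullptr;  // nullptr: exhausted
        }
    };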

~~~
rfw
With that reasoning, you should write your own kernel too -- you realize that
the kernel scheduler will take locks at some point, right?

I think that malloc is a sufficiently abstract operation here that its
implementation shouldn't determine whether the algorithm as a whole is
lock-free or not.

~~~
haberman
Lock-free algorithms don't usually depend on an OS being present. They could
just as easily run in an environment that has no scheduler. But if the
algorithm calls malloc, that is a hard dependency.

If an algorithm depends on malloc, it needs to prove that a lock-free or
wait-free malloc() exists before it can call itself lock- or wait-free.

------
duiker101
So uhm, the top 2 links on HN in the last 5 hours have been about this, and
while they are highly voted there is very little discussion going on. Can
someone give some context as to why this is attracting so much (surely
deserved) attention? How will this affect us? Is it likely to have a deep
effect on general computing performance? In what ways will this be applied?
I'm sorry if these are silly questions, but I find the lack of discussion
interesting.

~~~
badestrand
I remember the topic of lock free data structures from my time working with
C++ a few years ago.

C++ is a lot about performance. To do things quicker you create threads that
physically execute code in parallel. Since they mess things up when working on
the same data structure (e.g. a stack) at the same time, locks were
introduced. So when pushing a new element onto a stack you acquire the lock,
perform the push and free the lock. Unfortunately, acquiring and freeing the
lock can take the vast majority of that whole operation's time. Thus lock-free
data structures were developed which magically did not need locking and thus
could operate blazingly fast. I understand the article now introduces a
lock-free stack.
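
The lock-based version I am describing looks roughly like this (a sketch, not
the paper's code):

    #include <mutex>
    #include <stack>

    // Acquire the lock, perform the push, free the lock. Under contention,
    // all threads serialize on the mutex.
    class LockedStack {
        std::stack<int> items_;
        std::mutex m_;
    public:
        void push(int v) {
            std::lock_guard<std::mutex> guard(m_);  // acquire
            items_.push(v);                         // the actual push
        }                                           // lock freed on scope exit
    };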

As I understand it, these data structures have no effect on multi-system
environments and can only be applied to a single system. This is because they
rely on atomic CPU operations that are not available (natively) on distributed
systems.

How does that affect us? Not much, I guess. I have never seen those data
structures widely used. In my understanding almost no application benefits
noticeably(!) from the speedup. Maybe games, but I don't think so. Nonetheless
it is always nice to see basic building blocks improved.

~~~
banachtarski
Lock free != wait free

> As I understand these data structures have no effects on multi-system
> environments and can only be applied to a single system

I have no idea where to even begin with this statement. By this logic, no
optimizations should be made to operations in a single-threaded or
single-process environment. Distributing computation helps you achieve scale
and HURTS latency except for extremely long-running tasks.

And yes in games these sorts of things REALLY matter. Things are measured in
tens to hundreds of microseconds and optimized accordingly. Not just games,
but have you noticed how slow software has gotten of late? Deep learning,
rendering, neural networks, AI, etc. Heck, even the browser that I'm typing
this in is slow as shit and likely using far more memory than it is supposed
to. What about compilers? Those things use trees, stacks, queues, etc and are
getting slower all the time. It's the attitude that programmers don't need to
care about this stuff that will permanently separate, in my opinion, engineers
at the top of the curve from the rest.

~~~
badestrand
Sorry, but that optimization elitism does not get anyone anywhere. Most
software written today is quite high level, be it the >2 million apps or the
>1 billion websites. In my household neither the smoke detector, the toaster,
nor the washing machine needs faster-executing code.

I understand where you are coming from, but when optimizing, such low-level
considerations should be among the last because they matter so little. Most of
the time an additional caching structure or the like solves any performance
issue.

~~~
banachtarski
I mentioned in my reply specific applications where performance really does
matter. I don't understand how the strawman of pointing to things that I also
readily acknowledge are not performance-sensitive helps any.

------
bnjmn
Make sure you check out Appendix A at the end of the paper (Asymptotic Worst-
Case Time Complexity), in case you imagined the name "stack" implies constant-
time push/pop performance. This data structure is only a "stack" in the sense
that it provides last-in-first-out access.

~~~
chrisseaton
Is it possible for a contended concurrent stack accessed by an arbitrary
number of threads to have guaranteed constant-time operations? I wouldn't
think that was a realistic expectation.

~~~
aidenn0
A lock-based stack can be O(M) in the number of threads accessing it. Judging
by the abstract (I haven't read the paper), this is O(N) in the size of the
stack.

~~~
EvgeniyZh
"in terms of the number of concurrent threads in the system (N), the actual
size of the stack(S) and the parameter W ... as soon as W consecutive nodes
get marked ... the worst case time complexity of the pop operation is O(NWS)"

------
zilchers
I'm not sure I see what's novel about this - maybe it's a verbiage thing
around "wait free," but if they're atomically updating the top pointer and
linked list, there will be lock contention on writes, and similarly when
marking an item popped, on reads. I suppose the contention is bounded by the
number of readers or writers, but I wouldn't consider that wait free (again,
that could just be a verbiage thing). But, more to the point, this is just a
slight twist on how Kafka works (stack vs log/queue, but the same
pointer-holding place and a cleanup operation), so I don't really see it as
particularly novel... perhaps I'm missing something?

~~~
kasey_junk
A "wait free" data structure has a specific technical definition so yes, in
some ways the novelty in this is that it meets that verbiage.

Lock free stacks have been around since at least the 80s but I haven't seen a
general wait free stack before (though I'm no expert).

In neither this paper nor the classic lock-free stacks do you have lock
contention on writes, as you are using CAS operations.

I'm having a little trouble parsing the algo in this paper, but what they seem
to have added that is novel is that their cleanup function will always
complete in a finite number of steps, thus making the whole thing both
wait-free and bounded, which is pretty neat.

In any case, with wait-free structures implementation details generally matter
a lot, and stacks are notoriously hard to handle because of the contention
around the top node, as opposed to something like the append-only log Kafka
uses, so that particular comparison is not fair.

~~~
jemfinch
> In neither this paper nor the classic lock-free stacks do you have lock
> contention on writes, as you are using CAS operations.

Given that real-world locks are most often implemented using CAS, is this
distinction a valuable one? In either case you have contention on a memory
location between multiple threads trying to modify that memory location.

~~~
scott_s
Yes, it is a real distinction. In lock-free algorithms, progress is
guaranteed. If they are not wait-free, then they tend to update some values,
then try to commit those values with CAS. If the CAS fails, they try again.
Wait-free algorithms do the first part, but on the failure, they don't try
again, they go off and do something different. All threads can make progress,
even if some thread is suspended or dies; no thread has exclusive access to
modify the state.

In lock-based algorithms, that is not the case; the thread with the lock can
be suspended, killed, or go off on a wild goose chase, and the progress of the
algorithm cannot continue. Only that thread may modify the state, and the rest
must wait.
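
A minimal sketch of that retry pattern (the classic Treiber-style push, not
this paper's algorithm):

    #include <atomic>

    struct Node { int value; Node* next; };

    std::atomic<Node*> top{nullptr};

    // Lock-free but not wait-free: on CAS failure we retry. A failure,
    // though, means some other thread's CAS succeeded, so the system as
    // a whole made progress.
    void push(Node* n) {
        n->next = top.load(std::memory_order_relaxed);
        while (!top.compare_exchange_weak(n->next, n,
                                          std::memory_order_release,
                                          std::memory_order_relaxed)) {
            // The failed CAS reloaded the current top into n->next;
            // just try again.
        }
    }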

~~~
gpderetta
You did not imply otherwise and your description is great, but I wanted to
clarify that an algorithm that retries a CAS can still be lock-free (just not
wait free) if the failed CAS implies some other thread made progress.

~~~
scott_s
Yes, absolutely.

------
nemetroid
The first few lines of pop() are:

      mytop <- top.get()
      curr <- mytop
      while curr != sentinel do
        mark <- curr.mark.getAndSet(true)
        ...

What's keeping curr from becoming a dangling pointer if the size of the stack
is bounded?

------
programmer_dude
What is a wait-free stack?

~~~
wcrichton
Wait-freedom is described in the introduction. Quote:

There are three levels of progress guarantees for non-blocking data
structures. A concurrent object is:

- obstruction-free if a thread can perform an arbitrary operation on the
  object in a finite number of steps when it executes in isolation,

- lock-free if some thread performing an arbitrary operation on the object
  will complete in a finite number of steps, or

- wait-free if every thread can perform an arbitrary operation on the object
  in a finite number of steps.

Wait-freedom is the strongest progress guarantee; it rules out the possibility
of starvation for all threads. Wait-free data structures are particularly
desirable for mission critical applications that have real-time constraints,
such as those used by cyber-physical systems.
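
A tiny illustration of the gap between the last two levels, using a counter
rather than a stack (my example, not the paper's):

    #include <atomic>

    std::atomic<long> counter{0};

    // Lock-free increment: may retry under contention, but a failed CAS
    // means another thread's increment committed.
    void inc_lock_free() {
        long old = counter.load(std::memory_order_relaxed);
        while (!counter.compare_exchange_weak(old, old + 1,
                                              std::memory_order_relaxed)) {
            // 'old' was refreshed by the failed CAS; retry.
        }
    }

    // Wait-free increment: a single hardware fetch-and-add completes in a
    // bounded number of steps for every thread, regardless of contention.
    void inc_wait_free() {
        counter.fetch_add(1, std::memory_order_relaxed);
    }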

------
xchip
I love papers, but I love it even more when their code is on GitHub :) Thanks
for sharing!

~~~
EvgeniyZh
Yeah, I'd love to see an implementation too. Even better: an implementation in
some lib with a performance comparison.

------
amaks
How often would you find a situation where a lock-free queue or stack would
bring huge performance gains? Usually bad performance comes from a poor choice
of data structure(s), bad data locality, or locking that is too coarse or too
fine, causing livelocks/convoys/excessive context switches, etc. What I'm
saying is that using a lock-free or wait-free algorithm is not a panacea.

------
appleflaxen
More discussion on the PDF link:

[https://news.ycombinator.com/item?id=12109219](https://news.ycombinator.com/item?id=12109219)

~~~
bhouston
Different paper - queue vs stack. Confused me as well.

------
pzh
Correct me if I'm wrong, but wasn't this already described in The Art of
Multiprocessor Programming?

[https://www.amazon.com/gp/aw/d/0123973376/ref=mp_s_a_1_1?ie=...](https://www.amazon.com/gp/aw/d/0123973376/ref=mp_s_a_1_1?ie=UTF8&qid=1468773145&sr=8-1&pi=SY200_QL40)

~~~
chrisseaton
No? It has a lock-free stack though.

------
snarfy
> Subsequently, it is lazily deleted by a cleanup operation.

So, it's wait free until this happens?

~~~
kasey_junk
No, the cleanup operation is also wait free, which they claim is what makes
this novel.

~~~
blaisio
Ah, they should really have put that in the description because otherwise this
doesn't seem like that great of an achievement.

