This document presents the relevant correctness proofs for the lock-free
unbounded MPMC FIFO queue implemented in this library called loo::queue.
Informally, linearizability as defined by Herlihy requires for every data structure operation to appear to take effect instantaneously to an external observer at some specific point in between being initiated and concluding. For this to be the case, every operation must have a linearization point (LP) at which its effects become observable.
Lemma 1.1. Every enqueue operation E(v) takes effect atomically at one statement.
Proof. The LP for an enqueue operation is the instant in which the inserted element v becomes observable by other threads and can be dequeued.
There are two separate opportunities for linearization during the course of
the enqueue algorithm.
The first potential LP is the atomic FAA that performs a bitwise write of v
into the reserved slot.
However, linearization only occurs at this point and the element v becomes
observable to a dequeuer, if the slot did not already contain the READER
bit before the FAA completed, which could have been set by a competing FAA on
the same slot by the corresponding dequeue operation.
In this case, linearization occurs either at some subsequent retry at this point
or the second potential LP.
The second opportunity for linearization is at the bounded CAS loop after a
successful CAS that appends a newly allocated node with v tentatively
inserted in the first slot to the current tail node by exchanging its next
pointer.
The CAS that eventually becomes the LP does not necessarily have to succeed and
it is sufficient to determine, that the tail pointer has been advanced.
In either case, the new tail node will have become visible to dequeueing threads
and the write of v into the first slot is finalized.
Lemma 1.2 Every dequeue operation D() takes effect atomically at one statement.
Proof. The LP for a dequeue operation is the instant in which the queue is either (a) determined to be empty or (b) an element in a slot is consumed so that it can be returned and also not be dequeued by any other other thread.
There are two potential LPs for determining a queue to be empty and another
potential LP for consuming an element.
The first opportunity for determining an empty queue is the tail pointer load
before incrementing the dequeue index.
The second opportunity is at the load of the current head node's next pointer
in the slow path.
The potential LP at which an element can actually be dequeued is the FAA that
sets the READER bit in the reserved slot and returns the previous slot's
state.
Linearization occurs at this point, only if the previous state contains some
non-zero bit pattern in the bits reserved for storing the element itself, which
indicates an enqueue operation has linearized at this slot earlier and inserted
an element.
Otherwise, the operation will linearize at some subsequent retry at either this
point, or at one of the two empty checks, if the queue has become empty in the
meantime due to concurrent dequeue operations.
Because every dequeue operation has to first atomically increment the dequeue index (and subsequently use its previous value), it follows that every slot can only be accessed by exactly one dequeue operation and therefore every element can be dequeued at most once.
Both enqueue and dequeue operations are required to atomically increment their
associated index value, which is stored in the same memory word as the pointer to the respective node.
Given the two constants
Definition 2.1 Let
Lemma 2.1 The largest value the enqueue index can possibly assume is
Proof. Assume that the current value of the enqueue index is
Definition 2.2 Let
Lemma 2.2 The largest possible value the dequeue index can possibly assume is
Proof. Assume that the current value of the dequeue index is
Further assume that there are exactly NULL and hence, none of these threads can ever fail the second empty check at line H3 again for the same
Our queue has full non-blocking and lock-free progress guarantees.
For the fast path and in the general case, all threads can make progress by simply inserting into or consuming from their reserved array slot.
However, this can not be guaranteed, since it is possible for a slot having to be abandoned and both corresponding operations having to retry.
This can, in fact, theoretically result in a temporary live-lock situation, in which all threads continuously have to abandon one slot after another.
However, this live-lock can last at most for
Threads entering the slow path, on the other hand, may not leave it again, until the desired (path-local) progress has been made by some thread, i.e., when the respective operation's node has been advanced. Therefore, threads may be required to loop inside this path in order to guarantee this property. However, the number of steps required to make progress is bound by the total number of producer or consumer threads: The only possibility for a thread in the slow path to fail at either effecting or observing progress in advancing the respective node is another thread interfering by incrementing the index value (Lines E3 and D8). Since a thread in the slow path can not leave it and increment the same index value again, this provides a "soak-off" effect: The probability for further interfering FAA increments decreases and eventually drops to zero, once all producer or consumer threads have entered the slow path and are "trapped" in it, at which point progress by one of the threads in that path can be guaranteed. This progress will eventually be observed by the other trapped threads, allowing them to proceed as well.