The case for non-blocking synchronization and binary atomic primitives Michael Greenwald Distributed Systsms Group Stanford University Non-blocking synchronization (NBS) has significant advantages over blocking synchronization: however, it has not been used to a significant degree in practice. The two main obstacles to wider deployment of NBS techniques appear to be the performance and ``conceptual complexity'' of non-blocking algorithms. We have found that richer primitives and some simple system structuring techniques can overcome these obstacles. While, in theory, single word atomic primitives (LL/SC or Compare-And-Swap) are {\em universal}, in practice we have found significant advantages to using {\em two} word primitives that can update multiple non-contiguous words (e.g. Double-Compare-And-Swap (DCAS) or, equivalently, a pipelined extension to LL/SC). We have built a multi-processor operating system kernel, the CacheKernel, where synchronization in both user libraries and in the kernel is done exclusively with NBS and hardware DCAS. We present three major points. First, given the appropriate system structure and primitives, non-blocking algorithms are both efficient and easy to design and verify. We have developed a set of techniques and algorithms based on this infrastructure. These include a simple, mechanical, transformation from arbitrary object locking protocols to non-blocking data-structures, and optimized versions that perform as well as, or better than, previously published NBS algorithms for equivalent data structures. In practice, given DCAS, {\em every} data structure in the Cache Kernel was amenable to one of these optimizations. Second, the usefulness of DCAS in enabling NBS in a real system provides another data-point: the "sweet-point" is hardware support for 2 word universal primitives. Unary synchronization primitives are impractical, and primitives with arity 3 or more provide only marginal improvement over 2. All real systems that exclusively use NBS, seem to use 2 word primitives or use special-case hacks that treat CAS as operating on 2 half-words. Finally, we have described a simple extension to the load-linked/store conditional (LL/SC) instructions to support double-compare-and-swap functionality: a {\em pipelined} load-linked/store-conditional (LL,LLP/SCP,SC) and show how it can be implemented cheaply in the R4000.