Observe acquire-release in assembly
Acquire-release (acq-rel) is a weaker memory model than sequential consistency (SC). This post illustrates how acq-rel allows more interleaving than SC through examples and assembly analysis.
§Store buffer
A store buffer is a per-CPU buffer that holds pending stores before they are committed to main memory and visible to other CPUs. Within a CPU, operations follow program order, but store buffers can cause surprising behaviors in multi-core environments.
This is demonstrated using a litmus test.
1 | C store-buffer |
Running herd7 -c11 -cat rc11.cat storebuffer.litmus
produces:
1 | Ok |
The result shows that writes to x
and y
can be held in store buffers, making them temporarily invisible to other
CPUs. This behavior is not allowed under SC — verified by running the same test with -cat sc.cat
.
§Assembly
Let’s examine the compiled assembly on two architectures.
First, the C code:
1 |
|
Since both threads follow the same logic, we only show one.
§x64
The corresponding assembly (with -O
):
1 | thread0: |
On x64, acq-rel compiles down to regular mov
instructions. That’s why acq-rel is often described as “free” on x64.
But since x64 allows store-load reordering, the result r1 = r2 = 0
can still occur.
Using SC (memory_order_seq_cst
), the compiler emits xchg
, which flushes the store buffer, ensuring the store becomes
visible to all CPUs. This enforces sequential consistency.
1 | thread0: |
§aarch64
On ARM64 (compiled with clang -O -march=armv8.6-a
), acq-rel generates stlr
(store-release) and ldapr
(load-acquire).
These instructions include memory-ordering semantics and are more costly than regular str
and ldr
.
1 | thread0: |
However, ldapr
may still be reordered with respect to earlier stores. To prevent this and enforce SC semantics, the
compiler uses ldar
instead for memory_order_seq_cst
.
1 | thread0: |
§Summary
Acquire-release ordering allows certain instruction reorderings that sequential consistency does not. This difference shows up in:
- Litmus tests: acq-rel permits outcomes that SC forbids
- Assembly:
- On x64: acq-rel uses
mov
, SC usesxchg
- On aarch64: acq-rel uses
stlr
/ldapr
, SC usesstlr
/ldar
- On x64: acq-rel uses
§References
- https://stackoverflow.com/questions/67397460/does-stlrb-provide-sequential-consistency-on-arm64
- https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/armv8-sequential-consistency
- https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/enabling-rcpc-in-gcc-and-llvm