Observe acquire-release in assembly
Acquire-release (acq-rel) is a weaker memory model than sequential consistency (SC). This post illustrates how acq-rel allows more interleaving than SC through examples and assembly analysis.
§Store buffer
A store buffer is a per-CPU buffer that holds pending stores before they are committed to main memory and visible to other CPUs. Within a CPU, operations follow program order, but store buffers can cause surprising behaviors in multi-core environments.
This is demonstrated using a litmus test.
1 | C store-buffer |
Running herd7 -c11 -cat rc11.cat storebuffer.litmus produces:
1 | Ok |
The result shows that writes to x and y can be held in store buffers, making them temporarily invisible to other
CPUs. This behavior is not allowed under SC — verified by running the same test with -cat sc.cat.
§Assembly
Let’s examine the compiled assembly on two architectures.
First, the C code:
1 |
|
Since both threads follow the same logic, we only show one.
§x64
The corresponding assembly (with -O):
1 | thread0: |
On x64, acq-rel compiles down to regular mov instructions. That’s why acq-rel is often described as “free” on x64.
But since x64 allows store-load reordering, the result r1 = r2 = 0 can still occur.
Using SC (memory_order_seq_cst), the compiler emits xchg, which flushes the store buffer, ensuring the store becomes
visible to all CPUs. This enforces sequential consistency.
1 | thread0: |
§aarch64
On ARM64 (compiled with clang -O -march=armv8.6-a), acq-rel generates stlr (store-release) and ldapr (load-acquire).
These instructions include memory-ordering semantics and are more costly than regular str and ldr.
1 | thread0: |
However, ldapr may still be reordered with respect to earlier stores. To prevent this and enforce SC semantics, the
compiler uses ldar instead for memory_order_seq_cst.
1 | thread0: |
§Summary
Acquire-release ordering allows certain instruction reorderings that sequential consistency does not. This difference shows up in:
- Litmus tests: acq-rel permits outcomes that SC forbids
- Assembly:
- On x64: acq-rel uses
mov, SC usesxchg - On aarch64: acq-rel uses
stlr/ldapr, SC usesstlr/ldar
- On x64: acq-rel uses
§References
- https://stackoverflow.com/questions/67397460/does-stlrb-provide-sequential-consistency-on-arm64
- https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/armv8-sequential-consistency
- https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/enabling-rcpc-in-gcc-and-llvm