Memory Model

Memory model (MM) defines in what order writes from one thread are propagated to other threads. It’s undoubtedly a complex problem, and it took me >5 years to not feel completely lost concerning it. MM is provided by either the language/compiler or the architecture. If you don’t work on the compiler, learning the language/compiler MM is enough; otherwise, you need to know both the language/compiler and the architecture MM, and how to connect the two (they are usually different). In this post, I will cover two MMs offered by C11/C++11; no architecture specific MMs are discussed.

§Sequential Consistency (SC)

Sequential Consistency (SC) means that conceptually all memory operations (read/write) go through a single unit, and the effect of each memory operation is visible immediately to all following memory operations. One can picture that all statements from different threads are interleaved to form a single stream. This interleaving semantics is easy and intuitive for most programmers, so languages often use this by default, and programmers are recommended to stick to the default.

Implementing SC means that all accesses to shared resources must go through the main memory, which is prohibitively expensive, resembling a cache-less system. Alternatively, programmers can annotate the shared resources that are actually used for inter-thread communication, and only accesses to those special resources are synchronized, while others could still be cached. Therefore, instead of maintaining a total order, we can to strive for a partial order for all annotated accesses; acquire-release (acq-rel) is such a relaxed memory model.

§Acquire-release (acq-rel)

Acquire-release means that all memory operations before a release would be visible to another thread synchronizing using the corresponding acquire. Such pairing creates an ordering must be maintained by the compiler/hardware, meaning such the pairing accesses would be a bit more expensive, a pay-as-you-need schema, to some extent.

The classical example to illustrate the difference between these two MMs is shown below. Here, I am using herd7, which is a MM simulator. After running it with herd7 -c11 test.litmus, it reports that our assertion is observable, Positive: 1. This means P2 sees x=10 precedes y=20, while P3 sees y=20 precedes x=10; in other words, there’s no globally agreed ordering regarding the writes to x and y. If we switch to SC (use memory_order_seq_cst), such state becomes not allowed; in other words, such conflicting views from P2 and P3 can’t coexist in SC.

C acq-rel_vs_sc

{}

P0 (atomic_int* y,atomic_int* x) {
    atomic_store_explicit(x,10, memory_order_release);
}

P1 (atomic_int* y,atomic_int* x) {
    atomic_store_explicit(y,20, memory_order_release);
}

P2 (atomic_int* y,atomic_int* x) {
    atomic_int r0 = atomic_load_explicit(x, memory_order_acquire);
    atomic_int r1 = atomic_load_explicit(y, memory_order_acquire);
}

P3 (atomic_int* y,atomic_int* x) {
    atomic_int r0 = atomic_load_explicit(y, memory_order_acquire);
    atomic_int r1 = atomic_load_explicit(x, memory_order_acquire);
}

exists
(  true
/\ 2:r0=10
/\ 2:r1==0
/\ 3:r0=20
/\ 3:r1=0
)

Having a tool to “see” the effect of different memory models is crucial; had I took a serious look at herd, I wouldn’t be wasting so much time reading so many posts online… To make matters worse, the C memory model is a moving target, so many posts are outdated and/or even incorrect, e.g. GCCMM. Therefore, the final message is that just try herd and visualize different memory models for yourself.