§classic acq-rel pairing for inter-thread data sharing

A common pattern in concurrent programming: a producer prepares some data and then sets a signal; the consumer sees the signal and then reads the data. The code looks like:

[init]
data = 0
flag = 0

T0:

data = ...
flag = 1

T1:

if (flag == 1) {
... = data
}

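The recommended memory orders here are release for the write to flag and acquire for the read of flag. flag must also be of an atomic type, because it is read and written concurrently. In contrast, data does not need to be atomic: once the acquire load observes the value written by the release store, the earlier write to data is guaranteed to be visible.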
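The pattern above can be sketched in C11 roughly as follows. This is a minimal illustration, not a definitive implementation: the helper `run_handoff` and the thread function names are hypothetical, and the consumer spins on the flag (rather than checking it once, as in the pseudocode) so that the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data = 0;         /* plain, non-atomic payload */
static atomic_int flag = 0;  /* atomic signal guarding the payload */

static void *producer(void *arg)
{
    data = *(int *)arg;                                    /* prepare the data */
    atomic_store_explicit(&flag, 1, memory_order_release); /* then set the signal */
    return NULL;
}

static void *consumer(void *out)
{
    /* Assumption for this sketch: spin until the release store is observed. */
    while (atomic_load_explicit(&flag, memory_order_acquire) != 1) {
    }
    *(int *)out = data; /* ordered after the acquire load, so it sees the payload */
    return NULL;
}

/* Hypothetical helper: hands `value` from one thread to another. */
int run_handoff(int value)
{
    int seen = -1;
    pthread_t t0, t1;

    /* Reset shared state; pthread_create orders these writes before the threads start. */
    data = 0;
    atomic_store_explicit(&flag, 0, memory_order_relaxed);

    int rc = pthread_create(&t1, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, producer, &value);
    assert(rc == 0);

    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return seen;
}
```

Calling `run_handoff(42)` from `main` returns 42: the consumer leaves its loop only after observing flag == 1, and the release/acquire pair then guarantees it reads the payload written before the flag.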

§dependent load

Next we look at a variant of the above scenario, where data is shared via a pointer.

[init]
data = 0
int* data_ptr = nullptr

T0:

data = ...
data_ptr = &data

T1:

if (data_ptr != nullptr) {
... = *data_ptr // de-ref the data pointer
}

One can continue to use the acq-rel pair as before. However, because data is retrieved through a dependent load, a lighter form of synchronization suffices. The only ordering we actually care about is that data is written before data_ptr is updated. Using acq-rel, by contrast, makes all writes in T0 before the release visible to T1 after the acquire, which is overkill for this scenario.

consume is meant to address this by providing just enough synchronization: only the reads in T1 that depend on the value loaded with consume are guaranteed to see the writes T0 made before the release.
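As a minimal sketch of what "depending on the loaded value" means, the snippet below publishes a struct through an atomic pointer and reads its fields only through that pointer. The type `struct message`, the helper `run_publish`, and the field values are hypothetical and exist only for illustration. Note also that, to my knowledge, mainstream compilers currently implement consume by treating it as acquire, so this sketch shows the shape of the pattern rather than a measurable difference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct message { /* hypothetical payload type */
    int id;
    int value;
};

static struct message slot;
static _Atomic(struct message *) published = NULL;

static void *publisher(void *unused)
{
    (void)unused;
    slot.id = 7;     /* fill in the payload first */
    slot.value = 42;
    /* Publish the pointer; the release orders the field writes before it. */
    atomic_store_explicit(&published, &slot, memory_order_release);
    return NULL;
}

static void *reader(void *out)
{
    struct message *m;
    /* Assumption for this sketch: spin until a non-null pointer is observed. */
    while ((m = atomic_load_explicit(&published, memory_order_consume)) == NULL) {
    }
    /* Both reads go through `m`, so they depend on the consume load. */
    *(int *)out = m->id + m->value;
    return NULL;
}

/* Hypothetical helper: runs one publish/read round and returns id + value. */
int run_publish(void)
{
    int sum = -1;
    pthread_t t0, t1;

    /* Reset the pointer so the helper can be called more than once. */
    atomic_store_explicit(&published, (struct message *)NULL, memory_order_relaxed);

    int rc = pthread_create(&t1, NULL, reader, &sum);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, publisher, NULL);
    assert(rc == 0);

    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return sum;
}
```

Here `run_publish()` returns 49 (7 + 42), because every read in the reader is reached through the pointer obtained from the consume load. A read of some unrelated global in `reader` would not be covered by the same guarantee.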

The following example illustrates the difference between acq-rel and rel-consume. x is read through the loaded pointer (a dependent load), while y is read directly. Therefore, rel-consume guarantees only that read_x gets the updated value; read_y may or may not.

Using acquire: read_x == 1, read_y == 1
Using consume: read_x == 1, read_y == 0 or 1

// Requires a C23 compiler mode (e.g. -std=c2x or -std=c23, depending on
// compiler version): `nullptr` and `auto` type inference are C23 features.
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

int x = 0;
int y = 0;
_Atomic(int*) ptr = nullptr;

// named `did_read` rather than `read` to avoid clashing with POSIX read()
int did_read = 0;
int read_x = 0;
int read_y = 0;

void* t0(void* unused)
{
    (void)unused;
    x = 1;
    y = 1;

    // publish &x; the writes to x and y above are ordered before this store
    atomic_store_explicit(&ptr, &x, memory_order_release);

    return nullptr;
}

void* t1(void* unused)
{
    (void)unused;
    auto l_ptr = atomic_load_explicit(&ptr, memory_order_consume);
    // auto l_ptr = atomic_load_explicit(&ptr, memory_order_acquire);
    if (l_ptr != nullptr) {
        did_read = 1;
        read_x = *l_ptr; // dependent load: goes through the loaded pointer
        read_y = y;      // direct load: does not depend on l_ptr
    }

    return nullptr;
}

int main()
{
    pthread_t threads[2];

    pthread_create(&threads[0], nullptr, t0, nullptr);
    pthread_create(&threads[1], nullptr, t1, nullptr);

    for (auto i = 0; i < 2; ++i) {
        pthread_join(threads[i], nullptr);
    }

    // only check the results if t1 actually observed the published pointer
    if (did_read == 1) {
        assert(read_x == 1); // <- this can *not* fail using `consume` or `acquire`
        // assert(read_y == 1); // <- this can fail using `consume` but can *not* fail using `acquire`
    }
    return 0;
}

§references