From https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence :

atomic_thread_fence imposes stronger synchronization constraints than an atomic store operation with the same std::memory_order. While an atomic store-release operation prevents all preceding reads and writes from moving past the store-release, an atomic_thread_fence with memory_order_release ordering prevents all preceding reads and writes from moving past all subsequent stores.
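A minimal C11 sketch of the quoted rule, assuming POSIX threads. All the names (`data`, `flag_a`, `flag_b`, `reader_args`, `fence_covers_both_flags`) are hypothetical. One release fence precedes two relaxed stores, so an acquire load that reads 1 from *either* flag is guaranteed to see `data == 42`. If the fence were replaced by a release store to `flag_a` alone, a reader of `flag_b` would get no such guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state for this sketch. */
static atomic_int data;
static atomic_int flag_a;
static atomic_int flag_b;

static void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    /* A single release fence orders the store to data before BOTH later stores. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
    atomic_store_explicit(&flag_b, 1, memory_order_relaxed);
    return NULL;
}

struct reader_args {
    atomic_int *flag; /* which flag this reader waits on */
    int seen;         /* value of data observed after the flag reads 1 */
};

static void *reader(void *p) {
    struct reader_args *a = p;
    /* Spin until the flag reads 1; the acquire load pairs with the writer's fence. */
    while (atomic_load_explicit(a->flag, memory_order_acquire) != 1) {
        /* busy-wait */
    }
    a->seen = atomic_load_explicit(&data, memory_order_relaxed);
    return NULL;
}

/* Runs one writer and two readers; returns 1 if both readers saw data == 42. */
static int fence_covers_both_flags(void) {
    atomic_store(&data, 0);
    atomic_store(&flag_a, 0);
    atomic_store(&flag_b, 0);

    struct reader_args ra = { &flag_a, -1 };
    struct reader_args rb = { &flag_b, -1 };
    pthread_t w, t1, t2;
    pthread_create(&t1, NULL, reader, &ra);
    pthread_create(&t2, NULL, reader, &rb);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return ra.seen == 42 && rb.seen == 42;
}
```

Call `fence_covers_both_flags()` from a test `main` and compile with `-pthread`; under the C11 memory model it always returns 1.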

Just a small herd7 example to help me unpack the text.

C fence_vs_atomic

{
[x] = 0;
[y] = 0;
[z] = 0;
}

P0 (atomic_int* x, atomic_int* y, atomic_int* z) {
atomic_store_explicit(x, 1, memory_order_relaxed);

// 1. preceding operations can't move past this store to y, but the later store to z is not ordered by it, so z can become visible before x
atomic_store_explicit(y, 1, memory_order_release);

// 2. preceding operations can't move past any subsequent store (y and z here)
/* atomic_thread_fence(memory_order_release); */
/* atomic_store_explicit(y, 1, memory_order_relaxed); */

atomic_store_explicit(z, 1, memory_order_relaxed);
}

P1 (atomic_int* x, atomic_int* y, atomic_int* z) {
r0 = atomic_load_explicit(z, memory_order_acquire);
if (r0 == 1) {
r1 = atomic_load_explicit(x, memory_order_relaxed);
}
}

exists
( true
/\ 1:r0 == 1
/\ 1:r1 == 0
)

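The specified result can occur with approach 1 but not with approach 2, because a fence offers stronger synchronization than an atomic store. In approach 1 the release store is to y, but P1 reads z, which was written with a relaxed store, so P1's acquire load does not synchronize with P0 and the store to x may become visible after the store to z. In approach 2 the release fence is sequenced before the store to z, so when P1's acquire load reads 1 from z, the fence synchronizes with that load; the store to x then happens before P1's load of x, which forces r1 == 1.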
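The litmus test's approach 2 can also be run as real threads. This is a minimal sketch assuming C11 atomics and POSIX threads; `p0`, `p1`, and `count_forbidden` are hypothetical names. It repeats the test and counts how often the `exists` outcome (`r0 == 1 && r1 == 0`) shows up, which the C11 model forbids for approach 2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y, z;
/* P1's registers: written by the P1 thread, read by the caller only after join. */
static int r0, r1;

static void *p0(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* approach 2 */
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_store_explicit(&z, 1, memory_order_relaxed);
    return NULL;
}

static void *p1(void *arg) {
    (void)arg;
    r0 = atomic_load_explicit(&z, memory_order_acquire);
    r1 = -1; /* -1 means "x was not read" */
    if (r0 == 1) {
        r1 = atomic_load_explicit(&x, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the test `iterations` times; returns how often r0 == 1 && r1 == 0 was observed. */
static int count_forbidden(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; i++) {
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        atomic_store(&z, 0);
        pthread_t t0, t1;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t0, NULL, p0, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 1 && r1 == 0) {
            hits++;
        }
    }
    return hits;
}
```

Swapping in approach 1 (a release store to y, relaxed store to z) makes the outcome allowed, but a count of 0 would prove nothing: x86 does not reorder the two stores in hardware, so you would likely never observe it there, and even on weakly ordered hardware such as ARM or POWER it may be rare. That is why herd7, which enumerates every execution the model allows instead of sampling, is the better tool for the question.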