G1 uses two kinds of barriers to maintain certain GC invariants while mutators update the object graph concurrently.

  1. The pre-barrier is code that mutators execute before a store operation. It maintains the Snapshot-At-The-Beginning (SATB) invariant, i.e. all objects reachable at the start of marking will be marked in that marking cycle.

  2. The post-barrier is code that mutators execute after a store operation. It maintains the cross-generation invariant, i.e. all pointers from old-gen to young-gen must be identified in some way.

In this post, we will focus on the pre-barrier and see its implementation in OpenJDK. (These two kinds of barriers are usually next to each other in the codebase, so one can easily do the same for the post-barrier after going through this post.)
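
To make the SATB invariant concrete, here is a minimal Java sketch of the kind of failure the pre-barrier prevents. It is a toy simulation, not HotSpot code: the Node class, the SatbDemo.run method, and the hand-written interleaving of marker and mutator steps are all assumptions made for illustration. The mutator moves the only reference to c from an object the marker has not visited yet (b) into one it has already finished with (a).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not HotSpot code): marker and mutator steps are interleaved by hand.
class SatbDemo {
    static class Node {
        final String name;
        Node x;
        Node(String name) { this.name = name; }
    }

    // Marks n and everything reachable from it through the x field.
    static void mark(Node n, Set<String> marked) {
        while (n != null && marked.add(n.name)) {
            n = n.x;
        }
    }

    // Runs one simulated marking cycle and returns the names of the marked nodes.
    static Set<String> run(boolean usePreBarrier) {
        // Snapshot at marking start: roots a and b, with b.x -> c.
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        b.x = c;
        Set<String> marked = new TreeSet<>();
        Deque<Node> satbQueue = new ArrayDeque<>();

        // Marker: visits root a first; a.x is still null, so a is done.
        mark(a, marked);

        // Mutator: a.x = b.x overwrites null, so there is no previous value to record.
        a.x = b.x;
        // Mutator: b.x = null overwrites c; the pre-barrier records c before the store.
        if (usePreBarrier && b.x != null) {
            satbQueue.add(b.x);
        }
        b.x = null;

        // Marker: visits root b; the path b -> c no longer exists.
        mark(b, marked);

        // Marker: drains the recorded previous values before finishing.
        while (!satbQueue.isEmpty()) {
            mark(satbQueue.poll(), marked);
        }
        return marked;
    }

    public static void main(String[] args) {
        System.out.println("without pre-barrier: " + run(false)); // [a, b]
        System.out.println("with pre-barrier:    " + run(true));  // [a, b, c]
    }
}
```

Without the barrier, c stays unmarked even though it is still reachable through a.x; with it, the overwritten value is recorded and marked later.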

§Overview

The essence of pre-barrier logic is:

if (is_marking_active) {
    if (pre_val != null) {
        enqueue(pre_val);
    }
}

The pre_val is the previous value that is about to be overwritten by a store operation, e.g. in the case of o.field = new_obj, pre_val would hold the value in o.field before the assignment. The barrier code consists of two checks and a function call to record the previous value.
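
As a rough illustration of where these two checks sit relative to the store itself, the following Java sketch wraps the assignment a.x = newVal in a hand-written barrier. Everything here is hypothetical: the names PreBarrierSketch, storeX, markingActive and recorded are made up for this example, and a plain list stands in for the real recording mechanism.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pre-barrier shape; none of these names exist in OpenJDK.
class PreBarrierSketch {
    static class A {
        A x;
    }

    static boolean markingActive = false;               // plays the role of is_marking_active
    static final List<A> recorded = new ArrayList<>();  // stand-in for the recorded previous values

    static void enqueue(A preVal) {
        recorded.add(preVal);
    }

    // Performs "a.x = newVal", preceded by the pre-barrier.
    static void storeX(A a, A newVal) {
        if (markingActive) {            // check 1: is marking active?
            A preVal = a.x;             // load the value about to be overwritten
            if (preVal != null) {       // check 2: is there a previous value?
                enqueue(preVal);
            }
        }
        a.x = newVal;                   // the actual store happens after the barrier
    }

    public static void main(String[] args) {
        A a = new A();
        storeX(a, new A());             // marking inactive: nothing recorded
        markingActive = true;
        storeX(a, new A());             // marking active, a.x non-null: previous value recorded
        System.out.println(recorded.size()); // 1
    }
}
```

Note that the previous value is only loaded once marking is known to be active, so the common case costs a single flag check.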

Next, we will use a trivial Java program to study how this barrier logic is implemented in the interpreter and the JIT compilers.

public class hello {
    static class A {
        public A x;
    }

    static void f(A a) {
        a.x = new A();
    }

    public static void main(String[] args) {
        var a = new A();
        f(a);
    }
}

The interesting part is the f method, which contains a store operation overwriting a field (a.x); this store triggers the pre-barrier. The following command prints all relevant assembly for the interpreter, the JIT compilers, and the stub code.

$ java \
    -Xcomp \
    -XX:TieredStopAtLevel=4 \
    -XX:+UnlockDiagnosticVMOptions \
    -XX:PrintAssemblyOptions=intel \
    -XX:+PrintInterpreter \
    -XX:CompileCommand='CompileOnly,*hello.f' \
    -XX:CompileCommand='DontInline,*hello.f' \
    -XX:CompileCommand='PrintAssembly,*hello.f' \
    -XX:+PrintStubCode \
    -XX:-UseCompressedOops \
    hello.java

§Template Interpreter

The bytecode corresponding to the store operation is putfield, so the top-level caller is TemplateTable::putfield, which eventually calls G1BarrierSetAssembler::g1_write_barrier_pre. The generated assembly looks like:

putfield  181 putfield  [0x00007fbde0567500, 0x00007fbde0568470]  3952 bytes
...
0x00007fbde056777e: cmp BYTE PTR [r15+0x40],0x0 <--- (1) is_marking_active
0x00007fbde0567783: je 0x00007fbde0567966
0x00007fbde0567789: mov rbx,QWORD PTR [rdx]
0x00007fbde056778c: cmp rbx,0x0 <--- (2) pre_val is null or not
0x00007fbde0567790: je 0x00007fbde0567966
0x00007fbde0567796: mov r8,QWORD PTR [r15+0x28]
0x00007fbde056779a: cmp r8,0x0 <--- (3) thread-local buffer has free slots or not
0x00007fbde056779e: je 0x00007fbde05677b8
0x00007fbde05677a4: sub r8,0x8
0x00007fbde05677a8: mov QWORD PTR [r15+0x28],r8
0x00007fbde05677ac: add r8,QWORD PTR [r15+0x38]
0x00007fbde05677b0: mov QWORD PTR [r8],rbx
0x00007fbde05677b3: jmp 0x00007fbde0567966
...
0x00007fbde056789f: call 0x00007fbdef7699d0 <--- (4) call G1BarrierSetRuntime::write_ref_field_pre_entry

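Instructions annotated with (1) and (2) are the two aforementioned if checks. (3) and (4) together implement enqueue: (3) checks whether the thread-local buffer still has a free slot and, if so, the instructions that follow store pre_val into it directly. When the buffer is full, control jumps to the slow path ending in (4), a call to G1BarrierSetRuntime::write_ref_field_pre_entry, which handles the full buffer.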
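
The buffer handling around (3) can be modeled in a few lines of Java. This is only a sketch under stated assumptions: the index is read as a byte offset that counts down to zero, as the sub r8,0x8 and cmp r8,0x0 instructions in the listing suggest, and the slowPath method is a made-up stand-in for the runtime call in (4), whose real behavior is not shown in this post. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a thread-local buffer filled from the top down; not HotSpot code.
class SatbQueueSketch {
    static final int SLOT_BYTES = 8;   // one 8-byte slot per entry, as in "sub r8,0x8"

    Object[] buf;                      // plays the role of the buffer at [r15+0x38]
    int index;                         // byte offset of the last used slot, like [r15+0x28]
    final List<Object[]> handedOver = new ArrayList<>(); // full buffers passed to the "runtime"

    SatbQueueSketch(int slots) {
        buf = new Object[slots];
        index = slots * SLOT_BYTES;    // an empty buffer starts with index at the end
    }

    void enqueue(Object preVal) {
        if (index == 0) {              // (3): no free slot left
            slowPath(preVal);          // (4): let the slow path deal with the full buffer
            return;
        }
        index -= SLOT_BYTES;           // claim the next slot
        buf[index / SLOT_BYTES] = preVal;
    }

    // Assumed behavior for illustration: retire the full buffer, install a fresh one,
    // then record the value in it.
    void slowPath(Object preVal) {
        handedOver.add(buf);
        buf = new Object[buf.length];
        index = buf.length * SLOT_BYTES;
        index -= SLOT_BYTES;
        buf[index / SLOT_BYTES] = preVal;
    }

    public static void main(String[] args) {
        SatbQueueSketch q = new SatbQueueSketch(2);
        q.enqueue("p");
        q.enqueue("q");
        q.enqueue("r");                // buffer is full here, so the slow path runs
        System.out.println(q.handedOver.size()); // 1
    }
}
```

Counting the index down to zero lets the fast path test for a full buffer with a single comparison against zero, without loading a separate capacity field.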

§C1 Compiler

The relevant methods are:

G1BarrierSetC1::pre_barrier
-> G1BarrierSetAssembler::gen_pre_barrier_stub
-> G1BarrierSetAssembler::generate_c1_pre_barrier_runtime_stub

and the generated assembly looks like:

============================= C1-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c1) 9758 81 3 hello::f (12 bytes)
...
[Disassembly]

...

0x00007fbdd90002ea: movsx esi,BYTE PTR [r15+0x40]
0x00007fbdd90002ef: cmp esi,0x0 <--- (1)
0x00007fbdd90002f2: jne 0x00007fbdd9000393
...
;; G1PreBarrierStub slow case
0x00007fbdd9000393: mov rsi,QWORD PTR [rbx+0x10]
0x00007fbdd9000397: cmp rsi,0x0 <--- (2)
0x00007fbdd900039b: je 0x00007fbdd90002f8
0x00007fbdd90003a1: mov QWORD PTR [rsp],rsi
0x00007fbdd90003a5: call 0x00007fbde05beca0 ; {runtime_call g1_pre_barrier_slow}
0x00007fbdd90003aa: jmp 0x00007fbdd90002f8
...
[/Disassembly]
...

- - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Decoding RuntimeStub - g1_pre_barrier_slow 0x00007fbde05bec10
--------------------------------------------------------------------------------
0x00007fbde05beca0: push rbp
...
0x00007fbde05beca6: mov rdx,QWORD PTR [r15+0x28]
0x00007fbde05becaa: test rdx,rdx <--- (3)
0x00007fbde05becad: je 0x00007fb2ac6fe24b
0x00007fbde05becb3: sub rdx,0x8
0x00007fbde05becb7: mov QWORD PTR [r15+0x28],rdx
0x00007fbde05becbb: add rdx,QWORD PTR [r15+0x38]
0x00007fbde05becbf: mov rax,QWORD PTR [rbp+0x10]
0x00007fbde05becc3: mov QWORD PTR [rdx],rax
0x00007fbde05becc6: jmp 0x00007fb2ac6fe3db
...
0x00007fbde05bed94: call 0x00007fbdef7699d0 <--- (4)
--------------------------------------------------------------------------------
- - - [END] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Note that (3) and (4) are not “inlined” in the assembly of hello.f; instead, they live in a RuntimeStub, which methods other than hello.f can call as well. This shrinks the overall footprint of C1-generated code, at a slight cost in throughput.

§C2 Compiler

The implementation is in G1BarrierSetC2::pre_barrier and the generated assembly looks like:

--------------------------------------------------------------------------------
----------------------------------- Assembly -----------------------------------

Compiled method (c2) 9517 82 4 hello::f (12 bytes)

...
[Disassembly]
...

0x00007fbde0ad96a9: cmp BYTE PTR [r15+0x40],0x0 <--- (1)
0x00007fbde0ad96ae: jne 0x00007fbde0ad96ef
...
0x00007fbde0ad96ef: mov rdi,QWORD PTR [rbp+0x10]
0x00007fbde0ad96f3: test rdi,rdi <--- (2)
0x00007fbde0ad96f6: je 0x00007fbde0ad96b0
0x00007fbde0ad96f8: mov r10,QWORD PTR [r15+0x28]
...
0x00007fbde0ad9700: test r10,r10 <--- (3)
0x00007fbde0ad9703: je 0x00007fbde0ad97a0
...
0x00007fbde0ad97a0: mov rsi,r15
0x00007fbde0ad97a3: movabs r10,0x7fbdef7699d0 <--- (4)
0x00007fbde0ad97ad: call r10
[/Disassembly]

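The structure is the same as in the interpreter case: checks (1) to (3) and the runtime call (4) are all emitted as part of the compiled method itself, rather than in a shared stub as in C1.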