Rebalancing Linux Pages Across NUMA Nodes
Rebalancing pages across NUMA nodes can improve locality. After the CPU-side owner of a memory range changes, pages left on the old node turn future accesses into remote accesses. For anonymous memory whose contents can be discarded, Linux can place the range on a new node without page migration. The usual sequence is to discard the old pages, set the NUMA policy for future faults, then touch the range so that Linux allocates new pages on the preferred node. move_pages can then be used to verify the placement.
- madvise(MADV_DONTNEED): discard resident pages in the range.
- set_mempolicy(MPOL_PREFERRED): choose the preferred node for future faults.
- Later memory access: refault the page under the current policy.
- move_pages(..., nodes = NULL, ...): query page locations and verify that the new pages landed on the desired nodes.
MPOL_PREFERRED is a hint: if the requested node cannot satisfy the allocation, the kernel falls back to other nodes. Use MPOL_BIND instead when fallback should be treated as failure.
This is discard and refault, not migration. The old contents are gone.
The program below verifies the behavior for one normal anonymous page and one 2MB anonymous MAP_HUGETLB page. It faults each mapping on node 0, discards the page, prefers node 1, then refaults. For the HugeTLB test, reserve some 2MB huge pages on each node first:
echo 10 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Sample output:
policy: MPOL_PREFERRED