This post covers two kinds of programs, CPU-bound and IO-bound, and how to identify them using `perf`. Two instructions, `nop` and `pause`, make it easy to construct a representative of each.

• nop: No operation.
• pause: Suspends execution of the thread for a number of cycles to free resources for the sibling SMT thread to proceed.

## CPU-bound: nop instruction

So the highest IPC on my box is about 4 (`perf` reports “3.97 insn per cycle”).
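`perf stat` derives this figure by dividing the `instructions` counter by the `cycles` counter. As a rough sketch of pulling the number out of a saved report, assuming the usual `insn per cycle` annotation in the output (the sample line and the helper name `parse_ipc` are made up for illustration):

```python
import re
from typing import Optional

# Hypothetical sample shaped like a perf stat line; the instruction count is made up.
SAMPLE = "    39,700,000,000      instructions              #    3.97  insn per cycle"


def parse_ipc(perf_output: str) -> Optional[float]:
    """Return the first 'insn per cycle' value found in perf stat output, or None."""
    match = re.search(r"([0-9]+(?:\.[0-9]+)?)\s+insn per cycle", perf_output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_ipc(SAMPLE))
```

Note that `perf stat` writes its report to stderr, so capture that stream when saving the output.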

## IO-bound: pause instruction

The IPC is only 0.01, i.e., the latency of the `pause` instruction is roughly 100 cycles (1 / 0.01).

Some commands to identify and control hyperthreading:
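On Linux, the SMT topology is exposed under sysfs, so one way to find which logical CPUs share a core is to read `thread_siblings_list`. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the `smt/control` file only exists on kernels that provide it.

```python
from pathlib import Path
from typing import List, Optional

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> List[int]:
    """Expand a kernel CPU list such as '0,4' or '0-1' into [0, 4] or [0, 1]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def siblings(cpu: int) -> List[int]:
    """Logical CPUs sharing a physical core with `cpu` (Linux sysfs only)."""
    path = SYSFS_CPU / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def smt_control() -> Optional[str]:
    """Current SMT state (e.g. 'on' or 'off'), or None if the kernel lacks the file."""
    path = SYSFS_CPU / "smt" / "control"
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    if (SYSFS_CPU / "cpu0" / "topology" / "thread_siblings_list").exists():
        print("siblings of cpu0:", siblings(0))
    print("SMT control:", smt_control())
```

Writing `off` or `on` to the same `smt/control` file as root turns hyperthreading off or on where the kernel supports it.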

## Experiment

The following is a series of experiments to uncover the interaction between CPU/IO-bound programs and SMT.

#### 1. Running two instances of `nop` on a single CPU:

CPU = 0.5 and IPC = 4

Each thread runs 50% of the time, and whichever thread is on the CPU achieves the maximum IPC.
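One possible way to reproduce this kind of setup is to pin each instance with `taskset -c` and measure it with `perf stat`. The sketch below assumes a test binary named `./nop` (a hypothetical name) and that `taskset` and `perf` are installed; it is not necessarily how the numbers in this post were collected.

```python
import subprocess
from typing import List


def pinned_perf_cmd(cpu: int, program: List[str]) -> List[str]:
    """Build a command that runs `program` under perf stat, pinned to one logical CPU."""
    return ["taskset", "-c", str(cpu), "perf", "stat", "--"] + program


def launch(cpus: List[int], program: List[str]) -> None:
    """Start one pinned instance per entry in `cpus` and wait; stop them with Ctrl-C."""
    procs = [subprocess.Popen(pinned_perf_cmd(cpu, program)) for cpu in cpus]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        # Ctrl-C reaches the children too; give perf a chance to print its report.
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    # Two instances on the same logical CPU; "./nop" is a hypothetical binary name.
    launch([0, 0], ["./nop"])
```

Passing two sibling CPUs of one core instead of `[0, 0]` gives the hyperthreaded variant of the same experiment.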

#### 2. Running two instances of `nop` on two CPUs belonging to the same core:

CPU = 1 and IPC = 2

Each thread has exclusive access to its assigned CPU, but the two CPUs share the core's execution resources because of hyperthreading, which halves the throughput of each.

Lesson from scenarios 1 & 2: for CPU-bound programs, hyperthreading is useless. The overall throughput is the same, since CPU * IPC = 2 for each thread in both scenarios.

#### 3. Running two instances of `pause` on a single CPU:

CPU = 0.5 and IPC = 0.01

The same as scenario 1: each thread runs 50% of the time at an unchanged IPC.

#### 4. Running two instances of `pause` on two CPUs belonging to the same core:

CPU = 1 and IPC = 0.01

IPC stays the same because `pause` consumes few CPU resources, so the two sibling threads do not slow each other down.

Lesson from scenarios 3 & 4: for IO-bound programs, hyperthreading is super useful: the overall throughput doubles.

#### 5. Running `nop` and `pause` on two CPUs belonging to the same core:

CPU = 1 and IPC = 2.5 (`nop`) vs 0.01 (`pause`)

`pause` returns some CPU resources to its sibling CPU, so `nop` reaches an IPC of 2.5 instead of the 2 seen in scenario 2.
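The five scenarios can be compared side by side by treating CPU * IPC as a relative per-thread throughput, as the lessons above do. The following sketch only tabulates the numbers reported in this post; the table layout and the function name are illustrative.

```python
# (scenario, program, CPU share, IPC) for each thread, copied from the results above.
MEASUREMENTS = [
    (1, "nop", 0.5, 4.0), (1, "nop", 0.5, 4.0),
    (2, "nop", 1.0, 2.0), (2, "nop", 1.0, 2.0),
    (3, "pause", 0.5, 0.01), (3, "pause", 0.5, 0.01),
    (4, "pause", 1.0, 0.01), (4, "pause", 1.0, 0.01),
    (5, "nop", 1.0, 2.5), (5, "pause", 1.0, 0.01),
]


def total_throughput(scenario: int) -> float:
    """Sum CPU * IPC over all threads that ran in the given scenario."""
    return sum(cpu * ipc for s, _, cpu, ipc in MEASUREMENTS if s == scenario)


if __name__ == "__main__":
    for scenario in range(1, 6):
        print(f"scenario {scenario}: total CPU * IPC = {total_throughput(scenario):.2f}")
```

Scenarios 1 and 2 come out equal, scenario 4 is twice scenario 3, and scenario 5 shows the `nop` thread recovering part of the throughput it lost in scenario 2.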

In summary, IPC can be used to quickly check the bottleneck of a program.