The first solution proposed to this SO question is not so obvious why it’s correct.

The way to understand it is to think of the problem recursively. The invariant is the following:

The reverse of an array of bits is the swap of the high part and low part after they are reversed respectively.

Now, the 16 shifting in the return statement should be more comprehensible, if we assume high and low part of x have been reversed respectively. Since we are thinking of it recursively, the assumption holds (because we have faith). Then, we could move on to the base case.

The two magical numbers seems quite arbitrary until we see their binary representation:

1
2
0xaaaaaaaa = 0b1010101010101010101010101010101010101010
0x55555555 = 0b0101010101010101010101010101010101010101

So, we are doing the two bits swapping in the base case, which can be considered as a kind of optimization, for the real base case is reversing a single bit, which is a no-op. Therefore, the code is the non-recursive implementation of a recursive algorithm.

The algorithm is pretty smart, and I don’t think I could come up with it on the spot. Instead, the linear scanning is more my type, straightforward, and slow, as benchmarked below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <stdlib.h>
#include <assert.h>

typedef unsigned int uint;
static unsigned int *ret;

unsigned int reverse_linear(unsigned int x)
{
assert(sizeof(unsigned int) == 4);
uint y = 0;
for (uint i = 0; i < 32; ++i) {
y |= ((x >> (31-i) & 1) << i);
}
return y;
}

unsigned int reverse(unsigned int x)
{
assert(sizeof(unsigned int) == 4);
x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
return ((x >> 16) | (x << 16));
}

int main (int argc, char** argv) {
uint count = 10*1000*1000;
ret = malloc(count * sizeof(int));
uint x;
for (uint i = 0; i < count; ++i) {
x = (unsigned int)rand();
assert(reverse_linear(x) == reverse(x));
// ret[i] = reverse(x);
ret[i] = reverse_linear(x);
}
return 0;
}

On my box, the difference is noticeable:

1
2
3
4
5
$ clang -O -DNDEBUG test.c ; time -f "%e" ./a.out
0.10
$ # after changing to use `reverse_linear`
$ clang -O -DNDEBUG test.c ; time -f "%e" ./a.out
0.24