The first solution proposed to this SO question is not so obvious why it’s correct.
The way to understand it is to think of the problem recursively. The invariant is the following:
The reverse of an array of bits is the swap of the high part and low part after they are reversed respectively.
16 shifting in the return statement should be more comprehensible, if we assume high and low part of
x have been reversed respectively.
Since we are thinking of it recursively, the assumption holds (because we have faith). Then, we could move on to the base case.
The two magical numbers seems quite arbitrary until we see their binary representation:
0xaaaaaaaa = 0b1010101010101010101010101010101010101010
So, we are doing the two bits swapping in the base case, which can be considered as a kind of optimization, for the real base case is reversing a single bit, which is a no-op. Therefore, the code is the non-recursive implementation of a recursive algorithm.
The algorithm is pretty smart, and I don’t think I could come up with it on the spot. Instead, the linear scanning is more my type, straightforward, and slow, as benchmarked below.
On my box, the difference is noticeable:
$ clang -O -DNDEBUG test.c ; time -f "%e" ./a.out