Non-Uniform Shuffling
2020-01-25, post № 224
C, Haskell, mathematics, programming, #algorithms, #false implementation, #probability theory
A shuffle of a finite sequence of length 𝑛 of distinguishable elements refers to an algorithmic process which — modulo pseudo-randomness — can be modeled as a random variable uniformly distributed on the permutations .
However, most pseudo-random entropy sources provide only a pseudo-uniformly distributed realization of , leading to the necessity of finding an algorithmic transformation process if one wishes to achieve a shuffle.
In the following, I will assume that a transforming process to a family of independent and uniformly on distributed random variables is already present for any .
One naive and seemingly correct (it is not) approach is to traverse the given sequence, uniformly swapping the current entry with another one, i. e.
void falseShuffle(uint64_t *arr, size_t len) { for (size_t j = 0; j < len; j++) swap(arr, j, unif(len)); }
as an exemplary C implementation where is independent and uniformly distributed on .
To prove the non-uniformity postulated above, I first present the following number-theoretic result.
falseShuffle
was indeed non-trivially distributed uniformly. Without loss of generality, all involved probability spaces were finite. Then there had to exist a surjection from this algorithm’s entropic state to with fibers of the same finite cardinality, implying . By the above proven claim, 𝑛 < 𝟥 followed, making the distribution trivial.One possible reason for the surprising nature of this non-uniformity is the striking source code resemblance to a correct implementation, i. e.
void shuffle(uint64_t *arr, size_t len) { for (size_t j = 0; j < len; j++) swap(arr, j, j + unif(len - j)); }
as an exemplary C implementation which can be inductively shown to resemble the same structure as , in each step sprinkling in some uniform randomness and thus being itself uniformly distributed.
To see just how non-uniform falseShuffle
is, I have calculated its discrete density for 𝑛 = 𝟦:
[ | ] [ | ||| ] [ | ||| ] [ | ||| ] [ || ||||||| || ] [||||| |||||||| ||| ||] [|||||||||||||||||| || ||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] [||||||||||||||||||||||||] n = 4
If it was uniformly distributed, the discrete density would look like a rectangle; [||||| ... |||||]
. Further plots for 𝟢 ≤ 𝑛 ≤ 𝟨 are shown in non-uniform-shuffling.txt.
Source code for the analysis and plotting: non-uniform-shuffling.hs. Empirical evidence of non-uniformity: non-uniform-shuffling.c.