Non-Uniform Shuffling

2020-01-25, post № 224

C, Haskell, mathematics, programming, #algorithms, #false implementation, #probability theory

A shuffle of a finite sequence of length 𝑛 of distinguishable elements refers to an algorithmic process which — modulo pseudo-randomness — can be modeled as a random variable uniformly distributed on the permutations $\mathbb{S}_n$ .
However, most pseudo-random entropy sources provide only a pseudo-uniformly distributed realization of $\mathbb{F}_2^\mathbb{N}$ , leading to the necessity of finding an algorithmic transformation process if one wishes to achieve a shuffle.
In the following, I will assume that a transforming process to a family of independent and uniformly on $\{1..n\}$ distributed random variables is already present for any $n\in\mathbb{N}$ .

One naive and seemingly correct (it is not) approach is to traverse the given sequence, uniformly swapping the current entry with another one, i. e.

void falseShuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, unif(len)); }

as an exemplary C implementation where $\texttt{unif}(n)$ is independent and uniformly distributed on $\{0..n-1\}$ .

Yet, even though sensible on first sight, the above defined random variable is only in the most trivial cases uniformly distributed and — as empirical evidence suggests, see below — horrendously non-uniformly distributed otherwise.
To prove the non-uniformity postulated above, I first present the following number-theoretic result.

Claim. In only three trivial cases does the factorial of a natural number divide its tetration; formally

$\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.$

Proof. Let $n\in\mathbb{N}_{>2}$ be a natural number larger than two. By the definition of the factorial, $\prod_{p<n\ \text{prime}}p\mid n!$ is evident. Assume $n!\mid n^n$ . Adhering to the uniqueness of prime factorizations, $\prod_{p<n}p\mid n$ follows. Observe that $n-1>1$ has to be prime since $\forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p$ , implying $n-1\mid\prod_{p<n}p\mid n$ which cannot hold for 𝑛 > 𝟤.

Proof. Now suppose, falseShuffle was indeed non-trivially distributed uniformly. Without loss of generality, all involved probability spaces were finite. Then there had to exist a surjection from this algorithm’s entropic state to $\mathbb{S}_n$ with fibers of the same finite cardinality, implying $n!\mid n^n$ . By the above proven claim, 𝑛 < 𝟥 followed, making the distribution trivial.

One possible reason for the surprising nature of this non-uniformity is the striking source code resemblance to a correct implementation, i. e.

void shuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, j + unif(len - j)); }

as an exemplary C implementation which can be inductively shown to resemble the same structure as $\mathbb{S}_n$ , in each step sprinkling in some uniform randomness and thus being itself uniformly distributed.

To see just how non-uniform falseShuffle is, I have calculated its discrete density for 𝑛 = 𝟦:

[       |                ]
[   |   |||              ]
[   |   |||              ]
[   |   |||              ]
[   ||  ||||||| ||       ]
[||||| |||||||| |||    ||]
[|||||||||||||||||| || ||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
n = 4

If it was uniformly distributed, the discrete density would look like a rectangle; [||||| ... |||||]. Further plots for 𝟢 ≤ 𝑛 ≤ 𝟨 are shown in non-uniform-shuffling.txt.

Source code for the analysis and plotting: non-uniform-shuffling.hs. Empirical evidence of non-uniformity: non-uniform-shuffling.c.