Jonathan. Frech’s WebBlog

Non-Uniform Shuffling (#224)

Jonathan Frech,

A shuf­fle of a fi­nite sequence of length 𝑛 of distinguishable elements refers to an algorithmic process which — modulo pseudo-randomness — can be modeled as a ran­dom variable uni­form­ly dis­trib­ut­ed on the permutations $\mathbb{S}_n$$\mathbb{S}_n$$\mathbb{S}_n$.
How­ev­er, most pseudo-ran­dom entropy sources provide on­ly a pseudo-uni­form­ly dis­trib­ut­ed re­al­i­za­tion of $\mathbb{F}_2^\mathbb{N}$$\mathbb{F}_2^\mathbb{N}$$\mathbb{F}_2^\mathbb{N}$, lead­ing to the necessity of finding an algorithmic transformation process if one wishes to achieve a shuf­fle.
In the following, I will as­sume that a transforming process to a family of independent and uni­form­ly on $\{1..n\}$$\{1..n\}$$\{1..n\}$ dis­trib­ut­ed ran­dom variables is already pre­sent for any $n\in\mathbb{N}$$n\in\mathbb{N}$$n\in\mathbb{N}$.

One naive and seemingly correct (it is not) ap­proach is to tra­verse the given sequence, uni­form­ly swapping the cur­rent entry with another one, i. e.

void falseShuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, unif(len)); }

as an exemplary C im­ple­men­ta­tion where $\texttt{unif}(n)$$\texttt{unif}(n)$$\texttt{unif}(n)$ is independent and uni­form­ly dis­trib­ut­ed on $\{0..n-1\}$$\{0..n-1\}$$\{0..n-1\}$.

Yet, even though sensible on first sight, the above defined ran­dom variable is on­ly in the most trivial cases uni­form­ly dis­trib­ut­ed and — as empirical evidence suggests, see below — horrendously non-uni­form­ly dis­trib­ut­ed otherwise.
To prove the non-uniformity postulated above, I first pre­sent the following num­ber-theoretic re­sult.
Claim. In on­ly three trivial cases does the factorial of a natural num­ber di­vide its tetration; formally
$$\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.$$$$\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.$$$$\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.$$
Proof. Let $n\in\mathbb{N}_{>2}$$n\in\mathbb{N}_{>2}$$n\in\mathbb{N}_{>2}$ be a natural num­ber larger than two. By the definition of the factorial, $\prod_{p<n\ \text{prime}}p\mid n!$$\prod_{p<n\ \text{prime}}p\mid n!$$\prod_{p<n\ \text{prime}}p\mid n!$ is evident. As­sume $n!\mid n^n$$n!\mid n^n$$n!\mid n^n$. Adhering to the uniqueness of prime factorizations, $\prod_{p<n}p\mid n$$\prod_{p<n}p\mid n$$\prod_{p<n}p\mid n$ follows. Ob­serve that $n-1>1$$n-1>1$$n-1>1$ has to be prime since $\forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p$$\forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p$$\forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p$, implying $n-1\mid\prod_{p<n}p\mid n$$n-1\mid\prod_{p<n}p\mid n$$n-1\mid\prod_{p<n}p\mid n$ which cannot hold for 𝑛 > 2.
Proof. Now suppose, falseShuffle was indeed non-trivially dis­trib­ut­ed uni­form­ly. With­out loss of generality, all involved probability spaces were fi­nite. Then there had to exist a surjection from this algorithm’s entropic state to $\mathbb{S}_n$$\mathbb{S}_n$$\mathbb{S}_n$ with fibers of the same fi­nite cardinality, implying $n!\mid n^n$$n!\mid n^n$$n!\mid n^n$. By the above proven claim, 𝑛 < 3 fol­low­ed, making the dis­tri­bu­tion trivial.

One possible reason for the surprising nature of this non-uniformity is the striking source code resemblance to a correct im­ple­men­ta­tion, i. e.

void shuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, j + unif(len - j)); }

as an exemplary C im­ple­men­ta­tion which can be inductively shown to resemble the same structure as $\mathbb{S}_n$$\mathbb{S}_n$$\mathbb{S}_n$, in each step sprinkling in some uniform randomness and thus being itself uni­form­ly dis­trib­ut­ed.

To see just how non-uniform falseShuffle is, I have calculated its discrete density for 𝑛 = 4:

[       |                ]
[   |   |||              ]
[   |   |||              ]
[   |   |||              ]
[   ||  ||||||| ||       ]
[||||| |||||||| |||    ||]
[|||||||||||||||||| || ||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
n = 4

If it was uni­form­ly dis­trib­ut­ed, the discrete density would look like a rectangle; [||||| ... |||||]. Further plots for 0 ≤ 𝑛 ≤ 6 are shown in non-uniform-shuffling.txt.

Source code for the anal­y­sis and plotting: non-uniform-shuffling.hs. Empirical evidence of non-uniformity: non-uniform-shuffling.c.