Non-Uniform Shuffling

2020-01-25, post № 224

C, Haskell, mathematics, programming, #algorithms, #false implementation, #probability theory

A shuffle of a finite sequence of length 𝑛 of distinguishable elements refers to an algorithmic process which — modulo pseudo-randomness — can be modeled as a random variable uniformly distributed on the permutations \mathbb{S}_n.
However, most pseudo-random entropy sources provide only a pseudo-uniformly distributed realization of \mathbb{F}_2^\mathbb{N}, leading to the necessity of finding an algorithmic transformation process if one wishes to achieve a shuffle.
In the following, I will assume that a transforming process to a family of independent and uniformly on \{1..n\} distributed random variables is already present for any n\in\mathbb{N}.

One naive and seemingly correct (it is not) approach is to traverse the given sequence, uniformly swapping the current entry with another one, i. e.

void falseShuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, unif(len)); }

as an exemplary C implementation where \texttt{unif}(n) is independent and uniformly distributed on \{0..n-1\}.

Yet, even though sensible on first sight, the above defined random variable is only in the most trivial cases uniformly distributed and — as empirical evidence suggests, see below — horrendously non-uniformly distributed otherwise.
To prove the non-uniformity postulated above, I first present the following number-theoretic result.
Claim. In only three trivial cases does the factorial of a natural number divide its tetration; formally
\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.
Proof. Let n\in\mathbb{N}_{>2} be a natural number larger than two. By the definition of the factorial, \prod_{p<n\ \text{prime}}p\mid n! is evident. Assume n!\mid n^n. Adhering to the uniqueness of prime factorizations, \prod_{p<n}p\mid n follows. Observe that n-1>1 has to be prime since \forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p, implying n-1\mid\prod_{p<n}p\mid n which cannot hold for 𝑛 > 𝟤.
Proof. Now suppose, falseShuffle was indeed non-trivially distributed uniformly. Without loss of generality, all involved probability spaces were finite. Then there had to exist a surjection from this algorithm’s entropic state to \mathbb{S}_n with fibers of the same finite cardinality, implying n!\mid n^n. By the above proven claim, 𝑛 < 𝟥 followed, making the distribution trivial.

One possible reason for the surprising nature of this non-uniformity is the striking source code resemblance to a correct implementation, i. e.

void shuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, j + unif(len - j)); }

as an exemplary C implementation which can be inductively shown to resemble the same structure as \mathbb{S}_n, in each step sprinkling in some uniform randomness and thus being itself uniformly distributed.

To see just how non-uniform falseShuffle is, I have calculated its discrete density for 𝑛 = 𝟦:

[       |                ]
[   |   |||              ]
[   |   |||              ]
[   |   |||              ]
[   ||  ||||||| ||       ]
[||||| |||||||| |||    ||]
[|||||||||||||||||| || ||]
n = 4

If it was uniformly distributed, the discrete density would look like a rectangle; [||||| ... |||||]. Further plots for 𝟢 ≤ 𝑛 ≤ 𝟨 are shown in non-uniform-shuffling.txt.

Source code for the analysis and plotting: non-uniform-shuffling.hs. Empirical evidence of non-uniformity: non-uniform-shuffling.c.

Generating the Prouhet-Thue-Morse sequence in brainfuck

2019-12-28, post № 223

brainfuck, mathematics, programming, #recursive, #Thue-Morse sequence


Try it online.

Factoids #1

2019-11-30, post № 222

mathematics, #F2, #OEIS

IV) Commutative, non-associative operations

For any natural number 𝑛, let \mathrm{Op}_n:=\left\{\star:\mathbb{Z}^2_n\to\mathbb{Z}_n\right\} denote the set of all operations on a set of that order. An operation shall be called commutative iff \mathrm{commut}(\star):\Leftrightarrow\forall\,x,y\in\mathbb{Z}_n:x\star y=y\star x and be called associative iff \mathrm{assoc}(\star):\Leftrightarrow\forall\,x,y,z\in\mathbb{Z}_n:x\star(y\star z)=(x\star y)\star z holds.

With the above defined, one may study \mathrm{CnA}_n:=\{\star\in\mathrm{Op}_n:\mathrm{commut}(\star)\land\lnot\mathrm{assoc}(\star)\}. For 𝑛 = 𝟤, this set is nonempty for the first time, containing a manageable two elements, by name

\mathrm{CnA}_2=\Big\{\mathrm{nor}:(x,y)\mapsto 1+xy,\quad\mathrm{nand}:(x,y)\mapsto (1+x)\cdot(1+y)\Big\}.

However, based on the superexponential nature of \#\mathrm{Op}_n=\#\mathbb{Z}_n^{\mathbb{Z}_n^2}=\#\mathbb{Z}_n^{{\#\mathbb{Z}_n}^2}=n^{n^2}, the sequence \mathrm{A079195}_n:=\#\mathrm{CnA}_n likely also grows rather quickly, OEIS only listing four members;


Based on this limited numerical evidence, I would suspect the commutative yet non-associative operations to be rather sparse, i. e.


Analysis source: factoids-1_operations.hs

(Non-)commutative and (non-)associative operations have also been studied nearly twenty years ago by Christian van den Bosch, author of OEIS sequence A079195. Unfortunately, their site appears to be down, which is where they hosted closed binary operations on small sets (resource found on web.archive.org).

V) Arbitrary polynomial extremum difference

Let 𝜀 > 𝟢 be an arbitrary distance, define g:=-4x^4-x^3+8x^2+3x-4. Then f:=\sfrac{\epsilon}{4}\cdot g has two local maxima at - 𝟣 and 𝟣, whose vertical distance is 𝜀.

VI) Digit sum roots

It holds that \mathrm{ds}_{10}(108^{12})=108.

Jonathan Frech's blog; built 2021/04/16 20:21:20 CEST