Compiling to native brainfuck (#248)

Jonathan Frech, 4 September 2021

Esoteric languages explore novel approaches to computing. Whilst the vast majority of traditional languages decend from C-style von Neumann architecture commandments, these language’s notation and approach to data processing is by design and identity different. Yet in contrast to golfing languages, graphical languages or the even more esoterically inclined, brainfuck can be thought of as a supremely distilled imperative instruction set rather than an unknown beast entirely. Furthermore, its Unix-like approach to I/O makes it suitable to truly use as a target instruction set as the isomorphism behind its Turing completeness is not as opaque as in other esolangs. However, with its minimalistic nature comes questionable performance and tedious development overhead. Thus, a lot of brainfuck programs found in the wild are mere Kolmogorov complexity solutions, taking no input and being optimized to tersely output a constant string of bytes. Nevertheless, writing non-Kolmogorov programs by hand is possible as I have shown in e. g. Interpreting brainfuck in C.
To be able to write more complex programs and compile them to native brainfuck, I have been working since late 2018 on an abstraction concept and implementation thereof. Interest faded inexplicably around Christmas of 2019 such that I only now — one an a half years later — polished my Brainfuck Compiler Project to the point of readiness for unveiling. Notably, it has already been used without mention in Generating the Prouhet-Thue-Morse sequence in brainfuck.

At its core lie two means of abstraction: firstly, the raw tape is symbolically named and scoped via the Tape Abstraction Language which secondly gets extended to a Macro Assembler complete with a basic usage-oriented cell type system and natural derivation capabilities. After all macros are expanded, the residue is compiled to brainfuck. For execution purposes, said brainfuck code is transpiled to C and compiled to a binary using an off-the-shelve C compiler. The result is a fairly readable LISP-inspired macro language which compiles down to machine-executable brainfuck.

Brainfuck Compiler Project is truly free software in the moral interpretation. All project files are bundled in a single tarball: 2021-09-04_jonathan-frech_brainfuck-compiler-project.tar ⁠¹. For ease of use I wrote a shell script which downloads said tarball, installs it temporarily and runs it on a given source file: run.sh

-=-

Tape Abstraction Language

When writing brainfuck by hand, I feel the most important thing to keep in mind is one’s tape layout. As such, the Tape Abstraction Language has been designed to abstract away individual cells and their index to named and scoped one-byte variables. Its six commands are:

(new variable-name)........: introduce a new variable
(pnt variable-name)........: change the current cell to point at a variable
(num value)................: add a value to the current cell
(fgt variable-name)........: forget an introduced variable
(bf brainfuck-instructions): embed given brainfuck instructions
(ass expected-value).......: assert that the current cell's value is as expected

With these, calculating the numerical product’s lower eight bits of the first two input bytes resembles a primitive assembler (I here assume the tape is zero-initialized):

; take two bytes as input and output their product's byte

(new $a)
(pnt $a)
(bf ,)
(new $b)
(pnt $b)
(bf ,)

(new $a*b)
(new $b-cpy)

(pnt $a)
(bf [)
    (pnt $b)
    (bf [)
        (pnt $b-cpy)
        (bf +)
        (pnt $a*b)
        (bf +)

        (pnt $b)
        (bf -)
    (bf ])
    (pnt $b-cpy)
    (bf [)
        (pnt $b)
        (bf +)

        (pnt $b-cpy)
        (bf -)
    (bf ])

    (pnt $a)
    (bf -)
(bf ])

(pnt $a*b)
(bf .)

The above compiles to >,>,<[>[>>+<+<-]>>[<<+>>-]<<<-]>>. which can be verified by running the following zsh command:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-product-tal.ma)

Macro Assembler

Looking at the above code to multiply two bytes, the Tape Abstraction Language only manages to name cells, not reducing development burden by much. This is where the Macro Assembler comes into play, as repeated patterns can be extracted and itself given a name. Using prelude.ma’s definitions, a terser multiplication can be written:

; take two bytes as input and output their product's byte

(let:nve $a*b 0 (
    (let:nve $a 0 (
        (getc:w $a)
        (let:nve $b 0 (
            (getc:w $b)
            (while:re $a (
                (add:ra $b $a*b)
                (dec:a $a)))))))
    (putc:r $a*b)))

The above compiles to >[-]>[-],>[-],<[>>[-]<[-<<+>>>+<]>[-<+>]<<-]<. which is longer than the previous compilation result but functionally equivalent. Its compilation can be verified using:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-product.ma)

Type System

To further aid in understanding, I devised a rudimentary type system whereby each argument is described by a letter following a macro’s name, separated by a colon. These letters’ meanings are:

r....: read
a,u..: alter, unknown
w,b,z: write, boolean, zero
e,v,n: expression, value, new

Despite documenting macro’s effects on cells, these type hints also allow automatic deriving of copy and value-to-cell insertions:

(derive if:zee if:ree)
(derive <=:uub <=:rrb)
(derive divides?:rrb divides?:vrb)

Examples

Sparking this post, a recent PPCG submission of mine made use of this very compiler — Remove oddly nested substrings. As previously mentioned, I had used it before in Generating the Prouhet-Thue-Morse sequence in brainfuck without mentioning it. As a final example, I present a calculation of the first $\pi(2^8-1)=54$ $\pi(2^8-1)=54$ $\pi(2^8-1)=54$ prime numbers:

; print all primes below `256` in their decimal ASCII character form
(macro byte-primes () (
    (let:nve $a 1 (
        (while:re $a (
            (let:ne $flg (
                ;(printf%03d:r $a) (putc:v ':') (putc:v ' ')
                (let:nve $b 254 (
                    (tau:ra $a $b)
                    (ifnot:ze $b (
                        (printf%03d:r $a)
                        (newline)))))))
            (inc:a $a)))))))
(byte-primes)

Its compiled brainfuck clocks in at nearly two thausand bytes, mostly due to an inefficient treatment of constants. To try it out, you may use:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-primes.ma)

Closing Thoughts

Despite the staleness of this project I truly believe targeting brainfuck can be advantageous: its performance drop when compared to natively compiled programs can be overcome with fast CPUs and low task loads and its utter simplicity radiates a beauty matched by no other architecture I know of. As such, I definitely plan to develop my compiler further in the future — a reachable goal being replacing known Unix core utilities entirely by brainfuck-based rewrites, reducing my system’s bloatiness in the process.

[1]	For documentation purposes, the project’s final state before the writing of this post is also preserved: 2020-01-18_jonathan-frech_brainfuck-compiler-project.tar