Jonathan. Frech’s WebBlog

Compiling to native brain­fuck (#248)

Jonathan Frech

Esoteric languages explore novel approaches to com­put­ing. Whilst the vast majority of traditional languages decend from C-style von Neumann architecture commandments, these lan­guage’s notation and approach to data processing is by de­sign and identity different. Yet in contrast to golfing languages, graphical languages or the even more esoterically inclined, brain­fuck can be thought of as a supremely distilled imperative instruction set rather than an unknown beast entirely. Furthermore, its Unix-like approach to I/O makes it suitable to truly use as a target instruction set as the isomorphism behind its Turing com­plete­ness is not as opaque as in oth­er esolangs. However, with its minimalistic nature comes questionable per­for­mance and tedious de­vel­op­ment overhead. Thus, a lot of brain­fuck programs found in the wild are mere Kolmogorov complexity solutions, taking no input and being optimized to tersely output a con­stant string of bytes. Nevertheless, writ­ing non-Kolmogorov programs by hand is possible as I have shown in e. g. Interpreting brain­fuck in C.
To be able to write more complex programs and compile them to native brain­fuck, I have been working since late 2018 on an ab­strac­tion concept and im­ple­men­ta­tion thereof. Interest faded inexplicably around Christmas of 2019 such that I on­ly now — one an a half years later — polished my Brain­fuck Compiler Project to the point of readiness for unveiling. Notably, it has already been used without mention in Generating the Prouhet-Thue-Morse sequence in brain­fuck.

At its core lie two means of ab­strac­tion: firstly, the raw tape is symbolically named and scoped via the Tape Ab­strac­tion Lan­guage which secondly gets extended to a Macro Assembler com­plete with a basic usage-oriented cell type system and natural derivation capabilities. After all macros are expanded, the residue is compiled to brain­fuck. For execution purposes, said brain­fuck code is transpiled to C and compiled to a binary using an off-the-shelve C compiler. The result is a fairly readable LISP-inspired macro lan­guage which compiles down to ma­chine-executable brain­fuck.

Brain­fuck Compiler Project is truly free software in the moral in­ter­pre­ta­tion. All project files are bundled in a single tarball: 2021-09-04_jonathan-frech_brain­fuck-compiler-project.tar⁠¹. For ease of use I wrote a shell script which downloads said tarball, installs it temporarily and runs it on a given source file: run.sh

Early Tape Ab­strac­tion Lan­guage drafting.
-=-

Tape Ab­strac­tion Lan­guage

When writ­ing brain­fuck by hand, I feel the most important thing to keep in mind is one’s tape layout. As such, the Tape Ab­strac­tion Lan­guage has been designed to abstract away individual cells and their index to named and scoped one-byte variables. Its six commands are:

(new variable-name)........: introduce a new variable
(pnt variable-name)........: change the current cell to point at a variable
(num value)................: add a value to the current cell
(fgt variable-name)........: forget an introduced variable
(bf brainfuck-instructions): embed given brainfuck instructions
(ass expected-value).......: assert that the current cell's value is as expected

With these, calculating the numerical product’s lower eight bits of the first two input bytes resembles a primitive assembler (I here assume the tape is zero-initialized):

; take two bytes as input and output their product's byte

(new $a)
(pnt $a)
(bf ,)
(new $b)
(pnt $b)
(bf ,)

(new $a*b)
(new $b-cpy)

(pnt $a)
(bf [)
    (pnt $b)
    (bf [)
        (pnt $b-cpy)
        (bf +)
        (pnt $a*b)
        (bf +)

        (pnt $b)
        (bf -)
    (bf ])
    (pnt $b-cpy)
    (bf [)
        (pnt $b)
        (bf +)

        (pnt $b-cpy)
        (bf -)
    (bf ])

    (pnt $a)
    (bf -)
(bf ])

(pnt $a*b)
(bf .)

The above compiles to >,>,<[>[>>+<+<-]>>[<<+>>-]<<<-]>>. which can be verified by running the following zsh command:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-product-tal.ma)

Macro Assembler

Looking at the above code to multiply two bytes, the Tape Ab­strac­tion Lan­guage on­ly manages to name cells, not reducing de­vel­op­ment burden by much. This is where the Macro Assembler comes into play, as repeated patterns can be extracted and itself given a name. Using prelude.ma’s definitions, a terser multiplication can be written:

; take two bytes as input and output their product's byte

(let:nve $a*b 0 (
    (let:nve $a 0 (
        (getc:w $a)
        (let:nve $b 0 (
            (getc:w $b)
            (while:re $a (
                (add:ra $b $a*b)
                (dec:a $a)))))))
    (putc:r $a*b)))

The above compiles to >[-]>[-],>[-],<[>>[-]<[-<<+>>>+<]>[-<+>]<<-]<. which is longer than the previous compilation result but functionally equivalent. Its compilation can be verified using:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-product.ma)

Type System

To further aid in understanding, I devised a rudimentary type system whereby each argument is described by a letter following a macro’s name, separated by a colon. These letters’ meanings are:

r....: read
a,u..: alter, unknown
w,b,z: write, boolean, zero
e,v,n: expression, value, new

Despite documenting macro’s effects on cells, these type hints also allow automatic deriving of copy and value-to-cell insertions:

(derive if:zee if:ree)
(derive <=:uub <=:rrb)
(derive divides?:rrb divides?:vrb)

Examples

Sparking this post, a recent PPCG submission of mine made use of this very compiler — Remove oddly nested substrings. As previously mentioned, I had used it before in Generating the Prouhet-Thue-Morse sequence in brain­fuck without mentioning it. As a final example, I present a calculation of the first $\pi(2^8-1)=54$$\pi(2^8-1)=54$ prime numbers:

; print all primes below `256` in their decimal ASCII character form
(macro byte-primes () (
    (let:nve $a 1 (
        (while:re $a (
            (let:ne $flg (
                ;(printf%03d:r $a) (putc:v ':') (putc:v ' ')
                (let:nve $b 254 (
                    (tau:ra $a $b)
                    (ifnot:ze $b (
                        (printf%03d:r $a)
                        (newline)))))))
            (inc:a $a)))))))
(byte-primes)

Its compiled brain­fuck clocks in at nearly two thausand bytes, mostly due to an inefficient treatment of constants. To try it out, you may use:

% sh =(curl -fsSL blog.jfrech.com/248/run.sh)=(curl -fsSL blog.jfrech.com/248/byte-primes.ma)

Closing Thoughts

Despite the staleness of this project I truly believe targeting brain­fuck can be advantageous: its per­for­mance drop when compared to natively compiled programs can be overcome with fast CPUs and low task loads and its utter simplicity radiates a beauty matched by no oth­er architecture I know of. As such, I definitely plan to develop my compiler further in the future — a reachable goal being replacing known Unix core utilities entirely by brain­fuck-based rewrites, reducing my system’s bloatiness in the process.


[1]For documentation purposes, the project’s final state before the writ­ing of this post is also preserved: 2020-01-18_jonathan-frech_brain­fuck-compiler-project.tar