hp
toc

The humble sum should neither necessitate indirection nor be unbounded across package boundaries

2021-10-30, post № 250

language-design, #go

Routinely, I am irked by the effort required in most languages to express algebraic data types — the epitome of information itself. An inert datum is too often occluded by hardware-inspired lingo forcing one to adapt to rigid register-fitted types. Leastways the C lineage supports bytestrings together with their concatenation which, under the previous constraints, are equatable to product types. Though whilst control flow neatly factors through such a type, the categorical dual consistently requires an iffy cast or virtual semantics alltogether: the humble sum is quietly ignored in favor of its dual in whose hulking shadow it sometimes desintegrates out of existence. A far cry from Haskell’s effortless type alternative “|”, C at least supports byte-based type superpositioning. Yet although Go inherits “struct”, the fameless yet in no way inferior “union” is — to me unreasonably — absent.

Consequently, when I implemented miniKanren in Go back in March of 2021 [1], I was flabber­gasted to find a gaping language definition hole where I had expected a sum type. Toying with the idea to one-hot encode S-expressions via

\sum_i\alpha_i\hookrightarrow\prod_i\alpha_i,\quad x_k\mapsto(0_1,\dots,0_{k-1},x_k,0_{k+1},\dots,0_n),

I quickly dismissed it in hopes of a more elegant idiomatic Go approach. What I found was an interpretation of a sum type which conflates object action, dynamic dispatch, undefined control flow and unbounded sums \sum_{i\in\Omega}\alpha_i into a briddle mess of Go lines which can be viewed as indeed implementing a type coproduct. Needless to say, I am dissatisfied with Go’s negligence in this regard.

Whereas Haskell’s concise syntax for algebraic data types allows for an S-Expression type to be cleanly defined in two lines, Go’s convoluted interface repurposing requires five lines when shrunk:

-- Haskell
data Term = Symbol String
          | Pair Term Term
// Go
type Term interface { isTerm() }

type Symbol struct { Data string }
func (_ Symbol) isTerm() {}

type Pair struct { Car, Cdr Term }
func (_ Pair) isTerm() {}

Go’s support for discriminating the above is likewise a lengthy endeavour:

-- Haskell
flatten :: Term -> [String]
flatten (Symbol sym) = pure sym
flatten (Pair t s)   = flatten t ++ flatten s
// Go
func flatten(t Term) []string {
    switch x := t.(type) {
        case Symbol:
            return []string{x.Data}
        case Pair:
            return append(flatten(x.Car), flatten(x.Cdr)...)
        default:
            panic("unexpected summand")
    }
}

Ignoring the raw line or byte count, Go seems to actively revolt when working a tad more abstractly than with neatly byte-packed data; writing these fundamental type manipulations by hand is not a pleasant experience. Auspiciously, the above is implementable — if at a high cost: Go’s interfaces were built to allow packages functionality exports. Hence appending to or redefining a struct’s receiver methods across package boundaries is strictly forbidden, yet implementing an interface actively encouraged. And therein lies the problem: any struct from any source is allowed to implement isTerm() as a receiver method and will be seen as supporting the functionality required by Term — when viewed as an interface. Coming back to the sum type view, this means func flatten(Term) []string can be made to panic by nothing more than injecting another summand, from any package:

// Go
type ExtraPackatorial struct {}
func (_ ExtraPackatorial) isTerm() {}
func panics() { flatten(&ExtraPackatorial{}) }

All this leads me to propose an extension to the Go language: interface admission clauses. In action they may look like the following:

// wishful Go
type Term interface admits (Symbol; Pair) { isTerm() }

Taylored to allow bounded sum type implementation, admits restricts implementation of an interface such that only the types admitted by said interface may do so. Consequently, ExtraPackatorial’s attempt to fake itself into the sum type would be foiled by the compiler which respects Term’s request for an exclusive implementability.

Even though admits should be used rarely to not hinder package’s effectiveness and further support duck typing, it would close the gaping hole described in this post. Clearly, not every interface is meant to be freely extended and I think specifying structs by name which are granted permission to implement an interface would be a valuable feature, adding currently lacking control to define a universally useful type — the sum type.

Footnotes

  1. Jonathan Frech: Going for a miniKanren implementation.. 2021-03-31, self-published.
Jonathan Frech's blog; built 2024/08/31 22:59:44 CEST