Jonathan. Frech’s WebBlog

Aigoual development log: Peer-to-peer Git data store synchronization (#297)

Jonathan Frech

Calling Git distributed is, as I’ve come to realize, a disingenuous claim. The modus operandi is a central beacon of truth and slaving clients atomically meddling with it. Monopolization further ra­tio­nal­izes everyone’s server-client-architectured data store into a singular platform. I’ve described my battle with the latter in The Great ▜▚▌▗▞▀ Escape three years ago.
Back then, I had moved all repositories I manage onto a VPS under my control. As the months passed, however, some puffed up to sizes my measly low-tier VPS filesystem wasn’t partitioned for and, with the Great Escape’s euphoria waning, discontent set in about certain more sensitive repositories still being in others’ hands. We built a ZFS-powered, three-way-mirrored, six-hard-disk home server (at any point in time, up to two disks may fail), to which some repositories got moved.

Fully inside our home’s local network, repositories residing at home were out of reach during travel. On top of that, our family vacation’s resort blocked intranet connection attempts, so my attempts at establishing a temporary central truth repository on my ThinkPad failed. Two laptops’ local repositories, cut off from their origin, began to slowly drift.

What I came up with to re-establish common ground in this origin-deprived environment has two parts. The first is a transport-encrypted full-duplex pipe bridged over a freshly bought VPS (I try to both full-disk-encrypt my laptops and not load them with too many keys when I travel, so I couldn’t access any of my other servers). The second is Aigoual’s aig sync subcommand, which takes in a shell expression evaluating to a transport and an SSH public key fingerprint, symmetrizes an SSH connection over that transport and synchronizes two repositories’ object stores over it. My VPS provider only knows that I synchronized something and has an upper bound on the amount of data transferred. Both parties involved in the synchronization are treated symmetrically, communicating as equal peers.
In this text, I will briefly describe this approach’s technical makeup, how it fared in use over the last two weeks, and muse about the role decentralization might play in a world where increased Kraken-awareness calls for post-monopolistic workflow topologies.

A switchboard was swiftly written using golang.org/x/crypto/ssh, in part since I had used Go’s semi-stdlib SSH server both for my pub­lic Git server and oth­er, never-flourished SSH experiments. Key to its de­sign is the use of OpenSSH’s vanilla SSH client requesting execution of the “new” command, which server-side generates a short switching token, outputs it to stderr and hangs until the oth­er side connects. When the oth­er side requests execution of this token, both sides get server-side in­for­ma­tion­al-soixante-neuf-dialed [S99⁠¹, p. 10] and the link is made. The switchboard both ignores usernames and lets everyone in.
Pub­lic as my switchboard is, its de­sign classifies it as a ‘mere conduit’ ‘intermediary service’ according to DSA legislation, relieving me of liability re­gard­ing the data switched over it. [EU22⁠², p. 44: ch. II “Liability of providers of intermediary services”, art. 4 “Mere conduit”]

At this point in time, each side has only exchanged keys with the server; the peers don’t know about each other yet. Consequently, the server has an unobstructed view into the plaintext racing along the full-duplex pipe.
To establish an encrypted peer-to-peer connection, aig sync first agrees in plaintext on who will be the server by exchanging coin flips until they differ. (Exchanging a random 64-bit number and failing when both sides picked the same one is an improvement I plan to make to the protocol, reducing the number of initial round-trips to exactly one.) Aigoual then spins up another SSH server/client pair on that transport and symmetrizes the pair into an authenticated full-duplex pipe. The plaintext transport is given via a shell expression, the private key is ~/.ssh/id_ed25519 and the trusted public key’s SHA-256 fingerprint is taken from the environment variable $TRUST.
Both sides attach to a local Git repository and index the object store, telling each other over the authenticated pipe what they have. Each side computes what the other is missing and pushes a packfile through the pipe (Aigoual can both write and read packfiles in a streaming manner!). No references are ever touched; the synchronized objects may well be dangling.
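Working out what the other side is missing is a plain set difference over object ids. The sketch below shows that step in isolation; the function names missing and common are hypothetical, the ids are abbreviated from Fig. 1, and which peer holds which ids is an assumption. Indexing a real object store and writing the packfile are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// missing returns the ids present in mine but absent from theirs, sorted.
// These are the objects this side would have to send.
func missing(mine, theirs []string) []string {
	have := make(map[string]struct{}, len(theirs))
	for _, id := range theirs {
		have[id] = struct{}{}
	}
	var out []string
	for _, id := range mine {
		if _, ok := have[id]; !ok {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// common counts the ids both sides already share.
func common(mine, theirs []string) int {
	return len(mine) - len(missing(mine, theirs))
}

func main() {
	// Abbreviated ids from Fig. 1; the assignment to peers is an assumption.
	peerA := []string{"6e61c4dd", "72072b82", "c2ed57cd"}
	peerB := []string{"d2dab9c8", "e9ad1c86", "eaff6c1c"}

	fmt.Println("shared:", common(peerA, peerB))
	fmt.Println("A sends:", missing(peerA, peerB))
	fmt.Println("B sends:", missing(peerB, peerA))
}
```

Because only additions are ever exchanged, running the same computation after a sync yields two empty send lists.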

# Alice
cd "$(mktemp -d)" && git init -q .
head -c 1024 /dev/urandom >r && git add .
git commit -qm "Alice's random data"

export SWITCHBOARD=152.53.20.223
export TRUST="$(ssh-keygen -lf ~/.ssh/bob_ed25519.pub | cut -d\  -f2)"
aig sync "ssh $SWITCHBOARD new"
transport: "17e58\n"

# Bob
cd "$(mktemp -d)" && git init -q .
head -c 1024 /dev/urandom >r && git add .
git commit -qm "Bob's random data"

export SWITCHBOARD=152.53.20.223
export TRUST="$(ssh-keygen -lf ~/.ssh/bob_ed25519.pub | cut -d\  -f2)"
aig sync "ssh $SWITCHBOARD 17e58"
transport: "switched\n"
= 0
> obj:sha1:6e61c4dd7e854ad3cf8dbee69f2cbad2473f7fc7
> obj:sha1:72072b82e08e56ee84af84a0da9f3b8ea72a2c3b
> obj:sha1:c2ed57cda3f4d3fc0ce9f24595a3ca9b5f3b5591
< obj:sha1:d2dab9c85af972a51043885825bbe1c0dbf1b467
< obj:sha1:e9ad1c869e1d326873515e764a85b16d2301d821
< obj:sha1:eaff6c1cf26a2fb5e0be855cc6508030943e7922
.

# Other side of the same exchange
= 0
> obj:sha1:d2dab9c85af972a51043885825bbe1c0dbf1b467
> obj:sha1:e9ad1c869e1d326873515e764a85b16d2301d821
> obj:sha1:eaff6c1cf26a2fb5e0be855cc6508030943e7922
< obj:sha1:6e61c4dd7e854ad3cf8dbee69f2cbad2473f7fc7
< obj:sha1:72072b82e08e56ee84af84a0da9f3b8ea72a2c3b
< obj:sha1:c2ed57cda3f4d3fc0ce9f24595a3ca9b5f3b5591
.
Fig. 1: Alice and Bob synchronizing their re­pos­i­to­ries.
head -c 1024 /dev/urandom >r && git add .
git commit -qm 'random chance'

aig sync "ssh $SWITCHBOARD new"
transport: "98994\n"




aig sync "ssh $SWITCHBOARD 98994"


transport: "switched\n"
= 6
< obj:sha1:2202c3995e179f40782a33326f7acbb541acb1a7
< obj:sha1:4243e4dca4b25c12940bbd624e99fb4ba5cda47e
< obj:sha1:7ad295c8bbd695be588c440855d44878e1eab63a
.


= 6
> obj:sha1:2202c3995e179f40782a33326f7acbb541acb1a7
> obj:sha1:4243e4dca4b25c12940bbd624e99fb4ba5cda47e
> obj:sha1:7ad295c8bbd695be588c440855d44878e1eab63a
.
Fig. 2: Alice and Bob synchronizing their re­pos­i­to­ries again.

Writ­ing a data synchronization tool comes with a pressing bootstrapping problem: written on one ma­chine, how to perform the initial distribution of the program text? Go’s ef­fort­less cross-compilation from an x86 Linux Think­Pad to an x86 Darwin ma­chine was the elegant part of my answer, a thumb drive (pushing binaries through the switchboard would leak to my hosting provider) the practical one.

Without peer-to-peer, opening up one’s ma­chine is not to be taken lightly; four words “$ doas systemctl start ssh” and the run-of-the-mill Debian system could be compromised: My local password isn’t strong! With flimsy WiFi security, pub­lic places’ large user counts and the ever-present possibility even at home of someone sitting outside some room’s window vampire-clamping into the intranet, the open listen-for-everyone nature of OpenSSH’s server is ill-suited to connect on­ly two ma­chines.
From a security perspective, aig sync shines in its non-destructiveness. Apart from exhausting disk space and planting illegal content, Git ob­ject stores may be viewed as windows into the set of all ob­jects; expanding them doesn’t degrade re­pos­i­to­ries. Since I trust Aigoual enough (n.b.: true trust is not applicable to software systems) to mediate the described access, when starting with a dangling-ob­jects-free re­pos­i­to­ry, everything can be straightforwardly restored via $ git gc --prune=now, no matter the peer.

When it comes to using the synchronized object store to synchronize repository state, care must be taken: $ git pull’s expected due diligence in classifying reference updates into “fast-forward”, “requires-merge” and “incompatible” is utterly absent when forcibly overwriting HEAD references with $ git reset --hard $OBJ_ID. Once in the last two weeks, I destroyed others’ work: they had never run $ git add -A and I overzealously performed synchronization.
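A small guard in front of the hard reset would have caught that case. The sketch below is one possible shape, not part of Aigoual: it refuses to proceed when git status --porcelain reports anything or when the target is not a fast-forward of HEAD, using the exit status of git merge-base --is-ancestor. The names classify, guard and isAncestor are hypothetical, and the sketch lumps “requires-merge” and “incompatible” into a single diverged case.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Update classifies how a target commit relates to HEAD.
type Update int

const (
	UpToDate    Update = iota // target is already contained in HEAD
	FastForward               // HEAD is an ancestor of target
	Diverged                  // neither contains the other
)

// classify maps the two ancestry answers onto an Update.
func classify(headInTarget, targetInHead bool) Update {
	switch {
	case targetInHead:
		return UpToDate
	case headInTarget:
		return FastForward
	default:
		return Diverged
	}
}

// guard returns an error unless a hard reset to the target would lose nothing.
func guard(porcelain string, u Update) error {
	if strings.TrimSpace(porcelain) != "" {
		return errors.New("working tree has uncommitted or untracked changes")
	}
	switch u {
	case UpToDate:
		return errors.New("target already contained in HEAD, nothing to do")
	case Diverged:
		return errors.New("histories diverged, merge or cherry-pick instead")
	}
	return nil
}

// isAncestor reports whether commit a is an ancestor of commit b.
// git merge-base --is-ancestor exits 0 for yes and 1 for no.
func isAncestor(a, b string) (bool, error) {
	err := exec.Command("git", "merge-base", "--is-ancestor", a, b).Run()
	var exit *exec.ExitError
	if errors.As(err, &exit) && exit.ExitCode() == 1 {
		return false, nil
	}
	return err == nil, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: guard <object-id>")
		return
	}
	target := os.Args[1]
	status, err := exec.Command("git", "status", "--porcelain").Output()
	if err != nil {
		fmt.Println("git status failed:", err)
		return
	}
	headInTarget, err := isAncestor("HEAD", target)
	if err != nil {
		fmt.Println("ancestry check failed:", err)
		return
	}
	targetInHead, err := isAncestor(target, "HEAD")
	if err != nil {
		fmt.Println("ancestry check failed:", err)
		return
	}
	if err := guard(string(status), classify(headInTarget, targetInHead)); err != nil {
		fmt.Println("refusing to reset:", err)
		return
	}
	fmt.Println("safe to run: git reset --hard", target)
}
```

Treating untracked files as a reason to stop is deliberately conservative: a hard reset can overwrite them when the target commit tracks the same paths.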
Barring said faux pas, aig sync was wrapped into two terse scripts which correctly set up paths and keys, running without a hitch. Looking at both ma­chines’ git log side by side, cherry-picking paired with a subsequent re-synchronization and cocksure reference forcing kept both re­pos­i­to­ries in agreement all holiday long.

As I’ve mentioned above in passing, I feel the zeitgeist to be such that general acceptance of the extent of monopolization in the techno-political realm is dwindling, and I foresee a tipping point of some making to be imminent. Peer-to-peer, especially now that European transport operators are shielded from liability by dint of the Digital Services Act, certainly has some qualities of an answer.
What I’ve learnt from Go’s founding philosophies and what sets the Aigoual project apart from other p2p solutions is strict adherence to respect for the already-present: aig sync is best understood as one idea of how to use a globally spanning TCP/IP network, SSH and Git. As seen, concrete implementations can be whipped up in an all-nighter on a couch by the sea.


[1][S99] Neal Stephenson: In the beginning ... was the command line. Perennial, 2003. ISBN: 0-380-81593-1
[2][EU22] European Union legislation: Dig­i­tal Services Act. 2022-10-19. Online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32022R2065 [accessed 2025-05-23] (see also: https://eur-lex.europa.eu/eli/reg/2022/2065/oj [accessed 2025-05-24])