Aigoual development log: Peer-to-peer Git data store synchronization (#297)
Calling Git distributed is, as I’ve come to realize, a disingenuous claim. The modus operandi is a central beacon of truth and slaving clients atomically meddling with it. Monopolization further rationalizes everyone’s server-client-architectured data store into a singular platform. I’ve described my battle with the latter in The Great ▜▚▌▗▞▀ Escape three years ago.
Back then, I had moved all repositories I manage onto a VPS under my control. As the months passed, however, some puffed up to sizes my measly low-tier VPS filesystem wasn’t partitioned for and, with the Great Escape’s euphoria waning, discontent set in about certain more sensitive repositories still being in others’ hands. We built a ZFS-powered, threeway-mirrored six-hard-disk (at any point in time, up to two may fail) home server, to which some repositories were moved.
Living fully inside our home’s local network, these repositories were out of reach during travel. Our family vacation’s resort also blocked intranet connection attempts, so my attempts at establishing a temporary central truth repository on my ThinkPad failed, and two laptops’ local repositories, cut off from their origin, slowly began to drift.
What I came up with to re-establish common ground in this origin-deprived environment is a transport-encrypted full-duplex pipe bridged over a freshly bought VPS (I try both to full-disk-encrypt my laptops and not to load them with too many keys when I travel, so I couldn’t access any of my other servers) and Aigoual’s aig sync subcommand, which takes a shell expression evaluating to a transport and an SSH public key fingerprint, symmetrizes an SSH connection over that transport and synchronizes two repositories’ object stores through it. My VPS provider only learns that I synchronized something, along with an upper bound on the amount of data transferred, and both parties involved in the synchronization are treated symmetrically, communicating as equal peers.
In this text, I will briefly describe this approach’s technical makeup, how it fared in use over the last two weeks, and muse about the role decentralization might play in a world where increased Kraken-awareness calls for post-monopolistic workflow topologies.
A switchboard was swiftly written using golang.org/, in part since I had used Go’s semi-stdlib SSH server both for my public Git server and other, never-flourished SSH experiments. Key to its design is the use of OpenSSH’s vanilla SSH client requesting execution of the “new” command, which server-side generates a short switching token, outputs it to stderr and hangs until the other side connects. When the other side requests execution of this token, both sides get server-side informational-soixante-neuf-dialed [S99¹, p. 10] and the link is made. The switchboard both ignores usernames and lets everyone in.
Public as my switchboard is, its design classifies it as a ‘mere conduit’ ‘intermediary service’ according to DSA legislation, relieving me of liability regarding the data switched over it. [EU22², p. 44: ch. II “Liability of providers of intermediary services”, art. 4 “Mere conduit”]
At this point in time, each side has only exchanged keys with the server; the two sides don’t know about each other yet. Consequently, the server has an unobstructed view into the plaintext racing along the full-duplex pipe.
To establish an encrypted peer-to-peer connection, aig sync first agrees in plaintext on who will be the server, by exchanging coin flips until the two flips differ (exchanging a random 64-bit number and failing when both sides picked the same one is an improvement I plan to make to the protocol, drastically reducing the number of initial round-trips to exactly one). Aigoual then spins up another SSH server/client pair on that transport and symmetrizes the pair into an authenticated full-duplex pipe. The plaintext transport is given via a shell expression, the private key is ~/ and the public key’s SHA-256 fingerprint is taken from the environment variable $TRUST.
Both sides attach to a local Git repository and index the object store, telling each other over the authenticated pipe what they have. Both sides compute what the other is missing and push a packfile through the pipe (Aigoual can both write and read packfiles in a streaming manner!). No references are ever touched; the synchronized objects may well be dangling.
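Computing what the other side is missing boils down to a set difference over object IDs. The following is a minimal sketch of that step only; the function name missing and the placeholder IDs are made up, and indexing the object store and writing the packfile are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// missing returns the IDs in mine that do not occur in theirs,
// sorted for stable output. These are the objects to send.
func missing(mine, theirs []string) []string {
	have := make(map[string]struct{}, len(theirs))
	for _, id := range theirs {
		have[id] = struct{}{}
	}
	var out []string
	for _, id := range mine {
		if _, ok := have[id]; !ok {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Placeholder IDs; real ones would be full object hashes.
	sideA := []string{"a1", "a2", "common"}
	sideB := []string{"b1", "common"}
	fmt.Println("A sends:", missing(sideA, sideB))
	fmt.Println("B sends:", missing(sideB, sideA))
}
```

Each side can run the same computation independently once the indexes have been exchanged, which keeps the two peers symmetric.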
cd "$(mktemp -d)" && git init -q .
head -c 1024 /dev/urandom >r && git add .
git commit -qm "Alice's random data"
export SWITCHBOARD=152.53.20.223
export TRUST="$(ssh-keygen -lf ~/.ssh/bob_ed25519.pub | cut -d\  -f2)"
aig sync "ssh $SWITCHBOARD new"
transport: "17e58\n"

cd "$(mktemp -d)" && git init -q .
head -c 1024 /dev/urandom >r && git add .
git commit -qm "Bob's random data"
export SWITCHBOARD=152.53.20.223
export TRUST="$(ssh-keygen -lf ~/.ssh/bob_ed25519.pub | cut -d\  -f2)"
aig sync "ssh $SWITCHBOARD 17e58"
transport: "switched\n"

= 0
> obj:sha1:6e61c4dd7e854ad3cf8dbee69f2cbad2473f7fc7
> obj:sha1:72072b82e08e56ee84af84a0da9f3b8ea72a2c3b
> obj:sha1:c2ed57cda3f4d3fc0ce9f24595a3ca9b5f3b5591
< obj:sha1:d2dab9c85af972a51043885825bbe1c0dbf1b467
< obj:sha1:e9ad1c869e1d326873515e764a85b16d2301d821
< obj:sha1:eaff6c1cf26a2fb5e0be855cc6508030943e7922
.

= 0
> obj:sha1:d2dab9c85af972a51043885825bbe1c0dbf1b467
> obj:sha1:e9ad1c869e1d326873515e764a85b16d2301d821
> obj:sha1:eaff6c1cf26a2fb5e0be855cc6508030943e7922
< obj:sha1:6e61c4dd7e854ad3cf8dbee69f2cbad2473f7fc7
< obj:sha1:72072b82e08e56ee84af84a0da9f3b8ea72a2c3b
< obj:sha1:c2ed57cda3f4d3fc0ce9f24595a3ca9b5f3b5591
.
head -c 1024 /dev/urandom >r && git add .
git commit -qm 'random chance'
aig sync "ssh $SWITCHBOARD new"
transport: "98994\n"

aig sync "ssh $SWITCHBOARD 98994"
transport: "switched\n"

= 6
< obj:sha1:2202c3995e179f40782a33326f7acbb541acb1a7
< obj:sha1:4243e4dca4b25c12940bbd624e99fb4ba5cda47e
< obj:sha1:7ad295c8bbd695be588c440855d44878e1eab63a
.

= 6
> obj:sha1:2202c3995e179f40782a33326f7acbb541acb1a7
> obj:sha1:4243e4dca4b25c12940bbd624e99fb4ba5cda47e
> obj:sha1:7ad295c8bbd695be588c440855d44878e1eab63a
.
Writing a data synchronization tool comes with a pressing bootstrapping problem: written on one machine, how to perform the initial distribution of the program text? Go’s effortless cross-compilation from an x86 Linux ThinkPad to an x86 Darwin machine was the elegant part of my answer, a thumb drive (pushing binaries through the switchboard would leak them to my hosting provider) the practical one.
Without peer-to-peer, opening up one’s machine is not to be taken lightly; four words “$ doas systemctl start ssh” and the run-of-the-mill Debian system could be compromised: My local password isn’t strong! With flimsy WiFi security, public places’ large user counts and the ever-present possibility even at home of someone sitting outside some room’s window vampire-clamping into the intranet, the open listen-for-everyone nature of OpenSSH’s server is ill-suited to connect only two machines.
From a security perspective, aig sync shines in its non-destructiveness. Apart from the risks of exhausting disk space and planting illegal content, Git object stores may be viewed as windows into the set of all objects; expanding them doesn’t degrade repositories. Since I trust Aigoual enough (n.b.: true trust is not applicable to software systems) to mediate the described access, when starting with a dangling-objects-free repository, everything can be straightforwardly restored via $ git gc --prune=, no matter the peer.
When it comes to using the synchronized object store to synchronize repository state, care must be taken: $ git pull’s expected due diligence in classifying reference updates into “fast-forward”, “requires-merge” and “incompatible” is utterly absent when forcibly overwriting HEAD references with $ git reset --hard $OBJ_ID. Once in the last two weeks I destroyed others’ work this way: they had never run $ git add -A, and I overzealously performed synchronization.
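The classification that a forced reset skips can be sketched over a toy commit graph. This is one possible reading of the three categories, not Git's or Aigoual's implementation: the Graph type, the Classify method and the commit names are hypothetical, and "incompatible" is taken to mean that the two commits share no history.

```go
package main

import "fmt"

// Graph maps a commit ID to its parent IDs. It is a toy stand-in
// for walking commit objects in a real object store.
type Graph map[string][]string

// ancestors returns every commit reachable from id, including id.
func (g Graph) ancestors(id string) map[string]bool {
	seen := map[string]bool{}
	stack := []string{id}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[c] {
			continue
		}
		seen[c] = true
		stack = append(stack, g[c]...)
	}
	return seen
}

// Classify reports what moving a reference from current to target
// would amount to. A target that lies behind current is reported as
// "requires-merge" here; a fuller version would treat it separately.
func (g Graph) Classify(current, target string) string {
	up := g.ancestors(target)
	if up[current] {
		return "fast-forward"
	}
	for c := range g.ancestors(current) {
		if up[c] {
			return "requires-merge"
		}
	}
	return "incompatible"
}

func main() {
	g := Graph{"a": nil, "b": {"a"}, "c": {"b"}, "d": {"b"}, "x": nil}
	fmt.Println(g.Classify("b", "c")) // c descends from b
	fmt.Println(g.Classify("c", "d")) // c and d diverge after b
	fmt.Println(g.Classify("c", "x")) // no shared history
}
```

Only the first case is safe to apply by overwriting a reference, and even then uncommitted changes in the working tree need a separate check, since they are not part of the object store at all.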
Barring said faux pas, aig sync was wrapped into two terse scripts which correctly set up paths and keys, running without a hitch. Looking at both machines’ git log side by side, cherry-picking paired with a subsequent re-synchronization and cocksure reference forcing kept both repositories in agreement all holiday long.
As I’ve mentioned above in passing, I sense a zeitgeist in which general acceptance of the extent of monopolization in the techno-political realm is dwindling, and I foresee a tipping point of some kind being imminent. Peer-to-peer, especially now that European transport operators are shielded from liability by dint of the Digital Services Act, certainly has some of the qualities of an answer.
What I’ve learnt from Go’s founding philosophies and what sets the Aigoual project apart from other p2p solutions is strict adherence to respect for the already-present: aig sync is best understood as one idea of how to use a globally-spanned IP/
[1] [S99] Neal Stephenson: In the beginning ... was the command line. Perennial, 2003. ISBN: 0-380-81593-1
[2] [EU22] European Union legislation: Digital Services Act. 2022-10-19. Online: https://