Completed Work
Phase 1: Configuration Infrastructure
Goal: Externalize all hardcoded values to YAML configuration files.
Deliverables:
- internal/config package for loading YAML configs
- Sample configuration files in configs/
- Updated .gitignore for config files
- Refactored home-node/client-node/relay-server to use configs
Key Files:
- internal/config/config.go - Configuration structs
- internal/config/loader.go - YAML parsing
- configs/*.sample.yaml - Sample configurations
Phase 2: Key-Based Authentication
Goal: Implement SSH-style authentication using ConnectionGater and authorized_keys files.
Deliverables:
- internal/auth/gater.go - ConnectionGater implementation (primary defense)
- internal/auth/authorized_keys.go - Parser for authorized_keys
- Integration into home-node and client-node
- Protocol-level validation (defense-in-depth)
- Relay server authentication (optional)
Security Model:
- Layer 1: ConnectionGater (network level - earliest rejection)
- Layer 2: Protocol handler validation (application level - secondary check)
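The two layers can share one allowlist: the gater consults it when a connection arrives, and the protocol handler consults it again before serving a stream. The sketch below shows only that shared lookup. The file format (one peer ID per line, optional trailing comment, `#` for comment lines) and the names `parseAuthorizedKeys` and `isAuthorized` are assumptions for illustration, not the project's actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAuthorizedKeys reads an allowlist with one peer ID per line, followed
// by an optional free-form comment. Blank lines and lines starting with '#'
// are skipped. The format is assumed for this sketch.
func parseAuthorizedKeys(data string) map[string]string {
	peers := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		id, comment, _ := strings.Cut(line, " ")
		peers[id] = strings.TrimSpace(comment)
	}
	return peers
}

// isAuthorized is the check both layers would call: once at connection time
// (layer 1) and once inside the protocol handler (layer 2).
func isAuthorized(peers map[string]string, peerID string) bool {
	_, ok := peers[peerID]
	return ok
}

func main() {
	peers := parseAuthorizedKeys("# laptops\n12D3KooWExampleA home laptop\n\n12D3KooWExampleB\n")
	fmt.Println(isAuthorized(peers, "12D3KooWExampleA"))
	fmt.Println(isAuthorized(peers, "12D3KooWUnknown"))
}
```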
Phase 3: Enhanced Usability - keytool CLI (superseded)
Goal: Create production-ready CLI tool for managing Ed25519 keypairs and authorized_keys.
Status: Completed (keytool features merged into shurli subcommands in Phase 4C module consolidation; cmd/keytool/ deleted)
All keytool functionality now lives in shurli subcommands: shurli whoami (peerid), shurli auth add (authorize), shurli auth remove (revoke), shurli auth list, shurli auth validate (validate). Key generation happens via shurli init.
Phase 4A: Core Library & Service Registry
Goal: Transform Shurli into a reusable library and enable exposing local services through P2P connections.
Deliverables:
- Create pkg/p2pnet/ as importable package
  - network.go - Core P2P network setup, relay helpers, name resolution
  - service.go - Service registry and management
  - proxy.go - Bidirectional TCP-to-Stream proxy with half-close
  - naming.go - Local name resolution (name to peer ID)
  - identity.go - Ed25519 identity management
- Extend config structs for service definitions
- Update sample YAML configs with service examples
- Refactor to cmd/ layout with single Go module
- Tested: SSH, XRDP, generic TCP proxy all working across LAN and 5G
- UX Streamlining:
  - Single binary - merged home-node into shurli daemon
  - Standard config path - auto-discovery (./shurli.yaml -> ~/.config/shurli/config.yaml -> /etc/shurli/config.yaml)
  - shurli init - interactive setup wizard (generates config, keys, authorized_keys)
  - All commands support --config <path> flag
  - Unified config type (one config format for all modes)
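The config auto-discovery order can be sketched as a first-match search over the three documented locations, with an explicit --config value taking precedence. The helper names `candidatePaths`, `resolveConfig`, and `fileExists` are hypothetical, and treating --config as an unconditional override is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists the documented search order for a given home directory.
func candidatePaths(home string) []string {
	return []string{
		"./shurli.yaml",
		filepath.Join(home, ".config", "shurli", "config.yaml"),
		"/etc/shurli/config.yaml",
	}
}

// resolveConfig returns the explicit path if one was given, otherwise the
// first candidate for which exists reports true.
func resolveConfig(explicit string, candidates []string, exists func(string) bool) (string, bool) {
	if explicit != "" {
		return explicit, true
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether p names an existing non-directory.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	path, ok := resolveConfig("", candidatePaths(home), fileExists)
	fmt.Println(path, ok)
}
```

Passing the existence check as a function keeps the search order testable without touching the filesystem.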
Phase 4B: Frictionless Onboarding
Goal: Eliminate manual key exchange and config editing. Get two machines connected in under 60 seconds.
Deliverables:
- shurli invite - generate short-lived invite code (encodes relay address + peer ID)
- shurli join <code> - accept invite, exchange keys, auto-configure, connect
- QR code output for shurli invite (scannable by mobile app later)
- shurli whoami - show own peer ID and friendly name for sharing
- shurli auth add/list/remove - manage authorized peers
- shurli relay add/list/remove - manage relay addresses without editing YAML
- Flexible relay address input - accept IP:PORT or bare IP (default port 7777) in addition to full multiaddr
- QR code display in shurli init (peer ID) and shurli invite (invite code)
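The flexible relay address input can be illustrated with a small normalizer that accepts a bare IP or IP:PORT, applies the default port 7777, and rejects hostnames without resolving them. The function name `parseRelayInput` is hypothetical, and full multiaddr input is left out because parsing it requires the multiaddr library.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

const defaultRelayPort = 7777

// parseRelayInput accepts "IP" or "IP:PORT" (IPv6 as "[addr]:port") and
// returns the normalized IP and port. Hostnames are rejected outright, so no
// DNS lookup ever happens.
func parseRelayInput(s string) (string, int, error) {
	if ip := net.ParseIP(s); ip != nil {
		return ip.String(), defaultRelayPort, nil
	}
	host, portStr, err := net.SplitHostPort(s)
	if err != nil {
		return "", 0, fmt.Errorf("expected IP or IP:PORT: %w", err)
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", 0, errors.New("hostnames are not accepted, use an IP address")
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("invalid port %q", portStr)
	}
	return ip.String(), port, nil
}

func main() {
	for _, in := range []string{"203.0.113.5", "203.0.113.5:9000", "relay.example.com:7777"} {
		ip, port, err := parseRelayInput(in)
		fmt.Println(in, "->", ip, port, err)
	}
}
```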
Security hardening (done as part of 4B):
- Sanitize authorized_keys comments (prevent newline injection)
- Sanitize YAML names from remote peers (prevent config injection)
- Limit invite/join stream reads to 512 bytes (prevent OOM DoS)
- Validate multiaddr before writing to config YAML
- Use
os.CreateTempfor atomic writes (prevent symlink attacks) - Reject hostnames in relay input - only IP addresses accepted (no DNS resolution / SSRF)
- Config files written with 0600 permissions
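As an illustration of the atomic-write and 0600 items above, the following sketch writes to a temp file created by os.CreateTemp in the target directory and then renames it over the destination. The helper names `writeFileAtomic` and `demo` are hypothetical; the project's own write path may differ in detail.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a fresh temp file next to path, then renames
// it into place. os.CreateTemp picks an unpredictable name and opens it with
// O_EXCL, so a pre-planted symlink at the temp name cannot be followed.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // fails harmlessly once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// demo writes a file in a scratch directory and reads back content and mode.
func demo() (string, os.FileMode, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "config.yaml")
	if err := writeFileAtomic(path, []byte("relay: example\n"), 0o600); err != nil {
		return "", 0, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return "", 0, err
	}
	return string(data), info.Mode().Perm(), nil
}

func main() {
	content, mode, err := demo()
	fmt.Printf("%q %04o %v\n", content, mode, err)
}
```

Creating the temp file in the same directory as the destination keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.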
User Experience:
# Machine A (home server)
$ shurli invite --name home
=== Invite Code (expires in 10m0s) ===
AEQB-XJKZ-M4NP-...
[QR code displayed]
Waiting for peer to join...
# Machine B (laptop)
$ shurli join AEQB-XJKZ-M4NP-... --name laptop
=== Joined successfully! ===
Peer "home" authorized and added to names.
Try: shurli ping home
Phase 4C: Core Hardening & Security
Goal: Harden every component for production reliability. Fix critical security gaps, add self-healing resilience, implement test coverage, and make the system recover from failures automatically.
Security (Critical)
- Relay resource limits - replace WithInfiniteLimits() with configurable WithResources() + WithLimit(). Defaults tuned for SSH/XRDP (10min sessions, 64MB data).
- Auth hot-reload - daemon API POST /v1/auth and DELETE /v1/auth/{peer_id} reload authorized_keys at runtime.
- Per-service access control - AllowedPeers field on each service restricts which peers can connect.
- Rate limiting on incoming connections and streams - libp2p ResourceManager enabled. OS-level: iptables SYN flood protection (50/s) and UDP rate limiting (200/s).
- QUIC source address verification - reverse path filtering (rp_filter=1), SYN cookies for TCP flood protection.
- Key file permission check on load - refuse to load keys with permissions wider than 0600
- Service name validation - DNS-label format enforced (1-63 lowercase alphanumeric + hyphens)
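The key file permission check in the list above comes down to a bitmask test. This sketch reads "wider than 0600" as "any permission bit outside owner read/write", which is an interpretation; `permsTooWide` and `checkKeyFile` are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// permsTooWide reports whether mode grants any permission bit beyond
// owner read/write (0600).
func permsTooWide(mode fs.FileMode) bool {
	return mode.Perm()&^0o600 != 0
}

// checkKeyFile refuses a key file whose permissions are wider than 0600.
func checkKeyFile(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if permsTooWide(info.Mode()) {
		return fmt.Errorf("%s has mode %04o, want 0600 or stricter", path, info.Mode().Perm())
	}
	return nil
}

func main() {
	if len(os.Args) > 1 {
		fmt.Println(checkKeyFile(os.Args[1]))
		return
	}
	fmt.Println(permsTooWide(0o600), permsTooWide(0o644))
}
```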
libp2p Upgrade (Critical)
- go-libp2p v0.47.0 - AutoNAT v2, smart dialing, QUIC improvements, Resource Manager
- AutoNAT v2 - per-address reachability testing with nonce-based dial verification
- Smart dialing - address ranking, QUIC prioritization, sequential dial with fast failover
- QUIC as preferred transport - 1 fewer RTT on connection setup (3 RTTs vs 4 for TCP)
- Version in Identify - libp2p.UserAgent("shurli/<version>") set on all hosts
- Private DHT - migrated from IPFS Amino DHT to private shurli DHT (/shurli/kad/1.0.0)
Self-Healing & Resilience
Inspired by Juniper JunOS, Cisco IOS, Kubernetes, systemd, MikroTik:
- Config validation - shurli config validate parses config, checks key file, verifies relay address
- Config archive - auto-saves last-known-good config on successful startup. Atomic write.
- Config rollback - shurli config rollback restores from last-known-good archive
- Commit-confirmed pattern (Juniper JunOS / Cisco IOS) - shurli config apply <new-config> --confirm-timeout 5m applies config and auto-reverts if not confirmed. Prevents permanent lockout on remote relay.
- systemd watchdog integration - sd_notify("WATCHDOG=1") every 30s with health check
- Health check HTTP endpoint - relay exposes /healthz with JSON: peer ID, version, uptime, connected peers
- shurli status command - version, peer ID, config path, relay addresses, authorized peers, services, names
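The commit-confirmed pattern is easiest to see in miniature: apply a change, start a timer, and revert unless a confirmation arrives first. The sketch below models configs as plain strings in memory. The type `configStore` and its methods are hypothetical and say nothing about how shurli config apply is implemented.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// configStore holds the active config and, while a change is unconfirmed,
// the previous one plus the timer that will restore it.
type configStore struct {
	mu       sync.Mutex
	active   string
	previous string
	timer    *time.Timer // non-nil while a change awaits confirmation
}

// apply activates newCfg and schedules an automatic revert after timeout.
func (s *configStore) apply(newCfg string, timeout time.Duration) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.timer != nil {
		return errors.New("a change is already awaiting confirmation")
	}
	s.previous, s.active = s.active, newCfg
	s.timer = time.AfterFunc(timeout, s.revert)
	return nil
}

// revert runs when the timer fires; it is a no-op if confirm won the race.
func (s *configStore) revert() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.timer == nil {
		return
	}
	s.active = s.previous
	s.timer = nil
}

// confirm makes the pending change permanent.
func (s *configStore) confirm() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.timer == nil {
		return false
	}
	s.timer.Stop()
	s.timer = nil
	return true
}

func (s *configStore) current() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.active
}

// demo applies one change without confirming and one with confirmation.
func demo() (afterTimeout, afterConfirm string) {
	s := &configStore{active: "v1"}
	_ = s.apply("v2", 20*time.Millisecond)
	time.Sleep(200 * time.Millisecond) // nobody confirms
	afterTimeout = s.current()

	_ = s.apply("v3", 100*time.Millisecond)
	s.confirm()
	time.Sleep(200 * time.Millisecond) // timer was stopped, nothing reverts
	afterConfirm = s.current()
	return afterTimeout, afterConfirm
}

func main() {
	fmt.Println(demo())
}
```

The nil check inside `revert` covers the case where the timer fires at the same moment `confirm` is called: whichever takes the lock first decides the outcome.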
Batch Deliverables
Batch A - Reliability:
- DialWithRetry() - exponential backoff retry (1s -> 2s -> 4s) for proxy dial
- TCP dial timeout - 10s for local service, 30s context for P2P stream
- DHT bootstrap in proxy command - Kademlia DHT (client mode) for direct peer discovery
- [DIRECT]/[RELAYED] connection path indicators in logs
- DCUtR hole-punch event tracer
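A minimal sketch of exponential-backoff retry in the spirit of the 1s -> 2s -> 4s schedule above. The names `backoffDelays` and `retryDial` are hypothetical and are not the signature of the project's DialWithRetry(). Treating the schedule as three waits between four attempts is an assumption, and context cancellation is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTemporary = errors.New("temporary dial failure")

// backoffDelays returns n delays that double each step: base, 2*base, 4*base...
func backoffDelays(base time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, n)
	for i := range delays {
		delays[i] = base << i
	}
	return delays
}

// retryDial calls dial until it succeeds, sleeping delays[i] after the i-th
// failure. It makes at most len(delays)+1 attempts and reports how many ran.
func retryDial(delays []time.Duration, dial func() error) (int, error) {
	for attempt := 0; ; attempt++ {
		err := dial()
		if err == nil {
			return attempt + 1, nil
		}
		if attempt == len(delays) {
			return attempt + 1, fmt.Errorf("giving up after %d attempts: %w", attempt+1, err)
		}
		time.Sleep(delays[attempt])
	}
}

func main() {
	calls := 0
	// Millisecond base so the demo finishes quickly; the roadmap uses seconds.
	n, err := retryDial(backoffDelays(time.Millisecond, 3), func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	})
	fmt.Println(n, err)
}
```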
Batch B - Code Quality:
- Deduplicated bidirectional proxy - BidirectionalProxy() + HalfCloseConn interface (was 4 copies, now 1)
- Sentinel errors - 8 sentinel errors across 4 packages
- Build version embedding - shurli version, ldflags injection
- Structured logging with log/slog
Batch E - New Capabilities:
- shurli status - local-only info command
- /healthz HTTP endpoint on relay-server
- shurli invite --non-interactive - bare invite code to stdout, progress to stderr
- shurli join --non-interactive - reads code from CLI arg, env var, or stdin
Batch F - Daemon Mode:
- shurli daemon - long-running P2P host with Unix socket HTTP API
- Cookie-based authentication (32-byte random hex, 0600 permissions, rotated per restart)
- 15 API endpoints with JSON + plain text format negotiation
- Auth hot-reload, dynamic proxy management
- P2P ping, traceroute, resolve - standalone + daemon API
- Service files: systemd + launchd
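The cookie-based authentication item can be sketched as follows: generate a random cookie at startup, write it with 0600 permissions, and compare presented cookies in constant time. The sketch assumes "32-byte random hex" means 32 random bytes encoded as 64 hex characters; the function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"os"
)

// newCookie returns 32 random bytes as a 64-character hex string.
func newCookie() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// writeCookie stores the cookie readable by the owner only. os.WriteFile
// applies the mode only when it creates the file, so an existing file keeps
// its old permissions.
func writeCookie(path, cookie string) error {
	return os.WriteFile(path, []byte(cookie), 0o600)
}

// cookieMatches compares without leaking the position of the first mismatch.
func cookieMatches(want, got string) bool {
	return subtle.ConstantTimeCompare([]byte(want), []byte(got)) == 1
}

func main() {
	cookie, err := newCookie()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(cookie), cookieMatches(cookie, cookie), cookieMatches(cookie, "guess"))
}
```

Generating a fresh cookie on every start is what gives the "rotated per restart" property: a cookie read from an earlier run stops working once the daemon restarts.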
Batch G - Test Coverage & Documentation:
Combined coverage: 80.3% (unit + Docker integration). Relay-server binary merged into shurli.
- 96 test functions covering CLI commands
- All 15 API handlers tested
- Docker integration tests with coverage
- Engineering journal with 43 ADRs
- Website with Hugo + Hextra, 10 blog posts, 40+ SVG diagrams
Batch H - Observability:
- Prometheus /metrics endpoint (opt-in via config)
- libp2p built-in metrics exposed (swarm, hole-punch, AutoNAT, relay, rcmgr)
- Custom shurli metrics (proxy bytes/connections/duration, auth counters, hole-punch stats, API timing)
- Audit logging - structured JSON via slog for security events
- Grafana dashboard - 29 panels across 6 sections
Pre-Batch I Items
Pre-I-a: Build & Deployment Tooling:
- Makefile with build, test, clean, install, service management
- Service install for Linux (systemd) and macOS (launchd)
- make check - generic local checks runner from .checks file
- make push - runs checks before git push
Pre-I-b: PAKE-Secured Invite/Join Handshake:
Upgraded the invite/join token exchange from cleartext to an encrypted handshake inspired by WPA3's SAE. The relay sees only opaque encrypted bytes during pairing. Zero new dependencies.
- Ephemeral X25519 DH + token-bound HKDF-SHA256 key derivation + XChaCha20-Poly1305 AEAD
- Invite versioning: v1 = PAKE-encrypted, v2 = relay pairing code
- v2 invite codes encode namespace for DHT network auto-inheritance
- 19 PAKE tests + 11 invite code tests
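The key-derivation half of the handshake described above can be sketched with the standard library alone: an ephemeral X25519 exchange, then HKDF-SHA256 with the invite token mixed in, so both sides reach the same key only if they hold the same token. Using the token as the HKDF salt, the info label, and the function names are all assumptions for illustration; the project's actual binding may differ. The XChaCha20-Poly1305 step is omitted because that cipher is not in the Go standard library.

```go
package main

import (
	"crypto/ecdh"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// hkdfSHA256 derives a 32-byte key per RFC 5869 (extract, then one expand
// block, which is sufficient for outputs up to the hash size).
func hkdfSHA256(secret, salt, info []byte) []byte {
	extract := hmac.New(sha256.New, salt)
	extract.Write(secret)
	prk := extract.Sum(nil)

	expand := hmac.New(sha256.New, prk)
	expand.Write(info)
	expand.Write([]byte{1})
	return expand.Sum(nil)
}

// sessionKey combines the X25519 shared secret with the invite token.
func sessionKey(priv *ecdh.PrivateKey, peerPub *ecdh.PublicKey, token []byte) ([]byte, error) {
	shared, err := priv.ECDH(peerPub)
	if err != nil {
		return nil, err
	}
	return hkdfSHA256(shared, token, []byte("invite-handshake-sketch")), nil
}

// demoHandshake runs both sides locally with fresh ephemeral keys.
func demoHandshake(inviterToken, joinerToken []byte) ([]byte, []byte, error) {
	curve := ecdh.X25519()
	inviter, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	joiner, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	k1, err := sessionKey(inviter, joiner.PublicKey(), inviterToken)
	if err != nil {
		return nil, nil, err
	}
	k2, err := sessionKey(joiner, inviter.PublicKey(), joinerToken)
	if err != nil {
		return nil, nil, err
	}
	return k1, k2, nil
}

func main() {
	k1, k2, err := demoHandshake([]byte("AEQB-XJKZ"), []byte("AEQB-XJKZ"))
	fmt.Println(len(k1), string(k1) == string(k2), err)
}
```

A joiner with the wrong token derives a different key, so its first AEAD message would fail authentication on the other side.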
Pre-I-c: Private DHT Networks:
- Config option: discovery.network: "my-crew" for isolated peer groups
- DHT prefix becomes /shurli/<namespace>/kad/1.0.0
- Nodes with different namespaces speak different protocols and cannot discover each other
- Validation: DNS-label safe (lowercase alphanumeric + hyphens, 1-63 chars)
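The namespace rules above can be sketched as a validator plus a prefix builder. The regular expression also forbids leading and trailing hyphens, which is standard for DNS labels but an assumption here, as is mapping an empty namespace to the default /shurli/kad/1.0.0 prefix. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel matches 1-63 lowercase alphanumerics or hyphens, with no hyphen
// at either end.
var dnsLabel = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

// dhtProtocolPrefix builds the DHT protocol ID for a namespace.
func dhtProtocolPrefix(namespace string) (string, error) {
	if namespace == "" {
		return "/shurli/kad/1.0.0", nil
	}
	if !dnsLabel.MatchString(namespace) {
		return "", fmt.Errorf("invalid namespace %q: want 1-63 lowercase alphanumerics or hyphens", namespace)
	}
	return "/shurli/" + namespace + "/kad/1.0.0", nil
}

func main() {
	for _, ns := range []string{"", "my-crew", "My_Crew"} {
		prefix, err := dhtProtocolPrefix(ns)
		fmt.Printf("%q -> %q %v\n", ns, prefix, err)
	}
}
```

Because the namespace is part of the protocol ID itself, two nodes with different namespaces never negotiate a common DHT protocol, which is where the isolation comes from.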
Batch I: Adaptive Multi-Interface Path Selection
Probes all available network interfaces at startup, tests each path to peers, picks the best, and continuously monitors for network changes. Path ranking: direct IPv6 > direct IPv4 > STUN-punched > peer relay > VPS relay. Zero new dependencies.
- I-a: Interface Discovery & IPv6 Awareness - DiscoverInterfaces() enumerates all network interfaces with global unicast classification
- I-b: Parallel Dial Racing - parallel racing replaces sequential 45s worst-case. First success wins.
- I-c: Path Quality Visibility - PathTracker with per-peer path info: type, transport, IP version, RTT. GET /v1/paths API endpoint.
- I-d: Network Change Monitoring - event-driven detection of interface/address changes with callbacks
- I-e: STUN-Assisted Hole-Punching - zero-dependency RFC 5389 STUN client. NAT type classification (none/full-cone/address-restricted/port-restricted/symmetric).
- I-f: Every-Peer-Is-A-Relay - any peer with a global IP auto-enables circuit relay v2 with conservative limits
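Parallel dial racing (I-b) can be sketched with goroutines and a shared context: every candidate address is dialed at once, the first success is returned, and the rest are cancelled. The names `raceDial` and `simulatedDialer` are hypothetical, dialing is simulated with timers, and the sketch deliberately ignores the path ranking described above and simply takes the fastest success.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type dialFunc func(ctx context.Context, addr string) error

// raceDial dials all addrs concurrently and returns the first that succeeds.
// If every dial fails, the last error observed is returned.
func raceDial(timeout time.Duration, addrs []string, dial dialFunc) (string, error) {
	if len(addrs) == 0 {
		return "", errors.New("no addresses to dial")
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // cancels the losers once a winner is known

	type result struct {
		addr string
		err  error
	}
	results := make(chan result, len(addrs)) // buffered so losers never block
	for _, a := range addrs {
		go func(addr string) {
			results <- result{addr, dial(ctx, addr)}
		}(a)
	}

	var lastErr error
	for range addrs {
		r := <-results
		if r.err == nil {
			return r.addr, nil
		}
		lastErr = r.err
	}
	return "", lastErr
}

// simulatedDialer stands in for a real dial: known addresses "connect" after
// the given latency in milliseconds, unknown ones fail immediately.
func simulatedDialer(latencyMs map[string]int) dialFunc {
	return func(ctx context.Context, addr string) error {
		ms, ok := latencyMs[addr]
		if !ok {
			return errors.New("unreachable: " + addr)
		}
		select {
		case <-time.After(time.Duration(ms) * time.Millisecond):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	dial := simulatedDialer(map[string]int{"relay": 200, "direct-ipv6": 20})
	winner, err := raceDial(2*time.Second, []string{"relay", "direct-ipv6", "down"}, dial)
	fmt.Println(winner, err)
}
```

With racing, the worst case is bounded by the slowest single dial (or the timeout) rather than the sum of all dial timeouts, which is the improvement over the sequential approach.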
Post-I-1: Frictionless Relay Pairing
Eliminates manual SSH + peer ID exchange for relay onboarding. Relay admin generates pairing codes, each person joins with one command.
- v1 cleartext deleted - zero downgrade surface
- Extended authorized_keys format - key=value attributes: expires=<RFC3339>, verified=sha256:<prefix>
- In-memory token store (relay-side) - SHA-256 hashed tokens, constant-time comparison, max 3 failed attempts before burn
- v2 invite code format - 16-byte token, relay address + namespace encoded. Shorter than v1 (126 vs 186 chars)
- Connection gater enrollment mode - probationary peers (max 10, 15s timeout) during active pairing
- SAS verification (OMEMO-style) - 4-emoji + 6-digit numeric fingerprint. Persistent [UNVERIFIED] badge until verified.
- Relay pairing protocol - /shurli/relay-pair/1.0.0 stream protocol. 8-step flow.
- shurli relay pair - generates pairing codes with --count N, --ttl, --namespace, --expires
- Daemon-first commands - shurli ping and shurli traceroute try daemon API first, fall back to standalone
- Reachability grade - A (public IPv6), B (public IPv4 or hole-punchable NAT), C (port-restricted NAT), D (symmetric NAT/CGNAT), F (offline)
Zero new dependencies. Binary size unchanged at 28MB.
Industry References
- Juniper JunOS commit confirmed: Apply config, auto-revert if not confirmed. Prevents lockout on remote devices.
- Cisco IOS configure replace: Atomic config replacement with automatic rollback on failure.
- MikroTik Safe Mode: Track all changes; revert everything if connection drops.
- Kubernetes liveness/readiness probes: Health endpoints that trigger automatic restart on failure.
- systemd WatchdogSec: Process heartbeat - systemd restarts if process stops responding.
libp2p Specification References
- Circuit Relay v2: Specification - reservation-based relay with configurable resource limits
- DCUtR: Specification - Direct Connection Upgrade through Relay (hole punching coordination)
- AutoNAT v2: Specification - per-address reachability testing with amplification prevention
- Hole Punching Measurement: Study - 4.4M traversal attempts, 85K+ networks, 167 countries, ~70% success rate