Architecture
This document describes the technical architecture of Shurli, from current implementation to future vision.
Table of Contents
- Current Architecture (Phase 4C Complete) - what’s built and working
- Target Architecture (Phase 6+) - planned additions
- Observability (Batch H) - Prometheus metrics, audit logging
- Adaptive Path Selection (Batch I) - interface discovery, dial racing, STUN, peer relay
- Core Concepts - implemented patterns
- Security Model - implemented + planned extensions
- Naming System - local names implemented, network-scoped and blockchain planned
- Federation Model - planned (Phase 10)
- Mobile Architecture - planned (Phase 9)
Current Architecture (Phase 4C Complete)
Component Overview
Shurli/
├── cmd/
│ ├── shurli/ # Single binary with subcommands
│ │ ├── main.go # Command dispatch (daemon, ping, traceroute, resolve,
│ │ │ # proxy, whoami, auth, relay, config, service,
│ │ │ # invite, join, status, init, version)
│ │ ├── cmd_daemon.go # Daemon mode + client subcommands (status, stop, ping, etc.)
│ │ ├── serve_common.go # Shared P2P runtime (serveRuntime) - used by daemon
│ │ ├── cmd_init.go # Interactive setup wizard
│ │ ├── cmd_proxy.go # TCP proxy client
│ │ ├── cmd_ping.go # Standalone P2P ping (continuous, stats)
│ │ ├── cmd_traceroute.go # Standalone P2P traceroute
│ │ ├── cmd_resolve.go # Standalone name resolution
│ │ ├── cmd_whoami.go # Show own peer ID
│ │ ├── cmd_auth.go # Auth add/list/remove/validate subcommands
│ │ ├── cmd_relay.go # Relay add/list/remove subcommands
│ │ ├── cmd_service.go # Service add/list/remove subcommands
│ │ ├── cmd_config.go # Config validate/show/rollback/apply/confirm
│ │ ├── cmd_invite.go # Generate invite code + QR + P2P handshake (--non-interactive)
│ │ ├── cmd_join.go # Decode invite, connect, auto-configure (--non-interactive, env var)
│ │ ├── cmd_status.go # Local status: version, peer ID, config, services, peers
│ │ ├── cmd_verify.go # SAS verification (4-emoji fingerprint)
│ │ ├── cmd_relay_serve.go # Relay server: serve/authorize/info/config
│ │ ├── cmd_relay_pair.go # Relay pairing code generation
│ │ ├── cmd_relay_setup.go # Relay interactive setup wizard
│ │ ├── config_template.go # Shared node config YAML template (single source of truth)
│ │ ├── relay_input.go # Flexible relay address parsing (IP, IP:PORT, multiaddr)
│ │ ├── flag_helpers.go # Shared CLI flag parsing helpers
│ │ └── exit.go # Testable os.Exit wrapper
│
├── pkg/p2pnet/ # Importable P2P library
│ ├── network.go # Core network setup, relay helpers, name resolution
│ ├── service.go # Service registry (register/unregister, expose/unexpose)
│ ├── proxy.go # Bidirectional TCP↔Stream proxy with half-close + byte counting
│ ├── naming.go # Local name resolution (name → peer ID)
│ ├── identity.go # Identity helpers (delegates to internal/identity)
│ ├── ping.go # Shared P2P ping logic (PingPeer, ComputePingStats)
│ ├── traceroute.go # Shared P2P traceroute (TracePeer, hop analysis)
│ ├── verify.go # SAS verification helpers (emoji fingerprints)
│ ├── reachability.go # Reachability grade calculation (A-F scale)
│ ├── interfaces.go # Interface discovery, IPv6/IPv4 classification
│ ├── pathdialer.go # Parallel dial racing (direct + relay, first wins)
│ ├── pathtracker.go # Per-peer path quality tracking (event-bus driven)
│ ├── netmonitor.go # Network change monitoring (event-driven)
│ ├── stunprober.go # RFC 5389 STUN client, NAT type classification
│ ├── peerrelay.go # Every-peer-is-a-relay (auto-enable with public IP)
│ ├── metrics.go # Prometheus metrics (custom registry, all shurli collectors)
│ ├── audit.go # Structured audit logger (nil-safe, slog-based)
│ └── errors.go # Sentinel errors
│
├── internal/
│ ├── config/ # YAML configuration loading + self-healing
│ │ ├── config.go # Config structs (HomeNode, Client, Relay, unified NodeConfig)
│ │ ├── loader.go # Load, validate, resolve paths, find config
│ │ ├── archive.go # Last-known-good archive/rollback (atomic writes)
│ │ ├── confirm.go # Commit-confirmed pattern (apply/confirm/enforce)
│ │ └── errors.go # Sentinel errors (ErrConfigNotFound, ErrNoArchive, etc.)
│ ├── auth/ # SSH-style authentication
│ │ ├── authorized_keys.go # Parser + ConnectionGater loader
│ │ ├── gater.go # ConnectionGater implementation
│ │ ├── manage.go # AddPeer/RemovePeer/ListPeers (shared by CLI commands)
│ │ └── errors.go # Sentinel errors
│ ├── daemon/ # Daemon API server + client
│ │ ├── types.go # JSON request/response types (StatusResponse, PingRequest, etc.)
│ │ ├── server.go # Unix socket HTTP server, cookie auth, proxy tracking
│ │ ├── handlers.go # HTTP handlers, format negotiation (JSON + text)
│ │ ├── middleware.go # HTTP instrumentation (request timing, path sanitization)
│ │ ├── client.go # Client library for CLI → daemon communication
│ │ ├── errors.go # Sentinel errors (ErrDaemonAlreadyRunning, etc.)
│ │ └── daemon_test.go # Tests (auth, handlers, lifecycle, integration)
│ ├── identity/ # Ed25519 identity management (shared by shurli + relay-server)
│ │ └── identity.go # CheckKeyFilePermissions, LoadOrCreateIdentity, PeerIDFromKeyFile
│ ├── invite/ # Invite code encoding + PAKE handshake
│ │ ├── code.go # Binary -> base32 with dash grouping
│ │ └── pake.go # PAKE key exchange (X25519 DH + HKDF-SHA256 + XChaCha20-Poly1305)
│ ├── relay/ # Relay pairing, admin socket, peer introductions
│ │ ├── tokens.go # Token store (v2 pairing codes, TTL, namespace)
│ │ ├── pairing.go # Relay pairing protocol (/shurli/relay-pair/1.0.0)
│ │ ├── notify.go # Reconnect notifier + peer introduction delivery (/shurli/peer-notify/1.0.0)
│ │ ├── admin.go # Relay admin Unix socket server (cookie auth, /v1/pair)
│ │ └── admin_client.go # HTTP client for relay admin socket (fire-and-forget)
│ ├── reputation/ # Peer interaction tracking
│ │ └── history.go # Append-only interaction log per peer (foundation for PeerManager)
│ ├── qr/ # QR Code encoder for terminal display (inlined from skip2/go-qrcode)
│ │ ├── qrcode.go # Public API: New(), Bitmap(), ToSmallString()
│ │ ├── encoder.go # Data encoding (numeric, alphanumeric, byte modes)
│ │ ├── symbol.go # Module matrix, pattern placement, penalty scoring
│ │ ├── version.go # All 40 QR versions × 4 recovery levels
│ │ ├── gf.go # GF(2^8) arithmetic + Reed-Solomon encoding
│ │ └── bitset.go # Append-only bit array operations
│ ├── termcolor/ # Minimal ANSI terminal colors (replaces fatih/color)
│ │ └── color.go # Green, Red, Yellow, Faint - respects NO_COLOR
│ ├── validate/ # Input validation helpers
│ │ ├── service.go # ServiceName() - DNS-label format for protocol IDs
│ │ ├── network.go # Network address validation (multiaddr, IP, port)
│ │ └── errors.go # Sentinel errors
│ └── watchdog/ # Health monitoring + systemd integration
│ └── watchdog.go # Health check loop, sd_notify (Ready/Watchdog/Stopping)
│
├── relay-server/ # Deployment artifacts
│ ├── setup.sh # Deploy/verify/uninstall (builds shurli, runs relay serve)
│ └── relay-server.service # systemd unit template (installed as shurli-relay.service)
│
├── deploy/ # Service management files
│ ├── shurli-daemon.service # systemd unit for daemon (Linux)
│ └── com.shurli.daemon.plist # launchd plist for daemon (macOS)
│
├── configs/ # Sample configuration files
│ ├── shurli.sample.yaml
│ ├── relay-server.sample.yaml
│ └── authorized_keys.sample
│
├── docs/ # Project documentation
│ ├── ARCHITECTURE.md # This file
│ ├── DAEMON-API.md # Daemon API reference
│ ├── ENGINEERING-JOURNAL.md # Phase-by-phase engineering decisions
│ ├── MONITORING.md # Prometheus + Grafana monitoring guide
│ ├── NETWORK-TOOLS.md # Network diagnostic tools guide
│ ├── ROADMAP.md
│ ├── TESTING.md
│ ├── engineering-journal/ # Detailed per-phase journal entries
│ └── faq/ # FAQ sub-pages (comparisons, security, relay, design, deep dives)
│
└── examples/ # Example implementations
└── basic-service/
Network Topology (Current)
Authentication Flow
Peer Authorization Methods
There are three ways to authorize peers:
1. CLI - shurli auth
shurli auth add <peer-id> --comment "label"
shurli auth list
shurli auth remove <peer-id>
2. Invite/Join flow - zero-touch mutual authorization
Machine A: shurli invite --name home # Generates invite code + QR
Machine B: shurli join <code> --name laptop # Decodes, connects, auto-authorizes both sides
The invite protocol uses PAKE-secured key exchange: ephemeral X25519 DH + token-bound HKDF-SHA256 key derivation + XChaCha20-Poly1305 AEAD encryption. The relay sees only opaque encrypted bytes during pairing. Both peers add each other to authorized_keys and names config automatically. Version byte: 0x01 = PAKE-encrypted invite, 0x02 = relay pairing code. Legacy cleartext protocol was deleted (zero downgrade surface).
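The key-derivation step described above can be sketched with the standard library alone. This is a minimal illustration, not Shurli's actual code: the helper names (`hkdfSHA256`, `deriveKey`, `pairKeys`), the `invite-demo` info label, and the choice to feed the invite token in as the HKDF salt are all assumptions, and the XChaCha20-Poly1305 step is omitted because that cipher lives outside the standard library.

```go
package main

import (
	"crypto/ecdh"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// hkdfSHA256 implements single-block HKDF (RFC 5869) and returns 32 bytes.
func hkdfSHA256(secret, salt, info []byte) []byte {
	extract := hmac.New(sha256.New, salt) // PRK = HMAC(salt, secret)
	extract.Write(secret)
	prk := extract.Sum(nil)

	expand := hmac.New(sha256.New, prk) // T(1) = HMAC(PRK, info || 0x01)
	expand.Write(info)
	expand.Write([]byte{0x01})
	return expand.Sum(nil)
}

// deriveKey is a hypothetical helper: it binds the X25519 shared secret
// to the invite token, so a peer without the token derives a different key.
func deriveKey(priv *ecdh.PrivateKey, peer *ecdh.PublicKey, token []byte) ([]byte, error) {
	shared, err := priv.ECDH(peer)
	if err != nil {
		return nil, err
	}
	return hkdfSHA256(shared, token, []byte("invite-demo")), nil
}

// pairKeys simulates both ends of a pairing, each holding its own token copy.
func pairKeys(tokenA, tokenB []byte) (keyA, keyB []byte, err error) {
	a, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	b, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if keyA, err = deriveKey(a, b.PublicKey(), tokenA); err != nil {
		return nil, nil, err
	}
	if keyB, err = deriveKey(b, a.PublicKey(), tokenB); err != nil {
		return nil, nil, err
	}
	return keyA, keyB, nil
}

func main() {
	keyA, keyB, err := pairKeys([]byte("token"), []byte("token"))
	if err != nil {
		panic(err)
	}
	fmt.Println("keys match:", hmac.Equal(keyA, keyB))
}
```

Because the token enters the derivation, two peers with mismatched tokens end up with different keys and the first AEAD-protected message would fail to decrypt.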
3. Manual - edit authorized_keys file directly
echo "12D3KooW... # home-server" >> ~/.config/shurli/authorized_keys
Target Architecture (Phase 6+)
Planned Additions
Building on the current structure, future phases will add:
Shurli/
├── cmd/
│ ├── shurli/ # ✅ Single binary (daemon, serve, ping, traceroute, resolve,
│ │ # proxy, whoami, auth, relay, config, service, invite, join,
│ │ # status, init, version)
│ └── gateway/ # 🆕 Phase 8: Multi-mode daemon (SOCKS, DNS, TUN)
│
├── pkg/p2pnet/ # ✅ Core library (importable)
│ ├── ...existing...
│ ├── interfaces.go # 🆕 Phase 6: Plugin interfaces (note: pkg/p2pnet/interfaces.go already exists for Batch I interface discovery)
│ └── federation.go # 🆕 Phase 10: Network peering
│
├── internal/
│ ├── config/ # ✅ Configuration + self-healing (archive, commit-confirmed)
│ ├── auth/ # ✅ Authentication
│ ├── identity/ # ✅ Shared identity management
│ ├── validate/ # ✅ Input validation (service names, etc.)
│ ├── watchdog/ # ✅ Health checks + sd_notify
│ ├── transfer/ # 🆕 Phase 6: File transfer plugin
│ └── tun/ # 🆕 Phase 8: TUN/TAP interface
│
├── mobile/ # 🆕 Phase 9: Mobile apps
│ ├── ios/
│ └── android/
│
└── ...existing (relay-server/, configs, docs, examples)
Service Exposure Architecture
Gateway Daemon Modes
Status: Planned (Phase 8) - not yet implemented. See Roadmap Phase 8 for details.
Daemon Architecture
shurli daemon is the single command for running a P2P host. It starts the full P2P lifecycle plus a Unix domain socket API for programmatic control (zero overhead if unused - it’s just a listener).
Shared P2P Runtime
To avoid code duplication, the P2P lifecycle is extracted into serve_common.go:
// serveRuntime holds the shared P2P lifecycle state.
type serveRuntime struct {
	network    *p2pnet.Network
	config     *config.HomeNodeConfig
	configFile string
	gater      *auth.AuthorizedPeerGater // nil if gating disabled
	authKeys   string                    // path to authorized_keys
	ctx        context.Context
	cancel     context.CancelFunc
	version    string
	startTime  time.Time
}
Methods: newServeRuntime(), Bootstrap(), ExposeConfiguredServices(), SetupPingPong(), StartWatchdog(), StartStatusPrinter(), Shutdown().
Daemon Server
The daemon server (internal/daemon/) is decoupled from the CLI via the RuntimeInfo interface:
type RuntimeInfo interface {
	Network() *p2pnet.Network
	ConfigFile() string
	AuthKeysPath() string
	GaterForHotReload() GaterReloader // nil if gating disabled
	Version() string
	StartTime() time.Time
	PingProtocolID() string
}
The serveRuntime struct implements this interface in cmd_daemon.go, keeping the daemon package importable without depending on CLI code.
Cookie-Based Authentication
Every API request requires Authorization: Bearer <token>. The token is a 32-byte random hex string written to ~/.config/shurli/.daemon-cookie with 0600 permissions. This follows the Bitcoin Core / Docker pattern - no plaintext passwords in config, token rotates on restart, same-user access only.
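A minimal sketch of the cookie scheme follows. The function names are hypothetical, and it assumes "32-byte random hex string" means 32 random bytes encoded as 64 hex characters; the constant-time comparison is a common hardening choice rather than something the text above confirms.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// newCookie returns 32 random bytes, hex-encoded (64 characters).
func newCookie() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// writeCookie stores the token readable by the owner only.
// The 0600 mode applies when the file is created by this call.
func writeCookie(path, token string) error {
	return os.WriteFile(path, []byte(token), 0o600)
}

// authorized checks an Authorization header value against the token
// without leaking how many leading characters matched.
func authorized(header, token string) bool {
	got, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

func main() {
	token, err := newCookie()
	if err != nil {
		panic(err)
	}
	dir, err := os.MkdirTemp("", "cookie-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	if err := writeCookie(filepath.Join(dir, ".daemon-cookie"), token); err != nil {
		panic(err)
	}
	fmt.Println("token length:", len(token))
	fmt.Println("accepted:", authorized("Bearer "+token, token))
}
```

Generating a fresh token on every start is what makes the cookie rotate on restart: a client has to re-read the file after the daemon comes back up.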
Stale Socket Detection
No PID files. On startup, the daemon dials the existing socket:
- Connection succeeds → another daemon is alive → return error
- Connection fails → stale socket from a crash → remove and proceed
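The dial-based check can be sketched as below. `claimSocket` and `errDaemonRunning` are hypothetical names for illustration (the real package exposes its own sentinel errors), and the one-second timeout is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

var errDaemonRunning = errors.New("daemon already running")

// claimSocket probes the socket path. A successful dial means a live
// daemon owns it; a failed dial means the file is stale or absent,
// so it is removed and the caller may listen on the path.
func claimSocket(path string) error {
	conn, err := net.DialTimeout("unix", path, time.Second)
	if err == nil {
		conn.Close()
		return errDaemonRunning
	}
	if rmErr := os.Remove(path); rmErr != nil && !errors.Is(rmErr, os.ErrNotExist) {
		return rmErr
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "sock-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "daemon.sock")

	fmt.Println("first start:", claimSocket(path))

	ln, err := net.Listen("unix", path)
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("second start:", claimSocket(path))
}
```

Compared with a PID file, this approach cannot be fooled by PID reuse after a crash, since only a process that is actually accepting connections on the socket answers the dial.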
Unix Socket API
15 HTTP endpoints over Unix domain socket. Every endpoint supports JSON (default) and plain text (?format=text or Accept: text/plain). Full API reference in Daemon API.
Dynamic Proxy Management
The daemon tracks active TCP proxies in memory. Scripts can create proxies via POST /v1/connect and tear them down via DELETE /v1/connect/{id}. All proxies are cleaned up on daemon shutdown.
Auth Hot-Reload
POST /v1/auth and DELETE /v1/auth/{peer_id} modify the authorized_keys file and immediately reload the connection gater via the GaterReloader interface. Access grants and revocations take effect without restart.
Concurrency Model
Background goroutines follow a consistent pattern for lifecycle management:
Ticker + Select Pattern
All recurring background tasks (relay reservation, DHT advertising, status printing, stats logging) use time.Ticker with select on ctx.Done():
go func() {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// do work
		}
	}
}()
This ensures goroutines exit cleanly when the parent context is cancelled (e.g., on Ctrl+C).
Watchdog + sd_notify
Both shurli daemon and shurli relay serve run a watchdog goroutine (internal/watchdog) that performs health checks every 30 seconds:
- shurli daemon: Checks host has listen addresses, relay reservation is active, and Unix socket is responsive
- shurli relay serve: Checks host has listen addresses and protocols are registered
On success, sends WATCHDOG=1 to systemd via the NOTIFY_SOCKET unix datagram socket (pure Go, no CGo). On non-systemd systems (macOS), all sd_notify calls are no-ops. READY=1 is sent after startup completes; STOPPING=1 on shutdown.
The systemd service uses Type=notify and WatchdogSec=90 (3x the 30s check interval) so systemd will restart the process if health checks stop succeeding.
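A pure-Go notification of this kind fits in a few lines. The sketch below is an illustration under assumptions, not the contents of internal/watchdog: `sdNotify` and `notifyTo` are hypothetical names, and error handling is kept minimal.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// notifyTo writes one state string (for example "READY=1", "WATCHDOG=1"
// or "STOPPING=1") to a systemd notification socket. An empty socket
// path is treated as "not running under systemd" and is a no-op.
func notifyTo(sock, state string) (sent bool, err error) {
	if sock == "" {
		return false, nil
	}
	// Go maps a leading '@' in the name to an abstract socket on Linux.
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: sock, Net: "unixgram"})
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// sdNotify reads the socket path from the environment, as systemd sets it.
func sdNotify(state string) (bool, error) {
	return notifyTo(os.Getenv("NOTIFY_SOCKET"), state)
}

func main() {
	sent, err := sdNotify("WATCHDOG=1")
	fmt.Println("sent:", sent, "err:", err)
}
```

Returning a boolean alongside the error lets callers distinguish "systemd is not supervising this process" from a genuine delivery failure.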
Health Check HTTP Endpoint (/healthz)
The relay server optionally exposes a /healthz HTTP endpoint for external monitoring (Prometheus, UptimeKuma, etc.). Disabled by default in config:
health:
  enabled: true
  listen_address: "127.0.0.1:9090"
The endpoint returns JSON with: status, peer_id, version, uptime_seconds, connected_peers, protocols. Bound to localhost by default - not exposed to the internet. The HTTP server starts after the relay service is up and shuts down gracefully on SIGTERM.
Commit-Confirmed Enforcement
When a commit-confirmed is active (shurli config apply --confirm-timeout), serve starts an EnforceCommitConfirmed goroutine that waits for the deadline. If shurli config confirm is not run before the timer fires, the goroutine reverts the config and calls os.Exit(1). Systemd then restarts the process with the restored config.
Graceful Shutdown
Long-running commands (daemon, proxy, relay serve) handle SIGINT/SIGTERM by calling cancel() on their root context, which propagates to all background goroutines. The daemon also accepts shutdown requests via the API (POST /v1/shutdown). Deferred cleanup (net.Close(), listener.Close(), socket/cookie removal) runs after goroutines stop.
Atomic Counters
Shared counters accessed by concurrent goroutines (e.g., bootstrap peer count) use atomic.Int32 instead of bare int to prevent data races.
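As a small illustration of the pattern, the sketch below counts successful dials from concurrent goroutines. `countConnected` and the `dial` callback are hypothetical stand-ins for the real bootstrap logic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConnected runs n simulated dials concurrently and returns how
// many succeeded. The counter is safe to update from every goroutine.
func countConnected(n int, dial func(i int) bool) int32 {
	var connected atomic.Int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if dial(i) {
				connected.Add(1)
			}
		}(i)
	}
	wg.Wait()
	return connected.Load()
}

func main() {
	ok := countConnected(100, func(i int) bool { return i%2 == 0 })
	fmt.Println("connected:", ok)
}
```

With a bare int in place of atomic.Int32, the increments would race and `go test -race` would flag the access.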
Observability (Batch H)
Status: Implemented - opt-in Prometheus metrics + structured audit logging.
All observability features are disabled by default and opt-in via config:
telemetry:
  metrics:
    enabled: true
    listen_address: "127.0.0.1:9091"
  audit:
    enabled: true
Prometheus Metrics (pkg/p2pnet/metrics.go): Uses an isolated prometheus.Registry (not the global default) for testability and collision-free operation. When enabled, libp2p.PrometheusRegisterer(reg) exposes all built-in libp2p metrics (swarm, holepunch, autonat, rcmgr, relay) alongside custom shurli metrics. When disabled, libp2p.DisableMetrics() is called for zero CPU overhead.
Custom shurli metrics:
- shurli_proxy_bytes_total{direction, service} - bytes transferred through proxy
- shurli_proxy_connections_total{service} - proxy connections established
- shurli_proxy_active_connections{service} - currently active proxy sessions
- shurli_proxy_duration_seconds{service} - proxy session duration
- shurli_auth_decisions_total{decision} - auth allow/deny counts
- shurli_holepunch_total{result} - hole punch success/failure
- shurli_holepunch_duration_seconds{result} - hole punch timing
- shurli_daemon_requests_total{method, path, status} - API request counts
- shurli_daemon_request_duration_seconds{method, path, status} - API latency
- shurli_info{version, go_version} - build information
Audit Logger (pkg/p2pnet/audit.go): Structured JSON events via log/slog with an audit group. All methods are nil-safe (no-op when audit is disabled). Events: auth decisions, service ACL denials, daemon API access, auth changes.
Daemon Middleware (internal/daemon/middleware.go): Wraps the HTTP handler chain (outside auth middleware) to capture request timing and status codes. Path parameters are sanitized (e.g., /v1/auth/12D3KooW... becomes /v1/auth/:id) to prevent high cardinality in metrics labels.
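Path sanitization of this kind can be sketched as a prefix rewrite. The `sanitizePath` function and the `idPrefixes` list are assumptions made for illustration; the actual middleware may recognize routes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// idPrefixes lists route prefixes whose next path segment is an
// identifier (hypothetical list for this sketch).
var idPrefixes = []string{"/v1/auth/", "/v1/connect/"}

// sanitizePath replaces an identifier segment with ":id" so that the
// metric label set stays bounded no matter how many peers exist.
func sanitizePath(p string) string {
	for _, prefix := range idPrefixes {
		rest, ok := strings.CutPrefix(p, prefix)
		if !ok || rest == "" {
			continue
		}
		tail := ""
		if i := strings.IndexByte(rest, '/'); i >= 0 {
			tail = rest[i:] // keep anything after the identifier
		}
		return prefix + ":id" + tail
	}
	return p
}

func main() {
	fmt.Println(sanitizePath("/v1/auth/12D3KooWExample"))
	fmt.Println(sanitizePath("/v1/status"))
}
```

Without this step every distinct peer ID would create a new time series, which is the usual way a Prometheus instance runs out of memory.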
Auth Decision Callback: Uses a callback pattern (auth.AuthDecisionFunc) to decouple internal/auth from pkg/p2pnet, avoiding circular imports. The callback is wired in serve_common.go to feed both metrics counters and audit events.
Relay Metrics: When both health and metrics are enabled on the relay, /metrics is added to the existing /healthz HTTP mux. When only metrics is enabled, a dedicated HTTP server is started.
Grafana Dashboard: A pre-built dashboard (grafana/shurli-dashboard.json) ships with the project. Import it into any Grafana instance to visualize proxy throughput, auth decisions, hole punch success rates, API latency, and system metrics. 29 panels across 6 sections: Overview, Proxy Throughput, Security, Hole Punch, Daemon API, and System.
Reference: pkg/p2pnet/metrics.go, pkg/p2pnet/audit.go, internal/daemon/middleware.go, cmd/shurli/serve_common.go, grafana/shurli-dashboard.json
Adaptive Path Selection (Batch I)
Status: Implemented - interface discovery, parallel dial racing, path tracking, network change monitoring, STUN probing, every-peer-is-a-relay.
Six components work together to find and maintain the best connection path to each peer:
Interface Discovery (pkg/p2pnet/interfaces.go): DiscoverInterfaces() enumerates all network interfaces and classifies addresses as global IPv4, global IPv6, or loopback. Returns an InterfaceSummary with convenience flags (HasGlobalIPv6, HasGlobalIPv4). Called at startup and on every network change.
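The address classification step might look like the sketch below, which uses only net/netip. The `classify` function and its string labels are hypothetical; it treats private and link-local ranges as "other", and it does not special-case the CGNAT range (100.64.0.0/10), which a real implementation would likely need to handle.

```go
package main

import (
	"fmt"
	"net/netip"
)

// classify labels a single address as loopback, global IPv4,
// global IPv6, or other (private, link-local, multicast, ...).
func classify(addr netip.Addr) string {
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	switch {
	case addr.IsLoopback():
		return "loopback"
	case !addr.IsGlobalUnicast() || addr.IsPrivate():
		return "other"
	case addr.Is4():
		return "global-ipv4"
	default:
		return "global-ipv6"
	}
}

// classifyString parses s first; unparsable input yields "invalid".
func classifyString(s string) string {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "invalid"
	}
	return classify(addr)
}

func main() {
	for _, s := range []string{"127.0.0.1", "192.168.1.5", "8.8.8.8", "2001:4860:4860::8888", "fe80::1"} {
		fmt.Printf("%-22s %s\n", s, classifyString(s))
	}
}
```

Flags such as HasGlobalIPv6 would then be derived by checking whether any discovered address falls into the corresponding class.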
Parallel Dial Racing (pkg/p2pnet/pathdialer.go): PathDialer.DialPeer() replaces the old sequential connect (DHT 15s then relay 30s = 45s worst case) with parallel racing. If the peer is already connected, returns immediately. Otherwise fires DHT and relay strategies concurrently; first success wins, loser is cancelled. Classifies winning path as DIRECT or RELAYED based on multiaddr inspection.
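The racing idea can be shown independently of libp2p. In the sketch below, `strategy`, `race`, `sleepDial`, and `simulate` are hypothetical names, and the dials are simulated with timers rather than real network calls.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type strategy struct {
	name string
	dial func(ctx context.Context) error
}

// race starts every strategy concurrently and returns the name of the
// first one to succeed. Returning cancels the shared context, which
// tells the losers to stop.
func race(ctx context.Context, strategies ...strategy) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	type result struct {
		name string
		err  error
	}
	results := make(chan result, len(strategies)) // buffered: losers never block
	for _, s := range strategies {
		go func(s strategy) { results <- result{s.name, s.dial(ctx)} }(s)
	}
	var errs []error
	for range strategies {
		r := <-results
		if r.err == nil {
			return r.name, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", r.name, r.err))
	}
	return "", errors.Join(errs...)
}

// sleepDial simulates a dial that takes d and then returns err.
func sleepDial(d time.Duration, err error) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return err
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// simulate races a "dht" and a "relay" strategy with the given latencies.
func simulate(dhtMs, relayMs int, dhtOK, relayOK bool) (string, error) {
	outcome := func(ok bool) error {
		if ok {
			return nil
		}
		return errors.New("unreachable")
	}
	return race(context.Background(),
		strategy{"dht", sleepDial(time.Duration(dhtMs)*time.Millisecond, outcome(dhtOK))},
		strategy{"relay", sleepDial(time.Duration(relayMs)*time.Millisecond, outcome(relayOK))},
	)
}

func main() {
	winner, err := simulate(10, 300, true, true)
	fmt.Println("winner:", winner, "err:", err)
}
```

The worst case becomes the slower of the two timeouts instead of their sum, which is where the improvement over the sequential 45s path comes from.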
Path Quality Tracking (pkg/p2pnet/pathtracker.go): PathTracker subscribes to libp2p’s event bus (EvtPeerConnectednessChanged) for connect/disconnect events. Maintains per-peer path info: path type, transport (quic/tcp), IP version, connected time, last RTT. Exposed via GET /v1/paths daemon API. Prometheus labels: path_type, transport, ip_version.
Network Change Monitoring (pkg/p2pnet/netmonitor.go): NetworkMonitor watches for interface/address changes by polling DiscoverInterfaces() and diffing against the previous snapshot. On change, fires registered callbacks. Triggers: interface re-scan, STUN re-probe, peer relay auto-detect update.
STUN NAT Detection (pkg/p2pnet/stunprober.go): Zero-dependency RFC 5389 STUN client. Probes multiple STUN servers concurrently, collects external addresses, classifies NAT type (none, full-cone, address-restricted, port-restricted, symmetric). HolePunchable() indicates whether DCUtR hole-punching is likely to succeed. Runs in background at startup (non-blocking) and re-probes on network change.
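For readers unfamiliar with the wire format, the sketch below builds an RFC 5389 Binding Request header and decodes an IPv4 XOR-MAPPED-ADDRESS value. It covers only these two pieces; the function names are hypothetical, and attribute parsing, retransmission, and NAT classification are left out.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

const (
	magicCookie    = 0x2112A442 // fixed value from RFC 5389
	bindingRequest = 0x0001
)

// buildBindingRequest returns a 20-byte STUN header with no attributes:
// type, length, magic cookie, and a random 96-bit transaction ID.
func buildBindingRequest() ([]byte, error) {
	msg := make([]byte, 20)
	binary.BigEndian.PutUint16(msg[0:2], bindingRequest)
	binary.BigEndian.PutUint16(msg[2:4], 0) // body length
	binary.BigEndian.PutUint32(msg[4:8], magicCookie)
	_, err := rand.Read(msg[8:20])
	return msg, err
}

// decodeXORMappedV4 decodes the value of an XOR-MAPPED-ADDRESS
// attribute for the IPv4 family (reserved byte, family, port, address).
func decodeXORMappedV4(val []byte) (netip.AddrPort, error) {
	if len(val) != 8 || val[1] != 0x01 {
		return netip.AddrPort{}, errors.New("not an IPv4 XOR-MAPPED-ADDRESS value")
	}
	port := binary.BigEndian.Uint16(val[2:4]) ^ uint16(magicCookie>>16)
	raw := binary.BigEndian.Uint32(val[4:8]) ^ magicCookie
	var ip [4]byte
	binary.BigEndian.PutUint32(ip[:], raw)
	return netip.AddrPortFrom(netip.AddrFrom4(ip), port), nil
}

func main() {
	req, err := buildBindingRequest()
	if err != nil {
		panic(err)
	}
	fmt.Printf("request: %x\n", req)

	// Attribute value taken from the RFC 5769 sample response.
	ap, err := decodeXORMappedV4([]byte{0x00, 0x01, 0xa1, 0x47, 0xe1, 0x12, 0xa6, 0x43})
	fmt.Println(ap, err)
}
```

Comparing the decoded external address and port across several servers is the usual basis for telling a symmetric NAT (mapping changes per destination) from the cone variants.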
Every-Peer-Is-A-Relay (pkg/p2pnet/peerrelay.go): Any peer with a detected global IP auto-enables circuit relay v2 with conservative resource limits (4 reservations, 16 circuits, 128KB/direction, 10min sessions). Uses the existing ConnectionGater for authorization (no new ACL needed). Auto-detects on startup and network changes. Disables when public IP is lost.
Path Ranking: direct IPv6 > direct IPv4 > STUN-punched > peer relay > VPS relay. If all paths fail, the system falls back to relay and tells the user honestly.
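The ranking can be expressed as an ordered enumeration where a lower value means a more preferred path. The type and function names here are hypothetical and only illustrate the ordering stated above.

```go
package main

import "fmt"

// pathKind values are declared in preference order, best first.
type pathKind int

const (
	directIPv6 pathKind = iota
	directIPv4
	stunPunched
	peerRelay
	vpsRelay
)

func (k pathKind) String() string {
	return [...]string{"direct IPv6", "direct IPv4", "STUN-punched", "peer relay", "VPS relay"}[k]
}

// best returns the most preferred path among the candidates, or
// false when there are none.
func best(candidates []pathKind) (pathKind, bool) {
	if len(candidates) == 0 {
		return 0, false
	}
	b := candidates[0]
	for _, c := range candidates[1:] {
		if c < b {
			b = c
		}
	}
	return b, true
}

func main() {
	p, ok := best([]pathKind{vpsRelay, directIPv4, peerRelay})
	fmt.Println(p, ok)
}
```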
Reference: pkg/p2pnet/interfaces.go, pkg/p2pnet/pathdialer.go, pkg/p2pnet/pathtracker.go, pkg/p2pnet/netmonitor.go, pkg/p2pnet/stunprober.go, pkg/p2pnet/peerrelay.go, cmd/shurli/serve_common.go
Core Concepts
1. Service Definition
Services are defined in configuration and registered at runtime:
type Service struct {
	Name         string // "ssh", "web", etc.
	Protocol     string // "/shurli/ssh/1.0.0"
	LocalAddress string // "localhost:22"
	Enabled      bool   // Enable/disable
}

type ServiceRegistry struct {
	services map[string]*Service
	host     host.Host
}

func (r *ServiceRegistry) RegisterService(svc *Service) error {
	// Set up stream handler for this service's protocol
	r.host.SetStreamHandler(protocol.ID(svc.Protocol), func(s network.Stream) {
		defer s.Close()
		// 1. Authorize peer
		if !r.isAuthorized(s.Conn().RemotePeer(), svc.Name) {
			return
		}
		// 2. Dial local service
		localConn, err := net.Dial("tcp", svc.LocalAddress)
		if err != nil {
			return
		}
		defer localConn.Close()
		// 3. Bidirectional proxy (simplified: no half-close handling)
		go io.Copy(s, localConn)
		io.Copy(localConn, s)
	})
	return nil
}
2. Bidirectional TCP↔Stream Proxy
func ProxyStreamToTCP(stream network.Stream, tcpAddr string) error {
	// Connect to local TCP service
	tcpConn, err := net.Dial("tcp", tcpAddr)
	if err != nil {
		return err
	}
	defer tcpConn.Close()
	// Bidirectional copy
	errCh := make(chan error, 2)
	go func() {
		_, err := io.Copy(tcpConn, stream)
		errCh <- err
	}()
	go func() {
		_, err := io.Copy(stream, tcpConn)
		errCh <- err
	}()
	// Wait for either direction to finish
	return <-errCh
}
3. Name Resolution
Currently implemented: LocalFileResolver resolves friendly names (configured via shurli invite/shurli join or manual YAML) to peer IDs. Direct peer ID strings are always accepted as fallback.
type LocalFileResolver struct {
	names map[string]peer.ID
}

func (r *LocalFileResolver) Resolve(name string) (peer.ID, error) {
	if id, ok := r.names[name]; ok {
		return id, nil
	}
	return "", ErrNotFound
}
Planned (Phase 6/11): The NameResolver interface, DHTResolver, multi-tier chaining, and blockchain naming are planned extensions. See Naming System below and Roadmap Phase 11.
Security Model
Authentication Layers
Layer 1: Network Level (ConnectionGater)
- Executed during connection handshake
- Blocks unauthorized peers before any data exchange
- Fastest rejection (minimal resource usage)
Layer 2: Protocol Level (Stream Handler)
- Defense-in-depth validation
- Per-service authorization (optional)
- Can override global authorized_keys
Per-Service Authorization
Status: Implemented (Pre-Batch H)
Each service can optionally restrict access to specific peer IDs via allowed_peers. When set, only listed peers can connect to that service. When omitted (nil), all globally authorized peers can access it.
services:
  ssh:
    enabled: true
    local_address: "localhost:22"
    allowed_peers: ["12D3KooW..."] # Only these peers can access SSH
  web:
    enabled: true
    local_address: "localhost:80"
    # No allowed_peers = all authorized peers can access
The ACL check runs in the stream handler before dialing the local TCP service, so rejected peers never trigger a connection to the backend.
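The nil-versus-set distinction is the core of the check, so a short sketch may help. The `allowed` function is hypothetical, and treating an explicitly empty list as "deny everyone" is an assumption this document does not confirm.

```go
package main

import "fmt"

// allowed reports whether peerID may use a service. A nil list means
// the service has no per-service restriction, so every peer that
// already passed the global gate is accepted.
func allowed(allowedPeers []string, peerID string) bool {
	if allowedPeers == nil {
		return true
	}
	for _, id := range allowedPeers {
		if id == peerID {
			return true
		}
	}
	return false
}

func main() {
	ssh := []string{"12D3KooWAlice"}
	fmt.Println("alice -> ssh:", allowed(ssh, "12D3KooWAlice"))
	fmt.Println("bob   -> ssh:", allowed(ssh, "12D3KooWBob"))
	fmt.Println("bob   -> web:", allowed(nil, "12D3KooWBob"))
}
```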
Federation Trust Model
Status: Planned (Phase 10) - not yet implemented. See Federation Model and Roadmap Phase 10.
# relay-server.yaml (planned config format)
federation:
  peers:
    - network_name: "alice"
      relay: "/ip4/.../p2p/..."
      trust_level: "full" # Bidirectional routing
    - network_name: "bob"
      relay: "/ip4/.../p2p/..."
      trust_level: "one_way" # Only bob → grewal, not grewal → bob
Naming System
Multi-Tier Resolution
What works today: Tier 1 (Local Override) - friendly names configured via shurli invite/join or manual YAML - and the Direct Peer ID fallback. Tiers 2-3 (Network-Scoped, Blockchain) are planned for Phase 8/11.
Network-Scoped Name Format
Status: Planned (Phase 8/11) - not yet implemented. Currently only simple names work (e.g., home, laptop as configured in local YAML). The dotted network format below is a future design.
Format: <hostname>.<network>[.<tld>]
Examples (planned):
laptop.grewal # Query grewal relay
desktop.alice # Query alice relay
phone.bob.p2p # Query bob relay (explicit .p2p TLD)
home.grewal.local # mDNS compatible
Federation Model
Status: Planned (Phase 10) - not yet implemented. See Roadmap Phase 10.
Relay Peering
Mobile Architecture
Status: Planned (Phase 9) - not yet implemented. See Roadmap Phase 9.
Performance Considerations
Transport Preference
Both shurli daemon and shurli relay serve register transports in this order:
- QUIC (preferred) - 3 RTTs to establish, native multiplexing, better for hole-punching. libp2p’s smart dialing (built into v0.47.0) ranks QUIC addresses higher than TCP.
- TCP - 4 RTTs, universal fallback for networks that block UDP.
- WebSocket - Anti-censorship transport that looks like HTTPS to deep packet inspection (DPI). Commented out by default in sample configs.
AutoNAT v2
Enabled on all hosts. AutoNAT v2 performs per-address reachability testing with nonce-based dial verification. This means the node knows which specific addresses (IPv4, IPv6, QUIC, TCP) are publicly reachable, rather than a single “public or private” determination. Also prevents amplification attacks by requiring the probing peer to prove it controls the claimed address.
Version in Identify Protocol
All hosts set libp2p.UserAgent() so peers can discover each other’s software version via the Identify protocol:
- shurli nodes: shurli/<version> (e.g., shurli/0.1.0 or shurli/dev)
- relay server: relay-server/<version>
The UserAgent is stored in each peer’s peerstore under the AgentVersion key after the Identify handshake completes (automatically on connect).
Connection Optimization
Relay vs Direct (implemented):
- Always attempt DCUtR for direct connection
- Fall back to relay if hole-punching fails
Connection Pooling (planned):
- Reuse P2P streams for multiple requests
- Multiplex services over single connection
- Keep-alive mechanisms
Bandwidth Management (planned):
- QoS for different service types
- Rate limiting per service
- Bandwidth monitoring and alerts
Items marked “planned” are tracked in the Roadmap under Phase 4C deferred items and Phase 12+.
Security Hardening
Relay Resource Limits
The relay server enforces resource limits via libp2p’s circuit relay v2 WithResources() and WithLimit() options. All limits are configurable in relay-server.yaml under the resources: section. Defaults are tuned for a private relay serving 2-10 peers with SSH/XRDP workloads:
| Parameter | Default | Description |
|---|---|---|
| max_reservations | 128 | Total active relay slots |
| max_circuits | 16 | Open relay connections per peer |
| max_reservations_per_ip | 8 | Reservations per source IP |
| max_reservations_per_asn | 32 | Reservations per AS number |
| reservation_ttl | 1h | Reservation lifetime |
| session_duration | 10m | Max per-session duration |
| session_data_limit | 64MB | Max data per session per direction |
Session duration and data limits are raised from libp2p defaults (2min/128KB) to support real workloads (SSH, XRDP, file transfers). Zero-valued fields in config are filled with defaults at load time.
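Filling zero-valued fields with defaults can be sketched as follows. The `resources` struct and its field names are hypothetical mirrors of the config keys, and 64MB is interpreted here as 64 MiB, which is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// resources mirrors the relay limits listed in the table above.
type resources struct {
	MaxReservations       int
	MaxCircuits           int
	MaxReservationsPerIP  int
	MaxReservationsPerASN int
	ReservationTTL        time.Duration
	SessionDuration       time.Duration
	SessionDataLimit      int64 // bytes per direction
}

func defaults() resources {
	return resources{
		MaxReservations:       128,
		MaxCircuits:           16,
		MaxReservationsPerIP:  8,
		MaxReservationsPerASN: 32,
		ReservationTTL:        time.Hour,
		SessionDuration:       10 * time.Minute,
		SessionDataLimit:      64 << 20,
	}
}

// withDefaults replaces every zero-valued field with its default and
// leaves explicitly configured values untouched.
func withDefaults(r resources) resources {
	d := defaults()
	if r.MaxReservations == 0 {
		r.MaxReservations = d.MaxReservations
	}
	if r.MaxCircuits == 0 {
		r.MaxCircuits = d.MaxCircuits
	}
	if r.MaxReservationsPerIP == 0 {
		r.MaxReservationsPerIP = d.MaxReservationsPerIP
	}
	if r.MaxReservationsPerASN == 0 {
		r.MaxReservationsPerASN = d.MaxReservationsPerASN
	}
	if r.ReservationTTL == 0 {
		r.ReservationTTL = d.ReservationTTL
	}
	if r.SessionDuration == 0 {
		r.SessionDuration = d.SessionDuration
	}
	if r.SessionDataLimit == 0 {
		r.SessionDataLimit = d.SessionDataLimit
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", withDefaults(resources{MaxCircuits: 4}))
}
```

One consequence of this convention is that a limit cannot be set to zero explicitly, since zero is indistinguishable from "not configured".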
Key File Permission Verification
Private key files are verified on load to ensure they are not readable by group or others. The shared internal/identity package provides CheckKeyFilePermissions() and LoadOrCreateIdentity(), used by both shurli daemon and shurli relay serve:
- Expected: 0600 (owner read/write only)
- On violation: Returns error with actionable fix: chmod 600 <path>
- Windows: Check is skipped (Windows uses ACLs, not POSIX permissions)
Keys are already created with 0600 permissions, but this check catches degradation from manual chmod, file copies across systems, or archive extraction.
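A permission check along these lines can be written with os.Stat and a bit mask. `tooOpen` and `checkKeyFile` are hypothetical names for this sketch and are not the signatures of the functions in internal/identity.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"runtime"
)

// tooOpen reports whether any group or other permission bit is set.
func tooOpen(mode fs.FileMode) bool {
	return mode.Perm()&0o077 != 0
}

// checkKeyFile returns an error with a suggested fix when the key file
// is accessible beyond its owner. The check is skipped on Windows.
func checkKeyFile(path string) error {
	if runtime.GOOS == "windows" {
		return nil
	}
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if tooOpen(info.Mode()) {
		return fmt.Errorf("key file %s has permissions %04o; fix with: chmod 600 %s",
			path, uint32(info.Mode().Perm()), path)
	}
	return nil
}

func main() {
	f, err := os.CreateTemp("", "key-demo")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.Close()

	fmt.Println("as created:", checkKeyFile(f.Name()))
	if err := os.Chmod(f.Name(), 0o644); err != nil {
		panic(err)
	}
	fmt.Println("after chmod 644:", checkKeyFile(f.Name()))
}
```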
Config Self-Healing
The config system provides three layers of protection against bad configuration:
- Archive/Rollback (internal/config/archive.go): On each successful daemon or relay serve startup, the validated config is archived as .{name}.last-good.yaml next to the original. If a future edit breaks the config, shurli config rollback restores it. Archive writes are atomic (write temp file + rename).
- Commit-Confirmed (internal/config/confirm.go): For remote config changes, shurli config apply backs up the current config, applies the new one, and writes a pending marker with a deadline. If shurli config confirm is not run before the deadline, the serve process reverts the config and exits. Systemd restarts with the restored config.
- Validation CLI (shurli config validate): Check config syntax and required fields without starting the node. Useful before restarting a remote service.
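The atomic archive write (temp file + rename) can be sketched as below. The helper names are hypothetical, and `archivePath` assumes that {name} means the config file's base name without its extension, which the text does not spell out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// atomicWrite writes data to a temp file in the target directory and
// renames it into place, so readers see the old or the new file, never
// a partial one.
func atomicWrite(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// archivePath maps "dir/config.yaml" to "dir/.config.last-good.yaml".
func archivePath(configPath string) string {
	dir, base := filepath.Split(configPath)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return filepath.Join(dir, "."+name+".last-good.yaml")
}

// roundTrip archives content in a scratch directory and reads it back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "archive-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	dst := archivePath(filepath.Join(dir, "config.yaml"))
	if err := atomicWrite(dst, []byte(content), 0o600); err != nil {
		return "", err
	}
	got, err := os.ReadFile(dst)
	return string(got), err
}

func main() {
	got, err := roundTrip("identity:\n  key_file: identity.key\n")
	fmt.Printf("%q %v\n", got, err)
}
```

Creating the temp file in the same directory as the target matters: a rename is only atomic within one filesystem.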
Service Name Validation
Service names are validated before use in protocol IDs to prevent injection attacks. Names flow into fmt.Sprintf("/shurli/%s/1.0.0", name) - without validation, a name like ssh/../../evil or foo\nbar creates ambiguous or invalid protocol IDs.
The validation logic lives in internal/validate/service.go (validate.ServiceName()), shared by all callers.
Validation rules (DNS-label format):
- 1-63 characters
- Lowercase alphanumeric and hyphens only
- Must start and end with alphanumeric character
- Regex: ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$
Validated at four points:
- shurli service add - rejects bad names at CLI entry
- ValidateNodeConfig() - rejects bad names in config before startup
- ExposeService() - rejects bad names at service registration time
- ConnectToService() - rejects bad names at connection time
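The DNS-label rule translates directly into a compiled regular expression. The sketch below uses the regex and protocol ID format quoted in this section; `validServiceName` and `protocolID` are hypothetical names rather than the real validate.ServiceName() signature.

```go
package main

import (
	"fmt"
	"regexp"
)

// serviceNameRE is the DNS-label pattern: 1-63 characters, lowercase
// alphanumerics and hyphens, alphanumeric at both ends.
var serviceNameRE = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)

func validServiceName(name string) bool {
	return serviceNameRE.MatchString(name)
}

// protocolID builds the protocol ID only after the name has passed
// validation, so slashes or newlines can never reach the format string.
func protocolID(name string) (string, error) {
	if !validServiceName(name) {
		return "", fmt.Errorf("invalid service name %q", name)
	}
	return fmt.Sprintf("/shurli/%s/1.0.0", name), nil
}

func main() {
	for _, name := range []string{"ssh", "my-web-1", "ssh/../../evil", "foo\nbar", "-ssh"} {
		id, err := protocolID(name)
		fmt.Printf("%q -> %q, %v\n", name, id, err)
	}
}
```

Go's regexp treats `$` as end of text unless multi-line mode is enabled, so a trailing newline cannot slip past the anchor.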
Security Considerations
Threat Model
Threats Addressed:
- ✅ Unauthorized peer access (ConnectionGater)
- ✅ Man-in-the-middle (libp2p Noise encryption)
- ✅ Replay attacks (Noise protocol nonces)
- ✅ Relay bandwidth theft (relay authentication + resource limits)
- ✅ Relay resource exhaustion (configurable per-peer/per-IP/per-ASN limits)
- ✅ Protocol ID injection (service name validation)
- ✅ Key file permission degradation (0600 check on load)
- ✅ Newline injection in authorized_keys (sanitized comments)
- ✅ YAML injection via peer names (allowlisted characters)
- ✅ OOM via unbounded stream reads (512-byte buffer limits)
- ✅ Symlink attacks on temp files (os.CreateTemp with random suffix)
- ✅ Multiaddr injection in config (validated before writing)
- ✅ Per-service access control (AllowedPeers ACL on each service)
- ✅ Host resource exhaustion (libp2p ResourceManager with auto-scaled limits)
- ✅ SYN/UDP flood on relay (iptables rate limiting, SYN cookies, conntrack tuning)
- ✅ IP spoofing on relay (reverse path filtering via rp_filter)
- ✅ Runaway relay process (systemd cgroup limits: memory, CPU, tasks)
Threats NOT Addressed (out of scope):
- ❌ Relay compromise (relay can see metadata, not content)
- ❌ Peer key compromise (users must secure private keys)
Best Practices
Key Management:
- Private keys: 0600 permissions
- authorized_keys: 0600 permissions
- Never commit keys to git
Network Segmentation:
- Use per-service allowed_peers ACLs when needed
- Limit service exposure (disable unused services)
- Audit authorized_keys regularly
Relay Security:
- Enable relay authentication in production
- Monitor relay bandwidth usage
- Use non-standard ports
Scalability
Current Limitations
- Relay bandwidth: Limited by VPS plan (~1TB/month)
- Connections per relay: Limited by file descriptors (~1000-10000)
- DHT lookups: Slow for large networks (10-30 seconds)
Future Improvements
- Multiple relay failover/load balancing
- Relay-to-relay mesh for redundancy
- Optimized peer routing (shortest path)
- Distributed hash table optimization
- Connection multiplexing
Technology Stack
Core:
- Go 1.26+
- libp2p v0.47.0 (networking)
- Private Kademlia DHT (/shurli/kad/1.0.0 - isolated from IPFS Amino). Optional namespace isolation: discovery.network: "my-crew" produces /shurli/my-crew/kad/1.0.0, creating protocol-level separation between peer groups
- Noise protocol (encryption)
- QUIC transport (preferred - 3 RTTs vs 4 for TCP)
- AutoNAT v2 (per-address reachability testing)
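The DHT namespace mapping mentioned in the list above amounts to a string rule. The sketch below reproduces the two protocol IDs given there; `dhtProtocol` is a hypothetical helper, and how the real code passes the prefix to libp2p is not shown.

```go
package main

import "fmt"

// dhtProtocol returns the Kademlia protocol ID for a discovery network.
// An empty network name selects the default, un-namespaced DHT.
func dhtProtocol(network string) string {
	if network == "" {
		return "/shurli/kad/1.0.0"
	}
	return "/shurli/" + network + "/kad/1.0.0"
}

func main() {
	fmt.Println(dhtProtocol(""))
	fmt.Println(dhtProtocol("my-crew"))
}
```

Peers configured with different network names advertise different protocol IDs and therefore never join each other's routing tables.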
Why libp2p: Shurli’s networking foundation is the same stack used by Ethereum’s consensus layer (Beacon Chain), Filecoin, and Polkadot - networks collectively securing hundreds of billions in value. When Ethereum chose a P2P stack for their most critical infrastructure, they picked libp2p. Improvements driven by these ecosystems (transport optimizations, Noise hardening, gossipsub refinements) flow back to the shared codebase. See the FAQ comparisons for detailed comparisons.
Optional:
- Ethereum (blockchain naming)
- IPFS (distributed storage)
- gomobile (iOS/Android)
Last Updated: 2026-02-25
Architecture Version: 3.2 (Post-I-2 peer-notify, relay admin socket, HMAC proofs, CGNAT detection, cross-network hardening)