Technical Deep Dives

What libp2p improvements has Shurli adopted?

Shurli uses go-libp2p v0.47.0, which includes several improvements from earlier releases that meaningfully improve performance, security, and reliability.

AutoNAT v2 (go-libp2p v0.41.1+)

The old AutoNAT tested “is my node reachable?” as a binary yes/no. v2 tests individual addresses:

	AutoNAT v1	AutoNAT v2
Tests	Whole node reachability	Each address independently
Verification	Trust the dialer’s claim	Nonce-based proof (dial-back)
Amplification risk	Yes (could be spoofed)	No (client must first send 30-100KB when the dial-back IP differs from its observed IP)
IPv4/IPv6	Can’t distinguish	Tests each separately

A Shurli node could know “IPv4 is behind NAT but IPv6 is public” and make smarter connection decisions.

Source: AutoNAT v2 Specification
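To make the per-address idea concrete, here is a minimal Go sketch of how per-address results could be turned into the "IPv4 behind NAT, IPv6 public" conclusion. The `Reachability` type, `AddrResult`, and `publicFamilies` are hypothetical names for illustration, not go-libp2p's AutoNAT v2 API.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Reachability is a hypothetical per-address verdict, the kind of result an
// AutoNAT v2-style check produces for each address it tests.
type Reachability int

const (
	Unknown Reachability = iota
	Public
	Private
)

// AddrResult pairs one tested "ip:port" address with its verdict.
type AddrResult struct {
	Addr   string
	Result Reachability
}

// publicFamilies reports, per address family, whether at least one address
// in that family was confirmed publicly reachable.
func publicFamilies(results []AddrResult) (v4, v6 bool) {
	for _, r := range results {
		if r.Result != Public {
			continue
		}
		ap, err := netip.ParseAddrPort(r.Addr)
		if err != nil {
			continue // skip malformed entries
		}
		if ap.Addr().Unmap().Is4() {
			v4 = true
		} else {
			v6 = true
		}
	}
	return v4, v6
}

func main() {
	v4, v6 := publicFamilies([]AddrResult{
		{Addr: "198.51.100.7:4001", Result: Private},  // IPv4 mapping is behind NAT
		{Addr: "[2001:db8::7]:4001", Result: Public}, // IPv6 address answered the dial-back
	})
	fmt.Printf("IPv4 public: %v, IPv6 public: %v\n", v4, v6)
}
```

With this split, a node could advertise its IPv6 address as directly dialable while steering IPv4-only peers toward hole punching or a relay.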

Smart Dialing (go-libp2p v0.28.0+)

Old behavior: dial all peer addresses in parallel, abort on first success. Wasteful and creates network churn.

New behavior: addresses are ranked (QUIC before TCP) and dialed in a staggered sequence with fast failover, so the next address is tried only if earlier ones have not connected within a short delay. When a peer has both relay and direct addresses, smart dialing tries the direct path first.
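A minimal sketch of the ranking idea, assuming a hypothetical `Candidate` type and an illustrative fixed stagger step. go-libp2p's own ranker uses its own address groups and delay values; this only shows the ordering rules described above (direct before relay, QUIC before TCP) turned into a dial schedule.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Candidate is a hypothetical dialable address with the traits a ranker needs.
type Candidate struct {
	Addr  string
	QUIC  bool // QUIC transport (otherwise TCP)
	Relay bool // reached through a circuit relay
}

// score orders candidates: lower is better. Relay costs more than TCP,
// so any direct address outranks any relayed one.
func score(c Candidate) int {
	s := 0
	if c.Relay {
		s += 2
	}
	if !c.QUIC {
		s++
	}
	return s
}

// DialPlan is one scheduled dial: start Addr after Delay.
type DialPlan struct {
	Addr  string
	Delay time.Duration
}

// rank sorts candidates by score and staggers their start times by step.
// A dialer would start each entry at its Delay and cancel the remaining
// entries once one connection succeeds.
func rank(cands []Candidate, step time.Duration) []DialPlan {
	sorted := append([]Candidate(nil), cands...)
	sort.SliceStable(sorted, func(i, j int) bool { return score(sorted[i]) < score(sorted[j]) })
	plan := make([]DialPlan, len(sorted))
	for i, c := range sorted {
		plan[i] = DialPlan{Addr: c.Addr, Delay: time.Duration(i) * step}
	}
	return plan
}

func main() {
	plan := rank([]Candidate{
		{Addr: "relay/tcp", Relay: true},
		{Addr: "direct/tcp"},
		{Addr: "direct/quic", QUIC: true},
	}, 250*time.Millisecond)
	for _, p := range plan {
		fmt.Println(p.Delay, p.Addr)
	}
}
```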

Resource Manager

DAG-based resource constraints at system, protocol, and per-peer levels. This is the proper replacement for Shurli's earlier `WithInfiniteLimits()`:

Per-peer connection and stream limits
Per-peer bandwidth caps
Memory and file descriptor budgets
Rate limiting (1 connection per 5s per IP, 16-burst default)
Prevents one peer from exhausting all relay resources
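The per-IP rate limit in the list above is a token-bucket rule. go-libp2p implements its own limiter, so the following is only a hand-written model of "1 connection per 5s per IP, burst of 16"; `IPLimiter` and its methods are hypothetical, and time is passed in explicitly to keep the logic testable.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket tracks tokens for one source IP.
type bucket struct {
	tokens float64
	last   time.Time
}

// IPLimiter is a minimal per-IP token bucket: `burst` connections may arrive
// at once, then one more is allowed every `interval`. A production version
// would also evict idle entries so the map cannot grow without bound.
type IPLimiter struct {
	mu       sync.Mutex
	interval time.Duration
	burst    float64
	buckets  map[string]*bucket
}

func NewIPLimiter(interval time.Duration, burst int) *IPLimiter {
	return &IPLimiter{interval: interval, burst: float64(burst), buckets: map[string]*bucket{}}
}

// Allow reports whether a new connection from ip is permitted at time now.
func (l *IPLimiter) Allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.burst, last: now} // new IPs start with a full burst
		l.buckets[ip] = b
	}
	// Refill in proportion to elapsed time, capped at the burst size.
	b.tokens += float64(now.Sub(b.last)) / float64(l.interval)
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	lim := NewIPLimiter(5*time.Second, 16)
	start := time.Now()
	allowed := 0
	for i := 0; i < 20; i++ {
		if lim.Allow("192.0.2.10", start) {
			allowed++
		}
	}
	fmt.Println("burst allowed:", allowed)                                       // 16
	fmt.Println("after 5s:", lim.Allow("192.0.2.10", start.Add(5*time.Second))) // true
}
```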

QUIC Source Address Verification

Validates that the peer's source IP isn't spoofed, which prevents the relay from being used as a DDoS reflector. Built into go-libp2p's QUIC transport since quic-go v0.54.0.
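The underlying idea (QUIC address validation, RFC 9000 §8.1) is that the server hands out a token bound to the client's address, and a spoofed source never receives it. The sketch below models that with an HMAC token; it is an illustration of the concept under assumed names (`mintToken`, `validToken`) and an assumed token layout, not quic-go's actual token format.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"time"
)

// mintToken binds a token to the client's IP and issue time (Unix seconds).
// Layout: 8-byte timestamp || HMAC-SHA256(key, ip || timestamp).
func mintToken(key []byte, clientIP string, issued int64) []byte {
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(issued))
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(clientIP))
	mac.Write(ts)
	return append(ts, mac.Sum(nil)...)
}

// validToken accepts a token only if it was minted for this IP and is at
// most maxAge seconds old. A spoofer never sees the token, so it cannot
// echo a valid one back from the victim's address.
func validToken(key, token []byte, clientIP string, now, maxAge int64) bool {
	if len(token) != 8+sha256.Size {
		return false
	}
	issued := int64(binary.BigEndian.Uint64(token[:8]))
	if now < issued || now-issued > maxAge {
		return false
	}
	return hmac.Equal(token, mintToken(key, clientIP, issued)) // constant-time compare
}

func main() {
	key := []byte("server-secret-key")
	now := time.Now().Unix()
	tok := mintToken(key, "198.51.100.20", now)
	fmt.Println(validToken(key, tok, "198.51.100.20", now, 10)) // true
	fmt.Println(validToken(key, tok, "203.0.113.99", now, 10))  // false: different source IP
}
```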

DCUtR Hole Punching Improvements

No v2 of DCUtR, but continuous refinement:

RTT is re-measured on each retry, so one bad measurement no longer spoils every subsequent attempt
TCP hole punching now achieves “statistically indistinguishable success rates” from UDP
Measured success: 70% ± 7.1% across 4.4M attempts from 85K+ networks in 167 countries

Source: Large Scale NAT Traversal Measurement Study, libp2p Hole Punching blog
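In DCUtR, the peer that sends the Sync message waits half of the measured RTT before dialing, so both peers' packets leave at roughly the same moment. A minimal sketch of the retry loop with a fresh RTT sample per attempt; `holePunch`, `measureRTT`, and `punch` are hypothetical hooks standing in for the real Connect/Sync exchange and simultaneous dial, and the 3-attempt limit is illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// holePunch retries a DCUtR-style hole punch. Every attempt takes a fresh RTT
// sample over the relay, so an outlier only affects the attempt it belongs to.
func holePunch(maxAttempts int,
	measureRTT func() (time.Duration, error),
	punch func(wait time.Duration) error) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		rtt, err := measureRTT()
		if err != nil {
			continue // the next attempt re-measures from scratch
		}
		// Wait RTT/2 before dialing so both sides fire simultaneously.
		if err := punch(rtt / 2); err == nil {
			return attempt, nil
		}
	}
	return 0, errors.New("hole punch failed; staying on relay")
}

// demo simulates one outlier RTT followed by normal samples.
func demo() (int, error) {
	samples := []time.Duration{900 * time.Millisecond, 80 * time.Millisecond, 80 * time.Millisecond}
	i := 0
	measure := func() (time.Duration, error) {
		s := samples[i]
		i++
		return s, nil
	}
	// Pretend the punch only lines up when the computed wait is under 100ms.
	punch := func(wait time.Duration) error {
		if wait > 100*time.Millisecond {
			return errors.New("packets missed each other")
		}
		return nil
	}
	return holePunch(3, measure, punch)
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // 2 <nil>
}
```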

What Shurli has done (Phase 4C - shipped)

Optimization	Status
Upgraded go-libp2p to v0.47.0	Done
Replaced `WithInfiniteLimits()` with Resource Manager (auto-scaled limits)	Done
Enabled DCUtR in proxy command	Done (+ parallel dial racing in Batch I)
Persistent relay reservation	Done (periodic refresh in background goroutine)
QUIC as default transport	Done (3 RTTs vs 4 for TCP)
Adaptive path selection	Done (Batch I: interface discovery, STUN probing, every-peer-is-a-relay)
Relay pairing codes	Done (Post-I-1: relay admin generates codes, joiners connect in one command)
SAS verification	Done (Post-I-1: OMEMO-style 4-emoji fingerprint, persistent [UNVERIFIED] badge)
Reachability grades	Done (Post-I-1: A-F scale from interface discovery + STUN results)
PAKE-secured invite	Done (Pre-I-b: encrypted handshake, v1 cleartext deleted)
Private DHT namespaces	Done (Pre-I-c: `discovery.network` for isolated peer groups)
Daemon-first commands	Done (Post-I-1: ping/traceroute try daemon API first, fall back to standalone)
Peer introduction delivery	Done (Post-I-2: `/shurli/peer-notify/1.0.0`, relay pushes introductions with HMAC proofs)
HMAC group commitment	Done (Post-I-2: `HMAC-SHA256(token, groupID)` proves token possession)
Relay admin socket	Done (Post-I-2: Unix socket + cookie auth, `relay pair` is HTTP client)
Sovereign interaction history	Done (Post-I-2: per-peer `peer_history.json`, Welford’s running average)
Startup race fix	Done (Pre-Phase 5: handlers registered before DHT bootstrap)
Stale address detection	Done (Pre-Phase 5: `[stale?]` labels after network change)
systemd/launchd services	Done (Pre-Phase 5: `shurli service install/start/stop/status`)
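The HMAC group commitment in the table above maps directly onto Go's standard library. A minimal sketch, assuming the token is the HMAC key and the group ID is the message (the usual reading of `HMAC-SHA256(token, groupID)`); the function names and the raw-byte encoding of both inputs are assumptions, not Shurli's code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// groupCommitment computes HMAC-SHA256 keyed with the pairing token over the
// group ID. Anyone who knows the token can recompute it; anyone who does not
// cannot forge it.
func groupCommitment(token, groupID []byte) []byte {
	mac := hmac.New(sha256.New, token)
	mac.Write(groupID)
	return mac.Sum(nil)
}

// verifyCommitment recomputes the MAC and compares in constant time.
func verifyCommitment(token, groupID, proof []byte) bool {
	return hmac.Equal(proof, groupCommitment(token, groupID))
}

func main() {
	token := []byte("hypothetical-pairing-token")
	proof := groupCommitment(token, []byte("group-42"))
	fmt.Println(len(proof))                                                          // 32
	fmt.Println(verifyCommitment(token, []byte("group-42"), proof))                  // true
	fmt.Println(verifyCommitment([]byte("wrong-token"), []byte("group-42"), proof)) // false
}
```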

These changes, especially parallel dial racing, brought connection setup time closer to 3-10 seconds while keeping the self-sovereign architecture. Connection warmup and stream pooling remain as future optimizations.
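Shurli's racing code is not reproduced in this document. A minimal sketch of the general pattern, assuming each path (direct QUIC, direct TCP, relay) is wrapped in a hypothetical `dialFunc` that returns a connection label: start all attempts at once, keep the first success, and cancel the rest.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dialFunc is a hypothetical dial attempt for one path.
type dialFunc func(ctx context.Context) (string, error)

// raceDials starts all attempts concurrently and returns the first success.
// Remaining attempts are cancelled via ctx; a real implementation would also
// close any connection that completes after the winner.
func raceDials(parent context.Context, dials map[string]dialFunc) (string, error) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	type result struct {
		conn string
		err  error
	}
	results := make(chan result, len(dials)) // buffered so losers never block
	for _, d := range dials {
		go func(d dialFunc) {
			c, err := d(ctx)
			results <- result{c, err}
		}(d)
	}
	var errs []error
	for range dials {
		r := <-results
		if r.err == nil {
			return r.conn, nil
		}
		errs = append(errs, r.err)
	}
	return "", errors.Join(errs...) // every path failed
}

// after simulates a dial that finishes after d unless cancelled first.
func after(d time.Duration, conn string, err error) dialFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(d):
			return conn, err
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

func demo() (string, error) {
	return raceDials(context.Background(), map[string]dialFunc{
		"relay":  after(300*time.Millisecond, "relay-conn", nil),
		"direct": after(50*time.Millisecond, "direct-conn", nil),
		"tcp":    after(10*time.Millisecond, "", errors.New("tcp refused")),
	})
}

func main() {
	conn, err := demo()
	fmt.Println(conn, err) // direct-conn <nil>
}
```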

What emerging technologies could benefit Shurli?

Protocols to watch

Protocol	What it gives Shurli	Status (2026)	Phase
MASQUE (RFC 9298)	HTTP/3 relay that looks like HTTPS to deep packet inspection. 0-RTT session resumption for instant reconnection after network switch.	Production (Cloudflare deploys across 330+ datacenters)	Future
Post-quantum Noise (ML-KEM / FIPS 203)	Quantum-resistant handshakes. Regulatory mandates expected 2026-2028.	AWS KMS, Windows 11 shipping ML-KEM. libp2p not yet adopted.	Future
QUIC v2 (RFC 9369)	Anti-ossification: a second QUIC version (0x6b3343cf) with different key-derivation constants, so middleboxes cannot hard-code assumptions about QUIC v1.	Finalized	4C
WebTransport	Browser-native QUIC transport (replaces WebSocket for anti-censorship). Lower overhead, native datagrams.	Chrome/Firefox production, Safari flag-only	Future
W3C DID v1.1	Decentralized Identifiers - peer IDs in a standard, interoperable format (`did:key`, `did:peer`).	First Public Draft 2025	Future
eBPF / XDP	In-kernel packet filtering at the network driver, before the network stack, at millions of packets/sec. DDoS mitigation without userspace overhead.	Production (Cloudflare, Meta, Netflix)	4C/Future
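Why QUIC v2 resists ossification: a middlebox that hard-codes version 1 mishandles v2 traffic, while code that reads only the version-independent invariants (RFC 8999) keeps working. A minimal sketch of reading the version field of a long-header packet; the function and constant names are hypothetical, the version numbers come from RFC 9000 and RFC 9369.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	quicV1 uint32 = 0x00000001 // RFC 9000
	quicV2 uint32 = 0x6b3343cf // RFC 9369
)

// longHeaderVersion extracts the version from a QUIC long-header packet.
// Per the invariants, the first bit of a long header is 1 and the next four
// bytes carry the version; everything else is version-specific.
func longHeaderVersion(pkt []byte) (uint32, bool) {
	if len(pkt) < 5 || pkt[0]&0x80 == 0 {
		return 0, false // too short, or a short-header packet
	}
	return binary.BigEndian.Uint32(pkt[1:5]), true
}

func main() {
	pkt := []byte{0xc0, 0x6b, 0x33, 0x43, 0xcf /* ...rest of packet... */}
	if v, ok := longHeaderVersion(pkt); ok {
		switch v {
		case quicV1:
			fmt.Println("QUIC v1")
		case quicV2:
			fmt.Println("QUIC v2")
		default:
			fmt.Printf("unknown version %#x\n", v)
		}
	}
}
```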

MASQUE: The next-generation relay transport

MASQUE (Multiplexed Application Substrate over QUIC Encryption) is an HTTP/3 proxying protocol with properties that directly address Circuit Relay v2’s weaknesses:

	Circuit Relay v2	MASQUE
Looks like	Custom libp2p protocol	Standard HTTPS traffic
DPI evasion	Requires WebSocket wrapping	Native - it IS HTTP/3
Session resume	New reservation per connection	0-RTT resume (TLS 1.3 tickets)
Multiplexing	Via Yamux (12-byte frames)	Native QUIC streams
Infrastructure	Self-hosted relay	Self-hosted or Cloudflare’s global network
Browser support	No (requires native client)	Yes (WebTransport API)

Shurli could offer MASQUE as an alternative relay transport alongside Circuit Relay v2 - giving users the choice between libp2p-native P2P and HTTP/3-based relay for environments where traffic must look like standard HTTPS.
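What "looks like HTTPS" means in practice: a CONNECT-UDP client (RFC 9298) sends an HTTP/3 extended CONNECT request with `:protocol` set to `connect-udp`, targeting a URI expanded from a template. A minimal sketch of expanding the default well-known template from RFC 9298; proxies may publish other templates, and the helper names here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// pctEncode applies RFC 6570 simple-string expansion: unreserved characters
// pass through and everything else is percent-encoded (IPv6 colons -> %3A).
func pctEncode(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if ('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z') || ('0' <= c && c <= '9') ||
			c == '-' || c == '.' || c == '_' || c == '~' {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "%%%02X", c)
		}
	}
	return b.String()
}

// connectUDPTarget expands the template
// https://{proxy}/.well-known/masque/udp/{target_host}/{target_port}/
func connectUDPTarget(proxy, host string, port int) string {
	return fmt.Sprintf("https://%s/.well-known/masque/udp/%s/%d/", proxy, pctEncode(host), port)
}

func main() {
	fmt.Println(connectUDPTarget("proxy.example.org", "192.0.2.6", 443))
	// https://proxy.example.org/.well-known/masque/udp/192.0.2.6/443/
	fmt.Println(connectUDPTarget("proxy.example.org", "2001:db8::42", 443))
	// https://proxy.example.org/.well-known/masque/udp/2001%3Adb8%3A%3A42/443/
}
```

To an on-path observer, this is an ordinary HTTP/3 request to a web server, which is the DPI-evasion property in the table above.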

Post-quantum cryptography: The coming mandate

Shurli currently uses the Noise protocol (X25519 key exchange) with Ed25519 identity keys, both classical cryptography. A sufficiently large quantum computer could eventually break both. The industry is preparing:

NIST finalized ML-KEM (FIPS 203) and ML-DSA (FIPS 204) as post-quantum standards
AWS KMS, ACM, and Secrets Manager support ML-KEM (Nov 2025)
Windows 11/Server 2025 ship with built-in ML-KEM and ML-DSA
CRYSTALS-Kyber being phased out in favor of ML-KEM (transition by 2026)
Hybrid approach: combine a classical and a post-quantum key exchange in the same handshake during the transition, so the session key stays secure unless both are broken

For Shurli, the path is:

Watch libp2p’s adoption of post-quantum Noise variants
Design cipher suite selection into the architecture (cryptographic agility)
Implement hybrid Noise + ML-KEM when libp2p support lands
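Cryptographic agility mostly comes down to negotiating a suite instead of hard-coding one. A minimal sketch, assuming hypothetical suite labels (they are not registered libp2p protocol IDs): each side lists suites in preference order, and the hybrid suite is chosen only when both peers support it.

```go
package main

import "fmt"

// localPreference lists hypothetical suite labels, most preferred first.
var localPreference = []string{
	"noise-x25519-mlkem768", // hybrid: classical + post-quantum
	"noise-x25519",          // classical only
}

// negotiate returns the first suite in our preference order that the remote
// also supports, so peers upgrade to hybrid only when both sides can.
func negotiate(local, remote []string) (string, bool) {
	supported := make(map[string]bool, len(remote))
	for _, s := range remote {
		supported[s] = true
	}
	for _, s := range local {
		if supported[s] {
			return s, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(negotiate(localPreference, []string{"noise-x25519"}))                          // noise-x25519 true
	fmt.Println(negotiate(localPreference, []string{"noise-x25519-mlkem768", "noise-x25519"})) // noise-x25519-mlkem768 true
}
```

One design caveat: the negotiated choice must be bound into the authenticated handshake transcript; otherwise an active attacker could strip the hybrid option and force a downgrade.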

Sources: NIST PQC Standards, AWS ML-KEM Support

eBPF: Relay-server hardening at kernel speed

eBPF (extended Berkeley Packet Filter) allows running sandboxed programs in the Linux kernel without modifying kernel source. For Shurli’s relay server:

XDP (eXpress Data Path): Process packets before they reach the network stack - millions of packets/sec DDoS mitigation
Rate limiting: Per-IP connection throttling at kernel level (faster than iptables)
Runtime monitoring: Detect exploitation attempts on the relay via syscall tracing (Falco, Tetragon)
Profiling: Trace packet processing bottlenecks with low overhead and without modifying the relay binary

This complements the userspace hardening (Resource Manager, per-peer limits) with kernel-level defense. Requires Linux kernel >= 5.8.
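A relay could check the kernel floor before enabling any eBPF path. A minimal sketch, assuming the 5.8 minimum stated above; `kernelAtLeast` is a hypothetical helper, and on Linux the release string it parses can be read from `/proc/sys/kernel/osrelease`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast parses a Linux release string such as "5.15.0-91-generic"
// and reports whether it is at least major.minor.
func kernelAtLeast(release string, major, minor int) bool {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false
	}
	maj, err1 := strconv.Atoi(parts[0])
	// The minor field may carry a suffix ("10-rc1"), so keep leading digits only.
	minStr := parts[1]
	if i := strings.IndexFunc(minStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minStr = minStr[:i]
	}
	minVer, err2 := strconv.Atoi(minStr)
	if err1 != nil || err2 != nil {
		return false
	}
	return maj > major || (maj == major && minVer >= minor)
}

func main() {
	for _, r := range []string{"5.4.0-150-generic", "5.8.0", "6.1.0-13-amd64"} {
		fmt.Println(r, kernelAtLeast(r, 5, 8))
	}
}
```

Version checks are a coarse gate: distributions backport eBPF features, so actually loading a small test program is the more reliable probe.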

Zero-RTT proxy connection resume

The problem: When a laptop switches from WiFi to cellular (or WiFi flickers), all TCP connections through the proxy drop. The user must wait for reconnection (5-15 seconds with Circuit Relay v2).

The solution: QUIC 0-RTT session resumption. The client caches a session ticket from the previous connection. On reconnect, it sends encrypted application data in its very first flight, alongside the handshake, instead of waiting for the handshake to complete.
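The client side of this is mostly bookkeeping: keep the newest ticket per server, check its lifetime, and decide between resumption and a full handshake. A minimal sketch with hypothetical names (`TicketCache`, `reconnectMode`); the ticket bytes are opaque, and TLS 1.3 caps ticket lifetimes at 7 days. Tickets are taken (removed) on use because RFC 8446 advises clients not to reuse a ticket, which would let observers link connections.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Ticket is a hypothetical cached resumption ticket for one peer.
type Ticket struct {
	Data     []byte
	Received time.Time
	Lifetime time.Duration // server-advertised lifetime
}

// TicketCache keeps the newest ticket per peer.
type TicketCache struct {
	mu      sync.Mutex
	tickets map[string]Ticket
}

func NewTicketCache() *TicketCache { return &TicketCache{tickets: map[string]Ticket{}} }

func (c *TicketCache) Put(peer string, t Ticket) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tickets[peer] = t
}

// Take removes the peer's ticket and returns it if it is still valid.
func (c *TicketCache) Take(peer string, now time.Time) (Ticket, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.tickets[peer]
	if !ok {
		return Ticket{}, false
	}
	delete(c.tickets, peer) // single use
	if now.Sub(t.Received) >= t.Lifetime {
		return Ticket{}, false
	}
	return t, true
}

// reconnectMode decides how to reconnect after a network switch.
func reconnectMode(c *TicketCache, peer string, now time.Time) string {
	if _, ok := c.Take(peer, now); ok {
		return "0-RTT resume" // early data rides with the first flight
	}
	return "full handshake"
}

func demo() (string, string, string) {
	c := NewTicketCache()
	t0 := time.Now()
	c.Put("relay-1", Ticket{Data: []byte("opaque"), Received: t0, Lifetime: time.Hour})
	first := reconnectMode(c, "relay-1", t0.Add(time.Minute))    // valid ticket
	second := reconnectMode(c, "relay-1", t0.Add(2*time.Minute)) // ticket already used
	c.Put("relay-1", Ticket{Data: []byte("opaque"), Received: t0, Lifetime: time.Hour})
	expired := reconnectMode(c, "relay-1", t0.Add(2*time.Hour)) // past its lifetime
	return first, second, expired
}

func main() { fmt.Println(demo()) }
```

0-RTT data can be replayed by an attacker, so only idempotent requests should be sent as early data.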

Who has this: Cloudflare’s MASQUE relays, QUIC-native applications. Who doesn’t: WireGuard (stateless, reconnects fast but not 0-RTT), all current P2P tunnel tools.

This is a future optimization for Shurli’s QUIC transport - particularly valuable for mobile clients (Phase 9).

Why does Shurli use Go instead of Rust?

The trade-off

Factor	Go	Rust
Development speed	Fast - the reason Shurli exists today	2-3x slower initial development
GC pauses at scale	10s pauses observed at 600K connections	None - no garbage collector
Memory per connection	~28KB (GC overhead, interface boxing)	~4-8KB (zero-cost abstractions)
libp2p ecosystem	Mature (go-libp2p, most examples)	Growing (rust-libp2p, Iroh)
Formal verification	Limited	Strong (s2n-quic has 300+ Kani harnesses)
Binary size	~25-28MB	~5-10MB
Cross-compilation	Trivial (`GOOS=linux GOARCH=arm64`)	Requires target toolchain setup
Concurrency model	Goroutines (simple, GC-managed)	async/await (zero-cost futures, no GC; executor such as tokio comes from a library)

Why Go is right for now

Go’s simplicity enabled rapid iteration through 7 phases of development. The libp2p Go ecosystem is the most mature, with the most examples and documentation. For a project with 1-100 concurrent connections (typical home use), Go’s performance is more than adequate.

When Rust becomes worth it

At scale - when a relay server handles thousands of concurrent circuits, or when the proxy loop becomes CPU-bound. The hot paths (packet forwarding in the relay, bidirectional proxy loop, SOCKS5 gateway) are candidates for selective Rust rewrite via FFI, not a full project rewrite.

Rust libraries to watch

Library	What it does	Why it matters
Iroh	Rust P2P library, QUIC-native	~90% NAT traversal success, QUIC multipath, approaching 1.0
Quinn	Pure Rust QUIC implementation	Used by Iroh, high performance, no C FFI
s2n-quic	AWS’s Rust QUIC	Formal verification with Kani, production-tested in AWS
tokio	Async runtime	LTS until Sept 2026, powers hyper (HTTP/2 + HTTP/3)

The hybrid strategy

Shurli’s planned approach:

Now through Phase 7: Ship in Go. Fix goroutine lifecycle, tune GC, add observability.
Phase 8+: Profile hot paths under load. Selectively rewrite proxy loop / relay forwarding in Rust via FFI if performance demands it.
Long-term: Re-evaluate full Rust migration only if market demands 100x throughput and there’s engineering capacity for it.
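For the "tune GC" step, Go exposes two knobs in `runtime/debug`, also settable through the `GOGC` and `GOMEMLIMIT` environment variables. A minimal sketch; `tuneGC` and the 512 MiB / 200 values are placeholders, not Shurli settings, and profiling under real load is what would justify any specific numbers.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC sets a soft memory limit and the GC percentage, returning the
// previous percentage. A higher GOGC trades memory for fewer collections;
// the memory limit makes the GC work harder only as the heap nears it.
func tuneGC(limitBytes int64, gcPercent int) (prevPercent int) {
	debug.SetMemoryLimit(limitBytes) // available since Go 1.19
	return debug.SetGCPercent(gcPercent)
}

func main() {
	prev := tuneGC(512<<20, 200) // 512 MiB soft limit, GOGC=200
	fmt.Println("previous GOGC:", prev)
}
```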

Sources: Rust vs Go (Bitfield), Go GC Guide, Iroh roadmap

How does reachability grade computation work in detail?

The reachability grade combines two data sources: interface discovery and STUN probe results.

Interface discovery scans all network interfaces and classifies each address:

Global unicast IPv6 -> public
Public IPv4 (not RFC 1918 / RFC 6598) -> public
RFC 6598 (100.64.0.0/10) -> CGNAT flag set
Everything else -> private/local
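The classification rules above map onto Go's `net/netip` predicates. A minimal sketch with a hypothetical `classify` helper; it reports CGNAT as its own label rather than a separate flag. Note that Go's `IsGlobalUnicast` also returns true for RFC 1918 and IPv6 unique-local (fc00::/7) addresses, which is why the `IsPrivate` check must run first.

```go
package main

import (
	"fmt"
	"net/netip"
)

var cgnatRange = netip.MustParsePrefix("100.64.0.0/10") // RFC 6598 shared address space

// classify maps one interface address to "public", "cgnat", "private", or "invalid".
func classify(s string) string {
	a, err := netip.ParseAddr(s)
	if err != nil {
		return "invalid"
	}
	a = a.Unmap() // treat ::ffff:a.b.c.d as IPv4
	switch {
	case a.Is4() && cgnatRange.Contains(a):
		return "cgnat"
	case a.IsLoopback() || a.IsLinkLocalUnicast() || a.IsPrivate():
		return "private" // RFC 1918, IPv6 ULA, loopback, link-local
	case a.IsGlobalUnicast():
		return "public"
	default:
		return "private" // unspecified, multicast, and similar
	}
}

func main() {
	for _, s := range []string{"100.72.1.5", "192.168.1.10", "8.8.8.8", "fd12:3456::1", "2606:4700::1111"} {
		fmt.Println(s, classify(s))
	}
}
```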

STUN probing uses Google’s public STUN servers to determine NAT behavior. It reports the external IP, port allocation strategy, and filtering behavior.
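Shurli's probe code is not shown here; as an illustration of what a single probe involves, this sketch builds an RFC 5389 Binding Request and decodes an IPv4 XOR-MAPPED-ADDRESS attribute value from the response. Function names are hypothetical. Classifying port allocation and filtering behavior takes several such probes (for example against different server addresses, as in RFC 5780), which this sketch does not attempt.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"net/netip"
)

const magicCookie = 0x2112A442 // fixed value from RFC 5389

// bindingRequest builds a 20-byte STUN Binding Request with no attributes.
func bindingRequest() ([]byte, error) {
	msg := make([]byte, 20)
	binary.BigEndian.PutUint16(msg[0:2], 0x0001) // message type: Binding Request
	binary.BigEndian.PutUint16(msg[2:4], 0)      // attribute length
	binary.BigEndian.PutUint32(msg[4:8], magicCookie)
	if _, err := rand.Read(msg[8:20]); err != nil { // 96-bit transaction ID
		return nil, err
	}
	return msg, nil
}

// decodeXORMappedIPv4 decodes an XOR-MAPPED-ADDRESS value (attribute 0x0020)
// for IPv4: port XOR the cookie's top 16 bits, address XOR the whole cookie.
// IPv6 additionally XORs with the transaction ID (not handled here).
func decodeXORMappedIPv4(v []byte) (netip.AddrPort, error) {
	if len(v) < 8 || v[1] != 0x01 {
		return netip.AddrPort{}, errors.New("not an IPv4 XOR-MAPPED-ADDRESS")
	}
	port := binary.BigEndian.Uint16(v[2:4]) ^ uint16(magicCookie>>16)
	var ip [4]byte
	binary.BigEndian.PutUint32(ip[:], binary.BigEndian.Uint32(v[4:8])^magicCookie)
	return netip.AddrPortFrom(netip.AddrFrom4(ip), port), nil
}

func main() {
	req, err := bindingRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(req)) // 20
	// Attribute value from the IPv4 sample response in RFC 5769.
	ap, _ := decodeXORMappedIPv4([]byte{0x00, 0x01, 0xa1, 0x47, 0xe1, 0x12, 0xa6, 0x43})
	fmt.Println(ap) // 192.0.2.1:32853
}
```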

Grade computation logic:

if no connectivity:           Grade F
if CGNAT detected:            Grade D (cap, overrides STUN)
if public IPv6:               Grade A
if public IPv4:               Grade B
if full-cone or addr-restricted: Grade B
if port-restricted:           Grade C
if symmetric:                 Grade D

The CGNAT cap at grade D is the critical design choice. STUN probes the inner NAT and can report “hole-punchable” when the outer CGNAT will drop the punched packets. The grade overrides this false optimism.
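The pseudocode above translates directly into a first-match `switch`. A minimal Go sketch; the `Probe` and `NATType` names are hypothetical, while the rule order is taken from the pseudocode. The pseudocode does not say what happens when the node is connected but has no public address and an undetermined NAT type, so the sketch returns "unknown" there rather than inventing a grade.

```go
package main

import "fmt"

// NATType is the STUN-derived NAT behavior.
type NATType int

const (
	NATUndetermined NATType = iota
	FullCone
	AddrRestricted
	PortRestricted
	Symmetric
)

// Probe bundles the inputs from interface discovery and STUN.
type Probe struct {
	Connected  bool
	CGNAT      bool
	PublicIPv6 bool
	PublicIPv4 bool
	NAT        NATType
}

// grade applies the rules top to bottom; the first match wins, so the CGNAT
// check runs before any STUN result can raise the grade.
func grade(p Probe) string {
	switch {
	case !p.Connected:
		return "F"
	case p.CGNAT:
		return "D" // cap: STUN only sees the inner NAT
	case p.PublicIPv6:
		return "A"
	case p.PublicIPv4, p.NAT == FullCone, p.NAT == AddrRestricted:
		return "B"
	case p.NAT == PortRestricted:
		return "C"
	case p.NAT == Symmetric:
		return "D"
	default:
		return "unknown" // not covered by the listed rules
	}
}

func main() {
	fmt.Println(grade(Probe{Connected: true, CGNAT: true, NAT: FullCone})) // D
	fmt.Println(grade(Probe{Connected: true, NAT: PortRestricted}))        // C
}
```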

Grades update automatically on network change events (WiFi switch, cable plug/unplug, VPN up/down). The grade is exposed via `shurli daemon status` and the REST API.

What is sovereign peer interaction history?

Each daemon maintains a local `peer_history.json` file tracking interaction data with every known peer. This data never leaves the machine - it's the foundation for future trust algorithms.

What’s tracked per peer:

Field	Purpose
`first_seen`	When this peer was first encountered
`last_seen`	Most recent connection
`connection_count`	Total successful connections
`avg_latency_ms`	Running average (Welford’s online algorithm)
`path_types`	Map of `"direct": N, "relay": M`
`introduced_by`	Which relay or peer introduced this one
`intro_method`	`"relay-pairing"`, `"invite"`, or `"manual"`
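Welford's online algorithm is what keeps `avg_latency_ms` at O(1) storage per peer: each new sample updates the mean in place, without storing samples or accumulating a large sum that loses precision. A minimal sketch with a hypothetical `LatencyStats` type; the `M2` field (for variance) is part of Welford's method but is an addition here, since the table only lists the average.

```go
package main

import (
	"fmt"
	"math"
)

// LatencyStats is a hypothetical per-peer accumulator using Welford's
// online algorithm: constant memory, one update per sample.
type LatencyStats struct {
	Count int64
	Mean  float64 // corresponds to avg_latency_ms
	M2    float64 // sum of squared deviations from the mean (for variance)
}

// Add folds one latency sample (milliseconds) into the running statistics.
func (s *LatencyStats) Add(ms float64) {
	s.Count++
	delta := ms - s.Mean
	s.Mean += delta / float64(s.Count)
	s.M2 += delta * (ms - s.Mean) // second factor uses the updated mean
}

// StdDev returns the sample standard deviation (0 with fewer than 2 samples).
func (s *LatencyStats) StdDev() float64 {
	if s.Count < 2 {
		return 0
	}
	return math.Sqrt(s.M2 / float64(s.Count-1))
}

func main() {
	var s LatencyStats
	for _, ms := range []float64{20, 30, 40} {
		s.Add(ms)
	}
	fmt.Println(s.Mean, s.StdDev()) // 30 10
}
```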

Implementation details:

Thread-safe with `sync.RWMutex`
Atomic file writes (temp file + rename) for crash safety
Best-effort load on startup (missing file is not an error)
Storage bounded by peer count (per-peer aggregates, not per-connection logs)
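The crash-safety and best-effort-load points above follow a common Go pattern. A minimal sketch with a simplified, hypothetical `History` type standing in for the real file format; the `sync.RWMutex` guarding is omitted for brevity, and for full durability an implementation may also fsync the containing directory after the rename.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// History is a simplified stand-in for peer_history.json: peer ID -> record.
type History map[string]map[string]any

// saveAtomic writes JSON to a temp file in the same directory, flushes it,
// then renames it over the target. On POSIX filesystems rename is atomic, so
// a crash leaves either the old file or the new one, never a torn write.
func saveAtomic(path string, h History) error {
	data, err := json.MarshalIndent(h, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".peer_history-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleanup on failure; harmless after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadBestEffort returns an empty history when the file does not exist yet.
func loadBestEffort(path string) (History, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return History{}, nil
	}
	if err != nil {
		return nil, err
	}
	h := History{}
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	return h, nil
}

func demo() (int, error) {
	dir, err := os.MkdirTemp("", "shurli-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "peer_history.json")
	h, err := loadBestEffort(path) // missing file -> empty history
	if err != nil {
		return 0, err
	}
	h["12D3KooWExamplePeer"] = map[string]any{"connection_count": 3} // placeholder peer ID
	if err := saveAtomic(path, h); err != nil {
		return 0, err
	}
	h2, err := loadBestEffort(path)
	return len(h2), err
}

func main() {
	n, err := demo()
	fmt.Println(n, err) // 1 <nil>
}
```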

Why collect now: Future trust algorithms (EigenTrust, reputation scoring) need interaction data as input. Starting collection now means months of history will be ready when those algorithms ship. Waiting until algorithm implementation to start collecting means zero history to bootstrap from.

Sovereignty: Each peer controls its own history. No central reputation server. No gossip-based sharing. The data stays local until explicit trust algorithms decide how (and whether) to use it.

Last Updated: 2026-02-25

Security & Features