Phase 9 - Relay Circuit Investigation


Date	2026-03-26
Status	Complete (3 bugs found, 2 fixed, 1 superseded by Grant Receipt Protocol)
Phase	9 (Plugins, SDK & First Plugins)
ADRs	ADR-W01 to ADR-W03

During file transfer physical retesting (2026-03-25), relay circuit transfers failed intermittently. Investigation revealed three distinct issues, none of which were protocol bugs. The root causes were configuration defaults, missing retry logic, and a relay selection gap.

ADR-W01: Tier-Aware Session Limit Defaults (RC1)


Date	2026-03-26
Status	Accepted
Commit	572d086

Context

Seed relays (public, shared, enable_data_relay: false by default) and self-hosted relays (private, admin-controlled) were using identical session defaults: 64 MB data limit, 10-minute session duration. A 174 MB file transfer through a self-hosted relay failed at 64 MB because the self-hosted relay inherited seed relay limits.

This is not a bug in the relay protocol. The relay correctly enforced its configured limit. The problem was that the default configuration did not distinguish between seed relays (which should be conservative) and self-hosted relays (which should be generous).

Decision

Two tiers of session defaults:

Parameter	Seed relay	Self-hosted relay
Session data limit	64 MB	2 GB
Session duration	10 minutes	2 hours

Detection: if enable_data_relay is false (seed relay behavior), use conservative defaults. If true (self-hosted relay), use generous defaults. Explicit config values always override tier defaults.

Consequences

Self-hosted relays now support large file transfers out of the box
Seed relays remain conservative (shared resource protection)
Existing explicit configs are not affected
Relay operators who want custom limits can still set them

Reference: https://github.com/shurlinet/shurli/blob/main/internal/relay/

ADR-W02: Receiver Busy Retry with Exponential Backoff (RC2)


Date	2026-03-26
Status	Accepted
Commit	572d086

Context

When a receiver was processing another transfer, the sender got a “receiver busy” rejection. The transfer queue treated this as a permanent failure and dropped the job. For relay transfers (where reconnection is expensive), this was especially wasteful.

Decision

The transfer queue now distinguishes “receiver busy” from other rejections:

Receiver busy: transient, retryable. Requeue with exponential backoff (2s, 4s, 8s, 16s, 32s, max 5 attempts)
Other rejections: permanent, fail immediately

Requeue() method moves active jobs back to the pending queue with updated retry state.

Consequences

Busy peers get automatic retry without user intervention
Exponential backoff prevents hammering the receiver
Max 5 attempts prevents infinite retry loops
Works for both direct and relayed transfers

Reference: https://github.com/shurlinet/shurli/blob/main/pkg/p2pnet/transfer.go

ADR-W03: Seed Relay Churn and Budget-Aware Selection (RC3)


Date	2026-03-26
Status	Superseded by Grant Receipt Protocol (ADR-V01 to ADR-V05)

Context

The original 0.4-second circuit drop observed during testing was caused by transfers routed through seed-only relays. The seed relay correctly denied the data circuit (ACL enforcement), and the circuit was torn down. This was not a bug. The issue was that PeerManager picks the first available circuit, not the best one for the transfer.

A 174 MB file routed through a seed relay with a 64 MB session limit will always fail. The client needs to select a relay with sufficient budget.

Decision

This issue was superseded by the Grant Receipt Protocol (Batches 1-4). The Grant Receipt Protocol provides:

Client-side visibility: clients know each relay’s session budget via cached receipts
Pre-transfer checks: transfers blocked before wasting bandwidth on an insufficient relay
Smart error messages: “file size (174 MB) exceeds relay session limit (64 MB)” instead of a generic circuit failure

Budget-aware relay selection (choosing the best relay before dialing) is tracked as FT-Y #7 for post-merge optimization. The current approach is: try the relay, check the budget pre-transfer, fail fast with a clear message if insufficient. The next step is: check all cached receipts first, select the relay with sufficient budget, then dial.

Consequences

The immediate problem (wasted bandwidth on insufficient relays) is solved by pre-transfer checks
The optimization (proactive relay selection) is deferred to FT-Y #7
No relay protocol changes were needed; the solution is entirely client-side

Reference: https://github.com/shurlinet/shurli/blob/main/internal/grants/cache.go, https://github.com/shurlinet/shurli/blob/main/pkg/p2pnet/transfer_grants.go

Phase 9 - Grant Receipt Protocol