Skip to content

Improve ARQ/LZ4 allocation efficiency and harden DNS parser limits#129

Closed
Isusami wants to merge 3 commits intomasterking32:mainfrom
Isusami:feature/perf-reliability-tier1
Closed

Improve ARQ/LZ4 allocation efficiency and harden DNS parser limits#129
Isusami wants to merge 3 commits intomasterking32:mainfrom
Isusami:feature/perf-reliability-tier1

Conversation

@Isusami
Copy link
Copy Markdown

@Isusami Isusami commented Apr 12, 2026

Summary

  • Add defensive DNS section-count caps in dnsparser to prevent oversized allocations from malformed packets.
  • Introduce pooled buffer ownership in ARQ hot paths to reduce per-packet allocations and GC pressure on send/receive/control tracking paths.
  • Pool LZ4 working buffers for compress/decompress while returning cloned outputs to preserve safety.
  • Upgrade recover() in ARQ retransmit loop from Debugf (silent) to Errorf with full goroutine stack trace for diagnosability.
  • Fix pooled buffer leak on duplicate control packet early return in SendControlPacketWithTTL.
  • Compact recentlyClosedHeap to prevent stale entry accumulation when stream IDs are rapidly recycled (heap bounded to 1.5x map size).

Why

These changes address performance/reliability issues:

  • Lower allocation churn and memory pressure in high-throughput packet paths.
  • Reduce risk of allocation-based abuse in DNS parsing.
  • Improve operational visibility by logging retransmit-loop panics at Error level with stack traces (previously logged at Debug, effectively silent).
  • Prevent unbounded heap growth in the recently-closed stream tracker under pathological stream ID recycling (previously grew to ~1.9x map size, now capped at ~1.5x).

Test Plan

All 17 packages with tests pass with -race:

Package Result
cmd/client PASS
internal/arq PASS (50+ tests)
internal/basecodec PASS
internal/client PASS
internal/client/handlers PASS
internal/compression PASS
internal/config PASS
internal/dnscache PASS
internal/dnsparser PASS (30 tests, including 6 new oversized section count tests)
internal/domainmatcher PASS
internal/enums PASS
internal/fragmentstore PASS
internal/logger PASS
internal/mlq PASS
internal/security PASS
internal/socksproto PASS
internal/vpnproto PASS

Additional verification:

  • [PASS] TestRecentlyClosedHeapStaleEntryGrowth — confirms heap stays within 1.5x bound after 10 cycles of 50 recycled stream IDs
  • [PASS] Benchmark: ARQ write-loop at 1365 MB/s, 4 allocs/op

Pre-existing issue (not introduced by this PR)

internal/udpserver: TestProcessDeferredStreamSynDoesNotAttachAfterCancellation and TestProcessDeferredSOCKS5SynDoesNotAttachAfterCancellation have a data race on testNetConn.closed (plain bool read/written across goroutines without synchronization). Confirmed identical failure on main branch — not a regression.

@Isusami Isusami force-pushed the feature/perf-reliability-tier1 branch 5 times, most recently from 5cf13ec to 5833944 Compare April 12, 2026 20:40
@Isusami Isusami marked this pull request as ready for review April 12, 2026 20:45
Pool hot-path ARQ and LZ4 buffers to cut per-packet allocations, cap
DNS section counts before allocation, remove retransmit panic recovery
so critical failures surface instead of being masked, and fix pooled
buffer leak on duplicate control packet early return.
@Isusami Isusami force-pushed the feature/perf-reliability-tier1 branch 2 times, most recently from 986172e to d7233b0 Compare April 12, 2026 21:01
When stream IDs are rapidly recycled, re-remembering pushes new heap
entries while stale ones remain, causing the heap to grow up to ~1.9x
the map size. Add compactRecentlyClosedHeapLocked to rebuild the heap
from the authoritative map when it exceeds 1.5x, bounding peak memory.
@Isusami Isusami force-pushed the feature/perf-reliability-tier1 branch from d512f50 to 0df3a8e Compare April 12, 2026 21:05
…trace

The pooling refactor accidentally removed the panic recovery wrapper in
retransmitLoop. Without it a panic in checkRetransmits kills the goroutine
silently, leaving streams without retransmissions. The original recovery
logged at Debug level which effectively swallowed panics; this version
logs at Error with a full goroutine stack trace for diagnosability.
@Isusami Isusami force-pushed the feature/perf-reliability-tier1 branch from ea96bab to f0d0119 Compare April 12, 2026 21:09
@masterking32
Copy link
Copy Markdown
Owner

درود،
این PR چند مشکل داره، اگر لطف کنین به این شکل که چند تغییر را با هم PR نزنین خیلی خوب میشه، راحت میتونم جدا جدا Merge کنم،
من این درخواست رو رد میکنم برای هر بخش از تغییرات یک PR جداگانه بزنید، سپس اگر هرکدام موردی داشت میتوان رد کرد و در نهایت، بخش هایی که اوکی هستند رو تنها Merge کرد.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants