… dedup

Two key optimizations for high-fanout scenarios:

1. Thread-local HashMap for deduplication
   - Reuses one allocation across publishes instead of allocating a new map per publish
   - Eliminates ~6000 HashMap allocations per minute under load

2. Arc<str> for Publish.topic
   - Cloning the topic for each subscriber is now O(1) (a reference-count increment) instead of O(n) in the topic length
   - For 2000 subscribers, eliminates 2000 string clones per publish

Based on analysis comparing VibeMQ to FlashMQ's hot path optimizations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
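The two ideas in this commit can be sketched together as below. This is a minimal illustration under stated assumptions, not VibeMQ's code: `ClientId`, the `Publish` fields, `DEDUP`, and `fan_out` are hypothetical names. The thread-local map is cleared (which keeps its capacity) rather than reallocated on every publish, and each per-subscriber `Publish` shares the topic through `Arc<str>`, so cloning it only bumps a reference count.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical subscriber identifier; the broker's real type is not shown in the PR.
type ClientId = u64;

thread_local! {
    // One dedup map per worker thread. It is cleared, not dropped, between
    // publishes, so its backing allocation is reused.
    static DEDUP: RefCell<HashMap<ClientId, u8>> = RefCell::new(HashMap::new());
}

// Hypothetical outgoing publish: topic and payload are shared, not copied.
struct Publish {
    topic: Arc<str>,
    payload: Arc<[u8]>,
}

/// Collapses overlapping subscription matches to one entry per client,
/// keeping the highest granted QoS, then builds one Publish per client.
fn fan_out(
    topic: &Arc<str>,
    payload: &Arc<[u8]>,
    matches: &[(ClientId, u8)],
) -> Vec<(ClientId, u8, Publish)> {
    DEDUP.with(|cell| {
        let mut seen = cell.borrow_mut();
        seen.clear(); // drops last publish's entries but keeps the capacity
        for &(client, qos) in matches {
            let granted = seen.entry(client).or_insert(qos);
            *granted = (*granted).max(qos);
        }
        seen.iter()
            .map(|(&client, &qos)| {
                // Arc::clone costs the same regardless of topic length.
                let publish = Publish { topic: Arc::clone(topic), payload: Arc::clone(payload) };
                (client, qos, publish)
            })
            .collect()
    })
}
```

A client matched by several overlapping filters receives a single copy at its highest QoS, and every copy points at the same topic allocation.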
Extend CachedPublish to support all QoS levels with in-place byte patching for the first_byte (dup/qos/retain) and packet_id fields.

Key changes:
- Add CachedPublish struct with offset tracking for patchable fields
- Change InflightMessage from a struct to an enum (Cached/Full variants)
- Update route_message to use CachedPublish for all QoS levels
- Update retry and session resume to use cached writes with dup=true
- Add optimization-guide.md documenting the FlashMQ analysis

Performance improvement (QoS 2 fan-out benchmark):
- Mean latency: 274ms → 216ms (21% faster)
- P99 latency: 536ms → 458ms (14% faster)
- Memory: 457MB → 405MB (11% reduction)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
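A minimal sketch of in-place patching, assuming the standard MQTT PUBLISH wire layout: a fixed-header byte of 0x30 with DUP in bit 3, QoS in bits 2-1 and RETAIN in bit 0, a variable-length "remaining length", a length-prefixed topic, and (for QoS > 0) a two-byte big-endian packet identifier. The struct shape and the names `new`, `write_to`, and `encode_remaining_length` are hypothetical, not VibeMQ's actual API. The packet is encoded once; each recipient gets a byte copy with only the first byte and the packet-id slot overwritten.

```rust
/// Hypothetical pre-encoded PUBLISH whose flags and packet id are patched per recipient.
struct CachedPublish {
    bytes: Vec<u8>,
    // Offset of the 2-byte packet identifier, present only when QoS > 0.
    packet_id_offset: Option<usize>,
}

/// MQTT variable-length integer: 7 bits per byte, high bit means "more bytes follow".
fn encode_remaining_length(mut len: usize, out: &mut Vec<u8>) {
    loop {
        let mut byte = (len % 128) as u8;
        len /= 128;
        if len > 0 {
            byte |= 0x80;
        }
        out.push(byte);
        if len == 0 {
            break;
        }
    }
}

impl CachedPublish {
    /// Encodes the packet once, leaving placeholder flag bits and packet id.
    fn new(topic: &str, payload: &[u8], with_packet_id: bool) -> Self {
        debug_assert!(topic.len() <= u16::MAX as usize);
        let pid_len = if with_packet_id { 2 } else { 0 };
        let remaining = 2 + topic.len() + pid_len + payload.len();
        let mut bytes = Vec::with_capacity(5 + remaining);
        bytes.push(0x30); // PUBLISH packet type; flag bits are patched later
        encode_remaining_length(remaining, &mut bytes);
        bytes.extend_from_slice(&(topic.len() as u16).to_be_bytes());
        bytes.extend_from_slice(topic.as_bytes());
        let packet_id_offset = if with_packet_id {
            let offset = bytes.len();
            bytes.extend_from_slice(&[0, 0]); // placeholder packet id
            Some(offset)
        } else {
            None
        };
        bytes.extend_from_slice(payload);
        CachedPublish { bytes, packet_id_offset }
    }

    /// Appends a copy to `out`, patching only the first byte and the packet id.
    fn write_to(&self, out: &mut Vec<u8>, dup: bool, qos: u8, retain: bool, packet_id: u16) {
        debug_assert_eq!(qos > 0, self.packet_id_offset.is_some());
        let start = out.len();
        out.extend_from_slice(&self.bytes);
        out[start] = 0x30 | (u8::from(dup) << 3) | ((qos & 0b11) << 1) | u8::from(retain);
        if let Some(off) = self.packet_id_offset {
            out[start + off..start + off + 2].copy_from_slice(&packet_id.to_be_bytes());
        }
    }
}
```

For a retry or session resume, the same cached bytes can be written again with `dup = true` and the original packet id, which matches the commit's description of cached writes on those paths.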
Major optimizations for high fan-out scenarios:

- Add RawPublish for zero-copy fan-out using incoming wire bytes
  - Patches first_byte and packet_id without re-encoding
  - Falls back to CachedPublish when the protocol version differs
- Add SharedWriter for direct buffer writes
  - Bypasses channel overhead for message delivery
  - Notification coalescing (only notify on empty→non-empty)
  - Reduces wake-ups during burst fan-out
- Eliminate redundant session lookups in route_message
  - Use SharedWriter.protocol_version() instead of a session lookup
  - Saves ~4000 lock acquisitions per 2000-subscriber fan-out
- Reduce buffer sizes from 4KB to 2KB
  - SharedWriter, buffer pool, WebSocket buffers
- Store Arc<CachedPublish> so each clone is an atomic reference-count increment instead of an allocation

Results on QoS 2 fan-out (100 pub, 2000 sub):
- Latency P50: 272ms → 122ms (-55%)
- CPU max: 245% → 186% (-24%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
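Notification coalescing can be sketched with a mutex-protected buffer: a writer signals the connection's socket task only when the buffer goes from empty to non-empty, because a non-empty buffer means a wake-up is already pending. This is a hypothetical sketch using only the standard library; the `wakeups` counter stands in for a real wake-up primitive (an async notifier in an actual broker), and the field and method names other than `protocol_version()` are assumptions. The 2048-byte initial capacity mirrors the 2KB buffer size in the commit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical per-connection output buffer shared by routing code (writers)
/// and the connection's socket task (the single reader).
struct SharedWriter {
    buf: Mutex<Vec<u8>>,
    protocol_version: u8,
    // Stand-in for a real wake-up primitive; here we only count signals.
    wakeups: AtomicUsize,
}

impl SharedWriter {
    fn new(protocol_version: u8) -> Self {
        SharedWriter {
            buf: Mutex::new(Vec::with_capacity(2048)),
            protocol_version,
            wakeups: AtomicUsize::new(0),
        }
    }

    /// Lets routing code pick an encoding without a separate session lookup.
    fn protocol_version(&self) -> u8 {
        self.protocol_version
    }

    /// Appends bytes directly (no channel), signalling only on empty→non-empty.
    fn write(&self, bytes: &[u8]) {
        let was_empty = {
            let mut buf = self.buf.lock().unwrap();
            let was_empty = buf.is_empty();
            buf.extend_from_slice(bytes);
            was_empty
        }; // lock released before signalling
        if was_empty && !bytes.is_empty() {
            self.notify();
        }
    }

    fn notify(&self) {
        self.wakeups.fetch_add(1, Ordering::Relaxed);
    }

    /// Called by the socket task: takes everything buffered so far in one swap,
    /// leaving an empty buffer so the next write triggers a fresh signal.
    fn drain(&self) -> Vec<u8> {
        let mut buf = self.buf.lock().unwrap();
        std::mem::replace(&mut *buf, Vec::with_capacity(2048))
    }
}
```

During a burst of fan-out writes to the same connection, only the first write pays for a wake-up; the socket task then flushes all accumulated bytes in one `drain`.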
- Box large Publish types in enum variants to reduce size differences:
  - ResendInfo::Full in connect.rs
  - RetryInfo::Full in qos.rs
  - InflightMessage::Full in session/mod.rs
  - OutboundMessage::Packet in broker/mod.rs
- Fix unnecessary mutable references in decoder.decode() calls
- Use struct initializer syntax instead of field reassignment
- Apply cargo fmt formatting fixes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
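These changes look like fixes for Clippy lints such as `large_enum_variant`. The effect of boxing can be shown with `std::mem::size_of`: an enum is as large as its biggest variant, so every small `Cached` value pays for the space of a full `Publish` unless that variant is boxed. The `Publish` layout below is a hypothetical stand-in, padded to make the size gap obvious; the variant names only echo the commit message.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large decoded PUBLISH.
#[allow(dead_code)]
struct Publish {
    topic: String,
    payload: Vec<u8>,
    properties: [u64; 16], // makes the struct noticeably large
    packet_id: u16,
}

// Every value of this enum occupies as much space as its largest variant.
#[allow(dead_code)]
enum InflightUnboxed {
    Cached { packet_id: u16, retries: u8 },
    Full(Publish),
}

// Boxing the large variant shrinks the enum to roughly a pointer plus a tag.
#[allow(dead_code)]
enum InflightBoxed {
    Cached { packet_id: u16, retries: u8 },
    Full(Box<Publish>),
}
```

The trade-off is one extra heap allocation and pointer indirection when the `Full` variant is constructed, in exchange for a smaller enum everywhere it is stored, which pays off when the small variant is the common case.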
Summary
- `Arc<str>` and thread-local deduplication
- `SharedWriter` for improved throughput

Test plan
- `cargo test` to verify all tests pass
- `cargo bench` or load test to verify performance improvements

🤖 Generated with Claude Code