Skip to content

feat(tests): add comprehensive multi-agent cryptographic test suite#6

Merged
aWN4Y25pa2EK merged 8 commits into
mainfrom
feat/multi-agent-test-suite
Jan 21, 2026
Merged

feat(tests): add comprehensive multi-agent cryptographic test suite#6
aWN4Y25pa2EK merged 8 commits into
mainfrom
feat/multi-agent-test-suite

Conversation

@aWN4Y25pa2EK
Copy link
Copy Markdown
Member

Summary

This PR adds a comprehensive multi-agent cryptographic test suite with 60 tests across 11 phases, validating the M2M protocol's security primitives from basic operations to real LLM communication.

Test Phases

Phase Tests Focus
1-5 17 Foundation, crypto core, multi-org, scale, protocol
6 3 Single/dual agent LLM calls (OpenRouter)
7 4 Protocol metrics & instrumentation
8 22 Protocol invariants (proptest-based)
9 4 Performance regression guards
10 5 Stress tests (throughput, scale, memory)
11 5 True multi-agent LLM communication

Phase 11 Highlights (New)

Addresses the critical gap where previous tests validated crypto throughput but not actual multi-agent LLM traffic:

  • Multi-round conversation: 3-turn dialogue with session state persistence
  • Agent relay: Alice → Bob → Charlie → LLM chain (6 encryption hops)
  • Cross-org X25519: Key exchange between different organizations
  • Small network: 5 agents, round-robin message passing
  • Variable payloads: 2 to 900+ character LLM responses

Bug Fixed

Integer underflow in frame.rs:393 - malformed frames could panic. Added bounds check before subtraction.

Stress Test Results (release build)

  • 220,000+ ops/sec throughput
  • 500 agents, 249,500 pairs validated
  • Sub-5µs latency for small payloads
  • 0.2% variance under sustained load

Running Tests

# All deterministic tests (55 tests)
cargo test --features crypto --test multi_agent_crypto

# LLM tests (8 tests, requires API key)
OPENROUTER_API_KEY=sk-or-v1-... cargo test --features crypto --test multi_agent_crypto -- --ignored

# Stress tests in release mode
cargo test --features crypto --test multi_agent_crypto --release stress

Commits

  • a7ac7bb Multi-agent cryptographic test suite (Phases 1-6)
  • 790115c Protocol metrics and instrumentation (Phase 7)
  • a000244 Protocol invariant tests + underflow bug fix (Phase 8)
  • f01e9c1 Stress tests for throughput limits (Phase 10)
  • 383bc87 True multi-agent LLM communication (Phase 11)

Comprehensive test suite validating M2M protocol cryptography across
100 agents in 5 organizations. Tests cover:

Phase 1-2: Foundation & Crypto Core
- Session key derivation and symmetry
- AEAD encrypt/decrypt roundtrip
- Tamper and wrong-key detection

Phase 3-4: Multi-Org & Scale
- Cross-org key isolation (different masters)
- X25519 key exchange for cross-org communication
- 100 agents with unique identity keys
- Full mesh testing (190 pairs in single org)

Phase 5: Protocol Integration
- M2M session handshake
- Secure frame encode/decode
- Multi-turn encrypted message exchange
- Real LLM payload roundtrip

Phase 6: Autonomous Agents (requires API key)
- Single agent LLM calls
- Two-agent encrypted conversation
- Cross-org autonomous chat with X25519

Test counts:
- 21 deterministic tests (always run in CI)
- 3 LLM integration tests (#[ignore], require OPENROUTER_API_KEY)

Also adds dotenvy dev-dependency for .env loading.
Add comprehensive performance metrics to the multi-agent test suite:

- ProtocolMetrics struct for tracking compression, encryption, overhead
- test_protocol_efficiency_metrics: analyzes frame size at each stage
- test_key_derivation_performance: benchmarks HKDF (37.5k derivations/sec)
- test_encryption_throughput: measures AEAD performance (1.26 MB/s encrypt)
- test_full_mesh_with_metrics: 20-agent mesh (10.4k pairs/sec)

Key findings:
- Small payloads (<100 bytes): net overhead due to fixed AEAD costs
- Medium/large payloads: 21-37% net savings from compression
- Encryption adds ~28 bytes fixed overhead (nonce + tag + framing)
- Session key derivation: ~26.6 µs per derivation
- Symmetric key uniqueness confirmed (945 unique from 990 derivations)
Phase 8: Protocol Invariant Tests (proptest-based)
- INV-C1-C4: Compression roundtrip, efficiency, determinism, malformed rejection
- INV-E1-E6: Encryption roundtrip, wrong key, tamper, nonces, AAD binding
- INV-K1-K4: Key derivation symmetry, session isolation, org isolation, uniqueness
- INV-X1-X2: DH key agreement, keypair uniqueness
- INV-F1-F3: Frame roundtrip, wire format prefix, edge cases (unicode, 100KB)
- INV-S1-S2: Session state machine, session ID agreement

Phase 9: Performance Regression Tests
- Key derivation: must complete under 100µs
- Encryption throughput: must exceed 0.5 MB/s
- Roundtrip latency: must complete under 500µs
- Mesh scaling: per-pair time must not degrade >2.5x

Bug Fix (discovered by INV-C4):
- Fixed integer underflow in M2MFrame::decode() and decode_with_aead()
- header_len < FIXED_HEADER_SIZE now returns error instead of panic

Other:
- Updated OpenRouter model from free to paid tier for reliable testing
- Added proptest dependency for property-based testing

Test results: 50 passed, 3 ignored (LLM tests require API key)
Stress Tests:
- stress_max_throughput: Tests ops/sec at different payload sizes
- stress_agent_scaling: Tests 10 to 500 agents (249,500 pairs)
- stress_multi_org: 10 orgs × 50 agents = 500 total agents
- stress_sustained_load: 5-second continuous load stability
- stress_memory_contexts: Up to 5,000 concurrent security contexts

Release Build Performance (M1/M2 Mac):
- Small payloads: 220,000+ ops/sec (~4.5µs latency)
- Large payloads (1KB): 33,000 ops/sec, 35 MB/s throughput
- 500 agents full mesh: 249,500 pairs in 1.1 seconds
- Same-org communication: ~10,000 ops/sec (debug), ~200,000 (release)
- Cross-org (with X25519): ~1,300 ops/sec (8x overhead for key exchange)
- Sustained load CV: 0.2% (extremely stable)

Total test count: 55 passed, 3 ignored (LLM tests)
Add 5 new async tests validating actual multi-agent communication with
real LLM traffic via OpenRouter API:

- test_multi_round_conversation: 3-turn conversation with session state
- test_agent_relay: Alice → Bob → Charlie → LLM chain (6 encryption hops)
- test_cross_org_llm: X25519 key exchange between different organizations
- test_small_agent_network: 5 agents, round-robin message passing
- test_variable_payload_sizes: tiny to large LLM responses (2-900+ chars)

These tests address the critical gap identified in the existing suite:
previous stress tests validated cryptographic throughput (220k+ ops/sec)
but used synthetic payloads, not actual LLM traffic. Phase 11 ensures the
protocol works correctly with real-world variable-size LLM responses and
multi-hop agent relays.

All tests are #[ignore] by default (require OPENROUTER_API_KEY).

Test count: 55 passing + 8 ignored (LLM tests)
CI runners are significantly slower than local machines, especially for
debug builds. Lower the encryption throughput threshold from 0.5 MB/s to
0.1 MB/s to accommodate slower CI environments. Release builds achieve
200+ MB/s locally.
CI debug builds run significantly slower than local development. Increase
all performance thresholds to accommodate CI environments:
- Key derivation: 100µs → 200µs
- Roundtrip latency: 500µs → 1000µs
- Encryption throughput: already fixed at 0.1 MB/s

These are conservative bounds for CI; release builds achieve:
- Key derivation: < 10µs
- Roundtrip latency: < 5µs
- Encryption throughput: 200+ MB/s
@aWN4Y25pa2EK aWN4Y25pa2EK merged commit ecfd388 into main Jan 21, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant