Unit tests for platform-subnet-manager #27

Merged
echobt merged 9 commits into PlatformNetwork:main from cuteolaf:test/subnet-manager
Jan 9, 2026
@cuteolaf cuteolaf commented Jan 9, 2026

Summary by CodeRabbit

  • Tests
    • Expanded comprehensive test coverage across command execution, config loading/validation, health monitoring, recovery manager, snapshots, and update manager.
  • Snapshots
    • Improved snapshot restore behavior to reconstruct snapshot data/directories during restoration.
  • Updates
    • Enhanced update handling tests and validation for apply/rollback scenarios.

coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds extensive unit and integration tests across the subnet-manager crate, implements directory deserialization for snapshots, exposes test-only HealthMonitor accessors, and derives PartialEq for UpdateType. No production API signatures were removed or otherwise changed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Command execution & validation tests / crates/subnet-manager/src/commands.rs | Adds extensive test scaffolding and 50+ tests for CommandExecutor, command/result serialization, signature verification, and execution paths across subnet commands; introduces test helpers (e.g., build_executor_with_sudo, create_executor_with_keypair). |
| Configuration & ban list tests / crates/subnet-manager/src/config.rs | Adds unit tests for SubnetConfig load/save and validation, BanList file persistence and missing-file defaults, and MIN_VALIDATOR_STAKE constant checks. |
| Health monitoring tests & accessors / crates/subnet-manager/src/health.rs | Adds pub(crate) test-only accessors (test_history_mut, test_failure_counts_mut) and ~20+ tests for health checks, alert severity, history trimming, worst-status logic, and recovery signaling. |
| Recovery operation tests / crates/subnet-manager/src/recovery.rs | Adds helper constructors and ~30+ tests covering RecoveryManager behavior: serialization, pause/resume, cooldowns, health-branch flows, rollback workflows, snapshot interactions, and attempt/history tracking. |
| Snapshot management functionality & tests / crates/subnet-manager/src/snapshot.rs | Implements deserialize_directory to create target directories and write data.bin for non-empty data; adds extensive tests for snapshot creation/loading, hash validation, apply/restore semantics, metadata, ordering, pruning, and edge cases. |
| Update management API & tests / crates/subnet-manager/src/update.rs | UpdateType now derives PartialEq. Adds comprehensive tests for serialization, UpdateManager queueing/processing/rollback, hard-reset semantics, validator/WASM updates, filesystem effects, history pruning, and error paths. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

Hop, hop, I code and test with care 🐇
Snapshots, health, and updates — everywhere,
Signatures checked and rollbacks neat,
Executors ready, tests complete,
A little rabbit cheers, so fair ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding comprehensive unit tests to the subnet-manager crate across multiple modules (commands.rs, config.rs, health.rs, recovery.rs, snapshot.rs, update.rs). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.87% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (8)
crates/subnet-manager/src/snapshot.rs (2)

324-331: Asymmetric serialization/deserialization pair may cause confusion.

serialize_directory (lines 316-321) stores the path as bytes, but deserialize_directory writes the input data to data.bin rather than interpreting it as a path. While both are marked as simplified placeholders, this asymmetry means round-tripping won't work as expected.

Consider either:

  1. Adding a clarifying comment about this intentional mismatch, or
  2. Aligning the implementations for consistency
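To illustrate the second option, here is a minimal sketch of a pair that does round-trip, using only the standard library. It is not the crate's implementation: the flat length-prefixed layout, the helper names `take` and `read_len`, and the restriction to regular files in a single directory level are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

fn truncated() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "truncated directory data")
}

/// Reads `n` bytes starting at `*pos` and advances the cursor (hypothetical helper).
fn take<'a>(data: &'a [u8], pos: &mut usize, n: usize) -> io::Result<&'a [u8]> {
    let end = (*pos).checked_add(n).ok_or_else(truncated)?;
    let slice = data.get(*pos..end).ok_or_else(truncated)?;
    *pos = end;
    Ok(slice)
}

/// Reads a little-endian u64 length prefix (hypothetical helper).
fn read_len(data: &[u8], pos: &mut usize) -> io::Result<usize> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(take(data, pos, 8)?);
    Ok(u64::from_le_bytes(buf) as usize)
}

/// Assumed layout per file: name length, name, content length, content.
fn serialize_directory(dir: &Path) -> io::Result<Vec<u8>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            names.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    names.sort(); // keep the output independent of directory iteration order

    let mut out = Vec::new();
    for name in &names {
        let content = fs::read(dir.join(name))?;
        out.extend_from_slice(&(name.len() as u64).to_le_bytes());
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&(content.len() as u64).to_le_bytes());
        out.extend_from_slice(&content);
    }
    Ok(out)
}

/// Inverse of `serialize_directory`: recreates each file under `target`.
fn deserialize_directory(data: &[u8], target: &Path) -> io::Result<()> {
    fs::create_dir_all(target)?;
    let mut pos = 0;
    while pos < data.len() {
        let name_len = read_len(data, &mut pos)?;
        let name = String::from_utf8_lossy(take(data, &mut pos, name_len)?).into_owned();
        let content_len = read_len(data, &mut pos)?;
        let content = take(data, &mut pos, content_len)?;
        fs::write(target.join(&name), content)?;
    }
    Ok(())
}
```

With a symmetric pair like this, a round-trip test can serialize a directory, restore it elsewhere, and compare the re-serialized bytes. A real implementation would also need to reject file names containing path separators before writing.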

807-823: Test may be flaky due to timing dependency.

The 10ms sleep between snapshot creations (line 812) might not provide sufficient time separation on heavily loaded CI systems, potentially causing intermittent test failures if timestamps collide.

Consider widening the gap as a stopgap (a fully deterministic fix would set snapshot timestamps explicitly instead of relying on sleep):

♻️ Suggested improvement
         // Create snapshots in order
         for i in 0..3 {
             manager
                 .create_snapshot(&format!("snap{}", i), (i + 1) * 100, i + 1, &state, "test", false)
                 .unwrap();
-            std::thread::sleep(std::time::Duration::from_millis(10));
+            std::thread::sleep(std::time::Duration::from_millis(50));
         }
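A longer sleep only lowers the odds of a timestamp collision. One way to remove the timing dependency altogether is to inject the time source. The sketch below is hypothetical: `Clock`, `StepClock`, `SnapshotIndex`, and `SnapshotMeta` are illustration-only names, and the actual snapshot manager may obtain timestamps differently.

```rust
use std::cell::Cell;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical time source; not part of the crate under review.
trait Clock {
    fn now_millis(&self) -> u64;
}

/// Production-style clock backed by the system time.
struct SystemClock;

impl Clock for SystemClock {
    fn now_millis(&self) -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0)
    }
}

/// Test clock that advances by a fixed step on every read, so no two
/// readings are ever equal and no sleeping is needed.
struct StepClock {
    next: Cell<u64>,
    step: u64,
}

impl Clock for StepClock {
    fn now_millis(&self) -> u64 {
        let t = self.next.get();
        self.next.set(t + self.step);
        t
    }
}

struct SnapshotMeta {
    name: String,
    created_at: u64,
}

/// Stand-in for the snapshot manager: stamps entries from the injected clock.
struct SnapshotIndex<C: Clock> {
    clock: C,
    entries: Vec<SnapshotMeta>,
}

impl<C: Clock> SnapshotIndex<C> {
    fn new(clock: C) -> Self {
        SnapshotIndex { clock, entries: Vec::new() }
    }

    fn create(&mut self, name: &str) {
        let created_at = self.clock.now_millis();
        self.entries.push(SnapshotMeta { name: name.to_string(), created_at });
    }

    fn latest(&self) -> Option<&SnapshotMeta> {
        self.entries.iter().max_by_key(|m| m.created_at)
    }
}
```

A test built on `StepClock` creates the three snapshots back to back and still gets strictly increasing timestamps, regardless of CI load.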
crates/subnet-manager/src/recovery.rs (2)

938-993: Test lacks meaningful assertions.

The test captures attempt1, attempt2, and attempt3 but doesn't assert their results. The comments suggest it should verify max attempts limiting and fallback behavior, but the test only checks that check_and_recover doesn't panic.

Consider adding explicit assertions:

♻️ Suggested improvement
         // First recovery attempt
         let attempt1 = manager.check_and_recover(&health).await;
-        // Might succeed based on health status
+        assert!(attempt1.is_some(), "First attempt should trigger recovery");

         // Second recovery attempt
         let attempt2 = manager.check_and_recover(&health).await;
+        assert!(attempt2.is_some(), "Second attempt should trigger recovery");

         // Third attempt should be limited
         let attempt3 = manager.check_and_recover(&health).await;
-        // Should eventually stop or trigger fallback
+        assert!(attempt3.is_none(), "Third attempt should be blocked by max_attempts limit");

1246-1250: Minor style: prefer assert! for boolean checks.

-        assert_eq!(decoded.success, true);
+        assert!(decoded.success);
crates/subnet-manager/src/health.rs (1)

1058-1080: Duplicate test: test_worse_status_ordering duplicates test_worse_status_priority_ordering.

Lines 894-900 already test the same worse_status logic with identical assertions. Consider removing this duplicate to reduce test maintenance overhead.
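For reference, a single table-driven test can cover the ordering that both tests check. This is a sketch under assumptions: the `HealthStatus` variants mirror the healthy, degraded, unhealthy, critical progression mentioned in this review, but the enum definition, the derived ordering, and the `worst_of` helper are hypothetical and may not match the crate.

```rust
/// Hypothetical mirror of the crate's health levels, declared from best to
/// worst so that the derived ordering ranks `Critical` highest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
    Critical,
}

/// Returns whichever of the two statuses is more severe.
fn worse_status(a: HealthStatus, b: HealthStatus) -> HealthStatus {
    a.max(b)
}

/// Folds a set of component statuses into the overall worst one
/// (illustration-only helper).
fn worst_of(statuses: &[HealthStatus]) -> HealthStatus {
    statuses.iter().copied().fold(HealthStatus::Healthy, worse_status)
}

#[test]
fn worse_status_picks_the_more_severe_in_either_order() {
    use self::HealthStatus::*;
    let cases = [
        (Healthy, Healthy, Healthy),
        (Healthy, Degraded, Degraded),
        (Degraded, Unhealthy, Unhealthy),
        (Unhealthy, Critical, Critical),
        (Healthy, Critical, Critical),
    ];
    for &(a, b, expected) in cases.iter() {
        assert_eq!(worse_status(a, b), expected);
        assert_eq!(worse_status(b, a), expected);
    }
}
```

Keeping every pair in one table makes it obvious when a second test would only repeat the same assertions.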

crates/subnet-manager/src/commands.rs (3)

1536-1545: Tautological assertion provides no verification.

assert!(result.success || !result.success) is always true and doesn't test anything meaningful. If the expected behavior is unclear, consider either:

  1. Documenting the expected behavior and adding a proper assertion
  2. Removing the test if it doesn't verify meaningful behavior
♻️ Suggested improvement
     async fn test_remove_nonexistent_challenge() {
         let (executor, _dir) = create_test_executor();

         let result = executor.execute_command(&SubnetCommand::RemoveChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
-        // Should handle gracefully (may succeed or fail depending on implementation)
-        assert!(result.success || !result.success);
+        // Removing a non-existent challenge is a no-op and succeeds
+        assert!(result.success);
     }

1692-1714: Tests with no assertions don't verify behavior.

test_remove_nonexistent_challenge_error, test_pause_resume_challenge_errors, and similar tests capture result but never assert on it. These tests only verify the code doesn't panic, but don't validate the expected outcomes.

Consider adding assertions:

♻️ Suggested improvement
     async fn test_pause_resume_challenge_errors() {
         let (executor, _dir) = create_test_executor();

-        // Paths for lines 332, 381
         let result = executor.execute_command(&SubnetCommand::PauseChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
+        assert!(result.success, "PauseChallenge on nonexistent should succeed as no-op");

         let result = executor.execute_command(&SubnetCommand::ResumeChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
+        assert!(result.success, "ResumeChallenge on nonexistent should succeed as no-op");
     }

1716-1732: Test lacks assertions for error path verification.

The test captures results for unbanning non-existent entities but doesn't assert on them. Based on the implementation, these should return errors, so add appropriate assertions:

♻️ Suggested improvement
     async fn test_unban_nonexistent_entities() {
         let (executor, _dir) = create_test_executor();

-        // Paths for lines 416-417, 425-426
         let result = executor.execute_command(&SubnetCommand::UnbanValidator {
             hotkey: Hotkey([99u8; 32]),
         }).await;
+        assert!(!result.success, "Unbanning non-existent validator should fail");

         let result = executor.execute_command(&SubnetCommand::UnbanHotkey {
             hotkey: Hotkey([88u8; 32]),
         }).await;
+        assert!(!result.success, "Unbanning non-existent hotkey should fail");

         let result = executor.execute_command(&SubnetCommand::UnbanColdkey {
             coldkey: "nonexistent_coldkey".into(),
         }).await;
+        assert!(!result.success, "Unbanning non-existent coldkey should fail");
     }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7678293 and 3da6dd9.

📒 Files selected for processing (6)
  • crates/subnet-manager/src/commands.rs
  • crates/subnet-manager/src/config.rs
  • crates/subnet-manager/src/health.rs
  • crates/subnet-manager/src/recovery.rs
  • crates/subnet-manager/src/snapshot.rs
  • crates/subnet-manager/src/update.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build
  • GitHub Check: Test
🔇 Additional comments (7)
crates/subnet-manager/src/recovery.rs (1)

319-335: LGTM!

Well-structured test helper that encapsulates the common setup pattern, reducing boilerplate across tests. Returning the TempDir ensures cleanup happens at the right time.

crates/subnet-manager/src/update.rs (2)

15-15: LGTM!

Adding PartialEq derive to UpdateType is appropriate for enabling test assertions. This is a minimal, non-breaking change that improves testability.
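As a small illustration of why the derive matters for tests: `assert_eq!` requires both `PartialEq` and `Debug` on the compared values. The enum below is a hypothetical stand-in whose variants are assumed for the example, and `next_update` is an illustration-only helper rather than part of UpdateManager.

```rust
/// Hypothetical stand-in for `UpdateType`; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum UpdateType {
    Config,
    Wasm { challenge_id: String },
    HardReset,
}

/// Pops the oldest queued update, if any (illustration-only helper).
fn next_update(queue: &mut Vec<UpdateType>) -> Option<UpdateType> {
    if queue.is_empty() {
        None
    } else {
        Some(queue.remove(0))
    }
}

#[test]
fn queued_updates_come_back_in_order() {
    let mut queue = vec![UpdateType::Config, UpdateType::HardReset];
    // Direct equality assertions compile only because of the PartialEq derive.
    assert_eq!(next_update(&mut queue), Some(UpdateType::Config));
    assert_eq!(next_update(&mut queue), Some(UpdateType::HardReset));
    assert_eq!(next_update(&mut queue), None);
}
```

Without the derive, the same checks would need `matches!` or a manual `match` on each variant.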


415-432: LGTM!

Good serialization round-trip tests that verify all UpdateType variants can be serialized and deserialized correctly.

crates/subnet-manager/src/config.rs (2)

374-388: LGTM!

Good round-trip test that verifies config serialization/deserialization preserves all fields by comparing JSON representations.


390-406: LGTM!

Comprehensive validation error tests covering all three validation conditions (epoch_length, max_validators, weight_interval). The use of pattern matching with guards effectively verifies both the error type and message content.
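The guard pattern described here can be sketched as follows. The error type, the field types, and the "must be non-zero" rules are assumptions made for illustration; only the three field names come from this review.

```rust
/// Hypothetical error type; the crate's own error enum may look different.
#[derive(Debug)]
enum ConfigError {
    Validation(String),
    Io(String),
}

/// Hypothetical subset of the config with the three validated fields.
struct SubnetConfig {
    epoch_length: u64,
    max_validators: u32,
    weight_interval: u64,
}

impl SubnetConfig {
    /// Assumed rule for the sketch: each field must be non-zero.
    fn validate(&self) -> Result<(), ConfigError> {
        if self.epoch_length == 0 {
            return Err(ConfigError::Validation("epoch_length must be non-zero".to_string()));
        }
        if self.max_validators == 0 {
            return Err(ConfigError::Validation("max_validators must be non-zero".to_string()));
        }
        if self.weight_interval == 0 {
            return Err(ConfigError::Validation("weight_interval must be non-zero".to_string()));
        }
        Ok(())
    }
}

#[test]
fn zero_epoch_length_is_rejected_with_a_named_field() {
    let config = SubnetConfig { epoch_length: 0, max_validators: 64, weight_interval: 100 };
    let err = config.validate().unwrap_err();
    // The guard checks the message as well as the variant.
    assert!(matches!(&err, ConfigError::Validation(msg) if msg.contains("epoch_length")));
}
```

Matching on both the variant and a message fragment catches the case where validation fails for the wrong field, which a bare `is_err()` check would miss.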

crates/subnet-manager/src/health.rs (1)

571-582: LGTM!

Test-only accessors are appropriately scoped with #[cfg(test)] and pub(crate), enabling internal state inspection for tests without exposing internals in production builds.

crates/subnet-manager/src/commands.rs (1)

609-650: LGTM!

Well-organized test helpers that encapsulate common setup patterns. The separation between create_executor_with_keypair (returns keypair for signature tests) and create_test_executor (simpler for non-signature tests) is appropriate.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3da6dd9 and 9cdec8b.

📒 Files selected for processing (4)
  • crates/subnet-manager/src/commands.rs
  • crates/subnet-manager/src/health.rs
  • crates/subnet-manager/src/recovery.rs
  • crates/subnet-manager/src/snapshot.rs
🧰 Additional context used
🧬 Code graph analysis (1)
crates/subnet-manager/src/commands.rs (4)
crates/subnet-manager/src/recovery.rs (1)
  • new (87-103)
crates/subnet-manager/src/snapshot.rs (1)
  • new (77-90)
crates/subnet-manager/src/config.rs (6)
  • new (249-251)
  • default (59-76)
  • default (99-107)
  • default (136-146)
  • load (179-183)
  • load (254-262)
crates/subnet-manager/src/update.rs (1)
  • new (127-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (8)
crates/subnet-manager/src/snapshot.rs (1)

343-823: Excellent test coverage for snapshot functionality.

The test suite comprehensively covers:

  • Index loading and persistence
  • Snapshot creation with config and challenge data
  • Hash validation and corruption detection
  • Snapshot restoration and application
  • Pruning and retention policies
  • Edge cases (nonexistent snapshots, empty data, etc.)

The tests are well-structured, isolated using tempdir, and verify both in-memory state and filesystem changes.

crates/subnet-manager/src/recovery.rs (2)

319-353: Well-designed test helper functions.

The helper functions provide:

  • create_manager_with_config: Consistent RecoveryManager setup with custom config
  • create_aggressive_health_monitor: Easy testing of recovery triggers with tight thresholds
  • base_metrics: Baseline metrics with reasonable defaults (5 peers, etc.)

These reduce duplication and make tests more readable.


355-1254: Comprehensive recovery test coverage.

The test suite thoroughly covers:

  • All recovery actions and serialization
  • Health status transitions (healthy → degraded → unhealthy → critical)
  • Cooldown and attempt limiting logic
  • Rollback scenarios (to last snapshot, to specific snapshot, with/without snapshots)
  • Component-specific recovery paths (job_queue, network, evaluations)
  • Pause/resume functionality
  • History tracking and attempt counting

The tests properly verify both success and failure paths, making extensive use of the helper functions for readability.

crates/subnet-manager/src/health.rs (2)

571-582: Properly scoped test-only accessors.

The test accessors are correctly implemented:

  • Marked with #[cfg(test)] to exclude from production builds
  • pub(crate) visibility restricts access to code within the crate
  • Enable direct manipulation of internal state for testing edge cases

This is a standard pattern for testing state transitions without exposing internal details to production code.


588-1058: Thorough health monitoring test coverage.

The test suite comprehensively validates:

  • Health status comparisons and transitions
  • Alert severity levels and lifecycle (creation, acknowledgement, auto-resolution)
  • Component-specific health checks with configurable thresholds
  • Metrics tracking and default values
  • Edge cases (zero totals, empty queues, boundary conditions)
  • Recovery signaling and worst-component identification
  • History management and trimming (100 entry limit)

Tests properly verify both the happy path and various degraded/unhealthy scenarios.
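The history trimming behavior can be pictured with a bounded queue. This is a minimal sketch, not the HealthMonitor code: the 100-entry limit comes from the list above, while `MAX_HISTORY`, `History`, and the choice of `VecDeque` are assumptions.

```rust
use std::collections::VecDeque;

/// Limit stated in the review; the constant name is hypothetical.
const MAX_HISTORY: usize = 100;

/// Bounded history that drops the oldest entry once the limit is reached.
struct History<T> {
    entries: VecDeque<T>,
}

impl<T> History<T> {
    fn new() -> Self {
        History { entries: VecDeque::with_capacity(MAX_HISTORY) }
    }

    fn push(&mut self, item: T) {
        if self.entries.len() == MAX_HISTORY {
            self.entries.pop_front(); // evict the oldest entry
        }
        self.entries.push_back(item);
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

#[test]
fn history_keeps_only_the_newest_entries() {
    let mut history = History::new();
    for i in 0..150u32 {
        history.push(i);
    }
    assert_eq!(history.len(), MAX_HISTORY);
    assert_eq!(history.entries.front(), Some(&50));
    assert_eq!(history.entries.back(), Some(&149));
}
```

A trimming test is most useful when it checks which entries survive, not only the final length.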

crates/subnet-manager/src/commands.rs (3)

609-651: Well-structured test helper functions.

The helper functions provide flexible executor setup:

  • build_executor_with_sudo: Core setup with custom sudo key
  • create_executor_with_keypair: Returns keypair for signature testing
  • create_test_executor: Simplified version for tests not needing the keypair

These reduce duplication while providing appropriate abstractions for different test scenarios.


652-1133: Comprehensive command execution test coverage.

The tests thoroughly validate:

  • CommandResult constructors (ok, ok_with_data, error)
  • Signature verification (valid sudo key, wrong signer, invalid signature)
  • Command serialization/deserialization for all variants
  • Query commands with data verification
  • Snapshot creation and rollback (including error paths)
  • Config updates with filesystem persistence
  • Challenge lifecycle (deploy, update, pause, resume, remove)
  • Edge cases (missing data, nonexistent entities, both fields None)

Tests properly verify both in-memory state changes and side effects (file writes, update queueing).


1135-1806: Excellent coverage of remaining command scenarios.

The tests comprehensively cover:

  • Ban/unban operations for all entity types (validator, hotkey, coldkey)
  • Validator management (kick when exists/doesn't exist, sync)
  • Recovery triggering (all action types, error paths)
  • Hard reset with different configurations
  • Serialization of all SubnetCommand variants
  • Edge cases (zero epoch length/min stake, nonexistent entities, multiple operations)
  • Error paths (invalid signatures, deserialize failures, missing snapshots)

The sha256_hex utility tests verify that hashing is deterministic and that distinct inputs produce distinct digests. Overall, the test suite provides thorough coverage of the command execution system.

@echobt echobt merged commit f0c5e30 into PlatformNetwork:main Jan 9, 2026
6 checks passed
2 participants