Unit tests for `platform-subnet-manager` by cuteolaf · Pull Request #27 · PlatformNetwork/platform

cuteolaf · 2026-01-09T13:56:16Z

Summary by CodeRabbit

Tests
- Expanded test coverage across command execution, config loading/validation, health monitoring, recovery manager, snapshots, and update manager.
Snapshots
- Improved snapshot restore behavior to reconstruct snapshot data/directories during restoration.
Updates
- Enhanced update handling tests and validation for apply/rollback scenarios.

coderabbitai · 2026-01-09T13:56:26Z

Walkthrough

Adds extensive unit and integration tests across the subnet-manager crate, implements directory deserialization for snapshots, exposes test-only HealthMonitor accessors, and derives PartialEq for UpdateType. No production API signatures were removed or otherwise changed.

Changes

Cohort / File(s)	Summary
Command execution & validation tests `crates/subnet-manager/src/commands.rs`	Adds extensive test scaffolding and 50+ tests for CommandExecutor, command/result serialization, signature verification, and execution paths across subnet commands; introduces test helpers (e.g., build_executor_with_sudo, create_executor_with_keypair).
Configuration & ban list tests `crates/subnet-manager/src/config.rs`	Adds unit tests for SubnetConfig load/save and validation, BanList file persistence and missing-file defaults, and MIN_VALIDATOR_STAKE constant checks.
Health monitoring tests & accessors `crates/subnet-manager/src/health.rs`	Adds pub(crate) test-only accessors (`test_history_mut`, `test_failure_counts_mut`) and 20+ tests for health checks, alert severity, history trimming, worst-status logic, and recovery signaling.
Recovery operation tests `crates/subnet-manager/src/recovery.rs`	Adds helper constructors and 30+ tests covering RecoveryManager behavior: serialization, pause/resume, cooldowns, health-branch flows, rollback workflows, snapshot interactions, and attempt/history tracking.
Snapshot management functionality & tests `crates/subnet-manager/src/snapshot.rs`	Implements `deserialize_directory` to create target directories and write `data.bin` for non-empty data; adds extensive tests for snapshot creation/loading, hash validation, apply/restore semantics, metadata, ordering, pruning, and edge cases.
Update management API & tests `crates/subnet-manager/src/update.rs`	`UpdateType` now derives `PartialEq`. Adds comprehensive tests for serialization, UpdateManager queueing/processing/rollback, hard-reset semantics, validator/WASM updates, filesystem effects, history pruning, and error paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

Hop, hop, I code and test with care 🐇
Snapshots, health, and updates — everywhere,
Signatures checked and rollbacks neat,
Executors ready, tests complete,
A little rabbit cheers, so fair ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the primary change: adding comprehensive unit tests to the subnet-manager crate across multiple modules (commands.rs, config.rs, health.rs, recovery.rs, snapshot.rs, update.rs).
Docstring Coverage	✅ Passed	Docstring coverage is 94.87% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (8)

crates/subnet-manager/src/snapshot.rs (2)
324-331: Asymmetric serialization/deserialization pair may cause confusion.

serialize_directory (lines 316-321) stores the path as bytes, but deserialize_directory writes the input data to data.bin rather than interpreting it as a path. While both are marked as simplified placeholders, this asymmetry means round-tripping won't work as expected.

Consider either:

Adding a clarifying comment about this intentional mismatch, or

Aligning the implementations for consistency
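
One way to align the pair so that a round trip works is sketched below. This is only an illustration under stated assumptions: the real `serialize_directory` / `deserialize_directory` signatures are not shown in this review, so the names `pack_dir` and `unpack_dir`, the flat (non-recursive) directory model, and the length-prefixed layout are all hypothetical.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical encoder: packs each regular file in `dir` (non-recursive) as
/// [name_len: u32 LE][name bytes][data_len: u64 LE][data bytes].
fn pack_dir(dir: &Path) -> io::Result<Vec<u8>> {
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    // Sort so the encoding does not depend on directory iteration order.
    entries.sort_by_key(|e| e.file_name());
    let mut out = Vec::new();
    for entry in entries {
        if !entry.file_type()?.is_file() {
            continue; // subdirectories are ignored in this sketch
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        let data = fs::read(entry.path())?;
        out.extend_from_slice(&(name.len() as u32).to_le_bytes());
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&(data.len() as u64).to_le_bytes());
        out.extend_from_slice(&data);
    }
    Ok(out)
}

/// Hypothetical decoder: the inverse of `pack_dir`.
/// A real implementation should also reject names containing path separators.
fn unpack_dir(mut bytes: &[u8], target: &Path) -> io::Result<()> {
    fs::create_dir_all(target)?;
    while !bytes.is_empty() {
        let mut len4 = [0u8; 4];
        bytes.read_exact(&mut len4)?;
        let mut name = vec![0u8; u32::from_le_bytes(len4) as usize];
        bytes.read_exact(&mut name)?;
        let mut len8 = [0u8; 8];
        bytes.read_exact(&mut len8)?;
        let mut data = vec![0u8; u64::from_le_bytes(len8) as usize];
        bytes.read_exact(&mut data)?;
        let name = String::from_utf8(name)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        fs::write(target.join(name), data)?;
    }
    Ok(())
}
```

With a symmetric pair like this, a test can pack a directory, unpack it elsewhere, and assert that packing the result yields the same bytes.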

807-823: Test may be flaky due to timing dependency.

The 10ms sleep between snapshot creations (line 812) might not provide sufficient time separation on heavily loaded CI systems, potentially causing intermittent test failures if timestamps collide.

Consider widening the interval as a stopgap; a longer sleep lowers the odds of a timestamp collision but does not remove the timing dependency:
♻️ Suggested improvement
         // Create snapshots in order
         for i in 0..3 {
             manager
                 .create_snapshot(&format!("snap{}", i), (i + 1) * 100, i + 1, &state, "test", false)
                 .unwrap();
-            std::thread::sleep(std::time::Duration::from_millis(10));
+            std::thread::sleep(std::time::Duration::from_millis(50));
         }
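
A longer sleep still depends on wall-clock timing. A fully deterministic alternative is to inject the time source, sketched here under the assumption that snapshot ordering is driven by a creation timestamp. `Clock`, `FakeClock`, and `SnapshotIndex` are hypothetical names for illustration, not the crate's API.

```rust
use std::cell::Cell;

/// Hypothetical time source; production code would back this with the
/// system clock, tests with `FakeClock`.
trait Clock {
    fn now_millis(&self) -> u64;
}

/// Test clock that advances by a fixed step on every read, so two
/// consecutive snapshots can never share a timestamp.
struct FakeClock {
    next: Cell<u64>,
    step: u64,
}

impl Clock for FakeClock {
    fn now_millis(&self) -> u64 {
        let t = self.next.get();
        self.next.set(t + self.step);
        t
    }
}

/// Hypothetical stand-in for the snapshot index: (name, created_at) pairs.
struct SnapshotIndex<C: Clock> {
    clock: C,
    entries: Vec<(String, u64)>,
}

impl<C: Clock> SnapshotIndex<C> {
    fn new(clock: C) -> Self {
        Self { clock, entries: Vec::new() }
    }

    fn create(&mut self, name: &str) {
        let ts = self.clock.now_millis();
        self.entries.push((name.to_string(), ts));
    }

    /// Snapshot names ordered newest first, by timestamp.
    fn newest_first(&self) -> Vec<&str> {
        let mut sorted: Vec<_> = self.entries.iter().collect();
        sorted.sort_by(|a, b| b.1.cmp(&a.1));
        sorted.into_iter().map(|(n, _)| n.as_str()).collect()
    }
}
```

With a fake clock the ordering test needs no `sleep` at all, which also makes it faster.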

crates/subnet-manager/src/recovery.rs (2)
938-993: Test lacks meaningful assertions.

The test captures attempt1, attempt2, and attempt3 but doesn't assert their results. The comments suggest it should verify max attempts limiting and fallback behavior, but the test only checks that check_and_recover doesn't panic.

Consider adding explicit assertions:
♻️ Suggested improvement
         // First recovery attempt
         let attempt1 = manager.check_and_recover(&health).await;
-        // Might succeed based on health status
+        assert!(attempt1.is_some(), "First attempt should trigger recovery");

         // Second recovery attempt
         let attempt2 = manager.check_and_recover(&health).await;
+        assert!(attempt2.is_some(), "Second attempt should trigger recovery");

         // Third attempt should be limited
         let attempt3 = manager.check_and_recover(&health).await;
-        // Should eventually stop or trigger fallback
+        assert!(attempt3.is_none(), "Third attempt should be blocked by max_attempts limit");
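
The suggested assertions assume that two attempts are allowed and the third is blocked; the actual `max_attempts` value and the return type of `check_and_recover` are not shown in this review. A minimal synchronous sketch of that contract, using a hypothetical `AttemptLimiter`:

```rust
/// Hypothetical stand-in for the attempt-limiting part of a recovery manager.
struct AttemptLimiter {
    max_attempts: u32,
    attempts: u32,
}

impl AttemptLimiter {
    fn new(max_attempts: u32) -> Self {
        Self { max_attempts, attempts: 0 }
    }

    /// Returns `Some(attempt_number)` while under the limit, `None` after.
    fn try_recover(&mut self) -> Option<u32> {
        if self.attempts >= self.max_attempts {
            return None;
        }
        self.attempts += 1;
        Some(self.attempts)
    }
}
```

Asserting `Some`, `Some`, `None` against a limiter configured with two attempts pins down exactly the behavior the original comments only describe.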

1246-1250: Minor style: prefer assert! for boolean checks.
-        assert_eq!(decoded.success, true);
+        assert!(decoded.success);

crates/subnet-manager/src/health.rs (1)

1058-1080: Duplicate test: test_worse_status_ordering duplicates test_worse_status_priority_ordering.

Lines 894-900 already test the same worse_status logic with identical assertions. Consider removing this duplicate to reduce test maintenance overhead.
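
If the two tests are merged, a single table-driven test can cover the ordering. The sketch below assumes `HealthStatus` has the four levels named elsewhere in this review (healthy, degraded, unhealthy, critical) and that `worse_status` returns the more severe of its two arguments; the derive-based implementation is an assumption, not the crate's code.

```rust
/// Assumed severity levels, declared from least to most severe so the
/// derived `Ord` matches severity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
    Critical,
}

/// Hypothetical implementation: the worse status is the maximum.
fn worse_status(a: HealthStatus, b: HealthStatus) -> HealthStatus {
    a.max(b)
}

/// Table of (left, right, expected) cases; each is also checked with the
/// arguments swapped, since the operation should be symmetric.
fn ordering_cases() -> Vec<(HealthStatus, HealthStatus, HealthStatus)> {
    use HealthStatus::*;
    vec![
        (Healthy, Healthy, Healthy),
        (Healthy, Degraded, Degraded),
        (Degraded, Unhealthy, Unhealthy),
        (Unhealthy, Critical, Critical),
        (Critical, Healthy, Critical),
    ]
}
```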

crates/subnet-manager/src/commands.rs (3)
1536-1545: Tautological assertion provides no verification.

assert!(result.success || !result.success) is always true and doesn't test anything meaningful. If the expected behavior is unclear, consider either:

Documenting the expected behavior and adding a proper assertion

Removing the test if it doesn't verify meaningful behavior
♻️ Suggested improvement
     async fn test_remove_nonexistent_challenge() {
         let (executor, _dir) = create_test_executor();

         let result = executor.execute_command(&SubnetCommand::RemoveChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
-        // Should handle gracefully (may succeed or fail depending on implementation)
-        assert!(result.success || !result.success);
+        // Removing a non-existent challenge is a no-op and succeeds
+        assert!(result.success);
     }
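
If either outcome really is acceptable, the test can still check an invariant that must hold on both branches instead of a tautology. The sketch below is illustrative only: `success` is the one field confirmed by the quoted tests, while the `error` field and the `is_consistent` helper are hypothetical.

```rust
/// Hypothetical shape; only `success` is confirmed by the tests quoted above.
#[derive(Debug)]
struct CommandResult {
    success: bool,
    error: Option<String>,
}

/// Checks an invariant that should hold whichever branch the implementation
/// takes: success carries no error, failure carries a non-empty message.
fn is_consistent(result: &CommandResult) -> bool {
    match (result.success, &result.error) {
        (true, None) => true,
        (false, Some(msg)) => !msg.is_empty(),
        _ => false,
    }
}
```

Unlike `result.success || !result.success`, this assertion can actually fail, for example when a failure is reported without an error message.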

1692-1714: Tests with no assertions don't verify behavior.

test_remove_nonexistent_challenge_error, test_pause_resume_challenge_errors, and similar tests capture result but never assert on it. These tests only verify the code doesn't panic, but don't validate the expected outcomes.

Consider adding assertions:
♻️ Suggested improvement
     async fn test_pause_resume_challenge_errors() {
         let (executor, _dir) = create_test_executor();

-        // Paths for lines 332, 381
         let result = executor.execute_command(&SubnetCommand::PauseChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
+        assert!(result.success, "PauseChallenge on nonexistent should succeed as no-op");

         let result = executor.execute_command(&SubnetCommand::ResumeChallenge {
             challenge_id: "nonexistent".into(),
         }).await;
+        assert!(result.success, "ResumeChallenge on nonexistent should succeed as no-op");
     }

1716-1732: Test lacks assertions for error path verification.

The test captures results for unbanning non-existent entities but doesn't assert on them. Based on the implementation, these should return errors, so add appropriate assertions:
♻️ Suggested improvement
     async fn test_unban_nonexistent_entities() {
         let (executor, _dir) = create_test_executor();

-        // Paths for lines 416-417, 425-426
         let result = executor.execute_command(&SubnetCommand::UnbanValidator {
             hotkey: Hotkey([99u8; 32]),
         }).await;
+        assert!(!result.success, "Unbanning non-existent validator should fail");

         let result = executor.execute_command(&SubnetCommand::UnbanHotkey {
             hotkey: Hotkey([88u8; 32]),
         }).await;
+        assert!(!result.success, "Unbanning non-existent hotkey should fail");

         let result = executor.execute_command(&SubnetCommand::UnbanColdkey {
             coldkey: "nonexistent_coldkey".into(),
         }).await;
+        assert!(!result.success, "Unbanning non-existent coldkey should fail");
     }
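
The error-path contract these assertions rely on can be sketched in isolation. The `BanList` below is a hypothetical, pared-down model keyed by raw 32-byte hotkeys; the crate's actual ban list and its error type are not shown in this review.

```rust
use std::collections::HashSet;

/// Hypothetical ban list keyed by 32-byte hotkeys.
#[derive(Default)]
struct BanList {
    validators: HashSet<[u8; 32]>,
}

impl BanList {
    fn ban_validator(&mut self, hotkey: [u8; 32]) {
        self.validators.insert(hotkey);
    }

    /// Errors when the hotkey was never banned, which is the behavior the
    /// suggested assertions expect.
    fn unban_validator(&mut self, hotkey: &[u8; 32]) -> Result<(), String> {
        if self.validators.remove(hotkey) {
            Ok(())
        } else {
            Err("validator is not banned".to_string())
        }
    }
}
```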

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7678293 and 3da6dd9.

📒 Files selected for processing (6)

crates/subnet-manager/src/commands.rs
crates/subnet-manager/src/config.rs
crates/subnet-manager/src/health.rs
crates/subnet-manager/src/recovery.rs
crates/subnet-manager/src/snapshot.rs
crates/subnet-manager/src/update.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Build
GitHub Check: Test

🔇 Additional comments (7)

crates/subnet-manager/src/recovery.rs (1)

319-335: LGTM!

Well-structured test helper that encapsulates the common setup pattern, reducing boilerplate across tests. Returning the TempDir ensures cleanup happens at the right time.

crates/subnet-manager/src/update.rs (2)

15-15: LGTM!

Adding PartialEq derive to UpdateType is appropriate for enabling test assertions. This is a minimal, non-breaking change that improves testability.
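
For context, `assert_eq!` requires both operands to implement `PartialEq` and `Debug`, which is why the derive matters for tests. A small sketch, with hypothetical variants and a hypothetical `classify` helper (the real `UpdateType` variants may differ):

```rust
/// Hypothetical variants; the real `UpdateType` may differ.
#[derive(Debug, Clone, PartialEq)]
enum UpdateType {
    HardReset,
    Config,
}

/// Hypothetical classifier, present only to give the test something to
/// compare against.
fn classify(requires_wipe: bool) -> UpdateType {
    if requires_wipe {
        UpdateType::HardReset
    } else {
        UpdateType::Config
    }
}
```

Without `PartialEq`, a test would have to fall back to `matches!` or a manual `match` for every comparison.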

415-432: LGTM!

Good serialization round-trip tests that verify all UpdateType variants can be serialized and deserialized correctly.

crates/subnet-manager/src/config.rs (2)

374-388: LGTM!

Good round-trip test that verifies config serialization/deserialization preserves all fields by comparing JSON representations.

390-406: LGTM!

Comprehensive validation error tests covering all three validation conditions (epoch_length, max_validators, weight_interval). The use of pattern matching with guards effectively verifies both the error type and message content.
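
The guard-based matching described here can be sketched as follows. The field names follow the three validation conditions named above; the error type, the "must be > 0" rules, and the `is_invalid_for` helper are assumptions made for illustration.

```rust
/// Hypothetical error type; the crate's real error enum is not shown here.
#[derive(Debug)]
enum ConfigError {
    Invalid(String),
}

/// Pared-down config holding only the three validated fields.
struct SubnetConfig {
    epoch_length: u64,
    max_validators: u32,
    weight_interval: u64,
}

impl SubnetConfig {
    /// Assumed rules: every field must be non-zero.
    fn validate(&self) -> Result<(), ConfigError> {
        if self.epoch_length == 0 {
            return Err(ConfigError::Invalid("epoch_length must be > 0".into()));
        }
        if self.max_validators == 0 {
            return Err(ConfigError::Invalid("max_validators must be > 0".into()));
        }
        if self.weight_interval == 0 {
            return Err(ConfigError::Invalid("weight_interval must be > 0".into()));
        }
        Ok(())
    }
}

/// True when `result` is an `Invalid` error whose message mentions `field`.
/// The `if` guard checks the message content as well as the variant.
fn is_invalid_for(result: &Result<(), ConfigError>, field: &str) -> bool {
    matches!(result, Err(ConfigError::Invalid(msg)) if msg.contains(field))
}
```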

crates/subnet-manager/src/health.rs (1)

571-582: LGTM!

Test-only accessors are appropriately scoped with #[cfg(test)] and pub(crate), enabling internal state inspection for tests without exposing internals in production builds.

crates/subnet-manager/src/commands.rs (1)

609-650: LGTM!

Well-organized test helpers that encapsulate common setup patterns. The separation between create_executor_with_keypair (returns keypair for signature tests) and create_test_executor (simpler for non-signature tests) is appropriate.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3da6dd9 and 9cdec8b.

📒 Files selected for processing (4)

crates/subnet-manager/src/commands.rs
crates/subnet-manager/src/health.rs
crates/subnet-manager/src/recovery.rs
crates/subnet-manager/src/snapshot.rs

🧰 Additional context used

🧬 Code graph analysis (1)

crates/subnet-manager/src/commands.rs (4)

crates/subnet-manager/src/recovery.rs (1)

new (87-103)

crates/subnet-manager/src/snapshot.rs (1)

new (77-90)

crates/subnet-manager/src/config.rs (6)

new (249-251)

default (59-76)

default (99-107)

default (136-146)

load (179-183)

load (254-262)

crates/subnet-manager/src/update.rs (1)

new (127-135)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Test

🔇 Additional comments (8)

crates/subnet-manager/src/snapshot.rs (1)

343-823: Excellent test coverage for snapshot functionality.

The test suite comprehensively covers:

Index loading and persistence

Snapshot creation with config and challenge data

Hash validation and corruption detection

Snapshot restoration and application

Pruning and retention policies

Edge cases (nonexistent snapshots, empty data, etc.)

The tests are well-structured, isolated using tempdir, and verify both in-memory state and filesystem changes.

crates/subnet-manager/src/recovery.rs (2)

319-353: Well-designed test helper functions.

The helper functions provide:

create_manager_with_config: Consistent RecoveryManager setup with custom config

create_aggressive_health_monitor: Easy testing of recovery triggers with tight thresholds

base_metrics: Baseline metrics with reasonable defaults (5 peers, etc.)

These reduce duplication and make tests more readable.

355-1254: Comprehensive recovery test coverage.

The test suite thoroughly covers:

All recovery actions and serialization

Health status transitions (healthy → degraded → unhealthy → critical)

Cooldown and attempt limiting logic

Rollback scenarios (to last snapshot, to specific snapshot, with/without snapshots)

Component-specific recovery paths (job_queue, network, evaluations)

Pause/resume functionality

History tracking and attempt counting

The tests properly verify both success and failure paths, making extensive use of the helper functions for readability.
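
Cooldown logic in particular is easier to test when the current time is passed in rather than read from the system clock. The following is a sketch only; `Cooldown` and `try_fire` are hypothetical names and do not reflect how `RecoveryManager` is actually structured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cooldown gate; `now` is passed in so tests need no sleeping.
struct Cooldown {
    period: Duration,
    last: Option<Instant>,
}

impl Cooldown {
    fn new(period: Duration) -> Self {
        Self { period, last: None }
    }

    /// Returns true and records `now` if no action has fired yet or the
    /// cooldown period has elapsed since the last one; otherwise false.
    fn try_fire(&mut self, now: Instant) -> bool {
        match self.last {
            Some(prev) if now.duration_since(prev) < self.period => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}
```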

crates/subnet-manager/src/health.rs (2)

571-582: Properly scoped test-only accessors.

The test accessors are correctly implemented:

Marked with #[cfg(test)] to exclude from production builds

pub(crate) visibility restricts access to crate-level tests

Enable direct manipulation of internal state for testing edge cases

This is a standard pattern for testing state transitions without exposing internal details to production code.
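
A pared-down sketch of the pattern, combined with the 100-entry history cap mentioned in this review. The struct layout, the `u64` sample type, and the accessor's return type are assumptions; only the accessor name `test_history_mut` comes from the PR summary.

```rust
use std::collections::VecDeque;

/// Assumed cap, matching the 100-entry limit mentioned in this review.
const MAX_HISTORY: usize = 100;

/// Hypothetical, pared-down monitor; the real struct holds more state.
struct HealthMonitor {
    history: VecDeque<u64>,
}

impl HealthMonitor {
    fn new() -> Self {
        Self { history: VecDeque::new() }
    }

    /// Appends a sample and drops the oldest ones beyond the cap.
    fn record(&mut self, sample: u64) {
        self.history.push_back(sample);
        while self.history.len() > MAX_HISTORY {
            self.history.pop_front();
        }
    }

    fn history_len(&self) -> usize {
        self.history.len()
    }

    /// Compiled only under `cfg(test)`, so it is absent from non-test builds.
    #[cfg(test)]
    pub(crate) fn test_history_mut(&mut self) -> &mut VecDeque<u64> {
        &mut self.history
    }
}
```

A test can use the accessor to pre-fill the history to just under the cap, then call `record` and check that trimming kicks in at the boundary.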

588-1058: Thorough health monitoring test coverage.

The test suite comprehensively validates:

Health status comparisons and transitions

Alert severity levels and lifecycle (creation, acknowledgement, auto-resolution)

Component-specific health checks with configurable thresholds

Metrics tracking and default values

Edge cases (zero totals, empty queues, boundary conditions)

Recovery signaling and worst-component identification

History management and trimming (100 entry limit)

Tests properly verify both the happy path and various degraded/unhealthy scenarios.

crates/subnet-manager/src/commands.rs (3)

609-651: Well-structured test helper functions.

The helper functions provide flexible executor setup:

build_executor_with_sudo: Core setup with custom sudo key

create_executor_with_keypair: Returns keypair for signature testing

create_test_executor: Simplified version for tests not needing the keypair

These reduce duplication while providing appropriate abstractions for different test scenarios.

652-1133: Comprehensive command execution test coverage.

The tests thoroughly validate:

CommandResult constructors (ok, ok_with_data, error)

Signature verification (valid sudo key, wrong signer, invalid signature)

Command serialization/deserialization for all variants

Query commands with data verification

Snapshot creation and rollback (including error paths)

Config updates with filesystem persistence

Challenge lifecycle (deploy, update, pause, resume, remove)

Edge cases (missing data, nonexistent entities, both fields None)

Tests properly verify both in-memory state changes and side effects (file writes, update queueing).

1135-1806: Excellent coverage of remaining command scenarios.

The tests comprehensively cover:

Ban/unban operations for all entity types (validator, hotkey, coldkey)

Validator management (kick when exists/doesn't exist, sync)

Recovery triggering (all action types, error paths)

Hard reset with different configurations

Serialization of all SubnetCommand variants

Edge cases (zero epoch length/min stake, nonexistent entities, multiple operations)

Error paths (invalid signatures, deserialize failures, missing snapshots)

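The sha256_hex utility tests check that hashing is deterministic and that the sampled distinct inputs yield distinct digests (a unit test cannot establish collision resistance in general). Overall, the test suite provides thorough coverage of the command execution system.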

crates/subnet-manager/src/snapshot.rs

cuteolaf added 8 commits January 9, 2026 13:11

test(subnet-manager): add comprehensive units tests

59beb07

test: add missing tests for command.rs

a7d5f44

test: add missing tests for config.rs

7c94ac3

test: cover missing paths for health.rs

0f777ef

more tests for recovery.rs

96d21c7

more tests for snapshot.rs

ee8b6b8

add more tests for apply_update

c4fa3fb

some more tests for 100% test coverage

3da6dd9

coderabbitai bot reviewed Jan 9, 2026

nitpicks

9cdec8b

coderabbitai bot reviewed Jan 9, 2026

echobt merged commit f0c5e30 into PlatformNetwork:main Jan 9, 2026
6 checks passed

echobt merged 9 commits into PlatformNetwork:main from cuteolaf:test/subnet-manager
