Skip to content

test: unicity-node audit coverage (#1 / #5 / #6) + hardening repros (…#15

Open
b3y0urs3lf wants to merge 1 commit into
mainfrom
unicity-node-audit-tests
Open

test: unicity-node audit coverage (#1 / #5 / #6) + hardening repros (…#15
b3y0urs3lf wants to merge 1 commit into
mainfrom
unicity-node-audit-tests

Conversation

@b3y0urs3lf
Copy link
Copy Markdown
Contributor

#14)

Adds the test suite produced during the May 2026 audit of issues #1 (PR #3 — BFT state snapshots), #5 (PR #11 — RPC server robustness), and #6 (PR #10 — addnode & state queries). Test-only change; no production code is modified.

Group A — regression / lock-in for behavior already delivered:

Group B — passes-while-gap-holds repros (tracked in #14):

Flip-on-fix contract: the bug_* tests pass today by reproducing the gap. They are NOT discovered by test/functional/test_runner.py (its allowlist excludes bug_) — run them on demand. When an #14 fix lands, the corresponding bug_* test will exit non-zero ("gap closed unexpectedly"); the implementer should then invert the assertion, add the positive-path case, and rename to feature_* so it joins default CI.

Refs: #1 #5 #6 #14

)

Adds the test suite produced during the May 2026 audit of issues #1
(PR #3 — BFT state snapshots), #5 (PR #11 — RPC server robustness), and
#6 (PR #10 — addnode & state queries). Test-only change; no production
code is modified.

Group A — regression / lock-in for behavior already delivered:
  - validation_tests.cpp: S1 mismatched-payloadRoot (3 sections, #1 AC.2/4),
    S4 exact-MAX boundary (#1 AC.3), req-6 negatives (2 sections, #1 AC.6)
  - token_generator_test.cpp: 0600 file-mode (#1 AC.9 / PR #3 review fix)
  - bft_client_tests.cpp: read-timeout (#1 AC.5 failure mode)
  - persistence_tests.cpp: vPayload round-trip (#1 AC.2/4 persistence)
  - rpc_integration_tests.cpp: fix "Invalid hex characters" SECTION
    that previously bailed on missing-rewardtokenid before hex parsing
    (now reaches validation; companion to lploom's Apr 8 Python-side fix)
  - feature_rpc_socket_backlog.py: #5 Priority 1 / PR #11
  - feature_addnode_sync_rpc.py + feature_addnode_logging.py: #6 sync
    conversion + logging
  - feature_reward_tokens_csv.py: #1 AC.11 (CSV is the agreed interface
    per @ahtotruu's clarification: "cat csv is also CLI")
  - feature_bft_utb_retrieval.py: #1 AC.5 e2e against the live
    bft-fgp-2sh cluster (HTTP fetch → CBOR → safety check)

Group B — passes-while-gap-holds repros (tracked in #14):
  - bug_rpc_no_keep_alive.py: #5 P3 / #14 F2b
  - bug_addnode_no_retry.py: #6 P2 / #14 F6a
  - bug_rpc_thread_per_request.py: #5 P2 / #14 F2 (source-level)
  - bug_post_ibd_stall_disabled.py: #6 P3 / #14 F6b (source-level)
  - bug_getaddednodeinfo_missing.py: audit-discovered / #14 F6c
  - bug_reward_token_list_missing.py: design-doc tripwire per
    @ahtotruu's clarification (no RPC by design; CSV is the interface)
  - bug_rpc_busy_path_resets.py: fixed-but-inconclusive / #14 H2

Flip-on-fix contract: the bug_* tests pass today by reproducing the
gap. They are NOT discovered by test/functional/test_runner.py (its
allowlist excludes bug_) — run them on demand. When an #14 fix lands,
the corresponding bug_* test will exit non-zero ("gap closed
unexpectedly"); the implementer should then invert the assertion, add
the positive-path case, and rename to feature_* so it joins default CI.

Refs: #1 #5 #6 #14

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a comprehensive suite of functional and unit tests to reproduce known bugs and verify key features, including synchronous addnode behavior, RPC server backlog and concurrency handling, BFT UTB retrieval, and block validation. The reviewer's feedback highlights several opportunities to harden these tests against crashes and flakiness. Key recommendations include wrapping Linux-specific path checks in try-except blocks for cross-platform compatibility, dynamically allocating ports to avoid CI conflicts, handling potential IndexError and socket connection exceptions, simplifying fragile regex parsing, and refactoring the BFT client to make timeouts configurable to avoid slow unit tests.

Comment on lines +139 to +144
with open("/proc/sys/net/core/somaxconn") as f:
somaxconn = int(f.read().strip())
log(f" /proc/sys/net/core/somaxconn = {somaxconn}",
GREEN if somaxconn >= 128 else RED)
if somaxconn < 128:
failures.append(f"SOMAXCONN too small: {somaxconn}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The test attempts to read /proc/sys/net/core/somaxconn directly, which is a Linux-specific path. Running this test on non-Linux platforms (such as macOS or Windows) will raise a FileNotFoundError and crash the test suite.

To ensure cross-platform compatibility, wrap this check in a try-except block to gracefully skip the kernel parameter check on non-Linux systems.

Suggested change
with open("/proc/sys/net/core/somaxconn") as f:
somaxconn = int(f.read().strip())
log(f" /proc/sys/net/core/somaxconn = {somaxconn}",
GREEN if somaxconn >= 128 else RED)
if somaxconn < 128:
failures.append(f"SOMAXCONN too small: {somaxconn}")
try:
with open("/proc/sys/net/core/somaxconn") as f:
somaxconn = int(f.read().strip())
log(f" /proc/sys/net/core/somaxconn = {somaxconn}",
GREEN if somaxconn >= 128 else RED)
if somaxconn < 128:
failures.append(f"SOMAXCONN too small: {somaxconn}")
except FileNotFoundError:
log(" /proc/sys/net/core/somaxconn not found (non-Linux platform), skipping kernel check", YELLOW)

node.start()

# Trigger several distinct failures on different ports so we can count log lines.
ports = [65501, 65502, 65503, 65504, 65505]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using hardcoded ports (like 65501 through 65505) in functional tests can lead to intermittent test failures and flakiness if those ports are already in use by other processes on the host machine or in a shared CI environment.

Since pick_free_port is already imported in this file, use it to dynamically allocate unused ports for the test.

Suggested change
ports = [65501, 65502, 65503, 65504, 65505]
ports = [pick_free_port() for _ in range(5)]

rows = list(reader)

# Header line
header = rows[0]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the CSV file is empty (e.g., due to a write failure or serialization issue in the node), rows will be an empty list, and accessing rows[0] will raise an unhandled IndexError: list index out of range, crashing the test.

Add a check to ensure rows is not empty before accessing the header.

        if not rows:
            log("  FAIL — CSV file is empty", RED)
            return 1
        header = rows[0]

Comment on lines +46 to +52
s.settimeout(timeout)
data = b""
while b"\r\n\r\n" not in data:
chunk = s.recv(4096)
if not chunk:
return None, None, None, True
data += chunk
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The read_http_response helper function does not handle socket connection errors (such as ConnectionResetError, BrokenPipeError, or general OSError) during the initial receive loop. If the server abruptly closes or resets the connection, the test will crash with an unhandled exception instead of failing gracefully or correctly identifying that the connection was closed.

Wrap the receive loop in a try-except block to handle these connection errors and return closed = True.

Suggested change
s.settimeout(timeout)
data = b""
while b"\r\n\r\n" not in data:
chunk = s.recv(4096)
if not chunk:
return None, None, None, True
data += chunk
s.settimeout(timeout)
data = b""
try:
while b"\r\n\r\n" not in data:
chunk = s.recv(4096)
if not chunk:
return None, None, None, True
data += chunk
except (ConnectionResetError, BrokenPipeError, OSError):
return None, None, None, True

Comment on lines +48 to +53
m = re.search(r"void\s+RPCServer::ServerThread\s*\(\s*\)\s*\{(.+?)\n\}\n", text, re.DOTALL)
if not m:
log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)
log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)
return 1
body = m.group(1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regular expression used to extract the body of RPCServer::ServerThread is highly fragile. It relies on the exact indentation and formatting of the closing braces (expecting \n}\n with no leading whitespace). If the source file is auto-formatted, or if nested blocks with specific indentation are added, the regex will fail to match or truncate the function body prematurely.

Since the presence of std::thread and .detach() in rpc_server.cpp is already a strong indicator of the thread-per-request pattern, consider searching the entire file content instead of attempting to extract the function body with a fragile regex.

Suggested change
m = re.search(r"void\s+RPCServer::ServerThread\s*\(\s*\)\s*\{(.+?)\n\}\n", text, re.DOTALL)
if not m:
log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)
log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)
return 1
body = m.group(1)
if "void RPCServer::ServerThread" not in text:
log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)
log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)
return 1
body = text

Comment on lines +135 to +138
SECTION("Timeout: slow server triggers read timeout (req 5 failure mode)") {
svr.Get("/api/v1/trustbases/slow", [](const httplib::Request&, httplib::Response& res) {
// Block longer than the client's 5 s read-timeout.
std::this_thread::sleep_for(std::chrono::seconds(6));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Sleeping for 6 seconds in a unit test is a major anti-pattern as it significantly slows down the test suite execution. This is currently required because the read timeout of 5 seconds is hardcoded in bft_client.cpp.

Consider refactoring HttpBFTClient in a future production PR to make the read timeout configurable (e.g., via a constructor parameter or setter). This would allow unit tests to configure a very short timeout (e.g., 50ms) and execute almost instantaneously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant