test: unicity-node audit coverage (#1 / #5 / #6) + hardening repros (… by b3y0urs3lf · Pull Request #15 · unicitynetwork/unicity-node

b3y0urs3lf · 2026-05-28T12:15:45Z

Adds the test suite produced during the May 2026 audit of issues #1 (PR #3 — BFT state snapshots), #5 (PR #11 — RPC server robustness), and #6 (PR #10 — addnode & state queries). Test-only change; no production code is modified.

Group A — regression / lock-in for behavior already delivered:

validation_tests.cpp: S1 mismatched-payloadRoot (3 sections, [TGE] BFT state snapshots to PoW chain #1 AC.2/4), S4 exact-MAX boundary ([TGE] BFT state snapshots to PoW chain #1 AC.3), req-6 negatives (2 sections, [TGE] BFT state snapshots to PoW chain #1 AC.6)
token_generator_test.cpp: 0600 file-mode ([TGE] BFT state snapshots to PoW chain #1 AC.9 / PR BFT snapshots #3 review fix)
bft_client_tests.cpp: read-timeout ([TGE] BFT state snapshots to PoW chain #1 AC.5 failure mode)
persistence_tests.cpp: vPayload round-trip ([TGE] BFT state snapshots to PoW chain #1 AC.2/4 persistence)
rpc_integration_tests.cpp: fix "Invalid hex characters" SECTION that previously bailed on missing-rewardtokenid before hex parsing (now reaches validation; companion to lploom's Apr 8 Python-side fix)
feature_rpc_socket_backlog.py: IPC Socket drops connections under concurrent access (listen backlog overflow) #5 Priority 1 / PR Increase listen backlog from 20 to SOMAXCONN #11
feature_addnode_sync_rpc.py + feature_addnode_logging.py: Silent outbound connection failure — addnode returns success but connection never established #6 sync conversion + logging
feature_reward_tokens_csv.py: [TGE] BFT state snapshots to PoW chain #1 AC.11 (CSV is the agreed interface per @ahtotruu's clarification: "cat csv is also CLI")
feature_bft_utb_retrieval.py: [TGE] BFT state snapshots to PoW chain #1 AC.5 e2e against the live bft-fgp-2sh cluster (HTTP fetch → CBOR → safety check)

Group B — passes-while-gap-holds repros (tracked in #14):

bug_rpc_no_keep_alive.py: IPC Socket drops connections under concurrent access (listen backlog overflow) #5 P3 / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 F2b
bug_addnode_no_retry.py: Silent outbound connection failure — addnode returns success but connection never established #6 P2 / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 F6a
bug_rpc_thread_per_request.py: IPC Socket drops connections under concurrent access (listen backlog overflow) #5 P2 / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 F2 (source-level)
bug_post_ibd_stall_disabled.py: Silent outbound connection failure — addnode returns success but connection never established #6 P3 / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 F6b (source-level)
bug_getaddednodeinfo_missing.py: audit-discovered / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 F6c
bug_reward_token_list_missing.py: design-doc tripwire per @ahtotruu's clarification (no RPC by design; CSV is the interface)
bug_rpc_busy_path_resets.py: fixed-but-inconclusive / unicity-node hardening — follow-ups from the bft-snapshots audit (issues #1 / #5 / #6) #14 H2

Flip-on-fix contract: the bug_* tests pass today by reproducing the gap. They are NOT discovered by test/functional/test_runner.py (its allowlist excludes bug_) — run them on demand. When an #14 fix lands, the corresponding bug_* test will exit non-zero ("gap closed unexpectedly"); the implementer should then invert the assertion, add the positive-path case, and rename to feature_* so it joins default CI.

Refs: #1 #5 #6 #14

@ahtotruu

) Adds the test suite produced during the May 2026 audit of issues #1 (PR #3 — BFT state snapshots), #5 (PR #11 — RPC server robustness), and #6 (PR #10 — addnode & state queries). Test-only change; no production code is modified. Group A — regression / lock-in for behavior already delivered: - validation_tests.cpp: S1 mismatched-payloadRoot (3 sections, #1 AC.2/4), S4 exact-MAX boundary (#1 AC.3), req-6 negatives (2 sections, #1 AC.6) - token_generator_test.cpp: 0600 file-mode (#1 AC.9 / PR #3 review fix) - bft_client_tests.cpp: read-timeout (#1 AC.5 failure mode) - persistence_tests.cpp: vPayload round-trip (#1 AC.2/4 persistence) - rpc_integration_tests.cpp: fix "Invalid hex characters" SECTION that previously bailed on missing-rewardtokenid before hex parsing (now reaches validation; companion to lploom's Apr 8 Python-side fix) - feature_rpc_socket_backlog.py: #5 Priority 1 / PR #11 - feature_addnode_sync_rpc.py + feature_addnode_logging.py: #6 sync conversion + logging - feature_reward_tokens_csv.py: #1 AC.11 (CSV is the agreed interface per @ahtotruu's clarification: "cat csv is also CLI") - feature_bft_utb_retrieval.py: #1 AC.5 e2e against the live bft-fgp-2sh cluster (HTTP fetch → CBOR → safety check) Group B — passes-while-gap-holds repros (tracked in #14): - bug_rpc_no_keep_alive.py: #5 P3 / #14 F2b - bug_addnode_no_retry.py: #6 P2 / #14 F6a - bug_rpc_thread_per_request.py: #5 P2 / #14 F2 (source-level) - bug_post_ibd_stall_disabled.py: #6 P3 / #14 F6b (source-level) - bug_getaddednodeinfo_missing.py: audit-discovered / #14 F6c - bug_reward_token_list_missing.py: design-doc tripwire per @ahtotruu's clarification (no RPC by design; CSV is the interface) - bug_rpc_busy_path_resets.py: fixed-but-inconclusive / #14 H2 Flip-on-fix contract: the bug_* tests pass today by reproducing the gap. They are NOT discovered by test/functional/test_runner.py (its allowlist excludes bug_) — run them on demand. When an #14 fix lands, the corresponding bug_* test will exit non-zero ("gap closed unexpectedly"); the implementer should then invert the assertion, add the positive-path case, and rename to feature_* so it joins default CI. Refs: #1 #5 #6 #14 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request adds a comprehensive suite of functional and unit tests to reproduce known bugs and verify key features, including synchronous addnode behavior, RPC server backlog and concurrency handling, BFT UTB retrieval, and block validation. The reviewer's feedback highlights several opportunities to harden these tests against crashes and flakiness. Key recommendations include wrapping Linux-specific path checks in try-except blocks for cross-platform compatibility, dynamically allocating ports to avoid CI conflicts, handling potential IndexError and socket connection exceptions, simplifying fragile regex parsing, and refactoring the BFT client to make timeouts configurable to avoid slow unit tests.

gemini-code-assist · 2026-05-28T12:17:22Z

+        with open("/proc/sys/net/core/somaxconn") as f:
+            somaxconn = int(f.read().strip())
+        log(f"  /proc/sys/net/core/somaxconn = {somaxconn}",
+            GREEN if somaxconn >= 128 else RED)
+        if somaxconn < 128:
+            failures.append(f"SOMAXCONN too small: {somaxconn}")


The test attempts to read /proc/sys/net/core/somaxconn directly, which is a Linux-specific path. Running this test on non-Linux platforms (such as macOS or Windows) will raise a FileNotFoundError and crash the test suite.

To ensure cross-platform compatibility, wrap this check in a try-except block to gracefully skip the kernel parameter check on non-Linux systems.

Suggested change

with open("/proc/sys/net/core/somaxconn") as f:

somaxconn = int(f.read().strip())

log(f" /proc/sys/net/core/somaxconn = {somaxconn}",

GREEN if somaxconn >= 128 else RED)

if somaxconn < 128:

failures.append(f"SOMAXCONN too small: {somaxconn}")

try:

with open("/proc/sys/net/core/somaxconn") as f:

somaxconn = int(f.read().strip())

log(f" /proc/sys/net/core/somaxconn = {somaxconn}",

GREEN if somaxconn >= 128 else RED)

if somaxconn < 128:

failures.append(f"SOMAXCONN too small: {somaxconn}")

except FileNotFoundError:

log(" /proc/sys/net/core/somaxconn not found (non-Linux platform), skipping kernel check", YELLOW)

gemini-code-assist · 2026-05-28T12:17:22Z

+        node.start()
+
+        # Trigger several distinct failures on different ports so we can count log lines.
+        ports = [65501, 65502, 65503, 65504, 65505]


Using hardcoded ports (like 65501 through 65505) in functional tests can lead to intermittent test failures and flakiness if those ports are already in use by other processes on the host machine or in a shared CI environment.

Since pick_free_port is already imported in this file, use it to dynamically allocate unused ports for the test.

Suggested change

ports = [65501, 65502, 65503, 65504, 65505]

ports = [pick_free_port() for _ in range(5)]

gemini-code-assist · 2026-05-28T12:17:22Z

+            rows = list(reader)
+
+        # Header line
+        header = rows[0]


If the CSV file is empty (e.g., due to a write failure or serialization issue in the node), rows will be an empty list, and accessing rows[0] will raise an unhandled IndexError: list index out of range, crashing the test.

Add a check to ensure rows is not empty before accessing the header.

if not rows: log(" FAIL — CSV file is empty", RED) return 1 header = rows[0]

gemini-code-assist · 2026-05-28T12:17:23Z

+    s.settimeout(timeout)
+    data = b""
+    while b"\r\n\r\n" not in data:
+        chunk = s.recv(4096)
+        if not chunk:
+            return None, None, None, True
+        data += chunk


The read_http_response helper function does not handle socket connection errors (such as ConnectionResetError, BrokenPipeError, or general OSError) during the initial receive loop. If the server abruptly closes or resets the connection, the test will crash with an unhandled exception instead of failing gracefully or correctly identifying that the connection was closed.

Wrap the receive loop in a try-except block to handle these connection errors and return closed = True.

Suggested change

s.settimeout(timeout)

data = b""

while b"\r\n\r\n" not in data:

chunk = s.recv(4096)

if not chunk:

return None, None, None, True

data += chunk

s.settimeout(timeout)

data = b""

try:

while b"\r\n\r\n" not in data:

chunk = s.recv(4096)

if not chunk:

return None, None, None, True

data += chunk

except (ConnectionResetError, BrokenPipeError, OSError):

return None, None, None, True

gemini-code-assist · 2026-05-28T12:17:23Z

+    m = re.search(r"void\s+RPCServer::ServerThread\s*\(\s*\)\s*\{(.+?)\n\}\n", text, re.DOTALL)
+    if not m:
+        log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)
+        log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)
+        return 1
+    body = m.group(1)


The regular expression used to extract the body of RPCServer::ServerThread is highly fragile. It relies on the exact indentation and formatting of the closing braces (expecting \n}\n with no leading whitespace). If the source file is auto-formatted, or if nested blocks with specific indentation are added, the regex will fail to match or truncate the function body prematurely.

Since the presence of std::thread and .detach() in rpc_server.cpp is already a strong indicator of the thread-per-request pattern, consider searching the entire file content instead of attempting to extract the function body with a fragile regex.

Suggested change

m = re.search(r"void\s+RPCServer::ServerThread\s*\(\s*\)\s*\{(.+?)\n\}\n", text, re.DOTALL)

if not m:

log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)

log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)

return 1

body = m.group(1)

if "void RPCServer::ServerThread" not in text:

log("ServerThread() function not found — has it been renamed/restructured?", YELLOW)

log("That alone could indicate the fix landed (e.g. a thread-pool refactor).", YELLOW)

return 1

body = text

gemini-code-assist · 2026-05-28T12:17:23Z

+  SECTION("Timeout: slow server triggers read timeout (req 5 failure mode)") {
+    svr.Get("/api/v1/trustbases/slow", [](const httplib::Request&, httplib::Response& res) {
+      // Block longer than the client's 5 s read-timeout.
+      std::this_thread::sleep_for(std::chrono::seconds(6));


Sleeping for 6 seconds in a unit test is a major anti-pattern as it significantly slows down the test suite execution. This is currently required because the read timeout of 5 seconds is hardcoded in bft_client.cpp.

Consider refactoring HttpBFTClient in a future production PR to make the read timeout configurable (e.g., via a constructor parameter or setter). This would allow unit tests to configure a very short timeout (e.g., 50ms) and execute almost instantaneously.

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: unicity-node audit coverage (#1 / #5 / #6) + hardening repros (…#15

test: unicity-node audit coverage (#1 / #5 / #6) + hardening repros (…#15
b3y0urs3lf wants to merge 1 commit into
mainfrom
unicity-node-audit-tests

b3y0urs3lf commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

gemini-code-assist Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	ports = [65501, 65502, 65503, 65504, 65505]
	ports = [pick_free_port() for _ in range(5)]

Conversation

b3y0urs3lf commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant