Add HiveMind gateway integration by pookNast · Pull Request #16 · hexo-ai/sia

pookNast · 2026-06-02T18:11:22Z

Summary

Adds a HiveMind/local-LLM reference target agent template
Wires --hivemind CLI selection into orchestration
Adds HiveMind endpoint/model configuration via environment overrides

Test plan

ruff check /home/pook/sia/sia /home/pook/sia/tests
python -m pytest tests -q locally on the split branch

Split out from #15 per review feedback.

🤖 Generated with OpenClaude

Extract hardcoded model names, timeouts, limits, and defaults into a single config module with environment variable overrides. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace 8 inline hardcoded defaults with Config.* references. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace hardcoded truncation limits and default models with config imports. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Environment variables (SIA_META_MODEL, SIA_TASK_MODEL, etc.) now override defaults with lower priority than explicit CLI flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace broad except Exception with specific exception types (JSONDecodeError, OSError, SubprocessError) for better error diagnosis. Keep a safety-net handler at the generation loop level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace broad except Exception with specific exception types (OSError, JSONDecodeError, RuntimeError) for better error handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace shell=True piped tee with direct subprocess.run(arg_list) and file write. Adds configurable timeout via Config.EVAL_TIMEOUT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace shell=True piped tee with Popen streaming stdout to both console and log file simultaneously. Removes bash dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When --sandbox docker is specified, target agents run inside a Docker container with read-only dataset mount, read-write working directory, no network access, and resource limits. Default is sandbox=none (current behavior unchanged). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document the security model, execution modes, bypassPermissions rationale, and Docker sandbox usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Section 1 of main() extracted into a load_task_files() function returning a TaskFiles dataclass. Improves readability and testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Section 2 of main() extracted into setup_run_directory() and _create_venv() helper functions. Returns RunSetup dataclass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Meta-agent and feedback-agent prompt templates extracted into dedicated builder functions for maintainability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The 400+ line generation loop extracted into run_generation() and helper functions. main() is now a thin orchestrator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Smoke tests for --help, missing args, and invalid task name. Also adds __main__.py to support `python -m sia` invocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add _safe_read_file() and _safe_load_json() helpers that enforce Config.MAX_CONTEXT_FILE_SIZE limits. Prevents unbounded memory usage from oversized execution logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Enforce Config.MAX_EXECUTION_LOG_SIZE on trajectory files and results.json. Oversized files are skipped with a warning instead of loading into memory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace remaining magic number truncation limits with Config constants for discoverability and tuning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Test default values, env var overrides, and invalid value fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Mock subprocess tests for skipped, success, failure, and timeout scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Test target agent execution and context tracking with mocked subprocess. Verifies directory structure and context.md creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verify feedback agent invocation, directory creation across generations, and context.md tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verify Docker command construction, mount flags, and sandbox mode selection with mocked subprocess. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Test _safe_read_file and _safe_load_json with files at, above, and below size limits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix 13 ruff lint errors (unused imports, f-string, sorted imports) - Change DEFAULT_MAX_TURNS/CONTEXT_SUMMARY_MAX_TURNS from str to int - Replace sys.exit(1) with graceful return in _run_target_agent - Add truncation to single-trajectory JSON in feedback prompt - Replace print() with logger in context_manager - Mock _generate_llm_summary in test_multiple_generations_track_deltas All 61 tests passing, ruff clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add HiveMind config fields (endpoint, model, task_model) to Config - Add --hivemind CLI flag to route target agents through local LLMs - Add reference_target_agent_hivemind.py template using OpenAI-compatible API - Support SIA_HIVEMIND_ENDPOINT and SIA_HIVEMIND_MODEL env var overrides Usage: sia --task gpqa --max_gen 3 --hivemind Routes target agents to qwen3.6-27b via HiveMind (:8400), meta/feedback agents still use Claude for quality reasoning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This reverts commit b3c7bfa.

Address the reviewer-requested security contact email and clean up Ruff lint failures so the hardening PR can pass CI. Co-Authored-By: OpenClaude (gpt-5.5) <openclaude@gitlawb.com>

- Add HiveMind config fields (endpoint, model, task_model) to Config - Add --hivemind CLI flag to route target agents through local LLMs - Add reference_target_agent_hivemind.py template using OpenAI-compatible API - Support SIA_HIVEMIND_ENDPOINT and SIA_HIVEMIND_MODEL env var overrides Usage: sia --task gpqa --max_gen 3 --hivemind Routes target agents to qwen3.6-27b via HiveMind (:8400), meta/feedback agents still use Claude for quality reasoning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pookNast and others added 29 commits May 29, 2026 18:34

feat: add sia/config.py with centralized constants

ad23890

Extract hardcoded model names, timeouts, limits, and defaults into a single config module with environment variable overrides. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: orchestrator uses centralized config from sia.config

1e70369

Replace 8 inline hardcoded defaults with Config.* references. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: context_manager uses centralized config

c0dc7b8

Replace hardcoded truncation limits and default models with config imports. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: config supports SIA_* env var overrides

2d49540

Environment variables (SIA_META_MODEL, SIA_TASK_MODEL, etc.) now override defaults with lower priority than explicit CLI flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: narrow exception handlers in context_manager.py

b422918

Replace broad except Exception with specific exception types (OSError, JSONDecodeError, RuntimeError) for better error handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: remove shell=True from run_evaluation subprocess

6d2b001

Replace shell=True piped tee with direct subprocess.run(arg_list) and file write. Adds configurable timeout via Config.EVAL_TIMEOUT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: remove shell=True from target agent execution

419a0ff

Replace shell=True piped tee with Popen streaming stdout to both console and log file simultaneously. Removes bash dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: add SECURITY.md with threat model and sandbox docs

90bb2b1

Document the security model, execution modes, bypassPermissions rationale, and Docker sandbox usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: extract load_task_files from main()

0d6f214

Section 1 of main() extracted into a load_task_files() function returning a TaskFiles dataclass. Improves readability and testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: extract setup_run_directory from main()

c8bfd1c

Section 2 of main() extracted into setup_run_directory() and _create_venv() helper functions. Returns RunSetup dataclass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: extract prompt builders from main()

5afd3f9

Meta-agent and feedback-agent prompt templates extracted into dedicated builder functions for maintainability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: extract run_generation from main()

83825cb

The 400+ line generation loop extracted into run_generation() and helper functions. main() is now a thin orchestrator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: verify CLI interface unchanged after decomposition

94427ff

Smoke tests for --help, missing args, and invalid task name. Also adds __main__.py to support `python -m sia` invocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: add file size checks in context_manager.py

8e4f163

Add _safe_read_file() and _safe_load_json() helpers that enforce Config.MAX_CONTEXT_FILE_SIZE limits. Prevents unbounded memory usage from oversized execution logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: add size guards on JSON loading in orchestrator

e76abbe

Enforce Config.MAX_EXECUTION_LOG_SIZE on trajectory files and results.json. Oversized files are skipped with a warning instead of loading into memory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: centralize all truncation limits in config

bd64826

Replace remaining magic number truncation limits with Config constants for discoverability and tuning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add unit tests for config module

9ba8dd9

Test default values, env var overrides, and invalid value fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add tests for run_evaluation helper

15d88cf

Mock subprocess tests for skipped, success, failure, and timeout scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add integration test for single generation loop

fc3c9bb

Test target agent execution and context tracking with mocked subprocess. Verifies directory structure and context.md creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add multi-generation evolution test

372a708

Verify feedback agent invocation, directory creation across generations, and context.md tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add Docker sandbox execution tests

c125d3f

Verify Docker command construction, mount flags, and sandbox mode selection with mocked subprocess. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

test: add file size limit enforcement tests

6f03e34

Test _safe_read_file and _safe_load_json with files at, above, and below size limits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert "feat: wire SIA to HiveMind gateway for local LLM evolution"

0ebeb89

This reverts commit b3c7bfa.

Fix PR lint and security contact

4cd0c70

Address the reviewer-requested security contact email and clean up Ruff lint failures so the hardening PR can pass CI. Co-Authored-By: OpenClaude (gpt-5.5) <openclaude@gitlawb.com>

pookNast mentioned this pull request Jun 2, 2026

Security hardening: sandbox mode, shell injection fixes, config centralization #15

Merged

6 tasks

This was referenced Jun 6, 2026

Review all branches in upstream hexo-ai/sia for useful commits, features, or fixes micahstubbs/sia_rust#43

Closed

Rust Port: Overall Migration Plan, Phases, and Tracking (Umbrella) micahstubbs/sia_rust#34

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add HiveMind gateway integration#16

Add HiveMind gateway integration#16
pookNast wants to merge 29 commits into
hexo-ai:mainfrom
pookNast:hivemind-split

pookNast commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pookNast commented Jun 2, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant