Add HiveMind gateway integration#16
Open
pookNast wants to merge 29 commits into
Open
Conversation
Extract hardcoded model names, timeouts, limits, and defaults into a single config module with environment variable overrides. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 8 inline hardcoded defaults with Config.* references. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hardcoded truncation limits and default models with config imports. No behavioral change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Environment variables (SIA_META_MODEL, SIA_TASK_MODEL, etc.) now override defaults with lower priority than explicit CLI flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace broad except Exception with specific exception types (JSONDecodeError, OSError, SubprocessError) for better error diagnosis. Keep a safety-net handler at the generation loop level. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace broad except Exception with specific exception types (OSError, JSONDecodeError, RuntimeError) for better error handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace shell=True piped tee with direct subprocess.run(arg_list) and file write. Adds configurable timeout via Config.EVAL_TIMEOUT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace shell=True piped tee with Popen streaming stdout to both console and log file simultaneously. Removes bash dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When --sandbox docker is specified, target agents run inside a Docker container with read-only dataset mount, read-write working directory, no network access, and resource limits. Default is sandbox=none (current behavior unchanged). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the security model, execution modes, bypassPermissions rationale, and Docker sandbox usage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Section 1 of main() extracted into a load_task_files() function returning a TaskFiles dataclass. Improves readability and testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Section 2 of main() extracted into setup_run_directory() and _create_venv() helper functions. Returns RunSetup dataclass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Meta-agent and feedback-agent prompt templates extracted into dedicated builder functions for maintainability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 400+ line generation loop extracted into run_generation() and helper functions. main() is now a thin orchestrator. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Smoke tests for --help, missing args, and invalid task name. Also adds __main__.py to support `python -m sia` invocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _safe_read_file() and _safe_load_json() helpers that enforce Config.MAX_CONTEXT_FILE_SIZE limits. Prevents unbounded memory usage from oversized execution logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enforce Config.MAX_EXECUTION_LOG_SIZE on trajectory files and results.json. Oversized files are skipped with a warning instead of loading into memory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace remaining magic number truncation limits with Config constants for discoverability and tuning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test default values, env var overrides, and invalid value fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mock subprocess tests for skipped, success, failure, and timeout scenarios. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test target agent execution and context tracking with mocked subprocess. Verifies directory structure and context.md creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify feedback agent invocation, directory creation across generations, and context.md tracking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify Docker command construction, mount flags, and sandbox mode selection with mocked subprocess. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test _safe_read_file and _safe_load_json with files at, above, and below size limits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix 13 ruff lint errors (unused imports, f-string, sorted imports) - Change DEFAULT_MAX_TURNS/CONTEXT_SUMMARY_MAX_TURNS from str to int - Replace sys.exit(1) with graceful return in _run_target_agent - Add truncation to single-trajectory JSON in feedback prompt - Replace print() with logger in context_manager - Mock _generate_llm_summary in test_multiple_generations_track_deltas All 61 tests passing, ruff clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add HiveMind config fields (endpoint, model, task_model) to Config - Add --hivemind CLI flag to route target agents through local LLMs - Add reference_target_agent_hivemind.py template using OpenAI-compatible API - Support SIA_HIVEMIND_ENDPOINT and SIA_HIVEMIND_MODEL env var overrides Usage: sia --task gpqa --max_gen 3 --hivemind Routes target agents to qwen3.6-27b via HiveMind (:8400), meta/feedback agents still use Claude for quality reasoning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit b3c7bfa.
Address the reviewer-requested security contact email and clean up Ruff lint failures so the hardening PR can pass CI. Co-Authored-By: OpenClaude (gpt-5.5) <openclaude@gitlawb.com>
- Add HiveMind config fields (endpoint, model, task_model) to Config - Add --hivemind CLI flag to route target agents through local LLMs - Add reference_target_agent_hivemind.py template using OpenAI-compatible API - Support SIA_HIVEMIND_ENDPOINT and SIA_HIVEMIND_MODEL env var overrides Usage: sia --task gpqa --max_gen 3 --hivemind Routes target agents to qwen3.6-27b via HiveMind (:8400), meta/feedback agents still use Claude for quality reasoning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--hivemindCLI selection into orchestrationTest plan
ruff check /home/pook/sia/sia /home/pook/sia/testspython -m pytest tests -qlocally on the split branchSplit out from #15 per review feedback.
🤖 Generated with OpenClaude