feat: AgentX v1.0 by cquil11 · Pull Request #1970 · SemiAnalysisAI/InferenceX

cquil11 · 2026-07-01T17:00:41Z

Summary

Reorganizes AgentX v1.0 into five squashed implementation groups. This keeps the existing PR/branch but removes the exploratory commit history and separates utility code, runtime plumbing, recipes, and final config registration.

Groups

1. AgentX result utilities

Implemented the utils.agentic package for result aggregation, backend-specific metric extraction, server-log parsing, trace metadata, dataset helpers, and result validation. The package is intentionally modular: common aggregation flow lives in shared code, while backend adapters under utils.agentic.aggregation.backends isolate vLLM, SGLang, and Dynamo-vLLM metric differences.

This modularity is required because AgentX results are not uniform across backends. Different engines expose cache hits, KV usage, request accounting, and server-side telemetry through different metric names and log formats. Keeping those mappings behind backend-specific adapters lets future contributors add another backend by implementing a focused adapter instead of modifying one large result processor. It also makes unit testing practical because request metrics, server metrics, server-log parsing, trace metadata, and validation can be tested independently.
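As a rough illustration of the adapter split described above, the sketch below normalizes two differently shaped metric payloads into one record. Every name in it (`NormalizedMetrics`, `BackendAdapter`, the `engine-a`/`engine-b` keys, and all metric field names) is hypothetical and chosen for the example; none is taken from utils.agentic or from any real engine's metric names.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class NormalizedMetrics:
    """Backend-neutral fields a shared aggregation flow might consume (hypothetical)."""
    cache_hit_rate: float
    kv_usage: float


class BackendAdapter(Protocol):
    def extract(self, raw: dict[str, float]) -> NormalizedMetrics: ...


class EngineAAdapter:
    """Hypothetical engine that reports cache hits and queries as separate counters."""

    def extract(self, raw: dict[str, float]) -> NormalizedMetrics:
        queries = raw.get("prefix_queries", 0.0)
        hits = raw.get("prefix_hits", 0.0)
        rate = hits / queries if queries else 0.0
        return NormalizedMetrics(rate, raw.get("kv_usage_fraction", 0.0))


class EngineBAdapter:
    """Hypothetical engine that reports a ready-made hit rate and KV usage in tokens."""

    def extract(self, raw: dict[str, float]) -> NormalizedMetrics:
        capacity = raw.get("kv_capacity_tokens", 0.0)
        used = raw.get("kv_used_tokens", 0.0)
        usage = used / capacity if capacity else 0.0
        return NormalizedMetrics(raw.get("cache_hit_rate", 0.0), usage)


# Adding a backend means registering one more adapter here (illustrative registry).
ADAPTERS: dict[str, BackendAdapter] = {
    "engine-a": EngineAAdapter(),
    "engine-b": EngineBAdapter(),
}


def aggregate(backend: str, raw: dict[str, float]) -> NormalizedMetrics:
    """Shared flow: look up the adapter, then work only with normalized fields."""
    try:
        adapter = ADAPTERS[backend]
    except KeyError:
        raise ValueError(f"no adapter registered for backend {backend!r}") from None
    return adapter.extract(raw)
```

Because each adapter is a small object with a single method, it can be unit tested with a hand-written dictionary and no running server.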

The utility layer also includes smaller supporting changes, including success-rate calculation updates and shared constants used by AgentX result processing. These keep downstream aggregation behavior consistent between normal benchmark rows and AgentX-derived rows.

2. Runtime and CI plumbing

Wired AgentX into the shared benchmark/runtime path: workflow templates, matrix schema/generation, runner launchers, shared benchmark helpers, and the AIPerf submodule. This provides the common execution path consumed by the recipe/config layers.

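AgentX e2e workflow runs now also trigger the InferenceX-app ingest-agentic-results repository-dispatch receiver after successful manual agentic sweeps. The dispatch passes the GitHub run ID and run attempt once agentic artifacts and run stats are available. It is gated to manual workflow_dispatch runs, so reusable workflow calls from PRs or comments are not ingested, and it is skipped on partial failures: result collection and success-rate calculation must both succeed, at least one of the single-node and multi-node sweeps must succeed, and each sweep must have either succeeded or been skipped.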
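The gate can be read as a plain boolean predicate. The sketch below mirrors the `if:` expression in the workflow snippet included in this PR; the function name `should_dispatch_ingest` and the dictionary shape are illustrative and are not code from the repository.

```python
def should_dispatch_ingest(event_name: str, results: dict[str, str]) -> bool:
    """Return True when the ingest dispatch job would run.

    `results` maps job names to their GitHub Actions result strings
    ("success", "failure", "cancelled", or "skipped").
    """
    single = results["test-sweep-agentic"]
    multi = results["test-sweep-multi-node-agentic"]
    return (
        # Only manual runs, never PR/comment reusable workflow calls.
        event_name == "workflow_dispatch"
        # Artifacts and run stats must be available.
        and results["collect-agentic-results"] == "success"
        and results["calc-success-rate"] == "success"
        # At least one sweep actually ran and succeeded.
        and "success" in (single, multi)
        # Neither sweep failed or was cancelled.
        and single in ("success", "skipped")
        and multi in ("success", "skipped")
    )
```

The last three clauses together reject the case where one sweep succeeds while the other fails, which is what keeps partial results out of the database.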

This group also extends runner metadata in runners.yaml. AgentX needs runner-level resource information that fixed-sequence benchmarks mostly did not need, especially host DRAM availability for CPU/DRAM KV-offload configurations. The matrix logic uses runner metadata plus per-config model/runtime fields to reason about whether a proposed offload point is valid for a given runner. In practice, the available host-memory budget is derived from the runner entry rather than hardcoded inside each benchmark script, so config generation can consistently filter or size host-offload points across B200/B300/GB-class runners.

The intent is that resource capacity lives in runner/config metadata, while benchmark scripts focus on launching the server. That keeps decisions like DRAM-offload eligibility, runner selection, and generated sweep shape in the matrix layer instead of scattering those calculations through shell scripts.
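A minimal sketch of how a matrix layer could derive a host-memory budget from runner metadata and filter offload points against it. The `Runner` fields, the reserved-headroom default, and both function names are assumptions made for this example; the actual runners.yaml schema and the matrix logic in this PR may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    """Hypothetical view of one runner entry; field names are illustrative."""
    name: str
    host_dram_gib: int
    reserved_gib: int = 64  # assumed headroom for the OS and server process


def host_offload_budget_gib(runner: Runner) -> int:
    """Host DRAM that may be used for KV offload on this runner."""
    return max(runner.host_dram_gib - runner.reserved_gib, 0)


def valid_offload_points(runner: Runner, requested_gib: list[int]) -> list[int]:
    """Keep only the requested offload sizes that fit within the runner's budget."""
    budget = host_offload_budget_gib(runner)
    return [size for size in requested_gib if size <= budget]
```

With this shape, a benchmark script never computes capacity itself: it receives an offload size that the matrix layer has already checked against the runner entry.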

3. Single-node AgentX recipes

Added and updated single-node benchmark scripts for DSv4, MiniMax, Kimi, and Qwen AgentX runs across NVIDIA and AMD runners. Deprecated AgentX scripts that are no longer part of the v1.0 surface were removed.

These recipes are still best-effort and experimental. They are included in the v1.0 release to document working patterns and provide templates for future contributors, not because every model/backend combination should be treated as fully production-hardened.

4. Multi-node AgentX recipes

Added GB200/GB300 disaggregated AgentX recipes and updated the SRT launcher path. This isolates multi-node recipe review from single-node runtime scripts.

These recipes are also best-effort and experimental. They are intentionally left in v1.0 as examples for future multi-node AgentX contributors, especially around disaggregated serving topology, SRT launcher integration, and backend-specific recipe structure.

5. Final sweep config registration

Updated NVIDIA/AMD master configs, runner metadata, and config docs for the final AgentX v1.0 sweep surface. This intentionally collapses config-testing churn into one reviewable final-state commit.

The config changes include the final AgentX matrix definitions, runner metadata needed for resource-aware generation, and documentation updates describing the new config surface. The goal is to keep sweep registration declarative: model/backend/runner capabilities are described in config files, then matrix generation applies the shared validation/resource logic.
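To make the declarative flow concrete, the sketch below expands a config entry into matrix entries and applies shared checks to each one. The config keys (`model`, `backend`, `runner`, `axes`), the `expand_sweep` name, and the validator interface are hypothetical; they are not the schema used by the master configs in this PR.

```python
from itertools import product
from typing import Callable, Iterable

Entry = dict[str, object]


def expand_sweep(config: dict, validators: Iterable[Callable[[Entry], bool]] = ()) -> list[Entry]:
    """Expand one declarative config into concrete sweep entries.

    `config["axes"]` maps an axis name to its candidate values; the cross
    product of all axes is generated, and an entry is kept only if every
    shared validator accepts it.
    """
    checks = list(validators)
    axes = config["axes"]
    names = list(axes)
    entries: list[Entry] = []
    for combo in product(*(axes[name] for name in names)):
        entry: Entry = {
            "model": config["model"],
            "backend": config["backend"],
            "runner": config["runner"],
        }
        entry.update(zip(names, combo))
        if all(check(entry) for check in checks):
            entries.append(entry)
    return entries
```

For example, a config with axes `{"tp": [4, 8], "concurrency": [1, 16]}` expands to four entries, and passing a validator such as `lambda e: e["concurrency"] <= 8` keeps only the two entries with concurrency 1.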

Main sync

Rebased onto latest main and resolved the process_agentic_result conflicts by keeping the new utils.agentic.aggregation package layout and deleting the old top-level processor/test files.

Validation

python -m pytest utils/matrix_logic/ utils/agentic/aggregation/test_process_agentic_result.py utils/agentic/aggregation/test_server_log_metrics.py utils/agentic/datasets/test_build_weka_hf_dataset.py utils/agentic/validation/test_validate_agentic_result.py utils/test_calc_success_rate.py -q
Result: 220 passed
In utils/aiperf: uv run pytest tests/unit/dataset/loader/test_weka_aux_classification.py tests/unit/dataset/loader/test_weka_flat_split_v1_contract_adv.py tests/unit/dataset/loader/test_weka_async_subagent.py tests/unit/dataset/loader/test_weka_overlap_groups.py -q
Result: 78 passed
In utils/aiperf: uv run pytest tests/unit/dataset/loader/test_weka_trace.py -q -k 'not test_flattened_fanout_logs_detection_summary'
Result: 38 passed, 1 deselected
uv run --extra dev ruff check <touched AIPerf Python files>
Result: All checks passed

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

+        needs:
+            [
+                test-sweep-agentic,
+                test-sweep-multi-node-agentic,
+                collect-agentic-results,
+                calc-success-rate,
+            ]
+        if: >-
+            always() &&
+            github.event_name == 'workflow_dispatch' &&
+            needs.collect-agentic-results.result == 'success' &&
+            needs.calc-success-rate.result == 'success' &&
+            (
+              needs.test-sweep-agentic.result == 'success' ||
+              needs.test-sweep-multi-node-agentic.result == 'success'
+            ) &&
+            (
+              needs.test-sweep-agentic.result == 'success' ||
+              needs.test-sweep-agentic.result == 'skipped'
+            ) &&
+            (
+              needs.test-sweep-multi-node-agentic.result == 'success' ||
+              needs.test-sweep-multi-node-agentic.result == 'skipped'
+            )
+        runs-on: ubuntu-latest
+        steps:
+            - name: Trigger agentic database ingest
+              run: |
+                  curl -sSf -X POST \
+                    -H "Authorization: Bearer ${{ secrets.INFX_FRONTEND_PAT }}" \
+                    -H "Accept: application/vnd.github+json" \
+                    https://api.github.com/repos/SemiAnalysisAI/InferenceX-app/dispatches \
+                    -d '{
+                      "event_type": "ingest-agentic-results",
+                      "client_payload": {
+                        "run-id": "${{ github.run_id }}",
+                        "run-attempt": "${{ github.run_attempt }}"
+                      }
+                    }'
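For readers who want to exercise the same dispatch outside the workflow, here is a Python sketch of the request the curl step sends. It only builds the request and does not send it. The function name `build_dispatch_request` is hypothetical; the URL, event type, and payload keys are copied from the step above, and the Accept value is the media type GitHub documents for its REST API.

```python
import json
import urllib.request


def build_dispatch_request(
    token: str,
    run_id: int,
    run_attempt: int,
    repo: str = "SemiAnalysisAI/InferenceX-app",
) -> urllib.request.Request:
    """Build (but do not send) the repository_dispatch POST for agentic ingest."""
    payload = {
        "event_type": "ingest-agentic-results",
        "client_payload": {
            # Sent as strings, matching the quoted values in the workflow step.
            "run-id": str(run_id),
            "run-attempt": str(run_attempt),
        },
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/dispatches",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that call is left out here so the sketch has no network side effects.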


github-project-automation Bot added this to InferenceMAX Board Jul 1, 2026

cquil11 mentioned this pull request Jul 1, 2026

Generalize per-subpoint benchmark checkpointing for reruns #1971 (Open)

github-code-quality Bot found potential problems Jul 1, 2026

Comment thread utils/calc_success_rate.py Dismissed

github-code-quality Bot found potential problems Jul 1, 2026

Comment thread utils/matrix_logic/generate_sweep_configs.py Dismissed

cquil11 changed the title ~~Feat/agentx v1.0~~ feat: AgentX v1.0 Jul 1, 2026

github-code-quality Bot found potential problems Jul 1, 2026

Comment thread utils/agentic/aggregation/trace_metadata.py Dismissed

Comment thread utils/agentic/aggregation/trace_metadata.py Dismissed

cquil11 force-pushed the feat/agentx-v1.0 branch from 5f0cf20 to 12a7ff1 on July 2, 2026 16:12

feat(agentic): add AgentX result utilities

dde20b7

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

cquil11 force-pushed the feat/agentx-v1.0 branch from 83510ce to c46a8db on July 2, 2026 18:16

github-advanced-security AI found potential problems Jul 2, 2026

cquil11 added 4 commits July 2, 2026 14:33

feat(agentic): wire AgentX runtime plumbing

665ac0d

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

feat(agentic): add single-node AgentX recipes

2127a8a

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

feat(agentic): add multi-node AgentX recipes

bea81b7

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

feat(agentic): register AgentX sweep configs

9d7393e

Signed-off-by: Cam Quilici <cjquilici@gmail.com>

cquil11 force-pushed the feat/agentx-v1.0 branch from da46877 to 9d7393e on July 2, 2026 19:33

cquil11 wants to merge 5 commits into main from feat/agentx-v1.0

2 participants
