
refactor(ai): componentize runtime and configuration with schema-validated loading#1421

Open
stringl1l1l1l wants to merge 4 commits into apache:ai from stringl1l1l1l:ai

Conversation

@stringl1l1l1l

This PR restructures the AI module around a component-based runtime, adds schema-driven configuration loading/validation, and aligns test coverage with the new architecture.

What Changed

  1. Architecture refactor
  • Migrated AI domains into ai/component/* and unified startup through a factory-based runtime.
  • Standardized component lifecycle orchestration (Validate -> Init -> Start).
  2. Configuration system upgrade
  • Added a schema-aware loader with strict YAML decoding and JSON Schema validation.
  • Added schema assets under ai/schema/json and unified component config entrypoints via ai/config.yaml.
  3. CLI and workflow alignment
  • Added/updated indexing CLI flow to use the new config/component model.
  • Updated environment/config conventions (including SCHEMA_DIR and provider key usage).
  4. Test consolidation
  • Reorganized and expanded tests for config loader, runtime orchestration, component validation, and key workflows.
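
For readers unfamiliar with the factory-based startup described in item 1, a minimal sketch of the idea follows. It is not the PR's code: `Registry`, `Factory`, `Component`, and `newLogger` are hypothetical names, and the field types of `Config` are assumed from the `Config{Type, Spec}` shape mentioned in the file summary.

```go
package main

import "fmt"

// Config mirrors the generic Config{Type, Spec} shape mentioned in the PR;
// the field types here are assumptions.
type Config struct {
	Type string
	Spec map[string]any
}

// Component and Factory are hypothetical stand-ins for the runtime's types.
type Component interface{ Name() string }
type Factory func(spec map[string]any) (Component, error)

// Registry maps a component type to the factory that builds it.
type Registry struct {
	order     []string // registration order, kept for deterministic startup
	factories map[string]Factory
}

func NewRegistry() *Registry { return &Registry{factories: map[string]Factory{}} }

// Register records a factory and rejects duplicate type names.
func (r *Registry) Register(typ string, f Factory) error {
	if _, dup := r.factories[typ]; dup {
		return fmt.Errorf("factory %q already registered", typ)
	}
	r.factories[typ] = f
	r.order = append(r.order, typ)
	return nil
}

// Build creates a component from its config via the matching factory.
func (r *Registry) Build(cfg Config) (Component, error) {
	f, ok := r.factories[cfg.Type]
	if !ok {
		return nil, fmt.Errorf("no factory registered for type %q", cfg.Type)
	}
	return f(cfg.Spec)
}

type loggerComponent struct{ level string }

func (l loggerComponent) Name() string { return "logger" }

// newLogger is an illustrative factory that falls back to "info".
func newLogger(spec map[string]any) (Component, error) {
	level, _ := spec["level"].(string)
	if level == "" {
		level = "info"
	}
	return loggerComponent{level: level}, nil
}

func main() {
	reg := NewRegistry()
	_ = reg.Register("logger", newLogger)

	comp, err := reg.Build(Config{Type: "logger", Spec: map[string]any{"level": "debug"}})
	fmt.Println(comp.Name(), err)

	_, err = reg.Build(Config{Type: "unknown"})
	fmt.Println(err)
}
```

Keeping the registration order alongside the map is one way to give the later Validate -> Init -> Start pass a stable order to follow.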

Breaking Changes

  • Startup is now config-driven (-config, default config.yaml) rather than legacy server flags.
  • Config files are strictly validated; unknown or structurally invalid fields now fail fast.
  • Component YAML locations and wiring are now expected via ai/config.yaml.
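
To illustrate what "fail fast on unknown fields" means in practice, here is a small sketch. It deliberately swaps in the standard library's JSON decoder (`DisallowUnknownFields`) so the example has no external dependencies; the PR itself uses strict YAML decoding, and `gopkg.in/yaml.v3` offers a comparable `KnownFields(true)` switch on its decoder. `ServerSpec` and `decodeStrict` are hypothetical names; the fields follow the `server.yaml` excerpt quoted later in the review.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ServerSpec is a hypothetical spec; the field names follow the server.yaml
// excerpt quoted in the review (port, host, debug).
type ServerSpec struct {
	Port  int    `json:"port"`
	Host  string `json:"host"`
	Debug bool   `json:"debug"`
}

// decodeStrict rejects any field that ServerSpec does not declare.
func decodeStrict(data []byte) (ServerSpec, error) {
	var spec ServerSpec
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&spec); err != nil {
		return ServerSpec{}, fmt.Errorf("invalid server spec: %w", err)
	}
	return spec, nil
}

func main() {
	// "hots" is a typo for "host"; a lenient decoder would silently ignore it.
	_, err := decodeStrict([]byte(`{"port": 8880, "hots": "localhost"}`))
	fmt.Println("typo config:", err)

	spec, err := decodeStrict([]byte(`{"port": 8880, "host": "localhost"}`))
	fmt.Println("valid config:", spec, err)
}
```

The practical consequence for existing deployments is that a misspelled or stale key stops startup instead of being dropped without notice.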

Validation

go test -race -v ./config/... ./runtime/... ./component/... ./test/... ./cmd/...

@sonarqubecloud

sonarqubecloud bot commented Mar 5, 2026

Copilot AI left a comment

Pull request overview

This PR refactors the AI module into a config-driven, component-based runtime with schema-validated YAML loading, replacing the prior manager/plugin/server wiring and consolidating tests around the new architecture.

Changes:

  • Introduced a runtime that bootstraps components via factories and orchestrates Validate -> Init -> Start.
  • Added schema-driven config loading (strict YAML decode + JSON Schema validation + default injection) and new schema assets under ai/schema/json.
  • Reorganized AI domains into ai/component/*, updated CLI entrypoints, and removed legacy plugins/manager/test scaffolding.

Reviewed changes

Copilot reviewed 98 out of 100 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
ai/utils/utils_test.go Removed legacy local-path integration tests.
ai/utils/utils.go Added Eino ↔ Genkit document conversion helpers.
ai/tools/memory.go Removed legacy memory tool wiring.
ai/testutils/mocks.go Added mock runtime for component tests.
ai/testutils/helpers.go Added shared test helpers/utilities.
ai/test/rag_test.go Removed legacy RAG integration tests.
ai/test/models.md Added model availability testing guide.
ai/test/mcp_test.go Removed legacy MCP tests.
ai/test/llm_test.go Removed legacy LLM tests.
ai/server/models.go Removed legacy server response/request models (moved into component server engine).
ai/schema/react.go Updated schema types to use new tools engine ToolOutput type.
ai/schema/json/tools.schema.json Added JSON schema for tools component config.
ai/schema/json/server.schema.json Added JSON schema for server component config.
ai/schema/json/rag.schema.json Added JSON schema for RAG component config.
ai/schema/json/models.schema.json Added JSON schema for models component config.
ai/schema/json/memory.schema.json Added JSON schema for memory component config.
ai/schema/json/main.schema.json Added JSON schema for root config.yaml.
ai/schema/json/logger.schema.json Added JSON schema for logger component config.
ai/schema/json/agent.schema.json Added JSON schema for agent component config.
ai/schema/json/REQUIRED_FIELDS.md Documented required-field policy matrix for configs.
ai/schema/json/README.md Added schema index documentation.
ai/runtime/runtime.go Introduced component runtime, factory registration, bootstrap flow.
ai/prompts/agentAct.txt Added agent act prompt file.
ai/main.go Switched entrypoint to runtime bootstrap + component factories + shutdown orchestration.
ai/go.mod Updated dependencies (Genkit bump, added Eino + Eino-ext components, etc.).
ai/config/jsonschema.go Added schema engine for loading/compiling JSON schemas + defaults/validation.
ai/config/config.go Replaced legacy globals/constants with a generic Config{Type, Spec}.
ai/config.yaml Added root config.yaml wiring component yaml paths.
ai/component/tools/tools.yaml Added tools component config.
ai/component/tools/test/tools_test.go Added tools component validation test.
ai/component/tools/factory.go Added tools component factory for runtime registration.
ai/component/tools/engine/tools.go Refactored tools engine package and logging integration.
ai/component/tools/engine/memory.go Added internal memory/RAG tool definitions in tools engine.
ai/component/tools/engine/mcp.go Moved MCP tool manager into tools engine package.
ai/component/tools/config.go Added tools configuration structures and defaults.
ai/component/tools/component.go Added runtime-aware tools component implementation.
ai/component/server/test/server_test.go Added server component validation tests.
ai/component/server/server.yaml Added server component config.
ai/component/server/factory.go Added server component factory for runtime registration.
ai/component/server/engine/session/session.go Updated session engine to use runtime logger and new package paths.
ai/component/server/engine/router.go Moved router into server engine package and updated imports.
ai/component/server/engine/models.go Added server engine response/request models.
ai/component/server/engine/handlers.go Updated handlers to new agent/memory/runtime wiring.
ai/component/server/engine/docs/openapi.yaml Added/updated OpenAPI spec under component server engine.
ai/component/server/config.go Added server spec struct + defaults.
ai/component/server/component.go Added server component implementation (start/stop http server).
ai/component/rag/test/workflow_test.go Added RAG workflow unit tests with in-memory stubs.
ai/component/rag/test/rag_config_test.go Added RAG config semantic validation test.
ai/component/rag/retriever.go Added retriever implementations (dev/localvec and pinecone).
ai/component/rag/rerank.go Added Cohere reranker integration.
ai/component/rag/rag.yaml Added RAG component config.
ai/component/rag/rag.go Added RAG runtime-facing API (Split/Index/Retrieve + rerank).
ai/component/rag/parser.go Added markdown/pdf parsing + preprocessing wrappers.
ai/component/rag/options.go Added retrieval/indexer option types and helpers.
ai/component/rag/loader.go Added local file loader + directory loading helper.
ai/component/rag/indexer.go Added indexer implementations (dev/localvec and pinecone).
ai/component/rag/factory.go Added RAG factory + builder from spec.
ai/component/rag/config.go Added RAGSpec and sub-spec structs + semantic validation.
ai/component/models/test/models_test.go Added models component validation tests.
ai/component/models/models.yaml Added models component config (providers/models/embedders).
ai/component/models/factory.go Added models component factory for runtime registration.
ai/component/models/config.go Added models spec and provider/model/embedder structs.
ai/component/models/component.go Added models component to init Genkit registry + register models/embedders.
ai/component/memory/test/history_test.go Added memory/history unit tests.
ai/component/memory/memory.yaml Added memory component config.
ai/component/memory/history.go Renamed/refactored History to HistoryMemory and updated methods.
ai/component/memory/factory.go Added memory component factory for runtime registration.
ai/component/memory/config.go Added memory spec struct + defaults.
ai/component/memory/component.go Added runtime-aware memory component.
ai/component/logger/logger.yaml Added logger component config.
ai/component/logger/factory.go Added logger component factory for runtime registration.
ai/component/logger/config.go Added logger spec struct + defaults.
ai/component/logger/component.go Added logger component that configures slog default logger.
ai/component/agent/react/test/workflow_test.go Added ReAct flow workflow tests with stubs.
ai/component/agent/react/test/flow_test.go Added agent config validation tests.
ai/component/agent/react/factory.go Added agent component factory for runtime registration.
ai/component/agent/react/config.go Added agent spec/stage config + validation.
ai/component/agent/react/component.go Added runtime-aware agent component wiring tools into ReAct agent.
ai/component/agent/agent.yaml Added agent component config.
ai/component/agent/agent.go Updated agent interfaces to new memory type and orchestrator iteration config.
ai/cmd/rag.go Removed legacy rag CLI.
ai/cmd/index.go Added new indexing CLI aligned with new RAG config model.
ai/agent/react/react_test.go Removed legacy react integration tests.
ai/.env.example Expanded env example, including SCHEMA_DIR and provider keys.

func parseFlags() *IndexCommand {
	cmd := &IndexCommand{}

	flag.StringVar(&cmd.Directory, "dir", "/Users/liwener/programming/ospp/dubbo-admin/ai/reference/k8s_docs/concepts", "Directory to index (required)")

Copilot AI Mar 8, 2026

The -dir flag default is a developer-specific absolute path ("/Users/liwener/..."), which will break for other environments. Prefer an empty default (and require the flag) or a repo-relative path (e.g., "./reference/..."), and document it in the usage string.

Suggested change
- flag.StringVar(&cmd.Directory, "dir", "/Users/liwener/programming/ospp/dubbo-admin/ai/reference/k8s_docs/concepts", "Directory to index (required)")
+ flag.StringVar(&cmd.Directory, "dir", "", "Directory to index (required, no default)")
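
An empty default only helps if the command also rejects a missing value. A minimal sketch of that check, using only the standard `flag` package, might look like the following. `parseArgs` is a hypothetical variant of `parseFlags` that takes the argument slice explicitly so it can be tested, `IndexCommand` is reduced to the one field discussed here, and `./docs` is a placeholder path.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// IndexCommand holds only the field relevant here; the struct in
// ai/cmd/index.go presumably has more fields.
type IndexCommand struct {
	Directory string
}

// parseArgs parses args and fails when -dir is missing or empty.
func parseArgs(args []string) (*IndexCommand, error) {
	cmd := &IndexCommand{}
	fs := flag.NewFlagSet("index", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	fs.StringVar(&cmd.Directory, "dir", "", "Directory to index (required, no default)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cmd.Directory == "" {
		return nil, errors.New("-dir is required")
	}
	return cmd, nil
}

func main() {
	if _, err := parseArgs(nil); err != nil {
		fmt.Println("without -dir:", err)
	}
	cmd, err := parseArgs([]string{"-dir", "./docs"})
	fmt.Println("with -dir:", cmd.Directory, err)
}
```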

Copilot uses AI. Check for mistakes.
Comment on lines +162 to +164
loader := appconfig.NewLoader("config.yaml")
componentCfg, err := loader.LoadComponent(configPath)
if err != nil {

Copilot AI Mar 8, 2026

loadRAGConfig initializes the Loader with config.NewLoader("config.yaml"), so schema/config resolution becomes dependent on the current working directory rather than the provided configPath. This makes the CLI fragile when run outside the ai/ directory. Consider constructing the Loader with a config file path rooted at filepath.Dir(configPath) (or using absolute paths) so relative component paths and default schema path resolve reliably.

Suggested change
- loader := appconfig.NewLoader("config.yaml")
- componentCfg, err := loader.LoadComponent(configPath)
- if err != nil {
+ // Resolve the provided config path to an absolute path so that loading
+ // does not depend on the current working directory.
+ absConfigPath, err := filepath.Abs(configPath)
+ if err != nil {
+ 	return nil, fmt.Errorf("failed to resolve absolute path for config %s: %w", configPath, err)
+ }
+ // Construct the loader's config file path relative to the config's directory.
+ baseDir := filepath.Dir(absConfigPath)
+ loaderConfigPath := filepath.Join(baseDir, "config.yaml")
+ loader := appconfig.NewLoader(loaderConfigPath)
+ componentCfg, err := loader.LoadComponent(absConfigPath)
+ if err != nil {

Comment on lines +5 to +12
- `main.schema.json`: root [`config.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/config.yaml)
- `logger.schema.json`: [`component/logger/logger.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/logger/logger.yaml)
- `memory.schema.json`: [`component/memory/memory.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/memory/memory.yaml)
- `models.schema.json`: [`component/models/models.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/models/models.yaml)
- `tools.schema.json`: [`component/tools/tools.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/tools/tools.yaml)
- `server.schema.json`: [`component/server/server.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/server/server.yaml)
- `rag.schema.json`: [`component/rag/rag.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/rag/rag.yaml)
- `agent.schema.json`: [`component/agent/agent.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/agent/agent.yaml)

Copilot AI Mar 8, 2026

This README links to schema/config files using absolute local filesystem paths under /Users/..., which won’t work for other contributors or in GitHub UI. Use repository-relative links (e.g., ../../config.yaml or /ai/config.yaml) instead.

Comment on lines +11 to +19
```bash
cd /Users/liwener/programming/ospp/dubbo-admin/ai

# Set the API key
export DASHSCOPE_API_KEY="your_qwen_api_key"

# Run the tests
go test -v ./test/ -run TestTextGeneration
```

Copilot AI Mar 8, 2026

The guide uses absolute paths (e.g., cd /Users/.../ai) which are not portable. Prefer repo-relative commands (e.g., cd ai) so the documentation works for all contributors.

Comment on lines +71 to +78
// Start all loaded components
gloRuntime.Components.Range(func(key, value any) bool {
	comp := value.(Component)
	if err := comp.Start(); err != nil {
		return false
	}
	return true
})

Copilot AI Mar 8, 2026

Bootstrap starts components via sync.Map.Range, which has non-deterministic iteration order and discards the Start() error (the callback returns false but the error is not surfaced). This can leave the runtime partially started while main logs success. Consider starting components in the same deterministic order as initialization (e.g., iterate the instances slice) and return the first start error from Bootstrap (and/or track started components for rollback).
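
One possible shape for the ordered start with rollback suggested above is sketched here. The `Component` interface, `startAll`, and the `fake` type are hypothetical; in particular, whether the PR's components expose a `Stop` method with this signature is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Component is a hypothetical subset of the runtime's interface.
type Component interface {
	Name() string
	Start() error
	Stop() error
}

// startAll starts components in slice order. On the first failure it stops
// the already-started components in reverse order and returns the error.
func startAll(comps []Component) error {
	started := make([]Component, 0, len(comps))
	for _, c := range comps {
		if err := c.Start(); err != nil {
			for i := len(started) - 1; i >= 0; i-- {
				_ = started[i].Stop() // best-effort rollback
			}
			return fmt.Errorf("failed to start %s: %w", c.Name(), err)
		}
		started = append(started, c)
	}
	return nil
}

// fake records lifecycle events so the order can be inspected.
type fake struct {
	name   string
	fail   bool
	events *[]string
}

func (f fake) Name() string { return f.name }

func (f fake) Start() error {
	if f.fail {
		return errors.New("boom")
	}
	*f.events = append(*f.events, "start "+f.name)
	return nil
}

func (f fake) Stop() error {
	*f.events = append(*f.events, "stop "+f.name)
	return nil
}

func main() {
	var events []string
	err := startAll([]Component{
		fake{name: "logger", events: &events},
		fake{name: "models", events: &events},
		fake{name: "server", fail: true, events: &events},
	})
	fmt.Println(err)
	fmt.Println(events)
}
```

Iterating a slice instead of `sync.Map.Range` gives the same order on every run, and returning the wrapped error lets the caller decide whether to exit rather than logging success.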

Comment on lines +59 to +69
// Initialize components in dependency order, which is the order of factory registration.
for _, comp := range instances {
	if err := comp.Validate(); err != nil {
		return nil, fmt.Errorf("failed to validate %s: %w", comp.Name(), err)
	}

	if err := comp.Init(gloRuntime); err != nil {
		return nil, fmt.Errorf("failed to init %s: %w", comp.Name(), err)
	}
	gloRuntime.Components.Store(comp.Name(), comp)
}

Copilot AI Mar 8, 2026

Components are stored in the runtime by comp.Name(). Since many components return a fixed name (e.g., "agent"), this will overwrite entries when components.<name> in config.yaml is an array (Loader generates names like agent-0, agent-1). To support multiple instances, store components under the config key (name from LoadedConfig.Components) and/or allow components to be instantiated with a unique name.
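
A small sketch of storing components under the config-derived key, and refusing silent overwrites, is shown below. `Runtime.Register` is a hypothetical helper, not part of the PR; it relies on `sync.Map.LoadOrStore`, which reports whether a value was already present.

```go
package main

import (
	"fmt"
	"sync"
)

// Runtime is a hypothetical stand-in that keeps components in a sync.Map,
// as the quoted Bootstrap code does.
type Runtime struct {
	Components sync.Map
}

// Register stores comp under the config-derived key (for example "agent-0")
// and refuses to overwrite an existing entry.
func (r *Runtime) Register(key string, comp any) error {
	if _, loaded := r.Components.LoadOrStore(key, comp); loaded {
		return fmt.Errorf("component %q is already registered", key)
	}
	return nil
}

func main() {
	rt := &Runtime{}
	fmt.Println(rt.Register("agent-0", "first agent"))
	fmt.Println(rt.Register("agent-1", "second agent"))
	fmt.Println(rt.Register("agent-0", "duplicate"))
}
```

Turning a name collision into an error also surfaces the current overwrite behavior at bootstrap instead of at the first lookup that returns the wrong instance.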

Comment on lines +116 to +151
// applyDefaults recursively applies default values from schema to value
// Modifies value in-place
func applyDefaults(root map[string]any, schema map[string]any, value any) {
	resolved := resolveSchemaRef(root, schema)

	switch v := value.(type) {
	case map[string]any:
		props, _ := resolved["properties"].(map[string]any)
		for key, propVal := range props {
			propSchema, ok := propVal.(map[string]any)
			if !ok {
				continue
			}
			propSchema = resolveSchemaRef(root, propSchema)

			// Apply default value if property is missing
			if _, exists := v[key]; !exists {
				if defVal, hasDefault := propSchema["default"]; hasDefault {
					v[key] = defVal
				}
			}

			// Recursively apply defaults to nested properties
			if child, exists := v[key]; exists {
				applyDefaults(root, propSchema, child)
			}
		}

	case []any:
		if items, ok := resolved["items"].(map[string]any); ok {
			items = resolveSchemaRef(root, items)
			for i := range v {
				applyDefaults(root, items, v[i])
			}
		}
	}

Copilot AI Mar 8, 2026

applyDefaults only walks properties/items (and $ref) but does not handle schemas that use oneOf/anyOf/allOf (used by rag.schema.json for splitter.spec). This means schema defaults inside those branches won't be injected, and later semantic validation can fail even though the schema defines defaults. Consider extending default application to descend into composition keywords (e.g., choose a branch based on type discriminator when available, or apply defaults across all branches where safe).
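
As a rough illustration of the "apply defaults across all branches where safe" option, the sketch below extends a simplified `applyDefaults` to descend into `allOf`, where every branch applies to the instance. It is a simplification under stated assumptions: `$ref` resolution is omitted (so the signature differs from the PR's), `oneOf`/`anyOf` are not handled because picking a branch needs a discriminator, and the property names `chunk_size` and `type` are made up for the demo.

```go
package main

import "fmt"

// applyDefaults fills missing object properties from schema "default"
// values and also descends into every allOf branch. $ref resolution and
// oneOf/anyOf branch selection are intentionally left out of this sketch.
func applyDefaults(schema map[string]any, value any) {
	// Every allOf branch must hold for the instance, so applying the
	// defaults of each branch is safe.
	if branches, ok := schema["allOf"].([]any); ok {
		for _, b := range branches {
			if sub, ok := b.(map[string]any); ok {
				applyDefaults(sub, value)
			}
		}
	}

	switch v := value.(type) {
	case map[string]any:
		props, _ := schema["properties"].(map[string]any)
		for key, raw := range props {
			propSchema, ok := raw.(map[string]any)
			if !ok {
				continue
			}
			// Apply the default only when the property is missing.
			if _, exists := v[key]; !exists {
				if def, has := propSchema["default"]; has {
					v[key] = def
				}
			}
			// Recurse into nested objects and arrays.
			if child, exists := v[key]; exists {
				applyDefaults(propSchema, child)
			}
		}
	case []any:
		if items, ok := schema["items"].(map[string]any); ok {
			for i := range v {
				applyDefaults(items, v[i])
			}
		}
	}
}

func main() {
	schema := map[string]any{
		"allOf": []any{
			map[string]any{
				"properties": map[string]any{
					"chunk_size": map[string]any{"default": 1000},
				},
			},
		},
		"properties": map[string]any{
			"type": map[string]any{"default": "recursive"},
		},
	}
	spec := map[string]any{}
	applyDefaults(schema, spec)
	fmt.Println(spec)
}
```

For `oneOf` and `anyOf`, blindly applying every branch's defaults could inject fields that make the instance match more than one branch, which is why a discriminator-based choice is the safer route there.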

Comment on lines +102 to +108
		if err := pinecone.Index(ctx, batch, docstore, namespace); err != nil {
			return nil, fmt.Errorf("failed to index documents batch %d-%d: %w", i+1, end, err)
		}
	}

	return nil, nil
}

Copilot AI Mar 8, 2026

PineconeIndexer.Store returns (nil, nil) after successful indexing. The indexer contract expects IDs for the stored documents (DevIndexer returns them), and returning nil can break callers that rely on IDs or interpret nil as “nothing stored”. Consider returning the input document IDs (or IDs returned by the backend if available).

Comment on lines +42 to +46
if r.retriever == nil {
	r.retriever = make(map[string]ai.Retriever)
}
ret := r.retriever[targetIndex]
if ret != nil {

Copilot AI Mar 8, 2026

PineconeRetriever caches retrievers in a plain map without synchronization. getRetriever both reads and writes r.retriever, so concurrent Retrieve calls can race and panic. DevRetriever uses a mutex—PineconeRetriever should similarly guard map access (or use sync.Map).

Comment on lines +3 to +5
port: 8880 # Server port
host: "localhost" # Server host
debug: false # Debug mode

Copilot AI Mar 8, 2026

server.yaml sets host: "localhost", which binds only to loopback and can break container/K8s deployments where the service must listen on all interfaces. Consider using 0.0.0.0 (matching the schema default) unless there’s a specific reason to restrict binding.
