refactor(ai): componentize runtime and configuration with schema-validated loading#1421
stringl1l1l1l wants to merge 4 commits into apache:ai
Pull request overview
This PR refactors the AI module into a config-driven, component-based runtime with schema-validated YAML loading, replacing the prior manager/plugin/server wiring and consolidating tests around the new architecture.
Changes:
- Introduced a `runtime` that bootstraps components via factories and orchestrates `Validate -> Init -> Start`.
- Added schema-driven config loading (strict YAML decode + JSON Schema validation + default injection) and new schema assets under `ai/schema/json`.
- Reorganized AI domains into `ai/component/*`, updated CLI entrypoints, and removed legacy plugins/manager/test scaffolding.
Reviewed changes
Copilot reviewed 98 out of 100 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| ai/utils/utils_test.go | Removed legacy local-path integration tests. |
| ai/utils/utils.go | Added Eino ↔ Genkit document conversion helpers. |
| ai/tools/memory.go | Removed legacy memory tool wiring. |
| ai/testutils/mocks.go | Added mock runtime for component tests. |
| ai/testutils/helpers.go | Added shared test helpers/utilities. |
| ai/test/rag_test.go | Removed legacy RAG integration tests. |
| ai/test/models.md | Added model availability testing guide. |
| ai/test/mcp_test.go | Removed legacy MCP tests. |
| ai/test/llm_test.go | Removed legacy LLM tests. |
| ai/server/models.go | Removed legacy server response/request models (moved into component server engine). |
| ai/schema/react.go | Updated schema types to use new tools engine ToolOutput type. |
| ai/schema/json/tools.schema.json | Added JSON schema for tools component config. |
| ai/schema/json/server.schema.json | Added JSON schema for server component config. |
| ai/schema/json/rag.schema.json | Added JSON schema for RAG component config. |
| ai/schema/json/models.schema.json | Added JSON schema for models component config. |
| ai/schema/json/memory.schema.json | Added JSON schema for memory component config. |
| ai/schema/json/main.schema.json | Added JSON schema for root config.yaml. |
| ai/schema/json/logger.schema.json | Added JSON schema for logger component config. |
| ai/schema/json/agent.schema.json | Added JSON schema for agent component config. |
| ai/schema/json/REQUIRED_FIELDS.md | Documented required-field policy matrix for configs. |
| ai/schema/json/README.md | Added schema index documentation. |
| ai/runtime/runtime.go | Introduced component runtime, factory registration, bootstrap flow. |
| ai/prompts/agentAct.txt | Added agent act prompt file. |
| ai/main.go | Switched entrypoint to runtime bootstrap + component factories + shutdown orchestration. |
| ai/go.mod | Updated dependencies (Genkit bump, added Eino + Eino-ext components, etc.). |
| ai/config/jsonschema.go | Added schema engine for loading/compiling JSON schemas + defaults/validation. |
| ai/config/config.go | Replaced legacy globals/constants with a generic Config{Type, Spec}. |
| ai/config.yaml | Added root config.yaml wiring component yaml paths. |
| ai/component/tools/tools.yaml | Added tools component config. |
| ai/component/tools/test/tools_test.go | Added tools component validation test. |
| ai/component/tools/factory.go | Added tools component factory for runtime registration. |
| ai/component/tools/engine/tools.go | Refactored tools engine package and logging integration. |
| ai/component/tools/engine/memory.go | Added internal memory/RAG tool definitions in tools engine. |
| ai/component/tools/engine/mcp.go | Moved MCP tool manager into tools engine package. |
| ai/component/tools/config.go | Added tools configuration structures and defaults. |
| ai/component/tools/component.go | Added runtime-aware tools component implementation. |
| ai/component/server/test/server_test.go | Added server component validation tests. |
| ai/component/server/server.yaml | Added server component config. |
| ai/component/server/factory.go | Added server component factory for runtime registration. |
| ai/component/server/engine/session/session.go | Updated session engine to use runtime logger and new package paths. |
| ai/component/server/engine/router.go | Moved router into server engine package and updated imports. |
| ai/component/server/engine/models.go | Added server engine response/request models. |
| ai/component/server/engine/handlers.go | Updated handlers to new agent/memory/runtime wiring. |
| ai/component/server/engine/docs/openapi.yaml | Added/updated OpenAPI spec under component server engine. |
| ai/component/server/config.go | Added server spec struct + defaults. |
| ai/component/server/component.go | Added server component implementation (start/stop http server). |
| ai/component/rag/test/workflow_test.go | Added RAG workflow unit tests with in-memory stubs. |
| ai/component/rag/test/rag_config_test.go | Added RAG config semantic validation test. |
| ai/component/rag/retriever.go | Added retriever implementations (dev/localvec and pinecone). |
| ai/component/rag/rerank.go | Added Cohere reranker integration. |
| ai/component/rag/rag.yaml | Added RAG component config. |
| ai/component/rag/rag.go | Added RAG runtime-facing API (Split/Index/Retrieve + rerank). |
| ai/component/rag/parser.go | Added markdown/pdf parsing + preprocessing wrappers. |
| ai/component/rag/options.go | Added retrieval/indexer option types and helpers. |
| ai/component/rag/loader.go | Added local file loader + directory loading helper. |
| ai/component/rag/indexer.go | Added indexer implementations (dev/localvec and pinecone). |
| ai/component/rag/factory.go | Added RAG factory + builder from spec. |
| ai/component/rag/config.go | Added RAGSpec and sub-spec structs + semantic validation. |
| ai/component/models/test/models_test.go | Added models component validation tests. |
| ai/component/models/models.yaml | Added models component config (providers/models/embedders). |
| ai/component/models/factory.go | Added models component factory for runtime registration. |
| ai/component/models/config.go | Added models spec and provider/model/embedder structs. |
| ai/component/models/component.go | Added models component to init Genkit registry + register models/embedders. |
| ai/component/memory/test/history_test.go | Added memory/history unit tests. |
| ai/component/memory/memory.yaml | Added memory component config. |
| ai/component/memory/history.go | Renamed/refactored History to HistoryMemory and updated methods. |
| ai/component/memory/factory.go | Added memory component factory for runtime registration. |
| ai/component/memory/config.go | Added memory spec struct + defaults. |
| ai/component/memory/component.go | Added runtime-aware memory component. |
| ai/component/logger/logger.yaml | Added logger component config. |
| ai/component/logger/factory.go | Added logger component factory for runtime registration. |
| ai/component/logger/config.go | Added logger spec struct + defaults. |
| ai/component/logger/component.go | Added logger component that configures slog default logger. |
| ai/component/agent/react/test/workflow_test.go | Added ReAct flow workflow tests with stubs. |
| ai/component/agent/react/test/flow_test.go | Added agent config validation tests. |
| ai/component/agent/react/factory.go | Added agent component factory for runtime registration. |
| ai/component/agent/react/config.go | Added agent spec/stage config + validation. |
| ai/component/agent/react/component.go | Added runtime-aware agent component wiring tools into ReAct agent. |
| ai/component/agent/agent.yaml | Added agent component config. |
| ai/component/agent/agent.go | Updated agent interfaces to new memory type and orchestrator iteration config. |
| ai/cmd/rag.go | Removed legacy rag CLI. |
| ai/cmd/index.go | Added new indexing CLI aligned with new RAG config model. |
| ai/agent/react/react_test.go | Removed legacy react integration tests. |
| ai/.env.example | Expanded env example, including SCHEMA_DIR and provider keys. |
```go
func parseFlags() *IndexCommand {
	cmd := &IndexCommand{}

	flag.StringVar(&cmd.Directory, "dir", "/Users/liwener/programming/ospp/dubbo-admin/ai/reference/k8s_docs/concepts", "Directory to index (required)")
```
The -dir flag default is a developer-specific absolute path ("/Users/liwener/..."), which will break for other environments. Prefer an empty default (and require the flag) or a repo-relative path (e.g., "./reference/..."), and document it in the usage string.
Suggested change:

```diff
-	flag.StringVar(&cmd.Directory, "dir", "/Users/liwener/programming/ospp/dubbo-admin/ai/reference/k8s_docs/concepts", "Directory to index (required)")
+	flag.StringVar(&cmd.Directory, "dir", "", "Directory to index (required, no default)")
```
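The following sketches the suggestion under the assumption that the command can be parsed from an explicit argument slice. `parseIndexFlags` is a hypothetical variant of `parseFlags` that uses its own `flag.FlagSet`, which makes it unit-testable. When `-dir` is omitted, it fails fast instead of falling back to a machine-specific path.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// IndexCommand mirrors the CLI options; only Directory is shown here.
type IndexCommand struct {
	Directory string
}

// parseIndexFlags parses args with a dedicated FlagSet and requires -dir.
func parseIndexFlags(args []string) (*IndexCommand, error) {
	cmd := &IndexCommand{}
	fs := flag.NewFlagSet("index", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage/parse noise out of stderr in this sketch
	fs.StringVar(&cmd.Directory, "dir", "", "Directory to index (required, e.g. ./reference/k8s_docs/concepts)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cmd.Directory == "" {
		return nil, errors.New("-dir is required")
	}
	return cmd, nil
}
```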
```go
	loader := appconfig.NewLoader("config.yaml")
	componentCfg, err := loader.LoadComponent(configPath)
	if err != nil {
```
loadRAGConfig initializes the Loader with appconfig.NewLoader("config.yaml"), so schema and config resolution depend on the current working directory rather than on the provided configPath. This makes the CLI fragile when run outside the ai/ directory. Consider constructing the Loader with a config file path rooted at filepath.Dir(configPath) (or using absolute paths) so that relative component paths and the default schema path resolve reliably.
Suggested change:

```diff
-	loader := appconfig.NewLoader("config.yaml")
-	componentCfg, err := loader.LoadComponent(configPath)
-	if err != nil {
+	// Resolve the provided config path to an absolute path so that loading
+	// does not depend on the current working directory.
+	absConfigPath, err := filepath.Abs(configPath)
+	if err != nil {
+		return nil, fmt.Errorf("failed to resolve absolute path for config %s: %w", configPath, err)
+	}
+	// Construct the loader's config file path relative to the config's directory.
+	baseDir := filepath.Dir(absConfigPath)
+	loaderConfigPath := filepath.Join(baseDir, "config.yaml")
+	loader := appconfig.NewLoader(loaderConfigPath)
+	componentCfg, err := loader.LoadComponent(absConfigPath)
+	if err != nil {
```
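The same fix can be built from a smaller helper. This sketch assumes `config.yaml` lists component paths relative to itself: each path is resolved against the directory of the file that references it, not against the working directory. `resolveRelativeTo` is a hypothetical helper.

```go
package main

import "path/filepath"

// resolveRelativeTo resolves p against the directory containing baseFile.
// Absolute paths are only cleaned, so explicit locations still win.
func resolveRelativeTo(baseFile, p string) (string, error) {
	if filepath.IsAbs(p) {
		return filepath.Clean(p), nil
	}
	absBase, err := filepath.Abs(baseFile)
	if err != nil {
		return "", err
	}
	return filepath.Join(filepath.Dir(absBase), p), nil
}
```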
```markdown
- `main.schema.json`: root [`config.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/config.yaml)
- `logger.schema.json`: [`component/logger/logger.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/logger/logger.yaml)
- `memory.schema.json`: [`component/memory/memory.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/memory/memory.yaml)
- `models.schema.json`: [`component/models/models.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/models/models.yaml)
- `tools.schema.json`: [`component/tools/tools.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/tools/tools.yaml)
- `server.schema.json`: [`component/server/server.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/server/server.yaml)
- `rag.schema.json`: [`component/rag/rag.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/rag/rag.yaml)
- `agent.schema.json`: [`component/agent/agent.yaml`](/Users/liwener/.codex/worktrees/acbb/dubbo-admin/ai/component/agent/agent.yaml)
```
This README links to schema/config files using absolute local filesystem paths under /Users/..., which won’t work for other contributors or in GitHub UI. Use repository-relative links (e.g., ../../config.yaml or /ai/config.yaml) instead.
```bash
cd /Users/liwener/programming/ospp/dubbo-admin/ai

# Set the API key
export DASHSCOPE_API_KEY="your_qwen_api_key"

# Run the tests
go test -v ./test/ -run TestTextGeneration
```
The guide uses absolute paths (e.g., cd /Users/.../ai) which are not portable. Prefer repo-relative commands (e.g., cd ai) so the documentation works for all contributors.
```go
	// Start all loaded components
	gloRuntime.Components.Range(func(key, value any) bool {
		comp := value.(Component)
		if err := comp.Start(); err != nil {
			return false
		}
		return true
	})
```
Bootstrap starts components via sync.Map.Range, which has non-deterministic iteration order and discards the Start() error (the callback returns false but the error is not surfaced). This can leave the runtime partially started while main logs success. Consider starting components in the same deterministic order as initialization (e.g., iterate the instances slice) and return the first start error from Bootstrap (and/or track started components for rollback).
```go
	// Initialize components in dependency order, which is the order of factory registration.
	for _, comp := range instances {
		if err := comp.Validate(); err != nil {
			return nil, fmt.Errorf("failed to validate %s: %w", comp.Name(), err)
		}

		if err := comp.Init(gloRuntime); err != nil {
			return nil, fmt.Errorf("failed to init %s: %w", comp.Name(), err)
		}
		gloRuntime.Components.Store(comp.Name(), comp)
	}
```
Components are stored in the runtime by comp.Name(). Since many components return a fixed name (e.g., "agent"), this will overwrite entries when components.<name> in config.yaml is an array (Loader generates names like agent-0, agent-1). To support multiple instances, store components under the config key (name from LoadedConfig.Components) and/or allow components to be instantiated with a unique name.
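This sketch keys components by their config key rather than by `Name()`. It assumes the loader already produces unique keys such as `agent-0` and `agent-1`. `Registry` and `agent` are illustrative names. `sync.Map.LoadOrStore` turns an accidental duplicate into an error instead of a silent overwrite.

```go
package main

import (
	"fmt"
	"sync"
)

// Named is a hypothetical minimal component interface.
type Named interface{ Name() string }

// Registry stores components under their config key, not comp.Name().
type Registry struct {
	components sync.Map // config key (string) -> Named
}

// Store registers c under configKey and rejects duplicates.
func (r *Registry) Store(configKey string, c Named) error {
	if _, loaded := r.components.LoadOrStore(configKey, c); loaded {
		return fmt.Errorf("component key %q already registered", configKey)
	}
	return nil
}

// Get returns the component registered under configKey, if any.
func (r *Registry) Get(configKey string) (Named, bool) {
	v, ok := r.components.Load(configKey)
	if !ok {
		return nil, false
	}
	return v.(Named), true
}

// agent returns a fixed name, like the components described in the comment.
type agent struct{ id int }

func (agent) Name() string { return "agent" }
```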
```go
// applyDefaults recursively applies default values from schema to value
// Modifies value in-place
func applyDefaults(root map[string]any, schema map[string]any, value any) {
	resolved := resolveSchemaRef(root, schema)

	switch v := value.(type) {
	case map[string]any:
		props, _ := resolved["properties"].(map[string]any)
		for key, propVal := range props {
			propSchema, ok := propVal.(map[string]any)
			if !ok {
				continue
			}
			propSchema = resolveSchemaRef(root, propSchema)

			// Apply default value if property is missing
			if _, exists := v[key]; !exists {
				if defVal, hasDefault := propSchema["default"]; hasDefault {
					v[key] = defVal
				}
			}

			// Recursively apply defaults to nested properties
			if child, exists := v[key]; exists {
				applyDefaults(root, propSchema, child)
			}
		}

	case []any:
		if items, ok := resolved["items"].(map[string]any); ok {
			items = resolveSchemaRef(root, items)
			for i := range v {
				applyDefaults(root, items, v[i])
			}
		}
	}
```
applyDefaults only walks properties/items (and $ref) but does not handle schemas that use oneOf/anyOf/allOf (used by rag.schema.json for splitter.spec). This means schema defaults inside those branches won't be injected, and later semantic validation can fail even though the schema defines defaults. Consider extending default application to descend into composition keywords (e.g., choose a branch based on type discriminator when available, or apply defaults across all branches where safe).
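The following sketches default injection extended to composition keywords. It assumes `oneOf`/`anyOf` branches are discriminated by a `type` property with a `const` value; how `rag.schema.json` actually models `splitter.spec` may differ. All `allOf` branches contribute defaults. For `oneOf`/`anyOf`, only the matching branch does, so defaults from other variants do not leak in. `resolveRef`, `branches`, and `pickBranch` are hypothetical helpers.

```go
package main

import "strings"

// resolveRef follows a same-document JSON-pointer "$ref" such as
// "#/$defs/splitter"; "~0"/"~1" escapes are ignored in this sketch.
func resolveRef(root, schema map[string]any) map[string]any {
	ref, ok := schema["$ref"].(string)
	if !ok || !strings.HasPrefix(ref, "#/") {
		return schema
	}
	var cur any = root
	for _, part := range strings.Split(strings.TrimPrefix(ref, "#/"), "/") {
		m, ok := cur.(map[string]any)
		if !ok {
			return schema
		}
		cur = m[part]
	}
	if m, ok := cur.(map[string]any); ok {
		return m
	}
	return schema
}

// applyDefaults fills missing properties from "default", then descends into
// allOf (every branch) and oneOf/anyOf (only the branch pickBranch selects).
func applyDefaults(root, schema map[string]any, value any) {
	s := resolveRef(root, schema)
	switch v := value.(type) {
	case map[string]any:
		props, _ := s["properties"].(map[string]any)
		for key, raw := range props {
			ps, ok := raw.(map[string]any)
			if !ok {
				continue
			}
			ps = resolveRef(root, ps)
			if _, exists := v[key]; !exists {
				if def, has := ps["default"]; has {
					v[key] = def
				}
			}
			if child, exists := v[key]; exists {
				applyDefaults(root, ps, child)
			}
		}
	case []any:
		if items, ok := s["items"].(map[string]any); ok {
			for i := range v {
				applyDefaults(root, items, v[i])
			}
		}
	}
	for _, branch := range branches(s["allOf"]) {
		applyDefaults(root, branch, value)
	}
	for _, kw := range []string{"oneOf", "anyOf"} {
		if b := pickBranch(root, branches(s[kw]), value); b != nil {
			applyDefaults(root, b, value)
		}
	}
}

// branches converts a raw composition array into schema maps.
func branches(raw any) []map[string]any {
	list, _ := raw.([]any)
	var out []map[string]any
	for _, b := range list {
		if m, ok := b.(map[string]any); ok {
			out = append(out, m)
		}
	}
	return out
}

// pickBranch returns the first branch whose properties.type.const equals the
// value's "type" field; "type" as the discriminator is an assumption.
func pickBranch(root map[string]any, bs []map[string]any, value any) map[string]any {
	obj, ok := value.(map[string]any)
	if !ok {
		return nil
	}
	for _, b := range bs {
		b = resolveRef(root, b)
		props, _ := b["properties"].(map[string]any)
		typ, _ := props["type"].(map[string]any)
		if c, ok := typ["const"]; ok && c == obj["type"] {
			return b
		}
	}
	return nil
}
```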
```go
		if err := pinecone.Index(ctx, batch, docstore, namespace); err != nil {
			return nil, fmt.Errorf("failed to index documents batch %d-%d: %w", i+1, end, err)
		}
	}

	return nil, nil
}
```
PineconeIndexer.Store returns (nil, nil) after successful indexing. The indexer contract expects IDs for the stored documents (DevIndexer returns them), and returning nil can break callers that rely on IDs or interpret nil as “nothing stored”. Consider returning the input document IDs (or IDs returned by the backend if available).
```go
	if r.retriever == nil {
		r.retriever = make(map[string]ai.Retriever)
	}
	ret := r.retriever[targetIndex]
	if ret != nil {
```
PineconeRetriever caches retrievers in a plain map without synchronization. getRetriever both reads and writes r.retriever, so concurrent Retrieve calls can race and crash the process with a fatal "concurrent map writes" error. DevRetriever uses a mutex; PineconeRetriever should guard map access the same way (or use sync.Map).
```yaml
port: 8880 # Server port
host: "localhost" # Server host
debug: false # Debug mode
```
server.yaml sets host: "localhost", which binds only to loopback and can break container/K8s deployments where the service must listen on all interfaces. Consider using 0.0.0.0 (matching the schema default) unless there’s a specific reason to restrict binding.
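If the host stays configurable, the bind address can be built with `net.JoinHostPort`, which also brackets IPv6 literals. This sketch treats an empty host as `0.0.0.0`. That fallback is an assumption, not the current behavior of the server component, and `listenAddr` is a hypothetical helper.

```go
package main

import (
	"net"
	"strconv"
)

// listenAddr builds the bind address from host/port config. "0.0.0.0"
// listens on all IPv4 interfaces; "localhost" accepts loopback only.
func listenAddr(host string, port int) string {
	if host == "" {
		host = "0.0.0.0" // assumed fallback for container/K8s deployments
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}
```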



This PR restructures the AI module around a component-based runtime, adds schema-driven configuration loading/validation, and aligns test coverage with the new architecture.
What Changed
- … `ai/component/*` and unified startup through a factory-based runtime.
- … (`Validate -> Init -> Start`).
- … `ai/schema/json` and unified component config entrypoints via `ai/config.yaml`.
- … (`SCHEMA_DIR` and provider key usage).

Breaking Changes

- … (`-config`, default `config.yaml`) rather than legacy server flags.
- … `ai/config.yaml`.

Validation

```bash
go test -race -v ./config/... ./runtime/... ./component/... ./test/... ./cmd/...
```