76 changes: 59 additions & 17 deletions src/aks-agent/README.rst
Azure CLI AKS Agent Extension
=============================
Introduction
============

The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that
helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language
Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer
natural-language questions about your cluster (for example, "Why are my pods not starting?")
and can investigate issues in both interactive and non-interactive (batch) modes.


New in this version: **az aks agent-init** command for easy LLM model configuration!

You can now use ``az aks agent-init`` to interactively add and configure LLM models before asking questions. This command guides you through the setup process and lets you add as many models as you need. When asking questions with ``az aks agent``, you can:

- Use ``--config-file`` to specify your own model configuration file
- Use ``--model`` to select a previously configured model
- If neither is provided, the last configured LLM is used by default

This makes it much easier to manage and switch between multiple models for your AKS troubleshooting workflows.
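The precedence just described can be sketched as a small resolver (a hypothetical helper for illustration only, not part of the extension's code):

```python
from typing import List, Optional

def resolve_model_source(config_file: Optional[str],
                         model: Optional[str],
                         configured_models: List[str]) -> str:
    """Pick the model source using the documented precedence:
    --config-file, then --model, then the last model saved by agent-init."""
    if config_file:
        return f"config-file:{config_file}"
    if model:
        return f"model:{model}"
    if configured_models:
        # agent-init appends each newly configured model,
        # so the last entry is the default
        return f"model:{configured_models[-1]}"
    raise ValueError("No model configured; run `az aks agent-init` first.")
```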

Key capabilities
----------------


- Interactive and non-interactive modes (use --no-interactive for batch runs).
- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via interactive configuration.
- **Easy model setup with ``az aks agent-init``**: interactively add and configure LLM models; run the command multiple times to add more models.
- Configurable via a JSON/YAML config file provided with --config-file, or select a model with --model.
- If no config or model is specified, the last configured LLM is used automatically.
- Control echo and tool output visibility with --no-echo-request and --show-tool-output.
- Refresh the available toolsets with --refresh-toolsets.
- Stay in traditional toolset mode by default, or opt in to aks-mcp integration with ``--aks-mcp`` when you need the enhanced capabilities.

Prerequisites
-------------

You do not need to set environment variables manually: all model and credential information can be configured interactively with ``az aks agent-init``.
For more details about supported model providers and required
variables, see: https://docs.litellm.ai/docs/providers


Quick start and examples
========================

Install the extension
---------------------

.. code-block:: bash

    az extension add --name aks-agent

Configure LLM models interactively
----------------------------------

.. code-block:: bash

    az aks agent-init

This command will guide you through adding a new LLM model. You can run it multiple times to add more models or update existing models. All configured models are saved locally and can be selected when asking questions.

Run the agent (Azure OpenAI example)
------------------------------------

**1. Use the last configured model (no extra parameters needed):**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

**2. Specify a particular model you have configured:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment

**3. Use a custom config file:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml


Run the agent (OpenAI example)
------------------------------

**1. Use the last configured model (no extra parameters needed):**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

**2. Specify a particular model you have configured:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o

**3. Use a custom config file:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml

Run in non-interactive batch mode
---------------------------------

31 changes: 23 additions & 8 deletions src/aks-agent/azext_aks_agent/_help.py
short-summary: Run AI assistant to analyze and troubleshoot Kubernetes clusters.
long-summary: |-
    This command allows you to ask questions about your Azure Kubernetes cluster and get answers using AI models.
    You do not need to set environment variables manually: all model and credential information can be configured interactively using `az aks agent-init` or via a config file.
parameters:
  - name: --name -n
    type: string
        Note: For Azure OpenAI, it is recommended to set the deployment name as the model name until https://github.com/BerriAI/litellm/issues/13950 is resolved.
  - name: --api-key
    type: string
    short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY). (Deprecated)
  - name: --config-file
    type: string
    short-summary: Path to configuration file.
    short-summary: Enable AKS MCP integration for enhanced capabilities. Traditional mode is the default.

examples:
  - name: Ask about pod issues in the cluster with the last configured model
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup
  - name: Ask about pod issues in the cluster with Azure OpenAI
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/gpt-4.1
  - name: Ask about pod issues in the cluster with OpenAI
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
  - name: Run agent with config file
    text: |
        az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --name MyManagedCluster --resource-group MyResourceGroup
        Here is an example of config file:
        ```yaml
        llms:
          - provider: "azure"
            MODEL_NAME: "gpt-4.1"
            AZURE_API_BASE: "https://<your-base-url>"
            AZURE_API_KEY: "<your-api-key>"
        # optionally define a list of mcp servers
        mcp_servers:
          aks_mcp:
        ```
  - name: Refresh toolsets to get the latest available tools
    text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
"""

helps[
    "aks agent-init"
] = """
type: command
short-summary: Initialize and validate LLM provider/model configuration for AKS agent.
long-summary: |-
    This command interactively guides you through selecting an LLM provider and model, validates the connection, and saves the configuration for later use.
    You can run this command multiple times to add or update model configurations.
examples:
  - name: Initialize configuration for Azure OpenAI, OpenAI, or other LLM providers
    text: az aks agent-init
"""
2 changes: 2 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/agent.py

# Generate enhanced MCP config
mcp_config_dict = ConfigurationGenerator.generate_mcp_config(base_config_dict, server_url)
mcp_config_dict.pop("llms", None) # Remove existing llms to avoid conflicts

# Create temporary config file with MCP settings
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:

# Generate traditional config
traditional_config_dict = ConfigurationGenerator.generate_traditional_config(base_config_dict)
traditional_config_dict.pop("llms", None) # Remove existing llms to avoid conflicts

# Create temporary config and load
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:
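The two `pop("llms", None)` calls above strip any previously saved `llms` list out of the generated config before it is written to the temporary file, so models configured via `az aks agent-init` cannot conflict with the generated settings. A minimal standalone illustration of the pattern (sample dict, not the real config):

```python
base_config = {
    "llms": [{"provider": "azure", "MODEL_NAME": "gpt-4.1"}],
    "toolsets": ["kubernetes"],
}

# Work on a shallow copy so the on-disk config dict is left untouched,
# then remove "llms"; pop with a default never raises KeyError.
generated = dict(base_config)
generated.pop("llms", None)

assert "llms" not in generated
assert "llms" in base_config
```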
91 changes: 91 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/llm_config_manager.py
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------


import os
from typing import List, Dict, Optional
import yaml

from azure.cli.core.api import get_config_dir
from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME


class LLMConfigManager:
    """Manages loading and saving LLM configuration from/to a YAML file."""

    def __init__(self, config_path=None):
        if config_path is None:
            config_path = os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME)
        self.config_path = os.path.expanduser(config_path)

    def save(self, provider_name: str, params: dict):
        configs = self.load()
        if not isinstance(configs, dict):
            configs = {}

        models = configs.get("llms", [])
        model_name = params.get("MODEL_NAME")
        if not model_name:
            raise ValueError("MODEL_NAME is required to save configuration.")

        # If the model already exists, drop it and re-append so the updated
        # entry moves to the end; otherwise simply append the new entry.
        models = [
            cfg for cfg in models
            if not (cfg.get("provider") == provider_name and cfg.get("MODEL_NAME") == model_name)
        ]
        models.append({"provider": provider_name, **params})

        configs["llms"] = models

        with open(self.config_path, "w") as f:
            yaml.safe_dump(configs, f, sort_keys=False)

    def load(self) -> Dict:
        """Load configurations from the YAML file."""
        if not os.path.exists(self.config_path):
            return {}
        with open(self.config_path, "r") as f:
            configs = yaml.safe_load(f)
        return configs if isinstance(configs, dict) else {}

    def get_list(self) -> List[Dict]:
        """Get the list of all model configurations."""
        configs = self.load()
        return configs.get("llms", [])

    def get_latest(self) -> Optional[Dict]:
        """Get the most recently saved model configuration."""
        model_configs = self.get_list()
        if model_configs:
            return model_configs[-1]
        raise ValueError("No configurations found. Please run `az aks agent-init`")

    def get_specific(self, provider_name: str, model_name: str) -> Optional[Dict]:
        """
        Get a specific model configuration by provider and model name,
        used during Q&A with --model provider/model.
        """
        for cfg in self.get_list():
            if cfg.get("provider") == provider_name and cfg.get("MODEL_NAME") == model_name:
                return cfg
        raise ValueError(
            f"No configuration found for provider '{provider_name}' with model '{model_name}'. "
            f"Please run `az aks agent-init`")

    def is_config_complete(self, config, provider_schema):
        """
        Check if the given config has all required keys and valid values as per the provider schema.
        """
        for key, meta in provider_schema.items():
            if meta.get("validator") and not meta["validator"](config.get(key)):
                return False
        return True
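The save/get_latest behavior above (deduplicate by provider and `MODEL_NAME`, append so the newest entry wins) can be exercised with a standalone sketch that mirrors the list-update logic using plain lists, with no YAML or file I/O:

```python
from typing import Dict, List, Optional

def upsert_model(models: List[Dict], provider: str, params: Dict) -> List[Dict]:
    """Mirror the save() logic: drop any existing (provider, MODEL_NAME)
    entry, then append, so the most recently saved model is last."""
    name = params.get("MODEL_NAME")
    if not name:
        raise ValueError("MODEL_NAME is required to save configuration.")
    models = [m for m in models
              if not (m.get("provider") == provider and m.get("MODEL_NAME") == name)]
    return models + [{"provider": provider, **params}]

def latest(models: List[Dict]) -> Optional[Dict]:
    """Mirror get_latest(): the last entry is the default."""
    return models[-1] if models else None

models: List[Dict] = []
models = upsert_model(models, "azure", {"MODEL_NAME": "gpt-4.1"})
models = upsert_model(models, "openai", {"MODEL_NAME": "gpt-4o"})
# Re-saving an existing model moves it to the end of the list.
models = upsert_model(models, "azure", {"MODEL_NAME": "gpt-4.1"})
```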
77 changes: 77 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/llm_providers/__init__.py
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

from typing import List, Tuple
from .base import LLMProvider
from .azure_provider import AzureProvider
from .openai_provider import OpenAIProvider
from .anthropic_provider import AnthropicProvider
from .gemini_provider import GeminiProvider
from .openai_compatiable_provider import OpenAICompatiableProvider

_PROVIDER_CLASSES: List[LLMProvider] = [
    AzureProvider,
    OpenAIProvider,
    AnthropicProvider,
    GeminiProvider,
    OpenAICompatiableProvider,
    # Add new providers here
]

PROVIDER_REGISTRY = {}
for cls in _PROVIDER_CLASSES:
    key = cls.name.lower()
    if key not in PROVIDER_REGISTRY:
        PROVIDER_REGISTRY[key] = cls


def _available_providers() -> List[str]:
    """Return a list of registered provider names (lowercase): ["azure", "openai", ...]"""
    return list(PROVIDER_REGISTRY.keys())


def _provider_choices_numbered() -> List[Tuple[int, str]]:
    """Return numbered choices: [(1, "azure"), (2, "openai"), ...]."""
    return [(i + 1, name) for i, name in enumerate(_available_providers())]


def _get_provider_by_index(idx: int) -> LLMProvider:
    """
    Return a provider instance by numeric index (1-based).
    Raises ValueError if the index is out of range.
    """
    if 1 <= idx <= len(_PROVIDER_CLASSES):
        print("You selected provider:", _PROVIDER_CLASSES[idx - 1].name)
        return _PROVIDER_CLASSES[idx - 1]()
    raise ValueError(f"Invalid provider index: {idx}")


def prompt_provider_choice() -> LLMProvider:
    """
    Show a numbered menu and return the chosen provider instance.
    Keeps prompting until a valid selection is made.
    """
    choices = _provider_choices_numbered()
    if not choices:
        raise ValueError("No providers are registered.")
    while True:
        for idx, name in choices:
            print(f" {idx}. {name}")
        sel_idx = input("Enter the number of your choice: ").strip().lower()

        if sel_idx == "/exit":
            raise SystemExit(0)
        try:
            return _get_provider_by_index(int(sel_idx))
        except ValueError as e:
            print(f"Invalid input: {e}. Please enter a valid number, or type /exit to quit.")


__all__ = [
    "PROVIDER_REGISTRY",
    "prompt_provider_choice",
]