76 changes: 59 additions & 17 deletions src/aks-agent/README.rst
Azure CLI AKS Agent Extension
=============================
Introduction
============

The AKS Agent extension provides the "az aks agent" command, an AI-powered assistant that
helps analyze and troubleshoot Azure Kubernetes Service (AKS) clusters using Large Language
Models (LLMs). The agent combines cluster context, configurable toolsets, and LLMs to answer
natural-language questions about your cluster (for example, "Why are my pods not starting?")
and can investigate issues in both interactive and non-interactive (batch) modes.


New in this version: **az aks agent-init** command for easy LLM model configuration!

You can now use ``az aks agent-init`` to interactively add and configure LLM models before asking questions. This command guides you through the setup process and lets you add as many models as you need. When asking questions with ``az aks agent``, you can:

- Use ``--config-file`` to specify your own model configuration file
- Use ``--model`` to select a previously configured model
- If neither is provided, the last configured LLM is used by default

This makes it much easier to manage and switch between multiple models for your AKS troubleshooting workflows.
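The precedence just described can be sketched as a small resolver (a hypothetical helper for illustration only, not part of the extension's code):

```python
from typing import List, Optional

def resolve_model_source(config_file: Optional[str],
                         model: Optional[str],
                         configured_models: List[str]) -> str:
    """Pick the model source using the documented precedence:
    --config-file, then --model, then the last model saved by agent-init."""
    if config_file:
        return f"config-file:{config_file}"
    if model:
        return f"model:{model}"
    if configured_models:
        # agent-init appends each newly configured model,
        # so the last entry is the default
        return f"model:{configured_models[-1]}"
    raise ValueError("No model configured; run `az aks agent-init` first.")
```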

Key capabilities
----------------


- Interactive and non-interactive modes (use --no-interactive for batch runs).
- Support for multiple LLM providers (Azure OpenAI, OpenAI, etc.) via interactive configuration.
- **Easy model setup with ``az aks agent-init``**: interactively add and configure LLM models; run the command multiple times to add more models.
- Configurable via a JSON/YAML config file provided with --config-file, or select a model with --model.
- If no config or model is specified, the last configured LLM is used automatically.
- Control echo and tool output visibility with --no-echo-request and --show-tool-output.
- Refresh the available toolsets with --refresh-toolsets.
- Stay in traditional toolset mode by default, or opt in to aks-mcp integration with ``--aks-mcp`` when you need the enhanced capabilities.

Prerequisites
-------------

You do not need to set environment variables manually: all model and credential information can be configured interactively with ``az aks agent-init``.
For more details about supported model providers and required
variables, see: https://docs.litellm.ai/docs/providers


Quick start and examples
========================

Install the extension
---------------------

.. code-block:: bash

    az extension add --name aks-agent

Configure LLM models interactively
----------------------------------

.. code-block:: bash

    az aks agent-init

This command will guide you through adding a new LLM model. You can run it multiple times to add more models or update existing models. All configured models are saved locally and can be selected when asking questions.

Run the agent (Azure OpenAI example)
------------------------------------

**1. Use the last configured model (no extra parameters needed):**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

**2. Specify a particular model you have configured:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment

**3. Use a custom config file:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml


Run the agent (OpenAI example)
------------------------------

**1. Use the last configured model (no extra parameters needed):**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup

**2. Specify a particular model you have configured:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o

**3. Use a custom config file:**

.. code-block:: bash

    az aks agent "Why are my pods not starting?" --config-file /path/to/your/model_config.yaml

Run in non-interactive batch mode
---------------------------------

31 changes: 23 additions & 8 deletions src/aks-agent/azext_aks_agent/_help.py
short-summary: Run AI assistant to analyze and troubleshoot Kubernetes clusters.
long-summary: |-
    This command allows you to ask questions about your Azure Kubernetes cluster and get answers using AI models.
    You do not need to set environment variables manually: all model and credential information can be configured interactively using `az aks agent-init` or via a config file.
parameters:
  - name: --name -n
    type: string
        Note: For Azure OpenAI, it is recommended to set the deployment name as the model name until https://github.com/BerriAI/litellm/issues/13950 is resolved.
  - name: --api-key
    type: string
    short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY). (Deprecated)
  - name: --config-file
    type: string
    short-summary: Path to configuration file.
    short-summary: Enable AKS MCP integration for enhanced capabilities. Traditional mode is the default.

examples:
  - name: Ask about pod issues in the cluster with the last configured model
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup
  - name: Ask about pod issues in the cluster with Azure OpenAI
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/gpt-4.1
  - name: Ask about pod issues in the cluster with OpenAI
    text: |-
        az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
  - name: Run agent with config file
    text: |
        az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --name MyManagedCluster --resource-group MyResourceGroup
        Here is an example of config file:
        ```yaml
        llms:
          - provider: "azure"
            MODEL_NAME: "gpt-4.1"
            AZURE_API_BASE: "https://<your-base-url>"
            AZURE_API_KEY: "<your-api-key>"
        # optionally define a list of mcp servers
        mcp_servers:
          aks_mcp:
        ```
  - name: Refresh toolsets to get the latest available tools
    text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
"""

helps[
    "aks agent-init"
] = """
type: command
short-summary: Initialize and validate LLM provider/model configuration for AKS agent.
long-summary: |-
    This command interactively guides you through selecting an LLM provider and model, validates the connection, and saves the configuration for later use.
    You can run this command multiple times to add or update model configurations.
examples:
  - name: Initialize configuration for Azure OpenAI, OpenAI, or other LLM providers
    text: az aks agent-init
"""
2 changes: 2 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/agent.py

# Generate enhanced MCP config
mcp_config_dict = ConfigurationGenerator.generate_mcp_config(base_config_dict, server_url)
mcp_config_dict.pop("llms", None) # Remove existing llms to avoid conflicts

# Create temporary config file with MCP settings
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:

# Generate traditional config
traditional_config_dict = ConfigurationGenerator.generate_traditional_config(base_config_dict)
traditional_config_dict.pop("llms", None) # Remove existing llms to avoid conflicts

# Create temporary config and load
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as temp_file:
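The two `pop("llms", None)` calls above strip any previously saved `llms` list out of the generated config before it is written to the temporary file, so models configured via `az aks agent-init` cannot conflict with the generated settings. A minimal standalone illustration of the pattern (sample dict, not the real config):

```python
base_config = {
    "llms": [{"provider": "azure", "MODEL_NAME": "gpt-4.1"}],
    "toolsets": ["kubernetes"],
}

# Work on a shallow copy so the on-disk config dict is left untouched,
# then remove "llms"; pop with a default never raises KeyError.
generated = dict(base_config)
generated.pop("llms", None)

assert "llms" not in generated
assert "llms" in base_config
```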
91 changes: 91 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/llm_config_manager.py
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------


import os
from typing import List, Dict, Optional
import yaml

from azure.cli.core.api import get_config_dir
from azext_aks_agent._consts import CONST_AGENT_CONFIG_FILE_NAME


class LLMConfigManager:
    """Manages loading and saving LLM configuration from/to a YAML file."""

    def __init__(self, config_path=None):
        if config_path is None:
            config_path = os.path.join(get_config_dir(), CONST_AGENT_CONFIG_FILE_NAME)
        self.config_path = os.path.expanduser(config_path)

    def save(self, provider_name: str, params: dict):
        configs = self.load()
        if not isinstance(configs, dict):
            configs = {}

        models = configs.get("llms", [])
        model_name = params.get("MODEL_NAME")
        if not model_name:
            raise ValueError("MODEL_NAME is required to save configuration.")

        # If the model already exists, drop it and re-append so the updated
        # entry moves to the end; otherwise simply append the new entry.
        models = [
            cfg for cfg in models
            if not (cfg.get("provider") == provider_name and cfg.get("MODEL_NAME") == model_name)
        ]
        models.append({"provider": provider_name, **params})

        configs["llms"] = models

        with open(self.config_path, "w") as f:
            yaml.safe_dump(configs, f, sort_keys=False)

    def load(self) -> Dict:
        """Load configurations from the YAML file."""
        if not os.path.exists(self.config_path):
            return {}
        with open(self.config_path, "r") as f:
            configs = yaml.safe_load(f)
        return configs if isinstance(configs, dict) else {}

    def get_list(self) -> List[Dict]:
        """Get the list of all model configurations."""
        configs = self.load()
        return configs.get("llms", [])

    def get_latest(self) -> Optional[Dict]:
        """Get the most recently saved model configuration."""
        model_configs = self.get_list()
        if model_configs:
            return model_configs[-1]
        raise ValueError("No configurations found. Please run `az aks agent-init`")

    def get_specific(self, provider_name: str, model_name: str) -> Optional[Dict]:
        """
        Get a specific model configuration by provider and model name,
        used during Q&A with --model provider/model.
        """
        for cfg in self.get_list():
            if cfg.get("provider") == provider_name and cfg.get("MODEL_NAME") == model_name:
                return cfg
        raise ValueError(
            f"No configuration found for provider '{provider_name}' with model '{model_name}'. "
            f"Please run `az aks agent-init`")

    def is_config_complete(self, config, provider_schema):
        """
        Check if the given config has all required keys and valid values as per the provider schema.
        """
        for key, meta in provider_schema.items():
            if meta.get("validator") and not meta["validator"](config.get(key)):
                return False
        return True
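The save/get_latest behavior above (deduplicate by provider and `MODEL_NAME`, append so the newest entry wins) can be exercised with a standalone sketch that mirrors the list-update logic using plain lists, with no YAML or file I/O:

```python
from typing import Dict, List, Optional

def upsert_model(models: List[Dict], provider: str, params: Dict) -> List[Dict]:
    """Mirror the save() logic: drop any existing (provider, MODEL_NAME)
    entry, then append, so the most recently saved model is last."""
    name = params.get("MODEL_NAME")
    if not name:
        raise ValueError("MODEL_NAME is required to save configuration.")
    models = [m for m in models
              if not (m.get("provider") == provider and m.get("MODEL_NAME") == name)]
    return models + [{"provider": provider, **params}]

def latest(models: List[Dict]) -> Optional[Dict]:
    """Mirror get_latest(): the last entry is the default."""
    return models[-1] if models else None

models: List[Dict] = []
models = upsert_model(models, "azure", {"MODEL_NAME": "gpt-4.1"})
models = upsert_model(models, "openai", {"MODEL_NAME": "gpt-4o"})
# Re-saving an existing model moves it to the end of the list.
models = upsert_model(models, "azure", {"MODEL_NAME": "gpt-4.1"})
```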
77 changes: 77 additions & 0 deletions src/aks-agent/azext_aks_agent/agent/llm_providers/__init__.py
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See License.txt in the project root for license information.
# --------------------------------------------------------------------------------------------

from typing import List, Tuple
from .base import LLMProvider
from .azure_provider import AzureProvider
from .openai_provider import OpenAIProvider
from .anthropic_provider import AnthropicProvider
from .gemini_provider import GeminiProvider
from .openai_compatiable_provider import OpenAICompatiableProvider

_PROVIDER_CLASSES: List[LLMProvider] = [
    AzureProvider,
    OpenAIProvider,
    AnthropicProvider,
    GeminiProvider,
    OpenAICompatiableProvider,
    # Add new providers here
]

PROVIDER_REGISTRY = {}
for cls in _PROVIDER_CLASSES:
    key = cls.name.lower()
    if key not in PROVIDER_REGISTRY:
        PROVIDER_REGISTRY[key] = cls


def _available_providers() -> List[str]:
    """Return a list of registered provider names (lowercase): ["azure", "openai", ...]"""
    return list(PROVIDER_REGISTRY.keys())


def _provider_choices_numbered() -> List[Tuple[int, str]]:
    """Return numbered choices: [(1, "azure"), (2, "openai"), ...]."""
    return [(i + 1, name) for i, name in enumerate(_available_providers())]


def _get_provider_by_index(idx: int) -> LLMProvider:
    """
    Return a provider instance by numeric index (1-based).
    Raises ValueError if the index is out of range.
    """
    if 1 <= idx <= len(_PROVIDER_CLASSES):
        print("You selected provider:", _PROVIDER_CLASSES[idx - 1].name)
        return _PROVIDER_CLASSES[idx - 1]()
    raise ValueError(f"Invalid provider index: {idx}")


def prompt_provider_choice() -> LLMProvider:
    """
    Show a numbered menu and return the chosen provider instance.
    Keeps prompting until a valid selection is made.
    """
    choices = _provider_choices_numbered()
    if not choices:
        raise ValueError("No providers are registered.")
    while True:
        for idx, name in choices:
            print(f" {idx}. {name}")
        sel_idx = input("Enter the number of your choice: ").strip().lower()

        if sel_idx == "/exit":
            raise SystemExit(0)
        try:
            return _get_provider_by_index(int(sel_idx))
        except ValueError as e:
            print(f"Invalid input: {e}. Please enter a valid number, or type /exit to quit.")


__all__ = [
    "PROVIDER_REGISTRY",
    "prompt_provider_choice",
]