Incorporate Waza for standardized skill development and evaluation #58

@anchapin

Summary

This issue proposes integrating Waza (Microsoft's CLI/framework for agent skills) into openstudio-mcp to standardize how skills are created, evaluated, and improved. The goals are higher skill quality, automated validation, and measurable improvement across our skill ecosystem.

Benefits of Incorporating Waza

1. Standardized Skill Creation

  • Consistent skill structure with proper YAML frontmatter
  • Automatic scaffolding of skills and evaluation suites
  • Reduced onboarding friction for new contributors
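
As an illustration of what "proper YAML frontmatter" could mean in practice, here is a minimal sketch of a frontmatter check for a SKILL.md file. It is not waza's validator: the required fields (`name`, `description`) and the helper names are assumptions made for this example, and the parser only handles flat `key: value` pairs.

```python
from typing import Dict, List

# Assumption for this sketch: a skill must declare at least these fields.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_fields(text: str) -> List[str]:
    """Return the required fields that are absent or empty."""
    fields = parse_frontmatter(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A contributor could run `missing_fields(Path("SKILL.md").read_text())` locally before opening a PR; an empty list means the assumed minimum is present.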

2. Automated Evaluation & Validation

  • Generate test cases from skill definitions
  • Run standardized evaluations with multiple models
  • CI/CD integration for automated skill validation
  • Skill readiness checking (compliance, token budget, spec adherence)

3. Quality Improvement

  • LLM-as-judge quality assessment across multiple dimensions
  • Iterative skill improvement with Copilot suggestions
  • Token optimization for MCP host constraints
  • Comparative analysis across models and versions

4. Enhanced Collaboration

  • Standardized reporting and visualization
  • Historical tracking of evaluation results
  • Cloud storage for team result sharing
  • Session logging for debugging complex interactions

5. Data-Driven Development

  • Metrics on skill performance over time
  • Coverage analysis of skills vs. evaluations
  • Regression detection through result comparison
  • Evidence-based skill improvement decisions
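
To make "regression detection through result comparison" concrete, the sketch below compares two sets of per-task scores and reports tasks that dropped. The result shape (task name mapped to a score between 0 and 1), the `tolerance` parameter, and the function name are all hypothetical; waza's actual result format may differ.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """Return tasks whose score fell by more than `tolerance`.

    A task present in the baseline but missing from the current run is
    also reported, since a silently dropped task hides a regression.
    """
    regressions = []
    for task, old_score in sorted(baseline.items()):
        new_score = current.get(task)
        if new_score is None or new_score < old_score - tolerance:
            regressions.append(task)
    return regressions
```

The tolerance exists because model-based evaluations are noisy; a small allowed drop avoids flagging run-to-run variation as a regression.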

Detailed Implementation Plan

Phase 1: Foundation & Pilot Skill (Weeks 1-2)

  • Install waza CLI in development environment
  • Select pilot skill (e.g., add-hvac) for initial integration
  • Migrate pilot skill to waza-standardized format:
    • Use waza new skill add-hvac to create standardized structure
    • Generate evaluation suite with waza new eval add-hvac
    • Migrate existing workflow instructions to SKILL.md, with skill metadata in its YAML frontmatter
  • Establish baseline evaluation with waza run
  • Document pilot process for team adoption

Phase 2: Tooling & CI Integration (Weeks 3-4)

  • Create .waza.yaml configuration file for openstudio-mcp
  • Add waza evaluation to GitHub Actions workflow:
    - name: Run skill evaluations
      run: waza run evals/<skill-name>/eval.yaml --format github-comment
    - name: Check skill readiness
      run: waza check skills/<skill-name>
    - name: Token budget validation
      run: waza tokens compare main --skills --threshold 10 --strict --format json
  • Integrate skill quality checks into PR validation
  • Set up result storage configuration (local initially, optional cloud)
  • Create contributor documentation for waza workflows
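
The token budget step above passes a threshold of 10. Assuming that value is a maximum allowed percentage increase relative to main (an interpretation, not confirmed by the waza documentation), the underlying check can be sketched as follows. The function name and the input shape (skill name mapped to a token count) are hypothetical.

```python
from typing import Dict


def over_budget(
    base_tokens: Dict[str, int],
    head_tokens: Dict[str, int],
    threshold_pct: float = 10.0,
) -> Dict[str, float]:
    """Return skills whose token count grew by more than `threshold_pct`."""
    offenders: Dict[str, float] = {}
    for skill, head in head_tokens.items():
        base = base_tokens.get(skill)
        if not base:
            # New skill or zero baseline: no growth percentage to compute.
            continue
        growth = (head - base) / base * 100.0
        if growth > threshold_pct:
            offenders[skill] = round(growth, 1)
    return offenders
```

In a strict mode, a non-empty result would fail the PR check; new skills are skipped here because they have no baseline to compare against.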

Phase 3: Full Migration & Advanced Features (Weeks 5-6)

  • Migrate all existing skills to waza format using a systematic approach:
    • For each skill: waza new eval <skill-name> to generate evaluation
    • Manually transfer workflow content to SKILL.md
    • Verify equivalence with existing eval.md files
  • Implement advanced features:
    • Session logging for complex skill debugging
    • Cross-model comparison capabilities
    • Skill coverage analysis with waza coverage
    • Token optimization suggestions
  • Establish skill quality baseline metrics
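
Independently of the waza coverage command, the idea behind coverage analysis can be sketched in a few lines: list every skill directory that has no matching evaluation file. The layout below (`skills/<skill-name>/` and `evals/<skill-name>/eval.yaml`) is inferred from the CI commands in Phase 2 and is an assumption about the repository, as is the function name.

```python
from pathlib import Path
from typing import List


def skills_without_evals(root: Path) -> List[str]:
    """Return names of skills under root/skills lacking an eval.yaml."""
    skills_dir = root / "skills"
    if not skills_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in skills_dir.iterdir()
        if entry.is_dir()
        and not (root / "evals" / entry.name / "eval.yaml").is_file()
    )
```

Running this during migration gives a simple checklist of skills that still need an evaluation suite.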

Phase 4: Optimization & Governance (Ongoing)

  • Regular skill health checks via automated workflows
  • Continuous improvement based on evaluation results
  • Community contribution guidelines updated with waza processes
  • Periodic review of waza configuration and thresholds
  • Knowledge sharing sessions on effective skill development

Success Metrics

  • Reduction in skill creation time for new contributors
  • Increase in skill evaluation pass rates
  • Decrease in token usage per skill while maintaining functionality
  • Improved consistency in skill structure and documentation
  • Faster identification of skill regressions through automated testing
  • Enhanced contributor satisfaction with standardized processes

Dependencies & Considerations

  • Requires Go 1.26+ for waza installation (already met in dev environment)
  • Need to adapt waza templates to match openstudio-mcp's specific skill format
  • Initial time investment for skill migration (estimated 15-30 minutes per skill)
  • Training required for team on new workflows
  • Optional Azure storage configuration for team result sharing (can start local)

Next Steps for Discussion

  1. Confirm interest in pursuing this integration
  2. Select pilot skill for initial implementation
  3. Determine timeline and resource allocation
  4. Decide on cloud storage configuration preferences
  5. Establish review process for migrated skills

This integration would move openstudio-mcp's skill development from informal practices to a standardized, measurable process that can be improved continuously.

Labels: enhancement, help wanted
