bugsbunny88/modaic-cli

What Modaic is today

Modaic is infrastructure for teams building AI systems with LLMs, often with DSPy. It focuses on:

  • Managing judges and evaluations for AI systems

    • Run LLM judges on your agents' outputs
    • Measure how good those judges actually are
    • Get confidence scores and calibration so you know when to trust them
  • Data & eval plumbing around those judges

    • Store and version your evaluation data
    • Run experiments and compare different prompts/agents/judges
    • Keep runs repeatable and production-grade
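Calibration means a judge's stated confidence should match how often it is actually right. One common way to measure this, shown here as an illustration and not as Modaic's implementation, is expected calibration error (ECE): bucket verdicts by confidence, then compare each bucket's average confidence to its accuracy against trusted labels. The function name and the equal-width binning are assumptions made for this sketch.

```python
from collections.abc import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE for a judge's verdicts.

    confidences: the judge's stated confidence in each verdict, in [0, 1].
    correct: whether each verdict agreed with a trusted (e.g. human) label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one verdict")

    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # A confidence of exactly 1.0 goes into the last bin instead of overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's |confidence - accuracy| gap by its share of all verdicts.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

For example, a judge that always reports 1.0 confidence but is right only half the time scores an ECE of 0.5, a clear signal not to trust its confidence as stated.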

Built for teams that build agents and DSPy programs and care about reliability, determinism, and safety, rather than just throwing prompts at an API.


Long-term direction

  1. The "reliability layer" for AI systems

    • Define what "good" means for an AI system, evaluate it continuously, and attach calibrated confidence scores and safety checks.
    • Works with DSPy but is not limited to it.
  2. A full evaluation & data-ops platform

    • Host and manage: datasets, eval suites, benchmarks, LLM judges + their training/fine-tuning
    • Automate: data labeling / relabeling with LLM judges, QA/QC loops on large, messy datasets
  3. Enterprise-grade, self-hostable AI infra

    • The same capabilities, delivered as a self-hosted / on-prem product that runs inside the company's own VPC or on its own hardware and satisfies SOC 2, HIPAA, and other compliance constraints.
    • Priced as a high-value enterprise license.
  4. A bridge between cutting-edge AI and legacy industries

    • Start with AI-native companies as design partners, then take the same evaluation and reliability stack into finance, energy/heavy industry, legal, healthcare, and compliance.
    • These industries have large amounts of data, heavy regulation, and a need for "deterministic-feeling" AI.

One line:

Modaic is building the evaluation, confidence, and data infrastructure that lets serious teams ship AI agents and DSPy programs they can actually trust, and package it so enterprises can run it in production on their own data.


The CLI

modaic-cli is the operator interface for Modaic's reliability workflows. Teams and CI pipelines use it to run evaluations, manage judges, enforce quality gates, and ship programs to the Modaic Hub.

What exists today

Command group     What it does
modaic program    Load, save, inspect, and push precompiled DSPy programs
modaic hub        Search the Modaic Hub and authenticate
modaic batch      Submit, poll, and fetch results from multi-provider batch jobs (OpenAI, Anthropic, Azure, Vertex, Together, Fireworks)
modaic optimize   Run prompt optimization (GEPA, bootstrap few-shot, vanilla few-shot)
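The submit, poll, and fetch cycle of a batch job can be pictured as a polling loop with exponential backoff. This is a provider-agnostic sketch and does not use the real modaic batch API: `get_status` is a hypothetical callable standing in for whatever reports a job's state, and the terminal state names are assumptions.

```python
import time
from collections.abc import Callable

# Hypothetical terminal states; each real provider uses its own vocabulary.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled", "expired"})


def poll_until_done(
    get_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a batch job until it reaches a terminal state.

    Returns the terminal state, or raises TimeoutError once the next wait
    would push cumulative sleep time past `timeout` seconds.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Double the wait between polls, capped so long jobs are still checked regularly.
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the loop testable without real waiting; once the job is terminal, the fetch step would download results only for a "completed" state.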

All commands support --json for machine-readable output. Exit codes are stable and documented (0 success, 1 error, 2 usage, 3 auth, 4 not found, 5 network, 130 interrupt).
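Because the exit codes and --json output are a stable contract, CI scripts can branch on them instead of parsing human-readable text. A minimal sketch using only the codes listed above; `EXIT_MEANINGS`, `describe_exit`, `should_retry`, and `run_modaic` are illustrative names, not part of modaic-cli, and both the retry policy and the placement of --json after the subcommand's arguments are assumptions.

```python
import subprocess

# Exit codes as documented in this README.
EXIT_MEANINGS = {
    0: "success",
    1: "error",
    2: "usage",
    3: "auth",
    4: "not found",
    5: "network",
    130: "interrupt",
}

# Assumption for this sketch: only network failures are worth retrying in CI.
RETRYABLE = frozenset({5})


def describe_exit(code: int) -> str:
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def should_retry(code: int) -> bool:
    return code in RETRYABLE


def run_modaic(args: list[str]) -> subprocess.CompletedProcess[str]:
    """Run a modaic command in --json mode and capture stdout for parsing."""
    return subprocess.run(["modaic", *args, "--json"], capture_output=True, text=True)
```

A CI step would call `run_modaic(...)`, `json.loads` its stdout on exit code 0, retry only when `should_retry(result.returncode)` is true, and otherwise fail with `describe_exit(result.returncode)` in the log.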

What we're building next

The CLI is evolving into a reliability-first command surface. Work is tracked in TODO.md and sequenced in four phases:

Phase 0 — Contract hardening. Unified JSON envelope (schema_version, ok, command, data, error, meta) across every command. Structured error payloads with stable codes and recovery hints. Conformance tests.

Phase 1 — Reliability commands. First-class gate, eval, judge, experiment, and dataset command groups. gate check enforces deterministic pass/fail thresholds for deployment decisions.
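"Deterministic" here means the same metrics and thresholds always produce the same verdict. A minimal sketch of that decision logic, assuming metrics arrive as a name-to-value mapping and each threshold is either a minimum or a maximum; `Threshold`, `check_gate`, and the metric names in the usage note are hypothetical, not the planned gate check interface.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Threshold:
    metric: str
    bound: float
    kind: Literal["min", "max"] = "min"  # "min": value must be >= bound; "max": value must be <= bound


def check_gate(metrics: dict[str, float], thresholds: list[Threshold]) -> tuple[bool, list[str]]:
    """Return (passed, failures). A missing metric is a failure, never a silent pass."""
    failures: list[str] = []
    # Sort so the failure list is identical regardless of threshold order.
    for t in sorted(thresholds, key=lambda th: (th.metric, th.kind)):
        value = metrics.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: missing")
        elif t.kind == "min" and value < t.bound:
            failures.append(f"{t.metric}: {value} < {t.bound}")
        elif t.kind == "max" and value > t.bound:
            failures.append(f"{t.metric}: {value} > {t.bound}")
    return (not failures, failures)
```

For example, `check_gate({"judge_accuracy": 0.92, "p95_latency_s": 3.1}, [Threshold("judge_accuracy", 0.9), Threshold("p95_latency_s", 2.5, "max")])` fails on latency alone. A CLI wrapper would turn a failed gate into a non-zero exit code; which code is not specified here.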

Phase 2 — Extension governance. Optional capability modules under modaic x (x optimize, x index). Audit artifact pipeline (.modaic/logs/<run_id>/ with versioned manifests and traces).
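The planned audit layout gives each run its own .modaic/logs/<run_id>/ directory. A minimal sketch of writing one, assuming a `manifest.json` with a version field and a JSONL `traces.jsonl` file; those file names, the manifest fields, and the uuid-based run ID are illustrative assumptions, not the planned format.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

MANIFEST_VERSION = "1"  # hypothetical; bump whenever the manifest shape changes


def write_run_artifacts(root: Path, command: str, traces: list[dict]) -> Path:
    """Create <root>/.modaic/logs/<run_id>/ with a versioned manifest and a trace file."""
    run_id = uuid.uuid4().hex
    run_dir = root / ".modaic" / "logs" / run_id
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an existing run

    # One JSON object per line keeps traces appendable and streamable.
    with (run_dir / "traces.jsonl").open("w", encoding="utf-8") as f:
        for record in traces:
            f.write(json.dumps(record, sort_keys=True) + "\n")

    manifest = {
        "manifest_version": MANIFEST_VERSION,
        "run_id": run_id,
        "command": command,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "trace_count": len(traces),
    }
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return run_dir
```

Versioning the manifest separately from the CLI lets older audit directories stay readable after the format evolves.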

Phase 3 — Policy and ergonomics. Redaction/retention controls, destructive-action safety flags (--yes, --dry-run), --assist interactive mode, compatibility documentation.
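The --yes and --dry-run flags reduce to a small decision rule. This README does not define their precedence, so the sketch below encodes one reasonable interpretation as an assumption: --dry-run always wins and changes nothing, --yes skips the prompt, and otherwise the user is asked, with non-interactive sessions refusing by default. `Decision` and `decide_destructive_action` are hypothetical names.

```python
from collections.abc import Callable
from enum import Enum


class Decision(Enum):
    PREVIEW = "preview"  # describe what would change, change nothing
    PROCEED = "proceed"
    ABORT = "abort"


def decide_destructive_action(
    dry_run: bool,
    yes: bool,
    interactive: bool,
    ask: Callable[[str], str] = input,
    description: str = "this action",
) -> Decision:
    if dry_run:
        return Decision.PREVIEW
    if yes:
        return Decision.PROCEED
    if not interactive:
        # Without a terminal there is nobody to confirm, so refuse rather than guess.
        return Decision.ABORT
    answer = ask(f"Really perform {description}? [y/N] ").strip().lower()
    return Decision.PROCEED if answer in {"y", "yes"} else Decision.ABORT
```

In a real command, `interactive` would typically come from `sys.stdin.isatty()`, so CI runs must pass --yes explicitly to perform destructive actions.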

Project structure

src/modaic_cli/
  cli/              CLI command layer (Typer). Thin; delegates to core.
    _app.py           Root app, registers command groups
    _errors.py        Exit codes, error handling decorator
    _output.py        JSON/human output helpers
    _data.py          Dataset loading (JSONL, Parquet, Arrow, HuggingFace, stdin)
    batch.py          Batch processing commands
    hub.py            Hub search and auth
    optimize.py       Optimization commands
    program.py        Program lifecycle commands
  modaic/           Core library
    batch/            Multi-provider async batch processing
    hub/              Hub API client, git sync, push workflows
    auto/             AutoProgram/AutoConfig dynamic class loading
    precompiled/      Precompiled program support
    programs/         Program registry and built-ins
    serializers/      DSPy and JSON serialization
    module_utils/     Introspection, import filtering, pyproject parsing
    exceptions/       Custom exception hierarchy
  gepa/             GEPA optimization integration
  optiglot/         Optiglot optimization framework (evaluator, predictors, optimizers)

Setup

uv sync --all-extras
uv run modaic --help
uv run pytest -q

Key conventions

  • Python 3.11+, managed with uv
  • Typer for CLI, Pydantic for validation
  • Lazy imports for heavy dependencies (dspy, datasets, gepa, torch) to keep startup fast
  • Strict linting: Ruff
  • Dual-mode output: every command works in both human-readable and --json mode
  • Exit codes are a stable contract; do not change their meanings
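The lazy-import convention keeps CLI startup fast by deferring heavy imports until a command actually needs them. The simplest form is an `import` inside the command function body; the sketch below shows a reusable proxy instead. `LazyModule` is a hypothetical helper, not part of modaic-cli, and the standard library offers a related mechanism in `importlib.util.LazyLoader`.

```python
import importlib
from types import ModuleType
from typing import Any


class LazyModule:
    """Defer importing a module until one of its attributes is first accessed."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: ModuleType | None = None

    def _load(self) -> ModuleType:
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only called for names not found on the proxy itself, i.e. module attributes.
        return getattr(self._load(), attr)
```

A module could then declare `dspy = LazyModule("dspy")` at top level, so `--help` never pays for importing dspy; the real import happens on the first attribute access inside a command.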
