
Refactor Runtime Configuration with Pydantic for Cleaner and Scalable Pipelines #19

@kargibora

Problem

generate_and_evaluate.py and evaluate.py are getting harder to maintain as more runtime options are added.

Right now, argument parsing, defaults, validation, and execution logic are all mixed together in the same flow. This makes the code harder to read, harder to test, and harder to extend without creating more coupling.

Proposal

I want to move runtime configuration to a Pydantic-based schema and use that as the single source of truth for run options.

Instead of execution code reading many flat CLI arguments directly, the CLI layer will map them into a typed RunConfig object, and pipeline code will consume only that object.
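A minimal sketch of what the schema could look like, assuming the section and field names used in the "Proposed style" example further down (models, judge, runtime). The default values and the "fixed" swap mode are placeholders for illustration, not the project's actual settings.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ModelsConfig(BaseModel):
    judge_model: str  # required; no default is assumed here


class JudgeConfig(BaseModel):
    # "both" comes from the issue; "fixed" is a hypothetical second mode.
    swap_mode: Literal["fixed", "both"] = "fixed"
    max_out_tokens: int = Field(default=1024, gt=0)  # illustrative default


class RuntimeConfig(BaseModel):
    max_model_len: int = Field(default=16384, gt=0)  # illustrative default


class RunConfig(BaseModel):
    models: ModelsConfig
    judge: JudgeConfig = Field(default_factory=JudgeConfig)
    runtime: RuntimeConfig = Field(default_factory=RuntimeConfig)


# Nested dicts are validated into the nested section models.
config = RunConfig(models={"judge_model": "my-judge"}, judge={"swap_mode": "both"})
print(config.judge.swap_mode, config.runtime.max_model_len)

# An invalid value is rejected at construction time, before any pipeline code runs.
try:
    RunConfig(models={"judge_model": "my-judge"}, judge={"max_out_tokens": 0})
except ValidationError as exc:
    print(exc)
```

Keeping defaults and constraints on the models means the entrypoints never need their own fallback logic.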

Why this matters

  • Keeps execution code focused on execution, not argument plumbing.
  • Centralizes validation and default handling.
  • Makes invalid configs fail early with clear errors.
  • Improves readability for contributors and reviewers.
  • Makes future additions safer (new options are added to the schema first, then used where needed).

Planned steps

  1. Introduce Pydantic config models (RunConfig + nested sections).
  2. Add a CLI adapter that converts CLI args to RunConfig.
  3. Refactor runtime entrypoints to use RunConfig directly.
  4. Add tests for config parsing, defaults, and validation.
  5. Keep behavior and outputs unchanged during this refactor.
  6. Add pydantic to dependencies.
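Step 2 could be sketched roughly as follows. The flag names mirror the flat arguments in the "Current style" example; the ARG_TO_FIELD table, the helper names, and the defaults are assumptions made for illustration. The parser uses argparse.SUPPRESS so that unset flags are simply absent and defaults come only from the schema.

```python
import argparse
from typing import Dict, List, Literal, Optional, Tuple

from pydantic import BaseModel, Field


class ModelsConfig(BaseModel):
    judge_model: str


class JudgeConfig(BaseModel):
    swap_mode: Literal["fixed", "both"] = "fixed"  # "fixed" is hypothetical
    max_out_tokens: int = Field(default=1024, gt=0)  # illustrative default


class RuntimeConfig(BaseModel):
    max_model_len: int = Field(default=16384, gt=0)  # illustrative default


class RunConfig(BaseModel):
    models: ModelsConfig
    judge: JudgeConfig = Field(default_factory=JudgeConfig)
    runtime: RuntimeConfig = Field(default_factory=RuntimeConfig)


# Hypothetical mapping: flat CLI name -> (config section, field name).
ARG_TO_FIELD: Dict[str, Tuple[str, str]] = {
    "judge_model": ("models", "judge_model"),
    "swap_mode": ("judge", "swap_mode"),
    "max_out_tokens_judge": ("judge", "max_out_tokens"),
    "max_model_len": ("runtime", "max_model_len"),
}


def build_parser() -> argparse.ArgumentParser:
    # SUPPRESS keeps unset options out of the namespace entirely.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--judge_model", required=True)
    parser.add_argument("--swap_mode")
    parser.add_argument("--max_out_tokens_judge", type=int)
    parser.add_argument("--max_model_len", type=int)
    return parser


def config_from_args(args: argparse.Namespace) -> RunConfig:
    nested: Dict[str, Dict[str, object]] = {}
    for name, value in vars(args).items():
        section, field = ARG_TO_FIELD[name]
        nested.setdefault(section, {})[field] = value
    return RunConfig(**nested)  # validation happens here


def parse_config(argv: Optional[List[str]] = None) -> RunConfig:
    return config_from_args(build_parser().parse_args(argv))


config = parse_config(["--judge_model", "my-judge", "--swap_mode", "both"])
print(config)
```

Because parse_config accepts an explicit argument list, the tests in step 4 can exercise parsing, defaults, and validation without touching sys.argv.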

Expected impact on execution

This is a structural refactor, not a behavior change.

The generation/evaluation flow should remain the same. The main change is how runtime options are represented and validated before execution starts.

Follow-up ideas

After this lands, we can add config file support:

  • JSON config loading first
  • YAML support later
  • CLI overrides on top of config file values
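One possible way to layer CLI overrides on top of file values is a recursive dict merge done before validation. This is a sketch only: deep_merge is a hypothetical helper, and the JSON content is inlined as a string here where real code would read it from a file.

```python
import json
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in `overrides` win, section by section."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the contents of a JSON config file.
file_values = json.loads(
    '{"models": {"judge_model": "my-judge"},'
    ' "judge": {"swap_mode": "fixed", "max_out_tokens": 512}}'
)

# Only the options the user actually passed on the command line.
cli_overrides = {"judge": {"swap_mode": "both"}}

merged = deep_merge(file_values, cli_overrides)
print(merged)
```

The merged dict would then be handed to RunConfig, so file values and CLI overrides go through the same validation path.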

Example (before vs after in code)

Current style (flat args used everywhere):

def main(args):
    if args.swap_mode == "both":
        ...
    judge = make_model(
        model=args.judge_model,
        max_tokens=args.max_out_tokens_judge,
        max_model_len=args.max_model_len,
    )

Proposed style (typed config object):

def main(config: RunConfig):
    if config.judge.swap_mode == "both":
        ...
    judge = make_model(
        model=config.models.judge_model,
        max_tokens=config.judge.max_out_tokens,
        max_model_len=config.runtime.max_model_len,
    )

Labels: enhancement