@kohankhaki
Collaborator

PR Type
Feature

Short Description

This pull request introduces a new standalone pipeline for generating diverse evaluation tasks for a single capability, using a multi-dimensional approach (content, difficulty, reasoning) and leveraging LLMs for automated generation, blueprinting, and verification. The changes include a detailed configuration file, new dataclass definitions, modular pipeline scripts, and improved documentation for agentic generation scripts.
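To make the multi-dimensional idea concrete, here is a minimal sketch of how one capability could be crossed with the three dimensions (content, difficulty, reasoning) to enumerate task blueprints. It is an illustration only, not the code in this PR: `TaskBlueprint`, `enumerate_blueprints`, and the example dimension values are hypothetical names, and the LLM generation and verification steps are left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TaskBlueprint:
    """One cell of the (content, difficulty, reasoning) grid.

    Hypothetical dataclass for illustration; the PR's real dataclasses may differ.
    """
    capability: str
    content: str
    difficulty: str
    reasoning: str


def enumerate_blueprints(capability, contents, difficulties, reasonings):
    """Return one blueprint per combination of the three dimensions."""
    return [
        TaskBlueprint(capability, content, difficulty, reasoning)
        for content, difficulty, reasoning in product(
            contents, difficulties, reasonings
        )
    ]


# Example dimension values are assumptions, chosen only to show the shape.
blueprints = enumerate_blueprints(
    capability="linear_algebra",
    contents=["eigenvalues", "determinants"],
    difficulties=["easy", "hard"],
    reasonings=["recall", "multi_step"],
)

for bp in blueprints:
    print(bp)
```

Each blueprint would then be passed to an LLM prompt that generates and verifies a task for that combination. Enumerating the full grid up front is one way to get coverage across all dimensions; sampling a subset would also work if the grid is large.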

Tests Added

@kohankhaki kohankhaki marked this pull request as draft November 3, 2025 17:11
@kohankhaki kohankhaki marked this pull request as ready for review November 5, 2025 16:30
@kohankhaki kohankhaki requested a review from afkanpour November 5, 2025 16:30

2 participants