Sub topic #43

kohankhaki · 2025-11-03T17:11:50Z

PR Type
Feature

Short Description

This pull request introduces a new standalone pipeline for generating diverse evaluation tasks for a single capability, using a multi-dimensional approach (content, difficulty, reasoning) and leveraging LLMs for automated generation, blueprinting, and verification. The changes include a detailed configuration file, new dataclass definitions, modular pipeline scripts, and improved documentation for agentic generation scripts.
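As a rough illustration of the multi-dimensional approach described above, the sketch below enumerates one blueprint per (content, difficulty, reasoning) combination for a single capability. It is not the code from this PR: `TaskBlueprint`, `enumerate_blueprints`, and all dimension values are hypothetical names chosen for the example, and the LLM generation and verification steps are left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TaskBlueprint:
    """Hypothetical container for one point in the task-diversity grid."""

    capability: str
    content: str
    difficulty: str
    reasoning: str


def enumerate_blueprints(capability, contents, difficulties, reasonings):
    """Return one blueprint per (content, difficulty, reasoning) combination."""
    return [
        TaskBlueprint(capability, c, d, r)
        for c, d, r in product(contents, difficulties, reasonings)
    ]


# Illustrative values only; the real dimensions would come from the config file.
blueprints = enumerate_blueprints(
    "linear_algebra",
    contents=["eigenvalues", "matrix_inverse"],
    difficulties=["easy", "hard"],
    reasonings=["recall", "multi_step"],
)
print(len(blueprints))  # 2 * 2 * 2 = 8 combinations
```

In a pipeline like the one described, each blueprint would then be passed to an LLM call that writes the task and to a second call that verifies it.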

Tests Added


kohankhaki added 18 commits August 26, 2025 13:47

adding refactored task generation. updated prompts to ask for json outputs, and updated corresponding output parser.

f0cf760

fixed retry, json processing, and max token.

06da910

Merge branch 'fix-anthropic-client' into agentic_task_gen

70d7b06

switching to two phase task generation.

0ca1c22

switching to two phase task generation. part 2.

396feac

updated agentic config and readme.

b166e4c

simplified task generations.

084b68c

simplified task generation.

c155d74

fixed mypy errors.

52b4d2a

ruff fix.

d1e1812

updated saved file name for solutions.

4d237f7

added extra details to agent solution messages.

38d825d

fixed prompts.

c5afb81

fixed output dir name to include area name.

9195b93

fixed task solver output dir name.

57d2d2a

upgraded json handling, and model call.

3292299

updated readme to include latest agentic changes.

df4860b

task diversity study scripts added.

653179b

kohankhaki marked this pull request as draft November 3, 2025 17:11

updated prompts, and combination logic.

fdbe570

kohankhaki marked this pull request as ready for review November 5, 2025 16:30

kohankhaki requested a review from afkanpour November 5, 2025 16:30

added resume, and retry.

9866baf
