Add caption composer for refined annotations#61

Draft
KarlDeck wants to merge 2 commits into main from KarlDeck/Refiner

Add caption composer for refined annotations

♻️ Current situation & Problem

This PR adds a composition step on top of the existing annotation pipeline.

Right now, the pipeline can generate multiple extracted annotations for a recording, but there is no built-in way to turn them into more natural-sounding refined annotations for quick inspection. The composition step also needs to preserve local window information rather than collapse all annotations into one global summary.

No linked issue yet.

⚙️ Release Notes

  • Add a new Composer that distills existing annotations into more coherent refined annotations.
  • Preserve annotation scope by composing global annotations globally and windowed annotations within their original time window.
  • Extend Captionizer to optionally append or replace source annotations with composed refined annotations.
  • Add a local smoke-test script for trying the composer on a sampled daily or weekly record with a local model.

Example usage:

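from captionizer import Captionizer
from composer import Composer

# dataset, transformer, annotator, and model are assumed to be constructed already.
captionizer = Captionizer(
    dataset,
    transformer,
    annotator,
    composer=Composer(model),
)

# With replace_annotations_with_refined=False, the refined annotations are
# appended to the source annotations instead of replacing them.
result, _ = captionizer.run(
    max_rows=1,
    replace_annotations_with_refined=False,
)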

Local test script:

conda run -n AML python scripts/test_composer_local.py --index 0

📚 Documentation

This PR introduces:

  • composer.py with the new Composer class
  • Captionizer(..., composer=Composer(model))
  • replace_annotations_with_refined in Captionizer.run(...)
  • scripts/test_composer_local.py for local manual validation
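
The append-versus-replace behavior controlled by replace_annotations_with_refined can be pictured with a minimal sketch. This is not the code in captionizer.py: the helper name `merge_annotations` and the dict-based annotation shape are assumptions made only for illustration.

```python
from typing import Dict, List

# Hypothetical annotation shape; the real schema is not shown in this PR.
Annotation = Dict[str, object]


def merge_annotations(
    source: List[Annotation],
    refined: List[Annotation],
    replace_annotations_with_refined: bool = False,
) -> List[Annotation]:
    """Combine source and refined annotations (illustrative helper only)."""
    if replace_annotations_with_refined:
        # Replace mode: only the composed annotations are kept.
        return list(refined)
    # Append mode: source annotations first, composed annotations after them.
    return list(source) + list(refined)
```

In this sketch the default keeps the source annotations, which matches the example usage where the flag is set to False for inspection.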

The composition prompt now asks the model to perform two steps explicitly within a single call:

  1. Distill duplicate and overlapping facts.
  2. Compose the distilled facts into a natural annotation.
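
As a rough illustration of asking for both steps in one call, a prompt builder might look like the sketch below. The function name `build_composition_prompt` and the prompt wording are hypothetical; the actual prompt in composer.py may differ.

```python
from typing import Sequence


def build_composition_prompt(annotations: Sequence[str]) -> str:
    """Build one prompt that requests distillation, then composition.

    The wording here is an assumption, not the prompt used by Composer.
    """
    bullet_list = "\n".join(f"- {text}" for text in annotations)
    return (
        "You are given annotations extracted from one recording.\n"
        "Step 1: Distill them by merging duplicate and overlapping facts.\n"
        "Step 2: Compose the distilled facts into one natural annotation.\n"
        "Return only the final annotation.\n\n"
        f"Annotations:\n{bullet_list}"
    )
```

Keeping both steps in one prompt means one model call per group of annotations, at the cost of not being able to inspect the distilled facts separately.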

Windowed annotations are grouped by their exact window, so composed local annotations retain their temporal localization.
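
The grouping rule can be sketched as follows. This is a minimal illustration, assuming each annotation is a dict with a "text" field and an optional "window" field holding a (start, end) pair, with a missing or None window meaning a global annotation. Those field names and the `compose_fn` callback are assumptions, not the Composer API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# None marks a global annotation in this sketch (assumption).
Window = Optional[Tuple[float, float]]


def compose_by_scope(
    annotations: List[dict],
    compose_fn: Callable[[List[str]], str],
) -> List[dict]:
    """Compose annotations per scope so each result keeps its window."""
    groups: Dict[Window, List[str]] = defaultdict(list)
    for ann in annotations:
        window = ann.get("window")
        # Exact-match grouping: only identical windows share a group.
        key = tuple(window) if window is not None else None
        groups[key].append(ann["text"])
    # One composed annotation per scope, carrying the original window.
    return [
        {"window": key, "text": compose_fn(texts)}
        for key, texts in groups.items()
    ]
```

Because the key is the exact window, two annotations whose windows merely overlap stay in separate groups under this sketch.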

✅ Testing

Test coverage added for the new composition behavior:

  • tests/test_composer.py

Validated with:

  • conda run -n AML python -m pytest tests/test_composer.py
  • py_compile on:
    • composer.py
    • captionizer.py
    • scripts/test_composer_local.py
    • tests/test_composer.py

The local smoke-test script is intended for manual model-backed inspection on sampled records and is not exercised in automated tests because it depends on a locally available model runtime.
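
For automated tests, a deterministic stand-in can take the place of the model runtime. The sketch below only illustrates the idea; the `generate` method, the `FakeModel` class, and the `compose_once` helper are hypothetical and are not taken from tests/test_composer.py or from the interface Composer expects.

```python
from typing import List


class FakeModel:
    """Deterministic stand-in for a model runtime (hypothetical interface)."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def generate(self, prompt: str) -> str:
        # Record the prompt so a test can assert on what was sent.
        self.prompts.append(prompt)
        return self.reply


def compose_once(model: FakeModel, annotations: List[str]) -> str:
    """Hypothetical single-call composition: one prompt in, one text out."""
    prompt = "Compose:\n" + "\n".join(annotations)
    return model.generate(prompt).strip()
```

A test built this way can check that exactly one model call is made per group and that the returned text is passed through, without needing a local model.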

Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our Code of Conduct and Contributing Guidelines:

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e390329a-8e7a-4a82-bb80-0eb82f9dee9c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
