
Conversation


@AlanPonnachan AlanPonnachan commented Sep 19, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Description

This PR introduces prompt caching telemetry for the AWS Bedrock Converse and Converse Stream APIs, bringing feature parity with the existing invoke_model instrumentation.

The Converse API reports caching information in the usage field of the response body, rather than through HTTP headers. This implementation adds the necessary logic to parse this information and record it as metrics and span attributes.

Changes include:

  1. New function prompt_caching_converse_handling in prompt_caching.py to extract cache_read_input_tokens and cache_creation_input_tokens from the response body.
  2. Integration into __init__.py: The new function is now called from _handle_converse and _handle_converse_stream to process caching data for both standard and streaming calls.
  3. New Test File: Added test_bedrock_converse_prompt_caching_metrics.py to validate that the gen_ai.prompt.caching metric is correctly emitted for the Converse API.
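The extraction step in change 1 can be pictured as a small helper. This is an editorial sketch, not the code in the PR: it assumes a dict-shaped Converse response and uses the field names quoted above, defaulting to 0 when a field is absent.

```python
def extract_cache_usage(response):
    """Return (read_tokens, write_tokens) from a Converse-style response.

    Field names follow the description above and are assumptions about the
    response shape, not a verified Bedrock API contract.
    """
    usage = response.get("usage", {})
    read_tokens = usage.get("cache_read_input_tokens", 0)
    write_tokens = usage.get("cache_creation_input_tokens", 0)
    return read_tokens, write_tokens
```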

Fixes #3337


Important

Adds prompt caching telemetry for the AWS Bedrock Converse APIs, including a new function for extracting caching data and corresponding tests.

  • Behavior:
    • Adds prompt_caching_converse_handling in prompt_caching.py to extract caching data from Converse API response body.
    • Integrates prompt_caching_converse_handling into _handle_converse and _handle_converse_stream in __init__.py.
  • Testing:
    • Adds test_bedrock_converse_prompt_caching_metrics.py to validate gen_ai.prompt.caching metric emission for Converse API.

This description was created by Ellipsis for 4fa3792.

Summary by CodeRabbit

  • New Features

    • Added prompt-caching detection for Bedrock Converse (streaming and non‑streaming), recording cache read/write state and token-level usage into telemetry and metrics.
  • Tests

    • Added tests validating cache write and cache read scenarios for Converse, including span attributes and aggregated token metrics verification.
    • Added a test cassette covering a Converse error response to exercise authentication/failure handling.


CLAassistant commented Sep 19, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 4fa3792 in 2 minutes and 2 seconds. Click for details.
  • Reviewed 156 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py:359
  • Draft comment:
    Good integration of prompt_caching_converse_handling in _handle_converse. In the streaming handler (lines ~400), note that if both read and write tokens are present, the span attribute may be overwritten. Ensure this is the intended behavior.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py:56
  • Draft comment:
    The test correctly validates prompt caching metrics for Converse API. The cumulative workaround for metric values indicates the underlying counter is cumulative. Consider resetting metrics between tests to avoid cross-test interference if possible.
  • Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment has two parts: 1) an observation about the cumulative nature of the metrics, which is already documented in the code comments, and 2) a speculative suggestion about resetting metrics that isn't clearly necessary, since the current approach works. The comment doesn't identify any actual problems or required changes. The suggestion about resetting metrics could be valid if there were evidence of cross-test interference, but none is visible, and the current workaround appears intentional and functional. Since the approach works and is documented, the suggestion is speculative rather than necessary. Delete the comment: it is primarily observational and makes a speculative suggestion without clear evidence that a change is needed.

Workflow ID: wflow_bNUeXv3pUdPPxbhz


…nstrumentation/bedrock/prompt_caching.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@nirga nirga changed the title feat(bedrock): Add prompt caching support for Converse API fix(bedrock): Add prompt caching support for Converse API Sep 19, 2025
Member

@nirga nirga left a comment


hey @AlanPonnachan - looks like tests are failing, can you take a look?

@AlanPonnachan
Author

Hi @nirga

I’ve resolved the lint test failures. The remaining failing test, test_prompt_cache_converse, is expected since it requires a VCR cassette to be recorded.

As I don’t have access to an active AWS account, I’m unable to generate the test_prompt_cache_converse.yaml cassette file myself. Would you be able to check out this branch, run the test and push the generated cassette file to this PR?

Thanks for your help!

Member

@nirga nirga left a comment


Sure @AlanPonnachan, will do it - can you fix the small comment I wrote? I'll then run it locally and record a test. BTW - if you can rely on existing converse tests it might be easier

@AlanPonnachan
Author

Thanks for the great suggestion and for your willingness to help record the test!

I agree that relying on an existing test is a cleaner approach. Before I push the changes, I just want to confirm my plan sounds good to you.

Here is what I am planning to do:

  1. Modify the Existing Test: I will update the test_titan_converse function in tests/traces/test_titan.py.
  2. Enable Caching: I'll add the additionalModelRequestFields with cacheControl to the existing brt.converse API call.
  3. Test Both Scenarios: I will add a second brt.converse call within that same test to ensure we cover both the initial "cache-write" and the subsequent "cache-read".
  4. Add Assertions: I will add the metric assertions I wrote to validate that the prompt caching counters are working correctly.
  5. Clean Up: Finally, I will delete the new test file I originally created (test_bedrock_converse_prompt_caching_metrics.py).

This will result in the cassette for test_titan_converse.yaml needing to be re-recorded, as you mentioned.

Does this plan look good? If so, I'll go ahead and make the changes.
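Step 2 of this plan can be sketched as a request payload builder. This is a hypothetical illustration: the additionalModelRequestFields/cacheControl field names come from the plan above and the model ID is a placeholder, neither verified against the live Bedrock Converse API.

```python
def build_converse_request(prompt):
    """Hypothetical Converse request with caching enabled (per step 2 above).

    The additionalModelRequestFields/cacheControl shape and the model ID are
    assumptions taken from this comment thread, not a verified API contract.
    """
    return {
        "modelId": "amazon.titan-text-express-v1",
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "additionalModelRequestFields": {"cacheControl": {"type": "ephemeral"}},
    }
```

The resulting dict would then be splatted into the client call, e.g. brt.converse(**build_converse_request("Hello")).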

@AlanPonnachan
Author

@nirga any update on this?


coderabbitai bot commented Oct 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Added response-based prompt caching handling for Bedrock Converse and invoked it from both synchronous and streaming Converse flows to record span attributes and emit cache read/write metrics.

Changes

Cohort / File(s) Summary
Instrumentation entrypoints
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Imported prompt_caching_converse_handling and prompt_caching_handling; added calls to prompt_caching_converse_handling(...) in _handle_converse() and in the end-of-message path of _handle_converse_stream() after guardrail handling.
Prompt caching logic
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
Added prompt_caching_converse_handling(response, vendor, model, metric_params) which reads cache token counts from a response object (reads and creations), sets span attributes (CACHED) and gen_ai.usage.* token attributes, and emits cache read/write metrics when enabled.
Tests & cassettes
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py, packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
Added test helpers (get_metric, assert_metric) and new test test_titan_converse_with_caching exercising a cache-write then cache-read Converse sequence and validating spans and metrics; added cassette for a Titan Converse interaction capturing an auth failure scenario. (Note: test block appears duplicated in the diff — verify.)
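The handler described in the walkthrough can be approximated as follows. This is a hedged sketch, not the PR's code: the span and counter are reduced to plain duck-typed parameters, and the field and attribute keys mirror the names quoted in this review.

```python
def converse_cache_telemetry_sketch(response, vendor, model, span, cache_counter=None):
    """Record cache read/write telemetry from a Converse-style response.

    `span` needs set_attribute(key, value); `cache_counter`, if provided,
    needs add(value, attributes=...) like an OpenTelemetry Counter.
    Field and attribute names are assumptions taken from this review.
    """
    usage = response.get("usage", {})
    read_tokens = usage.get("cache_read_input_tokens", 0)
    write_tokens = usage.get("cache_creation_input_tokens", 0)
    base_attrs = {"gen_ai.system": vendor, "gen_ai.response.model": model}

    if read_tokens > 0:
        if cache_counter is not None:
            cache_counter.add(
                read_tokens, attributes={**base_attrs, "cache.type": "read"}
            )
        span.set_attribute("gen_ai.usage.cache_read_input_tokens", read_tokens)

    if write_tokens > 0:
        if cache_counter is not None:
            cache_counter.add(
                write_tokens, attributes={**base_attrs, "cache.type": "write"}
            )
        span.set_attribute(
            "gen_ai.usage.cache_creation_input_tokens", write_tokens
        )
```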

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BedrockInstrumentor
    participant BedrockAPI
    participant PromptCacheHandler
    participant Span
    participant Metrics

    Client->>BedrockInstrumentor: invoke_converse(...)
    BedrockInstrumentor->>BedrockAPI: call Converse
    BedrockAPI-->>BedrockInstrumentor: response (includes usage_metadata)

    BedrockInstrumentor->>PromptCacheHandler: prompt_caching_converse_handling(response, vendor, model, metric_params)

    rect #E8F4FF
      Note over PromptCacheHandler: extract cache_read/cache_creation from response\nset span attrs and gen_ai.usage.* token attributes
      PromptCacheHandler->>Span: set CACHED ("read"/"write") and token attrs
    end

    alt cache_read > 0
      PromptCacheHandler->>Metrics: emit cache read metric
    end
    alt cache_creation > 0
      PromptCacheHandler->>Metrics: emit cache write metric
    end

    BedrockInstrumentor-->>Client: return response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify extraction of cache fields from response usage metadata and alignment with actual Bedrock Converse response shapes.
  • Confirm span existence and is_recording checks (especially in streaming path) and that handler is invoked exactly once per final message.
  • Inspect duplicated test block in test_titan.py and the new cassette for correctness.

Suggested reviewers

  • doronkopit5

Poem

🐰 I hopped through tokens, counted each crumb,

cache-read, cache-write — now telemetry hums.
Spans wear badges of "read" or "write",
Metrics tally hops in morning light.
Bedrock Converse, cached and bright! 🎩✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title "fix(bedrock): Add prompt caching support for Converse API" clearly and concisely states the primary objective of the changeset: adding prompt caching telemetry for the Bedrock Converse and Converse Stream APIs.

The PR addresses the core requirements from issue #3337: a new function, prompt_caching_converse_handling, extracts cache metrics from response objects; it is integrated into both _handle_converse and _handle_converse_stream to record span attributes and metrics; and test code with helper functions validates both cache-write and cache-read scenarios. The implementation parses cache_read_input_tokens and cache_creation_input_tokens from the response body and emits them as span attributes and metrics, achieving feature parity with the existing invoke_model instrumentation. The VCR cassette for the test still needs to be re-recorded with valid AWS credentials (it currently captures a 403 error), but the test structure and assertions are in place, as noted in the PR comments.

All code changes in this PR are aligned with issue #3337: the modifications to __init__.py add the necessary imports and function calls for prompt caching handling, the changes to prompt_caching.py implement the extraction logic, the additions to test_titan.py provide the test code and helpers validating the cache-write and cache-read flows, and the cassette file supplies the test fixture. No unrelated modifications, refactorings, or changes to other components are present in the changeset.


Member

@nirga nirga left a comment


Sorry about the delay @AlanPonnachan! I was OOO and this fell between the cracks (team missed it for some reason). I'm back now! Left a small comment reg. the test - lmk if you need any assistance

Quoted lines from the test file under review:

    assert data_point.value == usage["cache_creation_input_tokens"]


@pytest.mark.vcr
Member


@AlanPonnachan can you record a real test case here, like the other tests we have?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

59-85: Span attribute overwrite when both cache operations occur.

If a response contains both cache_read_input_tokens > 0 and cache_creation_input_tokens > 0, the second block (line 82) will overwrite the CACHED attribute set by the first block (line 68). This means only "write" will be recorded on the span, losing the "read" information.

Consider one of these approaches:

  • Accumulate both values in a single attribute (e.g., "read,write")
  • Use separate span attributes for read and write operations
  • Prioritize one operation type if simultaneous occurrence is expected to be rare

Apply this diff to record both operations:

     if read_cached_tokens > 0:
         if metric_params.prompt_caching:
             metric_params.prompt_caching.add(
                 read_cached_tokens,
                 attributes={
                     **base_attrs,
                     CacheSpanAttrs.TYPE: "read",
                 },
             )
-        span.set_attribute(CacheSpanAttrs.CACHED, "read")
+        span.set_attribute(f"{CacheSpanAttrs.CACHED}.read", True)
         span.set_attribute(
             "gen_ai.usage.cache_read_input_tokens", read_cached_tokens
         )
 
     if write_cached_tokens > 0:
         if metric_params.prompt_caching:
             metric_params.prompt_caching.add(
                 write_cached_tokens,
                 attributes={
                     **base_attrs,
                     CacheSpanAttrs.TYPE: "write",
                 },
             )
-        span.set_attribute(CacheSpanAttrs.CACHED, "write")
+        span.set_attribute(f"{CacheSpanAttrs.CACHED}.write", True)
         span.set_attribute(
             "gen_ai.usage.cache_creation_input_tokens", write_cached_tokens
         )
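The overwrite described in this comment can be demonstrated with a minimal fake span (illustrative only; the attribute key here stands in for CacheSpanAttrs.CACHED):

```python
class FakeSpan:
    """Minimal stand-in that records attributes the way a span would."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


span = FakeSpan()
# A response with both cache reads and writes runs both blocks in turn:
span.set_attribute("gen_ai.prompt_caching", "read")   # first block
span.set_attribute("gen_ai.prompt_caching", "write")  # second block overwrites
# Only "write" survives; the "read" state is lost.
```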
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

14-43: Consider adding is_recording check for consistency.

The new prompt_caching_converse_handling function includes an is_recording() check (line 52 in the new function), but this existing function doesn't. For consistency and to avoid unnecessary span operations when not recording, consider adding the same check here.

Apply this diff:

 def prompt_caching_handling(headers, vendor, model, metric_params):
     base_attrs = {
         "gen_ai.system": vendor,
         "gen_ai.response.model": model,
     }
     span = trace.get_current_span()
-    if not isinstance(span, trace.Span):
+    if not isinstance(span, trace.Span) or not span.is_recording():
         return
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b01af56 and 277499e.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (2)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (2)
  • prompt_caching_converse_handling (46-85)
  • prompt_caching_handling (14-43)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py (1)
  • guardrail_converse (153-176)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py

30-30: Create your own exception

(TRY002)


30-30: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (6)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

46-57: Good defensive programming with is_recording check.

The addition of span.is_recording() check and the conditional metric recording (if metric_params.prompt_caching:) are good defensive practices that prevent unnecessary work when telemetry is disabled.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3)

25-28: LGTM: Clean import of prompt caching handlers.

The import statements are correctly structured and bring in both the new prompt_caching_converse_handling and the existing prompt_caching_handling functions.


357-361: LGTM: Proper integration in synchronous Converse handler.

The call to prompt_caching_converse_handling is correctly placed after guardrail_converse and receives the appropriate parameters (response, provider, model, metric_params). This mirrors the pattern used in _handle_call (line 332) for the invoke_model API.


399-406: LGTM: Correct metadata extraction for streaming events.

The metadata is properly extracted from the event before being passed to prompt_caching_converse_handling. This ensures that cache metrics are recorded when the final metadata event arrives in the stream.
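The streaming dispatch described here can be sketched as follows; the "metadata" event key and dict-shaped events are assumptions based on this review, not verified Bedrock stream types:

```python
def handle_stream_event(event, on_metadata):
    """Forward the final metadata event of a Converse stream to a handler.

    Assumes dict-shaped events with a "metadata" key carrying the usage
    payload, as described in this review; returns True when dispatched.
    """
    metadata = event.get("metadata")
    if metadata is not None:
        # The usage payload in the metadata event carries cache token counts.
        on_metadata(metadata)
        return True
    return False
```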

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (2)

33-47: Verify metric assertion logic handles all scenarios.

The current assertion logic iterates through all data points and validates each one individually. However, it doesn't verify that the expected number of data points exist (e.g., exactly one for "read" and one for "write"). This could mask issues where duplicate or missing data points occur.

Consider adding explicit checks:

def assert_metric(reader, usage):
    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    assert len(resource_metrics) > 0

    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
    
    read_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "read"]
    write_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "write"]
    
    if usage["cache_read_input_tokens"] > 0:
        assert len(read_points) == 1
        assert read_points[0].value == usage["cache_read_input_tokens"]
    
    if usage["cache_creation_input_tokens"] > 0:
        assert len(write_points) == 1
        assert write_points[0].value == usage["cache_creation_input_tokens"]

50-69: Test structure is sound but VCR cassette needs recording.

The test correctly validates both cache creation (first call) and cache reading (second call) scenarios. The cumulative metric workaround at line 68 is necessary due to how the InMemoryMetricReader accumulates counter values across reads.

Note: As mentioned in the PR description, this test requires a VCR cassette to be recorded with actual AWS credentials.

Ensure the VCR cassette is recorded before merging, as the test will fail without it.
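The cumulative workaround mentioned above can be illustrated with a toy counter. This is a pure-Python stand-in, not the OpenTelemetry SDK: a cumulative counter read through an in-memory reader reports the running total at each collection, so an assertion made after a second identical call sees the doubled value.

```python
class CumulativeCounter:
    """Toy stand-in for a cumulative metric counter (not the OTel SDK)."""

    def __init__(self):
        self._total = 0

    def add(self, amount):
        self._total += amount

    def collect(self):
        # Cumulative temporality: each collection reports the running total.
        return self._total


counter = CumulativeCounter()
counter.add(100)                # first call writes 100 cached tokens
first_read = counter.collect()
counter.add(100)                # second call records the same tokens again
second_read = counter.collect()
```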

Comment on lines 24 to 30
def get_metric(resource_metrics, name):
    for rm in resource_metrics:
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == name:
                    return metric
    raise Exception(f"No metric found with name {name}")


⚠️ Potential issue | 🟡 Minor

Use AssertionError for test helper.

The generic Exception should be replaced with AssertionError since this is a test assertion helper and test failures should propagate as assertion failures.

As per static analysis hints.

Apply this diff:

 def get_metric(resource_metrics, name):
     for rm in resource_metrics:
         for sm in rm.scope_metrics:
             for metric in sm.metrics:
                 if metric.name == name:
                     return metric
-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

16-22: Consider using a more specific exception type.

While the function logic is correct, using a generic Exception is flagged by Ruff. For test utilities, this is acceptable, but you could optionally create a custom exception class or use AssertionError for better clarity in test failures.

Apply this diff if you'd like to improve the exception handling:

-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 277499e and 1965083.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM! Imports are appropriate for caching support.

The new imports are necessary for testing prompt caching functionality and are properly scoped.


25-46: LGTM! Metric assertion logic is sound.

The function correctly handles cumulative metrics by tracking both read and write data points separately and asserting the appropriate one based on the is_read parameter.


1076-1090: Verify VCR cassette recording.

This test requires a VCR cassette to be recorded with actual AWS credentials. According to the PR comments, you've requested that a maintainer with AWS access run and record this cassette. Ensure this is completed before merging.


1077-1130: Comprehensive caching test with good coverage.

The test thoroughly validates both cache write and read scenarios, including:

  • Usage token assertions from API responses
  • Span attribute validation for cache state
  • Metric emission verification

The test structure is clear and follows best practices.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1113-1114: Resolved: replaced hardcoded attr strings with semconv constants.

Good use of SpanAttributes for cache token attributes; previous review concern addressed.

Also applies to: 1121-1122

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-14: Avoid depending on instrumentation internals in tests.

Importing PromptCaching and CacheSpanAttrs from instrumentation modules couples tests to internal paths. Prefer stable semconv constants or add local fallbacks for attribute keys to reduce breakage risk on refactors.


16-22: Use pytest.fail and drop bare Exception; simplify lookup.

Replace manual loops + bare Exception with next(...) and pytest.fail for clearer failures and to satisfy TRY002/TRY003.

 def get_metric(resource_metrics, name):
-    for rm in resource_metrics:
-        for sm in rm.scope_metrics:
-            for metric in sm.metrics:
-                if metric.name == name:
-                    return metric
-    raise Exception(f"No metric found with name {name}")
+    try:
+        return next(
+            m
+            for rm in resource_metrics
+            for sm in rm.scope_metrics
+            for m in sm.metrics
+            if m.name == name
+        )
+    except StopIteration:
+        pytest.fail(f"Metric '{name}' not found in collected metrics")

25-46: Make metric assertion robust to cumulative/export cycles.

Sum data points by type to tolerate multiple collections and exporter behavior; add clearer empty-metrics assertion.

 def assert_metric(reader, usage, is_read=False):
-    metrics_data = reader.get_metrics_data()
-    resource_metrics = metrics_data.resource_metrics
-    assert len(resource_metrics) > 0
-
-    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
-    # This check is now more specific to handle cumulative metrics
-    found_read = False
-    found_write = False
-    for data_point in m.data.data_points:
-        if data_point.attributes[CacheSpanAttrs.TYPE] == "read":
-            found_read = True
-            assert data_point.value == usage["cache_read_input_tokens"]
-        elif data_point.attributes[CacheSpanAttrs.TYPE] == "write":
-            found_write = True
-            assert data_point.value == usage["cache_creation_input_tokens"]
-    
-    if is_read:
-        assert found_read
-    else:
-        assert found_write
+    metrics_data = reader.get_metrics_data()
+    resource_metrics = metrics_data.resource_metrics
+    assert resource_metrics, "No resource metrics collected"
+
+    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
+    read = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "read"
+    )
+    write = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "write"
+    )
+    if is_read:
+        assert read == usage["cache_read_input_tokens"]
+    else:
+        assert write == usage["cache_creation_input_tokens"]

1106-1123: Don’t rely on span ordering; select by attribute.

Use CacheSpanAttrs.CACHED to locate write/read spans to avoid flakiness.

-    # Assertions for the first span (cache write)
-    span_write = spans[0]
-    assert span_write.name == "bedrock.converse"
-    attributes_write = span_write.attributes
+    # Assertions for the first span (cache write)
+    span_write = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "write")
+    assert span_write.name == "bedrock.converse"
+    attributes_write = span_write.attributes
@@
-    # Assertions for the second span (cache read)
-    span_read = spans[1]
-    assert span_read.name == "bedrock.converse"
-    attributes_read = span_read.attributes
+    # Assertions for the second span (cache read)
+    span_read = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "read")
+    assert span_read.name == "bedrock.converse"
+    attributes_read = span_read.attributes
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1965083 and 750a59e.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1124-1130: Metrics assertions look good after helper hardening.

Cumulative usage check for read/write is aligned with counter semantics; ok once assert_metric is updated as suggested.

Optionally assert there are exactly two data points (read/write) for the metric in this test context after moving fixture for isolation.

@AlanPonnachan
Author

Hi @nirga , I've updated the PR with the changes we discussed. I moved the logic into a new test in test_titan.py and deleted the old file.

It should be ready for the cassette to be recorded now. Thanks again for your help!

@nirga
Member

nirga commented Oct 30, 2025

Thanks @AlanPonnachan! Can you record the test? poetry run pytest --record-mode=once

@AlanPonnachan
Author

Hi @nirga, thank you again for all your guidance.

I tried to run the recording command locally as you requested. As expected, since I don't have AWS credentials, the test run fails with an UnrecognizedClientException (invalid security token). This confirms that the test is now correctly set up and is just waiting for a real recording to be generated.

I believe the PR is now ready from a code perspective. Would you be able to run the recording on your end when you have a moment?

Thank you so much for your help


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 750a59e and 419ae0f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1077-1077: Unused function argument: instrument_legacy

(ARG001)

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM!

The imports for PromptCaching and CacheSpanAttrs are properly added to support the caching test scenario.


16-22: Helper function is clear and appropriate for test utilities.

The get_metric helper correctly searches through resource metrics. While static analysis suggests using a custom exception class, a generic Exception is acceptable for test utilities where the primary goal is clear failure messages during test debugging.


25-45: LGTM!

The assert_metric helper correctly validates prompt caching metrics for both read and write scenarios. The logic properly handles cumulative metrics by iterating through all data points and checking that the appropriate cache type was found.


1076-1129: Test logic is well-structured and correctly validates caching behavior.

The test properly exercises both cache-write and cache-read paths:

  1. First call writes to cache (cache_creation_input_tokens > 0)
  2. Second call reads from cache (cache_read_input_tokens > 0)
  3. Span attributes and metrics are validated for both scenarios

Good use of semantic convention constants (SpanAttributes.*) instead of hardcoded strings, addressing previous review feedback.

Note: The instrument_legacy fixture is flagged as unused by static analysis, but it's required for side effects to set up the Bedrock instrumentation before the test runs.

The test will work correctly once the VCR cassette is properly recorded with valid AWS credentials (see cassette file review comment).

Comment on lines +1 to +47
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": [{"text": "Hello, this is a test
      prompt for caching."}]}], "inferenceConfig": {"maxTokens": 50}, "additionalModelRequestFields":
      {"cacheControl": {"type": "ephemeral"}}}'
    headers:
      Content-Length:
      - '211'
      Content-Type:
      - !!binary |
        YXBwbGljYXRpb24vanNvbg==
      User-Agent:
      - !!binary |
        Qm90bzMvMS4zNC4xNDUgbWQvQm90b2NvcmUjMS4zNC4xNDUgdWEvMi4wIG9zL3dpbmRvd3MjMTAg
        bWQvYXJjaCNhbWQ2NCBsYW5nL3B5dGhvbiMzLjExLjggbWQvcHlpbXBsI0NQeXRob24gY2ZnL3Jl
        dHJ5LW1vZGUjbGVnYWN5IEJvdG9jb3JlLzEuMzQuMTQ1
      X-Amz-Date:
      - !!binary |
        MjAyNTEwMzBUMTY1MTI4Wg==
      amz-sdk-invocation-id:
      - !!binary |
        ZTFmM2VjZTMtM2VlMC00ZGZiLWE2MWEtYjBiYmU4M2U2NDFm
      amz-sdk-request:
      - !!binary |
        YXR0ZW1wdD0x
    method: POST
    uri: https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-text-express-v1/converse
  response:
    body:
      string: '{"message":"The security token included in the request is invalid."}'
    headers:
      Connection:
      - keep-alive
      Content-Length:
      - '68'
      Content-Type:
      - application/json
      Date:
      - Thu, 30 Oct 2025 16:51:30 GMT
      x-amzn-ErrorType:
      - UnrecognizedClientException:http://internal.amazon.com/coral/com.amazon.coral.service/
      x-amzn-RequestId:
      - f0fdf86b-d85a-4f16-8713-1b9b5fccbe25
    status:
      code: 403
      message: Forbidden
version: 1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Cassette contains authentication failure instead of valid caching response.

The cassette records a 403 Forbidden response with "The security token included in the request is invalid" rather than a successful API response with caching metadata. This means the test test_titan_converse_with_caching will fail because:

  • Line 1086-1088: Expects usage_write["cache_creation_input_tokens"] > 0, but the response has no usage field
  • Line 1091-1099: Expects a second successful call with cache read data

Per the PR objectives, you need AWS credentials to record a valid cassette. Ask the maintainer to run `poetry run pytest --record-mode=once` and commit the generated cassette.


🏁 Script executed:

poetry run pytest packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py::test_titan_converse_with_caching --record-mode=once

Length of output: 44



Once I locate the test file, I'll examine the specific test to verify the assertions mentioned in the review comment.

---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect the test_titan.py file and locate the test_titan_converse_with_caching test
fd "test_titan\.py" --type f
```

Length of output: 77



🏁 Script executed:

#!/bin/bash
# Inspect the test function test_titan_converse_with_caching
ast-grep --pattern 'def test_titan_converse_with_caching($_) {
  $$$
}'

Length of output: 0



🏁 Script executed:

#!/bin/bash
# Search for the test function and display context
rg -n "def test_titan_converse_with_caching" --type py -A 50

Length of output: 6541


Cassette is invalid and will cause test failure; AWS credentials required to regenerate.

The cassette indeed contains a 403 Forbidden response instead of a valid caching response. The test will fail at line 1086 when attempting to access response_write["usage"] from the error response, which lacks this field. The cassette must be regenerated with valid AWS credentials to record a successful response with cache metadata.
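For orientation, this is the shape the instrumentation needs from a *successful* Converse response: a `usage` block carrying cache token counts. The camelCase field names below are assumptions about the Converse response schema and should be checked against a real recording:

```python
# Hedged sketch: pull cache token counts out of a Converse response body.
# The "cacheReadInputTokens"/"cacheWriteInputTokens" field names are assumed,
# not confirmed by this PR's cassette (which only recorded a 403).
def extract_cache_usage(response):
    usage = response.get("usage", {})
    return (
        usage.get("cacheReadInputTokens", 0),   # assumed field name
        usage.get("cacheWriteInputTokens", 0),  # assumed field name
    )

# A valid cassette would record something like this instead of the error body:
sample = {
    "usage": {
        "inputTokens": 10,
        "outputTokens": 5,
        "cacheReadInputTokens": 0,
        "cacheWriteInputTokens": 42,
    }
}
```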

🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
lines 1-47: the cassette records a 403 UnrecognizedClientException (invalid AWS
credentials) instead of a successful converse response, causing tests to crash
when they expect response_write["usage"]; regenerate or replace this cassette by
re-running the test recording with valid AWS credentials so the cassette
contains a successful response that includes cache/usage metadata, and then
commit the new cassette (or remove the failing cassette and update the test to
use a properly recorded fixture).

@nirga
Member

nirga commented Oct 30, 2025

Thanks @AlanPonnachan, I tried running it locally and the test failed:

FAILED tests/traces/test_titan.py::test_titan_converse_with_caching - botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: The model returned the following errors: Malformed input request: extraneous key [cacheControl] is not permitted, please reformat your input and try again.
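The ValidationException is consistent with the Converse API not accepting `cacheControl` under `additionalModelRequestFields`: to the best of my knowledge, prompt caching on Converse is requested via `cachePoint` content blocks, and only select models (e.g. Anthropic Claude and Amazon Nova variants, not Titan) support it. A hedged sketch of that request shape, where the model ID and the `cachePoint` field are assumptions to verify against the Bedrock docs:

```python
# Hypothetical Converse request using a cachePoint content block instead of
# additionalModelRequestFields["cacheControl"]. The model ID is an assumed
# cache-capable model; Titan models reject caching parameters entirely.
request = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumption
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "Long shared context that is worth caching..."},
                {"cachePoint": {"type": "default"}},  # assumed cache-boundary marker
                {"text": "Question specific to this call."},
            ],
        }
    ],
    "inferenceConfig": {"maxTokens": 50},
}
```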


Development

Successfully merging this pull request may close these issues.

🚀 Feature: Add prompt caching for Bedrock Converse

3 participants