fix(bedrock): Add prompt caching support for Converse API #3390
base: main
Conversation
Caution

Changes requested ❌

Reviewed everything up to 4fa3792 in 2 minutes and 2 seconds. Click for details.

- Reviewed 156 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py:359

   Draft comment: Good integration of prompt_caching_converse_handling in _handle_converse. In the streaming handler (lines ~400), note that if both read and write tokens are present, the span attribute may be overwritten. Ensure this is the intended behavior.

   Reason this comment was not posted: comment was on unchanged code.

2. packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py:56

   Draft comment: The test correctly validates prompt caching metrics for the Converse API. The cumulative workaround for metric values indicates the underlying counter is cumulative. Consider resetting metrics between tests to avoid cross-test interference if possible.

   Reason this comment was not posted: after close inspection, the draft comment was judged likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment has two parts: 1) an observation about the cumulative nature of the metrics, which is already documented in the code comments, and 2) a speculative suggestion about resetting metrics that isn't clearly necessary since the current approach works. No actual problems or required changes were identified; resetting metrics could be valid if there were evidence of cross-test interference, but there is none, and the current workaround is intentional and functional. The comment was deleted as primarily observational and speculative.
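One way to sidestep the cumulative-counter concern, without resetting readers between tests, is to diff successive cumulative readings into per-test increments. A minimal sketch, assuming plain integer readings rather than the project's actual metric objects (the helper name is hypothetical):

```python
# Hypothetical helper (not the project's API): turn successive cumulative
# counter readings into per-interval increments, so each test can assert
# on its own delta instead of the running total.
def cumulative_to_delta(readings):
    deltas = []
    previous = 0
    for value in readings:
        deltas.append(value - previous)
        previous = value
    return deltas

# Two tests each caching 128 tokens produce cumulative readings 128, 256.
print(cumulative_to_delta([128, 256]))  # [128, 128]
```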
...pentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
Outdated
…nstrumentation/bedrock/prompt_caching.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
hey @AlanPonnachan - looks like tests are failing, can you take a look?
….com/AlanPonnachan/openllmetry into feat-bedrock-converse-prompt-caching
Hi @nirga, I've resolved the lint test failures. The remaining failing test requires a recorded cassette; as I don't have access to an active AWS account, I'm unable to generate it. Thanks for your help!
Sure @AlanPonnachan, will do it - can you fix the small comment I wrote? I'll then run it locally and record a test. BTW - if you can rely on existing converse tests it might be easier
...pentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
Thanks for the great suggestion and for your willingness to help record the test! I agree that relying on an existing test is a cleaner approach. Before I push the changes, I just want to confirm my plan sounds good to you. Does this plan look good? If so, I'll go ahead and make the changes.
@nirga any update on this?
Note

Other AI code review bot(s) detected: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Added response-based prompt caching handling for Bedrock Converse and invoked it from both synchronous and streaming Converse flows to record span attributes and emit cache read/write metrics.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant BedrockInstrumentor
    participant BedrockAPI
    participant PromptCacheHandler
    participant Span
    participant Metrics
    Client->>BedrockInstrumentor: invoke_converse(...)
    BedrockInstrumentor->>BedrockAPI: call Converse
    BedrockAPI-->>BedrockInstrumentor: response (includes usage_metadata)
    BedrockInstrumentor->>PromptCacheHandler: prompt_caching_converse_handling(response, vendor, model, metric_params)
    rect rgb(232, 244, 255)
        Note over PromptCacheHandler: extract cache_read/cache_creation from response<br/>set span attrs and gen_ai.usage.* token attributes
        PromptCacheHandler->>Span: set CACHED ("read"/"write") and token attrs
    end
    alt cache_read > 0
        PromptCacheHandler->>Metrics: emit cache read metric
    end
    alt cache_creation > 0
        PromptCacheHandler->>Metrics: emit cache write metric
    end
    BedrockInstrumentor-->>Client: return response
```
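The extraction step in the walkthrough can be sketched in isolation. This is an illustrative reconstruction, not the PR's actual handler: the `usage` key names (`cacheReadInputTokens` / `cacheWriteInputTokens`) and the returned shapes are assumptions, and real code would write to an OpenTelemetry span and counter instead of returning plain dicts.

```python
# Illustrative sketch of the cache-telemetry extraction (not the PR's code).
# Assumed usage keys: cacheReadInputTokens / cacheWriteInputTokens.
def extract_cache_telemetry(response, vendor, model):
    usage = response.get("usage", {})
    read = usage.get("cacheReadInputTokens", 0)
    write = usage.get("cacheWriteInputTokens", 0)
    base_attrs = {"gen_ai.system": vendor, "gen_ai.response.model": model}
    span_attrs = {}       # stand-in for span.set_attribute calls
    metric_points = []    # stand-in for counter.add(value, attributes) calls
    if read > 0:
        metric_points.append((read, {**base_attrs, "type": "read"}))
        span_attrs["gen_ai.usage.cache_read_input_tokens"] = read
    if write > 0:
        metric_points.append((write, {**base_attrs, "type": "write"}))
        span_attrs["gen_ai.usage.cache_creation_input_tokens"] = write
    return span_attrs, metric_points

# A response whose prompt was just written to the cache:
attrs, points = extract_cache_telemetry(
    {"usage": {"cacheReadInputTokens": 0, "cacheWriteInputTokens": 42}},
    "aws.bedrock",
    "amazon.titan-text-express-v1",
)
```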
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested reviewers

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
[pre_merge_check_pass] The PR implementation addresses the core coding requirements from issue #3337.

[pre_merge_check_pass] All code changes in this PR are directly aligned with the requirements outlined in issue #3337.

✨ Finishing touches
Sorry about the delay @AlanPonnachan! I was OOO and this fell between the cracks (team missed it for some reason). I'm back now! Left a small comment reg. the test - lmk if you need any assistance
```python
assert data_point.value == usage["cache_creation_input_tokens"]
...
@pytest.mark.vcr
```
@AlanPonnachan can you record a real test case here, like the other tests we have?
Actionable comments posted: 1
♻️ Duplicate comments (1)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

59-85: Span attribute overwrite when both cache operations occur.

If a response contains both `cache_read_input_tokens > 0` and `cache_creation_input_tokens > 0`, the second block (line 82) will overwrite the `CACHED` attribute set by the first block (line 68). This means only "write" will be recorded on the span, losing the "read" information. Consider one of these approaches:

- Accumulate both values in a single attribute (e.g., `"read,write"`)
- Use separate span attributes for read and write operations
- Prioritize one operation type if simultaneous occurrence is expected to be rare

Apply this diff to record both operations:

```diff
 if read_cached_tokens > 0:
     if metric_params.prompt_caching:
         metric_params.prompt_caching.add(
             read_cached_tokens,
             attributes={
                 **base_attrs,
                 CacheSpanAttrs.TYPE: "read",
             },
         )
-    span.set_attribute(CacheSpanAttrs.CACHED, "read")
+    span.set_attribute(f"{CacheSpanAttrs.CACHED}.read", True)
     span.set_attribute(
         "gen_ai.usage.cache_read_input_tokens", read_cached_tokens
     )
 if write_cached_tokens > 0:
     if metric_params.prompt_caching:
         metric_params.prompt_caching.add(
             write_cached_tokens,
             attributes={
                 **base_attrs,
                 CacheSpanAttrs.TYPE: "write",
             },
         )
-    span.set_attribute(CacheSpanAttrs.CACHED, "write")
+    span.set_attribute(f"{CacheSpanAttrs.CACHED}.write", True)
     span.set_attribute(
         "gen_ai.usage.cache_creation_input_tokens", write_cached_tokens
     )
```
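The overwrite described here is easy to demonstrate with a dict standing in for span attributes (illustrative only; the attribute keys are placeholders, not the project's `CacheSpanAttrs` values):

```python
# Dict stand-in for span attributes; the real code calls span.set_attribute.
span_attrs = {}

# Shared key: the later write clobbers the earlier one.
span_attrs["cache.cached"] = "read"
span_attrs["cache.cached"] = "write"
assert span_attrs["cache.cached"] == "write"  # "read" is lost

# Separate keys, as in the suggested diff: both operations survive.
span_attrs["cache.cached.read"] = True
span_attrs["cache.cached.write"] = True
```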
🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

14-43: Consider adding is_recording check for consistency.

The new `prompt_caching_converse_handling` function includes an `is_recording()` check (line 52 in the new function), but this existing function doesn't. For consistency, and to avoid unnecessary span operations when not recording, consider adding the same check here. Apply this diff:

```diff
 def prompt_caching_handling(headers, vendor, model, metric_params):
     base_attrs = {
         "gen_ai.system": vendor,
         "gen_ai.response.model": model,
     }
     span = trace.get_current_span()
-    if not isinstance(span, trace.Span):
+    if not isinstance(span, trace.Span) or not span.is_recording():
         return
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

Disabled knowledge base sources:

- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📒 Files selected for processing (3)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py — 📄 CodeRabbit inference engine (CLAUDE.md)

- Store API keys only in environment variables/secure vaults; never hardcode secrets in code
- Use Flake8 for code linting and adhere to its rules

Files:

- packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py

🧬 Code graph analysis (2)

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (3)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1): PromptCaching (470-472)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1): CacheSpanAttrs (9-11)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1): test_context (35-47)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (2)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (2): prompt_caching_converse_handling (46-85), prompt_caching_handling (14-43)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py (1): guardrail_converse (153-176)

🪛 Ruff (0.14.2)

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py

- 30-30: Create your own exception (TRY002)
- 30-30: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (6)

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

46-57: Good defensive programming with is_recording check.

The addition of the `span.is_recording()` check and the conditional metric recording (`if metric_params.prompt_caching:`) are good defensive practices that prevent unnecessary work when telemetry is disabled.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3)

25-28: LGTM: Clean import of prompt caching handlers.

The import statements are correctly structured and bring in both the new `prompt_caching_converse_handling` and the existing `prompt_caching_handling` functions.

357-361: LGTM: Proper integration in synchronous Converse handler.

The call to `prompt_caching_converse_handling` is correctly placed after `guardrail_converse` and receives the appropriate parameters (response, provider, model, metric_params). This mirrors the pattern used in `_handle_call` (line 332) for the invoke_model API.

399-406: LGTM: Correct metadata extraction for streaming events.

The metadata is properly extracted from the event before being passed to `prompt_caching_converse_handling`. This ensures that cache metrics are recorded when the final metadata event arrives in the stream.

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (2)
33-47: Verify metric assertion logic handles all scenarios.

The current assertion logic iterates through all data points and validates each one individually. However, it doesn't verify that the expected number of data points exist (e.g., exactly one for "read" and one for "write"). This could mask issues where duplicate or missing data points occur. Consider adding explicit checks:

```python
def assert_metric(reader, usage):
    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    assert len(resource_metrics) > 0

    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
    read_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "read"]
    write_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "write"]
    if usage["cache_read_input_tokens"] > 0:
        assert len(read_points) == 1
        assert read_points[0].value == usage["cache_read_input_tokens"]
    if usage["cache_creation_input_tokens"] > 0:
        assert len(write_points) == 1
        assert write_points[0].value == usage["cache_creation_input_tokens"]
```
50-69: Test structure is sound but VCR cassette needs recording.

The test correctly validates both cache creation (first call) and cache reading (second call) scenarios. The cumulative metric workaround at line 68 is necessary due to how the InMemoryMetricReader accumulates counter values across reads. Note: as mentioned in the PR description, this test requires a VCR cassette to be recorded with actual AWS credentials. Ensure the cassette is recorded before merging, as the test will fail without it.
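When the cassette is recorded, auth material must be scrubbed per the repo guideline on cassettes. A sketch of the filter configuration — in a real conftest.py this dict would be returned from pytest-recording's `vcr_config` fixture, and the exact header list is an assumption about what a SigV4-signed boto3 request carries:

```python
# Sketch of cassette scrubbing for the recording step (assumed header names).
def make_vcr_config():
    return {
        "filter_headers": [
            ("authorization", "REDACTED"),
            ("x-amz-security-token", "REDACTED"),
        ],
        # Record on the first run (with real AWS credentials), replay afterwards.
        "record_mode": "once",
    }
```

In the real conftest this would typically be wrapped as `@pytest.fixture def vcr_config(): ...` so pytest-recording picks it up automatically.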
```python
def get_metric(resource_metrics, name):
    for rm in resource_metrics:
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == name:
                    return metric
    raise Exception(f"No metric found with name {name}")
```
Use AssertionError for test helper.

The generic `Exception` should be replaced with `AssertionError`, since this is a test assertion helper and test failures should propagate as assertion failures (as per static analysis hints). Apply this diff:

```diff
 def get_metric(resource_metrics, name):
     for rm in resource_metrics:
         for sm in rm.scope_metrics:
             for metric in sm.metrics:
                 if metric.name == name:
                     return metric
-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")
```
+ raise AssertionError(f"No metric found with name {name}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def get_metric(resource_metrics, name):
    for rm in resource_metrics:
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == name:
                    return metric
    raise AssertionError(f"No metric found with name {name}")
```
🤖 Prompt for AI Agents

In packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py around lines 24 to 30, the helper raises a generic Exception when a metric is not found; change it to raise an AssertionError (or use assert False) with the same message so test failures surface as assertion failures. Update the raise statement accordingly and keep the message f"No metric found with name {name}" intact.
Actionable comments posted: 2

🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

16-22: Consider using a more specific exception type.

While the function logic is correct, using a generic `Exception` is flagged by Ruff. For test utilities this is acceptable, but you could optionally create a custom exception class or use `AssertionError` for better clarity in test failures. Apply this diff if you'd like to improve the exception handling:

```diff
-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

Disabled knowledge base sources:

- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📒 Files selected for processing (1)

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py — 📄 CodeRabbit inference engine (CLAUDE.md)

- Store API keys only in environment variables/secure vaults; never hardcode secrets in code
- Use Flake8 for code linting and adhere to its rules

Files:

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

🧬 Code graph analysis (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1): PromptCaching (470-472)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1): CacheSpanAttrs (9-11)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1): test_context (35-47)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1): get_finished_spans (40-43)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-261)

🪛 Ruff (0.14.2)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

- 22-22: Create your own exception (TRY002)
- 22-22: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (4)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM! Imports are appropriate for caching support.

The new imports are necessary for testing prompt caching functionality and are properly scoped.

25-46: LGTM! Metric assertion logic is sound.

The function correctly handles cumulative metrics by tracking both read and write data points separately and asserting the appropriate one based on the `is_read` parameter.

1076-1090: Verify VCR cassette recording.

This test requires a VCR cassette to be recorded with actual AWS credentials. According to the PR comments, you've requested that a maintainer with AWS access run and record this cassette. Ensure this is completed before merging.

1077-1130: Comprehensive caching test with good coverage.

The test thoroughly validates both cache write and read scenarios, including:

- Usage token assertions from API responses
- Span attribute validation for cache state
- Metric emission verification

The test structure is clear and follows best practices.
Actionable comments posted: 1

♻️ Duplicate comments (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1113-1114: Resolved: replaced hardcoded attr strings with semconv constants.

Good use of SpanAttributes for cache token attributes; the previous review concern is addressed. Also applies to lines 1121-1122.
🧹 Nitpick comments (4)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-14: Avoid depending on instrumentation internals in tests.

Importing PromptCaching and CacheSpanAttrs from instrumentation modules couples tests to internal paths. Prefer stable semconv constants, or add local fallbacks for attribute keys, to reduce breakage risk on refactors.

16-22: Use pytest.fail and drop bare Exception; simplify lookup.

Replace the manual loops and bare `Exception` with `next(...)` and `pytest.fail` for clearer failures and to satisfy TRY002/TRY003:

```diff
 def get_metric(resource_metrics, name):
-    for rm in resource_metrics:
-        for sm in rm.scope_metrics:
-            for metric in sm.metrics:
-                if metric.name == name:
-                    return metric
-    raise Exception(f"No metric found with name {name}")
+    try:
+        return next(
+            m
+            for rm in resource_metrics
+            for sm in rm.scope_metrics
+            for m in sm.metrics
+            if m.name == name
+        )
+    except StopIteration:
+        pytest.fail(f"Metric '{name}' not found in collected metrics")
```
25-46: Make metric assertion robust to cumulative/export cycles.

Sum data points by type to tolerate multiple collections and exporter behavior, and add a clearer empty-metrics assertion:

```diff
 def assert_metric(reader, usage, is_read=False):
     metrics_data = reader.get_metrics_data()
     resource_metrics = metrics_data.resource_metrics
-    assert len(resource_metrics) > 0
-
-    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
-    # This check is now more specific to handle cumulative metrics
-    found_read = False
-    found_write = False
-    for data_point in m.data.data_points:
-        if data_point.attributes[CacheSpanAttrs.TYPE] == "read":
-            found_read = True
-            assert data_point.value == usage["cache_read_input_tokens"]
-        elif data_point.attributes[CacheSpanAttrs.TYPE] == "write":
-            found_write = True
-            assert data_point.value == usage["cache_creation_input_tokens"]
-
-    if is_read:
-        assert found_read
-    else:
-        assert found_write
+    assert resource_metrics, "No resource metrics collected"
+
+    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
+    read = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "read"
+    )
+    write = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "write"
+    )
+    if is_read:
+        assert read == usage["cache_read_input_tokens"]
+    else:
+        assert write == usage["cache_creation_input_tokens"]
```
1106-1123: Don't rely on span ordering; select by attribute.

Use CacheSpanAttrs.CACHED to locate the write and read spans and avoid flakiness:

```diff
     # Assertions for the first span (cache write)
-    span_write = spans[0]
+    span_write = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "write")
     assert span_write.name == "bedrock.converse"
     attributes_write = span_write.attributes
@@
     # Assertions for the second span (cache read)
-    span_read = spans[1]
+    span_read = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "read")
     assert span_read.name == "bedrock.converse"
     attributes_read = span_read.attributes
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

Disabled knowledge base sources:

- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📒 Files selected for processing (1)

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py — 📄 CodeRabbit inference engine (CLAUDE.md)

- Store API keys only in environment variables/secure vaults; never hardcode secrets in code
- Use Flake8 for code linting and adhere to its rules

Files:

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

🧬 Code graph analysis (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1): PromptCaching (470-472)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1): CacheSpanAttrs (9-11)
- packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1): test_context (35-47)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1): get_finished_spans (40-43)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-261)

🪛 Ruff (0.14.2)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

- 22-22: Create your own exception (TRY002)
- 22-22: Avoid specifying long messages outside the exception class (TRY003)
🔇 Additional comments (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1124-1130: Metrics assertions look good after helper hardening.

The cumulative usage check for read/write is aligned with counter semantics; this is fine once assert_metric is updated as suggested. Optionally, assert there are exactly two data points (read/write) for the metric in this test context after moving the fixture for isolation.
Hi @nirga, I've updated the PR with the changes we discussed. I moved the logic into a new test, so it should be ready for the cassette to be recorded now. Thanks again for your help!
Thanks @AlanPonnachan! Can you record the test?
Hi @nirga, thank you again for all your guidance. I tried to run the recording command locally as you requested. As expected, since I don't have AWS credentials, the test run fails. I believe the PR is now ready from a code perspective. Would you be able to run the recording on your end when you have a moment? Thank you so much for your help!
Actionable comments posted: 1
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

Disabled knowledge base sources:

- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📒 Files selected for processing (3)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml (1 hunks)
- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py

🧰 Additional context used

📓 Path-based instructions (2)

**/cassettes/**/*.{yaml,yml,json} — 📄 CodeRabbit inference engine (CLAUDE.md)

- Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

- packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml

**/*.py — 📄 CodeRabbit inference engine (CLAUDE.md)

- Store API keys only in environment variables/secure vaults; never hardcode secrets in code
- Use Flake8 for code linting and adhere to its rules

Files:

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

🧠 Learnings (1)

📚 Learning: 2025-08-17T15:06:48.109Z

Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

- packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

🧬 Code graph analysis (1)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)

- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1): PromptCaching (470-472)
- packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1): CacheSpanAttrs (9-11)
- packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1): reader (37-41)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1): get_finished_spans (40-43)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): SpanAttributes (64-245)

🪛 Ruff (0.14.2)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

- 22-22: Create your own exception (TRY002)
- 22-22: Avoid specifying long messages outside the exception class (TRY003)
- 1077-1077: Unused function argument: instrument_legacy (ARG001)
🔇 Additional comments (4)

packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM!

The imports for `PromptCaching` and `CacheSpanAttrs` are properly added to support the caching test scenario.

16-22: Helper function is clear and appropriate for test utilities.

The `get_metric` helper correctly searches through resource metrics. While static analysis suggests using a custom exception class, a generic `Exception` is acceptable for test utilities where the primary goal is clear failure messages during test debugging.

25-45: LGTM!

The `assert_metric` helper correctly validates prompt caching metrics for both read and write scenarios. The logic properly handles cumulative metrics by iterating through all data points and checking that the appropriate cache type was found.

1076-1129: Test logic is well-structured and correctly validates caching behavior.

The test properly exercises both cache-write and cache-read paths:

- The first call writes to cache (`cache_creation_input_tokens > 0`)
- The second call reads from cache (`cache_read_input_tokens > 0`)
- Span attributes and metrics are validated for both scenarios

Good use of semantic convention constants (`SpanAttributes.*`) instead of hardcoded strings, addressing previous review feedback. Note: the `instrument_legacy` fixture is flagged as unused by static analysis, but it's required for its side effect of setting up the Bedrock instrumentation before the test runs. The test will work correctly once the VCR cassette is properly recorded with valid AWS credentials (see the cassette file review comment).
```yaml
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": [{"text": "Hello, this is a test
      prompt for caching."}]}], "inferenceConfig": {"maxTokens": 50}, "additionalModelRequestFields":
      {"cacheControl": {"type": "ephemeral"}}}'
    headers:
      Content-Length:
      - '211'
      Content-Type:
      - !!binary |
        YXBwbGljYXRpb24vanNvbg==
      User-Agent:
      - !!binary |
        Qm90bzMvMS4zNC4xNDUgbWQvQm90b2NvcmUjMS4zNC4xNDUgdWEvMi4wIG9zL3dpbmRvd3MjMTAg
        bWQvYXJjaCNhbWQ2NCBsYW5nL3B5dGhvbiMzLjExLjggbWQvcHlpbXBsI0NQeXRob24gY2ZnL3Jl
        dHJ5LW1vZGUjbGVnYWN5IEJvdG9jb3JlLzEuMzQuMTQ1
      X-Amz-Date:
      - !!binary |
        MjAyNTEwMzBUMTY1MTI4Wg==
      amz-sdk-invocation-id:
      - !!binary |
        ZTFmM2VjZTMtM2VlMC00ZGZiLWE2MWEtYjBiYmU4M2U2NDFm
      amz-sdk-request:
      - !!binary |
        YXR0ZW1wdD0x
    method: POST
    uri: https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-text-express-v1/converse
  response:
    body:
      string: '{"message":"The security token included in the request is invalid."}'
    headers:
      Connection:
      - keep-alive
      Content-Length:
      - '68'
      Content-Type:
      - application/json
      Date:
      - Thu, 30 Oct 2025 16:51:30 GMT
      x-amzn-ErrorType:
      - UnrecognizedClientException:http://internal.amazon.com/coral/com.amazon.coral.service/
      x-amzn-RequestId:
      - f0fdf86b-d85a-4f16-8713-1b9b5fccbe25
    status:
      code: 403
      message: Forbidden
version: 1
```
🧩 Analysis chain
Cassette contains authentication failure instead of valid caching response.
The cassette records a 403 Forbidden response with "The security token included in the request is invalid" rather than a successful API response with caching metadata. This means the test `test_titan_converse_with_caching` will fail because:
- Lines 1086-1088: Expects `usage_write["cache_creation_input_tokens"] > 0`, but the error response has no `usage` field.
- Lines 1091-1099: Expects a second successful call with cache read data.
Per the PR objectives, you need AWS credentials to record a valid cassette. Ask the maintainer to run:
and commit the generated cassette.
🏁 Script executed:
poetry run pytest packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py::test_titan_converse_with_caching --record-mode=once

Length of output: 44
Once I locate the test file, I'll examine the specific test to verify the assertions mentioned in the review comment.
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and inspect the test_titan.py file and locate the test_titan_converse_with_caching test
fd "test_titan\.py" --type f
```

Length of output: 77
🏁 Script executed:
```shell
#!/bin/bash
# Inspect the test function test_titan_converse_with_caching
ast-grep --pattern 'def test_titan_converse_with_caching($_) {
  $$$
}'
```

Length of output: 0
🏁 Script executed:
```shell
#!/bin/bash
# Search for the test function and display context
rg -n "def test_titan_converse_with_caching" --type py -A 50
```

Length of output: 6541
Cassette is invalid and will cause test failure; AWS credentials required to regenerate.
The cassette indeed contains a 403 Forbidden response instead of a valid caching response. The test will fail at line 1086 when attempting to access `response_write["usage"]` from the error response, which lacks this field. The cassette must be regenerated with valid AWS credentials to record a successful response with cache metadata.
🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
lines 1-47: the cassette records a 403 UnrecognizedClientException (invalid AWS
credentials) instead of a successful converse response, causing tests to crash
when they expect response_write["usage"]; regenerate or replace this cassette by
re-running the test recording with valid AWS credentials so the cassette
contains a successful response that includes cache/usage metadata, and then
commit the new cassette (or remove the failing cassette and update the test to
use a properly recorded fixture).
Thanks @AlanPonnachan, I tried running it locally and the test failed:

```
FAILED tests/traces/test_titan.py::test_titan_converse_with_caching - botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: The model returned the following errors: Malformed input request: extraneous key [cacheControl] is not permitted, please reformat your input and try again.
```
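For reference, the Converse API exposes prompt caching through `cachePoint` content blocks rather than an Anthropic-style `cacheControl` key under `additionalModelRequestFields`, and only certain models (e.g. Anthropic Claude on Bedrock, not `amazon.titan-text-express-v1`) support it. The sketch below builds such a request; the model ID and the usage key names are assumptions to verify against the Bedrock documentation:

```python
# Hypothetical request shape for Converse prompt caching: a "cachePoint"
# content block marks the prefix to cache. The model ID below is an
# assumption (a cache-capable Claude model); Titan models reject caching
# keys, as the failing test run above showed.
request = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": "Hello, this is a test prompt for caching."},
                {"cachePoint": {"type": "default"}},  # cache everything up to here
            ],
        }
    ],
    "inferenceConfig": {"maxTokens": 50},
}

# With valid credentials this would be sent as:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
# and a cache-capable model reports cache activity in response["usage"]
# (e.g. "cacheReadInputTokens" / "cacheWriteInputTokens").
```

Re-recording the cassette against such a model, instead of Titan with `cacheControl`, should avoid the `ValidationException`.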
`feat(instrumentation): ...` or `fix(instrumentation): ...`

Description
This PR introduces prompt caching telemetry for the AWS Bedrock Converse and Converse Stream APIs, bringing feature parity with the existing `invoke_model` instrumentation.

The Converse API reports caching information in the `usage` field of the response body, rather than through HTTP headers. This implementation adds the necessary logic to parse this information and record it as metrics and span attributes.

Changes include:
- Added `prompt_caching_converse_handling` in `prompt_caching.py` to extract `cache_read_input_tokens` and `cache_creation_input_tokens` from the response body.
- `__init__.py`: the new function is now called from `_handle_converse` and `_handle_converse_stream` to process caching data for both standard and streaming calls.
- Added `test_bedrock_converse_prompt_caching_metrics.py` to validate that the `gen_ai.prompt.caching` metric is correctly emitted for the Converse API.

Fixes #3337
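A hedged sketch of what a handler like `prompt_caching_converse_handling` could look like (the `usage` key names follow the AWS Converse response shape as I understand it, and the span-attribute names and counter interface are assumptions, not the PR's actual code):

```python
# Hypothetical handler: the Converse API reports caching in the response
# body's "usage" field, so span attributes and metrics are derived from
# there rather than from HTTP headers. Attribute names are illustrative.

def prompt_caching_converse_handling(response, span, token_counter=None, attrs=None):
    usage = response.get("usage", {})
    cache_read = usage.get("cacheReadInputTokens", 0)
    cache_write = usage.get("cacheWriteInputTokens", 0)
    base_attrs = dict(attrs or {})

    if cache_read > 0:
        span.set_attribute("gen_ai.usage.cache_read_input_tokens", cache_read)
        if token_counter is not None:
            token_counter.add(cache_read, {**base_attrs, "gen_ai.cache.type": "read"})

    if cache_write > 0:
        span.set_attribute("gen_ai.usage.cache_creation_input_tokens", cache_write)
        if token_counter is not None:
            token_counter.add(cache_write, {**base_attrs, "gen_ai.cache.type": "write"})
```

Guarding on `> 0` keeps non-caching calls from emitting empty attributes, which matches how the header-based `invoke_model` path is described.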
Important
Adds prompt caching telemetry for AWS Bedrock Converse APIs, including a new function for caching data extraction and corresponding tests.
- Added `prompt_caching_converse_handling` in `prompt_caching.py` to extract caching data from the Converse API response body.
- Integrated `prompt_caching_converse_handling` into `_handle_converse` and `_handle_converse_stream` in `__init__.py`.
- Added `test_bedrock_converse_prompt_caching_metrics.py` to validate `gen_ai.prompt.caching` metric emission for the Converse API.

This description was created by Ellipsis for 4fa3792. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests