[Bug] ScenarioResult.messages returns judge's internal messages instead of full conversation in 0.7.15 #221

## Description

In `langwatch-scenario==0.7.15`, `ScenarioResult.messages` returns the judge's internal messages (system prompt + transcript text) instead of the full conversation history. This is a breaking change from 0.7.14.

## Root Cause

In v0.7.15, the `JudgeAgent.call()` method was refactored to use a transcript-based approach for the judge's context:

**v0.7.14** built `messages` by spreading the actual conversation:
```python
messages = [
    {"role": "system", "content": self.system_prompt or ...},
    *input.messages,  # Actual conversation messages spread here
]
```

**v0.7.15** converts messages to a text transcript instead:
```python
transcript = JudgeUtils.build_transcript_from_messages(input.messages)
content_for_judge = f"""
<transcript>
{transcript}
</transcript>
...
"""

messages = [
    {"role": "system", "content": self.system_prompt or ...},
    {"role": "user", "content": content_for_judge},  # Just text, not the actual messages
]
```

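Both versions then return `messages=messages` in the `ScenarioResult`. In 0.7.14 that list contained the conversation messages (preceded by the judge's system prompt). In 0.7.15 it contains only the system prompt and a text transcript, so the structured message objects no longer reach the caller.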
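The difference can be reproduced without the library. The sketch below uses plain OpenAI-style message dicts and a hypothetical `to_transcript()` helper standing in for `JudgeUtils.build_transcript_from_messages` (the real transcript format may differ). It only shows why the 0.7.15 list no longer carries the structured messages.

```python
SYSTEM_PROMPT = "You are a judge."  # placeholder; the library's real prompt differs

conversation = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "Sunny, 22 degrees"},
    {"role": "assistant", "content": "It's sunny and 22 degrees in Paris."},
]


def to_transcript(messages):
    """Hypothetical stand-in: flatten messages into 'role: content' text lines."""
    return "\n".join(f"{m['role']}: {m.get('content')}" for m in messages)


# v0.7.14 style: the structured conversation is spread into the list.
messages_0714 = [{"role": "system", "content": SYSTEM_PROMPT}, *conversation]

# v0.7.15 style: the conversation survives only as text inside one user message.
content_for_judge = f"<transcript>\n{to_transcript(conversation)}\n</transcript>"
messages_0715 = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": content_for_judge},
]

print(len(messages_0714))  # 5
print(len(messages_0715))  # 2
print(any("tool_calls" in m for m in messages_0715))  # False
```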

## Expected Behavior

`ScenarioResult.messages` should contain the full conversation history between the user simulator and the agent under test, including:
- User messages
- Assistant messages
- Tool calls
- Tool results

## Actual Behavior

`ScenarioResult.messages` contains only:
- Judge's system prompt
- A single user message containing the transcript as text

The actual structured message objects (with tool calls, etc.) are lost.
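Until this is fixed, test code can fail loudly instead of silently reading the judge's context. A minimal heuristic sketch: `looks_like_judge_context` is a hypothetical helper, and it assumes messages are dicts (as in the snippets above) and that the `<transcript>` marker appears in the judge's user message.

```python
from typing import Any, Mapping, Sequence


def looks_like_judge_context(messages: Sequence[Mapping[str, Any]]) -> bool:
    """Heuristic: True if `messages` matches the 0.7.15 judge shape,
    i.e. a system prompt followed by one user message wrapping a <transcript>."""
    if len(messages) != 2:
        return False
    first, second = messages
    content = second.get("content")
    return (
        first.get("role") == "system"
        and second.get("role") == "user"
        and isinstance(content, str)
        and "<transcript>" in content
    )
```

For example, `assert not looks_like_judge_context(result.messages)` at the top of a test turns the silent data loss into an explicit failure.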

## Impact

This breaks any code that relies on `result.messages` to:
- Extract tool calls from the conversation
- Extract assistant responses
- Log the full conversation for debugging/reporting
- Generate HTML reports showing the conversation
- Iterate over individual messages
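As an illustration of the first point, here is a typical tool-call extraction over OpenAI-style dicts (the `tool_calls` / `function.name` layout is an assumption about the message format). `extract_tool_call_names` is a hypothetical helper. It finds the call in a real conversation but returns nothing for the judge-shaped list that 0.7.15 returns.

```python
from typing import Any, Iterable, List, Mapping


def extract_tool_call_names(messages: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect function names from assistant tool calls (OpenAI-style dicts)."""
    names: List[str] = []
    for message in messages:
        if message.get("role") != "assistant":
            continue
        for call in message.get("tool_calls") or []:
            names.append(call["function"]["name"])
    return names


conversation = [
    {"role": "user", "content": "Book a table for two."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "book_table", "arguments": '{"guests": 2}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "confirmed"},
    {"role": "assistant", "content": "Your table is booked."},
]

# Shape of result.messages in 0.7.15: judge prompt plus a text transcript.
judge_context = [
    {"role": "system", "content": "You are a judge."},
    {"role": "user", "content": "<transcript>(conversation as text)</transcript>"},
]

print(extract_tool_call_names(conversation))   # ['book_table']
print(extract_tool_call_names(judge_context))  # []
```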

## Workaround

Downgrade to `langwatch-scenario==0.7.14` where `result.messages` contains the actual conversation messages.
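To make sure the pin actually took effect, a test suite can check the installed version at runtime with the standard `importlib.metadata` module. A minimal sketch: the function names are hypothetical, and `AFFECTED_VERSIONS` lists only the release confirmed by this report.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only 0.7.15 is confirmed affected by this report; extend if later releases are too.
AFFECTED_VERSIONS = {"0.7.15"}


def installed_scenario_version() -> Optional[str]:
    """Return the installed langwatch-scenario version, or None if absent."""
    try:
        return version("langwatch-scenario")
    except PackageNotFoundError:
        return None


def messages_regression_present() -> bool:
    """True when the installed version is one known to return judge messages."""
    return installed_scenario_version() in AFFECTED_VERSIONS
```

Calling `messages_regression_present()` from a `conftest.py` and aborting with a clear message is one way to catch an accidental upgrade.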

## Suggested Fix

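Have `JudgeAgent.call()` return `input.messages` in the `ScenarioResult` instead of the local `messages` variable. The judge keeps its transcript-based context internally, while callers still receive the full conversation: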

```python
return ScenarioResult(
    success=verdict == "success" and len(failed_criteria) == 0,
    messages=input.messages,  # Use the actual conversation, not judge's internal messages
    reasoning=reasoning,
    passed_criteria=passed_criteria,
    failed_criteria=failed_criteria,
)
```
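The intended contract can be pinned with a regression test. The sketch below does not call the library: `ResultSketch`, `judge_sketch` and `fake_llm` are hypothetical stand-ins that mirror the suggested fix. The judge's LLM sees only the system prompt and transcript, while the result carries the exact list the judge received.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ScenarioResult, limited to the fields used here."""

    success: bool
    messages: List[Message]
    reasoning: str = ""
    passed_criteria: List[str] = field(default_factory=list)
    failed_criteria: List[str] = field(default_factory=list)


def judge_sketch(
    conversation: List[Message], call_llm: Callable[[List[Message]], str]
) -> ResultSketch:
    # The judge's own context: system prompt plus the conversation as text.
    transcript = "\n".join(f"{m['role']}: {m.get('content')}" for m in conversation)
    judge_messages: List[Message] = [
        {"role": "system", "content": "You are a judge."},
        {"role": "user", "content": f"<transcript>\n{transcript}\n</transcript>"},
    ]
    verdict = call_llm(judge_messages)
    # The fix: report the conversation the judge received, not judge_messages.
    return ResultSketch(success=verdict == "success", messages=conversation)


# Usage: a fake LLM that records what it was sent and always succeeds.
sent: List[List[Message]] = []


def fake_llm(messages: List[Message]) -> str:
    sent.append(messages)
    return "success"


conversation = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
result = judge_sketch(conversation, fake_llm)
assert result.messages is conversation  # full conversation is preserved
assert len(sent[0]) == 2  # the judge itself still saw only system + transcript
```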

## Environment

- Python: 3.13.4
- langwatch-scenario: 0.7.15
- OS: macOS
