[Feature] Use RolloutEngine as distillation teacher to reduce GPU memory vs TrainEngine teacher #1367

## Checklist

- [x] This feature will maintain backward compatibility with the current APIs in
  `areal/api/`. If not, please raise a refactor issue first.

## Background

In on-policy distillation, the teacher is used only for log-prob scoring (`teacher_logp`) and does not need an optimizer, gradients, or training state.
Using a full TrainEngine for the teacher is memory-heavy and increases resource pressure.
We want a teacher path based on RolloutEngine / InferenceEngine to reduce GPU memory usage.

Current distillation workflows instantiate the teacher as a train-style engine, which can allocate unnecessary training components (optimizer states, train-time buffers, etc.) for an inference-only teacher.
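To give a sense of scale, here is a back-of-the-envelope sketch of model-state memory for a train-style versus inference-only teacher. The byte counts assume bf16 weights and mixed-precision Adam (fp32 master weights plus two fp32 moments). They are generic assumptions, not measurements from AReaL, and the 7B parameter count is hypothetical. Activations, KV cache, and sharding across GPUs are ignored.

```python
# Rough per-parameter byte counts (assumptions, not AReaL measurements).
BYTES_INFERENCE = 2           # bf16 weights only
BYTES_TRAINING = 2 + 2 + 12   # bf16 weights + bf16 grads + fp32 master weights, Adam m and v


def model_state_gib(num_params: float, bytes_per_param: int) -> float:
    """Return model-state memory in GiB, ignoring activations and KV cache."""
    return num_params * bytes_per_param / 2**30


params = 7e9  # hypothetical 7B-parameter teacher
inference_gib = model_state_gib(params, BYTES_INFERENCE)
training_gib = model_state_gib(params, BYTES_TRAINING)
print(f"inference-only: {inference_gib:.1f} GiB, train-style: {training_gib:.1f} GiB")
```

Whether a given TrainEngine teacher actually allocates all of these states depends on its configuration, so treat the ratio as an upper bound on what could be saved.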

## Potential Solution

- The teacher is configured as an inference rollout engine (vLLM/SGLang).
- `RLTrainer` calls `teacher.compute_logp(...)` on rollout batches.
- The teacher model path/config is independent of the actor rollout model path.
- The teacher lifecycle uses rollout/controller semantics (init/offload/onload/destroy) without train-engine overhead.
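The points above could be sketched roughly as follows. This is a minimal illustration, not the AReaL API: `TeacherScorer`, `FakeTeacher`, and `score_with_teacher` are hypothetical names, and the `compute_logp` signature (token-id sequences in, per-token log-probs out) is an assumption. The fake teacher returns a uniform log-prob so the snippet runs without a GPU or an inference backend.

```python
import math
from typing import Protocol, Sequence


class TeacherScorer(Protocol):
    """Hypothetical inference-only teacher interface (assumed, not from AReaL)."""

    def onload(self) -> None: ...
    def offload(self) -> None: ...
    def compute_logp(self, batch: Sequence[Sequence[int]]) -> list[list[float]]: ...


class FakeTeacher:
    """Stand-in teacher that assigns a uniform log-prob to every token."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.loaded = False

    def onload(self) -> None:
        self.loaded = True   # a real engine would move weights onto the GPU

    def offload(self) -> None:
        self.loaded = False  # a real engine would release GPU memory

    def compute_logp(self, batch: Sequence[Sequence[int]]) -> list[list[float]]:
        if not self.loaded:
            raise RuntimeError("teacher must be onloaded before scoring")
        logp = -math.log(self.vocab_size)
        return [[logp] * len(seq) for seq in batch]


def score_with_teacher(
    teacher: TeacherScorer, batch: Sequence[Sequence[int]]
) -> list[list[float]]:
    """Onload the teacher, score a rollout batch, and always offload afterwards."""
    teacher.onload()
    try:
        return teacher.compute_logp(batch)
    finally:
        teacher.offload()


teacher = FakeTeacher(vocab_size=4)
teacher_logp = score_with_teacher(teacher, [[1, 2, 3], [4]])
```

The `try`/`finally` reflects the intent that the teacher only occupies GPU memory while scoring. No optimizer or gradient state appears anywhere in the interface.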

### Benefits

- Lower peak GPU memory for distillation runs.
- Better stability on limited-memory hardware.
- Better separation of concerns (teacher scoring vs student training).

## Additional Information

### Minimal config example

```yaml
teacher:
  path: <teacher-model-path>
  rollout:
    backend: "vllm:d1p1t1"   # or sglang:d...
  offload: true
  rl_loss_weight: 1.0
  distill_loss_weight: 0.005
```
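The issue does not spell out how `rl_loss_weight` and `distill_loss_weight` are combined. One plausible reading, shown here purely as an assumption, is a weighted sum of the RL loss and a distillation term. The distillation term below is the mean of `student_logp - teacher_logp` over sampled tokens, a common single-sample reverse-KL estimate for on-policy distillation. The function names and numbers are hypothetical.

```python
from typing import Sequence


def distill_loss(student_logp: Sequence[float], teacher_logp: Sequence[float]) -> float:
    """Mean per-token (student_logp - teacher_logp) on tokens sampled from the student."""
    if len(student_logp) != len(teacher_logp):
        raise ValueError("student and teacher log-probs must align token by token")
    diffs = [s - t for s, t in zip(student_logp, teacher_logp)]
    return sum(diffs) / len(diffs)


def combined_loss(
    rl_loss: float,
    distill: float,
    rl_loss_weight: float = 1.0,
    distill_loss_weight: float = 0.005,
) -> float:
    """Assumed combination: weighted sum of the RL and distillation terms."""
    return rl_loss_weight * rl_loss + distill_loss_weight * distill


# Hypothetical values for illustration only.
d = distill_loss(student_logp=[-1.0, -2.0], teacher_logp=[-1.5, -2.5])
total = combined_loss(rl_loss=2.0, distill=d)
```

The default weights mirror the values in the config example. The actual loss formulation in the trainer may differ.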

