Commit 8fc679b

add native otel intrsumentation doc for agent identity
1 parent 67c44d6 commit 8fc679b

2 files changed

Lines changed: 311 additions & 0 deletions

File tree

README.md

Lines changed: 2 additions & 0 deletions
@@ -153,6 +153,8 @@ openobserve_init(

Resource-level `gen_ai.agent.name` is attached through the OpenTelemetry `Resource`. OpenObserve can use it as a fallback for LLM span agent identity, but span attributes take precedence. Use this only when the process has one static agent identity. For request-scoped or multi-agent processes, prefer `agent_name=` or `openobserve_agent(...)`.

+If you already manage OpenTelemetry providers yourself, see [Native OpenTelemetry Agent Identity](docs/native-opentelemetry-agent-identity.md) for equivalent native SDK patterns.
+
### Protocol Configuration Notes

**HTTP/Protobuf (default)**
Lines changed: 309 additions & 0 deletions
@@ -0,0 +1,309 @@
# Native OpenTelemetry Agent Identity

This guide shows how to reproduce the OpenObserve Python SDK's trace behavior with the native OpenTelemetry Python SDK.

Use this when you already own your OpenTelemetry setup and do not want `openobserve_init(...)` to create providers/exporters for you.

## What The OpenObserve SDK Does

For traces, the SDK mainly does four things:

1. Creates an OpenTelemetry `TracerProvider`.
2. Configures OTLP exporters for OpenObserve endpoints and headers.
3. Applies `resource_attributes` through an OpenTelemetry `Resource`.
4. Adds agent identity to spans with `gen_ai.agent.id` and `gen_ai.agent.name`.

The agent identity behavior is intentionally span-based. `agent_id=` and `agent_name=` in `openobserve_init(...)` are process-wide defaults stamped on spans, not resource attributes. `openobserve_agent(...)` sets request-scoped identity using OpenTelemetry baggage and overrides those defaults.

## Option 1: Resource Attributes Only

This is the easiest native OpenTelemetry option when one process represents one static agent.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

resource = Resource.create(
    {
        "service.name": "support-agent-worker",
        "service.version": "1.0.0",
        "deployment.environment": "production",
        "gen_ai.agent.name": "Support Agent",
    }
)

provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(
    endpoint="http://localhost:5080/api/default/v1/traces",
    headers={
        "Authorization": "Basic <base64>",
        "stream-name": "default",
    },
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```

Resource attributes are associated with all spans emitted by that provider through the OTLP `ResourceSpans` envelope. In OpenObserve, resource attributes are stored as service/resource fields. Current OpenObserve agent extraction can use resource `gen_ai.agent.name` / `gen_ai.agent.id` as a fallback for LLM spans and promote them into canonical agent fields.

Use this only when:

- The process has one static agent identity.
- The identity does not change per request.
- You are comfortable with resource identity being a fallback for LLM spans.

Do not use this as the only mechanism when one process handles multiple agents or request-scoped agent identity.

## Option 2: Span Processor Defaults

This is closer to `openobserve_init(agent_id=..., agent_name=...)`: set a process-wide default identity on every recording span as span attributes.

```python
from contextvars import ContextVar

from opentelemetry import baggage
from opentelemetry.sdk.trace import SpanProcessor

AGENT_ID_KEY = "gen_ai.agent.id"
AGENT_NAME_KEY = "gen_ai.agent.name"
# Holds the request-scoped (agent_id, agent_name) tuple, or None when unset.
_current_agent_identity = ContextVar("agent_identity", default=None)


def _clean(value):
    """Return a stripped string, or None for missing or blank values."""
    if value is None:
        return None
    value = str(value).strip()
    return value or None


class AgentIdentitySpanProcessor(SpanProcessor):
    def __init__(self, agent_id=None, agent_name=None):
        self.agent_id = _clean(agent_id)
        self.agent_name = _clean(agent_name)

    def on_start(self, span, parent_context=None):
        if not span.is_recording():
            return

        local_identity = _current_agent_identity.get()
        if local_identity is not None:
            # Exact SDK behavior: a local partial identity does not mix with
            # static defaults or inherited baggage.
            agent_id, agent_name = local_identity
        elif self.agent_id is not None or self.agent_name is not None:
            # Static process-wide defaults come next.
            agent_id = self.agent_id
            agent_name = self.agent_name
        else:
            # Last resort: identity inherited through baggage.
            agent_id = _clean(baggage.get_baggage(AGENT_ID_KEY, context=parent_context))
            agent_name = _clean(baggage.get_baggage(AGENT_NAME_KEY, context=parent_context))

        if agent_id is not None:
            span.set_attribute(AGENT_ID_KEY, agent_id)
        if agent_name is not None:
            span.set_attribute(AGENT_NAME_KEY, agent_name)

    def on_end(self, span):
        pass

    def shutdown(self):
        pass

    def force_flush(self, timeout_millis=30000):
        return True
```

Register it before the exporter processor:

```python
# Reuses `resource` and `exporter` from the Option 1 setup.
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    AgentIdentitySpanProcessor(
        agent_id="support-agent",
        agent_name="Support Agent",
    )
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```

This makes every span self-identifying with normal span attributes. That is the safest form for OpenObserve agent attribution, especially for evaluations and non-LLM helper spans that should still carry agent identity.

## Option 3: Request-Scoped Identity With Baggage

This is the native equivalent of `openobserve_agent(...)`.

```python
from contextlib import contextmanager

from opentelemetry import baggage, context

AGENT_ID_KEY = "gen_ai.agent.id"
AGENT_NAME_KEY = "gen_ai.agent.name"

# Reuses `_clean` and `_current_agent_identity` from the Option 2 module.


@contextmanager
def agent_identity(agent_id=None, agent_name=None):
    agent_id = _clean(agent_id)
    agent_name = _clean(agent_name)
    if agent_id is None and agent_name is None:
        raise ValueError("Agent identity requires at least one of agent_id or agent_name")

    local_token = _current_agent_identity.set((agent_id, agent_name))
    ctx = context.get_current()
    if agent_id is not None:
        ctx = baggage.set_baggage(AGENT_ID_KEY, agent_id, context=ctx)
    else:
        ctx = baggage.remove_baggage(AGENT_ID_KEY, context=ctx)

    if agent_name is not None:
        ctx = baggage.set_baggage(AGENT_NAME_KEY, agent_name, context=ctx)
    else:
        ctx = baggage.remove_baggage(AGENT_NAME_KEY, context=ctx)

    token = context.attach(ctx)
    try:
        yield
    finally:
        context.detach(token)
        _current_agent_identity.reset(local_token)
```

Use it around request or workflow execution:

```python
with agent_identity(agent_id="triage", agent_name="Triage Agent"):
    run_agent_workflow()
```

If outbound HTTP instrumentation is configured to propagate W3C baggage, downstream services can inherit the same identity. The downstream service can still override it with its own local or static identity.

The `ContextVar` is not used for propagation. It exists to mirror the SDK's precedence rules inside the current process: local request identity wins, static process identity is next, inherited baggage is last. This also prevents partial local identities from mixing with inherited or static fields.
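
The precedence rules described above can be sketched as a pure function, independent of OpenTelemetry. This is a minimal illustration, not the SDK's code: `resolve_identity` and the `Identity` alias are hypothetical names, and each identity is modeled as an `(agent_id, agent_name)` tuple.

```python
from typing import Optional, Tuple

# Hypothetical alias for illustration: (agent_id, agent_name).
Identity = Tuple[Optional[str], Optional[str]]


def resolve_identity(
    local: Optional[Identity],
    static: Identity,
    inherited: Identity,
) -> Identity:
    """Pick exactly one identity source; never merge fields across sources."""
    if local is not None:
        # A local identity wins even when one of its fields is None.
        return local
    if static[0] is not None or static[1] is not None:
        # Static process defaults beat inherited baggage.
        return static
    return inherited


# Partial local identity: the missing name is not filled from static defaults.
partial = resolve_identity(
    ("triage", None), ("support-agent", "Support Agent"), (None, None)
)
# No local identity: static defaults are used, inherited baggage is ignored.
fallback = resolve_identity(
    None, ("support-agent", "Support Agent"), ("upstream", "Upstream Agent")
)
# Neither local nor static identity: inherited baggage is used.
inherited_only = resolve_identity(None, (None, None), ("upstream", "Upstream Agent"))
```

Choosing a single source rather than merging field by field is what keeps a span from carrying an id from one agent and a name from another.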

## Explicit Baggage Propagation

If your application has custom propagation setup, make sure baggage is included:

```python
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from opentelemetry.baggage.propagation import W3CBaggagePropagator

set_global_textmap(
    CompositePropagator(
        [
            TraceContextTextMapPropagator(),
            W3CBaggagePropagator(),
        ]
    )
)
```
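
To make it concrete what travels between services, the sketch below renders the two agent keys as a W3C-style `baggage` header value using only the standard library. It is an illustration of the wire format, not the propagator's implementation: `encode_baggage` and `decode_baggage` are hypothetical helpers, and the real `W3CBaggagePropagator` may escape some characters (spaces, for example) differently.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Render entries as a comma-separated `key=value` baggage header value."""
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in entries.items()
    )


def decode_baggage(header: str) -> dict:
    """Parse a baggage header value, ignoring optional `;property` suffixes."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member or "=" not in member:
            continue
        key, _, value = member.partition("=")
        entries[key.strip()] = unquote(value.split(";", 1)[0].strip())
    return entries


header = encode_baggage(
    {"gen_ai.agent.id": "triage", "gen_ai.agent.name": "Triage Agent"}
)
```

In practice you would not build this header by hand; registering the baggage propagator lets instrumented HTTP clients and servers inject and extract it.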

## Full Native Trace Setup

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# AgentIdentitySpanProcessor (Option 2) and agent_identity (Option 3) are
# assumed to be defined or imported from your own module.

resource = Resource.create(
    {
        "service.name": "o2-ai",
        "service.version": "1.0.0",
        "deployment.environment": "production",
    }
)

exporter = OTLPSpanExporter(
    endpoint="http://localhost:5080/api/default/v1/traces",
    headers={
        "Authorization": "Basic <base64>",
        "stream-name": "default",
    },
    timeout=30,
)

provider = TracerProvider(resource=resource)
# Register the identity processor before the exporting processor.
provider.add_span_processor(
    AgentIdentitySpanProcessor(agent_id="o2-ai", agent_name="O2 AI")
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("o2-ai")

# The request-scoped identity overrides the static "o2-ai" default.
with agent_identity(agent_id="sre", agent_name="SRE Agent"):
    with tracer.start_as_current_span("agent.workflow") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        run_agent_workflow()  # placeholder for your application code
```

## HTTP And gRPC OpenObserve Endpoints

For HTTP/protobuf traces:

```python
OTLPSpanExporter(
    endpoint="https://<openobserve-host>/api/<org>/v1/traces",
    headers={
        "Authorization": "Basic <base64>",
        "stream-name": "default",
    },
)
```
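
The `<base64>` and `<org>` placeholders can be filled in with a few lines of standard-library code. This is a sketch under assumptions: it treats `<base64>` as the usual HTTP Basic encoding of `user:secret`, and the helper names and example credentials are hypothetical. Check which credential pair your OpenObserve deployment expects.

```python
import base64


def basic_auth_header(user: str, secret: str) -> str:
    """Build an HTTP Basic Authorization value from a credential pair."""
    token = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def traces_endpoint(base_url: str, org: str) -> str:
    """Build the OTLP/HTTP traces URL in the form used in this guide."""
    return f"{base_url.rstrip('/')}/api/{org}/v1/traces"


# Example values only; replace with your own host, org, and credentials.
endpoint = traces_endpoint("http://localhost:5080/", "default")
headers = {
    "Authorization": basic_auth_header("user@example.com", "example-secret"),
    "stream-name": "default",
}
```

The resulting `endpoint` and `headers` can be passed to `OTLPSpanExporter(endpoint=..., headers=...)` as in the block above.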

For gRPC traces, use the gRPC exporter and pass OpenObserve headers as metadata:

```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

OTLPSpanExporter(
    endpoint="<openobserve-host>:5081",
    headers=(
        ("authorization", "Basic <base64>"),
        ("organization", "default"),
        ("stream-name", "default"),
    ),
    insecure=False,
)
```

Use lowercase header names for gRPC metadata.

## Choosing The Right Approach

| Requirement | Native OTel approach |
| --- | --- |
| One static agent per process | Resource attr can be enough |
| OpenObserve agent attribution on LLM spans | Resource fallback or span processor |
| Every span should be self-identifying | Span processor |
| One process handles multiple agents | Request-scoped baggage plus span processor |
| Identity should propagate to downstream services | Baggage propagation |
| Span identity should override inherited identity | Local request-scoped baggage/context |

## OpenObserve Behavior Notes

- Span attributes named `gen_ai.agent.name` and `gen_ai.agent.id` are the primary agent identity fields.
- Resource attributes are stored separately as resource/service metadata. OpenObserve prefixes non-`service.name` resource fields internally.
- Current OpenObserve extraction can use resource agent fields as a fallback for LLM spans.
- Span-level agent fields win over resource-level fields.
- If a process can emit spans for multiple agents, prefer span attributes over resource attributes.
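
The notes above can be read as a simple lookup order. The following is only a mental model of that order, not OpenObserve's extraction code: `effective_agent_fields` is a hypothetical function, the per-field fallback is an assumption, and how OpenObserve decides that a span is an LLM span is left to the caller through `is_llm_span`.

```python
AGENT_KEYS = ("gen_ai.agent.id", "gen_ai.agent.name")


def effective_agent_fields(
    span_attrs: dict, resource_attrs: dict, is_llm_span: bool
) -> dict:
    """Model the lookup order: span attribute first, resource fallback second."""
    result = {}
    for key in AGENT_KEYS:
        if key in span_attrs:
            # Span-level fields always win.
            result[key] = span_attrs[key]
        elif is_llm_span and key in resource_attrs:
            # Resource fields are only a fallback, and only for LLM spans.
            result[key] = resource_attrs[key]
    return result


span_wins = effective_agent_fields(
    {"gen_ai.agent.name": "Triage Agent"},
    {"gen_ai.agent.name": "Support Agent"},
    is_llm_span=True,
)
resource_fallback = effective_agent_fields(
    {}, {"gen_ai.agent.name": "Support Agent"}, is_llm_span=True
)
helper_span = effective_agent_fields(
    {}, {"gen_ai.agent.name": "Support Agent"}, is_llm_span=False
)
```

The last case is why a span processor is recommended when non-LLM helper spans should also carry agent identity.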

## What Native OTel Does Not Give You Automatically

Using native OpenTelemetry means you own:

- OpenObserve endpoint construction.
- OpenObserve authorization and stream headers.
- Provider singleton lifecycle.
- Flush and shutdown behavior.
- Agent identity precedence rules.
- Baggage propagation setup.
- Logs and metrics exporter setup, if you need those signals.

Use the OpenObserve SDK when you want those defaults managed for you. Use native OpenTelemetry when your application already has a provider/exporter lifecycle and you only need to add OpenObserve-compatible conventions.
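
As one example of owning flush behavior, a short-lived job or request handler may want buffered spans exported before it returns. The sketch below is a hypothetical helper, `flushed_on_exit`, that assumes only that the provider exposes `force_flush(timeout_millis)`, as the SDK `TracerProvider` does. It is duck-typed, so it imports nothing from OpenTelemetry.

```python
from contextlib import contextmanager


@contextmanager
def flushed_on_exit(provider, timeout_millis: int = 5000):
    """Force-flush the provider when the block exits, even on error."""
    try:
        yield provider
    finally:
        # Export any spans still buffered by batch processors.
        provider.force_flush(timeout_millis)
```

Usage would look like `with flushed_on_exit(provider): run_job()`, where `run_job` stands in for your own code. Shutting the provider down at process exit is still a separate responsibility.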
