feat: add OpenTelemetry observability with GenAI semantic conventions by kirang89 · Pull Request #49 · nilenso/ask-forge

kirang89 · 2026-03-21T13:59:43Z

Summary

Add OpenTelemetry instrumentation to ask-forge so consumers can observe LLM interactions using any OTel-compatible backend (Langfuse, Jaeger, Honeycomb, etc.).

Design decisions

Library depends only on @opentelemetry/api, so tracing is a zero-overhead no-op when no SDK is installed. No backend coupling.
GenAI semantic conventions for span and attribute naming (gen_ai.chat, gen_ai.execute_tool, etc.)
Pure functions in src/tracing.ts — no classes, no state (aside from the idiomatic module-level tracer)
One trace per ask() call, correlated across multi-turn conversations via ask_forge.session.id — matches the standard pattern used by Langfuse, LangSmith, and OpenLLMetry
All error paths record exceptions with full stack traces
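The no-op default can be pictured with a small model of the API/SDK split. This is a simplified sketch, not the actual @opentelemetry/api source: `SpanLike`, `TracerLike`, `registerTracer`, and `getTracer` are stand-ins invented for illustration, and the `startAskSpan` signature shown here is an assumption.

```typescript
// Simplified stand-ins for the OTel Span and Tracer interfaces (illustrative only).
interface SpanLike {
  setAttribute(key: string, value: string | number | boolean): void;
  end(): void;
  isRecording(): boolean;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// One shared no-op span: every method returns immediately and records nothing.
const NOOP_SPAN: SpanLike = {
  setAttribute() {},
  end() {},
  isRecording: () => false,
};
const NOOP_TRACER: TracerLike = { startSpan: () => NOOP_SPAN };

// Registration slot; it stays empty until a consumer installs an SDK.
let registeredTracer: TracerLike | undefined;

function registerTracer(tracer: TracerLike): void {
  registeredTracer = tracer;
}

function getTracer(): TracerLike {
  return registeredTracer ?? NOOP_TRACER;
}

// Library-side helper written as a plain function (signature is assumed).
function startAskSpan(sessionId: string): SpanLike {
  const span = getTracer().startSpan("ask");
  span.setAttribute("ask_forge.session.id", sessionId);
  return span;
}
```

In this model the library code is identical in both cases; only the registered tracer decides whether anything is recorded.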

Trace structure

Each session.ask() produces a trace:

ask (root)
├── compaction
├── gen_ai.chat (iteration 1)
├── gen_ai.execute_tool (rg)
├── gen_ai.execute_tool (read)
├── gen_ai.chat (iteration 2)
└── gen_ai.chat (iteration 3, final response)
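As a rough illustration of how the tree above arises, the sketch below replays an ask() call and records the spans it would open, in start order. It is a model only: `RecordedSpan`, `Turn`, and `traceAsk` are hypothetical names, and it assumes the compaction span is opened once per call before the first model iteration.

```typescript
// A recorded span: its name, its parent's name (null for the root), and a short detail.
type RecordedSpan = { name: string; parent: string | null; detail: string };

// Hypothetical summary of one LLM iteration: the tools the model asked for.
type Turn = { toolCalls: string[] };

// Replays an ask() call and returns the spans it would open, in start order.
function traceAsk(turns: Turn[]): RecordedSpan[] {
  const spans: RecordedSpan[] = [{ name: "ask", parent: null, detail: "root" }];
  const child = (name: string, detail: string): void => {
    spans.push({ name, parent: "ask", detail });
  };

  // Assumption: compaction runs once, before the first model call.
  child("compaction", "");

  turns.forEach((turn, index) => {
    // One generation span per LLM iteration.
    child("gen_ai.chat", `iteration ${index + 1}`);
    // One tool span per tool execution requested in that iteration.
    for (const tool of turn.toolCalls) {
      child("gen_ai.execute_tool", tool);
    }
  });

  return spans;
}
```

Calling `traceAsk` with three turns, the first requesting `rg` and `read`, yields the same sequence of span names as the diagram, all parented to `ask`.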

Instrumentation points

| Span | Attributes | Events |
| --- | --- | --- |
| `ask` (root) | `gen_ai.operation.name`, `gen_ai.request.model`, `ask_forge.session.id`, `ask_forge.repo.url`, `ask_forge.repo.commitish`, token usage, iteration/tool counts, link stats | `gen_ai.system_instructions`, `gen_ai.input.messages` (question) |
| `compaction` | `was_compacted`, `tokens_before`, `tokens_after` | Exception on error |
| `gen_ai.chat` | model, provider, iteration, token usage (incl. cache), stop reason | `gen_ai.input.messages`, `gen_ai.output.messages`, exception on error |
| `gen_ai.execute_tool` | tool name, call ID | `gen_ai.tool.call.arguments`, `gen_ai.tool.call.result` |
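To make the root-span row concrete, here is a minimal sketch of pure functions that build the `ask` span attributes. The input shapes (`AskStart`, `AskTotals`) and function names are hypothetical. The `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` keys come from the OTel GenAI semantic conventions; that this PR uses exactly those keys for "token usage" is an assumption.

```typescript
type Attributes = Record<string, string | number>;

// Hypothetical input shapes; field names are illustrative, not ask-forge's actual types.
interface AskStart {
  operationName: string;
  model: string;
  sessionId: string;
  repoUrl: string;
  commitish: string;
}

interface AskTotals {
  inputTokens: number;
  outputTokens: number;
  iterations: number;
  toolCalls: number;
}

// Attributes known when the ask span starts.
function askStartAttributes(start: AskStart): Attributes {
  return {
    "gen_ai.operation.name": start.operationName,
    "gen_ai.request.model": start.model,
    "ask_forge.session.id": start.sessionId,
    "ask_forge.repo.url": start.repoUrl,
    "ask_forge.repo.commitish": start.commitish,
  };
}

// Attributes known only once the iteration loop has finished.
function askEndAttributes(totals: AskTotals): Attributes {
  return {
    // Assumed keys for token usage (OTel GenAI semconv names).
    "gen_ai.usage.input_tokens": totals.inputTokens,
    "gen_ai.usage.output_tokens": totals.outputTokens,
    "ask_forge.total_iterations": totals.iterations,
    "ask_forge.total_tool_calls": totals.toolCalls,
  };
}
```

Keeping the builders free of any span or tracer reference means they can be unit-tested without an OTel SDK.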

Error handling

Tool execution errors: tool span ends with ERROR, exception propagates and ask span also ends with ERROR
Stream/API errors: generation span ends with ERROR, ask span ends normally (returns error result)
Max iterations: ask span ends with ERROR (error.type = "max_iterations_reached")
Compaction failures: compaction span ends with ERROR, execution continues

All spans are guaranteed to end (no orphans) via try/catch guards.
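The guard pattern behind these rules might look roughly like the sketch below. It uses an in-memory `FakeSpan` instead of a real OTel span, and `startSpan`, `endSpanWithError`, `runTool`, `runCompaction`, and this `ask` are hypothetical helpers written for illustration, not the functions in src/tracing.ts or src/session.ts.

```typescript
type Status = "UNSET" | "ERROR";

// In-memory stand-in for an OTel span (illustrative only).
interface FakeSpan {
  name: string;
  status: Status;
  ended: boolean;
  exceptions: unknown[];
}

function startSpan(sink: FakeSpan[], name: string): FakeSpan {
  const span: FakeSpan = { name, status: "UNSET", ended: false, exceptions: [] };
  sink.push(span);
  return span;
}

function endSpanWithError(span: FakeSpan, err: unknown): void {
  span.exceptions.push(err); // record the exception on the span
  span.status = "ERROR";
  span.ended = true;
}

// Tool failure: the tool span ends with ERROR and the exception propagates.
function runTool<T>(sink: FakeSpan[], execute: () => T): T {
  const span = startSpan(sink, "gen_ai.execute_tool");
  try {
    const result = execute();
    span.ended = true;
    return result;
  } catch (err) {
    endSpanWithError(span, err);
    throw err;
  }
}

// Compaction failure: the span ends with ERROR but execution continues.
function runCompaction(sink: FakeSpan[], compact: () => void): void {
  const span = startSpan(sink, "compaction");
  try {
    compact();
    span.ended = true;
  } catch (err) {
    endSpanWithError(span, err); // swallowed on purpose
  }
}

// Root guard: whatever escapes the body also ends the ask span with ERROR.
function ask(sink: FakeSpan[], compact: () => void, tool: () => string): string {
  const span = startSpan(sink, "ask");
  try {
    runCompaction(sink, compact);
    const answer = runTool(sink, tool);
    span.ended = true;
    return answer;
  } catch (err) {
    endSpanWithError(span, err);
    throw err;
  }
}
```

Every path through each function sets `ended`, which is the property the "no orphans" guarantee relies on.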

Multi-turn conversations

Each ask() call creates an independent trace. Multi-turn conversations are correlated via ask_forge.session.id on the root span. This matches the session-level correlation used by other LLM observability tools (Langfuse sessions, LangSmith threads, OpenLLMetry association properties).
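On the consuming side, correlation then amounts to grouping root spans by that attribute. A minimal sketch, assuming exported root spans are available as plain records (`RootSpanRecord` and `groupBySession` are hypothetical names, not part of ask-forge or any backend):

```typescript
// Hypothetical flattened view of an exported root span.
interface RootSpanRecord {
  traceId: string;
  attributes: Record<string, string>;
}

// Groups trace IDs by ask_forge.session.id, preserving input order within a session.
function groupBySession(roots: RootSpanRecord[]): Map<string, string[]> {
  const sessions = new Map<string, string[]>();
  for (const root of roots) {
    const sessionId = root.attributes["ask_forge.session.id"];
    if (sessionId === undefined) continue; // not an ask-forge root span
    const traces = sessions.get(sessionId) ?? [];
    traces.push(root.traceId);
    sessions.set(sessionId, traces);
  }
  return sessions;
}
```

Backends with a built-in session view do this grouping themselves; the sketch only shows why one attribute on the root span is enough.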

Follow-up: #56 — adopt gen_ai.conversation.id from the OTel GenAI semantic conventions (v1.40.0) for spec-compliant conversation tracking.

Files changed

src/tracing.ts — NEW: OTel span helpers with GenAI semantic conventions
src/session.ts — MODIFIED: instrumented at 4 integration points
test/tracing.test.ts — NEW: 20 tests with in-memory TracerProvider
README.md — MODIFIED: added Observability section with setup guide and metrics table
package.json — MODIFIED: added @opentelemetry/api dependency

Consumer setup

By default, tracing is a zero-overhead no-op (no console output, no network calls). To enable, the consumer installs an OTel SDK and registers an exporter before calling ask():

import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";

const sdk = new NodeSDK({ spanProcessors: [new LangfuseSpanProcessor()] });
sdk.start();

// ask-forge spans now flow to Langfuse automatically

…points
- Root ask span with system prompt event
- Compaction span (success + error paths)
- Generation span per LLM iteration with input/output message events
- Tool span per execution with arguments/result events
- All error paths record exceptions with full stack traces

…ool call count
- gen_ai.provider.name on generation spans (standard semconv)
- gen_ai.response.finish_reason on generation spans (end_turn, tool_use, etc.)
- ask_forge.total_iterations on root ask span
- ask_forge.total_tool_calls on root ask span

- Add try/catch around tool execution to end tool spans on error
- Wrap iteration loop in try/catch to guarantee ask span ends
- Add endToolSpanWithError helper for failed tool executions
- Remove unused params from startAskSpan (response, inferenceTimeMs)
- Add question as input event on root ask span
- Make systemPrompt optional to match upstream Context type
- Import Span type instead of inline import() expressions
- Replace console.log with this.#logger.log for compaction
- Remove stale test count from README

- Add test for tool execution error (span + ask span both end)
- Add test for API error string recorded as exception
- Add test for compaction error span lifecycle
- Fix TestSpan.addEvent signature to match OTel Span interface
- Fix TracerProvider type import (was trace.TracerProvider)
- Add addLink/addLinks stubs required by OTel Span interface
- Remove unused afterEach import

kirang89 added 8 commits March 21, 2026 13:46

feat(tracing): add OTel tracing module with GenAI semantic conventions

c15cda6

Uses only @opentelemetry/api (no-op without consumer SDK). Pure functions — no classes, no state, no custom interfaces.

test(tracing): add OTel span tests with in-memory TracerProvider

1efc125

16 tests covering: root ask span attributes/events, compaction spans, generation spans with usage/messages, tool spans with args/results, parent-child relationships, and error path exception recording.

docs(tracing): add Observability section to README with metrics table

2154d12

Covers: setup example (Langfuse via OTel), trace structure diagram, captured metrics table for all span types, and error handling behavior.

style(tracing): auto-format via biome

7b06123

kirang89 mentioned this pull request Mar 22, 2026

feat: add gen_ai.conversation.id for multi-turn conversation tracing #56

Open

docs(readme): add observability to features list

711df91

kirang89 wants to merge 9 commits into main from feat/observability
