feat: add OpenTelemetry observability with GenAI semantic conventions#49
Uses only @opentelemetry/api (no-op without consumer SDK). Pure functions — no classes, no state, no custom interfaces.
…points:

- Root ask span with system prompt event
- Compaction span (success + error paths)
- Generation span per LLM iteration with input/output message events
- Tool span per execution with arguments/result events
- All error paths record exceptions with full stack traces
16 tests covering: root ask span attributes/events, compaction spans, generation spans with usage/messages, tool spans with args/results, parent-child relationships, and error path exception recording.
Covers: setup example (Langfuse via OTel), trace structure diagram, captured metrics table for all span types, and error handling behavior.
…ool call count:

- gen_ai.provider.name on generation spans (standard semconv)
- gen_ai.response.finish_reason on generation spans (end_turn, tool_use, etc.)
- ask_forge.total_iterations on root ask span
- ask_forge.total_tool_calls on root ask span
- Add try/catch around tool execution to end tool spans on error
- Wrap iteration loop in try/catch to guarantee ask span ends
- Add endToolSpanWithError helper for failed tool executions
- Remove unused params from startAskSpan (response, inferenceTimeMs)
- Add question as input event on root ask span
- Make systemPrompt optional to match upstream Context type
- Import Span type instead of inline import() expressions
- Replace console.log with this.#logger.log for compaction
- Remove stale test count from README
- Add test for tool execution error (span + ask span both end)
- Add test for API error string recorded as exception
- Add test for compaction error span lifecycle
- Fix TestSpan.addEvent signature to match OTel Span interface
- Fix TracerProvider type import (was trace.TracerProvider)
- Add addLink/addLinks stubs required by OTel Span interface
- Remove unused afterEach import
Summary
Add OpenTelemetry instrumentation to ask-forge so consumers can observe LLM interactions using any OTel-compatible backend (Langfuse, Jaeger, Honeycomb, etc.).
Design decisions
- @opentelemetry/api only — zero-overhead no-op when no SDK is installed. No backend coupling.
- GenAI semantic conventions for span names and attributes (gen_ai.chat, gen_ai.execute_tool, etc.)
- Pure functions in src/tracing.ts — no classes, no state (aside from the idiomatic module-level tracer)
- One trace per ask() call, correlated across multi-turn conversations via ask_forge.session.id — matches the standard pattern used by Langfuse, LangSmith, and OpenLLMetry

Trace structure
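As a rough illustration of the pure-function style: the helper name below is a hypothetical stand-in, and the Span interface is a minimal local subset of the @opentelemetry/api surface so the snippet runs without the package installed.

```typescript
// Minimal local subset of the OTel Span surface; the real module would
// import Span from '@opentelemetry/api' and obtain a module-level tracer
// via trace.getTracer() instead.
interface Span {
  setAttribute(key: string, value: string | number | boolean): Span;
  end(): void;
}

// Pure function in the style described above: no classes, no state —
// it only stamps GenAI semantic-convention attributes onto the span
// it is given. (setAskSpanAttributes is a hypothetical name.)
function setAskSpanAttributes(span: Span, model: string, sessionId: string): Span {
  return span
    .setAttribute('gen_ai.operation.name', 'chat')
    .setAttribute('gen_ai.request.model', model)
    .setAttribute('ask_forge.session.id', sessionId);
}

// A tiny recording stub standing in for a real SDK span:
const recorded: Record<string, string | number | boolean> = {};
const span: Span = {
  setAttribute(key, value) {
    recorded[key] = value;
    return this;
  },
  end() {},
};

setAskSpanAttributes(span, 'claude-sonnet-4', 'session-123');
console.log(recorded['ask_forge.session.id']); // prints "session-123"
```

Because the helpers only take a Span and return it, they work identically against a real SDK span and against the no-op span the API hands back when no SDK is registered.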
Each session.ask() produces a trace:

Instrumentation points
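The spans nest roughly as follows (a hedged sketch: whether tool spans hang off the generation span or the root is an assumption here).

```
ask                          ← root span, one per session.ask()
├── compaction               ← when history compaction runs
├── gen_ai.chat              ← one per LLM iteration
│   └── gen_ai.execute_tool  ← one per tool call in that iteration
├── gen_ai.chat
└── ...
```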
- ask (root) — gen_ai.operation.name, gen_ai.request.model, ask_forge.session.id, ask_forge.repo.url, ask_forge.repo.commitish, token usage, iteration/tool counts, link stats; events: gen_ai.system_instructions, gen_ai.input.messages (question)
- compaction — was_compacted, tokens_before, tokens_after
- gen_ai.chat — gen_ai.input.messages, gen_ai.output.messages, exception on error
- gen_ai.execute_tool — gen_ai.tool.call.arguments, gen_ai.tool.call.result

Error handling
error.type = "max_iterations_reached")

All spans are guaranteed to end (no orphans) via try/catch guards.
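The no-orphans guarantee comes down to ending spans in a finally block. A hedged sketch — withEndedSpan is a hypothetical helper (in the PR the try/catch guards sit inline around tool execution and the iteration loop), and the Span interface is a minimal local subset of @opentelemetry/api so the snippet is self-contained:

```typescript
// Minimal local subset of the OTel Span surface for this sketch; the
// real code uses Span from '@opentelemetry/api'.
interface Span {
  recordException(err: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  end(): void;
}

const SPAN_STATUS_ERROR = 2; // SpanStatusCode.ERROR in @opentelemetry/api

// Run fn; on failure, record the exception (with its stack trace) and
// mark the span as errored, then re-throw so the caller still sees the
// failure. The finally block ends the span on every path.
function withEndedSpan<T>(span: Span, fn: () => T): T {
  try {
    return fn();
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    span.recordException(e);
    span.setStatus({ code: SPAN_STATUS_ERROR, message: e.message });
    throw e;
  } finally {
    span.end(); // guaranteed: no orphan spans
  }
}
```

Both the tool-execution guard and the iteration-loop guard listed in the commit notes follow this shape, which is what makes the "all spans end" claim hold on error paths.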
Multi-turn conversations
Each ask() call creates an independent trace. Multi-turn conversations are correlated via ask_forge.session.id on the root span. This matches the industry standard (Langfuse sessions, LangSmith threads, OpenLLMetry association properties).

Follow-up: #56 — adopt gen_ai.conversation.id from the OTel GenAI semantic conventions (v1.40.0) for spec-compliant conversation tracking.

Files changed
- src/tracing.ts — NEW: OTel span helpers with GenAI semantic conventions
- src/session.ts — MODIFIED: instrumented at 4 integration points
- test/tracing.test.ts — NEW: 20 tests with in-memory TracerProvider
- README.md — MODIFIED: added Observability section with setup guide and metrics table
- package.json — MODIFIED: added @opentelemetry/api dependency

Consumer setup
By default, tracing is a zero-overhead no-op (no console output, no network calls). To enable it, the consumer installs an OTel SDK and registers an exporter before calling ask():
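A minimal sketch, assuming the standard OTel JS SDK packages and Langfuse's OTLP ingestion endpoint — the URL, auth header shape, and key placeholders here come from Langfuse's docs rather than this PR, so check your backend's documentation before copying:

```typescript
// Hypothetical consumer setup — register an OTel SDK before calling ask().
// Any OTLP-capable backend works; Langfuse is shown as one example.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // Langfuse's OTLP traces endpoint (per their docs; may differ by version).
    url: 'https://cloud.langfuse.com/api/public/otel/v1/traces',
    headers: {
      // Basic auth from your Langfuse public/secret key pair (placeholders).
      Authorization: 'Basic ' + Buffer.from('pk-...:sk-...').toString('base64'),
    },
  }),
});
sdk.start();

// From here on, ask() spans are exported; without this setup they are no-ops.
```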