Skip to content

Conversation

@bouwkast
Copy link
Contributor

@bouwkast bouwkast commented Oct 17, 2025

What does this PR do?

This was generated with help from Claude Code; please review accordingly.

Introduces custom exception classes for the profiler's C code to safely distinguish between constant error
messages (safe for telemetry) and dynamic content (excluded from telemetry), with compile-time enforcement to
prevent accidental PII leakage.

Implementation:
Two exception types:

  • Datadog::Profiling::ProfilingError - Constant messages → included in telemetry for aggregation
  • Datadog::Profiling::ProfilingInternalError - Dynamic content (libdatadog errors, system state) → excluded from
    telemetry, preserved locally for debugging

Compile-time safe macros using libdatadog's string concatenation technique:

#define RAISE_PROFILING_TELEMETRY_SAFE(msg) \
  rb_raise(datadog_profiling_constant_error_class, "" msg)

#define RAISE_PROFILING_TELEMETRY_UNSAFE(fmt, ...) \
  rb_raise(datadog_profiling_dynamic_error_class, fmt, ##__VA_ARGS__)

The "" msg trick ensures compilation fails if a non-literal string is passed to TELEMETRY_SAFE, preventing
accidental inclusion of dynamic content in telemetry.

Updated 51 error sites across 18 C files (26 constant → TELEMETRY_SAFE, 25 dynamic → TELEMETRY_UNSAFE) and
modified telemetry logging to selectively include only ProfilingError messages.

Motivation:

Following #4985 which removed the pii_safe parameter, we need to ensure profiler error messages in telemetry
contain only known-constant strings for compliance with intake requirements.

The profiler communicates errors to telemetry (Ruby code) via exception messages. Using exception types to distinguish safe vs. unsafe content allows the telemetry layer to filter appropriately.

The compile-time enforcement addresses maintainability concerns by making it impossible to accidentally pass
dynamic content to telemetry-safe exceptions through copy-paste errors or misunderstanding.

Change log entry

None.

Additional Notes:

This approach enables telemetry fingerprinting and aggregation for constant error messages while maintaining full
debugging context locally for dynamic errors. The two-tier system provides semantic clarity about what's safe for
telemetry vs. what should remain local-only.

Special handling for private_vm_api_access.c which cannot include ruby_helpers.h due to header conflicts with
private VM APIs—uses manual extern declarations with explanatory comments.

How to test the change?

Existing test coverage validates:

  • ProfilingError messages are included in telemetry (spec/datadog/core/telemetry/logging_spec.rb)
  • ProfilingInternalError messages are excluded from telemetry
  • Both exception types can be raised from C code
  • Telemetry filtering logic works correctly with both types

Compile-time enforcement can be verified by attempting to pass a variable to RAISE_PROFILING_TELEMETRY_SAFE
(compilation will fail as expected).

All CI tests passing.

@github-actions github-actions bot added core Involves Datadog core libraries profiling Involves Datadog profiling labels Oct 17, 2025
@bouwkast bouwkast changed the base branch from master to steven/error-logs-remediation October 17, 2025 18:07
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Oct 17, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

tests.parametric.test_config_consistency.Test_Stable_Config_Default.test_invalid_files[/etc/datadog-agent/application_monitoring.yaml, parametric-ruby] from system_tests_suite (Datadog)
Timeout of 60 seconds exceeded waiting for HTTP server to start. Please check logs.

ℹ️ Info

🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 100.00%
Total Coverage: 98.37% (-0.05%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: c31665f | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@lloeki lloeki force-pushed the steven/error-logs-remediation branch from f40e845 to c8670d6 Compare October 21, 2025 13:49
Base automatically changed from steven/error-logs-remediation to master October 21, 2025 14:40
@bouwkast bouwkast force-pushed the steven/error-logs-remediation-custom-profiler-code branch from 8550883 to 3d908d7 Compare November 7, 2025 20:10
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

Typing analysis

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 1 partially typed method, and clears 1 partially typed method. It increases the percentage of typed methods from 54.67% to 54.7% (+0.03%).

Partially typed methods (+1-1)Introduced:
sig/datadog/profiling.rbs:20
└── def self.try_reading_skipped_reason_file: (?untyped file_api) -> ::String?
Cleared:
sig/datadog/profiling.rbs:14
└── def self.try_reading_skipped_reason_file: (?untyped file_api) -> ::String?

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

Add ruby_helpers.h include to 8 C files that use datadog_profiling_error_class
and datadog_profiling_internal_error_class but were missing the header declaration.

This fixes the compilation error:
  error: 'datadog_profiling_error_class' undeclared

Files fixed:
- clock_id_from_pthread.c
- collectors_gc_profiling_helper.c
- collectors_stack.c
- collectors_thread_context.c
- encoded_profile.c
- libdatadog_helpers.c
- private_vm_api_access.c
- unsafe_api_calls_check.c
Move ruby_helpers.h include after private VM headers to avoid conflicts.
This file requires private VM headers to be included first before any
public Ruby headers, but ruby_helpers.h includes datadog_ruby_common.h
which includes ruby.h, causing header ordering conflicts.

Fixes compilation error: 'expected ')' before '==' token in RHASH_EMPTY_P'
Cannot include ruby_helpers.h in this file as it pulls in public Ruby headers
(via datadog_ruby_common.h) that conflict with private VM headers.

Instead, declare the exception class globals as extern, following the pattern
already established in this file for other declarations.

This fully resolves the header ordering compilation error.
Method was renamed from safe_exception_message to constant_exception_message
but the RBS signature file was not updated, causing Steep type errors.
@pr-commenter
Copy link

pr-commenter bot commented Nov 7, 2025

Benchmarks

Benchmark execution time: 2025-11-07 21:32:47

Comparing candidate commit f04f01d in PR branch steven/error-logs-remediation-custom-profiler-code with baseline commit 9c11f64 in branch master.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 42 metrics, 2 unstable metrics.

scenario:profiling - Allocations (baseline)

  • 🟥 throughput [-323589.970op/s; -311052.662op/s] or [-6.146%; -5.908%]

scenario:tracing - Propagation - Datadog

  • 🟥 throughput [-2946.528op/s; -2871.924op/s] or [-9.301%; -9.065%]

The error method must be public but was accidentally made private when
constant_exception_message was added. Moving it before the private keyword
restores its public visibility.

Fixes test failure: NoMethodError: private method 'error' called
Serialization errors contain dynamic libdatadog content, so they should
raise ProfilingInternalError (not ProfilingError or RuntimeError).

Updated both the Ruby wrapper code and the test expectation to use
ProfilingInternalError consistently.

Fixes test failure expecting ProfilingError but getting RuntimeError.
@bouwkast bouwkast added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label Nov 10, 2025
@bouwkast bouwkast changed the title [DO NOT MERGE] Implement custom exceptions raised from profiler code [WIP] Implement custom exceptions raised from profiler code Nov 10, 2025
@bouwkast bouwkast changed the title [WIP] Implement custom exceptions raised from profiler code Add compile-time safe exception macros for profiler telemetry Nov 14, 2025
@bouwkast bouwkast marked this pull request as ready for review November 14, 2025 19:35
@bouwkast bouwkast requested review from a team as code owners November 14, 2025 19:35
@bouwkast bouwkast requested review from ivoanjo and lloeki November 14, 2025 19:35
@marcotc marcotc self-assigned this Nov 18, 2025
@ivoanjo
Copy link
Member

ivoanjo commented Nov 21, 2025

So I believe this has been superseded by #5076 so should we close this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos core Involves Datadog core libraries profiling Involves Datadog profiling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants