@bruceg
Member

@bruceg bruceg commented Nov 14, 2025

Summary

This change adds support for a configurable send_timeout_secs option to the datadog_agent source. When configured, the source rejects a request with an HTTP 503 response code if the downstream buffer cannot accept the new events within the configured time. It defaults to no timeout, preserving the existing behavior.

Without the timeout, the Datadog Agent will eventually drop the connection itself, which Vector detects and reports as an "Events dropped." error while also incrementing the component_events_dropped_total metric. However, in both cases, whether the connection is dropped or the timeout fires, the Agent will continue to retry the request indefinitely. As such, this change does not alter the end result, only how it is reported and how quickly.
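
To make the behavior above concrete, here is a minimal, synchronous sketch. It is not the code in this PR: it assumes a bounded std channel stands in for the downstream buffer, and the names send_with_timeout, SendError, response_status, and full_buffer_statuses are hypothetical. The 200 and 500 status codes are also assumptions; only the 503 on timeout comes from the description.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical reasons a hand-off to the downstream buffer can fail.
#[derive(Debug, PartialEq)]
enum SendError {
    /// The buffer stayed full for the whole timeout.
    TimedOut,
    /// The receiving side has gone away.
    Closed,
}

/// Try to push `item` into a bounded channel, giving up once `timeout` has elapsed.
/// `None` means "no timeout": keep waiting until there is room.
fn send_with_timeout<T>(
    tx: &SyncSender<T>,
    mut item: T,
    timeout: Option<Duration>,
) -> Result<(), SendError> {
    let deadline = timeout.map(|t| Instant::now() + t);
    loop {
        match tx.try_send(item) {
            Ok(()) => return Ok(()),
            Err(TrySendError::Disconnected(_)) => return Err(SendError::Closed),
            Err(TrySendError::Full(returned)) => {
                if let Some(deadline) = deadline {
                    if Instant::now() >= deadline {
                        return Err(SendError::TimedOut);
                    }
                }
                // Take the item back and poll again shortly.
                item = returned;
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// Map the outcome to an HTTP status code.
/// 503 on timeout is from the PR description; 200 and 500 are assumptions.
fn response_status(result: &Result<(), SendError>) -> u16 {
    match result {
        Ok(()) => 200,
        Err(SendError::TimedOut) => 503,
        Err(SendError::Closed) => 500,
    }
}

/// Send two items into a one-slot buffer that nobody drains.
fn full_buffer_statuses(timeout: Duration) -> [u16; 2] {
    let (tx, _rx) = sync_channel::<&str>(1);
    let first = send_with_timeout(&tx, "first", Some(timeout));
    let second = send_with_timeout(&tx, "second", Some(timeout));
    [response_status(&first), response_status(&second)]
}
```

In this sketch the first send fits in the buffer and the second waits out the timeout before being rejected. Passing None instead of Some(timeout) would make the second send wait indefinitely, which corresponds to the default of no timeout.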

Vector configuration

How did you test this PR?

There are internal unit tests that confirm the behavior with a running source.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@bruceg bruceg added the type: enhancement A value-adding code change that enhances its existing functionality. label Nov 14, 2025
@bruceg bruceg requested review from a team as code owners November 14, 2025 21:06
@bruceg bruceg added the source: datadog_agent Anything `datadog_agent` source related label Nov 14, 2025
@github-actions github-actions bot added domain: topology Anything related to Vector's topology code domain: sources Anything related to the Vector's sources domain: external docs Anything related to Vector's external, public documentation domain: core Anything related to core crates i.e. vector-core, core-common, etc labels Nov 14, 2025
@graphcareful
Contributor

When configured, the source will reject requests with a HTTP 503 response code after the configured time when the downstream buffer cannot accept the new events in time

Can you explain the effect of implementing this change? i.e. What will returning 503 make the agent do, and how would that in turn affect Vector?

@bruceg
Member Author

bruceg commented Nov 14, 2025

Can you explain the effect of implementing this change? i.e. What will returning 503 make the agent do, and how would that in turn affect Vector?

Comment added here, and I will update the docs as @pront suggests.

@bruceg
Member Author

bruceg commented Nov 14, 2025

FWIW the check-events errors are interesting. I copied the new ComponentEventsTimedOut from ComponentEventsDropped, but the original event isn't triggering those errors. The reason is that its impl RegisterInternalEvent is split across two lines due to generic parameters, so the regex in check-events doesn't catch it.
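
As an illustration of how a line-oriented check can miss an impl header that wraps, here is a small sketch. It is not the check-events script: the function names and the sample impl headers used with them are hypothetical, and plain substring matching stands in for the script's regex.

```rust
/// Hypothetical line-oriented check: true only if a single line contains
/// both the trait name and the `for <Event>` clause.
fn found_line_by_line(source: &str, event: &str) -> bool {
    let needle = format!("RegisterInternalEvent for {event}");
    source.lines().any(|line| line.contains(&needle))
}

/// Variant that collapses all whitespace (including line breaks) into single
/// spaces before searching, so a wrapped impl header is still found.
fn found_ignoring_line_breaks(source: &str, event: &str) -> bool {
    let needle = format!("RegisterInternalEvent for {event}");
    let collapsed = source.split_whitespace().collect::<Vec<_>>().join(" ");
    collapsed.contains(&needle)
}
```

Given an illustrative header written as "impl<const X: bool> RegisterInternalEvent" on one line and "for ComponentEventsDropped<X> {" on the next, the first function reports no match while the second finds it.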

@iadjivon

Hi team, added a do not merge label for the docs team. Kindly remove it once it is ready to be reviewed by docs! Thanks and let me know if you have any questions!

@bruceg bruceg requested a review from pront November 14, 2025 22:26
@bruceg
Member Author

bruceg commented Nov 14, 2025

Hi team, added a do not merge label for the docs team. Kindly remove it once it is ready to be reviewed by docs! Thanks and let me know if you have any questions!

I added more docs, so I think that's ready for review.

@bruceg
Member Author

bruceg commented Nov 14, 2025

Regarding the spelling errors on timedout, I'd be happy to change the wording on the metrics that appear to be causing that, but I wanted to make it obvious at first glance that it was Vector causing requests to time out.

@pront
Member

pront commented Nov 17, 2025

Tangential note: this timeout mechanism could and probably should be extended to other HTTP-based sources (most importantly http_server source).

@bruceg
Member Author

bruceg commented Nov 17, 2025

Tangential note: this timeout mechanism could and probably should be extended to other HTTP-based sources (most importantly http_server source).

Absolutely. That's why the core of this mechanism lives in the common code and not just in the datadog_agent source.

@bruceg bruceg enabled auto-merge November 17, 2025 14:47
@bruceg bruceg added this pull request to the merge queue Nov 17, 2025
Merged via the queue into master with commit 9c3e7ee Nov 17, 2025
45 checks passed
@bruceg bruceg deleted the bruceg/datadog-agent-send-timeout branch November 17, 2025 17:17
Labels

domain: core Anything related to core crates i.e. vector-core, core-common, etc domain: external docs Anything related to Vector's external, public documentation domain: sources Anything related to the Vector's sources domain: topology Anything related to Vector's topology code source: datadog_agent Anything `datadog_agent` source related type: enhancement A value-adding code change that enhances its existing functionality.

5 participants