alexeyleping

Add delay monitoring to ThreadPoolTaskScheduler

Fixes #33856

Problem

With its default pool size of 1, ThreadPoolTaskScheduler silently suffers from thread starvation when multiple tasks are scheduled: a single long-running task delays every other task, and nothing is logged. This leads to production issues that are hard to diagnose.
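
The starvation described above can be reproduced without Spring. The following is a minimal sketch that uses a plain `java.util.concurrent.ScheduledThreadPoolExecutor` as a stand-in for the scheduler; the `StarvationDemo` class and its method name are hypothetical and are not part of this PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; it is not part of Spring or of this PR.
final class StarvationDemo {

    // Returns how late (in ms) a task that is due immediately actually starts
    // while another task occupies one pool thread for blockingMs.
    static long measureStartDelayMs(int poolSize, long blockingMs) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(poolSize);
        try {
            CountDownLatch blockerStarted = new CountDownLatch(1);
            CountDownLatch secondTaskRan = new CountDownLatch(1);
            AtomicLong startedAtNanos = new AtomicLong();

            // Long-running task that holds one thread.
            executor.execute(() -> {
                blockerStarted.countDown();
                try {
                    Thread.sleep(blockingMs);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            blockerStarted.await();

            // Second task is due right now; record when it really starts.
            long dueAtNanos = System.nanoTime();
            executor.schedule(() -> {
                startedAtNanos.set(System.nanoTime());
                secondTaskRan.countDown();
            }, 0, TimeUnit.MILLISECONDS);
            secondTaskRan.await();

            return TimeUnit.NANOSECONDS.toMillis(startedAtNanos.get() - dueAtNanos);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while measuring", ex);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size 1: " + measureStartDelayMs(1, 400) + " ms late");
        System.out.println("pool size 2: " + measureStartDelayMs(2, 400) + " ms late");
    }
}
```

With one thread the second task has to wait for the blocking task to finish; with two threads it starts almost immediately. No exception or log line is produced in either case, which is the gap this PR aims to close.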

Solution

Added automatic delay monitoring that:

  • Detects when tasks are delayed due to pool exhaustion
  • Logs actionable warnings with diagnostic metrics
  • Includes circuit breaker for self-protection
  • Exposes JMX management interface for runtime control
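
One way such detection could work is to inspect the executor's delay queue: a queued task whose remaining delay is negative is already overdue. The sketch below illustrates that idea on a plain `ScheduledThreadPoolExecutor`; it is an assumption about the approach, not the code in this PR, and `DelayProbe`, `Snapshot`, and `inspect` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Delayed;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; it only illustrates the detection idea.
final class DelayProbe {

    // Diagnostic metrics similar to those shown in the example warning.
    static final class Snapshot {
        final int delayedTasks;
        final long maxDelayMs;
        final int poolSize;
        final int activeThreads;
        final int queueSize;

        Snapshot(int delayedTasks, long maxDelayMs, int poolSize, int activeThreads, int queueSize) {
            this.delayedTasks = delayedTasks;
            this.maxDelayMs = maxDelayMs;
            this.poolSize = poolSize;
            this.activeThreads = activeThreads;
            this.queueSize = queueSize;
        }

        String toWarning() {
            return delayedTasks + " scheduled tasks are delayed (max delay: " + maxDelayMs
                    + "ms). Pool size: " + poolSize + ", Active threads: " + activeThreads
                    + ", Queue size: " + queueSize + ".";
        }
    }

    // Counts queued tasks that are overdue by at least thresholdMs.
    static Snapshot inspect(ScheduledThreadPoolExecutor executor, long thresholdMs) {
        int delayed = 0;
        long maxDelayMs = 0;
        for (Runnable queued : executor.getQueue()) {
            if (queued instanceof Delayed) {
                // getDelay() is negative once the task is past its due time.
                long overdueMs = -((Delayed) queued).getDelay(TimeUnit.MILLISECONDS);
                if (overdueMs >= thresholdMs) {
                    delayed++;
                    maxDelayMs = Math.max(maxDelayMs, overdueMs);
                }
            }
        }
        return new Snapshot(delayed, maxDelayMs, executor.getCorePoolSize(),
                executor.getActiveCount(), executor.getQueue().size());
    }

    // Blocks a single-thread pool, queues three due tasks, then inspects it.
    static Snapshot demo() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        try {
            CountDownLatch blockerStarted = new CountDownLatch(1);
            executor.execute(() -> {
                blockerStarted.countDown();
                try {
                    Thread.sleep(600);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            blockerStarted.await();
            for (int i = 0; i < 3; i++) {
                executor.schedule(() -> { }, 0, TimeUnit.MILLISECONDS);
            }
            Thread.sleep(150);
            return inspect(executor, 50);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during demo", ex);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Iterating the queue does not block the worker threads, but the counts are only a point-in-time approximation, so a real monitor would presumably sample periodically rather than on every task.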

Key Features

  • Lock-free design (CAS operations, atomic variables)
  • Zero contention with scheduler threads
  • Configurable thresholds and intervals
  • Structured logging support (JSON)
  • CPU self-monitoring
  • Fully backward compatible
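
As an illustration of what a lock-free design with CAS operations can look like, here is a minimal sketch of a warning rate limiter built on a single `AtomicLong`. It is not the PR's implementation; the `WarningRateLimiter` class and its `tryAcquire` method are hypothetical, and the clock value is passed in so the behavior is easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free rate limiter; not the class used in this PR.
final class WarningRateLimiter {

    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalMs;
    private final AtomicLong lastWarningMs = new AtomicLong(NEVER);

    WarningRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true if the caller may log a warning at time nowMs.
    // At most one caller wins per interval, without taking any lock.
    boolean tryAcquire(long nowMs) {
        long last = lastWarningMs.get();
        if (last != NEVER && nowMs - last < minIntervalMs) {
            return false; // still inside the quiet period
        }
        // If another thread updated the timestamp first, the CAS fails
        // and this caller simply skips its warning.
        return lastWarningMs.compareAndSet(last, nowMs);
    }
}
```

A caller would guard its log statement with something like `if (limiter.tryAcquire(System.currentTimeMillis()))`. A thread that loses the race never blocks or retries, which is what keeps the monitor out of the scheduler threads' way.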

Example Warning

WARN o.s.s.c.ThreadPoolTaskScheduler - 5 scheduled tasks are delayed
(max delay: 1488ms) due to thread pool exhaustion. Pool size: 1,
Active threads: 1, Queue size: 5. Consider increasing the pool size
via ThreadPoolTaskScheduler.setPoolSize() or
spring.task.scheduling.pool.size property.

Testing

Verified against a production-like workload using a demo application. Edge cases covered include concurrent modifications, circuit breaker transitions, and error recovery.

  ThreadPoolTaskScheduler now monitors scheduled tasks and logs warnings
  when tasks are delayed due to insufficient pool size. This helps diagnose
  thread starvation issues and suggests increasing pool size or enabling
  virtual threads.

  Closes gh-33856

  Address production blockers in delay monitoring feature from #33856:
  - Fix race conditions in circuit breaker using AtomicReference with CAS
  - Fix memory leak in sliding window rate limiter with bounded queue
  - Make stopDelayMonitor() lock-free to prevent deadlock
  - Add bounds check for warningRateLimitMs (max 24 hours)
  - Fix resetWarningRateLimit() to clear sliding window queue
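
The first fix in the list above, guarding circuit breaker transitions with an `AtomicReference` and CAS, could look roughly like the following sketch. The state names, the failure threshold, and the `MonitorCircuitBreaker` class are assumptions made for illustration; the PR's actual states and API are not shown in this description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical circuit breaker; states and method names are illustrative.
final class MonitorCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final int failureThreshold;

    MonitorCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    State state() {
        return state.get();
    }

    // A failed probe reopens the breaker; otherwise it opens after
    // failureThreshold consecutive failures.
    void recordFailure() {
        if (state.compareAndSet(State.HALF_OPEN, State.OPEN)) {
            return;
        }
        if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
            state.compareAndSet(State.CLOSED, State.OPEN);
        }
    }

    // A success resets the failure count and closes a half-open breaker.
    void recordSuccess() {
        consecutiveFailures.set(0);
        state.compareAndSet(State.HALF_OPEN, State.CLOSED);
    }

    // Only one thread can win the OPEN -> HALF_OPEN transition and probe.
    boolean tryHalfOpen() {
        return state.compareAndSet(State.OPEN, State.HALF_OPEN);
    }
}
```

Because every transition names the state it expects to leave, two threads racing on the same transition cannot both succeed, which is the property a check-then-set on a plain field would not give.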

  Add 7 concurrency tests covering race conditions, memory leaks, and
  deadlock scenarios. All 24 tests pass (100%).

  Implementation now uses lock-free algorithms throughout for thread
  safety without performance degradation.

  See gh-33856
spring-projects-issues added the "status: waiting-for-triage" label (an issue we've not yet triaged or decided on) on Oct 12, 2025
sbrannen added the "in: core" label (issues in core modules: aop, beans, core, context, expression) on Oct 12, 2025

Development

Successfully merging this pull request may close these issues.

Emit warning when ThreadPoolTaskScheduler is unable to meet task delay
