feat: Temporal tracking of dice in video frames #28

@samueil

Description

Currently, each video frame is independently evaluated against the TensorFlow model, which can lead to unstable detections due to motion blur, lighting changes, or transient model errors.

This issue proposes implementing temporal smoothing by maintaining a rolling buffer of detection history to stabilize dice recognition.

What needs to be done:

  • Maintain a rolling frame history buffer (10-15 frames at 2 FPS capture rate ≈ 5-7.5 seconds of history)
  • Track dice across frames using spatial proximity (Euclidean distance between bounding boxes)
  • Implement confidence voting logic: if a die at position (x, y) is detected as "4" in 10+ out of 15 frames, but shows as "5" in 1-2 frames due to noise, lock in the "4" result
  • Reduce false positives by requiring temporal consistency rather than single-frame confidence
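As a starting point, the buffer described above could be sketched roughly as follows. This is a minimal sketch, not the actual implementation: the `DiceDetection` shape mirrors the fields the issue mentions (x, y, width, height, confidence) plus an assumed `value` field for the recognized face, and `TrackedDie`, `pushDetection`, and `MAX_HISTORY` are hypothetical names.

```typescript
// Assumed shape, matching the fields named in this issue; `value` is the
// recognized die face and is an assumption for this sketch.
interface DiceDetection {
  x: number;
  y: number;
  width: number;
  height: number;
  confidence: number;
  value: number;
}

// 15 frames at the 2 FPS capture rate ≈ 7.5 seconds of history.
const MAX_HISTORY = 15;

// Each tracked die keeps its last known position (for spatial matching)
// and a rolling buffer of its most recent recognized values.
interface TrackedDie {
  id: number;
  last: DiceDetection;
  history: number[]; // newest value last
}

// Append a new detection, dropping the oldest entry once the buffer is full.
function pushDetection(die: TrackedDie, det: DiceDetection): void {
  die.last = det;
  die.history.push(det.value);
  if (die.history.length > MAX_HISTORY) {
    die.history.shift();
  }
}
```

A fixed-size buffer like this keeps memory per die bounded regardless of how long the die stays in frame.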

Technical approach:

  • Leverage the existing DiceDetection interface (which already tracks x, y, width, height, confidence)
  • Add a time-series buffer to the frame processor state
  • Implement spatial matching using Euclidean distance: sqrt((x2-x1)² + (y2-y1)²) to track die identity across frames
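The spatial matching step above could look something like this. A hedged sketch only: `MATCH_THRESHOLD`, `centerDistance`, and `matchToPrevious` are hypothetical names, and the threshold value is a placeholder that would need tuning against real footage.

```typescript
// Euclidean distance between two box origins: sqrt((x2-x1)² + (y2-y1)²).
function centerDistance(
  a: { x: number; y: number },
  b: { x: number; y: number }
): number {
  return Math.hypot(b.x - a.x, b.y - a.y);
}

// Placeholder threshold in pixels; should be configurable per the
// acceptance criteria and tuned to dice size and camera resolution.
const MATCH_THRESHOLD = 40;

// Returns the index of the nearest previous detection within the threshold,
// or -1 if no prior detection is close enough (i.e. the die is new).
function matchToPrevious(
  det: { x: number; y: number },
  previous: { x: number; y: number }[]
): number {
  let best = -1;
  let bestDist = MATCH_THRESHOLD;
  previous.forEach((p, i) => {
    const dist = centerDistance(det, p);
    if (dist <= bestDist) {
      bestDist = dist;
      best = i;
    }
  });
  return best;
}
```

Greedy nearest-neighbour matching like this is the simplest option; if dice frequently sit close together, a globally optimal assignment (e.g. Hungarian algorithm) may be worth considering later.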

Acceptance Criteria

  • Frame history buffer stores the last 10-15 detections per die
  • Dice are matched across consecutive frames using spatial proximity (configurable distance threshold)
  • Final die value is determined by majority vote from the frame buffer history
  • Single-frame noise/outliers are filtered out when consistent detections exist in history
  • Detection output remains stable with reduced jitter compared to single-frame processing
  • Frame processor continues to operate at 2 FPS without performance degradation
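The majority-vote criterion could be sketched as below, assuming a history buffer of recognized values per die (the `majorityVote` name is hypothetical). With this logic, 1-2 noisy "5" readings cannot flip a result backed by 10+ "4" readings.

```typescript
// Pick the value seen most often in the rolling history buffer,
// or null when there is no history yet.
function majorityVote(history: number[]): number | null {
  if (history.length === 0) return null;
  const counts = new Map<number, number>();
  for (const v of history) {
    counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  let bestValue: number | null = null;
  let bestCount = 0;
  for (const [value, count] of counts) {
    if (count > bestCount) {
      bestCount = count;
      bestValue = value;
    }
  }
  return bestValue;
}
```

The counting pass is O(n) over a buffer of at most 15 entries per die, so it should have no measurable impact on the 2 FPS budget.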

Labels

enhancement (New feature or request)
