fix(calls): stop auto leave/re-enter on remote media stall#2307
fix(calls): stop auto leave/re-enter on remote media stall#2307webguru-hypha wants to merge 27 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds remote-feed health helpers, pairwise call republish/hangup utilities with a debounced scheduler, video-surface readiness/paint predicates, integrates an in-call soft repair/pairwise restart into useSpaceGroupCall, and updates tests, telemetry, global context wiring, and TURN runbook. ChangesRemote Media Stall Detection & Recovery
Sequence DiagramsequenceDiagram
participant Eval as Stall Evaluator
participant Group as GroupCall
participant Pairwise as Pairwise Calls
participant Local as Local Media
participant TURN as TURN Probe
Eval->>Group: collectRemoteCallFeedUserIds / listUnhealthyRemoteCallUserIds
Eval->>Pairwise: determine targets for restart
Eval->>Local: schedule republish (debounced)
Eval->>Pairwise: hangupPairwiseCallsForRemoteUsers(targets)
Local->>Pairwise: republishLocalMediaToPairwiseCalls()
Eval->>TURN: re-probe TURN readiness
Eval->>Eval: re-run evaluation after repair
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🔗 Custom preview URL: https://pr-2307.preview-app.hypha.earth |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/epics/src/common/global-call-dock-context.tsx`:
- Around line 606-609: The useCallback for retryRemoteMediaConnection currently
lists only [call.retryRemoteMediaConnection] but may recreate or close over
stale values; ensure useSpaceGroupCall provides a stable
retryRemoteMediaConnection or change the dependency to a stable reference (e.g.,
include the whole call object `call` returned by useSpaceGroupCall or memoize
the retry function) and also include `resumeCallPlayback` in the dependency
array if resumeCallPlayback is not already stable; update the dependency array
and/or memoize the functions so retryRemoteMediaConnection and
resumeCallPlayback references remain stable and avoid unnecessary recreations or
stale closures.
In
`@packages/epics/src/common/human-chat-panel/__tests__/call-feed-tile-video.test.ts`:
- Around line 154-183: Add positive unit tests for
shouldMarkCallFeedVideoSurfaceReady and shouldPaintCallFeedVideoSurface: (1) for
shouldMarkCallFeedVideoSurfaceReady, add a case where videoWidth/videoHeight > 0
with muted: true and readyState: 2 and assert it returns true; (2) for
shouldPaintCallFeedVideoSurface, add a case with liveVideoTrack unmuted
(mockTrack({ muted: false })) and videoSurfaceReady true and assert it returns
true; (3) for shouldPaintCallFeedVideoSurface, add a case for a local feed
(isLocalFeed: true) with liveVideoTrack muted but readyState set to 'live' and
assert it returns true; locate test helpers/mockTrack and the functions
shouldMarkCallFeedVideoSurfaceReady and shouldPaintCallFeedVideoSurface to
implement these assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eaa82c0c-262b-4cbb-8235-072405c7779d
📒 Files selected for processing (12)
packages/core/src/matrix/client/hooks/__tests__/call-pairwise-restart.test.tspackages/core/src/matrix/client/hooks/__tests__/call-pairwise-retry-env.test.tspackages/core/src/matrix/client/hooks/__tests__/remote-call-media-stall.test.tspackages/core/src/matrix/client/hooks/call-pairwise-restart.tspackages/core/src/matrix/client/hooks/call-pairwise-retry-env.tspackages/core/src/matrix/client/hooks/remote-call-media-stall.tspackages/core/src/matrix/client/hooks/space-group-call-telemetry.tspackages/core/src/matrix/client/hooks/use-space-group-call.tspackages/epics/src/common/global-call-dock-context.tsxpackages/epics/src/common/human-chat-panel/__tests__/call-feed-tile-video.test.tspackages/epics/src/common/human-chat-panel/call-feed-tile-video.tspackages/epics/src/common/human-chat-panel/human-chat-panel-call-stage.tsx
Autofix Review CompletedFixed 2 file(s) from 2 unresolved CodeRabbit review comment(s) — one commit per comment. Commits:
Validation: Branch: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/matrix/client/hooks/use-space-group-call.ts (1)
865-871:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not clear pairwise republish queue from bootstrap-timer cleanup.
Line 870 couples two unrelated timer domains. In the join path, republish is queued on Line 2633 and then canceled when
startLocalMediaBootstrapSeries()callsclearLocalMediaBootstrapTimers()(Lines 1818-1823). That drops the intended post-join republish pass.Move
clearPairwiseRepublishSchedule()to teardown-only paths (e.g.,runCleanup) and keepclearLocalMediaBootstrapTimers()scoped to bootstrap timers.Suggested patch
const clearLocalMediaBootstrapTimers = useCallback(() => { if (localMediaBootstrapDebounceRef.current != null) { clearTimeout(localMediaBootstrapDebounceRef.current); localMediaBootstrapDebounceRef.current = null; } - clearPairwiseRepublishSchedule(); if (localMediaBootstrapTimerRefs.current.length > 0) { for (const id of localMediaBootstrapTimerRefs.current) { clearTimeout(id); } localMediaBootstrapTimerRefs.current = []; @@ stopPlaceOutgoingPeriodicNudge(placeOutgoingPeriodicIntervalRef); clearLocalMediaBootstrapTimers(); + clearPairwiseRepublishSchedule(); webRtcDiagCleanupRef.current?.();Also applies to: 1818-1823, 2631-2655
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/matrix/client/hooks/use-space-group-call.ts` around lines 865 - 871, clearLocalMediaBootstrapTimers is incorrectly clearing pairwise republish state; remove the call to clearPairwiseRepublishSchedule() from clearLocalMediaBootstrapTimers (and any bootstrap-timer cleanup paths invoked by startLocalMediaBootstrapSeries) so bootstrap timers only clear timer refs, and instead invoke clearPairwiseRepublishSchedule() only from teardown/cleanup flows such as runCleanup (and other teardown-only paths) so the post-join republish pass queued around startLocalMediaBootstrapSeries is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.ts`:
- Around line 49-52: The timer callback currently calls
flushScheduledPairwiseRepublish() without handling rejections, which can surface
unhandled promise errors; update the setTimeout callback that sets
republishDebounceTimer to call flushScheduledPairwiseRepublish().catch(...) (or
use try/catch with an async IIFE) and log or swallow the error so scheduler
failures are contained; locate the callback that references
republishDebounceTimer, republishPendingDelayMs, and
flushScheduledPairwiseRepublish and add a rejection handler to prevent unhandled
promise rejections.
---
Outside diff comments:
In `@packages/core/src/matrix/client/hooks/use-space-group-call.ts`:
- Around line 865-871: clearLocalMediaBootstrapTimers is incorrectly clearing
pairwise republish state; remove the call to clearPairwiseRepublishSchedule()
from clearLocalMediaBootstrapTimers (and any bootstrap-timer cleanup paths
invoked by startLocalMediaBootstrapSeries) so bootstrap timers only clear timer
refs, and instead invoke clearPairwiseRepublishSchedule() only from
teardown/cleanup flows such as runCleanup (and other teardown-only paths) so the
post-join republish pass queued around startLocalMediaBootstrapSeries is
preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dbe7f399-bb88-4513-9cc9-3aa50678094e
📒 Files selected for processing (10)
packages/core/src/matrix/client/hooks/__tests__/call-pairwise-republish-schedule.test.tspackages/core/src/matrix/client/hooks/__tests__/call-pairwise-restart.test.tspackages/core/src/matrix/client/hooks/__tests__/remote-call-media-stall.test.tspackages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.tspackages/core/src/matrix/client/hooks/call-pairwise-restart.tspackages/core/src/matrix/client/hooks/remote-call-media-stall.tspackages/core/src/matrix/client/hooks/use-space-group-call.tspackages/epics/src/common/global-call-dock-context.tsxpackages/epics/src/common/human-chat-panel/__tests__/call-feed-tile-video.test.tspackages/epics/src/common/human-chat-panel/human-chat-panel-call-stage.tsx
Autofix Review CompletedFixed 1 unresolved CodeRabbit comment. Commit: Skipped (already resolved): Validation: Branch: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.ts (1)
33-35: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winUse proper
GroupCalltype instead ofunknown.Per coding guidelines, TypeScript strict mode compliance requires avoiding untyped parameters. The
gcparameter is alwaysMatrixSdk.GroupCallat all call sites inuse-space-group-call.ts. Usingunknownhere bypasses type checking without providing safety benefits since the value is passed through without narrowing.♻️ Suggested fix
+import type { GroupCall } from 'matrix-js-sdk'; import { republishLocalMediaToPairwiseCalls } from './call-pairwise-restart'; /** Avoid SDP renegotiation races while ICE is still connecting. */ const REPUBLISH_DEBOUNCE_MS = 2_000; const REPUBLISH_MIN_INTERVAL_MS = 4_000; let republishDebounceTimer: ReturnType<typeof setTimeout> | null = null; -let republishPendingGc: unknown = null; +let republishPendingGc: GroupCall | null = null; let republishPendingForce = false; let republishPendingDelayMs = REPUBLISH_DEBOUNCE_MS; let lastRepublishCompletedAt = 0; // ... export function scheduleRepublishLocalMediaToPairwiseCalls( - gc: unknown, + gc: GroupCall, options?: { force?: boolean; delayMs?: number }, ): void {Based on learnings: "always flag usage of TypeScript
any... Suggest properly typed alternatives such as inferring types from Zod schemas or defining explicit TypeScript interfaces/types."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.ts` around lines 33 - 35, The gc parameter in scheduleRepublishLocalMediaToPairwiseCalls is typed as unknown; replace it with the concrete MatrixSdk.GroupCall type used by call sites (e.g., in use-space-group-call.ts) so the function signature becomes strongly typed and TypeScript strict mode can validate usages; update the parameter type and any related local references/exports to use MatrixSdk.GroupCall (or the appropriate imported alias) and add the necessary import for MatrixSdk if not already present.Sources: Coding guidelines, Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/operations/matrix-turn-hostinger-runbook.md`:
- Around line 320-324: Update the fenced code block that currently shows the
example log output to include a language specifier; replace the opening
backticks with ```text so the block begins with "```text" and retains the same
contents (the lines "incoming packet ALLOCATE processed, success", "incoming
packet CREATE_PERMISSION processed, error 403: Forbidden IP", and "peer usage:
... rp=0, rb=0, sp=0, sb=0") to follow markdown best practices for the log
example.
---
Outside diff comments:
In `@packages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.ts`:
- Around line 33-35: The gc parameter in
scheduleRepublishLocalMediaToPairwiseCalls is typed as unknown; replace it with
the concrete MatrixSdk.GroupCall type used by call sites (e.g., in
use-space-group-call.ts) so the function signature becomes strongly typed and
TypeScript strict mode can validate usages; update the parameter type and any
related local references/exports to use MatrixSdk.GroupCall (or the appropriate
imported alias) and add the necessary import for MatrixSdk if not already
present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8d0cc607-ef4a-4f88-90c7-178a34cb9d28
📒 Files selected for processing (3)
docs/operations/matrix-turn-hostinger-runbook.mdpackages/core/src/matrix/client/hooks/call-pairwise-republish-schedule.tspackages/core/src/matrix/client/hooks/use-space-group-call.ts
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/epics/src/common/human-chat-panel/call-feed-tile-video.ts (1)
122-141: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueUnused
isLocalFeedparameter.The
isLocalFeedparameter is destructured but never used in the function body. This appears intentional per the behavior change, but the dead code could confuse maintainers.♻️ Suggested fix: prefix with underscore to signal intentional omission
export function shouldPaintCallFeedVideoSurface(options: { hasVideo: boolean; warmingVideoTrack: boolean; videoSurfaceReady: boolean; liveVideoTrack: Pick<MediaStreamTrack, 'muted' | 'readyState'> | null; - isLocalFeed: boolean; + isLocalFeed: boolean; // Kept for API compatibility; currently unused }): boolean { const { hasVideo, warmingVideoTrack, videoSurfaceReady, liveVideoTrack, - isLocalFeed, + isLocalFeed: _isLocalFeed, } = options;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/epics/src/common/human-chat-panel/call-feed-tile-video.ts` around lines 122 - 141, The parameter isLocalFeed in shouldPaintCallFeedVideoSurface is intentionally unused and should be marked as such; update the function signature and its options destructuring to rename isLocalFeed to _isLocalFeed (preserving its type in the options shape) so the unused parameter is explicit, and adjust any references in the same file to match the new name if present; do not change function behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/epics/src/common/human-chat-panel/call-feed-tile-video.ts`:
- Around line 122-141: The parameter isLocalFeed in
shouldPaintCallFeedVideoSurface is intentionally unused and should be marked as
such; update the function signature and its options destructuring to rename
isLocalFeed to _isLocalFeed (preserving its type in the options shape) so the
unused parameter is explicit, and adjust any references in the same file to
match the new name if present; do not change function behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 727884c3-23a1-4c59-af1b-b9ca2f1effe1
📒 Files selected for processing (7)
docs/operations/matrix-turn-hostinger-runbook.mdpackages/core/src/matrix/client/hooks/__tests__/call-pairwise-restart.test.tspackages/core/src/matrix/client/hooks/call-pairwise-restart.tspackages/core/src/matrix/client/hooks/space-group-call-telemetry.tspackages/core/src/matrix/client/hooks/use-space-group-call.tspackages/epics/src/common/human-chat-panel/__tests__/call-feed-tile-video.test.tspackages/epics/src/common/human-chat-panel/call-feed-tile-video.ts
The 70s auto-recover ran full GroupCall.leave() + re-enter, which dropped screenshare, bounced the UI, and caused disconnect cascades between peers. Retry connection now only nudges pairwise WebRTC and republishes local tracks. Stall detection also treats inbound screenshare feeds as connected media. Co-authored-by: Cursor <cursoragent@cursor.com>
Stall recovery now treats remote feeds with no live audio/video tracks as unhealthy, not only participants missing a CallFeed entirely. Also nudge placeOutgoingCalls at 20s by default to recover slow pairwise WebRTC setup. Co-authored-by: Cursor <cursoragent@cursor.com>
Inbound video can work while outbound audio never publishes — stall detection now monitors local mic/camera health too. After 25s we hang up only the stuck pairwise MatrixCall(s) and re-place, instead of only nudging placeOutgoing. Manual retry does the same for all peers and resumes audio playback. Co-authored-by: Cursor <cursoragent@cursor.com>
Unblocks check-types CI after introducing hypha.group_call.pairwise_restart logging in the call reconnect stability work. Co-authored-by: Cursor <cursoragent@cursor.com>
GroupCall.updateLocalUsermediaStream only updates the local feed — when the callee answers before camera warms up, the caller never receives video. Push tracks into each MatrixCall on bootstrap, CallsChanged, and stall recovery. Co-authored-by: Cursor <cursoragent@cursor.com>
Throttle pairwise stream republish so bootstrap no longer re-pushes local A/V every few hundred ms. Hide video surfaces when tracks are muted with no frames so tiles fall back to avatar instead of black. Co-authored-by: Cursor <cursoragent@cursor.com>
Global stream fingerprint dedup skipped new MatrixCall peers and camera unmute events. Track publish state per call and restore bootstrap republish so remote tiles receive video without the prior churn loop. Co-authored-by: Cursor <cursoragent@cursor.com>
Destructure the memoized retry fn from useSpaceGroupCall so the dock wrapper closes over a stable reference instead of the whole call object. Co-authored-by: Cursor <cursoragent@cursor.com>
Cover muted tracks with dimensions, unmuted remote paint, and local camera warming paths for shouldMarkCallFeedVideoSurfaceReady and shouldPaintCallFeedVideoSurface. Co-authored-by: Cursor <cursoragent@cursor.com>
Voice processing rebuild dropped video tracks still in `new` state after enter(), leaving both sides without media. Wait for camera readiness on video join, republish after audio stream swap, and only show voice waves when a live audio track exists. Co-authored-by: Cursor <cursoragent@cursor.com>
Debounce pairwise stream republish, delay restarts to 45s, and treat warming muted tracks as healthy so hangup/republish loops do not trigger setRemoteDescription failures seen in the browser console. Co-authored-by: Cursor <cursoragent@cursor.com>
Prevent unhandled promise rejections when the debounced flush fails. Co-authored-by: Cursor <cursoragent@cursor.com>
Push camera tracks into active MatrixCall sessions on video join and camera on; slow stall repair nudges to reduce negotiation glare. Document coturn CREATE_PERMISSION 403 Forbidden IP in the Hostinger runbook. Co-authored-by: Cursor <cursoragent@cursor.com>
Audio-first sessions need a full pairwise re-place to negotiate video; also bind remote camera tracks while frames warm up to avoid black tiles. Co-authored-by: Cursor <cursoragent@cursor.com>
CodeRabbit: use ```text for the CREATE_PERMISSION symptom block in the Hostinger TURN runbook. Co-authored-by: Cursor <cursoragent@cursor.com>
Pairwise republish now passes forceAudio/forceVideo to MatrixCall.updateLocalUsermediaStream and upgrades calls that still lack a video sender, so camera tracks are not left disabled after audio-first negotiation. Co-authored-by: Cursor <cursoragent@cursor.com>
Probe camera permission before enter, then re-acquire video via SDK setLocalVideoMuted after voice-preset refresh when the local feed stayed audio-only. Adds local video fields to media_snapshot telemetry. Co-authored-by: Cursor <cursoragent@cursor.com>
Restart audio-first pairwise sessions after video join and on CallsChanged so both sides renegotiate with camera tracks; use SDK upgradeCall when the video sender is missing and shorten the stall watchdog to 15s. Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes check-types CI failure from duplicate identifier in SpaceGroupCallTelemetryEvent. Co-authored-by: Cursor <cursoragent@cursor.com>
Republish was racing ahead of placeOutgoingCalls() after hangup, so video tracks never reached new MatrixCall sessions. Also recover local camera when outbound video is unhealthy during stall detection. Co-authored-by: Cursor <cursoragent@cursor.com>
CallsChanged was scheduling a full hangup+re-place every 5s, retriggering negotiation and the warming banner. Cap resync to two attempts per call, republish without force on routine call changes, and keep dismiss state across brief healthy-media flickers. Co-authored-by: Cursor <cursoragent@cursor.com>
Pairwise republish read transient CallFeed mute state during mic recovery and disabled outbound audio. Use GroupCall mute intent instead, restore unmute after recovery, and align voice-preset track enabled state. Co-authored-by: Cursor <cursoragent@cursor.com>
Stable pairwise publish intent avoids disabling outbound audio when mic recovery temporarily flips gc.isMicrophoneMuted(). Force republish on new pairwise sessions and for all video joins regardless of camera state. Co-authored-by: Cursor <cursoragent@cursor.com>
Inbound tracks stay track.muted=true while RTP flows, so stall detection kept firing pairwise restarts and forced republishes. Treat live enabled tracks as healthy, republish only after real recovery, and slow restart. Co-authored-by: Cursor <cursoragent@cursor.com>
Pre-enter camera getUserMedia plus Matrix enter() and voice-preset refresh each opened their own capture stream. Query camera permission without getUserMedia before enter, reuse applyConstraints for voice preset, and only probe camera mid-call when the user toggles it on. Co-authored-by: Cursor <cursoragent@cursor.com>
Log selected candidate pair types and inbound RTP byte counts so support can tell whether failures are ICE, TURN relay, or app-layer publish issues. Co-authored-by: Cursor <cursoragent@cursor.com>
5093fa7 to
70bee38
Compare
Use for-of over RTCStatsReport and align iceGatherState telemetry field so check-types passes in CI. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
GroupCall.leave(), reset call state, andenterWithKind()again when remote media appeared stalled.Context
After recent call stability work, production calls were still unstable: participants disconnected one after another and the stage kept rearranging. Screen sharing was lost on disconnect with no way to recover the shared screen. This matches the auto leave/re-enter recovery introduced for remote media stalls.
Test plan
pnpm --filter @hypha-platform/core exec vitest run src/matrix/client/hooks/__tests__/remote-call-media-stall.test.ts src/matrix/client/hooks/__tests__/use-space-group-call.test.tsMade with Cursor
Summary by CodeRabbit