Skip to content

Fix #1090: don't drop USB reports on transient hub stalls#1539

Open
semnil wants to merge 4 commits intoUltimateHackingKeyboard:masterfrom
semnil:fix/issue-1090-usb-resend-state
Open

Fix #1090: don't drop USB reports on transient hub stalls#1539
semnil wants to merge 4 commits intoUltimateHackingKeyboard:masterfrom
semnil:fix/issue-1090-usb-resend-state

Conversation

@semnil
Copy link
Copy Markdown
Contributor

@semnil semnil commented May 5, 2026

Summary

Fixes #1090. Under transient USB hub stalls, the previous Hid_Send*Report() failure path silently committed unsent reports as the "previously sent" buffer in the double-buffer scheme, breaking the host's view of key state. User-visible symptoms:

  • Stuck-key auto-repeat — a dropped release report leaves the key in the host's pressed set until the next press happens to differ.
  • Dropped short presses — same mechanism, with a press dropped before its release lands.
  • Swapped consecutive bigrams (e.g. typing se produced es) — both keys merged into a single multi-key report whose slot order matches matrix order, not press order.
  • Disturbance to other HID devices on the same hub during sustained UHK input, because the unrate-limited retry path saturated the upstream IN-token queue.

Changes

Three commits for the fix plus one independent UHK60 build prerequisite:

  1. 9616cb4 Preserve report state on USB send failure — branch on ret != 0; never call switchActive*Report() when the send fails. The next merge cycle rebuilds the active buffer from the latest cached state and CheckReportReady retries once the link recovers.
  2. a8aa0d4 Throttle USB resend retries via EventScheduler — replace the synchronous EventVector_Set(EventVector_ResendUsbReports) with EventScheduler_Schedule(now + 4 ms, EventSchedulerEvent_UsbResend, ...). Keyboard / controls / mouse retry wakeups collapse into a single ~250 Hz upper bound across all three paths, freeing the hub upstream queue between attempts.
  3. 373cf4f Rename ShouldResendReportIsWithinResendWindow — the helper now only manages the per-counter retry window (consumed for log routing in usb_report_updater.c and as the resend-loop predicate in device/src/messenger.c); the new name reflects that role.
  4. 8b36cd3 Move logger.h include in debug.c — independent of UHK60 v2 getting stuck on infinite key repeat #1090 but bundled as a prerequisite UHK60 build fix: Debug_RecordBleSendResult() references LogU() in a dead-code branch and GCC 15 promotes implicit-declaration warnings to errors.

Applied identically to the keyboard, controls, and mouse send paths. New EventSchedulerEvent_UsbResend added to right/src/event_scheduler.{h,c}.

Detailed design notes

Full root-cause analysis, per-symptom mechanism walkthrough, and pre/post-fix mermaid sequence diagrams are kept on a separate docs branch to keep this PR focused on code:

semnil added 4 commits May 5, 2026 06:41
Debug_RecordBleSendResult() references LogU() inside a
DEBUG_BLE_LATENCY_STATS=false dead-code branch.  The declaration of
LogU() lives in logger.h, which until now was only included in the
__ZEPHYR__ branch of debug.c.  GCC <= 14 emitted -Wimplicit-function-
declaration as a warning when this branch was compiled for UHK60, so
the build silently passed; GCC 15 promotes that warning to an error
by default and breaks the UHK60 build.

logger.h itself is platform agnostic (it only pulls in device.h and
stdint.h, and is already included unconditionally by main.c,
mouse_controller.c, trace.c, usb_report_updater.c, and others), so
the simplest fix is to move the include out of the Zephyr-only block.

No behavior change: the affected code is dead under the current
DEBUG_BLE_LATENCY_STATS setting.
…ard#1090).

When Hid_SendKeyboardReport / Hid_SendControlsReport / Hid_SendMouseReport
returns a non-zero status (transient USB hub stall, busy endpoint, etc.)
and the retry helper ShouldResendReport() decides to give up after its
~128 ms window, the previous code took the same path as a successful
send: it cleared *NeedsResending and called switchActive*Report().

Switching the active buffer on a failed send is destructive.  After the
switch, the buffer that held the unsent report becomes the "previously
sent" buffer in the double-buffer scheme, so subsequent CheckReportReady
comparisons are made against state that was never delivered to the host.
The most visible consequence is that release reports get dropped on a
flaky link: the host keeps the key in its pressed set and auto-repeats
forever, until the next press happens to produce a buffer difference
that finally clears the stuck key.  Short presses can disappear
entirely, and apparent input-order swaps can occur for the same reason
on cross-hand bigrams.

The retry loop also has no rate limiting between attempts, so during a
hub stall the firmware re-attempts on every main-loop iteration.  This
hammers the upstream side of any shared USB hub heavily enough that
*other* HID devices on the same hub start losing reports too, which
matches the user-visible symptom of a different keyboard, plugged into
the same hub, also exhibiting infinite repeat while the UHK is
struggling.

Fix:

  - Branch on `ret != 0` rather than the result of ShouldResendReport().
  - On any failure, keep *NeedsResending = true and re-arm
    EventVector_ResendUsbReports.  Crucially, do not switch the active
    buffer.  The next mergeReports() rebuilds the active buffer from
    the latest cached state and CheckReportReady will retry the send
    once the link recovers.
  - Still call ShouldResendReport() so its retry counter stays in sync;
    use its return value to decide between reportRetry() (transient,
    log nothing) and handleFail() (long stall, log and on Zephyr try a
    reconnect).  The "give up" path no longer drops the report.
  - On success, call ShouldResendReport(true, ...) so the retry counter
    is reset for the next failure window.

Applied identically to the keyboard, controls, and mouse send paths.

Tested on UHK60 v1 over a USB hub that previously triggered the stuck
release behavior with stock v17.0.0; the regression no longer
reproduces and other devices on the same hub also stop misbehaving
while the UHK is in heavy use.
After a Hid_Send*Report() failure, the previous code (including the
prior commit's state-preservation fix) re-armed EventVector_ResendUsbReports
synchronously.  EventVector_ResendUsbReports is part of EventVector_ReportUpdateMask,
so RunUserLogic() calls UpdateUsbReports() again on the very next
main-loop iteration.  An idle main loop iteration is short (sub-100us
on UHK60), so during a hub stall the firmware re-attempts the send at
roughly the main-loop rate.  This saturates the USB hub upstream IN-token
queue and disturbs the periodic interrupt transfers of every other HID
device that shares the hub.

Replace the immediate flag set with a 4 ms delayed schedule:

  EventScheduler_Schedule(now + USB_RESEND_DELAY_MS,
                          EventSchedulerEvent_UsbResend,
                          "usb-resend");

A new EventSchedulerEvent_UsbResend processes by setting
EventVector_ResendUsbReports, so the existing UpdateUsbReports() entry
point is unchanged.  EventScheduler keeps only the earliest scheduled
time for a given event id, so the three send paths (keyboard, controls,
mouse) collapse into a single scheduled wakeup when they fail in the
same cycle.

USB_RESEND_DELAY_MS is set to 4 ms.  USB Full Speed SOF is 1 ms, so this
gives the hub four frames of headroom between attempts.  At a typical
30-50 ms keystroke duration this latency is well below any human
perception threshold and is paid only when the link has already failed
once; the initial send remains immediate.

Applied identically to the keyboard, controls, and mouse send paths.
The main thread now sleeps via __WFI() between failed attempts instead
of spinning, which also lowers CPU activity during a hub stall.
The helper only manages a per-counter retry window now.  It is consumed
as a log-routing predicate in usb_report_updater.c (transient retry vs
fatal) and as the resend-loop condition in messenger.c.  The previous
name implied the helper drove sending, which is no longer accurate
after the state-preservation fix moved that responsibility elsewhere.

Also corrects the stale "Try resending a report for 512ms" header
comment; the actual window is 128 ms.
@cla-sign-bot
Copy link
Copy Markdown

cla-sign-bot Bot commented May 5, 2026

Thank you for your contribution!
Please sign the CLA.

@cla-sign-bot
Copy link
Copy Markdown

cla-sign-bot Bot commented May 5, 2026

The CLA sign PR can contain only the latest CLA file!

@semnil semnil force-pushed the fix/issue-1090-usb-resend-state branch from 63af70b to 373cf4f Compare May 5, 2026 01:06
@cla-sign-bot
Copy link
Copy Markdown

cla-sign-bot Bot commented May 5, 2026

Thank you for your contribution!
Please sign the CLA.

@semnil
Copy link
Copy Markdown
Contributor Author

semnil commented May 5, 2026

#1540

@semnil semnil marked this pull request as ready for review May 5, 2026 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UHK60 v2 getting stuck on infinite key repeat

1 participant