perf: device-task hot-path cleanups and dedicated rclrs spin thread#13
Merged
Conversation
…ffers) Bundle of low-risk changes: - Cargo.toml: add [profile.release] with fat LTO, codegen-units = 1, strip = "debuginfo" for ~5–15% runtime and ~30–60% binary size wins. Trim tokio features from "full" to only the set actually used (rt-multi-thread, macros, sync, time, signal). - UBX parsers: add Vec::with_capacity in NAV-SAT, SEC-SIG, SEC-SIGLOG and MON-COMMS parse paths to avoid the usual 4→8→16→32→64 growth cascade per message. - Device/NTRIP task state: split internal state into a Mutex-guarded struct plus sibling AtomicU64/AtomicU32 counters. Hot-path counter increments (messages_received, rtcm_bytes_injected, bytes_received, messages_forwarded, connection_count) become lock-free fetch_add. The public snapshot type is unchanged. - Device task: bump the serial read buffer from 1 KiB to 4 KiB so that a single USB-CDC frame carrying NAV-SAT + RTCM doesn't force multiple read syscalls and partial-frame parser state. All 183 lib tests pass under cargo test --features ros2.
The device task's UbxProcessResult contains messages that are consumed purely by the integrity aggregator (NAV-COV, NAV-POSECEF, SEC-SIGLOG, RXM-COR, MON-COMMS, MON-HW, MON-RF, and fix-type transitions). They were being cloned, enum-wrapped, sent across the mpsc channel, and then dropped by the forwarding bridge because into_gnss_message() returned None for those variants. Handle them in place: call integrity.update_*() and skip the channel send. This removes 7-8 clones and the corresponding channel sends per GNSS epoch (~70-160 heap operations/sec at 10 Hz) and shrinks both DeviceMessage and the main.rs forwarding bridge. into_gnss_message() now returns GnssMessage directly. The forwarding bridge becomes unconditional. No ROS topic content changes because none of the removed variants was ever published.
After filter-before-send (previous commit), the four heavy payloads that still crossed mpsc::channel — SatInfo, SecSig, RelPosNed, SurveyIn — were being cloned by the device task before being sent. Move ownership instead: update the integrity aggregator with a borrow, then send the value by move. While auditing: the IntegrityAggregator kept eight Option<T> "last_*" fields, but only `last_nav_pl` is ever read back (by compute()). The other seven were write-only dead retention, each adding a per-epoch clone on top of the one already being sent through the channel. Drop those fields and their assignments; the update_*() functions still take the struct by reference to read the specific metric they care about. Net effect per GNSS epoch: zero deep clones in process_serial_data for the four heavy message types, and zero retention clones in the integrity aggregator for the seven unused snapshots.
The old main loop polled `executor.spin(spin_once().timeout(100ms))` with a `sleep(10ms)` between iterations — waking the main tokio thread ~9×/sec and churning the rcl wait set for no reason. Move the executor onto its own OS thread running `spin(SpinOptions::default())`: it blocks in `rcl_wait` and wakes the instant a callback is ready. Shutdown goes through `SpinOptions::until_promise_resolved` (the mechanism that flips the executor's `halt_spinning` flag *and* triggers its guard conditions to wake a blocked wait set), wired to a tokio oneshot that's dropped on shutdown. Effect: at low message rates the polling loop was costing roughly half the steady-state CPU — that goes away. Control-plane callbacks (parameter events, service calls, timers) are serviced immediately rather than within the ~110ms polling cycle. The data path is publish-only and never went through the executor, so publish→subscribe latency is unchanged. All 183 lib tests pass; clean startup/spin/publish/shutdown verified against a live F9P in standalone mode.
4f199f5 to
7f9a29b
Compare
4451074 to
9327023
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Four perf commits plus a CHANGELOG entry for v0.2.0. Reduces release binary size by ~22% and steady-state CPU by 35–50% across all five user-facing modes. No public API or topic-content changes.
Commits
dafab66perf: tactical CPU/memory optimisations (release profile, atomics, buffers)ba01b89perf: drop internal-only UBX payloads from the device channelc3f7a05perf: eliminate remaining hot-path clones in the device taskd52a97bperf: run the rclrs executor on a dedicated thread instead of polling642ce62docs: v0.2.0 CHANGELOG entryWhat changed
Device-task hot path (
dafab66,ba01b89,c3f7a05). An explicit[profile.release]block (fat LTO,codegen-units = 1,strip = "debuginfo") accounts for the binary size reduction. The clone-elimination commits remove ~9 per-epoch heap operations at 5 Hz worst-case load: internal-only UBX variants (NAV-COV, NAV-POSECEF, SEC-SIGLOG, RXM-COR, MON-COMMS/HW/RF, FixTypeChanged) no longer cross the mpsc channel; the remaining heavy payloads (SatInfo, SecSig, RelPosNed, SurveyIn) are moved into the channel send rather than cloned; seven write-only retained-message fields inIntegrityAggregatorwere removed. Other tactical:Vec::with_capacityin the UBX parsers,AtomicU64/U32counter relocation out from under the asyncMutexinDeviceTask/NtripTask, serial read buffer 1 KiB → 4 KiB,tokiofeatures trimmed from"full"to["rt-multi-thread", "macros", "sync", "time", "signal"].Dedicated rclrs spin thread (
d52a97b). Replaces the polling loop driving the rclrs executor (executor.spin(SpinOptions::spin_once().timeout(100ms)) + sleep(10ms)on the main tokio thread) with a dedicated OS thread runningspin(SpinOptions::default()), which blocks inrcl_waitand returns the instant a callback is ready. Shutdown usesSpinOptions::until_promise_resolved(the rclrs mechanism that flipshalt_spinningand triggers the wait set's guard conditions to wake), wired to a tokio oneshot dropped on shutdown. Effect: the main tokio thread no longer wakes ~9 times per second to drain an empty wait set, and ROS control-plane callbacks (parameter events, service calls, timers) are no longer bounded by the polling cycle. The publish path is unaffected —publisher.publish()is called directly from the tokio side — so data-plane fix-to-subscriber latency is unchanged.Headline numbers (vs v0.1.0)
Measured across 30–60 min runs at 5 Hz worst-case load (all four constellations dual-band, every feature each mode allows), live hardware A/B on the same antennas:
/fixe2e p50: unchanged in standalone/moving_base; rover_ntrip improved (some of which is rmw_zenoh delivery variance run-to-run); tail (p99/max) trends favourable/rosoutWARN/ERRORAll 183 library tests pass unchanged.
Modes covered
standalone,rover_ntrip,moving_base,moving_base_rover,rover_radio.static_basewas not exercised — none of these commits touch the survey-in or RTCM-output code paths.