feat(proto): Better NAT probe retry intervals by flub · Pull Request #623 · n0-computer/noq

flub · 2026-04-29T14:38:36Z

Description

This improves the NAT probing intervals: it now starts at 33ms by
default and does an exponential backoff capping at 2s intervals. There
are now 7 probe attempts spanning a period 4 seconds.

Fixes #569

Breaking Changes

n/a

Notes & open questions

4s is a bit longer than I initially thought, but I think it's
fine. Maybe we could remove 1 or even 2 further attempts. Removing 1
gets you to 2s though. Alternatively the interval cap could be lowered
a bit and the initial interval a tiny bit higher. But really I'm only
writing this down for nice git history, I think the current tradeoff
is a good start.

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
Tests if relevant.

This improves the NAT probing intervals: it now starts at 33ms by default and does an exponential backoff capping at 2s intervals. There are now 7 probe attempts spanning a period 4 seconds.

github-actions · 2026-04-29T14:41:52Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/623/docs/noq/

Last updated: 2026-05-06T08:23:27Z

flub · 2026-04-29T14:43:15Z

Patchbay is sad until n0-computer/iroh#4213 is merged

github-actions · 2026-04-29T14:48:10Z

Performance Comparison Report

4759df8196f8770ebe6760aeb1d3b80c2abce083 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5837.7 Mbps	7930.8 Mbps	-26.4%	97.0% / 98.6%
medium-concurrent	5561.1 Mbps	7758.9 Mbps	-28.3%	95.7% / 97.5%
medium-single	4109.1 Mbps	4620.5 Mbps	-11.1%	95.7% / 98.1%
small-concurrent	3909.1 Mbps	5112.4 Mbps	-23.5%	97.4% / 99.6%
small-single	3572.8 Mbps	4687.6 Mbps	-23.8%	96.2% / 98.4%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3114.8 Mbps	4062.0 Mbps	-23.3%
lan	782.4 Mbps	810.3 Mbps	-3.5%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 23.0% slower on average

---

6e2ebb5cea895e55eb52ea1e4a4974959bb51291 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5878.5 Mbps	7903.9 Mbps	-25.6%	96.5% / 151.0%
medium-concurrent	5515.6 Mbps	7641.0 Mbps	-27.8%	96.1% / 150.0%
medium-single	3826.4 Mbps	4749.5 Mbps	-19.4%	97.7% / 149.0%
small-concurrent	3833.1 Mbps	5299.1 Mbps	-27.7%	93.4% / 102.0%
small-single	3562.7 Mbps	4847.0 Mbps	-26.5%	94.8% / 102.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3114.5 Mbps	4064.6 Mbps	-23.4%
lan	782.4 Mbps	810.4 Mbps	-3.5%
lossy	69.9 Mbps	67.3 Mbps	+3.8%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 24.8% slower on average

---

99fb41ccfc9a6f28f6db544a8ecf4a56fe7f8d24 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5417.7 Mbps	7885.4 Mbps	-31.3%	94.7% / 99.0%
medium-concurrent	5434.2 Mbps	7681.9 Mbps	-29.3%	95.3% / 101.0%
medium-single	3864.8 Mbps	4749.0 Mbps	-18.6%	98.8% / 152.0%
small-concurrent	3830.3 Mbps	5494.3 Mbps	-30.3%	90.9% / 99.2%
small-single	3457.7 Mbps	4890.4 Mbps	-29.3%	93.3% / 103.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3075.0 Mbps	4004.5 Mbps	-23.2%
lan	782.4 Mbps	810.3 Mbps	-3.4%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 27.1% slower on average

matheus23 · 2026-04-30T10:21:43Z

+    pub(crate) fn queue_retries(&mut self, ipv6: bool) {
+        self.attempt += 1;


We now have both attempt as well as ProbeState::Active(remaining).

Attempts are counted up, the remaining attempts for each probe are counted down.

Can we make it so that we only have attempt? I think all remaining values are basically the same across all probes, unless one of them is IPv6 in an ipv4-only socket.

The major design choice is to keep sending probes to all remote candidates at the same time, rather than keep a timer for each probe individually. You can gain a new address mid-round: e.g. if the REACH_OUT frames are spread over two packets you'll get new probes mid-round. In this case the retry timers are all set from the first probe, not from later added ones. This is also the reason to keep attempt outside of the remaining, because they track very distinct numbers.

The other choice would be to track the delay for each probe individually. This would be technically be better, but the cost is that we need to set the timer a lot more. The way I see this working is to have the retry_delay function iterate over all the probes, compute the delay for each and return the shortest delay. Then when you queue retries you do the same: iterate over all probes and retry all those for which the timer has expired. And if we do all that we wouldn't have to keep two fields around.

I was kind of shying away from that 2nd version as I worried it might just be too much. But then maybe as we improve our incremental probing we'll end up having to do that anyway so maybe it is better to go straight for that? What do you think after this whole explanation?

Although it reminds me of the PTO (or perhaps because of that) I'd say the second version is overkill.

I still don't think it's necessary to keep remaining around. IMO it can be a basic "succeeded/not succeeded".
When we receive the REACH_OUT on the second packet, well then we just sent the probes a little later.
If we receive it after the first retry attempt, well then we'll have one less retry with those probes, that's fine, we're not being very precise with the retry timing here anyways (after all, we don't do the second version!).

I can see how attempt and remaining could be separate values, but IMO the complexity is not worth it. The only effect of having them separate seems to be sending the later-arriving REACH_OUT candidates for potentially a couple more retires.

matheus23

I like this a lot - but I'd prefer to fix the canonicalization issue ASAP, if possible, and then get the iroh PR with the noq update in, and then see if this gets through patchbay tests fine.

Frando

(did not actually review)

feat(proto): Better NAT probe retry intervals

4759df8

This improves the NAT probing intervals: it now starts at 33ms by default and does an exponential backoff capping at 2s intervals. There are now 7 probe attempts spanning a period 4 seconds.

flub requested review from dignifiedquire and matheus23 April 29, 2026 14:39

n0bot Bot added this to iroh Apr 29, 2026

github-project-automation Bot moved this to 🚑 Needs Triage in iroh Apr 29, 2026

flub moved this from 🚑 Needs Triage to 👀 In review in iroh Apr 29, 2026

flub added this to the iroh v1.0.0-rc.0 milestone Apr 29, 2026

matheus23 reviewed Apr 30, 2026

View reviewed changes

flub added 2 commits May 5, 2026 16:07

Merge branch 'main' into flub/nat-probe-retry-timing

6e2ebb5

Merge branch 'main' into flub/nat-probe-retry-timing

99fb41c

flub enabled auto-merge May 6, 2026 08:45

flub disabled auto-merge May 6, 2026 09:00

flub enabled auto-merge May 6, 2026 09:00

Frando approved these changes May 6, 2026

View reviewed changes

flub added this pull request to the merge queue May 6, 2026

Merged via the queue into main with commit 262697c May 6, 2026
64 of 65 checks passed

flub deleted the flub/nat-probe-retry-timing branch May 6, 2026 09:16

github-project-automation Bot moved this from 👀 In review to ✅ Done in iroh May 6, 2026

		pub(crate) fn queue_retries(&mut self, ipv6: bool) {
		self.attempt += 1;

Conversation

flub commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Breaking Changes

Notes & open questions

Change checklist

Uh oh!

github-actions Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

flub commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Comparison Report

Raw Benchmarks (localhost)

Netsim Benchmarks (network simulation)

Summary

Raw Benchmarks (localhost)

Netsim Benchmarks (network simulation)

Summary

Raw Benchmarks (localhost)

Netsim Benchmarks (network simulation)

Summary

Uh oh!

matheus23 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

flub Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

matheus23 Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

matheus23 left a comment

Choose a reason for hiding this comment

Uh oh!

Frando left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

flub commented Apr 29, 2026 •

edited

Loading

github-actions Bot commented Apr 29, 2026 •

edited

Loading

github-actions Bot commented Apr 29, 2026 •

edited

Loading