
fix: nat lost in some p2p apps #2216

Open
21paradox wants to merge 1 commit into EasyTier:main from 21paradox:develop

Conversation

@21paradox
Contributor

Reuse the connection by dst_peer_id, so that every peer uses only one QUIC connection, to fix the NAT-loss problem.

I ran into a NAT-loss problem. The scenario is a transparent proxy: all traffic goes through a tun device (built into gost) to the remote side (gost's built-in relay protocol, TCP-based). Running a p2p application (erigon) there results in the "0 caplin peers" problem.

After debugging, I found that using a single QUIC connection to handle all streams (open_bi) fixes the problem; that is roughly what this PR does.

If your scenario has a very large number of connections, you can locally change the max_concurrent_bidi_streams setting in easytier/src/tunnel/quic.rs to 2000 (default 256).
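For reference, that knob lives in quinn's TransportConfig. A minimal sketch of raising it (the placement and the `crypto` variable are illustrative, not the PR's exact code):

```rust
// Hypothetical sketch: raising quinn's bidirectional stream limit.
// 256 is quinn's default; the PR description suggests 2000 for
// high-concurrency setups.
let mut transport = quinn::TransportConfig::default();
transport.max_concurrent_bidi_streams(quinn::VarInt::from_u32(2000));

// `crypto` is assumed to be configured elsewhere.
let mut server_config = quinn::ServerConfig::with_crypto(crypto);
server_config.transport_config(std::sync::Arc::new(transport));
```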

@KKRainbow KKRainbow requested a review from ZnqbuZ May 6, 2026 16:47
@KKRainbow
Member

What does "losing NAT" mean?

@KKRainbow
Member

For some reason I can't comment inline. This PR has a few serious bugs that I can't attach comments to.
Also, I think it is still worth investigating why the current design has problems; simply switching to connection reuse feels like it only hides the issue.

Comment thread easytier/src/gateway/quic_proxy.rs Outdated
}
}
// Try to reuse an existing QUIC connection for this peer
if let Some(conn) = self.conn_map.get(&dst_peer_id) {
Member

A reference obtained from dashmap will deadlock if held across an await.

Contributor

quinn::Connection is a handle; it should just be Cloned directly.

Also, idle Connections will need to be cleaned up.

Contributor Author

Replaced it with moka::sync::Cache, which solves these problems:

Cache::builder()
    .max_capacity(4096)
    .time_to_idle(Duration::from_secs(600))
    .build(),

@ZnqbuZ
Contributor

ZnqbuZ commented May 6, 2026

From a pure QUIC perspective, reusing the existing connection is indeed the right thing to do; I wasn't familiar enough with QUIC when I originally wrote this.

@ZnqbuZ
Contributor

ZnqbuZ commented May 6, 2026

As for transport_config, the quic tunnel and the quic proxy probably need different parameters; that still needs testing.

@21paradox
Contributor Author

What does "losing NAT" mean?

Traffic inside the container is forwarded through gost (relay protocol; traffic is encapsulated as TCP, similar to vless); easytier just acts as the network relay and only handles TCP requests.
Some applications that depend on p2p (e.g. Ethereum nodes) cannot find peers.
The problem appears when enable_quic_proxy or enable_kcp_proxy is enabled. With the default UDP there is no problem, but default UDP only reaches 100-200 KB/s.

Comment thread easytier/src/gateway/quic_proxy.rs Outdated
.context("quic write_chunk failed")?;

// Store the connection for future reuse
self.conn_map.insert(dst_peer_id, connection);
Member

If there are concurrent connects, they all reach this point, and later ones will evict earlier ones.

Contributor

Just switch to moka.

Contributor Author

@21paradox 21paradox May 7, 2026

Changed as requested; conn_locks limits the concurrent creation of the conn on first use.

Comment thread easytier/src/gateway/quic_proxy.rs Outdated
"quic connect: reused write_header failed peer={:?}, creating new",
dst_peer_id
);
self.conn_map.remove(&dst_peer_id);
Member

Shouldn't this wait until all streams are closed before removing it?

Comment thread easytier/src/gateway/quic_proxy.rs
@21paradox 21paradox force-pushed the develop branch 4 times, most recently from 8dee0fc to b16fec3 Compare May 8, 2026 00:51
Comment thread easytier/src/gateway/quic_proxy.rs Outdated
Comment thread easytier/src/gateway/quic_proxy.rs Outdated
@21paradox 21paradox force-pushed the develop branch 4 times, most recently from 73c1356 to a8ab9ab Compare May 8, 2026 06:46
Comment thread easytier/src/gateway/quic_proxy.rs
Comment thread easytier/src/gateway/quic_proxy.rs Outdated
Comment on lines +353 to +366
let connection = match get_or_create_conn(dst_peer_id).await {
Ok(conn) => conn,
Err(e) => {
if attempt == 0 {
debug!(
"quic connect attempt 0 failed={}, retrying after delay...",
e
);
tokio::time::sleep(Duration::from_millis(300)).await;
continue;
}
return Err(anyhow!("quic connect failed after retry: {}", e).into());
}
};
Contributor

The point of fast fallback is to attempt multiple connections concurrently and take whichever succeeds, so if you want it, it has to go inside get_with. But I don't think QUIC really needs it; trying once is enough.

Contributor Author

Reworked the code: no loop anymore, just one manual retry, covering the case where conn.open_bi() fails.

Contributor

My mistake, sorry. I thought you wanted to keep happy eyeballs; the previous logic was fine, it just needs to be made more concise:

for attempt in 0..2 {
    let endpoint = self.endpoint.clone();

    let connection = self.conn_map
        .try_get_with(dst_peer_id, async move {
            // the init future passed to try_get_with must return a Result
            anyhow::Ok(endpoint.connect(addr, "")?.await?)
        })
        .await
        .map_err(|e| anyhow!("failed to get or create quic connection: {}", e))?;

    let stream = async {
        let mut stream: QuicStream = connection.open_bi().await?.into();
        stream.writer_mut().write_chunk(header.clone()).await?;
        anyhow::Ok(stream)
    }
    .await;

    match stream {
        Ok(stream) => return Ok(stream),
        Err(error) => {
            debug!(?dst_peer_id, attempt, ?error, "quic connect: stream setup failed");
        }
    }

    if attempt == 0 {
        self.conn_map.invalidate(&dst_peer_id).await;
        tokio::time::sleep(Duration::from_millis(300)).await;
    }
}

Err(anyhow!("quic connect failed after retry").into())

Comment thread easytier/src/gateway/quic_proxy.rs Outdated
Comment thread easytier/src/gateway/quic_proxy.rs
Comment on lines +878 to +879
.max_capacity(4096)
.time_to_idle(Duration::from_secs(600))
Contributor

The capacity (resp. tti) should be roughly the same as the connection limit (resp. connection idle timeout) specified in quinn's transport_config; just add a comment explaining that.
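A sketch of what such a comment might look like (the values and the `u32` key type are illustrative, not the PR's final choices):

```rust
// Hypothetical sketch: keep the cache limits roughly in line with the quinn
// transport_config, so cached handles don't outlive their connections.
let conn_map: moka::sync::Cache<u32, quinn::Connection> = moka::sync::Cache::builder()
    // should roughly match the number of connections the endpoint allows
    .max_capacity(4096)
    // should roughly match quinn's max_idle_timeout / keep-alive behaviour
    .time_to_idle(std::time::Duration::from_secs(600))
    .build();
```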

@ZnqbuZ
Contributor

ZnqbuZ commented May 8, 2026

Also, does the problem occur with only enable_kcp_proxy enabled and enable_quic_proxy disabled?

@21paradox
Contributor Author

Also, does the problem occur with only enable_kcp_proxy enabled and enable_quic_proxy disabled?

I'm no longer sure whether KCP has the problem; I need to keep observing.
