The Problem
The following commit extracts the transport of a RAFT client and reuses it as the transport of the Orchestrator API reverse proxy: b0aa7b8
Previously, the proxy's transport was http.DefaultTransport with ResponseHeaderTimeout set to 30 seconds. After this change, ResponseHeaderTimeout is set to config.ActiveNodeExpireSeconds, which equals 5 seconds.
A short timeout is fine for a RAFT client, but it is too short for general API requests.
A practical example: for the infrastructure managed by my team, a call to the graceful-master-takeover API usually takes 10-15 seconds, so we never get a successful response from this endpoint. (Fortunately, only the reverse proxy transport times out; the takeover itself keeps running to completion.)
Related Code
The transport for the reverse proxy is set here (orchestrator/go/http/raft_reverse_proxy.go, line 41 in 1754ca9):

    proxy.Transport, err = orcraft.GetRaftHttpTransport()

Right now a single transport instance is defined and cached in GetRaftHttpTransport (orchestrator/go/raft/http_client.go, lines 39 to 44 in 1754ca9):

    func GetRaftHttpTransport() (*http.Transport, error) {
        // Checks whether there is a cached httpTransport to return:
        if httpTransport != nil {
            return httpTransport, nil
        }
        httpTimeout := time.Duration(config.ActiveNodeExpireSeconds) * time.Second

Note that config.ActiveNodeExpireSeconds is hardcoded and cannot be changed via a config file.
Also note that most RAFT API endpoints do not use the reverse proxy (orchestrator/go/http/api.go, lines 3953 to 3964 in 1754ca9):

    this.registerAPIRequestNoProxy(m, "raft-yield/:node", this.RaftYield)
    this.registerAPIRequestNoProxy(m, "raft-yield-hint/:hint", this.RaftYieldHint)
    this.registerAPIRequestNoProxy(m, "raft-peers", this.RaftPeers)
    this.registerAPIRequestNoProxy(m, "raft-state", this.RaftState)
    this.registerAPIRequestNoProxy(m, "raft-leader", this.RaftLeader)
    this.registerAPIRequestNoProxy(m, "raft-health", this.RaftHealth)
    this.registerAPIRequestNoProxy(m, "raft-status", this.RaftStatus)
    this.registerAPIRequestNoProxy(m, "raft-snapshot", this.RaftSnapshot)
    this.registerAPIRequestNoProxy(m, "raft-follower-health-report/:authenticationToken/:raftBind/:raftAdvertise", this.RaftFollowerHealthReport)
    this.registerAPIRequestNoProxy(m, "reload-configuration", this.ReloadConfiguration)
    this.registerAPIRequestNoProxy(m, "hostname-resolve-cache", this.HostnameResolveCache)
    this.registerAPIRequestNoProxy(m, "reset-hostname-resolve-cache", this.ResetHostnameResolveCache)

Proposed Solution
To deal with the issue, I would like to make the following changes:
- Define a separate transport for raftReverseProxy.
- Make the timeout for the raftReverseProxy transport configurable, defaulting to 30 seconds.
This way, RAFT clients will still time out after 5 seconds, and I, as a user, will be able to configure the reverse proxy timeout for regular Orchestrator API requests.
Please let me know whether the proposed solution makes sense, and whether it is OK for me to open a PR with the related changes.