-
Notifications
You must be signed in to change notification settings - Fork 91
Baseten performance client/total timeout #2037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| max_chars_per_request: Option<usize>, | ||
| timeout_s: f64, | ||
| hedge_delay: Option<f64>, | ||
| total_timeout_s: Option<f64>, | ||
| max_retries: Option<i64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update node bindings for added timeout/retry params
The core API now requires total_timeout_s and max_retries for process_embeddings_requests (and the other batched helpers), but the Node bindings still invoke these methods with the old argument list (e.g., node_bindings/src/lib.rs calls them only through hedge_delay). Building the workspace or running cargo build -p node_bindings now fails with a mismatched-argument-count compile error because the new parameters are never supplied.
Useful? React with 👍 / 👎.
|
@codex review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
truss/baseten-performance-client/node_bindings/src/lib.rs
Lines 210 to 214 in 3bd2fd2
| let max_concurrent_requests = | |
| max_concurrent_requests.unwrap_or(DEFAULT_CONCURRENCY as u32) as usize; | |
| let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE as u32) as usize; | |
| let timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S); | |
| let max_chars_per_request = max_chars_per_request.map(|x| x as usize); |
The new Node bindings expose total_timeout_s, but the per‑request timeout is still defaulted to DEFAULT_REQUEST_TIMEOUT_S before calling the core client (timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S) here). The core validator rejects any total timeout smaller than the per‑request timeout (see core/src/split_policy.rs lines 72‑83), so a call like embed(..., total_timeout_s=30.0) now fails immediately with InvalidParameter unless the caller also lowers timeout_s explicitly. That makes the new option effectively unusable for typical "short overall timeout" use cases; consider deriving timeout_s from total_timeout_s when it is provided (as the Python bindings do) so callers don’t have to set both.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| max_concurrent_requests: usize, | ||
| timeout_s: f64, | ||
| hedge_delay: Option<f64>, | ||
| total_timeout_s: Option<f64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch post total_timeout_s conflicts with default timeout
The Python batch_post entry point keeps timeout_s as a required f64 with a 3600s default, even though the core batch‑post handler now rejects any total_timeout_s shorter than the per‑request timeout (core/src/client.rs lines 599‑609). Passing a shorter overall timeout (e.g., total_timeout_s=30) while leaving the default timeout_s causes an InvalidParameter error instead of honoring the total timeout, unlike the embed/rerank/classify methods where timeout_s falls back to the total timeout. This makes the new total timeout parameter unusable unless callers also override timeout_s manually.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
truss/baseten-performance-client/node_bindings/src/lib.rs
Lines 210 to 214 in 3bd2fd2
| let max_concurrent_requests = | |
| max_concurrent_requests.unwrap_or(DEFAULT_CONCURRENCY as u32) as usize; | |
| let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE as u32) as usize; | |
| let timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S); | |
| let max_chars_per_request = max_chars_per_request.map(|x| x as usize); |
embed keeps timeout_s at DEFAULT_REQUEST_TIMEOUT_S even when the caller supplies total_timeout_s, but the core validator rejects any total timeout shorter than the per-request timeout. A Node caller passing only total_timeout_s=10 therefore gets an InvalidParameter because timeout_s remains 3600s. The Python bindings already derive timeout_s from total_timeout_s to avoid this; the Node bindings need the same reconciliation (or to lower timeout_s when a total timeout is provided) so the new parameter is usable.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| max_concurrent_requests, | ||
| timeout_s, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batch_post rejects provided total_timeout_s unless timeout_s is also lowered
batch_post always forwards the default per-request timeout_s (3600s) while also accepting total_timeout_s, but the core process_batch_post_requests validation requires the total timeout to be at least the per-request timeout. Calling batch_post(..., total_timeout_s=10) therefore errors out even though the caller set a total timeout, because timeout_s is still 3600. The method should reconcile timeout_s with total_timeout_s (as other bindings do) or otherwise reduce timeout_s when a total timeout is provided so the new option works.
Useful? React with 👍 / 👎.
🚀 What
💻 How
🔬 Testing