Baseten performance client/total timeout #2037

michaelfeil · 2025-11-19T02:09:49Z

🚀 What

💻 How

🔬 Testing

michaelfeil · 2025-11-19T02:10:14Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-11-19T02:16:52Z

baseten-performance-client/core/src/client.rs

        max_chars_per_request: Option<usize>,
        timeout_s: f64,
        hedge_delay: Option<f64>,
+        total_timeout_s: Option<f64>,
+        max_retries: Option<i64>,


Update node bindings for added timeout/retry params

The core API now requires total_timeout_s and max_retries for process_embeddings_requests (and the other batched helpers), but the Node bindings still invoke these methods with the old argument list (e.g., node_bindings/src/lib.rs calls them only through hedge_delay). Building the workspace or running cargo build -p node_bindings now fails with a mismatched-argument-count compile error because the new parameters are never supplied.

Useful? React with 👍 / 👎.

michaelfeil · 2025-11-21T07:10:25Z

@codex review.

chatgpt-codex-connector

💡 Codex Review

truss/baseten-performance-client/node_bindings/src/lib.rs

Lines 210 to 214 in 3bd2fd2

    
           let max_concurrent_requests = 
        
             max_concurrent_requests.unwrap_or(DEFAULT_CONCURRENCY as u32) as usize; 
        
           let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE as u32) as usize; 
        
           let timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S); 
        
           let max_chars_per_request = max_chars_per_request.map(|x| x as usize);

Make Node total_timeout_s usable without manual timeout override

The new Node bindings expose total_timeout_s, but the per‑request timeout is still defaulted to DEFAULT_REQUEST_TIMEOUT_S before calling the core client (timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S) here). The core validator rejects any total timeout smaller than the per‑request timeout (see core/src/split_policy.rs lines 72‑83), so a call like embed(..., total_timeout_s=30.0) now fails immediately with InvalidParameter unless the caller also lowers timeout_s explicitly. That makes the new option effectively unusable for typical "short overall timeout" use cases; consider deriving timeout_s from total_timeout_s when it is provided (as the Python bindings do) so callers don’t have to set both.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-11-21T07:17:15Z

baseten-performance-client/python_bindings/src/lib.rs

        max_concurrent_requests: usize,
        timeout_s: f64,
        hedge_delay: Option<f64>,
+        total_timeout_s: Option<f64>,


Batch post total_timeout_s conflicts with default timeout

The Python batch_post entry point keeps timeout_s as a required f64 with a 3600s default, even though the core batch‑post handler now rejects any total_timeout_s shorter than the per‑request timeout (core/src/client.rs lines 599‑609). Passing a shorter overall timeout (e.g., total_timeout_s=30) while leaving the default timeout_s causes an InvalidParameter error instead of honoring the total timeout, unlike the embed/rerank/classify methods where timeout_s falls back to the total timeout. This makes the new total timeout parameter unusable unless callers also override timeout_s manually.

Useful? React with 👍 / 👎.

michaelfeil · 2025-11-21T23:32:52Z

@codex review

chatgpt-codex-connector

💡 Codex Review

truss/baseten-performance-client/node_bindings/src/lib.rs

Lines 210 to 214 in 3bd2fd2

    
           let max_concurrent_requests = 
        
             max_concurrent_requests.unwrap_or(DEFAULT_CONCURRENCY as u32) as usize; 
        
           let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE as u32) as usize; 
        
           let timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S); 
        
           let max_chars_per_request = max_chars_per_request.map(|x| x as usize);

Respect total_timeout_s when deriving Node per-request timeout

embed keeps timeout_s at DEFAULT_REQUEST_TIMEOUT_S even when the caller supplies total_timeout_s, but the core validator rejects any total timeout shorter than the per-request timeout. A Node caller passing only total_timeout_s=10 therefore gets an InvalidParameter because timeout_s remains 3600s. The Python bindings already derive timeout_s from total_timeout_s to avoid this; the Node bindings need the same reconciliation (or to lower timeout_s when a total timeout is provided) so the new parameter is usable.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-11-21T23:36:49Z

baseten-performance-client/python_bindings/src/lib.rs

                        max_concurrent_requests,
                        timeout_s,


batch_post rejects provided total_timeout_s unless timeout_s is also lowered

batch_post always forwards the default per-request timeout_s (3600s) while also accepting total_timeout_s, but the core process_batch_post_requests validation requires the total timeout to be at least the per-request timeout. Calling batch_post(..., total_timeout_s=10) therefore errors out even though the caller set a total timeout, because timeout_s is still 3600. The method should reconcile timeout_s with total_timeout_s (as other bindings do) or otherwise reduce timeout_s when a total timeout is provided so the new option works.

Useful? React with 👍 / 👎.

jojo-baseten added 6 commits November 17, 2025 16:26

add operation_timeout_s parameter for total operation timeout

4452b7e

reduce MIN_REQUEST_TIMEOUT_S from 1 to 0.3s

3f5b9f0

add retry parameter

3bb8d2b

update timeout min to 0.5s, reset min hedge delay to 0.2

8496b24

rename operation_timeout_s to total_timeout_s

8e8329e

add separate constants for total timeout and request timeout

3099120

chatgpt-codex-connector bot reviewed Nov 19, 2025

View reviewed changes

jojo-baseten added 7 commits November 19, 2025 11:11

add node_bindings for total_timeout_s and max_retries

2155db4

add total_timeou_s to batch_post and async_batch_post

23ded64

fix trailing whitespaces

dc34ee6

Bump baseten-performance-client Python to v0.0.12-rc.0 (PyPI only)

dd92943

Update .pyi file to match new timeout and retry parameters

ad7fb79

Bump version to 0.0.12-rc.1 (includes updated .pyi file)

fc732bf

lower min timeout and hedge delay to 0.1

3bd2fd2

chatgpt-codex-connector bot reviewed Nov 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Baseten performance client/total timeout #2037

Baseten performance client/total timeout #2037

Uh oh!

michaelfeil commented Nov 19, 2025

Uh oh!

michaelfeil commented Nov 19, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Nov 19, 2025

Uh oh!

michaelfeil commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Nov 21, 2025

Uh oh!

michaelfeil commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Nov 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	let max_concurrent_requests =
	max_concurrent_requests.unwrap_or(DEFAULT_CONCURRENCY as u32) as usize;
	let batch_size = batch_size.unwrap_or(DEFAULT_BATCH_SIZE as u32) as usize;
	let timeout_s = timeout_s.unwrap_or(DEFAULT_REQUEST_TIMEOUT_S);
	let max_chars_per_request = max_chars_per_request.map(\|x\| x as usize);

Baseten performance client/total timeout #2037

Are you sure you want to change the base?

Baseten performance client/total timeout #2037

Uh oh!

Conversation

michaelfeil commented Nov 19, 2025

🚀 What

💻 How

🔬 Testing

Uh oh!

michaelfeil commented Nov 19, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

michaelfeil commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

michaelfeil commented Nov 21, 2025

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Nov 21, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants