
Commit 39a676e

feat(api): remove deprecated endpoint increment_queries
1 parent 2f242f9 commit 39a676e

13 files changed (+146, −444 lines)

.stats.yml

Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,3 @@
 configured_endpoints: 54
-openapi_spec_hash: 49989625bf633c5fdb3e11140f788f2d
-config_hash: 8f6e5c3b064cbb77569a6bf654954a56
+openapi_spec_hash: 57e29e33aec4bbc20171ec3128594e75
+config_hash: 930284cfa37f835d949c8a1b124f4807

src/codex/resources/projects/projects.py

Lines changed: 32 additions & 42 deletions
@@ -460,7 +460,6 @@ def validate(
 quality_preset: Literal["best", "high", "medium", "low", "base"] | NotGiven = NOT_GIVEN,
 rewritten_question: Optional[str] | NotGiven = NOT_GIVEN,
 task: Optional[str] | NotGiven = NOT_GIVEN,
-tools: Optional[Iterable[project_validate_params.Tool]] | NotGiven = NOT_GIVEN,
 x_client_library_version: str | NotGiven = NOT_GIVEN,
 x_integration_type: str | NotGiven = NOT_GIVEN,
 x_source: str | NotGiven = NOT_GIVEN,
@@ -505,16 +504,17 @@ def validate(
 
 The default values corresponding to each quality preset are:
 
-- **best:** `num_consistency_samples` = 8, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **high:** `num_consistency_samples` = 4, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **medium:** `num_consistency_samples` = 0, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **low:** `num_consistency_samples` = 0, `num_self_reflections` = 3,
-  `reasoning_effort` = `"none"`.
-- **base:** `num_consistency_samples` = 0, `num_self_reflections` = 1,
-  `reasoning_effort` = `"none"`.
+- **best:** `num_candidate_responses` = 6, `num_consistency_samples` = 8,
+  `use_self_reflection` = True. This preset improves LLM responses.
+- **high:** `num_candidate_responses` = 4, `num_consistency_samples` = 8,
+  `use_self_reflection` = True. This preset improves LLM responses.
+- **medium:** `num_candidate_responses` = 1, `num_consistency_samples` = 8,
+  `use_self_reflection` = True.
+- **low:** `num_candidate_responses` = 1, `num_consistency_samples` = 4,
+  `use_self_reflection` = True.
+- **base:** `num_candidate_responses` = 1, `num_consistency_samples` = 0,
+  `use_self_reflection` = False. When using `get_trustworthiness_score()` on
+  "base" preset, a faster self-reflection is employed.
 
 By default, TLM uses the: "medium" `quality_preset`, "gpt-4.1-mini" base
 `model`, and `max_tokens` is set to 512. You can set custom values for these
@@ -550,11 +550,12 @@ def validate(
 strange prompts or prompts that are too vague/open-ended to receive a clearly defined 'good' response.
 TLM measures consistency via the degree of contradiction between sampled responses that the model considers plausible.
 
-num_self_reflections(int, default = 3): the number of self-reflections to perform where the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
-The maximum number of self-reflections currently supported is 3. Lower values will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
-Reflection helps quantify aleatoric uncertainty associated with challenging prompts and catches responses that are noticeably incorrect/bad upon further analysis.
+use_self_reflection (bool, default = `True`): whether the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
+Setting this False disables reflection and will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
+Reflection helps quantify aleatoric uncertainty associated with challenging prompts
+and catches responses that are noticeably incorrect/bad upon further analysis.
 
-similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "discrepancy"): how the
+similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "semantic"): how the
 trustworthiness scoring's consistency algorithm measures similarity between alternative responses considered plausible by the model.
 Supported similarity measures include - "semantic" (based on natural language inference),
 "embedding" (based on vector embedding similarity), "embedding_large" (based on a larger embedding model),
@@ -573,8 +574,6 @@ def validate(
 - name: Name of the evaluation criteria.
 - criteria: Instructions specifying the evaluation criteria.
 
-use_self_reflection (bool, default = `True`): deprecated. Use `num_self_reflections` instead.
-
 prompt: The prompt to use for the TLM call. If not provided, the prompt will be
 generated from the messages.
 
@@ -583,9 +582,6 @@ def validate(
 rewritten_question: The re-written query if it was provided by the client to Codex from a user to be
 used instead of the original query.
 
-tools: Tools to use for the LLM call. If not provided, it is assumed no tools were
-provided to the LLM.
-
 extra_headers: Send extra headers
 
 extra_query: Add additional query parameters to the request
@@ -624,7 +620,6 @@ def validate(
 "quality_preset": quality_preset,
 "rewritten_question": rewritten_question,
 "task": task,
-"tools": tools,
 },
 project_validate_params.ProjectValidateParams,
 ),
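
For context, here is a minimal, hypothetical usage sketch of the synchronous method touched above. The `Codex` client name and the `project_id`, `messages`, and `response` arguments are assumptions for illustration (only `quality_preset`, `task`, and `rewritten_question` appear in this diff); the one concrete takeaway from the commit is that callers should no longer pass `tools`.

# Hypothetical sketch, not taken from this diff: argument names other than
# quality_preset / task / rewritten_question are assumptions for illustration.
from codex import Codex  # assumed client entry point of the SDK

client = Codex()  # assumed to pick up credentials from the environment

result = client.projects.validate(
    project_id="proj_123",  # assumed identifier argument
    messages=[{"role": "user", "content": "How do I reset my password?"}],  # assumed
    response={"content": "Open Settings > Security and choose Reset."},  # assumed
    quality_preset="medium",  # documented default preset
    task="question answering",  # optional task description
    # tools=[...],  # no longer accepted after this commit
)
print(result)
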
@@ -1033,7 +1028,6 @@ async def validate(
 quality_preset: Literal["best", "high", "medium", "low", "base"] | NotGiven = NOT_GIVEN,
 rewritten_question: Optional[str] | NotGiven = NOT_GIVEN,
 task: Optional[str] | NotGiven = NOT_GIVEN,
-tools: Optional[Iterable[project_validate_params.Tool]] | NotGiven = NOT_GIVEN,
 x_client_library_version: str | NotGiven = NOT_GIVEN,
 x_integration_type: str | NotGiven = NOT_GIVEN,
 x_source: str | NotGiven = NOT_GIVEN,
@@ -1078,16 +1072,17 @@ async def validate(
 
 The default values corresponding to each quality preset are:
 
-- **best:** `num_consistency_samples` = 8, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **high:** `num_consistency_samples` = 4, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **medium:** `num_consistency_samples` = 0, `num_self_reflections` = 3,
-  `reasoning_effort` = `"high"`.
-- **low:** `num_consistency_samples` = 0, `num_self_reflections` = 3,
-  `reasoning_effort` = `"none"`.
-- **base:** `num_consistency_samples` = 0, `num_self_reflections` = 1,
-  `reasoning_effort` = `"none"`.
+- **best:** `num_candidate_responses` = 6, `num_consistency_samples` = 8,
+  `use_self_reflection` = True. This preset improves LLM responses.
+- **high:** `num_candidate_responses` = 4, `num_consistency_samples` = 8,
+  `use_self_reflection` = True. This preset improves LLM responses.
+- **medium:** `num_candidate_responses` = 1, `num_consistency_samples` = 8,
+  `use_self_reflection` = True.
+- **low:** `num_candidate_responses` = 1, `num_consistency_samples` = 4,
+  `use_self_reflection` = True.
+- **base:** `num_candidate_responses` = 1, `num_consistency_samples` = 0,
+  `use_self_reflection` = False. When using `get_trustworthiness_score()` on
+  "base" preset, a faster self-reflection is employed.
 
 By default, TLM uses the: "medium" `quality_preset`, "gpt-4.1-mini" base
 `model`, and `max_tokens` is set to 512. You can set custom values for these
@@ -1123,11 +1118,12 @@ async def validate(
 strange prompts or prompts that are too vague/open-ended to receive a clearly defined 'good' response.
 TLM measures consistency via the degree of contradiction between sampled responses that the model considers plausible.
 
-num_self_reflections(int, default = 3): the number of self-reflections to perform where the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
-The maximum number of self-reflections currently supported is 3. Lower values will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
-Reflection helps quantify aleatoric uncertainty associated with challenging prompts and catches responses that are noticeably incorrect/bad upon further analysis.
+use_self_reflection (bool, default = `True`): whether the LLM is asked to reflect on the given response and directly evaluate correctness/confidence.
+Setting this False disables reflection and will reduce runtimes/costs, but potentially also the reliability of trustworthiness scores.
+Reflection helps quantify aleatoric uncertainty associated with challenging prompts
+and catches responses that are noticeably incorrect/bad upon further analysis.
 
-similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "discrepancy"): how the
+similarity_measure ({"semantic", "string", "embedding", "embedding_large", "code", "discrepancy"}, default = "semantic"): how the
 trustworthiness scoring's consistency algorithm measures similarity between alternative responses considered plausible by the model.
 Supported similarity measures include - "semantic" (based on natural language inference),
 "embedding" (based on vector embedding similarity), "embedding_large" (based on a larger embedding model),
@@ -1146,8 +1142,6 @@ async def validate(
 - name: Name of the evaluation criteria.
 - criteria: Instructions specifying the evaluation criteria.
 
-use_self_reflection (bool, default = `True`): deprecated. Use `num_self_reflections` instead.
-
 prompt: The prompt to use for the TLM call. If not provided, the prompt will be
 generated from the messages.
 
@@ -1156,9 +1150,6 @@ async def validate(
 rewritten_question: The re-written query if it was provided by the client to Codex from a user to be
 used instead of the original query.
 
-tools: Tools to use for the LLM call. If not provided, it is assumed no tools were
-provided to the LLM.
-
 extra_headers: Send extra headers
 
 extra_query: Add additional query parameters to the request
@@ -1197,7 +1188,6 @@ async def validate(
 "quality_preset": quality_preset,
 "rewritten_question": rewritten_question,
 "task": task,
-"tools": tools,
 },
 project_validate_params.ProjectValidateParams,
 ),
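
The same hunks are mirrored in the async variant of `validate`. A corresponding hypothetical sketch follows, with the same caveats as above: the `AsyncCodex` name and the non-documented arguments are assumptions for illustration.

# Hypothetical async counterpart; same assumptions as the synchronous sketch.
import asyncio

from codex import AsyncCodex  # assumed async client name, mirroring the sync one

async def main() -> None:
    client = AsyncCodex()
    result = await client.projects.validate(
        project_id="proj_123",  # assumed identifier argument
        messages=[{"role": "user", "content": "Summarize the refund policy."}],  # assumed
        response={"content": "Refunds are issued within 14 days of purchase."},  # assumed
        quality_preset="base",  # cheapest preset per the docstring above
    )
    print(result)

asyncio.run(main())
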
