src/cleanlab_tlm/utils/rag.py (192 additions, 4 deletions)
@@ -11,6 +11,7 @@
 from __future__ import annotations

 import asyncio
+import warnings
 from collections.abc import Sequence
 from typing import (
     TYPE_CHECKING,
@@ -27,6 +28,7 @@
27
28
fromcleanlab_tlm.errorsimportValidationError
28
29
fromcleanlab_tlm.internal.apiimportapi
29
30
fromcleanlab_tlm.internal.baseimportBaseTLM
31
+
fromcleanlab_tlm.tlmimportTLM
30
32
fromcleanlab_tlm.internal.constantsimport (
31
33
_BINARY_STR,
32
34
_CONTINUOUS_STR,
@@ -866,9 +868,10 @@ class Eval:
         response_identifier (str, optional): The exact string used in your evaluation `criteria` to reference the RAG/LLM response.
             For example, specifying `response_identifier` as "AI Answer" means your `criteria` should refer to the response as "AI Answer".
             Leave this value as None (the default) if this Eval doesn't consider the response.
-        mode (str, optional): What type of evaluation these `criteria` correspond to, either "continuous" (default) or "binary".
+        mode (str, optional): What type of evaluation these `criteria` correspond to, either "continuous" (default), "binary", or "auto".
             - "continuous": For `criteria` that define what is good/better vs. what is bad/worse, corresponding to evaluations of quality along a continuous spectrum (e.g., relevance, conciseness).
             - "binary": For `criteria` written as Yes/No questions, corresponding to evaluations that most would consider either True or False rather than grading along a continuous spectrum (e.g., does Response mention ACME Inc., is Query asking about refund, ...).
+            - "auto": Automatically determines whether the criteria is binary or continuous based on the criteria text.
             Both modes return scores in the 0-1 range.
             For "continuous" evaluations, your `criteria` should define what good vs. bad looks like (cases deemed bad will return low evaluation scores).
             For "binary" evaluations, your `criteria` should be a Yes/No question (cases answered "Yes" will return low evaluation scores, so phrase your question such that the likelihood of "Yes" matches the likelihood of the particular problem you wish to detect).
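
The hunk above only changes documentation, but the behavior it describes can be pictured with a minimal usage sketch, assuming an `Eval` constructor that accepts `name`, `criteria`, `response_identifier`, and `mode` keyword arguments (the constructor signature itself is not shown in this diff):

    from cleanlab_tlm.utils.rag import Eval

    # Hedged example: with mode="auto", the library is expected to infer
    # binary vs. continuous scoring from the criteria text itself.
    conciseness_eval = Eval(
        name="response_conciseness",
        criteria="Assess whether the AI Answer is concise; shorter, direct answers are better, rambling answers are worse.",
        response_identifier="AI Answer",
        mode="auto",
    )

Because this criteria describes better vs. worse rather than posing a Yes/No question, "auto" would presumably resolve it to a continuous evaluation.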
@@ ... @@
+        Check if criteria clearly specifies what is Good vs Bad or Desirable vs Undesirable.
+
+        Args:
+            criteria: The evaluation criteria text
+
+        Returns:
+            True if criteria clearly defines good/bad or desirable/undesirable, False otherwise
+        """
+        tlm = TLM(quality_preset="base")
+
+        prompt = f"""Analyze the following evaluation criteria and determine if it clearly specifies what is "good" versus "bad", "desirable" versus "undesirable", "better" versus "worse", or uses similar language to define quality distinctions.
+
+The criteria should make it clear what characteristics or qualities are considered positive/desirable versus negative/undesirable.
+
+Evaluation Criteria:
+{criteria}
+
+Does this criteria clearly specify what is good/desirable versus bad/undesirable? Answer only "Yes" or "No"."""
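
The captured diff ends immediately after the prompt is built. A hedged sketch of how such a helper might conclude, assuming `TLM.prompt()` returns a dict-like response with a "response" key (as in cleanlab_tlm's public API) and that the function returns the boolean described in its Returns section; this continuation is illustrative, not code from the PR:

    # Hypothetical continuation: query the base-preset TLM and map its
    # Yes/No answer onto the documented boolean return value.
    result = tlm.prompt(prompt)
    answer = str(result["response"]).strip().lower()
    return answer.startswith("yes")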