Added transformation function with unit test cases #847
base: main
📝 Walkthrough
This PR adds a comprehensive suite of data transformation utilities to the Airbyte CDK across four modules: mathematical scaling functions, string normalization utilities, date handling functions, and data imputation logic.
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant fill_nulls_record
participant choose_imputation_strategy
participant compute_imputation_value
Caller->>fill_nulls_record: record, columns, samples, strategies
loop For each column
alt explicit strategy provided
fill_nulls_record->>compute_imputation_value: series, strategy
else infer strategy
fill_nulls_record->>choose_imputation_strategy: series, numeric, skew_threshold
choose_imputation_strategy-->>fill_nulls_record: strategy ("mean"/"median"/"mode")
fill_nulls_record->>compute_imputation_value: series, strategy
end
compute_imputation_value-->>fill_nulls_record: imputation_value
fill_nulls_record->>fill_nulls_record: apply value if field is None<br/>create ImputationReport
end
fill_nulls_record-->>Caller: updated_record, [ImputationReport, ...]
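The flow in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions: the function names and the argument order come from the diagram, but the `ImputationReport` field names, the skewness formula, the default `skew_threshold` of 1.0, and the numeric-type detection are hypothetical and may differ from the code in the PR.

```python
from dataclasses import dataclass
from statistics import mean, median, mode
from typing import Any, Dict, List, Mapping, Optional, Sequence, Tuple


@dataclass
class ImputationReport:
    # Field names are hypothetical; the PR's dataclass may differ.
    field: str
    strategy: str
    value_used: Any
    n_filled: int


def _skewness(values: Sequence[float]) -> float:
    """Population skewness; returns 0.0 for tiny or constant samples."""
    n = len(values)
    if n < 3:
        return 0.0
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def choose_imputation_strategy(
    series: Sequence[Any], numeric: Optional[bool] = None, skew_threshold: float = 1.0
) -> str:
    present = [v for v in series if v is not None]
    if numeric is None:
        # Assumed detection rule: every non-null sample is an int or float.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
    if not numeric:
        return "mode"
    return "median" if abs(_skewness(present)) > skew_threshold else "mean"


def compute_imputation_value(series: Sequence[Any], strategy: str) -> Any:
    present = [v for v in series if v is not None]
    if not present:
        return None
    if strategy == "mean":
        return mean(present)
    if strategy == "median":
        return median(present)
    if strategy == "mode":
        return mode(present)
    raise ValueError(f"unknown strategy: {strategy}")


def fill_nulls_record(
    record: Mapping[str, Any],
    columns: Sequence[str],
    samples: Mapping[str, Sequence[Any]],
    strategies: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, Any], List[ImputationReport]]:
    strategies = strategies or {}
    updated = dict(record)
    reports: List[ImputationReport] = []
    for col in columns:
        series = samples.get(col, [])
        # An explicit strategy wins; otherwise one is inferred from the samples.
        strategy = strategies.get(col) or choose_imputation_strategy(series)
        value = compute_imputation_value(series, strategy)
        filled = 0
        if updated.get(col) is None:
            updated[col] = value
            filled = 1
        reports.append(ImputationReport(col, strategy, value, filled))
    return updated, reports
```

In this sketch a record such as `{"age": None, "city": None}` with samples `{"age": [10, 20, 30], "city": ["a", "b", "a"]}` would have `age` filled by the mean and `city` by the mode, with one report per column.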
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 14
🧹 Nitpick comments (4)
airbyte_cdk/utils/transforms/__init__.py (1)
17-29: Consider adding spaces after commas in `__all__` for consistency?
The `__all__` list items are missing spaces after commas. While this works fine, adding spaces would align with PEP 8 style conventions and improve readability. Wdyt?
Apply this diff if you'd like to improve the formatting:
 __all__ = [
     # math
-    "minmax_scale","zscore","clip","winsorize","log1p_safe",
-    "bucketize","robust_percentile_scale",
+    "minmax_scale", "zscore", "clip", "winsorize", "log1p_safe",
+    "bucketize", "robust_percentile_scale",
     # cleaning
-    "to_lower","strip_whitespace","squash_whitespace",
-    "normalize_unicode","remove_punctuation","map_values","cast_numeric",
+    "to_lower", "strip_whitespace", "squash_whitespace",
+    "normalize_unicode", "remove_punctuation", "map_values", "cast_numeric",
     # date
-    "try_parse_date","extract_date_parts","floor_to_month","ceil_to_month",
+    "try_parse_date", "extract_date_parts", "floor_to_month", "ceil_to_month",
     # impute
-    "ImputationReport","choose_imputation_strategy",
-    "compute_imputation_value","fill_nulls_column","fill_nulls_record",
+    "ImputationReport", "choose_imputation_strategy",
+    "compute_imputation_value", "fill_nulls_column", "fill_nulls_record",
 ]
airbyte_cdk/utils/transforms/math.py (3)
7-11: Consider potential floating-point precision issues with equality check?
Line 9 uses `==` to compare floats (`data_max == data_min`). While this usually works when the same values are passed in, floating-point arithmetic can sometimes lead to precision issues. Would using a small epsilon for comparison be more robust, or is exact equality the intended behavior here? Wdyt?
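A tolerance-based check could look roughly like the sketch below. It is not the PR's `minmax_scale`: the function name `minmax_scale_tolerant`, its signature, the tolerance defaults, and the choice to return the midpoint of the output range for a degenerate input are all assumptions made for illustration. Only `math.isclose` is a real standard-library call.

```python
import math
from typing import Tuple


def minmax_scale_tolerant(
    x: float,
    data_min: float,
    data_max: float,
    out_range: Tuple[float, float] = (0.0, 1.0),
    rel_tol: float = 1e-9,
    abs_tol: float = 0.0,
) -> float:
    """Scale x from [data_min, data_max] into out_range (hypothetical variant)."""
    a, b = out_range
    lo, hi = float(data_min), float(data_max)
    # Treat bounds that differ only by rounding noise as a zero-width range.
    if math.isclose(hi, lo, rel_tol=rel_tol, abs_tol=abs_tol):
        # Assumed fallback: the midpoint of the output range.
        return float(a + (b - a) / 2.0)
    return (float(x) - lo) / (hi - lo) * (b - a) + a
```

With exact equality, bounds such as `0.1 + 0.2` and `0.3` count as different and the division by their tiny difference yields a huge value; the tolerant check treats them as equal. Whether that is desirable depends on whether callers rely on exact equality, which is the question the comment raises.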
22-28: Document the error-handling behavior of `log1p_safe`?
The function returns the original value `float(x)` when an exception occurs (line 28). While this "safe" pattern prevents crashes, users might not expect this behavior. Adding a docstring to clarify when and why the original value is returned would help. Wdyt about adding documentation for this?
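A docstring along these lines might cover it. The function body below is an assumed reconstruction from the description in this comment (return `float(x)` when the logarithm fails), not a copy of the PR's code; the broad `except Exception` in particular is a guess.

```python
import math


def log1p_safe(x: float) -> float:
    """Return log(1 + x), falling back to the input when the log is undefined.

    math.log1p raises ValueError for x <= -1. In that case this function
    does not raise; it returns float(x) unchanged, so callers cannot tell a
    fallback from a real result without checking the input range themselves.
    """
    try:
        return math.log1p(x)
    except Exception:
        # Fallback described in the review: hand back the original value.
        return float(x)
```

For example, `log1p_safe(-2)` returns `-2.0` rather than raising, which is the behavior the comment suggests documenting.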
36-50: Consider using a local variable instead of reassigning the parameter?
On line 46, the parameter `x` is reassigned when `clip_outliers=True`. While this works, it can make the code slightly harder to follow. Would you consider using a local variable like `scaled_x` to preserve the original parameter? This could improve clarity. Wdyt?
Example:
def robust_percentile_scale(
    x: Number,
    p_low_value: Number,
    p_high_value: Number,
    out_range: Tuple[Number, Number]=(0.0, 1.0),
    clip_outliers: bool=True
) -> float:
    a, b = out_range
    lo, hi = float(p_low_value), float(p_high_value)
    scaled_x = clip(float(x), lo, hi) if clip_outliers else float(x)
    width = hi - lo
    if width == 0:
        return float(a + (b - a) / 2.0)
    return ((scaled_x - lo) / width) * (b - a) + a
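To try the suggestion in isolation, the snippet can be made self-contained as below. The `Number` alias and the `clip` helper are stand-ins written for this sketch; the module's own definitions are not shown in the review and may differ.

```python
from typing import Tuple, Union

Number = Union[int, float]  # assumed alias; the module's definition is not shown


def clip(x: float, low: float, high: float) -> float:
    # Assumed behavior of the module's clip(): bound x to [low, high].
    return max(low, min(high, x))


def robust_percentile_scale(
    x: Number,
    p_low_value: Number,
    p_high_value: Number,
    out_range: Tuple[Number, Number] = (0.0, 1.0),
    clip_outliers: bool = True,
) -> float:
    a, b = out_range
    lo, hi = float(p_low_value), float(p_high_value)
    # A local name keeps the caller's x untouched, as the review suggests.
    scaled_x = clip(float(x), lo, hi) if clip_outliers else float(x)
    width = hi - lo
    if width == 0:
        # Zero-width percentile range: return the midpoint of out_range.
        return float(a + (b - a) / 2.0)
    return ((scaled_x - lo) / width) * (b - a) + a
```

With percentile bounds 0 and 10, an input of 15 is clipped to 10 and maps to the top of the output range; with `clip_outliers=False` the same input maps beyond it.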
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
.DS_Store is excluded by !**/.DS_Store
📒 Files selected for processing (10)
.github/copilot-instructions.md (1 hunks)
airbyte_cdk/test/utils/transforms/test_cleaning.py (1 hunks)
airbyte_cdk/test/utils/transforms/test_date.py (1 hunks)
airbyte_cdk/test/utils/transforms/test_impute.py (1 hunks)
airbyte_cdk/test/utils/transforms/test_math.py (1 hunks)
airbyte_cdk/utils/transforms/__init__.py (1 hunks)
airbyte_cdk/utils/transforms/cleaning.py (1 hunks)
airbyte_cdk/utils/transforms/date.py (1 hunks)
airbyte_cdk/utils/transforms/impute.py (1 hunks)
airbyte_cdk/utils/transforms/math.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
.github/copilot-instructions.md
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
.github/copilot-instructions.md
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.
Applied to files:
.github/copilot-instructions.md
🧬 Code graph analysis (5)
airbyte_cdk/test/utils/transforms/test_cleaning.py (1)
airbyte_cdk/utils/transforms/cleaning.py (7)
to_lower (7-8), strip_whitespace (10-11), squash_whitespace (13-16), normalize_unicode (18-19), remove_punctuation (22-25), map_values (27-28), cast_numeric (30-44)
airbyte_cdk/test/utils/transforms/test_date.py (1)
airbyte_cdk/utils/transforms/date.py (4)
try_parse_date (4-8), extract_date_parts (10-14), floor_to_month (16-20), ceil_to_month (22-28)
airbyte_cdk/test/utils/transforms/test_impute.py (1)
airbyte_cdk/utils/transforms/impute.py (6)
_numeric_skewness (17-26), choose_imputation_strategy (28-45), compute_imputation_value (47-62), fill_nulls_column (64-72), fill_nulls_record (74-94), ImputationReport (11-15)
airbyte_cdk/utils/transforms/__init__.py (4)
airbyte_cdk/utils/transforms/math.py (7)
minmax_scale (7-11), zscore (13-14), clip (16-17), winsorize (19-20), log1p_safe (22-28), bucketize (30-34), robust_percentile_scale (36-50)
airbyte_cdk/utils/transforms/cleaning.py (7)
to_lower (7-8), strip_whitespace (10-11), squash_whitespace (13-16), normalize_unicode (18-19), remove_punctuation (22-25), map_values (27-28), cast_numeric (30-44)
airbyte_cdk/utils/transforms/date.py (4)
try_parse_date (4-8), extract_date_parts (10-14), floor_to_month (16-20), ceil_to_month (22-28)
airbyte_cdk/utils/transforms/impute.py (5)
ImputationReport (11-15), choose_imputation_strategy (28-45), compute_imputation_value (47-62), fill_nulls_column (64-72), fill_nulls_record (74-94)
airbyte_cdk/test/utils/transforms/test_math.py (1)
airbyte_cdk/utils/transforms/math.py (7)
minmax_scale (7-11), zscore (13-14), clip (16-17), winsorize (19-20), log1p_safe (22-28), bucketize (30-34), robust_percentile_scale (36-50)
🪛 GitHub Actions: Linters
airbyte_cdk/test/utils/transforms/test_cleaning.py
[error] 13-13: Function is missing a return type annotation [no-untyped-def]
[error] 13-13: Use "-> None" if function does not return a value
[error] 28-28: Function is missing a return type annotation [no-untyped-def]
[error] 28-28: Use "-> None" if function does not return a value
[error] 43-43: Function is missing a return type annotation [no-untyped-def]
[error] 43-43: Use "-> None" if function does not return a value
[error] 59-59: Function is missing a return type annotation [no-untyped-def]
[error] 59-59: Use "-> None" if function does not return a value
[error] 78-78: Function is missing a return type annotation [no-untyped-def]
[error] 78-78: Use "-> None" if function does not return a value
[error] 96-96: Function is missing a return type annotation [no-untyped-def]
[error] 96-96: Use "-> None" if function does not return a value
[error] 113-113: Function is missing a return type annotation [no-untyped-def]
[error] 113-113: Use "-> None" if function does not return a value
[error] 131-131: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal['']") [comparison-overlap]
[error] 132-132: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal[' ']") [comparison-overlap]
[error] 136-136: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "str") [comparison-overlap]
airbyte_cdk/test/utils/transforms/test_date.py
[error] 11-11: Function is missing a return type annotation [no-untyped-def]
[error] 11-11: Use "-> None" if function does not return a value
[error] 22-22: Function is missing a return type annotation [no-untyped-def]
[error] 22-22: Use "-> None" if function does not return a value
[error] 39-39: Function is missing a return type annotation [no-untyped-def]
[error] 39-39: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 46-46: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 50-50: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 53-53: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 54-54: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
airbyte_cdk/test/utils/transforms/test_impute.py
[error] 12-12: Function is missing a return type annotation [no-untyped-def]
[error] 12-12: Use "-> None" if function does not return a value
[error] 26-26: Function is missing a return type annotation [no-untyped-def]
[error] 26-26: Use "-> None" if function does not return a value
[error] 68-68: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
[error] 107-107: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
[error] 115-115: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
[error] 12-12: Function is missing a return type annotation [no-untyped-def]
[error] 12-12: Use "-> None" if function does not return a value
[error] 26-26: Function is missing a return type annotation [no-untyped-def]
[error] 26-26: Use "-> None" if function does not return a value
[error] 46-46: Function is missing a return type annotation [no-untyped-def]
[error] 46-46: Use "-> None" if function does not return a value
airbyte_cdk/utils/transforms/__init__.py
[error] 1-1: I001 Import block is un-sorted or un-formatted. Organize imports.
[error] 15-15: I001 Import block is un-sorted or un-formatted. Organize imports.
airbyte_cdk/utils/transforms/date.py
[error] 4-4: Function is missing a return type annotation [no-untyped-def]
[error] 10-10: Function is missing a type annotation for one or more arguments [no-untyped-def]
[error] 16-16: Function is missing a type annotation [no-untyped-def]
[error] 22-22: Function is missing a type annotation [no-untyped-def]
airbyte_cdk/test/utils/transforms/test_math.py
[error] 14-14: Function is missing a return type annotation [no-untyped-def]
[error] 14-14: Use "-> None" if function does not return a value
[error] 33-33: Function is missing a return type annotation [no-untyped-def]
[error] 33-33: Use "-> None" if function does not return a value
[error] 45-45: Function is missing a return type annotation [no-untyped-def]
[error] 45-45: Use "-> None" if function does not return a value
[error] 59-59: Function is missing a return type annotation [no-untyped-def]
[error] 59-59: Use "-> None" if function does not return a value
[error] 14-14: Function is missing a return type annotation [no-untyped-def]
[error] 14-14: Use "-> None" if function does not return a value
[error] 33-33: Function is missing a return type annotation [no-untyped-def]
[error] 33-33: Use "-> None" if function does not return a value
[error] 45-45: Function is missing a return type annotation [no-untyped-def]
[error] 45-45: Use "-> None" if function does not return a value
[error] 59-59: Function is missing a return type annotation [no-untyped-def]
[error] 59-59: Use "-> None" if function does not return a value
[error] 72-72: Function is missing a return type annotation [no-untyped-def]
[error] 72-72: Use "-> None" if function does not return a value
[error] 87-87: Function is missing a return type annotation [no-untyped-def]
[error] 87-87: Use "-> None" if function does not return a value
[error] 111-111: Function is missing a return type annotation [no-untyped-def]
[error] 111-111: Use "-> None" if function does not return a value
airbyte_cdk/utils/transforms/impute.py
[error] 64-64: Function is missing a type annotation for one or more arguments [no-untyped-def]
airbyte_cdk/utils/transforms/math.py
[error] 3-3: I001 Import block is un-sorted or un-formatted. Organize imports.
airbyte_cdk/utils/transforms/cleaning.py
[error] 19-19: Argument 1 to "normalize" has incompatible type "str"; expected "Literal['NFC', 'NFD', 'NFKC', 'NFKD']" [arg-type]
[error] 44-44: Returning Any from function declared to return "int | float | None" [no-any-return]
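The `arg-type` error on line 19 of cleaning.py arises because the type stubs for `unicodedata.normalize` accept only the four literal form names. One way to satisfy the checker is to narrow the parameter type, as in this sketch. The signature is inferred from the tests quoted in this review (a `form` keyword, `None` passed through, composed output by default); the alias name `UnicodeForm` is made up here.

```python
import unicodedata
from typing import Literal, Optional

# Hypothetical alias for the four normalization forms the stubs accept.
UnicodeForm = Literal["NFC", "NFD", "NFKC", "NFKD"]


def normalize_unicode(s: Optional[str], form: UnicodeForm = "NFC") -> Optional[str]:
    """Normalize s to the given Unicode form; None is passed through."""
    if s is None:
        return None
    return unicodedata.normalize(form, s)
```

Callers that pass a plain `str` variable for `form` would then be flagged by mypy at the call site, which moves the check to where an invalid form name would be introduced.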
🪛 LanguageTool
.github/copilot-instructions.md
[uncategorized] ~89-~89: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rnal APIs: Use HttpStream with proper rate limiting - Vector DBs: Implement destination log...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
.github/copilot-instructions.md (1)
1-97: LGTM! Comprehensive development guide.
This AI development guide is well-structured and provides valuable context for AI agents working with the codebase. The sections cover key architectural concepts, testing patterns, and common workflows effectively.
airbyte_cdk/utils/transforms/math.py (1)
7-50: All function implementations look solid!
The logic across all seven transformation functions is correct and handles edge cases well (zero-width ranges, division by zero, invalid log inputs). The implementations are clean and the type hints are helpful. Nice work on this utility module!
def test_to_lower():
    """Test string lowercasing function."""
    # Test normal cases
    assert to_lower("Hello") == "hello"
    assert to_lower("HELLO") == "hello"
    assert to_lower("HeLLo") == "hello"

    # Test with spaces and special characters
    assert to_lower("Hello World!") == "hello world!"
    assert to_lower("Hello123") == "hello123"

    # Test empty and None
    assert to_lower("") == ""
    assert to_lower(None) is None

def test_strip_whitespace():
    """Test whitespace stripping function."""
    # Test normal cases
    assert strip_whitespace(" hello ") == "hello"
    assert strip_whitespace("hello") == "hello"

    # Test with tabs and newlines
    assert strip_whitespace("\thello\n") == "hello"
    assert strip_whitespace(" hello\n world ") == "hello\n world"

    # Test empty and None
    assert strip_whitespace(" ") == ""
    assert strip_whitespace("") == ""
    assert strip_whitespace(None) is None

def test_squash_whitespace():
    """Test whitespace squashing function."""
    # Test normal cases
    assert squash_whitespace("hello world") == "hello world"
    assert squash_whitespace(" hello world ") == "hello world"

    # Test with tabs and newlines
    assert squash_whitespace("hello\n\nworld") == "hello world"
    assert squash_whitespace("hello\t\tworld") == "hello world"
    assert squash_whitespace("\n hello \t world \n") == "hello world"

    # Test empty and None
    assert squash_whitespace(" ") == ""
    assert squash_whitespace("") == ""
    assert squash_whitespace(None) is None

def test_normalize_unicode():
    """Test unicode normalization function."""
    # Test normal cases
    assert normalize_unicode("hello") == "hello"

    # Test composed characters
    assert normalize_unicode("café") == "café"  # Composed 'é'

    # Test decomposed characters
    decomposed = "cafe\u0301"  # 'e' with combining acute accent
    assert normalize_unicode(decomposed) == "café"  # Should normalize to composed form

    # Test different normalization forms
    assert normalize_unicode("café", form="NFD") != normalize_unicode("café", form="NFC")

    # Test empty and None
    assert normalize_unicode("") == ""
    assert normalize_unicode(None) is None

def test_remove_punctuation():
    """Test punctuation removal function."""
    # Test normal cases
    assert remove_punctuation("hello, world!") == "hello world"
    assert remove_punctuation("hello.world") == "helloworld"

    # Test with multiple punctuation marks
    assert remove_punctuation("hello!!! world???") == "hello world"
    assert remove_punctuation("hello@#$%world") == "helloworld"

    # Test with unicode punctuation
    assert remove_punctuation("hello—world") == "helloworld"
    assert remove_punctuation("«hello»") == "hello"

    # Test empty and None
    assert remove_punctuation("") == ""
    assert remove_punctuation(None) is None

def test_map_values():
    """Test value mapping function."""
    mapping = {"a": 1, "b": 2, "c": 3}

    # Test normal cases
    assert map_values("a", mapping) == 1
    assert map_values("b", mapping) == 2

    # Test with default value
    assert map_values("x", mapping) is None
    assert map_values("x", mapping, default=0) == 0

    # Test with different value types
    mixed_mapping = {1: "one", "two": 2, None: "null"}
    assert map_values(1, mixed_mapping) == "one"
    assert map_values(None, mixed_mapping) == "null"

def test_cast_numeric():
    """Test numeric casting function."""
    # Test successful casts
    assert cast_numeric("123") == 123
    assert cast_numeric("123.45") == 123.45
    assert cast_numeric(123) == 123
    assert cast_numeric(123.45) == 123.45

    # Test integers vs floats
    assert isinstance(cast_numeric("123"), int)
    assert isinstance(cast_numeric("123.45"), float)

    # Test empty values
    assert cast_numeric(None) is None
    assert cast_numeric("", on_error="none") is None  # Need to specify on_error="none" to get None for empty string
    assert cast_numeric(" ", on_error="none") is None  # Need to specify on_error="none" to get None for whitespace

    # Test empty values with default behavior (on_error="ignore")
    assert cast_numeric("") == ""
    assert cast_numeric(" ") == " "

    # Test error handling modes
    non_numeric = "abc"
    assert cast_numeric(non_numeric, on_error="ignore") == non_numeric
    assert cast_numeric(non_numeric, on_error="none") is None
    assert cast_numeric(non_numeric, on_error="default", default=0) == 0

    # Test error raising
    with pytest.raises(Exception):
        cast_numeric(non_numeric, on_error="raise")
Add return type annotations to test functions.
All test functions are missing -> None return type annotations. Would you consider adding these? Wdyt?
-def test_to_lower():
+def test_to_lower() -> None:
     """Test string lowercasing function."""
-def test_strip_whitespace():
+def test_strip_whitespace() -> None:
     """Test whitespace stripping function."""
-def test_squash_whitespace():
+def test_squash_whitespace() -> None:
     """Test whitespace squashing function."""
-def test_normalize_unicode():
+def test_normalize_unicode() -> None:
     """Test unicode normalization function."""
-def test_remove_punctuation():
+def test_remove_punctuation() -> None:
     """Test punctuation removal function."""
-def test_map_values():
+def test_map_values() -> None:
     """Test value mapping function."""
-def test_cast_numeric():
+def test_cast_numeric() -> None:
     """Test numeric casting function."""
Note: The comparison-overlap errors on lines 131-132, 136 will be resolved once the return type issue in `cast_numeric` (flagged in cleaning.py) is fixed.
📝 Committable suggestion
def test_to_lower() -> None:
    """Test string lowercasing function."""
    # Test normal cases
    assert to_lower("Hello") == "hello"
    assert to_lower("HELLO") == "hello"
    assert to_lower("HeLLo") == "hello"

    # Test with spaces and special characters
    assert to_lower("Hello World!") == "hello world!"
    assert to_lower("Hello123") == "hello123"

    # Test empty and None
    assert to_lower("") == ""
    assert to_lower(None) is None

def test_strip_whitespace() -> None:
    """Test whitespace stripping function."""
    # Test normal cases
    assert strip_whitespace(" hello ") == "hello"
    assert strip_whitespace("hello") == "hello"

    # Test with tabs and newlines
    assert strip_whitespace("\thello\n") == "hello"
    assert strip_whitespace(" hello\n world ") == "hello\n world"

    # Test empty and None
    assert strip_whitespace(" ") == ""
    assert strip_whitespace("") == ""
    assert strip_whitespace(None) is None

def test_squash_whitespace() -> None:
    """Test whitespace squashing function."""
    # Test normal cases
    assert squash_whitespace("hello world") == "hello world"
    assert squash_whitespace(" hello world ") == "hello world"

    # Test with tabs and newlines
    assert squash_whitespace("hello\n\nworld") == "hello world"
    assert squash_whitespace("hello\t\tworld") == "hello world"
    assert squash_whitespace("\n hello \t world \n") == "hello world"

    # Test empty and None
    assert squash_whitespace(" ") == ""
    assert squash_whitespace("") == ""
    assert squash_whitespace(None) is None

def test_normalize_unicode() -> None:
    """Test unicode normalization function."""
    # Test normal cases
    assert normalize_unicode("hello") == "hello"

    # Test composed characters
    assert normalize_unicode("café") == "café"  # Composed 'é'

    # Test decomposed characters
    decomposed = "cafe\u0301"  # 'e' with combining acute accent
    assert normalize_unicode(decomposed) == "café"  # Should normalize to composed form

    # Test different normalization forms
    assert normalize_unicode("café", form="NFD") != normalize_unicode("café", form="NFC")

    # Test empty and None
    assert normalize_unicode("") == ""
    assert normalize_unicode(None) is None

def test_remove_punctuation() -> None:
    """Test punctuation removal function."""
    # Test normal cases
    assert remove_punctuation("hello, world!") == "hello world"
    assert remove_punctuation("hello.world") == "helloworld"

    # Test with multiple punctuation marks
    assert remove_punctuation("hello!!! world???") == "hello world"
    assert remove_punctuation("hello@#$%world") == "helloworld"

    # Test with unicode punctuation
    assert remove_punctuation("hello—world") == "helloworld"
    assert remove_punctuation("«hello»") == "hello"

    # Test empty and None
    assert remove_punctuation("") == ""
    assert remove_punctuation(None) is None

def test_map_values() -> None:
    """Test value mapping function."""
    mapping = {"a": 1, "b": 2, "c": 3}

    # Test normal cases
    assert map_values("a", mapping) == 1
    assert map_values("b", mapping) == 2

    # Test with default value
    assert map_values("x", mapping) is None
    assert map_values("x", mapping, default=0) == 0

    # Test with different value types
    mixed_mapping = {1: "one", "two": 2, None: "null"}
    assert map_values(1, mixed_mapping) == "one"
    assert map_values(None, mixed_mapping) == "null"

def test_cast_numeric() -> None:
    """Test numeric casting function."""
    # Test successful casts
    assert cast_numeric("123") == 123
    assert cast_numeric("123.45") == 123.45
    assert cast_numeric(123) == 123
    assert cast_numeric(123.45) == 123.45

    # Test integers vs floats
    assert isinstance(cast_numeric("123"), int)
    assert isinstance(cast_numeric("123.45"), float)

    # Test empty values
    assert cast_numeric(None) is None
    assert cast_numeric("", on_error="none") is None  # Need to specify on_error="none" to get None for empty string
    assert cast_numeric(" ", on_error="none") is None  # Need to specify on_error="none" to get None for whitespace

    # Test empty values with default behavior (on_error="ignore")
    assert cast_numeric("") == ""
    assert cast_numeric(" ") == " "

    # Test error handling modes
    non_numeric = "abc"
    assert cast_numeric(non_numeric, on_error="ignore") == non_numeric
    assert cast_numeric(non_numeric, on_error="none") is None
    assert cast_numeric(non_numeric, on_error="default", default=0) == 0

    # Test error raising
    with pytest.raises(Exception):
        cast_numeric(non_numeric, on_error="raise")
🧰 Tools
🪛 GitHub Actions: Linters
[error] 13-13: Function is missing a return type annotation [no-untyped-def]
[error] 13-13: Use "-> None" if function does not return a value
[error] 28-28: Function is missing a return type annotation [no-untyped-def]
[error] 28-28: Use "-> None" if function does not return a value
[error] 43-43: Function is missing a return type annotation [no-untyped-def]
[error] 43-43: Use "-> None" if function does not return a value
[error] 59-59: Function is missing a return type annotation [no-untyped-def]
[error] 59-59: Use "-> None" if function does not return a value
[error] 78-78: Function is missing a return type annotation [no-untyped-def]
[error] 78-78: Use "-> None" if function does not return a value
[error] 96-96: Function is missing a return type annotation [no-untyped-def]
[error] 96-96: Use "-> None" if function does not return a value
[error] 113-113: Function is missing a return type annotation [no-untyped-def]
[error] 113-113: Use "-> None" if function does not return a value
[error] 131-131: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal['']") [comparison-overlap]
[error] 132-132: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal[' ']") [comparison-overlap]
[error] 136-136: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "str") [comparison-overlap]
🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_cleaning.py around lines 13-142 the
test functions (test_to_lower, test_strip_whitespace, test_squash_whitespace,
test_normalize_unicode, test_remove_punctuation, test_map_values,
test_cast_numeric) are missing explicit return type annotations; update each
function definition to include "-> None" (e.g., def test_to_lower() -> None:) so
all tests have explicit return types, then run the test suite to ensure no
further type-related failures.
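The comparison-overlap errors listed above arise because the checker sees the result typed as int | float | None being compared against a string. A minimal sketch of a signature that admits the pass-through behaviour the tests rely on is shown here. The name cast_numeric_sketch, the OnError alias, the Any return type, and the whole body are assumptions written to match the assertions in this file, not the implementation in the PR.

```python
from typing import Any, Literal

# Hypothetical alias for the error modes exercised by the tests.
OnError = Literal["ignore", "none", "default", "raise"]


def cast_numeric_sketch(
    value: Any,
    on_error: OnError = "ignore",
    default: Any = None,
) -> Any:
    """Hypothetical stand-in for cast_numeric, shaped after the tests above."""
    if value is None:
        return None
    # Real numbers pass through untouched (bool is deliberately excluded).
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    try:
        text = str(value).strip()
        if not text:
            raise ValueError("empty value")
        try:
            return int(text)
        except ValueError:
            return float(text)
    except ValueError:
        if on_error == "raise":
            raise
        if on_error == "none":
            return None
        if on_error == "default":
            return default
        return value  # "ignore": hand the original value back unchanged
```

Because the "ignore" mode can return the original string, a return type that excludes str would make comparisons such as `cast_numeric("") == ""` look unreachable to the checker. Widening the return type (here to Any) is one way to address that; narrowing the tests is another.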
| def test_try_parse_date(): | ||
| """Test date parsing function.""" | ||
| # Test with datetime object | ||
| dt = datetime(2023, 1, 15) | ||
| assert try_parse_date(dt) == dt | ||
|
|
||
| # Test with non-date object | ||
| assert try_parse_date("2023-01-15") is None | ||
| assert try_parse_date(123) is None | ||
| assert try_parse_date(None) is None | ||
|
|
||
| def test_extract_date_parts(): | ||
| """Test date parts extraction function.""" | ||
| # Test with valid datetime | ||
| dt = datetime(2023, 1, 15) # Sunday | ||
| parts = extract_date_parts(dt) | ||
| assert parts["year"] == 2023 | ||
| assert parts["month"] == 1 | ||
| assert parts["day"] == 15 | ||
| assert parts["dow"] == 6 # Sunday is 6 | ||
|
|
||
| # Test with invalid input | ||
| parts = extract_date_parts(None) | ||
| assert all(v is None for v in parts.values()) | ||
|
|
||
| parts = extract_date_parts("not a date") | ||
| assert all(v is None for v in parts.values()) | ||
|
|
||
| def test_floor_to_month(): | ||
| """Test floor to month function.""" | ||
| # Test normal cases | ||
| dt = datetime(2023, 1, 15) | ||
| assert floor_to_month(dt) == datetime(2023, 1, 1) | ||
|
|
||
| dt = datetime(2023, 12, 31) | ||
| assert floor_to_month(dt) == datetime(2023, 12, 1) | ||
|
|
||
| # Test first day of month | ||
| dt = datetime(2023, 1, 1) | ||
| assert floor_to_month(dt) == dt | ||
|
|
||
| # Test with invalid input | ||
| assert floor_to_month(None) is None | ||
| assert floor_to_month("not a date") is None | ||
|
|
||
| def test_ceil_to_month(): | ||
| """Test ceil to month function.""" | ||
| # Test normal cases | ||
| dt = datetime(2023, 1, 15) | ||
| assert ceil_to_month(dt) == datetime(2023, 2, 1) | ||
|
|
||
| # Test end of year | ||
| dt = datetime(2023, 12, 15) | ||
| assert ceil_to_month(dt) == datetime(2024, 1, 1) | ||
|
|
||
| # Test first day of month | ||
| dt = datetime(2023, 1, 1) | ||
| assert ceil_to_month(dt) == datetime(2023, 2, 1) | ||
|
|
||
| # Test with invalid input | ||
| assert ceil_to_month(None) is None | ||
| assert ceil_to_month("not a date") is None |
Add return type annotations to test functions.
All test functions are missing -> None return type annotations. Would you mind adding these to satisfy the type checker? Wdyt?
-def test_try_parse_date():
+def test_try_parse_date() -> None:
"""Test date parsing function."""
-def test_extract_date_parts():
+def test_extract_date_parts() -> None:
"""Test date parts extraction function."""
-def test_floor_to_month():
+def test_floor_to_month() -> None:
"""Test floor to month function."""
-def test_ceil_to_month():
+def test_ceil_to_month() -> None:
 """Test ceil to month function."""
Note: The "Call to untyped function" errors will be resolved once the functions in date.py have proper type annotations.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_try_parse_date() -> None: | |
| """Test date parsing function.""" | |
| # Test with datetime object | |
| dt = datetime(2023, 1, 15) | |
| assert try_parse_date(dt) == dt | |
| # Test with non-date object | |
| assert try_parse_date("2023-01-15") is None | |
| assert try_parse_date(123) is None | |
| assert try_parse_date(None) is None | |
| def test_extract_date_parts() -> None: | |
| """Test date parts extraction function.""" | |
| # Test with valid datetime | |
| dt = datetime(2023, 1, 15) # Sunday | |
| parts = extract_date_parts(dt) | |
| assert parts["year"] == 2023 | |
| assert parts["month"] == 1 | |
| assert parts["day"] == 15 | |
| assert parts["dow"] == 6 # Sunday is 6 | |
| # Test with invalid input | |
| parts = extract_date_parts(None) | |
| assert all(v is None for v in parts.values()) | |
| parts = extract_date_parts("not a date") | |
| assert all(v is None for v in parts.values()) | |
| def test_floor_to_month() -> None: | |
| """Test floor to month function.""" | |
| # Test normal cases | |
| dt = datetime(2023, 1, 15) | |
| assert floor_to_month(dt) == datetime(2023, 1, 1) | |
| dt = datetime(2023, 12, 31) | |
| assert floor_to_month(dt) == datetime(2023, 12, 1) | |
| # Test first day of month | |
| dt = datetime(2023, 1, 1) | |
| assert floor_to_month(dt) == dt | |
| # Test with invalid input | |
| assert floor_to_month(None) is None | |
| assert floor_to_month("not a date") is None | |
| def test_ceil_to_month() -> None: | |
| """Test ceil to month function.""" | |
| # Test normal cases | |
| dt = datetime(2023, 1, 15) | |
| assert ceil_to_month(dt) == datetime(2023, 2, 1) | |
| # Test end of year | |
| dt = datetime(2023, 12, 15) | |
| assert ceil_to_month(dt) == datetime(2024, 1, 1) | |
| # Test first day of month | |
| dt = datetime(2023, 1, 1) | |
| assert ceil_to_month(dt) == datetime(2023, 2, 1) | |
| # Test with invalid input | |
| assert ceil_to_month(None) is None | |
| assert ceil_to_month("not a date") is None |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 11-11: Function is missing a return type annotation [no-untyped-def]
[error] 11-11: Use "-> None" if function does not return a value
[error] 22-22: Function is missing a return type annotation [no-untyped-def]
[error] 22-22: Use "-> None" if function does not return a value
[error] 39-39: Function is missing a return type annotation [no-untyped-def]
[error] 39-39: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 46-46: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 50-50: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 53-53: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
[error] 54-54: Call to untyped function "floor_to_month" in typed context [no-untyped-call]
🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_date.py around lines 11 to 72, the
test functions lack return type annotations; update each test function
definition (test_try_parse_date, test_extract_date_parts, test_floor_to_month,
test_ceil_to_month) to include an explicit "-> None" return type (e.g., def
test_try_parse_date() -> None:) so the type checker sees them as properly
annotated tests.
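The "Call to untyped function" errors point at date.py rather than at the tests. A minimal sketch of what typed helpers could look like, assuming the behaviour implied by the assertions above (only datetime instances are accepted, anything else yields None, and ceil always moves to the first day of the following month). The _sketch names and bodies are hypothetical, and zeroing the time fields is an assumption the tests do not cover.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def try_parse_date_sketch(value: Any) -> Optional[datetime]:
    """Return the value only if it is already a datetime; no string parsing."""
    return value if isinstance(value, datetime) else None


def extract_date_parts_sketch(value: Any) -> Dict[str, Optional[int]]:
    """Split a datetime into year/month/day/dow, or all-None for invalid input."""
    dt = try_parse_date_sketch(value)
    if dt is None:
        return {"year": None, "month": None, "day": None, "dow": None}
    # datetime.weekday() counts Monday as 0, so Sunday is 6.
    return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": dt.weekday()}


def floor_to_month_sketch(value: Any) -> Optional[datetime]:
    """First instant of the month containing the given datetime."""
    dt = try_parse_date_sketch(value)
    if dt is None:
        return None
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def ceil_to_month_sketch(value: Any) -> Optional[datetime]:
    """First instant of the month after the given datetime."""
    dt = try_parse_date_sketch(value)
    if dt is None:
        return None
    year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
    return dt.replace(
        year=year, month=month, day=1, hour=0, minute=0, second=0, microsecond=0
    )
```

With explicit parameter and return annotations like these, calls from annotated test functions would no longer trigger no-untyped-call.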
| def test_numeric_skewness(): | ||
| """Test skewness calculation function.""" | ||
| # Test normal cases | ||
| assert _numeric_skewness([1, 2, 3]) == pytest.approx(0.0, abs=1e-10) # Symmetric data | ||
| assert _numeric_skewness([1, 1, 2]) > 0 # Positive skew | ||
| assert _numeric_skewness([1, 2, 2]) < 0 # Negative skew | ||
|
|
||
| # Test edge cases | ||
| assert _numeric_skewness([1, 1]) == 0.0 # Less than 3 values | ||
| assert _numeric_skewness([1, 1, 1]) == 0.0 # No variance | ||
|
|
||
| # Test with floating point values | ||
| assert _numeric_skewness([1.0, 2.0, 3.0]) == pytest.approx(0.0, abs=1e-10) | ||
|
|
||
| def test_choose_imputation_strategy(): | ||
| """Test imputation strategy selection function.""" | ||
| # Test numeric data | ||
| assert choose_imputation_strategy([1, 2, 3]) == "mean" # Low skew | ||
| assert choose_imputation_strategy([1, 1, 10]) == "median" # High skew | ||
|
|
||
| # Test categorical data | ||
| assert choose_imputation_strategy(["a", "b", "c"], numeric=False) == "mode" | ||
| assert choose_imputation_strategy(["a", "a", "b"]) == "mode" # Autodetect non-numeric | ||
|
|
||
| # Test repeated values with custom threshold | ||
| assert choose_imputation_strategy([1, 1, 1, 2], unique_ratio_threshold=0.6) == "mode" # Low unique ratio (0.5 < 0.6) | ||
|
|
||
| # Test empty and None values | ||
| assert choose_imputation_strategy([]) == "mode" | ||
| assert choose_imputation_strategy([None, None]) == "mode" | ||
|
|
||
| # Test with mixed types | ||
| assert choose_imputation_strategy([1, "2", 3]) == "mode" # Non-numeric detected | ||
|
|
||
| def test_compute_imputation_value(): | ||
| """Test imputation value computation function.""" | ||
| # Test mean strategy | ||
| assert compute_imputation_value([1, 2, 3], "mean") == 2.0 | ||
| assert compute_imputation_value([1.5, 2.5, 3.5], "mean") == 2.5 | ||
|
|
||
| # Test median strategy | ||
| assert compute_imputation_value([1, 2, 3, 4], "median") == 2.5 | ||
| assert compute_imputation_value([1, 2, 3], "median") == 2.0 | ||
|
|
||
| # Test mode strategy | ||
| assert compute_imputation_value([1, 1, 2], "mode") == 1 | ||
| assert compute_imputation_value(["a", "a", "b"], "mode") == "a" | ||
|
|
||
| # Test with None values | ||
| assert compute_imputation_value([1, None, 3], "mean") == 2.0 | ||
| assert compute_imputation_value([None, None], "mean") is None | ||
|
|
||
| # Test invalid strategy | ||
| with pytest.raises(ValueError): | ||
| compute_imputation_value([1, 2, 3], "invalid") | ||
|
|
Add return type annotations to test functions.
Test functions test_numeric_skewness, test_choose_imputation_strategy, and test_compute_imputation_value are missing -> None return type annotations. Would you mind adding these? Wdyt?
-def test_numeric_skewness():
+def test_numeric_skewness() -> None:
"""Test skewness calculation function."""
-def test_choose_imputation_strategy():
+def test_choose_imputation_strategy() -> None:
"""Test imputation strategy selection function."""
-def test_compute_imputation_value():
+def test_compute_imputation_value() -> None:
 """Test imputation value computation function."""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_numeric_skewness() -> None: | |
| """Test skewness calculation function.""" | |
| # Test normal cases | |
| assert _numeric_skewness([1, 2, 3]) == pytest.approx(0.0, abs=1e-10) # Symmetric data | |
| assert _numeric_skewness([1, 1, 2]) > 0 # Positive skew | |
| assert _numeric_skewness([1, 2, 2]) < 0 # Negative skew | |
| # Test edge cases | |
| assert _numeric_skewness([1, 1]) == 0.0 # Less than 3 values | |
| assert _numeric_skewness([1, 1, 1]) == 0.0 # No variance | |
| # Test with floating point values | |
| assert _numeric_skewness([1.0, 2.0, 3.0]) == pytest.approx(0.0, abs=1e-10) | |
| def test_choose_imputation_strategy() -> None: | |
| """Test imputation strategy selection function.""" | |
| # Test numeric data | |
| assert choose_imputation_strategy([1, 2, 3]) == "mean" # Low skew | |
| assert choose_imputation_strategy([1, 1, 10]) == "median" # High skew | |
| # Test categorical data | |
| assert choose_imputation_strategy(["a", "b", "c"], numeric=False) == "mode" | |
| assert choose_imputation_strategy(["a", "a", "b"]) == "mode" # Autodetect non-numeric | |
| # Test repeated values with custom threshold | |
| assert choose_imputation_strategy([1, 1, 1, 2], unique_ratio_threshold=0.6) == "mode" # Low unique ratio (0.5 < 0.6) | |
| # Test empty and None values | |
| assert choose_imputation_strategy([]) == "mode" | |
| assert choose_imputation_strategy([None, None]) == "mode" | |
| # Test with mixed types | |
| assert choose_imputation_strategy([1, "2", 3]) == "mode" # Non-numeric detected | |
| def test_compute_imputation_value() -> None: | |
| """Test imputation value computation function.""" | |
| # Test mean strategy | |
| assert compute_imputation_value([1, 2, 3], "mean") == 2.0 | |
| assert compute_imputation_value([1.5, 2.5, 3.5], "mean") == 2.5 | |
| # Test median strategy | |
| assert compute_imputation_value([1, 2, 3, 4], "median") == 2.5 | |
| assert compute_imputation_value([1, 2, 3], "median") == 2.0 | |
| # Test mode strategy | |
| assert compute_imputation_value([1, 1, 2], "mode") == 1 | |
| assert compute_imputation_value(["a", "a", "b"], "mode") == "a" | |
| # Test with None values | |
| assert compute_imputation_value([1, None, 3], "mean") == 2.0 | |
| assert compute_imputation_value([None, None], "mean") is None | |
| # Test invalid strategy | |
| with pytest.raises(ValueError): | |
| compute_imputation_value([1, 2, 3], "invalid") |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 12-12: Function is missing a return type annotation [no-untyped-def]
[error] 12-12: Use "-> None" if function does not return a value
[error] 26-26: Function is missing a return type annotation [no-untyped-def]
[error] 26-26: Use "-> None" if function does not return a value
[error] 46-46: Function is missing a return type annotation [no-untyped-def]
[error] 46-46: Use "-> None" if function does not return a value
🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_impute.py around lines 12 to 67, the
three test functions lack explicit return type annotations; update each function
definition to include "-> None" (i.e., change "def test_numeric_skewness():",
"def test_choose_imputation_strategy():", and "def
test_compute_imputation_value():" to "def test_numeric_skewness() -> None:",
"def test_choose_imputation_strategy() -> None:", and "def
test_compute_imputation_value() -> None:" respectively), no other changes
required.
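For readers checking the skewness assertions in these tests, here is a minimal sketch of a moment-based (population) skewness that satisfies them. This is an assumption about the formula: the PR's _numeric_skewness may use a sample-adjusted variant, and the name numeric_skewness_sketch is hypothetical.

```python
from typing import Sequence


def numeric_skewness_sketch(values: Sequence[float]) -> float:
    """Population skewness g1 = m3 / m2**1.5, with guards for degenerate input."""
    n = len(values)
    if n < 3:
        return 0.0  # too few points to say anything about asymmetry
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        return 0.0  # no variance, so skewness is treated as zero
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2**1.5
```

A positive result means the long tail is on the right (as in [1, 1, 2]), a negative result means it is on the left (as in [1, 2, 2]).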
| def test_fill_nulls_column(): | ||
| """Test column null filling function.""" | ||
| # Test numeric data | ||
| values, report = fill_nulls_column([1, None, 3]) | ||
| assert values == [1, 2.0, 3] | ||
| assert report.strategy == "mean" | ||
| assert report.value_used == 2.0 | ||
|
|
||
| # Test categorical data | ||
| values, report = fill_nulls_column(["a", None, "a"]) | ||
| assert values == ["a", "a", "a"] | ||
| assert report.strategy == "mode" | ||
| assert report.value_used == "a" | ||
|
|
||
| # Test explicit strategy | ||
| values, report = fill_nulls_column([1, None, 3], explicit_strategy="median") | ||
| assert values == [1, 2, 3] | ||
| assert report.strategy == "median" | ||
|
|
||
| # Test all None values | ||
| values, report = fill_nulls_column([None, None]) | ||
| assert values == [None, None] | ||
| assert report.value_used is None | ||
|
|
||
| def test_fill_nulls_record(): | ||
| """Test record null filling function.""" | ||
| # Test basic record filling | ||
| record = {"a": 1, "b": None, "c": "x"} | ||
| samples = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]} | ||
| filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples) | ||
|
|
||
| assert filled["a"] == 1 | ||
| assert filled["b"] == 5.0 # Mean of samples | ||
| assert filled["c"] == "x" | ||
| assert len(reports) == 3 | ||
| assert all(isinstance(r, ImputationReport) for r in reports) | ||
|
|
||
| # Test with explicit strategies | ||
| strategies = {"b": "median"} | ||
| filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples, strategies=strategies) | ||
| assert filled["b"] == 5.0 # Median of samples | ||
|
|
||
| # Test with empty samples | ||
| filled, reports = fill_nulls_record(record, ["a", "b", "c"], {}) | ||
| assert filled["b"] is None # No samples to impute from | ||
|
|
||
| # Test with missing columns | ||
| filled, reports = fill_nulls_record(record, ["a", "d"], samples) | ||
| assert "d" in filled | ||
| assert len(reports) == 2 |
Add return type annotations and fix type compatibility for samples dict.
Two issues here:
- Test functions are missing -> None return type annotations
- The samples dict has type inference issues: the type checker sees dict[str, object] but expects Mapping[str, Sequence[Any]]
Would you consider adding explicit type hints to the samples dict to help the type checker? Wdyt?
-def test_fill_nulls_column():
+def test_fill_nulls_column() -> None:
"""Test column null filling function."""
-def test_fill_nulls_record():
+def test_fill_nulls_record() -> None:
"""Test record null filling function."""
# Test basic record filling
record = {"a": 1, "b": None, "c": "x"}
- samples = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]}
+ samples: dict[str, Sequence[Any]] = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]}
 filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples)
Apply similar type hints on lines 106-107 and 111.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 68-68: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
[error] 107-107: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
[error] 115-115: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]
🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_impute.py around lines 68-117, the
test functions lack explicit return type annotations and the local samples
variables are inferred as dict[str, object] which conflicts with functions
expecting Mapping[str, Sequence[Any]]; add -> None to both
test_fill_nulls_column and test_fill_nulls_record declarations, and annotate
each samples variable with the appropriate type (e.g., samples: Mapping[str,
Sequence[Any]]) on the occurrences around lines 106-107 and 111 so the type
checker accepts the passed argument.
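The annotation fix described above can be sketched in isolation. In this sketch count_non_null is a hypothetical consumer that only borrows the parameter type the linter reports for fill_nulls_record; it is not part of the PR.

```python
from typing import Any, Mapping, Sequence


def count_non_null(samples: Mapping[str, Sequence[Any]]) -> dict[str, int]:
    """Hypothetical function with the parameter type the linter expects."""
    return {col: sum(v is not None for v in vals) for col, vals in samples.items()}


# Per the linter output above, a literal mixing int lists and str lists was
# inferred as dict[str, object]. Annotating the variable states the intended
# type up front, so the argument matches Mapping[str, Sequence[Any]].
samples: Mapping[str, Sequence[Any]] = {
    "a": [1, 2, 3],
    "b": [4, None, 6],
    "c": ["x", "y", "x"],
}

# An empty literal benefits from the same annotation.
empty_samples: Mapping[str, Sequence[Any]] = {}

counts = count_non_null(samples)
```

Annotating with Mapping rather than dict also documents that the callee is not expected to mutate the samples.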
| def test_minmax_scale(): | ||
| """Test minmax scaling function.""" | ||
| # Test normal scaling | ||
| assert minmax_scale(5, 0, 10) == 0.5 | ||
| assert minmax_scale(5, 0, 10, (0, 100)) == 50.0 | ||
|
|
||
| # Test edge cases | ||
| assert minmax_scale(0, 0, 10) == 0.0 | ||
| assert minmax_scale(10, 0, 10) == 1.0 | ||
|
|
||
| # Test custom range scaling | ||
| assert minmax_scale(5, 0, 10, (-1, 1)) == 0.0 | ||
|
|
||
| # Test when data_max equals data_min (prevents division by zero) | ||
| assert minmax_scale(5, 5, 5) == 0.5 # Should return middle of output range | ||
|
|
||
| # Test with float inputs | ||
| assert minmax_scale(5.5, 0.0, 10.0) == 0.55 | ||
|
|
||
| def test_zscore(): | ||
| """Test z-score calculation function.""" | ||
| # Test normal cases | ||
| assert zscore(10, 5, 2) == 2.5 # (10 - 5) / 2 | ||
| assert zscore(0, 5, 2) == -2.5 # (0 - 5) / 2 | ||
|
|
||
| # Test with zero sigma | ||
| assert zscore(10, 5, 0) == 0.0 # Should handle division by zero gracefully | ||
|
|
||
| # Test with float inputs | ||
| assert zscore(10.5, 5.0, 2.0) == 2.75 | ||
|
|
||
| def test_clip(): | ||
| """Test value clipping function.""" | ||
| # Test normal clipping | ||
| assert clip(5, 0, 10) == 5 | ||
| assert clip(-1, 0, 10) == 0 | ||
| assert clip(11, 0, 10) == 10 | ||
|
|
||
| # Test with float values | ||
| assert clip(5.5, 0.0, 10.0) == 5.5 | ||
| assert clip(-1.5, 0.0, 10.0) == 0.0 | ||
|
|
||
| # Test when low == high | ||
| assert clip(5, 3, 3) == 3 | ||
|
|
||
| def test_winsorize(): | ||
| """Test winsorization function.""" | ||
| # Test normal cases | ||
| assert winsorize(5, 0, 10) == 5 | ||
| assert winsorize(-1, 0, 10) == 0 | ||
| assert winsorize(11, 0, 10) == 10 | ||
|
|
||
| # Test with float values | ||
| assert winsorize(5.5, 0.0, 10.0) == 5.5 | ||
|
|
||
| # Test when low == high | ||
| assert winsorize(5, 3, 3) == 3 | ||
|
|
||
| def test_log1p_safe(): | ||
| """Test safe log1p calculation function.""" | ||
| # Test normal cases | ||
| assert log1p_safe(0) == 0.0 | ||
| assert log1p_safe(math.e - 1) == 1.0 | ||
|
|
||
| # Test negative values > -1 | ||
| assert abs(log1p_safe(-0.5) - math.log1p(-0.5)) < 1e-10 | ||
|
|
||
| # Test negative values <= -1 | ||
| assert log1p_safe(-2) == -2.0 # Should return input value | ||
|
|
||
| # Test error cases | ||
| assert log1p_safe(float('inf')) == float('inf') | ||
|
|
||
| def test_bucketize(): | ||
| """Test bucketization function.""" | ||
| edges = [0, 10, 20, 30] | ||
|
|
||
| # Test normal cases | ||
| assert bucketize(-5, edges) == 0 | ||
| assert bucketize(5, edges) == 1 | ||
| assert bucketize(15, edges) == 2 | ||
| assert bucketize(25, edges) == 3 | ||
| assert bucketize(35, edges) == 4 | ||
|
|
||
| # Test edge values | ||
| assert bucketize(0, edges) == 0 | ||
| assert bucketize(10, edges) == 1 | ||
| assert bucketize(20, edges) == 2 | ||
| assert bucketize(30, edges) == 3 | ||
|
|
||
| # Test empty edges | ||
| assert bucketize(5, []) == 0 | ||
|
|
||
| # Test single edge | ||
| assert bucketize(5, [10]) == 0 # 5 ≤ 10, so bucket 0 | ||
| assert bucketize(15, [10]) == 1 # 15 > 10, so bucket 1 | ||
|
|
||
| def test_robust_percentile_scale(): | ||
| """Test robust percentile scaling function.""" | ||
| # Test normal scaling | ||
| assert robust_percentile_scale(5, 0, 10) == 0.5 | ||
| assert robust_percentile_scale(5, 0, 10, (0, 100)) == 50.0 | ||
|
|
||
| # Test edge cases | ||
| assert robust_percentile_scale(0, 0, 10) == 0.0 | ||
| assert robust_percentile_scale(10, 0, 10) == 1.0 | ||
|
|
||
| # Test custom range | ||
| assert robust_percentile_scale(5, 0, 10, (-1, 1)) == 0.0 | ||
|
|
||
| # Test clipping | ||
| assert robust_percentile_scale(-1, 0, 10) == 0.0 # With clipping | ||
| assert robust_percentile_scale(-1, 0, 10, clip_outliers=False) < 0.0 # Without clipping | ||
|
|
||
| # Test when high equals low | ||
| assert robust_percentile_scale(5, 5, 5) == 0.5 # Should return middle of output range |
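To make the expectations in these tests easier to check by hand, here is a minimal sketch of three of the helpers. The _sketch names, signatures, and bodies are assumptions derived only from the assertions above (midpoint of the output range when the data range collapses, 0.0 for zero sigma, and left-closed bucket edges), not the code in the PR.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def minmax_scale_sketch(
    value: float,
    data_min: float,
    data_max: float,
    out_range: Tuple[float, float] = (0.0, 1.0),
) -> float:
    """Linearly map value from [data_min, data_max] onto out_range."""
    lo, hi = out_range
    if data_max == data_min:
        return (lo + hi) / 2.0  # degenerate input range: use the midpoint
    ratio = (value - data_min) / (data_max - data_min)
    return lo + ratio * (hi - lo)


def zscore_sketch(value: float, mu: float, sigma: float) -> float:
    """Standard score, returning 0.0 instead of dividing by a zero sigma."""
    return 0.0 if sigma == 0 else (value - mu) / sigma


def bucketize_sketch(value: float, edges: Sequence[float]) -> int:
    """Index of the bucket for value; edges are assumed sorted ascending.

    A value equal to an edge lands in the bucket that ends at that edge,
    which is what bisect_left gives.
    """
    return bisect_left(edges, value)
```

Using bisect_left reproduces the edge cases in test_bucketize: 0 maps to bucket 0 and 30 maps to bucket 3 for edges [0, 10, 20, 30], while an empty edge list always yields 0.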
Add return type annotations to test functions.
All test functions are missing -> None return type annotations, which the type checker requires. Would you consider adding these to satisfy the linter? Here's the pattern to apply:
-def test_minmax_scale():
+def test_minmax_scale() -> None:
"""Test minmax scaling function."""
-def test_zscore():
+def test_zscore() -> None:
"""Test z-score calculation function."""
-def test_clip():
+def test_clip() -> None:
"""Test value clipping function."""
-def test_winsorize():
+def test_winsorize() -> None:
"""Test winsorization function."""
-def test_log1p_safe():
+def test_log1p_safe() -> None:
"""Test safe log1p calculation function."""
-def test_bucketize():
+def test_bucketize() -> None:
"""Test bucketization function."""
-def test_robust_percentile_scale():
+def test_robust_percentile_scale() -> None:
 """Test robust percentile scaling function."""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_minmax_scale() -> None: | |
| """Test minmax scaling function.""" | |
| # Test normal scaling | |
| assert minmax_scale(5, 0, 10) == 0.5 | |
| assert minmax_scale(5, 0, 10, (0, 100)) == 50.0 | |
| # Test edge cases | |
| assert minmax_scale(0, 0, 10) == 0.0 | |
| assert minmax_scale(10, 0, 10) == 1.0 | |
| # Test custom range scaling | |
| assert minmax_scale(5, 0, 10, (-1, 1)) == 0.0 | |
| # Test when data_max equals data_min (prevents division by zero) | |
| assert minmax_scale(5, 5, 5) == 0.5 # Should return middle of output range | |
| # Test with float inputs | |
| assert minmax_scale(5.5, 0.0, 10.0) == 0.55 | |
| def test_zscore() -> None: | |
| """Test z-score calculation function.""" | |
| # Test normal cases | |
| assert zscore(10, 5, 2) == 2.5 # (10 - 5) / 2 | |
| assert zscore(0, 5, 2) == -2.5 # (0 - 5) / 2 | |
| # Test with zero sigma | |
| assert zscore(10, 5, 0) == 0.0 # Should handle division by zero gracefully | |
| # Test with float inputs | |
| assert zscore(10.5, 5.0, 2.0) == 2.75 | |
| def test_clip() -> None: | |
| """Test value clipping function.""" | |
| # Test normal clipping | |
| assert clip(5, 0, 10) == 5 | |
| assert clip(-1, 0, 10) == 0 | |
| assert clip(11, 0, 10) == 10 | |
| # Test with float values | |
| assert clip(5.5, 0.0, 10.0) == 5.5 | |
| assert clip(-1.5, 0.0, 10.0) == 0.0 | |
| # Test when low == high | |
| assert clip(5, 3, 3) == 3 | |
| def test_winsorize() -> None: | |
| """Test winsorization function.""" | |
| # Test normal cases | |
| assert winsorize(5, 0, 10) == 5 | |
| assert winsorize(-1, 0, 10) == 0 | |
| assert winsorize(11, 0, 10) == 10 | |
| # Test with float values | |
| assert winsorize(5.5, 0.0, 10.0) == 5.5 | |
| # Test when low == high | |
| assert winsorize(5, 3, 3) == 3 | |
| def test_log1p_safe() -> None: | |
| """Test safe log1p calculation function.""" | |
| # Test normal cases | |
| assert log1p_safe(0) == 0.0 | |
| assert log1p_safe(math.e - 1) == 1.0 | |
| # Test negative values > -1 | |
| assert abs(log1p_safe(-0.5) - math.log1p(-0.5)) < 1e-10 | |
| # Test negative values <= -1 | |
| assert log1p_safe(-2) == -2.0 # Should return input value | |
| # Test error cases | |
| assert log1p_safe(float('inf')) == float('inf') | |
| def test_bucketize() -> None: | |
| """Test bucketization function.""" | |
| edges = [0, 10, 20, 30] | |
| # Test normal cases | |
| assert bucketize(-5, edges) == 0 | |
| assert bucketize(5, edges) == 1 | |
| assert bucketize(15, edges) == 2 | |
| assert bucketize(25, edges) == 3 | |
| assert bucketize(35, edges) == 4 | |
| # Test edge values | |
| assert bucketize(0, edges) == 0 | |
| assert bucketize(10, edges) == 1 | |
| assert bucketize(20, edges) == 2 | |
| assert bucketize(30, edges) == 3 | |
| # Test empty edges | |
| assert bucketize(5, []) == 0 | |
| # Test single edge | |
| assert bucketize(5, [10]) == 0 # 5 ≤ 10, so bucket 0 | |
| assert bucketize(15, [10]) == 1 # 15 > 10, so bucket 1 | |
| def test_robust_percentile_scale() -> None: | |
| """Test robust percentile scaling function.""" | |
| # Test normal scaling | |
| assert robust_percentile_scale(5, 0, 10) == 0.5 | |
| assert robust_percentile_scale(5, 0, 10, (0, 100)) == 50.0 | |
| # Test edge cases | |
| assert robust_percentile_scale(0, 0, 10) == 0.0 | |
| assert robust_percentile_scale(10, 0, 10) == 1.0 | |
| # Test custom range | |
| assert robust_percentile_scale(5, 0, 10, (-1, 1)) == 0.0 | |
| # Test clipping | |
| assert robust_percentile_scale(-1, 0, 10) == 0.0 # With clipping | |
| assert robust_percentile_scale(-1, 0, 10, clip_outliers=False) < 0.0 # Without clipping | |
| # Test when high equals low | |
| assert robust_percentile_scale(5, 5, 5) == 0.5 # Should return middle of output range |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 14-14: Function is missing a return type annotation [no-untyped-def]
[error] 14-14: Use "-> None" if function does not return a value
[error] 33-33: Function is missing a return type annotation [no-untyped-def]
[error] 33-33: Use "-> None" if function does not return a value
[error] 45-45: Function is missing a return type annotation [no-untyped-def]
[error] 45-45: Use "-> None" if function does not return a value
[error] 59-59: Function is missing a return type annotation [no-untyped-def]
[error] 59-59: Use "-> None" if function does not return a value
[error] 72-72: Function is missing a return type annotation [no-untyped-def]
[error] 72-72: Use "-> None" if function does not return a value
[error] 87-87: Function is missing a return type annotation [no-untyped-def]
[error] 87-87: Use "-> None" if function does not return a value
[error] 111-111: Function is missing a return type annotation [no-untyped-def]
[error] 111-111: Use "-> None" if function does not return a value
🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_math.py around lines 14 to 129 the
test functions lack return type annotations; add "-> None" to each test function
signature (e.g., def test_minmax_scale() -> None:) and update all other tests in
this range similarly so every test function explicitly returns None to satisfy
the type checker.
| def extract_date_parts(dt) -> Dict[str, Optional[int]]: | ||
| try: | ||
| return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())} | ||
| except Exception: | ||
| return {"year": None, "month": None, "day": None, "dow": None} |
Add type annotation for dt parameter.
The dt parameter is missing a type annotation. Since it's duck-typed (uses attributes like .year, .month), would Any work here? Wdyt?
-def extract_date_parts(dt) -> Dict[str, Optional[int]]:
+def extract_date_parts(dt: Any) -> Dict[str, Optional[int]]:
     try:
         return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
     except Exception:
         return {"year": None, "month": None, "day": None, "dow": None}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def extract_date_parts(dt) -> Dict[str, Optional[int]]: | |
| try: | |
| return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())} | |
| except Exception: | |
| return {"year": None, "month": None, "day": None, "dow": None} | |
| def extract_date_parts(dt: Any) -> Dict[str, Optional[int]]: | |
| try: | |
| return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())} | |
| except Exception: | |
| return {"year": None, "month": None, "day": None, "dow": None} |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 10-10: Function is missing a type annotation for one or more arguments [no-untyped-def]
🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 10 to 14, the function
parameter dt is missing a type annotation; update the signature to annotate dt
as Union[datetime.date, datetime.datetime, Any] (import Union and Any from
typing and datetime.date/datetime from datetime) so callers and linters know dt
is expected to be a date-like object but still allow duck-typed inputs; keep the
body unchanged.
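A minimal sketch of the annotated function, using the `Any` annotation from the suggestion above. The sample inputs are illustrative only; note that `weekday()` counts Monday as 0.

```python
from datetime import date
from typing import Any, Dict, Optional


def extract_date_parts(dt: Any) -> Dict[str, Optional[int]]:
    """Return year/month/day/day-of-week, or all None for non-date input."""
    try:
        return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
    except Exception:
        return {"year": None, "month": None, "day": None, "dow": None}


parts = extract_date_parts(date(2024, 3, 15))  # a Friday, so dow is 4
fallback = extract_date_parts("not a date")    # str has no .year, so every part is None
```

Because the body relies on duck typing, the annotation only affects static checking; runtime behavior is the same as before.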
| def floor_to_month(dt): | ||
| try: | ||
| return dt.replace(day=1) | ||
| except Exception: | ||
| return None |
Add type annotations for floor_to_month.
This function is missing both parameter and return type annotations. Would you consider adding them? Wdyt?
-def floor_to_month(dt):
+def floor_to_month(dt: Any) -> Optional[Any]:
     try:
         return dt.replace(day=1)
     except Exception:
         return None
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 16-16: Function is missing a type annotation [no-untyped-def]
🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 16 to 20, the function
floor_to_month lacks type annotations; annotate the parameter to accept
datetime.date or datetime.datetime (e.g., Union[date, datetime]) and the return
type as Optional[Union[date, datetime]] (or the Python 3.10+ union syntax),
import the required types from typing and datetime at the top, and keep the
current behavior of returning None on failure so the signature reflects that
possibility.
| def ceil_to_month(dt): | ||
| try: | ||
| if dt.month == 12: | ||
| return dt.replace(year=dt.year + 1, month=1, day=1) | ||
| return dt.replace(month=dt.month + 1, day=1) | ||
| except Exception: | ||
| return None |
Add type annotations for ceil_to_month.
This function is also missing parameter and return type annotations. Would you mind adding them? Wdyt?
-def ceil_to_month(dt):
+def ceil_to_month(dt: Any) -> Optional[Any]:
     try:
         if dt.month == 12:
             return dt.replace(year=dt.year + 1, month=1, day=1)
         return dt.replace(month=dt.month + 1, day=1)
     except Exception:
         return None
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 22-22: Function is missing a type annotation [no-untyped-def]
🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 22 to 28, the function
ceil_to_month is missing type annotations; update its signature to accept a
datetime.datetime (or Optional[datetime.datetime] if callers may pass None) and
return Optional[datetime.datetime], add the necessary imports (from typing
import Optional and import datetime) at the top of the file, and ensure the
function signature and any internal uses reflect those types (e.g., def
ceil_to_month(dt: datetime.datetime) -> Optional[datetime.datetime]:).
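The suggestions for `floor_to_month` and `ceil_to_month` range from `Any` to `datetime`-specific annotations. One possible middle ground, sketched here and not taken from the PR, is a `TypeVar` bound to `date`, so a `date` input is typed as returning a `date` and a `datetime` as returning a `datetime`. The name `D` is hypothetical.

```python
from datetime import date, datetime
from typing import Optional, TypeVar

# Hypothetical type variable: any date subclass, including datetime.
D = TypeVar("D", bound=date)


def floor_to_month(dt: D) -> Optional[D]:
    """First day of dt's month; None if dt does not support replace(day=...)."""
    try:
        return dt.replace(day=1)
    except Exception:
        return None


def ceil_to_month(dt: D) -> Optional[D]:
    """First day of the following month; None on failure (e.g. year overflow)."""
    try:
        if dt.month == 12:
            return dt.replace(year=dt.year + 1, month=1, day=1)
        return dt.replace(month=dt.month + 1, day=1)
    except Exception:
        return None


floored = floor_to_month(date(2024, 3, 15))
ceiled = ceil_to_month(datetime(2023, 12, 31, 23, 59))  # time of day is preserved
overflow = ceil_to_month(date(9999, 12, 1))             # year 10000 is out of range
```

Note that `replace` leaves the time-of-day fields of a `datetime` untouched, so "ceil" here moves only the calendar date.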
| def fill_nulls_column( | ||
| series: Sequence[Any], | ||
| explicit_strategy: Optional[str] = None, | ||
| numeric: Optional[bool] = None, | ||
| **choose_kwargs | ||
| ) -> Tuple[List[Any], ImputationReport]: | ||
| strategy = explicit_strategy or choose_imputation_strategy(series, numeric=numeric, **choose_kwargs) | ||
| fill_value = compute_imputation_value(series, strategy) | ||
| return [fill_value if x is None else x for x in series], ImputationReport("<series>", strategy, fill_value) |
Add type annotation for **choose_kwargs parameter.
The **choose_kwargs parameter is missing a type annotation, which the type checker is flagging. Would you consider adding **choose_kwargs: Any or being more specific with the expected keyword arguments? Wdyt?
 def fill_nulls_column(
     series: Sequence[Any],
     explicit_strategy: Optional[str] = None,
     numeric: Optional[bool] = None,
-    **choose_kwargs
+    **choose_kwargs: Any
 ) -> Tuple[List[Any], ImputationReport]:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def fill_nulls_column( | |
| series: Sequence[Any], | |
| explicit_strategy: Optional[str] = None, | |
| numeric: Optional[bool] = None, | |
| **choose_kwargs | |
| ) -> Tuple[List[Any], ImputationReport]: | |
| strategy = explicit_strategy or choose_imputation_strategy(series, numeric=numeric, **choose_kwargs) | |
| fill_value = compute_imputation_value(series, strategy) | |
| return [fill_value if x is None else x for x in series], ImputationReport("<series>", strategy, fill_value) | |
| def fill_nulls_column( | |
| series: Sequence[Any], | |
| explicit_strategy: Optional[str] = None, | |
| numeric: Optional[bool] = None, | |
| **choose_kwargs: Any | |
| ) -> Tuple[List[Any], ImputationReport]: | |
| strategy = explicit_strategy or choose_imputation_strategy(series, numeric=numeric, **choose_kwargs) | |
| fill_value = compute_imputation_value(series, strategy) | |
| return [fill_value if x is None else x for x in series], ImputationReport("<series>", strategy, fill_value) |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 64-64: Function is missing a type annotation for one or more arguments [no-untyped-def]
🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/impute.py around lines 64 to 72, the variadic
keyword parameter **choose_kwargs is missing a type annotation; add a type
annotation (e.g., **choose_kwargs: Any) to the function signature so the type
checker stops flagging it; if you prefer stricter typing, annotate the expected
value type instead (the annotation on **choose_kwargs describes each keyword
value, not the mapping itself) and import Any as needed.
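To illustrate what the annotation means, here is a small sketch with hypothetical stand-in functions (`pick_strategy` and `resolve_strategy` are not part of the PR). The annotation on `**choose_kwargs` applies to each keyword value, so `Any` is the loosest correct choice.

```python
from typing import Any, Optional, Sequence


def pick_strategy(
    series: Sequence[Any],
    numeric: Optional[bool] = None,
    skew_threshold: float = 1.0,
) -> str:
    """Hypothetical stand-in for choose_imputation_strategy."""
    return "mean" if numeric else "mode"


def resolve_strategy(
    series: Sequence[Any],
    explicit_strategy: Optional[str] = None,
    numeric: Optional[bool] = None,
    **choose_kwargs: Any,  # each forwarded keyword value is typed as Any
) -> str:
    """Use the explicit strategy if given, otherwise forward kwargs to the chooser."""
    return explicit_strategy or pick_strategy(series, numeric=numeric, **choose_kwargs)


inferred = resolve_strategy([1, None, 3], numeric=True, skew_threshold=2.0)
forced = resolve_strategy([1, None, 3], explicit_strategy="median")
```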
| from __future__ import annotations | ||
| from typing import Sequence, Tuple, Union | ||
| import math | ||
|
|
Fix import ordering to resolve linter error.
The standard library import math should come before the from typing import. This will resolve the pipeline failure.
Apply this diff:
 from __future__ import annotations
-from typing import Sequence, Tuple, Union
 import math
+from typing import Sequence, Tuple, Union
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from __future__ import annotations | |
| from typing import Sequence, Tuple, Union | |
| import math | |
| from __future__ import annotations | |
| import math | |
| from typing import Sequence, Tuple, Union | |
🧰 Tools
🪛 GitHub Actions: Linters
[error] 3-3: I001 Import block is un-sorted or un-formatted. Organize imports.
🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/math.py around lines 1 to 4, the import ordering
is incorrect (math should be a standard-library import and come before the
typing import); keep "from __future__ import annotations" as the very first
line, then import math, then the typing imports (Sequence, Tuple, Union), to
satisfy the linter and PEP8 import grouping.
Implemented transformation functions
Cleaning
to_lower – changes all letters in the text to lowercase.
strip_whitespace – removes spaces from the beginning and end of the text.
squash_whitespace – replaces multiple spaces between words with a single space.
normalize_unicode – fixes and standardizes special or accented characters.
remove_punctuation – removes punctuation marks like commas, periods, and question marks.
map_values – replaces a value using a given dictionary or mapping.
cast_numeric – converts text or other types into numbers safely.
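The cleaning helpers listed above could look roughly like the following. This is a sketch built only from the one-line descriptions; the signatures, the NFKC normalization form, the ASCII-only punctuation set, and the `None` fallback in `cast_numeric` are all assumptions, not the PR's actual code.

```python
import re
import string
import unicodedata
from typing import Any, Optional, Union


def squash_whitespace(s: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", s)


def normalize_unicode(s: str, form: str = "NFKC") -> str:
    """Normalize composed/compatibility characters (form is an assumed default)."""
    return unicodedata.normalize(form, s)


def remove_punctuation(s: str) -> str:
    """Drop ASCII punctuation characters only."""
    return s.translate(str.maketrans("", "", string.punctuation))


def cast_numeric(value: Any) -> Optional[Union[int, float]]:
    """Try int first, then float; return None when neither conversion works."""
    for convert in (int, float):
        try:
            return convert(value)
        except (TypeError, ValueError):
            continue
    return None


cleaned = remove_punctuation(squash_whitespace("Hello,   world!"))
ligature = normalize_unicode("\ufb01le")  # the "fi" ligature expands under NFKC
```

Trying `int` before `float` keeps integer-looking strings as integers; a stricter variant might reject values such as `"nan"`, which `float` accepts.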
Date Transformations
try_parse_date – checks if something is a date and returns it.
extract_date_parts – gives the year, month, day, and weekday from a date.
floor_to_month – changes the date to the first day of the same month.
ceil_to_month – changes the date to the first day of the next month.
Imputation Functions
ImputationReport – keeps a small report showing which method was used to fill missing data.
_numeric_skewness – checks how much the numeric data is skewed (not evenly spread).
choose_imputation_strategy – decides whether to fill missing values using the mean, median, or mode.
compute_imputation_value – actually finds the mean, median, or mode value to use for filling.
fill_nulls_column – fills missing values in one column with the chosen method.
fill_nulls_record – fills missing values in a full record (row) using sample data and gives a small report.
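The strategy selection described above can be sketched as follows. The skewness formula (population third standardized moment), the default threshold of 1.0, and the shortened function names are assumptions made for illustration; the PR's implementation may differ.

```python
import statistics
from typing import Any, List, Optional, Sequence


def numeric_skewness(values: Sequence[float]) -> float:
    """Population skewness; 0.0 for fewer than 3 values or zero spread."""
    if len(values) < 3:
        return 0.0
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)


def choose_strategy(
    series: Sequence[Any],
    numeric: Optional[bool] = None,
    skew_threshold: float = 1.0,  # assumed default
) -> str:
    """Mode for non-numeric data, median for skewed numeric data, else mean."""
    present: List[Any] = [x for x in series if x is not None]
    if numeric is None:
        numeric = bool(present) and all(
            isinstance(x, (int, float)) and not isinstance(x, bool) for x in present
        )
    if not numeric:
        return "mode"
    return "median" if abs(numeric_skewness(present)) > skew_threshold else "mean"


def compute_value(series: Sequence[Any], strategy: str) -> Any:
    """Compute the fill value from the non-null entries of the series."""
    present = [x for x in series if x is not None]
    if not present:
        return None
    if strategy == "mean":
        return statistics.fmean(present)
    if strategy == "median":
        return statistics.median(present)
    if strategy == "mode":
        return statistics.mode(present)
    raise ValueError(f"unknown strategy: {strategy}")


skewed = [1, 1, 1, 1, 100, None]
strategy = choose_strategy(skewed)
fill_value = compute_value(skewed, strategy)
```

With one extreme value in the sample, the skewness exceeds the threshold, so the median is chosen and the outlier does not drag the fill value upward.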
Math Functions
minmax_scale – scales a number to a new range, usually between 0 and 1.
zscore – finds how far a number is from the average in terms of standard deviation.
clip – keeps a number within a lower and upper limit.
winsorize – limits extreme values to reduce outliers (similar to clip).
log1p_safe – safely applies a log(1+x) transformation without errors.
bucketize – puts a number into a range or group (a bucket).
robust_percentile_scale – scales data between percentiles to reduce outlier effects.
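A minimal sketch of four of these helpers, written to be consistent with the assertions in the test file quoted in the review. The parameter names are assumptions (the tests call the functions positionally), and `bisect_left` is one way to reproduce the bucket boundaries the tests expect.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def minmax_scale(
    x: float,
    data_min: float,
    data_max: float,
    out_range: Tuple[float, float] = (0.0, 1.0),
) -> float:
    """Linearly map x from [data_min, data_max] onto out_range."""
    lo, hi = out_range
    if data_max == data_min:
        # Avoid division by zero: fall back to the middle of the output range.
        return (lo + hi) / 2.0
    return (x - data_min) / (data_max - data_min) * (hi - lo) + lo


def zscore(x: float, mu: float, sigma: float) -> float:
    """Distance from the mean in standard deviations; 0.0 when sigma is 0."""
    return 0.0 if sigma == 0 else (x - mu) / sigma


def clip(x: float, low: float, high: float) -> float:
    """Constrain x to the closed interval [low, high]."""
    return max(low, min(high, x))


def bucketize(x: float, edges: Sequence[float]) -> int:
    """Index of the bucket for x; edges must be sorted, values on an edge go left."""
    return bisect_left(edges, x)


scaled = minmax_scale(5, 0, 10, (-1, 1))
bucket = bucketize(10, [0, 10, 20, 30])
```

Under this reading, a value equal to an edge falls into the lower bucket, which matches the test comment "5 ≤ 10, so bucket 0".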
Summary by CodeRabbit
Release Notes
New Features
airbyte_cdk.utils.transforms.
Documentation
Tests