@Sbnikitha Sbnikitha commented Nov 13, 2025

Implemented transformation functions
Cleaning
to_lower – changes all letters in the text to lowercase.
strip_whitespace – removes spaces from the beginning and end of the text.
squash_whitespace – replaces multiple spaces between words with a single space.
normalize_unicode – fixes and standardizes special or accented characters.
remove_punctuation – removes punctuation marks like commas, periods, and question marks.
map_values – replaces a value using a given dictionary or mapping.
cast_numeric – converts text or other types into numbers safely.
Date Transformations
try_parse_date – checks if something is a date and returns it.
extract_date_parts – gives the year, month, day, and weekday from a date.
floor_to_month – changes the date to the first day of the same month.
ceil_to_month – changes the date to the first day of the next month.
Imputation functions
ImputationReport – keeps a small report showing which method was used to fill missing data.
_numeric_skewness – measures how skewed (asymmetric around the mean) the numeric data is.
choose_imputation_strategy – decides whether to fill missing values using the mean, median, or mode.
compute_imputation_value – actually finds the mean, median, or mode value to use for filling.
fill_nulls_column – fills missing values in one column with the chosen method.
fill_nulls_record – fills missing values in a full record (row) using sample data and gives a small report.

Math Functions
minmax_scale – scales a number to a new range, usually between 0 and 1.
zscore – finds how far a number is from the average in terms of standard deviation.
clip – keeps a number within a lower and upper limit.
winsorize – limits extreme values to reduce outliers (similar to clip).
log1p_safe – safely applies a log(1+x) transformation without errors.
bucketize – puts a number into a range or group (a bucket).
robust_percentile_scale – scales data between percentiles to reduce outlier effects.
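
A minimal sketch of how three of the math helpers listed above could behave. The signatures used here (`clip(x, low, high)`, `zscore(x, mean, std)`, `bucketize(x, edges)`) and the zero-std fallback are assumptions for illustration; they are not copied from the PR's math.py.

```python
from bisect import bisect_right
from typing import Sequence, Union

Number = Union[int, float]


def clip(x: Number, low: Number, high: Number) -> float:
    """Keep x inside the closed interval [low, high]."""
    return float(max(low, min(high, x)))


def zscore(x: Number, mean: Number, std: Number) -> float:
    """Distance of x from the mean, measured in standard deviations.

    Assumption: a zero std returns 0.0 instead of dividing by zero.
    """
    if std == 0:
        return 0.0
    return (float(x) - float(mean)) / float(std)


def bucketize(x: Number, edges: Sequence[Number]) -> int:
    """Index of the bucket x falls into, given sorted bucket edges.

    Assumption: a value equal to an edge belongs to the bucket on its right.
    """
    return bisect_right(edges, x)
```

With edges `[0, 10, 20]`, values below 0 land in bucket 0 and values of 20 or more land in bucket 3.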

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced transforms utilities library with functions for string cleaning (lowercasing, whitespace normalization, punctuation removal), date operations (parsing, part extraction, month floor/ceil), numeric scaling and transformations, and data imputation strategies.
    • Unified public API for transforms accessible via airbyte_cdk.utils.transforms.
  • Documentation

    • Added comprehensive development guide for the Airbyte Python CDK.
  • Tests

    • Added comprehensive test coverage for all transforms utilities.

coderabbitai bot commented Nov 13, 2025

📝 Walkthrough

Walkthrough

This PR adds a comprehensive suite of data transformation utilities to the Airbyte CDK across four modules: mathematical scaling functions, string normalization utilities, date handling functions, and data imputation logic. A new __init__.py consolidates these into a unified public API. Full test coverage is provided for each module, and a development guide is added for AI agent reference.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Documentation: .github/copilot-instructions.md | Adds a comprehensive development guide for the Airbyte Python CDK detailing project overview, core components, data flow, development conventions, and common workflows. Serves as inline documentation for AI agents. |
| Math Transforms: airbyte_cdk/utils/transforms/math.py, airbyte_cdk/test/utils/transforms/test_math.py | Introduces seven numeric transformation utilities: minmax_scale, zscore, clip, winsorize, log1p_safe, bucketize, and robust_percentile_scale. Includes comprehensive test coverage for typical usage and edge cases (e.g., division-by-zero, boundary conditions). |
| String Cleaning Transforms: airbyte_cdk/utils/transforms/cleaning.py, airbyte_cdk/test/utils/transforms/test_cleaning.py | Adds seven string normalization and type-casting utilities: to_lower, strip_whitespace, squash_whitespace, normalize_unicode, remove_punctuation, map_values, and cast_numeric. Tests cover typical and edge cases including None handling and error modes. |
| Date Transforms: airbyte_cdk/utils/transforms/date.py, airbyte_cdk/test/utils/transforms/test_date.py | Provides four date handling functions: try_parse_date, extract_date_parts, floor_to_month, and ceil_to_month. Tests validate datetime handling, edge cases, and month-boundary behavior. |
| Imputation Transforms: airbyte_cdk/utils/transforms/impute.py, airbyte_cdk/test/utils/transforms/test_impute.py | Implements data imputation utilities including ImputationReport dataclass, strategy selection logic, and per-column/record-level filling functions. Tests cover strategy inference, numeric skewness detection, and multi-column imputation workflows. |
| Public API: airbyte_cdk/utils/transforms/__init__.py | Consolidates and re-exports 24 symbols (functions and classes) from math, cleaning, date, and impute submodules via a centralized __all__ list, establishing a unified public interface. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant fill_nulls_record
    participant choose_imputation_strategy
    participant compute_imputation_value
    
    Caller->>fill_nulls_record: record, columns, samples, strategies
    
    loop For each column
        alt explicit strategy provided
            fill_nulls_record->>compute_imputation_value: series, strategy
        else infer strategy
            fill_nulls_record->>choose_imputation_strategy: series, numeric, skew_threshold
            choose_imputation_strategy-->>fill_nulls_record: strategy ("mean"/"median"/"mode")
            fill_nulls_record->>compute_imputation_value: series, strategy
        end
        
        compute_imputation_value-->>fill_nulls_record: imputation_value
        fill_nulls_record->>fill_nulls_record: apply value if field is None<br/>create ImputationReport
    end
    
    fill_nulls_record-->>Caller: updated_record, [ImputationReport, ...]
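
The flow in the diagram above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's impute.py: the shortened names (`Report`, `choose_strategy`, `compute_value`, `fill_record`), the report fields, the population-skewness formula, and the default `skew_threshold` of 1.0 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median, mode
from typing import Any, Dict, List, Mapping, Optional, Sequence, Tuple


@dataclass
class Report:  # stand-in for ImputationReport; these fields are hypothetical
    field: str
    strategy: str
    value: Any


def _skewness(values: Sequence[float]) -> float:
    """Population skewness; 0.0 when it is undefined (too few or constant values)."""
    n = len(values)
    if n < 3:
        return 0.0
    mu = mean(values)
    var = sum((v - mu) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mu) ** 3 for v in values) / n / var ** 1.5


def choose_strategy(series: Sequence[Any], numeric: bool, skew_threshold: float = 1.0) -> str:
    """Mode for non-numeric data, median for skewed numeric data, mean otherwise."""
    if not numeric:
        return "mode"
    clean = [float(v) for v in series if v is not None]
    return "median" if abs(_skewness(clean)) > skew_threshold else "mean"


def compute_value(series: Sequence[Any], strategy: str) -> Any:
    """Compute the fill value from the non-null samples."""
    clean = [v for v in series if v is not None]
    if not clean:
        return None
    if strategy == "mean":
        return mean(clean)
    if strategy == "median":
        return median(clean)
    return mode(clean)


def fill_record(
    record: Mapping[str, Any],
    columns: Sequence[str],
    samples: Mapping[str, Sequence[Any]],
    strategies: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, Any], List[Report]]:
    out = dict(record)
    reports: List[Report] = []
    for col in columns:
        series = samples.get(col, [])
        strategy = (strategies or {}).get(col)
        if strategy is None:  # no explicit strategy: infer one from the samples
            clean = [v for v in series if v is not None]
            numeric = bool(clean) and all(
                isinstance(v, (int, float)) and not isinstance(v, bool) for v in clean
            )
            strategy = choose_strategy(series, numeric)
        value = compute_value(series, strategy)
        if out.get(col) is None:  # only fill fields that are actually missing
            out[col] = value
        reports.append(Report(col, strategy, value))
    return out, reports
```

An explicit entry in `strategies` bypasses inference for that column, which matches the first branch of the diagram.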

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Heterogeneous changes across four distinct utility modules with different purposes and logic densities requiring separate reasoning for each
  • Moderate complexity in impute.py — strategy selection logic, skewness calculation, and multi-column coordination warrant careful review
  • Edge case handling — division-by-zero safeguards, None propagation, and error modes in cast_numeric and numeric functions should be verified
  • No structural changes to existing code, purely additive, which reduces overall complexity
  • Comprehensive test coverage provides confidence but tests themselves need validation

Consider focusing extra attention on:

  • The numeric skewness calculation and strategy selection thresholds in impute.py — do the defaults align with intended use cases, wdyt?
  • Error handling consistency across modules, particularly how None is propagated vs. raising exceptions
  • The cast_numeric error modes ("default", "none", "raise", and implicit ignore) — is the behavior clear and complete, wdyt?
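
To make the question about error modes concrete, here is a minimal sketch of a `cast_numeric` with the four modes named above. It is inferred from the behaviour the PR's tests assert (the default mode returns the input unchanged) and is not the PR's actual implementation; the `Any` return type is an assumption chosen because the "ignore" mode can hand back a non-numeric value.

```python
from typing import Any, Optional, Union

Number = Union[int, float]


def cast_numeric(value: Any, on_error: str = "ignore", default: Optional[Number] = None) -> Any:
    """Cast value to int or float; on failure, behave according to on_error.

    on_error modes (as assumed here):
      "ignore"  -> return the original value unchanged
      "none"    -> return None
      "default" -> return the supplied default
      "raise"   -> re-raise the ValueError
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return value
    try:
        text = str(value).strip()
        try:
            return int(text)  # prefer int when the text is a whole number
        except ValueError:
            return float(text)  # otherwise fall back to float
    except ValueError:
        if on_error == "raise":
            raise
        if on_error == "none":
            return None
        if on_error == "default":
            return default
        return value  # "ignore": hand the input back untouched
```

Because the "ignore" branch can return a string, a return annotation of `int | float | None` would not describe every path, which is one way to read the comparison-overlap errors reported for the tests.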

Suggested labels

enhancement

Suggested reviewers

  • maxi297
  • brianjlai

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions 'trasformation function' (with a typo) and 'unit test cases', which broadly aligns with the PR's addition of multiple transformation utilities and comprehensive tests across math, cleaning, date, and imputation modules. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

🧹 Nitpick comments (4)
airbyte_cdk/utils/transforms/__init__.py (1)

17-29: Consider adding spaces after commas in __all__ for consistency?

The __all__ list items are missing spaces after commas. While this works fine, adding spaces would align with PEP 8 style conventions and improve readability. Wdyt?

Apply this diff if you'd like to improve the formatting:

 __all__ = [
     # math
-    "minmax_scale","zscore","clip","winsorize","log1p_safe",
-    "bucketize","robust_percentile_scale",
+    "minmax_scale", "zscore", "clip", "winsorize", "log1p_safe",
+    "bucketize", "robust_percentile_scale",
     # cleaning
-    "to_lower","strip_whitespace","squash_whitespace",
-    "normalize_unicode","remove_punctuation","map_values","cast_numeric",
+    "to_lower", "strip_whitespace", "squash_whitespace",
+    "normalize_unicode", "remove_punctuation", "map_values", "cast_numeric",
     # date
-    "try_parse_date","extract_date_parts","floor_to_month","ceil_to_month",
+    "try_parse_date", "extract_date_parts", "floor_to_month", "ceil_to_month",
     # impute
-    "ImputationReport","choose_imputation_strategy",
-    "compute_imputation_value","fill_nulls_column","fill_nulls_record",
+    "ImputationReport", "choose_imputation_strategy",
+    "compute_imputation_value", "fill_nulls_column", "fill_nulls_record",
 ]
airbyte_cdk/utils/transforms/math.py (3)

7-11: Consider potential floating-point precision issues with equality check?

Line 9 uses == to compare floats (data_max == data_min). While this usually works when the same values are passed in, floating-point arithmetic can sometimes lead to precision issues. Would using a small epsilon for comparison be more robust, or is exact equality the intended behavior here? Wdyt?
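
One way the epsilon idea could look, as a sketch only: the name `minmax_scale_tolerant`, its parameter list, and the choice to return the midpoint of the output range for a degenerate input range are assumptions, not the PR's code. It relies on the standard `math.isclose`.

```python
import math
from typing import Tuple, Union

Number = Union[int, float]


def minmax_scale_tolerant(
    x: Number,
    data_min: Number,
    data_max: Number,
    out_range: Tuple[Number, Number] = (0.0, 1.0),
    rel_tol: float = 1e-12,
    abs_tol: float = 0.0,
) -> float:
    """Min-max scale x, treating a near-zero input range as degenerate."""
    a, b = out_range
    lo, hi = float(data_min), float(data_max)
    if math.isclose(lo, hi, rel_tol=rel_tol, abs_tol=abs_tol):
        # Assumption: a degenerate range maps to the midpoint of out_range.
        return float(a + (b - a) / 2.0)
    return (float(x) - lo) / (hi - lo) * (b - a) + a
```

For example, `0.1 + 0.2` and `0.3` differ by one rounding step, so an exact `==` check would let the division proceed over a width of about 5.6e-17, while the tolerant check treats the range as empty.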


22-28: Document the error-handling behavior of log1p_safe?

The function returns the original value float(x) when an exception occurs (line 28). While this "safe" pattern prevents crashes, users might not expect this behavior. Adding a docstring to clarify when and why the original value is returned would help. Wdyt about adding documentation for this?
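
A docstring along these lines could make the fallback explicit. The body below is a sketch based on the behaviour described in this comment; catching only `ValueError` is an assumption, and the PR's function may catch a broader set of exceptions.

```python
import math
from typing import Union

Number = Union[int, float]


def log1p_safe(x: Number) -> float:
    """Return log(1 + x), falling back to the input when the log is undefined.

    math.log1p raises ValueError for x <= -1. In that case this function
    does not raise; it returns float(x) unchanged, so callers should be
    aware that out-of-domain values pass through untransformed.
    """
    try:
        return math.log1p(x)
    except ValueError:
        return float(x)
```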


36-50: Consider using a local variable instead of reassigning the parameter?

On line 46, the parameter x is reassigned when clip_outliers=True. While this works, it can make the code slightly harder to follow. Would you consider using a local variable like scaled_x to preserve the original parameter? This could improve clarity. Wdyt?

Example:

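# Assumes Number, Tuple, and clip are already imported/defined in math.py.
def robust_percentile_scale(
    x: Number,
    p_low_value: Number,
    p_high_value: Number,
    out_range: Tuple[Number, Number] = (0.0, 1.0),
    clip_outliers: bool = True,
) -> float:
    a, b = out_range
    lo, hi = float(p_low_value), float(p_high_value)
    # Leave the parameter x untouched; work on a local value instead.
    scaled_x = clip(float(x), lo, hi) if clip_outliers else float(x)
    width = hi - lo
    if width == 0:
        # Degenerate percentile range: return the midpoint of the output range.
        return float(a + (b - a) / 2.0)
    return ((scaled_x - lo) / width) * (b - a) + a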
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8764296 and f08ebad.

⛔ Files ignored due to path filters (1)
  • .DS_Store is excluded by !**/.DS_Store
📒 Files selected for processing (10)
  • .github/copilot-instructions.md (1 hunks)
  • airbyte_cdk/test/utils/transforms/test_cleaning.py (1 hunks)
  • airbyte_cdk/test/utils/transforms/test_date.py (1 hunks)
  • airbyte_cdk/test/utils/transforms/test_impute.py (1 hunks)
  • airbyte_cdk/test/utils/transforms/test_math.py (1 hunks)
  • airbyte_cdk/utils/transforms/__init__.py (1 hunks)
  • airbyte_cdk/utils/transforms/cleaning.py (1 hunks)
  • airbyte_cdk/utils/transforms/date.py (1 hunks)
  • airbyte_cdk/utils/transforms/impute.py (1 hunks)
  • airbyte_cdk/utils/transforms/math.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • .github/copilot-instructions.md
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • .github/copilot-instructions.md
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, ignore all `__init__.py` files when providing a recommended reviewing order.

Applied to files:

  • .github/copilot-instructions.md
🧬 Code graph analysis (5)
airbyte_cdk/test/utils/transforms/test_cleaning.py (1)
airbyte_cdk/utils/transforms/cleaning.py (7)
  • to_lower (7-8)
  • strip_whitespace (10-11)
  • squash_whitespace (13-16)
  • normalize_unicode (18-19)
  • remove_punctuation (22-25)
  • map_values (27-28)
  • cast_numeric (30-44)
airbyte_cdk/test/utils/transforms/test_date.py (1)
airbyte_cdk/utils/transforms/date.py (4)
  • try_parse_date (4-8)
  • extract_date_parts (10-14)
  • floor_to_month (16-20)
  • ceil_to_month (22-28)
airbyte_cdk/test/utils/transforms/test_impute.py (1)
airbyte_cdk/utils/transforms/impute.py (6)
  • _numeric_skewness (17-26)
  • choose_imputation_strategy (28-45)
  • compute_imputation_value (47-62)
  • fill_nulls_column (64-72)
  • fill_nulls_record (74-94)
  • ImputationReport (11-15)
airbyte_cdk/utils/transforms/__init__.py (4)
airbyte_cdk/utils/transforms/math.py (7)
  • minmax_scale (7-11)
  • zscore (13-14)
  • clip (16-17)
  • winsorize (19-20)
  • log1p_safe (22-28)
  • bucketize (30-34)
  • robust_percentile_scale (36-50)
airbyte_cdk/utils/transforms/cleaning.py (7)
  • to_lower (7-8)
  • strip_whitespace (10-11)
  • squash_whitespace (13-16)
  • normalize_unicode (18-19)
  • remove_punctuation (22-25)
  • map_values (27-28)
  • cast_numeric (30-44)
airbyte_cdk/utils/transforms/date.py (4)
  • try_parse_date (4-8)
  • extract_date_parts (10-14)
  • floor_to_month (16-20)
  • ceil_to_month (22-28)
airbyte_cdk/utils/transforms/impute.py (5)
  • ImputationReport (11-15)
  • choose_imputation_strategy (28-45)
  • compute_imputation_value (47-62)
  • fill_nulls_column (64-72)
  • fill_nulls_record (74-94)
airbyte_cdk/test/utils/transforms/test_math.py (1)
airbyte_cdk/utils/transforms/math.py (7)
  • minmax_scale (7-11)
  • zscore (13-14)
  • clip (16-17)
  • winsorize (19-20)
  • log1p_safe (22-28)
  • bucketize (30-34)
  • robust_percentile_scale (36-50)
🪛 GitHub Actions: Linters
airbyte_cdk/test/utils/transforms/test_cleaning.py

[error] 13-13: Function is missing a return type annotation [no-untyped-def]


[error] 13-13: Use "-> None" if function does not return a value


[error] 28-28: Function is missing a return type annotation [no-untyped-def]


[error] 28-28: Use "-> None" if function does not return a value


[error] 43-43: Function is missing a return type annotation [no-untyped-def]


[error] 43-43: Use "-> None" if function does not return a value


[error] 59-59: Function is missing a return type annotation [no-untyped-def]


[error] 59-59: Use "-> None" if function does not return a value


[error] 78-78: Function is missing a return type annotation [no-untyped-def]


[error] 78-78: Use "-> None" if function does not return a value


[error] 96-96: Function is missing a return type annotation [no-untyped-def]


[error] 96-96: Use "-> None" if function does not return a value


[error] 113-113: Function is missing a return type annotation [no-untyped-def]


[error] 113-113: Use "-> None" if function does not return a value


[error] 131-131: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal['']") [comparison-overlap]


[error] 132-132: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal[' ']") [comparison-overlap]


[error] 136-136: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "str") [comparison-overlap]

airbyte_cdk/test/utils/transforms/test_date.py

[error] 11-11: Function is missing a return type annotation [no-untyped-def]


[error] 11-11: Use "-> None" if function does not return a value


[error] 22-22: Function is missing a return type annotation [no-untyped-def]


[error] 22-22: Use "-> None" if function does not return a value


[error] 39-39: Function is missing a return type annotation [no-untyped-def]


[error] 39-39: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 46-46: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 50-50: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 53-53: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 54-54: Call to untyped function "floor_to_month" in typed context [no-untyped-call]

airbyte_cdk/test/utils/transforms/test_impute.py

[error] 12-12: Function is missing a return type annotation [no-untyped-def]


[error] 12-12: Use "-> None" if function does not return a value


[error] 26-26: Function is missing a return type annotation [no-untyped-def]


[error] 26-26: Use "-> None" if function does not return a value


[error] 68-68: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]


[error] 107-107: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]


[error] 115-115: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]


[error] 46-46: Function is missing a return type annotation [no-untyped-def]


[error] 46-46: Use "-> None" if function does not return a value

airbyte_cdk/utils/transforms/__init__.py

[error] 1-1: I001 Import block is un-sorted or un-formatted. Organize imports.


[error] 15-15: I001 Import block is un-sorted or un-formatted. Organize imports.

airbyte_cdk/utils/transforms/date.py

[error] 4-4: Function is missing a return type annotation [no-untyped-def]


[error] 10-10: Function is missing a type annotation for one or more arguments [no-untyped-def]


[error] 16-16: Function is missing a type annotation [no-untyped-def]


[error] 22-22: Function is missing a type annotation [no-untyped-def]
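
As an illustration of what fully annotated versions of two of these date.py helpers might look like, here is a sketch. The `DateLike` alias, the None pass-through, and the choice to return a plain `date` are assumptions; the PR's functions may accept or return different types.

```python
from datetime import date, datetime
from typing import Optional, Union

DateLike = Union[date, datetime]


def floor_to_month(d: Optional[DateLike]) -> Optional[date]:
    """First day of the month that d falls in (None passes through)."""
    if d is None:
        return None
    return date(d.year, d.month, 1)


def ceil_to_month(d: Optional[DateLike]) -> Optional[date]:
    """First day of the month after the one d falls in (None passes through)."""
    if d is None:
        return None
    if d.month == 12:
        # December rolls over into January of the next year.
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)
```

With both parameters and return values annotated, the no-untyped-def and no-untyped-call errors listed for date.py and test_date.py would no longer apply to these two functions.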

airbyte_cdk/test/utils/transforms/test_math.py

[error] 14-14: Function is missing a return type annotation [no-untyped-def]


[error] 14-14: Use "-> None" if function does not return a value


[error] 33-33: Function is missing a return type annotation [no-untyped-def]


[error] 33-33: Use "-> None" if function does not return a value


[error] 45-45: Function is missing a return type annotation [no-untyped-def]


[error] 45-45: Use "-> None" if function does not return a value


[error] 59-59: Function is missing a return type annotation [no-untyped-def]


[error] 59-59: Use "-> None" if function does not return a value


[error] 72-72: Function is missing a return type annotation [no-untyped-def]


[error] 72-72: Use "-> None" if function does not return a value


[error] 87-87: Function is missing a return type annotation [no-untyped-def]


[error] 87-87: Use "-> None" if function does not return a value


[error] 111-111: Function is missing a return type annotation [no-untyped-def]


[error] 111-111: Use "-> None" if function does not return a value

airbyte_cdk/utils/transforms/impute.py

[error] 64-64: Function is missing a type annotation for one or more arguments [no-untyped-def]

airbyte_cdk/utils/transforms/math.py

[error] 3-3: I001 Import block is un-sorted or un-formatted. Organize imports.

airbyte_cdk/utils/transforms/cleaning.py

[error] 19-19: Argument 1 to "normalize" has incompatible type "str"; expected "Literal['NFC', 'NFD', 'NFKC', 'NFKD']" [arg-type]


[error] 44-44: Returning Any from function declared to return "int | float | None" [no-any-return]
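
The first of these errors comes from passing a plain `str` where the type stubs for `unicodedata.normalize` expect one of four literal form names. A possible fix, sketched under the assumption that the function takes a `form` keyword defaulting to "NFC" (as the tests suggest), is to narrow the parameter type with `Literal`; the alias name `NormalizationForm` is hypothetical.

```python
import unicodedata
from typing import Literal, Optional

# The four normalization forms accepted by unicodedata.normalize.
NormalizationForm = Literal["NFC", "NFD", "NFKC", "NFKD"]


def normalize_unicode(s: Optional[str], form: NormalizationForm = "NFC") -> Optional[str]:
    """Normalize s to the given Unicode form; None passes through unchanged."""
    if s is None:
        return None
    return unicodedata.normalize(form, s)
```

Callers that pass a literal such as `form="NFD"` type-check as before, while an arbitrary string variable is rejected by the type checker instead of failing at runtime.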

🪛 LanguageTool
.github/copilot-instructions.md

[uncategorized] ~89-~89: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...rnal APIs: Use HttpStream with proper rate limiting - Vector DBs: Implement destination log...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
.github/copilot-instructions.md (1)

1-97: LGTM! Comprehensive development guide.

This AI development guide is well-structured and provides valuable context for AI agents working with the codebase. The sections cover key architectural concepts, testing patterns, and common workflows effectively.

airbyte_cdk/utils/transforms/math.py (1)

7-50: All function implementations look solid!

The logic across all seven transformation functions is correct and handles edge cases well (zero-width ranges, division by zero, invalid log inputs). The implementations are clean and the type hints are helpful. Nice work on this utility module!

Comment on lines +13 to +142
def test_to_lower():
    """Test string lowercasing function."""
    # Test normal cases
    assert to_lower("Hello") == "hello"
    assert to_lower("HELLO") == "hello"
    assert to_lower("HeLLo") == "hello"

    # Test with spaces and special characters
    assert to_lower("Hello World!") == "hello world!"
    assert to_lower("Hello123") == "hello123"

    # Test empty and None
    assert to_lower("") == ""
    assert to_lower(None) is None

def test_strip_whitespace():
    """Test whitespace stripping function."""
    # Test normal cases
    assert strip_whitespace(" hello ") == "hello"
    assert strip_whitespace("hello") == "hello"

    # Test with tabs and newlines
    assert strip_whitespace("\thello\n") == "hello"
    assert strip_whitespace(" hello\n world ") == "hello\n world"

    # Test empty and None
    assert strip_whitespace(" ") == ""
    assert strip_whitespace("") == ""
    assert strip_whitespace(None) is None

def test_squash_whitespace():
    """Test whitespace squashing function."""
    # Test normal cases
    assert squash_whitespace("hello world") == "hello world"
    assert squash_whitespace(" hello world ") == "hello world"

    # Test with tabs and newlines
    assert squash_whitespace("hello\n\nworld") == "hello world"
    assert squash_whitespace("hello\t\tworld") == "hello world"
    assert squash_whitespace("\n hello \t world \n") == "hello world"

    # Test empty and None
    assert squash_whitespace(" ") == ""
    assert squash_whitespace("") == ""
    assert squash_whitespace(None) is None

def test_normalize_unicode():
    """Test unicode normalization function."""
    # Test normal cases
    assert normalize_unicode("hello") == "hello"

    # Test composed characters
    assert normalize_unicode("café") == "café"  # Composed 'é'

    # Test decomposed characters
    decomposed = "cafe\u0301"  # 'e' with combining acute accent
    assert normalize_unicode(decomposed) == "café"  # Should normalize to composed form

    # Test different normalization forms
    assert normalize_unicode("café", form="NFD") != normalize_unicode("café", form="NFC")

    # Test empty and None
    assert normalize_unicode("") == ""
    assert normalize_unicode(None) is None

def test_remove_punctuation():
    """Test punctuation removal function."""
    # Test normal cases
    assert remove_punctuation("hello, world!") == "hello world"
    assert remove_punctuation("hello.world") == "helloworld"

    # Test with multiple punctuation marks
    assert remove_punctuation("hello!!! world???") == "hello world"
    assert remove_punctuation("hello@#$%world") == "helloworld"

    # Test with unicode punctuation
    assert remove_punctuation("hello—world") == "helloworld"
    assert remove_punctuation("«hello»") == "hello"

    # Test empty and None
    assert remove_punctuation("") == ""
    assert remove_punctuation(None) is None

def test_map_values():
    """Test value mapping function."""
    mapping = {"a": 1, "b": 2, "c": 3}

    # Test normal cases
    assert map_values("a", mapping) == 1
    assert map_values("b", mapping) == 2

    # Test with default value
    assert map_values("x", mapping) is None
    assert map_values("x", mapping, default=0) == 0

    # Test with different value types
    mixed_mapping = {1: "one", "two": 2, None: "null"}
    assert map_values(1, mixed_mapping) == "one"
    assert map_values(None, mixed_mapping) == "null"

def test_cast_numeric():
    """Test numeric casting function."""
    # Test successful casts
    assert cast_numeric("123") == 123
    assert cast_numeric("123.45") == 123.45
    assert cast_numeric(123) == 123
    assert cast_numeric(123.45) == 123.45

    # Test integers vs floats
    assert isinstance(cast_numeric("123"), int)
    assert isinstance(cast_numeric("123.45"), float)

    # Test empty values
    assert cast_numeric(None) is None
    assert cast_numeric("", on_error="none") is None  # Need to specify on_error="none" to get None for empty string
    assert cast_numeric(" ", on_error="none") is None  # Need to specify on_error="none" to get None for whitespace

    # Test empty values with default behavior (on_error="ignore")
    assert cast_numeric("") == ""
    assert cast_numeric(" ") == " "

    # Test error handling modes
    non_numeric = "abc"
    assert cast_numeric(non_numeric, on_error="ignore") == non_numeric
    assert cast_numeric(non_numeric, on_error="none") is None
    assert cast_numeric(non_numeric, on_error="default", default=0) == 0

    # Test error raising
    with pytest.raises(Exception):
        cast_numeric(non_numeric, on_error="raise")
No newline at end of file

⚠️ Potential issue | 🟠 Major

Add return type annotations to test functions.

All test functions are missing -> None return type annotations. Would you consider adding these? Wdyt?

-def test_to_lower():
+def test_to_lower() -> None:
     """Test string lowercasing function."""

-def test_strip_whitespace():
+def test_strip_whitespace() -> None:
     """Test whitespace stripping function."""

-def test_squash_whitespace():
+def test_squash_whitespace() -> None:
     """Test whitespace squashing function."""

-def test_normalize_unicode():
+def test_normalize_unicode() -> None:
     """Test unicode normalization function."""

-def test_remove_punctuation():
+def test_remove_punctuation() -> None:
     """Test punctuation removal function."""

-def test_map_values():
+def test_map_values() -> None:
     """Test value mapping function."""

-def test_cast_numeric():
+def test_cast_numeric() -> None:
     """Test numeric casting function."""

Note: The comparison-overlap errors on lines 131-132, 136 will be resolved once the return type issue in cast_numeric (flagged in cleaning.py) is fixed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_to_lower() -> None:
"""Test string lowercasing function."""
# Test normal cases
assert to_lower("Hello") == "hello"
assert to_lower("HELLO") == "hello"
assert to_lower("HeLLo") == "hello"
# Test with spaces and special characters
assert to_lower("Hello World!") == "hello world!"
assert to_lower("Hello123") == "hello123"
# Test empty and None
assert to_lower("") == ""
assert to_lower(None) is None
def test_strip_whitespace() -> None:
"""Test whitespace stripping function."""
# Test normal cases
assert strip_whitespace(" hello ") == "hello"
assert strip_whitespace("hello") == "hello"
# Test with tabs and newlines
assert strip_whitespace("\thello\n") == "hello"
assert strip_whitespace(" hello\n world ") == "hello\n world"
# Test empty and None
assert strip_whitespace(" ") == ""
assert strip_whitespace("") == ""
assert strip_whitespace(None) is None
def test_squash_whitespace() -> None:
"""Test whitespace squashing function."""
# Test normal cases
assert squash_whitespace("hello world") == "hello world"
assert squash_whitespace(" hello world ") == "hello world"
# Test with tabs and newlines
assert squash_whitespace("hello\n\nworld") == "hello world"
assert squash_whitespace("hello\t\tworld") == "hello world"
assert squash_whitespace("\n hello \t world \n") == "hello world"
# Test empty and None
assert squash_whitespace(" ") == ""
assert squash_whitespace("") == ""
assert squash_whitespace(None) is None
def test_normalize_unicode() -> None:
"""Test unicode normalization function."""
# Test normal cases
assert normalize_unicode("hello") == "hello"
# Test composed characters
assert normalize_unicode("café") == "café" # Composed 'é'
# Test decomposed characters
decomposed = "cafe\u0301" # 'e' with combining acute accent
assert normalize_unicode(decomposed) == "café" # Should normalize to composed form
# Test different normalization forms
assert normalize_unicode("café", form="NFD") != normalize_unicode("café", form="NFC")
# Test empty and None
assert normalize_unicode("") == ""
assert normalize_unicode(None) is None
def test_remove_punctuation() -> None:
"""Test punctuation removal function."""
# Test normal cases
assert remove_punctuation("hello, world!") == "hello world"
assert remove_punctuation("hello.world") == "helloworld"
# Test with multiple punctuation marks
assert remove_punctuation("hello!!! world???") == "hello world"
assert remove_punctuation("hello@#$%world") == "helloworld"
# Test with unicode punctuation
assert remove_punctuation("hello—world") == "helloworld"
assert remove_punctuation("«hello»") == "hello"
# Test empty and None
assert remove_punctuation("") == ""
assert remove_punctuation(None) is None
def test_map_values() -> None:
"""Test value mapping function."""
mapping = {"a": 1, "b": 2, "c": 3}
# Test normal cases
assert map_values("a", mapping) == 1
assert map_values("b", mapping) == 2
# Test with default value
assert map_values("x", mapping) is None
assert map_values("x", mapping, default=0) == 0
# Test with different value types
mixed_mapping = {1: "one", "two": 2, None: "null"}
assert map_values(1, mixed_mapping) == "one"
assert map_values(None, mixed_mapping) == "null"
def test_cast_numeric() -> None:
"""Test numeric casting function."""
# Test successful casts
assert cast_numeric("123") == 123
assert cast_numeric("123.45") == 123.45
assert cast_numeric(123) == 123
assert cast_numeric(123.45) == 123.45
# Test integers vs floats
assert isinstance(cast_numeric("123"), int)
assert isinstance(cast_numeric("123.45"), float)
# Test empty values
assert cast_numeric(None) is None
assert cast_numeric("", on_error="none") is None # Need to specify on_error="none" to get None for empty string
assert cast_numeric(" ", on_error="none") is None # Need to specify on_error="none" to get None for whitespace
# Test empty values with default behavior (on_error="ignore")
assert cast_numeric("") == ""
assert cast_numeric(" ") == " "
# Test error handling modes
non_numeric = "abc"
assert cast_numeric(non_numeric, on_error="ignore") == non_numeric
assert cast_numeric(non_numeric, on_error="none") is None
assert cast_numeric(non_numeric, on_error="default", default=0) == 0
# Test error raising
with pytest.raises(Exception):
cast_numeric(non_numeric, on_error="raise")
🧰 Tools
🪛 GitHub Actions: Linters

[error] 13-13: Function is missing a return type annotation [no-untyped-def]


[error] 13-13: Use "-> None" if function does not return a value


[error] 28-28: Function is missing a return type annotation [no-untyped-def]


[error] 28-28: Use "-> None" if function does not return a value


[error] 43-43: Function is missing a return type annotation [no-untyped-def]


[error] 43-43: Use "-> None" if function does not return a value


[error] 59-59: Function is missing a return type annotation [no-untyped-def]


[error] 59-59: Use "-> None" if function does not return a value


[error] 78-78: Function is missing a return type annotation [no-untyped-def]


[error] 78-78: Use "-> None" if function does not return a value


[error] 96-96: Function is missing a return type annotation [no-untyped-def]


[error] 96-96: Use "-> None" if function does not return a value


[error] 113-113: Function is missing a return type annotation [no-untyped-def]


[error] 113-113: Use "-> None" if function does not return a value


[error] 131-131: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal['']") [comparison-overlap]


[error] 132-132: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "Literal[' ']") [comparison-overlap]


[error] 136-136: Non-overlapping equality check (left operand type: "int | float | None", right operand type: "str") [comparison-overlap]

🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_cleaning.py around lines 13-142 the
test functions (test_to_lower, test_strip_whitespace, test_squash_whitespace,
test_normalize_unicode, test_remove_punctuation, test_map_values,
test_cast_numeric) are missing explicit return type annotations; update each
function definition to include "-> None" (e.g., def test_to_lower() -> None:) so
all tests have explicit return types, then run the test suite to ensure no
further type-related failures.
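
The linter output above also lists three `comparison-overlap` errors (lines 131, 132, and 136) that adding `-> None` would not address. mypy reports this error code under strict equality checking when the two operand types cannot overlap: here the result of `cast_numeric` is declared as `int | float | None`, while the tests compare it with strings. The following is a minimal sketch of one possible resolution, assuming the function is meant to return its input unchanged when `on_error="ignore"`: widen the declared return type. The name `cast_numeric_sketch` and its body are hypothetical, inferred only from the behavior the quoted tests assert, and are not the PR's actual implementation.

```python
from typing import Any


def cast_numeric_sketch(value: Any, on_error: str = "ignore", default: Any = None) -> Any:
    """Hypothetical stand-in for cast_numeric, inferred from the quoted tests.

    The return type is Any because on_error="ignore" hands back the original
    value, which may be a string or any other type.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return value  # already numeric, nothing to cast
    try:
        text = str(value).strip()
        try:
            return int(text)  # prefer int so "123" stays an integer
        except ValueError:
            return float(text)  # fall back to float for "123.45"
    except ValueError as exc:
        if on_error == "ignore":
            return value  # pass the original value through unchanged
        if on_error == "none":
            return None
        if on_error == "default":
            return default
        # "raise" (and, in this sketch, any unrecognized mode) re-raises
        raise ValueError(f"Cannot cast {value!r} to a number") from exc
```

With a return type that admits strings, assertions such as `cast_numeric("") == ""` no longer compare non-overlapping types. An alternative, if the narrower return type is intentional, would be to change the tests rather than the signature.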

Comment on lines +11 to +72
def test_try_parse_date():
"""Test date parsing function."""
# Test with datetime object
dt = datetime(2023, 1, 15)
assert try_parse_date(dt) == dt

# Test with non-date object
assert try_parse_date("2023-01-15") is None
assert try_parse_date(123) is None
assert try_parse_date(None) is None

def test_extract_date_parts():
"""Test date parts extraction function."""
# Test with valid datetime
dt = datetime(2023, 1, 15) # Sunday
parts = extract_date_parts(dt)
assert parts["year"] == 2023
assert parts["month"] == 1
assert parts["day"] == 15
assert parts["dow"] == 6 # Sunday is 6

# Test with invalid input
parts = extract_date_parts(None)
assert all(v is None for v in parts.values())

parts = extract_date_parts("not a date")
assert all(v is None for v in parts.values())

def test_floor_to_month():
"""Test floor to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert floor_to_month(dt) == datetime(2023, 1, 1)

dt = datetime(2023, 12, 31)
assert floor_to_month(dt) == datetime(2023, 12, 1)

# Test first day of month
dt = datetime(2023, 1, 1)
assert floor_to_month(dt) == dt

# Test with invalid input
assert floor_to_month(None) is None
assert floor_to_month("not a date") is None

def test_ceil_to_month():
"""Test ceil to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert ceil_to_month(dt) == datetime(2023, 2, 1)

# Test end of year
dt = datetime(2023, 12, 15)
assert ceil_to_month(dt) == datetime(2024, 1, 1)

# Test first day of month
dt = datetime(2023, 1, 1)
assert ceil_to_month(dt) == datetime(2023, 2, 1)

# Test with invalid input
assert ceil_to_month(None) is None
assert ceil_to_month("not a date") is None
\ No newline at end of file

⚠️ Potential issue | 🟠 Major

Add return type annotations to test functions.

All test functions are missing -> None return type annotations. Would you mind adding these to satisfy the type checker? Wdyt?

-def test_try_parse_date():
+def test_try_parse_date() -> None:
     """Test date parsing function."""

-def test_extract_date_parts():
+def test_extract_date_parts() -> None:
     """Test date parts extraction function."""

-def test_floor_to_month():
+def test_floor_to_month() -> None:
     """Test floor to month function."""

-def test_ceil_to_month():
+def test_ceil_to_month() -> None:
     """Test ceil to month function."""

Note: The "Call to untyped function" errors will be resolved once the functions in date.py have proper type annotations.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_try_parse_date():
"""Test date parsing function."""
# Test with datetime object
dt = datetime(2023, 1, 15)
assert try_parse_date(dt) == dt
# Test with non-date object
assert try_parse_date("2023-01-15") is None
assert try_parse_date(123) is None
assert try_parse_date(None) is None
def test_extract_date_parts():
"""Test date parts extraction function."""
# Test with valid datetime
dt = datetime(2023, 1, 15) # Sunday
parts = extract_date_parts(dt)
assert parts["year"] == 2023
assert parts["month"] == 1
assert parts["day"] == 15
assert parts["dow"] == 6 # Sunday is 6
# Test with invalid input
parts = extract_date_parts(None)
assert all(v is None for v in parts.values())
parts = extract_date_parts("not a date")
assert all(v is None for v in parts.values())
def test_floor_to_month():
"""Test floor to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert floor_to_month(dt) == datetime(2023, 1, 1)
dt = datetime(2023, 12, 31)
assert floor_to_month(dt) == datetime(2023, 12, 1)
# Test first day of month
dt = datetime(2023, 1, 1)
assert floor_to_month(dt) == dt
# Test with invalid input
assert floor_to_month(None) is None
assert floor_to_month("not a date") is None
def test_ceil_to_month():
"""Test ceil to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert ceil_to_month(dt) == datetime(2023, 2, 1)
# Test end of year
dt = datetime(2023, 12, 15)
assert ceil_to_month(dt) == datetime(2024, 1, 1)
# Test first day of month
dt = datetime(2023, 1, 1)
assert ceil_to_month(dt) == datetime(2023, 2, 1)
# Test with invalid input
assert ceil_to_month(None) is None
assert ceil_to_month("not a date") is None
def test_try_parse_date() -> None:
"""Test date parsing function."""
# Test with datetime object
dt = datetime(2023, 1, 15)
assert try_parse_date(dt) == dt
# Test with non-date object
assert try_parse_date("2023-01-15") is None
assert try_parse_date(123) is None
assert try_parse_date(None) is None
def test_extract_date_parts() -> None:
"""Test date parts extraction function."""
# Test with valid datetime
dt = datetime(2023, 1, 15) # Sunday
parts = extract_date_parts(dt)
assert parts["year"] == 2023
assert parts["month"] == 1
assert parts["day"] == 15
assert parts["dow"] == 6 # Sunday is 6
# Test with invalid input
parts = extract_date_parts(None)
assert all(v is None for v in parts.values())
parts = extract_date_parts("not a date")
assert all(v is None for v in parts.values())
def test_floor_to_month() -> None:
"""Test floor to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert floor_to_month(dt) == datetime(2023, 1, 1)
dt = datetime(2023, 12, 31)
assert floor_to_month(dt) == datetime(2023, 12, 1)
# Test first day of month
dt = datetime(2023, 1, 1)
assert floor_to_month(dt) == dt
# Test with invalid input
assert floor_to_month(None) is None
assert floor_to_month("not a date") is None
def test_ceil_to_month() -> None:
"""Test ceil to month function."""
# Test normal cases
dt = datetime(2023, 1, 15)
assert ceil_to_month(dt) == datetime(2023, 2, 1)
# Test end of year
dt = datetime(2023, 12, 15)
assert ceil_to_month(dt) == datetime(2024, 1, 1)
# Test first day of month
dt = datetime(2023, 1, 1)
assert ceil_to_month(dt) == datetime(2023, 2, 1)
# Test with invalid input
assert ceil_to_month(None) is None
assert ceil_to_month("not a date") is None
🧰 Tools
🪛 GitHub Actions: Linters

[error] 11-11: Function is missing a return type annotation [no-untyped-def]


[error] 11-11: Use "-> None" if function does not return a value


[error] 22-22: Function is missing a return type annotation [no-untyped-def]


[error] 22-22: Use "-> None" if function does not return a value


[error] 39-39: Function is missing a return type annotation [no-untyped-def]


[error] 39-39: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 46-46: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 50-50: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 53-53: Call to untyped function "floor_to_month" in typed context [no-untyped-call]


[error] 54-54: Call to untyped function "floor_to_month" in typed context [no-untyped-call]

🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_date.py around lines 11 to 72, the
test functions lack return type annotations; update each test function
definition (test_try_parse_date, test_extract_date_parts, test_floor_to_month,
test_ceil_to_month) to include an explicit "-> None" return type (e.g., def
test_try_parse_date() -> None:) so the type checker sees them as properly
annotated tests.
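
The `no-untyped-call` errors in the linter output come from the helpers in `date.py` rather than from the tests. A minimal sketch of what annotated helpers might look like, assuming only the behavior the quoted tests assert (non-`datetime` input yields `None`, and the ceiling always moves to the first day of the next month). The names `floor_to_month_sketch` and `ceil_to_month_sketch` are hypothetical and are not the PR's actual code.

```python
from datetime import datetime
from typing import Any, Optional


def floor_to_month_sketch(value: Any) -> Optional[datetime]:
    """Return the first day of value's month, or None for non-datetime input."""
    if not isinstance(value, datetime):
        return None
    return datetime(value.year, value.month, 1, tzinfo=value.tzinfo)


def ceil_to_month_sketch(value: Any) -> Optional[datetime]:
    """Return the first day of the month after value's month, or None for non-datetime input."""
    if not isinstance(value, datetime):
        return None
    if value.month == 12:
        # December rolls over into January of the following year
        year, month = value.year + 1, 1
    else:
        year, month = value.year, value.month + 1
    return datetime(year, month, 1, tzinfo=value.tzinfo)
```

Once the real functions carry parameter and return annotations like these, typed test functions can call them without triggering `no-untyped-call`.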

Comment on lines +12 to +67
def test_numeric_skewness():
"""Test skewness calculation function."""
# Test normal cases
assert _numeric_skewness([1, 2, 3]) == pytest.approx(0.0, abs=1e-10) # Symmetric data
assert _numeric_skewness([1, 1, 2]) > 0 # Positive skew
assert _numeric_skewness([1, 2, 2]) < 0 # Negative skew

# Test edge cases
assert _numeric_skewness([1, 1]) == 0.0 # Less than 3 values
assert _numeric_skewness([1, 1, 1]) == 0.0 # No variance

# Test with floating point values
assert _numeric_skewness([1.0, 2.0, 3.0]) == pytest.approx(0.0, abs=1e-10)

def test_choose_imputation_strategy():
"""Test imputation strategy selection function."""
# Test numeric data
assert choose_imputation_strategy([1, 2, 3]) == "mean" # Low skew
assert choose_imputation_strategy([1, 1, 10]) == "median" # High skew

# Test categorical data
assert choose_imputation_strategy(["a", "b", "c"], numeric=False) == "mode"
assert choose_imputation_strategy(["a", "a", "b"]) == "mode" # Autodetect non-numeric

# Test repeated values with custom threshold
assert choose_imputation_strategy([1, 1, 1, 2], unique_ratio_threshold=0.6) == "mode" # Low unique ratio (0.5 < 0.6)

# Test empty and None values
assert choose_imputation_strategy([]) == "mode"
assert choose_imputation_strategy([None, None]) == "mode"

# Test with mixed types
assert choose_imputation_strategy([1, "2", 3]) == "mode" # Non-numeric detected

def test_compute_imputation_value():
"""Test imputation value computation function."""
# Test mean strategy
assert compute_imputation_value([1, 2, 3], "mean") == 2.0
assert compute_imputation_value([1.5, 2.5, 3.5], "mean") == 2.5

# Test median strategy
assert compute_imputation_value([1, 2, 3, 4], "median") == 2.5
assert compute_imputation_value([1, 2, 3], "median") == 2.0

# Test mode strategy
assert compute_imputation_value([1, 1, 2], "mode") == 1
assert compute_imputation_value(["a", "a", "b"], "mode") == "a"

# Test with None values
assert compute_imputation_value([1, None, 3], "mean") == 2.0
assert compute_imputation_value([None, None], "mean") is None

# Test invalid strategy
with pytest.raises(ValueError):
compute_imputation_value([1, 2, 3], "invalid")

⚠️ Potential issue | 🟠 Major

Add return type annotations to test functions.

Test functions test_numeric_skewness, test_choose_imputation_strategy, and test_compute_imputation_value are missing -> None return type annotations. Would you mind adding these? Wdyt?

-def test_numeric_skewness():
+def test_numeric_skewness() -> None:
     """Test skewness calculation function."""

-def test_choose_imputation_strategy():
+def test_choose_imputation_strategy() -> None:
     """Test imputation strategy selection function."""

-def test_compute_imputation_value():
+def test_compute_imputation_value() -> None:
     """Test imputation value computation function."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_numeric_skewness():
"""Test skewness calculation function."""
# Test normal cases
assert _numeric_skewness([1, 2, 3]) == pytest.approx(0.0, abs=1e-10) # Symmetric data
assert _numeric_skewness([1, 1, 2]) > 0 # Positive skew
assert _numeric_skewness([1, 2, 2]) < 0 # Negative skew
# Test edge cases
assert _numeric_skewness([1, 1]) == 0.0 # Less than 3 values
assert _numeric_skewness([1, 1, 1]) == 0.0 # No variance
# Test with floating point values
assert _numeric_skewness([1.0, 2.0, 3.0]) == pytest.approx(0.0, abs=1e-10)
def test_choose_imputation_strategy():
"""Test imputation strategy selection function."""
# Test numeric data
assert choose_imputation_strategy([1, 2, 3]) == "mean" # Low skew
assert choose_imputation_strategy([1, 1, 10]) == "median" # High skew
# Test categorical data
assert choose_imputation_strategy(["a", "b", "c"], numeric=False) == "mode"
assert choose_imputation_strategy(["a", "a", "b"]) == "mode" # Autodetect non-numeric
# Test repeated values with custom threshold
assert choose_imputation_strategy([1, 1, 1, 2], unique_ratio_threshold=0.6) == "mode" # Low unique ratio (0.5 < 0.6)
# Test empty and None values
assert choose_imputation_strategy([]) == "mode"
assert choose_imputation_strategy([None, None]) == "mode"
# Test with mixed types
assert choose_imputation_strategy([1, "2", 3]) == "mode" # Non-numeric detected
def test_compute_imputation_value():
"""Test imputation value computation function."""
# Test mean strategy
assert compute_imputation_value([1, 2, 3], "mean") == 2.0
assert compute_imputation_value([1.5, 2.5, 3.5], "mean") == 2.5
# Test median strategy
assert compute_imputation_value([1, 2, 3, 4], "median") == 2.5
assert compute_imputation_value([1, 2, 3], "median") == 2.0
# Test mode strategy
assert compute_imputation_value([1, 1, 2], "mode") == 1
assert compute_imputation_value(["a", "a", "b"], "mode") == "a"
# Test with None values
assert compute_imputation_value([1, None, 3], "mean") == 2.0
assert compute_imputation_value([None, None], "mean") is None
# Test invalid strategy
with pytest.raises(ValueError):
compute_imputation_value([1, 2, 3], "invalid")
def test_numeric_skewness() -> None:
"""Test skewness calculation function."""
# Test normal cases
assert _numeric_skewness([1, 2, 3]) == pytest.approx(0.0, abs=1e-10) # Symmetric data
assert _numeric_skewness([1, 1, 2]) > 0 # Positive skew
assert _numeric_skewness([1, 2, 2]) < 0 # Negative skew
# Test edge cases
assert _numeric_skewness([1, 1]) == 0.0 # Less than 3 values
assert _numeric_skewness([1, 1, 1]) == 0.0 # No variance
# Test with floating point values
assert _numeric_skewness([1.0, 2.0, 3.0]) == pytest.approx(0.0, abs=1e-10)
def test_choose_imputation_strategy() -> None:
"""Test imputation strategy selection function."""
# Test numeric data
assert choose_imputation_strategy([1, 2, 3]) == "mean" # Low skew
assert choose_imputation_strategy([1, 1, 10]) == "median" # High skew
# Test categorical data
assert choose_imputation_strategy(["a", "b", "c"], numeric=False) == "mode"
assert choose_imputation_strategy(["a", "a", "b"]) == "mode" # Autodetect non-numeric
# Test repeated values with custom threshold
assert choose_imputation_strategy([1, 1, 1, 2], unique_ratio_threshold=0.6) == "mode" # Low unique ratio (0.5 < 0.6)
# Test empty and None values
assert choose_imputation_strategy([]) == "mode"
assert choose_imputation_strategy([None, None]) == "mode"
# Test with mixed types
assert choose_imputation_strategy([1, "2", 3]) == "mode" # Non-numeric detected
def test_compute_imputation_value() -> None:
"""Test imputation value computation function."""
# Test mean strategy
assert compute_imputation_value([1, 2, 3], "mean") == 2.0
assert compute_imputation_value([1.5, 2.5, 3.5], "mean") == 2.5
# Test median strategy
assert compute_imputation_value([1, 2, 3, 4], "median") == 2.5
assert compute_imputation_value([1, 2, 3], "median") == 2.0
# Test mode strategy
assert compute_imputation_value([1, 1, 2], "mode") == 1
assert compute_imputation_value(["a", "a", "b"], "mode") == "a"
# Test with None values
assert compute_imputation_value([1, None, 3], "mean") == 2.0
assert compute_imputation_value([None, None], "mean") is None
# Test invalid strategy
with pytest.raises(ValueError):
compute_imputation_value([1, 2, 3], "invalid")
🧰 Tools
🪛 GitHub Actions: Linters

[error] 12-12: Function is missing a return type annotation [no-untyped-def]


[error] 12-12: Use "-> None" if function does not return a value


[error] 26-26: Function is missing a return type annotation [no-untyped-def]


[error] 26-26: Use "-> None" if function does not return a value


[error] 46-46: Function is missing a return type annotation [no-untyped-def]


[error] 46-46: Use "-> None" if function does not return a value

🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_impute.py around lines 12 to 67, the
three test functions lack explicit return type annotations; update each function
definition to include "-> None" (i.e., change "def test_numeric_skewness():",
"def test_choose_imputation_strategy():", and "def
test_compute_imputation_value():" to "def test_numeric_skewness() -> None:",
"def test_choose_imputation_strategy() -> None:", and "def
test_compute_imputation_value() -> None:" respectively), no other changes
required.
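
For context on what the skewness tests exercise: the quoted assertions (zero for symmetric data, positive for `[1, 1, 2]`, negative for `[1, 2, 2]`, and `0.0` for fewer than three values or zero variance) are consistent with the standard moment-based skewness coefficient. A minimal sketch under that assumption follows. `numeric_skewness_sketch` is a hypothetical name, and this comment does not show whether the PR uses this exact formula.

```python
from typing import Sequence


def numeric_skewness_sketch(values: Sequence[float]) -> float:
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    if n < 3:
        return 0.0  # too few points to measure asymmetry
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    if m2 == 0:
        return 0.0  # all values identical, avoid dividing by zero
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5
```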

Comment on lines +68 to +117
def test_fill_nulls_column():
"""Test column null filling function."""
# Test numeric data
values, report = fill_nulls_column([1, None, 3])
assert values == [1, 2.0, 3]
assert report.strategy == "mean"
assert report.value_used == 2.0

# Test categorical data
values, report = fill_nulls_column(["a", None, "a"])
assert values == ["a", "a", "a"]
assert report.strategy == "mode"
assert report.value_used == "a"

# Test explicit strategy
values, report = fill_nulls_column([1, None, 3], explicit_strategy="median")
assert values == [1, 2, 3]
assert report.strategy == "median"

# Test all None values
values, report = fill_nulls_column([None, None])
assert values == [None, None]
assert report.value_used is None

def test_fill_nulls_record():
"""Test record null filling function."""
# Test basic record filling
record = {"a": 1, "b": None, "c": "x"}
samples = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]}
filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples)

assert filled["a"] == 1
assert filled["b"] == 5.0 # Mean of samples
assert filled["c"] == "x"
assert len(reports) == 3
assert all(isinstance(r, ImputationReport) for r in reports)

# Test with explicit strategies
strategies = {"b": "median"}
filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples, strategies=strategies)
assert filled["b"] == 5.0 # Median of samples

# Test with empty samples
filled, reports = fill_nulls_record(record, ["a", "b", "c"], {})
assert filled["b"] is None # No samples to impute from

# Test with missing columns
filled, reports = fill_nulls_record(record, ["a", "d"], samples)
assert "d" in filled
assert len(reports) == 2
\ No newline at end of file

⚠️ Potential issue | 🟠 Major

Add return type annotations and fix type compatibility for samples dict.

Two issues here:

  1. Test functions missing -> None return type annotations
  2. The samples dict has type inference issues - the type checker sees dict[str, object] but expects Mapping[str, Sequence[Any]]

Would you consider adding explicit type hints to the samples dict to help the type checker? Wdyt?

-def test_fill_nulls_column():
+def test_fill_nulls_column() -> None:
     """Test column null filling function."""

-def test_fill_nulls_record():
+def test_fill_nulls_record() -> None:
     """Test record null filling function."""
     # Test basic record filling
     record = {"a": 1, "b": None, "c": "x"}
-    samples = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]}
+    samples: dict[str, Sequence[Any]] = {"a": [1, 2, 3], "b": [4, 5, 6], "c": ["x", "y", "x"]}
     filled, reports = fill_nulls_record(record, ["a", "b", "c"], samples)

Apply similar type hints on lines 106-107 and 111.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 68-68: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]


[error] 107-107: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]


[error] 115-115: Argument 3 to "fill_nulls_record" has incompatible type "dict[str, object]"; expected "Mapping[str, Sequence[Any]]" [arg-type]

🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_impute.py around lines 68-117, the
test functions lack explicit return type annotations and the local samples
variables are inferred as dict[str, object] which conflicts with functions
expecting Mapping[str, Sequence[Any]]; add -> None to both
test_fill_nulls_column and test_fill_nulls_record declarations, and annotate
each samples variable with the appropriate type (e.g., samples: Mapping[str,
Sequence[Any]]) on the occurrences around lines 106-107 and 111 so the type
checker accepts the passed argument.
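
A minimal sketch of the annotation pattern described above. `column_lengths` is a hypothetical stand-in whose parameter has the same declared type the linter reports for the third argument of `fill_nulls_record`. It exists only so the example is self-contained.

```python
from typing import Any, Mapping, Sequence


def column_lengths(samples: Mapping[str, Sequence[Any]]) -> dict[str, int]:
    """Hypothetical consumer that expects Mapping[str, Sequence[Any]]."""
    return {name: len(values) for name, values in samples.items()}


# An explicit annotation tells the type checker the intended value type,
# instead of leaving it to infer one from mixed int and str lists.
samples: dict[str, Sequence[Any]] = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": ["x", "y", "x"],
}

# The same annotation can be applied to an empty mapping.
empty_samples: dict[str, Sequence[Any]] = {}

lengths = column_lengths(samples)
empty_lengths = column_lengths(empty_samples)
```

Annotating the variable as `Mapping[str, Sequence[Any]]`, as the prompt above suggests, should work equally well. The `dict[...]` spelling requires Python 3.9 or later.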

Comment on lines +14 to +129
def test_minmax_scale():
"""Test minmax scaling function."""
# Test normal scaling
assert minmax_scale(5, 0, 10) == 0.5
assert minmax_scale(5, 0, 10, (0, 100)) == 50.0

# Test edge cases
assert minmax_scale(0, 0, 10) == 0.0
assert minmax_scale(10, 0, 10) == 1.0

# Test custom range scaling
assert minmax_scale(5, 0, 10, (-1, 1)) == 0.0

# Test when data_max equals data_min (prevents division by zero)
assert minmax_scale(5, 5, 5) == 0.5 # Should return middle of output range

# Test with float inputs
assert minmax_scale(5.5, 0.0, 10.0) == 0.55

def test_zscore():
"""Test z-score calculation function."""
# Test normal cases
assert zscore(10, 5, 2) == 2.5 # (10 - 5) / 2
assert zscore(0, 5, 2) == -2.5 # (0 - 5) / 2

# Test with zero sigma
assert zscore(10, 5, 0) == 0.0 # Should handle division by zero gracefully

# Test with float inputs
assert zscore(10.5, 5.0, 2.0) == 2.75

def test_clip():
"""Test value clipping function."""
# Test normal clipping
assert clip(5, 0, 10) == 5
assert clip(-1, 0, 10) == 0
assert clip(11, 0, 10) == 10

# Test with float values
assert clip(5.5, 0.0, 10.0) == 5.5
assert clip(-1.5, 0.0, 10.0) == 0.0

# Test when low == high
assert clip(5, 3, 3) == 3

def test_winsorize():
"""Test winsorization function."""
# Test normal cases
assert winsorize(5, 0, 10) == 5
assert winsorize(-1, 0, 10) == 0
assert winsorize(11, 0, 10) == 10

# Test with float values
assert winsorize(5.5, 0.0, 10.0) == 5.5

# Test when low == high
assert winsorize(5, 3, 3) == 3

def test_log1p_safe():
"""Test safe log1p calculation function."""
# Test normal cases
assert log1p_safe(0) == 0.0
assert log1p_safe(math.e - 1) == 1.0

# Test negative values > -1
assert abs(log1p_safe(-0.5) - math.log1p(-0.5)) < 1e-10

# Test negative values <= -1
assert log1p_safe(-2) == -2.0 # Should return input value

# Test error cases
assert log1p_safe(float('inf')) == float('inf')

def test_bucketize():
"""Test bucketization function."""
edges = [0, 10, 20, 30]

# Test normal cases
assert bucketize(-5, edges) == 0
assert bucketize(5, edges) == 1
assert bucketize(15, edges) == 2
assert bucketize(25, edges) == 3
assert bucketize(35, edges) == 4

# Test edge values
assert bucketize(0, edges) == 0
assert bucketize(10, edges) == 1
assert bucketize(20, edges) == 2
assert bucketize(30, edges) == 3

# Test empty edges
assert bucketize(5, []) == 0

# Test single edge
assert bucketize(5, [10]) == 0 # 5 ≤ 10, so bucket 0
assert bucketize(15, [10]) == 1 # 15 > 10, so bucket 1

def test_robust_percentile_scale():
"""Test robust percentile scaling function."""
# Test normal scaling
assert robust_percentile_scale(5, 0, 10) == 0.5
assert robust_percentile_scale(5, 0, 10, (0, 100)) == 50.0

# Test edge cases
assert robust_percentile_scale(0, 0, 10) == 0.0
assert robust_percentile_scale(10, 0, 10) == 1.0

# Test custom range
assert robust_percentile_scale(5, 0, 10, (-1, 1)) == 0.0

# Test clipping
assert robust_percentile_scale(-1, 0, 10) == 0.0 # With clipping
assert robust_percentile_scale(-1, 0, 10, clip_outliers=False) < 0.0 # Without clipping

# Test when high equals low
assert robust_percentile_scale(5, 5, 5) == 0.5 # Should return middle of output range
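
The `bucketize` assertions quoted above (a value equal to an edge falls into the lower bucket, and an empty edge list yields bucket 0) match the behavior of `bisect.bisect_left` on a sorted list of edges. A minimal sketch under that assumption follows. `bucketize_sketch` is a hypothetical name, and the PR's actual implementation is not shown in this comment.

```python
from bisect import bisect_left
from typing import Sequence


def bucketize_sketch(value: float, edges: Sequence[float]) -> int:
    """Return the index of the bucket that value falls into.

    Assumes edges is sorted in ascending order. Bucket i covers the
    half-open interval (edges[i - 1], edges[i]], so a value equal to an
    edge lands in the bucket below that edge.
    """
    return bisect_left(edges, value)
```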

⚠️ Potential issue | 🟠 Major

Add return type annotations to test functions.

All test functions are missing -> None return type annotations, which the type checker requires. Would you consider adding these to satisfy the linter? Here's the pattern to apply:

-def test_minmax_scale():
+def test_minmax_scale() -> None:
     """Test minmax scaling function."""

-def test_zscore():
+def test_zscore() -> None:
     """Test z-score calculation function."""

-def test_clip():
+def test_clip() -> None:
     """Test value clipping function."""

-def test_winsorize():
+def test_winsorize() -> None:
     """Test winsorization function."""

-def test_log1p_safe():
+def test_log1p_safe() -> None:
     """Test safe log1p calculation function."""

-def test_bucketize():
+def test_bucketize() -> None:
     """Test bucketization function."""

-def test_robust_percentile_scale():
+def test_robust_percentile_scale() -> None:
     """Test robust percentile scaling function."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_minmax_scale() -> None:
    """Test minmax scaling function."""
    # Test normal scaling
    assert minmax_scale(5, 0, 10) == 0.5
    assert minmax_scale(5, 0, 10, (0, 100)) == 50.0

    # Test edge cases
    assert minmax_scale(0, 0, 10) == 0.0
    assert minmax_scale(10, 0, 10) == 1.0

    # Test custom range scaling
    assert minmax_scale(5, 0, 10, (-1, 1)) == 0.0

    # Test when data_max equals data_min (prevents division by zero)
    assert minmax_scale(5, 5, 5) == 0.5  # Should return middle of output range

    # Test with float inputs
    assert minmax_scale(5.5, 0.0, 10.0) == 0.55

def test_zscore() -> None:
    """Test z-score calculation function."""
    # Test normal cases
    assert zscore(10, 5, 2) == 2.5  # (10 - 5) / 2
    assert zscore(0, 5, 2) == -2.5  # (0 - 5) / 2

    # Test with zero sigma
    assert zscore(10, 5, 0) == 0.0  # Should handle division by zero gracefully

    # Test with float inputs
    assert zscore(10.5, 5.0, 2.0) == 2.75

def test_clip() -> None:
    """Test value clipping function."""
    # Test normal clipping
    assert clip(5, 0, 10) == 5
    assert clip(-1, 0, 10) == 0
    assert clip(11, 0, 10) == 10

    # Test with float values
    assert clip(5.5, 0.0, 10.0) == 5.5
    assert clip(-1.5, 0.0, 10.0) == 0.0

    # Test when low == high
    assert clip(5, 3, 3) == 3

def test_winsorize() -> None:
    """Test winsorization function."""
    # Test normal cases
    assert winsorize(5, 0, 10) == 5
    assert winsorize(-1, 0, 10) == 0
    assert winsorize(11, 0, 10) == 10

    # Test with float values
    assert winsorize(5.5, 0.0, 10.0) == 5.5

    # Test when low == high
    assert winsorize(5, 3, 3) == 3

def test_log1p_safe() -> None:
    """Test safe log1p calculation function."""
    # Test normal cases
    assert log1p_safe(0) == 0.0
    assert log1p_safe(math.e - 1) == 1.0

    # Test negative values > -1
    assert abs(log1p_safe(-0.5) - math.log1p(-0.5)) < 1e-10

    # Test negative values <= -1
    assert log1p_safe(-2) == -2.0  # Should return input value

    # Test error cases
    assert log1p_safe(float('inf')) == float('inf')

def test_bucketize() -> None:
    """Test bucketization function."""
    edges = [0, 10, 20, 30]

    # Test normal cases
    assert bucketize(-5, edges) == 0
    assert bucketize(5, edges) == 1
    assert bucketize(15, edges) == 2
    assert bucketize(25, edges) == 3
    assert bucketize(35, edges) == 4

    # Test edge values
    assert bucketize(0, edges) == 0
    assert bucketize(10, edges) == 1
    assert bucketize(20, edges) == 2
    assert bucketize(30, edges) == 3

    # Test empty edges
    assert bucketize(5, []) == 0

    # Test single edge
    assert bucketize(5, [10]) == 0  # 5 ≤ 10, so bucket 0
    assert bucketize(15, [10]) == 1  # 15 > 10, so bucket 1

def test_robust_percentile_scale() -> None:
    """Test robust percentile scaling function."""
    # Test normal scaling
    assert robust_percentile_scale(5, 0, 10) == 0.5
    assert robust_percentile_scale(5, 0, 10, (0, 100)) == 50.0

    # Test edge cases
    assert robust_percentile_scale(0, 0, 10) == 0.0
    assert robust_percentile_scale(10, 0, 10) == 1.0

    # Test custom range
    assert robust_percentile_scale(5, 0, 10, (-1, 1)) == 0.0

    # Test clipping
    assert robust_percentile_scale(-1, 0, 10) == 0.0  # With clipping
    assert robust_percentile_scale(-1, 0, 10, clip_outliers=False) < 0.0  # Without clipping

    # Test when high equals low
    assert robust_percentile_scale(5, 5, 5) == 0.5  # Should return middle of output range
🧰 Tools
🪛 GitHub Actions: Linters

[error] 14-14: Function is missing a return type annotation [no-untyped-def]


[error] 14-14: Use "-> None" if function does not return a value


[error] 33-33: Function is missing a return type annotation [no-untyped-def]


[error] 33-33: Use "-> None" if function does not return a value


[error] 45-45: Function is missing a return type annotation [no-untyped-def]


[error] 45-45: Use "-> None" if function does not return a value


[error] 59-59: Function is missing a return type annotation [no-untyped-def]


[error] 59-59: Use "-> None" if function does not return a value


[error] 72-72: Function is missing a return type annotation [no-untyped-def]


[error] 72-72: Use "-> None" if function does not return a value


[error] 87-87: Function is missing a return type annotation [no-untyped-def]


[error] 87-87: Use "-> None" if function does not return a value


[error] 111-111: Function is missing a return type annotation [no-untyped-def]


[error] 111-111: Use "-> None" if function does not return a value

🤖 Prompt for AI Agents
In airbyte_cdk/test/utils/transforms/test_math.py around lines 14 to 129 the
test functions lack return type annotations; add "-> None" to each test function
signature (e.g., def test_minmax_scale() -> None:) and update all other tests in
this range similarly so every test function explicitly returns None to satisfy
the type checker.
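
As a rough illustration of what the linter is flagging, the sketch below uses Python's standard ast module to list functions that have no return annotation. The helper name functions_missing_return_annotation and the SAMPLE source are made up for this example; the project's actual check is the type checker's no-untyped-def rule, not this script.

```python
import ast
from typing import List


def functions_missing_return_annotation(source: str) -> List[str]:
    """Return the names of functions in `source` that lack a return annotation."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.returns is None  # `returns` is None when there is no `-> ...`
    ]


# Hypothetical sample: one unannotated test, one annotated test.
SAMPLE = '''
def test_clip():
    assert True

def test_zscore() -> None:
    assert True
'''

missing = functions_missing_return_annotation(SAMPLE)
print(missing)
```

Only the unannotated function is reported; adding -> None to its signature empties the list.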

Comment on lines +10 to +14
def extract_date_parts(dt) -> Dict[str, Optional[int]]:
    try:
        return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
    except Exception:
        return {"year": None, "month": None, "day": None, "dow": None}

⚠️ Potential issue | 🟠 Major

Add type annotation for dt parameter.

The dt parameter is missing a type annotation. Since it's duck-typed (uses attributes like .year, .month), would Any work here? Wdyt?

-def extract_date_parts(dt) -> Dict[str, Optional[int]]:
+def extract_date_parts(dt: Any) -> Dict[str, Optional[int]]:
     try:
         return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
     except Exception:
         return {"year": None, "month": None, "day": None, "dow": None}
📝 Committable suggestion

Suggested change
def extract_date_parts(dt: Any) -> Dict[str, Optional[int]]:
    try:
        return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
    except Exception:
        return {"year": None, "month": None, "day": None, "dow": None}
🧰 Tools
🪛 GitHub Actions: Linters

[error] 10-10: Function is missing a type annotation for one or more arguments [no-untyped-def]

🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 10 to 14, the function
parameter dt is missing a type annotation; update the signature to annotate dt
as Any (imported from typing), as in the suggested change, so duck-typed inputs
remain allowed. Union[datetime.date, datetime.datetime, Any] would add nothing,
since a Union that includes Any accepts every value; keep the body unchanged.
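
If Any feels too loose, one alternative is a structural type. The sketch below assumes a hypothetical DateLike Protocol (not part of the PR) that names exactly the attributes the function reads; datetime.date and datetime.datetime should satisfy it without inheriting from it. The function body is the one from the PR.

```python
from datetime import date
from typing import Dict, Optional, Protocol


class DateLike(Protocol):
    """Hypothetical structural type: anything exposing these date attributes."""

    @property
    def year(self) -> int: ...

    @property
    def month(self) -> int: ...

    @property
    def day(self) -> int: ...

    def weekday(self) -> int: ...


def extract_date_parts(dt: DateLike) -> Dict[str, Optional[int]]:
    try:
        return {"year": dt.year, "month": dt.month, "day": dt.day, "dow": int(dt.weekday())}
    except Exception:
        # Non-date inputs still fall through to the all-None result at runtime.
        return {"year": None, "month": None, "day": None, "dow": None}


parts = extract_date_parts(date(2025, 11, 13))
print(parts)
```

The trade-off is that a checker would then reject callers that pass arbitrary objects and rely on the except branch, which Any permits.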

Comment on lines +16 to +20
def floor_to_month(dt):
    try:
        return dt.replace(day=1)
    except Exception:
        return None

⚠️ Potential issue | 🟠 Major

Add type annotations for floor_to_month.

This function is missing both parameter and return type annotations. Would you consider adding them? Wdyt?

-def floor_to_month(dt):
+def floor_to_month(dt: Any) -> Optional[Any]:
     try:
         return dt.replace(day=1)
     except Exception:
         return None

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 16-16: Function is missing a type annotation [no-untyped-def]

🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 16 to 20, the function
floor_to_month lacks type annotations; annotate the parameter to accept
datetime.date or datetime.datetime (e.g., Union[date, datetime]) and the return
type as Optional[Union[date, datetime]] (or the Python 3.10+ union syntax),
import the required types from typing and datetime at the top, and keep the
current behavior of returning None on failure so the signature reflects that
possibility.
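
A minimal sketch of one way to type this, assuming a TypeVar bound to datetime.date (the name D is chosen for this example): because datetime.datetime subclasses date, the bound covers both, and the return type tracks whichever one the caller passed. The body is unchanged from the PR.

```python
from datetime import date, datetime
from typing import Optional, TypeVar

# Hypothetical type variable; datetime is a subclass of date, so both are allowed.
D = TypeVar("D", bound=date)


def floor_to_month(dt: D) -> Optional[D]:
    try:
        # replace() returns the same type it was called on.
        return dt.replace(day=1)
    except Exception:
        return None


floored_date = floor_to_month(date(2025, 11, 13))
floored_datetime = floor_to_month(datetime(2025, 11, 13, 10, 30))
print(floored_date, floored_datetime)
```

Note that for a datetime input only the day changes; the time of day is kept.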

Comment on lines +22 to +28
def ceil_to_month(dt):
    try:
        if dt.month == 12:
            return dt.replace(year=dt.year + 1, month=1, day=1)
        return dt.replace(month=dt.month + 1, day=1)
    except Exception:
        return None

⚠️ Potential issue | 🟠 Major

Add type annotations for ceil_to_month.

This function is also missing parameter and return type annotations. Would you mind adding them? Wdyt?

-def ceil_to_month(dt):
+def ceil_to_month(dt: Any) -> Optional[Any]:
     try:
         if dt.month == 12:
             return dt.replace(year=dt.year + 1, month=1, day=1)
         return dt.replace(month=dt.month + 1, day=1)
     except Exception:
         return None

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 22-22: Function is missing a type annotation [no-untyped-def]

🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/date.py around lines 22 to 28, the function
ceil_to_month is missing type annotations; update its signature to accept a
datetime.datetime (or Optional[datetime.datetime] if callers may pass None) and
return Optional[datetime.datetime], add the necessary imports (from typing
import Optional and import datetime) at the top of the file, and ensure the
function signature and any internal uses reflect those types (e.g., def
ceil_to_month(dt: datetime.datetime) -> Optional[datetime.datetime]:).
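
For illustration, here is the function with the datetime-based signature described above and the PR's body, followed by the December rollover case. The sample values are made up for this sketch.

```python
from datetime import datetime
from typing import Optional


def ceil_to_month(dt: datetime) -> Optional[datetime]:
    try:
        if dt.month == 12:
            # December rolls over to January of the following year.
            return dt.replace(year=dt.year + 1, month=1, day=1)
        return dt.replace(month=dt.month + 1, day=1)
    except Exception:
        return None


rolled_over = ceil_to_month(datetime(2025, 12, 31, 23, 59))
mid_year = ceil_to_month(datetime(2025, 1, 31))
print(rolled_over, mid_year)
```

As written, the time of day is preserved, and an input already on the first of a month still moves to the first of the next month.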

Comment on lines +64 to +72
def fill_nulls_column(
    series: Sequence[Any],
    explicit_strategy: Optional[str] = None,
    numeric: Optional[bool] = None,
    **choose_kwargs
) -> Tuple[List[Any], ImputationReport]:
    strategy = explicit_strategy or choose_imputation_strategy(series, numeric=numeric, **choose_kwargs)
    fill_value = compute_imputation_value(series, strategy)
    return [fill_value if x is None else x for x in series], ImputationReport("<series>", strategy, fill_value)

⚠️ Potential issue | 🟠 Major

Add type annotation for **choose_kwargs parameter.

The **choose_kwargs parameter is missing a type annotation, which the type checker is flagging. Would you consider adding **choose_kwargs: Any or being more specific with the expected keyword arguments? Wdyt?

 def fill_nulls_column(
     series: Sequence[Any],
     explicit_strategy: Optional[str] = None,
     numeric: Optional[bool] = None,
-    **choose_kwargs
+    **choose_kwargs: Any
 ) -> Tuple[List[Any], ImputationReport]:
📝 Committable suggestion

Suggested change
def fill_nulls_column(
    series: Sequence[Any],
    explicit_strategy: Optional[str] = None,
    numeric: Optional[bool] = None,
    **choose_kwargs: Any
) -> Tuple[List[Any], ImputationReport]:
    strategy = explicit_strategy or choose_imputation_strategy(series, numeric=numeric, **choose_kwargs)
    fill_value = compute_imputation_value(series, strategy)
    return [fill_value if x is None else x for x in series], ImputationReport("<series>", strategy, fill_value)
🧰 Tools
🪛 GitHub Actions: Linters

[error] 64-64: Function is missing a type annotation for one or more arguments [no-untyped-def]

🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/impute.py around lines 64 to 72, the variadic
keyword parameter **choose_kwargs is missing a type annotation; add one (e.g.,
**choose_kwargs: Any) to the function signature so the type checker stops
flagging it. The annotation on **kwargs describes each keyword value, not the
dict itself, so Mapping[str, Any] would be wrong here; for stricter typing, use
the value type the keyword arguments actually share, and import Any as needed.
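
The sketch below shows how an annotated **choose_kwargs is forwarded. It is a simplified stand-in, not the PR's code: choose_strategy, its prefer_median flag, and fill_nulls are hypothetical, and the report object is omitted.

```python
from statistics import mean, median
from typing import Any, List, Optional, Sequence


def choose_strategy(series: Sequence[Any], prefer_median: bool = False) -> str:
    # Hypothetical stand-in for choose_imputation_strategy.
    return "median" if prefer_median else "mean"


def fill_nulls(
    series: Sequence[Any],
    explicit_strategy: Optional[str] = None,
    **choose_kwargs: Any,  # annotates each keyword value; the dict is Dict[str, Any]
) -> List[Any]:
    # Extra keyword arguments are passed straight through to the chooser.
    strategy = explicit_strategy or choose_strategy(series, **choose_kwargs)
    present = [x for x in series if x is not None]
    fill_value = median(present) if strategy == "median" else mean(present)
    return [fill_value if x is None else x for x in series]


filled_mean = fill_nulls([1, None, 3])
filled_median = fill_nulls([1, None, 3, 100], prefer_median=True)
print(filled_mean, filled_median)
```

With Any, a misspelled keyword is only caught when the chooser rejects it at call time, which is the cost of the loose annotation.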

Comment on lines +1 to +4
from __future__ import annotations
from typing import Sequence, Tuple, Union
import math

⚠️ Potential issue | 🔴 Critical

Fix import ordering to resolve linter error.

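Both math and typing are standard-library modules, so the failure comes from ordering within that group: with isort-style sorting, which this rule follows by default, plain import statements come before from ... import statements, so import math should precede from typing import. Such sorters also usually expect a blank line after the __future__ import, so one may be needed there as well. This should resolve the pipeline failure.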

Apply this diff:

 from __future__ import annotations
-from typing import Sequence, Tuple, Union
 import math
+from typing import Sequence, Tuple, Union
📝 Committable suggestion

Suggested change
from __future__ import annotations
import math
from typing import Sequence, Tuple, Union
🧰 Tools
🪛 GitHub Actions: Linters

[error] 3-3: I001 Import block is un-sorted or un-formatted. Organize imports.

🤖 Prompt for AI Agents
In airbyte_cdk/utils/transforms/math.py around lines 1 to 4, the import ordering
is incorrect (import math is a plain import and should come before the from
typing import); keep "from __future__ import annotations" as the very first
line, then import math, then the typing imports (Sequence, Tuple, Union), to
satisfy the linter's import sorting.
