@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 8% (0.08x) speedup for _normalize_numbers in skyvern/forge/sdk/core/security.py

⏱️ Runtime: 413 microseconds → 382 microseconds (best of 250 runs)

📝 Explanation and details

The optimization replaces isinstance(x, T) with type(x) is T for three type checks: float, dict, and list. Total runtime drops from 413 to 382 microseconds, about 8%, because an exact-type identity comparison does less work per check than isinstance(), most visibly for values that match none of the three types.
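As a rough illustration, the change might look like the sketch below. Both function bodies are an assumption reconstructed from the behavior the generated tests describe (integer-valued floats become int, dicts and lists are processed recursively, everything else is returned as is). The actual source in skyvern/forge/sdk/core/security.py is not shown in this description and may differ, and the two function names are hypothetical.

```python
from typing import Any


def normalize_numbers_isinstance(x: Any) -> Any:
    """Assumed shape of the original code: isinstance() checks (hypothetical name)."""
    if isinstance(x, float):
        return int(x) if x.is_integer() else x
    if isinstance(x, dict):
        return {k: normalize_numbers_isinstance(v) for k, v in x.items()}
    if isinstance(x, list):
        return [normalize_numbers_isinstance(v) for v in x]
    return x


def normalize_numbers_exact(x: Any) -> Any:
    """Assumed shape of the optimized code: exact-type identity checks (hypothetical name)."""
    if type(x) is float:
        return int(x) if x.is_integer() else x
    if type(x) is dict:
        return {k: normalize_numbers_exact(v) for k, v in x.items()}
    if type(x) is list:
        return [normalize_numbers_exact(v) for v in x]
    return x


data = {"x": [1.0, 2.2, {"y": 3.0, "z": [4.0, 5.5]}]}
print(normalize_numbers_exact(data))  # {'x': [1, 2.2, {'y': 3, 'z': [4, 5.5]}]}
```

For plain built-in inputs like the one above, both variants return the same result; only the way each value's type is tested differs.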

Key Performance Improvements:

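  1. Faster type checking: type(x) is float is a single identity comparison against the type object. In CPython, isinstance() also begins with an exact-type comparison, so the two cost about the same when the check succeeds; the difference shows up when it fails, which is why bare floats show little change in the timings while bool, None, and custom objects show the most.

  2. Less work per call: both isinstance() and type() are calls to built-ins, but type(x) only returns the object's type, and the comparison itself is done by the is operator, which is a pointer comparison.

  3. No inheritance traversal: when the exact-type comparison fails, isinstance() has to rule out subclasses by searching the method resolution order of the value's type, whereas type(x) is float stops at the identity check. The flip side is that instances of subclasses no longer match.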
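The three points above can be checked with a small timeit micro-benchmark. This is only a sketch: the helper names (check_isinstance, check_exact, compare, Foo) are made up for illustration, and absolute numbers will vary with the Python version and machine, so no particular output is claimed.

```python
import timeit


class Foo:
    """Stand-in for a value that is not a float, dict, or list."""


def check_isinstance(x):
    # Same three checks as the original code path, via isinstance().
    return isinstance(x, float) or isinstance(x, dict) or isinstance(x, list)


def check_exact(x):
    # Same three checks via exact-type identity comparison.
    return type(x) is float or type(x) is dict or type(x) is list


def compare(value, number=200_000):
    """Return (isinstance_seconds, exact_seconds) for `number` calls each."""
    t_isinstance = timeit.timeit(lambda: check_isinstance(value), number=number)
    t_exact = timeit.timeit(lambda: check_exact(value), number=number)
    return t_isinstance, t_exact


if __name__ == "__main__":
    for label, value in [("float", 2.5), ("None", None), ("custom object", Foo())]:
        slow, fast = compare(value)
        print(f"{label:>13}: isinstance {slow:.4f}s, type-is {fast:.4f}s")
```

A value that fails all three checks (None, a custom object) exercises the slow path of isinstance() three times, so that is where any gap would be expected to be widest.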

Impact Analysis:
The function is called from _normalize_json_dumps() for JSON serialization, where it processes potentially large nested data structures. Because it recurses into every list and dict, the saving applies once per element rather than once per call. The size of the gain depends on the data: the 500-element list mixing floats, ints, and strings runs 37.9% faster, while the 1000-element float-only lists and dicts gain between 0% and 7%.
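To make the serialization context concrete, here is a minimal sketch of how a normalizing wrapper around json.dumps() could behave. The normalize_numbers() body is an assumption reconstructed from the tests, and normalized_dumps() is a hypothetical stand-in for _normalize_json_dumps(), whose real options are not shown here.

```python
import json
from typing import Any


def normalize_numbers(x: Any) -> Any:
    """Assumed behavior of _normalize_numbers(), reconstructed from the tests."""
    if type(x) is float:
        return int(x) if x.is_integer() else x
    if type(x) is dict:
        return {k: normalize_numbers(v) for k, v in x.items()}
    if type(x) is list:
        return [normalize_numbers(v) for v in x]
    return x


def normalized_dumps(obj: Any) -> str:
    """Hypothetical stand-in for _normalize_json_dumps(); the real options may differ."""
    return json.dumps(normalize_numbers(obj))


payload = {"count": 3.0, "ratio": 0.5, "items": [1.0, 2.0, "3.0", None]}
print(json.dumps(payload))        # {"count": 3.0, "ratio": 0.5, "items": [1.0, 2.0, "3.0", null]}
print(normalized_dumps(payload))  # {"count": 3, "ratio": 0.5, "items": [1, 2, "3.0", null]}
```

Every leaf value passes through the type checks once, so a payload with many strings, ints, or None values is where the cheaper checks are exercised most often.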

Test Case Performance:

  • Simple type checks: 22-61% faster for bool, None, tuples, and custom objects; ints and strings gain 2-17%, and bare floats range from 12% slower to 33% faster
  • Nested structures: mostly 9-16% faster for small nested lists and dicts
  • Large datasets: 0-7% faster for the 1000-element lists and dicts
  • Mixed type lists: up to 37.9% faster, since elements that are not float, dict, or list benefit most

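Behavior is identical for exact built-in float, dict, and list values, which is what the 69 generated tests exercise. The stricter check is not equivalent for subclasses: an instance of a float, dict, or list subclass (for example collections.OrderedDict or numpy.float64) matched isinstance() but is now returned unchanged, so the change is safe only if such values never reach this function.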
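One case worth checking before merging is subclasses, where the two kinds of check diverge. The sketch below assumes the function has the shape implied by the tests; normalize_isinstance(), normalize_exact(), and the Celsius class are hypothetical names used only for illustration.

```python
from collections import OrderedDict
from typing import Any


def normalize_isinstance(x: Any) -> Any:
    """Assumed original behavior: subclasses of float/dict/list are matched."""
    if isinstance(x, float):
        return int(x) if x.is_integer() else x
    if isinstance(x, dict):
        return {k: normalize_isinstance(v) for k, v in x.items()}
    if isinstance(x, list):
        return [normalize_isinstance(v) for v in x]
    return x


def normalize_exact(x: Any) -> Any:
    """Assumed optimized behavior: only the exact built-in types are matched."""
    if type(x) is float:
        return int(x) if x.is_integer() else x
    if type(x) is dict:
        return {k: normalize_exact(v) for k, v in x.items()}
    if type(x) is list:
        return [normalize_exact(v) for v in x]
    return x


class Celsius(float):
    """Hypothetical float subclass."""


ordered = OrderedDict(a=1.0, b=2.5)

# isinstance() matches the dict subclass and normalizes its values.
print(normalize_isinstance(ordered))
# The exact-type check skips it, so the same object comes back untouched.
print(normalize_exact(ordered) is ordered)
# The same divergence applies to float subclasses.
print(type(normalize_isinstance(Celsius(3.0))), type(normalize_exact(Celsius(3.0))))
```

If callers can pass such objects (OrderedDict, defaultdict, NumPy float scalars), a regression test covering them would show whether the stricter check changes the serialized output.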

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 69 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Any

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.core.security import _normalize_numbers

# unit tests

# --- Basic Test Cases ---

def test_int_unchanged():
    # Integers should be returned unchanged
    codeflash_output = _normalize_numbers(42) # 414ns -> 384ns (7.81% faster)
    codeflash_output = _normalize_numbers(-7) # 303ns -> 258ns (17.4% faster)

def test_float_to_int():
    # Floats that are mathematically integers should be converted to int
    codeflash_output = _normalize_numbers(3.0) # 455ns -> 443ns (2.71% faster)
    codeflash_output = _normalize_numbers(-10.0) # 212ns -> 160ns (32.5% faster)

def test_float_fractional_unchanged():
    # Floats with a fractional part should remain floats
    codeflash_output = _normalize_numbers(2.5) # 370ns -> 395ns (6.33% slower)
    codeflash_output = _normalize_numbers(-3.14) # 139ns -> 128ns (8.59% faster)

def test_string_unchanged():
    # Strings should be returned unchanged
    codeflash_output = _normalize_numbers("123") # 390ns -> 383ns (1.83% faster)
    codeflash_output = _normalize_numbers("3.0") # 315ns -> 269ns (17.1% faster)

def test_bool_unchanged():
    # Booleans should be returned unchanged (not treated as ints)
    codeflash_output = _normalize_numbers(True) # 491ns -> 371ns (32.3% faster)
    codeflash_output = _normalize_numbers(False) # 203ns -> 153ns (32.7% faster)

def test_none_unchanged():
    # None should be returned unchanged
    codeflash_output = _normalize_numbers(None) # 455ns -> 373ns (22.0% faster)

def test_list_of_numbers():
    # Lists of numbers should be normalized element-wise
    codeflash_output = _normalize_numbers([1, 2.0, 3.5, 4.0]) # 1.56μs -> 1.30μs (19.7% faster)
    codeflash_output = _normalize_numbers([]) # 309ns -> 310ns (0.323% slower)

def test_dict_of_numbers():
    # Dicts of numbers should be normalized value-wise
    codeflash_output = _normalize_numbers({'a': 1.0, 'b': 2.5, 'c': 3}) # 1.58μs -> 1.49μs (6.18% faster)

def test_nested_list_dict():
    # Nested lists and dicts should be normalized recursively
    data = {'x': [1.0, 2.2, {'y': 3.0, 'z': [4.0, 5.5]}]}
    expected = {'x': [1, 2.2, {'y': 3, 'z': [4, 5.5]}]}
    codeflash_output = _normalize_numbers(data) # 2.64μs -> 2.36μs (12.0% faster)

# --- Edge Test Cases ---

def test_empty_list_dict():
    # Empty lists and dicts should be handled correctly
    codeflash_output = _normalize_numbers([]) # 594ns -> 458ns (29.7% faster)
    codeflash_output = _normalize_numbers({}) # 535ns -> 468ns (14.3% faster)

def test_nested_empty_structures():
    # Nested empty lists/dicts
    codeflash_output = _normalize_numbers([[], {}]) # 1.34μs -> 1.16μs (15.6% faster)
    codeflash_output = _normalize_numbers({'a': [], 'b': {}}) # 1.00μs -> 969ns (3.61% faster)

def test_large_and_small_floats():
    # Very large and very small floats
    codeflash_output = _normalize_numbers(1e20) # 920ns -> 838ns (9.79% faster)
    codeflash_output = _normalize_numbers(1e6) # 270ns -> 254ns (6.30% faster)
    codeflash_output = _normalize_numbers(1e6 + 0.0) # 127ns -> 121ns (4.96% faster)
    codeflash_output = _normalize_numbers(1e-10) # 168ns -> 153ns (9.80% faster)
    codeflash_output = _normalize_numbers(-1e6) # 138ns -> 139ns (0.719% slower)

def test_float_precision():
    # Floats that are very close to integers but not exactly
    codeflash_output = _normalize_numbers(1.0000000001) # 370ns -> 395ns (6.33% slower)
    codeflash_output = _normalize_numbers(-2.0000000000001) # 151ns -> 153ns (1.31% slower)

def test_bool_in_list_and_dict():
    # Booleans inside lists/dicts should remain unchanged
    codeflash_output = _normalize_numbers([True, False, 1.0]) # 1.43μs -> 1.15μs (23.9% faster)
    codeflash_output = _normalize_numbers({'a': True, 'b': 2.0}) # 1.11μs -> 1.06μs (4.32% faster)

def test_tuple_unchanged():
    # Tuples are not handled, should be returned unchanged
    t = (1.0, 2.5)
    codeflash_output = _normalize_numbers(t) # 495ns -> 328ns (50.9% faster)

def test_set_unchanged():
    # Sets are not handled, should be returned unchanged
    s = {1.0, 2.5}
    codeflash_output = _normalize_numbers(s) # 358ns -> 355ns (0.845% faster)

def test_custom_object_unchanged():
    # Custom objects should be returned unchanged
    class Foo:
        pass
    f = Foo()
    codeflash_output = _normalize_numbers(f) # 648ns -> 405ns (60.0% faster)

def test_dict_with_non_str_keys():
    # Dicts with non-str keys should be handled
    d = {1: 2.0, (3, 4): 5.0}
    expected = {1: 2, (3, 4): 5}
    codeflash_output = _normalize_numbers(d) # 1.47μs -> 1.47μs (0.340% slower)

def test_list_with_mixed_types():
    # Lists with mixed types (numbers, strings, dicts, lists)
    data = [1.0, '2.0', [3.0, {'a': 4.0}], None]
    expected = [1, '2.0', [3, {'a': 4}], None]
    codeflash_output = _normalize_numbers(data) # 2.38μs -> 2.05μs (16.2% faster)

# --- Large Scale Test Cases ---

def test_large_list():
    # Large list of floats (some integers, some not)
    N = 1000
    data = [float(i) if i % 2 == 0 else float(i) + 0.5 for i in range(N)]
    expected = [i if i % 2 == 0 else float(i) + 0.5 for i in range(N)]
    codeflash_output = _normalize_numbers(data) # 48.2μs -> 45.1μs (6.78% faster)

def test_large_nested_structure():
    # Large nested structure of dicts and lists
    N = 100
    data = {'outer': [{'inner': [float(i), float(i) + 0.1]} for i in range(N)]}
    expected = {'outer': [{'inner': [i, float(i) + 0.1]} for i in range(N)]}
    codeflash_output = _normalize_numbers(data) # 39.6μs -> 37.5μs (5.60% faster)

def test_large_dict():
    # Large dictionary with float values
    N = 1000
    data = {f'k{i}': float(i) for i in range(N)}
    expected = {f'k{i}': i for i in range(N)}
    codeflash_output = _normalize_numbers(data) # 88.7μs -> 85.4μs (3.84% faster)

def test_large_mixed_types():
    # Large list with mixed types
    N = 500
    data = [float(i) if i % 3 == 0 else str(i) if i % 3 == 1 else [float(i)] for i in range(N)]
    expected = [i if i % 3 == 0 else str(i) if i % 3 == 1 else [i] for i in range(N)]
    result = _normalize_numbers(data)
    # Every element should match: int for i % 3 == 0, str for i % 3 == 1, [int] otherwise
    assert result == expected

# --- Mutation Testing Guards ---

def test_float_minus_zero():
    # -0.0 is a float, but int(-0.0) is 0
    codeflash_output = _normalize_numbers(-0.0) # 458ns -> 497ns (7.85% slower)

def test_float_vs_int_type():
    # Make sure returned zero is int, not float
    codeflash_output = _normalize_numbers(0.0); result = codeflash_output # 460ns -> 476ns (3.36% slower)

def test_no_side_effects_on_input():
    # Function should not mutate input data
    data = [1.0, 2.5, {'x': 3.0}]
    import copy
    data_copy = copy.deepcopy(data)
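    codeflash_output = _normalize_numbers(data); _ = codeflash_output # 1.72μs -> 1.54μs (11.5% faster)
    # Compare reprs so that an in-place 1.0 -> 1 rewrite would be detected (1 == 1.0 is True)
    assert repr(data) == repr(data_copy)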
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from typing import Any

# imports
import pytest  # used for our unit tests
from skyvern.forge.sdk.core.security import _normalize_numbers

# unit tests

# ----------------------------
# Basic Test Cases
# ----------------------------

def test_int_is_unchanged():
    # Integers should be returned unchanged
    codeflash_output = _normalize_numbers(5) # 447ns -> 420ns (6.43% faster)
    codeflash_output = _normalize_numbers(-10) # 338ns -> 296ns (14.2% faster)
    codeflash_output = _normalize_numbers(0) # 203ns -> 182ns (11.5% faster)

def test_float_is_integer():
    # Floats that are mathematically integers should be converted to int
    codeflash_output = _normalize_numbers(5.0) # 513ns -> 500ns (2.60% faster)
    codeflash_output = _normalize_numbers(-10.0) # 193ns -> 197ns (2.03% slower)
    codeflash_output = _normalize_numbers(0.0) # 127ns -> 145ns (12.4% slower)

def test_float_is_not_integer():
    # Non-integer floats should be returned unchanged
    codeflash_output = _normalize_numbers(3.14) # 397ns -> 414ns (4.11% slower)
    codeflash_output = _normalize_numbers(-2.718) # 128ns -> 124ns (3.23% faster)

def test_simple_list():
    # Lists containing ints, integer floats, and non-integer floats
    codeflash_output = _normalize_numbers([1, 2.0, 3.5]) # 1.47μs -> 1.26μs (16.2% faster)

def test_simple_dict():
    # Dicts containing ints, integer floats, and non-integer floats
    codeflash_output = _normalize_numbers({'a': 1, 'b': 2.0, 'c': 3.5}) # 1.64μs -> 1.56μs (4.86% faster)

# ----------------------------
# Edge Test Cases
# ----------------------------

def test_nested_structures():
    # Nested lists and dicts
    data = {'a': [1.0, 2.5, {'b': 3.0}], 'c': {'d': [4.0, 5.1]}}
    expected = {'a': [1, 2.5, {'b': 3}], 'c': {'d': [4, 5.1]}}
    codeflash_output = _normalize_numbers(data) # 2.87μs -> 2.60μs (10.5% faster)

def test_empty_list_and_dict():
    # Empty structures should be returned as is
    codeflash_output = _normalize_numbers([]) # 631ns -> 502ns (25.7% faster)
    codeflash_output = _normalize_numbers({}) # 589ns -> 542ns (8.67% faster)

def test_list_of_dicts():
    # List of dicts with mixed types
    data = [{'a': 1.0}, {'b': 2.2}, {'c': 3}]
    expected = [{'a': 1}, {'b': 2.2}, {'c': 3}]
    codeflash_output = _normalize_numbers(data) # 2.32μs -> 2.13μs (8.91% faster)

def test_dict_with_list_values():
    # Dict with list values
    data = {'a': [1.0, 2.1], 'b': [3.0, 4]}
    expected = {'a': [1, 2.1], 'b': [3, 4]}
    codeflash_output = _normalize_numbers(data) # 2.20μs -> 1.94μs (13.2% faster)

def test_no_conversion_for_str_or_bool():
    # Strings, bools, and None should be left unchanged
    codeflash_output = _normalize_numbers("5.0") # 383ns -> 354ns (8.19% faster)
    codeflash_output = _normalize_numbers(True) # 446ns -> 298ns (49.7% faster)
    codeflash_output = _normalize_numbers(None) # 225ns -> 140ns (60.7% faster)

def test_dict_with_non_str_keys():
    # Dicts with non-string keys should be handled correctly
    data = {1: 2.0, (3, 4): 5.5}
    expected = {1: 2, (3, 4): 5.5}
    codeflash_output = _normalize_numbers(data) # 1.40μs -> 1.44μs (2.51% slower)

def test_list_with_mixed_types():
    # List with ints, floats, strings, bools, None
    data = [1, 2.0, 3.5, "4.0", False, None]
    expected = [1, 2, 3.5, "4.0", False, None]
    codeflash_output = _normalize_numbers(data) # 1.87μs -> 1.49μs (26.0% faster)

def test_dict_with_bool_and_none_values():
    # Dict with bool and None values
    data = {'a': True, 'b': None, 'c': 1.0}
    expected = {'a': True, 'b': None, 'c': 1}
    codeflash_output = _normalize_numbers(data) # 1.64μs -> 1.49μs (10.1% faster)

def test_float_special_values():
    # Test with special float values (inf, -inf, nan)
    import math
    codeflash_output = _normalize_numbers([float('inf'), float('-inf'), float('nan')]); result = codeflash_output # 1.15μs -> 962ns (19.5% faster)

def test_tuple_is_unchanged():
    # Tuples should be left unchanged (as per implementation)
    data = (1.0, 2.2, 3)
    codeflash_output = _normalize_numbers(data) # 498ns -> 351ns (41.9% faster)

def test_set_is_unchanged():
    # Sets should be left unchanged (as per implementation)
    data = {1.0, 2, 3.5}
    codeflash_output = _normalize_numbers(data) # 329ns -> 346ns (4.91% slower)

# ----------------------------
# Large Scale Test Cases
# ----------------------------

def test_large_list_of_floats():
    # Large list of floats, half integer-valued, half not
    size = 1000
    data = [float(i) if i % 2 == 0 else i + 0.5 for i in range(size)]
    expected = [int(i) if i % 2 == 0 else i + 0.5 for i in range(size)]
    codeflash_output = _normalize_numbers(data) # 47.7μs -> 45.1μs (5.66% faster)

def test_large_nested_structure():
    # Large nested structure of dicts and lists
    size = 100
    data = [{'a': [float(i), i + 0.1]} for i in range(size)]
    expected = [{'a': [i, i + 0.1]} for i in range(size)]
    codeflash_output = _normalize_numbers(data) # 37.3μs -> 32.5μs (14.7% faster)

def test_deeply_nested_structure():
    # Deeply nested structure (depth 10)
    data = 1.0
    for _ in range(10):
        data = [data]
    # After normalization, all should be int
    codeflash_output = _normalize_numbers(data); result = codeflash_output # 1.93μs -> 1.67μs (16.1% faster)
    # Unwrap 10 times and check value
    for _ in range(10):
        result = result[0]
    assert result == 1

def test_large_dict():
    # Large dictionary with float values
    size = 1000
    data = {i: float(i) for i in range(size)}
    expected = {i: i for i in range(size)}
    codeflash_output = _normalize_numbers(data) # 67.5μs -> 67.6μs (0.096% slower)

def test_large_mixed_types():
    # Large list with mixed types
    size = 500
    data = [float(i) if i % 3 == 0 else i if i % 3 == 1 else str(i) for i in range(size)]
    expected = [int(i) if i % 3 == 0 else i if i % 3 == 1 else str(i) for i in range(size)]
    codeflash_output = _normalize_numbers(data) # 34.1μs -> 24.7μs (37.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-_normalize_numbers-mjah5e9n and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 17, 2025 20:39
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Dec 17, 2025