Conversation

@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 32% (0.32x) speedup for `_is_approx_fp` in `quantecon/_compute_fp.py`

⏱️ Runtime: 238 microseconds → 180 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code introduces a **Numba-compiled function** (`_numba_max_abs_diff`) that replaces the NumPy operation `np.max(np.abs(result - v))` in the performance-critical path. The key optimization is avoiding NumPy's temporary array creation and vectorized operations in favor of a simple loop that Numba can compile to fast machine code.
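
A minimal sketch of what such a helper can look like is below; the loop body and the NaN handling are illustrative assumptions rather than the PR's exact code (note in particular that `fastmath=True` licenses the compiler to assume NaN-free arithmetic, so the real helper may treat NaN differently):

```python
import numpy as np
from numba import njit

@njit(fastmath=True, cache=True)
def _numba_max_abs_diff(a, b):
    # Single pass over the data: track the running maximum of
    # |a[i] - b[i]| without materializing a - b or abs(a - b).
    max_diff = 0.0
    for i in range(a.shape[0]):
        d = abs(a[i] - b[i])
        if np.isnan(d):
            return d  # propagate NaN, mirroring np.max on NaN-bearing input
        if d > max_diff:
            max_diff = d
    return max_diff
```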

**Key optimizations:**

1. **Numba JIT compilation**: The `@njit(fastmath=True, cache=True)` decorator compiles the max absolute difference calculation to optimized machine code, eliminating Python overhead and enabling aggressive floating-point optimizations.
2. **Memory efficiency**: The manual loop avoids creating the temporary arrays that `np.abs(result - v)` would generate, reducing memory allocation and cache pressure.
3. **Smart fallback logic**: The optimization only applies when both inputs are NumPy arrays with matching shapes, dtypes, and numeric types, ensuring correctness for edge cases (see the dispatch sketch below).
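
Item 3 suggests the public function dispatches between the fast path and the original NumPy path. A sketch of that shape, assuming guard conditions along the lines the summary describes (not the PR's exact code):

```python
import numpy as np

def _is_approx_fp(T, v, error_tol, *args, **kwargs):
    Tv = T(v, *args, **kwargs)
    if (isinstance(Tv, np.ndarray) and isinstance(v, np.ndarray)
            and Tv.shape == v.shape and Tv.dtype == v.dtype
            and np.issubdtype(Tv.dtype, np.number)):
        # Fast path: matching numeric ndarrays go through the Numba loop
        # (the _numba_max_abs_diff sketch above).
        error = _numba_max_abs_diff(Tv.ravel(), v.ravel())
    else:
        # Fallback: lists, dtype mismatches, etc. keep the original NumPy
        # computation and its error behavior (ValueError on shape mismatch,
        # TypeError on non-numeric dtypes).
        error = np.max(np.abs(np.asarray(Tv) - np.asarray(v)))
    return error <= error_tol
```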

**Why this leads to speedup:**

- Numba's compiled loop is faster than NumPy's general-purpose vectorized operations for this specific computation pattern
- It eliminates the temporary array allocations from `result - v` and `np.abs(...)`
- The `fastmath=True` flag enables unsafe floating-point optimizations that can further accelerate the computation
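
To reproduce the comparison locally, a small harness along these lines works (timings will vary by machine; `_numba_max_abs_diff` is the sketch above):

```python
import timeit
import numpy as np

v = np.random.rand(1_000)
w = v + 1e-6

# Warm up once so JIT compilation time is excluded from the measurement.
_numba_max_abs_diff(w, v)

t_numpy = timeit.timeit(lambda: np.max(np.abs(w - v)), number=10_000)
t_numba = timeit.timeit(lambda: _numba_max_abs_diff(w, v), number=10_000)
print(f"numpy: {t_numpy:.4f}s  numba: {t_numba:.4f}s")
```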

**Impact on workloads:**
This function is called within `compute_fixed_point`, which is used in iterative algorithms for economic modeling, so the 32% speedup compounds across iterations and makes fixed-point computations noticeably faster end to end. The test results show consistent 20-75% improvements across array sizes and value ranges, with the largest gains on exact fixed points and large vectors.
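
For intuition on the compounding, a fixed-point driver calls the check once per iteration, so the per-call saving accrues over the whole run. A hypothetical loop, not `compute_fixed_point`'s actual signature:

```python
from quantecon._compute_fp import _is_approx_fp

def iterate_to_fixed_point(T, v, error_tol=1e-5, max_iter=100):
    # Hypothetical driver: apply T repeatedly until successive
    # iterates agree to within error_tol, as judged by _is_approx_fp.
    for _ in range(max_iter):
        if _is_approx_fp(T, v, error_tol):
            return v
        v = T(v)
    return v
```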

**Test case performance:**
The optimization performs best on exact matches (65-75% faster) and large arrays (25-55% faster), while maintaining correctness for edge cases like NaN/Inf values and non-array inputs through the fallback mechanism.

**Correctness verification report:**

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 51 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
**🌀 Generated Regression Tests and Runtime**
# imports
import numpy as np
import pytest  # used for our unit tests
from quantecon._compute_fp import _is_approx_fp

# unit tests

# -----------------------------
# Basic Test Cases
# -----------------------------

def test_exact_fixed_point_scalar():
    # T returns v unchanged, so error = 0 <= error_tol
    def T(x):
        return x
    v = np.array([1.0])
    codeflash_output = _is_approx_fp(T, v, 1e-8) # 3.92μs -> 2.33μs (67.9% faster)

def test_exact_fixed_point_vector():
    # T returns v unchanged, so error = 0 <= error_tol
    def T(x):
        return x
    v = np.array([1.0, -2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 1e-8) # 3.79μs -> 2.29μs (65.5% faster)

def test_small_difference_within_tol():
    # T perturbs v by a small amount, within tolerance
    def T(x):
        return x + 1e-6
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 1e-5) # 4.29μs -> 3.21μs (33.8% faster)

def test_difference_exceeds_tol():
    # T perturbs v by more than tolerance
    def T(x):
        return x + 0.1
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.05) # 4.25μs -> 3.08μs (37.9% faster)

def test_negative_values():
    # Test with negative numbers, within tolerance
    def T(x):
        return x - 1e-7
    v = np.array([-1.0, -2.0, -3.0])
    codeflash_output = _is_approx_fp(T, v, 1e-6) # 4.29μs -> 3.25μs (32.1% faster)

# -----------------------------
# Edge Test Cases
# -----------------------------

def test_zero_tolerance_exact_match():
    # If error_tol is zero, only exact match passes
    def T(x):
        return x
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 3.67μs -> 2.25μs (63.0% faster)

def test_zero_tolerance_inexact_match():
    # Any nonzero error fails if error_tol is zero
    def T(x):
        return x + 1e-10
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 4.17μs -> 3.08μs (35.1% faster)

def test_single_element_vector():
    # Single element, within tolerance
    def T(x):
        return x + 1e-8
    v = np.array([42.0])
    codeflash_output = _is_approx_fp(T, v, 1e-7) # 4.33μs -> 3.17μs (36.9% faster)

def test_nan_in_vector():
    # If T(v) or v contains nan, np.abs returns nan, np.max returns nan, so error <= tol is always False
    def T(x):
        return x
    v = np.array([1.0, np.nan, 3.0])
    codeflash_output = _is_approx_fp(T, v, 1.0) # 3.92μs -> 2.25μs (74.0% faster)

def test_inf_in_vector():
    # If T(v) or v contains inf, error will be inf, so should return False unless tol is inf
    def T(x):
        return x
    v = np.array([1.0, np.inf, 3.0])
    codeflash_output = _is_approx_fp(T, v, 1e6) # 7.17μs -> 2.25μs (218% faster)
    # If tol is inf, should return True
    codeflash_output = _is_approx_fp(T, v, np.inf) # 3.08μs -> 1.00μs (208% faster)

def test_negative_tolerance():
    # Negative tolerance should never pass unless error is negative (impossible)
    def T(x):
        return x
    v = np.array([1.0, 2.0])
    codeflash_output = _is_approx_fp(T, v, -1.0) # 3.75μs -> 2.25μs (66.7% faster)

def test_function_with_args_kwargs():
    # T uses extra arguments
    def T(x, a, b=1.0):
        return x * a + b
    v = np.array([2.0, 4.0])
    # T(v, 1, b=0) == v, so error = 0
    codeflash_output = _is_approx_fp(T, v, 1e-8, 1, b=0) # 6.21μs -> 5.04μs (23.1% faster)
    # T(v, 2, b=0) == 2*v, error = max(abs(2*v - v)) = max(abs(v)) = 4.0
    codeflash_output = _is_approx_fp(T, v, 3.0, 2, b=0) # 3.71μs -> 2.71μs (36.9% faster)

# -----------------------------
# Large Scale Test Cases
# -----------------------------

def test_large_vector_within_tol():
    # Large vector, all differences within tolerance
    n = 1000
    v = np.linspace(-100, 100, n)
    def T(x):
        return x + 1e-6
    codeflash_output = _is_approx_fp(T, v, 1e-5) # 4.88μs -> 3.88μs (25.8% faster)

def test_large_vector_exceeds_tol():
    # Large vector, one element exceeds tolerance
    n = 1000
    v = np.linspace(-100, 100, n)
    def T(x):
        y = np.copy(x)
        y[500] += 1.0  # large jump
        return y
    codeflash_output = _is_approx_fp(T, v, 0.5) # 6.00μs -> 5.04μs (19.0% faster)

def test_large_vector_all_elements_exceed_tol():
    # Large vector, all elements exceed tolerance
    n = 1000
    v = np.linspace(-100, 100, n)
    def T(x):
        return x + 10.0
    codeflash_output = _is_approx_fp(T, v, 5.0) # 4.67μs -> 3.71μs (25.8% faster)

def test_large_vector_all_elements_exact():
    # Large vector, T returns v exactly
    n = 1000
    v = np.random.rand(n)
    def T(x):
        return x
    codeflash_output = _is_approx_fp(T, v, 1e-12) # 4.42μs -> 2.83μs (55.9% faster)

def test_large_vector_random_noise_within_tol():
    # Large vector, random noise within tolerance
    np.random.seed(42)
    n = 1000
    v = np.random.normal(0, 1, n)
    noise = np.random.uniform(-1e-6, 1e-6, n)
    def T(x):
        return x + noise
    codeflash_output = _is_approx_fp(T, v, 2e-6) # 4.71μs -> 3.71μs (27.0% faster)

def test_large_vector_random_noise_exceeds_tol():
    # Large vector, random noise, some elements exceed tolerance
    np.random.seed(43)
    n = 1000
    v = np.random.normal(0, 1, n)
    noise = np.random.uniform(-1e-2, 1e-2, n)
    def T(x):
        return x + noise
    codeflash_output = _is_approx_fp(T, v, 5e-3) # 4.62μs -> 3.58μs (29.1% faster)

# -----------------------------
# Additional Tests for Mutation Coverage
# -----------------------------

def test_non_array_input():
    # Accepts python list as input (should work due to numpy conversion)
    def T(x):
        return np.array(x)
    v = [1.0, 2.0, 3.0]
    codeflash_output = _is_approx_fp(T, v, 1e-8) # 4.88μs -> 6.62μs (26.4% slower)

def test_output_shape_mismatch():
    # If T returns array of different shape, should raise ValueError
    def T(x):
        return np.append(x, 0)
    v = np.array([1.0, 2.0, 3.0])
    with pytest.raises(ValueError):
        _is_approx_fp(T, v, 1e-8) # 7.58μs -> 7.71μs (1.62% slower)

def test_non_numeric_input():
    # If v contains non-numeric, should raise TypeError
    def T(x):
        return x
    v = np.array(['a', 'b', 'c'])
    with pytest.raises(TypeError):
        _is_approx_fp(T, v, 1e-8) # 2.83μs -> 4.29μs (34.0% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
# imports
import numpy as np
import pytest  # used for our unit tests
from quantecon._compute_fp import _is_approx_fp

# unit tests

# ---------------------- Basic Test Cases ----------------------

def test_exact_fixed_point_scalar():
    # T is identity, v is fixed point, error_tol=0
    T = lambda x: x
    v = np.array([1.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 3.92μs -> 2.38μs (64.9% faster)

def test_exact_fixed_point_vector():
    # T is identity, v is fixed point, error_tol=0
    T = lambda x: x
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 3.79μs -> 2.17μs (75.1% faster)

def test_within_tolerance():
    # T perturbs v by less than error_tol
    T = lambda x: x + 0.001
    v = np.array([2.0])
    codeflash_output = _is_approx_fp(T, v, 0.01) # 4.38μs -> 3.25μs (34.6% faster)

def test_outside_tolerance():
    # T perturbs v by more than error_tol
    T = lambda x: x + 0.1
    v = np.array([2.0])
    codeflash_output = not _is_approx_fp(T, v, 0.01) # 4.25μs -> 3.12μs (36.0% faster)

def test_negative_values():
    # T is identity, v contains negatives, should be fixed point
    T = lambda x: x
    v = np.array([-1.0, -2.0, -3.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 3.75μs -> 2.25μs (66.7% faster)

def test_multiple_elements_within_tolerance():
    # T perturbs each element by less than error_tol
    T = lambda x: x + np.array([0.001, -0.002, 0.0005])
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.01) # 4.46μs -> 3.38μs (32.1% faster)

def test_multiple_elements_outside_tolerance():
    # T perturbs one element by more than error_tol
    T = lambda x: x + np.array([0.001, 0.02, 0.0005])
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = not _is_approx_fp(T, v, 0.01) # 4.33μs -> 3.21μs (35.1% faster)

# ---------------------- Edge Test Cases ----------------------

def test_zero_vector():
    # v is all zeros, T is identity
    T = lambda x: x
    v = np.zeros(5)
    codeflash_output = _is_approx_fp(T, v, 0.0) # 3.96μs -> 2.25μs (75.9% faster)

def test_error_tol_zero_exact():
    # T is identity, error_tol=0, should pass
    T = lambda x: x
    v = np.array([1.0, 2.0])
    codeflash_output = _is_approx_fp(T, v, 0.0) # 7.67μs -> 4.42μs (73.6% faster)

def test_error_tol_zero_not_exact():
    # T perturbs v, error_tol=0, should fail
    T = lambda x: x + 1e-10
    v = np.array([1.0, 2.0])
    codeflash_output = not _is_approx_fp(T, v, 0.0) # 5.50μs -> 4.88μs (12.8% faster)

def test_large_and_small_values():
    # v contains both large and small numbers, T perturbs slightly
    T = lambda x: x + np.array([1e-10, -1e10])
    v = np.array([1e-10, 1e10])
    # max error is 2e10, so should fail for small error_tol
    codeflash_output = not _is_approx_fp(T, v, 1e-5) # 4.71μs -> 3.58μs (31.4% faster)
    # should pass for large error_tol
    codeflash_output = _is_approx_fp(T, v, 2e10) # 2.88μs -> 1.79μs (60.4% faster)

def test_nan_in_vector():
    # v contains nan, T is identity; np.abs(nan) is nan, np.max(nan) is nan, nan <= error_tol is False
    T = lambda x: x
    v = np.array([1.0, np.nan])
    codeflash_output = not _is_approx_fp(T, v, 1.0) # 3.83μs -> 2.33μs (64.3% faster)

def test_inf_in_vector():
    # v contains inf, T is identity; np.abs(inf) is inf, np.max(inf) is inf, inf <= error_tol is False
    T = lambda x: x
    v = np.array([1.0, np.inf])
    codeflash_output = not _is_approx_fp(T, v, 1e20) # 8.62μs -> 2.29μs (276% faster)

def test_negative_error_tol():
    # error_tol negative, should only pass if error is negative (impossible)
    T = lambda x: x
    v = np.array([1.0])
    codeflash_output = not _is_approx_fp(T, v, -1.0) # 3.92μs -> 2.29μs (70.9% faster)

def test_non_array_input():
    # v is a list, should be handled as np.array
    T = lambda x: np.array(x) + 1e-6
    v = [1.0, 2.0, 3.0]
    codeflash_output = _is_approx_fp(T, v, 1e-5) # 5.58μs -> 9.75μs (42.7% slower)

def test_function_with_args_kwargs():
    # T uses an extra argument
    def T(x, scale=1.0):
        return x * scale
    v = np.array([2.0, 4.0])
    # scale=1.0: fixed point
    codeflash_output = _is_approx_fp(T, v, 0.0, scale=1.0) # 5.17μs -> 3.92μs (31.9% faster)
    # scale=1.1: not fixed point
    codeflash_output = not _is_approx_fp(T, v, 0.05, scale=1.1) # 2.83μs -> 1.88μs (51.1% faster)

# ---------------------- Large Scale Test Cases ----------------------

def test_large_vector_fixed_point():
    # Large v, T is identity
    T = lambda x: x
    v = np.ones(1000)
    codeflash_output = _is_approx_fp(T, v, 0.0) # 5.04μs -> 3.00μs (68.1% faster)

def test_large_vector_small_perturbation():
    # Large v, T perturbs by small value
    T = lambda x: x + 1e-9
    v = np.ones(1000)
    codeflash_output = _is_approx_fp(T, v, 1e-8) # 5.25μs -> 4.21μs (24.7% faster)

def test_large_vector_large_perturbation():
    # Large v, T perturbs by large value
    T = lambda x: x + 1.0
    v = np.ones(1000)
    codeflash_output = not _is_approx_fp(T, v, 0.5) # 5.08μs -> 4.08μs (24.5% faster)

def test_large_vector_one_large_perturbation():
    # Large v, only one element perturbed above tol
    def T(x):
        y = x.copy()
        y[500] += 1.0
        return y
    v = np.ones(1000)
    codeflash_output = not _is_approx_fp(T, v, 0.5) # 6.33μs -> 5.17μs (22.6% faster)

def test_large_vector_all_within_tolerance():
    # Large v, all elements perturbed within tol
    T = lambda x: x + np.full_like(x, 0.001)
    v = np.arange(1000, dtype=float)
    codeflash_output = _is_approx_fp(T, v, 0.01) # 7.67μs -> 6.88μs (11.5% faster)

def test_large_vector_random_perturbation():
    # Large v, random perturbation within tolerance
    rng = np.random.default_rng(123)
    perturb = rng.uniform(-0.001, 0.001, 1000)
    T = lambda x: x + perturb
    v = np.linspace(-1000, 1000, 1000)
    codeflash_output = _is_approx_fp(T, v, 0.002) # 4.92μs -> 3.83μs (28.3% faster)
    # If we lower tolerance, should fail if any perturb > tol
    codeflash_output = not _is_approx_fp(T, v, 0.0005) # 3.25μs -> 2.38μs (36.8% faster)

# ---------------------- Determinism Test ----------------------

def test_determinism():
    # Running twice with same inputs should always give same result
    T = lambda x: x + 0.01
    v = np.array([1.0, 2.0, 3.0])
    codeflash_output = _is_approx_fp(T, v, 0.02); result1 = codeflash_output # 4.42μs -> 3.25μs (35.9% faster)
    codeflash_output = _is_approx_fp(T, v, 0.02); result2 = codeflash_output # 2.71μs -> 1.62μs (66.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_is_approx_fp-mj9zqn5k` and push.

@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 17, 2025 12:31
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 17, 2025