@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 34% (0.34x) speedup for IVP.solve in quantecon/_ivp.py

⏱️ Runtime : 26.6 milliseconds → 19.9 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 33% speedup primarily through eliminating repeated array allocations in the integration trajectory building process.

Key Optimizations Applied:

  1. Numba-accelerated array operations: Added @njit compiled functions _hstack_numba() and _vstack_numba() that replace the expensive np.hstack() and np.vstack() calls. These eliminate Python function call overhead and optimize memory access patterns.

  2. Batch array construction strategy: Instead of repeatedly calling np.vstack() to grow the solution array at each integration step (which creates new arrays each time), the optimized version:

    • Collects integration steps in a Python list during the loop
    • Performs a single bulk np.vstack() operation at the end
    • This reduces memory allocations from O(n²) to O(n) where n is the number of integration steps
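The two techniques above can be sketched together. This is a simplified illustration, not the library code: the helper name `_hstack_row` and the Euler stepper are stand-ins (the real `_integrate_fixed_trajectory` drives `scipy.integrate.ode`, and the actual compiled helpers are `_hstack_numba`/`_vstack_numba`):

```python
import numpy as np

try:
    from numba import njit  # the optimized code compiles its stack helpers
except ImportError:  # pure-Python fallback so this sketch runs without numba
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func

@njit
def _hstack_row(t, y):
    # Build one [t, y0, ..., yn] row without np.hstack's generic overhead
    out = np.empty(y.shape[0] + 1)
    out[0] = t
    out[1:] = y
    return out

def integrate_fixed_trajectory_sketch(f, t0, y0, h, T):
    """Collect rows in a Python list, then stack once at the end."""
    t = t0
    y = np.asarray(y0, dtype=np.float64)
    rows = [_hstack_row(t, y)]  # list append is amortized O(1)
    while (h > 0 and t + h <= T) or (h < 0 and t + h >= T):
        # a plain Euler step stands in for the real ODE integrator step
        y = y + h * np.asarray(f(t, y), dtype=np.float64)
        t = t + h
        rows.append(_hstack_row(t, y))
    return np.vstack(rows)  # single bulk allocation instead of one per step
```

The key point is that `np.vstack` appears exactly once, after the loop, so each row is copied a constant number of times regardless of trajectory length.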

Why This Leads to Speedup:

The original profiler results show that np.hstack() and np.vstack() operations consumed ~52% of total runtime (26.3% + 25.7% in _integrate_fixed_trajectory). The optimized version reduces this overhead significantly by:

  • Faster compiled array operations via Numba
  • Eliminating quadratic memory growth from repeated vstacking
  • Reducing temporary object creation and garbage collection pressure
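The quadratic-copy behaviour is easy to see in isolation. In the sketch below (illustrative only, not the library code), both functions produce identical arrays, but the first copies every previously stored row on each iteration while the second copies each row once at the end:

```python
import numpy as np

def grow_by_vstack(rows):
    """Quadratic: each np.vstack reallocates and copies all previous rows."""
    out = np.empty((0, rows[0].size))
    for r in rows:
        out = np.vstack((out, r))  # full copy of `out` on every step
    return out

def grow_by_list(rows):
    """Linear: the list holds references; one bulk copy at the end."""
    buf = []
    for r in rows:
        buf.append(r)
    return np.vstack(buf)

rows = [np.array([float(i), float(i) ** 2]) for i in range(500)]
assert np.array_equal(grow_by_vstack(rows), grow_by_list(rows))
```

For n rows of width m, the first variant performs roughly m·n²/2 element copies versus m·n for the second, which is where the eliminated garbage-collection pressure comes from.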

Performance Impact by Test Cases:

The optimization is most effective for:

  • Long integrations (1000+ steps): 30-43% speedup as seen in test_large_number_of_steps and test_large_backward_integration
  • High-dimensional systems: 16.5% speedup for 100-dimensional ODEs
  • Variable trajectory cases: 36-43% speedup when many steps are collected

The optimization maintains identical behavior and accuracy while providing consistent performance gains across all integration scenarios, with larger benefits for computationally intensive cases involving many integration steps.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 70 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest
from quantecon._ivp import IVP
from scipy import integrate

# unit tests

# --------- BASIC TEST CASES ---------

def test_scalar_ode_fixed_terminal():
    # dy/dt = -y, y(0) = 1, analytic: y(t) = exp(-t)
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.1, 1.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 97.2μs -> 82.0μs (18.6% faster)
    # Check solution is close to analytic at each recorded time
    analytic = np.exp(-sol[:, 0])
    assert np.allclose(sol[:, 1], analytic, atol=1e-4)

def test_vector_ode_fixed_terminal():
    # dy/dt = [-y0, -2*y1], y(0) = [1, 2]
    f = lambda t, y: np.array([-y[0], -2*y[1]])
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0, 2.0], 0.1, 1.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 136μs -> 123μs (10.8% faster)
    # Analytic: y0 = exp(-t), y1 = 2*exp(-2t)
    analytic0 = np.exp(-sol[:, 0])
    analytic1 = 2 * np.exp(-2 * sol[:, 0])
    assert np.allclose(sol[:, 1], analytic0, atol=1e-4)
    assert np.allclose(sol[:, 2], analytic1, atol=1e-4)

def test_scalar_ode_variable_stop():
    # dy/dt = -y, y(0) = 1, stop when y < 0.5
    f = lambda t, y: -y
    g = lambda t, y: y[0]  # stop when y[0] < tol
    ivp = IVP(f)
    t0, y0, h, tol = 0.0, [1.0], 0.1, 0.5
    codeflash_output = ivp.solve(t0, y0, h=h, g=g, tol=tol); sol = codeflash_output # 70.6μs -> 61.6μs (14.6% faster)

def test_list_returning_rhs():
    # dy/dt = [y0+y1, y0-y1], y(0) = [1, 2]
    f = lambda t, y: [y[0]+y[1], y[0]-y[1]]
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0, 2.0], 0.1, 0.5
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 67.0μs -> 61.7μs (8.64% faster)

def test_negative_h():
    # dy/dt = y, y(1) = 1, integrate backward to t=0
    f = lambda t, y: y
    ivp = IVP(f)
    t0, y0, h, T = 1.0, [1.0], -0.1, 0.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 77.6μs -> 63.9μs (21.4% faster)
    # Analytic: y(t) = exp(t-1)
    analytic = np.exp(sol[:, 0] - 1)
    assert np.allclose(sol[:, 1], analytic, atol=1e-4)

def test_fixed_and_variable_stop_priority():
    # Should use variable stop if both g/tol and T are given
    f = lambda t, y: -y
    g = lambda t, y: y[0]
    ivp = IVP(f)
    t0, y0, h, T, tol = 0.0, [1.0], 0.1, 2.0, 0.5
    codeflash_output = ivp.solve(t0, y0, h=h, T=T, g=g, tol=tol); sol = codeflash_output # 70.6μs -> 60.3μs (17.0% faster)

# --------- EDGE TEST CASES ---------

def test_missing_T_and_g_raises():
    # Should raise ValueError if neither T nor g/tol are given
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h = 0.0, [1.0], 0.1
    with pytest.raises(ValueError):
        ivp.solve(t0, y0, h=h) # 9.96μs -> 10.0μs (0.836% slower)

def test_variable_stop_missing_tol_raises():
    # Should raise if g is given but tol is not
    f = lambda t, y: -y
    g = lambda t, y: y[0]
    ivp = IVP(f)
    t0, y0, h = 0.0, [1.0], 0.1
    with pytest.raises(ValueError):
        ivp.solve(t0, y0, h=h, g=g) # 10.0μs -> 10.1μs (0.823% slower)

def test_variable_stop_missing_g_raises():
    # Should raise if tol is given but g is not
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h = 0.0, [1.0], 0.1
    with pytest.raises(ValueError):
        ivp.solve(t0, y0, h=h, tol=0.5) # 9.79μs -> 10.0μs (2.48% slower)

def test_zero_step_size():
    # h=0 should not progress; should return only initial condition
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.0, 1.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 26.5μs -> 27.1μs (2.15% slower)

def test_empty_y0():
    # y0 is empty: should raise or handle gracefully
    f = lambda t, y: []
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [], 0.1, 1.0
    with pytest.raises(Exception):
        ivp.solve(t0, y0, h=h, T=T) # 22.5μs -> 21.7μs (3.45% faster)

def test_non_numeric_y0():
    # y0 is non-numeric: should raise
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, ["a"], 0.1, 1.0
    with pytest.raises(Exception):
        ivp.solve(t0, y0, h=h, T=T) # 7.12μs -> 7.25μs (1.72% slower)

def test_non_callable_g():
    # g is not callable: should raise
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, tol = 0.0, [1.0], 0.1, 0.5
    with pytest.raises(Exception):
        ivp.solve(t0, y0, h=h, g=5, tol=tol) # 31.9μs -> 26.6μs (19.7% faster)

def test_large_number_of_steps():
    # Integrate with many steps (but <1000)
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.001, 0.5  # 500 steps
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 3.55ms -> 2.73ms (30.1% faster)

def test_large_vector_ode():
    # Large system: dy/dt = -y, y0 = np.arange(100)
    n = 100
    f = lambda t, y: -y
    y0 = np.arange(n, dtype=float)
    ivp = IVP(f)
    t0, h, T = 0.0, 0.1, 1.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 110μs -> 95.0μs (16.5% faster)

def test_large_variable_stop():
    # Large system, variable stop
    n = 50
    f = lambda t, y: -y
    g = lambda t, y: np.max(y)
    y0 = np.arange(n, dtype=float) + 1
    ivp = IVP(f)
    t0, h, tol = 0.0, 0.05, 0.1
    codeflash_output = ivp.solve(t0, y0, h=h, g=g, tol=tol); sol = codeflash_output # 1.37ms -> 955μs (43.3% faster)

def test_performance_under_large_steps():
    # Not a performance test, but ensure function completes for large steps
    f = lambda t, y: -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.5, 10.0  # 20 steps
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 233μs -> 207μs (12.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
# imports
import pytest
from quantecon._ivp import IVP
from scipy import integrate

# ------------------- UNIT TESTS -------------------

# 1. BASIC TEST CASES

def test_linear_scalar_ode_fixed_terminal():
    # dy/dt = -y, y(0) = 1, exact: y(t) = exp(-t)
    def f(t, y): return -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.1, 1.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 96.7μs -> 82.4μs (17.4% faster)
    # Check solution is close to analytical at final time
    y_true = np.exp(-sol[-1, 0])
    assert abs(sol[-1, 1] - y_true) < 1e-4

def test_linear_vector_ode_fixed_terminal():
    # dy/dt = A y, y(0) = [1, 0], A = [[0, 1], [-1, 0]], solution: rotation
    def f(t, y): return np.dot(np.array([[0, 1], [-1, 0]]), y)
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0, 0.0], 0.05, np.pi/2
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 461μs -> 419μs (9.98% faster)
    # A = [[0, 1], [-1, 0]] generates clockwise rotation: y(t) = (cos t, -sin t)
    t_final = sol[-1, 0]
    y_true = [np.cos(t_final), -np.sin(t_final)]
    assert np.allclose(sol[-1, 1:], y_true, atol=1e-3)

def test_variable_trajectory_stopping_condition():
    # dy/dt = -2y, y(0)=2, stop when y < 0.1
    def f(t, y): return -2*y
    def g(t, y, *args): return y[0]
    ivp = IVP(f)
    t0, y0, h = 0.0, [2.0], 0.2
    tol = 0.1
    codeflash_output = ivp.solve(t0, y0, h=h, g=g, tol=tol); sol = codeflash_output # 146μs -> 136μs (7.65% faster)
    if sol.shape[0] > 1:
        # last recorded point should satisfy the stopping condition g(t, y) <= tol
        assert sol[-1, 1] <= tol

def test_integrator_choice():
    # Test with integrator 'vode'
    def f(t, y): return -y
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=0.1, T=1.0, integrator='vode'); sol = codeflash_output # 89.5μs -> 76.0μs (17.7% faster)
    # Test with integrator 'lsoda'
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=0.1, T=1.0, integrator='lsoda'); sol = codeflash_output # 94.0μs -> 78.5μs (19.8% faster)

# 2. EDGE TEST CASES

def test_zero_step_size():
    # h=0 should result in a solution with only the initial condition
    def f(t, y): return y
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=0.0, T=0.0); sol = codeflash_output # 26.9μs -> 27.7μs (2.86% slower)

def test_negative_step_backward_integration():
    # Integrate backward: dy/dt = y, y(0)=1, integrate to t=-1
    def f(t, y): return y
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=-0.1, T=-1.0); sol = codeflash_output # 78.3μs -> 65.0μs (20.5% faster)

def test_non_array_initial_condition():
    # y0 as scalar
    def f(t, y): return -y
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, 1.0, h=0.1, T=1.0); sol = codeflash_output # 95.8μs -> 81.8μs (17.2% faster)

def test_missing_T_and_g_raises():
    # Should raise ValueError if neither T nor (g, tol) are provided
    def f(t, y): return y
    ivp = IVP(f)
    with pytest.raises(ValueError):
        ivp.solve(0.0, [1.0]) # 10.5μs -> 10.6μs (1.18% slower)

def test_g_without_tol_raises():
    # Should raise ValueError if g is provided without tol
    def f(t, y): return y
    def g(t, y, *args): return y[0]
    ivp = IVP(f)
    with pytest.raises(ValueError):
        ivp.solve(0.0, [1.0], g=g) # 10.5μs -> 10.7μs (1.17% slower)

def test_tol_without_g_raises():
    # Should raise ValueError if tol is provided without g
    def f(t, y): return y
    ivp = IVP(f)
    with pytest.raises(ValueError):
        ivp.solve(0.0, [1.0], tol=1e-2) # 10.3μs -> 10.4μs (0.405% slower)

def test_stop_condition_met_immediately():
    # g returns < tol at initial condition
    def f(t, y): return y
    def g(t, y, *args): return 0.0
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=0.1, g=g, tol=0.5); sol = codeflash_output # 22.3μs -> 23.2μs (3.59% slower)

def test_large_tol_stops_after_one_step():
    # g returns > tol at initial, < tol after one step
    def f(t, y): return -1.0
    def g(t, y, *args): return y[0]
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [2.0], h=1.0, g=g, tol=1.5); sol = codeflash_output # 24.9μs -> 25.3μs (1.48% slower)

def test_jacobian_argument():
    # Pass jacobian and check no error
    def f(t, y): return -y
    def jac(t, y): return np.array([[-1.0]])
    ivp = IVP(f, jac=jac)
    codeflash_output = ivp.solve(0.0, [1.0], h=0.1, T=1.0); sol = codeflash_output # 97.2μs -> 82.7μs (17.5% faster)

# 3. LARGE SCALE TEST CASES

def test_long_time_integration():
    # Integrate dy/dt = -y, y(0)=1, for t in [0, 10] with h=0.01 (1000 steps)
    def f(t, y): return -y
    ivp = IVP(f)
    t0, y0, h, T = 0.0, [1.0], 0.01, 10.0
    codeflash_output = ivp.solve(t0, y0, h=h, T=T); sol = codeflash_output # 7.20ms -> 5.42ms (32.8% faster)

def test_high_dimensional_system():
    # dy/dt = -y, y in R^50, y(0) = np.arange(50)
    def f(t, y): return -y
    y0 = np.arange(50.0)
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, y0, h=0.5, T=5.0); sol = codeflash_output # 135μs -> 126μs (7.25% faster)
    # All values should decay exponentially
    t_final = sol[-1, 0]
    y_expected = y0 * np.exp(-t_final)
    assert np.allclose(sol[-1, 1:], y_expected, atol=1e-3)

def test_large_variable_trajectory():
    # dy/dt = -1, y(0)=1000, stop when y < 0, h=1, should take 1000 steps
    def f(t, y): return -1.0
    def g(t, y, *args): return y[0]
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1000.0], h=1.0, g=g, tol=0.0); sol = codeflash_output # 6.43ms -> 4.71ms (36.7% faster)

def test_large_backward_integration():
    # dy/dt = y, y(0)=1, integrate to t=-10, h=-0.01 (1000 steps)
    def f(t, y): return y
    ivp = IVP(f)
    codeflash_output = ivp.solve(0.0, [1.0], h=-0.01, T=-10.0); sol = codeflash_output # 5.63ms -> 3.94ms (43.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-IVP.solve-mja75vkd` and push.


@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 17, 2025 15:59
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025