Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 687% (6.87x) speedup for IVP._integrate_variable_trajectory in quantecon/_ivp.py

⏱️ Runtime : 2.00 seconds 255 milliseconds (best of 5 runs)

📝 Explanation and details

The optimization replaces the inefficient dynamic array growth pattern with a pre-allocated buffer and doubling strategy, delivering a 686% speedup.

Key Optimizations:

  1. Pre-allocated Buffer: Instead of starting with a single row and using np.vstack() for each step, the code pre-allocates a buffer of 1024 rows using np.empty(). This eliminates the O(n²) memory allocation overhead from repeated array concatenation.

  2. Exponential Growth Strategy: When the buffer fills up, it doubles in size rather than growing by one row. This ensures amortized O(1) insertion cost per step instead of O(n) cost for each np.vstack() operation.

  3. Direct Array Assignment: Values are written directly to pre-allocated positions (solution[row, 0] = self.t) rather than creating intermediate arrays with np.hstack() and concatenating them.

Performance Impact Analysis:

The line profiler shows the dramatic improvement - the original code spent 78.1% of time in np.vstack() operations (40 billion nanoseconds), while the optimized version eliminates this bottleneck entirely. The optimized version shows most time is now spent on the actual ODE integration (self.integrate() at 65.4%), which is the unavoidable computational work.

Test Case Performance:

  • Small trajectories (edge cases): 13-66% faster due to reduced overhead
  • Medium trajectories: 50-100% faster as buffer reallocation becomes less frequent
  • Large trajectories: 162-891% faster where the O(n²) vs O(n) difference dominates (e.g., test_large_scale_multivar shows 891% speedup)

The optimization is most effective for workloads with longer integration trajectories, making it particularly valuable for numerical simulations requiring many integration steps.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 57 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest
from quantecon._ivp import IVP
from scipy import integrate

# unit tests

# Helper ODE function: dy/dt = -y, analytical solution y(t) = y0 * exp(-t)
def simple_decay(t, y):
    return -y

# Helper event function: triggers when y < tol
def g_event(t, y, *args):
    return y[0]  # y is 1D array

# Helper ODE function: dy/dt = 1, y(t) = y0 + t
def linear_increase(t, y):
    return np.ones_like(y)

# Helper event: triggers when t > tol
def g_time_event(t, y, *args):
    return tol - t

# Helper ODE function: dy/dt = 0, constant
def zero_ode(t, y):
    return np.zeros_like(y)

# ---- Basic Test Cases ----

def test_basic_exponential_decay():
    """
    Basic: Test exponential decay, should stop when y < tol.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 0.1
    tol = 0.01
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    # Should stop when y < tol
    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 265μs -> 159μs (66.3% faster)
    # t should be increasing
    for i in range(1, sol.shape[0]):
        pass
    # y should be decreasing
    for i in range(1, sol.shape[0]):
        pass

def test_basic_linear_increase():
    """
    Basic: Test linear increase, event triggers on time.
    """
    y0 = np.array([0.0])
    t0 = 0.0
    h = 0.5
    global tol
    tol = 2.0  # event triggers when t > tol
    step = False
    relax = False

    ivp = IVP(linear_increase)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_time_event, tol, step, relax); sol = codeflash_output # 29.3μs -> 25.6μs (14.3% faster)
    # y should increase by ~1 per unit t
    for i in range(1, sol.shape[0]):
        pass

def test_edge_zero_step_size():
    """
    Edge: Zero step size should not progress.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 0.0
    tol = 0.5
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    # Should not progress if h == 0, i.e. only initial condition
    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 49.8μs -> 40.9μs (21.7% faster)

def test_edge_negative_step_size():
    """
    Edge: Negative step size, should integrate backward in time.
    """
    y0 = np.array([1.0])
    t0 = 1.0
    h = -0.1
    tol = 0.5
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 53.3ms -> 19.8ms (169% faster)
    # t should be decreasing
    for i in range(1, sol.shape[0]):
        pass

def test_edge_multidimensional_y():
    """
    Edge: Test with multidimensional y (system of ODEs).
    """
    def coupled_ode(t, y):
        return np.array([-y[0], -2*y[1]])
    def g_sum(t, y, *args):
        return y[0] + y[1]
    y0 = np.array([1.0, 2.0])
    t0 = 0.0
    h = 0.1
    tol = 0.1
    step = False
    relax = False

    ivp = IVP(coupled_ode)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_sum, tol, step, relax); sol = codeflash_output # 187μs -> 128μs (45.9% faster)

def test_edge_event_triggers_immediately():
    """
    Edge: Event triggers at initial condition.
    """
    y0 = np.array([0.0])
    t0 = 0.0
    h = 0.1
    tol = 0.1
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 15.7μs -> 12.0μs (30.9% faster)

# ---- Large Scale Test Cases ----

def test_large_scale_many_steps():
    """
    Large scale: Integrate over many steps, check for performance and correctness.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 0.01
    tol = 1e-5
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 5.51ms -> 2.80ms (96.6% faster)

def test_large_scale_high_dimensional():
    """
    Large scale: Test with high-dimensional y.
    """
    n = 50
    def decay_n(t, y):
        return -y
    def g_sum_n(t, y, *args):
        return sum(y)
    y0 = np.ones(n)
    t0 = 0.0
    h = 0.1
    tol = 1e-2
    step = False
    relax = False

    ivp = IVP(decay_n)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_sum_n, tol, step, relax); sol = codeflash_output # 797μs -> 511μs (55.8% faster)
    # All y should be positive and decreasing
    for i in range(1, sol.shape[0]):
        for j in range(1, n+1):
            pass

def test_large_scale_long_time():
    """
    Large scale: Integrate over long time period.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 1.0
    tol = 1e-15
    step = False
    relax = False

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    codeflash_output = ivp._integrate_variable_trajectory(h, g_event, tol, step, relax); sol = codeflash_output # 250μs -> 189μs (32.3% faster)

# ---- Determinism and Robustness ----

def test_determinism():
    """
    Test that running the same ODE twice gives the same result.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 0.1
    tol = 0.01
    step = False
    relax = False

    ivp1 = IVP(simple_decay)
    ivp1.set_initial_value(y0, t0)
    ivp1.f_params = ()
    codeflash_output = ivp1._integrate_variable_trajectory(h, g_event, tol, step, relax); sol1 = codeflash_output # 261μs -> 160μs (62.3% faster)

    ivp2 = IVP(simple_decay)
    ivp2.set_initial_value(y0, t0)
    ivp2.f_params = ()
    codeflash_output = ivp2._integrate_variable_trajectory(h, g_event, tol, step, relax); sol2 = codeflash_output # 252μs -> 155μs (62.6% faster)

# ---- Error Handling ----

def test_invalid_g_function_raises():
    """
    If g raises an exception, it should propagate.
    """
    y0 = np.array([1.0])
    t0 = 0.0
    h = 0.1
    tol = 0.5
    step = False
    relax = False

    def g_raises(t, y, *args):
        raise ValueError("g failed")

    ivp = IVP(simple_decay)
    ivp.set_initial_value(y0, t0)
    ivp.f_params = ()

    with pytest.raises(ValueError):
        ivp._integrate_variable_trajectory(h, g_raises, tol, step, relax) # 20.9μs -> 17.2μs (21.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
# imports
import pytest
from quantecon._ivp import IVP
from scipy import integrate

# unit tests

# Basic ODE: dy/dt = -y, y(0) = 1, analytical solution y(t) = exp(-t)
def f_basic(t, y):
    return -y

def g_basic(t, y, *args):
    # Stop when y < tol
    return y[0]

@pytest.fixture
def basic_ivp():
    # Setup IVP for dy/dt = -y, y(0)=1
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    return ivp

def test_basic_exponential_decay(basic_ivp):
    # Test that the solution decays and stops at the right tolerance
    h = 0.1
    tol = 0.05
    step = False
    relax = False
    codeflash_output = basic_ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 176μs -> 110μs (59.6% faster)
    # t should be strictly increasing
    for i in range(1, sol.shape[0]):
        pass

def test_basic_trajectory_length(basic_ivp):
    # Test that the number of steps is as expected for a simple decay
    h = 0.2
    tol = 0.01
    step = False
    relax = False
    codeflash_output = basic_ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 156μs -> 103μs (50.5% faster)

# Edge case: ODE with zero dynamics (dy/dt = 0)
def f_zero(t, y):
    return np.zeros_like(y)

def g_zero(t, y, *args):
    # Stop when y < tol (should be immediate if y0 < tol)
    return y[0]

def test_zero_dynamics_stop_immediately():
    ivp = IVP(f_zero)
    ivp.set_initial_value(np.array([0.005]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_zero, tol, step, relax); sol = codeflash_output # 19.9μs -> 17.5μs (13.8% faster)

def test_zero_dynamics_multiple_steps():
    ivp = IVP(f_zero)
    ivp.set_initial_value(np.array([0.1]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    # Should never reach tol, so should terminate by unsuccessful integration (simulate max steps)
    # To avoid infinite loop, we forcibly break after 10 steps by monkeypatching successful()
    count = [0]
    orig_successful = ivp.successful
    def limited_successful():
        count[0] += 1
        return count[0] <= 10
    ivp.successful = limited_successful
    codeflash_output = ivp._integrate_variable_trajectory(h, g_zero, tol, step, relax); sol = codeflash_output # 61.9μs -> 39.8μs (55.6% faster)
    # All y values should be 0.1
    for i in range(sol.shape[0]):
        pass

# Edge case: ODE with multiple variables
def f_multivar(t, y):
    # dy1/dt = -y1, dy2/dt = y2
    return np.array([-y[0], y[1]])

def g_multivar(t, y, *args):
    # Stop when sum(abs(y)) < tol
    return abs(y[0]) + abs(y[1])

def test_multivar_trajectory():
    ivp = IVP(f_multivar)
    ivp.set_initial_value(np.array([1.0, 2.0]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_multivar, tol, step, relax); sol = codeflash_output # 59.2ms -> 22.6ms (162% faster)

# Edge case: Tolerance is zero (should only stop if g returns zero)
def test_zero_tolerance():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.0
    step = False
    relax = False
    # Should only stop if y hits exactly zero, which never happens
    # To avoid infinite loop, forcibly limit steps
    count = [0]
    orig_successful = ivp.successful
    def limited_successful():
        count[0] += 1
        return count[0] <= 10
    ivp.successful = limited_successful
    codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 74.5μs -> 49.8μs (49.6% faster)

# Large scale: Many steps, but not exceeding 1000
def test_large_scale():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    h = 0.01
    tol = 1e-6
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 6.74ms -> 3.40ms (98.4% faster)

# Large scale: Multi-variable, many steps
def test_large_scale_multivar():
    ivp = IVP(f_multivar)
    ivp.set_initial_value(np.array([1.0, 1.0]), 0.0)
    ivp.f_params = ()
    h = 0.01
    tol = 1e-6
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_multivar, tol, step, relax); sol = codeflash_output # 1.82s -> 183ms (891% faster)

# Edge case: g returns negative (should stop immediately)
def g_negative(t, y, *args):
    return -1.0

def test_g_negative_stops_immediately():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_negative, tol, step, relax); sol = codeflash_output # 27.5μs -> 17.0μs (61.6% faster)

# Edge case: g returns NaN (should not break, but never < tol)
def g_nan(t, y, *args):
    return float('nan')

def test_g_nan_never_stops():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    # To avoid infinite loop, forcibly limit steps
    count = [0]
    orig_successful = ivp.successful
    def limited_successful():
        count[0] += 1
        return count[0] <= 10
    ivp.successful = limited_successful
    codeflash_output = ivp._integrate_variable_trajectory(h, g_nan, tol, step, relax); sol = codeflash_output # 76.9μs -> 51.0μs (50.7% faster)

# Edge case: y is negative, g checks abs(y)
def g_abs(t, y, *args):
    return abs(y[0])

def f_neg(t, y):
    return -1.0

def f_list(t, y):
    return [-y[0]]

def test_y_as_list():
    ivp = IVP(f_list)
    ivp.set_initial_value([1.0], 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 288μs -> 178μs (61.9% faster)

# Edge case: y is scalar, not array
def f_scalar(t, y):
    return -y

def g_scalar(t, y, *args):
    return y

def test_y_as_scalar():
    ivp = IVP(f_scalar)
    ivp.set_initial_value(1.0, 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_scalar, tol, step, relax); sol = codeflash_output # 291μs -> 190μs (53.0% faster)

# Edge case: step and relax flags
def test_step_and_relax_flags():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    # Try all combinations of step and relax
    for step in [True, False]:
        for relax in [True, False]:
            codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output

# Edge case: h is negative (should move backward in time)
def test_negative_h():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([1.0]), 1.0)
    ivp.f_params = ()
    h = -0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 53.3ms -> 20.1ms (166% faster)
    # t should be decreasing
    for i in range(1, sol.shape[0]):
        pass

# Edge case: initial y is exactly tol
def test_initial_y_equals_tol():
    ivp = IVP(f_basic)
    ivp.set_initial_value(np.array([0.01]), 0.0)
    ivp.f_params = ()
    h = 0.1
    tol = 0.01
    step = False
    relax = False
    codeflash_output = ivp._integrate_variable_trajectory(h, g_basic, tol, step, relax); sol = codeflash_output # 21.2μs -> 17.4μs (21.8% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-IVP._integrate_variable_trajectory-mja6uore and push.

Codeflash Static Badge

The optimization replaces the inefficient dynamic array growth pattern with a pre-allocated buffer and doubling strategy, delivering a **686% speedup**.

**Key Optimizations:**

1. **Pre-allocated Buffer**: Instead of starting with a single row and using `np.vstack()` for each step, the code pre-allocates a buffer of 1024 rows using `np.empty()`. This eliminates the O(n²) memory allocation overhead from repeated array concatenation.

2. **Exponential Growth Strategy**: When the buffer fills up, it doubles in size rather than growing by one row. This ensures amortized O(1) insertion cost per step instead of O(n) cost for each `np.vstack()` operation.

3. **Direct Array Assignment**: Values are written directly to pre-allocated positions (`solution[row, 0] = self.t`) rather than creating intermediate arrays with `np.hstack()` and concatenating them.

**Performance Impact Analysis:**

The line profiler shows the dramatic improvement - the original code spent 78.1% of time in `np.vstack()` operations (40 billion nanoseconds), while the optimized version eliminates this bottleneck entirely. The optimized version shows most time is now spent on the actual ODE integration (`self.integrate()` at 65.4%), which is the unavoidable computational work.

**Test Case Performance:**

- **Small trajectories** (edge cases): 13-66% faster due to reduced overhead
- **Medium trajectories**: 50-100% faster as buffer reallocation becomes less frequent
- **Large trajectories**: 162-891% faster where the O(n²) vs O(n) difference dominates (e.g., `test_large_scale_multivar` shows 891% speedup)

The optimization is most effective for workloads with longer integration trajectories, making it particularly valuable for numerical simulations requiring many integration steps.
@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 17, 2025 15:50
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants