⚡️ Speed up function `rouwenhorst` by 105% #55

codeflash-ai · 2025-12-17T13:16:51Z

📄 105% (1.05x) speedup for `rouwenhorst` in `quantecon/markov/approximation.py`

⏱️ Runtime : 444 milliseconds → 217 milliseconds (best of 38 runs)

📝 Explanation and details

The optimization replaces the recursive Python implementation of the transition matrix construction with a Numba JIT-compiled version, delivering a 2.05x speedup (105% faster).

Key optimizations applied:

Numba JIT compilation: The core _row_build_mat_numba function uses @njit(cache=True) to compile the recursive matrix construction to machine code, eliminating Python interpreter overhead for the computationally intensive loops.
Explicit loop construction: Instead of NumPy's fancy indexing (e.g., p1[:n-1, :n-1] = p * new_mat), the optimized version uses explicit nested loops that Numba can optimize more effectively.
Memory layout optimization: Pre-allocating arrays with specific dtype (np.float64) and using manual element assignment provides better cache locality and avoids temporary array creation.

Why this leads to speedup:

The original code spent 99.4% of execution time in the recursive row_build_mat function performing array slicing and broadcasting operations
Numba compiles these operations to optimized machine code, eliminating Python function call overhead and NumPy's intermediate array allocations
The recursive nature amplifies the benefit since the optimization applies at every recursion level

Performance benefits by test case:

Small n (2-4 states): 1-33% faster due to JIT compilation overhead being relatively small
Medium n (20-100 states): 136-186% faster as recursion depth increases
Large n (500 states): 102-106% faster, hitting the sweet spot where JIT overhead is amortized

Workload impact:
Based on the function references, rouwenhorst is used in Markov chain approximation tests and setup methods that run repeatedly. The optimization particularly benefits:

Monte Carlo simulations requiring many Markov chains
Parameter sensitivity analysis with varying state space sizes
Academic/research workflows processing multiple AR(1) approximations

The optimization maintains identical behavior including the recursion limit check, ensuring backward compatibility while dramatically improving performance for the computational bottleneck.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 41 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests and Runtime

from math import sqrt

import numpy as np
# imports
import pytest
from quantecon.markov.approximation import rouwenhorst

# Minimal MarkovChain class for testing
class MarkovChain:
    def __init__(self, P, s):
        self.P = P
        self.s = s
from quantecon.markov.approximation import rouwenhorst

# unit tests

# -------- BASIC TEST CASES --------
def test_basic_two_states():
    # Test with n=2, rho=0, sigma=1, mu=0
    codeflash_output = rouwenhorst(2, 0, 1, 0); mc = codeflash_output # 35.3μs -> 34.5μs (2.30% faster)

def test_basic_three_states():
    # Test with n=3, rho=0.5, sigma=1, mu=0
    codeflash_output = rouwenhorst(3, 0.5, 1, 0); mc = codeflash_output # 40.0μs -> 33.1μs (20.8% faster)
    # Check that each row sums to 1
    for i in range(3):
        pass

def test_basic_nonzero_mu():
    # Test with nonzero mu
    codeflash_output = rouwenhorst(4, 0.8, 2, mu=5); mc = codeflash_output # 43.9μs -> 33.0μs (33.1% faster)
    # States should be shifted by mu/(1 - rho)
    shift = 5 / (1 - 0.8)
    # Check each row sums to 1
    for i in range(4):
        pass

def test_basic_rho_negative():
    # Test with negative rho
    codeflash_output = rouwenhorst(3, -0.5, 1, 0); mc = codeflash_output # 37.8μs -> 31.9μs (18.4% faster)
    # p=q=(1+rho)/2=0.25
    expected_P = np.array([
        [0.25, 0.75, 0.  ],
        [0.375, 0.25, 0.375],
        [0.  , 0.75, 0.25]
    ])

# -------- EDGE TEST CASES --------
def test_edge_n_equals_2():
    # Smallest valid n
    codeflash_output = rouwenhorst(2, 0.9, 0.5); mc = codeflash_output # 31.5μs -> 31.1μs (1.47% faster)
    # p=q=(1+0.9)/2=0.95
    expected_P = np.array([[0.95, 0.05], [0.05, 0.95]])
    # States
    y_sd = sqrt(0.5**2 / (1 - 0.9**2))
    psi = y_sd * sqrt(2 - 1)

def test_edge_rho_near_1():
    # rho very close to 1
    rho = 0.999
    sigma = 1
    n = 5
    codeflash_output = rouwenhorst(n, rho, sigma); mc = codeflash_output # 47.8μs -> 32.8μs (45.6% faster)
    # States should be spread far apart (since y_sd is large)
    y_sd = sqrt(sigma**2 / (1 - rho**2))
    psi = y_sd * sqrt(n - 1)

def test_edge_sigma_zero():
    # sigma=0, so all states should be the same
    codeflash_output = rouwenhorst(3, 0.5, 0); mc = codeflash_output # 55.8μs -> 47.7μs (17.0% faster)
    # Transition matrix still valid
    for i in range(3):
        pass

def test_edge_invalid_n():
    # n < 2 should raise ValueError
    with pytest.raises(ValueError):
        rouwenhorst(1, 0.5, 1) # 13.7μs -> 16.5μs (16.7% slower)
    with pytest.raises(ValueError):
        rouwenhorst(0, 0.5, 1) # 10.5μs -> 11.1μs (5.64% slower)
    with pytest.raises(ValueError):
        rouwenhorst(-5, 0.5, 1) # 4.08μs -> 4.08μs (0.000% faster)

def test_edge_large_negative_mu():
    # Large negative mu
    codeflash_output = rouwenhorst(3, 0.5, 1, mu=-1000); mc = codeflash_output # 53.8μs -> 46.2μs (16.4% faster)
    shift = -1000 / (1 - 0.5)

# -------- LARGE SCALE TEST CASES --------
def test_large_scale_n_50():
    # Test with n=50
    codeflash_output = rouwenhorst(50, 0.9, 2); mc = codeflash_output # 482μs -> 185μs (160% faster)
    # Each row sums to 1
    for i in range(50):
        pass

def test_large_scale_n_500():
    # Test with n=500
    codeflash_output = rouwenhorst(500, 0.95, 1); mc = codeflash_output # 212ms -> 105ms (102% faster)
    # Each row sums to 1
    for i in range(500):
        pass

def test_large_scale_extreme_sigma():
    # Large sigma
    codeflash_output = rouwenhorst(20, 0.5, 100); mc = codeflash_output # 175μs -> 74.5μs (136% faster)

def test_large_scale_extreme_rho():
    # rho very close to 1, n=100
    codeflash_output = rouwenhorst(100, 0.9999, 1); mc = codeflash_output # 2.46ms -> 1.01ms (143% faster)

import warnings

import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon.markov.approximation import rouwenhorst

# Function to test (copied as per instructions)
class MarkovChain:
    """
    Minimal MarkovChain class for testing.
    Stores transition matrix and state values.
    """
    def __init__(self, P, s):
        self.P = P
        self.state_values = s
from quantecon.markov.approximation import rouwenhorst

# unit tests

# ---- Basic Test Cases ----

def test_basic_two_states():
    # Test for n=2, rho=0.5, sigma=1.0, mu=0.0
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(2, 0.5, 1.0); mc = codeflash_output # 94.4μs -> 47.3μs (99.5% faster)
    # Transition matrix should be valid probabilities
    for row in mc.P:
        pass

def test_basic_three_states():
    # Test for n=3, rho=0.0, sigma=2.0, mu=0.0
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, 0.0, 2.0); mc = codeflash_output # 47.5μs -> 35.4μs (34.2% faster)
    # Transition matrix should be valid probabilities
    for row in mc.P:
        pass

def test_basic_mu_shift():
    # Test mu shifting the mean of state values
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(2, 0.5, 1.0, mu=0.0); mc1 = codeflash_output # 34.3μs -> 31.8μs (7.72% faster)
        codeflash_output = rouwenhorst(2, 0.5, 1.0, mu=2.0); mc2 = codeflash_output # 26.5μs -> 24.8μs (6.89% faster)
    # The mean of state_values should shift by mu/(1-rho)
    mean1 = np.mean(mc1.state_values)
    mean2 = np.mean(mc2.state_values)
    expected_shift = 2.0 / (1 - 0.5)

def test_basic_rho_extremes():
    # Test for rho near 1 and -1
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(2, 0.99, 1.0); mc_pos = codeflash_output # 31.4μs -> 30.0μs (4.58% faster)
        codeflash_output = rouwenhorst(2, -0.99, 1.0); mc_neg = codeflash_output # 25.9μs -> 24.2μs (7.24% faster)

# ---- Edge Test Cases ----

def test_edge_n_equals_2():
    # Edge case: n=2 (minimum allowed)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(2, 0.5, 1.0); mc = codeflash_output # 30.5μs -> 29.6μs (3.10% faster)

def test_edge_n_too_small():
    # Edge case: n=1 (should raise ValueError)
    with pytest.raises(ValueError):
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            rouwenhorst(1, 0.5, 1.0)

def test_edge_sigma_zero():
    # Edge case: sigma=0 (no noise)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, 0.5, 0.0); mc = codeflash_output # 38.8μs -> 32.8μs (18.4% faster)

def test_edge_rho_zero():
    # Edge case: rho=0 (no autocorrelation)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(4, 0.0, 1.0); mc = codeflash_output # 42.5μs -> 31.2μs (36.0% faster)

def test_edge_rho_one_minus_epsilon():
    # Edge case: rho very close to 1, but not exactly 1
    rho = 1 - 1e-10
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, rho, 1.0); mc = codeflash_output # 36.6μs -> 30.2μs (21.1% faster)

def test_edge_rho_negative_one_plus_epsilon():
    # Edge case: rho very close to -1, but not exactly -1
    rho = -1 + 1e-10
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, rho, 1.0); mc = codeflash_output # 36.1μs -> 29.8μs (21.3% faster)

def test_edge_large_negative_mu():
    # Edge case: large negative mu
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, 0.5, 1.0, mu=-1000.0); mc = codeflash_output # 36.2μs -> 29.8μs (21.5% faster)

def test_edge_large_positive_mu():
    # Edge case: large positive mu
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(3, 0.5, 1.0, mu=1000.0); mc = codeflash_output # 35.5μs -> 29.8μs (19.5% faster)

def test_large_n_100():
    # Large n=100, moderate rho and sigma
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(100, 0.9, 1.0); mc = codeflash_output # 2.88ms -> 1.01ms (186% faster)
    # Probabilities should be valid
    for row in mc.P:
        pass

def test_large_n_500():
    # Large n=500, check performance and correctness
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(500, 0.8, 2.0); mc = codeflash_output # 216ms -> 105ms (106% faster)
    # Probabilities should be valid
    for row in mc.P:
        pass

def test_large_scale_state_values_monotonic():
    # For large n and rho=0, state values should be evenly spaced and monotonic
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(100, 0.0, 1.0); mc = codeflash_output # 2.96ms -> 1.06ms (179% faster)
    diffs = np.diff(mc.state_values)

def test_large_scale_state_values_shifted():
    # For large n and mu, mean should shift accordingly
    mu = 1000.0
    rho = 0.5
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(100, rho, 1.0, mu=mu); mc = codeflash_output # 2.39ms -> 948μs (152% faster)
    expected_mean = mu / (1 - rho)

# ---- Additional Robustness ----

@pytest.mark.parametrize("n,rho,sigma,mu", [
    (2, 0.1, 0.1, 0.0),
    (3, 0.9, 2.0, 1.0),
    (10, -0.5, 0.5, -1.0),
    (50, 0.7, 1.5, 10.0),
    (100, 0.0, 10.0, 0.0),
])
def test_parametrized_various_inputs(n, rho, sigma, mu):
    # Parametrized test for a variety of valid inputs
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        codeflash_output = rouwenhorst(n, rho, sigma, mu); mc = codeflash_output # 2.94ms -> 1.18ms (149% faster)
    # Probabilities valid
    for row in mc.P:
        pass

def test_warning_is_raised():
    # Ensure UserWarning is raised for API change
    with pytest.warns(UserWarning):
        rouwenhorst(2, 0.5, 1.0) # 44.0μs -> 41.1μs (6.89% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-rouwenhorst-mja1cn83 and push.

The optimization replaces the recursive Python implementation of the transition matrix construction with a **Numba JIT-compiled version**, delivering a **2.05x speedup** (105% faster). **Key optimizations applied:** 1. **Numba JIT compilation**: The core `_row_build_mat_numba` function uses `@njit(cache=True)` to compile the recursive matrix construction to machine code, eliminating Python interpreter overhead for the computationally intensive loops. 2. **Explicit loop construction**: Instead of NumPy's fancy indexing (e.g., `p1[:n-1, :n-1] = p * new_mat`), the optimized version uses explicit nested loops that Numba can optimize more effectively. 3. **Memory layout optimization**: Pre-allocating arrays with specific dtype (`np.float64`) and using manual element assignment provides better cache locality and avoids temporary array creation. **Why this leads to speedup:** - The original code spent 99.4% of execution time in the recursive `row_build_mat` function performing array slicing and broadcasting operations - Numba compiles these operations to optimized machine code, eliminating Python function call overhead and NumPy's intermediate array allocations - The recursive nature amplifies the benefit since the optimization applies at every recursion level **Performance benefits by test case:** - Small n (2-4 states): 1-33% faster due to JIT compilation overhead being relatively small - Medium n (20-100 states): 136-186% faster as recursion depth increases - Large n (500 states): 102-106% faster, hitting the sweet spot where JIT overhead is amortized **Workload impact:** Based on the function references, `rouwenhorst` is used in Markov chain approximation tests and setup methods that run repeatedly. The optimization particularly benefits: - Monte Carlo simulations requiring many Markov chains - Parameter sensitivity analysis with varying state space sizes - Academic/research workflows processing multiple AR(1) approximations The optimization maintains identical behavior including the recursion limit check, ensuring backward compatibility while dramatically improving performance for the computational bottleneck.

codeflash-ai bot requested a review from aseembits93 December 17, 2025 13:16

codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025

misrasaurabh1 approved these changes Dec 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up function `rouwenhorst` by 105% #55

⚡️ Speed up function `rouwenhorst` by 105% #55

Uh oh!

codeflash-ai bot commented Dec 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡️ Speed up function rouwenhorst by 105% #55

Are you sure you want to change the base?

⚡️ Speed up function rouwenhorst by 105% #55

Uh oh!

Conversation

codeflash-ai bot commented Dec 17, 2025

📄 105% (1.05x) speedup for rouwenhorst in quantecon/markov/approximation.py

📝 Explanation and details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡️ Speed up function `rouwenhorst` by 105% #55

⚡️ Speed up function `rouwenhorst` by 105% #55

📄 105% (1.05x) speedup for `rouwenhorst` in `quantecon/markov/approximation.py`