@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 616% (6.16x) speedup for draw in quantecon/random/utilities.py

⏱️ Runtime : 1.36 milliseconds → 190 microseconds (best of 250 runs)
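
The headline figure can be reproduced from the two runtimes, assuming the percentage is the time saved relative to the optimized runtime. That formula is an assumption that happens to match the numbers reported here, and `speedup_percent` is a hypothetical helper, not part of Codeflash or quantecon:

```python
def speedup_percent(original_us, optimized_us):
    """Return the speedup as a percentage of the optimized runtime."""
    return (original_us - optimized_us) / optimized_us * 100.0


# 1.36 milliseconds = 1360 microseconds; optimized runtime = 190 microseconds
headline = speedup_percent(1360.0, 190.0)
print(f"{headline:.0f}% ({headline / 100:.2f}x)")  # prints "616% (6.16x)"
```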

📝 Explanation and details

The optimized code achieves a 6.16x speedup by replacing the pure Python loop-based search with Numba JIT-compiled binary search functions.

Key Optimizations:

  1. JIT Compilation with Numba: Added @njit(cache=True) decorators to create compiled binary search functions (_searchsorted_jit and _draw_jit) that execute at near-C speeds instead of interpreted Python.

  2. Custom Binary Search: Replaced the original searchsorted function calls with a custom binary search implementation that's optimized for Numba compilation, reducing algorithmic complexity from O(n) linear search to O(log n) binary search.

  3. Vectorized Processing: The _draw_jit function processes all random samples in a single compiled function call, eliminating the Python loop overhead from the original implementation.

Performance Impact:

  • Large arrays benefit most: Tests show 1000%+ speedups for large CDFs (1000 elements) with 500 or more draws
  • Multiple draws see significant gains: 25-2066% faster for batch operations (size > 1), with most cases above 77%
  • Single draws have modest overhead: 3-10% slower due to JIT compilation and np.asarray() conversion costs; size=0 calls are about 27-29% slower
  • Small arrays (< 10 elements): Mixed results due to compilation overhead vs. search benefits

Hot Path Benefits:
Based on the function references showing draw_jitted usage in test files, this function appears to be used in Monte Carlo simulations and random sampling workflows where it would be called repeatedly. The JIT compilation cost is amortized over multiple calls, and the O(log n) vs O(n) algorithmic improvement becomes significant for larger probability distributions.

The optimization is most effective for workloads involving repeated sampling from moderate-to-large CDFs, which are common in quantitative economics applications.
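
To make the O(n) versus O(log n) remark concrete, the sketch below counts comparisons for a linear scan and a binary search over a 1000-element CDF. Both helpers are hypothetical and written for this illustration; they are not the quantecon implementation.

```python
import numpy as np


def linear_search_steps(cdf, r):
    """Linear scan: return (index, comparisons) for the first cdf[i] > r."""
    steps = 0
    for i, v in enumerate(cdf):
        steps += 1
        if r < v:
            return i, steps
    return len(cdf), steps


def binary_search_steps(cdf, r):
    """Binary search: same index as the linear scan, far fewer comparisons."""
    lo, hi, steps = 0, len(cdf), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if r < cdf[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo, steps


cdf = np.linspace(0.001, 1.0, 1000)
r = 0.9995  # lands in the last bin, the worst case for the linear scan
lin_idx, lin_steps = linear_search_steps(cdf, r)
bin_idx, bin_steps = binary_search_steps(cdf, r)
print(lin_idx, lin_steps, bin_idx, bin_steps)
```

For this worst-case draw the linear scan touches every element, while the binary search needs at most 10 comparisons for 1000 elements.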

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 121 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import numpy as np
import pytest  # used for our unit tests
from quantecon.random.utilities import draw


# Reference helper: pure Python linear search (not used by the tests below)
def searchsorted(cdf, value):
    """
    Pure Python implementation of np.searchsorted for 1D cdf arrays.
    Returns the first index where cdf[index] > value.
    """
    for i, v in enumerate(cdf):
        if value < v:
            return i
    return len(cdf)

# unit tests

# ---- BASIC TEST CASES ----

def test_draw_single_basic():
    """Test single draw from a simple two-element CDF."""
    cdf = np.array([0.4, 1.0])
    # Since np.random is stochastic, test output is always 0 or 1
    codeflash_output = draw(cdf); result = codeflash_output # 1.42μs -> 1.42μs (0.000% faster)

def test_draw_multiple_basic():
    """Test multiple draws from a simple two-element CDF."""
    cdf = np.array([0.4, 1.0])
    size = 10
    codeflash_output = draw(cdf, size); result = codeflash_output # 4.67μs -> 2.62μs (77.8% faster)

def test_draw_three_element_cdf():
    """Test draw from a three-element CDF."""
    cdf = np.array([0.2, 0.7, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 1.12μs -> 1.17μs (3.60% slower)

def test_draw_multiple_three_element_cdf():
    """Test multiple draws from a three-element CDF."""
    cdf = np.array([0.2, 0.7, 1.0])
    size = 20
    codeflash_output = draw(cdf, size); result = codeflash_output # 6.04μs -> 2.33μs (159% faster)

def test_draw_cdf_uniform():
    """Test draw from a uniform CDF."""
    cdf = np.array([0.25, 0.5, 0.75, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 1.04μs -> 1.08μs (3.88% slower)

# ---- EDGE TEST CASES ----

def test_draw_cdf_single_element():
    """Edge: CDF with a single element (should always return 0)."""
    cdf = np.array([1.0])
    for _ in range(10):
        codeflash_output = draw(cdf) # 4.21μs -> 4.58μs (8.16% slower)
        codeflash_output = draw(cdf, 5); arr = codeflash_output # 17.2μs -> 9.46μs (82.0% faster)

def test_draw_cdf_all_ones():
    """Edge: CDF with all elements equal to 1 (degenerate, but valid)."""
    cdf = np.array([1.0, 1.0, 1.0])
    for _ in range(10):
        codeflash_output = draw(cdf); result = codeflash_output # 4.16μs -> 4.50μs (7.47% slower)
        codeflash_output = draw(cdf, 10); arr = codeflash_output # 26.2μs -> 9.21μs (184% faster)

def test_draw_cdf_not_ending_at_one():
    """Edge: CDF not ending at 1.0 (invalid CDF). Should always return len(cdf) if r >= cdf[-1]."""
    cdf = np.array([0.2, 0.5, 0.8])
    # np.random.random() draws from [0, 1) but cdf[-1] = 0.8, so some draws may return 3.
    found_3 = False
    for _ in range(100):
        codeflash_output = draw(cdf); result = codeflash_output # 1.42μs -> 1.46μs (2.88% slower)
        if result == 3:
            found_3 = True
            break

def test_draw_cdf_with_zeros():
    """Edge: CDF with leading zeros."""
    cdf = np.array([0.0, 0.0, 0.5, 1.0])
    for _ in range(20):
        codeflash_output = draw(cdf); result = codeflash_output # 7.29μs -> 7.58μs (3.89% slower)

def test_draw_cdf_with_duplicates():
    """Edge: CDF with duplicate values (flat regions)."""
    cdf = np.array([0.2, 0.5, 0.5, 1.0])
    for _ in range(20):
        codeflash_output = draw(cdf); result = codeflash_output # 7.37μs -> 7.83μs (5.90% slower)

def test_draw_size_zero():
    """Edge: size=0 should return empty array."""
    cdf = np.array([0.2, 0.8, 1.0])
    codeflash_output = draw(cdf, 0); result = codeflash_output # 1.46μs -> 2.04μs (28.6% slower)

def test_draw_size_negative():
    """Edge: size < 0 should raise ValueError or produce empty array (depends on implementation)."""
    cdf = np.array([0.2, 0.8, 1.0])
    with pytest.raises(ValueError):
        draw(cdf, -5) # 1.75μs -> 1.75μs (0.000% faster)

def test_draw_non_integer_size():
    """Edge: Non-integer size should draw a single sample."""
    cdf = np.array([0.3, 0.8, 1.0])
    codeflash_output = draw(cdf, size=None); result = codeflash_output # 1.17μs -> 1.25μs (6.64% slower)
    codeflash_output = draw(cdf, size="foo"); result = codeflash_output # 500ns -> 458ns (9.17% faster)

def test_draw_large_cdf_single():
    """Large scale: Draw from a large CDF, single sample."""
    cdf = np.linspace(0, 1, 1000)
    codeflash_output = draw(cdf); result = codeflash_output # 1.46μs -> 1.62μs (10.3% slower)

def test_draw_large_cdf_multiple():
    """Large scale: Draw multiple samples from a large CDF."""
    cdf = np.linspace(0, 1, 1000)
    size = 500
    codeflash_output = draw(cdf, size); result = codeflash_output # 117μs -> 10.1μs (1062% faster)

def test_draw_large_cdf_distribution():
    """Large scale: Check empirical distribution approximates uniform for uniform CDF."""
    cdf = np.linspace(0, 1, 1000)
    size = 1000
    codeflash_output = draw(cdf, size); result = codeflash_output # 231μs -> 16.1μs (1339% faster)
    # For uniform CDF, indices should be roughly uniformly distributed
    counts = np.bincount(result, minlength=1001)

def test_draw_large_cdf_performance():
    """Large scale: Performance test for many draws (should not error or timeout)."""
    cdf = np.linspace(0, 1, 1000)
    size = 1000
    codeflash_output = draw(cdf, size); result = codeflash_output # 232μs -> 15.8μs (1369% faster)

# ---- DETERMINISM AND RANDOMNESS ----

def test_draw_randomness():
    """Test that repeated draws produce varying results (not deterministic)."""
    cdf = np.array([0.3, 0.7, 1.0])
    results = [draw(cdf) for _ in range(10)] # 1.12μs -> 1.21μs (6.87% slower)

def test_draw_distribution_bias():
    """Test that output indices roughly match CDF probabilities."""
    cdf = np.array([0.2, 0.8, 1.0])
    size = 500
    codeflash_output = draw(cdf, size); result = codeflash_output # 96.5μs -> 4.46μs (2066% faster)
    # Empirical probabilities
    p0 = np.mean(result == 0)
    p1 = np.mean(result == 1)
    p2 = np.mean(result == 2)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np
import pytest  # used for our unit tests
from quantecon.random.utilities import draw


# Reference helper: pure Python linear search (not used by the tests below)
def searchsorted(cdf, r):
    # Replicates np.searchsorted(cdf, r, side='right')
    # Returns the index where r should be inserted to maintain order
    # side='right' returns the first index i where r < cdf[i]
    for i, v in enumerate(cdf):
        if r < v:
            return i
    return len(cdf)

# unit tests

# ----------- Basic Test Cases -----------

def test_draw_single_basic():
    """Test draw with a simple two-element CDF, single draw."""
    cdf = np.array([0.4, 1.0])
    # The only possible outputs are 0 or 1
    codeflash_output = draw(cdf); result = codeflash_output # 1.08μs -> 1.12μs (3.73% slower)

def test_draw_multiple_basic():
    """Test draw with size parameter for multiple draws."""
    cdf = np.array([0.4, 1.0])
    codeflash_output = draw(cdf, size=10); result = codeflash_output # 4.46μs -> 2.42μs (84.4% faster)

def test_draw_three_element_cdf():
    """Test draw with three-element CDF."""
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf, size=20); result = codeflash_output # 6.00μs -> 2.29μs (162% faster)

def test_draw_cdf_sum_to_one():
    """Test that draw works even if CDF does not end exactly at 1.0 (but is close)."""
    cdf = np.array([0.3, 0.7, 0.999999])
    codeflash_output = draw(cdf, size=10); result = codeflash_output # 4.04μs -> 2.12μs (90.2% faster)

def test_draw_return_type_scalar():
    """Test that draw returns int for scalar draw."""
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 1.04μs -> 1.12μs (7.38% slower)

def test_draw_return_type_array():
    """Test that draw returns ndarray for size > 1."""
    cdf = np.array([0.5, 1.0])
    codeflash_output = draw(cdf, size=5); result = codeflash_output # 3.12μs -> 2.08μs (50.0% faster)

# ----------- Edge Test Cases -----------

def test_draw_cdf_all_ones():
    """Test draw with CDF that is all ones."""
    cdf = np.array([1.0, 1.0, 1.0])
    # All random numbers < 1.0, so always index 0
    codeflash_output = draw(cdf, size=20); result = codeflash_output # 5.88μs -> 2.12μs (176% faster)

def test_draw_cdf_single_element():
    """Test draw with CDF of length 1."""
    cdf = np.array([1.0])
    # Only possible output is 0
    codeflash_output = draw(cdf, size=10); result = codeflash_output # 4.04μs -> 2.08μs (94.0% faster)

def test_draw_cdf_zero_first():
    """Test draw with CDF starting at zero."""
    cdf = np.array([0.0, 0.5, 1.0])
    codeflash_output = draw(cdf, size=20); result = codeflash_output # 5.88μs -> 2.08μs (182% faster)

def test_draw_cdf_with_duplicates():
    """Test draw with CDF containing repeated values."""
    cdf = np.array([0.3, 0.3, 1.0])
    codeflash_output = draw(cdf, size=20); result = codeflash_output # 6.58μs -> 2.88μs (129% faster)

def test_draw_cdf_with_inf():
    """Test draw with CDF containing np.inf."""
    cdf = np.array([0.5, np.inf])
    codeflash_output = draw(cdf, size=10); result = codeflash_output # 4.75μs -> 2.88μs (65.2% faster)

def test_draw_large_size():
    """Test draw with large size (performance and correctness)."""
    cdf = np.linspace(0.01, 1.0, 10)
    codeflash_output = draw(cdf, size=1000); result = codeflash_output # 209μs -> 17.0μs (1130% faster)

def test_draw_large_cdf():
    """Test draw with a large CDF (performance and correctness)."""
    cdf = np.linspace(0.001, 1.0, 1000)
    codeflash_output = draw(cdf, size=100); result = codeflash_output # 25.0μs -> 4.04μs (517% faster)

def test_draw_large_cdf_and_size():
    """Test draw with both large CDF and large sample size."""
    cdf = np.linspace(0.001, 1.0, 1000)
    codeflash_output = draw(cdf, size=999); result = codeflash_output # 231μs -> 16.0μs (1353% faster)

def test_draw_uniform_distribution():
    """Test draw with CDF representing uniform distribution."""
    probs = np.ones(10) / 10
    cdf = np.cumsum(probs)
    codeflash_output = draw(cdf, size=100); result = codeflash_output # 22.3μs -> 3.38μs (562% faster)
    # Should be roughly uniform
    counts = np.bincount(result, minlength=10)

def test_draw_extreme_probabilities():
    """Test draw with CDF where one probability is almost 1."""
    probs = np.array([0.999, 0.001])
    cdf = np.cumsum(probs)
    codeflash_output = draw(cdf, size=100); result = codeflash_output # 20.7μs -> 2.54μs (713% faster)

# ----------- Determinism and Output Range -----------

def test_draw_output_range():
    """Test that draw never returns out-of-bounds indices."""
    cdf = np.array([0.2, 0.5, 0.8, 1.0])
    codeflash_output = draw(cdf, size=100); result = codeflash_output # 21.2μs -> 3.25μs (553% faster)

def test_draw_output_determinism(monkeypatch):
    """Test determinism by patching random to always return 0.0."""
    # Patch np.random.random to always return 0.0
    monkeypatch.setattr(np.random, "random", lambda size=None: 0.0 if size is None else np.zeros(size))
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 833ns -> 875ns (4.80% slower)
    codeflash_output = draw(cdf, size=5); result_arr = codeflash_output # 2.62μs -> 1.71μs (53.6% faster)

def test_draw_output_determinism_high(monkeypatch):
    """Test determinism by patching random to always return high value."""
    monkeypatch.setattr(np.random, "random", lambda size=None: 0.999 if size is None else np.full(size, 0.999))
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf); result = codeflash_output # 708ns -> 791ns (10.5% slower)
    codeflash_output = draw(cdf, size=5); result_arr = codeflash_output # 4.17μs -> 3.33μs (25.0% faster)

# ----------- Miscellaneous -----------

def test_draw_size_zero():
    """Test draw with size=0 returns empty array."""
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf, size=0); result = codeflash_output # 1.58μs -> 2.17μs (26.9% slower)

def test_draw_size_one():
    """Test draw with size=1 returns array of length 1."""
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf, size=1); result = codeflash_output # 2.25μs -> 2.21μs (1.90% faster)

def test_draw_size_none():
    """Test draw with size=None returns scalar."""
    cdf = np.array([0.2, 0.5, 1.0])
    codeflash_output = draw(cdf, size=None); result = codeflash_output # 1.25μs -> 1.29μs (3.25% slower)

To edit these changes, run git checkout codeflash/optimize-draw-mja23ef6 and push.

codeflash-ai bot requested a review from aseembits93 on December 17, 2025 13:37
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Dec 17, 2025