Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 9% (0.09x) speedup for DiscreteDP.to_product_form in quantecon/markov/ddp.py

⏱️ Runtime : 206 microseconds 190 microseconds (best of 104 runs)

📝 Explanation and details

The optimized code achieves an 8% speedup by replacing the expensive NumPy advanced indexing operation R[self.s_indices, self.a_indices] = self.R with a Numba-compiled function _assign_sa_rewards.

Key optimization:

  • Added @njit(cache=True) decorated function _assign_sa_rewards that performs the reward assignment in a tight, compiled loop instead of using NumPy's advanced indexing
  • Changed @jit(nopython=True) to @njit in _fill_dense_Q for cleaner syntax and explicit nopython mode

Why this is faster:
NumPy's advanced indexing R[s_indices, a_indices] = values involves significant Python overhead for index validation, memory allocation, and dispatching. The Numba-compiled loop eliminates this overhead by generating optimized machine code that directly iterates through the indices and performs assignments.

Performance characteristics:
The line profiler shows the reward assignment line dropping from consuming a significant portion of execution time to being much more efficient. The optimization is most effective for larger state-action pair arrays (higher L values), as seen in the test cases where speedups range from 5-25% depending on problem size.

Impact on workloads:
This optimization benefits any code that converts DiscreteDP instances from state-action pair form to product form, which is common in dynamic programming workflows where different algorithms may require different data representations. The speedup becomes more pronounced with larger problem instances.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 49 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest
from quantecon.markov.ddp import DiscreteDP

def _has_sorted_sa_indices(s_indices, a_indices):
    L = len(s_indices)
    for i in range(L-1):
        if s_indices[i] > s_indices[i+1]:
            return False
        if s_indices[i] == s_indices[i+1]:
            if a_indices[i] >= a_indices[i+1]:
                return False
    return True

def _generate_a_indptr(num_states, s_indices, out):
    idx = 0
    out[0] = 0
    for s in range(num_states-1):
        while(idx < len(s_indices) and s_indices[idx] == s):
            idx += 1
        out[s+1] = idx
    out[num_states] = len(s_indices)
from quantecon.markov.ddp import DiscreteDP

# -------------------- UNIT TESTS --------------------

# Basic Test Cases

def test_already_product_form_returns_self():
    # Product form: R is (n, m), Q is (n, m, n)
    R = np.array([[1, 2], [3, 4]])
    Q = np.array([
        [[1, 0], [0, 1]],
        [[0.5, 0.5], [0.2, 0.8]]
    ])
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta)
    # Should return self
    codeflash_output = ddp.to_product_form(); result = codeflash_output # 208ns -> 167ns (24.6% faster)

def test_sa_pair_to_product_form_dense():
    # State-action pair form, dense Q
    # 2 states, 2 actions, only 3 feasible pairs
    R = np.array([5, 10, -1])
    Q = np.array([
        [0.5, 0.5],   # (0, 0)
        [0.0, 1.0],   # (0, 1)
        [0.0, 1.0],   # (1, 0)
    ])
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([0, 1, 0])
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 19.2μs -> 17.0μs (13.5% faster)

def test_sa_pair_to_product_form_sparse():
    import scipy.sparse as sp

    # State-action pair form, sparse Q
    R = np.array([1, 2, 3])
    Q = sp.csr_matrix(np.array([
        [0.1, 0.9],
        [0.2, 0.8],
        [0.3, 0.7]
    ]))
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([0, 1, 0])
    beta = 0.8
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 19.1μs -> 18.0μs (6.02% faster)

def test_sa_pair_to_product_form_unsorted_indices():
    # Unsorted s_indices/a_indices should be sorted in product form
    R = np.array([10, -1, 5])
    Q = np.array([
        [0.0, 1.0],   # (0, 1)
        [0.0, 1.0],   # (1, 0)
        [0.5, 0.5],   # (0, 0)
    ])
    s_indices = np.array([0, 1, 0])
    a_indices = np.array([1, 0, 0])
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 15.5μs -> 13.8μs (12.7% faster)

# Edge Test Cases

def test_single_state_action():
    # Only one state and one action
    R = np.array([42])
    Q = np.array([[1.0]])
    s_indices = np.array([0])
    a_indices = np.array([0])
    beta = 1.0
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 16.2μs -> 15.0μs (8.05% faster)

def test_action_indices_not_zero_based():
    # Actions are not zero-based, e.g., actions 1 and 2 (should still work)
    R = np.array([1, 2])
    Q = np.array([
        [0.4, 0.6],
        [0.3, 0.7]
    ])
    s_indices = np.array([0, 1])
    a_indices = np.array([1, 2])  # actions start at 1
    beta = 0.5
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 17.8μs -> 16.5μs (7.81% faster)

def test_all_actions_feasible():
    # All actions feasible for all states
    R = np.array([1, 2, 3, 4])
    Q = np.array([
        [0.5, 0.5],
        [0.6, 0.4],
        [0.7, 0.3],
        [0.8, 0.2]
    ])
    s_indices = np.array([0, 0, 1, 1])
    a_indices = np.array([0, 1, 0, 1])
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 14.5μs -> 13.2μs (9.81% faster)

# Large Scale Test Cases

def test_large_all_actions_feasible():
    # All actions feasible for all states
    n, m = 20, 20
    R = np.random.randn(n, m)
    Q = np.random.dirichlet(np.ones(n), size=(n, m))
    beta = 0.99
    ddp = DiscreteDP(R, Q, beta)
    codeflash_output = ddp.to_product_form(); prod = codeflash_output # 208ns -> 208ns (0.000% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
# imports
import pytest
import scipy.sparse as sp
from quantecon.markov.ddp import DiscreteDP

def _has_sorted_sa_indices(s_indices, a_indices):
    L = len(s_indices)
    for i in range(L-1):
        if s_indices[i] > s_indices[i+1]:
            return False
        if s_indices[i] == s_indices[i+1]:
            if a_indices[i] >= a_indices[i+1]:
                return False
    return True

def _generate_a_indptr(num_states, s_indices, out):
    idx = 0
    out[0] = 0
    for s in range(num_states-1):
        while(idx < len(s_indices) and s_indices[idx] == s):
            idx += 1
        out[s+1] = idx
    out[num_states] = len(s_indices)

def _s_wise_max(a_indices, a_indptr, vals, out_max=None):
    # For each state, find max over actions
    n = len(a_indptr) - 1
    if out_max is None:
        out_max = np.empty(n)
    for i in range(n):
        out_max[i] = np.max(vals[a_indptr[i]:a_indptr[i+1]])
    return out_max

def _s_wise_max_argmax(a_indices, a_indptr, vals, out_max=None, out_argmax=None):
    n = len(a_indptr) - 1
    if out_max is None:
        out_max = np.empty(n)
    if out_argmax is None:
        out_argmax = np.empty(n, dtype=int)
    for i in range(n):
        idx = np.argmax(vals[a_indptr[i]:a_indptr[i+1]])
        out_max[i] = vals[a_indptr[i]+idx]
        out_argmax[i] = a_indices[a_indptr[i]+idx]
    return out_max, out_argmax
from quantecon.markov.ddp import DiscreteDP

# ------------------- UNIT TESTS -------------------

# Basic Test Cases

def test_already_product_form_returns_self():
    # Test that product form returns self (object identity)
    R = np.array([[1, 2], [3, -np.inf]])
    Q = np.array([
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.2, 0.8], [0.5, 0.5]]
    ])
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta)
    codeflash_output = ddp.to_product_form(); result = codeflash_output # 125ns -> 167ns (25.1% slower)

def test_simple_sa_pair_to_product_form():
    # Test conversion from state-action pair to product form
    R = np.array([5, 10, -1])
    Q = np.array([[0.5, 0.5], [0, 1], [0, 1]])
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([0, 1, 0])
    beta = 0.95
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 16.8μs -> 15.1μs (11.0% faster)

def test_sparse_sa_pair_to_product_form():
    # Test with a scipy sparse Q matrix
    R = np.array([1, 2, 3])
    Q = np.array([[0.1, 0.9], [0.5, 0.5], [0.0, 1.0]])
    Q_sparse = sp.csr_matrix(Q)
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([0, 1, 0])
    beta = 0.8
    ddp = DiscreteDP(R, Q_sparse, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 18.9μs -> 17.6μs (7.33% faster)

# Edge Test Cases

def test_sa_pair_with_nonzero_action_indices():
    # Test with action indices not starting at zero and with gaps
    R = np.array([1, 2, 3])
    Q = np.array([[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]])
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([1, 3, 1])  # action 1 and 3 for state 0, action 1 for state 1
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 13.0μs -> 12.3μs (5.74% faster)

def test_unsorted_sa_pair_indices():
    # Unsorted input should still be handled correctly
    R = np.array([2, 1, 3])
    Q = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
    s_indices = np.array([0, 0, 1])
    a_indices = np.array([3, 1, 1])  # unsorted by (s,a)
    beta = 0.9
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 14.8μs -> 13.2μs (11.9% faster)

def test_all_actions_feasible():
    # All actions feasible for all states
    R = np.array([1, 2, 3, 4])
    Q = np.array([[0.5, 0.5], [0.0, 1.0], [1.0, 0.0], [0.6, 0.4]])
    s_indices = np.array([0, 0, 1, 1])
    a_indices = np.array([0, 1, 0, 1])
    beta = 0.7
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 12.5μs -> 11.7μs (6.40% faster)

def test_infeasible_actions_are_minus_inf():
    # If an action is infeasible, reward is -inf and Q is zero
    R = np.array([1, 2])
    Q = np.array([[0.5, 0.5], [0.0, 1.0]])
    s_indices = np.array([0, 1])
    a_indices = np.array([0, 1])
    beta = 0.7
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 12.3μs -> 11.6μs (6.11% faster)

def test_beta_one_returns_correct_form():
    # Test beta=1 (should still convert, but solution methods disabled)
    R = np.array([1, 2])
    Q = np.array([[0.5, 0.5], [0.0, 1.0]])
    s_indices = np.array([0, 1])
    a_indices = np.array([0, 1])
    beta = 1.0
    ddp = DiscreteDP(R, Q, beta, s_indices, a_indices)
    codeflash_output = ddp.to_product_form(); ddp_prod = codeflash_output # 14.8μs -> 13.9μs (6.31% faster)

# Large Scale Test Cases

To edit these changes git checkout codeflash/optimize-DiscreteDP.to_product_form-mj9t62pf and push.

Codeflash Static Badge

The optimized code achieves an 8% speedup by replacing the expensive NumPy advanced indexing operation `R[self.s_indices, self.a_indices] = self.R` with a Numba-compiled function `_assign_sa_rewards`. 

**Key optimization:**
- Added `@njit(cache=True)` decorated function `_assign_sa_rewards` that performs the reward assignment in a tight, compiled loop instead of using NumPy's advanced indexing
- Changed `@jit(nopython=True)` to `@njit` in `_fill_dense_Q` for cleaner syntax and explicit nopython mode

**Why this is faster:**
NumPy's advanced indexing `R[s_indices, a_indices] = values` involves significant Python overhead for index validation, memory allocation, and dispatching. The Numba-compiled loop eliminates this overhead by generating optimized machine code that directly iterates through the indices and performs assignments.

**Performance characteristics:**
The line profiler shows the reward assignment line dropping from consuming a significant portion of execution time to being much more efficient. The optimization is most effective for larger state-action pair arrays (higher `L` values), as seen in the test cases where speedups range from 5-25% depending on problem size.

**Impact on workloads:**
This optimization benefits any code that converts DiscreteDP instances from state-action pair form to product form, which is common in dynamic programming workflows where different algorithms may require different data representations. The speedup becomes more pronounced with larger problem instances.
@codeflash-ai codeflash-ai bot requested a review from aseembits93 December 17, 2025 09:27
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant