⚡️ Speed up method Kalman.stationary_values by 187%
#43
📄 187% (1.87x) speedup for `Kalman.stationary_values` in `quantecon/_kalman.py`
⏱️ Runtime: 25.1 milliseconds → 8.74 milliseconds (best of 118 runs)
📝 Explanation and details
The optimization achieves a 187% speedup by applying Numba JIT compilation to the computationally intensive structured doubling algorithm in `solve_discrete_riccati`.
Key Optimizations:
- JIT-compiled Core Algorithm: The entire doubling method implementation is moved into `_solve_discrete_riccati_doubling_njit`, decorated with `@numba.njit(cache=True)`. This provides massive acceleration for the matrix-heavy computations that dominate runtime.
- Eliminated Python Overhead: The original code spent significant time in Python loops and function calls. The line profiler shows the main loop iterations (148 hits each for the `A1`, `G1`, and `H1` calculations) taking 5-5.2% of total time each. JIT compilation eliminates this interpreter overhead.
- Optimized Matrix Operations: All the expensive `np.linalg.solve` calls, matrix multiplications (`@`), and condition number computations are now JIT-compiled, avoiding Python function call overhead on each operation.
Why This Works:
The original profiling shows that `solve_discrete_riccati` consumed 97.3% of the total runtime, with the gamma selection loop (270 iterations) and the main convergence loop (148 iterations) being the primary bottlenecks. Each iteration involves multiple matrix solves and multiplications on moderately sized matrices (typically 2x2 to 50x50 in the test cases). Numba's nopython mode compiles these operations to optimized machine code, eliminating the interpreter and function-call overhead on each `solve()` and matrix operation.
Test Case Performance:
The optimization speeds up every test case, typically by 3-9x, with gains ranging from 871% faster for simple 1D systems down to 34% faster for large 50D systems. The benefit is most pronounced for small-to-medium systems, where Python overhead was a larger fraction of total time.
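The JIT-compiled doubling iteration described above can be illustrated with a minimal sketch. This is not the PR's `_solve_discrete_riccati_doubling_njit`: the names `doubling_riccati` and `riccati_residual` are hypothetical, the gamma selection step and the cross term are left out (so `R` is assumed invertible), and `cache=True` is omitted so the snippet does not depend on an on-disk cache location. If Numba or SciPy is missing, the decorator falls back to a no-op and the same code runs as plain NumPy.

```python
import numpy as np

try:
    import scipy  # noqa: F401  Numba's linalg and matmul support relies on SciPy
    from numba import njit
except ImportError:
    def njit(func):
        """No-op stand-in so the sketch still runs without Numba."""
        return func


@njit
def doubling_riccati(A, B, Q, R, tol=1e-10, max_iter=500):
    """Simplified doubling iteration for the discrete Riccati equation.

    Assumes 2-D float64 inputs, an invertible R, no cross term and no
    gamma shift (all simplifications relative to the PR).
    """
    eye = np.eye(A.shape[0])
    A0 = A.copy()
    G0 = B @ np.linalg.solve(R, B.T)
    H0 = Q.copy()
    for _ in range(max_iter):
        W = eye + H0 @ G0
        A1 = A0 @ np.linalg.solve(eye + G0 @ H0, A0)
        G1 = G0 + (A0 @ G0) @ np.linalg.solve(W, A0.T)
        H1 = H0 + A0.T @ np.linalg.solve(W, H0 @ A0)
        err = np.max(np.abs(H1 - H0))  # change in the candidate solution
        A0, G0, H0 = A1, G1, H1
        if err < tol:
            return H0
    raise ValueError("doubling iteration did not converge")


def riccati_residual(X, A, B, Q, R):
    """X - (A'XA - A'XB (R + B'XB)^{-1} B'XA + Q); near zero at a solution."""
    gain = np.linalg.solve(R + B.T @ X @ B, B.T @ X @ A)
    return X - (A.T @ X @ A - A.T @ X @ B @ gain + Q)


if __name__ == "__main__":
    # Small illustrative system (values chosen for the example only).
    A = np.array([[0.95, 0.1], [0.0, 0.9]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    X = doubling_riccati(A, B, Q, R)
    print("max residual:", np.max(np.abs(riccati_residual(X, A, B, Q, R))))
```

With Numba available, the first call pays a one-time compilation cost; later calls with the same argument types reuse the compiled code, which is where the reduced per-iteration overhead would show up.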
Workload Impact:
Since `solve_discrete_riccati` is used in Kalman filtering to solve the steady-state covariance equation, this optimization significantly accelerates any econometric or control application that requires repeated Riccati solutions, such as dynamic programming or filtering in state-space models.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-Kalman.stationary_values-mj9rwmpx` and push.