⚡️ Speed up function cosine_similarity by 14%
#184
Closed
📄 14% (0.14x) speedup for `cosine_similarity` in `src/statistics/similarity.py`
⏱️ Runtime: 24.6 microseconds → 21.6 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 13% speedup through three key changes that reduce computational overhead and memory allocations:
What optimizations were applied:
- Replaced `np.array()` with `np.asarray()`. This avoids unnecessary array copying when the inputs are already NumPy arrays, reducing memory-allocation overhead.
- Split the combined dot-product-and-division operation. The original `np.dot(X, Y.T) / np.outer(X_norm, Y_norm)` was split into separate `dot = X @ Y.T` and `norm_product = np.outer(X_norm, Y_norm)` operations.
- Eliminated the NaN/Inf detection pass. Instead of computing the full similarity matrix and then scanning it for NaN/Inf values, the optimized version pre-allocates a zero matrix and divides only where the denominator is non-zero, so division by zero never happens.
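The PR page does not include the diff, so the following is only a sketch of what the three changes could look like, assuming 2-D inputs with one vector per row. Both function names (`cosine_similarity_original`, `cosine_similarity_optimized`) are hypothetical, and the "original" body is reconstructed from the description above rather than copied from `src/statistics/similarity.py`.

```python
import numpy as np


def cosine_similarity_original(X, Y):
    """Hypothetical reconstruction of the pre-PR pattern: divide, then clean up."""
    X = np.array(X)  # np.array copies by default, even for existing ndarrays
    Y = np.array(Y)
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    # A zero-norm row yields 0/0 = NaN; silence the warnings here...
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
    # ...then make a second pass over the whole matrix to zero out NaN/Inf.
    similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
    return similarity


def cosine_similarity_optimized(X, Y):
    """Hypothetical sketch of the described changes: asarray, split ops, masked divide."""
    X = np.asarray(X)  # no copy if X is already an ndarray
    Y = np.asarray(Y)
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    dot = X @ Y.T
    norm_product = np.outer(X_norm, Y_norm)
    # Pre-allocate zeros (float, since the norms are float) and divide only
    # where the denominator is non-zero; masked entries keep their 0.0.
    similarity = np.zeros_like(norm_product)
    nonzero = norm_product != 0
    np.divide(dot, norm_product, out=similarity, where=nonzero)
    return similarity
```

In these sketches the two versions agree for finite inputs, including zero rows. One difference worth checking in the real code: if an input contains NaN, the clean-up pass in the first version zeroes those entries, while the masked division in the second lets NaN propagate, because `NaN != 0` evaluates to true.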
Why this leads to a speedup:

- `np.asarray()` skips the copy when an input is already a NumPy array.
- Building the mask `nonzero = norm_product != 0` up front replaces the scattered NaN/Inf checks over the finished matrix, which gives more cache-friendly access patterns.
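Both mechanisms can be checked in isolation. The snippet below is a minimal illustration on made-up arrays: it shows that `np.asarray()` returns its argument unchanged when no conversion is needed, and that `np.divide(..., where=...)` never evaluates masked-out entries, so no divide-by-zero warning is emitted. It does not measure the cache effects claimed above.

```python
import warnings

import numpy as np

a = np.ones((3, 4))

# np.asarray returns the very same object when no conversion is needed...
assert np.asarray(a) is a
# ...while np.array makes a copy by default.
assert np.array(a) is not a

dot = np.array([[0.0, 2.0], [3.0, 0.0]])
norm_product = np.array([[0.0, 4.0], [2.0, 0.0]])

# Masked division: entries whose denominator is zero are never divided,
# so no NaN is produced and no RuntimeWarning is raised.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # escalate any warning to an exception
    result = np.zeros_like(norm_product)
    np.divide(dot, norm_product, out=result, where=norm_product != 0)

# result is [[0.0, 0.5], [1.5, 0.0]]
```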
Impact on workloads:

Based on the `function_references`, this function is called by `cosine_similarity_top_k()`, which processes similarity matrices to find the top matches. The optimization particularly benefits:

The optimization performs well across all test scenarios, with particular benefits for edge cases involving zero vectors, where the original code would generate NaN/Inf values and then clean them up unnecessarily.
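To illustrate the zero-vector edge case from the caller's side, here is a sketch of a top-k lookup built on a masked-division `cosine_similarity`. `top_k_matches` is a hypothetical stand-in, not the actual `cosine_similarity_top_k()`, whose signature is not shown in this PR; it simply ranks every (row, column) pair by score. Zeroing matters here because `np.argsort` places NaN after all finite values, so an uncleaned NaN would land at the top of a descending ranking.

```python
import numpy as np


def cosine_similarity(X, Y):
    """Masked-division sketch (an assumption, not the PR's exact code)."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    dot = X @ Y.T
    norm_product = np.outer(X_norm, Y_norm)
    similarity = np.zeros_like(norm_product)
    # Zero-norm pairs are skipped and keep a similarity of 0.0.
    np.divide(dot, norm_product, out=similarity, where=norm_product != 0)
    return similarity


def top_k_matches(X, Y, k=2):
    """Hypothetical caller: return (row, col, score) for the k highest scores."""
    sim = cosine_similarity(X, Y)
    flat = sim.ravel()
    k = min(k, flat.size)
    # Indices of the k largest scores, in descending order of score.
    top = np.argsort(flat)[::-1][:k]
    rows, cols = np.unravel_index(top, sim.shape)
    return [(int(r), int(c), float(flat[i])) for r, c, i in zip(rows, cols, top)]
```

With `X = [[1, 0], [0, 0], [0, 1]]` and `Y = [[1, 0], [2, 1]]`, the zero row in `X` yields a row of `0.0` scores instead of NaN, so it cannot disturb the ranking, and `top_k_matches(X, Y, k=2)` returns the pairs `(0, 0)` and `(0, 1)` first.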
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
🔎 Concolic Coverage Tests and Runtime
- `codeflash_concolic_y2k4lcao/tmpli28xgpf/test_concolic_coverage.py::test_cosine_similarity`
- `codeflash_concolic_y2k4lcao/tmpli28xgpf/test_concolic_coverage.py::test_cosine_similarity_2`
- `codeflash_concolic_y2k4lcao/tmpli28xgpf/test_concolic_coverage.py::test_cosine_similarity_3`

To edit these changes, `git checkout codeflash/optimize-cosine_similarity-mix166hx` and push.