⚡️ Speed up function _depth by 26%
#205
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 26% (0.26x) speedup for
_depthinkeras/src/applications/mobilenet_v3.py⏱️ Runtime :
433 microseconds→345 microseconds(best of250runs)📝 Explanation and details
The optimization improves performance by 25% through three key changes that reduce computational overhead:
1. Eliminates float division: The original code uses
divisor / 2which creates a float intermediate value. The optimized version usesdivisor // 2(integer division), keeping all arithmetic in the integer domain until the final float comparison.2. Reduces function call overhead: Instead of using Python's
max()function to ensure the result is at leastmin_value, the optimization uses a direct conditional checkif new_v < min_value: new_v = min_value. This avoids the function call overhead ofmax().3. Single type conversion: The original code calls
int()within a complex expression, while the optimized version does one upfront conversionvd = int(v)and then uses integer arithmetic throughout the main calculation.Performance impact in MobileNetV3: The function references show
_depth()is called extensively during model construction - in_inverted_res_block(),_se_block(), and the mainMobileNetV3()function. Since MobileNetV3 uses multiple inverted residual blocks (typically 11+ blocks), each calling_depth()multiple times for channel calculations, this 25% speedup compounds significantly during model initialization.Test case performance: The optimization shows consistent 20-40% improvements across all test scenarios, with particularly strong gains (30-60%) on smaller input values that are common in neural network channel calculations. The optimization maintains identical behavior for edge cases like the 10% rounding rule and min_value constraints.
This optimization is especially valuable since
_depth()is in the hot path of model construction where channel dimensions are calculated repeatedly.✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes
git checkout codeflash/optimize-_depth-mjac2cmmand push.