feat(utils): Accelerate data generation by 168x using NumPy by raphaelgimenezneto · Pull Request #1 · Ismail-Dagli/smart-ride-pooling

raphaelgimenezneto · 2026-01-17T04:40:41Z

Hello!

This PR introduces a high-performance, vectorized implementation for the generate_historical_rides function, resulting in a ~168x speedup in the simulation's data setup phase.

The Problem: Identifying the Bottleneck
Using cProfile, I identified that the original generate_historical_rides function was the most significant bottleneck, consuming over 10 seconds of execution time. This was primarily due to its iterative, loop-based approach for generating a large number of records.
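For readers who want to reproduce this kind of measurement, here is a minimal sketch of profiling a single call with cProfile. The function `generate_rides_loop` is a hypothetical stand-in for a loop-based generator, not the project's `generate_historical_rides`, and `profile_call` is an illustrative helper rather than anything in this repository.

```python
import cProfile
import io
import pstats
import random


def generate_rides_loop(n):
    # Hypothetical stand-in for a loop-based generator (assumption, not project code).
    return [(random.random(), random.randint(0, 23)) for _ in range(n)]


def profile_call(func, *args):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    stats.print_stats(10)  # limit the report to the ten most expensive entries
    return buffer.getvalue()


report = profile_call(generate_rides_loop, 10_000)
print(report)
```

Sorting by cumulative time puts functions whose total cost (including callees) dominates the run at the top, which is how a setup-phase bottleneck typically shows up.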

Profiler Output (Before):

The Solution: Vectorization with NumPy
The solution was to replace the iterative method with NumPy vectorization. Instead of processing records one-by-one, this approach operates on entire arrays of data at once, leveraging NumPy's highly optimized C backend for maximum efficiency.
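To illustrate the idea, here is a small sketch contrasting a per-record loop with an array-at-once version. Everything in it is an assumption made for illustration: the rush-hour list, the 60% rush-hour probability, and both function names are hypothetical and are not taken from `src/utils.py`.

```python
import random

import numpy as np

# Hypothetical parameters, chosen only for this illustration.
RUSH_HOURS = [7, 8, 9, 17, 18, 19]
RUSH_PROBABILITY = 0.6


def sample_hours_loop(n, seed=0):
    """Draw n ride hours one record at a time (the slow pattern)."""
    rng = random.Random(seed)
    hours = []
    for _ in range(n):
        if rng.random() < RUSH_PROBABILITY:
            hours.append(rng.choice(RUSH_HOURS))
        else:
            hours.append(rng.randrange(24))
    return hours


def sample_hours_vectorized(n, seed=0):
    """Draw n ride hours with whole-array operations."""
    rng = np.random.default_rng(seed)
    is_rush = rng.random(n) < RUSH_PROBABILITY          # boolean mask, one entry per ride
    rush_draws = rng.choice(np.array(RUSH_HOURS), size=n)
    any_draws = rng.integers(0, 24, size=n)              # upper bound is exclusive
    return np.where(is_rush, rush_draws, any_draws)      # pick per element by mask


hours = sample_hours_vectorized(100_000)
print(hours[:10])
```

The two functions use different random number generators, so they do not produce identical sequences for the same seed. They are only intended to follow the same distribution, which is why a statistical comparison is the appropriate correctness check.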

The Results: Performance Gain & Validation
The new implementation is 168.29x faster, reducing the execution time from 10.68 seconds to about 0.06 seconds (rounded).

More importantly, this speed was achieved without sacrificing correctness. A comprehensive statistical validation suite confirms that the new function produces a dataset that is statistically equivalent to the original, preserving all key patterns like rush hour distribution and hotspot logic.
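As a rough sketch of what one such check could look like, the snippet below compares the hour-of-day distributions of two samples using total variation distance. This is an assumed approach for illustration only; the actual validation suite may use different statistics. The synthetic weights and the 0.05 threshold are hypothetical.

```python
import numpy as np


def hourly_distribution(hours):
    """Return the share of rides in each of the 24 hours."""
    counts = np.bincount(np.asarray(hours), minlength=24)
    return counts / counts.sum()


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    return 0.5 * np.abs(p - q).sum()


# Synthetic stand-ins for the "old" and "new" generator outputs (assumption):
# both are drawn from the same rush-hour-weighted distribution with different seeds.
weights = np.ones(24)
weights[[7, 8, 9, 17, 18, 19]] = 4.0
weights /= weights.sum()

old_hours = np.random.default_rng(1).choice(24, size=50_000, p=weights)
new_hours = np.random.default_rng(2).choice(24, size=50_000, p=weights)

tv = total_variation(hourly_distribution(old_hours), hourly_distribution(new_hours))
print(f"total variation distance: {tv:.4f}")
assert tv < 0.05, "hourly distributions differ more than expected"
```

A distance near zero indicates the two generators agree on the hourly pattern up to sampling noise; the same comparison can be repeated for other fields, such as pickup locations.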

Benchmark & Validation Results:

Profiler Output (After):
As a result, generate_historical_rides no longer appears as a major bottleneck in the profiler output.

Changes in this Pull Request

src/utils.py: The original function has been replaced with the high-performance vectorized version.
benchmarking/benchmark_data_generation.py: A new, self-contained script has been added. It contains the original "frozen" code and the logic used to generate the benchmark results above, serving as reproducible proof of the improvement.
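For context, a benchmark of this kind can be structured along the following lines. This is a minimal sketch, not the contents of `benchmarking/benchmark_data_generation.py`: the helper names `best_time` and `speedup` and the two toy workloads are hypothetical.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Toy workloads standing in for the frozen original and the new version (assumption).
def slow_total(n):
    total = 0
    for i in range(n):
        total += i
    return total


def fast_total(n):
    return sum(range(n))


slow = best_time(slow_total, 200_000)
fast = best_time(fast_total, 200_000)
print(f"baseline: {slow:.4f}s, optimized: {fast:.4f}s, speedup: {speedup(slow, fast):.2f}x")
```

Taking the best of several runs reduces the influence of one-off noise such as cache warm-up or background load, and keeping the frozen baseline in the same script makes the comparison repeatable on any machine.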

This PR serves as a practical case study in applying high-performance computing (HPC) principles to scientific Python code.

feat(utils): Accelerate data generation by 168x using NumPy

b83ab37

raphaelgimenezneto wants to merge 1 commit into Ismail-Dagli:main from raphaelgimenezneto:feature/optimize-greedy-solver
