Optimize raw data requests. #16

Closed
aaronweeden wants to merge 3 commits into ubccr:main from aaronweeden:optimize-jobs-raw-data-query

@aaronweeden aaronweeden commented Sep 21, 2023

Description

This improves the performance of raw data requests.

Instead of repeatedly requesting the entire time range in chunks of 10,000 rows (or whatever the portal has configured as rest_raw_row_limit), the method now makes separate requests for each day in the range, still in chunks of 10,000 rows (or whatever rest_raw_row_limit is).

For example, instead of:

start_date=2023-01-01&end_date=2023-01-31&offset=0
start_date=2023-01-01&end_date=2023-01-31&offset=10000
start_date=2023-01-01&end_date=2023-01-31&offset=20000
...

it is now (note that for each request, start_date == end_date):

start_date=2023-01-01&end_date=2023-01-01&offset=0
start_date=2023-01-01&end_date=2023-01-01&offset=10000
start_date=2023-01-01&end_date=2023-01-01&offset=20000
...
start_date=2023-01-02&end_date=2023-01-02&offset=0
start_date=2023-01-02&end_date=2023-01-02&offset=10000
start_date=2023-01-02&end_date=2023-01-02&offset=20000
...
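The request pattern above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in this PR: `fetch_chunk` is a hypothetical stand-in for the HTTP request, `ROW_LIMIT` stands in for the portal's rest_raw_row_limit, and the stopping rule (stop paging a day once a chunk comes back shorter than the limit) is an assumption.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List

ROW_LIMIT = 10000  # stand-in for the portal's rest_raw_row_limit


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield each date from start to end, inclusive."""
    for n in range((end - start).days + 1):
        yield start + timedelta(days=n)


def fetch_raw_data(
    start: date,
    end: date,
    fetch_chunk: Callable[[str, str, int], List[tuple]],
    row_limit: int = ROW_LIMIT,
) -> List[tuple]:
    """Fetch all rows one day at a time, paging within each day.

    fetch_chunk(start_date, end_date, offset) is hypothetical; it is
    assumed to return at most row_limit rows for the given offset.
    """
    rows: List[tuple] = []
    for day in iter_days(start, end):
        day_str = day.isoformat()  # start_date == end_date per request
        offset = 0
        while True:
            chunk = fetch_chunk(day_str, day_str, offset)
            rows.extend(chunk)
            # Assumed stopping rule: a short chunk means the day is done.
            if len(chunk) < row_limit:
                break
            offset += row_limit
    return rows


# Usage with a fake in-memory backend (row counts are made up).
fake_counts = {"2023-01-01": 25000, "2023-01-02": 3}
requests_made = []


def fake_fetch_chunk(start_date: str, end_date: str, offset: int) -> List[tuple]:
    requests_made.append((start_date, end_date, offset))
    total = fake_counts.get(start_date, 0)
    n = max(0, min(ROW_LIMIT, total - offset))
    return [(start_date, offset + i) for i in range(n)]


all_rows = fetch_raw_data(date(2023, 1, 1), date(2023, 1, 2), fake_fetch_chunk)
```

With the fake backend, `requests_made` records offsets 0, 10000, and 20000 for the first day and a single offset 0 for the second, matching the per-day pattern listed above.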

The full result is received faster, even though more queries are made (e.g., ~1m for 7 days (258,836 rows) of SUPREMM data as opposed to ~1m20s). For a portal that implements ubccr/xdmod#1780, raw Jobs realm data is received much faster (around 20 seconds to fetch two days (63,251 rows) of Jobs data as opposed to around 25 minutes).

This also updates the show_progress feature to print the number of days that have been retrieved so far.

TODO: add tests

Motivation and Context

Requests for raw data in the ACCESS XDMoD Jobs realm are incredibly slow. This is also the case for metrics-staging.

Tests performed

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request
  • Running the automated tests (see docs/developing.md) produces no errors.

@aaronweeden aaronweeden added the bug Something isn't working label Sep 21, 2023
@aaronweeden aaronweeden added this to the 1.0.1 milestone Sep 21, 2023
@aaronweeden (Author) commented:

Closed in favor of #19.

@aaronweeden aaronweeden deleted the optimize-jobs-raw-data-query branch November 22, 2023 19:27