@pulpdrew pulpdrew commented Oct 2, 2025

Summary

Closes HDX-2310
Closes HDX-2616

This PR implements chunking of chart queries to improve the performance of charts on large data sets and long time ranges. Recent data is loaded first, then older data is loaded one chunk at a time until the full chart date range has been queried.

Screen.Recording.2025-10-03.at.1.11.09.PM.mov

Performance Impacts

Expectations

This change is intended to improve performance in a few ways:

  1. Queries over long time ranges are now much less likely to time out, since the range is chunked into several smaller queries
  2. Average memory usage should decrease, since each chunked query reads fewer rows and returns a smaller result
  3. Perceived latency of queries over long date ranges is likely to decrease, because users will start seeing charts render (more recent) data as soon as the first chunk is queried, instead of after the entire date range has been queried. However, total latency to display results for the entire date range is likely to increase, due to additional round-trip network latency being added for each additional chunk.

Measured Results

Overall, the results match the expectations outlined above.

  • Total latency changed by between about -4% (faster) and +25% (slower)
  • Average memory usage decreased by between 18% and 80%

Scenarios and data

In each of the following tests:

  1. Queries were run 5 times before starting to measure, to ensure data is filesystem cached.
  2. Queries were then run 3 times. The results shown are the median result from the 3 runs.

Scenario: Log Search Histogram in Staging V2, 2 Day Range, No Filter

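|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 5.36          | 409.23 MiB         | 409.23 MiB         | 1           |
| Chunked  | 5.14          | 83.06 MiB          | 232.69 MiB         | 4           |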

Scenario: Log Search Histogram in Staging V2, 14 Day Range, No Filter

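|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 26.56         | 383.63 MiB         | 383.63 MiB         | 1           |
| Chunked  | 33.08         | 130.00 MiB         | 241.21 MiB         | 16          |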

Scenario: Chart Explorer Line Chart with p90 and p99 trace durations, Staging V2 Traces, Filtering for "GET" spans, 7 Day range

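|          | Total Latency | Memory Usage (Avg) | Memory Usage (Max) | Chunk Count |
|----------|---------------|--------------------|--------------------|-------------|
| Original | 2.79          | 346.12 MiB         | 346.12 MiB         | 1           |
| Chunked  | 3.26          | 283.00 MiB         | 401.38 MiB         | 9           |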
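
As a quick check on the summary percentages, the relative changes can be recomputed from the medians in the three scenario tables. The snippet below is only a worked-arithmetic sketch; the `Measurement` type and the short scenario labels are made up for illustration.

```typescript
// Median measurements copied from the three scenario tables above.
interface Measurement {
  scenario: string;
  latency: [number, number]; // [original, chunked]
  avgMemoryMiB: [number, number]; // [original, chunked]
}

const measurements: Measurement[] = [
  { scenario: "Log histogram, 2 days", latency: [5.36, 5.14], avgMemoryMiB: [409.23, 83.06] },
  { scenario: "Log histogram, 14 days", latency: [26.56, 33.08], avgMemoryMiB: [383.63, 130.0] },
  { scenario: "Trace p90/p99, 7 days", latency: [2.79, 3.26], avgMemoryMiB: [346.12, 283.0] },
];

// Relative change from original to chunked, in percent, rounded to one decimal place.
function percentChange([original, chunked]: [number, number]): number {
  return Math.round(((chunked - original) / original) * 1000) / 10;
}

for (const m of measurements) {
  console.log(
    `${m.scenario}: latency ${percentChange(m.latency)}%, avg memory ${percentChange(m.avgMemoryMiB)}%`,
  );
}
```

This gives latency changes of -4.1%, 24.5%, and 16.8%, and average memory changes of -79.7%, -66.1%, and -18.2%, which is where the ranges in the summary come from.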

Implementation Notes

When is chunking used?

Chunking is used when all of the following are true:
  1. granularity and timestampValueExpression are defined in the config. This ensures that the query is already being bucketed. Without bucketing, chunking would break aggregation queries, since groups can span multiple chunks.
  2. dateRange is defined in the config. Without a date range, we'd need an unbounded set of chunks or the start and end chunks would have to be unbounded at their start and end, respectively.
  3. The config is not a metrics query. Metrics queries have complex logic which we want to avoid breaking with the initial delivery of this feature.
  4. The consumer of useQueriedChartConfig does not pass the disableQueryChunking: true option. This option is provided to disable chunking when necessary.
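
The four conditions above can be read as a single predicate. The following is a minimal sketch, not the code from this PR: `ChartConfigSketch` is a hypothetical, trimmed-down config shape, and `isMetricsQuery` is an assumed stand-in for however metrics configs are actually detected.

```typescript
// Hypothetical, trimmed-down config shape; the real chart config type has more fields.
interface ChartConfigSketch {
  granularity?: string;
  timestampValueExpression?: string;
  dateRange?: [Date, Date];
  isMetricsQuery?: boolean; // assumption: placeholder for the real metrics-query check
}

function shouldUseChunkingSketch(
  config: ChartConfigSketch,
  options: { disableQueryChunking?: boolean } = {},
): boolean {
  // Condition 4: the consumer can opt out explicitly.
  if (options.disableQueryChunking) return false;
  // Condition 1: the query must already be bucketed by time.
  const isBucketed = Boolean(config.granularity && config.timestampValueExpression);
  // Condition 2: a bounded date range is required to build a finite set of chunks.
  const hasDateRange = config.dateRange !== undefined;
  // Condition 3: metrics queries are excluded for now.
  return isBucketed && hasDateRange && !config.isMetricsQuery;
}
```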

How are time windows chosen?

  1. First, generate the windows as they are generated for the existing search chunking feature (e.g. 6 hours back, 6 hours back, 12 hours back, 24 hours back...)
  2. Then, the start and end of each window are aligned to the start of a time bucket that depends on the "granularity" of the chart.
  3. The first and last windows are shortened or extended so that the combined date range of all of the windows matches the start and end of the original config.
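
The three steps above can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the PR's `getGranularityAlignedTimeWindows`: the duration sequence simply repeats its last entry, the granularity is passed as a number of milliseconds, and `alignedWindowsSketch` and `TimeWindow` are hypothetical names.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Window durations, most recent first; the final entry repeats for older windows.
// The values follow the "6 hours, 6 hours, 12 hours, 24 hours..." example and are
// otherwise an assumption.
const WINDOW_DURATIONS_MS = [6, 6, 12, 24].map(hours => hours * HOUR_MS);

interface TimeWindow {
  startMs: number;
  endMs: number;
}

// Returns contiguous windows covering [startMs, endMs], most recent first.
function alignedWindowsSketch(startMs: number, endMs: number, granularityMs: number): TimeWindow[] {
  const floorToBucket = (t: number) => Math.floor(t / granularityMs) * granularityMs;
  const windows: TimeWindow[] = [];
  let windowEnd = endMs; // the most recent window keeps the original, unaligned end
  for (let i = 0; windowEnd > startMs; i++) {
    const duration = WINDOW_DURATIONS_MS[Math.min(i, WINDOW_DURATIONS_MS.length - 1)];
    // Align the older boundary to a bucket start, but never go before the requested start.
    const windowStart = Math.max(floorToBucket(windowEnd - duration), startMs);
    windows.push({ startMs: windowStart, endMs: windowEnd });
    windowEnd = windowStart; // the next (older) window ends where this one starts
  }
  return windows;
}
```

Because every interior boundary sits on a bucket start, no time bucket is split across two chunks; only the outermost start and end follow the original date range.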

Which order are the chunks queried in?

Chunks are queried sequentially, most-recent first, due to the expectation that more recent data is typically more important to the user. Unlike with useOffsetPaginatedSearch, we are not paginating the data beyond the chunks, and all data is typically displayed together, so there is no need to support "ascending" order.
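
A minimal sketch of this sequential, most-recent-first pattern using an async generator is shown below. `QueryWindow` is a hypothetical stand-in for the real ClickHouse client call, and `Row`, `fetchChunksSequentially`, and `collectRows` are illustrative names rather than the PR's code.

```typescript
interface TimeWindow {
  startMs: number;
  endMs: number;
}

type Row = { bucketMs: number; count: number };

// Hypothetical query function standing in for the real ClickHouse client call.
type QueryWindow = (window: TimeWindow, signal: AbortSignal) => Promise<Row[]>;

// Yields one chunk of rows per window, awaiting each query before starting the next.
// `windows` is expected to be ordered most-recent first.
async function* fetchChunksSequentially(
  windows: TimeWindow[],
  queryWindow: QueryWindow,
  signal: AbortSignal,
): AsyncGenerator<Row[]> {
  for (const window of windows) {
    if (signal.aborted) return; // stop issuing queries once the caller has aborted
    yield await queryWindow(window, signal);
  }
}

// Accumulates chunks as they arrive. Each new chunk is older than the rows already
// collected, so it is placed in front to keep the combined rows oldest-first.
async function collectRows(
  windows: TimeWindow[],
  queryWindow: QueryWindow,
  signal: AbortSignal,
): Promise<Row[]> {
  let rows: Row[] = [];
  for await (const chunk of fetchChunksSequentially(windows, queryWindow, signal)) {
    rows = [...chunk, ...rows];
  }
  return rows;
}
```

In a streaming setup, a consumer could render the intermediate `rows` after every chunk instead of waiting for the final result.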

Does this improve client-side caching behavior?

One theoretical way in which query chunking could improve performance is by enabling client-side caching of individual chunks, which could then be re-used if the same query is run over a longer time range.

Unfortunately, using streamedQuery, react-query stores the entire time range as one item in the cache, so it does not re-use individual chunks or "pages" from another query.

We could accomplish this improvement by using useQueries instead of streamedQuery or useInfiniteQuery. In that case, we'd treat each chunk as its own query. This would require a number of changes:

  1. Our query key would have to include the chunk's window duration
  2. We'd need some hacky way of making the useQueries requests fire in sequence. This can be done using enabled but requires some additional state to figure out whether the previous query is done.
  3. We'd need to emulate the return value of a useQuery using the useQueries result, or update consumers.
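
To make points 1 and 2 concrete, here is a rough sketch of how per-chunk query options could be built for `useQueries`. It is deliberately not wired to React: `queryKey`, `queryFn`, and `enabled` are real react-query option fields, but `buildChunkQueries`, `ChunkQueryOptions`, and the `completed` array (the "additional state" from point 2) are hypothetical. The key here includes the window's start and end, which is one possible way to make each chunk individually cacheable.

```typescript
interface TimeWindow {
  startMs: number;
  endMs: number;
}

// Only the per-query option fields used in this sketch.
interface ChunkQueryOptions<T> {
  queryKey: unknown[];
  queryFn: () => Promise<T>;
  enabled: boolean;
}

// Builds one query per chunk, most recent first. A chunk is enabled only once the
// chunk before it has completed, so requests fire in sequence. `completed[i]` is
// assumed to be tracked by the caller.
function buildChunkQueries<T>(
  baseKey: unknown[],
  windows: TimeWindow[],
  completed: boolean[],
  fetchWindow: (window: TimeWindow) => Promise<T>,
): ChunkQueryOptions<T>[] {
  return windows.map((window, i) => ({
    queryKey: [...baseKey, window.startMs, window.endMs],
    queryFn: () => fetchWindow(window),
    enabled: i === 0 || completed[i - 1] === true,
  }));
}
```

The resulting array would be passed as the `queries` option, and the per-chunk results would still need to be merged into something shaped like a single `useQuery` result (point 3).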


changeset-bot bot commented Oct 2, 2025

🦋 Changeset detected

Latest commit: de6ae37

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
| Name         | Type  |
|--------------|-------|
| @hyperdx/app | Patch |
| @hyperdx/api | Patch |


vercel bot commented Oct 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project            | Deployment | Preview | Comments | Updated (UTC)        |
|--------------------|------------|---------|----------|----------------------|
| hyperdx-v2-oss-app | Ready      | Preview | Comment  | Oct 16, 2025 11:53pm |


github-actions bot commented Oct 2, 2025

E2E Test Results

All tests passed • 25 passed • 3 skipped • 228s

| Status     | Count |
|------------|-------|
| ✅ Passed  | 25    |
| ❌ Failed  | 0     |
| ⚠️ Flaky   | 0     |
| ⏭️ Skipped | 3     |


@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from e077e57 to 5620ca8 Compare October 3, 2025 15:09
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from a3b22e9 to 2d0e0b7 Compare October 3, 2025 20:01
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 2d0e0b7 to 6229d52 Compare October 6, 2025 18:40
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 6229d52 to 3e73289 Compare October 6, 2025 18:49
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 943ddb8 to 3e899c8 Compare October 6, 2025 19:17
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 3e899c8 to 76d2753 Compare October 6, 2025 20:48
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from 76d2753 to ec976e5 Compare October 7, 2025 14:45
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from ec976e5 to ca69e95 Compare October 7, 2025 16:23
@pulpdrew pulpdrew changed the title Drew/paginated chart queries feat: Implement query chunking for charts Oct 7, 2025
@pulpdrew pulpdrew marked this pull request as ready for review October 7, 2025 17:03
@pulpdrew pulpdrew requested review from a team and dhable and removed request for a team October 7, 2025 17:06
@pulpdrew pulpdrew changed the title feat: Implement query chunking for charts perf: Implement query chunking for charts Oct 7, 2025
@pulpdrew pulpdrew force-pushed the drew/paginated-chart-queries branch from ca69e95 to 61afdb4 Compare October 9, 2025 15:34

claude bot commented Oct 9, 2025

PR Review: Query Chunking for Charts

Critical Issues

✅ No critical issues found.

Code Quality Observations

Strong Points:

  • Comprehensive test coverage (1126 new test lines)
  • Well-documented implementation with clear opt-in behavior
  • Proper handling of edge cases (window alignment, empty ranges, etc.)
  • Backwards compatible (chunking is opt-in via enableQueryChunking)

Minor Recommendations:

  • ⚠️ useChartConfig.tsx:176 - Data order inconsistency → Chunks are prepended (line 177: [...chunk.data, ...accumulated.data]), resulting in oldest-first order. This contradicts the stated "most-recent first" query order and could confuse consumers expecting chronological display.

  • ⚠️ useChartConfig.tsx:244-257 - Inconsistent cache update logic → Refetch skips intermediate updates but fresh queries update on every chunk. Consider unifying this behavior or documenting why refetch needs different handling.

  • ℹ️ renderChartConfig.ts:743 - dateRangeEndInclusive defaults to true but chunks set it to false for non-final windows. Verify this does not cause off-by-one errors at window boundaries when granularity does not align perfectly.

Testing Note: The comprehensive test suite is excellent, but consider adding an integration test that verifies actual query results match between chunked and non-chunked modes for the same config.

Overall: Well-implemented feature with good architectural decisions. The opt-in design and thorough testing inspire confidence.


@knudtty knudtty left a comment

Just the one comment regarding streaming using similar existing mechanisms; this looks really good overall.

knudtty previously approved these changes Oct 15, 2025

@knudtty knudtty left a comment

LGTM!

knudtty previously approved these changes Oct 16, 2025
Comment on lines 124 to 129
async function* fetchDataInChunks(
  config: ChartConfigWithOptDateRange,
  clickhouseClient: ClickhouseClient,
  signal: AbortSignal,
  disableQueryChunking: boolean = false,
) {

style: I suggest converting the arguments into a single object when a function takes more than three args

  const windows =
    !disableQueryChunking && shouldUseChunking(config)
      ? getGranularityAlignedTimeWindows(config)
      : ([undefined] as const);

not sure why we need type casting here

      ...(window ?? {}),
    };

    const result = await clickhouseClient.queryChartConfig({

double check that we want to abort fetching if queryChartConfig throws? and what happens on the UI side?
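
For reference, the sketch below shows only the language-level behavior in question: when an awaited call rejects inside an async generator, the generator ends, chunks already yielded stay with the consumer, and no later chunks are requested. `fetchChunks`, `consume`, and `queryChunk` are hypothetical names; how the UI surfaces the error depends on the query layer and is not covered here.

```typescript
// Yields one result per chunk id; a rejection from queryChunk ends the generator.
async function* fetchChunks(
  chunkIds: number[],
  queryChunk: (id: number) => Promise<string>,
): AsyncGenerator<string> {
  for (const id of chunkIds) {
    yield await queryChunk(id);
  }
}

// Collects whatever arrived before a failure, along with the error message, if any.
async function consume(
  chunkIds: number[],
  queryChunk: (id: number) => Promise<string>,
): Promise<{ received: string[]; error?: string }> {
  const received: string[] = [];
  try {
    for await (const chunk of fetchChunks(chunkIds, queryChunk)) {
      received.push(chunk);
    }
    return { received };
  } catch (e) {
    // Earlier chunks are still available here; later chunks were never requested.
    return { received, error: e instanceof Error ? e.message : String(e) };
  }
}
```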

  config: ChartConfigWithOptDateRange,
  clickhouseClient: ClickhouseClient,
  signal: AbortSignal,
  disableQueryChunking: boolean = false,

@wrn14897 wrn14897 Oct 16, 2025

I wonder if it's safer to make it disabled by default and opt in feature by feature
