Skip to content

Optimize get_timeseries_metadata() to eliminate N+1 I/O (~94x speedup)#815

Open
JackieTien97 wants to merge 10 commits into
developfrom
ty/opt-timeseries-metadata
Open

Optimize get_timeseries_metadata() to eliminate N+1 I/O (~94x speedup)#815
JackieTien97 wants to merge 10 commits into
developfrom
ty/opt-timeseries-metadata

Conversation

@JackieTien97
Copy link
Copy Markdown
Contributor

@JackieTien97 JackieTien97 commented May 9, 2026

Summary

Optimize TsFile C++ read path in two areas:

1. Optimize get_timeseries_metadata() — eliminate N+1 I/O (~94x speedup for init)

Root Causes:

  1. N+1 I/O: get_all_device_ids() traverses the entire device index tree but discards offset information. Then for each device, load_device_index_entry() re-searches the tree from the root — O(N×D) redundant disk reads for N devices and tree depth D.
  2. Duplicate read: get_device_timeseries_meta_without_chunk_meta() reads the measurement index node once to check alignment, then load_all_measurement_index_entry() reads the exact same byte range again.
  3. Redundant PageArena::init(): Called per-device in a loop despite already being initialized in the constructor.

Changes:

  • Add TsFileIOReader::get_device_timeseries_meta_by_offset() — accepts pre-resolved offsets (skips load_device_index_entry), reuses deserialized top node for get_all_leaf() (eliminates duplicate read)
  • Add TsFileReader::get_all_device_entries() — single tree traversal collecting device IDs with their (start_offset, end_offset)
  • Rewrite get_timeseries_metadata() to combine the above
  • Remove redundant PageArena::init() call from get_timeseries_metadata_impl()

Benchmark (ecg_dataset/part_0.tsfile, 634 MB, 53040 devices):

Path Avg Time Speedup
Before (get_all_device_ids + get_timeseries_metadata(ids)) ~28.8 s
After (get_timeseries_metadata() optimized) ~0.30 s ~94x

2. Optimize queryByRow with exact tag filter — B-tree direct lookup (~230x speedup per query)

Root Cause:
When queryByRow is called with a tag filter (e.g. exact match on one device), DeviceMetaIterator still traverses the entire MetaIndexNode tree: reading all 208 internal nodes from disk and scanning all 53040 leaf device entries to check the filter. This takes ~117ms per query even though only 1 device matches.

Changes:

  • DeviceMetaIterator::try_setup_direct_lookup() — detects single TagEq filter at construction time and constructs the target device ID
  • DeviceMetaIterator::load_results_direct() — uses load_device_index_entry() for O(log N) B-tree binary search instead of full tree traversal
  • Move load_device_index_entry() from private to public in TsFileIOReader for reuse
  • Add bench_read.cpp for standalone C++ read path benchmarking

Benchmark (ecg_dataset/part_0.tsfile, 53040 devices, offset=1000, limit=3584):

Component Before After Speedup
DeviceTaskIterator::next() 117.4 ms 0.5 ms ~230x
first next() (full lazy init) 128 ms 1.8 ms ~71x
Total queryByRow 128 ms 1.9 ms ~67x

End-to-end Python benchmark (training sample read, ecg_dataset):

Metric Before After Speedup
Average latency 236.8 ms 5.74 ms 41x
Throughput 4.32 windows/sec 174.1 windows/sec 40x

Note: The single-TagEq direct lookup covers the common training data access pattern. Multi-tag AND filters fall back to the existing full-scan path.

Test plan

  • All 522 existing tests pass
  • Verified result correctness: both paths return identical device count (53040)
  • C++ bench_read validates queryByRow returns correct row count (3584)
  • Python end-to-end benchmark validates data correctness with training pipeline

- Add get_device_timeseries_meta_by_offset() that accepts pre-resolved
  offsets, skipping redundant load_device_index_entry() tree search and
  reusing the deserialized measurement index node via get_all_leaf()
  instead of re-reading the same bytes through
  load_all_measurement_index_entry()
- Add get_all_device_entries() to collect device IDs with their
  (start_offset, end_offset) in a single index tree traversal
- Rewrite get_timeseries_metadata() to use the above two methods
- Remove redundant PageArena::init() call in get_timeseries_metadata_impl
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the “collect metadata for all devices” path by avoiding repeated device-index traversals and redundant disk reads when building the full device→timeseries-metadata map.

Changes:

  • Adds an offset-based metadata loader (TsFileIOReader::get_device_timeseries_meta_by_offset) to skip per-device index-tree searches.
  • Adds a single-pass device index traversal that collects (device_id, start_offset, end_offset) entries for all devices.
  • Rewrites TsFileReader::get_timeseries_metadata() to use the above and removes a redundant PageArena::init() call.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
cpp/src/reader/tsfile_reader.cc Adds device-entry collection by offset and rewrites the “all devices metadata” path to use it.
cpp/src/file/tsfile_io_reader.h Declares new offset-based device metadata API.
cpp/src/file/tsfile_io_reader.cc Implements new offset-based device metadata loader to reduce redundant reads/traversals.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cpp/src/reader/tsfile_reader.cc Outdated
Comment thread cpp/src/reader/tsfile_reader.cc Outdated
Comment thread cpp/src/reader/tsfile_reader.cc
Comment thread cpp/src/reader/tsfile_reader.cc
Comment thread cpp/src/file/tsfile_io_reader.cc
Copy link
Copy Markdown
Contributor Author

@JackieTien97 JackieTien97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Great optimization — the 94x speedup from eliminating the N+1 I/O pattern is impressive. The approach of collecting device entries with offsets in a single traversal and then reusing the deserialized top node is well-designed.

One critical bug found: integer truncation of 64-bit file offsets in get_all_device_entries. See inline comments.

Comment thread cpp/src/reader/tsfile_reader.cc Outdated
Comment thread cpp/src/file/tsfile_io_reader.cc
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Comment thread cpp/src/file/tsfile_io_reader.cc
Comment thread cpp/src/file/tsfile_io_reader.cc
Comment thread cpp/src/file/tsfile_io_reader.cc
Comment thread cpp/src/reader/tsfile_reader.cc
Comment thread cpp/src/reader/tsfile_reader.cc
Comment thread cpp/src/reader/tsfile_reader.cc
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 9, 2026

Codecov Report

❌ Patch coverage is 62.00000% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.17%. Comparing base (f7041d8) to head (4b34bf5).

Files with missing lines Patch % Lines
cpp/src/reader/tsfile_reader.cc 41.66% 16 Missing and 5 partials ⚠️
cpp/src/reader/device_meta_iterator.cc 74.28% 1 Missing and 8 partials ⚠️
cpp/src/file/tsfile_io_reader.cc 66.66% 0 Missing and 8 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #815      +/-   ##
===========================================
- Coverage    63.18%   63.17%   -0.01%     
===========================================
  Files          717      717              
  Lines        43783    43875      +92     
  Branches      6526     6562      +36     
===========================================
+ Hits         27664    27720      +56     
- Misses       15127    15145      +18     
- Partials       992     1010      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JackieTien97 and others added 8 commits May 10, 2026 08:56
When DeviceMetaIterator receives a single TagEq filter (exact match on
one tag column), construct the target device ID and use
load_device_index_entry() for O(log N) B-tree binary search instead of
traversing all internal nodes and scanning all leaf devices.

For ecg_dataset (53040 devices), this reduces queryByRow's
DeviceTaskIterator::next() from ~117ms to ~0.5ms per query (~230x).

Also adds bench_read.cpp for standalone C++ read path benchmarking
and makes load_device_index_entry() public for reuse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TagEq direct B-tree lookup constructed a device ID with only
col_idx_+1 segments, but devices with multiple tag columns have more.
The segment count mismatch caused operator== to always return false,
so queries returned 0 rows. Guard the optimization to only activate
when the filter fully specifies the device ID (single tag column).
- GetTimeseriesMetadataTableModel: test get_timeseries_metadata()
  with table model, covering get_all_device_entries (LEAF_DEVICE
  path) and get_device_timeseries_meta_by_offset
- GetTimeseriesMetadataMultiTable: test with two tables, covering
  the multi-table iteration in get_timeseries_metadata
- DirectLookupSingleTagColumn: test the TagEq direct B-tree lookup
  optimization with a single-tag-column table, covering
  try_setup_direct_lookup and load_results_direct
- DirectLookupNonExistDevice: test direct lookup returns 0 rows
  when the device does not exist
The previous guard (actual_segment_count != eq->col_idx_ + 1) still
allowed direct lookup when filtering on the last tag column in a
multi-tag table (e.g., 2 tags: segment_count=3, col_idx=2, 2+1=3).
The constructed device ID had empty segments for unfiltered tags,
causing B-tree lookup to find nothing. Restrict direct lookup to
single-tag tables only (segment_count == 2).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants