HPCC-35650 Add performance tests for very large rows #18
base: master
Conversation
Pull request overview
This PR adds performance tests for very large rows by introducing new test cases 01bi_writevlarge and 01ci_countvlarge along with supporting infrastructure. These tests evaluate disk I/O performance with extremely large variable-sized records (~100MB+ per row).
Changes:
- Added grandParentRec record structure and createGrandParent transform to support nested datasets with multiple levels of hierarchy (a rough sketch follows below)
- Introduced vlargeRecordCount configuration parameter for very large record tests
- Created new performance test files for writing and reading very large records with corresponding expected output files
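To make the new structures concrete, here is a minimal ECL sketch of the kind of nesting described above. The field names, payload sizes, and the childRec/parentRec/createParent definitions are assumptions made for illustration; they are not the actual contents of PerformanceTesting/ecl/perform/format.ecl.

```ecl
// Hypothetical sketch only: names, field sizes, and per-parent child counts
// are illustrative assumptions, not the PR's actual format.ecl definitions.
childRec := RECORD
    UNSIGNED8 id;
    STRING100 payload;
END;

parentRec := RECORD
    UNSIGNED8 id;
    DATASET(childRec) children;
END;

grandParentRec := RECORD
    UNSIGNED8 id;
    DATASET(parentRec) c1;
    DATASET(parentRec) c2;
END;

childRec createChild(UNSIGNED8 childId) := TRANSFORM
    SELF.id := childId;
    SELF.payload := (STRING)childId;
END;

// Assumed signature, matching the review comment below: (id, numChildren, startChild).
parentRec createParent(UNSIGNED8 id, UNSIGNED4 numChildren, UNSIGNED8 startChild) := TRANSFORM
    SELF.id := id;
    SELF.children := DATASET(numChildren, createChild(startChild + COUNTER));
END;

// Each grandparent row nests two datasets of parent rows, and each parent row
// nests its own child dataset; this multi-level nesting is what pushes a
// single serialized row to ~100MB+.
grandParentRec createGrandParent(UNSIGNED8 id, UNSIGNED4 numChildren1, UNSIGNED4 numChildren2) := TRANSFORM
    SELF.id := id;
    SELF.c1 := DATASET(numChildren1, createParent(id, 800, COUNTER));
    SELF.c2 := DATASET(numChildren2, createParent(id, 1000, COUNTER));
END;

// Tiny smoke test: build a couple of rows and show their count.
OUTPUT(COUNT(DATASET(2, createGrandParent(COUNTER, 10, 10))));
```

With definitions along these lines, a single row carries roughly numChildren1 * 800 + numChildren2 * 1000 child records, so the per-row size scales multiplicatively with the nesting parameters.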
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| PerformanceTesting/ecl/perform/format.ecl | Adds grandParentRec structure and createGrandParent transform for creating deeply nested records |
| PerformanceTesting/ecl/perform/config.ecl | Adds vlargeRecordCount configuration constant for the new test suite |
| PerformanceTesting/ecl/key/01ci_countvlarge.xml | Expected output file for the very large record count test |
| PerformanceTesting/ecl/key/01bi_writevlarge.xml | Expected output file for the very large record write test |
| PerformanceTesting/ecl/01ci_countvlarge.ecl | Test that reads and counts very large records from disk |
| PerformanceTesting/ecl/01bi_writevlarge.ecl | Test that writes very large records to disk |
| PerformanceTesting/TestSummary.rst | Documentation updates describing the new test cases |
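For orientation, here is a hedged sketch of what the write/count pair listed in the table above might look like, reusing the illustrative definitions sketched earlier. The import path, logical file name, and row counts are placeholders rather than the PR's actual code.

```ecl
IMPORT perform;   // assumed folder containing format.ecl and config.ecl

// 01bi_writevlarge-style step: generate very large rows and write them to disk.
written := DATASET(perform.config.vlargeRecordCount,
                   perform.format.createGrandParent(COUNTER, 100, 100),
                   DISTRIBUTED);
writeAction := OUTPUT(written, , '~perftest::vlarge', OVERWRITE);

// 01ci_countvlarge-style step: read the file back and count the rows, which
// forces every very large row to be deserialized from disk.
readBack    := DATASET('~perftest::vlarge', perform.format.grandParentRec, FLAT);
countAction := OUTPUT(COUNT(readBack));

// In the real suite these would be two separate test files; SEQUENTIAL just
// keeps this sketch self-contained.
SEQUENTIAL(writeAction, countAction);
```

Splitting the work into a write step and a separate count step mirrors how the suite keeps disk-write and disk-read behaviour in distinct tests.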
In PerformanceTesting/ecl/perform/format.ecl:
SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
Copilot AI commented on Jan 16, 2026:
The createParent transform expects 3 parameters (id, numChildren, startChild) but is being called with mismatched semantics. The second parameter should be numChildren but 800 and 1000 appear to be constants rather than child counts. This will create parent records with 800 and 1000 children respectively, ignoring the numChildren1 and numChildren2 parameters passed to createGrandParent.
Suggested change:
-SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
-SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
+SELF.c1 := DATASET(numChildren1, createParent(id, numChildren1, 800));
+SELF.c2 := DATASET(numChildren2, createParent(id, numChildren2, 1000));
In PerformanceTesting/ecl/perform/config.ecl:
export largeRecordCountPerSlave := 100; // Total serialized memory ~4GB
export largeRecordCount := largeRecordCountPerSlave * numSlaves;
export largeRecordChildren := 500000; // Total size approx 40MB per row
export vlargeRecordCount := 5 * numSlaves;
Copilot AI commented on Jan 16, 2026:
The vlargeRecordCount constant is missing a comment explaining its purpose and expected memory footprint, unlike the similar largeRecordCountPerSlave constant, which notes its total serialized memory. Adding one would clarify the scale difference between the 'large' and 'vlarge' tests.
Suggested change:
-export vlargeRecordCount := 5 * numSlaves;
+export vlargeRecordCount := 5 * numSlaves; // Very-large record tests: each row is much larger than 'large' rows, so keep count low to avoid excessive memory use
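One hedged alternative, following the per-slave pattern the existing 'large' constants use, which would also carry the documentation the reviewer asks for. The per-slave count and the size figures are illustrative only, derived from the ~100MB+ per-row estimate in the overview.

```ecl
// Illustrative sketch, not the PR's actual config.ecl. Assumes numSlaves is
// already defined in the surrounding config attribute, as it is for the
// existing 'large' constants.
export vlargeRecordCountPerSlave := 5;   // each row serializes to ~100MB+, so roughly 500MB+ per slave
export vlargeRecordCount := vlargeRecordCountPerSlave * numSlaves;
```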
Signed-off-by: Gavin Halliday <[email protected]>
AttilaVamos left a comment:
I had some concerns about the memory and disk usage of these new tests, so I executed them on our BM Performance Testing environment (Ubuntu 22.04, 8 cores, 24 GB RAM, ~100 GB free disk space). All of them passed with acceptable resource usage.