Skip to content

Conversation

@ghalliday
Copy link
Member

No description provided.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds performance tests for very large rows by introducing new test cases 01bi_writevlarge and 01ci_countvlarge along with supporting infrastructure. These tests evaluate disk I/O performance with extremely large variable-sized records (~100MB+ per row).

Changes:

  • Added grandParentRec record structure and createGrandParent transform to support nested datasets with multiple levels of hierarchy
  • Introduced vlargeRecordCount configuration parameter for very large record tests
  • Created new performance test files for writing and reading very large records with corresponding expected output files

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
PerformanceTesting/ecl/perform/format.ecl Adds grandParentRec structure and createGrandParent transform for creating deeply nested records
PerformanceTesting/ecl/perform/config.ecl Adds vlargeRecordCount configuration constant for the new test suite
PerformanceTesting/ecl/key/01ci_countvlarge.xml Expected output file for the very large record count test
PerformanceTesting/ecl/key/01bi_writevlarge.xml Expected output file for the very large record write test
PerformanceTesting/ecl/01ci_countvlarge.ecl Test that reads and counts very large records from disk
PerformanceTesting/ecl/01bi_writevlarge.ecl Test that writes very large records to disk
PerformanceTesting/TestSummary.rst Documentation updates describing the new test cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +66 to +67
SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The createParent transform expects 3 parameters (id, numChildren, startChild) but is being called with mismatched semantics. The second parameter should be numChildren but 800 and 1000 appear to be constants rather than child counts. This will create parent records with 800 and 1000 children respectively, ignoring the numChildren1 and numChildren2 parameters passed to createGrandParent.

Suggested change
SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
SELF.c1 := DATASET(numChildren1, createParent(id, numChildren1, 800));
SELF.c2 := DATASET(numChildren2, createParent(id, numChildren2, 1000));

Copilot uses AI. Check for mistakes.
export largeRecordCountPerSlave := 100; // Total serialized memory ~4GB
export largeRecordCount := largeRecordCountPerSlave * numSlaves;
export largeRecordChildren := 500000; // Total size approx 40MB per row
export vlargeRecordCount := 5 * numSlaves;
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment explaining the purpose and expected memory footprint of vlargeRecordCount, unlike the similar largeRecordCount constant which includes a comment about total serialized memory. This would help clarify the scale difference between 'large' and 'vlarge' tests.

Suggested change
export vlargeRecordCount := 5 * numSlaves;
export vlargeRecordCount := 5 * numSlaves; // Very-large record tests: each row is much larger than 'large' rows, so keep count low to avoid excessive memory use

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@AttilaVamos AttilaVamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some concern related to memory/disk usage of these new tests, so I executed them on our BM Performance Testing environment (Ubuntu 22.04, 8 Cores, 24 GB RAM, ~100GB free disk space). All of them passed with acceptable resource usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants