HPCC-35650 Add performance tests for very large rows #18
base: master
Conversation
Pull request overview
This PR adds performance tests for very large rows by introducing new test cases 01bi_writevlarge and 01ci_countvlarge along with supporting infrastructure. These tests evaluate disk I/O performance with extremely large variable-sized records (~100MB+ per row).
Changes:
- Added grandParentRec record structure and createGrandParent transform to support nested datasets with multiple levels of hierarchy (a rough sketch follows below)
- Introduced vlargeRecordCount configuration parameter for very large record tests
- Created new performance test files for writing and reading very large records with corresponding expected output files
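To make the new structures concrete, here is a minimal ECL sketch of the kind of nesting described above. The field names, payload sizes, and the childRec/parentRec/createParent definitions are assumptions made for illustration; they are not the actual contents of PerformanceTesting/ecl/perform/format.ecl.

```ecl
// Hypothetical sketch only: names, field sizes, and per-parent child counts
// are illustrative assumptions, not the PR's actual format.ecl definitions.
childRec := RECORD
    UNSIGNED8 id;
    STRING100 payload;
END;

parentRec := RECORD
    UNSIGNED8 id;
    DATASET(childRec) children;
END;

grandParentRec := RECORD
    UNSIGNED8 id;
    DATASET(parentRec) c1;
    DATASET(parentRec) c2;
END;

childRec createChild(UNSIGNED8 childId) := TRANSFORM
    SELF.id := childId;
    SELF.payload := (STRING)childId;
END;

// Assumed signature, matching the review comment below: (id, numChildren, startChild).
parentRec createParent(UNSIGNED8 id, UNSIGNED4 numChildren, UNSIGNED8 startChild) := TRANSFORM
    SELF.id := id;
    SELF.children := DATASET(numChildren, createChild(startChild + COUNTER));
END;

// Each grandparent row nests two datasets of parent rows, and each parent row
// nests its own child dataset; this multi-level nesting is what pushes a
// single serialized row to ~100MB+.
grandParentRec createGrandParent(UNSIGNED8 id, UNSIGNED4 numChildren1, UNSIGNED4 numChildren2) := TRANSFORM
    SELF.id := id;
    SELF.c1 := DATASET(numChildren1, createParent(id, 800, COUNTER));
    SELF.c2 := DATASET(numChildren2, createParent(id, 1000, COUNTER));
END;

// Tiny smoke test: build a couple of rows and show their count.
OUTPUT(COUNT(DATASET(2, createGrandParent(COUNTER, 10, 10))));
```

With definitions along these lines, a single row carries roughly numChildren1 * 800 + numChildren2 * 1000 child records, so the per-row size scales multiplicatively with the nesting parameters.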
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| PerformanceTesting/ecl/perform/format.ecl | Adds grandParentRec structure and createGrandParent transform for creating deeply nested records |
| PerformanceTesting/ecl/perform/config.ecl | Adds vlargeRecordCount configuration constant for the new test suite |
| PerformanceTesting/ecl/key/01ci_countvlarge.xml | Expected output file for the very large record count test |
| PerformanceTesting/ecl/key/01bi_writevlarge.xml | Expected output file for the very large record write test |
| PerformanceTesting/ecl/01ci_countvlarge.ecl | Test that reads and counts very large records from disk |
| PerformanceTesting/ecl/01bi_writevlarge.ecl | Test that writes very large records to disk |
| PerformanceTesting/TestSummary.rst | Documentation updates describing the new test cases |
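For orientation, here is a hedged sketch of what the write/count pair listed in the table above might look like, reusing the illustrative definitions sketched earlier. The import path, logical file name, and row counts are placeholders rather than the PR's actual code.

```ecl
IMPORT perform;   // assumed folder containing format.ecl and config.ecl

// 01bi_writevlarge-style step: generate very large rows and write them to disk.
written := DATASET(perform.config.vlargeRecordCount,
                   perform.format.createGrandParent(COUNTER, 100, 100),
                   DISTRIBUTED);
writeAction := OUTPUT(written, , '~perftest::vlarge', OVERWRITE);

// 01ci_countvlarge-style step: read the file back and count the rows, which
// forces every very large row to be deserialized from disk.
readBack    := DATASET('~perftest::vlarge', perform.format.grandParentRec, FLAT);
countAction := OUTPUT(COUNT(readBack));

// In the real suite these would be two separate test files; SEQUENTIAL just
// keeps this sketch self-contained.
SEQUENTIAL(writeAction, countAction);
```

Splitting the work into a write step and a separate count step mirrors how the suite keeps disk-write and disk-read behaviour in distinct tests.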
In PerformanceTesting/ecl/perform/format.ecl:
SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
Copilot AI commented on Jan 16, 2026:
The createParent transform expects 3 parameters (id, numChildren, startChild) but is being called with mismatched semantics. The second parameter should be numChildren but 800 and 1000 appear to be constants rather than child counts. This will create parent records with 800 and 1000 children respectively, ignoring the numChildren1 and numChildren2 parameters passed to createGrandParent.
Suggested change:
-SELF.c1 := DATASET(numChildren1, createParent(id, 800, id2));
-SELF.c2 := DATASET(numChildren2, createParent(id, 1000, id3));
+SELF.c1 := DATASET(numChildren1, createParent(id, numChildren1, 800));
+SELF.c2 := DATASET(numChildren2, createParent(id, numChildren2, 1000));
In PerformanceTesting/ecl/perform/config.ecl:
export largeRecordCountPerSlave := 100; // Total serialized memory ~4GB
export largeRecordCount := largeRecordCountPerSlave * numSlaves;
export largeRecordChildren := 500000; // Total size approx 40MB per row
export vlargeRecordCount := 5 * numSlaves;
Copilot AI commented on Jan 16, 2026:
The vlargeRecordCount constant is missing a comment explaining its purpose and expected memory footprint, unlike the similar largeRecordCountPerSlave constant, which notes its total serialized memory. Adding one would clarify the scale difference between the 'large' and 'vlarge' tests.
Suggested change:
-export vlargeRecordCount := 5 * numSlaves;
+export vlargeRecordCount := 5 * numSlaves; // Very-large record tests: each row is much larger than 'large' rows, so keep count low to avoid excessive memory use
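One hedged alternative, following the per-slave pattern the existing 'large' constants use, which would also carry the documentation the reviewer asks for. The per-slave count and the size figures are illustrative only, derived from the ~100MB+ per-row estimate in the overview.

```ecl
// Illustrative sketch, not the PR's actual config.ecl. Assumes numSlaves is
// already defined in the surrounding config attribute, as it is for the
// existing 'large' constants.
export vlargeRecordCountPerSlave := 5;   // each row serializes to ~100MB+, so roughly 500MB+ per slave
export vlargeRecordCount := vlargeRecordCountPerSlave * numSlaves;
```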
Signed-off-by: Gavin Halliday <[email protected]>
AttilaVamos left a comment:
I had some concerns about the memory and disk usage of these new tests, so I executed them on our BM Performance Testing environment (Ubuntu 22.04, 8 cores, 24 GB RAM, ~100 GB free disk space). All of them passed with acceptable resource usage.