
Conversation


@fengttt fengttt commented Oct 25, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #3433

What this PR does / why we need it:

HashAgg spill. Refactors the group and merge group code. Now, if agg group memory usage
reaches a threshold, we spill to disk and read the data back later.

Merge group simply spills to the network and reads the data back from the network.

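A minimal sketch of this threshold-driven spill flow, with every name here (groupState, spillThreshold, spillToDisk) hypothetical; the real operator tracks memory through the process allocator and serializes actual group-by batches and aggregation state:

package main

import "fmt"

type groupState struct {
	memUsed        int64
	spillThreshold int64
	spilledRuns    int
}

// consume adds a batch's memory footprint and spills once the
// threshold is crossed, mirroring "reach threshold -> spill -> read back".
func (g *groupState) consume(batchBytes int64) {
	g.memUsed += batchBytes
	if g.memUsed >= g.spillThreshold {
		g.spillToDisk()
	}
}

func (g *groupState) spillToDisk() {
	// serialize group-by batches and aggregation state to a spill
	// file, then release the in-memory hash table (details omitted).
	g.spilledRuns++
	g.memUsed = 0
}

func main() {
	g := &groupState{spillThreshold: 1 << 20}
	for i := 0; i < 4; i++ {
		g.consume(512 << 10) // each simulated batch is 512 KiB
	}
	fmt.Println("spilled runs:", g.spilledRuns) // spilled runs: 2
}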

PR Type

Bug fix, Enhancement, Tests


Description

  • Implements comprehensive HashAgg spill-to-disk functionality with memory management and multi-pass support

  • Adds new Group and MergeGroup operators with state machine architecture (Build/Eval/End states) for handling group-by and aggregation operations

  • Implements bucket-based partitioning and spill file I/O with serialization/deserialization of aggregation state (see the sketch after this list)

  • Adds distinct value tracking and serialization support across all aggregation functions (COUNT, SUM, AVG, MEDIAN, GROUP_CONCAT, APPROX_COUNT, window functions, etc.)

  • Refactors batch serialization to use flexible ExtraBuf1 and ExtraBuf2 fields instead of direct Aggs field storage

  • Migrates MergeGroup operator from separate mergegroup package to unified group package

  • Adds reader-based vector and batch deserialization for streaming I/O support

  • Implements AllGroupHash() method across all hash map implementations for spill bucket computation

  • Adds utility functions for primitive type serialization/deserialization in encoding module

  • Fixes hash code slice allocation logic and spill file service type handling

  • Updates window operator to use colexec.ExprEvalVector instead of group.ExprEvalVector

  • Adds MarshalBinary() methods to aggregation context types for serialization support

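The bucket-based partitioning can be made concrete. Following the helper.go comment quoted later ("32^3 = 32768 spill buckets", i.e. 32 buckets per pass over at most three passes) and the corrected mask from the code suggestions below, each pass takes spillMaskBits bits from the top of the 64-bit group hash; spillBucketOf is a hypothetical name:

package main

import "fmt"

const (
	spillMaskBits   = 5
	spillNumBuckets = 1 << spillMaskBits // 32 buckets per pass; 32^3 = 32768 total
	spillMaxPass    = 3
)

// spillBucketOf selects a bucket from spillMaskBits high bits of the
// group hash, shifted by the current pass level lv (1-based).
func spillBucketOf(hashCode uint64, lv int) uint64 {
	return (hashCode >> (64 - spillMaskBits*lv)) & ((1 << spillMaskBits) - 1)
}

func main() {
	h := uint64(0xDEADBEEFCAFEBABE)
	for lv := 1; lv <= spillMaxPass; lv++ {
		fmt.Printf("pass %d -> bucket %d of %d\n", lv, spillBucketOf(h, lv), spillNumBuckets)
	}
}

At each deeper pass the shift moves right by another spillMaskBits, so a bucket that is still too large after one pass can be re-partitioned by fresh bits of the same hash.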

Diagram Walkthrough

flowchart LR
  A["Input Batches"] -->|"Build Phase"| B["Group Operator<br/>Hash Table"]
  B -->|"Memory Exceeded"| C["Spill to Disk<br/>Bucket Partitioning"]
  C -->|"Serialize"| D["Spill Files"]
  D -->|"Load & Merge"| E["MergeGroup Operator"]
  E -->|"Deserialize"| F["Aggregation Results"]
  B -->|"Eval Phase"| G["Aggregation Exec<br/>Distinct Tracking"]
  G -->|"End Phase"| F
  H["Distinct Hash"] -->|"Serialize"| I["Extra Buffers"]
  I -->|"Deserialize"| J["Merged Results"]

File Walkthrough

Relevant files
Feature
4 files
exec2.go
New HashAgg spill implementation with state machine           

pkg/sql/colexec/group/exec2.go

  • Implements new HashAgg spill functionality with memory management and
    disk spillover
  • Adds Prepare() and Call() methods for group operator state machine
    (Build/Eval/End states)
  • Implements batch building with hash table insertion and aggregation
    filling
  • Adds spill-to-disk mechanism with bucket-based partitioning and
    multi-pass support
+530/-0 
helper.go
Spill file I/O and hash table helper functions                     

pkg/sql/colexec/group/helper.go

  • Implements ResHashRelated struct for hash table management with insert
    tracking
  • Adds spillDataToDisk() to serialize group-by batches and aggregates to
    spill files
  • Implements loadSpilledData() to deserialize and merge spilled data
    back into memory
  • Adds helper functions for bucket computation and spill bucket
    traversal
+424/-0 
types2.go
Group operator type definitions and container management 

pkg/sql/colexec/group/types2.go

  • Defines Group and MergeGroup operator types with spill memory
    configuration
  • Implements container struct with hash table, group-by batches, and
    spill bucket management
  • Adds evaluation methods for group-by and aggregation arguments
  • Provides memory management and reset/free operations for operator
    lifecycle
+363/-0 
mergeGroup.go
MergeGroup operator for partial result merging with spill

pkg/sql/colexec/group/mergeGroup.go

  • Implements MergeGroup operator for merging partial aggregation results
  • Adds Prepare() and Call() methods with spill support during merge
    phase
  • Deserializes aggregation expressions from batch metadata during first
    merge
  • Handles both grouped and non-grouped aggregation merging with hash
    table insertion
+243/-0 
Enhancement
29 files
result.go
Add distinct tracking and spill serialization to aggregation results

pkg/sql/colexec/aggexec/result.go

  • Adds hasDistinct parameter to result initialization functions for
    distinct value tracking
  • Implements serialization methods marshalToBuffers() and
    marshalChunkToBuffer() for spill support
  • Adds deserialization from reader via unmarshalFromReader() for loading
    spilled data
  • Adds distinctFill() and distinctMerge() methods to handle distinct
    aggregation during merge
+193/-24
batch.go
Batch serialization refactoring for spill support               

pkg/container/batch/batch.go

  • Replaces Aggs field serialization with ExtraBuf1 and ExtraBuf2 for
    flexible metadata storage
  • Adds UnmarshalFromReader() method for streaming deserialization from
    I/O readers
  • Simplifies MarshalBinary() to delegate to MarshalBinaryWithBuffer()
  • Updates IsEmpty() check to only consider row count, not aggregates
+80/-92 
types.go
Aggregation function serialization interface and helpers 

pkg/sql/colexec/aggexec/types.go

  • Adds interface methods SaveIntermediateResult(),
    SaveIntermediateResultOfChunk(), and UnmarshalFromReader() to
    AggFuncExec
  • Implements generic marshaling helpers marshalRetAndGroupsToBuffer()
    and marshalChunkToBuffer() for spill serialization
  • Adds AggFuncExecExpression serialization via MarshalToBuffer() and
    UnmarshalFromReader()
  • Provides generic unmarshaling helper unmarshalFromReader() for
    deserialization
+213/-0 
concat.go
GROUP_CONCAT distinct handling refactoring for spill         

pkg/sql/colexec/aggexec/concat.go

  • Moves distinct hash handling from groupConcatExec to result object via
    distinctFill() and distinctMerge()
  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for spill serialization
  • Adds UnmarshalFromReader() for loading spilled group_concat state
  • Updates initialization to pass hasDistinct flag to result object
+53/-34 
approx_count.go
APPROX_COUNT spill serialization support                                 

pkg/sql/colexec/aggexec/approx_count.go

  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for HyperLogLog sketch serialization
  • Adds UnmarshalFromReader() to deserialize sketch groups from spilled
    data
  • Updates initialization to pass hasDistinct flag (false) to result
    object
  • Adds validation that distinct data is nil in marshal operations
+58/-6   
count.go
COUNT aggregation distinct and spill serialization             

pkg/sql/colexec/aggexec/count.go

  • Moves distinct hash from countColumnExec to result object via
    distinctFill()
  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for spill serialization
  • Adds UnmarshalFromReader() for loading spilled count state
  • Updates countStarExec with spill serialization methods and validation
+52/-26 
window.go
Window function spill serialization support                           

pkg/sql/colexec/aggexec/window.go

  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for rank/dense_rank/row_number
    serialization
  • Adds UnmarshalFromReader() to deserialize window function groups from
    spilled data
  • Defines i64Slice type with MarshalBinary() for int64 slice
    serialization
  • Updates initialization to pass hasDistinct=false to result object
+55/-6   
median.go
MEDIAN aggregation spill serialization support                     

pkg/sql/colexec/aggexec/median.go

  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for median value serialization
  • Adds UnmarshalFromReader() to deserialize median groups from spilled
    data
  • Moves distinct handling to result object initialization via
    hasDistinct parameter
  • Removes separate distinctHash field from median executor
+50/-6   
fromFixedRetFixed.go
Generic fixed-to-fixed aggregation spill support                 

pkg/sql/colexec/aggexec/fromFixedRetFixed.go

  • Implements SaveIntermediateResult() and
    SaveIntermediateResultOfChunk() for generic fixed-to-fixed aggregation
    serialization
  • Adds UnmarshalFromReader() to deserialize group contexts from spilled
    data
  • Updates initialization to pass hasDistinct flag to result object
  • Adds validation that distinct data is nil in marshal operations
+33/-4   
fromFixedRetBytes.go
Add spill support and distinct handling to fixed-to-bytes aggregator

pkg/sql/colexec/aggexec/fromFixedRetBytes.go

  • Added imports for bytes, io, and moerr packages
  • Modified marshal() to handle a new dist return value from
    marshalToBytes() with validation
  • Added three new methods: SaveIntermediateResult(),
    SaveIntermediateResultOfChunk(), and UnmarshalFromReader() for spill
    support
  • Updated unmarshal() to pass nil for distinct parameter
  • Modified init() to pass info.IsDistinct() flag to
    initAggResultWithBytesTypeResult() and removed separate distinct hash
    initialization
+35/-13 
fromBytesRetFixed.go
Add spill support to bytes-to-fixed aggregator                     

pkg/sql/colexec/aggexec/fromBytesRetFixed.go

  • Added imports for bytes, io, and moerr packages
  • Modified marshal() to handle new dist return value with validation
  • Added three new methods: SaveIntermediateResult(),
    SaveIntermediateResultOfChunk(), and UnmarshalFromReader() for spill
    support
  • Updated unmarshal() to pass nil for distinct parameter
  • Modified init() to pass false for distinct flag to
    initAggResultWithFixedTypeResult()
+31/-3   
fromBytesRetBytes.go
Add spill support to bytes-to-bytes aggregator                     

pkg/sql/colexec/aggexec/fromBytesRetBytes.go

  • Added imports for bytes, io, and moerr packages
  • Modified marshal() to handle new dist return value with validation
  • Updated unmarshal() to pass nil for distinct parameter
  • Added three new methods: SaveIntermediateResult(),
    SaveIntermediateResultOfChunk(), and UnmarshalFromReader() for spill
    support
  • Modified init() to pass false for distinct flag to
    initAggResultWithBytesTypeResult()
+34/-3   
encoding.go
Add I/O utility functions for type serialization                 

pkg/container/types/encoding.go

  • Added import for mpool package
  • Added utility functions for reading/writing primitive types and bytes:
    WriteSizeBytes(), ReadInt64(), WriteInt64(), ReadBool(), ReadInt32(),
    WriteInt32(), ReadInt32AsInt(), ReadByte(), ReadByteAsInt(),
    ReadType(), ReadSizeBytes()
  • These functions support serialization/deserialization with optional
    memory pool allocation
+104/-0 
vector.go
Add reader-based vector deserialization method                     

pkg/container/vector/vector.go

  • Added io import
  • Added new method UnmarshalWithReader() that deserializes a vector from
    an io.Reader with support for memory pool allocation
  • Reads vector metadata (class, type, length) and data buffers (data,
    area, nsp) sequentially from reader
+56/-0   
var_pop.go
Add MarshalBinary methods to var_pop aggregation contexts

pkg/sql/plan/function/agg/var_pop.go

  • Reordered imports to place math before other imports
  • Added MarshalBinary() method to aggVarPopGroupContext,
    aggVarPopOfDecimalGroupContext, and aggVarPopOfDecimalCommonContext
    types
+5/-1     
evalExpression.go
Add ExprEvalVector type for expression evaluation               

pkg/sql/colexec/evalExpression.go

  • Added new ExprEvalVector struct type with fields for executors,
    vectors, and types
  • Added MakeEvalVector() function to create and initialize
    ExprEvalVector from plan expressions
  • Added Free() and ResetForNextQuery() methods to ExprEvalVector for
    resource management
+40/-0   
avg.go
Add MarshalBinary methods to avg aggregation contexts       

pkg/sql/plan/function/agg/avg.go

  • Added MarshalBinary() method to aggAvgContext type
  • Added MarshalBinary() method to aggAvgDecimalCommonCtx type
  • Reformatted method signatures for alignment
+5/-3     
distinct.go
Add reader-based deserialization for distinct hash             

pkg/sql/colexec/aggexec/distinct.go

  • Added marshalToBuffers() method to serialize distinct hash maps based
    on flags
  • Refactored unmarshal() to delegate to new unmarshalFromReader() method
  • Added unmarshalFromReader() method to deserialize from io.Reader
    instead of byte slice
+15/-0   
avg_tw_result.go
Add MarshalBinary methods to avg_tw_result contexts           

pkg/sql/plan/function/agg/avg_tw_result.go

  • Added MarshalBinary() method to AvgTwResultContext type
  • Added MarshalBinary() method to AvgTwResultDecimalContext type
+2/-1     
avg_tw_cache.go
Add MarshalBinary methods to avg_tw_cache contexts             

pkg/sql/plan/function/agg/avg_tw_cache.go

  • Added MarshalBinary() method to AvgTwCacheContext type
  • Added MarshalBinary() method to AvgTwCacheDecimalContext type
+3/-0     
sum.go
Add MarshalBinary method to sum aggregation context           

pkg/sql/plan/function/agg/sum.go

  • Added MarshalBinary() method to aggSumDecimal type
  • Reformatted method signatures for alignment
+3/-2     
types.go
Add AllGroupHash method to HashMap interface                         

pkg/common/hashmap/types.go

  • Added AllGroupHash() method signature to HashMap interface
  • Method returns all (group, hashCode) pairs as a slice of uint64
+2/-0     
string_hash_map.go
Implement AllGroupHash for StringHashMap                                 

pkg/container/hashtable/string_hash_map.go

  • Added AllGroupHash() method implementation that returns hash codes for
    all mapped groups
+12/-0   
int64_hash_map.go
Implement AllGroupHash for Int64HashMap                                   

pkg/container/hashtable/int64_hash_map.go

  • Added AllGroupHash() method implementation that returns hash codes for
    all mapped groups
+12/-0   
aggContext.go
Add method to retrieve group contexts for marshalling       

pkg/sql/colexec/aggexec/aggContext.go

  • Added getGroupContextBinaryMarshaller() method to retrieve group
    contexts for binary marshalling
+7/-0     
aggMethod.go
Add MarshalBinary to AggCanMarshal interface                         

pkg/sql/colexec/aggexec/aggMethod.go

  • Added MarshalBinary() ([]byte, error) method signature to
    AggCanMarshal interface
+1/-0     
inthashmap.go
Implement AllGroupHash for IntHashMap                                       

pkg/common/hashmap/inthashmap.go

  • Added AllGroupHash() method that delegates to underlying hash map
    implementation
+4/-0     
strhashmap.go
Implement AllGroupHash for StrHashMap                                       

pkg/common/hashmap/strhashmap.go

  • Added AllGroupHash() method that delegates to underlying hash map
    implementation
+4/-0     
bitmap.go
Add MarshalBinary method to bitmap aggregation context     

pkg/sql/plan/function/agg/bitmap.go

  • Added MarshalBinary() method to aggBitmapGroupContext type
+1/-0     
Tests
4 files
result_test.go
Test updates for distinct parameter in result functions   

pkg/sql/colexec/aggexec/result_test.go

  • Updates test calls to pass hasDistinct parameter to result
    initialization functions
  • Updates marshalToBytes() calls to handle third return value for
    distinct data
  • Updates unmarshalFromBytes() calls to pass nil for distinct data
    parameter
  • Fixes import ordering (testing import moved to top)
+17/-14 
remoterun_test.go
Update test to use group.MergeGroup                                           

pkg/sql/compile/remoterun_test.go

  • Removed import of mergegroup package
  • Changed &mergegroup.MergeGroup{} to &group.MergeGroup{} in test data
+1/-18   
batch_test.go
Update batch tests for extra buffer fields                             

pkg/container/batch/batch_test.go

  • Reordered imports to place testing before other imports
  • Added assertions in test to verify ExtraBuf1 and ExtraBuf2 fields are
    preserved during marshal/unmarshal
  • Removed aggregation function setup from newBatch() helper
  • Added initialization of ExtraBuf1 and ExtraBuf2 fields in test batch
+9/-3     
aggFrame_test.go
Add MarshalBinary method to test context                                 

pkg/sql/colexec/aggexec/aggFrame_test.go

  • Added MarshalBinary() method to avgDemoCtx test type
  • Reformatted method signatures for alignment
+4/-3     
Refactoring
7 files
window.go
Window operator refactoring for expression evaluation       

pkg/sql/colexec/window/window.go

  • Changes aggVecs and orderVecs from group.ExprEvalVector to
    colexec.ExprEvalVector
  • Replaces ctr.bat.Aggs with ctr.batAggs to separate batch data from
    aggregation state
  • Updates MakeEvalVector() calls to use colexec package instead of group
    package
  • Removes dependency on group package for expression evaluation
+12/-13 
remoterun.go
Migrate MergeGroup from mergegroup to group package           

pkg/sql/compile/remoterun.go

  • Removed import of mergegroup package
  • Changed *mergegroup.MergeGroup references to *group.MergeGroup
  • Updated mergegroup.NewArgument() call to group.NewArgumentMergeGroup()
  • Updated function signatures for EncodeMergeGroup() and
    DecodeMergeGroup() to use *group.MergeGroup
+5/-6     
scope.go
Migrate MergeGroup references and simplify optimization logic

pkg/sql/compile/scope.go

  • Removed import of mergegroup package
  • Changed findMergeGroup() return type from *mergegroup.MergeGroup to
    *group.MergeGroup
  • Updated type assertion from *mergegroup.MergeGroup to
    *group.MergeGroup
  • Simplified aggOptimize() logic by removing the assignment of
    PartialResults and PartialResultTypes to mergeGroup
+3/-7     
types.go
Refactor window container to use colexec.ExprEvalVector   

pkg/sql/colexec/window/types.go

  • Reordered imports to place colexec before aggexec
  • Removed import of group package
  • Added batAggs field to container struct for storing aggregation
    functions
  • Changed orderVecs and aggVecs types from []group.ExprEvalVector to
    []colexec.ExprEvalVector
  • Updated freeAggFun() to use ctr.batAggs instead of ctr.bat.Aggs
+7/-6     
operator.go
Migrate MergeGroup construction to group package                 

pkg/sql/compile/operator.go

  • Removed import of mergegroup package
  • Changed constructMergeGroup() return type from *mergegroup.MergeGroup
    to *group.MergeGroup
  • Updated mergegroup.NewArgument() call to group.NewArgumentMergeGroup()
+2/-3     
compile.go
Update MergeGroup type reference in compile                           

pkg/sql/compile/compile.go

  • Removed import of mergegroup package
  • Changed *mergegroup.MergeGroup type assertion to *group.MergeGroup in
    compileProjection() method
+1/-2     
types.go
Replace Aggs field with extra buffer fields in Batch         

pkg/container/batch/types.go

  • Removed import of aggexec package
  • Replaced Aggs []aggexec.AggFuncExec field with ExtraBuf1 []byte and
    ExtraBuf2 []byte fields
  • Reformatted struct field alignment
+3/-4     
Formatting
2 files
buffer.go
Reorder imports for consistency                                                   

pkg/container/pSpool/buffer.go

  • Reordered imports to place sync before other imports
+2/-7     
copy.go
Reorder imports for consistency                                                   

pkg/container/pSpool/copy.go

  • Reordered imports to place math before other imports
+2/-4     
Bug fix
2 files
process.go
Update spill file service to return LocalFS type                 

pkg/vm/process/process.go

  • Added import for moerr package
  • Changed GetSpillFileService() return type from
    fileservice.MutableFileService to *fileservice.LocalFS
  • Added type assertion and validation to ensure spillfs is a
    *fileservice.LocalFS instance
+10/-2   
index_util.go
Fix hash code slice allocation logic                                         

pkg/sql/util/index_util.go

  • Fixed hash code slice capacity check from rowCount > cap(hashCode) to
    rowCount > len(hashCode)
  • Added else clause to slice hashCode to exact rowCount when capacity is
    sufficient
+3/-1     
Additional files
9 files
exec.go +0/-503 
exec_test.go +0/-600 
execctx.go +0/-419 
execctx_test.go +0/-96   
types.go +0/-240 
exec.go +0/-221 
exec_test.go +0/-219 
types.go +0/-101 
top.go +0/-1     

@mergify mergify bot added the kind/bug (Something isn't working) and kind/feature labels Oct 25, 2025
@qodo-merge-pro qodo-merge-pro bot changed the title Fengttt spill doit. Fengttt spill doit. Oct 25, 2025

qodo-merge-pro bot commented Oct 25, 2025


PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Insecure spill I/O

Description: Spill files are created with predictable names (UUID + bucket index) and written without
explicit fsync/sync or permission hardening, and error handling ignores the Write return
value, risking partial writes and potential data corruption or leakage if the spill file
service maps to shared storage.
helper.go [160-241]

Referred Code
	return count, kth
}

func (ctr *container) spillDataToDisk(proc *process.Process, parentBkt *spillBucket) error {
	// we only allow to spill up to spillMaxPass passes.
	// each pass we take spillMaskBits bits from the hashCode, and use them as the index
	// to select the spill bucket.  Default params, 32^3 = 32768 spill buckets -- if this
	// is still not enough, probably we cannot do much anyway, just fail the query.
	if parentBkt.lv >= spillMaxPass {
		return moerr.NewInternalError(proc.Ctx, "spill level too deep")
	}

	lv := parentBkt.lv + 1

	// initialing spill structure.
	if len(parentBkt.again) == 0 {
		parentBkt.again = make([]spillBucket, spillNumBuckets)
		fnuuid, _ := uuid.NewV7()

		// log the spill file name.
		logutil.Infof("spilling data to disk, level %d, file %s", parentBkt.lv, fnuuid.String())


 ... (clipped 61 lines)
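For contrast, a minimal sketch of the hardening this note asks for: restrictive 0600 permissions, O_EXCL so a predictable name cannot silently clobber an existing file, a checked Write result, and an fsync before the spill file is relied upon. writeSpillFile and the path are illustrative only, not the PR's code:

package main

import (
	"fmt"
	"os"
)

func writeSpillFile(path string, data []byte) error {
	// O_EXCL fails if the file already exists; 0600 keeps it private.
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()

	if _, err := f.Write(data); err != nil { // do not ignore the Write result
		return err
	}
	return f.Sync() // flush to stable storage before trusting the spill file
}

func main() {
	tmp, _ := os.MkdirTemp("", "spill")
	defer os.RemoveAll(tmp)
	fmt.Println(writeSpillFile(tmp+"/bucket-0", []byte("serialized agg state")))
}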
Untrusted input parsing

Description: Loading spilled data does not validate sizes beyond trusting encoded lengths and lacks
robust EOF checks, which could allow malformed spill files to cause excessive memory
allocation or panic when reading from untrusted or corrupted storage.
helper.go [309-370]

Referred Code
	var next int
	var nextBkt *spillBucket

	for next = 0; next < len(bkt.again); next++ {
		nextBkt = bkt.again[next].nextToLoadBkt()
		if nextBkt != nil {
			break
		}
	}

	if nextBkt == nil {
		bkt.again = nil
		return nil
	}

	bkt.again = bkt.again[next:]
	return nextBkt
}

func (bkt *spillBucket) free(proc *process.Process) {
	if bkt.file != nil {


 ... (clipped 41 lines)
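A sketch of the size validation this note suggests when reading length-prefixed spill data; maxSpillChunk and readSizedBytes are hypothetical names, and io.ReadFull makes short reads and EOF explicit:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

const maxSpillChunk = 64 << 20 // refuse anything over 64 MiB per chunk

func readSizedBytes(r io.Reader) ([]byte, error) {
	var szBuf [4]byte
	if _, err := io.ReadFull(r, szBuf[:]); err != nil {
		return nil, err // surfaces EOF and short reads explicitly
	}
	sz := binary.LittleEndian.Uint32(szBuf[:])
	if sz > maxSpillChunk {
		return nil, fmt.Errorf("spill chunk size %d exceeds limit", sz)
	}
	buf := make([]byte, sz) // allocation is bounded by the check above
	_, err := io.ReadFull(r, buf)
	return buf, err
}

func main() {
	var b bytes.Buffer
	b.Write([]byte{4, 0, 0, 0})
	b.WriteString("data")
	got, err := readSizedBytes(&b)
	fmt.Println(string(got), err) // data <nil>
}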
Unbounded aggregation growth

Description: GROUP_CONCAT concatenates unbounded strings without enforcing a maximum length, enabling
potential memory exhaustion when processing large distinct sets or long separators.
concat.go [146-155]

Referred Code
x, y := exec.ret.updateNextAccessIdx(groupIndex)
exec.ret.setGroupNotEmpty(x, y)

if need, err := exec.ret.distinctFill(x, y, vectors, row); err != nil || !need {
	return err
}

r := exec.ret.get()
if len(r) > 0 {
	r = append(r, exec.separator...)
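A sketch of the bound this note asks for; MySQL exposes a comparable cap as group_concat_max_len. maxConcatLen and appendConcat are illustrative and not the PR's implementation:

package main

import "fmt"

const maxConcatLen = 1024

// appendConcat appends separator and value, then truncates so the
// accumulated result never grows without bound.
func appendConcat(r []byte, sep, val string) []byte {
	if len(r) > 0 {
		r = append(r, sep...)
	}
	r = append(r, val...)
	if len(r) > maxConcatLen {
		r = r[:maxConcatLen]
	}
	return r
}

func main() {
	var r []byte
	for i := 0; i < 500; i++ {
		r = appendConcat(r, ",", "value")
	}
	fmt.Println(len(r)) // capped at 1024
}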
Ticket Compliance
🎫 No ticket provided
- [ ] Create ticket/issue

Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend 🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label


qodo-merge-pro bot commented Oct 25, 2025


PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix incorrect spill bucket calculation

Fix the incorrect bitmask in the spill bucket calculation to ensure proper data
distribution across buckets by using (1 << spillMaskBits) - 1.

pkg/sql/colexec/group/helper.go [197-201]

 	// compute spill bucket.
 	hashCodes := ctr.hr.Hash.AllGroupHash()
 	for i, hashCode := range hashCodes {
-		hashCodes[i] = (hashCode >> (64 - spillMaskBits*lv)) & (spillMaskBits - 1)
+		hashCodes[i] = (hashCode >> (64 - spillMaskBits*lv)) & ((1 << spillMaskBits) - 1)
 	}
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical bug in the spill bucket calculation where an incorrect bitmask (spillMaskBits - 1) would lead to poor data distribution, and provides the correct fix.

High
Prevent panic from out-of-bounds access

Correct the boundary check for group.GroupingFlag to prevent a potential panic
from an out-of-bounds slice access.

pkg/sql/colexec/group/types2.go [106-109]

 	// FUBAR: check if the grouping flag length is too big,
-	if len(group.ctr.groupByEvaluate.Vec) >= len(group.GroupingFlag) {
-		return moerr.NewInternalErrorNoCtx("grouping flag length too big")
+	if len(group.ctr.groupByEvaluate.Vec) > len(group.GroupingFlag) {
+		return moerr.NewInternalErrorNoCtx("mismatch between grouping expressions and grouping flags")
 	}
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a logic error in a boundary check that could lead to a panic due to an out-of-bounds slice access and provides a correct fix.

Medium
Handle potential error from write

The WriteInt64 function should return the error from the underlying w.Write call
instead of always returning nil.

pkg/container/types/encoding.go [579-582]

 func WriteInt64(w io.Writer, v int64) error {
-	w.Write(EncodeInt64(&v))
-	return nil
+	_, err := w.Write(EncodeInt64(&v))
+	return err
 }
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies that an error from w.Write is ignored, which could lead to silent failures, and the proposed fix correctly propagates the error.

Medium
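To see the helper pattern end to end, here is a self-contained round-trip in the same shape as the encoding.go helpers; the real functions live in pkg/container/types and use EncodeInt64 plus optional mpool allocation, and the ReadInt64 signature here is an assumption:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

func WriteInt64(w io.Writer, v int64) error {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(v))
	_, err := w.Write(buf[:]) // propagate the write error, per the suggestion above
	return err
}

func ReadInt64(r io.Reader) (int64, error) {
	var buf [8]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	return int64(binary.LittleEndian.Uint64(buf[:])), nil
}

func main() {
	var b bytes.Buffer
	_ = WriteInt64(&b, 42)
	v, _ := ReadInt64(&b)
	fmt.Println(v) // 42
}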
General
Correctly free and clear aggregators
Suggestion Impact: The commit changed loadSpilledData to stop using the ineffective nil assignment to the loop variable, replacing the cleanup with a call to ctr.resetForSpill(proc) and later reconstructing aggList and spillAggList. This achieves the intended effect of correctly freeing and resetting the aggregators, albeit via a helper rather than explicit index loops and slice nil-ing.

code diff:

+	// popped bkt must be freed.
 	bkt := ctr.spillBkts.PopBack().Value
+	defer bkt.free(proc)
+
 	// reposition to the start of the file.
 	bkt.file.Seek(0, io.SeekStart)
 
-	// reset data structures,
-	ctr.hr.Free0()
-
-	// reset rowcount group by batches.
-	for _, gb := range ctr.groupByBatches {
-		gb.SetRowCount(0)
-	}
-	ctr.currBatchIdx = 0
-
-	// Free, will clean the resource and reuse the aggregation.
-	for _, ag := range ctr.aggList {
-		ag.Free()
-	}
-	for _, ag := range ctr.spillAggList {
-		ag.Free()
-	}
-
-	gbBatch := ctr.createNewGroupByBatch(proc, ctr.groupByBatches[0].Vecs, aggBatchSize)
+	ctr.resetForSpill(proc)
+	ctr.aggList, err = ctr.makeAggList(proc, aggExprs)
+	if err != nil {
+		return false, err
+	}
+	ctr.spillAggList, err = ctr.makeAggList(proc, aggExprs)
+	if err != nil {
+		return false, err
+	}
+
+	gbBatch := ctr.createNewGroupByBatch(proc, nil, aggBatchSize)
+	totalCnt := int64(0)
 
 	for {

Refactor the aggregator cleanup logic in loadSpilledData to correctly free
resources and then set the entire slice to nil, removing the ineffective
assignment to the loop variable.

pkg/sql/colexec/group/helper.go [363-370]

-	for _, ag := range ctr.aggList {
-		ag.Free()
-		ag = nil
+	for i := range ctr.aggList {
+		ctr.aggList[i].Free()
 	}
-	for _, ag := range ctr.spillAggList {
-		ag.Free()
-		ag = nil
+	ctr.aggList = nil
+
+	for i := range ctr.spillAggList {
+		ctr.spillAggList[i].Free()
 	}
+	ctr.spillAggList = nil

[Suggestion processed]

Suggestion importance[1-10]: 5


Why: The suggestion correctly points out that assigning nil to a loop variable is ineffective and misleading, proposing a cleaner way to free resources and clear the slices, which improves code clarity and correctness.

Low

@matrix-meow matrix-meow added the size/XL (Denotes a PR that changes [1000, 1999] lines) label Oct 26, 2025
@fengttt fengttt force-pushed the fengttt-spill-doit branch from 1463d51 to 54c01cb on October 27, 2025 at 19:13

Labels

kind/bug (Something isn't working) · kind/feature · Review effort 5/5 · size/XL (Denotes a PR that changes [1000, 1999] lines)


8 participants