Fengttt spill doit. #22686

fengttt · 2025-10-25T10:17:55Z

User description

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

HashAgg spill. Refactor group and merge group code. Now if agg group mem usage
reached a threshold, we will spill to disk and read it back.

merge group is simply spill to network, and read it back from network.

PR Type

Bug fix, Enhancement, Tests

Description

Implements comprehensive HashAgg spill-to-disk functionality with memory management and multi-pass support
Adds new Group and MergeGroup operators with state machine architecture (Build/Eval/End states) for handling group-by and aggregation operations
Implements bucket-based partitioning and spill file I/O with serialization/deserialization of aggregation state
Adds distinct value tracking and serialization support across all aggregation functions (COUNT, SUM, AVG, MEDIAN, GROUP_CONCAT, APPROX_COUNT, window functions, etc.)
Refactors batch serialization to use flexible ExtraBuf1 and ExtraBuf2 fields instead of direct Aggs field storage
Migrates MergeGroup operator from separate mergegroup package to unified group package
Adds reader-based vector and batch deserialization for streaming I/O support
Implements AllGroupHash() method across all hash map implementations for spill bucket computation
Adds utility functions for primitive type serialization/deserialization in encoding module
Fixes hash code slice allocation logic and spill file service type handling
Updates window operator to use colexec.ExprEvalVector instead of group.ExprEvalVector
Adds MarshalBinary() methods to aggregation context types for serialization support

Diagram Walkthrough

flowchart LR
  A["Input Batches"] -->|"Build Phase"| B["Group Operator<br/>Hash Table"]
  B -->|"Memory Exceeded"| C["Spill to Disk<br/>Bucket Partitioning"]
  C -->|"Serialize"| D["Spill Files"]
  D -->|"Load & Merge"| E["MergeGroup Operator"]
  E -->|"Deserialize"| F["Aggregation Results"]
  B -->|"Eval Phase"| G["Aggregation Exec<br/>Distinct Tracking"]
  G -->|"End Phase"| F
  H["Distinct Hash"] -->|"Serialize"| I["Extra Buffers"]
  I -->|"Deserialize"| J["Merged Results"]

File Walkthrough

Relevant files

Feature

4 files

exec2.go `New HashAgg spill implementation with state machine` pkg/sql/colexec/group/exec2.go Implements new HashAgg spill functionality with memory management and disk spillover Adds `Prepare()` and `Call()` methods for group operator state machine (Build/Eval/End states) Implements batch building with hash table insertion and aggregation filling Adds spill-to-disk mechanism with bucket-based partitioning and multi-pass support	+530/-0
helper.go `Spill file I/O and hash table helper functions` pkg/sql/colexec/group/helper.go Implements `ResHashRelated` struct for hash table management with insert tracking Adds `spillDataToDisk()` to serialize group-by batches and aggregates to spill files Implements `loadSpilledData()` to deserialize and merge spilled data back into memory Adds helper functions for bucket computation and spill bucket traversal	+424/-0
types2.go `Group operator type definitions and container management` pkg/sql/colexec/group/types2.go Defines `Group` and `MergeGroup` operator types with spill memory configuration Implements container struct with hash table, group-by batches, and spill bucket management Adds evaluation methods for group-by and aggregation arguments Provides memory management and reset/free operations for operator lifecycle	+363/-0
mergeGroup.go `MergeGroup operator for partial result merging with spill` pkg/sql/colexec/group/mergeGroup.go Implements `MergeGroup` operator for merging partial aggregation results Adds `Prepare()` and `Call()` methods with spill support during merge phase Deserializes aggregation expressions from batch metadata during first merge Handles both grouped and non-grouped aggregation merging with hash table insertion	+243/-0

Enhancement

29 files

result.go `Add distinct tracking and spill serialization to aggregation results` pkg/sql/colexec/aggexec/result.go Adds `hasDistinct` parameter to result initialization functions for distinct value tracking Implements serialization methods `marshalToBuffers()` and `marshalChunkToBuffer()` for spill support Adds deserialization from reader via `unmarshalFromReader()` for loading spilled data Adds `distinctFill()` and `distinctMerge()` methods to handle distinct aggregation during merge	+193/-24
batch.go `Batch serialization refactoring for spill support` pkg/container/batch/batch.go Replaces `Aggs` field serialization with `ExtraBuf1` and `ExtraBuf2` for flexible metadata storage Adds `UnmarshalFromReader()` method for streaming deserialization from I/O readers Simplifies `MarshalBinary()` to delegate to `MarshalBinaryWithBuffer()` Updates `IsEmpty()` check to only consider row count, not aggregates	+80/-92
types.go `Aggregation function serialization interface and helpers` pkg/sql/colexec/aggexec/types.go Adds interface methods `SaveIntermediateResult()`, `SaveIntermediateResultOfChunk()`, and `UnmarshalFromReader()` to `AggFuncExec` Implements generic marshaling helpers `marshalRetAndGroupsToBuffer()` and `marshalChunkToBuffer()` for spill serialization Adds `AggFuncExecExpression` serialization via `MarshalToBuffer()` and `UnmarshalFromReader()` Provides generic unmarshaling helper `unmarshalFromReader()` for deserialization	+213/-0
concat.go `GROUP_CONCAT distinct handling refactoring for spill` pkg/sql/colexec/aggexec/concat.go Moves distinct hash handling from `groupConcatExec` to result object via `distinctFill()` and `distinctMerge()` Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for spill serialization Adds `UnmarshalFromReader()` for loading spilled group_concat state Updates initialization to pass `hasDistinct` flag to result object	+53/-34
approx_count.go `APPROX_COUNT spill serialization support` pkg/sql/colexec/aggexec/approx_count.go Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for HyperLogLog sketch serialization Adds `UnmarshalFromReader()` to deserialize sketch groups from spilled data Updates initialization to pass `hasDistinct` flag (false) to result object Adds validation that distinct data is nil in marshal operations	+58/-6
count.go `COUNT aggregation distinct and spill serialization` pkg/sql/colexec/aggexec/count.go Moves distinct hash from `countColumnExec` to result object via `distinctFill()` Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for spill serialization Adds `UnmarshalFromReader()` for loading spilled count state Updates `countStarExec` with spill serialization methods and validation	+52/-26
window.go `Window function spill serialization support` pkg/sql/colexec/aggexec/window.go Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for rank/dense_rank/row_number serialization Adds `UnmarshalFromReader()` to deserialize window function groups from spilled data Defines `i64Slice` type with `MarshalBinary()` for int64 slice serialization Updates initialization to pass `hasDistinct=false` to result object	+55/-6
median.go `MEDIAN aggregation spill serialization support` pkg/sql/colexec/aggexec/median.go Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for median value serialization Adds `UnmarshalFromReader()` to deserialize median groups from spilled data Moves distinct handling to result object initialization via `hasDistinct` parameter Removes separate `distinctHash` field from median executor	+50/-6
fromFixedRetFixed.go `Generic fixed-to-fixed aggregation spill support` pkg/sql/colexec/aggexec/fromFixedRetFixed.go Implements `SaveIntermediateResult()` and `SaveIntermediateResultOfChunk()` for generic fixed-to-fixed aggregation serialization Adds `UnmarshalFromReader()` to deserialize group contexts from spilled data Updates initialization to pass `hasDistinct` flag to result object Adds validation that distinct data is nil in marshal operations	+33/-4
fromFixedRetBytes.go `Add spill support and distinct handling to fixed-to-bytes aggregator` pkg/sql/colexec/aggexec/fromFixedRetBytes.go Added imports for `bytes`, `io`, and `moerr` packages Modified `marshal()` to handle a new `dist` return value from `marshalToBytes()` with validation Added three new methods: `SaveIntermediateResult()`, `SaveIntermediateResultOfChunk()`, and `UnmarshalFromReader()` for spill support Updated `unmarshal()` to pass `nil` for distinct parameter Modified `init()` to pass `info.IsDistinct()` flag to `initAggResultWithBytesTypeResult()` and removed separate distinct hash initialization	+35/-13
fromBytesRetFixed.go `Add spill support to bytes-to-fixed aggregator` pkg/sql/colexec/aggexec/fromBytesRetFixed.go Added imports for `bytes`, `io`, and `moerr` packages Modified `marshal()` to handle new `dist` return value with validation Added three new methods: `SaveIntermediateResult()`, `SaveIntermediateResultOfChunk()`, and `UnmarshalFromReader()` for spill support Updated `unmarshal()` to pass `nil` for distinct parameter Modified `init()` to pass `false` for distinct flag to `initAggResultWithFixedTypeResult()`	+31/-3
fromBytesRetBytes.go `Add spill support to bytes-to-bytes aggregator` pkg/sql/colexec/aggexec/fromBytesRetBytes.go Added imports for `bytes`, `io`, and `moerr` packages Modified `marshal()` to handle new `dist` return value with validation Updated `unmarshal()` to pass `nil` for distinct parameter Added three new methods: `SaveIntermediateResult()`, `SaveIntermediateResultOfChunk()`, and `UnmarshalFromReader()` for spill support Modified `init()` to pass `false` for distinct flag to `initAggResultWithBytesTypeResult()`	+34/-3
encoding.go `Add I/O utility functions for type serialization` pkg/container/types/encoding.go Added import for `mpool` package Added utility functions for reading/writing primitive types and bytes: `WriteSizeBytes()`, `ReadInt64()`, `WriteInt64()`, `ReadBool()`, `ReadInt32()`, `WriteInt32()`, `ReadInt32AsInt()`, `ReadByte()`, `ReadByteAsInt()`, `ReadType()`, `ReadSizeBytes()` These functions support serialization/deserialization with optional memory pool allocation	+104/-0
vector.go `Add reader-based vector deserialization method` pkg/container/vector/vector.go Added `io` import Added new method `UnmarshalWithReader()` that deserializes a vector from an `io.Reader` with support for memory pool allocation Reads vector metadata (class, type, length) and data buffers (data, area, nsp) sequentially from reader	+56/-0
var_pop.go `Add MarshalBinary methods to var_pop aggregation contexts` pkg/sql/plan/function/agg/var_pop.go Reordered imports to place `math` before other imports Added `MarshalBinary()` method to `aggVarPopGroupContext`, `aggVarPopOfDecimalGroupContext`, and `aggVarPopOfDecimalCommonContext` types	+5/-1
evalExpression.go `Add ExprEvalVector type for expression evaluation` pkg/sql/colexec/evalExpression.go Added new `ExprEvalVector` struct type with fields for executors, vectors, and types Added `MakeEvalVector()` function to create and initialize `ExprEvalVector` from plan expressions Added `Free()` and `ResetForNextQuery()` methods to `ExprEvalVector` for resource management	+40/-0
avg.go `Add MarshalBinary methods to avg aggregation contexts` pkg/sql/plan/function/agg/avg.go Added `MarshalBinary()` method to `aggAvgContext` type Added `MarshalBinary()` method to `aggAvgDecimalCommonCtx` type Reformatted method signatures for alignment	+5/-3
distinct.go `Add reader-based deserialization for distinct hash` pkg/sql/colexec/aggexec/distinct.go Added `marshalToBuffers()` method to serialize distinct hash maps based on flags Refactored `unmarshal()` to delegate to new `unmarshalFromReader()` method Added `unmarshalFromReader()` method to deserialize from `io.Reader` instead of byte slice	+15/-0
avg_tw_result.go `Add MarshalBinary methods to avg_tw_result contexts` pkg/sql/plan/function/agg/avg_tw_result.go Added `MarshalBinary()` method to `AvgTwResultContext` type Added `MarshalBinary()` method to `AvgTwResultDecimalContext` type	+2/-1
avg_tw_cache.go `Add MarshalBinary methods to avg_tw_cache contexts` pkg/sql/plan/function/agg/avg_tw_cache.go Added `MarshalBinary()` method to `AvgTwCacheContext` type Added `MarshalBinary()` method to `AvgTwCacheDecimalContext` type	+3/-0
sum.go `Add MarshalBinary method to sum aggregation context` pkg/sql/plan/function/agg/sum.go Added `MarshalBinary()` method to `aggSumDecimal` type Reformatted method signatures for alignment	+3/-2
types.go `Add AllGroupHash method to HashMap interface` pkg/common/hashmap/types.go Added `AllGroupHash()` method signature to `HashMap` interface Method returns all (group, hashCode) pairs as a slice of uint64	+2/-0
string_hash_map.go `Implement AllGroupHash for StringHashMap` pkg/container/hashtable/string_hash_map.go Added `AllGroupHash()` method implementation that returns hash codes for all mapped groups	+12/-0
int64_hash_map.go `Implement AllGroupHash for Int64HashMap` pkg/container/hashtable/int64_hash_map.go Added `AllGroupHash()` method implementation that returns hash codes for all mapped groups	+12/-0
aggContext.go `Add method to retrieve group contexts for marshalling` pkg/sql/colexec/aggexec/aggContext.go Added `getGroupContextBinaryMarshaller()` method to retrieve group contexts for binary marshalling	+7/-0
aggMethod.go `Add MarshalBinary to AggCanMarshal interface` pkg/sql/colexec/aggexec/aggMethod.go Added `MarshalBinary() ([]byte, error)` method signature to `AggCanMarshal` interface	+1/-0
inthashmap.go `Implement AllGroupHash for IntHashMap` pkg/common/hashmap/inthashmap.go Added `AllGroupHash()` method that delegates to underlying hash map implementation	+4/-0
strhashmap.go `Implement AllGroupHash for StrHashMap` pkg/common/hashmap/strhashmap.go Added `AllGroupHash()` method that delegates to underlying hash map implementation	+4/-0
bitmap.go `Add MarshalBinary method to bitmap aggregation context` pkg/sql/plan/function/agg/bitmap.go Added `MarshalBinary()` method to `aggBitmapGroupContext` type	+1/-0

Tests

4 files

result_test.go `Test updates for distinct parameter in result functions` pkg/sql/colexec/aggexec/result_test.go Updates test calls to pass `hasDistinct` parameter to result initialization functions Updates `marshalToBytes()` calls to handle third return value for distinct data Updates `unmarshalFromBytes()` calls to pass nil for distinct data parameter Fixes import ordering (testing import moved to top)	+17/-14
remoterun_test.go `Update test to use group.MergeGroup` pkg/sql/compile/remoterun_test.go Removed import of `mergegroup` package Changed `&mergegroup.MergeGroup{}` to `&group.MergeGroup{}` in test data	+1/-18
batch_test.go `Update batch tests for extra buffer fields` pkg/container/batch/batch_test.go Reordered imports to place `testing` before other imports Added assertions in test to verify `ExtraBuf1` and `ExtraBuf2` fields are preserved during marshal/unmarshal Removed aggregation function setup from `newBatch()` helper Added initialization of `ExtraBuf1` and `ExtraBuf2` fields in test batch	+9/-3
aggFrame_test.go `Add MarshalBinary method to test context` pkg/sql/colexec/aggexec/aggFrame_test.go Added `MarshalBinary()` method to `avgDemoCtx` test type Reformatted method signatures for alignment	+4/-3

Refactoring

7 files

window.go `Window operator refactoring for expression evaluation` pkg/sql/colexec/window/window.go Changes `aggVecs` and `orderVecs` from `group.ExprEvalVector` to `colexec.ExprEvalVector` Replaces `ctr.bat.Aggs` with `ctr.batAggs` to separate batch data from aggregation state Updates `MakeEvalVector()` calls to use `colexec` package instead of `group` package Removes dependency on group package for expression evaluation	+12/-13
remoterun.go `Migrate MergeGroup from mergegroup to group package` pkg/sql/compile/remoterun.go Removed import of `mergegroup` package Changed `mergegroup.MergeGroup` references to `group.MergeGroup` Updated `mergegroup.NewArgument()` call to `group.NewArgumentMergeGroup()` Updated function signatures for `EncodeMergeGroup()` and `DecodeMergeGroup()` to use `*group.MergeGroup`	+5/-6
scope.go `Migrate MergeGroup references and simplify optimization logic` pkg/sql/compile/scope.go Removed import of `mergegroup` package Changed `findMergeGroup()` return type from `mergegroup.MergeGroup` to `group.MergeGroup` Updated type assertion from `mergegroup.MergeGroup` to `group.MergeGroup` Simplified `aggOptimize()` logic by removing the assignment of `PartialResults` and `PartialResultTypes` to mergeGroup	+3/-7
types.go `Refactor window container to use colexec.ExprEvalVector` pkg/sql/colexec/window/types.go Reordered imports to place `colexec` before `aggexec` Removed import of `group` package Added `batAggs` field to `container` struct for storing aggregation functions Changed `orderVecs` and `aggVecs` types from `[]group.ExprEvalVector` to `[]colexec.ExprEvalVector` Updated `freeAggFun()` to use `ctr.batAggs` instead of `ctr.bat.Aggs`	+7/-6
operator.go `Migrate MergeGroup construction to group package` pkg/sql/compile/operator.go Removed import of `mergegroup` package Changed `constructMergeGroup()` return type from `mergegroup.MergeGroup` to `group.MergeGroup` Updated `mergegroup.NewArgument()` call to `group.NewArgumentMergeGroup()`	+2/-3
compile.go `Update MergeGroup type reference in compile` pkg/sql/compile/compile.go Removed import of `mergegroup` package Changed `mergegroup.MergeGroup` type assertion to `group.MergeGroup` in `compileProjection()` method	+1/-2
types.go `Replace Aggs field with extra buffer fields in Batch` pkg/container/batch/types.go Removed import of `aggexec` package Replaced `Aggs []aggexec.AggFuncExec` field with `ExtraBuf1 []byte` and `ExtraBuf2 []byte` fields Reformatted struct field alignment	+3/-4

Formatting

2 files

buffer.go `Reorder imports for consistency` pkg/container/pSpool/buffer.go Reordered imports to place `sync` before other imports	+2/-7
copy.go `Reorder imports for consistency` pkg/container/pSpool/copy.go Reordered imports to place `math` before other imports	+2/-4

Bug fix

2 files

process.go `Update spill file service to return LocalFS type` pkg/vm/process/process.go Added import for `moerr` package Changed `GetSpillFileService()` return type from `fileservice.MutableFileService` to `fileservice.LocalFS` Added type assertion and validation to ensure spillfs is a `fileservice.LocalFS` instance	+10/-2
index_util.go `Fix hash code slice allocation logic` pkg/sql/util/index_util.go Fixed hash code slice capacity check from `rowCount > cap(hashCode)` to `rowCount > len(hashCode)` Added else clause to slice `hashCode` to exact rowCount when capacity is sufficient	+3/-1

Additional files

9 files

exec.go	+0/-503
exec_test.go	+0/-600
execctx.go	+0/-419
execctx_test.go	+0/-96
types.go	+0/-240
exec.go	+0/-221
exec_test.go	+0/-219
types.go	+0/-101
top.go	+0/-1

qodo-merge-pro · 2025-10-25T10:18:59Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪	Insecure spill I/O Description: Spill files are created with predictable names (UUID + bucket index) and written without explicit fsync/sync or permission hardening, and error handling ignores Write return value, risking partial writes and potential data corruption or leakage if the spill file service maps to shared storage. helper.go [160-241] Referred Code return count, kth } func (ctr container) spillDataToDisk(proc process.Process, parentBkt *spillBucket) error { // we only allow to spill up to spillMaxPass passes. // each pass we take spillMaskBits bits from the hashCode, and use them as the index // to select the spill bucket. Default params, 32^3 = 32768 spill buckets -- if this // is still not enough, probably we cannot do much anyway, just fail the query. if parentBkt.lv >= spillMaxPass { return moerr.NewInternalError(proc.Ctx, "spill level too deep") } lv := parentBkt.lv + 1 // initialing spill structure. if len(parentBkt.again) == 0 { parentBkt.again = make([]spillBucket, spillNumBuckets) fnuuid, _ := uuid.NewV7() // log the spill file name. logutil.Infof("spilling data to disk, level %d, file %s", parentBkt.lv, fnuuid.String()) ... (clipped 61 lines)
	Untrusted input parsing Description: Loading spilled data does not validate sizes beyond trusting encoded lengths and lacks robust EOF checks, which could allow malformed spill files to cause excessive memory allocation or panic when reading from untrusted or corrupted storage. helper.go [309-370] Referred Code var next int var nextBkt spillBucket for next = 0; next < len(bkt.again); next++ { nextBkt = bkt.again[next].nextToLoadBkt() if nextBkt != nil { break } } if nextBkt == nil { bkt.again = nil return nil } bkt.again = bkt.again[next:] return nextBkt } func (bkt spillBucket) free(proc *process.Process) { if bkt.file != nil { ... (clipped 41 lines)
	Unbounded aggregation growth Description: GROUP_CONCAT concatenates unbounded strings without enforcing a maximum length, enabling potential memory exhaustion when processing large distinct sets or long separators. concat.go [146-155] Referred Code x, y := exec.ret.updateNextAccessIdx(groupIndex) exec.ret.setGroupNotEmpty(x, y) if need, err := exec.ret.distinctFill(x, y, vectors, row); err != nil \|\| !need { return err } r := exec.ret.get() if len(r) > 0 { r = append(r, exec.separator...)
Ticket Compliance
⚪	🎫 No ticket provided `- [ ] Create ticket/issue <!-- /create_ticket --create_ticket=true --> </details></td></tr>`
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
⚪	No custom compliance provided Follow the guide to enable custom compliance check.
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-merge-pro · 2025-10-25T10:20:06Z

You are nearing your monthly Qodo Merge usage quota. For more information, please visit here.

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix incorrect spill bucket calculation Fix the incorrect bitmask in the spill bucket calculation to ensure proper data distribution across buckets by using `(1 << spillMaskBits) - 1`. pkg/sql/colexec/group/helper.go [197-201] // compute spill bucket. hashCodes := ctr.hr.Hash.AllGroupHash() for i, hashCode := range hashCodes { - hashCodes[i] = (hashCode >> (64 - spillMaskBitslv)) & (spillMaskBits - 1) + hashCodes[i] = (hashCode >> (64 - spillMaskBitslv)) & ((1 << spillMaskBits) - 1) } Apply / Chat Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a critical bug in the spill bucket calculation where an incorrect bitmask (`spillMaskBits - 1`) would lead to poor data distribution, and provides the correct fix.	High
	Prevent panic from out-of-bounds access Correct the boundary check for `group.GroupingFlag` to prevent a potential panic from an out-of-bounds slice access. pkg/sql/colexec/group/types2.go [106-109] // FUBAR: check if the grouping flag length is too big, - if len(group.ctr.groupByEvaluate.Vec) >= len(group.GroupingFlag) { - return moerr.NewInternalErrorNoCtx("grouping flag length too big") + if len(group.ctr.groupByEvaluate.Vec) > len(group.GroupingFlag) { + return moerr.NewInternalErrorNoCtx("mismatch between grouping expressions and grouping flags") } Apply / Chat Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a logic error in a boundary check that could lead to a panic due to an out-of-bounds slice access and provides a correct fix.	Medium
	Handle potential error from write The `WriteInt64` function should return the error from the underlying `w.Write` call instead of always returning `nil`. pkg/container/types/encoding.go [579-582] func WriteInt64(w io.Writer, v int64) error { - w.Write(EncodeInt64(&v)) - return nil + _, err := w.Write(EncodeInt64(&v)) + return err } Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that an error from `w.Write` is ignored, which could lead to silent failures, and the proposed fix correctly propagates the error.	Medium
General	✅ ~~Correctly free and clear aggregators~~ Suggestion Impact: The commit changed loadSpilledData to stop using the ineffective loop variable nil assignment by replacing the cleanup with a call to ctr.resetForSpill(proc), and later reconstructing aggList and spillAggList. This achieves the intended effect of correctly freeing and resetting aggregators, albeit via a helper rather than explicit index loops and slice nil-ing. code diff: + // popped bkt must be freed. bkt := ctr.spillBkts.PopBack().Value + defer bkt.free(proc) + // reposition to the start of the file. bkt.file.Seek(0, io.SeekStart) - // reset data structures, - ctr.hr.Free0() - - // reset rowcount group by batches. - for _, gb := range ctr.groupByBatches { - gb.SetRowCount(0) - } - ctr.currBatchIdx = 0 - - // Free, will clean the resource and reuse the aggregation. - for _, ag := range ctr.aggList { - ag.Free() - } - for _, ag := range ctr.spillAggList { - ag.Free() - } - - gbBatch := ctr.createNewGroupByBatch(proc, ctr.groupByBatches[0].Vecs, aggBatchSize) + ctr.resetForSpill(proc) + ctr.aggList, err = ctr.makeAggList(proc, aggExprs) + if err != nil { + return false, err + } + ctr.spillAggList, err = ctr.makeAggList(proc, aggExprs) + if err != nil { + return false, err + } + + gbBatch := ctr.createNewGroupByBatch(proc, nil, aggBatchSize) + totalCnt := int64(0) for { Refactor the aggregator cleanup logic in `loadSpilledData` to correctly free resources and then set the entire slice to `nil`, removing the ineffective assignment to the loop variable. pkg/sql/colexec/group/helper.go [363-370] - for _, ag := range ctr.aggList { - ag.Free() - ag = nil + for i := range ctr.aggList { + ctr.aggList[i].Free() } - for _, ag := range ctr.spillAggList { - ag.Free() - ag = nil + ctr.aggList = nil + + for i := range ctr.spillAggList { + ctr.spillAggList[i].Free() } + ctr.spillAggList = nil `[Suggestion processed]` Suggestion importance[1-10]: 5 __ Why: The suggestion correctly points out that assigning `nil` to a loop variable is ineffective and misleading, proposing a cleaner way to free resources and clear the slices, which improves code clarity and correctness.	Low
Update

Unscrew. Expect tons of bugs ...

Now we have something that compiles, but there will be tons of bugs ...

And fix a few bugs.

yeah, I know ...

fengttt added the do-not-merge/wip label Oct 25, 2025

fengttt requested review from XuPeng-SH, aunjgr, ouyuanning and reusee as code owners October 25, 2025 10:17

mergify bot added kind/bug Something isn't working kind/feature labels Oct 25, 2025

qodo-merge-pro bot changed the title ~~Fengttt spill doit.~~ Fengttt spill doit. Oct 25, 2025

qodo-merge-pro bot added the Review effort 5/5 label Oct 25, 2025

mergify bot had a problem deploying to ci October 26, 2025 00:11 Failure

mergify bot temporarily deployed to ci October 26, 2025 00:11 Inactive

mergify bot had a problem deploying to ci October 26, 2025 00:11 Failure

matrix-meow added the size/XL Denotes a PR that changes [1000, 1999] lines label Oct 26, 2025

fengttt force-pushed the fengttt-spill-doit branch from 1463d51 to 54c01cb Compare October 27, 2025 19:13

fengttt had a problem deploying to ci October 27, 2025 19:14 — with GitHub Actions Failure

fengttt temporarily deployed to ci October 27, 2025 19:14 — with GitHub Actions Inactive

fengttt had a problem deploying to ci October 27, 2025 19:14 — with GitHub Actions Failure

aunjgr approved these changes Nov 11, 2025

View reviewed changes

XuPeng-SH approved these changes Nov 12, 2025

View reviewed changes

daviszhen approved these changes Nov 13, 2025

View reviewed changes

fengttt added 15 commits November 17, 2025 11:25

AggSpill 1: Adding save intermediate to agg exec.

e8dc896

Agg distinct is totally screwed.

79f1cb9

Unscrew. Expect tons of bugs ...

AggSpill 2: spill and load.

59cfcc1

AggSpill 3: Save

96417c5

Now we have something that compiles, but there will be tons of bugs ...

Minor fixes, SCA.

fafd298

Fixes. Finally I can run select count(*)

2a57f79

Fixes, more.

3a18101

Make sure spillmem is passed to mergegroup as well.

7a46249

And fix a few bugs.

Bug fixes.

bd4c0e6

Add a max_dop variable.

c78e695

Really spill.

65b5054

More fixes.

5d42ef2

test case last.

220a24d

yeah, I know ...

Clean ups.

5dc781f

Voodoo debugging.

fb96506

fengttt force-pushed the fengttt-spill-doit branch from b05a25f to fb96506 Compare November 17, 2025 19:25

fengttt had a problem deploying to ci November 17, 2025 19:25 — with GitHub Actions Failure

fengttt temporarily deployed to ci November 17, 2025 19:25 — with GitHub Actions Inactive

fengttt had a problem deploying to ci November 17, 2025 19:25 — with GitHub Actions Failure

fengttt temporarily deployed to ci November 17, 2025 19:25 — with GitHub Actions Inactive

fengttt had a problem deploying to ci November 17, 2025 19:25 — with GitHub Actions Failure

fengttt temporarily deployed to ci November 17, 2025 19:25 — with GitHub Actions Inactive

fengttt had a problem deploying to ci November 17, 2025 19:25 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fengttt spill doit. #22686

Fengttt spill doit. #22686

Uh oh!

fengttt commented Oct 25, 2025 •

edited

Loading

Uh oh!

qodo-merge-pro bot commented Oct 25, 2025 •

edited

Loading

Uh oh!

qodo-merge-pro bot commented Oct 25, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Fengttt spill doit. #22686

Are you sure you want to change the base?

Fengttt spill doit. #22686

Uh oh!

Conversation

fengttt commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-merge-pro bot commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-merge-pro bot commented Oct 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

fengttt commented Oct 25, 2025 •

edited

Loading

qodo-merge-pro bot commented Oct 25, 2025 •

edited

Loading

qodo-merge-pro bot commented Oct 25, 2025 •

edited

Loading