Enforce stored-procedure safety checks on the streaming CALL path by arthurschreiber · Pull Request #20372 · vitessio/vitess

arthurschreiber · 2026-06-23T12:30:05Z

Description

StreamExecute (the OLAP / streaming path) was sending a CALL straight through execStreamSQL, which reads only the first resultset and throws away the terminating status flags. So the two safety checks the buffered Execute path enforces never ran on the streaming path:

- a multi-resultset procedure silently returned only its first resultset (truncated/wrong data, no error), and
- a procedure that leaked or changed a transaction was silently accepted, leaving the recycled streaming connection in an open transaction.

While digging in I also found that a perfectly fine single-resultset CALL left its trailing OK packet unread on the connection, so the next query that reused that pooled connection could blow up with an out-of-sequence error. The new test ordering (a single-resultset call followed by a multi-resultset one) reproduces that.

How it works

The streaming protocol now records whether a query returned a resultset and, when it didn't, captures the status flags of the lone OK packet (the only place a no-resultset procedure's IN_TRANS flag is observable). A streamed CALL is then verified after streaming finishes: we read exactly one more result — with FETCH_NO_ROWS, so nothing is buffered — to tell a single-resultset call (only the trailing OK packet remains) from a multi-resultset one, drain any remainder so the connection stays clean for reuse, and reject a leaked/changed transaction (closing the connection in that case, like the buffered path does).
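The post-stream verification described above can be sketched roughly as follows. This is a minimal model under stated assumptions, not the code from this PR: `response`, `fakeConn`, and `verifyStreamedCall` are hypothetical names, each server reply is reduced to "a resultset" or "an OK packet with status flags", the CALL is assumed to start outside a transaction, and the order in which the two errors are prioritized is a guess.

```go
package main

import (
	"errors"
	"fmt"
)

// serverStatusInTrans mirrors the MySQL SERVER_STATUS_IN_TRANS status bit.
const serverStatusInTrans uint16 = 0x0001

// response is a simplified stand-in for one server reply: either a
// resultset (isResultset) or a bare OK packet carrying status flags.
type response struct {
	isResultset bool
	statusFlags uint16
}

// fakeConn (hypothetical) replays scripted replies like a pooled connection.
type fakeConn struct {
	unread []response
	closed bool
}

// readNext pops the next unread reply; ok is false when nothing is left.
func (c *fakeConn) readNext() (r response, ok bool) {
	if len(c.unread) == 0 {
		return response{}, false
	}
	r, c.unread = c.unread[0], c.unread[1:]
	return r, true
}

var (
	errMultiResultset = errors.New("Multi-Resultset not supported")
	errLeakedTx       = errors.New("Transaction not concluded inside the stored procedure, leaking transaction from stored procedure is not allowed")
)

// verifyStreamedCall runs after the first reply has been streamed to the
// client. It reads the remaining replies, drains any extra resultsets, and
// inspects the status flags of the final OK packet.
func verifyStreamedCall(c *fakeConn, first response) error {
	final, multi := first, false
	if first.isResultset {
		for {
			next, ok := c.readNext()
			if !ok {
				c.closed = true
				return errors.New("connection ended before the trailing OK packet")
			}
			final = next
			if !next.isResultset {
				break // trailing OK packet reached, connection is clean
			}
			multi = true // another resultset: keep draining
		}
	}
	leaked := final.statusFlags&serverStatusInTrans != 0
	if leaked {
		c.closed = true // never recycle a connection with an open transaction
	}
	// Which error wins when both apply is an assumption of this sketch.
	if multi {
		return errMultiResultset
	}
	if leaked {
		return errLeakedTx
	}
	return nil
}

func main() {
	single := &fakeConn{unread: []response{{}}}
	fmt.Println(verifyStreamedCall(single, response{isResultset: true}))

	multi := &fakeConn{unread: []response{{isResultset: true}, {statusFlags: serverStatusInTrans}}}
	fmt.Println(verifyStreamedCall(multi, response{isResultset: true}), multi.closed)
}
```

In this model a single-resultset call leaves nothing unread on the connection, while a multi-resultset call is fully drained before the error is returned.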

To get there, a streamed CALL is now planned as PlanCallProc instead of PlanSelectStream. That's what lets Stream() dispatch it to the new check, and it also takes it out of stream consolidation — which is the right thing regardless of this fix. Consolidation hands one execution's results to multiple concurrent callers of an identical query; that's safe for a plain SELECT, but a stored procedure can do DML, change transaction state, or otherwise have side effects, and every caller of a CALL expects their own invocation to actually run. Sharing a single procedure execution across callers was never correct, so classifying CALL as PlanCallProc (which the consolidator path is gated against) fixes that too.
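As a rough illustration of why the plan type matters for consolidation, the toy model below gates sharing on the plan. The lowercase names (`planID`, `classify`, `canConsolidate`, `executions`) are hypothetical and only loosely mirror PlanSelectStream and PlanCallProc; the real planner and consolidator are considerably more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// planID is a hypothetical stand-in for the tablet plan type.
type planID int

const (
	planSelectStream planID = iota
	planCallProc
)

// classify is a toy planner: anything starting with CALL is a procedure call.
func classify(sql string) planID {
	if strings.HasPrefix(strings.ToUpper(strings.TrimSpace(sql)), "CALL ") {
		return planCallProc
	}
	return planSelectStream
}

// canConsolidate reports whether identical concurrent queries may share one
// execution. Only side-effect-free streaming selects qualify in this sketch.
func canConsolidate(p planID) bool {
	return p == planSelectStream
}

// executions returns how many times the backend runs the query when the
// given number of callers issue it concurrently.
func executions(sql string, callers int) int {
	if callers > 0 && canConsolidate(classify(sql)) {
		return 1 // one execution, results fanned out to every caller
	}
	return callers // every caller gets its own invocation
}

func main() {
	fmt.Println(executions("SELECT * FROM t", 3))
	fmt.Println(executions("CALL sp_insert()", 3))
}
```

The point of the gate is that a procedure with side effects must run once per caller, even when the query text is identical.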

Deliberate deviation from `Execute`

This is not strict parity, on purpose. The buffered Execute path rejects any CALL that returns a resultset at all — even a single SELECT — because of the trailing OK packet that follows it (see TestCallProcedure, where proc_select1 is a wantErr case). On the streaming path we deliberately allow a single-resultset CALL to stream, since streaming a large resultset is the whole reason to use this path; forcing every CALL through the buffered path would re-impose maxResultSize and defeat that. We still reject multi-resultset and transaction-leaking procedures, with the same error messages as Execute.

For what it's worth, the Execute behavior here looks wrong to me: rejecting a procedure that returns a single resultset with "Multi-Resultset not supported" is surprising, and a single-resultset CALL is a perfectly reasonable thing to want to run. But changing that is a user-visible behavior change on the buffered path with its own compatibility and test implications, so it should be done separately and deliberately rather than smuggled in here. This PR intentionally leaves Execute exactly as it is and only makes the streaming path safe; the streaming path just happens to be the more permissive (and, I'd argue, more correct) of the two for the single-resultset case.

One inherent consequence: for a multi-resultset CALL we've already streamed the first resultset's rows to the client by the time we can see the violation, so the client gets rows followed by an error (whereas Execute buffers and sends nothing). That's unavoidable without giving up streaming, and the blast radius is limited to the offending query.
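The "rows followed by an error" behavior can be illustrated with a small sketch. Everything here is hypothetical (`streamCall`, the `send` callback, the string rows); it only shows that the callback has already seen the first resultset by the time the violation is detected.

```go
package main

import (
	"errors"
	"fmt"
)

var errMultiResultset = errors.New("Multi-Resultset not supported")

// streamCall forwards the rows of the first resultset to send as they
// arrive, and only afterwards learns whether more resultsets follow.
func streamCall(resultsets [][]string, send func(row string)) error {
	if len(resultsets) == 0 {
		return nil // procedure returned no resultset at all
	}
	for _, row := range resultsets[0] {
		send(row) // already delivered to the client
	}
	if len(resultsets) > 1 {
		return errMultiResultset // detected too late to withhold the rows
	}
	return nil
}

func main() {
	var received []string
	err := streamCall(
		[][]string{{"a", "b"}, {"c"}},
		func(row string) { received = append(received, row) },
	)
	fmt.Println(received, err)
}
```

A client consuming a streamed CALL should therefore treat rows received before an error as incomplete rather than as a valid partial result.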

Related Issue(s)

Fixes #20371

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

On backporting

This touches the MySQL protocol layer and changes how a streamed CALL is planned, so it's a bit more than a one-line bug fix and arguably more than you'd usually want in a backport. I think it's worth backporting anyway:

- It's a silent correctness bug: callers get truncated/wrong data from a multi-resultset CALL with err == nil, and a transaction leaked onto a pooled streaming connection can later surface out of band as an opaque Code: CANCELED on an unrelated query, which is very hard to diagnose in the field.
- It includes a real connection-hygiene fix: a successful single-resultset streamed CALL previously left its trailing OK packet on the connection, poisoning the next reuse. That's a latent source of flaky, hard-to-reproduce errors on the OLAP stream pool.
- The behavioral change is narrowly scoped to the CALL path and brings it in line with the long-standing buffered Execute behavior; it doesn't touch regular streaming SELECTs.

Deployment Notes

Streamed CALLs are no longer eligible for stream consolidation (sharing one procedure execution across concurrent callers was never correct — see above), and their query-timing / log-stats now bucket under CallProc instead of SelectStream (matching the buffered Execute path). A multi-resultset or transaction-leaking CALL over StreamExecute now returns an error where it previously returned partial data or silently succeeded.

AI Disclosure

This PR was written primarily by Claude Code — I provided the direction and review.

StreamExecute routed a CALL straight through execStreamSQL, which reads only the first resultset and discards the terminating status flags. As a result the multi-resultset and transaction-leak checks that the buffered Execute path enforces never ran: a multi-resultset procedure silently returned only its first resultset, and a procedure that leaked a transaction was silently accepted onto a pooled connection. A successful single-resultset CALL also left its trailing OK packet unread, dirtying the pooled connection for the next query that reused it.

Capture, during streaming, whether the query returned a resultset and the status flags of a no-resultset OK packet, then verify a streamed CALL afterwards: read one more result (without buffering its rows) to tell a single-resultset call from a multi-resultset one, drain any remainder so the connection stays clean, and reject leaked or changed transaction state. A single-resultset CALL keeps streaming, which is the whole point of the streaming path.

Streamed CALLs are now planned as PlanCallProc rather than PlanSelectStream so they are dispatched here and excluded from stream consolidation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

vitess-bot · 2026-06-23T12:30:40Z

codecov · 2026-06-23T12:35:11Z

Codecov Report

❌ Patch coverage is 30.12048% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.13%. Comparing base (70c7a72) to head (92a848c).
⚠️ Report is 354 commits behind head on main.

Files with missing lines	Patch %	Lines
go/vt/vttablet/tabletserver/query_executor.go	21.21%	52 Missing ⚠️
go/vt/vttablet/tabletserver/planbuilder/plan.go	20.00%	4 Missing ⚠️
go/vt/vttablet/tabletserver/connpool/dbconn.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main   #20372       +/-   ##
===========================================
+ Coverage   69.67%   74.13%    +4.46%     
===========================================
  Files        1614      274     -1340     
  Lines      216793    40120   -176673     
===========================================
- Hits       151044    29742   -121302     
+ Misses      65749    10378    -55371

Flag	Coverage Δ
partial	`74.13% <30.12%> (?)`


promptless · 2026-06-23T12:46:19Z

Promptless prepared documentation updates related to this change.

Triggered by PR #20372

This PR enforces stored-procedure safety checks on the streaming CALL path: multi-resultset and transaction-leaking procedures over StreamExecute now error instead of silently returning bad data, a single-resultset CALL is allowed to stream, streamed CALLs are removed from stream consolidation, and their timing/log-stats now bucket under CallProc. Two doc updates were drafted:

Changelog entry for v25.0.0: Add v25.0.0 changelog entry for streaming CALL stored-procedure safety checks
MySQL compatibility reference update (Stored Procedures section): Clarify stored-procedure CALL support on the streaming path

Note: this PR is still open, so these drafts reflect the current diff and will be re-verified if the change evolves before merge.

mattlord

Shouldn't we check the final CALL status after draining multi-resultsets in go/vt/vttablet/tabletserver/query_executor.go:1152-1159? The multi-resultset branch drains the remaining resultsets, then returns before inspecting the final OK/status packet. A procedure like SELECT 1; SELECT 2; START TRANSACTION; would still return the multi-resultset error, but the final IN_TRANS status is discarded and the stream-pool connection can be recycled with an open transaction. No? If so, then I think that we should have drainStreamedResultSets return the final sqltypes.Result/status and run the same transaction-state check before returning the multi-resultset error, or close the connection unconditionally on this error path.

I think that we should close or verify streamed CALL connections on callback errors in go/vt/vttablet/tabletserver/query_executor.go:431-443. It looks like if the streaming callback returns an error, Stream() returns before verifyStreamedCallProc runs. DBConnection.ExecuteStreamFetch only drains the current resultset on callback error, so a streamed CALL can still leave the trailing OK packet or later resultsets unread before the pooled connection is recycled. This would keep the connection-hygiene hole for client cancellation / send failures. For PlanCallProc, either close the connection on execStreamSQL error or run a bounded drain/verify path while preserving the original callback error.

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>

Inline the post-stream stored-procedure checks into the transaction and stream-pool branches of Stream() so each branch works on the concrete connection it owns and closes it directly, with no nil-connection threading. Extract only the txConn-independent protocol handling (reading the trailing status packet and draining any extra resultsets) into streamedCallProcTrailingStatus, leaving each branch a flat close-then-prioritized-error policy. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>

arthurschreiber · 2026-06-23T17:08:43Z

@mattlord Thanks for the review, both issues you raised are valid. I changed the in-transaction check to always use the final status / OK packet to decide whether the connection needs to be closed rather than reused.

I also fixed the second issue you pointed out regarding errors during callback execution, and refactored the code a bit in the hope of making it easier to read and review.

mattlord

LGTM! Just a couple of minor questions.

mattlord · 2026-06-23T23:16:33Z

+			if changedTx {
+				return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction state change inside the stored procedure is not allowed")


Is there any reason we can't return this in the previous changedTx check?

See #20372 (comment)

mattlord · 2026-06-23T23:20:52Z

+		if leakedTx {
+			return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction not concluded inside the stored procedure, leaking transaction from stored procedure is not allowed")


Same comment here, why can't we return this error in the previous leakedTx check?

I guess we could, but I wanted to replicate how the non-streaming version of this behaves - see execCallProc. I'd leave it as-is instead of deviating the behavior between streaming and non-streaming.

Generalize the streaming OK-packet capture from the narrow streamHadResultset bool plus streamStatusFlags uint16 to a single streamOK *sqltypes.Result. A no-resultset streaming query (e.g. a CALL of a procedure that performs DML) now keeps the full OK packet (RowsAffected, InsertID, Info and status flags), mirroring the Result the buffered ExecuteFetch path builds from the same packet.

StreamResultStatus is replaced by StreamOKResult, which returns the captured Result (nil when the query returned a resultset). The streamed CALL trailing-status check reads streamOK == nil for "had a resultset" and streamOK.StatusFlags for the transaction-state check, so its behavior is unchanged. Add go/mysql unit tests for the captured OK packet.

This single primitive also lets the streaming result path report RowsAffected to the client in a follow-up change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
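The shape of this change can be sketched as follows. The types are local stand-ins, not the real ones: `okResult` only imitates the handful of sqltypes.Result fields named above, and `streamConn`, `hadResultset`, and `inTransaction` are hypothetical helpers that show how a single nullable pointer replaces the previous bool-plus-flags pair.

```go
package main

import "fmt"

// serverStatusInTrans mirrors the MySQL SERVER_STATUS_IN_TRANS status bit.
const serverStatusInTrans uint16 = 0x0001

// okResult is a stand-in for the fields of an OK packet kept by the capture.
type okResult struct {
	RowsAffected uint64
	InsertID     uint64
	Info         string
	StatusFlags  uint16
}

// streamConn is a hypothetical connection holding the captured OK packet.
// streamOK stays nil when the streamed query returned a resultset.
type streamConn struct {
	streamOK *okResult
}

// StreamOKResult returns the captured OK packet, or nil for a resultset.
func (c *streamConn) StreamOKResult() *okResult {
	return c.streamOK
}

// hadResultset replaces the former dedicated bool.
func hadResultset(c *streamConn) bool {
	return c.StreamOKResult() == nil
}

// inTransaction replaces the former dedicated status-flags field.
func inTransaction(c *streamConn) bool {
	ok := c.StreamOKResult()
	return ok != nil && ok.StatusFlags&serverStatusInTrans != 0
}

func main() {
	sel := &streamConn{}
	fmt.Println(hadResultset(sel), inTransaction(sel))

	dml := &streamConn{streamOK: &okResult{RowsAffected: 3, StatusFlags: serverStatusInTrans}}
	fmt.Println(hadResultset(dml), inTransaction(dml), dml.StreamOKResult().RowsAffected)
}
```

Keeping the whole packet means later consumers, such as reporting RowsAffected, need no further protocol changes in this model.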

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

arthurschreiber · 2026-06-25T14:37:23Z

+	wg.Go(func() {
+		streamErr = cConn.ExecuteStreamFetch("CALL sp_insert()")
+		if streamErr != nil {
+			return
+		}
+		okRes = cConn.StreamOKResult()
+	})


This is not true. We're on go 1.26.0

arthurschreiber · 2026-06-25T14:39:17Z

+	dbConn, err := qre.getStreamConn()
+	if err != nil {
+		return err
+	}
+	defer dbConn.Recycle()

-		if replaceKeyspace != "" {
-			result.ReplaceKeyspace(qre.tsv.config.DB.DBName, replaceKeyspace)
+	err = qre.execStreamSQL(dbConn, false, sql, streamCallback)
+	if qre.plan.PlanID == p.PlanCallProc {
+		if err != nil {
+			dbConn.Close()
+			return err
 		}


This is not true. A connection that's recycled after it was closed will be replaced in the pool with a new connection instead.

Copilot AI review requested due to automatic review settings June 23, 2026 12:30

Copilot AI reviewed Jun 23, 2026

arthurschreiber added Component: Query Serving Type: Bug labels Jun 23, 2026

github-actions Bot added this to the v25.0.0 milestone Jun 23, 2026

arthurschreiber added the Backport to: release-23.0 and Backport to: release-24.0 labels Jun 23, 2026

github-actions Bot added the Component: VTTablet label Jun 23, 2026

arthurschreiber marked this pull request as ready for review June 23, 2026 12:36

arthurschreiber requested review from harshit-gangal, mattlord, rohit-nayak-ps, shlomi-noach, systay and timvaillancourt as code owners June 23, 2026 12:36

mattlord reviewed Jun 23, 2026

arthurschreiber added 2 commits June 23, 2026 14:49

Fix streamed CALL connection cleanup

a43839f

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>

Check streamed CALL transaction state after drain

f15761d

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>

mattlord approved these changes Jun 23, 2026

mattlord mentioned this pull request Jun 24, 2026

Release of v23.0.5 #20369

Closed

32 tasks

arthurschreiber mentioned this pull request Jun 24, 2026

Release of v24.0.2 #20370

Closed

33 tasks

Copilot AI review requested due to automatic review settings June 25, 2026 14:25

Copilot AI reviewed Jun 25, 2026

arthurschreiber mentioned this pull request Jun 25, 2026

vtgate: report RowsAffected for stored-procedure calls over the streaming path #20402

Draft

5 tasks

3 participants
