
Enforce stored-procedure safety checks on the streaming CALL path #20372

Open
arthurschreiber wants to merge 5 commits into main from arthur/stream-call-proc-safety-checks


@arthurschreiber arthurschreiber commented Jun 23, 2026

Member

Description

StreamExecute (the OLAP / streaming path) was sending a CALL straight through execStreamSQL, which reads only the first resultset and throws away the terminating status flags. So the two safety checks the buffered Execute path enforces never ran on the streaming path:

  • a multi-resultset procedure silently returned only its first resultset (truncated/wrong data, no error), and
  • a procedure that leaked or changed a transaction was silently accepted, leaving the recycled streaming connection in an open transaction.

While digging in I also found that a perfectly fine single-resultset CALL left its trailing OK packet unread on the connection, so the next query that reused that pooled connection could blow up with an out-of-sequence error. The new test ordering (a single-resultset call followed by a multi-resultset one) reproduces that.

How it works

The streaming protocol now records whether a query returned a resultset and, when it didn't, captures the status flags of the lone OK packet (the only place a no-resultset procedure's IN_TRANS flag is observable). A streamed CALL is then verified after streaming finishes: we read exactly one more result — with FETCH_NO_ROWS, so nothing is buffered — to tell a single-resultset call (only the trailing OK packet remains) from a multi-resultset one, drain any remainder so the connection stays clean for reuse, and reject a leaked/changed transaction (closing the connection in that case, like the buffered path does).
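The post-stream check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `packetSummary`, `trailing`, and `inspectTrailing` are hypothetical names, and the wire reads are replaced by a pre-built slice. The only value taken from the MySQL protocol is the `SERVER_STATUS_IN_TRANS` flag (`0x0001`).

```go
package main

import (
	"errors"
	"fmt"
)

// SERVER_STATUS_IN_TRANS as defined by the MySQL client/server protocol.
const serverStatusInTrans uint16 = 0x0001

// packetSummary is a hypothetical stand-in for one result read off the wire
// after the first resultset has been streamed.
type packetSummary struct {
	hasFields   bool   // true: another resultset; false: a bare OK packet
	statusFlags uint16 // status flags carried by that result's final packet
}

// trailing describes what followed the first streamed resultset.
type trailing struct {
	multiResultset bool // the procedure produced more than one resultset
	inTransaction  bool // IN_TRANS was set on the final status packet
}

// inspectTrailing classifies the results that follow the first streamed
// resultset. rest[0] models the "one more result" read after streaming; any
// later entries model the remainder that must be drained before reuse.
func inspectTrailing(rest []packetSummary) (trailing, error) {
	if len(rest) == 0 {
		return trailing{}, errors.New("no trailing OK packet after resultset")
	}
	last := rest[len(rest)-1]
	if last.hasFields {
		return trailing{}, errors.New("result sequence did not end with an OK packet")
	}
	return trailing{
		multiResultset: rest[0].hasFields,
		inTransaction:  last.statusFlags&serverStatusInTrans != 0,
	}, nil
}

func main() {
	single := []packetSummary{{hasFields: false}}
	multiLeaky := []packetSummary{
		{hasFields: true},
		{hasFields: false, statusFlags: serverStatusInTrans},
	}
	for _, rest := range [][]packetSummary{single, multiLeaky} {
		t, err := inspectTrailing(rest)
		fmt.Printf("%+v err=%v\n", t, err)
	}
}
```

In this sketch the transaction state is always taken from the last packet in the sequence, so a procedure that opens a transaction after its final resultset is still detected.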

To get there, a streamed CALL is now planned as PlanCallProc instead of PlanSelectStream. That's what lets Stream() dispatch it to the new check, and it also takes it out of stream consolidation — which is the right thing regardless of this fix. Consolidation hands one execution's results to multiple concurrent callers of an identical query; that's safe for a plain SELECT, but a stored procedure can do DML, change transaction state, or otherwise have side effects, and every caller of a CALL expects their own invocation to actually run. Sharing a single procedure execution across callers was never correct, so classifying CALL as PlanCallProc (which the consolidator path is gated against) fixes that too.
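To make the consolidation argument concrete, here is a small model of the gate. It assumes a simplified two-value `planType` and a hypothetical `consolidatable` predicate; the real consolidator is more involved, and only the plan names come from this PR.

```go
package main

import "fmt"

// planType mirrors, in simplified form, the two plan kinds named above.
type planType int

const (
	PlanSelectStream planType = iota
	PlanCallProc
)

// consolidatable is a hypothetical gate: only side-effect-free streaming
// reads may share one execution between identical concurrent queries.
func consolidatable(p planType) bool {
	return p == PlanSelectStream
}

// executions returns how many times the backend runs the query when
// `callers` identical requests overlap in time.
func executions(p planType, callers int) int {
	if callers <= 0 {
		return 0
	}
	if consolidatable(p) {
		return 1 // one execution, results fanned out to every caller
	}
	return callers // every caller gets their own invocation
}

func main() {
	fmt.Println("SELECT executions:", executions(PlanSelectStream, 3))
	fmt.Println("CALL executions:  ", executions(PlanCallProc, 3))
}
```

With three overlapping callers, a consolidated SELECT runs once while a CALL runs three times, which is what a procedure with side effects requires.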

Deliberate deviation from Execute

This is not strict parity, on purpose. The buffered Execute path rejects any CALL that returns a resultset at all — even a single SELECT — because of the trailing OK packet that follows it (see TestCallProcedure, where proc_select1 is a wantErr case). On the streaming path we deliberately allow a single-resultset CALL to stream, since streaming a large resultset is the whole reason to use this path; forcing every CALL through the buffered path would re-impose maxResultSize and defeat that. We still reject multi-resultset and transaction-leaking procedures, with the same error messages as Execute.

For what it's worth, the Execute behavior here looks wrong to me: rejecting a procedure that returns a single resultset with "Multi-Resultset not supported" is surprising, and a single-resultset CALL is a perfectly reasonable thing to want to run. But changing that is a user-visible behavior change on the buffered path with its own compatibility and test implications, so it should be done separately and deliberately rather than smuggled in here. This PR intentionally leaves Execute exactly as it is and only makes the streaming path safe; the streaming path just happens to be the more permissive (and, I'd argue, more correct) of the two for the single-resultset case.

One inherent consequence: for a multi-resultset CALL we've already streamed the first resultset's rows to the client by the time we can see the violation, so the client gets rows followed by an error (whereas Execute buffers and sends nothing). That's unavoidable without giving up streaming, and the blast radius is limited to the offending query.
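From the client's side, the "rows followed by an error" consequence means partial rows must not be trusted until the stream ends cleanly. A minimal sketch, assuming a hypothetical callback-style `streamCall` and a client helper `collect` (neither is a Vitess API):

```go
package main

import (
	"errors"
	"fmt"
)

var errMultiResultset = errors.New("Multi-Resultset not supported")

// streamCall is a hypothetical server-side stream: it sends every row of the
// first resultset, and only afterwards can it tell that more resultsets exist.
func streamCall(resultsets [][]string, send func(row string) error) error {
	if len(resultsets) == 0 {
		return nil
	}
	for _, row := range resultsets[0] {
		if err := send(row); err != nil {
			return err
		}
	}
	if len(resultsets) > 1 {
		return errMultiResultset
	}
	return nil
}

// collect is a hypothetical client helper that discards already-received rows
// when the stream ends with an error.
func collect(resultsets [][]string) ([]string, error) {
	var rows []string
	err := streamCall(resultsets, func(row string) error {
		rows = append(rows, row)
		return nil
	})
	if err != nil {
		return nil, err
	}
	return rows, nil
}

func main() {
	rows, err := collect([][]string{{"a", "b"}, {"c"}})
	fmt.Println(rows, err)
}
```

A client that processes rows incrementally instead of collecting them needs its own way to undo or ignore work done before the error arrives.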

Related Issue(s)

Fixes #20371

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

On backporting

This touches the MySQL protocol layer and changes how a streamed CALL is planned, so it's a bit more than a one-line bug fix and arguably more than you'd usually want in a backport. I think it's worth backporting anyway:

  • It's a silent correctness bug: callers get truncated/wrong data from a multi-resultset CALL with err == nil, and a transaction leaked onto a pooled streaming connection can later surface out of band as an opaque Code: CANCELED on an unrelated query — very hard to diagnose in the field.
  • It includes a real connection-hygiene fix: a successful single-resultset streamed CALL previously left its trailing OK packet on the connection, poisoning the next reuse. That's a latent source of flaky, hard-to-reproduce errors on the OLAP stream pool.
  • The behavioral change is narrowly scoped to the CALL path and brings it in line with the long-standing buffered Execute behavior; it doesn't touch regular streaming SELECTs.

Deployment Notes

Streamed CALLs are no longer eligible for stream consolidation (sharing one procedure execution across concurrent callers was never correct — see above), and their query-timing / log-stats now bucket under CallProc instead of SelectStream (matching the buffered Execute path). A multi-resultset or transaction-leaking CALL over StreamExecute now returns an error where it previously returned partial data or silently succeeded.

AI Disclosure

This PR was written primarily by Claude Code — I provided the direction and review.

StreamExecute routed a CALL straight through execStreamSQL, which reads
only the first resultset and discards the terminating status flags. As a
result the multi-resultset and transaction-leak checks that the buffered
Execute path enforces never ran: a multi-resultset procedure silently
returned only its first resultset, and a procedure that leaked a
transaction was silently accepted onto a pooled connection. A successful
single-resultset CALL also left its trailing OK packet unread, dirtying
the pooled connection for the next query that reused it.

Capture, during streaming, whether the query returned a resultset and the
status flags of a no-resultset OK packet, then verify a streamed CALL
afterwards: read one more result (without buffering its rows) to tell a
single-resultset call from a multi-resultset one, drain any remainder so
the connection stays clean, and reject leaked or changed transaction
state. A single-resultset CALL keeps streaming, which is the whole point
of the streaming path. Streamed CALLs are now planned as PlanCallProc
rather than PlanSelectStream so they are dispatched here and excluded from
stream consolidation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Copilot AI review requested due to automatic review settings June 23, 2026 12:30

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions github-actions Bot added this to the v25.0.0 milestone Jun 23, 2026
@vitess-bot vitess-bot Bot added labels Jun 23, 2026: NeedsWebsiteDocsUpdate; NeedsDescriptionUpdate; NeedsIssue; NeedsBackportReason
@arthurschreiber arthurschreiber added labels Jun 23, 2026: Backport to: release-23.0; Backport to: release-24.0
@vitess-bot

vitess-bot Bot commented Jun 23, 2026

Contributor

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

❌ Patch coverage is 30.12048% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.13%. Comparing base (70c7a72) to head (92a848c).
⚠️ Report is 354 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| go/vt/vttablet/tabletserver/query_executor.go | 21.21% | 52 Missing ⚠️ |
| go/vt/vttablet/tabletserver/planbuilder/plan.go | 20.00% | 4 Missing ⚠️ |
| go/vt/vttablet/tabletserver/connpool/dbconn.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #20372       +/-   ##
===========================================
+ Coverage   69.67%   74.13%    +4.46%     
===========================================
  Files        1614      274     -1340     
  Lines      216793    40120   -176673     
===========================================
- Hits       151044    29742   -121302     
+ Misses      65749    10378    -55371     
Flag Coverage Δ
partial 74.13% <30.12%> (?)


@arthurschreiber arthurschreiber removed labels Jun 23, 2026: NeedsDescriptionUpdate; NeedsWebsiteDocsUpdate; NeedsIssue; NeedsBackportReason
@arthurschreiber arthurschreiber marked this pull request as ready for review June 23, 2026 12:36
@promptless

promptless Bot commented Jun 23, 2026

Contributor

Promptless prepared documentation updates related to this change.

Triggered by PR #20372

This PR enforces stored-procedure safety checks on the streaming CALL path: multi-resultset and transaction-leaking procedures over StreamExecute now error instead of silently returning bad data, a single-resultset CALL is allowed to stream, streamed CALLs are removed from stream consolidation, and their timing/log-stats now bucket under CallProc. Two doc updates were drafted:

Note: this PR is still open, so these drafts reflect the current diff and will be re-verified if the change evolves before merge.

@mattlord mattlord left a comment

Member

Shouldn't we check the final CALL status after draining multi-resultsets in go/vt/vttablet/tabletserver/query_executor.go:1152-1159? The multi-resultset branch drains the remaining resultsets, then returns before inspecting the final OK/status packet. A procedure like SELECT 1; SELECT 2; START TRANSACTION; would still return the multi-resultset error, but the final IN_TRANS status is discarded and the stream-pool connection can be recycled with an open transaction. No? If so, I think we should have drainStreamedResultSets return the final sqltypes.Result/status and run the same transaction-state check before returning the multi-resultset error, or close the connection unconditionally on this error path.

I think that we should close or verify streamed CALL connections on callback errors in go/vt/vttablet/tabletserver/query_executor.go:431-443. It looks like if the streaming callback returns an error, Stream() returns before verifyStreamedCallProc runs. DBConnection.ExecuteStreamFetch only drains the current resultset on callback error, so a streamed CALL can still leave the trailing OK packet or later resultsets unread before the pooled connection is recycled. This would keep the connection-hygiene hole for client cancellation / send failures. For PlanCallProc, either close the connection on execStreamSQL error or run a bounded drain/verify path while preserving the original callback error.

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Inline the post-stream stored-procedure checks into the transaction and
stream-pool branches of Stream() so each branch works on the concrete
connection it owns and closes it directly, with no nil-connection
threading. Extract only the txConn-independent protocol handling — reading
the trailing status packet and draining any extra resultsets — into
streamedCallProcTrailingStatus, leaving each branch a flat close-then-
prioritized-error policy. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
@arthurschreiber

Member Author

@mattlord Thanks for the review; both of the issues you raised are valid. I changed the in-transaction check to always use the final status / OK packet to determine whether the connection needs to be recycled.

I also fixed the second issue you pointed out, regarding errors during callback execution, and refactored the code a bit in the hope of making it easier to read and review.

@mattlord mattlord left a comment

Member

LGTM! Just a couple of minor questions.

Comment on lines +458 to +459
if changedTx {
return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction state change inside the stored procedure is not allowed")

Member

Is there any reason we can't return this in the previous changedTx check?


Comment on lines +491 to +492
if leakedTx {
return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction not concluded inside the stored procedure, leaking transaction from stored procedure is not allowed")

Member

Same comment here, why can't we return this error in the previous leakedTx check?

Member Author

I guess we could, but I wanted to replicate how the non-streaming version behaves (see execCallProc). I'd rather leave it as-is than have the streaming and non-streaming paths diverge.

@mattlord mattlord mentioned this pull request Jun 24, 2026
32 tasks
@arthurschreiber arthurschreiber mentioned this pull request Jun 24, 2026
33 tasks
Generalize the streaming OK-packet capture from the narrow streamHadResultset
bool plus streamStatusFlags uint16 to a single streamOK *sqltypes.Result. A
no-resultset streaming query (e.g. a CALL of a procedure that performs DML) now
keeps the full OK packet — RowsAffected, InsertID, Info and status flags —
mirroring the Result the buffered ExecuteFetch path builds from the same packet.

StreamResultStatus is replaced by StreamOKResult, which returns the captured
Result (nil when the query returned a resultset). The streamed CALL trailing-
status check reads streamOK == nil for "had a resultset" and streamOK.StatusFlags
for the transaction-state check, so its behavior is unchanged. Add go/mysql unit
tests for the captured OK packet. This single primitive also lets the streaming
result path report RowsAffected to the client in a follow-up change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
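
The nil-versus-captured convention from the commit above can be illustrated with a small stand-alone sketch. `okResult` is a local, simplified stand-in that copies only the field names mentioned in the commit message; `hadResultset` and `inTransaction` are hypothetical helpers, not functions from this PR.

```go
package main

import "fmt"

// SERVER_STATUS_IN_TRANS as defined by the MySQL client/server protocol.
const serverStatusInTrans uint16 = 0x0001

// okResult is a simplified stand-in for the captured OK packet.
type okResult struct {
	RowsAffected uint64
	InsertID     uint64
	Info         string
	StatusFlags  uint16
}

// hadResultset follows the convention described above: a nil captured OK
// result means the streamed query returned a resultset.
func hadResultset(ok *okResult) bool {
	return ok == nil
}

// inTransaction reports whether a captured OK packet has IN_TRANS set.
// For a nil capture it returns false; in that case the state has to come
// from the trailing status packet instead.
func inTransaction(ok *okResult) bool {
	return ok != nil && ok.StatusFlags&serverStatusInTrans != 0
}

func main() {
	dml := &okResult{RowsAffected: 1, InsertID: 42, StatusFlags: serverStatusInTrans}
	fmt.Println(hadResultset(nil), inTransaction(nil))
	fmt.Println(hadResultset(dml), inTransaction(dml), dml.RowsAffected)
}
```

Keeping the whole OK packet rather than two narrow fields is what makes values such as RowsAffected available to later changes without another protocol-layer hook.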

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment on lines +45 to +51
wg.Go(func() {
streamErr = cConn.ExecuteStreamFetch("CALL sp_insert()")
if streamErr != nil {
return
}
okRes = cConn.StreamOKResult()
})

Member Author

This is not true. We're on go 1.26.0

Comment on lines +466 to 477
dbConn, err := qre.getStreamConn()
if err != nil {
return err
}
defer dbConn.Recycle()

if replaceKeyspace != "" {
result.ReplaceKeyspace(qre.tsv.config.DB.DBName, replaceKeyspace)
err = qre.execStreamSQL(dbConn, false, sql, streamCallback)
if qre.plan.PlanID == p.PlanCallProc {
if err != nil {
dbConn.Close()
return err
}

@arthurschreiber arthurschreiber Jun 25, 2026

Member Author

This is not true. A connection that's recycled after it was closed will be replaced in the pool with a new connection instead.
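
The pool behavior described in this reply can be modeled with a toy pool. This is a sketch under stated assumptions: `conn`, `pool`, `Get`, and `Recycle` here are hypothetical types written for illustration, not the Vitess connection pool.

```go
package main

import "fmt"

// conn is a toy connection that only tracks whether it was closed.
type conn struct {
	id     int
	closed bool
}

func (c *conn) Close() { c.closed = true }

// pool is a toy pool of idle connections.
type pool struct {
	idle   []*conn
	nextID int
}

// dial creates a fresh connection with a new id.
func (p *pool) dial() *conn {
	p.nextID++
	return &conn{id: p.nextID}
}

// Get hands out an idle connection, dialing a new one if none is available.
func (p *pool) Get() *conn {
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return c
	}
	return p.dial()
}

// Recycle returns a connection to the pool. A connection that was closed
// before being recycled is replaced by a new one instead of being reused.
func (p *pool) Recycle(c *conn) {
	if c.closed {
		c = p.dial()
	}
	p.idle = append(p.idle, c)
}

func main() {
	p := &pool{}
	c := p.Get()
	c.Close() // e.g. after a failed streamed CALL
	p.Recycle(c)
	next := p.Get()
	fmt.Println(next.id != c.id, next.closed)
}
```

Under this model, pairing a deferred `Recycle` with an explicit `Close` on the error path never hands a dirty connection to the next caller.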


Development

Successfully merging this pull request may close these issues.

StreamExecute (OLAP) bypasses stored-procedure safety checks: multi-resultset and transaction-leaking CALLs are silently accepted

3 participants