Enforce stored-procedure safety checks on the streaming CALL path#20372
Enforce stored-procedure safety checks on the streaming CALL path#20372arthurschreiber wants to merge 5 commits into
Conversation
StreamExecute routed a CALL straight through execStreamSQL, which reads only the first resultset and discards the terminating status flags. As a result the multi-resultset and transaction-leak checks that the buffered Execute path enforces never ran: a multi-resultset procedure silently returned only its first resultset, and a procedure that leaked a transaction was silently accepted onto a pooled connection. A successful single-resultset CALL also left its trailing OK packet unread, dirtying the pooled connection for the next query that reused it. Capture, during streaming, whether the query returned a resultset and the status flags of a no-resultset OK packet, then verify a streamed CALL afterwards: read one more result (without buffering its rows) to tell a single-resultset call from a multi-resultset one, drain any remainder so the connection stays clean, and reject leaked or changed transaction state. A single-resultset CALL keeps streaming, which is the whole point of the streaming path. Streamed CALLs are now planned as PlanCallProc rather than PlanSelectStream so they are dispatched here and excluded from stream consolidation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20372 +/- ##
===========================================
+ Coverage 69.67% 74.13% +4.46%
===========================================
Files 1614 274 -1340
Lines 216793 40120 -176673
===========================================
- Hits 151044 29742 -121302
+ Misses 65749 10378 -55371
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Promptless prepared documentation updates related to this change. Triggered by PR #20372 This PR enforces stored-procedure safety checks on the streaming
Note: this PR is still open, so these drafts reflect the current diff and will be re-verified if the change evolves before merge. |
mattlord
left a comment
There was a problem hiding this comment.
Shouldn't we check the final CALL status after draining multi-resultsets in
go/vt/vttablet/tabletserver/query_executor.go:1152-1159? The multi-resultset branch drains the remaining resultsets, then returns before inspecting the final OK/status packet. A procedure like SELECT 1; SELECT 2; START TRANSACTION; would still return the multi-resultset error, but the final IN_TRANS status is discarded and the stream-pool connection can be recycled with an open transaction. No? If so then I think that we should have drainStreamedResultSets return the final sqltypes.Result/status and run the same transaction-state check before returning the multi-resultset error, or close the connection unconditionally on this error path.
I think that we should close or verify streamed CALL connections on callback errors in go/vt/vttablet/tabletserver/query_executor.go:431-443. It looks like if the streaming callback returns an error, Stream() returns before verifyStreamedCallProc runs. DBConnection.ExecuteStreamFetch only drains the current resultset on callback error, so a streamed CALL can still leave the trailing OK packet or later resultsets unread before the pooled connection is recycled. This would keep the connection-hygiene hole for client cancellation / send failures. For PlanCallProc, either close the connection on execStreamSQL error or run a bounded drain/verify path while preserving the original callback error.
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Inline the post-stream stored-procedure checks into the transaction and stream-pool branches of Stream() so each branch works on the concrete connection it owns and closes it directly, with no nil-connection threading. Extract only the txConn-independent protocol handling — reading the trailing status packet and draining any extra resultsets — into streamedCallProcTrailingStatus, leaving each branch a flat close-then- prioritized-error policy. No behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
|
@mattlord Thanks for the review, both the issues you raised are valid. I changed the in-transaction check to always use the final status / ok packet to understand whether the connection needs to be recycled. I also fixed the second issue you pointed out wrt. to errors during callback execution. I also refactored the code a bit in the hopes of making it easier to read/review. |
mattlord
left a comment
There was a problem hiding this comment.
LGTM! Just a couple of minor questions.
| if changedTx { | ||
| return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction state change inside the stored procedure is not allowed") |
There was a problem hiding this comment.
Is there any reason we can't return this in the previous changedTx check?
| if leakedTx { | ||
| return vterrors.New(vtrpcpb.Code_CANCELED, "Transaction not concluded inside the stored procedure, leaking transaction from stored procedure is not allowed") |
There was a problem hiding this comment.
Same comment here, why can't we return this error in the previous leakedTx check?
There was a problem hiding this comment.
I guess we could, but I wanted to replicate how the non-streaming version of this behaves - see execCallProc. I'd leave it as-is instead of deviating the behavior between streaming and non-streaming.
Generalize the streaming OK-packet capture from the narrow streamHadResultset bool plus streamStatusFlags uint16 to a single streamOK *sqltypes.Result. A no-resultset streaming query (e.g. a CALL of a procedure that performs DML) now keeps the full OK packet — RowsAffected, InsertID, Info and status flags — mirroring the Result the buffered ExecuteFetch path builds from the same packet. StreamResultStatus is replaced by StreamOKResult, which returns the captured Result (nil when the query returned a resultset). The streamed CALL trailing- status check reads streamOK == nil for "had a resultset" and streamOK.StatusFlags for the transaction-state check, so its behavior is unchanged. Add go/mysql unit tests for the captured OK packet. This single primitive also lets the streaming result path report RowsAffected to the client in a follow-up change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
| wg.Go(func() { | ||
| streamErr = cConn.ExecuteStreamFetch("CALL sp_insert()") | ||
| if streamErr != nil { | ||
| return | ||
| } | ||
| okRes = cConn.StreamOKResult() | ||
| }) |
There was a problem hiding this comment.
This is not true. We're on go 1.26.0
| dbConn, err := qre.getStreamConn() | ||
| if err != nil { | ||
| return err | ||
| } | ||
| defer dbConn.Recycle() | ||
|
|
||
| if replaceKeyspace != "" { | ||
| result.ReplaceKeyspace(qre.tsv.config.DB.DBName, replaceKeyspace) | ||
| err = qre.execStreamSQL(dbConn, false, sql, streamCallback) | ||
| if qre.plan.PlanID == p.PlanCallProc { | ||
| if err != nil { | ||
| dbConn.Close() | ||
| return err | ||
| } |
There was a problem hiding this comment.
This is not true. A connection that's recycled after it was closed will be replaced in the pool with a new connection instead.
Description
StreamExecute(the OLAP / streaming path) was sending aCALLstraight throughexecStreamSQL, which reads only the first resultset and throws away the terminating status flags. So the two safety checks the bufferedExecutepath enforces never ran on the streaming path:While digging in I also found that a perfectly fine single-resultset
CALLleft its trailing OK packet unread on the connection, so the next query that reused that pooled connection could blow up with an out-of-sequence error. The new test ordering (a single-resultset call followed by a multi-resultset one) reproduces that.How it works
The streaming protocol now records whether a query returned a resultset and, when it didn't, captures the status flags of the lone OK packet (the only place a no-resultset procedure's
IN_TRANSflag is observable). A streamedCALLis then verified after streaming finishes: we read exactly one more result — withFETCH_NO_ROWS, so nothing is buffered — to tell a single-resultset call (only the trailing OK packet remains) from a multi-resultset one, drain any remainder so the connection stays clean for reuse, and reject a leaked/changed transaction (closing the connection in that case, like the buffered path does).To get there, a streamed
CALLis now planned asPlanCallProcinstead ofPlanSelectStream. That's what letsStream()dispatch it to the new check, and it also takes it out of stream consolidation — which is the right thing regardless of this fix. Consolidation hands one execution's results to multiple concurrent callers of an identical query; that's safe for a plainSELECT, but a stored procedure can do DML, change transaction state, or otherwise have side effects, and every caller of aCALLexpects their own invocation to actually run. Sharing a single procedure execution across callers was never correct, so classifyingCALLasPlanCallProc(which the consolidator path is gated against) fixes that too.Deliberate deviation from
ExecuteThis is not strict parity, on purpose. The buffered
Executepath rejects anyCALLthat returns a resultset at all — even a singleSELECT— because of the trailing OK packet that follows it (seeTestCallProcedure, whereproc_select1is awantErrcase). On the streaming path we deliberately allow a single-resultsetCALLto stream, since streaming a large resultset is the whole reason to use this path; forcing everyCALLthrough the buffered path would re-imposemaxResultSizeand defeat that. We still reject multi-resultset and transaction-leaking procedures, with the same error messages asExecute.For what it's worth, the
Executebehavior here looks wrong to me: rejecting a procedure that returns a single resultset with "Multi-Resultset not supported" is surprising, and a single-resultsetCALLis a perfectly reasonable thing to want to run. But changing that is a user-visible behavior change on the buffered path with its own compatibility and test implications, so it should be done separately and deliberately rather than smuggled in here. This PR intentionally leavesExecuteexactly as it is and only makes the streaming path safe; the streaming path just happens to be the more permissive (and, I'd argue, more correct) of the two for the single-resultset case.One inherent consequence: for a multi-resultset
CALLwe've already streamed the first resultset's rows to the client by the time we can see the violation, so the client gets rows followed by an error (whereasExecutebuffers and sends nothing). That's unavoidable without giving up streaming, and the blast radius is limited to the offending query.Related Issue(s)
Fixes #20371
Checklist
On backporting
This touches the MySQL protocol layer and changes how a streamed
CALLis planned, so it's a bit more than a one-line bug fix and arguably more than you'd usually want in a backport. I think it's worth backporting anyway:CALLwitherr == nil, and a transaction leaked onto a pooled streaming connection can later surface out of band as an opaqueCode: CANCELEDon an unrelated query — very hard to diagnose in the field.CALLpreviously left its trailing OK packet on the connection, poisoning the next reuse. That's a latent source of flaky, hard-to-reproduce errors on the OLAP stream pool.CALLpath and brings it in line with the long-standing bufferedExecutebehavior; it doesn't touch regular streaming SELECTs.Deployment Notes
Streamed
CALLs are no longer eligible for stream consolidation (sharing one procedure execution across concurrent callers was never correct — see above), and their query-timing / log-stats now bucket underCallProcinstead ofSelectStream(matching the bufferedExecutepath). A multi-resultset or transaction-leakingCALLoverStreamExecutenow returns an error where it previously returned partial data or silently succeeded.AI Disclosure
This PR was written primarily by Claude Code — I provided the direction and review.