vtgate: decouple streaming delivery from workload via --mysql-server-use-streaming #20378
Draft
arthurschreiber wants to merge 11 commits into
BuildStreaming is now the full planner that shares all analysis with Build: it accepts every statement type Build does and returns the real plan types (Select, Show, OtherRead, SelectLockFunc, ...) instead of the catch-all PlanSelectStream. Build becomes a thin wrapper that calls BuildStreaming and overlays only the Build-specific behavior: the result-size LIMIT for SELECT/UNION. ANALYZE and EXPLAIN now map to PlanSelect on both paths, so streaming and buffered planning agree (buffered already used PlanSelect for these). The write-safety LIMIT for UPDATE/DELETE stays in analyzeUpdate/analyzeDelete, so it is applied on both the buffered and streaming paths -- it bounds the transaction size regardless of streaming. Only the result-buffering LIMIT for SELECT/UNION is Build-only, since streaming reads have nothing to buffer. PlanSelectStream is removed entirely, fixing ACL error messages and stats keys that leaked "SelectStream" into user-visible output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
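The wrapper relationship described in this commit can be sketched as follows. This is a minimal illustration, not the actual planbuilder code: the `Plan` struct, the `LimitRows` field, the string-based statement kinds, and the lowercase `build`/`buildStreaming` functions are all hypothetical stand-ins.

```go
package main

import "fmt"

// PlanType and Plan are simplified stand-ins, not the real planbuilder types.
type PlanType int

const (
	PlanSelect PlanType = iota
	PlanOther
)

type Plan struct {
	Type  PlanType
	Query string
	// LimitRows is a hypothetical marker for the result-size LIMIT.
	LimitRows bool
}

// buildStreaming does the shared analysis and returns the real plan type.
func buildStreaming(stmtKind, query string) *Plan {
	switch stmtKind {
	case "select", "union", "analyze", "explain":
		return &Plan{Type: PlanSelect, Query: query}
	default:
		return &Plan{Type: PlanOther, Query: query}
	}
}

// build is the thin wrapper: same plan as buildStreaming, plus the
// result-size LIMIT, which only applies to SELECT and UNION.
func build(stmtKind, query string) *Plan {
	plan := buildStreaming(stmtKind, query)
	if stmtKind == "select" || stmtKind == "union" {
		plan.LimitRows = true
	}
	return plan
}

func main() {
	fmt.Println(build("select", "select 1").LimitRows)          // true
	fmt.Println(buildStreaming("select", "select 1").LimitRows) // false
}
```

The point of the shape is that any analysis added to the streaming builder is picked up by the buffered builder automatically, so the two paths cannot drift apart in plan type.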
…ypes Now that BuildStreaming yields real plan types, Stream() handles the plan types that previously only Execute() did: PlanNextval and PlanShowMigrations (which carry a nil FullQuery) get dedicated handlers, and the generic SQL path falls back to the raw query when FullQuery is nil. The consolidator and schema-name rewrite checks switch from PlanSelectStream to PlanSelect. Stream() also records the per-table/per-plan QueryEngine stats and the result histogram that Execute records, so reads on the streaming path are accounted for the same way. The defer is registered after the DML dispatch so streamDML, which records its own stats, is not double-counted, and it captures the real error code via the named return. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
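The defer placement described here relies on two Go behaviors: a deferred closure only runs if execution reached the `defer` statement, and it can read a named return value after the `return` has set it. A minimal sketch, with a hypothetical `recordStats` hook and `stream` function standing in for the real ones:

```go
package main

import (
	"errors"
	"fmt"
)

// recorded collects what the hypothetical stats hook observed.
var recorded []string

func recordStats(plan string, err error) {
	recorded = append(recorded, fmt.Sprintf("%s err=%v", plan, err != nil))
}

// stream uses a named return so the deferred hook sees the final error.
func stream(plan string, isDML, fail bool) (err error) {
	if isDML {
		// The DML branch records its own stats and returns before the
		// defer below is registered, so it is not counted twice.
		recordStats(plan+"(dml)", nil)
		return nil
	}
	defer func() { recordStats(plan, err) }()
	if fail {
		return errors.New("query failed")
	}
	return nil
}

func main() {
	_ = stream("Select", false, true)
	_ = stream("Update", true, false)
	fmt.Println(recorded) // [Select err=true Update(dml) err=false]
}
```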
Add a suite verifying that the StreamExecute path matches Execute now that BuildStreaming accepts the full range of statements: DML, DDL, SET, savepoints, CALL, FLUSH, LOAD and migration statements all run through StreamExecute, ACL error messages and stats keys no longer leak "SelectStream", PlanNextval and PlanShowMigrations are handled, and DML on the streaming path honors the write-safety LIMIT through streamDML. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
…rkload

Today the MySQL protocol handler uses StreamExecute only for the OLAP workload; every other workload is buffered via Execute. This couples the delivery mode (streaming vs buffering) to the workload mode, even though the workload should only drive pool selection and limits.

This adds an opt-in vtgate flag, --mysql-server-use-streaming (default false), which routes ComQuery, ComQueryMulti, and ComStmtExecute through streaming for all workloads. A new mysqlSessionUsesStreaming helper gates the three handlers (flag || workload == OLAP).

To keep workload semantics independent of the delivery mode, the tablet side now follows the workload rather than the RPC:

- getStreamConn selects the regular (OLTP) connection pool for an OLTP workload and the stream pool otherwise, so an OLTP query streamed to the client still draws from the OLTP pool. This is applied at the single getStreamConn chokepoint, covering the direct, consolidator, and schema query call sites.
- streamExecute computes its request timeout via the new streamTimeout helper: an OLTP workload gets the same timeout semantics as the buffered Execute path (query timeout, or the smaller of query and OLTP tx timeout in a transaction), while other workloads keep the historical behavior of bounding only transactional streams with the OLAP tx timeout. UNSPECIFIED deliberately stays on the OLAP tx timeout to avoid TxTimeoutForWorkload mapping it to OLTP.
- a non-transactional streamed query is tracked in the query list matching its workload (statelessql for OLTP, olapql otherwise) so shutdown semantics follow the workload too.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
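The gate and the workload-driven selections from this commit message can be sketched as small pure functions. The `Workload` type, the function names, and the string return values are assumptions made for illustration; only the decision rules come from the text above.

```go
package main

import "fmt"

// Workload is a simplified stand-in for the session workload setting.
type Workload int

const (
	Unspecified Workload = iota
	OLTP
	OLAP
)

// usesStreaming mirrors the gate from the text: flag || workload == OLAP.
func usesStreaming(flag bool, w Workload) bool {
	return flag || w == OLAP
}

// poolFor picks the connection pool by workload, not by RPC:
// OLTP draws from the regular pool, everything else from the stream pool.
func poolFor(w Workload) string {
	if w == OLTP {
		return "oltp"
	}
	return "stream"
}

// queryListFor picks the list that tracks a non-transactional streamed query.
func queryListFor(w Workload) string {
	if w == OLTP {
		return "statelessql"
	}
	return "olapql"
}

func main() {
	// With the flag on, an OLTP session streams but keeps OLTP resources.
	fmt.Println(usesStreaming(true, OLTP), poolFor(OLTP), queryListFor(OLTP))
	// With the flag off, only OLAP sessions stream.
	fmt.Println(usesStreaming(false, OLTP), usesStreaming(false, OLAP))
}
```

Keeping the delivery decision (`usesStreaming`) separate from the resource decisions (`poolFor`, `queryListFor`) is what lets the flag change how rows reach the client without changing which pool or shutdown list a query belongs to.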
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #20378 +/- ##
===========================================
+ Coverage 69.67% 83.38% +13.71%
===========================================
Files 1614 409 -1205
Lines 216793 71476 -145317
===========================================
- Hits 151044 59603 -91441
+ Misses 65749 11873 -53876
Flip the flag default to streaming-for-all-workloads so CI exercises the streaming delivery path across the whole suite. This is an experiment to surface what breaks under streaming-by-default; not intended to merge as-is. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
The StreamExecute result aggregator copied only Fields and Rows, dropping RowsAffected. A row-returning statement that returns an OK packet — e.g. a CALL of a stored procedure that performs DML — reported 0 affected rows to the client instead of the real count, unlike the buffered Execute path. Accumulate RowsAffected onto the aggregated result so it reaches the client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
On the buffered Execute path, updateQueryStats runs unconditionally, so the per-plan QueryExecutions counter is incremented even when a query errors. The streaming path skipped it on the error path for row-returning statements, so a failed EXPLAIN/SELECT over StreamExecute was never counted. Record the stats on the row-returning error path too, leaving TablesUsed unset so the per-table QueryExecutionsByTable counter stays untouched, matching the buffered Execute path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Following the RowsAffected fix, the StreamExecute result aggregator still dropped the remaining OK-packet fields. A row-returning statement that returns an OK packet — e.g. a CALL of a procedure that performs an INSERT — lost its InsertID (last_insert_id) and Info message on the way to the client, unlike the buffered Execute path. Carry InsertID (mirroring Result.AppendResult, including InsertIDChanged) and Info onto the aggregated result so they reach the client. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
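The two aggregator fixes (RowsAffected, then InsertID and Info) amount to carrying the OK-packet fields across streamed chunks. The sketch below uses a simplified `Result` struct and a hypothetical `add` method rather than the real sqltypes API; in particular, letting the last non-empty `Info` win is an assumption, not something the commit messages state.

```go
package main

import "fmt"

// Result is a simplified stand-in for a query result chunk.
type Result struct {
	Fields          []string
	Rows            [][]string
	RowsAffected    uint64
	InsertID        uint64
	InsertIDChanged bool
	Info            string
}

// add folds one streamed chunk into the aggregated result, keeping the
// OK-packet fields instead of copying only Fields and Rows.
func (agg *Result) add(chunk *Result) {
	if agg.Fields == nil {
		agg.Fields = chunk.Fields
	}
	agg.Rows = append(agg.Rows, chunk.Rows...)
	agg.RowsAffected += chunk.RowsAffected
	if chunk.InsertIDChanged || chunk.InsertID != 0 {
		agg.InsertID = chunk.InsertID
		agg.InsertIDChanged = true
	}
	if chunk.Info != "" {
		// Assumption: the most recent non-empty Info message wins.
		agg.Info = chunk.Info
	}
}

func main() {
	agg := &Result{}
	agg.add(&Result{Fields: []string{"id"}, Rows: [][]string{{"1"}}})
	agg.add(&Result{RowsAffected: 3, InsertID: 42, Info: "Records: 3"})
	fmt.Println(agg.RowsAffected, agg.InsertID, agg.InsertIDChanged, agg.Info)
}
```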
The previous fix recorded per-plan query stats only for failed row-returning queries on the streaming path. Non-row-returning queries (INSERT/UPDATE/DELETE) that failed still returned through rollbackExecIfNeeded without recording them, so a failed streamed DML was never counted in QueryExecutions — unlike the buffered Execute path, which records stats before any rollback handling. Hoist the stats recording to the top of the error block so every failed query, row-returning or not, is counted. TablesUsed stays unset on error, so the per-table QueryExecutionsByTable counter remains untouched, matching Execute. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
The streaming path never called plan.AddStats, so engine.Plan execution statistics (exec count, exec time, shard queries, rows, errors) — surfaced by /queryz — were missing for every streamed query, unlike the buffered Execute path which always records them via setLogStats. Call plan.AddStats from updateLogStats for both the success and error paths. The row counts are sourced from the streaming result receiver and also stored on logStats (previously unset on this path), matching how buffered Execute couples logStats and plan stats. On error the row counts stay zero and the error count is one, mirroring Execute. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
TestComQueryMulti validates the workload -> delivery-mode mapping with explicit OLTP (buffered) and OLAP (streaming) expectations. It implicitly relied on the --mysql-server-use-streaming default being off to keep its OLTP cases buffered; with the flag on, the OLTP cases received the streaming-shaped packets and the callback indexed past its expectation slice and panicked. Pin the flag off for the duration of the test so it no longer depends on the mutable global default. The streaming delivery of OLTP queries is already covered by the olap cases, which exercise the same path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
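Pinning a mutable global for the duration of a test usually follows a save, set, restore pattern. A minimal sketch, where `useStreaming` and `pin` are hypothetical names standing in for the flag variable and whatever helper the test actually uses:

```go
package main

import "fmt"

// useStreaming stands in for the mutable package-level flag default.
var useStreaming = true

// pin sets *flag to val and returns a function that restores the old value.
func pin(flag *bool, val bool) (restore func()) {
	old := *flag
	*flag = val
	return func() { *flag = old }
}

func main() {
	restore := pin(&useStreaming, false)
	fmt.Println(useStreaming) // false
	restore()
	fmt.Println(useStreaming) // true
}
```

In a test function the restore step can be handed to the standard library, for example `t.Cleanup(pin(&useStreaming, false))`, so the previous value comes back even if the test fails partway through.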
Description
Today the MySQL protocol handler only uses `StreamExecute` for the OLAP workload; every other workload is buffered through `Execute`. That couples the delivery mode (streaming vs. buffering) to the workload mode, even though the workload should really only drive connection-pool selection and limits.

This adds an opt-in vtgate flag, `--mysql-server-use-streaming` (default `false`), that routes `ComQuery`, `ComQueryMulti`, and `ComStmtExecute` through streaming for all workloads. A new `mysqlSessionUsesStreaming` helper gates the three handlers (`flag || workload == OLAP`).

To keep workload semantics independent of the delivery mode, the tablet side now follows the workload rather than the RPC:

- `getStreamConn` selects the regular (OLTP) pool for an OLTP workload and the stream pool otherwise, so an OLTP query that happens to be streamed still draws from the OLTP pool. Applied at the single `getStreamConn` chokepoint, covering the direct, consolidator, and schema-query call sites.
- `streamExecute` computes its request timeout via a new `streamTimeout` helper: an OLTP workload gets the same timeout semantics as buffered `Execute` (query timeout, or the smaller of query and OLTP tx timeout inside a transaction), while other workloads keep the historical behavior of bounding only transactional streams with the OLAP tx timeout. `UNSPECIFIED` deliberately stays on the OLAP tx timeout to avoid `TxTimeoutForWorkload` mapping it to OLTP.
- A non-transactional streamed query is tracked in the query list matching its workload (`statelessql` for OLTP, `olapql` otherwise), so shutdown semantics follow the workload too.

This is based on / extracted from #20213, and builds on #20268 (DML on the StreamExecute path). The earlier commits in this series make the streaming path reach parity with `Execute` before the flag flips it on for everyone:

- Remove `PlanSelectStream` and unify `Build`/`BuildStreaming` so streaming and buffered execution plan identically.
- Align `Stream()` stats with `Execute` and handle all plan types.
- Add a `StreamExecute`/`Execute` parity test suite (`stream_execute_compat_test.go`).

Related Issue(s)
Based on / extracted from #20213. Follows #20268.
Checklist
Deployment Notes
New opt-in vtgate flag `--mysql-server-use-streaming`, default `false`: no behavioral change unless explicitly enabled. When enabled, all workloads are delivered via streaming; connection pools and limits still follow the session workload.

AI Disclosure
This PR was written primarily by Claude Code; I provided direction and review.