Add MySQL proof of concept #3997

Draft: shaunandrews wants to merge 9 commits into trunk from implement-mysql-poc
@shaunandrews shaunandrews commented Jun 29, 2026

Contributor

Proposed Changes

  • Adds a proof-of-concept MySQL path for Studio native PHP sites while keeping SQLite as the default database engine.
  • Lets users opt into MySQL from advanced site creation settings or the CLI, with Playground sites rejecting MySQL clearly.
  • Downloads and verifies a managed MySQL runtime on demand instead of increasing every Studio install size.
  • Starts and provisions a per-site MySQL database so native PHP, WordPress install, Blueprint execution, and WP-CLI can run against real MySQL.
  • Gates unsupported MySQL workflows in the desktop UI where the current experience is still SQLite-specific.
  • Documents the strategy, binary delivery model, implementation plan, and remaining product work.

What Works Now

  • Create a native PHP site with databaseEngine=mysql.
  • Download MySQL on demand into ~/.studio/mysql-bin/8.4.10/.
  • Store per-site data under ~/.studio/mysql-data/<site-id>/.
  • Start and stop MySQL with the native PHP site.
  • Generate wp-config.php with real MySQL credentials.
  • Apply empty-site Blueprints against MySQL after the follow-up fix.
  • Block MySQL for sandbox/Playground sites.
  • Disable Sync UI for MySQL sites.

Big Remaining Work

  1. Platform binaries

    • Current metadata only includes darwin-arm64.
    • Missing: macOS Intel, Windows x64, and Linux.
    • Windows code paths partially exist for .exe and .zip, but are untested.
    • Linux needs archive selection and extractor/runtime validation.
  2. Binary delivery policy

    • Decide final source: Oracle MySQL vs MariaDB.
    • Add metadata with URL, SHA, archive type, and root dir for each platform.
    • Confirm licensing and redistribution requirements.
    • Keep on-demand download as default unless product wants an offline-first bundle.
  3. Lifecycle hardening

    • Handle orphaned mysqld after app crashes.
    • Add install locks so two sites cannot race while downloading/extracting MySQL.
    • Improve cleanup of runtime socket/PID dirs.
    • Decide per-site MySQL process vs shared global server. Current PoC is effectively per-site.
  4. Feature boundaries

    • Sync is disabled for MySQL right now.
    • Import/export/backup workflows need review.
    • Pull from WordPress.com/Pressable into MySQL is not solved.
    • Site duplication, deletion, reset, and migration need explicit MySQL behavior.
  5. User experience

    • Improve first-use messaging: “Downloading MySQL, about 160-270 MB.”
    • Improve failure messages for unsupported platform, offline state, and hash mismatch.
    • Consider hiding MySQL unless native PHP is selected.
    • Add a visible “Database engine: MySQL” indicator in site details/status.
  6. WP-CLI / developer tooling

    • Internal WP-CLI paths can work if MySQL is running.
    • Public CLI does not currently expose a clean studio wp ... command in this branch.
    • Add or restore a supported WP-CLI command path that starts MySQL when needed.
  7. Tests and CI

    • Add unit tests for metadata/platform resolution.
    • Add tests for MySQL download/extract paths, including Windows archive shape.
    • Add lifecycle tests for start/stop/restart.
    • Add at least one e2e smoke test per supported OS if CI can tolerate it.
  8. Upgrade story

    • Decide what happens when MySQL 8.4.11 ships.
    • Decide whether old runtimes stay installed.
    • Define how existing data dirs are migrated or verified.
    • Define recovery behavior for failed runtime upgrades.
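The per-platform metadata and hash check called for under "Binary delivery policy" and "Tests and CI" could look roughly like the sketch below. It is an illustration only: the `MysqlBinaryEntry` shape, the `platform-arch` key format, and both function names are assumptions, not the schema in `mysql-binary-cdn-metadata.json`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical metadata shape; field names are assumptions for illustration.
interface MysqlBinaryEntry {
  url: string;
  sha256: string;
  archiveType: "tar.gz" | "tar.xz" | "zip";
  rootDir: string;
}

type MysqlBinaryMetadata = Record<string, MysqlBinaryEntry>;

// Look up the entry for a key such as "darwin-arm64".
// Returns undefined when the platform has no published binary yet.
export function resolveMysqlBinary(
  metadata: MysqlBinaryMetadata,
  platform: string = process.platform,
  arch: string = process.arch
): MysqlBinaryEntry | undefined {
  return metadata[`${platform}-${arch}`];
}

// Compare the SHA-256 of downloaded bytes with the expected hex digest.
export function verifySha256(data: Uint8Array, expectedHex: string): boolean {
  const actual = createHash("sha256").update(data).digest("hex");
  return actual === expectedHex.toLowerCase();
}
```

An `undefined` result maps naturally onto the "unsupported platform" failure message, and a `false` from `verifySha256` onto the "hash mismatch" one.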

@shaunandrews shaunandrews changed the title from "[codex] Add MySQL proof of concept" to "Add MySQL proof of concept" Jun 29, 2026
@shaunandrews shaunandrews requested a review from chubes4 June 30, 2026 00:13
chubes4 and others added 7 commits July 2, 2026 13:30

Adds a `studio convert --to mysql` command that converts an existing
SQLite-backed Studio site to real MySQL in place, reusing the MySQL POC's
provisioning, config-swap, and boot-check building blocks.

Sequence (all reversible until the boot-verify gate passes):
1. Backup db.php, the sqlite-database-integration mu-plugin, wp-config.php,
   and a config snapshot to a timestamped dir; .ht.sqlite is never deleted.
2. Export the SQLite DB to a MySQL-shaped .sql via the existing AST-driver
   exporter (full single-file dump — keeps wp_users/wp_usermeta).
3. Provision the per-site MySQL DB, matching the exporter's per-table
   collation (utf8mb4_0900_ai_ci) so the schema default and imported tables
   agree.
4. Import the dump INTO MySQL via a new streaming helper.
5. Remove the SQLite integration and write MySQL DB_* constants.
6. Verify WordPress boots on MySQL (is_blog_installed) as a hard accept gate.
7. Roll back fully to the SQLite site on any failure.
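The "reversible until the boot-verify gate passes" sequence above can be pictured as a list of steps, each with its own undo, plus a final verification gate. The sketch below shows that general pattern only; `ReversibleStep` and `runWithRollback` are hypothetical names, and the real convert command may be structured differently.

```typescript
// Each step pairs a forward action with the action that reverses it.
// Hypothetical shape for illustration.
interface ReversibleStep {
  name: string;
  run: () => Promise<void>;
  undo: () => Promise<void>;
}

// Run steps in order, then the verification gate. On any failure,
// undo the completed steps in reverse order and rethrow the error.
export async function runWithRollback(
  steps: ReversibleStep[],
  verify: () => Promise<boolean>
): Promise<void> {
  const completed: ReversibleStep[] = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
    if (!(await verify())) {
      throw new Error("Boot verification failed");
    }
  } catch (error) {
    // A step that threw midway is not in `completed`; its run() is
    // assumed to clean up after itself in this sketch.
    for (const step of [...completed].reverse()) {
      await step.undo();
    }
    throw error;
  }
}
```

In this shape the `is_blog_installed` check would be the `verify` callback, and restoring the backed-up `db.php`, mu-plugin, and `wp-config.php` would be the `undo` of the config-swap step.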

New/changed pieces:
- `importSqlFileIntoMysql` streams a .sql on the mysql client's stdin and
  relaxes the session sql_mode (NO_ENGINE_SUBSTITUTION) so WordPress's
  legitimate `0000-00-00` datetime defaults load under MySQL 8 strict mode —
  the same accommodation `wp db import`/mysqldump round-trips rely on.
- `runMysqlCommand` gains optional stdin-file streaming.
- `provisionMysqlDatabase` gains an optional collation override (default
  unchanged for the create path).
- `getWpConfigTransformerPath` exposes the bundled transformer path.
- The convert command reserves known site/MySQL ports before allocating the
  new MySQL port so it can't collide with the site's own server port.

Proven end to end by converting a real 274 MB SQLite runtime to MySQL 8.4
and serving it over HTTPS with content intact (posts/pages/options/users
match the pre-conversion baseline).

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Opus 4.8 via Claude Code
- **Used for:** Implementing the convert command, the MySQL import helper, and
  the collation/sql_mode/port-reservation handling; root-causing the import
  and boot-check behavior against a real runtime.

Two linked bugs let a freshly-converted MySQL site break on the next
`studio start`, even though the conversion itself imported cleanly:

1. paths.ts hardcoded the daemon home to ~/.studio, ignoring the
   DEV_CONFIG_DIR sandbox that the rest of the CLI honors via
   getConfigDirectory(). A sandboxed CLI (the MySQL POC, or start:test)
   therefore connected to the default ~/.studio daemon, which for the POC
   is the stock MySQL-unaware Studio.app daemon — it never launched mysqld
   and drove the isolated site through the wrong process manager. Derive
   PROCESS_MANAGER_HOME from getConfigDirectory() so the daemon socket
   lives alongside cli.json/shared.json. When DEV_CONFIG_DIR is unset
   getConfigDirectory() returns ~/.studio, so normal installs are
   byte-for-byte unchanged; an explicit STUDIO_PROCESS_MANAGER_HOME still wins.
   This finishes what #2958 set out to do ("redirect the entire config
   directory") but which only covered well-known-paths.ts.

2. ensureWpConfig() silently fell back to DB_NAME='wordpress' whenever the
   engine config was absent. Combined with (1), a start via the wrong
   daemon rewrote a converted site's wp-config.php back to 'wordpress',
   severing it from its studio_<id> database. Add a guard that refuses to
   overwrite an existing non-default DB_NAME when no MySQL engine config
   is provided, instead of corrupting the config.
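The guard described in (2) reduces to a small decision function. The following is a sketch of that rule under assumed names (`resolveDbNameForWrite`, `MysqlEngineConfig`); the actual `ensureWpConfig()` code is not shown in this thread and may differ.

```typescript
const DEFAULT_DB_NAME = "wordpress";

// Hypothetical config shape; only the database name matters here.
interface MysqlEngineConfig {
  dbName: string;
}

// Decide which DB_NAME may be written to wp-config.php.
// Throws rather than replacing a non-default name with the fallback.
export function resolveDbNameForWrite(
  existingDbName: string | undefined,
  engineConfig?: MysqlEngineConfig
): string {
  if (engineConfig) {
    return engineConfig.dbName;
  }
  if (existingDbName && existingDbName !== DEFAULT_DB_NAME) {
    throw new Error(
      `Refusing to overwrite DB_NAME='${existingDbName}' without a MySQL engine config`
    );
  }
  return DEFAULT_DB_NAME;
}
```

Failing loudly here means a start through the wrong daemon surfaces as an error instead of silently detaching the site from its `studio_<id>` database.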

Proven on the real studio-native-local-runtime site: full stop -> start
round-trip now boots on MySQL 8.4.10, serves HTTP 200, and keeps
DB_NAME=studio_1322c571c44e43ca80826cd1 with all content intact.

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Opus 4.8 via Claude Code
- **Used for:** Root-causing the two bugs from source, writing the fixes
  and the guard unit test, and verifying the stop/start round-trip on the
  converted runtime.

The native-PHP runtime hardcoded its worker pool to 4, which is the
concurrency ceiling for a site — the number of requests it can serve
simultaneously. Nothing let a developer change it: not a flag, config
field, or env var.

Allow STUDIO_PHP_WORKER_POOL_SIZE to override the default of 4, so a
larger machine can raise throughput (useful for exercising concurrent
request paths like async fanout) and a constrained one can lower memory
use. Invalid or non-positive values fall back to 4.
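A minimal sketch of the fallback rule, assuming a helper named `resolveWorkerPoolSize` (hypothetical) and assuming that fractional values count as invalid, which the commit message does not specify:

```typescript
const DEFAULT_WORKER_POOL_SIZE = 4;

// Read STUDIO_PHP_WORKER_POOL_SIZE and fall back to the default when the
// value is missing, non-numeric, non-integer, or not positive.
export function resolveWorkerPoolSize(
  env: Record<string, string | undefined> = process.env
): number {
  const raw = env.STUDIO_PHP_WORKER_POOL_SIZE;
  if (raw === undefined) {
    return DEFAULT_WORKER_POOL_SIZE;
  }
  const parsed = Number(raw.trim());
  return Number.isInteger(parsed) && parsed > 0
    ? parsed
    : DEFAULT_WORKER_POOL_SIZE;
}
```

Passing the environment in as a parameter keeps the rule easy to unit test without mutating `process.env`.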

Env var rather than a persisted per-site config field for now — kept
deliberately small; it can be promoted to a `studio config` setting later
if it earns it. The name is STUDIO_PHP_WORKER_POOL_SIZE (not NATIVE_PHP)
to avoid reading as the unrelated "Studio Native" product; it configures
Studio's own native-PHP runtime.

Verified on studio-native-local-runtime: STUDIO_PHP_WORKER_POOL_SIZE=8
spawns 8 workers and serves HTTP 200; an unset value still defaults to 4.

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Opus 4.8 via Claude Code
- **Used for:** Implementing the env-var override and verifying worker
  count on the running site.

WordPress fires its own loopback requests — Action Scheduler's async queue
runner, wp-cron, REST-to-self — as fire-and-forget: a non-blocking request
with a ~0.01s timeout, where the client disconnects immediately by design
and the PHP worker is meant to keep running and do the work. On the
native-PHP runtime this never worked, which stalls every async workload
(Action Scheduler jobs, and async agent fanout in particular). Two distinct
causes, both fixed here:

1. `.local` custom domains resolve via multicast DNS (RFC 6762) with a ~5s
   timeout on macOS, even when the name is in /etc/hosts. A self-loopback to
   the site's own `.local` URL therefore spends 5-10s in DNS and blows past
   the 0.01s dispatch timeout, so the request never even connects. Add a
   native-PHP mu-plugin (0-loopback-dns-fast-path.php) that pins the site's
   OWN host to 127.0.0.1 via CURLOPT_RESOLVE for requests WordPress makes to
   itself. This does not change how `.local` resolves for the browser or
   anything external — it only short-circuits the site addressing itself,
   which is what 127.0.0.1 is for. Threaded the site host+port through
   writeStudioMuPluginsForNativePhpRuntime and its three call sites (server
   start, blueprint start, WP-CLI).

2. Both proxy layers (the HTTPS front door in proxy-server.ts and the worker
   pool proxy in php-server-child.ts) aborted the upstream PHP-worker request
   when the client disconnected. For a fire-and-forget request that means the
   worker is killed the instant it is kicked off. Decouple the upstream
   request's lifetime from the client: forward the body manually and, on a
   client abort, finish sending to the worker rather than tearing it down so
   it runs to completion. proxy-server.ts drops the http-proxy dependency for
   this path in favor of a raw http.request forward.
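For cause (1), libcurl's `CURLOPT_RESOLVE` takes entries of the form `HOST:PORT:ADDRESS`. The mu-plugin itself is PHP and is not reproduced here; the sketch below only illustrates, in TypeScript, how the site host and port threaded through to the plugin could be turned into such an entry. `buildCurlResolveEntry` is a hypothetical helper, not a function from this PR.

```typescript
// Build a CURLOPT_RESOLVE entry ("HOST:PORT:ADDRESS") that pins the
// site's own host to a loopback address.
export function buildCurlResolveEntry(
  siteUrl: string,
  address: string = "127.0.0.1"
): string {
  const url = new URL(siteUrl);
  // URL.port is empty when the scheme's default port is in use.
  const port = url.port || (url.protocol === "https:" ? "443" : "80");
  return `${url.hostname}:${port}:${address}`;
}
```

Because the entry names one specific host and port, requests to any other host still resolve normally, which matches the commit's claim that only the site addressing itself is short-circuited.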

Loopback latency drops from 5-10s to ~0.09s; a fire-and-forget dispatch now
lands and the worker survives client disconnect. Verified end to end: a
WordPress async fanout job self-drives to completion on this runtime, where
before every branch dispatch died at DNS.

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Opus 4.8 via Claude Code
- **Used for:** Root-causing both failures from source and live runtime
  behavior, implementing the fixes, and building a deterministic loopback
  repro to verify.

# Conflicts:
#	apps/cli/commands/site/create.ts
#	apps/studio/src/modules/cli/lib/cli-site-creator.ts
#	packages/common/lib/database-engine.ts
#	packages/common/lib/mysql-binary-cdn-metadata.json
#	packages/common/lib/mysql-binary-metadata.ts
…n heartbeat

Three robustness gaps in the native-PHP MySQL runtime, each fixed and verified
against the live studio-native-local-runtime site.

1. `wp db` was broken: WP-CLI's db subcommands shell out to a bare `mysql`/
   `mysqldump` resolved on PATH, but Studio bundles the MySQL client without
   installing it globally, so every `wp db *` died with `env: mysql: No such
   file or directory`. Prepend the bundled client's bin/ dir to the WP-CLI
   child's PATH for MySQL-engine sites (SQLite sites are untouched). Verified:
   `wp db query "SELECT 1"` and `wp db size` now return results.

2. The bundled mysqld ran with the default `time_zone=SYSTEM`, inheriting the
   host's local zone (e.g. EDT, UTC-4). WordPress and Action Scheduler store all
   `*_gmt` columns as PHP-computed UTC, so the database clock disagreed with the
   stored data by the host's UTC offset: `NOW()` returned local time while the
   rows were UTC. That makes UTC timestamps look hours in the future to any
   query comparing against the DB clock, and corrupts any code mixing MySQL
   `NOW()` with a `_gmt` column. Launch mysqld with `--default-time-zone=+00:00`
   so the server clock is UTC. Verified after a stop/start: `@@time_zone=+00:00`
   and `NOW()` == `UTC_TIMESTAMP()` == PHP `gmdate()`.

3. This runtime is a bare `php -S` worker pool with no cron ticker, so
   WordPress's `wp-cron.php` — the universal entrypoint a production host drives
   via system-cron or traffic — never fired on a schedule. Any async workload
   (Action Scheduler jobs, agent fan-out branches) only advanced when a user
   request happened to arrive, so branches could strand PENDING indefinitely on
   an idle site. Add a generic 60s WP-Cron heartbeat in the server child that
   loopbacks to `/wp-cron.php?doing_wp_cron` over the internal HTTP proxy (with
   the canonical Host header, avoiding the self-signed `.local` cert and the
   canonical redirect). The runtime fires generic WordPress cron; Action
   Scheduler drains as a consequence of being a well-behaved WP-Cron citizen —
   no plugin-specific coupling in the runtime. Verified: the "Starting WP-Cron
   heartbeat" log appears on start and a worker connection lands each 60s window
   with zero external traffic.
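The PATH change in (1) amounts to building a child-process environment with the bundled client directory in front. A rough sketch, assuming a hypothetical `buildWpCliEnv` helper and a caller that already knows the bundled `bin/` directory:

```typescript
import { delimiter } from "node:path";

type DatabaseEngine = "sqlite" | "mysql";

// Return the environment for the WP-CLI child process. For MySQL sites the
// bundled client's bin directory is prepended so `mysql` and `mysqldump`
// resolve; SQLite sites get an unchanged copy.
export function buildWpCliEnv(
  baseEnv: Record<string, string | undefined>,
  databaseEngine: DatabaseEngine,
  mysqlBinDir: string
): Record<string, string | undefined> {
  if (databaseEngine !== "mysql") {
    return { ...baseEnv };
  }
  const currentPath = baseEnv.PATH ?? "";
  return {
    ...baseEnv,
    PATH: currentPath ? `${mysqlBinDir}${delimiter}${currentPath}` : mysqlBinDir,
  };
}
```

One caveat for the untested Windows path: the variable is often spelled `Path` there, so a real implementation would need to look the key up case-insensitively.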

The Action Scheduler future-dated-timestamp symptom was investigated and is NOT
a runtime clock-skew bug in the claim path: AS compares PHP-UTC strings on both
sides and never uses MySQL `NOW()`, so the tz fix above is correctness hygiene,
not the drain fix. The remaining branch-drain edge (a stuck in-progress action
holding the single claim slot while `pending_branch_count()` collapses
concurrency back to 1, under agents-api's intentional 3600s long-branch reaper
window) is app-layer (agents-api) behavior, not a Studio runtime defect.

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Opus 4.8 via Claude Code
- **Used for:** Root-causing all three gaps from source and live runtime
  evidence, writing the fixes, and verifying each via the actual CLI/DB/HTTP
  behavior on the running site.

shouldUsePrimaryWorker() pinned every non-GET/HEAD/OPTIONS request to
worker 0. That method-based rule was an over-broad stand-in for
"stateful admin request", but it swept in all of WordPress's own POST
loopbacks — most importantly Action Scheduler's async queue runner (a
POST to admin-ajax.php) and wp-cron. Async fanout fires N concurrent
loopback POSTs to wake N workers for N branches; with the method pin all
N serialized on worker 0, capping fanout concurrency at 1 regardless of
pool size.

Pin worker affinity by request PATH only. The one route that genuinely
needs single-worker affinity is phpMyAdmin (file-based session store
scoped to one worker via STUDIO_PHPMYADMIN_SESSION_PATH), and its
existing /phpmyadmin path pin already covers all its methods. WordPress
itself is stateless across the pool — sessions live in the shared MySQL
database, not on any worker's disk — so its POSTs are safe to
load-balance like GETs.
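A path-only affinity rule of this kind could be sketched as follows. The helper name `needsPrimaryWorker` and the prefix list are illustrative; the real `shouldUsePrimaryWorker()` signature is not shown in this commit message.

```typescript
// Routes that must always land on worker 0. Only phpMyAdmin is named
// in the commit message; the list form is an assumption.
const PRIMARY_WORKER_PATH_PREFIXES = ["/phpmyadmin"];

// Decide affinity from the request path alone; the HTTP method is ignored.
export function needsPrimaryWorker(requestUrl: string): boolean {
  // Parse against a dummy base so relative request URLs are accepted.
  const { pathname } = new URL(requestUrl, "http://localhost");
  return PRIMARY_WORKER_PATH_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`)
  );
}
```

Matching on `prefix` or `prefix/` rather than a bare `startsWith(prefix)` avoids pinning unrelated paths that merely share the same leading characters.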

Proven at the proxy level: 8 concurrent loopback POSTs now reach 8
distinct worker PIDs (was 1 before), matching GET behavior.

AI assistance: Yes; Tool(s): Claude Code (Opus 4.8); Used for:
root-cause analysis, the fix, and proxy/end-to-end concurrency proofs.
chubes4 commented Jul 3, 2026

Contributor

Fanout concurrency: proxy POST-pin fixed; a second upstream constraint isolated

Fix in this commit (proxy layer)

shouldUsePrimaryWorker() in apps/cli/php-server-child.ts pinned every non-GET/HEAD/OPTIONS request to worker 0. That method-based rule was an over-broad stand-in for "stateful admin request" and swept in all of WordPress's own POST loopbacks — including Action Scheduler's async queue runner (admin-ajax.php?action=as_async_request_queue_runner) and wp-cron. Async fanout fires N concurrent loopback POSTs to wake N workers; with the method pin all N serialized on worker 0.

Now affinity is pinned by path only. phpMyAdmin (the one genuinely stateful route — file-based session store scoped to one worker) keeps its existing /phpmyadmin pin, which already covers all its methods. WordPress is stateless across the pool (sessions live in shared MySQL), so its POSTs load-balance like GETs.

Proxy-level proof (8-worker pool, getmypid probe):

| Requests | Before | After |
| --- | --- | --- |
| 8 concurrent POSTs | 1 distinct worker | 8 distinct workers |
| 8 concurrent GETs | 8 distinct workers | 8 distinct workers |

Second constraint isolated (upstream — NOT the proxy, NOT this repo)

With the proxy fixed, a 5-page site-forge generate still drained its 5 branches serially at the 60s WP-Cron heartbeat cadence (peak 2 concurrent), not in parallel. Root cause is in agents-api, not Studio: WP_Agent_Workflow_Action_Scheduler_Branch_Executor::loopback_dispatch_target() rewrites the loopback host to 127.0.0.1 but keeps the https:// scheme and default :443. This HTTPS .local runtime's worker-pool proxy is plain HTTP on localhost:<port>; there is no HTTPS listener on 127.0.0.1:443, so every concurrent async-runner POST fails the TLS handshake and the branches fall back to the serial WP-Cron drain.

Isolation proof (same runtime, same 8-worker pool, WpOrg\Requests::request_multiple):

  • Current target https://127.0.0.1/wp-admin/admin-ajax.php… → 0/8 succeed (SSL routines::ssl/tls alert handshake failure)
  • Correct target http://localhost:<port>/… + canonical Host: header → 8/8 succeed (200), and the 8 runners actually claim + drain branches

So the branch-fanout mechanism, the AS concurrency policy, and this proxy fix are all correct; the remaining cap is agents-api constructing an HTTPS/:443 loopback target that doesn't match a plain-HTTP native-PHP worker-pool runtime. That fix belongs upstream in agents-api's loopback_dispatch_target (derive scheme/port from the actual local server, or reuse the plain-HTTP proxy port), and is deliberately not papered over here.
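The "correct target" shape from the isolation proof (plain HTTP to the local proxy port, with the canonical host carried in the `Host` header) can be illustrated as below. The upstream fix would live in agents-api's PHP code; this TypeScript sketch only shows how such a target is composed, and `buildLoopbackTarget` and `LoopbackTarget` are hypothetical names.

```typescript
interface LoopbackTarget {
  url: string;
  headers: { Host: string };
}

// Compose a plain-HTTP loopback request aimed at the local worker-pool
// proxy, keeping the site's canonical host in the Host header.
export function buildLoopbackTarget(
  siteUrl: string,
  requestPath: string,
  proxyPort: number
): LoopbackTarget {
  const site = new URL(siteUrl);
  const target = new URL(requestPath, `http://localhost:${proxyPort}`);
  return {
    url: target.toString(),
    headers: { Host: site.host },
  };
}
```

For a site at `https://mysite.local` and a proxy on port 8881, this yields a URL on `http://localhost:8881` with `Host: mysite.local`, so no TLS handshake is attempted and WordPress still sees its canonical host.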

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Code (Opus 4.8)
  • Used for: root-cause analysis, the proxy fix, and the proxy-level + end-to-end concurrency proofs
