Batch phpcs calls into a single invocation #110

Open · sirbrillig wants to merge 12 commits into trunk from batch-phpcs-invocations


sirbrillig (Owner) commented Apr 20, 2026

Fixes #115

Summary

  • Replaces the 2N per-file phpcs launches in runGitWorkflow and runSvnWorkflow (one for the modified and one for the unmodified version of each of N changed files) with a single phpcs invocation that scans all file versions at once.
  • All modified and unmodified file contents are written to a temp directory (/tmp/phpcs-changed-XXXX/new/… and /tmp/phpcs-changed-XXXX/old/…) and phpcs runs once across all of them, eliminating startup overhead that previously scaled linearly with the number of changed files.

This is a different approach from the two-pass scan idea in #114.

Approach

Each workflow now has three phases:

  1. Pre-batch: check caches, determine which files are new (no unmodified version), and collect the lists of files that need scanning.
  2. Batch: a single getPhpcsOutputForGitBatch / getPhpcsOutputForSvnBatch call writes all content to a temp dir and runs phpcs once; results are cached individually per file.
  3. Filter: parse the results and compute new messages per file as before.
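
The three phases can be sketched roughly as follows. This is a minimal Python illustration, not the PR's PHP code. `run_workflow`, the `(side, path)` cache keys, and the injected `run_phpcs_batch` callable are all hypothetical. The filter step is also simplified to comparing message strings, while the real filter also has to account for lines that moved between versions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: a cache key is (side, original path),
# where side is "new" (modified) or "old" (unmodified).
Key = Tuple[str, str]
Messages = List[str]  # simplified: one string per phpcs message


def run_workflow(
    changed_files: List[str],
    cache: Dict[Key, Messages],
    is_new_file: Callable[[str], bool],
    run_phpcs_batch: Callable[[List[Key]], Dict[Key, Messages]],
) -> Dict[str, Messages]:
    # Phase 1 (pre-batch): skip cached versions; new files have no "old" side.
    to_scan: List[Key] = []
    for path in changed_files:
        sides = ["new"] if is_new_file(path) else ["new", "old"]
        to_scan.extend((side, path) for side in sides if (side, path) not in cache)

    # Phase 2 (batch): one phpcs run covers every uncached version, and each
    # result is then cached under its own key.
    if to_scan:
        cache.update(run_phpcs_batch(to_scan))

    # Phase 3 (filter): keep messages in the new version that the old one lacks.
    result: Dict[str, Messages] = {}
    for path in changed_files:
        old = [] if is_new_file(path) else cache[("old", path)]
        result[path] = [m for m in cache[("new", path)] if m not in old]
    return result
```

In this sketch, a second run against a warm cache collects nothing in phase 1 and therefore makes no phpcs call at all.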

Trade-off

The batch approach always scans the unmodified version of uncached files, even when the modified version has no messages. Previously, the per-file path skipped the unmodified scan in that case. The trade-off is intentional: avoiding even one phpcs startup (≥250 ms) outweighs the cost of sniffing a few extra files.

Performance

Measured with benchmark.sh on 10 staged PHP files (PSR2 standard, 10 runs):

| | phpcs calls | Mean | Relative |
|---|---|---|---|
| trunk | 20 | 5.74 s ± 0.11 s | 1.00× |
| batch (this branch) | 1 | 2.44 s ± 0.05 s | 2.36× faster |

sirbrillig force-pushed the batch-phpcs-invocations branch from a75e2c2 to 129b081 on April 20, 2026 23:26
Replace the 2N per-file phpcs launches in runGitWorkflow and runSvnWorkflow
with a single batch invocation. All modified and unmodified file contents are
written to a temp directory and phpcs is run once on all of them, eliminating
the phpcs startup overhead that scaled linearly with the number of changed files.

- Add getPhpcsOutputForGitBatch / getPhpcsOutputForSvnBatch to ShellOperator
- Implement batch methods in UnixShell using a temp dir layout (new/ and old/)
- Override batch methods in TestShell to delegate to existing per-file mocks
- Rewrite runGitWorkflow and runSvnWorkflow with pre-batch/batch/filter phases
- Add and update tests for the new batch behavior
…ied files

runBatchPhpcs keyed its results by original file path, but every file is
scanned as both a modified (new/) and an unmodified (old/) temp file under
the same original path, so one silently overwrote the other. Both 'new' and
'old' then resolved to the same phpcs output, defeating the new-vs-old
filtering that the tool exists to perform.

Key batch results by temp path instead, and map each side back to its
original files through its own temp-to-original map.
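
Here is a sketch of the fix, assuming the `<root>/new/<path>` and `<root>/old/<path>` layout described in the summary. The function names and the shape of `files` are hypothetical. The sketch also assumes phpcs reports each file under the exact temp path it was given. The key point is that each side keeps its own temp-to-original map, and results are looked up by temp path, so the two scans of one original file can no longer overwrite each other.

```python
import os
import tempfile
from typing import Dict, Optional, Tuple

TempMap = Dict[str, str]  # temp path -> original path


def write_batch_tree(files: Dict[str, Tuple[bytes, bytes]]) -> Tuple[str, TempMap, TempMap]:
    """Write each file's (modified, unmodified) contents under new/ and old/."""
    root = tempfile.mkdtemp(prefix="phpcs-changed-")
    new_map: TempMap = {}
    old_map: TempMap = {}
    for original, (new_content, old_content) in files.items():
        for side, content, mapping in (
            ("new", new_content, new_map),
            ("old", old_content, old_map),
        ):
            # lstrip("/") keeps os.path.join from discarding root for absolute paths.
            temp_path = os.path.join(root, side, original.lstrip("/"))
            os.makedirs(os.path.dirname(temp_path), exist_ok=True)
            with open(temp_path, "wb") as fh:
                fh.write(content)
            mapping[temp_path] = original
    return root, new_map, old_map


def split_results(
    phpcs_files: Dict[str, dict], new_map: TempMap, old_map: TempMap
) -> Tuple[Dict[str, Optional[dict]], Dict[str, Optional[dict]]]:
    # Look results up by temp path, then translate each side separately, so the
    # new/ and old/ scans of one original file never overwrite each other.
    new_results = {orig: phpcs_files.get(tmp) for tmp, orig in new_map.items()}
    old_results = {orig: phpcs_files.get(tmp) for tmp, orig in old_map.items()}
    return new_results, old_results
```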

The workflow tests previously overrode getPhpcsOutputFor{Git,Svn}Batch in the
test shells, looping over the per-file phpcs methods. That left the real
ShellRunner batch path (temp-file writing, single combined phpcs invocation,
and per-file JSON splitting) completely untested -- which is how the new/old
collision bug went unnoticed.

The test shells now let the real batch path run and intercept only the final
combined phpcs invocation, synthesizing its output from the temp files the
batch writes (buildBatchPhpcsOutput). Command registrations drop the obsolete
'| phpcs' per-file suffix, and the shells prefer an exact command match so a
file-contents command no longer shadows that same command piped to
git hash-object.
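
The commit describes a lookup order: try an exact match first, and fall back to substring matching only when there is none. The sketch below shows one way to express that order as a hypothetical Python helper. The PR's test shells are PHP, and their actual matching code may differ.

```python
from typing import Dict, Optional


def lookup_mock_output(registered: Dict[str, str], command: str) -> Optional[str]:
    """Hypothetical test-shell lookup: an exact match wins over a substring match."""
    # An exact hit stops a shorter registration that is merely contained in
    # this command (e.g. the bare `git show ...`) from shadowing it.
    if command in registered:
        return registered[command]
    # Otherwise fall back to the first registration contained in the command.
    for pattern, output in registered.items():
        if pattern in command:
            return output
    return None
```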

With the real batch covered, the now-unused per-file methods
(getPhpcsOutputOf{Modified,Unmodified}{Git,Svn}File) and their helpers
(getPhpcsCommand, processPhpcsOutput) are removed from ShellOperator,
UnixShell, WindowsShell, and ShellRunner.
…ports

The test shells now record the intercepted batch phpcs invocation and expose
wasCommandCalledContaining(), so the standard-configured workflow tests assert
that --standard reaches the phpcs command (coverage previously provided by the
per-file phpcs registrations). Also removes use statements in TestShell that
became unused when the per-file batch overrides were deleted.

wasCommandCalledContaining matched the raw recorded command, but on Windows
escapeshellarg() emits double quotes, so --standard="standard" never matched
the single-quoted needle. Normalize double quotes to single quotes before
comparing, matching how the test shells already normalize commands.
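
Below is a minimal sketch of the recording-and-matching helper, using hypothetical names. The normalization rewrites `"` to `'` on both the needle and the recorded commands. This works because PHP's `escapeshellarg()` wraps arguments in double quotes on Windows and in single quotes elsewhere.

```python
from typing import List


class RecordingShell:
    """Hypothetical stand-in for a test shell that records phpcs invocations."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def record(self, command: str) -> None:
        self.commands.append(command)

    @staticmethod
    def _normalize(text: str) -> str:
        # escapeshellarg() quotes with " on Windows and ' elsewhere; compare on '.
        return text.replace('"', "'")

    def was_command_called_containing(self, needle: str) -> bool:
        wanted = self._normalize(needle)
        return any(wanted in self._normalize(cmd) for cmd in self.commands)
```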

The batch path silently swallowed phpcs processing errors (e.g., an
uninstalled standard). phpcs writes a non-JSON error to stdout in that
case, and runBatchPhpcs returned an empty result, which mapped each file
to empty output and produced a bogus phantom STDIN success with exit 0.

Treat any phpcs output that is not decodable JSON with a 'files' key as a
failure and throw a ShellException, restoring the pre-batch behavior
where a processing error surfaces and exits non-zero.
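
The validation rule can be sketched as follows. `ShellException` here is a local stand-in for the PR's exception class. The `files` key is the top-level per-file map in phpcs's `--report=json` output.

```python
import json
from typing import Any, Dict


class ShellException(Exception):
    """Local stand-in for the PR's ShellException."""


def parse_batch_report(raw: str) -> Dict[str, Any]:
    """Return the per-file map from a phpcs JSON report, or raise."""
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        decoded = None
    # Anything other than a JSON object with a "files" key (e.g. a plain-text
    # error about a missing standard) is treated as a processing failure.
    if not isinstance(decoded, dict) or "files" not in decoded:
        raise ShellException("phpcs did not return a JSON report: " + raw.strip()[:200])
    return decoded["files"]
```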

The batch path materialized each file's content by capturing the content
command (git show / cat) through the line-oriented executeCommand(),
which rejoins exec() output and unconditionally appends a trailing
newline (and collapses runs of trailing blank lines). The temp file
phpcs scanned therefore always ended in \n, so the batch path missed
PSR2.Files.EndFileNewline.NoneFound (and .TooMany) for files that
genuinely lack a trailing newline.

Add ShellPlatform::writeCommandOutputToFile(), which redirects the
content command's stdout straight to the temp file, preserving exact
bytes. writeTempFile() now uses it and throws when the content command
fails, so a fetch failure surfaces instead of producing a silently-empty
temp file.

The mocked TestShell cannot reproduce the exec() byte corruption, so add
a UnixShellTest exercising the real shell to lock in the byte-preserving
contract.
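
The sketch below illustrates why redirection matters. It contrasts a rough approximation of the line-oriented capture described above with writing a command's stdout directly to a file. Both functions are hypothetical Python analogues; the PR text does not show `writeCommandOutputToFile()` itself. Passing a file object as `stdout` to `subprocess.run` hands the file descriptor to the child process, so no bytes pass through a line-splitting layer.

```python
import subprocess
from typing import List


def line_oriented_capture(output: bytes) -> bytes:
    """Rough approximation of the lossy path: split into lines, drop trailing
    blank lines, rejoin, and always append one newline."""
    lines = output.split(b"\n")
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\n".join(lines) + b"\n"


def write_command_output_to_file(argv: List[str], path: str) -> None:
    """Send the command's stdout straight into `path`, byte for byte."""
    with open(path, "wb") as fh:
        # A file object as stdout gives the child the file descriptor, so the
        # output is never split into lines or re-encoded.
        result = subprocess.run(argv, stdout=fh)
    if result.returncode != 0:
        # Surface fetch failures instead of leaving a silently empty file.
        raise RuntimeError(f"content command exited {result.returncode}: {argv}")
```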
sirbrillig (Owner, Author)

Working on making sure this branch's output is identical to trunk.

testFullGitWorkflowForEmptyNewFile mocked git show returning exit 1 with
phpcs's 'You must supply at least one file' text, which described the old
per-file path that piped empty stdin into phpcs. The batch path instead
writes the empty content to a temp file and passes it as an argument, so
git show succeeds (exit 0) and phpcs emits an Internal.NoCodeFound
warning. Because the file is new, that warning is reported as a new
message. Update the test to register that real behavior and assert the
warning is reported.

testFullSvnWorkflowForEmptyNewFile mocked cat returning phpcs's 'You must
supply at least one file' text, describing the old per-file path that
piped empty stdin into phpcs. The batch path writes the empty content to
a temp file and passes it as an argument, so cat succeeds and phpcs emits
an Internal.NoCodeFound warning that is reported as a new message for the
new file. Mirror the git workflow test fix.

Inlining one shell argument per temp file overflowed the OS ARG_MAX
limit once a batch reached thousands of files, failing with a cryptic
"Argument list too long". Write all temp paths to a phpcs --file-list
file in the batch temp dir instead, keeping the command line a constant
size regardless of file count. This also handles paths with spaces
cleanly. Update the test mock shell to read the file list and add a
regression guard asserting files are never inlined as phpcs arguments.
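
The following sketch shows a constant-size command line. It assumes phpcs's `--file-list=<file>` option (one path per line) and `--report=json`. The list file name `file-list.txt` and the helper name are hypothetical. `shlex.quote` applies POSIX quoting only, so a Windows shell would need different escaping.

```python
import os
import shlex
from typing import List


def build_phpcs_command(phpcs: str, standard: str, temp_dir: str, files: List[str]) -> str:
    """Write the paths to a list file and reference it instead of inlining them."""
    list_path = os.path.join(temp_dir, "file-list.txt")  # hypothetical name
    with open(list_path, "w") as fh:
        fh.write("\n".join(files) + "\n")  # one path per line; spaces are fine
    # The command's length no longer depends on how many files are scanned.
    return " ".join([
        shlex.quote(phpcs),
        "--report=json",
        "--standard=" + shlex.quote(standard),
        "--file-list=" + shlex.quote(list_path),
    ])
```

A line-oriented list cannot represent a path that itself contains a newline. For the temp paths the batch generates, that is presumably not a concern.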

Development

Successfully merging this pull request may close these issues.

Improve performance by reducing the number of phpcs calls
