Batch phpcs calls into a single invocation#110
Open
sirbrillig wants to merge 12 commits into
Open
Conversation
a75e2c2 to
129b081
Compare
Replace the 2N per-file phpcs launches in runGitWorkflow and runSvnWorkflow with a single batch invocation. All modified and unmodified file contents are written to a temp directory and phpcs is run once on all of them, eliminating the startup overhead cost that scaled linearly with the number of changed files. - Add getPhpcsOutputForGitBatch / getPhpcsOutputForSvnBatch to ShellOperator - Implement batch methods in UnixShell using a temp dir layout (new/ and old/) - Override batch methods in TestShell to delegate to existing per-file mocks - Rewrite runGitWorkflow and runSvnWorkflow with pre-batch/batch/filter phases - Add and update tests for the new batch behavior
129b081 to
74ce1eb
Compare
This was referenced Apr 21, 2026
…ied files runBatchPhpcs keyed its results by original file path, but every file is scanned as both a modified (new/) and an unmodified (old/) temp file under the same original path, so one silently overwrote the other. Both 'new' and 'old' then resolved to the same phpcs output, defeating the new-vs-old filtering that the tool exists to perform. Key batch results by temp path instead, and map each side back to its original files through its own temp-to-original map.
The workflow tests previously overrode getPhpcsOutputFor{Git,Svn}Batch in the
test shells, looping over the per-file phpcs methods. That left the real
ShellRunner batch path (temp-file writing, single combined phpcs invocation,
and per-file JSON splitting) completely untested -- which is how the new/old
collision bug went unnoticed.
The test shells now let the real batch path run and intercept only the final
combined phpcs invocation, synthesizing its output from the temp files the
batch writes (buildBatchPhpcsOutput). Command registrations drop the obsolete
'| phpcs' per-file suffix, and the shells prefer an exact command match so a
file-contents command no longer shadows that same command piped to
git hash-object.
With the real batch covered, the now-unused per-file methods
(getPhpcsOutputOf{Modified,Unmodified}{Git,Svn}File) and their helpers
(getPhpcsCommand, processPhpcsOutput) are removed from ShellOperator,
UnixShell, WindowsShell, and ShellRunner.
…ports The test shells now record the intercepted batch phpcs invocation and expose wasCommandCalledContaining(), so the standard-configured workflow tests assert that --standard reaches the phpcs command (coverage previously provided by the per-file phpcs registrations). Also removes use statements in TestShell that became unused when the per-file batch overrides were deleted.
wasCommandCalledContaining matched the raw recorded command, but on Windows escapeshellarg() emits double quotes, so --standard="standard" never matched the single-quoted needle. Normalize double quotes to single quotes before comparing, matching how the test shells already normalize commands.
The batch path silently swallowed phpcs processing errors (eg: an uninstalled standard). phpcs writes a non-JSON error to stdout in that case, and runBatchPhpcs returned an empty result, which mapped each file to empty output and produced a bogus phantom STDIN success with exit 0. Treat any phpcs output that is not decodable JSON with a 'files' key as a failure and throw a ShellException, restoring the pre-batch behavior where a processing error surfaces and exits non-zero.
The batch path materialized each file's content by capturing the content command (git show / cat) through the line-oriented executeCommand(), which rejoins exec() output and unconditionally appends a trailing newline (and collapses runs of trailing blank lines). The temp file phpcs scanned therefore always ended in \n, so the batch path missed PSR2.Files.EndFileNewline.NoneFound (and .TooMany) for files that genuinely lack a trailing newline. Add ShellPlatform::writeCommandOutputToFile(), which redirects the content command's stdout straight to the temp file, preserving exact bytes. writeTempFile() now uses it and throws when the content command fails, so a fetch failure surfaces instead of producing a silently-empty temp file. The mocked TestShell cannot reproduce the exec() byte corruption, so add a UnixShellTest exercising the real shell to lock in the byte-preserving contract.
Owner
Author
|
Working on making sure this branch's output is identical to trunk. |
testFullGitWorkflowForEmptyNewFile mocked git show returning exit 1 with phpcs's 'You must supply at least one file' text, which described the old per-file path that piped empty stdin into phpcs. The batch path instead writes the empty content to a temp file and passes it as an argument, so git show succeeds (exit 0) and phpcs emits an Internal.NoCodeFound warning. Because the file is new, that warning is reported as a new message. Update the test to register that real behavior and assert the warning is reported.
testFullSvnWorkflowForEmptyNewFile mocked cat returning phpcs's 'You must supply at least one file' text, describing the old per-file path that piped empty stdin into phpcs. The batch path writes the empty content to a temp file and passes it as an argument, so cat succeeds and phpcs emits an Internal.NoCodeFound warning that is reported as a new message for the new file. Mirror the git workflow test fix.
Inlining one shell argument per temp file overflowed the OS ARG_MAX limit once a batch reached thousands of files, failing with a cryptic "Argument list too long". Write all temp paths to a phpcs --file-list file in the batch temp dir instead, keeping the command line a constant size regardless of file count. This also handles paths with spaces cleanly. Update the test mock shell to read the file list and add a regression guard asserting files are never inlined as phpcs arguments.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #115
Summary
runGitWorkflowandrunSvnWorkflowwith a single phpcs invocation that scans all file versions at once./tmp/phpcs-changed-XXXX/new/…and/tmp/phpcs-changed-XXXX/old/…) and phpcs runs once across all of them, eliminating startup overhead that previously scaled linearly with the number of changed files.This is a different approach to the two-pass scan idea in #114
Approach
Each workflow now has three phases:
getPhpcsOutputForGitBatch/getPhpcsOutputForSvnBatchcall writes all content to a temp dir and runs phpcs once; results are cached individually per file.Trade-off
The batch approach always scans the unmodified version of uncached files even when the modified version has no messages. Previously the per-file path skipped the unmodified scan in that case. The trade-off is intentional: one saved phpcs startup cost (≥250 ms) outweighs the cost of a few extra file sniffs.
Performance
Measured with
benchmark.shon 10 staged PHP files (PSR2 standard, 10 runs):