refactor: use centralized script and config for sensitive files check#139
Conversation
WalkthroughReplaces inline Bash sensitive-file pattern checks with a Python-based detector and externalized pattern config; adds Changes
Sequence Diagram(s)sequenceDiagram
participant PR as Pull Request
participant Runner as Actions Runner
participant Git as Git (repo)
participant Python as SensitiveCheck.py
PR->>Runner: push / open PR (trigger)
Runner->>Git: checkout code
Runner->>Git: git diff --name-only between head and base
Git-->>Runner: ALL_CHANGED_FILES
Runner->>Python: run SensitiveCheck.py --config .github/workflows/config/sensitive_files.txt --files ALL_CHANGED_FILES
Python->>Python: load patterns, match files
Python-->>Runner: exit 0 (no matches) or exit 1 (sensitive files found + report)
Runner->>PR: mark check pass/fail
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/pull-request.yml (1)
117-139: 🧹 Nitpick | 🔵 TrivialVestigial step
idand output variable.The step still has
id: changed-unauth-files(line 119) and setsany_changed=false(line 124), but no downstream step references these outputs. If they're no longer needed, remove theidand the output assignment to avoid confusion.
🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 8-9: The sensitive_files list contains duplicate/overlapping
patterns (e.g., "CODEOWNERS" vs "CODEOWNERS$", "LICENSE" vs "LICENSE$",
".gitignore" vs "^.gitignore$"); decide which matching style your script expects
(plain literal or regex-anchored) and remove the redundant entries so each file
appears only once, updating the entries like CODEOWNERS/CODEOWNERS$,
LICENSE/LICENSE$, and .gitignore/^.gitignore$ to the single correct form used by
the matcher.
- Line 1: The CI sensitive-files check is blocking edits because
sensitive_files.txt contains ".github/" which matches itself; fix by updating
the sensitive-files checker to exclude this specific config file
(sensitive_files.txt) or change the listing to a more specific pattern that does
not match the config file (e.g., replace ".github/" with ".github/workflows/" or
other specific subpaths), or alternatively move the config out of ".github/" or
require the "ignore-sensitive-files-pr" label on PRs touching
".github/workflows/config/sensitive_files.txt"; locate references to the
sensitive-file patterns in the checker and adjust the exclusion logic to
explicitly skip ".github/workflows/config/sensitive_files.txt" (or tighten the
pattern) so CI no longer blocks legitimate edits to this config.
In @.github/workflows/pull-request.yml:
- Around line 136-139: Remove the unnecessary executable bit change: delete the
chmod +x invocation that targets
.github-central/.github/workflows/scripts/sensitive_file_check.py since the
script is invoked directly with python3 (python3
.github-central/.github/workflows/scripts/sensitive_file_check.py --config ...
--files "${ALL_CHANGED_FILES[@]}"), so only keep the python3 call and the
surrounding conditional that checks ALL_CHANGED_FILES.
- Around line 85-90: The workflow currently pins the centralized checkout to a
mutable ref (ref: main) in the actions/checkout step, which is a supply-chain
risk; update that step (the actions/checkout@v4 invocation with repository:
PalisadoesFoundation/.github and path: .github-central) to use a fixed ref such
as a specific commit SHA or a tagged release (e.g., ref: <commit-sha> or ref:
refs/tags/<tag>) so the workflow always pulls an immutable, reproducible version
of the centralized CI/CD scripts.
| - name: Checkout centralized CI/CD scripts | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| repository: PalisadoesFoundation/.github | ||
| ref: main | ||
| path: .github-central |
There was a problem hiding this comment.
Pinning to ref: main is a supply chain risk.
Any push to the main branch of PalisadoesFoundation/.github could silently alter the behavior of this workflow. Pin to a specific commit SHA or a release tag to ensure reproducibility and prevent unintended breakage or compromise.
- name: Checkout centralized CI/CD scripts
uses: actions/checkout@v4
with:
repository: PalisadoesFoundation/.github
- ref: main
+ ref: <commit-sha-or-tag>
path: .github-central🤖 Prompt for AI Agents
In @.github/workflows/pull-request.yml around lines 85 - 90, The workflow
currently pins the centralized checkout to a mutable ref (ref: main) in the
actions/checkout step, which is a supply-chain risk; update that step (the
actions/checkout@v4 invocation with repository: PalisadoesFoundation/.github and
path: .github-central) to use a fixed ref such as a specific commit SHA or a
tagged release (e.g., ref: <commit-sha> or ref: refs/tags/<tag>) so the workflow
always pulls an immutable, reproducible version of the centralized CI/CD
scripts.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 1-31: Add a top-line comment stating the file uses Python-style
regular expressions matched against the full file path (e.g., "# Each line is a
Python regex matched against the full file path"), then convert every entry to a
proper regex form and normalize examples: change plain prefixes like ".github/"
to an anchored regex "^\\.github/", unanchored filenames like "package.json" to
"^package\\.json$", glob-like entries like "./*.md" to a regex that matches
markdown files ".*\\.md$", and ensure existing regexes such as "^src/.*" and
"^\\.gitignore$" remain valid; finally review and escape literal dots, stars and
special chars across the list so every line is a valid, anchored regex.
- Line 10: The pattern "./*.md" is an invalid regular expression; replace it
with a valid regex that matches Markdown files in the repository root (for
example, use ^\./[^/]+\.md$ instead of "./*.md"), updating the entry that
currently contains "./*.md" so the script using re.search/re.match won't raise
re.error.
- Around line 25-30: The regex patterns in the list (e.g., .* .pem$, .* .key$,
.* .cert$, .* .password$, .* .secret$, .* .credentials$) use an unescaped dot
which matches any character and thus overmatches; update each pattern to escape
the literal dot before the extension and (optionally) anchor the pattern — e.g.,
replace occurrences with ^.*\.pem$, ^.*\.key$, ^.*\.cert$, ^.*\.password$,
^.*\.secret$, and ^.*\.credentials$ so they only match filenames that actually
end with the given extension.
- Line 17: The pattern '^src/.*' is too broad and marks all source files as
sensitive; replace it with a narrower set of paths or remove src/ entirely:
update the sensitive_files.txt entry referencing '^src/.*' to either delete that
line and add specific protected directories like '^src/config/.*'
'^src/server/.*' '^src/infrastructure/.*' (or any folder names containing
infrastructure/configuration) or change it to only match truly sensitive files
(e.g., '^src/.*\\.key$', '^src/.*\\.pem$', or '^src/secret-.*'), and consider
adding '^docs/.*' if docs should be protected; locate the '^src/.*' pattern in
sensitive_files.txt and edit it accordingly.
| .github/ | ||
| CNAME$ | ||
| static/CNAME | ||
| package.json | ||
| sidebar | ||
| docusaurus.config.js | ||
| babel.config.js | ||
| CODEOWNERS | ||
| LICENSE | ||
| ./*.md | ||
| package-lock.json | ||
| tsconfig.json | ||
| pnpm-lock.yaml | ||
| .gitignore | ||
| .prettierignore | ||
| .prettierrc | ||
| ^src/.* | ||
| ^.gitignore$ | ||
| .node-version$ | ||
| .eslintrc.json$ | ||
| .eslintignore$ | ||
| CODEOWNERS$ | ||
| LICENSE$ | ||
| .coderabbit.yaml$ | ||
| .*.pem$ | ||
| .*.key$ | ||
| .*.cert$ | ||
| .*.password$ | ||
| .*.secret$ | ||
| .*.credentials$ | ||
| .nojekyll$ |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
No clear convention on whether patterns are globs, substrings, or regex.
The file mixes plain path prefixes (.github/), glob-like patterns (./*.md), regex-anchored patterns (^src/.*, ^.gitignore$), and unanchored strings (package.json). Without a comment header documenting the expected syntax, contributors will add patterns in the wrong format, causing silent misses or false positives.
Add a comment at the top (e.g., # Each line is a Python regex matched against the full file path) and ensure all entries conform.
🧰 Tools
🪛 GitHub Actions: PR Workflow
[error] 1-1: Unauthorized changes detected in sensitive files: .github/workflows/config/sensitive_files.txt
🪛 LanguageTool
[uncategorized] ~1-~1: The official name of this software platform is spelled with a capital “H”.
Context: .github/ CNAME$ static/CNAME package.json sideb...
(GITHUB)
🤖 Prompt for AI Agents
In @.github/workflows/config/sensitive_files.txt around lines 1 - 31, Add a
top-line comment stating the file uses Python-style regular expressions matched
against the full file path (e.g., "# Each line is a Python regex matched against
the full file path"), then convert every entry to a proper regex form and
normalize examples: change plain prefixes like ".github/" to an anchored regex
"^\\.github/", unanchored filenames like "package.json" to "^package\\.json$",
glob-like entries like "./*.md" to a regex that matches markdown files
".*\\.md$", and ensure existing regexes such as "^src/.*" and "^\\.gitignore$"
remain valid; finally review and escape literal dots, stars and special chars
across the list so every line is a valid, anchored regex.
| .*.pem$ | ||
| .*.key$ | ||
| .*.cert$ | ||
| .*.password$ | ||
| .*.secret$ | ||
| .*.credentials$ |
There was a problem hiding this comment.
Credential-file patterns are incorrect regex — they match far more than intended.
.*.pem$ in regex means "any character, then anything, ending in pem" — this matches strings like my_problem, ecosystem, etc. The literal dot before the extension is not escaped.
Similarly affected: .*.key$, .*.cert$, .*.password$, .*.secret$, .*.credentials$.
🐛 Proposed fix — escape the dot before the extension
-.*.pem$
-.*.key$
-.*.cert$
-.*.password$
-.*.secret$
-.*.credentials$
+.*\.pem$
+.*\.key$
+.*\.cert$
+.*\.password$
+.*\.secret$
+.*\.credentials$🤖 Prompt for AI Agents
In @.github/workflows/config/sensitive_files.txt around lines 25 - 30, The regex
patterns in the list (e.g., .* .pem$, .* .key$, .* .cert$, .* .password$, .*
.secret$, .* .credentials$) use an unescaped dot which matches any character and
thus overmatches; update each pattern to escape the literal dot before the
extension and (optionally) anchor the pattern — e.g., replace occurrences with
^.*\.pem$, ^.*\.key$, ^.*\.cert$, ^.*\.password$, ^.*\.secret$, and
^.*\.credentials$ so they only match filenames that actually end with the given
extension.
| "./*.md" | ||
| "package-lock.json" | ||
| "tsconfig.json" | ||
| "pnpm-lock.yaml" |
There was a problem hiding this comment.
- These are the only entries that must be in sensitive_files.txt
- You are using values that are not relevant.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/pull-request.yml:
- Around line 136-138: Add an existence check before invoking the
sensitive_file_check.py script: verify that the script
".github-central/.github/workflows/scripts/sensitive_file_check.py" and the
config ".github/workflows/config/sensitive_files.txt" both exist (and are files)
when the ALL_CHANGED_FILES array is non-empty, and if either is missing print a
clear error message identifying which path is missing and exit with non-zero
status; update the conditional around the python3 invocation that currently
references ALL_CHANGED_FILES to perform these checks and only run python3 when
both files are present.
| if [ ${#ALL_CHANGED_FILES[@]} -gt 0 ]; then | ||
| python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}" | ||
| fi |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
No guard against Python script or config file being absent.
If the centralized-repo checkout (line 85-90) fails silently or the path changes upstream, this step will fail with a cryptic "No such file or directory" error. Consider adding a brief existence check to surface a clearer message.
Proposed improvement
# Check for sensitive files using the python script
if [ ${`#ALL_CHANGED_FILES`[@]} -gt 0 ]; then
+ SCRIPT=".github-central/.github/workflows/scripts/sensitive_file_check.py"
+ CONFIG=".github/workflows/config/sensitive_files.txt"
+ if [ ! -f "$SCRIPT" ]; then
+ echo "::error::Centralized sensitive-file check script not found at $SCRIPT"
+ exit 1
+ fi
+ if [ ! -f "$CONFIG" ]; then
+ echo "::error::Sensitive files config not found at $CONFIG"
+ exit 1
+ fi
- python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}"
+ python3 "$SCRIPT" --config "$CONFIG" --files "${ALL_CHANGED_FILES[@]}"
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ ${#ALL_CHANGED_FILES[@]} -gt 0 ]; then | |
| python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}" | |
| fi | |
| if [ ${`#ALL_CHANGED_FILES`[@]} -gt 0 ]; then | |
| SCRIPT=".github-central/.github/workflows/scripts/sensitive_file_check.py" | |
| CONFIG=".github/workflows/config/sensitive_files.txt" | |
| if [ ! -f "$SCRIPT" ]; then | |
| echo "::error::Centralized sensitive-file check script not found at $SCRIPT" | |
| exit 1 | |
| fi | |
| if [ ! -f "$CONFIG" ]; then | |
| echo "::error::Sensitive files config not found at $CONFIG" | |
| exit 1 | |
| fi | |
| python3 "$SCRIPT" --config "$CONFIG" --files "${ALL_CHANGED_FILES[@]}" | |
| fi |
🤖 Prompt for AI Agents
In @.github/workflows/pull-request.yml around lines 136 - 138, Add an existence
check before invoking the sensitive_file_check.py script: verify that the script
".github-central/.github/workflows/scripts/sensitive_file_check.py" and the
config ".github/workflows/config/sensitive_files.txt" both exist (and are files)
when the ALL_CHANGED_FILES array is non-empty, and if either is missing print a
clear error message identifying which path is missing and exit with non-zero
status; update the conditional around the python3 invocation that currently
references ALL_CHANGED_FILES to perform these checks and only run python3 when
both files are present.

What kind of change does this PR introduce?
Refactoring
Issue Number:
Fixes #138
Did you add tests for your changes?
Snapshots/Videos:
N/A
If relevant, did you update the documentation?
N/A
Summary
This PR refactors the Check-Sensitive-Files job in
.github/workflows/pull-request.yml
to use the centralized CI/CD scripts from the PalisadoesFoundation/.github repository.
It replaces the inline bash script logic with a call to the shared sensitive_file_check.py script, using a dedicated configuration file (
.github/workflows/config/sensitive_files.txt
) to manage the sensitive file patterns. This improves maintainability and separation of concerns.
Does this PR introduce a breaking change?
No
Other information
None
Have you read the contributing guide?
Yes
Summary by CodeRabbit