Skip to content

refactor: use centralized script and config for sensitive files check#139

Merged
palisadoes merged 4 commits intoPalisadoesFoundation:mainfrom
18bharathkumar:refactor/sensitive-files-check
Feb 9, 2026
Merged

refactor: use centralized script and config for sensitive files check#139
palisadoes merged 4 commits intoPalisadoesFoundation:mainfrom
18bharathkumar:refactor/sensitive-files-check

Conversation

@18bharathkumar
Copy link
Contributor

@18bharathkumar 18bharathkumar commented Feb 8, 2026

What kind of change does this PR introduce?

Refactoring

Issue Number:

Fixes #138

Did you add tests for your changes?

Snapshots/Videos:

N/A

If relevant, did you update the documentation?

N/A

Summary

This PR refactors the Check-Sensitive-Files job in
.github/workflows/pull-request.yml
to use the centralized CI/CD scripts from the PalisadoesFoundation/.github repository.

It replaces the inline bash script logic with a call to the shared sensitive_file_check.py script, using a dedicated configuration file (
.github/workflows/config/sensitive_files.txt
) to manage the sensitive file patterns. This improves maintainability and separation of concerns.

Does this PR introduce a breaking change?

No

Other information

None

Have you read the contributing guide?

Yes

Summary by CodeRabbit

  • Chores
    • Expanded and standardized the repository's sensitive-file patterns to broaden detection of credentials, keys, config, and common dotfiles.
    • Reworked the CI workflow to collect changed files and run a centralized Python-based sensitive-file check, simplifying prior per-pattern shell logic and improving reliability and maintainability.

@coderabbitai
Copy link

coderabbitai bot commented Feb 8, 2026

Walkthrough

Replaces inline Bash sensitive-file pattern checks with a Python-based detector and externalized pattern config; adds .github/workflows/config/sensitive_files.txt and updates .github/workflows/pull-request.yml to collect changed files and invoke the Python check.

Changes

Cohort / File(s) Summary
Sensitive Files Configuration
.github/workflows/config/sensitive_files.txt
Adds a new text file listing explicit paths and regex/glob patterns for sensitive files and extensions to be detected by the workflow.
Workflow CI/CD Refactoring
.github/workflows/pull-request.yml
Removes Bash per-pattern git-diff loop and SENSITIVE_PATTERNS array; adds centralized collection of all changed files and a conditional step to run a Python script with the new config file. Also adjusts checkout and Python setup steps for the new flow.

Sequence Diagram(s)

sequenceDiagram
  participant PR as Pull Request
  participant Runner as Actions Runner
  participant Git as Git (repo)
  participant Python as SensitiveCheck.py

  PR->>Runner: push / open PR (trigger)
  Runner->>Git: checkout code
  Runner->>Git: git diff --name-only between head and base
  Git-->>Runner: ALL_CHANGED_FILES
  Runner->>Python: run SensitiveCheck.py --config .github/workflows/config/sensitive_files.txt --files ALL_CHANGED_FILES
  Python->>Python: load patterns, match files
  Python-->>Runner: exit 0 (no matches) or exit 1 (sensitive files found + report)
  Runner->>PR: mark check pass/fail
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: refactoring the sensitive files check to use a centralized script and config file instead of inline bash logic.
Description check ✅ Passed The description covers the main aspects including change type, issue number, motivation, and breaking changes, though the test section is left incomplete.
Linked Issues check ✅ Passed The PR implements the core requirements from issue #138: adds a configuration file, integrates centralized script, replaces inline bash logic, and maintains existing functionality.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the objectives in issue #138: configuration file and workflow integration for sensitive file checking.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@18bharathkumar
Copy link
Contributor Author

@coderabbitai
review the changes

@coderabbitai
Copy link

coderabbitai bot commented Feb 8, 2026

@18bharathkumar, I'll review the changes in this PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/pull-request.yml (1)

117-139: 🧹 Nitpick | 🔵 Trivial

Vestigial step id and output variable.

The step still has id: changed-unauth-files (line 119) and sets any_changed=false (line 124), but no downstream step references these outputs. If they're no longer needed, remove the id and the output assignment to avoid confusion.

🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 8-9: The sensitive_files list contains duplicate/overlapping
patterns (e.g., "CODEOWNERS" vs "CODEOWNERS$", "LICENSE" vs "LICENSE$",
".gitignore" vs "^.gitignore$"); decide which matching style your script expects
(plain literal or regex-anchored) and remove the redundant entries so each file
appears only once, updating the entries like CODEOWNERS/CODEOWNERS$,
LICENSE/LICENSE$, and .gitignore/^.gitignore$ to the single correct form used by
the matcher.
- Line 1: The CI sensitive-files check is blocking edits because
sensitive_files.txt contains ".github/" which matches itself; fix by updating
the sensitive-files checker to exclude this specific config file
(sensitive_files.txt) or change the listing to a more specific pattern that does
not match the config file (e.g., replace ".github/" with ".github/workflows/" or
other specific subpaths), or alternatively move the config out of ".github/" or
require the "ignore-sensitive-files-pr" label on PRs touching
".github/workflows/config/sensitive_files.txt"; locate references to the
sensitive-file patterns in the checker and adjust the exclusion logic to
explicitly skip ".github/workflows/config/sensitive_files.txt" (or tighten the
pattern) so CI no longer blocks legitimate edits to this config.

In @.github/workflows/pull-request.yml:
- Around line 136-139: Remove the unnecessary executable bit change: delete the
chmod +x invocation that targets
.github-central/.github/workflows/scripts/sensitive_file_check.py since the
script is invoked directly with python3 (python3
.github-central/.github/workflows/scripts/sensitive_file_check.py --config ...
--files "${ALL_CHANGED_FILES[@]}"), so only keep the python3 call and the
surrounding conditional that checks ALL_CHANGED_FILES.
- Around line 85-90: The workflow currently pins the centralized checkout to a
mutable ref (ref: main) in the actions/checkout step, which is a supply-chain
risk; update that step (the actions/checkout@v4 invocation with repository:
PalisadoesFoundation/.github and path: .github-central) to use a fixed ref such
as a specific commit SHA or a tagged release (e.g., ref: <commit-sha> or ref:
refs/tags/<tag>) so the workflow always pulls an immutable, reproducible version
of the centralized CI/CD scripts.

Comment on lines +85 to +90
- name: Checkout centralized CI/CD scripts
uses: actions/checkout@v4
with:
repository: PalisadoesFoundation/.github
ref: main
path: .github-central
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Pinning to ref: main is a supply chain risk.

Any push to the main branch of PalisadoesFoundation/.github could silently alter the behavior of this workflow. Pin to a specific commit SHA or a release tag to ensure reproducibility and prevent unintended breakage or compromise.

       - name: Checkout centralized CI/CD scripts
         uses: actions/checkout@v4
         with:
           repository: PalisadoesFoundation/.github
-          ref: main
+          ref: <commit-sha-or-tag>
           path: .github-central
🤖 Prompt for AI Agents
In @.github/workflows/pull-request.yml around lines 85 - 90, The workflow
currently pins the centralized checkout to a mutable ref (ref: main) in the
actions/checkout step, which is a supply-chain risk; update that step (the
actions/checkout@v4 invocation with repository: PalisadoesFoundation/.github and
path: .github-central) to use a fixed ref such as a specific commit SHA or a
tagged release (e.g., ref: <commit-sha> or ref: refs/tags/<tag>) so the workflow
always pulls an immutable, reproducible version of the centralized CI/CD
scripts.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/config/sensitive_files.txt:
- Around line 1-31: Add a top-line comment stating the file uses Python-style
regular expressions matched against the full file path (e.g., "# Each line is a
Python regex matched against the full file path"), then convert every entry to a
proper regex form and normalize examples: change plain prefixes like ".github/"
to an anchored regex "^\\.github/", unanchored filenames like "package.json" to
"^package\\.json$", glob-like entries like "./*.md" to a regex that matches
markdown files ".*\\.md$", and ensure existing regexes such as "^src/.*" and
"^\\.gitignore$" remain valid; finally review and escape literal dots, stars and
special chars across the list so every line is a valid, anchored regex.
- Line 10: The pattern "./*.md" is an invalid regular expression; replace it
with a valid regex that matches Markdown files in the repository root (for
example, use ^\./[^/]+\.md$ instead of "./*.md"), updating the entry that
currently contains "./*.md" so the script using re.search/re.match won't raise
re.error.
- Around line 25-30: The regex patterns in the list (e.g., .* .pem$, .* .key$,
.* .cert$, .* .password$, .* .secret$, .* .credentials$) use an unescaped dot
which matches any character and thus overmatches; update each pattern to escape
the literal dot before the extension and (optionally) anchor the pattern — e.g.,
replace occurrences with ^.*\.pem$, ^.*\.key$, ^.*\.cert$, ^.*\.password$,
^.*\.secret$, and ^.*\.credentials$ so they only match filenames that actually
end with the given extension.
- Line 17: The pattern '^src/.*' is too broad and marks all source files as
sensitive; replace it with a narrower set of paths or remove src/ entirely:
update the sensitive_files.txt entry referencing '^src/.*' to either delete that
line and add specific protected directories like '^src/config/.*'
'^src/server/.*' '^src/infrastructure/.*' (or any folder names containing
infrastructure/configuration) or change it to only match truly sensitive files
(e.g., '^src/.*\\.key$', '^src/.*\\.pem$', or '^src/secret-.*'), and consider
adding '^docs/.*' if docs should be protected; locate the '^src/.*' pattern in
sensitive_files.txt and edit it accordingly.

Comment on lines +1 to +31
.github/
CNAME$
static/CNAME
package.json
sidebar
docusaurus.config.js
babel.config.js
CODEOWNERS
LICENSE
./*.md
package-lock.json
tsconfig.json
pnpm-lock.yaml
.gitignore
.prettierignore
.prettierrc
^src/.*
^.gitignore$
.node-version$
.eslintrc.json$
.eslintignore$
CODEOWNERS$
LICENSE$
.coderabbit.yaml$
.*.pem$
.*.key$
.*.cert$
.*.password$
.*.secret$
.*.credentials$
.nojekyll$
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

No clear convention on whether patterns are globs, substrings, or regex.

The file mixes plain path prefixes (.github/), glob-like patterns (./*.md), regex-anchored patterns (^src/.*, ^.gitignore$), and unanchored strings (package.json). Without a comment header documenting the expected syntax, contributors will add patterns in the wrong format, causing silent misses or false positives.

Add a comment at the top (e.g., # Each line is a Python regex matched against the full file path) and ensure all entries conform.

🧰 Tools
🪛 GitHub Actions: PR Workflow

[error] 1-1: Unauthorized changes detected in sensitive files: .github/workflows/config/sensitive_files.txt

🪛 LanguageTool

[uncategorized] ~1-~1: The official name of this software platform is spelled with a capital “H”.
Context: .github/ CNAME$ static/CNAME package.json sideb...

(GITHUB)

🤖 Prompt for AI Agents
In @.github/workflows/config/sensitive_files.txt around lines 1 - 31, Add a
top-line comment stating the file uses Python-style regular expressions matched
against the full file path (e.g., "# Each line is a Python regex matched against
the full file path"), then convert every entry to a proper regex form and
normalize examples: change plain prefixes like ".github/" to an anchored regex
"^\\.github/", unanchored filenames like "package.json" to "^package\\.json$",
glob-like entries like "./*.md" to a regex that matches markdown files
".*\\.md$", and ensure existing regexes such as "^src/.*" and "^\\.gitignore$"
remain valid; finally review and escape literal dots, stars and special chars
across the list so every line is a valid, anchored regex.

Comment on lines +25 to +30
.*.pem$
.*.key$
.*.cert$
.*.password$
.*.secret$
.*.credentials$
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Credential-file patterns are incorrect regex — they match far more than intended.

.*.pem$ in regex means "any character, then anything, ending in pem" — this matches strings like my_problem, ecosystem, etc. The literal dot before the extension is not escaped.

Similarly affected: .*.key$, .*.cert$, .*.password$, .*.secret$, .*.credentials$.

🐛 Proposed fix — escape the dot before the extension
-.*.pem$
-.*.key$
-.*.cert$
-.*.password$
-.*.secret$
-.*.credentials$
+.*\.pem$
+.*\.key$
+.*\.cert$
+.*\.password$
+.*\.secret$
+.*\.credentials$
🤖 Prompt for AI Agents
In @.github/workflows/config/sensitive_files.txt around lines 25 - 30, The regex
patterns in the list (e.g., .* .pem$, .* .key$, .* .cert$, .* .password$, .*
.secret$, .* .credentials$) use an unescaped dot which matches any character and
thus overmatches; update each pattern to escape the literal dot before the
extension and (optionally) anchor the pattern — e.g., replace occurrences with
^.*\.pem$, ^.*\.key$, ^.*\.cert$, ^.*\.password$, ^.*\.secret$, and
^.*\.credentials$ so they only match filenames that actually end with the given
extension.

Copy link
Contributor

@palisadoes palisadoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix:

Image

"./*.md"
"package-lock.json"
"tsconfig.json"
"pnpm-lock.yaml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. These are the only entries that must be in sensitive_files.txt
  2. You are using values that are not relevant.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/pull-request.yml:
- Around line 136-138: Add an existence check before invoking the
sensitive_file_check.py script: verify that the script
".github-central/.github/workflows/scripts/sensitive_file_check.py" and the
config ".github/workflows/config/sensitive_files.txt" both exist (and are files)
when the ALL_CHANGED_FILES array is non-empty, and if either is missing print a
clear error message identifying which path is missing and exit with non-zero
status; update the conditional around the python3 invocation that currently
references ALL_CHANGED_FILES to perform these checks and only run python3 when
both files are present.

Comment on lines +136 to +138
if [ ${#ALL_CHANGED_FILES[@]} -gt 0 ]; then
python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}"
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

No guard against Python script or config file being absent.

If the centralized-repo checkout (line 85-90) fails silently or the path changes upstream, this step will fail with a cryptic "No such file or directory" error. Consider adding a brief existence check to surface a clearer message.

Proposed improvement
           # Check for sensitive files using the python script
           if [ ${`#ALL_CHANGED_FILES`[@]} -gt 0 ]; then
+             SCRIPT=".github-central/.github/workflows/scripts/sensitive_file_check.py"
+             CONFIG=".github/workflows/config/sensitive_files.txt"
+             if [ ! -f "$SCRIPT" ]; then
+               echo "::error::Centralized sensitive-file check script not found at $SCRIPT"
+               exit 1
+             fi
+             if [ ! -f "$CONFIG" ]; then
+               echo "::error::Sensitive files config not found at $CONFIG"
+               exit 1
+             fi
-             python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}"
+             python3 "$SCRIPT" --config "$CONFIG" --files "${ALL_CHANGED_FILES[@]}"
           fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [ ${#ALL_CHANGED_FILES[@]} -gt 0 ]; then
python3 .github-central/.github/workflows/scripts/sensitive_file_check.py --config .github/workflows/config/sensitive_files.txt --files "${ALL_CHANGED_FILES[@]}"
fi
if [ ${`#ALL_CHANGED_FILES`[@]} -gt 0 ]; then
SCRIPT=".github-central/.github/workflows/scripts/sensitive_file_check.py"
CONFIG=".github/workflows/config/sensitive_files.txt"
if [ ! -f "$SCRIPT" ]; then
echo "::error::Centralized sensitive-file check script not found at $SCRIPT"
exit 1
fi
if [ ! -f "$CONFIG" ]; then
echo "::error::Sensitive files config not found at $CONFIG"
exit 1
fi
python3 "$SCRIPT" --config "$CONFIG" --files "${ALL_CHANGED_FILES[@]}"
fi
🤖 Prompt for AI Agents
In @.github/workflows/pull-request.yml around lines 136 - 138, Add an existence
check before invoking the sensitive_file_check.py script: verify that the script
".github-central/.github/workflows/scripts/sensitive_file_check.py" and the
config ".github/workflows/config/sensitive_files.txt" both exist (and are files)
when the ALL_CHANGED_FILES array is non-empty, and if either is missing print a
clear error message identifying which path is missing and exit with non-zero
status; update the conditional around the python3 invocation that currently
references ALL_CHANGED_FILES to perform these checks and only run python3 when
both files are present.

@palisadoes palisadoes merged commit f9c1490 into PalisadoesFoundation:main Feb 9, 2026
8 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Developer Docs - Python: Script to detect sensitive files with configuration file

2 participants