feat: regex pattern matching for payload inspection flags#356
feat: regex pattern matching for payload inspection flags#356NotYuSheng wants to merge 2 commits into
Conversation
Adds payload_regex as a new rule type in the custom signature system, alongside the existing payload_contains byte-string matching. - CustomSignatureService: payloadRegexMatch() applies Java regex against ASCII-decoded packet payloads; hexToAscii() caps at 64 KB per packet to prevent catastrophic backtracking; case_insensitive flag supported per pattern entry; match_all semantics shared with payload_contains - SignaturesController: validates all payload_regex patterns on PUT /api/ signatures, returning an inline error with rule name and pattern index - signatures.sample.yml / backend/config/signatures.yml: document new payload_regex field; add four annotated example rules (Basic-auth, PII, JWT cookie, SQL injection) - gen_demo.py: add 9 payload-inspection demo flows (5 payload_contains + 4 payload_regex); regenerate demo_all_rules.pcap (now covers 21 rules) - Sphinx docs: update custom-signatures.rst and signature-rules.rst with payload_regex field reference, execution semantics, and examples - SignaturesModal: mention regex support in editor help text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds ChangesPayload Regex Matching Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces support for payload inspection using regular expressions (payload_regex) in custom signature rules, complementing the existing exact byte-string matching (payload_contains). The changes span backend validation and matching logic, frontend modal updates, documentation, and synthetic PCAP generation for testing. The reviewer feedback focuses on performance optimizations and safety improvements in the backend: introducing a thread-safe pattern cache to avoid redundant regex compilations, lazily caching decoded ASCII payloads to prevent repeated hex-to-ASCII conversions, optimizing the hexToAscii conversion to eliminate substring allocations, and adding safe type checks (instanceof List) when parsing payload_regex from YAML to prevent potential ClassCastException crashes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
347-350:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the demo rule count in the sample-files list too.
This section still says
demo_all_rules.pcaptriggers 12 custom signature demo rules, while the rest of the PR updates the sample/demo set to 21. Leaving both counts in the README will confuse anyone validating the demo content.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 347 - 350, Update the README entry for `demo_all_rules.pcap` so the demo rule count matches the rest of the PR: change the description text that currently reads "Triggers all 12 custom signature demo rules" to "Triggers all 21 custom signature demo rules" (locate the `demo_all_rules.pcap` line in the sample-files list and edit the count).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java`:
- Around line 85-87: The code assumes payload_regex is a
List<Map<String,Object>> and performs unsafe casts in SignaturesController and
CustomSignatureService; instead, add explicit type/schema checks before casting:
in the save/validation path in SignaturesController validate that
rule.get("payload_regex") is either null or an instance of List and that every
element is an instance of Map (with expected keys like "pattern"); if the shape
is invalid, return a validation error or reject the rule instead of saving; in
runtime matching inside CustomSignatureService (where payload_regex is read and
iterated), guard against non-List values and non-Map elements—treat malformed
entries as non-matching (skip them) rather than throwing, and log a debug/warn
about the ignored malformed payload_regex. Ensure you reference and update any
helper/validator used by both classes so the same schema logic is reused.
- Around line 397-405: The code is forcing Pattern.DOTALL by default in
payloadRegexMatch and incorrectly decoding hex to chars; change
payloadRegexMatch to start flags at 0 (not Pattern.DOTALL) so authors can opt
into DOTALL via (?s) or set flags via the entry flag handling (keep the existing
case_insensitive branch for entry.get("case_insensitive")), and update
hexToAscii to build a byte[] from hex pairs and decode it using UTF-8 (e.g., new
String(bytes, StandardCharsets.UTF_8)) so payloadRegexMatch receives proper
UTF‑8 text rather than raw char casts; update references to patternStr, compiled
Pattern creation, payloadRegexMatch, and hexToAscii accordingly.
- Around line 428-441: hexToAscii currently maps each hex byte to a Java char,
which breaks multi-byte UTF-8 sequences; change it to parse the hex into a
byte[] (respecting the even-length/MAX_REGEX_PAYLOAD_BYTES cap and
ignoring/truncating a trailing half-nibble), then construct and return new
String(byteArray, StandardCharsets.UTF_8) so UTF-8 multibyte characters are
decoded correctly; when parsing invalid hex pairs handle NumberFormatException
by inserting the UTF-8 replacement byte (or 0x3F '?') into the byte array so the
final String contains a replacement character.
In `@sample-files/gen_demo.py`:
- Line 460: The print call uses an unnecessary f-string with no placeholders;
replace the f-string print(f"Payload-inspection demo: 9 flows (5
payload_contains + 4 payload_regex) added") with a normal string
print("Payload-inspection demo: 9 flows (5 payload_contains + 4 payload_regex)
added") to remove the empty f-string (locate the statement in gen_demo.py around
the payload-inspection demo print).
---
Outside diff comments:
In `@README.md`:
- Around line 347-350: Update the README entry for `demo_all_rules.pcap` so the
demo rule count matches the rest of the PR: change the description text that
currently reads "Triggers all 12 custom signature demo rules" to "Triggers all
21 custom signature demo rules" (locate the `demo_all_rules.pcap` line in the
sample-files list and edit the count).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6b1ecd37-1fcc-4dbd-9239-93b29293b4ab
📒 Files selected for processing (10)
README.mdbackend/config/signatures.ymlbackend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.javabackend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.javadocs/configuration/signature-rules.rstdocs/features/custom-signatures.rstfrontend/src/components/signatures/SignaturesModal.tsxsample-files/demo_all_rules.pcapsample-files/gen_demo.pysignatures.sample.yml
- Use pattern cache (ConcurrentHashMap) to avoid recompiling regex per conversation - Add lazy decoded[] array to avoid re-decoding each packet hex payload per pattern - Use final int flags (ternary) so lambda capture compiles without effectively-final error - Drop Pattern.DOTALL default; users can add (?s) inline if needed - Replace hexToAscii: byte[] + Character.digit + StandardCharsets.UTF_8 (no substring/parseInt) - Fix instanceof guard in SignaturesController.java to handle non-List payload_regex values - Remove unnecessary f-prefix from print statement in gen_demo.py Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #341
Summary
payload_regexas a new rule type insignatures.yml, alongside the existingpayload_containsbyte-string matchingcase_insensitive: trueper pattern entrymatch_allsemantics withpayload_contains; both can coexist in the same rule (AND)Changes
Backend
CustomSignatureService:payloadRegexMatch()+hexToAscii()helpers;applySignatures()wired to checkpayload_regexentriesSignaturesController:PUT /api/signaturesnow validates allpayload_regexpatterns before writing, returning a structured error on invalid syntaxConfig / samples
signatures.yml+signatures.sample.yml: documentpayload_regexfield; add four annotated example rules (Basic-auth, PII detection, JWT Set-Cookie, SQL injection probe)gen_demo.py: 9 new payload-inspection flows (5payload_contains+ 4payload_regex);demo_all_rules.pcapregenerated — now covers all 21 rulesDocs
docs/features/custom-signatures.rst: split Payload Matching intopayload_contains/payload_regexsubsections; updated rule count and examplesdocs/configuration/signature-rules.rst: addedpayload_regexfield reference, updated execution semantics and validation sectionSignaturesModal: updated help text to mention regex supportTest plan
payload_regexrule with a valid pattern → re-analyse a PCAP containing matching payload → rule fires"[unclosed") → Save → editor shows inline error with rule name and indexcase_insensitive: true→ rule fires on both upper and lower case payloadsmatch_all: truewith two regex entries → both must match for rule to firepayload_containsandpayload_regexin same rule → both must matchdemo_all_rules.pcapwithsignatures.sample.ymlloaded → all 21 rules fire🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
payload_regexusage, case-insensitive option, match_all semantics, and save-time validation behavior.Chores