
feat: regex pattern matching for payload inspection flags #356

Open
NotYuSheng wants to merge 2 commits into main from feature/issue-341-payload-regex-signatures

Conversation

@NotYuSheng

@NotYuSheng NotYuSheng commented Jun 12, 2026

Owner

Closes #341

Summary

  • Adds payload_regex as a new rule type in signatures.yml, alongside the existing payload_contains byte-string matching
  • Users write standard Java regular expressions matched against the ASCII/UTF-8 decoded payload of each packet
  • Supports case_insensitive: true per pattern entry
  • Shares the existing match_all semantics with payload_contains; both can coexist in the same rule (AND)
  • Payloads capped at 64 KB per packet to bound the cost of regex evaluation (the cap limits the impact of catastrophic backtracking; it does not rule it out)
  • Regex syntax errors are caught on save with an inline error identifying the rule name and pattern index
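
The matching semantics listed above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and method names are hypothetical, entries are modelled as the maps a YAML parser would produce, and it assumes that an entry counts as matched when its pattern is found in any packet payload of the flow.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of payload_regex evaluation; names are illustrative only. */
final class PayloadRegexSketch {

    /** True if the entry's pattern is found in at least one decoded payload (assumed behaviour). */
    static boolean entryMatches(Map<String, Object> entry, List<String> payloads) {
        // case_insensitive is optional; anything other than Boolean true means case-sensitive.
        int flags = Boolean.TRUE.equals(entry.get("case_insensitive")) ? Pattern.CASE_INSENSITIVE : 0;
        Pattern compiled = Pattern.compile((String) entry.get("pattern"), flags);
        // find() searches for the pattern anywhere in the payload, not a full-string match.
        return payloads.stream().anyMatch(p -> compiled.matcher(p).find());
    }

    /** match_all = true requires every entry to match (AND); otherwise one match suffices (OR). */
    static boolean ruleMatches(List<Map<String, Object>> entries, boolean matchAll, List<String> payloads) {
        if (entries.isEmpty()) {
            return false; // assumption: a rule with no entries never fires
        }
        return matchAll
                ? entries.stream().allMatch(e -> entryMatches(e, payloads))
                : entries.stream().anyMatch(e -> entryMatches(e, payloads));
    }

    public static void main(String[] args) {
        List<String> payloads = List.of("GET / HTTP/1.1\r\nauthorization: basic dXNlcjpwYXNz\r\n");
        Map<String, Object> auth = Map.of(
                "pattern", "Authorization: Basic [A-Za-z0-9+/=]+", "case_insensitive", true);
        Map<String, Object> sql = Map.of("pattern", "UNION\\s+SELECT");
        System.out.println("any: " + ruleMatches(List.of(auth, sql), false, payloads));
        System.out.println("all: " + ruleMatches(List.of(auth, sql), true, payloads));
    }
}
```

Compiling on every call keeps the sketch short; a real implementation would want to reuse compiled patterns.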

Changes

Backend

  • CustomSignatureService: payloadRegexMatch() + hexToAscii() helpers; applySignatures() wired to check payload_regex entries
  • SignaturesController: PUT /api/signatures now validates all payload_regex patterns before writing, returning a structured error on invalid syntax
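
A rough sketch of the save-time validation described above, under stated assumptions: the class and method names are hypothetical, the rules are assumed to arrive as the list of maps a YAML parser would produce, and the rule's display name is assumed to live under a `name` key. The error wording is illustrative and is not taken from the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch of payload_regex validation on save; not the PR's actual code. */
final class RegexValidationSketch {

    /** Returns an error message for the first invalid payload_regex entry, or empty if all are valid. */
    static Optional<String> validate(List<Map<String, Object>> rules) {
        for (Map<String, Object> rule : rules) {
            Object name = rule.get("name"); // assumed key for the rule name
            Object raw = rule.get("payload_regex");
            if (raw == null) {
                continue; // the field is optional
            }
            if (!(raw instanceof List<?>)) {
                return Optional.of("Rule '" + name + "': payload_regex must be a list");
            }
            List<?> entries = (List<?>) raw;
            for (int i = 0; i < entries.size(); i++) {
                Object entry = entries.get(i);
                if (!(entry instanceof Map<?, ?>)) {
                    return Optional.of("Rule '" + name + "': payload_regex[" + i + "] must be a mapping");
                }
                Object pattern = ((Map<?, ?>) entry).get("pattern");
                if (!(pattern instanceof String)) {
                    return Optional.of("Rule '" + name + "': payload_regex[" + i + "] needs a string 'pattern'");
                }
                try {
                    Pattern.compile((String) pattern); // syntax check only; the result is discarded
                } catch (PatternSyntaxException e) {
                    return Optional.of("Rule '" + name + "': payload_regex[" + i
                            + "] is not a valid regex: " + e.getDescription());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> rule = Map.of(
                "name", "demo-rule",
                "payload_regex", List.of(Map.of("pattern", "ok\\d+"), Map.of("pattern", "[unclosed")));
        System.out.println(validate(List.of(rule)).orElse("all patterns valid"));
    }
}
```

A controller could map a non-empty result to a 400 response so the editor can show the message inline.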

Config / samples

  • signatures.yml + signatures.sample.yml: document payload_regex field; add four annotated example rules (Basic-auth, PII detection, JWT Set-Cookie, SQL injection probe)
  • gen_demo.py: 9 new payload-inspection flows (5 payload_contains + 4 payload_regex); demo_all_rules.pcap regenerated — now covers all 21 rules

Docs

  • docs/features/custom-signatures.rst: split Payload Matching into payload_contains / payload_regex subsections; updated rule count and examples
  • docs/configuration/signature-rules.rst: added payload_regex field reference, updated execution semantics and validation section
  • SignaturesModal: updated help text to mention regex support

Test plan

  • Add a payload_regex rule with a valid pattern → re-analyse a PCAP containing matching payload → rule fires
  • Add a rule with an invalid regex (e.g. "[unclosed") → Save → editor shows inline error with rule name and index
  • case_insensitive: true → rule fires on both upper and lower case payloads
  • match_all: true with two regex entries → both must match for rule to fire
  • payload_contains and payload_regex in same rule → both must match
  • Upload demo_all_rules.pcap with signatures.sample.yml loaded → all 21 rules fire

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added regex-based payload matching to custom detection rules (works alongside existing byte-pattern matches).
  • Documentation

    • Expanded docs and in-app help with payload inspection guidance, payload_regex usage, case-insensitive option, match_all semantics, and save-time validation behavior.
  • Chores

    • Editor/save now validates regex compilation.
    • Updated demo rules, sample signatures, and demo traffic to cover payload-based examples.

Adds payload_regex as a new rule type in the custom signature system,
alongside the existing payload_contains byte-string matching.

- CustomSignatureService: payloadRegexMatch() applies Java regex against
  ASCII-decoded packet payloads; hexToAscii() caps at 64 KB per packet
  to limit the cost of catastrophic backtracking; case_insensitive flag supported
  per pattern entry; match_all semantics shared with payload_contains
- SignaturesController: validates all payload_regex patterns on PUT /api/
  signatures, returning an inline error with rule name and pattern index
- signatures.sample.yml / backend/config/signatures.yml: document new
  payload_regex field; add four annotated example rules (Basic-auth,
  PII, JWT cookie, SQL injection)
- gen_demo.py: add 9 payload-inspection demo flows (5 payload_contains
  + 4 payload_regex); regenerate demo_all_rules.pcap (now covers 21 rules)
- Sphinx docs: update custom-signatures.rst and signature-rules.rst with
  payload_regex field reference, execution semantics, and examples
- SignaturesModal: mention regex support in editor help text

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 525f99c3-2cdb-44a0-a79b-30c2fb80cef4

📥 Commits

Reviewing files that changed from the base of the PR and between a177948 and 2bb64ae.

📒 Files selected for processing (3)
  • backend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.java
  • backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java
  • sample-files/gen_demo.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.java
  • sample-files/gen_demo.py
  • backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java

📝 Walkthrough

Adds payload_regex to custom signature rules: controller validates regex syntax on save, service evaluates compiled regexes against decoded packet payloads (with size cap and match_all semantics), README/docs/frontend help updated, and demo traffic expanded to exercise payload rules.

Changes

Payload Regex Matching Feature

| Layer | File(s) | Summary |
| --- | --- | --- |
| Configuration schema and samples | backend/config/signatures.yml, signatures.sample.yml | Introduces payload_regex YAML entries (pattern, optional case_insensitive) and adds four sample regex-based signatures (Basic-auth, PII, JWT cookie, SQL injection). |
| Backend service regex matching implementation | backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java | applySignatures accepts payload_regex, requires at least one of match/payload_contains/payload_regex, applies match_all consistently, adds payloadRegexMatch to compile/cache patterns, lazily decodes hex payloads with hexToAscii, and enforces a per-packet byte cap. |
| Controller validation of regex patterns | backend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.java | saveSignatures parses submitted YAML and compiles each payload_regex pattern, returning 400 with a descriptive error including rule name and pattern index on failure. |
| Configuration documentation | docs/configuration/signature-rules.rst | Documents payload_regex schema, case_insensitive option, execution semantics (OR vs match_all AND across payload entries), per-packet 64 KB cap, and editor validation flow that compiles regexes on save with inline errors. |
| User docs and frontend help | docs/features/custom-signatures.rst, README.md, frontend/src/components/signatures/SignaturesModal.tsx | Expands user-facing docs and help text to cover payload_contains vs payload_regex, case_insensitive, combined match_all behavior, inline regex error reporting on save, and updates README/demo coverage to 21 rules. |
| Demo traffic and examples | docs/features/custom-signatures.rst, sample-files/gen_demo.py | Adds nine payload-focused synthetic flows (HTTP fragments, PDF magic bytes, cleartext credentials, token POST, DNS TXT, Basic auth header, PII strings, JWT Set-Cookie, SQL-injection-like query) and updates examples/documentation to show regex-based detection. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A bunny hops through payloads with newfound regex flair,
Matching patterns both literal and patterns in the air,
With case-insensitive grace and safety bounds in place,
Custom signatures now chase tokens, SQL, and JWTs apace,
Twenty-one rules hopping, detection in a merry race!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing regex pattern matching for payload inspection, which is the core feature addition across all modified files. |
| Linked Issues check | ✅ Passed | The PR successfully implements all coding requirements from issue #341: payload_regex support with Java regex, case-insensitive matching, match_all semantics, validation on save with inline error reporting, and per-packet payload size capping. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the payload_regex feature implementation. Documentation, configuration, demo data, controller validation, service logic, and UI help text are all scoped to this feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for payload inspection using regular expressions (payload_regex) in custom signature rules, complementing the existing exact byte-string matching (payload_contains). The changes span backend validation and matching logic, frontend modal updates, documentation, and synthetic PCAP generation for testing. The reviewer feedback focuses on performance optimizations and safety improvements in the backend: introducing a thread-safe pattern cache to avoid redundant regex compilations, lazily caching decoded ASCII payloads to prevent repeated hex-to-ASCII conversions, optimizing the hexToAscii conversion to eliminate substring allocations, and adding safe type checks (instanceof List) when parsing payload_regex from YAML to prevent potential ClassCastException crashes.
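
The thread-safe pattern cache mentioned in this review could look roughly like the sketch below. The class name, method name, and cache-key format are assumptions made for illustration; the PR's follow-up commit only states that a ConcurrentHashMap is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical sketch of a compiled-pattern cache; not the PR's actual code. */
final class PatternCacheSketch {

    // Unbounded for brevity; fine for a small rule file, but a long-lived service
    // with frequently edited rules might want eviction.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Returns a cached Pattern for the given text and flag, compiling it on first use. */
    static Pattern compiled(String patternStr, boolean caseInsensitive) {
        // Computed once via a ternary so the lambda below captures an effectively final value.
        final int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        // The key includes the flags so the same text with different flags is cached separately.
        String key = flags + ":" + patternStr;
        return CACHE.computeIfAbsent(key, k -> Pattern.compile(patternStr, flags));
    }

    public static void main(String[] args) {
        Pattern first = compiled("select\\s+\\*", true);
        Pattern second = compiled("select\\s+\\*", true);
        System.out.println("same instance reused: " + (first == second));
    }
}
```

Pattern instances are immutable and safe to share across threads, while each Matcher is created per use, which is what makes this kind of cache workable.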

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread backend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.java Outdated
Comment thread backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

347-350: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the demo rule count in the sample-files list too.

This section still says demo_all_rules.pcap triggers 12 custom signature demo rules, while the rest of the PR updates the sample/demo set to 21. Leaving both counts in the README will confuse anyone validating the demo content.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 347 - 350, Update the README entry for
`demo_all_rules.pcap` so the demo rule count matches the rest of the PR: change
the description text that currently reads "Triggers all 12 custom signature demo
rules" to "Triggers all 21 custom signature demo rules" (locate the
`demo_all_rules.pcap` line in the sample-files list and edit the count).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java`:
- Around line 85-87: The code assumes payload_regex is a
List<Map<String,Object>> and performs unsafe casts in SignaturesController and
CustomSignatureService; instead, add explicit type/schema checks before casting:
in the save/validation path in SignaturesController validate that
rule.get("payload_regex") is either null or an instance of List and that every
element is an instance of Map (with expected keys like "pattern"); if the shape
is invalid, return a validation error or reject the rule instead of saving; in
runtime matching inside CustomSignatureService (where payload_regex is read and
iterated), guard against non-List values and non-Map elements—treat malformed
entries as non-matching (skip them) rather than throwing, and log a debug/warn
about the ignored malformed payload_regex. Ensure you reference and update any
helper/validator used by both classes so the same schema logic is reused.
- Around line 397-405: The code is forcing Pattern.DOTALL by default in
payloadRegexMatch and incorrectly decoding hex to chars; change
payloadRegexMatch to start flags at 0 (not Pattern.DOTALL) so authors can opt
into DOTALL via (?s) or set flags via the entry flag handling (keep the existing
case_insensitive branch for entry.get("case_insensitive")), and update
hexToAscii to build a byte[] from hex pairs and decode it using UTF-8 (e.g., new
String(bytes, StandardCharsets.UTF_8)) so payloadRegexMatch receives proper
UTF‑8 text rather than raw char casts; update references to patternStr, compiled
Pattern creation, payloadRegexMatch, and hexToAscii accordingly.
- Around line 428-441: hexToAscii currently maps each hex byte to a Java char,
which breaks multi-byte UTF-8 sequences; change it to parse the hex into a
byte[] (respecting the even-length/MAX_REGEX_PAYLOAD_BYTES cap and
ignoring/truncating a trailing half-nibble), then construct and return new
String(byteArray, StandardCharsets.UTF_8) so UTF-8 multibyte characters are
decoded correctly; when parsing invalid hex pairs handle NumberFormatException
by inserting the UTF-8 replacement byte (or 0x3F '?') into the byte array so the
final String contains a replacement character.

In `@sample-files/gen_demo.py`:
- Line 460: The print call uses an unnecessary f-string with no placeholders;
replace the f-string print(f"Payload-inspection demo: 9 flows (5
payload_contains + 4 payload_regex) added") with a normal string
print("Payload-inspection demo: 9 flows (5 payload_contains + 4 payload_regex)
added") to remove the empty f-string (locate the statement in gen_demo.py around
the payload-inspection demo print).

---

Outside diff comments:
In `@README.md`:
- Around line 347-350: Update the README entry for `demo_all_rules.pcap` so the
demo rule count matches the rest of the PR: change the description text that
currently reads "Triggers all 12 custom signature demo rules" to "Triggers all
21 custom signature demo rules" (locate the `demo_all_rules.pcap` line in the
sample-files list and edit the count).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 6b1ecd37-1fcc-4dbd-9239-93b29293b4ab

📥 Commits

Reviewing files that changed from the base of the PR and between 8a468ee and a177948.

📒 Files selected for processing (10)
  • README.md
  • backend/config/signatures.yml
  • backend/src/main/java/com/tracepcap/analysis/controller/SignaturesController.java
  • backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java
  • docs/configuration/signature-rules.rst
  • docs/features/custom-signatures.rst
  • frontend/src/components/signatures/SignaturesModal.tsx
  • sample-files/demo_all_rules.pcap
  • sample-files/gen_demo.py
  • signatures.sample.yml

Comment thread backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java Outdated
Comment thread backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java Outdated
Comment thread backend/src/main/java/com/tracepcap/analysis/service/CustomSignatureService.java Outdated
Comment thread sample-files/gen_demo.py Outdated
- Use pattern cache (ConcurrentHashMap) to avoid recompiling regex per conversation
- Add lazy decoded[] array to avoid re-decoding each packet hex payload per pattern
- Use final int flags (ternary) so lambda capture compiles without effectively-final error
- Drop Pattern.DOTALL default; users can add (?s) inline if needed
- Replace hexToAscii: byte[] + Character.digit + StandardCharsets.UTF_8 (no substring/parseInt)
- Fix instanceof guard in SignaturesController.java to handle non-List payload_regex values
- Remove unnecessary f-prefix from print statement in gen_demo.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
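
For illustration, the hexToAscii replacement described in this commit (byte[] plus Character.digit plus StandardCharsets.UTF_8) might be sketched as below. This is not the committed code: the class and method names are hypothetical, the cap is assumed to be 64 * 1024 bytes, and mapping invalid hex pairs to '?' follows the review suggestion rather than confirmed behaviour.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of hex-to-text decoding with a per-packet cap; not the PR's actual code. */
final class HexDecodeSketch {

    // Assumed value for the "64 KB" cap named in the PR.
    static final int MAX_REGEX_PAYLOAD_BYTES = 64 * 1024;

    /** Decodes a hex string into UTF-8 text, ignoring a trailing half-byte and capping the length. */
    static String hexToText(String hex) {
        if (hex == null) {
            return "";
        }
        // Integer division drops a trailing odd nibble; min() enforces the byte cap.
        int byteCount = Math.min(hex.length() / 2, MAX_REGEX_PAYLOAD_BYTES);
        byte[] bytes = new byte[byteCount];
        for (int i = 0; i < byteCount; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            // Character.digit returns -1 for non-hex characters; substitute '?' for that byte.
            bytes[i] = (hi < 0 || lo < 0) ? (byte) '?' : (byte) ((hi << 4) | lo);
        }
        // Decoding the whole byte array keeps multi-byte UTF-8 sequences intact.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hexToText("48656c6c6f"));
    }
}
```

One caveat of any byte-level cap: it can cut a multi-byte UTF-8 sequence in half, in which case the decoder emits a replacement character at the end of the string.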
