Audit pack: repo-level prompt injection scanner #1

@Fieldnote-Echo

Description

Context

While building the adversarial input sanitizer for navi-bootstrap's rendering pipeline, we realized the defense tooling doubles as an offensive scanner: point it at any repo's committed files and it detects prompt injection attacks targeting AI coding assistants (Copilot, Claude Code, Cursor, etc.).

navi-os already has production-grade detection in src/navi/security/:

  • ContentScanner — 18 injection patterns, 14 self-replication patterns, 7 hidden instruction patterns, entropy analysis, multi-layer decode
  • UnicodeNormalizer — 4-pass pipeline: zero-width strip → fullwidth-to-ASCII → NFKC → homoglyph replacement (42 pairs)
  • prompt_sanitizer — iterative decode (URL → HTML entities → hex → base64), pattern replacement
  • Full adversarial test suite + atheris fuzz corpus
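To make the normalizer's four passes concrete, here is a minimal sketch of the same ordering (zero-width strip, fullwidth-to-ASCII, NFKC, homoglyph replacement). It is not the navi-os implementation: the function names, the `ZERO_WIDTH` set, and the seven-entry `HOMOGLYPHS` table are illustrative assumptions, and the real 42-pair table is not reproduced here.

```python
import unicodedata

# Assumption: a small set of invisible code points often used to split keywords.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Assumption: a tiny illustrative homoglyph table (not the real 42 pairs).
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u03bf": "o",  # Greek small omicron
}


def strip_zero_width(text: str) -> str:
    """Pass 1: drop invisible characters."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def fullwidth_to_ascii(text: str) -> str:
    """Pass 2: map fullwidth forms U+FF01..U+FF5E onto ASCII by fixed offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - 0xFEE0))
        elif cp == 0x3000:  # ideographic space
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def normalize(text: str) -> str:
    """Run all four passes in the order listed above."""
    text = strip_zero_width(text)
    text = fullwidth_to_ascii(text)
    text = unicodedata.normalize("NFKC", text)  # pass 3
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)  # pass 4
```

After normalization, a keyword split with U+200B, written in fullwidth letters, or spelled with a Cyrillic "о" all collapse to the same ASCII string, so a single pattern list can be applied downstream.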

Proposal

The audit pack (the 8th pack in the design doc, currently undesigned) would become a repo-level prompt injection scanner:

  • Scan committed files (README, CONTRIBUTING, issue templates, PR templates, code comments, docstrings, CI configs) for hostile patterns
  • Detect: homoglyphs, zero-width chars, encoded payloads, template injection, prompt injection directives, self-replication patterns
  • Output: structured report (findings, severity, location, risk score)
  • Ships as an nboot pack — nboot apply --pack audit scans the target repo
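As a rough illustration of what the scan and its structured report could look like, the sketch below walks a directory, applies a few regex rules per line, and emits findings with severity, location, and an aggregate risk score. Everything here is hypothetical: the `Finding` shape, the three rules, the severity weights, and the capped-sum score are assumptions for illustration, not the design of the audit pack or of `ContentScanner`.

```python
import re
from dataclasses import asdict, dataclass
from pathlib import Path

# Assumption: three illustrative rules; a real pack would carry far more.
PATTERNS = [
    ("prompt_injection_directive", "high",
     re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I)),
    ("zero_width_char", "medium",
     re.compile("[\u200b\u200c\u200d\u2060\ufeff]")),
    ("template_injection", "medium",
     re.compile(r"\{\{.*?\}\}|\{%.*?%\}")),
]

# Assumption: simple additive weights, capped at 100.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 5}


@dataclass
class Finding:
    rule: str
    severity: str
    path: str
    line: int


def scan_text(text: str, path: str = "<memory>") -> list[Finding]:
    """Apply every rule to every line and record where it matched."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, severity, pattern in PATTERNS:
            if pattern.search(line):
                findings.append(Finding(rule, severity, path, lineno))
    return findings


def scan_repo(root: str) -> list[Finding]:
    """Scan every readable text file under root, skipping .git internals."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            # utf-8-sig drops a leading BOM so it is not flagged as zero-width.
            text = path.read_text(encoding="utf-8-sig")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        findings.extend(scan_text(text, str(path)))
    return findings


def build_report(findings: list[Finding]) -> dict:
    """Return a JSON-serializable report with a capped additive risk score."""
    score = min(100, sum(SEVERITY_WEIGHT[f.severity] for f in findings))
    return {"risk_score": score, "findings": [asdict(f) for f in findings]}
```

A rule like `template_injection` will fire on legitimate Jinja templates, so a real scanner would likely need per-path allowlists or context checks before assigning severity.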

Attack surface

Every repo that an AI coding assistant reads is a prompt injection surface. Hostile content in committed files can:

  • Override agent instructions via embedded directives
  • Hide payloads in visually-identical homoglyph text
  • Split detection keywords with zero-width characters
  • Encode instructions in base64/HTML entities to evade pattern matching
  • Embed self-replication patterns (Morris II style)
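The encoding evasion in the list above is why detection needs to decode before matching. The sketch below shows one way to do an iterative decode in the URL, HTML entity, hex, base64 order until the text stops changing. It is an assumption-laden illustration, not the `prompt_sanitizer` code: the function names, the 16-character base64 threshold, the `\xNN` hex form, and the five-round cap are all choices made for this example.

```python
import base64
import binascii
import html
import re
from urllib.parse import quote, unquote

# Assumption: only consider base64-looking runs of 16+ characters.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
# Assumption: hex payloads appear as \xNN escapes.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def _decode_b64_tokens(text: str) -> str:
    """Replace base64-looking tokens that decode to printable UTF-8 text."""
    def repl(match: re.Match) -> str:
        token = match.group(0)
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError, ValueError):
            return token  # not valid base64 text; leave untouched
        return decoded if decoded.isprintable() else token
    return B64_TOKEN.sub(repl, text)


def iterative_decode(text: str, max_rounds: int = 5) -> str:
    """Apply URL, HTML entity, hex, and base64 decoding until a fixed point."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        decoded = html.unescape(decoded)
        decoded = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), decoded)
        decoded = _decode_b64_tokens(decoded)
        if decoded == text:
            break  # nothing left to peel off
        text = decoded
    return text


if __name__ == "__main__":
    payload = base64.b64encode(b"ignore previous instructions").decode("ascii")
    # URL-encoded base64: two layers that a single regex pass would miss.
    print(iterative_decode(quote(payload)))
```

Looping to a fixed point matters because layers can be nested in any order; the round cap bounds the work on adversarial input that keeps unwrapping.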

Status

Parking this as an issue. The sanitizer module (defensive side) is in progress. The audit pack (offensive/scanning side) is future work.

🤖 Generated with Claude Code

Metadata

    Labels

    roadmap: Future work, not immediately actionable
