Implement Patch Deduplication to Eliminate Redundant Candidate Patches #97

@dcloud347

Title: Implement Patch Deduplication to Eliminate Redundant Candidate Patches

Description:

We propose implementing a patch deduplication strategy in Prometheus that removes semantically redundant candidate patches, shrinking the ensemble reasoning space and improving overall efficiency.

🛠️ Proposed Approach

  1. Patch Parsing and Normalization
    Use the [unidiff](https://pypi.org/project/unidiff/) Python package to parse raw patch diffs into structured representations (e.g., added/removed lines, file paths). This enables consistent, reliable normalization.

  2. Semantically Irrelevant Element Removal
    Normalize each patch by removing variations that do not change behavior, such as:

    • Whitespace differences (extra spaces, tabs, line breaks)
    • Code comments
    • Reordering that does not affect behavior
  3. Equivalence Detection

    • Discard patches with syntax errors (i.e., those that fail to parse into valid code).
    • Detect and remove candidate patches that yield the same normalized form.
    • Treat such patches as semantically equivalent and redundant.
  4. Deduplication Evaluation
    Evaluate deduplication impact by measuring:

    • Percentage reduction in the patch space (target: ~25-30%, as seen in prior work)
    • Speedup in ensemble reasoning
    • Any accuracy/performance trade-offs
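
Steps 1 to 3 above could be sketched roughly as follows. This is a minimal illustration, not the Prometheus implementation: the `Change` record shape and the function names `parse_patch`, `normalize`, and `deduplicate` are hypothetical, and the normalization assumes Python sources (only full-line `#` comments are dropped). It also ignores leading indentation, which is significant in Python, so a real normalizer would need to be more careful. Syntax checking of the patched code is not covered here.

```python
from typing import Iterable, List, Tuple

# One changed line: (file path, "+" or "-", line text). Hypothetical record shape.
Change = Tuple[str, str, str]


def parse_patch(diff_text: str) -> List[Change]:
    """Step 1: parse a unified diff into change records with unidiff."""
    # Third-party dependency; imported here so the helpers below work without it.
    from unidiff import PatchSet

    changes: List[Change] = []
    for patched_file in PatchSet(diff_text):
        for hunk in patched_file:
            for line in hunk:
                if line.is_added or line.is_removed:
                    sign = "+" if line.is_added else "-"
                    changes.append((patched_file.path, sign, line.value))
    return changes


def normalize(changes: Iterable[Change]) -> Tuple[Change, ...]:
    """Step 2: drop whitespace noise, blank lines, and full-line comments."""
    kept: List[Change] = []
    for path, sign, text in changes:
        text = " ".join(text.split())  # collapse whitespace runs, trim both ends
        if not text or text.startswith("#"):
            continue
        kept.append((path, sign, text))
    # The order of files inside a diff does not affect the applied result;
    # sorted() is stable, so line order within each file is preserved.
    return tuple(sorted(kept, key=lambda change: change[0]))


def deduplicate(patches: List[List[Change]]) -> List[int]:
    """Step 3: return the index of the first patch for each normalized form."""
    seen = set()
    unique: List[int] = []
    for index, changes in enumerate(patches):
        key = normalize(changes)
        if key not in seen:
            seen.add(key)
            unique.append(index)
    return unique
```

With raw diffs in a list `raw_patches` (a hypothetical name), the surviving candidates would be `deduplicate([parse_patch(p) for p in raw_patches])`. Keeping the first occurrence is an arbitrary choice; any representative of an equivalence class would do.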

Checklist:

  • Integrate unidiff for patch parsing
  • Design normalization strategy
  • Implement equivalence detection
  • Add deduplication module to patch generation pipeline
  • Evaluate on SWE-bench Lite and Prometheus patch logs
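
For the evaluation item, the patch-space reduction could be computed per issue and in aggregate along these lines. The function names and the `(total, unique)` input shape are assumptions for illustration, and the numbers in the usage note are made up rather than measured results.

```python
from typing import Dict, Tuple


def reduction_percent(total: int, unique: int) -> float:
    """Share of candidate patches removed by deduplication, in percent."""
    if total == 0:
        return 0.0
    return 100.0 * (total - unique) / total


def overall_reduction(counts: Dict[str, Tuple[int, int]]) -> float:
    """Aggregate reduction over issues, given issue id -> (total, unique)."""
    total = sum(t for t, _ in counts.values())
    unique = sum(u for _, u in counts.values())
    return reduction_percent(total, unique)
```

For example, an issue with 40 candidates of which 29 remain after deduplication gives `reduction_percent(40, 29) == 27.5`. Aggregating over summed counts weights each issue by its number of candidates; averaging per-issue percentages would be a different, equally defensible metric.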
