Implement Patch Deduplication to Eliminate Redundant Candidate Patches

**Title:** Implement Patch Deduplication to Eliminate Redundant Candidate Patches

**Description:**

We propose to implement a **patch deduplication** strategy in Prometheus to eliminate semantically redundant candidate patches, thereby reducing the ensemble reasoning space and improving overall efficiency.

### 🛠️ Proposed Approach

1. **Patch Parsing and Normalization**
   Use the [`unidiff`](https://pypi.org/project/unidiff/) Python package to parse raw patch diffs into structured representations (e.g., added/removed lines, file paths). This enables consistent, reliable normalization.

2. **Semantically Irrelevant Element Removal**
   Perform patch normalization by eliminating semantically irrelevant variations such as:

   * Whitespace differences (extra spaces, tabs, line breaks)
   * Code comments
   * Reordering that does not affect behavior (e.g., the order in which files appear in the diff)

3. **Equivalence Detection**

   * Discard patches with syntax errors (i.e., patches whose resulting code fails to parse).
   * Detect candidate patches that yield the *same normalized form*.
   * Treat such patches as semantically equivalent, keep one representative, and remove the rest as redundant.

4. **Deduplication Evaluation**
   Evaluate deduplication impact by measuring:

   * Percentage reduction in the candidate patch space (targeting \~25-30%, as observed in prior work)
   * Speedup in ensemble reasoning
   * Any accuracy/performance trade-offs
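
For step 1, the structure we want out of the parser can be sketched without extra dependencies. The hypothetical `parse_unified_diff` below is a stand-in for `unidiff.PatchSet`, not a replacement for it: it maps each target file path to its removed and added lines, and it uses the counts in each `@@` hunk header so that content lines beginning with `---` or `+++` are not mistaken for file headers. It assumes well-formed unified diffs and does not special-case renames, binary files, or `/dev/null` targets, all of which `unidiff` handles.

```python
import re
from typing import Dict, List, Optional, Tuple

# "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")

def parse_unified_diff(diff: str) -> Dict[str, Tuple[List[str], List[str]]]:
    """Hypothetical helper: map each target path to (removed_lines, added_lines)."""
    files: Dict[str, Tuple[List[str], List[str]]] = {}
    path: Optional[str] = None
    src_left = tgt_left = 0  # lines still expected in the current hunk
    for line in diff.splitlines():
        if src_left > 0 or tgt_left > 0:
            # Inside a hunk, the first character is the line type.
            tag, text = line[:1], line[1:]
            if tag == "-":
                files[path][0].append(text)
                src_left -= 1
            elif tag == "+":
                files[path][1].append(text)
                tgt_left -= 1
            elif tag == " " or line == "":
                # Context line (some tools strip the leading space of blank ones).
                src_left -= 1
                tgt_left -= 1
            # "\ No newline at end of file" markers fall through and are ignored.
            continue
        if line.startswith("+++ "):
            target = line[4:].split("\t")[0]  # drop an optional timestamp
            path = target[2:] if target.startswith("b/") else target
            files.setdefault(path, ([], []))
        else:
            match = HUNK_HEADER.match(line)
            if match and path is not None:
                src_left = int(match.group(1) or 1)
                tgt_left = int(match.group(2) or 1)
        # "diff --git", "index" and "--- a/..." header lines need no handling.
    return files
```

In the real module, walking `PatchSet` → `PatchedFile` → `Hunk` → `Line` and checking `is_added` / `is_removed` on each line should yield the same information, plus line numbers.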
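
For step 2, one possible line-level normalizer is sketched below. It is a heuristic resting on several assumptions: comments are Python-style `#` comments, whitespace runs inside a line are collapsed (imperfectly, this also touches string literals), leading indentation is kept because it is significant in Python, and the only reordering it ignores is the order of files within a patch. The input shape `{path: (removed_lines, added_lines)}` and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

PatchFiles = Dict[str, Tuple[List[str], List[str]]]
NormalForm = Tuple[Tuple[str, Tuple[str, ...], Tuple[str, ...]], ...]

def strip_hash_comment(line: str) -> str:
    """Drop a trailing '#' comment, ignoring '#' inside simple string literals."""
    quote = None     # quote character of the currently open string, if any
    escaped = False  # True right after a backslash
    for i, ch in enumerate(line):
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif quote is not None:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "#":
            return line[:i]
    return line

def normalize_line(line: str) -> str:
    """Remove the comment, keep indentation width, collapse other whitespace."""
    code = strip_hash_comment(line).rstrip().expandtabs(4)
    body = code.lstrip()
    indent = len(code) - len(body)
    return " " * indent + " ".join(body.split())

def normalize_patch(files: PatchFiles) -> NormalForm:
    """Canonical form: files sorted by path, blank and comment-only lines dropped."""
    form = []
    for path in sorted(files):  # file order inside the diff carries no meaning
        removed, added = files[path]
        form.append((
            path,
            tuple(n for n in map(normalize_line, removed) if n),
            tuple(n for n in map(normalize_line, added) if n),
        ))
    return tuple(form)
```

Because the normalized form is a nested tuple, it is hashable and can serve directly as a dictionary key when grouping candidates.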
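
For step 3, if the full post-patch contents of each touched file are available (an assumption; the proposal itself only describes working from the diff), Python's standard `ast` module can cover both checks at once: `ast.parse` rejects candidates with syntax errors, and `ast.dump` of the parsed tree gives a normalized form that already ignores comments and formatting. This is a swapped-in AST-based comparison rather than a line-level normalization, and `normalized_form` / `deduplicate` are hypothetical names.

```python
import ast
from typing import Dict, List, Optional, Tuple

def normalized_form(files: Dict[str, str]) -> Optional[Tuple[Tuple[str, str], ...]]:
    """AST-based form of a candidate's patched files, or None on a syntax error."""
    form = []
    for path in sorted(files):  # make the form independent of file order
        try:
            tree = ast.parse(files[path])
        except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
            return None
        # ast.dump omits line/column attributes by default, and comments are
        # not part of the AST, so formatting-only differences disappear.
        form.append((path, ast.dump(tree)))
    return tuple(form)

def deduplicate(candidates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop unparsable candidates and keep the first member of each equivalence group."""
    seen = set()
    kept = []
    for files in candidates:
        form = normalized_form(files)
        if form is None or form in seen:
            continue  # syntax error, or equivalent to an earlier candidate
        seen.add(form)
        kept.append(files)
    return kept
```

Keeping the first candidate of each group preserves the generator's original ordering; if candidates carry scores, keeping the highest-scoring member may be the better choice.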
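
For step 4, the two headline numbers reduce to simple ratios. The sketch below pools candidate counts over all benchmark instances; the function names and the pooling choice are assumptions, not part of the proposal.

```python
from typing import Sequence

def patch_space_reduction(before: Sequence[int], after: Sequence[int]) -> float:
    """Percent of candidate patches removed by deduplication, pooled over instances."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same instances")
    if any(a > b for b, a in zip(before, after)):
        raise ValueError("deduplication cannot add patches")
    total_before = sum(before)
    if total_before == 0:
        return 0.0
    return 100.0 * (total_before - sum(after)) / total_before

def speedup(baseline_seconds: float, dedup_seconds: float) -> float:
    """Ensemble-reasoning speedup; a value above 1.0 means the deduplicated run is faster."""
    if dedup_seconds <= 0:
        raise ValueError("dedup_seconds must be positive")
    return baseline_seconds / dedup_seconds
```

Pooling weights instances with many candidates more heavily; averaging per-instance reductions instead would give every instance equal weight.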

---

**Checklist:**

* [ ] Integrate `unidiff` for patch parsing
* [ ] Design normalization strategy
* [ ] Implement equivalence detection
* [ ] Add deduplication module to patch generation pipeline
* [ ] Evaluate on SWE-bench Lite and Prometheus patch logs


Implement Patch Deduplication to Eliminate Redundant Candidate Patches #97