Title: Implement Patch Deduplication to Eliminate Redundant Candidate Patches
Description:
We propose implementing a patch deduplication strategy in Prometheus that eliminates semantically redundant candidate patches, reducing the ensemble reasoning space and improving overall efficiency.
🛠️ Proposed Approach
- Patch Parsing and Normalization
Use the [unidiff](https://pypi.org/project/unidiff/) Python package to parse raw patch diffs into structured representations (e.g., added/removed lines, file paths). This enables consistent, reliable normalization.
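As a minimal sketch of the parsing step, the snippet below uses `unidiff.PatchSet` to turn a raw diff into added and removed lines per file. The helper name `parse_patch`, the returned dictionary layout, and the sample diff are illustrative assumptions, not part of Prometheus.

```python
try:
    from unidiff import PatchSet  # third-party package: pip install unidiff
except ImportError:  # keeps the module importable where unidiff is not installed
    PatchSet = None

# Hypothetical sample patch, built line by line so the leading
# space of the context line is explicit.
SAMPLE_DIFF = "\n".join([
    "--- a/foo.py",
    "+++ b/foo.py",
    "@@ -1,2 +1,2 @@",
    " def f():",
    "-    return 1",
    "+    return 2",
    "",
])


def parse_patch(diff_text):
    """Map each touched file path to its added and removed lines."""
    if PatchSet is None:
        raise RuntimeError("unidiff is required: pip install unidiff")
    parsed = {}
    for patched_file in PatchSet(diff_text):
        added, removed = [], []
        for hunk in patched_file:
            for line in hunk:
                # line.value holds the line content without the +/- marker
                if line.is_added:
                    added.append(line.value.rstrip("\n"))
                elif line.is_removed:
                    removed.append(line.value.rstrip("\n"))
        parsed[patched_file.path] = {"added": added, "removed": removed}
    return parsed
```

Calling `parse_patch(SAMPLE_DIFF)` yields one entry for `foo.py`; context lines are dropped here on the assumption that only changed lines matter for comparison.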
- Semantically Irrelevant Element Removal
Normalize each patch by removing variations that do not affect semantics, such as:
- Whitespace differences (extra spaces, tabs, line breaks)
- Code comments
- Reordering that does not affect behavior (e.g., the order of independent import statements)
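The whitespace and comment rules above could be sketched as follows. This is a heuristic that assumes the patched code is Python with `#` comments. The names `strip_comment`, `normalize_line`, and `normalize_lines` are hypothetical. Reordering is not handled, and indentation depth is kept because it can change behavior in Python.

```python
def strip_comment(line):
    """Remove a trailing '#' comment, ignoring '#' inside single-line string literals."""
    quote = None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":      # skip the escaped character inside a string
                i += 2
                continue
            if ch == quote:     # closing quote
                quote = None
        elif ch in "'\"":       # opening quote
            quote = ch
        elif ch == "#":         # comment starts outside any string
            return line[:i]
        i += 1
    return line


def normalize_line(line):
    """Drop comments, expand tabs, trim trailing space, collapse inner whitespace runs."""
    code = strip_comment(line).expandtabs().rstrip()
    body = code.lstrip()
    if not body:
        return ""
    indent = len(code) - len(body)  # preserve indentation depth
    return " " * indent + " ".join(body.split())


def normalize_lines(lines):
    """Normalize every line and drop those that end up empty."""
    return [n for n in map(normalize_line, lines) if n]
```

Two caveats of this sketch: collapsing whitespace also touches the contents of string literals, and multi-line (triple-quoted) strings are not tracked, so a production version would likely need a tokenizer-based approach.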
- Equivalence Detection
- Discard patches with syntax errors (i.e., those that fail to parse into valid code).
- Detect candidate patches that yield the same normalized form, treat them as semantically equivalent, and keep only one of them.
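One way to sketch the detection step is to fingerprint each normalized form and keep the first candidate per fingerprint. The `Candidate` record and the functions below are hypothetical. The sketch assumes the target language is Python (so `ast.parse` can act as the syntax check) and that the fully patched source of each candidate is available.

```python
import ast
import hashlib
from dataclasses import dataclass


@dataclass
class Candidate:
    patch_id: str
    patched_source: str   # file content after applying the patch (assumed available)
    normalized: tuple     # normalized diff lines produced by an earlier step


def is_valid_python(source):
    """Return True if the source parses as Python."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def fingerprint(normalized):
    """Hash the normalized form so equal forms map to the same key."""
    joined = "\n".join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(candidates):
    """Drop unparsable candidates, then keep the first candidate per fingerprint."""
    seen = set()
    unique = []
    for cand in candidates:
        if not is_valid_python(cand.patched_source):
            continue
        key = fingerprint(cand.normalized)
        if key in seen:
            continue
        seen.add(key)
        unique.append(cand)
    return unique
```

Keeping the first occurrence is an arbitrary choice in this sketch; any stable tie-breaking rule would serve equally well.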
- Deduplication Evaluation
Evaluate deduplication impact by measuring:
- Percentage reduction in the patch space (targeting ~25-30%, as seen in prior work)
- Speedup in ensemble reasoning
- Any accuracy/performance trade-offs
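The first two metrics reduce to simple ratios, sketched below. The function names are hypothetical, and the numbers in the usage note are made up for illustration rather than measured.

```python
def reduction_percent(patches_before, patches_after):
    """Percentage of candidate patches removed by deduplication."""
    if patches_before <= 0:
        raise ValueError("patches_before must be positive")
    return 100.0 * (patches_before - patches_after) / patches_before


def speedup(baseline_seconds, dedup_seconds):
    """Ratio of ensemble reasoning time without and with deduplication."""
    if dedup_seconds <= 0:
        raise ValueError("dedup_seconds must be positive")
    return baseline_seconds / dedup_seconds
```

For instance, going from 40 candidates to 29 gives `reduction_percent(40, 29) == 27.5`, which would fall inside the targeted range.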
Checklist:
- unidiff for patch parsing