Possible bug: evaluate_erdos_solution returns self-claimed c5_bound instead of verified value #19

While reproducing the Erdős min-overlap result using Tinker, I noticed something odd: at step 34 my run logged a score of **0.38092**, which appeared *better* than the paper's claimed **0.380932** — that felt suspicious, so I dug deeper.

It turns out the logged score on W&B is not the verified C₅, but rather the **model's self-claimed `c5_bound`**. In `evaluate_erdos_solution`, `verify_c5_solution` is called and *does* compute the true value, but its return value is discarded — the function returns the claimed `c5_bound` instead:

```python
def evaluate_erdos_solution(h_values, c5_bound, n_points) -> float:
    verify_c5_solution(h_values, c5_bound, n_points)  # return value dropped
    return float(c5_bound)                             # model's self-reported value
```

The validation only checks `np.isclose(..., atol=1e-4)`, so a model can under-report by almost 1e-4 and still pass validation. The actual verified C₅ from my run is **0.380972**, while the claimed score is **0.380932**: a gap of **4.07e-5**, which is within tolerance and passes silently.
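To make the tolerance argument concrete, here is a minimal sketch using the rounded values quoted above (so the gap comes out as roughly 4.0e-5 rather than 4.07e-5). It assumes the check compares the computed value against the claimed one with `atol=1e-4` and NumPy's default `rtol`; the argument order and the variable names are my assumptions, not copied from the repository.

```python
import numpy as np

# Rounded values from this issue; the names are illustrative only.
verified_c5 = 0.380972   # value actually computed from h_values
claimed_c5 = 0.380932    # model's self-reported c5_bound
ATOL = 1e-4              # tolerance used in the validation check

gap = verified_c5 - claimed_c5
# np.isclose(a, b) tests |a - b| <= atol + rtol * |b| (rtol defaults to 1e-5).
passes = bool(np.isclose(verified_c5, claimed_c5, atol=ATOL))
print(f"gap = {gap:.2e}, passes = {passes}")

# For contrast: a claim that under-reports by 2e-4 falls outside the tolerance.
too_low_claim = verified_c5 - 2e-4
rejected = not bool(np.isclose(verified_c5, too_low_claim, atol=ATOL))
print(f"claim {too_low_claim:.6f} rejected = {rejected}")
```

So the check does catch grossly wrong claims, but anything within about 1e-4 of the true value is accepted and then logged as the score.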

This means the paper's claimed score of 0.380932 is unverifiable as-is: if the true C₅ were 0.380972 (as in my reproduction), the system would accept 0.380932 as a valid claim without raising any error.

I suspect this is a bug — `evaluate_erdos_solution` should return the verified computed value rather than the claimed one:

```python
def evaluate_erdos_solution(h_values, c5_bound, n_points) -> float:
    computed_c5 = verify_c5_solution(h_values, c5_bound, n_points)
    return float(computed_c5)
```
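A regression test along these lines could guard against the behaviour coming back. This is only a sketch: `verify_c5_solution` below is a hypothetical stub that returns a hard-coded "verified" value (the one from my run) and ignores `h_values`, so it does not reflect the repository's real verifier. `evaluate_buggy` and `evaluate_fixed` are names I made up to put the two variants side by side.

```python
import numpy as np

VERIFIED_C5 = 0.380972  # verified value from the run described in this issue


def verify_c5_solution(h_values, c5_bound, n_points) -> float:
    """Hypothetical stub, NOT the repository's verifier.

    It pretends the true C5 of any input is VERIFIED_C5 and applies the
    same tolerance check described in this issue.
    """
    computed_c5 = VERIFIED_C5
    if not np.isclose(computed_c5, c5_bound, atol=1e-4):
        raise ValueError(f"claimed {c5_bound} does not match computed {computed_c5}")
    return computed_c5


def evaluate_buggy(h_values, c5_bound, n_points) -> float:
    verify_c5_solution(h_values, c5_bound, n_points)  # result discarded
    return float(c5_bound)                            # returns the claim


def evaluate_fixed(h_values, c5_bound, n_points) -> float:
    computed_c5 = verify_c5_solution(h_values, c5_bound, n_points)
    return float(computed_c5)                         # returns the verified value


h_values = np.zeros(4)   # dummy input; the stub ignores it
claimed = 0.380932       # under-reported claim that is within tolerance

buggy_score = evaluate_buggy(h_values, claimed, n_points=4)
fixed_score = evaluate_fixed(h_values, claimed, n_points=4)
print(buggy_score, fixed_score)
```

With the stub in place, the buggy variant echoes the claim back while the fixed variant reports the verified value, and that difference is what a real test should assert on.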

Happy to submit a PR. It would also be helpful to know whether the published scores were logged from `c5_bound` or from the verified value.
