Possible bug: evaluate_erdos_solution returns self-claimed c5_bound instead of verified value #19

Description

@siyuan0000

While reproducing the Erdős min-overlap result using Tinker, I noticed something odd: at step 34 my run logged a score of 0.38092, which appeared better than the paper's claimed 0.380932. That felt suspicious, so I dug deeper.

It turns out the score logged to W&B is not the verified C₅ but the model's self-claimed c5_bound. In evaluate_erdos_solution, verify_c5_solution is called and does compute the true value, but its return value is discarded and the function returns the claimed c5_bound instead:

def evaluate_erdos_solution(h_values, c5_bound, n_points) -> float:
    verify_c5_solution(h_values, c5_bound, n_points)  # return value dropped
    return float(c5_bound)                             # model's self-reported value

The validation only checks np.isclose(..., atol=1e-4), so a model can under-report by up to roughly 1e-4 and still pass. The actual verified C₅ from my run is 0.380972, while the claimed score is 0.380932. That is a gap of 4.07e-5, which is within tolerance and would silently pass.
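To make the tolerance argument concrete, here is a minimal sketch of the acceptance check in plain Python. It mirrors NumPy's documented isclose criterion, |a - b| <= atol + rtol * |b|. The helper name passes_isclose is hypothetical, and rtol=1e-5 is an assumption (NumPy's default), since the snippet above only shows atol.

```python
def passes_isclose(claimed: float, computed: float,
                   atol: float = 1e-4, rtol: float = 1e-5) -> bool:
    """Hypothetical helper mirroring NumPy's documented isclose criterion.

    rtol=1e-5 is assumed (NumPy's default); the issue only shows atol=1e-4.
    """
    return abs(claimed - computed) <= atol + rtol * abs(computed)


verified = 0.380972  # verified C5 from the run described above
claimed = 0.380932   # self-reported c5_bound

gap = verified - claimed
accepted = passes_isclose(claimed, verified)
print(f"gap = {gap:.2e}, accepted = {accepted}")

# An under-report of 1.0e-4 still passes; 1.1e-4 is rejected.
still_passes = passes_isclose(verified - 1.0e-4, verified)
rejected = passes_isclose(verified - 1.1e-4, verified)
print(still_passes, rejected)
```

Under these assumptions the check only bounds how far the claim may drift from the computed value. It does not constrain which of the two numbers gets reported.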

This means the paper's claimed score of 0.380932 is unverifiable as-is: if the true C₅ were 0.380972 (as in my reproduction), the system would accept 0.380932 as a valid claim without raising any error.

I suspect this is a bug, and evaluate_erdos_solution should return the verified, computed value rather than the claimed one:

def evaluate_erdos_solution(h_values, c5_bound, n_points) -> float:
    computed_c5 = verify_c5_solution(h_values, c5_bound, n_points)
    return float(computed_c5)
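A self-contained sketch of the intended contract, useful as a regression test. The verify_c5_solution below is a hypothetical stand-in, not the repository's implementation: its "computation" is a placeholder mean, and the atol parameter is assumed. Only the behavior matters here: raise on a mismatch, otherwise return the computed value.

```python
def verify_c5_solution(h_values, c5_bound, n_points, atol=1e-4):
    """Hypothetical stand-in for the repository's verifier."""
    if len(h_values) != n_points:
        raise ValueError("expected n_points values")
    computed = sum(h_values) / n_points  # placeholder, NOT the real C5 formula
    if abs(computed - c5_bound) > atol:
        raise ValueError(
            f"claimed {c5_bound} does not match computed {computed}"
        )
    return computed


def evaluate_erdos_solution(h_values, c5_bound, n_points) -> float:
    # Report the verified value; the claim only gates acceptance.
    return float(verify_c5_solution(h_values, c5_bound, n_points))


h = [0.380972] * 4
score = evaluate_erdos_solution(h, c5_bound=0.380932, n_points=4)
print(score)  # the computed value, not the claimed 0.380932

# A claim outside the tolerance is still rejected outright.
try:
    evaluate_erdos_solution(h, c5_bound=0.37, n_points=4)
    far_claim_rejected = False
except ValueError:
    far_claim_rejected = True
```

With this shape, an under-reported claim within tolerance can no longer lower the logged score, because the claim never reaches the return value.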

Happy to submit a PR. It would also be helpful to know whether the published scores were logged from c5_bound or from the verified value.
