Pinned
ahoum-conversation-eval (Public)
Production-ready benchmark scoring conversations on 300+ facets (scalable to 5000+) using open-weights LLMs ≤16B. Includes per-score confidence, 4 backends, FastAPI + Streamlit UI, and Docker.
Python