PyTorch-native toolkit for predictive evaluation of AI systems.
Benchmark scores increasingly gate deployment decisions but rarely predict how a model will behave in production. torch_measure treats evaluation itself as a predictive modeling problem: latent-variable models infer a system's capability directly from sparse benchmark observations and predict its performance on unseen tasks. Built on PyTorch, with GPU-accelerated IRT, factor models, amortized inference, adaptive testing, and tabular baselines.
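To make the idea concrete, here is a minimal sketch of fitting a latent-variable model to sparse benchmark results and predicting unseen (model, task) pairs. It is written in plain PyTorch and does not use torch_measure's own API. The one-parameter (Rasch) IRT model, the helper names `fit_rasch` and `predict`, the synthetic data, and the hyperparameters are all assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F


def fit_rasch(responses, observed, steps=500, lr=0.05, l2=1e-2):
    """Fit a 1PL (Rasch) IRT model to a partially observed 0/1 matrix.

    responses: (n_models, n_items) float tensor of 0/1 outcomes.
    observed:  bool mask of the same shape; False entries are ignored.
    Hypothetical helper for illustration, not part of torch_measure.
    """
    n_models, n_items = responses.shape
    ability = torch.zeros(n_models, requires_grad=True)
    difficulty = torch.zeros(n_items, requires_grad=True)
    opt = torch.optim.Adam([ability, difficulty], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # P(correct) = sigmoid(ability_i - difficulty_j)
        logits = ability[:, None] - difficulty[None, :]
        nll = F.binary_cross_entropy_with_logits(
            logits[observed], responses[observed]
        )
        # A small L2 penalty pins down the shift ambiguity between
        # abilities and difficulties.
        penalty = l2 * (ability.pow(2).mean() + difficulty.pow(2).mean())
        (nll + penalty).backward()
        opt.step()
    return ability.detach(), difficulty.detach()


def predict(ability, difficulty):
    """Predicted success probability for every (model, item) pair."""
    return torch.sigmoid(ability[:, None] - difficulty[None, :])


# Synthetic data: 20 systems, 50 tasks, only about half the cells observed.
torch.manual_seed(0)
true_ability = torch.linspace(-2, 2, 20)
true_difficulty = torch.linspace(-2, 2, 50)
probs = torch.sigmoid(true_ability[:, None] - true_difficulty[None, :])
responses = torch.bernoulli(probs)
observed = torch.rand_like(probs) < 0.5

ability, difficulty = fit_rasch(responses, observed)
pred = predict(ability, difficulty)

# Score the predictions only on cells the model never saw during fitting.
held_out = ~observed
accuracy = ((pred[held_out] > 0.5) == responses[held_out].bool()).float().mean().item()
print(f"held-out accuracy: {accuracy:.3f}")
```

The fitted `ability` vector plays the role of the inferred capability, and `pred` fills in the unobserved cells of the benchmark matrix. Moving the tensors to a GPU would follow the usual PyTorch `device` conventions.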
With pip:
    pip install torch_measure

With uv (faster; drop-in replacement for pip):

    uv pip install torch_measure   # into the active environment
    uv add torch_measure           # into a uv-managed project

We welcome contributions! Please see our contributing guidelines for details, or drop by our Discord to chat.