Tutorials: Learn more from our detailed guides
ReActXen IoT Agent (EMNLP 2025) | AssetOpsBench Technical Material
Paper | HF-Dataset | Blog | Contributors
- Announcements
- Introduction
- Datasets
- AI Agents
- Multi-Agent Frameworks
- System Diagram
- Leaderboards
- Docker Setup
- Talks & Events
- External Resources
- Contributors
- Upcoming Events: Tutorial at AAAI 2026: Agents for Industry 4.0 Applications.
- Past Event (2025-10-03): 2-hour workshop, AI Agents and Their Role in Industry 4.0 Applications (NJIT-ACM).
- Accepted Papers: Parts of this work were accepted at NeurIPS 2025, the EMNLP 2025 Research Track, and the EMNLP 2025 Industry Track.
- 2025-09-01: CODS 2025 Competition launched. Access the AI Agentic Challenge: AssetOpsBench-Live.
- 2025-06-01: AssetOpsBench v1.0 released with 141 industrial scenarios.
Stay tuned for new tracks, competitions, and community events.
AssetOpsBench is a unified framework for developing, orchestrating, and evaluating domain-specific AI agents in industrial asset operations and maintenance.
It provides:
- 4 domain-specific agents
- 2 multi-agent orchestration frameworks
Designed for maintenance engineers, reliability specialists, and facility planners, it allows reproducible evaluation of multi-step workflows in simulated industrial environments.
AssetOpsBench scenarios span multiple domains:
| Domain | Example Task |
|---|---|
| IoT | "List all sensors of Chiller 6 in MAIN site" |
| FMSR | "Identify failure modes detected by Chiller 6 Supply Temperature" |
| TSFM | "Forecast 'Chiller 9 Condenser Water Flow' for the week of 2020-04-27" |
| WO | "Generate a work order for Chiller 6 anomaly detection" |
Some tasks focus on a single domain; others are multi-step, end-to-end workflows.
Explore all scenarios in the HF-Dataset.
- IoT Agent: `get_sites`, `get_history`, `get_assets`, `get_sensors`
- FMSR Agent: `get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`
- TSFM Agent: `forecasting`, `timeseries_anomaly_detection`
- WO Agent: `generate_work_order`
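The agent-to-tool mapping above can be pictured as a small registry. The sketch below is illustrative only: the tool names are taken from the list, but the `AGENT_TOOLS` dictionary and the `agents_for_tool` helper are hypothetical and are not part of the AssetOpsBench codebase.

```python
# Illustrative sketch: tool names come from the list above; the registry
# structure and the helper function are assumptions, not AssetOpsBench's API.
AGENT_TOOLS = {
    "IoT": ["get_sites", "get_history", "get_assets", "get_sensors"],
    "FMSR": ["get_sensors", "get_failure_modes", "get_failure_sensor_mapping"],
    "TSFM": ["forecasting", "timeseries_anomaly_detection"],
    "WO": ["generate_work_order"],
}


def agents_for_tool(tool_name):
    """Return the agents (in registry order) that list the given tool."""
    return [agent for agent, tools in AGENT_TOOLS.items() if tool_name in tools]
```

Note that `get_sensors` appears under both the IoT and FMSR agents, so an orchestrator looking up a tool by name would need some way to choose between them.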
- MetaAgent: ReAct-based single-agent-as-tool orchestration
- AgentHive: plan-and-execute sequential workflow
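To make the difference between the two orchestration styles concrete, here is a minimal sketch under stated assumptions. The stub agents, the `plan_and_execute` and `react_loop` functions, and the scripted policy are all hypothetical stand-ins; in the real frameworks an LLM would produce the plan or choose each next action, and the actual MetaAgent and AgentHive implementations may differ substantially.

```python
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]
Action = Tuple[str, str]  # (agent name, task description)


def make_stub_agent(name: str) -> Agent:
    """Hypothetical stand-in for a domain agent: echoes the task it received."""
    def run(task: str) -> str:
        return f"[{name}] handled: {task}"
    return run


AGENTS: Dict[str, Agent] = {n: make_stub_agent(n) for n in ("IoT", "FMSR", "TSFM", "WO")}


def plan_and_execute(plan: List[Action]) -> List[str]:
    """Plan-and-execute style: the full plan is fixed up front, then run in order."""
    return [AGENTS[agent](task) for agent, task in plan]


def react_loop(choose_next: Callable[[List[str]], Optional[Action]],
               max_steps: int = 5) -> List[str]:
    """ReAct style: the next action is chosen after seeing prior observations."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = choose_next(observations)
        if action is None:  # the policy decides it is done
            break
        agent, task = action
        observations.append(AGENTS[agent](task))
    return observations


def scripted_policy(observations: List[str]) -> Optional[Action]:
    """Stand-in for an LLM: replays a fixed script, one step per call."""
    script = [("IoT", "list sensors of Chiller 6"), ("WO", "generate a work order")]
    return script[len(observations)] if len(observations) < len(script) else None
```

Calling `react_loop(scripted_policy)` and `plan_and_execute([...])` with the same two steps yields the same outputs here; the practical difference is that the ReAct-style loop can change course after each observation, while the sequential plan cannot.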
Visual overview of AssetOpsBench workflow:
- Evaluated with 7 Large Language Models
- Trajectories scored using LLM Judge (Llama-4-Maverick-17B)
- 6-dimensional criteria measure reasoning, execution, and data handling
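One plausible way to turn per-trajectory judge verdicts into leaderboard numbers is a per-criterion pass rate. The sketch below assumes boolean verdicts and uses placeholder criterion names (`criterion_1` through `criterion_6`), since the six dimensions are not named here; the aggregation actually used by AssetOpsBench may differ.

```python
from statistics import mean

# Placeholder names: the six judge criteria are not listed in this README.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def pass_rates(verdicts):
    """Average boolean judge verdicts per criterion across trajectories.

    verdicts: list of dicts mapping each criterion name to True/False,
    one dict per scored trajectory (an assumed format).
    """
    return {c: mean(1.0 if v[c] else 0.0 for v in verdicts) for c in CRITERIA}
```

For instance, if a criterion passes on one of two trajectories, its pass rate under this scheme is 0.5.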
Example: MetaAgent leaderboard
- Please Refer to the
- Pre-built Docker Images: `assetopsbench-basic` (minimal) & `assetopsbench-extra` (full)
- Conda environment: `assetopsbench`
- Full setup guide
cd /path/to/AssetOpsBench
chmod +x benchmark/entrypoint.sh
docker-compose -f benchmark/docker-compose.yml build
docker-compose -f benchmark/docker-compose.yml up

- Paper: AssetOpsBench: Benchmarking AI Agents for Industrial Asset Operations
- HuggingFace: Scenario & Model Hub
- Blog: Insights, Tutorials, and Updates
- Recorded Talks: Link coming soon.
Thanks go to these wonderful people:










