Hi!
Thanks for your wonderful work on DA-Code! We really enjoyed it and found it very valuable for evaluating the data science code-generation capabilities of large language models 🙌
We’d like to briefly introduce our project, DSLighting.
About DSLighting
DSLighting is a data science agent harness: an LLM-driven autonomous execution engine that turns task descriptions and datasets into iterative workflows covering:
- Code generation
- Execution
- Evaluation
- Refinement
It is designed to make it easy to build, run, and evaluate data science agents in a reproducible and extensible way.
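As a rough illustration of this loop (a minimal sketch only; every function name below is a placeholder, not DSLighting's actual API), the generate → execute → evaluate → refine cycle might look like:

```python
# Illustrative generate/execute/evaluate/refine loop.
# All names here are placeholders, NOT DSLighting's real API.

def generate_code(task: str, feedback):
    # A real harness would prompt an LLM with the task and prior feedback.
    return "result = sum(range(10))"  # placeholder candidate solution

def execute(code: str) -> dict:
    # Run the candidate code in an isolated namespace and return its state.
    namespace = {}
    exec(code, namespace)
    return namespace

def evaluate(namespace: dict):
    # Compare the execution result against the task's success criterion.
    ok = namespace.get("result") == 45
    return ok, None if ok else "result did not match the expected value"

def run_agent_loop(task: str, max_iters: int = 3) -> bool:
    feedback = None
    for _ in range(max_iters):
        code = generate_code(task, feedback)
        state = execute(code)
        ok, feedback = evaluate(state)
        if ok:
            return True  # refinement stops once evaluation passes
    return False

print(run_agent_loop("sum the integers 0..9"))  # → True
```

The key design point is that evaluation feedback flows back into the next generation step, rather than the run ending after a single pass.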
Support for DA-Code
We’ve recently added support for running DA-Code within DSLighting. With just a few lines of code, users can easily run the benchmark:
```python
from dotenv import load_dotenv
load_dotenv()

from dslighting.api import DSBenchmark
from dslighting.core import ConfigBuilder

config = ConfigBuilder().build_config(
    workflow="aide",
    model="gpt-4o",
)

benchmark = DSBenchmark("dacode", data_dir="/path/to/dacode")
result = benchmark.run(config=config)

print(result.results_path)
print(result.metadata_path)
```
Why this might be useful
- Minimal setup to run DA-Code
- Unified interface across multiple benchmarks
- Supports iterative agent workflows (not just single-pass evaluation)
- Easy to configure for different models and workflows
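For instance, switching to a different model is a one-line change to the config shown above (a sketch based on the snippet in this post; the model identifier below is an assumption and depends on what your provider supports):

```python
from dslighting.api import DSBenchmark
from dslighting.core import ConfigBuilder

# Same "aide" workflow as above, with a different (assumed) model name.
config = ConfigBuilder().build_config(
    workflow="aide",
    model="gpt-4o-mini",  # assumption: any model ID your provider accepts
)

benchmark = DSBenchmark("dacode", data_dir="/path/to/dacode")
result = benchmark.run(config=config)
```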
Other supported benchmarks
DSLighting currently also supports:
- DABench (ICML 2024)
- MoSciBench (ICLR 2026)
- MLE-Bench
- ScienceAgentBench (ICLR 2025)
We hope this can make it easier for researchers to run and extend DA-Code in agent-based workflows.
We'd be happy to hear your thoughts, and we'd love to explore potential collaboration or integration!
Thanks again for your great work 🙌