
Snowflake/dare-bench

Hugging Face, updated 2026-03-02, indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/Snowflake/dare-bench
Resource description:
---
language:
- en
arxiv: TODO
task_categories:
- text-generation
- question-answering
tags:
- agent
- data-science
- tool-use
- reinforcement-learning
- sft
pretty_name: DARE-Bench
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: "train_viewer.jsonl"
  - split: eval
    path: "eval_viewer.jsonl"
---

# DARE-Bench

**[ICLR 2026]** DARE-Bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Fan Shu<sup>1</sup>, Yite Wang<sup>2</sup>, Ruofan Wu<sup>1</sup>, Boyi Liu<sup>2</sup>, Zhewei Yao<sup>2</sup>, Yuxiong He<sup>2</sup>, Feng Yan<sup>1</sup>

<sup>1</sup>University of Houston &ensp; <sup>2</sup>Snowflake AI Research

## 🔎 Overview

**DARE-Bench** (ICLR 2026) is a benchmark for evaluating LLM agents on data science tasks, focusing on modeling and instruction fidelity. This Hugging Face repository provides a selected subset of the full benchmark for public release.

### ✨ Highlights

- ✅ Selected subset of the full benchmark
- 📚 Splits: Train **2,137** • Eval **162**
- 🗃️ Database assets distributed as zip files for convenient download

## 📦 Dataset Contents

| Split | Files | #Entries |
| --- | --- | ---: |
| Train | `train/question_list.json`, `train/databases.zip` | 2,137 |
| Eval | `eval/question_list.json`, `eval/databases.zip` | 162 |
| SFT | `sft_data/` | — |

### 🗂️ File layout

- `sft_data/`: supervised fine-tuning trajectories.
- `train/question_list.json`: training tasks (JSON array).
- `train/databases.zip`: database assets for training tasks.
- `eval/question_list.json`: evaluation tasks (JSON array).
- `eval/databases.zip`: database assets for evaluation tasks.

## 🔗 Resources

Related resources are also available, please check:

| Resource | Link |
| --- | --- |
| 📄 Paper | [arxiv.org/abs/2602.24288](https://arxiv.org/abs/2602.24288) |
| 💻 Code | [Snowflake-Labs/dare-bench](https://github.com/Snowflake-Labs/dare-bench) |

## 📜 License

- The `databases/` assets are subject to their **original source licenses**.
- All other files in this repository are released under **Apache-2.0**.

## 📝 Citation

```bibtex
@inproceedings{shu2026darebench,
  title={DARE-Bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science},
  author={Shu, Fan and Wang, Yite and Wu, Ruofan and Liu, Boyi and Yao, Zhewei and He, Yuxiong and Yan, Feng},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
```
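
The file layout described in the dataset card can be read with the Python standard library alone. The following is a minimal sketch, assuming the repository has already been downloaded to a local directory. The helper names `load_questions` and `extract_databases` are hypothetical and not part of the DARE-Bench code. The card does not document the fields of each task entry, so the sketch only checks that the file holds a JSON array and makes no assumption about the contents of the zip archive.

```python
import json
import zipfile
from pathlib import Path


def load_questions(root, split):
    """Read <root>/<split>/question_list.json (hypothetical helper).

    The dataset card describes this file as a JSON array of tasks;
    the per-task schema is not documented there, so entries are
    returned as-is.
    """
    path = Path(root) / split / "question_list.json"
    with path.open(encoding="utf-8") as f:
        tasks = json.load(f)
    if not isinstance(tasks, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return tasks


def extract_databases(root, split, dest=None):
    """Unpack <root>/<split>/databases.zip (hypothetical helper).

    Extracts next to the archive unless `dest` is given, and returns
    the directory the files were extracted into.
    """
    archive = Path(root) / split / "databases.zip"
    target = Path(dest) if dest is not None else archive.parent
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

With a local copy in, say, `./dare-bench` (an assumed path), `len(load_questions("dare-bench", "eval"))` can be compared against the 162 entries listed in the table above as a quick integrity check.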
Provider:
Snowflake