DataSciBench

Name: DataSciBench
Creator: THUDM
License: not specified

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/THUDM/DataSciBench

Description:

DataSciBench is a comprehensive benchmark for evaluating the capabilities of large language models (LLMs) in the data science domain. It includes diverse and challenging prompts designed for scenarios where the ground truth and the evaluation metrics are uncertain. The prompts are drawn from multiple sources, including online platforms, public benchmarks, and manually written questions, and were reviewed by experts to ensure quality. The dataset consists of 222 prompts and 519 test cases, covering data cleaning, data exploration, data visualization, predictive modeling, data mining, and interpretability evaluation.

Provider:

THUDM
