StatQA

Indexed from arXiv on 2025-09-30
Download link:
https://statqa.github.io/
Resource overview:
StatQA is a dataset designed for statistical analysis tasks. It contains 11,623 examples and is intended to evaluate how proficient large language models (LLMs) are at specialized statistical tasks, and how well they can assess whether a given hypothesis testing method is applicable. The dataset is used for testing and evaluation, and it also includes a training set generated by the dataset construction pipeline.
Provided by:
Authors of the paper