StatQA

Name: StatQA
Creator: Authors of the paper
Published: 2025-09-30T13:43:50+08:00

Indexed from arXiv on 2025-09-30

Statistical analysis

Natural language processing

Data link:

https://statqa.github.io/

Description:

StatQA is a dataset designed for statistical analysis tasks. It contains 11,623 examples and is intended to evaluate how proficient large language models (LLMs) are at specialized statistical tasks, including their ability to assess whether a hypothesis testing method is applicable. The dataset is used for testing and evaluation, and it also includes a training set generated by the dataset construction pipeline.

Provided by:

Authors of the paper
