StrucText-Eval
Source: arXiv | Indexed: 2025-09-30
Download link:
https://github.com/MikeGu721/StrucText-Eval
Resource description:
The dataset comprises 5,800 pre-generated, annotated samples for evaluating how well large language models (LLMs) understand and reason over structured text. The samples are divided into two test suites: the Regular Test Suite (3,712 samples) and the Hard Test Suite (2,088 samples), the latter targeting more complex tasks.
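As a quick sanity check on the stated split, the following minimal sketch recomputes the total and each suite's share from the two counts given in the description. The dictionary keys and the function name `suite_shares` are illustrative assumptions, not identifiers from the StrucText-Eval repository.

```python
# Suite sizes as stated in the dataset description.
# The key names are hypothetical labels chosen for this sketch.
SUITE_SIZES = {"regular": 3712, "hard": 2088}


def suite_shares(sizes):
    """Return the total sample count and each suite's fraction of it."""
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


total, shares = suite_shares(SUITE_SIZES)
print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The two counts sum to the stated 5,800, with roughly 64% of samples in the Regular Test Suite and 36% in the Hard Test Suite.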



