BeyondX
Source: arXiv; indexed 2025-09-30
Download link:
https://huggingface.co/datasets/johnson0213/beyondx
Resource description:
BeyondX is a benchmark designed to challenge large language models (LLMs) with algebra problems that contain multiple unknowns. It is divided into three subsets by the number of unknowns: BeyondX_3, BeyondX_4, and BeyondX_5, containing problems with 3, 4, and 5 unknowns respectively. The task is to solve each problem for its unknowns. Evaluation results show that model accuracy drops significantly as the number of unknowns increases.
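To make the task concrete, here is a minimal sketch of what solving a three-unknown problem can look like once it has been translated into linear equations. The equations below are a made-up illustration, not an item from BeyondX, and the helper name `solve_linear_system` is hypothetical. The listing does not say whether the benchmark's problems are all linear, so this assumes a linear system with a unique solution.

```python
from fractions import Fraction


def solve_linear_system(coefficients, constants):
    """Solve a square linear system exactly with Gauss-Jordan elimination.

    Hypothetical helper for illustration; assumes a unique solution exists.
    """
    n = len(constants)
    # Build the augmented matrix using exact rational arithmetic.
    rows = [
        [Fraction(c) for c in row] + [Fraction(b)]
        for row, b in zip(coefficients, constants)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Illustrative problem with three unknowns x, y, z (not taken from the dataset):
#   x + y + z  = 12
#   2x + y - z = 5
#   x - y + 2z = 9
solution = solve_linear_system(
    [[1, 1, 1], [2, 1, -1], [1, -1, 2]],
    [12, 5, 9],
)
print([int(value) for value in solution])
```

Each additional unknown adds one more equation to set up and one more variable to eliminate, which gives a rough sense of why the BeyondX_4 and BeyondX_5 subsets require longer chains of correct steps.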



