Lila

Source: arXiv; indexed 2025-09-30

Download link:

https://github.com/allenai/lila

Overview:

Lila is a unified mathematical reasoning benchmark comprising 23 distinct tasks, organized along four dimensions: mathematical ability, language format, language diversity, and external knowledge. It extends 20 existing datasets with interpretable solutions and correct answers, both expressed as Python programs. It also includes datasets for evaluating out-of-distribution (OOD) performance and robustness. The 23 tasks are split into training, development, and test sets of 70%, 10%, and 20% respectively. The tasks span a range of mathematical reasoning, from arithmetic to calculus.
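To make the two mechanics in the overview concrete, here is a minimal sketch of an answer expressed as a Python program and of a 70/10/20 split. The function names (`run_program`, `split_70_10_20`), the convention that a program stores its result in a variable called `answer`, and the shuffle-then-slice procedure are all assumptions made for illustration; the listing gives only the split proportions and does not describe the actual Lila schema or splitting method.

```python
import random


def run_program(source):
    """Execute a solution program and return its `answer` variable.

    Assumption: the program assigns its result to `answer`. This is a
    hypothetical convention, not the documented Lila format.
    Note: exec() runs arbitrary code; only use it on trusted input.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["answer"]


def split_70_10_20(items, seed=0):
    """Shuffle items and slice them into train/dev/test at 70/10/20.

    Integer arithmetic avoids floating-point rounding in the cut points.
    Any remainder from integer division falls into the test set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_dev = n // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Hypothetical example record: the solution is a program, and the
# stored answer is what that program evaluates to.
example = {
    "question": "What is 12 * 7 + 5?",
    "program": "answer = 12 * 7 + 5",
    "answer": 89,
}

if __name__ == "__main__":
    print(run_program(example["program"]) == example["answer"])
    train, dev, test = split_70_10_20(range(100))
    print(len(train), len(dev), len(test))
```

Storing the solution as a program means a prediction can be checked by executing it and comparing the result to the reference answer, rather than by matching text.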
