Llama series models evaluation benchmarks
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/liyucheng09/Contamination_Detector
Summary:
This dataset analyzes six popular multiple-choice question-answering benchmarks, quantifying how much each one overlaps with the Llama training data in order to estimate its contamination rate. The reported contamination rates range from 1% to 8.7%. The analysis is performed on all six benchmarks, each of which is designed to evaluate language models.
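To make "contamination rate" concrete, here is a minimal sketch under an assumed definition: a benchmark item counts as contaminated if at least a `threshold` fraction of its word n-grams also appear somewhere in the training corpus, and the contamination rate is the fraction of items flagged. This n-gram rule, the function names, and the default `n` and `threshold` values are hypothetical illustrations; they are not the detection method used in the linked Contamination_Detector repository.

```python
from typing import Iterable


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized word n-grams in text.

    Assumption: naive whitespace tokenization; a real detector would
    normalize punctuation and use a proper tokenizer.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(item: str, corpus_ngrams: set[tuple[str, ...]],
                    n: int, threshold: float) -> bool:
    """Flag an item if >= threshold of its n-grams occur in the corpus."""
    item_ngrams = ngrams(item, n)
    if not item_ngrams:  # item shorter than n tokens: cannot be matched
        return False
    overlap = len(item_ngrams & corpus_ngrams) / len(item_ngrams)
    return overlap >= threshold


def contamination_rate(benchmark: list[str], corpus: Iterable[str],
                       n: int = 8, threshold: float = 0.5) -> float:
    """Fraction of benchmark items flagged as contaminated (hypothetical rule)."""
    if not benchmark:
        return 0.0
    # Build one n-gram index over the whole corpus, then test each item.
    corpus_ngrams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_ngrams |= ngrams(doc, n)
    flagged = sum(is_contaminated(q, corpus_ngrams, n, threshold)
                  for q in benchmark)
    return flagged / len(benchmark)


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"]
    benchmark = [
        "the quick brown fox jumps",      # fully covered by corpus trigrams
        "what is the capital of france",  # no shared trigrams
        "a lazy dog sleeps",              # no shared trigrams
    ]
    rate = contamination_rate(benchmark, corpus, n=3, threshold=0.5)
    print(f"contamination rate: {rate:.1%}")
```

On this toy input only the first item is flagged, so the rate is one third. Indexing the corpus once as a set keeps each membership check constant-time; for a corpus the size of Llama's training data, a practical tool would need a disk-backed index or a search service instead of an in-memory set.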
Provided by:
Open Source



