Llama series models evaluation benchmarks

Name: Llama series models evaluation benchmarks
Creator: Open Source
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/liyucheng09/Contamination_Detector

Description:

This dataset analyzes six popular multiple-choice question answering benchmarks to quantify how much each one overlaps with the Llama training corpus, that is, its contamination rate. The reported contamination rates range from 1% to 8.7%. The evaluation also includes a scale analysis of the six benchmarks, all of which are designed for evaluating language models.
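To make the notion of a contamination rate concrete, here is a minimal sketch that treats it as the fraction of benchmark items whose word n-grams mostly appear in a reference corpus. The n-gram criterion, the function names `ngrams` and `contamination_rate`, and the `n` and `threshold` parameters are all assumptions made for illustration; the linked repository may detect overlap differently.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items, corpus_docs, n=3, threshold=0.5):
    """Fraction of benchmark items flagged as overlapping with the corpus.

    An item is flagged when at least `threshold` of its n-grams also occur
    somewhere in `corpus_docs`. This criterion is a hypothetical stand-in,
    not the method used to produce the 1% to 8.7% figures.
    """
    if not benchmark_items:
        return 0.0

    # Collect every n-gram seen anywhere in the reference corpus.
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc, n)

    flagged = 0
    for item in benchmark_items:
        item_ngrams = ngrams(item, n)
        if not item_ngrams:
            # Items shorter than n tokens cannot be matched; count as clean.
            continue
        overlap = len(item_ngrams & corpus_ngrams) / len(item_ngrams)
        if overlap >= threshold:
            flagged += 1

    return flagged / len(benchmark_items)


# Toy usage with made-up strings (not real benchmark or corpus data).
items = [
    "the capital of france is paris",
    "water boils at one hundred degrees",
]
corpus = ["we know the capital of france is paris today"]
rate = contamination_rate(items, corpus)
print(f"contamination rate: {rate:.1%}")
```

In this toy case one of the two items is fully covered by the corpus text, so half of the items are flagged. A real analysis would need a much larger corpus index and a more careful matching rule.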

Provided by:

Open Source
