
Llama series models evaluation benchmarks

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/liyucheng09/Contamination_Detector
Resource summary:
This dataset analyzes six popular multiple-choice question-answering benchmarks to quantify how much each overlaps with the Llama training corpus, which yields each benchmark's contamination rate. The study found contamination rates ranging from 1% to 8.7%. The analysis is carried out at scale across all six benchmarks, each of which is designed to evaluate language models.
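A contamination rate of this kind is the fraction of benchmark items flagged as overlapping the training corpus. The sketch below illustrates that arithmetic under an assumed criterion: an item counts as contaminated when at least a `threshold` share of its lowercase word n-grams also appear in the corpus. The function names, the n-gram length `n`, and the `threshold` value are hypothetical choices for illustration; the detection method actually used in Contamination_Detector may differ.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(item: str, corpus_ngrams: Set[Tuple[str, ...]],
                    n: int = 8, threshold: float = 0.5) -> bool:
    """Hypothetical criterion: flag the item if >= threshold of its n-grams occur in the corpus."""
    item_ngrams = ngrams(item, n)
    if not item_ngrams:  # item shorter than n tokens: nothing to compare
        return False
    overlap = len(item_ngrams & corpus_ngrams) / len(item_ngrams)
    return overlap >= threshold


def contamination_rate(benchmark: List[str], corpus: Iterable[str],
                       n: int = 8, threshold: float = 0.5) -> float:
    """Fraction of benchmark items flagged as contaminated (0.0 for an empty benchmark)."""
    corpus_ngrams: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        corpus_ngrams |= ngrams(doc, n)
    if not benchmark:
        return 0.0
    flagged = sum(is_contaminated(q, corpus_ngrams, n, threshold) for q in benchmark)
    return flagged / len(benchmark)


# Toy data (made up for illustration): the first question appears verbatim in the corpus.
benchmark = [
    "What is the capital of France ?",
    "Which planet is known as the red planet ?",
]
corpus = ["Trivia night: what is the capital of France ? Paris, of course."]
print(contamination_rate(benchmark, corpus, n=3))  # → 0.5
```

A short `n` keeps the toy example readable; on real benchmarks, longer n-grams reduce false positives from common phrasing, at the cost of missing paraphrased leaks.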
Provided by:
Open Source