open-llm-leaderboard/details_synapsoft__Llama-2-7b-chat-hf-flan2022-1.2M
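Dataset Overview
Dataset Introduction
This dataset was created automatically during the evaluation run of the model synapsoft/Llama-2-7b-chat-hf-flan2022-1.2M. It is composed of 64 configurations, each corresponding to one evaluation task. The dataset was created from 2 runs. The results of each run are stored in every configuration as a separate split named after the timestamp of that run. The "train" split always points to the latest results.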
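The configuration and split layout described above can be explored with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader: the helper name `load_details` is hypothetical, and the default split name `"latest"` is an assumption taken from the split listings in this card (the overview text mentions a "train" split instead, so the argument may need adjusting).

```python
# Repository identifier as given in the title of this card.
REPO_ID = "open-llm-leaderboard/details_synapsoft__Llama-2-7b-chat-hf-flan2022-1.2M"


def load_details(config_name: str, split: str = "latest"):
    """Load one evaluation configuration from the Hub (requires network access).

    `split` defaults to "latest" as an assumption; pass a timestamp split
    such as "2023_09_23T08_39_00.771555" to pin a specific run.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Example call (performs a download, so it is left commented out):
# details = load_details("harness_gsm8k_5")
# print(details)
```

Keeping the split as a parameter makes it easy to compare the two runs by loading the same configuration once per timestamp split.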
Dataset Structure
The dataset includes the following configurations:
harness_arc_challenge_25
harness_drop_3
harness_gsm8k_5
harness_hellaswag_10
harness_hendrycksTest_5
harness_hendrycksTest_abstract_algebra_5
harness_hendrycksTest_anatomy_5
harness_hendrycksTest_astronomy_5
harness_hendrycksTest_business_ethics_5
harness_hendrycksTest_clinical_knowledge_5
harness_hendrycksTest_college_biology_5
harness_hendrycksTest_college_chemistry_5
harness_hendrycksTest_college_computer_science_5
harness_hendrycksTest_college_mathematics_5
harness_hendrycksTest_college_medicine_5
harness_hendrycksTest_college_physics_5
harness_hendrycksTest_computer_security_5
harness_hendrycksTest_conceptual_physics_5
harness_hendrycksTest_econometrics_5
harness_hendrycksTest_electrical_engineering_5
harness_hendrycksTest_elementary_mathematics_5
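The configuration names above follow a visible pattern: a `harness_` prefix, a task name, and a trailing integer. The sketch below splits a name into those parts. Reading the trailing integer as the number of few-shot examples is an assumption (the card does not state it), and `HarnessConfig` and `parse_config_name` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class HarnessConfig(NamedTuple):
    task: str          # e.g. "arc_challenge"
    num_fewshot: int   # trailing integer; assumed to be the few-shot count


def parse_config_name(name: str) -> HarnessConfig:
    """Split a name like 'harness_arc_challenge_25' into task and trailing integer."""
    prefix, _, rest = name.partition("_")
    if prefix != "harness" or not rest:
        raise ValueError(f"not a harness configuration name: {name!r}")
    # The task itself may contain underscores, so split on the last one only.
    task, _, shots = rest.rpartition("_")
    if not task or not shots.isdigit():
        raise ValueError(f"missing trailing integer in: {name!r}")
    return HarnessConfig(task=task, num_fewshot=int(shots))
```

For example, `parse_config_name("harness_hendrycksTest_anatomy_5")` yields the task `hendrycksTest_anatomy` with the integer 5.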
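Data Files
Each configuration contains several splits, each corresponding to a different run timestamp. For example:
harness_arc_challenge_25 contains the splits 2023_09_04T22_45_47.858606 and latest.
harness_drop_3 contains the splits 2023_09_23T08_39_00.771555 and latest.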
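Comparing the run timestamp 2023-09-23T08:39:00.771555 with the split name 2023_09_23T08_39_00.771555 suggests that split names are the timestamp with `-` and `:` replaced by `_`. The sketch below encodes that observation. It is inferred from the examples in this card rather than from a documented rule, and both function names are hypothetical.

```python
from datetime import datetime

# Format matching split names such as "2023_09_23T08_39_00.771555".
_SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def split_name_from_timestamp(timestamp: str) -> str:
    """Turn an ISO-style run timestamp into the split name seen in this card."""
    return timestamp.replace("-", "_").replace(":", "_")


def timestamp_from_split_name(split_name: str) -> datetime:
    """Parse a timestamp split name back into a datetime object."""
    return datetime.strptime(split_name, _SPLIT_FORMAT)
```

Parsing split names into `datetime` values makes it possible to sort the runs chronologically instead of relying on the `latest` alias.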
Latest Results
The latest results come from the run 2023-09-23T08:39:00.771555 and include the following metrics:
{
    "all": {
        "em": 0.2627936241610738,
        "em_stderr": 0.004507560917898865,
        "f1": 0.30115981543624176,
        "f1_stderr": 0.004494140287139199,
        "acc": 0.3666975232366727,
        "acc_stderr": 0.008004674480789642
    },
    "harness|drop|3": {
        "em": 0.2627936241610738,
        "em_stderr": 0.004507560917898865,
        "f1": 0.30115981543624176,
        "f1_stderr": 0.004494140287139199
    },
    "harness|gsm8k|5": {
        "acc": 0.015163002274450341,
        "acc_stderr": 0.003366022949726345
    },
    "harness|winogrande|5": {
        "acc": 0.7182320441988951,
        "acc_stderr": 0.01264332601185294
    }
}
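The values under "all" in the results above appear to be the unweighted mean of each metric over the tasks that report it. The sketch below checks that reading against the numbers shown. It is an observation about this one result set, not a documented aggregation rule, and `mean_metric` is a hypothetical helper.

```python
import math

# Per-task metrics copied from the latest results shown in this card.
per_task = {
    "harness|drop|3": {
        "em": 0.2627936241610738,
        "em_stderr": 0.004507560917898865,
        "f1": 0.30115981543624176,
        "f1_stderr": 0.004494140287139199,
    },
    "harness|gsm8k|5": {
        "acc": 0.015163002274450341,
        "acc_stderr": 0.003366022949726345,
    },
    "harness|winogrande|5": {
        "acc": 0.7182320441988951,
        "acc_stderr": 0.01264332601185294,
    },
}


def mean_metric(results: dict, metric: str) -> float:
    """Unweighted mean of `metric` over the tasks that report it."""
    values = [scores[metric] for scores in results.values() if metric in scores]
    if not values:
        raise KeyError(f"no task reports metric {metric!r}")
    return sum(values) / len(values)


# Compare against the reported "all" accuracy, allowing for float rounding.
print(math.isclose(mean_metric(per_task, "acc"), 0.3666975232366727, rel_tol=1e-9))
```

Because only drop reports `em` and `f1`, the mean of those metrics is simply the drop value, which matches the "all" entry.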



