open-llm-leaderboard/details_dhmeltzer__Llama-2-13b-hf-eli5-wiki-1024_r_64_alpha_16
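Dataset Overview
Dataset Summary
This dataset was created automatically during the evaluation of the model dhmeltzer/Llama-2-13b-hf-eli5-wiki-1024_r_64_alpha_16. It contains 64 configurations, each corresponding to one evaluation task.
Dataset Structure
The dataset was built from 2 runs. Within each configuration, every run is stored as its own split, named after the timestamp of that run. The "train" split always points to the latest results.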
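A minimal sketch of how one configuration might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper name `load_details` is hypothetical, and the default split follows the statement above that "train" points to the latest results.

```python
# Repository id taken from the title of this card.
REPO_ID = (
    "open-llm-leaderboard/"
    "details_dhmeltzer__Llama-2-13b-hf-eli5-wiki-1024_r_64_alpha_16"
)


def load_details(config_name: str, split: str = "train"):
    """Load one evaluation configuration (hypothetical helper).

    The import is done lazily so this module can be imported even when
    `datasets` is not installed; calling the function needs network access.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Example call (not executed here because it downloads data):
# details = load_details("harness_drop_3")
# print(details)
```

Passing a timestamp-named split instead of "train" should select a specific run, provided that run exists for the chosen configuration.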
Dataset Configurations
The dataset includes the following configurations:
harness_arc_challenge_25
harness_drop_3
harness_gsm8k_5
harness_hellaswag_10
harness_hendrycksTest_5
harness_hendrycksTest_abstract_algebra_5
harness_hendrycksTest_anatomy_5
harness_hendrycksTest_astronomy_5
harness_hendrycksTest_business_ethics_5
harness_hendrycksTest_clinical_knowledge_5
harness_hendrycksTest_college_biology_5
harness_hendrycksTest_college_chemistry_5
harness_hendrycksTest_college_computer_science_5
harness_hendrycksTest_college_mathematics_5
harness_hendrycksTest_college_medicine_5
harness_hendrycksTest_college_physics_5
harness_hendrycksTest_computer_security_5
harness_hendrycksTest_conceptual_physics_5
harness_hendrycksTest_econometrics_5
harness_hendrycksTest_electrical_engineering_5
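The configuration names above appear to follow the pattern `harness_<task>_<n>`. The sketch below splits a name into its task and trailing number. Reading that number as the few-shot count is an assumption, not something this card states, and `parse_config_name` is a hypothetical helper.

```python
def parse_config_name(config_name: str) -> tuple[str, int]:
    """Split 'harness_<task>_<n>' into (task, n).

    Assumption: the trailing integer is the number of few-shot examples.
    """
    prefix = "harness_"
    if not config_name.startswith(prefix):
        raise ValueError(f"unexpected configuration name: {config_name!r}")
    # rpartition splits on the LAST underscore, so task names that
    # themselves contain underscores (e.g. arc_challenge) stay intact.
    task, _, number = config_name[len(prefix):].rpartition("_")
    return task, int(number)


for name in ("harness_arc_challenge_25", "harness_drop_3", "harness_hendrycksTest_5"):
    print(name, "->", parse_config_name(name))
```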
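Data Files
Each configuration contains data files for its splits, for example:
harness_arc_challenge_25 contains data files for the 2023_09_05T15_26_38.811892 and latest splits.
harness_drop_3 contains data files for the 2023_09_22T19_51_06.659965 and latest splits.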
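Comparing the split name 2023_09_22T19_51_06.659965 with the run timestamp 2023-09-22T19:51:06.659965 given elsewhere in this card suggests that split names are the timestamp with `-` and `:` replaced by `_`. A small sketch of that conversion follows. The rule is inferred from this one pair, and `split_name_from_timestamp` is a hypothetical helper.

```python
def split_name_from_timestamp(timestamp: str) -> str:
    """Convert an ISO-style run timestamp to the split name used in this card.

    Inferred rule (an assumption): '-' and ':' become '_'; the 'T'
    separator and the fractional-seconds '.' are left unchanged.
    """
    return timestamp.replace("-", "_").replace(":", "_")


print(split_name_from_timestamp("2023-09-22T19:51:06.659965"))
```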
Latest Results
The latest results come from the run of 2023-09-22T19:51:06.659965 and include the following metrics:
{
    "all": {
        "em": 0.002726510067114094,
        "em_stderr": 0.0005340111700415918,
        "f1": 0.06889890939597297,
        "f1_stderr": 0.0014912452735151907,
        "acc": 0.43548543448224686,
        "acc_stderr": 0.010181852995139873
    },
    "harness|drop|3": {
        "em": 0.002726510067114094,
        "em_stderr": 0.0005340111700415918,
        "f1": 0.06889890939597297,
        "f1_stderr": 0.0014912452735151907
    },
    "harness|gsm8k|5": {
        "acc": 0.10538286580742987,
        "acc_stderr": 0.00845757588404176
    },
    "harness|winogrande|5": {
        "acc": 0.7655880031570639,
        "acc_stderr": 0.011906130106237986
    }
}
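For this run, the "acc" value under "all" works out to the unweighted mean of the gsm8k and winogrande accuracies. The sketch below checks that arithmetic on the numbers reported above and also derives a rough 95% interval for each task. The interval uses a normal approximation (acc ± 1.96 × stderr), which is an assumption added here for illustration and not a method stated by the card.

```python
import math

# Accuracy metrics copied from the results above.
latest = {
    "all": {"acc": 0.43548543448224686, "acc_stderr": 0.010181852995139873},
    "harness|gsm8k|5": {"acc": 0.10538286580742987, "acc_stderr": 0.00845757588404176},
    "harness|winogrande|5": {"acc": 0.7655880031570639, "acc_stderr": 0.011906130106237986},
}

# Per-task entries that report accuracy (everything except the aggregate).
tasks = {name: m for name, m in latest.items() if name != "all" and "acc" in m}

# Unweighted mean of the per-task accuracies.
mean_acc = sum(m["acc"] for m in tasks.values()) / len(tasks)
matches_all = math.isclose(mean_acc, latest["all"]["acc"], rel_tol=1e-9)
print(f"mean acc = {mean_acc:.6f}, matches 'all': {matches_all}")


def approx_interval(acc: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Rough 95% interval under a normal approximation (an assumption)."""
    return acc - z * stderr, acc + z * stderr


for name, m in tasks.items():
    low, high = approx_interval(m["acc"], m["acc_stderr"])
    print(f"{name}: {m['acc']:.4f} (approx. 95% interval {low:.4f} to {high:.4f})")
```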



