open-llm-leaderboard-old/details_teilomillet__MiniMerlin-3b-v0.1
Hugging Face | Updated 2023-12-13 | Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_teilomillet__MiniMerlin-3b-v0.1
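The card below names each configuration after the evaluated task and each split after the run timestamp. The following is a minimal sketch of that naming scheme, inferred only from the names visible in this card. The helper names `config_name`, `split_name`, and `parquet_glob` are hypothetical and not part of any library, and the substitution rules are an assumption that may not hold for other cards.

```python
import re


def config_name(task_key: str) -> str:
    """Map a results key such as 'harness|arc:challenge|25' to a config name."""
    # Assumption inferred from this card: '|', ':' and '-' all become '_'.
    return re.sub(r"[|:\-]", "_", task_key)


def split_name(run_timestamp: str) -> str:
    """Map a run timestamp to the split name listed under each config."""
    # Assumption inferred from this card: '-' and ':' become '_',
    # while the fractional seconds keep their '.'.
    return re.sub(r"[-:]", "_", run_timestamp)


def parquet_glob(task_key: str, run_timestamp: str) -> str:
    """Build the glob pattern listed under `path:` for one task and one run."""
    # In file names the timestamp keeps its '-' and only ':' becomes '-'.
    stamp = run_timestamp.replace(":", "-")
    return f"**/details_{task_key}_{stamp}.parquet"


if __name__ == "__main__":
    run = "2023-12-13T12:30:09.463717"
    print(config_name("harness|hendrycksTest-abstract_algebra|5"))
    print(split_name(run))
    print(parquet_glob("harness|gsm8k|5", run))
```

The resulting config and split names are the values that would be passed as the second argument and the `split` argument of `load_dataset`.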
Resource description:
---
pretty_name: Evaluation run of teilomillet/MiniMerlin-3b-v0.1
dataset_summary: "Dataset automatically created during the evaluation run of model\
\ [teilomillet/MiniMerlin-3b-v0.1](https://huggingface.co/teilomillet/MiniMerlin-3b-v0.1)\
\ on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\
\nThe dataset is composed of 63 configurations, each one corresponding to one of the\
\ evaluated tasks.\n\nThe dataset has been created from 1 run(s). Each run can be\
\ found as a specific split in each configuration, the split being named using the\
\ timestamp of the run. The \"latest\" split always points to the latest results.\n\
\nAn additional configuration \"results\" stores all the aggregated results of the\
\ run (and is used to compute and display the aggregated metrics on the [Open LLM\
\ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\
\nTo load the details from a run, you can for instance do the following:\n```python\n\
from datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_teilomillet__MiniMerlin-3b-v0.1\"\
,\n\t\"harness_winogrande_5\",\n\tsplit=\"latest\")\n```\n\n## Latest results\n\n\
These are the [latest results from run 2023-12-13T12:30:09.463717](https://huggingface.co/datasets/open-llm-leaderboard/details_teilomillet__MiniMerlin-3b-v0.1/blob/main/results_2023-12-13T12-30-09.463717.json) (note\
\ that there might be results for other tasks in the repo if successive evals didn't\
\ cover the same tasks. You can find each in the results and the \"latest\" split for\
\ each eval):\n\n```python\n{\n \"all\": {\n \"acc\": 0.42829802423091123,\n\
\ \"acc_stderr\": 0.034419009383078604,\n \"acc_norm\": 0.4345596062931712,\n\
\ \"acc_norm_stderr\": 0.035301959046270974,\n \"mc1\": 0.3023255813953488,\n\
\ \"mc1_stderr\": 0.016077509266133022,\n \"mc2\": 0.49647374974901654,\n\
\ \"mc2_stderr\": 0.015915065186614973\n },\n \"harness|arc:challenge|25\"\
: {\n \"acc\": 0.38139931740614336,\n \"acc_stderr\": 0.014194389086685261,\n\
\ \"acc_norm\": 0.4069965870307167,\n \"acc_norm_stderr\": 0.014356399418009131\n\
\ },\n \"harness|hellaswag|10\": {\n \"acc\": 0.4343756223859789,\n\
\ \"acc_stderr\": 0.004946617138983514,\n \"acc_norm\": 0.5406293567018522,\n\
\ \"acc_norm_stderr\": 0.004973280417705513\n },\n \"harness|hendrycksTest-abstract_algebra|5\"\
: {\n \"acc\": 0.36,\n \"acc_stderr\": 0.04824181513244218,\n \
\ \"acc_norm\": 0.36,\n \"acc_norm_stderr\": 0.04824181513244218\n \
\ },\n \"harness|hendrycksTest-anatomy|5\": {\n \"acc\": 0.43703703703703706,\n\
\ \"acc_stderr\": 0.042849586397533994,\n \"acc_norm\": 0.43703703703703706,\n\
\ \"acc_norm_stderr\": 0.042849586397533994\n },\n \"harness|hendrycksTest-astronomy|5\"\
: {\n \"acc\": 0.4473684210526316,\n \"acc_stderr\": 0.040463368839782486,\n\
\ \"acc_norm\": 0.4473684210526316,\n \"acc_norm_stderr\": 0.040463368839782486\n\
\ },\n \"harness|hendrycksTest-business_ethics|5\": {\n \"acc\": 0.38,\n\
\ \"acc_stderr\": 0.048783173121456316,\n \"acc_norm\": 0.38,\n \
\ \"acc_norm_stderr\": 0.048783173121456316\n },\n \"harness|hendrycksTest-clinical_knowledge|5\"\
: {\n \"acc\": 0.47924528301886793,\n \"acc_stderr\": 0.030746349975723463,\n\
\ \"acc_norm\": 0.47924528301886793,\n \"acc_norm_stderr\": 0.030746349975723463\n\
\ },\n \"harness|hendrycksTest-college_biology|5\": {\n \"acc\": 0.4722222222222222,\n\
\ \"acc_stderr\": 0.04174752578923185,\n \"acc_norm\": 0.4722222222222222,\n\
\ \"acc_norm_stderr\": 0.04174752578923185\n },\n \"harness|hendrycksTest-college_chemistry|5\"\
: {\n \"acc\": 0.33,\n \"acc_stderr\": 0.047258156262526045,\n \
\ \"acc_norm\": 0.33,\n \"acc_norm_stderr\": 0.047258156262526045\n \
\ },\n \"harness|hendrycksTest-college_computer_science|5\": {\n \"\
acc\": 0.45,\n \"acc_stderr\": 0.049999999999999996,\n \"acc_norm\"\
: 0.45,\n \"acc_norm_stderr\": 0.049999999999999996\n },\n \"harness|hendrycksTest-college_mathematics|5\"\
: {\n \"acc\": 0.28,\n \"acc_stderr\": 0.04512608598542126,\n \
\ \"acc_norm\": 0.28,\n \"acc_norm_stderr\": 0.04512608598542126\n \
\ },\n \"harness|hendrycksTest-college_medicine|5\": {\n \"acc\": 0.3872832369942196,\n\
\ \"acc_stderr\": 0.037143259063020656,\n \"acc_norm\": 0.3872832369942196,\n\
\ \"acc_norm_stderr\": 0.037143259063020656\n },\n \"harness|hendrycksTest-college_physics|5\"\
: {\n \"acc\": 0.2549019607843137,\n \"acc_stderr\": 0.04336432707993177,\n\
\ \"acc_norm\": 0.2549019607843137,\n \"acc_norm_stderr\": 0.04336432707993177\n\
\ },\n \"harness|hendrycksTest-computer_security|5\": {\n \"acc\":\
\ 0.54,\n \"acc_stderr\": 0.05009082659620332,\n \"acc_norm\": 0.54,\n\
\ \"acc_norm_stderr\": 0.05009082659620332\n },\n \"harness|hendrycksTest-conceptual_physics|5\"\
: {\n \"acc\": 0.3276595744680851,\n \"acc_stderr\": 0.030683020843231,\n\
\ \"acc_norm\": 0.3276595744680851,\n \"acc_norm_stderr\": 0.030683020843231\n\
\ },\n \"harness|hendrycksTest-econometrics|5\": {\n \"acc\": 0.2719298245614035,\n\
\ \"acc_stderr\": 0.04185774424022056,\n \"acc_norm\": 0.2719298245614035,\n\
\ \"acc_norm_stderr\": 0.04185774424022056\n },\n \"harness|hendrycksTest-electrical_engineering|5\"\
: {\n \"acc\": 0.43448275862068964,\n \"acc_stderr\": 0.04130740879555497,\n\
\ \"acc_norm\": 0.43448275862068964,\n \"acc_norm_stderr\": 0.04130740879555497\n\
\ },\n \"harness|hendrycksTest-elementary_mathematics|5\": {\n \"acc\"\
: 0.2566137566137566,\n \"acc_stderr\": 0.022494510767503154,\n \"\
acc_norm\": 0.2566137566137566,\n \"acc_norm_stderr\": 0.022494510767503154\n\
\ },\n \"harness|hendrycksTest-formal_logic|5\": {\n \"acc\": 0.25396825396825395,\n\
\ \"acc_stderr\": 0.03893259610604673,\n \"acc_norm\": 0.25396825396825395,\n\
\ \"acc_norm_stderr\": 0.03893259610604673\n },\n \"harness|hendrycksTest-global_facts|5\"\
: {\n \"acc\": 0.3,\n \"acc_stderr\": 0.046056618647183814,\n \
\ \"acc_norm\": 0.3,\n \"acc_norm_stderr\": 0.046056618647183814\n \
\ },\n \"harness|hendrycksTest-high_school_biology|5\": {\n \"acc\": 0.5161290322580645,\n\
\ \"acc_stderr\": 0.028429203176724555,\n \"acc_norm\": 0.5161290322580645,\n\
\ \"acc_norm_stderr\": 0.028429203176724555\n },\n \"harness|hendrycksTest-high_school_chemistry|5\"\
: {\n \"acc\": 0.35467980295566504,\n \"acc_stderr\": 0.0336612448905145,\n\
\ \"acc_norm\": 0.35467980295566504,\n \"acc_norm_stderr\": 0.0336612448905145\n\
\ },\n \"harness|hendrycksTest-high_school_computer_science|5\": {\n \
\ \"acc\": 0.4,\n \"acc_stderr\": 0.049236596391733084,\n \"acc_norm\"\
: 0.4,\n \"acc_norm_stderr\": 0.049236596391733084\n },\n \"harness|hendrycksTest-high_school_european_history|5\"\
: {\n \"acc\": 0.5212121212121212,\n \"acc_stderr\": 0.03900828913737302,\n\
\ \"acc_norm\": 0.5212121212121212,\n \"acc_norm_stderr\": 0.03900828913737302\n\
\ },\n \"harness|hendrycksTest-high_school_geography|5\": {\n \"acc\"\
: 0.5252525252525253,\n \"acc_stderr\": 0.03557806245087314,\n \"\
acc_norm\": 0.5252525252525253,\n \"acc_norm_stderr\": 0.03557806245087314\n\
\ },\n \"harness|hendrycksTest-high_school_government_and_politics|5\": {\n\
\ \"acc\": 0.5647668393782384,\n \"acc_stderr\": 0.035780381650085846,\n\
\ \"acc_norm\": 0.5647668393782384,\n \"acc_norm_stderr\": 0.035780381650085846\n\
\ },\n \"harness|hendrycksTest-high_school_macroeconomics|5\": {\n \
\ \"acc\": 0.382051282051282,\n \"acc_stderr\": 0.024635549163908227,\n \
\ \"acc_norm\": 0.382051282051282,\n \"acc_norm_stderr\": 0.024635549163908227\n\
\ },\n \"harness|hendrycksTest-high_school_mathematics|5\": {\n \"\
acc\": 0.22962962962962963,\n \"acc_stderr\": 0.025644108639267613,\n \
\ \"acc_norm\": 0.22962962962962963,\n \"acc_norm_stderr\": 0.025644108639267613\n\
\ },\n \"harness|hendrycksTest-high_school_microeconomics|5\": {\n \
\ \"acc\": 0.35294117647058826,\n \"acc_stderr\": 0.031041941304059274,\n\
\ \"acc_norm\": 0.35294117647058826,\n \"acc_norm_stderr\": 0.031041941304059274\n\
\ },\n \"harness|hendrycksTest-high_school_physics|5\": {\n \"acc\"\
: 0.33112582781456956,\n \"acc_stderr\": 0.038425817186598696,\n \"\
acc_norm\": 0.33112582781456956,\n \"acc_norm_stderr\": 0.038425817186598696\n\
\ },\n \"harness|hendrycksTest-high_school_psychology|5\": {\n \"acc\"\
: 0.5614678899082569,\n \"acc_stderr\": 0.021274713073954572,\n \"\
acc_norm\": 0.5614678899082569,\n \"acc_norm_stderr\": 0.021274713073954572\n\
\ },\n \"harness|hendrycksTest-high_school_statistics|5\": {\n \"acc\"\
: 0.25,\n \"acc_stderr\": 0.029531221160930918,\n \"acc_norm\": 0.25,\n\
\ \"acc_norm_stderr\": 0.029531221160930918\n },\n \"harness|hendrycksTest-high_school_us_history|5\"\
: {\n \"acc\": 0.5441176470588235,\n \"acc_stderr\": 0.03495624522015475,\n\
\ \"acc_norm\": 0.5441176470588235,\n \"acc_norm_stderr\": 0.03495624522015475\n\
\ },\n \"harness|hendrycksTest-high_school_world_history|5\": {\n \"\
acc\": 0.5907172995780591,\n \"acc_stderr\": 0.032007041833595914,\n \
\ \"acc_norm\": 0.5907172995780591,\n \"acc_norm_stderr\": 0.032007041833595914\n\
\ },\n \"harness|hendrycksTest-human_aging|5\": {\n \"acc\": 0.4663677130044843,\n\
\ \"acc_stderr\": 0.033481800170603065,\n \"acc_norm\": 0.4663677130044843,\n\
\ \"acc_norm_stderr\": 0.033481800170603065\n },\n \"harness|hendrycksTest-human_sexuality|5\"\
: {\n \"acc\": 0.5267175572519084,\n \"acc_stderr\": 0.04379024936553894,\n\
\ \"acc_norm\": 0.5267175572519084,\n \"acc_norm_stderr\": 0.04379024936553894\n\
\ },\n \"harness|hendrycksTest-international_law|5\": {\n \"acc\":\
\ 0.5867768595041323,\n \"acc_stderr\": 0.04495087843548408,\n \"\
acc_norm\": 0.5867768595041323,\n \"acc_norm_stderr\": 0.04495087843548408\n\
\ },\n \"harness|hendrycksTest-jurisprudence|5\": {\n \"acc\": 0.5,\n\
\ \"acc_stderr\": 0.04833682445228318,\n \"acc_norm\": 0.5,\n \
\ \"acc_norm_stderr\": 0.04833682445228318\n },\n \"harness|hendrycksTest-logical_fallacies|5\"\
: {\n \"acc\": 0.5030674846625767,\n \"acc_stderr\": 0.03928297078179663,\n\
\ \"acc_norm\": 0.5030674846625767,\n \"acc_norm_stderr\": 0.03928297078179663\n\
\ },\n \"harness|hendrycksTest-machine_learning|5\": {\n \"acc\": 0.3482142857142857,\n\
\ \"acc_stderr\": 0.04521829902833586,\n \"acc_norm\": 0.3482142857142857,\n\
\ \"acc_norm_stderr\": 0.04521829902833586\n },\n \"harness|hendrycksTest-management|5\"\
: {\n \"acc\": 0.5825242718446602,\n \"acc_stderr\": 0.048828405482122375,\n\
\ \"acc_norm\": 0.5825242718446602,\n \"acc_norm_stderr\": 0.048828405482122375\n\
\ },\n \"harness|hendrycksTest-marketing|5\": {\n \"acc\": 0.6709401709401709,\n\
\ \"acc_stderr\": 0.03078232157768817,\n \"acc_norm\": 0.6709401709401709,\n\
\ \"acc_norm_stderr\": 0.03078232157768817\n },\n \"harness|hendrycksTest-medical_genetics|5\"\
: {\n \"acc\": 0.48,\n \"acc_stderr\": 0.050211673156867795,\n \
\ \"acc_norm\": 0.48,\n \"acc_norm_stderr\": 0.050211673156867795\n \
\ },\n \"harness|hendrycksTest-miscellaneous|5\": {\n \"acc\": 0.49936143039591313,\n\
\ \"acc_stderr\": 0.01787994891443168,\n \"acc_norm\": 0.49936143039591313,\n\
\ \"acc_norm_stderr\": 0.01787994891443168\n },\n \"harness|hendrycksTest-moral_disputes|5\"\
: {\n \"acc\": 0.4653179190751445,\n \"acc_stderr\": 0.026854257928258893,\n\
\ \"acc_norm\": 0.4653179190751445,\n \"acc_norm_stderr\": 0.026854257928258893\n\
\ },\n \"harness|hendrycksTest-moral_scenarios|5\": {\n \"acc\": 0.24692737430167597,\n\
\ \"acc_stderr\": 0.014422292204808862,\n \"acc_norm\": 0.24692737430167597,\n\
\ \"acc_norm_stderr\": 0.014422292204808862\n },\n \"harness|hendrycksTest-nutrition|5\"\
: {\n \"acc\": 0.5163398692810458,\n \"acc_stderr\": 0.028614624752805434,\n\
\ \"acc_norm\": 0.5163398692810458,\n \"acc_norm_stderr\": 0.028614624752805434\n\
\ },\n \"harness|hendrycksTest-philosophy|5\": {\n \"acc\": 0.4855305466237942,\n\
\ \"acc_stderr\": 0.02838619808417768,\n \"acc_norm\": 0.4855305466237942,\n\
\ \"acc_norm_stderr\": 0.02838619808417768\n },\n \"harness|hendrycksTest-prehistory|5\"\
: {\n \"acc\": 0.45987654320987653,\n \"acc_stderr\": 0.027731022753539274,\n\
\ \"acc_norm\": 0.45987654320987653,\n \"acc_norm_stderr\": 0.027731022753539274\n\
\ },\n \"harness|hendrycksTest-professional_accounting|5\": {\n \"\
acc\": 0.3475177304964539,\n \"acc_stderr\": 0.028406627809590947,\n \
\ \"acc_norm\": 0.3475177304964539,\n \"acc_norm_stderr\": 0.028406627809590947\n\
\ },\n \"harness|hendrycksTest-professional_law|5\": {\n \"acc\": 0.3533246414602347,\n\
\ \"acc_stderr\": 0.012208408211082428,\n \"acc_norm\": 0.3533246414602347,\n\
\ \"acc_norm_stderr\": 0.012208408211082428\n },\n \"harness|hendrycksTest-professional_medicine|5\"\
: {\n \"acc\": 0.2757352941176471,\n \"acc_stderr\": 0.02714627193662517,\n\
\ \"acc_norm\": 0.2757352941176471,\n \"acc_norm_stderr\": 0.02714627193662517\n\
\ },\n \"harness|hendrycksTest-professional_psychology|5\": {\n \"\
acc\": 0.4133986928104575,\n \"acc_stderr\": 0.01992211568278667,\n \
\ \"acc_norm\": 0.4133986928104575,\n \"acc_norm_stderr\": 0.01992211568278667\n\
\ },\n \"harness|hendrycksTest-public_relations|5\": {\n \"acc\": 0.5181818181818182,\n\
\ \"acc_stderr\": 0.04785964010794916,\n \"acc_norm\": 0.5181818181818182,\n\
\ \"acc_norm_stderr\": 0.04785964010794916\n },\n \"harness|hendrycksTest-security_studies|5\"\
: {\n \"acc\": 0.5346938775510204,\n \"acc_stderr\": 0.03193207024425314,\n\
\ \"acc_norm\": 0.5346938775510204,\n \"acc_norm_stderr\": 0.03193207024425314\n\
\ },\n \"harness|hendrycksTest-sociology|5\": {\n \"acc\": 0.5771144278606966,\n\
\ \"acc_stderr\": 0.034932317774212816,\n \"acc_norm\": 0.5771144278606966,\n\
\ \"acc_norm_stderr\": 0.034932317774212816\n },\n \"harness|hendrycksTest-us_foreign_policy|5\"\
: {\n \"acc\": 0.61,\n \"acc_stderr\": 0.04902071300001974,\n \
\ \"acc_norm\": 0.61,\n \"acc_norm_stderr\": 0.04902071300001974\n \
\ },\n \"harness|hendrycksTest-virology|5\": {\n \"acc\": 0.43373493975903615,\n\
\ \"acc_stderr\": 0.03858158940685516,\n \"acc_norm\": 0.43373493975903615,\n\
\ \"acc_norm_stderr\": 0.03858158940685516\n },\n \"harness|hendrycksTest-world_religions|5\"\
: {\n \"acc\": 0.5029239766081871,\n \"acc_stderr\": 0.03834759370936839,\n\
\ \"acc_norm\": 0.5029239766081871,\n \"acc_norm_stderr\": 0.03834759370936839\n\
\ },\n \"harness|truthfulqa:mc|0\": {\n \"mc1\": 0.3023255813953488,\n\
\ \"mc1_stderr\": 0.016077509266133022,\n \"mc2\": 0.49647374974901654,\n\
\ \"mc2_stderr\": 0.015915065186614973\n },\n \"harness|winogrande|5\"\
: {\n \"acc\": 0.6053670086819258,\n \"acc_stderr\": 0.013736915172371888\n\
\ },\n \"harness|gsm8k|5\": {\n \"acc\": 0.013646702047005308,\n \
\ \"acc_stderr\": 0.003195747075480817\n }\n}\n```"
repo_url: https://huggingface.co/teilomillet/MiniMerlin-3b-v0.1
leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
point_of_contact: clementine@hf.co
configs:
- config_name: harness_arc_challenge_25
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|arc:challenge|25_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|arc:challenge|25_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_gsm8k_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|gsm8k|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|gsm8k|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hellaswag_10
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hellaswag|10_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hellaswag|10_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-management|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-virology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-management|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-virology|5_2023-12-13T12-30-09.463717.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_abstract_algebra_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_anatomy_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-anatomy|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-anatomy|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_astronomy_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-astronomy|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-astronomy|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_business_ethics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_clinical_knowledge_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_biology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_biology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_biology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_chemistry_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_computer_science_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_mathematics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_medicine_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_college_physics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-college_physics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_physics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_computer_security_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-computer_security|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-computer_security|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_conceptual_physics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_econometrics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-econometrics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-econometrics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_electrical_engineering_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_elementary_mathematics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_formal_logic_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_global_facts_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-global_facts|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-global_facts|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_biology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_chemistry_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_computer_science_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_european_history_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_geography_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_government_and_politics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_macroeconomics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_mathematics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_microeconomics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_physics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_psychology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_statistics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_us_history_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_high_school_world_history_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_human_aging_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-human_aging|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_aging|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_human_sexuality_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_international_law_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-international_law|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-international_law|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_jurisprudence_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_logical_fallacies_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_machine_learning_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_management_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-management|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-management|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_marketing_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-marketing|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-marketing|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_medical_genetics_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_miscellaneous_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_moral_disputes_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_moral_scenarios_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_nutrition_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-nutrition|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-nutrition|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_philosophy_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-philosophy|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-philosophy|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_prehistory_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-prehistory|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-prehistory|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_professional_accounting_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_professional_law_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-professional_law|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_law|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_professional_medicine_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_professional_psychology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_public_relations_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-public_relations|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-public_relations|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_security_studies_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-security_studies|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-security_studies|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_sociology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-sociology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-sociology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_us_foreign_policy_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_virology_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-virology|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-virology|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_hendrycksTest_world_religions_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|hendrycksTest-world_religions|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-world_religions|5_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_truthfulqa_mc_0
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|truthfulqa:mc|0_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|truthfulqa:mc|0_2023-12-13T12-30-09.463717.parquet'
- config_name: harness_winogrande_5
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- '**/details_harness|winogrande|5_2023-12-13T12-30-09.463717.parquet'
- split: latest
path:
- '**/details_harness|winogrande|5_2023-12-13T12-30-09.463717.parquet'
- config_name: results
data_files:
- split: 2023_12_13T12_30_09.463717
path:
- results_2023-12-13T12-30-09.463717.parquet
- split: latest
path:
- results_2023-12-13T12-30-09.463717.parquet
---
# Dataset Card for Evaluation run of teilomillet/MiniMerlin-3b-v0.1
<!-- Provide a quick summary of the dataset. -->
Dataset automatically created during the evaluation run of model [teilomillet/MiniMerlin-3b-v0.1](https://huggingface.co/teilomillet/MiniMerlin-3b-v0.1) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
The dataset is composed of 63 configurations, each one corresponding to one of the evaluated tasks.
The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "latest" split always points to the latest results.
An additional configuration "results" stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).
To load the details from a run, you can for instance do the following:
```python
from datasets import load_dataset
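data = load_dataset(
    "open-llm-leaderboard/details_teilomillet__MiniMerlin-3b-v0.1",
    "harness_winogrande_5",  # configuration name (one per evaluated task)
    split="latest",  # splits defined in this card: the run timestamp or "latest"
)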
```
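The configuration and split names used in this card appear to follow a simple pattern derived from the harness task keys and the run timestamp (for example, the task key `harness|truthfulqa:mc|0` corresponds to the configuration `harness_truthfulqa_mc_0`). The sketch below encodes that observed pattern. The helper names `config_name_for_task` and `split_name_for_run` are hypothetical and not part of the `datasets` library, and the mapping is inferred from the names listed in this card rather than from a documented rule.

```python
import re


def config_name_for_task(task_key: str) -> str:
    """Map a harness task key to the configuration name pattern seen in this card.

    Assumption: the separators "|", ":" and "-" each become "_".
    """
    return re.sub(r"[|:\-]", "_", task_key)


def split_name_for_run(run_timestamp: str) -> str:
    """Map a run timestamp to the split name pattern seen in this card.

    Assumption: "-" and ":" become "_", while "T" and "." are kept.
    """
    return re.sub(r"[:\-]", "_", run_timestamp)


print(config_name_for_task("harness|truthfulqa:mc|0"))
print(split_name_for_run("2023-12-13T12:30:09.463717"))
```

The resulting strings can then be passed as the configuration name and the `split` argument of `load_dataset`.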
## Latest results
These are the [latest results from run 2023-12-13T12:30:09.463717](https://huggingface.co/datasets/open-llm-leaderboard/details_teilomillet__MiniMerlin-3b-v0.1/blob/main/results_2023-12-13T12-30-09.463717.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks; you can find each in the results and the "latest" split for each eval):
```python
{
"all": {
"acc": 0.42829802423091123,
"acc_stderr": 0.034419009383078604,
"acc_norm": 0.4345596062931712,
"acc_norm_stderr": 0.035301959046270974,
"mc1": 0.3023255813953488,
"mc1_stderr": 0.016077509266133022,
"mc2": 0.49647374974901654,
"mc2_stderr": 0.015915065186614973
},
"harness|arc:challenge|25": {
"acc": 0.38139931740614336,
"acc_stderr": 0.014194389086685261,
"acc_norm": 0.4069965870307167,
"acc_norm_stderr": 0.014356399418009131
},
"harness|hellaswag|10": {
"acc": 0.4343756223859789,
"acc_stderr": 0.004946617138983514,
"acc_norm": 0.5406293567018522,
"acc_norm_stderr": 0.004973280417705513
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.36,
"acc_stderr": 0.04824181513244218,
"acc_norm": 0.36,
"acc_norm_stderr": 0.04824181513244218
},
"harness|hendrycksTest-anatomy|5": {
"acc": 0.43703703703703706,
"acc_stderr": 0.042849586397533994,
"acc_norm": 0.43703703703703706,
"acc_norm_stderr": 0.042849586397533994
},
"harness|hendrycksTest-astronomy|5": {
"acc": 0.4473684210526316,
"acc_stderr": 0.040463368839782486,
"acc_norm": 0.4473684210526316,
"acc_norm_stderr": 0.040463368839782486
},
"harness|hendrycksTest-business_ethics|5": {
"acc": 0.38,
"acc_stderr": 0.048783173121456316,
"acc_norm": 0.38,
"acc_norm_stderr": 0.048783173121456316
},
"harness|hendrycksTest-clinical_knowledge|5": {
"acc": 0.47924528301886793,
"acc_stderr": 0.030746349975723463,
"acc_norm": 0.47924528301886793,
"acc_norm_stderr": 0.030746349975723463
},
"harness|hendrycksTest-college_biology|5": {
"acc": 0.4722222222222222,
"acc_stderr": 0.04174752578923185,
"acc_norm": 0.4722222222222222,
"acc_norm_stderr": 0.04174752578923185
},
"harness|hendrycksTest-college_chemistry|5": {
"acc": 0.33,
"acc_stderr": 0.047258156262526045,
"acc_norm": 0.33,
"acc_norm_stderr": 0.047258156262526045
},
"harness|hendrycksTest-college_computer_science|5": {
"acc": 0.45,
"acc_stderr": 0.049999999999999996,
"acc_norm": 0.45,
"acc_norm_stderr": 0.049999999999999996
},
"harness|hendrycksTest-college_mathematics|5": {
"acc": 0.28,
"acc_stderr": 0.04512608598542126,
"acc_norm": 0.28,
"acc_norm_stderr": 0.04512608598542126
},
"harness|hendrycksTest-college_medicine|5": {
"acc": 0.3872832369942196,
"acc_stderr": 0.037143259063020656,
"acc_norm": 0.3872832369942196,
"acc_norm_stderr": 0.037143259063020656
},
"harness|hendrycksTest-college_physics|5": {
"acc": 0.2549019607843137,
"acc_stderr": 0.04336432707993177,
"acc_norm": 0.2549019607843137,
"acc_norm_stderr": 0.04336432707993177
},
"harness|hendrycksTest-computer_security|5": {
"acc": 0.54,
"acc_stderr": 0.05009082659620332,
"acc_norm": 0.54,
"acc_norm_stderr": 0.05009082659620332
},
"harness|hendrycksTest-conceptual_physics|5": {
"acc": 0.3276595744680851,
"acc_stderr": 0.030683020843231,
"acc_norm": 0.3276595744680851,
"acc_norm_stderr": 0.030683020843231
},
"harness|hendrycksTest-econometrics|5": {
"acc": 0.2719298245614035,
"acc_stderr": 0.04185774424022056,
"acc_norm": 0.2719298245614035,
"acc_norm_stderr": 0.04185774424022056
},
"harness|hendrycksTest-electrical_engineering|5": {
"acc": 0.43448275862068964,
"acc_stderr": 0.04130740879555497,
"acc_norm": 0.43448275862068964,
"acc_norm_stderr": 0.04130740879555497
},
"harness|hendrycksTest-elementary_mathematics|5": {
"acc": 0.2566137566137566,
"acc_stderr": 0.022494510767503154,
"acc_norm": 0.2566137566137566,
"acc_norm_stderr": 0.022494510767503154
},
"harness|hendrycksTest-formal_logic|5": {
"acc": 0.25396825396825395,
"acc_stderr": 0.03893259610604673,
"acc_norm": 0.25396825396825395,
"acc_norm_stderr": 0.03893259610604673
},
"harness|hendrycksTest-global_facts|5": {
"acc": 0.3,
"acc_stderr": 0.046056618647183814,
"acc_norm": 0.3,
"acc_norm_stderr": 0.046056618647183814
},
"harness|hendrycksTest-high_school_biology|5": {
"acc": 0.5161290322580645,
"acc_stderr": 0.028429203176724555,
"acc_norm": 0.5161290322580645,
"acc_norm_stderr": 0.028429203176724555
},
"harness|hendrycksTest-high_school_chemistry|5": {
"acc": 0.35467980295566504,
"acc_stderr": 0.0336612448905145,
"acc_norm": 0.35467980295566504,
"acc_norm_stderr": 0.0336612448905145
},
"harness|hendrycksTest-high_school_computer_science|5": {
"acc": 0.4,
"acc_stderr": 0.049236596391733084,
"acc_norm": 0.4,
"acc_norm_stderr": 0.049236596391733084
},
"harness|hendrycksTest-high_school_european_history|5": {
"acc": 0.5212121212121212,
"acc_stderr": 0.03900828913737302,
"acc_norm": 0.5212121212121212,
"acc_norm_stderr": 0.03900828913737302
},
"harness|hendrycksTest-high_school_geography|5": {
"acc": 0.5252525252525253,
"acc_stderr": 0.03557806245087314,
"acc_norm": 0.5252525252525253,
"acc_norm_stderr": 0.03557806245087314
},
"harness|hendrycksTest-high_school_government_and_politics|5": {
"acc": 0.5647668393782384,
"acc_stderr": 0.035780381650085846,
"acc_norm": 0.5647668393782384,
"acc_norm_stderr": 0.035780381650085846
},
"harness|hendrycksTest-high_school_macroeconomics|5": {
"acc": 0.382051282051282,
"acc_stderr": 0.024635549163908227,
"acc_norm": 0.382051282051282,
"acc_norm_stderr": 0.024635549163908227
},
"harness|hendrycksTest-high_school_mathematics|5": {
"acc": 0.22962962962962963,
"acc_stderr": 0.025644108639267613,
"acc_norm": 0.22962962962962963,
"acc_norm_stderr": 0.025644108639267613
},
"harness|hendrycksTest-high_school_microeconomics|5": {
"acc": 0.35294117647058826,
"acc_stderr": 0.031041941304059274,
"acc_norm": 0.35294117647058826,
"acc_norm_stderr": 0.031041941304059274
},
"harness|hendrycksTest-high_school_physics|5": {
"acc": 0.33112582781456956,
"acc_stderr": 0.038425817186598696,
"acc_norm": 0.33112582781456956,
"acc_norm_stderr": 0.038425817186598696
},
"harness|hendrycksTest-high_school_psychology|5": {
"acc": 0.5614678899082569,
"acc_stderr": 0.021274713073954572,
"acc_norm": 0.5614678899082569,
"acc_norm_stderr": 0.021274713073954572
},
"harness|hendrycksTest-high_school_statistics|5": {
"acc": 0.25,
"acc_stderr": 0.029531221160930918,
"acc_norm": 0.25,
"acc_norm_stderr": 0.029531221160930918
},
"harness|hendrycksTest-high_school_us_history|5": {
"acc": 0.5441176470588235,
"acc_stderr": 0.03495624522015475,
"acc_norm": 0.5441176470588235,
"acc_norm_stderr": 0.03495624522015475
},
"harness|hendrycksTest-high_school_world_history|5": {
"acc": 0.5907172995780591,
"acc_stderr": 0.032007041833595914,
"acc_norm": 0.5907172995780591,
"acc_norm_stderr": 0.032007041833595914
},
"harness|hendrycksTest-human_aging|5": {
"acc": 0.4663677130044843,
"acc_stderr": 0.033481800170603065,
"acc_norm": 0.4663677130044843,
"acc_norm_stderr": 0.033481800170603065
},
"harness|hendrycksTest-human_sexuality|5": {
"acc": 0.5267175572519084,
"acc_stderr": 0.04379024936553894,
"acc_norm": 0.5267175572519084,
"acc_norm_stderr": 0.04379024936553894
},
"harness|hendrycksTest-international_law|5": {
"acc": 0.5867768595041323,
"acc_stderr": 0.04495087843548408,
"acc_norm": 0.5867768595041323,
"acc_norm_stderr": 0.04495087843548408
},
"harness|hendrycksTest-jurisprudence|5": {
"acc": 0.5,
"acc_stderr": 0.04833682445228318,
"acc_norm": 0.5,
"acc_norm_stderr": 0.04833682445228318
},
"harness|hendrycksTest-logical_fallacies|5": {
"acc": 0.5030674846625767,
"acc_stderr": 0.03928297078179663,
"acc_norm": 0.5030674846625767,
"acc_norm_stderr": 0.03928297078179663
},
"harness|hendrycksTest-machine_learning|5": {
"acc": 0.3482142857142857,
"acc_stderr": 0.04521829902833586,
"acc_norm": 0.3482142857142857,
"acc_norm_stderr": 0.04521829902833586
},
"harness|hendrycksTest-management|5": {
"acc": 0.5825242718446602,
"acc_stderr": 0.048828405482122375,
"acc_norm": 0.5825242718446602,
"acc_norm_stderr": 0.048828405482122375
},
"harness|hendrycksTest-marketing|5": {
"acc": 0.6709401709401709,
"acc_stderr": 0.03078232157768817,
"acc_norm": 0.6709401709401709,
"acc_norm_stderr": 0.03078232157768817
},
"harness|hendrycksTest-medical_genetics|5": {
"acc": 0.48,
"acc_stderr": 0.050211673156867795,
"acc_norm": 0.48,
"acc_norm_stderr": 0.050211673156867795
},
"harness|hendrycksTest-miscellaneous|5": {
"acc": 0.49936143039591313,
"acc_stderr": 0.01787994891443168,
"acc_norm": 0.49936143039591313,
"acc_norm_stderr": 0.01787994891443168
},
"harness|hendrycksTest-moral_disputes|5": {
"acc": 0.4653179190751445,
"acc_stderr": 0.026854257928258893,
"acc_norm": 0.4653179190751445,
"acc_norm_stderr": 0.026854257928258893
},
"harness|hendrycksTest-moral_scenarios|5": {
"acc": 0.24692737430167597,
"acc_stderr": 0.014422292204808862,
"acc_norm": 0.24692737430167597,
"acc_norm_stderr": 0.014422292204808862
},
"harness|hendrycksTest-nutrition|5": {
"acc": 0.5163398692810458,
"acc_stderr": 0.028614624752805434,
"acc_norm": 0.5163398692810458,
"acc_norm_stderr": 0.028614624752805434
},
"harness|hendrycksTest-philosophy|5": {
"acc": 0.4855305466237942,
"acc_stderr": 0.02838619808417768,
"acc_norm": 0.4855305466237942,
"acc_norm_stderr": 0.02838619808417768
},
"harness|hendrycksTest-prehistory|5": {
"acc": 0.45987654320987653,
"acc_stderr": 0.027731022753539274,
"acc_norm": 0.45987654320987653,
"acc_norm_stderr": 0.027731022753539274
},
"harness|hendrycksTest-professional_accounting|5": {
"acc": 0.3475177304964539,
"acc_stderr": 0.028406627809590947,
"acc_norm": 0.3475177304964539,
"acc_norm_stderr": 0.028406627809590947
},
"harness|hendrycksTest-professional_law|5": {
"acc": 0.3533246414602347,
"acc_stderr": 0.012208408211082428,
"acc_norm": 0.3533246414602347,
"acc_norm_stderr": 0.012208408211082428
},
"harness|hendrycksTest-professional_medicine|5": {
"acc": 0.2757352941176471,
"acc_stderr": 0.02714627193662517,
"acc_norm": 0.2757352941176471,
"acc_norm_stderr": 0.02714627193662517
},
"harness|hendrycksTest-professional_psychology|5": {
"acc": 0.4133986928104575,
"acc_stderr": 0.01992211568278667,
"acc_norm": 0.4133986928104575,
"acc_norm_stderr": 0.01992211568278667
},
"harness|hendrycksTest-public_relations|5": {
"acc": 0.5181818181818182,
"acc_stderr": 0.04785964010794916,
"acc_norm": 0.5181818181818182,
"acc_norm_stderr": 0.04785964010794916
},
"harness|hendrycksTest-security_studies|5": {
"acc": 0.5346938775510204,
"acc_stderr": 0.03193207024425314,
"acc_norm": 0.5346938775510204,
"acc_norm_stderr": 0.03193207024425314
},
"harness|hendrycksTest-sociology|5": {
"acc": 0.5771144278606966,
"acc_stderr": 0.034932317774212816,
"acc_norm": 0.5771144278606966,
"acc_norm_stderr": 0.034932317774212816
},
"harness|hendrycksTest-us_foreign_policy|5": {
"acc": 0.61,
"acc_stderr": 0.04902071300001974,
"acc_norm": 0.61,
"acc_norm_stderr": 0.04902071300001974
},
"harness|hendrycksTest-virology|5": {
"acc": 0.43373493975903615,
"acc_stderr": 0.03858158940685516,
"acc_norm": 0.43373493975903615,
"acc_norm_stderr": 0.03858158940685516
},
"harness|hendrycksTest-world_religions|5": {
"acc": 0.5029239766081871,
"acc_stderr": 0.03834759370936839,
"acc_norm": 0.5029239766081871,
"acc_norm_stderr": 0.03834759370936839
},
"harness|truthfulqa:mc|0": {
"mc1": 0.3023255813953488,
"mc1_stderr": 0.016077509266133022,
"mc2": 0.49647374974901654,
"mc2_stderr": 0.015915065186614973
},
"harness|winogrande|5": {
"acc": 0.6053670086819258,
"acc_stderr": 0.013736915172371888
},
"harness|gsm8k|5": {
"acc": 0.013646702047005308,
"acc_stderr": 0.003195747075480817
}
}
```
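Each entry pairs a score with its standard error, so a rough uncertainty range can be read off directly. The sketch below is a minimal illustration, assuming a normal approximation (score ± 1.96 × stderr) for a 95% interval. The helper names `approx_ci95` and `mean_metric` and the small `excerpt` dictionary (values copied from the results above) are introduced here for illustration only and are not part of the leaderboard tooling.

```python
from statistics import mean

# A few entries copied from the results above (excerpt only).
excerpt = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.36, "acc_stderr": 0.04824181513244218},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.43703703703703706, "acc_stderr": 0.042849586397533994},
    "harness|winogrande|5": {"acc": 0.6053670086819258, "acc_stderr": 0.013736915172371888},
    "harness|gsm8k|5": {"acc": 0.013646702047005308, "acc_stderr": 0.003195747075480817},
}


def approx_ci95(entry, metric="acc"):
    """Return (low, high) for metric +/- 1.96 * stderr (normal approximation)."""
    half_width = 1.96 * entry[f"{metric}_stderr"]
    return entry[metric] - half_width, entry[metric] + half_width


def mean_metric(results, prefix, metric="acc"):
    """Unweighted mean of `metric` over tasks whose key starts with `prefix`."""
    return mean(v[metric] for k, v in results.items() if k.startswith(prefix))


low, high = approx_ci95(excerpt["harness|winogrande|5"])
print(f"winogrande acc: 95% interval approx. [{low:.3f}, {high:.3f}]")

mmlu_mean = mean_metric(excerpt, "harness|hendrycksTest-")
print(f"mean acc over the two hendrycksTest tasks in the excerpt: {mmlu_mean:.3f}")
```

The normal approximation is crude for scores close to 0 or 1, such as the GSM8K accuracy here, so the interval should be read as a rough guide only.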
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
[More Information Needed]
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
[More Information Needed]
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
[More Information Needed]
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
[More Information Needed]
Provided by:
open-llm-leaderboard-old
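Summary of original information
Dataset overview
Dataset creation
- The dataset was created automatically during the evaluation run of the model teilomillet/MiniMerlin-3b-v0.1 on the Open LLM Leaderboard.
Dataset structure
- The dataset contains 63 configurations, each corresponding to one evaluation task.
- The dataset was created from 1 run. Each run appears as a specific split in each configuration, and the split is named after the run's timestamp.
- The "latest" split always points to the latest results.
- An additional "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.
Data loading example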
    from datasets import load_dataset
    data = load_dataset("open-llm-leaderboard/details_teilomillet__MiniMerlin-3b-v0.1", "harness_winogrande_5", split="latest")
Latest results
- The latest results come from run 2023-12-13T12:30:09.463717 and include evaluation results for multiple tasks.
Configuration details
- harness_arc_challenge_25
  - Splits: 2023_12_13T12_30_09.463717, latest
  - Path: `**/details_harness|arc:challenge|25_2023-12-13T12-30-09.463717.parquet`
- harness_gsm8k_5
  - Splits: 2023_12_13T12_30_09.463717, latest
  - Path: `**/details_harness|gsm8k|5_2023-12-13T12-30-09.463717.parquet`
- harness_hellaswag_10
  - Splits: 2023_12_13T12_30_09.463717, latest
  - Path: `**/details_harness|hellaswag|10_2023-12-13T12-30-09.463717.parquet`
- harness_hendrycksTest_5
  - Splits: 2023_12_13T12_30_09.463717, latest
  - Paths: multiple paths, including `**/details_harness|hendrycksTest-abstract_algebra|5_2023-12-13T12-30-09.463717.parquet`, among others.
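The path patterns listed above seem to combine the task key with the run timestamp, with ":" in the time replaced by "-". As a minimal sketch under that assumption, the hypothetical helper `details_glob` below rebuilds such a pattern and checks it against an example file path using Python's standard `fnmatch` module. The directory name in the example path is made up for illustration and is not taken from the repository, and `fnmatch` only approximates how the Hub resolves `**` patterns.

```python
from fnmatch import fnmatch


def details_glob(task_key: str, run_timestamp: str) -> str:
    """Build the parquet glob pattern seen in this card for one task and run.

    Assumption: file names use the run timestamp with ":" replaced by "-".
    """
    file_timestamp = run_timestamp.replace(":", "-")
    return f"**/details_{task_key}_{file_timestamp}.parquet"


pattern = details_glob("harness|gsm8k|5", "2023-12-13T12:30:09.463717")
print(pattern)

# Hypothetical file path, used only to show how the pattern matches.
example_path = "some_dir/details_harness|gsm8k|5_2023-12-13T12-30-09.463717.parquet"
print(fnmatch(example_path, pattern))
```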
Collected summary
Dataset introduction

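Construction method
In the field of large language model evaluation, this dataset is an automated by-product of the Open LLM Leaderboard evaluation pipeline. Its core mechanism is to automatically capture and structure the detailed results of every evaluation task while the model teilomillet/MiniMerlin-3b-v0.1 is being systematically evaluated. The data are organized into 63 separate configurations, each corresponding to one specific evaluation task, such as the ARC Challenge or HellaSwag reasoning. Each evaluation run produces a uniquely timestamped split, while the "latest" split always points to the most recent evaluation results, which keeps the data current and traceable.
Features
The dataset characterizes model performance along many dimensions and at fine granularity. It covers broad tasks such as commonsense reasoning, knowledge question answering, and mathematical computation, and it also provides in-depth evaluation data for specialized subjects such as law, medicine, and physics. Each task configuration records the accuracy and its standard error; for example, the model reaches an accuracy of 0.434 on HellaSwag, while its accuracy on the GSM8K math reasoning task is much lower, at about 0.014. This structure lets researchers analyze performance from aggregate metrics down to individual tasks, providing a solid data basis for comparing models and tracking them over time.
Usage
To use the dataset effectively, researchers can load it with Hugging Face's `datasets` library. Typical usage is to call `load_dataset` with the dataset name, the target configuration (such as `harness_winogrande_5`), and the desired split. The dataset supports accessing historical evaluation runs by timestamp, or the most recent results through the "latest" split. The loaded data come in a structured format that is convenient for subsequent statistical analysis, performance visualization, or integration into a model evaluation pipeline, supporting model iteration and academic research.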
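Background and challenges
Background overview
Against the backdrop of rapid progress in large language models (LLMs), evaluating model performance has become a key part of driving the technology forward. The Open LLM Leaderboard, an authoritative evaluation framework on the Hugging Face platform, aims to evaluate open-source language models systematically on standardized test sets. The dataset 'open-llm-leaderboard-old/details_teilomillet__MiniMerlin-3b-v0.1' is the collection of detailed evaluation results that this framework generated for the model 'MiniMerlin-3b-v0.1' in December 2023, built under the lead of the Hugging Face team. Its core research question is to quantify the model's performance on multi-dimensional tasks such as commonsense reasoning, specialized knowledge, and mathematical ability, giving the community a transparent, reproducible performance baseline that supports research on model comparison and optimization.
Current challenges
The domain challenge this dataset addresses is that evaluating large language models must cover a broad and complex range of cognitive abilities, from basic common sense to specialized subject knowledge, while keeping the evaluation comprehensive and fair. The construction challenges lie in data integration and standardization: results from heterogeneous evaluation tasks such as ARC, HellaSwag, MMLU, and GSM8K have to be brought into a uniform format, the time-series versions produced by multiple runs have to be managed, and metric computation has to remain accurate and traceable, in order to meet the reproducibility requirements of open science.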
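Common scenarios
Classic use cases
In the field of large language model evaluation, this dataset is an automated evaluation product of the Open LLM Leaderboard, and its classic use case is multi-dimensional performance benchmarking of the model teilomillet/MiniMerlin-3b-v0.1. Covering 63 standardized task configurations, including the ARC Challenge, HellaSwag, MMLU, and TruthfulQA, it gives researchers a fine-grained framework for evaluating the model's core cognitive abilities, such as commonsense reasoning, knowledge question answering, mathematical computation, and ethical judgment. This structured evaluation paradigm lets the research community systematically compare how different model architectures perform under a uniform set of metrics, providing data to help decide where to direct model optimization.
Derived related work
Classic research derived from this dataset focuses mainly on diagnosing model capabilities and on innovations in evaluation methodology. Researchers have used its fine-grained evaluation results to develop tools for tracing model capabilities, for example by comparing how models of different parameter scales perform on MMLU subtasks, revealing the connection between knowledge acquisition and model scaling laws. The dataset has also prompted several studies on improving evaluation frameworks, including dynamic evaluation protocol design, cross-task transferability analysis, and methods for correcting evaluation bias. Together, this work has helped shape the current methodology of large language model evaluation, which centers on task decomposition, error analysis, and capability mapping.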
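Recent research on the dataset
Latest research directions
In the field of large language model evaluation, the open-llm-leaderboard dataset, as a standardized evaluation platform, is driving fine-grained analysis of model performance. Current research focuses on using multi-dimensional task configurations, such as the ARC Challenge, HellaSwag, and the specialized subject tests in MMLU, to explore how models perform in knowledge reasoning, commonsense understanding, and adaptation to specialized domains. Frontier work aims to use this kind of evaluation data to optimize model architectures and training strategies, especially to balance generalization ability and efficiency in small models such as MiniMerlin-3b, providing the open-source community with a reproducible baseline and fostering a transparent, comparable ecosystem for model development.
The above content was collected and summarized by 遇见数据集.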



