
open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt

Source: Hugging Face. Updated: 2023-09-16. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt
Resource description:
--- pretty_name: Evaluation run of klosax/pythia-70m-deduped-step44k-92bt dataset_summary: "Dataset automatically created during the evaluation run of model\ \ [klosax/pythia-70m-deduped-step44k-92bt](https://huggingface.co/klosax/pythia-70m-deduped-step44k-92bt)\ \ on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\ \nThe dataset is composed of 64 configurations, each one corresponding to one of the\ \ evaluated tasks.\n\nThe dataset has been created from 3 run(s). Each run can be\ \ found as a specific split in each configuration, the split being named using the\ \ timestamp of the run. The \"train\" split is always pointing to the latest results.\n\ \nAn additional configuration \"results\" stores all the aggregated results of the\ \ run (and is used to compute and display the aggregated metrics on the [Open LLM\ \ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\ \nTo load the details from a run, you can for instance do the following:\n```python\n\ from datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt\"\ ,\n\t\"harness_winogrande_5\",\n\tsplit=\"train\")\n```\n\n## Latest results\n\n\ These are the [latest results from run 2023-09-16T19:22:51.930931](https://huggingface.co/datasets/open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt/blob/main/results_2023-09-16T19-22-51.930931.json) (note\ \ that there might be results for other tasks in the repo if successive evals didn't\ \ cover the same tasks. 
You can find each in the results and the \"latest\" split for\ \ each eval):\n\n```python\n{\n \"all\": {\n \"em\": 0.0005243288590604027,\n\ \ \"em_stderr\": 0.000234437804648362,\n \"f1\": 0.023688129194630956,\n\ \ \"f1_stderr\": 0.0008485245166671287,\n \"acc\": 0.25769534333070243,\n\ \ \"acc_stderr\": 0.007022913394891831\n },\n \"harness|drop|3\": {\n\ \ \"em\": 0.0005243288590604027,\n \"em_stderr\": 0.000234437804648362,\n\ \ \"f1\": 0.023688129194630956,\n \"f1_stderr\": 0.0008485245166671287\n\ \ },\n \"harness|gsm8k|5\": {\n \"acc\": 0.0,\n \"acc_stderr\"\ : 0.0\n },\n \"harness|winogrande|5\": {\n \"acc\": 0.5153906866614049,\n\ \ \"acc_stderr\": 0.014045826789783661\n }\n}\n```" repo_url: https://huggingface.co/klosax/pythia-70m-deduped-step44k-92bt leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard point_of_contact: clementine@hf.co configs: - config_name: harness_arc_challenge_25 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|arc:challenge|25_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|arc:challenge|25_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|arc:challenge|25_2023-07-24T09:43:50.721558.parquet' - config_name: harness_drop_3 data_files: - split: 2023_09_16T19_22_51.930931 path: - '**/details_harness|drop|3_2023-09-16T19-22-51.930931.parquet' - split: latest path: - '**/details_harness|drop|3_2023-09-16T19-22-51.930931.parquet' - config_name: harness_gsm8k_5 data_files: - split: 2023_09_16T19_22_51.930931 path: - '**/details_harness|gsm8k|5_2023-09-16T19-22-51.930931.parquet' - split: latest path: - '**/details_harness|gsm8k|5_2023-09-16T19-22-51.930931.parquet' - config_name: harness_hellaswag_10 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hellaswag|10_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - 
'**/details_harness|hellaswag|10_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hellaswag|10_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:26:02.759648.parquet' - 
'**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:26:02.759648.parquet' 
- '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-07-24T09:26:02.759648.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:43:50.721558.parquet' - 
'**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-07-24T09:43:50.721558.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_abstract_algebra_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_anatomy_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_astronomy_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_business_ethics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - 
'**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_clinical_knowledge_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_biology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_chemistry_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_computer_science_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_mathematics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_medicine_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_college_physics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_computer_security_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_conceptual_physics_5 data_files: - split: 
2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_econometrics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_electrical_engineering_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_elementary_mathematics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_formal_logic_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - 
'**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_global_facts_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_biology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_chemistry_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_computer_science_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_european_history_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_geography_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_government_and_politics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_macroeconomics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_mathematics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_microeconomics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_physics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_psychology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:43:50.721558.parquet' - 
config_name: harness_hendrycksTest_high_school_statistics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_us_history_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_high_school_world_history_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_human_aging_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_human_sexuality_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - 
'**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_international_law_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-international_law|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_jurisprudence_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_logical_fallacies_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_machine_learning_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - 
'**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_management_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-management|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_marketing_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-marketing|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_medical_genetics_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_miscellaneous_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_moral_disputes_5 data_files: - 
split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_moral_scenarios_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_nutrition_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-nutrition|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_philosophy_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-philosophy|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_prehistory_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-prehistory|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_professional_accounting_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_professional_law_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_law|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_professional_medicine_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_professional_psychology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-07-24T09:43:50.721558.parquet' - config_name: 
harness_hendrycksTest_public_relations_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-public_relations|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_security_studies_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-security_studies|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_sociology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-sociology|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_us_foreign_policy_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_virology_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-virology|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - 
'**/details_harness|hendrycksTest-virology|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-virology|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_hendrycksTest_world_religions_5 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|hendrycksTest-world_religions|5_2023-07-24T09:43:50.721558.parquet' - config_name: harness_truthfulqa_mc_0 data_files: - split: 2023_07_24T09_26_02.759648 path: - '**/details_harness|truthfulqa:mc|0_2023-07-24T09:26:02.759648.parquet' - split: 2023_07_24T09_43_50.721558 path: - '**/details_harness|truthfulqa:mc|0_2023-07-24T09:43:50.721558.parquet' - split: latest path: - '**/details_harness|truthfulqa:mc|0_2023-07-24T09:43:50.721558.parquet' - config_name: harness_winogrande_5 data_files: - split: 2023_09_16T19_22_51.930931 path: - '**/details_harness|winogrande|5_2023-09-16T19-22-51.930931.parquet' - split: latest path: - '**/details_harness|winogrande|5_2023-09-16T19-22-51.930931.parquet' - config_name: results data_files: - split: 2023_07_24T09_26_02.759648 path: - results_2023-07-24T09:26:02.759648.parquet - split: 2023_07_24T09_43_50.721558 path: - results_2023-07-24T09:43:50.721558.parquet - split: 2023_09_16T19_22_51.930931 path: - results_2023-09-16T19-22-51.930931.parquet - split: latest path: - results_2023-09-16T19-22-51.930931.parquet --- # Dataset Card for Evaluation run of klosax/pythia-70m-deduped-step44k-92bt ## Dataset Description - **Homepage:** - **Repository:** https://huggingface.co/klosax/pythia-70m-deduped-step44k-92bt - **Paper:** - **Leaderboard:** https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard - **Point of Contact:** clementine@hf.co ### Dataset Summary Dataset 
automatically created during the evaluation run of model [klosax/pythia-70m-deduped-step44k-92bt](https://huggingface.co/klosax/pythia-70m-deduped-step44k-92bt) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 3 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "latest" split always points to the latest results.

An additional configuration "results" stores all the aggregated results of the runs (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset
data = load_dataset("open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt",
    "harness_winogrande_5",
    split="latest")
```

## Latest results

These are the [latest results from run 2023-09-16T19:22:51.930931](https://huggingface.co/datasets/open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt/blob/main/results_2023-09-16T19-22-51.930931.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each of them in the "results" configuration and in the "latest" split for each eval):

```python
{
    "all": {
        "em": 0.0005243288590604027,
        "em_stderr": 0.000234437804648362,
        "f1": 0.023688129194630956,
        "f1_stderr": 0.0008485245166671287,
        "acc": 0.25769534333070243,
        "acc_stderr": 0.007022913394891831
    },
    "harness|drop|3": {
        "em": 0.0005243288590604027,
        "em_stderr": 0.000234437804648362,
        "f1": 0.023688129194630956,
        "f1_stderr": 0.0008485245166671287
    },
    "harness|gsm8k|5": {
        "acc": 0.0,
        "acc_stderr": 0.0
    },
    "harness|winogrande|5": {
        "acc": 0.5153906866614049,
        "acc_stderr": 0.014045826789783661
    }
}
```

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
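Provider: open-llm-leaderboard

Summary of original information

Dataset overview

Dataset source

This dataset was created automatically during the evaluation run of the model klosax/pythia-70m-deduped-step44k-92bt on the Open LLM Leaderboard.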

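Dataset structure

  • Number of configurations: 64, each corresponding to one evaluated task.
  • Data origin: the dataset was created from 3 runs. Each run is stored as a separate split in each configuration, and the split is named after the run's timestamp.
  • Latest results: the "latest" split always points to the most recent results.
  • Aggregated results: an additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.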
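
The naming rule for run splits can be sketched in a few lines. This is a minimal illustration inferred from the split names listed on this page, not documented leaderboard behavior: the helper name `split_name_for_run` is hypothetical, and the rule it encodes (replace `-` and `:` with `_`, keep `T` and the fractional-second dot) is an assumption based on the three runs shown.

```python
import re


def split_name_for_run(timestamp: str) -> str:
    """Map a run timestamp to the split name seen in the config listing.

    Assumption: '-' and ':' are replaced by '_', everything else is kept.
    """
    return re.sub(r"[-:]", "_", timestamp)


# The three run timestamps that appear on this page.
runs = [
    "2023-07-24T09:26:02.759648",
    "2023-07-24T09:43:50.721558",
    "2023-09-16T19:22:51.930931",
]
splits = [split_name_for_run(t) for t in runs]
```

A helper like this is convenient when a specific run, rather than the most recent one, has to be selected by its timestamp.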

Data loading example

    from datasets import load_dataset
    data = load_dataset("open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt", "harness_winogrande_5", split="latest")

Latest results

Latest results from the run 2023-09-16T19:22:51.930931:

    {
        "all": {
            "em": 0.0005243288590604027,
            "em_stderr": 0.000234437804648362,
            "f1": 0.023688129194630956,
            "f1_stderr": 0.0008485245166671287,
            "acc": 0.25769534333070243,
            "acc_stderr": 0.007022913394891831
        },
        "harness|drop|3": {
            "em": 0.0005243288590604027,
            "em_stderr": 0.000234437804648362,
            "f1": 0.023688129194630956,
            "f1_stderr": 0.0008485245166671287
        },
        "harness|gsm8k|5": {
            "acc": 0.0,
            "acc_stderr": 0.0
        },
        "harness|winogrande|5": {
            "acc": 0.5153906866614049,
            "acc_stderr": 0.014045826789783661
        }
    }
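
The numbers above can be sanity-checked with a short script. The sketch below assumes that the "all" entry is the unweighted mean of the per-task values (an inference from the figures shown here, not a documented rule) and uses a normal-approximation 95% interval with z = 1.96, which is also an assumption of this example rather than something the card states.

```python
import math

# Values copied from the latest-results block above.
results = {
    "all": {"acc": 0.25769534333070243, "acc_stderr": 0.007022913394891831},
    "harness|gsm8k|5": {"acc": 0.0, "acc_stderr": 0.0},
    "harness|winogrande|5": {
        "acc": 0.5153906866614049,
        "acc_stderr": 0.014045826789783661,
    },
}

# Unweighted mean of the per-task accuracies (assumed aggregation rule).
task_accs = [m["acc"] for task, m in results.items() if task != "all"]
mean_acc = sum(task_accs) / len(task_accs)
matches_all = math.isclose(mean_acc, results["all"]["acc"], rel_tol=1e-12)

# Approximate 95% confidence interval for Winogrande accuracy.
wg = results["harness|winogrande|5"]
z = 1.96  # normal approximation, an assumption of this sketch
ci_low = wg["acc"] - z * wg["acc_stderr"]
ci_high = wg["acc"] + z * wg["acc_stderr"]

# Winogrande is a two-choice task, so random guessing scores about 0.5.
chance_inside_ci = ci_low < 0.5 < ci_high
```

Under these assumptions the aggregate matches the mean of the two accuracy-based tasks, and the Winogrande interval contains 0.5, so the score is not clearly distinguishable from random guessing.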

Configuration details

  • harness_arc_challenge_25

    • Split: 2023_07_24T09_26_02.759648
      • Path: **/details_harness|arc:challenge|25_2023-07-24T09:26:02.759648.parquet
    • Split: 2023_07_24T09_43_50.721558
      • Path: **/details_harness|arc:challenge|25_2023-07-24T09:43:50.721558.parquet
    • Split: latest
      • Path: **/details_harness|arc:challenge|25_2023-07-24T09:43:50.721558.parquet
  • harness_drop_3

    • Split: 2023_09_16T19_22_51.930931
      • Path: **/details_harness|drop|3_2023-09-16T19-22-51.930931.parquet
    • Split: latest
      • Path: **/details_harness|drop|3_2023-09-16T19-22-51.930931.parquet
  • harness_gsm8k_5

    • Split: 2023_09_16T19_22_51.930931
      • Path: **/details_harness|gsm8k|5_2023-09-16T19-22-51.930931.parquet
    • Split: latest
      • Path: **/details_harness|gsm8k|5_2023-09-16T19-22-51.930931.parquet
  • harness_hellaswag_10

    • Split: 2023_07_24T09_26_02.759648
      • Path: **/details_harness|hellaswag|10_2023-07-24T09:26:02.759648.parquet
    • Split: 2023_07_24T09_43_50.721558
      • Path: **/details_harness|hellaswag|10_2023-07-24T09:43:50.721558.parquet
    • Split: latest
      • Path: **/details_harness|hellaswag|10_2023-07-24T09:43:50.721558.parquet
  • harness_hendrycksTest_5

    • Split: 2023_07_24T09_26_02.759648
      • Path: **/details_harness|hendrycksTest-abstract_algebra|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-anatomy|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-astronomy|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-business_ethics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-clinical_knowledge|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_biology|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_chemistry|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_computer_science|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_mathematics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_medicine|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-college_physics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-computer_security|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-conceptual_physics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-econometrics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-electrical_engineering|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-elementary_mathematics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-formal_logic|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-global_facts|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_biology|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_chemistry|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_computer_science|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_european_history|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_geography|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_mathematics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_microeconomics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_physics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_psychology|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_statistics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_us_history|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-high_school_world_history|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-human_aging|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-human_sexuality|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-international_law|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-jurisprudence|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-logical_fallacies|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-machine_learning|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-management|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-marketing|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-medical_genetics|5_2023-07-24T09:26:02.759648.parquet
      • Path: **/details_harness|hendrycksTest-miscellaneous|5_2023-07-24T09:26:02.759648.parquet
      • Path: `**/details_harness|hendrycksTest-moral_disputes|5_2023-07-24T09:26:02.759648
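
The relationship between harness task keys, configuration names, and file globs in the listing above can be sketched as follows. Both helpers are hypothetical and encode patterns inferred from this page only: every run of non-alphanumeric characters in a task key appears to become a single underscore in the configuration name, and the detail-file glob appears to be the task key followed by the run's file timestamp.

```python
import re


def config_name_for_task(task_key: str) -> str:
    """Derive a configuration name from a harness task key (assumed rule)."""
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key)


def details_glob(task_key: str, file_timestamp: str) -> str:
    """Build the glob for a task's detail file (assumed pattern).

    `file_timestamp` is passed through unchanged because the runs listed on
    this page do not all format the time part the same way in file names.
    """
    return f"**/details_{task_key}_{file_timestamp}.parquet"


arc_config = config_name_for_task("harness|arc:challenge|25")
arc_glob = details_glob("harness|arc:challenge|25", "2023-07-24T09:26:02.759648")
```

The timestamp is kept as an explicit argument because the July runs use colons in file names while the September run uses dashes, so no single conversion from the split name would cover both.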
Collected summary
Dataset introduction
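Construction
In large language model evaluation, the rigor with which a dataset is built bears directly on the reliability of the results. This dataset is a product of the Open LLM Leaderboard evaluation pipeline, and its construction is automated and systematic. It combines three independent evaluation runs, names each split after the timestamp of its run, and automatically generates detailed evaluation results covering 64 task configurations. Each configuration corresponds to one evaluation task and stores its evaluation details in Parquet format, which keeps the data efficient to access and process. This approach makes the evaluation process transparent and provides structured data for comparing model performance over time.
Features
The dataset is multi-dimensional. Its core feature is broad coverage of evaluation tasks, from commonsense reasoning to specialized knowledge, including ARC Challenge, DROP, GSM8K, HellaSwag, and the MMLU benchmark spanning 57 subjects. Timestamp-named splits preserve the history of every evaluation run, while the "latest" split provides the most recent results, which amounts to versioned management of evaluation results. This structure lets researchers track how a model's performance changes and run combined analyses across tasks and across time.
Usage
For researchers who want to analyze evaluation details in depth, the dataset offers flexible access. With the Hugging Face datasets library, users load data by specifying a task configuration and a split. For example, loading the detailed Winogrande results only requires calling load_dataset with the corresponding arguments. The "results" configuration collects the aggregated metrics for all tasks and can be used directly for overall performance analysis. This design supports both fine-grained, task-level analysis and a quick overview of the model's overall evaluation.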
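
As a minimal sketch of the access pattern described above, the helper below wraps `load_dataset` from the Hugging Face datasets library. The function name `load_config` and the default split "latest" are choices made for this illustration (the split name is taken from the configuration listing on this page), and the call assumes the repository is still available on the Hub.

```python
from typing import Any

REPO_ID = "open-llm-leaderboard/details_klosax__pythia-70m-deduped-step44k-92bt"


def load_config(config_name: str, split: str = "latest") -> Any:
    """Load one configuration of the details repository (illustrative helper)."""
    # Imported lazily so that defining the helper does not require the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Usage (needs network access and the `datasets` package installed):
# details = load_config("harness_winogrande_5")  # per-example evaluation details
# aggregated = load_config("results")            # aggregated metrics for the run
```

Passing a timestamp-named split instead of the default selects a specific earlier run.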
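Background and challenges
Background overview
As large language models (LLMs) develop rapidly, evaluating their performance has become a key part of driving technical progress. The Open LLM Leaderboard dataset was created by the Hugging Face team in 2023 to give the community a standardized, transparent platform for model evaluation. Through an automated pipeline, it systematically collects detailed results for models such as klosax/pythia-70m-deduped-step44k-92bt across many evaluation tasks, covering commonsense reasoning, mathematical problem solving, and specialized knowledge tests. Its central research question is how to objectively quantify a language model's ability across diverse tasks, so that models can be compared fairly and improved iteratively, in support of open science and reproducible research.
Current challenges
The domain problem this dataset addresses is the multi-dimensional evaluation of large language models. The challenge lies in designing a comprehensive and balanced evaluation suite that covers a wide range of skills, from basic language understanding to complex reasoning. The challenges in construction are technical: keeping the automated evaluation pipeline stable, unifying data formats across tasks, and storing and retrieving large volumes of results efficiently. In addition, because models and benchmarks evolve quickly, keeping the dataset current and backward compatible is an ongoing maintenance problem.
Common scenarios
Classic use cases
As a product of the Open LLM Leaderboard evaluation framework, the dataset is typically used to provide multi-task, fine-grained performance benchmarks for a specific model (such as pythia-70m-deduped-step44k-92bt). By combining diverse evaluation tasks such as ARC Challenge, HellaSwag, and Winogrande, researchers can systematically examine the limits of a model's commonsense reasoning, language understanding, and mathematical problem solving, which gives empirical grounds for model optimization and comparison.
Practical applications
In practice, the dataset serves as a decision-support tool for model developers, companies, and research institutions. Developers can use the detailed results to locate a model's weaknesses and adjust training data or architecture accordingly. Companies choosing a pretrained model can weigh cost against capability using performance data across several dimensions. Academic institutions can use it to study where model capabilities come from, including scaling laws and emergent behavior, and so support reliable and safe deployment of large language models.
Derived and related work
Work derived from this dataset centers on evaluation methodology and model diagnostic tools. For example, researchers have used its fine-grained results to propose task-clustering analysis frameworks that reveal the latent structure of model capabilities, and other work has used its multi-run evaluation data to build tools that visualize training dynamics and monitor a model's learning trajectory. This research has deepened the community's understanding of rigorous model evaluation and contributed to the continued development of standardized evaluation toolkits such as LM-Evaluation-Harness.
The content above was collected and summarized by 遇见数据集.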