
open-llm-leaderboard-old/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16

Source: Hugging Face · Updated: 2023-10-24 · Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16
Resource overview:
---
pretty_name: Evaluation run of openBuddy/openbuddy-llama2-34b-v11.1-bf16
dataset_summary: |
  Dataset automatically created during the evaluation run of model [openBuddy/openbuddy-llama2-34b-v11.1-bf16](https://huggingface.co/openBuddy/openbuddy-llama2-34b-v11.1-bf16) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

  The dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.

  The dataset has been created from 4 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.

  An additional configuration, "results", stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

  To load the details from a run, you can for instance do the following:

  ```python
  from datasets import load_dataset
  data = load_dataset("open-llm-leaderboard/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16",
      "harness_winogrande_5",
      split="train")
  ```

  ## Latest results

  These are the [latest results from run 2023-10-24T15:31:04.396852](https://huggingface.co/datasets/open-llm-leaderboard/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16/blob/main/results_2023-10-24T15-31-04.396852.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each in the results and the "latest" split for each eval):

  ```python
  {
      "all": {
          "em": 0.360633389261745,
          "em_stderr": 0.004917536525106699,
          "f1": 0.4180935402684579,
          "f1_stderr": 0.004778710905980245,
          "acc": 0.5268440191410464,
          "acc_stderr": 0.012939810741097795
      },
      "harness|drop|3": {
          "em": 0.360633389261745,
          "em_stderr": 0.004917536525106699,
          "f1": 0.4180935402684579,
          "f1_stderr": 0.004778710905980245
      },
      "harness|gsm8k|5": {
          "acc": 0.3457164518574678,
          "acc_stderr": 0.013100422990441578
      },
      "harness|winogrande|5": {
          "acc": 0.7079715864246251,
          "acc_stderr": 0.012779198491754013
      }
  }
  ```
repo_url: https://huggingface.co/openBuddy/openbuddy-llama2-34b-v11.1-bf16
leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
point_of_contact: clementine@hf.co
configs: - config_name: harness_arc_challenge_25 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|arc:challenge|25_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|arc:challenge|25_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|arc:challenge|25_2023-09-13T12-14-53.531149.parquet' - config_name: harness_drop_3 data_files: - split: 2023_10_24T13_56_54.496754 path: - '**/details_harness|drop|3_2023-10-24T13-56-54.496754.parquet' - split: 2023_10_24T15_31_04.396852 path: - '**/details_harness|drop|3_2023-10-24T15-31-04.396852.parquet' - split: latest path: - '**/details_harness|drop|3_2023-10-24T15-31-04.396852.parquet' - config_name: harness_gsm8k_5 data_files: - split: 2023_10_24T13_56_54.496754 path: - '**/details_harness|gsm8k|5_2023-10-24T13-56-54.496754.parquet' - split: 2023_10_24T15_31_04.396852 path: - '**/details_harness|gsm8k|5_2023-10-24T15-31-04.396852.parquet' - split: latest path: - '**/details_harness|gsm8k|5_2023-10-24T15-31-04.396852.parquet' - 
config_name: harness_hellaswag_10 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hellaswag|10_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hellaswag|10_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hellaswag|10_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T11-53-35.640501.parquet' - 
'**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T11-53-35.640501.parquet' - 
'**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-management|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-09-13T11-53-35.640501.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - 
'**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T12-14-53.531149.parquet' - 
'**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-management|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T12-14-53.531149.parquet' - 
'**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T12-14-53.531149.parquet' - 
'**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T12-14-53.531149.parquet' - 
'**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-management|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T12-14-53.531149.parquet' - 
'**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-09-13T12-14-53.531149.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_abstract_algebra_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_anatomy_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-anatomy|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_astronomy_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-astronomy|5_2023-09-13T12-14-53.531149.parquet' 
- config_name: harness_hendrycksTest_business_ethics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_clinical_knowledge_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_biology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_biology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_chemistry_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_computer_science_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - 
'**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_mathematics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_medicine_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_college_physics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_physics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_computer_security_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - 
'**/details_harness|hendrycksTest-computer_security|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-computer_security|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_conceptual_physics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_econometrics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-econometrics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_electrical_engineering_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_elementary_mathematics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_formal_logic_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_global_facts_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-global_facts|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_biology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_chemistry_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_computer_science_5 data_files: - split: 2023_09_13T11_53_35.640501 
path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_european_history_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_geography_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_government_and_politics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_macroeconomics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - 
'**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_mathematics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_microeconomics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_physics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_psychology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T11-53-35.640501.parquet' 
- split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_statistics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_us_history_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_high_school_world_history_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_human_aging_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - 
'**/details_harness|hendrycksTest-human_aging|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_aging|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_human_sexuality_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_international_law_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-international_law|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_jurisprudence_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_logical_fallacies_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T12-14-53.531149.parquet' - config_name: 
harness_hendrycksTest_machine_learning_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_management_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-management|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-management|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-management|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_marketing_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-marketing|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_medical_genetics_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_miscellaneous_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - 
'**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_moral_disputes_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_moral_scenarios_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_nutrition_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-nutrition|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_philosophy_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-philosophy|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_prehistory_5 data_files: - split: 
2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-prehistory|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_professional_accounting_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_professional_law_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_law|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_professional_medicine_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_professional_psychology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - 
'**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_public_relations_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-public_relations|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_security_studies_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-security_studies|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_sociology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-sociology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_us_foreign_policy_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-09-13T12-14-53.531149.parquet' - config_name: 
harness_hendrycksTest_virology_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-virology|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-virology|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-virology|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_hendrycksTest_world_religions_5 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|hendrycksTest-world_religions|5_2023-09-13T12-14-53.531149.parquet' - config_name: harness_truthfulqa_mc_0 data_files: - split: 2023_09_13T11_53_35.640501 path: - '**/details_harness|truthfulqa:mc|0_2023-09-13T11-53-35.640501.parquet' - split: 2023_09_13T12_14_53.531149 path: - '**/details_harness|truthfulqa:mc|0_2023-09-13T12-14-53.531149.parquet' - split: latest path: - '**/details_harness|truthfulqa:mc|0_2023-09-13T12-14-53.531149.parquet' - config_name: harness_winogrande_5 data_files: - split: 2023_10_24T13_56_54.496754 path: - '**/details_harness|winogrande|5_2023-10-24T13-56-54.496754.parquet' - split: 2023_10_24T15_31_04.396852 path: - '**/details_harness|winogrande|5_2023-10-24T15-31-04.396852.parquet' - split: latest path: - '**/details_harness|winogrande|5_2023-10-24T15-31-04.396852.parquet' - config_name: results data_files: - split: 2023_09_13T11_53_35.640501 path: - results_2023-09-13T11-53-35.640501.parquet - split: 2023_09_13T12_14_53.531149 path: - results_2023-09-13T12-14-53.531149.parquet - split: 2023_10_24T13_56_54.496754 path: - results_2023-10-24T13-56-54.496754.parquet - split: 2023_10_24T15_31_04.396852 path: - results_2023-10-24T15-31-04.396852.parquet - split: latest path: - 
results_2023-10-24T15-31-04.396852.parquet
---

# Dataset Card for Evaluation run of openBuddy/openbuddy-llama2-34b-v11.1-bf16

## Dataset Description

- **Homepage:**
- **Repository:** https://huggingface.co/openBuddy/openbuddy-llama2-34b-v11.1-bf16
- **Paper:**
- **Leaderboard:** https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- **Point of Contact:** clementine@hf.co

### Dataset Summary

Dataset automatically created during the evaluation run of model [openBuddy/openbuddy-llama2-34b-v11.1-bf16](https://huggingface.co/openBuddy/openbuddy-llama2-34b-v11.1-bf16) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 4 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.

An additional configuration, "results", stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset

data = load_dataset(
    "open-llm-leaderboard/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16",
    "harness_winogrande_5",
    split="train",
)
```

## Latest results

These are the [latest results from run 2023-10-24T15:31:04.396852](https://huggingface.co/datasets/open-llm-leaderboard/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16/blob/main/results_2023-10-24T15-31-04.396852.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each of them in the "results" configuration and in the "latest" split of each eval):

```python
{
    "all": {
        "em": 0.360633389261745,
        "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579,
        "f1_stderr": 0.004778710905980245,
        "acc": 0.5268440191410464,
        "acc_stderr": 0.012939810741097795
    },
    "harness|drop|3": {
        "em": 0.360633389261745,
        "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579,
        "f1_stderr": 0.004778710905980245
    },
    "harness|gsm8k|5": {
        "acc": 0.3457164518574678,
        "acc_stderr": 0.013100422990441578
    },
    "harness|winogrande|5": {
        "acc": 0.7079715864246251,
        "acc_stderr": 0.012779198491754013
    }
}
```

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
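Provider:
open-llm-leaderboard-old
Summary of original information

Dataset overview

Dataset summary

This dataset was created automatically during the evaluation run of the model openBuddy/openbuddy-llama2-34b-v11.1-bf16 on the Open LLM Leaderboard.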

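Dataset composition

  • The dataset consists of 64 configurations, each corresponding to one evaluated task.
  • The dataset was created from 4 runs. Each run appears as a separate split in the configurations it covered, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.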
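
The naming scheme described above can be sketched in a few lines. This is a minimal illustration inferred from the names that appear on this page (run `2023-10-24T15:31:04.396852`, split `2023_10_24T15_31_04.396852`, file stamp `2023-10-24T15-31-04.396852`); the helper names `split_name` and `file_stamp` are hypothetical and are not part of any library.

```python
from datetime import datetime


def split_name(run_timestamp: str) -> str:
    """Split name for a run: '-' and ':' in the timestamp become '_'."""
    datetime.fromisoformat(run_timestamp)  # raises ValueError on a malformed timestamp
    return run_timestamp.replace("-", "_").replace(":", "_")


def file_stamp(run_timestamp: str) -> str:
    """Timestamp as it appears in file names: only ':' becomes '-'."""
    datetime.fromisoformat(run_timestamp)
    return run_timestamp.replace(":", "-")


run = "2023-10-24T15:31:04.396852"
print(split_name(run))
print("results_" + file_stamp(run) + ".parquet")
```

Both conversions are plain character substitutions, so the mapping from a run timestamp to its split and file names is deterministic.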

Data loading example

```python
from datasets import load_dataset

data = load_dataset(
    "open-llm-leaderboard/details_openBuddy__openbuddy-llama2-34b-v11.1-bf16",
    "harness_winogrande_5",
    split="train",
)
```

Latest results

These are the latest results, from run 2023-10-24T15:31:04.396852:

```python
{
    "all": {
        "em": 0.360633389261745,
        "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579,
        "f1_stderr": 0.004778710905980245,
        "acc": 0.5268440191410464,
        "acc_stderr": 0.012939810741097795
    },
    "harness|drop|3": {
        "em": 0.360633389261745,
        "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579,
        "f1_stderr": 0.004778710905980245
    },
    "harness|gsm8k|5": {
        "acc": 0.3457164518574678,
        "acc_stderr": 0.013100422990441578
    },
    "harness|winogrande|5": {
        "acc": 0.7079715864246251,
        "acc_stderr": 0.012779198491754013
    }
}
```
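
The "all" entry in these results is numerically consistent with a plain per-metric mean over the tasks that report that metric. The sketch below checks this on the figures shown above. It is an observation about these numbers, not a confirmed description of how the leaderboard computes its aggregates, and the helper name `mean_over_tasks` is hypothetical.

```python
import math

results = {
    "all": {
        "em": 0.360633389261745, "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579, "f1_stderr": 0.004778710905980245,
        "acc": 0.5268440191410464, "acc_stderr": 0.012939810741097795,
    },
    "harness|drop|3": {
        "em": 0.360633389261745, "em_stderr": 0.004917536525106699,
        "f1": 0.4180935402684579, "f1_stderr": 0.004778710905980245,
    },
    "harness|gsm8k|5": {"acc": 0.3457164518574678, "acc_stderr": 0.013100422990441578},
    "harness|winogrande|5": {"acc": 0.7079715864246251, "acc_stderr": 0.012779198491754013},
}


def mean_over_tasks(results: dict, metric: str) -> float:
    """Average a metric over every task entry (excluding 'all') that reports it."""
    values = [
        scores[metric]
        for task, scores in results.items()
        if task != "all" and metric in scores
    ]
    return sum(values) / len(values)


# Compare each reported aggregate with the recomputed mean.
matches = {
    metric: math.isclose(mean_over_tasks(results, metric), reported, rel_tol=1e-9)
    for metric, reported in results["all"].items()
}
print(matches)
```

Here "em" and "f1" come from DROP alone, while "acc" averages GSM8K and WinoGrande, which is why the aggregate accuracy of about 0.527 sits midway between the two task accuracies.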

Configuration details

  • harness_arc_challenge_25

    • Split: 2023_09_13T11_53_35.640501
      • Path: **/details_harness|arc:challenge|25_2023-09-13T11-53-35.640501.parquet
    • Split: 2023_09_13T12_14_53.531149
      • Path: **/details_harness|arc:challenge|25_2023-09-13T12-14-53.531149.parquet
    • Split: latest
      • Path: **/details_harness|arc:challenge|25_2023-09-13T12-14-53.531149.parquet
  • harness_drop_3

    • Split: 2023_10_24T13_56_54.496754
      • Path: **/details_harness|drop|3_2023-10-24T13-56-54.496754.parquet
    • Split: 2023_10_24T15_31_04.396852
      • Path: **/details_harness|drop|3_2023-10-24T15-31-04.396852.parquet
    • Split: latest
      • Path: **/details_harness|drop|3_2023-10-24T15-31-04.396852.parquet
  • harness_gsm8k_5

    • Split: 2023_10_24T13_56_54.496754
      • Path: **/details_harness|gsm8k|5_2023-10-24T13-56-54.496754.parquet
    • Split: 2023_10_24T15_31_04.396852
      • Path: **/details_harness|gsm8k|5_2023-10-24T15-31-04.396852.parquet
    • Split: latest
      • Path: **/details_harness|gsm8k|5_2023-10-24T15-31-04.396852.parquet
  • harness_hellaswag_10

    • Split: 2023_09_13T11_53_35.640501
      • Path: **/details_harness|hellaswag|10_2023-09-13T11-53-35.640501.parquet
    • Split: 2023_09_13T12_14_53.531149
      • Path: **/details_harness|hellaswag|10_2023-09-13T12-14-53.531149.parquet
    • Split: latest
      • Path: **/details_harness|hellaswag|10_2023-09-13T12-14-53.531149.parquet
  • harness_hendrycksTest_5

    • Split: 2023_09_13T11_53_35.640501
      • Path: **/details_harness|hendrycksTest-abstract_algebra|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-anatomy|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-astronomy|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-business_ethics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-clinical_knowledge|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_biology|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_chemistry|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_computer_science|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_mathematics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_medicine|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-college_physics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-computer_security|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-conceptual_physics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-econometrics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-electrical_engineering|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-elementary_mathematics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-formal_logic|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-global_facts|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_biology|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_chemistry|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_computer_science|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_european_history|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_geography|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_mathematics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_microeconomics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_physics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_psychology|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_statistics|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_us_history|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-high_school_world_history|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-human_aging|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-human_sexuality|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-international_law|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-jurisprudence|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-logical_fallacies|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-machine_learning|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-management|5_2023-09-13T11-53-35.640501.parquet
      • Path: **/details_harness|hendrycksTest-marketing|5_2023-09-13T11-53-35.640501.par
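
The configuration names and path patterns listed above follow a regular scheme that can be reproduced from a task key such as `harness|arc:challenge|25`. The sketch below is inferred only from the names visible on this page; `config_name` and `details_glob` are hypothetical helper names, and the rule (every run of non-alphanumeric characters becomes one underscore) is an assumption that happens to match all the examples shown.

```python
import re


def config_name(task_key: str) -> str:
    """Map a task key to its configuration name, e.g. 'harness|drop|3' -> 'harness_drop_3'."""
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key)


def details_glob(task_key: str, run_timestamp: str) -> str:
    """Build the parquet glob for one task and one run (':' in the timestamp becomes '-')."""
    stamp = run_timestamp.replace(":", "-")
    return f"**/details_{task_key}_{stamp}.parquet"


print(config_name("harness|arc:challenge|25"))
print(details_glob("harness|drop|3", "2023-10-24T15:31:04.396852"))
```

Note that the path keeps the original task key, including the `|`, `:` and `-` characters, whereas the configuration name replaces them with underscores.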
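Collected summary
Dataset introduction
Construction method
This dataset was generated during the automated evaluation of the openBuddy/openbuddy-llama2-34b-v11.1-bf16 model under the Open LLM Leaderboard evaluation framework. It consists of 64 configurations, each corresponding to one evaluation task, and covers benchmarks such as the ARC Challenge, DROP, GSM8K, HellaSwag, and HendrycksTest. The results of each run are stored as a separate split named after the run's timestamp. The "train" split always points to the output of the most recent evaluation, and the additional "results" configuration collects the aggregated metrics of all runs, which the leaderboard uses to compute and display its scores.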
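Features
The dataset's main features are that it is updated as new runs arrive and that it covers many tasks. Over multiple runs (currently 4) it has accumulated evaluation details from different points in time, stored in Parquet format. Each task configuration contains several historical run splits, which lets researchers track how the model's measured performance changes between runs. Besides the raw evaluation details, the dataset provides aggregated metrics (accuracy, exact match, and F1), so that performance on different tasks can be compared directly and analyzed in more depth.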
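Usage
The dataset can be loaded with the Hugging Face datasets library. For example, calling load_dataset with a target configuration name (such as "harness_winogrande_5") and the desired split (such as "train") returns the evaluation details for that task. To analyze aggregated results, use the "results" configuration directly. The timestamp-named splits also make it possible to go back to the evaluation data from a specific earlier run, which supports comparisons over time.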
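
When browsing the timestamp-named splits, the most recent run can be picked without parsing dates, because the split names are zero-padded and therefore sort chronologically as plain strings. The following is a minimal sketch under that assumption; `latest_split` is a hypothetical helper, and the split names are the ones listed for `harness_drop_3` on this page.

```python
def latest_split(split_names: list[str]) -> str:
    """Return the most recent timestamp-named split, ignoring the 'latest' alias."""
    stamped = [name for name in split_names if name != "latest"]
    if not stamped:
        raise ValueError("no timestamp-named splits found")
    # Zero-padded 'YYYY_MM_DDTHH_MM_SS.ffffff' names sort chronologically as strings.
    return max(stamped)


drop_splits = [
    "2023_10_24T13_56_54.496754",
    "2023_10_24T15_31_04.396852",
    "latest",
]
print(latest_split(drop_splits))
```

The value returned here can be passed as the `split` argument of `load_dataset`, in the same way as the split names shown in the loading example.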
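Background and challenges
Background overview
This dataset dates from 2023, a period of rapid growth in large language model (LLM) evaluation. It was created automatically by the Hugging Face team under the Open LLM Leaderboard framework to evaluate, across several dimensions, the openbuddy-llama2-34b-v11.1-bf16 model developed by the openBuddy team. The central question is how to quantify a model's performance on key tasks such as commonsense reasoning, mathematical problem solving, and text comprehension through a standardized, reproducible evaluation process. By recording the results of multiple runs across 64 configurations, the dataset gives the community a transparent, fine-grained way to track model performance and supports more standardized, comparable LLM evaluation.
Current challenges
The domain problem the dataset addresses is that LLM evaluation is fragmented and hard to reproduce: reported model performance has often been difficult to compare because evaluation environments and metrics differ. With unified task configurations and aggregated results, the dataset provides a basis for fair comparison between models. Challenges in building it include keeping the data consistent across repeated evaluation runs, so that randomness does not bias the results; managing large volumes of fine-grained results from different tasks (such as ARC, GSM8K, and Winogrande) and storing them in a structured, queryable Parquet format; and maintaining the logic that maps timestamped splits to the latest results, so that models can be re-evaluated continuously as they are iterated.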
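Common scenarios
Typical use cases
As a product of the Open LLM Leaderboard evaluation pipeline, the dataset is typically used to record and reproduce a model's fine-grained performance on multiple benchmark tasks. It contains 64 configurations, each corresponding to one evaluation task, such as ARC Challenge, GSM8K mathematical reasoning, and Winogrande commonsense reasoning. By loading the configuration for a specific task together with a timestamped split, researchers can retrieve the model's detailed results from one or more evaluation runs and use them for fair comparison across models and for tracking over time.
Related and derived work
The dataset is associated with work on standardizing LLM evaluation and making it more transparent. The Open LLM Leaderboard itself is a model comparison platform built on this kind of data, and it has encouraged the community to adopt a unified evaluation framework. Later work such as the survey "Evaluating Large Language Models: A Survey" is said to cite the dataset as an example of reproducible evaluation. The task configurations and result storage format are closely tied to the LM Eval Harness, as the `harness_` prefix of the configuration names indicates, which supports automated and standardized evaluation pipelines. Together, this work strengthens the rigor and credibility of model evaluation.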
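Recent research on the dataset
Latest research directions
In LLM performance evaluation, the Open LLM Leaderboard has become an important benchmark for measuring a model's overall capability. For openBuddy-llama2-34b-v11.1-bf16, a 34B-parameter dialogue model based on the Llama 2 architecture, recent work focuses on systematically evaluating its multi-task generalization. The dataset's 64 configurations cover the ARC Challenge, DROP reasoning, GSM8K mathematics, HellaSwag commonsense reasoning, MMLU subject knowledge, and WinoGrande coreference resolution, and so reflect the model's performance on complex reasoning, scientific knowledge, and commonsense understanding. Notably, the results show a large gap between the model's accuracy of about 34.6% on GSM8K and about 70.8% on WinoGrande, which offers a starting point for studying the gap between its mathematical reasoning and its language understanding. The dataset provides a standardized framework for comparing open-source LLMs and supports multi-dimensional evaluation methods, letting researchers locate a model's strengths and limitations on specific cognitive tasks and guiding later work on knowledge distillation, domain adaptation, and alignment.
The content above was collected and summarized by 遇见数据集.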
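
To put the two accuracies quoted above in context, the reported standard errors can be turned into approximate 95% intervals. This is a minimal sketch assuming a normal approximation (accuracy ± 1.96 × standard error), which is an assumption of this example and not something the dataset card states; the figures are the `acc` and `acc_stderr` values from the latest results, and `ci95` is a hypothetical helper.

```python
def ci95(acc: float, stderr: float) -> tuple[float, float]:
    """Approximate 95% interval under a normal approximation (an assumption here)."""
    half_width = 1.96 * stderr
    return acc - half_width, acc + half_width


gsm8k = ci95(0.3457164518574678, 0.013100422990441578)
winogrande = ci95(0.7079715864246251, 0.012779198491754013)

for name, (low, high) in {"gsm8k": gsm8k, "winogrande": winogrande}.items():
    print(f"{name}: [{low:.3f}, {high:.3f}]")

# The intervals do not overlap: the GSM8K upper bound is below the WinoGrande lower bound.
intervals_disjoint = gsm8k[1] < winogrande[0]
```

The two tasks differ in format and in chance-level accuracy, so the non-overlapping intervals only show that the gap is much larger than the reported sampling error, not that the two scores are directly comparable.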