
open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0

Hugging Face · Updated 2023-10-22 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0
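
The download link above encodes both the dataset repository and the evaluated model. The following is a minimal sketch of how the two identifiers could be recovered from such a URL. It assumes the naming pattern seen in this one repository (`details_<org>__<model>`); the helper name `split_details_url` and the constant `DETAILS_PREFIX` are hypothetical and not part of any library.

```python
from urllib.parse import urlparse

# Assumed prefix, taken from this repository's name; not a documented constant.
DETAILS_PREFIX = "details_"


def split_details_url(url: str) -> tuple[str, str]:
    """Return (dataset_repo_id, model_id) for a details-dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<dataset_name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    owner, name = parts[1], parts[2]
    if not name.startswith(DETAILS_PREFIX):
        raise ValueError(f"not a details dataset: {name}")
    # Assumption: the first "__" separates the model's org from its name.
    org, sep, model = name[len(DETAILS_PREFIX):].partition("__")
    if not sep:
        raise ValueError(f"no '__' separator in: {name}")
    return f"{owner}/{name}", f"{org}/{model}"


url = (
    "https://hf-mirror.com/datasets/open-llm-leaderboard/"
    "details_jondurbin__airoboros-65b-gpt4-m2.0"
)
repo_id, model_id = split_details_url(url)
print(repo_id)
print(model_id)
```

The first value is the repository id that `load_dataset` expects, and the second matches the model named in the card's `pretty_name` field.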
Resource summary:
--- pretty_name: Evaluation run of jondurbin/airoboros-65b-gpt4-m2.0 dataset_summary: "Dataset automatically created during the evaluation run of model [jondurbin/airoboros-65b-gpt4-m2.0](https://huggingface.co/jondurbin/airoboros-65b-gpt4-m2.0) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\nThe dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.\n\nThe dataset has been created from 4 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The \"latest\" split always points to the latest results.\n\nAn additional configuration \"results\" stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\nTo load the details from a run, you can for instance do the following:\n```python\nfrom datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0\",\n\t\"harness_winogrande_5\",\n\tsplit=\"latest\")\n```\n\n## Latest results\n\nThese are the [latest results from run 2023-10-22T21:36:42.557922](https://huggingface.co/datasets/open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0/blob/main/results_2023-10-22T21-36-42.557922.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks.
You can find each in the results and the \"latest\" split for each eval):\n\n```python\n{\n    \"all\": {\n        \"em\": 0.07036493288590603,\n        \"em_stderr\": 0.0026192324279004876,\n        \"f1\": 0.14583787751677768,\n        \"f1_stderr\": 0.002841532518554861,\n        \"acc\": 0.5116370357826509,\n        \"acc_stderr\": 0.011318931374370282\n    },\n    \"harness|drop|3\": {\n        \"em\": 0.07036493288590603,\n        \"em_stderr\": 0.0026192324279004876,\n        \"f1\": 0.14583787751677768,\n        \"f1_stderr\": 0.002841532518554861\n    },\n    \"harness|gsm8k|5\": {\n        \"acc\": 0.221379833206975,\n        \"acc_stderr\": 0.011436000004253518\n    },\n    \"harness|winogrande|5\": {\n        \"acc\": 0.8018942383583267,\n        \"acc_stderr\": 0.011201862744487047\n    }\n}\n```" repo_url: https://huggingface.co/jondurbin/airoboros-65b-gpt4-m2.0 leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard point_of_contact: clementine@hf.co configs: - config_name: harness_arc_challenge_25 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|arc:challenge|25_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|arc:challenge|25_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|arc:challenge|25_2023-08-09T18:28:50.823349.parquet' - config_name: harness_drop_3 data_files: - split: 2023_10_22T15_08_22.403545 path: - '**/details_harness|drop|3_2023-10-22T15-08-22.403545.parquet' - split: 2023_10_22T21_36_42.557922 path: - '**/details_harness|drop|3_2023-10-22T21-36-42.557922.parquet' - split: latest path: - '**/details_harness|drop|3_2023-10-22T21-36-42.557922.parquet' - config_name: harness_gsm8k_5 data_files: - split: 2023_10_22T15_08_22.403545 path: - '**/details_harness|gsm8k|5_2023-10-22T15-08-22.403545.parquet' - split: 2023_10_22T21_36_42.557922 path: - '**/details_harness|gsm8k|5_2023-10-22T21-36-42.557922.parquet' - split: latest path: - '**/details_harness|gsm8k|5_2023-10-22T21-36-42.557922.parquet' -
config_name: harness_hellaswag_10 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hellaswag|10_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hellaswag|10_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hellaswag|10_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T17:03:24.422206.parquet' - 
'**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T17:03:24.422206.parquet' - 
'**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-management|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-08-09T17:03:24.422206.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - 
'**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T18:28:50.823349.parquet' - 
'**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-management|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T18:28:50.823349.parquet' - 
'**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T18:28:50.823349.parquet' - 
'**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T18:28:50.823349.parquet' - 
'**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-management|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T18:28:50.823349.parquet' - 
'**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-08-09T18:28:50.823349.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_abstract_algebra_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_anatomy_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-anatomy|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_astronomy_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-astronomy|5_2023-08-09T18:28:50.823349.parquet' 
- config_name: harness_hendrycksTest_business_ethics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_clinical_knowledge_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_biology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_biology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_chemistry_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_computer_science_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - 
'**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_mathematics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_medicine_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_college_physics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_physics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_computer_security_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - 
'**/details_harness|hendrycksTest-computer_security|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-computer_security|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_conceptual_physics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_econometrics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-econometrics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_electrical_engineering_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_elementary_mathematics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_formal_logic_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_global_facts_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-global_facts|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_biology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_chemistry_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_computer_science_5 data_files: - split: 2023_08_09T17_03_24.422206 
path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_european_history_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_geography_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_government_and_politics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_macroeconomics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - 
'**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_mathematics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_microeconomics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_physics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_psychology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T17:03:24.422206.parquet' 
- split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_statistics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_us_history_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_high_school_world_history_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_human_aging_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - 
'**/details_harness|hendrycksTest-human_aging|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_aging|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_human_sexuality_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_international_law_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-international_law|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_jurisprudence_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_logical_fallacies_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T18:28:50.823349.parquet' - config_name: 
harness_hendrycksTest_machine_learning_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_management_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-management|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-management|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-management|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_marketing_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-marketing|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_medical_genetics_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_miscellaneous_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - 
'**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_moral_disputes_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_moral_scenarios_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_nutrition_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-nutrition|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_philosophy_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-philosophy|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_prehistory_5 data_files: - split: 
2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-prehistory|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_professional_accounting_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_professional_law_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_law|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_professional_medicine_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_professional_psychology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - 
'**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_public_relations_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-public_relations|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_security_studies_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-security_studies|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_sociology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-sociology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_us_foreign_policy_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-08-09T18:28:50.823349.parquet' - config_name: 
harness_hendrycksTest_virology_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-virology|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-virology|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-virology|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_hendrycksTest_world_religions_5 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|hendrycksTest-world_religions|5_2023-08-09T18:28:50.823349.parquet' - config_name: harness_truthfulqa_mc_0 data_files: - split: 2023_08_09T17_03_24.422206 path: - '**/details_harness|truthfulqa:mc|0_2023-08-09T17:03:24.422206.parquet' - split: 2023_08_09T18_28_50.823349 path: - '**/details_harness|truthfulqa:mc|0_2023-08-09T18:28:50.823349.parquet' - split: latest path: - '**/details_harness|truthfulqa:mc|0_2023-08-09T18:28:50.823349.parquet' - config_name: harness_winogrande_5 data_files: - split: 2023_10_22T15_08_22.403545 path: - '**/details_harness|winogrande|5_2023-10-22T15-08-22.403545.parquet' - split: 2023_10_22T21_36_42.557922 path: - '**/details_harness|winogrande|5_2023-10-22T21-36-42.557922.parquet' - split: latest path: - '**/details_harness|winogrande|5_2023-10-22T21-36-42.557922.parquet' - config_name: results data_files: - split: 2023_08_09T17_03_24.422206 path: - results_2023-08-09T17:03:24.422206.parquet - split: 2023_08_09T18_28_50.823349 path: - results_2023-08-09T18:28:50.823349.parquet - split: 2023_10_22T15_08_22.403545 path: - results_2023-10-22T15-08-22.403545.parquet - split: 2023_10_22T21_36_42.557922 path: - results_2023-10-22T21-36-42.557922.parquet - split: latest path: - 
results_2023-10-22T21-36-42.557922.parquet
---

# Dataset Card for Evaluation run of jondurbin/airoboros-65b-gpt4-m2.0

## Dataset Description

- **Homepage:**
- **Repository:** https://huggingface.co/jondurbin/airoboros-65b-gpt4-m2.0
- **Paper:**
- **Leaderboard:** https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- **Point of Contact:** clementine@hf.co

### Dataset Summary

Dataset automatically created during the evaluation run of model [jondurbin/airoboros-65b-gpt4-m2.0](https://huggingface.co/jondurbin/airoboros-65b-gpt4-m2.0) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 4 run(s). Each run can be found as a specific split in the configurations it covers, the split being named using the timestamp of the run. The "latest" split always points to the latest results.

An additional configuration "results" stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset

# Split names defined in the configs are the run timestamps and "latest"
data = load_dataset(
    "open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0",
    "harness_winogrande_5",
    split="latest",
)
```

## Latest results

These are the [latest results from run 2023-10-22T21:36:42.557922](https://huggingface.co/datasets/open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0/blob/main/results_2023-10-22T21-36-42.557922.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each in the "results" configuration and in the "latest" split for each eval):

```python
{
    "all": {
        "em": 0.07036493288590603,
        "em_stderr": 0.0026192324279004876,
        "f1": 0.14583787751677768,
        "f1_stderr": 0.002841532518554861,
        "acc": 0.5116370357826509,
        "acc_stderr": 0.011318931374370282
    },
    "harness|drop|3": {
        "em": 0.07036493288590603,
        "em_stderr": 0.0026192324279004876,
        "f1": 0.14583787751677768,
        "f1_stderr": 0.002841532518554861
    },
    "harness|gsm8k|5": {
        "acc": 0.221379833206975,
        "acc_stderr": 0.011436000004253518
    },
    "harness|winogrande|5": {
        "acc": 0.8018942383583267,
        "acc_stderr": 0.011201862744487047
    }
}
```

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
Provider:
open-llm-leaderboard
Summary of Original Information

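Dataset Overview

Dataset Source

This dataset was created automatically during the evaluation run of the model jondurbin/airoboros-65b-gpt4-m2.0 on the Open LLM Leaderboard.

Dataset Composition

The dataset consists of 64 configurations, each corresponding to one evaluated task. It was built from 4 runs; each run appears as a separate split in the configurations it covers, and the split is named after the run's timestamp. The "latest" split always points to the most recent results.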
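The relationship between a run timestamp and its split name can be read off the configuration list: the split name replaces `-` and `:` with `_` and keeps the `T` and the fractional seconds. The following is a minimal sketch of that mapping; the function name `split_name_from_timestamp` is a hypothetical helper, not part of any library, and the rule is inferred from the listed names rather than documented by the leaderboard.

```python
def split_name_from_timestamp(run_timestamp: str) -> str:
    """Map a run timestamp to the split name used in the configurations.

    Hypothetical helper: the rule is inferred from the listed split names.
    """
    # '-' and ':' become '_'; the 'T' separator and microseconds stay as-is.
    return run_timestamp.replace("-", "_").replace(":", "_")


# Timestamp as written in the "Latest results" heading
print(split_name_from_timestamp("2023-10-22T21:36:42.557922"))
# Timestamp as written in the parquet file names (dashes instead of colons)
print(split_name_from_timestamp("2023-10-22T21-36-42.557922"))
```

Both calls yield the same split name, which is why the October files (dash-separated times) and the August files (colon-separated times) end up with identically shaped split names.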

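Additional Configuration

An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.

Data Loading Example

To load the details of a run, you can use the following code:

```python
from datasets import load_dataset

# Split names defined in the configs are the run timestamps and "latest"
data = load_dataset(
    "open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0",
    "harness_winogrande_5",
    split="latest",
)
```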

Latest Results

These are the latest results, from run 2023-10-22T21:36:42.557922:

```python
{
    "all": {
        "em": 0.07036493288590603,
        "em_stderr": 0.0026192324279004876,
        "f1": 0.14583787751677768,
        "f1_stderr": 0.002841532518554861,
        "acc": 0.5116370357826509,
        "acc_stderr": 0.011318931374370282
    },
    "harness|drop|3": {
        "em": 0.07036493288590603,
        "em_stderr": 0.0026192324279004876,
        "f1": 0.14583787751677768,
        "f1_stderr": 0.002841532518554861
    },
    "harness|gsm8k|5": {
        "acc": 0.221379833206975,
        "acc_stderr": 0.011436000004253518
    },
    "harness|winogrande|5": {
        "acc": 0.8018942383583267,
        "acc_stderr": 0.011201862744487047
    }
}
```
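The card does not say how the "all" entry is computed. The numbers are consistent with an unweighted mean, per metric, over the tasks that report that metric. The sketch below checks this reading against the values above; the `aggregate` function is an illustrative assumption, not the leaderboard's actual code.

```python
from statistics import mean

# Per-task metrics copied from the latest results above
latest_results = {
    "harness|drop|3": {
        "em": 0.07036493288590603,
        "em_stderr": 0.0026192324279004876,
        "f1": 0.14583787751677768,
        "f1_stderr": 0.002841532518554861,
    },
    "harness|gsm8k|5": {
        "acc": 0.221379833206975,
        "acc_stderr": 0.011436000004253518,
    },
    "harness|winogrande|5": {
        "acc": 0.8018942383583267,
        "acc_stderr": 0.011201862744487047,
    },
}


def aggregate(results):
    """Average each metric over the tasks that report it (assumed rule)."""
    values_by_metric = {}
    for task_metrics in results.values():
        for name, value in task_metrics.items():
            values_by_metric.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in values_by_metric.items()}


aggregated = aggregate(latest_results)
for name, value in aggregated.items():
    print(f"{name}: {value:.6f}")
```

Under this reading, "acc" in "all" is the mean of the GSM8K and WinoGrande accuracies (about 0.5116), while "em" and "f1" simply repeat the DROP values because DROP is the only task reporting them.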

Configuration Details

The configurations of the dataset are listed below:

  • config_name: harness_arc_challenge_25

    • split: 2023_08_09T17_03_24.422206
      • path: **/details_harness|arc:challenge|25_2023-08-09T17:03:24.422206.parquet
    • split: 2023_08_09T18_28_50.823349
      • path: **/details_harness|arc:challenge|25_2023-08-09T18:28:50.823349.parquet
    • split: latest
      • path: **/details_harness|arc:challenge|25_2023-08-09T18:28:50.823349.parquet
  • config_name: harness_drop_3

    • split: 2023_10_22T15_08_22.403545
      • path: **/details_harness|drop|3_2023-10-22T15-08-22.403545.parquet
    • split: 2023_10_22T21_36_42.557922
      • path: **/details_harness|drop|3_2023-10-22T21-36-42.557922.parquet
    • split: latest
      • path: **/details_harness|drop|3_2023-10-22T21-36-42.557922.parquet
  • config_name: harness_gsm8k_5

    • split: 2023_10_22T15_08_22.403545
      • path: **/details_harness|gsm8k|5_2023-10-22T15-08-22.403545.parquet
    • split: 2023_10_22T21_36_42.557922
      • path: **/details_harness|gsm8k|5_2023-10-22T21-36-42.557922.parquet
    • split: latest
      • path: **/details_harness|gsm8k|5_2023-10-22T21-36-42.557922.parquet
  • config_name: harness_hellaswag_10

    • split: 2023_08_09T17_03_24.422206
      • path: **/details_harness|hellaswag|10_2023-08-09T17:03:24.422206.parquet
    • split: 2023_08_09T18_28_50.823349
      • path: **/details_harness|hellaswag|10_2023-08-09T18:28:50.823349.parquet
    • split: latest
      • path: **/details_harness|hellaswag|10_2023-08-09T18:28:50.823349.parquet
  • config_name: harness_hendrycksTest_5

    • split: 2023_08_09T17_03_24.422206
      • path: **/details_harness|hendrycksTest-abstract_algebra|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-anatomy|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-astronomy|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-business_ethics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-clinical_knowledge|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_biology|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_chemistry|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_computer_science|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_mathematics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_medicine|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-college_physics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-computer_security|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-conceptual_physics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-econometrics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-electrical_engineering|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-elementary_mathematics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-formal_logic|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-global_facts|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_biology|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_chemistry|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_computer_science|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_european_history|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_geography|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_mathematics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_microeconomics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_physics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_psychology|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_statistics|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_us_history|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-high_school_world_history|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-human_aging|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-human_sexuality|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-international_law|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-jurisprudence|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-logical_fallacies|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-machine_learning|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-management|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-marketing|5_2023-08-09T17:03:24.422206.parquet
      • path: **/details_harness|hendrycksTest-medical_genetics|5_20
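Configuration names in this listing appear to follow the task keys used in the results (for example `harness|winogrande|5`), with every character that is not a letter, digit, or underscore replaced by `_`. The sketch below illustrates that observed pattern; `config_name_from_task` is a hypothetical helper, and the rule is inferred from the names in this card rather than taken from the leaderboard's code.

```python
import re


def config_name_from_task(task_key: str) -> str:
    """Derive a configuration name from a harness task key (inferred rule)."""
    # Replace '|', ':', '-' and any other non-word character with '_'
    return re.sub(r"[^0-9A-Za-z_]", "_", task_key)


for task_key in (
    "harness|arc:challenge|25",
    "harness|winogrande|5",
    "harness|truthfulqa:mc|0",
    "harness|hendrycksTest-high_school_physics|5",
):
    print(task_key, "->", config_name_from_task(task_key))
```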
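Collected Summary
Dataset Introduction
Construction Method
This dataset was generated automatically by the Open LLM Leaderboard while it evaluated the model jondurbin/airoboros-65b-gpt4-m2.0. It contains 64 configurations, each corresponding to one evaluated task. The data comes from 4 runs; the results of each run are stored as a separate split whose name is the run's timestamp, and the "latest" split always points to the most recent evaluation results. An additional configuration named "results" stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the leaderboard.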
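Because the split names are zero-padded, fixed-width timestamps, they sort chronologically as plain strings, so the most recent run can be picked with a simple `max`. The sketch below uses the four split names listed for the "results" configuration; `most_recent_split` is a hypothetical helper for illustration only.

```python
# Split names listed for the "results" configuration in this card
RESULTS_SPLITS = [
    "2023_08_09T17_03_24.422206",
    "2023_08_09T18_28_50.823349",
    "2023_10_22T15_08_22.403545",
    "2023_10_22T21_36_42.557922",
    "latest",
]


def most_recent_split(split_names):
    """Return the newest timestamped split, ignoring the 'latest' alias."""
    # Fixed-width, zero-padded timestamps compare correctly as strings
    timestamped = [name for name in split_names if name != "latest"]
    return max(timestamped)


print(most_recent_split(RESULTS_SPLITS))
```

The value returned here matches the run whose parquet file the "latest" split of the "results" configuration points to.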
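Usage
The data can be loaded with the Hugging Face datasets library. For example, call load_dataset with the dataset name and a configuration name (such as "harness_winogrande_5") and select the "latest" split to get the most recent evaluation details. To access the data of a specific earlier run, use the split named after that run's timestamp instead. Loading the "results" configuration gives the aggregated metrics directly, which is convenient for an overall analysis of model performance.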
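The usage described above can be wrapped in two small helpers. This is a minimal sketch, assuming the `datasets` library is installed, that the Hugging Face Hub is reachable, and that the split names are the ones shown in the configuration details (`latest` plus the run timestamps). The helper names `load_details` and `load_aggregated_results` are hypothetical.

```python
REPO_ID = "open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0"


def load_details(config_name: str, split: str = "latest"):
    """Load per-example details for one task configuration."""
    # Imported lazily so this module can be imported without `datasets`
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


def load_aggregated_results(split: str = "latest"):
    """Load the aggregated metrics stored in the "results" configuration."""
    from datasets import load_dataset

    return load_dataset(REPO_ID, "results", split=split)
```

For example, `load_details("harness_winogrande_5")` would fetch the most recent WinoGrande details, and `load_details("harness_winogrande_5", split="2023_10_22T15_08_22.403545")` the earlier October run; both calls download data from the Hub.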
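Background and Challenges
Background Overview
As the capabilities of large language models (LLMs) advance rapidly, evaluating them systematically and fairly has become a shared concern of academia and industry. The Open LLM Leaderboard, launched by the Hugging Face team in 2023, aims to give open-source LLMs a standardized benchmark and to make models more transparent and comparable. This dataset records the detailed evaluation results of jondurbin/airoboros-65b-gpt4-m2.0 on several tasks, including the ARC Challenge, DROP, GSM8K, HellaSwag, and MMLU with its 57 subjects, covering reasoning, mathematics, commonsense, and specialized knowledge. As a by-product of the evaluation pipeline, it gives researchers fine-grained data on model behavior and supports a more objective view of a model's strengths and weaknesses.
Current Challenges
The domain problem behind this dataset is that LLM evaluation has long lacked a unified, reproducible framework, which makes fair comparison between models difficult. The specific challenges are: 1) the complexity of multi-task evaluation, which must cover heterogeneous abilities such as reasoning, mathematics, commonsense, and specialized knowledge, while the tasks use different metrics (accuracy, F1 score, and so on), making an overall judgment harder; 2) during construction, data from multiple runs must be merged, and the timestamps, task configurations, and result formats of each run must line up exactly to avoid fragmented data and version confusion; 3) standardized storage and presentation of results, so that the latest and historical results remain traceable and the large volume of fine-grained evaluation data is managed efficiently in Parquet format, which places high demands on the robustness and scalability of the data pipeline.
Common Scenarios
Classic Use Cases
In LLM evaluation, this dataset holds the Open LLM Leaderboard's standardized evaluation records for airoboros-65b-gpt4-m2.0 across a range of tasks. Its configurations cover the ARC Challenge, DROP, GSM8K, HellaSwag, WinoGrande, and MMLU with its 57 subjects; each configuration corresponds to one task, which supports fine-grained performance analysis. By loading different configurations and timestamped splits, researchers can reproduce the model's results on reasoning, commonsense understanding, mathematical reasoning, and knowledge, and compare models systematically. This provides a reliable basis for side-by-side comparison of LLMs and helps standardize the evaluation process.
Academic Problems Addressed
The dataset addresses the lack of a unified, reproducible benchmark in LLM evaluation. Traditional evaluations are often hard to compare because task setup, data versions, and sampling differ; this dataset uses fixed configurations (such as the number of few-shot examples) and a standardized pipeline to keep evaluations consistent and transparent. It shows a marked gap between complex reasoning (an F1 score of only 0.146 on DROP) and simple commonsense (80.2% accuracy on WinoGrande), giving researchers empirical evidence for identifying capability gaps and adjusting training strategies.
Practical Applications
In practice, the dataset is a useful reference for model selection and deployment. Companies and developers can check a model's accuracy on tasks such as GSM8K (mathematical reasoning) and MMLU (multi-subject knowledge) to judge whether it suits scenarios such as tutoring, customer service, or question answering. For example, the 22.1% accuracy of airoboros-65b on GSM8K points to limitations in mathematics, while its strong WinoGrande result suggests it handles commonsense ambiguity well. Such fine-grained evaluation helps users avoid relying on a single metric and make better-informed model choices when resources are limited.
Recent Research on the Dataset
Latest Research Directions
Performance evaluation of large language models (LLMs) has become central to model iteration and deployment. The dataset open-llm-leaderboard/details_jondurbin__airoboros-65b-gpt4-m2.0 is designed for tracking and reproducing the evaluation results of the airoboros-65b-gpt4-m2.0 model on the Open LLM Leaderboard. It records the model's fine-grained results on benchmark tasks such as the ARC Challenge, DROP, GSM8K, HellaSwag, and MMLU with its 57 subjects, and its timestamped splits from multiple runs let researchers compare results across runs and assess the stability of the evaluation. The structured evaluation logs and aggregated metrics make it possible to analyze the limits of the model's abilities in commonsense reasoning, mathematical problem solving, and knowledge question answering, and they provide data for improving training strategies and building more robust evaluation setups.
The content above was collected and summarized by 遇见数据集.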