
open-llm-leaderboard-old/details_euclaise__Ferret-7B

Source: Hugging Face. Updated: 2023-11-25. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_euclaise__Ferret-7B
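The repository name in the link above, and the split and configuration names listed in the dataset card on this page, appear to be derived from the model id, the run timestamp, and the harness task key by simple character substitution. The following is a minimal sketch of that naming pattern, inferred only from the names visible on this page; the helper functions are hypothetical and are not part of the `datasets` library or of the leaderboard tooling.

```python
# Hypothetical helpers: the substitution rules are inferred from names
# shown on this page, not taken from the leaderboard's own code.

def details_repo_name(model_id: str) -> str:
    """Map a model id to the details repository name.

    'euclaise/Ferret-7B' -> 'details_euclaise__Ferret-7B'
    """
    return "details_" + model_id.replace("/", "__")


def split_name(run_timestamp: str) -> str:
    """Map a run timestamp to the split name used in each configuration.

    '2023-11-25T03:02:51.561913' -> '2023_11_25T03_02_51.561913'
    """
    return run_timestamp.replace("-", "_").replace(":", "_")


def config_name(task_key: str) -> str:
    """Map a harness task key to a configuration name.

    'harness|arc:challenge|25' -> 'harness_arc_challenge_25'
    Only checked against the single-task keys visible on this page.
    """
    return task_key.replace("|", "_").replace(":", "_")


if __name__ == "__main__":
    print(details_repo_name("euclaise/Ferret-7B"))
    print(split_name("2023-11-25T03:02:51.561913"))
    print(config_name("harness|arc:challenge|25"))
```

These helpers only build strings. Whether a given repository, configuration, or split actually exists still has to be checked against the dataset card itself.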
Resource description:
--- pretty_name: Evaluation run of euclaise/Ferret-7B dataset_summary: "Dataset automatically created during the evaluation run of model\ \ [euclaise/Ferret-7B](https://huggingface.co/euclaise/Ferret-7B) on the [Open LLM\ \ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\ \nThe dataset is composed of 64 configurations, each one corresponding to one of the\ \ evaluated tasks.\n\nThe dataset has been created from 4 runs. Each run can be\ \ found as a specific split in each configuration, the split being named using the\ \ timestamp of the run. The \"latest\" split always points to the latest results.\n\ \nAn additional configuration \"results\" stores all the aggregated results of the\ \ run (and is used to compute and display the aggregated metrics on the [Open LLM\ \ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\ \nTo load the details from a run, you can for instance do the following:\n```python\n\ from datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_euclaise__Ferret-7B_public\"\ ,\n\t\"harness_winogrande_5\",\n\tsplit=\"latest\")\n```\n\n## Latest results\n\n\ These are the [latest results from run 2023-11-25T03:02:51.561913](https://huggingface.co/datasets/open-llm-leaderboard/details_euclaise__Ferret-7B_public/blob/main/results_2023-11-25T03-02-51.561913.json) (note\ \ that there might be results for other tasks in the repos if successive evals didn't\ \ cover the same tasks. 
You find each in the results and the \"latest\" split for\ \ each eval):\n\n```python\n{\n \"all\": {\n \"acc\": 0.5959498298780265,\n\ \ \"acc_stderr\": 0.033140542039800984,\n \"acc_norm\": 0.6066121431850051,\n\ \ \"acc_norm_stderr\": 0.03397883209596383,\n \"mc1\": 0.2778457772337821,\n\ \ \"mc1_stderr\": 0.015680929364024647,\n \"mc2\": 0.4001041496199733,\n\ \ \"mc2_stderr\": 0.014571617835253216,\n \"em\": 0.001572986577181208,\n\ \ \"em_stderr\": 0.00040584511324177344,\n \"f1\": 0.06579802852349013,\n\ \ \"f1_stderr\": 0.0014930152947085352\n },\n \"harness|arc:challenge|25\"\ : {\n \"acc\": 0.5767918088737202,\n \"acc_stderr\": 0.014438036220848029,\n\ \ \"acc_norm\": 0.6228668941979523,\n \"acc_norm_stderr\": 0.014163366896192596\n\ \ },\n \"harness|hellaswag|10\": {\n \"acc\": 0.6248755228042223,\n\ \ \"acc_stderr\": 0.004831655648489736,\n \"acc_norm\": 0.8130850428201554,\n\ \ \"acc_norm_stderr\": 0.00389046515827181\n },\n \"harness|hendrycksTest-abstract_algebra|5\"\ : {\n \"acc\": 0.33,\n \"acc_stderr\": 0.04725815626252606,\n \ \ \"acc_norm\": 0.33,\n \"acc_norm_stderr\": 0.04725815626252606\n \ \ },\n \"harness|hendrycksTest-anatomy|5\": {\n \"acc\": 0.6,\n \ \ \"acc_stderr\": 0.042320736951515885,\n \"acc_norm\": 0.6,\n \"\ acc_norm_stderr\": 0.042320736951515885\n },\n \"harness|hendrycksTest-astronomy|5\"\ : {\n \"acc\": 0.6644736842105263,\n \"acc_stderr\": 0.03842498559395269,\n\ \ \"acc_norm\": 0.6644736842105263,\n \"acc_norm_stderr\": 0.03842498559395269\n\ \ },\n \"harness|hendrycksTest-business_ethics|5\": {\n \"acc\": 0.55,\n\ \ \"acc_stderr\": 0.05,\n \"acc_norm\": 0.55,\n \"acc_norm_stderr\"\ : 0.05\n },\n \"harness|hendrycksTest-clinical_knowledge|5\": {\n \"\ acc\": 0.6679245283018868,\n \"acc_stderr\": 0.02898545565233439,\n \ \ \"acc_norm\": 0.6679245283018868,\n \"acc_norm_stderr\": 0.02898545565233439\n\ \ },\n \"harness|hendrycksTest-college_biology|5\": {\n \"acc\": 0.6944444444444444,\n\ \ \"acc_stderr\": 
0.03852084696008534,\n \"acc_norm\": 0.6944444444444444,\n\ \ \"acc_norm_stderr\": 0.03852084696008534\n },\n \"harness|hendrycksTest-college_chemistry|5\"\ : {\n \"acc\": 0.49,\n \"acc_stderr\": 0.05024183937956912,\n \ \ \"acc_norm\": 0.49,\n \"acc_norm_stderr\": 0.05024183937956912\n \ \ },\n \"harness|hendrycksTest-college_computer_science|5\": {\n \"acc\"\ : 0.47,\n \"acc_stderr\": 0.050161355804659205,\n \"acc_norm\": 0.47,\n\ \ \"acc_norm_stderr\": 0.050161355804659205\n },\n \"harness|hendrycksTest-college_mathematics|5\"\ : {\n \"acc\": 0.33,\n \"acc_stderr\": 0.04725815626252604,\n \ \ \"acc_norm\": 0.33,\n \"acc_norm_stderr\": 0.04725815626252604\n \ \ },\n \"harness|hendrycksTest-college_medicine|5\": {\n \"acc\": 0.5780346820809249,\n\ \ \"acc_stderr\": 0.0376574669386515,\n \"acc_norm\": 0.5780346820809249,\n\ \ \"acc_norm_stderr\": 0.0376574669386515\n },\n \"harness|hendrycksTest-college_physics|5\"\ : {\n \"acc\": 0.37254901960784315,\n \"acc_stderr\": 0.048108401480826346,\n\ \ \"acc_norm\": 0.37254901960784315,\n \"acc_norm_stderr\": 0.048108401480826346\n\ \ },\n \"harness|hendrycksTest-computer_security|5\": {\n \"acc\":\ \ 0.71,\n \"acc_stderr\": 0.04560480215720684,\n \"acc_norm\": 0.71,\n\ \ \"acc_norm_stderr\": 0.04560480215720684\n },\n \"harness|hendrycksTest-conceptual_physics|5\"\ : {\n \"acc\": 0.5659574468085107,\n \"acc_stderr\": 0.03240038086792747,\n\ \ \"acc_norm\": 0.5659574468085107,\n \"acc_norm_stderr\": 0.03240038086792747\n\ \ },\n \"harness|hendrycksTest-econometrics|5\": {\n \"acc\": 0.5,\n\ \ \"acc_stderr\": 0.047036043419179864,\n \"acc_norm\": 0.5,\n \ \ \"acc_norm_stderr\": 0.047036043419179864\n },\n \"harness|hendrycksTest-electrical_engineering|5\"\ : {\n \"acc\": 0.6137931034482759,\n \"acc_stderr\": 0.04057324734419035,\n\ \ \"acc_norm\": 0.6137931034482759,\n \"acc_norm_stderr\": 0.04057324734419035\n\ \ },\n \"harness|hendrycksTest-elementary_mathematics|5\": {\n \"acc\"\ : 0.3915343915343915,\n \"acc_stderr\": 
0.025138091388851088,\n \"\ acc_norm\": 0.3915343915343915,\n \"acc_norm_stderr\": 0.025138091388851088\n\ \ },\n \"harness|hendrycksTest-formal_logic|5\": {\n \"acc\": 0.3888888888888889,\n\ \ \"acc_stderr\": 0.0436031486007746,\n \"acc_norm\": 0.3888888888888889,\n\ \ \"acc_norm_stderr\": 0.0436031486007746\n },\n \"harness|hendrycksTest-global_facts|5\"\ : {\n \"acc\": 0.43,\n \"acc_stderr\": 0.04975698519562428,\n \ \ \"acc_norm\": 0.43,\n \"acc_norm_stderr\": 0.04975698519562428\n \ \ },\n \"harness|hendrycksTest-high_school_biology|5\": {\n \"acc\": 0.6709677419354839,\n\ \ \"acc_stderr\": 0.026729499068349954,\n \"acc_norm\": 0.6709677419354839,\n\ \ \"acc_norm_stderr\": 0.026729499068349954\n },\n \"harness|hendrycksTest-high_school_chemistry|5\"\ : {\n \"acc\": 0.4729064039408867,\n \"acc_stderr\": 0.03512819077876106,\n\ \ \"acc_norm\": 0.4729064039408867,\n \"acc_norm_stderr\": 0.03512819077876106\n\ \ },\n \"harness|hendrycksTest-high_school_computer_science|5\": {\n \ \ \"acc\": 0.61,\n \"acc_stderr\": 0.04902071300001975,\n \"acc_norm\"\ : 0.61,\n \"acc_norm_stderr\": 0.04902071300001975\n },\n \"harness|hendrycksTest-high_school_european_history|5\"\ : {\n \"acc\": 0.7393939393939394,\n \"acc_stderr\": 0.034277431758165236,\n\ \ \"acc_norm\": 0.7393939393939394,\n \"acc_norm_stderr\": 0.034277431758165236\n\ \ },\n \"harness|hendrycksTest-high_school_geography|5\": {\n \"acc\"\ : 0.7424242424242424,\n \"acc_stderr\": 0.03115626951964683,\n \"\ acc_norm\": 0.7424242424242424,\n \"acc_norm_stderr\": 0.03115626951964683\n\ \ },\n \"harness|hendrycksTest-high_school_government_and_politics|5\": {\n\ \ \"acc\": 0.8341968911917098,\n \"acc_stderr\": 0.026839845022314415,\n\ \ \"acc_norm\": 0.8341968911917098,\n \"acc_norm_stderr\": 0.026839845022314415\n\ \ },\n \"harness|hendrycksTest-high_school_macroeconomics|5\": {\n \ \ \"acc\": 0.5897435897435898,\n \"acc_stderr\": 0.024939313906940798,\n\ \ \"acc_norm\": 0.5897435897435898,\n \"acc_norm_stderr\": 
0.024939313906940798\n\ \ },\n \"harness|hendrycksTest-high_school_mathematics|5\": {\n \"\ acc\": 0.29259259259259257,\n \"acc_stderr\": 0.027738969632176088,\n \ \ \"acc_norm\": 0.29259259259259257,\n \"acc_norm_stderr\": 0.027738969632176088\n\ \ },\n \"harness|hendrycksTest-high_school_microeconomics|5\": {\n \ \ \"acc\": 0.6512605042016807,\n \"acc_stderr\": 0.030956636328566545,\n\ \ \"acc_norm\": 0.6512605042016807,\n \"acc_norm_stderr\": 0.030956636328566545\n\ \ },\n \"harness|hendrycksTest-high_school_physics|5\": {\n \"acc\"\ : 0.33774834437086093,\n \"acc_stderr\": 0.0386155754625517,\n \"\ acc_norm\": 0.33774834437086093,\n \"acc_norm_stderr\": 0.0386155754625517\n\ \ },\n \"harness|hendrycksTest-high_school_psychology|5\": {\n \"acc\"\ : 0.7908256880733945,\n \"acc_stderr\": 0.017437937173343233,\n \"\ acc_norm\": 0.7908256880733945,\n \"acc_norm_stderr\": 0.017437937173343233\n\ \ },\n \"harness|hendrycksTest-high_school_statistics|5\": {\n \"acc\"\ : 0.4166666666666667,\n \"acc_stderr\": 0.03362277436608043,\n \"\ acc_norm\": 0.4166666666666667,\n \"acc_norm_stderr\": 0.03362277436608043\n\ \ },\n \"harness|hendrycksTest-high_school_us_history|5\": {\n \"acc\"\ : 0.7696078431372549,\n \"acc_stderr\": 0.02955429260569506,\n \"\ acc_norm\": 0.7696078431372549,\n \"acc_norm_stderr\": 0.02955429260569506\n\ \ },\n \"harness|hendrycksTest-high_school_world_history|5\": {\n \"\ acc\": 0.7637130801687764,\n \"acc_stderr\": 0.02765215314415926,\n \ \ \"acc_norm\": 0.7637130801687764,\n \"acc_norm_stderr\": 0.02765215314415926\n\ \ },\n \"harness|hendrycksTest-human_aging|5\": {\n \"acc\": 0.6905829596412556,\n\ \ \"acc_stderr\": 0.03102441174057221,\n \"acc_norm\": 0.6905829596412556,\n\ \ \"acc_norm_stderr\": 0.03102441174057221\n },\n \"harness|hendrycksTest-human_sexuality|5\"\ : {\n \"acc\": 0.732824427480916,\n \"acc_stderr\": 0.038808483010823944,\n\ \ \"acc_norm\": 0.732824427480916,\n \"acc_norm_stderr\": 0.038808483010823944\n\ \ },\n 
\"harness|hendrycksTest-international_law|5\": {\n \"acc\":\ \ 0.7272727272727273,\n \"acc_stderr\": 0.04065578140908705,\n \"\ acc_norm\": 0.7272727272727273,\n \"acc_norm_stderr\": 0.04065578140908705\n\ \ },\n \"harness|hendrycksTest-jurisprudence|5\": {\n \"acc\": 0.7685185185185185,\n\ \ \"acc_stderr\": 0.04077494709252626,\n \"acc_norm\": 0.7685185185185185,\n\ \ \"acc_norm_stderr\": 0.04077494709252626\n },\n \"harness|hendrycksTest-logical_fallacies|5\"\ : {\n \"acc\": 0.7239263803680982,\n \"acc_stderr\": 0.035123852837050475,\n\ \ \"acc_norm\": 0.7239263803680982,\n \"acc_norm_stderr\": 0.035123852837050475\n\ \ },\n \"harness|hendrycksTest-machine_learning|5\": {\n \"acc\": 0.41964285714285715,\n\ \ \"acc_stderr\": 0.04684099321077106,\n \"acc_norm\": 0.41964285714285715,\n\ \ \"acc_norm_stderr\": 0.04684099321077106\n },\n \"harness|hendrycksTest-management|5\"\ : {\n \"acc\": 0.8058252427184466,\n \"acc_stderr\": 0.03916667762822585,\n\ \ \"acc_norm\": 0.8058252427184466,\n \"acc_norm_stderr\": 0.03916667762822585\n\ \ },\n \"harness|hendrycksTest-marketing|5\": {\n \"acc\": 0.8034188034188035,\n\ \ \"acc_stderr\": 0.026035386098951292,\n \"acc_norm\": 0.8034188034188035,\n\ \ \"acc_norm_stderr\": 0.026035386098951292\n },\n \"harness|hendrycksTest-medical_genetics|5\"\ : {\n \"acc\": 0.64,\n \"acc_stderr\": 0.04824181513244218,\n \ \ \"acc_norm\": 0.64,\n \"acc_norm_stderr\": 0.04824181513244218\n \ \ },\n \"harness|hendrycksTest-miscellaneous|5\": {\n \"acc\": 0.789272030651341,\n\ \ \"acc_stderr\": 0.014583812465862543,\n \"acc_norm\": 0.789272030651341,\n\ \ \"acc_norm_stderr\": 0.014583812465862543\n },\n \"harness|hendrycksTest-moral_disputes|5\"\ : {\n \"acc\": 0.630057803468208,\n \"acc_stderr\": 0.025992472029306376,\n\ \ \"acc_norm\": 0.630057803468208,\n \"acc_norm_stderr\": 0.025992472029306376\n\ \ },\n \"harness|hendrycksTest-moral_scenarios|5\": {\n \"acc\": 0.38212290502793295,\n\ \ \"acc_stderr\": 0.016251139711570762,\n \"acc_norm\": 
0.38212290502793295,\n\ \ \"acc_norm_stderr\": 0.016251139711570762\n },\n \"harness|hendrycksTest-nutrition|5\"\ : {\n \"acc\": 0.6601307189542484,\n \"acc_stderr\": 0.02712195607138886,\n\ \ \"acc_norm\": 0.6601307189542484,\n \"acc_norm_stderr\": 0.02712195607138886\n\ \ },\n \"harness|hendrycksTest-philosophy|5\": {\n \"acc\": 0.6527331189710611,\n\ \ \"acc_stderr\": 0.027040745502307336,\n \"acc_norm\": 0.6527331189710611,\n\ \ \"acc_norm_stderr\": 0.027040745502307336\n },\n \"harness|hendrycksTest-prehistory|5\"\ : {\n \"acc\": 0.6790123456790124,\n \"acc_stderr\": 0.025976566010862737,\n\ \ \"acc_norm\": 0.6790123456790124,\n \"acc_norm_stderr\": 0.025976566010862737\n\ \ },\n \"harness|hendrycksTest-professional_accounting|5\": {\n \"\ acc\": 0.450354609929078,\n \"acc_stderr\": 0.02968010556502904,\n \ \ \"acc_norm\": 0.450354609929078,\n \"acc_norm_stderr\": 0.02968010556502904\n\ \ },\n \"harness|hendrycksTest-professional_law|5\": {\n \"acc\": 0.3924380704041721,\n\ \ \"acc_stderr\": 0.012471243669229106,\n \"acc_norm\": 0.3924380704041721,\n\ \ \"acc_norm_stderr\": 0.012471243669229106\n },\n \"harness|hendrycksTest-professional_medicine|5\"\ : {\n \"acc\": 0.6066176470588235,\n \"acc_stderr\": 0.029674288281311155,\n\ \ \"acc_norm\": 0.6066176470588235,\n \"acc_norm_stderr\": 0.029674288281311155\n\ \ },\n \"harness|hendrycksTest-professional_psychology|5\": {\n \"\ acc\": 0.6160130718954249,\n \"acc_stderr\": 0.01967580813528151,\n \ \ \"acc_norm\": 0.6160130718954249,\n \"acc_norm_stderr\": 0.01967580813528151\n\ \ },\n \"harness|hendrycksTest-public_relations|5\": {\n \"acc\": 0.6272727272727273,\n\ \ \"acc_stderr\": 0.046313813194254656,\n \"acc_norm\": 0.6272727272727273,\n\ \ \"acc_norm_stderr\": 0.046313813194254656\n },\n \"harness|hendrycksTest-security_studies|5\"\ : {\n \"acc\": 0.636734693877551,\n \"acc_stderr\": 0.03078905113903081,\n\ \ \"acc_norm\": 0.636734693877551,\n \"acc_norm_stderr\": 0.03078905113903081\n\ \ },\n 
\"harness|hendrycksTest-sociology|5\": {\n \"acc\": 0.7761194029850746,\n\ \ \"acc_stderr\": 0.029475250236017204,\n \"acc_norm\": 0.7761194029850746,\n\ \ \"acc_norm_stderr\": 0.029475250236017204\n },\n \"harness|hendrycksTest-us_foreign_policy|5\"\ : {\n \"acc\": 0.83,\n \"acc_stderr\": 0.0377525168068637,\n \ \ \"acc_norm\": 0.83,\n \"acc_norm_stderr\": 0.0377525168068637\n },\n\ \ \"harness|hendrycksTest-virology|5\": {\n \"acc\": 0.5,\n \"\ acc_stderr\": 0.03892494720807614,\n \"acc_norm\": 0.5,\n \"acc_norm_stderr\"\ : 0.03892494720807614\n },\n \"harness|hendrycksTest-world_religions|5\":\ \ {\n \"acc\": 0.783625730994152,\n \"acc_stderr\": 0.031581495393387324,\n\ \ \"acc_norm\": 0.783625730994152,\n \"acc_norm_stderr\": 0.031581495393387324\n\ \ },\n \"harness|truthfulqa:mc|0\": {\n \"mc1\": 0.2778457772337821,\n\ \ \"mc1_stderr\": 0.015680929364024647,\n \"mc2\": 0.4001041496199733,\n\ \ \"mc2_stderr\": 0.014571617835253216\n },\n \"harness|winogrande|5\"\ : {\n \"acc\": 0.77663772691397,\n \"acc_stderr\": 0.011705697565205198\n\ \ },\n \"harness|drop|3\": {\n \"em\": 0.001572986577181208,\n \ \ \"em_stderr\": 0.00040584511324177344,\n \"f1\": 0.06579802852349013,\n\ \ \"f1_stderr\": 0.0014930152947085352\n },\n \"harness|gsm8k|5\":\ \ {\n \"acc\": 0.02047005307050796,\n \"acc_stderr\": 0.003900413385915721\n\ \ }\n}\n```" repo_url: https://huggingface.co/euclaise/Ferret-7B leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard point_of_contact: clementine@hf.co configs: - config_name: harness_arc_challenge_25 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|arc:challenge|25_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|arc:challenge|25_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|arc:challenge|25_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - 
'**/details_harness|arc:challenge|25_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|arc:challenge|25_2023-11-25T03-02-51.561913.parquet' - config_name: harness_drop_3 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|drop|3_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|drop|3_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|drop|3_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|drop|3_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|drop|3_2023-11-25T03-02-51.561913.parquet' - config_name: harness_gsm8k_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|gsm8k|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|gsm8k|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|gsm8k|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|gsm8k|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|gsm8k|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hellaswag_10 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hellaswag|10_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hellaswag|10_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hellaswag|10_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hellaswag|10_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hellaswag|10_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - 
'**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-19T15-52-54.018947.parquet' - 
'**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-management|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-19T15-52-54.018947.parquet' - 
'**/details_harness|hendrycksTest-moral_disputes|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-11-19T15-52-54.018947.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T02-44-41.580934.parquet' - 
'**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T02-44-41.580934.parquet' - 
'**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-management|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T02-44-41.580934.parquet' - 
'**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-11-25T02-44-41.580934.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T02-50-24.454188.parquet' - 
'**/details_harness|hendrycksTest-computer_security|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T02-50-24.454188.parquet' - 
'**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-management|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T02-50-24.454188.parquet' - 
'**/details_harness|hendrycksTest-sociology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-11-25T02-50-24.454188.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T03-02-51.561913.parquet' - 
'**/details_harness|hendrycksTest-global_facts|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T03-02-51.561913.parquet' - 
'**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-management|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T03-02-51.561913.parquet' - 
'**/details_harness|hendrycksTest-astronomy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T03-02-51.561913.parquet' - 
'**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-management|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T03-02-51.561913.parquet' - 
'**/details_harness|hendrycksTest-nutrition|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-virology|5_2023-11-25T03-02-51.561913.parquet' - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_abstract_algebra_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-abstract_algebra|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_anatomy_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - 
'**/details_harness|hendrycksTest-anatomy|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-anatomy|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_astronomy_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-astronomy|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_business_ethics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-business_ethics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_clinical_knowledge_5 
data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-clinical_knowledge|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_biology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_biology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_chemistry_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T03-02-51.561913.parquet' - split: latest 
path: - '**/details_harness|hendrycksTest-college_chemistry|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_computer_science_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_computer_science|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_mathematics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_mathematics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_medicine_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - 
'**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_medicine|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_college_physics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-college_physics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_computer_security_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-computer_security|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_conceptual_physics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-19T15-52-54.018947.parquet' - split: 
2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-conceptual_physics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_econometrics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-econometrics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_electrical_engineering_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-electrical_engineering|5_2023-11-25T03-02-51.561913.parquet' - config_name: 
harness_hendrycksTest_elementary_mathematics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-elementary_mathematics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_formal_logic_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-formal_logic|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_global_facts_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - 
'**/details_harness|hendrycksTest-global_facts|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-global_facts|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_biology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_biology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_chemistry_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_chemistry|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_computer_science_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - 
'**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_computer_science|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_european_history_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_european_history|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_geography_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-high_school_geography|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_government_and_politics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_macroeconomics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_mathematics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - 
'**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_mathematics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_microeconomics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_microeconomics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_physics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_physics|5_2023-11-25T03-02-51.561913.parquet' 
- config_name: harness_hendrycksTest_high_school_psychology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_psychology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_statistics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_statistics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_us_history_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T02-50-24.454188.parquet' - split: 
2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_us_history|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_high_school_world_history_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-high_school_world_history|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_human_aging_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_aging|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_human_sexuality_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - 
'**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-human_sexuality|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_international_law_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-international_law|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-international_law|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_jurisprudence_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-jurisprudence|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_logical_fallacies_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - 
'**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-logical_fallacies|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_machine_learning_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-machine_learning|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_management_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-management|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-management|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-management|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-management|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-management|5_2023-11-25T03-02-51.561913.parquet' - 
config_name: harness_hendrycksTest_marketing_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-marketing|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-marketing|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_medical_genetics_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-medical_genetics|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_miscellaneous_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-miscellaneous|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_moral_disputes_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_disputes|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_moral_scenarios_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-moral_scenarios|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_nutrition_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - 
'**/details_harness|hendrycksTest-nutrition|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-nutrition|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_philosophy_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-philosophy|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_prehistory_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-prehistory|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_professional_accounting_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - 
'**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_accounting|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_professional_law_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_law|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_professional_medicine_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_medicine|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_professional_psychology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - 
'**/details_harness|hendrycksTest-professional_psychology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-professional_psychology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_public_relations_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-public_relations|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_security_studies_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-security_studies|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - 
'**/details_harness|hendrycksTest-security_studies|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_sociology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-sociology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-sociology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_us_foreign_policy_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-us_foreign_policy|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_virology_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-virology|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-virology|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-virology|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - 
'**/details_harness|hendrycksTest-virology|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-virology|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_hendrycksTest_world_religions_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|hendrycksTest-world_religions|5_2023-11-25T03-02-51.561913.parquet' - config_name: harness_truthfulqa_mc_0 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|truthfulqa:mc|0_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|truthfulqa:mc|0_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|truthfulqa:mc|0_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - '**/details_harness|truthfulqa:mc|0_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|truthfulqa:mc|0_2023-11-25T03-02-51.561913.parquet' - config_name: harness_winogrande_5 data_files: - split: 2023_11_19T15_52_54.018947 path: - '**/details_harness|winogrande|5_2023-11-19T15-52-54.018947.parquet' - split: 2023_11_25T02_44_41.580934 path: - '**/details_harness|winogrande|5_2023-11-25T02-44-41.580934.parquet' - split: 2023_11_25T02_50_24.454188 path: - '**/details_harness|winogrande|5_2023-11-25T02-50-24.454188.parquet' - split: 2023_11_25T03_02_51.561913 path: - 
'**/details_harness|winogrande|5_2023-11-25T03-02-51.561913.parquet' - split: latest path: - '**/details_harness|winogrande|5_2023-11-25T03-02-51.561913.parquet' - config_name: results data_files: - split: 2023_11_19T15_52_54.018947 path: - results_2023-11-19T15-52-54.018947.parquet - split: 2023_11_25T02_44_41.580934 path: - results_2023-11-25T02-44-41.580934.parquet - split: 2023_11_25T02_50_24.454188 path: - results_2023-11-25T02-50-24.454188.parquet - split: 2023_11_25T03_02_51.561913 path: - results_2023-11-25T03-02-51.561913.parquet - split: latest path: - results_2023-11-25T03-02-51.561913.parquet ---

# Dataset Card for Evaluation run of euclaise/Ferret-7B

## Dataset Description

- **Homepage:**
- **Repository:** https://huggingface.co/euclaise/Ferret-7B
- **Paper:**
- **Leaderboard:** https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- **Point of Contact:** clementine@hf.co

### Dataset Summary

Dataset automatically created during the evaluation run of model [euclaise/Ferret-7B](https://huggingface.co/euclaise/Ferret-7B) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 64 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 4 runs. Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.

An additional configuration, "results", stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).
To load the details from a run, you can, for instance, do the following:

```python
from datasets import load_dataset

data = load_dataset(
    "open-llm-leaderboard/details_euclaise__Ferret-7B_public",
    "harness_winogrande_5",
    split="train",
)
```

## Latest results

These are the [latest results from run 2023-11-25T03:02:51.561913](https://huggingface.co/datasets/open-llm-leaderboard/details_euclaise__Ferret-7B_public/blob/main/results_2023-11-25T03-02-51.561913.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each one in the "results" configuration and in the "latest" split of each eval):

```python
{ "all": { "acc": 0.5959498298780265, "acc_stderr": 0.033140542039800984, "acc_norm": 0.6066121431850051, "acc_norm_stderr": 0.03397883209596383, "mc1": 0.2778457772337821, "mc1_stderr": 0.015680929364024647, "mc2": 0.4001041496199733, "mc2_stderr": 0.014571617835253216, "em": 0.001572986577181208, "em_stderr": 0.00040584511324177344, "f1": 0.06579802852349013, "f1_stderr": 0.0014930152947085352 }, "harness|arc:challenge|25": { "acc": 0.5767918088737202, "acc_stderr": 0.014438036220848029, "acc_norm": 0.6228668941979523, "acc_norm_stderr": 0.014163366896192596 }, "harness|hellaswag|10": { "acc": 0.6248755228042223, "acc_stderr": 0.004831655648489736, "acc_norm": 0.8130850428201554, "acc_norm_stderr": 0.00389046515827181 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.33, "acc_stderr": 0.04725815626252606, "acc_norm": 0.33, "acc_norm_stderr": 0.04725815626252606 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.6, "acc_stderr": 0.042320736951515885, "acc_norm": 0.6, "acc_norm_stderr": 0.042320736951515885 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.6644736842105263, "acc_stderr": 0.03842498559395269, "acc_norm": 0.6644736842105263, "acc_norm_stderr": 0.03842498559395269 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.55, "acc_stderr": 0.05, "acc_norm": 0.55, "acc_norm_stderr": 0.05 },
"harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6679245283018868, "acc_stderr": 0.02898545565233439, "acc_norm": 0.6679245283018868, "acc_norm_stderr": 0.02898545565233439 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.6944444444444444, "acc_stderr": 0.03852084696008534, "acc_norm": 0.6944444444444444, "acc_norm_stderr": 0.03852084696008534 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.47, "acc_stderr": 0.050161355804659205, "acc_norm": 0.47, "acc_norm_stderr": 0.050161355804659205 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.33, "acc_stderr": 0.04725815626252604, "acc_norm": 0.33, "acc_norm_stderr": 0.04725815626252604 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.5780346820809249, "acc_stderr": 0.0376574669386515, "acc_norm": 0.5780346820809249, "acc_norm_stderr": 0.0376574669386515 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.37254901960784315, "acc_stderr": 0.048108401480826346, "acc_norm": 0.37254901960784315, "acc_norm_stderr": 0.048108401480826346 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.71, "acc_stderr": 0.04560480215720684, "acc_norm": 0.71, "acc_norm_stderr": 0.04560480215720684 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5659574468085107, "acc_stderr": 0.03240038086792747, "acc_norm": 0.5659574468085107, "acc_norm_stderr": 0.03240038086792747 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.5, "acc_stderr": 0.047036043419179864, "acc_norm": 0.5, "acc_norm_stderr": 0.047036043419179864 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.6137931034482759, "acc_stderr": 0.04057324734419035, "acc_norm": 0.6137931034482759, "acc_norm_stderr": 0.04057324734419035 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.3915343915343915, "acc_stderr": 0.025138091388851088, 
"acc_norm": 0.3915343915343915, "acc_norm_stderr": 0.025138091388851088 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.3888888888888889, "acc_stderr": 0.0436031486007746, "acc_norm": 0.3888888888888889, "acc_norm_stderr": 0.0436031486007746 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.43, "acc_stderr": 0.04975698519562428, "acc_norm": 0.43, "acc_norm_stderr": 0.04975698519562428 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.6709677419354839, "acc_stderr": 0.026729499068349954, "acc_norm": 0.6709677419354839, "acc_norm_stderr": 0.026729499068349954 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.4729064039408867, "acc_stderr": 0.03512819077876106, "acc_norm": 0.4729064039408867, "acc_norm_stderr": 0.03512819077876106 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.61, "acc_stderr": 0.04902071300001975, "acc_norm": 0.61, "acc_norm_stderr": 0.04902071300001975 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7393939393939394, "acc_stderr": 0.034277431758165236, "acc_norm": 0.7393939393939394, "acc_norm_stderr": 0.034277431758165236 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7424242424242424, "acc_stderr": 0.03115626951964683, "acc_norm": 0.7424242424242424, "acc_norm_stderr": 0.03115626951964683 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8341968911917098, "acc_stderr": 0.026839845022314415, "acc_norm": 0.8341968911917098, "acc_norm_stderr": 0.026839845022314415 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.5897435897435898, "acc_stderr": 0.024939313906940798, "acc_norm": 0.5897435897435898, "acc_norm_stderr": 0.024939313906940798 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.29259259259259257, "acc_stderr": 0.027738969632176088, "acc_norm": 0.29259259259259257, "acc_norm_stderr": 0.027738969632176088 }, "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.6512605042016807, 
"acc_stderr": 0.030956636328566545, "acc_norm": 0.6512605042016807, "acc_norm_stderr": 0.030956636328566545 }, "harness|hendrycksTest-high_school_physics|5": { "acc": 0.33774834437086093, "acc_stderr": 0.0386155754625517, "acc_norm": 0.33774834437086093, "acc_norm_stderr": 0.0386155754625517 }, "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.7908256880733945, "acc_stderr": 0.017437937173343233, "acc_norm": 0.7908256880733945, "acc_norm_stderr": 0.017437937173343233 }, "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.4166666666666667, "acc_stderr": 0.03362277436608043, "acc_norm": 0.4166666666666667, "acc_norm_stderr": 0.03362277436608043 }, "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.7696078431372549, "acc_stderr": 0.02955429260569506, "acc_norm": 0.7696078431372549, "acc_norm_stderr": 0.02955429260569506 }, "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.7637130801687764, "acc_stderr": 0.02765215314415926, "acc_norm": 0.7637130801687764, "acc_norm_stderr": 0.02765215314415926 }, "harness|hendrycksTest-human_aging|5": { "acc": 0.6905829596412556, "acc_stderr": 0.03102441174057221, "acc_norm": 0.6905829596412556, "acc_norm_stderr": 0.03102441174057221 }, "harness|hendrycksTest-human_sexuality|5": { "acc": 0.732824427480916, "acc_stderr": 0.038808483010823944, "acc_norm": 0.732824427480916, "acc_norm_stderr": 0.038808483010823944 }, "harness|hendrycksTest-international_law|5": { "acc": 0.7272727272727273, "acc_stderr": 0.04065578140908705, "acc_norm": 0.7272727272727273, "acc_norm_stderr": 0.04065578140908705 }, "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7685185185185185, "acc_stderr": 0.04077494709252626, "acc_norm": 0.7685185185185185, "acc_norm_stderr": 0.04077494709252626 }, "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.7239263803680982, "acc_stderr": 0.035123852837050475, "acc_norm": 0.7239263803680982, "acc_norm_stderr": 0.035123852837050475 }, 
"harness|hendrycksTest-machine_learning|5": { "acc": 0.41964285714285715, "acc_stderr": 0.04684099321077106, "acc_norm": 0.41964285714285715, "acc_norm_stderr": 0.04684099321077106 }, "harness|hendrycksTest-management|5": { "acc": 0.8058252427184466, "acc_stderr": 0.03916667762822585, "acc_norm": 0.8058252427184466, "acc_norm_stderr": 0.03916667762822585 }, "harness|hendrycksTest-marketing|5": { "acc": 0.8034188034188035, "acc_stderr": 0.026035386098951292, "acc_norm": 0.8034188034188035, "acc_norm_stderr": 0.026035386098951292 }, "harness|hendrycksTest-medical_genetics|5": { "acc": 0.64, "acc_stderr": 0.04824181513244218, "acc_norm": 0.64, "acc_norm_stderr": 0.04824181513244218 }, "harness|hendrycksTest-miscellaneous|5": { "acc": 0.789272030651341, "acc_stderr": 0.014583812465862543, "acc_norm": 0.789272030651341, "acc_norm_stderr": 0.014583812465862543 }, "harness|hendrycksTest-moral_disputes|5": { "acc": 0.630057803468208, "acc_stderr": 0.025992472029306376, "acc_norm": 0.630057803468208, "acc_norm_stderr": 0.025992472029306376 }, "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.38212290502793295, "acc_stderr": 0.016251139711570762, "acc_norm": 0.38212290502793295, "acc_norm_stderr": 0.016251139711570762 }, "harness|hendrycksTest-nutrition|5": { "acc": 0.6601307189542484, "acc_stderr": 0.02712195607138886, "acc_norm": 0.6601307189542484, "acc_norm_stderr": 0.02712195607138886 }, "harness|hendrycksTest-philosophy|5": { "acc": 0.6527331189710611, "acc_stderr": 0.027040745502307336, "acc_norm": 0.6527331189710611, "acc_norm_stderr": 0.027040745502307336 }, "harness|hendrycksTest-prehistory|5": { "acc": 0.6790123456790124, "acc_stderr": 0.025976566010862737, "acc_norm": 0.6790123456790124, "acc_norm_stderr": 0.025976566010862737 }, "harness|hendrycksTest-professional_accounting|5": { "acc": 0.450354609929078, "acc_stderr": 0.02968010556502904, "acc_norm": 0.450354609929078, "acc_norm_stderr": 0.02968010556502904 }, "harness|hendrycksTest-professional_law|5": { 
"acc": 0.3924380704041721, "acc_stderr": 0.012471243669229106, "acc_norm": 0.3924380704041721, "acc_norm_stderr": 0.012471243669229106 }, "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6066176470588235, "acc_stderr": 0.029674288281311155, "acc_norm": 0.6066176470588235, "acc_norm_stderr": 0.029674288281311155 }, "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6160130718954249, "acc_stderr": 0.01967580813528151, "acc_norm": 0.6160130718954249, "acc_norm_stderr": 0.01967580813528151 }, "harness|hendrycksTest-public_relations|5": { "acc": 0.6272727272727273, "acc_stderr": 0.046313813194254656, "acc_norm": 0.6272727272727273, "acc_norm_stderr": 0.046313813194254656 }, "harness|hendrycksTest-security_studies|5": { "acc": 0.636734693877551, "acc_stderr": 0.03078905113903081, "acc_norm": 0.636734693877551, "acc_norm_stderr": 0.03078905113903081 }, "harness|hendrycksTest-sociology|5": { "acc": 0.7761194029850746, "acc_stderr": 0.029475250236017204, "acc_norm": 0.7761194029850746, "acc_norm_stderr": 0.029475250236017204 }, "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.83, "acc_stderr": 0.0377525168068637, "acc_norm": 0.83, "acc_norm_stderr": 0.0377525168068637 }, "harness|hendrycksTest-virology|5": { "acc": 0.5, "acc_stderr": 0.03892494720807614, "acc_norm": 0.5, "acc_norm_stderr": 0.03892494720807614 }, "harness|hendrycksTest-world_religions|5": { "acc": 0.783625730994152, "acc_stderr": 0.031581495393387324, "acc_norm": 0.783625730994152, "acc_norm_stderr": 0.031581495393387324 }, "harness|truthfulqa:mc|0": { "mc1": 0.2778457772337821, "mc1_stderr": 0.015680929364024647, "mc2": 0.4001041496199733, "mc2_stderr": 0.014571617835253216 }, "harness|winogrande|5": { "acc": 0.77663772691397, "acc_stderr": 0.011705697565205198 }, "harness|drop|3": { "em": 0.001572986577181208, "em_stderr": 0.00040584511324177344, "f1": 0.06579802852349013, "f1_stderr": 0.0014930152947085352 }, "harness|gsm8k|5": { "acc": 0.02047005307050796, "acc_stderr": 
0.003900413385915721 } } ```

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
Provider:
open-llm-leaderboard-old
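Summary of original information

Dataset overview

Dataset introduction

This dataset was created automatically during the evaluation run of the model euclaise/Ferret-7B on the Open LLM Leaderboard.

Dataset composition

  • The dataset contains 64 configurations, each corresponding to one evaluated task.
  • The dataset was created from 4 runs. Each run appears as a separate split in each configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional configuration, "results", stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.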
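The link between evaluated tasks and configurations can be sketched in Python. The results on this page use keys of the form `harness|winogrande|5`, and the loading example uses the configuration name `harness_winogrande_5`. The helper names `parse_task_key` and `guess_config_name` below are hypothetical, and the rule "replace every run of non-alphanumeric characters with an underscore" is an assumption. It is confirmed here only for the winogrande task, so treat the result for other tasks as a guess to verify against the repository.

```python
import re
from typing import NamedTuple


class TaskKey(NamedTuple):
    suite: str        # e.g. "harness"
    task: str         # e.g. "winogrande"
    num_fewshot: int  # e.g. 5


def parse_task_key(key: str) -> TaskKey:
    """Split a results key such as "harness|winogrande|5" into its parts."""
    suite, task, shots = key.split("|")
    return TaskKey(suite, task, int(shots))


def guess_config_name(key: str) -> str:
    """Hypothetical helper: derive a configuration name from a results key.

    Assumption: each run of non-alphanumeric characters becomes one "_".
    Only "harness|winogrande|5" -> "harness_winogrande_5" is confirmed
    by the loading example on this page.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", key)


print(parse_task_key("harness|winogrande|5"))
print(guess_config_name("harness|winogrande|5"))
```

A name produced this way would be passed as the second argument to `load_dataset`, with `split="train"` selecting the latest run.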

Data loading example

    from datasets import load_dataset

    data = load_dataset(
        "open-llm-leaderboard/details_euclaise__Ferret-7B_public",
        "harness_winogrande_5",
        split="train",
    )

Latest results

The following are the latest results, from run 2023-11-25T03:02:51.561913:

python { "all": { "acc": 0.5959498298780265, "acc_stderr": 0.033140542039800984, "acc_norm": 0.6066121431850051, "acc_norm_stderr": 0.03397883209596383, "mc1": 0.2778457772337821, "mc1_stderr": 0.015680929364024647, "mc2": 0.4001041496199733, "mc2_stderr": 0.014571617835253216, "em": 0.001572986577181208, "em_stderr": 0.00040584511324177344, "f1": 0.06579802852349013, "f1_stderr": 0.0014930152947085352 }, "harness|arc:challenge|25": { "acc": 0.5767918088737202, "acc_stderr": 0.014438036220848029, "acc_norm": 0.6228668941979523, "acc_norm_stderr": 0.014163366896192596 }, "harness|hellaswag|10": { "acc": 0.6248755228042223, "acc_stderr": 0.004831655648489736, "acc_norm": 0.8130850428201554, "acc_norm_stderr": 0.00389046515827181 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.33, "acc_stderr": 0.04725815626252606, "acc_norm": 0.33, "acc_norm_stderr": 0.04725815626252606 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.6, "acc_stderr": 0.042320736951515885, "acc_norm": 0.6, "acc_norm_stderr": 0.042320736951515885 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.6644736842105263, "acc_stderr": 0.03842498559395269, "acc_norm": 0.6644736842105263, "acc_norm_stderr": 0.03842498559395269 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.55, "acc_stderr": 0.05, "acc_norm": 0.55, "acc_norm_stderr": 0.05 }, "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6679245283018868, "acc_stderr": 0.02898545565233439, "acc_norm": 0.6679245283018868, "acc_norm_stderr": 0.02898545565233439 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.6944444444444444, "acc_stderr": 0.03852084696008534, "acc_norm": 0.6944444444444444, "acc_norm_stderr": 0.03852084696008534 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.47, "acc_stderr": 0.050161355804659205, "acc_norm": 0.47, "acc_norm_stderr": 
0.050161355804659205 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.33, "acc_stderr": 0.04725815626252604, "acc_norm": 0.33, "acc_norm_stderr": 0.04725815626252604 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.5780346820809249, "acc_stderr": 0.0376574669386515, "acc_norm": 0.5780346820809249, "acc_norm_stderr": 0.0376574669386515 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.37254901960784315, "acc_stderr": 0.048108401480826346, "acc_norm": 0.37254901960784315, "acc_norm_stderr": 0.048108401480826346 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.71, "acc_stderr": 0.04560480215720684, "acc_norm": 0.71, "acc_norm_stderr": 0.04560480215720684 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5659574468085107, "acc_stderr": 0.03240038086792747, "acc_norm": 0.5659574468085107, "acc_norm_stderr": 0.03240038086792747 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.5, "acc_stderr": 0.047036043419179864, "acc_norm": 0.5, "acc_norm_stderr": 0.047036043419179864 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.6137931034482759, "acc_stderr": 0.04057324734419035, "acc_norm": 0.6137931034482759, "acc_norm_stderr": 0.04057324734419035 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.3915343915343915, "acc_stderr": 0.025138091388851088, "acc_norm": 0.3915343915343915, "acc_norm_stderr": 0.025138091388851088 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.3888888888888889, "acc_stderr": 0.0436031486007746, "acc_norm": 0.3888888888888889, "acc_norm_stderr": 0.0436031486007746 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.43, "acc_stderr": 0.04975698519562428, "acc_norm": 0.43, "acc_norm_stderr": 0.04975698519562428 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.6709677419354839, "acc_stderr": 0.026729499068349954, "acc_norm": 0.6709677419354839, "acc_norm_stderr": 0.026729499068349954 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 
0.4729064039408867, "acc_stderr": 0.03512819077876106, "acc_norm": 0.4729064039408867, "acc_norm_stderr": 0.03512819077876106 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.61, "acc_stderr": 0.04902071300001975, "acc_norm": 0.61, "acc_norm_stderr": 0.04902071300001975 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7393939393939394, "acc_stderr": 0.034277431758165236, "acc_norm": 0.7393939393939394, "acc_norm_stderr": 0.034277431758165236 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7424242424242424, "acc_stderr": 0.03115626951964683, "acc_norm": 0.7424242424242424, "acc_norm_stderr": 0.03115626951964683 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8341968911917098, "acc_stderr": 0.026839845022314415, "acc_norm": 0.8341968911917098, "acc_norm_stderr": 0.026839845022314415 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.5897435897435898, "acc_stderr": 0.024939313906940798, "acc_norm": 0.5897435897435898, "acc_norm_stderr": 0.024939313906940798 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.29259259259259257, "acc_stderr": 0.027738969632176088, "acc_norm": 0.2925925925925
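A results dictionary of this shape can be aggregated by task family. The sketch below is illustrative only: `mean_metric` is a hypothetical helper, and the `results` dictionary holds just four entries copied from the figures above, so the printed value is the mean over that small subset, not the leaderboard's MMLU score.

```python
from statistics import mean

# A small subset of the per-task results listed above (values copied as-is).
results = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.33},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.6},
    "harness|hendrycksTest-business_ethics|5": {"acc": 0.55},
    "harness|winogrande|5": {"acc": 0.77663772691397},
}


def mean_metric(results: dict, prefix: str, metric: str = "acc") -> float:
    """Average `metric` over all tasks whose key starts with `prefix`."""
    values = [
        scores[metric]
        for key, scores in results.items()
        if key.startswith(prefix) and metric in scores
    ]
    if not values:
        raise ValueError(f"no task with prefix {prefix!r} reports {metric!r}")
    return mean(values)


# Mean accuracy over the three hendrycksTest entries in the subset above.
mmlu_subset_acc = mean_metric(results, "harness|hendrycksTest-")
print(f"{mmlu_subset_acc:.4f}")
```

Filtering on the key prefix keeps tasks that report other metrics, such as `mc1`/`mc2` for truthfulqa or `em`/`f1` for drop, out of an accuracy average.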
