open-llm-leaderboard-old/details_ZySec-AI__ZySec-7B
Source: Hugging Face. Updated 2024-03-22, indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_ZySec-AI__ZySec-7B
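The repository behind that link stores one parquet file per task and run. The card's `configs` list maps each file to a config name and a split name. The helper below is a minimal sketch that reproduces that naming pattern. The function name `detail_names` is hypothetical, and the rule is inferred from the entries shown in this card rather than taken from a documented specification.

```python
import re


def detail_names(task: str, n_shot: int, run_timestamp: str) -> dict:
    """Derive config name, split name, parquet glob and results key for one
    evaluated task, following the pattern seen in this card's `configs`.

    Hypothetical helper; the rule is inferred from the card, not documented.
    `task` is the harness task name as it appears in the results keys,
    e.g. "arc:challenge" or "hendrycksTest-anatomy".
    `run_timestamp` is the run time, e.g. "2024-03-22T02:20:54.183750".
    """
    # Config names: prefix "harness_", every ':' and '-' in the task becomes
    # '_', then the number of few-shot examples is appended.
    config = "harness_" + re.sub(r"[:\-]", "_", task) + f"_{n_shot}"
    # Split names: every '-' and ':' in the timestamp becomes '_'.
    split = re.sub(r"[:\-]", "_", run_timestamp)
    # Parquet file names keep the task as-is and only swap ':' for '-'
    # in the timestamp.
    file_ts = run_timestamp.replace(":", "-")
    glob = f"**/details_harness|{task}|{n_shot}_{file_ts}.parquet"
    # Keys in the aggregated results JSON use '|' as separator.
    results_key = f"harness|{task}|{n_shot}"
    return {"config": config, "split": split, "glob": glob, "results_key": results_key}


if __name__ == "__main__":
    names = detail_names("arc:challenge", 25, "2024-03-22T02:20:54.183750")
    for field, value in names.items():
        print(f"{field}: {value}")
```

When run directly, the script prints the four names for the 25-shot ARC-Challenge task. You would then pass the `config` and `split` values to `datasets.load_dataset(repo_id, config, split=split)`, as in the card's loading example. That call needs network access and is not part of the sketch. Each config also exposes a `latest` split alongside the timestamped one.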
Resource summary:
---
pretty_name: Evaluation run of ZySec-AI/ZySec-7B
dataset_summary: "Dataset automatically created during the evaluation run of model\
\ [ZySec-AI/ZySec-7B](https://huggingface.co/ZySec-AI/ZySec-7B) on the [Open LLM\
\ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\
\nThe dataset is composed of 63 configurations, each one corresponding to one of the\
\ evaluated tasks.\n\nThe dataset has been created from 1 run(s). Each run can be\
\ found as a specific split in each configuration, the split being named using the\
\ timestamp of the run. The \"latest\" split always points to the latest results.\n\
\nAn additional configuration \"results\" stores all the aggregated results of the\
\ run (and is used to compute and display the aggregated metrics on the [Open LLM\
\ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\
\nTo load the details from a run, you can, for instance, do the following:\n```python\n\
from datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_ZySec-AI__ZySec-7B\",\n\
\t\"harness_winogrande_5\",\n\tsplit=\"latest\")\n```\n\n## Latest results\n\n\
These are the [latest results from run 2024-03-22T02:20:54.183750](https://huggingface.co/datasets/open-llm-leaderboard/details_ZySec-AI__ZySec-7B/blob/main/results_2024-03-22T02-20-54.183750.json)\
\ (note that there might be results for other tasks in the repo if successive evals didn't\
\ cover the same tasks. You can find each in the results and the \"latest\" split for\
\ each eval):\n\n```python\n{\n \"all\": {\n \"acc\": 0.5833427561467389,\n\
\ \"acc_stderr\": 0.03342813901320813,\n \"acc_norm\": 0.5898847098167896,\n\
\ \"acc_norm_stderr\": 0.034124395634022184,\n \"mc1\": 0.3561811505507956,\n\
\ \"mc1_stderr\": 0.016763790728446335,\n \"mc2\": 0.5111163939897228,\n\
\ \"mc2_stderr\": 0.015418045555863789\n },\n \"harness|arc:challenge|25\"\
: {\n \"acc\": 0.5204778156996587,\n \"acc_stderr\": 0.01459913135303501,\n\
\ \"acc_norm\": 0.5750853242320819,\n \"acc_norm_stderr\": 0.014445698968520769\n\
\ },\n \"harness|hellaswag|10\": {\n \"acc\": 0.5978888667596096,\n\
\ \"acc_stderr\": 0.0048932206350117925,\n \"acc_norm\": 0.7972515435172276,\n\
\ \"acc_norm_stderr\": 0.004012249939174913\n },\n \"harness|hendrycksTest-abstract_algebra|5\"\
: {\n \"acc\": 0.25,\n \"acc_stderr\": 0.04351941398892446,\n \
\ \"acc_norm\": 0.25,\n \"acc_norm_stderr\": 0.04351941398892446\n \
\ },\n \"harness|hendrycksTest-anatomy|5\": {\n \"acc\": 0.5481481481481482,\n\
\ \"acc_stderr\": 0.042992689054808644,\n \"acc_norm\": 0.5481481481481482,\n\
\ \"acc_norm_stderr\": 0.042992689054808644\n },\n \"harness|hendrycksTest-astronomy|5\"\
: {\n \"acc\": 0.618421052631579,\n \"acc_stderr\": 0.03953173377749194,\n\
\ \"acc_norm\": 0.618421052631579,\n \"acc_norm_stderr\": 0.03953173377749194\n\
\ },\n \"harness|hendrycksTest-business_ethics|5\": {\n \"acc\": 0.57,\n\
\ \"acc_stderr\": 0.049756985195624284,\n \"acc_norm\": 0.57,\n \
\ \"acc_norm_stderr\": 0.049756985195624284\n },\n \"harness|hendrycksTest-clinical_knowledge|5\"\
: {\n \"acc\": 0.6490566037735849,\n \"acc_stderr\": 0.02937364625323469,\n\
\ \"acc_norm\": 0.6490566037735849,\n \"acc_norm_stderr\": 0.02937364625323469\n\
\ },\n \"harness|hendrycksTest-college_biology|5\": {\n \"acc\": 0.6319444444444444,\n\
\ \"acc_stderr\": 0.040329990539607195,\n \"acc_norm\": 0.6319444444444444,\n\
\ \"acc_norm_stderr\": 0.040329990539607195\n },\n \"harness|hendrycksTest-college_chemistry|5\"\
: {\n \"acc\": 0.46,\n \"acc_stderr\": 0.05009082659620333,\n \
\ \"acc_norm\": 0.46,\n \"acc_norm_stderr\": 0.05009082659620333\n \
\ },\n \"harness|hendrycksTest-college_computer_science|5\": {\n \"acc\"\
: 0.48,\n \"acc_stderr\": 0.050211673156867795,\n \"acc_norm\": 0.48,\n\
\ \"acc_norm_stderr\": 0.050211673156867795\n },\n \"harness|hendrycksTest-college_mathematics|5\"\
: {\n \"acc\": 0.32,\n \"acc_stderr\": 0.046882617226215034,\n \
\ \"acc_norm\": 0.32,\n \"acc_norm_stderr\": 0.046882617226215034\n \
\ },\n \"harness|hendrycksTest-college_medicine|5\": {\n \"acc\": 0.5953757225433526,\n\
\ \"acc_stderr\": 0.03742461193887249,\n \"acc_norm\": 0.5953757225433526,\n\
\ \"acc_norm_stderr\": 0.03742461193887249\n },\n \"harness|hendrycksTest-college_physics|5\"\
: {\n \"acc\": 0.3333333333333333,\n \"acc_stderr\": 0.04690650298201942,\n\
\ \"acc_norm\": 0.3333333333333333,\n \"acc_norm_stderr\": 0.04690650298201942\n\
\ },\n \"harness|hendrycksTest-computer_security|5\": {\n \"acc\":\
\ 0.75,\n \"acc_stderr\": 0.04351941398892446,\n \"acc_norm\": 0.75,\n\
\ \"acc_norm_stderr\": 0.04351941398892446\n },\n \"harness|hendrycksTest-conceptual_physics|5\"\
: {\n \"acc\": 0.5404255319148936,\n \"acc_stderr\": 0.03257901482099835,\n\
\ \"acc_norm\": 0.5404255319148936,\n \"acc_norm_stderr\": 0.03257901482099835\n\
\ },\n \"harness|hendrycksTest-econometrics|5\": {\n \"acc\": 0.42105263157894735,\n\
\ \"acc_stderr\": 0.046446020912223177,\n \"acc_norm\": 0.42105263157894735,\n\
\ \"acc_norm_stderr\": 0.046446020912223177\n },\n \"harness|hendrycksTest-electrical_engineering|5\"\
: {\n \"acc\": 0.5379310344827586,\n \"acc_stderr\": 0.04154659671707548,\n\
\ \"acc_norm\": 0.5379310344827586,\n \"acc_norm_stderr\": 0.04154659671707548\n\
\ },\n \"harness|hendrycksTest-elementary_mathematics|5\": {\n \"acc\"\
: 0.41005291005291006,\n \"acc_stderr\": 0.025331202438944433,\n \"\
acc_norm\": 0.41005291005291006,\n \"acc_norm_stderr\": 0.025331202438944433\n\
\ },\n \"harness|hendrycksTest-formal_logic|5\": {\n \"acc\": 0.40476190476190477,\n\
\ \"acc_stderr\": 0.04390259265377562,\n \"acc_norm\": 0.40476190476190477,\n\
\ \"acc_norm_stderr\": 0.04390259265377562\n },\n \"harness|hendrycksTest-global_facts|5\"\
: {\n \"acc\": 0.36,\n \"acc_stderr\": 0.04824181513244218,\n \
\ \"acc_norm\": 0.36,\n \"acc_norm_stderr\": 0.04824181513244218\n \
\ },\n \"harness|hendrycksTest-high_school_biology|5\": {\n \"acc\": 0.6838709677419355,\n\
\ \"acc_stderr\": 0.026450874489042778,\n \"acc_norm\": 0.6838709677419355,\n\
\ \"acc_norm_stderr\": 0.026450874489042778\n },\n \"harness|hendrycksTest-high_school_chemistry|5\"\
: {\n \"acc\": 0.4729064039408867,\n \"acc_stderr\": 0.03512819077876106,\n\
\ \"acc_norm\": 0.4729064039408867,\n \"acc_norm_stderr\": 0.03512819077876106\n\
\ },\n \"harness|hendrycksTest-high_school_computer_science|5\": {\n \
\ \"acc\": 0.63,\n \"acc_stderr\": 0.04852365870939099,\n \"acc_norm\"\
: 0.63,\n \"acc_norm_stderr\": 0.04852365870939099\n },\n \"harness|hendrycksTest-high_school_european_history|5\"\
: {\n \"acc\": 0.7090909090909091,\n \"acc_stderr\": 0.03546563019624335,\n\
\ \"acc_norm\": 0.7090909090909091,\n \"acc_norm_stderr\": 0.03546563019624335\n\
\ },\n \"harness|hendrycksTest-high_school_geography|5\": {\n \"acc\"\
: 0.7222222222222222,\n \"acc_stderr\": 0.031911782267135466,\n \"\
acc_norm\": 0.7222222222222222,\n \"acc_norm_stderr\": 0.031911782267135466\n\
\ },\n \"harness|hendrycksTest-high_school_government_and_politics|5\": {\n\
\ \"acc\": 0.844559585492228,\n \"acc_stderr\": 0.026148483469153314,\n\
\ \"acc_norm\": 0.844559585492228,\n \"acc_norm_stderr\": 0.026148483469153314\n\
\ },\n \"harness|hendrycksTest-high_school_macroeconomics|5\": {\n \
\ \"acc\": 0.5794871794871795,\n \"acc_stderr\": 0.025028610276710862,\n\
\ \"acc_norm\": 0.5794871794871795,\n \"acc_norm_stderr\": 0.025028610276710862\n\
\ },\n \"harness|hendrycksTest-high_school_mathematics|5\": {\n \"\
acc\": 0.35555555555555557,\n \"acc_stderr\": 0.02918571494985741,\n \
\ \"acc_norm\": 0.35555555555555557,\n \"acc_norm_stderr\": 0.02918571494985741\n\
\ },\n \"harness|hendrycksTest-high_school_microeconomics|5\": {\n \
\ \"acc\": 0.6470588235294118,\n \"acc_stderr\": 0.031041941304059274,\n\
\ \"acc_norm\": 0.6470588235294118,\n \"acc_norm_stderr\": 0.031041941304059274\n\
\ },\n \"harness|hendrycksTest-high_school_physics|5\": {\n \"acc\"\
: 0.31125827814569534,\n \"acc_stderr\": 0.03780445850526733,\n \"\
acc_norm\": 0.31125827814569534,\n \"acc_norm_stderr\": 0.03780445850526733\n\
\ },\n \"harness|hendrycksTest-high_school_psychology|5\": {\n \"acc\"\
: 0.7761467889908257,\n \"acc_stderr\": 0.017871217767790222,\n \"\
acc_norm\": 0.7761467889908257,\n \"acc_norm_stderr\": 0.017871217767790222\n\
\ },\n \"harness|hendrycksTest-high_school_statistics|5\": {\n \"acc\"\
: 0.46296296296296297,\n \"acc_stderr\": 0.03400603625538271,\n \"\
acc_norm\": 0.46296296296296297,\n \"acc_norm_stderr\": 0.03400603625538271\n\
\ },\n \"harness|hendrycksTest-high_school_us_history|5\": {\n \"acc\"\
: 0.7107843137254902,\n \"acc_stderr\": 0.03182231867647553,\n \"\
acc_norm\": 0.7107843137254902,\n \"acc_norm_stderr\": 0.03182231867647553\n\
\ },\n \"harness|hendrycksTest-high_school_world_history|5\": {\n \"\
acc\": 0.7468354430379747,\n \"acc_stderr\": 0.028304657943035303,\n \
\ \"acc_norm\": 0.7468354430379747,\n \"acc_norm_stderr\": 0.028304657943035303\n\
\ },\n \"harness|hendrycksTest-human_aging|5\": {\n \"acc\": 0.6322869955156951,\n\
\ \"acc_stderr\": 0.03236198350928276,\n \"acc_norm\": 0.6322869955156951,\n\
\ \"acc_norm_stderr\": 0.03236198350928276\n },\n \"harness|hendrycksTest-human_sexuality|5\"\
: {\n \"acc\": 0.6564885496183206,\n \"acc_stderr\": 0.041649760719448786,\n\
\ \"acc_norm\": 0.6564885496183206,\n \"acc_norm_stderr\": 0.041649760719448786\n\
\ },\n \"harness|hendrycksTest-international_law|5\": {\n \"acc\":\
\ 0.7603305785123967,\n \"acc_stderr\": 0.038968789850704164,\n \"\
acc_norm\": 0.7603305785123967,\n \"acc_norm_stderr\": 0.038968789850704164\n\
\ },\n \"harness|hendrycksTest-jurisprudence|5\": {\n \"acc\": 0.7222222222222222,\n\
\ \"acc_stderr\": 0.04330043749650743,\n \"acc_norm\": 0.7222222222222222,\n\
\ \"acc_norm_stderr\": 0.04330043749650743\n },\n \"harness|hendrycksTest-logical_fallacies|5\"\
: {\n \"acc\": 0.7177914110429447,\n \"acc_stderr\": 0.03536117886664743,\n\
\ \"acc_norm\": 0.7177914110429447,\n \"acc_norm_stderr\": 0.03536117886664743\n\
\ },\n \"harness|hendrycksTest-machine_learning|5\": {\n \"acc\": 0.42857142857142855,\n\
\ \"acc_stderr\": 0.04697113923010212,\n \"acc_norm\": 0.42857142857142855,\n\
\ \"acc_norm_stderr\": 0.04697113923010212\n },\n \"harness|hendrycksTest-management|5\"\
: {\n \"acc\": 0.7475728155339806,\n \"acc_stderr\": 0.04301250399690878,\n\
\ \"acc_norm\": 0.7475728155339806,\n \"acc_norm_stderr\": 0.04301250399690878\n\
\ },\n \"harness|hendrycksTest-marketing|5\": {\n \"acc\": 0.8547008547008547,\n\
\ \"acc_stderr\": 0.023086635086841407,\n \"acc_norm\": 0.8547008547008547,\n\
\ \"acc_norm_stderr\": 0.023086635086841407\n },\n \"harness|hendrycksTest-medical_genetics|5\"\
: {\n \"acc\": 0.62,\n \"acc_stderr\": 0.048783173121456316,\n \
\ \"acc_norm\": 0.62,\n \"acc_norm_stderr\": 0.048783173121456316\n \
\ },\n \"harness|hendrycksTest-miscellaneous|5\": {\n \"acc\": 0.7701149425287356,\n\
\ \"acc_stderr\": 0.01504630184669181,\n \"acc_norm\": 0.7701149425287356,\n\
\ \"acc_norm_stderr\": 0.01504630184669181\n },\n \"harness|hendrycksTest-moral_disputes|5\"\
: {\n \"acc\": 0.6502890173410405,\n \"acc_stderr\": 0.025674281456531015,\n\
\ \"acc_norm\": 0.6502890173410405,\n \"acc_norm_stderr\": 0.025674281456531015\n\
\ },\n \"harness|hendrycksTest-moral_scenarios|5\": {\n \"acc\": 0.3005586592178771,\n\
\ \"acc_stderr\": 0.015334566806251159,\n \"acc_norm\": 0.3005586592178771,\n\
\ \"acc_norm_stderr\": 0.015334566806251159\n },\n \"harness|hendrycksTest-nutrition|5\"\
: {\n \"acc\": 0.6372549019607843,\n \"acc_stderr\": 0.027530078447110314,\n\
\ \"acc_norm\": 0.6372549019607843,\n \"acc_norm_stderr\": 0.027530078447110314\n\
\ },\n \"harness|hendrycksTest-philosophy|5\": {\n \"acc\": 0.6430868167202572,\n\
\ \"acc_stderr\": 0.027210420375934016,\n \"acc_norm\": 0.6430868167202572,\n\
\ \"acc_norm_stderr\": 0.027210420375934016\n },\n \"harness|hendrycksTest-prehistory|5\"\
: {\n \"acc\": 0.6790123456790124,\n \"acc_stderr\": 0.025976566010862744,\n\
\ \"acc_norm\": 0.6790123456790124,\n \"acc_norm_stderr\": 0.025976566010862744\n\
\ },\n \"harness|hendrycksTest-professional_accounting|5\": {\n \"\
acc\": 0.4326241134751773,\n \"acc_stderr\": 0.02955545423677885,\n \
\ \"acc_norm\": 0.4326241134751773,\n \"acc_norm_stderr\": 0.02955545423677885\n\
\ },\n \"harness|hendrycksTest-professional_law|5\": {\n \"acc\": 0.38722294654498046,\n\
\ \"acc_stderr\": 0.012441155326854922,\n \"acc_norm\": 0.38722294654498046,\n\
\ \"acc_norm_stderr\": 0.012441155326854922\n },\n \"harness|hendrycksTest-professional_medicine|5\"\
: {\n \"acc\": 0.5845588235294118,\n \"acc_stderr\": 0.02993534270787774,\n\
\ \"acc_norm\": 0.5845588235294118,\n \"acc_norm_stderr\": 0.02993534270787774\n\
\ },\n \"harness|hendrycksTest-professional_psychology|5\": {\n \"\
acc\": 0.576797385620915,\n \"acc_stderr\": 0.019987809769482064,\n \
\ \"acc_norm\": 0.576797385620915,\n \"acc_norm_stderr\": 0.019987809769482064\n\
\ },\n \"harness|hendrycksTest-public_relations|5\": {\n \"acc\": 0.6,\n\
\ \"acc_stderr\": 0.0469237132203465,\n \"acc_norm\": 0.6,\n \
\ \"acc_norm_stderr\": 0.0469237132203465\n },\n \"harness|hendrycksTest-security_studies|5\"\
: {\n \"acc\": 0.636734693877551,\n \"acc_stderr\": 0.030789051139030806,\n\
\ \"acc_norm\": 0.636734693877551,\n \"acc_norm_stderr\": 0.030789051139030806\n\
\ },\n \"harness|hendrycksTest-sociology|5\": {\n \"acc\": 0.7860696517412935,\n\
\ \"acc_stderr\": 0.028996909693328913,\n \"acc_norm\": 0.7860696517412935,\n\
\ \"acc_norm_stderr\": 0.028996909693328913\n },\n \"harness|hendrycksTest-us_foreign_policy|5\"\
: {\n \"acc\": 0.82,\n \"acc_stderr\": 0.03861229196653697,\n \
\ \"acc_norm\": 0.82,\n \"acc_norm_stderr\": 0.03861229196653697\n \
\ },\n \"harness|hendrycksTest-virology|5\": {\n \"acc\": 0.463855421686747,\n\
\ \"acc_stderr\": 0.03882310850890594,\n \"acc_norm\": 0.463855421686747,\n\
\ \"acc_norm_stderr\": 0.03882310850890594\n },\n \"harness|hendrycksTest-world_religions|5\"\
: {\n \"acc\": 0.7894736842105263,\n \"acc_stderr\": 0.03126781714663179,\n\
\ \"acc_norm\": 0.7894736842105263,\n \"acc_norm_stderr\": 0.03126781714663179\n\
\ },\n \"harness|truthfulqa:mc|0\": {\n \"mc1\": 0.3561811505507956,\n\
\ \"mc1_stderr\": 0.016763790728446335,\n \"mc2\": 0.5111163939897228,\n\
\ \"mc2_stderr\": 0.015418045555863789\n },\n \"harness|winogrande|5\"\
: {\n \"acc\": 0.745067087608524,\n \"acc_stderr\": 0.012248806969376422\n\
\ },\n \"harness|gsm8k|5\": {\n \"acc\": 0.2896133434420015,\n \
\ \"acc_stderr\": 0.012493927348659629\n }\n}\n```"
repo_url: https://huggingface.co/ZySec-AI/ZySec-7B
leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
point_of_contact: clementine@hf.co
configs:
- config_name: harness_arc_challenge_25
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|arc:challenge|25_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|arc:challenge|25_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_gsm8k_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|gsm8k|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|gsm8k|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hellaswag_10
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hellaswag|10_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hellaswag|10_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-management|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-virology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-management|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-virology|5_2024-03-22T02-20-54.183750.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_abstract_algebra_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_anatomy_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-anatomy|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-anatomy|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_astronomy_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-astronomy|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-astronomy|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_business_ethics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_clinical_knowledge_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_biology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_biology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_biology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_chemistry_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_computer_science_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_mathematics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_medicine_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_college_physics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-college_physics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_physics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_computer_security_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-computer_security|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-computer_security|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_conceptual_physics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_econometrics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-econometrics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-econometrics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_electrical_engineering_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_elementary_mathematics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_formal_logic_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_global_facts_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-global_facts|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-global_facts|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_biology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_chemistry_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_computer_science_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_european_history_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_geography_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_government_and_politics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_macroeconomics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_mathematics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_microeconomics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_physics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_psychology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_statistics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_us_history_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_high_school_world_history_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_human_aging_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-human_aging|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_aging|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_human_sexuality_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_international_law_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-international_law|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-international_law|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_jurisprudence_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_logical_fallacies_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_machine_learning_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_management_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-management|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-management|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_marketing_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-marketing|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-marketing|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_medical_genetics_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_miscellaneous_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_moral_disputes_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_moral_scenarios_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_nutrition_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-nutrition|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-nutrition|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_philosophy_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-philosophy|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-philosophy|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_prehistory_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-prehistory|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-prehistory|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_professional_accounting_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_professional_law_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-professional_law|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_law|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_professional_medicine_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_professional_psychology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_public_relations_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-public_relations|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-public_relations|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_security_studies_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-security_studies|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-security_studies|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_sociology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-sociology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-sociology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_us_foreign_policy_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_virology_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-virology|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-virology|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_hendrycksTest_world_religions_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|hendrycksTest-world_religions|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-world_religions|5_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_truthfulqa_mc_0
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|truthfulqa:mc|0_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|truthfulqa:mc|0_2024-03-22T02-20-54.183750.parquet'
- config_name: harness_winogrande_5
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- '**/details_harness|winogrande|5_2024-03-22T02-20-54.183750.parquet'
- split: latest
path:
- '**/details_harness|winogrande|5_2024-03-22T02-20-54.183750.parquet'
- config_name: results
data_files:
- split: 2024_03_22T02_20_54.183750
path:
- results_2024-03-22T02-20-54.183750.parquet
- split: latest
path:
- results_2024-03-22T02-20-54.183750.parquet
---
# Dataset Card for Evaluation run of ZySec-AI/ZySec-7B
<!-- Provide a quick summary of the dataset. -->
This dataset was automatically created during the evaluation run of the model [ZySec-AI/ZySec-7B](https://huggingface.co/ZySec-AI/ZySec-7B) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
The dataset is composed of 63 configurations, each corresponding to one of the evaluated tasks.
The dataset was created from 1 run. Each run is stored as a separate split in every configuration, and the split is named after the run's timestamp. The "latest" split always points to the most recent results.
An additional configuration, "results", stores the aggregated results of the run. These are used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
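The configuration names in this card appear to be derived from the task keys used in the results. For example, `harness|arc:challenge|25` becomes `harness_arc_challenge_25`, and each details file name embeds a task key plus a run timestamp. The following sketch captures that mapping, assuming the pattern holds for every task. The rule is inferred from the names listed here and does not come from an official specification. `config_name` and `parse_details_filename` are hypothetical helpers.

```python
from __future__ import annotations


def config_name(task_key: str) -> str:
    """Map a results key to its configuration name (hypothetical helper).

    Inferred rule: '|', ':' and '-' all become '_', e.g.
    'harness|hendrycksTest-conceptual_physics|5'
    -> 'harness_hendrycksTest_conceptual_physics_5'.
    """
    for sep in ("|", ":", "-"):
        task_key = task_key.replace(sep, "_")
    return task_key


def parse_details_filename(path: str) -> tuple[str, str]:
    """Split a details file name into (task key, file timestamp) (hypothetical helper).

    Assumed layout: 'details_<task key>_<timestamp>.parquet', where the
    timestamp itself contains no underscore.
    """
    name = path.rsplit("/", 1)[-1]  # drop any directory or '**/' glob prefix
    prefix, suffix = "details_", ".parquet"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        raise ValueError(f"unexpected details file name: {name!r}")
    stem = name[len(prefix):-len(suffix)]
    task_key, timestamp = stem.rsplit("_", 1)  # split on the last underscore only
    return task_key, timestamp


if __name__ == "__main__":
    key, ts = parse_details_filename(
        "**/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet"
    )
    print(key, ts)           # harness|hendrycksTest-high_school_biology|5 2024-03-22T02-20-54.183750
    print(config_name(key))  # harness_hendrycksTest_high_school_biology_5
```

Splitting on the last underscore keeps subject names such as `high_school_biology` intact, because the timestamp part uses only `-`, `T` and `.`.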
To load the details from a run, you can for instance do the following:
```python
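from datasets import load_dataset
# The "latest" split points to the most recent run; pass a timestamped
# split name such as "2024_03_22T02_20_54.183750" to select a specific run.
data = load_dataset(
    "open-llm-leaderboard/details_ZySec-AI__ZySec-7B",
    "harness_winogrande_5",
    split="latest",
)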
```
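Besides `latest`, every configuration has one split per run. That split is named after the run's timestamp, with `-` and `:` replaced by `_`. The following minimal sketch derives that split name from a run timestamp such as `2024-03-22T02:20:54.183750` and picks the newest run from a list of split names. It assumes the naming pattern seen in this card, and `split_name` and `newest_run_split` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import datetime

# Assumed split-name layout, inferred from '2024_03_22T02_20_54.183750'.
SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def split_name(run_timestamp: str) -> str:
    """Convert '2024-03-22T02:20:54.183750' to '2024_03_22T02_20_54.183750'."""
    return run_timestamp.replace("-", "_").replace(":", "_")


def newest_run_split(split_names: list[str]) -> str:
    """Return the most recent timestamped split, ignoring the 'latest' alias."""
    runs = [s for s in split_names if s != "latest"]
    if not runs:
        raise ValueError("no timestamped run splits found")
    # Parse each name as a datetime so runs are compared chronologically.
    return max(runs, key=lambda s: datetime.strptime(s, SPLIT_FORMAT))


if __name__ == "__main__":
    print(split_name("2024-03-22T02:20:54.183750"))  # 2024_03_22T02_20_54.183750
```

The derived name can be passed as `split=` to `load_dataset` instead of `"latest"`. To see which splits actually exist for a configuration, call `datasets.get_dataset_split_names` with the repository and configuration names.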
## Latest results
These are the [latest results from run 2024-03-22T02:20:54.183750](https://huggingface.co/datasets/open-llm-leaderboard/details_ZySec-AI__ZySec-7B/blob/main/results_2024-03-22T02-20-54.183750.json). Note that the repository may contain results for other tasks if successive evaluations did not cover the same tasks. You can find each of them in the "results" configuration and in the "latest" split of each evaluation:
```python
{
"all": {
"acc": 0.5833427561467389,
"acc_stderr": 0.03342813901320813,
"acc_norm": 0.5898847098167896,
"acc_norm_stderr": 0.034124395634022184,
"mc1": 0.3561811505507956,
"mc1_stderr": 0.016763790728446335,
"mc2": 0.5111163939897228,
"mc2_stderr": 0.015418045555863789
},
"harness|arc:challenge|25": {
"acc": 0.5204778156996587,
"acc_stderr": 0.01459913135303501,
"acc_norm": 0.5750853242320819,
"acc_norm_stderr": 0.014445698968520769
},
"harness|hellaswag|10": {
"acc": 0.5978888667596096,
"acc_stderr": 0.0048932206350117925,
"acc_norm": 0.7972515435172276,
"acc_norm_stderr": 0.004012249939174913
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.25,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.25,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-anatomy|5": {
"acc": 0.5481481481481482,
"acc_stderr": 0.042992689054808644,
"acc_norm": 0.5481481481481482,
"acc_norm_stderr": 0.042992689054808644
},
"harness|hendrycksTest-astronomy|5": {
"acc": 0.618421052631579,
"acc_stderr": 0.03953173377749194,
"acc_norm": 0.618421052631579,
"acc_norm_stderr": 0.03953173377749194
},
"harness|hendrycksTest-business_ethics|5": {
"acc": 0.57,
"acc_stderr": 0.049756985195624284,
"acc_norm": 0.57,
"acc_norm_stderr": 0.049756985195624284
},
"harness|hendrycksTest-clinical_knowledge|5": {
"acc": 0.6490566037735849,
"acc_stderr": 0.02937364625323469,
"acc_norm": 0.6490566037735849,
"acc_norm_stderr": 0.02937364625323469
},
"harness|hendrycksTest-college_biology|5": {
"acc": 0.6319444444444444,
"acc_stderr": 0.040329990539607195,
"acc_norm": 0.6319444444444444,
"acc_norm_stderr": 0.040329990539607195
},
"harness|hendrycksTest-college_chemistry|5": {
"acc": 0.46,
"acc_stderr": 0.05009082659620333,
"acc_norm": 0.46,
"acc_norm_stderr": 0.05009082659620333
},
"harness|hendrycksTest-college_computer_science|5": {
"acc": 0.48,
"acc_stderr": 0.050211673156867795,
"acc_norm": 0.48,
"acc_norm_stderr": 0.050211673156867795
},
"harness|hendrycksTest-college_mathematics|5": {
"acc": 0.32,
"acc_stderr": 0.046882617226215034,
"acc_norm": 0.32,
"acc_norm_stderr": 0.046882617226215034
},
"harness|hendrycksTest-college_medicine|5": {
"acc": 0.5953757225433526,
"acc_stderr": 0.03742461193887249,
"acc_norm": 0.5953757225433526,
"acc_norm_stderr": 0.03742461193887249
},
"harness|hendrycksTest-college_physics|5": {
"acc": 0.3333333333333333,
"acc_stderr": 0.04690650298201942,
"acc_norm": 0.3333333333333333,
"acc_norm_stderr": 0.04690650298201942
},
"harness|hendrycksTest-computer_security|5": {
"acc": 0.75,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.75,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-conceptual_physics|5": {
"acc": 0.5404255319148936,
"acc_stderr": 0.03257901482099835,
"acc_norm": 0.5404255319148936,
"acc_norm_stderr": 0.03257901482099835
},
"harness|hendrycksTest-econometrics|5": {
"acc": 0.42105263157894735,
"acc_stderr": 0.046446020912223177,
"acc_norm": 0.42105263157894735,
"acc_norm_stderr": 0.046446020912223177
},
"harness|hendrycksTest-electrical_engineering|5": {
"acc": 0.5379310344827586,
"acc_stderr": 0.04154659671707548,
"acc_norm": 0.5379310344827586,
"acc_norm_stderr": 0.04154659671707548
},
"harness|hendrycksTest-elementary_mathematics|5": {
"acc": 0.41005291005291006,
"acc_stderr": 0.025331202438944433,
"acc_norm": 0.41005291005291006,
"acc_norm_stderr": 0.025331202438944433
},
"harness|hendrycksTest-formal_logic|5": {
"acc": 0.40476190476190477,
"acc_stderr": 0.04390259265377562,
"acc_norm": 0.40476190476190477,
"acc_norm_stderr": 0.04390259265377562
},
"harness|hendrycksTest-global_facts|5": {
"acc": 0.36,
"acc_stderr": 0.04824181513244218,
"acc_norm": 0.36,
"acc_norm_stderr": 0.04824181513244218
},
"harness|hendrycksTest-high_school_biology|5": {
"acc": 0.6838709677419355,
"acc_stderr": 0.026450874489042778,
"acc_norm": 0.6838709677419355,
"acc_norm_stderr": 0.026450874489042778
},
"harness|hendrycksTest-high_school_chemistry|5": {
"acc": 0.4729064039408867,
"acc_stderr": 0.03512819077876106,
"acc_norm": 0.4729064039408867,
"acc_norm_stderr": 0.03512819077876106
},
"harness|hendrycksTest-high_school_computer_science|5": {
"acc": 0.63,
"acc_stderr": 0.04852365870939099,
"acc_norm": 0.63,
"acc_norm_stderr": 0.04852365870939099
},
"harness|hendrycksTest-high_school_european_history|5": {
"acc": 0.7090909090909091,
"acc_stderr": 0.03546563019624335,
"acc_norm": 0.7090909090909091,
"acc_norm_stderr": 0.03546563019624335
},
"harness|hendrycksTest-high_school_geography|5": {
"acc": 0.7222222222222222,
"acc_stderr": 0.031911782267135466,
"acc_norm": 0.7222222222222222,
"acc_norm_stderr": 0.031911782267135466
},
"harness|hendrycksTest-high_school_government_and_politics|5": {
"acc": 0.844559585492228,
"acc_stderr": 0.026148483469153314,
"acc_norm": 0.844559585492228,
"acc_norm_stderr": 0.026148483469153314
},
"harness|hendrycksTest-high_school_macroeconomics|5": {
"acc": 0.5794871794871795,
"acc_stderr": 0.025028610276710862,
"acc_norm": 0.5794871794871795,
"acc_norm_stderr": 0.025028610276710862
},
"harness|hendrycksTest-high_school_mathematics|5": {
"acc": 0.35555555555555557,
"acc_stderr": 0.02918571494985741,
"acc_norm": 0.35555555555555557,
"acc_norm_stderr": 0.02918571494985741
},
"harness|hendrycksTest-high_school_microeconomics|5": {
"acc": 0.6470588235294118,
"acc_stderr": 0.031041941304059274,
"acc_norm": 0.6470588235294118,
"acc_norm_stderr": 0.031041941304059274
},
"harness|hendrycksTest-high_school_physics|5": {
"acc": 0.31125827814569534,
"acc_stderr": 0.03780445850526733,
"acc_norm": 0.31125827814569534,
"acc_norm_stderr": 0.03780445850526733
},
"harness|hendrycksTest-high_school_psychology|5": {
"acc": 0.7761467889908257,
"acc_stderr": 0.017871217767790222,
"acc_norm": 0.7761467889908257,
"acc_norm_stderr": 0.017871217767790222
},
"harness|hendrycksTest-high_school_statistics|5": {
"acc": 0.46296296296296297,
"acc_stderr": 0.03400603625538271,
"acc_norm": 0.46296296296296297,
"acc_norm_stderr": 0.03400603625538271
},
"harness|hendrycksTest-high_school_us_history|5": {
"acc": 0.7107843137254902,
"acc_stderr": 0.03182231867647553,
"acc_norm": 0.7107843137254902,
"acc_norm_stderr": 0.03182231867647553
},
"harness|hendrycksTest-high_school_world_history|5": {
"acc": 0.7468354430379747,
"acc_stderr": 0.028304657943035303,
"acc_norm": 0.7468354430379747,
"acc_norm_stderr": 0.028304657943035303
},
"harness|hendrycksTest-human_aging|5": {
"acc": 0.6322869955156951,
"acc_stderr": 0.03236198350928276,
"acc_norm": 0.6322869955156951,
"acc_norm_stderr": 0.03236198350928276
},
"harness|hendrycksTest-human_sexuality|5": {
"acc": 0.6564885496183206,
"acc_stderr": 0.041649760719448786,
"acc_norm": 0.6564885496183206,
"acc_norm_stderr": 0.041649760719448786
},
"harness|hendrycksTest-international_law|5": {
"acc": 0.7603305785123967,
"acc_stderr": 0.038968789850704164,
"acc_norm": 0.7603305785123967,
"acc_norm_stderr": 0.038968789850704164
},
"harness|hendrycksTest-jurisprudence|5": {
"acc": 0.7222222222222222,
"acc_stderr": 0.04330043749650743,
"acc_norm": 0.7222222222222222,
"acc_norm_stderr": 0.04330043749650743
},
"harness|hendrycksTest-logical_fallacies|5": {
"acc": 0.7177914110429447,
"acc_stderr": 0.03536117886664743,
"acc_norm": 0.7177914110429447,
"acc_norm_stderr": 0.03536117886664743
},
"harness|hendrycksTest-machine_learning|5": {
"acc": 0.42857142857142855,
"acc_stderr": 0.04697113923010212,
"acc_norm": 0.42857142857142855,
"acc_norm_stderr": 0.04697113923010212
},
"harness|hendrycksTest-management|5": {
"acc": 0.7475728155339806,
"acc_stderr": 0.04301250399690878,
"acc_norm": 0.7475728155339806,
"acc_norm_stderr": 0.04301250399690878
},
"harness|hendrycksTest-marketing|5": {
"acc": 0.8547008547008547,
"acc_stderr": 0.023086635086841407,
"acc_norm": 0.8547008547008547,
"acc_norm_stderr": 0.023086635086841407
},
"harness|hendrycksTest-medical_genetics|5": {
"acc": 0.62,
"acc_stderr": 0.048783173121456316,
"acc_norm": 0.62,
"acc_norm_stderr": 0.048783173121456316
},
"harness|hendrycksTest-miscellaneous|5": {
"acc": 0.7701149425287356,
"acc_stderr": 0.01504630184669181,
"acc_norm": 0.7701149425287356,
"acc_norm_stderr": 0.01504630184669181
},
"harness|hendrycksTest-moral_disputes|5": {
"acc": 0.6502890173410405,
"acc_stderr": 0.025674281456531015,
"acc_norm": 0.6502890173410405,
"acc_norm_stderr": 0.025674281456531015
},
"harness|hendrycksTest-moral_scenarios|5": {
"acc": 0.3005586592178771,
"acc_stderr": 0.015334566806251159,
"acc_norm": 0.3005586592178771,
"acc_norm_stderr": 0.015334566806251159
},
"harness|hendrycksTest-nutrition|5": {
"acc": 0.6372549019607843,
"acc_stderr": 0.027530078447110314,
"acc_norm": 0.6372549019607843,
"acc_norm_stderr": 0.027530078447110314
},
"harness|hendrycksTest-philosophy|5": {
"acc": 0.6430868167202572,
"acc_stderr": 0.027210420375934016,
"acc_norm": 0.6430868167202572,
"acc_norm_stderr": 0.027210420375934016
},
"harness|hendrycksTest-prehistory|5": {
"acc": 0.6790123456790124,
"acc_stderr": 0.025976566010862744,
"acc_norm": 0.6790123456790124,
"acc_norm_stderr": 0.025976566010862744
},
"harness|hendrycksTest-professional_accounting|5": {
"acc": 0.4326241134751773,
"acc_stderr": 0.02955545423677885,
"acc_norm": 0.4326241134751773,
"acc_norm_stderr": 0.02955545423677885
},
"harness|hendrycksTest-professional_law|5": {
"acc": 0.38722294654498046,
"acc_stderr": 0.012441155326854922,
"acc_norm": 0.38722294654498046,
"acc_norm_stderr": 0.012441155326854922
},
"harness|hendrycksTest-professional_medicine|5": {
"acc": 0.5845588235294118,
"acc_stderr": 0.02993534270787774,
"acc_norm": 0.5845588235294118,
"acc_norm_stderr": 0.02993534270787774
},
"harness|hendrycksTest-professional_psychology|5": {
"acc": 0.576797385620915,
"acc_stderr": 0.019987809769482064,
"acc_norm": 0.576797385620915,
"acc_norm_stderr": 0.019987809769482064
},
"harness|hendrycksTest-public_relations|5": {
"acc": 0.6,
"acc_stderr": 0.0469237132203465,
"acc_norm": 0.6,
"acc_norm_stderr": 0.0469237132203465
},
"harness|hendrycksTest-security_studies|5": {
"acc": 0.636734693877551,
"acc_stderr": 0.030789051139030806,
"acc_norm": 0.636734693877551,
"acc_norm_stderr": 0.030789051139030806
},
"harness|hendrycksTest-sociology|5": {
"acc": 0.7860696517412935,
"acc_stderr": 0.028996909693328913,
"acc_norm": 0.7860696517412935,
"acc_norm_stderr": 0.028996909693328913
},
"harness|hendrycksTest-us_foreign_policy|5": {
"acc": 0.82,
"acc_stderr": 0.03861229196653697,
"acc_norm": 0.82,
"acc_norm_stderr": 0.03861229196653697
},
"harness|hendrycksTest-virology|5": {
"acc": 0.463855421686747,
"acc_stderr": 0.03882310850890594,
"acc_norm": 0.463855421686747,
"acc_norm_stderr": 0.03882310850890594
},
"harness|hendrycksTest-world_religions|5": {
"acc": 0.7894736842105263,
"acc_stderr": 0.03126781714663179,
"acc_norm": 0.7894736842105263,
"acc_norm_stderr": 0.03126781714663179
},
"harness|truthfulqa:mc|0": {
"mc1": 0.3561811505507956,
"mc1_stderr": 0.016763790728446335,
"mc2": 0.5111163939897228,
"mc2_stderr": 0.015418045555863789
},
"harness|winogrande|5": {
"acc": 0.745067087608524,
"acc_stderr": 0.012248806969376422
},
"harness|gsm8k|5": {
"acc": 0.2896133434420015,
"acc_stderr": 0.012493927348659629
}
}
```
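Two things can be read from these numbers with a little arithmetic. First, for the MMLU (`hendrycksTest`) subtasks, each `acc_stderr` is consistent with the sample standard error of a proportion, `sqrt(p * (1 - p) / (n - 1))`. For example, `abstract_algebra` has 100 test questions, and `p = 0.25` gives about 0.04352, which matches the reported value. Second, a single MMLU figure can be summarized as the unweighted mean of the subtask accuracies. The sketch below computes both. The standard-error formula is checked here only against a few of the reported values, and treating the unweighted mean as the leaderboard's own MMLU score is an assumption.

```python
import math
from statistics import mean


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy p over n questions, using the
    (n - 1) sample-variance denominator: sqrt(p * (1 - p) / (n - 1))."""
    if n < 2:
        raise ValueError("need at least two questions")
    return math.sqrt(p * (1.0 - p) / (n - 1))


def mmlu_mean_acc(results: dict) -> float:
    """Unweighted mean of 'acc' over every hendrycksTest (MMLU) entry."""
    accs = [
        metrics["acc"]
        for key, metrics in results.items()
        if key.startswith("harness|hendrycksTest-")
    ]
    if not accs:
        raise ValueError("no hendrycksTest entries found")
    return mean(accs)


if __name__ == "__main__":
    # abstract_algebra: acc 0.25 on 100 questions; reported acc_stderr 0.04351941398892446
    print(binomial_stderr(0.25, 100))
    excerpt = {
        "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.25},
        "harness|hendrycksTest-computer_security|5": {"acc": 0.75},
        "harness|winogrande|5": {"acc": 0.745067087608524},  # not MMLU, ignored
    }
    print(mmlu_mean_acc(excerpt))  # 0.5
```

If you pass the full dictionary shown above to `mmlu_mean_acc`, it averages every `hendrycksTest` entry and ignores the other tasks.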
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
[More Information Needed]
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
[More Information Needed]
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
[More Information Needed]
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
[More Information Needed]
Provided by:
open-llm-leaderboard-old
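Summary of the original information
Dataset overview
Basic information
- Dataset name: Evaluation run of ZySec-AI/ZySec-7B
- Description: created automatically during the evaluation of the model ZySec-AI/ZySec-7B for the Open LLM Leaderboard.
Dataset structure
- Number of configurations: 63, each corresponding to one evaluated task.
- Data source: the dataset was created from 1 run. Each run is stored as a specific split in each configuration, named with the run's timestamp. The "latest" split always points to the most recent results.
- Additional configuration: the "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.
Loading example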
```python
from datasets import load_dataset
data = load_dataset(
    "open-llm-leaderboard/details_ZySec-AI__ZySec-7B",
    "harness_winogrande_5",
    split="latest",
)
```
Latest results
- Timestamp of the latest results: 2024-03-22T02:20:54.183750
- Example results (excerpt):
```python
{
"all": {
"acc": 0.5833427561467389,
"acc_stderr": 0.03342813901320813,
"acc_norm": 0.5898847098167896,
"acc_norm_stderr": 0.034124395634022184,
"mc1": 0.3561811505507956,
"mc1_stderr": 0.016763790728446335,
"mc2": 0.5111163939897228,
"mc2_stderr": 0.015418045555863789
},
"harness|arc:challenge|25": {
"acc": 0.5204778156996587,
"acc_stderr": 0.01459913135303501,
"acc_norm": 0.5750853242320819,
"acc_norm_stderr": 0.014445698968520769
},
"harness|hellaswag|10": {
"acc": 0.5978888667596096,
"acc_stderr": 0.0048932206350117925,
"acc_norm": 0.7972515435172276,
"acc_norm_stderr": 0.004012249939174913
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.25,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.25,
"acc_norm_stderr": 0.04351941398892446
},
# Results for the other tasks are omitted here.
}
```
Configuration details
- Config name: harness_arc_challenge_25
- Data files:
- Split: 2024_03_22T02_20_54.183750
- Path: **/details_harness|arc:challenge|25_2024-03-22T02-20-54.183750.parquet
- Split: latest
- Path: **/details_harness|arc:challenge|25_2024-03-22T02-20-54.183750.parquet
- Config name: harness_gsm8k_5
- Data files:
- Split: 2024_03_22T02_20_54.183750
- Path: **/details_harness|gsm8k|5_2024-03-22T02-20-54.183750.parquet
- Split: latest
- Path: **/details_harness|gsm8k|5_2024-03-22T02-20-54.183750.parquet
- Config name: harness_hellaswag_10
- Data files:
- Split: 2024_03_22T02_20_54.183750
- Path: **/details_harness|hellaswag|10_2024-03-22T02-20-54.183750.parquet
- Split: latest
- Path: **/details_harness|hellaswag|10_2024-03-22T02-20-54.183750.parquet
- Config name: harness_hendrycksTest_5
- Data files:
- Split: 2024_03_22T02_20_54.183750
- Paths:
- **/details_harness|hendrycksTest-abstract_algebra|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-anatomy|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-astronomy|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-business_ethics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-clinical_knowledge|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_biology|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_chemistry|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_computer_science|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_mathematics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_medicine|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-college_physics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-computer_security|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-conceptual_physics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-econometrics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-electrical_engineering|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-elementary_mathematics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-formal_logic|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-global_facts|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_biology|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_chemistry|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_computer_science|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_european_history|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_geography|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_mathematics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_microeconomics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_physics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_psychology|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_statistics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_us_history|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-high_school_world_history|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-human_aging|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-human_sexuality|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-international_law|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-jurisprudence|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-logical_fallacies|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-machine_learning|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-management|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-marketing|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-medical_genetics|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-miscellaneous|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-moral_disputes|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-moral_scenarios|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-nutrition|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-philosophy|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-prehistory|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-professional_accounting|5_2024-03-22T02-20-54.183750.parquet
- **/details_harness|hendrycksTest-professional_law|5_2024-03-22T02-20-54.183750.parquet
- **/details
- 路径:
- 分割: 2024_03_22T02_20_54.183750
- 数据文件:
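The configuration names, split names, and file paths listed above follow a regular pattern. A results key such as harness|arc:challenge|25 becomes the configuration harness_arc_challenge_25, and the run timestamp appears in both the split name and the file name. The helpers below sketch that pattern. The rule is inferred only from the entries shown here, and config_name, split_name, and details_glob are hypothetical helpers, not part of the datasets library.

```python
import re

# Timestamp of the latest run, from the results above.
RUN_TIMESTAMP = "2024-03-22T02:20:54.183750"


def config_name(task_key: str) -> str:
    """Map a results key such as 'harness|arc:challenge|25' to a configuration
    name such as 'harness_arc_challenge_25' ('|', ':' and '-' become '_')."""
    return re.sub(r"[|:\-]", "_", task_key)


def split_name(run_timestamp: str) -> str:
    """'2024-03-22T02:20:54.183750' -> '2024_03_22T02_20_54.183750'."""
    return re.sub(r"[-:]", "_", run_timestamp)


def details_glob(task_key: str, run_timestamp: str) -> str:
    """Glob pattern of the per-task details Parquet file for one run;
    the file name uses '-' in place of ':' in the timestamp."""
    file_ts = run_timestamp.replace(":", "-")
    return f"**/details_{task_key}_{file_ts}.parquet"


for key in ["harness|arc:challenge|25", "harness|gsm8k|5", "harness|hellaswag|10"]:
    print(config_name(key), split_name(RUN_TIMESTAMP), details_glob(key, RUN_TIMESTAMP))
```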
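Collected summary
Dataset overview

Construction
This dataset is an automated by-product of the Open LLM Leaderboard and was generated by its standardized evaluation pipeline. It contains the results of evaluating ZySec-7B on 63 tasks, with each task stored as its own configuration. Each evaluation run is stored as a split named after the run's timestamp, so every result can be traced to a specific run. The "train" split always points to the latest results. An additional "results" configuration aggregates the metrics of the run(s), giving a structured summary of the model's overall performance.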
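Features
The dataset covers a broad range of tasks, from commonsense reasoning to specialized knowledge. It includes general reasoning benchmarks such as ARC-Challenge and HellaSwag. It also includes the MMLU (hendrycksTest) subtasks, which cover 57 subjects such as mathematics, physics, law, and medicine. The aggregated results report each task's accuracy and its standard error (for example acc, acc_norm, and their *_stderr counterparts), while the per-task configurations hold the detailed evaluation records. Each run is stored as a timestamped split alongside a "latest" split, so the layout supports tracking a model's performance across runs. This particular dataset, however, was created from a single run (2024-03-22).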
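Each accuracy comes with a standard error, which can be turned into an approximate confidence interval. The sketch below uses a normal approximation (acc ± z × stderr) with the ARC-Challenge values from the results section. The normal approximation is an assumption of this example; the leaderboard itself reports only the standard error.

```python
from statistics import NormalDist
from typing import Tuple

# Two-sided 95% quantile of the standard normal distribution (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def normal_ci(acc: float, stderr: float, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation confidence interval acc +/- z * stderr."""
    return acc - z * stderr, acc + z * stderr


# ARC-Challenge (25-shot) values from the results section.
for metric, acc, stderr in [
    ("acc", 0.5204778156996587, 0.01459913135303501),
    ("acc_norm", 0.5750853242320819, 0.014445698968520769),
]:
    lo, hi = normal_ci(acc, stderr)
    print(f"arc:challenge {metric}: {acc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

For ARC-Challenge these intervals span roughly ±0.03. Keep that width in mind when comparing models whose scores differ by only a point or two.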
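Usage
Researchers who want to analyze the model's performance in depth can load any task configuration and split with the Hugging Face datasets library. For example, passing the "harness_winogrande_5" configuration to load_dataset returns the detailed evaluation data for that task. Earlier runs can be retrieved by their timestamp split, and the "train" split gives direct access to the latest results. Users can therefore examine individual tasks in detail, or use the aggregated "results" configuration to compare overall performance across models.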
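Each run is stored under a split named after its timestamp, so choosing a specific run, or the most recent one, comes down to parsing those names. Below is a minimal sketch that assumes the split-name format shown in the configuration listing. The second timestamp in the example is hypothetical, because this dataset contains only one run, and parse_split and latest_split are illustrative helpers.

```python
from datetime import datetime
from typing import Iterable, Optional

# Format of the timestamped split names, e.g. '2024_03_22T02_20_54.183750'.
SPLIT_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def parse_split(name: str) -> Optional[datetime]:
    """Return the run time encoded in a split name, or None for
    non-timestamp names such as 'latest' or 'train'."""
    try:
        return datetime.strptime(name, SPLIT_FORMAT)
    except ValueError:
        return None


def latest_split(split_names: Iterable[str]) -> str:
    """Pick the most recent timestamped split from a list of split names."""
    runs = [(parse_split(s), s) for s in split_names]
    runs = [(t, s) for t, s in runs if t is not None]
    if not runs:
        raise ValueError("no timestamped splits found")
    return max(runs)[1]


# The second timestamp is HYPOTHETICAL; this dataset contains a single run.
names = ["2024_03_22T02_20_54.183750", "latest", "2024_01_05T10_00_00.000000"]
print(latest_split(names))
# With network access, real split names could be fetched with
# datasets.get_dataset_split_names(<repo id>, "harness_winogrande_5")
# and the chosen name passed as split=... to load_dataset.
```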
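Background and challenges
Background overview
As large language models (LLMs) have developed rapidly, evaluating their overall capabilities has become essential to advancing the technology. Hugging Face launched the Open LLM Leaderboard in 2023 to provide a standardized, multi-benchmark framework for systematically measuring LLMs on reasoning, knowledge, and truthfulness. The leaderboard combines established benchmarks such as ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K. This gives the research community a transparent, reproducible basis for comparing models, which has encouraged iteration on open models and academic exchange. The dataset "open-llm-leaderboard-old/details_ZySec-AI__ZySec-7B" is one output of this evaluation pipeline. It records the detailed results of evaluating ZySec-7B in March 2024 and provides empirical data for follow-up research.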
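Current challenges
The core challenge the Open LLM Leaderboard addresses is how to evaluate the complex capabilities of language models comprehensively and fairly. Traditional evaluations often focus on a single task or domain and fail to capture how a model actually performs on cross-disciplinary knowledge, logical reasoning, and ethical alignment. By combining diverse benchmarks, the leaderboard aims to reduce fragmented evaluation and make a model's generalization easier to quantify. Building and maintaining the dataset also raises technical challenges:
- integrating heterogeneous output formats from different benchmarks while keeping the evaluation pipeline automated and scalable;
- designing a single aggregation method that balances the tasks and limits evaluation bias;
- managing evolving model versions and results so that the data stays consistent and traceable.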
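Common scenarios
Typical use case
As a record of Open LLM Leaderboard evaluations, the dataset is typically used to give researchers detailed performance data for ZySec-7B on a set of standardized benchmarks. Its 63 task configurations cover ARC-Challenge, HellaSwag, MMLU, TruthfulQA, and others. Together they show how the model performs on commonsense reasoning, language understanding, specialized knowledge, and truthfulness, and they provide a basis for comparing it with other models.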
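Research problems addressed
The dataset helps address the lack of unified, transparent, and reproducible benchmark comparisons for LLMs. Because it brings together results from several established benchmarks, researchers can probe the model's capabilities across domains, identify its strengths and weaknesses, and choose directions for improvement. More broadly, it contributes to a standardized evaluation framework that supports the open-model community and a more rigorous quantification of model capabilities.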
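Related work
Related work builds mainly on the Open LLM Leaderboard evaluation framework as a whole rather than on this specific dataset. These studies analyze how model architecture, training data, and scale affect overall performance. They complement broader evaluation efforts such as HELM (Holistic Evaluation of Language Models), which predates the leaderboard. To address weaknesses exposed by these evaluations, later work has proposed improvements such as instruction tuning and chain-of-thought prompting to raise accuracy on mathematical reasoning and factual question answering.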
The above content was collected and summarized by 遇见数据集.



