open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b
Hugging Face | Updated 2024-04-20 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b
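
The card's `configs` section names each configuration after a harness task id and each split after the run timestamp. A minimal sketch of that mapping, inferred only from the entries listed in this card, may help when picking what to load. The helper names `config_name`, `split_name`, and `parquet_glob` are hypothetical and are not part of the `datasets` library or the leaderboard tooling.

```python
import re


def config_name(task: str) -> str:
    """Map a harness task id such as 'harness|arc:challenge|25' to a config name."""
    # Assumption drawn from the listed configs: every character that is not
    # a letter, digit, or underscore is replaced by an underscore.
    return re.sub(r"[^0-9A-Za-z_]", "_", task)


def split_name(timestamp: str) -> str:
    """Map a run timestamp to the split name used in the card."""
    # '-' and ':' become '_'; the dot before the fractional seconds is kept.
    return timestamp.replace("-", "_").replace(":", "_")


def parquet_glob(task: str, timestamp: str) -> str:
    """Build the data_files glob listed for one task and one run."""
    # File names keep the task id verbatim and use '-' instead of ':' in the time.
    return f"**/details_{task}_{timestamp.replace(':', '-')}.parquet"


if __name__ == "__main__":
    run = "2024-04-20T21:23:35.453083"
    print(config_name("harness|hendrycksTest-anatomy|5"))
    print(split_name(run))
    print(parquet_glob("harness|gsm8k|5", run))
```

The aggregate `harness_hendrycksTest_5` configuration does not follow this one-task mapping, since it lists the parquet files of all `hendrycksTest` subtasks together.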
Resource description:
---
pretty_name: Evaluation run of Locutusque/llama-3-neural-chat-v1-8b
dataset_summary: "Dataset automatically created during the evaluation run of model\
\ [Locutusque/llama-3-neural-chat-v1-8b](https://huggingface.co/Locutusque/llama-3-neural-chat-v1-8b)\
\ on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).\n\
\nThe dataset is composed of 63 configurations, each one corresponding to one of the\
\ evaluated tasks.\n\nThe dataset has been created from 1 run(s). Each run can be\
\ found as a specific split in each configuration, the split being named using the\
\ timestamp of the run. The \"latest\" split always points to the latest results.\n\
\nAn additional configuration \"results\" stores all the aggregated results of the\
\ run (and is used to compute and display the aggregated metrics on the [Open LLM\
\ Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).\n\
\nTo load the details from a run, you can, for instance, do the following:\n```python\n\
from datasets import load_dataset\ndata = load_dataset(\"open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b\"\
,\n\t\"harness_winogrande_5\",\n\tsplit=\"latest\")\n```\n\n## Latest results\n\n\
These are the [latest results from run 2024-04-20T21:23:35.453083](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b/blob/main/results_2024-04-20T21-23-35.453083.json) (note\
\ that there might be results for other tasks in the repo if successive evals didn't\
\ cover the same tasks. You can find each in the results and the \"latest\" split for\
\ each eval):\n\n```python\n{\n \"all\": {\n \"acc\": 0.6463757768465722,\n\
\ \"acc_stderr\": 0.032443331188726734,\n \"acc_norm\": 0.6495082726667307,\n\
\ \"acc_norm_stderr\": 0.033092506073055875,\n \"mc1\": 0.390452876376989,\n\
\ \"mc1_stderr\": 0.017078230743431448,\n \"mc2\": 0.5634222670773993,\n\
\ \"mc2_stderr\": 0.015351979609326523\n },\n \"harness|arc:challenge|25\"\
: {\n \"acc\": 0.5827645051194539,\n \"acc_stderr\": 0.014409825518403077,\n\
\ \"acc_norm\": 0.6083617747440273,\n \"acc_norm_stderr\": 0.014264122124938213\n\
\ },\n \"harness|hellaswag|10\": {\n \"acc\": 0.6444931288587931,\n\
\ \"acc_stderr\": 0.004776883632722615,\n \"acc_norm\": 0.8412666799442342,\n\
\ \"acc_norm_stderr\": 0.0036468038997703434\n },\n \"harness|hendrycksTest-abstract_algebra|5\"\
: {\n \"acc\": 0.37,\n \"acc_stderr\": 0.04852365870939099,\n \
\ \"acc_norm\": 0.37,\n \"acc_norm_stderr\": 0.04852365870939099\n \
\ },\n \"harness|hendrycksTest-anatomy|5\": {\n \"acc\": 0.6222222222222222,\n\
\ \"acc_stderr\": 0.04188307537595853,\n \"acc_norm\": 0.6222222222222222,\n\
\ \"acc_norm_stderr\": 0.04188307537595853\n },\n \"harness|hendrycksTest-astronomy|5\"\
: {\n \"acc\": 0.6710526315789473,\n \"acc_stderr\": 0.03823428969926604,\n\
\ \"acc_norm\": 0.6710526315789473,\n \"acc_norm_stderr\": 0.03823428969926604\n\
\ },\n \"harness|hendrycksTest-business_ethics|5\": {\n \"acc\": 0.65,\n\
\ \"acc_stderr\": 0.047937248544110196,\n \"acc_norm\": 0.65,\n \
\ \"acc_norm_stderr\": 0.047937248544110196\n },\n \"harness|hendrycksTest-clinical_knowledge|5\"\
: {\n \"acc\": 0.7471698113207547,\n \"acc_stderr\": 0.026749899771241207,\n\
\ \"acc_norm\": 0.7471698113207547,\n \"acc_norm_stderr\": 0.026749899771241207\n\
\ },\n \"harness|hendrycksTest-college_biology|5\": {\n \"acc\": 0.7569444444444444,\n\
\ \"acc_stderr\": 0.03586879280080342,\n \"acc_norm\": 0.7569444444444444,\n\
\ \"acc_norm_stderr\": 0.03586879280080342\n },\n \"harness|hendrycksTest-college_chemistry|5\"\
: {\n \"acc\": 0.41,\n \"acc_stderr\": 0.049431107042371025,\n \
\ \"acc_norm\": 0.41,\n \"acc_norm_stderr\": 0.049431107042371025\n \
\ },\n \"harness|hendrycksTest-college_computer_science|5\": {\n \"\
acc\": 0.48,\n \"acc_stderr\": 0.050211673156867795,\n \"acc_norm\"\
: 0.48,\n \"acc_norm_stderr\": 0.050211673156867795\n },\n \"harness|hendrycksTest-college_mathematics|5\"\
: {\n \"acc\": 0.41,\n \"acc_stderr\": 0.049431107042371025,\n \
\ \"acc_norm\": 0.41,\n \"acc_norm_stderr\": 0.049431107042371025\n \
\ },\n \"harness|hendrycksTest-college_medicine|5\": {\n \"acc\": 0.6069364161849711,\n\
\ \"acc_stderr\": 0.0372424959581773,\n \"acc_norm\": 0.6069364161849711,\n\
\ \"acc_norm_stderr\": 0.0372424959581773\n },\n \"harness|hendrycksTest-college_physics|5\"\
: {\n \"acc\": 0.4411764705882353,\n \"acc_stderr\": 0.049406356306056595,\n\
\ \"acc_norm\": 0.4411764705882353,\n \"acc_norm_stderr\": 0.049406356306056595\n\
\ },\n \"harness|hendrycksTest-computer_security|5\": {\n \"acc\":\
\ 0.8,\n \"acc_stderr\": 0.04020151261036846,\n \"acc_norm\": 0.8,\n\
\ \"acc_norm_stderr\": 0.04020151261036846\n },\n \"harness|hendrycksTest-conceptual_physics|5\"\
: {\n \"acc\": 0.5531914893617021,\n \"acc_stderr\": 0.0325005368436584,\n\
\ \"acc_norm\": 0.5531914893617021,\n \"acc_norm_stderr\": 0.0325005368436584\n\
\ },\n \"harness|hendrycksTest-econometrics|5\": {\n \"acc\": 0.5087719298245614,\n\
\ \"acc_stderr\": 0.04702880432049615,\n \"acc_norm\": 0.5087719298245614,\n\
\ \"acc_norm_stderr\": 0.04702880432049615\n },\n \"harness|hendrycksTest-electrical_engineering|5\"\
: {\n \"acc\": 0.6137931034482759,\n \"acc_stderr\": 0.04057324734419035,\n\
\ \"acc_norm\": 0.6137931034482759,\n \"acc_norm_stderr\": 0.04057324734419035\n\
\ },\n \"harness|hendrycksTest-elementary_mathematics|5\": {\n \"acc\"\
: 0.4021164021164021,\n \"acc_stderr\": 0.025253032554997695,\n \"\
acc_norm\": 0.4021164021164021,\n \"acc_norm_stderr\": 0.025253032554997695\n\
\ },\n \"harness|hendrycksTest-formal_logic|5\": {\n \"acc\": 0.5,\n\
\ \"acc_stderr\": 0.04472135954999579,\n \"acc_norm\": 0.5,\n \
\ \"acc_norm_stderr\": 0.04472135954999579\n },\n \"harness|hendrycksTest-global_facts|5\"\
: {\n \"acc\": 0.43,\n \"acc_stderr\": 0.04975698519562428,\n \
\ \"acc_norm\": 0.43,\n \"acc_norm_stderr\": 0.04975698519562428\n \
\ },\n \"harness|hendrycksTest-high_school_biology|5\": {\n \"acc\": 0.7548387096774194,\n\
\ \"acc_stderr\": 0.024472243840895504,\n \"acc_norm\": 0.7548387096774194,\n\
\ \"acc_norm_stderr\": 0.024472243840895504\n },\n \"harness|hendrycksTest-high_school_chemistry|5\"\
: {\n \"acc\": 0.49261083743842365,\n \"acc_stderr\": 0.035176035403610084,\n\
\ \"acc_norm\": 0.49261083743842365,\n \"acc_norm_stderr\": 0.035176035403610084\n\
\ },\n \"harness|hendrycksTest-high_school_computer_science|5\": {\n \
\ \"acc\": 0.67,\n \"acc_stderr\": 0.047258156262526094,\n \"acc_norm\"\
: 0.67,\n \"acc_norm_stderr\": 0.047258156262526094\n },\n \"harness|hendrycksTest-high_school_european_history|5\"\
: {\n \"acc\": 0.7515151515151515,\n \"acc_stderr\": 0.033744026441394036,\n\
\ \"acc_norm\": 0.7515151515151515,\n \"acc_norm_stderr\": 0.033744026441394036\n\
\ },\n \"harness|hendrycksTest-high_school_geography|5\": {\n \"acc\"\
: 0.7626262626262627,\n \"acc_stderr\": 0.0303137105381989,\n \"acc_norm\"\
: 0.7626262626262627,\n \"acc_norm_stderr\": 0.0303137105381989\n },\n\
\ \"harness|hendrycksTest-high_school_government_and_politics|5\": {\n \
\ \"acc\": 0.8808290155440415,\n \"acc_stderr\": 0.02338193534812143,\n\
\ \"acc_norm\": 0.8808290155440415,\n \"acc_norm_stderr\": 0.02338193534812143\n\
\ },\n \"harness|hendrycksTest-high_school_macroeconomics|5\": {\n \
\ \"acc\": 0.6,\n \"acc_stderr\": 0.02483881198803316,\n \"acc_norm\"\
: 0.6,\n \"acc_norm_stderr\": 0.02483881198803316\n },\n \"harness|hendrycksTest-high_school_mathematics|5\"\
: {\n \"acc\": 0.3592592592592593,\n \"acc_stderr\": 0.029252905927251976,\n\
\ \"acc_norm\": 0.3592592592592593,\n \"acc_norm_stderr\": 0.029252905927251976\n\
\ },\n \"harness|hendrycksTest-high_school_microeconomics|5\": {\n \
\ \"acc\": 0.7310924369747899,\n \"acc_stderr\": 0.028801392193631276,\n\
\ \"acc_norm\": 0.7310924369747899,\n \"acc_norm_stderr\": 0.028801392193631276\n\
\ },\n \"harness|hendrycksTest-high_school_physics|5\": {\n \"acc\"\
: 0.423841059602649,\n \"acc_stderr\": 0.04034846678603397,\n \"acc_norm\"\
: 0.423841059602649,\n \"acc_norm_stderr\": 0.04034846678603397\n },\n\
\ \"harness|hendrycksTest-high_school_psychology|5\": {\n \"acc\": 0.8165137614678899,\n\
\ \"acc_stderr\": 0.0165952597103993,\n \"acc_norm\": 0.8165137614678899,\n\
\ \"acc_norm_stderr\": 0.0165952597103993\n },\n \"harness|hendrycksTest-high_school_statistics|5\"\
: {\n \"acc\": 0.4675925925925926,\n \"acc_stderr\": 0.03402801581358966,\n\
\ \"acc_norm\": 0.4675925925925926,\n \"acc_norm_stderr\": 0.03402801581358966\n\
\ },\n \"harness|hendrycksTest-high_school_us_history|5\": {\n \"acc\"\
: 0.8137254901960784,\n \"acc_stderr\": 0.027325470966716312,\n \"\
acc_norm\": 0.8137254901960784,\n \"acc_norm_stderr\": 0.027325470966716312\n\
\ },\n \"harness|hendrycksTest-high_school_world_history|5\": {\n \"\
acc\": 0.7890295358649789,\n \"acc_stderr\": 0.026558372502661916,\n \
\ \"acc_norm\": 0.7890295358649789,\n \"acc_norm_stderr\": 0.026558372502661916\n\
\ },\n \"harness|hendrycksTest-human_aging|5\": {\n \"acc\": 0.6905829596412556,\n\
\ \"acc_stderr\": 0.03102441174057221,\n \"acc_norm\": 0.6905829596412556,\n\
\ \"acc_norm_stderr\": 0.03102441174057221\n },\n \"harness|hendrycksTest-human_sexuality|5\"\
: {\n \"acc\": 0.7404580152671756,\n \"acc_stderr\": 0.03844876139785271,\n\
\ \"acc_norm\": 0.7404580152671756,\n \"acc_norm_stderr\": 0.03844876139785271\n\
\ },\n \"harness|hendrycksTest-international_law|5\": {\n \"acc\":\
\ 0.8181818181818182,\n \"acc_stderr\": 0.035208939510976506,\n \"\
acc_norm\": 0.8181818181818182,\n \"acc_norm_stderr\": 0.035208939510976506\n\
\ },\n \"harness|hendrycksTest-jurisprudence|5\": {\n \"acc\": 0.6944444444444444,\n\
\ \"acc_stderr\": 0.04453197507374983,\n \"acc_norm\": 0.6944444444444444,\n\
\ \"acc_norm_stderr\": 0.04453197507374983\n },\n \"harness|hendrycksTest-logical_fallacies|5\"\
: {\n \"acc\": 0.7730061349693251,\n \"acc_stderr\": 0.03291099578615769,\n\
\ \"acc_norm\": 0.7730061349693251,\n \"acc_norm_stderr\": 0.03291099578615769\n\
\ },\n \"harness|hendrycksTest-machine_learning|5\": {\n \"acc\": 0.5892857142857143,\n\
\ \"acc_stderr\": 0.04669510663875191,\n \"acc_norm\": 0.5892857142857143,\n\
\ \"acc_norm_stderr\": 0.04669510663875191\n },\n \"harness|hendrycksTest-management|5\"\
: {\n \"acc\": 0.7864077669902912,\n \"acc_stderr\": 0.040580420156460344,\n\
\ \"acc_norm\": 0.7864077669902912,\n \"acc_norm_stderr\": 0.040580420156460344\n\
\ },\n \"harness|hendrycksTest-marketing|5\": {\n \"acc\": 0.8418803418803419,\n\
\ \"acc_stderr\": 0.023902325549560406,\n \"acc_norm\": 0.8418803418803419,\n\
\ \"acc_norm_stderr\": 0.023902325549560406\n },\n \"harness|hendrycksTest-medical_genetics|5\"\
: {\n \"acc\": 0.79,\n \"acc_stderr\": 0.040936018074033256,\n \
\ \"acc_norm\": 0.79,\n \"acc_norm_stderr\": 0.040936018074033256\n \
\ },\n \"harness|hendrycksTest-miscellaneous|5\": {\n \"acc\": 0.8148148148148148,\n\
\ \"acc_stderr\": 0.013890862162876164,\n \"acc_norm\": 0.8148148148148148,\n\
\ \"acc_norm_stderr\": 0.013890862162876164\n },\n \"harness|hendrycksTest-moral_disputes|5\"\
: {\n \"acc\": 0.7023121387283237,\n \"acc_stderr\": 0.024617055388676992,\n\
\ \"acc_norm\": 0.7023121387283237,\n \"acc_norm_stderr\": 0.024617055388676992\n\
\ },\n \"harness|hendrycksTest-moral_scenarios|5\": {\n \"acc\": 0.42681564245810055,\n\
\ \"acc_stderr\": 0.016542401954631917,\n \"acc_norm\": 0.42681564245810055,\n\
\ \"acc_norm_stderr\": 0.016542401954631917\n },\n \"harness|hendrycksTest-nutrition|5\"\
: {\n \"acc\": 0.738562091503268,\n \"acc_stderr\": 0.025160998214292456,\n\
\ \"acc_norm\": 0.738562091503268,\n \"acc_norm_stderr\": 0.025160998214292456\n\
\ },\n \"harness|hendrycksTest-philosophy|5\": {\n \"acc\": 0.7331189710610932,\n\
\ \"acc_stderr\": 0.02512263760881665,\n \"acc_norm\": 0.7331189710610932,\n\
\ \"acc_norm_stderr\": 0.02512263760881665\n },\n \"harness|hendrycksTest-prehistory|5\"\
: {\n \"acc\": 0.7314814814814815,\n \"acc_stderr\": 0.024659685185967294,\n\
\ \"acc_norm\": 0.7314814814814815,\n \"acc_norm_stderr\": 0.024659685185967294\n\
\ },\n \"harness|hendrycksTest-professional_accounting|5\": {\n \"\
acc\": 0.475177304964539,\n \"acc_stderr\": 0.02979071924382972,\n \
\ \"acc_norm\": 0.475177304964539,\n \"acc_norm_stderr\": 0.02979071924382972\n\
\ },\n \"harness|hendrycksTest-professional_law|5\": {\n \"acc\": 0.43415906127770537,\n\
\ \"acc_stderr\": 0.01265903323706725,\n \"acc_norm\": 0.43415906127770537,\n\
\ \"acc_norm_stderr\": 0.01265903323706725\n },\n \"harness|hendrycksTest-professional_medicine|5\"\
: {\n \"acc\": 0.6691176470588235,\n \"acc_stderr\": 0.028582709753898445,\n\
\ \"acc_norm\": 0.6691176470588235,\n \"acc_norm_stderr\": 0.028582709753898445\n\
\ },\n \"harness|hendrycksTest-professional_psychology|5\": {\n \"\
acc\": 0.6944444444444444,\n \"acc_stderr\": 0.018635594034423983,\n \
\ \"acc_norm\": 0.6944444444444444,\n \"acc_norm_stderr\": 0.018635594034423983\n\
\ },\n \"harness|hendrycksTest-public_relations|5\": {\n \"acc\": 0.6727272727272727,\n\
\ \"acc_stderr\": 0.0449429086625209,\n \"acc_norm\": 0.6727272727272727,\n\
\ \"acc_norm_stderr\": 0.0449429086625209\n },\n \"harness|hendrycksTest-security_studies|5\"\
: {\n \"acc\": 0.7551020408163265,\n \"acc_stderr\": 0.027529637440174934,\n\
\ \"acc_norm\": 0.7551020408163265,\n \"acc_norm_stderr\": 0.027529637440174934\n\
\ },\n \"harness|hendrycksTest-sociology|5\": {\n \"acc\": 0.835820895522388,\n\
\ \"acc_stderr\": 0.026193923544454125,\n \"acc_norm\": 0.835820895522388,\n\
\ \"acc_norm_stderr\": 0.026193923544454125\n },\n \"harness|hendrycksTest-us_foreign_policy|5\"\
: {\n \"acc\": 0.84,\n \"acc_stderr\": 0.03684529491774708,\n \
\ \"acc_norm\": 0.84,\n \"acc_norm_stderr\": 0.03684529491774708\n \
\ },\n \"harness|hendrycksTest-virology|5\": {\n \"acc\": 0.5120481927710844,\n\
\ \"acc_stderr\": 0.03891364495835816,\n \"acc_norm\": 0.5120481927710844,\n\
\ \"acc_norm_stderr\": 0.03891364495835816\n },\n \"harness|hendrycksTest-world_religions|5\"\
: {\n \"acc\": 0.8245614035087719,\n \"acc_stderr\": 0.029170885500727665,\n\
\ \"acc_norm\": 0.8245614035087719,\n \"acc_norm_stderr\": 0.029170885500727665\n\
\ },\n \"harness|truthfulqa:mc|0\": {\n \"mc1\": 0.390452876376989,\n\
\ \"mc1_stderr\": 0.017078230743431448,\n \"mc2\": 0.5634222670773993,\n\
\ \"mc2_stderr\": 0.015351979609326523\n },\n \"harness|winogrande|5\"\
: {\n \"acc\": 0.7821625887924231,\n \"acc_stderr\": 0.011601066079939324\n\
\ },\n \"harness|gsm8k|5\": {\n \"acc\": 0.5481425322213799,\n \
\ \"acc_stderr\": 0.013708494995677646\n }\n}\n```"
repo_url: https://huggingface.co/Locutusque/llama-3-neural-chat-v1-8b
leaderboard_url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
point_of_contact: clementine@hf.co
configs:
- config_name: harness_arc_challenge_25
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|arc:challenge|25_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|arc:challenge|25_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_gsm8k_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|gsm8k|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|gsm8k|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hellaswag_10
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hellaswag|10_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hellaswag|10_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-management|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-virology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-anatomy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-astronomy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-business_ethics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_biology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_medicine|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-college_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-computer_security|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-econometrics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-formal_logic|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-global_facts|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-human_aging|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-international_law|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-machine_learning|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-management|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-marketing|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-nutrition|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-philosophy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-prehistory|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_law|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-public_relations|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-security_studies|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-sociology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-virology|5_2024-04-20T21-23-35.453083.parquet'
- '**/details_harness|hendrycksTest-world_religions|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_abstract_algebra_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-abstract_algebra|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_anatomy_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-anatomy|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-anatomy|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_astronomy_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-astronomy|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-astronomy|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_business_ethics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-business_ethics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_clinical_knowledge_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-clinical_knowledge|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_biology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_biology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_biology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_chemistry_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_computer_science_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_mathematics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_medicine_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_medicine|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_college_physics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-college_physics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-college_physics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_computer_security_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-computer_security|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-computer_security|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_conceptual_physics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-conceptual_physics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_econometrics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-econometrics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-econometrics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_electrical_engineering_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-electrical_engineering|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_elementary_mathematics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-elementary_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_formal_logic_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-formal_logic|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_global_facts_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-global_facts|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-global_facts|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_biology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_biology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_chemistry_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_chemistry|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_computer_science_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_computer_science|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_european_history_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_european_history|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_geography_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_geography|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_government_and_politics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_government_and_politics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_macroeconomics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_macroeconomics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_mathematics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_mathematics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_microeconomics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_microeconomics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_physics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_physics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_psychology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_psychology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_statistics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_statistics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_us_history_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_us_history|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_high_school_world_history_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-high_school_world_history|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_human_aging_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-human_aging|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_aging|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_human_sexuality_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-human_sexuality|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_international_law_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-international_law|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-international_law|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_jurisprudence_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-jurisprudence|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_logical_fallacies_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-logical_fallacies|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_machine_learning_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-machine_learning|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_management_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-management|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-management|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_marketing_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-marketing|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-marketing|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_medical_genetics_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-medical_genetics|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_miscellaneous_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-miscellaneous|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_moral_disputes_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_disputes|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_moral_scenarios_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-moral_scenarios|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_nutrition_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-nutrition|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-nutrition|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_philosophy_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-philosophy|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-philosophy|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_prehistory_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-prehistory|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-prehistory|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_professional_accounting_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_accounting|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_professional_law_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-professional_law|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_law|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_professional_medicine_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_medicine|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_professional_psychology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-professional_psychology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_public_relations_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-public_relations|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-public_relations|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_security_studies_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-security_studies|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-security_studies|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_sociology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-sociology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-sociology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_us_foreign_policy_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-us_foreign_policy|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_virology_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-virology|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-virology|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_hendrycksTest_world_religions_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|hendrycksTest-world_religions|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|hendrycksTest-world_religions|5_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_truthfulqa_mc_0
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|truthfulqa:mc|0_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|truthfulqa:mc|0_2024-04-20T21-23-35.453083.parquet'
- config_name: harness_winogrande_5
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- '**/details_harness|winogrande|5_2024-04-20T21-23-35.453083.parquet'
- split: latest
path:
- '**/details_harness|winogrande|5_2024-04-20T21-23-35.453083.parquet'
- config_name: results
data_files:
- split: 2024_04_20T21_23_35.453083
path:
- results_2024-04-20T21-23-35.453083.parquet
- split: latest
path:
- results_2024-04-20T21-23-35.453083.parquet
---
# Dataset Card for Evaluation run of Locutusque/llama-3-neural-chat-v1-8b
<!-- Provide a quick summary of the dataset. -->
Dataset automatically created during the evaluation run of model [Locutusque/llama-3-neural-chat-v1-8b](https://huggingface.co/Locutusque/llama-3-neural-chat-v1-8b) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
The dataset is composed of 63 configurations, each one corresponding to one of the evaluated tasks.
The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.
An additional configuration "results" stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).
To load the details from a run, you can for instance do the following:
```python
from datasets import load_dataset
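data = load_dataset(
    "open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b",
    "harness_winogrande_5",
    # the config metadata above defines "latest" and timestamped splits;
    # try split="latest" if "train" is not available
    split="train",
)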
```
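The configuration names appear to follow a simple pattern: the task keys reported under "Latest results" (for example `harness|winogrande|5`) show up as config names with `|`, `-`, and `:` replaced by underscores. The sketch below encodes that mapping. It is inferred from the names listed in this card's metadata, not taken from any official API, and the helper name `config_name_from_task_key` is hypothetical.

```python
import re


def config_name_from_task_key(task_key: str) -> str:
    """Map a results key such as 'harness|winogrande|5' to a config name.

    Assumption: every '|', '-' and ':' becomes '_', as the config names
    listed in this card's metadata suggest.
    """
    return re.sub(r"[|:\-]", "_", task_key)


# Keys copied from the results section of this card.
print(config_name_from_task_key("harness|winogrande|5"))
print(config_name_from_task_key("harness|truthfulqa:mc|0"))
print(config_name_from_task_key("harness|hendrycksTest-formal_logic|5"))
```

The returned string can then be passed as the second argument to `load_dataset`, in the same position as `"harness_winogrande_5"` in the example above.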
## Latest results
These are the [latest results from run 2024-04-20T21:23:35.453083](https://huggingface.co/datasets/open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b/blob/main/results_2024-04-20T21-23-35.453083.json) (note that there might be results for other tasks in the repo if successive evals didn't cover the same tasks. You can find each one in the "results" configuration and in the "latest" split of each eval):
```python
{
"all": {
"acc": 0.6463757768465722,
"acc_stderr": 0.032443331188726734,
"acc_norm": 0.6495082726667307,
"acc_norm_stderr": 0.033092506073055875,
"mc1": 0.390452876376989,
"mc1_stderr": 0.017078230743431448,
"mc2": 0.5634222670773993,
"mc2_stderr": 0.015351979609326523
},
"harness|arc:challenge|25": {
"acc": 0.5827645051194539,
"acc_stderr": 0.014409825518403077,
"acc_norm": 0.6083617747440273,
"acc_norm_stderr": 0.014264122124938213
},
"harness|hellaswag|10": {
"acc": 0.6444931288587931,
"acc_stderr": 0.004776883632722615,
"acc_norm": 0.8412666799442342,
"acc_norm_stderr": 0.0036468038997703434
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.37,
"acc_stderr": 0.04852365870939099,
"acc_norm": 0.37,
"acc_norm_stderr": 0.04852365870939099
},
"harness|hendrycksTest-anatomy|5": {
"acc": 0.6222222222222222,
"acc_stderr": 0.04188307537595853,
"acc_norm": 0.6222222222222222,
"acc_norm_stderr": 0.04188307537595853
},
"harness|hendrycksTest-astronomy|5": {
"acc": 0.6710526315789473,
"acc_stderr": 0.03823428969926604,
"acc_norm": 0.6710526315789473,
"acc_norm_stderr": 0.03823428969926604
},
"harness|hendrycksTest-business_ethics|5": {
"acc": 0.65,
"acc_stderr": 0.047937248544110196,
"acc_norm": 0.65,
"acc_norm_stderr": 0.047937248544110196
},
"harness|hendrycksTest-clinical_knowledge|5": {
"acc": 0.7471698113207547,
"acc_stderr": 0.026749899771241207,
"acc_norm": 0.7471698113207547,
"acc_norm_stderr": 0.026749899771241207
},
"harness|hendrycksTest-college_biology|5": {
"acc": 0.7569444444444444,
"acc_stderr": 0.03586879280080342,
"acc_norm": 0.7569444444444444,
"acc_norm_stderr": 0.03586879280080342
},
"harness|hendrycksTest-college_chemistry|5": {
"acc": 0.41,
"acc_stderr": 0.049431107042371025,
"acc_norm": 0.41,
"acc_norm_stderr": 0.049431107042371025
},
"harness|hendrycksTest-college_computer_science|5": {
"acc": 0.48,
"acc_stderr": 0.050211673156867795,
"acc_norm": 0.48,
"acc_norm_stderr": 0.050211673156867795
},
"harness|hendrycksTest-college_mathematics|5": {
"acc": 0.41,
"acc_stderr": 0.049431107042371025,
"acc_norm": 0.41,
"acc_norm_stderr": 0.049431107042371025
},
"harness|hendrycksTest-college_medicine|5": {
"acc": 0.6069364161849711,
"acc_stderr": 0.0372424959581773,
"acc_norm": 0.6069364161849711,
"acc_norm_stderr": 0.0372424959581773
},
"harness|hendrycksTest-college_physics|5": {
"acc": 0.4411764705882353,
"acc_stderr": 0.049406356306056595,
"acc_norm": 0.4411764705882353,
"acc_norm_stderr": 0.049406356306056595
},
"harness|hendrycksTest-computer_security|5": {
"acc": 0.8,
"acc_stderr": 0.04020151261036846,
"acc_norm": 0.8,
"acc_norm_stderr": 0.04020151261036846
},
"harness|hendrycksTest-conceptual_physics|5": {
"acc": 0.5531914893617021,
"acc_stderr": 0.0325005368436584,
"acc_norm": 0.5531914893617021,
"acc_norm_stderr": 0.0325005368436584
},
"harness|hendrycksTest-econometrics|5": {
"acc": 0.5087719298245614,
"acc_stderr": 0.04702880432049615,
"acc_norm": 0.5087719298245614,
"acc_norm_stderr": 0.04702880432049615
},
"harness|hendrycksTest-electrical_engineering|5": {
"acc": 0.6137931034482759,
"acc_stderr": 0.04057324734419035,
"acc_norm": 0.6137931034482759,
"acc_norm_stderr": 0.04057324734419035
},
"harness|hendrycksTest-elementary_mathematics|5": {
"acc": 0.4021164021164021,
"acc_stderr": 0.025253032554997695,
"acc_norm": 0.4021164021164021,
"acc_norm_stderr": 0.025253032554997695
},
"harness|hendrycksTest-formal_logic|5": {
"acc": 0.5,
"acc_stderr": 0.04472135954999579,
"acc_norm": 0.5,
"acc_norm_stderr": 0.04472135954999579
},
"harness|hendrycksTest-global_facts|5": {
"acc": 0.43,
"acc_stderr": 0.04975698519562428,
"acc_norm": 0.43,
"acc_norm_stderr": 0.04975698519562428
},
"harness|hendrycksTest-high_school_biology|5": {
"acc": 0.7548387096774194,
"acc_stderr": 0.024472243840895504,
"acc_norm": 0.7548387096774194,
"acc_norm_stderr": 0.024472243840895504
},
"harness|hendrycksTest-high_school_chemistry|5": {
"acc": 0.49261083743842365,
"acc_stderr": 0.035176035403610084,
"acc_norm": 0.49261083743842365,
"acc_norm_stderr": 0.035176035403610084
},
"harness|hendrycksTest-high_school_computer_science|5": {
"acc": 0.67,
"acc_stderr": 0.047258156262526094,
"acc_norm": 0.67,
"acc_norm_stderr": 0.047258156262526094
},
"harness|hendrycksTest-high_school_european_history|5": {
"acc": 0.7515151515151515,
"acc_stderr": 0.033744026441394036,
"acc_norm": 0.7515151515151515,
"acc_norm_stderr": 0.033744026441394036
},
"harness|hendrycksTest-high_school_geography|5": {
"acc": 0.7626262626262627,
"acc_stderr": 0.0303137105381989,
"acc_norm": 0.7626262626262627,
"acc_norm_stderr": 0.0303137105381989
},
"harness|hendrycksTest-high_school_government_and_politics|5": {
"acc": 0.8808290155440415,
"acc_stderr": 0.02338193534812143,
"acc_norm": 0.8808290155440415,
"acc_norm_stderr": 0.02338193534812143
},
"harness|hendrycksTest-high_school_macroeconomics|5": {
"acc": 0.6,
"acc_stderr": 0.02483881198803316,
"acc_norm": 0.6,
"acc_norm_stderr": 0.02483881198803316
},
"harness|hendrycksTest-high_school_mathematics|5": {
"acc": 0.3592592592592593,
"acc_stderr": 0.029252905927251976,
"acc_norm": 0.3592592592592593,
"acc_norm_stderr": 0.029252905927251976
},
"harness|hendrycksTest-high_school_microeconomics|5": {
"acc": 0.7310924369747899,
"acc_stderr": 0.028801392193631276,
"acc_norm": 0.7310924369747899,
"acc_norm_stderr": 0.028801392193631276
},
"harness|hendrycksTest-high_school_physics|5": {
"acc": 0.423841059602649,
"acc_stderr": 0.04034846678603397,
"acc_norm": 0.423841059602649,
"acc_norm_stderr": 0.04034846678603397
},
"harness|hendrycksTest-high_school_psychology|5": {
"acc": 0.8165137614678899,
"acc_stderr": 0.0165952597103993,
"acc_norm": 0.8165137614678899,
"acc_norm_stderr": 0.0165952597103993
},
"harness|hendrycksTest-high_school_statistics|5": {
"acc": 0.4675925925925926,
"acc_stderr": 0.03402801581358966,
"acc_norm": 0.4675925925925926,
"acc_norm_stderr": 0.03402801581358966
},
"harness|hendrycksTest-high_school_us_history|5": {
"acc": 0.8137254901960784,
"acc_stderr": 0.027325470966716312,
"acc_norm": 0.8137254901960784,
"acc_norm_stderr": 0.027325470966716312
},
"harness|hendrycksTest-high_school_world_history|5": {
"acc": 0.7890295358649789,
"acc_stderr": 0.026558372502661916,
"acc_norm": 0.7890295358649789,
"acc_norm_stderr": 0.026558372502661916
},
"harness|hendrycksTest-human_aging|5": {
"acc": 0.6905829596412556,
"acc_stderr": 0.03102441174057221,
"acc_norm": 0.6905829596412556,
"acc_norm_stderr": 0.03102441174057221
},
"harness|hendrycksTest-human_sexuality|5": {
"acc": 0.7404580152671756,
"acc_stderr": 0.03844876139785271,
"acc_norm": 0.7404580152671756,
"acc_norm_stderr": 0.03844876139785271
},
"harness|hendrycksTest-international_law|5": {
"acc": 0.8181818181818182,
"acc_stderr": 0.035208939510976506,
"acc_norm": 0.8181818181818182,
"acc_norm_stderr": 0.035208939510976506
},
"harness|hendrycksTest-jurisprudence|5": {
"acc": 0.6944444444444444,
"acc_stderr": 0.04453197507374983,
"acc_norm": 0.6944444444444444,
"acc_norm_stderr": 0.04453197507374983
},
"harness|hendrycksTest-logical_fallacies|5": {
"acc": 0.7730061349693251,
"acc_stderr": 0.03291099578615769,
"acc_norm": 0.7730061349693251,
"acc_norm_stderr": 0.03291099578615769
},
"harness|hendrycksTest-machine_learning|5": {
"acc": 0.5892857142857143,
"acc_stderr": 0.04669510663875191,
"acc_norm": 0.5892857142857143,
"acc_norm_stderr": 0.04669510663875191
},
"harness|hendrycksTest-management|5": {
"acc": 0.7864077669902912,
"acc_stderr": 0.040580420156460344,
"acc_norm": 0.7864077669902912,
"acc_norm_stderr": 0.040580420156460344
},
"harness|hendrycksTest-marketing|5": {
"acc": 0.8418803418803419,
"acc_stderr": 0.023902325549560406,
"acc_norm": 0.8418803418803419,
"acc_norm_stderr": 0.023902325549560406
},
"harness|hendrycksTest-medical_genetics|5": {
"acc": 0.79,
"acc_stderr": 0.040936018074033256,
"acc_norm": 0.79,
"acc_norm_stderr": 0.040936018074033256
},
"harness|hendrycksTest-miscellaneous|5": {
"acc": 0.8148148148148148,
"acc_stderr": 0.013890862162876164,
"acc_norm": 0.8148148148148148,
"acc_norm_stderr": 0.013890862162876164
},
"harness|hendrycksTest-moral_disputes|5": {
"acc": 0.7023121387283237,
"acc_stderr": 0.024617055388676992,
"acc_norm": 0.7023121387283237,
"acc_norm_stderr": 0.024617055388676992
},
"harness|hendrycksTest-moral_scenarios|5": {
"acc": 0.42681564245810055,
"acc_stderr": 0.016542401954631917,
"acc_norm": 0.42681564245810055,
"acc_norm_stderr": 0.016542401954631917
},
"harness|hendrycksTest-nutrition|5": {
"acc": 0.738562091503268,
"acc_stderr": 0.025160998214292456,
"acc_norm": 0.738562091503268,
"acc_norm_stderr": 0.025160998214292456
},
"harness|hendrycksTest-philosophy|5": {
"acc": 0.7331189710610932,
"acc_stderr": 0.02512263760881665,
"acc_norm": 0.7331189710610932,
"acc_norm_stderr": 0.02512263760881665
},
"harness|hendrycksTest-prehistory|5": {
"acc": 0.7314814814814815,
"acc_stderr": 0.024659685185967294,
"acc_norm": 0.7314814814814815,
"acc_norm_stderr": 0.024659685185967294
},
"harness|hendrycksTest-professional_accounting|5": {
"acc": 0.475177304964539,
"acc_stderr": 0.02979071924382972,
"acc_norm": 0.475177304964539,
"acc_norm_stderr": 0.02979071924382972
},
"harness|hendrycksTest-professional_law|5": {
"acc": 0.43415906127770537,
"acc_stderr": 0.01265903323706725,
"acc_norm": 0.43415906127770537,
"acc_norm_stderr": 0.01265903323706725
},
"harness|hendrycksTest-professional_medicine|5": {
"acc": 0.6691176470588235,
"acc_stderr": 0.028582709753898445,
"acc_norm": 0.6691176470588235,
"acc_norm_stderr": 0.028582709753898445
},
"harness|hendrycksTest-professional_psychology|5": {
"acc": 0.6944444444444444,
"acc_stderr": 0.018635594034423983,
"acc_norm": 0.6944444444444444,
"acc_norm_stderr": 0.018635594034423983
},
"harness|hendrycksTest-public_relations|5": {
"acc": 0.6727272727272727,
"acc_stderr": 0.0449429086625209,
"acc_norm": 0.6727272727272727,
"acc_norm_stderr": 0.0449429086625209
},
"harness|hendrycksTest-security_studies|5": {
"acc": 0.7551020408163265,
"acc_stderr": 0.027529637440174934,
"acc_norm": 0.7551020408163265,
"acc_norm_stderr": 0.027529637440174934
},
"harness|hendrycksTest-sociology|5": {
"acc": 0.835820895522388,
"acc_stderr": 0.026193923544454125,
"acc_norm": 0.835820895522388,
"acc_norm_stderr": 0.026193923544454125
},
"harness|hendrycksTest-us_foreign_policy|5": {
"acc": 0.84,
"acc_stderr": 0.03684529491774708,
"acc_norm": 0.84,
"acc_norm_stderr": 0.03684529491774708
},
"harness|hendrycksTest-virology|5": {
"acc": 0.5120481927710844,
"acc_stderr": 0.03891364495835816,
"acc_norm": 0.5120481927710844,
"acc_norm_stderr": 0.03891364495835816
},
"harness|hendrycksTest-world_religions|5": {
"acc": 0.8245614035087719,
"acc_stderr": 0.029170885500727665,
"acc_norm": 0.8245614035087719,
"acc_norm_stderr": 0.029170885500727665
},
"harness|truthfulqa:mc|0": {
"mc1": 0.390452876376989,
"mc1_stderr": 0.017078230743431448,
"mc2": 0.5634222670773993,
"mc2_stderr": 0.015351979609326523
},
"harness|winogrande|5": {
"acc": 0.7821625887924231,
"acc_stderr": 0.011601066079939324
},
"harness|gsm8k|5": {
"acc": 0.5481425322213799,
"acc_stderr": 0.013708494995677646
}
}
```
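The `acc_stderr` values above are consistent with the standard error of the mean of per-question 0/1 scores, computed with the sample (n - 1) standard deviation, which simplifies to `sqrt(acc * (1 - acc) / (n - 1))`. The sketch below checks this against two reported entries. The question counts (100 for abstract_algebra, 126 for formal_logic) are assumptions that do not appear in this card, and `binary_stderr` is a hypothetical helper name.

```python
import math


def binary_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores whose mean is `acc`.

    Uses the sample standard deviation (n - 1 in the denominator),
    which reduces to sqrt(acc * (1 - acc) / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two samples")
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (acc, reported acc_stderr, assumed number of questions)
reported = {
    "abstract_algebra": (0.37, 0.04852365870939099, 100),
    "formal_logic": (0.5, 0.04472135954999579, 126),
}
for task, (acc, stderr, n) in reported.items():
    # Print the recomputed value next to the reported one for comparison.
    print(task, binary_stderr(acc, n), stderr)
```

Under a normal approximation, `acc ± 1.96 * acc_stderr` gives a rough 95% interval, which is a useful reminder that scores on the smaller subject sets carry several points of uncertainty.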
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
[More Information Needed]
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
[More Information Needed]
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
[More Information Needed]
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
[More Information Needed]
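Provided by:
open-llm-leaderboard
Summary of original information
Dataset overview
Dataset name
- Evaluation run of Locutusque/llama-3-neural-chat-v1-8b
Dataset description
- This dataset was created automatically during the evaluation of the model Locutusque/llama-3-neural-chat-v1-8b, which was evaluated on the Open LLM Leaderboard.
Dataset composition
- The dataset consists of 63 configurations, each corresponding to one evaluation task.
- The dataset comes from 1 run. Each run corresponds to a specific split, and the split name is based on the run's timestamp.
- There is an additional configuration, "results", which stores the aggregated results of all runs and is used to compute and display the aggregated metrics.
Dataset loading example
`from datasets import load_dataset`, then `data = load_dataset("open-llm-leaderboard/details_Locutusque__llama-3-neural-chat-v1-8b", "harness_winogrande_5", split="train")`
Latest results
- The latest evaluation results are provided, including accuracy (acc) and other related metrics for multiple tasks.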
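Dataset configuration details
Configuration list
- harness_arc_challenge_25
- harness_gsm8k_5
- harness_hellaswag_10
- harness_hendrycksTest_5
Each configuration contains the following data files:
- a split data file for the specific run timestamp
- a split data file for the latest results
These configurations record the structure and content of each evaluation task's data in detail, ensuring that the data is accessible and complete.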
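The split names and file names in this card's metadata both look like simple rewrites of the run timestamp `2024-04-20T21:23:35.453083`. A minimal sketch of those rewrites follows. The rules are inferred from the metadata entries only, and the function names `split_name`, `file_stamp`, and `details_pattern` are hypothetical.

```python
def split_name(run_timestamp: str) -> str:
    """Timestamped split name: '-' and ':' both become '_'."""
    return run_timestamp.replace("-", "_").replace(":", "_")


def file_stamp(run_timestamp: str) -> str:
    """Timestamp as it appears in file names: ':' becomes '-'."""
    return run_timestamp.replace(":", "-")


def details_pattern(task_key: str, run_timestamp: str) -> str:
    """Glob pattern for the per-task details file of one run."""
    return f"**/details_{task_key}_{file_stamp(run_timestamp)}.parquet"


run = "2024-04-20T21:23:35.453083"
print(split_name(run))
print(details_pattern("harness|winogrande|5", run))
```

In the metadata, the timestamped split and the `latest` split of a config point at the same file, which is what one would expect for a dataset built from a single run.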
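Collected summary
Dataset introduction
Construction method
In the field of large language model evaluation, systematic and standardized assessment of model performance is essential. This dataset was generated automatically to evaluate the performance of the model Locutusque/llama-3-neural-chat-v1-8b on the Open LLM Leaderboard. It was built from one complete evaluation run of the model, covering 63 evaluation task configurations, each corresponding to a specific evaluation task. The run results are organized into multiple data splits, each named after the run timestamp, while the 'train' split always points to the results of the latest run. In addition, the dataset contains a separate configuration named 'results' that stores the aggregated evaluation metrics for all tasks; these metrics are used directly to compute and display the overall scores on the leaderboard.
Usage
Researchers can load and use this dataset through the Hugging Face datasets library. For example, calling the `load_dataset` function with a task configuration name (such as 'harness_winogrande_5') and a split (such as 'train') returns the latest evaluation results for that task. To access the detailed data of a specific past run, load the split named after the corresponding timestamp. The 'results' configuration provides the aggregated metrics for all tasks and can be used directly for an overall analysis of model performance. This design lets researchers either analyze a single task in depth or evaluate the model's overall capabilities systematically, which greatly facilitates model comparison and reproduction.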
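As an illustration of working with aggregated metrics, the sketch below averages `acc` over the hendrycksTest (MMLU) entries of a dictionary shaped like the "Latest results" JSON. It is an unweighted mean over subjects; whether the leaderboard aggregates in exactly this way is not stated in this card, so treat that as an assumption. The helper name `mmlu_macro_average` is hypothetical.

```python
from statistics import mean


def mmlu_macro_average(results: dict) -> float:
    """Unweighted mean of 'acc' over all hendrycksTest entries."""
    accs = [
        metrics["acc"]
        for task, metrics in results.items()
        if task.startswith("harness|hendrycksTest-")
    ]
    if not accs:
        raise ValueError("no hendrycksTest entries found")
    return mean(accs)


# Three entries copied from the latest results; the full file has many more.
sample = {
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.37},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.6222222222222222},
    "harness|winogrande|5": {"acc": 0.7821625887924231},  # ignored: not MMLU
}
print(mmlu_macro_average(sample))
```

The same function can be applied to the full results dictionary once it has been loaded, for instance from the JSON file linked under "Latest results".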
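Background and challenges
Background overview
In the field of large language models (LLMs), standardized evaluation of model performance is central to technical progress. This dataset was created by the Hugging Face team in 2024 to systematically record the complete evaluation results of the model Locutusque/llama-3-neural-chat-v1-8b on the Open LLM Leaderboard. As a community-driven benchmark platform, the Open LLM Leaderboard aims to provide transparent, reproducible side-by-side comparisons of open-source large language models. The dataset contains 63 evaluation task configurations spanning commonsense reasoning (such as HellaSwag), math problem solving (GSM8K), and multidisciplinary knowledge (MMLU). Its core research question is how fine-grained performance metrics can reveal a model's real capabilities on complex language understanding and generation tasks. The dataset gives researchers immediate access to the model's per-task performance and has an important influence on advancing standardized, reproducible LLM evaluation.
Current challenges
The core challenges currently facing this dataset are as follows. First, although the evaluation tasks are diverse, a single model can hardly cover all 63 tasks in one run, so results for some tasks are missing, which affects the completeness of the overall evaluation. Second, the dataset is built by an automated pipeline, and results from multiple timestamped runs may be inconsistent because of model version updates or differences in the evaluation environment, which makes reproducing results more complex. Third, at the level of the problem domain, although the dataset covers classic benchmarks such as ARC and TruthfulQA, the existing task set falls short of the evaluation needs of emerging scenarios such as open-domain dialogue and multi-turn interaction, and cannot fully reflect a model's potential in real applications.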
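Common scenarios
Classic use cases
This dataset is designed for evaluating large language models on a variety of natural language understanding and reasoning tasks, covering classic benchmarks such as the ARC challenge set, HellaSwag commonsense reasoning, GSM8K math problem solving, Winogrande pronoun disambiguation, and TruthfulQA factuality testing. By loading the fine-grained score records under each task configuration, researchers can systematically analyze the limits of a model's abilities in logical reasoning, world knowledge, mathematical computation, and other dimensions, providing a quantitative basis for model iteration.
Academic problems addressed
This dataset addresses the core difficulty of standardizing LLM evaluation and making it reproducible. Traditionally, different research teams used different evaluation pipelines, which made results hard to compare. By recording detailed metrics (such as accuracy, standard error, and normalized scores) for 63 task configurations in a uniform way and opening up the full evaluation trace, the dataset lets the academic community rigorously test a model's real level of zero-shot generalization, robustness to adversarial examples, and breadth of knowledge, advancing the standardization of evaluation practice.
Practical applications
In practice, this dataset provides a key reference for model selection and deployment. Companies and developers can use the model's results on tasks such as ARC (scientific reasoning) and GSM8K (mathematical ability) to select language models suited to educational tutoring, intelligent customer service, or knowledge question-answering systems. The TruthfulQA results also help identify the risk of hallucination, supporting more careful technical decisions in high-reliability scenarios such as medical consultation and legal assistance.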
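Recent research on the dataset
Latest research directions
Performance evaluation of large language models has become a core part of driving model iteration and trustworthy deployment. For the evaluation dataset of the Locutusque/llama-3-neural-chat-v1-8b model on the Open LLM Leaderboard, research focuses on building a multidimensional, fine-grained evaluation system. The dataset covers 63 task configurations, ranging from commonsense reasoning (such as HellaSwag and ARC-Challenge) to mathematical reasoning (GSM8K), and from multidisciplinary knowledge (the 57 subjects of MMLU) to adversarial truthfulness evaluation (TruthfulQA), systematically revealing the model's overall performance in academic knowledge, logical reasoning, and factual consistency. The frontier is moving from a single accuracy metric toward robustness measures such as standard error analysis (acc_stderr) and normalized accuracy (acc_norm), in order to handle the evaluation variance caused by differing task difficulty and sample bias. This trend not only strengthens side-by-side comparison of open-source models but also provides a key quantitative basis for the trustworthy use of neural dialogue systems in high-risk fields such as medicine and law, moving large-model evaluation from a "benchmark score race" toward a more scientific and standardized stage.
The above content was collected and summarized by 遇见数据集.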



