
open-llm-leaderboard-old/details_adamo1139__yi-34b-200k-rawrr-dpo-2

Hugging Face · Updated 2024-01-27 · Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_adamo1139__yi-34b-200k-rawrr-dpo-2
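Summary:
This dataset was created automatically during the evaluation run of the model adamo1139/yi-34b-200k-rawrr-dpo-2 on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluated task. It was created from 1 run; each run appears as a specific split in every configuration, and the split is named after the run's timestamp. The "train" split always points to the latest results. An additional "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.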

Provider:
open-llm-leaderboard-old
Summary of the original information

Dataset overview

This dataset was created automatically during the evaluation run of the model adamo1139/yi-34b-200k-rawrr-dpo-2 for the Open LLM Leaderboard.

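Dataset composition

  • The dataset contains 63 configurations, each corresponding to one evaluated task.
  • The dataset was created from 1 run. Each run appears as a specific split in every configuration, and the split is named after the run's timestamp.
  • The "train" split always points to the latest results.
  • An additional "results" configuration stores the aggregated results of all runs and is used to compute and display the aggregated metrics on the Open LLM Leaderboard.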
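The split convention described above (one timestamp-named split per run, plus a "train" alias for the latest run) can be sketched as a small helper that picks the newest run split from a list of split names. This is a minimal illustration, not part of the dataset tooling. The timestamp format `%Y_%m_%dT%H_%M_%S.%f` is an assumption about how the run timestamps are written, and `latest_run_split` is a hypothetical helper name.

```python
from datetime import datetime

# ASSUMPTION: run splits are named like "2024_01_01T00_00_00.000000"
# (timestamp with "_" in place of "-" and ":"); this format is not confirmed by the card.
TIMESTAMP_FORMAT = "%Y_%m_%dT%H_%M_%S.%f"


def latest_run_split(split_names):
    """Return the timestamp-named split of the most recent run (hypothetical helper).

    The "train" alias, which always points to the latest results, is skipped
    so that only the timestamped run splits are compared.
    """
    run_splits = [name for name in split_names if name != "train"]
    if not run_splits:
        raise ValueError("no timestamp-named run splits found")
    # Compare splits by their parsed timestamps rather than as plain strings.
    return max(run_splits, key=lambda name: datetime.strptime(name, TIMESTAMP_FORMAT))
```

With network access, the list of split names for a configuration could be obtained with `datasets.get_dataset_split_names(repo_id, config_name)`. Because this dataset was created from a single run, each configuration should hold just one timestamped split alongside "train".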

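Data loading example

```python
from datasets import load_dataset

# Load the per-sample details of the Winogrande (5-shot) task;
# the "train" split always points to the latest run.
# Repository ID as given in the download link (the dataset now lives under open-llm-leaderboard-old).
data = load_dataset(
    "open-llm-leaderboard-old/details_adamo1139__yi-34b-200k-rawrr-dpo-2",
    "harness_winogrande_5",
    split="train",
)
```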
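Keys in the results summary have the form "harness|task|shots", while the loading example uses the configuration name "harness_winogrande_5". The sketch below maps a results key to a configuration name by replacing "|", ":" and "-" with "_". This rule is an assumption inferred from that single example, and `config_name_for` is a hypothetical helper. It has not been checked against the dataset's actual list of 63 configurations.

```python
import re


def config_name_for(results_key):
    """Map a results-summary key to a configuration name (hypothetical helper).

    ASSUMPTION: inferred from "harness_winogrande_5" in the loading example;
    the characters "|", ":" and "-" are each replaced by "_".
    """
    return re.sub(r"[|:\-]", "_", results_key)
```

Before passing a derived name to `load_dataset`, it would be prudent to confirm that it appears in `datasets.get_dataset_config_names(repo_id)`.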

Latest results

Here is a summary of the latest results:

```json
{
  "all": {"acc": 0.75416229760996, "acc_stderr": 0.02839218515254959, "acc_norm": 0.7591490006658004, "acc_norm_stderr": 0.02892513297368352, "mc1": 0.3108935128518972, "mc1_stderr": 0.016203316673559696, "mc2": 0.46152359352867034, "mc2_stderr": 0.014355597505105996},
  "harness|arc:challenge|25": {"acc": 0.6143344709897611, "acc_stderr": 0.014224250973257184, "acc_norm": 0.6467576791808873, "acc_norm_stderr": 0.013967822714840055},
  "harness|hellaswag|10": {"acc": 0.6441943835889266, "acc_stderr": 0.004777782584817786, "acc_norm": 0.8474407488548098, "acc_norm_stderr": 0.003588272874852483},
  "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.37, "acc_stderr": 0.04852365870939099, "acc_norm": 0.37, "acc_norm_stderr": 0.04852365870939099},
  "harness|hendrycksTest-anatomy|5": {"acc": 0.7037037037037037, "acc_stderr": 0.03944624162501116, "acc_norm": 0.7037037037037037, "acc_norm_stderr": 0.03944624162501116},
  "harness|hendrycksTest-astronomy|5": {"acc": 0.8618421052631579, "acc_stderr": 0.028081042939576552, "acc_norm": 0.8618421052631579, "acc_norm_stderr": 0.028081042939576552},
  "harness|hendrycksTest-business_ethics|5": {"acc": 0.77, "acc_stderr": 0.04229525846816505, "acc_norm": 0.77, "acc_norm_stderr": 0.04229525846816505},
  "harness|hendrycksTest-clinical_knowledge|5": {"acc": 0.8150943396226416, "acc_stderr": 0.023893351834464317, "acc_norm": 0.8150943396226416, "acc_norm_stderr": 0.023893351834464317},
  "harness|hendrycksTest-college_biology|5": {"acc": 0.875, "acc_stderr": 0.02765610492929436, "acc_norm": 0.875, "acc_norm_stderr": 0.02765610492929436},
  "harness|hendrycksTest-college_chemistry|5": {"acc": 0.53, "acc_stderr": 0.05016135580465919, "acc_norm": 0.53, "acc_norm_stderr": 0.05016135580465919},
  "harness|hendrycksTest-college_computer_science|5": {"acc": 0.64, "acc_stderr": 0.048241815132442176, "acc_norm": 0.64, "acc_norm_stderr": 0.048241815132442176},
  "harness|hendrycksTest-college_mathematics|5": {"acc": 0.51, "acc_stderr": 0.05024183937956911, "acc_norm": 0.51, "acc_norm_stderr": 0.05024183937956911},
  "harness|hendrycksTest-college_medicine|5": {"acc": 0.7398843930635838, "acc_stderr": 0.033450369167889904, "acc_norm": 0.7398843930635838, "acc_norm_stderr": 0.033450369167889904},
  "harness|hendrycksTest-college_physics|5": {"acc": 0.5, "acc_stderr": 0.04975185951049946, "acc_norm": 0.5, "acc_norm_stderr": 0.04975185951049946},
  "harness|hendrycksTest-computer_security|5": {"acc": 0.83, "acc_stderr": 0.03775251680686371, "acc_norm": 0.83, "acc_norm_stderr": 0.03775251680686371},
  "harness|hendrycksTest-conceptual_physics|5": {"acc": 0.7829787234042553, "acc_stderr": 0.026947483121496228, "acc_norm": 0.7829787234042553, "acc_norm_stderr": 0.026947483121496228},
  "harness|hendrycksTest-econometrics|5": {"acc": 0.5877192982456141, "acc_stderr": 0.04630653203366596, "acc_norm": 0.5877192982456141, "acc_norm_stderr": 0.04630653203366596},
  "harness|hendrycksTest-electrical_engineering|5": {"acc": 0.7724137931034483, "acc_stderr": 0.03493950380131184, "acc_norm": 0.7724137931034483, "acc_norm_stderr": 0.03493950380131184},
  "harness|hendrycksTest-elementary_mathematics|5": {"acc": 0.6375661375661376, "acc_stderr": 0.024757473902752045, "acc_norm": 0.6375661375661376, "acc_norm_stderr": 0.024757473902752045},
  "harness|hendrycksTest-formal_logic|5": {"acc": 0.5952380952380952, "acc_stderr": 0.043902592653775635, "acc_norm": 0.5952380952380952, "acc_norm_stderr": 0.043902592653775635},
  "harness|hendrycksTest-global_facts|5": {"acc": 0.57, "acc_stderr": 0.04975698519562428, "acc_norm": 0.57, "acc_norm_stderr": 0.04975698519562428},
  "harness|hendrycksTest-high_school_biology|5": {"acc": 0.8935483870967742, "acc_stderr": 0.01754510295165663, "acc_norm": 0.8935483870967742, "acc_norm_stderr": 0.01754510295165663},
  "harness|hendrycksTest-high_school_chemistry|5": {"acc": 0.6798029556650246, "acc_stderr": 0.03282649385304151, "acc_norm": 0.6798029556650246, "acc_norm_stderr": 0.03282649385304151},
  "harness|hendrycksTest-high_school_computer_science|5": {"acc": 0.77, "acc_stderr": 0.042295258468165044, "acc_norm": 0.77, "acc_norm_stderr": 0.042295258468165044},
  "harness|hendrycksTest-high_school_european_history|5": {"acc": 0.8484848484848485, "acc_stderr": 0.027998073798781675, "acc_norm": 0.8484848484848485, "acc_norm_stderr": 0.027998073798781675},
  "harness|hendrycksTest-high_school_geography|5": {"acc": 0.9090909090909091, "acc_stderr": 0.020482086775424218, "acc_norm": 0.9090909090909091, "acc_norm_stderr": 0.020482086775424218},
  "harness|hendrycksTest-high_school_government_and_politics|5": {"acc": 0.9740932642487047, "acc_stderr": 0.01146452335695318, "acc_norm": 0.9740932642487047, "acc_norm_stderr": 0.01146452335695318},
  "harness|hendrycksTest-high_school_macroeconomics|5": {"acc": 0.8076923076923077, "acc_stderr": 0.019982347208637306, "acc_norm": 0.8076923076923077, "acc_norm_stderr": 0.019982347208637306},
  "harness|hendrycksTest-high_school_mathematics|5": {"acc": 0.4148148148148148, "acc_stderr": 0.030039842454069286, "acc_norm": 0.4148148148148148, "acc_norm_stderr": 0.030039842454069286},
  "harness|hendrycksTest-high_school_microeconomics|5": {"acc": 0.8361344537815
```
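Each "acc_stderr" value in the summary appears consistent with the standard error of the mean over per-question 0/1 scores. That is the sample standard deviation (n − 1 denominator) divided by √n, which simplifies to √(p(1 − p)/(n − 1)). The sketch below reproduces two reported values under that assumption. The question counts (100 for abstract_algebra, 144 for college_biology) are our assumption and are not stated on this page, and `accuracy_with_stderr` is a hypothetical helper.

```python
import math
import statistics


def accuracy_with_stderr(scores):
    """Return (accuracy, standard error) for per-question 0/1 scores (hypothetical helper).

    ASSUMPTION: acc_stderr = sample stdev (n - 1 denominator) / sqrt(n),
    which for 0/1 scores equals sqrt(p * (1 - p) / (n - 1)).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    p = statistics.fmean(scores)
    return p, statistics.stdev(scores) / math.sqrt(n)


# Hypothetical reconstruction: 37 of 100 abstract_algebra questions correct (acc = 0.37).
acc, se = accuracy_with_stderr([1] * 37 + [0] * 63)
```

Under these assumptions, `se` matches the reported 0.04852365870939099 for abstract_algebra. Likewise, 126 correct out of 144 gives acc = 0.875 and reproduces 0.02765610492929436 for college_biology.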
