
open-llm-leaderboard-old/details_chinoll__Yi-6b-200k-dpo

Source: Hugging Face · Updated: 2023-12-04 · Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/open-llm-leaderboard-old/details_chinoll__Yi-6b-200k-dpo
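Resource summary:
This dataset was created automatically during the evaluation run of the model chinoll/Yi-6b-200k-dpo on the Open LLM Leaderboard. It consists of 63 configurations, each corresponding to one evaluated task. It holds the results of one or more runs, and each run appears as a specific split within each configuration. The train split always points to the latest results. An additional configuration named results stores the aggregated results of all runs, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also includes an example of loading the run details with the Python datasets library.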

Provider:
open-llm-leaderboard-old
Summary of the original information

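Dataset overview

This dataset was created automatically while evaluating the model chinoll/Yi-6b-200k-dpo on the Open LLM Leaderboard. It contains 63 configurations, each corresponding to one evaluation task.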

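Dataset structure

  • Number of configurations: 63
  • Data source: created from 1 run. Each run is stored as a specific split in each configuration, and the split is named after the run's timestamp.
  • Split naming: timestamp-based, for example 2023-12-04T16-10-17.402126
  • Latest results: the "train" split always points to the latest results.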
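
The timestamp-based split naming described above can be handled with a small helper. This is only a sketch: the format string is an assumption inferred from the single example name 2023-12-04T16-10-17.402126, and `split_name_to_datetime` and `latest_split` are hypothetical helper names, not part of the datasets library.

```python
from datetime import datetime

# Assumed format, inferred from the one example split name given in this card.
SPLIT_NAME_FORMAT = "%Y-%m-%dT%H-%M-%S.%f"


def split_name_to_datetime(split_name):
    """Parse a timestamp-style split name into a datetime object."""
    return datetime.strptime(split_name, SPLIT_NAME_FORMAT)


def latest_split(split_names):
    """Return the most recent timestamped split, skipping the 'train' alias."""
    timestamped = [name for name in split_names if name != "train"]
    return max(timestamped, key=split_name_to_datetime)


run = split_name_to_datetime("2023-12-04T16-10-17.402126")
print(run.isoformat())
```

With a single run, as in this dataset, `latest_split` simply returns that run's split, which is the one the "train" split is described as pointing to.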

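Additional configuration

  • Results configuration: the configuration named "results" stores the aggregated results of all runs, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard.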

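Data loading example

```python
from datasets import load_dataset

# Load the per-sample details of one task configuration.
# split="train" selects the latest run.
data = load_dataset(
    "open-llm-leaderboard/details_chinoll__Yi-6b-200k-dpo",
    "harness_winogrande_5",
    split="train",
)
```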
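
The configuration name passed to load_dataset ("harness_winogrande_5") differs in form from the task keys used in this card's results, such as "harness|arc:challenge|25". The sketch below shows one plausible mapping between the two. The rule itself (every run of non-alphanumeric characters becomes one underscore) is an assumption inferred from those two names, and `task_key_to_config_name` is a hypothetical helper, so check the result against the configurations the dataset actually exposes.

```python
import re


def task_key_to_config_name(task_key):
    """Turn a results key into a configuration name.

    Assumed rule: each run of characters outside [0-9A-Za-z]
    is replaced by a single underscore.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", task_key)


for key in ("harness|arc:challenge|25", "harness|hendrycksTest-anatomy|5"):
    print(key, "->", task_key_to_config_name(key))
```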

Latest results

The latest results come from the run of 2023-12-04T16:10:17.402126. The detailed results are:

```python
{
  "all": { "acc": 0.6274780891690785, "acc_stderr": 0.03214198982171106, "acc_norm": 0.6382309545732996, "acc_norm_stderr": 0.03286487964348697,
           "mc1": 0.3047735618115055, "mc1_stderr": 0.016114124156882455, "mc2": 0.4551491788416383, "mc2_stderr": 0.014826375266749701 },
  "harness|arc:challenge|25": { "acc": 0.39505119453924914, "acc_stderr": 0.014285898292938172, "acc_norm": 0.4308873720136519, "acc_norm_stderr": 0.014471133392642475 },
  "harness|hellaswag|10": { "acc": 0.5570603465445131, "acc_stderr": 0.004957182635381807, "acc_norm": 0.7452698665604461, "acc_norm_stderr": 0.004348189459336535 },
  "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.34, "acc_stderr": 0.04760952285695235, "acc_norm": 0.34, "acc_norm_stderr": 0.04760952285695235 },
  "harness|hendrycksTest-anatomy|5": { "acc": 0.6148148148148148, "acc_stderr": 0.04203921040156279, "acc_norm": 0.6148148148148148, "acc_norm_stderr": 0.04203921040156279 },
  "harness|hendrycksTest-astronomy|5": { "acc": 0.7105263157894737, "acc_stderr": 0.03690677986137282, "acc_norm": 0.7105263157894737, "acc_norm_stderr": 0.03690677986137282 },
  "harness|hendrycksTest-business_ethics|5": { "acc": 0.74, "acc_stderr": 0.04408440022768078, "acc_norm": 0.74, "acc_norm_stderr": 0.04408440022768078 },
  "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6943396226415094, "acc_stderr": 0.028353298073322666, "acc_norm": 0.6943396226415094, "acc_norm_stderr": 0.028353298073322666 },
  "harness|hendrycksTest-college_biology|5": { "acc": 0.6527777777777778, "acc_stderr": 0.039812405437178615, "acc_norm": 0.6527777777777778, "acc_norm_stderr": 0.039812405437178615 },
  "harness|hendrycksTest-college_chemistry|5": { "acc": 0.36, "acc_stderr": 0.048241815132442176, "acc_norm": 0.36, "acc_norm_stderr": 0.048241815132442176 },
  "harness|hendrycksTest-college_computer_science|5": { "acc": 0.54, "acc_stderr": 0.05009082659620333, "acc_norm": 0.54, "acc_norm_stderr": 0.05009082659620333 },
  "harness|hendrycksTest-college_mathematics|5": { "acc": 0.36, "acc_stderr": 0.04824181513244218, "acc_norm": 0.36, "acc_norm_stderr": 0.04824181513244218 },
  "harness|hendrycksTest-college_medicine|5": { "acc": 0.6647398843930635, "acc_stderr": 0.03599586301247077, "acc_norm": 0.6647398843930635, "acc_norm_stderr": 0.03599586301247077 },
  "harness|hendrycksTest-college_physics|5": { "acc": 0.30392156862745096, "acc_stderr": 0.04576665403207762, "acc_norm": 0.30392156862745096, "acc_norm_stderr": 0.04576665403207762 },
  "harness|hendrycksTest-computer_security|5": { "acc": 0.77, "acc_stderr": 0.042295258468165065, "acc_norm": 0.77, "acc_norm_stderr": 0.042295258468165065 },
  "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.6042553191489362, "acc_stderr": 0.03196758697835362, "acc_norm": 0.6042553191489362, "acc_norm_stderr": 0.03196758697835362 },
  "harness|hendrycksTest-econometrics|5": { "acc": 0.43859649122807015, "acc_stderr": 0.04668000738510455, "acc_norm": 0.43859649122807015, "acc_norm_stderr": 0.04668000738510455 },
  "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.6137931034482759, "acc_stderr": 0.04057324734419035, "acc_norm": 0.6137931034482759, "acc_norm_stderr": 0.04057324734419035 },
  "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.47354497354497355, "acc_stderr": 0.02571523981134676, "acc_norm": 0.47354497354497355, "acc_norm_stderr": 0.02571523981134676 },
  "harness|hendrycksTest-formal_logic|5": { "acc": 0.3888888888888889, "acc_stderr": 0.04360314860077459, "acc_norm": 0.3888888888888889, "acc_norm_stderr": 0.04360314860077459 },
  "harness|hendrycksTest-global_facts|5": { "acc": 0.4, "acc_stderr": 0.049236596391733084, "acc_norm": 0.4, "acc_norm_stderr": 0.049236596391733084 },
  "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7774193548387097, "acc_stderr": 0.023664216671642518, "acc_norm": 0.7774193548387097, "acc_norm_stderr": 0.023664216671642518 },
  "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.49261083743842365, "acc_stderr": 0.03517603540361009, "acc_norm": 0.49261083743842365, "acc_norm_stderr": 0.03517603540361009 },
  "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.63, "acc_stderr": 0.04852365870939099, "acc_norm": 0.63, "acc_norm_stderr": 0.04852365870939099 },
  "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7696969696969697, "acc_stderr": 0.0328766675860349, "acc_norm": 0.7696969696969697, "acc_norm_stderr": 0.0328766675860349 },
  "harness|hendrycksTest-high_school_geography|5": { "acc": 0.8080808080808081, "acc_stderr": 0.028057791672989017, "acc_norm": 0.8080808080808081, "acc_norm_stderr": 0.028057791672989017 },
  "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8601036269430051, "acc_stderr": 0.02503387058301518, "acc_norm": 0.8601036269430051, "acc_norm_stderr": 0.02503387058301518 },
  "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6487179487179487, "acc_stderr": 0.024203665177902803, "acc_norm": 0.6487179487179487, "acc_norm_stderr": 0.024203665177902803 },
  "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.337037037037037, "acc_stderr": 0.028820884666253252, "acc_norm": 0.337037037037037, "acc_norm_stderr":
```
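
Each metric in the results above is reported together with a standard error, so a rough interval around a score can be derived from the pair. The sketch below does this for two entries copied from the results. It assumes a normal approximation (the score plus or minus 1.96 standard errors for a 95% interval), which is an illustrative choice and not something the card specifies. `confidence_interval` is a hypothetical helper name.

```python
# Excerpt of the results above; values are copied from this card.
results = {
    "harness|arc:challenge|25": {
        "acc_norm": 0.4308873720136519,
        "acc_norm_stderr": 0.014471133392642475,
    },
    "harness|hellaswag|10": {
        "acc_norm": 0.7452698665604461,
        "acc_norm_stderr": 0.004348189459336535,
    },
}


def confidence_interval(value, stderr, z=1.96):
    """Return (low, high) under a normal approximation; z=1.96 is about 95%."""
    half_width = z * stderr
    return value - half_width, value + half_width


for task, metrics in results.items():
    low, high = confidence_interval(metrics["acc_norm"], metrics["acc_norm_stderr"])
    print(f"{task}: acc_norm={metrics['acc_norm']:.4f}, interval {low:.4f} to {high:.4f}")
```

Tasks with few questions have standard errors near 0.05 in the results above, so their intervals are wide and small differences between models on those tasks should be read with caution.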
