open-llm-leaderboard/details_togethercomputer__GPT-JT-6B-v1

Name: open-llm-leaderboard/details_togethercomputer__GPT-JT-6B-v1
Creator: open-llm-leaderboard
Published: 2023-09-22 13:40:02
License: No description available

Source: Hugging Face. Updated 2023-09-22. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/open-llm-leaderboard/details_togethercomputer__GPT-JT-6B-v1

Resource description:

This dataset was created automatically during the evaluation run of the model togethercomputer/GPT-JT-6B-v1 on the Open LLM Leaderboard. It comprises 64 configurations, each corresponding to one evaluation task. The data comes from two runs. Within each configuration, every run is stored as its own split, named after the timestamp of that run, and the `train` split always points to the most recent results. An additional configuration named `results` stores the aggregated results of all runs, which are used to compute and display the aggregated metrics on the Open LLM Leaderboard. The README also gives an example of loading the details of a run with the `load_dataset` function from the `datasets` library.
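The layout described above suggests a small loading helper. The following is a minimal sketch, not the leaderboard's own tooling: `details_repo_id` and `load_details` are hypothetical names, and the rule that "/" in the model ID becomes "__" is an assumption inferred from this dataset's name. Only `load_dataset` from the `datasets` library is an existing call.

```python
def details_repo_id(model_id: str, org: str = "open-llm-leaderboard") -> str:
    """Build the details dataset ID for a model (hypothetical helper)."""
    # Assumption: "/" in the model ID is replaced by "__", as in this dataset's name.
    return f"{org}/details_{model_id.replace('/', '__')}"


def load_details(model_id: str, config: str = "results", split: str = "train"):
    """Load one configuration of a details dataset (requires network access)."""
    # Imported lazily so details_repo_id works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(details_repo_id(model_id), config, split=split)


if __name__ == "__main__":
    print(details_repo_id("togethercomputer/GPT-JT-6B-v1"))
```

Calling `load_details("togethercomputer/GPT-JT-6B-v1")` would then request the `results` configuration, with the `train` split selecting the most recent run.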

Provider:

open-llm-leaderboard

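Summary of original information

Dataset overview

Dataset introduction

This dataset was created automatically while evaluating the model togethercomputer/GPT-JT-6B-v1 for the Open LLM Leaderboard. It contains 64 configurations, each corresponding to one evaluation task.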

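Dataset structure

The dataset consists of the results of 2 runs. The results of each run appear in every configuration as a split named after the run's timestamp. The "train" split of each configuration always points to the most recent results.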
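Because each run is stored as a timestamp-named split, the most recent run can be identified from the split names alone. The sketch below assumes the names are zero-padded timestamps, so that string order matches chronological order. The function name `latest_run_split` and the sample split names are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Optional


def latest_run_split(split_names: Iterable[str], alias: str = "train") -> Optional[str]:
    """Return the newest timestamp-named split, ignoring the alias split."""
    # Assumption: split names are zero-padded timestamps, so sorting them
    # as strings also sorts them chronologically.
    runs = sorted(name for name in split_names if name != alias)
    return runs[-1] if runs else None


# Hypothetical split names for illustration; real names come from the dataset.
splits = ["2023_09_01T10_00_00", "2023_09_22T13_40_02", "train"]
print(latest_run_split(splits))
```

In practice the "train" split already gives the same data, so a helper like this is mainly useful for comparing an older run against the newest one.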

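Data loading example

The following example code loads the dataset:

    from datasets import load_dataset

    # The "train" split points to the most recent run of this configuration.
    data = load_dataset("open-llm-leaderboard/details_togethercomputer__GPT-JT-6B-v1", "harness_winogrande_5", split="train")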

Configuration details

The dataset includes the following configurations (a partial list of the 64):

harness_arc_challenge_25
harness_drop_3
harness_gsm8k_5
harness_hellaswag_10
harness_hendrycksTest_5
harness_hendrycksTest_abstract_algebra_5
harness_hendrycksTest_anatomy_5
harness_hendrycksTest_astronomy_5
harness_hendrycksTest_business_ethics_5
harness_hendrycksTest_clinical_knowledge_5
harness_hendrycksTest_college_biology_5
harness_hendrycksTest_college_chemistry_5
harness_hendrycksTest_college_computer_science_5
harness_hendrycksTest_college_mathematics_5
harness_hendrycksTest_college_medicine_5
harness_hendrycksTest_college_physics_5
harness_hendrycksTest_computer_security_5
harness_hendrycksTest_conceptual_physics_5
harness_hendrycksTest_econometrics_5
harness_hendrycksTest_electrical_engineering_5
harness_hendrycksTest_elementary_mathematics_5
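The configuration names above share the shape `harness_<task>_<n>`. A minimal sketch of splitting such a name into its parts follows. Reading the trailing integer as the number of few-shot examples is an assumption, not something this page states, and `HarnessConfig` and `parse_config_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class HarnessConfig(NamedTuple):
    task: str
    num_fewshot: int  # Assumption: the trailing integer is the few-shot count.


def parse_config_name(name: str) -> HarnessConfig:
    """Split 'harness_<task>_<n>' into the task name and the trailing integer."""
    match = re.fullmatch(r"harness_(.+)_(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected configuration name: {name!r}")
    return HarnessConfig(task=match.group(1), num_fewshot=int(match.group(2)))


for name in ["harness_arc_challenge_25", "harness_hendrycksTest_anatomy_5"]:
    print(parse_config_name(name))
```

Names that do not follow the pattern, such as the `results` configuration, raise a `ValueError` and would need to be filtered out before parsing.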

The above summarizes the dataset: its background, structure, a loading example, and its configurations.
