hellaswag

Name: hellaswag
Creator: maas
Published: 2026-05-16 22:32:20
License: No description provided

ModelScope community; updated 2026-05-16; indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/evalscope/hellaswag

Resource description:

# Dataset Card for "hellaswag"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [https://rowanzellers.com/hellaswag/](https://rowanzellers.com/hellaswag/)
- **Repository:** [https://github.com/rowanz/hellaswag/](https://github.com/rowanz/hellaswag/)
- **Paper:** [HellaSwag: Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 71.49 MB
- **Size of the generated dataset:** 65.32 MB
- **Total amount of disk used:** 136.81 MB

### Dataset Summary

HellaSwag is a dataset for commonsense NLI. The accompanying paper, "HellaSwag: Can a Machine Really Finish Your Sentence?", was published at ACL 2019.

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 71.49 MB
- **Size of the generated dataset:** 65.32 MB
- **Total amount of disk used:** 136.81 MB

An example of 'train' looks as follows.

```
This example was too long and was cropped:

{
    "activity_label": "Removing ice from car",
    "ctx": "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles. then",
    "ctx_a": "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles.",
    "ctx_b": "then",
    "endings": "[\", the man adds wax to the windshield and cuts it.\", \", a person board a ski lift, while two men supporting the head of the per...",
    "ind": 4,
    "label": "3",
    "source_id": "activitynet~v_-1IBHYS3L-Y",
    "split": "train",
    "split_type": "indomain"
}
```

### Data Fields

The data fields are the same among all splits.

#### default

- `ind`: an `int32` feature.
- `activity_label`: a `string` feature.
- `ctx_a`: a `string` feature.
- `ctx_b`: a `string` feature.
- `ctx`: a `string` feature.
- `endings`: a `list` of `string` features.
- `source_id`: a `string` feature.
- `split`: a `string` feature.
- `split_type`: a `string` feature.
- `label`: a `string` feature.

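
The field list above can be mirrored as a typed record. The following is a minimal sketch, not part of the original card: the class name `HellaSwagRecord`, the helper `gold_completion`, and every value in `sample` are hypothetical placeholders. It assumes that `label` is a digit string that indexes into `endings`, as the `"label": "3"` value in the example instance suggests.

```python
from typing import List, TypedDict


class HellaSwagRecord(TypedDict):
    """Field names and types as listed under Data Fields."""
    ind: int
    activity_label: str
    ctx_a: str
    ctx_b: str
    ctx: str
    endings: List[str]
    source_id: str
    split: str
    split_type: str
    label: str


def gold_completion(record: HellaSwagRecord) -> str:
    """Join the context with the ending selected by `label`.

    Assumption: `label` is a digit string indexing into `endings`.
    """
    index = int(record["label"])  # raises ValueError if label is not numeric
    return f'{record["ctx"]} {record["endings"][index]}'


# Hypothetical record: every value below is a placeholder, not a dataset row.
sample: HellaSwagRecord = {
    "ind": 0,
    "activity_label": "Example activity",
    "ctx_a": "A person picks up a brush.",
    "ctx_b": "They",
    "ctx": "A person picks up a brush. They",
    "endings": ["ending zero.", "ending one.", "ending two.", "ending three."],
    "source_id": "example~0",
    "split": "train",
    "split_type": "indomain",
    "label": "3",
}

print(gold_completion(sample))
```

Because `label` is stored as a string, converting it with `int()` before indexing keeps the lookup explicit and fails loudly on any record whose label is not numeric.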
### Data Splits

| name    | train | validation | test  |
|---------|------:|-----------:|------:|
| default | 39905 |      10042 | 10003 |

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

MIT https://github.com/rowanz/hellaswag/blob/master/LICENSE

### Citation Information

```
@inproceedings{zellers2019hellaswag,
    title={HellaSwag: Can a Machine Really Finish Your Sentence?},
    author={Zellers, Rowan and Holtzman, Ari and Bisk, Yonatan and Farhadi, Ali and Choi, Yejin},
    booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
    year={2019}
}
```

### Contributions

Thanks to [@albertvillanova](https://github.com/albertvillanova), [@mariamabarham](https://github.com/mariamabarham), [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.
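
As a quick sanity check on the Data Splits table, the sketch below sums the three split sizes and derives each split's share of the total. The counts are copied from the table; the names `SPLIT_SIZES` and `split_shares` are hypothetical and chosen only for this illustration.

```python
# Split sizes copied from the Data Splits table of this card.
SPLIT_SIZES = {"train": 39905, "validation": 10042, "test": 10003}


def split_shares(sizes: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_examples = sum(SPLIT_SIZES.values())
shares = split_shares(SPLIT_SIZES)

print(f"total examples: {total_examples}")
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

With these counts, roughly two thirds of the examples are in the training split, and the validation and test splits are close to equal in size.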

Provider:

maas

Created:

2025-08-12
