hellaswag

ModelScope community · updated 2026-05-16 · indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/evalscope/hellaswag
Resource description:
# Dataset Card for "hellaswag"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [https://rowanzellers.com/hellaswag/](https://rowanzellers.com/hellaswag/)
- **Repository:** [https://github.com/rowanz/hellaswag/](https://github.com/rowanz/hellaswag/)
- **Paper:** [HellaSwag: Can a Machine Really Finish Your Sentence?](https://arxiv.org/abs/1905.07830)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 71.49 MB
- **Size of the generated dataset:** 65.32 MB
- **Total amount of disk used:** 136.81 MB

### Dataset Summary

HellaSwag is a dataset for commonsense NLI. It was introduced in the paper "HellaSwag: Can a Machine Really Finish Your Sentence?", published at ACL 2019.

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

#### default

- **Size of downloaded dataset files:** 71.49 MB
- **Size of the generated dataset:** 65.32 MB
- **Total amount of disk used:** 136.81 MB

An example from the 'train' split looks as follows.

```
This example was too long and was cropped:

{
    "activity_label": "Removing ice from car",
    "ctx": "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles. then",
    "ctx_a": "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles.",
    "ctx_b": "then",
    "endings": "[\", the man adds wax to the windshield and cuts it.\", \", a person board a ski lift, while two men supporting the head of the per...",
    "ind": 4,
    "label": "3",
    "source_id": "activitynet~v_-1IBHYS3L-Y",
    "split": "train",
    "split_type": "indomain"
}
```

### Data Fields

The data fields are the same across all splits.

#### default

- `ind`: an `int32` feature.
- `activity_label`: a `string` feature.
- `ctx_a`: a `string` feature.
- `ctx_b`: a `string` feature.
- `ctx`: a `string` feature.
- `endings`: a `list` of `string` features.
- `source_id`: a `string` feature.
- `split`: a `string` feature.
- `split_type`: a `string` feature.
- `label`: a `string` feature.
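
The field list and the cropped 'train' example above can be read together as a small record type. The following is a minimal sketch, not part of the original card: the `HellaSwagRecord` class and the helpers `gold_index` and `candidates` are hypothetical names introduced for illustration. It assumes that `label` is a 0-based index into `endings` stored as a string, and that a full completion is formed by appending an ending directly to `ctx`. The card crops the endings, so the last three endings below are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HellaSwagRecord:
    """Hypothetical container mirroring the fields listed in the card."""
    ind: int
    activity_label: str
    ctx_a: str
    ctx_b: str
    ctx: str
    endings: List[str]
    source_id: str
    split: str
    split_type: str
    label: str  # the card lists this as a string feature, e.g. "3"


def gold_index(rec: HellaSwagRecord) -> Optional[int]:
    """Assumption: `label` is a 0-based index into `endings`.

    Returns None when the label string is empty.
    """
    return int(rec.label) if rec.label.strip() else None


def candidates(rec: HellaSwagRecord) -> List[str]:
    """Assumption: a completion is `ctx` followed directly by one ending."""
    return [rec.ctx + ending for ending in rec.endings]


# Values copied from the cropped example in the card; the endings after the
# first are placeholders because the card truncates them.
record = HellaSwagRecord(
    ind=4,
    activity_label="Removing ice from car",
    ctx_a=("Then, the man writes over the snow covering the window of a car, "
           "and a woman wearing winter clothes smiles."),
    ctx_b="then",
    ctx=("Then, the man writes over the snow covering the window of a car, "
         "and a woman wearing winter clothes smiles. then"),
    endings=[
        ", the man adds wax to the windshield and cuts it.",
        " <placeholder ending 1>",
        " <placeholder ending 2>",
        " <placeholder ending 3>",
    ],
    source_id="activitynet~v_-1IBHYS3L-Y",
    split="train",
    split_type="indomain",
    label="3",
)

print(gold_index(record))
print(candidates(record)[0])
```

In the example shown in the card, `ctx` equals `ctx_a` and `ctx_b` joined by a single space. Whether that holds for every record is not stated, so the sketch reads `ctx` as given instead of rebuilding it.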

### Data Splits

| name    | train | validation | test  |
|---------|------:|-----------:|------:|
| default | 39905 |      10042 | 10003 |

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

MIT https://github.com/rowanz/hellaswag/blob/master/LICENSE

### Citation Information

```
@inproceedings{zellers2019hellaswag,
    title={HellaSwag: Can a Machine Really Finish Your Sentence?},
    author={Zellers, Rowan and Holtzman, Ari and Bisk, Yonatan and Farhadi, Ali and Choi, Yejin},
    booktitle={Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
    year={2019}
}
```

### Contributions

Thanks to [@albertvillanova](https://github.com/albertvillanova), [@mariamabarham](https://github.com/mariamabarham), [@thomwolf](https://github.com/thomwolf), [@patrickvonplaten](https://github.com/patrickvonplaten), [@lewtun](https://github.com/lewtun) for adding this dataset.

Provider:
maas
Created:
2025-08-12