cos_e
ModelScope community · updated 2025-08-22 · indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/Salesforce/cos_e
# Dataset Card for "cos_e"
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:** https://github.com/salesforce/cos-e
- **Paper:** [Explain Yourself! Leveraging Language Models for Commonsense Reasoning](https://arxiv.org/abs/1906.02361)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Size of downloaded dataset files:** 10.83 MB
- **Size of the generated dataset:** 5.39 MB
- **Total amount of disk used:** 16.22 MB
### Dataset Summary
Common Sense Explanations (CoS-E) allows for training language models to
automatically generate explanations that can be used during training and
inference in a novel Commonsense Auto-Generated Explanation (CAGE) framework.
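As a rough illustration of how a record could feed explanation generation, the sketch below turns one record into a (prompt, target) text pair. The field names come from this card; the function name `build_explanation_pair` and the prompt template are hypothetical and are not taken from the paper or the CAGE code.

```python
from typing import Dict, List, Tuple


def build_explanation_pair(record: Dict[str, object]) -> Tuple[str, str]:
    """Build a (prompt, target) pair for explanation generation.

    The template is an assumption made for illustration only.
    """
    choices: List[str] = [str(c) for c in record["choices"]]
    prompt = (
        f"Question: {record['question']}\n"
        f"Choices: {', '.join(choices)}\n"
        "Explanation:"
    )
    # The open-ended (abstractive) explanation serves as the generation target.
    target = str(record["abstractive_explanation"])
    return prompt, target


# Placeholder record in the shape documented on this card.
example = {
    "abstractive_explanation": "this is open-ended",
    "answer": "b",
    "choices": ["a", "b", "c"],
    "extractive_explanation": "this is selected train",
    "id": "42",
    "question": "question goes here.",
}

prompt, target = build_explanation_pair(example)
print(prompt)
print(target)
```

The pair can then be passed to whatever fine-tuning setup is in use; the choice of separator and wording in the prompt is left to the reader.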
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Dataset Structure
### Data Instances
#### v1.0
- **Size of downloaded dataset files:** 4.30 MB
- **Size of the generated dataset:** 2.34 MB
- **Total amount of disk used:** 6.64 MB
An example from the `train` split looks as follows:
```
{
"abstractive_explanation": "this is open-ended",
"answer": "b",
"choices": ["a", "b", "c"],
"extractive_explanation": "this is selected train",
"id": "42",
"question": "question goes here."
}
```
#### v1.11
- **Size of downloaded dataset files:** 6.53 MB
- **Size of the generated dataset:** 3.05 MB
- **Total amount of disk used:** 9.58 MB
An example from the `train` split looks as follows:
```
{
"abstractive_explanation": "this is open-ended",
"answer": "b",
"choices": ["a", "b", "c"],
"extractive_explanation": "this is selected train",
"id": "42",
"question": "question goes here."
}
```
### Data Fields
The data fields are the same among all splits.
#### v1.0
- `id`: a `string` feature.
- `question`: a `string` feature.
- `choices`: a `list` of `string` features.
- `answer`: a `string` feature.
- `abstractive_explanation`: a `string` feature.
- `extractive_explanation`: a `string` feature.
#### v1.11
- `id`: a `string` feature.
- `question`: a `string` feature.
- `choices`: a `list` of `string` features.
- `answer`: a `string` feature.
- `abstractive_explanation`: a `string` feature.
- `extractive_explanation`: a `string` feature.
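The field list above can be written down as a small typed record with a validating parser. This is a minimal sketch: the names `CosERecord` and `parse_record` are hypothetical helpers, not part of any library, and the sample values are the placeholders shown under Data Instances.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping

# Field names as listed on this card (identical for v1.0 and v1.11).
FIELD_NAMES = (
    "id",
    "question",
    "choices",
    "answer",
    "abstractive_explanation",
    "extractive_explanation",
)


@dataclass(frozen=True)
class CosERecord:
    id: str
    question: str
    choices: List[str]
    answer: str
    abstractive_explanation: str
    extractive_explanation: str


def parse_record(raw: Mapping[str, Any]) -> CosERecord:
    """Check that `raw` has the documented fields and types, then wrap it."""
    missing = [name for name in FIELD_NAMES if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    choices = raw["choices"]
    if not isinstance(choices, list) or not all(isinstance(c, str) for c in choices):
        raise TypeError("choices must be a list of strings")
    for name in FIELD_NAMES:
        if name != "choices" and not isinstance(raw[name], str):
            raise TypeError(f"{name} must be a string")
    return CosERecord(**{name: raw[name] for name in FIELD_NAMES})


example = {
    "abstractive_explanation": "this is open-ended",
    "answer": "b",
    "choices": ["a", "b", "c"],
    "extractive_explanation": "this is selected train",
    "id": "42",
    "question": "question goes here.",
}

record = parse_record(example)
print(record.question, record.choices, record.answer)
```

Validating records up front makes schema drift between configurations visible early instead of surfacing later as a `KeyError` in training code.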
### Data Splits
|name |train|validation|
|-----|----:|---------:|
|v1.0 | 7610| 950|
|v1.11| 9741| 1221|
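The split sizes in the table can be checked with a few lines of arithmetic. The sketch below hard-codes the counts from the table (it does not download anything) and reports the total and the validation share per configuration; `summarize` is a hypothetical helper name.

```python
from typing import Dict

# Counts copied from the Data Splits table on this card.
SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "v1.0": {"train": 7610, "validation": 950},
    "v1.11": {"train": 9741, "validation": 1221},
}


def summarize(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return the total example count and the validation fraction."""
    total = sum(sizes.values())
    return {"total": total, "validation_fraction": sizes["validation"] / total}


for name, sizes in SPLIT_SIZES.items():
    stats = summarize(sizes)
    print(
        f"{name}: total={stats['total']}, "
        f"validation={stats['validation_fraction']:.1%}"
    )
```

In both configurations the validation split is roughly 11% of the examples listed in the table.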
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
Unknown.
### Citation Information
```
@inproceedings{rajani2019explain,
title = "Explain Yourself! Leveraging Language Models for Commonsense Reasoning",
author = "Rajani, Nazneen Fatema and
McCann, Bryan and
Xiong, Caiming and
Socher, Richard",
year="2019",
booktitle = "Proceedings of the 2019 Conference of the Association for Computational Linguistics (ACL2019)",
url ="https://arxiv.org/abs/1906.02361"
}
```
### Contributions
Thanks to [@lewtun](https://github.com/lewtun), [@thomwolf](https://github.com/thomwolf), [@mariamabarham](https://github.com/mariamabarham), [@patrickvonplaten](https://github.com/patrickvonplaten), [@albertvillanova](https://github.com/albertvillanova), [@lhoestq](https://github.com/lhoestq) for adding this dataset.
Provider: maas
Created: 2025-08-16



