poem_sentiment
Source: ModelScope community. Updated 2025-11-27; indexed 2025-07-12.
Download link: https://modelscope.cn/datasets/google-research-datasets/poem_sentiment
# Dataset Card for Gutenberg Poem Dataset
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** N/A
- **Repository:** [GitHub](https://github.com/google-research-datasets/poem-sentiment)
- **Paper:** [Investigating Societal Biases in a Poetry Composition System](https://arxiv.org/abs/2011.02686)
- **Leaderboard:** N/A
- **Point of Contact:** N/A
### Dataset Summary
Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg.
This dataset can be used for tasks such as sentiment classification or style transfer for poems.
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
The text in the dataset is in English (`en`).
## Dataset Structure
### Data Instances
An example instance from the dataset:
```
{'id': 0, 'label': 2, 'verse_text': 'with pale blue berries. in these peaceful shades--'}
```
### Data Fields
- `id`: index of the example
- `verse_text`: The text of the poem verse
- `label`: The sentiment label, encoded as follows:
  - 0 = negative
  - 1 = positive
  - 2 = no impact
  - 3 = mixed (both negative and positive)

> Note: The original dataset uses different label indices (negative = -1, no impact = 0, positive = 1).
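
The label encoding above can be captured in a small lookup table. The following is a minimal sketch, not part of the dataset itself: the names `LABEL_NAMES`, `ORIGINAL_TO_CARD`, and `label_name` are illustrative assumptions, and the remapping covers only the three original indices that the note states.

```python
# Label ids as listed in this card's Data Fields section.
LABEL_NAMES = {
    0: "negative",
    1: "positive",
    2: "no impact",
    3: "mixed",
}

# Original indices stated in the note, mapped to this card's ids.
# The note does not give the original index for "mixed", so it is omitted here.
ORIGINAL_TO_CARD = {
    -1: 0,  # negative
    0: 2,   # no impact
    1: 1,   # positive
}


def label_name(label: int) -> str:
    """Return the human-readable name for a label id used in this card."""
    return LABEL_NAMES[label]


# The example instance from the Data Instances section.
example = {
    "id": 0,
    "label": 2,
    "verse_text": "with pale blue berries. in these peaceful shades--",
}
print(label_name(example["label"]))  # no impact
```

Keeping the two mappings separate makes it explicit which encoding a given label value belongs to, which matters because `0` means "negative" in this card but "no impact" in the original indices.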
### Data Splits
The dataset is divided into `train`, `validation`, and `test` splits with the following sizes:
| | train | validation | test |
|--------------------|------:|-----------:|-----:|
| Number of examples | 892 | 105 | 104 |
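
As a quick sanity check on the table, the split sizes can be summed and expressed as fractions of the whole. This is a small sketch using only the numbers given above; the variable names are illustrative.

```python
# Split sizes copied from the table in this section.
SPLIT_SIZES = {"train": 892, "validation": 105, "test": 104}

# Total number of examples across all splits.
total = sum(SPLIT_SIZES.values())

# Fraction of all examples that falls in each split.
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

print(total)  # 1101
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The three splits add up to 1,101 examples, with roughly 81% in `train` and the remainder divided almost evenly between `validation` and `test`.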
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
### Citation Information
```
@misc{sheng2020investigating,
title={Investigating Societal Biases in a Poetry Composition System},
author={Emily Sheng and David Uthus},
year={2020},
eprint={2011.02686},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Contributions
Thanks to [@patil-suraj](https://github.com/patil-suraj) for adding this dataset.
Provider: maas
Created: 2025-07-07



