kavgan/opinosis
Source: Hugging Face. Updated 2024-01-18; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/kavgan/opinosis
Resource description:
---
annotations_creators:
- crowdsourced
language:
- en
language_creators:
- found
license:
- apache-2.0
multilinguality:
- monolingual
pretty_name: Opinosis
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- summarization
task_ids: []
paperswithcode_id: opinosis
tags:
- abstractive-summarization
dataset_info:
  features:
  - name: review_sents
    dtype: string
  - name: summaries
    sequence: string
  splits:
  - name: train
    num_bytes: 741270
    num_examples: 51
  download_size: 757398
  dataset_size: 741270
---
# Dataset Card for "opinosis"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** http://kavita-ganesan.com/opinosis-opinion-dataset/
- **Repository:** https://github.com/kavgan/opinosis-summarization
- **Paper:** [Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions](https://aclanthology.org/C10-1039/)
- **Point of Contact:** [Kavita Ganesan](mailto:kavita@opinosis.ai)
- **Size of downloaded dataset files:** 0.75 MB
- **Size of the generated dataset:** 0.74 MB
- **Total amount of disk used:** 1.50 MB
### Dataset Summary
The Opinosis Opinion Dataset consists of sentences extracted from reviews for 51 topics.
Topics and opinions are obtained from Tripadvisor, Edmunds.com and Amazon.com.
### Supported Tasks and Leaderboards
The dataset is tagged for `summarization`, specifically abstractive summarization of opinions. No leaderboard is listed in this card.
### Languages
The text in the dataset is in English (`en`).
## Dataset Structure
### Data Instances
#### default
- **Size of downloaded dataset files:** 0.75 MB
- **Size of the generated dataset:** 0.74 MB
- **Total amount of disk used:** 1.50 MB
An example of 'train' looks as follows.
```
{
    "review_sents": "This is a fake topic. \nThe topics have multiple sentence inputs. \n",
    "summaries": ["This is a gold summary for topic 1. \nSentences in gold summaries are separated by newlines.", "This is another gold summary for topic 1. \nSentences in gold summaries are separated by newlines."]
}
```
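Both fields pack their sentences into newline-separated strings, as the example above shows. The following is a minimal sketch of unpacking one record into sentence lists, assuming real records follow the same layout as that example. `split_sentences` and `unpack_record` are hypothetical helper names introduced here for illustration; they are not part of the dataset or of any library.

```python
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Split a newline-separated block into stripped, non-empty sentences."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def unpack_record(record: Dict) -> Dict[str, object]:
    """Turn one record into lists of sentences (hypothetical helper)."""
    return {
        "review_sents": split_sentences(record["review_sents"]),
        "summaries": [split_sentences(s) for s in record["summaries"]],
    }


# Record copied from the example above. A real record could presumably be
# obtained with datasets.load_dataset("kavgan/opinosis")["train"][0], which
# needs network access and is therefore not executed here.
example = {
    "review_sents": "This is a fake topic. \nThe topics have multiple sentence inputs. \n",
    "summaries": [
        "This is a gold summary for topic 1. \nSentences in gold summaries are separated by newlines.",
        "This is another gold summary for topic 1. \nSentences in gold summaries are separated by newlines.",
    ],
}

unpacked = unpack_record(example)
print(len(unpacked["review_sents"]))  # number of review sentences in the topic
print(unpacked["summaries"][0])       # first gold summary as a list of sentences
```

Stripping each line drops the trailing space that precedes every newline in the example, and filtering empty strings handles the final trailing newline in `review_sents`.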
### Data Fields
The data fields are the same among all splits.
#### default
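- `review_sents`: a `string` feature holding the review sentences for one topic, separated by newlines.
- `summaries`: a `list` of `string` features, each a gold summary for the topic with its sentences separated by newlines.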
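As a rough illustration of the two fields, the sketch below checks that a record matches the declared types before it is used. The function name `schema_errors` and both sample records are made up for this example and do not come from the dataset.

```python
from typing import Any, List, Mapping


def schema_errors(record: Mapping[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record conforms."""
    errors = []
    if not isinstance(record.get("review_sents"), str):
        errors.append("review_sents must be a string")
    summaries = record.get("summaries")
    if not isinstance(summaries, (list, tuple)):
        errors.append("summaries must be a list of strings")
    elif not all(isinstance(s, str) for s in summaries):
        errors.append("every item in summaries must be a string")
    return errors


# Illustrative records (invented, not taken from the dataset).
good = {"review_sents": "Great battery life. \n", "summaries": ["Battery life is good."]}
bad = {"review_sents": ["not", "a", "string"], "summaries": "not a list"}

print(schema_errors(good))
print(schema_errors(bad))
```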
### Data Splits
| name |train|
|-------|----:|
|default| 51|
## Dataset Creation
### Curation Rationale
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
#### Who are the annotators?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Personal and Sensitive Information
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Discussion of Biases
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Other Known Limitations
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
## Additional Information
### Dataset Curators
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Licensing Information
The license for this dataset is Apache License 2.0 and can be found [here](https://github.com/kavgan/opinosis-summarization/blob/master/LICENSE).
### Citation Information
```
@inproceedings{ganesan2010opinosis,
  title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
  author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
  booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
  pages={340--348},
  year={2010},
  organization={Association for Computational Linguistics}
}
```
### Contributions
Thanks to [@thomwolf](https://github.com/thomwolf) and [@patrickvonplaten](https://github.com/patrickvonplaten) for adding this dataset.
Provider:
kavgan
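Summary of original information
Dataset overview: Opinosis
Basic dataset information
- Name: Opinosis
- Language: English (en)
- License: Apache-2.0
- Multilinguality: monolingual
- Size category: fewer than 1,000 examples (n<1K)
- Source data: original
- Task category: summarization
- Tags: abstractive-summarization
Dataset structure
- Features:
  - `review_sents`: string; the review sentences.
  - `summaries`: sequence of strings; the summaries.
- Splits:
  - train: 51 examples, 741270 bytes in total.
- Download size: 757398 bytes
- Dataset size: 741270 bytes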
Dataset creation
- Annotation creators: crowdsourced
- License information: Apache License 2.0; see Licensing Information above.
- Citation information: Ganesan, Zhai, and Han (2010); the full BibTeX entry is given under Citation Information above.