TalTechNLP/AMIsum
Source: Hugging Face. Updated 2023-06-21; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/TalTechNLP/AMIsum
Resource description:
---
pretty_name: AMIsum
annotations_creators:
- expert-generated
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- summarization
paperswithcode_id: ami-sum
---
# Dataset Card for "AMIsum"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Dataset Summary
AMIsum is a meeting summarization dataset based on the AMI Meeting Corpus (https://groups.inf.ed.ac.uk/ami/corpus/). Meeting transcripts serve as the source text and abstractive summaries as the target text.
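The source/target relationship can be sketched in a few lines. This is a minimal illustration rather than code from the dataset authors: the helper name `to_source_target` and the toy record (including its `id`) are hypothetical, and the field names are the ones listed under Data Fields.

```python
from typing import Dict, Tuple


def to_source_target(example: Dict[str, str]) -> Tuple[str, str]:
    """Map one record to a (source, target) pair for summarization."""
    # The transcript is the model input; the summary is the reference output.
    return example["transcript"], example["summary"]


# Toy record, heavily shortened; NOT a real row from the dataset.
example = {
    "transcript": "<PM> Okay. <PM> Right.",
    "summary": "The project manager opened the meeting.",
    "id": "toy-0001",  # hypothetical id, for illustration only
}

source, target = to_source_target(example)
print(source)
print(target)
```

The `id` field is left out of the pair because it only identifies the meeting.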
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
English
## Dataset Structure
### Data Instances
```
{'transcript': "<PM> Okay. <PM> Right. <PM> Um well this is the kick-off meeting for our our project. <PM> Um and um this is just what we're gonna be doing over the next twenty five minutes. <ME> Mm-hmm. <PM> Um so first of all, just to kind of make sure that we all know each other, I'm Laura and I'm the project manager. <PM> Do you want to introduce yourself again? <ME> Great. [...]",
 'summary': 'The project manager introduced the upcoming project to the team members and then the team members participated in an exercise in which they drew their favorite animal and discussed what they liked about the animal. The project manager talked about the project finances and selling prices. The team then discussed various features to consider in making the remote.',
 'id': 'ES2002a'}
```
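Each utterance in the instance above is prefixed with a speaker tag such as `<PM>` or `<ME>`. The following sketch splits a transcript into (speaker, utterance) pairs. It assumes every tag consists of uppercase letters in angle brackets, which holds for the excerpt shown but has not been checked against the full dataset; `split_turns` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumption: speaker tags look like <PM> or <ME> (uppercase letters only).
SPEAKER_TAG = re.compile(r"<([A-Z]+)>")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs in the order they occur."""
    # re.split with one capture group yields
    # [text_before_first_tag, tag1, text1, tag2, text2, ...]
    parts = SPEAKER_TAG.split(transcript)
    speakers = parts[1::2]
    utterances = parts[2::2]
    return [(s, u.strip()) for s, u in zip(speakers, utterances)]


sample = "<PM> Okay. <PM> Right. <ME> Mm-hmm."
turns = split_turns(sample)
for speaker, utterance in turns:
    print(speaker, "|", utterance)
```

Consecutive utterances by the same speaker stay separate here; merging them into longer turns would be a design choice for the downstream model.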
### Data Fields
```
transcript: Expert-generated meeting transcript.
summary: Expert-generated meeting summary.
id: Meeting ID.
```
### Data Splits
|train|validation|test|
|:----|:---------|:---|
|97|20|20|
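As a quick sanity check on the table above, the sketch below computes the total number of meetings and each split's share. It also includes a loader that is defined but not called: it assumes the `datasets` library is installed, network access is available, and the repository loads with its default configuration, none of which the card states. The names `SPLIT_SIZES`, `split_fractions`, and `load_amisum` are hypothetical.

```python
from typing import Dict

# Split sizes copied from the Data Splits table.
SPLIT_SIZES: Dict[str, int] = {"train": 97, "validation": 20, "test": 20}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_amisum():
    """Load the dataset from the Hugging Face Hub (not called in this sketch).

    Assumes `datasets` is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("TalTechNLP/AMIsum")


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)
print(total)
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

With only 20 meetings each in validation and test, evaluation scores on this dataset are likely to be noisy, so differences between systems should be read with care.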
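Provider:
TalTechNLP
Summary of original information
Dataset overview
Dataset name
AMIsum
Dataset description
Dataset summary
AMIsum is a meeting summarization dataset based on the AMI Meeting Corpus. It uses meeting transcripts as the source data and abstractive summaries as the target data.
Supported tasks and leaderboards
More information needed
Language
English
Dataset structure
Data instances
Each data instance contains the following fields:
- transcript: expert-generated meeting transcript text.
- summary: expert-generated meeting summary text.
- id: unique identifier of the meeting.
Data fields
- transcript: expert-generated meeting transcript.
- summary: expert-generated meeting summary.
- id: meeting identifier.
Data splits
- Training set: 97 instances
- Validation set: 20 instances
- Test set: 20 instances
Dataset creation
Source data
The source data comes from the AMI Meeting Corpus.
Annotations
The annotations are expert-generated.
License
The dataset is released under the CC-BY-4.0 license.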



