
TalTechNLP/AMIsum

Hugging Face | Updated 2023-06-21 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/TalTechNLP/AMIsum
Resource description:
---
pretty_name: AMIsum
annotations_creators:
- expert-generated
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- summarization
paperswithcode_id: ami-sum
---

# Dataset Card for "AMIsum"

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Dataset Summary

AMIsum is a meeting summarization dataset based on the AMI Meeting Corpus (https://groups.inf.ed.ac.uk/ami/corpus/).
The dataset uses the meeting transcripts as the source data and the abstractive summaries as the target data.

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

English

## Dataset Structure

### Data Instances

```
{'transcript': "<PM> Okay. <PM> Right. <PM> Um well this is the kick-off meeting for our our project. <PM> Um and um this is just what we're gonna be doing over the next twenty five minutes. <ME> Mm-hmm. <PM> Um so first of all, just to kind of make sure that we all know each other, I'm Laura and I'm the project manager. <PM> Do you want to introduce yourself again? <ME> Great. [...]",
 'summary': 'The project manager introduced the upcoming project to the team members and then the team members participated in an exercise in which they drew their favorite animal and discussed what they liked about the animal. The project manager talked about the project finances and selling prices. The team then discussed various features to consider in making the remote.',
 'id': 'ES2002a'}
```

### Data Fields

```
transcript: Expert generated transcript.
summary: Expert generated summary.
id: Meeting id.
```

### Data Splits

|train|validation|test|
|:----|:---------|:---|
|97|20|20|
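As a rough illustration of how the splits listed above might be loaded and sanity-checked, here is a minimal sketch. It assumes the Hub ID `TalTechNLP/AMIsum` (taken from the download link) resolves through the `datasets` library and that the splits are named `train`, `validation`, and `test` as in the table. The helper names `load_amisum` and `split_size_mismatches` are hypothetical and not part of the dataset.

```python
# Split sizes as listed in the Data Splits table of the dataset card.
EXPECTED_SPLIT_SIZES = {"train": 97, "validation": 20, "test": 20}


def load_amisum():
    """Load AMIsum from the Hugging Face Hub (needs `datasets` and network access)."""
    # Imported lazily so the offline helper below works without the library.
    from datasets import load_dataset

    # Assumption: the Hub ID matches the name shown in the download link.
    return load_dataset("TalTechNLP/AMIsum")


def split_size_mismatches(actual_sizes, expected_sizes=EXPECTED_SPLIT_SIZES):
    """Return {split: (expected, actual)} for every split whose size differs.

    A split that is missing from `actual_sizes` is reported with actual=None.
    """
    mismatches = {}
    for split, expected in expected_sizes.items():
        actual = actual_sizes.get(split)
        if actual != expected:
            mismatches[split] = (expected, actual)
    return mismatches
```

With network access, one could call `dataset = load_amisum()`, build `sizes = {name: len(split) for name, split in dataset.items()}`, and pass `sizes` to `split_size_mismatches`. An empty result would mean the downloaded copy matches the table.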
Provided by:
TalTechNLP
Summary of original information

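Dataset overview

Dataset name

AMIsum

Dataset description

Dataset summary

AMIsum is a meeting summarization dataset based on the AMI Meeting Corpus. It uses the meeting transcripts as the source data and the abstractive summaries as the target data.

Supported tasks and leaderboards

More information needed

Languages

English

Dataset structure

Data instances

Each data instance contains the following fields:

  • transcript: the expert-generated text of the meeting transcript.
  • summary: the expert-generated text of the meeting summary.
  • id: the unique identifier of the meeting.

Data fields

  • transcript: expert-generated meeting transcript.
  • summary: expert-generated meeting summary.
  • id: meeting identifier.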
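The example instance in the dataset card prefixes every utterance in `transcript` with a speaker tag such as `<PM>` or `<ME>`. A minimal sketch of splitting such a string into (speaker, utterance) pairs follows. It assumes that every tag is a run of letters inside angle brackets, which matches the card's example but is not documented as a guarantee, and the function name `split_utterances` is hypothetical.

```python
import re

# Assumption: speaker tags look like <PM> or <ME> (letters only), as in the
# card's example instance; the full tag inventory is not documented there.
SPEAKER_TAG = re.compile(r"<([A-Za-z]+)>")


def split_utterances(transcript):
    """Split a tagged transcript into a list of (speaker, utterance) pairs."""
    # With one capture group, re.split returns
    # [text_before_first_tag, tag1, text1, tag2, text2, ...].
    parts = SPEAKER_TAG.split(transcript)
    pairs = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        pairs.append((speaker, text.strip()))
    return pairs


if __name__ == "__main__":
    for speaker, text in split_utterances("<PM> Okay. <PM> Right. <ME> Mm-hmm."):
        print(f"{speaker}: {text}")
```

Any text that precedes the first tag is dropped by this sketch. Whether real transcripts contain such text is not stated in the card.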

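Data splits

  • Training set: 97 instances
  • Validation set: 20 instances
  • Test set: 20 instances

Dataset creation

Source data

The source data comes from the AMI Meeting Corpus.

Annotations

The annotations are expert-generated.

License

The dataset is released under the CC-BY-4.0 license.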
