TalTechNLP/LongSumEt
Download link:
https://hf-mirror.com/datasets/TalTechNLP/LongSumEt
---
annotations_creators:
- machine-generated
language:
- et
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- summarization
pretty_name: LongSumEt
dataset_info:
  features:
  - name: text
    dtype: string
  - name: long_summary
    dtype: string
  - name: short_summary
    dtype: string
  - name: bulletpoints
    dtype: string
  - name: timestamp
    dtype: string
  - name: url
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 85384791
    num_examples: 8656
  - name: test
    num_bytes: 4819298
    num_examples: 481
  - name: validation
    num_bytes: 4715166
    num_examples: 481
  download_size: 61950277
  dataset_size: 94919255
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
---
# Dataset Card for "LongSumEt"
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** https://www.bjmc.lu.lv/fileadmin/user_upload/lu_portal/projekti/bjmc/Contents/10_3_23_Harm.pdf
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Dataset Summary
LongSumEt is an Estonian-language long-document summarization dataset built from pages filtered from the CulturaX dataset. Each example consists of the page text and three machine-generated summaries of it: a short summary, a long summary, and a list of bullet points.
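As a minimal sketch of how the data might be accessed, the snippet below loads one split with the Hugging Face `datasets` library and measures the length of the text and of each summary field. The repository ID and field names come from this card; the helper names `load_longsumet` and `summary_lengths` are hypothetical and not part of any published API.

```python
# Summary field names as listed in the dataset card's front matter.
SUMMARY_FIELDS = ("short_summary", "long_summary", "bulletpoints")


def load_longsumet(split: str = "train"):
    """Load one split of LongSumEt from the Hugging Face Hub.

    Requires the `datasets` package and network access. The import is
    done lazily so the rest of this module works without it.
    """
    from datasets import load_dataset

    return load_dataset("TalTechNLP/LongSumEt", split=split)


def summary_lengths(record: dict) -> dict:
    """Return the character length of the page text and of each summary."""
    lengths = {"text": len(record["text"])}
    for field in SUMMARY_FIELDS:
        lengths[field] = len(record[field])
    return lengths
```

With network access, `summary_lengths(load_longsumet("validation")[0])` would report the lengths for the first validation example.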
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
Estonian
## Dataset Structure
### Data Instances
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Data Fields
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
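The front matter of this card declares seven features, all with dtype `string`: `text`, `long_summary`, `short_summary`, `bulletpoints`, `timestamp`, `url`, and `source`. The following is a small sketch of a check that a record matches that schema; the function name `missing_or_invalid_fields` is hypothetical, and the check assumes nothing about the fields beyond their names and string type.

```python
# Feature names declared in the card's front matter; every dtype is string.
EXPECTED_FIELDS = (
    "text",
    "long_summary",
    "short_summary",
    "bulletpoints",
    "timestamp",
    "url",
    "source",
)


def missing_or_invalid_fields(record: dict) -> list:
    """Return the names of expected fields that are absent or not strings."""
    return [
        name
        for name in EXPECTED_FIELDS
        if not isinstance(record.get(name), str)
    ]
```

An empty list means the record carries all seven fields as strings.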
### Data Splits
|train|test|validation|
|:----|:----|:----|
|8656|481|481|
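The split sizes above correspond to roughly a 90/5/5 partition of 9,618 examples. A short sketch of that arithmetic, using only the counts from the table (the names `SPLIT_SIZES` and `split_shares` are illustrative):

```python
# Example counts per split, copied from the table in this card.
SPLIT_SIZES = {"train": 8656, "test": 481, "validation": 481}


def split_shares(sizes: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


shares = split_shares(SPLIT_SIZES)
for name, share in shares.items():
    print(f"{name}: {SPLIT_SIZES[name]} examples ({share:.1%})")
```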
### BibTeX entry and citation info
```bibtex
@article{henryabstractive,
title={Abstractive Summarization of Broadcast News Stories for {Estonian}},
author={H{\"a}rm, Henry and Alum{\"a}e, Tanel},
journal={Baltic J. Modern Computing},
volume={10},
number={3},
pages={511-524},
year={2022}
}
```
Provided by:
TalTechNLP



