TalTechNLP/instructionSum
Source: Hugging Face. Updated 2024-04-29; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/TalTechNLP/instructionSum
Resource description:
---
license: cc-by-4.0
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1273801936
    num_examples: 248718
  download_size: 804210269
  dataset_size: 1273801936
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
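The metadata above declares a single `default` config with one `train` split stored under `data/train-*`. A minimal sketch of reading a few rows with the Hugging Face `datasets` library might look like the following. It assumes the library is installed and that the repository ID `TalTechNLP/instructionSum` (taken from the download link) is reachable; the helper name `iter_examples` is hypothetical. Streaming is used so that the full download of roughly 800 MB is not fetched up front.

```python
from itertools import islice

# Assumption: repository ID inferred from the download link on this page.
REPO_ID = "TalTechNLP/instructionSum"
# Column names as declared under `dataset_info.features`.
EXPECTED_FIELDS = ("instruction", "input", "output", "text")


def iter_examples(limit=3):
    """Yield the first `limit` training examples without downloading the full split."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    for example in islice(stream, limit):
        # Keep only the four declared columns.
        yield {name: example[name] for name in EXPECTED_FIELDS}


if __name__ == "__main__":
    for row in iter_examples():
        print(row["instruction"][:80])
```

Dropping `streaming=True` would instead download and cache the whole split, which may be preferable for repeated training runs.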
## Dataset Description
- **Homepage:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Dataset Summary
instructSum is an Estonian-language summarization dataset for instruction fine-tuning, in the Alpaca format. It is built from the ERR news (https://huggingface.co/datasets/TalTechNLP/ERRnews), ERR newsroom (https://huggingface.co/datasets/TalTechNLP/err-newsroom), Long Summarization (https://huggingface.co/datasets/TalTechNLP/LongSumEt), Dialogue Sum (https://huggingface.co/datasets/TalTechNLP/dialogsum_ee) and SamsSum (https://huggingface.co/datasets/TalTechNLP/samsum_ee) datasets.
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
Estonian
## Dataset Structure
### Data Fields
```
instruction: describes the task the model should perform.
input: optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article.
output: the answer to the instruction.
text: the instruction, input and output formatted with the prompt template used by the authors for fine-tuning their models.
```
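To illustrate how the `text` field could relate to the other three, here is a minimal sketch that assembles a prompt from `instruction`, `input` and `output`. It assumes the widely used English Stanford Alpaca template; the card does not state the exact template the authors used, so the wording (and language) of the real `text` field may differ. The names `PROMPT_WITH_INPUT`, `PROMPT_NO_INPUT` and `build_text` are hypothetical.

```python
# Assumption: the standard Stanford Alpaca templates, not confirmed by the dataset card.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def build_text(example):
    """Format one record into a single training string."""
    # `input` is optional, so treat a missing, None or blank value as absent.
    context = (example.get("input") or "").strip()
    template = PROMPT_WITH_INPUT if context else PROMPT_NO_INPUT
    return template.format(
        instruction=example["instruction"],
        input=context,
        output=example["output"],
    )


if __name__ == "__main__":
    record = {
        "instruction": "Summarize the following article.",
        "input": "Some article.",
        "output": "A summary.",
    }
    print(build_text(record))
```

Comparing the result of `build_text` against the stored `text` value of a real record is a quick way to check whether the assumed template matches the one the authors used.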
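Provider:
TalTechNLP
Summary of original information
Dataset overview
Dataset name: instructSum
Language: Estonian
Purpose: An Estonian summarization dataset for instruction fine-tuning, in the Alpaca format. It combines several sources: ERR news, ERR newsroom, Long Summarization, Dialogue Sum and SamsSum.
Dataset structure
Data fields:
- instruction: describes the task the model should perform.
- input: context or input for the task; for example, when the instruction is "Summarize the following article", the input is the article.
- output: the answer to the instruction.
- text: the instruction, input and output formatted with the prompt template the authors used to fine-tune their models.
Dataset size:
- Download size: 804210269 bytes
- Dataset size: 1273801936 bytes
- Number of training examples: 248718
License: cc-by-4.0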



