TalTechNLP/instructionSum

Name: TalTechNLP/instructionSum
Creator: TalTechNLP
Published: 2024-04-29 15:11:05
License: not specified on this page (the dataset card below declares cc-by-4.0)

Source: Hugging Face. Updated 2024-04-29, indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/TalTechNLP/instructionSum
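The sketch below shows one way to read the data. It assumes the Hugging Face `datasets` library is installed. `load_dataset(..., streaming=True)` iterates over the train split without first downloading the full ~804 MB. Pointing the `HF_ENDPOINT` environment variable at the mirror above is optional, and it only takes effect if it is set before `datasets`/`huggingface_hub` are first imported. The helper names `dataset_page_url` and `stream_examples` are hypothetical.

```python
import os
from itertools import islice

REPO_ID = "TalTechNLP/instructionSum"
HUB = "https://huggingface.co"
MIRROR = "https://hf-mirror.com"


def dataset_page_url(repo_id: str = REPO_ID, use_mirror: bool = False) -> str:
    """Build the dataset's web page URL on the Hub or on the mirror."""
    base = MIRROR if use_mirror else HUB
    return f"{base}/datasets/{repo_id}"


def stream_examples(n: int = 3, use_mirror: bool = False) -> list:
    """Fetch the first n training records without downloading the whole dataset."""
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT when it is first imported,
        # so this must run before any earlier import of datasets.
        os.environ["HF_ENDPOINT"] = MIRROR
    from datasets import load_dataset  # imported lazily so the env var can apply

    ds = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(ds, n))
```

For example, `stream_examples(n=1)[0]["instruction"]` returns the instruction of the first training record. This requires network access to the Hub or the mirror.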


Resource description:

---
license: cc-by-4.0
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1273801936
    num_examples: 248718
  download_size: 804210269
  dataset_size: 1273801936
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

## Dataset Description

- **Homepage:** More Information Needed
- **Repository:** More Information Needed
- **Point of Contact:** More Information Needed

### Dataset Summary

instructSum is an Estonian-language summarization dataset for instruction fine-tuning, in the Alpaca format. It is built from the following datasets:

- ERR news (https://huggingface.co/datasets/TalTechNLP/ERRnews)
- ERR newsroom (https://huggingface.co/datasets/TalTechNLP/err-newsroom)
- Long Summarization (https://huggingface.co/datasets/TalTechNLP/LongSumEt)
- Dialogue Sum (https://huggingface.co/datasets/TalTechNLP/dialogsum_ee)
- SAMSum (https://huggingface.co/datasets/TalTechNLP/samsum_ee)

### Supported Tasks and Leaderboards

More Information Needed

### Languages

Estonian

## Dataset Structure

### Data Fields

```
instruction: describes the task the model should perform.
input: optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article.
output: the answer to the instruction.
text: the instruction, input and output formatted with the prompt template the authors used to fine-tune their models.
```
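The card does not reproduce the prompt template behind the `text` field. As an illustration only, the sketch below assumes the original Stanford Alpaca template, which uses English wording and a shorter variant when `input` is empty. The template actually used in `text` may differ, so inspect a few records before relying on it. `AlpacaRecord`, `format_alpaca`, and the template constants are hypothetical names.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    """One record with the dataset's instruction/input/output fields."""
    instruction: str
    input: str
    output: str


# Assumed template: the original Stanford Alpaca prompts, not necessarily
# the template the dataset authors used to build the `text` field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_alpaca(record: AlpacaRecord, include_output: bool = True) -> str:
    """Render a record as an Alpaca-style training string, or as a prompt only."""
    has_input = bool(record["input"].strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    # str.format ignores the unused `input` keyword for the no-input template.
    prompt = template.format(instruction=record["instruction"], input=record["input"])
    return prompt + record["output"] if include_output else prompt
```

At inference time, `format_alpaca(record, include_output=False)` yields only the prompt, so the model generates the summary after `### Response:`.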

Provider:

TalTechNLP

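Summary of the original information

Dataset overview

Dataset name: instructSum

Language: Estonian

Purpose: an Estonian-language summarization dataset for instruction fine-tuning, in the Alpaca format. It combines several sources: ERR news, ERR newsroom, Long Summarization, Dialogue Sum and SAMSum.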

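Dataset structure

Data fields:

instruction: describes the task the model should perform.
input: context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article.
output: the answer to the instruction.
text: the instruction, input and output formatted with the prompt template the authors used to fine-tune their models.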

Dataset size:

Download size: 804210269 bytes (about 804 MB)
Dataset size: 1273801936 bytes (about 1.27 GB)
Training examples: 248718
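The listed sizes imply two simple averages. The first is about 5.1 kB of data per training example. That figure includes the `text` field, which repeats the other three fields. The second is that the download is roughly 63% of the uncompressed dataset size. The sketch below only recomputes these ratios from the numbers above. The constant and function names are illustrative.

```python
DATASET_BYTES = 1_273_801_936  # "Dataset size" (uncompressed)
DOWNLOAD_BYTES = 804_210_269   # "Download size" (compressed files)
TRAIN_EXAMPLES = 248_718


def size_summary(dataset_bytes: int, download_bytes: int, n_examples: int) -> dict:
    """Average bytes per example, download/dataset ratio, and sizes in GB/MB."""
    return {
        "bytes_per_example": dataset_bytes / n_examples,
        "download_ratio": download_bytes / dataset_bytes,
        "dataset_gb": dataset_bytes / 1e9,
        "download_mb": download_bytes / 1e6,
    }


summary = size_summary(DATASET_BYTES, DOWNLOAD_BYTES, TRAIN_EXAMPLES)
print(f"{summary['bytes_per_example']:.0f} bytes/example, "
      f"download is {summary['download_ratio']:.1%} of the dataset size")
# prints: 5121 bytes/example, download is 63.1% of the dataset size
```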

License: cc-by-4.0
