
TalTechNLP/instructionSum

Source: Hugging Face | Updated: 2024-04-29 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/TalTechNLP/instructionSum
Resource description:
---
license: cc-by-4.0
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1273801936
    num_examples: 248718
  download_size: 804210269
  dataset_size: 1273801936
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

## Dataset Description

- **Homepage:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Dataset Summary

instructSum is an Estonian summarization dataset for instruction fine-tuning. It converts the following datasets into the Alpaca format: ERR news (https://huggingface.co/datasets/TalTechNLP/ERRnews), ERR newsroom (https://huggingface.co/datasets/TalTechNLP/err-newsroom), Long Summarization (https://huggingface.co/datasets/TalTechNLP/LongSumEt), Dialogue Sum (https://huggingface.co/datasets/TalTechNLP/dialogsum_ee) and SamsSum (https://huggingface.co/datasets/TalTechNLP/samsum_ee).

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

Estonian

## Dataset Structure

### Data Fields

```
instruction: describes the task the model should perform.
input: optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article.
output: the answer to the instruction.
text: the instruction, input and output formatted with the prompt template used by the authors for fine-tuning their models.
```
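Provider:
TalTechNLP
Summary of the original information

Dataset overview

Dataset name: instructSum

Language: Estonian

Purpose: an Estonian summarization dataset for instruction fine-tuning, in the Alpaca format. It combines several sources: ERR news, ERR newsroom, Long Summarization, Dialogue Sum and SamsSum.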

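Dataset structure

Data fields:

  • instruction: describes the task the model should perform.
  • input: the context or input for the task; for example, when the instruction is "Summarize the following article", the input is that article.
  • output: the answer to the instruction.
  • text: the instruction, input and output formatted with the prompt template the authors used to fine-tune their models.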
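The card says only that `text` holds the other three fields "formatted with the prompt template used by the authors"; the template itself is not published here. As a hedged illustration, the sketch below assumes an Alpaca-style template modeled on the one released with Stanford Alpaca, with a shorter variant for records whose `input` is empty. The names `PROMPT_WITH_INPUT`, `PROMPT_NO_INPUT` and `build_text` are hypothetical, and the real `text` values in this dataset may use different wording.

```python
# Hypothetical sketch: an Alpaca-style template used as a stand-in, since the
# dataset card does not publish the authors' actual prompt template.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_text(record: dict) -> str:
    """Format one instruction/input/output record into a single training string."""
    # `input` is optional: use the shorter template when it is missing or blank.
    if record.get("input", "").strip():
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    # The target answer follows the "### Response:" header.
    return prompt + record["output"]


# Illustrative record with placeholder content (not taken from the dataset).
example = {
    "instruction": "Summarize the following article.",
    "input": "Article text ...",
    "output": "A short summary ...",
}
print(build_text(example))
```

Comparing `build_text(record)` with a record's actual `text` value is a quick way to see how far the authors' template departs from this assumption.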

Dataset size:

  • Download size: 804,210,269 bytes (about 804 MB)
  • Dataset size: 1,273,801,936 bytes (about 1.27 GB)
  • Number of training examples: 248,718
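The three size figures above also give a rough sense of record length. The arithmetic below uses only the numbers from the card's `dataset_info` block; the variable names are illustrative. Note that the per-example average covers all four string columns, including `text`, which repeats the other three, so it overstates the length of any single field.

```python
# Split statistics copied from the dataset card's `dataset_info` block.
NUM_BYTES_TRAIN = 1_273_801_936  # size of the train split, in bytes
NUM_EXAMPLES_TRAIN = 248_718     # number of training examples
DOWNLOAD_SIZE = 804_210_269      # bytes of data files to download

# Average bytes per record, summed over all four columns.
avg_bytes_per_example = NUM_BYTES_TRAIN / NUM_EXAMPLES_TRAIN
# How large the downloaded files are relative to the prepared dataset.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES_TRAIN

print(f"average size per example: {avg_bytes_per_example:,.0f} bytes")
print(f"download size / dataset size: {download_ratio:.2f}")
```

Run as written, this reports roughly 5,121 bytes per example and a download-to-dataset size ratio of about 0.63.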

License: cc-by-4.0
