
ZhongshengWang/Alpaca-pubmed-summarization

Source: Hugging Face · Updated: 2023-09-19 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/ZhongshengWang/Alpaca-pubmed-summarization
Resource description:
---
license: openrail
language:
- en
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
task_categories:
- summarization
- text-generation
tags:
- conditional-text-generation
---

This data set is a lightweight fine-tuned data format version of the Llama2 large language model for Stanford Alpaca. You can click [here](https://www.runoob.com) to view.

cite original code

```
@inproceedings{cohan-etal-2018-discourse,
    title = "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents",
    author = "Cohan, Arman and
      Dernoncourt, Franck and
      Kim, Doo Soon and
      Bui, Trung and
      Kim, Seokhwan and
      Chang, Walter and
      Goharian, Nazli",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N18-2097",
    doi = "10.18653/v1/N18-2097",
    pages = "615--621",
    abstract = "Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.",
}
```

This dataset is a lightweight fine-tuning dataset in the Stanford Alpaca data format, intended for the Llama2 large language model. It is suitable for summarization and text generation tasks. The dataset is monolingual (English) and contains between 100K and 1M entries.
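As an illustration of what "Stanford Alpaca data format" usually means in practice, the sketch below turns one record into a training prompt. It assumes the conventional Alpaca field names `instruction`, `input`, and `output` and the prompt template published with Stanford Alpaca; this listing does not confirm that the dataset uses exactly these fields, and the sample record is made up for the example.

```python
from typing import Dict

# Prompt templates in the style published with Stanford Alpaca.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record: Dict[str, str]) -> str:
    """Format one Alpaca-style record into a prompt string.

    The field names are an assumption (the usual Alpaca convention),
    not something stated by the dataset listing.
    """
    if record.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Hypothetical record, invented purely for illustration.
example = {
    "instruction": "Summarize the following biomedical article.",
    "input": "Full text of a PubMed article would go here.",
    "output": "The article abstract would go here.",
}

prompt = build_prompt(example)
# During fine-tuning, the target text is the prompt followed by the output.
training_text = prompt + example["output"]
print(training_text)
```

For a summarization dataset, the article body would sit in `input` and the abstract in `output`, so the model learns to continue the prompt with the summary.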
Provided by:
ZhongshengWang
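Summary of original information

Dataset overview

License

  • OpenRAIL license (openrail)

Language

  • English (en)

Multilinguality

  • Monolingual (monolingual)

Dataset size

  • 100K<n<1M

Task categories

  • Summarization (summarization)
  • Text generation (text-generation)

Tags

  • Conditional text generation (conditional-text-generation)

Description

  • This dataset is a lightweight fine-tuning data format version of the Llama2 large language model for Stanford Alpaca.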