ZhongshengWang/Alpaca-pubmed-summarization
Hugging Face. Updated 2023-09-19; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ZhongshengWang/Alpaca-pubmed-summarization
Resource description:
---
license: openrail
language:
- en
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
task_categories:
- summarization
- text-generation
tags:
- conditional-text-generation
---
This dataset is formatted in the Stanford Alpaca style for lightweight fine-tuning of the Llama 2 large language model. You can click [here](https://www.runoob.com) to view it.
Citation for the original work:
```
@inproceedings{cohan-etal-2018-discourse,
title = "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents",
author = "Cohan, Arman and
Dernoncourt, Franck and
Kim, Doo Soon and
Bui, Trung and
Kim, Seokhwan and
Chang, Walter and
Goharian, Nazli",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
month = jun,
year = "2018",
address = "New Orleans, Louisiana",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N18-2097",
doi = "10.18653/v1/N18-2097",
pages = "615--621",
abstract = "Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state-of-the-art models.",
}
```
This dataset is formatted in the Stanford Alpaca style for lightweight fine-tuning of the Llama 2 large language model. It is suitable for summarization and text-generation tasks. The dataset is monolingual (English) and contains between 100K and 1M entries.
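For readers unfamiliar with the Stanford Alpaca format mentioned above, the following is a minimal sketch of how one record might be turned into a fine-tuning string. The field names `instruction`, `input`, and `output` and the prompt wording follow the Stanford Alpaca convention; whether this dataset uses exactly those column names is an assumption, and the example record is invented for illustration.

```python
# Illustrative sketch only. The keys "instruction", "input", and "output"
# follow the Stanford Alpaca convention; they are ASSUMED here and are not
# confirmed column names for this particular dataset.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Render the prompt part of an Alpaca-style record."""
    # Treat a missing, None, or whitespace-only input as "no input".
    if (record.get("input") or "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def build_training_text(record: dict) -> str:
    """Concatenate the prompt and the target summary into one string."""
    return build_prompt(record) + record["output"]


# Hypothetical record, made up for this example (not taken from the dataset).
example = {
    "instruction": "Summarize the following biomedical article.",
    "input": "Background: ... Methods: ... Results: ... Conclusions: ...",
    "output": "A short abstract-style summary of the article.",
}

if __name__ == "__main__":
    print(build_training_text(example))
```

For summarization, the article body would typically go in `input` and the abstract in `output`, so that only the text after `### Response:` serves as the generation target.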
Provider:
ZhongshengWang
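Summary of original information
Dataset overview
License
- OpenRAIL license (openrail)
Language
- English (en)
Multilinguality
- Monolingual (monolingual)
Dataset size
- 100K<n<1M
Task categories
- Summarization (summarization)
- Text generation (text-generation)
Tags
- Conditional text generation (conditional-text-generation)
Description
- This dataset is formatted in the Stanford Alpaca style for lightweight fine-tuning of the Llama 2 large language model.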



