bigbird-pegasus-large-pubmed
Source: ModelScope community. Updated 2025-10-02; indexed 2025-10-04.
Download link:
https://modelscope.cn/datasets/AaronAaron/bigbird-pegasus-large-pubmed
Description:
# BigBirdPegasus model (large)
BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences. It also comes with a theoretical analysis of which capabilities of a full transformer the sparse model retains.
BigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).
Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.
## Model description
BigBird relies on **block sparse attention** instead of normal full attention (i.e. BERT's attention) and can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
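
To give a feel for where the savings come from, here is a minimal block-level sketch of a sparse attention pattern that combines global, sliding-window, and random blocks. It is a simplified illustration, not the library's implementation: the function name `block_sparse_mask`, the choice of the first and last blocks as global, and the 3-block window are assumptions made for this example.

```python
import random


def block_sparse_mask(n_blocks, window=3, n_random=3, seed=0):
    """Return mask[i][j] == True if query block i attends to key block j.

    Illustrative assumptions: the first and last blocks are global,
    every other block sees the global blocks, a sliding window of
    `window` blocks centred on itself, and `n_random` extra blocks.
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * n_blocks for _ in range(n_blocks)]
    for i in range(n_blocks):
        if i in (0, n_blocks - 1):
            # Global blocks attend to every block.
            mask[i] = [True] * n_blocks
            continue
        # Every block attends to the global blocks ...
        allowed = {0, n_blocks - 1}
        # ... to its local sliding window ...
        allowed.update(j for j in range(i - half, i + half + 1) if 0 <= j < n_blocks)
        # ... and to a few randomly chosen remaining blocks.
        remaining = [j for j in range(n_blocks) if j not in allowed]
        allowed.update(rng.sample(remaining, min(n_random, len(remaining))))
        for j in allowed:
            mask[i][j] = True
    return mask


seq_len, block_size = 4096, 64
n_blocks = seq_len // block_size
mask = block_sparse_mask(n_blocks)
sparse_pairs = sum(sum(row) for row in mask)
full_pairs = n_blocks * n_blocks
print(f"block pairs attended: {sparse_pairs} of {full_pairs} ({sparse_pairs / full_pairs:.1%})")
```

In this sketch each non-global query block attends to a fixed number of key blocks regardless of sequence length, so the number of attended block pairs grows roughly linearly with the number of blocks rather than quadratically.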
## How to use
Here is how to use this model to generate a summary of a given text in PyTorch:
```python
from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# By default the encoder uses `block_sparse` attention with num_random_blocks=3, block_size=64.
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# The decoder attention type cannot be changed and is always "original_full".
# Alternative: switch the encoder `attention_type` to full attention:
# model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# Alternative: change `block_size` and `num_random_blocks`:
# model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
# Truncate inputs to the 4096-token limit mentioned in the model description.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
prediction = model.generate(**inputs)
# Decode the generated token IDs into strings, dropping special tokens.
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)
```
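
When choosing `block_size` and `num_random_blocks`, it may help to know how they interact with input length. The sketch below rests on two assumptions about the Hugging Face implementation that should be checked against the installed `transformers` version: that the encoder falls back to `original_full` attention when the input is no longer than `(5 + 2 * num_random_blocks) * block_size` tokens, and that inputs are padded internally to a multiple of `block_size`. The helper names are hypothetical and not part of the library.

```python
def min_block_sparse_length(block_size=64, num_random_blocks=3):
    """Length threshold (assumed): inputs at or below it use full attention."""
    # 2 global blocks + 3 sliding-window blocks + random blocks + an equal buffer
    return (5 + 2 * num_random_blocks) * block_size


def padded_length(seq_len, block_size=64):
    """Round seq_len up to the next multiple of block_size."""
    return -(-seq_len // block_size) * block_size


print(min_block_sparse_length())       # default settings
print(min_block_sparse_length(16, 2))  # block_size=16, num_random_blocks=2
print(padded_length(1000))             # a 1000-token input with block_size=64
```

Under these assumptions, a short input such as the one-sentence placeholder text would be processed with full attention even when `attention_type` is left at `block_sparse`; the sparse pattern only comes into play for longer documents.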
## Training Procedure
This checkpoint was obtained by fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **pubmed** subset of the [scientific_papers](https://huggingface.co/datasets/scientific_papers) dataset.
## BibTeX entry and citation info
```tex
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
Provider:
maas
Created:
2025-10-02



