
bigbird-pegasus-large-pubmed

ModelScope community · Updated 2025-10-02 · Indexed 2025-10-04
Download link:
https://modelscope.cn/datasets/AaronAaron/bigbird-pegasus-large-pubmed
Resource description:
# BigBirdPegasus model (large)

BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle.

BigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.

## How to use

Here is how to use this model to generate a summary of a given text in PyTorch:

```python
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# The three `from_pretrained` calls below are alternatives; keep only the one you need.

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')

# generate summary token ids, then decode them back to text
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
```

## Training Procedure

This checkpoint is obtained after fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **pubmed dataset** from [scientific_papers](https://huggingface.co/datasets/scientific_papers).

## BibTeX entry and citation info

```tex
@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences},
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
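To give a rough sense of why block sparse attention is cheaper than full attention at a sequence length of 4096, the sketch below counts query-key score computations under both schemes. It is a back-of-the-envelope estimate, not the library's implementation: the function names are hypothetical, and it assumes each query block attends to a sliding window of 3 blocks, 2 global blocks, and `num_random_blocks` random blocks, ignoring the fact that the global blocks themselves attend to every position. The `block_size` and `num_random_blocks` values are the ones quoted in the usage example.

```python
def attended_keys_per_query(seq_len, block_size=64, num_random_blocks=3,
                            window_blocks=3, global_blocks=2):
    """Approximate number of key positions one query token attends to.

    Assumption (not taken from the model card): every query block sees
    `window_blocks` sliding-window blocks, `global_blocks` global blocks
    and `num_random_blocks` random blocks.
    """
    if seq_len % block_size != 0:
        raise ValueError("this sketch needs seq_len to be a multiple of block_size")
    num_blocks = seq_len // block_size
    # A query cannot attend to more blocks than the sequence contains.
    sparse_blocks = min(num_blocks, window_blocks + global_blocks + num_random_blocks)
    return sparse_blocks * block_size


def attention_score_counts(seq_len, **kwargs):
    """Return (full, sparse) counts of query-key scores for one attention head."""
    full = seq_len * seq_len
    sparse = seq_len * attended_keys_per_query(seq_len, **kwargs)
    return full, sparse


if __name__ == "__main__":
    full, sparse = attention_score_counts(4096)
    print(f"full attention scores:   {full}")
    print(f"sparse attention scores: {sparse}")
    print(f"reduction factor:        {full / sparse:.1f}x")
```

Under these assumptions, shrinking `block_size` or `num_random_blocks` lowers the count further, which is the trade-off the `block_size=16, num_random_blocks=2` configuration explores; for short sequences the sparse count converges to the full one.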

Provider:
maas
Created:
2025-10-02