Nemotron 2B
arXiv | Indexed 2025-09-30
Download link:
https://huggingface.co/nvidia/GPT-2B-001
Description:
This model is a Transformer-based, decoder-only language model in the style of GPT-2 and GPT-3, trained on 1.1 trillion tokens with the NeMo and Megatron-LM frameworks. The architecture uses the SwiGLU activation function and rotary position embeddings (RoPE), supports a maximum sequence length of 4,096 tokens, omits dropout and the bias terms in linear layers, and keeps the input embedding layer and the output projection layer untied. The model has 2 billion parameters, and its task is language modeling.
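To make the "SwiGLU, no bias" part of the description concrete, here is a minimal pure-Python sketch of a bias-free SwiGLU feed-forward block. The function names (`silu`, `matvec`, `swiglu_ffn`), the weight names, and the toy dimensions are assumptions made for illustration; this is not taken from the NeMo or Megatron-LM code, and the real model's layer sizes are not stated above.

```python
import math

def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector; no bias term is added."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Bias-free SwiGLU feed-forward: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]   # gating branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(w_down, hidden)                 # project back to model dim

# Toy weights, hypothetical values (model dim 2, hidden dim 3).
x = [1.0, -2.0]
w_gate = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
```

Because no layer carries a bias term, a zero input vector maps to a zero output vector in this sketch, which is one quick way to sanity-check a bias-free block.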
Provider:
NVIDIA



