melephant/2l-bilinear-attn
Hugging Face | Updated 2026-04-18 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/melephant/2l-bilinear-attn
Resource description:
---
{}
---
# pile
A causal language model with quadratic/bilinear attention, trained with the tensor-mars research stack. This repository packages the final checkpoint, the configuration, and reference model code.
## Training configuration
```yaml
batch_size: 384
max_steps: 33333
warmup_steps: 200
lr: 0.0003
optimizer: Muon + AdamW
dtype: bfloat16
grad_clip: 1.0
```
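To make the `warmup_steps` and `lr` entries concrete, here is a minimal sketch of a linear warmup schedule. The card does not state the warmup shape or what happens after warmup, so both the linear ramp and the constant rate afterwards are assumptions, and `lr_at` is a hypothetical helper rather than part of the tensor-mars stack.

```python
BASE_LR = 3e-4      # `lr` from the training configuration
WARMUP_STEPS = 200  # `warmup_steps` from the training configuration


def lr_at(step: int) -> float:
    """Learning rate at a given step.

    Assumption: linear ramp from 0 to BASE_LR over WARMUP_STEPS, then a
    constant rate. The actual schedule used for this checkpoint is not
    documented in the card.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR


for step in (0, 100, 200, 33333):
    print(step, lr_at(step))
```

Under these assumptions the rate reaches the configured `lr` at step 200 and stays there through `max_steps`.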
## Data + tokenizer
- Context length: 512 | Vocab size: 4096
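Combining the context length with `batch_size` and `max_steps` from the training configuration gives a rough token budget. The sketch below assumes that `batch_size` counts sequences per optimizer step and that every sequence is packed to the full 512 tokens; the card confirms neither.

```python
BATCH_SIZE = 384      # `batch_size` from the training configuration
MAX_STEPS = 33333     # `max_steps` from the training configuration
CONTEXT_LENGTH = 512  # context length from this section

# Assumption: one step consumes BATCH_SIZE full-length sequences.
tokens_per_step = BATCH_SIZE * CONTEXT_LENGTH
total_tokens = tokens_per_step * MAX_STEPS

print(f"{tokens_per_step:,} tokens per step")
print(f"{total_tokens:,} tokens over the whole run")
```

If those assumptions hold, the run sees 196,608 tokens per step and about 6.55 billion tokens in total.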
## Metrics
- **train_loss**: 3.9820
- **val_loss**: 3.9987
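These losses can be converted to perplexity for easier comparison. The sketch below assumes the values are mean cross-entropy per token in nats (the usual convention for PyTorch losses, but not stated in the card), and it uses the 4096-token vocabulary size to compute the loss of a uniform guess as a reference point.

```python
import math

TRAIN_LOSS = 3.9820
VAL_LOSS = 3.9987
VOCAB_SIZE = 4096


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(VOCAB_SIZE)

print(f"train perplexity: {perplexity(TRAIN_LOSS):.2f}")
print(f"val perplexity:   {perplexity(VAL_LOSS):.2f}")
print(f"uniform baseline: {uniform_loss:.4f} nats")
```

Under that assumption the validation perplexity is roughly 54.5, against a uniform-guess loss of about 8.32 nats.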
## Checkpoints
- Latest checkpoint exported as `pytorch_model.bin`.
- Full training log available in `metrics.jsonl`.
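The training log can be inspected with the standard library alone. This is a sketch that assumes `metrics.jsonl` follows the usual JSON Lines layout (one JSON object per line); the field names inside each record are not documented in the card, so the code prints the first record instead of assuming any keys. `read_jsonl` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Only runs when the log file from this repository is present.
if Path("metrics.jsonl").exists():
    records = read_jsonl("metrics.jsonl")
    print(len(records), "records")
    if records:
        print("first record:", records[0])
```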
## Usage
```python
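import json

import torch
from models.transformer import AttentionLM  # reference model code in this repository

# Load the checkpoint on CPU and build the model from the packaged config.
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
with open("config.json") as f:
    model = AttentionLM.from_config(json.load(f))
# Restore the trained weights and switch to inference mode.
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()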
```
## Limitations
- This model is research-grade and not aligned for deployment.
- Quadratic/bilinear attention stacks can exhibit instability outside the training distribution.
Provider:
melephant



