typhoon-t1-3b-research-preview-data
Source: ModelScope community. Updated 2025-11-27; indexed 2025-05-24.
Download link: https://modelscope.cn/datasets/scb10x/typhoon-t1-3b-research-preview-data
# Typhoon T1 3B Research Preview Data
## Overview
This dataset was used to train our first open reasoning model, **Typhoon T1 (Research Preview)**: [llama-3.2-typhoon-t1-3b-research-preview](https://huggingface.co/scb10x/llama-3.2-typhoon-t1-3b-research-preview). It is provided in Alpaca format (`{instruction, input, output}`), although `input` is null for every record. We acknowledge the owners of the original data sources; see our [technical blog](https://blog.opentyphoon.ai/introducing-typhoon-t1-a-family-of-open-reasoning-models-research-preview-22daacc88662) for more details on them.
## Data Splits
This dataset consists of 55,677 records for supervised fine-tuning (SFT).

## Attributes
- `instruction`: the instruction (prompt) given to the model
- `input`: always null in this dataset; the field is kept for compatibility with trainers that expect the Alpaca format
- `output`: a long thought generated using the approach described in our technical blog
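
The fields above can be mapped to a prompt/completion pair before training. The following is a minimal sketch, not the authors' preprocessing: the helper name `to_prompt_completion`, the `prompt`/`completion` keys, and the sample record are all assumptions made up for illustration. It only relies on the three Alpaca fields described above and on `input` being null.

```python
from typing import Optional, TypedDict


class AlpacaRecord(TypedDict):
    """Shape of one record: the three Alpaca fields listed above."""
    instruction: str
    input: Optional[str]
    output: str


def to_prompt_completion(record: AlpacaRecord) -> dict:
    """Turn one Alpaca-style record into a prompt/completion pair.

    Hypothetical helper; the output keys are an assumption, not a
    format required by any particular trainer.
    """
    prompt = record["instruction"].strip()
    # `input` is null for every record in this dataset, but a non-empty
    # value is still handled so the helper works on other Alpaca data.
    extra = record.get("input")
    if extra:
        prompt = f"{prompt}\n\n{extra.strip()}"
    return {"prompt": prompt, "completion": record["output"]}


# Hypothetical record, invented for illustration (not taken from the dataset).
example: AlpacaRecord = {
    "instruction": "What is 2 + 3?",
    "input": None,
    "output": "Let me add the numbers: 2 + 3 = 5. The answer is 5.",
}

pair = to_prompt_completion(example)
print(pair["prompt"])
```

Treating a null `input` the same as an empty string keeps the prompt free of a stray separator, which is the only case that occurs in this dataset.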
## Citation
```
@misc{taveekitworachai2025typhoont1openthai,
title={Typhoon T1: An Open Thai Reasoning Model},
author={Pittawat Taveekitworachai and Potsawee Manakul and Kasima Tharnpipitchai and Kunat Pipatanakul},
year={2025},
eprint={2502.09042},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.09042},
}
```
Provider: maas
Created: 2025-05-23



